Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this work, we study the multi-task learning problem on GMMs, which aims to leverage potentially similar GMM parameter structures among tasks to obtain improved learning performance compared to single-task learning. We propose a multi-task GMM learning procedure based on the Expectation-Maximization (EM) algorithm that not only can effectively utilize unknown similarity between related tasks but is also robust against a fraction of outlier tasks from arbitrary sources. The proposed procedure is shown to achieve minimax optimal rate of convergence for both parameter estimation error and the excess mis-clustering error, in a wide range of regimes. Moreover, we generalize our approach to tackle the problem of transfer learning for GMMs, where similar theoretical results are derived. Finally, we demonstrate the effectiveness of our methods through simulations and a real data analysis. To the best of our knowledge, this is the first work studying multi-task and transfer learning on GMMs with theoretical guarantees. This package implements the algorithms proposed in Tian, Y., Weng, H., & Feng, Y. (2022) <arXiv:2209.15224>
.
The depmap package is a data package that accesses datsets from the Broad Institute DepMap
cancer dependency study using ExperimentHub
. Datasets from the most current release are available, including RNAI and CRISPR-Cas9 gene knockout screens quantifying the genetic dependency for select cancer cell lines. Additional datasets are also available pertaining to the log copy number of genes for select cell lines, protein expression of cell lines as measured by reverse phase protein lysate microarray (RPPA), Transcript Per Million (TPM) data, as well as supplementary datasets which contain metadata and mutation calls for the other datasets found in the current release. The 19Q3 release adds the drug_dependency dataset, that contains cancer cell line dependency data with respect to drug and drug-candidate compounds. The 20Q2 release adds the proteomic dataset that contains quantitative profiling of proteins via mass spectrometry. This package will be updated on a quarterly basis to incorporate the latest Broad Institute DepMap
Public cancer dependency datasets. All data made available in this package was generated by the Broad Institute DepMap
for research purposes and not intended for clinical use. This data is distributed under the Creative Commons license (Attribution 4.0 International (CC BY 4.0)).
Extensive functions for Lmoments (LMs) and probability-weighted moments (PWMs), distribution parameter estimation, LMs for distributions, LM ratio diagrams, multivariate Lcomoments, and asymmetric (asy) trimmed LMs (TLMs). Maximum likelihood and maximum product spacings estimation are available. Right-tail and left-tail LM censoring by threshold or indicator variable are available. LMs of residual (resid) and reversed (rev) residual life are implemented along with 13 quantile operators for reliability analyses. Exact analytical bootstrap estimates of order statistics, LMs, and LM var-covars are available. Harri-Coble Tau34-squared Normality Test is available. Distributions with L, TL, and added (+) support for right-tail censoring (RC) encompass: Asy Exponential (Exp) Power [L], Asy Triangular [L], Cauchy [TL], Eta-Mu [L], Exp. [L], Gamma [L], Generalized (Gen) Exp Poisson [L], Gen Extreme Value [L], Gen Lambda [L, TL], Gen Logistic [L], Gen Normal [L], Gen Pareto [L+RC, TL], Govindarajulu [L], Gumbel [L], Kappa [L], Kappa-Mu [L], Kumaraswamy [L], Laplace [L], Linear Mean Residual Quantile Function [L], Normal [L], 3p log-Normal [L], Pearson Type III [L], Polynomial Density-Quantile 3 and 4 [L], Rayleigh [L], Rev-Gumbel [L+RC], Rice [L], Singh Maddala [L], Slash [TL], 3p Student t [L], Truncated Exponential [L], Wakeby [L], and Weibull [L].
Various tools dealing with batch effects, in particular enabling the removal of discrepancies between training and test sets in prediction scenarios. Moreover, addon quantile normalization and addon RMA normalization (Kostka & Spang, 2008) is implemented to enable integrating the quantile normalization step into prediction rules. The following batch effect removal methods are implemented: FAbatch, ComBat
, (f)SVA, mean-centering, standardization, Ratio-A and Ratio-G. For each of these we provide an additional function which enables a posteriori ('addon') batch effect removal in independent batches ('test data'). Here, the (already batch effect adjusted) training data is not altered. For evaluating the success of batch effect adjustment several metrics are provided. Moreover, the package implements a plot for the visualization of batch effects using principal component analysis. The main functions of the package for batch effect adjustment are ba()
and baaddon()
which enable batch effect removal and addon batch effect removal, respectively, with one of the seven methods mentioned above. Another important function here is bametric()
which is a wrapper function for all implemented methods for evaluating the success of batch effect removal. For (addon) quantile normalization and (addon) RMA normalization the functions qunormtrain()
, qunormaddon()
, rmatrain()
and rmaaddon()
can be used.
This package provides fast and efficient procedures for Bayesian analysis of Structural Vector Autoregressions. This package estimates a wide range of models, including homo-, heteroskedastic, and non-normal specifications. Structural models can be identified by adjustable exclusion restrictions, time-varying volatility, or non-normality. They all include a flexible three-level equation-specific local-global hierarchical prior distribution for the estimated level of shrinkage for autoregressive and structural parameters. Additionally, the package facilitates predictive and structural analyses such as impulse responses, forecast error variance and historical decompositions, forecasting, verification of heteroskedasticity, non-normality, and hypotheses on autoregressive parameters, as well as analyses of structural shocks, volatilities, and fitted values. Beautiful plots, informative summary functions, and extensive documentation including the vignette by Woźniak (2024) <doi:10.48550/arXiv.2410.15090>
complement all this. The implemented techniques align closely with those presented in Lütkepohl, Shang, Uzeda, & Woźniak (2024) <doi:10.48550/arXiv.2404.11057>
, Lütkepohl & Woźniak (2020) <doi:10.1016/j.jedc.2020.103862>, and Song & Woźniak (2021) <doi:10.1093/acrefore/9780190625979.013.174>. The bsvars package is aligned regarding objects, workflows, and code structure with the R package bsvarSIGNs
by Wang & Woźniak (2024) <doi:10.32614/CRAN.package.bsvarSIGNs>
, and they constitute an integrated toolset.
O-statistics, or overlap statistics, measure the degree of community-level trait overlap. They are estimated by fitting nonparametric kernel density functions to each speciesâ trait distribution and calculating their areas of overlap. For instance, the median pairwise overlap for a community is calculated by first determining the overlap of each species pair in trait space, and then taking the median overlap of each species pair in a community. This median overlap value is called the O-statistic (O for overlap). The Ostats()
function calculates separate univariate overlap statistics for each trait, while the Ostats_multivariate()
function calculates a single multivariate overlap statistic for all traits. O-statistics can be evaluated against null models to obtain standardized effect sizes. Ostats is part of the collaborative Macrosystems Biodiversity Project "Local- to continental-scale drivers of biodiversity across the National Ecological Observatory Network (NEON)." For more information on this project, see the Macrosystems Biodiversity Website (<https://neon-biodiversity.github.io/>). Calculation of O-statistics is described in Read et al. (2018) <doi:10.1111/ecog.03641>, and a teaching module for introducing the underlying biological concepts at an undergraduate level is described in Grady et al. (2018) <http://tiee.esa.org/vol/v14/issues/figure_sets/grady/abstract.html>.
This package provides a framework for simulating spatially explicit genomic data which leverages real cartographic information for programmatic and visual encoding of spatiotemporal population dynamics on real geographic landscapes. Population genetic models are then automatically executed by the SLiM
software by Haller et al. (2019) <doi:10.1093/molbev/msy228> behind the scenes, using a custom built-in simulation SLiM
script. Additionally, fully abstract spatial models not tied to a specific geographic location are supported, and users can also simulate data from standard, non-spatial, random-mating models. These can be simulated either with the SLiM
built-in back-end script, or using an efficient coalescent population genetics simulator msprime by Baumdicker et al. (2022) <doi:10.1093/genetics/iyab229> with a custom-built Python script bundled with the R package. Simulated genomic data is saved in a tree-sequence format and can be loaded, manipulated, and summarised using tree-sequence functionality via an R interface to the Python module tskit by Kelleher et al. (2019) <doi:10.1038/s41588-019-0483-y>. Complete model configuration, simulation and analysis pipelines can be therefore constructed without a need to leave the R environment, eliminating friction between disparate tools for population genetic simulations and data analysis.
This package provides functions for calculating biochemical methane potential (BMP) from laboratory measurements and other types of data processing and prediction useful for biogas research. Raw laboratory measurements for diverse methods (volumetric, manometric, gravimetric, gas density) can be processed to calculate BMP. Theoretical maximum BMP or methane or biogas yield can be predicted from various measures of substrate composition. Molar mass and calculated oxygen demand (COD') can be determined from a chemical formula. Measured gas volume can be corrected for water vapor and to standard (or user-defined) temperature and pressure. Gas quantity can be converted between volume, mass, and moles. A function for planning BMP experiments can consider multiple constraints in suggesting substrate or inoculum quantities, and check for problems. Inoculum and substrate mass can be determined for planning BMP experiments. Finally, a set of first-order models can be fit to measured methane production rate or cumulative yield in order to extract estimates of ultimate yield and kinetic constants. See Hafner et al. (2018) <doi:10.1016/j.softx.2018.06.005> for details. OBA is a web application that provides access to some of the package functionality: <https://biotransformers.shinyapps.io/oba1/>. The Standard BMP Methods website documents the calculations in detail: <https://www.dbfz.de/en/BMP>.
Model-based clustering of multivariate continuous data using Bayesian mixtures of factor analyzers (Papastamoulis (2019) <DOI:10.1007/s11222-019-09891-z> (2018) <DOI:10.1016/j.csda.2018.03.007>). The number of clusters is estimated using overfitting mixture models (Rousseau and Mengersen (2011) <DOI:10.1111/j.1467-9868.2011.00781.x>): suitable prior assumptions ensure that asymptotically the extra components will have zero posterior weight, therefore, the inference is based on the ``alive components. A Gibbs sampler is implemented in order to (approximately) sample from the posterior distribution of the overfitting mixture. A prior parallel tempering scheme is also available, which allows to run multiple parallel chains with different prior distributions on the mixture weights. These chains run in parallel and can swap states using a Metropolis-Hastings move. Eight different parameterizations give rise to parsimonious representations of the covariance per cluster (following Mc Nicholas and Murphy (2008) <DOI:10.1007/s11222-008-9056-0>). The model parameterization and number of factors is selected according to the Bayesian Information Criterion. Identifiability issues related to label switching are dealt by post-processing the simulated output with the Equivalence Classes Representatives algorithm (Papastamoulis and Iliopoulos (2010) <DOI:10.1198/jcgs.2010.09008>, Papastamoulis (2016) <DOI:10.18637/jss.v069.c01>).
This package provides a set of tools to analyze texts. Includes, amongst others, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided, to enable frequency analyses (supports Celex and Leipzig Corpora Collection file formats) and measures like tf-idf. Note: For full functionality a local installation of TreeTagger
is recommended. It is also recommended to not load this package directly, but by loading one of the available language support packages from the l10n repository <https://undocumeantit.github.io/repos/l10n/>. koRpus
also includes a plugin for the R GUI and IDE RKWard, providing graphical dialogs for its basic features. The respective R package rkward cannot be installed directly from a repository, as it is a part of RKWard. To make full use of this feature, please install RKWard from <https://rkward.kde.org> (plugins are detected automatically). Due to some restrictions on CRAN, the full package sources are only available from the project homepage. To ask for help, report bugs, request features, or discuss the development of the package, please subscribe to the koRpus-dev
mailing list (<https://korpusml.reaktanz.de>).
In forensics, it is common and effective practice to analyse glass fragments from the scene and suspects to gain evidence of placing a suspect at the crime scene. This kind of analysis involves comparing the physical and chemical attributes of glass fragments that exist on both the person and at the crime scene, and assessing the significance in a likeness that they share. The package implements the Scott-Knott Modification 2 algorithm (SKM2) (Christopher M. Triggs and James M. Curran and John S. Buckleton and Kevan A.J. Walsh (1997) <doi:10.1016/S0379-0738(96)02037-3> "The grouping problem in forensic glass analysis: a divisive approach", Forensic Science International, 85(1), 1--14) for small sample glass fragment analysis using the refractive index (ri) of a set of glass samples. It also includes an experimental multivariate analog to the Scott-Knott algorithm for similar analysis on glass samples with multiple chemical concentration variables and multiple samples of the same item; testing against the Hotellings T^2 distribution (J.M. Curran and C.M. Triggs and J.R. Almirall and J.S. Buckleton and K.A.J. Walsh (1997) <doi:10.1016/S1355-0306(97)72197-X> "The interpretation of elemental composition measurements from forensic glass evidence", Science & Justice, 37(4), 241--244).
The CoTiMA
package performs meta-analyses of correlation matrices of repeatedly measured variables taken from studies that used different time intervals. Different time intervals between measurement occasions impose problems for meta-analyses because the effects (e.g. cross-lagged effects) cannot be simply aggregated, for example, by means of common fixed or random effects analysis. However, continuous time math, which is applied in CoTiMA
', can be used to extrapolate or intrapolate the results from all studies to any desired time lag. By this, effects obtained in studies that used different time intervals can be meta-analyzed. CoTiMA
fits models to empirical data using the structural equation model (SEM) package ctsem', the effects specified in a SEM are related to parameters that are not directly included in the model (i.e., continuous time parameters; together, they represent the continuous time structural equation model, CTSEM). Statistical model comparisons and significance tests are then performed on the continuous time parameter estimates. CoTiMA
also allows analysis of publication bias (Egger's test, PET-PEESE estimates, zcurve analysis etc.) and analysis of statistical power (post hoc power, required sample sizes). See Dormann, C., Guthier, C., & Cortina, J. M. (2019) <doi:10.1177/1094428119847277>. and Guthier, C., Dormann, C., & Voelkle, M. C. (2020) <doi:10.1037/bul0000304>.
The phenology of plants (i.e. the timing of their annual life phases) depends on climatic cues. For temperate trees and many other plants, spring phases, such as leaf emergence and flowering, have been found to result from the effects of both cool (chilling) conditions and heat. Fruit tree scientists (pomologists) have developed some metrics to quantify chilling and heat (e.g. see Luedeling (2012) <doi:10.1016/j.scienta.2012.07.011>). chillR
contains functions for processing temperature records into chilling (Chilling Hours, Utah Chill Units and Chill Portions) and heat units (Growing Degree Hours). Regarding chilling metrics, Chill Portions are often considered the most promising, but they are difficult to calculate. This package makes it easy. chillR
also contains procedures for conducting a PLS analysis relating phenological dates (e.g. bloom dates) to either mean temperatures or mean chill and heat accumulation rates, based on long-term weather and phenology records (Luedeling and Gassner (2012) <doi:10.1016/j.agrformet.2011.10.020>). As of version 0.65, it also includes functions for generating weather scenarios with a weather generator, for conducting climate change analyses for temperature-based climatic metrics and for plotting results from such analyses. Since version 0.70, chillR
contains a function for interpolating hourly temperature records.
This package provides a flexible statistical framework for network-valued data analysis. It leverages the complexity of the space of distributions on graphs by using the permutation framework for inference as implemented in the flipr package. Currently, only the two-sample testing problem is covered and generalization to k samples and regression will be added in the future as well. It is a 4-step procedure where the user chooses a suitable representation of the networks, a suitable metric to embed the representation into a metric space, one or more test statistics to target specific aspects of the distributions to be compared and a formula to compute the permutation p-value. Two types of inference are provided: a global test answering whether there is a difference between the distributions that generated the two samples and a local test for localizing differences on the network structure. The latter is assumed to be shared by all networks of both samples. References: Lovato, I., Pini, A., Stamm, A., Vantini, S. (2020) "Model-free two-sample test for network-valued data" <doi:10.1016/j.csda.2019.106896>; Lovato, I., Pini, A., Stamm, A., Taquet, M., Vantini, S. (2021) "Multiscale null hypothesis testing for network-valued data: Analysis of brain networks of patients with autism" <doi:10.1111/rssc.12463>.
This package provides a pipeline for identifying differentially methylated CpG
sites using Hidden Markov Model in bisulfite sequencing data. DNA methylation studies have enabled researchers to understand methylation patterns and their regulatory roles in biological processes and disease. However, only a limited number of statistical approaches have been developed to provide formal quantitative analysis. Specifically, a few available methods do identify differentially methylated CpG
(DMC) sites or regions (DMR), but they suffer from limitations that arise mostly due to challenges inherent in bisulfite sequencing data. These challenges include: (1) that read-depths vary considerably among genomic positions and are often low; (2) both methylation and autocorrelation patterns change as regions change; and (3) CpG
sites are distributed unevenly. Furthermore, there are several methodological limitations: almost none of these tools is capable of comparing multiple groups and/or working with missing values, and only a few allow continuous or multiple covariates. The last of these is of great interest among researchers, as the goal is often to find which regions of the genome are associated with several exposures and traits. To tackle these issues, we have developed an efficient DMC identification method based on Hidden Markov Models (HMMs) called “DMCHMM” which is a three-step approach (model selection, prediction, testing) aiming to address the aforementioned drawbacks.
This package is an implementation the Quantitative Set Analysis for Gene Expression (QuSAGE
) method described in (Yaari G. et al, Nucl Acids Res, 2013). This is a novel Gene Set Enrichment-type test, which is designed to provide a faster, more accurate, and easier to understand test for gene expression studies. qusage accounts for inter-gene correlations using the Variance Inflation Factor technique proposed by Wu et al. (Nucleic Acids Res, 2012). In addition, rather than simply evaluating the deviation from a null hypothesis with a single number (a P value), qusage quantifies gene set activity with a complete probability density function (PDF). From this PDF, P values and confidence intervals can be easily extracted. Preserving the PDF also allows for post-hoc analysis (e.g., pair-wise comparisons of gene set activity) while maintaining statistical traceability. Finally, while qusage is compatible with individual gene statistics from existing methods (e.g., LIMMA), a Welch-based method is implemented that is shown to improve specificity. The QuSAGE
package also includes a mixed effects model implementation, as described in (Turner JA et al, BMC Bioinformatics, 2015), and a meta-analysis framework as described in (Meng H, et al. PLoS
Comput Biol. 2019). For questions, contact Chris Bolen (cbolen1@gmail.com) or Steven Kleinstein (steven.kleinstein@yale.edu).
Implementation of multisource exchangeability models for Bayesian analyses of prespecified subgroups arising in the context of basket trial design and monitoring. The R basket package facilitates implementation of the binary, symmetric multi-source exchangeability model (MEM) with posterior inference arising from both exact computation and Markov chain Monte Carlo sampling. Analysis output includes full posterior samples as well as posterior probabilities, highest posterior density (HPD) interval boundaries, effective sample sizes (ESS), mean and median estimations, posterior exchangeability probability matrices, and maximum a posteriori MEMs. In addition to providing "basketwise" analyses, the package includes similar calculations for "clusterwise" analyses for which subgroups are combined into meta-baskets, or clusters, using graphical clustering algorithms that treat the posterior exchangeability probabilities as edge weights. In addition plotting tools are provided to visualize basket and cluster densities as well as their exchangeability. References include Hyman, D.M., Puzanov, I., Subbiah, V., Faris, J.E., Chau, I., Blay, J.Y., Wolf, J., Raje, N.S., Diamond, E.L., Hollebecque, A. and Gervais, R (2015) <doi:10.1056/NEJMoa1502309>; Hobbs, B.P. and Landin, R. (2018) <doi:10.1002/sim.7893>; Hobbs, B.P., Kane, M.J., Hong, D.S. and Landin, R. (2018) <doi:10.1093/annonc/mdy457>; and Kaizer, A.M., Koopmeiners, J.S. and Hobbs, B.P. (2017) <doi:10.1093/biostatistics/kxx031>.
This package performs power and sample size calculation for non-proportional hazards model using the Fleming-Harrington family of weighted log-rank tests. The sequentially calculated log-rank test score statistics are assumed to have independent increments as characterized in Anastasios A. Tsiatis (1982) <doi:10.1080/01621459.1982.10477898>. The mean and variance of log-rank test score statistics are calculated based on Kaifeng Lu (2021) <doi:10.1002/pst.2069>. The boundary crossing probabilities are calculated using the recursive integration algorithm described in Christopher Jennison and Bruce W. Turnbull (2000, ISBN:0849303168). The package can also be used for continuous, binary, and count data. For continuous data, it can handle missing data through mixed-model for repeated measures (MMRM). In crossover designs, it can estimate direct treatment effects while accounting for carryover effects. For binary data, it can design Simon's 2-stage, modified toxicity probability-2 (mTPI-2
), and Bayesian optimal interval (BOIN) trials. For count data, it can design group sequential trials for negative binomial endpoints with censoring. Additionally, it facilitates group sequential equivalence trials for all supported data types. Moreover, it can design adaptive group sequential trials for changes in sample size, error spending function, number and spacing or future looks. Finally, it offers various options for adjusted p-values, including graphical and gatekeeping procedures.
Performance of functional kriging, cokriging, optimal sampling and simulation for spatial prediction of functional data. The framework of spatial prediction, optimal sampling and simulation are extended from scalar to functional data. SpatFD
is based on the Karhunen-Loève expansion that allows to represent the observed functions in terms of its empirical functional principal components. Based on this approach, the functional auto-covariances and cross-covariances required for spatial functional predictions and optimal sampling, are completely determined by the sum of the spatial auto-covariances and cross-covariances of the respective score components. The package provides new classes of data and functions for modeling spatial dependence structure among curves. The spatial prediction of curves at unsampled locations can be carried out using two types of predictors, and both of them report, the respective variances of the prediction error. In addition, there is a function for the determination of spatial locations sampling configuration that ensures minimum variance of spatial functional prediction. There are also two functions for plotting predicted curves at each location and mapping the surface at each time point, respectively. References Bohorquez, M., Giraldo, R., and Mateu, J. (2016) <doi:10.1007/s10260-015-0340-9>, Bohorquez, M., Giraldo, R., and Mateu, J. (2016) <doi:10.1007/s00477-016-1266-y>, Bohorquez M., Giraldo R. and Mateu J. (2021) <doi:10.1002/9781119387916>.
Encode/Decode base64', with support for JSON format, using two functions: j_encode()
and j_decode()
. Base64 is a group of similar binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation, used when there is a need to encode binary data that needs to be stored and transferred over media that are designed to deal with textual data, ensuring that the data will remain intact and without modification during transport. <https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding>
On the other side, JSON (JavaScript
Object Notation) is a lightweight data-interchange format. Easy to read, write, parse and generate. It is based on a subset of the JavaScript
Programming Language. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript
, Perl, Python, and many others. JSON structure is built around name:value pairs and ordered list of values. <https://www.json.org> The first function, j_encode()
, let you transform a data.frame or list to a base64 encoded JSON (or JSON string). The j_decode()
function takes a base64 string (could be an encoded JSON) and transform it to a data.frame (or list, depending of the JSON structure).
Miscellaneous printing of numeric or statistical results in R Markdown or Quarto documents according to guidelines of the "Publication Manual" of the American Psychological Association (2020, ISBN: 978-1-4338-3215-4). These guidelines are usually referred to as APA style (<https://apastyle.apa.org/>) and include specific rules on the formatting of numbers and statistical test results. APA style has to be implemented when submitting scientific reports in a wide range of research fields, especially in the social sciences. The default output of numbers in the R console or R Markdown and Quarto documents does not meet the APA style requirements, and reformatting results manually can be cumbersome and error-prone. This package covers the automatic conversion of R objects to textual representations that meet the APA style requirements, which can be included in R Markdown or Quarto documents. It covers some basic statistical tests (t-test, ANOVA, correlation, chi-squared test, Wilcoxon test) as well as some basic number printing manipulations (formatting p-values, removing leading zeros for numbers that cannot be greater than one, and others). Other packages exist for formatting numbers and tests according to the APA style guidelines, such as papaja (<https://cran.r-project.org/package=papaja>) and apa (<https://cran.r-project.org/package=apa>), but they do not offer all convenience functionality included in prmisc'. The vignette has an overview of most of the functions included in the package.
Computes a confidence interval for a specified linear combination of the regression parameters in a linear regression model with iid normal errors with known variance when there is uncertain prior information that a distinct specified linear combination of the regression parameters takes a given value. This confidence interval, found by numerical nonlinear constrained optimization, has the required minimum coverage and utilizes this uncertain prior information through desirable expected length properties. This confidence interval has the following three practical applications. Firstly, if the error variance has been accurately estimated from previous data then it may be treated as being effectively known. Secondly, for sufficiently large (dimension of the response vector) minus (dimension of regression parameter vector), greater than or equal to 30 (say), if we replace the assumed known value of the error variance by its usual estimator in the formula for the confidence interval then the resulting interval has, to a very good approximation, the same coverage probability and expected length properties as when the error variance is known. Thirdly, some more complicated models can be approximated by the linear regression model with error variance known when certain unknown parameters are replaced by estimates. This confidence interval is described in Mainzer, R. and Kabaila, P. (2019) <doi:10.32614/RJ-2019-026>, and is a member of the family of confidence intervals proposed by Kabaila, P. and Giri, K. (2009) <doi:10.1016/j.jspi.2009.03.018>.
This database contains necessary data relevant to medical costs on obesity throughout the United States. This database, in form of an R package, could output necessary data frames relevant to obesity costs, where the clients could easily manipulate the output using difference parameters, e.g. relative risks for each illnesses. This package contributes to parts of our published journal named "Modeling the Economic Cost of Obesity Risk and Its Relation to the Health Insurance Premium in the United States: A State Level Analysis". Please use the following citation for the journal: Woods Thomas, Tatjana Miljkovic (2022) "Modeling the Economic Cost of Obesity Risk and Its Relation to the Health Insurance Premium in the United States: A State Level Analysis" <doi:10.3390/risks10100197>. The database is composed of the following main tables: 1. Relative_Risks: (constant) Relative risks for a given disease group with a risk factor of obesity; 2. Disease_Cost: (obesity_cost_disease) Supplementary output with all variables related to individual disease groups in a given state and year; 3. Full_Cost: (obesity_cost_full) Complete output with all variables used to make cost calculations, as well as cost calculations in a given state and year; 4. National_Summary: (obesity_cost_national_summary) National summary cost calculations in a given year. Three functions are included to assist users in calling and adjusting the mentioned tables and they are data_load()
, data_produce()
, and rel_risk_fun()
.
Comprehensive R package designed to facilitate the calculation of Nitrogen Use Efficiency (NUE) indicators using experimentally derived data. The package incorporates 23 parameters categorized into six fertilizer-based, four plant-based, three soil-based, three isotope-based, two ecology-based, and four system-based indicators, providing a versatile platform for NUE assessment. As of the current version, NUETON serves as a starting point for users to compute NUE indicators from their experimental data. Future updates are planned to enhance the package's capabilities, including robust data visualization tools and error margin consideration in calculations. Additionally, statistical methods will be integrated to ensure the accuracy and reliability of the calculated indicators. All formulae used in NUETON are thoroughly referenced within the source code, and the package is released as open source software. Users are encouraged to provide feedback and contribute to the improvement of this package. It is important to note that the current version of NUETON is not intended for rigorous research purposes, and users are responsible for validating their results. The package developers do not assume liability for any inaccuracies in calculations. This package includes content from Congreves KA, Otchere O, Ferland D, Farzadfar S, Williams S and Arcand MM (2021) Nitrogen Use Efficiency Definitions of Today and Tomorrow. Front. Plant Sci. 12:637108. <doi:10.3389/fpls.2021.637108>. The article is available under the Creative Commons Attribution License (CC BY) C. 2021 Congreves, Otchere, Ferland, Farzadfar, Williams and Arcand.