This tool computes the probability of detection (POD) curve and the limit of detection (LOD), i.e. the number of copies of the target DNA sequence required to ensure a 95 % probability of detection (LOD95). Other quantiles of the LOD can be specified. This is a reimplementation of the mathematical-statistical modelling of the validation of qualitative polymerase chain reaction (PCR) methods within a single laboratory as provided by the commercial tool PROLab <http://quodata.de/>. The modelling itself has been described by Uhlig et al. (2015) <doi:10.1007/s00769-015-1112-9>.
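As an illustration of how a POD curve and an LOD quantile relate, here is a minimal sketch of the simplest case, an idealised single-hit Poisson model POD(x) = 1 - exp(-lam * x), where x is the number of target copies and lam a per-copy sensitivity. This model is an assumption for illustration; the model in Uhlig et al. (2015) may include further parameters for deviations from this ideal, and the function names here are hypothetical.

```python
import math


def pod(copies: float, lam: float) -> float:
    """Probability of detection under an assumed single-hit Poisson model."""
    return 1.0 - math.exp(-lam * copies)


def lod(p: float, lam: float) -> float:
    """Copies needed so that pod(copies, lam) == p; lod(0.95, lam) is the LOD95."""
    return -math.log(1.0 - p) / lam


# With a hypothetical ideal sensitivity lam = 1 (every copy is detectable),
# the LOD95 is about 3 copies.
print(round(lod(0.95, 1.0), 2))  # 3.0
```

Other quantiles follow by changing p, e.g. `lod(0.5, lam)` for the LOD50.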
The overall performance of soil ecosystem services and productivity relies heavily on soil health, which makes soil health a crucial indicator. Evaluating soil physical, chemical, and biological parameters is necessary to determine the overall soil quality index. This package implements three commonly used methods for calculating the soil quality index: linear scoring, regression-based, and principal component-based indexing. It has been developed using the concepts of Bastida et al. (2008) <doi:10.1016/j.geoderma.2008.08.007> and Doran and Parkin (1994) <doi:10.2136/sssaspecpub35.c1>.
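Linear scoring is commonly done by dividing each value by the maximum ("more is better") or dividing the minimum by each value ("less is better"), then combining indicator scores with weights. The sketch below follows that common convention; the indicator names, weights, and helper functions are hypothetical and not the package's API.

```python
def linear_score(values, more_is_better=True):
    """Score each observation in (0, 1] by linear scoring (values must be positive)."""
    if more_is_better:
        top = max(values)
        return [v / top for v in values]
    bottom = min(values)
    return [bottom / v for v in values]


def soil_quality_index(indicator_scores, weights):
    """Weighted additive index: SQI_j = sum_i w_i * S_ij for each sample j."""
    n_samples = len(next(iter(indicator_scores.values())))
    return [
        sum(weights[name] * scores[j] for name, scores in indicator_scores.items())
        for j in range(n_samples)
    ]


# Hypothetical indicators for two samples.
scores = {
    "organic_carbon": linear_score([0.5, 1.0], more_is_better=True),
    "bulk_density": linear_score([1.2, 1.5], more_is_better=False),
}
print(soil_quality_index(scores, {"organic_carbon": 0.5, "bulk_density": 0.5}))
```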
Package ACV (short for Affine Cross-Validation) offers an improved time-series cross-validation loss estimator that uses both in-sample and out-of-sample forecasting performance via a carefully constructed affine weighting scheme. Under the assumption of stationarity, the estimator is the best linear unbiased estimator of the out-of-sample loss. The package also offers improved versions of the Diebold-Mariano and Ibragimov-Muller tests of equal predictive ability, which are more powerful than their conventional counterparts. For more information, see the accompanying article Stanek (2021) <doi:10.2139/ssrn.3996166>.
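For reference, the conventional Diebold-Mariano statistic that the improved test is compared against can be sketched as follows. This version assumes one-step-ahead forecasts with serially uncorrelated loss differentials, so no HAC variance correction is applied; it is not the package's improved test.

```python
import math
import statistics


def diebold_mariano(loss_a, loss_b):
    """Conventional DM statistic for one-step-ahead forecasts (no HAC correction).

    Under the null of equal predictive ability it is approximately N(0, 1).
    """
    d = [a - b for a, b in zip(loss_a, loss_b)]
    n = len(d)
    mean_d = statistics.fmean(d)
    var_d = statistics.pvariance(d, mu=mean_d)
    return mean_d / math.sqrt(var_d / n)


# Positive values mean model A has the larger average loss.
print(diebold_mariano([2, 3, 2, 3], [1, 1, 1, 1]))
```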
Sequential Poisson sampling is a variation of Poisson sampling for drawing probability-proportional-to-size samples with a fixed number of units, and is commonly used for price-index surveys. This package provides functions to draw stratified sequential Poisson samples according to the method by Ohlsson (1998, ISSN:0282-423X), as well as other order-sampling designs by Rosén (1997, <doi:10.1016/S0378-3758(96)00186-3>), and to generate appropriate bootstrap replicate weights according to the generalized bootstrap method by Beaumont and Patak (2012, <doi:10.1111/j.1751-5823.2011.00166.x>).
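The core of sequential Poisson sampling is an ordering step: each unit gets a target inclusion probability proportional to its size, a uniform random number is divided by that probability, and the n units with the smallest ratios form the sample. A minimal single-stratum sketch, assuming there are no take-all units (all target probabilities below 1); the function name is hypothetical.

```python
import random


def sequential_poisson_sample(sizes, n, seed=None):
    """Return indices of n units drawn by sequential Poisson sampling.

    Assumes every target inclusion probability n * x_i / sum(x) is below 1,
    i.e. there are no take-all units.
    """
    rng = random.Random(seed)
    total = sum(sizes)
    pi = [n * x / total for x in sizes]
    if max(pi) >= 1:
        raise ValueError("take-all units must be handled separately")
    # Ordering key xi_i = u_i / pi_i; the n smallest keys form the sample.
    keys = [rng.random() / p for p in pi]
    return sorted(range(len(sizes)), key=keys.__getitem__)[:n]


print(sequential_poisson_sample([10, 20, 30, 40, 50, 60, 70, 80], 3, seed=42))
```

Larger units get smaller expected keys and are therefore more likely to be selected, while the sample size stays fixed at n.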
Facilities for constructing variance dispersion graphs, fraction-of-design-space plots and similar graphics for exploring the properties of experimental designs. The design region is explored via random sampling, which allows more flexibility than traditional variance dispersion graphs. A formula interface provides access to complex model formulae. Graphics can be constructed simultaneously for multiple experimental designs and/or multiple model formulae. Instead of using pointwise optimization to find the minimum and maximum scaled prediction variance curves, which can be inaccurate and time-consuming, this package uses quantile regression.
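The quantity behind these graphics is the scaled prediction variance SPV(x) = N f(x)ᵀ(XᵀX)⁻¹f(x). The numpy sketch below samples the design region at random and sorts the SPV values, which is what a fraction-of-design-space plot displays. The 2² factorial design and first-order model are illustrative assumptions; the sketch reports plain empirical quantiles rather than fitting the quantile regression the package uses.

```python
import numpy as np


def scaled_prediction_variance(design, points):
    """SPV(x) = N * f(x)^T (X^T X)^{-1} f(x) for a first-order model f(x) = (1, x)."""
    X = np.column_stack([np.ones(len(design)), design])
    xtx_inv = np.linalg.inv(X.T @ X)
    F = np.column_stack([np.ones(len(points)), points])
    return len(design) * np.einsum("ij,jk,ik->i", F, xtx_inv, F)


design = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)  # 2^2 factorial
rng = np.random.default_rng(1)
points = rng.uniform(-1, 1, size=(1000, 2))  # random sample of the design region
spv = np.sort(scaled_prediction_variance(design, points))
fraction = np.arange(1, len(spv) + 1) / len(spv)  # FDS plot: spv against fraction
print(np.quantile(spv, [0.05, 0.5, 0.95]))
```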
This is a set of minimization tools (maximum likelihood estimation and least-squares fitting) for solving the examples in Johan Gabrielsson and Dan Weiner's book "Pharmacokinetic and Pharmacodynamic Data Analysis - Concepts and Applications", 5th ed. (ISBN:9198299107). Examples include linear and nonlinear compartmental models, turnover models, single- or multiple-dose bolus/infusion/oral models, allometry, toxicokinetics, reversible metabolism, in vitro/in vivo extrapolation, enterohepatic circulation, metabolite modeling, Emax model, inhibitory model, tolerance model, oscillating response model, enantiomer interaction model, effect compartment model, drug-drug interaction model, receptor occupancy model, and rebound phenomena model.
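As a small example of the least-squares side, the one-compartment intravenous bolus model C(t) = (Dose/V) exp(-k t) can be fitted by linear regression on log concentrations. This log-scale fit assumes multiplicative errors and is a simplification; the package's examples may use nonlinear least squares or maximum likelihood instead. The function name is hypothetical.

```python
import numpy as np


def fit_iv_bolus(times, conc, dose):
    """Fit C(t) = (dose / V) * exp(-k * t) by least squares on log concentrations."""
    slope, intercept = np.polyfit(times, np.log(conc), 1)
    k = -slope                          # elimination rate constant
    volume = dose / np.exp(intercept)   # C(0) = dose / V
    return k, volume


times = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
conc = 100.0 / 10.0 * np.exp(-0.3 * times)  # noise-free data: V = 10, k = 0.3
k, volume = fit_iv_bolus(times, conc, dose=100.0)
print(k, volume)
```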
This package provides an interface to a large number of classification and regression techniques, together with machine-readable descriptions of their parameters. There is also an experimental extension for survival analysis, clustering, and general example-specific cost-sensitive learning. Also included:
Generic resampling, including cross-validation, bootstrapping and subsampling;
Hyperparameter tuning with modern optimization techniques, for single- and multi-objective problems;
Filter and wrapper methods for feature selection;
Extension of basic learners with additional operations common in machine learning, also allowing for easy nested resampling.
Most operations can be parallelized.
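Generic resampling such as k-fold cross-validation reduces to splitting indices into disjoint folds and averaging a held-out loss. A plain-Python sketch with user-supplied fit/predict/loss callables; all names are hypothetical and this is not the package's interface.

```python
import random


def k_fold_indices(n, k, seed=None):
    """Split range(n) into k disjoint test folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(x, y, fit, predict, loss, k=5, seed=0):
    """Average test loss over k folds; fit/predict/loss are user-supplied callables."""
    scores = []
    for test in k_fold_indices(len(x), k, seed):
        held_out = set(test)
        train = [i for i in range(len(x)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        preds = [predict(model, x[i]) for i in test]
        scores.append(loss([y[i] for i in test], preds))
    return sum(scores) / len(scores)


# Usage with a trivial "predict the training mean" learner.
mean_fit = lambda xs, ys: sum(ys) / len(ys)
mean_predict = lambda model, x: model
mse = lambda ys, ps: sum((a - b) ** 2 for a, b in zip(ys, ps)) / len(ys)
print(cross_validate(list(range(10)), [3.0] * 10, mean_fit, mean_predict, mse))
```

Because each fold is evaluated independently, the loop over folds is a natural place to parallelize.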
This data management package provides helper classes for publicly available demographic data sources (HMD, DESTATIS). Following ideas developed in the Bioconductor project <https://bioconductor.org>, we strive to encapsulate data in easy-to-use S4 objects. If the original data are provided in a text file, the resulting S4 object contains all the information from that file, but in structured form (header, footer, etc.). The classes also provide methods to subset the data by selected calendar years or regions. The resulting subset objects still contain the original header and footer information.
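The design idea, a data object that keeps the raw file's header and footer alongside the parsed rows and carries them through subsetting, can be sketched as a Python dataclass standing in for an S4 class. The class name, field names, and row layout are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DemographicTable:
    """Python stand-in for an S4 data object: parsed rows plus the file's header/footer."""
    header: list
    footer: list
    rows: list = field(default_factory=list)  # each row: dict with "year", "region", ...

    def subset(self, years=None, regions=None):
        """Keep selected calendar years and/or regions; header and footer are carried over."""
        kept = [
            r for r in self.rows
            if (years is None or r["year"] in years)
            and (regions is None or r["region"] in regions)
        ]
        return DemographicTable(self.header, self.footer, kept)


table = DemographicTable(
    header=["Country: X"],
    footer=["Data source note"],
    rows=[{"year": 2000, "region": "A", "value": 1},
          {"year": 2001, "region": "B", "value": 2}],
)
print(table.subset(years={2001}))
```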
It fits a robust linear quantile regression model using a new family of zero-quantile distributions for the error term. Missing values and censored observations can be handled as well. This family of distributions includes skewed versions of the Normal, Student's t, Laplace, Slash and Contaminated Normal distributions. It also performs logistic quantile regression for bounded responses, as shown in Galarza et al. (2020) <doi:10.1007/s13571-020-00231-0>. It provides estimates and full inference, as well as envelope plots for assessing the fit and confidence bands when several quantiles are fitted simultaneously.
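The link between quantile regression and a zero-quantile error distribution is easiest to see in the skewed Laplace case: maximizing its likelihood is equivalent to minimizing the check loss ρ_τ(u) = u(τ - 1[u < 0]). The sketch below only shows that loss and an intercept-only fit, where the minimizer is a sample τ-quantile; it is not the package's estimation method for the other distributions.

```python
def check_loss(u, tau):
    """Quantile (check) loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def constant_quantile_fit(y, tau):
    """Minimise sum_i rho_tau(y_i - c) over c; a minimiser exists among the data points."""
    return min(y, key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


print(constant_quantile_fit([1, 2, 3, 4, 5, 6, 7, 8, 9], 0.5))
```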
Flexible and informed regression with Multiple Change Points. mcp can infer change points in means, variances, autocorrelation structure, and any combination of these, as well as the parameters of the segments in between. All parameters are estimated with uncertainty, and prediction intervals are supported, including near the change points. mcp supports hypothesis testing via Savage-Dickey density ratios, posterior contrasts, and cross-validation. mcp is described in Lindeløv (submitted) <doi:10.31219/osf.io/fzqxv> and generalizes the approach described in Carlin, Gelfand, & Smith (1992) <doi:10.2307/2347570> and Stephens (1994) <doi:10.2307/2986119>.
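To build intuition for what a change point in the mean is, the sketch below locates a single change by scanning all split points and minimizing the within-segment sum of squares. This is a swapped-in frequentist point estimate, not mcp's Bayesian approach, and it gives no uncertainty about the change point location.

```python
def single_change_point(y, min_size=2):
    """Least-squares location of one change in mean; returns where the 2nd segment starts."""
    def sse(seg):
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)

    candidates = range(min_size, len(y) - min_size + 1)
    return min(candidates, key=lambda t: sse(y[:t]) + sse(y[t:]))


print(single_change_point([0, 0, 0, 0, 5, 5, 5, 5]))  # 4
```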
This package provides a number of functions to facilitate extracting information in YAML fragments from one or multiple files, optionally structuring the information in a 'data.tree'. YAML (recursive acronym for "YAML ain't Markup Language") is a convention for specifying structured data in a format that is both machine- and human-readable. YAML therefore lends itself well to embedding (meta)data in plain text files, such as Markdown files. This principle is implemented in yum with minimal dependencies (i.e. only the yaml package; the data.tree package can be used to enable additional functionality).
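Extracting YAML fragments from a plain text file amounts to finding blocks between delimiter lines. The sketch assumes fragments are delimited by lines consisting of `---`; the delimiter and function name are assumptions, not yum's documented defaults. The returned strings could then be parsed with any YAML parser.

```python
import re

# A fragment: a line of '---', then any text (non-greedy), then another '---' line.
FRAGMENT = re.compile(r"^---[ \t]*\n(.*?)\n---[ \t]*$", re.MULTILINE | re.DOTALL)


def extract_yaml_fragments(text):
    """Return the raw text of every block delimited by lines consisting of '---'."""
    return [m.group(1) for m in FRAGMENT.finditer(text)]


doc = "Intro\n---\ncode: A\nlabel: x\n---\nMore text\n---\ncode: B\n---\n"
print(extract_yaml_fragments(doc))
```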
This package provides tools to fit Rasch models (RM), linear logistic test models (LLTM), rating scale models (RSM), linear rating scale models (LRSM), partial credit models (PCM), and linear partial credit models (LPCM). Missing values are allowed in the data matrix. Additional features are ML estimation of the person parameters, Andersen's LR test, item-specific Wald tests, the Martin-Loef test, nonparametric Monte Carlo tests, itemfit and personfit statistics including infit and outfit measures, ICC and other plots, automated stepwise item elimination, and a simulation module for various binary data matrices.
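The dichotomous Rasch model gives P(X = 1) = exp(θ - β) / (1 + exp(θ - β)) for a person with ability θ and an item with difficulty β. Given known item difficulties, the ML person parameter solves Σᵢ Pᵢ(θ) = r for raw score r, which Newton-Raphson handles quickly. A minimal sketch with hypothetical function names, not the package's implementation.

```python
import math


def rasch_prob(theta, beta):
    """P(X = 1) for a person with ability theta on an item with difficulty beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def person_ml(raw_score, betas, tol=1e-10, max_iter=100):
    """ML ability estimate given known item difficulties (Newton-Raphson)."""
    if not 0 < raw_score < len(betas):
        raise ValueError("ML estimate is infinite for zero or perfect scores")
    theta = 0.0
    for _ in range(max_iter):
        p = [rasch_prob(theta, b) for b in betas]
        gradient = raw_score - sum(p)                   # d logL / d theta
        information = sum(pi * (1 - pi) for pi in p)    # -d^2 logL / d theta^2
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


print(person_ml(2, [-1.0, 0.0, 1.0]))
```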
Draw posterior samples to estimate the precision matrix for multivariate Gaussian data. The posterior mean of the samples is the graphical horseshoe estimate of Li, Bhadra and Craig (2017) <arXiv:1707.06661>. The function uses the matrix decomposition and variable change from the Bayesian graphical lasso by Wang (2012) <doi:10.1214/12-BA729>, and the variable augmentation for sampling under the horseshoe prior by Makalic and Schmidt (2016) <arXiv:1508.03884>. The structure of the graphical horseshoe function was inspired by the Bayesian graphical lasso function using blocked sampling, authored by Wang (2012) <doi:10.1214/12-BA729>.
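The variable augmentation of Makalic and Schmidt represents a half-Cauchy local scale through two inverse-gamma layers, which keeps the Gibbs updates conjugate. The sketch below only checks the marginal: drawing through the augmentation should reproduce a half-Cauchy, whose median equals its scale. It is an illustration of the augmentation, not the package's sampler.

```python
import numpy as np


def half_cauchy_via_inverse_gamma(size, scale=1.0, rng=None):
    """Draw lambda ~ C+(0, scale) through the inverse-gamma augmentation
    nu ~ IG(1/2, 1/scale^2), lambda^2 | nu ~ IG(1/2, 1/nu)."""
    rng = np.random.default_rng(rng)
    # If G ~ Gamma(shape=a, rate=b) then 1/G ~ InvGamma(a, b); numpy uses scale = 1/rate.
    nu = 1.0 / rng.gamma(0.5, scale ** 2, size)
    lam_sq = 1.0 / rng.gamma(0.5, nu)
    return np.sqrt(lam_sq)


draws = half_cauchy_via_inverse_gamma(200_000, rng=0)
print(np.median(draws))  # should be close to 1, the median of C+(0, 1)
```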
This is a complete suite for estimating models based on moment conditions. It includes the two-step generalized method of moments (GMM; Hansen 1982; <doi:10.2307/1912775>), the iterated GMM and the continuously updated estimator (Hansen, Eaton and Yaron 1996; <doi:10.2307/1392442>), and several methods that belong to the Generalized Empirical Likelihood family of estimators (Smith 1997; <doi:10.1111/j.0013-0133.1997.174.x>, Kitamura 1997; <doi:10.1214/aos/1069362388>, Newey and Smith 2004; <doi:10.1111/j.1468-0262.2004.00482.x>, and Anatolyev 2005 <doi:10.1111/j.1468-0262.2005.00601.x>).
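For the linear instrumental-variables case, two-step GMM solves E[Z'(y - Xb)] = 0: a first step with the 2SLS weighting matrix, then a second step reweighted by the estimated covariance of the moments. The numpy sketch below, with simulated data and a hypothetical function name, covers only this linear two-step case, not the iterated GMM, CUE, or GEL estimators.

```python
import numpy as np


def linear_gmm_two_step(y, X, Z):
    """Two-step efficient GMM for E[Z'(y - X b)] = 0 (heteroskedasticity-robust weight)."""
    n = len(y)

    def solve(W):
        XZ = X.T @ Z
        return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

    b1 = solve(np.linalg.inv(Z.T @ Z / n))   # step 1: 2SLS weighting
    u = y - X @ b1
    S = (Z * u[:, None] ** 2).T @ Z / n      # (1/n) sum u_i^2 z_i z_i'
    return solve(np.linalg.inv(S))           # step 2: optimal weighting


# Simulated data: x is endogenous (correlated with u), z has two valid instruments.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=(n, 2))
v = rng.normal(size=n)
x = z @ np.array([1.0, 0.5]) + v
u = 0.5 * v + rng.normal(size=n)
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
b = linear_gmm_two_step(y, X, Z)
print(b)  # close to the true coefficients (1, 2)
```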
This package provides a novel decision tree algorithm in the hypothesis testing framework. The algorithm examines the distributional difference between the two child nodes over all possible binary partitions. The test statistic is equivalent to the generalized energy distance, which makes the algorithm more powerful at detecting complex structure, not only differences in means. It is applicable to numeric, nominal and ordinal explanatory variables, and to responses in a general metric space of strong negative type. The algorithm outperforms other tree models in terms of type I error, power, prediction accuracy, and complexity.
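For real-valued responses, the energy distance between two samples is 2E|X - Y| - E|X - X'| - E|Y - Y'|, and it is zero only when the distributions match. The sketch computes its sample (V-statistic) form for 1-D samples; the package works in general metric spaces of strong negative type, which this sketch does not cover. The usage line shows why it detects more than mean differences: the two samples have equal means but different spreads.

```python
import numpy as np


def energy_distance(x, y):
    """Sample energy distance 2E|X-Y| - E|X-X'| - E|Y-Y'| for 1-D samples (V-statistic form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    between = np.abs(x[:, None] - y[None, :]).mean()
    within_x = np.abs(x[:, None] - x[None, :]).mean()
    within_y = np.abs(y[:, None] - y[None, :]).mean()
    return 2 * between - within_x - within_y


# Same mean (0), different spread: the energy distance is still positive.
print(energy_distance([-1.0, 1.0], [0.0, 0.0]))  # 1.0
```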
Based on the aggregated shares held by individual firms or actors within a market or space, the Herfindahl-Hirschman Index (HHI) measures the level of concentration in that space. This package allows intuitive and straightforward computation of HHI scores: the objects of interest are passed directly to the function. The package also includes a plot function for quick visual display of an HHI time series using any measure of time (year, quarter, month, etc.). For usage, please cite the Journal of Open Source Software paper associated with the package: Waggoner, Philip D. (2018) <doi:10.21105/joss.00828>.
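The HHI is the sum of squared market shares. With shares in percent it ranges from near 0 (many small actors) to 10,000 (a single actor). A minimal sketch using the percent convention, which is an assumption; some sources use fractional shares and a 0-1 scale instead.

```python
def hhi(values):
    """HHI on the 0-10,000 scale from raw sizes (sales, seats, ...) of each actor."""
    total = sum(values)
    return sum((100.0 * v / total) ** 2 for v in values)


# An HHI time series is just the index computed per period.
series = {2019: [50, 50], 2020: [70, 20, 10]}
print({year: hhi(v) for year, v in series.items()})
```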
This package provides functions to identify plausible and replicable factor structures for a set of variables via k-fold cross-validation. The process combines the exploratory and confirmatory factor analytic approach to scale development (Flora & Flake, 2017) <doi:10.1037/cbs0000069> with a cross-validation technique that makes maximal use of the available data (Hastie, Tibshirani, & Friedman, 2009) <isbn:978-0-387-21606-5>. Also available are functions to determine k by drawing on power analytic techniques for covariance structures (MacCallum, Browne, & Sugawara, 1996) <doi:10.1037/1082-989X.1.2.130>, to generate model syntax, and to summarize results in a report.
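The MacCallum, Browne, and Sugawara approach computes power for RMSEA-based tests from noncentral chi-square distributions with noncentrality (N - 1) · df · ε². The sketch shows the test of close fit (H0: ε = 0.05 vs. Ha: ε = 0.08) using SciPy. The `largest_k` rule, which picks the largest k whose held-out fold still reaches a target power, is a hypothetical illustration of how such power values could inform k, not the package's actual rule.

```python
from scipy.stats import ncx2


def rmsea_power(n, df, rmsea0=0.05, rmsea_a=0.08, alpha=0.05):
    """Power of the test of close fit (H0: RMSEA = rmsea0 vs Ha: RMSEA = rmsea_a)."""
    nc0 = (n - 1) * df * rmsea0 ** 2
    nc_a = (n - 1) * df * rmsea_a ** 2
    critical = ncx2.ppf(1 - alpha, df, nc0)
    return ncx2.sf(critical, df, nc_a)


def largest_k(n_total, df, target=0.8, k_max=10):
    """Hypothetical rule: largest k whose held-out fold (n_total // k cases) keeps power >= target."""
    ok = [k for k in range(2, k_max + 1) if rmsea_power(n_total // k, df) >= target]
    return max(ok) if ok else None


print(rmsea_power(200, 40), largest_k(1000, 40))
```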
The Structural Topic and Sentiment-Discourse (STS) model allows researchers to estimate topic models with document-level metadata that determines both topic prevalence and sentiment-discourse. The sentiment-discourse is modeled as a document-level latent variable for each topic that modulates the word frequency within a topic. These latent topic sentiment-discourse variables are controlled by the document-level metadata. The STS model can be useful for regression analysis with text data in addition to topic modeling's traditional use of descriptive analysis. The method was developed in Chen and Mankad (2024) <doi:10.1287/mnsc.2022.00261>.
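To make "a latent variable that modulates the word frequency within a topic" concrete, here is a hypothetical log-linear parameterization: a topic's word distribution is a softmax of baseline and topic-specific log-frequencies plus a sentiment-discourse value times a per-word sentiment deviation. This is an assumption for illustration only and may differ from the exact STS specification in Chen and Mankad (2024); all names are hypothetical.

```python
import numpy as np


def topic_word_distribution(baseline, topic_dev, sentiment_dev, sentiment):
    """Hypothetical log-linear topic-word distribution in which a document-level
    sentiment-discourse value shifts word frequencies within one topic."""
    logits = baseline + topic_dev + sentiment * sentiment_dev
    weights = np.exp(logits - logits.max())  # subtract max for numerical stability
    return weights / weights.sum()


# Vocabulary: ["good", "bad", "service"]; positive sentiment favours "good".
baseline = np.zeros(3)
topic_dev = np.zeros(3)
sentiment_dev = np.array([1.0, -1.0, 0.0])
print(topic_word_distribution(baseline, topic_dev, sentiment_dev, sentiment=2.0))
```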
This package provides a collection of integrated tools designed to interact seamlessly with each other for the analysis of biogenic silica (bSi) in inland and marine sediments. These tools share common data representations and follow a consistent API design. The primary goal of the bSi package is to simplify the installation process, facilitate data loading, and enable the analysis of multiple samples for biogenic silica fluxes. The package is designed to make the entire bSi analytic workflow more efficient and coherent, from data loading to model construction and visualization, tailored towards reconstructing productivity in aquatic ecosystems.
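A common way to turn a bSi concentration into a flux is the mass accumulation rate: concentration × dry bulk density × sedimentation rate. The sketch assumes that convention and these units (mg/g × g/cm³ × cm/yr = mg cm⁻² yr⁻¹); whether the package computes fluxes exactly this way is an assumption, and the function name is hypothetical.

```python
def bsi_flux(bsi_mg_per_g, dry_bulk_density_g_cm3, sed_rate_cm_yr):
    """Biogenic silica flux in mg cm^-2 yr^-1 (concentration x dry bulk density x sedimentation rate)."""
    return bsi_mg_per_g * dry_bulk_density_g_cm3 * sed_rate_cm_yr


# Several samples processed together: (bSi, dry bulk density, sedimentation rate).
samples = [(10.0, 0.5, 0.2), (25.0, 0.8, 0.1)]
print([bsi_flux(*s) for s in samples])
```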
This package produces statistical indicators of the impact of migration on the socio-demographic composition of an area. Three measures can be used: ratios, percentages and the Duncan index of dissimilarity. The input data files are assumed to be in an origin-destination matrix format, with each cell representing a flow count between an origin and a destination area. Columns are expected to represent origins, and rows are expected to represent destinations. The first row and column are assumed to contain labels for each area. See Rodriguez-Vignoli and Rowe (2018) <doi:10.1080/00324728.2017.1416155> for technical details.
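The Duncan index of dissimilarity compares how two groups are distributed across areas: D = ½ Σᵢ |aᵢ/A - bᵢ/B|, from 0 (identical distributions) to 1 (complete separation). The sketch also shows reading an origin-destination matrix with rows as destinations and columns as origins, as the description specifies. The matrix omits the label row and column, and comparing arrivals with departures is a hypothetical choice of groups, not necessarily what the package compares.

```python
def duncan_dissimilarity(group_a, group_b):
    """D = 0.5 * sum_i |a_i / A - b_i / B| across areas i."""
    total_a, total_b = sum(group_a), sum(group_b)
    return 0.5 * sum(abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b))


# Origin-destination counts without labels: rows = destinations, columns = origins.
od = [[0, 5, 1],
      [3, 0, 2],
      [1, 4, 0]]
inflows = [sum(row) for row in od]          # arrivals into each destination
outflows = [sum(col) for col in zip(*od)]   # departures from each origin
print(duncan_dissimilarity(inflows, outflows))
```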
The aim of the package is twofold: (i) to implement the MMD method for attributing individuals to sources using the Hamming distance between multilocus genotypes, and (ii) to select informative genetic markers based on information theory concepts (entropy, mutual information and redundancy). The package implements the functions introduced by Perez-Reche, F. J., Rotariu, O., Lopes, B. S., Forbes, K. J. and Strachan, N. J. C. Mining whole genome sequence data to efficiently attribute individuals to source populations. Scientific Reports 10, 12124 (2020) <doi:10.1038/s41598-020-68740-6>. See more details and examples in the README file.
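The two ingredients can be sketched directly: the Hamming distance counts loci at which two multilocus genotypes differ, and Shannon entropy measures how informative a marker's allele distribution is. The attribution rule below, assigning a query to the source(s) holding its closest genotype, is a simplified reading of minimal multilocus distance attribution; tie handling and any probabilistic weighting in the published method may differ. All names are hypothetical.

```python
import math
from collections import Counter


def hamming(g1, g2):
    """Number of loci at which two multilocus genotypes (equal-length allele sequences) differ."""
    return sum(a != b for a, b in zip(g1, g2))


def attribute(query, sources):
    """Assign a query to the source(s) holding its closest genotype (minimum Hamming distance)."""
    best = {name: min(hamming(query, g) for g in genotypes) for name, genotypes in sources.items()}
    d_min = min(best.values())
    return sorted(name for name, d in best.items() if d == d_min), d_min


def entropy(alleles):
    """Shannon entropy (bits) of the allele distribution at one locus."""
    counts = Counter(alleles)
    n = len(alleles)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


sources = {"chicken": [[1, 2, 3], [1, 2, 4]], "cattle": [[5, 6, 7]]}
print(attribute([1, 2, 5], sources))
```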
This package provides methods for decomposing seasonal data: STR (a Seasonal-Trend time series decomposition procedure based on Regression) and Robust STR. In some ways, STR is similar to ridge regression, and Robust STR can be related to the LASSO. They allow for multiple seasonal components and multiple linear covariates with constant, flexible and seasonal influence. Seasonal patterns (for both seasonal components and seasonal covariates) can be fractional and flexible over time; moreover, they can be either strictly periodic or have a more complex topology. The methods provide confidence intervals for the estimated components and can also be used for forecasting.
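The ridge-regression analogy can be seen in a stripped-down case: estimating a trend alone by penalized least squares with a squared second-difference penalty, which has the closed form t = (I + λD₂ᵀD₂)⁻¹y. This simplified trend-only smoother is an assumption for illustration; STR additionally estimates seasonal components and covariates, and Robust STR replaces squared terms with absolute values, which gives the LASSO connection.

```python
import numpy as np


def penalised_trend(y, lam):
    """Trend t minimising ||y - t||^2 + lam * ||D2 t||^2 (D2 = second differences): a ridge-type fit."""
    n = len(y)
    d2 = np.diff(np.eye(n), n=2, axis=0)  # (n - 2) x n second-difference matrix
    return np.linalg.solve(np.eye(n) + lam * d2.T @ d2, np.asarray(y, dtype=float))


rng = np.random.default_rng(0)
y = np.linspace(0, 5, 50) + rng.normal(scale=0.5, size=50)
trend = penalised_trend(y, lam=100.0)
print(trend[:5])
```

A straight line has zero second differences, so it passes through the smoother unchanged; noise is shrunk away, just as ridge regression shrinks coefficients.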
An implementation of the RuleFit algorithm as described in Friedman & Popescu (2008) <doi:10.1214/07-AOAS148>. eXtreme Gradient Boosting ('XGBoost') is used to build rules, and glmnet is used to fit a sparse linear model on the raw and rule features. The result is a model that learns similarly to a tree ensemble, while often being more interpretable and faster to score in live applications. Several algorithms for reducing rule complexity are provided, most notably hyperrectangle de-overlapping. All algorithms scale to several million rows and support sparse representations to handle tens of thousands of dimensions.
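In RuleFit, each rule is a conjunction of threshold conditions along a tree path (a hyperrectangle in feature space), and it becomes a 0/1 feature that enters the sparse linear model alongside the raw features. The sketch shows only the rule-to-feature step with hypothetical rules; extracting rules from boosted trees, the sparse fit, and de-overlapping are not shown.

```python
import numpy as np

OPS = {"<": np.less, "<=": np.less_equal, ">": np.greater, ">=": np.greater_equal}


def rule_features(X, rules):
    """Turn rules (conjunctions of (column, op, threshold) conditions) into 0/1 feature columns."""
    X = np.asarray(X, dtype=float)
    cols = []
    for rule in rules:
        mask = np.ones(len(X), dtype=bool)
        for col, op, threshold in rule:
            mask &= OPS[op](X[:, col], threshold)
        cols.append(mask.astype(float))
    return np.column_stack(cols)


X = [[1, 5], [3, 2], [4, 7]]
rules = [[(0, ">", 2)], [(0, ">", 2), (1, "<=", 5)]]
design = np.hstack([np.asarray(X, dtype=float), rule_features(X, rules)])  # raw + rule features
print(design)
```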
This package implements Meng's data defect index (ddi), which represents the degree of sample bias relative to an iid sample. The data defect correlation (ddc) is the correlation between the outcome of interest and selection into the sample; when sample selection is independent of the outcome across the population, the ddc is zero. Details are in Meng (2018) <doi:10.1214/18-AOAS1161SF>, "Statistical Paradises and Paradoxes in Big Data (I): Law of Large Populations, Big Data Paradox, and the 2016 US Presidential Election." Survey estimates from the Cooperative Congressional Election Study (CCES) are included to replicate the article's results.
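Meng's identity decomposes the error of a sample mean as ddc × √((1 - f)/f) × σ_Y, where f = n/N is the sampling fraction and σ_Y the population standard deviation. The sketch computes the realized ddc for one selection and checks the identity numerically. Note that Meng's ddi is defined as the expectation of ddc² over the selection mechanism; squaring the realized ddc, as done here, is a simplification. The function name is hypothetical.

```python
import numpy as np


def data_defect(y, r):
    """Realized ddc, ddc^2, and both sides of Meng's error identity.

    y: outcome for every unit in the finite population; r: 1 if the unit is in the sample.
    """
    y = np.asarray(y, dtype=float)
    r = np.asarray(r, dtype=float)
    f = r.mean()                                      # sampling fraction n / N
    ddc = np.corrcoef(r, y)[0, 1]
    error = y[r == 1].mean() - y.mean()               # actual error of the sample mean
    identity = ddc * np.sqrt((1 - f) / f) * y.std()   # data quality x quantity x difficulty
    return ddc, ddc ** 2, error, identity


ddc, ddi_realized, error, identity = data_defect([1, 2, 3, 4], [0, 0, 1, 1])
print(ddc, error, identity)
```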