This package provides functions for the design process of survey sampling, with specific tools for multi-wave and multi-phase designs. Perform optimum allocation using Neyman (1934) <doi:10.2307/2342192> or Wright (2012) <doi:10.1080/00031305.2012.733679> allocation, split strata based on quantiles or values of known variables, randomly select samples from strata, allocate sampling waves iteratively, and organize a complex survey design. Also includes a Shiny application for observing the effects of different strata splits. A paper on this package was published in the Journal of Statistical Software <doi:10.18637/jss.v114.i10>.
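The Neyman allocation mentioned above can be illustrated with a minimal sketch. It assumes the textbook rule that stratum h receives a share of the total sample proportional to N_h * S_h (stratum size times stratum standard deviation). The function name and arguments are hypothetical and are not this package's interface; the result is fractional, and exact integer allocation (the concern of Wright's method) is not attempted here.

```python
def neyman_allocation(total_n, stratum_sizes, stratum_sds):
    """Split total_n across strata in proportion to N_h * S_h.

    Hypothetical helper for illustration only; returns fractional
    sample sizes and does no rounding.
    """
    weights = [n_h * s_h for n_h, s_h in zip(stratum_sizes, stratum_sds)]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one stratum needs a positive size and SD")
    return [total_n * w / total_weight for w in weights]


# Two strata with equal variability: the larger stratum gets more samples.
print(neyman_allocation(30, [100, 200], [1.0, 1.0]))
# Equal sizes, unequal variability: the noisier stratum gets more samples.
print(neyman_allocation(40, [100, 100], [1.0, 3.0]))
```

Because the allocation depends on the product of size and variability, a small but highly variable stratum can receive more samples than a large homogeneous one.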
This package provides a set of concise and efficient tools for statistical production. It can also be used for data management. In statistical production, you deal with complex data and need to control your process at each step of your work. Concise functions are very helpful, because you do not hesitate to use them. The following functions are included in the package: dup() checks for duplicates; miss() checks for missing values; tac() computes a contingency table of all columns; toc() compares two tables, spotting significant deviations; and chi2_find(), a more complex function, compares columns within a data.frame, spotting related categories.
This package provides tools for exploratory process data analysis. Process data refers to the data describing participants' problem-solving processes in computer-based assessments. It is often recorded in computer log files. This package provides functions to read, process, and write process data. It also implements two feature extraction methods to compress the information stored in process data into standard numerical vectors. This package also provides recurrent-neural-network-based models that relate response processes to other binary or scale variables of interest. The functions that involve training and evaluating neural networks are wrappers of functions in keras.
This package takes spatial single-cell-type RNA-seq data (specifically designed for Slide-seq v2), calls copy number alterations (CNAs) using pseudo-spatial binning, clusters cellular units (e.g. beads) based on CNA profile, and visualizes spatial CNA patterns. Documentation about SlideCNA is included in the preprint by Zhang et al. (2022, <doi:10.1101/2022.11.25.517982>). The package enrichR (>= 3.0), conditionally used to annotate SlideCNA-determined clusters with gene ontology terms, can be installed from <https://github.com/wjawaid/enrichR> or with install_github("wjawaid/enrichR").
MiDAS is an R package for immunogenetics data transformation and statistical analysis. MiDAS accepts input data in the form of HLA alleles and KIR types, and can transform it into biologically meaningful variables, enabling HLA amino acid fine mapping and analyses of HLA evolutionary divergence, KIR gene presence, and validated HLA-KIR interactions. Further, it allows comprehensive statistical association analysis workflows with phenotypes of diverse measurement scales. MiDAS closes a gap between the inference of immunogenetic variation and its efficient utilization to make relevant discoveries related to T cell, Natural Killer cell, and disease biology.
Estimation of functional spaces based on traits of organisms. The package includes functions to impute missing trait values (with or without considering phylogenetic information), and to create, represent and analyse two-dimensional functional spaces based on principal components analysis, other ordination methods, or raw traits. It also allows for mapping a third variable onto the functional space. See Carmona et al. (2021) <doi:10.1038/s41586-021-03871-y>, Puglielli et al. (2021) <doi:10.1111/nph.16952>, Carmona et al. (2021) <doi:10.1126/sciadv.abf2675>, and Carmona et al. (2019) <doi:10.1002/ecy.2876> for more information.
This package provides a higher-level interface to the torch package for defining, training, and fine-tuning neural networks, including their depth, powered by code generation. It supports several architectures, including feedforward (multi-layer perceptron) and recurrent neural networks (Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU)), while reducing boilerplate torch code and enabling seamless integration with torch. The model-training methods in this package also bridge to major ML frameworks in R, namely the tidymodels ecosystem, which enables the use of parsnip model specifications, workflows, recipes, and tuning tools.
Analysis of musical scales (& modes, grooves, etc.) in the vein of Sherrill (2025) <doi:10.1215/00222909-11595194>. The initials MCT in the package title refer to the article's title: "Modal Color Theory." Offers support for conventional musical pitch class set theory as developed by Forte (1973, ISBN: 9780300016109) and David Lewin (1987, ISBN: 9780300034936), as well as for the continuous geometries of Callender, Quinn, & Tymoczko (2008) <doi:10.1126/science.1153021>. Identifies structural properties of scales and calculates derived values (sign vector, color number, brightness ratio, etc.). Creates plots such as "brightness graphs" which visualize these properties.
This package provides a series of numerical methods for extracting parameters of distributions for risks based on knowing the expected value and c-statistic (e.g., from a published report on the performance of a risk prediction model). This package implements the methodology described in Sadatsafavi et al. (2024) <doi:10.48550/arXiv.2409.09178>. The core of the package is mcmap(), which takes a (mean, c-statistic) pair and the distribution type requested. This function provides a generic interface to more customized functions (mcmap_beta(), mcmap_logitnorm(), mcmap_probitnorm()) for specific distributions.
Simulate genotypes in a SNP (single nucleotide polymorphism) matrix as random numbers from a uniform distribution, for diploid organisms (coded by 0, 1, 2), following Sikorska et al. (2013) <doi:10.1186/1471-2105-14-166>, or a half-sib/full-sib SNP matrix from real or simulated parents' SNP data, assuming Mendelian segregation. Simulate phenotypic traits for real or simulated SNP data, controlled by a specific number of quantitative trait loci and their effects, sampled from a normal or a uniform distribution, assuming a purely additive model. This is useful for testing association and genomic prediction models or for educational purposes.
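The simulation idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not the package's code: the function names, the dictionary of QTL effects, and the `noise_sd` residual term are all hypothetical. Genotypes are drawn uniformly from {0, 1, 2}, and each phenotype is the sum of allele counts times additive effects plus optional Gaussian noise.

```python
import random


def simulate_genotypes(n_individuals, n_snps, rng):
    """Return an n_individuals x n_snps matrix of uniform draws from {0, 1, 2}."""
    return [[rng.randint(0, 2) for _ in range(n_snps)]
            for _ in range(n_individuals)]


def simulate_phenotypes(genotypes, qtl_effects, rng, noise_sd=1.0):
    """Additive model: sum of (effect * allele count) over the chosen QTL.

    qtl_effects maps a SNP column index to its additive effect.
    noise_sd is an assumed residual standard deviation (hypothetical).
    """
    return [
        sum(effect * row[j] for j, effect in qtl_effects.items())
        + rng.gauss(0.0, noise_sd)
        for row in genotypes
    ]


rng = random.Random(1)  # fixed seed so the sketch is reproducible
genotypes = simulate_genotypes(5, 4, rng)
phenotypes = simulate_phenotypes(genotypes, {0: 0.5, 2: -1.0}, rng)
print(genotypes)
print(phenotypes)
```

Only the SNPs listed in `qtl_effects` influence the trait; the remaining columns act as neutral markers, which is what makes such data useful for checking whether an association method recovers the true loci.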
This package provides a minimalist implementation of model stacking by Wolpert (1992) <doi:10.1016/S0893-6080(05)80023-1> for boosted tree models. A classic, two-layer stacking model is implemented, where the first layer generates features using gradient boosting trees, and the second layer employs a logistic regression model that uses these features as inputs. Utilities for training the base models and for parameter tuning are provided, allowing users to experiment with different ensemble configurations easily. It aims to provide a simple and efficient way to combine multiple gradient boosting models to improve predictive model performance and robustness.
The stochastic (also called on-line) version of the Self-Organising Map (SOM) algorithm is provided. Different versions of the algorithm are implemented, for numeric and relational data and for contingency tables as described, respectively, in Kohonen (2001) <isbn:3-540-67921-9>, Olteanu & Villa-Vialaneix (2015) <doi:10.1016/j.neucom.2013.11.047> and Cottrell et al. (2004) <doi:10.1016/j.neunet.2004.07.010>. The package also contains many plotting features (to help the user interpret the results), can handle (and impute) missing values, and is delivered with a graphical user interface based on shiny.
This package implements methods for inference on potential waning of vaccine efficacy and for estimation of vaccine efficacy at a user-specified time after vaccination based on data from a randomized, double-blind, placebo-controlled vaccine trial in which participants may be unblinded and placebo subjects may be crossed over to the study vaccine. The methods also allow adjustment for possible confounding via inverse probability weighting through specification of models for the trial entry process, unblinding mechanisms, and the probability an unblinded placebo participant accepts study vaccine: Tsiatis, A. A. and Davidian, M. (2022) <doi:10.1111/biom.13509>.
This package provides a spline-based scRNA-seq method for identifying differentially variable (DV) genes across two experimental conditions. Spline-DV constructs a 3D spline from 3 key gene statistics: mean expression, coefficient of variation (CV), and dropout rate. This is done for both conditions. The 3D spline provides the “expected” behavior of genes in each condition. The distance of the observed mean, CV and dropout rate of each gene from the expected 3D spline is used to measure variability. As the final step, the spline-DV method compares the variabilities in each condition to identify DV genes.
Integrates methods for epidemiological analysis, modeling, and visualization, including functions for summary statistics, SIR (Susceptible-Infectious-Recovered) modeling, DALY (Disability-Adjusted Life Years) estimation, age standardization, diagnostic test evaluation, NLP (Natural Language Processing) keyword extraction, clinical trial power analysis, survival analysis, SNP (Single Nucleotide Polymorphism) association, and machine learning methods such as logistic regression, k-means clustering, Random Forest, and Support Vector Machine (SVM). Includes datasets for prevalence estimation, SIR modeling, genomic analysis, clinical trials, DALY, diagnostic tests, and survival analysis. Methods are based on Gelman et al. (2013) <doi:10.1201/b16018> and Wickham et al. (2019, ISBN:9781492052040).
This package provides a collection of miscellaneous helper functions for running multilevel/mixed models in lme4. It aims to provide functions for common tasks when estimating multilevel models, such as computing the intraclass correlation and design effect, centering variables, estimating the proportion of variance explained at each level, pseudo-R squared, random intercept and slope reliabilities, tests for homogeneity of variance at level-1, and cluster robust and bootstrap standard errors. The tests and statistics reported in the package are from Raudenbush & Bryk (2002, ISBN:9780761919049), Hox et al. (2018, ISBN:9781138121362), and Snijders & Bosker (2012, ISBN:9781849202015).
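Two of the quantities listed above have simple closed forms, sketched here under the usual textbook definitions for a two-level random-intercept model. The function names and arguments are hypothetical and do not mirror this package's interface; the variance components are assumed to have been estimated already.

```python
def intraclass_correlation(between_var, within_var):
    """ICC: share of total variance that lies between clusters."""
    return between_var / (between_var + within_var)


def design_effect(icc, avg_cluster_size):
    """Design effect for clustered data: 1 + (m - 1) * ICC."""
    return 1 + (avg_cluster_size - 1) * icc


# Assumed variance components: 1.0 between clusters, 3.0 within clusters.
icc = intraclass_correlation(1.0, 3.0)
print(icc)
# With about 21 observations per cluster, the variance of a mean is inflated
# by this factor relative to simple random sampling.
print(design_effect(icc, 21))
```

A design effect well above 1 is the usual signal that ignoring the clustering would understate standard errors.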
This package provides tools for data-driven statistical analysis using local polynomial regression and kernel density estimation methods as described in Calonico, Cattaneo and Farrell (2018, <doi:10.1080/01621459.2017.1285776>): lprobust() for local polynomial point estimation and robust bias-corrected inference, lpbwselect() for local polynomial bandwidth selection, kdrobust() for kernel density point estimation and robust bias-corrected inference, kdbwselect() for kernel density bandwidth selection, and nprobust.plot() for plotting results. The main methodological and numerical features of this package are described in Calonico, Cattaneo and Farrell (2019, <doi:10.18637/jss.v091.i08>).
Calculate common types of tables for weighted survey data. Options include topline and (2-way and 3-way) crosstab tables of categorical or ordinal data as well as summary tables of weighted numeric variables. Optionally, include the margin of error at selected confidence levels, accounting for the design effect. The design effect is calculated as described by Kish (1965) <doi:10.1002/bimj.19680100122> beginning on page 257. Output takes the form of tibbles (simple data frames). This package conveniently handles labelled data, such as that commonly used by Stata and SPSS. Complex survey design is not supported at this time.
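As a rough illustration of a design-effect-adjusted margin of error, the sketch below assumes Kish's approximate design effect due to unequal weighting, deff = n * sum(w^2) / (sum(w))^2, and a normal-approximation margin of error for a proportion. Whether the package uses exactly this form is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def kish_design_effect(weights):
    """Kish's approximation: n * sum(w^2) / (sum(w))^2 (assumed form)."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def margin_of_error(proportion, weights, level=0.95):
    """Normal-approximation margin of error inflated by the design effect."""
    n = len(weights)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    variance = kish_design_effect(weights) * proportion * (1 - proportion) / n
    return z * math.sqrt(variance)


equal_weights = [1.0] * 100
print(kish_design_effect(equal_weights))    # equal weights give no inflation
print(margin_of_error(0.5, equal_weights))
print(kish_design_effect([1.0, 3.0]))       # unequal weights inflate variance
```

The more the weights vary, the larger the design effect and the wider the resulting margin of error for the same nominal sample size.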
This package implements two tests for same-source of toolmarks. The chumbley_non_random() test follows the paper "An Improved Version of a Tool Mark Comparison Algorithm" by Hadler and Morris (2017) <doi:10.1111/1556-4029.13640>. This is an extension of the Chumbley score as previously described in "Validation of Tool Mark Comparisons Obtained Using a Quantitative, Comparative, Statistical Algorithm" by Chumbley et al. (2010) <doi:10.1111/j.1556-4029.2010.01424.x>. fixed_width_no_modeling() is based on correlation measures in a diamond-shaped area of the toolmark as described in Hadler (2017).
The model, developed at the Vienna University of Technology, is a lumped conceptual rainfall-runoff model, following the structure of the HBV model. The model can also be run in a semi-distributed fashion and with a dual representation of the soil layer. The model runs on a daily or shorter time step and consists of a snow routine, a soil moisture routine and a flow routing routine. See Parajka, J., R. Merz, G. Bloeschl (2007) <doi:10.1002/hyp.6253> Uncertainty and multiple objective calibration in regional water balance modelling: case study in 320 Austrian catchments, Hydrological Processes, 21, 435-446.
Estimation of Panel Quantile Autoregressive Distributed Lag (PQARDL) models that combine panel ARDL methodology with quantile regression. Supports Pooled Mean Group (PMG), Mean Group (MG), and Dynamic Fixed Effects (DFE) estimators across multiple quantiles. Computes long-run cointegrating parameters, error correction term speed of adjustment, half-life of adjustment, and performs Wald tests for parameter equality across quantiles. Based on the econometric frameworks of Pesaran, Shin, and Smith (1999) <doi:10.1080/01621459.1999.10474156>, Cho, Kim, and Shin (2015) <doi:10.1016/j.jeconom.2015.02.030>, and Bildirici and Kayikci (2022) <doi:10.1016/j.energy.2022.124303>.
RNA degradation is monitored through measurement of RNA abundance after inhibiting RNA synthesis. This package has functions and example scripts to facilitate (1) data normalization, (2) data modeling using constant decay rate or time-dependent decay rate models, (3) the evaluation of treatment or genotype effects, and (4) plotting of the data and models. Data Normalization: functions and scripts facilitate normalization to the initial (T0) RNA abundance, and provide a method to correct for artificial inflation of Reads per Million (RPM) abundance in global assessments as the total size of the RNA pool decreases. Modeling: Normalized data are then modeled using maximum likelihood to fit parameters. For treatment or genotype comparisons (up to four), the modeling step models all possible treatment effects on each gene by repeating the modeling with constraints on the model parameters (e.g., the decay rates of treatments A and B are modeled once constrained to be equal and again allowing them to vary independently). Model Selection: The AICc value is calculated for each model, and the model with the lowest AICc is chosen. Modeling results of selected models are then compiled into a single data frame. Graphical Plotting: functions are provided to easily visualize decay data, models, or half-life distributions using ggplot2 package functions.
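The model selection step described above can be sketched with the standard small-sample corrected AIC, AICc = 2k - 2 ln(L) + 2k(k + 1) / (n - k - 1), where k is the number of fitted parameters and n the number of observations. The function names and the log-likelihood values below are hypothetical, chosen only to show the mechanics; they are not output from this package.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Small-sample corrected Akaike information criterion."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc needs n_obs > n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    return aic + 2 * n_params * (n_params + 1) / (n_obs - n_params - 1)


def select_model(candidates, n_obs):
    """Return the name of the candidate with the lowest AICc.

    candidates maps a model name to (log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: aicc(*candidates[name], n_obs))


# Made-up fits for one gene: equal decay rates in treatments A and B
# versus independently varying rates (one extra parameter).
fits = {
    "shared_rate": (-52.0, 2),
    "separate_rates": (-51.5, 3),
}
print(select_model(fits, n_obs=24))
```

In this made-up case the extra parameter improves the likelihood too little to offset its penalty, so the constrained model is preferred.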
This package provides functions for performing quick observations or evaluations of data, including a variety of ways to list objects by size, class, etc. The functions seqle and reverse.seqle mimic the base rle but can search for linear sequences. The function splatnd allows the user to generate zero-argument commands without the need for makeActiveBinding. Functions are provided to convert from any base to any other base, and to find the n-th greatest max or n-th least min. In addition, there are functions which mimic Unix shell commands, including head, tail, pushd, and popd. Various other goodies are included as well.
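The idea of a run-length encoding that detects linear sequences, as described above, can be sketched as follows. This is an illustration of the concept only, assuming integer input; the function name and the (start, length) return format are hypothetical and do not reproduce the signature or output of seqle.

```python
def linear_runs(values, step=1):
    """Return (start_value, length) pairs for maximal runs in which each
    element equals the previous element plus `step`.

    With step=0 this reduces to ordinary run-length encoding.
    """
    runs = []
    for i, value in enumerate(values):
        if runs and value - values[i - 1] == step:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)  # extend the current run
        else:
            runs.append((value, 1))         # start a new run
    return runs


print(linear_runs([1, 2, 3, 7, 8, 10]))       # runs that increase by 1
print(linear_runs([5, 5, 2], step=0))         # classic run-length encoding
```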
This package implements the Polynomial Maximization Method (PMM) for parameter estimation in linear and time series models when error distributions deviate from normality. The PMM2 variant achieves lower variance parameter estimates compared to ordinary least squares (OLS) when errors exhibit significant skewness. Includes methods for linear regression, AR/MA/ARMA/ARIMA models, and bootstrap inference. Methodology described in Zabolotnii, Warsza, and Tkachenko (2018) <doi:10.1007/978-3-319-77179-3_75>, Zabolotnii, Tkachenko, and Warsza (2022) <doi:10.1007/978-3-031-03502-9_37>, and Zabolotnii, Tkachenko, and Warsza (2023) <doi:10.1007/978-3-031-25844-2_21>.