This package provides support for all calendars specified in the Climate and Forecast (CF) Metadata Conventions for climate and forecasting data. The CF Metadata Conventions are widely used for distributing files with climate observations or projections, including the Coupled Model Intercomparison Project (CMIP) data used by climate change scientists and the Intergovernmental Panel on Climate Change (IPCC). This package specifically allows the user to work with any of the CF-compliant calendars (many of which are not compliant with POSIXt). The CF time coordinate is formally defined in the CF Metadata Conventions document.
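Here is a minimal Python sketch of the CF `360_day` calendar, in which every month has exactly 30 days. It shows why such calendars do not fit POSIXt arithmetic. It is not this package's R interface: the function name `days_since_to_360day` is hypothetical, and the sketch assumes integer day offsets from a given origin.

```python
from typing import Tuple


def days_since_to_360day(offset_days: int,
                         origin: Tuple[int, int, int] = (2000, 1, 1)) -> Tuple[int, int, int]:
    """Convert an integer offset ("days since origin") to (year, month, day)
    in the CF 360_day calendar, where every month has 30 days."""
    y0, m0, d0 = origin
    # Count days from year 0 in a calendar of 12 months x 30 days.
    origin_count = y0 * 360 + (m0 - 1) * 30 + (d0 - 1)
    total = origin_count + offset_days
    year, day_of_year = divmod(total, 360)  # floor division also handles negative offsets
    month, day = divmod(day_of_year, 30)
    return year, month + 1, day + 1


print(days_since_to_360day(59))   # (2000, 2, 30): 30 February exists in this calendar
print(days_since_to_360day(360))  # (2001, 1, 1)
```

A date such as 30 February has no POSIXt representation, which is the kind of mismatch the package is meant to handle.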
This package provides two methods of estimating income inequality statistics from binned income data, such as the income data provided in the Census. These methods use different interpolation techniques to infer the distribution of incomes within income bins. One method is an implementation of Jargowsky and Wheeler's mean-constrained integration over brackets (MCIB). The other method is based on a new technique, Lorenz interpolation, which estimates income inequality by constructing an interpolated Lorenz curve based on the binned income data. These methods can be used to estimate three income inequality measures: the Gini index (the default measure returned), the Theil index, and the Atkinson index. See Jargowsky and Wheeler (2018) <doi:10.1177/0081175018782579>.
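The Python sketch below shows what binned data lose. It computes a Gini index from bins under the crudest assumption: every unit in a bin has the bin's mean income, so the Lorenz curve is piecewise linear. This is neither MCIB nor Lorenz interpolation. Because it ignores within-bin inequality, it tends to understate the Gini index, and closing that gap is what interpolation methods are for. The function name `grouped_gini` is hypothetical.

```python
def grouped_gini(counts, bin_means):
    """Gini index from binned data, assuming every unit in a bin earns the bin mean.

    The Lorenz curve is then piecewise linear, and the Gini index equals
    1 - sum over bins of (population share) * (L_previous + L_current).
    """
    bins = sorted(zip(bin_means, counts))  # order bins by income
    n = sum(c for _, c in bins)
    total_income = sum(m * c for m, c in bins)
    cum_inc = 0.0
    area_term = 0.0
    for mean, count in bins:
        prev_inc = cum_inc
        pop_share = count / n
        cum_inc += mean * count / total_income
        # Trapezoid under the Lorenz curve for this bin (times 2).
        area_term += pop_share * (prev_inc + cum_inc)
    return 1.0 - area_term
```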
Extremely efficient procedures for fitting the entire group lasso and group elastic net regularization path for GLMs, multinomial, the Cox model and multi-task Gaussian models. Similar to the R package glmnet in scope of models and in computational speed. This package provides R bindings to the C++ code underlying the corresponding Python package adelie. These bindings offer a general-purpose group elastic net solver, a wide range of matrix classes that can exploit special structure to allow large-scale inputs, and an assortment of generalized linear model classes for fitting various types of data. The package is an implementation of Yang, J. and Hastie, T. (2024) <doi:10.48550/arXiv.2405.08631>.
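The building block behind group-sparse solutions is group-wise soft-thresholding. This is the proximal operator of λ(α‖β_g‖₂ + (1−α)/2 ‖β_g‖₂²), assuming this common parameterization of the group elastic net penalty. In the special case of orthonormal within-group design, it is the closed-form group update. The Python sketch below shows only that generic step. It is not adelie's solver or API, and the name `group_elastic_net_prox` is hypothetical.

```python
import math


def group_elastic_net_prox(z, lam, alpha=1.0):
    """Proximal step for one group: argmin_b 0.5*||b - z||^2
    + lam * (alpha * ||b||_2 + (1 - alpha) / 2 * ||b||_2^2)."""
    norm = math.sqrt(sum(v * v for v in z))
    if norm == 0.0:
        return [0.0] * len(z)
    # Shrink the whole group toward zero; it is zeroed out if ||z|| <= lam * alpha.
    shrink = max(0.0, 1.0 - lam * alpha / norm)
    # The ridge part rescales the surviving coefficients.
    scale = shrink / (1.0 + lam * (1.0 - alpha))
    return [scale * v for v in z]
```

All coefficients in a group are kept or dropped together. This is what gives group-level variable selection.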
This package provides a set of functions for conducting cognitive diagnostic computerized adaptive testing applications (Chen (2009) <DOI:10.1007/s11336-009-9123-2>). It includes different item selection rules such as the global discrimination index (Kaplan, de la Torre, and Barrada (2015) <DOI:10.1177/0146621614554650>) and the nonparametric selection method (Chang, Chiu, and Tsai (2019) <DOI:10.1177/0146621618813113>), as well as several stopping rules. Functions for generating item banks and responses are also provided. To guide item bank calibration, model comparison at the item level can be conducted using the two-step likelihood ratio test statistic by Sorrel, de la Torre, Abad and Olea (2017) <DOI:10.1027/1614-2241/a000131>.
Efficient algorithms for fitting generalized linear and additive models with group elastic net penalties as described in Helwig (2025) <doi:10.1080/10618600.2024.2362232>. Implements group LASSO, group MCP, and group SCAD with an optional group ridge penalty. Computes the regularization path for linear regression (gaussian), multivariate regression (multigaussian), smoothed support vector machines (svm1), squared support vector machines (svm2), logistic regression (binomial), multinomial logistic regression (multinomial), log-linear count regression (poisson and negative.binomial), and log-linear continuous regression (gamma and inverse gaussian). Supports default and formula methods for model specification, k-fold cross-validation for tuning the regularization parameters, and nonparametric regression via tensor product reproducing kernel (smoothing spline) basis function expansion.
Enables the user to calculate Value at Risk (VaR) and Expected Shortfall (ES) by means of various types of historical simulation. Currently, plain, age-weighted, volatility-weighted and filtered historical simulation are implemented in this package. Volatility weighting can be carried out via an exponentially weighted moving average (EWMA) model or other GARCH-type models. The performance can be assessed via the Traffic Light Test, Coverage Tests and Loss Functions. The methods of the package are described in Gurrola-Perez, P. and Murphy, D. (2015) <https://EconPapers.repec.org/RePEc:boe:boeewp:0525> as well as McNeil, A.J., Frey, R., and Embrechts, P. (2015) <https://ideas.repec.org/b/pup/pbooks/10496.html>.
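Plain historical simulation is the simplest variant listed above. It reads VaR directly off the empirical loss distribution and averages the tail at or beyond it for ES. The Python sketch below is a hedged illustration, not the package's interface, and the function name is hypothetical. It takes losses as negated returns and uses one of several empirical-quantile conventions (the ⌈np⌉-th smallest loss), so other implementations may differ slightly on small samples. Age weighting, volatility weighting and filtering are not shown.

```python
import math


def historical_var_es(returns, p=0.99):
    """Plain historical-simulation VaR and ES at confidence level p.

    Both are reported as positive losses (loss = -return).
    """
    losses = sorted(-r for r in returns)  # ascending losses
    n = len(losses)
    # 1-based rank of the VaR order statistic; the small tolerance guards
    # against floating-point error in n * p.
    k = max(1, math.ceil(n * p - 1e-9))
    var = losses[k - 1]
    tail = losses[k - 1:]  # losses at or beyond VaR
    es = sum(tail) / len(tail)
    return var, es
```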
Simplifies and largely automates practical voice analytics for social science research. This package offers an accessible and easy-to-use interface, including an interactive Shiny app, that simplifies the processing, extraction, analysis, and reporting of voice recording data in the behavioral and social sciences. The package includes batch processing capabilities to read and analyze multiple voice files in parallel, automates the extraction of key vocal features for further analysis, and automatically generates APA-formatted reports for typical between-group comparisons in experimental social science research. A more extensive methodological introduction that inspired the development of the voiceR package is provided in Hildebrand et al. (2020) <doi:10.1016/j.jbusres.2020.09.020>.
Cochran-Mantel-Haenszel methods (Cochran (1954) <doi:10.2307/3001616>; Mantel and Haenszel (1959) <doi:10.1093/jnci/22.4.719>; Landis et al. (1978) <doi:10.2307/1402373>) are a suite of tests applicable to categorical data. A competitor to those tests is nonparametric ANOVA, which was introduced in Rayner and Best (2013) <doi:10.1111/anzs.12041> and extended in Rayner et al. (2015) <doi:10.1111/anzs.12113>. This package provides functions for both methodologies and accompanies the book An Introduction to Cochran–Mantel–Haenszel and Non-Parametric ANOVA. The package also contains the data sets used in that text.
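In the classic stratified 2×2 case, the CMH statistic compares the summed top-left cells a_k with their expectations under conditional independence. For each stratum, E(a_k) = r1·c1/n and Var(a_k) = r1·r2·c1·c2/(n²(n−1)), using that stratum's row and column totals. The Python sketch below computes this statistic without continuity correction, plus its chi-square p-value on 1 degree of freedom. The function name `cmh_2x2xk` is hypothetical. The package's extensions to larger tables and its nonparametric ANOVA methods are not shown.

```python
import math


def cmh_2x2xk(tables):
    """CMH statistic (no continuity correction) for K strata of 2x2 tables.

    Each table is ((a, b), (c, d)). Returns (statistic, p-value) using the
    chi-square distribution with 1 degree of freedom.
    """
    sum_a = sum_e = sum_v = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        sum_a += a
        sum_e += row1 * col1 / n  # E(a_k) under conditional independence
        sum_v += row1 * row2 * col1 * col2 / (n * n * (n - 1))  # Var(a_k)
    stat = (sum_a - sum_e) ** 2 / sum_v
    # Chi-square(1) upper tail: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```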
This package provides functions to describe sampling and diversity dynamics of fossil occurrence datasets (e.g. from the Paleobiology Database). The package includes methods to calculate range- and occurrence-based metrics of taxonomic richness, extinction and origination rates, along with traditional sampling measures. A powerful subsampling tool is also included that implements frequently used sampling standardization methods in a multiple-bin framework. The package also includes functions that simplify plotting time series and occurrence data, as well as other calculations, such as environmental affinities and extinction selectivity testing. Details can be found in: Kocsis, A.T.; Reddin, C.J.; Alroy, J. and Kiessling, W. (2019) <doi:10.1101/423780>.
User friendly interface based on the R package gstat to fit exponential parametric models to empirical semi-variograms in order to model the spatial correlation structure of health data. Geo-located health outcomes of survey participants may be used to model spatial effects on health in an ego-centred approach. The package contains a range of functions to help explore the spatial structure of the data as well as visualize the fit of exponential models for various metaparameter combinations with respect to the number of lag intervals and maximal distance. Furthermore, the outcome of interest can be adjusted for covariates by fitting a linear regression in a preliminary step before the semi-variogram fitting process.
Samples generalized random product graphs, a generalization of a broad class of network models. Given matrices X, S, and Y with non-negative entries, samples a matrix with expectation X S Y^T and independent Poisson or Bernoulli entries using the fastRG algorithm of Rohe et al. (2017) <https://www.jmlr.org/papers/v19/17-128.html>. The algorithm first samples the number of edges and then places them one by one. As a result, it runs in O(m) time, where m is the number of edges. This is a dramatic improvement over element-wise algorithms, which require O(n^2) operations to sample a random graph, where n is the number of nodes.
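The Python sketch below follows the two-stage idea described above, for the Poisson case only. It is a minimal illustration, not the package's R code, and `fast_rg_poisson` and `_poisson` are hypothetical names. First, it draws the total edge count m as a Poisson variable with mean equal to the sum of X S Y^T. Then it assigns each edge to a block pair (u, v) with weight proportional to colsum(X)_u · S_uv · colsum(Y)_v, and draws its endpoints from column u of X and column v of Y. By Poisson splitting, entry (i, j) is then Poisson with mean (X S Y^T)_ij, independently across entries.

```python
import random
from collections import Counter


def _poisson(lam, rng):
    """Poisson(lam) draw: count unit-rate exponential arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def fast_rg_poisson(X, S, Y, seed=None):
    """Sample A with independent Poisson entries and E[A] = X S Y^T.

    X is n x k1, S is k1 x k2, Y is d x k2 (lists of lists, non-negative).
    Returns a sparse Counter mapping (i, j) -> edge multiplicity.
    """
    rng = random.Random(seed)
    n, k1 = len(X), len(X[0])
    d, k2 = len(Y), len(Y[0])
    # Columns of X and Y, used as (unnormalized) sampling distributions.
    x_cols = [[row[u] for row in X] for u in range(k1)]
    y_cols = [[row[v] for row in Y] for v in range(k2)]
    cx = [sum(col) for col in x_cols]
    cy = [sum(col) for col in y_cols]
    blocks = [(u, v) for u in range(k1) for v in range(k2)]
    block_weights = [cx[u] * S[u][v] * cy[v] for u, v in blocks]
    # Step 1: the total number of edges is Poisson with mean sum(X S Y^T).
    m = _poisson(sum(block_weights), rng)
    edges = Counter()
    if m == 0:
        return edges
    # Step 2: assign each edge to a block pair (u, v) proportionally to its weight.
    block_counts = Counter(rng.choices(blocks, weights=block_weights, k=m))
    # Step 3: place each edge, drawing i from column u of X and j from column v of Y.
    for (u, v), count in block_counts.items():
        rows = rng.choices(range(n), weights=x_cols[u], k=count)
        cols = rng.choices(range(d), weights=y_cols[v], k=count)
        edges.update(zip(rows, cols))
    return edges
```

Bernoulli entries are not shown. Note that thresholding a Poisson entry at 1 yields a Bernoulli variable with success probability 1 − exp(−λ_ij), which is close to λ_ij only when λ_ij is small.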
Fits lifespan datasets of biological systems such as yeast, fruit flies, and other similar biological units with well-known finite mixture models introduced by Farewell (1982) <doi:10.2307/2529885> and Al-Hussaini et al. (2000) <doi:10.1080/00949650008812033>. Estimates the parameters of finite mixtures of parametric distributions fitted to a lifespan dataset. Performs the following tasks: 1) estimates the parameters of the finite mixture model using the expectation maximization (EM) algorithm; 2) computes four goodness-of-fit measures: the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), the Kolmogorov-Smirnov (KS) statistic, and the log-likelihood; 3) determines the initial values by k-means clustering.
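The Python sketch below mirrors the steps listed above: k-means starting values, EM iterations, and log-likelihood, AIC and BIC summaries. It is a hedged illustration, not the package's code. It fits a two-component exponential mixture, chosen only because its M-step has a closed form, and the package's own mixture families may differ. The KS statistic is omitted, and all function names are hypothetical.

```python
import math
import random


def _kmeans_1d(x, iters=20):
    """Two-cluster 1-D k-means (Lloyd's algorithm), started at min and max."""
    centers = [min(x), max(x)]
    for _ in range(iters):
        groups = ([], [])
        for v in x:
            groups[0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


def _loglik(x, weights, rates):
    return sum(math.log(sum(w * r * math.exp(-r * v) for w, r in zip(weights, rates)))
               for v in x)


def fit_exp_mixture(x, iters=100):
    """EM fit of a two-component exponential mixture with k-means starting values."""
    g0, g1 = _kmeans_1d(x)
    n = len(x)
    weights = [len(g0) / n, len(g1) / n]
    rates = [len(g0) / sum(g0), len(g1) / sum(g1)]  # 1 / cluster mean
    history = []
    for _ in range(iters):
        # E-step: posterior probability that each lifespan belongs to each component.
        resp = []
        for v in x:
            dens = [w * r * math.exp(-r * v) for w, r in zip(weights, rates)]
            total = sum(dens)
            resp.append([p / total for p in dens])
        # M-step: closed-form updates for an exponential mixture.
        for k in range(2):
            rk = sum(r[k] for r in resp)
            weights[k] = rk / n
            rates[k] = rk / sum(r[k] * v for r, v in zip(resp, x))
        history.append(_loglik(x, weights, rates))
    ll = history[-1]
    n_params = 3  # two rates and one free mixing weight
    return {"weights": weights, "rates": rates, "loglik": ll,
            "aic": 2 * n_params - 2 * ll,
            "bic": n_params * math.log(n) - 2 * ll,
            "history": history}
```

The `history` list should be non-decreasing, which is the standard EM guarantee and a useful sanity check.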
This package provides a subgroup identification method for precision medicine based on quantitative objectives. This method can handle continuous, binary and survival endpoints for both the prognostic and predictive cases. For the predictive case, the method aims to identify a subgroup for which treatment is better than control by at least a pre-specified or auto-selected constant. For the prognostic case, the method aims to identify a subgroup whose outcome is better than a pre-specified or auto-selected constant. The derived signature is a linear combination of predictors, and the selected subgroup consists of subjects with signature > 0. The false discovery rate when no true subgroup exists is controlled at a user-specified level.
EPE's (Empresa de Pesquisa Energética) 4MD (Modelo de Mercado da Micro e Minigeração Distribuída - Micro and Mini Distributed Generation Market Model) model to forecast the adoption of Distributed Generation. Given the user's assumptions, it is possible to estimate how many consumer units will have distributed generation in Brazil over the next 10 years, for example. In addition, it is possible to estimate the installed capacity, the amount of investments that will be made in the country and the monthly energy contribution of this type of generation. <https://www.epe.gov.br/sites-pt/publicacoes-dados-abertos/publicacoes/PublicacoesArquivos/publicacao-689/topico-639/NT_Metodologia_4MD_PDE_2032_VF.pdf>.
Scalable Bayesian clustering of categorical datasets. The package implements a hierarchical Dirichlet (Process) mixture of multinomial distributions. It is thus a probabilistic latent class model (LCM) and can be used to reduce the dimensionality of hierarchical data and cluster individuals into latent classes. It can automatically infer an appropriate number of latent classes or find k classes, as defined by the user. The model is based on a paper by Dunson and Xing (2009) <doi:10.1198/jasa.2009.tm08439>, but implements a scalable variational inference algorithm so that it is applicable to large datasets. It is described and tested in the accompanying paper by Ahlmann-Eltze and Yau (2018) <doi:10.1109/DSAA.2018.00068>.
This package simulates mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as the sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Other capabilities of MixSim include computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models.
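Pairwise overlap is ω_ij = ω_{j|i} + ω_{i|j}. Here ω_{j|i} is taken to be the probability that a draw from component i has a larger weighted density π·φ under component j. The Python sketch below restricts this definition to one dimension with a common variance, where the decision boundary is a single point and both probabilities have a closed form. It is an assumption-laden illustration with a hypothetical function name. MixSim's exact computation for general multivariate Gaussian mixtures is not reproduced.

```python
import math


def _norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pairwise_overlap_1d(mu_i, mu_j, pi_i, pi_j, sigma):
    """Overlap omega_ij = omega_{j|i} + omega_{i|j} for two 1-D Gaussian components
    with a common standard deviation sigma and mixing weights pi_i, pi_j."""
    if mu_i == mu_j:
        raise ValueError("components must have distinct means in this sketch")
    if mu_i > mu_j:  # the sum is symmetric, so order the components by mean
        mu_i, mu_j, pi_i, pi_j = mu_j, mu_i, pi_j, pi_i
    delta = mu_j - mu_i
    # Decision boundary: pi_j * phi_j exceeds pi_i * phi_i for x above t.
    t = 0.5 * (mu_i + mu_j) - sigma ** 2 * math.log(pi_j / pi_i) / delta
    omega_j_given_i = 1.0 - _norm_cdf((t - mu_i) / sigma)  # draw from i lands on j's side
    omega_i_given_j = _norm_cdf((t - mu_j) / sigma)        # draw from j lands on i's side
    return omega_j_given_i + omega_i_given_j
```

With equal weights, the result reduces to 2(1 − Φ(Δ/(2σ))), where Δ is the distance between the means. Moving the means apart therefore lowers the overlap monotonically, and this is the kind of control a simulator can exploit.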
Calculates and compares prediction probability (PK) values for anesthetic depth indicators. PK values are widely used for measuring the performance of anesthetic depth indicators and were first proposed by Dr. Warren D. Smith's group in two papers: Warren D. Smith, Robert C. Dutton, Ty N. Smith (1996) <doi:10.1097/00000542-199601000-00005> and Warren D. Smith, Robert C. Dutton, Ty N. Smith (1996) <doi:10.1002/(SICI)1097-0258(19960615)15:11%3C1199::AID-SIM218%3E3.0.CO;2-Y>. The authors provided two Microsoft Excel files (xls format) for calculating and comparing PK values. This package provides an easy-to-use API for calculating and comparing PK values in R.
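PK is commonly written as PK = (Pc + Ptx/2) / (Pc + Pd + Ptx). These are computed over pairs of observations with different observed depth: Pc, Pd and Ptx are the probabilities of concordance, discordance, and a tie in the indicator only. The Python sketch below computes the point estimate by brute-force pair counting; using counts instead of probabilities leaves the ratio unchanged. It is not this package's API, and the function name is hypothetical. Standard errors and the comparison of PK values are not shown.

```python
def prediction_probability(indicator, depth):
    """Point estimate of PK = (Pc + Ptx/2) / (Pc + Pd + Ptx).

    Pairs tied in observed depth are excluded. Assumes the indicator is
    expected to increase with depth.
    """
    conc = disc = tie_x = 0
    n = len(indicator)
    for i in range(n):
        for j in range(i + 1, n):
            d_depth = depth[j] - depth[i]
            if d_depth == 0:
                continue  # pairs tied in depth carry no information
            d_ind = indicator[j] - indicator[i]
            if d_ind == 0:
                tie_x += 1
            elif (d_ind > 0) == (d_depth > 0):
                conc += 1
            else:
                disc += 1
    informative = conc + disc + tie_x
    if informative == 0:
        raise ValueError("all observations share the same depth")
    return (conc + 0.5 * tie_x) / informative
```

For an indicator that decreases as anesthesia deepens, 1 − PK gives the value in the opposite orientation, since swapping concordant and discordant pairs maps PK to 1 − PK.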
This is an R implementation of a constrained l1 minimization approach for estimating multiple Sparse Gaussian or Nonparanormal Graphical Models (SIMULE). The SIMULE algorithm can be used to estimate multiple related precision matrices. For instance, it can identify context-specific gene networks from multi-context gene expression datasets. By performing data-driven network inference from high-dimensional and heterogeneous data sets, this tool can help users effectively translate aggregated data into knowledge that takes the form of graphs among entities. Please run demo(simuleDemo) to learn the basic functions provided by this package. For further details, please read the original paper: Beilun Wang, Ritambhara Singh, Yanjun Qi (2017) <DOI:10.1007/s10994-017-5635-7>.
This package provides a process-oriented and trajectory-based Discrete-Event Simulation (DES) package for R. It is designed as a generic yet powerful framework. The architecture encloses a robust and fast simulation core written in C++ with automatic monitoring capabilities. It provides a rich and flexible R API that revolves around the concept of trajectory, a common path in the simulation model for entities of the same type. Documentation about simmer is provided by several vignettes included in this package, via the paper by Ucar, Smeets & Azcorra (2019, <doi:10.18637/jss.v090.i02>), and the paper by Ucar, Hernández, Serrano & Azcorra (2018, <doi:10.1109/MCOM.2018.1700960>); see citation("simmer") for details.
This package provides movies to help students to understand statistical concepts. The rpanel package <https://cran.r-project.org/package=rpanel> is used to create interactive plots that move to illustrate key statistical ideas and methods. There are movies to: visualise probability distributions (including user-supplied ones); illustrate sampling distributions of the sample mean (central limit theorem), the median, the sample maximum (extremal types theorem) and (the Fisher transformation of the) product moment correlation coefficient; examine the influence of an individual observation in simple linear regression; illustrate key concepts in statistical hypothesis testing. Also provided are dpqr functions for the distribution of the Fisher transformation of the correlation coefficient under sampling from a bivariate normal distribution.
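The package's dpqr functions target the sampling distribution of the Fisher-transformed correlation coefficient itself. A familiar large-sample approximation to that distribution is z = atanh(r) ≈ N(atanh(ρ), 1/(n − 3)). The Python sketch below uses this approximation to build a confidence interval for ρ. It does not reproduce the package's functions, and the function name is hypothetical.

```python
import math


def fisher_z_interval(r, n, z_crit=1.959963984540054):
    """Approximate confidence interval for a correlation via Fisher's z = atanh(r).

    Uses the large-sample approximation z ~ N(atanh(rho), 1 / (n - 3));
    z_crit of about 1.96 gives roughly 95% coverage.
    """
    if not -1.0 < r < 1.0 or n <= 3:
        raise ValueError("need -1 < r < 1 and n > 3")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    # Build a symmetric interval on the z scale, then map back with tanh.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

The interval is symmetric on the z scale but skewed toward zero on the r scale, which is the practical reason for working with the transformation.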
This package provides a collection of tools for clinical trial data management and analysis in research and teaching. The package was mainly assembled for personal use, but any use beyond that is encouraged. This package has migrated functions from agdamsbo/daDoctoR, and new functions have been added. Versioning follows the month and year of release. See NEWS/Changelog for release notes. This package includes sampled data from the TALOS trial (Kraglund et al. (2018) <doi:10.1161/STROKEAHA.117.020067>). The win_prob() function is based on work by Zou et al. (2022) <doi:10.1161/STROKEAHA.121.037744>. The age_calc() function is based on work by Becker (2020) <doi:10.18637/jss.v093.i02>.
This package provides a RangedSummarizedExperiment object of read counts in genes for an RNA-Seq experiment on four human airway smooth muscle cell lines treated with dexamethasone. Details on the gene model and read counting procedure are provided in the package vignette. The citation for the experiment is: Himes BE, Jiang X, Wagner P, Hu R, Wang Q, Klanderman B, Whitaker RM, Duan Q, Lasky-Su J, Nikolos C, Jester W, Johnson M, Panettieri R Jr, Tantisira KG, Weiss ST, Lu Q. RNA-Seq Transcriptome Profiling Identifies CRISPLD2 as a Glucocorticoid Responsive Gene that Modulates Cytokine Function in Airway Smooth Muscle Cells. PLoS One. 2014 Jun 13;9(6):e99625. PMID: 24926665. GEO: GSE52778.
The Langmuir and Freundlich adsorption isotherms are pivotal in characterizing adsorption processes, which are essential across various scientific disciplines. Proper interpretation of adsorption isotherms involves robust fitting of data to the models, accurate estimation of parameters, and evaluation of model efficiency, in both linear and non-linear forms. For researchers and practitioners in chemistry, environmental science, soil science, and engineering, a comprehensive package that satisfies all these requirements would be ideal for accurate and efficient analysis of adsorption data, and for precise model selection and validation in rigorous scientific inquiry and real-world applications. Details can be found in Langmuir (1918) <doi:10.1021/ja02242a004> and Giles (1973) <doi:10.1111/j.1478-4408.1973.tb03158.x>.
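The linear forms mentioned above come from standard rearrangements. Langmuir's q = q_max·K_L·C / (1 + K_L·C) becomes C/q = C/q_max + 1/(K_L·q_max). Freundlich's q = K_F·C^(1/n) becomes ln q = ln K_F + (1/n)·ln C. The Python sketch below recovers the parameters from ordinary least squares on those forms. It is a hedged illustration, not this package's API, and the function names are hypothetical. Non-linear fitting, which avoids the error distortion that linearization introduces, is not shown.

```python
import math


def _ols(x, y):
    """Slope and intercept of a simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_langmuir_linear(c, q):
    """Regress C/q on C: slope = 1/q_max, intercept = 1/(K_L * q_max)."""
    slope, intercept = _ols(c, [ci / qi for ci, qi in zip(c, q)])
    q_max = 1.0 / slope
    k_l = slope / intercept
    return q_max, k_l


def fit_freundlich_linear(c, q):
    """Regress ln q on ln C: slope = 1/n, intercept = ln K_F."""
    slope, intercept = _ols([math.log(ci) for ci in c], [math.log(qi) for qi in q])
    return math.exp(intercept), 1.0 / slope  # (K_F, n)
```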
Calculates the fused extended two-way fixed effects (FETWFE) estimator for unbiased and efficient estimation of difference-in-differences in panel data with staggered treatment adoption. This estimator eliminates bias inherent in conventional two-way fixed effects estimators, while also employing a novel bridge regression regularization approach to improve efficiency and yield valid standard errors. Also implements extended TWFE (etwfe) and bridge-penalized ETWFE (betwfe). Provides S3 classes for streamlined workflow and supports flexible tuning (ridge and rank-condition guarantees), automatic covariate centering/scaling, and detailed overall and cohort-specific effect estimates with valid standard errors. Includes simulation and formatting utilities, extensive diagnostic tools, vignettes, and examples. See Faletto (2025) <doi:10.48550/arXiv.2312.05985>.