Multilevel models (mixed effects models) are the statistical tool of choice for analyzing multilevel data (Searle et al., 2009). These models account for the correlated nature of observations within higher-level units by adding group-level error terms that augment the single residual error of a standard OLS regression. Multilevel and mixed effects models often require specialized data pre-processing and further post-estimation derivations and graphics to gain insight into model results. The package presented here, 'mlmtools', is a suite of pre- and post-estimation tools for multilevel models in 'R'. The package implements post-estimation tools designed to work with models estimated using 'lme4's (Bates et al., 2014) lmer() function, which fits linear mixed effects regression models. Searle, S. R., Casella, G., & McCulloch, C. E. (2009, ISBN:978-0470009598). Bates, D., Mächler, M., Bolker, B., & Walker, S. (2014) <doi:10.18637/jss.v067.i01>.
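A minimal sketch of the kind of model these tools operate on, assuming the lme4 package and hypothetical pupil-within-school data (the mlmtools pre- and post-estimation calls themselves are not shown):

    # Hypothetical two-level data: pupils nested within schools
    library(lme4)
    set.seed(1)
    d <- data.frame(
      school = factor(rep(1:20, each = 25)),
      ses    = rnorm(500)
    )
    d$score <- 2 + 0.5 * d$ses + rep(rnorm(20, sd = 1), each = 25) + rnorm(500)
    # Random-intercept model: a school-level error term augments the residual error
    fit <- lmer(score ~ ses + (1 | school), data = d)
    summary(fit)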
Screen for and analyze non-linear sparse direct effects in the presence of unobserved confounding using spectral deconfounding techniques (Ćevid, Bühlmann, and Meinshausen (2020) <jmlr.org/papers/v21/19-545.html>; Guo, Ćevid, and Bühlmann (2022) <doi:10.1214/21-AOS2152>). These methods have been shown to yield good estimates of the true direct effect when many covariates are observed (high-dimensional settings) and the confounding is fairly dense. Even if the assumptions are violated, there is little to lose: the deconfounded models will, in general, estimate a function closer to the true one than classical least squares optimization. 'SDModels' provides the functions SDAM() for Spectrally Deconfounded Additive Models (Scheidegger, Guo, and Bühlmann (2025) <doi:10.1145/3711116>) and SDForest() for Spectrally Deconfounded Random Forests (Ulmer, Scheidegger, and Bühlmann (2025) <doi:10.48550/arXiv.2502.03969>).
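A hedged usage sketch: SDAM() and SDForest() are the exported functions named above, but the formula/data call pattern and the simulated data here are assumptions, not the documented interface, so the argument names should be checked against the package help pages.

    # Assumed call pattern; verify argument names against the SDModels documentation
    library(SDModels)
    set.seed(1)
    n <- 200; p <- 50
    X <- matrix(rnorm(n * p), n, p)
    y <- sin(X[, 1]) + 0.5 * X[, 2]^2 + rnorm(n)   # sparse non-linear direct effects
    dat <- data.frame(y = y, X)
    fit_am <- SDAM(y ~ ., data = dat)       # spectrally deconfounded additive model (assumed interface)
    fit_rf <- SDForest(y ~ ., data = dat)   # spectrally deconfounded random forest (assumed interface)
    fit_rf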
The methods discussed in this package are new non-parametric methods based on sequential normal scores (SNS; Conover et al. (2017) <doi:10.1080/07474946.2017.1360091>), designed for sequences of observations, usually time series data, which may occur singly or in batches, and may be univariate or multivariate. These methods are designed to detect changes in the process, which may occur as changes in location (mean or median), changes in scale (standard deviation or variance), or other changes of interest in the distribution of the observations over the time observed. They usually apply to large data sets, so computations need to be simple enough to be done in a reasonable time on a computer and easily updated as each new observation (or batch of observations) becomes available. Some examples and more detail on SNS are presented in Conover et al. (2019) <arXiv:1901.04443>.
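A from-scratch illustration of the idea (not the package's API): each incoming observation is converted to a sequential rank among the observations seen so far and then to a normal score via the standard normal quantile function. One common formulation is shown, assuming a univariate series with no batching.

    # Sequential normal scores, one common formulation (illustrative only)
    sns <- function(x) {
      z <- numeric(length(x))
      for (i in seq_along(x)) {
        r    <- rank(x[seq_len(i)])[i]     # sequential rank of x[i] among x[1..i]
        z[i] <- qnorm((r - 0.5) / i)       # normal score via the N(0,1) quantile
      }
      z
    }
    set.seed(1)
    y <- c(rnorm(50), rnorm(50, mean = 2)) # simulated shift in location at t = 51
    plot(sns(y), type = "b")               # scores drift upward after the change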
We developed a lightweight machine learning tool for RNA profiling of acute lymphoblastic leukemia (ALL); however, it can be used for any problem where multiple classes need to be identified from multi-dimensional data. The methodology is described in Makinen V-P, Rehn J, Breen J, Yeung D, White DL (2022) Multi-cohort transcriptomic subtyping of B-cell acute lymphoblastic leukemia, International Journal of Molecular Sciences 23:4574, <doi:10.3390/ijms23094574>. The classifier contains optimized mean profiles of the classes (centroids) as observed in the training data, and new samples are matched to these centroids using the shortest Euclidean distance. Centroids derived from a dataset of 1,598 ALL patients are included, but users can also train the models with their own data. The output includes both numerical and visual presentations of the classification results. Samples with mixed features from multiple classes or atypical values are also identified.
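A minimal sketch of the matching rule described above (shortest Euclidean distance to a class centroid), using made-up numbers rather than the package's trained centroids:

    # Toy centroids (rows = classes, columns = features); values are made up
    centroids <- rbind(ClassA = c(1.0, 0.2, -0.5),
                       ClassB = c(-0.8, 1.1, 0.3))
    sample_profile <- c(0.9, 0.1, -0.4)
    # Euclidean distance from the new sample to each class centroid
    dists <- apply(centroids, 1, function(mu) sqrt(sum((sample_profile - mu)^2)))
    names(which.min(dists))   # predicted class: the nearest centroid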
This package provides advanced statistical methods to describe and predict customers' purchase behavior in a non-contractual setting. It uses historic transaction records to fit a probabilistic model, which then allows computing quantities of managerial interest at the cohort as well as the customer level (Customer Lifetime Value, Customer Equity, P(alive), etc.). This package complements the BTYD package by providing several additional buy-till-you-die models that have been published in the marketing literature but whose implementations are complex and non-trivial. These models are: NBD [Ehrenberg (1959) <doi:10.2307/2985810>], MBG/NBD [Batislam et al (2007) <doi:10.1016/j.ijresmar.2006.12.005>], (M)BG/CNBD-k [Reutterer et al (2020) <doi:10.1016/j.ijresmar.2020.09.002>], Pareto/NBD (HB) [Abe (2009) <doi:10.1287/mksc.1090.0502>] and Pareto/GGG [Platzer and Reutterer (2016) <doi:10.1287/mksc.2015.0963>].
Offers a set of tools for visualizing and analyzing the size and power properties of the test for equal predictive accuracy, the Diebold-Mariano test, based on heteroskedasticity- and autocorrelation-robust (HAR) inference. HAR inference typically involves non-parametric estimation of the long-run variance, and one of its tuning parameters, the truncation parameter, trades off size against power. Lazarus, Lewis, and Stock (2021) <doi:10.3982/ECTA15404> theoretically characterize the size-power frontier for the Gaussian multivariate location model. 'ForeComp' computes and visualizes the finite-sample size-power frontier of the Diebold-Mariano test based on fixed-b asymptotics together with the Bartlett kernel. To compute the finite-sample size and power, it works with the ARMA process that best approximates the given dataset. It informs the user how their choice of the truncation parameter performs and how robust the testing outcomes are.
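For orientation, a bare-bones Diebold-Mariano statistic with a Bartlett-kernel long-run variance, showing where the truncation parameter enters; this is a generic textbook construction, not ForeComp's internal code.

    # d: loss differential between two forecasts; B: Bartlett truncation parameter
    dm_stat <- function(d, B) {
      n    <- length(d)
      dbar <- mean(d)
      g    <- function(k) sum((d[(k + 1):n] - dbar) * (d[1:(n - k)] - dbar)) / n
      lrv  <- g(0) + 2 * sum(sapply(seq_len(B), function(k) (1 - k / (B + 1)) * g(k)))
      dbar / sqrt(lrv / n)                # t-type statistic for equal predictive accuracy
    }
    set.seed(1)
    e1 <- rnorm(200); e2 <- rnorm(200, sd = 1.2)
    d  <- e1^2 - e2^2                     # squared-error loss differential
    dm_stat(d, B = 10)                    # larger B: smaller size distortion, less power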
This package provides functions for computing test subscores using different methods in both classical test theory (CTT) and item response theory (IRT). This package enables three types of subscoring methods within the framework of CTT and IRT, including (1) Wainer's augmentation method (Wainer et al., 2001) <doi:10.4324/9781410604729>, (2) Haberman's subscoring methods (Haberman, 2008) <doi:10.3102/1076998607302636>, and (3) Yen's objective performance index (OPI; Yen, 1987) <https://www.ets.org/research/policy_research_reports/publications/paper/1987/hrap>. It also includes functions to compute Proportional Reduction of Mean Squared Errors (PRMSEs) in Haberman's methods, which are used to examine whether test subscores are of added value. In addition, the package includes a function to assess the local independence assumption of IRT with Yen's Q3 statistic (Yen, 1984 <doi:10.1177/014662168400800201>; Yen, 1993 <doi:10.1111/j.1745-3984.1993.tb00423.x>).
In the observational study design stage, matching/weighting methods are conducted. However, when many background variables are present, the decision as to which variables to prioritize for matching/weighting is not trivial. Thus, joint treatment-outcome variable importance plots are created to guide variable selection. The joint variable importance plots enhance variable comparisons via unadjusted bias curves derived under the omitted variable bias framework. The plots translate variable importance into recommended values for tuning parameters in existing methods. Post-matching and/or post-weighting plots can also be used to visualize and assess the quality of the observational study design. The method motivation and derivation are presented in "Prioritizing Variables for Observational Study Design using the Joint Variable Importance Plot" by Liao et al. (2024) <doi:10.1080/00031305.2024.2303419>. See the package paper by Liao and Pimentel (2024) <doi:10.21105/joss.06093> for a beginner-friendly user introduction.
This package contains functions for multiple imputation which complement existing functionality in R. In particular, several imputation methods for the mice package (van Buuren & Groothuis-Oudshoorn, 2011, <doi:10.18637/jss.v045.i03>) are implemented. Main features of the miceadds package include plausible value imputation (Mislevy, 1991, <doi:10.1007/BF02294457>), multilevel imputation for variables at any level or with any number of hierarchical and non-hierarchical levels (Grund, Luedtke & Robitzsch, 2018, <doi:10.1177/1094428117703686>; van Buuren, 2018, Ch.7, <doi:10.1201/9780429492259>), imputation using partial least squares (PLS) for high-dimensional predictors (Robitzsch, Pham & Yanagida, 2016), nested multiple imputation (Rubin, 2003, <doi:10.1111/1467-9574.00217>), substantive model compatible imputation (Bartlett et al., 2015, <doi:10.1177/0962280214521348>), and features for the generation of synthetic datasets (Reiter, 2005, <doi:10.1111/j.1467-985X.2004.00343.x>; Nowok, Raab, & Dibben, 2016, <doi:10.18637/jss.v074.i11>).
User-friendly analysis of hierarchical multinomial processing tree (MPT) models that are often used in cognitive psychology. Implements the latent-trait MPT approach (Klauer, 2010) <DOI:10.1007/s11336-009-9141-0> and the beta-MPT approach (Smith & Batchelder, 2010) <DOI:10.1016/j.jmp.2009.06.007> to model heterogeneity of participants. MPT models are conveniently specified by an .eqn-file as used by other MPT software and data are provided by a .csv-file or directly in R. Models are fitted either by calling JAGS or by an MPT-tailored Gibbs sampler in C++ (only for nonhierarchical and beta MPT models). Provides tests of heterogeneity and MPT-tailored summaries and plotting functions. Detailed documentation is available in Heck, Arnold, & Arnold (2018) <DOI:10.3758/s13428-017-0869-7> and a tutorial on MPT modeling can be found in Schmidt, Erdfelder, & Heck (2022) <DOI:10.31234/osf.io/gh8md>.
This package provides functions to compute various clinical scores used in healthcare. These include the Charlson Comorbidity Index (CCI), predicting 10-year survival in patients with multiple comorbidities; the EPICES score, an individual indicator of precariousness considering its multidimensional nature; the MELD score for chronic liver disease severity; the Alternative Fistula Risk Score (a-FRS) for postoperative pancreatic fistula risk; and the Distal Pancreatectomy Fistula Risk Score (D-FRS) for risk following distal pancreatectomy. For detailed methodology, refer to Charlson et al. (1987) <doi:10.1016/0021-9681(87)90171-8>, Sass et al. (2006) <doi:10.1007/s10332-006-0131-5>, Kamath et al. (2001) <doi:10.1053/jhep.2001.22172>, Kim et al. (2008) <doi:10.1056/NEJMoa0801209>, Kim et al. (2021) <doi:10.1053/j.gastro.2021.08.050>, Mungroop et al. (2019) <doi:10.1097/SLA.0000000000002620>, and de Pastena et al. (2023) <doi:10.1097/SLA.0000000000005497>.
Computes fungible coefficients and Monte Carlo data. Underlying theory for these functions is described in the following publications: Waller, N. (2008). Fungible Weights in Multiple Regression. Psychometrika, 73(4), 691-703, <DOI:10.1007/s11336-008-9066-z>. Waller, N. & Jones, J. (2009). Locating the Extrema of Fungible Regression Weights. Psychometrika, 74(4), 589-602, <DOI:10.1007/s11336-008-9087-7>. Waller, N. G. (2016). Fungible Correlation Matrices: A Method for Generating Nonsingular, Singular, and Improper Correlation Matrices for Monte Carlo Research. Multivariate Behavioral Research, 51(4), 554-568. Jones, J. A. & Waller, N. G. (2015). The normal-theory and asymptotic distribution-free (ADF) covariance matrix of standardized regression coefficients: theoretical extensions and finite sample behavior. Psychometrika, 80, 365-378, <DOI:10.1007/s11336-013-9380-y>. Waller, N. G. (2018). Direct Schmid-Leiman transformations and rank-deficient loadings matrices. Psychometrika, 83, 858-870. <DOI:10.1007/s11336-017-9599-0>.
Approximate Bayesian regularization using Gaussian approximations. The input is a vector of estimates and a Gaussian error covariance matrix of the key parameters. Bayesian shrinkage is then applied to obtain parsimonious solutions. The method is described in Karimova, van Erp, Leenders, and Mulder (2024) <DOI:10.31234/osf.io/2g8qm>. Gibbs samplers are used for model fitting. The shrinkage priors that are supported are Gaussian (ridge) priors, Laplace (lasso) priors (Park and Casella, 2008 <DOI:10.1198/016214508000000337>), and horseshoe priors (Carvalho et al., 2010 <DOI:10.1093/biomet/asq017>). These priors include an option for grouped regularization of different subsets of parameters (Meier et al., 2008 <DOI:10.1111/j.1467-9868.2007.00627.x>). F priors are used for the penalty parameters lambda^2 (Mulder and Pericchi, 2018 <DOI:10.1214/17-BA1092>). This corresponds to half-Cauchy priors on lambda (Carvalho, Polson, and Scott, 2010 <DOI:10.1093/biomet/asq017>).
Functionality for reliability estimates. For unidimensional tests: coefficient alpha, Guttman's lambda-2/-4/-6, the greatest lower bound, and coefficient omega_u ('unidimensional') in a Bayesian and a frequentist version. For multidimensional tests: omega_t (total) and omega_h (hierarchical). The results include confidence and credible intervals, the probability of a coefficient being larger than a cutoff, and a check of the factor models necessary for the omega coefficients. The method for the Bayesian unidimensional estimates, except for omega_u, is sampling from the posterior inverse Wishart for the covariance-matrix-based measures (see Murphy, 2007, <https://groups.seas.harvard.edu/courses/cs281/papers/murphy-2007.pdf>). The Bayesian omegas (u, t, and h) are obtained by Gibbs sampling from the conditional posterior distributions of (1) the single factor model, (2) the second-order factor model, (3) the bi-factor model, and (4) the correlated factor model (Lee, 2007, <doi:10.1002/9780470024737>).
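As a point of reference, the simplest of these coefficients, coefficient alpha, can be computed directly from the item covariance matrix; this is the standard textbook formula applied to simulated items, not the package's Bayesian machinery.

    # Coefficient alpha from the item covariance matrix (frequentist point estimate)
    coef_alpha <- function(items) {
      S <- cov(items)                     # items: respondents x items matrix
      k <- ncol(items)
      (k / (k - 1)) * (1 - sum(diag(S)) / sum(S))
    }
    set.seed(1)
    true_score <- rnorm(300)
    items <- sapply(1:5, function(j) true_score + rnorm(300))  # 5 parallel items
    coef_alpha(items)                     # reliability of the 5-item total, around .8 here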
Programmatic connection to the 'OpenAltimetry' API <https://openaltimetry.earthdatacloud.nasa.gov/data/openapi/swagger-ui/index.html/> to download and process ATL03 (Global Geolocated Photon Data), ATL06 (Land Ice Height), ATL07 (Sea Ice Height), ATL08 (Land and Vegetation Height), ATL10 (Sea Ice Freeboard), ATL12 (Ocean Surface Height) and ATL13 (Inland Water Surface Height) ICESat-2 Altimeter Data. The user has the option to download the data by selecting a bounding box from a 1- or 5-degree grid globally utilizing a shiny application. The ICESat-2 mission collects altimetry data of the Earth's surface. The sole instrument on ICESat-2 is the Advanced Topographic Laser Altimeter System (ATLAS), which measures ice sheet elevation change and sea ice thickness, while also generating an estimate of global vegetation biomass. ICESat-2 continues the important observations of ice-sheet elevation change, sea-ice freeboard, and vegetation canopy height begun by ICESat in 2003.
This package provides a comprehensive framework for calculating unbiased distances in datasets containing mixed-type variables (numerical and categorical). The package implements a general formulation that ensures multivariate additivity and commensurability, meaning that variables contribute equally to the overall distance regardless of their type, scale, or distribution. Supports multiple distance measures including Gower's distance, Euclidean distance, Manhattan distance, and various categorical variable distances such as simple matching, Eskin, occurrence frequency, and association-based distances. Provides tools for variable scaling (standard deviation, range, robust range, and principal component scaling), and handles both independent and association-based category dissimilarities. Implements methods to correct for biases that typically arise from different variable types, distributions, and number of categories. Particularly useful for cluster analysis, data visualization, and other distance-based methods when working with mixed data. Methods based on van de Velden et al. (2024) <doi:10.48550/arXiv.2411.00429>
"Unbiased mixed variables distance".
This package provides a set of functions designed to calculate the standardised precipitation and standardised precipitation evapotranspiration indices using NASA POWER data as described in Blain et al. (2023) <doi:10.2139/ssrn.4442843>. These indices are calculated using a reference data source. The functions verify whether the index estimates meet the assumption of normality and how well NASA POWER estimates represent real-world data. Indices are calculated in a routine mode. Potential evapotranspiration amounts and the difference between rainfall and potential evapotranspiration are also calculated. The functions adopt a basic time scale that splits each month into four periods: days 1 to 7, days 8 to 14, days 15 to 21, and days 22 to 28, 29, 30, or 31, where TS=4 corresponds to a 1-month-length moving window (calculated 4 times per month) and TS=48 corresponds to a 12-month-length moving window (calculated 4 times per month).
The 'msPurity' R package was developed to: 1) assess the spectral quality of fragmentation spectra by evaluating the "precursor ion purity", 2) process fragmentation spectra, and 3) perform spectral matching. What is precursor ion purity? What we call "precursor ion purity" is a measure of the contribution of a selected precursor peak in an isolation window used for fragmentation. The simple calculation involves dividing the intensity of the selected precursor peak by the total intensity of the isolation window. When assessing MS/MS spectra, this calculation is done before and after the MS/MS scan of interest and the purity is interpolated at the recorded time of the MS/MS acquisition. Additionally, isotopic peaks can be removed, low-abundance peaks thought to have limited contribution to the resulting MS/MS spectra can be removed, and the isolation efficiency of the mass spectrometer can be used to normalise the intensities used for the calculation.
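The core purity calculation described above is simple enough to sketch directly: divide the precursor peak intensity by the summed intensity of the isolation window, before and after the MS/MS scan, and interpolate at the acquisition time. The peak values below are made up, and the package itself adds the isotope/low-abundance filtering and isolation-efficiency normalisation not shown here.

    # Toy isolation window (m/z, intensity) from the MS1 scan before the MS/MS event
    win_before <- data.frame(mz = c(300.08, 300.16, 300.21),
                             intensity = c(1.8e5, 9.0e5, 1.1e5))
    # ... and from the MS1 scan after it
    win_after  <- data.frame(mz = c(300.08, 300.16, 300.21),
                             intensity = c(2.2e5, 7.5e5, 1.6e5))
    precursor_mz <- 300.16
    purity_of <- function(win) {
      win$intensity[which.min(abs(win$mz - precursor_mz))] / sum(win$intensity)
    }
    p <- c(purity_of(win_before), purity_of(win_after))
    # Linear interpolation at the MS/MS acquisition time between the two MS1 scans
    approx(x = c(0, 1), y = p, xout = 0.4)$y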
The Satellite Application Facility on Climate Monitoring (CM SAF) is a ground segment of the European Organization for the Exploitation of Meteorological Satellites (EUMETSAT) and one of EUMETSAT's Satellite Application Facilities. The CM SAF contributes to the sustainable monitoring of the climate system by providing essential climate variables related to the energy and water cycle of the atmosphere (<https://www.cmsaf.eu>). It is a joint cooperation of eight National Meteorological and Hydrological Services. The 'cmsafops' R package provides a collection of R operators for the analysis and manipulation of CM SAF NetCDF-formatted data. Other CF-conformant NetCDF data with time, longitude and latitude dimensions should be applicable, but error-free application cannot be guaranteed. CM SAF climate data records are provided for free via <https://wui.cmsaf.eu/safira>. Detailed information and test data are provided on the CM SAF webpage (<http://www.cmsaf.eu/R_toolbox>).
This package implements fractional differencing with Autoregressive Moving Average models to analyse long-memory time series data. Traditional ARIMA models typically use integer values for differencing, which are suitable for time series with short memory or anti-persistent behaviour. In contrast, the Fractional ARIMA model allows fractional differencing, enabling it to effectively capture long-memory characteristics in time series data. The 'fracARMA' package is user-friendly and allows users to manually input the fractional differencing parameter, which can be obtained using various estimators such as the GPH estimator, the Sperio method, or the wavelet method, among others. Additionally, the package enables users to directly feed the time series data, AR order, MA order, fractional differencing parameter, and the proportion of training data as a split ratio, all in a single command. The package is based on Irshad et al. (2024, <doi:10.22271/maths.2024.v9.i6b.1906>).
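The general workflow the package wraps can be illustrated with the fracdiff package: estimate the fractional differencing parameter (for example with the GPH estimator) and fit a fractional ARMA model. This sketch uses fracdiff's own simulator and does not reproduce fracARMA's one-command interface or its train/test split.

    # Illustrative workflow with the fracdiff package (not the fracARMA interface)
    library(fracdiff)
    set.seed(1)
    x <- fracdiff.sim(500, ar = 0.3, ma = -0.2, d = 0.3)$series  # simulated long-memory series
    d_gph <- fdGPH(x)$d                    # GPH estimate of the fractional differencing parameter
    fit   <- fracdiff(x, nar = 1, nma = 1) # fractional ARMA(1, d, 1) fit
    summary(fit)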
Plots U-Pb data on Wetherill and Tera-Wasserburg concordia diagrams. Calculates concordia and discordia ages. Performs linear regression of measurements with correlated errors using the York, Titterington, Ludwig and Omnivariant Generalised Least-Squares ('OGLS') approaches. Generates Kernel Density Estimates (KDEs) and Cumulative Age Distributions (CADs). Produces Multidimensional Scaling (MDS) configurations and Shepard plots of multi-sample detrital datasets using the Kolmogorov-Smirnov distance as a dissimilarity measure. Calculates 40Ar/39Ar ages, isochrons, and age spectra. Computes weighted means accounting for overdispersion. Calculates U-Th-He (single grain and central) ages, logratio plots and ternary diagrams. Processes fission track data using the external detector method and LA-ICP-MS, calculates central ages and plots fission track and other data on radial (a.k.a. 'Galbraith') plots. Constructs total Pb-U, Pb-Pb, Th-Pb, K-Ca, Re-Os, Sm-Nd, Lu-Hf, Rb-Sr and 230Th-U isochrons as well as 230Th-U evolution plots.
This package provides functions to facilitate prior elicitation for Bayesian generalised linear models using independent conditional means priors. The package supports the elicitation of multivariate normal priors for generalised linear models. The approach can be applied to indirect elicitation for a generalised linear model that is linear in the parameters. The package is designed such that the facilitator executes functions within the R console during the elicitation session to provide graphical and numerical feedback at each design point. Various methodologies for eliciting fractiles (equivalently, percentiles or quantiles) are supported, including versions of the approach of Hosack et al. (2017) <doi:10.1016/j.ress.2017.06.011>. For example, experts may be asked to provide central credible intervals that correspond to a certain probability. Or experts may be allowed to vary the probability allocated to the central credible interval for each design point. Additionally, a median may or may not be elicited.
Large-scale matrix-variate data have been widely observed nowadays in various research areas such as finance, signal processing and medical imaging. Modelling matrix-valued data by the matrix-elliptical family not only provides a flexible way to handle heavy-tail properties and tail dependencies, but also maintains the intrinsic row and column structure of random matrices. We propose a new tool named matrix Kendall's tau which is efficient for analyzing random elliptical matrices. By applying this new type of Kendall's tau to the matrix elliptical factor model, we propose a Matrix-type Robust Two-Step (MRTS) method to estimate the loading and factor spaces. See the details in He et al. (2022) <arXiv:2207.09633>. In this package, we provide the algorithms for calculating the sample matrix Kendall's tau, the MRTS method and the Matrix Kendall's tau Eigenvalue-Ratio (MKER) method, which is used for determining the number of factors.
This package implements measures of tree similarity, including information-based generalized Robinson-Foulds distances (Phylogenetic Information Distance, Clustering Information Distance, Matching Split Information Distance; Smith 2020) <doi:10.1093/bioinformatics/btaa614>; Jaccard-Robinson-Foulds distances (Bocker et al. 2013) <doi:10.1007/978-3-642-40453-5_13>, including the Nye et al. (2006) metric <doi:10.1093/bioinformatics/bti720>; the Matching Split Distance (Bogdanowicz & Giaro 2012) <doi:10.1109/TCBB.2011.48>; Maximum Agreement Subtree distances; the Kendall-Colijn (2016) distance <doi:10.1093/molbev/msw124>, and the Nearest Neighbour Interchange (NNI) distance, approximated per Li et al. (1996) <doi:10.1007/3-540-61332-3_168>. Includes tools for visualizing mappings of tree space (Smith 2022) <doi:10.1093/sysbio/syab100>, for identifying islands of trees (Silva and Wilkinson 2021) <doi:10.1093/sysbio/syab015>, for calculating the median of sets of trees, and for computing the information content of trees and splits.
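A small example of the headline distance, assuming a couple of random trees generated with ape; ClusteringInfoDistance is one of the exported information-based generalized Robinson-Foulds measures, and the normalization option should be checked against the package documentation.

    # Clustering information distance between two random 20-tip trees
    library(TreeDist)
    library(ape)
    set.seed(1)
    tree1 <- rtree(20)
    tree2 <- rtree(20)
    ClusteringInfoDistance(tree1, tree2)                     # raw distance in bits
    ClusteringInfoDistance(tree1, tree2, normalize = TRUE)   # scaled to [0, 1]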