This is the first stand-alone R package for computing latent correlation that takes into account all variable types (continuous/binary/ordinal/zero-inflated), has an optimized memory footprint, and is computationally efficient, making latent correlation estimation almost as fast as rank-based correlation estimation. The estimation is based on latent Gaussian copula models. For continuous/binary types, see Fan, J., Liu, H., Ning, Y., and Zou, H. (2017). For the ternary type, see Quan X., Booth J.G. and Wells M.T. (2018) <doi:10.48550/arXiv.1809.06255>. For the truncated or zero-inflated type, see Yoon G., Carroll R.J. and Gaynanova I. (2020) <doi:10.1093/biomet/asaa007>. For the approximation method of computation, see Yoon G., Müller C.L. and Gaynanova I. (2021) <doi:10.1080/10618600.2021.1882468>. The latter method uses multi-linear interpolation originally implemented in the R package 'chebpol' <https://cran.r-project.org/package=chebpol>.
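The simplest case is two continuous variables. Here the Gaussian copula bridge of Fan et al. (2017) links Kendall's tau to the latent correlation r through tau = (2/pi) arcsin(r), so r = sin(pi * tau / 2). The Python sketch below covers only that case and computes Kendall's tau naively in O(n^2) time. It is not the package's API. The binary, ternary, and truncated bridges need numerical inversion, which is what the approximation method speeds up, and they are not shown here.

```python
import math
from itertools import combinations


def kendall_tau_a(x, y):
    """Naive O(n^2) Kendall's tau-a; tied pairs count as neither concordant nor discordant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    score = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            score += 1  # concordant pair
        elif sign < 0:
            score -= 1  # discordant pair
    return score / (n * (n - 1) / 2)


def latent_corr_continuous(x, y):
    """Continuous/continuous bridge under a latent Gaussian copula: r = sin(pi * tau / 2)."""
    return math.sin(math.pi * kendall_tau_a(x, y) / 2)


# Hypothetical data: y is a strictly increasing transform of x, so tau = 1 and r = 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(latent_corr_continuous(x, y))
```

Only ranks enter through tau. Any strictly increasing transformation of either variable therefore leaves the estimate unchanged, and the copula model relies on exactly this property.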
Several classes for moment-based models are defined, for moment conditions derived from either a single equation or a system of equations. The conditions can also be expressed as functions or formulas. Several methods are also offered to facilitate the development of different estimation techniques. The methods currently provided are the Generalized Method of Moments (Hansen 1982; <doi:10.2307/1912775>), for single equations and systems of equations, and Generalized Empirical Likelihood (Smith 1997; <doi:10.1111/j.0013-0133.1997.174.x>, Kitamura 1997; <doi:10.1214/aos/1069362388>, Newey and Smith 2004; <doi:10.1111/j.1468-0262.2004.00482.x>, and Anatolyev 2005 <doi:10.1111/j.1468-0262.2005.00601.x>). Tools to deal with weak and/or many instruments are being developed. These include K-class estimators (Limited Information Maximum Likelihood and Fuller) and the Anderson-Rubin test statistic, among others.
This package provides new tools for analyzing discrete trait data by integrating bio-ontologies and phylogenetics. It expands on the previous work of Tarasov et al. (2019) <doi:10.1093/isd/ixz009>. The PARAMO pipeline allows users to reconstruct ancestral phenomes, treating groups of morphological traits as a single complex character. The pipeline incorporates knowledge from ontologies during the amalgamation of individual character stochastic maps. Here we expand the current PARAMO functionality by adding new statistical methods for inferring evolutionary phenome dynamics using a non-homogeneous Poisson process (NHPP). The new functionalities include: (1) reconstruction of evolutionary rate shifts of phenomes across lineages and time; (2) reconstruction of morphospace dynamics through time; and (3) estimation of rates of phenome evolution at different levels of anatomical hierarchy (e.g., entire body or specific regions only). The package also includes user-friendly tools for visualizing evolutionary rates of different anatomical regions using vector images of the organisms of interest.
An easy-to-use tool for working with presence/absence tests on pooled or grouped samples. The primary application is estimating the prevalence of a marker in a population from the results of tests on pooled specimens. This sampling method is often employed in surveillance of rare conditions in humans or animals (e.g. molecular xenomonitoring). The package was initially conceived as an R-based alternative to the molecular xenomonitoring software PoolScreen <https://sites.uab.edu/statgenetics/software/>. However, it goes further: it allows prevalence estimates to be adjusted for hierarchical sampling frames and supports flexible mixed-effect regression analyses (McLure et al. Environmental Modelling and Software. <DOI:10.1016/j.envsoft.2021.105158>). The package is currently in its early stages; however, more features are planned or in the works, e.g. adjustments for imperfect test specificity/sensitivity, functions for helping with optimal experimental design, and functions for spatial modelling.
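Under simplifying assumptions (equal pool sizes, a perfect test, and independent specimens), the maximum-likelihood prevalence estimate has a closed form. With x positive pools out of n pools of size k, it is p = 1 - (1 - x/n)^(1/k). The hypothetical Python function below computes only this textbook estimator. It does not reproduce the package's hierarchical adjustments or mixed-effect models.

```python
def pooled_prevalence_mle(positive_pools, total_pools, pool_size):
    """Closed-form MLE of prevalence from pooled presence/absence tests.

    Assumes every pool has the same size, the test is perfectly sensitive
    and specific, and specimens are independent.
    """
    if total_pools <= 0 or pool_size <= 0:
        raise ValueError("total_pools and pool_size must be positive")
    if not 0 <= positive_pools <= total_pools:
        raise ValueError("positive_pools must lie in [0, total_pools]")
    # A pool is negative with probability (1 - p)^k; x/n estimates P(pool positive).
    return 1.0 - (1.0 - positive_pools / total_pools) ** (1.0 / pool_size)


# Hypothetical survey: 3 positive pools out of 50 pools of 25 insects each.
print(pooled_prevalence_mle(3, 50, 25))
```

With a pool size of 1, the estimator reduces to the ordinary proportion x/n, which is a quick sanity check.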
Generative modeling for protein engineering is key to solving fundamental problems in synthetic biology, medicine, and material science. Machine learning has enabled us to generate useful protein sequences on a variety of scales. Generative models are machine learning methods that seek to model the distribution underlying the data, allowing for the generation of novel samples with properties similar to those of the data on which the model was trained. Generative models of proteins can learn biologically meaningful representations helpful for a variety of downstream tasks. Furthermore, they can learn to generate protein sequences that have not been observed before and to assign higher probability to protein sequences that satisfy desired criteria. This package provides common deep generative models for protein sequences, such as the variational autoencoder (VAE), generative adversarial networks (GAN), and autoregressive models. In the VAE and GAN, Word2vec is used for embedding. For the autoregressive model, a transformer encoder is applied to the protein sequences.
Allows clinicians and researchers to compute daily dose (and subsequently days' supply) for prescription refills using the following methods: fixed window, fixed tablet, defined daily dose (DDD), and Random Effects Warfarin Days Supply (REWarDS). Daily dose is the computed dose that the patient takes every day. For medications with fixed dosing (e.g. direct oral anticoagulants), this is known and does not need to be estimated. For medications with varying dose such as warfarin, however, the daily dose should be assumed or estimated to allow measurement of drug exposure. Days' supply is the number of days that patients' supply of medication will last after each prescription fill. Estimating days' supply is necessary to calculate drug exposure. The package computes days' supply and daily dose at both the prescription and patient levels. Results at the prescription level are denoted with '-Rx-' and those at patient level are denoted with '-Pt-'.
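The days' supply arithmetic can be illustrated as follows. This minimal Python sketch assumes doses in mg and uses hypothetical function names. The refill-interval estimator of daily dose is a simple illustration only. It is not the package's fixed window, fixed tablet, DDD, or REWarDS method.

```python
def days_supply(tablets_dispensed, tablet_strength_mg, daily_dose_mg):
    """Number of days the dispensed quantity lasts at a given daily dose."""
    if daily_dose_mg <= 0:
        raise ValueError("daily dose must be positive")
    return tablets_dispensed * tablet_strength_mg / daily_dose_mg


def daily_dose_from_refill_interval(tablets_dispensed, tablet_strength_mg, days_to_next_fill):
    """Illustrative estimator: spread the dispensed amount evenly until the next fill."""
    if days_to_next_fill <= 0:
        raise ValueError("days to next fill must be positive")
    return tablets_dispensed * tablet_strength_mg / days_to_next_fill


# Hypothetical fill: 30 tablets of 5 mg warfarin, patient taking 7.5 mg per day.
print(days_supply(30, 5, 7.5))  # 20.0
```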
It fits a univariate left-, right-, or interval-censored linear regression model with autoregressive errors, considering the normal or the Student-t distribution for the innovations. It provides estimates and standard errors of the parameters, predicts future observations, and supports missing values in the dependent variable. References used for this package: Schumacher, F. L., Lachos, V. H., & Dey, D. K. (2017). Censored regression models with autoregressive errors: A likelihood-based perspective. Canadian Journal of Statistics, 45(4), 375-392 <doi:10.1002/cjs.11338>. Schumacher, F. L., Lachos, V. H., Vilca-Labra, F. E., & Castro, L. M. (2018). Influence diagnostics for censored regression models with autoregressive errors. Australian & New Zealand Journal of Statistics, 60(2), 209-229 <doi:10.1111/anzs.12229>. Valeriano, K. A., Schumacher, F. L., Galarza, C. E., & Matos, L. A. (2024). Censored autoregressive regression models with Student-t innovations. Canadian Journal of Statistics, 52(3), 804-828 <doi:10.1002/cjs.11804>.
Processes noble gas mass spectrometer data to determine the isotopic composition of argon (consisting of Ar36, Ar37, Ar38, Ar39 and Ar40) released from neutron-irradiated potassium-bearing minerals. It then uses these compositions to calculate precise and accurate geochronological ages for multiple samples, as well as the covariances between them. Error propagation is done in matrix form, jointly treating all samples and all isotopes at every step of the data reduction process. Includes methods for regression of the time-resolved mass spectrometer signals to t=0 ('time zero') for both single- and multi-collector instruments, blank correction, mass fractionation correction, detector intercalibration, decay corrections, interference corrections, interpolation of the irradiation parameter between neutron fluence monitors, and (weighted mean) age calculation. All operations are performed on the logs of the ratios between the different argon isotopes so as to properly treat them as 'compositional data', sensu Aitchison [1986, The Statistics of Compositional Data, Chapman and Hall].
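Working with log-ratios means expressing each isotope abundance relative to a common reference isotope and then taking the log. This maps strictly positive compositions onto unconstrained real values. The Python sketch below uses Ar40 as the reference and hypothetical signal values. Both the choice of reference and the function names are assumptions, not the package's internals.

```python
import math


def log_ratios(abundances, reference="Ar40"):
    """Additive log-ratio transform: ln(Ar_i / Ar_ref) for every non-reference isotope."""
    if any(v <= 0 for v in abundances.values()):
        raise ValueError("log-ratios require strictly positive abundances")
    ref = abundances[reference]
    return {iso: math.log(v / ref) for iso, v in abundances.items() if iso != reference}


def from_log_ratios(ratios, reference_value, reference="Ar40"):
    """Inverse transform, given the abundance of the reference isotope."""
    out = {iso: reference_value * math.exp(lr) for iso, lr in ratios.items()}
    out[reference] = reference_value
    return out


# Hypothetical blank-corrected signals (arbitrary units).
signals = {"Ar36": 0.01, "Ar39": 2.0, "Ar40": 100.0}
print(log_ratios(signals))
```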
Genome-wide association studies (GWAS) are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. The R package SNPRelate provides a binary format for single-nucleotide polymorphism (SNP) data in GWAS utilizing CoreArray Genomic Data Structure (GDS) data files. The GDS format offers efficient operations specifically designed for two-bit integers, since a SNP genotype can be stored in only two bits. SNPRelate is also designed to accelerate two key computations on SNP data using parallel computing on multi-core symmetric multiprocessing computer architectures: Principal Component Analysis (PCA) and relatedness analysis using Identity-By-Descent measures. The SNP GDS format is also used by the GWASTools package with the support of S4 classes and generic functions. The extended GDS format is implemented in the SeqArray package to support the storage of single nucleotide variations (SNVs), insertion/deletion polymorphisms (indels) and structural variation calls in whole-genome and whole-exome variant data.
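The two-bit idea can be sketched as packing four genotype codes into each byte. The Python code below assumes codes 0, 1, 2 for allele dosage and 3 for a missing call, and places the first genotype in the lowest-order bits. It illustrates the storage principle only and is not the actual GDS on-disk layout.

```python
def pack_genotypes(genotypes):
    """Pack genotype codes 0..3 four per byte (assumed: 3 = missing, low bits first)."""
    out = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, 3):
            raise ValueError("genotype codes must be in 0..3")
        out[i // 4] |= g << (2 * (i % 4))  # slot i % 4 within byte i // 4
    return bytes(out)


def unpack_genotypes(packed, n):
    """Recover the first n two-bit codes from the packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


calls = [0, 1, 2, 3, 2, 1]
packed = pack_genotypes(calls)
print(len(packed), unpack_genotypes(packed, len(calls)))
```

Six genotypes fit in two bytes here, whereas one byte per call would need six. This fourfold saving is what makes two-bit storage attractive for genome-wide data.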
An important environmental impact on running water ecosystems is caused by hydropeaking: the discontinuous release of turbine water in response to peaks in energy demand. An event-based algorithm is implemented to detect flow fluctuations, distinguishing increase events (IC) and decrease events (DC). For each event, a set of parameters related to the fluctuation intensity is calculated. The framework is introduced in Greimel et al. (2016) "A method to detect and characterize sub-daily flow fluctuations" <doi:10.1002/hyp.10773>. It can be used to identify different fluctuation types according to their potential source, e.g., sub-daily flow fluctuations caused by hydropeaking, rainfall, or snow and glacier melt. This is a companion to the package 'hydroroute', which is used to detect and follow hydropower plant-specific hydropeaking waves at the sub-catchment scale and to describe how hydropeaking flow parameters change along the longitudinal flow path, as proposed and validated in Greimel et al. (2022).
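A heavily simplified version of event detection splits a flow series into runs of consecutive increases (IC) and decreases (DC), then summarises each run. The Python sketch below has no noise thresholds or minimum-change criteria, so it is not the algorithm of Greimel et al. (2016). The parameters it computes (amplitude, duration in time steps, maximum rate of change) are illustrative choices.

```python
def detect_events(flow):
    """Split a flow series into runs of strictly increasing (IC) or decreasing (DC) steps."""
    events = []
    current = None  # (event type, start index) of the run in progress
    for i in range(1, len(flow)):
        d = flow[i] - flow[i - 1]
        kind = "IC" if d > 0 else "DC" if d < 0 else None
        if current is not None and kind != current[0]:
            # A flat step or a change of direction closes the running event.
            events.append(_summarise(flow, current[0], current[1], i - 1))
            current = None
        if kind is not None and current is None:
            current = (kind, i - 1)
    if current is not None:
        events.append(_summarise(flow, current[0], current[1], len(flow) - 1))
    return events


def _summarise(flow, kind, start, end):
    """Illustrative intensity parameters for one event spanning flow[start..end]."""
    steps = [flow[j + 1] - flow[j] for j in range(start, end)]
    return {
        "type": kind,
        "start": start,
        "end": end,
        "amplitude": abs(flow[end] - flow[start]),
        "duration": end - start,
        "max_rate": max(abs(s) for s in steps),
    }


# Hypothetical discharge series (m^3/s) at a fixed time step.
for event in detect_events([10, 10, 20, 40, 40, 30, 10, 10]):
    print(event)
```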
An index is created using a mathematical model that transforms multi-dimensional variables into a single value. These variables are often correlated, and while PCA-based indices can address the issue of multicollinearity, they typically do not account for survey weights, which can lead to inaccurate rankings of survey units such as households, districts, or states. To resolve this, this package facilitates the development of a principal component analysis-based composite index by incorporating survey weights for each sample observation. This ensures the generation of a survey-weighted principal component-based normalized composite index. Additionally, the package provides a normalized principal component-based composite index and ranks the sample observations based on the values of the composite indices. For method details, see Skinner, C. J., Holmes, D. J. and Smith, T. M. F. (1986) <DOI:10.1080/01621459.1986.10478336> and Singh, D., Basak, P., Kumar, R. and Ahmad, T. (2023) <DOI:10.3389/fams.2023.1274530>.
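The survey-weighted idea can be sketched by replacing ordinary means, variances, and covariances with their weighted counterparts before extracting the leading principal component. The Python code below is a simplified illustration under stated assumptions. It uses only the first component (found by power iteration), standardizes with weighted population moments, rescales scores to 0-100, and assumes no constant columns. The package's exact construction may combine several components.

```python
import math


def survey_weighted_pc_index(rows, weights, iters=200):
    """First-PC composite index using survey weights (simplified sketch)."""
    n, p = len(rows), len(rows[0])
    wsum = float(sum(weights))
    # Weighted means and standard deviations of each indicator.
    means = [sum(w * r[j] for r, w in zip(rows, weights)) / wsum for j in range(p)]
    sds = [math.sqrt(sum(w * (r[j] - means[j]) ** 2 for r, w in zip(rows, weights)) / wsum)
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    # Weighted correlation matrix of the standardized indicators.
    corr = [[sum(w * zr[a] * zr[b] for zr, w in zip(z, weights)) / wsum for b in range(p)]
            for a in range(p)]
    # Power iteration for the leading eigenvector (the first principal component).
    v = [1.0] * p
    for _ in range(iters):
        u = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    if sum(v) < 0:  # fix the arbitrary sign so that loadings are mostly positive
        v = [-x for x in v]
    scores = [sum(v[j] * zr[j] for j in range(p)) for zr in z]
    lo, hi = min(scores), max(scores)
    index = [100.0 * (s - lo) / (hi - lo) for s in scores]  # normalized to 0-100
    order = sorted(range(n), key=lambda i: -index[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank  # rank 1 = highest index value
    return index, ranks


# Hypothetical units with two indicators and survey weights.
print(survey_weighted_pc_index([[1, 2], [2, 4], [3, 6], [4, 8]], [1, 2, 1, 3]))
```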
This package performs non-parametric tests of parametric specifications. Five tests are available. Specific bandwidth and kernel methods can be chosen, along with many other options. Parallel computing can be used to compute bootstrap p-values quickly. Methods implemented in the package are H.J. Bierens (1982) <doi:10.1016/0304-4076(82)90105-1>, J.C. Escanciano (2006) <doi:10.1017/S0266466606060506>, P.L. Gozalo (1997) <doi:10.1016/S0304-4076(97)86571-2>, P. Lavergne and V. Patilea (2008) <doi:10.1016/j.jeconom.2007.08.014>, P. Lavergne and V. Patilea (2012) <doi:10.1198/jbes.2011.07152>, J.H. Stock and M.W. Watson (2006) <doi:10.1111/j.1538-4616.2007.00014.x>, C.F.J. Wu (1986) <doi:10.1214/aos/1176350142>, J. Yin, Z. Geng, R. Li, H. Wang (2010) <https://www.jstor.org/stable/24309002> and J.X. Zheng (1996) <doi:10.1016/0304-4076(95)01760-7>.
This package provides a suite of machine learning algorithms written in C++ with an R interface, containing several learning techniques for classification and regression. Predictive models include, e.g., classification and regression trees with optional constructive induction and models in the leaves, random forests, kNN, naive Bayes, and locally weighted regression. All predictions obtained with these models can be explained and visualized with the ExplainPrediction package. This package is especially strong in feature evaluation, where it contains several variants of the Relief algorithm and many impurity-based attribute evaluation functions, e.g., Gini, information gain, MDL, and DKM. These methods can be used for feature selection or discretization of numeric attributes. The OrdEval algorithm and its visualization are used for evaluation of data sets with ordinal features and class, enabling analysis according to the Kano model of customer satisfaction. Several algorithms support parallel multithreaded execution via OpenMP. The top-level documentation is reachable through ?CORElearn.
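The basic two-class Relief update gives some intuition for the Relief family. It rewards features that differ between an instance and its nearest miss (other class) and penalizes features that differ between the instance and its nearest hit (same class). The Python sketch below iterates over every instance instead of sampling at random and uses range-normalized Manhattan distance. It assumes each class has at least two instances. It is not CORElearn's implementation or any of its ReliefF variants.

```python
def relief(X, y):
    """Basic two-class Relief weights; higher means more relevant (sketch)."""
    n, p = len(X), len(X[0])
    lo = [min(r[j] for r in X) for j in range(p)]
    hi = [max(r[j] for r in X) for j in range(p)]
    span = [(h - l) or 1.0 for l, h in zip(lo, hi)]  # avoid division by zero

    def diff(j, a, b):
        # Range-normalized difference of feature j between instances a and b.
        return abs(X[a][j] - X[b][j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(p))

    w = [0.0] * p
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        h = min(hits, key=lambda k: dist(i, k))    # nearest hit
        m = min(misses, key=lambda k: dist(i, k))  # nearest miss
        for j in range(p):
            w[j] += (diff(j, i, m) - diff(j, i, h)) / n
    return w


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
print(relief(X, [0, 0, 1, 1]))
```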
High-performance implementation of the Modified Hodrick-Prescott (HP) Filter for decomposing macroeconomic time series into trend and cyclical components. Based on the methodology of Choudhary, Hanif and Iqbal (2014) <doi:10.1080/00036846.2014.894631> "On smoothing macroeconomic time series using the modified HP filter", which uses generalized cross-validation (GCV) to automatically select the optimal smoothing parameter lambda, following McDermott (1997) "An automatic method for choosing the smoothing parameter in the HP filter" (as described in Coe and McDermott (1997) <doi:10.2307/3867497>). Unlike the standard HP filter that uses fixed lambda values (1600 for quarterly, 100 for annual data), this package estimates series-specific lambda values that minimize the GCV criterion. Implements efficient C++ routines via RcppArmadillo for fast computation, supports batch processing of multiple series, and provides comprehensive visualization tools using 'ggplot2'. Particularly useful for cross-country macroeconomic comparisons, business cycle analysis, and when the appropriate smoothing parameter is uncertain.
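The HP trend tau solves (I + lambda D'D) tau = y, where D is the second-difference operator, and a GCV criterion selects lambda. The Python sketch below assumes the standard form GCV(lambda) = (RSS/T) / (1 - tr(S)/T)^2 with smoother S = (I + lambda D'D)^(-1). This may differ in detail from the criterion the package implements. It uses dense Gaussian elimination for clarity (roughly O(T^4) for the trace), whereas the package relies on C++ via RcppArmadillo. All function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (dense)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy; A is left untouched
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def hp_matrix(T, lam):
    """Build I + lam * D'D, with D the (T-2) x T second-difference operator."""
    A = [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]
    for r in range(T - 2):
        d = {r: 1.0, r + 1: -2.0, r + 2: 1.0}  # nonzero entries of row r of D
        for i, di in d.items():
            for j, dj in d.items():
                A[i][j] += lam * di * dj
    return A


def hp_filter(y, lam):
    """Return (trend, cycle) for smoothing parameter lam; needs at least 3 points."""
    if len(y) < 3:
        raise ValueError("need at least 3 observations")
    trend = solve(hp_matrix(len(y), lam), y)
    return trend, [yi - ti for yi, ti in zip(y, trend)]


def gcv_score(y, lam):
    """Standard GCV: (RSS / T) / (1 - tr(S) / T)^2 with S = (I + lam D'D)^(-1)."""
    T = len(y)
    A = hp_matrix(T, lam)
    trend = solve(A, y)
    rss = sum((yi - ti) ** 2 for yi, ti in zip(y, trend))
    # Diagonal of A^(-1), one unit-vector solve per column.
    tr = sum(solve(A, [1.0 if k == j else 0.0 for k in range(T)])[j] for j in range(T))
    return (rss / T) / (1.0 - tr / T) ** 2


def select_lambda(y, grid):
    """Pick the grid value of lambda that minimizes the GCV score."""
    return min(grid, key=lambda lam: gcv_score(y, lam))
```

A linear trend has zero second differences, so it passes through the filter unchanged for any lambda. This makes a convenient sanity check.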
This package provides tools to simulate and analyse datasets of social interactions between individuals using hierarchical Bayesian models implemented in Stan. Model fitting is performed via the rstan package. Users can generate realistic interaction data where individual phenotypes influence and respond to those of their partners, with control over sampling design parameters such as the number of individuals, partners, and repeated dyads. The simulation framework allows flexible control over variation and correlation in mean trait values, social responsiveness, and social impact, making it suitable for research on interacting phenotypes and on direct and indirect genetic effects ('DGEs and IGEs'). The package also includes functions to fit and compare alternative models of social effects, including impact-responsiveness, variance-partitioning, and trait-based models, and to summarise model performance in terms of bias and dispersion. For a more detailed description of the available models and impact-responsiveness, see the accompanying article Wijnhorst et al. (2026) <doi:10.1093/jeb/voag013>.
This package provides tools for shoreline dating of coastal Stone Age sites. The implemented method was developed in Roalkvam (2023) <doi:10.1016/j.quascirev.2022.107880> for the Norwegian Skagerrak coast. This region also forms the core area of application for the package, although the method can be extended to other areas. Shoreline dating is based on the present-day elevation of a site, a reconstruction of past relative sea-level change, and empirically derived estimates of the likely elevation of the sites above the contemporaneous sea-level when they were in use. The geographical and temporal coverage of the method thus depends on the availability of local geological reconstructions of shoreline displacement and on the degree to which the settlements to be dated were located on or close to the shoreline when in use. Methods for numerical treatment and visualisation of the dates are provided, along with basic tools for visualising and evaluating the location of sites.
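The core inversion step can be sketched in two parts. First, subtract an assumed elevation offset from the site's present-day elevation. Second, read off when relative sea level stood at that height on a shoreline displacement curve. The Python code below uses linear interpolation, a single fixed offset, and a hypothetical curve. The actual method propagates empirically derived offset distributions and curve uncertainty, which this sketch omits.

```python
def shoreline_date(site_elevation, offset, curve):
    """Years BP at which relative sea level equalled site_elevation - offset.

    curve: (years_bp, sea_level_m) pairs, assumed monotone in sea level.
    """
    target = site_elevation - offset  # sea level when the site was in use
    pts = sorted(curve, key=lambda pt: pt[0])  # youngest to oldest
    for (t0, s0), (t1, s1) in zip(pts, pts[1:]):
        if min(s0, s1) <= target <= max(s0, s1):
            if s1 == s0:
                return float(t0)
            # Linear interpolation between the two bracketing curve points.
            return t0 + (target - s0) * (t1 - t0) / (s1 - s0)
    raise ValueError("target elevation outside the range of the displacement curve")


# Hypothetical displacement curve: relative sea level falling towards the present.
curve = [(0, 0.0), (2000, 10.0), (4000, 20.0), (6000, 30.0)]
print(shoreline_date(25.0, 5.0, curve))
```

Evaluating the function at the lower and upper ends of a plausible offset range gives a crude date interval. The package instead works with full probability distributions.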
We present a rank-based Mercer kernel to compute a pair-wise similarity metric corresponding to an informative representation of data. We tailor the development of the kernel to encode our prior knowledge about the data distribution over a probability space. The philosophical concept behind our construction is that objects whose feature values fall on the extremes of that feature's probability mass distribution are more similar to each other than objects whose feature values lie closer to the mean. Semblance emphasizes features whose values lie far away from the mean of their probability distribution. The kernel relies on properties empirically determined from the data and does not assume an underlying distribution. The use of feature ranks on a probability space ensures that Semblance is computationally efficient, robust to outliers, and statistically stable, making it a widely applicable algorithm for pattern analysis. The output from the kernel is a square, symmetric matrix that gives proximity values between pairs of observations.
Estimation and inference for multiple kink quantile regression for longitudinal and i.i.d. data. A bootstrap restarting iterative segmented quantile algorithm is proposed to estimate the multiple kink quantile regression model conditional on a given number of change points. The number of kinks is also allowed to be unknown. In such a case, the backward elimination algorithm and the bootstrap restarting iterative segmented quantile algorithm are combined to select the number of change points based on a quantile BIC. For longitudinal data, we also develop the GEE estimator to incorporate within-subject correlations. A score-type test statistic is also developed for testing the existence of a kink effect. The package is based on the papers "Wei Zhong, Chuang Wan and Wenyang Zhang (2022). Estimation and inference for multikink quantile regression, JBES" and "Chuang Wan, Wei Zhong, Wenyang Zhang and Changliang Zou (2022). Multi-kink quantile regression for longitudinal data with application to progesterone data analysis, Biometrics".
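A multi-kink quantile regression model with K kinks is commonly written as below, together with the check loss rho_tau that estimation minimizes. The notation (kink variable X, additional covariates Z, kink locations t_k) is assumed here for illustration and may differ from that of the cited papers.

```latex
Q_{\tau}(Y \mid X, \mathbf{Z}) = \alpha_0 + \alpha_1 X
  + \sum_{k=1}^{K} \beta_k \,(X - t_k)\, I(X > t_k)
  + \boldsymbol{\gamma}^{\top}\mathbf{Z},
\qquad
\rho_{\tau}(u) = u\,\{\tau - I(u < 0)\}.
```

The slope in X changes by beta_k at each kink t_k while the fitted quantile stays continuous. Given the kink locations, the remaining coefficients minimize the sum over observations of rho_tau applied to the residuals Y_i - Q_tau(Y_i | X_i, Z_i).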
This package provides functions to calculate the minimum and maximum possible values of Cronbach's alpha when item-level missing data are present. Cronbach's alpha (Cronbach, 1951 <doi:10.1007/BF02310555>) is one of the most widely used measures of internal consistency in the social, behavioral, and medical sciences (Bland & Altman, 1997 <doi:10.1136/bmj.314.7080.572>; Tavakol & Dennick, 2011 <doi:10.5116/ijme.4dfb.8dfd>). However, conventional implementations assume complete data, and listwise deletion is often applied when missingness occurs, which can lead to biased or overly optimistic reliability estimates (Enders, 2003 <doi:10.1037/1082-989X.8.3.322>). This package implements computational strategies including enumeration, Monte Carlo sampling, and optimization algorithms (e.g., Genetic Algorithm, Differential Evolution, Sequential Least Squares Programming) to obtain sharp lower and upper bounds of Cronbach's alpha under arbitrary missing data patterns. The approach is motivated by Manski's partial identification framework and by pessimistic bounding ideas from the optimization literature.
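The enumeration strategy can be sketched for small problems. Fill every missing cell with each admissible response value, recompute alpha, and keep the extremes. The Python code below assumes a discrete response scale and uses population variances (the n - 1 factors cancel in alpha). Its cost grows as |scale|^m for m missing cells, which is why Monte Carlo and optimization approaches are needed in practice. Function names are hypothetical.

```python
from itertools import product
from statistics import pvariance


def cronbach_alpha(data):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(data[0])
    item_vars = [pvariance([row[j] for row in data]) for j in range(k)]
    total_var = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_bounds_enumeration(data, scale):
    """Min and max alpha over all fillings of missing cells (None) with scale values."""
    missing = [(i, j) for i, row in enumerate(data) for j, v in enumerate(row) if v is None]
    lo, hi = float("inf"), float("-inf")
    for fill in product(scale, repeat=len(missing)):
        filled = [list(row) for row in data]
        for (i, j), v in zip(missing, fill):
            filled[i][j] = v
        a = cronbach_alpha(filled)
        lo, hi = min(lo, a), max(hi, a)
    return lo, hi


# Hypothetical 3-respondent, 2-item data on a 1-3 scale with one missing response.
print(alpha_bounds_enumeration([[1, 2], [2, None], [3, 3]], scale=[1, 2, 3]))
```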
This package provides functions for model fitting and selection of generalised hypergeometric ensembles of random graphs (gHypEG). To learn how to use it, see the vignettes for a quick tutorial. Please reference its use as Casiraghi, G., Nanumyan, V. (2019) <doi:10.5281/zenodo.2555300>, together with the relevant references from those listed below. The package is based on the research developed at the Chair of Systems Design, ETH Zurich. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2016) <doi:10.48550/arXiv.1607.02441>. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2017) <doi:10.1007/978-3-319-67256-4_11>. Casiraghi, G. (2017) <doi:10.48550/arXiv.1702.02048>. Brandenberger, L., Casiraghi, G., Nanumyan, V., Schweitzer, F. (2019) <doi:10.1145/3341161.3342926>. Casiraghi, G. (2019) <doi:10.1007/s41109-019-0241-1>. Casiraghi, G., Nanumyan, V. (2021) <doi:10.1038/s41598-021-92519-y>. Casiraghi, G. (2021) <doi:10.1088/2632-072X/ac0493>.
Carry out Bayesian estimation and forecasting for a variety of stochastic mortality models using vague prior distributions. Models supported include numerous well-established approaches introduced in the actuarial and demographic literature, such as the Lee-Carter (1992) <doi:10.1080/01621459.1992.10475265>, the Cairns-Blake-Dowd (2009) <doi:10.1080/10920277.2009.10597538>, the Li-Lee (2005) <doi:10.1353/dem.2005.0021>, and the Plat (2009) <doi:10.1016/j.insmatheco.2009.08.006> models. The package is designed to analyse stratified mortality data structured as a 3-dimensional array of dimensions p × A × T (strata × age × year). Stratification can represent factors such as cause of death, country, deprivation level, sex, geographic region, insurance product, marital status, socioeconomic group, or smoking behavior. While the primary focus is on analysing stratified data (p > 1), the package can also handle mortality data that are not stratified (p = 1). Model selection via the Deviance Information Criterion (DIC) is supported.
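For reference, the classical single-population forms of two of the cited models are shown below, for age x and year t. Here m is the central death rate and q is the one-year death probability. These are the standard literature formulations. The package's stratified parameterizations (which add an index over the p strata) and its identifiability constraints may differ.

```latex
\begin{aligned}
\text{Lee-Carter:}\quad & \log m_{x,t} = \alpha_x + \beta_x \kappa_t + \varepsilon_{x,t},
  \qquad \sum_x \beta_x = 1,\quad \sum_t \kappa_t = 0, \\
\text{Cairns-Blake-Dowd:}\quad & \operatorname{logit} q_{x,t} = \kappa_t^{(1)} + (x - \bar{x})\,\kappa_t^{(2)}.
\end{aligned}
```

In both models the period indices kappa_t carry the time dynamics. Forecasting therefore amounts to projecting these indices forward, for example with a random walk with drift.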
This package provides functions to assess complex heterogeneity in the strength of a surrogate marker with respect to multiple baseline covariates, in either a randomized treatment setting or observational setting. For a randomized treatment setting, the functions assess and test for heterogeneity using both a parametric model and a semiparametric two-step model. More details for the randomized setting are available in: Knowlton, R., Tian, L., & Parast, L. (2025). "A General Framework to Assess Complex Heterogeneity in the Strength of a Surrogate Marker," Statistics in Medicine, 44(5), e70001 <doi:10.1002/sim.70001>. For an observational setting, functions in this package assess complex heterogeneity in the strength of a surrogate marker using meta-learners, with options for different base learners. More details for the observational setting will be available in the future in: Knowlton, R., Parast, L. (2025) "Assessing Surrogate Heterogeneity in Real World Data Using Meta-Learners." A tutorial for this package can be found at <https://www.laylaparast.com/cohetsurr>.
Perform variable selection for the spatial Poisson regression model under the adaptive elastic net penalty. The input is spatial count data with covariates. We use a spatial Poisson regression model to link the spatial counts and covariates. For maximization of the likelihood under the adaptive elastic net penalty, we implement the penalized quasi-likelihood (PQL) and the approximate penalized loglikelihood (APL) methods. The proposed methods can automatically select important covariates while adjusting for possible spatial correlations among the responses. More details are available in Xie et al. (2018, <arXiv:1809.06418>). The package also contains the Lyme disease dataset, which consists of Lyme disease case data from 2006 to 2011, along with demographic and land cover data, for Virginia. The Lyme disease case data were collected by the Virginia Department of Health. The demographic data (e.g., population density, median income, and average age) are from the 2010 census. Land cover data were obtained from the Multi-Resolution Land Characteristics Consortium for 2006.
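A generic form of the objective helps clarify the method. Suppose a Poisson model with a spatially correlated random effect S_i. An adaptive elastic net estimate then maximizes a penalized log-likelihood, with adaptive weights taken from an initial estimate of beta. The notation and scaling below are illustrative. The PQL and APL methods replace the log-likelihood with their respective approximations, and the exact constants used by the package may differ.

```latex
\begin{aligned}
& y_i \mid S_i \sim \operatorname{Poisson}(\mu_i), \qquad
  \log \mu_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + S_i, \\
& \hat{\boldsymbol{\beta}} = \arg\max_{\boldsymbol{\beta}}
  \Big\{ \ell(\boldsymbol{\beta}) - \lambda_1 \sum_{j=1}^{p} \hat{w}_j \lvert \beta_j \rvert
  - \lambda_2 \sum_{j=1}^{p} \beta_j^2 \Big\},
  \qquad \hat{w}_j = \lvert \tilde{\beta}_j \rvert^{-\gamma}.
\end{aligned}
```

Large initial coefficients receive small L1 weights and are shrunk little, while near-zero ones are penalized heavily and tend to be set exactly to zero. The L2 term stabilizes selection among correlated covariates.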
Extends the functionality of the VGAM package with additional functions and datasets. At present, VGAMextra comprises new family functions (ffs) to estimate several time series models by maximum likelihood using Fisher scoring, unlike popular packages on CRAN that rely on optim(). These include ARMA-GARCH-like models, the Order-(p, d, q) ARIMAX model (non-seasonal), the Order-(p) VAR model, error correction models for cointegrated time series, and ARMA structures with Student-t errors. For independent data, new ffs to estimate the inverse-Weibull, the inverse-gamma, the generalized beta of the second kind, and the general multivariate normal distributions are available. In addition, VGAMextra incorporates new VGLM-links for the mean function and the quantile function (as an alternative to ordinary quantile modelling) of several 1-parameter distributions, which are compatible with the class of VGLM/VGAM family functions. Currently, only fixed-effects models are implemented. All functions are subject to change; see the NEWS for further details on the latest changes.