Allows clinicians and researchers to compute daily dose (and subsequently days' supply) for prescription refills using the following methods: fixed window, fixed tablet, defined daily dose (DDD), and Random Effects Warfarin Days Supply (REWarDS). Daily dose is the computed dose that the patient takes every day. For medications with fixed dosing (e.g., direct oral anticoagulants) this is known and does not need to be estimated. For medications with varying dose, such as warfarin, the daily dose must instead be assumed or estimated to allow measurement of drug exposure. Days' supply is the number of days that a patient's supply of medication will last after each prescription fill. Estimating days' supply is necessary to calculate drug exposure. The package computes days' supply and daily dose at both the prescription and patient levels. Results at the prescription level are denoted with '-Rx-' and those at the patient level with '-Pt-'.
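As a rough illustration of the two simplest approaches (fixed tablet and fixed window), here is a minimal base-R sketch using a hypothetical dispensing record; the column names, tablet strength, and the 90-day window are illustrative assumptions and not the package's interface:

    # Hypothetical dispensing records for one patient: fill dates and quantities dispensed
    fills <- data.frame(
      date     = as.Date(c("2020-01-01", "2020-03-15", "2020-06-01")),
      quantity = c(100, 100, 120)    # number of tablets dispensed at each fill
    )
    strength <- 5                    # mg per tablet (assumed)

    # Fixed tablet method: assume a fixed number of tablets taken per day (here 1),
    # so days' supply is simply the quantity dispensed.
    fixed_tablet_days_supply <- fills$quantity / 1

    # Fixed window method: assume every fill lasts a fixed window (e.g., 90 days),
    # so the daily dose is the dispensed amount spread over that window.
    window <- 90
    fixed_window_daily_dose <- fills$quantity * strength / window

    fixed_tablet_days_supply
    fixed_window_daily_dose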
This package provides functions for model fitting and selection of generalised hypergeometric ensembles of random graphs (gHypEG). To learn how to use it, check the vignettes for a quick tutorial. Please reference its use as Casiraghi, G., Nanumyan, V. (2019) <doi:10.5281/zenodo.2555300>, together with the relevant references from those listed below. The package is based on the research developed at the Chair of Systems Design, ETH Zurich. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2016) <arXiv:1607.02441>. Casiraghi, G., Nanumyan, V., Scholtes, I., Schweitzer, F. (2017) <doi:10.1007/978-3-319-67256-4_11>. Casiraghi, G. (2017) <arXiv:1702.02048>. Brandenberger, L., Casiraghi, G., Nanumyan, V., Schweitzer, F. (2019) <doi:10.1145/3341161.3342926>. Casiraghi, G. (2019) <doi:10.1007/s41109-019-0241-1>. Casiraghi, G., Nanumyan, V. (2021) <doi:10.1038/s41598-021-92519-y>. Casiraghi, G. (2021) <doi:10.1088/2632-072X/ac0493>.
Processes noble gas mass spectrometer data to determine the isotopic composition of argon (comprising Ar36, Ar37, Ar38, Ar39 and Ar40) released from neutron-irradiated potassium-bearing minerals. Then uses these compositions to calculate precise and accurate geochronological ages for multiple samples as well as the covariances between them. Error propagation is done in matrix form, which jointly treats all samples and all isotopes simultaneously at every step of the data reduction process. Includes methods for regression of the time-resolved mass spectrometer signals to t=0 ('time zero') for both single- and multi-collector instruments, blank correction, mass fractionation correction, detector intercalibration, decay corrections, interference corrections, interpolation of the irradiation parameter between neutron fluence monitors, and (weighted mean) age calculation. All operations are performed on the logs of the ratios between the different argon isotopes so as to properly treat them as 'compositional data', sensu Aitchison [1986, The Statistics of Compositional Data, Chapman and Hall].
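To make the compositional-data point concrete, working on logs of ratios means applying an additive log-ratio transformation to the isotope signals; a minimal illustration with made-up numbers (not the package's data-reduction pipeline):

    # Made-up argon isotope signals for one sample (arbitrary units)
    ar <- c(Ar36 = 0.002, Ar37 = 0.010, Ar38 = 0.004, Ar39 = 0.350, Ar40 = 1.000)

    # Additive log-ratio transformation with Ar36 as the common denominator
    alr <- log(ar[c("Ar37", "Ar38", "Ar39", "Ar40")] / ar["Ar36"])
    alr

    # The transformation is invertible up to a closure constant:
    back <- c(Ar36 = 1, exp(alr))
    back / sum(back)   # composition recovered on the simplex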
It fits a univariate left, right, or interval censored linear regression model with autoregressive errors, considering the normal or the Student-t distribution for the innovations. It provides estimates and standard errors of the parameters, predicts future observations, and supports missing values on the dependent variable. References used for this package: Schumacher, F. L., Lachos, V. H., & Dey, D. K. (2017). Censored regression models with autoregressive errors: A likelihood-based perspective. Canadian Journal of Statistics, 45(4), 375-392 <doi:10.1002/cjs.11338>. Schumacher, F. L., Lachos, V. H., Vilca-Labra, F. E., & Castro, L. M. (2018). Influence diagnostics for censored regression models with autoregressive errors. Australian & New Zealand Journal of Statistics, 60(2), 209-229 <doi:10.1111/anzs.12229>. Valeriano, K. A., Schumacher, F. L., Galarza, C. E., & Matos, L. A. (2024). Censored autoregressive regression models with Student-t innovations. Canadian Journal of Statistics, 52(3), 804-828 <doi:10.1002/cjs.11804>.
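A small simulation of the kind of data the model targets, a left-censored response with AR(1) errors; this is a conceptual sketch only and does not call the package's fitting functions:

    set.seed(1)
    n     <- 200
    x     <- rnorm(n)
    e     <- as.numeric(arima.sim(list(ar = 0.5), n = n))  # AR(1) errors
    ystar <- 1 + 2 * x + e                                  # latent response
    lod   <- quantile(ystar, 0.15)                          # detection limit (about 15% censoring)
    y     <- pmax(ystar, lod)                               # observed, left-censored response
    cens  <- as.numeric(ystar < lod)                        # censoring indicator
    head(data.frame(y, cens, x))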
Genome-wide association studies (GWAS) are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. The R package SNPRelate provides a binary format for single-nucleotide polymorphism (SNP) data in GWAS utilizing CoreArray Genomic Data Structure (GDS) data files. The GDS format offers efficient operations specifically designed for two-bit integers, since a SNP genotype occupies only two bits. SNPRelate is also designed to accelerate two key computations on SNP data using parallel computing for multi-core symmetric multiprocessing computer architectures: Principal Component Analysis (PCA) and relatedness analysis using Identity-By-Descent measures. The SNP GDS format is also used by the GWASTools package with the support of S4 classes and generic functions. The extended GDS format is implemented in the SeqArray package to support the storage of single nucleotide variations (SNVs), insertion/deletion polymorphism (indel) and structural variation calls in whole-genome and whole-exome variant data.
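A typical PCA workflow with the example GDS file shipped with the package looks roughly like this (argument defaults may differ slightly between versions):

    library(SNPRelate)

    # Open the example SNP GDS file bundled with the package
    genofile <- snpgdsOpen(snpgdsExampleFileName())

    # Principal Component Analysis on the SNP genotypes, using two threads
    pca <- snpgdsPCA(genofile, num.thread = 2)
    head(pca$eigenvect[, 1:2])   # first two principal components per sample

    snpgdsClose(genofile)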
An important environmental impact on running water ecosystems is caused by hydropeaking - the discontinuous release of turbine water because of peaks of energy demand. An event-based algorithm is implemented to detect flow fluctuations, classified into increase events (IC) and decrease events (DC). For each event, a set of parameters related to the fluctuation intensity is calculated. The framework is introduced in Greimel et al. (2016) "A method to detect and characterize sub-daily flow fluctuations" <doi:10.1002/hyp.10773> and can be used to identify different fluctuation types according to the potential source: e.g., sub-daily flow fluctuations caused by hydropeaking, rainfall, or snow and glacier melt. This is a companion to the package 'hydroroute', which is used to detect and follow hydropower plant-specific hydropeaking waves at the sub-catchment scale and to describe how hydropeaking flow parameters change along the longitudinal flow path as proposed and validated in Greimel et al. (2022).
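Conceptually, an increase event (IC) is a maximal run of rising flow and a decrease event (DC) a maximal run of falling flow; the bare-bones base-R sketch below conveys that idea on a made-up discharge series, and is not the package's algorithm or interface:

    # Made-up 15-minute discharge series (m^3/s)
    q   <- c(10, 10, 12, 15, 19, 18, 16, 16, 17, 20, 14, 11, 11)
    d   <- diff(q)
    dir <- sign(d)                          # +1 rising, -1 falling, 0 constant

    # Maximal runs of rising / falling flow (constant stretches are simply skipped here)
    runs <- rle(dir[dir != 0])
    idx  <- which(dir != 0)
    grp  <- rep(seq_along(runs$lengths), runs$lengths)

    events <- data.frame(
      type      = ifelse(runs$values > 0, "IC", "DC"),
      duration  = runs$lengths,                       # number of time steps in the event
      amplitude = as.numeric(tapply(d[idx], grp, sum)) # total flow change over the event
    )
    events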
An index is created using a mathematical model that transforms multi-dimensional variables into a single value. These variables are often correlated, and while PCA-based indices can address the issue of multicollinearity, they typically do not account for survey weights, which can lead to inaccurate rankings of survey units such as households, districts, or states. To resolve this, the current package facilitates the development of a principal component analysis-based composite index by incorporating survey weights for each sample observation. This ensures the generation of a survey-weighted principal component-based normalized composite index. Additionally, the package provides a normalized principal component-based composite index and ranks the sample observations based on the values of the composite indices. For method details see Skinner, C. J., Holmes, D. J. and Smith, T. M. F. (1986) <DOI:10.1080/01621459.1986.10478336>, Singh, D., Basak, P., Kumar, R. and Ahmad, T. (2023) <DOI:10.3389/fams.2023.1274530>.
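The underlying computation can be sketched in a few lines of base R: a survey-weighted covariance matrix, its first eigenvector, and a normalized index; the weighting and normalization details here are illustrative and not the package's exact implementation:

    set.seed(42)
    X <- matrix(rnorm(100 * 4), ncol = 4)          # 100 survey units, 4 indicator variables
    w <- runif(100, 1, 5)                          # survey weights for each unit
    w <- w / sum(w)

    mu <- colSums(w * X)                           # weighted means
    Xc <- sweep(X, 2, mu)                          # centered data
    S  <- crossprod(Xc * sqrt(w))                  # survey-weighted covariance matrix

    v1    <- eigen(S)$vectors[, 1]                 # loadings of the first principal component
    score <- Xc %*% v1                             # composite index on the PC scale

    index <- (score - min(score)) / (max(score) - min(score))   # normalized to [0, 1]
    ranks <- rank(-index)                          # rank units by the composite index
    head(data.frame(index = as.numeric(index), rank = ranks))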
This package performs non-parametric tests of parametric specifications. Five tests are available. Specific bandwidth and kernel methods can be chosen along with many other options. Allows parallel computing to quickly compute p-values based on the bootstrap. Methods implemented in the package are H.J. Bierens (1982) <doi:10.1016/0304-4076(82)90105-1>, J.C. Escanciano (2006) <doi:10.1017/S0266466606060506>, P.L. Gozalo (1997) <doi:10.1016/S0304-4076(97)86571-2>, P. Lavergne and V. Patilea (2008) <doi:10.1016/j.jeconom.2007.08.014>, P. Lavergne and V. Patilea (2012) <doi:10.1198/jbes.2011.07152>, J.H. Stock and M.W. Watson (2006) <doi:10.1111/j.1538-4616.2007.00014.x>, C.F.J. Wu (1986) <doi:10.1214/aos/1176350142>, J. Yin, Z. Geng, R. Li, H. Wang (2010) <https://www.jstor.org/stable/24309002> and J.X. Zheng (1996) <doi:10.1016/0304-4076(95)01760-7>.
This package provides a suite of machine learning algorithms written in C++ and accessible through an R interface, containing several learning techniques for classification and regression. Predictive models include, e.g., classification and regression trees with optional constructive induction and models in the leaves, random forests, kNN, naive Bayes, and locally weighted regression. All predictions obtained with these models can be explained and visualized with the ExplainPrediction package. This package is especially strong in feature evaluation, where it contains several variants of the Relief algorithm and many impurity-based attribute evaluation functions, e.g., Gini, information gain, MDL, and DKM. These methods can be used for feature selection or discretization of numeric attributes. The OrdEval algorithm and its visualization are used for evaluation of data sets with ordinal features and class, enabling analysis according to the Kano model of customer satisfaction. Several algorithms support parallel multithreaded execution via OpenMP. The top-level documentation is reachable through ?CORElearn.
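A quick taste of the two main entry points, model fitting with CoreModel() and feature evaluation with attrEval(); exact option names may vary by version:

    library(CORElearn)

    # Fit a random forest on the iris data and predict on the training set
    rf   <- CoreModel(Species ~ ., data = iris, model = "rf")
    pred <- predict(rf, iris)
    table(pred$class, iris$Species)

    # Evaluate attributes with a Relief variant and with information gain
    attrEval(Species ~ ., data = iris, estimator = "ReliefFequalK")
    attrEval(Species ~ ., data = iris, estimator = "InfGain")

    destroyModels(rf)   # release the underlying C++ model structures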
Weighted Deming regression, also known as "errors-in-variables" regression, is applied with suitable weights. Weights are modeled via a precision profile; functions are provided for both known and unknown precision profile situations. The package provides tools for precision profile weighted Deming (PWD) regression. It covers two settings: one where the precision profiles are known, either from external studies or from adequate replication of the X and Y readings, and one in which there is a plausible functional form for the precision profiles but the exact function must be estimated from the (generally singlicate) readings. The function set includes tools for: estimating standard errors (via jackknifing); standardized-residual analysis with regression diagnostics for normality, linearity and constant variance; and an outlier analysis identifying significant outliers for closer investigation. Further information on the mathematical derivations and applications can be found on arXiv: Hawkins and Kraker (2025) <doi:10.48550/arXiv.2508.02888>.
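For orientation, ordinary (unweighted) Deming regression with a known error-variance ratio can be written in a few lines; the package's precision-profile weighting generalizes this by giving each observation its own weight, so the sketch below is not the PWD estimator itself:

    deming_fit <- function(x, y, lambda = 1) {
      # lambda = ratio of measurement-error variances: var(error in y) / var(error in x)
      xbar <- mean(x); ybar <- mean(y)
      sxx  <- mean((x - xbar)^2)
      syy  <- mean((y - ybar)^2)
      sxy  <- mean((x - xbar) * (y - ybar))
      slope <- ((syy - lambda * sxx) +
                sqrt((syy - lambda * sxx)^2 + 4 * lambda * sxy^2)) / (2 * sxy)
      c(intercept = ybar - slope * xbar, slope = slope)
    }

    # Two instruments measuring the same analyte, with error in both readings
    set.seed(7)
    truth <- runif(50, 1, 10)
    x <- truth + rnorm(50, sd = 0.3)
    y <- 1 + 1.05 * truth + rnorm(50, sd = 0.3)
    deming_fit(x, y, lambda = 1)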
This package provides tools for shoreline dating coastal Stone Age sites. The implemented method was developed in Roalkvam (2023) <doi:10.1016/j.quascirev.2022.107880> for the Norwegian Skagerrak coast. Although it can be extended to other areas, this also forms the core area for application of the package. Shoreline dating is based on the present-day elevation of a site, a reconstruction of past relative sea-level change, and empirically derived estimates of the likely elevation of the sites above the contemporaneous sea-level when they were in use. The geographical and temporal coverage of the method thus follows from the availability of local geological reconstructions of shoreline displacement and the degree to which the settlements to be dated have been located on or close to the shoreline when they were in use. Methods for numerical treatment and visualisation of the dates are provided, along with basic tools for visualising and evaluating the location of sites.
We present a rank-based Mercer kernel to compute a pair-wise similarity metric corresponding to an informative representation of the data. We tailor the development of the kernel to encode our prior knowledge about the data distribution over a probability space. The philosophical concept behind our construction is that objects whose feature values fall in the extremes of that feature's probability mass distribution are more similar to each other than objects whose feature values lie closer to the mean. Semblance emphasizes features whose values lie far away from the mean of their probability distribution. The kernel relies on properties empirically determined from the data and does not assume an underlying distribution. The use of feature ranks on a probability space ensures that Semblance is computationally efficient, robust to outliers, and statistically stable, thus making it a widely applicable algorithm for pattern analysis. The output from the kernel is a square, symmetric matrix that gives proximity values between pairs of observations.
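As a rough illustration of the kind of object produced, here is a toy rank-based proximity matrix built feature-by-feature from ranks; it conveys the flavour (a square, symmetric matrix of pairwise proximities derived from feature ranks) but is explicitly not the Semblance kernel itself:

    # Toy rank-based proximity (illustration only, not the actual Semblance kernel):
    # for each feature, two observations are considered closer the less rank mass
    # lies between their values.
    toy_rank_proximity <- function(X) {
      n <- nrow(X)
      R <- apply(X, 2, rank)                  # feature-wise ranks
      K <- matrix(0, n, n)
      for (g in seq_len(ncol(X))) {
        D <- abs(outer(R[, g], R[, g], "-"))  # rank distance between each pair
        K <- K + (1 - D / n)
      }
      K / ncol(X)                             # average over features
    }

    set.seed(3)
    X <- matrix(rnorm(20 * 5), nrow = 20)     # 20 observations, 5 features
    K <- toy_rank_proximity(X)
    dim(K); isSymmetric(K)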
The goal of this package is to provide a user-friendly implementation of Gaussian graphical model-based heterogeneity analysis. Recently, several Gaussian graphical model-based heterogeneity analysis techniques have been developed. A common methodological limitation is that the number of subgroups is assumed to be known a priori, which is not realistic. In a very recent study (Ren et al., 2022), a novel approach based on the penalized fusion technique was developed to determine the number and structure of subgroups in Gaussian graphical model-based heterogeneity analysis in a fully data-dependent manner. It opens the door to utilizing the Gaussian graphical model technique in more practical settings. Beyond Ren et al. (2022), additional estimation procedures and functions are added, so that the package is self-contained and more comprehensive and can provide more direct insights to practitioners (with the visualization function). Reference: Ren, M., Zhang, S., Zhang, Q. and Ma, S. (2022). Gaussian Graphical Model-based Heterogeneity Analysis via Penalized Fusion. Biometrics, 78(2), 524-535.
Estimation and inference for multiple kink quantile regression for longitudinal data and i.i.d. data. A bootstrap restarting iterative segmented quantile algorithm is proposed to estimate the multiple kink quantile regression model conditional on a given number of change points. The number of kinks is also allowed to be unknown. In that case, the backward elimination algorithm and the bootstrap restarting iterative segmented quantile algorithm are combined to select the number of change points based on a quantile BIC. For longitudinal data, we also develop the GEE estimator to incorporate the within-subject correlations. A score-type test statistic is also developed for testing the existence of the kink effect. The package is based on the papers: Wei Zhong, Chuang Wan and Wenyang Zhang (2022), "Estimation and inference for multikink quantile regression", Journal of Business & Economic Statistics, and Chuang Wan, Wei Zhong, Wenyang Zhang and Changliang Zou (2022), "Multi-kink quantile regression for longitudinal data with application to progesterone data analysis", Biometrics.
This package provides functions to calculate the minimum and maximum possible values of Cronbach's alpha when item-level missing data are present. Cronbach's alpha (Cronbach, 1951 <doi:10.1007/BF02310555>) is one of the most widely used measures of internal consistency in the social, behavioral, and medical sciences (Bland & Altman, 1997 <doi:10.1136/bmj.314.7080.572>; Tavakol & Dennick, 2011 <doi:10.5116/ijme.4dfb.8dfd>). However, conventional implementations assume complete data, and listwise deletion is often applied when missingness occurs, which can lead to biased or overly optimistic reliability estimates (Enders, 2003 <doi:10.1037/1082-989X.8.3.322>). This package implements computational strategies including enumeration, Monte Carlo sampling, and optimization algorithms (e.g., Genetic Algorithm, Differential Evolution, Sequential Least Squares Programming) to obtain sharp lower and upper bounds of Cronbach's alpha under arbitrary missing data patterns. The approach is motivated by Manski's partial identification framework and pessimistic bounding ideas from optimization literature.
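For a tiny example, the bounds can be obtained by brute-force enumeration: impute each missing entry with every candidate response option, compute Cronbach's alpha for each completion, and take the minimum and maximum. The package's enumeration and optimization routines do this far more efficiently; the self-contained sketch below only illustrates the idea:

    # Cronbach's alpha for a complete respondent-by-item matrix
    cronbach_alpha <- function(X) {
      k <- ncol(X)
      k / (k - 1) * (1 - sum(apply(X, 2, var)) / var(rowSums(X)))
    }

    # Brute-force bounds: try every combination of response options for the missing cells
    alpha_bounds <- function(X, options = 1:5) {
      miss <- which(is.na(X))
      grid <- expand.grid(rep(list(options), length(miss)))
      alphas <- apply(grid, 1, function(vals) {
        X[miss] <- vals
        cronbach_alpha(X)
      })
      c(lower = min(alphas), upper = max(alphas))
    }

    set.seed(5)
    X <- matrix(sample(1:5, 30, replace = TRUE), nrow = 10, ncol = 3)  # 10 respondents, 3 items
    X[2, 1] <- NA; X[7, 3] <- NA                                       # two missing responses
    alpha_bounds(X, options = 1:5)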
Carry out Bayesian estimation and forecasting for a variety of stochastic mortality models using vague prior distributions. Models supported include numerous well-established approaches introduced in the actuarial and demographic literature, such as the Lee-Carter (1992) <doi:10.1080/01621459.1992.10475265>, the Cairns-Blake-Dowd (2009) <doi:10.1080/10920277.2009.10597538>, the Li-Lee (2005) <doi:10.1353/dem.2005.0021>, and the Plat (2009) <doi:10.1016/j.insmatheco.2009.08.006> models. The package is designed to analyse stratified mortality data structured as a 3-dimensional array of dimensions p × A × T (strata × age × year). Stratification can represent factors such as cause of death, country, deprivation level, sex, geographic region, insurance product, marital status, socioeconomic group, or smoking behavior. While the primary focus is on analysing stratified data (p > 1), the package can also handle mortality data that are not stratified (p = 1). Model selection via the Deviance Information Criterion (DIC) is supported.
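The expected data layout is simply a 3-dimensional array of death counts (with a matching array of exposures); the dimension names below are illustrative, not required by the package:

    p  <- 2    # strata (e.g., two sexes)
    A  <- 10   # age groups
    nT <- 5    # calendar years (the "T" dimension)

    # Simulated death counts arranged as a p x A x T array
    deaths <- array(rpois(p * A * nT, lambda = 50), dim = c(p, A, nT),
                    dimnames = list(stratum = c("female", "male"),
                                    age     = paste0("age", 60:69),
                                    year    = 2015:2019))
    dim(deaths)
    deaths["female", , "2015"]   # age-specific death counts for one stratum and year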
This package provides functions to assess complex heterogeneity in the strength of a surrogate marker with respect to multiple baseline covariates, in either a randomized treatment setting or observational setting. For a randomized treatment setting, the functions assess and test for heterogeneity using both a parametric model and a semiparametric two-step model. More details for the randomized setting are available in: Knowlton, R., Tian, L., & Parast, L. (2025). "A General Framework to Assess Complex Heterogeneity in the Strength of a Surrogate Marker," Statistics in Medicine, 44(5), e70001 <doi:10.1002/sim.70001>. For an observational setting, functions in this package assess complex heterogeneity in the strength of a surrogate marker using meta-learners, with options for different base learners. More details for the observational setting will be available in the future in: Knowlton, R., Parast, L. (2025) "Assessing Surrogate Heterogeneity in Real World Data Using Meta-Learners." A tutorial for this package can be found at <https://www.laylaparast.com/cohetsurr>.
Perform variable selection for the spatial Poisson regression model under the adaptive elastic net penalty. Spatial count data with covariates is the input. We use a spatial Poisson regression model to link the spatial counts and covariates. For maximization of the likelihood under adaptive elastic net penalty, we implemented the penalized quasi-likelihood (PQL) and the approximate penalized loglikelihood (APL) methods. The proposed methods can automatically select important covariates, while adjusting for possible spatial correlations among the responses. More details are available in Xie et al. (2018, <arXiv:1809.06418>). The package also contains the Lyme disease dataset, which consists of the disease case data from 2006 to 2011, and demographic data and land cover data in Virginia. The Lyme disease case data were collected by the Virginia Department of Health. The demographic data (e.g., population density, median income, and average age) are from the 2010 census. Land cover data were obtained from the Multi-Resolution Land Cover Consortium for 2006.
Extending the functionalities of the VGAM package with additional functions and datasets. At present, VGAMextra comprises new family functions (ffs) to estimate several time series models by maximum likelihood using Fisher scoring (unlike popular packages on CRAN that rely on optim()), including ARMA-GARCH-like models, the order-(p, d, q) ARIMAX model (non-seasonal), the order-(p) VAR model, error correction models for cointegrated time series, and ARMA structures with Student-t errors. For independent data, new ffs to estimate the inverse-Weibull, the inverse-gamma, the generalized beta of the second kind and the general multivariate normal distributions are available. In addition, VGAMextra incorporates new VGLM links for the mean function, and the quantile function (as an alternative to ordinary quantile modelling), of several 1-parameter distributions that are compatible with the class of VGLM/VGAM family functions. Currently, only fixed-effects models are implemented. All functions are subject to change; see the NEWS for further details on the latest changes.
Functions in this package provide a solution to a classical problem in survey methodology: optimum sample allocation in stratified sampling. In this context, the optimum allocation is in the classical Tschuprow-Neyman sense and satisfies additional lower or upper bounds imposed on the sample sizes in strata. There are a few different algorithms available, one of which is based on a popular sample allocation method that applies Neyman allocation to a recursively reduced set of strata. This package also provides a function that computes a solution to the minimum cost allocation problem, which is a minor modification of the classical optimum sample allocation. This problem lies in the determination of a vector of strata sample sizes that minimizes the total cost of the survey under an assumed fixed level of the stratified estimator's variance. As in the case of the classical optimum allocation, the problem of minimum cost allocation can be complemented by imposing upper-bound constraints on the sample sizes in strata.
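The unconstrained Tschuprow-Neyman allocation itself is a one-liner, which makes the role of the box constraints clear; this base-R sketch does not use the package's algorithms and handles the bounds only by naive clipping:

    # Stratum population sizes and standard deviations
    N <- c(3000, 4000, 5000, 2000)
    S <- c(48, 79, 76, 17)
    n <- 400                      # total sample size

    # Tschuprow-Neyman allocation: n_h proportional to N_h * S_h
    n_h <- n * (N * S) / sum(N * S)
    round(n_h)

    # Naive clipping to lower/upper bounds (the package instead solves this optimally,
    # e.g. by applying Neyman allocation to a recursively reduced set of strata)
    pmin(pmax(round(n_h), 20), N)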
Various cladogenesis-related calculations that are slow in pure R are implemented in C++ with Rcpp. These include the calculation of the probability of various scenarios for the inheritance of geographic range at the divergence events on a phylogenetic tree, and other calculations necessary for models which are not continuous-time Markov chains (CTMCs), but where change instead occurs instantaneously at speciation events. Typically these models must assess the probability of every possible combination of (ancestor state, left descendant state, right descendant state). This means that there are up to (# of states)^3 combinations to investigate, and in biogeographical models, there can easily be hundreds of states, so calculation time becomes an issue. C++ implementation plus clever tricks (many combinations can be eliminated a priori) can greatly speed the computation time over naive R implementations. CITATION INFO: This package is the result of my Ph.D. research; please cite the package if you use it! Type citation(package="cladoRcpp") to get the citation information.
An implementation to reconstruct individual patient data from Kaplan-Meier (K-M) survival curves, visualize and assess the accuracy of the reconstruction, and then perform secondary analysis on the reconstructed data. A simple function is included to extract the coordinates from published K-M curves; it was developed based on Poisot T.'s 'digitize' package (2011) <doi:10.32614/RJ-2011-004>. For more complex graphs with tangled curves, digitizing software such as 'DigitizeIt' (for Mac or Windows) or 'ScanIt' (for Windows) can be used to obtain the coordinates. Additional information should also be incorporated to increase accuracy, such as the numbers of patients at risk (often reported at 5-10 time points under the x-axis of the K-M graph), the total number of patients, and the total number of events. The package implements the modified iterative K-M estimation algorithm (modified-iKM), which improves upon the approach proposed by Guyot (2012) <doi:10.1186/1471-2288-12-9> with some modifications.
We propose a framework that provides real-time support for early detection of anomalous series within a large collection of streaming time series data. By definition, anomalies are rare in comparison to a system's typical behaviour. We define an anomaly as an observation that is very unlikely given the forecast distribution. The algorithm first forecasts a boundary for the system's typical behaviour using a representative sample of the typical behaviour of the system. An approach based on extreme value theory is used for this boundary prediction process. Then a sliding window is used to test for anomalous series within the newly arrived collection of series. A feature-based representation of the time series is used as the input to the model. To cope with concept drift, the forecast boundary for the system's typical behaviour is updated periodically. More details regarding the algorithm can be found in Talagala, P. D., Hyndman, R. J., Smith-Miles, K., et al. (2019) <doi:10.1080/10618600.2019.1617160>.
Implementation of the classic Genz algorithm and a novel tile-low-rank algorithm for computing relatively high-dimensional multivariate normal (MVN) and Student-t (MVT) probabilities. References used for this package: Foley, James, Andries van Dam, Steven Feiner, and John Hughes. "Computer Graphics: Principles and Practice". Addison-Wesley Publishing Company. Reading, Massachusetts (1987, ISBN:0-201-84840-6); Genz, A., "Numerical computation of multivariate normal probabilities," Journal of Computational and Graphical Statistics, 1, 141-149 (1992) <doi:10.1080/10618600.1992.10477010>; Cao, J., Genton, M. G., Keyes, D. E., & Turkiyyah, G. M. "Exploiting Low Rank Covariance Structures for Computing High-Dimensional Normal and Student-t Probabilities," Statistics and Computing, 31.1, 1-16 (2021) <doi:10.1007/s11222-020-09978-y>; Cao, J., Genton, M. G., Keyes, D. E., & Turkiyyah, G. M. "tlrmvnmvt: Computing High-Dimensional Multivariate Normal and Student-t Probabilities with Low-Rank Methods in R," Journal of Statistical Software, 101.4, 1-25 (2022) <doi:10.18637/jss.v101.i04>.
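A sketch of a typical call for a moderately sized MVN probability; the argument names may differ slightly between versions, so consult the package documentation for the exact interface and for how to select the tile-low-rank algorithm via the 'algorithm' argument:

    library(tlrmvnmvt)

    # MVN orthant probability with an exchangeable correlation structure
    d <- 50
    sigma <- matrix(0.5, d, d); diag(sigma) <- 1
    lower <- rep(-Inf, d)
    upper <- rep(0, d)

    # Estimate P(X <= 0) for X ~ N(0, sigma) using the default (Genz-style) algorithm
    pmvn(lower = lower, upper = upper, sigma = sigma)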