This package provides an efficient implementation of the Cross-Covariance Isolate Detect (CCID) methodology for estimating the number and location of multiple change-points in the second-order (cross-covariance or network) structure of multivariate, possibly high-dimensional time series. The method is motivated by the detection of change-points in functional connectivity networks for functional magnetic resonance imaging (fMRI), electroencephalography (EEG), magnetoencephalography (MEG) and electrocorticography (ECoG) data. The main routines in the package have been extensively tested on fMRI data. For details on the CCID methodology, please see Anastasiou et al. (2022), Cross-covariance isolate detect: A new change-point method for estimating dynamic functional connectivity. Medical Image Analysis, Volume 75.
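For illustration only, here is a minimal base-R sketch of the underlying idea (not the CCID algorithm itself, which isolates multiple change-points): a change in the cross-covariance of two series becomes a change in the mean of their product series, which a simple CUSUM statistic can locate.

    set.seed(1)
    n <- 400
    x <- rnorm(n)
    # y is uncorrelated with x for t <= 200, strongly correlated afterwards
    y <- c(rnorm(200), 0.9 * x[201:400] + sqrt(1 - 0.9^2) * rnorm(200))
    p <- x * y                                    # product series tracks the cross-covariance
    cusum <- abs(cumsum(p) - (1:n) / n * sum(p))  # CUSUM statistic for a mean change in p
    which.max(cusum)                              # estimated change-point location (near 200)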
Fit linear, logistic and Cox survival regression models penalised with adaptive multi-group ridge penalties. The multi-group penalties correspond to groups of covariates defined by (multiple) co-data sources. Group hyperparameters are estimated with an empirical Bayes method of moments, penalised with an extra level of hyper-shrinkage. Different types of hyper-shrinkage may be used for different co-data. Co-data may be continuous or categorical. The method accommodates inclusion of unpenalised covariates, posterior selection of covariates and multiple data types. The model fit is used to predict for new samples. The name ecpc stands for Empirical Bayes, Co-data learnt, Prediction and Covariate selection. See Van Nee et al. (2020) <arXiv:2005.04010>.
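As a simplified illustration of group-differentiated ridge penalties: ecpc estimates the group weights by empirical Bayes, whereas here they are fixed by hand and glmnet's penalty.factor argument serves as a stand-in.

    library(glmnet)
    set.seed(1)
    X <- matrix(rnorm(100 * 20), 100, 20)
    y <- drop(X[, 1:5] %*% rep(0.5, 5)) + rnorm(100)
    # co-data: covariates 1-10 in group A, 11-20 in group B; group B gets a heavier penalty
    pf  <- c(rep(1, 10), rep(4, 10))
    fit <- glmnet(X, y, alpha = 0, penalty.factor = pf)   # alpha = 0 gives ridge
    predict(fit, newx = X[1:3, ], s = 0.1)                # predictions for "new" samples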
Fit response surfaces for datasets with latent-variable Gaussian process modeling, predict responses for new inputs, and plot latent variable locations in the latent space (1D or 2D only). The input variables of the datasets can be quantitative, qualitative/categorical or mixed. The output variable of the datasets is a scalar (quantitative). The likelihood function is optimized using a successive approximation/relaxation algorithm similar to another GP modeling package, "GPM". The modeling method is published in "A Latent Variable Approach to Gaussian Process Modeling with Qualitative and Quantitative Factors" by Yichi Zhang, Siyu Tao, Wei Chen, and Daniel W. Apley (2018) <arXiv:1806.07504>. The package was developed in IDEAL of Northwestern University.
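A toy base-R sketch of the latent-variable idea, with hypothetical 2-D latent coordinates fixed by hand (the package estimates them by maximum likelihood):

    set.seed(1)
    lev <- sample(1:3, 30, replace = TRUE)              # 3-level qualitative input
    xq  <- runif(30)                                    # quantitative input
    y   <- sin(2 * pi * xq) + c(-1, 0, 1)[lev] + rnorm(30, sd = 0.1)
    Z   <- rbind(c(0, 0), c(1, 0), c(0.5, 1))           # hand-picked latent points per level
    X   <- cbind(xq, Z[lev, ])                          # mixed inputs -> all-quantitative
    k   <- function(a, b) exp(-sum((a - b)^2) / 0.5)    # squared-exponential kernel
    K   <- outer(1:30, 1:30, Vectorize(function(i, j) k(X[i, ], X[j, ])))
    a   <- solve(K + 0.01 * diag(30), y)                # GP weights with a small nugget
    pred <- function(xn) sum(sapply(1:30, function(i) k(xn, X[i, ])) * a)
    pred(c(0.3, Z[2, ]))                                # predict at xq = 0.3, level 2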
Bundles functions used to analyze the harmfulness of trial errors in criminal trials. Functions in the Scientific Analysis of Trial Errors ('SATE') package help users estimate the probability that a jury will find a defendant guilty given jurors' preferences for a guilty verdict, together with the uncertainty of that estimate. Users can also compare actual and hypothetical trial conditions to conduct harmful error analysis. The relationship between individual jurors' verdict preferences and the probability that a jury returns a guilty verdict has been studied by Davis (1973) <doi:10.1037/h0033951>; MacCoun & Kerr (1988) <doi:10.1037/0022-3514.54.1.21>; and Devine et al. (2001) <doi:10.1037/1076-8971.7.3.622>, among others.
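For instance, under a toy decision rule in which each of 12 jurors independently prefers a guilty verdict with probability p and conviction requires at least 10 such jurors (a deliberate simplification of the social decision schemes studied in this literature):

    p <- 0.6                                             # each juror's preference for guilty
    pbinom(9, size = 12, prob = p, lower.tail = FALSE)   # P(at least 10 of 12 prefer guilty)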
marr (Maximum Rank Reproducibility) is a nonparametric approach that detects reproducible signals using a maximal rank statistic for high-dimensional biological data. This R package implements functions that measure the reproducibility of features per sample pair and of sample pairs per feature in high-dimensional biological replicate experiments. The user-friendly plot functions in this package also plot histograms of the reproducibility of features per sample pair and of sample pairs per feature. Furthermore, the approach allows users to select optimal filtering threshold values for the identification of reproducible features and sample pairs based on visual checks of the output histograms. The package also provides the subset of the data filtered by reproducible features and/or sample pairs.
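A schematic of the maximal rank statistic for one pair of replicate samples, on toy data (the package's estimators and threshold selection are more involved):

    set.seed(1)
    m    <- 1000                                # features
    sig  <- c(rep(3, 100), rep(0, 900))         # first 100 features carry real signal
    r1   <- rank(-(sig + rnorm(m)))             # rank 1 = strongest in replicate 1
    r2   <- rank(-(sig + rnorm(m)))             # rank 1 = strongest in replicate 2
    maxr <- pmax(r1, r2)                        # maximal rank statistic per feature
    head(order(maxr), 10)                       # small max ranks flag reproducible features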
The Molecular Signatures Database ('MSigDB') is one of the most widely used and comprehensive databases of gene sets for performing gene set enrichment analysis <doi:10.1016/j.cels.2015.12.004>. The msig package provides powerful, easy-to-use and flexible query functions for the MSigDB database. There are two query modes in the msig package, online query and local query, and both proceed in two steps: gene set name, then gene. Online search comes in two modes: registered search and non-registered browsing; for registered search, the email address you registered must be provided. Local queries are made against a local database, which can be updated with the msig_update() function.
This package provides two main functionalities. First, given a system of simultaneous equations, it decomposes the matrix of coefficients weighting the endogenous variables into three submatrices: one contains the subset of coefficients that have a causal nature in the model, and the other two contain the subsets of coefficients that have an interdependent nature in the model, either at the systematic level or induced by the correlation between error terms. Second, given a decomposed model, it tests for the significance of the interdependent relationships acting in the system, via maximum likelihood and a Wald test, which can be built from the function output. For theoretical reference see Faliva (1992) <doi:10.1007/BF02589085> and Faliva and Zoia (1994) <doi:10.1007/BF02589041>.
Uncertainty propagation analysis in spatial environmental modelling following the methodology described in Heuvelink et al. (2007) <doi:10.1080/13658810601063951> and Brown and Heuvelink (2007) <doi:10.1016/j.cageo.2006.06.015>. The package provides functions for examining uncertainty propagation from input data and model parameters, via the environmental model, onto model outputs. The functions include uncertainty model specification, stochastic simulation and propagation of uncertainty using Monte Carlo (MC) techniques. Uncertain variables are described by probability distributions. Both numerical and categorical data types are handled. Spatial auto-correlation within an attribute and cross-correlation between attributes are accommodated. The MC realizations may be used as input to environmental models called from R, or externally.
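The core MC recipe, in a minimal non-spatial sketch with hypothetical input distributions and a toy model:

    set.seed(1)
    n_mc   <- 1000
    slope  <- rnorm(n_mc, mean = 0.10, sd = 0.02)    # uncertain model parameter
    rain   <- rlnorm(n_mc, meanlog = 2, sdlog = 0.3) # uncertain input variable
    runoff <- slope * rain                           # the environmental "model"
    quantile(runoff, c(0.025, 0.5, 0.975))           # propagated uncertainty on the output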
This package implements Weighted-Average Least Squares (WALS) model averaging for the negative binomial regression models of Huynh (2024) <doi:10.48550/arXiv.2404.11324>, the generalized linear models of De Luca, Magnus and Peracchi (2018) <doi:10.1016/j.jeconom.2017.12.007> and the linear regression models of Magnus, Powell and Pruefer (2010) <doi:10.1016/j.jeconom.2009.07.004>; see also Magnus and De Luca (2016) <doi:10.1111/joes.12094>. Weighted-Average Least Squares for the linear regression model is based on the original MATLAB code by Magnus and De Luca <https://www.janmagnus.nl/items/WALS.pdf>; see also Kumar and Magnus (2013) <doi:10.1007/s13571-013-0060-9> and De Luca and Magnus (2011) <doi:10.1177/1536867X1201100402>.
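To convey the model-averaging idea only: the sketch below is smoothed-AIC averaging over nested linear models, plainly not the WALS estimator itself.

    fits <- list(lm(mpg ~ wt, mtcars),
                 lm(mpg ~ wt + hp, mtcars),
                 lm(mpg ~ wt + hp + disp, mtcars))
    aic <- sapply(fits, AIC)
    w   <- exp(-0.5 * (aic - min(aic))); w <- w / sum(w)   # smoothed-AIC weights
    sum(w * sapply(fits, predict, newdata = mtcars[1, ]))  # model-averaged prediction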
Trains a logistic regression model that discretizes continuous variables via a gradient boosting approach. The proposed method aims for a tradeoff between interpretability and prediction accuracy for logistic regression by discretizing the continuous variables. The variable binning is accomplished in a supervised fashion. The model trained by this package is still a single logistic regression model, not a sequence of logistic regression models. The fitted model object returned from training consists of two tables: one gives the bin boundaries for each continuous variable together with the corresponding coefficients, and the other covers the discrete variables. The package can also be used to bin continuous variables for other statistical analyses.
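A simplified sketch of supervised binning followed by a single logistic fit, using a classification tree (rather than gradient boosting, which this package uses) to learn the bin boundaries:

    library(rpart)
    set.seed(1)
    x <- runif(500)
    y <- factor(rbinom(500, 1, plogis(-1 + 3 * (x > 0.6))))
    tree <- rpart(y ~ x, method = "class")          # supervised splits of x
    cuts <- sort(unique(tree$splits[, "index"]))    # learned bin boundaries
    xb   <- cut(x, breaks = c(-Inf, cuts, Inf))     # discretize x using those boundaries
    coef(glm(y ~ xb, family = binomial))            # one logistic model on the bins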
Over sixty clustering algorithms are provided in this package with consistent input and output, which enables the user to try out algorithms swiftly. Additionally, 26 statistical approaches for estimating the number of clusters, as well as the mirrored density plot (MD-plot) of clusterability, are implemented. The package is published in Thrun, M.C., Stier, Q.: "Fundamental Clustering Algorithms Suite" (2021), SoftwareX, <DOI:10.1016/j.softx.2020.100642>. Moreover, the Fundamental Clustering Problems Suite (FCPS) offers a variety of clustering challenges any algorithm should handle when facing real-world data; see Thrun, M.C., Ultsch, A.: "Clustering Benchmark Datasets Exploiting the Fundamental Clustering Problems" (2020), Data in Brief, <DOI:10.1016/j.dib.2020.105501>.
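The kind of quick algorithm comparison this enables, sketched here with two base-R algorithms on a standard dataset (the package itself wraps many more behind one consistent interface):

    X     <- scale(iris[, 1:4])
    cl_km <- kmeans(X, centers = 3)$cluster
    cl_hc <- cutree(hclust(dist(X), method = "ward.D2"), k = 3)
    table(cl_km, cl_hc)    # cross-tabulate the two partitions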
This package provides a tool to process and analyse data collected with wearable raw acceleration sensors, as described in Migueles and colleagues (JMPB 2019) and van Hees and colleagues (J Appl Physiol 2014; PLoS ONE 2015). The package has been developed and tested for binary data from GENEActiv <https://activinsights.com/>, binary (.gt3x) and .csv-export data from Actigraph <https://theactigraph.com> devices, and binary (.cwa) and .csv-export data from Axivity <https://axivity.com>. These devices are currently widely used in research on human daily physical activity. Further, the package can handle accelerometer data files from any other sensor brand, provided that the data are stored in csv format. The package also allows for external function embedding.
This package contains six common multi-category classification accuracy evaluation measures, all of which can be found in Li and Ming (2019) <doi:10.1002/sim.8103>: Hypervolume Under Manifold (HUM), described in Li and Fine (2008) <doi:10.1093/biostatistics/kxm050>; Correct Classification Percentage (CCP), Integrated Discrimination Improvement (IDI), Net Reclassification Improvement (NRI) and R-Squared Value (RSQ), described in Li, Jiang and Fine (2013) <doi:10.1093/biostatistics/kxs047>; and Polytomous Discrimination Index (PDI), described in Van Calster et al. (2012) <doi:10.1007/s10654-012-9733-3> and Li et al. (2018) <doi:10.1177/0962280217692830>. All of these measures and the mcca package are described in Li, Gao and D'Agostino (2019) <doi:10.1002/sim.8103>.
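The simplest of these, CCP, is just the overall multi-class accuracy; a toy illustration:

    set.seed(1)
    truth <- sample(1:3, 100, replace = TRUE)
    pred  <- ifelse(runif(100) < 0.7, truth, sample(1:3, 100, replace = TRUE))
    mean(pred == truth)    # CCP: proportion of correctly classified cases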
Price volatility refers to the degree of variation in a series over a certain period of time. This volatility is especially noticeable in agricultural commodities, adding uncertainty for farmers, traders, and others in the agricultural supply chain. Four commonly used volatility models are selected and implemented: GARCH, the Glosten-Jagannathan-Runkle GARCH (GJR-GARCH) model, the exponentially weighted moving average (EWMA) model and the Multiplicative Error Model (MEM). PWAVE, a weighted ensemble model based on particle swarm optimization (PSO), is proposed to combine the forecasts obtained from all the candidate models. This package has been developed using the algorithms of Paul et al. <doi:10.1007/s40009-023-01218-x> and Yeasin and Paul (2024) <doi:10.1007/s11227-023-05542-3>.
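Of the four, EWMA has the simplest form: the variance estimate is an exponentially weighted average of past squared returns, s2[t] = lambda * s2[t-1] + (1 - lambda) * r[t-1]^2. A minimal sketch, with lambda = 0.94 (the classic RiskMetrics choice):

    set.seed(1)
    r  <- rnorm(500, sd = 0.01)          # daily returns
    s2 <- numeric(500); s2[1] <- var(r)  # initialize with the sample variance
    for (t in 2:500) s2[t] <- 0.94 * s2[t - 1] + 0.06 * r[t - 1]^2
    sqrt(s2[500])                        # current volatility estimate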
Although model selection is ubiquitous in scientific discovery, the stability and uncertainty of the selected model are often hard to evaluate. Characterizing the random behavior of the model selection procedure is the key to understanding and quantifying model selection uncertainty. This R package offers several graphical tools to visualize the distribution of the selected model, for example Gplot(), Hplot(), VDSM_scatterplot() and VDSM_heatmap(). To the best of our knowledge, this is the first attempt to visualize such a distribution. For what the distribution of the selected model is and how it works, please see Qin, Y. and Wang, L. (2021), "Visualization of Model Selection Uncertainty" <https://homepages.uc.edu/~qinyn/VDSM/VDSM.html>.
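One way such a distribution can arise, sketched with bootstrap resampling and stepwise AIC selection (the plotting functions above visualize objects of this kind; the paper's construction may differ):

    set.seed(1)
    n <- 100
    X <- matrix(rnorm(n * 5), n); colnames(X) <- paste0("x", 1:5)
    d <- data.frame(y = X[, 1] + 0.5 * X[, 2] + rnorm(n), X)
    models <- replicate(200, {
      i <- sample(n, replace = TRUE)                       # bootstrap resample
      f <- step(lm(y ~ ., data = d[i, ]), trace = 0)       # stepwise AIC selection
      paste(sort(names(coef(f))[-1]), collapse = "+")      # record the selected model
    })
    sort(table(models), decreasing = TRUE)                 # distribution of the selected model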
Estimation of the average treatment effect when controlling for high-dimensional confounders using debiased inverse propensity score weighting (DIPW). DIPW relies on the propensity score following a sparse logistic regression model, but the regression curves are not required to be estimable. Nevertheless, the package also allows users to estimate the regression curves and take the estimated curves as input to the methods. Details of the methodology can be found in Yuhao Wang and Rajen D. Shah (2020), "Debiased Inverse Propensity Score Weighting for Estimation of Average Treatment Effects with High-Dimensional Confounders" <arXiv:2011.08661>. The package relies on the optimisation software MOSEK <https://www.mosek.com/>, which must be installed separately; see the documentation for 'Rmosek'.
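For orientation, the plain (non-debiased, low-dimensional) IPW estimator that DIPW refines looks like this:

    set.seed(1)
    n  <- 500
    x  <- rnorm(n)                                # confounder
    tr <- rbinom(n, 1, plogis(x))                 # treatment depends on the confounder
    y  <- 1 * tr + x + rnorm(n)                   # true average treatment effect = 1
    ps <- fitted(glm(tr ~ x, family = binomial))  # estimated propensity score
    mean(tr * y / ps) - mean((1 - tr) * y / (1 - ps))  # IPW estimate of the ATE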
User-friendly general package providing standard methods for meta-analysis and supporting Schwarzer, Carpenter, and Rücker, "Meta-Analysis with R" (2015) <DOI:10.1007/978-3-319-21416-0>:
- common effect and random effects meta-analysis;
- several plots (forest, funnel, Galbraith / radial, L'Abbe, Baujat, bubble);
- three-level meta-analysis model;
- generalised linear mixed model;
- logistic regression with penalised likelihood for rare events;
- Hartung-Knapp method for random effects model;
- Kenward-Roger method for random effects model;
- prediction interval;
- statistical tests for funnel plot asymmetry;
- trim-and-fill method to evaluate bias in meta-analysis;
- meta-regression;
- cumulative meta-analysis and leave-one-out meta-analysis;
- import data from 'RevMan 5';
- produce forest plot summarising several (subgroup) meta-analyses.
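A minimal generic inverse-variance example with the package's metagen() and forest() functions, on toy numbers:

    library(meta)
    TE   <- c(0.12, 0.30, 0.25, 0.05)    # study effect estimates (e.g., log odds ratios)
    seTE <- c(0.10, 0.15, 0.12, 0.20)    # their standard errors
    m <- metagen(TE = TE, seTE = seTE, studlab = paste("Study", 1:4))
    summary(m)                           # common effect and random effects results
    forest(m)                            # forest plot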
Computation of predictive information criteria (PIC) from select model object classes for model selection in predictive contexts. In contrast to the more widely used Akaike Information Criterion (AIC), which is derived under the assumption that the target(s) of prediction (i.e. validation data) are independently and identically distributed with the fitting data, the PIC are derived under less restrictive assumptions and thus generalize AIC to the more practically relevant case of training/validation data heterogeneity. The methodology featured in this package is based on Flores (2021), "A new class of information criteria for improved prediction in the presence of training/validation data heterogeneity" <https://iro.uiowa.edu/esploro/outputs/doctoral/A-new-class-of-information-criteria/9984097169902771?institution=01IOWA_INST>.
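For reference, the i.i.d.-case baseline that the PIC generalize is the familiar AIC comparison:

    fit1 <- lm(mpg ~ wt, mtcars)
    fit2 <- lm(mpg ~ wt + hp, mtcars)
    c(AIC(fit1), AIC(fit2))   # smaller is preferred when validation data resemble training data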
This package provides a novel meta-learning framework for forecast-model selection using time series features. Many applications require a large number of time series to be forecast, and providing better forecasts for these series is important in decision and policy making. We propose a classification framework that selects forecast models based on features calculated from the time series. We call this framework FFORMS (Feature-based FORecast Model Selection). FFORMS builds a mapping that relates the features of a time series to the best forecast model using a random forest. The seer package is an implementation of the FFORMS algorithm. For more details see our paper at <https://www.monash.edu/business/econometrics-and-business-statistics/research/publications/ebs/wp06-2018.pdf>.
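The mapping step, sketched with hypothetical features and toy "best model" labels (FFORMS derives the labels from actual out-of-sample forecast performance):

    library(randomForest)
    set.seed(1)
    feats <- data.frame(trend = runif(100), entropy = runif(100))  # one row per series
    best  <- factor(ifelse(feats$trend > 0.5, "arima", "ets"))     # toy labels for illustration
    rf <- randomForest(best ~ ., data = feats)                     # learn features -> best model
    predict(rf, data.frame(trend = 0.8, entropy = 0.3))            # model choice for a new series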
In a randomized controlled trial (RCT), balancing covariates is often one of the most important concerns. The CARM package provides functions to balance covariates and generate allocation sequences by covariate-adjusted Adaptive Randomization via Mahalanobis distance (ARM) for RCTs. For what ARM is and how it works, please see Y. Qin, Y. Li, W. Ma, H. Yang, and F. Hu (2022), "Adaptive randomization via Mahalanobis distance", Statistica Sinica <doi:10.5705/ss.202020.0440>. In addition, the package is also suitable for the randomization process of multi-arm trials; for details, please see Yang H, Qin Y, Wang F, et al. (2023), "Balancing covariates in multi-arm trials via adaptive randomization", Computational Statistics & Data Analysis <doi:10.1016/j.csda.2022.107642>.
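A bare-bones two-arm sketch of the idea, assuming a fixed biased-coin probability of 0.75 and the full-sample covariance matrix for scaling (the published procedure differs in detail):

    set.seed(1)
    n <- 100
    X <- matrix(rnorm(n * 2), n)          # two covariates per incoming patient
    S <- solve(cov(X))                    # scaling matrix for the Mahalanobis distance
    imb <- function(g) {                  # imbalance between arm means so far
      d <- colMeans(X[which(g == 1), , drop = FALSE]) -
           colMeans(X[which(g == 2), , drop = FALSE])
      drop(t(d) %*% S %*% d)
    }
    grp <- integer(n); grp[1:2] <- c(1, 2)
    for (i in 3:n) {
      g1 <- replace(grp[1:i], i, 1); g2 <- replace(grp[1:i], i, 2)
      p1 <- if (imb(g1) < imb(g2)) 0.75 else 0.25   # coin biased toward better balance
      grp[i] <- ifelse(runif(1) < p1, 1, 2)
    }
    table(grp)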
The single largest source of dam data in the United States is the National Inventory of Dams (NID) <http://nid.usace.army.mil> from the US Army Corps of Engineers. The entire NID dataset cannot be obtained at once, and NID's website limits extraction to no more than a couple of thousand records at a time. Moreover, data selected in the NID's user interface cannot be saved to a file. To make analysis of these data easier, all the data from the NID were extracted manually. Subsequently, the raw data were checked for potential errors and cleaned. This package provides sample cleaned data from the NID and functionality to access the entire cleaned NID dataset.
This package performs iterative proportional updating given a seed table and an arbitrary number of marginal distributions. This is commonly used in population synthesis, survey raking, matrix rebalancing, and other applications. For example, a household survey may be weighted to match the known distribution of households by size from the census, or an origin/destination trip matrix may be balanced to match traffic counts. The approach used by this package is based on a paper from Arizona State University (Ye, Xin, et al. (2009) <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.537.723&rep=rep1&type=pdf>). Some enhancements have been made to that work, including primary and secondary target balance/importance, general marginal agreement, and weight restriction.
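The classic two-dimensional special case (iterative proportional fitting) in a few lines, with a uniform seed and hypothetical row/column targets:

    seed  <- matrix(1, 3, 3)                  # seed table
    row_t <- c(20, 30, 50)                    # target row marginals
    col_t <- c(40, 40, 20)                    # target column marginals (same total)
    for (k in 1:50) {
      seed <- sweep(seed, 1, row_t / rowSums(seed), "*")  # scale rows to row targets
      seed <- sweep(seed, 2, col_t / colSums(seed), "*")  # scale columns to column targets
    }
    round(seed, 2)    # the table now agrees with both sets of marginals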
With the provision of several tools and templates, the MOSAIC project (DFG grant number HO 1937/2-1) supports the implementation of central data management in epidemiological research projects. The MOQA package enables epidemiologists with little or no experience in R to generate basic data quality reports for a wide range of application scenarios. See <https://mosaic-greifswald.de/> for more information. Please read and cite the corresponding open access publication (using the former package name) in Methods of Information in Medicine by M. Bialke, H. Rau, T. Schwaneberg, R. Walk, T. Bahls and W. Hoffmann (2017) <doi:10.3414/ME16-01-0123> <https://methods.schattauer.de/en/contents/most-recent-articles/issue/2483/issue/special/manuscript/27573/show.html>.
A multiple instance data set consists of many independent subjects (called bags), each composed of several components (called instances). The outcomes of such a data set are binary or categorical responses, and only the subject-level outcomes are observed. For example, in manufacturing processes, a subject is labeled "defective" if at least one of its components is defective, and "non-defective" otherwise. The milr package focuses on predictive models for multiple instance data with binary outcomes and performs maximum likelihood estimation with the Expectation-Maximization algorithm under the framework of logistic regression. Moreover, the LASSO penalty is attached to the likelihood function for simultaneous parameter estimation and variable selection.
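Under this "defective if any instance is defective" assumption, the bag-level probability follows directly from instance-level logistic probabilities; a toy sketch:

    set.seed(1)
    beta   <- c(-2, 1.5)                           # instance-level logistic coefficients
    bag_x  <- matrix(rnorm(5), ncol = 1)           # one bag with 5 instances, 1 covariate
    p_inst <- plogis(cbind(1, bag_x) %*% beta)     # P(each instance is defective)
    1 - prod(1 - p_inst)                           # P(the bag is labeled defective)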