Generalized Additive Model for Location, Scale and Shape (GAMLSS) with zero inflated beta (BEZI) family for analysis of microbiome relative abundance data (with various options for data transformation/normalization to address compositional effects) and random effects meta-analysis models for pooling estimates across microbiome studies are implemented. Random Forest model to predict microbiome age based on relative abundances of shared bacterial genera with the Bangladesh data (Subramanian et al. 2014), comparison of multiple diversity indexes using linear/linear mixed effect models and some data display/visualization are also implemented. The reference paper is published by Ho NT, Li F, Wang S, Kuhn L (2019) <doi:10.1186/s12859-019-2744-2>.
This package provides a fast implementation with additional experimental features for testing, monitoring and dating structural changes in (linear) regression models. strucchangeRcpp features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g. cumulative/moving sum, recursive/moving estimates) and F statistics, respectively. These methods are described in Zeileis et al. (2002) <doi:10.18637/jss.v007.i02>. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals, and their magnitude as well as the model fit can be evaluated using a variety of statistical measures.
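As a rough illustration of the F test (Chow test) idea mentioned above, the following Python sketch compares the residual sum of squares of one pooled regression with that of two separate regressions around a candidate break, and scans candidate breaks for the largest F statistic. It assumes a simple regression with one regressor plus intercept (k = 2 coefficients); the function names ols_rss, chow_f and sup_f are hypothetical and are not part of strucchangeRcpp.

```python
def ols_rss(x, y):
    """Residual sum of squares of the simple regression y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def chow_f(x, y, split, k=2):
    """Chow F statistic for one candidate break before index `split`.

    k is the number of regression coefficients (intercept and slope here).
    """
    rss_pooled = ols_rss(x, y)
    rss_split = ols_rss(x[:split], y[:split]) + ols_rss(x[split:], y[split:])
    n = len(x)
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))


def sup_f(x, y, trim=3):
    """Scan candidate breaks, keeping at least `trim` points per segment.

    Returns (best_split, largest_F).
    """
    candidates = range(trim, len(x) - trim + 1)
    return max(((s, chow_f(x, y, s)) for s in candidates), key=lambda t: t[1])
```

The largest F statistic over all candidates does not follow the usual F distribution, so critical values for it have to come from the dedicated theory referenced above rather than from a standard F table.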
This package defines a BigMatrix ReferenceClass which adds safety and convenience features to the filebacked.big.matrix class from the bigmemory package. BigMatrix protects against segfaults by monitoring and gracefully restoring the connection to on-disk data and it also protects against accidental data modification with a file-system-based permissions system. Utilities are provided for using BigMatrix-derived classes as assayData matrices within the Biobase package's eSet family of classes. BigMatrix provides some optimizations related to attaching to, and indexing into, file-backed matrices with dimnames. Additionally, the package provides a BigMatrixFactor class, a file-backed matrix with factor properties.
Total Time on Test plot and routines for parameter estimation of any lifetime distribution implemented in R via maximum likelihood (ML) given a data set. It was designed with parametric survival analysis in mind, but it can also be used to estimate the parameters of probability density or mass functions in any field. The main routines maxlogL and maxlogLreg are wrapper functions specifically developed for ML estimation. Optimization procedures such as nlminb and optim from the base package, and DEoptim, Mullen (2011) <doi:10.18637/jss.v040.i06>, are included. Standard errors are estimated with numDeriv, Gilbert (2011) <https://CRAN.R-project.org/package=numDeriv>, or the option Hessian = TRUE of the optim function.
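The Total Time on Test (TTT) plot named above can be illustrated with a short Python sketch of the standard scaled empirical TTT transform for a complete (uncensored) sample. The function name scaled_ttt is hypothetical and this is not the package's implementation; censoring is not handled.

```python
def scaled_ttt(sample):
    """Return the points (i/n, phi_i) of the scaled empirical TTT transform.

    For the ordered sample x_(1) <= ... <= x_(n), with x_(0) = 0,
    TTT_i = sum_{j=1..i} (n - j + 1) * (x_(j) - x_(j-1))
    and phi_i = TTT_i / TTT_n.
    """
    x = sorted(sample)
    n = len(x)
    running, previous, ttt = 0.0, 0.0, []
    for j, xj in enumerate(x, start=1):
        running += (n - j + 1) * (xj - previous)
        previous = xj
        ttt.append(running)
    total = ttt[-1]
    return [(0.0, 0.0)] + [(i / n, t / total) for i, t in enumerate(ttt, start=1)]
```

Plotting these points against the diagonal gives a quick visual hint about the hazard shape: a curve close to the diagonal is compatible with a constant hazard, while a concave curve suggests an increasing hazard.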
This package provides functions for working with primary event censored distributions and Stan implementations for use in Bayesian modeling. Primary event censored distributions are useful for modeling delayed reporting scenarios in epidemiology and other fields (Charniga et al. (2024) <doi:10.48550/arXiv.2405.08841>). It also provides support for arbitrary delay distributions, a range of common primary distributions, and allows for truncation and secondary event censoring to be accounted for (Park et al. (2024) <doi:10.1101/2024.01.12.24301247>). A subset of common distributions also have analytical solutions implemented, allowing for faster computation. In addition, it provides multiple methods for fitting primary event censored distributions to data via optional dependencies.
multiHiCcompare provides functions for joint normalization and difference detection in multiple Hi-C datasets. This extension of the original HiCcompare package now allows for Hi-C experiments with more than 2 groups and multiple samples per group. multiHiCcompare operates on processed Hi-C data in the form of sparse upper triangular matrices. It accepts four column (chromosome, region1, region2, IF) tab-separated text files storing chromatin interaction matrices. multiHiCcompare provides cyclic loess and fast loess (fastlo) methods adapted to jointly normalizing Hi-C data. Additionally, it provides a general linear model (GLM) framework adapting the edgeR package to detect differences in Hi-C data in a distance dependent manner.
The komacv-rg bundle provides packages that aid in creating CVs based on the komacv class and creating related documents, such as cover letters and cover sheets for job applications.
Concretely, the bundle consists of three packages: komacv-addons, komacv-lco, and komacv-multilang. komacv-addons is a small collection of add-ons and fixes for the komacv class; komacv-lco enables the use of letter class options from scrlttr2 also in komacv-based and other non-scrlttr2-based documents; komacv-multilang enables the provisioning of CVs in multiple languages and the selection of a language via Babel or Polyglossia.
The trigger strategy is a general framework for a multistage statistical design with multiple hypotheses, allowing an adaptive selection of interim analyses. The selection of interim stages can be associated with some prespecified endpoints which serve as the trigger. This selection allows us to refine the critical boundaries in hypothesis testing procedures, and potentially increase the statistical power. This package includes several trial designs using the trigger strategy. See Gou, J. (2023), "Trigger strategy in repeated tests on multiple hypotheses", Statistics in Biopharmaceutical Research, 15(1), 133-140, and Gou, J. (2022), "Sample size optimization and initial allocation of the significance levels in group sequential trials with multiple endpoints", Biometrical Journal, 64(2), 301-311.
This package provides hardware-accelerated tools for performing rerandomization and randomization testing in experimental research. Using a JAX backend, the package enables exact rerandomization inference even for large experiments with hundreds of billions of possible randomizations. Key functionalities include generating pools of acceptable rerandomizations based on covariate balance, conducting exact randomization tests, and performing pre-analysis evaluations to determine optimal rerandomization acceptance thresholds. The package supports various hardware acceleration frameworks including CPU, CUDA, and METAL, making it versatile across accelerated computing environments. This allows researchers to efficiently implement stringent rerandomization designs and conduct valid inference even with large sample sizes. The package is partly based on Jerzak and Goldstein (2023) <doi:10.48550/arXiv.2310.00861>.
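To make the two key steps above concrete (building a pool of acceptable rerandomizations, then running an exact randomization test over that pool), here is a toy Python sketch. It enumerates every assignment exhaustively, which is only feasible for a handful of units, and it uses a single covariate with a difference-in-means balance criterion as a simplifying assumption. All function names are hypothetical; this is not the package's API and does not use JAX.

```python
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def diff_in_means(values, treated):
    """Mean of `values` over treated unit indices minus mean over the rest."""
    t = [v for i, v in enumerate(values) if i in treated]
    c = [v for i, v in enumerate(values) if i not in treated]
    return mean(t) - mean(c)


def acceptable_pool(covariate, n_treated, threshold):
    """All assignments whose absolute covariate imbalance is within threshold."""
    units = range(len(covariate))
    return [frozenset(a) for a in combinations(units, n_treated)
            if abs(diff_in_means(covariate, a)) <= threshold]


def randomization_p_value(outcome, observed, pool):
    """Exact two-sided p-value under the sharp null of no treatment effect.

    The reference distribution is restricted to the acceptable pool, which
    is what keeps the test valid after rerandomization.
    """
    obs = abs(diff_in_means(outcome, observed))
    extreme = sum(abs(diff_in_means(outcome, a)) >= obs - 1e-12 for a in pool)
    return extreme / len(pool)
```

Because the observed assignment is itself a member of the pool, the smallest attainable p-value is 1 divided by the pool size, which is one reason the acceptance threshold should not be made too strict.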
Copula based Cox proportional hazards models for survival data subject to dependent censoring. This approach does not assume that the parameter defining the copula is known. The dependency parameter is estimated with other finite model parameters by maximizing a Pseudo likelihood function. The cumulative hazard function is estimated via estimating equations derived based on martingale ideas. Available copula functions include Frank, Gumbel and Normal copulas. Only Weibull and lognormal models are allowed for the censoring model, even though any parametric model that satisfies certain identifiability conditions could be used. Implemented methods are described in the article "Copula based Cox proportional hazards models for dependent censoring" by Deresa and Van Keilegom (2024) <doi:10.1080/01621459.2022.2161387>.
git-remote-gcrypt is a Git remote helper to push and pull from repositories encrypted with GnuPG. It works with the standard Git transports, including repository hosting services like GitLab.
Remote helper programs are invoked by Git to handle network transport. This helper handles gcrypt: URLs that access a remote repository encrypted with GPG, using our custom format.
Supported locations are local, rsync:// and sftp://, where the repository is stored as a set of files, or instead any Git URL where gcrypt will store the same representation in a Git repository, bridged over arbitrary Git transport.
The aim is to provide confidential, authenticated Git storage and collaboration using typical untrusted file hosts or services.
The extended neighbourhood rule for the k nearest neighbour ensemble where the neighbours are determined in k steps. Starting from the first nearest observation of the test point, the algorithm identifies a single observation that is closest to the observation at the previous step. At each base learner in the ensemble, this search is extended to k steps on a random bootstrap sample with a random subset of features selected from the feature space. The final predicted class of the test point is determined by using a majority vote in the predicted classes given by all base models. Amjad Ali, Muhammad Hamraz, Naz Gul, Dost Muhammad Khan, Saeed Aldahmani, Zardad Khan (2022) <doi:10.48550/arXiv.2205.15111>.
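The k-step search described above can be sketched in Python as follows. The sketch assumes squared Euclidean distance and assumes that each base learner predicts by majority vote among the k observations on its chain; both choices, and the function names, are illustrative assumptions rather than the package's implementation.

```python
import random
from collections import Counter


def sq_dist(a, b, features):
    """Squared Euclidean distance restricted to the selected feature indices."""
    return sum((a[f] - b[f]) ** 2 for f in features)


def exnrule_predict(X, y, query, k, features):
    """Follow a chain of k observations: nearest to the query, then nearest
    to that observation, and so on, never revisiting a row."""
    remaining = list(range(len(X)))
    current, labels = query, []
    for _ in range(min(k, len(remaining))):
        nearest = min(remaining, key=lambda i: sq_dist(X[i], current, features))
        labels.append(y[nearest])
        remaining.remove(nearest)
        current = X[nearest]
    return Counter(labels).most_common(1)[0][0]


def exnrule_ensemble(X, y, query, k=3, n_models=25, n_features=1, seed=0):
    """Majority vote over base learners, each built on a bootstrap sample
    and a random subset of features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    votes = []
    for _ in range(n_models):
        rows = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(p), n_features)      # random feature subset
        Xb, yb = [X[i] for i in rows], [y[i] for i in rows]
        votes.append(exnrule_predict(Xb, yb, query, k, feats))
    return Counter(votes).most_common(1)[0][0]
```

Unlike ordinary k nearest neighbours, only the first observation on the chain is chosen by its distance to the test point; every later one is chosen by its distance to the previously selected observation.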
Non-proportional hazard (NPH) is commonly observed in immuno-oncology studies, where the survival curves of the treatment and control groups show delayed separation. To properly account for NPH, several statistical methods have been developed. One such method is the Max-Combo test, which is a straightforward and flexible hypothesis testing method that can simultaneously test for constant, early, middle, and late treatment effects. However, most Max-Combo tests performed in clinical studies are unstratified, ignoring important prognostic stratification factors. To fill this gap, we have developed an R package for stratified Max-Combo testing that accounts for stratified baseline factors. Our package explores various methods for calculating combined test statistics, estimating joint distributions, and determining the p-values.
A nomogram, which can be drawn with the rms package, provides a graphical explanation of a prediction process. However, it is not easy to draw straight lines and read points and probabilities accurately, and it is hard for users to calculate total points and probabilities for all subjects. This package provides the formula_rd() and formula_lp() functions to fit the formula of total points with raw data and linear predictors, respectively, by polynomial regression. The function points_cal() calculates the total points. prob_cal() can be used to calculate the probabilities after lrm(), cph() or psm() regression. For more complex conditions, such as interactions or restricted cubic splines, TotalPoints.rms() can be used.
This package implements structural estimators to correct for the sample selection bias from observed outcomes in matching markets. This includes one-sided matching of agents into groups (Klein, 2015) <https://www.econ.cam.ac.uk/research-files/repec/cam/pdf/cwpe1521.pdf> as well as two-sided matching of students to schools (Aue et al., 2020) <https://ftp.zew.de/pub/zew-docs/dp/dp20032.pdf>. The package also contains algorithms to find stable matchings in the three most common matching problems: the stable roommates problem (Irving, 1985) <doi:10.1016/0196-6774(85)90033-1>, the college admissions problem (Gale and Shapley, 1962) <doi:10.2307/2312726>, and the house allocation problem (Shapley and Scarf, 1974) <doi:10.1016/0304-4068(74)90033-0>.
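For the college admissions problem cited above, the classic student-proposing deferred acceptance procedure of Gale and Shapley can be sketched in a few lines of Python. The function name and the dictionary-based input format are assumptions made for illustration and do not reflect the package's interface.

```python
def deferred_acceptance(student_prefs, college_prefs, capacity):
    """Student-proposing deferred acceptance.

    student_prefs: {student: [colleges, most preferred first]}
    college_prefs: {college: [acceptable students, most preferred first]}
    capacity:      {college: number of seats}
    Returns {college: [held students, most preferred first]}.
    """
    rank = {c: {s: r for r, s in enumerate(prefs)}
            for c, prefs in college_prefs.items()}
    next_choice = {s: 0 for s in student_prefs}
    held = {c: [] for c in college_prefs}
    free = list(student_prefs)
    while free:
        s = free.pop()
        if next_choice[s] >= len(student_prefs[s]):
            continue                      # list exhausted: s stays unmatched
        c = student_prefs[s][next_choice[s]]
        next_choice[s] += 1
        if s not in rank[c]:
            free.append(s)                # c finds s unacceptable
            continue
        held[c].append(s)
        held[c].sort(key=rank[c].get)
        if len(held[c]) > capacity[c]:
            free.append(held[c].pop())    # reject the least preferred student
    return held
```

Acceptances stay tentative until no student has a proposal left to make, which is what guarantees that the final matching is stable.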
This package implements two iterative techniques called T3Clus and 3FKMeans, aimed at simultaneously clustering objects and a factorial dimensionality reduction of variables and occasions on three-mode datasets developed by Vichi et al. (2007) <doi:10.1007/s00357-007-0006-x>. Also, we provide a convex combination of these two simultaneous procedures called CT3Clus and based on a hyperparameter alpha (alpha in [0,1], with 3FKMeans for alpha=0 and T3Clus for alpha=1) also developed by Vichi et al. (2007) <doi:10.1007/s00357-007-0006-x>. Furthermore, we implemented the traditional tandem procedures of T3Clus (TWCFTA) and 3FKMeans (TWFCTA) for sequential clustering-factorial decomposition (TWCFTA), and vice-versa (TWFCTA) proposed by P. Arabie and L. Hubert (1996) <doi:10.1007/978-3-642-79999-0_1>.
Allows the user to estimate transition probabilities for migratory animals between any two phases of the annual cycle, using a variety of different data types. Also quantifies the strength of migratory connectivity (MC), a standardized metric to quantify the extent to which populations co-occur between two phases of the annual cycle. Includes functions to estimate MC and the more traditional metric of migratory connectivity strength (Mantel correlation) incorporating uncertainty from multiple sources of sampling error. For cross-species comparisons, methods are provided to estimate differences in migratory connectivity strength, incorporating uncertainty. See Cohen et al. (2018) <doi:10.1111/2041-210X.12916>, Cohen et al. (2019) <doi:10.1111/ecog.03974>, and Roberts et al. (2023) <doi:10.1002/eap.2788> for details on some of these methods.
Facilitates some of the analyses performed in studies of behavioral economic discounting. The package supports scoring of the 27-Item Monetary Choice Questionnaire (see Kaplan et al., 2016; <doi:10.1007/s40614-016-0070-9>), calculating k values (Mazur's simple hyperbolic and exponential) using nonlinear regression, calculating various Area Under the Curve (AUC) measures, plotting regression curves for both fit-to-group and two-stage approaches, checking for unsystematic discounting (Johnson & Bickel, 2008; <doi:10.1037/1064-1297.16.3.264>) and scoring of the minute discounting task (see Koffarnus & Bickel, 2014; <doi:10.1037/a0035973>) using the Qualtrics 5-trial discounting template (see the Qualtrics Minute Discounting User Guide; <doi:10.13140/RG.2.2.26495.79527>), which is also available as a .qsf file in this package.
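The two discounting models and a basic Area Under the Curve measure named above can be written down directly. The Python sketch below is a minimal illustration with hypothetical function names; it implements only the simplest normalized trapezoidal AUC and does no curve fitting, so it should not be read as the package's scoring routines.

```python
import math


def hyperbolic_value(amount, delay, k):
    """Mazur's simple hyperbolic model: V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)


def exponential_value(amount, delay, k):
    """Exponential model: V = A * exp(-k * D)."""
    return amount * math.exp(-k * delay)


def auc(delays, indifference_points, amount):
    """Normalized trapezoidal area under the discounting curve.

    Delays must be sorted in increasing order. Delays are scaled by the
    largest delay and values by the undiscounted amount, so the result
    lies between 0 (steep discounting) and 1 (no discounting).
    """
    xs = [d / max(delays) for d in delays]
    ys = [v / amount for v in indifference_points]
    return sum((x2 - x1) * (y1 + y2) / 2
               for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:]))
```

A larger k means steeper discounting in both models, whereas the AUC summarizes the observed indifference points without assuming either functional form.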
Supports the quantitative analysis of binary welfare based decision making processes using Monte Carlo simulations. Decision support is given on two levels: (i) The actual decision level is to choose between two alternatives under probabilistic uncertainty. This package calculates the optimal decision based on maximizing expected welfare. (ii) The meta decision level is to allocate resources to reduce the uncertainty in the underlying decision problem, i.e., to increase the current information to improve the actual decision making process. This problem is dealt with using the Value of Information Analysis. The Expected Value of Information for arbitrary prospective estimates can be calculated as well as Individual Expected Value of Perfect Information. The probabilistic calculations are done via Monte Carlo simulations. This Monte Carlo functionality can be used on its own.
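The two decision levels can be illustrated with a small Monte Carlo sketch in Python. It assumes the second alternative is a status quo with zero net welfare, and the welfare model inside simulate_welfare is entirely made up for illustration. Only the overall Expected Value of Perfect Information is shown, not the individual variant, and none of the names correspond to the package's functions.

```python
import random


def simulate_welfare(n, seed=0):
    """Hypothetical model: net welfare of acting = uncertain benefit - cost."""
    rng = random.Random(seed)
    return [rng.gauss(10.0, 8.0) - rng.uniform(5.0, 12.0) for _ in range(n)]


def decide(net_welfare):
    """Actual decision level: act only if expected net welfare is positive."""
    expected = sum(net_welfare) / len(net_welfare)
    return ("act" if expected > 0 else "status quo"), expected


def evpi(net_welfare):
    """Meta decision level: expected value of perfect information.

    With perfect information the better alternative is chosen in every
    simulated state; without it, one alternative is chosen for all states.
    """
    n = len(net_welfare)
    with_info = sum(max(w, 0.0) for w in net_welfare) / n
    without_info = max(sum(net_welfare) / n, 0.0)
    return with_info - without_info
```

The result is never negative and equals zero when one alternative is better in every simulated state, in which case further information could not change the decision.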
This package provides a function for a distribution free control chart based on the change point model, for multivariate statistical process control. The main constituent of the chart is the energy test that focuses on the discrepancy between empirical characteristic functions of two random vectors. This new control chart stands out in four respects. First, it is distribution free, requiring no knowledge of the random processes. Second, this control chart can monitor mean and variance simultaneously. Third, it is devised for multivariate time series, which is more practical in real data applications. Fourth, it is designed for online detection (Phase II), which is central for real time surveillance of stream data. For more information please refer to O. Okhrin and Y.F. Xu (2017) <https://github.com/YafeiXu/working_paper/raw/master/CPM102.pdf>.
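The energy test mentioned above builds on the energy distance between two samples, which can be computed from pairwise Euclidean distances alone. The Python sketch below shows that sample quantity only; the function names are hypothetical, and the change point scan and control limits used by the chart are not reproduced here.

```python
import math


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (u, v), u in a, v in b."""
    return sum(math.dist(u, v) for u in a for v in b) / (len(a) * len(b))


def energy_distance(x, y):
    """Sample energy distance 2*E|X-Y| - E|X-X'| - E|Y-Y'|.

    x and y are lists of equal-length coordinate tuples. The value is
    zero for identical samples and grows as the samples differ in
    location or spread.
    """
    return (2.0 * mean_pairwise_distance(x, y)
            - mean_pairwise_distance(x, x)
            - mean_pairwise_distance(y, y))
```

Since only distances between observations enter the statistic, no distributional form has to be specified, which fits the distribution free property listed above.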
The graphicx package provides a useful keyword viewport, which allows showing just a part of an image. However, one needs to supply the actual coordinates of the viewport window. Sometimes it is useful to have relative coordinates as fractions of the natural size. For example, one may want to print a large image on a spread, putting one half on a verso page and the other half on the next recto page. For this one would need a viewport occupying exactly one half of the file's bounding box, whatever the actual width of the image may be. This package adds a new keyword rviewport to the graphicx package specifying a relative viewport for graphics inclusion: a window defined by the given fractions of the natural width and height of the image.
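The arithmetic behind a relative viewport is a simple scaling of fractions by the natural size. The Python sketch below illustrates that calculation only; the function name is hypothetical, and the assumption that the four fractions are given in the order lower-left x, lower-left y, upper-right x, upper-right y mirrors the order used by the viewport keyword.

```python
def relative_viewport(natural_width, natural_height, llx, lly, urx, ury):
    """Convert fractional corners into absolute viewport coordinates.

    The fractions are multiplied by the natural width (x values) and the
    natural height (y values) of the image.
    """
    return (llx * natural_width, lly * natural_height,
            urx * natural_width, ury * natural_height)


# Left and right halves of an image for a two-page spread,
# assuming a natural size of 600 by 400 units.
left_half = relative_viewport(600, 400, 0, 0, 0.5, 1)
right_half = relative_viewport(600, 400, 0.5, 0, 1, 1)
```

The same two sets of fractions work for an image of any size, which is the point of specifying the window relatively.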
Implementation of a cross-validation method for testing the forecasting accuracy of several multi-population mortality models. The family of multi-population models includes several models proposed in the actuarial and demographic literature. The package includes functions for fitting and forecasting the mortality rates of several populations. Additionally, we include functions for testing the forecasting accuracy of different multi-population models. References. Atance, D., Debon, A., and Navarro, E. (2020) <doi:10.3390/math8091550>. Bergmeir, C. & Benitez, J.M. (2012) <doi:10.1016/j.ins.2011.12.028>. Debon, A., Montes, F., & Martinez-Ruiz, F. (2011) <doi:10.1007/s13385-011-0043-z>. Lee, R.D. & Carter, L.R. (1992) <doi:10.1080/01621459.1992.10475265>. Russolillo, M., Giordano, G., & Haberman, S. (2011) <doi:10.1080/03461231003611933>. Santolino, M. (2023) <doi:10.3390/risks11100170>.
The BloodGen3Module package provides functions for R users performing module repertoire analyses and generating fingerprint representations. Functions can perform group comparison or individual sample analysis and visualization by fingerprint grid plot or fingerprint heatmap. Module repertoire analyses typically involve determining the percentage of the constitutive genes for each module that are significantly increased or decreased. As described in detail in <https://www.biorxiv.org/content/10.1101/525709v2> and <https://pubmed.ncbi.nlm.nih.gov/33624743/>, the results of module repertoire analyses can be represented in a fingerprint format, where red and blue spots indicate increases or decreases in module activity. These spots are subsequently represented either on a grid, with each position being assigned to a given module, or in a heatmap where the samples are arranged in columns and the modules in rows.
Surrounds the usual sample variance of a univariate numeric sample with a confidence interval for the population variance. This has been done so far only under the assumption that the underlying distribution is normal. Under the hood, this package implements the unique least-variance unbiased estimator of the variance of the sample variance, in a formula that is equivalent to estimating kurtosis and square of the population variance in an unbiased way and combining them according to the classical formula into an estimator of the variance of the sample variance. Both the sample variance and the estimator of its variance are U-statistics. By the theory of U-statistics, the resulting estimator is unique. See Fuchs, Krautenbacher (2016) <doi:10.1080/15598608.2016.1158675> and the references therein for an overview of unbiased estimation of variances of U-statistics.