Metapackage for implementing a variety of event-based models, with a focus on spatially explicit models. These include raster-based, event-based, and agent-based models. The core simulation components (provided by SpaDES.core') are built upon a discrete event simulation (DES; see Matloff (2011) ch 7.8.3 <https://nostarch.com/artofr.htm>) framework that facilitates modularity, and easily enables the user to include additional functionality by running user-built simulation modules (see also SpaDES.tools'). Included are numerous tools to visualize rasters and other maps (via quickPlot'), and caching methods for reproducible simulations (via reproducible'). Tools for running simulation experiments are provided by SpaDES.experiment'. Additional functionality is provided by the SpaDES.addins and SpaDES.shiny packages.
Likelihood-based estimation of mixed-effects transformation models using the Template Model Builder ('TMB', Kristensen et al., 2016) <doi:10.18637/jss.v070.i05>. The technical details of transformation models are given in Hothorn et al. (2018) <doi:10.1111/sjos.12291>. Likelihood contributions of exact, randomly censored (left, right, interval) and truncated observations are supported. The random effects are assumed to be normally distributed on the scale of the transformation function, the marginal likelihood is evaluated using the Laplace approximation, and the gradients are calculated with automatic differentiation (Tamasi & Hothorn, 2021) <doi:10.32614/RJ-2021-075>. Penalized smooth shift terms can be defined using the mgcv notation. Additive mixed-effects transformation models are described in Tamasi (2025) <doi:10.18637/jss.v114.i11>.
This package provides a collection of randomization tests, data sets and examples. The current version focuses on five testing problems and their implementation in empirical work. First, it facilitates the empirical researcher to test for particular hypotheses, such as comparisons of means, medians, and variances from k populations using robust permutation tests, which asymptotic validity holds under very weak assumptions, while retaining the exact rejection probability in finite samples when the underlying distributions are identical. Second, the description and implementation of a permutation test for testing the continuity assumption of the baseline covariates in the sharp regression discontinuity design (RDD) as in Canay and Kamat (2018) <https://goo.gl/UZFqt7>. More specifically, it allows the user to select a set of covariates and test the aforementioned hypothesis using a permutation test based on the Cramer-von Misses test statistic. Graphical inspection of the empirical CDF and histograms for the variables of interest is also supported in the package. Third, it provides the practitioner with an effortless implementation of a permutation test based on the martingale decomposition of the empirical process for testing for heterogeneous treatment effects in the presence of an estimated nuisance parameter as in Chung and Olivares (2021) <doi:10.1016/j.jeconom.2020.09.015>. Fourth, this version considers the two-sample goodness-of-fit testing problem under covariate adaptive randomization and implements a permutation test based on a prepivoted Kolmogorov-Smirnov test statistic. Lastly, it implements an asymptotically valid permutation test based on the quantile process for the hypothesis of constant quantile treatment effects in the presence of an estimated nuisance parameter.
Chromatin looping is an essential feature of eukaryotic genomes and can bring regulatory sequences, such as enhancers or transcription factor binding sites, in the close physical proximity of regulated target genes. Here, we provide sevenC, an R package that uses protein binding signals from ChIP-seq and sequence motif information to predict chromatin looping events. Cross-linking of proteins that bind close to loop anchors result in ChIP-seq signals at both anchor loci. These signals are used at CTCF motif pairs together with their distance and orientation to each other to predict whether they interact or not. The resulting chromatin loops might be used to associate enhancers or transcription factor binding sites (e.g., ChIP-seq peaks) to regulated target genes.
Estimates the probability matrix for the RÃ C Ecological Inference problem using the Expectation-Maximization Algorithm with four approximation methods for the E-Step, and an exact method as well. It also provides a bootstrap function to estimate the standard deviation of the estimated probabilities. In addition, it has functions that aggregate rows optimally to have more reliable estimates in cases of having few data points. For comparing the probability estimates of two groups, a Wald test routine is implemented. The library has data from the first round of the Chilean Presidential Election 2021 and can also generate synthetic election data. Methods described in Thraves, Charles; Ubilla, Pablo; Hermosilla, Daniel (2024) A Fast Ecological Inference Algorithm for the RÃ C case <doi:10.2139/ssrn.4832834>.
The gRbase package provides graphical modelling features used by e.g. the packages gRain', gRim and gRc'. gRbase implements graph algorithms including (i) maximum cardinality search (for marked and unmarked graphs). (ii) moralization, (iii) triangulation, (iv) creation of junction tree. gRbase facilitates array operations, gRbase implements functions for testing for conditional independence. gRbase illustrates how hierarchical log-linear models may be implemented and describes concept of graphical meta data. The facilities of the package are documented in the book by Højsgaard, Edwards and Lauritzen (2012, <doi:10.1007/978-1-4614-2299-0>) and in the paper by Dethlefsen and Højsgaard, (2005, <doi:10.18637/jss.v014.i17>). Please see citation("gRbase") for citation details.
This package provides a complete framework for frequency analysis is provided by LMoFit'. It has functions related to the determination of sample L-moments as in Hosking, J.R.M. (1990) <doi:10.1111/j.2517-6161.1990.tb01775.x>, the fitting of various distributions as in Zaghloul et al. (2020) <doi:10.1016/j.advwatres.2020.103720> and Hosking, J.R.M. (2019) <https://CRAN.R-project.org/package=lmom>, besides plotting and manipulating L-space diagrams as in Papalexiou, S.M. & Koutsoyiannis, D. (2016) <doi:10.1016/j.advwatres.2016.05.005> for two-shape parametric distributions on the L-moment ratio diagram. Additionally, the quantile, probability density, and cumulative probability functions of various distributions are provided in a user-friendly manner.
Statistical decision in proteomics data using a hierarchical Bayesian model. There are two regression models for describing the mean-variance trend, a gamma regression or a latent gamma mixture regression. The regression model is then used as an Empirical Bayes estimator for the prior on the variance in a peptide. Further, it assumes that each measurement has an uncertainty (increased variance) associated with it that is also inferred. Finally, it tries to estimate the posterior distribution (by Hamiltonian Monte Carlo) for the differences in means for each peptide in the data. Once the posterior is inferred, it integrates the tails to estimate the probability of error from which a statistical decision can be made. See Berg and Popescu for details (<doi:10.1101/2023.05.11.540411>).
Fits a geographically weighted regression model with different scales for each covariate. Uses the negative binomial distribution as default, but also accepts the normal, Poisson, or logistic distributions. Can fit the global versions of each regression and also the geographically weighted alternatives with only one scale, since they are all particular cases of the multiscale approach. Hanchen Yu (2024). "Exploring Multiscale Geographically Weighted Negative Binomial Regression", Annals of the American Association of Geographers <doi:10.1080/24694452.2023.2289986>. Fotheringham AS, Yang W, Kang W (2017). "Multiscale Geographically Weighted Regression (MGWR)", Annals of the American Association of Geographers <doi:10.1080/24694452.2017.1352480>. Da Silva AR, Rodrigues TCV (2014). "Geographically Weighted Negative Binomial Regression - incorporating overdispersion", Statistics and Computing <doi:10.1007/s11222-013-9401-9>.
Structural multivariate-univariate linear mixed model solver for estimation of multiple random effects with unknown variance-covariance structures (e.g., heterogeneous and unstructured) and known covariance among levels of random effects (e.g., pedigree and genomic relationship matrices) (Covarrubias-Pazaran, 2016 <doi:10.1371/journal.pone.0156744>; Maier et al., 2015 <doi:10.1016/j.ajhg.2014.12.006>; Jensen et al., 1997). REML estimates can be obtained using the Direct-Inversion Newton-Raphson and Direct-Inversion Average Information algorithms for the problems r x r (r being the number of records) or using the Henderson-based average information algorithm for the problem c x c (c being the number of coefficients to estimate). Spatial models can also be fitted using the two-dimensional spline functionality available.
Analysis of multivariate environmental high frequency data by Self-Organizing Map and k-means clustering algorithms. By means of the graphical user interface it provides a comfortable way to elaborate by self-organizing map algorithm rather big datasets (txt files up to 100 MB ) obtained by environmental high-frequency monitoring by sensors/instruments. The functions present in the package are based on kohonen and openair packages implemented by functions embedding Vesanto et al. (2001) <http://www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf> heuristic rules for map initialization parameters, k-means clustering algorithm and map features visualization. Cluster profiles visualization as well as graphs dedicated to the visualization of time-dependent variables Licen et al. (2020) <doi:10.4209/aaqr.2019.08.0414> are provided.
This package implements methods for variable selection in linear regression based on the "Sum of Single Effects" (SuSiE) model, as described in Wang et al (2020) <DOI:10.1101/501114> and Zou et al (2021) <DOI:10.1101/2021.11.03.467167>. These methods provide simple summaries, called "Credible Sets", for accurately quantifying uncertainty in which variables should be selected. The methods are motivated by genetic fine-mapping applications, and are particularly well-suited to settings where variables are highly correlated and detectable effects are sparse. The fitting algorithm, a Bayesian analogue of stepwise selection methods called "Iterative Bayesian Stepwise Selection" (IBSS), is simple and fast, allowing the SuSiE model be fit to large data sets (thousands of samples and hundreds of thousands of variables).
This package creates the optimal (D, U and I) designs for the accelerated life testing with right censoring or interval censoring. It uses generalized linear model (GLM) approach to derive the asymptotic variance-covariance matrix of regression coefficients. The failure time distribution is assumed to follow Weibull distribution with a known shape parameter and log-linear link functions are used to model the relationship between failure time parameters and stress variables. The acceleration model may have multiple stress factors, although most ALTs involve only two or less stress factors. ALTopt package also provides several plotting functions including contour plot, Fraction of Use Space (FUS) plot and Variance Dispersion graphs of Use Space (VDUS) plot. For more details, see Seo and Pan (2015) <doi:10.32614/RJ-2015-029>.
This package provides a statistical framework and computational procedure for identifying the sub-populations within a tumor, determining the mutation profiles of each subpopulation, and inferring the tumor's phylogenetic history. The input are variant allele frequencies (VAFs) of somatic single nucleotide alterations (SNAs) along with allele-specific coverage ratios between the tumor and matched normal sample for somatic copy number alterations (CNAs). These quantities can be directly taken from the output of existing software. Canopy provides a general mathematical framework for pooling data across samples and sites to infer the underlying parameters. For SNAs that fall within CNA regions, Canopy infers their temporal ordering and resolves their phase. When there are multiple evolutionary configurations consistent with the data, Canopy outputs all configurations along with their confidence assessment.
In statistical modeling, there is a wide variety of regression models for categorical dependent variables (nominal or ordinal data); yet, there is no software embracing all these models together in a uniform and generalized format. Following the methodology proposed by Peyhardi, Trottier, and Guédon (2015) <doi:10.1093/biomet/asv042>, we introduce GLMcat', an R package to estimate generalized linear models implemented under the unified specification (r, F, Z). Where r represents the ratio of probabilities (reference, cumulative, adjacent, or sequential), F the cumulative cdf function for the linkage, and Z, the design matrix. The package accompanies the paper "GLMcat: An R Package for Generalized Linear Models for Categorical Responses" in the Journal of Statistical Software, Volume 114, Issue 9 (see <doi:10.18637/jss.v114.i09>).
Integration of disparate datasets is needed in order to make efficient use of all available data and thereby address the issues currently threatening biodiversity. Data integration is a powerful modeling framework which allows us to combine these datasets together into a single model, yet retain the strengths of each individual dataset. We therefore introduce the package, intSDM': an R package designed to help ecologists develop a reproducible workflow of integrated species distribution models, using data both provided from the user as well as data obtained freely online. An introduction to data integration methods is discussed in Issac, Jarzyna, Keil, Dambly, Boersch-Supan, Browning, Freeman, Golding, Guillera-Arroita, Henrys, Jarvis, Lahoz-Monfort, Pagel, Pescott, Schmucki, Simmonds and Oâ Hara (2020) <doi:10.1016/j.tree.2019.08.006>.
Analysis of repeated measurements and time-to-event data via random effects joint models. Fits the joint models proposed by Henderson and colleagues <doi:10.1093/biostatistics/1.4.465> (single event time) and by Williamson and colleagues (2008) <doi:10.1002/sim.3451> (competing risks events time) to a single continuous repeated measure. The time-to-event data is modelled using a (cause-specific) Cox proportional hazards regression model with time-varying covariates. The longitudinal outcome is modelled using a linear mixed effects model. The association is captured by a latent Gaussian process. The model is estimated using am Expectation Maximization algorithm. Some plotting functions and the variogram are also included. This project is funded by the Medical Research Council (Grant numbers G0400615 and MR/M013227/1).
This package provides a toolkit for simulation studies concerning time-to-event endpoints with non-proportional hazards. SimNPH encompasses functions for simulating time-to-event data in various scenarios, simulating different trial designs like fixed-followup, event-driven, and group sequential designs. The package provides functions to calculate the true values of common summary statistics for the implemented scenarios and offers common analysis methods for time-to-event data. Helper functions for running simulations with the SimDesign package and for aggregating and presenting the results are also included. Results of the conducted simulation study are available in the paper: "A Comparison of Statistical Methods for Time-To-Event Analyses in Randomized Controlled Trials Under Non-Proportional Hazards", Klinglmüller et al. (2025) <doi:10.1002/sim.70019>.
EpiMix is a comprehensive tool for the integrative analysis of high-throughput DNA methylation data and gene expression data. EpiMix enables automated data downloading (from TCGA or GEO), preprocessing, methylation modeling, interactive visualization and functional annotation.To identify hypo- or hypermethylated CpG sites across physiological or pathological conditions, EpiMix uses a beta mixture modeling to identify the methylation states of each CpG probe and compares the methylation of the experimental group to the control group.The output from EpiMix is the functional DNA methylation that is predictive of gene expression. EpiMix incorporates specialized algorithms to identify functional DNA methylation at various genetic elements, including proximal cis-regulatory elements of protein-coding genes, distal enhancers, and genes encoding microRNAs and lncRNAs.
This is the human disease ontology R package HDO.db, which provides the semantic relationship between human diseases. Relying on the DOSE and GOSemSim packages, this package can carry out disease enrichment and semantic similarity analyses. Many biological studies are achieved through mouse models, and a large number of data indicate the association between genotypes and phenotypes or diseases. The study of model organisms can be transformed into useful knowledge about normal human biology and disease to facilitate treatment and early screening for diseases. Organism-specific genotype-phenotypic associations can be applied to cross-species phenotypic studies to clarify previously unknown phenotypic connections in other species. Using the same principle to diseases can identify genetic associations and even help to identify disease associations that are not obvious.
Large panel data sets are often subject to common trends. However, it can be difficult to determine the exact number of these common factors and analyse their properties. The package implements the Barigozzi and Trapani (2022) <doi:10.1080/07350015.2021.1901719> test, which not only provides an efficient way of estimating the number of common factors in large nonstationary panel data sets, but also gives further insights on factor classes. The routine identifies the existence of (i) a factor subject to a linear trend, (ii) the number of zero-mean I(1) and (iii) zero-mean I(0) factors. Furthermore, the package includes the Integrated Panel Criteria by Bai (2004) <doi:10.1016/j.jeconom.2003.10.022> that provide a complementary measure for the number of factors.
This package performs copy number variants association analysis with Lasso and Weighted Fusion penalized regression. Creates a "CNV profile curve" to represent an individualâ s CNV events across a genomic region so to capture variations in CNV length and dosage. When evaluating association, the CNV profile curve is directly used as a predictor in the regression model, avoiding the need to predefine CNV loci. CNV profile regression estimates CNV effects at each genome position, making the results comparable across different studies. The penalization encourages sparsity in variable selection with a Lasso penalty and encourages effect smoothness between consecutive CNV events with a weighted fusion penalty, where the weight controls the level of smoothing between adjacent CNVs. For more details, see Si (2024) <doi:10.1101/2024.11.23.624994>.
Package for analysis of simple experimental designs (CRD, RBD and LSD), experiments in double factorial schemes (in CRD and RBD), experiments in a split plot in time schemes (in CRD and RBD), experiments in double factorial schemes with an additional treatment (in CRD and RBD), experiments in triple factorial scheme (in CRD and RBD) and experiments in triple factorial schemes with an additional treatment (in CRD and RBD), performing the analysis of variance and means comparison by fitting regression models until the third power (quantitative treatments) or by a multiple comparison test, Tukey test, test of Student-Newman-Keuls (SNK), Scott-Knott, Duncan test, t test (LSD) and Bonferroni t test (protected LSD) - for qualitative treatments; residual analysis (Ferreira, Cavalcanti and Nogueira, 2014) <doi:10.4236/am.2014.519280>.
It is used to travel graphs, by using DFS and BFS to get the path from node to each leaf node. Depth first traversal(DFS) is a recursive algorithm for searching all the vertices of a graph or tree data structure. Traversal means visiting all the nodes of a graph. Breadth first traversal(BFS) algorithm is used to search a tree or graph data structure for a node that meets a set of criteria. It starts at the treeâ s root or graph and searches/visits all nodes at the current depth level before moving on to the nodes at the next depth level. Also, it provides the matrix which is reachable between each node. Implement reference about Baruch Awerbuch (1985) <doi:10.1016/0020-0190(85)90083-3>.