This package provides a new set of tools to help with the development of detailed ecological niche models using multiple algorithms. Pre-modeling analyses and explorations can be done to prepare data. Model calibration (model selection) can be done by creating and testing models with several parameter combinations. Convenient options for producing final models and their transfers are included. Other tools to assess extrapolation risks and variability in model transfers are also available. The methodological and theoretical bases for the methods implemented here can be found in: Peterson et al. (2011) <https://www.degruyter.com/princetonup/view/title/506966>, Radosavljevic and Anderson (2014) <doi:10.1111/jbi.12227>, Peterson et al. (2018) <doi:10.1111/nyas.13873>, Cobos et al. (2019) <doi:10.7717/peerj.6281>, Alkishe et al. (2020) <doi:10.1016/j.pecon.2020.03.002>, Machado-Stredel et al. (2021) <doi:10.21425/F5FBG48814>, Arias-Giraldo and Cobos (2024) <doi:10.17161/bi.v18i.21742>, Cobos et al. (2024) <doi:10.17161/bi.v18i.21742>.
Estimate complex Structural Equation Models (SEMs) by fitting Partial Least Squares Structural Equation Modeling (PLS-SEM) and Partial Least Squares consistent Structural Equation Modeling (PLSc-SEM) specifications that handle categorical data, non-linear relations, and multilevel structures. The implementation follows Lohmöller (1989) for the classic PLS-SEM algorithm, Dijkstra and Henseler (2015) for consistent PLSc-SEM, Dijkstra and Schermelleh-Engel (2014) for nonlinear PLSc-SEM, and Schuberth, Henseler, and Dijkstra (2018) for ordinal PLS-SEM and PLSc-SEM. Additional extensions are under development. References: Lohmöller, J.-B. (1989, ISBN:9783790803002). "Latent Variable Path Modeling with Partial Least Squares." Dijkstra, T. K., & Henseler, J. (2015). <doi:10.1016/j.jmva.2015.06.002>. "Consistent partial least squares path modeling." Dijkstra, T. K., & Schermelleh-Engel, K. (2014). <doi:10.1016/j.csda.2014.07.008>. "Consistent partial least squares for nonlinear structural equation models." Schuberth, F., Henseler, J., & Dijkstra, T. K. (2018). <doi:10.1007/s11135-018-0767-9>. "Partial least squares path modeling using ordinal categorical indicators."
This package provides the vcd2df function, which loads an IEEE 1364-1995/2001 VCD (.vcd) file, given as a single string argument containing the file path, and returns an R dataframe containing values over time. A VCD file captures the register values at discrete time points from a simulated execution trace of a hardware design in Verilog or VHDL. The returned dataframe contains a row for each register, by name, and a column for each time point, labeled VCD-style as strings of octothorpe-prefixed multiples of the timescale (for example, "#10"). The only non-trivial implementation details are that (1) the non-numerical VCD values x and z are encoded as -1, since all other values are non-negative, and (2) registers whose names repeat in distinct modules are ignored rather than duplicated, on the assumption that they hold the same values. Read more in the arXiv preprint: vcd2df -- Leveraging Data Science Insights for Hardware Security Research <doi:10.48550/arXiv.2505.06470>.
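The two parsing rules above (x/z mapped to -1, repeated register names ignored) can be illustrated with a minimal Python sketch. It is not vcd2df's R code: the functions parse_vcd, _value and _record are hypothetical, only $var declarations, # timestamps, and scalar and b-prefixed vector value changes are handled, and the result stores only the values that change at each timestamp instead of carrying earlier values forward into every column.

```python
def _value(bits):
    """Convert a VCD bit string to an int; any x/z bit maps the whole value to -1."""
    if any(c in "xXzZ" for c in bits):
        return -1
    return int(bits, 2)


def _record(table, id_to_name, code, time_label, bits):
    """Store a value change if its identifier code belongs to a kept register."""
    name = id_to_name.get(code)
    if name is not None:
        table[name][time_label] = _value(bits)


def parse_vcd(text):
    """Return {register name: {"#time": value}} built from the value changes in text."""
    tokens = text.split()
    id_to_name = {}   # VCD identifier code -> register name
    table = {}        # register name -> {time label: value}
    current = None    # current time label such as "#10"; None while in the header
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "$var":
            # $var <type> <size> <code> <name> ... $end
            code, name = tokens[i + 3], tokens[i + 4]
            if name not in table:  # a repeated name in another module is ignored
                id_to_name[code] = name
                table[name] = {}
            while tokens[i] != "$end":
                i += 1
        elif tok.startswith("#"):
            current = tok
        elif current is not None and tok[0] in "bB":
            # vector change: b<bits> <code>
            _record(table, id_to_name, tokens[i + 1], current, tok[1:])
            i += 1
        elif current is not None and len(tok) > 1 and tok[0] in "01xXzZ":
            # scalar change: <bit><code>
            _record(table, id_to_name, tok[1:], current, tok[0])
        i += 1
    return table
```

A real VCD reader would also need to handle real-valued changes, $comment blocks and forward-filling of unchanged values, which this sketch leaves out.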
The TRONCO (TRanslational ONCOlogy) R package collects algorithms to infer progression models via the approach of Suppes-Bayes Causal Networks, both from an ensemble of tumors (cross-sectional samples) and within an individual patient (multi-region or single-cell samples). The package provides parallel implementations of algorithms that process binary matrices in which each row represents a tumor sample and each column a single-nucleotide or structural variant driving the progression; a 0/1 value models the absence/presence of that alteration in the sample. The tool can import data from plain, MAF or GISTIC format files, and can fetch data from the cBioPortal for Cancer Genomics. Functions for data manipulation and visualization are provided, as well as functions to import/export such data to other bioinformatics tools for, e.g., clustering or detection of mutually exclusive alterations. Inferred models can be visualized, and their confidence can be assessed via bootstrap and cross-validation. TRONCO is used for the implementation of the Pipeline for Cancer Inference (PICNIC).
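Suppes' two conditions for a candidate causal edge can be checked directly on the binary sample-by-alteration matrix described above. The Python sketch below is an illustration only, not TRONCO's code: the function suppes_edges is hypothetical, it tests temporal priority as P(i) > P(j) and probability raising as P(j | i) > P(j | not i) on raw empirical frequencies, and it omits the statistical testing and likelihood-based selection that a full inference procedure adds on top.

```python
def suppes_edges(matrix, genes):
    """Return candidate edges (cause, effect) that satisfy Suppes' two conditions.

    matrix: list of samples, each a list of 0/1 values, one per gene in genes.
    """
    n = len(matrix)
    freq = [sum(row[g] for row in matrix) / n for g in range(len(genes))]
    edges = []
    for i in range(len(genes)):
        if freq[i] in (0.0, 1.0):
            continue  # P(j | i) or P(j | not i) would be undefined
        with_i = [row for row in matrix if row[i] == 1]
        without_i = [row for row in matrix if row[i] == 0]
        for j in range(len(genes)):
            if i == j:
                continue
            p_j_given_i = sum(row[j] for row in with_i) / len(with_i)
            p_j_given_not_i = sum(row[j] for row in without_i) / len(without_i)
            # Temporal priority: the earlier event is the more frequent one.
            # Probability raising: observing i makes j more likely.
            if freq[i] > freq[j] and p_j_given_i > p_j_given_not_i:
                edges.append((genes[i], genes[j]))
    return edges
```

On six samples where A is altered in four, B only together with A, and C independently of A, only the edge A to B survives both conditions.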
The objective of these functions is to derive a species assemblage that satisfies a functional trait profile. Restoring resilient ecosystems requires a flexible framework for selecting assemblages based on the functional traits of species. However, current trait-based models have been limited to algorithms that can only select species by optimising specific trait values, and could not elegantly accommodate the common desire among restoration ecologists to produce functionally diverse assemblages. We have solved this problem by applying a non-linear optimisation algorithm that optimises Rao's Q, a closed-form functional trait diversity index that incorporates species abundances, subject to other linear constraints. This framework generalises previous models that only optimised the entropy of the community, and can optimise both functional diversity and entropy simultaneously. This package can also be used to generate experimental assemblages to test the effects of community-level traits on community dynamics and ecosystem function. The method is based on theory discussed in Laughlin (2014, Ecology Letters) <doi:10.1111/ele.12288>.
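Rao's Q sums, over all ordered pairs of species, the trait distance between them weighted by the product of their relative abundances: Q = sum_i sum_j d_ij p_i p_j. The Python sketch below, with a hypothetical function rao_q, only evaluates the index for given abundances and a distance matrix. It assumes this form without the factor 1/2 that some authors include, and it does not show the constrained non-linear optimisation the package performs.

```python
def rao_q(abundances, distances):
    """Rao's quadratic entropy Q = sum_i sum_j d_ij * p_i * p_j.

    abundances: non-negative abundances (converted to relative abundances p_i).
    distances: symmetric matrix of pairwise trait distances with a zero diagonal.
    """
    total = sum(abundances)
    p = [a / total for a in abundances]
    n = len(p)
    return sum(distances[i][j] * p[i] * p[j] for i in range(n) for j in range(n))
```

For two species at distance 1, an even split gives Q = 0.5, while a single-species assemblage gives Q = 0; an optimiser that maximises Q therefore favours abundant, functionally distinct species.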
The tools for MicroRNA Set Enrichment Analysis can identify risk pathways (or prior gene sets) regulated by a microRNA set in the context of microRNA expression data. (1) This package constructs a correlation profile of microRNAs and pathways using the hypergeometric test. The gene sets of pathways are derived from three public databases ('Kyoto Encyclopedia of Genes and Genomes' ('KEGG'), 'Reactome' and 'Biocarta'), and the target gene sets of microRNAs are provided by four databases ('TarBaseV6.0', 'mir2Disease', 'miRecords' and 'miRTarBase'). (2) This package can quantify, for each pathway (or prior gene set), the change in correlation between microRNAs, based on microRNA expression data with cases and controls. (3) This package uses the weighted Kolmogorov-Smirnov statistic to calculate an enrichment score (ES) for a microRNA set that co-regulates a pathway, which reflects the degree to which a given pathway is associated with the specific phenotype. (4) This package provides visualizations of the results.
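Steps (1) and (3) rest on two standard statistics: the upper tail of a hypergeometric distribution for the overlap between a microRNA's targets and a pathway, and a weighted Kolmogorov-Smirnov running sum over a ranked list. The Python sketch below shows both in a generic form; the functions hypergeom_upper_pvalue and enrichment_score are hypothetical, and the GSEA-style weighting shown here is an assumption, since the description does not state the exact weights or ranking statistic the package uses.

```python
from math import comb


def hypergeom_upper_pvalue(n_genes, n_pathway, n_targets, overlap):
    """P(X >= overlap) for X ~ Hypergeometric: n_targets genes drawn from
    n_genes background genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_targets)
    hits = sum(
        comb(n_pathway, x) * comb(n_genes - n_pathway, n_targets - x)
        for x in range(overlap, upper + 1)
    )
    return hits / comb(n_genes, n_targets)


def enrichment_score(ranked_stats, in_set, weight=1.0):
    """GSEA-style weighted Kolmogorov-Smirnov running sum.

    ranked_stats: statistics sorted in decreasing order; in_set: matching booleans.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hit_total = sum(abs(s) ** weight for s, h in zip(ranked_stats, in_set) if h)
    n_miss = sum(1 for h in in_set if not h)
    running, best = 0.0, 0.0
    for s, h in zip(ranked_stats, in_set):
        # Members step up in proportion to their weight; non-members step down equally.
        running += abs(s) ** weight / hit_total if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

When all set members sit at the top of the ranking the score reaches +1, and when they all sit at the bottom it reaches -1.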
Designed for estimating variants of hidden (latent) Markov models (HMMs), mixture HMMs, and non-homogeneous HMMs (NHMMs) for social sequence data and other categorical time series. Special cases include feedback-augmented NHMMs, Markov models without a latent layer, mixture Markov models, and latent class models. The package supports models for one or multiple subjects with one or multiple parallel sequences (channels). External covariates can be added to explain cluster membership in mixture models as well as initial, transition and emission probabilities in NHMMs. The package provides functions for evaluating and comparing models, as well as functions for visualizing multichannel sequence data and HMMs. For NHMMs, methods for computing average causal effects and marginal state and emission probabilities are available. Models are estimated using maximum likelihood via the EM algorithm or direct numerical maximization with analytical gradients. Documentation is available in several vignettes and in Helske and Helske (2019, <doi:10.18637/jss.v088.i03>). For the methodology behind the NHMMs, see Helske (2025, <doi:10.48550/arXiv.2503.16014>).
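Both estimation routes maximise the same HMM likelihood, which for a categorical sequence is usually computed with the forward algorithm. The Python sketch below is a minimal scaled forward recursion for one single-channel sequence with time-constant probabilities; the function forward_loglik is hypothetical, and covariates, multiple channels and mixtures are left out, so it is not the package's implementation.

```python
import math


def forward_loglik(obs, init, trans, emis):
    """Log-likelihood of one categorical sequence under an HMM (scaled forward algorithm).

    obs: list of symbol indices; init[s]: initial probabilities;
    trans[r][s]: transition probabilities; emis[s][symbol]: emission probabilities.
    """
    n_states = len(init)
    alpha = [init[s] * emis[s][obs[0]] for s in range(n_states)]
    scale = sum(alpha)
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    for o in obs[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n_states)) * emis[s][o]
            for s in range(n_states)
        ]
        scale = sum(alpha)
        loglik += math.log(scale)  # the product of scales equals the likelihood
        alpha = [a / scale for a in alpha]
    return loglik
```

With a single hidden state the recursion reduces to the product of emission probabilities, which is a convenient sanity check.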
Interactive visualizations of graphs created with the igraph package, using an htmlwidgets wrapper for the sigma.js network visualization library v2.4.0 <https://www.sigmajs.org/>, which makes it possible to display several thousand nodes. While several R packages have been developed to interface 'sigma.js', all were developed for v1.x.x, and none has migrated to v2.4.0 or plans to. This package builds upon the sigmaNet package, and users familiar with it will recognize the similar design approach. Two extensions have been added to the classic sigma.js visualizations by overriding the underlying JavaScript code: drawing a frame around node labels, and displaying labels on multiple lines by parsing line breaks. Other additional functionalities that did not require overriding sigma.js code include toggling node visibility when clicked, using a node attribute, and highlighting specific edges. sigma.js is currently preparing a stable release, v3.0.0, and this package plans to update to it when it is available.
An integrated set of tools for thermodynamic calculations in aqueous geochemistry and geobiochemistry. Functions are provided for writing balanced reactions to form species from user-selected basis species and for calculating the standard molal properties of species and reactions, including the standard Gibbs energy and equilibrium constant. Calculations of the non-equilibrium chemical affinity and equilibrium chemical activity of species can be portrayed on diagrams as a function of temperature, pressure, or activity of basis species; in two dimensions, this gives a maximum affinity or predominance diagram. The diagrams have formatted chemical formulas and axis labels, and water stability limits can be added to Eh-pH, oxygen fugacity-temperature, and other diagrams with a redox variable. The package has been developed to handle common calculations in aqueous geochemistry, such as solubility due to complexation of metal ions, mineral buffers of redox or pH, and changing the basis species across a diagram ("mosaic diagrams"). CHNOSZ also implements a group additivity algorithm for the standard thermodynamic properties of proteins.
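The standard Gibbs energy of a reaction and its equilibrium constant are linked by Delta G° = -RT ln K, so log10 K = -Delta G° / (RT ln 10). The Python sketch below shows only that final conversion; the function log10_k is hypothetical, energies are assumed to be in J/mol, and computing Delta G° itself (which CHNOSZ does from its thermodynamic database at the chosen temperature and pressure) is not shown.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def log10_k(delta_g_standard, temperature=298.15):
    """log10 of the equilibrium constant from the standard Gibbs energy of reaction.

    delta_g_standard: standard Gibbs energy of reaction in J/mol (assumed unit).
    temperature: temperature in kelvin.
    """
    return -delta_g_standard / (GAS_CONSTANT * temperature * math.log(10))
```

For example, a reaction with Delta G° = -10 kJ/mol at 25 °C has log10 K of about 1.75.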
This package provides functions for estimation, testing, diagnostic checking and forecasting of generalized linear autoregressive moving average (GLARMA) models for discrete-valued time series with regression variables. These are a class of observation-driven, non-linear, non-Gaussian state space models. The state vector consists of a linear regression component plus an observation-driven component consisting of an autoregressive moving average (ARMA) filter of past predictive residuals. Currently three distributions (Poisson, negative binomial and binomial) can be used for the response series. Three options (Pearson, score-type and unscaled) for the residuals in the observation-driven component are available. Estimation is via maximum likelihood (conditional on initializing values for the ARMA process), optimized using Fisher scoring or Newton-Raphson iterative methods. Likelihood ratio and Wald tests for the observation-driven component allow testing for serial dependence in generalized linear model settings. Graphical diagnostics, including model fits, autocorrelation functions and probability integral transform residuals, are included in the package. Several standard data sets are also included.
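The state recursion described above can be sketched for the Poisson case with Pearson residuals. The Python sketch below assumes the formulation W_t = x_t'beta + Z_t, Z_t = sum_i phi_i (Z_{t-i} + e_{t-i}) + sum_j theta_j e_{t-j}, e_t = (y_t - mu_t) / sqrt(mu_t), with pre-sample values of Z and e set to zero; treat this parameterisation as an assumption and check the package documentation for its exact form. The function glarma_poisson_predictor is hypothetical and only evaluates the recursion for given parameters; it performs no estimation.

```python
import math


def glarma_poisson_predictor(y, x, beta, phi, theta):
    """Linear predictors W_t and means mu_t of a Poisson GLARMA model (Pearson residuals).

    y: observed counts; x: list of regressor vectors; beta: regression coefficients;
    phi, theta: AR and MA coefficients of the observation-driven component.
    """
    n = len(y)
    z, e = [0.0] * n, [0.0] * n
    w, mu = [0.0] * n, [0.0] * n
    for t in range(n):
        # ARMA filter of past states and past predictive residuals (zero before t = 0).
        ar = sum(phi[i] * (z[t - i - 1] + e[t - i - 1])
                 for i in range(len(phi)) if t - i - 1 >= 0)
        ma = sum(theta[j] * e[t - j - 1]
                 for j in range(len(theta)) if t - j - 1 >= 0)
        z[t] = ar + ma
        w[t] = sum(b * xv for b, xv in zip(beta, x[t])) + z[t]
        mu[t] = math.exp(w[t])                       # log link
        e[t] = (y[t] - mu[t]) / math.sqrt(mu[t])     # Pearson residual
    return w, mu
```

With empty phi and theta the model collapses to an ordinary Poisson regression, which is what the likelihood ratio and Wald tests for serial dependence compare against.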
Real-time quantitative polymerase chain reaction (qPCR) data sets by Karlen et al. (2007) <doi:10.1186/1471-2105-8-131>. Provides a single tidy tabular data set in long format, encompassing 32 dilution series for seven PCR targets and four biological samples. The targeted amplicons are within the murine genes Cav1, Ccn2, Eln, Fn1, Rpl27, Hspg2 and Serpine1. Dilution series: scheme 1 (Cav1, Eln, Hspg2, Serpine1): 1-fold, 10-fold, 50-fold and 100-fold; scheme 2 (Ccn2, Rpl27, Fn1): 1-fold, 10-fold, 50-fold, 100-fold and 1000-fold. For each dilution there are five replicates, except for the 1000-fold dilution, for which only two replicates were performed. Each amplification curve is 40 cycles long. The original raw data file is Additional file 2 from "Statistical significance of quantitative PCR" by Y. Karlen, A. McNair, S. Perseguers, C. Mazza and N. Mermod (2007) <https://static-content.springer.com/esm/art%3A10.1186%2F1471-2105-8-131/MediaObjects/12859_2006_1503_MOESM2_ESM.ZIP>.
Flexible functions that use lme4 as the computational engine for fitting models used in Genomic Selection (GS). GS is a technology used for genetic improvement, and it has many advantages over phenotype-based selection. Several statistical models, such as linear mixed models (LMMs), adequately address the statistical challenges in GS. lme4 is the standard package for fitting linear and generalized LMMs in R, but its use for genetic analysis is limited because it does not allow the correlation between individuals or groups of individuals to be defined. The lme4GS package is focused on fitting LMMs with user-defined covariance structures, bandwidth selection, and genomic prediction of the models used in GS, and it can fit LMMs using different variance-covariance matrices. Several examples of GS models are presented using this package, as well as the analysis of real data. For more details, see Caamal-Pat et al. (2021) <doi:10.3389/fgene.2021.680569>.
This package provides a system for writing hierarchical statistical models largely compatible with 'BUGS' and 'JAGS', writing nimbleFunctions to operate models and do basic R-style math, and compiling both models and nimbleFunctions via custom-generated C++. NIMBLE includes default methods for MCMC, Laplace approximation, deterministic nested approximations, Monte Carlo Expectation Maximization, and some other tools. The nimbleFunction system makes it easy to do things like implement new MCMC samplers from R, customize the assignment of samplers to different parts of a model from R, and compile the new samplers automatically via C++ alongside the samplers NIMBLE provides. NIMBLE extends the 'BUGS'/'JAGS' language by making it extensible: new distributions and functions can be added, including as calls to external compiled code. Although most people think of MCMC as the main goal of the 'BUGS'/'JAGS' language for writing models, one can use NIMBLE for writing arbitrary other kinds of model-generic algorithms as well. A full User Manual is available at <https://r-nimble.org>.
This software performs statistical power calculations for stepped wedge cluster randomized trials. Users can specify different parameters for different scenarios, including: cross-sectional and cohort designs, binary and continuous outcomes, marginal (GEE) and conditional (mixed-effects) models, three link functions (identity, log and logit), and designs with or without time effects (the default specification assumes no time effect), under exchangeable, nested exchangeable and block exchangeable correlation structures. Unequal numbers of clusters per sequence are also allowed. The methods implemented in this package are described in Zhou et al. (2020) <doi:10.1093/biostatistics/kxy031> and Li et al. (2018) <doi:10.1111/biom.12918>. Supplementary documents can be found at: <https://ysph.yale.edu/cmips/research/software/study-design-power-calculation/swdpwr/>. The Shiny app for swdpwr can be accessed at: <https://jiachenchen322.shinyapps.io/swdpwr_shinyapp/>. The package also includes functions that calculate intra-cluster correlation coefficients from random-effects variances, for continuous and for binary outcomes.
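Intra-cluster correlation coefficients derived from random-effects variances usually take the form "between-cluster variance over total variance". The Python sketch below shows two common textbook conventions: the ratio for a continuous outcome with a cluster random intercept, and the latent-variable version for a binary outcome on the logit scale, where the residual variance of the standard logistic distribution is pi^2/3. The functions icc_continuous and icc_binary_latent_logit are hypothetical, and swdpwr may define the binary-outcome ICC differently (for example, on the natural probability scale).

```python
import math


def icc_continuous(var_cluster, var_residual):
    """ICC for a linear mixed model with a cluster random intercept."""
    return var_cluster / (var_cluster + var_residual)


def icc_binary_latent_logit(var_cluster):
    """ICC on the latent logit scale; the standard logistic residual variance is pi^2 / 3."""
    return var_cluster / (var_cluster + math.pi ** 2 / 3)
```

A between-cluster variance of 1 and a residual variance of 3 give an ICC of 0.25 for a continuous outcome.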
Cancer genomes contain large numbers of somatic alterations, but few genes drive tumor development. Identifying cancer driver genes is critical for precision oncology. Most current approaches identify driver genes based either on mutational recurrence or on estimated scores predicting the functional consequences of mutations. driveR is a tool for personalized or batch analysis of genomic data for driver gene prioritization that combines genomic information with prior biological knowledge. As features, driveR uses coding impact metaprediction scores, non-coding impact scores, somatic copy number alteration scores, hotspot gene/double-hit gene condition, phenolyzer gene scores and membership in cancer-related KEGG pathways. It uses these features to estimate, for each gene, the cancer-type-specific probability of being a cancer driver, using the related task of a multi-task learning classification model. The method is described in detail in Ulgen E, Sezerman OU (2021). driveR: a novel method for prioritizing cancer driver genes using somatic genomics data. BMC Bioinformatics <doi:10.1186/s12859-021-04203-7>.
This package provides the setup and calculations needed to run a likelihood-based continual reassessment method (CRM) dose-finding trial, and performs simulations to assess design performance under various scenarios. Three dose-finding designs are included: the ordinal proportional odds model (POM) CRM, the ordinal continuation ratio (CR) model CRM, and the binary 2-parameter logistic model CRM. These functions allow customization of design characteristics, including sample size, cohort sizes, target dose-limiting toxicity (DLT) rates, discrete or continuous dose levels, combining ordinal grades 0 and 1 into one category, and safety and/or stopping rules. For the POM and CR model designs, ordinal toxicity grades are specified by the Common Terminology Criteria for Adverse Events (CTCAE) version 4.0. The function pseudodata creates the necessary starting models for these three designs, and the function nextdose estimates the next dose to test in a cohort of patients for a target DLT rate. We also provide the function crmsimulations to assess the performance of these three dose-finding designs under various scenarios.
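For the binary 2-parameter logistic CRM, the core of the next-dose decision is to evaluate the fitted dose-toxicity curve at each candidate dose and pick the dose whose predicted DLT probability is closest to the target. The Python sketch below assumes the intercept and slope have already been estimated and that doses are on the model's (possibly transformed) dose scale; the function next_dose is hypothetical, is not the package's nextdose function, and ignores the safety and stopping rules mentioned above (for example, restrictions on skipping doses).

```python
import math


def next_dose(doses, intercept, slope, target):
    """Pick the dose whose predicted DLT probability is closest to the target rate.

    Returns (chosen dose, list of predicted DLT probabilities for all doses).
    """
    def p_dlt(d):
        # 2-parameter logistic dose-toxicity model
        return 1.0 / (1.0 + math.exp(-(intercept + slope * d)))

    probs = [p_dlt(d) for d in doses]
    best = min(range(len(doses)), key=lambda k: abs(probs[k] - target))
    return doses[best], probs
```

With intercept 0 and slope 1 on doses -2 to 2, a 25% target selects dose -1, whose predicted DLT probability is about 0.27.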
Choice models are a widely used technique across numerous scientific disciplines. The Apollo package is a very flexible tool for the estimation and application of choice models in R. Users are able to write their own model functions or use a mix of already available ones. Random heterogeneity, both continuous and discrete and at the level of individuals and choices, can be incorporated for all models. There is support for both standalone models and hybrid model structures. Both classical and Bayesian estimation are available, and multiple discrete continuous models are covered in addition to discrete choice. Multi-threaded processing is supported for estimation, and a large number of pre- and post-estimation routines are available, including routines for computing posterior (individual-level) distributions. For examples, a manual, and a support forum, visit <https://www.ApolloChoiceModelling.com>. For more information on choice models, see Train, K. (2009) <isbn:978-0-521-74738-7> and Hess, S. & Daly, A.J. (2014) <isbn:978-1-781-00314-5> for an overview of the field.
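The simplest discrete choice model, the multinomial logit, turns the utilities of the alternatives into choice probabilities with a softmax. The Python sketch below shows that calculation for one choice situation; the function mnl_probabilities is hypothetical and is not Apollo's API, which wraps this kind of calculation in its own model-definition functions.

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities P_i = exp(V_i) / sum_j exp(V_j)."""
    m = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]
```

Only utility differences matter: adding the same constant to every alternative leaves the probabilities unchanged, which is why one alternative-specific constant is normally fixed to zero.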
Estimation of treatment hierarchies in network meta-analysis (NMA) using a novel frequentist approach based on treatment choice criteria (TCC) and probabilistic ranking models, as described by Evrenoglou et al. (2024) <DOI:10.48550/arXiv.2406.10612>. The TCC are defined using a rule based on the smallest worthwhile difference (SWD). Using the defined TCC, the NMA estimates (i.e., treatment effects and standard errors) are first transformed into pairwise treatment preferences, each indicating either a preference for one treatment (e.g., treatment A > treatment B) or a tie (treatment A = treatment B). These treatment preferences are then synthesized using a probabilistic ranking model, which estimates the latent ability parameter of each treatment and produces the final treatment hierarchy. This parameter represents each treatment's ability to outperform all the other competing treatments in the network. Here, the term "ability to outperform" refers to the propensity of each treatment to yield clinically important and beneficial effects when compared with all the other treatments in the network. Consequently, larger ability estimates indicate higher positions in the ranking list.
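The two stages can be illustrated with a simplified Python sketch. The rule in pairwise_outcome compares a point estimate with the SWD, whereas the published TCC may also use the uncertainty of the estimate; bradley_terry fits a plain Bradley-Terry model with the standard minorization-maximization updates and counts each tie as half a win for both treatments, which is a simplification of a ranking model with an explicit tie parameter. Both function names are hypothetical and neither reproduces the package's method.

```python
def pairwise_outcome(effect_ab, swd):
    """Return 'A', 'B' or 'tie' for one comparison.

    effect_ab: estimated benefit of A over B (larger is better for A).
    swd: smallest worthwhile difference (assumed on the same scale).
    """
    if effect_ab > swd:
        return "A"
    if effect_ab < -swd:
        return "B"
    return "tie"


def bradley_terry(treatments, outcomes, iterations=500):
    """Bradley-Terry abilities via MM updates; ties count as half a win each.

    outcomes: list of (a, b, result) with result in {'A', 'B', 'tie'}.
    Abilities are rescaled to sum to the number of treatments.
    """
    wins = {t: 0.0 for t in treatments}
    games = {}  # unordered pair -> number of comparisons
    for a, b, res in outcomes:
        if res == "A":
            wins[a] += 1.0
        elif res == "B":
            wins[b] += 1.0
        else:
            wins[a] += 0.5
            wins[b] += 0.5
        key = frozenset((a, b))
        games[key] = games.get(key, 0) + 1
    ability = {t: 1.0 for t in treatments}
    for _ in range(iterations):
        new = {}
        for t in treatments:
            denom = sum(
                n / (ability[t] + ability[other])
                for pair, n in games.items() if t in pair
                for other in pair if other != t
            )
            new[t] = wins[t] / denom if denom > 0 else ability[t]
        total = sum(new.values())
        ability = {t: v * len(treatments) / total for t, v in new.items()}
    return ability
```

Finite estimates require every treatment to both win (or tie) and lose (or tie) against the rest of the network; a treatment that never loses would have its ability diverge.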
This package provides tools for fitting and simulating mixtures of Watson distributions. The package is described in Sablica, Hornik and Leydold (2026) <doi:10.18637/jss.v115.i04>. The random sampling scheme of the package offers two sampling algorithms that are based on the results of Sablica, Hornik and Leydold (2022) <doi:10.1080/10618600.2024.2416521>. In addition, the package offers a tool that combines these two methods: based on the selected parameters, it approximates the relative sampling speed of both methods and picks the faster one. The package also offers a fitting function for mixtures of Watson distributions that uses the expectation-maximization (EM) algorithm. Special features are the possibility to use multiple variants of the E-step and M-step, sparse matrices for the data representation, and state-of-the-art methods for the numerical evaluation of the needed special functions, using the results of Sablica and Hornik (2022) <doi:10.1090/mcom/3690> and Sablica and Hornik (2024) <doi:10.1016/j.jmaa.2024.128262>.
Population-averaged models have been increasingly used in the design and analysis of cluster randomized trials (CRTs). To facilitate the application of population-averaged models in CRTs, the package implements the generalized estimating equations (GEE) and matrix-adjusted estimating equations (MAEE) approaches to jointly estimate the marginal mean models and correlation models, both for general CRTs and for stepped wedge CRTs. Besides the general GEE/MAEE approach, the package also implements a fast cluster-period GEE method by Li et al. (2022) <doi:10.1093/biostatistics/kxaa056> specifically for stepped wedge CRTs with large and variable cluster-period sizes; it gives a simple and efficient estimating equations approach based on the cluster-period means to estimate the intervention effects as well as the correlation parameters. In addition, the package provides functions for generating correlated binary data with a specified mean vector and correlation matrix, based on the multivariate probit method of Emrich and Piedmonte (1991) <doi:10.1080/00031305.1991.10475828> or the conditional linear family method of Qaqish (2003) <doi:10.1093/biomet/90.2.455>.
This package performs genetic association analyses of case-parent triad (trio) data with multiple markers. It can also incorporate complete or incomplete control triads, for instance independent control children. Estimation is based on haplotypes, for instance SNP haplotypes, even though phase is not known from the genetic data. Haplin estimates relative risks (RR with confidence intervals) and p-values associated with each haplotype. It uses maximum likelihood estimation to make optimal use of data from triads with missing genotypic data, for instance if some SNPs have not been typed for some individuals. Haplin also allows estimation of the effects of maternal haplotypes and of parent-of-origin effects, which is particularly appropriate in perinatal epidemiology. Haplin allows special models, such as X-inactivation, to be fitted on the X chromosome. A GxE analysis allows testing interactions between the environment and all estimated genetic effects. The models were originally described in "Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396".
This package provides the function qqtest, which incorporates uncertainty in its qqplot display(s) so that the user might have a better sense of the evidence against the specified distributional hypothesis. qqtest draws a quantile-quantile plot for visually assessing whether the data come from a test distribution that has been defined in one of many ways. The vertical axis plots the data quantiles, the horizontal axis those of a test distribution. The default behaviour generates 1000 samples from the test distribution and overlays the plot with shaded pointwise interval estimates for the ordered quantiles from the test distribution. A small number of independently generated exemplar quantile plots can also be overlaid. The interval estimates and the exemplars provide different comparative information to assess the evidence provided by the qqplot for or against the hypothesis that the data come from the test distribution (by default, normal or Gaussian). Finally, a visual test of significance (a lineup plot) can also be displayed to test the null hypothesis that the data come from the test distribution.
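The pointwise interval estimates can be obtained by simulation: draw many samples of the same size from the test distribution, sort each one, and take percentiles of each order statistic across simulations. The Python sketch below does this for a standard normal test distribution with 1000 simulated samples, matching the default count described above; the function qq_envelope is hypothetical, it assumes the data have already been standardised, and qqtest's own scaling and plotting positions may differ.

```python
import random
from statistics import NormalDist


def qq_envelope(n, n_sim=1000, level=0.95, seed=1):
    """Pointwise envelope for the ordered values of n standard normal draws.

    Returns (theoretical quantiles, lower bounds, upper bounds), one entry per order statistic.
    """
    rng = random.Random(seed)
    sims = [sorted(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(n_sim)]
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * (n_sim - 1))
    hi_idx = int((1.0 - tail) * (n_sim - 1))
    theoretical, lower, upper = [], [], []
    std_normal = NormalDist()
    for k in range(n):
        column = sorted(s[k] for s in sims)  # k-th order statistic across simulations
        theoretical.append(std_normal.inv_cdf((k + 0.5) / n))
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return theoretical, lower, upper
```

Plotting the sorted data against the theoretical quantiles, with the band between lower and upper shaded, shows at a glance which parts of the sample fall outside what the test distribution typically produces.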
This package provides a leadership-inference framework for multivariate time series. The framework for multiple-faction-leadership inference from coordinated activities (mFLICA) uses a notion of a leader as an individual who initiates collective patterns that everyone in a group follows. Given a set of time series of individual activities, our goal is to identify periods of coordinated activity, find factions of coordination if more than one exists, and identify the leader of each faction. For each time step, the framework infers following relations between individual time series, then identifies the leader of each faction: an individual whom many individuals follow but who follows no one. A faction is defined as a group of individuals in which everyone follows the same leader. mFLICA reports following relations, leaders of factions, and members of each faction for each time step. Please see Chainarong Amornbunchornvej and Tanya Berger-Wolf (2018) <doi:10.1137/1.9781611975321.62> for the methodology and Chainarong Amornbunchornvej (2021) <doi:10.1016/j.softx.2021.100781> for the software when referring to this package in publications.
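Once the following relations for a time step are known, the leader and faction definitions above reduce to a graph computation: a leader is followed by someone but follows no one, and its faction is everyone who reaches it through following relations. The Python sketch below assumes the following network is already given as a dictionary; the function leaders_and_factions is hypothetical, it accepts any individual with at least one follower as a leader (a stricter "many followers" threshold is left out), and inferring the following relations from time series is not shown.

```python
def leaders_and_factions(follows):
    """Leaders and their factions for one time step.

    follows: dict mapping each individual to the set of individuals it follows.
    Returns (sorted list of leaders, {leader: set of faction members}).
    """
    individuals = set(follows)
    for targets in follows.values():
        individuals |= set(targets)
    followed = {t for targets in follows.values() for t in targets}
    # A leader follows no one but is followed by at least one individual.
    leaders = sorted(i for i in individuals if not follows.get(i) and i in followed)
    factions = {}
    for leader in leaders:
        # Members: everyone who reaches the leader through chains of following relations.
        members, stack = set(), [leader]
        while stack:
            node = stack.pop()
            for other, targets in follows.items():
                if node in targets and other not in members:
                    members.add(other)
                    stack.append(other)
        factions[leader] = members
    return leaders, factions
```

Indirect followers count as faction members, so an individual who follows a follower of the leader still belongs to that leader's faction.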
Relative transcript abundance has proven to be a valuable tool for understanding the function of genes in biological systems. For the differential analysis of transcript abundance using RNA sequencing data, the negative binomial model is by far the most frequently adopted. However, common methods based on a negative binomial model are not robust to extreme outliers, which we found to be abundant in public datasets. So far, no rigorous probabilistic methods for detecting outliers have been developed for RNA sequencing data, leaving their identification mostly to visual inspection. Recent advances in Bayesian computation allow large-scale comparison of observed data against their theoretical distribution given in a statistical model. Here we propose ppcseq, a key quality-control tool that identifies, in differential expression analysis, transcripts containing outlier data points that do not follow a negative binomial distribution. Applying ppcseq to several publicly available datasets analysed with popular tools, we show that from 3 to 10 percent of differentially abundant transcripts across algorithms and datasets had statistics inflated by the presence of outliers.
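The idea of checking each observed count against its theoretical negative binomial distribution can be illustrated with a plug-in tail test. The Python sketch below is a simplification, not ppcseq's method: ppcseq uses Bayesian posterior predictive checks, whereas this sketch fixes the mean mu and the dispersion (size) of the negative binomial and flags a count whose two-sided tail probability falls below alpha. The functions nb_pmf and nb_outlier are hypothetical.

```python
import math


def nb_pmf(k, mu, size):
    """Negative binomial probability of count k with mean mu and dispersion (size)."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mu)) + k * math.log(mu / (size + mu)))
    return math.exp(log_p)


def nb_outlier(y, mu, size, alpha=0.01):
    """Flag count y if either tail probability under NB(mu, size) is below alpha / 2."""
    cdf_below = sum(nb_pmf(k, mu, size) for k in range(y))  # P(Y < y)
    p_upper = 1.0 - cdf_below                               # P(Y >= y)
    p_lower = cdf_below + nb_pmf(y, mu, size)               # P(Y <= y)
    return min(p_lower, p_upper) < alpha / 2
```

Because the negative binomial variance mu + mu^2 / size grows with the dispersion term, a count that looks extreme under a Poisson model may still be unremarkable here; only counts beyond the heavier negative binomial tails are flagged.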