Implementation of Sparse-group SLOPE (SGS) (Feser and Evangelou (2023) <doi:10.48550/arXiv.2305.09467>) models. Linear and logistic regression models are supported, both of which can be fit using k-fold cross-validation. Dense and sparse input matrices are supported. In addition, a general Adaptive Three Operator Splitting (ATOS) (Pedregosa and Gidel (2018) <doi:10.48550/arXiv.1804.02339>) implementation is provided. Group SLOPE (gSLOPE) (Brzyski et al. (2019) <doi:10.1080/01621459.2017.1411269>) and group-based OSCAR models (Feser and Evangelou (2024) <doi:10.48550/arXiv.2405.15357>) are also implemented. All models are available with strong screening rules (Feser and Evangelou (2024) <doi:10.48550/arXiv.2405.15357>) for computational speed-up.
Recent developments in modern coexistence theory have advanced our understanding of how species are able to persist and co-occur with other species at varying abundances. However, applying this mathematical framework to empirical data is still challenging, which has prevented wider adoption of these theoretical tools by empiricists. This package provides a complete toolbox for modelling interaction effects between species and for calculating fitness and niche differences. The functions are flexible, may accept covariates, and can use different fitting algorithms. A full description of the underlying methods is available in García-Callejas, D., Godoy, O., and Bartomeus, I. (2020) <doi:10.1111/2041-210X.13443>. Furthermore, the package provides a series of functions to calculate dynamics for stage-structured populations across sites.
This package contains utilities for the analysis of post-translational modifications (PTMs) in proteins, with particular emphasis on the sulfoxidation of methionine residues. Features include the ability to download, filter and analyze data from the sulfoxidation database 'MetOSite'. Utilities to search and characterize S-aromatic motifs in proteins are also provided. In addition, the package includes functions to analyze sequence environments around modifiable residues in proteins. For instance, ptm can search for amino acids that are either overrepresented or avoided around the modifiable residues of the proteins of interest. Functions tailored to test statistical hypotheses related to these differential sequence environments are also implemented. Further detailed information regarding the methods in this package can be found in Aledo (2020) <https://metositeptm.com>.
The FMT method computes posterior residual variances to be used in the denominator of a moderated t-statistic from a linear model analysis of gene expression data. It is an extension of the moderated t-statistic originally proposed by Smyth (2004) <doi:10.2202/1544-6115.1027>. LOESS local regression and an empirical Bayesian method are used to estimate gene-specific prior degrees of freedom and prior variance based on average gene intensity levels. The posterior residual variance in the denominator is a weighted average of the prior and residual variances, with the prior degrees of freedom and the residual variance degrees of freedom as weights. The degrees of freedom of the moderated t-statistic are simply the sum of the prior and residual variance degrees of freedom.
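The weighted average described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the package's implementation: the function and argument names are hypothetical, the prior quantities are taken as given (the LOESS and empirical Bayes estimation step is not shown), and the `unscaled_sd` factor follows the general form of the moderated t-statistic in Smyth (2004).

```python
import math


def posterior_variance(residual_var, residual_df, prior_var, prior_df):
    """Average the prior and residual variances, weighted by their degrees of freedom."""
    return (prior_df * prior_var + residual_df * residual_var) / (prior_df + residual_df)


def moderated_t(coef, unscaled_sd, residual_var, residual_df, prior_var, prior_df):
    """Return a moderated t-statistic and its degrees of freedom.

    `unscaled_sd` is the unscaled standard deviation of the coefficient
    (an assumption taken from the usual moderated-t setup).
    """
    post_var = posterior_variance(residual_var, residual_df, prior_var, prior_df)
    t_stat = coef / (unscaled_sd * math.sqrt(post_var))
    # Degrees of freedom are the sum of prior and residual degrees of freedom.
    return t_stat, prior_df + residual_df


# Hypothetical values for one gene.
t_stat, df = moderated_t(coef=3.0, unscaled_sd=1.0, residual_var=2.0,
                         residual_df=4, prior_var=1.0, prior_df=4)
```

With equal prior and residual degrees of freedom, the posterior variance is the plain mean of the two variances; a larger prior degrees of freedom pulls it toward the prior.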
Main properties of, and regression procedures for, a generalization of the Dirichlet distribution called the Simplicial Generalized Beta (SGB) distribution. It is a new distribution on the simplex (i.e. on the space of compositions, or positive vectors whose components sum to 1). The Dirichlet distribution can be constructed from a random vector of independent Gamma variables divided by their sum. The SGB follows the same construction with generalized Gamma instead of Gamma variables. The Dirichlet exponents are supplemented by an overall shape parameter and a vector of scales. The scale vector is itself a composition and can be modeled with auxiliary variables through a log-ratio transformation. See Graf, M. (2017, ISBN: 978-84-947240-0-8) and the vignette enclosed in the package.
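The Gamma construction of the Dirichlet distribution mentioned above can be illustrated with the Python standard library. This sketch covers only the Dirichlet case; the generalized Gamma step that defines the SGB is not shown, and the function name and parameter values are hypothetical.

```python
import random


def dirichlet_from_gammas(exponents, rng):
    """Draw one composition by normalizing independent Gamma(exponent, 1) variables."""
    gammas = [rng.gammavariate(a, 1.0) for a in exponents]
    total = sum(gammas)
    # Dividing by the sum places the vector on the simplex.
    return [g / total for g in gammas]


rng = random.Random(42)
composition = dirichlet_from_gammas([2.0, 3.0, 5.0], rng)
```

The result is a positive vector whose components sum to 1, which is the sense in which both the Dirichlet and the SGB live on the simplex.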
This package implements methods for selecting the number of factors in Poisson factor models, with a primary focus on Thinning Cross-Validation (TCV). The TCV method is based on the data thinning technique, which probabilistically partitions each count observation into training and test sets while preserving the underlying factor structure. The Poisson factor model is then fit on the training set, and model selection is performed by comparing predictive performance on the test set. This toolkit is designed for researchers working with high-dimensional count data in fields such as genomics, text mining, and social sciences. The data thinning methodology is detailed in Dharamshi et al. (2025) <doi:10.1080/01621459.2024.2353948> and Wang et al. (2025) <doi:10.1080/01621459.2025.2546577>.
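For Poisson counts, the thinning step can be sketched as binomial thinning: each unit of a count is assigned to the training set with a fixed probability and otherwise to the test set. The code below is a minimal illustration using only the Python standard library; the function names and the `train_fraction` parameter are hypothetical and do not describe the package's interface, and the factor model fit itself is not shown.

```python
import random


def thin_count(count, train_fraction, rng):
    """Split one count into (train, test) parts by independent Bernoulli draws."""
    train = sum(1 for _ in range(count) if rng.random() < train_fraction)
    return train, count - train


def thin_matrix(counts, train_fraction=0.5, seed=0):
    """Thin every entry of a count matrix given as a list of rows."""
    rng = random.Random(seed)
    train_rows, test_rows = [], []
    for row in counts:
        pairs = [thin_count(c, train_fraction, rng) for c in row]
        train_rows.append([p[0] for p in pairs])
        test_rows.append([p[1] for p in pairs])
    return train_rows, test_rows


counts = [[3, 0, 7], [1, 12, 5]]
train, test = thin_matrix(counts, train_fraction=0.5, seed=1)
```

If an entry is Poisson with mean m, its training part is Poisson with mean `train_fraction * m` and its test part is Poisson with mean `(1 - train_fraction) * m`, and the two parts are independent. Both parts therefore share the same mean structure up to a known scaling.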
This package provides a series of functions for performing differential expression analysis from RNA-seq count data using a robust normalization strategy called DEGES (DEG elimination strategy). The basic idea of DEGES is that potential differentially expressed genes or transcripts (DEGs) among the compared samples should be removed before data normalization, in order to obtain a well-ranked gene list in which true DEGs are top-ranked and non-DEGs are bottom-ranked. This is done by performing a multi-step normalization. A major characteristic of TCC is that it provides robust normalization methods for several kinds of count data (two-group with or without replicates, multi-group/multi-factor, and so on) by combining functions from the packages it depends on.
The blocked weighted bootstrap (BBW) is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. population-proportional sampling or PPS as used in Standardized Monitoring and Assessment of Relief and Transitions or SMART surveys) or posterior weighting (e.g. as used in rapid assessment method or RAM and simple spatial sampling method or S3M surveys) is implemented. See Cameron et al (2008) <doi:10.1162/rest.90.3.414> for application of bootstrap to cluster samples. See Aaron et al (2016) <doi:10.1371/journal.pone.0163176> and Aaron et al (2016) <doi:10.1371/journal.pone.0162462> for application of the blocked weighted bootstrap to estimate indicators from two-stage cluster sampled surveys.
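The general idea can be sketched as follows, under loudly stated assumptions: whole clusters (the blocks) are resampled with replacement, and in the posterior-weighting case each cluster is drawn with probability proportional to its population. The function names, data layout, and percentile interval are hypothetical choices for illustration and are not the package's implementation. For the prior-weighting case, equal populations can be passed so that clusters are drawn uniformly.

```python
import random


def blocked_weighted_bootstrap(clusters, populations, estimator, replicates=200, seed=0):
    """Bootstrap an estimator by resampling whole clusters.

    clusters: dict mapping cluster id -> list of observations
    populations: dict mapping cluster id -> population size (selection weight)
    """
    rng = random.Random(seed)
    ids = list(clusters)
    weights = [populations[c] for c in ids]
    estimates = []
    for _ in range(replicates):
        # Draw as many clusters as were surveyed, with replacement.
        drawn = rng.choices(ids, weights=weights, k=len(ids))
        pooled = [obs for c in drawn for obs in clusters[c]]
        estimates.append(estimator(pooled))
    return estimates


def percentile_interval(estimates, level=0.95):
    """Simple percentile interval from the bootstrap estimates."""
    ordered = sorted(estimates)
    last = len(ordered) - 1
    return ordered[int((1 - level) / 2 * last)], ordered[int((1 + level) / 2 * last)]


# Hypothetical binary indicator recorded in three clusters.
clusters = {"a": [1, 0, 1], "b": [0, 0, 1], "c": [1, 1, 1]}
populations = {"a": 100, "b": 300, "c": 50}
estimates = blocked_weighted_bootstrap(
    clusters, populations, lambda xs: sum(xs) / len(xs), replicates=50, seed=1
)
low, high = percentile_interval(estimates)
```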
This package contains several tools to treat imaging flow cytometry data from ImageStream® and FlowSight® cytometers ('Amnis® Cytek®'). It provides an easy and simple way to read and write .fcs, .rif, .cif and .daf files. Information such as masks, features, regions and populations set within these files can be retrieved for each single cell. In addition, raw data such as stored images can also be accessed. Users may increase their productivity thanks to dedicated functions to extract, visualize, manipulate and export IFC data. A toy data example can be installed through the IFCdata package of approximately 32 MB, which is available in a drat repository <https://gitdemont.github.io/IFCdata/>. See file COPYRIGHTS and file AUTHORS for a list of copyright holders and authors.
This is a non-parametric method for joint adaptive mean-variance regularization and variance stabilization of high-dimensional data. It is suited for handling difficult problems posed by high-dimensional multivariate datasets (p >> n paradigm). Among those are that the variance is often a function of the mean, variable-specific estimators of variances are not reliable, and test statistics have low power due to a lack of degrees of freedom. Key features include: (i) Normalization and/or variance stabilization of the data, (ii) Computation of mean-variance-regularized t-statistics (F-statistics to follow), (iii) Generation of diverse diagnostic plots, (iv) Computationally efficient implementation using C/C++ interfacing and an option for parallel computing for faster execution in the R environment.
The function plotLRT() draws pairwise graphical model checks for the Rasch Model (RM; Rasch, 1960), the Partial Credit Model (PCM; Masters, 1982), and the Rating Scale Model (RSM; Andrich, 1978) using the output object of eRm::LRtest(). The function cLRT() provides a conditional Likelihood Ratio Test (Andersen, 1973), using the routines of 'psychotools'. Users may choose to plot the threshold parameters, the cumulative thresholds, the average thresholds per item, or the person parameters. Extended coloring options allow for automated item-wise or threshold-wise coloring. For multi-group splits, all pairwise group comparisons are drawn automatically. For more details see Andersen (1973) <doi:10.1007/BF02291180>, Andrich (1978) <doi:10.1007/BF02293814>, Masters (1982) <doi:10.1007/BF02296272> and Rasch (1960, ISBN:9780598554512).
This package provides functions for the joint analysis of Q sets of p-values obtained for the same list of items. This joint analysis is performed by querying a composite hypothesis, i.e. an arbitrarily complex combination of simple hypotheses, as described in Mary-Huard et al. (2021) <doi:10.1093/bioinformatics/btab592> and De Walsche et al. (2025) <doi:10.1093/nargab/lqaf118>. In this approach, the Q-uplet of p-values associated with each item is distributed as a multivariate mixture, where each of the 2^Q components corresponds to a specific combination of simple hypotheses. The dependence between the p-value series is accounted for using a Gaussian copula function. A p-value for the composite hypothesis test is derived from the posterior probabilities.
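To make the 2^Q mixture components concrete, the sketch below enumerates the H0/H1 configurations and sums the posterior probabilities of those belonging to a composite alternative. It is an illustration only: the function names and the posterior values are hypothetical, the mixture fit and the Gaussian copula are not shown, and the final step from posterior probabilities to a p-value is left out.

```python
from itertools import product


def configurations(q):
    """All 2^q combinations of simple hypotheses (0 = H0, 1 = H1) across q series."""
    return list(product((0, 1), repeat=q))


def at_least(k):
    """Composite alternative: the item is under H1 in at least k series."""
    return lambda config: sum(config) >= k


def composite_posterior(config_posteriors, in_composite_h1):
    """Sum an item's posteriors over the configurations in the composite alternative."""
    return sum(p for config, p in config_posteriors.items() if in_composite_h1(config))


# Hypothetical posteriors for one item with Q = 2.
posteriors = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1}
prob_any = composite_posterior(posteriors, at_least(1))
prob_both = composite_posterior(posteriors, at_least(2))
```

Any other composite hypothesis can be expressed the same way, by passing a different predicate over the configuration tuples.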
This package implements the Stable Balancing Weights by Zubizarreta (2015) <DOI:10.1080/01621459.2015.1023805>. These are the weights of minimum variance that approximately balance the empirical distribution of the observed covariates. For an overview, see Chattopadhyay, Hase and Zubizarreta (2020) <DOI:10.1002/sim.8659>. To solve the optimization problem in 'sbw', the default solver is 'quadprog', which is readily available through CRAN. The solver 'osqp' is also posted on CRAN. To enhance the performance of 'sbw', users are encouraged to install other solvers such as 'gurobi' and 'Rmosek', which require special installation. For the installation of 'gurobi' and 'pogs', please follow the instructions at <https://docs.gurobi.com/projects/optimizer/en/current/reference/r.html> and <http://foges.github.io/pogs/stp/r>.
Predicts individual race/ethnicity using surname, first name, middle name, geolocation, and other attributes, such as gender and age. The method uses Bayes' rule (with optional measurement error correction) to compute the posterior probability of each racial category for any given individual. The package implements methods described in Imai and Khanna (2016) "Improving Ecological Inference by Predicting Individual Ethnicity from Voter Registration Records" Political Analysis <DOI:10.1093/pan/mpw001> and Imai, Olivella, and Rosenman (2022) "Addressing census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements" <DOI:10.1126/sciadv.adc9824>. The package also incorporates the data described in Rosenman, Olivella, and Imai (2023) "Race and ethnicity data for first, middle, and surnames" <DOI:10.1038/s41597-023-02202-2>.
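The Bayes' rule step can be sketched for the simplest case of a surname plus a geolocation. The sketch assumes that geolocation and surname are conditionally independent given the category, so the posterior is proportional to P(category | surname) times P(geolocation | category). The function name, category labels, and probabilities are hypothetical; first and middle names, other attributes, and the measurement error correction are not shown.

```python
def surname_geo_posterior(p_category_given_surname, p_geo_given_category):
    """Posterior over categories, assuming surname and geolocation are
    conditionally independent given the category (an assumption of this sketch)."""
    unnormalized = {
        category: p_category_given_surname[category] * p_geo_given_category[category]
        for category in p_category_given_surname
    }
    total = sum(unnormalized.values())
    return {category: value / total for category, value in unnormalized.items()}


# Hypothetical inputs for two placeholder categories.
posterior = surname_geo_posterior(
    {"group_a": 0.6, "group_b": 0.4},
    {"group_a": 0.1, "group_b": 0.3},
)
```

Here the surname alone favors `group_a`, but the location is three times as likely under `group_b`, so the posterior shifts toward `group_b`.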
This package provides a two-step approach to imputing missing data in metabolomics. Step 1 uses a random forest classifier to classify missing values as either Missing Completely at Random/Missing At Random (MCAR/MAR) or Missing Not At Random (MNAR). MCAR/MAR are combined because it is often difficult to distinguish these two missing types in metabolomics data. Step 2 imputes the missing values based on the classified missing mechanisms, using the appropriate imputation algorithms. Imputation algorithms tested and available for MCAR/MAR include Bayesian Principal Component Analysis (BPCA), Multiple Imputation No-Skip K-Nearest Neighbors (Multi_nsKNN), and Random Forest. Imputation algorithms tested and available for MNAR include nsKNN and a single imputation approach for imputation of metabolites where left-censoring is present.
SVP uses the cell-to-cell, feature-to-feature, and cell-to-feature distances in MCA space to build a nearest neighbor graph, then uses the random walk with restart algorithm to calculate the activity score of gene sets (such as cell marker genes, KEGG pathways, GO terms, gene modules, transcription factor or miRNA target sets, Reactome pathways, ...), which is then further weighted using the hypergeometric test results from the original expression matrix. To accurately detect spatially variable or single-cell variable gene sets (or other features) and the spatial colocalization between features, SVP provides several global and local spatial autocorrelation methods for identifying spatially variable features. SVP is built on the SingleCellExperiment class, which makes it interoperable with the existing computing ecosystem.
Differential abundance testing in microbiome data challenges both parametric and non-parametric statistical methods, due to the data's sparsity, high variability and compositional nature. Microbiome-specific statistical methods often assume classical distribution models or take into account compositional specifics. Their results vary widely across the specificity versus sensitivity space, so type I and type II errors are difficult to ascertain in real microbiome data when a single method is used. Recently, a consensus approach based on multiple differential abundance (DA) methods was suggested in order to increase robustness. With dar, you can use dplyr-like pipeable sequences of DA methods and then apply different consensus strategies. In this way, more reliable results can be obtained in a fast, consistent and reproducible way.
This package provides methods for efficiently recovering the precision matrix of Gaussian graphical models. The approach has three parts. First, we use Hard Graphical Thresholding for the best subset selection problem of the Gaussian graphical model; the core concept of this method was proposed by Luo et al. (2014) <arXiv:1407.7819>. Second, a closed-form solution for the graphical lasso under an acyclic graph structure is implemented (Fattahi and Sojoudi (2019) <https://jmlr.org/papers/v20/17-501.html>). Third, we implement a block coordinate descent algorithm to efficiently solve the covariance selection problem (Dempster (1972) <doi:10.2307/2528966>). The package is computationally efficient and can solve ultra-high-dimensional problems, e.g. p > 10,000, in a few minutes.
The generalised lambda distribution, or Tukey lambda distribution, provides a wide variety of shapes with one functional form. This package provides random numbers, quantiles, probabilities, densities and density quantiles for four different types of the distribution, the FKML (Freimer et al 1988), RS (Ramberg and Schmeiser 1974), GPD (van Staden and Loots 2009) and FM5 - see documentation for details. It provides the density function, distribution function, and Quantile-Quantile plots. It implements a variety of estimation methods for the distribution, including diagnostic plots. Estimation methods include the starship (all 4 types), method of L-Moments for the GPD and FKML types, and a number of methods for only the FKML type. These include maximum likelihood, maximum product of spacings, Titterington's method, Moments, Trimmed L-Moments and Distributional Least Absolutes.
Calculates the prevalence of a product's users based on the prevalence of triers in the population. The measurement of triers is relatively easy: it is just a question of whether or not a person has tried a product even once in their life. On the other hand, the measurement of people who also adopt the product as part of their life is more complicated, since adopting an innovative product is a subjective view of the individual. Mickey Kislev and Shira Kislev developed a formula to calculate the prevalence of a product's users to overcome this difficulty, and the package implements this calculation. See Kislev, M. M., and S. Kislev (2020) <doi:10.5539/ijms.v12n4p63>.
This package provides comprehensive functionalities for causal modeling with Coincidence Analysis (CNA), which is a configurational comparative method of causal data analysis that was first introduced in Baumgartner (2009) <doi:10.1177/0049124109339369>, and generalized in Baumgartner & Ambuehl (2020) <doi:10.1017/psrm.2018.45>. CNA is designed to recover INUS-causation from data, which is particularly relevant for analyzing processes featuring conjunctural causation (component causation) and equifinality (alternative causation). CNA is currently the only method for INUS-discovery that allows for multiple effects (outcomes/endogenous factors), meaning it can analyze common-cause and causal chain structures. Moreover, as of version 4.0, it is the only method of its kind that provides measures for model evaluation and selection that are custom-made for the problem of INUS-discovery.
This package provides functions for testing affine hypotheses on the regression coefficient vector in regression models with heteroskedastic errors: (i) a function for computing various test statistics (in particular using HC0-HC4 covariance estimators based on unrestricted or restricted residuals); (ii) a function for numerically approximating the size of a test based on such test statistics and a user-supplied critical value; and, most importantly, (iii) a function for determining size-controlling critical values for such test statistics and a user-supplied significance level (also incorporating a check of conditions under which such a size-controlling critical value exists). The three functions are based on results in Poetscher and Preinerstorfer (2021) "Valid Heteroskedasticity Robust Testing" <doi:10.48550/arXiv.2104.12597>, which will appear as <doi:10.1017/S0266466623000269>.
Supports Bayesian models with full and partial (hence arbitrary) dependencies between random variables. Discrete and continuous variables are supported, and conditional joint probabilities and probability densities are estimated using Kernel Density Estimation (KDE). The full general form, which implements an extension to Bayes' theorem, as well as the simple form, which is just a Bayesian network, both support regression through segmentation and KDE and estimation of probability or relative likelihood of discrete or continuous target random variables. This package also provides true statistical distance measures based on Bayesian models. Furthermore, these measures can be used in neighborhood searches and to estimate the similarity and distance between data points. Related work is by Bayes (1763) <doi:10.1098/rstl.1763.0053> and by Scutari (2010) <doi:10.18637/jss.v035.i03>.
The REUSE tool helps you achieve and confirm license compliance with the REUSE specification, a set of recommendations for licensing Free Software projects. REUSE makes it easy to declare the licenses under which your works are released, especially when reusing software from different projects released under different licenses. It avoids reliance on fuzzy heuristics and allows both legal experts and computers to understand how your project is licensed. This allows generating a "bill of materials" for software.
This tool downloads full license texts, adds copyright and license information to file headers, and contains a linter to identify problems. There are other tools that have a lot more features and functionality surrounding the analysis and inspection of copyright and licenses in software projects. This one is designed to be simple.