This package provides methods for obtaining improved estimates of non-linear cross-validated risks using targeted minimum loss-based estimation, estimating equations, and one-step estimation (Benkeser, Petersen, van der Laan (2019), <doi:10.1080/01621459.2019.1668794>). Cross-validated area under the receiver operating characteristic curve (LeDell, Petersen, van der Laan (2015), <doi:10.1214/15-EJS1035>) and other metrics are included.
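For reference, the quantity being improved upon is the plain cross-validated AUC of LeDell et al.: the empirical AUC computed on each validation fold, then averaged over folds. The Python sketch below (not the package's R interface; function names are hypothetical) computes only that naive estimate, counting ties as one half. It does not implement the targeted, estimating-equation or one-step corrections.

```python
from typing import List, Sequence, Tuple


def empirical_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("each fold needs at least one positive and one negative")
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


def cv_auc(folds: List[Tuple[Sequence[float], Sequence[int]]]) -> float:
    """Naive cross-validated AUC: the mean of the fold-specific validation AUCs.

    Each element of `folds` is (predicted scores, 0/1 labels) for one validation fold.
    """
    aucs = [empirical_auc(s, y) for s, y in folds]
    return sum(aucs) / len(aucs)
```

A fold where every positive outranks every negative contributes 1, a fully reversed fold contributes 0, so two such folds average to 0.5.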
OPCreg implements the online principal component regression method, which is designed to process online (streaming) datasets efficiently. The method is particularly useful for large-scale streaming data, for which traditional batch processing may be computationally infeasible. The philosophy of the package is described in Guo (2025) <doi:10.1016/j.physa.2024.130308>.
This package provides an R client for the ohsome API. It lets you analyze the OpenStreetMap (OSM) history data: you can retrieve the geometry of OSM data at specific points in time, and you can get aggregated statistics on the evolution of OSM elements, specifying your own temporal, spatial and/or thematic filters.
This package provides functions for unconditional and conditional quantiles. These include methods for transformation-based quantile regression, quantile-based measures of location, scale and shape, methods for quantiles of discrete variables, quantile-based multiple imputation, restricted quantile regression, directional quantile classification, and quantile ratio regression. A vignette is given in Geraci (2016, The R Journal) <doi:10.32614/RJ-2016-037> and included in the package.
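One standard approach to quantiles of discrete variables is the mid-quantile, obtained by inverting the mid-distribution function F_mid(x) = P(X < x) + 0.5 P(X = x). The Python sketch below assumes that is the kind of method meant; it is not the package's R code, and the interpolation between distinct observed values is a sketch under that assumption.

```python
from typing import Sequence


def mid_cdf(data: Sequence[float], x: float) -> float:
    """Empirical mid-distribution function: P(X < x) + 0.5 * P(X = x)."""
    below = sum(1 for d in data if d < x)
    equal = sum(1 for d in data if d == x)
    return (below + 0.5 * equal) / len(data)


def mid_quantile(data: Sequence[float], p: float) -> float:
    """Sample mid-quantile: invert the mid-CDF by linear interpolation
    between consecutive distinct observed values (clamped at the ends)."""
    xs = sorted(set(data))
    fs = [mid_cdf(data, v) for v in xs]
    if p <= fs[0]:
        return xs[0]
    if p >= fs[-1]:
        return xs[-1]
    for i in range(1, len(xs)):
        if p <= fs[i]:
            # interpolate between (fs[i-1], xs[i-1]) and (fs[i], xs[i])
            w = (p - fs[i - 1]) / (fs[i] - fs[i - 1])
            return xs[i - 1] + w * (xs[i] - xs[i - 1])
    return xs[-1]
```

Unlike the ordinary sample median of [1, 1, 2, 3], which jumps between observed values, the mid-quantile at p = 0.5 falls strictly between 1 and 2 and changes smoothly with p.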
Semiparametric and parametric estimation of INAR models including a finite sample refinement (Faymonville et al. (2022) <doi:10.1007/s10260-022-00655-0>) for the semiparametric setting introduced in Drost et al. (2009) <doi:10.1111/j.1467-9868.2008.00687.x>, different procedures to bootstrap INAR data (Jentsch, C. and Weiß, C.H. (2017) <doi:10.3150/18-BEJ1057>) and flexible simulation of INAR data.
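To make the INAR setting concrete, here is a minimal Python sketch (not the package's R API) simulating an INAR(1) process X_t = α ∘ X_{t-1} + ε_t, where α ∘ X is binomial thinning. Poisson innovations are an assumption for illustration; the package advertises more flexible simulation than this.

```python
import math
import random
from typing import List


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method for one Poisson(lam) draw (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def binomial_thinning(rng: random.Random, alpha: float, x: int) -> int:
    """alpha o x: each of the x counts survives independently with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def simulate_inar1(n: int, alpha: float, lam: float, seed: int = 0, x0: int = 0) -> List[int]:
    """Simulate n steps of X_t = alpha o X_{t-1} + eps_t with Poisson(lam) innovations."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    rng = random.Random(seed)
    xs: List[int] = []
    prev = x0
    for _ in range(n):
        prev = binomial_thinning(rng, alpha, prev) + poisson(rng, lam)
        xs.append(prev)
    return xs
```

With Poisson innovations the stationary mean is lam / (1 - alpha), so simulate_inar1(20000, 0.5, 2.0) should average close to 4.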
Tools for cleaning up messy Excel files so that they are suitable for use in R. People who have worked with Excel for years often built fairly complicated sheets with inconsistent names, characters and formats. To make such sheets usable in R, we built a set of functions that avoid most import problems and preserve as much of the data as possible.
Carries out analyses of two-way tables with one observation per cell, together with graphical displays for an additive fit and a diagnostic plot for removable non-additivity via a power transformation of the response. It implements Tukey's Exploratory Data Analysis (1973) <ISBN: 978-0201076165> methods, including a 1-degree-of-freedom test for row*column non-additivity, linear in the row and column effects.
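The 1-degree-of-freedom test can be written out directly. With grand mean m, row effects a_i and column effects b_j from the additive fit, the non-additivity sum of squares is (Σ a_i b_j y_ij)² / (Σ a_i² · Σ b_j²), and it is compared with the remaining residual sum of squares on (r-1)(c-1)-1 degrees of freedom. This Python sketch (hypothetical names, not the package's R functions) returns the F statistic only; a p-value could come from the F(1, df) distribution, for example via scipy.stats.f.sf.

```python
import math
from typing import Sequence, Tuple


def tukey_one_df(table: Sequence[Sequence[float]]) -> Tuple[float, float, int, float]:
    """Tukey's one-degree-of-freedom test for non-additivity (one observation per cell).

    Returns (SS_nonadd, SS_remainder, df_remainder, F) with F on (1, df_remainder) df.
    """
    r, c = len(table), len(table[0])
    if (r - 1) * (c - 1) < 2:
        raise ValueError("need at least 2 residual degrees of freedom")
    grand = sum(sum(row) for row in table) / (r * c)
    a = [sum(row) / c - grand for row in table]                              # row effects
    b = [sum(table[i][j] for i in range(r)) / r - grand for j in range(c)]   # column effects
    den = sum(x * x for x in a) * sum(x * x for x in b)
    if den == 0:
        raise ValueError("row or column effects are all zero")
    num = sum(a[i] * b[j] * table[i][j] for i in range(r) for j in range(c))
    ss_nonadd = num * num / den
    # residual SS of the additive fit, split into non-additivity + remainder
    ss_resid = sum((table[i][j] - grand - a[i] - b[j]) ** 2
                   for i in range(r) for j in range(c))
    df_rem = (r - 1) * (c - 1) - 1
    ss_rem = ss_resid - ss_nonadd
    if ss_rem > 0:
        f_stat = ss_nonadd / (ss_rem / df_rem)
    elif ss_nonadd > 0:
        f_stat = math.inf      # all residual structure is captured by the 1-df term
    else:
        f_stat = math.nan      # exactly additive table: the test is undefined
    return ss_nonadd, ss_rem, df_rem, f_stat
```

A purely multiplicative table such as y_ij = (1 + p_i)(1 + q_j) puts all residual variation into the non-additivity term, while an exactly additive table gives a zero non-additivity sum of squares.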
ASURAT is software for single-cell data analysis. Using ASURAT, one can simultaneously perform unsupervised clustering and biological interpretation in terms of cell type, disease, biological process, and signaling pathway activity. Given single-cell RNA-seq data and knowledge-based databases such as Cell Ontology, Gene Ontology and KEGG, ASURAT transforms gene expression tables into original multivariate tables, termed sign-by-sample matrices (SSMs).
scider is a user-friendly R package providing functions to model the global density of cells in a slide of spatial transcriptomics data. All functions in the package are built on the SpatialExperiment object, allowing integration with various spatial transcriptomics-related packages from Bioconductor. After modelling density, the package supports several downstream analyses, including colocalization analysis, boundary detection analysis and differential density analysis.
Signal-to-Noise applied to Gene Expression Experiments. Signal-to-noise ratios (SNRs) can be used as a proxy for the quality of gene expression studies and samples. The SNRs can be calculated on any gene expression data set as long as gene IDs are available; no access to the raw data files is necessary. This makes it possible to flag problematic studies and samples in any public data set.
Various mRNA sequencing library preparation methods generate sequencing reads specifically from the transcript ends. Analyses that focus on quantifying isoform usage from such data can be aided by truncated versions of transcriptome annotations, both at the alignment or pseudo-alignment stage and in downstream analysis. This package implements convenience methods for readily generating such truncated annotations and their corresponding sequences.
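The core interval operation behind such truncation is keeping at most a fixed number of transcript bases counted from the 3' end, walking exons in transcript order. The Python sketch below is illustrative only, not the package's API: it assumes 1-based inclusive, non-overlapping exon coordinates and a hypothetical max_len parameter. On the + strand the 3' end is the highest coordinate; on the - strand it is the lowest.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # assumed: 1-based, inclusive genomic coordinates, start <= end


def truncate_3prime(exons: List[Exon], strand: str, max_len: int) -> List[Exon]:
    """Keep at most `max_len` transcript bases counted from the 3' end."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Visit exons starting at the 3' end: descending coordinates on '+', ascending on '-'.
    ordered = sorted(exons, reverse=(strand == "+"))
    kept: List[Exon] = []
    remaining = max_len
    for start, end in ordered:
        if remaining <= 0:
            break
        length = end - start + 1
        if length <= remaining:
            kept.append((start, end))                      # whole exon fits
            remaining -= length
        elif strand == "+":
            kept.append((end - remaining + 1, end))        # 3'-most part on '+'
            remaining = 0
        else:
            kept.append((start, start + remaining - 1))    # 3'-most part on '-'
            remaining = 0
    return sorted(kept)
```

For exons (100, 199) and (300, 349) on the + strand, a 80-base limit keeps the whole last exon (50 bases) plus the final 30 bases of the first one.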
This package provides a simple interface for multivariate correlation analysis that unifies various classical statistical procedures including t-tests, tests in univariate and multivariate linear models, parametric and nonparametric tests for correlation, Kruskal-Wallis tests, common approximate versions of Wilcoxon rank-sum and signed rank tests, chi-squared tests of independence, score tests of particular hypotheses in generalized linear models, canonical correlation analysis and linear discriminant analysis.
Perform censored quantile regression of Huang (2010) <doi:10.1214/09-AOS771>, and restore monotonicity via adaptive interpolation for the dynamic regression of Huang (2017) <doi:10.1080/01621459.2016.1149070>. The monotonicity-respecting restoration applies to general dynamic regression models, including (uncensored or censored) quantile regression models, additive hazards models, and the dynamic survival models of Peng and Huang (2007) <doi:10.1093/biomet/asm058>, among others.
In metabolic flux experiments, tracer molecules (often glucose containing labelled carbon) are incorporated into compounds measured using mass spectrometry. The mass isotopologue distributions of these compounds need to be corrected for the natural abundance of labelled carbon and for other effects, which are specific to the compound and the ionization technique applied. This package provides functions to correct such effects in gas chromatography atmospheric pressure chemical ionization mass spectrometry analyses.
Decorrelates a set of summary statistics (i.e., Z-scores or P-values per SNP) via the Decorrelation by Orthogonal Transformation (DOT) approach and performs gene-set analyses by combining transformed statistic values; operations are performed with algorithms that rely only on the association summary results and the linkage disequilibrium (LD). For more details on DOT and its power, see Olga (2020) <doi:10.1371/journal.pcbi.1007819>.
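One of the simplest ways to combine decorrelated statistics is their sum of squares. If y = Σ^{-1/2} z with Σ the LD (correlation) matrix, then y'y = z'Σ^{-1}z, so the sum can be computed with a linear solve instead of an explicit matrix square root. Under the null, with z multivariate normal, it follows a chi-square distribution with L degrees of freedom (a p-value could come from scipy.stats.chi2.sf). This Python sketch shows only that combination; whether the package uses this particular statistic, and its other combining rules, are not covered here, and the names are hypothetical.

```python
from typing import List, Sequence


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("LD matrix is (nearly) singular; prune or regularise it first")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):   # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def decorrelated_sum_of_squares(z: Sequence[float], ld: Sequence[Sequence[float]]) -> float:
    """Sum of squared decorrelated Z-scores, equal to z' ld^{-1} z."""
    w = solve(ld, z)
    return sum(zi * wi for zi, wi in zip(z, w))
```

With two SNPs in LD of 0.5 and Z-scores of 1 each, the statistic is 4/3 rather than the naive 2, reflecting that correlated signals carry less independent evidence.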
Replication methods to compute basic statistics (means, standard deviations, frequency tables, percentiles, mean comparisons using weighted effect coding, generalized linear models, and linear multilevel models) in complex survey designs comprising multiply imputed or nested imputed variables and/or a clustered sampling structure, both of which require special procedures, at least for estimating standard errors. See the package documentation for a more detailed description along with references.
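For the multiple-imputation part, the classical way to obtain a standard error is Rubin's rules: average the per-imputation estimates, and add the average within-imputation variance to (1 + 1/m) times the between-imputation variance. The Python sketch below shows only this single-layer combination; it is not the package's code, and it does not cover nested imputation or replication weights for clustered samples.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Combine one estimate and its sampling variance per imputed data set.

    Returns (pooled estimate, total variance) under Rubin's rules.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (m - 1 denominator)
    return q_bar, w + (1 + 1 / m) * b
```

The standard error is the square root of the total variance; with three imputations giving estimates 1, 2 and 3, the between-imputation term dominates the small within-imputation variance.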
Fair machine learning regression models that take sensitive attributes into account in model estimation. Currently implements Komiyama et al. (2018) <http://proceedings.mlr.press/v80/komiyama18a/komiyama18a.pdf>, Zafar et al. (2019) <https://www.jmlr.org/papers/volume20/18-262/18-262.pdf> and my own approach from Scutari, Panero and Proissl (2022) <doi:10.1007/s11222-022-10143-w>, which uses ridge regression to enforce fairness.
This package provides flexible odds ratio curves that enable modeling non-linear relationships between continuous predictors and binary outcomes. This package facilitates a deeper understanding of the impact of each continuous predictor on the outcome by presenting results in terms of odds ratio (OR) curves based on splines. These curves allow for comparison against a specified reference value, aiding in the interpretation of the predictor's effect.
Activate dark mode on your favorite ggplot2 theme with dark_mode() or use the dark versions of ggplot2 themes, including dark_theme_gray(), dark_theme_minimal(), and others. When a dark theme is applied, all geom color and geom fill defaults are changed to make them visible against a dark background. To restore the defaults to their original values, use invert_geom_defaults().
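One plausible way to make a default geom colour such as black visible on a dark background, and to restore it afterwards, is to take its RGB complement, since inverting twice returns the original. The Python sketch below illustrates that idea only; whether the package inverts colours exactly this way is an assumption, and invert_hex is a hypothetical name.

```python
def invert_hex(color: str) -> str:
    """Invert an '#RRGGBB' colour channel-wise (255 - value per channel)."""
    if len(color) != 7 or not color.startswith("#"):
        raise ValueError("expected a colour like '#1A2B3C'")
    r, g, b = (int(color[i:i + 2], 16) for i in (1, 3, 5))
    return "#{:02X}{:02X}{:02X}".format(255 - r, 255 - g, 255 - b)
```

Because the operation is its own inverse, applying it a second time undoes a dark theme's colour changes.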
Fits a multivariate linear mixed effects model that uses a polygenic term, after Zhou & Stephens (2014) (<https://www.nature.com/articles/nmeth.2848>). Of particular interest is the estimation of variance components with restricted maximum likelihood (REML) methods. Genome-wide efficient mixed-model association (GEMMA), as implemented in the package 'gemma2', uses an expectation-maximization algorithm for variance components inference for use in quantitative trait locus studies.
Launches a Shiny-based application for Nuclear Magnetic Resonance (NMR) data importation and Statistical TOtal Correlation SpectroscopY (STOCSY) analyses in a fully interactive approach. The theoretical background and applications of the STOCSY method can be found in Cloarec, O., Dumas, M. E., Craig, A., Barton, R. H., Trygg, J., Hudson, J., Blancher, C., Gauguier, D., Lindon, J. C., Holmes, E. & Nicholson, J. (2005) <doi:10.1021/ac048630x>.
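At its core, STOCSY correlates the intensity at one chosen "driver" spectral variable with every other variable across samples; variables from the same molecule tend to correlate strongly. The Python sketch below (hypothetical names, not the application's code) computes that correlation vector for a samples-by-variables matrix. STOCSY plots often also show the covariance with the driver, which is omitted here. A constant column would cause a division by zero and would need to be filtered out first.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def stocsy(spectra: Sequence[Sequence[float]], driver: int) -> List[float]:
    """Correlate the driver variable with every spectral variable.

    `spectra` is samples x variables (e.g. chemical-shift bins); the driver's
    correlation with itself is 1.
    """
    columns = list(zip(*spectra))
    ref = columns[driver]
    return [pearson(ref, col) for col in columns]
```

In practice the driver is usually a peak of interest, and high correlations elsewhere in the spectrum point to other resonances of the same compound.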
This package provides a collection of statistical tests for the detection of differential item functioning (DIF) in multistage tests. Methods include logistic regression, an adaptation of the simultaneous item bias test (SIBTEST), and various score-based tests. The tests are itemwise tests for DIF along categorical, ordinal or metric covariates. Methods for uniform and non-uniform DIF effects are available, depending on which method is used.
This package provides a new method for clustering samples using data from multiple modalities. The function M2SMjF() jointly factorizes multiple similarity matrices into a shared sub-matrix and several modality-private sub-matrices, which are then used for clustering. Along with this method, we also provide a function to calculate the similarity matrix and a function to estimate the best number of clusters from the original data.
Compute effect sizes and their sampling variances from factorial experimental designs. The package supports calculation of simple effects, overall effects, and interaction effects for use in factorial meta-analyses. See Gurevitch et al. (2000) <doi:10.1086/303337>, Morris et al. (2007) <doi:10.1890/06-0442>, Lajeunesse (2011) <doi:10.1890/11-0423.1> and Macartney et al. (2022) <doi:10.1016/j.neubiorev.2022.104554>.
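As an illustration of simple and interaction effects in a 2x2 design, the log response ratio (lnRR) is one effect size used in factorial meta-analysis: the simple effect of A is ln(X_A / X_C), the interaction is ln(X_AB) - ln(X_A) - ln(X_B) + ln(X_C), and by the delta method each group contributes SD² / (n X²) to the sampling variance. The Python sketch assumes lnRR as the metric (the package may also support standardized mean differences), omits overall (main) effects, and uses hypothetical names.

```python
from math import log
from typing import Dict, Tuple

# Each group: (mean, standard deviation, sample size); means must be positive for lnRR.
Group = Tuple[float, float, int]


def _v(g: Group) -> float:
    """Delta-method sampling variance of ln(mean) for one group."""
    mean, sd, n = g
    return sd ** 2 / (n * mean ** 2)


def factorial_lnrr(ctrl: Group, a: Group, b: Group, ab: Group) -> Dict[str, Tuple[float, float]]:
    """Simple-effect and interaction lnRR, each as (effect, sampling variance),
    for a 2x2 design: control, A alone, B alone, A and B together."""
    simple_a = (log(a[0] / ctrl[0]), _v(a) + _v(ctrl))
    simple_b = (log(b[0] / ctrl[0]), _v(b) + _v(ctrl))
    interaction = (log(ab[0]) - log(a[0]) - log(b[0]) + log(ctrl[0]),
                   _v(ctrl) + _v(a) + _v(b) + _v(ab))
    return {"simple_A": simple_a, "simple_B": simple_b, "interaction": interaction}
```

On the log scale a zero interaction means the two factors act multiplicatively: doubling under A and tripling under B with a sixfold response under both gives an interaction of zero.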