This package implements methods for batch correction and integration of scRNA-seq datasets, based on the Seurat anchor-based integration framework. In particular, STACAS is optimized for the integration of heterogeneous datasets with only limited overlap between cell sub-types (e.g. TIL sets of CD8 T cells from tumors with CD8/CD4 T cells from lymph nodes), for which the default Seurat alignment methods would tend to over-correct biological differences. Version 2.0 of the package allows users to incorporate explicit information about cell types in order to assist the integration process.
Streaming JSON (ndjson) has one JSON record per line, and many modern ndjson files contain large numbers of records. These constructs may not be columnar in nature, but it is often useful to read in these files and "flatten" the structure out to enable working with the data in an R data.frame-like context. Functions are provided that make it possible to read in plain ndjson files or compressed (gz) ndjson files and either validate the format of the records or create "flat" data.table structures from them.
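To illustrate what "flattening" means here, the following is a minimal Python sketch rather than the package's R implementation. The function names `flatten` and `read_ndjson`, the dotted-key naming scheme, and the sample records are all assumptions made for this example.

```python
import gzip
import io
import json


def flatten(record, prefix=""):
    """Flatten nested dicts/lists into one level, joining keys with dots."""
    if isinstance(record, dict):
        items = record.items()
    elif isinstance(record, list):
        items = enumerate(record)
    else:
        return {prefix: record}
    flat = {}
    for key, value in items:
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, (dict, list)):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def read_ndjson(stream):
    """Yield one flattened row per non-empty line of an ndjson text stream."""
    for line in stream:
        line = line.strip()
        if line:
            # json.loads raises ValueError on a malformed record,
            # which doubles as a simple per-line validation step.
            yield flatten(json.loads(line))


# Hypothetical sample data: two records with different nested fields.
sample = (
    '{"id": 1, "user": {"name": "a", "tags": ["x", "y"]}}\n'
    '{"id": 2, "user": {"name": "b"}}\n'
)
rows = list(read_ndjson(io.StringIO(sample)))

# The same reader works on gzip-compressed input opened in text mode.
compressed = gzip.compress(sample.encode("utf-8"))
with gzip.open(io.BytesIO(compressed), "rt", encoding="utf-8") as fh:
    rows_gz = list(read_ndjson(fh))
```

Records with different shapes yield rows with different keys; a tabular structure would fill the missing columns with NA-like values.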
This package provides a collection of sampling formulas for the unified neutral model of biogeography and biodiversity. Alongside the sampling formulas, it includes methods to perform maximum likelihood optimization of the sampling formulas, methods to generate data given the neutral model, and methods to estimate the expected species abundance distribution. Sampling formulas included in the GUILDS package are the Etienne Sampling Formula (Etienne 2005), the guild sampling formula, where guilds are assumed to differ in dispersal ability (Janzen et al. 2015), and the guilds sampling formula conditioned on guild size (Janzen et al. 2015).
Estimation methods for phase-type distributions (PH) and Markovian arrival processes (MAP) from empirical data (point and grouped data) and density functions. The tool is based on the following research: Okamura et al. (2009) <doi:10.1109/TNET.2008.2008750>, Okamura and Dohi (2009) <doi:10.1109/QEST.2009.28>, Okamura et al. (2011) <doi:10.1016/j.peva.2011.04.001>, Okamura et al. (2013) <doi:10.1002/asmb.1919>, Horvath and Okamura (2013) <doi:10.1007/978-3-642-40725-3_10>, Okamura and Dohi (2016) <doi:10.15807/jorsj.59.72>.
This package implements a model-based clustering method for categorical life-course sequences relying on mixtures of exponential-distance models introduced by Murphy et al. (2021) <doi:10.1111/rssa.12712>. A range of flexible precision parameter settings corresponding to weighted generalisations of the Hamming distance metric are considered, along with the potential inclusion of a noise component. Gating covariates can be supplied in order to relate sequences to baseline characteristics and sampling weights are also accommodated. The models are fitted using the EM algorithm and tools for visualising the results are also provided.
This package provides a wrapper around Michel Scheffers's libassp (<https://libassp.sourceforge.net/>). The libassp (Advanced Speech Signal Processor) library aims at providing functionality for handling speech signal files in most common audio formats and for performing analyses common in phonetic science/speech science. This includes the calculation of formants, fundamental frequency, root mean square, auto correlation, a variety of spectral analyses, zero crossing rate, filtering etc. This wrapper provides R with a large subset of libassp's signal processing functions and provides them to the user in a (hopefully) user-friendly manner.
The package alpine models bias parameters and then uses those parameters to estimate RNA-seq transcript abundance. It estimates and visualizes many forms of sample-specific biases that can arise in RNA-seq, including fragment length distribution, positional bias on the transcript, read start bias (random hexamer priming), and fragment GC-content (amplification). It also offers bias-corrected estimates of transcript abundance in FPKM (Fragments Per Kilobase of transcript per Million mapped reads). It is currently designed for un-stranded paired-end RNA-seq data.
A novel pathway enrichment analysis package based on Bayesian networks that investigates the topological features of pathways. First, 187 Kyoto Encyclopedia of Genes and Genomes (KEGG) human non-metabolic pathways, whose cycles were eliminated using a biological approach, enter the analysis as Bayesian network structures. The constructed Bayesian networks are then optimized by the least absolute shrinkage and selection operator (lasso), and the parameters are learned from gene expression data. Finally, the impacted pathways are enriched by Fisher's exact test on significant parameters. See Maleknia et al. (2020) <doi:10.1101/2020.01.13.905448>.
This package implements bridge models for nowcasting and forecasting macroeconomic variables by linking high-frequency indicator variables (e.g., monthly data) to low-frequency target variables (e.g., quarterly GDP). It simplifies forecasting and aggregating indicator variables to match the target frequency, enabling timely predictions ahead of official data releases. For more on bridge models, see Baffigi, A., Golinelli, R., & Parigi, G. (2004) <doi:10.1016/S0169-2070(03)00067-0>, Burri (2023) <https://www5.unine.ch/RePEc/ftp/irn/pdfs/WP23-02.pdf> or Schumacher (2016) <doi:10.1016/j.ijforecast.2015.07.004>.
Although many software tools can perform meta-analyses on genetic case-control data, none of these apply to combined case-control and family-based (TDT) studies. This package conducts fixed-effects (with inverse variance weighting) and random-effects [DerSimonian and Laird (1986) <DOI:10.1016/0197-2456(86)90046-2>] meta-analyses on combined genetic data. Specifically, this package implements a fixed-effects model [Kazeem and Farrall (2005) <DOI:10.1046/j.1529-8817.2005.00156.x>] and a random-effects model [Nicodemus (2008) <DOI:10.1186/1471-2105-9-130>] for combined studies.
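As a rough illustration of the two pooling schemes named above, here is a Python sketch of generic inverse-variance fixed-effects pooling and the DerSimonian and Laird random-effects estimator. It does not reproduce how the package combines case-control and TDT data; the function names and the example effect sizes (for instance, log odds ratios) are assumptions.

```python
def fixed_effects(estimates, variances):
    """Inverse-variance weighted pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, 1.0 / total


def dersimonian_laird(estimates, variances):
    """Random-effects pooling using the DerSimonian-Laird tau^2 estimate."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled_fe, _ = fixed_effects(estimates, variances)
    # Cochran's Q: weighted squared deviations from the fixed-effects mean.
    q = sum(w * (y - pooled_fe) ** 2 for w, y in zip(weights, estimates))
    c = total - sum(w * w for w in weights) / total
    k = len(estimates)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    re_total = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / re_total
    return pooled, 1.0 / re_total, tau2


# Hypothetical per-study effect sizes and their variances.
homogeneous = fixed_effects([0.2, 0.4], [0.04, 0.04])
heterogeneous = dersimonian_laird([0.0, 1.0], [0.01, 0.01])
```

When the studies agree closely, the estimated between-study variance is zero and the random-effects result coincides with the fixed-effects one.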
This package provides a multiple testing procedure that aims to find rare-variant association regions. When variants are rare, the single-variant association test approach suffers from low power. To improve testing power, the procedure dynamically and hierarchically aggregates smaller genome regions into larger ones and performs multiple testing for disease associations with a controlled node-level false discovery rate. This method is a member of the family of ancillary information assisted recursive testing introduced in Pura, Li, Chan and Xie (2021) <arXiv:1906.07757v2> and Li, Sung and Xie (2021) <arXiv:2103.11085v2>.
Functions, data sets and shiny apps for "Epidemics: Models and Data in R" by Ottar N. Bjornstad (ISBN 978-3-319-97487-3) <https://www.springer.com/gp/book/9783319974866>. The package contains functions to study the S(E)IR model, spatial and age-structured SIR models; time-series SIR and chain-binomial stochastic models; catalytic disease models; coupled map lattice models of spatial transmission; and network models for social spread of infection. The package is also an advanced quantitative companion to the Coursera Epidemics Massive Online Open Course <https://www.coursera.org/learn/epidemics>.
This package provides a toolbox to make it easy to analyze plant disease epidemics. It provides a common framework for plant disease intensity data recorded over time and/or space. Implemented statistical methods are currently mainly focused on spatial pattern analysis (e.g., aggregation indices, Taylor and binary power laws, distribution fitting, SADIE and mapcomp methods). See Laurence V. Madden, Gareth Hughes, Franck van den Bosch (2007) <doi:10.1094/9780890545058> for further information on these methods. Several data sets that were mainly published in plant disease epidemiology literature are also included in this package.
Calculates 18 different goodness-of-fit criteria. These are: standard deviation ratio (SDR), coefficient of variation (CV), relative root mean square error (RRMSE), Pearson's correlation coefficients (PC), root mean square error (RMSE), performance index (PI), mean error (ME), global relative approximation error (RAE), mean relative approximation error (MRAE), mean absolute percentage error (MAPE), mean absolute deviation (MAD), coefficient of determination (R-squared), adjusted coefficient of determination (adjusted R-squared), Akaike's information criterion (AIC), corrected Akaike's information criterion (CAIC), mean square error (MSE), Bayesian information criterion (BIC) and normalized mean square error (NMSE).
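For orientation, a few of these criteria can be sketched in Python using conventional textbook definitions. The package's exact formulas may differ (for example in how MAD is defined), and the function name `goodness_of_fit` and the sample values are assumptions made for this illustration.

```python
import math


def goodness_of_fit(observed, predicted):
    """Compute MSE, RMSE, MAD, MAPE and R-squared for paired values.

    Assumes equal-length sequences and non-zero observed values (for MAPE).
    """
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # Taken here as the mean absolute error of the predictions.
        "MAD": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
        "R2": 1.0 - sse / ss_tot,
    }


# Hypothetical observed and predicted values.
fit = goodness_of_fit([2.0, 4.0, 6.0, 8.0], [1.0, 5.0, 5.0, 9.0])
```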
This package provides functions to generate operating characteristics and to calculate Sequential Conditional Probability Ratio Tests (SCPRT) efficacy and futility boundary values, along with the sample/event size of Multi-Arm Multi-Stage (MAMS) trials for different outcomes. The package is based on Jianrong Wu, Yimei Li, Liang Zhu (2023) <doi:10.1002/sim.9682>, Jianrong Wu, Yimei Li (2023) "Group Sequential Multi-Arm Multi-Stage Survival Trial Design with Treatment Selection" (manuscript accepted for publication) and Jianrong Wu, Yimei Li, Shengping Yang (2023) "Group Sequential Multi-Arm Multi-Stage Trial Design with Ordinal Endpoints" (in preparation).
In statistical modeling, there is a wide variety of regression models for categorical dependent variables (nominal or ordinal data); yet, there is no software embracing all these models together in a uniform and generalized format. Following the methodology proposed by Peyhardi, Trottier, and Guédon (2015) <doi:10.1093/biomet/asv042>, we introduce GLMcat, an R package to estimate generalized linear models implemented under the unified specification (r, F, Z). Here r represents the ratio of probabilities (reference, cumulative, adjacent, or sequential), F the cumulative distribution function for the linkage, and Z the design matrix.
This package provides a wrapper for querying WISKI databases via the KiWIS REST API. WISKI is an SQL relational database used for the collection and storage of water data developed by KISTERS, and KiWIS is a REST service that provides access to WISKI databases via HTTP requests (<https://www.kisters.eu/water-weather-and-environment/>). It contains a list of default databases (called 'hubs') and also allows users to provide their own KiWIS URL. It supports the entire query process, from metadata to specific time series values. All data is returned as tidy tibbles.
Various efficient and robust bootstrap methods are implemented for linear models with least squares estimation. Functions within this package allow users to create bootstrap sampling distributions for model parameters, test hypotheses about parameters, and visualize the bootstrap sampling or null distributions. Methods implemented for linear models include the wild bootstrap by Wu (1986) <doi:10.1214/aos/1176350142>, the residual and paired bootstraps by Efron (1979, ISBN:978-1-4612-4380-9), the delete-1 jackknife by Quenouille (1956) <doi:10.2307/2332914>, and the Bayesian bootstrap by Rubin (1981) <doi:10.1214/aos/1176345338>.
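As a rough illustration of one of the listed methods, the paired (case-resampling) bootstrap for a simple linear regression slope can be sketched in Python as follows. This is not the package's R interface: the function names, the fixed seed, the percentile interval and the toy data are assumptions made for this example.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least squares slope for a simple linear regression of ys on xs."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def paired_bootstrap_slopes(xs, ys, n_boot=500, seed=1):
    """Resample (x, y) pairs with replacement and refit the slope each time."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # skip degenerate resamples where the slope is undefined
        by = [ys[i] for i in idx]
        slopes.append(ols_slope(bx, by))
    return slopes


# Toy data lying exactly on y = 2x + 1, so every resampled slope is 2.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
slopes = sorted(paired_bootstrap_slopes(xs, ys, n_boot=200))

# Approximate 95% percentile interval from the sorted bootstrap slopes.
lower = slopes[int(0.025 * len(slopes))]
upper = slopes[int(0.975 * len(slopes)) - 1]
```

With noisy data the sorted slopes spread out and the percentile interval widens accordingly.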
Advanced forecasting algorithms for long-term energy demand at the national or regional level. The methodology is based on Grandón et al. (2024) <doi:10.1016/j.apenergy.2023.122249>; Zimmermann & Ziel (2024) <doi:10.1016/j.apenergy.2025.125444>. Real-time data, including power demand, weather conditions, and macroeconomic indicators, are provided through automated API integration with various institutions. The modular approach maintains transparency on the various model selection processes and can be adapted to individual needs. oRaklE aims to facilitate robust decision-making in energy management and planning.
Implementations of the quantile slice sampler of Heiner et al. (2024+, in preparation) as well as other popular slice samplers are provided. Helper functions for specifying pseudo-target distributions are included, both for diagnostics and for tuning the quantile slice sampler. Other implemented methods include the generalized elliptical slice sampler of Nishihara et al. (2014) <https://jmlr.org/papers/v15/nishihara14a.html>.
Uses simulation to create prediction intervals for post-policy outcomes in interrupted time series (ITS) designs, following Miratrix (2020) <arXiv:2002.05746>. This package provides methods for fitting ITS models with lagged outcomes and variables to account for temporal dependencies. It then conducts inference via simulation, simulating a set of plausible counterfactual post-policy series to compare to the observed post-policy series. This package also provides methods to visualize such data, and to incorporate seasonality models, smoothing, and aggregation/summarization. This work was partially funded by Arnold Ventures in collaboration with MDRC.
Have you ever index sorted cells in a 96- or 384-well plate and then sequenced them using Sanger sequencing? If so, you probably struggled either to check the electropherogram of each sequenced cell manually or to identify which cell was sorted where after sequencing the plate. scifer was developed to solve this issue by performing basic quality control of Sanger sequences and merging flow cytometry data from probed single-cell sorted B cells with sequencing data. scifer can export summary tables, fasta files, and electropherograms for visual inspection, and can generate reports.
This package takes a list of p-values resulting from the simultaneous testing of many hypotheses and estimates their q-values and local false discovery rate (FDR) values. The q-value of a test measures the proportion of false positives incurred when that particular test is called significant. The local FDR measures the posterior probability that the null hypothesis is true given the test's p-value. Various plots are automatically generated, allowing one to make sensible significance cut-offs. The software can be applied to problems in genomics, brain imaging, astrophysics, and data mining.
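To make the q-value idea concrete, here is a simplified Python sketch. It estimates the proportion of true nulls (pi0) from a single fixed threshold `lam`, which is an assumption of this example; the package may use a more refined estimate, and the local FDR is not covered here. The function name `qvalues` and the sample p-values are likewise illustrative.

```python
def qvalues(pvalues, lam=0.5):
    """Return (pi0, q-values) using a single-threshold estimate of pi0."""
    m = len(pvalues)
    # p-values above lam are assumed to come mostly from true nulls.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return pi0, q


# Hypothetical p-values: three small ones and one large one.
pi0, q = qvalues([0.001, 0.002, 0.003, 0.9])
```

With pi0 fixed at 1 this reduces to the Benjamini-Hochberg adjusted p-values.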
Covariance measure tests for conditional independence testing against conditional covariance and nonlinear conditional mean alternatives. The package implements versions of the generalised covariance measure test (Shah and Peters, 2020, <doi:10.1214/19-aos1857>) and projected covariance measure test (Lundborg et al., 2023, <doi:10.1214/24-AOS2447>). The tram-GCM test, for censored responses, is implemented including the Cox model and survival forests (Kook et al., 2024, <doi:10.1080/01621459.2024.2395588>). Application examples to variable significance testing and modality selection can be found in Kook and Lundborg (2024, <doi:10.1093/bib/bbae475>).