The package alpine
helps to model bias parameters and then using those parameters to estimate RNA-seq transcript abundance. Alpine
is a package for estimating and visualizing many forms of sample-specific biases that can arise in RNA-seq, including fragment length distribution, positional bias on the transcript, read start bias (random hexamer priming), and fragment GC-content (amplification). It also offers bias-corrected estimates of transcript abundance in FPKM(Fragments Per Kilobase of transcript per Million mapped reads). It is currently designed for un-stranded paired-end RNA-seq data.
This package implements bridge models for nowcasting and forecasting macroeconomic variables by linking high-frequency indicator variables (e.g., monthly data) to low-frequency target variables (e.g., quarterly GDP). Simplifies forecasting and aggregating indicator variables to match the target frequency, enabling timely predictions ahead of official data releases. For more on bridge models, see Baffigi, A., Golinelli, R., & Parigi, G. (2004) <doi:10.1016/S0169-2070(03)00067-0>, Burri (2023) <https://www5.unine.ch/RePEc/ftp/irn/pdfs/WP23-02.pdf>
or Schumacher (2016) <doi:10.1016/j.ijforecast.2015.07.004>.
Maleknia et al. (2020) <doi:10.1101/2020.01.13.905448>. A novel pathway enrichment analysis package based on Bayesian network to investigate the topology features of the pathways. firstly, 187 kyoto encyclopedia of genes and genomes (KEGG) human non-metabolic pathways which their cycles were eliminated by biological approach, enter in analysis as Bayesian network structures. The constructed Bayesian network were optimized by the Least Absolute Shrinkage Selector Operator (lasso) and the parameters were learned based on gene expression data. Finally, the impacted pathways were enriched by Fisherâ s Exact Test on significant parameters.
Although many software tools can perform meta-analyses on genetic case-control data, none of these apply to combined case-control and family-based (TDT) studies. This package conducts fixed-effects (with inverse variance weighting) and random-effects [DerSimonian
and Laird (1986) <DOI:10.1016/0197-2456(86)90046-2>] meta-analyses on combined genetic data. Specifically, this package implements a fixed-effects model [Kazeem and Farrall (2005) <DOI:10.1046/j.1529-8817.2005.00156.x>] and a random-effects model [Nicodemus (2008) <DOI:10.1186/1471-2105-9-130>] for combined studies.
This package provides a multiple testing procedure aims to find the rare-variant association regions. When variants are rare, the single variant association test approach suffers from low power. To improve testing power, the procedure dynamically and hierarchically aggregates smaller genome regions to larger ones and performs multiple testing for disease associations with a controlled node-level false discovery rate. This method are members of the family of ancillary information assisted recursive testing introduced in Pura, Li, Chan and Xie (2021) <arXiv:1906.07757v2>
and Li, Sung and Xie (2021) <arXiv:2103.11085v2>
.
Functions, data sets and shiny apps for "Epidemics: Models and Data in R" by Ottar N. Bjornstad (ISBN 978-3-319-97487-3) <https://www.springer.com/gp/book/9783319974866>. The package contains functions to study the S(E)IR model, spatial and age-structured SIR models; time-series SIR and chain-binomial stochastic models; catalytic disease models; coupled map lattice models of spatial transmission and network models for social spread of infection. The package is also an advanced quantitative companion to the coursera Epidemics Massive Online Open Course <https://www.coursera.org/learn/epidemics>.
This package provides a toolbox to make it easy to analyze plant disease epidemics. It provides a common framework for plant disease intensity data recorded over time and/or space. Implemented statistical methods are currently mainly focused on spatial pattern analysis (e.g., aggregation indices, Taylor and binary power laws, distribution fitting, SADIE and mapcomp methods). See Laurence V. Madden, Gareth Hughes, Franck van den Bosch (2007) <doi:10.1094/9780890545058> for further information on these methods. Several data sets that were mainly published in plant disease epidemiology literature are also included in this package.
Calculates 15 different goodness of fit criteria. These are; standard deviation ratio (SDR), coefficient of variation (CV), relative root mean square error (RRMSE), Pearson's correlation coefficients (PC), root mean square error (RMSE), performance index (PI), mean error (ME), global relative approximation error (RAE), mean relative approximation error (MRAE), mean absolute percentage error (MAPE), mean absolute deviation (MAD), coefficient of determination (R-squared), adjusted coefficient of determination (adjusted R-squared), Akaike's information criterion (AIC), corrected Akaike's information criterion (CAIC), Mean Square Error (MSE), Bayesian Information Criterion (BIC) and Normalized Mean Square Error (NMSE).
In statistical modeling, there is a wide variety of regression models for categorical dependent variables (nominal or ordinal data); yet, there is no software embracing all these models together in a uniform and generalized format. Following the methodology proposed by Peyhardi, Trottier, and Guédon (2015) <doi:10.1093/biomet/asv042>, we introduce GLMcat', an R package to estimate generalized linear models implemented under the unified specification (r, F, Z). Where r represents the ratio of probabilities (reference, cumulative, adjacent, or sequential), F the cumulative cdf function for the linkage, and Z, the design matrix.
It provides functions to generate operating characteristics and to calculate Sequential Conditional Probability Ratio Tests(SCPRT) efficacy and futility boundary values along with sample/event size of Multi-Arm Multi-Stage(MAMS) trials for different outcomes. The package is based on Jianrong Wu, Yimei Li, Liang Zhu (2023) <doi:10.1002/sim.9682>, Jianrong Wu, Yimei Li (2023) "Group Sequential Multi-Arm Multi-Stage Survival Trial Design with Treatment Selection"(Manuscript accepted for publication) and Jianrong Wu, Yimei Li, Shengping Yang (2023) "Group Sequential Multi-Arm Multi-Stage Trial Design with Ordinal Endpoints"(In preparation).
This package provides a wrapper for querying WISKI databases via the KiWIS
REST API. WISKI is an SQL relational database used for the collection and storage of water data developed by KISTERS and KiWIS
is a REST service that provides access to WISKI databases via HTTP requests (<https://www.kisters.eu/water-weather-and-environment/>). Contains a list of default databases (called hubs') and also allows users to provide their own KiWIS
URL. Supports the entire query process- from metadata to specific time series values. All data is returned as tidy tibbles.
Various efficient and robust bootstrap methods are implemented for linear models with least squares estimation. Functions within this package allow users to create bootstrap sampling distributions for model parameters, test hypotheses about parameters, and visualize the bootstrap sampling or null distributions. Methods implemented for linear models include the wild bootstrap by Wu (1986) <doi:10.1214/aos/1176350142>, the residual and paired bootstraps by Efron (1979, ISBN:978-1-4612-4380-9), the delete-1 jackknife by Quenouille (1956) <doi:10.2307/2332914>, and the Bayesian bootstrap by Rubin (1981) <doi:10.1214/aos/1176345338>.
Implementations of the quantile slice sampler of Heiner et al. (2024+, in preparation) as well as other popular slice samplers are provided. Helper functions for specifying pseudo-target distributions are included, both for diagnostics and for tuning the quantile slice sampler. Other implemented methods include the generalized elliptical slice sampler of Nishihara et al. (2014)<https://jmlr.org/papers/v15/nishihara14a.html
Uses simulation to create prediction intervals for post-policy outcomes in interrupted time series (ITS) designs, following Miratrix (2020) <arXiv:2002.05746>
. This package provides methods for fitting ITS models with lagged outcomes and variables to account for temporal dependencies. It then conducts inference via simulation, simulating a set of plausible counterfactual post-policy series to compare to the observed post-policy series. This package also provides methods to visualize such data, and also to incorporate seasonality models and smoothing and aggregation/summarization. This work partially funded by Arnold Ventures in collaboration with MDRC.
Have you ever index sorted cells in a 96 or 384-well plate and then sequenced using Sanger sequencing? If so, you probably had some struggles to either check the electropherogram of each cell sequenced manually, or when you tried to identify which cell was sorted where after sequencing the plate. Scifer was developed to solve this issue by performing basic quality control of Sanger sequences and merging flow cytometry data from probed single-cell sorted B cells with sequencing data. scifer can export summary tables, fasta files, electropherograms for visual inspection, and generate reports.
This package takes a list of p-values resulting from the simultaneous testing of many hypotheses and estimates their q-values and local false discovery rate (FDR) values. The q-value of a test measures the proportion of false positives incurred when that particular test is called significant. The local FDR measures the posterior probability the null hypothesis is true given the test's p-value. Various plots are automatically generated, allowing one to make sensible significance cut-offs. The software can be applied to problems in genomics, brain imaging, astrophysics, and data mining.
Covariance measure tests for conditional independence testing against conditional covariance and nonlinear conditional mean alternatives. The package implements versions of the generalised covariance measure test (Shah and Peters, 2020, <doi:10.1214/19-aos1857>) and projected covariance measure test (Lundborg et al., 2023, <doi:10.1214/24-AOS2447>). The tram-GCM test, for censored responses, is implemented including the Cox model and survival forests (Kook et al., 2024, <doi:10.1080/01621459.2024.2395588>). Application examples to variable significance testing and modality selection can be found in Kook and Lundborg (2024, <doi:10.1093/bib/bbae475>).
This package provides a collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between the target variable and predictor. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. And it creates automated reports that support these three tasks.
Forensic applications of pedigree analysis, including likelihood ratios for relationship testing, general relatedness inference, marker simulation, and power analysis. forrel is part of the pedsuite', a collection of packages for pedigree analysis, further described in the book Pedigree Analysis in R (Vigeland, 2021, ISBN:9780128244302). Several functions deal specifically with power analysis in missing person cases, implementing methods described in Vigeland et al. (2020) <doi:10.1016/j.fsigen.2020.102376>. Data import from the Familias software (Egeland et al. (2000) <doi:10.1016/S0379-0738(00)00147-X>) is supported through the pedFamilias
package.
An implementation of regression models with partial differential regularizations, making use of the Finite Element Method. The models efficiently handle data distributed over irregularly shaped domains and can comply with various conditions at the boundaries of the domain. A priori information about the spatial structure of the phenomenon under study can be incorporated in the model via the differential regularization. See Sangalli, L. M. (2021) <doi:10.1111/insr.12444> "Spatial Regression With Partial Differential Equation Regularisation" for an overview. The release 1.1-9 requires R (>= 4.2.0) to be installed on windows machines.
Given a group of genomes and their relationship with each other, the package clusters the genomes and selects the most representative members of each cluster. Additional data can be provided to the prioritize certain genomes. The results can be printed out as a list or a new phylogeny with graphs of the trees and distance distributions also available. For detailed introduction see: Thomas H Clarke, Lauren M Brinkac, Granger Sutton, and Derrick E Fouts (2018), GGRaSP
: a R-package for selecting representative genomes using Gaussian mixture models, Bioinformatics, bty300, <doi:10.1093/bioinformatics/bty300>.
The risk plot may be one of the most commonly used figures in tumor genetic data analysis. We can conclude the following two points: Comparing the prediction results of the model with the real survival situation to see whether the survival rate of the high-risk group is lower than that of the low-level group, and whether the survival time of the high-risk group is shorter than that of the low-risk group. The other is to compare the heat map and scatter plot to see the correlation between the predictors and the outcome.
The number of clusters (k) is needed to start all the partitioning clustering algorithms. An optimal value of this input argument is widely determined by using some internal validity indices. Since most of the existing internal indices suggest a k value which is computed from the clustering results after several runs of a clustering algorithm they are computationally expensive. On the contrary, the package kpeaks enables to estimate k before running any clustering algorithm. It is based on a simple novel technique using the descriptive statistics of peak counts of the features in a data set.
Sharing statistical methods or simulation frameworks through shiny applications often requires workflows for handling data. To help save and display simulation results, the postgresUI()
and postgresServer()
functions in mmints help with persistent data storage using a PostgreSQL
database. The mmints package also offers data upload functionality through the csvUploadUI()
and csvUploadServer()
functions which allow users to upload data, view variables and their types, and edit variable types before fitting statistical models within the shiny application. These tools aim to enhance efficiency and user interaction in shiny based statistical and simulation applications.