Group SLOPE (Group Sorted L1 Penalized Estimation) is a penalized linear regression method used for adaptive selection of groups of significant predictors in a high-dimensional linear model. The Group SLOPE method can control the (group) false discovery rate at a user-specified level (i.e., control the expected proportion of irrelevant groups among all selected groups of predictors). For additional information about the implemented methods please see Brzyski, Gossmann, Su, Bogdan (2018) <doi:10.1080/01621459.2017.1411269>.
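As a rough illustration of the "sorted L1" idea, the sketch below evaluates a sorted-L1 penalty on per-group norms: the largest group norm is multiplied by the largest penalty weight, the second largest by the second weight, and so on. This is a simplified sketch, not the package's code. It uses plain Euclidean norms of the coefficient blocks and a user-supplied weight sequence, which are assumptions; the function name `group_sorted_l1_penalty` is hypothetical.

```python
import math

def group_sorted_l1_penalty(coefs, groups, lambdas):
    """Sorted-L1 penalty applied to per-group Euclidean norms (simplified)."""
    # Accumulate the squared coefficient mass of each group label.
    squared = {}
    for b, g in zip(coefs, groups):
        squared[g] = squared.get(g, 0.0) + b * b
    # Group norms in decreasing order.
    norms = sorted((math.sqrt(v) for v in squared.values()), reverse=True)
    if len(lambdas) != len(norms):
        raise ValueError("need exactly one lambda per group")
    if any(a < b for a, b in zip(lambdas, lambdas[1:])):
        raise ValueError("lambdas must be non-increasing")
    # The largest norm is paired with the largest lambda, and so on.
    return sum(lam * norm for lam, norm in zip(lambdas, norms))

penalty = group_sorted_l1_penalty(
    coefs=[3.0, 4.0, 0.0, 0.0, 1.0],
    groups=["a", "a", "b", "b", "c"],
    lambdas=[2.0, 1.0, 0.5],
)
print(penalty)  # 11.0
```

Because the weights decrease, strong groups are penalized more per unit of norm than weak ones, which is what makes the selection adaptive.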
This package provides a fragmentation spectra detection pipeline for high-throughput LC/HRMS data processing using peaklists generated by the IDSL.IPA workflow <doi:10.1021/acs.jproteome.2c00120>. The IDSL.CSA package can deconvolute fragmentation spectra from Composite Spectra Analysis (CSA), Data Dependent Acquisition (DDA) analysis, and various Data-Independent Acquisition (DIA) methods such as MS^E, All-Ion Fragmentation (AIF) and SWATH-MS analysis. The IDSL.CSA package was introduced in <doi:10.1021/acs.analchem.3c00376>.
Routines to handle family data with a pedigree object. The initial purpose was to create correlation structures that describe family relationships such as kinship and identity-by-descent, which can be used to model family data in mixed effects models, such as in the coxme function. Also includes a tool for pedigree drawing which is focused on producing compact layouts without intervention. Recent additions include utilities to trim the pedigree object with various criteria, and kinship for the X chromosome.
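To make the kinship correlation structure concrete, here is a minimal sketch of the standard textbook recursion for autosomal kinship coefficients. It is not the package's implementation. The function name `kinship_matrix`, the use of `-1` for a missing parent, and the requirement that parents appear before their children are all assumptions of this example.

```python
def kinship_matrix(father, mother):
    """Autosomal kinship coefficients from parent indices.

    father[i] and mother[i] give the row index of each parent of
    individual i, or -1 for a founder. Parents must precede children.
    """
    n = len(father)
    phi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        f, m = father[i], mother[i]
        if f >= i or m >= i:
            raise ValueError("parents must be listed before their children")
        for j in range(i):
            # Kinship with an earlier individual is the average of the
            # parents' kinship with that individual (0 for unknown parents).
            from_f = phi[f][j] if f >= 0 else 0.0
            from_m = phi[m][j] if m >= 0 else 0.0
            phi[i][j] = phi[j][i] = 0.5 * (from_f + from_m)
        # Self-kinship is 1/2 plus half the kinship between the parents.
        parents_kin = phi[f][m] if f >= 0 and m >= 0 else 0.0
        phi[i][i] = 0.5 * (1.0 + parents_kin)
    return phi

# Two unrelated founders (0, 1) and their two children (2, 3).
father = [-1, -1, 0, 0]
mother = [-1, -1, 1, 1]
phi = kinship_matrix(father, mother)
print(phi[2][3])  # 0.25
```

Twice this matrix gives the expected proportion of alleles shared identical by descent, which is the form usually supplied as a random-effect correlation in a mixed effects model.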
Estimation of multi-group count regression models (i.e., Poisson, negative binomial) with latent covariates. This package provides two extensions compared to ordinary count regression models based on a generalized linear model: First, measurement models for the predictors can be specified, which makes it possible to account for measurement error. Second, the count regression can be simultaneously estimated in multiple groups with stochastic group weights. The marginal maximum likelihood estimation is described in Kiefer & Mayer (2020) <doi:10.1080/00273171.2020.1751027>.
Simulate a (bivariate) multivariate renewal Hawkes (MRHawkes) self-exciting process, with given immigrant hazard rate functions and offspring density function. Calculate the likelihood of a MRHawkes process with given hazard rate functions and offspring density function for an (increasing) sequence of event times. Calculate the Rosenblatt residuals of the event times. Predict future event times based on observed event times up to a given time. For details see Stindl and Chen (2018) <doi:10.1016/j.csda.2018.01.021>.
This package implements the American Heart Association Predicting Risk of cardiovascular disease EVENTs (PREVENT) equations from Khan SS, Matsushita K, Sang Y, and colleagues (2023) <doi:10.1161/CIRCULATIONAHA.123.067626>, with optional comparison with their de facto predecessor, the Pooled Cohort Equations from the American Heart Association and American College of Cardiology (2013) <doi:10.1161/01.cir.0000437741.48606.98> and the revision to the Pooled Cohort Equations from Yadlowsky and colleagues (2018) <doi:10.7326/M17-3011>.
LIONESS, or Linear Interpolation to Obtain Network Estimates for Single Samples, can be used to reconstruct single-sample networks (https://arxiv.org/abs/1505.06440). This code implements the LIONESS equation in the lioness function in R to reconstruct single-sample networks. The default network reconstruction method is based on Pearson correlation. However, lionessR can run on any network reconstruction algorithm that returns a complete, weighted adjacency matrix. lionessR works for both unipartite and bipartite networks.
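The LIONESS equation estimates the network of sample q as N * (e_all - e_without_q) + e_without_q, where e_all is the network built from all N samples and e_without_q is the network built with sample q left out. The sketch below applies it with a Pearson correlation network. It is written in Python for illustration only and is not the lionessR code; the function names, the samples-by-genes data layout, and the toy values are assumptions.

```python
import math

def pearson_network(samples):
    """Gene-by-gene Pearson correlations; samples is a list of gene-value lists."""
    n, genes = len(samples), len(samples[0])
    means = [sum(s[g] for s in samples) / n for g in range(genes)]
    centered = [[s[g] - means[g] for g in range(genes)] for s in samples]
    cov = [[sum(c[a] * c[b] for c in centered) for b in range(genes)]
           for a in range(genes)]
    return [[cov[a][b] / math.sqrt(cov[a][a] * cov[b][b]) for b in range(genes)]
            for a in range(genes)]

def lioness(samples, network_fn=pearson_network):
    """One network per sample: N * (aggregate - without_q) + without_q."""
    n = len(samples)
    aggregate = network_fn(samples)
    networks = []
    for q in range(n):
        # Rebuild the network with sample q left out.
        without_q = network_fn(samples[:q] + samples[q + 1:])
        networks.append([[n * (a - w) + w for a, w in zip(row_a, row_w)]
                         for row_a, row_w in zip(aggregate, without_q)])
    return networks

# Toy data: 5 samples, 3 genes (made-up values).
samples = [[1.0, 2.0, 0.5], [2.0, 1.5, 1.0], [3.0, 3.5, 0.0],
           [4.0, 3.0, 2.0], [5.0, 5.5, 1.5]]
networks = lioness(samples)
print(len(networks), len(networks[0]), len(networks[0][0]))  # 5 3 3
```

Any other function that returns a complete weighted adjacency matrix can be passed as `network_fn`, which mirrors the flexibility described above.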
Given a set of genomic sites/regions (e.g. ChIP-seq peaks, CpGs, differentially methylated CpGs or regions, SNPs, etc.) it is often of interest to investigate the intersecting genomic annotations. Such annotations include those relating to gene models (promoters, 5'UTRs, exons, introns, and 3'UTRs), CpGs (CpG islands, CpG shores, CpG shelves), or regulatory sequences such as enhancers. The annotatr package provides an easy way to summarize and visualize the intersection of genomic sites/regions with genomic annotations.
The package contains functions to infer and visualize the cell cycle process using single-cell RNA-seq data. It exploits the idea of transfer learning, projecting new data onto a previously learned, biologically interpretable space. The tricycle package provides a pre-learned cell cycle space, which can be used to infer the cell cycle time of human and mouse single-cell samples. In addition, it offers functions to visualize cell cycle time on different embeddings and functions to build a new reference.
This package provides a small collection of interesting and educational machine learning data sets which are used as examples in the mlr3 book Applied machine learning using mlr3 in R https://mlr3book.mlr-org.com, the use case gallery https://mlr3gallery.mlr-org.com, or in other examples. All data sets are properly preprocessed and ready to be analyzed by most machine learning algorithms. Data sets are automatically added to the dictionary of tasks if mlr3 is loaded.
This package provides portable tools to run system processes in the background. It can check if a background process is running; wait on a background process to finish; get the exit status of finished processes; kill background processes and their children; restart processes. It can read the standard output and error of the processes, using non-blocking connections. processx can poll a process for standard output or error, with a timeout. It can also poll several processes at once.
To help researchers quickly and comprehensively acquire disease genes, and so understand the mechanisms of disease, we developed this program to retrieve disease-related genes. The data are integrated from three public databases: eDGAR, DrugBank, and MalaCards. eDGAR is a comprehensive database containing data on the relationships between diseases and genes. DrugBank contains information on 13443 drugs and 5157 targets. MalaCards integrates human disease information, including disease-related genes.
Automatic model selection for structural time series decomposition into trend, cycle, and seasonal components, plus optionality for structural interpolation, using the Kalman filter. Koopman, Siem Jan and Marius Ooms (2012) "Forecasting Economic Time Series Using Unobserved Components Time Series Models" <doi:10.1093/oxfordhb/9780195398649.013.0006>. Kim, Chang-Jin and Charles R. Nelson (1999) "State-Space Models with Regime Switching: Classical and Gibbs-Sampling Approaches with Applications" <doi:10.7551/mitpress/6444.001.0001> <http://econ.korea.ac.kr/~cjkim/>.
Provide early termination phase II trial designs with a decreasingly informative prior (DIP) or a regular Bayesian prior chosen by the user. The program can determine the minimum planned sample size necessary to achieve the user-specified admissible designs. The program can also perform power and expected sample size calculations for the tests in early termination Phase II trials. See Wang C and Sabo RT (2022) <doi:10.18203/2349-3259.ijct20221110>; Sabo RT (2014) <doi:10.1080/10543406.2014.888441>.
BEAST2 (<https://www.beast2.org>) is a widely used Bayesian phylogenetic tool that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. BEAUti 2 (which is part of BEAST2) is a GUI tool that allows users to specify the many possible setups and generates the XML file BEAST2 needs to run. This package provides a way to create BEAST2 input files without active user input, using R function calls instead.
This package provides functions to perform the following analyses: i) inferring epistasis from RNAi double knockdown data; ii) identifying gene pairs of multiple mutation patterns; iii) assessing association between gene pairs and survival; and iv) calculating the smallworldness of a graph (e.g., a gene interaction network). Data and analyses are described in Wang, X., Fu, A. Q., McNerney, M. and White, K. P. (2014). Widespread genetic epistasis among breast cancer genes. Nature Communications. 5 4828. <doi:10.1038/ncomms5828>.
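For orientation, a commonly used small-worldness index is sigma = (C / C_rand) / (L / L_rand), where C is the clustering coefficient and L the average shortest-path length of the graph, and the "rand" values refer to a comparable random graph. The sketch below is an illustration under stated assumptions, not this package's routine: it replaces simulated random graphs with the analytic approximations C_rand ≈ k/n and L_rand ≈ ln(n)/ln(k) for mean degree k, assumes a connected undirected graph with integer node labels, and all function names are hypothetical.

```python
import math
from collections import deque

def clustering_coefficient(adj):
    """Mean local clustering coefficient; adj maps node -> set of neighbours."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path length over reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def small_worldness(adj):
    """sigma = (C / C_rand) / (L / L_rand) with analytic random-graph references."""
    n = len(adj)
    mean_degree = sum(len(nbrs) for nbrs in adj.values()) / n
    c_rand = mean_degree / n                      # assumed approximation
    l_rand = math.log(n) / math.log(mean_degree)  # assumed approximation
    return (clustering_coefficient(adj) / c_rand) / (average_path_length(adj) / l_rand)

def ring_lattice(n, k):
    """Each node is linked to its k nearest neighbours on a ring (k even)."""
    half = k // 2
    return {i: {(i + d) % n for d in range(-half, half + 1) if d != 0}
            for i in range(n)}

graph = ring_lattice(20, 4)
print(clustering_coefficient(graph))  # 0.5
print(small_worldness(graph) > 1)     # True
```

Values of sigma above 1 are usually read as evidence of small-world structure; with simulated random reference graphs the exact value would differ from this approximation.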
This package provides various functions for reading and preparing the Panel Study of Income Dynamics (PSID) for longitudinal analysis, including functions that read the PSID's fixed width format files directly into R, rename all of the PSID's longitudinal variables so that recurring variables have consistent names across years, simplify assembling longitudinal datasets from cross sections of the PSID Family Files, and export the resulting PSID files into file formats common among other statistical programming languages (SAS, STATA, and SPSS).
The goal of this collection of functions is to provide an easy-to-use tool for measuring the size of foraminifera and other unicellular organisms. It includes functions developed to guide foraminiferal test biovolume calculations and cell biomass estimations. The volume function includes geometric adaptations of several microalgae models based on Hillebrand et al. (1999) <doi:10.1046/j.1529-8817.1999.3520403.x>, Sun and Liu (2003) <doi:10.1093/plankt/fbg096> and Vadrucci, Cabrini and Basset (2007) <doi:10.1285/i1825229Xv1n2p83>.
This R package calculates the between-cases AUC estimate, the corresponding covariance, and the variance estimate in the nested data problem. The package also includes a function to simulate nested data. The calculated between-cases AUC estimate is used to evaluate the reader's diagnostic performance in clinical tasks with nested data. For more details on the above methods, please refer to the paper by H Du, S Wen, Y Guo, F Jin, BD Gallas (2022) <doi:10.1177/09622802221111539>.
This package provides tools for the analysis of land use and cover (LUC) time series. It includes support for loading spatiotemporal raster data and synthesized spatial plotting. Several LUC change (LUCC) metrics in regular or irregular time intervals can be extracted and visualized through one-step and multistep Sankey and chord diagrams. A complete intensity analysis according to Aldwaik and Pontius (2012) <doi:10.1016/j.landurbplan.2012.02.010> is implemented, including tools for the generation of standardized multilevel output graphics.
Creation of linkage maps in polyploid species from marker dosage scores of an F1 cross from two heterozygous parents. Currently works for outcrossing diploid, autotriploid, autotetraploid and autohexaploid species, as well as segmental allotetraploids. Methods are described in a manuscript of Bourke et al. (2018) <doi:10.1093/bioinformatics/bty371>. Since version 1.1.0, both discrete and probabilistic genotypes are acceptable input; for more details on the latter see Liao et al. (2021) <doi:10.1007/s00122-021-03834-x>.
This package contains functions to fit a proportional hazards (PH) model to partly interval-censored (PIC) data (Pan et al. (2020) <doi:10.1177/0962280220921552>), a PH model with spatial frailty to spatially dependent PIC data (Pan and Cai (2021) <doi:10.1080/03610918.2020.1839497>), and a mixed effects PH model to clustered PIC data. Each random intercept/random effect can follow either a normal prior or a Dirichlet process mixture prior. It also includes the corresponding functions for general interval-censored data.
Various functions for discrete time survival analysis and longitudinal analysis. SIMEX method for correcting for bias for errors-in-variables in a mixed effects model. Asymptotic mean and variance of different proportional hazards test statistics using different ties methods given two survival curves and censoring distributions. Score test and Wald test for regression analysis of grouped survival data. Calculation of survival curves for events defined by the response variable in a mixed effects model crossing a threshold with or without confirmation.
The tcplfit2 R package performs basic concentration-response curve fitting. The original tcplFit() function in the tcpl R package performed basic concentration-response curve fitting to 3 models. With tcplfit2, the core tcpl concentration-response functionality has been expanded to process diverse high-throughput screen (HTS) data generated at the US Environmental Protection Agency, including targeted ToxCast, high-throughput transcriptomics (HTTr) and high-throughput phenotypic profiling (HTPP). tcplfit2 can be used independently to support analysis for diverse chemical screening efforts.
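As a generic illustration of concentration-response curve fitting, the sketch below fits a Hill curve, a common choice for such data, by brute-force least squares over a small parameter grid. It is an assumption-laden toy, not the tcplfit2 algorithm: the parameterization `top / (1 + (ac50 / conc) ** slope)`, the grid search, the function names, and the synthetic data are all chosen for this example, and real fitting tools typically use likelihood-based optimizers instead.

```python
from itertools import product

def hill(conc, top, ac50, slope):
    """Hill curve: response rises from 0 to `top`, reaching top/2 at conc == ac50."""
    return top / (1.0 + (ac50 / conc) ** slope)

def fit_hill_grid(concs, responses, tops, ac50s, slopes):
    """Return the grid point (top, ac50, slope) with the smallest squared error."""
    best, best_sse = None, float("inf")
    for params in product(tops, ac50s, slopes):
        sse = sum((r - hill(c, *params)) ** 2 for c, r in zip(concs, responses))
        if sse < best_sse:
            best, best_sse = params, sse
    return best, best_sse

# Synthetic, noise-free responses generated from known parameters.
concs = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
observed = [hill(c, top=80.0, ac50=3.0, slope=1.5) for c in concs]

params, sse = fit_hill_grid(
    concs, observed,
    tops=[60.0, 80.0, 100.0],
    ac50s=[1.0, 3.0, 10.0],
    slopes=[1.0, 1.5, 2.0],
)
print(params)  # (80.0, 3.0, 1.5)
```

With noisy data the best grid point would only approximate the underlying parameters, and a finer grid or a continuous optimizer would be needed.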