The real-life time series data are hardly pure linear or nonlinear. Merging a linear time series model like the autoregressive moving average (ARMA) model with a nonlinear neural network model such as the Long Short-Term Memory (LSTM) model can be used as a hybrid model for more accurate modeling purposes. Both the autoregressive integrated moving average (ARIMA) and autoregressive fractionally integrated moving average (ARFIMA) models can be implemented. Details can be found in Box et al. (2015, ISBN: 978-1-118-67502-1) and Hochreiter and Schmidhuber (1997) <doi:10.1162/neco.1997.9.8.1735>.
Computes the center of gravity (COG) of character-like binary images using three different methods. This package provides functions for estimating stroke-based, contour-based, and potential energy-based COG. It is useful for analyzing glyph structure in areas such as visual cognition research and font development. The contour-based method was originally proposed by Kotani et al. (2004) <https://ipsj.ixsq.nii.ac.jp/records/36793> and Kotani (2011) <https://shonan-it.repo.nii.ac.jp/records/2000243>, while the potential energy-based method was introduced by Kotani et al. (2006) <doi:10.11371/iieej.35.296>.
Generates simulated data representing the LOX drop testing process (also known as impact testing). A simulated process allows for accelerated study of test behavior. Functions are provided to simulate trials, test series, and groups of test series. Functions for creating plots specific to this process are also included. Test attributes and criteria can be set arbitrarily. This work is not endorsed by or affiliated with NASA. See "ASTM G86-17, Standard Test Method for Determining Ignition Sensitivity of Materials to Mechanical Impact in Ambient Liquid Oxygen and Pressurized Liquid and Gaseous Oxygen Environments" <doi:10.1520/G0086-17>.
This package provides a small subset of plots throughout the U.S. are sampled and assessed "on-the-ground" as forested or non-forested by the U.S. Department of Agriculture, Forest Service, Forest Inventory and Analysis (FIA) Program, but the FIA also has access to remotely sensed data for all land in the country. The forested package contains data frames intended for use in predictive modeling applications where the more easily-accessible remotely sensed data can be used to predict whether a plot is forested or non-forested. Currently, the package provides data for Washington and Georgia.
An implementation of SPRE (standardised predicted random-effects) statistics in R to explore heterogeneity in genetic association meta- analyses, as described by Magosi et al. (2019) <doi:10.1093/bioinformatics/btz590>. SPRE statistics are precision weighted residuals that indicate the direction and extent with which individual study-effects in a meta-analysis deviate from the average genetic effect. Overly influential positive outliers have the potential to inflate average genetic effects in a meta-analysis whilst negative outliers might lower or change the direction of effect. See the getspres website for documentation and examples <https://magosil86.github.io/getspres/>.
Generalizes application of gray-level co-occurrence matrix (GLCM) metrics to objects outside of images. The current focus is to apply GLCM metrics to the study of biological networks and fitness landscapes that are used in studying evolutionary medicine and biology, particularly the evolution of cancer resistance. The package was developed as part of the author's publication in Physics in Medicine and Biology Barker-Clarke et al. (2023) <doi:10.1088/1361-6560/ace305>. A general reference to learn more about mathematical oncology can be found at Rockne et al. (2019) <doi:10.1088/1478-3975/ab1a09>.
Fits the joint model proposed by Henderson and colleagues (2000) <doi:10.1093/biostatistics/1.4.465>, but extended to the case of multiple continuous longitudinal measures. The time-to-event data is modelled using a Cox proportional hazards regression model with time-varying covariates. The multiple longitudinal outcomes are modelled using a multivariate version of the Laird and Ware linear mixed model. The association is captured by a multivariate latent Gaussian process. The model is estimated using a Monte Carlo Expectation Maximization algorithm. This project was funded by the Medical Research Council (Grant number MR/M013227/1).
The Cauchy distribution is a special case of the t distribution when the degrees of freedom are equal to 1. The functions are related to the multivariate Cauchy distribution and include simulation, computation of the density, maximum likelihood estimation, contour plot of the bivariate Cauchy distribution, and discriminant analysis. References include: Nadarajah S. and Kotz S. (2008). "Estimation methods for the multivariate t distribution". Acta Applicandae Mathematicae, 102(1): 99--118. <doi:10.1007/s10440-008-9212-8>, and Kanti V. Mardia, John T. Kent and John M. Bibby (1979). "Multivariate analysis", ISBN:978-0124712522. Academic Press, London.
This package provides functions for the design process of survey sampling, with specific tools for multi-wave and multi-phase designs. Perform optimum allocation using Neyman (1934) <doi:10.2307/2342192> or Wright (2012) <doi:10.1080/00031305.2012.733679> allocation, split strata based on quantiles or values of known variables, randomly select samples from strata, allocate sampling waves iteratively, and organize a complex survey design. Also includes a Shiny application for observing the effects of different strata splits. A paper on this package was published in the Journal of Statistical Software <doi:10.18637/jss.v114.i10>.
This package provides tools for exploratory process data analysis. Process data refers to the data describing participants problem-solving processes in computer-based assessments. It is often recorded in computer log files. This package provides functions to read, process, and write process data. It also implements two feature extraction methods to compress the information stored in process data into standard numerical vectors. This package also provides recurrent neural network based models that relate response processes with other binary or scale variables of interest. The functions that involve training and evaluating neural networks are wrappers of functions in keras'.
This package provides a set of concise and efficient tools for statistical production. Can also be used for data management. In statistical production, you deal with complex data and need to control your process at each step of your work. Concise functions are very helpful, because you do not hesitate to use them. The following functions are included in the package. dup checks duplicates. miss checks missing values. tac computes contingency table of all columns. toc compares two tables, spotting significant deviations. chi2_find compares columns within a data.frame, spotting related categories of (a more complex function).
This takes spatial single-cell-type RNA-seq data (specifically designed for Slide-seq v2) that calls copy number alterations (CNAs) using pseudo-spatial binning, clusters cellular units (e.g. beads) based on CNA profile, and visualizes spatial CNA patterns. Documentation about SlideCNA is included in the the pre-print by Zhang et al. (2022, <doi:10.1101/2022.11.25.517982>). The package enrichR (>= 3.0), conditionally used to annotate SlideCNA-determined clusters with gene ontology terms, can be installed at <https://github.com/wjawaid/enrichR> or with install_github("wjawaid/enrichR").
MiDAS is a R package for immunogenetics data transformation and statistical analysis. MiDAS accepts input data in the form of HLA alleles and KIR types, and can transform it into biologically meaningful variables, enabling HLA amino acid fine mapping, analyses of HLA evolutionary divergence, KIR gene presence, as well as validated HLA-KIR interactions. Further, it allows comprehensive statistical association analysis workflows with phenotypes of diverse measurement scales. MiDAS closes a gap between the inference of immunogenetic variation and its efficient utilization to make relevant discoveries related to T cell, Natural Killer cell, and disease biology.
Estimation of functional spaces based on traits of organisms. The package includes functions to impute missing trait values (with or without considering phylogenetic information), and to create, represent and analyse two dimensional functional spaces based on principal components analysis, other ordination methods, or raw traits. It also allows for mapping a third variable onto the functional space. See Carmona et al. (2021) <doi:10.1038/s41586-021-03871-y>, Puglielli et al. (2021) <doi:10.1111/nph.16952>, Carmona et al. (2021) <doi:10.1126/sciadv.abf2675>, Carmona et al. (2019) <doi:10.1002/ecy.2876> for more information.
Analysis of musical scales (& modes, grooves, etc.) in the vein of Sherrill 2025 <doi:10.1215/00222909-11595194>. The initials MCT in the package title refer to the article's title: "Modal Color Theory." Offers support for conventional musical pitch class set theory as developed by Forte (1973, ISBN: 9780300016109) and David Lewin (1987, ISBN: 9780300034936), as well as for the continuous geometries of Callender, Quinn, & Tymoczko (2008) <doi:10.1126/science.1153021>. Identifies structural properties of scales and calculates derived values (sign vector, color number, brightness ratio, etc.). Creates plots such as "brightness graphs" which visualize these properties.
Analysis of annual average ocean water level time series from long (minimum length 80 years) individual records, providing improved estimates of trend (mean sea level) and associated real-time velocities and accelerations. Improved trend estimates are based on Singular Spectrum Analysis methods. Various gap-filling options are included to accommodate incomplete time series records. The package also contains a forecasting module to consider the implication of user defined quantum of sea level rise between the end of the available historical record and the year 2100. A wide range of screen and pdf plotting options are available in the package.
This package provides a series of numerical methods for extracting parameters of distributions for risks based on knowing the expected value and c-statistics (e.g., from a published report on the performance of a risk prediction model). This package implements the methodology described in Sadatsafavi et al (2024) <doi:10.48550/arXiv.2409.09178>. The core of the package is mcmap(), which takes a pair of (mean, c-statistic) and the distribution type requested. This function provides a generic interface to more customized functions (mcmap_beta(), mcmap_logitnorm(), mcmap_probitnorm()) for specific distributions.
This package provides a minimalist implementation of model stacking by Wolpert (1992) <doi:10.1016/S0893-6080(05)80023-1> for boosted tree models. A classic, two-layer stacking model is implemented, where the first layer generates features using gradient boosting trees, and the second layer employs a logistic regression model that uses these features as inputs. Utilities for training the base models and parameters tuning are provided, allowing users to experiment with different ensemble configurations easily. It aims to provide a simple and efficient way to combine multiple gradient boosting models to improve predictive model performance and robustness.
Simulate genotypes in SNP (single nucleotide polymorphisms) Matrix as random numbers from an uniform distribution, for diploid organisms (coded by 0, 1, 2), Sikorska et al., (2013) <doi:10.1186/1471-2105-14-166>, or half-sib/full-sib SNP matrix from real or simulated parents SNP data, assuming mendelian segregation. Simulate phenotypic traits for real or simulated SNP data, controlled by a specific number of quantitative trait loci and their effects, sampled from a Normal or an Uniform distributions, assuming a pure additive model. This is useful for testing association and genomic prediction models or for educational purposes.
The stochastic (also called on-line) version of the Self-Organising Map (SOM) algorithm is provided. Different versions of the algorithm are implemented, for numeric and relational data and for contingency tables as described, respectively, in Kohonen (2001) <isbn:3-540-67921-9>, Olteanu & Villa-Vialaneix (2005) <doi:10.1016/j.neucom.2013.11.047> and Cottrell et al (2004) <doi:10.1016/j.neunet.2004.07.010>. The package also contains many plotting features (to help the user interpret the results), can handle (and impute) missing values and is delivered with a graphical user interface based on shiny'.
This package provides functions to perform most of the common analysis in genome association studies are implemented. These analyses include descriptive statistics and exploratory analysis of missing values, calculation of Hardy-Weinberg equilibrium, analysis of association based on generalized linear models (either for quantitative or binary traits), and analysis of multiple SNPs (haplotype and epistasis analysis). Permutation test and related tests (sum statistic and truncated product) are also implemented. Max-statistic and genetic risk-allele score exact distributions are also possible to be estimated. The methods are described in Gonzalez JR et al., 2007 <doi: 10.1093/bioinformatics/btm025>.
This package implements methods for inference on potential waning of vaccine efficacy and for estimation of vaccine efficacy at a user-specified time after vaccination based on data from a randomized, double-blind, placebo-controlled vaccine trial in which participants may be unblinded and placebo subjects may be crossed over to the study vaccine. The methods also allow adjustment for possible confounding via inverse probability weighting through specification of models for the trial entry process, unblinding mechanisms, and the probability an unblinded placebo participant accepts study vaccine: Tsiatis, A. A. and Davidian, M. (2022) <doi:10.1111/biom.13509>.
This package provides a spline based scRNA-seq method for identifying differentially variable (DV) genes across two experimental conditions. Spline-DV constructs a 3D spline from 3 key gene statistics: mean expression, coefficient of variance, and dropout rate. This is done for both conditions. The 3D spline provides the “expected” behavior of genes in each condition. The distance of the observed mean, CV and dropout rate of each gene from the expected 3D spline is used to measure variability. As the final step, the spline-DV method compares the variabilities of each condition to identify differentially variable (DV) genes.
Integrates methods for epidemiological analysis, modeling, and visualization, including functions for summary statistics, SIR (Susceptible-Infectious-Recovered) modeling, DALY (Disability-Adjusted Life Years) estimation, age standardization, diagnostic test evaluation, NLP (Natural Language Processing) keyword extraction, clinical trial power analysis, survival analysis, SNP (Single Nucleotide Polymorphism) association, and machine learning methods such as logistic regression, k-means clustering, Random Forest, and Support Vector Machine (SVM). Includes datasets for prevalence estimation, SIR modeling, genomic analysis, clinical trials, DALY, diagnostic tests, and survival analysis. Methods are based on Gelman et al. (2013) <doi:10.1201/b16018> and Wickham et al. (2019, ISBN:9781492052040>.