Fast and flexible Kalman filtering and smoothing implementation utilizing sequential processing, designed for efficient parameter estimation by maximum likelihood. Sequential processing is a univariate treatment of a multivariate series of observations. It can be computationally more efficient than traditional Kalman filtering when the disturbances of the measurement equation are assumed to be mutually independent, i.e., when their variance matrix is diagonal. Sequential processing is described in the textbook of Durbin and Koopman (2001, ISBN:978-0-19-964117-8). FKF.SP was built upon the existing FKF package and is, in general, a faster Kalman filter/smoother.
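To illustrate the idea, here is a minimal pure-Python sketch of the univariate measurement update described by Durbin and Koopman. It assumes a diagonal measurement variance H, with its diagonal passed as `h`. The function `sequential_update` and its arguments are illustrative and are not part of the FKF.SP interface. Each observation element is processed as a scalar update, so no matrix inversion is needed. Under the diagonal-H assumption, the per-element log-likelihood terms add up to the usual multivariate contribution. Skipping a missing element (marked `None`) is a choice made for this sketch.

```python
import math

def sequential_update(a, P, y, Z, h):
    """One time step of the univariate (sequential) measurement update.

    a: predicted state mean (length m)
    P: predicted state covariance (m x m nested lists, symmetric)
    y: observation vector (length p); None marks a missing element
    Z: measurement matrix (p x m nested lists)
    h: diagonal of the measurement disturbance variance H (length p)
    Returns the filtered mean, filtered covariance and log-likelihood term.
    """
    m = len(a)
    a = list(a)
    P = [row[:] for row in P]
    loglik = 0.0
    for i, yi in enumerate(y):
        if yi is None:
            continue  # a missing element contributes nothing
        z = Z[i]
        Pz = [sum(P[r][c] * z[c] for c in range(m)) for r in range(m)]  # P z'
        F = sum(z[r] * Pz[r] for r in range(m)) + h[i]  # scalar innovation variance
        v = yi - sum(z[r] * a[r] for r in range(m))     # scalar innovation
        a = [a[r] + Pz[r] * v / F for r in range(m)]    # gain K = P z' / F
        P = [[P[r][c] - Pz[r] * Pz[c] / F for c in range(m)] for r in range(m)]
        loglik -= 0.5 * (math.log(2 * math.pi) + math.log(F) + v * v / F)
    return a, P, loglik
```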
This package provides statistical methods to check whether a parametric family of conditional density functions fits a given dataset of covariates and response variables. Different test statistics can be used to determine the goodness-of-fit of the assumed model, see Andrews (1997) <doi:10.2307/2171880>, Bierens & Wang (2012) <doi:10.1017/S0266466611000168>, Dikta & Scheer (2021) <doi:10.1007/978-3-030-73480-0> and Kremling & Dikta (2024) <doi:10.48550/arXiv.2409.20262>. As proposed in these papers, the corresponding p-values are approximated using a parametric bootstrap method.
This package implements the Generalized Method of Wavelet Moments with Exogenous Inputs estimator (GMWMX) presented in Voirol, L., Xu, H., Zhang, Y., Insolia, L., Molinari, R. and Guerrier, S. (2024) <doi:10.48550/arXiv.2409.05160>. The GMWMX estimator allows one to estimate the functional and stochastic parameters of linear models with correlated residuals in the presence of missing data. The gmwmx2 package provides functions to load and plot Global Navigation Satellite System (GNSS) data from the Nevada Geodetic Laboratory, and functions to estimate linear models with correlated residuals in the presence of missing data.
Optimized for handling complex datasets in environmental and ecological research, this package offers functionality not fully covered by general-purpose packages. It provides two key functions: summarize_data(), which summarizes datasets, and plot_means(), which creates plots with error bars. The plot_means() function draws error bars by default, allowing quick visualization of uncertainty, which is crucial in ecological studies. It also streamlines workflows for grouped datasets (e.g., by species or treatment), making it particularly user-friendly and reducing the complexity and time required for data summarization and visualization.
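The package's internals are not described here. As a rough sketch of the kind of grouped summary that typically feeds an error-bar plot, the hypothetical helper below computes the count, mean and standard error for each group. This sketch does not establish whether summarize_data() and plot_means() use the standard error, the standard deviation or a confidence interval for their error bars.

```python
import math
from collections import defaultdict

def mean_se_by_group(groups, values):
    """Return {group: (n, mean, standard error)} for paired group labels and values."""
    buckets = defaultdict(list)
    for g, x in zip(groups, values):
        buckets[g].append(x)
    summary = {}
    for g, xs in buckets.items():
        n = len(xs)
        mean = sum(xs) / n
        if n > 1:
            sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))  # sample SD
            se = sd / math.sqrt(n)
        else:
            se = float("nan")  # undefined for a single observation
        summary[g] = (n, mean, se)
    return summary
```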
R interface to PRIMME <https://www.cs.wm.edu/~andreas/software/>, a C library for computing a few eigenvalues and their corresponding eigenvectors of a real symmetric or complex Hermitian matrix, or of a generalized Hermitian eigenproblem. It can also compute singular values and vectors of a square or rectangular matrix. PRIMME finds the largest, smallest, or interior singular values or eigenvalues and can use preconditioning to accelerate convergence. General descriptions of the methods are provided in the papers Stathopoulos (2010, <doi:10.1145/1731022.1731031>) and Wu (2017, <doi:10.1137/16M1082214>). See citation("PRIMME") for details.
This package provides functions for sample size estimation and simulation in clinical trials. It includes methods for selecting the best group using the indifference-zone approach, as well as designs for non-inferiority, equivalence, and negative binomial models. The sample size calculation for non-inferiority of vaccines is based on Fleming, Powers, and Huang (2021) <doi:10.1177/1740774520988244>. The indifference-zone approach is based on Sobel and Huyett (1957) <doi:10.1002/j.1538-7305.1957.tb02411.x> and Bechhofer, Santner, and Goldsman (1995, ISBN:978-0-471-57427-9).
The R implementation of TIGER. TIGER integrates the random forest algorithm into an innovative ensemble learning architecture. Thanks to this architecture, TIGER is resilient to outliers, requires no model tuning, and is less likely to be affected by specific hyperparameter choices. TIGER supports targeted and untargeted metabolomics data and can remove both intra- and inter-batch technical variation. TIGER can also be used for cross-kit adjustment, so that data obtained from different analytical assays can be effectively combined and compared. Reference: Han S. et al. (2022) <doi:10.1093/bib/bbab535>.
This package provides classes and functions for quality control, filtering, normalization and differential expression analysis of pre-processed `RNA-seq` data. Data can be imported from `SummarizedExperiment` as well as `matrix` objects and can be annotated from `BioMart`. Genes can be filtered to remove those with too low expression or lacking required annotations. Samples can be filtered by their correlation with other samples or by their total number of reads. The standard normalization methods, including cpm, rpkm and tpm, are supported, and `DESeq2` as well as voom differential expression analyses are available.
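The three normalizations named above follow standard definitions, sketched here in Python for a single sample. The sketch assumes that the library size is the sum of the supplied counts and that gene lengths are given in base pairs. Packages may instead use effective library sizes or effective lengths, so treat these helper names and details as illustrative.

```python
def cpm(counts):
    """Counts per million: scale each count by the library size (sum of counts)."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]

def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene length per million reads."""
    per_million = sum(counts) / 1e6
    return [c / (l / 1e3) / per_million for c, l in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to sum to 1e6."""
    rates = [c / (l / 1e3) for c, l in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]
```

Unlike CPM and RPKM, TPM values always sum to one million within a sample. This makes relative abundances easier to compare across samples.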
This software is meant to be used for classification of images of cell-based assays for neuronal surface autoantibody detection or similar techniques. It takes imaging files as input and creates a composite score from them that can, for example, be used to classify samples as negative or positive for a certain antibody specificity. The name reflects how I thought about each individual picture during its creation: as an archipelago in which different filters control the water level as well as the ground characteristics, thereby revealing islands of interest.
The package alpine helps model bias parameters and then uses those parameters to estimate RNA-seq transcript abundance. Alpine is a package for estimating and visualizing many forms of sample-specific biases that can arise in RNA-seq, including fragment length distribution, positional bias on the transcript, read start bias (random hexamer priming), and fragment GC-content (amplification). It also offers bias-corrected estimates of transcript abundance in FPKM (Fragments Per Kilobase of transcript per Million mapped reads). It is currently designed for unstranded paired-end RNA-seq data.
This package provides a collection of sampling formulas for the unified neutral model of biogeography and biodiversity. Alongside the sampling formulas, it includes methods to perform maximum likelihood optimization of the sampling formulas, methods to generate data given the neutral model, and methods to estimate the expected species abundance distribution. Sampling formulas included in the GUILDS package are the Etienne Sampling Formula (Etienne 2005), the guild sampling formula, where guilds are assumed to differ in dispersal ability (Janzen et al. 2015), and the guilds sampling formula conditioned on guild size (Janzen et al. 2015).
This package implements a model-based clustering method for categorical life-course sequences relying on mixtures of exponential-distance models introduced by Murphy et al. (2021) <doi:10.1111/rssa.12712>. A range of flexible precision parameter settings corresponding to weighted generalisations of the Hamming distance metric are considered, along with the potential inclusion of a noise component. Gating covariates can be supplied in order to relate sequences to baseline characteristics and sampling weights are also accommodated. The models are fitted using the EM algorithm and tools for visualising the results are also provided.
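As a rough illustration of the exponential-distance idea, the sketch below gives each sequence a probability proportional to exp(-d), where d is a Hamming distance weighted by position-specific precision parameters `lam`. It assumes that every position takes one of `n_states` categories. In that case the normalizing constant factorizes over positions as the product of 1 + (n_states - 1) exp(-lam[t]). The noise component, gating covariates, sampling weights and EM fitting are omitted, and the function names are hypothetical.

```python
import math

def weighted_hamming(seq, centre, lam):
    """Sum of the precision weights lam[t] at positions where seq and centre differ."""
    return sum(l for s, c, l in zip(seq, centre, lam) if s != c)

def exp_distance_prob(seq, centre, lam, n_states):
    """P(seq) = exp(-d(seq, centre)) / psi, with n_states categories per position.

    The normalizing constant factorizes over positions:
    psi = prod_t (1 + (n_states - 1) * exp(-lam[t])).
    """
    log_psi = sum(math.log1p((n_states - 1) * math.exp(-l)) for l in lam)
    return math.exp(-weighted_hamming(seq, centre, lam) - log_psi)
```

Larger values of `lam[t]` concentrate probability on sequences that agree with the central sequence at position t.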
Estimation methods for phase-type distributions (PH) and Markovian arrival processes (MAP) from empirical data (point and grouped data) and from density functions. The tool is based on the following studies: Okamura et al. (2009) <doi:10.1109/TNET.2008.2008750>, Okamura and Dohi (2009) <doi:10.1109/QEST.2009.28>, Okamura et al. (2011) <doi:10.1016/j.peva.2011.04.001>, Okamura et al. (2013) <doi:10.1002/asmb.1919>, Horvath and Okamura (2013) <doi:10.1007/978-3-642-40725-3_10>, Okamura and Dohi (2016) <doi:10.15807/jorsj.59.72>.
Automatically fetch, transform and arrange subsets of multidimensional data sets (collections of files) stored in local and/or remote file systems or servers, using multicore capabilities where possible. This tool provides an interface to perceive a collection of data sets as a single large multidimensional data array, and enables the user to request automatic retrieval, processing and arrangement of subsets of the large array. Wrapper functions that add support for custom file formats can be plugged in or out, making the tool suitable for any research field where large multidimensional data sets are involved.
This package provides a wrapper around Michel Scheffers's libassp (<https://libassp.sourceforge.net/>). The libassp (Advanced Speech Signal Processor) library aims to provide functionality for handling speech signal files in most common audio formats and for performing analyses common in phonetic science/speech science. This includes the calculation of formants, fundamental frequency, root mean square, autocorrelation, a variety of spectral analyses, zero crossing rate, filtering etc. This wrapper provides R with a large subset of libassp's signal processing functions and provides them to the user in a (hopefully) user-friendly manner.
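Two of the simpler measures in this list, root mean square and zero crossing rate, can be sketched for a single analysis frame as shown below. This is not the libassp implementation. Windowing and frame shift are left out, and libassp's units and conventions may differ. The function names are illustrative.

```python
import math

def rms(frame):
    """Root mean square amplitude of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)
```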
This package takes a list of p-values resulting from the simultaneous testing of many hypotheses and estimates their q-values and local false discovery rate (FDR) values. The q-value of a test measures the proportion of false positives incurred when that particular test is called significant. The local FDR measures the posterior probability that the null hypothesis is true given the test's p-value. Various plots are automatically generated, allowing one to choose sensible significance cut-offs. The software can be applied to problems in genomics, brain imaging, astrophysics, and data mining.
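Below is a minimal sketch of q-value estimation that assumes Storey's single-lambda estimator for the proportion of true null hypotheses (pi0). The package's default pi0 estimator and its local FDR computation are more elaborate and are not reproduced here. The function name is illustrative. Each sorted p-value is scaled by pi0 * m / rank, and a running minimum is then taken from the largest p-value downward so that the q-values stay monotone.

```python
def qvalues(pvals, lam=0.5):
    """Return (pi0, q-values) using Storey's single-lambda estimate of pi0."""
    m = len(pvals)
    # Rescaled share of p-values above lam estimates the proportion of true nulls.
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1 - lam)))
    if pi0 <= 0:
        raise ValueError("pi0 estimate is zero; try a smaller lam")
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return pi0, q
```

Setting pi0 to 1 instead of estimating it reduces this procedure to Benjamini-Hochberg adjusted p-values.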
Facilitates the use of machine learning algorithms in classification and regression (including time series forecasting) tasks by presenting a short and coherent set of functions. Versions: 1.5.0 improved mparheuristic function (new hyperparameter heuristics); 1.4.9 / 1.4.8 improved help, several warning and error code fixes (more stable version, all examples run correctly); 1.4.7 - improved Importance function and examples, minor error fixes; 1.4.6 / 1.4.5 / 1.4.4 new automated machine learning (AutoML) and ensembles, via improved fit(), mining() and mparheuristic() functions, and new categorical preprocessing, via improved delevels() function; 1.4.3 new metrics (e.g., macro precision, explained variance), new "lssvm" model and improved mparheuristic() function; 1.4.2 new "NMAE" metric, "xgboost" and "cv.glmnet" models (16 classification and 18 regression models); 1.4.1 new tutorial and more robust version; 1.4 - new classification and regression models, with a total of 14 classification and 15 regression methods, including: Decision Trees, Neural Networks, Support Vector Machines, Random Forests, Bagging and Boosting; 1.3 and 1.3.1 - new classification and regression metrics; 1.2 - new input importance methods via improved Importance() function; 1.0 - first version.
This package implements bridge models for nowcasting and forecasting macroeconomic variables by linking high-frequency indicator variables (e.g., monthly data) to low-frequency target variables (e.g., quarterly GDP). It simplifies forecasting and aggregating indicator variables to match the target frequency, enabling timely predictions ahead of official data releases. For more on bridge models, see Baffigi, A., Golinelli, R., & Parigi, G. (2004) <doi:10.1016/S0169-2070(03)00067-0>, Burri (2023) <https://www5.unine.ch/RePEc/ftp/irn/pdfs/WP23-02.pdf> or Schumacher (2016) <doi:10.1016/j.ijforecast.2015.07.004>.
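The bridge-equation idea can be sketched in three steps. First, average the monthly indicator over each quarter. Second, regress the quarterly target on that average. Third, nowcast the current quarter after filling in its missing months. The sketch assumes a single indicator, ordinary least squares, and a naive rule that repeats the last observed month. The package's own indicator forecasting models are not shown, and these helpers are hypothetical.

```python
def monthly_to_quarterly(monthly):
    """Average each complete block of three months; a trailing partial quarter is dropped."""
    n_full = len(monthly) - len(monthly) % 3
    return [sum(monthly[i:i + 3]) / 3 for i in range(0, n_full, 3)]

def fit_bridge(x_q, y_q):
    """Ordinary least squares of the quarterly target on a constant and one indicator."""
    n = len(y_q)
    mx, my = sum(x_q) / n, sum(y_q) / n
    sxx = sum((x - mx) ** 2 for x in x_q)
    sxy = sum((x - mx) * (y - my) for x, y in zip(x_q, y_q))
    slope = sxy / sxx
    return my - slope * mx, slope

def nowcast(intercept, slope, months_so_far):
    """Nowcast the current quarter; missing months repeat the last observed value (naive rule)."""
    filled = list(months_so_far) + [months_so_far[-1]] * (3 - len(months_so_far))
    return intercept + slope * sum(filled) / 3
```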
A novel pathway enrichment analysis package based on Bayesian networks that investigates the topological features of pathways, as described in Maleknia et al. (2020) <doi:10.1101/2020.01.13.905448>. First, 187 Kyoto Encyclopedia of Genes and Genomes (KEGG) human non-metabolic pathways, whose cycles were eliminated using a biological approach, enter the analysis as Bayesian network structures. The constructed Bayesian networks are optimized with the Least Absolute Shrinkage and Selection Operator (lasso), and their parameters are learned from gene expression data. Finally, the impacted pathways are enriched using Fisher's exact test on the significant parameters.
Gibbs sampling for Bayesian spatial blind source separation (BSP-BSS). BSP-BSS is designed for spatially dependent signals in high-dimensional and large-scale data, such as neuroimaging. The method models the expectation of the observed images as a linear mixture of multiple sparse and piecewise smooth latent source signals, and constructs a Bayesian nonparametric prior by thresholding Gaussian processes. Details can be found in our paper: Wu, B., Guo, Y., & Kang, J. (2024). Bayesian spatial blind source separation via the thresholded Gaussian process. Journal of the American Statistical Association, 119(545), 422-433.
Although many software tools can perform meta-analyses on genetic case-control data, none of these apply to combined case-control and family-based (TDT) studies. This package conducts fixed-effects (with inverse variance weighting) and random-effects [DerSimonian and Laird (1986) <DOI:10.1016/0197-2456(86)90046-2>] meta-analyses on combined genetic data. Specifically, this package implements a fixed-effects model [Kazeem and Farrall (2005) <DOI:10.1046/j.1529-8817.2005.00156.x>] and a random-effects model [Nicodemus (2008) <DOI:10.1186/1471-2105-9-130>] for combined studies.
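The two pooling schemes named above are sketched below. The sketch assumes each study has already been reduced to an effect estimate (for example, a log odds ratio) and its variance. It does not reproduce how the package derives these quantities from combined case-control and TDT data, and the function names are illustrative. The fixed-effects estimate weights each study by 1/v. The DerSimonian-Laird estimate adds a between-study variance tau² to each v, and tau² is estimated from Cochran's Q.

```python
import math

def fixed_effect(estimates, variances):
    """Inverse-variance weighted pooled estimate and its standard error."""
    w = [1 / v for v in variances]
    est = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    return est, math.sqrt(1 / sum(w))

def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate, its standard error, and the tau^2 estimate."""
    k = len(estimates)
    w = [1 / v for v in variances]
    fe, _ = fixed_effect(estimates, variances)
    q = sum(wi * (yi - fe) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncated at zero
    w_star = [1 / (v + tau2) for v in variances]
    est = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return est, math.sqrt(1 / sum(w_star)), tau2
```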
This package provides a multiple testing procedure that aims to find rare-variant association regions. When variants are rare, the single-variant association test approach suffers from low power. To improve power, the procedure dynamically and hierarchically aggregates smaller genome regions into larger ones and performs multiple testing for disease associations with a controlled node-level false discovery rate. This method is a member of the family of ancillary information assisted recursive testing procedures introduced in Pura, Li, Chan and Xie (2021) <arXiv:1906.07757v2> and Li, Sung and Xie (2021) <arXiv:2103.11085v2>.
Calculates 18 different goodness-of-fit criteria. These are: standard deviation ratio (SDR), coefficient of variation (CV), relative root mean square error (RRMSE), Pearson's correlation coefficient (PC), root mean square error (RMSE), performance index (PI), mean error (ME), global relative approximation error (RAE), mean relative approximation error (MRAE), mean absolute percentage error (MAPE), mean absolute deviation (MAD), coefficient of determination (R-squared), adjusted coefficient of determination (adjusted R-squared), Akaike's information criterion (AIC), corrected Akaike's information criterion (CAIC), mean square error (MSE), Bayesian information criterion (BIC) and normalized mean square error (NMSE).
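Several of the listed criteria have widely used textbook definitions, and a few of them are sketched below. The sketch assumes that residuals are observed minus predicted, that MAPE is expressed in percent, and that adjusted R-squared uses `n_params` predictors. The package's exact definitions may differ, especially for criteria such as PI, RAE and MAD. The helper name is hypothetical.

```python
import math

def fit_criteria(obs, pred, n_params=1):
    """A subset of common goodness-of-fit criteria for observed vs. predicted values."""
    n = len(obs)
    resid = [o - p for o, p in zip(obs, pred)]
    mean_obs = sum(obs) / n
    sse = sum(r * r for r in resid)
    sst = sum((o - mean_obs) ** 2 for o in obs)
    mse = sse / n
    r2 = 1 - sse / sst
    return {
        "ME": sum(resid) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAPE": 100 * sum(abs(r / o) for r, o in zip(resid, obs)) / n,
        "R2": r2,
        "adjR2": 1 - (1 - r2) * (n - 1) / (n - n_params - 1),
    }
```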
Functions, data sets and shiny apps for "Epidemics: Models and Data in R" by Ottar N. Bjornstad (ISBN 978-3-319-97487-3) <https://www.springer.com/gp/book/9783319974866>. The package contains functions to study the S(E)IR model, spatial and age-structured SIR models; time-series SIR and chain-binomial stochastic models; catalytic disease models; coupled map lattice models of spatial transmission and network models for social spread of infection. The package is also an advanced quantitative companion to the Coursera Epidemics Massive Open Online Course <https://www.coursera.org/learn/epidemics>.
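For orientation, the basic closed SIR model, the simplest of the compartmental models listed above, can be integrated with a fixed-step Euler scheme. The sketch below works in population proportions. The function name, step size and parameter values are assumptions chosen for illustration. The package provides its own model functions, and a proper ODE solver would normally be preferred. In this model, R0 = beta / gamma.

```python
def sir_euler(beta, gamma, s0, i0, days, dt=0.1):
    """Closed SIR model in population proportions, integrated with forward Euler.

    dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I.
    Returns a list of (time, S, I, R) tuples.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    trajectory = [(0.0, s, i, r)]
    for k in range(1, int(round(days / dt)) + 1):
        new_inf = beta * s * i * dt  # transitions S -> I in this step
        new_rec = gamma * i * dt     # transitions I -> R in this step
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        trajectory.append((k * dt, s, i, r))
    return trajectory
```

For example, `sir_euler(0.5, 0.25, 0.999, 0.001, 100)` simulates an epidemic with R0 = 2 over 100 days. The susceptible fraction declines monotonically and ends well below one half.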