This package provides a multivariate Gaussian mixture model framework to integrate multiple types of genomic data and allow modeling of inter-data-type correlations for association analysis. IMIX can be implemented to test whether a disease is associated with genes in multiple genomic data types, such as DNA methylation, copy number variation, gene expression, etc. It can also study the integration of multiple pathways. IMIX uses the summary statistics of association test outputs and conduct integration analysis for two or three types of genomics data. IMIX features statistically-principled model selection, global FDR control and computational efficiency. Details are described in Ziqiao Wang and Peng Wei (2020) <doi:10.1093/bioinformatics/btaa1001>.
The Minorize-Maximization(MM) algorithm based on Assembly-Decomposition(AD) technology can be used for model estimation of parametric models, semi-parametric models and non-parametric models. We selected parametric models including left truncated normal distribution, type I multivariate zero-inflated generalized poisson distribution and multivariate compound zero-inflated generalized poisson distribution; semiparametric models include Cox model and gamma frailty model; nonparametric model is estimated for type II interval-censored data. These general methods are proposed based on the following papers, Tian, Huang and Xu (2019) <doi:10.5705/SS.202016.0488>, Huang, Xu and Tian (2019) <doi:10.5705/ss.202016.0516>, Zhang and Huang (2022) <doi:10.1117/12.2642737>.
Bayesian network learning using the PCHC, FEDHC, MMHC and variants of these algorithms. PCHC stands for PC Hill-Climbing, a new hybrid algorithm that uses PC to construct the skeleton of the BN and then applies the Hill-Climbing greedy search. More algorithms and variants have been added, such as MMHC, FEDHC, and the Tabu search variants, PCTABU, MMTABU and FEDTABU. The relevant papers are: a) Tsagris M. (2021). "A new scalable Bayesian network learning algorithm with applications to economics". Computational Economics, 57(1): 341-367. <doi:10.1007/s10614-020-10065-7>. b) Tsagris M. (2022). "The FEDHC Bayesian Network Learning Algorithm". Mathematics 2022, 10(15): 2604. <doi:10.3390/math10152604>.
The two-parameter Xgamma and Poisson Xgamma distributions are analyzed, covering standard distribution and regression functions, maximum likelihood estimation, quantile functions, probability density and mass functions, cumulative distribution functions, and random number generation. References include: "Sen, S., Chandra, N. and Maiti, S. S. (2018). On properties and applications of a two-parameter XGamma distribution. Journal of Statistical Theory and Applications, 17(4): 674--685. <doi:10.2991/jsta.2018.17.4.9>." "Wani, M. A., Ahmad, P. B., Para, B. A. and Elah, N. (2023). A new regression model for count data with applications to health care data. International Journal of Data Science and Analytics. <doi:10.1007/s41060-023-00453-1>.".
The CPSM package provides a comprehensive computational pipeline for predicting the survival probability of cancer patients. It offers a series of steps including data processing, splitting data into training and test subsets, and normalization of data. The package enables the selection of significant features based on univariate survival analysis and generates a LASSO prognostic index score. It supports the development of predictive models for survival probability using various features and provides visualization tools to draw survival curves based on predicted survival probabilities. Additionally, SPM includes functionalities for generating bar plots that depict the predicted mean and median survival times of patients, making it a versatile tool for survival analysis in cancer research.
tLOH
, or transcriptomicsLOH
, assesses evidence for loss of heterozygosity (LOH) in pre-processed spatial transcriptomics data. This tool requires spatial transcriptomics cluster and allele count information at likely heterozygous single-nucleotide polymorphism (SNP) positions in VCF format. Bayes factors are calculated at each SNP to determine likelihood of potential loss of heterozygosity event. Two plotting functions are included to visualize allele fraction and aggregated Bayes factor per chromosome. Data generated with the 10X Genomics Visium Spatial Gene Expression platform must be pre-processed to obtain an individual sample VCF with columns for each cluster. Required fields are allele depth (AD) with counts for reference/alternative alleles and read depth (DP).
Learning and inference over dynamic Bayesian networks of arbitrary Markovian order. Extends some of the functionality offered by the bnlearn package to learn the networks from data and perform exact inference. It offers three structure learning algorithms for dynamic Bayesian networks: Trabelsi G. (2013) <doi:10.1007/978-3-642-41398-8_34>, Santos F.P. and Maciel C.D. (2014) <doi:10.1109/BRC.2014.6880957>, Quesada D., Bielza C. and Larrañaga P. (2021) <doi:10.1007/978-3-030-86271-8_14>. It also offers the possibility to perform forecasts of arbitrary length. A tool for visualizing the structure of the net is also provided via the visNetwork
package.
Import data of tests and questionnaires from FormScanner
. FormScanner
is an open source software that converts scanned images to data using optical mark recognition (OMR) and it can be downloaded from <http://sourceforge.net/projects/formscanner/>. The spreadsheet file created by FormScanner
is imported in a convenient format to perform the analyses provided by the package. These analyses include the conversion of multiple responses to binary (correct/incorrect) data, the computation of the number of corrected responses for each subject or item, scoring using weights,the computation and the graphical representation of the frequencies of the responses to each item and the report of the responses of a few subjects.
Kernel Fisher Discriminant Analysis (KFDA) is performed using Kernel Principal Component Analysis (KPCA) and Fisher Discriminant Analysis (FDA). There are some similar packages. First, lfda is a package that performs Local Fisher Discriminant Analysis (LFDA) and performs other functions. In particular, lfda seems to be impossible to test because it needs the label information of the data in the function argument. Also, the ks package has a limited dimension, which makes it difficult to analyze properly. This package is a simple and practical package for KFDA based on the paper of Yang, J., Jin, Z., Yang, J. Y., Zhang, D., and Frangi, A. F. (2004) <DOI:10.1016/j.patcog.2003.10.015>.
Solves kernel ridge regression, within the the mixed model framework, for the linear, polynomial, Gaussian, Laplacian and ANOVA kernels. The model components (i.e. fixed and random effects) and variance parameters are estimated using the expectation-maximization (EM) algorithm. All the estimated components and parameters, e.g. BLUP of dual variables and BLUP of random predictor effects for the linear kernel (also known as RR-BLUP), are available. The kernel ridge mixed model (KRMM) is described in Jacquin L, Cao T-V and Ahmadi N (2016) A Unified and Comprehensible View of Parametric and Kernel Methods for Genomic Prediction with Application to Rice. Front. Genet. 7:145. <doi:10.3389/fgene.2016.00145>.
Maximum likelihood estimation for stochastic frontier analysis (SFA) of production (profit) and cost functions. The package includes the basic stochastic frontier for cross-sectional or pooled data with several distributions for the one-sided error term (i.e., Rayleigh, gamma, Weibull, lognormal, uniform, generalized exponential and truncated skewed Laplace), the latent class stochastic frontier model (LCM) as described in Dakpo et al. (2021) <doi:10.1111/1477-9552.12422>, for cross-sectional and pooled data, and the sample selection model as described in Greene (2010) <doi:10.1007/s11123-009-0159-1>, and applied in Dakpo et al. (2021) <doi:10.1111/agec.12683>. Several possibilities in terms of optimization algorithms are proposed.
The Delta-Delta-Ct (ddCt
) Algorithm is an approximation method to determine relative gene expression with quantitative real-time PCR (qRT-PCR
) experiments. Compared to other approaches, it requires no standard curve for each primer-target pair, therefore reducing the working load and yet returning accurate enough results as long as the assumptions of the amplification efficiency hold. The ddCt
package implements a pipeline to collect, analyse and visualize qRT-PCR
results, for example those from TaqMan
SDM software, mainly using the ddCt
method. The pipeline can be either invoked by a script in command-line or through the API consisting of S4-Classes, methods and functions.
This is a package for the analysis of discrete response data using unidimensional and multidimensional item analysis models under the Item Response Theory paradigm (Chalmers (2012) <doi:10.18637/jss.v048.i06>). Exploratory and confirmatory item factor analysis models are estimated with quadrature (EM) or stochastic (MHRM) methods. Confirmatory bi-factor and two-tier models are available for modeling item testlets using dimension reduction EM algorithms, while multiple group analyses and mixed effects designs are included for detecting differential item, bundle, and test functioning, and for modeling item and person covariates. Finally, latent class models such as the DINA, DINO, multidimensional latent class, mixture IRT models, and zero-inflated response models are supported.
This package performs approximate GP regression for large computer experiments and spatial datasets. The approximation is based on finding small local designs for prediction (independently) at particular inputs. OpenMP
and SNOW parallelization are supported for prediction over a vast out-of-sample testing set; GPU acceleration is also supported for an important subroutine. OpenMP
and GPU features may require special compilation. An interface to lower-level (full) GP inference and prediction is provided. Wrapper routines for blackbox optimization under mixed equality and inequality constraints via an augmented Lagrangian scheme, and for large scale computer model calibration, are also provided. For details and tutorial, see Gramacy (2016 <doi:10.18637/jss.v072.i01>.
Several functions can be used to analyze multiblock multivariable data. If the input is a single matrix, then principal components analysis (PCA) is implemented. If the input is a list of matrices, then multiblock PCA is implemented. If the input is two matrices, for exploratory and objective variables, then partial least squares (PLS) analysis is implemented. If the input is two lists of matrices, for exploratory and objective variables, then multiblock PLS analysis is implemented. Additionally, if an extra outcome variable is specified, then a supervised version of the methods above is implemented. For each method, sparse modeling is also incorporated. Functions for selecting the number of components and regularized parameters are also provided.
An implementation of the feature Selection procedure by Partitioning the entire Solution Paths (namely SPSP) to identify the relevant features rather than using a single tuning parameter. By utilizing the entire solution paths, this procedure can obtain better selection accuracy than the commonly used approach of selecting only one tuning parameter based on existing criteria, cross-validation (CV), generalized CV, AIC, BIC, and extended BIC (Liu, Y., & Wang, P. (2018) <doi:10.1214/18-EJS1434>). It is more stable and accurate (low false positive and false negative rates) than other variable selection approaches. In addition, it can be flexibly coupled with the solution paths of Lasso, adaptive Lasso, ridge regression, and other penalized estimators.
In ecology, spatial data is often represented using polygons. These polygons can represent a variety of spatial entities, such as ecological patches, animal home ranges, or gaps in the forest canopy. Researchers often need to determine if two spatial processes, represented by these polygons, are independent of each other. For instance, they might want to test if the home range of a particular animal species is influenced by the presence of a certain type of vegetation. To address this, Godoy et al. (2022) (<doi:10.1016/j.spasta.2022.100695>) developed conditional Monte Carlo tests. These tests are designed to assess spatial independence while taking into account the shape and size of the polygons.
The Central Bank of the Republic of Turkey (CBRT) provides one of the most comprehensive time series databases on the Turkish economy. The CBRT package provides functions for accessing the CBRT's electronic data delivery system <https://evds2.tcmb.gov.tr/>. It contains the lists of all data categories and data groups for searching the available variables (data series). As of November 3, 2024, there were 40,826 variables in the dataset. The lists of data categories and data groups can be updated by the user at any time. A specific variable, a group of variables, or all variables in a data group can be downloaded at different frequencies using a variety of aggregation methods.
The POMA package offers a comprehensive toolkit designed for omics data analysis, streamlining the process from initial visualization to final statistical analysis. Its primary goal is to simplify and unify the various steps involved in omics data processing, making it more accessible and manageable within a single, intuitive R package. Emphasizing on reproducibility and user-friendliness, POMA leverages the standardized SummarizedExperiment
class from Bioconductor, ensuring seamless integration and compatibility with a wide array of Bioconductor tools. This approach guarantees maximum flexibility and replicability, making POMA an essential asset for researchers handling omics datasets. See https://github.com/pcastellanoescuder/POMAShiny. Paper: Castellano-Escuder et al. (2021) <doi:10.1371/journal.pcbi.1009148> for more details.
This package provides tools to evaluate the value of using a risk prediction instrument to decide treatment or intervention (versus no treatment or intervention). Given one or more risk prediction instruments (risk models) that estimate the probability of a binary outcome, rmda provides functions to estimate and display decision curves and other figures that help assess the population impact of using a risk model for clinical decision making. Here, "population" refers to the relevant patient population. Decision curves display estimates of the (standardized) net benefit over a range of probability thresholds used to categorize observations as high risk'. The curves help evaluate a treatment policy that recommends treatment for patients who are estimated to be high risk by comparing the population impact of a risk-based policy to "treat all" and "treat none" intervention policies. Curves can be estimated using data from a prospective cohort. In addition, rmda can estimate decision curves using data from a case-control study if an estimate of the population outcome prevalence is available. Version 1.4 of the package provides an alternative framing of the decision problem for situations where treatment is the standard-of-care and a risk model might be used to recommend that low-risk patients (i.e., patients below some risk threshold) opt out of treatment. Confidence intervals calculated using the bootstrap can be computed and displayed. A wrapper function to calculate cross-validated curves using k-fold cross-validation is also provided.
The method proposed in this package takes into account the impact of dependence on the multiple testing procedures for high-throughput data as proposed by Friguet et al. (2009). The common information shared by all the variables is modeled by a factor analysis structure. The number of factors considered in the model is chosen to reduce the false discoveries variance in multiple tests. The model parameters are estimated thanks to an EM algorithm. Adjusted tests statistics are derived, as well as the associated p-values. The proportion of true null hypotheses (an important parameter when controlling the false discovery rate) is also estimated from the FAMT model. Graphics are proposed to interpret and describe the factors.
Find the permutation symmetry group such that the covariance matrix of the given data is approximately invariant under it. Discovering such a permutation decreases the number of observations needed to fit a Gaussian model, which is of great use when it is smaller than the number of variables. Even if that is not the case, the covariance matrix found with gips approximates the actual covariance with less statistical error. The methods implemented in this package are described in Graczyk et al. (2022) <doi:10.1214/22-AOS2174>. Documentation about gips is provided via its website at <https://przechoj.github.io/gips/> and the paper by Chojecki, Morgen, KoÅ odziejek (2025, <doi:10.18637/jss.v112.i07>).
This package provides a collection of functions is provided by this package to fit quantiles regression models for censored residual lifetimes. It provides various options for regression parameters estimation: the induced smoothing approach (smooth), and L1-minimization (non-smooth). It also implements the estimation methods for standard errors of the regression parameters estimates based on an efficient partial multiplier bootstrap method and robust sandwich estimator. Furthermore, a simultaneous procedure of estimating regression parameters and their standard errors via an iterative updating procedure is implemented (iterative). For more details, see Kim, K. H., Caplan, D. J., & Kang, S. (2022), "Smoothed quantile regression for censored residual life", Computational Statistics, 1-22 <doi:10.1007/s00180-022-01262-z>.
Find multiple solutions of a nonlinear least squares problem. Cluster Gauss-Newton method does not assume uniqueness of the solution of the nonlinear least squares problem and compute multiple minimizers. Please cite the following paper when this software is used in your research: Aoki et al. (2020) <doi:10.1007/s11081-020-09571-2>. Cluster Gaussâ Newton method. Optimization and Engineering, 1-31. Please cite the following paper when profile likelihood plot is drawn with this software and used in your research: Aoki and Sugiyama (2024) <doi:10.1002/psp4.13055>. Cluster Gauss-Newton method for a quick approximation of profile likelihood: With application to physiologically-based pharmacokinetic models. CPT Pharmacometrics Syst Pharmacol.13(1):54-67.