This package provides tools to help convert credit risk data at two timepoints into traditional credit state migration (aka, "transition") matrices. At a higher level, migrate is intended to help an analyst understand how risk moved in their credit portfolio over a time interval. References to this methodology include: 1. Schuermann, T. (2008) <doi:10.1002/9780470061596.risk0409>. 2. Perederiy, V. (2017) <doi:10.48550/arXiv.1708.00062>.
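
As a rough illustration of what a migration matrix is, the following Python sketch counts state changes between two timepoints and normalizes each row. It is not the migrate package's API; the function name `migration_matrix` and the sample ratings are assumptions made up for this example.

```python
from collections import Counter

def migration_matrix(start_states, end_states, states):
    """Row-normalized transition matrix: entry [i][j] is the share of
    accounts that began in states[i] and ended in states[j]."""
    if len(start_states) != len(end_states):
        raise ValueError("both timepoints must cover the same accounts")
    counts = Counter(zip(start_states, end_states))
    matrix = []
    for s in states:
        row_total = sum(counts[(s, e)] for e in states)
        # A state with no accounts at the first timepoint gets a row of zeros.
        matrix.append([counts[(s, e)] / row_total if row_total else 0.0
                       for e in states])
    return matrix

# Hypothetical ratings for five accounts at two timepoints.
ratings_t0 = ["A", "A", "B", "B", "B"]
ratings_t1 = ["A", "B", "B", "B", "C"]
matrix = migration_matrix(ratings_t0, ratings_t1, ["A", "B", "C"])
```

Each row describes where the accounts that started in one state ended up, so a row with any accounts sums to 1.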
This package provides a set of tools for testing networks. It includes functions for univariate and multivariate conditional uniform graph and quadratic assignment procedure testing, and network regression. The package is a complement to Multimodal Political Networks (2021, ISBN:9781108985000), and includes various datasets used in the book. Built on the manynet package, all functions operate with matrices, edge lists, and igraph, network, and tidygraph objects, and on one-mode and two-mode (bipartite) networks.
The Self-Organizing Maps with Built-in Missing Data Imputation. Missing values are imputed and regularly updated during the online Kohonen algorithm. Our method can be used for data visualisation, clustering or imputation of missing data. It is an extension of the online algorithm of the kohonen package. The method is described in the article "Self-Organizing Maps for Exploration of Partially Observed Data and Imputation of Missing Values" by S. Rejeb, C. Duveau, T. Rebafka (2022) <arXiv:2202.07963>.
When a network is partially observed (here, NAs in the adjacency matrix rather than 1 or 0 due to missing information between node pairs), it is possible to account for the underlying process that generates those NAs. missSBM, presented in Barbillon, Chiquet and Tabouy (2022) <doi:10.18637/jss.v101.i12>, fits the popular stochastic block model to network data sampled under various missing data conditions, as described in Tabouy, Barbillon and Chiquet (2019) <doi:10.1080/01621459.2018.1562934>.
This package creates and runs Bayesian mixing models to analyze biological tracer data (e.g. stable isotopes, fatty acids), which estimate the proportions of source (prey) contributions to a mixture (consumer). MixSIAR is not one model, but a framework that allows a user to create a mixing model based on their data structure and research questions, via options for fixed/random effects, source data types, priors, and error terms. MixSIAR incorporates several years of advances since MixSIR and SIAR.
This package is deprecated. Please use redatamx instead. Provides an API to work with Redatam (see <https://redatam.org>) databases in both formats, RXDB (new format) and DICX (old format), and to run Redatam programs written in the SPC language. It is a wrapper around the Redatam core and provides functions to open/close a database (redatam_open()/redatam_close()), list entities and variables from the database (redatam_entities(), redatam_variables()) and execute an SPC program and get the results as data frames (redatam_query(), redatam_run()).
This package provides tools to generate HTML interfaces for adaptive and non-adaptive tests using the shiny package (Chalmers (2016) <doi:10.18637/jss.v071.i05>). Suitable for applying unidimensional and multidimensional computerized adaptive tests (CAT) using item response theory methodology and for creating simple questionnaire forms to collect response data directly in R. Additionally, optimal test designs (e.g., "shadow testing") are supported for tests that contain a large number of item selection constraints. Finally, the package contains tools useful for performing Monte Carlo simulations for studying test item banks.
Implementation of a framework for cluster analysis with selection of the final number of clusters and an optional variable selection procedure. The package is designed to integrate the results of multiple imputed datasets while accounting for the uncertainty that the imputations introduce in the final results. In addition, the package can also be used for a cluster analysis of the complete cases of a single dataset. The package also includes specific methods to summarize and plot the results. The methods are described in Basagana et al. (2013) <doi:10.1093/aje/kws289>.
Utilizing model-based clustering (unsupervised) for functional magnetic resonance imaging (fMRI) data. The developed methods (Chen and Maitra (2023) <doi:10.1002/hbm.26425>) include 2D and 3D clustering analyses (for p-values with voxel locations) and segmentation analyses (for p-values alone) for fMRI data, where p-values indicate the significance level of activation in response to the stimulus of interest. The analyses mainly identify active voxels/signals associated with normal brain behaviors. Analysis pipelines (R scripts) utilizing this package (see examples in inst/workflow/) are also implemented with high-performance techniques.
This package provides a procedure for comparing multivariate samples associated with different groups. It uses principal component analysis to convert multivariate observations into a set of linearly uncorrelated statistical measures, which are then compared using a number of statistical methods. The procedure is independent of the distributional properties of samples and automatically selects features that best explain their differences, avoiding manual selection of specific points or summary statistics. It is appropriate for comparing samples of time series, images, spectrometric measures or similar multivariate observations. This package is described in Fachada et al. (2016) <doi:10.32614/RJ-2016-055>.
The midasml package implements estimation and prediction methods for high-dimensional mixed-frequency (MIDAS) time-series and panel data regression models. The regularized MIDAS models are estimated using orthogonal (e.g. Legendre) polynomials and the sparse-group LASSO (sg-LASSO) estimator. For more information on the midasml approach see Babii, Ghysels, and Striaukas (2021, JBES forthcoming) <doi:10.1080/07350015.2021.1899933>. The package is equipped with a fast implementation of the sg-LASSO estimator by means of proximal block coordinate descent. High-dimensional mixed-frequency time-series data can also be easily manipulated with functions provided in the package.
Our pipeline, MICSQTL, utilizes scRNA-seq reference and bulk transcriptomes to estimate cellular composition in the matched bulk proteomes. The expression of genes and proteins at either bulk level or cell type level can be integrated by the Angle-based Joint and Individual Variation Explained (AJIVE) framework. Meanwhile, MICSQTL can perform cell-type-specific quantitative trait loci (QTL) mapping to proteins or transcripts based on the input of bulk expression data and the estimated cellular composition per molecule type, without the need for single cell sequencing. We use matched transcriptome-proteome from human brain frontal cortex tissue samples to demonstrate the input and output of our tool.
This package provides a facility to generate various classes of fractional designs for order-of-addition experiments, namely fractional order-of-addition orthogonal arrays, see Voelkel, Joseph G. (2019). "The design of order-of-addition experiments." Journal of Quality Technology 51:3, 230-241, <doi:10.1080/00224065.2019.1569958>. It also provides a facility to construct component orthogonal arrays, see Jian-Feng Yang, Fasheng Sun and Hongquan Xu (2020). "A Component Position Model, Analysis and Design for Order-of-Addition Experiments." Technometrics, <doi:10.1080/00401706.2020.1764394>. Supports generation of fractional designs for order-of-addition mixture experiments. Analysis of data from order-of-addition mixture experiments is also supported.
Machine learning method specifically designed for pre-miRNA prediction. It takes advantage of unlabeled sequences to improve the prediction rates even when there are just a few positive examples, or when the negative examples are unreliable or are not good representatives of their class. Furthermore, the method can automatically search for negative examples if the user is unable to provide them. MiRNAss can find a good boundary to divide the pre-miRNAs from other groups of sequences; it automatically optimizes the threshold that defines the class boundaries, and thus, it is robust to high class imbalance. Each step of the method is scalable and can handle large volumes of data.
Deconvolution of thermal decay curves allows you to quantify proportions of biomass components in plant litter. Thermal decay curves derived from thermogravimetric analysis (TGA) are imported, modified, and then modelled in a three- or four-part mixture model using the Fraser-Suzuki function. The output is estimates for weights of pseudo-components corresponding to hemicellulose, cellulose, and lignin. For more information see: Müller-Hagedorn, M. and Bockhorn, H. (2007) <doi:10.1016/j.jaap.2006.12.008>, Órfão, J. J. M. and Figueiredo, J. L. (2001) <doi:10.1016/S0040-6031(01)00634-7>, and Yang, H. and Yan, R. and Chen, H. and Zheng, C. and Lee, D. H. and Liang, D. T. (2006) <doi:10.1021/ef0580117>.
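
To make the mixture model above more concrete, here is a minimal Python sketch of the Fraser-Suzuki function in its commonly cited form, h * exp(-ln 2 / s^2 * ln(1 + 2s(x - p)/w)^2), and of a sum of three such peaks. The function names and all parameter values are hypothetical and chosen only for illustration; this is not the package's implementation and it does no fitting.

```python
import math

def fraser_suzuki(x, height, skew, position, width):
    """Fraser-Suzuki peak evaluated at x; zero outside the function's domain."""
    if skew == 0:
        raise ValueError("skew must be non-zero")
    inner = 1.0 + 2.0 * skew * (x - position) / width
    if inner <= 0.0:
        return 0.0
    return height * math.exp(-math.log(2.0) / skew**2 * math.log(inner) ** 2)

def mixture_rate(x, components):
    """Sum of several Fraser-Suzuki peaks, one per pseudo-component."""
    return sum(fraser_suzuki(x, *params) for params in components)

# Hypothetical (height, skew, position, width) values, for illustration only.
components = [
    (0.003, -0.15, 270.0, 50.0),
    (0.006, -0.15, 330.0, 30.0),
    (0.001, 0.20, 410.0, 200.0),
]
rate_at_330 = mixture_rate(330.0, components)
```

Each peak reaches its height at its position; deconvolution amounts to estimating these parameters so that the summed curve matches the observed decay rate.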
Logistic-normal Multinomial (LNM) models are common in problems with multivariate count data. This package gives a simple implementation with a 30-line Stan script. This lightweight implementation makes it an easy starting point for other projects, in particular for downstream tasks that require analysis of "compositional" data. It can be applied whenever a multinomial probability parameter is thought to depend linearly on inputs in a transformed, log ratio space. Additional utilities make it easy to inspect, create predictions, and draw samples using the fitted models. More about the LNM can be found in Xia et al. (2013) "A Logistic Normal Multinomial Regression Model for Microbiome Compositional Data Analysis" <doi:10.1111/biom.12079> and Sankaran and Holmes (2023) "Generative Models: An Interdisciplinary Perspective" <doi:10.1146/annurev-statistics-033121-110134>.
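
The idea of a probability vector that depends linearly on inputs in log ratio space can be sketched in a few lines of Python. This is a generative toy under stated assumptions, not the package's Stan model: it uses an additive log ratio with the last category as reference, independent Gaussian noise instead of a full covariance matrix, and hypothetical names and coefficients throughout.

```python
import math
import random

def inverse_alr(eta):
    """Map K-1 log ratios (relative to a last reference category) to K probabilities."""
    shift = max(max(eta), 0.0)  # subtract the maximum for numerical stability
    weights = [math.exp(v - shift) for v in eta] + [math.exp(-shift)]
    total = sum(weights)
    return [w / total for w in weights]

def simulate_counts(x, beta, noise_sd, depth, rng):
    """Draw one multinomial count vector for covariate vector x.
    beta[k] holds the coefficients for log ratio k."""
    eta = [sum(b * xi for b, xi in zip(coefs, x)) + rng.gauss(0.0, noise_sd)
           for coefs in beta]
    probs = inverse_alr(eta)
    draws = rng.choices(range(len(probs)), weights=probs, k=depth)
    return [draws.count(k) for k in range(len(probs))]

rng = random.Random(0)
beta = [[0.5, 1.0], [-0.3, 0.2]]  # hypothetical coefficients, 3 categories
counts = simulate_counts([1.0, 2.0], beta, noise_sd=0.1, depth=100, rng=rng)
```

Fitting the model reverses this process: given counts and inputs, estimate the coefficients and the noise scale.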
This package provides constrained joint maximum likelihood estimation algorithms for item factor analysis (IFA) based on multidimensional item response theory models. So far, we provide functions for exploratory and confirmatory IFA based on the multidimensional two-parameter logistic (M2PL) model for binary response data. Compared with traditional estimation methods for IFA, the methods implemented in this package scale better to data with large numbers of respondents, items, and latent factors. The computation is facilitated by the multiprocessing OpenMP API. For more information, please refer to: 1. Chen, Y., Li, X., & Zhang, S. (2018). Joint Maximum Likelihood Estimation for High-Dimensional Exploratory Item Factor Analysis. Psychometrika, 1-23. <doi:10.1007/s11336-018-9646-5>; 2. Chen, Y., Li, X., & Zhang, S. (2019). Structured Latent Factor Analysis for Large-scale Data: Identifiability, Estimability, and Their Implications. Journal of the American Statistical Association, <doi:10.1080/01621459.2019.1635485>.
Weakly supervised (WS), multiple instance (MI) data arise in numerous interesting applications such as drug discovery, object detection, and tumor prediction on whole slide images. The mildsvm package provides an easy way to learn from this data by training Support Vector Machine (SVM)-based classifiers. It also contains helpful functions for building and printing multiple instance data frames. The core methods from mildsvm come from the following references: Kent and Yu (2022) <arXiv:2206.14704>; Xiao, Liu, and Hao (2018) <doi:10.1109/TNNLS.2017.2766164>; Muandet et al. (2012) <https://proceedings.neurips.cc/paper/2012/file/9bf31c7ff062936a96d3c8bd1f8f2ff3-Paper.pdf>; Chu and Keerthi (2007) <doi:10.1162/neco.2007.19.3.792>; and Andrews et al. (2003) <https://papers.nips.cc/paper/2232-support-vector-machines-for-multiple-instance-learning.pdf>. Many functions use the Gurobi optimization back-end to improve the optimization problem speed; the gurobi R package and associated software can be downloaded from <https://www.gurobi.com> after obtaining a license.
Different data resources for microRNAs and some functions for manipulating them.
This package provides a collection of R functions for analyzing finite mixture models.
This package provides a collection of functions for computations and visualizations of microbial pan-genomes.
Generates randomization schedules by minimization algorithms for schemes with k (k >= 2) treatment groups and arbitrary allocation ratios.
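
For readers unfamiliar with minimization, the Python sketch below shows one simple deterministic variant: each new participant goes to the arm that leaves the smallest total imbalance across their prognostic factor levels, with imbalance measured as the range of arm counts. It assumes equal allocation and breaks ties by arm order, so it is narrower than what the package describes; the function name, factors, and cohort are hypothetical.

```python
def assign_by_minimization(participant, arms, tallies):
    """Return the arm that minimizes total marginal imbalance for a new participant.

    participant: dict mapping factor name -> level
    tallies: dict mapping (factor, level, arm) -> count of earlier assignments
    """
    def imbalance_if(candidate):
        total = 0
        for factor, level in participant.items():
            counts = [tallies.get((factor, level, arm), 0) + (arm == candidate)
                      for arm in arms]
            total += max(counts) - min(counts)
        return total

    chosen = min(arms, key=imbalance_if)  # ties go to the first arm listed
    for factor, level in participant.items():
        key = (factor, level, chosen)
        tallies[key] = tallies.get(key, 0) + 1
    return chosen

arms = ["control", "treatment"]
tallies = {}
cohort = [  # hypothetical participants
    {"sex": "F", "site": "north"},
    {"sex": "F", "site": "south"},
    {"sex": "M", "site": "north"},
    {"sex": "F", "site": "north"},
]
schedule = [assign_by_minimization(p, arms, tallies) for p in cohort]
```

Practical schemes usually add a random element, for example assigning the minimizing arm only with high probability, and weight the counts to respect unequal allocation ratios.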
Inference on stochastic differential models (Ornstein-Uhlenbeck or Cox-Ingersoll-Ross) with one or two random effects in the drift function.
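
The kind of data such models describe can be illustrated with a small Python simulation of an Ornstein-Uhlenbeck process with one random effect in the drift. The parametrization dX = (alpha - beta * X) dt + sigma dW, the Euler-Maruyama discretization, and every numeric value here are assumptions made for this sketch; it simulates paths and performs no inference.

```python
import math
import random

def simulate_ou(alpha, beta, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama path of dX = (alpha - beta * X) dt + sigma dW."""
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        path.append(x + (alpha - beta * x) * dt + sigma * dw)
    return path

rng = random.Random(1)
# One random effect: each subject draws its own alpha; beta is shared.
paths = []
for _ in range(3):
    alpha_i = rng.gauss(2.0, 0.5)  # hypothetical population mean and sd
    paths.append(simulate_ou(alpha_i, beta=1.0, sigma=0.2, x0=0.0,
                             dt=0.01, n_steps=500, rng=rng))
```

Inference goes the other way: from several observed paths, estimate the distribution of the subject-level drift parameters.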
Basic math calculators for tasks such as finding the angles of a triangle and the greatest common divisor of two numbers.
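
The two calculations named above can be sketched in Python as follows, using the Euclidean algorithm and the law of cosines. The function names are hypothetical and are not taken from the package.

```python
import math

def gcd(a, b):
    """Greatest common divisor by the Euclidean algorithm."""
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a

def triangle_angles(a, b, c):
    """Angles in degrees opposite sides a, b, c, via the law of cosines."""
    if min(a, b, c) <= 0 or a + b <= c or a + c <= b or b + c <= a:
        raise ValueError("side lengths do not form a triangle")
    angle_a = math.degrees(math.acos((b * b + c * c - a * a) / (2 * b * c)))
    angle_b = math.degrees(math.acos((a * a + c * c - b * b) / (2 * a * c)))
    return angle_a, angle_b, 180.0 - angle_a - angle_b
```

For example, `gcd(48, 18)` returns 6, and `triangle_angles(3, 4, 5)` gives a right angle opposite the longest side.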