This package contains functions to help create an Analysis Results Dataset. The dataset follows an industry-recommended structure and can be created in multiple passes, using different data frames as input. Analysis Results Datasets are used in the pharmaceutical and biotech industries to capture analysis results in a common tabular data structure.
Analysis of means (ANOM) as used in technometrical computing. The package takes results from multiple comparisons with the grand mean (obtained with 'multcomp', 'SimComp', 'nparcomp', or 'MCPAN') or corresponding simultaneous confidence intervals as input and produces ANOM decision charts that illustrate which group means deviate significantly from the grand mean.
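The decision rule behind an ANOM chart can be sketched in a few lines. This is a minimal illustration, not the package's code: it assumes the simultaneous confidence intervals for each group's deviation from the grand mean (group mean minus grand mean) have already been computed, and the `SimultaneousCI` record and `anom_decisions` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SimultaneousCI:
    """Simultaneous CI for (group mean - grand mean); field names are hypothetical."""
    group: str
    estimate: float
    lower: float
    upper: float


def anom_decisions(intervals):
    """Flag each group whose interval for (mean - grand mean) excludes zero."""
    decisions = {}
    for ci in intervals:
        if ci.lower > 0:
            decisions[ci.group] = "above"  # significantly above the grand mean
        elif ci.upper < 0:
            decisions[ci.group] = "below"  # significantly below the grand mean
        else:
            decisions[ci.group] = "no signal"  # interval covers zero
    return decisions
```

A chart then draws each estimate with its interval around a zero reference line; groups labelled "above" or "below" are the ones whose intervals cross to one side of it.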
Constrained ordinary least squares is performed. One constraint is that no beta coefficient (including the constant) can be negative: each is either 0 or strictly positive. Another constraint is that the beta coefficients sum to a fixed constant. References: Hansen, B. E. (2022). Econometrics, Princeton University Press. <ISBN:9780691235899>.
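One way to solve this problem is projected gradient descent: take a gradient step on the squared-error loss, then project back onto the feasible set {b ≥ 0, sum(b) = c}, which is a scaled probability simplex. The pure-Python sketch below assumes that approach (the package may well use a different solver, such as quadratic programming) and a strictly positive sum c. `constrained_ols` and `project_to_simplex` are hypothetical names, and the constant is constrained by including a column of ones in `X`.

```python
def project_to_simplex(v, total):
    """Euclidean projection of v onto {b : b >= 0, sum(b) == total}, with total > 0."""
    assert total > 0
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - total) / j
        if uj - t > 0:  # holds for j = 1..rho, so theta ends at the rho-th value
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def constrained_ols(X, y, total, iterations=5000):
    """Minimise ||y - X b||^2 subject to b >= 0 and sum(b) == total.

    X should include a column of ones if the constant is to be constrained too.
    """
    n, p = len(X), len(X[0])
    # 1 / ||X||_F^2 is a safe step size: ||X||_F^2 bounds the largest eigenvalue of X'X.
    step = 1.0 / sum(x * x for row in X for x in row)
    beta = [total / p] * p  # feasible starting point
    for _ in range(iterations):
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        beta = project_to_simplex([b - step * g for b, g in zip(beta, grad)], total)
    return beta
```

When the unconstrained least-squares fit already satisfies both constraints, this returns it; otherwise it returns the closest feasible fit in the squared-error sense.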
This linear model solution is useful when both predictor and response have associated uncertainty. The doubly weighted linear model solution is invariant to which quantity is used as predictor or response. Based on the results by Reed (1989) <doi:10.1119/1.15963> and Ripley & Thompson (1987) <doi:10.1039/AN9871200377>.
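To illustrate a straight-line fit with uncertainty on both axes, the sketch below swaps in the iterative algorithm of York et al. (2004), a published least-squares solution for this setting, rather than reproducing Reed's (1989) formulation. It assumes uncorrelated x and y errors given as per-point standard errors, and `york_fit` is a hypothetical name.

```python
def york_fit(x, y, sx, sy, tol=1e-12, max_iter=1000):
    """Fit y = a + b*x when both x and y carry per-point standard errors (sx, sy).

    Iterative scheme of York et al. (2004), assuming x and y errors are uncorrelated.
    """
    wx = [1.0 / s ** 2 for s in sx]  # weights on x errors
    wy = [1.0 / s ** 2 for s in sy]  # weights on y errors
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Ordinary least-squares slope as the starting value.
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)

    def weighted_centroid(slope):
        W = [u * v / (u + slope * slope * v) for u, v in zip(wx, wy)]
        sw = sum(W)
        xbar = sum(Wi * xi for Wi, xi in zip(W, x)) / sw
        ybar = sum(Wi * yi for Wi, yi in zip(W, y)) / sw
        return W, xbar, ybar

    for _ in range(max_iter):
        W, xbar, ybar = weighted_centroid(b)
        U = [xi - xbar for xi in x]
        V = [yi - ybar for yi in y]
        beta = [Wi * (Ui / v + b * Vi / u) for Wi, Ui, Vi, u, v in zip(W, U, V, wx, wy)]
        b_new = (sum(Wi * bi * Vi for Wi, bi, Vi in zip(W, beta, V))
                 / sum(Wi * bi * Ui for Wi, bi, Ui in zip(W, beta, U)))
        converged = abs(b_new - b) <= tol * abs(b_new)
        b = b_new
        if converged:
            break
    _, xbar, ybar = weighted_centroid(b)
    return ybar - b * xbar, b
```

Calling `york_fit(y, x, sy, sx)` with the roles swapped yields the reciprocal slope 1/b and intercept -a/b, which is the predictor/response invariance described above; ordinary least squares does not have this property.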
This comprehensive toolkit for distributed elliptical models is designated "ELIC" (The LIC for Distributed Elliptical Model Analysis). It is predicated on the assumption that the error term follows an elliptical distribution. The philosophy of the package is described in Guo G. (2020) <doi:10.1080/02664763.2022.2053949>.
An implementation of hyperparameter optimization for Gradient Boosted Trees for binary classification and regression problems. The current version provides two optimization methods: Bayesian optimization and random search. Instead of returning the single best model, the final output is an ensemble of Gradient Boosted Trees constructed via the method of ensemble selection.
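Ensemble selection, in the sense of Caruana et al. (2004), greedily adds (with replacement) whichever candidate model most improves the averaged predictions on a validation set. The sketch below assumes the validation predictions of every tuned model are already available and scores with mean squared error, a regression loss; for binary classification a loss such as log loss could be substituted. It is not the package's implementation, and `ensemble_selection` is a hypothetical name.

```python
def ensemble_selection(predictions, y_true, rounds):
    """Greedy forward selection with replacement over a library of model predictions.

    predictions: dict mapping model name -> list of validation-set predictions.
    Returns the selection sequence and each chosen model's averaging weight.
    """
    n = len(y_true)

    def mse(pred):
        return sum((p - t) ** 2 for p, t in zip(pred, y_true)) / n

    chosen = []
    running_sum = [0.0] * n  # sum of the predictions of the models chosen so far
    for _ in range(rounds):
        k = len(chosen) + 1
        best_name, best_score = None, float("inf")
        for name, pred in predictions.items():
            candidate = [(s + p) / k for s, p in zip(running_sum, pred)]
            score = mse(candidate)
            if score < best_score:  # ties keep the earlier model
                best_name, best_score = name, score
        chosen.append(best_name)
        running_sum = [s + p for s, p in zip(running_sum, predictions[best_name])]
    weights = {name: chosen.count(name) / len(chosen) for name in dict.fromkeys(chosen)}
    return chosen, weights
```

Because models can be picked more than once, the final weights are multiples of 1/rounds; two models with opposite biases, for example, can average to a better fit than either alone.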
Streamlines exploratory data analysis by providing a turnkey approach to visualising n-dimensional data that graphically reveals correlative or associative relationships between two or more features. Represents all dataset features as distinct, vertically aligned bar or tile plots, with plot types auto-selected based on whether variables are categorical or numeric.
Supplement for the book "Handbook of Regression Methods" by D. S. Young. Some datasets used in the book are included and documented. Wrapper functions are included that simplify the examples in the textbook, such as code for constructing a regressogram and expanding ANOVA tables to reflect the total sum of squares.
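A regressogram is a piecewise-constant regression estimate: the range of x is split into bins, and the fitted value in each bin is the mean of the responses that fall in it. The sketch below assumes equal-width bins; `regressogram` is a hypothetical name, not one of the package's wrapper functions.

```python
def regressogram(x, y, n_bins):
    """Equal-width bins over the range of x; the fit in each bin is the mean of y (None if empty)."""
    lo, hi = min(x), max(x)
    assert hi > lo, "x must not be constant"
    width = (hi - lo) / n_bins
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for xi, yi in zip(x, y):
        k = min(int((xi - lo) / width), n_bins - 1)  # the maximum falls in the last bin
        sums[k] += yi
        counts[k] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    means = [s / c if c else None for s, c in zip(sums, counts)]
    return edges, means
```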
An implementation of high-probability lower bounds for the total variation distance, as introduced in Michel, Naef, and Meinshausen (2020) <arXiv:2005.06006>. Given samples observed from two probability distributions, the function HPLB returns an estimated lower bound on the total variation distance between them that holds with high probability.
This package implements a wide range of metrics for measuring glucose control and glucose variability from continuous glucose monitoring data. The list of implemented metrics is summarized in Rodbard (2009) <doi:10.1089/dia.2009.0015>. Additional visualization tools include time-series plots, lasagna plots, and an ambulatory glucose profile report.
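A few common summary metrics can be computed directly from a series of readings. This sketch is illustrative only: it assumes readings in mg/dL taken at evenly spaced times, uses 70 to 180 mg/dL as the target range (a widely used choice, but an assumption here), and defines the J-index as 0.001 × (mean + SD)². `glucose_summary` is a hypothetical name.

```python
from statistics import mean, stdev


def glucose_summary(readings, low=70.0, high=180.0):
    """A few summary metrics for glucose readings in mg/dL (assumed evenly spaced in time)."""
    m = mean(readings)
    sd = stdev(readings)  # sample standard deviation
    in_range = sum(low <= g <= high for g in readings)
    return {
        "mean": m,
        "sd": sd,
        "cv_percent": 100.0 * sd / m,  # coefficient of variation
        "time_in_range_percent": 100.0 * in_range / len(readings),
        "j_index": 0.001 * (m + sd) ** 2,
    }
```

With evenly spaced readings, the fraction of readings in range equals the fraction of time in range; irregular sampling would need time-weighting instead.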
This package provides functions to clean and process international trade data into an international trade network (ITN). It then provides a set of functions to analyze and plot the ITN (backbone extraction, centrality, blockmodels, clustering), for examining the key players in the ITN and regional trade patterns.
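Turning bilateral trade records into a network amounts to aggregating flows into weighted, directed edges; node strength (total exports and imports) is then one of the simplest centrality measures. The sketch assumes records of the form (exporter, importer, value) and uses dropping self-loops and non-positive values as a stand-in cleaning step; `build_itn` and `strength` are hypothetical names.

```python
from collections import defaultdict


def build_itn(records):
    """Aggregate (exporter, importer, value) records into weighted directed edges.

    Self-loops and non-positive values are dropped as a simple cleaning step.
    """
    edges = defaultdict(float)
    for exporter, importer, value in records:
        if exporter != importer and value > 0:
            edges[(exporter, importer)] += value
    return dict(edges)


def strength(edges):
    """Weighted out-degree (total exports) and in-degree (total imports) per country."""
    out_s, in_s = defaultdict(float), defaultdict(float)
    for (exporter, importer), value in edges.items():
        out_s[exporter] += value
        in_s[importer] += value
    return dict(out_s), dict(in_s)
```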
Machine learning package containing several algorithms for supervised and unsupervised classification, a function that plots Receiver Operating Characteristic (ROC) and Precision-Recall (PRC) curves, and a function that returns several metrics used for model evaluation; the latter can also be used to rank results from other packages.
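The area under the ROC curve equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties counting one half), which gives a compact way to compute it. The sketch below pairs that with common threshold-based metrics; `roc_auc` and `threshold_metrics` are hypothetical names, and the 0.5 threshold is an assumed default.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve as P(score of a positive > score of a negative), ties counted 1/2."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(scores, labels, threshold=0.5):
    """Accuracy, precision, recall, and F1 after thresholding the scores."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(labels), "precision": precision, "recall": recall, "f1": f1}
```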
Simultaneous multiple outcomes prediction based on revised stacking algorithms, which enables the integration of information from predictions of individual models. An implementation of methodologies proposed in our paper: Li Xing, Mary L Lesperance, Xuekui Zhang. (2019) Bioinformatics, "Simultaneous prediction of multiple outcomes using revised stacking algorithms" <doi:10.1093/bioinformatics/btz531>.
Includes functions for conducting univariate and multivariate meta-analysis. This includes the estimation of the asymptotic variance-covariance matrix of effect sizes. For more details see Becker (1992) <doi:10.2307/1165128>, Cooper, Hedges, and Valentine (2019) <doi:10.7758/9781610448864>, and Schmid, Stijnen, and White (2020) <doi:10.1201/9781315119403>.
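For the univariate case, the standard inverse-variance approach pools effect sizes y_i with weights w_i = 1/v_i, and the DerSimonian-Laird moment estimator gives the between-study variance used by a random-effects model. This sketch covers only that univariate calculation (not the multivariate variance-covariance estimation), and `inverse_variance_pool` is a hypothetical name.

```python
import math


def inverse_variance_pool(effects, variances):
    """Fixed-effect pooled estimate, its standard error, and the DerSimonian-Laird tau^2."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    se = math.sqrt(1.0 / sw)
    # Cochran's Q and the DerSimonian-Laird moment estimator, truncated at zero.
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    return pooled, se, tau2
```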
Makes statistical inferences about the probability of being in response, the duration of response, and the cumulative response rate up to a given time point. The method can be applied to analyze phase II randomized clinical trials whose endpoints are time to treatment response and time to progression or death.
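Without censoring, the three quantities have simple empirical counterparts: the cumulative response rate at t is the fraction who have responded by t, the probability of being in response at t is the fraction who have responded but not yet progressed, and the mean duration of response up to a horizon tau is the area under the in-response curve. The package targets real trial data, where censoring requires proper survival-type estimators; this sketch assumes complete follow-up, and `response_summaries` is a hypothetical name.

```python
import math


def response_summaries(t_response, t_progression, t, tau):
    """Empirical response summaries for uncensored data.

    t_response: time of first response (math.inf for non-responders).
    t_progression: time of progression or death, assumed >= t_response for responders.
    """
    n = len(t_response)
    cumulative = sum(r <= t for r in t_response) / n
    in_response = sum(r <= t < p for r, p in zip(t_response, t_progression)) / n
    # Mean time spent in response up to tau: the area under the in-response curve on [0, tau].
    duration = sum(max(0.0, min(p, tau) - min(r, tau)) for r, p in zip(t_response, t_progression)) / n
    return cumulative, in_response, duration
```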
This comprehensive toolkit for T-distributed regression is designated "TLIC" (The LIC for T Distribution Regression Analysis). It is predicated on the assumption that the error term follows a T-distribution. The philosophy of the package is described in Guo G. (2020) <doi:10.1080/02664763.2022.2053949>.
Key-value store, implemented as a wrapper around 'LMDB', the "lightning memory-mapped database" <https://www.symas.com/mdb>. LMDB is a transactional key-value store that uses a memory map for efficient access. This package wraps the entire LMDB interface (except duplicate keys) and provides objects for transactions and cursors.
Implementation of integrative weighting approaches for causal inference with multiple observational studies. The package features three weighting approaches, each representing a special case of the unified weighting framework introduced by Guha and Li (2024) <doi:10.1093/biomtc/ujae070>, which includes an extension of inverse probability weights to data integration settings.
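As background for the weighting framework, the familiar single-study inverse probability weighting estimator reweights treated units by 1/e(x) and controls by 1/(1 - e(x)), where e(x) is the propensity score. The sketch below is that standard normalised (Hajek-type) estimator with propensity scores assumed to be already estimated; it is not Guha and Li's integrative weighting scheme, and `ipw_ate` is a hypothetical name.

```python
def ipw_ate(treated, outcome, propensity):
    """Normalised inverse probability weighted estimate of the average treatment effect.

    treated: 0/1 indicators; propensity: estimated P(treated = 1 | covariates), strictly in (0, 1).
    """
    w1 = [t / e for t, e in zip(treated, propensity)]
    w0 = [(1 - t) / (1 - e) for t, e in zip(treated, propensity)]
    mu1 = sum(w * y for w, y in zip(w1, outcome)) / sum(w1)  # weighted mean among treated
    mu0 = sum(w * y for w, y in zip(w0, outcome)) / sum(w0)  # weighted mean among controls
    return mu1 - mu0
```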
This package provides a novel clustering algorithm and toolkit RCSL (Rank Constrained Similarity Learning) to accurately identify various cell types using scRNA-seq data from a complex tissue. RCSL considers both local similarity and global similarity among the cells to discern the subtle differences among cells of the same type as well as larger differences among cells of different types. RCSL uses Spearman’s rank correlations of a cell’s expression vector with those of other cells to measure its global similarity, and adaptively learns neighbour representation of a cell as its local similarity. The overall similarity of a cell to other cells is a linear combination of its global similarity and local similarity.
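The global-similarity and combination steps described above can be sketched in pure Python, computing Spearman correlation as the Pearson correlation of ranks. The local-similarity matrix, which RCSL learns adaptively, is simply passed in here as a hypothetical input; the mixing weight `alpha` and all function names are assumptions for illustration.

```python
def ranks(values):
    """1-based ranks, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0


def global_similarity(cells):
    """Spearman correlation between every pair of cell expression vectors."""
    rk = [ranks(c) for c in cells]
    return [[pearson(a, b) for b in rk] for a in rk]


def overall_similarity(global_sim, local_sim, alpha):
    """Element-wise linear combination alpha * global + (1 - alpha) * local."""
    return [[alpha * g + (1 - alpha) * loc for g, loc in zip(grow, lrow)]
            for grow, lrow in zip(global_sim, local_sim)]
```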
Interface for multiple data sources, such as the `EDDS` API <https://evds2.tcmb.gov.tr/index.php?/evds/userDocs> of the Central Bank of the Republic of Türkiye and the `FRED` API <https://fred.stlouisfed.org/docs/api/fred/> of the Federal Reserve Bank of St. Louis. Both data providers require API keys for access, which users can obtain by creating accounts on the respective websites. The package provides caching, with selectable periods, to increase the speed and efficiency of requests. When the requested series share a common frequency, it combines the datasets from different sources; while combining data frames whenever possible, it also keeps all requested data available as separate data frames to increase efficiency.
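The caching and combining behaviour can be sketched generically. Here `fetch` is a hypothetical stand-in for a request to either API (no real EDDS or FRED endpoint is called), cache keys are assumed to be (source, series, start, end), each series is represented as a dict from date to value, and all function names are hypothetical.

```python
def make_cached_fetcher(fetch):
    """Wrap a fetch(source, series, start, end) function with an in-memory cache."""
    cache = {}

    def cached_fetch(source, series, start, end):
        key = (source, series, start, end)
        if key not in cache:  # only hit the remote API on a cache miss
            cache[key] = fetch(source, series, start, end)
        return cache[key]

    return cached_fetch


def combine_same_frequency(series_by_name):
    """Outer-join series (name -> {date: value}) that share a frequency into one table of rows."""
    dates = sorted(set().union(*(s.keys() for s in series_by_name.values())))
    return [{"date": d, **{name: s.get(d) for name, s in series_by_name.items()}} for d in dates]
```

The original per-series dicts are left untouched, so they remain available separately alongside the combined table.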
This package provides a fast dimensionality reduction method scalable to large numbers of samples. Landmark Multi-Dimensional Scaling (LMDS) is an extension of classical Torgerson MDS, but rather than calculating a complete distance matrix between all pairs of samples, only the distances between a set of landmarks and the samples are calculated.
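A sketch of landmark MDS in the formulation of de Silva and Tenenbaum: classical MDS is run on the landmark-to-landmark block only, and every sample is then placed by distance-based triangulation from its squared distances to the landmarks, so only samples × landmarks distances are ever computed. To stay dependency-free, a small Jacobi eigenvalue routine stands in for a library solver such as `numpy.linalg.eigh`; the function names are hypothetical and this is not the package's code.

```python
import math


def jacobi_eigen(A, max_sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix by Jacobi rotations."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P and V <- V P
                    A[k][p], A[k][q] = c * A[k][p] - s * A[k][q], s * A[k][p] + c * A[k][q]
                    V[k][p], V[k][q] = c * V[k][p] - s * V[k][q], s * V[k][p] + c * V[k][q]
                for k in range(n):  # A <- P^T A
                    A[p][k], A[q][k] = c * A[p][k] - s * A[q][k], s * A[p][k] + c * A[q][k]
    return [A[i][i] for i in range(n)], V


def landmark_mds(points, landmark_idx, dist, k=2):
    """Embed all points in k dimensions using distances to landmarks only."""
    m = len(landmark_idx)
    # Squared distances from every point to every landmark: the only distances computed.
    d2 = [[dist(p, points[j]) ** 2 for j in landmark_idx] for p in points]
    delta = [d2[i] for i in landmark_idx]  # m x m landmark block
    col = [sum(delta[i][j] for i in range(m)) / m for j in range(m)]  # column means (delta_mu)
    row = [sum(r) / m for r in delta]
    tot = sum(col) / m
    # Classical (Torgerson) MDS on the landmarks: double-centre -1/2 * delta.
    B = [[-0.5 * (delta[i][j] - row[i] - col[j] + tot) for j in range(m)] for i in range(m)]
    vals, vecs = jacobi_eigen(B)
    top = sorted(range(m), key=lambda i: vals[i], reverse=True)[:k]
    assert all(vals[i] > 0 for i in top), "need k positive eigenvalues"
    # Triangulation: x = -1/2 * L# (d - delta_mu), where row i of L# is v_i / sqrt(lambda_i).
    return [
        [-0.5 * sum(vecs[j][i] * (d[j] - col[j]) for j in range(m)) / math.sqrt(vals[i]) for i in top]
        for d in d2
    ]
```

For exactly Euclidean data whose landmarks span the same space, the embedding reproduces all pairwise distances (up to rotation and reflection), even though distances between non-landmark samples were never computed.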
This package provides tools to perform model selection alongside estimation under Linear, Logistic, Negative binomial, Quantile, and Skew-Normal regression. Under the spike-and-slab method, a probability is estimated for each possible model, and the posterior mean, credible interval, and standard deviation of the coefficients and parameters are reported under the most probable model.
An implementation of a variety of escalation with overdose control designs introduced by Babb, Rogatko and Zacks (1998) <doi:10.1002/(SICI)1097-0258(19980530)17:10%3C1103::AID-SIM793%3E3.0.CO;2-9>. It calculates the next dose as a clinical trial proceeds and performs simulations to obtain operating characteristics.
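The core of escalation with overdose control is choosing the next dose so that the posterior probability that it exceeds the maximum tolerated dose (MTD) is at most a feasibility bound alpha, that is, at or below the alpha-quantile of the posterior MTD distribution. The sketch assumes posterior MTD samples are already available (for example from MCMC under some dose-toxicity model) and that the trial uses a discrete set of dose levels; `ewoc_next_dose` is a hypothetical name and alpha = 0.25 is only an illustrative default.

```python
import math


def ewoc_next_dose(mtd_samples, dose_levels, alpha=0.25):
    """Highest dose level not exceeding the alpha-quantile of the posterior MTD samples.

    Returns None if even the lowest level exceeds that quantile.
    """
    s = sorted(mtd_samples)
    q = s[max(math.ceil(alpha * len(s)) - 1, 0)]  # lower empirical alpha-quantile
    allowed = [d for d in dose_levels if d <= q]
    return max(allowed) if allowed else None
```

Repeating this after each cohort, with the posterior updated by the observed toxicities, gives the dose sequence; simulating many such trials yields operating characteristics.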
This package contains the core functions associated with Fast Regularized Canonical Correlation Analysis. Please see the following for details: Raul Cruz-Cano, Mei-Ling Ting Lee, Fast regularized canonical correlation analysis, Computational Statistics & Data Analysis, Volume 70, 2014, Pages 88-100, ISSN 0167-9473 <doi:10.1016/j.csda.2013.09.020>.