Feature screening is a powerful tool for processing ultrahigh-dimensional data. It attempts to screen out most irrelevant features in preparation for a more elaborate analysis. Xu and Chen (2014) <doi:10.1080/01621459.2013.879531> proposed an effective screening method, SMLE, which naturally incorporates the joint effects among features in the screening process. This package provides an efficient implementation of SMLE screening for high-dimensional linear, logistic, and Poisson models. The package also provides a function for conducting accurate post-screening feature selection based on an iterative hard-thresholding procedure and a user-specified selection criterion.
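A minimal sketch of the intended workflow, assuming the main exported functions are SMLE() and smle_select() with arguments Y, X, k, and family as in the package documentation:

    library(SMLE)
    set.seed(1)
    X <- matrix(rnorm(200 * 1000), nrow = 200)   # 1000 features, 200 observations
    Y <- X[, 1] - X[, 2] + rnorm(200)            # only the first two features matter
    fit <- SMLE(Y = Y, X = X, k = 20, family = "gaussian")  # screen down to 20 features
    sel <- smle_select(fit, criterion = "ebic")  # post-screening selection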
Facilitates scalable spatiotemporally varying coefficient modelling with Bayesian kernelized tensor regression. The important features of this package are: (a) enabling local temporal and spatial modelling of the relationship between the response variable and covariates; (b) implementing the model described by Lei et al. (2023) <doi:10.48550/arXiv.2109.00046>; (c) using a Bayesian Markov chain Monte Carlo (MCMC) algorithm to sample from the posterior distribution of the model parameters; (d) employing a tensor decomposition to reduce the number of estimated parameters; (e) accelerating tensor operations and enabling graphics processing unit (GPU) acceleration with the 'torch' package.
Different approaches to censored or truncated regression with conditional heteroscedasticity are provided. First, continuous distributions can be used for the (right and/or left censored or truncated) response with separate linear predictors for the mean and variance. Second, cumulative link models for ordinal data (obtained by interval-censoring continuous data) can be employed for heteroscedastic extended logistic regression (HXLR). In the latter type of model, the intercepts depend on the thresholds that define the intervals. Infrastructure for working with censored or truncated normal, logistic, and Student-t distributions (i.e., d/p/q/r functions and 'distributions3' objects) is also provided.
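A brief sketch of the first approach: a Gaussian response left-censored at zero, with the two-part formula separating the linear predictors for the mean (x) and the scale (z):

    library(crch)
    set.seed(1)
    d <- data.frame(x = rnorm(200), z = runif(200))
    d$y <- pmax(0, 1 + 2 * d$x + rnorm(200, sd = exp(0.5 * d$z)))
    fit <- crch(y ~ x | z, data = d, left = 0, dist = "gaussian")
    summary(fit)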
This package provides a set of functions to conduct Conjunctive Analysis of Case Configurations (CACC) as described in Miethe, Hart, and Regoeczi (2008) <doi:10.1007/s10940-008-9044-8>, and to identify and quantify situational clustering in dominant case configurations as described in Hart (2019) <doi:10.1177/0011128719866123>. Initially conceived as an exploratory technique for multivariate analysis of categorical data, CACC has developed to include formal statistical tests that can be applied in a wide variety of contexts. This technique allows the examination of composite profiles of different units of analysis, offering an alternative to variable-oriented methods.
This package provides tools for the analysis of epidemiological and surveillance data. It contains functions for directly and indirectly adjusting measures of disease frequency, quantifying measures of association on the basis of single or multiple strata of count data presented in a contingency table, computing confidence intervals around incidence risk and incidence rate estimates, and calculating sample sizes for cross-sectional, case-control, and cohort studies. Surveillance tools include functions to calculate an appropriate sample size for 1- and 2-stage representative freedom surveys, to estimate surveillance system sensitivity, and to support scenario tree modelling analyses.
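For example, measures of association from a single stratum of count data can be computed with epi.2by2(); the counts below are illustrative only:

    library(epiR)
    dat <- matrix(c(13, 2163, 5, 3349), nrow = 2, byrow = TRUE,
                  dimnames = list(c("Exposed+", "Exposed-"),
                                  c("Disease+", "Disease-")))
    epi.2by2(dat = as.table(dat), method = "cohort.count", conf.level = 0.95)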
Integrates game theory and ecological theory to construct social-ecological models that simulate the management of populations and stakeholder actions. These models build on a previously developed management strategy evaluation (MSE) framework to simulate all aspects of management: population dynamics, manager observation of populations, manager decision making, and stakeholder responses to management decisions. The newly developed generalised management strategy evaluation (GMSE) framework uses genetic algorithms to mimic the decision-making process of managers and stakeholders under conditions of change, uncertainty, and conflict. Simulations can be run using the gmse(), gmse_apply(), and gmse_gui() functions, as sketched below.
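A minimal sketch, assuming the default arguments documented for these functions (time_max and land_ownership are assumed gmse() arguments):

    library(GMSE)
    sim <- gmse(time_max = 20, land_ownership = TRUE)  # full simulation loop
    one <- gmse_apply()                                # a single simulated time step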
Allows users to create high-quality heatmaps from labelled, hierarchical data. Specifically, for data with a two-level hierarchical structure, it will produce a heatmap where each row and column represents a category at the lower level. These rows and columns are then grouped by the higher-level group each category belongs to, with the names of the categories and groups shown in the margins. While other packages (e.g. 'dendextend') allow heatmap rows and columns to be arranged by groups only, 'hhmR' also allows the labelling of the data at both the category and group level.
Joint models have been widely used to study the associations between longitudinal biomarkers and a survival outcome. However, existing joint models only consider one or a few longitudinal biomarkers and cannot deal with high-dimensional longitudinal biomarkers. This package can be used to fit our recently developed penalized joint model that can handle high-dimensional longitudinal biomarkers. Specifically, an adaptive lasso penalty is imposed on the parameters for the effects of the longitudinal biomarkers on the survival outcome, which allows for variable selection. Our algorithm, based on the Gaussian variational approximation method, is also computationally efficient.
Provides estimation for particular cases of the power series cure rate model <doi:10.1080/03610918.2011.639971>. For the distribution of the concurrent causes, the alternative models are the Poisson, logarithmic, negative binomial, and Bernoulli (which are included in the original work), the polylogarithm model <doi:10.1080/00949655.2018.1451850>, and the Flory-Schulz model <doi:10.3390/math10244643>. The estimation procedure is based on the EM algorithm discussed in <doi:10.1080/03610918.2016.1202276>. For the distribution of the time-to-event, the alternative models are the slash half-normal, Weibull, gamma, and Birnbaum-Saunders distributions.
This package implements the data processing described in <doi:10.1126/sciadv.abk3283> to align modern differentially private data with the formatting of older US Census data releases. The primary goal is to read in Census Privacy Protected Microdata File (PPMF) data in a reproducible way. This includes tools for aggregating to relevant levels of geography by creating geographic identifiers which match the US Census Bureau's numbering. Additionally, there are tools for grouping numeric race identifiers into categories, consistent with OMB (Office of Management and Budget) classifications. Functions exist for downloading and linking to existing sources of privacy protected microdata.
This package implements functions for working with absorbing Markov chains. The implementation is based on the framework described in "Toward a unified framework for connectivity that disentangles movement and mortality in space and time" by Fletcher et al. (2019) <doi:10.1111/ele.13333>, which applies them to spatial ecology. This framework incorporates both resistance and absorption with spatial absorbing Markov chains (SAMC) to provide several short-term and long-term predictions for metrics related to connectivity in landscapes. Despite the ecological context of the framework, this package can be used in any application of absorbing Markov chains.
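A small sketch using matrix inputs; the model-specification list follows the package documentation and is an assumption here:

    library(samc)
    res <- matrix(runif(100, 1, 10), nrow = 10)   # resistance surface
    abs <- matrix(0.01, nrow = 10, ncol = 10)     # per-cell absorption probability
    obj <- samc(data = res, absorption = abs,
                model = list(fun = function(x) 1 / mean(x), dir = 8, sym = TRUE))
    surv <- survival(obj)   # expected time to absorption from each cell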
Scale-invariant version of the original PNN proposed by Specht (1990) <doi:10.1016/0893-6080(90)90049-q>, with the added functionality of allowing for smoothing along multiple dimensions while accounting for covariances within the data set. It is written in the R statistical programming language. Given a data set with categorical variables, the algorithm estimates the probability that a new observation vector belongs to a specific category. This type of neural network provides the benefits of fast training time relative to backpropagation and statistical generalization with only a small set of known observations.
Characterization of intra-individual variability using physiologically relevant measurements provides important insights into fundamental biological questions ranging from cell type identity to tumor development. For each individual, the data measurements can be written as a matrix with the different subsamples of the individual recorded in the columns and the different phenotypic units recorded in the rows. Datasets of this type are called high-dimensional transposable data. The HDTD package provides functions for conducting statistical inference for the mean relationship between the row and column variables and for the covariance structure within and between the row and column variables.
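A minimal sketch, assuming the estimation functions meanmat.hat() and covmat.hat() with the data matrix storing the N subjects' column variables side by side:

    library(HDTD)
    set.seed(1)
    N <- 10; r <- 5; p <- 4                        # 10 subjects, 5 x 4 data matrices
    datamat <- matrix(rnorm(r * p * N), nrow = r)  # r rows, p * N columns
    meanmat.hat(datamat, N)                        # estimate the mean matrix
    covmat.hat(datamat, N)                         # row and column covariance estimates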
Format dates and times flexibly and to whichever locales make sense. Parses dates, times, and date-times in various formats (including string-based ISO 8601 constructions). The formatting syntax gives the user many options for formatting the date and time output in a precise manner. Time zones in the input can be expressed in multiple ways and there are many options for formatting time zones in the output as well. Several of the provided helper functions allow for automatic generation of locale-aware formatting patterns based on date/time skeleton formats and standardized date/time formats with varying specificity.
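A short sketch, assuming the main formatting function is fdt() with input, format, and locale arguments:

    library(bigD)
    fdt(input = "2018-07-04T22:05:09", format = "yMMMEd", locale = "fr")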
Schur polynomials appear in combinatorics and zonal polynomials appear in random matrix theory. They are particular cases of Jack polynomials. This package allows one to compute these polynomials and other symmetric multivariate polynomials: flagged Schur polynomials, factorial Schur polynomials, t-Schur polynomials, Hall-Littlewood polynomials, Macdonald polynomials, and modified Macdonald polynomials. In addition, it can compute the Kostka-Jack numbers, the Kostka-Foulkes polynomials, the Kostka-Macdonald polynomials, and the Hall polynomials. Mainly based on Demmel & Koev's paper (2006) <doi:10.1090/S0025-5718-05-01780-1> and Macdonald's book (1995) <doi:10.1093/oso/9780198534891.003.0001>.
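For instance, a sketch computing the Schur and zonal polynomials of the partition (2, 1) in three variables, assuming the SchurPol() and ZonalPol() interfaces:

    library(jack)
    SchurPol(n = 3, lambda = c(2, 1))   # Schur polynomial as a multivariate polynomial object
    ZonalPol(n = 3, lambda = c(2, 1))   # zonal polynomial for the same partition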
Implementation of the Partitioned Local Depth (PaLD) approach, which provides a measure of local depth and of the cohesion of a point to another that (together with a universal threshold for distinguishing strong and weak ties) may be used to reveal local and global structure in data, based on methods described in Berenhaut, Moore, and Melvin (2022) <doi:10.1073/pnas.2003634119>. No extraneous inputs, distributional assumptions, iterative procedures, or optimization criteria are employed. This package includes functions for computing local depths and cohesion, as well as flexible functions for plotting community networks and displays of cohesion against distance.
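A minimal sketch, assuming exported helpers named local_depths(), cohesion_matrix(), and strong_threshold():

    library(pald)
    d <- dist(iris[, 1:4])        # any pairwise distance matrix works
    ld <- local_depths(d)         # local depth of each point
    C <- cohesion_matrix(d)       # pairwise cohesion
    strong_threshold(C)           # universal threshold separating strong and weak ties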
Estimation, scoring, and plotting functions for the semi-parametric factor model proposed by Liu & Wang (2022) <doi:10.1007/s11336-021-09832-8> and Liu & Wang (2023) <arXiv:2303.10079>. Both the conditional densities of observed responses given the latent factors and the joint density of latent factors are estimated non-parametrically. Functional parameters are approximated by smoothing splines, whose coefficients are estimated by penalized maximum likelihood using an expectation-maximization (EM) algorithm. E- and M-steps can be parallelized on multi-thread computing platforms that support 'OpenMP'. Both continuous and unordered categorical response variables are supported.
An implementation of the surrogate approach to residuals and diagnostics for ordinal and general regression models; for details, see Liu and Zhang (2017) <doi:10.1080/01621459.2017.1292915>. These residuals can be used to construct standard residual plots for model diagnostics (e.g., residual-vs-fitted value plots, residual-vs-covariate plots, Q-Q plots, etc.). The package also provides an autoplot function for producing standard diagnostic plots using 'ggplot2' graphics. The package currently supports cumulative link models from packages 'MASS', 'ordinal', 'rms', and 'VGAM'. Support for binary regression models using the standard glm function is also available.
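A brief sketch with a cumulative link model from 'MASS' (the resids() and autoplot() interfaces are taken from the package documentation):

    library(sure)
    library(MASS)
    fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
    r <- resids(fit)             # surrogate residuals
    autoplot(fit, what = "qq")   # ggplot2-based Q-Q diagnostic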
This package provides the necessary functions for performing the Partial Correlation coefficient with Information Theory (PCIT) (Reverter and Chan 2008) and Regulatory Impact Factors (RIF) (Reverter et al. 2010) algorithms. The PCIT algorithm identifies meaningful correlations to define edges in a weighted network and can be applied to any correlation-based network, including but not limited to gene co-expression networks, while the RIF algorithm identifies critical transcription factors (TF) from gene expression data. These two algorithms, when combined, provide a very relevant layer of information for gene expression studies (microarray, RNA-seq, and single-cell RNA-seq data).
Response surface methods for drug synergy analysis. Available methods include generalized and classical Loewe formulations as well as Highest Single Agent methodology. Response surfaces can be plotted in an interactive 3-D plot and formal statistical tests for presence of synergistic effects are available. Implemented methods and tests are described in the article "BIGL: Biochemically Intuitive Generalized Loewe null model for prediction of the expected combined effect compatible with partial agonism and antagonism" by Koen Van der Borght, Annelies Tourny, Rytis Bagdziunas, Olivier Thas, Maxim Nazarov, Heather Turner, Bie Verbist & Hugo Ceulemans (2017) <doi:10.1038/s41598-017-18068-5>.
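A sketch along the lines of the package vignette, assuming the bundled directAntivirals data and the fitMarginals()/fitSurface() workflow:

    library(BIGL)
    data("directAntivirals")
    d <- subset(directAntivirals, experiment == 1)[, c("effect", "d1", "d2")]
    marg <- fitMarginals(d)                 # monotherapy (marginal) fits
    surf <- fitSurface(d, marg, null_model = "loewe", statistic = "meanR")
    summary(surf)                           # formal test for synergy/antagonism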
Bayesian survival model using Weibull regression on both scale and shape parameters. The dependence of the shape parameter on covariates permits deviation from the proportional-hazards assumption, leading to dynamic - i.e. non-constant with time - hazard ratios between subjects. Bayesian lasso shrinkage in the form of two Laplace priors - one for scale and one for shape coefficients - allows for many covariates to be included. Cross-validation helper functions can be used to tune the shrinkage parameters. Markov chain Monte Carlo (MCMC) sampling using a Gibbs wrapper around Radford Neal's univariate slice sampler (R package 'MfUSampler') is used for coefficient estimation.
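A minimal sketch, assuming the model-fitting function bsgw() accepts a survival formula and that the shape regression defaults to the scale formula; the data come from the 'survival' package:

    library(BSGW)
    library(survival)
    fit <- bsgw(Surv(futime, fustat) ~ ecog.ps + rx, data = ovarian)
    summary(fit)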
It aims to find significant pathways through network topology information. It has several advantages compared with current pathway enrichment tools. First, the pathway node, rather than the single gene, is taken as the basic unit when analysing networks, reflecting the fact that genes must be assembled into complexes to carry out normal functions. Second, multiple network centrality measures are applied simultaneously to measure the importance of nodes from different aspects, giving a fuller view of the biological system. 'CePa' extends standard pathway enrichment methods, including both the over-representation analysis procedure and the gene-set analysis procedure. <doi:10.1093/bioinformatics/btt008>.
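A sketch of the over-representation interface, assuming the bundled gene.list and PID.db example objects described in the package documentation:

    library(CePa)
    data(PID.db)     # pathway catalogues
    data(gene.list)  # example differential (dif) and background (bk) gene lists
    res <- cepa.all(dif = gene.list$dif, bk = gene.list$bk, pc = PID.db$NCI)
    plot(res)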
This package provides a collection of several functions related to construction and analysis of incomplete split-plot designs. The package contains functions to obtain and analyze incomplete split-plot designs for three kinds of situations namely (i) when blocks are complete with respect to main plot treatments and main plots are incomplete with respect to subplot treatments, (ii) when blocks are incomplete with respect to main plot treatments and main plots are complete with respect to subplot treatments and (iii) when blocks are incomplete with respect to main plot treatments and main plots are incomplete with respect to subplot treatments.
The popular population genetics software Treemix by Pickrell and Pritchard (2012) <DOI:10.1371/journal.pgen.1002967> estimates the number of migration edges on a population tree. However, it can be difficult to determine the number of migration edges to include. Previously, it was customary to stop adding migration edges when 99.8% of the variation in the data was explained, but 'OptM' automates this process using an ad hoc statistic based on the second-order rate of change in the log-likelihood. 'OptM' also has added functionality for various threshold models to compare with the ad hoc statistic.
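A minimal sketch, assuming optM() and plot_optM() as the main entry points; the folder path is a placeholder for a directory of Treemix output files:

    library(OptM)
    res <- optM(folder = "./treemix_output")  # ad hoc statistic across migration edges
    plot_optM(res, method = "Evanno")         # visualize the supported number of edges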