Decode elements of the Australian Higher Education Information Management System (HEIMS) data for clarity and performance. HEIMS is the system that the Australian Department of Education uses to record enrolments and completions in Australia's higher education system, along with a range of related information. For more information, including the source of the data dictionary, see <http://heimshelp.education.gov.au/sites/heimshelp/dictionary/pages/data-element-dictionary>.
Flexible procedures to compute local density-based outlier scores for ranking outliers. Both exact and approximate nearest neighbor search can be implemented, while also accommodating multiple neighborhood sizes and four different local density-based methods. It allows for referencing a random subsample of the input data or a user specified reference data set to compute outlier scores against, so both unsupervised and semi-supervised outlier detection can be implemented.
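The idea of scoring points against a reference data set can be sketched as follows. This is a minimal illustration in Python rather than the package's own code: it uses the classic local outlier factor (LOF) with exact nearest neighbor search and a single neighborhood size, and the names `lof_scores`, `queries`, and `reference` are hypothetical. It assumes the reference set has more than `k` points and no exact duplicates.

```python
import math


def _knn(point, reference, k, skip_index=None):
    """Return the k nearest reference points as (distance, index) pairs."""
    dists = sorted(
        (math.dist(point, ref), j)
        for j, ref in enumerate(reference)
        if j != skip_index
    )
    return dists[:k]


def lof_scores(queries, reference, k):
    """LOF-style score of each query point relative to a reference set.

    Scores near 1 mean the query sits in a region about as dense as its
    neighbors; larger scores mean it is more isolated.
    """
    # Neighbors and k-distance of every reference point within the reference set.
    ref_nn = [_knn(r, reference, k, skip_index=j) for j, r in enumerate(reference)]
    ref_kdist = [nn[-1][0] for nn in ref_nn]

    def local_reachability_density(neighbors):
        # Reachability distance: max(k-distance of neighbor, actual distance).
        reach = [max(ref_kdist[j], d) for d, j in neighbors]
        return len(reach) / sum(reach)

    ref_lrd = [local_reachability_density(nn) for nn in ref_nn]

    scores = []
    for q in queries:
        nn = _knn(q, reference, k)
        q_lrd = local_reachability_density(nn)
        scores.append(sum(ref_lrd[j] for _, j in nn) / (len(nn) * q_lrd))
    return scores


# A 5 x 5 grid as reference data; one query inside it, one far away.
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
inside, far = lof_scores([(2.5, 2.5), (20.0, 20.0)], grid, k=3)
print(inside, far)
```

Passing labeled "normal" data as `reference` corresponds to the semi-supervised use; passing the input data itself (and skipping each point's own index) would give the unsupervised variant.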
Calculates and differentiates probabilities and density of (conditional) multivariate normal distribution and Gaussian copula (with various marginal distributions) using methods described in A. Genz (2004) <doi:10.1023/B:STCO.0000035304.20635.31>, A. Genz, F. Bretz (2009) <doi:10.1007/978-3-642-01689-9>, H. I. Gassmann (2003) <doi:10.1198/1061860032283> and E. Kossova, B. Potanin (2018) <https://ideas.repec.org/a/ris/apltrx/0346.html>.
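As a small illustration of a conditional normal probability, the bivariate case has a closed form. The sketch below is in Python and is not the package's interface; the function names are hypothetical, and it covers only two dimensions, whereas the cited methods handle the general multivariate case numerically.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def conditional_normal(mu1, mu2, sd1, sd2, rho, x2):
    """Mean and standard deviation of X1 given X2 = x2 (bivariate normal)."""
    cond_mean = mu1 + rho * sd1 / sd2 * (x2 - mu2)
    cond_sd = sd1 * math.sqrt(1.0 - rho ** 2)
    return cond_mean, cond_sd


def conditional_prob(a, mu1, mu2, sd1, sd2, rho, x2):
    """P(X1 <= a | X2 = x2) for a bivariate normal with |rho| < 1."""
    m, s = conditional_normal(mu1, mu2, sd1, sd2, rho, x2)
    return normal_cdf((a - m) / s)


# With rho = 0.5 and x2 = 2, the conditional mean of X1 is 1,
# so the probability of falling below 1 is one half.
print(conditional_prob(1.0, 0.0, 0.0, 1.0, 1.0, 0.5, 2.0))
```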
This package provides methods and tools for deriving spatial summary functions from single-cell imaging data and performing functional data analyses. Functions can be applied to other single-cell technologies such as spatial transcriptomics. Functional regression and functional principal component analysis methods are from the refund package <https://cran.r-project.org/package=refund>, while calculation of the spatial summary functions is from the spatstat package <https://spatstat.org/>.
Test-based Image structural similarity measure and test of independence. This package implements the key functions of two tasks: (1) computing image structural similarity measure PSSIM of Wang, Maldonado and Silwal (2011) <DOI:10.1016/j.csda.2011.04.021>; and (2) test of independence between a response and a covariate in presence of heteroscedastic treatment effects proposed by Wang, Tolos, and Wang (2010) <DOI:10.1002/cjs.10068>.
An implementation of the one-step privacy-protecting method for estimating the overall and site-specific hazard ratios using inverse probability weighted Cox models in distributed data network studies, as proposed by Shu, Yoshida, Fireman, and Toh (2019) <doi:10.1177/0962280219869742>. This method requires sharing only summary-level riskset tables instead of individual-level data. Both the conventional inverse probability weights and the stabilized weights are implemented.
This package provides a novel pseudo-value regression approach for differential co-expression network analysis in expression data, which can incorporate additional clinical variables in the model. It is a direct regression model for differential network analysis and is therefore computationally tractable for most users. The full methodological details can be found in Ahn S et al (2023) <doi:10.1186/s12859-022-05123-w>.
This package provides various styles of function chaining methods: Pipe operator, Pipe object, and pipeline function, each representing a distinct pipeline model yet sharing an almost common set of features: a value can be piped to the first unnamed argument of a function and to the dot symbol in an enclosed expression. The syntax is designed to make the pipeline more readable and friendly to a wide range of operations.
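The "pipe to the first argument" model can be sketched in Python as a rough analogue. The `Pipe` class below is hypothetical and not the package's R interface: a bare callable receives the value as its only argument, a `(function, extra_args...)` tuple receives it as the first argument, and a lambda plays the role of an enclosed expression using the dot symbol.

```python
class Pipe:
    """Wrap a value so it can be chained through functions with `|`."""

    def __init__(self, value):
        self.value = value

    def __or__(self, step):
        # A bare callable: pass the current value as its only argument.
        if callable(step):
            return Pipe(step(self.value))
        # A (function, *extra_args) tuple: value goes in as the first argument.
        func, *args = step
        return Pipe(func(self.value, *args))


result = (
    Pipe(" a,b ")
    | str.strip                       # "a,b"
    | (str.split, ",")                # ["a", "b"]
    | (lambda parts: "-".join(parts)) # "a-b"
).value
print(result)
```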
Support package for the textbook "An Introduction to Quantitative Text Analysis for Linguists: Reproducible Research Using R" (Francom, 2024) <doi:10.4324/9781003393764>. Includes functions to acquire, clean, and analyze text data as well as functions to document and share the results of text analysis. The package is designed to be used in conjunction with the book, but can also be used as a standalone package for text analysis.
This package provides functions for simplified emulation of time series computer model output in model parameter space using Gaussian processes. Stilt can be used more generally for Kriging of spatio-temporal fields. There are functions to predict at new parameter settings, to test the emulator using cross-validation (which includes information on 95% confidence interval empirical coverage), and to produce contour plots over 2D slices in model parameter space.
You can use the functions provided by the package to make various statistical tables, such as baseline data tables. Creates 'Table 1', i.e., a description of the baseline patient characteristics, which is essential in medical research studies. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. This method was described by Mary L McHugh (2013) <doi:10.11613/bm.2013.018>.
Data related to the Salem Witch Trials. Datasets and tutorials documenting the witch accusations and trials centered around Salem, Massachusetts in 1692. Originally assembled by Richard B. Latner of Tulane University for his website <https://www2.tulane.edu/~salem/index.html>. The data sets include information on 152 accused witches, members of the Salem Village Committee, signatories of petitions related to the events, and tax data for Salem Village.
Inferring causation from spatial cross-sectional data through empirical dynamic modeling (EDM), with methodological extensions including geographical convergent cross mapping from Gao et al. (2023) <doi:10.1038/s41467-023-41619-6>, as well as the spatial causality test following the approach of Herrera et al. (2016) <doi:10.1111/pirs.12144>, together with geographical pattern causality proposed in Zhang et al. (2025) <doi:10.1080/13658816.2025.2581207>.
Software that leverages the capabilities of Circos by manipulating data, preparing configuration files, and running the Perl-native Circos directly from the R environment with minimal user intervention. Circos is software that addresses the challenges in visualizing genetic data by creating circular ideograms composed of tracks of heatmaps, scatter plots, line plots, histograms, links between common markers, glyphs, text, and more. Please see <http://www.circos.ca>.
Imbalanced training datasets impede many popular classifiers. To balance training data, a combination of oversampling minority classes and undersampling majority classes is useful. This package implements the SCUT (SMOTE and Cluster-based Undersampling Technique) algorithm as described in Agrawal et al. (2015) <doi:10.5220/0005595502260234>. Their paper uses model-based clustering and synthetic oversampling to balance multiclass training datasets, although other resampling methods are provided in this package.
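The combination of oversampling and undersampling can be sketched as below. This is a simplified Python illustration, not the SCUT algorithm itself: majority classes are undersampled at random rather than by model-based clustering, and synthetic minority points are interpolated between two random members of the class rather than between nearest neighbors as in SMOTE. The function name `balance` and the choice of the mean class size as the target are assumptions.

```python
import random
from collections import Counter, defaultdict


def balance(points, labels, seed=0):
    """Resample so that every class ends up with the mean class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for p, y in zip(points, labels):
        by_class[y].append(tuple(p))
    target = round(len(points) / len(by_class))

    out_points, out_labels = [], []
    for y, members in by_class.items():
        if len(members) > target:
            # Majority class: keep a random subset (stand-in for cluster-based undersampling).
            chosen = rng.sample(members, target)
        else:
            # Minority class: add synthetic points on segments between class members.
            chosen = list(members)
            while len(chosen) < target:
                a, b = rng.choice(members), rng.choice(members)
                t = rng.random()
                chosen.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
        out_points.extend(chosen)
        out_labels.extend([y] * target)
    return out_points, out_labels


# Ten points of class "a" and two of class "b" become six of each.
pts = [(float(i), 0.0) for i in range(10)] + [(0.0, 5.0), (1.0, 6.0)]
labs = ["a"] * 10 + ["b"] * 2
new_pts, new_labs = balance(pts, labs)
print(Counter(new_labs))
```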
This package provides a simple interface to recursively list files from a directory, filter them using a regular expression, read their contents, and extract lines that match a user-defined pattern. The package returns a dataframe containing the matched lines, their line numbers, file paths, and the corresponding matched substrings. Designed for quick code base exploration, log inspection, or any use case involving pattern-based file and line filtering.
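The workflow described above (list files recursively, filter by name, scan lines for a pattern) can be sketched in Python using only the standard library. The function name `grep_files` and the row layout are hypothetical and are not the package's interface; rows are returned as a list of dictionaries instead of a data frame.

```python
import os
import re


def grep_files(root, file_regex, line_regex):
    """Collect lines matching line_regex from files whose names match file_regex."""
    file_pat = re.compile(file_regex)
    line_pat = re.compile(line_regex)
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not file_pat.search(name):
                continue
            path = os.path.join(dirpath, name)
            # Undecodable bytes are replaced so that odd files do not abort the scan.
            with open(path, encoding="utf-8", errors="replace") as fh:
                for number, line in enumerate(fh, start=1):
                    match = line_pat.search(line)
                    if match:
                        rows.append({
                            "path": path,
                            "line_number": number,
                            "line": line.rstrip("\n"),
                            "match": match.group(0),
                        })
    return rows
```

For example, `grep_files(".", r"\.py$", r"TODO")` would list every line containing `TODO` in Python files under the current directory.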
Spatial transcriptomics iterative hierarchical clustering ('stIHC') is a method for identifying spatial gene co-expression modules, defined as groups of genes with shared spatial expression patterns. The method is applicable across spatial transcriptomics technologies with differing spatial resolution, and provides a framework for investigating the spatial organisation of gene expression in tissues. For further details, see Higgins C., Li J.J., Carey M. <doi:10.1002/qub2.70011>.
Computes the solution path of the Terminating-LARS (T-LARS) algorithm. The T-LARS algorithm is a major building block of the T-Rex selector (see R package 'TRexSelector'). The package is based on the papers Machkour, Muma, and Palomar (2022) <arXiv:2110.06048>, Efron, Hastie, Johnstone, and Tibshirani (2004) <doi:10.1214/009053604000000067>, and Tibshirani (1996) <doi:10.1111/j.2517-6161.1996.tb02080.x>.
Finding the best artificial neural network (ANN) structure for time series analysis is a common need. This package finds the best-fitted ANN model based on forecasting accuracy. The optimum size of the hidden layer is determined after the number of lags to be included has been chosen. This package has been developed using the algorithm of Paul and Garai (2021) <doi:10.1007/s00500-021-06087-4>.
This package provides a suite of psychometric analysis tools for research and operation, including: (1) computation of probability, information, and likelihood for the 3PL, GPCM, and GRM; (2) parameter estimation using joint or marginal likelihood estimation method; (3) simulation of computerized adaptive testing using built-in or customized algorithms; (4) assembly and simulation of multistage testing. The full documentation and tutorials are at <https://github.com/xluo11/xxIRT>.
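The probability and information computations in item (1) can be illustrated for the 3PL model. The Python sketch below is not the package's interface; the function names are hypothetical, and the scaling constant `D = 1.702` is an assumed convention that may differ from the package's default.

```python
import math


def p_3pl(theta, a, b, c, D=1.702):
    """Probability of a correct response under the 3PL model.

    theta: ability; a: discrimination; b: difficulty; c: guessing parameter.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def info_3pl(theta, a, b, c, D=1.702):
    """Fisher information of a 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c, D)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


# At theta == b the logistic part equals 0.5, so P = c + (1 - c) / 2.
print(p_3pl(0.0, a=1.0, b=0.0, c=0.2))
```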
Gene-environment (G×E) interactions have important implications for elucidating the etiology of complex diseases beyond the main genetic and environmental effects. Outliers and data contamination in disease phenotypes of G×E studies have been commonly encountered, leading to the development of a broad spectrum of robust penalization methods. Nevertheless, within the Bayesian framework, the issue has not been addressed in existing studies. We develop a robust Bayesian variable selection method for G×E interaction studies. The proposed Bayesian method can effectively accommodate heavy-tailed errors and outliers in the response variable while conducting variable selection by accounting for structural sparsity. In particular, the spike-and-slab priors have been imposed on both individual and group levels to identify important main and interaction effects. An efficient Gibbs sampler has been developed to facilitate fast computation. The Markov chain Monte Carlo algorithms of the proposed and alternative methods are efficiently implemented in C++.
tidyr is a reframing of the reshape2 package designed to accompany the tidy data framework, and to work hand-in-hand with magrittr and dplyr to build a solid pipeline for data analysis. It is designed specifically for tidying data, not the general reshaping that reshape2 does, or the general aggregation that reshape did. In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.
SPAMS (SPArse Modeling Software) is an optimization toolbox for solving various sparse estimation problems. It includes tools for the following problems:
Dictionary learning and matrix factorization (NMF, sparse principal component analysis (PCA), ...)
Solving sparse decomposition problems with LARS, coordinate descent, OMP, SOMP, proximal methods
Solving structured sparse decomposition problems (l1/l2, l1/linf, sparse group lasso, tree-structured regularization, structured sparsity with overlapping groups,...).
Features tools for exploring congruent phylogenetic birth-death models. It can construct the pulled speciation and net-diversification rates from a reference model. Given alternative speciation or extinction rates, it can construct new models that are congruent with the reference model. Functionality is included to sample new rate functions, and to visualize the distribution of one congruence class. See also Louca & Pennell (2020) <doi:10.1038/s41586-020-2176-1>.