This package provides a collection of functions for Kronecker structured covariance estimation and testing under the array normal model. For estimation, maximum likelihood and Bayesian equivariant estimation procedures are implemented. For testing, a likelihood ratio testing procedure is available. This package also contains additional functions for manipulating and decomposing tensor data sets. This work was partially supported by NSF grant DMS-1505136. Details of the methods are described in Gerard and Hoff (2015) <doi:10.1016/j.jmva.2015.01.020> and Gerard and Hoff (2016) <doi:10.1016/j.laa.2016.04.033>.
This package provides a tool that allows users to estimate tree height in the long-term forest experiments in Sweden. It utilizes the multilevel nonlinear mixed-effect height models developed for the forest experiments and consists of four functions for the main species, other conifer species, and other broadleaves. Each function within the system returns a data frame that includes the input data and the estimated heights for any missing values. For details, see Ogana et al. (2023) <doi:10.1016/j.foreco.2023.120843> and Arias-Rodil et al. (2015) <doi:10.1371/JOURNAL.PONE.0143521>.
The sqldf function is typically passed a single argument which is an SQL select statement where the table names are ordinary R data frame names. sqldf transparently sets up a database, imports the data frames into that database, performs the SQL statement and returns the result using a heuristic to determine which class to assign to each column of the returned data frame. The sqldf or read.csv.sql functions can also be used to read filtered files into R even if the original files are larger than R itself can handle.
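The mechanism described above (set up a temporary database, import the tables, run the SQL statement, return the result) can be sketched in Python with the standard sqlite3 module. This is only an illustration of the idea, not the sqldf implementation; the function name run_sql_on_tables and the list-of-dicts table representation are assumptions made for this example.

```python
import sqlite3


def run_sql_on_tables(tables, sql):
    """Run an SQL SELECT against in-memory tables (hypothetical helper).

    `tables` maps a table name to a non-empty list of row dicts that all
    share the same keys. Returns the result rows as a list of dicts.
    """
    conn = sqlite3.connect(":memory:")  # throwaway database, discarded on close
    try:
        for name, rows in tables.items():
            columns = list(rows[0].keys())
            col_list = ", ".join(f'"{c}"' for c in columns)
            placeholders = ", ".join("?" for _ in columns)
            # Columns are left untyped; SQLite keeps each value's own type.
            conn.execute(f'CREATE TABLE "{name}" ({col_list})')
            conn.executemany(
                f'INSERT INTO "{name}" VALUES ({placeholders})',
                [tuple(row[c] for c in columns) for row in rows],
            )
        cursor = conn.execute(sql)
        names = [d[0] for d in cursor.description]
        return [dict(zip(names, r)) for r in cursor.fetchall()]
    finally:
        conn.close()


trees = [
    {"species": "pine", "height": 12.5},
    {"species": "birch", "height": 8.0},
    {"species": "pine", "height": 15.0},
]
result = run_sql_on_tables(
    {"trees": trees},
    "SELECT species, AVG(height) AS mean_height "
    "FROM trees GROUP BY species ORDER BY species",
)
```

Unlike sqldf, this sketch does not try to infer a class for each returned column; it returns whatever types SQLite produces.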
The Biomarker Optimal Segmentation System R package, bossR, is designed for precision medicine, helping to identify individual traits using biomarkers. It focuses on determining the most effective cutoff value for a continuous biomarker, which is crucial for categorizing patients into two groups with distinctly different clinical outcomes. The package simultaneously finds the optimal cutoff from given candidate values and tests its significance. Simulation studies demonstrate that bossR offers statistical power and false positive control non-inferior to the permutation approach (considered the gold standard in this field), while being hundreds of times faster.
Inference functionalities for distributed-lag linear structural equation models (DLSEMs). DLSEMs are Markovian structural causal models where each factor of the joint probability distribution is a distributed-lag linear regression with constrained lag shapes (Magrini, 2018 <doi:10.2478/bile-2018-0012>; Magrini et al., 2019 <doi:10.1007/s11135-019-00855-z>). DLSEMs account for temporal delays in the dependence relationships among the variables through a single parameter per covariate, which makes dynamic causal inference feasible. Endpoint-constrained quadratic, quadratic decreasing, linearly decreasing and gamma lag shapes are available.
Simultaneously detect the number and locations of change points in piecewise linear models under stationary Gaussian noise, allowing for autocorrelated random noise. The core idea is to transform the problem of detecting change points into the detection of local extrema (local maxima and local minima) through kernel smoothing and differentiation of the data sequence; see Cheng et al. (2020) <doi:10.1214/20-EJS1751>. A fast, computationally inexpensive algorithm called dSTEM is introduced to detect change points, based on the STEM algorithm in D. Cheng and A. Schwartzman (2017) <doi:10.1214/16-AOS1458>.
With the functions in this package you can check the validity of the following financial instrument identifiers: FIGI (Financial Instrument Global Identifier <https://www.openfigi.com/about/figi>), CUSIP (Committee on Uniform Security Identification Procedures <https://www.cusip.com/identifiers.html#/CUSIP>), ISIN (International Securities Identification Number <https://www.cusip.com/identifiers.html#/ISIN>), SEDOL (Stock Exchange Daily Official List <https://www2.lseg.com/SEDOL-masterfile-service-tech-guide-v8.6>). You can also calculate the FIGI checksum of 11-character strings, which can be useful if you want to create your own FIGI identifiers.
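As an illustration of what such a validity check involves, the following Python sketch validates an ISIN: two letters, nine alphanumeric characters and one check digit, with letters expanded to numbers (A=10 through Z=35) before applying the Luhn algorithm. The function names are hypothetical and this is not the package's own code; the other identifiers (FIGI, CUSIP, SEDOL) use different checksum schemes that are not shown here.

```python
import re

# Structural pattern: country code, 9-character national identifier, check digit.
_ISIN_PATTERN = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")


def luhn_valid(digits: str) -> bool:
    """Return True if a string of decimal digits passes the Luhn check."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9  # same as summing the two digits of the product
        total += value
    return total % 10 == 0


def is_valid_isin(code: str) -> bool:
    """Check the structure and check digit of an ISIN (hypothetical helper)."""
    if not _ISIN_PATTERN.fullmatch(code):
        return False
    # Expand each character to its base-36 value: digits stay, A=10 ... Z=35.
    expanded = "".join(str(int(ch, 36)) for ch in code)
    return luhn_valid(expanded)


apple_ok = is_valid_isin("US0378331005")
apple_bad = is_valid_isin("US0378331006")  # last digit altered
```

The sketch accepts uppercase input only; whether to normalize case or strip whitespace first is a design choice left to the caller.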
Fit penalized multivariable linear mixed models with a single random effect to control for population structure in genetic association studies. The goal is to fit many genetic variants simultaneously in order to select markers that are independently associated with the response. Can also handle prior annotation information, for example, rare variants, in the form of variable weights. For more information, see the website below and the accompanying paper: Bhatnagar et al., "Simultaneous SNP selection and adjustment for population structure in high dimensional prediction models", 2020, <doi:10.1371/journal.pgen.1008766>.
This is the central location for data and tools for the development, maintenance, analysis, and deployment of the International Soil Radiocarbon Database (ISRaD). ISRaD was developed as a collaboration between the U.S. Geological Survey Powell Center and the Max Planck Institute for Biogeochemistry. This R package provides tools for accessing and manipulating ISRaD data, compiling local data using the ISRaD data structure, and simple query and reporting functions for ISRaD. For more detailed information visit the ISRaD website at: <https://soilradiocarbon.org/>.
Reproduces the harmonized DB of the ESTAT survey of the same name. The survey data is served as separate spreadsheets with noticeable differences in the collected attributes. The tool presented here carries out a series of instructions that harmonize the attributes in terms of name, meaning, and occurrence, while also introducing a series of new variables that add value to the product. Outputs include one harmonized table with all the years, and three separate geometries, corresponding to the theoretical point, the GPS location where the measurement was made, and the 250 m east-facing transect.
Implementation of custom tidymodels metrics for multi-class prediction models with a single negative class. Currently implemented are macro-average sensitivity and specificity as in Mortaz, Ebrahim (2020) "Imbalance accuracy metric for model selection in multi-class imbalance classification problems" <doi:10.1016/j.knosys.2020.106490> and a generalized weighted Youden index as in Li, D.L., Shen F., Yin Y., Peng J.X and Chen P.Y. (2013) "Weighted Youden index and its two-independent-sample comparison based on weighted sensitivity and specificity" <doi:10.3760/cma.j.issn.0366-6999.20123102>.
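For readers unfamiliar with macro averaging, the Python sketch below computes sensitivity and specificity per class in a one-vs-rest fashion and then averages them with equal weight per class. This is the textbook definition and is an assumption here: the package's treatment of the single negative class and the exact formulas in the cited papers may differ. The function name is hypothetical.

```python
def macro_sensitivity_specificity(y_true, y_pred):
    """Return (macro sensitivity, macro specificity) using one-vs-rest counts."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    sensitivities, specificities = [], []
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        tn = sum(t != cls and p != cls for t, p in pairs)
        # Skip a class when its rate is undefined (no positives or no negatives).
        if tp + fn > 0:
            sensitivities.append(tp / (tp + fn))
        if tn + fp > 0:
            specificities.append(tn / (tn + fp))

    macro_sens = sum(sensitivities) / len(sensitivities)
    macro_spec = sum(specificities) / len(specificities)
    return macro_sens, macro_spec


truth = ["a", "a", "b", "b", "c", "c"]
preds = ["a", "b", "b", "b", "c", "a"]
sens, spec = macro_sensitivity_specificity(truth, preds)
```

Because every class contributes equally to the average, a rare class with poor recall lowers the macro sensitivity as much as a common one would, which is the motivation for using macro averages on imbalanced data.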
To assist biological researchers in assembling taxonomically and marker focused molecular sequence data sets. MACER accepts a list of genera as a user input and uses NCBI-GenBank and BOLD as resources to download and assemble molecular sequence datasets. These datasets are then assembled by marker, aligned, trimmed, and cleaned. The use of this package allows the publication of specific parameters to ensure reproducibility. The MACER package has four core functions, and an example run-through using all of these functions can be found in the associated repository <https://github.com/rgyoung6/MACER_example>.
This package implements the Multi-view Aggregated Two-Sample (MATES) test, a powerful nonparametric method for testing equality of two multivariate distributions. The method constructs multiple graph-based statistics from various perspectives (views) including different distance metrics, graph types (nearest neighbor graphs, minimum spanning trees, and robust nearest neighbor graphs), and weighting schemes. These statistics are then aggregated through a quadratic form to achieve improved statistical power. The package provides both asymptotic closed-form inference and permutation-based testing procedures. For methodological details, see Cai and others (2026+) <doi:10.48550/arXiv.2412.16684>.
This package provides an implementation of a rare variant association test that utilizes protein tertiary structure to increase signal and to identify likely causal variants. Performs structure-guided collapsing, which leads to local tests that borrow information from neighboring variants on a protein and that provide association information on a variant-specific level. For details of the implemented method see West, R. M., Lu, W., Rotroff, D. M., Kuenemann, M., Chang, S-M., Wagner M. J., Buse, J. B., Motsinger-Reif, A., Fourches, D., and Tzeng, J-Y. (2019) <doi:10.1371/journal.pcbi.1006722>.
An algorithm for nonlinear global optimization based on the variable neighbourhood trust region search (VNTRS) algorithm proposed by Bierlaire et al. (2009) "A Heuristic for Nonlinear Global Optimization" <doi:10.1287/ijoc.1090.0343>. The algorithm combines variable neighbourhood exploration with a trust-region framework to efficiently search the solution space. It can terminate a local search early if the iterates are converging toward a previously visited local optimum or if further improvement within the current region is unlikely. In addition to global optimization, the algorithm can also be applied to identify multiple local optima.
Radare2 is a complete framework for reverse-engineering, debugging, and analyzing binaries. It is composed of a set of small utilities that can be used together or independently from the command line.
Radare2 is built around a scriptable disassembler and hexadecimal editor that support a variety of executable formats for different processors and operating systems, through multiple back ends for local and remote files and disk images.
It can also compare (diff) binaries with graphs and extract information like relocation symbols. It is able to deal with malformed binaries, making it suitable for security research and analysis.
DMCFB is a pipeline for identifying differentially methylated cytosines using a Bayesian functional regression model in bisulfite sequencing data. By using a functional regression data model, it tries to capture position-specific, group-specific and other covariate-specific methylation patterns as well as spatial correlation patterns and unknown underlying models of methylation data. It is robust and flexible with respect to the true underlying models and inclusion of any covariates, and the missing values are imputed using spatial correlation between positions and samples. A Bayesian approach is adopted for estimation and inference in the proposed method.
Single-cell RNA-seq technologies enable high throughput gene expression measurement of individual cells, and allow the discovery of heterogeneity within cell populations. Measurement of cell-to-cell gene expression similarity is critical for the identification, visualization and analysis of cell populations. However, single-cell data introduce challenges to conventional measures of gene expression similarity because of the high level of noise, outliers and dropouts. We develop a novel similarity-learning framework, SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), which learns an appropriate distance metric from the data for dimension reduction, clustering and visualization.
Combining a generalized linear model with an additional tree part on the same scale. A four-step procedure is proposed to fit the model and test the joint effect of the selected tree part while adjusting for confounding factors. We also propose an ensemble procedure based on bagging to improve prediction accuracy and compute several importance scores for variable selection. See Cyprien Mbogning et al. (2014) <doi:10.1186/2043-9113-4-6> and Cyprien Mbogning et al. (2015) <doi:10.1159/000380850> for an overview of all the methods implemented in this package.
This package provides a utility to quickly obtain clean and tidy men's basketball play by play data. Provides functions to access live play by play and box score data from ESPN <https://www.espn.com> with shot locations when available. It is also a full NBA Stats API <https://www.nba.com/stats/> wrapper. It is also a scraping and aggregating interface for Ken Pomeroy's men's college basketball statistics website <https://kenpom.com>. It provides users with an active subscription the capability to scrape the website tables and analyze the data for themselves.
Combines Latent Dirichlet Allocation (LDA) and Bayesian multinomial time series methods in a two-stage analysis to quantify dynamics in high-dimensional temporal data. LDA decomposes multivariate data into lower-dimension latent groupings, whose relative proportions are modeled using generalized Bayesian time series models that include abrupt changepoints and smooth dynamics. The methods are described in Blei et al. (2003) <doi:10.1162/jmlr.2003.3.4-5.993>, Western and Kleykamp (2004) <doi:10.1093/pan/mph023>, Venables and Ripley (2002, ISBN-13:978-0387954578), and Christensen et al. (2018) <doi:10.1002/ecy.2373>.
Create and manipulate numeric list ('nlist') objects. An nlist is an S3 list of uniquely named numeric objects. A numeric object is an integer or double vector, matrix or array. An nlists object is an S3 class list of nlist objects with the same names, dimensionalities and typeofs. Numeric list objects are of interest because they are the raw data inputs for analytic engines such as JAGS, STAN and TMB. Numeric list objects, which are useful for storing multiple realizations of simulated data sets, can be converted to coda::mcmc and coda::mcmc.list objects.
Single cell Higher Order Testing (scHOT) is an R package that facilitates testing changes in higher order structure of gene expression along either a developmental trajectory or across space. scHOT is general and modular in nature: it can be run in multiple data contexts, such as along a continuous trajectory, between discrete groups, and over spatial orientations, and it can accommodate any higher order measurement such as variability or correlation. scHOT meaningfully adds to first order effect testing, such as differential expression, and provides a framework for interrogating higher order interactions from single cell data.
This package provides functions to specify, fit and visualize nested partially-latent class models (Wu, Deloria-Knoll, Hammitt, and Zeger (2016) <doi:10.1111/rssc.12101>; Wu, Deloria-Knoll, and Zeger (2017) <doi:10.1093/biostatistics/kxw037>; Wu and Chen (2021) <doi:10.1002/sim.8804>) for inference of population disease etiology and individual diagnosis. In the motivating Pneumonia Etiology Research for Child Health (PERCH) study, because both quantities of interest sum to one hundred percent, the PERCH scientists frequently refer to them as population etiology pie and individual etiology pie, hence the name of the package.