Estimate average treatment effects (ATEs) in stratified randomized experiments. sreg is designed to accommodate scenarios with multiple treatments and cluster-level treatment assignments, and accommodates optimal linear covariate adjustment based on baseline observable characteristics. sreg computes estimators and standard errors based on Bugni, Canay, Shaikh (2018) <doi:10.1080/01621459.2017.1375934>; Bugni, Canay, Shaikh, Tabord-Meehan (2024+) <doi:10.48550/arXiv.2204.08356>; and Jiang, Linton, Tang, Zhang (2023+) <doi:10.48550/arXiv.2201.13004>.
These are tools that allow users to do time series diagnostics, primarily tests of unit root, by way of simulation. While there is nothing necessarily wrong with the received wisdom of critical values generated decades ago, simulation provides its own perks. Not only is simulation broadly informative as to what these various test statistics do and what their plausible values are, it also provides more flexibility for assessing unit roots by way of different thresholds or different hypothesized distributions.
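As a rough illustration of the simulation idea, the sketch below generates pure random walks, computes a Dickey-Fuller style t-statistic for each, and reads critical values off the simulated distribution. It is not this package's code: the function names, the no-constant regression, the Gaussian innovations, and the default sample sizes are all assumptions made for the example.

```python
import random

def dickey_fuller_tau(y):
    """t-statistic on rho in diff(y)_t = rho * y_{t-1} + e_t (no constant, no trend)."""
    lagged = y[:-1]
    diffs = [b - a for a, b in zip(y[:-1], y[1:])]
    sxx = sum(x * x for x in lagged)
    rho = sum(x * d for x, d in zip(lagged, diffs)) / sxx
    resid = [d - rho * x for x, d in zip(lagged, diffs)]
    sigma2 = sum(r * r for r in resid) / (len(diffs) - 1)
    return rho / (sigma2 / sxx) ** 0.5

def simulate_critical_values(n_obs=100, n_sims=2000, probs=(0.01, 0.05, 0.10), seed=1):
    """Simulate the statistic under the unit-root null and return lower-tail quantiles."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_sims):
        y = [0.0]
        for _ in range(n_obs):
            y.append(y[-1] + rng.gauss(0.0, 1.0))  # random walk: a unit root by construction
        stats.append(dickey_fuller_tau(y))
    stats.sort()
    return {p: stats[int(p * n_sims)] for p in probs}

crit = simulate_critical_values()
```

Changing the innovation distribution or the quantile levels in `probs` is where the flexibility described above would come in.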
Capable of deriving seasonal statistics, such as "normals", and analysis of seasonal data, such as departures. This package also has graphics capabilities for representing seasonal data, including boxplots for seasonal parameters, and bars for summed normals. There are many specific functions related to climatology, including precipitation normals, temperature normals, cumulative precipitation departures and precipitation interarrivals. However, this package is designed to represent any time-varying parameter with a discernible seasonal signal, such as found in hydrology and ecology.
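To make the terms concrete, a "normal" can be read as the long-run mean for a given season, and a "departure" as an observation minus its normal. The sketch below uses monthly seasons and made-up precipitation values; the function names and the record layout are hypothetical and are not taken from the package.

```python
from collections import defaultdict

def monthly_normals(records):
    """records: iterable of (year, month, value); returns {month: mean across years}."""
    by_month = defaultdict(list)
    for _year, month, value in records:
        by_month[month].append(value)
    return {m: sum(v) / len(v) for m, v in by_month.items()}

def departures(records, normals):
    """Observation minus the normal for its month."""
    return [(y, m, v - normals[m]) for y, m, v in records]

# Illustrative (invented) monthly precipitation totals in millimetres.
records = [(2001, 1, 50.0), (2002, 1, 70.0), (2001, 7, 10.0), (2002, 7, 20.0)]
normals = monthly_normals(records)
deps = departures(records, normals)
```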
Testing SNPs and SNP interactions with a genotypic TDT. This package furthermore contains functions for computing pairwise values of LD measures and for identifying LD blocks, as well as functions for setting up matched case pseudo-control genotype data for case-parent trios in order to run trio logic regression, for imputing missing genotypes in trios, for simulating case-parent trios with disease risk dependent on SNP interaction, and for power and sample size calculation in trio data.
Compute expected shortfall (ES) and Value at Risk (VaR) from a quantile function, distribution function, random number generator or probability density function. ES is also known as Conditional Value at Risk (CVaR). Virtually any continuous distribution can be specified. The functions are vectorized over the arguments. The computations are done directly from the definitions, see e.g. Acerbi and Tasche (2002) <doi:10.1111/1468-0300.00091>. Some support for GARCH models is provided, as well.
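A minimal sketch of computing both quantities "from the definitions", given only a quantile function. It assumes the sign convention in which losses are negative outcomes, so that VaR at level p is minus the p-quantile and ES is minus the average of the quantile function over (0, p). The function names and the midpoint-rule integration are illustrative choices, not the package's interface.

```python
from statistics import NormalDist

def value_at_risk(qf, p):
    """VaR_p = -q(p), where qf is the quantile function of the profit/loss variable."""
    return -qf(p)

def expected_shortfall(qf, p, n=10000):
    """ES_p = -(1/p) * integral of q(u) over (0, p), approximated by the midpoint rule."""
    h = p / n
    integral = sum(qf((i + 0.5) * h) for i in range(n)) * h
    return -integral / p

# Standard normal profit/loss as an example distribution.
qf = NormalDist(0.0, 1.0).inv_cdf
var_5 = value_at_risk(qf, 0.05)
es_5 = expected_shortfall(qf, 0.05)
```

The midpoint rule never evaluates the quantile function at 0, which matters because many quantile functions are unbounded there.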
This package provides a uniform statistical inferential tool for making individualized treatment decisions, which implements the methods of Ma et al. (2017) <DOI:10.1177/0962280214541724> and Guo et al. (2021) <DOI:10.1080/01621459.2020.1865167>. It uses a flexible semiparametric modeling strategy for heterogeneous treatment effect estimation in high-dimensional settings and can give valid confidence bands. Based on it, one can find the subgroups of patients that benefit from each treatment, thereby making individualized treatment selection.
Measuring cellular energetics is essential to understanding a matrix's (e.g. cell, tissue or biofluid) metabolic state. The Agilent Seahorse machine is a common method to measure real-time cellular energetics, but existing analysis tools are highly manual or lack functionality. The Cellular Energetics Analysis Software (ceas) R package fills this analytical gap by providing modular and automated Seahorse data analysis and visualization using the methods described by Mookerjee et al. (2017) <doi:10.1074/jbc.m116.774471>.
Basic time series functionalities such as listing of missing values, application of arbitrary aggregation as well as rolling (asymmetric) window functions and automatic detection of periodicity. As it is mainly based on 'data.table', it is fast and (in combination with the R6 package) offers reference semantics. In addition to its native R6 interface, it provides an S3 interface for those who prefer the latter. Finally yet importantly, its functional approach allows for incorporating functionalities from many other packages.
R interface for the Google Cloud Services Document AI API <https://cloud.google.com/document-ai/> with additional tools for output file parsing and text reconstruction. Document AI is a powerful server-based OCR service that extracts text and tables from images and PDF files with high accuracy. daiR gives R users programmatic access to this service and additional tools to handle and visualize the output. See the package website <https://dair.info/> for more information and examples.
DNA methylation is essential for humans, and the environment can change DNA methylation and affect body status. Epigenome-Wide Mediation Analysis Study (EMAS) can find potential mediator CpG sites between exposure (x) and outcome (y) at the epigenome-wide scale. For more information on the methods we used, please see the following references: Tingley, D. (2014) <doi:10.18637/jss.v059.i05>, Turner, S. D. (2018) <doi:10.21105/joss.00731>, Rosseel, Y. (2012) <doi:10.18637/jss.v048.i02>.
The R package provides extreme value index estimators for heavy-tailed models based on the mean of order p <DOI:10.1016/j.csda.2012.07.019>, peaks over random threshold <DOI:10.57805/revstat.v4i3.37> and a bias-reduced estimator <DOI:10.1080/00949655.2010.547196>. The package also computes moment, generalised Hill <DOI:10.2307/3318416> and mixed moment estimates for the extreme value index. High quantiles and value at risk estimators based on these estimators are implemented.
The Fill-Mask Association Test ('FMAT') <doi:10.1037/pspa0000396> is an integrative and probability-based method using Masked Language Models to measure conceptual associations (e.g., attitudes, biases, stereotypes, social norms, cultural values) as propositions in natural language. Supported language models include BERT <doi:10.48550/arXiv.1810.04805> and its variants available at Hugging Face <https://huggingface.co/models?pipeline_tag=fill-mask>. Methodological references and installation guidance are provided at <https://psychbruce.github.io/FMAT/>.
Inference of chromosome-length haplotypes using a few haploid gametes of an individual. The gamete genotype data may be generated from various platforms including genotyping arrays and sequencing, even with low coverage. Hapi simply takes genotype data of known hetSNPs in single gamete cells as input and reports the high-resolution haplotypes as well as the confidence of each phased hetSNP. The package also includes a module allowing downstream analyses and visualization of identified crossovers in the gametes.
Generalized low-rank models for mixed and incomplete data frames. The main function may be used for dimensionality reduction or imputation of numeric, binary and count data (simultaneously). Main effects such as column means, group effects, or effects of row-column side information (e.g. user/item attributes in recommendation systems) may also be modelled in addition to the low-rank model. Geneviève Robin, Olga Klopp, Julie Josse, Éric Moulines, Robert Tibshirani (2018) <arXiv:1806.09734>.
Modified functions of the package pcalg and some additional functions to run the PC and the FCI (Fast Causal Inference) algorithm for constraint-based causal discovery in incomplete and multiply imputed datasets. Foraita R, Friemel J, Günther K, Behrens T, Bullerdiek J, Nimzyk R, Ahrens W, Didelez V (2020) <doi:10.1111/rssa.12565>; Andrews RM, Foraita R, Didelez V, Witte J (2021) <arXiv:2108.13395>; Witte J, Foraita R, Didelez V (2022) <doi:10.1002/sim.9535>.
This package provides tools for predicting ICU length of stay and assessing ICU efficiency. It is based on the methodologies proposed by Peres et al. (2022, 2023), which utilize data-driven approaches for modeling and validation, offering insights into ICU performance and patient outcomes. References: Peres et al. (2022)<https://pubmed.ncbi.nlm.nih.gov/35988701/>, Peres et al. (2023)<https://pubmed.ncbi.nlm.nih.gov/37922007/>. More information: <https://github.com/igor-peres/ICU-Length-of-Stay-Prediction>.
This package provides a SAS interface, through 'SASPy' (<https://sassoftware.github.io/saspy/>) and 'reticulate' (<https://rstudio.github.io/reticulate/>). This package helps you create SAS sessions, execute SAS code in remote SAS servers, retrieve execution results and log, and exchange datasets between SAS and R. It also helps you to install SASPy and create a configuration file for the connection. Please review the SASPy license file as instructed so that you comply with its separate and independent license.
This package provides awst (Asymmetric Within-Sample Transformation), which regularizes RNA-seq read counts and reduces the effect of noise on the classification of samples. AWST comprises two main steps: standardization and smoothing. These steps transform gene expression data to reduce the noise of the lowly expressed features, which suffer from background effects and low signal-to-noise ratio, and the influence of the highly expressed features, which may be the result of amplification bias and other experimental artifacts.
Query the four endpoints of the Air and Water Database (AWDB) REST API maintained by the National Water and Climate Center (NWCC) at the United States Department of Agriculture (USDA). Endpoints include data, forecast, reference-data, and metadata. The package is extremely lightweight, with Rust via extendr doing most of the heavy lifting to deserialize and flatten deeply nested JSON responses. The AWDB can be found at <https://wcc.sc.egov.usda.gov/awdbRestApi/swagger-ui/index.html>.
Agreement of continuously scaled measurements made by two techniques, devices or methods is usually evaluated by the well-established Bland-Altman analysis or plot. Conditional method agreement trees (COAT), proposed by Karapetyan, Zeileis, Henriksen, and Hapfelmeier (2023) <doi:10.48550/arXiv.2306.04456>, embed the Bland-Altman analysis in the framework of recursive partitioning to explore how method agreement varies with covariates. COAT can also be used to perform a Bland-Altman test for differences in method agreement.
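For orientation, the classical Bland-Altman quantities that such a tree would estimate within each subgroup are the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 standard deviations of the differences. The sketch below computes only these; it does not attempt the recursive partitioning, and the function name and the paired measurements are invented for illustration.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Made-up paired readings from two hypothetical devices.
device_a = [10.0, 12.0, 14.0, 16.0]
device_b = [9.0, 12.5, 13.0, 15.5]
bias, lower, upper = bland_altman(device_a, device_b)
```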
This package implements the Improved Expectation Maximisation EM* and the traditional EM algorithm for clustering big data (Gaussian mixture models for both multivariate and univariate datasets). This version implements the faster alternative, EM*, which expedites convergence via structure-based data segregation. The implementation supports both random and K-means++ based initialization. References: Parichit Sharma, Hasan Kurban, Mehmet Dalkilic (2022) <doi:10.1016/j.softx.2021.100944>; Hasan Kurban, Mark Jenne, Mehmet Dalkilic (2016) <doi:10.1007/s41060-017-0062-1>.
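For readers unfamiliar with the baseline, here is a minimal sketch of the traditional EM algorithm for a univariate Gaussian mixture. It does not implement EM* or its structure-based data segregation, and the function names, the fixed iteration count, and the initialization from the data extremes are assumptions made for this example only.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, n_iter=50):
    """Traditional EM for a univariate Gaussian mixture with len(means) components."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor guards against a collapsing component
    return weights, means, variances

# Synthetic data: two well-separated clusters centred at 0 and 8.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(8.0, 1.0) for _ in range(300)]
weights, means, variances = em_gmm_1d(data, means=[min(data), max(data)])
```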
Gaussian mixture graphical models include Bayesian networks and dynamic Bayesian networks (their temporal extension) whose local probability distributions are described by Gaussian mixture models. They are powerful tools for graphically and quantitatively representing nonlinear dependencies between continuous variables. This package provides a complete framework to create, manipulate, learn the structure and the parameters, and perform inference in these models. Most of the algorithms are described in the PhD thesis of Roos (2018) <https://tel.archives-ouvertes.fr/tel-01943718>.
Multidimensional nonparametric spatial (spatio-temporal) geostatistics. S3 classes and methods for multidimensional: linear binning, local polynomial kernel regression (spatial trend estimation), density and variogram estimation. Nonparametric methods for simultaneous inference on both spatial trend and variogram functions (for spatial processes). Nonparametric residual kriging (spatial prediction). For details on these methods see, for example, Fernandez-Casal and Francisco-Fernandez (2014) <doi:10.1007/s00477-013-0817-8> or Castillo-Paez et al. (2019) <doi:10.1016/j.csda.2019.01.017>.
Calculate the ratio of iron oxides, hematite and goethite, in soil using the diffuse reflectance technique. The Kubelka-Munk theory, second derivative analysis, and spectral region amplitudes related to hematite and goethite content are used for quantification (Torrent, J., & Barron, V. (2008) <doi:10.2136/sssabookser5.5.c13>). Additionally, the package calculates soil color in the visible spectrum using Munsell and RGB color spaces, based on color theory (Viscarra et al. (2006) <doi:10.1016/j.geoderma.2005.07.017>).
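Two of the ingredients named above can be sketched briefly: the Kubelka-Munk remission function, F(R) = (1 - R)^2 / (2R), which converts diffuse reflectance R into a quantity proportional to absorption, and a second derivative of the resulting spectrum taken by central differences. This is only an illustration under stated assumptions: the function names, the evenly spaced wavelength grid, and the reflectance values are hypothetical, and the package's actual band selection and amplitude calculations are not reproduced.

```python
def kubelka_munk(reflectance):
    """Kubelka-Munk remission function F(R) = (1 - R)^2 / (2R) for 0 < R <= 1."""
    if not 0.0 < reflectance <= 1.0:
        raise ValueError("reflectance must lie in (0, 1]")
    return (1.0 - reflectance) ** 2 / (2.0 * reflectance)

def second_derivative(values, step):
    """Central second differences on an evenly spaced grid (interior points only)."""
    return [
        (values[i - 1] - 2.0 * values[i] + values[i + 1]) / step ** 2
        for i in range(1, len(values) - 1)
    ]

# Made-up reflectance readings on an assumed 10 nm wavelength grid.
reflectance_spectrum = [0.20, 0.25, 0.35, 0.50, 0.60]
km_spectrum = [kubelka_munk(r) for r in reflectance_spectrum]
km_curvature = second_derivative(km_spectrum, step=10.0)
```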