This package implements persistent row and column annotations for R matrices. The annotations associated with rows and columns are preserved after subsetting, transposition, and various other matrix-specific operations. The intended use case is storing and manipulating genomic datasets, which typically consist of a matrix of measurements (such as gene expression values) together with annotations about the rows (e.g. genomic locations) and the columns (e.g. metadata about the collected samples). However, annmatrix objects are expected to be useful in various other contexts as well.
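A minimal sketch of how such annotations might be attached and carried through subsetting; the annmatrix() constructor with rann/cann arguments and the @/$ accessors are assumed from the package documentation and should be verified with ?annmatrix:

```r
library(annmatrix)

mat <- matrix(rnorm(20), nrow = 4, ncol = 5)
rowinfo <- data.frame(chr = c("chr1", "chr1", "chr2", "chr3"))
colinfo <- data.frame(group = c("case", "case", "control", "control", "control"))

# assumed constructor: row annotations via 'rann', column annotations via 'cann'
X <- annmatrix(mat, rann = rowinfo, cann = colinfo)

X@chr     # row annotations (assumed '@' accessor)
X$group   # column annotations (assumed '$' accessor)

# annotations are expected to follow the subset
Y <- X[X@chr == "chr1", X$group == "case"]
```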
Enrichment strategies play a critical role in modern clinical trial design, especially as precision medicine shifts the focus toward patient-specific efficacy. Recent developments in enrichment design have introduced biomarker randomness and accounted for the correlation structure between the treatment effect and the biomarker, resulting in a two-stage threshold enrichment design. We propose novel two-stage enrichment designs capable of handling two or more continuous biomarkers. See Zhang, F. and Gou, J. (2025). Using multiple biomarkers for patient enrichment in two-stage clinical designs. Technical Report.
Analysis of trade in value added with international input-output tables. Includes commands for easy data extraction, matrix manipulation, decomposition of value added in gross exports and calculation of value added indicators, with full geographical and sector customization. Decomposition methods include Borin and Mancini (2023) <doi:10.1080/09535314.2022.2153221>, Miroudot and Ye (2021) <doi:10.1080/09535314.2020.1730308>, Wang et al. (2013) <https://econpapers.repec.org/paper/nbrnberwo/19677.htm> and Koopman et al. (2014) <doi:10.1257/aer.104.2.459>.
This package provides a handy tool to calculate carbon footprints from air travel based on three-letter International Air Transport Association (IATA) airport codes or latitude and longitude. footprint first calculates the great-circle distance between the departure and arrival locations. It then uses the Department for Environment, Food & Rural Affairs (DEFRA) greenhouse gas conversion factors for business air travel to estimate the carbon footprint. These conversion factors account for trip length, flight class (e.g. economy, business), and emissions metric (e.g. carbon dioxide equivalent, methane).
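For illustration, a hedged sketch of a single-route estimate; the airport_footprint() function name and its flightClass/output arguments are assumptions to be checked against the package manual:

```r
library(footprint)

# Estimated emissions for a one-way Los Angeles (LAX) -> London Heathrow (LHR)
# economy-class trip, reported as CO2-equivalent (assumed argument names)
airport_footprint("LAX", "LHR",
                  flightClass = "Economy",
                  output = "co2e")
```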
This package provides several helper functions for working with 'knitr' and 'LaTeX'. It includes xTab for creating traditional LaTeX tables, lTab for generating longtable environments, and sTab for generating a supertabular environment. Additionally, this package contains a knitr_setup() function which fixes a well-known bug in 'knitr' that distorts the results="asis" chunk option when used in conjunction with user-defined commands, and a com chunk option (<<com=TRUE>>=) which renders the output from 'knitr' as a LaTeX command.
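As a sketch, an Rnw chunk using xTab() with its defaults (the chunk label is arbitrary; see ?xTab for the table-customization arguments):

```r
# In a .Rnw document, a chunk like the following emits a LaTeX table directly:
#
# <<cars-table, results = "asis">>=
# library(knitLatex)
# xTab(head(mtcars))
# @
```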
Multi-threaded serialization of compressed arrays that fully utilizes modern solid-state drives. It allows extremely large data to be stored and loaded on demand within seconds without occupying too much memory. With data stored on the hard drive, a lazy array can be loaded and shared across multiple R sessions. For arrays with partition mode on, multiple R sessions can write to the same array simultaneously along the last dimension (partition). The internal storage format is provided by the 'fstcore' package, backed by the LZ4 and ZSTD compressors.
We introduce factor models designed to jointly analyze high-dimensional count data from multiple studies by extracting study-shared and study-specific factors. Our factor models account for heterogeneous noise and overdispersion among counts with augmented covariates. We propose an efficient variational estimation procedure for estimating the model parameters, along with a novel criterion for selecting the optimal number of factors and the rank of the regression coefficient matrix. For more details, see Liu et al. (2024) <doi:10.48550/arXiv.2402.15071>.
This package provides two functions, meta2d and meta3d, for detecting rhythmic signals from time-series datasets. For analyzing time-series datasets without individual information, meta2d is suggested; it incorporates multiple methods (ARSER, JTK_CYCLE, and Lomb-Scargle) in the detection of rhythms of interest. For analyzing time-series datasets with individual information, meta3d is suggested; it uses any one of these three methods to analyze the time-series data individual by individual and reports integrated values based on the analysis results of each individual.
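A hedged sketch of a typical meta2d() call; the argument names (infile, filestyle, timepoints, cycMethod, outdir) are assumed here and should be checked against the package manual:

```r
library(MetaCycle)

# rows are features (e.g. genes), columns are time points sampled every 4 h
meta2d(infile = "expression.csv",
       filestyle = "csv",
       timepoints = seq(0, 44, by = 4),
       cycMethod = c("ARS", "JTK", "LS"),   # ARSER, JTK_CYCLE, Lomb-Scargle
       outdir = "metaout")
```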
In biomedical studies, researchers are often interested in assessing the association between one or more ordinal explanatory variables and an outcome variable, at the same time adjusting for covariates of any type. The outcome variable may be continuous, binary, or represent censored survival times. In the absence of a precise knowledge of the response function, using monotonicity constraints on the ordinal variables improves efficiency in estimating parameters, especially when sample sizes are small. This package implements an active set algorithm that efficiently computes such estimators.
Offers tools to estimate and visualize levels of major pollutants (CO, NO2, SO2, Ozone, PM2.5 and PM10) across the conterminous United States for user-defined time ranges. Provides functions to retrieve pollutant data from the U.S. Environmental Protection Agency's Air Quality System (AQS) API service <https://aqs.epa.gov/aqsweb/documents/data_api.html> for interactive visualization through a shiny application, allowing users to explore pollutant levels for a given location over time relative to the National Ambient Air Quality Standards (NAAQS).
Given bincount data from single-cell copy number profiling (segmented or unsegmented), estimates ploidy, and uses the ploidy estimate to scale the data to absolute copy numbers. Uses the modular quantogram proposed by Kendall (1986) <doi:10.1002/0471667196.ess2129.pub2>, modified by weighting segments according to confidence, and quantifying confidence in the estimate using a theoretical quantogram. Includes optional fused-lasso segmentation with the algorithm in Johnson (2013) <doi:10.1080/10618600.2012.681238>, using the implementation from glmgen by Arnold, Sadhanala, and Tibshirani.
Transforms long data into a matrix form to allow for ease of input into modelling packages for regression, principal components, imputation, or machine learning. It does this by pivoting on user-defined columns, generating a key-value table for variable names to ensure one-to-one mappings are preserved. It is particularly useful when the indicator names in the columns are long descriptive strings, for example "Energy imports, net (% of energy use)". High-level analysis wrapper functions for correlation and principal components analysis are provided.
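To convey the idea, here is a generic illustration of the long-to-matrix pivot plus a key-value table, written with tidyr rather than this package's own functions:

```r
library(tidyr)

long <- data.frame(
  country   = c("A", "A", "B", "B"),
  indicator = rep(c("Energy imports, net (% of energy use)",
                    "GDP growth (annual %)"), 2),
  value     = c(12.3, 2.1, -4.5, 3.4)
)

# key-value table: short keys stand in for the long descriptive names
key <- data.frame(indicator = unique(long$indicator),
                  key = c("energy_imports", "gdp_growth"))

# pivot on the key column so each indicator becomes one matrix column
wide <- pivot_wider(merge(long, key),
                    id_cols = country,
                    names_from = key,
                    values_from = value)
wide
```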
coMethDMR identifies genomic regions associated with continuous phenotypes by optimally leveraging co-variation among CpGs within predefined genomic regions. Instead of testing all CpGs within a genomic region, coMethDMR carries out an additional step that selects co-methylated sub-regions first, without using any outcome information. Next, coMethDMR tests the association between methylation within the sub-region and the continuous phenotype using a random coefficient mixed effects model, which models both variation between CpG sites within the region and differential methylation simultaneously.
geomeTriD (Three Dimensional Geometry Package) creates interactive 3D plots using the WebGL-based three.js visualization library (<https://threejs.org>) or the rgl library. In addition to creating interactive 3D plots, the package also generates simplified 2D models. These 2D models provide a more straightforward visual representation, making it easier to analyze and interpret the data quickly. This functionality ensures that users have access to both detailed three-dimensional visualizations and more accessible two-dimensional views, catering to various analytical needs.
Standard and extensible Eddy-Covariance data post-processing (Wutzler et al. (2018) <doi:10.5194/bg-15-5015-2018>) includes uStar-filtering, gap-filling, and flux-partitioning. The Eddy-Covariance (EC) micrometeorological technique quantifies continuous exchange fluxes of gases, energy, and momentum between an ecosystem and the atmosphere. It is important for understanding ecosystem dynamics and upscaling exchange fluxes (Aubinet et al. (2012) <doi:10.1007/978-94-007-2351-1>). This package inputs pre-processed (half-)hourly data and supports further processing. First, a quality check and filtering step is performed based on the relationship between measured flux and friction velocity (uStar) to discard biased data (Papale et al. (2006) <doi:10.5194/bg-3-571-2006>). Second, gaps in the data are filled based on information from environmental conditions (Reichstein et al. (2005) <doi:10.1111/j.1365-2486.2005.001002.x>). Third, the net flux of carbon dioxide is partitioned into its gross fluxes into and out of the ecosystem by night-time based and day-time based approaches (Lasslop et al. (2010) <doi:10.1111/j.1365-2486.2009.02041.x>).
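A hedged sketch of this three-step workflow on the DE-Tha example data shipped with the package; the sEddyProc class and method names below follow the package interface but should be checked against the current vignette:

```r
library(REddyProc)

# example half-hourly data; convert the Year/DoY/Hour columns to POSIX time
eddy <- fConvertTimeToPosix(Example_DETha98, "YDH",
                            Year = "Year", Day = "DoY", Hour = "Hour")
EProc <- sEddyProc$new("DE-Tha", eddy, c("NEE", "Rg", "Tair", "VPD", "Ustar"))

# 1. uStar threshold estimation (filtering of low-turbulence periods)
EProc$sEstimateUstarScenarios()

# 2. gap-filling of NEE after discarding records below the uStar thresholds
EProc$sMDSGapFillUStarScens("NEE")

# 3. night-time based partitioning of NEE into GPP and Reco
EProc$sSetLocationInfo(LatDeg = 51.0, LongDeg = 13.6, TimeZoneHour = 1)
EProc$sMDSGapFill("Tair", FillAll = FALSE)   # drivers must be gap-filled first
EProc$sMRFluxPartitionUStarScens()
```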
This package provides a few functions that serve as a statistical tool for three purposes. First, simulate kin-pair data based on the assumption that every trait is affected by genetic effects (A), common environmental effects (C), and unique environmental effects (E). Second, use kin-pair data to fit an ACE model and obtain model fit output. Third, calculate the power of the A estimate under a specific condition. For the mechanics of the power calculation, we suggest checking Visscher (2004) <doi:10.1375/twin.7.5.505>.
Integrated, convenient, and uniform access to Canadian Census data and geography retrieved using the CensusMapper API. This package produces analysis-ready tidy data frames and spatial data in multiple formats, as well as convenience functions for working with Census variables, variable hierarchies, and region selection. API keys are available with free registration at <https://censusmapper.ca/api>. Census data and boundary geometries are reproduced and distributed on an "as is" basis with the permission of Statistics Canada (Statistics Canada 2001; 2006; 2011; 2016; 2021).
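A hedged sketch of a typical retrieval; the vector and region codes below are illustrative, and real codes can be looked up with list_census_vectors() and list_census_regions():

```r
library(cancensus)

# key obtained from censusmapper.ca/api
options(cancensus.api_key = "YOUR_API_KEY")

toronto <- get_census(
  dataset    = "CA16",                           # 2016 Census
  regions    = list(CMA = "35535"),              # Toronto CMA (illustrative code)
  vectors    = c(med_hh_income = "v_CA16_2397"), # illustrative vector code
  level      = "CSD",
  geo_format = "sf"                              # boundaries as sf geometries
)
```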
This package contains several functions for equivalence testing and practical significance testing. First, the tsti() command provides an automatic computation of three-sided testing results for a given estimate, standard error, and region of practical equivalence. For details, see Goeman, Solari, & Stijnen (2010) <doi:10.1002/sim.4002> and Isager & Fitzgerald (2024) <doi:10.31234/osf.io/8y925>. Second, the lddtest() command performs logarithmic density discontinuity equivalence testing for regression discontinuity designs. For reference, see Fitzgerald (2025) <doi:10.31222/osf.io/2dgrp_v1>.
This package provides functional control charts for statistical process monitoring of functional data, using the methods of Capezza et al. (2020) <doi:10.1002/asmb.2507>, Centofanti et al. (2021) <doi:10.1080/00401706.2020.1753581>, Capezza et al. (2024) <doi:10.1080/00401706.2024.2327346>, Capezza et al. (2024) <doi:10.1080/00224065.2024.2383674>, and Centofanti et al. (2022) <doi:10.48550/arXiv.2205.06256>. The package is thoroughly illustrated in the paper of Capezza et al. (2023) <doi:10.1080/00224065.2023.2219012>.
This package provides a framework to detect Differential Item Functioning (DIF) in Generalized Partial Credit Models (GPCM) and special cases of the GPCM as proposed by Schauberger and Mair (2019) <doi:10.3758/s13428-019-01224-2>. A joint model is set up where DIF is explicitly parametrized and penalized likelihood estimation is used for parameter selection. The big advantage of the method called GPCMlasso is that several variables can be treated simultaneously and that both continuous and categorical variables can be used to detect DIF.
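A hedged sketch assuming a cbind() formula interface (ordinal item responses on the left, candidate DIF covariates on the right); the data frame 'dat' and its columns are hypothetical, and the exact interface should be checked in ?GPCMlasso:

```r
library(GPCMlasso)

# 'dat' (hypothetical) holds ordinal items I1..I5 plus person covariates
fit <- GPCMlasso(cbind(I1, I2, I3, I4, I5) ~ gender + age, data = dat)

plot(fit)   # coefficient paths of the DIF parameters along the penalty
```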
This package provides methods for converting series of event names to strings, finding common patterns in a group of strings, discovering featured patterns when comparing two groups of strings (as well as the number and starting position of each pattern in each string), obtaining transition matrices, computing transition entropy, statistically comparing the difference between two groups of strings, and clustering string groups. Event names can be any action names or labels, such as events in log files or areas of interest (AOIs) in eye-tracking research.
This package provides a set of functions to locate programs available on the user's machine. The package provides functions to locate 'Node.js', 'npm', 'LibreOffice', 'Microsoft Word', 'Microsoft PowerPoint', 'Microsoft Excel', 'Python', 'pip', 'Mozilla Firefox' and 'Google Chrome'. Users can test the availability of a program, optionally requiring a specific version, and call it with system2() or system(). This allows the use of a single function to retrieve the path to a program regardless of the operating system and its configuration.
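A hedged sketch; exec_available() and the '<program>_exec()' locators are assumed from the description above and should be confirmed in the reference manual:

```r
library(locatexec)

# run a Python command only if an interpreter is found on this machine
if (exec_available("python")) {
  system2(python_exec(), args = "--version")
}

# same pattern for a headless Chrome call
if (exec_available("chrome")) {
  system2(chrome_exec(), args = c("--headless", "--version"))
}
```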
This package implements three families of parsimonious hidden Markov models (HMMs) for matrix-variate longitudinal data using the Expectation-Conditional Maximization (ECM) algorithm. The package supports matrix-variate normal, t, and contaminated normal distributions as emission distributions. For each hidden state, parsimony is achieved through the eigen-decomposition of the covariance matrices associated with the emission distribution. This approach results in a comprehensive set of 98 parsimonious HMMs for each type of emission distribution. Atypical matrix detection is also supported, utilizing the fitted (heavy-tailed) models.
This package provides essential tools for the pre-processing techniques of matching and weighting multiply imputed datasets. The package includes functions for matching within and across multiply imputed datasets using various methods, estimating weights for units in the imputed datasets using multiple weighting methods, calculating causal effect estimates in each matched or weighted dataset using parametric or non-parametric statistical models, and pooling the resulting estimates according to Rubin's rules (please see <https://journal.r-project.org/archive/2021/RJ-2021-073/> for more details).
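A hedged sketch of the matching, analysis, and pooling workflow using the nhanes2 example data from the mice package; the matchthem() arguments and the with()/pool() steps follow the interface described above and should be verified in the package documentation:

```r
library(mice)
library(MatchThem)

# multiply impute the example data (5 imputed datasets)
imp <- mice(nhanes2, m = 5, printFlag = FALSE, seed = 123)

# match treated (hypertensive) and control units within each imputed dataset
matched <- matchthem(hyp ~ age + chl, datasets = imp,
                     approach = "within", method = "nearest")

# fit the outcome model in every matched dataset and pool with Rubin's rules
fits <- with(matched, glm(bmi ~ hyp))
summary(pool(fits))
```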