Multi-threaded serialization of compressed arrays that fully utilizes modern solid-state drives. It allows extremely large data to be stored and loaded on demand within seconds without occupying much memory. With data stored on disk, a lazy array can be loaded and shared across multiple R sessions. For arrays with partition mode on, multiple R sessions can write to the same array simultaneously along the last dimension (partition). The internal storage format is provided by the fstcore package, powered by the LZ4 and ZSTD compressors.
Estimation routines for several classes of affine term structure of interest rates models. All the models are based on the single-country unspanned macroeconomic risk framework from Joslin, Priebsch, and Singleton (2014, JF) <doi:10.1111/jofi.12131>. Multicountry extensions such as those of Jotikasthira, Le, and Lundblad (2015, JFE) <doi:10.1016/j.jfineco.2014.09.004>, Candelon and Moura (2023, EM) <doi:10.1016/j.econmod.2023.106453>, and Candelon and Moura (Forthcoming, JFEC) <doi:10.1093/jjfinec/nbae008> are also available.
We introduce factor models designed to jointly analyze high-dimensional count data from multiple studies by extracting study-shared and study-specific factors. Our factor models account for heterogeneous noise and overdispersion among counts, with augmented covariates. We propose an efficient variational procedure for estimating the model parameters, along with a novel criterion for selecting the optimal number of factors and the rank of the regression coefficient matrix. For more details, see Liu et al. (2024) <doi:10.48550/arXiv.2402.15071>.
Two functions, meta2d and meta3d, detect rhythmic signals in time-series datasets. For datasets without individual information, meta2d is recommended; it can combine multiple methods (ARSER, JTK_CYCLE and Lomb-Scargle) to detect rhythms of interest. For datasets with individual information, meta3d is recommended; it applies any one of these three methods to each individual separately and then integrates the per-individual results.
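meta2d can use Lomb-Scargle as one of its detection methods. As a hedged illustration of that single method (not MetaCycle's code, and not ARSER or JTK_CYCLE), here is a minimal Python sketch of the classical normalized Lomb-Scargle periodogram, which works with unevenly spaced sampling times. The function name `lomb_scargle`, the synthetic 24 h signal and the trial periods are assumptions made for illustration.

```python
import math


def lomb_scargle(t, y, periods):
    """Normalized Lomb-Scargle power of (possibly unevenly sampled) y(t) at each trial period."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    yc = [v - mean for v in y]
    powers = []
    for p in periods:
        w = 2 * math.pi / p
        # offset tau makes the sine and cosine terms orthogonal on the sample times
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc_c = sum(v * ci for v, ci in zip(yc, c))
        yc_s = sum(v * si for v, si in zip(yc, s))
        power = (yc_c ** 2 / sum(ci * ci for ci in c)
                 + yc_s ** 2 / sum(si * si for si in s)) / (2 * var)
        powers.append(power)
    return powers


# Usage: an irregularly sampled 24 h rhythm observed over about 10 days (synthetic data)
times = [i * 2.5 + (i % 3) * 0.7 for i in range(96)]
signal = [math.cos(2 * math.pi * ti / 24) for ti in times]
trial_periods = list(range(16, 33))
power = lomb_scargle(times, signal, trial_periods)
best_period = trial_periods[power.index(max(power))]
```

The periodogram needs no interpolation onto a regular grid, which is why it suits time series with missing or irregular time points.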
In biomedical studies, researchers are often interested in assessing the association between one or more ordinal explanatory variables and an outcome variable while adjusting for covariates of any type. The outcome variable may be continuous, binary, or represent censored survival times. In the absence of precise knowledge of the response function, imposing monotonicity constraints on the ordinal variables improves efficiency in estimating parameters, especially when sample sizes are small. This package implements an active set algorithm that efficiently computes such estimators.
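In the simplest special case (one ordinal predictor, a continuous outcome, no covariates), the monotone least-squares estimate of the level means is the isotonic regression of the per-level sample means, weighted by group size. The sketch below computes that with the pool-adjacent-violators algorithm (PAVA). This is a swapped-in, simpler technique, not the package's active set algorithm, which also handles covariates and binary or survival outcomes; the function name `pava` and the example numbers are hypothetical.

```python
def pava(y, w=None):
    """Weighted least-squares fit to y under a non-decreasing constraint (pool-adjacent-violators)."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, number of pooled points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # pool the last two blocks while they violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


# Usage: sample means of the outcome at four ordered levels, weighted by group size
level_means = [1.0, 3.0, 2.0, 4.0]
group_sizes = [10, 5, 5, 10]
fitted = pava(level_means, group_sizes)  # -> [1.0, 2.5, 2.5, 4.0]
```

Adjacent levels that violate the ordering are merged and share a pooled estimate, which is where the efficiency gain with small samples comes from.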
Offers tools to estimate and visualize levels of major pollutants (CO, NO2, SO2, Ozone, PM2.5 and PM10) across the conterminous United States for user-defined time ranges. Provides functions to retrieve pollutant data from the U.S. Environmental Protection Agency's Air Quality System (AQS) API service <https://aqs.epa.gov/aqsweb/documents/data_api.html> for interactive visualization through a shiny application, allowing users to explore pollutant levels for a given location over time relative to the National Ambient Air Quality Standards (NAAQS).
Given bin-count data from single-cell copy number profiling (segmented or unsegmented), estimates ploidy and uses the ploidy estimate to scale the data to absolute copy numbers. Uses the modular quantogram proposed by Kendall (1986) <doi:10.1002/0471667196.ess2129.pub2>, modified by weighting segments according to confidence, and quantifies confidence in the estimate using a theoretical quantogram. Includes optional fused-lasso segmentation with the algorithm in Johnson (2013) <doi:10.1080/10618600.2012.681238>, using the implementation from glmgen by Arnold, Sadhanala, and Tibshirani.
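The idea behind a quantogram is that, if segment read counts sit near integer multiples of an unknown "reads per copy" quantum q, then the average of cos(2πx/q) is large at the true q. Below is a minimal Python sketch of a Kendall-style cosine quantogram with optional weights. Using segment lengths as weights is an assumption standing in for the package's confidence weighting, and the function name, segment values and search range are hypothetical.

```python
import math


def cosine_quantogram(x, quanta, weights=None):
    """Kendall-style cosine quantogram score for each candidate quantum q.

    Large scores mean the values in x cluster near integer multiples of q.
    """
    if weights is None:
        weights = [1.0] * len(x)
    total = sum(weights)
    return [math.sqrt(2 / total) * sum(wt * math.cos(2 * math.pi * xi / q)
                                       for xi, wt in zip(x, weights))
            for q in quanta]


# Usage: hypothetical segment means (reads per bin), weighted by segment length
segment_means = [100, 150, 100, 200, 150, 100]
segment_lengths = [40, 25, 60, 10, 30, 35]
candidates = list(range(30, 80))
scores = cosine_quantogram(segment_means, candidates, segment_lengths)
reads_per_copy = candidates[scores.index(max(scores))]
copy_numbers = [round(m / reads_per_copy) for m in segment_means]
```

The quantogram also peaks at q/2, q/3, and so on, so the candidate range effectively encodes a prior on plausible ploidy; here 25 reads per copy (doubling every copy number) is excluded by the chosen range.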
Transforms long data into a matrix form to allow for ease of input into modelling packages for regression, principal components, imputation or machine learning. It does this by pivoting on user-defined columns and generating a key-value table for variable names so that one-to-one mappings are preserved. It is particularly useful when the indicator names in the columns are long descriptive strings, for example "Energy imports, net (% of energy use)". High-level analysis wrapper functions for correlation and principal components analysis are provided.
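The pivot-plus-key-table approach can be sketched in a few lines of Python. This is an illustration, not the package's API: the function name `pivot_long_to_wide`, the short keys `V1`, `V2`, ..., the use of `None` for missing cells and the example records (apart from the indicator name quoted above) are all hypothetical.

```python
def pivot_long_to_wide(rows, id_col, name_col, value_col):
    """Pivot long records into a matrix whose columns use short keys for long indicator names."""
    names = sorted({r[name_col] for r in rows})
    ids = sorted({r[id_col] for r in rows})
    # one-to-one key-value table mapping short variable keys to original names
    key_table = {f"V{j + 1}": name for j, name in enumerate(names)}
    col_of = {name: j for j, name in enumerate(names)}
    row_of = {rid: i for i, rid in enumerate(ids)}
    matrix = [[None] * len(names) for _ in ids]  # None marks a missing cell
    for r in rows:
        i, j = row_of[r[id_col]], col_of[r[name_col]]
        if matrix[i][j] is not None:
            raise ValueError(f"duplicate value for {r[id_col]!r} / {r[name_col]!r}")
        matrix[i][j] = r[value_col]
    return ids, list(key_table), matrix, key_table


# Usage with made-up values
records = [
    {"country": "A", "indicator": "Energy imports, net (% of energy use)", "value": 12.5},
    {"country": "A", "indicator": "GDP growth (annual %)", "value": 2.0},
    {"country": "B", "indicator": "Energy imports, net (% of energy use)", "value": -30.0},
]
ids, header, matrix, key_table = pivot_long_to_wide(records, "country", "indicator", "value")
```

Raising on duplicates is a design choice that keeps the id/name-to-cell mapping one-to-one instead of silently aggregating; the key table lets model output labelled `V1` be translated back to the descriptive name.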
coMethDMR identifies genomic regions associated with continuous phenotypes by optimally leveraging covariation among CpGs within predefined genomic regions. Instead of testing all CpGs within a genomic region, coMethDMR first carries out an additional step that selects co-methylated sub-regions without using any outcome information. Next, coMethDMR tests the association between methylation within the sub-region and the continuous phenotype using a random coefficient mixed effects model, which simultaneously models both the variation between CpG sites within the region and differential methylation.
geomeTriD (Three Dimensional Geometry Package) creates interactive 3D plots using the GL library with the three.js visualization library (https://threejs.org) or the rgl library. In addition to interactive 3D plots, it also generates simplified 2D models, which give a more straightforward visual representation and make the data quicker to analyze and interpret. Users therefore have access to both detailed three-dimensional visualizations and more accessible two-dimensional views.
Testing and documenting code that communicates with remote servers can be painful. This package helps with writing tests for packages that use httr2. It enables testing all of the logic on the R side of the API without requiring access to the remote service, and it also allows recording real API responses to use as test fixtures. The ability to save responses and load them offline also enables writing vignettes and other dynamic documents that can be distributed without access to a live server.
This package provides functions for three purposes. First, simulate kin-pair data under the assumption that every trait is affected by additive genetic effects (A), common environmental effects (C) and unique environmental effects (E). Second, fit an ACE model to kin-pair data and report the model fit. Third, calculate the power of the A estimate under specified conditions. For the mechanics of the power calculation, see Visscher (2004) <doi:10.1375/twin.7.5.505>.
Integrated, convenient, and uniform access to Canadian Census data and geography retrieved using the CensusMapper API. This package produces analysis-ready tidy data frames and spatial data in multiple formats, as well as convenience functions for working with Census variables, variable hierarchies, and region selection. API keys are available with free registration at <https://censusmapper.ca/api>. Census data and boundary geometries are reproduced and distributed on an "as is" basis with the permission of Statistics Canada (Statistics Canada 2001; 2006; 2011; 2016; 2021).
This package provides functional control charts for statistical process monitoring of functional data, using the methods of Capezza et al. (2020) <doi:10.1002/asmb.2507>, Centofanti et al. (2021) <doi:10.1080/00401706.2020.1753581>, Capezza et al. (2024) <doi:10.1080/00401706.2024.2327346>, Capezza et al. (2024) <doi:10.1080/00224065.2024.2383674>, and Centofanti et al. (2022) <doi:10.48550/arXiv.2205.06256>. The package is thoroughly illustrated in the paper by Capezza et al. (2023) <doi:10.1080/00224065.2023.2219012>.
This package provides methods for converting series of event names to strings, finding common patterns in a group of strings, discovering featured patterns when comparing two groups of strings (along with the number and starting position of each pattern in each string), obtaining transition matrices, computing transition entropy, statistically comparing the difference between two groups of strings, and clustering string groups. Event names can be any action names or labels, such as events in log files or areas of interest (AOIs) in eye-tracking research.
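Encoding events as characters, building a first-order transition matrix and computing transition entropy can be sketched in Python as follows. Transition entropy is taken here as the conditional entropy H(next | current) in bits, with each row weighted by how often it is the source of a transition; that is one common definition and may differ from the package's. The function names, the codebook and the log events are hypothetical.

```python
import math
from collections import Counter, defaultdict


def events_to_string(events, codebook):
    """Encode a series of event names as a string, one character per event."""
    return "".join(codebook[e] for e in events)


def count_transitions(strings):
    """Count first-order transitions between consecutive characters."""
    counts = defaultdict(Counter)
    for s in strings:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    return counts


def transition_matrix(strings):
    """Row-normalized transition probabilities P(next = b | current = a)."""
    counts = count_transitions(strings)
    return {a: {b: n / sum(row.values()) for b, n in row.items()} for a, row in counts.items()}


def transition_entropy(strings):
    """Conditional entropy H(next | current) in bits, weighting rows by transition frequency."""
    counts = count_transitions(strings)
    total = sum(sum(row.values()) for row in counts.values())
    h = 0.0
    for row in counts.values():
        row_total = sum(row.values())
        for n in row.values():
            p = n / row_total
            h -= (row_total / total) * p * math.log2(p)
    return h


# Usage: two hypothetical log sessions encoded with a hypothetical codebook
codebook = {"menu": "A", "search": "B", "checkout": "C"}
sessions = [events_to_string(["menu", "search", "menu", "checkout"], codebook),
            events_to_string(["menu", "search", "checkout"], codebook)]
```

An entropy of 0 means every event deterministically predicts the next one; higher values indicate less predictable navigation between events or AOIs.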
This package provides a framework to detect Differential Item Functioning (DIF) in Generalized Partial Credit Models (GPCM) and special cases of the GPCM as proposed by Schauberger and Mair (2019) <doi:10.3758/s13428-019-01224-2>. A joint model is set up where DIF is explicitly parametrized and penalized likelihood estimation is used for parameter selection. The main advantage of the method, called GPCMlasso, is that several variables can be treated simultaneously and that both continuous and categorical variables can be used to detect DIF.
This package provides a set of functions to locate some programs available on the user's machine: 'Node.js', 'npm', 'LibreOffice', 'Microsoft Word', 'Microsoft PowerPoint', 'Microsoft Excel', 'Python', 'pip', 'Mozilla Firefox' and 'Google Chrome'. Users can test whether a program is available, optionally checking its version, and call it with system2() or system(). This allows a single function to retrieve the path to a program regardless of the operating system and its configuration.
This package provides essential tools for the pre-processing techniques of matching and weighting multiply imputed datasets. The package includes functions for matching within and across multiply imputed datasets using various methods, estimating weights for units in the imputed datasets using multiple weighting methods, calculating causal effect estimates in each matched or weighted dataset using parametric or non-parametric statistical models, and pooling the resulting estimates according to Rubin's rules (please see <https://journal.r-project.org/archive/2021/RJ-2021-073/> for more details).
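Rubin's rules themselves are short enough to show directly. The Python sketch below pools per-imputation estimates and squared standard errors into a point estimate, a total variance (within plus inflated between-imputation variance) and Rubin's large-sample degrees of freedom. It illustrates the pooling step only, not how the package computes or pools its estimates; the function name `pool_rubin` and the example numbers are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    u_bar = statistics.fmean(variances)        # mean within-imputation variance
    b = statistics.variance(estimates)         # between-imputation variance (m - 1 denominator)
    t = u_bar + (1 + 1 / m) * b                # total variance of q_bar
    # Rubin's large-sample degrees of freedom for the reference t distribution
    df = math.inf if b == 0 else (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df


# Usage: treatment-effect estimates and squared SEs from three imputed datasets (made up)
q_bar, total_var, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The (1 + 1/m) factor inflates the between-imputation variance to account for using a finite number of imputations.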
This package implements three families of parsimonious hidden Markov models (HMMs) for matrix-variate longitudinal data using the Expectation-Conditional Maximization (ECM) algorithm. The package supports matrix-variate normal, t, and contaminated normal distributions as emission distributions. For each hidden state, parsimony is achieved through the eigen-decomposition of the covariance matrices associated with the emission distribution. This approach results in a comprehensive set of 98 parsimonious HMMs for each type of emission distribution. Atypical matrix detection is also supported, utilizing the fitted (heavy-tailed) models.
This package provides a quadratic-time dynamic programming algorithm that computes an approximate solution to the problem of finding the most likely changepoints with respect to the Poisson likelihood, subject to a constraint on the number of segments and to the constraint that changes alternate: up, down, up, down, etc. For more information, see "PeakSeg: constrained optimal segmentation and supervised penalty learning for peak detection in count data" by T. D. Hocking et al., Proceedings of ICML 2015 <http://proceedings.mlr.press/v37/hocking15.html>.
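For orientation, the Python sketch below implements the standard, unconstrained segment neighborhood dynamic program for count data under Poisson loss, which runs in O(K n²) time for K segments. It does not enforce the up-down alternation; PeakSeg's constrained algorithm adds that restriction, so here the constraint is only checked after the fact. Function names and the toy counts are hypothetical.

```python
import math


def segment_neighborhood(y, n_segments):
    """Optimal split of count data y into n_segments under Poisson loss, in O(K n^2) time."""
    n = len(y)
    prefix = [0]
    for v in y:
        prefix.append(prefix[-1] + v)

    def cost(i, j):
        # Poisson negative log-likelihood of y[i:j] at its MLE mean, up to log(y!) terms
        s = prefix[j] - prefix[i]
        return 0.0 if s == 0 else s - s * math.log(s / (j - i))

    best = [[math.inf] * (n + 1) for _ in range(n_segments + 1)]
    last_start = [[0] * (n + 1) for _ in range(n_segments + 1)]
    for j in range(1, n + 1):
        best[1][j] = cost(0, j)
    for k in range(2, n_segments + 1):
        for j in range(k, n + 1):
            for t in range(k - 1, j):
                c = best[k - 1][t] + cost(t, j)
                if c < best[k][j]:
                    best[k][j], last_start[k][j] = c, t
    ends, j = [], n
    for k in range(n_segments, 0, -1):  # backtrack from the last segment
        ends.append(j)
        j = last_start[k][j]
    return ends[::-1], best[n_segments][n]


def alternates_up_down(y, ends):
    """Check the PeakSeg-style constraint: segment means go up, down, up, ..."""
    starts = [0] + ends[:-1]
    means = [sum(y[s:e]) / (e - s) for s, e in zip(starts, ends)]
    return all((means[k + 1] > means[k]) == (k % 2 == 0) for k in range(len(means) - 1))


counts = [1, 1, 1, 10, 10, 10, 1, 1, 1]
ends, loss = segment_neighborhood(counts, 3)
```

On real data the unconstrained optimum can contain two consecutive "up" changes, which is exactly the case the constrained algorithm rules out to obtain interpretable peaks.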
An implementation of the full-likelihood Bayes factor (FLB) for evaluating segregation evidence in clinical medical genetics. The method was introduced by Thompson et al. (2003) <doi:10.1086/378100>. This implementation supports custom penetrance values and liability classes, and allows visualisations and robustness analysis as presented in Ratajska et al. (2023) <doi:10.1002/mgg3.2107>. See also the online app 'shinyseg' <https://chrcarrizosa.shinyapps.io/shinyseg>, which offers interactive segregation analysis with many additional features (Carrizosa et al. (2024) <doi:10.1093/bioinformatics/btae201>).
This package provides a flexible tool for simulating complex longitudinal data using structural equations, with emphasis on problems in causal inference. Specify interventions and simulate from intervened data-generating distributions. Define and evaluate treatment-specific means, average treatment effects and coefficients from working marginal structural models. The user interface is designed to facilitate transparent and reproducible simulation studies and allows concise expression of complex functional dependencies for a large number of time-varying nodes. See the package vignette for more information, documentation and examples.
This package provides methods to calculate diagnostics for multicollinearity among predictors in a linear or generalized linear model. It also provides methods to visualize those diagnostics following Friendly & Kwan (2009), "Where's Waldo: Visualizing Collinearity Diagnostics", <doi:10.1198/tast.2009.0012>. These include better tabular presentation of collinearity diagnostics that highlight the important numbers, a semi-graphic tableplot of the diagnostics to make warning and danger levels more salient, and a "collinearity biplot" of the smallest dimensions of predictor space, where collinearity is most apparent.
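The diagnostics being visualized are Belsley-style condition indices and variance-decomposition proportions. The Python sketch below computes them from the singular value decomposition of the column-scaled model matrix; it assumes NumPy is available and is not the package's implementation. The function name `colldiag` and the nearly collinear toy predictors are hypothetical.

```python
import numpy as np


def colldiag(X):
    """Condition indices and variance-decomposition proportions (Belsley-style).

    X is the model matrix including the intercept column; columns are scaled to unit length.
    Returns (condition indices, proportions) with proportions[j, k] = share of var(b_k)
    associated with dimension j.
    """
    X = np.asarray(X, dtype=float)
    Xs = X / np.linalg.norm(X, axis=0)
    _, s, vt = np.linalg.svd(Xs, full_matrices=False)
    cond_index = s.max() / s
    phi = (vt.T ** 2) / s ** 2                      # phi[k, j] = v_kj^2 / s_j^2
    props = phi / phi.sum(axis=1, keepdims=True)    # normalize over dimensions j
    return cond_index, props.T                      # rows: dimensions, columns: coefficients


# Usage: intercept plus two nearly collinear predictors (toy data)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = x1 + np.array([0.01, -0.01, 0.02, -0.02, 0.01, -0.01])
X = np.column_stack([np.ones(6), x1, x2])
ci, props = colldiag(X)
```

A dimension with a large condition index that also carries a large share of the variance of two or more coefficients points to the predictors involved in the near dependency; the tableplot described above is a way to make that pattern easy to spot.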
Standard and extensible Eddy-Covariance data post-processing (Wutzler et al. (2018) <doi:10.5194/bg-15-5015-2018>) includes uStar-filtering, gap-filling, and flux-partitioning. The Eddy-Covariance (EC) micrometeorological technique quantifies continuous exchange fluxes of gases, energy, and momentum between an ecosystem and the atmosphere, and is important for understanding ecosystem dynamics and upscaling exchange fluxes (Aubinet et al. (2012) <doi:10.1007/978-94-007-2351-1>). This package takes pre-processed (half-)hourly data as input and supports further processing. First, a quality check and filtering are performed based on the relationship between measured flux and friction velocity (uStar) to discard biased data (Papale et al. (2006) <doi:10.5194/bg-3-571-2006>). Second, gaps in the data are filled based on information from environmental conditions (Reichstein et al. (2005) <doi:10.1111/j.1365-2486.2005.001002.x>). Third, the net flux of carbon dioxide is partitioned into its gross fluxes into and out of the ecosystem using night-time and day-time based approaches (Lasslop et al. (2010) <doi:10.1111/j.1365-2486.2009.02041.x>).
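The uStar-filtering step rests on the observation that night-time flux is underestimated under low turbulence and levels off once uStar is high enough. Below is a simplified, single-temperature-class Python sketch of a moving-point style threshold search in the spirit of Papale et al. (2006). It is not the package's implementation: the 20 uStar classes, the comparison with the 10 following classes, the 99 % criterion, the function name and the synthetic data are all assumptions.

```python
import statistics


def ustar_threshold(ustar, flux, n_classes=20, n_following=10, fraction=0.99):
    """Simplified moving-point uStar threshold for night-time data in one temperature class."""
    records = sorted(zip(ustar, flux))
    size = len(records) // n_classes  # leftover records beyond n_classes * size are ignored
    classes = [records[i * size:(i + 1) * size] for i in range(n_classes)]
    mean_u = [statistics.fmean(u for u, _ in c) for c in classes]
    mean_f = [statistics.fmean(f for _, f in c) for c in classes]
    for i in range(n_classes - 1):
        following = mean_f[i + 1:i + 1 + n_following]
        # first class whose mean flux reaches `fraction` of the flux at higher uStar
        if mean_f[i] >= fraction * statistics.fmean(following):
            return mean_u[i]
    return None


# Usage: synthetic night-time respiration that is underestimated below uStar = 0.3 m/s
u = [i / 100 for i in range(1, 101)]
nee = [5.0 * min(1.0, ui / 0.3) for ui in u]
threshold = ustar_threshold(u, nee)
```

Records with uStar below the returned threshold would then be flagged and treated as gaps for the subsequent gap-filling step.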