Identify and visualize individuals with unusual associations between genetics and geography using the approach of Chang and Schmid (2023) <doi:10.1101/2023.04.06.535838>. It detects potential outliers that violate the isolation-by-distance assumption using a K-nearest-neighbor approach. You can obtain a table of outliers with statistics and visualize unusual geo-genetic patterns on a geographical map. This is useful in landscape genomics studies for discovering individuals with unusual associations between geography and genetics in a large biological sample.
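To illustrate the K-nearest-neighbor idea, here is a minimal Python sketch, not the package's R API or Chang and Schmid's exact statistic. It assumes Euclidean distances for both geography and genetics (for example, genetic PCA coordinates) and scores each individual by its mean genetic distance to its k geographically nearest neighbors. The Tukey-fence cutoff and the function name `knn_genetic_outliers` are hypothetical choices for illustration.

```python
import math
import statistics


def knn_genetic_outliers(geo, gen, k=3):
    """Flag individuals whose genetics differ unusually from their geographic neighbors.

    geo and gen are equal-length lists of coordinate tuples for the same individuals.
    The score is the mean genetic distance to the k geographically nearest neighbors;
    scores above Q3 + 1.5 * IQR (an assumed cutoff) are flagged.
    """
    n = len(geo)
    scores = []
    for i in range(n):
        # Rank all other individuals by geographic distance and keep the k nearest.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(geo[i], geo[j]))
        neighbors = others[:k]
        scores.append(sum(math.dist(gen[i], gen[j]) for j in neighbors) / k)
    q1, _, q3 = statistics.quantiles(scores, n=4)
    fence = q3 + 1.5 * (q3 - q1)
    flagged = [i for i, s in enumerate(scores) if s > fence]
    return flagged, scores


# Two geographic clusters with matching genetics, plus one individual (index 10)
# located in the first cluster but genetically resembling the second.
geo = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
       (0.5, 0.6)]
gen = [(0, 0)] * 5 + [(5, 5)] * 5 + [(5, 5)]
flagged, scores = knn_genetic_outliers(geo, gen, k=3)
print(flagged)
```

Note that an outlier also inflates its neighbors' scores, which is one reason a robust cutoff rather than a fixed threshold is used in this sketch.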
This package implements our Bayesian phase I repeated-measurement design, which accounts for multidimensional toxicity endpoints from multiple treatment cycles. The package also provides a novel design that accounts for both multidimensional toxicity endpoints and early-stage efficacy endpoints in phase I. For both designs, functions are provided to recommend the next dose based on the data collected in the available patient cohorts and to simulate trial characteristics given design parameters. See Yin, Jun, et al. (2017) <doi:10.1002/sim.7134>.
Recent years have seen increased interest in novel methods for analyzing quantitative data from experimental psychology. Currently, however, these methods lack an established and accessible software framework. Many existing implementations consist of small code snippets or sets of packages and provide no guidelines. In addition, using existing packages often requires advanced programming experience. PredPsych is a user-friendly toolbox based on predictive machine learning algorithms. It offers multiple functions for multivariate analysis of quantitative behavioral data based on machine learning models.
Traditional methods for analyzing single-cell RNA-seq datasets focus solely on gene expression; this package introduces a novel approach that goes beyond this limitation. Using Gene Ontology terms as features, the package enables functional profiling of cell populations and comparison within and between datasets from the same or different species. Our approach enables the discovery of previously unrecognized functional similarities and differences between cell types and has successfully identified functional correspondence between cell types even in evolutionarily distant species.
For multiple ranked input lists (full or partial) representing the same set of N objects, the package TopKLists <doi:10.1515/sagmb-2014-0093> offers (1) statistical inference on the lengths of informative top-k lists, (2) stochastic aggregation of full or partial lists, and (3) graphical tools for the statistical exploration of input lists, and for the visualization of aggregation results. Note that RGtk2 and gWidgets2RGtk2 have been archived on CRAN. See <https://github.com/pievos101/TopKLists> for installation instructions.
coMethDMR identifies genomic regions associated with continuous phenotypes by optimally leveraging covariation among CpGs within predefined genomic regions. Instead of testing all CpGs within a genomic region, coMethDMR first carries out an additional step that selects co-methylated sub-regions without using any outcome information. Next, coMethDMR tests the association between methylation within each sub-region and the continuous phenotype using a random-coefficient mixed-effects model, which simultaneously models variation between CpG sites within the region and differential methylation.
This package implements persistent row and column annotations for R matrices. The annotations associated with rows and columns are preserved after subsetting, transposition, and various other matrix-specific operations. The intended use case is storing and manipulating genomic datasets, which typically consist of a matrix of measurements (such as gene expression values) together with row annotations (e.g., genomic locations) and column annotations (e.g., metadata about collected samples). However, annmatrix objects are also expected to be useful in many other contexts.
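The core idea of annotations that survive subsetting and transposition can be sketched in a few lines of Python. This is a hypothetical `AnnMatrix` class written for illustration, not annmatrix's R implementation; it assumes annotations are stored as a dict mapping each annotation name to a list with one entry per row (or column).

```python
class AnnMatrix:
    """Minimal matrix whose row and column annotations stay in sync with the data.

    data is a list of rows; row_ann and col_ann map annotation names to lists
    with one entry per row or per column, respectively.
    """

    def __init__(self, data, row_ann, col_ann):
        self.data = data
        self.row_ann = row_ann
        self.col_ann = col_ann

    def subset(self, rows, cols):
        """Return a new AnnMatrix restricted to the given row and column indices."""
        data = [[self.data[i][j] for j in cols] for i in rows]
        row_ann = {name: [vals[i] for i in rows] for name, vals in self.row_ann.items()}
        col_ann = {name: [vals[j] for j in cols] for name, vals in self.col_ann.items()}
        return AnnMatrix(data, row_ann, col_ann)

    def t(self):
        """Transpose: the data is flipped and row/column annotations swap roles."""
        data = [list(col) for col in zip(*self.data)]
        return AnnMatrix(data, dict(self.col_ann), dict(self.row_ann))


# Two genes (rows) measured in three samples (columns).
m = AnnMatrix([[1, 2, 3], [4, 5, 6]],
              row_ann={"chr": ["1", "2"]},
              col_ann={"sample": ["s1", "s2", "s3"]})
sub = m.subset(rows=[1], cols=[0, 2])   # keeps chr "2" and samples s1, s3
flipped = m.t()                         # samples become rows
```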
Enrichment strategies play a critical role in modern clinical trial design, especially as precision medicine advances the focus on patient-specific efficacy. Recent developments in enrichment design have introduced biomarker randomness and accounted for the correlation structure between treatment effect and biomarker, resulting in a two-stage threshold enrichment design. We propose novel two-stage enrichment designs capable of handling two or more continuous biomarkers. See Zhang, F. and Gou, J. (2025). Using multiple biomarkers for patient enrichment in two-stage clinical designs. Technical Report.
Analysis of trade in value added with international input-output tables. Includes commands for easy data extraction, matrix manipulation, decomposition of value added in gross exports and calculation of value added indicators, with full geographical and sector customization. Decomposition methods include Borin and Mancini (2023) <doi:10.1080/09535314.2022.2153221>, Miroudot and Ye (2021) <doi:10.1080/09535314.2020.1730308>, Wang et al. (2013) <https://econpapers.repec.org/paper/nbrnberwo/19677.htm> and Koopman et al. (2014) <doi:10.1257/aer.104.2.459>.
This package provides a handy tool to calculate carbon footprints from air travel based on three-letter International Air Transport Association (IATA) airport codes or latitude and longitude. footprint first calculates the great-circle distance between departure and arrival destinations. It then uses the Department of Environment, Food & Rural Affairs (DEFRA) greenhouse gas conversion factors for business air travel to estimate the carbon footprint. These conversion factors consider trip length, flight class (e.g. economy, business), and emissions metric (e.g. carbon dioxide equivalent, methane).
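The two steps described above (great-circle distance, then multiplication by a conversion factor) can be sketched in Python as follows. This is an illustrative sketch, not the footprint package's R API. It assumes the haversine formula with a mean Earth radius of 6371 km, and the conversion factor used below is a hypothetical placeholder, not a published DEFRA value; real factors depend on the reporting year, haul length, cabin class, and emissions metric.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def flight_footprint_kg(distance_km, factor_kg_per_pkm):
    """Emissions estimate = distance x conversion factor (kg per passenger-km)."""
    return distance_km * factor_kg_per_pkm


# Approximate coordinates of LHR and JFK.
d = great_circle_km(51.4700, -0.4543, 40.6413, -73.7781)
# HYPOTHETICAL factor for illustration only; substitute the appropriate DEFRA value.
print(round(d), "km,", round(flight_footprint_kg(d, 0.15)), "kg")
```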
This package provides several helper functions for working with knitr and LaTeX. It includes xTab for creating traditional LaTeX tables, lTab for generating longtable environments, and sTab for generating supertabular environments. Additionally, this package contains a knitr_setup() function, which fixes a well-known bug in knitr that distorts the results="asis" chunk option when it is used in conjunction with user-defined commands, and a com chunk option (<<com=TRUE>>=), which renders the output from knitr as a LaTeX command.
We introduce factor models designed to jointly analyze high-dimensional count data from multiple studies by extracting study-shared and study-specific factors. Our factor models account for heterogeneous noise and overdispersion among counts with augmented covariates. We propose an efficient variational estimation procedure for estimating model parameters, along with a novel criterion for selecting the optimal number of factors and the rank of the regression coefficient matrix. For more details, see Liu et al. (2024) <doi:10.48550/arXiv.2402.15071>.
There are two functions, meta2d and meta3d, for detecting rhythmic signals in time-series datasets. For time-series datasets without individual information, meta2d is recommended; it can incorporate multiple methods (ARSER, JTK_CYCLE, and Lomb-Scargle) to detect rhythms of interest. For time-series datasets with individual information, meta3d is recommended; it uses any one of these three methods to analyze the data individual by individual and returns integrated values based on the analysis results for each individual.
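Of the three methods named above, Lomb-Scargle is the simplest to write out. The following Python sketch computes the classical normalized Lomb-Scargle periodogram over a grid of candidate periods; it is an illustration of the textbook formula, not MetaCycle's R implementation, and it does not include the p-value computation or the integration with ARSER and JTK_CYCLE that meta2d performs. The function name and the sampling design in the usage example are assumptions.

```python
import math


def lomb_scargle(t, y, periods):
    """Classical normalized Lomb-Scargle power for each candidate period."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    dy = [v - mean for v in y]
    powers = []
    for p in periods:
        w = 2 * math.pi / p
        # The time offset tau makes the cosine and sine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc = sum(d * ci for d, ci in zip(dy, c))
        ys = sum(d * si for d, si in zip(dy, s))
        powers.append((yc ** 2 / sum(ci * ci for ci in c)
                       + ys ** 2 / sum(si * si for si in s)) / (2 * var))
    return powers


# A noiseless 24-h rhythm sampled every 2 h over 48 h.
t = list(range(0, 48, 2))
y = [math.sin(2 * math.pi * ti / 24) for ti in t]
periods = list(range(10, 41))
powers = lomb_scargle(t, y, periods)
best = periods[powers.index(max(powers))]
```

Because Lomb-Scargle does not require evenly spaced samples, the same function can be applied to irregular sampling times.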
In biomedical studies, researchers are often interested in assessing the association between one or more ordinal explanatory variables and an outcome variable while adjusting for covariates of any type. The outcome variable may be continuous, binary, or represent censored survival times. In the absence of precise knowledge of the response function, using monotonicity constraints on the ordinal variables improves efficiency in estimating parameters, especially when sample sizes are small. This package implements an active set algorithm that efficiently computes such estimators.
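To show what a monotonicity constraint on an ordinal variable does to the estimates, here is a Python sketch of the pool-adjacent-violators algorithm (PAVA). Note that this is a swapped-in, simpler technique: PAVA solves only the special case of a single ordered sequence with no covariates and no censoring, whereas the package uses an active set algorithm for the general regression setting. The function name `pava` and the weighting by group size are illustrative assumptions.

```python
def pava(y, w=None):
    """Weighted least-squares non-decreasing fit via pool-adjacent-violators.

    y could be, for example, outcome means at successive levels of an ordinal
    variable, with w the number of observations at each level.
    """
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, number of levels]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool adjacent blocks backwards while the ordering constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


# Level means 1, 3, 2, 4 violate monotonicity at levels 2-3, which get pooled.
print(pava([1, 3, 2, 4]))
```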
Offers tools to estimate and visualize levels of major pollutants (CO, NO2, SO2, Ozone, PM2.5 and PM10) across the conterminous United States for user-defined time ranges. Provides functions to retrieve pollutant data from the U.S. Environmental Protection Agency's Air Quality System (AQS) API service <https://aqs.epa.gov/aqsweb/documents/data_api.html> for interactive visualization through a shiny application, allowing users to explore pollutant levels for a given location over time relative to the National Ambient Air Quality Standards (NAAQS).
Given bincount data from single-cell copy number profiling (segmented or unsegmented), estimates ploidy and uses the ploidy estimate to scale the data to absolute copy numbers. The method uses the modular quantogram proposed by Kendall (1986) <doi:10.1002/0471667196.ess2129.pub2>, modified to weight segments according to confidence, and quantifies confidence in the estimate using a theoretical quantogram. Includes optional fused-lasso segmentation with the algorithm in Johnson (2013) <doi:10.1080/10618600.2012.681238>, using the implementation from glmgen by Arnold, Sadhanala, and Tibshirani.
Transforms long data into matrix form for easy input into modelling packages for regression, principal components, imputation, or machine learning. It does this by pivoting on user-defined columns and generating a key-value table for variable names to ensure that one-to-one mappings are preserved. It is particularly useful when the indicator names in the columns are long descriptive strings, for example "Energy imports, net (% of energy use)". High-level analysis wrapper functions for correlation and principal components analysis are provided.
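The pivot-plus-key-table step can be sketched in Python as follows. This is not the package's R API: the function name `long_to_matrix`, the short keys V1, V2, …, and the choice to raise an error on duplicate (entity, indicator) pairs are illustrative assumptions. Missing combinations are left as `None`.

```python
def long_to_matrix(rows, id_col, name_col, value_col):
    """Pivot long records into a matrix whose columns use short keys.

    Returns (ids, keys, matrix, key_table); key_table maps each short key
    (V1, V2, ...) back to the original, possibly long, indicator name.
    """
    names = sorted({r[name_col] for r in rows})
    key_table = {f"V{i + 1}": name for i, name in enumerate(names)}
    name_to_key = {name: key for key, name in key_table.items()}
    keys = list(key_table)
    col_index = {key: j for j, key in enumerate(keys)}
    ids = sorted({r[id_col] for r in rows})
    cells = {i: [None] * len(keys) for i in ids}
    for r in rows:
        j = col_index[name_to_key[r[name_col]]]
        # Each (entity, indicator) pair must appear once to keep the mapping one-to-one.
        if cells[r[id_col]][j] is not None:
            raise ValueError(f"duplicate entry for {r[id_col]!r}, {r[name_col]!r}")
        cells[r[id_col]][j] = r[value_col]
    return ids, keys, [cells[i] for i in ids], key_table


rows = [
    {"country": "A", "indicator": "Energy imports, net (% of energy use)", "value": 1.0},
    {"country": "A", "indicator": "GDP", "value": 2.0},
    {"country": "B", "indicator": "GDP", "value": 3.0},
]
ids, keys, matrix, key_table = long_to_matrix(rows, "country", "indicator", "value")
```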
Standard and extensible Eddy-Covariance data post-processing (Wutzler et al. (2018) <doi:10.5194/bg-15-5015-2018>) includes uStar-filtering, gap-filling, and flux-partitioning. The Eddy-Covariance (EC) micrometeorological technique quantifies continuous exchange fluxes of gases, energy, and momentum between an ecosystem and the atmosphere. It is important for understanding ecosystem dynamics and upscaling exchange fluxes (Aubinet et al. (2012) <doi:10.1007/978-94-007-2351-1>). This package takes pre-processed (half-)hourly data as input and supports further processing. First, quality checking and filtering are performed based on the relationship between measured flux and friction velocity (uStar) to discard biased data (Papale et al. (2006) <doi:10.5194/bg-3-571-2006>). Second, gaps in the data are filled using information on environmental conditions (Reichstein et al. (2005) <doi:10.1111/j.1365-2486.2005.001002.x>). Third, the net flux of carbon dioxide is partitioned into its gross fluxes into and out of the ecosystem using night-time-based and day-time-based approaches (Lasslop et al. (2010) <doi:10.1111/j.1365-2486.2009.02041.x>).
This package provides statistical tests for label-free LC-MS/MS data based on spectral counts, to discover differentially expressed proteins between two biological conditions. Three tests are available: Poisson GLM regression, quasi-likelihood GLM regression, and the negative binomial model from the edgeR package. All three models admit blocking factors to control for nuisance variables. To ensure a good level of reproducibility, a post-test filter is available, in which we may set the minimum effect size considered biologically relevant and the minimum expression in the most abundant condition.
The package provides functions to create and use transcript-centric annotation databases/packages. The annotations for the databases are fetched directly from Ensembl using its Perl API. The functionality and data are similar to those of the TxDb packages from the GenomicFeatures package, but, in addition to retrieving all gene/transcript models and annotations from the database, the ensembldb package also provides a filter framework for retrieving annotations for specific entries, such as genes encoded on a chromosome region or transcript models of lincRNA genes.
This package provides a few functions that serve three purposes. First, simulate kin-pair data under the assumption that every trait is affected by genetic effects (A), common environmental effects (C), and unique environmental effects (E). Second, fit an ACE model to kin-pair data and report the model fit. Third, calculate the power of the A estimate under given conditions. For details of the power calculation, see Visscher (2004) <doi:10.1375/twin.7.5.505>.
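The simulation step can be illustrated with a short Python sketch of twin pairs under an ACE model, assuming standardized components with a2 + c2 + e2 = 1 and a genetic correlation of 1.0 for MZ and 0.5 for DZ pairs. As a swapped-in shortcut for the package's model fitting, the sketch recovers the components with Falconer's formulas from the MZ and DZ correlations; it does not reproduce the package's estimation or power calculation, and all function names are hypothetical.

```python
import math
import random


def simulate_pairs(n_pairs, a2, c2, e2, relatedness, rng):
    """Simulate trait values for n_pairs of relatives under an ACE model.

    relatedness is the genetic correlation within a pair (1.0 MZ, 0.5 DZ);
    a2 + c2 + e2 is assumed to equal 1 so the trait has unit variance.
    """
    pairs = []
    for _ in range(n_pairs):
        shared_a = rng.gauss(0, 1)  # genetic component shared within the pair
        shared_c = rng.gauss(0, 1)  # common environment, fully shared
        values = []
        for _ in range(2):
            # Mix shared and individual genetic parts so Cor(a1, a2) = relatedness.
            a = (math.sqrt(relatedness) * shared_a
                 + math.sqrt(1 - relatedness) * rng.gauss(0, 1))
            e = rng.gauss(0, 1)  # unique environment, not shared
            values.append(math.sqrt(a2) * a + math.sqrt(c2) * shared_c + math.sqrt(e2) * e)
        pairs.append((values[0], values[1]))
    return pairs


def pearson(pairs):
    """Pearson correlation between the first and second member of each pair."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def falconer(r_mz, r_dz):
    """Falconer's estimates of (a2, c2, e2) from MZ and DZ correlations."""
    return 2 * (r_mz - r_dz), 2 * r_dz - r_mz, 1 - r_mz


rng = random.Random(1)
r_mz = pearson(simulate_pairs(20000, 0.5, 0.2, 0.3, 1.0, rng))
r_dz = pearson(simulate_pairs(20000, 0.5, 0.2, 0.3, 0.5, rng))
print(falconer(r_mz, r_dz))
```

Under this model the expected correlations are r_MZ = a2 + c2 and r_DZ = 0.5 a2 + c2, which is why Falconer's formulas recover the components up to sampling error.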
This package contains several functions for equivalence testing and practical significance testing. First, the tsti() command provides an automatic computation of three-sided testing results for a given estimate, standard error, and region of practical equivalence. For details, see Goeman, Solari, & Stijnen (2010) <doi:10.1002/sim.4002> and Isager & Fitzgerald (2024) <doi:10.31234/osf.io/8y925>. Second, the lddtest() command performs logarithmic density discontinuity equivalence testing for regression discontinuity designs. For reference, see Fitzgerald (2025) <doi:10.31222/osf.io/2dgrp_v1>.
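The logic of three-sided testing, as we understand it from Goeman, Solari, & Stijnen (2010), can be sketched in Python: one-sided z-tests decide whether the effect lies above, below, or inside the region of practical equivalence, and because these conclusions are mutually exclusive each test can use level alpha. This sketch assumes a normal approximation and is not the R signature or output format of tsti(); the function name and return labels are hypothetical.

```python
from statistics import NormalDist


def three_sided_test(estimate, se, lower, upper, alpha=0.05):
    """Classify an estimate relative to a region of practical equivalence [lower, upper]."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    if (estimate - upper) / se > z_crit:
        return "above"        # reject theta <= upper: effect exceeds the region
    if (estimate - lower) / se < -z_crit:
        return "below"        # reject theta >= lower: effect falls below the region
    if (estimate - lower) / se > z_crit and (estimate - upper) / se < -z_crit:
        return "equivalent"   # both one-sided tests reject: effect inside the region
    return "inconclusive"


print(three_sided_test(0.1, 0.05, -0.5, 0.5))
```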
This package provides a framework to detect Differential Item Functioning (DIF) in Generalized Partial Credit Models (GPCM) and special cases of the GPCM as proposed by Schauberger and Mair (2019) <doi:10.3758/s13428-019-01224-2>. A joint model is set up in which DIF is explicitly parametrized, and penalized likelihood estimation is used for parameter selection. A major advantage of this method, called GPCMlasso, is that several variables can be treated simultaneously and that both continuous and categorical variables can be used to detect DIF.
This package provides methods for converting series of event names to strings, finding common patterns in a group of strings, discovering featured patterns when comparing two groups of strings (along with the number and starting position of each pattern in each string), obtaining transition matrices, computing transition entropy, statistically comparing the difference between two groups of strings, and clustering string groups. Event names can be any action names or labels, such as events in log files or areas of interest (AOIs) in eye-tracking research.
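Once events are encoded as one character per event, a first-order transition matrix and a transition entropy can be computed as in the Python sketch below. This is not the package's R API, and the package's exact definition of transition entropy may differ: here it is assumed to be the Shannon entropy (in bits) of the pooled distribution of observed transitions, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def transition_matrix(sequences):
    """Row-normalized first-order transition probabilities from event strings."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: c / total for b, c in row.items()}
    return matrix


def transition_entropy(sequences):
    """Shannon entropy (bits) of the pooled distribution of observed transitions."""
    pairs = Counter(p for seq in sequences for p in zip(seq, seq[1:]))
    total = sum(pairs.values())
    return -sum((c / total) * math.log2(c / total) for c in pairs.values())


# Each character is one event, e.g. an area of interest in eye tracking.
print(transition_matrix(["ABAC", "ABB"]))
print(transition_entropy(["AB", "BA"]))
```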