Knowledge space theory by Doignon and Falmagne (1999) <doi:10.1007/978-3-642-58625-5> is a set- and order-theoretical framework that proposes mathematical formalisms to operationalize knowledge structures in a particular domain. The kstMatrix package provides basic functionality to generate, handle, and manipulate knowledge structures and knowledge spaces. In contrast to the kst package, kstMatrix uses matrix representations for knowledge structures. Furthermore, kstMatrix contains several knowledge spaces developed by the research group around Cornelia Dowling by querying experts.
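In matrix form, a knowledge structure on a finite item set can be stored as a binary matrix with one row per knowledge state and one column per item (this orientation is an assumption here). The following minimal Python sketch, which is not the kstMatrix API, checks the defining properties from knowledge space theory: a knowledge structure contains the empty state and the full item domain, and a knowledge space is additionally closed under union. All function names are hypothetical.

```python
from itertools import combinations

def states_of(matrix):
    """Collect the rows of a binary state matrix as a set of tuples."""
    return {tuple(row) for row in matrix}

def is_knowledge_structure(matrix):
    """True if the family contains the empty state and the full item domain."""
    n_items = len(matrix[0])
    states = states_of(matrix)
    return (0,) * n_items in states and (1,) * n_items in states

def is_union_closed(matrix):
    """True if the union (element-wise OR) of any two states is again a state.

    Pairwise closure suffices: any finite union can be built one pair at a time."""
    states = states_of(matrix)
    for a, b in combinations(states, 2):
        if tuple(x | y for x, y in zip(a, b)) not in states:
            return False
    return True

def is_knowledge_space(matrix):
    """A knowledge space is a knowledge structure that is closed under union."""
    return is_knowledge_structure(matrix) and is_union_closed(matrix)

# Items a, b, c as columns; each row is one knowledge state.
space = [
    [0, 0, 0],  # {}
    [1, 0, 0],  # {a}
    [0, 1, 0],  # {b}
    [1, 1, 0],  # {a, b}
    [1, 1, 1],  # {a, b, c}
]
print(is_knowledge_space(space))                  # True
print(is_knowledge_space(space[:3] + space[4:]))  # False: {a, b} is missing
```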
Lights Out is a puzzle game consisting of a grid of lights that are either on or off. Pressing any light toggles it and its adjacent lights. The goal of the game is to switch all the lights off. This package provides an interface to play the game on different board sizes, either through the command line or through a visual application. Puzzles can also be solved using the included automatic solver. View a demo online at <https://daattali.com/shiny/lightsout/>.
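The toggle rule can be sketched in a few lines. This Python sketch assumes the classic rule that "adjacent" means the four orthogonal neighbours, and it solves a board by Gaussian elimination over GF(2), a standard approach for Lights Out; the package's own solver may work differently, and the names press and solve are hypothetical.

```python
# Offsets of a cell and its four orthogonal neighbours (assumed adjacency rule).
NEIGHBOURHOOD = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))

def press(board, r, c):
    """Toggle light (r, c) and its orthogonal neighbours in place."""
    n_rows, n_cols = len(board), len(board[0])
    for dr, dc in NEIGHBOURHOOD:
        rr, cc = r + dr, c + dc
        if 0 <= rr < n_rows and 0 <= cc < n_cols:
            board[rr][cc] ^= 1

def solve(board):
    """Return a list of (row, col) presses that turns every light off, or None.

    Light i is toggled by every press in its neighbourhood, so the presses x
    must satisfy A x = b over GF(2), where b is the current board."""
    n_rows, n_cols = len(board), len(board[0])
    n = n_rows * n_cols
    # Augmented matrix: one equation per light, last column is its state.
    rows = []
    for i in range(n):
        r, c = divmod(i, n_cols)
        eq = [0] * (n + 1)
        for dr, dc in NEIGHBOURHOOD:
            rr, cc = r + dr, c + dc
            if 0 <= rr < n_rows and 0 <= cc < n_cols:
                eq[rr * n_cols + cc] = 1
        eq[n] = board[r][c]
        rows.append(eq)
    # Gauss-Jordan elimination modulo 2 (XOR replaces subtraction).
    pivot_cols, rank = [], 0
    for col in range(n):
        pivot = next((k for k in range(rank, n) if rows[k][col]), None)
        if pivot is None:
            continue  # free variable
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for k in range(n):
            if k != rank and rows[k][col]:
                rows[k] = [a ^ b for a, b in zip(rows[k], rows[rank])]
        pivot_cols.append(col)
        rank += 1
    # A leftover row reading 0 = 1 means this board cannot be cleared.
    if any(rows[k][n] for k in range(rank, n)):
        return None
    presses = [0] * n  # free variables set to 0
    for k, col in enumerate(pivot_cols):
        presses[col] = rows[k][n]
    return [divmod(i, n_cols) for i in range(n) if presses[i]]

board = [[0] * 5 for _ in range(5)]
press(board, 2, 2)  # a "plus" of five lit cells in the centre
moves = solve(board)
for r, c in moves:
    press(board, r, c)
print(all(v == 0 for row in board for v in row))  # True
```

Because each press is its own inverse and presses commute, a solution is simply a set of cells to press once. Setting the free variables to 0 picks one valid solution among possibly several.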
Alternate font rendering is useful when rendering text to novel graphics outputs where modern font rendering is not available or where bespoke text positioning is required. Bitmap and vector fonts allow for custom layout and rendering using pixel coordinates and line drawing. Formatted text is created as a data.frame of pixel coordinates (for bitmap fonts) or stroke coordinates (for vector fonts). All text can be easily previewed as a matrix or raster image. A selection of fonts is included with this package.
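As a rough illustration of the bitmap case, the following Python sketch turns a glyph stored as rows of characters into (x, y) pixel coordinates and renders them back as a character matrix for preview. The 3x5 glyph, the function names, and the use of a plain list of tuples in place of a data.frame are all hypothetical; they are not the package's fonts or API.

```python
# Hypothetical 3x5 bitmap glyph for "T"; '#' marks a lit pixel.
GLYPH_T = [
    "###",
    ".#.",
    ".#.",
    ".#.",
    ".#.",
]

def glyph_to_coords(glyph, x_offset=0):
    """Return (x, y) coordinates of lit pixels, with y = 0 on the bottom row."""
    height = len(glyph)
    coords = []
    for row_idx, row in enumerate(glyph):
        y = height - 1 - row_idx
        for col_idx, ch in enumerate(row):
            if ch == "#":
                coords.append((x_offset + col_idx, y))
    return coords

def text_to_coords(text, font, advance):
    """Lay glyphs out left to right, shifting each one by a fixed advance width."""
    coords = []
    for i, ch in enumerate(text):
        coords.extend(glyph_to_coords(font[ch], x_offset=i * advance))
    return coords

def preview(coords):
    """Render coordinates as a character matrix, top row first."""
    max_x = max(x for x, _ in coords)
    max_y = max(y for _, y in coords)
    lit = set(coords)
    return "\n".join(
        "".join("#" if (x, y) in lit else "." for x in range(max_x + 1))
        for y in range(max_y, -1, -1)
    )

coords = text_to_coords("TT", {"T": GLYPH_T}, advance=4)
print(preview(coords))
```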
Fast approximate methods for mixed logistic regression in genome-wide association studies (GWAS). Two computationally efficient methods are proposed for obtaining effect size estimates (beta) in mixed logistic regression in GWAS: the Approximate Maximum Likelihood Estimate (AMLE) and the Offset method. The Wald test obtained with AMLE is identical to the score test. Input data can be genotype matrices in PLINK format or dosages (VCF files). The methods are described in detail in Milet et al. (2020) <doi:10.1101/2020.01.17.910109>.
This package provides standardized effect decomposition (direct, indirect, and total effects) for three major structural equation modeling frameworks: 'lavaan', 'piecewiseSEM', and 'plspm'. Automatically handles zero-effect variables, generates publication-ready ggplot2 visualizations, and returns both wide-format and long-format effect tables. Supports effect filtering, multi-model object inputs, and customizable visualization parameters. For a general overview of the methods used in this package, see Rosseel (2012) <doi:10.18637/jss.v048.i02> and Lefcheck (2016) <doi:10.1111/2041-210X.12512>.
Calculate the statistical power to detect clusters using kernel-based spatial relative risk functions that are estimated using the sparr package. Details about the sparr package methods can be found in the tutorial: Davies et al. (2018) <doi:10.1002/sim.7577>. Details about kernel density estimation can be found in J. F. Bithell (1990) <doi:10.1002/sim.4780090616>. More information about relative risk functions using kernel density estimation can be found in J. F. Bithell (1991) <doi:10.1002/sim.4780101112>.
Testing and documenting code that communicates with remote servers can be painful. This package helps with writing tests for packages that use httr2. It enables testing all of the logic on the R side of the API without requiring access to the remote service, and it also allows recording real API responses to use as test fixtures. The ability to save responses and load them offline also enables writing vignettes and other dynamic documents that can be distributed without access to a live server.
The Mass Spec Query Language (MassQL) is a domain-specific language that enables users to express queries and retrieve mass spectrometry (MS) data in a way that is more natural and understandable for MS users. It is inspired by SQL and is programming-language agnostic by design. The SpectraQL package adds support for the MassQL query language to R, in particular for MS data represented by Spectra objects. Users can thus apply MassQL expressions to analyze and retrieve specific data from Spectra objects.
Create beautiful and interactive visualizations in a single function call. The data.table package is used to perform the data wrangling needed to prepare your data for the plot types you want to build, and it allows fast processing of big data. Two broad classes of plots are available: standard plots and machine learning evaluation plots. Each plot-type function offers many parameters for customizing the plots (such as faceting) and for data wrangling (such as variable transformations and aggregation).
Identify and visualize individuals with unusual association patterns of genetics and geography using the approach of Chang and Schmid (2023) <doi:10.1101/2023.04.06.535838>. It detects potential outliers that violate the isolation-by-distance assumption using the K-nearest neighbor approach. You can obtain a table of outliers with statistics and visualize unusual geo-genetic patterns on a geographical map. This is useful for landscape genomics studies to discover individuals with unusual geography and genetics associations from a large biological sample.
Recent years have seen increased interest in novel methods for analyzing quantitative data from experimental psychology. However, these methods currently lack an established and accessible software framework. Many existing implementations consist of small code snippets or sets of packages and provide no guidelines. In addition, using existing packages often requires advanced programming experience. PredPsych is a user-friendly toolbox based on machine learning predictive algorithms. It comprises multiple functionalities for multivariate analyses of quantitative behavioral data based on machine learning models.
This package implements our Bayesian phase I repeated measurement design, which accounts for multidimensional toxicity endpoints from multiple treatment cycles. The package also provides a novel design that accounts for both multidimensional toxicity endpoints and early-stage efficacy endpoints in the phase I setting. For both designs, functions are provided to recommend the next dose based on the data collected in the available patient cohorts and to simulate trial characteristics given design parameters. For details, see Yin, Jun, et al. (2017) <doi:10.1002/sim.7134>.
Traditional methods for analyzing single-cell RNA-seq datasets focus solely on gene expression; this package introduces a novel approach that goes beyond this limitation. Using Gene Ontology terms as features, the package enables functional profiling of cell populations and comparison within and between datasets from the same or different species. Our approach enables the discovery of previously unrecognized functional similarities and differences between cell types and has successfully identified functional correspondences between cell types even across evolutionarily distant species.
For multiple ranked input lists (full or partial) representing the same set of N objects, the package TopKLists <doi:10.1515/sagmb-2014-0093> offers (1) statistical inference on the lengths of informative top-k lists, (2) stochastic aggregation of full or partial lists, and (3) graphical tools for the statistical exploration of input lists, and for the visualization of aggregation results. Note that RGtk2 and gWidgets2RGtk2 have been archived on CRAN. See <https://github.com/pievos101/TopKLists> for installation instructions.
coMethDMR identifies genomic regions associated with continuous phenotypes by optimally leveraging covariation among CpGs within predefined genomic regions. Instead of testing all CpGs within a genomic region, coMethDMR first carries out an additional step that selects co-methylated sub-regions without using any outcome information. Next, coMethDMR tests the association between methylation within each sub-region and the continuous phenotype using a random coefficient mixed effects model, which simultaneously models both the variation between CpG sites within the region and differential methylation.
This package implements persistent row and column annotations for R matrices. The annotations associated with rows and columns are preserved after subsetting, transposition, and various other matrix-specific operations. The intended use case is storing and manipulating genomic datasets, which typically consist of a matrix of measurements (like gene expression values) together with annotations about rows (e.g. genomic locations) and annotations about columns (e.g. metadata about collected samples). However, annmatrix objects are also expected to be useful in various other contexts.
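The core idea, annotations that travel with their rows and columns, can be illustrated outside R. The Python class below is a hypothetical sketch of that behaviour for subsetting and transposition only; it is not the annmatrix API, and the toy values are made up for illustration.

```python
class AnnMatrix:
    """Hypothetical matrix whose row and column annotations travel with the data."""

    def __init__(self, data, row_ann, col_ann):
        if len(data) != len(row_ann):
            raise ValueError("one row annotation is needed per row")
        if any(len(row) != len(col_ann) for row in data):
            raise ValueError("one column annotation is needed per column")
        self.data = [list(row) for row in data]
        self.row_ann = list(row_ann)
        self.col_ann = list(col_ann)

    def subset(self, rows=None, cols=None):
        """Select rows/columns by index; annotations are subset the same way."""
        rows = list(range(len(self.row_ann)) if rows is None else rows)
        cols = list(range(len(self.col_ann)) if cols is None else cols)
        return AnnMatrix(
            [[self.data[i][j] for j in cols] for i in rows],
            [self.row_ann[i] for i in rows],
            [self.col_ann[j] for j in cols],
        )

    def transpose(self):
        """Swap rows and columns, and swap the annotation lists with them."""
        return AnnMatrix(
            [[row[j] for row in self.data] for j in range(len(self.col_ann))],
            self.col_ann,
            self.row_ann,
        )

# Toy genes x samples matrix with row (gene) and column (sample) annotations.
expr = AnnMatrix(
    [[5.1, 3.2, 7.8], [0.4, 1.1, 0.9]],
    [{"gene": "g1", "chr": "1"}, {"gene": "g2", "chr": "7"}],
    [{"sample": "s1"}, {"sample": "s2"}, {"sample": "s3"}],
)
sub = expr.subset(rows=[1], cols=[0, 2])
print(sub.data, sub.row_ann, sub.col_ann)
```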
Enrichment strategies play a critical role in modern clinical trial design, especially as precision medicine advances the focus on patient-specific efficacy. Recent developments in enrichment design have introduced biomarker randomness and accounted for the correlation structure between treatment effect and biomarker, resulting in a two-stage threshold enrichment design. We propose novel two-stage enrichment designs capable of handling two or more continuous biomarkers. See Zhang, F. and Gou, J. (2025). Using multiple biomarkers for patient enrichment in two-stage clinical designs. Technical Report.
Analysis of trade in value added with international input-output tables. Includes commands for easy data extraction, matrix manipulation, decomposition of value added in gross exports and calculation of value added indicators, with full geographical and sector customization. Decomposition methods include Borin and Mancini (2023) <doi:10.1080/09535314.2022.2153221>, Miroudot and Ye (2021) <doi:10.1080/09535314.2020.1730308>, Wang et al. (2013) <https://econpapers.repec.org/paper/nbrnberwo/19677.htm> and Koopman et al. (2014) <doi:10.1257/aer.104.2.459>.
This package provides a handy tool to calculate carbon footprints from air travel based on three-letter International Air Transport Association (IATA) airport codes or latitude and longitude. footprint first calculates the great-circle distance between the departure and arrival locations. It then uses the Department for Environment, Food & Rural Affairs (DEFRA) greenhouse gas conversion factors for business air travel to estimate the carbon footprint. These conversion factors consider trip length, flight class (e.g. economy, business), and emissions metric (e.g. carbon dioxide equivalent, methane).
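The distance step can be sketched with the haversine formula on a spherical Earth, one common way to compute great-circle distance. The Earth radius, the function names, and passing the emission factor in as a caller-supplied parameter are assumptions for illustration; the airport-code lookup is omitted, and no DEFRA factor values are reproduced here.

```python
import math

# Mean Earth radius in km; an assumption, the package may use a different value.
EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance between two points (decimal degrees), haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to guard against rounding pushing a slightly above 1.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))

def flight_emissions(distance_km, factor_per_km):
    """Emissions = distance x conversion factor.

    factor_per_km is a placeholder: look up the DEFRA factor for the relevant
    trip length, flight class, and emissions metric; no value is assumed here."""
    return distance_km * factor_per_km

print(round(great_circle_km(0, 0, 0, 90)))  # 10008 (a quarter of the equator)
```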
We introduce factor models designed to jointly analyze high-dimensional count data from multiple studies by extracting study-shared and study-specific factors. Our factor models account for heterogeneous noise and overdispersion among counts, with augmented covariates. We propose an efficient variational estimation procedure for estimating model parameters, along with a novel criterion for selecting the optimal number of factors and the rank of the regression coefficient matrix. For more details, see Liu et al. (2024) <doi:10.48550/arXiv.2402.15071>.
Two functions, meta2d and meta3d, detect rhythmic signals in time-series datasets. For analyzing time-series datasets without individual information, meta2d is recommended; it can incorporate multiple methods (ARSER, JTK_CYCLE, and Lomb-Scargle) in the detection of rhythms of interest. For analyzing time-series datasets with individual information, meta3d is recommended; it uses any one of these three methods to analyze the data individual by individual and reports integrated values based on the analysis results of each individual.
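Of the three methods meta2d draws on, Lomb-Scargle is the simplest to show. Below is a textbook Python implementation of the classical Lomb-Scargle periodogram evaluated at candidate periods; it is not MetaCycle's code, it returns raw power without p-values, and it does not show how meta2d integrates results across methods. Because it fits sinusoids to the observed time points, it also applies to unevenly sampled series.

```python
import math

def lomb_scargle(times, values, periods):
    """Classical Lomb-Scargle power for each candidate period."""
    mean_y = sum(values) / len(values)
    yc = [v - mean_y for v in values]  # the periodogram uses mean-centred data
    powers = []
    for period in periods:
        w = 2 * math.pi / period
        # Time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(
            sum(math.sin(2 * w * t) for t in times),
            sum(math.cos(2 * w * t) for t in times),
        ) / (2 * w)
        c = [math.cos(w * (t - tau)) for t in times]
        s = [math.sin(w * (t - tau)) for t in times]
        yc_c = sum(a * b for a, b in zip(yc, c))
        yc_s = sum(a * b for a, b in zip(yc, s))
        power = 0.5 * (yc_c ** 2 / sum(v * v for v in c)
                       + yc_s ** 2 / sum(v * v for v in s))
        powers.append(power)
    return powers

# Toy series: a 24 h cosine sampled every 2 h for 48 h.
times = [2.0 * i for i in range(24)]
values = [math.cos(2 * math.pi * t / 24) for t in times]
candidates = [16, 20, 24, 28, 32]
powers = lomb_scargle(times, values, candidates)
print(candidates[powers.index(max(powers))])  # 24
```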
In biomedical studies, researchers are often interested in assessing the association between one or more ordinal explanatory variables and an outcome variable, at the same time adjusting for covariates of any type. The outcome variable may be continuous, binary, or represent censored survival times. In the absence of precise knowledge of the response function, using monotonicity constraints on the ordinal variables improves efficiency in estimating parameters, especially when sample sizes are small. This package implements an active set algorithm that efficiently computes such estimators.
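The package itself uses an active set algorithm that handles covariates and several ordinal variables. To illustrate only what a monotonicity constraint does to estimates for a single ordinal variable without covariates, here is the pool-adjacent-violators algorithm (PAVA), a different and simpler standard algorithm for isotonic regression, written in Python; it is not the package's method.

```python
def pava(means, weights=None):
    """Weighted non-decreasing (isotonic) fit to per-level outcome means."""
    if weights is None:
        weights = [1.0] * len(means)
    blocks = []  # each block: [weighted mean, total weight, number of levels]
    for m, w in zip(means, weights):
        blocks.append([m, w, 1])
        # Pool adjacent blocks while the ordering constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for m, _, n in blocks:
        fitted.extend([m] * n)
    return fitted

# Unconstrained means for four ordered levels; levels 2 and 3 violate monotonicity.
print(pava([1.0, 3.0, 2.0, 4.0]))  # [1.0, 2.5, 2.5, 4.0]
```

Pooling the violating levels is what borrows strength across neighbouring categories, which is the intuition behind the efficiency gain the description mentions for small samples.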
Offers tools to estimate and visualize levels of major pollutants (CO, NO2, SO2, Ozone, PM2.5 and PM10) across the conterminous United States for user-defined time ranges. Provides functions to retrieve pollutant data from the U.S. Environmental Protection Agency's Air Quality System (AQS) API service <https://aqs.epa.gov/aqsweb/documents/data_api.html> for interactive visualization through a shiny application, allowing users to explore pollutant levels for a given location over time relative to the National Ambient Air Quality Standards (NAAQS).
Given bincount data from single-cell copy number profiling (segmented or unsegmented), estimates ploidy, and uses the ploidy estimate to scale the data to absolute copy numbers. Uses the modular quantogram proposed by Kendall (1986) <doi:10.1002/0471667196.ess2129.pub2>, modified by weighting segments according to confidence, and quantifying confidence in the estimate using a theoretical quantogram. Includes optional fused-lasso segmentation with the algorithm in Johnson (2013) <doi:10.1080/10618600.2012.681238>, using the implementation from glmgen by Arnold, Sadhanala, and Tibshirani.
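Kendall's cosine quantogram scores a candidate quantum q by phi(q) = sqrt(2/N) * sum_i cos(2 pi x_i / q), which is large when the observations x_i sit close to integer multiples of q. The Python sketch below is unweighted: the package's confidence weighting of segments and its theoretical quantogram are not reproduced. Treating the winning quantum as the bin count of one copy, and the toy segment values, are assumptions for illustration.

```python
import math

def cosine_quantogram(values, quantum):
    """Kendall's cosine quantogram: sqrt(2/N) * sum(cos(2*pi*x/q))."""
    n = len(values)
    return math.sqrt(2.0 / n) * sum(math.cos(2 * math.pi * x / quantum) for x in values)

def best_quantum(values, candidates):
    """Candidate quantum with the highest quantogram score."""
    return max(candidates, key=lambda q: cosine_quantogram(values, q))

# Toy segment means (bin counts); one copy is assumed to be 15 counts per bin.
segments = [30.0, 45.0, 60.0, 90.0]
candidates = [10 + 0.5 * k for k in range(61)]  # 10.0, 10.5, ..., 40.0
q = best_quantum(segments, candidates)
copy_numbers = [x / q for x in segments]
print(q, copy_numbers)  # 15.0 [2.0, 3.0, 4.0, 6.0]
```

The candidate range matters: every fraction q/k of the true quantum fits equally well (here 7.5 would tie with 15), so this sketch bounds the search below at 10.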