scFeatures is a tool that generates multi-view representations of single-cell and spatial data through the construction of a total of 17 feature types. These features can then be used for a variety of analyses using other software in Bioconductor.
Explore Weather Research & Forecasting ('WRF') Model data from the Servicio Meteorologico Nacional (SMN), hosted on the Amazon Web Services cloud (<https://registry.opendata.aws/smn-ar-wrf-dataset/>). The package provides methods for downloading, processing and correcting the data. It also supports map handling and time-series exploration of the meteorological variables available in the WRF forecast.
This package provides tools to estimate soil organic carbon stocks and sequestration rates in blue carbon ecosystems. BlueCarbon contains functions to estimate and correct for core compaction, estimate sample thickness, estimate organic carbon content from organic matter content, estimate organic carbon stocks and sequestration rates, and visualize the error of carbon stock extrapolation.
This package provides functions for predictor pruning using association-based and model-based approaches. Includes corrPrune() for fast correlation-based pruning, modelPrune() for VIF-based regression pruning, and exact graph-theoretic algorithms (Eppstein–Löffler–Strash, Bron–Kerbosch) for exhaustive subset enumeration. Supports linear models, GLMs, and mixed models ('lme4', 'glmmTMB').
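To illustrate the general idea of correlation-based pruning, here is a minimal Python sketch. It is a simple greedy filter, not the algorithm used by corrPrune(); the function names `pearson` and `greedy_corr_prune`, the threshold default, and the keep-in-input-order rule are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def greedy_corr_prune(columns, threshold=0.7):
    """Keep predictors, in input order, whose absolute correlation with
    every already-kept predictor does not exceed `threshold`.

    `columns` maps a predictor name to its list of values (hypothetical
    input layout chosen for this sketch).
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


predictors = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],  # nearly collinear with "a"
    "c": [5, 1, 4, 2, 3],
}
selected = greedy_corr_prune(predictors, threshold=0.7)
```

A greedy pass like this depends on the order of the predictors, which is one reason exact subset-enumeration algorithms can be preferable when the result must not depend on input order.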
Model soil gas fluxes with the Flux-Gradient Method. It includes functions for data handling, a forward and an inverse model for flux modeling and methods for calibration and uncertainty estimation. For more details see Gartiser et al. (2025a) <doi:10.21105/joss.08094> and Gartiser et al. (2025b) <doi:10.1111/ejss.70126>.
It is sometimes necessary to create documentation for all files in a directory. Doing so by hand can be very tedious. This task is made fast and reproducible using the functionality of 'documenter'. It aggregates all text files in a directory and its subdirectories into a single Word document in a semi-automated fashion.
Standardizes Brazilian addresses using different criteria. Standardization methods include only basic string manipulation and do not support probabilistic matching between strings.
Implementation of a function that calculates the empirical excess mass for a given \eqn{\lambda} and a given maximal number of modes (excessm()). It offers powerful plotting features to visualize the empirical excess mass (exmplot()), including the possibility of drawing several plots (with different maximal numbers of modes or cut-off values) in a single graph.
Time-based joins to analyze sequences of events, both in memory and out of memory. after_join() joins two tables of events, while funnel_start() and funnel_step() join events in the same table. With the type argument, you can switch between different funnel types, like first-first and last-firstafter.
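As an illustration of what a time-based join does, the following Python sketch pairs each user's earliest event in one table with that user's earliest event in a second table, keeping the pair only when the second event is not earlier than the first. This is an assumed reading of a "first-first" funnel, not the package's implementation; the function name `first_first_join`, the `(user_id, timestamp)` tuple layout, and the tie-handling rule are hypothetical.

```python
def first_first_join(events_x, events_y):
    """Join the earliest x event to the earliest y event per user.

    Each input is an iterable of (user_id, timestamp) tuples (assumed
    layout). A user appears in the result only if both tables contain
    the user and the earliest y event is not before the earliest x event.
    """
    def earliest(events):
        first = {}
        for user, ts in events:
            if user not in first or ts < first[user]:
                first[user] = ts
        return first

    first_x = earliest(events_x)
    first_y = earliest(events_y)
    return {
        user: (tx, first_y[user])
        for user, tx in first_x.items()
        if user in first_y and first_y[user] >= tx
    }


landed = [("a", 1), ("a", 5), ("b", 3)]
registered = [("a", 4), ("b", 2), ("b", 6), ("c", 1)]
joined = first_first_join(landed, registered)
```

Here user "b" is dropped because the earliest registration precedes the earliest landing, even though a later registration follows it; other funnel types would treat that case differently.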
This package implements the estimators and algorithms described in Chapters 8 and 9 of the book "The Fundamentals of Heavy Tails: Properties, Emergence, and Estimation" by Nair et al. (2022, ISBN:9781009053730). These include the Hill estimator, Moments estimator, Pickands estimator, Peaks-over-Threshold (POT) method, Power-law fit, and the Double Bootstrap algorithm.
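As a small illustration of one of these estimators, here is a Python sketch of the standard Hill estimator of the tail index, computed from the k largest observations. It is not the package's code; the function name `hill_estimator` and the deterministic Pareto-quantile sample used to exercise it are assumptions made for this example.

```python
import math


def hill_estimator(data, k):
    """Hill estimate of the tail index alpha from the k largest values.

    Uses the (k+1)-th largest observation as the threshold, so k must
    satisfy 1 <= k < len(data), and that threshold must be positive.
    """
    if not 1 <= k < len(data):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    ordered = sorted(data, reverse=True)
    threshold = ordered[k]
    if threshold <= 0:
        raise ValueError("the (k+1)-th largest observation must be positive")
    # Mean log-excess over the threshold estimates 1/alpha.
    gamma = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / gamma


# Deterministic sample: quantiles of a Pareto distribution with alpha = 2.
n, alpha = 10_000, 2.0
sample = [(1.0 - (i + 0.5) / n) ** (-1.0 / alpha) for i in range(n)]
alpha_hat = hill_estimator(sample, k=500)
```

The estimate is sensitive to the choice of k, which is the problem that threshold-selection procedures such as the double bootstrap are meant to address.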
Simplify the loading matrix in factor models using the l1 criterion as proposed in Freyaldenhoven (2025) <doi:10.21799/frbp.wp.2020.25>. Given a data matrix, find the rotation of the loading matrix with the smallest l1-norm and/or test for the presence of local factors with main function local_factors().
This package provides a number of testthat tests that can be used to verify that tidy(), glance() and augment() methods meet consistent specifications. This allows methods for the same generic to be spread across multiple packages, since all of those packages can make the same guarantees to users about returned objects.
Generation of multiple count, binary and ordinal variables simultaneously given the marginal characteristics and association structure. Throughout the package, the word Poisson is used to imply count data under the assumption of Poisson distribution. The details of the method are explained in Amatya, A. and Demirtas, H. (2015) <DOI:10.1080/00949655.2014.953534>.
Two protein complex-based group regression models (PCLasso and PCLasso2) for risk protein complex identification. PCLasso is a prognostic model that identifies risk protein complexes associated with survival. PCLasso2 is a classification model that identifies risk protein complexes associated with classes. For more information, see Wang and Liu (2021) <doi:10.1093/bib/bbab212>.
This package provides a helper function to bulk read SQL code from separate files and load it into an R list, where the list elements contain the individual statements and queries as strings. This works by annotating the SQL code with a name comment, which will also be the name of the list element.
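The name-comment idea can be sketched in Python as follows. The annotation syntax `-- name: <identifier>`, the function name `parse_named_sql`, and the choice to parse a single string rather than a set of files are assumptions for illustration; the package's actual comment format is not stated here.

```python
import re

# Hypothetical annotation format: a line such as "-- name: get_users".
NAME_PATTERN = re.compile(r"^\s*--\s*name:\s*(\w+)\s*$")


def parse_named_sql(text):
    """Split SQL text into a dict mapping each annotated name to its statement.

    Lines before the first name comment are ignored.
    """
    queries = {}
    current = None
    buffer = []
    for line in text.splitlines():
        match = NAME_PATTERN.match(line)
        if match:
            if current is not None:
                queries[current] = "\n".join(buffer).strip()
            current = match.group(1)
            buffer = []
        elif current is not None:
            buffer.append(line)
    if current is not None:
        queries[current] = "\n".join(buffer).strip()
    return queries


sql_text = """
-- name: get_users
SELECT id, name
FROM users;

-- name: count_orders
SELECT COUNT(*) FROM orders;
"""
named = parse_named_sql(sql_text)
```

Reading several files would amount to calling this function on each file's contents and merging the resulting dictionaries.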
This package provides functions to scale, log-transform and fit linear models within a 'tidyverse'-style R code framework. Intended to smooth over inconsistencies in the output of base R statistical functions, allowing ease of teaching, learning and daily use. Inspired by the tidy principles used in 'broom', Robinson (2017) <doi:10.21105/joss.00341>.
Statistical exploration of textual corpora using several methods from French Textometrie (new name of 'Lexicometrie') and French Data Analysis schools. It includes methods for exploring irregularity of distribution of lexicon features across text sets or parts of texts (Specificity analysis); multi-dimensional exploration (Factorial analysis), etc. Those methods are used in the TXM software.
Utilizing the OpenAI API as the back end (<https://platform.openai.com/docs/api-reference>), TheOpenAIR offers R wrapper functions for the ChatGPT endpoint and several high-level functions that enable the integration of ChatGPT capabilities in diverse data-related tasks, such as data cleansing and automated analytics script generation.
This package implements the methodology of "Cannings, T. I. and Samworth, R. J. (2017) Random-projection ensemble classification, J. Roy. Statist. Soc., Ser. B. (with discussion), 79, 959--1035". The random projection ensemble classifier is a general method for classification of high-dimensional data, based on careful combination of the results of applying an arbitrary base classifier to random projections of the feature vectors into a lower-dimensional space. The random projections are divided into non-overlapping blocks, and within each block the projection yielding the smallest estimate of the test error is selected. The random projection ensemble classifier then aggregates the results of applying the base classifier on the selected projections, with a data-driven voting threshold to determine the final assignment.
This package provides functionality for client-side navigation of the server-side file system in shiny apps. If the app is running locally, this gives the user direct access to the file system without the need to "download" files to a temporary location. File and folder selection as well as file saving are available.
Read in activity measurements from standard file formats used by circadian rhythm researchers, currently only ClockLab format, and process and plot the data. The central type of plot is the actogram, as first described in "Activity and distribution of certain wild mice in relation to biotic communities" by MS Johnson (1926) <doi:10.2307/1373575>.
This package provides a collection of functions dealing with labelled data, like reading and writing data between R and other statistical software packages. This includes easy ways to get, set or change value and variable label attributes, to convert labelled vectors into factors or numeric (and vice versa), or to deal with multiple declared missing values.
countsimQC provides functionality to create a comprehensive report comparing a broad range of characteristics across a collection of count matrices. One important use case is the comparison of one or more synthetic count matrices to a real count matrix, possibly the one underlying the simulations. However, any collection of count matrices can be compared.
DNAZooData is a data package giving programmatic access to genome assemblies and Hi-C contact matrices uniformly processed by the [DNA Zoo Consortium](https://www.dnazoo.org/). The matrices are available in the multi-resolution `.hic` format. A URL to corrected genome assemblies in `.fastq` format is also provided to the end-user.