Simplify the loading matrix in factor models using the l1 criterion as proposed in Freyaldenhoven (2025) <doi:10.21799/frbp.wp.2020.25>. Given a data matrix, find the rotation of the loading matrix with the smallest l1-norm and/or test for the presence of local factors with the main function local_factors().
This package provides a number of testthat tests that can be used to verify that tidy(), glance() and augment() methods meet consistent specifications. This allows methods for the same generic to be spread across multiple packages, since all of those packages can make the same guarantees to users about returned objects.
Two protein complex-based group regression models (PCLasso and PCLasso2) for risk protein complex identification. PCLasso is a prognostic model that identifies risk protein complexes associated with survival. PCLasso2 is a classification model that identifies risk protein complexes associated with classes. For more information, see Wang and Liu (2021) <doi:10.1093/bib/bbab212>.
Generation of multiple count, binary and ordinal variables simultaneously given the marginal characteristics and association structure. Throughout the package, the word Poisson is used to imply count data under the assumption of Poisson distribution. The details of the method are explained in Amatya, A. and Demirtas, H. (2015) <DOI:10.1080/00949655.2014.953534>.
Load and export SomaScan data via the Standard BioTools, Inc. structured text file called an ADAT ('*.adat'). For file format see <https://github.com/SomaLogic/SomaLogic-Data/blob/main/README.md>. The package also exports auxiliary functions for manipulating, wrangling, and extracting relevant information from an ADAT object once in memory.
This package provides a helper function to bulk read SQL code from separate files and load it into an R list, where the list elements contain the individual statements and queries as strings. This works by annotating the SQL code with a name comment, which also becomes the name of the list element.
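The name-comment idea can be illustrated outside R. The sketch below is a hypothetical Python analogue, not the package's implementation: the `-- name: <identifier>` annotation syntax and the function names `parse_named_sql` and `read_sql_files` are assumptions made for illustration, since the description does not specify the exact comment format.

```python
import re

# Assumed annotation syntax (hypothetical): a line of the form "-- name: <identifier>".
NAME_COMMENT = re.compile(r"^\s*--\s*name:\s*(\w+)\s*$")

def parse_named_sql(text):
    """Split SQL text into {name: statement} using the assumed name comments."""
    statements = {}
    current = None
    for line in text.splitlines():
        match = NAME_COMMENT.match(line)
        if match:
            # A name comment starts a new named statement.
            current = match.group(1)
            statements[current] = []
        elif current is not None:
            # Lines before the first name comment are ignored.
            statements[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in statements.items()}

def read_sql_files(paths):
    """Bulk read several SQL files into one dictionary of named statements."""
    queries = {}
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            queries.update(parse_named_sql(handle.read()))
    return queries
```

Calling `parse_named_sql` on a string containing two annotated statements returns a dictionary with two entries keyed by the annotated names, mirroring the named list described above.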
Utilizing the OpenAI API as the back end (<https://platform.openai.com/docs/api-reference>), TheOpenAIR offers R wrapper functions for the ChatGPT endpoint and several high-level functions that enable the integration of ChatGPT capabilities in diverse data-related tasks, such as data cleansing and automated analytics script generation.
Statistical exploration of textual corpora using several methods from the French Textometrie (the new name of Lexicometrie) and French Data Analysis schools. It includes methods for exploring the irregularity of the distribution of lexicon features across text sets or parts of texts (Specificity analysis), multi-dimensional exploration (Factorial analysis), etc. Those methods are used in the TXM software.
This package provides functions to scale, log-transform and fit linear models within a tidyverse-style R code framework. Intended to smooth over inconsistencies in the output of base R statistical functions, allowing ease of teaching, learning and daily use. Inspired by the tidy principles used in broom; see Robinson (2017) <doi:10.21105/joss.00341>.
This package comprises a set of pretrained machine learning models to predict basic immune cell types. This makes it possible to quickly obtain a first annotation of the cell types present in a dataset without requiring prior knowledge. The package also lets you train your own models to predict new cell types based on specific research needs.
This package provides high-level functions for reading Affy .CEL files and phenotypic data, and then computing simple things with them, such as t-tests, fold changes and the like. It makes heavy use of the affy library. It also has some basic scatter plot functions and mechanisms for generating high-resolution journal figures.
This package provides a model agnostic tool for decomposition of predictions from black boxes. It supports additive attributions and attributions with interactions. The Break Down Table shows contributions of every variable to a final prediction. The Break Down Plot presents variable contributions in a concise graphical way. This package works for classification and regression models.
Pry Doc is a Pry REPL plugin. It provides extended documentation support for the REPL by improving the show-doc and show-source commands. With the help of the plugin, the commands are able to display the source code and the docs of Ruby methods and classes implemented in C.
This package implements the methodology of "Cannings, T. I. and Samworth, R. J. (2017) Random-projection ensemble classification, J. Roy. Statist. Soc., Ser. B. (with discussion), 79, 959--1035". The random projection ensemble classifier is a general method for classification of high-dimensional data, based on careful combination of the results of applying an arbitrary base classifier to random projections of the feature vectors into a lower-dimensional space. The random projections are divided into non-overlapping blocks, and within each block the projection yielding the smallest estimate of the test error is selected. The random projection ensemble classifier then aggregates the results of applying the base classifier on the selected projections, with a data-driven voting threshold to determine the final assignment.
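The block-select-aggregate procedure described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the package's code: it uses Gaussian random projections, a nearest-centroid stand-in as the base classifier, the resubstitution (training) error as the test-error estimate, and a grid search over the voting threshold. All function names are hypothetical.

```python
import random

def project(A, x):
    """Apply the d x p projection matrix A to a length-p feature vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def fit_centroids(Z, y):
    """Stand-in base classifier: one centroid per class (labels 0 and 1)."""
    cents = {}
    for label in (0, 1):
        rows = [z for z, t in zip(Z, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def predict_centroid(cents, z):
    """Assign z to the class with the nearest centroid."""
    dist = {k: sum((a - b) ** 2 for a, b in zip(c, z)) for k, c in cents.items()}
    return 0 if dist[0] <= dist[1] else 1

def vote_fraction(selected, x):
    """Fraction of selected projections whose base classifier votes for class 1."""
    return sum(predict_centroid(c, project(A, x)) for A, c in selected) / len(selected)

def rp_ensemble_fit(X, y, d=2, n_blocks=10, block_size=10, seed=0):
    rng = random.Random(seed)
    p = len(X[0])
    selected = []
    for _ in range(n_blocks):
        best = None
        for _ in range(block_size):
            # Assumption: Gaussian entries; other projection distributions are possible.
            A = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(d)]
            Z = [project(A, x) for x in X]
            cents = fit_centroids(Z, y)
            # Assumption: resubstitution error stands in for the test-error estimate.
            err = sum(predict_centroid(cents, z) != t for z, t in zip(Z, y)) / len(y)
            if best is None or err < best[0]:
                best = (err, A, cents)
        # Keep only the best projection of this block.
        selected.append((best[1], best[2]))
    # Data-driven voting threshold: minimise the ensemble's training error on a grid.
    votes = [vote_fraction(selected, x) for x in X]
    def train_error(alpha):
        return sum((v >= alpha) != bool(t) for v, t in zip(votes, y))
    alpha = min((k / 20 for k in range(1, 20)), key=train_error)
    return selected, alpha

def rp_ensemble_predict(model, x):
    """Predict class 1 when the vote fraction reaches the fitted threshold."""
    selected, alpha = model
    return int(vote_fraction(selected, x) >= alpha)
```

Here `X` is a list of equal-length feature vectors and `y` a list of 0/1 labels containing both classes; any base classifier could replace the nearest-centroid stand-in.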
Data on the first 24 seasons of the UK TV show I'm a Celebrity, Get Me Out of Here, broadcast from 2002 to 2024. Taken from the Wikipedia pages for each season and the main page available at <https://en.wikipedia.org/wiki/I%27m_a_Celebrity...Get_Me_Out_of_Here!_(British_TV_series)>.
It helps in the development of a principal component analysis based composite index by assigning weights to variables and combining the weighted variables. For method details see Sendhil, R., Jha, A., Kumar, A. and Singh, S. (2018) <doi:10.1016/j.ecolind.2018.02.053> and Wu, T. (2021) <doi:10.1016/j.ecolind.2021.108006>.
Three general demographic decomposition methods: Pseudo-continuous decomposition proposed by Horiuchi, Wilmoth, and Pletcher (2008) <doi:10.1353/dem.0.0033>, stepwise replacement decomposition proposed by Andreev, Shkolnikov and Begun (2002) <doi:10.4054/DemRes.2002.7.14>, and lifetable response experiments proposed by Caswell (1989) <doi:10.1016/0304-3800(89)90019-7>.
This package provides a unified framework for building Area Deprivation Index (ADI), Social Vulnerability Index (SVI), and Neighborhood Deprivation Index (NDI) deprivation measures and accessing related data from the U.S. Census Bureau such as Gini coefficient data. Tools are also available for calculating percentiles, quantiles, and for creating clear map breaks for data visualization.
For multiple full/partial ranking lists, the R package ExtMallows can (1) detect whether the input ranking lists are over-correlated, (2) use the Mallows model or extended Mallows model to integrate the ranking lists, and (3) use the hierarchical extended Mallows model for rank integration if there are groups of over-correlated ranking lists.
Downloads a satellite image via ESRI and maptiles (these are originally from a variety of aerial photography sources), translates the image into a perceptually uniform color space, runs one of a few different clustering algorithms on the colors in the image searching for a user-supplied number of colors, and returns the resulting color palette.
This package provides efficient geospatial thinning algorithms to reduce the density of coordinate data while maintaining spatial relationships. Implements K-D Tree and brute-force distance-based thinning, as well as grid-based and precision-based thinning methods. For more information on the methods, see Elseberg et al. (2012) <https://hdl.handle.net/10446/86202>.
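Two of the thinning strategies named above can be sketched as follows. This is a minimal illustration, not the package's code: it assumes planar coordinates with Euclidean distance (real geospatial data would typically need great-circle distances), and the function names `thin_by_grid` and `thin_by_distance` are hypothetical.

```python
import math

def thin_by_grid(points, cell_size):
    """Grid-based thinning: keep the first point encountered in each square cell."""
    kept = {}
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        kept.setdefault(cell, (x, y))
    return list(kept.values())

def thin_by_distance(points, min_dist):
    """Greedy brute-force thinning: keep a point only if it lies at least
    min_dist away from every point kept so far (O(n^2) comparisons)."""
    kept = []
    for p in points:
        if all(math.dist(p, q) >= min_dist for q in kept):
            kept.append(p)
    return kept
```

The brute-force variant compares each candidate against all kept points; a K-D tree, as mentioned in the description, replaces that inner scan with a faster neighbor query without changing the idea.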
Consider the linear mixed model with normal random effects. A typical method for solving Henderson's Mixed Model Equations (HMME) is recursive estimation of the fixed effects and random effects. We provide a fast, stable, and scalable solver for the HMME that does not compute a matrix inverse. See Kim (2017) <arXiv:1710.09663> for more details.
Ternary plots made simple. This package allows you to create ternary plots using graphics. It provides functions to display the data in the ternary space, to add or tune graphical elements and to display statistical summaries. It also includes common ternary diagrams which are useful for the archaeologist (e.g. soil texture charts, ceramic phase diagrams).
Implementation of the Extremum Surface Estimator (ESE) and Extremum Distance Estimator (EDE) methods to identify the inflection point of a curve. Christopoulos, DT (2014) <doi:10.48550/arXiv.1206.5478>. Christopoulos, DT (2016) <https://demovtu.veltech.edu.in/wp-content/uploads/2016/04/Paper-04-2016.pdf>. Christopoulos, DT (2016) <doi:10.2139/ssrn.3043076>.
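As a rough illustration of the distance-based idea, the sketch below estimates the inflection point as the midpoint between the two points of a sampled curve that lie farthest, on either side, from the chord joining the curve's endpoints. This is a simplified reading of EDE written for illustration only; the function name `ede_inflection` is hypothetical, and the cited papers remain the authority on the exact definition and its assumptions.

```python
import math

def ede_inflection(xs, ys):
    """Midpoint of the abscissae where the signed distance from the
    endpoint-to-endpoint chord is smallest and largest."""
    x1, y1, x2, y2 = xs[0], ys[0], xs[-1], ys[-1]
    length = math.hypot(x2 - x1, y2 - y1)
    # Signed perpendicular distance of every sample from the chord.
    signed = [
        ((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / length
        for x, y in zip(xs, ys)
    ]
    i_min = min(range(len(xs)), key=signed.__getitem__)
    i_max = max(range(len(xs)), key=signed.__getitem__)
    return (xs[i_min] + xs[i_max]) / 2

# Usage: a logistic curve whose true inflection point is at x = 1.
xs = [-4 + 0.01 * i for i in range(1001)]
ys = [1 / (1 + math.exp(-(x - 1))) for x in xs]
estimate = ede_inflection(xs, ys)
```

For a sigmoid sampled on an interval that is symmetric about the inflection point, the two extrema are mirror images of each other, so the midpoint lands close to the true value.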