Statistical exploration of textual corpora using several methods from the French 'Textometrie' (the new name of 'Lexicometrie') and French Data Analysis schools. It includes methods for exploring the irregular distribution of lexical features across text sets or parts of texts (specificity analysis), multi-dimensional exploration (factorial analysis), etc. These methods are used in the TXM software.
This package implements the methodology of "Cannings, T. I. and Samworth, R. J. (2017) Random-projection ensemble classification, J. Roy. Statist. Soc., Ser. B. (with discussion), 79, 959--1035". The random projection ensemble classifier is a general method for classification of high-dimensional data, based on careful combination of the results of applying an arbitrary base classifier to random projections of the feature vectors into a lower-dimensional space. The random projections are divided into non-overlapping blocks, and within each block the projection yielding the smallest estimate of the test error is selected. The random projection ensemble classifier then aggregates the results of applying the base classifier on the selected projections, with a data-driven voting threshold to determine the final assignment.
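The block-wise selection and voting described above can be sketched in a few lines. This is a minimal illustration, not the package's implementation: the function names are hypothetical, the base classifier is assumed to be 1-nearest neighbour, the projections are plain Gaussian matrices, and the voting threshold is fixed at 0.5 rather than chosen from the data as in the paper. Labels are assumed to be 0 or 1.

```python
import random

def project(A, x):
    """Apply the d x p projection matrix A (a list of rows) to the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def nn_predict(train_Z, train_y, z):
    """Base classifier (an assumption): 1-nearest neighbour in the projected space."""
    dists = [sum((a - b) ** 2 for a, b in zip(t, z)) for t in train_Z]
    return train_y[dists.index(min(dists))]

def loo_error(Z, y):
    """Leave-one-out estimate of the base classifier's test error."""
    wrong = 0
    for i in range(len(Z)):
        wrong += nn_predict(Z[:i] + Z[i + 1:], y[:i] + y[i + 1:], Z[i]) != y[i]
    return wrong / len(Z)

def rp_ensemble_fit(X, y, d=2, n_blocks=10, block_size=5, rng=None):
    """From each block of random projections, keep the one with the lowest LOO error."""
    rng = rng or random.Random(0)
    p = len(X[0])
    selected = []
    for _ in range(n_blocks):
        best = None
        for _ in range(block_size):
            A = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(d)]
            Z = [project(A, x) for x in X]
            err = loo_error(Z, y)
            if best is None or err < best[0]:
                best = (err, A, Z)
        selected.append((best[1], best[2]))
    return selected

def rp_ensemble_predict(selected, y, x, threshold=0.5):
    """Average the 0/1 votes of the selected projections and compare to the threshold."""
    votes = [nn_predict(Z, y, project(A, x)) for A, Z in selected]
    return int(sum(votes) / len(votes) > threshold)

# Toy data: two well-separated classes in five dimensions.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 0.5) for _ in range(5)] for _ in range(20)]
X += [[data_rng.gauss(10, 0.5) for _ in range(5)] for _ in range(20)]
y = [0] * 20 + [1] * 20
model = rp_ensemble_fit(X, y)
print(rp_ensemble_predict(model, y, [0.0] * 5), rp_ensemble_predict(model, y, [10.0] * 5))
```

Selecting the best projection within each block is what separates this scheme from simply averaging over all random projections: poor projections are discarded before the vote.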
countsimQC provides functionality to create a comprehensive report comparing a broad range of characteristics across a collection of count matrices. One important use case is the comparison of one or more synthetic count matrices to a real count matrix, possibly the one underlying the simulations. However, any collection of count matrices can be compared.
DNAZooData is a data package giving programmatic access to genome assemblies and Hi-C contact matrices uniformly processed by the [DNA Zoo Consortium](https://www.dnazoo.org/). The matrices are available in the multi-resolution `.hic` format. A URL to corrected genome assemblies in `.fastq` format is also provided to the end-user.
This package provides tools for the integrative analysis of RNA-seq or microarray-based gene transcription data and histone modification data obtained by ChIP-seq. It provides methods for data preprocessing and matching as well as methods for fitting Bayesian mixture models in order to detect genes with differences in both data types.
The Lheuristic package identifies scatterplots that follow an L-shaped, negative distribution. It can be used to identify genes regulated by methylation by integrating an expression array and a methylation array. The package uses two different methods to detect L-shaped expression and methylation scatterplots. The parameters can be changed to detect other scatterplot patterns.
This package provides a collection of functions dealing with labelled data, like reading and writing data between R and other statistical software packages. This includes easy ways to get, set or change value and variable label attributes, to convert labelled vectors into factors or numeric (and vice versa), or to deal with multiple declared missing values.
This package provides functionality for client-side navigation of the server-side file system in shiny apps. If the app is running locally, this gives the user direct access to the file system without the need to "download" files to a temporary location. File selection, folder selection, and file saving are all available.
Read in activity measurements from standard file formats used by circadian rhythm researchers, currently only ClockLab format, and process and plot the data. The central type of plot is the actogram, as first described in "Activity and distribution of certain wild mice in relation to biotic communities" by MS Johnson (1926) doi:10.2307/1373575.
Data on the first 24 seasons of the UK TV show 'I'm a Celebrity, Get Me Out of Here', broadcast from 2002 to 2024. Taken from the Wikipedia pages for each season and the main page available at <https://en.wikipedia.org/wiki/I%27m_a_Celebrity...Get_Me_Out_of_Here!_(British_TV_series)>.
Helps develop a composite index based on principal component analysis by assigning weights to variables and combining the weighted variables. For method details see Sendhil, R., Jha, A., Kumar, A. and Singh, S. (2018) <doi:10.1016/j.ecolind.2018.02.053> and Wu, T. (2021) <doi:10.1016/j.ecolind.2021.108006>.
This package provides a unified framework for building Area Deprivation Index (ADI), Social Vulnerability Index (SVI), and Neighborhood Deprivation Index (NDI) deprivation measures and for accessing related data from the U.S. Census Bureau, such as Gini coefficient data. Tools are also available for calculating percentiles and quantiles and for creating clear map breaks for data visualization.
Three general demographic decomposition methods: Pseudo-continuous decomposition proposed by Horiuchi, Wilmoth, and Pletcher (2008) <doi:10.1353/dem.0.0033>, stepwise replacement decomposition proposed by Andreev, Shkolnikov and Begun (2002) <doi:10.4054/DemRes.2002.7.14>, and lifetable response experiments proposed by Caswell (1989) <doi:10.1016/0304-3800(89)90019-7>.
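As an illustration of the first of these methods, the sketch below splits the change in a function between two parameter vectors into one contribution per parameter by walking a straight path between them in small steps. The function name `horiuchi_decompose` and its arguments are hypothetical, and this is only a simplified reading of the pseudo-continuous idea, not the package's code.

```python
def horiuchi_decompose(func, start, end, steps=100):
    """Split func(end) - func(start) into one contribution per parameter.

    The straight path from `start` to `end` is cut into `steps` segments.
    On each segment, every parameter is moved across the segment on its own
    while the others are held at the segment midpoint.
    """
    n = len(start)
    delta = [(e - s) / steps for s, e in zip(start, end)]
    contributions = [0.0] * n
    for k in range(steps):
        # Midpoint of the k-th segment of the path.
        mid = [s + (k + 0.5) * d for s, d in zip(start, delta)]
        for i in range(n):
            upper = list(mid)
            lower = list(mid)
            upper[i] += delta[i] / 2
            lower[i] -= delta[i] / 2
            contributions[i] += func(upper) - func(lower)
    return contributions

# Toy example: decompose the change in a product a * b.
parts = horiuchi_decompose(lambda v: v[0] * v[1], [1.0, 2.0], [3.0, 4.0])
print(parts, sum(parts))
```

In this toy case the product grows from 2 to 12, and the contributions of the two parameters sum to that change of 10.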
For multiple full or partial ranking lists, the R package ExtMallows can (1) detect whether the input ranking lists are over-correlated, (2) use the Mallows model or the extended Mallows model to integrate the ranking lists, and (3) use the hierarchical extended Mallows model for rank integration if there are groups of over-correlated ranking lists.
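For orientation, the basic Mallows model (not the extended or hierarchical variants) assigns a ranking a probability that decays exponentially with its Kendall tau distance from a central ranking. The sketch below uses hypothetical function names and normalizes by brute-force enumeration, which is only practical for a handful of items.

```python
from itertools import permutations
from math import exp

def kendall_tau_distance(r1, r2):
    """Number of item pairs that the two full rankings order differently."""
    position = {item: i for i, item in enumerate(r2)}
    seq = [position[item] for item in r1]
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )

def mallows_probability(ranking, center, theta):
    """P(ranking) proportional to exp(-theta * distance(ranking, center))."""
    def weight(r):
        return exp(-theta * kendall_tau_distance(r, center))
    # Normalizing constant by enumeration over all permutations (small n only).
    z = sum(weight(p) for p in permutations(center))
    return weight(tuple(ranking)) / z

center = ("a", "b", "c")
for perm in permutations(center):
    print(perm, round(mallows_probability(perm, center, theta=1.0), 4))
```

With `theta = 0` every ranking is equally likely; larger values concentrate the probability mass around the central ranking.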
Downloads a satellite image via ESRI and maptiles (these are originally from a variety of aerial photography sources), translates the image into a perceptually uniform color space, runs one of a few different clustering algorithms on the colors in the image searching for a user-supplied number of colors, and returns the resulting color palette.
This package provides efficient geospatial thinning algorithms to reduce the density of coordinate data while maintaining spatial relationships. Implements K-D Tree and brute-force distance-based thinning, as well as grid-based and precision-based thinning methods. For more information on the methods, see Elseberg et al. (2012) <https://hdl.handle.net/10446/86202>.
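Two of the thinning strategies named above can be sketched as follows. The function names are hypothetical, points are assumed to be `(longitude, latitude)` pairs in degrees, and the greedy keep-first rule is one simple choice among several; the package's own selection rules may differ.

```python
from math import asin, cos, floor, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(p, q):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def thin_by_distance(points, min_km):
    """Brute-force thinning: keep a point only if it is at least min_km from all kept points."""
    kept = []
    for p in points:
        if all(haversine_km(p, q) >= min_km for q in kept):
            kept.append(p)
    return kept

def thin_by_grid(points, cell_deg):
    """Grid thinning: keep the first point that falls in each cell_deg x cell_deg cell."""
    seen = set()
    kept = []
    for lon, lat in points:
        cell = (floor(lon / cell_deg), floor(lat / cell_deg))
        if cell not in seen:
            seen.add(cell)
            kept.append((lon, lat))
    return kept

points = [(0.0, 0.0), (0.001, 0.0), (1.0, 0.0), (1.0005, 0.0005), (2.0, 0.0)]
print(thin_by_distance(points, min_km=10))
print(thin_by_grid(points, cell_deg=0.5))
```

The brute-force version compares each candidate against every kept point, which is quadratic in the worst case; a K-D tree replaces that inner scan with a neighbour query.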
Consider the linear mixed model with normal random effects. A typical method for solving Henderson's Mixed Model Equations (HMME) is recursive estimation of the fixed effects and random effects. We provide a fast, stable, and scalable solver for the HMME that does not compute a matrix inverse. See Kim (2017) <arXiv:1710.09663> for more details.
An eclectic collection of short stories and poetry with topics on climate strange, connecting the geopolitical dots, the myth of us versus them, and the idiocy of war. Please refer to the COPYRIGHTS file and the text_citation.cff file for the reference copyright information and for the complete citations of the reference sources, respectively.
Ternary plots made simple. This package allows you to create ternary plots using 'graphics'. It provides functions to display the data in the ternary space, to add or tune graphical elements and to display statistical summaries. It also includes common ternary diagrams which are useful for the archaeologist (e.g. soil texture charts, ceramic phase diagram).
Implementation of the Extremum Surface Estimator (ESE) and Extremum Distance Estimator (EDE) methods to identify the inflection point of a curve. Christopoulos, DT (2014) <doi:10.48550/arXiv.1206.5478>. Christopoulos, DT (2016) <https://demovtu.veltech.edu.in/wp-content/uploads/2016/04/Paper-04-2016.pdf>. Christopoulos, DT (2016) <doi:10.2139/ssrn.3043076>.
Adjusted odds ratios conditional on potential confounders can be obtained directly from logistic regression. However, those adjusted odds ratios have been widely misinterpreted as relative risks. As relative risk is often of interest in public health, we provide simple code to return adjusted relative risks from a logistic regression model with potential confounders.
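The gap between the two measures is easy to see numerically. The sketch below assumes a fitted logistic model with one binary exposure and one confounder, and computes the relative risk as a ratio of predicted probabilities at a fixed confounder value. The coefficients are made up for illustration, the function names are hypothetical, and whether the package computes the relative risk in exactly this way is an assumption.

```python
from math import exp, log

def predicted_risk(intercept, b_exposure, b_confounder, exposure, confounder):
    """Predicted probability from a logistic model with two covariates."""
    eta = intercept + b_exposure * exposure + b_confounder * confounder
    return 1.0 / (1.0 + exp(-eta))

def adjusted_relative_risk(intercept, b_exposure, b_confounder, confounder):
    """Risk under exposure divided by risk without it, at a fixed confounder value."""
    exposed = predicted_risk(intercept, b_exposure, b_confounder, 1, confounder)
    unexposed = predicted_risk(intercept, b_exposure, b_confounder, 0, confounder)
    return exposed / unexposed

# Illustrative coefficients (not from any real fit): the odds ratio is exp(log(3)) = 3.
b_exposure = log(3)
common = adjusted_relative_risk(0.0, b_exposure, 0.5, confounder=0.0)
rare = adjusted_relative_risk(-6.0, b_exposure, 0.5, confounder=0.0)
print(f"odds ratio = {exp(b_exposure):.2f}")
print(f"relative risk, common outcome = {common:.2f}")
print(f"relative risk, rare outcome   = {rare:.2f}")
```

The odds ratio approximates the relative risk only when the outcome is rare; for a common outcome the same coefficient corresponds to a much smaller relative risk.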
This package provides a framework that allows for easy logging of changes in data. Main features: start tracking changes by adding a single line of code to an existing script. Track changes in multiple datasets, using multiple loggers. Add custom-built loggers or use loggers offered by other packages. <doi:10.18637/jss.v098.i01>.
Computational routines for estimating local Gaussian parameters. Local Gaussian parameters are useful for characterizing and testing for non-linear dependence within bivariate data. See e.g. Tjostheim and Hufthammer, Local Gaussian correlation: A new measure of dependence, Journal of Econometrics, 2013, Volume 172 (1), pages 33-48 <DOI:10.1016/j.jeconom.2012.08.001>.
Compute the coefficient of determination for outcomes in n-dimensions. May be useful for multidimensional predictions (such as a multinomial model) or calculating goodness of fit from latent variable models such as probabilistic topic models like latent Dirichlet allocation or deterministic topic models like latent semantic analysis. Based on Jones (2019) <arXiv:1911.11061>.
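One way to read this generalization is to replace the squared residuals of the usual R-squared with squared Euclidean distances between each observed vector and its prediction. The sketch below follows that reading; the function names are hypothetical and it is not taken from the package.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def r_squared_nd(observed, predicted):
    """1 - SSE / SST, with both sums built from squared Euclidean distances."""
    n = len(observed)
    dims = len(observed[0])
    # Component-wise mean of the observed vectors.
    mean = [sum(row[j] for row in observed) / n for j in range(dims)]
    sse = sum(squared_distance(o, p) for o, p in zip(observed, predicted))
    sst = sum(squared_distance(o, mean) for o in observed)
    return 1.0 - sse / sst

observed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
perfect = [row[:] for row in observed]
mean_only = [[0.5, 0.5] for _ in observed]
print(r_squared_nd(observed, perfect))    # predictions equal to the data
print(r_squared_nd(observed, mean_only))  # predictions equal to the mean vector
```

In one dimension this reduces to the familiar coefficient of determination, and as in that case the value can be negative when predictions are worse than the mean.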