In Switzerland, the landscape of municipalities is changing rapidly, mainly due to mergers. The Swiss Municipal Data Merger Tool automatically detects these mutations and maps municipalities over time, i.e. municipalities of an old state to municipalities of a new state. This functionality is helpful when working with datasets that are based on different spatial references. The package's idea and use case are discussed in the following article: <doi:10.1111/spsr.12487>.
This package performs variable selection/feature reduction under a clustering or classification framework. In particular, it can be used in an automated fashion using mixture model-based methods (teigen and mclust are currently supported). It can account for mixtures of non-Gaussian distributions via the Manly transform (via ManlyMix). See Andrews and McNicholas (2014) <doi:10.1007/s00357-013-9139-2> and Neal and McNicholas (2023) <doi:10.48550/arXiv.2305.16464>.
Implementation of the weighted iterative proportional fitting (WIPF) procedure for updating/adjusting an N-dimensional array (currently N<=3) given a weight structure and some target marginals. Acknowledgements: The author wishes to thank Ministerio de Ciencia, Innovación y Universidades (grant PID2021-128228NB-I00) and Fundación Mapfre (grant 'Modelización espacial e intra-anual de la mortalidad en España. Una herramienta automática para el cálculo de productos de vida') for supporting this research.
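As a rough illustration of the idea behind iterative proportional fitting, the following Python sketch runs the classical, unweighted two-dimensional procedure. It is not the package's WIPF implementation: the weight structure is left out entirely, and the function name `ipf_2d`, its parameters and the example numbers are assumptions made for this illustration. It also assumes a strictly positive seed table and target marginals with equal totals.

```python
def ipf_2d(seed, row_targets, col_targets, max_iter=1000, tol=1e-10):
    """Classical (unweighted) IPF for a 2-D table with positive entries."""
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Scale each row so that it sums to its target marginal.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            table[i] = [v * target / s for v in table[i]]
        # Scale each column so that it sums to its target marginal.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / s
        # Columns now match exactly; stop once the rows match as well.
        err = max(abs(sum(row) - t) for row, t in zip(table, row_targets))
        if err < tol:
            break
    return table


# Hypothetical example: both sets of target marginals sum to 100.
fitted = ipf_2d([[1, 2], [3, 4]], row_targets=[40, 60], col_targets=[30, 70])
for row in fitted:
    print([round(v, 3) for v in row])
```

Because each step only rescales whole rows or whole columns, the fitted table keeps the cross-product (odds) ratios of the seed table while matching the requested marginals.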
Perform fast functional enrichment on feature lists (like genes or proteins) using the hypergeometric distribution. Tailored for speed, this package is ideal for interactive platforms such as Shiny. It supports the retrieval of functional data from sources like GO, KEGG, Reactome, Bioplanet and WikiPathways. By downloading and preparing data first, it allows for rapid successive tests on various feature selections without the need for repetitive, time-consuming preparatory steps typical of other packages.
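The hypergeometric test behind this kind of enrichment analysis can be sketched in a few lines of Python. This is a generic illustration, not the package's code: the function name `enrichment_p_value`, its argument names and the example counts are assumptions. It computes the upper-tail probability of seeing at least the observed overlap between a selection and a functional term.

```python
from math import comb


def enrichment_p_value(n_universe, n_annotated, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_universe  : number of features in the background
    n_annotated : background features annotated with the term
    n_selected  : size of the selected feature list
    n_overlap   : selected features annotated with the term
    """
    total = comb(n_universe, n_selected)
    upper = min(n_annotated, n_selected)
    hits = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return hits / total


# Hypothetical counts: 20 features, 5 annotated, 5 selected, all 5 overlap.
print(enrichment_p_value(20, 5, 5, 5))
```

Only the selection-dependent counts change between successive tests, which is why preparing the term annotations once makes repeated queries cheap.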
The SCDE package implements a set of statistical methods for analyzing single-cell RNA-seq data. SCDE fits individual error models for single-cell RNA-seq measurements. These models can then be used for assessment of differential expression between groups of cells, as well as other types of analysis. The SCDE package also contains the pagoda framework which applies pathway and gene set overdispersion analysis to identify aspects of transcriptional heterogeneity among single cells.
This package offers an implementation of the Abnormal Blood Profile Score (ABPS). The ABPS is a part of the Athlete Biological Passport program of the World Anti-Doping Agency, which combines several blood parameters into a single score in order to detect blood doping. The package also contains functions to calculate other scores used in anti-doping programs, such as the OFF-score, which combines hemoglobin and reticulocyte measurements, as well as example data.
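The OFF-score is usually published as the hemoglobin concentration minus a multiple of the square root of the reticulocyte percentage, rather than as a literal ratio. A minimal Python sketch of that commonly cited form follows. The function name `off_score`, the argument names and the unit conventions (hemoglobin in g/L, reticulocytes in percent) are assumptions for this illustration, and the package's own function may differ in interface and units.

```python
from math import sqrt


def off_score(hemoglobin_g_per_l, reticulocytes_percent):
    """OFF-score = Hb (g/L) - 60 * sqrt(reticulocytes in %)."""
    return hemoglobin_g_per_l - 60.0 * sqrt(reticulocytes_percent)


# Hypothetical sample: Hb = 150 g/L, reticulocytes = 1 %.
print(off_score(150.0, 1.0))  # 90.0
```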
The Mappable Vector Library provides a convenient way to access large datasets. Use all of your data at once, with few limits. Memory mapped data can be shared between multiple R processes. Access speed depends on the storage medium, so a solid state drive is recommended, preferably with a PCI Express (or M.2 NVMe) interface, or a fast network file system. The data is memory mapped into R and then accessed using the usual R list and array subscript operators. Convenience functions are provided for merging, grouping and indexing large vectors and data.frames. The layout of the underlying MVL files is optimized for large datasets. The vectors are stored to guarantee alignment for vector intrinsics after memory map. The package is built on top of libMVL, which can be used as a standalone C library. libMVL has a simple C API, making it easy to interchange datasets with outside programs. Large MVL datasets are distributed via Academic Torrents <https://academictorrents.com/collection/mvl-datasets>.
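The general idea of memory mapping a vector and indexing it in place can be sketched with Python's standard library. This does not read or write the MVL file format and does not use libMVL: the scratch file, its raw float64 layout and all variable names are assumptions made for this illustration.

```python
import mmap
import os
import struct
import tempfile

# Write 1000 native-endian float64 values to a scratch file.
values = [i * 0.5 for i in range(1000)]
path = os.path.join(tempfile.mkdtemp(), "vector.bin")
with open(path, "wb") as f:
    f.write(struct.pack("1000d", *values))

# Map the file and index it like an array; pages are loaded on demand.
with open(path, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    view = memoryview(mapped).cast("d")  # float64 view, no copy
    tenth = view[10]
    chunk = list(view[100:103])
    view.release()
    mapped.close()

print(tenth, chunk)  # 5.0 [50.0, 50.5, 51.0]
```

Only the pages that are actually touched are read from storage, which is why access speed tracks the storage medium rather than the total file size.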
This package provides functions to fit Accurate Generalized Linear Model (AGLM) models, visualize them, and predict for new data. AGLM is defined as a regularized GLM which applies a set of feature transformations using a discretization of numerical features and specific coding methodologies of dummy variables. For more information on AGLM, see Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa (2020) <https://www.institutdesactuaires.com/global/gene/link.php?doc_id=16273&fg=1>.
This package implements the bound constrained optimal sample size allocation (BCOSSA) framework described in Bulus & Dong (2021) <doi:10.1080/00220973.2019.1636197> for power analysis of multilevel regression discontinuity designs (MRDDs) and multilevel randomized trials (MRTs) with continuous outcomes. Minimum detectable effect size (MDES) and power computations for MRDDs allow polynomial functional form specification for the score variable (with or without interaction with the treatment indicator). See Bulus (2021) <doi:10.1080/19345747.2021.1947425>.
Written to help undergraduate as well as graduate students get started with R for basic econometrics without the need to import specific functions and datasets from many different sources. Primarily, the package is meant to accompany the German textbook Auer, L.v., Hoffmann, S., Kranz, T. (2024, ISBN: 978-3-662-68263-0), whose exercises cover all the topics from the textbook Auer, L.v. (2023, ISBN: 978-3-658-42699-6).
This package implements a Markov Chain Monte Carlo algorithm to approximate exact conditional inference for logistic regression models. Exact conditional inference is based on the distribution of the sufficient statistics for the parameters of interest given the sufficient statistics for the remaining nuisance parameters. Using model formula notation, users specify a logistic model and model terms of interest for exact inference. See Zamar et al. (2007) <doi:10.18637/jss.v021.i03> for more details.
This package provides functions to compute the Generalized Dynamic Principal Components introduced in Peña and Yohai (2016) <DOI:10.1080/01621459.2015.1072542>. The implementation includes an automatic procedure proposed in Peña, Smucler and Yohai (2020) <DOI:10.18637/jss.v092.c02> for the identification of both the number of lags to be used in the generalized dynamic principal components as well as the number of components required for a given reconstruction accuracy.
This package provides functions for quantifying evolution and selection on complex traits. The package implements effective handling and analysis algorithms scaled for genome-wide data and calculates a composite statistic, denoted Ghat, which is used to test for selection on a trait. The package provides a number of simple examples for handling and analysing the genome data and visualising the output and results. See Beissinger et al. (2018) <doi:10.1534/genetics.118.300857>.
IRT-M is a semi-supervised approach based on Bayesian Item Response Theory that produces theoretically identified underlying dimensions from input data and a constraints matrix. The methodology is fully described in Morucci et al. (2024), "Measurement That Matches Theory: Theory-Driven Identification in Item Response Theory Models". Details are available at <https://www.cambridge.org/core/journals/american-political-science-review/article/measurement-that-matches-theory-theorydriven-identification-in-item-response-theory-models/395DA1DFE3DCD7B866DC053D7554A30B>.
For fitting Bayesian joint latent class and regression models using Gibbs sampling. See the documentation for the model. The technical details of the model implemented here are described in Elliott, Michael R., Zhao, Zhangchen, Mukherjee, Bhramar, Kanaya, Alka, Needham, Belinda L., "Methods to account for uncertainty in latent class assignments when using latent classes as predictors in regression models, with application to acculturation strategy measures" (2020) In press at Epidemiology <doi:10.1097/EDE.0000000000001139>.
This package provides functions to fit linear mixed models based on convolutions of the generalized Laplace (GL) distribution. The GL mixed-effects model includes four special cases: normal random effects and normal errors (NN), normal random effects and Laplace errors (NL), Laplace random effects and normal errors (LN), and Laplace random effects and Laplace errors (LL). The methods are described in Geraci and Farcomeni (2020, Statistical Methods in Medical Research) <doi:10.1177/0962280220903763>.
A function for growing a survival trees ensemble (Naz Gul, Nosheen Faiz, Dan Brawn, Rafal Kulakowski, Zardad Khan, and Berthold Lausen (2020) <arXiv:2005.09043>) is given. The trees are grown by the method of random survival forest (Marvin Wright, Andreas Ziegler (2017) <doi:10.18637/jss.v077.i01>). The survival trees grown are assessed for both individual and collective performances. The ensemble can give promising results on fewer survival trees selected in the final ensemble.
Two stage curvature identification with machine learning for causal inference in settings when instrumental variable regression is not suitable because of potentially invalid instrumental variables. Based on Guo and Buehlmann (2022) "Two Stage Curvature Identification with Machine Learning: Causal Inference with Possibly Invalid Instrumental Variables" <arXiv:2203.12808>. The vignette is available in Carl, Emmenegger, Bühlmann and Guo (2023) "TSCI: two stage curvature identification for causal inference with invalid instruments" <arXiv:2304.00513>.
The typicality and eccentricity data analysis (TEDA) framework was put forward by Angelov (2013) <DOI:10.14313/JAMRIS_2-2014/16>. It has been further developed into multiple different techniques since, and provides a non-parametric way of determining how similar an observation, from a process that is not purely random, is to other observations generated by the process. This package provides code to use the batch and recursive TEDA methods that have been published.
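To give a feel for the batch quantities, the sketch below computes eccentricity as twice a point's summed distance to all observations divided by the sum of all pairwise distances, with typicality as one minus eccentricity. This is a plain Python reading of the batch definitions under the assumption of squared Euclidean distance. The function name `eccentricity` and the toy data are hypothetical, and the package's own functions and its recursive variant are not reproduced here.

```python
def eccentricity(points):
    """Batch eccentricity of each point, using squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # Cumulative proximity: summed distance from each point to all points.
    proximity = [sum(sq_dist(p, q) for q in points) for p in points]
    total = sum(proximity)  # zero if all points coincide (not handled here)
    return [2.0 * p / total for p in proximity]


# Hypothetical one-dimensional observations with one distant value.
data = [(1.0,), (2.0,), (3.0,), (10.0,)]
ecc = eccentricity(data)
typicality = [1.0 - e for e in ecc]
print([round(e, 3) for e in ecc])
```

By construction the eccentricities of a batch sum to 2, so no distributional assumption is needed to compare an observation against the rest.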
This package provides easy-to-use tools for data analysis and visualization for hyperspectral remote sensing (also known as imaging spectroscopy), with a particular focus on vegetation hyperspectral data analysis. It consists of a set of functions ranging from the organization of hyperspectral data in the proper data structure for spectral feature selection, calculation of vegetation indices, and multivariate analysis to the visualization of spectra and analysis results in the ggplot2 style.
This package implements the diagnostic "theta" developed in Poetscher and Preinerstorfer (2020) "How Reliable are Bootstrap-based Heteroskedasticity Robust Tests?" <arXiv:2005.04089>. This diagnostic can be used to detect and weed out bootstrap-based procedures that provably have size equal to one for a given testing problem. The implementation covers a large variety of bootstrap-based procedures, cf. the above mentioned article for details. A function for computing bootstrap p-values is provided.
Calculates the WEGE (Weighted Endemism including Global Endangerment) index for a particular area. It also calculates rasters of Key Biodiversity Area (KBA) criteria (A1a, A1b, A1e, and B1), Weighted Endemism (WE), the EDGE (Evolutionarily Distinct and Globally Endangered) score, Evolutionary Distinctiveness (ED) and Extinction Risk (ER). See Farooq, H., Azevedo, J., Belluardo F., Nanvonamuquitxo, C., Bennett, D., Moat, J., Soares, A., Faurby, S. & Antonelli, A. (2020) <doi:10.1101/2020.01.17.910299>.
The AIPW package implements the augmented inverse probability weighting, a doubly robust estimator, for average causal effect estimation with user-defined stacked machine learning algorithms. To cite the AIPW package, please use: "Yongqi Zhong, Edward H. Kennedy, Lisa M. Bodnar, Ashley I. Naimi (2021). AIPW: An R Package for Augmented Inverse Probability Weighted Estimation of Average Causal Effects. American Journal of Epidemiology. <doi:10.1093/aje/kwab207>". Visit: <https://yqzhong7.github.io/AIPW/> for more information.
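The textbook form of the augmented inverse probability weighted estimator can be sketched as follows. This is not the AIPW package's interface: the function name `aipw_ate` and its arguments are assumptions, the outcome-model predictions and propensity scores are taken as given inputs, and the stacked machine learning step that would produce them is not shown.

```python
def aipw_ate(y, a, mu1, mu0, e):
    """AIPW estimate of the average causal effect E[Y(1)] - E[Y(0)].

    y   : observed outcomes
    a   : treatment indicators (0 or 1)
    mu1 : predicted outcomes under treatment, one per unit
    mu0 : predicted outcomes under control, one per unit
    e   : estimated propensity scores P(A = 1 | X), strictly in (0, 1)
    """
    total = 0.0
    for yi, ai, m1, m0, ei in zip(y, a, mu1, mu0, e):
        # Outcome-model prediction plus an inverse-probability-weighted residual.
        treated = m1 + ai * (yi - m1) / ei
        control = m0 + (1 - ai) * (yi - m0) / (1 - ei)
        total += treated - control
    return total / len(y)


# Hypothetical inputs in which the outcome models fit the data exactly.
y = [3.0, 1.0, 4.0, 2.0]
a = [1, 0, 1, 0]
mu1 = [3.0, 3.0, 4.0, 4.0]
mu0 = [1.0, 1.0, 2.0, 2.0]
e = [0.5, 0.5, 0.5, 0.5]
print(aipw_ate(y, a, mu1, mu0, e))  # 2.0
```

The residual terms correct the outcome model where it is wrong, which is the source of the double robustness: the estimate remains consistent if either the outcome model or the propensity model is correctly specified.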
Developed for the following tasks: simulating, computing the maximum likelihood estimator, computing the Fisher information matrix, computing goodness-of-fit measures, and correcting the bias of the ML estimator for a wide range of distributions fitted to units placed on progressive type-I interval censoring and progressive type-II censoring plans. Bias correction of the maximum likelihood estimator uses the methods of Cox and Snell (1968) <doi:10.1111/j.2517-6161.1968.tb00724.x> and a bootstrap method.
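The bootstrap route to a bias-corrected estimator can be illustrated in a simplified setting. The sketch below assumes a complete (uncensored) sample and an exponential model, neither of which is the package's censoring setup. The names `bootstrap_bias_corrected` and `exp_rate_mle` and the sample values are hypothetical. It applies the standard correction of subtracting the bootstrap estimate of the bias from the original estimate.

```python
import random
from statistics import mean


def exp_rate_mle(x):
    """ML estimator of the exponential rate for a complete sample."""
    return 1.0 / mean(x)


def bootstrap_bias_corrected(sample, estimator, n_boot=2000, seed=0):
    """Nonparametric bootstrap bias correction of an estimator."""
    rng = random.Random(seed)
    theta_hat = estimator(sample)
    boot = [
        estimator(rng.choices(sample, k=len(sample))) for _ in range(n_boot)
    ]
    # Estimated bias is mean(boot) - theta_hat, so the corrected value is
    # theta_hat - bias = 2 * theta_hat - mean(boot).
    return 2.0 * theta_hat - mean(boot)


# Hypothetical lifetimes.
sample = [0.2, 1.5, 0.7, 3.1, 0.4, 0.9, 2.2, 0.1, 1.1, 0.6]
raw = exp_rate_mle(sample)
corrected = bootstrap_bias_corrected(sample, exp_rate_mle)
print(raw, corrected)
```

The rate estimator is biased upward in small samples, so the corrected value is expected to fall below the raw estimate.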