This package fits a model to the pattern of dropouts in single-cell RNASeq data. This model is used as a null to identify significantly variable (i.e. differentially expressed) genes for use in downstream analysis, such as clustering cells. It also includes a method for calculating exact Pearson residuals in UMI-tagged data using a library-size-aware negative binomial model.
standR is a user-friendly R package providing functions to assist in conducting good-practice analysis of NanoString's GeoMx DSP data. All functions in the package are built on the SpatialExperiment object, allowing integration with various spatial transcriptomics packages from Bioconductor. standR supports data inspection, quality control, normalization, batch correction and evaluation with informative visualizations.
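As a rough illustration of the Pearson residuals mentioned above, the following sketch computes residuals under a negative binomial model whose expected counts depend on library size. The description does not give the package's exact formulation, so the expected-count formula (gene total times cell total over grand total), the fixed dispersion `theta`, and the function name are all assumptions made for illustration.

```python
import math


def nb_pearson_residuals(counts, theta=100.0):
    """Pearson residuals for a genes x cells matrix of UMI counts.

    Assumed model (not taken from the package): the expected count is
    mu = gene_total * cell_total / grand_total, and the negative binomial
    variance is mu + mu**2 / theta with a fixed dispersion theta.
    """
    row_sums = [sum(row) for row in counts]          # per-gene totals
    col_sums = [sum(col) for col in zip(*counts)]    # per-cell library sizes
    total = sum(row_sums)
    if total <= 0:
        raise ValueError("the count matrix must contain at least one count")

    residuals = []
    for g, row in enumerate(counts):
        out = []
        for c, x in enumerate(row):
            mu = row_sums[g] * col_sums[c] / total
            var = mu + mu * mu / theta
            # A zero variance only occurs for an all-zero gene or cell.
            out.append((x - mu) / math.sqrt(var) if var > 0 else 0.0)
        residuals.append(out)
    return residuals
```

A matrix whose entries are exactly proportional to the gene and cell totals yields residuals of zero, while counts that deviate from that expectation yield residuals whose sign shows the direction of the deviation.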
The pattern of digestion and protection from DNA nucleases such as DNAse I, micrococcal nuclease, and Tn5 transposase can be used to infer the location of associated proteins. This package contains useful functions to analyze patterns of paired-end sequencing fragment density. VplotR facilitates the generation of V-plots and footprint profiles over single or aggregated genomic loci of interest.
This package provides an implementation of an algorithm for general-purpose unconstrained non-linear optimization. The algorithm is of quasi-Newton type with BFGS updating of the inverse Hessian and soft line search with a trust region type monitoring of the input to the line search algorithm. The interface of ucminf is designed for easy interchange with the package optim.
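To make the quasi-Newton idea above concrete, here is a minimal sketch of BFGS updating of the inverse Hessian. It is not the package's algorithm: a plain backtracking (Armijo) line search is swapped in for the soft line search with trust-region monitoring, and the function name, tolerances, and iteration cap are illustrative assumptions.

```python
def bfgs_minimize(f, grad, x0, tol=1e-8, max_iter=200):
    """Minimise f from x0 using BFGS updates of the inverse Hessian.

    Sketch only: a simple backtracking line search replaces the soft line
    search with trust-region monitoring described in the text.
    """
    n = len(x0)
    x = list(x0)
    # Inverse Hessian approximation, initialised to the identity.
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = grad(x)
    for _ in range(max_iter):
        if max(abs(gi) for gi in g) < tol:
            break
        # Quasi-Newton search direction d = -H g.
        d = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        slope = sum(gi * di for gi, di in zip(g, d))
        # Backtracking line search: halve t until sufficient decrease.
        t, fx = 1.0, f(x)
        while f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope:
            t *= 0.5
            if t < 1e-12:
                break
        x_new = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]   # step taken
        y = [a - b for a, b in zip(g_new, g)]   # change in gradient
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 1e-12:  # curvature condition keeps H positive definite
            rho = 1.0 / sy
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(yi * hi for yi, hi in zip(y, Hy))
            coef = rho * rho * yHy + rho
            # Expanded form of (I - rho s y^T) H (I - rho y s^T) + rho s s^T.
            H = [[H[i][j] - rho * (s[i] * Hy[j] + Hy[i] * s[j])
                  + coef * s[i] * s[j] for j in range(n)] for i in range(n)]
        x, g = x_new, g_new
    return x
```

For example, `bfgs_minimize(lambda x: (x[0] - 1) ** 2 + 2 * (x[1] + 3) ** 2, lambda x: [2 * (x[0] - 1), 4 * (x[1] + 3)], [0.0, 0.0])` moves toward the minimiser at (1, -3).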
This package contains data sets and utilities from Project MOSAIC used to teach mathematics, statistics, computation and modeling. Project MOSAIC is a community of educators working to tie together aspects of quantitative work that students in science, technology, engineering and mathematics will need in their professional lives, but which are usually taught in isolation, if at all.
The tictoc package provides the timing functions tic and toc that can be nested. It provides an alternative to system.time() with a different syntax similar to that in another well-known software package. tic and toc are easy to use, and are especially useful when timing several sections in more than a few lines of code.
This package implements two methods of estimating runs scored in a softball scenario: (1) theoretical expectation using discrete Markov chains and (2) empirical distribution using multinomial random simulation. Scores are based on player-specific input probabilities (out, single, double, triple, walk, and home run). Optional inputs include the probability of attempting a steal, the probability of succeeding in an attempted steal, and an indicator of whether a player is "fast" (e.g. the player could stretch home). These probabilities may be calculated from common player statistics that are publicly available on teams' webpages. Scores are evaluated based on a nine-player lineup and may be used to compare lineups, evaluate base scenarios, and compare the offensive potential of individual players. Manuscript forthcoming. See Bukiet & Harold (1997) <doi:10.1287/opre.45.1.14> for implementation of discrete Markov chains.
The functions in this package compute robust estimators by minimizing a kernel-based distance known as MMD (Maximum Mean Discrepancy) between the sample and a statistical model. Recent works proved that these estimators enjoy a universal consistency property, and are extremely robust to outliers. Various optimization algorithms are implemented: stochastic gradient is available for most models, but the package also allows gradient descent in a few models for which an exact formula is available for the gradient. In terms of distribution fit, a large number of continuous and discrete distributions are available: Gaussian, exponential, uniform, gamma, Poisson, geometric, etc. In terms of regression, the models available are: linear, logistic, gamma, beta and Poisson. Alquier, P. and Gerber, M. (2024) <doi:10.1093/biomet/asad031> Cherief-Abdellatif, B.-E. and Alquier, P. (2022) <doi:10.3150/21-BEJ1338>.
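The distance at the heart of these estimators can be sketched for two one-dimensional samples as follows. This is only an illustration of the MMD itself with a Gaussian kernel. The kernel choice, the bandwidth, and the use of the biased (V-statistic) estimate are assumptions, and the package's estimators additionally minimise this kind of distance over model parameters, which is not shown here.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two real numbers."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

Since the kernel is bounded between 0 and 1, a single extreme observation can only change the estimate by a bounded amount, which gives some intuition for the robustness to outliers mentioned above.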
Collection of methods for rating matrix completion, which is a statistical framework for recommender systems. Another relevant application is the imputation of rating-scale survey data in the social and behavioral sciences. Note that matrix completion and imputation are synonymous terms used in different streams of the literature. The main functionality implements robust matrix completion for discrete rating-scale data with a low-rank constraint on a latent continuous matrix (Archimbaud, Alfons, and Wilms (2025) <doi:10.48550/arXiv.2412.20802>). In addition, the package provides wrapper functions for softImpute (Mazumder, Hastie, and Tibshirani, 2010, <https://www.jmlr.org/papers/v11/mazumder10a.html>; Hastie, Mazumder, Lee, Zadeh, 2015, <https://www.jmlr.org/papers/v16/hastie15a.html>) for easy tuning of the regularization parameter, as well as benchmark methods such as median imputation and mode imputation.
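The benchmark methods named above are simple enough to sketch. The version below fills missing ratings column by column, which is an assumption, as are the function name, the use of `None` for missing entries, and the tie-breaking rules (lower median, smallest mode). The robust low-rank method itself is not shown.

```python
from statistics import median_low, multimode


def impute_columns(ratings, how="median"):
    """Fill missing ratings (None) with each column's median or mode.

    ratings is a list of rows. Imputing per column (per item) is an
    assumption. median_low and the smallest mode are used so that the
    filled value is always an observed rating category.
    """
    fills = []
    for col in zip(*ratings):
        observed = [v for v in col if v is not None]
        if not observed:
            fills.append(None)  # nothing to impute from
        elif how == "median":
            fills.append(median_low(observed))
        elif how == "mode":
            fills.append(min(multimode(observed)))
        else:
            raise ValueError("how must be 'median' or 'mode'")
    return [[fills[j] if v is None else v for j, v in enumerate(row)]
            for row in ratings]
```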
ATPOL is a rectangular grid system used for botanical studies in Poland. The ATPOL grid was developed at the Institute of Botany, Jagiellonian University, Krakow, Poland in the 1970s. Since then it has been widely used to represent the distribution of plants in Poland. atpolR provides functions to translate geographic coordinates to the grid and vice versa. It also allows the user to create a choreograph map.
Allows the user to easily manage the removal and installation of R packages. It offers many functions to display installed packages according to specific dates and to remove them if needed. The user is always prompted to confirm the action when running the removal functions. It also provides functions that install GitHub starred R packages, whether or not they are available on CRAN.
Model-free selection of covariates under unconfoundedness for situations where the parameter of interest is an average causal effect. This package is based on model-free backward elimination algorithms proposed in de Luna, Waernbaum and Richardson (2011). Marginal co-ordinate hypothesis testing is used in situations where all covariates are continuous while kernel-based smoothing appropriate for mixed data is used otherwise.
Compile inline C code and call it easily through automatically generated wrapper functions. By allowing user-defined headers and compilation flags (preprocessor, compiler and linking flags), the package lets the user configure optimization options and link to third-party libraries. Multiple functions may be defined in a single block of code, which may be supplied as a string or as a path to a source file.
This package provides a function to query and extract data from the US Energy Information Administration ('EIA') API V2 <https://www.eia.gov/opendata/>. The EIA API provides a variety of information, in a time series format, about the energy sector in the US. The API is open, free, and requires an access key and registration at <https://www.eia.gov/opendata/>.
Systematic fit of hundreds of theoretical univariate distributions to empirical data via maximum likelihood estimation. Fits are reported and summarized by a data.frame, a CSV file or a shiny app (the latter with additional features such as visual representation of fits). All output formats provide assessment of goodness-of-fit by the following methods: Kolmogorov-Smirnov test, Shapiro-Wilk test, Anderson-Darling test.
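As a small illustration of the workflow above for a single distribution, the sketch below fits an exponential distribution by maximum likelihood and computes the Kolmogorov-Smirnov statistic of the fit. The function names are hypothetical, only the test statistic is computed (no p-value), and the package's own fitting routine is not reproduced.

```python
import math


def fit_exponential_rate(data):
    """Maximum likelihood estimate of the exponential rate: 1 / sample mean."""
    return len(data) / sum(data)


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF of data and a fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fitted = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - fitted, fitted - i / n)
    return d
```

A typical call would be `rate = fit_exponential_rate(sample)` followed by `ks_statistic(sample, lambda x: 1.0 - math.exp(-rate * x))`, where smaller values indicate a closer fit.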
Using overlap grouped-lasso penalties, gamsel selects whether a term in a gam is nonzero, linear, or a non-linear spline (up to a specified max df per variable). It fits the entire regularization path on a grid of values for the overall penalty lambda, both for gaussian and binomial families. See <doi:10.48550/arXiv.1506.03850> for more details.
Simulation, estimation and testing for geopolitical volatility (GEOVOL) based on the global common volatility model of Engle and Campos-Martins (2023) <doi:10.1016/j.jfineco.2022.09.009>. GEOVOL is modelled as a latent multiplicative volatility factor with heterogeneous factor loadings. Estimation is carried out as a maximization-maximization procedure, where GEOVOL and the GEOVOL loadings are estimated iteratively until convergence.
Aligning multiple visualisations by utilising generalised orthogonal Procrustes analysis (GPA) before combining coordinates into a single biplot display as described in Nienkemper-Swanepoel, le Roux and Lubbe (2023) <doi:10.1080/03610918.2021.1914089>. This is mainly suitable to combine visualisations constructed from multiple imputations, however, it can be generalised to combine variations of visualisations from the same datasets (i.e. resamples).
Allows running gretl (<http://gretl.sourceforge.net/index.html>) programs from R, R Markdown and Quarto. gretl (Gnu Regression, Econometrics, and Time-series Library) is statistical software for econometric analysis. This package not only integrates gretl and R but also serves as a gretl knit engine for the knitr package. Write all your gretl commands in an R or R Markdown chunk.
Make efficient Rust implementations of graph adjustment identification distances available in R. These distances (based on ancestor, optimal, and parent adjustment) count how often the respective adjustment identification strategy leads to causal inferences that are incorrect relative to a ground-truth graph when applied to a candidate graph instead. See also Henckel, Würtzen, Weichwald (2024) <doi:10.48550/arXiv.2402.08616>.
GitHub apps provide a powerful way to manage fine-grained programmatic access to specific git repositories without having to create dummy users, and they are safer than a personal access token for automated tasks. This package extends the gh package to let you authenticate and interact with GitHub <https://docs.github.com/en/rest/overview> in R as an app.
SQL back-end to dplyr for Apache Impala, the massively parallel processing query engine for Apache Hadoop. Impala enables low-latency SQL queries on data stored in the Hadoop Distributed File System (HDFS), Apache HBase, Apache Kudu, Amazon Simple Storage Service (S3), Microsoft Azure Data Lake Store (ADLS), and Dell EMC Isilon. See <https://impala.apache.org> for more information about Impala.
This package provides a key-value store data structure. The keys are integers and the values can be any R object. This is like a list but indexed by a set of integers, not necessarily contiguous and possibly negative. The implementation uses an R6 class. These containers are not faster than lists but their usage can be more convenient in certain situations.
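The idea of a container indexed by arbitrary, possibly negative and non-contiguous integers can be sketched with a thin wrapper around a dictionary. This is a Python analogue for illustration, not the package's R6 interface, and the class and method names are hypothetical.

```python
class IntKeyStore:
    """Key-value store whose keys must be integers (hypothetical sketch)."""

    def __init__(self):
        self._items = {}

    def insert(self, key, value):
        """Store value under an integer key, replacing any existing entry."""
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if not isinstance(key, int) or isinstance(key, bool):
            raise TypeError("keys must be integers")
        self._items[key] = value

    def get(self, key, default=None):
        """Return the value stored under key, or default if it is absent."""
        return self._items.get(key, default)

    def keys(self):
        """Return the keys in increasing order."""
        return sorted(self._items)
```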
Computes and decomposes Gini, Bonferroni and Zenga 2007 point and synthetic concentration indexes. Three decompositions are supported: by sources, by subpopulations, and by sources and subpopulations jointly. References: Zenga M. M. (2007) <doi:10.1400/209575>; Zenga M. (2015) <doi:10.1400/246627>; Zenga M., Valli I. (2017) <doi:10.26350/999999_000005>; Zenga M., Valli I. (2018) <doi:10.26350/999999_000011>.
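Of the indexes listed above, the Gini index is the simplest to sketch. The version below uses the mean absolute difference form without a small-sample correction, which is an assumption about normalisation. The Bonferroni and Zenga indexes and all of the decompositions are not shown, and the function name is hypothetical.

```python
def gini_index(values):
    """Gini concentration index of non-negative values.

    Uses G = sum_i sum_j |x_i - x_j| / (2 * n * sum(x)), i.e. the mean
    absolute difference divided by twice the mean, with no n / (n - 1)
    correction (an assumption, not necessarily the package's convention).
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("values must be non-empty with a positive sum")
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    return abs_diff_sum / (2.0 * n * total)
```

With this normalisation, equal values give an index of 0, and a sample in which one of n units holds everything gives (n - 1) / n.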