This package proposes a novel variable selection approach for classification problems that takes into account the correlations that may exist between the predictors of the design matrix in a high-dimensional logistic model. Our approach consists of rewriting the initial high-dimensional logistic model to remove the correlation between the predictors and then applying the generalized Lasso criterion.
This R package is designed for quality control (QC), analysis, and exploration of single-cell RNA-seq data. It enables widely used analytical techniques, including the identification of highly variable genes; dimensionality reduction (PCA, ICA, t-SNE); standard unsupervised clustering algorithms (density clustering, hierarchical clustering, k-means); and the discovery of differentially expressed genes and markers.
An implementation for calculating the R-squared measure as a total mediation effect size measure, together with its confidence interval, for moderate- or high-dimensional mediator models. It gives an option to filter out non-mediators using variable selection methods. The original R package is directly related to the paper Yang et al. (2021) "Estimation of mediation effect for high-dimensional omics mediators with application to the Framingham Heart Study" <doi:10.1101/774877>. The new version adds an option to use cross-fitting, which is computationally faster. The details of the cross-fitting method are available in the paper Xu et al. (2023) "Speeding up interval estimation for R2-based mediation effect of high-dimensional mediators via cross-fitting" <doi:10.1101/2023.02.06.527391>.
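As a rough illustration of the quantity being estimated, the sketch below computes the R-squared-based total mediation effect in its commonly used form, R2_med = R2(Y~X) + R2(Y~M) - R2(Y~X+M), from ordinary least-squares fits. It assumes this standard decomposition matches the package's definition and leaves out the mediator filtering, confidence intervals, and cross-fitting the package provides; the function names and simulated data are hypothetical.

```python
import numpy as np

def r_squared(y, *predictors):
    """R^2 of an OLS fit of y on an intercept plus the given predictor arrays."""
    y = np.asarray(y, dtype=float)
    # Each predictor may be 1-D (one variable) or 2-D (an n x p block of variables).
    design = np.column_stack([np.ones(len(y))] + [np.asarray(p, dtype=float) for p in predictors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    centered = y - y.mean()
    return 1.0 - (residuals @ residuals) / (centered @ centered)

def r2_mediation(y, x, m):
    """R^2-based total mediation effect: R2(Y~X) + R2(Y~M) - R2(Y~X+M)."""
    return r_squared(y, x) + r_squared(y, m) - r_squared(y, x, m)

# Hypothetical data: one exposure x, five candidate mediators m, outcome y.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
m = np.outer(x, [0.5, 0.4, 0.0, 0.0, 0.3]) + rng.normal(size=(n, 5))
y = 0.3 * x + m @ np.array([0.6, 0.5, 0.0, 0.0, 0.4]) + rng.normal(size=n)
print(round(r2_mediation(y, x, m), 3))
```

With many more mediators than observations, the plain OLS fit above breaks down, which is why the package screens mediators first.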
An R package for estimating conditional multivariate reference regions. The reference region is estimated nonparametrically using a kernel density estimator. Covariate effects on the multivariate response mean vector and variance-covariance matrix, and thus on the region shape, are estimated by flexible additive predictors. Nonlinear effects of continuous covariates can be estimated using penalized spline smoothers. Confidence intervals for the estimated covariate effects can be derived from bootstrap resampling. The kernel density bandwidth can be estimated with different methods, including one that optimizes the region coverage. Numerical and graphical summaries can be obtained to evaluate reference region performance with real data. Full mathematical details can be found in <doi:10.1002/sim.9163> and <doi:10.1007/s00477-020-01901-1>.
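The core idea of a density-based reference region can be illustrated without covariates: estimate the density with a kernel density estimator and keep the points whose estimated density is at least the value below which only a fraction (1 - coverage) of the sample falls. The Python sketch below assumes a bivariate response, a Gaussian kernel with a single fixed bandwidth, and no covariate adjustment; the package additionally models covariate effects on the mean vector and covariance matrix and offers several bandwidth selectors, none of which are reproduced here. The function names are hypothetical.

```python
import numpy as np

def gaussian_kde(data, bandwidth):
    """Return a function evaluating a Gaussian KDE with one scalar bandwidth in every dimension."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    norm = n * (2 * np.pi) ** (d / 2) * bandwidth ** d
    def density(points):
        pts = np.atleast_2d(np.asarray(points, dtype=float))
        # Pairwise scaled differences, shape (n_points, n_data, d).
        diff = (pts[:, None, :] - data[None, :, :]) / bandwidth
        return np.exp(-0.5 * (diff ** 2).sum(axis=2)).sum(axis=1) / norm
    return density

def reference_region(data, bandwidth, coverage=0.95):
    """Return a predicate telling whether points fall inside the estimated reference region."""
    density = gaussian_kde(data, bandwidth)
    # Density threshold that roughly `coverage` of the sample points reach or exceed.
    threshold = np.quantile(density(data), 1 - coverage)
    return lambda points: density(points) >= threshold

# Hypothetical correlated bivariate sample.
rng = np.random.default_rng(42)
sample = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=400)
inside = reference_region(sample, bandwidth=0.4)
print(inside([[0.0, 0.0], [3.0, -3.0]]))
```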
Implementation of the technique of Lleonart et al. (2000) <doi:10.1006/jtbi.2000.2043> to scale body measurements that exhibit allometric growth. This procedure is a theoretical generalization of the technique used by Thorpe (1975) <doi:10.1111/j.1095-8312.1975.tb00732.x> and Thorpe (1976) <doi:10.1111/j.1469-185X.1976.tb01063.x>.
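The scaling of Lleonart et al. (2000) is usually written as Y*_i = Y_i (X_0 / X_i)^b, where X_i is a reference size measure for individual i, X_0 is a chosen reference size, and b is the allometric coefficient from the regression of ln Y on ln X. The sketch below estimates b by ordinary least squares on the log scale and uses the sample mean of X as X_0; both choices, and the function name, are assumptions for illustration rather than the package's defaults.

```python
import numpy as np

def allometric_scale(y, x, x0=None):
    """Scale measurements y to a common reference size x0: y * (x0 / x) ** b."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Allometric coefficient b: slope of the least-squares line of log(y) on log(x).
    b = np.polyfit(np.log(x), np.log(y), 1)[0]
    if x0 is None:
        x0 = x.mean()  # assumption: reference size = sample mean of x
    return y * (x0 / x) ** b, b

# Hypothetical measurements: head length (y) against snout-vent length (x).
svl = np.array([42.0, 55.0, 61.0, 70.0, 83.0])
head = np.array([10.1, 12.4, 13.3, 14.8, 16.9])
scaled, b = allometric_scale(head, svl)
print(b, scaled)
```

If every individual follows Y = a X^b exactly, all scaled values collapse to a X_0^b, which is the point of the transformation: size effects are removed while shape differences remain.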
Bayesian analysis of luminescence data and C-14 age estimates. Bayesian models are based on the following publications: Combes, B. & Philippe, A. (2017) <doi:10.1016/j.quageo.2017.02.003> and Combes et al. (2015) <doi:10.1016/j.quageo.2015.04.001>. Features include, among others, data import and export, application of age models, and a palaeodose model.
Typically, models in R exist in memory and can be saved via regular R serialization. However, some models store information in locations that cannot be saved using R serialization alone. The goal of bundle is to provide a common interface to capture this information, situate it within a portable object, and restore it for use in new settings.
The Crunch.io service <https://crunch.io/> provides a cloud-based data store and analytic engine, as well as an intuitive web interface. Using this package, analysts can interact with and manipulate Crunch datasets from within R. Importantly, this allows technical researchers to collaborate naturally with team members, managers, and clients who prefer a point-and-click interface.
Convert text into synthesized speech and get a list of supported voices for a region. Microsoft's Cognitive Services Text to Speech REST API <https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech?tabs=streaming> supports neural text to speech voices, which support specific languages and dialects that are identified by locale.
Allows the humanitarian community, academia, media, government, and non-governmental organizations to use the data collected by the Displacement Tracking Matrix (<https://dtm.iom.int>), a unit of the International Organization for Migration. It also provides non-sensitive Internally Displaced Person figures, aggregated at the country, Admin 1 (states, provinces, or equivalent), and Admin 2 (smaller administrative areas) levels.
Tool to print out the value of R objects/expressions while running an R script. Outputs can be made dependent on user-defined conditions/criteria. Debug messages only appear when a global option for debugging is set. This way, debugr code can even remain in the debugged code for later use without any negative effects during normal runtime.
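The pattern of debug output that is switched on by a global option and filtered by a per-call condition can be sketched in Python as follows. This is only an analogy to the idea described above, not the debugr API: the DEBUG_OPTIONS dictionary and the debug_print function are hypothetical names.

```python
import sys

# Hypothetical global switch; debug output is suppressed unless this is True.
DEBUG_OPTIONS = {"enabled": False}

def debug_print(label, value, condition=True, stream=None):
    """Print 'label: value' only when debugging is enabled and `condition` holds.

    Returns the printed message, or None when nothing was printed.
    """
    if not DEBUG_OPTIONS["enabled"] or not condition:
        return None
    message = f"{label}: {value!r}"
    print(message, file=stream if stream is not None else sys.stderr)
    return message

# Normal runtime: the call stays in the code but prints nothing.
debug_print("total", 42)

# Debugging session: switch the option on; only even values of i are reported.
DEBUG_OPTIONS["enabled"] = True
for i in range(5):
    debug_print("i", i, condition=(i % 2 == 0))
```

Because the check happens inside the call, the debugging statements can stay in the code permanently at the cost of one dictionary lookup per call.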
Performs dynamic mixture estimation with state-space components and normal regression components, as well as clustering with normal mixtures. Both quasi-Bayesian estimation and estimation based on the Kerridge inaccuracy approximation are implemented. Main references: Nagy and Suzdaleva (2013) <doi:10.1016/j.apm.2013.05.038>; Nagy et al. (2011) <doi:10.1002/acs.1239>.
Create and customize interactive collapsible D3 trees using the D3 JavaScript library and the htmlwidgets package. These trees can be used directly from the R console, from 'RStudio', in Shiny apps, and in R Markdown documents. In Shiny, the tree layout is observed by the server and can be used as a reactive filter of structured data.
This package provides utility functions for standardizing economic entity (economy, aggregate, institution, etc.) names and IDs in economic datasets such as those published by the International Monetary Fund and World Bank. It aims to facilitate consistent data analysis, reporting, and joining across datasets, and is used as a foundational building block in the econdataverse family of packages (<https://www.econdataverse.org>).
This package performs test procedures for general hypothesis testing problems for four multivariate coefficients of variation (Ditzhaus and Smaga, 2023 <arXiv:2301.12009>). We can test the global hypothesis of equality as well as particular hypotheses defined by contrasts, e.g., post hoc tests. We also provide simultaneous confidence intervals for contrasts.
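For reference, the four multivariate coefficients of variation most often discussed in this literature are those of Reyment, Van Valen, Voinov and Nikulin, and Albert and Zhang. The sketch below assumes these are the four the package refers to and computes their sample versions with NumPy; it covers point estimation only, not the test procedures or simultaneous confidence intervals. Each measure reduces to the univariate sigma/|mu| when there is a single variable.

```python
import numpy as np

def multivariate_cvs(data):
    """Sample versions of four multivariate coefficients of variation (rows = observations)."""
    data = np.asarray(data, dtype=float)
    mu = data.mean(axis=0)
    sigma = np.atleast_2d(np.cov(data, rowvar=False))  # p x p sample covariance
    p = len(mu)
    mu_sq = mu @ mu
    return {
        # Reyment: sqrt(det(Sigma)^(1/p) / mu'mu)
        "Reyment": np.sqrt(np.linalg.det(sigma) ** (1 / p) / mu_sq),
        # Van Valen: sqrt(tr(Sigma) / mu'mu)
        "VanValen": np.sqrt(np.trace(sigma) / mu_sq),
        # Voinov and Nikulin: sqrt(1 / (mu' Sigma^{-1} mu))
        "VoinovNikulin": np.sqrt(1.0 / (mu @ np.linalg.solve(sigma, mu))),
        # Albert and Zhang: sqrt(mu' Sigma mu / (mu'mu)^2)
        "AlbertZhang": np.sqrt(mu @ sigma @ mu / mu_sq ** 2),
    }

# Hypothetical three-variable sample with positive means.
rng = np.random.default_rng(0)
sample = rng.normal(loc=[10.0, 20.0, 5.0], scale=[1.0, 3.0, 0.5], size=(100, 3))
print(multivariate_cvs(sample))
```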
This package provides a suite of routines for the hyperdirichlet distribution and reified Bradley-Terry; supersedes the hyperdirichlet package; uses disordR discipline <doi:10.48550/ARXIV.2210.03856>. To cite in publications please use Hankin 2017 <doi:10.32614/rj-2017-061>, and for Generalized Plackett-Luce likelihoods use Hankin 2024 <doi:10.18637/jss.v109.i08>.
This package provides functions for genome-wide association studies (GWAS) and gene-environment-wide interaction studies (GEWIS) with longitudinal outcomes and exposures. The methods are described in He et al. (2017) "Set-Based Tests for Gene-Environment Interaction in Longitudinal Studies" and He et al. (2017) "Rare-variant association tests in longitudinal studies, with an application to the Multi-Ethnic Study of Atherosclerosis (MESA)".
Extends the mlr3 package with a backend to transparently work with databases such as 'SQLite', 'DuckDB', 'MySQL', 'MariaDB', or 'PostgreSQL'. The package provides two additional backends: DataBackendDplyr relies on the abstraction of package dbplyr to interact with most DBMS. DataBackendDuckDB operates on DuckDB databases and also on Apache Parquet files.
Simulation, analysis and sampling of spatial biodiversity data (May, Gerstner, McGlinn, Xiao & Chase 2017) <doi:10.1111/2041-210x.12986>. In the simulation tools, users define the number of species and individuals, the species abundance distribution, and species aggregation. Functions for analysis include species rarefaction and accumulation curves, species-area relationships, and the distance decay of similarity.
Useful functions and workflows for proteomics quality control and data analysis of both limited proteolysis-coupled mass spectrometry (LiP-MS) (Feng et al. (2014) <doi:10.1038/nbt.2999>) and regular bottom-up proteomics experiments. Thanks to the flexibility of the functions, data generated with search tools such as 'Spectronaut', 'MaxQuant' and 'Proteome Discoverer' can be used easily.
An assortment of functions that could be useful in analyzing data from psychophysical experiments. It includes functions for calculating d' from several different experimental designs, links for m-alternative forced-choice (mafc) data to be used with the binomial family in glm (and possibly other contexts), and self-start functions for estimating gamma values for CRT screen calibrations.
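For the two simplest designs the sensitivity index has closed forms: in a yes/no task d' = z(H) - z(F), where H and F are the hit and false-alarm rates and z is the inverse of the standard normal CDF, and in an unbiased two-alternative forced-choice task d' = sqrt(2) * z(Pc), with Pc the proportion correct. The sketch below uses Python's statistics.NormalDist and covers only these two cases; the package handles further designs and the general m-AFC link. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse standard normal CDF (probit)

def dprime_yes_no(hit_rate, false_alarm_rate):
    """d' = z(H) - z(F) for a yes/no design.

    Rates must lie strictly between 0 and 1; rates of exactly 0 or 1 need a
    correction before use, otherwise inv_cdf raises StatisticsError.
    """
    return _z(hit_rate) - _z(false_alarm_rate)

def dprime_2afc(prop_correct):
    """d' for an unbiased two-alternative forced-choice task: sqrt(2) * z(Pc)."""
    return sqrt(2) * _z(prop_correct)

print(dprime_yes_no(0.84, 0.16), dprime_2afc(0.76))
```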
This package provides a ggplot2 front end to plot summary statistics on Danish provinces, regions, municipalities, and zip codes. The geoms needed for each of the four levels are built into the package, making these types of plots easy to produce. This is essentially an updated port of the previously available mapDK package by Sebastian Barfort.
This package provides several performance measures for a fitted model, such as mean squared error, root mean square error, mean absolute deviation, and mean absolute percentage error. These give forecasters a way to quantitatively compare the performance of competing models. For method details, see Pankaj Das (2020) <http://krishi.icar.gov.in/jspui/handle/123456789/44138>.
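The measures listed above follow directly from the forecast errors. The sketch below assumes errors are defined as actual minus predicted, reads "mean absolute deviation" as the mean absolute forecast error, and reports MAPE in percent; these conventions, and the function name, are assumptions rather than the package's documented definitions.

```python
import math
from typing import Sequence

def performance_metrics(actual: Sequence[float], predicted: Sequence[float]) -> dict:
    """Common accuracy measures; errors are defined as actual - predicted."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # Mean absolute deviation of the forecast errors (often called MAE).
        "MAD": sum(abs(e) for e in errors) / n,
        # Mean absolute percentage error; undefined when an actual value is zero.
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }

print(performance_metrics([1, 2, 3, 4], [2, 2, 3, 2]))
```

Comparing two competing models then amounts to calling the function on each model's predictions and preferring the one with lower values; RMSE penalizes large errors more heavily than MAD.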
This package provides functions for color-based visualization of multivariate data, i.e. colorgrams or heatmaps. Lower-level functions map numeric values to colors, display a matrix as an array of colors, and draw color keys. Higher-level plotting functions generate a bivariate histogram, a dendrogram aligned with a color-coded matrix, a triangular distance matrix, and more.
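The lower-level step of mapping numeric values to colors can be sketched as rescaling each value to [0, 1] against a chosen range and interpolating linearly between two endpoint colors. The blue-to-red endpoints, the clipping of out-of-range values, and the function names below are illustrative assumptions, not the package's palettes or API.

```python
def interpolate_hex(low_rgb, high_rgb, t):
    """Linearly interpolate between two RGB triples (0-255) at fraction t; return '#rrggbb'."""
    t = min(max(t, 0.0), 1.0)  # clip out-of-range values to the endpoint colors
    r, g, b = (round(lo + (hi - lo) * t) for lo, hi in zip(low_rgb, high_rgb))
    return f"#{r:02x}{g:02x}{b:02x}"

def values_to_colors(matrix, vmin=None, vmax=None, low=(0, 0, 255), high=(255, 0, 0)):
    """Map every entry of a numeric matrix (a list of rows) to a hex color."""
    flat = [v for row in matrix for v in row]
    vmin = min(flat) if vmin is None else vmin
    vmax = max(flat) if vmax is None else vmax
    span = (vmax - vmin) or 1.0  # avoid division by zero for constant input
    return [[interpolate_hex(low, high, (v - vmin) / span) for v in row] for row in matrix]

print(values_to_colors([[0, 5], [10, 2.5]]))
```

Passing an explicit vmin and vmax keeps colors comparable across several matrices, which is also what a shared color key relies on.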