This package provides a set of RStudio addins that are designed to be used in combination with user-defined RStudio keyboard shortcuts. These addins either: 1) insert text at the cursor position (e.g., insert operators %>%, <<-, %$%, etc.), 2) replace symbols in selected pieces of text (e.g., convert backslashes to forward slashes, so that strings like "c:\data\" are converted into "c:/data/"), or 3) enclose text with special symbols (e.g., convert "bold" into "**bold**"), which is convenient for editing R Markdown files.
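The second and third kinds of addin are plain string transformations. The Python sketch below illustrates them outside RStudio; the function names backslash_to_slash and enclose are hypothetical and are not the addins' API.

```python
def backslash_to_slash(text: str) -> str:
    """Replace every backslash with a forward slash (e.g. Windows paths)."""
    return text.replace("\\", "/")


def enclose(text: str, symbol: str = "**") -> str:
    """Wrap text in a symbol on both sides, e.g. Markdown bold markers."""
    return f"{symbol}{text}{symbol}"


print(backslash_to_slash("c:\\data\\"))  # c:/data/
print(enclose("bold"))                   # **bold**
```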
Generalized estimating equations (GEE) are a popular choice for analyzing longitudinal binary outcomes. This package provides an interface for fitting GEE, currently for logistic regression, within the tern <https://cran.r-project.org/package=tern> framework (Zhu, Sabanés Bové et al., 2023) and for tabulating the results easily using rtables <https://cran.r-project.org/package=rtables> (Becker, Waddell et al., 2023). It builds on geepack <doi:10.18637/jss.v015.i02> (Højsgaard, Halekoh and Yan, 2006) for the actual GEE model fitting.
This package implements wild bootstrap tests for autocorrelation in Vector Autoregressive (VAR) models based on Ahlgren and Catani (2016) <doi:10.1007/s00362-016-0744-0>, a combined Lagrange Multiplier (LM) test for Autoregressive Conditional Heteroskedasticity (ARCH) in VAR models from Catani and Ahlgren (2016) <doi:10.1016/j.ecosta.2016.10.006>, and bootstrap-based methods for determining the cointegration rank from Cavaliere, Rahbek, and Taylor (2012) <doi:10.3982/ECTA9099> and Cavaliere, Rahbek, and Taylor (2014) <doi:10.1080/07474938.2013.825175>.
This package implements functions to optimize the ordering of nodes in a dendrogram without affecting the meaning of the dendrogram. A dendrogram can be sorted based on the average distance of subtrees or based on the smallest distance value. These sorting methods improve the readability and interpretability of the tree structure, especially for tasks such as comparing different distance measures or linkage types and identifying tight clusters and outliers. As a result, they also provide a more meaningful ordering for a coupled heatmap visualization.
This package calculates classic and/or bootstrap confidence intervals for many parameters such as the population mean, variance, interquartile range (IQR), median absolute deviation (MAD), skewness, kurtosis, Cramer's V, odds ratio, R-squared, quantiles (including median), proportions, different types of correlation measures, and differences in means, quantiles and medians. Many of the classic confidence intervals are described in Smithson, M. (2003, ISBN: 978-0761924999). Bootstrap confidence intervals are calculated with the R package boot. Both one- and two-sided intervals are supported.
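As a rough illustration of how a bootstrap interval is obtained, the Python sketch below computes a two-sided percentile bootstrap interval for any statistic. It is an assumption-laden stand-in, not how the package or boot computes its intervals: it uses the simple percentile method only, with hypothetical defaults for the number of resamples and the seed.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat=statistics.mean, level=0.95,
                            n_boot=2000, seed=1):
    """Two-sided percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on n_boot resamples drawn with replacement.
    boot = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    alpha = 1 - level
    # Read off the empirical alpha/2 and 1 - alpha/2 quantiles.
    lower = boot[int((alpha / 2) * (n_boot - 1))]
    upper = boot[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


sample = [2.1, 3.4, 2.9, 4.0, 3.3, 2.7, 3.8, 3.1]
print(bootstrap_percentile_ci(sample))
print(bootstrap_percentile_ci(sample, stat=statistics.median))
```

A one-sided interval would keep only one of the two quantiles, taken at alpha instead of alpha/2.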
An unsupervised, fully automated pipeline for transcriptome analysis, or a supervised option to identify characteristic genes from predefined subclasses. We rely on the pamr <http://www.bioconductor.org/packages//2.7/bioc/html/pamr.html> clustering algorithm to cluster the data and then draw a heatmap of the clusters with the most significant and the least significant genes according to the pamr algorithm. This yields easy-to-grasp heatmaps that show, for each cluster, the genes that define it most strongly.
This package provides users with an EZ-to-use platform for representing data with biplots. Currently, principal component analysis (PCA), canonical variate analysis (CVA) and simple correspondence analysis (CA) biplots are included, accompanied by various formatting options for the samples and axes. Alpha-bags and concentration ellipses are included for visual enhancement and interpretation. For an extensive discussion of the topic, see Gower, J.C., Lubbe, S. and le Roux, N.J. (2011, ISBN: 978-0-470-01255-0) Understanding Biplots. Wiley: Chichester.
Designed to test for association between methylation at CpG sites across the genome and a phenotype of interest, adjusting for any relevant covariates. The package can perform standard analyses of large datasets very quickly, with no need to impute the data. It can also handle mixed effects models with chip or batch entering the model as a random intercept. It also includes tools to apply quality control filters, perform permutation tests, and create QQ plots, Manhattan plots, and scatterplots for individual CpG sites.
Light-weight functions for computing descriptive statistics in different circular spaces (e.g., 2pi radians, or 180 or 360 degrees), handling angle-dependent biases, padding circular data, and more. Specifically aimed at psychologists and neuroscientists analyzing circular data. Basic methods are based on Jammalamadaka and SenGupta (2001) <doi:10.1142/4031>; removal of cardinal biases is based on the approach introduced in van Bergen, Ma, Pratte, & Jehee (2015) <doi:10.1038/nn.4150> and Chetverikov and Jehee (2023) <doi:10.1038/s41467-023-43251-w>.
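The idea behind working in "different circular spaces" can be shown with the circular mean: angles on a circle of any period are mapped onto the unit circle, averaged as vectors, and mapped back. The Python sketch below is a generic illustration of that standard computation, not the package's code; the function name and the period argument are assumptions.

```python
import math


def circular_mean(angles, period=360.0):
    """Circular mean and mean resultant length of angles with a given period.

    period=360 for degrees, 2*math.pi for radians, 180 for axial data
    such as orientations.
    """
    # Map each angle onto the unit circle as an angle in radians.
    rad = [2 * math.pi * a / period for a in angles]
    s = sum(math.sin(r) for r in rad) / len(rad)
    c = sum(math.cos(r) for r in rad) / len(rad)
    mean_rad = math.atan2(s, c)
    resultant_length = math.hypot(s, c)  # 1 = all equal, near 0 = spread out
    # Convert back to the original units and wrap into [0, period).
    return (mean_rad * period / (2 * math.pi)) % period, resultant_length


print(circular_mean([350, 10]))              # mean near 0 (or 360), not 180
print(circular_mean([170, 10], period=180))  # orientations: mean near 0
```

The naive arithmetic mean of 350 and 10 degrees is 180, which is exactly the failure this vector averaging avoids.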
Given count data from two conditions, it determines which transcripts are differentially expressed across the two conditions using Bayesian inference of the parameters of a bottom-up model for PCR amplification. This model was developed in Ndifon Wilfred, Hilah Gal, Eric Shifrut, Rina Aharoni, Nissan Yissachar, Nir Waysbort, Shlomit Reich Zeliger, Ruth Arnon, and Nir Friedman (2012), <http://www.pnas.org/content/109/39/15865.full>, and results in a distribution for the counts that is a superposition of the binomial and negative binomial distributions.
Miscellaneous functions for data cleaning and data analysis of educational assessments. Includes functions for descriptive analyses, character vector manipulations and weighted statistics. Mainly a lightweight dependency for the packages eatRep, eatGADS, eatPrep and eatModel (which will subsequently be submitted to CRAN). The function for defining (weighted) contrasts in weighted effect coding refers to te Grotenhuis et al. (2017) <doi:10.1007/s00038-016-0901-1>. Functions for weighted statistics refer to Wolter (2007) <doi:10.1007/978-0-387-35099-8>.
Calculate numerical asymptotic distribution functions of likelihood ratio statistics for fractional unit root tests and tests of cointegration rank. For these distributions, the included functions calculate critical values and P-values used in unit root tests, cointegration tests, and rank tests in the Fractionally Cointegrated Vector Autoregression (FCVAR) model. The functions implement procedures for tests described in the following articles: Johansen, S. and M. Ø. Nielsen (2012) <doi:10.3982/ECTA9299>, MacKinnon, J. G. and M. Ø. Nielsen (2014) <doi:10.1002/jae.2295>.
Unconstrained and constrained maximum likelihood estimation of structural and reduced form Gaussian mixture vector autoregressive, Student's t mixture vector autoregressive, and Gaussian and Student's t mixture vector autoregressive models, quantile residual tests, graphical diagnostics, simulations, forecasting, and estimation of generalized impulse response functions and generalized forecast error variance decompositions. See Leena Kalliovirta, Mika Meitz, Pentti Saikkonen (2016) <doi:10.1016/j.jeconom.2016.02.012>, Savi Virolainen (2025) <doi:10.1080/07350015.2024.2322090>, Savi Virolainen (2022) <doi:10.48550/arXiv.2109.13648>.
Raster-based flood modelling, internally using hyd1d, an R package to interpolate 1d water level and gauging data. The package computes flood extent and duration through strategies originally developed for INFORM, an ArcGIS-based hydro-ecological modelling framework. It does not provide a full, physical hydraulic modelling algorithm, but a simplified, near-real-time GIS approach for flood extent and duration modelling. Computationally demanding annual flood durations have already been computed, and the data products were published by Weber (2022) <doi:10.1594/PANGAEA.948042>.
This package provides a local haplotyping tool for use in trait association and trait prediction analysis pipelines. HaploVar enables users to take single nucleotide polymorphisms (SNPs) (in VCF format) and a linkage disequilibrium (LD) matrix, calculate local haplotypes, and format the output to be compatible with a wide range of trait association and trait prediction tools. The local haplotypes are calculated from the LD matrix using a clustering algorithm called density-based spatial clustering of applications with noise ('DBSCAN') (Ester et al., 1996) <ISBN: 1577350049>.
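To show what DBSCAN does with an LD matrix, the Python sketch below runs plain DBSCAN on a precomputed distance matrix. The conversion distance = 1 - r², the eps and min_pts values, and the example matrix are all assumptions for illustration; the description does not say how HaploVar turns LD into distances or which parameters it uses.

```python
def dbscan_precomputed(dist, eps, min_pts):
    """DBSCAN on a precomputed n x n distance matrix.

    Returns one label per point: cluster ids 0, 1, ... or -1 for noise.
    """
    n = len(dist)
    labels = [None] * n  # None means "not visited yet"
    cluster = -1

    def neighbours(i):
        # The eps-neighbourhood includes the point itself.
        return [j for j in range(n) if dist[i][j] <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(nbrs)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:
                seeds.extend(j_nbrs)  # j is a core point: keep expanding
    return labels


# Hypothetical r^2 LD matrix for five SNPs: SNPs 0-2 are in strong LD.
ld = [
    [1.0, 0.9, 0.85, 0.1, 0.05],
    [0.9, 1.0, 0.8, 0.1, 0.1],
    [0.85, 0.8, 1.0, 0.05, 0.1],
    [0.1, 0.1, 0.05, 1.0, 0.2],
    [0.05, 0.1, 0.1, 0.2, 1.0],
]
dist = [[1 - r2 for r2 in row] for row in ld]  # assumed LD-to-distance map
print(dbscan_precomputed(dist, eps=0.3, min_pts=2))  # [0, 0, 0, -1, -1]
```

Each resulting cluster of SNPs would then form one local haplotype block, with noise SNPs left unassigned.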
R is great for installing software. Through the installr package you can automate the updating of R (on Windows, using updateR()) and install new software. Software installation is initiated through a GUI (just run installr()) or through functions such as install.Rtools(), install.pandoc(), install.git(), and many more. The updateR() command performs the following steps: finding the latest R version, downloading it, running the installer, deleting the installation file, and copying and updating old packages to the new R installation.
This package provides a function for classifying a landscape into different categories based on the Topographic Position Index (TPI) and slope. It offers two types of classification: Slope Position Classification and Landform Classification. The function internally calculates the TPI for the given landscape and then uses it along with the slope to perform the classification. Optionally, descriptive statistics for every class are calculated and plotted. The classifications are useful for identifying the position of a location on a slope and for identifying broader landform types.
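The TPI step can be sketched as "cell elevation minus the mean elevation of its neighbours", followed by a threshold rule on the standardized TPI and the slope. In the Python sketch below, the 3x3 neighbourhood, the exclusion of the centre cell, and the class thresholds (±0.5 and ±1 standard deviations, a 5 degree flat-slope cut-off) are assumptions in the style of commonly used slope-position schemes, not the package's documented settings. Slope is taken as a given input rather than computed.

```python
import statistics


def tpi(dem):
    """Topographic Position Index for each cell of a 2-D elevation grid.

    TPI = cell elevation minus the mean elevation of its 3x3 neighbours
    (centre excluded; edge cells use only the neighbours that exist).
    """
    rows, cols = len(dem), len(dem[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nbrs = [dem[rr][cc]
                    for rr in range(max(r - 1, 0), min(r + 2, rows))
                    for cc in range(max(c - 1, 0), min(c + 2, cols))
                    if (rr, cc) != (r, c)]
            out[r][c] = dem[r][c] - statistics.mean(nbrs)
    return out


def standardize(grid):
    """Convert TPI values to z-scores over the whole grid."""
    vals = [v for row in grid for v in row]
    mu, sd = statistics.mean(vals), statistics.pstdev(vals)
    if sd == 0:  # perfectly flat terrain: every cell has TPI 0
        return [[0.0 for _ in row] for row in grid]
    return [[(v - mu) / sd for v in row] for row in grid]


def slope_position(z, slope_deg, flat_slope=5.0):
    """Slope position class from standardized TPI z and slope in degrees."""
    if z > 1:
        return "ridge"
    if z > 0.5:
        return "upper slope"
    if z >= -0.5:
        return "flat" if slope_deg <= flat_slope else "middle slope"
    if z >= -1:
        return "lower slope"
    return "valley"


dem = [[1, 1, 1], [1, 5, 1], [1, 1, 1]]  # a single peak in the centre
z = standardize(tpi(dem))
print(slope_position(z[1][1], slope_deg=20))  # ridge
```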
Fits mixed membership models with discrete multivariate data (with or without repeated measures) following the general framework of Erosheva et al. (2004). This package uses a variational EM approach, approximating the posterior distribution of latent memberships and selecting hyperparameters through a pseudo-MLE procedure. Currently supported data types are Bernoulli, multinomial and rank (Plackett-Luce). The extended GoM model with fixed stayers from Erosheva et al. (2007) is now also supported. See Airoldi et al. (2014) for other examples of mixed membership models.
Fast imputations under the object-oriented programming paradigm. Moreover, a few functions are offered that work with popular R packages such as data.table or dplyr. The biggest improvement in time performance can be achieved for calculations where a grouping variable has to be used. A single evaluation of a quantitative model for the multiple imputations is another major enhancement. A further major improvement is one of the fastest implementations of predictive mean matching in the R world, thanks to presorting and binary search.
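Presorting and binary search speed up predictive mean matching because the donor pool is sorted by predicted value once, and each missing row then locates its nearest donors in logarithmic time instead of scanning all observed rows. The Python sketch below illustrates this idea under assumptions: it uses the common "draw at random from the k closest donors" variant, and the function name and k default are hypothetical rather than the package's interface.

```python
import bisect
import random


def pmm_impute(pred_obs, y_obs, pred_mis, k=5, seed=None):
    """Predictive mean matching using a presorted donor pool and binary search.

    pred_obs: model predictions for rows with observed y
    y_obs:    observed y values, aligned with pred_obs
    pred_mis: model predictions for rows with missing y
    Each missing row receives the observed y of a donor drawn at random
    from the k observed rows whose predictions are closest to its own.
    """
    rng = random.Random(seed)
    # Presort donors by prediction once: O(n log n).
    order = sorted(range(len(pred_obs)), key=lambda i: pred_obs[i])
    sorted_pred = [pred_obs[i] for i in order]
    sorted_y = [y_obs[i] for i in order]
    n = len(sorted_pred)
    k = min(k, n)
    imputed = []
    for p in pred_mis:
        # Binary search for the insertion point: O(log n) per missing row.
        pos = bisect.bisect_left(sorted_pred, p)
        lo, hi = pos, pos  # half-open window [lo, hi) of nearest donors
        while hi - lo < k:
            # Grow the window toward whichever neighbour is closer to p.
            if lo == 0:
                hi += 1
            elif hi == n:
                lo -= 1
            elif p - sorted_pred[lo - 1] <= sorted_pred[hi] - p:
                lo -= 1
            else:
                hi += 1
        imputed.append(sorted_y[rng.randrange(lo, hi)])
    return imputed


print(pmm_impute([1.0, 2.0, 3.0, 10.0, 11.0], ["a", "b", "c", "d", "e"],
                 [10.4, 0.0], k=1))  # ['d', 'a']
```

Imputing observed donor values, rather than the predictions themselves, keeps imputations within the range and distribution of the real data.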
This package provides tools for the structured processing of PET neuroimaging data in preparation for the estimation of Simultaneous Confidence Corridors (SCCs) for one-group, two-group, or single-patient vs group comparisons. The package facilitates PET image loading, data restructuring, integration into a Functional Data Analysis framework, contour extraction, identification of significant results, and performance evaluation. It bridges established packages (e.g., oro.nifti) with novel statistical methodologies (e.g., ImageSCC) and enables reproducible analysis pipelines, including comparison with Statistical Parametric Mapping ('SPM').
Create surface forms from matrix or raster data for flexible plotting and conversion to other mesh types. The functions quadmesh and triangmesh produce a continuous surface as a mesh3d object as used by the rgl package. This is used for plotting raster data in 3D (optionally with texture); it also allows a map projection to be applied without data loss and supports many processing applications that are restricted by inflexible regular grid rasters. Discrete forms of these continuous surfaces are available with the dquadmesh and dtriangmesh functions.
Application of the Self-Organizing Maps technique for spatial classification of time series. The package uses spatial data, point or gridded, to create clusters with similar characteristics. The clusters can be further refined to a smaller number of regions by hierarchical clustering and their spatial dependencies can be presented as complex networks. Thus, meaningful maps can be created, representing the regional heterogeneity of a single variable. More information and an example of implementation can be found in Markonis and Strnad (2020, <doi:10.1177/0959683620913924>).
Various methods for targeted and semiparametric inference, including augmented inverse probability weighted (AIPW) estimators for missing data and causal inference (Bang and Robins (2005) <doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional average treatment effects (CATE) (van der Laan (2006) <doi:10.2202/1557-4679.1008>), estimators for risk differences and relative risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), and assumption-lean inference for generalized linear model parameters (Vansteelandt et al. (2022) <doi:10.1111/rssb.12504>).
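For the AIPW estimators mentioned above, the standard estimate of the average treatment effect combines an outcome-model contrast with inverse-probability-weighted residual corrections, which is what makes it doubly robust. The Python sketch below evaluates that textbook formula from already-fitted nuisance values; it is a hedged illustration only, and the function name and arguments are hypothetical, not the package's interface, which also fits the nuisance models and computes standard errors.

```python
def aipw_ate(y, a, pi, mu1, mu0):
    """AIPW estimate of the average treatment effect E[Y(1)] - E[Y(0)].

    y:   observed outcomes
    a:   treatment indicators (1 treated, 0 control)
    pi:  estimated propensity scores P(A=1 | X), strictly between 0 and 1
    mu1: outcome-model predictions of E[Y | A=1, X]
    mu0: outcome-model predictions of E[Y | A=0, X]
    """
    total = 0.0
    for yi, ai, pii, m1, m0 in zip(y, a, pi, mu1, mu0):
        # Outcome-model contrast plus IPW-weighted residual corrections.
        total += (m1 - m0
                  + ai * (yi - m1) / pii
                  - (1 - ai) * (yi - m0) / (1 - pii))
    return total / len(y)


# Correct outcome model: residual corrections vanish.
print(aipw_ate([3, 1], [1, 0], [0.5, 0.5], [3, 3], [1, 1]))  # 2.0
# Uninformative outcome model: the estimate reduces to plain IPW.
print(aipw_ate([3, 1], [1, 0], [0.5, 0.5], [0, 0], [0, 0]))  # 2.0
```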
This package implements nested cross-validation applied to the glmnet and caret packages. With glmnet, this includes cross-validation of the elastic net alpha parameter. A number of filter functions for feature selection (t-test, Wilcoxon test, ANOVA, Pearson/Spearman correlation, random forest, ReliefF) are provided and can be embedded within the outer loop of the nested CV. Nested CV can also be performed with the caret package, giving access to the large number of prediction methods available in caret.
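The structure of nested cross-validation is the same whatever the model: an inner CV loop tunes hyperparameters using only the outer training fold, and the outer loop scores the tuned model on data the tuning never saw. The Python sketch below shows that structure with a toy 1-D k-nearest-neighbour regressor standing in for glmnet or caret models. The model, the contiguous fold scheme, and all names are assumptions for illustration. A feature-selection filter, as described above, would likewise run inside each outer training fold.

```python
import statistics


def kfold_indices(n, n_folds):
    """Split range(n) into n_folds contiguous folds of near-equal size."""
    sizes = [n // n_folds + (1 if i < n % n_folds else 0) for i in range(n_folds)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds


def knn_predict(x_train, y_train, x_new, k):
    """1-D k-nearest-neighbour regression: mean y of the k closest x."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))[:k]
    return statistics.mean(y_train[i] for i in nearest)


def split(x, y, test):
    """Return the training part of (x, y) after removing the test indices."""
    test_set = set(test)
    train = [i for i in range(len(x)) if i not in test_set]
    return [x[i] for i in train], [y[i] for i in train]


def cv_mse(x, y, k, n_folds):
    """Plain k-fold CV mean squared error of k-NN with a given k."""
    errs = []
    for test in kfold_indices(len(x), n_folds):
        xt, yt = split(x, y, test)
        errs += [(y[i] - knn_predict(xt, yt, x[i], k)) ** 2 for i in test]
    return statistics.mean(errs)


def nested_cv(x, y, candidate_ks, n_outer=5, n_inner=4):
    """Outer loop estimates performance; inner loop tunes k on training data only."""
    results = []
    for test in kfold_indices(len(x), n_outer):
        xt, yt = split(x, y, test)
        # The inner CV never sees the outer test fold, so the outer score is unbiased by tuning.
        best_k = min(candidate_ks, key=lambda k: cv_mse(xt, yt, k, n_inner))
        mse = statistics.mean((y[i] - knn_predict(xt, yt, x[i], best_k)) ** 2
                              for i in test)
        results.append((best_k, mse))
    return results


x = [(7 * i) % 20 for i in range(20)]  # scrambled order so folds are mixed
y = [0.5 * xi + (1 if i % 2 else -1) for i, xi in enumerate(x)]
for best_k, mse in nested_cv(x, y, candidate_ks=[1, 3, 5]):
    print(best_k, round(mse, 3))
```

The mean of the outer-fold MSE values is the performance estimate; the chosen k may differ from fold to fold, which is expected.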