This package provides tools for working with multiple related tables, stored as data frames or in a relational database. Multiple tables (data and metadata) are stored in a compound object, which can then be manipulated with a pipe-friendly syntax.
Dynamic model averaging for binary and continuous outcomes.
Perform estimation, prediction, and simulations using the Dynamic Multiple Quantile model of Catania and Luati (2023) <doi:10.1016/j.jeconom.2022.11.002>. Can be used to estimate a set of conditional time-varying quantiles of a time series that do not cross.
The state-of-the-art algorithms for distance metric learning, including global and local methods such as Relevant Component Analysis, Discriminative Component Analysis, Local Fisher Discriminant Analysis, etc. These distance metric learning methods are widely applied in feature extraction, dimensionality reduction, clustering, classification, information retrieval, and computer vision problems.
Mixed model analysis for quantitative genetics with multi-trait responses and pedigree-based partitioning of individual variation into a range of environmental and genetic variance components for individual and maternal effects. Method documented in dmmOverview.pdf
; dmm is an implementation of dispersion mean model described by Searle et al. (1992) "Variance Components", Wiley, NY. DMM can do MINQUE', bias-corrected-ML', and REML variance component estimates.
This package provides functions to calculate Divisia monetary aggregates index as given in Barnett, W. A. (1980) (<DOI:10.1016/0304-4076(80)90070-6>).
This package provides functions for fitting a Bayesian model for grouping binary dissimilarity matrices in homogeneous clusters. Currently, it includes methods only for binary data (<doi:10.18637/jss.v100.i16>).
This package implements the daily based Morgan-Morgan-Finney (DMMF) soil erosion model (Choi et al., 2017 <doi:10.3390/w9040278>) for estimating surface runoff and sediment budgets from a field or a catchment on a daily basis.
The framework provides functions to generate ODEs of reaction networks, parameter transformations, observation functions, residual functions, etc. The framework follows the paradigm that derivative information should be used for optimization whenever possible. Therefore, all major functions produce and can handle expressions for symbolic derivatives. The methods used in dMod
were published in Kaschek et al, 2019, <doi:10.18637/jss.v088.i10>.
This package provides statistical tools for analyzing net and relative survival, with a key feature of relaxing the assumption of independent censoring and incorporating the effect of dependent competing risks. It employs a copula-based methodology, specifically the Archimedean copula, to simulate data, conduct survival analysis, and offer comparisons with other methods. This approach is detailed in the work of Adatorwovor et al. (2022) <doi:10.1515/ijb-2021-0016>.
Implementation of a transfer learning framework employing distribution mapping based domain transfer. Uses the renowned concept of histogram matching (see Gonzalez and Fittes (1977) <doi:10.1016/0094-114X(77)90062-3>, Gonzalez and Woods (2008) <isbn:9780131687288>) and extends it to include distribution measures like kernel density estimates (KDE; see Wand and Jones (1995) <isbn:978-0-412-55270-0>, Jones et al. (1996) <doi:10.2307/2291420). In the typical application scenario, one can use the underlying sample distributions (histogram or KDE) to generate a map between two distinct but related domains to transfer the target data to the source domain and utilize the available source data for better predictive modeling design. Suitable for the case where a one-to-one sample matching is not possible, thus one needs to transform the underlying data distribution to utilize the more available data for modeling.
This package provides functions and data accompanying the second edition of the book "Data Mining with R, learning with case studies" by Luis Torgo, published by CRC Press.
DMCFB is a pipeline for identifying differentially methylated cytosines using a Bayesian functional regression model in bisulfite sequencing data. By using a functional regression data model, it tries to capture position-specific, group-specific and other covariates-specific methylation patterns as well as spatial correlation patterns and unknown underlying models of methylation data. It is robust and flexible with respect to the true underlying models and inclusion of any covariates, and the missing values are imputed using spatial correlation between positions and samples. A Bayesian approach is adopted for estimation and inference in the proposed method.
Several tests for differential methylation in methylation array data, including one-sided differential mean and variance test. Methods used in the package refer to Dai, J, Wang, X, Chen, H and others (2021) "Incorporating increased variability in discovering cancer methylation markers", Biostatistics, submitted.
This package implements an approach for scanning the genome to detect and perform accurate inference on differentially methylated regions from Whole Genome Bisulfite Sequencing data. The method is based on comparing detected regions to a pooled null distribution, that can be implemented even when as few as two samples per population are available. Region-level statistics are obtained by fitting a generalized least squares (GLS) regression model with a nested autoregressive correlated error structure for the effect of interest on transformed methylation proportions.
Implementation of double machine learning (DML) algorithms in R, based on Emmenegger and Buehlmann (2021) "Regularizing Double Machine Learning in Partially Linear Endogenous Models" <arXiv:2101.12525>
and Emmenegger and Buehlmann (2021) <arXiv:2108.13657>
"Double Machine Learning for Partially Linear Mixed-Effects Models with Repeated Measurements". First part: our goal is to perform inference for the linear parameter in partially linear models with confounding variables. The standard DML estimator of the linear parameter has a two-stage least squares interpretation, which can lead to a large variance and overwide confidence intervals. We apply regularization to reduce the variance of the estimator, which produces narrower confidence intervals that are approximately valid. Nuisance terms can be flexibly estimated with machine learning algorithms. Second part: our goal is to estimate and perform inference for the linear coefficient in a partially linear mixed-effects model with DML. Machine learning algorithms allows us to incorporate more complex interaction structures and high-dimensional variables.
Model selection algorithms for regression and classification, where the predictors can be continuous or categorical and the number of regressors may exceed the number of observations. The selected model consists of a subset of numerical regressors and partitions of levels of factors. Szymon Nowakowski, Piotr Pokarowski, Wojciech Rejchel and Agnieszka SoÅ tys, 2023. Improving Group Lasso for High-Dimensional Categorical Data. In: Computational Science â ICCS 2023. Lecture Notes in Computer Science, vol 14074, p. 455-470. Springer, Cham. <doi:10.1007/978-3-031-36021-3_47>. Aleksandra Maj-KaÅ ska, Piotr Pokarowski and Agnieszka Prochenka, 2015. Delete or merge regressors for linear model selection. Electronic Journal of Statistics 9(2): 1749-1778. <doi:10.1214/15-EJS1050>. Piotr Pokarowski and Jan Mielniczuk, 2015. Combined l1 and greedy l0 penalized least squares for linear model selection. Journal of Machine Learning Research 16(29): 961-992. <https://www.jmlr.org/papers/volume16/pokarowski15a/pokarowski15a.pdf>. Piotr Pokarowski, Wojciech Rejchel, Agnieszka SoÅ tys, MichaÅ Frej and Jan Mielniczuk, 2022. Improving Lasso for model selection and prediction. Scandinavian Journal of Statistics, 49(2): 831â 863. <doi:10.1111/sjos.12546>.
DMC model simulation detailed in Ulrich, R., Schroeter, H., Leuthold, H., & Birngruber, T. (2015). Automatic and controlled stimulus processing in conflict tasks: Superimposed diffusion processes and delta functions. Cognitive Psychology, 78, 148-174. Ulrich et al. (2015) <doi:10.1016/j.cogpsych.2015.02.005>. Decision processes within choice reaction-time (CRT) tasks are often modelled using evidence accumulation models (EAMs), a variation of which is the Diffusion Decision Model (DDM, for a review, see Ratcliff & McKoon
, 2008). Ulrich et al. (2015) introduced a Diffusion Model for Conflict tasks (DMC). The DMC model combines common features from within standard diffusion models with the addition of superimposed controlled and automatic activation. The DMC model is used to explain distributional reaction time (and error rate) patterns in common behavioural conflict-like tasks (e.g., Flanker task, Simon task). This R-package implements the DMC model and provides functionality to fit the model to observed data. Further details are provided in the following paper: Mackenzie, I.G., & Dudschig, C. (2021). DMCfun: An R package for fitting Diffusion Model of Conflict (DMC) to reaction time and error rate data. Methods in Psychology, 100074. <doi:10.1016/j.metip.2021.100074>.
This package provides a pipeline for identifying differentially methylated CpG
sites using Hidden Markov Model in bisulfite sequencing data. DNA methylation studies have enabled researchers to understand methylation patterns and their regulatory roles in biological processes and disease. However, only a limited number of statistical approaches have been developed to provide formal quantitative analysis. Specifically, a few available methods do identify differentially methylated CpG
(DMC) sites or regions (DMR), but they suffer from limitations that arise mostly due to challenges inherent in bisulfite sequencing data. These challenges include: (1) that read-depths vary considerably among genomic positions and are often low; (2) both methylation and autocorrelation patterns change as regions change; and (3) CpG
sites are distributed unevenly. Furthermore, there are several methodological limitations: almost none of these tools is capable of comparing multiple groups and/or working with missing values, and only a few allow continuous or multiple covariates. The last of these is of great interest among researchers, as the goal is often to find which regions of the genome are associated with several exposures and traits. To tackle these issues, we have developed an efficient DMC identification method based on Hidden Markov Models (HMMs) called “DMCHMM” which is a three-step approach (model selection, prediction, testing) aiming to address the aforementioned drawbacks.
Work within the dplyr workflow to add random variates to your data frame. Variates can be added at any level of an existing column. Also, bounds can be specified for simulated variates.
For checking the dataset from EDC(Electronic Data Capture) in clinical trials. dmtools reshape your dataset in a tidy view and check events. You can reshape the dataset and choose your target to check, for example, the laboratory reference range.
The package contains the experimental data and documented source code of the manuscript "Fischer et al., A Map of Directional Genetic Interactions in a Metazoan Cell, eLife
, 2015, in Press.". The vignette code generates all figures in the paper.
This is a package for de novo identification and extraction of differentially methylated regions (DMRs) from the human genome using Whole Genome Bisulfite Sequencing (WGBS) and Illumina Infinium Array (450K and EPIC) data. It provides functionality for filtering probes possibly confounded by SNPs and cross-hybridisation. It includes GRanges
generation and plotting functions.
This package detects significant differentially methylated regions (for both qualitative and quantitative traits), using a scan statistic with underlying Poisson heuristics. The scan statistic will depend on a sequence of window sizes (# of CpGs
within each window) and on a threshold for each window size. This threshold can be calculated by three different means: i) analytically using Siegmund et.al (2012) solution (preferred), ii) an important sampling as suggested by Zhang (2008), and a iii) full MCMC modeling of the data, choosing between a number of different options for modeling the dependency between each CpG
.