We provide an R interface to OpenML.org, an online machine learning platform where researchers can access open data, download and upload data sets, share their machine learning tasks and experiments, and organize them online to work and collaborate with other researchers. The R interface allows querying for data sets with specific properties, and downloading and uploading data sets, tasks, flows and runs. See <https://www.openml.org/guide/api> for more information.
This package provides a comprehensive set of indexes and tests for social segregation analysis, as described in Tivadar (2019), 'OasisR': An R Package to Bring Some Order to the World of Segregation Measurement <doi:10.18637/jss.v089.i07>. The package is the most complete existing tool and it clarifies many ambiguities and errors regarding the definition of segregation indices. Additionally, OasisR introduces several resampling methods (randomization tests, bootstrapping, and jackknife methods) that enable testing the statistical significance of these indices.
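To make the notion of a segregation index concrete, here is a minimal sketch of the classic Duncan and Duncan dissimilarity index for two groups counted across spatial units. It is written in Python for illustration only; the function name `dissimilarity_index` is hypothetical and is not part of the OasisR API, and the sketch covers only this one index, not the package's resampling tests.

```python
def dissimilarity_index(group_a, group_b):
    """Dissimilarity index D = 0.5 * sum_i |a_i / A - b_i / B|.

    group_a[i] and group_b[i] are the counts of each group in spatial unit i.
    D ranges from 0 (identical distributions) to 1 (complete separation).
    """
    if len(group_a) != len(group_b):
        raise ValueError("both groups need one count per spatial unit")
    total_a = sum(group_a)
    total_b = sum(group_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs a positive total count")
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )


if __name__ == "__main__":
    # Two units, each group concentrated in a different unit.
    print(dissimilarity_index([30, 10], [10, 30]))
```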
This package provides functions to Simultaneously Infer Causal Graphs and Genetic Architecture. Includes acyclic and cyclic graphs for data from an experimental cross with a modest number (<10) of phenotypes driven by a few genetic loci (QTL). Chaibub Neto E, Keller MP, Attie AD, Yandell BS (2010) Causal Graphical Models in Systems Genetics: a unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Annals of Applied Statistics 4: 320-339. <doi:10.1214/09-AOAS288>.
Based on the illness-death model, a large number of clinical trials with the oncology endpoints progression-free survival (PFS) and overall survival (OS) can be simulated, see Meller, Beyersmann and Rufibach (2019) <doi:10.1002/sim.8295>. The simulation set-up allows for random and event-driven censoring, an arbitrary number of treatment arms, staggered study entry and drop-out. Exponentially, Weibull and piecewise exponentially distributed survival times can be generated. The correlation between PFS and OS can be calculated.
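The illness-death idea can be sketched in a few lines. The Python example below assumes constant (exponential) transition hazards and ignores censoring, treatment arms and staggered entry; the function names and the parameters `h01`, `h02` and `h12` (hazards for progression, death without progression, and death after progression) are illustrative and are not taken from the package.

```python
import random


def simulate_patient(h01, h02, h12, rng):
    """Simulate (PFS, OS) for one patient in an illness-death model."""
    t_progression = rng.expovariate(h01)  # latent time to progression
    t_death = rng.expovariate(h02)        # latent time to death without progression
    if t_progression < t_death:
        # Progression happens first; death follows after an extra waiting time.
        pfs = t_progression
        os_time = t_progression + rng.expovariate(h12)
    else:
        # Death without prior progression ends both endpoints at once.
        pfs = t_death
        os_time = t_death
    return pfs, os_time


def simulate_trial(n, h01, h02, h12, seed=0):
    """Simulate n patients with a reproducible random stream."""
    rng = random.Random(seed)
    return [simulate_patient(h01, h02, h12, rng) for _ in range(n)]


if __name__ == "__main__":
    for pfs, os_time in simulate_trial(5, h01=0.06, h02=0.03, h12=0.1):
        print(round(pfs, 2), round(os_time, 2))
```

By construction OS is never shorter than PFS, which is what induces the positive correlation between the two endpoints.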
Calculates the slope (longitudinal gradient or steepness) of linear geographic features such as roads (for more details, see Ariza-López et al. (2019) <doi:10.1038/s41597-019-0147-x>) and rivers (for more details, see Cohen et al. (2018) <doi:10.1016/j.jhydrol.2018.06.066>). It can use local Digital Elevation Model (DEM) data or download DEM data via the ceramic package. The package also provides functions to add elevation data to linestrings and visualize elevation profiles.
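As a rough illustration of what the slope of a linear feature means, the Python sketch below computes per-segment gradients for a linestring with elevations and summarizes them as a distance-weighted mean of absolute gradients. This is one plausible definition, assumed here for illustration; the function names are hypothetical and the coordinates are assumed to be projected, in metres.

```python
import math


def segment_slopes(points):
    """Return (gradient, horizontal length) for each segment of a 3D linestring.

    points is a list of (x, y, z) tuples in metres.
    """
    segments = []
    for (x0, y0, z0), (x1, y1, z1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        if length == 0:
            continue  # skip duplicate vertices, which have no defined gradient
        segments.append(((z1 - z0) / length, length))
    return segments


def mean_abs_slope(points):
    """Distance-weighted mean of the absolute segment gradients."""
    segments = segment_slopes(points)
    total_length = sum(length for _, length in segments)
    if total_length == 0:
        raise ValueError("linestring has no horizontal extent")
    return sum(abs(grad) * length for grad, length in segments) / total_length


if __name__ == "__main__":
    road = [(0, 0, 0), (100, 0, 10), (200, 0, 0)]  # climbs 10 m, then descends 10 m
    print(mean_abs_slope(road))
```

Using absolute gradients means that a road which climbs and then descends is still reported as steep, rather than averaging out to zero.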
This package provides functions for estimation (parametric, semi-parametric and non-parametric) of copula-based dependence coefficients between a finite collection of random vectors, including phi-dependence measures and Bures-Wasserstein dependence measures. An algorithm for agglomerative hierarchical variable clustering is also implemented. The methods follow the articles De Keyser & Gijbels (2024) <doi:10.1016/j.jmva.2024.105336>, De Keyser & Gijbels (2024) <doi:10.1016/j.ijar.2023.109090>, and De Keyser & Gijbels (2024) <doi:10.48550/arXiv.2404.07141>.
This package provides gsubfn, which is like gsub but can take a replacement function or certain other objects instead of the replacement string. Matches and back references are input to the replacement function and replaced by the function output. gsubfn can be used to split strings based on content rather than delimiters and for quasi-perl-style string interpolation. The package also has facilities for translating formulas to functions and allowing such formulas in function calls instead of functions.
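The idea of a replacement function can be illustrated with Python's standard `re` module, whose `re.sub` also accepts a callable. This is an analogy only, not the gsubfn API; the helper names below are hypothetical.

```python
import re


def double_numbers(text):
    """Replace every run of digits with twice its value."""
    return re.sub(r"\d+", lambda match: str(2 * int(match.group(0))), text)


def interpolate(template, values):
    """Perl-style interpolation: replace $name with values['name']."""
    return re.sub(r"\$(\w+)", lambda match: str(values[match.group(1)]), template)


def split_on_content(text, pattern=r"[A-Za-z]+|\d+"):
    """Split by describing the pieces to keep rather than the delimiters."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    print(double_numbers("a1 b22"))
    print(interpolate("Hello $name, you are $age", {"name": "Ann", "age": 30}))
    print(split_on_content("abc123def45"))
```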
The RISC-V Proxy Kernel, pk, is a lightweight application execution environment that can host statically-linked RISC-V ELF binaries. It is designed to support tethered RISC-V implementations with limited I/O capability and thus handles I/O-related system calls by proxying them to a host computer.
This package also contains the Berkeley Boot Loader, bbl, which is a supervisor execution environment for tethered RISC-V systems. It is designed to host the RISC-V Linux port.
Gene set analysis methods exist to combine SNP-level association p-values into gene sets, calculating a single association p-value for each gene set. This package implements two such methods that require only the calculated SNP p-values, the gene set(s) of interest, and a correlation matrix (if desired). One method (GLOSSI) requires independent SNPs and the other (VEGAS) can take into account correlation (LD) among the SNPs. Built-in plotting functions are available to help users visualize results.
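For the independent-SNP case, combining p-values can be sketched with Fisher's method, a standard technique for independent tests. Whether this matches the package's GLOSSI implementation in every detail is an assumption; the function names are hypothetical, and the correlated (VEGAS) case is not covered.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail probability of a chi-square variable with even df.

    For df = 2k the tail has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combined_p(p_values):
    """Combine independent p-values: -2 * sum(log p) ~ chi-square with 2k df."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(statistic, 2 * len(p_values))


if __name__ == "__main__":
    snp_p_values = [0.04, 0.20, 0.01, 0.65]  # made-up SNP p-values for one gene set
    print(fisher_combined_p(snp_p_values))
```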
nuCpos, a derivative of NuPoP, is an R package for prediction of nucleosome positions. nuCpos calculates local and whole nucleosomal histone binding affinity (HBA) scores for a given 147-bp sequence. Note: This package was designed to demonstrate the use of chemical maps in prediction. As the parental package NuPoP now provides chemical-map-based prediction, the function for dHMM-based prediction was removed from this package. nuCpos continues to provide functions for HBA calculation.
The main functionality of this package is the bootstrap ARDL test for cointegration. It also acts as a wrapper for the most common ARDL testing procedures for cointegration: the bound tests of Pesaran, Shin and Smith (PSS; 2001 - <doi:10.1002/jae.616>) and the asymptotic test on the independent variables of Sam, McNown and Goh (SMG; 2019 - <doi:10.1016/j.econmod.2018.11.001>). Bootstrap and bound tests are performed under both the conditional and unconditional ARDL models.
Bisulfite-treated RNA non-conversion in a set of samples is analysed as follows: each sample's non-conversion distribution is modelled as a Poisson distribution. P-values adjusted for multiple testing are calculated in each sample. Combined non-conversion P-values and standard errors are calculated on the intersection of the set of samples. For further details, see C Legrand, F Tuorto, M Hartmann, R Liebers, D Jakob, M Helm and F Lyko (2017) <doi:10.1101/gr.210666.116>.
Computes solutions for linear and logistic regression models with potentially high-dimensional categorical predictors. This is done by applying a nonconvex penalty (SCOPE) and computing solutions in an efficient path-wise fashion. The scaling of the solution paths is selected automatically. Includes functionality for selecting tuning parameter lambda by k-fold cross-validation and early termination based on information criteria. Solutions are computed by cyclical block-coordinate descent, iterating an innovative dynamic programming algorithm to compute exact solutions for each block.
This package provides a variety of methods to identify data quality issues in process-oriented data, which are useful to verify data quality in a process mining context. Builds on the class for activity logs implemented in the package bupaR. Methods to identify data quality issues either consider each activity log entry independently (e.g. missing values, activity duration outliers, ...), or focus on the relation amongst several activity log entries (e.g. batch registrations, violations of the expected activity order, ...).
The purpose of this package is to test whether a given moment of the distribution of a given sample is finite or not. For heavy-tailed distributions with tail exponent b, only moments of order smaller than b are finite. Tail exponent and heavy-tailedness are notoriously difficult to ascertain, but the finiteness of moments (including fractional moments) can be tested directly. This package does that following the test suggested by Trapani (2016) <doi:10.1016/j.jeconom.2015.08.006>.
Automatically performs desired statistical tests (e.g. wilcox.test(), t.test()) to compare groups, and adds the resulting p-values to the plot with an annotation bar. Group differences are frequently visualized with boxplots, bar plots, etc. Statistical test results often need to be annotated on these plots. This package provides a convenient function that works on ggplot2 objects, performs the desired statistical test between groups of interest and annotates the test results on the plot.
Addons for the mice package to perform multiple imputation using chained equations with two-level data. Includes imputation methods dedicated to sporadically and systematically missing values. Imputation of continuous, binary or count variables is available. Following the recommendations of Audigier, V. et al (2018) <doi:10.1214/18-STS646>, the choice of the imputation method for each variable can be facilitated by a default choice tuned according to the structure of the incomplete dataset. Allows parallel calculation and overimputation for mice.
Recently, multiple marginal variable selection methods have been developed and shown to be effective in Gene-Environment interaction studies. We propose a novel marginal Bayesian variable selection method for Gene-Environment interaction studies. In particular, our marginal Bayesian method is robust to data contamination and outliers in the outcome variables. With the incorporation of spike-and-slab priors, we have implemented the Gibbs sampler based on Markov Chain Monte Carlo. The core algorithms of the package have been developed in C++.
This package offers three important components: (1) construction of a user-defined linear mixed model; (2) minimum norm quadratic unbiased estimation (MINQUE) (Rao, 1971), a linear mixed model approach, for variance component estimation and random effect prediction; and (3) a jackknife resampling technique to conduct various statistical tests. In addition, this package provides functions for model or data evaluation. It offers fast computations for the analysis of large data sets with various irregular data structures.
Segregation is a network-level property such that edges between predefined groups of vertices are relatively less likely. Network homophily is an individual-level tendency to form relations with people who are similar on some attribute (e.g. gender, music taste, social status, etc.). In general homophily leads to segregation, but segregation might arise without homophily. This package implements descriptive indices measuring homophily/segregation. It is a computational companion to Bojanowski & Corten (2014) <doi:10.1016/j.socnet.2014.04.001>.
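One simple descriptive index of this kind is the E-I index of Krackhardt and Stern, which compares the number of between-group (external) and within-group (internal) edges. The Python sketch below is illustrative only; the function name `ei_index` and the edge-list representation are assumptions, not the package's interface.

```python
def ei_index(edges, group):
    """E-I index: (external - internal) / (external + internal).

    edges is a list of (u, v) pairs and group maps each vertex to its group.
    The result is -1 when all edges are within groups (complete segregation)
    and +1 when all edges run between groups.
    """
    if not edges:
        raise ValueError("the index is undefined for a network without edges")
    external = sum(1 for u, v in edges if group[u] != group[v])
    internal = len(edges) - external
    return (external - internal) / (external + internal)


if __name__ == "__main__":
    membership = {1: "a", 2: "a", 3: "b", 4: "b"}
    print(ei_index([(1, 2), (3, 4), (2, 3)], membership))
```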
This package implements a unified framework of parametric simplex method for a variety of sparse learning problems (e.g., Dantzig selector (for linear regression), sparse quantile regression, sparse support vector machines, and compressive sensing) combined with efficient hyper-parameter selection strategies. The core algorithm is implemented in C++ with Eigen3 support for portable high performance linear algebra. For more details about parametric simplex method, see Haotian Pang (2017) <https://papers.nips.cc/paper/6623-parametric-simplex-method-for-sparse-learning.pdf>.
The plsdof package provides Degrees of Freedom estimates for Partial Least Squares (PLS) Regression. Model selection for PLS is based on various information criteria (aic, bic, gmdl) or on cross-validation. Estimates for the mean and covariance of the PLS regression coefficients are available. They allow the construction of approximate confidence intervals and the application of test procedures (Kramer and Sugiyama 2012 <doi:10.1198/jasa.2011.tm10107>). Further, cross-validation procedures for Ridge Regression and Principal Components Regression are available.
This package provides tools for cross-validated Lasso and Post-Lasso estimation. Built on top of the glmnet package by Friedman, Hastie and Tibshirani (2010) <doi:10.18637/jss.v033.i01>, the main function plasso() extends the standard glmnet output with coefficient paths for Post-Lasso models, while cv.plasso() performs cross-validation for both Lasso and Post-Lasso models and offers different ways to select the penalty parameter lambda, as discussed in Knaus (2021) <doi:10.1111/rssa.12623>.
This package provides functions to infer co-mapping trait hotspots and causal models. Chaibub Neto E, Keller MP, Broman AF, Attie AD, Jansen RC, Broman KW, Yandell BS (2012) Quantile-based permutation thresholds for QTL hotspots. Genetics 191: 1355-1365. <doi:10.1534/genetics.112.139451>. Chaibub Neto E, Broman AT, Keller MP, Attie AD, Zhang B, Zhu J, Yandell BS (2013) Modeling causality for pairs of phenotypes in system genetics. Genetics 193: 1003-1013. <doi:10.1534/genetics.112.147124>.