This package provides a collection of tools to access prepared air quality monitoring data files from web servers with ease and speed. Air quality data are sourced from open and publicly accessible repositories and can be found in these locations: <https://www.eea.europa.eu/data-and-maps/data/airbase-the-european-air-quality-database-8> and <https://discomap.eea.europa.eu/map/fme/AirQualityExport.htm>. The web server space has been provided by Ricardo Energy & Environment.
Generates binary test data based on Item Response Theory using the two-parameter logistic model (Lord, 1980 <doi:10.4324/9780203056615>). Useful functions for test equating are included, e.g. functions for generating internal and external common items between test forms and a function to create a linkage plan between those forms. Ancillary functions for generating true item and person parameters as well as for calculating the probability of a person correctly answering an item are also included.
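The two-parameter logistic model named above can be sketched in a few lines. This is an illustrative Python sketch, not the package's R interface: the function names `p_correct` and `simulate_responses` are hypothetical, and the logistic form is used without the optional scaling constant (1.7) that some texts include.

```python
import math
import random

def p_correct(theta, a, b):
    """2PL probability that a person with ability theta answers an item
    with discrimination a and difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def simulate_responses(thetas, items, seed=0):
    """Draw a binary response matrix (persons x items).

    thetas: list of person abilities
    items:  list of (a, b) tuples, one per item
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() < p_correct(theta, a, b) else 0 for (a, b) in items]
        for theta in thetas
    ]
```

When ability equals item difficulty, the probability of a correct answer is exactly 0.5 regardless of the discrimination parameter.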
Fastseg implements a very fast and efficient segmentation algorithm. It can segment data from DNA microarrays and data from next generation sequencing, for example to detect copy number segments. Further, it can segment data from RNA microarrays like tiling arrays to identify transcripts. Most generally, it can segment data given as a matrix or as a vector. Various data formats can be used as input to fastseg, such as expression set objects for microarrays or GRanges for sequencing data.
This package provides a flexible approach to Bayesian optimization / model based optimization building on the bbotk package. mlr3mbo is a toolbox providing both ready-to-use optimization algorithms and their fundamental building blocks, allowing for straightforward implementation of custom algorithms. Single- and multi-objective optimization is supported as well as mixed continuous, categorical and conditional search spaces. Moreover, using mlr3mbo for hyperparameter optimization of machine learning models within the mlr3 ecosystem is straightforward via mlr3tuning.
The provided benchmark suite enables the automated evaluation and comparison of any existing and novel indirect method for reference interval ('RI') estimation in a systematic way. Indirect methods take routine measurements of diagnostic tests, containing pathological and non-pathological samples as input and use sophisticated statistical methods to derive a model describing the distribution of the non-pathological samples, which can then be used to derive reference intervals. The benchmark suite contains 5,760 simulated test sets with varying difficulty. To include any indirect method, a custom wrapper function needs to be provided. The package offers functions for generating the test sets, executing the indirect method and evaluating the results. See ?RIbench or vignette("RIbench_package") for a more comprehensive description of the features. A detailed description and application is described in Ammer T., Schuetzenmeister A., Prokosch H.-U., Zierk J., Rank C.M., Rauh M. "RIbench: A Proposed Benchmark for the Standardized Evaluation of Indirect Methods for Reference Interval Estimation". Clinical Chemistry (2022) <doi:10.1093/clinchem/hvac142>.
This package provides a collection of functions to test spatial autocorrelation between variables, including Moran I, Geary C and Getis G together with scatter plots, functions for mapping and identifying clusters and outliers, functions associated with the moments of the previous statistics that allow testing whether there is bivariate spatial autocorrelation, and a function that identifies and visualizes on the map the neighbors of any region once the scheme of the spatial weights matrix has been established.
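As an illustration of the kind of statistic listed above, the following Python sketch computes the standard global (univariate) Moran I from a value vector and a spatial weights matrix. It is not the package's code: the function name `morans_i` is hypothetical, and the bivariate variants and significance tests the package describes are not covered.

```python
def morans_i(x, w):
    """Global Moran's I.

    x: list of n observed values, one per region
    w: n x n spatial weights matrix as nested lists (w[i][j] is the
       weight linking region i to region j; the diagonal should be 0)
    """
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]                 # deviations from the mean
    s0 = sum(sum(row) for row in w)             # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    total = sum(zi * zi for zi in z)
    return (n / s0) * cross / total
```

Values near +1 indicate that similar values cluster among neighbors, and values near -1 indicate that neighbors tend to differ.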
Time series analysis of network connectivity. Detects and visualizes change points between networks. Methods included in the package are discussed in depth in Baek, C., Gates, K. M., Leinwand, B., Pipiras, V. (2021) "Two sample tests for high-dimensional auto-covariances" <doi:10.1016/j.csda.2020.107067> and Baek, C., Gampe, M., Leinwand B., Lindquist K., Hopfinger J. and Gates K. (2023) "Detecting functional connectivity changes in fMRI data" <doi:10.1007/s11336-023-09908-7>.
This package provides functions to test for gene x gene interactions in a bi-parental population of inbred lines. The data are fitted with the mixed linear model described in Rio et al. (2022) <doi:10.1101/2022.12.18.520958>, that accounts for gene x gene interactions at both the fixed effect and variance levels. The package also provides graphical tools to display the gene x gene interaction trend at the mean level and the variance component analysis.
Inference of a multi-states birth-death model from a phylogeny, comprising a number of states N, birth and death rates for each state and on which edges each state appears. Inference is done using a hybrid approach: states are progressively added in a greedy approach. For a fixed number of states N the best model is selected via maximum likelihood. Reference: J. Barido-Sottani, T. G. Vaughan and T. Stadler (2018) <doi:10.1098/rsif.2018.0512>.
Estimating GARCH-MIDAS (MIxed-DAta-Sampling) models (Engle, Ghysels, Sohn, 2013, <doi:10.1162/REST_a_00300>) and related statistical inference, accompanying the paper "Two are better than one: Volatility forecasting using multiplicative component GARCH models" by Conrad and Kleen (2020, <doi:10.1002/jae.2742>). The GARCH-MIDAS model decomposes the conditional variance of (daily) stock returns into a short- and long-term component, where the latter may depend on an exogenous covariate sampled at a lower frequency.
R interface for the netstat command line utility used to retrieve and parse commonly used network statistics, including available and in-use transmission control protocol (TCP) ports. Primers offering technical background information on the netstat command line utility are available in the "Linux System Administrator's Manual" by Michael Kerrisk (2014) <https://man7.org/linux/man-pages/man8/netstat.8.html>, and on the Microsoft website (2017) <https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/netstat>.
An R-package-version of an open online science-based personality test from <https://openpsychometrics.org/tests/IPIP-BFFM/>, providing a better-designed interface and a more detailed report. The core command launch_test() opens a personality test in your browser, and generates a report after you click "Submit". In this report, your results are compared with other people's, to show what these results mean. Other people's data is from <https://openpsychometrics.org/_rawdata/BIG5.zip>.
The Common Workflow Language <https://www.commonwl.org/> is an open standard for describing data analysis workflows. This package takes the raw Common Workflow Language workflows encoded in JSON or YAML and turns the workflow elements into tidy data frames or lists. A graph representation for the workflow can be constructed and visualized with the parsed workflow inputs, outputs, and steps. Users can embed the visualizations in their Shiny applications, and export them as HTML files or static images.
Simple trustworthy utility functions to use the TauDEM (Terrain Analysis Using Digital Elevation Models <https://hydrology.usu.edu/taudem/taudem5/>) command-line interface. This package provides a guide to installation of TauDEM and its dependencies GDAL (Geospatial Data Abstraction Library) and MPI (Message Passing Interface) for different operating systems. Moreover, it checks that TauDEM and its dependencies are correctly installed and included in the PATH, and it provides wrapper commands for calling TauDEM methods from R.
This package implements an algorithm for Latent Dirichlet Allocation (LDA), Blei et al. (2003) <https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf>, using style conventions from the tidyverse, Wickham et al. (2019) <doi:10.21105/joss.01686>, and tidymodels, Kuhn et al. <https://tidymodels.github.io/model-implementation-principles/>. Fitting is done via collapsed Gibbs sampling. Also implements several novel features for LDA such as guided models and transfer learning based on ongoing and, as yet, unpublished research.
The shiny application Wallace is a modular platform for reproducible modeling of species niches and distributions. Wallace guides users through a complete analysis, from the acquisition of species occurrence and environmental data to visualizing model predictions on an interactive map, thus bundling complex workflows into a single, streamlined interface. An extensive vignette, which guides users through most package functionality, can be found on the package's GitHub Pages website: <https://wallaceecomod.github.io/wallace/articles/tutorial-v2.html>.
This tool enables in-database scoring of XGBoost models built in R, by translating trained model objects into a SQL query. XGBoost <https://xgboost.readthedocs.io/en/latest/index.html> provides parallel tree boosting (also known as gradient boosting machine, or GBM) algorithms in a highly efficient, flexible and portable way. The GBM algorithm was introduced by Friedman (2001) <doi:10.1214/aos/1013203451>, and more details on XGBoost can be found in Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>.
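The idea of translating a tree ensemble into SQL can be sketched as follows. This Python sketch is only an illustration under stated assumptions: the nested-dictionary tree format, the function names `tree_to_sql` and `model_to_sql`, and the column and table names are all hypothetical and do not reflect XGBoost's model dump format or this tool's output. Handling of missing values and the final link function (for example a sigmoid for binary classification) are left out.

```python
def tree_to_sql(node):
    """Turn one decision tree into a nested SQL CASE expression.

    A node is either {"leaf": value} or
    {"feature": column, "threshold": t, "yes": subtree, "no": subtree},
    where "yes" is followed when column < t. (Hypothetical format.)
    """
    if "leaf" in node:
        return repr(float(node["leaf"]))
    return "CASE WHEN {col} < {thr} THEN {yes} ELSE {no} END".format(
        col=node["feature"],
        thr=repr(float(node["threshold"])),
        yes=tree_to_sql(node["yes"]),
        no=tree_to_sql(node["no"]),
    )

def model_to_sql(trees, table):
    """Sum the per-tree expressions into one raw score per row."""
    terms = " + ".join("(" + tree_to_sql(tree) + ")" for tree in trees)
    return "SELECT {terms} AS margin FROM {table}".format(terms=terms, table=table)
```

Because a boosted model's raw prediction is the sum of its trees' leaf values, each tree becomes one CASE expression and the ensemble becomes their sum, which the database can evaluate row by row.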
This package provides whole-genome mappability tracks for the human hg19/hg38 assemblies. We employed the 100-mers mappability track from the ENCODE Project and computed a weighted average of the mappability scores if multiple ENCODE regions overlap with the same bin. “Blacklist” bins, including segmental duplication regions and gaps in reference assembly from telomere, centromere, and/or heterochromatin regions are included. The dataset consists of three assembled .bam files of single-cell whole genome sequencing from 10X for illustration purposes.
Building modeling packages is hard. A large amount of effort generally goes into providing an implementation for a new method that is efficient, fast, and correct, but often less emphasis is put on the user interface. A good interface requires specialized knowledge about S3 methods and formulas, which the average package developer might not have. The goal of hardhat is to reduce the burden around building new modeling packages by providing functionality for preprocessing, predicting, and validating input.
This package provides an implementation of the framework of reversed graph embedding (RGE), which projects data into a reduced dimensional space while simultaneously constructing a principal tree that passes through the middle of the data. DDRTree shows superiority to alternatives (Wishbone, DPT) for inferring the ordering as well as the intrinsic structure of single cell genomics data. In general, it could be used to reconstruct the temporal progression as well as the bifurcation structure of any data type.
RewriteFS is a FUSE to change the name of accessed files on the fly based on any number of regular expressions. It's like the rewrite action of many Web servers, but for your file system. For example, it can help keep your home directory tidy by transparently rewriting the location of configuration files of software that doesn't follow the XDG directory specification from ~/.name to ~/.config/name.
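The path rewriting described above can be illustrated with a small Python sketch. It only shows the regular-expression step, not the FUSE layer, and it does not use RewriteFS's own configuration syntax: the `RULES` list, the `rewrite` function, and the `/home/<user>` layout are assumptions made for the example.

```python
import re

# Hypothetical rule list: (compiled pattern, replacement).
# The lookahead keeps names such as ".namespace" from matching.
RULES = [
    (re.compile(r"^(/home/[^/]+)/\.name(?=/|$)"), r"\1/.config/name"),
]

def rewrite(path, rules=RULES):
    """Apply every rule in order and return the rewritten path."""
    for pattern, replacement in rules:
        path = pattern.sub(replacement, path)
    return path
```

With these rules, a request for `/home/alice/.name/settings` would be served from `/home/alice/.config/name/settings`, while unrelated paths pass through unchanged.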
The concept of reliable and clinically significant change (Jacobson & Truax, 1991) helps you answer the following questions for a sample with two measurements at different points in time (pre & post): Which proportion of my sample has a (considering the reliability of the instrument) probably not-just-by-chance difference in pre- vs. post-scores? Which proportion of my sample does not only change in a statistically significant way (see question one), but also in a clinically significant way (e.g. change from a test score regarded "dysfunctional" to a score regarded "functional")? This package allows you to very easily create a scatterplot of your sample in which the x-axis maps to the pre-scores, the y-axis maps to the post-scores and several graphical elements (lines, colors) allow you to gain a quick overview of reliable changes in these scores. An example of this kind of plot is Figure 2 of Jacobson & Truax (1991). Referenced article: Jacobson, N. S., & Truax, P. (1991) <doi:10.1037/0022-006X.59.1.12>.
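The first question above is usually answered with the reliable change index of Jacobson & Truax (1991): the pre-to-post difference divided by the standard error of the difference. The Python sketch below shows that calculation only; the function names are hypothetical and the package's own plotting interface is not reproduced. The 1.96 cutoff corresponds to a two-sided 5% level.

```python
import math

def reliable_change_index(pre, post, sd_pre, reliability):
    """Reliable change index for one person.

    sd_pre:      standard deviation of the pre-scores in the sample
    reliability: reliability of the instrument (e.g. test-retest), 0..1
    """
    se_measurement = sd_pre * math.sqrt(1.0 - reliability)
    se_difference = math.sqrt(2.0 * se_measurement ** 2)
    return (post - pre) / se_difference

def is_reliable_change(pre, post, sd_pre, reliability, critical=1.96):
    """True when the change is unlikely to be measurement error alone."""
    return abs(reliable_change_index(pre, post, sd_pre, reliability)) > critical
```

In the scatterplot described above, points for which `is_reliable_change` is false would fall inside a band around the diagonal where pre- and post-scores are equal.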
Enables translation of a tiny subset of R to C++. The user has to define an R function, which gets translated. For a full list of possible functions check the documentation. After translation an R function is returned which is a shallow wrapper around the C++ code. Alternatively an external pointer to the C++ function is returned to the user. The intention of the package is to generate fast functions which can be used as an ODE system or during optimization.
Bayesian quantile regression using the asymmetric Laplace distribution. Both continuous and binary dependent variables are supported. The package consists of implementations of the methods of Yu & Moyeed (2001) <doi:10.1016/S0167-7152(01)00124-9>, Benoit & Van den Poel (2012) <doi:10.1002/jae.1216> and Al-Hamzawi, Yu & Benoit (2012) <doi:10.1177/1471082X1101200304>. To speed up the calculations, the Markov Chain Monte Carlo core of all algorithms is programmed in Fortran and called from R.
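The link between quantile regression and the asymmetric Laplace distribution can be sketched briefly: the log-density is, up to a constant, the negative quantile check loss, so maximizing the likelihood minimizes that loss. The Python sketch below illustrates this under the usual location-scale parameterization; the function names are hypothetical and no part of the package's MCMC code is reproduced.

```python
import math

def check_loss(u, tau):
    """Quantile check function: u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def ald_logpdf(y, mu, sigma, tau):
    """Log-density of the asymmetric Laplace distribution with
    location mu, scale sigma > 0 and quantile level 0 < tau < 1."""
    return math.log(tau * (1.0 - tau) / sigma) - check_loss((y - mu) / sigma, tau)

def best_constant(ys, tau):
    """Sample value minimizing the summed check loss, i.e. an
    empirical tau-quantile of ys (intercept-only illustration)."""
    return min(ys, key=lambda c: sum(check_loss(y - c, tau) for y in ys))
```

For `tau = 0.5` the check loss is half the absolute error, so `best_constant` returns a sample median.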