Fastseg implements a very fast and efficient segmentation algorithm. It can segment data from DNA microarrays and from next-generation sequencing, for example to detect copy number segments. It can also segment data from RNA microarrays such as tiling arrays to identify transcripts. More generally, it can segment any data given as a matrix or a vector. Various data formats can be used as input to fastseg, such as expression set objects for microarrays or GRanges for sequencing data.
This package provides a flexible approach to Bayesian optimization / model based optimization building on the bbotk package. mlr3mbo is a toolbox providing both ready-to-use optimization algorithms and their fundamental building blocks, allowing for straightforward implementation of custom algorithms. Single- and multi-objective optimization is supported, as are mixed continuous, categorical and conditional search spaces. Moreover, using mlr3mbo for hyperparameter optimization of machine learning models within the mlr3 ecosystem is straightforward via mlr3tuning.
The provided benchmark suite enables the automated, systematic evaluation and comparison of existing and novel indirect methods for reference interval ('RI') estimation. Indirect methods take routine measurements of diagnostic tests, containing pathological and non-pathological samples, as input and use sophisticated statistical methods to derive a model describing the distribution of the non-pathological samples, which can then be used to derive reference intervals. The benchmark suite contains 5,760 simulated test sets of varying difficulty. To include an indirect method, a custom wrapper function needs to be provided. The package offers functions for generating the test sets, executing the indirect method and evaluating the results. See ?RIbench or vignette("RIbench_package") for a more comprehensive description of the features. A detailed description and an application are given in Ammer T., Schuetzenmeister A., Prokosch H.-U., Zierk J., Rank C.M., Rauh M. "RIbench: A Proposed Benchmark for the Standardized Evaluation of Indirect Methods for Reference Interval Estimation". Clinical Chemistry (2022) <doi:10.1093/clinchem/hvac142>.
Obtain coordinate system metadata from various data formats. There are functions to extract a CRS (coordinate reference system, <https://en.wikipedia.org/wiki/Spatial_reference_system>) in EPSG (European Petroleum Survey Group, <http://www.epsg.org/>), PROJ4 <https://proj.org/>, or WKT2 (Well-Known Text 2, <http://docs.opengeospatial.org/is/12-063r5/12-063r5.html>) forms. This is purely for getting simple metadata from in-memory formats; please use other tools for out-of-memory data sources.
This package contains an implementation of a confounding-robust independent component analysis (ICA) for noisy and grouped data. The main function coroICA() performs blind source separation by maximizing independence across sources and allows adjusting for varying confounding based on user-specified groups. Additionally, the package contains the function uwedge(), which can be used to approximately jointly diagonalize a list of matrices. For more details see the project website <https://sweichwald.de/coroICA/>.
Comprehensive toolkit for addressing selection bias in binary disease models across diverse non-probability samples, each with unique selection mechanisms. It utilizes Inverse Probability Weighting (IPW) and Augmented Inverse Probability Weighting (AIPW) methods to reduce selection bias effectively in multiple non-probability cohorts by integrating data from either individual-level or summary-level external sources. The package also provides a variety of variance estimation techniques. Please refer to Kundu et al. <doi:10.48550/arXiv.2412.00228>.
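The core idea behind inverse probability weighting can be illustrated with a minimal sketch. It is not the package's interface: the function name `ipw_prevalence` and the assumption that each sampled unit's selection probability is already known are purely illustrative, and the sketch estimates a simple weighted prevalence rather than fitting a disease model.

```python
def ipw_prevalence(outcomes, selection_probs):
    """Weighted prevalence of a binary outcome (illustrative Hajek-type estimator).

    outcomes:        0/1 disease indicators for the sampled units
    selection_probs: assumed known probability that each unit was selected
    """
    if len(outcomes) != len(selection_probs):
        raise ValueError("outcomes and selection_probs must have equal length")
    if any(p <= 0.0 or p > 1.0 for p in selection_probs):
        raise ValueError("selection probabilities must lie in (0, 1]")
    # Units that were unlikely to be selected stand in for more of the population.
    weights = [1.0 / p for p in selection_probs]
    weighted_cases = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_cases / sum(weights)
```

When cases are over-sampled (higher selection probability for diseased units), the weighted estimate falls below the naive sample proportion, which is the direction of correction one would expect.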
Estimates RxC (R by C) vote transfer matrices (ecological contingency tables) from aggregate data, building on the Thomsen (1987) and Park (2008) approaches. References: Park, W.-H. (2008). "Ecological Inference and Aggregate Analysis of Election". PhD Dissertation. University of Michigan. <https://deepblue.lib.umich.edu/bitstream/handle/2027.42/58525/wpark_1.pdf> Thomsen, S.R. (1987, ISBN:87-7335-037-2). "Danish Elections 1920-79: a Logit Approach to Ecological Analysis and Inference". Politica, Aarhus, Denmark.
This package provides a general estimation framework for multi-state Markov processes with flexible specification of the transition intensities. The log-transition intensities can be specified through Generalised Additive Models which allow for virtually any type of covariate effect. Elementary specifications such as time-homogeneous processes and simple parametric forms are also supported. There are no limitations on the type of process one can assume, with both forward and backward transitions allowed and virtually any number of states.
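For the simplest case mentioned above, a time-homogeneous process, transition probabilities follow directly from constant transition intensities. The sketch below covers only a two-state process with both forward and backward transitions, where a closed form exists. The function name and arguments are illustrative assumptions, not part of the package.

```python
import math

def two_state_transition_matrix(q12, q21, t):
    """Transition probabilities P(t) of a two-state time-homogeneous Markov process.

    q12, q21: constant transition intensities (1 -> 2 and 2 -> 1), not both zero
    t:        elapsed time (non-negative)
    Returns a 2x2 nested list whose entry [r][s] is P(state s at time t | state r at 0).
    """
    total = q12 + q21
    if q12 < 0 or q21 < 0 or total <= 0 or t < 0:
        raise ValueError("need q12, q21 >= 0 with q12 + q21 > 0, and t >= 0")
    decay = math.exp(-total * t)          # how much of the initial state is remembered
    p12 = q12 / total * (1.0 - decay)     # probability of being in state 2, starting in 1
    p21 = q21 / total * (1.0 - decay)     # probability of being in state 1, starting in 2
    return [[1.0 - p12, p12], [p21, 1.0 - p21]]
```

As t grows, both rows approach the stationary distribution (q21 / (q12 + q21), q12 / (q12 + q21)). With covariate-dependent or time-varying intensities, no such closed form is generally available.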
An R package that allows for combining tree-boosting with Gaussian process and mixed effects models. It also allows for independently doing tree-boosting as well as inference and prediction for Gaussian process and mixed effects models. See <https://github.com/fabsig/GPBoost> for more information on the software and Sigrist (2022, JMLR) <https://www.jmlr.org/papers/v23/20-322.html> and Sigrist (2023, TPAMI) <doi:10.1109/TPAMI.2022.3168152> for more information on the methodology.
This package provides convenient access to the official spatial datasets of Peru as sf objects in R. It includes a wide range of geospatial data covering various aspects of Peruvian geography, such as administrative divisions (Source: INEI <https://ide.inei.gob.pe/>) and protected natural areas (Source: GEO ANP - SERNANP <https://geo.sernanp.gob.pe/visorsernanp/>). All datasets are harmonized in terms of attributes, projection, and topology, ensuring consistency and ease of use for spatial analysis and visualization.
This package provides tools for processing and analyzing .har and .sl4 files, making it easier for GEMPACK users and GTAP researchers to handle large economic datasets. It simplifies the management of multiple experiment results, enabling faster and more efficient comparisons without complexity. Users can extract, restructure, and merge data seamlessly, ensuring compatibility across different tools. The processed data can be exported and used in R, Stata, Python, Julia, or any software that supports Text, CSV, or Excel formats.
This package provides a framework for clustering longitudinal datasets in a standardized way. The package provides an interface to existing R packages for clustering longitudinal univariate trajectories, facilitating reproducible and transparent analyses. Additionally, standard tools are provided to support cluster analyses, including repeated estimation, model validation, and model assessment. The interface enables users to compare results between methods, and to implement and evaluate new methods with ease. The akmedoids package is available from <https://github.com/MAnalytics/akmedoids>.
Assessment and diagnostics for comparing competing clustering solutions, using predictive models. The main intended use is for comparing clustering/classification solutions of ecological data (e.g. presence/absence, counts, ordinal scores) to 1) find an optimal partitioning solution, 2) identify characteristic species and 3) refine a classification by merging clusters that increase predictive performance. However, in a more general sense, this package can do the above for any set of clustering solutions for i observations of j variables.
Spatial model calculation for static and dynamic panel data models, weights matrix creation and Bayesian model comparison. Bayesian model comparison methods were described by LeSage (2014) <doi:10.1016/j.spasta.2014.02.002>. The Lee-Yu transformation approach is described in Yu, De Jong and Lee (2008) <doi:10.1016/j.jeconom.2008.08.002>, Lee and Yu (2010) <doi:10.1016/j.jeconom.2009.08.001> and Lee and Yu (2010) <doi:10.1017/S0266466609100099>.
Implementation of all possible forms of 2x2 and 3x3 space-filling curves, i.e., the generalized forms of the Hilbert curve <https://en.wikipedia.org/wiki/Hilbert_curve>, the Peano curve <https://en.wikipedia.org/wiki/Peano_curve> and the Peano curve in the meander type (Figure 5 in <https://eudml.org/doc/141086>). It can generate nxn curves expanded from any specific level-1 unit. It also implements the H-curve and the three-dimensional Hilbert curve.
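As a point of reference for what such a curve looks like in code, here is a minimal sketch of the standard two-dimensional Hilbert curve only, using the well-known index-to-coordinate conversion. It does not cover the generalized forms, the Peano variants, or the package's own interface. The function name `hilbert_d2xy` is an illustrative choice.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    if not 0 <= d < n * n:
        raise ValueError("d must lie in [0, 4**order)")
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)          # which half in x at this level
        ry = 1 & (t ^ rx)          # which half in y at this level
        if ry == 0:
            if rx == 1:            # flip the sub-square before transposing
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x            # transpose to orient the sub-curve
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# Level-1 unit: the four cells of a 2x2 grid in visiting order.
level1 = [hilbert_d2xy(1, d) for d in range(4)]   # [(0, 0), (0, 1), (1, 1), (1, 0)]
```

Each higher level replaces every cell with a rotated or reflected copy of the level-1 unit, so consecutive positions always land on neighbouring grid cells.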
This package provides a collection of tools to access prepared air quality monitoring data files from web servers with ease and speed. Air quality data are sourced from open and publicly accessible repositories and can be found in these locations: <https://www.eea.europa.eu/data-and-maps/data/airbase-the-european-air-quality-database-8> and <https://discomap.eea.europa.eu/map/fme/AirQualityExport.htm>. The web server space has been provided by Ricardo Energy & Environment.
Generates binary test data based on Item Response Theory using the two-parameter logistic model (Lord, 1980 <doi:10.4324/9780203056615>). Useful functions for test equating are included, e.g. functions for generating internal and external common items between test forms and a function to create a linkage plan between those forms. Ancillary functions for generating true item and person parameters as well as for calculating the probability of a person correctly answering an item are also included.
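The two-parameter logistic model can be sketched in a few lines. The function names below are illustrative and not the package's API, and the sketch assumes the plain logistic form without the optional scaling constant (often written D = 1.7) that some formulations include.

```python
import math
import random

def p_correct(theta, a, b):
    """2PL probability that a person with ability theta answers an item correctly.

    a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def simulate_responses(thetas, items, seed=0):
    """Draw a persons-by-items matrix of 0/1 responses.

    thetas: list of person abilities
    items:  list of (a, b) tuples, one per item
    """
    rng = random.Random(seed)  # seeded for reproducible test data
    return [
        [1 if rng.random() < p_correct(theta, a, b) else 0 for (a, b) in items]
        for theta in thetas
    ]
```

A person whose ability equals the item difficulty has a 0.5 chance of answering correctly. Larger discrimination values make that probability change more steeply around the difficulty.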
Building modeling packages is hard. A large amount of effort generally goes into providing an implementation for a new method that is efficient, fast, and correct, but often less emphasis is put on the user interface. A good interface requires specialized knowledge about S3 methods and formulas, which the average package developer might not have. The goal of hardhat is to reduce the burden around building new modeling packages by providing functionality for preprocessing, predicting, and validating input.
This package provides an implementation of the reversed graph embedding (RGE) framework, which projects data into a reduced dimensional space while simultaneously constructing a principal tree that passes through the middle of the data. DDRTree shows superiority to alternatives (Wishbone, DPT) for inferring the ordering as well as the intrinsic structure of single cell genomics data. In general, it could be used to reconstruct the temporal progression as well as the bifurcation structure of any data type.
RewriteFS is a FUSE to change the name of accessed files on the fly based on any number of regular expressions. It's like the rewrite action of many Web servers, but for your file system. For example, it can help keep your home directory tidy by transparently rewriting the location of configuration files of software that doesn't follow the XDG directory specification from ~/.name to ~/.config/name.
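The rewriting step itself can be sketched independently of FUSE. The snippet below is only an illustration of regex-based path rewriting, not RewriteFS's configuration format or code. The rule list, the application name `exampleapp`, and the function `rewrite_path` are all hypothetical.

```python
import re

# Hypothetical rule set: (compiled pattern, replacement) pairs, tried in order.
RULES = [
    # Move ~/.exampleapp and everything below it to ~/.config/exampleapp.
    (re.compile(r"^/home/([^/]+)/\.exampleapp(/|$)"),
     r"/home/\1/.config/exampleapp\2"),
]

def rewrite_path(path, rules=RULES):
    """Return the path after applying the first matching rule, or unchanged."""
    for pattern, replacement in rules:
        new_path, count = pattern.subn(replacement, path, count=1)
        if count:
            return new_path
    return path
```

With this rule, `/home/alice/.exampleapp/settings.ini` becomes `/home/alice/.config/exampleapp/settings.ini`, while unrelated paths such as `/home/alice/notes.txt` pass through untouched.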
The concept of reliable and clinically significant change (Jacobson & Truax, 1991) helps you answer the following questions for a sample with two measurements at different points in time (pre and post): Which proportion of my sample shows a difference between pre- and post-scores that, considering the reliability of the instrument, is probably not just due to chance? Which proportion of my sample changes not only in a statistically significant way (see question one) but also in a clinically significant way (e.g. a change from a test score regarded as "dysfunctional" to a score regarded as "functional")? This package allows you to easily create a scatterplot of your sample in which the x-axis maps to the pre-scores, the y-axis maps to the post-scores, and several graphical elements (lines, colors) give you a quick overview of reliable changes in these scores. An example of this kind of plot is Figure 2 of Jacobson & Truax (1991). Referenced article: Jacobson, N. S., & Truax, P. (1991) <doi:10.1037/0022-006X.59.1.12>.
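The first question is usually answered with the reliable change index from Jacobson & Truax (1991): the pre-post difference divided by the standard error of the difference, which is derived from the instrument's reliability. A minimal sketch follows. The function names and the choice of the pre-test standard deviation as input are illustrative assumptions and do not reflect this package's interface.

```python
import math

def reliable_change_index(pre, post, sd_pre, reliability):
    """Reliable change index: (post - pre) / S_diff.

    sd_pre:      standard deviation of the pre-scores (assumed input)
    reliability: reliability of the instrument, in [0, 1)
    """
    if not 0.0 <= reliability < 1.0 or sd_pre <= 0.0:
        raise ValueError("need sd_pre > 0 and 0 <= reliability < 1")
    se_measurement = sd_pre * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0 * se_measurement ** 2)  # standard error of a difference score
    return (post - pre) / s_diff

def is_reliable_change(pre, post, sd_pre, reliability, criterion=1.96):
    """True if the change is unlikely (p < .05 at the default criterion) to be measurement error."""
    return abs(reliable_change_index(pre, post, sd_pre, reliability)) > criterion
```

Whether a reliable change is also clinically significant is a separate check against a cutoff between the "dysfunctional" and "functional" score ranges, which this sketch does not attempt.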
This package provides a collection of functions to test spatial autocorrelation between variables, including Moran I, Geary C and Getis G, together with scatter plots; functions for mapping and identifying clusters and outliers; functions associated with the moments of these statistics, which allow testing whether there is bivariate spatial autocorrelation; and a function for identifying and visualizing on the map the neighbors of any region once the scheme of the spatial weights matrix has been established.
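To make the role of the spatial weights matrix concrete, here is a minimal sketch of the classical univariate Moran's I statistic. It is not the package's implementation and does not cover the bivariate variants. The function name and the plain nested-list weights matrix are illustrative assumptions.

```python
def morans_i(values, weights):
    """Univariate Moran's I for values observed in n regions.

    weights: n x n nested list; weights[i][j] > 0 if regions i and j are neighbours.
    """
    n = len(values)
    if any(len(row) != n for row in weights) or len(weights) != n:
        raise ValueError("weights must be an n x n matrix")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)              # total weight
    cross = sum(weights[i][j] * dev[i] * dev[j]        # neighbour co-variation
                for i in range(n) for j in range(n))
    spread = sum(d * d for d in dev)                   # overall variation
    return (n / s0) * cross / spread

# Four regions in a row; each region neighbours the next one.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
```

With `W` as above, a smooth gradient such as `[1, 2, 3, 4]` yields a positive value (similar neighbours), while an alternating pattern such as `[1, 4, 1, 4]` yields a negative one (dissimilar neighbours).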
Time series analysis of network connectivity. Detects and visualizes change points between networks. Methods included in the package are discussed in depth in Baek, C., Gates, K. M., Leinwand, B., Pipiras, V. (2021) "Two sample tests for high-dimensional auto-covariances" <doi:10.1016/j.csda.2020.107067> and Baek, C., Gampe, M., Leinwand B., Lindquist K., Hopfinger J. and Gates K. (2023) "Detecting functional connectivity changes in fMRI data" <doi:10.1007/s11336-023-09908-7>.
Interactive R tutorials and datasets for the textbook Field (2026), "Discovering Statistics Using R and RStudio", <https://www.discovr.rocks/>. Interactive tutorials cover general workflow in R and RStudio, summarizing data, visualizing data, fitting models and bias, correlation, the general linear model (GLM), moderation, mediation, missing values, comparing means using the GLM (analysis of variance), comparing adjusted means (analysis of covariance), factorial designs, repeated measures designs, and exploratory factor analysis (EFA). There are no functions, only datasets and interactive tutorials.