Manager of tick-by-tick transaction data that performs cleaning, aggregation and import efficiently. The package engine, written in C++, uses the zlib and gzstream libraries to handle gzipped data without the need to decompress them. Cleaning and aggregation are performed according to Brownlees and Gallo (2006) <DOI:10.1016/j.csda.2006.09.030>. Currently, TAQMNGR processes raw data from WRDS (Wharton Research Data Services, <https://wrds-web.wharton.upenn.edu/wrds/>).
Offers a wide range of functions for reading and writing data in various file formats, including CSV, RDS, Excel and ZIP files. Additionally, it provides functions for retrieving metadata associated with files, such as file size and creation date, making it easy to manage and organize large data sets. This package is designed to simplify data import and export tasks, and provide users with a comprehensive set of tools to work with different types of data files.
The provided benchmark suite enables the automated, systematic evaluation and comparison of any existing or novel indirect method for reference interval ('RI') estimation. Indirect methods take routine measurements of diagnostic tests, containing both pathological and non-pathological samples, as input and use sophisticated statistical methods to derive a model describing the distribution of the non-pathological samples, which can then be used to derive reference intervals. The benchmark suite contains 5,760 simulated test sets of varying difficulty. To include any indirect method, a custom wrapper function needs to be provided. The package offers functions for generating the test sets, executing the indirect method and evaluating the results. See ?RIbench or vignette("RIbench_package") for a more comprehensive description of the features. A detailed description and application are given in Ammer T., Schuetzenmeister A., Prokosch H.-U., Zierk J., Rank C.M., Rauh M. "RIbench: A Proposed Benchmark for the Standardized Evaluation of Indirect Methods for Reference Interval Estimation". Clinical Chemistry (2022) <doi:10.1093/clinchem/hvac142>.
This package provides an implementation of the reversed graph embedding (RGE) framework, which projects data into a reduced-dimensional space while simultaneously constructing a principal tree that passes through the middle of the data. DDRTree outperforms alternatives (Wishbone, DPT) in inferring both the ordering and the intrinsic structure of single-cell genomics data. More generally, it can be used to reconstruct the temporal progression and the bifurcation structure of any data type.
Building modeling packages is hard. A large amount of effort generally goes into providing an implementation for a new method that is efficient, fast, and correct, but often less emphasis is put on the user interface. A good interface requires specialized knowledge about S3 methods and formulas, which the average package developer might not have. The goal of hardhat is to reduce the burden around building new modeling packages by providing functionality for preprocessing, predicting, and validating input.
RewriteFS is a FUSE file system that changes the names of accessed files on the fly based on any number of regular expressions. It is like the rewrite action of many web servers, but for your file system. For example, it can help keep your home directory tidy by transparently rewriting the location of configuration files of software that does not follow the XDG Base Directory Specification from ~/.name to ~/.config/name.
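The ~/.name to ~/.config/name example can be illustrated with a small regex-based path mapping. This is a minimal Python sketch, not RewriteFS's configuration format or code: the rule list, the `rewrite_path` function and the assumption that home directories live under /home are all hypothetical. The negative lookahead keeps paths already under ~/.config from being rewritten a second time.

```python
import re

# Hypothetical rule list of (compiled pattern, replacement) pairs, applied in order.
# RewriteFS's real rule syntax may differ; this only illustrates the idea.
RULES = [
    # /home/<user>/.<name>[/rest] -> /home/<user>/.config/<name>[/rest],
    # skipping anything that is already inside ~/.config.
    (re.compile(r"^/home/([^/]+)/\.(?!config(?:/|$))([^/]+)(/.*)?$"),
     r"/home/\1/.config/\2\3"),
]


def rewrite_path(path: str) -> str:
    """Return the path produced by the first matching rule, or the path unchanged."""
    for pattern, replacement in RULES:
        if pattern.match(path):
            # Since Python 3.5, an unmatched optional group (\3) expands to "".
            return pattern.sub(replacement, path, count=1)
    return path


if __name__ == "__main__":
    print(rewrite_path("/home/alice/.vimrc"))       # /home/alice/.config/vimrc
    print(rewrite_path("/home/alice/.foo/bar"))     # /home/alice/.config/foo/bar
    print(rewrite_path("/home/alice/notes.txt"))    # unchanged
```

Rules are tried in order and only the first match is applied, which keeps the mapping predictable when several expressions could match the same path.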
Obtain coordinate system metadata from various data formats. There are functions to extract a CRS (coordinate reference system, <https://en.wikipedia.org/wiki/Spatial_reference_system>) in EPSG (European Petroleum Survey Group, <http://www.epsg.org/>), PROJ4 <https://proj.org/>, or WKT2 (Well-Known Text 2, <http://docs.opengeospatial.org/is/12-063r5/12-063r5.html>) form. This is purely for getting simple metadata from in-memory formats; please use other tools for out-of-memory data sources.
This package contains an implementation of a confounding-robust independent component analysis (ICA) for noisy and grouped data. The main function coroICA() performs blind source separation by maximizing independence across sources, and allows adjustment for varying confounding based on user-specified groups. Additionally, the package contains the function uwedge(), which can be used to approximately jointly diagonalize a list of matrices. For more details see the project website <https://sweichwald.de/coroICA/>.
Comprehensive toolkit for addressing selection bias in binary disease models across diverse non-probability samples, each with unique selection mechanisms. It utilizes Inverse Probability Weighting (IPW) and Augmented Inverse Probability Weighting (AIPW) methods to reduce selection bias effectively in multiple non-probability cohorts by integrating data from either individual-level or summary-level external sources. The package also provides a variety of variance estimation techniques. Please refer to Kundu et al. <doi:10.48550/arXiv.2412.00228>.
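The core idea behind IPW is to weight each sampled individual by the inverse of its probability of being selected into the cohort. The Python sketch below applies that idea to the simplest possible target, a disease prevalence, rather than to the binary disease models the package fits. It assumes the selection probabilities are already known; in practice they would be estimated from the external individual-level or summary-level data. The function name `ipw_mean` is hypothetical and not part of the package.

```python
def ipw_mean(outcomes, selection_probs):
    """Hajek-style IPW estimate of a population mean (e.g. prevalence of a binary outcome).

    outcomes        -- 0/1 disease indicators for the sampled individuals
    selection_probs -- assumed-known probability that each individual was selected
    """
    if len(outcomes) != len(selection_probs):
        raise ValueError("outcomes and selection_probs must have the same length")
    if any(not 0.0 < p <= 1.0 for p in selection_probs):
        raise ValueError("selection probabilities must lie in (0, 1]")
    # Individuals who were unlikely to be selected stand in for many unsampled people.
    weights = [1.0 / p for p in selection_probs]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


if __name__ == "__main__":
    # Cases were over-sampled (selection probability 0.8 vs 0.2 for non-cases).
    y = [1, 1, 0, 0]
    p = [0.8, 0.8, 0.2, 0.2]
    print(sum(y) / len(y))   # naive estimate: 0.5
    print(ipw_mean(y, p))    # IPW estimate: approximately 0.2
```

Normalizing by the sum of the weights (the Hajek form) keeps the estimate within [0, 1] even when the estimated probabilities are noisy; AIPW additionally combines this weighting with an outcome model.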
This package provides a general estimation framework for multi-state Markov processes with flexible specification of the transition intensities. The log-transition intensities can be specified through Generalised Additive Models which allow for virtually any type of covariate effect. Elementary specifications such as time-homogeneous processes and simple parametric forms are also supported. There are no limitations on the type of process one can assume, with both forward and backward transitions allowed and virtually any number of states.
This package provides convenient access to the official spatial datasets of Peru as sf objects in R. It includes a wide range of geospatial data covering various aspects of Peruvian geography, such as administrative divisions (Source: INEI <https://ide.inei.gob.pe/>) and protected natural areas (Source: GEO ANP - SERNANP <https://geo.sernanp.gob.pe/visorsernanp/>). All datasets are harmonized in terms of attributes, projection, and topology, ensuring consistency and ease of use for spatial analysis and visualization.
An R package that allows for combining tree-boosting with Gaussian process and mixed effects models. It also allows for independently doing tree-boosting as well as inference and prediction for Gaussian process and mixed effects models. See <https://github.com/fabsig/GPBoost> for more information on the software and Sigrist (2022, JMLR) <https://www.jmlr.org/papers/v23/20-322.html> and Sigrist (2023, TPAMI) <doi:10.1109/TPAMI.2022.3168152> for more information on the methodology.
This package provides tools for processing and analyzing .har and .sl4 files, making it easier for GEMPACK users and GTAP researchers to handle large economic datasets. It simplifies the management of multiple experiment results, enabling faster and more efficient comparisons. Users can extract, restructure, and merge data, ensuring compatibility across different tools. The processed data can be exported and used in R, Stata, Python, Julia, or any software that supports text, CSV, or Excel formats.
This package provides a framework for clustering longitudinal datasets in a standardized way. The package provides an interface to existing R packages for clustering longitudinal univariate trajectories, facilitating reproducible and transparent analyses. Additionally, standard tools are provided to support cluster analyses, including repeated estimation, model validation, and model assessment. The interface enables users to compare results between methods, and to implement and evaluate new methods with ease. The akmedoids package is available from <https://github.com/MAnalytics/akmedoids>.
Assessment and diagnostics for comparing competing clustering solutions, using predictive models. The main intended use is for comparing clustering/classification solutions of ecological data (e.g. presence/absence, counts, ordinal scores) to 1) find an optimal partitioning solution, 2) identify characteristic species and 3) refine a classification by merging clusters that increase predictive performance. However, in a more general sense, this package can do the above for any set of clustering solutions for i observations of j variables.
Generates binary test data based on Item Response Theory using the two-parameter logistic model (Lord, 1980 <doi:10.4324/9780203056615>). Useful functions for test equating are included, e.g. functions for generating internal and external common items between test forms and a function to create a linkage plan between those forms. Ancillary functions for generating true item and person parameters, as well as for calculating the probability of a person correctly answering an item, are also included.
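Under the two-parameter logistic model, the probability that a person with ability theta answers an item with discrimination a and difficulty b correctly is 1 / (1 + exp(-D * a * (theta - b))), where D is an optional scaling constant (often 1, or 1.7 to approximate the normal ogive). The Python sketch below shows that formula and how binary responses can be drawn from it; the function names are hypothetical and do not reflect the package's own interface.

```python
import math
import random


def p_correct(theta, a, b, D=1.0):
    """2PL probability that a person of ability theta answers item (a, b) correctly."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def simulate_responses(thetas, items, seed=None):
    """Draw a persons-by-items matrix of 0/1 responses under the 2PL model.

    thetas -- list of person abilities
    items  -- list of (a, b) pairs: discrimination and difficulty
    """
    rng = random.Random(seed)  # seeded generator for reproducible test data
    return [
        [1 if rng.random() < p_correct(theta, a, b) else 0 for a, b in items]
        for theta in thetas
    ]


if __name__ == "__main__":
    print(p_correct(0.0, 1.2, 0.0))   # 0.5: ability equals item difficulty
    data = simulate_responses([-1.0, 0.0, 1.0], [(1.0, 0.0), (1.5, -0.5)], seed=1)
    print(data)
```

When theta equals b the probability is exactly 0.5 regardless of a; larger values of a make the curve steeper around that point.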
This package provides a collection of tools to access prepared air quality monitoring data files from web servers with ease and speed. Air quality data are sourced from open and publicly accessible repositories and can be found in these locations: <https://www.eea.europa.eu/data-and-maps/data/airbase-the-european-air-quality-database-8> and <https://discomap.eea.europa.eu/map/fme/AirQualityExport.htm>. The web server space has been provided by Ricardo Energy & Environment.
Spatial model calculation for static and dynamic panel data models, weights matrix creation and Bayesian model comparison. Bayesian model comparison methods were described by LeSage (2014) <doi:10.1016/j.spasta.2014.02.002>. The Lee-Yu transformation approach is described in Yu, De Jong and Lee (2008) <doi:10.1016/j.jeconom.2008.08.002>, Lee and Yu (2010) <doi:10.1016/j.jeconom.2009.08.001> and Lee and Yu (2010) <doi:10.1017/S0266466609100099>.
Calculate solar potential for LiDAR point clouds using the VOSTOK (Voxel Octree Solar Toolkit) algorithm. This R program provides an interface to the original VOSTOK C++ implementation by Bechtold and Hofle (2020), enabling efficient ray casting and solar position algorithms to compute solar irradiance for each point while accounting for shadowing effects. Integrates seamlessly with the lidR package for LiDAR data processing workflows. The original VOSTOK toolkit is available at <doi:10.11588/data/QNA02B>.
The concept of reliable and clinically significant change (Jacobson & Truax, 1991) helps you answer the following questions for a sample with two measurements at different points in time (pre and post): Which proportion of my sample shows a difference between pre- and post-scores that is probably not due to chance alone, given the reliability of the instrument? Which proportion of my sample changes not only in a statistically significant way (see the first question), but also in a clinically significant way (e.g. a change from a test score regarded as "dysfunctional" to a score regarded as "functional")? This package allows you to easily create a scatterplot of your sample in which the x-axis maps to the pre-scores, the y-axis maps to the post-scores, and several graphical elements (lines, colors) give a quick overview of reliable changes in these scores. An example of this kind of plot is Figure 2 of Jacobson & Truax (1991). Referenced article: Jacobson, N. S., & Truax, P. (1991) <doi:10.1037/0022-006X.59.1.12>.
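The first question is usually answered with the Jacobson-Truax reliable change index: RCI = (post - pre) / S_diff, where S_diff = sqrt(2 * SE^2) and SE = SD_pre * sqrt(1 - r_xx). A change counts as reliable when |RCI| exceeds 1.96. The Python sketch below combines that index with a user-supplied clinical cutoff to classify one person. The function names, the `higher_is_better` flag and the example values are illustrative assumptions; the package's own functions and cutoff calculations are not shown here.

```python
import math


def reliable_change_index(pre, post, sd_pre, reliability):
    """Jacobson-Truax RCI for one person's pre/post scores."""
    se = sd_pre * math.sqrt(1.0 - reliability)  # standard error of measurement
    s_diff = math.sqrt(2.0 * se ** 2)           # standard error of the difference
    return (post - pre) / s_diff


def classify_change(pre, post, sd_pre, reliability, cutoff,
                    higher_is_better=False, z_crit=1.96):
    """Classify change as 'recovered', 'improved', 'unchanged' or 'deteriorated'.

    cutoff -- assumed score separating the dysfunctional and functional ranges
    """
    rci = reliable_change_index(pre, post, sd_pre, reliability)
    if abs(rci) <= z_crit:
        return "unchanged"
    # A reliable change is an improvement if it goes in the "better" direction.
    improved = (rci > 0) == higher_is_better
    if not improved:
        return "deteriorated"

    def functional(score):
        return score > cutoff if higher_is_better else score < cutoff

    # Clinically significant: the person also crossed from dysfunctional to functional.
    if functional(post) and not functional(pre):
        return "recovered"
    return "improved"


if __name__ == "__main__":
    # Lower scores are better here; SD = 7.5, reliability = 0.88, cutoff = 20.
    print(classify_change(30, 15, 7.5, 0.88, cutoff=20))  # recovered
    print(classify_change(30, 22, 7.5, 0.88, cutoff=20))  # improved
    print(classify_change(30, 28, 7.5, 0.88, cutoff=20))  # unchanged
```

In the plot described above, the RCI threshold corresponds to the band around the diagonal and the clinical cutoff to the horizontal and vertical lines.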
Guile RDF is an implementation of the RDF (Resource Description Framework) format defined by the W3C for GNU Guile. RDF structures include triples (facts with a subject, a predicate and an object), graphs which are sets of triples, and datasets, which are collections of graphs.
RDF specifications include the specification of concrete syntaxes and of operations on graphs. This library implements some basic functionality, such as parsing and producing Turtle and N-Quads syntax, as well as manipulating graphs and datasets.
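The RDF structures described above (triples, graphs as sets of triples, and datasets as collections of graphs) map naturally onto a few plain data types. The Python sketch below models them and writes a dataset in an N-Quads-like form. It is only an illustration, not Guile RDF's Scheme API: the names are hypothetical, every term is assumed to be an IRI, and literals and blank nodes are not handled.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # "object" in RDF terminology


Graph = Set[Triple]                    # a graph is a set of triples
Dataset = Dict[Optional[str], Graph]   # graph name -> graph; None = default graph


def match(graph: Graph, s=None, p=None, o=None) -> Graph:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    }


def to_nquads(dataset: Dataset) -> str:
    """Serialize a dataset of IRI-only triples, one statement per line."""
    lines: List[str] = []
    # Default graph first, then named graphs in sorted order, for stable output.
    for name in sorted(dataset, key=lambda g: (g is not None, g or "")):
        for t in sorted(dataset[name]):
            terms = [t.subject, t.predicate, t.obj] + ([name] if name is not None else [])
            lines.append(" ".join(f"<{term}>" for term in terms) + " .")
    return "\n".join(lines)


if __name__ == "__main__":
    ds: Dataset = {
        None: {Triple("http://ex.org/a", "http://ex.org/knows", "http://ex.org/b")},
        "http://ex.org/g1": {Triple("http://ex.org/b", "http://ex.org/knows", "http://ex.org/c")},
    }
    print(to_nquads(ds))
```

Statements in the default graph carry three terms, while statements in a named graph carry the graph name as a fourth term, which is the main difference between N-Triples-style and N-Quads output.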
This package provides whole-genome mappability tracks for the human hg19/hg38 assemblies. We employed the 100-mer mappability track from the ENCODE Project and computed a weighted average of the mappability scores when multiple ENCODE regions overlap the same bin. “Blacklist” bins, including segmental duplication regions and gaps in the reference assembly from telomere, centromere, and/or heterochromatin regions, are included. The dataset consists of three assembled .bam files of single-cell whole-genome sequencing data from 10x Genomics, for illustration purposes.
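The per-bin weighted average can be sketched as follows. The description does not say what the weights are, so this Python sketch assumes each ENCODE region is weighted by the number of bases it overlaps the bin; the function name and the half-open interval convention are also assumptions.

```python
def bin_mappability(bin_start, bin_end, regions):
    """Overlap-weighted mean mappability for one bin.

    bin_start, bin_end -- half-open bin coordinates [bin_start, bin_end)
    regions            -- iterable of (start, end, score) tuples, also half-open
    Returns None if no region overlaps the bin.
    """
    weighted_sum = 0.0
    total_overlap = 0
    for start, end, score in regions:
        overlap = min(end, bin_end) - max(start, bin_start)  # bases shared with the bin
        if overlap > 0:
            weighted_sum += overlap * score
            total_overlap += overlap
    return weighted_sum / total_overlap if total_overlap > 0 else None


if __name__ == "__main__":
    # 60 bases with score 1.0 and 40 bases with score 0.5 -> (60 + 20) / 100 = 0.8
    print(bin_mappability(0, 100, [(0, 60, 1.0), (60, 100, 0.5)]))
```

Dividing by the total covered length rather than the bin width means that uncovered bases do not pull the average down; other weighting choices are equally plausible.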
This package provides a collection of functions to test spatial autocorrelation between variables, including Moran's I, Geary's C and Getis's G together with scatter plots; functions for mapping and identifying clusters and outliers; functions associated with the moments of these statistics that allow testing for bivariate spatial autocorrelation; and a function that identifies and visualizes on the map the neighbors of any region once the spatial weights matrix scheme has been established.
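For reference, the global (univariate) Moran's I for values x and spatial weights w_ij is I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z_i = x_i - mean(x) and S0 is the sum of all weights. The Python sketch below computes that textbook statistic; it does not reproduce the package's functions, its bivariate variants or its moment-based tests, and `morans_i` is a hypothetical name.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n spatial weights matrix."""
    n = len(values)
    if n < 2 or len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("need n >= 2 values and an n x n weights matrix")
    mean = sum(values) / n
    z = [v - mean for v in values]                  # deviations from the mean
    s0 = sum(sum(row) for row in weights)           # sum of all weights
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)


if __name__ == "__main__":
    # Four regions in a row; each region neighbors the adjacent ones (binary weights).
    w = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
    print(morans_i([1, 2, 3, 4], w))   # about 0.333: similar values cluster
    print(morans_i([1, 0, 1, 0], w))   # about -1.0: neighbors alternate
```

Positive values indicate that neighboring regions tend to have similar values, negative values that they tend to differ; significance testing relies on the moments of I under the null hypothesis.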
Time series analysis of network connectivity. Detects and visualizes change points between networks. Methods included in the package are discussed in depth in Baek, C., Gates, K. M., Leinwand, B., Pipiras, V. (2021) "Two sample tests for high-dimensional auto-covariances" <doi:10.1016/j.csda.2020.107067> and Baek, C., Gampe, M., Leinwand, B., Lindquist, K., Hopfinger, J. and Gates, K. (2023) "Detecting functional connectivity changes in fMRI data" <doi:10.1007/s11336-023-09908-7>.