In computationally demanding data analysis pipelines, the targets R package (2021, <doi:10.21105/joss.02959>) maintains an up-to-date set of results while skipping tasks that do not need to rerun. This process increases speed and increases trust in the final end product. However, it also overwrites old output with new output, and past results disappear by default. To preserve historical output, the gittargets package captures version-controlled snapshots of the data store, and each snapshot links to the underlying commit of the source code. That way, when the user rolls back the code to a previous branch or commit, gittargets can recover the data contemporaneous with that commit so that all targets remain up to date.
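The skip-if-unchanged idea described above can be sketched outside R. The following Python snippet is only a minimal illustration under stated assumptions, not how targets or gittargets work internally: the `run_if_outdated` helper and the in-memory `store` dictionary are hypothetical, and a task counts as up to date when the hash of its inputs matches the hash recorded on the previous run.

```python
import hashlib
import json

store = {}  # hypothetical in-memory data store: name -> (input hash, value)
calls = []  # records which tasks actually ran


def input_hash(inputs):
    """Hash the task inputs deterministically."""
    payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_if_outdated(name, func, inputs):
    """Rerun func only when its inputs changed since the last run."""
    digest = input_hash(inputs)
    cached = store.get(name)
    if cached is not None and cached[0] == digest:
        return cached[1]  # up to date: skip the task
    value = func(**inputs)
    calls.append(name)
    store[name] = (digest, value)  # the old output is overwritten
    return value


def total(values):
    return sum(values)


first = run_if_outdated("total", total, {"values": [1, 2, 3]})
second = run_if_outdated("total", total, {"values": [1, 2, 3]})  # skipped
third = run_if_outdated("total", total, {"values": [1, 2, 4]})   # reruns
```

Because the store keeps only the latest value per task, the earlier result is lost once the inputs change, which is the gap that versioned snapshots of the data store are meant to fill.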
This small package provides functions to perform a joint calibration of totals and quantiles. The calibration for totals is based on Deville and Särndal (1992) <doi:10.1080/01621459.1992.10475217>; the calibration for quantiles is based on Harms and Duchesne (2006) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X20060019255>. The package uses standard calibration via the survey, sampling or laeken packages. In addition, entropy balancing via the ebal package and empirical likelihood based on codes from Wu (2005) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2005002/article/9051-eng.pdf> can be used. See the paper by Beręsewicz and Szymkowiak (2023) for details <arXiv:2308.13281>.
We propose a pair of summary measures for the predictive power of a prediction function based on a regression model. The regression model can be linear or nonlinear, parametric, semi-parametric, or nonparametric, and correctly specified or mis-specified. The first measure, R-squared, is an extension of the classical R-squared statistic for a linear model, quantifying the prediction function's ability to capture the variability of the response. The second measure, L-squared, quantifies the prediction function's bias for predicting the mean regression function. When used together, they give a complete summary of the predictive power of a prediction function. Please refer to Gang Li and Xiaoyan Wang (2016) <arXiv:1611.03063> for more details.
It is designed to work with text written in Bahasa Malaysia. We provide functions and data sets that will make working with Bahasa Malaysia text much easier. For word stemming in particular, we will look up the Malay words in a dictionary and then proceed to remove "extra suffix" as explained in Khan, Rehman Ullah, Fitri Suraya Mohamad, Muh Inam UlHaq, Shahren Ahmad Zadi Adruce, Philip Nuli Anding, Sajjad Nawaz Khan, and Abdulrazak Yahya Saleh Al-Hababi (2017) <https://ijrest.net/vol-4-issue-12.html>. This package includes a dictionary of Malay words that may be used to perform word stemming, a dataset of Malay stop words, a dataset of sentiment words and a dataset of normalized words.
Datasets to accompany Harman, a PCA and constrained optimisation based technique. Contains three example datasets: IMR90, human lung fibroblast cells exposed to nitric oxide; NPM, an experiment to test skin penetration of metal oxide nanoparticles following topical application of sunscreens in non-pregnant mice; and OLF, an experiment to gauge the response of human olfactory neurosphere-derived (hONS) cells to ZnO nanoparticles. Since version 1.24, this package also contains the Infinium5 dataset, a set of batch correction adjustments across 5 Illumina Infinium Methylation BeadChip datasets. This file does not contain methylation data, but summary statistics of 5 datasets after correction. There is also an EpiSCOPE_sample file as an example for the new methylation clustering functionality in Harman.
This package implements comprehensive test data engineering methods as described in Shojima (2022, ISBN:978-9811699856). Provides statistical techniques for engineering and processing test data: Classical Test Theory (CTT) with reliability coefficients for continuous ability assessment; Item Response Theory (IRT) including Rasch, 2PL, and 3PL models with item/test information functions; Latent Class Analysis (LCA) for nominal clustering; Latent Rank Analysis (LRA) for ordinal clustering with automatic determination of cluster numbers; Biclustering methods including infinite relational models for simultaneous clustering of examinees and items without predefined cluster numbers; and Bayesian Network Models (BNM) for visualizing inter-item dependencies. Features local dependence analysis through LRA and biclustering, parameter estimation, dimensionality assessment, and network structure visualization for educational, psychological, and social science research.
When added to an existing shiny app, users may subset any developer-chosen R data.frame on the fly. That is, users are empowered to slice & dice data by applying multiple (order specific) filters using the AND (&) operator between each, and getting real-time updates on the number of rows affected/available along the way. Thus, any downstream processes that leverage this data source (like tables, plots, or statistical procedures) will re-render after new filters are applied. The shiny module's user interface has a minimalist aesthetic so that the focus can be on the data & other visuals. In addition to returning a reactive (filtered) data.frame, IDEAFilter also returns the dplyr filter statements used to actually slice the data.
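The ordered AND-filtering with running row counts can be sketched generically. This Python snippet is an illustration only and does not reflect the IDEAFilter API: the `apply_filters` helper and the example records are hypothetical, and a list of dictionaries stands in for a data.frame.

```python
def apply_filters(rows, filters):
    """Apply filters in order, combining them with AND.

    Returns the filtered rows and the row count after each step.
    """
    counts = [len(rows)]
    current = rows
    for predicate in filters:
        current = [row for row in current if predicate(row)]
        counts.append(len(current))
    return current, counts


# Hypothetical example data: a list of dicts stands in for a data.frame.
data = [
    {"species": "a", "mass": 10},
    {"species": "a", "mass": 25},
    {"species": "b", "mass": 30},
    {"species": "b", "mass": 5},
]
filters = [
    lambda row: row["mass"] > 8,
    lambda row: row["species"] == "a",
]
filtered, counts = apply_filters(data, filters)
```

Here `counts` holds the number of rows available before any filter and after each successive filter, which is the kind of feedback a user would see while narrowing the data.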
Manages, builds and computes statistics and datasets for the construction of quarterly (sub-annual) life tables by exploiting micro-data from either a general or an insured population. References: Pavía and Lledó (2022) <doi:10.1111/rssa.12769>. Pavía and Lledó (2023) <doi:10.1017/asb.2023.16>. Pavía and Lledó (2025) <doi:10.1371/journal.pone.0315937>. Acknowledgements: The authors wish to thank Conselleria de Educación, Universidades y Empleo, Generalitat Valenciana (grants AICO/2021/257; CIAICO/2024/031), Ministerio de Ciencia e Innovación (grant PID2021-128228NB-I00) and Fundación Mapfre (grant 'Modelización espacial e intra-anual de la mortalidad en España. Una herramienta automática para el calculo de productos de vida') for supporting this research.
Sample surveys use scientific methods to draw inferences about population parameters by observing a representative part of the population, called a sample. Simple Random Sampling Without Replacement (SRSWOR) is one of the most widely used probability sampling designs, wherein every unit has an equal chance of being selected and units are not repeated. This function draws multiple SRSWOR samples from a finite population and estimates the population parameter, i.e. the total, using the HT, ratio, and regression estimators. Repeated simulations (e.g., 500 times) are used to assess and compare the estimators using metrics such as percent relative bias (%RB) and percent relative root mean square error (%RRMSE). For details on sampling methodology, see Cochran (1977) "Sampling Techniques" <https://archive.org/details/samplingtechniqu0000coch_t4x6>.
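The simulation described above can be sketched in a few lines. This Python version is a hedged illustration, not the package's function: the name `srswor_simulation`, its arguments, and the synthetic population are assumptions, HT is taken here to mean the Horvitz-Thompson estimator (which under SRSWOR reduces to N times the sample mean), and the ratio and regression estimators use a single auxiliary variable x whose population total is known.

```python
import math
import random


def srswor_simulation(y, x, n, reps=500, seed=1):
    """Compare HT, ratio and regression estimators of the total of y."""
    rng = random.Random(seed)
    N = len(y)
    true_total = sum(y)
    x_total = sum(x)
    estimates = {"HT": [], "Ratio": [], "Regression": []}
    for _ in range(reps):
        idx = rng.sample(range(N), n)  # SRSWOR: no unit is repeated
        ys = [y[i] for i in idx]
        xs = [x[i] for i in idx]
        y_bar = sum(ys) / n
        x_bar = sum(xs) / n
        # Sample slope of y on x for the regression estimator
        s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
        s_xx = sum((a - x_bar) ** 2 for a in xs)
        slope = s_xy / s_xx
        estimates["HT"].append(N * y_bar)
        estimates["Ratio"].append(y_bar / x_bar * x_total)
        estimates["Regression"].append(
            N * (y_bar + slope * (x_total / N - x_bar))
        )
    summary = {}
    for name, values in estimates.items():
        mean_est = sum(values) / reps
        mse = sum((v - true_total) ** 2 for v in values) / reps
        summary[name] = {
            "RB": 100 * (mean_est - true_total) / true_total,
            "RRMSE": 100 * math.sqrt(mse) / true_total,
        }
    return summary


# Hypothetical finite population: y is roughly proportional to x.
pop_rng = random.Random(42)
x_pop = [pop_rng.uniform(10, 50) for _ in range(200)]
y_pop = [2 * xi + pop_rng.gauss(0, 3) for xi in x_pop]
result = srswor_simulation(y_pop, x_pop, n=30)
```

With a strongly correlated auxiliary variable, as in this synthetic population, the ratio and regression estimators would be expected to show a smaller %RRMSE than the HT estimator.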
The design of this package allows us to run different clustering packages and compare the results between them, to determine which algorithm behaves best for the data provided. See Martos, L.A.P., García-Vico, Á.M., González, P. et al. (2023) <doi:10.1007/s13748-022-00294-2> "Clustering: an R library to facilitate the analysis and comparison of cluster algorithms.", Martos, L.A.P., García-Vico, Á.M., González, P. et al. "A Multiclustering Evolutionary Hyperrectangle-Based Algorithm" <doi:10.1007/s44196-023-00341-3> and Martos, L.A.P., García-Vico, Á.M., González, P. et al. "An Evolutionary Fuzzy System for Multiclustering in Data Streaming" <doi:10.1016/j.procs.2023.12.058>.
This package provides functions to estimate the disparities across categories (e.g. Black and white) that persist if a treatment variable (e.g. college) is equalized. Makes estimates by treatment modeling, outcome modeling, and doubly-robust augmented inverse probability weighting estimation, with standard errors calculated by a nonparametric bootstrap. Cross-fitting is supported. Survey weights are supported for point estimation but not for standard error estimation; those applying this package with complex survey samples should consult the data distributor to select an appropriate approach for standard error construction, which may involve calling the functions repeatedly for many sets of replicate weights provided by the data distributor. The methods in this package are described in Lundberg (2021) <doi:10.31235/osf.io/gx4y3>.
This package provides a comprehensive suite of helper functions designed to facilitate the analysis of genomic annotations from the GENCODE database <https://www.gencodegenes.org/>, supporting both human and mouse genomes. This toolkit enables users to extract, filter, and analyze a wide range of annotation features including genes, transcripts, exons, and introns across different GENCODE releases. It provides functionality for cross-version comparisons, allowing researchers to systematically track annotation updates, structural changes, and feature-level differences between releases. In addition, the package can generate high-quality FASTA files containing donor and acceptor splice site motifs, which are formatted for direct input into the MaxEntScan tool (Yeo and Burge, 2004 <doi:10.1089/1066527041410418>), enabling accurate calculation of splice site strength scores.
This package provides functions to estimate latent dimensions of choice and judgment using Aldrich-McKelvey and Blackbox scaling methods, as described in Poole et al. (2016, <doi:10.18637/jss.v069.i07>). These techniques allow researchers (particularly those analyzing political attitudes, public opinion, and legislative behavior) to recover spatial estimates of political actors' ideal points and stimuli from issue scale data, accounting for perceptual bias, multidimensional spaces, and missing data. The package uses singular value decomposition and alternating least squares (ALS) procedures to scale self-placement and perceptual data into a common latent space for the analysis of ideological or evaluative dimensions. Functionality also includes tools for assessing model fit, handling complex survey data structures, and reproducing simulated datasets for methodological validation.
This package provides a compilation of tests for hypotheses regarding covariance and correlation matrices for one or more groups. The hypothesis can be specified through a corresponding hypothesis matrix and a vector or by choosing one of the basic hypotheses, while for the structure test, only the latter works. Monte Carlo and bootstrap techniques are used, and the respective method must be chosen. The functions provide p-values and, in most cases, estimators of the calculated covariance matrices of the test statistics. For more details on the methodology, see Sattler et al. (2022) <doi:10.1016/j.jspi.2021.12.001>, Sattler and Pauly (2024) <doi:10.1007/s11749-023-00906-6>, and Sattler and Dobler (2025) <doi:10.48550/arXiv.2310.11799>.
Takes a distance matrix and plots it as an interactive graph. One point is focused at the center of the graph, around which all other points are plotted in their exact distances as given in the distance matrix. All other non-focus points are plotted as best as possible in relation to one another. Double click on any point to choose a new focus point, and hover over points to see their ID labels. If color label categories are given, hover over colors in the legend to highlight only those points and click on colors to highlight multiple groups. For more information on the rationale and mathematical background, as well as an interactive introduction, see <https://lea-urpa.github.io/focusedMDS.html>.
This is an add-on package to gamlss. The purpose of this package is to allow users to fit GAMLSS (Generalised Additive Models for Location Scale and Shape) models when the response variable is defined either in the intervals [0,1), (0,1] and [0,1] (inflated at zero and/or one distributions), or on the positive real line including zero (zero-adjusted distributions). The mass points at zero and/or one are treated as extra parameters with the possibility to include a linear predictor for both. The package also allows transformed or truncated distributions from the GAMLSS family to be used for the continuous part of the distribution. Standard methods and GAMLSS diagnostics can be used with the resulting fitted object.
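As a rough illustration of the inflated case, a response on [0,1] can be described by a mixture of point masses at the boundaries and a continuous density on the open interval. The symbols p_0, p_1 and f_c below are notation assumed here for the sketch and are not taken from the package documentation, whose parameterisation may differ:

```latex
f(y) =
\begin{cases}
p_0, & y = 0, \\
(1 - p_0 - p_1)\, f_c(y), & 0 < y < 1, \\
p_1, & y = 1,
\end{cases}
\qquad p_0, p_1 \ge 0, \quad p_0 + p_1 < 1 .
```

Setting p_1 = 0 gives the [0,1) case and p_0 = 0 the (0,1] case. The zero-adjusted case has the same form, with a single mass p_0 at zero and a continuous density f_c on the positive real line.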
This package provides a suite of computer model test functions that can be used to test and evaluate algorithms for Bayesian (also known as sequential) optimization. Some of the functions have known functional forms, however, most are intended to serve as black-box functions where evaluation requires running computer code that reveals little about the functional forms of the objective and/or constraints. The primary goal of the package is to provide users (especially those who do not have access to real computer models) a source of reproducible and shareable examples that can be used for benchmarking algorithms. The package is a living repository, and so more functions will be added over time. For function suggestions, please do contact the author of the package.
Alternative implementation of the beautiful MissForest algorithm used to impute mixed-type data sets by chaining random forests, introduced by Stekhoven, D.J. and Buehlmann, P. (2012) <doi:10.1093/bioinformatics/btr597>. Under the hood, it uses the lightning fast random forest package ranger. Between the iterative model fitting, we offer the option of using predictive mean matching. This firstly avoids imputation with values not already present in the original data (like a value 0.3334 in a 0-1 coded variable). Secondly, predictive mean matching tries to raise the variance in the resulting conditional distributions to a realistic level. This allows, e.g., multiple imputation when repeating the call to missRanger(). Out-of-sample application is supported as well.
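Predictive mean matching can be sketched generically as follows. This Python snippet is an illustration of the general technique, not missRanger's implementation: the function name, the `k` parameter, and the toy predictions are assumptions. Each model prediction for a missing value is replaced by the observed value of a donor drawn at random among the k observed cases whose predictions are closest.

```python
import random


def predictive_mean_matching(pred_missing, pred_observed, y_observed,
                             k=3, seed=1):
    """Replace each prediction by an observed value from a nearby donor."""
    rng = random.Random(seed)
    imputed = []
    for p in pred_missing:
        # Rank observed cases by how close their prediction is to p
        order = sorted(range(len(pred_observed)),
                       key=lambda i: abs(pred_observed[i] - p))
        donor = rng.choice(order[:k])  # draw one of the k nearest donors
        imputed.append(y_observed[donor])
    return imputed


# Hypothetical 0-1 coded variable with model predictions on both parts.
y_obs = [0, 0, 1, 1, 1]
pred_obs = [0.1, 0.2, 0.7, 0.8, 0.9]
pred_mis = [0.3334, 0.85]
values = predictive_mean_matching(pred_mis, pred_obs, y_obs, k=3)
```

Every imputed value is one that already occurs in the observed data, so a raw prediction such as 0.3334 never appears in the result, and the random donor draw adds variability across repeated calls with different seeds.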
This software has evolved from fisheries research conducted at the Pacific Biological Station (PBS) in Nanaimo, British Columbia, Canada. It extends the R language to include two-dimensional plotting features similar to those commonly available in a Geographic Information System (GIS). Embedded C code speeds algorithms from computational geometry, such as finding polygons that contain specified point events or converting between longitude-latitude and Universal Transverse Mercator (UTM) coordinates. Additionally, we include C++ code developed by Angus Johnson for the Clipper library, data for a global shoreline, and other data sets in the public domain. Under the user's R library directory .libPaths(), specifically in ./PBSmapping/doc, a complete user's guide is offered and should be consulted to use package functions effectively.
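Finding whether a polygon contains a point event is a classic computational geometry task. The snippet below is a minimal Python sketch of the standard ray-casting test, shown for illustration only; it is not the package's embedded C code, and the function name and example coordinates are assumptions.

```python
def point_in_polygon(px, py, polygon):
    """Ray casting: count edge crossings of a ray going right from the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the edge straddle the horizontal line through the point?
        if (y1 > py) != (y2 > py):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside  # each crossing flips the state
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
events = [(1, 1), (5, 2), (2, 3.5)]
hits = [point_in_polygon(x, y, square) for x, y in events]
```

An odd number of crossings means the point lies inside. Points exactly on the boundary are not handled specially in this sketch, so a production routine would need an explicit rule for them.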
The package xmapbridge can plot graphs in the X:Map genome browser. X:Map uses the Google Maps API to provide a scrollable view of the genome. It supports a number of species, and can be accessed at http://xmap.picr.man.ac.uk. This package exports plotting files in a suitable format. Graph plotting in R is done using calls to the functions xmap.plot and xmap.points, which have parameters that aim to be similar to those used by the standard plot methods in R. These result in data being written to a set of files (in a specific directory structure) that contain the data to be displayed, as well as some additional meta-data describing each of the graphs.
Provides equivalent functions for the dummy classifier and regressor used in the Python scikit-learn library. Our goal is to allow R users to easily identify baseline performance for their classification and regression problems. Our baseline models use no predictors, and are useful in cases of class imbalance, multiclass classification, and when users want to quickly identify how much their statistical and machine learning models improve over several baseline models. We use a "better" default (proportional guessing) for the dummy classifier than the Python implementation ("prior", which is the most frequent class in the training set). The functions in the package can be used on their own, or introduce methods named dummy_regressor or dummy_classifier that can be used within the caret package pipeline.
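The two baseline strategies contrasted above can be sketched without any predictors. This Python snippet is an illustration under stated assumptions, not the package's or scikit-learn's code: the function names and the imbalanced example labels are hypothetical.

```python
import random
from collections import Counter


def fit_dummy_classifier(labels):
    """Store the class frequencies; no predictors are used."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: cnt / total for cls, cnt in counts.items()}


def predict_most_frequent(model, n):
    """Always predict the most frequent training class."""
    majority = max(model, key=model.get)
    return [majority] * n


def predict_proportional(model, n, seed=1):
    """Guess classes at random in proportion to training frequencies."""
    rng = random.Random(seed)
    classes = list(model)
    weights = [model[c] for c in classes]
    return rng.choices(classes, weights=weights, k=n)


train = ["no"] * 90 + ["yes"] * 10  # hypothetical imbalanced labels
model = fit_dummy_classifier(train)
majority_preds = predict_most_frequent(model, 5)
random_preds = predict_proportional(model, 1000)
```

The most-frequent strategy never predicts the minority class, whereas proportional guessing predicts it about as often as it occurs in the training labels, which gives a nonzero recall baseline under class imbalance.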
It uses the first-order sensitivity index to measure whether the weights assigned by the creator of the composite indicator match the actual importance of the variables. Moreover, the variance inflation factor is used to reduce the set of correlated variables. In the case of a discrepancy between the importance and the assigned weight, the script determines optimised weights that bring the actual importance of the variables in line with their intended impact. If the optimised weights are unable to reflect the desired importance, the highly correlated variables are reduced, taking into account the variance inflation factor. The final outcome of the script is the calculated value of the composite indicator based on optimal weights and a reduced set of variables, and the linear ordering of the analysed objects.
This package provides a workflow for correction of Differential Interferometric Synthetic Aperture Radar (DInSAR) atmospheric delay based on Generic Atmospheric Correction Online Service for InSAR (GACOS) data and correction algorithms proposed by Chen Yu. The package calculates the correction in both the zenith and line-of-sight (LOS) directions, depending on the user's choice. You only need to download the GACOS product for your area and provide preprocessed, unwrapped D-InSAR images. When using this framework, cite these references and this package in your work. References: Yu, C., N. T. Penna, and Z. Li (2017) <doi:10.1016/j.rse.2017.10.038>. Yu, C., Li, Z., & Penna, N. T. (2017) <doi:10.1016/j.rse.2017.10.038>. Yu, C., Penna, N. T., and Li, Z. (2017) <doi:10.1002/2016JD025753>.
Linear regression model and generalized linear models with nonparametric network effects on network-linked observations. The model was originally proposed by Le and Li (2022) <doi:10.48550/arXiv.2007.00803> and is assumed on observations that are connected by a network or similar relational data structure. A more recent work by Wang, Le and Li (2024) <doi:10.48550/arXiv.2410.01163> further extends the framework to generalized linear models. All these models are implemented in the current package. The model does not assume that the relational data or network structure is precisely observed; thus, the method is provably robust to a certain level of perturbation of the network structure. The package contains the estimation and inference functions for the model.