This package provides a basic implementation of the change-in-mean detection method outlined in: Taylor, Wayne A. (2000) <https://variation.com/wp-content/uploads/change-point-analyzer/change-point-analysis-a-powerful-new-tool-for-detecting-changes.pdf>. The package recursively uses the mean-squared error change point calculation to identify candidate change points. The candidate change points are then re-estimated, and Taylor's backwards elimination process is employed to arrive at a final set of change points. Many of the underlying functions are written in C++ for improved performance.
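As an illustration of the mean-squared error change point calculation mentioned above, here is a minimal Python sketch for a single change in mean. It is not the package's code: the function name `mse_change_point` and the sample series are hypothetical, and the recursive search, re-estimation, and backwards elimination steps are not shown.

```python
from statistics import fmean


def mse_change_point(x):
    """Return (m, mse): m is the number of points in the first segment.

    For each split m, the series is divided into x[:m] and x[m:], each
    segment is summarised by its own mean, and the squared deviations
    from those means are summed. The split with the smallest total wins.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    best_m, best_mse = None, float("inf")
    for m in range(1, n):
        left, right = x[:m], x[m:]
        mean_left, mean_right = fmean(left), fmean(right)
        mse = sum((v - mean_left) ** 2 for v in left)
        mse += sum((v - mean_right) ** 2 for v in right)
        if mse < best_mse:
            best_m, best_mse = m, mse
    return best_m, best_mse


# Hypothetical series whose mean shifts after the fourth observation.
series = [10.0, 10.2, 9.9, 10.1, 12.0, 12.1, 11.9, 12.2]
m, mse = mse_change_point(series)
```

Here `m` is the index of the last observation before the estimated change, counted from 1.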
Implementation of selected Tidyverse functions within DataSHIELD, an open-source federated analysis solution in R. Currently, DataSHIELD contains very limited tools for data manipulation, so the aim of this package is to improve the researcher experience by implementing essential functions for data manipulation, including subsetting, filtering, grouping, and renaming variables. This is the clientside package, which should be installed locally and is used in conjunction with the serverside package dsTidyverse, which is installed on the remote server holding the data. For more information, see <https://tidyverse.org/> and <https://datashield.org/>.
Quantifying similarity between high-dimensional single cell samples is challenging, and usually requires some simplifying hypothesis to be made. By transforming the high-dimensional space into a high-dimensional grid, the number of cells in each sub-space of the grid is characteristic of a given sample. Using a Hilbert curve, each sample can be visualized as a simple density plot, and the distance between samples can be calculated from the distribution of cells using the Jensen-Shannon distance. Bins that correspond to significant differences between samples can be identified using a simple bootstrap procedure.
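To make the distance step concrete, the following Python sketch computes the Jensen-Shannon distance between two samples summarised as cell counts per bin. It assumes both samples were already binned on the same grid; the function name and the count vectors are hypothetical, and the gridding, Hilbert curve ordering, and bootstrap steps are not reproduced.

```python
import math


def jensen_shannon_distance(counts_p, counts_q):
    """Jensen-Shannon distance (base-2 logs) between two count vectors.

    The counts are normalised to probability distributions first. With
    base-2 logarithms the distance lies between 0 and 1.
    """
    if len(counts_p) != len(counts_q):
        raise ValueError("both samples must use the same bins")
    total_p, total_q = sum(counts_p), sum(counts_q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each sample needs at least one cell")
    p = [c / total_p for c in counts_p]
    q = [c / total_q for c in counts_q]
    mix = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; empty bins contribute nothing.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    divergence = 0.5 * kl(p, mix) + 0.5 * kl(q, mix)
    return math.sqrt(max(divergence, 0.0))


# Hypothetical bin counts for three samples on a four-bin grid.
sample_a = [30, 10, 0, 0]
sample_b = [30, 10, 0, 0]
sample_c = [0, 0, 5, 15]
d_same = jensen_shannon_distance(sample_a, sample_b)
d_disjoint = jensen_shannon_distance(sample_a, sample_c)
```

Identical distributions give a distance of 0, and samples that occupy entirely different bins give the maximum distance of 1.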
Computes marginal likelihood in Gaussian graphical models through a novel telescoping block decomposition of the precision matrix which allows estimation of model evidence. The top level function used to estimate marginal likelihood is called evidence(), which expects the prior name, data, and relevant prior specific parameters. This package also provides an MCMC prior sampler using the same underlying approach, implemented in prior_sampling(), which expects a prior name and prior specific parameters. Both functions also expect the number of burn-in iterations and the number of sampling iterations for the underlying MCMC sampler.
This package provides the ability to perform "Marginal Mediation"--mediation wherein the indirect and direct effects are in terms of the average marginal effects (Bartus, 2005, <https://EconPapers.repec.org/RePEc:tsj:stataj:v:5:y:2005:i:3:p:309-329>). The style of the average marginal effects stems from Thomas Leeper's work on the "margins" package. This framework allows the use of categorical mediators and outcomes with little change in interpretation from the continuous mediators/outcomes. See <doi:10.13140/RG.2.2.18465.92001> for more details on the method.
The expression levels of approximately 4600 cellular RNA transcripts were assessed in CD4+ T cell lines at different times after infection with HIV-1BRU using DNA microarrays. This data corresponds to the first block of a 12 block array image (001030_08_1.GEL) in the first data set (2000095918 A) in the first experiment (CEM LAI vs HI-LAI 24hr). There are two data sets, which are part of a dye-swap experiment with replicates, representing the Cy3 (green) absorption intensities for channel 1 (hiv1raw) and the Cy5 (red) absorption intensities for channel 2 (hiv2raw).
VariantExperiment is a Bioconductor package for saving data in VCF/GDS format into a RangedSummarizedExperiment object. The high-throughput genetic/genomic data are saved in GDSArray objects. The annotation data for features/samples are saved in DelayedDataFrame format with a mono-dimensional GDSArray in each column. The on-disk representation of both assay data and annotation data enables on-disk reading and processing and saves memory space significantly. The RangedSummarizedExperiment interface enables easy and common manipulations of high-throughput genetic/genomic data using the familiar SummarizedExperiment metaphor in R and Bioconductor.
Likelihood-based inference methods with doubly-truncated data are developed under various models. Nonparametric models are based on Efron and Petrosian (1999) <doi:10.1080/01621459.1999.10474187> and Emura, Konno, and Michimae (2015) <doi:10.1007/s10985-014-9297-5>. Parametric models from the special exponential family (SEF) are based on Hu and Emura (2015) <doi:10.1007/s00180-015-0564-z> and Emura, Hu and Konno (2017) <doi:10.1007/s00362-015-0730-y>. The parametric location-scale models are based on Dorre et al. (2021) <doi:10.1007/s00180-020-01027-6>.
High-performance implementation of 36 optimal binning algorithms (16 categorical, 20 numerical) for Weight of Evidence ('WoE') transformation, credit scoring, and risk modeling. Includes advanced methods such as Mixed Integer Linear Programming ('MILP'), Genetic Algorithms, Simulated Annealing, and Monotonic Regression. Features automatic method selection based on Information Value ('IV') maximization, strict monotonicity enforcement, and efficient handling of large datasets via 'Rcpp'. Fully integrated with the tidymodels ecosystem for building robust machine learning pipelines. Based on methods described in Siddiqi (2006) <doi:10.1002/9781119201731> and Navas-Palencia (2020) <doi:10.48550/arXiv.2001.08025>.
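For readers unfamiliar with the quantities involved, the sketch below computes Weight of Evidence and Information Value for bins that have already been chosen. It is written in Python, is not part of the package, and assumes one common sign convention, WoE = ln(share of goods / share of bads); some references use the inverse ratio. The function name and the counts are hypothetical, and no binning optimisation is performed.

```python
import math


def woe_iv(goods, bads):
    """Return (woe_per_bin, information_value) for pre-defined bins.

    goods[i] and bads[i] are the counts of good and bad outcomes in bin i.
    Assumed convention: WoE_i = ln(dist_good_i / dist_bad_i).
    """
    if len(goods) != len(bads):
        raise ValueError("goods and bads must have the same number of bins")
    total_good, total_bad = sum(goods), sum(bads)
    woe, iv = [], 0.0
    for g, b in zip(goods, bads):
        if g == 0 or b == 0:
            # WoE is undefined for empty cells; real tools merge or smooth bins.
            raise ValueError("every bin needs at least one good and one bad")
        dist_good, dist_bad = g / total_good, b / total_bad
        w = math.log(dist_good / dist_bad)
        woe.append(w)
        iv += (dist_good - dist_bad) * w
    return woe, iv


# Hypothetical two-bin example.
woe, iv = woe_iv(goods=[80, 20], bads=[20, 80])
```

Each term of the Information Value sum is non-negative, so a larger value indicates bins that separate the two outcome classes more strongly.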
This package contains fast functions to calculate the exact Bayes posterior for the Sparse Normal Sequence Model, implementing the algorithms described in Van Erven and Szabo (2021, <doi:10.1214/20-BA1227>). For general hierarchical priors, sample sizes up to 10,000 are feasible within half an hour on a standard laptop. For beta-binomial spike-and-slab priors, a faster algorithm is provided, which can handle sample sizes of 100,000 in half an hour. In the implementation, special care has been taken to ensure numerical stability of the methods even for such large sample sizes.
Derivation of indexes for benchmarking purposes. A methodology with a flexible number of constituents is implemented. Functions for market capitalization and volume weighted indexes with a fixed number of constituents are also available. The main function of the package, indexComp(), provides the derived index, suitable for analysis purposes. The functions indexUpdate(), indexMemberSelection() and indexMembersUpdate() are components of indexComp() and enable one to construct and continuously update an index, e.g. for display on a website. The methodology behind the functions provided is introduced in Trimborn and Haerdle (2018) <doi:10.1016/j.jempfin.2018.08.004>.
This package constructs a classification (decision) tree from survival data with high-dimensional covariates. The method is a robust version of the logrank tree, where the variance is stabilized. The main function "uni.tree" returns a classification tree for a given survival dataset. The inner nodes (splitting criterion) are selected by minimizing the P-value of the two-sample score tests. The decision to declare terminal nodes (stopping criterion) is based on a P-value threshold given by a user-specified argument. This tree construction algorithm is proposed by Emura et al. (2021, in review).
Automatically generates testthat code files from offensive programming test cases. Generated test files are complete and ready to run. Using wyz.code.testthat, you will save a lot of time, reduce the number of errors in test case production, be able to test generated files immediately without any need to view or modify them, and achieve zero latency between code implementation and industrial testing. As with testthat, you may complete the provided test cases according to your needs to push testing further, but this need is nearly void when using wyz.code.offensiveProgramming.
Provides a common set of simplified web scraping tools for working with the NHS Data Dictionary <https://datadictionary.nhs.uk/data_elements_overview.html>. The intended usage is to access the data elements section of the NHS Data Dictionary to retrieve key lookups. The benefit of having these in a package is that the lookups are the live lookups on the website and will not need to be maintained. This package was commissioned by the NHS-R community <https://nhsrcommunity.com/> to provide this consistency of lookups. The OpenSafely lookups have now been added <https://www.opencodelists.org/docs/>.
Allows users to create time series of tropical storm exposure histories for chosen counties for a number of hazard metrics (wind, rain, distance from the storm, etc.). This package interacts with data available through the hurricaneexposuredata package, which is available in a drat repository. To access this data package, see the instructions at <https://github.com/geanders/hurricaneexposure>. The size of the hurricaneexposuredata package is approximately 20 MB. This work was supported in part by grants from the National Institute of Environmental Health Sciences (R00ES022631), the National Science Foundation (1331399), and a NASA Applied Sciences Program/Public Health Program Grant (NNX09AV81G).
Missing data imputation based on the missForest algorithm (Stekhoven, Daniel J (2012) <doi:10.1093/bioinformatics/btr597>) with adaptations for prediction settings. The function missForest() is used to impute a (training) dataset with missing values and to learn imputation models that can be later used for imputing new observations. The function missForestPredict() is used to impute one or multiple new observations (test set) using the models learned on the training data. For more details see Albu, E., Gao, S., Wynants, L., & Van Calster, B. (2024). missForestPredict--Missing data imputation for prediction settings <doi:10.48550/arXiv.2407.03379>.
This package provides an extensive and curated collection of datasets related to the digestive system, stomach, intestines, liver, pancreas, and associated diseases. This package includes clinical trials, observational studies, experimental datasets, cohort data, and case series involving gastrointestinal disorders such as gastritis, ulcers, pancreatitis, liver cirrhosis, colon cancer, colorectal conditions, Helicobacter pylori infection, irritable bowel syndrome, intestinal infections, and post-surgical outcomes. The datasets support educational, clinical, and research applications in gastroenterology, public health, epidemiology, and biomedical sciences. Designed for researchers, clinicians, data scientists, students, and educators interested in digestive diseases, the package facilitates reproducible analysis, modeling, and hypothesis testing using real-world and historical data.
The package provides a comprehensive mapping table of metabolites linked to Wikipathways pathways. The tables include HMDB, KEGG, ChEBI, Drugbank, PubChem compound, ChemSpider, KNApSAcK, and Wikidata IDs plus CAS and InChIKey. The tables are provided for each of the 25 species ("Anopheles gambiae", "Arabidopsis thaliana", "Bacillus subtilis", "Bos taurus", "Caenorhabditis elegans", "Canis familiaris", "Danio rerio", "Drosophila melanogaster", "Equus caballus", "Escherichia coli", "Gallus gallus", "Gibberella zeae", "Homo sapiens", "Hordeum vulgare", "Mus musculus", "Mycobacterium tuberculosis", "Oryza sativa", "Pan troglodytes", "Plasmodium falciparum", "Populus trichocarpa", "Rattus norvegicus", "Saccharomyces cerevisiae", "Solanum lycopersicum", "Sus scrofa", "Zea mays"). The information in these tables can be used for Metabolite Set Enrichment Analysis.
This package provides pedagogical tools for visualization and numerical computation in vector calculus. Includes functions for parametric curves, scalar and vector fields, gradients, divergences, curls, line and surface integrals, and dynamic 2D/3D graphical analysis to support teaching and learning. The implemented methods follow standard treatments in vector calculus and multivariable analysis as presented in Marsden and Tromba (2011) <ISBN:9781429215084>, Stewart (2015) <ISBN:9781285741550>, Thomas, Weir and Hass (2018) <ISBN:9780134438986>, Larson and Edwards (2016) <ISBN:9781285255869>, Apostol (1969) <ISBN:9780471000051>, Spivak (1971) <ISBN:9780805390216>, Schey (2005) <ISBN:9780071369080>, Colley (2019) <ISBN:9780321982384>, Lizarazo Osorio (2020) <ISBN:9789585450103>, Sievert (2020) <ISBN:9780367180165>, and Borowko (2013) <ISBN:9781439870791>.
This package provides a metric expressing the quality of a UMAP layout. It contains the Saturn_coefficient() function, which reads an input matrix and its dimensionality reduction produced by UMAP, and evaluates the quality of this dimensionality reduction by producing a real value in the [0, 1] interval. We call this real value the Saturn coefficient. A higher value means better dimensionality reduction; a lower value means worse dimensionality reduction. Reference: Davide Chicco et al. (February 2026), "The advantages of our proposed Saturn coefficient over continuity and trustworthiness for UMAP dimensionality reduction evaluation", PeerJ Computer Science 12:e3424 (pp. 1-30), <doi:10.7717/peerj-cs.3424>.
Automated flagging of common spatial and temporal errors in biological and paleontological collection data, for use in conservation, ecology and paleontology. Includes automated tests to easily flag (and exclude) records assigned to country or province centroids, the open ocean, the headquarters of the Global Biodiversity Information Facility, urban areas or the location of biodiversity institutions (museums, zoos, botanical gardens, universities). Furthermore, identifies per-species outlier coordinates, zero coordinates, identical latitude/longitude and invalid coordinates. Also implements an algorithm to identify data sets with a significant proportion of rounded coordinates. Especially suited for large data sets. The reference for the methodology is: Zizka et al. (2019) <doi:10.1111/2041-210X.13152>.
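Three of the simpler checks named above (invalid coordinates, zero coordinates, and identical latitude/longitude) can be sketched in a few lines of Python. This is an illustrative sketch only, not the package's interface: the function name, flag labels, and sample records are hypothetical, and checks that need reference data (centroids, institutions, urban areas) or per-species outlier detection are not attempted.

```python
def flag_coordinates(records):
    """Return one list of flag labels per (latitude, longitude) record.

    Labels (hypothetical names):
      "invalid": outside the valid latitude/longitude ranges, or NaN
      "zero":    both latitude and longitude are exactly 0
      "equal":   latitude and longitude are identical
    """
    all_flags = []
    for lat, lon in records:
        flags = []
        # NaN fails every comparison, so it is flagged as invalid here.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            flags.append("invalid")
        if lat == 0 and lon == 0:
            flags.append("zero")
        if lat == lon:
            flags.append("equal")
        all_flags.append(flags)
    return all_flags


# Hypothetical records as (latitude, longitude) pairs in decimal degrees.
records = [(0.0, 0.0), (48.85, 2.35), (12.5, 12.5), (95.0, 10.0)]
flags = flag_coordinates(records)
clean = [rec for rec, f in zip(records, flags) if not f]
```

Returning labels rather than dropping records immediately keeps the flag-then-exclude workflow described above: a record is removed only once its flags have been reviewed.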
Calculates insulin secretion rates from C-peptide values based on the methods described in Van Cauter et al. (1992) <doi:10.2337/diab.41.3.368>. Includes functions to calculate estimated insulin secretion rates using linear or cubic spline interpolation of c-peptide values (see Eaton et al., 1980 <doi:10.1210/jcem-51-3-520> and Polonsky et al., 1986 <doi:10.1172/JCI112308>) and to calculate estimates of input coefficients (volume of distribution, short half life, long half life, and fraction attributed to short half life) as described by Van Cauter. Although the generated coefficients are specific to insulin secretion, the two-compartment secretion model used here is useful for certain applications beyond insulin.
Dominance analysis is a method for comparing the relative importance of predictors in multiple regression models: ordinary least squares, generalized linear models, hierarchical linear models, beta regression and dynamic linear models. The main principles and methods of dominance analysis are described in Budescu, D. V. (1993) <doi:10.1037/0033-2909.114.3.542> and Azen, R., & Budescu, D. V. (2003) <doi:10.1037/1082-989X.8.2.129> for ordinary least squares regression. Subsequently, the extensions for multivariate regression, logistic regression and hierarchical linear models were described in Azen, R., & Budescu, D. V. (2006) <doi:10.3102/10769986031002157>, Azen, R., & Traxel, N. (2009) <doi:10.3102/1076998609332754> and Luo, W., & Azen, R. (2013) <doi:10.3102/1076998612458319>, respectively.
Estimation and testing methods for dependently truncated data. Semi-parametric methods are based on Emura et al. (2011) <Stat Sinica 21:349-67>, Emura & Wang (2012) <doi:10.1016/j.jmva.2012.03.012>, and Emura & Murotani (2015) <doi:10.1007/s11749-015-0432-8>. Parametric approaches are based on Emura & Konno (2012) <doi:10.1007/s00362-014-0626-2> and Emura & Pan (2017) <doi:10.1007/s00362-017-0947-z>. A regression approach is based on Emura & Wang (2016) <doi:10.1007/s10463-015-0526-9>. Quasi-independence tests are based on Emura & Wang (2010) <doi:10.1016/j.jmva.2009.07.006>. Right-truncated data for Japanese male centenarians are given by Emura & Murotani (2015) <doi:10.1007/s11749-015-0432-8>.