This package provides a user-friendly shiny application to explore statistical associations and visual patterns in multivariate datasets. The app provides interactive correlation networks, bivariate plots, and summary tables for different types of variables (numeric and categorical). It also supports optional survey weights and range-based filters on association strengths, making it suitable for the exploration of survey and public data by non-technical users, journalists, educators, and researchers. For background and methodological details, see Soetewey et al. (2025) <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5637359>.
Functionality for segmenting individual trees from a forest stand scanned with a close-range (e.g., terrestrial or mobile) laser scanner. The complete workflow from a raw point cloud to a complete tabular forest inventory is provided. The package contains several algorithms for detecting tree bases and a graph-based algorithm to attach all remaining points to these tree bases. It builds heavily on the lidR package. A description of the segmentation algorithm can be found in Larysch et al. (2025) <doi:10.1007/s10342-025-01796-z>.
Tests for a comparison of two partially overlapping samples. A comparison of means using the partially overlapping samples t-test: See Derrick, Russ, Toher and White (2017), Test statistics for the comparison of means for two samples which include both paired observations and independent observations, Journal of Modern Applied Statistical Methods, 16(1). A comparison of proportions using the partially overlapping samples z-test: See Derrick, Dobson-Mckittrick, Toher and White (2015), Test statistics for comparing two proportions with partially overlapping samples. Journal of Applied Quantitative Methods, 10(3).
This package contains an implementation of 'StabilizedRegression', a regression framework for heterogeneous data introduced in Pfister et al. (2021) <arXiv:1911.01850>. The procedure uses averaging to estimate a regression of a response variable Y on a set of predictors X while enforcing stability with respect to a given environment variable. The resulting regression leads to a variable selection procedure that makes it possible to distinguish between stable and unstable predictors. The package further implements a visualization technique that illustrates the trade-off between stability and predictiveness of individual predictors.
If you have a set of genomic ranges, this package can help you with visualization and comparison. It produces several kinds of plots, for example: chromosome distribution plots, which visualize how your regions are distributed over chromosomes; feature distance distribution plots, which visualize how your regions are distributed relative to a feature of interest, like Transcription Start Sites (TSSs); and genomic partition plots, which visualize how your regions overlap given genomic features such as promoters, introns, exons, or intergenic regions. It also makes it easy to compare one set of ranges to another.
Provides the Kubernetes-like class ManagedCloudProvider as a child class of the CloudProvider class in the DockerParallel package. The class can manage cloud instances created by a non-Kubernetes cloud service. To create a provider for a non-Kubernetes cloud service, the developer defines a reference class that inherits from ManagedCloudProvider and defines methods for the generics runDockerWorkerContainers(), getDockerWorkerStatus() and killDockerWorkerContainers(). For more information, please see the vignette in this package and <https://CRAN.R-project.org/package=DockerParallel>.
transomics2cytoscape generates a file for 3D transomics visualization from input that specifies the IDs of multiple KEGG pathway layers and their corresponding Z-axis heights, together with input that represents the edges between the pathway layers. The edges are used, for example, to describe the relationship between a kinase on one pathway and an enzyme on another pathway. This package automates the creation of a transomics network, as shown in the figure in Yugi et al. (2014) (https://doi.org/10.1016/j.celrep.2014.07.021), using Cytoscape automation (https://doi.org/10.1186/s13059-019-1758-4).
The reliability of assessment tools is a crucial aspect of monitoring student performance in various educational settings. It ensures that the assessment outcomes accurately reflect a student's true level of performance. However, when assessments are combined, determining composite reliability can be challenging, especially for naturalistic and unbalanced datasets. This package provides an easy-to-use solution for calculating composite reliability for different assessment types. It allows for the inclusion of weight per assessment type and produces extensive G- and D-study results with graphical interpretations. Overall, our approach enhances the reliability of composite assessments, making it suitable for various education contexts.
The log-rank test is used to compare survival outcomes between two groups. When there is no proper control group, or obtaining such data is cumbersome, a one-sample log-rank test can be applied instead. This package performs the one-sample log-rank test as described in Finkelstein et al. (2003) <doi:10.1093/jnci/djt227> and a variation of the test for small sample sizes detailed in FD Liddell (1984) <doi:10.1136/jech.38.1.85>. The visualization function in the package generates a Kaplan-Meier curve comparing the survival curve of the general population against that of the population of interest.
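As a rough illustration of the one-sample log-rank idea described above, the following Python sketch compares the observed number of events with the number expected under a reference population. It uses the classical form of the statistic, (O - E)^2 / E, referred to a chi-square distribution with one degree of freedom. The function name, the argument names, and the constant-hazard reference population are assumptions made for this example; they are not the package's interface, and the small-sample variation is not covered.

```python
import math


def one_sample_logrank(times, events, cumulative_hazard):
    """Classical one-sample log-rank statistic (illustrative sketch).

    times: follow-up time of each subject
    events: 1 if the event was observed, 0 if the subject was censored
    cumulative_hazard: function giving the reference population's
        cumulative hazard at time t (hypothetical input for this sketch)
    """
    observed = sum(events)
    # Expected events: sum of reference cumulative hazards at each follow-up time.
    expected = sum(cumulative_hazard(t) for t in times)
    statistic = (observed - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return observed, expected, statistic, p_value


# Made-up data: four subjects, reference hazard of 0.1 events per time unit.
observed, expected, statistic, p_value = one_sample_logrank(
    times=[1, 2, 3, 4],
    events=[1, 0, 1, 1],
    cumulative_hazard=lambda t: 0.1 * t,
)
```

In practice the reference cumulative hazard would come from population life tables rather than a constant rate.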
Routines for performing empirical calibration of observational study estimates. By using a set of negative control hypotheses we can estimate the empirical null distribution of a particular observational study setup. This empirical null distribution can be used to compute a calibrated p-value, which reflects the probability of observing an estimated effect size when the null hypothesis is true taking both random and systematic error into account. A similar approach can be used to calibrate confidence intervals, using both negative and positive controls. For more details, see Schuemie et al. (2013) <doi:10.1002/sim.5925> and Schuemie et al. (2018) <doi:10.1073/pnas.1708282114>.
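The calibration idea described above can be sketched in a few lines of Python. This is a deliberately simplified version: the empirical null is fitted as a normal distribution to the negative-control log effect estimates using their plain mean and standard deviation, ignoring the standard error of each control, which a full implementation would take into account. All names and numbers are hypothetical and are not the package's interface.

```python
import math
from statistics import NormalDist, mean, stdev


def calibrated_p_value(log_estimate, std_error, negative_control_log_estimates):
    """Two-sided p-value against an empirical null (simplified sketch)."""
    # Empirical null: systematic error estimated from negative controls.
    null_mean = mean(negative_control_log_estimates)
    null_sd = stdev(negative_control_log_estimates)
    # Combine systematic error (null_sd) with random error (std_error).
    z = (log_estimate - null_mean) / math.sqrt(null_sd ** 2 + std_error ** 2)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Made-up negative controls whose estimates are biased away from zero.
controls = [0.1, 0.2, 0.3]
p_calibrated = calibrated_p_value(0.5, 0.1, controls)
# The traditional p-value assumes the null is centred on zero with no bias.
p_traditional = 2.0 * (1.0 - NormalDist().cdf(abs(0.5 / 0.1)))
```

With biased negative controls, the calibrated p-value is larger than the traditional one, reflecting the extra systematic error.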
The Food and Agriculture Organization-56 Penman-Monteith equation is one of the most important methods for estimating evapotranspiration from vegetated land areas. This package helps to calculate reference evapotranspiration using the weather variables collected from a weather station. Evapotranspiration is the process of water transfer from the land surface to the atmosphere through evaporation from soil and other surfaces and transpiration from plants. The package aims to support agricultural, hydrological, and environmental research by offering accurate and accessible reference evapotranspiration calculation. This package has been developed using the concepts of Córdova et al. (2015) <doi:10.1016/j.apm.2022.09.004> and Debnath et al. (2015) <doi:10.1007/s40710-015-0107-1>.
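For readers unfamiliar with the FAO-56 Penman-Monteith equation mentioned above, the Python sketch below evaluates its standard daily form, ET0 = [0.408 Δ (Rn - G) + γ (900 / (T + 273)) u2 (es - ea)] / [Δ + γ (1 + 0.34 u2)]. The function and argument names are hypothetical and do not reflect the package's interface. As a simplification, saturation vapour pressure is computed from the mean air temperature rather than from the daily maximum and minimum temperatures.

```python
import math


def saturation_vapour_pressure(temp_c):
    """Saturation vapour pressure (kPa) at air temperature temp_c (deg C)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def reference_et0(temp_c, net_radiation, soil_heat_flux, wind_2m,
                  actual_vp, pressure_kpa=101.3):
    """Reference evapotranspiration (mm/day), FAO-56 Penman-Monteith form.

    temp_c: mean daily air temperature at 2 m (deg C)
    net_radiation, soil_heat_flux: Rn and G (MJ m^-2 day^-1)
    wind_2m: wind speed at 2 m (m/s)
    actual_vp: actual vapour pressure ea (kPa)
    pressure_kpa: atmospheric pressure (kPa)
    """
    es = saturation_vapour_pressure(temp_c)
    # Slope of the saturation vapour pressure curve (kPa per deg C).
    delta = 4098.0 * es / (temp_c + 237.3) ** 2
    # Psychrometric constant (kPa per deg C).
    gamma = 0.000665 * pressure_kpa
    numerator = (0.408 * delta * (net_radiation - soil_heat_flux)
                 + gamma * 900.0 / (temp_c + 273.0) * wind_2m * (es - actual_vp))
    denominator = delta + gamma * (1.0 + 0.34 * wind_2m)
    return numerator / denominator


# Made-up weather values for a single day.
et0 = reference_et0(temp_c=20.0, net_radiation=13.28, soil_heat_flux=0.0,
                    wind_2m=2.0, actual_vp=1.4)
```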
This package implements the Age Band Decomposition (ABD) method for standardizing tree ring width data while preserving both low and high frequency variability. Unlike traditional detrending approaches that can distort long term growth trends, ABD decomposes ring width series into multiple age classes, detrends each class separately, and then recombines them to create standardized chronologies. This approach improves the detection of growth signals linked to past climatic and environmental factors, making it particularly valuable for dendroecological and dendroclimatological studies. The package provides functions to perform ABD-based standardization, compare results with other common methods (e.g., BAI, C method, RCS), and facilitate the interpretation of growth patterns under current and future climate variability.
A study based on the screened selection design (SSD) is an exploratory phase II randomized trial with two or more arms but without a concurrent control. The primary aim of an SSD trial is to pick a desirable treatment arm (e.g., in terms of the response rate) to recommend for the subsequent randomized phase IIb (with a concurrent control) or phase III trial. The proposed designs can "partially" control, or provide, the empirical type I error/false positive rate by an optimal algorithm (implemented by the optimal_2arm_binary() or optimal_3arm_binary() function) for each arm. All components needed for the design (sample size, operating characteristics) are supported.
We focus on the diagnostic ability assessment of medical tests when the outcome of interest is the status (alive or dead) of the subjects at a certain time-point t. This binary status is determined by right-censored times to event and it is unknown for those subjects censored before t. Here we provide three methods (unknown status exclusion, imputation of censored times and using time-dependent ROC curves) to evaluate the diagnostic ability of binary and continuous tests in this context. Two references for the methods used here are Skaltsa et al. (2010) <doi:10.1002/bimj.200900294> and Heagerty et al. (2000) <doi:10.1111/j.0006-341x.2000.00337.x>.
Most estimators implemented by the video game industry cannot obtain reliable initial estimates nor guarantee comparability between distant estimates. TrueSkill Through Time solves all these problems by modeling the entire history of activities using a single Bayesian network allowing the information to propagate correctly throughout the system. This algorithm requires only a few iterations to converge, allowing millions of observations to be analyzed using any low-end computer. Landfried G, Mocskos E (2025). "TrueSkill Through Time: Reliable Initial Skill Estimates and Historical Comparability with Julia, Python, and R." <doi:10.18637/jss.v112.i06>. The core ideas implemented in this project were developed by Dangauthier P, Herbrich R, Minka T, Graepel T (2007). "Trueskill through time: Revisiting the history of chess."
This package provides functions to load and analyze three open Electronic Health Records (EHRs) datasets of patients diagnosed with glioblastoma, previously released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Users can generate basic descriptive statistics and frequency tables, save descriptive summary tables, and create and export univariate or bivariate plots. The package is designed to work with the included datasets and to facilitate quick exploratory data analysis and reporting. More information about these three datasets of EHRs of patients with glioblastoma can be found in this article: Gabriel Cerono, Ombretta Melaiu, and Davide Chicco, 'Clinical feature ranking based on ensemble machine learning reveals top survival factors for glioblastoma multiforme', Journal of Healthcare Informatics Research 8, 1-18 (March 2024). <doi:10.1007/s41666-023-00138-1>.
Meta testing is the ability to test a function without having to provide its parameter values. Those values are generated based on the semantic naming of parameters, as introduced by package 'wyz.code.offensiveProgramming'. The value generation logic can be extended with your own data types and generation schemes. This lets you meet your most specific requirements and cover a wide variety of usages, from general use cases to very specific ones. With meta testing, it becomes easier to generate stress test campaigns, non-regression test campaigns and robustness test campaigns, as generated tests can be saved and reused from session to session. The main benefits of using wyz.code.metaTesting are the ability to discover valid and invalid function parameter combinations, the ability to infer valid parameter values, and smart summaries that allow you to focus on dysfunctional cases.
This package helps determine the sample size for estimating a population mean or proportion under simple random sampling with or without replacement and stratified random sampling without replacement. When prior information on the population coefficient of variation (CV) is unavailable, a preliminary sample is drawn to estimate the CV, which is then used to compute the final sample size. If the final size exceeds the preliminary sample size, additional units are drawn; otherwise, the preliminary sample size is taken as the final sample size. For the stratified random sampling without replacement design, it also calculates the sample size in each stratum under different allocation methods for estimation of the population mean and proportion, based upon the availability of prior information on the sizes of the strata, the standard deviations of the strata and the costs of drawing a sampling unit in the strata. For details on sampling methodology, see Cochran (1977) "Sampling Techniques" <https://archive.org/details/samplingtechniqu0000coch_t4x6>.
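To make the simple random sampling case above concrete, the following Python sketch uses the textbook formula n0 = (z * CV / r)^2 for estimating a mean with relative margin of error r, with the finite population correction n = n0 / (1 + n0 / N) for sampling without replacement. The function names and arguments are hypothetical, not the package's interface, and stratified allocation is not covered.

```python
import math
from statistics import NormalDist


def sample_size_mean(cv, rel_error, confidence=0.95, population_size=None):
    """Sample size for estimating a mean under simple random sampling.

    cv: population coefficient of variation (prior or from a pilot sample)
    rel_error: permissible margin of error as a fraction of the mean
    population_size: N for sampling without replacement; None means
        sampling with replacement (no finite population correction)
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    n = (z * cv / rel_error) ** 2
    if population_size is not None:
        n = n / (1.0 + n / population_size)  # finite population correction
    return math.ceil(n)


def additional_units(final_size, preliminary_size):
    """Units still to be drawn after a preliminary sample (never negative)."""
    return max(final_size - preliminary_size, 0)


# Made-up inputs: CV of 0.5 estimated from a pilot sample of 100 units.
n_with_replacement = sample_size_mean(cv=0.5, rel_error=0.05)
n_without_replacement = sample_size_mean(cv=0.5, rel_error=0.05,
                                         population_size=1000)
extra = additional_units(n_without_replacement, preliminary_size=100)
```

When the computed size does not exceed the preliminary sample size, `additional_units` returns zero, matching the rule that the preliminary sample then serves as the final sample.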
This package provides functions for estimating the gliding box lacunarity (GBL), covariance, and pair-correlation of a random closed set (RACS) in 2D from a binary coverage map (e.g. presence-absence land cover maps). Contains a number of newly-developed covariance-based estimators of GBL (Hingee et al., 2019) <doi:10.1007/s13253-019-00351-9> and balanced estimators, proposed by Picka (2000) <http://www.jstor.org/stable/1428408>, for covariance, centred covariance, and pair-correlation. Also contains methods for estimating contagion-like properties of RACS and simulating 2D Boolean models. Binary coverage maps are usually represented as raster images with pixel values of TRUE, FALSE or NA, with NA representing unobserved pixels. A demo for extracting such a binary map from a geospatial data format is provided. Binary maps may also be represented using polygonal sets as the foreground, however for most computations such maps are converted into raster images. The package is based on research conducted during the author's PhD studies.
Proximate composition analysis is the quantification of the main components that constitute the nutritional profile of any food or food product, including fish, shellfish, fish feed and their ingredients. Understanding this composition is essential for evaluating their nutritional value and for making informed dietary choices. The primary components typically analyzed include moisture/water, crude protein, crude fat/lipid, total ash, fiber and carbohydrates, AOAC (2005, ISBN:0-935584-77-3). In the case of fish, shellfish and their products, the proximate composition consists of four primary constituents: water, protein, fat, and ash (mostly minerals). Fish exhibit significant variation in their chemical makeup based on age, sex, environment, and season, both within the same species and between individual fish. There is minimal fluctuation in the content of ash and protein. The lipid concentration varies remarkably and is inversely correlated with the water content. In fish, carbohydrates are present in minor quantities, so they are quantified by subtracting the total of the other components from 100 to obtain the percentage of carbohydrates.
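The "by difference" calculation for carbohydrates described above amounts to a single subtraction. The short Python sketch below shows it; the function name and the sample percentages are made up for illustration and are not taken from the package.

```python
def carbohydrate_by_difference(moisture, protein, fat, ash, fiber=0.0):
    """Carbohydrate percentage obtained by subtracting the other components from 100.

    All arguments are percentages of the sample on the same (wet weight) basis.
    """
    total = moisture + protein + fat + ash + fiber
    if total > 100.0:
        raise ValueError("component percentages sum to more than 100")
    return 100.0 - total


# Hypothetical fish fillet: 76% moisture, 18.5% protein, 3.2% fat, 1.3% ash.
carbs = carbohydrate_by_difference(moisture=76.0, protein=18.5, fat=3.2, ash=1.3)
```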
It is often challenging to strongly control the family-wise type-1 error rate in group-sequential trials with multiple endpoints (hypotheses). The inflation of the type-1 error rate comes from two sources: (S1) repeated testing of individual hypotheses and (S2) simultaneous testing of multiple hypotheses. The MultiGroupSequential package is intended to help researchers tackle this challenge. The procedures provided include the sequential procedures described in Luo and Quan (2023) <doi:10.1080/19466315.2023.2191989> and the graphical procedure proposed by Maurer and Bretz (2013) <doi:10.1080/19466315.2013.807748>. Luo and Quan (2023) describe three procedures, and the functions that implement them are: (1) seqgspgx(), which implements a sequential graphical procedure based on the group-sequential p-values; (2) seqgsphh(), which implements a sequential Hochberg/Hommel procedure based on the group-sequential p-values; and (3) seqqvalhh(), which implements a sequential Hochberg/Hommel procedure based on the q-values. In addition, seqmbgx() implements the sequential graphical procedure described in Maurer and Bretz (2013).
The CellScore Standard Dataset contains expression data from a wide variety of human cells and tissues, which should be used as standard cell types in the calculation of the CellScore. All data was curated from public databases such as Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) or ArrayExpress (https://www.ebi.ac.uk/arrayexpress/). This standard dataset only contains data from the Affymetrix GeneChip Human Genome U133 Plus 2.0 microarrays. Samples were manually annotated using the database information or consulting the publications in which the datasets originated. The sample annotations are stored in the phenoData slot of the expressionSet object. Raw data (CEL files) were processed with the affy package to generate present/absent calls (mas5calls) and background-subtracted values, which were then normalized by the R-package yugene to yield the final expression values for the standard expression matrix. The annotation table for the microarray was retrieved from the BioC annotation package hgu133plus2. All data are stored in an expressionSet object.
This package provides a user-friendly toolbox for the statistical analysis of interval-valued responses in questionnaires measuring intrinsically imprecise human attributes or features (attitudes, perceptions, opinions, feelings, etc.). In particular, this package provides S4 classes, methods, and functions in order to compute basic arithmetic and statistical operations with interval-valued data; prepare customized plots; associate each interval-valued response to its equivalent Likert-type and visual analogue scale answers through the minimum theta-distance and the mid-point criteria; analyze the reliability of respondents' answers from the internal consistency point of view by means of Cronbach's alpha coefficient; and simulate interval-valued responses in this type of questionnaire. The package also incorporates some real-life data that can be used to illustrate how it works with several non-trivial reproducible examples. The methodology used in this package is based on many theoretical and applied publications from the SMIRE+CoDiRE (Statistical Methods with Imprecise Random Elements and Comparison of Distributions of Random Elements) Research Group (<https://bellman.ciencias.uniovi.es/smire+codire/>) from the University of Oviedo (Spain).
This package provides a suite of tools to support authors in making their research more discoverable. check_keywords() - this function checks the keywords to assess whether they are already represented in the title and abstract. check_fields() - this function compares terminology used across the title, abstract and keywords to assess where terminological diversity (i.e. the use of synonyms) could increase the likelihood of the record being identified in a search. The function looks for terms in the title and abstract that also exist in other fields and highlights these as needing attention. suggest_keywords() - this function takes a full text document and produces a list of unigrams, bigrams and trigrams (1-, 2- or 3-word phrases) that are present in the full text after removing stop words (words with a low utility in natural language processing) and that do not occur in the title or abstract; these may be suitable candidates for keywords. suggest_title() - this function takes a full text document and produces a list of the most frequently used unigrams, bigrams and trigrams, after removing stop words, that do not occur in the abstract or keywords; these may be suitable candidates for title words. check_title() - this function carries out a number of sub tasks: 1) it compares the length (number of words) of the title with the mean length of titles in major bibliographic databases to assess whether the title is likely to be too short; 2) it assesses the proportion of stop words in the title to highlight titles with low utility in search engines that strip out stop words; 3) it compares the title with a given sample of record titles from an .ris import and calculates a similarity score based on phrase overlap. This highlights the level of uniqueness of the title. This version of the package also contains functions currently in a non-CRAN package called litsearchr <https://github.com/elizagrames/litsearchr>.
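The keyword-suggestion step described above (collect 1- to 3-word phrases from the full text after removing stop words, then drop those already present in the title or abstract) can be sketched in Python as follows. The stop-word list, tokenizer, and function names are assumptions made for this illustration; they are not the package's implementation, which may differ in how it tokenizes and ranks phrases.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on",
              "that", "the", "this", "to", "with"}


def tokenize(text):
    """Lowercase the text, split into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS]


def ngrams(tokens, max_n=3):
    """Yield all unigrams, bigrams and trigrams as space-joined strings."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def keyword_candidates(full_text, title, abstract, top=10):
    """Most frequent full-text phrases that are absent from title and abstract."""
    already_used = set(ngrams(tokenize(title))) | set(ngrams(tokenize(abstract)))
    counts = Counter(g for g in ngrams(tokenize(full_text))
                     if g not in already_used)
    return counts.most_common(top)


# Made-up example record.
candidates = keyword_candidates(
    full_text="Forest soil carbon increases in forest soil after fire",
    title="Fire effects",
    abstract="We study carbon",
)
```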