This package provides estimates of several summary measures for clinical trials including the average hazard ratio, the weighted average hazard ratio, the restricted superiority probability ratio, the restricted mean survival difference and the ratio of restricted mean times lost, based on the short-term and long-term hazard ratio model (Yang, 2005 <doi:10.1093/biomet/92.1.1>) which accommodates various non-proportional hazards scenarios. The inference procedures and the asymptotic results for the summary measures are discussed in Yang (2018, <doi:10.1002/sim.7676>).
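As a rough illustration of one of these measures, the sketch below computes a restricted mean survival time (RMST) difference directly from Kaplan-Meier curves using the survival package and its lung dataset; this is a generic nonparametric calculation, not the package's model-based estimator, and rmst_km() is a name made up for this sketch.

library(survival)

rmst_km <- function(time, status, tau) {
  # Area under the Kaplan-Meier curve on [0, tau]
  fit  <- survfit(Surv(time, status) ~ 1)
  keep <- fit$time <= tau
  t_k  <- c(0, fit$time[keep], tau)   # step-change times up to tau
  s_k  <- c(1, fit$surv[keep])        # survival level on each step
  sum(s_k * diff(t_k))
}

# RMST difference at one year between the two sex groups of the lung data
rmst_m <- with(subset(lung, sex == 1), rmst_km(time, status == 2, tau = 365))
rmst_f <- with(subset(lung, sex == 2), rmst_km(time, status == 2, tau = 365))
rmst_f - rmst_m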
This package provides methods for analyzing the dispersion of tabular datasets with batched and ordered samples. Several indicators, based on the convex hull or on an integrated-covariance Mahalanobis distance, are implemented for inter- and intra-batch dispersion analysis. It is designed to facilitate robust statistical assessment of data variability, supporting applications in exploratory data analysis and quality control for datasets such as those found in metabolomics studies. For more details see Salanon (2024) <doi:10.1016/j.chemolab.2024.105148> and Salanon (2025) <doi:10.1101/2025.08.01.668073>.
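The following is a rough base-R sketch of one such Mahalanobis-based indicator, the mean distance of each batch's samples to its own centroid under the global covariance; it only illustrates the general idea on made-up data and is not one of the package's own functions or its integrated-covariance construction.

set.seed(7)
x     <- matrix(rnorm(60 * 4), ncol = 4)        # toy tabular dataset, one row per sample
batch <- rep(c("A", "B", "C"), each = 20)       # batch label of each sample

S <- cov(x)                                     # global covariance matrix
intra <- tapply(seq_len(nrow(x)), batch, function(idx) {
  centre <- colMeans(x[idx, , drop = FALSE])
  mean(sqrt(mahalanobis(x[idx, , drop = FALSE], centre, S)))
})
intra   # mean Mahalanobis distance of each batch's samples to its centroid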
Runs multiple individual time series models and combines them into an ensemble of time series models. This is mainly used to predict the results of the monthly labor market report from the United States Bureau of Labor Statistics for virtually any part of the economy the Bureau reports on, but it can easily be modified to work with other types of time series data. For example, the package was used to predict the winning men's and women's times for the 2024 London Marathon.
Tests for a comparison of two partially overlapping samples. A comparison of means using the partially overlapping samples t-test: See Derrick, Russ, Toher and White (2017), Test statistics for the comparison of means for two samples which include both paired observations and independent observations, Journal of Modern Applied Statistical Methods, 16(1). A comparison of proportions using the partially overlapping samples z-test: See Derrick, Dobson-Mckittrick, Toher and White (2015), Test statistics for comparing two proportions with partially overlapping samples. Journal of Applied Quantitative Methods, 10(3).
This package contains an implementation of 'StabilizedRegression', a regression framework for heterogeneous data introduced in Pfister et al. (2021) <arXiv:1911.01850>. The procedure uses averaging to estimate a regression of a set of predictors X on a response variable Y by enforcing stability with respect to a given environment variable. The resulting regression leads to a variable selection procedure which makes it possible to distinguish between stable and unstable predictors. The package further implements a visualization technique which illustrates the trade-off between stability and predictiveness of individual predictors.
Provides the kubernetes-like class ManagedCloudProvider as a child class of the CloudProvider class in the DockerParallel package. The class is able to manage cloud instances created by non-kubernetes cloud services. To create a provider for a non-kubernetes cloud service, the developer needs to define a reference class that inherits from ManagedCloudProvider and define methods for the generics runDockerWorkerContainers(), getDockerWorkerStatus() and killDockerWorkerContainers(). For more information, please see the vignette in this package and <https://CRAN.R-project.org/package=DockerParallel>.
If you have a set of genomic ranges, this package can help you with visualization and comparison. It produces several kinds of plots, for example: chromosome distribution plots, which visualize how your regions are distributed over chromosomes; feature distance distribution plots, which visualize how your regions are distributed relative to a feature of interest, like transcription start sites (TSSs); and genomic partition plots, which visualize how your regions overlap given genomic features such as promoters, introns, exons, or intergenic regions. It also makes it easy to compare one set of ranges to another.
transomics2cytoscape generates a file for 3D transomics visualization from input that specifies the IDs of multiple KEGG pathway layers, their corresponding Z-axis heights, and an input that represents the edges between the pathway layers. The edges are used, for example, to describe the relationships between a kinase on one pathway and an enzyme on another pathway. This package automates the creation of a transomics network as shown in the figure in Yugi et al. 2014 (https://doi.org/10.1016/j.celrep.2014.07.021) using Cytoscape automation (https://doi.org/10.1186/s13059-019-1758-4).
The reliability of assessment tools is a crucial aspect of monitoring student performance in various educational settings. It ensures that the assessment outcomes accurately reflect a student's true level of performance. However, when assessments are combined, determining composite reliability can be challenging, especially for naturalistic and unbalanced datasets. This package provides an easy-to-use solution for calculating composite reliability for different assessment types. It allows for the inclusion of a weight per assessment type and produces extensive G- and D-study results with graphical interpretations. Overall, our approach enhances the reliability of composite assessments, making it suitable for various educational contexts.
The log-rank test is performed to assess the survival outcomes between two groups. When there is no proper control group, or obtaining such data is cumbersome, a one-sample log-rank test can be applied. This package performs the one-sample log-rank test as described in Finkelstein et al. (2003) <doi:10.1093/jnci/djt227> and a variation of the test for small sample sizes detailed in the FD Liddell (1984) <doi:10.1136/jech.38.1.85> paper. The visualization function in the package generates a Kaplan-Meier curve comparing the survival curve of the general population against that of the population of interest.
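For illustration only, the sketch below shows the core of the one-sample log-rank comparison of observed and expected events; the reference cumulative hazard is taken here as a constant rate and all values are made up, so this is not the package's implementation.

time   <- c(2.1, 3.5, 0.8, 4.0, 1.2, 5.0, 2.7, 3.3)  # follow-up times (years)
status <- c(1,   0,   1,   1,   0,   1,   1,   0)    # 1 = death observed
lambda0 <- 0.25                                      # reference hazard rate (per year), assumed constant

O <- sum(status)           # observed events
E <- sum(lambda0 * time)   # expected events: sum of reference cumulative hazards at follow-up
chisq <- (O - E)^2 / E     # one-sample log-rank statistic, approximately chi-square(1)
pval  <- pchisq(chisq, df = 1, lower.tail = FALSE)
c(observed = O, expected = E, p.value = pval)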
Routines for performing empirical calibration of observational study estimates. By using a set of negative control hypotheses we can estimate the empirical null distribution of a particular observational study setup. This empirical null distribution can be used to compute a calibrated p-value, which reflects the probability of observing an estimated effect size when the null hypothesis is true, taking both random and systematic error into account. A similar approach can be used to calibrate confidence intervals, using both negative and positive controls. For more details, see Schuemie et al. (2013) <doi:10.1002/sim.5925> and Schuemie et al. (2018) <doi:10.1073/pnas.1708282114>.
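A minimal sketch of the p-value calibration idea is given below; it summarizes the negative-control estimates by their mean and standard deviation, a simplification of the maximum-likelihood fit described in Schuemie et al. (2013), and uses made-up numbers rather than the package's own functions.

set.seed(42)
logRrNegatives <- rnorm(30, mean = 0.1, sd = 0.15)  # log effect sizes of negative controls
mu    <- mean(logRrNegatives)                       # empirical null: mean (systematic bias)
sigma <- sd(logRrNegatives)                         #                 and spread of the bias

logRrNew <- 0.35    # new study estimate on the log scale (made-up value)
seNew    <- 0.10    # its standard error (made-up value)

# Calibrated p-value: probability of the observed estimate under the empirical
# null, combining systematic and random error.
z <- (logRrNew - mu) / sqrt(sigma^2 + seNew^2)
pCalibrated <- 2 * pnorm(-abs(z))
pCalibrated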
The Food and Agriculture Organization-56 (FAO-56) Penman-Monteith method is one of the most important methods for estimating evapotranspiration from vegetated land areas. This package helps to calculate reference evapotranspiration using weather variables collected from a weather station. Evapotranspiration is the process of water transfer from the land surface to the atmosphere through evaporation from soil and other surfaces and transpiration from plants. The package aims to support agricultural, hydrological, and environmental research by offering accurate and accessible reference evapotranspiration calculation. This package has been developed using the concepts of Córdova et al. (2015) <doi:10.1016/j.apm.2022.09.004> and Debnath et al. (2015) <doi:10.1007/s40710-015-0107-1>.
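As an illustration of the underlying calculation, the sketch below implements the daily FAO-56 Penman-Monteith reference evapotranspiration equation in plain R; the function name and argument list are invented for this sketch and do not reflect the package's interface.

et0_fao56 <- function(Tmean,        # mean daily air temperature (deg C)
                      u2,           # wind speed at 2 m height (m/s)
                      Rn,           # net radiation (MJ m-2 day-1)
                      G,            # soil heat flux (MJ m-2 day-1), ~0 for daily steps
                      es,           # saturation vapour pressure (kPa)
                      ea,           # actual vapour pressure (kPa)
                      P = 101.3) {  # atmospheric pressure (kPa)
  delta <- 4098 * (0.6108 * exp(17.27 * Tmean / (Tmean + 237.3))) /
    (Tmean + 237.3)^2               # slope of the vapour pressure curve (kPa / deg C)
  gamma <- 0.000665 * P             # psychrometric constant (kPa / deg C)
  (0.408 * delta * (Rn - G) +
     gamma * (900 / (Tmean + 273)) * u2 * (es - ea)) /
    (delta + gamma * (1 + 0.34 * u2))   # reference ET0 (mm/day)
}

# Example with plausible weather-station values (roughly 4.5 mm/day):
et0_fao56(Tmean = 22, u2 = 2.1, Rn = 13.5, G = 0, es = 2.64, ea = 1.80)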
This package implements the Age Band Decomposition (ABD) method for standardizing tree-ring width data while preserving both low- and high-frequency variability. Unlike traditional detrending approaches that can distort long-term growth trends, ABD decomposes ring-width series into multiple age classes, detrends each class separately, and then recombines them to create standardized chronologies. This approach improves the detection of growth signals linked to past climatic and environmental factors, making it particularly valuable for dendroecological and dendroclimatological studies. The package provides functions to perform ABD-based standardization, compare results with other common methods (e.g., BAI, C method, RCS), and facilitate the interpretation of growth patterns under current and future climate variability.
A study based on the screened selection design (SSD) is an exploratory phase II randomized trial with two or more arms but without a concurrent control. The primary aim of the SSD trial is to pick a desirable treatment arm (e.g., in terms of the response rate) to recommend to the subsequent randomized phase IIb (with a concurrent control) or phase III trial. The proposed designs can "partially" control or provide the empirical type I error/false positive rate by an optimal algorithm (implemented by the optimal_2arm_binary() or optimal_3arm_binary() function) for each arm. All the needed design components (sample size, operating characteristics) are supported.
We focus on the diagnostic ability assessment of medical tests when the outcome of interest is the status (alive or dead) of the subjects at a certain time-point t. This binary status is determined by right-censored times to event and it is unknown for those subjects censored before t. Here we provide three methods (unknown status exclusion, imputation of censored times and using time-dependent ROC curves) to evaluate the diagnostic ability of binary and continuous tests in this context. Two references for the methods used here are Skaltsa et al. (2010) <doi:10.1002/bimj.200900294> and Heagerty et al. (2000) <doi:10.1111/j.0006-341x.2000.00337.x>.
Most estimators implemented by the video game industry can neither obtain reliable initial estimates nor guarantee comparability between distant estimates. TrueSkill Through Time solves these problems by modeling the entire history of activities using a single Bayesian network, allowing the information to propagate correctly throughout the system. This algorithm requires only a few iterations to converge, allowing millions of observations to be analyzed using any low-end computer. Landfried G, Mocskos E (2025). "TrueSkill Through Time: Reliable Initial Skill Estimates and Historical Comparability with Julia, Python, and R." <doi:10.18637/jss.v112.i06>. The core ideas implemented in this project were developed by Dangauthier P, Herbrich R, Minka T, Graepel T (2007). "Trueskill through time: Revisiting the history of chess.".
This package provides functions to load and analyze three open Electronic Health Records (EHRs) datasets of patients diagnosed with glioblastoma, previously released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Users can generate basic descriptive statistics and frequency tables, save descriptive summary tables, and create and export univariate or bivariate plots. The package is designed to work with the included datasets and to facilitate quick exploratory data analysis and reporting. More information about these three datasets of EHRs of patients with glioblastoma can be found in this article: Gabriel Cerono, Ombretta Melaiu, and Davide Chicco, 'Clinical feature ranking based on ensemble machine learning reveals top survival factors for glioblastoma multiforme', Journal of Healthcare Informatics Research 8, 1-18 (March 2024). <doi:10.1007/s41666-023-00138-1>.
Meta testing is the ability to test a function without having to provide its parameter values. Those values are generated, based on the semantic naming of parameters, as introduced by the package 'wyz.code.offensiveProgramming'. The value generation logic can be extended with your own data types and generation schemes, to meet your most specific requirements and to answer a wide variety of usages, from general use cases to very specific ones. With meta testing it becomes easier to generate stress test campaigns, non-regression test campaigns and robustness test campaigns, as generated tests can be saved and reused from session to session. The main benefits of using 'wyz.code.metaTesting' are the ability to discover valid and invalid function parameter combinations, the ability to infer valid parameter values, and smart summaries that allow you to focus on dysfunctional cases.
It helps in determining the sample size for estimating the population mean or proportion under simple random sampling, with or without replacement, and under stratified random sampling without replacement. When prior information on the population coefficient of variation (CV) is unavailable, a preliminary sample is drawn to estimate the CV, which is then used to compute the final sample size. If the final size exceeds the preliminary sample size, additional units are drawn; otherwise, the preliminary sample size is taken as the final sample size. For the stratified random sampling without replacement design, it also calculates the sample size in each stratum under different allocation methods for estimating the population mean and proportion, based on the availability of prior information on the sizes of the strata, the standard deviations of the strata and the costs of drawing a sampling unit in each stratum. For details on sampling methodology, see Cochran (1977) "Sampling Techniques" <https://archive.org/details/samplingtechniqu0000coch_t4x6>.
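A minimal sketch of the Cochran (1977) sample-size formulas behind this logic is shown below; the function names, arguments and defaults are made up for illustration and are not the package's own interface.

# Sample size for estimating a mean with relative margin of error e,
# when the coefficient of variation (CV) is known, under simple random sampling.
n_mean <- function(CV, e, N = Inf, conf = 0.95) {
  z  <- qnorm(1 - (1 - conf) / 2)
  n0 <- (z * CV / e)^2
  if (is.finite(N)) n0 <- n0 / (1 + n0 / N)   # finite population correction
  ceiling(n0)
}

# Sample size for estimating a proportion p with absolute margin of error d.
n_prop <- function(p, d, N = Inf, conf = 0.95) {
  z  <- qnorm(1 - (1 - conf) / 2)
  n0 <- z^2 * p * (1 - p) / d^2
  if (is.finite(N)) n0 <- n0 / (1 + (n0 - 1) / N)
  ceiling(n0)
}

n_mean(CV = 0.3, e = 0.05, N = 5000)   # mean, 5% relative error
n_prop(p = 0.4, d = 0.05, N = 5000)    # proportion, 5 percentage points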
This package provides functions for estimating the gliding box lacunarity (GBL), covariance, and pair-correlation of a random closed set (RACS) in 2D from a binary coverage map (e.g. presence-absence land cover maps). It contains a number of newly developed covariance-based estimators of GBL (Hingee et al., 2019) <doi:10.1007/s13253-019-00351-9> and balanced estimators, proposed by Picka (2000) <http://www.jstor.org/stable/1428408>, for covariance, centred covariance, and pair-correlation. It also contains methods for estimating contagion-like properties of RACS and simulating 2D Boolean models. Binary coverage maps are usually represented as raster images with pixel values of TRUE, FALSE or NA, with NA representing unobserved pixels. A demo for extracting such a binary map from a geospatial data format is provided. Binary maps may also be represented using polygonal sets as the foreground; however, for most computations such maps are converted into raster images. The package is based on research conducted during the author's PhD studies.
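For intuition, the sketch below computes the plain empirical gliding-box lacunarity of a binary matrix, GBL(b) = E[mass^2]/E[mass]^2, where mass is the number of TRUE pixels in a b-by-b box; this naive estimator is only an illustration and differs from the covariance-based and balanced estimators the package provides.

gbl_gliding <- function(xi, b) {
  # xi: logical matrix (TRUE/FALSE/NA); b: box side length in pixels
  nr <- nrow(xi) - b + 1
  nc <- ncol(xi) - b + 1
  mass <- c()
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      box <- xi[i:(i + b - 1), j:(j + b - 1)]
      if (!anyNA(box)) mass <- c(mass, sum(box))  # keep fully observed boxes only
    }
  }
  mean(mass^2) / mean(mass)^2
}

set.seed(1)
xi <- matrix(runif(100 * 100) < 0.3, nrow = 100)  # toy presence-absence map
gbl_gliding(xi, b = 5)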
This package provides functions for generating and visualising networks that characterise the physical attributes and spatial organisation of habitat components (i.e. habitat physical configurations). The network generating algorithm first determines the X and Y coordinates of N nodes within a rectangle with a side length of L and an area of A. It then computes the pair-wise Euclidean distance Dij between nodes i and j and constructs a complete network with 1/Dij as link weights. Next, the algorithm removes links from the complete network with the probability given by the function ahn_prob(). Such link removals can make the network disconnected, whereas a connected network is wanted. In such cases, the algorithm rewires one network component to its spatially nearest neighbouring component and repeats this until the network is connected again. Finally, it outputs an undirected network (weighted or unweighted, connected or disconnected). This package accompanies a manuscript on modelling the physical configurations of animal habitats using networks (in preparation).
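The sketch below illustrates this generating idea with igraph; the node count, landscape size and the constant removal probability standing in for ahn_prob() are arbitrary choices for this illustration, not the package's own functions or defaults.

library(igraph)

set.seed(1)
N  <- 20                                 # number of nodes (habitat components)
L  <- 10                                 # side length of the square landscape
xy <- cbind(x = runif(N, 0, L), y = runif(N, 0, L))

D <- as.matrix(dist(xy))                 # pair-wise Euclidean distances Dij
W <- ifelse(D > 0, 1 / D, 0)             # link weights 1/Dij of the complete network
g <- graph_from_adjacency_matrix(W, mode = "undirected", weighted = TRUE)

# Remove each link independently with some probability (a constant 0.7 here,
# standing in for the distance-dependent probability of ahn_prob()).
drop <- which(runif(ecount(g)) < 0.7)
g <- delete_edges(g, drop)

# Rewire disconnected components to their spatially nearest neighbouring
# component until the network is connected again.
while (components(g)$no > 1) {
  memb <- components(g)$membership
  out  <- which(memb != memb[1])         # nodes outside the first component
  inn  <- which(memb == memb[1])         # nodes inside the first component
  d_oi <- D[out, inn, drop = FALSE]
  idx  <- which(d_oi == min(d_oi), arr.ind = TRUE)[1, ]
  g <- add_edges(g, c(out[idx[1]], inn[idx[2]]), weight = 1 / min(d_oi))
}

plot(g, layout = xy, vertex.size = 5, vertex.label = NA)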
Proximate composition analysis is the quantification of the main components that constitute the nutritional profile of any food and food product, including fish, shellfish, fish feed and their ingredients. Understanding this composition is essential for evaluating their nutritional value and for making informed dietary choices. The primary components typically analyzed include moisture/water, crude protein, crude fat/lipid, total ash, fiber and carbohydrates, AOAC (2005, ISBN:0-935584-77-3). In the case of fish, shellfish and their products, the proximate composition consists of four primary constituents: water, protein, fat, and ash (mostly minerals). Fish exhibit significant variation in their chemical makeup based on age, sex, environment, and season, both within the same species and between individual fish. There is minimal fluctuation in the content of ash and protein. The lipid concentration varies markedly and is inversely correlated with the water content. In fish, carbohydrates are present only in minor quantities, so they are quantified by difference: the total of the other components is subtracted from 100 to obtain the percentage of carbohydrates.
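A tiny worked example of the carbohydrate-by-difference calculation, with made-up percentages (not the package's own functions):

moisture <- 75.2   # % water
protein  <- 18.4   # % crude protein
fat      <- 3.1    # % crude fat / lipid
ash      <- 1.6    # % total ash

carbohydrate <- 100 - (moisture + protein + fat + ash)
carbohydrate   # 1.7 % carbohydrate by difference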
It is often challenging to strongly control the family-wise type-1 error rate in group-sequential trials with multiple endpoints (hypotheses). The inflation of the type-1 error rate comes from two sources: (S1) repeatedly testing an individual hypothesis and (S2) simultaneously testing multiple hypotheses. The MultiGroupSequential package is intended to help researchers tackle this challenge. The procedures provided include the sequential procedures described in Luo and Quan (2023) <doi:10.1080/19466315.2023.2191989> and the graphical procedure proposed by Maurer and Bretz (2013) <doi:10.1080/19466315.2013.807748>. Luo and Quan (2023) describe three procedures, implemented by the following functions: (1) seqgspgx() implements a sequential graphical procedure based on the group-sequential p-values; (2) seqgsphh() implements a sequential Hochberg/Hommel procedure based on the group-sequential p-values; and (3) seqqvalhh() implements a sequential Hochberg/Hommel procedure based on the q-values. In addition, seqmbgx() implements the sequential graphical procedure described in Maurer and Bretz (2013).
The CellScore Standard Dataset contains expression data from a wide variety of human cells and tissues, which should be used as standard cell types in the calculation of the CellScore. All data were curated from public databases such as Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) or ArrayExpress (https://www.ebi.ac.uk/arrayexpress/). This standard dataset only contains data from the Affymetrix GeneChip Human Genome U133 Plus 2.0 microarrays. Samples were manually annotated using the database information or by consulting the publications in which the datasets originated. The sample annotations are stored in the phenoData slot of the ExpressionSet object. Raw data (CEL files) were processed with the affy package to generate present/absent calls (mas5calls) and background-subtracted values, which were then normalized with the R package yugene to yield the final expression values for the standard expression matrix. The annotation table for the microarray was retrieved from the Bioconductor annotation package hgu133plus2. All data are stored in an ExpressionSet object.