The currentSurvival package contains functions for the estimation of the current cumulative incidence (CCI) and the current leukaemia-free survival (CLFS). The CCI is the probability that a patient is alive and in any disease remission (e.g. complete cytogenetic remission in chronic myeloid leukaemia) after initiating his or her therapy (e.g. tyrosine kinase therapy for chronic myeloid leukaemia). The CLFS is the probability that a patient is alive and in any disease remission after achieving the first disease remission.
Test the marginal correlation between a scalar response variable and a vector of explanatory variables using the max-type test with bootstrap. The test is based on the max-type statistic and its asymptotic distribution under the null hypothesis of no marginal correlation. The bootstrap procedure is used to approximate the null distribution of the test statistic. The package provides a function for performing the test. For more technical details, refer to Zhang and Laber (2014) <doi:10.1080/01621459.2015.1106403>.
This package contains a collection of 9 datasets: andrews and bakulski cord blood, blood gse35069, blood gse35069 chen, blood gse35069 complete, combined cord blood, cord blood gse68456, gervin and lyle cord blood, guintivano dlpfc and saliva gse48472. The data are used to estimate cell counts using the extrinsic epigenetic age acceleration (EEAA) method. It also contains a collection of 12 datasets to use with the MethylClock package to estimate chronological and gestational DNA methylation age with estimators for different methylation clocks.
This package provides functions to calculate the requisite sample size for studies where ICC is the primary outcome. It can also be used to calculate power. In both cases it allows the user to test the impact of changing input variables by calculating the outcome for several different values of those variables. Based on the work of Zou: Zou, G. Y. (2012). Sample size formulas for estimating intraclass correlation coefficients with precision and assurance. Statistics in Medicine, 31(29), 3972-3981.
Values different types of assets and calibrates discount curves for quantitative financial analysis. It covers fixed coupon assets, floating note assets, and interest and cross currency swaps with different payment frequencies. Enables the calibration of spot, instantaneous forward and basis curves, making it a powerful tool for accurate and flexible bond valuation and curve generation. The valuation and calibration techniques presented here are consistent with industry standards and incorporate the author's own calculations. Tuckman, B., Serrat, A. (2022, ISBN: 978-1-119-83555-4).
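As a rough illustration of what valuing a fixed coupon asset against a discount curve involves, the following Python sketch discounts each cash flow with a continuously compounded zero rate. It is not the package's API: the function names, the continuous compounding convention, and the flat 3% curve are all assumptions made for the example.

```python
from math import exp


def discount_factor(zero_rate: float, t: float) -> float:
    """Discount factor for time t (years), assuming continuous compounding."""
    return exp(-zero_rate * t)


def fixed_coupon_bond_price(face, coupon_rate, maturity_years, frequency, zero_curve):
    """Present value of a fixed coupon bond.

    zero_curve is a hypothetical callable mapping a time in years to a
    continuously compounded zero rate; a calibrated curve would take its place.
    """
    n_payments = int(round(maturity_years * frequency))
    coupon = face * coupon_rate / frequency
    price = 0.0
    for k in range(1, n_payments + 1):
        t = k / frequency
        # The final payment returns the face value together with the last coupon.
        cash_flow = coupon + (face if k == n_payments else 0.0)
        price += cash_flow * discount_factor(zero_curve(t), t)
    return price


def flat_curve(t: float) -> float:
    """Illustrative flat curve: 3% at every maturity (an assumption)."""
    return 0.03


# 5-year bond, 4% annual coupon paid semi-annually, face value 100.
price = fixed_coupon_bond_price(100.0, 0.04, 5, 2, flat_curve)
print(round(price, 4))
```

Because the coupon rate exceeds the discount rate in this example, the bond prices above its face value; swapping in a different curve function changes only the discount factors.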
This package provides the data that were used in the bedtools tutorial at http://quinlanlab.org/tutorials/bedtools/bedtools.html. It includes a subset of the DnaseI hypersensitivity data from "Maurano et al. Systematic Localization of Common Disease-Associated Variation in Regulatory DNA. Science. 2012. Vol. 337 no. 6099 pp. 1190-1195." The rest of the tracks were originally downloaded from the UCSC table browser. See the HelloRanges vignette for a port of the bedtools tutorial to R.
Volcano plots represent a useful way to visualise the results of differential expression analyses. This package provides a highly-configurable function that produces publication-ready volcano plots. EnhancedVolcano will attempt to fit as many point labels in the plot window as possible, thus avoiding clogging up the plot with labels that could not otherwise have been read. Other functionality allows the user to identify up to 4 different types of attributes in the same plot space via color, shape, size, and shade parameter configurations.
This package provides tools for integrated sensitivity analysis of evidence factors in observational studies. When an observational study allows for multiple independent or nearly independent inferences which, if vulnerable, are vulnerable to different biases, we have multiple evidence factors. This package provides methods that respect type I error rate control. Examples are provided of integrated evidence factors analysis in a longitudinal study with continuous outcome and in a case-control study. Karmakar, B., French, B., and Small, D. S. (2019) <DOI:10.1093/biomet/asz003>.
This package provides tools to extract word frequencies from the CHILDES (Child Language Data Exchange System) corpus. The main function allows users to input a list of words and receive speaker-role-specific frequency counts and a summary of the dataset. The output includes Excel-formatted tables of word counts and metadata summaries such as number of speakers, transcripts, children, and token counts. Useful for researchers studying early language acquisition, corpus linguistics, and speaker role variation. The CHILDES database is maintained at <https://childes.talkbank.org/>.
Estimation and goodness-of-fit functions for copula-based models of bivariate data with arbitrary distributions (discrete, continuous, mixture of both types). The copula families considered here are the Gaussian, Student, Clayton, Frank, Gumbel, Joe, Plackett, BB1, BB6, BB7, and BB8, together with the following non-central squared copula families in Nasri (2020) <doi:10.1016/j.spl.2020.108704>: ncs-gaussian, ncs-clayton, ncs-gumbel, ncs-frank, ncs-joe, and ncs-plackett. For theoretical details, see, e.g., Nasri and Remillard (2023) <arXiv:2301.13408>.
RStudio has recently added the option to define addins and assign shortcuts to them. This package contains addins for a few of the most frequently used functions in a data scientist's (at least my) daily work, such as str(), example(), plot(), head(), view(), and Desc(). Most of these functions use the current selection in the editor window, send the corresponding command to the console, and execute it immediately. Assigning shortcuts to these addins will save you quite a few keystrokes.
This package provides implementations of some of the most important outlier detection algorithms. Includes a tutorial mode option that shows a description of each algorithm and provides a step-by-step execution explanation of how it identifies outliers from the given data with the specified input parameters. References include the works of Azzedine Boukerche, Lining Zheng, and Omar Alfandi (2020) <doi:10.1145/3381028>, Abir Smiti (2020) <doi:10.1016/j.cosrev.2020.100306>, and Xiaogang Su, Chih-Ling Tsai (2011) <doi:10.1002/widm.19>.
CukeModeler facilitates modeling a test suite that is written in Gherkin (e.g. Cucumber, SpecFlow, Lettuce, etc.). It does this by providing an abstraction layer on top of the Abstract Syntax Tree (AST) that cucumber-gherkin generates when parsing features, as well as providing models for feature files and directories in order to be able to have a fully traversable model tree of a test suite's structure. These models can then be analyzed or manipulated more easily than the underlying AST layer.
This package provides the heuristics miner algorithm for process discovery as proposed by Weijters et al. (2011) <doi:10.1109/CIDM.2011.5949453>. The algorithm builds a causal net from an event log created with the bupaR package. Event logs are a set of ordered sequences of events for which bupaR provides the S3 class eventlog(). The discovered causal nets can be visualised as htmlwidgets and it is possible to annotate them with the occurrence frequency or processing and waiting time of process activities.
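To give a feel for how a heuristics miner derives causal relations from an event log, the Python sketch below counts directly-follows pairs and computes the dependency measure commonly given in the heuristics miner literature, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1). It is a minimal illustration and not the package's implementation; the function names and the toy log are assumptions.

```python
from collections import Counter


def directly_follows_counts(traces):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def dependency(counts, a, b):
    """Dependency measure between a and b, in the open interval (-1, 1)."""
    ab = counts[(a, b)]
    ba = counts[(b, a)]
    if a == b:
        # Length-one loops use the self-follow count only.
        return ab / (ab + 1)
    return (ab - ba) / (ab + ba + 1)


# Hypothetical event log: each trace is one ordered sequence of activities.
log = [
    ["register", "check", "pay"],
    ["register", "check", "pay"],
    ["register", "pay", "check"],
]
counts = directly_follows_counts(log)
print(dependency(counts, "register", "check"))  # 2 / 3
print(dependency(counts, "check", "pay"))       # (2 - 1) / (2 + 1 + 1) = 0.25
```

A causal net would then keep only the pairs whose dependency value exceeds a chosen threshold; the threshold itself is a tuning parameter and is not shown here.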
Sample size requirements calculation using three different Bayesian criteria in the context of designing an experiment to estimate a normal mean or the difference between two normal means. Functions for calculation of required sample sizes for the Average Length Criterion, the Average Coverage Criterion and the Worst Outcome Criterion in the context of normal means are provided. Functions for both the fully Bayesian and the mixed Bayesian/likelihood approaches are provided. For reference see Joseph L. and Bélisle P. (1997) <https://www.jstor.org/stable/2988525>.
This library is a collection of pseudo random number generators.
While Common Lisp does provide a RANDOM function, it does not allow the user to pass an explicit SEED, nor to portably exchange the random state between implementations. This can be a headache in cases like games, where a controlled seeding process can be very useful.
For both curiosity and convenience, this library offers multiple algorithms to generate random numbers, as well as a bunch of generally useful methods to produce desired ranges.
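The value of explicit seeding can be sketched with a small generator whose whole state is a single integer supplied by the caller. The Python example below uses the well-known xorshift32 algorithm purely for illustration; it is not taken from this library, and the class and method names are assumptions.

```python
class Xorshift32:
    """Minimal xorshift32 generator with an explicit, portable seed."""

    MASK = 0xFFFFFFFF

    def __init__(self, seed: int):
        state = seed & self.MASK
        if state == 0:
            # An all-zero state would stay zero forever.
            raise ValueError("seed must be non-zero")
        self.state = state

    def next_u32(self) -> int:
        """Advance the state and return the next 32-bit value."""
        x = self.state
        x ^= (x << 13) & self.MASK
        x ^= x >> 17
        x ^= (x << 5) & self.MASK
        self.state = x
        return x

    def random_float(self) -> float:
        """Float in (0, 1), derived from the next 32-bit value."""
        return self.next_u32() / 2**32

    def random_int(self, low: int, high: int) -> int:
        """Integer in [low, high]; simple modulo mapping with slight bias."""
        return low + self.next_u32() % (high - low + 1)


# Two generators with the same seed produce the same sequence.
a = Xorshift32(42)
b = Xorshift32(42)
print([a.random_int(1, 6) for _ in range(5)] == [b.random_int(1, 6) for _ in range(5)])  # True
```

Because the state is just one integer, it can be saved, transmitted, and restored on any implementation, which is the property the built-in random state lacks.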
Efficient simulation-based power and sample size calculations are supported for a broad class of late-stage clinical trials. The following modules are included in the package: Adaptive designs with data-driven sample size or event count re-estimation, Adaptive designs with data-driven treatment selection, Adaptive designs with data-driven population selection, Optimal selection of a futility stopping rule, Event prediction in event-driven trials, Adaptive trials with response-adaptive randomization (experimental module), Traditional trials with multiple objectives (experimental module), and Traditional trials with cluster-randomized designs (experimental module).
This package provides a set of tools and methods for making and manipulating transcript centric annotations. With these tools the user can easily download the genomic locations of the transcripts, exons and cds of a given organism, from either the UCSC Genome Browser or a BioMart database (more sources will be supported in the future). This information is then stored in a local database that keeps track of the relationship between transcripts, exons, cds and genes. Flexible methods are provided for extracting the desired features in a convenient format.
Managing and exploring parameter estimation results derived from Maximum Likelihood Estimation (MLE) using the likelihood package. It provides functions for organizing, visualizing, and summarizing MLE outcomes, streamlining statistical analysis workflows. By improving interpretation and facilitating model evaluation, it helps users gain deeper insights into parameter estimation and model fitting, making MLE result exploration more efficient and accessible. See Goffe et al. (1994) <doi:10.1016/0304-4076(94)90038-8> for details on MLE, and Canham and Uriarte (2006) <doi:10.1890/04-0657> for application of MLE using likelihood.
The multispatial convergent cross mapping algorithm can be used as a test for causal associations between pairs of processes represented by time series. This is a combination of convergent cross mapping (CCM), described in Sugihara et al., 2012, Science, 338, 496-500, and dew-drop regression, described in Hsieh et al., 2008, American Naturalist, 171, 71-80. The algorithm allows CCM to be implemented on data that are not from a single long time series. Instead, data can come from many short time series, which are stitched together using bootstrapping.
This package implements methods for inference on potential waning of vaccine efficacy and for estimation of vaccine efficacy at a user-specified time after vaccination based on data from a randomized, double-blind, placebo-controlled vaccine trial in which participants may be unblinded and placebo subjects may be crossed over to the study vaccine. The methods also allow for variant stratification and for adjustment for possible confounding via inverse probability weighting through specification of models for the trial entry process, unblinding mechanisms, and the probability an unblinded placebo participant accepts study vaccine.
This package provides generic data structures and algorithms for use with forest mensuration data in a consistent framework. The functions and objects included are a collection of broadly applicable tools. More specialized applications should be implemented in separate packages that build on this foundation. Documentation about ForestElementsR is provided by three vignettes included in this package. For an introduction to the field of forest mensuration, refer to the textbooks by Kershaw et al. (2017) <doi:10.1002/9781118902028>, and van Laar and Akca (2007) <doi:10.1007/978-1-4020-5991-9>.
Programs for detecting and cleaning outliers in single time series and in time series from homogeneous and heterogeneous databases using an Orthogonal Greedy Algorithm (OGA) for saturated linear regression models. The programs implement the procedures presented in the paper entitled "Efficient Outlier Detection for Large Time Series Databases" by Pedro Galeano, Daniel Peña and Ruey S. Tsay (2025), working paper, Universidad Carlos III de Madrid. Version 1.0.1 contains some improvements to the algorithm, so the results may vary slightly compared to those obtained with version 0.0.1.
The aggregateBioVar package contains tools to summarize single cell gene expression profiles at the level of subject for single cell RNA-seq data collected from more than one subject (e.g. biological sample or technical replicates). A SingleCellExperiment object is taken as input and converted to a list of SummarizedExperiment objects, where each list element corresponds to an assigned cell type. The SummarizedExperiment objects contain aggregate gene-by-subject count matrices and inter-subject column metadata for individual subjects that can be processed using downstream bulk RNA-seq tools.
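The aggregation step described above can be pictured as summing per-cell counts within each combination of cell type and subject. The Python sketch below shows that idea with plain dictionaries; it does not use or reproduce the package's objects, and the input layout and function name are assumptions made for the example.

```python
from collections import defaultdict


def aggregate_counts(cells):
    """Sum per-cell gene counts into gene-by-subject totals for each cell type.

    cells is a list of dicts with hypothetical keys "subject", "cell_type",
    and "counts" (a mapping from gene name to count for that cell).
    Returns {cell_type: {gene: {subject: total_count}}}.
    """
    totals = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for cell in cells:
        for gene, count in cell["counts"].items():
            totals[cell["cell_type"]][gene][cell["subject"]] += count
    # Convert nested defaultdicts to plain dicts for a stable, printable result.
    return {
        cell_type: {gene: dict(by_subject) for gene, by_subject in genes.items()}
        for cell_type, genes in totals.items()
    }


# Toy data: three T cells from two subjects and one B cell.
cells = [
    {"subject": "s1", "cell_type": "T", "counts": {"g1": 2, "g2": 1}},
    {"subject": "s1", "cell_type": "T", "counts": {"g1": 3}},
    {"subject": "s2", "cell_type": "T", "counts": {"g1": 1}},
    {"subject": "s1", "cell_type": "B", "counts": {"g2": 4}},
]
aggregated = aggregate_counts(cells)
print(aggregated["T"]["g1"])  # {'s1': 5, 's2': 1}
```

Each top-level entry plays the role of one per-cell-type matrix; a subject missing from a gene's dictionary corresponds to a zero in the gene-by-subject matrix.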