We provide a non-parametric and a parametric approach to investigate the equivalence (or non-inferiority) of two survival curves, obtained from two given datasets. The test is based on the creation of confidence intervals at pre-specified time points. For the non-parametric approach, the curves are given by Kaplan-Meier curves and the variance for calculating the confidence intervals is obtained by Greenwood's formula. The parametric approach is based on estimating the underlying distribution, where the user can choose between a Weibull, Exponential, Gaussian, Logistic, Log-normal or a Log-logistic distribution. Estimates for the variance for calculating the confidence bands are obtained by a (parametric) bootstrap approach. For this bootstrap, censoring is assumed to be exponentially distributed and estimates are obtained from the datasets under consideration. All details can be found in K. Moellenhoff and A. Tresch: Survival analysis under non-proportional hazards: investigating non-inferiority or equivalence in time-to-event data <arXiv:2009.06699>.
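The non-parametric ingredients named above (Kaplan-Meier estimate, Greenwood variance, a confidence interval at a pre-specified time point) can be sketched as follows. This is a minimal illustration and not the package's code: the function names, the normal-approximation interval for the difference of the two curves, and the default quantile are all assumptions, and the exact test in the cited paper may differ.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t), Var S(t))] at each distinct event time.
    events[i] is 1 for an observed event and 0 for a censored observation."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    greenwood_sum = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            if n_at_risk > deaths:
                greenwood_sum += deaths / (n_at_risk * (n_at_risk - deaths))
            # Greenwood's formula: Var S(t) = S(t)^2 * sum d / (n (n - d))
            curve.append((t, surv, surv ** 2 * greenwood_sum))
        n_at_risk -= leaving
    return curve

def km_at(curve, t0):
    """Value and variance of the step function at time t0."""
    s, v = 1.0, 0.0
    for t, surv, var in curve:
        if t <= t0:
            s, v = surv, var
    return s, v

def difference_ci(curve_a, curve_b, t0, z=1.645):
    """Normal-approximation interval for S_a(t0) - S_b(t0) (assumed form)."""
    sa, va = km_at(curve_a, t0)
    sb, vb = km_at(curve_b, t0)
    half = z * math.sqrt(va + vb)
    return sa - sb - half, sa - sb + half
```

Under this sketch, equivalence at time t0 with margin delta would be concluded when the whole interval lies inside (-delta, delta).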
The esDesign software is developed to implement the adaptive enrichment designs with sample size re-estimation presented in Lin et al. (2021) <doi:10.1016/j.cct.2020.106216>. In detail, three proposed trial designs are provided: AED1-SSR (or ES1-SSR), AED2-SSR (or ES2-SSR) and AED3-SSR (or ES3-SSR). In addition, this package also contains several widely used adaptive designs, such as the Marker Sequential Test (MaST) design proposed by Freidlin et al. (2014) <doi:10.1177/1740774513503739>, the adaptive enrichment designs without early stopping (AED or ES), the sample size re-estimation procedure (SSR) based on the conditional power proposed by Proschan and Hunsberger (1995), and some useful functions. With these functions, we can calculate the futility and/or efficacy stopping boundaries and the required sample size, calibrate the threshold for the difference between subgroup-specific test statistics, and conduct simulation studies for AED, SSR, AED1-SSR, AED2-SSR and AED3-SSR.
Given independent and identically distributed observations X(1), ..., X(n) from a density f, provides five methods to perform a multiscale analysis about f as well as the necessary critical values. The first method, introduced in Duembgen and Walther (2008), provides simultaneous confidence statements for the existence and location of local increases (or decreases) of f, based on all intervals I(all) spanned by any two observations X(j), X(k). The second method approximates the latter approach by using only a subset of I(all) and is therefore computationally much more efficient, but asymptotically equivalent. Omitting the additive correction term Gamma in either method offers another two approaches, which are more powerful on small scales and less powerful on large scales but are no longer asymptotically minimax optimal. Finally, the block procedure is a compromise between adding Gamma or not, having intermediate power properties. The latter is again asymptotically equivalent to the first and was introduced in Rufibach and Walther (2010).
Detection of overdispersion in count data for multiple regression analysis. Log-linear count data regression is one of the most popular techniques for predictive modeling where there is a non-negative discrete quantitative dependent variable. In order to ensure the inferences from the use of count data models are appropriate, researchers may choose between the estimation of a Poisson model and a negative binomial model, and the correct decision for prediction from a count data estimation is directly linked to the existence of overdispersion of the dependent variable, conditional on the explanatory variables. Based on the studies of Cameron and Trivedi (1990) <doi:10.1016/0304-4076(90)90014-K> and Cameron and Trivedi (2013, ISBN:978-1107667273), the overdisp() command is a contribution to researchers, providing a fast and secure solution for the detection of overdispersion in count data. Another advantage is that the installation of other packages is unnecessary, since the command runs in the basic R language.
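A regression-based overdispersion check in the spirit of Cameron and Trivedi can be sketched as below. This is an illustration only and is not the overdisp() implementation: it assumes the Poisson fitted means mu are already available, uses the auxiliary regression of ((y - mu)^2 - y) / mu on mu without an intercept, and the function name is hypothetical.

```python
import math

def overdispersion_t(y, mu):
    """Auxiliary no-intercept OLS of z = ((y - mu)^2 - y) / mu on mu.
    Returns (alpha_hat, t_statistic); a large positive t suggests overdispersion."""
    z = [((yi - mi) ** 2 - yi) / mi for yi, mi in zip(y, mu)]
    sxx = sum(mi * mi for mi in mu)
    alpha = sum(mi * zi for mi, zi in zip(mu, z)) / sxx
    rss = sum((zi - alpha * mi) ** 2 for mi, zi in zip(mu, z))
    n = len(y)
    se = math.sqrt(rss / (n - 1) / sxx)  # one estimated slope, hence n - 1
    return alpha, alpha / se

# Intercept-only Poisson model: every fitted mean equals the sample mean.
counts = [0, 0, 1, 10, 0, 9, 0, 12]
fitted = [sum(counts) / len(counts)] * len(counts)
alpha_hat, t_stat = overdispersion_t(counts, fitted)
```

Under equidispersion the slope alpha is zero, so the t statistic can be compared against a one-sided critical value.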
Simulate multivariate correlated data given nonparametric marginals and their joint structure characterized by a Pearson or Spearman correlation matrix. The simulator engages the problem from a purely computational perspective. It assumes no statistical models such as copulas or parametric distributions, and can approximate the target correlations regardless of theoretical feasibility. The algorithm integrates and advances the Iman-Conover (1982) approach <doi:10.1080/03610918208812265> and the Ruscio-Kaczetow iteration (2008) <doi:10.1080/00273170802285693>. Package functions are implemented in C++ for computing speed and are suitable for large inputs in a manycore environment. Precision of the approximation and computing speed both substantially outperform various CRAN packages to date. Benchmarks are detailed in function examples. A simple heuristic algorithm is additionally designed to optimize the joint distribution in the post-simulation stage. The heuristic demonstrated good potential for achieving the same level of approximation precision without the enhanced Iman-Conover-Ruscio-Kaczetow algorithm. The package contains a copy of Permuted Congruential Generator.
Simple and efficient access to Yahoo Finance's screener API <https://finance.yahoo.com/research-hub/screener/> for querying and retrieval of financial data. The core functionality abstracts the complexities of interacting with Yahoo Finance APIs, such as session management, crumb and cookie handling, query construction, pagination, and JSON payload generation. This abstraction allows users to focus on filtering and retrieving data rather than managing API details. Use cases include screening across a range of security types including equities, mutual funds, ETFs, indices, and futures. The package supports advanced query capabilities, including logical operators, nested filters, and customizable payloads. It automatically handles pagination to ensure efficient retrieval of large datasets by fetching results in batches of up to 250 entries per request. Filters can be dynamically defined to accommodate a wide range of screening needs. The implementation leverages standard HTTP libraries to handle API interactions efficiently and provides support for both R and Python to ensure accessibility for a broad audience.
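The batched pagination described above can be sketched generically. The snippet does not call any Yahoo Finance endpoint and does not reproduce the package's interface: fetch_page is a hypothetical callable standing in for one request, and only the batch size of 250 is taken from the description.

```python
def fetch_all(fetch_page, batch_size=250, max_results=None):
    """Collect results page by page.
    fetch_page(offset, size) is a hypothetical callable returning a list
    with at most `size` items starting at position `offset`."""
    results = []
    offset = 0
    while True:
        size = batch_size
        if max_results is not None:
            size = min(batch_size, max_results - len(results))
        if size <= 0:
            break
        page = fetch_page(offset, size)
        results.extend(page)
        if len(page) < size:  # a short page means the data is exhausted
            break
        offset += len(page)
    return results

# Usage with an in-memory stand-in for the remote screener.
universe = list(range(620))
rows = fetch_all(lambda offset, size: universe[offset:offset + size])
```

Stopping on a short page avoids needing a total count up front, at the cost of one extra request when the result size is an exact multiple of the batch size.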
Cell surface proteins form a major fraction of the druggable proteome and can be used for tissue-specific delivery of oligonucleotide/cell-based therapeutics. Alternatively spliced surface protein isoforms have been shown to differ in their subcellular localization and/or their transmembrane (TM) topology. Surface proteins are hydrophobic and remain difficult to study thereby necessitating the use of TM topology prediction methods such as TMHMM and Phobius. However, there exists a need for bioinformatic approaches to streamline batch processing of isoforms for comparing and visualizing topologies. To address this gap, we have developed an R package, surfaltr. It pairs inputted isoforms, either known alternatively spliced or novel, with their APPRIS annotated principal counterparts, predicts their TM topologies using TMHMM or Phobius, and generates a customizable graphical output. Further, surfaltr facilitates the prioritization of biologically diverse isoform pairs through the incorporation of three different ranking metrics and through protein alignment functions. Citations for programs mentioned here can be found in the vignette.
This package can do non-parametric bootstrap and permutation resampling-based multiple testing procedures (including empirical Bayes methods) for controlling the family-wise error rate (FWER), generalized family-wise error rate (gFWER), tail probability of the proportion of false positives (TPPFP), and false discovery rate (FDR). Several choices of bootstrap-based null distribution are implemented (centered, centered and scaled, quantile-transformed). Single-step and step-wise methods are available. Tests based on a variety of T- and F-statistics (including T-statistics based on regression parameters from linear and survival models as well as those based on correlation parameters) are included. When probing hypotheses with T-statistics, users may also select a potentially faster null distribution which is multivariate normal with mean zero and variance covariance matrix derived from the vector influence function. Results are reported in terms of adjusted P-values, confidence regions and test statistic cutoffs. The procedures are directly applicable to identifying differentially expressed genes in DNA microarray experiments.
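As a small illustration of resampling-based FWER control, the following sketch computes single-step maxT adjusted p-values from an exhaustive permutation of two-group labels. It is not the package's interface and does not use its bootstrap null distributions; the function names, the Welch t-statistic, and the full enumeration (practical only for small samples) are assumptions.

```python
import itertools
import math

def welch_t(a, b):
    """Two-sample Welch t-statistic; returns 0.0 if the standard error is zero."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / se if se > 0 else 0.0

def maxt_adjusted_p(rows, labels):
    """rows[j] holds the measurements of hypothesis j; labels are 0/1 group codes."""
    n = len(labels)
    k = sum(labels)

    def abs_stats(group1):
        out = []
        for row in rows:
            a = [row[i] for i in range(n) if i in group1]
            b = [row[i] for i in range(n) if i not in group1]
            out.append(abs(welch_t(a, b)))
        return out

    observed = abs_stats({i for i in range(n) if labels[i] == 1})
    counts = [0] * len(rows)
    total = 0
    for combo in itertools.combinations(range(n), k):
        max_t = max(abs_stats(set(combo)))  # null distribution of the maximum
        total += 1
        for j, t_obs in enumerate(observed):
            if max_t >= t_obs:
                counts[j] += 1
    return [c / total for c in counts]
```

Comparing each observed statistic with the permutation distribution of the maximum is what gives family-wise control in this single-step form.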
Offers calculation, visualization and comparison of algorithmic fairness metrics. Fair machine learning is an emerging topic with the overarching aim to critically assess whether ML algorithms reinforce existing social biases. Unfair algorithms can propagate such biases and produce predictions with a disparate impact on various sensitive groups of individuals (defined by sex, gender, ethnicity, religion, income, socioeconomic status, physical or mental disabilities). Fair algorithms possess the underlying foundation that these groups should be treated similarly or have similar prediction outcomes. The fairness R package offers the calculation and comparisons of commonly and less commonly used fairness metrics in population subgroups. These methods are described by Calders and Verwer (2010) <doi:10.1007/s10618-010-0190-x>, Chouldechova (2017) <doi:10.1089/big.2016.0047>, Feldman et al. (2015) <doi:10.1145/2783258.2783311>, Friedler et al. (2018) <doi:10.1145/3287560.3287589> and Zafar et al. (2017) <doi:10.1145/3038912.3052660>. The package also offers convenient visualizations to help understand fairness metrics.
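Two commonly used group metrics, the rate of positive predictions (demographic parity) and the true positive rate (equal opportunity), can be computed per subgroup as sketched below. The function names and return structure are illustrative assumptions and do not mirror the fairness package's API.

```python
def group_rates(y_true, y_pred, group):
    """Per-group positive prediction rate and true positive rate for 0/1 labels."""
    stats = {}
    for yt, yp, g in zip(y_true, y_pred, group):
        s = stats.setdefault(g, {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0})
        s["n"] += 1
        s["pred_pos"] += yp
        s["pos"] += yt
        s["tp"] += 1 if (yt == 1 and yp == 1) else 0
    return {
        g: {
            "positive_rate": s["pred_pos"] / s["n"],
            "tpr": s["tp"] / s["pos"] if s["pos"] else float("nan"),
        }
        for g, s in stats.items()
    }

def parity_ratio(rates, metric, base):
    """Ratio of each group's metric to the base group's metric (1.0 means parity)."""
    return {g: r[metric] / rates[base][metric] for g, r in rates.items()}

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates = group_rates(y_true, y_pred, group)
ratios = parity_ratio(rates, "positive_rate", base="a")
```

In this toy data both groups receive positive predictions at the same rate while their true positive rates differ, which shows why several metrics are usually inspected together.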
The phenomis package provides methods to perform post-processing (i.e. quality control and normalization) as well as univariate statistical analysis of single and multi-omics data sets. These methods include quality control metrics, signal drift and batch effect correction, intensity transformation, univariate hypothesis testing, but also clustering (as well as annotation of metabolomics data). The data are handled in the standard Bioconductor formats (i.e. SummarizedExperiment and MultiAssayExperiment for single and multi-omics datasets, respectively; the alternative ExpressionSet and MultiDataSet formats are also supported for convenience). As a result, all methods can be readily chained as workflows. The pipeline can be further enriched by multivariate analysis and feature selection, by using the ropls and biosigner packages, which support the same formats. Data can be conveniently imported from and exported to text files. Although the methods were initially targeted to metabolomics data, most of the methods can be applied to other types of omics data (e.g., transcriptomics, proteomics).
This package provides a tool for researchers and psychologists to automatically code open-ended responses to the Cognitive Reflection Test (CRT), a widely used class of tests in cognitive science and psychology for assessing an individual's propensity to override an incorrect gut response and engage in further reflection to find a correct answer. This package facilitates the standardization of Cognitive Reflection Test responses analysis across large datasets in cognitive psychology, decision-making, and related fields. By automating the coding process, it not only reduces manual effort but also aims to reduce the variability introduced by subjective interpretation of open-ended responses, contributing to a more consistent and reliable analysis. reflectR supports automatic coding and machine scoring for the original English-language version of CRT (Frederick, 2005) <doi:10.1257/089533005775196732>, as well as for CRT4 and CRT7, 4- and 7-item versions, respectively (Toplak et al., 2014) <doi:10.1080/13546783.2013.844729>, for the CRT-long version built via Item Response Theory by Primi and colleagues (2016) <doi:10.1002/bdm.1883>, and for CRT-2 by Thomson & Oppenheimer (2016) <doi:10.1017/s1930297500007622>. Note: While reflectR draws inspiration from the principles and scientific literature underlying the different versions of the Cognitive Reflection Test, it has been independently developed and does not hold any affiliation with any of the original authors. The development of this package benefited significantly from the kind insight and suggestion provided by Dr. Keela Thomson, whose contribution is gratefully acknowledged. Additional gratitude is extended to Dr. Paolo Giovanni Cicirelli, Prof. Marinella Paciello, Dr. Carmela Sportelli, and Prof. Francesca D'Errico, who not only contributed to the manual multi-rater coding of CRT-2 items but also profoundly influenced the understanding of the importance and practical relevance of cognitive reflection within personality, social, and cognitive psychology research. Acknowledgment is also due to the European project STERHEOTYPES (STudying European Racial Hoaxes and sterEOTYPES) for funding the data collection that produced the datasets initially used for manual multi-rater coding of CRT-2 items.
Build graphs for landscape genetics analysis. This set of functions can be used to import and convert spatial and genetic data initially in different formats, import landscape graphs created with GRAPHAB software (Foltete et al., 2012) <doi:10.1016/j.envsoft.2012.07.002>, make diagnosis plots of isolation by distance relationships in order to choose how to build genetic graphs, create graphs with a large range of pruning methods, weight their links with several genetic distances, plot and analyse graphs, and compare them with other graphs. It uses functions from other packages such as adegenet (Jombart, 2008) <doi:10.1093/bioinformatics/btn129> and igraph (Csardi and Nepusz, 2006) <https://igraph.org/>. It also implements methods commonly used in landscape genetics to create graphs, described by Dyer and Nason (2004) <doi:10.1111/j.1365-294X.2004.02177.x> and Greenbaum and Fefferman (2017) <doi:10.1111/mec.14059>, and to analyse distance data (van Strien et al., 2015) <doi:10.1038/hdy.2014.62>.
This package implements the Additive Logistic Transformation (alr) for Small Area Estimation under the Fay-Herriot model. Small Area Estimation is used to borrow strength from auxiliary variables to improve the effectiveness of a domain sample size. This package uses Empirical Best Linear Unbiased Prediction (EBLUP). The Additive Logistic Transformation (alr) is based on the transformation by Aitchison J (1986). The covariance matrix for the multivariate application is based on the covariance matrix used by Esteban M, Lombardía M, López-Vizcaíno E, Morales D, and Pérez A <doi:10.1007/s11749-019-00688-w>. The non-sampled models are modified area-level models based on models proposed by Anisa R, Kurnia A, and Indahwati I <doi:10.9790/5728-10121519>, with the univariate model using model-3 and the multivariate model using model-1. The MSE is estimated using a parametric bootstrap approach. For non-sampled cases, the MSE is estimated using a modified approach proposed by Haris F and Ubaidillah A <doi:10.4108/eai.2-8-2019.2290339>.
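For orientation, the additive log-ratio transformation maps a composition of D positive parts summing to one onto D - 1 unconstrained values, using the last part as reference, and the additive logistic transformation maps back. The sketch below shows only this transformation pair; the function names are illustrative and the choice of the last component as reference is an assumption about convention.

```python
import math

def alr(p):
    """Additive log-ratio: log(p_i / p_D) for i = 1..D-1 (last part as reference)."""
    ref = p[-1]
    return [math.log(x / ref) for x in p[:-1]]

def alr_inverse(z):
    """Additive logistic transformation: back from D-1 reals to a D-part composition."""
    expz = [math.exp(v) for v in z]
    denom = 1.0 + sum(expz)
    return [e / denom for e in expz] + [1.0 / denom]

proportions = [0.2, 0.3, 0.5]
z = alr(proportions)
back = alr_inverse(z)  # recovers the original proportions up to rounding
```

Modelling on the transformed scale keeps back-transformed estimates positive and summing to one.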
In total it has 7 functions. Three functions calculate machine calibration, determining the application rate (L/ha), the nozzle flow (L/min) and the amount of product (L or kg) to be added to the tank with each sprayer filling. Two functions produce graphs of the flow distribution of the nozzles (L/min) in the application bar and of the temporal variability of the meteorological conditions (air temperature, relative humidity of the air and wind speed). Two functions determine the spray deposit (uL/cm2) through the methodology called spectrophotometry, with the aid of bright blue (Palladini, L.A., Raetano, C.G., Velini, E.D. (2005), <doi:10.1590/S0103-90162005000500005>) or metallic markers (Chaim, A., Castro, V.L.S.S., Correles, F.M., Galvão, J.A.H., Cabral, O.M.R., Nicolella, G. (1999), <doi:10.1590/S0100-204X1999000500003>). The package supports the analysis and representation of information, using a single free software that meets the most diverse areas of activity in application technology.
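The calibration quantities can be related through a standard unit conversion: a nozzle delivering q L/min at a travel speed of v km/h over a spacing of w m covers v * w / 600 ha per minute, so the rate is 600 * q / (v * w) L/ha. The sketch below encodes this relation; the function and parameter names are hypothetical, and the package may parametrize its functions differently.

```python
def application_rate(nozzle_flow_l_min, speed_km_h, nozzle_spacing_m):
    """Application rate in L/ha from nozzle flow, travel speed and nozzle spacing."""
    return 600.0 * nozzle_flow_l_min / (speed_km_h * nozzle_spacing_m)

def nozzle_flow(rate_l_ha, speed_km_h, nozzle_spacing_m):
    """Nozzle flow in L/min needed to reach a target application rate."""
    return rate_l_ha * speed_km_h * nozzle_spacing_m / 600.0

def product_per_tank(tank_volume_l, rate_l_ha, dose_per_ha):
    """Amount of product (L or kg) per tank filling for a dose given per hectare."""
    area_per_tank_ha = tank_volume_l / rate_l_ha
    return area_per_tank_ha * dose_per_ha

rate = application_rate(0.8, 6.0, 0.5)       # L/ha
product = product_per_tank(600.0, rate, 2.0)  # L or kg per tank
```

With 0.8 L/min, 6 km/h and 0.5 m spacing the rate is 160 L/ha, so a 600 L tank covers 3.75 ha and needs 7.5 L of a product dosed at 2 L/ha.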
This package implements the non-iterative conditional expectation (NICE) algorithm of the g-formula algorithm (Robins (1986) <doi:10.1016/0270-0255(86)90088-6>, Hernán and Robins (2024, ISBN:9781420076165)). The g-formula can estimate an outcome's counterfactual mean or risk under hypothetical treatment strategies (interventions) when there is sufficient information on time-varying treatments and confounders. This package can be used for discrete or continuous time-varying treatments and for failure time outcomes or continuous/binary end of follow-up outcomes. The package can handle a random measurement/visit process and a priori knowledge of the data structure, as well as censoring (e.g., by loss to follow-up) and two options for handling competing events for failure time outcomes. Interventions can be flexibly specified, both as interventions on a single treatment or as joint interventions on multiple treatments. See McGrath et al. (2020) <doi:10.1016/j.patter.2020.100008> for a guide on how to use the package.
S4 tool box for capacity (or non-additive measure, fuzzy measure) and integral manipulation in a finite setting. It contains routines for handling various types of set functions such as games or capacities. It can be used to compute several non-additive integrals: the Choquet integral, the Sugeno integral, and the symmetric and asymmetric Choquet integrals. An analysis of capacities in terms of decision behavior can be performed through the computation of various indices such as the Shapley value, the interaction index, the orness degree, etc. The well-known Möbius transform, as well as other equivalent representations of set functions can also be computed. Kappalab further contains seven capacity identification routines: three least squares based approaches, a method based on linear programming, a maximum entropy like method based on variance minimization, a minimum distance approach and an unsupervised approach based on parametric entropies. The functions contained in Kappalab can for instance be used in the framework of multicriteria decision making or cooperative game theory.
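The discrete Choquet integral mentioned above can be illustrated with a few lines. This is a sketch and not Kappalab's interface: the capacity is represented as a plain dictionary keyed by frozenset, the inputs are assumed non-negative, and the function name is hypothetical.

```python
def choquet(x, capacity):
    """Discrete Choquet integral of non-negative scores x (dict: criterion -> value)
    with respect to a capacity (dict: frozenset of criteria -> weight)."""
    order = sorted(x, key=x.get)  # criteria in ascending order of their score
    total = 0.0
    previous = 0.0
    for i, key in enumerate(order):
        upper_set = frozenset(order[i:])  # criteria scoring at least x[key]
        total += (x[key] - previous) * capacity[upper_set]
        previous = x[key]
    return total

# A non-additive capacity on two criteria: mu({a}) + mu({b}) != mu({a, b}).
mu = {
    frozenset(): 0.0,
    frozenset({"a"}): 0.3,
    frozenset({"b"}): 0.5,
    frozenset({"a", "b"}): 1.0,
}
value = choquet({"a": 10.0, "b": 20.0}, mu)
```

Here the result is 10 * mu({a, b}) + (20 - 10) * mu({b}) = 15; with an additive capacity the same routine reduces to a weighted mean.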
This package provides a modeling tool dedicated to biological network modeling (Bertrand and others 2020, <doi:10.1093/bioinformatics/btaa855>). It allows for single or joint modeling of, for instance, genes and proteins. It starts with the selection of the actors that will be used in the upcoming reverse-engineering step. An actor can be included in that selection based on its differential measurement (for instance gene expression or protein abundance) or on its time course profile. Wrappers for actors clustering functions and cluster analysis are provided. It also allows reverse engineering of biological networks taking into account the observed time course patterns of the actors. Many inference functions are provided, dedicated to obtaining specific features of the inferred network such as sparsity, robust links, high-confidence links or links that are stable under resampling. Some simulation and prediction tools are also available for cascade networks (Jung and others 2014, <doi:10.1093/bioinformatics/btt705>). Examples of use with microarray or RNA-Seq data are provided.
Mutational signatures are carcinogenic exposures or aberrant cellular processes that can cause alterations to the genome. We created musicatk (MUtational SIgnature Comprehensive Analysis ToolKit) to address shortcomings in versatility and ease of use in other pre-existing computational tools. Although many different types of mutational data have been generated, current software packages do not have a flexible framework to allow users to mix and match different types of mutations in the mutational signature inference process. Musicatk enables users to count and combine multiple mutation types, including SBS, DBS, and indels. Musicatk calculates replication strand, transcription strand and combinations of these features along with discovery from unique and proprietary genomic features associated with any mutation type. Musicatk also implements several methods for discovery of new signatures as well as methods to infer exposure given an existing set of signatures. Musicatk provides functions for visualization and downstream exploratory analysis including the ability to compare signatures between cohorts and find matching signatures in COSMIC V2 or COSMIC V3.
This package provides functions to develop simulated continuous data (e.g., gene expression) from a sigma covariance matrix derived from a graph structure in igraph objects. Intended to extend mvtnorm to take igraph structures rather than sigma matrices as input. This allows the use of simulated data that correctly accounts for pathway relationships and correlations. Here we present a versatile statistical framework to simulate correlated gene expression data from biological pathways, by sampling from a multivariate normal distribution derived from a graph structure. This package allows the simulation of biological pathways from a graph structure based on a statistical model of gene expression. For example, methods to infer biological pathways and gene regulatory networks from gene expression data can be tested on simulated datasets using this framework. This also allows for pathway structures to be considered as a confounding variable when simulating gene expression data to test the performance of genomic analyses.
Simplifies the process of estimating above ground biomass components for teak trees using a few basic inputs, based on the equations taken from the journal article "Allometric equations for estimating above ground biomass and leaf area of planted teak (Tectona grandis) forests under agroforestry management in East Java, Indonesia" (Purwanto & Shiba, 2006) <doi:10.60409/forestresearch.76.0_1>. This function is most reliable when applied to trees from the same region where the equations were developed, specifically East Java, Indonesia. This function helps to estimate the stem diameter at the lowest major living branch (DB) using the stem diameter at breast height (R^2 = 0.969), the branch dry weight (WB) using the stem diameter at breast height and tree height (R^2 = 0.979), the stem weight (WS) using the stem diameter at breast height and tree height (R^2 = 0.997), and the leaf dry weight (WL) using the stem diameter at the lowest major living branch (R^2 = 0.996).
A Boolean network is a particular kind of discrete dynamical system where the variables are simple binary switches. Despite its simplicity, Boolean network modeling has been a successful method to describe the behavioral pattern of various phenomena. Applying stochastic noise to Boolean networks is a useful approach for representing the effects of various perturbing stimuli on complex systems. A number of methods have been developed to control noise effects on Boolean networks using parameters integrated into the update rules. This package provides functions to examine three such methods: Boolean network with perturbations (BNp), described by Trairatphisan et al. (2013) <doi:10.1186/1478-811X-11-46>, stochastic discrete dynamical systems (SDDS), proposed by Murrugarra et al. (2012) <doi:10.1186/1687-4153-2012-5>, and Boolean network with probabilistic edge weights (PEW), presented by Deritei et al. (2022) <doi:10.1371/journal.pcbi.1010536>. This package includes source code derived from the BoolNet package, which is licensed under the Artistic License 2.0.
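To make the idea of noise in the update rules concrete, the sketch below implements one common formulation of a Boolean network with perturbations: each node flips independently with probability p, and the deterministic rules are applied only when no node is perturbed. This formulation is an assumption made for illustration; the package's implementation of BNp, SDDS and PEW may differ, and all names here are hypothetical.

```python
import random

def bnp_step(state, rules, p, rng):
    """One synchronous step of a Boolean network with perturbations.
    state: tuple of 0/1 values; rules: one function per node mapping the
    whole state to that node's next value; p: per-node flip probability."""
    flips = [rng.random() < p for _ in state]
    if any(flips):
        # At least one perturbation: flip the affected nodes, skip the rules.
        return tuple(s ^ int(f) for s, f in zip(state, flips))
    return tuple(int(rule(state)) for rule in rules)

# Two-node toy network: A copies B, and B is "A and not B".
rules = [lambda s: s[1], lambda s: s[0] and not s[1]]
rng = random.Random(42)
trajectory = [(1, 1)]
for _ in range(10):
    trajectory.append(bnp_step(trajectory[-1], rules, p=0.05, rng=rng))
```

Setting p to 0 recovers the deterministic network, which is a convenient sanity check before studying noisy trajectories.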
Optimizers for the torch deep learning library. These functions include recent results published in the literature and are not part of the optimizers offered in torch. Prospective users should test these optimizers with their data, since performance depends on the specific problem being solved. The package includes the following optimizers: (a) adabelief by Zhuang et al. (2020), <arXiv:2010.07468>; (b) adabound by Luo et al. (2019), <arXiv:1902.09843>; (c) adahessian by Yao et al. (2021), <arXiv:2006.00719>; (d) adamw by Loshchilov & Hutter (2019), <arXiv:1711.05101>; (e) madgrad by Defazio and Jelassi (2021), <arXiv:2101.11075>; (f) nadam by Dozat (2019), <https://openreview.net/pdf/OM0jvwB8jIp57ZJjtNEZ.pdf>; (g) qhadam by Ma and Yarats (2019), <arXiv:1810.06801>; (h) radam by Liu et al. (2019), <arXiv:1908.03265>; (i) swats by Keskar and Socher (2018), <arXiv:1712.07628>; (j) yogi by Zaheer et al. (2019), <https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization>.
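As an example of what one of these update rules looks like, the sketch below writes out a single AdamW step (Adam with decoupled weight decay, following Loshchilov & Hutter) on plain Python lists. It is illustrative only: it does not use torch, the function name and signature are hypothetical, and the default hyperparameters are assumptions.

```python
import math

def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update. theta, grad, m, v are equal-length lists of floats;
    t is the 1-based step counter. Returns the updated (theta, m, v)."""
    new_theta, new_m, new_v = [], [], []
    for th, g, mi, vi in zip(theta, grad, m, v):
        mi = beta1 * mi + (1.0 - beta1) * g        # first-moment estimate
        vi = beta2 * vi + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = mi / (1.0 - beta1 ** t)            # bias corrections
        v_hat = vi / (1.0 - beta2 ** t)
        # Weight decay acts on the parameter directly, not through the gradient.
        th = th - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * th)
        new_theta.append(th)
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v

# Usage: a few steps on f(x) = (x - 3)^2 starting from x = 0.
theta, m, v = [0.0], [0.0], [0.0]
for step in range(1, 201):
    grad = [2.0 * (theta[0] - 3.0)]
    theta, m, v = adamw_step(theta, grad, m, v, step, lr=0.1, weight_decay=0.0)
```

Keeping the decay term outside the adaptive rescaling is the point that distinguishes AdamW from Adam with an L2 penalty added to the loss.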
Datasets from most recent CCIIO DIY entry in a tidy format. These support the Centers for Medicare and Medicaid Services (CMS) risk adjustment Do-It-Yourself (DIY) process, which allows health insurance issuers to calculate member risk profiles under the Health and Human Services-Hierarchical Condition Categories (HHS-HCC) regression model. This regression model is used to calculate risk adjustment transfers. Risk adjustment is a selection mitigation program implemented under the Patient Protection and Affordable Care Act (ACA or Obamacare) in the USA. Under the ACA, health insurance issuers submit claims data to CMS in order for CMS to calculate a risk score under the HHS-HCC regression model. However, CMS does not inform issuers of their average risk score until after the data submission deadline. These data sets can be used by issuers to calculate their average risk score mid-year. More information about risk adjustment and the HHS-HCC model can be found here: <https://www.cms.gov/mmrr/Articles/A2014/MMRR2014_004_03_a03.html>.
Implementation of trigonometric functions to calculate the exposure of flat, tilted surfaces, such as leaves and slopes, to direct solar radiation. It implements the equations in A.G. Escribano-Rocafort, A. Ventre-Lespiaucq, C. Granado-Yela, et al. (2014) <doi:10.1111/2041-210X.12141> in a few user-friendly R functions. All functions handle data obtained with Ahmes 1.0 for Android, as well as more traditional data sources (compass, protractor, inclinometer). The main function (star()) calculates the potential exposure of flat, tilted surfaces to direct solar radiation (silhouette to area ratio, STAR). It is equivalent to the ratio of the leaf projected area to total leaf area, but instead of using area data it uses spatial position angles, such as pitch, roll and course, and information on the geographical coordinates, hour, and date. The package includes additional functions to recalculate STAR with custom settings of location and time, to calculate the tilt angle of a surface, and the minimum angle between two non-orthogonal planes.