Objective: Implement new methods for detecting change points in high-dimensional time series data. These new methods can be applied to non-Gaussian data, account for spatial and temporal dependence, and detect a wide variety of change-point configurations, including changes near the boundary and changes in close proximity. Additionally, this package helps address the 'small n, large p' problem, which occurs in many research contexts. This problem arises when a dataset contains changes that are visually evident but do not rise to the level of statistical significance due to the small number of observations and large number of parameters. The problem is overcome by treating the dimensions as a whole and scaling the test statistic only by its standard deviation, rather than scaling each dimension individually. Due to the computational complexity of the functions, the package runs best on datasets with a relatively large number of attributes but no more than a few hundred observations.
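A minimal base-R sketch of the "treat the dimensions as a whole" idea described above (an illustration of the approach only, not this package's API):

    set.seed(1)
    n <- 30; p <- 200
    X <- matrix(rnorm(n * p), n, p)
    X[16:30, ] <- X[16:30, ] + 0.3                 # small mean shift in all dimensions after t = 15
    cusum <- sapply(2:(n - 1), function(t) {
      d <- colMeans(X[1:t, , drop = FALSE]) - colMeans(X[(t + 1):n, , drop = FALSE])
      sum(d^2) * t * (n - t) / n                   # per-split statistic summed over all p dimensions
    })
    z <- (cusum - mean(cusum)) / sd(cusum)         # scale the aggregated statistic, not each dimension
    which.max(z) + 1                               # candidate change-point location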
The goal of this package is to cover the most common steps in Loss Given Default (LGD) rating model development. The main procedures available are those that refer to bivariate and multivariate analysis. In particular, two statistical methods for multivariate analysis are currently implemented: OLS regression and fractional logistic regression. Both methods are also available within different blockwise model designs, and both have customized stepwise algorithms. Descriptions of these customized designs are available in Siddiqi (2016) <doi:10.1002/9781119282396.ch10> and Anderson, R.A. (2021) <doi:10.1093/oso/9780192844194.001.0001>. Although they are explained for PD models, the same designs are applicable to LGD models with different underlying regression methods (OLS and fractional logistic regression). To cover other important steps of LGD model development, it is recommended to use the LGDtoolkit package along with the 'PDtoolkit' and 'monobin' (or 'monobinShiny') packages. Additionally, LGDtoolkit provides a set of procedures useful for initial and periodical model validation.
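A minimal base-R sketch of the two underlying regression methods named above, fitted to a 0-1 bounded LGD outcome (this illustrates the methods themselves, not LGDtoolkit's interface; the variable names are invented):

    set.seed(1)
    d <- data.frame(ltv = runif(200, 0.3, 1.2), collateral = rbinom(200, 1, 0.5))
    d$lgd <- plogis(-1 + 1.5 * d$ltv - 0.8 * d$collateral + rnorm(200, 0, 0.5))
    frac_fit <- glm(lgd ~ ltv + collateral, family = quasibinomial(link = "logit"), data = d)
    ols_fit  <- lm(lgd ~ ltv + collateral, data = d)   # the OLS counterpart
    summary(frac_fit)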
In some experimental scenarios, each experimental unit receives a sequence of treatments across multiple periods, and treatment effects persist beyond the period of application. This package focuses on the construction, and the calculation of the parametric values, of residual effect designs balanced for carryover effects, also referred to as crossover designs, change-over designs, or repeated measurements designs (Aggarwal and Jha, 2010 <doi:10.1080/15598608.2010.10412013>). The primary objective of the package is to generate a new class of Balanced Ternary Residual Effect Designs (BTREDs), balanced for carryover effects and tailored explicitly for situations where the number of periods is less than or equal to the number of treatments. In addition, the package provides four new classes of Partially Balanced Ternary Residual Effect Designs (PBTREDs), constructed using incomplete block designs, initial sequences, and a rectangular association scheme. Finally, one extra function is included to help study the parametric properties of a given residual effect design.
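A generic base-R sketch (not this package's functions) of what "balanced for carryover effects" means: tabulate how often each treatment is immediately preceded by each other treatment across units.

    seqs <- matrix(c(1, 2, 3,
                     2, 3, 1,
                     3, 1, 2), nrow = 3, byrow = TRUE)   # rows = periods, columns = units (toy design)
    carry <- table(preceded_by = seqs[-nrow(seqs), ], treatment = seqs[-1, ])
    carry   # in a design balanced for carryover, the off-diagonal counts are all equal;
            # this toy cyclic arrangement is not balanced, which the table makes visible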
Hierarchical and partitioning algorithms to cluster blocks of variables. The partitioning algorithm includes an option called noise cluster to set aside atypical blocks of variables. Different thresholds per cluster can be set. The CLUSTATIS method (for quantitative blocks) (Llobell, Cariou, Vigneau, Labenne & Qannari (2020) <doi:10.1016/j.foodqual.2018.05.013>, Llobell, Vigneau & Qannari (2019) <doi:10.1016/j.foodqual.2019.02.017>) and the CLUSCATA method (for Check-All-That-Apply data) (Llobell, Cariou, Vigneau, Labenne & Qannari (2019) <doi:10.1016/j.foodqual.2018.09.006>, Llobell, Giacalone, Labenne & Qannari (2019) <doi:10.1016/j.foodqual.2019.05.017>) are the core of this package. The CATATIS method allows computing indices and tests to control the quality of CATA data. Multivariate analysis and clustering of subjects for quantitative multiblock data, CATA, RATA, Free Sorting and JAR experiments are available. Clustering of rows in a multi-block context (notably with the ClusMB strategy) is also included.
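A generic base-R sketch of the idea of clustering blocks of variables (illustration only, not this package's functions): compute RV coefficients between centred blocks and run hierarchical clustering on 1 - RV.

    set.seed(1)
    blocks <- replicate(4, scale(matrix(rnorm(10 * 3), 10), scale = FALSE), simplify = FALSE)
    rv <- function(X, Y) {                      # RV coefficient between two blocks
      A <- tcrossprod(X); B <- tcrossprod(Y)
      sum(A * B) / sqrt(sum(A * A) * sum(B * B))
    }
    D <- 1 - outer(seq_along(blocks), seq_along(blocks),
                   Vectorize(function(i, j) rv(blocks[[i]], blocks[[j]])))
    hclust(as.dist(D), method = "ward.D2")      # hierarchy of blocks of variables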
HiCcompare provides functions for joint normalization and difference detection in multiple Hi-C datasets. HiCcompare operates on processed Hi-C data in the form of chromosome-specific chromatin interaction matrices. It accepts three-column tab-separated text files storing chromatin interaction matrices in a sparse matrix format, which are available from several sources. HiCcompare is designed to give the user the ability to perform a comparative analysis on the 3-Dimensional structure of the genomes of cells in different biological states. `HiCcompare` differs from other packages that attempt to compare Hi-C data in that it works on processed data in chromatin interaction matrix format instead of pre-processed sequencing data. In addition, `HiCcompare` provides a non-parametric method for the joint normalization and removal of biases between two Hi-C datasets for the purpose of comparative analysis. `HiCcompare` also provides a simple yet robust method for detecting differences between Hi-C datasets.
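A hedged sketch of the basic two-sample workflow; the function names (create.hic.table(), hic_loess(), hic_compare()) and the bundled example matrices are given from recollection of the package and should be checked against its vignette.

    library(HiCcompare)
    data("HMEC.chr22"); data("NHEK.chr22")            # bundled sparse interaction matrices
    tab <- create.hic.table(HMEC.chr22, NHEK.chr22, chr = "chr22")
    tab <- hic_loess(tab)                             # joint loess normalization of the two datasets
    res <- hic_compare(tab)                           # difference detection between the datasets
    head(res)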
Clustering is carried out to identify patterns in transcriptomics profiles and to determine clinically relevant subgroups of patients. Feature (gene) selection is a critical and integral part of the process. Currently, there are many feature selection and clustering methods to identify the relevant genes and perform clustering of samples. However, choosing an appropriate methodology is difficult. In addition, the available packages do not support an extensive set of feature selection methods. Hence, we developed an integrative R package called multiClust that allows researchers to experiment with combinations of gene selection and clustering methods with ease. Using multiClust, we identified the best-performing clustering methodology in the context of clinical outcome. Our observations demonstrate that simple methods such as variance-based ranking perform well on the majority of data sets, provided that the appropriate number of genes is selected. However, different gene ranking and selection methods remain relevant, as no single methodology works for all studies.
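A generic base-R sketch of the variance-based ranking idea mentioned above (not multiClust's interface): rank genes by variance, keep the top set, and cluster the samples.

    set.seed(1)
    expr <- matrix(rnorm(1000 * 20), nrow = 1000,
                   dimnames = list(paste0("gene", 1:1000), paste0("s", 1:20)))
    keep <- order(apply(expr, 1, var), decreasing = TRUE)[1:100]   # 100 most variable genes
    cl   <- cutree(hclust(dist(t(expr[keep, ]))), k = 3)           # sample clusters
    table(cl)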
This package provides functions to work with directed (asymmetric) and undirected (symmetric) spatial networks. It simplifies the creation of connectivity matrices, i.e. binary matrices of dimension n x n, where n is the number of nodes (sampling units), indicating the presence (1) or absence (0) of an edge (link) between pairs of nodes. Different network objects can be produced by 'chessboard': node list, neighbor list, edge list, connectivity matrix. It can also produce objects that will be used later in Moran's Eigenvector Maps (Dray et al. (2006) <doi:10.1016/j.ecolmodel.2006.02.015>) and Asymmetric Eigenvector Maps (Blanchet et al. (2008) <doi:10.1016/j.ecolmodel.2008.04.001>), methods available in the package adespatial (Dray et al. (2023) <https://CRAN.R-project.org/package=adespatial>). This work is part of the FRB-CESAB working group Bridge <https://www.fondationbiodiversite.fr/en/the-frb-in-action/programs-and-projects/le-cesab/bridge/>.
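A generic base-R sketch of what such a connectivity matrix is (illustration only, not 'chessboard' functions): nodes on a 3 x 4 sampling grid are linked (1) when they are rook neighbours, else 0.

    nodes <- expand.grid(transect = 1:3, quadrat = 1:4)
    n <- nrow(nodes)
    con <- outer(1:n, 1:n, Vectorize(function(i, j) {
      as.integer(abs(nodes$transect[i] - nodes$transect[j]) +
                 abs(nodes$quadrat[i]  - nodes$quadrat[j]) == 1)
    }))
    diag(con) <- 0
    con[1:4, 1:4]   # symmetric (undirected) connectivity; zeroing one triangle would make it directed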
Screens daily streamflow time series for temporal trends and change points. This package has been primarily developed for assessing the quality of daily streamflow time series. It also contains tools for plotting and calculating many different streamflow metrics. The package can be used to produce summary screening plots showing change points and significant temporal trends for high flow, low flow, and/or baseflow statistics, or it can be used to perform more detailed hydrological time series analyses. The package was designed for screening daily streamflow time series from Water Survey Canada and the United States Geological Survey but will also work with streamflow time series from many other agencies. Version 2.0 updated the read.flows function to allow loading of GRDC and ROBIN streamflow record formats. This package uses the `changepoint` package for change point detection. For more information on change point methods, see the changepoint package at <https://cran.r-project.org/package=changepoint>.
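A hedged sketch of a typical screening workflow: only read.flows() is named above; the downstream calls (create.ts(), metrics.all(), screen.summary()) and their arguments are assumptions about this package's interface and should be verified in its manual, and the file name is hypothetical.

    flows <- read.flows("05AA008_Daily_Flow_ts.csv")   # hypothetical daily streamflow file
    ts    <- create.ts(flows)
    res   <- metrics.all(ts)                           # high flow, low flow, and baseflow metrics
    screen.summary(res, type = "l")                    # summary plot of low-flow trends and change points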
Addressing crucial research questions often necessitates a small sample size due to factors such as distinctive target populations, rarity of the event under study, time and cost constraints, ethical concerns, or group-level unit of analysis. Many readily available analytic methods, however, do not accommodate small sample sizes, and the choice of the best method can be unclear. The npboottprm package enables the execution of nonparametric bootstrap tests with pooled resampling to help fill this gap. Grounded in the statistical methods for small sample size studies detailed in Dwivedi, Mallawaarachchi, and Alvarado (2017) <doi:10.1002/sim.7263>, the package facilitates a range of statistical tests, encompassing independent t-tests, paired t-tests, and one-way Analysis of Variance (ANOVA) F-tests. The nonparboot() function undertakes essential computations, yielding detailed outputs which include test statistics, effect sizes, confidence intervals, and bootstrap distributions. Further, npboottprm incorporates an interactive shiny web application, nonparboot_app(), offering intuitive, user-friendly data exploration.
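A hedged sketch of the workflow: nonparboot() and nonparboot_app() are named above, but the argument names shown here (data, x, grp, nboot, test) are assumptions, not the documented signature.

    set.seed(1)
    dat <- data.frame(x = c(rnorm(8), rnorm(8, mean = 1)), grp = rep(c("a", "b"), each = 8))
    res <- nonparboot(data = dat, x = "x", grp = "grp", nboot = 1000, test = "t")
    res                    # results include the test statistic, effect size, CI, and bootstrap distribution
    # nonparboot_app()     # launches the interactive shiny application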
Under- and over-dispersed binary data are modeled using an extended Poisson process model (EPPM) appropriate for binary data. A feature of the model is that the under-dispersion relative to the binomial distribution only needs to be greater than zero, but the over-dispersion is restricted compared to other distributional models such as the beta and correlated binomials. Because of this, the examples focus on under-dispersed data and how, in combination with the beta or correlated distributions, flexible models can be fitted to data displaying both under- and over-dispersion. Using Generalized Linear Model (GLM) terminology, the functions utilize linear predictors for the probability of success and the scale-factor, with various link functions for p and a log link for the scale-factor, to fit a variety of models relevant to areas such as bioassay. Details of the EPPM are in Faddy and Smith (2012) <doi:10.1002/bimj.201100214> and Smith and Faddy (2019) <doi:10.18637/jss.v090.i08>.
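A generic base-R check for under-dispersion relative to the binomial, the situation this model targets (illustration only, not this package's interface):

    set.seed(1)
    n_trials <- 10
    y <- rbinom(50, n_trials, 0.4)          # replace with observed success counts out of n_trials
    p_hat <- mean(y) / n_trials
    c(observed_var = var(y), binomial_var = n_trials * p_hat * (1 - p_hat))
    # observed_var < binomial_var suggests under-dispersion relative to the binomial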
Compare functional enrichment between two experimentally-derived groups of genes or proteins (Peterson, D.R., et al. (2018) <doi:10.1371/journal.pone.0198139>). Given a list of gene symbols, diffEnrich will perform differential enrichment analysis using the Kyoto Encyclopedia of Genes and Genomes (KEGG) REST API. This package provides a number of functions that are intended to be used in a pipeline. Briefly, the user provides a KEGG-formatted species id for human, mouse or rat, and the package will download and clean species-specific ENTREZ gene IDs and map them to their respective KEGG pathways by accessing KEGG's REST API. The REST API is used to ensure the most up-to-date pathway data from KEGG. Next, the user will identify significantly enriched pathways from two gene sets, and finally, the user will identify pathways that are differentially enriched between the two gene sets. In addition to the analysis pipeline, this package also provides a plotting function.
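A generic base-R sketch of the enrichment test underlying such pipelines (not diffEnrich's functions): the probability of seeing at least k pathway genes in a list of n genes drawn from a background of N genes containing K pathway members; the counts below are made up.

    N <- 20000; K <- 150      # background genes; genes annotated to the pathway
    n <- 400;   k <- 12       # genes in the user's list; list genes falling in the pathway
    phyper(k - 1, K, N - K, n, lower.tail = FALSE)   # enrichment p-value for one gene set
    # differential enrichment then contrasts such results between the two gene sets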
This package contains all the formulae of the growth and trace element uptake model described in the Geoscientific Model Development paper of the same name (de Winter, 2017, <doi:10.5194/gmd-2017-137>). The model takes as input a file with X- and Y-coordinates of digitized growth increments recognized on a longitudinal cross section through the bivalve shell, as well as a BMP file of an elemental map of the cross-section surface with chemically distinct phases separated by phase analysis. It proceeds by the step-by-step process described in the paper, by which digitized growth increments are used to calculate changes in shell height, shell thickness, shell volume, shell mass and shell growth rate through the bivalve's lifetime. Then, the results of this growth modelling are combined with the trace element mapping results to trace the incorporation of trace elements into the bivalve shell. Results of the various modelling parameters can be exported in the form of XLSX files.
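A tiny generic illustration of the increment-to-growth-rate step (not this package's functions); the digitized increment heights below are hypothetical.

    inc <- data.frame(year = 1:5, height_mm = c(12, 21, 28, 33, 36))
    data.frame(year = inc$year[-1], growth_mm_per_yr = diff(inc$height_mm))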
preciseTAD provides functions to predict the location of boundaries of topologically associated domains (TADs) and chromatin loops at base-level resolution. As input, it takes BED-formatted genomic coordinates of domain boundaries detected from low-resolution Hi-C data, and coordinates of high-resolution genomic annotations from ENCODE or other consortia. preciseTAD employs several feature engineering strategies and resampling techniques to address class imbalance, and trains an optimized random forest model for predicting low-resolution domain boundaries. Translated to base-level resolution, preciseTAD predicts the probability of each base being a boundary. Density-based clustering and scalable partitioning techniques are used to detect precise boundary regions and summit points. Compared with low-resolution boundaries, preciseTAD boundaries are highly enriched for CTCF, RAD21, SMC3, and ZNF143 signal and are more conserved across cell lines. The pre-trained model can accurately predict boundaries in another cell line using CTCF, RAD21, SMC3, and ZNF143 annotation data for that cell line.
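A simplified base-R stand-in for the last step described above (preciseTAD itself uses density-based clustering and partitioning): threshold toy per-base boundary probabilities, group consecutive bases, and report each region's summit.

    set.seed(1)
    prob <- pmin(1, abs(rnorm(200, 0, 0.35)))    # toy per-base boundary probabilities
    hit  <- prob >= 0.5                          # toy probability cutoff
    runs <- rle(hit)
    ends <- cumsum(runs$lengths); starts <- ends - runs$lengths + 1
    regions <- data.frame(start = starts, end = ends)[runs$values, ]
    regions$summit <- mapply(function(s, e) s - 1 + which.max(prob[s:e]), regions$start, regions$end)
    regions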
The power of the non-parametric Mann-Kendall test and Spearman's Rho test is highly influenced by serially correlated data. To address this issue, trend tests may be applied to modified versions of the time series obtained by Block Bootstrapping (BBS), Prewhitening (PW), Trend-Free Prewhitening (TFPW), Bias-Corrected Prewhitening, and the Variance Correction approach based on the effective sample size. Mann, H. B. (1945) <doi:10.1017/CBO9781107415324.004>. Kendall, M. (1975). Multivariate analysis. Charles Griffin & Company Ltd. Sen, P. K. (1968) <doi:10.2307/2285891>. Önöz, B., & Bayazit, M. (2012) <doi:10.1002/hyp.8438>. Hamed, K. H. (2009) <doi:10.1016/j.jhydrol.2009.01.040>. Yue, S., & Wang, C. Y. (2002) <doi:10.1029/2001WR000861>. Yue, S., Pilon, P., Phinney, B., & Cavadias, G. (2002) <doi:10.1002/hyp.1095>. Hamed, K. H., & Ramachandra Rao, A. (1998) <doi:10.1016/S0022-1694(97)00125-X>. Yue, S., & Wang, C. Y. (2004) <doi:10.1023/B:WARM.0000043140.61082.60>.
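For reference, an unmodified Mann-Kendall trend test can be run in base R as Kendall's tau between the series and time (this is the plain test, not the modified variants provided by this package):

    set.seed(1)
    x <- cumsum(rnorm(40, mean = 0.05))            # toy annual series with a weak trend
    cor.test(x, seq_along(x), method = "kendall")  # tau > 0 with a small p-value indicates an upward trend
    # the prewhitening / variance-correction variants above adjust this test for serial correlation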
An application to calculate a patient's pretest probability (PTP) for obstructive Coronary Artery Disease (CAD) from a collection of guidelines or studies. Guidelines usually come from the American Heart Association (AHA), American College of Cardiology (ACC) or European Society of Cardiology (ESC). Examples of PTP scores that come from studies are the 2020 Winther et al. basic, Risk Factor-weighted Clinical Likelihood (RF-CL) and Coronary Artery Calcium Score-weighted Clinical Likelihood (CACS-CL) models <doi:10.1016/j.jacc.2020.09.585>, the 2019 Reeh et al. basic and clinical models <doi:10.1093/eurheartj/ehy806>, and the 2017 Fordyce et al. PROMISE Minimal-Risk Tool <doi:10.1001/jamacardio.2016.5501>. As diagnosis of CAD involves a costly and invasive coronary angiography procedure, having a reliable PTP for CAD helps doctors make better decisions during patient management. This ensures that high-risk patients can be diagnosed and treated early for CAD while unnecessary testing is avoided for low-risk patients.
This package implements several methods to meta-analyze studies that report the sample median of the outcome. The methods described by McGrath et al. (2019) <doi:10.1002/sim.8013>, Ozturk and Balakrishnan (2020) <doi:10.1002/sim.8738>, and McGrath et al. (2020a) <doi:10.1002/bimj.201900036> can be applied to directly meta-analyze the median or difference of medians between groups. Additionally, a number of methods (e.g., McGrath et al. (2020b) <doi:10.1177/0962280219889080>, Cai et al. (2021) <doi:10.1177/09622802211047348>, and McGrath et al. (2023) <doi:10.1177/09622802221139233>) are implemented to estimate study-specific (difference of) means and their standard errors in order to estimate the pooled (difference of) means. Methods for meta-analyzing median survival times (McGrath et al. (2025) <doi:10.48550/arXiv.2503.03065>) are also implemented. See McGrath et al. (2024) <doi:10.1002/jrsm.1686> for a detailed guide on using the package.
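A generic base-R sketch of the kind of conversion these mean-estimation methods build on (illustration only, not this package's functions): approximate a study's mean and SD from its reported quartiles using a Wan-type approximation, with made-up summary values.

    q1 <- 4.2; med <- 6.0; q3 <- 8.1; n <- 58
    mean_hat <- (q1 + med + q3) / 3
    sd_hat   <- (q3 - q1) / (2 * qnorm((0.75 * n - 0.125) / (n + 0.25)))
    c(mean = mean_hat, sd = sd_hat)   # study-level inputs that can then be pooled by standard meta-analysis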
This package performs partial verification bias (PVB) correction for binary diagnostic tests, where PVB arises from selective patient verification in diagnostic accuracy studies. Supports correction of important accuracy measures -- sensitivity, specificity, positive predictive value and negative predictive value -- under missing-at-random and missing-not-at-random missing data mechanisms. Available methods and references are: "Begg and Greenes methods" in Alonzo & Pepe (2005) <doi:10.1111/j.1467-9876.2005.00477.x> and deGroot et al. (2011) <doi:10.1016/j.annepidem.2010.10.004>; "Multiple imputation" in Harel & Zhou (2006) <doi:10.1002/sim.2494>; "EM-based logistic regression" in Kosinski & Barnhart (2003) <doi:10.1111/1541-0420.00019>; "Inverse probability weighting" in Alonzo & Pepe (2005) <doi:10.1111/j.1467-9876.2005.00477.x>; "Inverse probability bootstrap sampling" in Nahorniak et al. (2015) <doi:10.1371/journal.pone.0131765> and Arifin & Yusof (2022) <doi:10.3390/diagnostics12112839>; "Scaled inverse probability resampling methods" in Arifin & Yusof (2025) <doi:10.1371/journal.pone.0321440>.
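A generic base-R sketch of the Begg and Greenes idea (illustration only, not this package's functions): correct sensitivity and specificity when disease status is verified only for some patients and verification depends on the test result (MAR); the counts are made up.

    n_tpos <- 500; n_tneg <- 1500          # all tested patients, by index test result
    p_d_tpos <- 180 / 300                  # P(disease | T+) among the 300 verified test-positives
    p_d_tneg <- 20 / 400                   # P(disease | T-) among the 400 verified test-negatives
    se <- n_tpos * p_d_tpos / (n_tpos * p_d_tpos + n_tneg * p_d_tneg)
    sp <- n_tneg * (1 - p_d_tneg) / (n_tneg * (1 - p_d_tneg) + n_tpos * (1 - p_d_tpos))
    c(sensitivity = se, specificity = sp)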
An algorithm for flexible conditional density estimation based on application of pooled hazard regression to an artificial repeated measures dataset constructed by discretizing the support of the outcome variable. To facilitate flexible estimation of the conditional density, the highly adaptive lasso, a non-parametric regression function shown to estimate cadlag (RCLL) functions at a suitably fast convergence rate, is used. The use of pooled hazards regression for conditional density estimation as implemented here was first described by Díaz and van der Laan (2011) <doi:10.2202/1557-4679.1356>. Building on the conditional density estimation utilities, non-parametric inverse probability weighted (IPW) estimators of the causal effects of additive modified treatment policies are implemented, using conditional density estimation to estimate the generalized propensity score. Non-parametric IPW estimators based on this can be coupled with undersmoothing of the generalized propensity score estimator to attain the semi-parametric efficiency bound (per Hejazi, Díaz, and van der Laan <doi:10.48550/arXiv.2205.05777>).
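A hedged sketch of conditional density (generalized propensity score) estimation; the function and argument names shown (haldensify(), A, W, n_bins, and predict() with new_A/new_W) follow recollection of this package's interface and should be checked against its documentation; the data are simulated.

    library(haldensify)
    set.seed(1)
    n <- 200
    W <- matrix(runif(n * 2), n, 2)                    # baseline covariates
    A <- rnorm(n, mean = W[, 1], sd = 0.5)             # continuous exposure
    fit <- haldensify(A = A, W = W, n_bins = 10)
    gps <- predict(fit, new_A = A, new_W = W)          # estimated g(A | W), the basis of the IPW weights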
The circular genomic permutation approach uses genome-wide association study (GWAS) results to establish the significance of pathway/gene-set associations whilst accounting for genomic structure (Cabrera et al. (2012) <doi:10.1534/g3.112.002618>). All single nucleotide polymorphisms (SNPs) in the GWAS are placed in a circular genome according to their location. Then the complete set of SNP association p-values is permuted by rotation with respect to the SNPs' genomic locations. Two testing frameworks are available: permutations at the gene level, and permutations at the SNP level. The permutation at the gene level uses Fisher's combination test to calculate a single gene p-value, followed by the hypergeometric test. The SNP count methodology maps each SNP to pathways/gene-sets and calculates the proportion of SNPs above a pre-defined threshold for the real and the permuted datasets. Genomicper requires a matrix of GWAS association p-values and SNP annotations to genes. Pathways can be obtained from within the package or can be provided by the user.
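A generic base-R illustration of the circular permutation step (not genomicper's functions): rotate the genome-ordered vector of SNP p-values by a random offset, which preserves the local correlation structure while breaking the SNP-to-pathway assignment.

    set.seed(1)
    p <- runif(1000)                             # SNP p-values ordered by genomic position (toy values)
    k <- sample(length(p) - 1, 1)                # random rotation offset
    p_rot <- c(p[(k + 1):length(p)], p[1:k])     # one circular permutation of the p-values
    head(p_rot)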
This package provides tools for phase-type distributions, including the following variants: continuous, discrete, multivariate, inhomogeneous, right-censored, and regression. Methods for functional evaluation, simulation and estimation using the expectation-maximization (EM) algorithm are provided for all models. The methods of this package are based on the following references: Asmussen, S., Nerman, O., & Olsson, M. (1996), Fitting phase-type distributions via the EM algorithm; Olsson, M. (1996), Estimation of phase-type distributions from censored data; Albrecher, H., & Bladt, M. (2019) <doi:10.1017/jpr.2019.60>; Albrecher, H., Bladt, M., & Yslas, J. (2022) <doi:10.1111/sjos.12505>; Albrecher, H., Bladt, M., Bladt, M., & Yslas, J. (2022) <doi:10.1016/j.insmatheco.2022.08.001>; Bladt, M., & Yslas, J. (2022) <doi:10.1080/03461238.2022.2097019>; Bladt, M. (2022) <doi:10.1017/asb.2021.40>; Bladt, M. (2023) <doi:10.1080/10920277.2023.2167833>; Albrecher, H., Bladt, M., & Mueller, A. (2023) <doi:10.1515/demo-2022-0153>; Bladt, M. & Yslas, J. (2023) <doi:10.1016/j.insmatheco.2023.02.008>.
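A generic base-R sketch of what a continuous phase-type distribution is (illustration only, not this package's interface): simulate absorption times of a Markov jump process with initial probabilities alpha and sub-intensity matrix S.

    set.seed(1)
    alpha <- c(0.7, 0.3)
    S <- matrix(c(-2,  1,
                   0, -3), 2, 2, byrow = TRUE)          # exit (absorption) rates are -rowSums(S)
    rph1 <- function() {
      state <- sample(2, 1, prob = alpha); t <- 0
      repeat {
        rate <- -S[state, state]
        t <- t + rexp(1, rate)
        prob_next <- c(S[state, -state], -sum(S[state, ])) / rate   # move to the other state vs. absorb
        nxt <- sample(c((1:2)[-state], 0), 1, prob = prob_next)
        if (nxt == 0) return(t)
        state <- nxt
      }
    }
    mean(replicate(2000, rph1()))                        # Monte Carlo mean of the phase-type draw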
Ranked Set Sampling (RSS) is a stratified sampling method known for its efficiency compared to Simple Random Sampling (SRS). When sample allocation is equal across strata, it is referred to as balanced RSS (BRSS), whereas unequal allocation is called unbalanced RSS (URSS); the latter is particularly effective for asymmetric or skewed distributions. This package offers practical statistical tools and sampling methods for both BRSS and URSS, emphasizing flexible sampling designs and inference for population means, medians, proportions, and Area Under the Curve (AUC). It incorporates parametric and nonparametric tests, including empirical likelihood ratio (LR) methods. The package provides methods for drawing ranked set samples from a given population, including sampling with imperfect ranking using auxiliary variables. Furthermore, it provides tools for efficient sample allocation in URSS, ensuring greater efficiency than SRS and BRSS. For more details, refer, e.g., to Chen et al. (2003) <doi:10.1007/978-0-387-21664-5>, Ahn et al. (2022) <doi:10.1007/978-3-031-14525-4_3>, and Ahn et al. (2024) <doi:10.1111/insr.12589>.
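A generic base-R sketch of balanced RSS with perfect ranking (illustration only, not this package's functions): for set size m and r cycles, draw m units per set, rank them, and retain the i-th ranked unit from the i-th set of each cycle.

    set.seed(1)
    m <- 3; r <- 4
    rss <- replicate(r, sapply(1:m, function(i) sort(rnorm(m, mean = 10, sd = 2))[i]))
    mean(rss)   # BRSS estimate of the population mean, typically more efficient than an SRS of size m * r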
Given independent and identically distributed observations X(1), ..., X(n), compute the maximum likelihood estimator (MLE) of a density as well as a smoothed version of it under the assumption that the density is log-concave; see Rufibach (2007) and Duembgen and Rufibach (2009). The main function of the package is logConDens, which allows computation of the log-concave MLE and its smoothed version. In addition, we provide functions to (1) compute the value of the density and distribution function estimates (MLE and smoothed) at a given point, (2) compute the characterizing functions of the estimator, (3) sample from the estimated distribution, (4) compute a two-sample permutation test based on log-concave densities, (5) compute the ROC curve based on log-concave estimates within cases and controls, including confidence intervals for given values of false positive fractions, and (6) compute a confidence interval for the value of the true density at a fixed point. Finally, three datasets that have been used to illustrate log-concave density estimation are made available.
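A hedged sketch of the main workflow; logConDens() is named above, while the smoothed argument and the summary/plot methods are given from recollection of the package.

    library(logcondens)
    set.seed(1)
    x <- rgamma(100, shape = 2)            # a log-concave example density
    fit <- logConDens(x, smoothed = TRUE)
    summary(fit)
    plot(fit)                              # MLE and smoothed estimates of the density and distribution function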
The base apply function and its variants, as well as the related functions in the plyr package, typically apply user-defined functions to a single argument (or a list of vectorized arguments in the case of mapply). The multiApply package extends this paradigm with its only function, Apply, which efficiently applies functions taking one or a list of multiple unidimensional or multidimensional arrays (or combinations thereof) as input. The input arrays can have different numbers of dimensions as well as different dimension lengths, and the applied function can return one or a list of unidimensional or multidimensional arrays as output. This saves development time by preventing the R user from writing often error-prone and memory-inefficient loops dealing with multiple complex arrays. Also, a remarkable feature of Apply is the transparent use of multiple cores through its parameter 'ncores'. In contrast to the base apply function, this package suggests the use of target dimensions, as opposed to margins, for specifying the dimensions relevant to the function to be applied.
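A hedged example of Apply(); the interface shown (data, target_dims, fun, ncores) matches the description above, though the argument details and the output component name should be checked against the package manual.

    library(multiApply)
    set.seed(1)
    a <- array(rnorm(2 * 3 * 4), dim = c(member = 2, lat = 3, lon = 4))
    res <- Apply(data = a, target_dims = "member", fun = mean, ncores = 2)
    dim(res$output1)            # the remaining dimensions: lat x lon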
Bundles a number of established statistical methods to facilitate the visual interpretation of large datasets in sedimentary geology. Includes functionality for adaptive kernel density estimation, principal component analysis, correspondence analysis, multidimensional scaling, generalised procrustes analysis and individual differences scaling using a variety of dissimilarity measures. Univariate provenance proxies, such as single-grain ages or (isotopic) compositions are compared with the Kolmogorov-Smirnov, Kuiper, Wasserstein-2 or Sircombe-Hazelton L2 distances. Categorical provenance proxies such as chemical compositions are compared with the Aitchison and Bray-Curtis distances, and count data with the chi-square distance. Varietal data can either be converted to one or more distributional datasets, or directly compared using the multivariate Wasserstein distance. Also included are tools to plot compositional and count data on ternary diagrams and point-counting data on radial plots, to calculate the sample size required for specified levels of statistical precision, and to assess the effects of hydraulic sorting on detrital compositions. Includes an intuitive query-based user interface for users who are not proficient in R.
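A hedged sketch using what is recalled to be the package's bundled Namib example data; the dataset name, the MDS() call and its Kolmogorov-Smirnov dissimilarity default, and the name of the user-interface function are assumptions to verify against the documentation.

    library(provenance)
    data(Namib)
    mds <- MDS(Namib$DZ)        # multidimensional scaling of detrital zircon age distributions
    plot(mds)
    # provenance()              # launches the query-based user interface mentioned above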