Some experimental scenarios require each experimental unit to receive a sequence of treatments across multiple periods, with treatment effects persisting beyond the period of application. This package focuses on the construction, and the calculation of the parametric values, of residual effect designs balanced for carryover effects, also referred to as crossover designs, change-over designs, or repeated measurements designs (Aggarwal and Jha, 2010 <doi:10.1080/15598608.2010.10412013>). The primary objective of the package is to generate a new class of Balanced Ternary Residual Effect Designs (BTREDs), balanced for carryover effects and tailored explicitly to situations where the number of periods is less than or equal to the number of treatments. The package also provides four new classes of Partially Balanced Ternary Residual Effect Designs (PBTREDs), constructed using incomplete block designs, initial sequences, and a rectangular association scheme, along with an additional function for studying the parametric properties of a given residual effect design.
Hierarchical and partitioning algorithms to cluster blocks of variables. The partitioning algorithm includes a noise-cluster option to set aside atypical blocks of variables, and different thresholds can be set per cluster. The CLUSTATIS method (for quantitative blocks) (Llobell, Cariou, Vigneau, Labenne & Qannari (2020) <doi:10.1016/j.foodqual.2018.05.013>; Llobell, Vigneau & Qannari (2019) <doi:10.1016/j.foodqual.2019.02.017>) and the CLUSCATA method (for Check-All-That-Apply data) (Llobell, Cariou, Vigneau, Labenne & Qannari (2019) <doi:10.1016/j.foodqual.2018.09.006>; Llobell, Giacalone, Labenne & Qannari (2019) <doi:10.1016/j.foodqual.2019.05.017>) are the core of this package. The CATATIS method computes indices and tests to control the quality of CATA data. Multivariate analysis and clustering of subjects are available for quantitative multiblock data and for CATA, RATA, Free Sorting, and JAR experiments. Clustering of rows in a multi-block context (notably with the ClusMB strategy) is also included; a hedged usage sketch follows.
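This description appears to correspond to the ClustBlock package; as a minimal sketch under that assumption, the following runs hierarchical CLUSTATIS on synthetic quantitative blocks. The clustatis() name and its Blocks argument follow the package documentation as we recall it, so check ?clustatis before relying on it.
    ## Hedged sketch: CLUSTATIS on 4 synthetic blocks of 3 quantitative variables,
    ## all observed on the same 10 products (argument names are assumptions).
    library(ClustBlock)
    set.seed(1)
    dat <- as.data.frame(matrix(rnorm(10 * 12), nrow = 10))
    res <- clustatis(dat, Blocks = rep(3, 4))  # hierarchical clustering of the 4 blocks
    summary(res)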
This package provides functions to work with directed (asymmetric) and undirected (symmetric) spatial networks. It eases the creation of connectivity matrices, i.e. binary matrices of dimension n x n, where n is the number of nodes (sampling units), indicating the presence (1) or absence (0) of an edge (link) between pairs of nodes. Different network objects can be produced by 'chessboard': node list, neighbor list, edge list, connectivity matrix. It can also produce objects to be used later in Moran's Eigenvector Maps (Dray et al. (2006) <doi:10.1016/j.ecolmodel.2006.02.015>) and Asymmetric Eigenvector Maps (Blanchet et al. (2008) <doi:10.1016/j.ecolmodel.2008.04.001>), methods available in the package adespatial (Dray et al. (2023) <https://CRAN.R-project.org/package=adespatial>). This work is part of the FRB-CESAB working group Bridge <https://www.fondationbiodiversite.fr/en/the-frb-in-action/programs-and-projects/le-cesab/bridge/>.
HiCcompare provides functions for joint normalization and difference detection in multiple Hi-C datasets. HiCcompare operates on processed Hi-C data in the form of chromosome-specific chromatin interaction matrices. It accepts three-column tab-separated text files storing chromatin interaction matrices in a sparse matrix format, which are available from several sources. HiCcompare is designed to give the user the ability to perform a comparative analysis of the three-dimensional structure of the genomes of cells in different biological states. HiCcompare differs from other packages that attempt to compare Hi-C data in that it works on processed data in chromatin interaction matrix format instead of pre-processed sequencing data. In addition, HiCcompare provides a non-parametric method for the joint normalization and removal of biases between two Hi-C datasets for the purpose of comparative analysis. HiCcompare also provides a simple yet robust method for detecting differences between Hi-C datasets.
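A hedged sketch of the two-sample workflow follows; the create.hic.table(), hic_loess(), and hic_compare() calls and the bundled example matrices mirror the package vignette as we recall it, so treat the exact names as assumptions.
    ## Hedged sketch: joint normalization and difference detection for chr22
    ## matrices from two cell lines (names follow the HiCcompare vignette).
    library(HiCcompare)
    data("HMEC.chr22"); data("NHEK.chr22")
    tab <- create.hic.table(HMEC.chr22, NHEK.chr22, chr = "chr22")
    tab <- hic_loess(tab)    # joint loess normalization removes between-dataset bias
    tab <- hic_compare(tab)  # difference detection on the normalized table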
Clustering is carried out to identify patterns in transcriptomics profiles and determine clinically relevant subgroups of patients. Feature (gene) selection is a critical and integral part of this process. Many feature selection and clustering methods exist for identifying relevant genes and clustering samples, but choosing an appropriate methodology is difficult, and the available packages do not support extensive combinations of feature selection methods. Hence, we developed an integrative R package called multiClust that allows researchers to experiment with combinations of gene selection and clustering methods with ease. Using multiClust, we identified the best-performing clustering methodology in the context of clinical outcome. Our observations demonstrate that simple methods such as variance-based ranking perform well on the majority of datasets, provided that the appropriate number of genes is selected. However, different gene ranking and selection methods remain relevant, as no methodology works for all studies.
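The variance-based ranking highlighted above is easy to sketch in base R; this is a conceptual illustration only (not the multiClust API), selecting the top-variance genes and then clustering samples.
    ## Conceptual sketch of variance-based gene ranking followed by sample
    ## clustering (base R only; not the multiClust interface).
    set.seed(1)
    expr <- matrix(rnorm(1000 * 20), nrow = 1000)         # 1000 genes x 20 samples
    top  <- order(apply(expr, 1, var), decreasing = TRUE)[1:100]
    cl   <- hclust(dist(t(expr[top, ])))                  # cluster the samples
    groups <- cutree(cl, k = 2)                           # two candidate subgroups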
Addressing crucial research questions often necessitates a small sample size due to factors such as distinctive target populations, rarity of the event under study, time and cost constraints, ethical concerns, or group-level unit of analysis. Many readily available analytic methods, however, do not accommodate small sample sizes, and the choice of the best method can be unclear. The npboottprm package enables the execution of nonparametric bootstrap tests with pooled resampling to help fill this gap. Grounded in the statistical methods for small sample size studies detailed in Dwivedi, Mallawaarachchi, and Alvarado (2017) <doi:10.1002/sim.7263>, the package facilitates a range of statistical tests, encompassing independent t-tests, paired t-tests, and one-way Analysis of Variance (ANOVA) F-tests. The nonparboot() function undertakes the essential computations, yielding detailed outputs that include test statistics, effect sizes, confidence intervals, and bootstrap distributions. Further, npboottprm incorporates an interactive shiny web application, nonparboot_app(), offering intuitive, user-friendly data exploration.
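A hedged sketch of nonparboot() for an independent-samples comparison follows; the column-name arguments and the 'test' option are assumptions based on the package documentation.
    ## Hedged sketch: bootstrap t-test with pooled resampling on a small sample
    ## (argument and output names are assumptions; see ?nonparboot).
    library(npboottprm)
    dat <- data.frame(x   = c(rnorm(8), rnorm(8, mean = 1)),
                      grp = rep(c("a", "b"), each = 8))
    res <- nonparboot(data = dat, x = "x", grp = "grp", nboot = 1000, test = "t")
    str(res)  # test statistic, effect size, CI, and bootstrap distributions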
Under- and over-dispersed binary data are modeled using an extended Poisson process model (EPPM) appropriate for binary data. A feature of the model is that the under-dispersion relative to the binomial distribution only needs to be greater than zero, whereas the over-dispersion is restricted compared with other distributional models such as the beta and correlated binomials. Because of this, the examples focus on under-dispersed data and show how, in combination with the beta or correlated distributions, flexible models can be fitted to data displaying both under- and over-dispersion. Using Generalized Linear Model (GLM) terminology, the functions utilize linear predictors for the probability of success and the scale factor, with various link functions for p and a log link for the scale factor, to fit a variety of models relevant to areas such as bioassay. Details of the EPPM are in Faddy and Smith (2012) <doi:10.1002/bimj.201100214> and Smith and Faddy (2019) <doi:10.18637/jss.v090.i08>.
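For orientation, a minimal sketch of fitting such a model, assuming the BinaryEPPM() formula interface from Smith and Faddy (2019); both the cbind() response convention and the default arguments here are assumptions and may differ from the released package.
    ## Hedged sketch (assumed interface): a linear predictor for p fitted to
    ## grouped binary data; response format is a GLM-style assumption.
    library(BinaryEPPM)
    dat <- data.frame(success = c(3, 5, 2, 6), trials = rep(10, 4), dose = 1:4)
    fit <- BinaryEPPM(cbind(success, trials - success) ~ dose, data = dat)
    summary(fit)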
Compare functional enrichment between two experimentally derived groups of genes or proteins (Peterson, D.R., et al. (2018) <doi:10.1371/journal.pone.0198139>). Given a list of gene symbols, diffEnrich will perform differential enrichment analysis using the Kyoto Encyclopedia of Genes and Genomes (KEGG) REST API. This package provides a number of functions intended to be used in a pipeline. Briefly, the user provides a KEGG-formatted species id for human, mouse, or rat, and the package downloads and cleans species-specific ENTREZ gene IDs and maps them to their respective KEGG pathways by accessing KEGG's REST API, which guarantees the most up-to-date pathway data from KEGG. Next, the user identifies significantly enriched pathways from two gene sets, and finally identifies pathways that are differentially enriched between the two gene sets. In addition to the analysis pipeline, this package also provides a plotting function.
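A hedged sketch of that pipeline follows; the get_kegg(), pathEnrich(), and diffEnrich() names follow the package README as we recall it, and gene_list_1/gene_list_2 stand in for user-supplied character vectors of gene symbols.
    ## Hedged sketch of the three-step pipeline (function names are assumptions).
    library(diffEnrich)
    kegg <- get_kegg(species = "hsa")                  # pull current KEGG mappings
    pe1  <- pathEnrich(gk_obj = kegg, gene_list = gene_list_1)  # enrichment, set 1
    pe2  <- pathEnrich(gk_obj = kegg, gene_list = gene_list_2)  # enrichment, set 2
    res  <- diffEnrich(list1_pe = pe1, list2_pe = pe2) # differential enrichment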
This package contains all the formulae of the growth and trace element uptake model described in the eponymous Geoscientific Model Development paper (de Winter, 2017, <doi:10.5194/gmd-2017-137>). The model takes as input a file with X and Y coordinates of digitized growth increments recognized on a longitudinal cross section through the bivalve shell, as well as a BMP file of an elemental map of the cross-section surface with chemically distinct phases separated by phase analysis. It proceeds by the step-by-step process described in the paper, by which digitized growth increments are used to calculate changes in shell height, shell thickness, shell volume, shell mass, and shell growth rate through the bivalve's lifetime. The results of this growth modelling are then combined with the trace element mapping results to trace the incorporation of trace elements into the bivalve shell. Results of the various modelling parameters can be exported as XLSX files.
The power of the non-parametric Mann-Kendall test and Spearman's Rho test is highly influenced by serially correlated data. To address this issue, the trend tests may be applied to modified versions of the time series obtained by Block Bootstrapping (BBS), Prewhitening (PW), Trend-Free Prewhitening (TFPW), Bias-Corrected Prewhitening, and the Variance Correction Approach based on the effective sample size. Mann, H. B. (1945) <doi:10.1017/CBO9781107415324.004>; Kendall, M. (1975), Multivariate analysis, Charles Griffin & Company Ltd; Sen, P. K. (1968) <doi:10.2307/2285891>; Önöz, B., & Bayazit, M. (2012) <doi:10.1002/hyp.8438>; Hamed, K. H. (2009) <doi:10.1016/j.jhydrol.2009.01.040>; Yue, S., & Wang, C. Y. (2002) <doi:10.1029/2001WR000861>; Yue, S., Pilon, P., Phinney, B., & Cavadias, G. (2002) <doi:10.1002/hyp.1095>; Hamed, K. H., & Ramachandra Rao, A. (1998) <doi:10.1016/S0022-1694(97)00125-X>; Yue, S., & Wang, C. Y. (2004) <doi:10.1023/B:WARM.0000043140.61082.60>.
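This description appears to belong to the modifiedmk package; under that assumption, a hedged sketch comparing the raw and modified tests on a serially correlated series (the function names are assumptions based on that package's documentation):
    ## Hedged sketch: Mann-Kendall variants on an AR(1) series with a weak trend
    ## (function names are assumptions; see the package index).
    library(modifiedmk)
    set.seed(1)
    x <- as.numeric(arima.sim(list(ar = 0.6), n = 60)) + 0.05 * seq_len(60)
    mkttest(x)  # classical Mann-Kendall trend test
    pwmk(x)     # prewhitening (PW) version
    tfpwmk(x)   # trend-free prewhitening (TFPW) version
    bbsmk(x)    # block-bootstrap (BBS) version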
An application to calculate a patient's pretest probability (PTP) for obstructive Coronary Artery Disease (CAD) from a collection of guidelines or studies. Guidelines usually come from the American Heart Association (AHA), the American College of Cardiology (ACC), or the European Society of Cardiology (ESC). Examples of PTP scores that come from studies are the 2020 Winther et al. basic, Risk Factor-weighted Clinical Likelihood (RF-CL), and Coronary Artery Calcium Score-weighted Clinical Likelihood (CACS-CL) models <doi:10.1016/j.jacc.2020.09.585>, the 2019 Reeh et al. basic and clinical models <doi:10.1093/eurheartj/ehy806>, and the 2017 Fordyce et al. PROMISE Minimal-Risk Tool <doi:10.1001/jamacardio.2016.5501>. As the diagnosis of CAD involves a costly and invasive coronary angiography procedure, a reliable PTP for CAD helps doctors make better decisions during patient management. It ensures that high-risk patients can be diagnosed and treated early for CAD while avoiding unnecessary testing for low-risk patients.
preciseTAD provides functions to predict the location of boundaries of topologically associated domains (TADs) and chromatin loops at base-level resolution. As input, it takes BED-formatted genomic coordinates of domain boundaries detected from low-resolution Hi-C data, and coordinates of high-resolution genomic annotations from ENCODE or other consortia. preciseTAD employs several feature engineering strategies and resampling techniques to address class imbalance, and trains an optimized random forest model for predicting low-resolution domain boundaries. Translated to the base level, preciseTAD predicts the probability of each base being a boundary. Density-based clustering and scalable partitioning techniques are then used to detect precise boundary regions and summit points. Compared with low-resolution boundaries, preciseTAD boundaries are highly enriched for CTCF, RAD21, SMC3, and ZNF143 signal and are more conserved across cell lines. The pre-trained model can accurately predict boundaries in another cell line using that cell line's CTCF, RAD21, SMC3, and ZNF143 annotation data.
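For orientation, a high-level sketch of the prediction step, assuming the preciseTAD() interface from the package vignette; every argument shown is an assumption, and the trained model and annotation GRanges must be prepared beforehand.
    ## Hedged, high-level sketch (assumed interface): predict base-level
    ## boundaries on a chr22 window from a trained model and annotation GRanges.
    library(preciseTAD)
    pt <- preciseTAD(genomicElements.GR = annots,  # CTCF/RAD21/SMC3/ZNF143 GRanges
                     featureType = "distance",
                     CHR = "CHR22",
                     chromCoords = list(18000000, 19000000),
                     tadModel = trained_rf)        # model from the training step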
Implementations of the k-means, hierarchical agglomerative, and DBSCAN clustering methods for functional data, which allow curves to be jointly aligned and clustered. The package supports functional data defined on one-dimensional domains but possibly taking values in multivariate codomains, represented either as arrays or via the fd and funData classes from the fda and funData packages, respectively. It currently supports shift, dilation, and affine warping functions for functional data defined on the real line, and uses the SRVF framework to handle boundary-preserving warping for functional data defined on a specific interval. Main reference for the k-means algorithm: Sangalli L.M., Secchi P., Vantini S., Vitelli V. (2010) "k-mean alignment for curve clustering" <doi:10.1016/j.csda.2009.12.008>. Main reference for the SRVF framework: Tucker, J. D., Wu, W., & Srivastava, A. (2013) "Generative models for functional data using phase and amplitude separation" <doi:10.1016/j.csda.2012.12.001>.
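A hedged sketch of k-means alignment-and-clustering follows; the fdakmeans() name, its x/y argument convention, and the warping_class option are assumptions based on the fdacluster documentation.
    ## Hedged sketch: jointly align and cluster 20 shifted sine curves sampled
    ## on a common grid (argument names are assumptions; see ?fdakmeans).
    library(fdacluster)
    set.seed(1)
    grid <- seq(0, 1, length.out = 50)
    y <- t(sapply(1:20, function(i) sin(2 * pi * (grid - runif(1, -0.1, 0.1)))))
    res <- fdakmeans(x = grid, y = y, n_clusters = 2, warping_class = "shift")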
This package implements several methods to meta-analyze studies that report the sample median of the outcome. The methods described by McGrath et al. (2019) <doi:10.1002/sim.8013>, Ozturk and Balakrishnan (2020) <doi:10.1002/sim.8738>, and McGrath et al. (2020a) <doi:10.1002/bimj.201900036> can be applied to directly meta-analyze the median or the difference of medians between groups. Additionally, a number of methods (e.g., McGrath et al. (2020b) <doi:10.1177/0962280219889080>, Cai et al. (2021) <doi:10.1177/09622802211047348>, and McGrath et al. (2023) <doi:10.1177/09622802221139233>) are implemented to estimate study-specific (differences of) means and their standard errors in order to estimate the pooled (difference of) means. Methods for meta-analyzing median survival times (McGrath et al. (2025) <doi:10.48550/arXiv.2503.03065>) are also implemented. See McGrath et al. (2024) <doi:10.1002/jrsm.1686> for a detailed guide on using the package.
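A hedged sketch using the wrapper described in the 2024 guide; the metamedian() name, the median_method option, and the per-study column conventions (five-number summary and sample size for one group) are assumptions based on that paper.
    ## Hedged sketch (assumed interface): pool medians across 3 studies that
    ## report five-number summaries for a single group.
    library(metamedian)
    dat <- data.frame(min.g1 = c(1, 2, 1),   q1.g1 = c(3, 4, 3),
                      med.g1 = c(5, 6, 5),   q3.g1 = c(7, 8, 8),
                      max.g1 = c(10, 11, 9), n.g1  = c(40, 50, 45))
    res <- metamedian(data = dat, median_method = "mm")  # median-based pooling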
The goal of checkpoint is to solve the problem of package reproducibility in R. Specifically, checkpoint allows you to install packages as they existed on CRAN on a specific snapshot date, as if you had a CRAN time machine. To achieve reproducibility, the checkpoint() function installs the packages required or called by your project and scripts to a local library exactly as they existed at the specified point in time. Only those packages are available to your project, thereby avoiding any package updates that came later and may have altered your results. In this way, anyone using checkpoint's checkpoint() function can ensure the reproducibility of your scripts or projects at any time. To create the snapshot archives, once a day (at midnight UTC) Microsoft refreshes the Austria CRAN mirror on the "Microsoft R Archived Network" server (<https://mran.microsoft.com/>). Immediately after completion of the rsync mirror process, the process takes a snapshot, thus creating the archive. Snapshot archives exist starting from 2014-09-17.
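In practice, a script pins its package library with a single call at the top; the snapshot date below is illustrative.
    ## Pin this project's packages to CRAN as of a given snapshot date.
    library(checkpoint)
    checkpoint("2018-01-01")  # installs/uses packages exactly as on that date
    ## ... the rest of the script now runs against those package versions.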
The circular genomic permutation approach uses genome-wide association study (GWAS) results to establish the significance of pathway/gene-set associations whilst accounting for genomic structure (Cabrera et al. (2012) <doi:10.1534/g3.112.002618>). All single nucleotide polymorphisms (SNPs) in the GWAS are placed in a circular genome according to their location. The complete set of SNP association p-values is then permuted by rotation with respect to the SNPs' genomic locations. Two testing frameworks are available: permutations at the gene level, and permutations at the SNP level. Permutation at the gene level uses Fisher's combination test to calculate a single gene p-value, followed by the hypergeometric test. The SNP count methodology maps each SNP to pathways/gene-sets and calculates the proportion of SNPs above a pre-defined threshold for the real and the permuted datasets. Genomicper requires a matrix of GWAS association p-values and SNP annotations to genes. Pathways can be obtained from within the package or provided by the user.
This package provides tools for phase-type distributions, including the following variants: continuous, discrete, multivariate, inhomogeneous, right-censored, and regression. Methods for functional evaluation, simulation, and estimation using the expectation-maximization (EM) algorithm are provided for all models. The methods of this package are based on the following references: Asmussen, S., Nerman, O., & Olsson, M. (1996), Fitting phase-type distributions via the EM algorithm; Olsson, M. (1996), Estimation of phase-type distributions from censored data; Albrecher, H., & Bladt, M. (2019) <doi:10.1017/jpr.2019.60>; Albrecher, H., Bladt, M., & Yslas, J. (2022) <doi:10.1111/sjos.12505>; Albrecher, H., Bladt, M., Bladt, M., & Yslas, J. (2022) <doi:10.1016/j.insmatheco.2022.08.001>; Bladt, M., & Yslas, J. (2022) <doi:10.1080/03461238.2022.2097019>; Bladt, M. (2022) <doi:10.1017/asb.2021.40>; Bladt, M. (2023) <doi:10.1080/10920277.2023.2167833>; Albrecher, H., Bladt, M., & Mueller, A. (2023) <doi:10.1515/demo-2022-0153>; Bladt, M. & Yslas, J. (2023) <doi:10.1016/j.insmatheco.2023.02.008>.
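This description appears to match the matrixdist package; under that assumption, a hedged sketch of constructing a phase-type distribution and fitting it by EM (the ph() constructor, fit() generic, and stepsEM argument are assumptions and may differ):
    ## Hedged sketch (assumed interface): a 3-phase distribution fitted to data
    ## via the EM algorithm.
    library(matrixdist)
    x <- ph(structure = "general", dimension = 3)  # random initial parameters
    y <- rexp(500, rate = 0.5)                     # data to be fitted
    f <- fit(x, y, stepsEM = 200)                  # EM estimation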
Ranked Set Sampling (RSS) is a stratified sampling method known for its efficiency compared to Simple Random Sampling (SRS). When sample allocation is equal across strata, it is referred to as balanced RSS (BRSS), whereas unequal allocation is called unbalanced RSS (URSS), which is particularly effective for asymmetric or skewed distributions. This package offers practical statistical tools and sampling methods for both BRSS and URSS, emphasizing flexible sampling designs and inference for population means, medians, proportions, and the Area Under the Curve (AUC). It incorporates parametric and nonparametric tests, including empirical likelihood ratio (LR) methods. The package provides ranked set sampling methods from a given population, including sampling with imperfect ranking using auxiliary variables. Furthermore, it provides tools for efficient sample allocation in URSS, ensuring greater efficiency than SRS and BRSS. For more details, see, e.g., Chen et al. (2003) <doi:10.1007/978-0-387-21664-5>, Ahn et al. (2022) <doi:10.1007/978-3-031-14525-4_3>, and Ahn et al. (2024) <doi:10.1111/insr.12589>.
Given independent and identically distributed observations X(1), ..., X(n), compute the maximum likelihood estimator (MLE) of a density as well as a smoothed version of it under the assumption that the density is log-concave; see Rufibach (2007) and Duembgen and Rufibach (2009). The main function of the package is logConDens, which computes the log-concave MLE and its smoothed version. In addition, we provide functions to compute (1) the value of the density and distribution function estimates (MLE and smoothed) at a given point, (2) the characterizing functions of the estimator, (3) samples from the estimated distribution, (4) a two-sample permutation test based on log-concave densities, (5) the ROC curve based on log-concave estimates within cases and controls, including confidence intervals for given values of false positive fractions, and (6) a confidence interval for the value of the true density at a fixed point. Finally, three datasets that have been used to illustrate log-concave density estimation are made available.
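A short usage sketch of the main function; the smoothed and print arguments follow the package documentation.
    ## Estimate a log-concave density and its smoothed version from i.i.d. data.
    library(logcondens)
    set.seed(1)
    x <- rgamma(100, shape = 2)
    res <- logConDens(x, smoothed = TRUE, print = FALSE)
    summary(res); plot(res)  # MLE and smoothed estimates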
The base apply function and its variants, as well as the related functions in the plyr package, typically apply user-defined functions to a single argument (or a list of vectorized arguments in the case of mapply). The multiApply package extends this paradigm with its only function, Apply, which efficiently applies functions taking one or a list of multiple unidimensional or multidimensional arrays (or combinations thereof) as input. The input arrays can have different numbers of dimensions as well as different dimension lengths, and the applied function can return one or a list of unidimensional or multidimensional arrays as output. This saves development time by preventing the R user from writing often error-prone and memory-inefficient loops dealing with multiple complex arrays. A further notable feature of Apply is the transparent use of multiple cores through its 'ncores' parameter. In contrast to the base apply function, this package encourages specifying the dimensions relevant to the applied function via target dimensions, as opposed to margins.
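For example, a minimal sketch applying mean over the time dimension of a named-dimension array; the list-result convention (res$output1) follows the package documentation as we recall it.
    ## Apply mean over the 'time' dimension of a 3-dimensional array; the
    ## result keeps the remaining dimensions (member x grid).
    library(multiApply)
    a <- array(rnorm(2 * 5 * 4), dim = c(member = 2, time = 5, grid = 4))
    res <- Apply(data = list(a), target_dims = "time", fun = mean, ncores = 2)
    dim(res$output1)  # 2 x 4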
Bundles a number of established statistical methods to facilitate the visual interpretation of large datasets in sedimentary geology. Includes functionality for adaptive kernel density estimation, principal component analysis, correspondence analysis, multidimensional scaling, generalised Procrustes analysis, and individual differences scaling using a variety of dissimilarity measures. Univariate provenance proxies, such as single-grain ages or (isotopic) compositions, are compared with the Kolmogorov-Smirnov, Kuiper, Wasserstein-2, or Sircombe-Hazelton L2 distances. Categorical provenance proxies such as chemical compositions are compared with the Aitchison and Bray-Curtis distances, and count data with the chi-square distance. Varietal data can either be converted to one or more distributional datasets, or directly compared using the multivariate Wasserstein distance. Also included are tools to plot compositional and count data on ternary diagrams and point-counting data on radial plots, to calculate the sample size required for specified levels of statistical precision, and to assess the effects of hydraulic sorting on detrital compositions. Includes an intuitive query-based user interface for users who are not proficient in R.
Facilitates the import and analysis of SNP (single nucleotide polymorphism) and silicodart (presence/absence) data. The main focus is on data generated by DArT (Diversity Arrays Technology); however, data from other sequencing platforms can be used once SNP or related fragment presence/absence data from any source is imported. Genetic datasets are stored in a derived genlight format (package adegenet) that allows for very compact storage of data and metadata. Functions are available for importing and exporting SNP and silicodart data, for reporting on and filtering by various criteria (e.g. call rate, heterozygosity, reproducibility, maximum allele frequency). Additional functions are available for visualization (e.g. Principal Coordinate Analysis) and for creating a spatial representation using maps. dartR.base is the base package of the dartRverse suite of packages. To install the other packages, we recommend installing the dartRverse package, which supports the installation of all packages in the dartRverse. If you want to cite dartR, you can find the information by typing citation('dartR.base') in the console.
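A hedged sketch of a typical import-and-filter step; the gl.read.dart(), gl.filter.callrate(), and gl.report.heterozygosity() names follow the dartR documentation as we recall it, and the file names are placeholders.
    ## Hedged sketch: import a DArT SNP report into a genlight object and
    ## filter loci on call rate (function names are assumptions).
    library(dartR.base)
    gl <- gl.read.dart(filename = "snp_report.csv", ind.metafile = "metadata.csv")
    gl <- gl.filter.callrate(gl, method = "loc", threshold = 0.95)
    gl.report.heterozygosity(gl)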
Recent gcc and clang compiler versions provide functionality to test for memory violations and other undefined behaviour; this is often referred to as "Address Sanitizer" ('ASAN') and "Undefined Behaviour Sanitizer" ('UBSAN'). The Writing R Extensions manual describes this in some detail in Section 4.3, titled "Checking Memory Access". This feature has to be enabled in the corresponding binary, e.g. in R itself, which is somewhat involved as it also requires a current compiler toolchain that is not yet widely available, or, in the case of Windows, not available at all (via the common Rtools mechanism). As an alternative, pre-built Docker containers such as the Rocker container r-devel-san or the multi-purpose container r-debug can be used. This package then provides a means of testing the compiler setup, as the known code failures provided in its sample code should be detected correctly, whereas a default build of R will let the package pass. The code samples are based on the examples from the Address Sanitizer Wiki at <https://github.com/google/sanitizers/wiki>.
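For orientation, a hedged sketch of how such a test package is exercised; the stackAddressSanitize() name is an assumption about this package's sample code.
    ## Hedged sketch: under an ASAN-enabled build of R this call should trigger
    ## a sanitizer report; under a default build it returns quietly.
    ## (The function name is an assumption about the package's sample code.)
    library(sanitizers)
    stackAddressSanitize(42)  # deliberately reads past a stack array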
This package provides methods for analyzing (cell) motion in two or three dimensions. Available measures include displacement, confinement ratio, autocorrelation, straightness, turning angle, and fractal dimension. Measures can be applied to entire tracks, steps, or subtracks of varying length. While the methodology has been developed for cell trajectory analysis, it is applicable to anything that moves, including animals, people, or vehicles. Some of the methodology implemented in this package was described by: Beauchemin, Dixit, and Perelson (2007) <doi:10.4049/jimmunol.178.9.5505>, Beltman, Maree, and de Boer (2009) <doi:10.1038/nri2638>, Gneiting and Schlather (2004) <doi:10.1137/S0036144501394387>, Mokhtari, Mech, Zitzmann, Hasenberg, Gunzer, and Figge (2013) <doi:10.1371/journal.pone.0080808>, Moreau, Lemaitre, Terriac, Azar, Piel, Lennon-Dumenil, and Bousso (2012) <doi:10.1016/j.immuni.2012.05.014>, Textor, Peixoto, Henrickson, Sinn, von Andrian, and Westermann (2011) <doi:10.1073/pnas.1102288108>, Textor, Sinn, and de Boer (2013) <doi:10.1186/1471-2105-14-S6-S10>, Textor, Henrickson, Mandl, von Andrian, Westermann, de Boer, and Beltman (2014) <doi:10.1371/journal.pcbi.1003752>.
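This description appears to match the celltrackR package; under that assumption, a hedged sketch computing a few of these measures on the bundled T-cell tracks (function and dataset names follow the celltrackR documentation as we recall it):
    ## Hedged sketch: per-track measures for the bundled T-cell dataset
    ## (names are assumptions; see the package vignette).
    library(celltrackR)
    data(TCells)
    sapply(TCells, speed)         # mean speed of each track
    sapply(TCells, straightness)  # confinement/straightness ratio
    mean(sapply(subtracks(TCells, 2), overallAngle), na.rm = TRUE)  # turning angle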