Microarray probe IDs are not convenient for downstream enrichment analysis and target gene selection, so this package converts rice microarray probe IDs to RAP-DB IDs. It supports probe IDs from the GPL6864 <https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL6864>, GPL8852 <https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL8852>, and GPL2025 <https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL2025> platforms. RAP-DB, "The Rice Annotation Project Database" <https://rapdb.dna.affrc.go.jp>, is a well-known database for rice (Oryza sativa), and its gene IDs are widely used across many areas of rice research. When multiple probes represent a single gene, this package can merge them by taking the mean, maximum, or minimum value of those probes, or it can keep every probe by appending sequence numbers to the duplicated RAP-DB IDs.
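The merging behaviour described above can be sketched in a few lines of base R; the probe and gene IDs below are illustrative, and this is not the package's own API:

    # Toy expression values for three probes; two map to the same RAP-DB ID
    expr <- c(p1 = 2.0, p2 = 4.0, p3 = 7.5)
    rap  <- c(p1 = "Os01g0100100", p2 = "Os01g0100100", p3 = "Os01g0100200")

    # Merge duplicate probes by taking their mean (max or min work the same way)
    merged <- tapply(expr, rap, mean)

    # Alternatively, keep every probe by appending sequence numbers
    kept <- setNames(expr, make.unique(rap, sep = "."))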
A small set of functions for fast computation of some matrices and operations useful in statistics and econometrics. Currently, there are functions for efficient computation of duplication, commutation, and symmetrizer matrices with minimal storage requirements. Some commonly used matrix decompositions (LU and LDL), basic matrix operations (for instance, Hadamard and Kronecker products and the Sherman-Morrison formula), and iterative solvers for linear systems are also available. In addition, the package includes a number of common statistical procedures such as the sweep operator, weighted mean and covariance matrix using an online algorithm, linear regression (using Cholesky, QR, SVD, sweep operator, and conjugate gradients methods), ridge regression (with optimal selection of the ridge parameter considering several procedures), omnibus tests for univariate normality, functions to compute the multivariate skewness, kurtosis, and the Mahalanobis distance (checking positive definiteness), and the Wilson-Hilferty transformation of gamma variables. Furthermore, the package provides C-level interfaces that can be called from the C code of other R packages.
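As an illustration of one of the basic operations mentioned above, the Sherman-Morrison formula for a rank-one update of an inverse can be verified directly in base R (a sketch, independent of the package's own routines):

    # Sherman-Morrison: (A + uv')^{-1} = A^{-1} - A^{-1}uv'A^{-1} / (1 + v'A^{-1}u)
    set.seed(1)
    A <- crossprod(matrix(rnorm(9), 3, 3)) + diag(3)  # well-conditioned 3 x 3
    u <- rnorm(3); v <- rnorm(3)

    Ainv   <- solve(A)
    update <- Ainv - (Ainv %*% u %*% t(v) %*% Ainv) / drop(1 + t(v) %*% Ainv %*% u)
    all.equal(update, solve(A + u %*% t(v)))  # TRUE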
When the number of treatments is large and experimental resources are limited, Row-Column (RC) designs with multiple units per cell can be used. These designs are called Generalized Row-Column (GRC) designs and are defined as designs with v treatments in p rows and q columns such that the intersection of each row and column (cell) consists of k experimental units. For example (Bailey & Monod (2001) <doi:10.1111/1467-9469.00235>), to conduct an experiment comparing 4 treatments using 4 plants with leaves at 2 different heights, a row-column design with two units per cell can be used. A GRC design is said to be structurally complete if at least two treatments appear in the intersection of every row and column; it is said to be structurally incomplete if there is at least one cell that does not contain any treatment.
This package provides a collection of reweighted marginal hypothesis tests for clustered data, based on the reweighting methods of Williamson, J., Datta, S., and Satten, G. (2003) <doi:10.1111/1541-0420.00005>. The tests in this collection are clustered analogs of well-known hypothesis tests in the classical setting and are appropriate for data with cluster- and/or group-size informativeness. The syntax and output of the functions are modeled after common, recognizable functions native to R. Methods used in the package refer to Gregg, M., Datta, S., and Lorenz, D. (2020) <doi:10.1177/0962280220928572>, Nevalainen, J., Oja, H., and Datta, S. (2017) <doi:10.1002/sim.7288>, Dutta, S. and Datta, S. (2015) <doi:10.1111/biom.12447>, Lorenz, D., Datta, S., and Harkema, S. (2011) <doi:10.1002/sim.4368>, Datta, S. and Satten, G. (2008) <doi:10.1111/j.1541-0420.2007.00923.x>, and Datta, S. and Satten, G. (2005) <doi:10.1198/016214504000001583>.
This package implements new methods for detecting change points in high-dimensional time series data. These methods can be applied to non-Gaussian data, account for spatial and temporal dependence, and detect a wide variety of change-point configurations, including changes near the boundary and changes in close proximity. Additionally, the package helps address the 'small n, large p' problem, which occurs in many research contexts. This problem arises when a dataset contains changes that are visually evident but do not rise to the level of statistical significance due to the small number of observations and large number of parameters. The problem is overcome by treating the dimensions as a whole and scaling the test statistic only by its standard deviation, rather than scaling each dimension individually. Due to the computational complexity of the functions, the package runs best on datasets with a relatively large number of attributes but no more than a few hundred observations.
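The scaling idea can be illustrated with a toy mean-shift statistic in base R; this is only a sketch of the principle, not the package's algorithm:

    set.seed(42)
    n <- 60; p <- 200
    X <- matrix(rnorm(n * p), n, p)
    X[31:60, ] <- X[31:60, ] + 0.2    # a small shift in every dimension

    tau <- 30                         # candidate change point
    d <- colMeans(X[1:tau, ]) - colMeans(X[(tau + 1):n, ])
    z <- sum(d) / (sqrt(p) * sd(d))   # one global scaling, not per-dimension
    z                                 # large |z| suggests a change at tau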
The goal of this package is to cover the most common steps in Loss Given Default (LGD) rating model development. The main procedures available are those that refer to bivariate and multivariate analysis. In particular, two statistical methods for multivariate analysis are currently implemented: OLS regression and fractional logistic regression. Both methods are available within different blockwise model designs and both have customized stepwise algorithms. Descriptions of these customized designs are available in Siddiqi (2016) <doi:10.1002/9781119282396.ch10> and Anderson, R.A. (2021) <doi:10.1093/oso/9780192844194.001.0001>. Although they are explained for PD models, the same designs are applicable to LGD models with different underlying regression methods (OLS and fractional logistic regression). To cover other important steps of LGD model development, it is recommended to use the 'LGDtoolkit' package along with the 'PDtoolkit' and 'monobin' (or 'monobinShiny') packages. Additionally, 'LGDtoolkit' provides a set of procedures handy for initial and periodical model validation.
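One common way to fit a fractional logistic regression in base R is a quasi-binomial GLM; the sketch below uses simulated data and illustrative variable names, and is not necessarily how the package fits its models internally:

    set.seed(7)
    n <- 500
    ltv <- runif(n, 0.3, 1.2)                          # hypothetical risk driver
    lgd <- plogis(-1 + 1.5 * ltv + rnorm(n, 0, 0.5))   # LGD-like response in (0, 1)

    frac_fit <- glm(lgd ~ ltv, family = quasibinomial(link = "logit"))
    ols_fit  <- lm(lgd ~ ltv)                          # the OLS counterpart
    summary(frac_fit)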
There are some experimental scenarios where each experimental unit receives a sequence of treatments across multiple periods, and treatment effects persist beyond the period of application. It focuses on the construction and calculation of the parametric value of the residual effect designs balanced for carryover effects, also referred to as crossover designs, change-over designs, or repeated measurements designs (Aggarwal and Jha, 2010<doi:10.1080/15598608.2010.10412013>). The primary objective of the package is to generate a new class of Balanced Ternary Residual Effect Designs (BTREDs), balanced for carryover effects tailored explicitly for situations where the number of periods is less than or equal to the number of treatments. In addition, the package provides four new classes of Partially Balanced Ternary Residual Effect Designs (PBTREDs), constructed using incomplete block designs, initial sequences, and rectangular association scheme. In addition, one extra function is included to help study the parametric properties of a given residual effect design.
This package provides functions to work with directed (asymmetric) and undirected (symmetric) spatial networks. It makes the creation of connectivity matrices easier, i.e. binary matrices of dimension n x n, where n is the number of nodes (sampling units), indicating the presence (1) or absence (0) of an edge (link) between pairs of nodes. Different network objects can be produced by 'chessboard': node list, neighbor list, edge list, connectivity matrix. It can also produce objects to be used later in Moran's Eigenvector Maps (Dray et al. (2006) <doi:10.1016/j.ecolmodel.2006.02.015>) and Asymmetric Eigenvector Maps (Blanchet et al. (2008) <doi:10.1016/j.ecolmodel.2008.04.001>), methods available in the package adespatial (Dray et al. (2023) <https://CRAN.R-project.org/package=adespatial>). This work is part of the FRB-CESAB working group Bridge <https://www.fondationbiodiversite.fr/en/the-frb-in-action/programs-and-projects/le-cesab/bridge/>.
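The kind of connectivity matrix described above can be sketched in base R for a small regular grid, here linking horizontal and vertical neighbours (a sketch, not 'chessboard's own API):

    nr <- 3; nc <- 3
    nodes <- expand.grid(row = 1:nr, col = 1:nc)   # 9 sampling units
    n <- nrow(nodes)

    con <- matrix(0L, n, n)
    for (i in 1:n) for (j in 1:n) {
      if (abs(nodes$row[i] - nodes$row[j]) +
          abs(nodes$col[i] - nodes$col[j]) == 1) con[i, j] <- 1L
    }
    isSymmetric(con)   # TRUE: an undirected (symmetric) network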
'HiCcompare' provides functions for joint normalization and difference detection in multiple Hi-C datasets. It operates on processed Hi-C data in the form of chromosome-specific chromatin interaction matrices, and accepts three-column tab-separated text files storing chromatin interaction matrices in a sparse matrix format, which are available from several sources. 'HiCcompare' is designed to give the user the ability to perform a comparative analysis of the three-dimensional structure of the genomes of cells in different biological states. It differs from other packages that attempt to compare Hi-C data in that it works on processed data in chromatin interaction matrix format instead of pre-processed sequencing data. In addition, 'HiCcompare' provides a non-parametric method for the joint normalization and removal of biases between two Hi-C datasets for the purpose of comparative analysis, as well as a simple yet robust method for detecting differences between Hi-C datasets.
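The sparse three-column format mentioned above pairs two bin coordinates with an interaction frequency; a minimal example of such a table in R, with illustrative column names, might look like this:

    sparse <- data.frame(
      region1 = c(0L, 0L, 1000000L),        # start of the first bin
      region2 = c(0L, 1000000L, 1000000L),  # start of the second bin
      IF      = c(5025, 1243, 6832)         # interaction frequency
    )
    # e.g. as read from disk with
    # read.table("chr22_1mb.txt", sep = "\t")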
Clustering is carried out to identify patterns in transcriptomics profiles and determine clinically relevant subgroups of patients. Feature (gene) selection is a critical and integral part of the process. Currently, there are many feature selection and clustering methods to identify the relevant genes and perform clustering of samples, but choosing an appropriate methodology is difficult. In addition, extensive feature selection methods have not been supported by the available packages. Hence, we developed an integrative R package called 'multiClust' that allows researchers to experiment with the choice of combination of methods for gene selection and clustering with ease. Using 'multiClust', we identified the best performing clustering methodology in the context of clinical outcome. Our observations demonstrate that simple methods such as variance-based ranking perform well on the majority of datasets, provided that the appropriate number of genes is selected. However, different gene ranking and selection methods remain relevant, as no methodology works for all studies.
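Variance-based gene ranking, mentioned above as a strong baseline, is simple to sketch in base R on a simulated expression matrix (genes in rows, samples in columns):

    set.seed(3)
    expr <- matrix(rnorm(1000 * 40), nrow = 1000,
                   dimnames = list(paste0("g", 1:1000), paste0("s", 1:40)))

    gene_var <- apply(expr, 1, var)
    top <- expr[order(gene_var, decreasing = TRUE)[1:100], ]  # top 100 genes

    cl <- kmeans(t(top), centers = 2)   # cluster the samples, not the genes
    table(cl$cluster)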
Addressing crucial research questions often necessitates a small sample size due to factors such as distinctive target populations, rarity of the event under study, time and cost constraints, ethical concerns, or group-level units of analysis. Many readily available analytic methods, however, do not accommodate small sample sizes, and the choice of the best method can be unclear. The 'npboottprm' package enables the execution of nonparametric bootstrap tests with pooled resampling to help fill this gap. Grounded in the statistical methods for small sample size studies detailed in Dwivedi, Mallawaarachchi, and Alvarado (2017) <doi:10.1002/sim.7263>, the package facilitates a range of statistical tests, encompassing independent t-tests, paired t-tests, and one-way Analysis of Variance (ANOVA) F-tests. The nonparboot() function undertakes the essential computations, yielding detailed outputs that include test statistics, effect sizes, confidence intervals, and bootstrap distributions. Further, 'npboottprm' incorporates an interactive shiny web application, nonparboot_app(), offering intuitive, user-friendly data exploration.
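The pooled-resampling idea behind the independent t-test can be sketched in a few lines of base R; nonparboot() wraps this kind of computation and adds effect sizes, confidence intervals, and bootstrap distributions:

    set.seed(11)
    x <- rnorm(8, 0); y <- rnorm(9, 1)
    t_obs <- t.test(x, y)$statistic

    pooled <- c(x, y)                   # resample both groups from the pool
    t_boot <- replicate(5000, {
      xs <- sample(pooled, length(x), replace = TRUE)
      ys <- sample(pooled, length(y), replace = TRUE)
      t.test(xs, ys)$statistic
    })
    mean(abs(t_boot) >= abs(t_obs))     # bootstrap p-value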
Under- and over-dispersed binary data are modeled using an extended Poisson process model (EPPM) appropriate for binary data. A feature of the model is that the under-dispersion relative to the binomial distribution only needs to be greater than zero, but the over-dispersion is restricted compared to other distributional models such as the beta and correlated binomials. Because of this, the examples focus on under-dispersed data and how, in combination with the beta or correlated distributions, flexible models can be fitted to data displaying both under- and over-dispersion. In Generalized Linear Model (GLM) terminology, the functions use linear predictors for the probability of success p and for the scale factor, with various link functions available for p and a log link for the scale factor, to fit a variety of models relevant to areas such as bioassay. Details of the EPPM are in Faddy and Smith (2012) <doi:10.1002/bimj.201100214> and Smith and Faddy (2019) <doi:10.18637/jss.v090.i08>.
Compare functional enrichment between two experimentally derived groups of genes or proteins (Peterson, D.R., et al. (2018) <doi:10.1371/journal.pone.0198139>). Given a list of gene symbols, 'diffEnrich' will perform differential enrichment analysis using the Kyoto Encyclopedia of Genes and Genomes (KEGG) REST API. This package provides a number of functions intended to be used in a pipeline. Briefly, the user provides a KEGG-formatted species ID for human, mouse, or rat, and the package downloads and cleans species-specific ENTREZ gene IDs and maps them to their respective KEGG pathways by accessing KEGG's REST API, which guarantees the most up-to-date pathway data. Next, the user identifies significantly enriched pathways in the two gene sets, and finally identifies pathways that are differentially enriched between them. In addition to the analysis pipeline, the package also provides a plotting function.
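The per-pathway enrichment step is, at its core, a hypergeometric test, which can be sketched in base R with made-up counts (the package handles the bookkeeping across all pathways and both gene sets):

    # N background genes, K in the pathway, n in the gene list, k in both
    N <- 20000; K <- 150; n <- 400; k <- 12
    phyper(k - 1, K, N - K, n, lower.tail = FALSE)   # enrichment p-value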
This package contains all the formulae of the growth and trace element uptake model described in the Geoscientific Model Development paper of the same name (de Winter, 2017, <doi:10.5194/gmd-2017-137>). The model takes as input a file with X- and Y-coordinates of digitized growth increments recognized on a longitudinal cross section through the bivalve shell, as well as a BMP file of an elemental map of the cross-section surface with chemically distinct phases separated by phase analysis. It proceeds by the step-by-step process described in the paper, in which digitized growth increments are used to calculate changes in shell height, shell thickness, shell volume, shell mass, and shell growth rate through the bivalve's lifetime. The results of this growth modelling are then combined with the trace element mapping results to trace the incorporation of trace elements into the bivalve shell. Results of the various modelling parameters can be exported as XLSX files.
The power of the non-parametric Mann-Kendall test and Spearman's Rho test is highly influenced by serially correlated data. To address this issue, trend tests may be applied to modified versions of the time series obtained by Block Bootstrapping (BBS), Prewhitening (PW), Trend-Free Prewhitening (TFPW), Bias-Corrected Prewhitening, and the Variance Correction Approach, which calculates an effective sample size. Mann, H. B. (1945) <doi:10.1017/CBO9781107415324.004>. Kendall, M. (1975). Multivariate analysis. Charles Griffin & Company Ltd. Sen, P. K. (1968) <doi:10.2307/2285891>. Önöz, B., & Bayazit, M. (2012) <doi:10.1002/hyp.8438>. Hamed, K. H. (2009) <doi:10.1016/j.jhydrol.2009.01.040>. Yue, S., & Wang, C. Y. (2002) <doi:10.1029/2001WR000861>. Yue, S., Pilon, P., Phinney, B., & Cavadias, G. (2002) <doi:10.1002/hyp.1095>. Hamed, K. H., & Ramachandra Rao, A. (1998) <doi:10.1016/S0022-1694(97)00125-X>. Yue, S., & Wang, C. Y. (2004) <doi:10.1023/B:WARM.0000043140.61082.60>.
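The unmodified Mann-Kendall test itself is short enough to compute from first principles in base R (no tie correction), which makes the serial-correlation issue easy to explore:

    set.seed(5)
    x <- cumsum(rnorm(30, mean = 0.1))   # series with a mild upward trend
    n <- length(x)
    S <- sum(sapply(1:(n - 1), function(i) sum(sign(x[(i + 1):n] - x[i]))))
    varS <- n * (n - 1) * (2 * n + 5) / 18
    Z <- (S - sign(S)) / sqrt(varS)      # continuity-corrected normal approximation
    2 * pnorm(-abs(Z))                   # two-sided p-value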
'preciseTAD' provides functions to predict the location of boundaries of topologically associated domains (TADs) and chromatin loops at base-level resolution. As input, it takes BED-formatted genomic coordinates of domain boundaries detected from low-resolution Hi-C data, and coordinates of high-resolution genomic annotations from ENCODE or other consortia. 'preciseTAD' employs several feature engineering strategies and resampling techniques to address class imbalance, and trains an optimized random forest model for predicting low-resolution domain boundaries. Translated to base-level resolution, 'preciseTAD' predicts the probability of each base being a boundary. Density-based clustering and scalable partitioning techniques are then used to detect precise boundary regions and summit points. Compared with low-resolution boundaries, 'preciseTAD' boundaries are highly enriched for CTCF, RAD21, SMC3, and ZNF143 signal and are more conserved across cell lines. The pre-trained model can accurately predict boundaries in another cell line using CTCF, RAD21, SMC3, and ZNF143 annotation data for that cell line.
Implementations of the k-means, hierarchical agglomerative, and DBSCAN clustering methods for functional data, which allow curves to be jointly aligned and clustered. The package supports functional data defined on one-dimensional domains but possibly taking values in multivariate codomains, supplied either as arrays or via the fd and funData classes from the 'fda' and 'funData' packages, respectively. It currently supports shift, dilation, and affine warping functions for functional data defined on the real line, and uses the SRVF framework to handle boundary-preserving warping for functional data defined on a specific interval. Main reference for the k-means algorithm: Sangalli, L.M., Secchi, P., Vantini, S., Vitelli, V. (2010) "k-mean alignment for curve clustering" <doi:10.1016/j.csda.2009.12.008>. Main reference for the SRVF framework: Tucker, J. D., Wu, W., & Srivastava, A. (2013) "Generative models for functional data using phase and amplitude separation" <doi:10.1016/j.csda.2012.12.001>.
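Applying one of the warping functions mentioned above to a discretized curve is a one-liner in base R; here an affine warp h(t) = a*t + b evaluated by interpolation (a sketch, not the package's API):

    t_grid <- seq(0, 2 * pi, length.out = 100)
    x <- sin(t_grid)

    a <- 1.2; b <- 0.3            # dilation and shift
    warped <- approx(t_grid, x, xout = a * t_grid + b, rule = 2)$y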
This package implements several methods to meta-analyze studies that report the sample median of the outcome. The methods described by McGrath et al. (2019) <doi:10.1002/sim.8013>, Ozturk and Balakrishnan (2020) <doi:10.1002/sim.8738>, and McGrath et al. (2020a) <doi:10.1002/bimj.201900036> can be applied to directly meta-analyze the median or difference of medians between groups. Additionally, a number of methods (e.g., McGrath et al. (2020b) <doi:10.1177/0962280219889080>, Cai et al. (2021) <doi:10.1177/09622802211047348>, and McGrath et al. (2023) <doi:10.1177/09622802221139233>) are implemented to estimate study-specific (differences of) means and their standard errors in order to estimate the pooled (difference of) means. Methods for meta-analyzing median survival times (McGrath et al. (2025) <doi:10.48550/arXiv.2503.03065>) are also implemented. See McGrath et al. (2024) <doi:10.1002/jrsm.1686> for a detailed guide on using the package.
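As a flavour of the mean-estimation step, a textbook quantile-based approximation of a study's mean and standard deviation from its reported median and interquartile range looks as follows (the package implements such estimators, and more refined ones, with proper standard errors):

    q1 <- 10; med <- 14; q3 <- 20
    mean_hat <- (q1 + med + q3) / 3            # crude mean estimate
    sd_hat   <- (q3 - q1) / (2 * qnorm(0.75))  # IQR of a normal is ~1.35 sd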
The goal of checkpoint is to solve the problem of package reproducibility in R. Specifically, checkpoint allows you to install packages as they existed on CRAN on a specific snapshot date, as if you had a CRAN time machine. To achieve reproducibility, the checkpoint() function installs the packages required or called by your project and scripts to a local library exactly as they existed at the specified point in time. Only those packages are available to your project, thereby avoiding any package updates that came later and may have altered your results. In this way, anyone using checkpoint() can reproduce your scripts or projects at any time. To create the snapshot archives, once a day (at midnight UTC) Microsoft refreshes the Austria CRAN mirror on the "Microsoft R Archived Network" server (<https://mran.microsoft.com/>). Immediately after the rsync mirror process completes, a snapshot is taken, thus creating the archive. Snapshot archives exist starting from 2014-09-17.
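Typical use is two lines at the top of a script:

    library(checkpoint)
    checkpoint("2018-01-01")   # scan the project, install from that snapshot
    # from here on, library() calls resolve against the 2018-01-01 snapshot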
The circular genomic permutation approach uses genome-wide association study (GWAS) results to establish the significance of pathway/gene-set associations while accounting for genomic structure (Cabrera et al. (2012) <doi:10.1534/g3.112.002618>). All single nucleotide polymorphisms (SNPs) in the GWAS are placed on a circular genome according to their location. The complete set of SNP association p-values is then permuted by rotation with respect to the SNPs' genomic locations. Two testing frameworks are available: permutation at the gene level and permutation at the SNP level. Permutation at the gene level uses Fisher's combination test to calculate a single gene p-value, followed by the hypergeometric test. The SNP-count methodology maps each SNP to pathways/gene-sets and calculates the proportion of SNPs above a pre-defined threshold for the real and the permuted datasets. 'Genomicper' requires a matrix of GWAS association p-values and SNP annotations to genes. Pathways can be obtained from within the package or provided by the user.
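Fisher's combination test used at the gene level is easy to state in base R: the k SNP p-values annotated to a gene are folded into a single chi-squared statistic with 2k degrees of freedom:

    p_snps <- c(0.04, 0.20, 0.007)   # p-values of the SNPs in one gene
    X2 <- -2 * sum(log(p_snps))
    pchisq(X2, df = 2 * length(p_snps), lower.tail = FALSE)   # gene p-value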
Mindat (<https://www.mindat.org>) is one of the world's most widely used databases of mineral species and their distribution. Many scientists in mineralogy, geochemistry, petrology, and other Earth and planetary disciplines use Mindat data, yet an open data service with a machine interface had never been fully established. To meet these data needs, the Mindat team has built an API (<https://api.mindat.org/schema/redoc/>) for data access. The 'OpenMindat' R package provides functions that bridge this data highway, connecting users' data requirements to the Mindat API server and assisting with retrieval and initial processing, to further improve efficiency and lower the barrier of data query and access for scientists. 'OpenMindat' provides friendly and extensible data retrieval functions covering geomaterials (e.g., rocks, minerals, synonyms, varieties, mixtures, and commodities), localities, and the IMA (International Mineralogical Association)-approved mineral list. 'OpenMindat' will accelerate data-intensive studies in mineral informatics and lead to more scientific discoveries.
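The underlying API can also be queried directly; a hedged sketch with 'httr', where the endpoint path, query parameter, and token header follow the API documentation but should be checked against it:

    library(httr)
    resp <- GET("https://api.mindat.org/geomaterials/",
                query = list(q = "quartz"),
                add_headers(Authorization =
                  paste("Token", Sys.getenv("MINDAT_API_KEY"))))
    str(content(resp, "parsed"), max.level = 1)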
On-target gene knockdown using siRNA ideally results from binding fully complementary regions in mRNA transcripts to induce cleavage. Off-target siRNA gene knockdown can occur through several modes, one being a seed-mediated mechanism mimicking miRNA gene regulation. Seed-mediated off-target effects occur when the ~8 nucleotides at the 5' end of the guide strand, called the seed region, bind the 3' untranslated regions of mRNA, causing reduced translation. Experiments pairing siRNA knockdown with RNA-seq can be used to detect siRNA sequences with potential off-target effects driven by the seed region. 'SeedMatchR' provides tools for exploring and detecting potential seed-mediated off-target effects of siRNA in RNA-seq experiments. 'SeedMatchR' is designed to extend current differential expression analysis tools, such as 'DESeq2', by annotating results with predicted seed matches. Using publicly available data, we demonstrate the ability of 'SeedMatchR' to detect cumulative changes in differential gene expression attributed to siRNA seed regions.
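The core seed-matching idea can be sketched with 'Biostrings', independently of 'SeedMatchR's own interface: take the guide strand's seed (here, positions 2-8), reverse-complement it, and count occurrences in a 3' UTR (all sequences below are made up):

    library(Biostrings)
    guide <- DNAString("TTCAGTAAGTACAGCATGGTT")   # siRNA guide strand, 5'->3'
    seed_site <- reverseComplement(subseq(guide, 2, 8))
    utr <- DNAString("ACCATGCTCGTACTTACTGAATTTGTACTTACTGAA")
    countPattern(seed_site, utr)                  # number of seed matches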
This package provides tools for phase-type distributions, including the following variants: continuous, discrete, multivariate, inhomogeneous, right-censored, and regression. Methods for functional evaluation, simulation, and estimation using the expectation-maximization (EM) algorithm are provided for all models. The methods of this package are based on the following references: Asmussen, S., Nerman, O., & Olsson, M. (1996). Fitting phase-type distributions via the EM algorithm; Olsson, M. (1996). Estimation of phase-type distributions from censored data; Albrecher, H., & Bladt, M. (2019) <doi:10.1017/jpr.2019.60>; Albrecher, H., Bladt, M., & Yslas, J. (2022) <doi:10.1111/sjos.12505>; Albrecher, H., Bladt, M., Bladt, M., & Yslas, J. (2022) <doi:10.1016/j.insmatheco.2022.08.001>; Bladt, M., & Yslas, J. (2022) <doi:10.1080/03461238.2022.2097019>; Bladt, M. (2022) <doi:10.1017/asb.2021.40>; Bladt, M. (2023) <doi:10.1080/10920277.2023.2167833>; Albrecher, H., Bladt, M., & Mueller, A. (2023) <doi:10.1515/demo-2022-0153>; Bladt, M. & Yslas, J. (2023) <doi:10.1016/j.insmatheco.2023.02.008>.
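Simulation from a continuous phase-type distribution reduces to running the underlying Markov jump process until absorption; a compact base-R sketch:

    rphtype1 <- function(alpha, Tm) {
      p <- length(alpha)
      exit <- -rowSums(Tm)                  # absorption (exit) rates
      state <- sample.int(p, 1, prob = alpha)
      time <- 0
      repeat {
        time <- time + rexp(1, -Tm[state, state])
        probs <- c(Tm[state, -state], exit[state]) / (-Tm[state, state])
        nxt <- sample(c(setdiff(1:p, state), 0L), 1, prob = probs)
        if (nxt == 0) return(time)
        state <- nxt
      }
    }

    alpha <- c(1, 0)
    Tm <- matrix(c(-3, 2, 0, -1), 2, 2, byrow = TRUE)   # sub-intensity matrix
    mean(replicate(2000, rphtype1(alpha, Tm)))   # approx. alpha %*% solve(-Tm) %*% c(1, 1) = 1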
Ranked Set Sampling (RSS) is a stratified sampling method known for its efficiency compared to Simple Random Sampling (SRS). When the sample allocation is equal across strata, it is referred to as balanced RSS (BRSS), whereas unequal allocation is called unbalanced RSS (URSS), which is particularly effective for asymmetric or skewed distributions. This package offers practical statistical tools and sampling methods for both BRSS and URSS, emphasizing flexible sampling designs and inference for population means, medians, proportions, and the Area Under the Curve (AUC). It incorporates parametric and nonparametric tests, including empirical likelihood ratio (LR) methods. The package provides ranked set sampling methods from a given population, including sampling with imperfect ranking using auxiliary variables. Furthermore, it provides tools for efficient sample allocation in URSS, ensuring greater efficiency than SRS and BRSS. For more details, see, e.g., Chen et al. (2003) <doi:10.1007/978-0-387-21664-5>, Ahn et al. (2022) <doi:10.1007/978-3-031-14525-4_3>, and Ahn et al. (2024) <doi:10.1111/insr.12589>.
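Balanced RSS with perfect ranking is only a few lines of base R: in each of m cycles, draw k sets of k units and keep the i-th smallest unit of the i-th set:

    brss <- function(pop, k, m) {
      unlist(lapply(seq_len(m), function(cycle)
        sapply(seq_len(k), function(i) sort(sample(pop, k))[i])))
    }

    set.seed(9)
    pop <- rexp(10000)                  # a skewed population with mean 1
    mean(brss(pop, k = 3, m = 50))      # RSS estimate of the population mean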