This package provides functions to fit a stratified Cox proportional hazards model and a stratified proportional subdistribution hazards model, extending Zhang et al. (2007) <doi:10.1016/j.cmpb.2007.07.010> and Zhang et al. (2011) <doi:10.1016/j.cmpb.2010.07.005>, respectively, to clustered right-censored data. The functions also provide estimates of the cumulative baseline hazard along with their standard errors. Furthermore, adjusted survival and cumulative incidence probabilities are provided along with their standard errors. Finally, estimates of survival and cumulative incidence probabilities given a vector of covariates are provided along with their standard errors.
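For context, the analogous stratified, cluster-robust Cox model can be written with the widely used survival package; this is only a point of comparison, not this package's interface:

```r
# Stratified Cox fit with cluster-robust (sandwich) standard errors via the
# survival package; the kidney data have two recurrence times per patient (id).
library(survival)

fit <- coxph(Surv(time, status) ~ age + strata(sex) + cluster(id),
             data = kidney)
summary(fit)  # the "robust se" column accounts for within-patient clustering
```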
Reaction rate dynamics can be retrieved from metabolite concentration time courses. The user has to provide the corresponding stoichiometric matrix, but not a regulation model (Michaelis-Menten or similar). Instead of solving an ordinary differential equation (ODE) system describing the evolution of concentrations, B-splines are used to capture the concentration and rate dynamics, and a least-squares problem is then solved on their coefficients with non-negativity (and optionally monotonicity) constraints. Constraints can also be set on initial concentration values. The package dynafluxr can be used as a library, but also as an application with a command line interface, dynafluxr::cli("-h"), or a graphical user interface, dynafluxr::gui().
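The core idea can be sketched in a few lines of base R plus the nnls package; everything below (toy stoichiometry, simulated time courses) is illustrative and does not use dynafluxr's API:

```r
# Sketch: smooth each metabolite time course with a spline, differentiate it
# to get dM/dt, then solve dM/dt ~ S %*% v for fluxes v >= 0 at one time point.
library(nnls)

S <- rbind(c(-1,  0),   # toy stoichiometric matrix: 3 metabolites, 2 reactions
           c( 1, -1),
           c( 0,  1))
tt <- seq(0, 10, by = 0.5)
set.seed(1)
conc <- sapply(list(\(t) exp(-0.3 * t),                       # noisy toy data
                    \(t) 0.6 * (exp(-0.2 * t) - exp(-0.3 * t)),
                    \(t) 1 - exp(-0.1 * t)),
               \(f) f(tt) + rnorm(length(tt), sd = 0.01))

# Spline-smoothed derivative of each concentration course
dMdt <- sapply(seq_len(ncol(conc)), function(j) {
  sp <- smooth.spline(tt, conc[, j])
  predict(sp, tt, deriv = 1)$y
})

# Non-negative least squares for the rate vector at one time point
v_hat <- nnls(S, dMdt[10, ])$x
v_hat
```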
Interactive labelling of scatter plots, volcano plots and Manhattan plots using a shiny and plotly interface. Users can hover over points to see where specific points are located and click points on/off to easily label them. Labels can be dragged around the plot to place them optimally. Plots can be exported directly to PDF for publication. For plots with large numbers of points, points can optionally be rasterized as a bitmap, while all other elements (axes, text, labels & lines) are preserved as vector objects. This can dramatically reduce file size for plots with millions of points such as Manhattan plots, and is ideal for publication.
DNA methylation (6mA) is a major epigenetic process by which alterations in gene expression take place without changing the DNA sequence. Determining these sites in vitro is laborious, time-consuming, and costly. The EpiSemble package is an in-silico pipeline for predicting DNA sequences containing 6mA sites. It uses an ensemble-based machine learning approach, combining Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting, to predict sequences containing 6mA sites. This package was developed using the concept of Chen et al. (2019) <doi:10.1093/bioinformatics/btz015>.
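The ensembling step can be illustrated with standard R learners; the random features and the simple probability-averaging rule below are stand-ins, not EpiSemble's actual pipeline:

```r
library(e1071)         # svm()
library(randomForest)  # randomForest()
library(gbm)           # gbm.fit()

set.seed(42)
# Toy stand-in for encoded sequence features: rows = sequences, cols = features
x <- data.frame(matrix(rnorm(200 * 10), 200, 10))
y <- factor(rbinom(200, 1, 0.5), labels = c("no6mA", "m6A"))
train <- 1:150; test <- 151:200

p_svm <- attr(predict(svm(x[train, ], y[train], probability = TRUE),
                      x[test, ], probability = TRUE),
              "probabilities")[, "m6A"]
p_rf  <- predict(randomForest(x[train, ], y[train]),
                 x[test, ], type = "prob")[, "m6A"]
p_gbm <- predict(gbm.fit(x[train, ], as.numeric(y[train]) - 1,
                         distribution = "bernoulli", n.trees = 100,
                         verbose = FALSE),
                 x[test, ], n.trees = 100, type = "response")

# Simple average of the three class probabilities as the ensemble score
ens <- (p_svm + p_rf + p_gbm) / 3
table(pred = ens > 0.5, truth = y[test])
```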
Bayesian model averaging (BMA) algorithms for univariate link latent Gaussian models (ULLGMs). For detailed information, refer to Steel M.F.J. & Zens G. (2024) "Model Uncertainty in Latent Gaussian Models with Univariate Link Function" <doi:10.48550/arXiv.2406.17318>. The package supports various g-priors and a beta-binomial prior on the model space. It also includes auxiliary functions for visualizing and tabulating BMA results. Currently, it offers an out-of-the-box solution for model averaging of Poisson log-normal (PLN) and binomial logistic-normal (BiL) models. The codebase is designed to be easily extendable to other likelihoods, priors, and link functions.
Quantitative RT-PCR data are analyzed using generalized linear mixed models based on lognormal-Poisson error distribution, fitted using MCMC. Control genes are not required but can be incorporated as Bayesian priors or, when template abundances correlate with conditions, as trackers of global effects (common to all genes). The package also implements a lognormal model for higher-abundance data and a "classic" model involving multi-gene normalization on a by-sample basis. Several plotting functions are included to extract and visualize results. The detailed tutorial is available here: <https://matzlab.weebly.com/uploads/7/6/2/2/76229469/mcmc.qpcr.tutorial.v1.2.4.pdf>.
Cooperative learning combines the usual squared error loss of predictions with an agreement penalty to encourage the predictions from different data views to agree. By varying the weight of the agreement penalty, we get a continuum of solutions that include the well-known early and late fusion approaches. Cooperative learning chooses the degree of agreement (or fusion) in an adaptive manner, using a validation set or cross-validation to estimate test set prediction error. In the setting of cooperative regularized linear regression, the method combines the lasso penalty with the agreement penalty (Ding, D., Li, S., Narasimhan, B., Tibshirani, R. (2021) <doi:10.1073/pnas.2202113119>).
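The agreement penalty has a convenient reformulation: for two views X and Z, the cooperative objective is equivalent to an ordinary lasso on row-augmented data. A minimal sketch using glmnet (the coop_lasso helper and toy data are illustrative, not the package's API; intercept/centering details are ignored for brevity):

```r
library(glmnet)

set.seed(1)
n <- 100; p <- 5; q <- 5
X <- matrix(rnorm(n * p), n, p)
Z <- matrix(rnorm(n * q), n, q)
y <- X[, 1] + Z[, 1] + rnorm(n)

coop_lasso <- function(X, Z, y, rho) {
  # Agreement penalty as extra rows: lasso on the stacked system
  Xa <- rbind(cbind(X, Z),
              cbind(-sqrt(rho) * X, sqrt(rho) * Z))
  ya <- c(y, rep(0, nrow(X)))
  cv.glmnet(Xa, ya, standardize = FALSE)
}

# rho = 0 is early fusion (plain lasso on [X, Z]); larger rho forces the two
# views' predictions to agree
fit <- coop_lasso(X, Z, y, rho = 0.5)
coef(fit, s = "lambda.min")
```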
This package provides gradient-based MCMC sampling algorithms for use with the MCMC engine provided by the nimble package. This includes two versions of Hamiltonian Monte Carlo (HMC) No-U-Turn (NUTS) sampling, and (under development) Langevin samplers. The `NUTS_classic` sampler implements the original HMC-NUTS algorithm as described in Hoffman and Gelman (2014) <doi:10.48550/arXiv.1111.4246>. The `NUTS` sampler is a modern version of HMC-NUTS sampling matching the HMC sampler available in version 2.32.2 of Stan (Stan Development Team, 2023). In addition, convenience functions are provided for generating and modifying MCMC configuration objects which employ HMC sampling.
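A minimal sketch of the one-call workflow, assuming the convenience function nimbleHMC() mirrors nimble::nimbleMCMC()'s interface as the package documentation describes:

```r
library(nimble)
library(nimbleHMC)

code <- nimbleCode({
  mu ~ dnorm(0, sd = 10)
  sigma ~ dhalfflat()
  for (i in 1:N) y[i] ~ dnorm(mu, sd = sigma)
})

# Assumed interface: one call builds the model with derivatives and runs NUTS
samples <- nimbleHMC(code,
                     constants = list(N = 20),
                     data = list(y = rnorm(20, 2, 1)),
                     inits = list(mu = 0, sigma = 1),
                     niter = 2000, nburnin = 1000)
summary(samples)
```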
Hail is an open-source, general-purpose, python based data analysis tool with additional data types and methods for working with genomic data, see <https://hail.is/>. Hail is built to scale and has first-class support for multi-dimensional structured data, like the genomic data in a genome-wide association study (GWAS). Hail is exposed as a python library, using primitives for distributed queries and linear algebra implemented in scala', spark', and increasingly C++'. The sparkhail is an R extension using sparklyr package. The idea is to help R users to use hail functionalities with the well-know tidyverse syntax, see <https://www.tidyverse.org/>.
We develop a new class of distribution-free multiple testing rules for false discovery rate (FDR) control under general dependence. A key element in our proposal is a symmetrized data aggregation (SDA) approach to incorporating the dependence structure via sample splitting, data screening and information pooling. The proposed SDA filter first constructs a sequence of ranking statistics that fulfill global symmetry properties, and then chooses a data-driven threshold along the ranking to control the FDR. For more information, see the website below and the accompanying paper: Du et al. (2020), "False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation", <arXiv:2002.11992>.
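The thresholding step can be sketched directly from this description: with ranking statistics that are symmetric about zero under the null, the negative tail estimates the false discovery proportion in the positive tail. Illustrative base R only, not the package's interface:

```r
# Pick the smallest t whose estimated FDP, #{W <= -t} / max(#{W >= t}, 1),
# is below the target level q; select the statistics above that threshold.
sda_threshold <- function(W, q = 0.1) {
  ts <- sort(abs(W[W != 0]))
  for (t in ts) {
    fdp <- sum(W <= -t) / max(sum(W >= t), 1)
    if (fdp <= q) return(t)
  }
  Inf
}

set.seed(1)
W <- c(rnorm(900), rnorm(100, mean = 4))  # mostly nulls plus some signals
sel <- which(W >= sda_threshold(W, q = 0.1))
length(sel)
```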
This package provides a set of functions devoted to multivariate exploratory statistics on textual data. Classical methods such as correspondence analysis and agglomerative hierarchical clustering are available. Chronologically constrained agglomerative hierarchical clustering enriched with labelled-by-words trees is offered. Given a division of the corpus into parts, their characteristic words and documents are identified. Further, access to FactoMineR functions is very easy; two of them are relevant in the textual domain. MFA() addresses multiple lexical tables, allowing applications such as dealing with multilingual corpora as well as simultaneously analyzing both open-ended and closed questions in surveys. See <http://xplortext.unileon.es> for examples.
DEPRECATED. Do not start building new projects based on this package. The (in-house) APD file format was initially developed to store Affymetrix probe-level data, e.g. normalized CEL intensities. Chip types can be added to an APD file and, similarly to the methods in the affxparser package, this package provides methods to read APDs organized by units (probesets). In addition, the probe elements can be arranged optimally such that they are guaranteed to be read in order when, for instance, data is read unit by unit. This speeds up reading substantially. This package supports the Aroma framework and should not be used elsewhere.
This package performs adjustments of a user-supplied independence loglikelihood function using a robust sandwich estimator of the parameter covariance matrix, based on the methodology in Chandler and Bate (2007) <doi:10.1093/biomet/asm015>. This can be used for cluster correlated data when interest lies in the parameters of the marginal distributions or for performing inferences that are robust to certain types of model misspecification. Functions for profiling the adjusted loglikelihoods are also provided, as are functions for calculating and plotting confidence intervals, for single model parameters, and confidence regions, for pairs of model parameters. Nested models can be compared using an adjusted likelihood ratio test.
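The sandwich estimator at the heart of the adjustment is easy to state in code: with H the Hessian of the independence loglikelihood at its maximum and g_c the score contribution of cluster c, the adjusted covariance is H^{-1} (sum_c g_c g_c') H^{-1}. A from-scratch illustration with toy inputs, not the package's interface:

```r
# Sandwich covariance H^{-1} V H^{-1} from per-cluster score contributions
sandwich_cov <- function(H, scores) {
  V <- crossprod(scores)   # sum over clusters of g_c %*% t(g_c)
  Hinv <- solve(H)
  Hinv %*% V %*% Hinv
}

set.seed(1)
scores <- matrix(rnorm(40), 20, 2)  # 20 clusters, 2 parameters (toy values)
H <- -20 * diag(2)                  # stand-in Hessian at the maximum
sandwich_cov(H, scores)
```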
This package provides spatially balanced survey designs using the quasi-random number method described in Robinson et al. (2013) <doi:10.1111/biom.12059> and adjusted in Robinson et al. (2017) <doi:10.1016/j.spl.2017.05.004>. Designs using MBHdesign can: 1) accommodate, without substantial detrimental effects on spatial balance, legacy sites (Foster et al., 2017 <doi:10.1111/2041-210X.12782>); 2) be based on points or transects (Foster et al., 2020 <doi:10.1111/2041-210X.13321>); and 3) produce clustered samples (Foster et al., in press). Additional information about use of the package is given in Foster (2021) <doi:10.1111/2041-210X.13535>.
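A minimal sketch, assuming quasiSamp() with its documented defaults (equal inclusion probabilities over potential sites on the unit square when none are supplied):

```r
library(MBHdesign)

set.seed(1)
samp <- quasiSamp(n = 20, dimension = 2)  # 20 spatially balanced sites
head(samp)                    # coordinates plus inclusion probabilities
plot(samp[, 1:2], pch = 19)   # visually check the spatial spread
```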
Website generator with HTML summaries for predictive models. This package uses DALEX explainers to describe global model behavior. We can see how well models behave (tabs: Model Performance, Auditor), how much each variable contributes to predictions (tabs: Variable Response) and which variables are the most important for a given model (tabs: Variable Importance). We can also compare Concept Drift for pairs of models (tabs: Drifter). Additionally, data available on the website can be easily recreated in the current R session. Work on this package was financially supported by the NCN Opus grant 2017/27/B/ST6/01307 at Warsaw University of Technology, Faculty of Mathematics and Information Science.
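A minimal sketch of the intended workflow; the explainer construction follows the DALEX documentation, while the final modelDown() call is an assumption about this package's entry point:

```r
library(DALEX)          # explain()
library(randomForest)

model <- randomForest(m2.price ~ ., data = apartments)
explainer <- explain(model,
                     data = apartments[, -1],
                     y = apartments$m2.price,
                     label = "rf")

# Assumed entry point: build the HTML summary website from the explainer(s)
modelDown::modelDown(explainer)
```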
The Markowitz criterion is a multicriteria decision-making method that stands out in risk and uncertainty analysis in contexts where probabilities are known. This approach represents an evolution of Pascal's criterion by incorporating the dimension of variability. In this framework, the expected value reflects the anticipated return, while the standard deviation serves as a measure of risk. The markowitz package provides a practical and accessible tool for implementing this method, enabling researchers and professionals to perform analyses without complex calculations. Thus, the package facilitates the application of the Markowitz criterion. More details on the method can be found in Octave Jokung-Nguéna (2001, ISBN 2100055372).
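The criterion itself reduces to a couple of lines of base R; the payoff table, probabilities, and the mean-minus-k·sd ranking below are illustrative, not the package's interface:

```r
# Two alternatives (rows) under three states of nature with known probabilities
payoffs <- rbind(A = c(80, 100, 120),
                 B = c(60, 100, 160))
probs <- c(0.3, 0.4, 0.3)

ev  <- payoffs %*% probs                 # expected returns
sdv <- sqrt(payoffs^2 %*% probs - ev^2)  # standard deviations (risk)
data.frame(EV = ev, SD = sdv, score = ev - 1 * sdv)  # rank by mean - k * sd
```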
PepMapViz is a versatile R visualization package that empowers researchers with comprehensive tools for seamlessly mapping peptides to protein sequences, identifying distinct domains and regions of interest, accentuating mutations, and highlighting post-translational modifications, all while enabling comparisons across diverse experimental conditions. Potential applications of PepMapViz include visualizing cross-software mass spectrometry results at the peptide level, for specific protein and domain details in a linearized format, and post-translational modification coverage across different experimental conditions, helping to unravel insights into disease mechanisms. It also enables visualization of Major Histocompatibility Complex-presented peptide clusters in different antibody regions, supporting immunogenicity prediction in antibody drug development.
Procedures for testing for group-wide signal in clusters of variables. Tests can be performed for single groups in isolation (univariate) or multiple groups together (multivariate). Specific tests include the exact and approximate (un)selective likelihood ratio tests described in Reid et al. (2015), and the selective F test and marginal screening prototype test of Reid and Tibshirani (2015). Users may pre-specify columns to be included in prototype formation, allow the function to select them itself, or use a mixture of the two. Any variable selection is accounted for using the selective inference framework. Options are provided for non-sampling and hit-and-run null reference distributions.
Identify and understand clusters of points (typically representing the locations of places or events) stored in simple-features (SF) objects. This is useful for analysing, for example, hot-spots of crime events. The package emphasises producing results from point SF data in a single step using reasonable default values for all other arguments, to aid rapid data analysis by users who are starting out. Functions available include kernel density estimation (for details, see Yip (2020) <doi:10.22224/gistbok/2020.1.12>), analysis of spatial association (Getis and Ord (1992) <doi:10.1111/j.1538-4632.1992.tb00261.x>) and hot-spot classification (Chainey (2020) ISBN:158948584X).
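A minimal sketch of the single-step usage, assuming the hotspot_* helpers as documented and using simulated points in a projected CRS as stand-in data:

```r
library(sf)
library(sfhotspot)

set.seed(1)
pts <- st_as_sf(data.frame(x = runif(200, 0, 1000), y = runif(200, 0, 1000)),
                coords = c("x", "y"), crs = 27700)  # projected CRS for KDE

kde <- hotspot_kde(pts)      # kernel density estimate over an automatic grid
gi  <- hotspot_gistar(pts)   # Getis-Ord Gi* statistic per grid cell
plot(kde["kde"])
```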
A selection index is one of the most efficient and accurate methods for the selection of animals. This package is useful for the construction of selection indices. It uses mixed and random model least squares analysis to estimate the heritability of traits and the genetic correlation between traits. The package uses the sire model, with sire fitted as a random effect. The genetic and phenotypic (co)variances, along with the relative economic values, are used to construct the selection index for any number of traits. It also estimates the accuracy of the index and the genetic gain expected for different traits. Fisher (1936) <doi:10.1111/j.1469-1809.1936.tb02137.x>.
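The classical Smith-Hazel construction behind such indices fits in a few lines; the matrices and selection intensity below are toy values, and the code is not this package's interface:

```r
# Smith-Hazel index weights b = P^{-1} G a, where P and G are the phenotypic
# and genetic (co)variance matrices and a holds relative economic values.
P <- matrix(c(10, 2, 2, 8), 2, 2)   # phenotypic (co)variances (toy)
G <- matrix(c( 4, 1, 1, 3), 2, 2)   # genetic (co)variances (toy)
a <- c(1, 0.5)                      # relative economic values

b <- solve(P, G %*% a)              # index weights
r_HI <- sqrt(drop(t(b) %*% G %*% a) / drop(t(a) %*% G %*% a))  # index accuracy
i <- 1.755                          # selection intensity (e.g. top 10%)
gain <- i * drop(G %*% b) / sqrt(drop(t(b) %*% P %*% b))       # gain per trait
list(weights = b, accuracy = r_HI, gain = gain)
```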
The synchrosqueezed wavelet transform is implemented. The package is a translation of the MATLAB Synchrosqueezing Toolbox, version 1.1, originally developed by Eugene Brevdo (2012). The C code for curve_ext was authored by Jianfeng Lu and translated to Fortran by Dongik Jang. Synchrosqueezing is based on the papers: [1] Daubechies, I., Lu, J. and Wu, H. T. (2011) Synchrosqueezed wavelet transforms: An empirical mode decomposition-like tool. Applied and Computational Harmonic Analysis, 30, 243-261. [2] Thakur, G., Brevdo, E., Fučkar, N. S. and Wu, H.-T. (2013) The Synchrosqueezing algorithm for time-varying spectral analysis: Robustness properties and new paleoclimate applications. Signal Processing, 93, 1079-1094.
Wavelet decomposition followed by random forest (RF) regression is applied for time series forecasting. The maximum overlap discrete wavelet transform (MODWT) algorithm was chosen as it works for any length of series. The series is first divided into training and testing sets, and a random forest model is then trained on each of the wavelet-decomposed sub-series. This package also provides accuracy metrics in the form of Root Mean Square Error (RMSE) and Mean Absolute Prediction Error (MAPE). This package is based on the algorithm of Ding et al. (2021) <doi:10.1007/s11356-020-12298-3>.
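The decompose-then-forecast recipe can be sketched with the wavelets and randomForest packages; the filter choice, number of levels, lag order, and one-step recombination below are illustrative assumptions, not this package's code:

```r
library(wavelets)      # modwt()
library(randomForest)  # randomForest()

set.seed(1)
y <- as.numeric(arima.sim(list(ar = 0.7), n = 200))

wt <- modwt(y, filter = "la8", n.levels = 3)   # works for any series length
subs <- c(wt@W, wt@V[length(wt@V)])            # details W1..W3 plus smooth V3

lags <- 4
forecast_one <- function(s) {
  s <- as.numeric(s)
  X <- embed(s, lags + 1)                # column 1 = target, columns 2.. = lags
  fit <- randomForest(X[, -1], X[, 1])
  predict(fit, t(rev(tail(s, lags))))    # newest lags, most recent first
}

sum(sapply(subs, forecast_one))          # recombined one-step-ahead forecast
```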
Some response-adaptive randomization methods commonly found in literature are included in this package. These methods include the randomized play-the-winner rule for binary endpoint (Wei and Durham (1978) <doi:10.2307/2286290>), the doubly adaptive biased coin design with minimal variance strategy for binary endpoint (Atkinson and Biswas (2013) <doi:10.1201/b16101>, Rosenberger and Lachin (2015) <doi:10.1002/9781118742112>) and maximal power strategy targeting Neyman allocation for binary endpoint (Tymofyeyev, Rosenberger, and Hu (2007) <doi:10.1198/016214506000000906>) and RSIHR allocation with each letter representing the first character of the names of the individuals who first proposed this rule (Youngsook and Hu (2010) <doi:10.1198/sbr.2009.0056>, Bello and Sabo (2016) <doi:10.1080/00949655.2015.1114116>), A-optimal Allocation for continuous endpoint (Sverdlov and Rosenberger (2013) <doi:10.1080/15598608.2013.783726>), Aa-optimal Allocation for continuous endpoint (Sverdlov and Rosenberger (2013) <doi:10.1080/15598608.2013.783726>), generalized RSIHR allocation for continuous endpoint (Atkinson and Biswas (2013) <doi:10.1201/b16101>), Bayesian response-adaptive randomization with a control group using the Thall & Wathen method for binary and continuous endpoints (Thall and Wathen (2007) <doi:10.1016/j.ejca.2007.01.006>) and the forward-looking Gittins index rule for binary and continuous endpoints (Villar, Wason, and Bowden (2015) <doi:10.1111/biom.12337>, Williamson and Villar (2019) <doi:10.1111/biom.13119>).
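As an illustration of the simplest of these rules, here is the randomized play-the-winner urn written from its textbook description (base R only, not this package's implementation):

```r
# Draw a ball to assign treatment; a success adds a ball of the same colour,
# a failure adds a ball of the opposite colour.
set.seed(1)
p_success <- c(A = 0.7, B = 0.4)   # true (unknown) response rates
urn <- c(A = 1, B = 1)             # initial urn composition

assignments <- character(100)
for (i in seq_along(assignments)) {
  arm <- sample(names(urn), 1, prob = urn)
  success <- runif(1) < p_success[arm]
  other <- setdiff(names(urn), arm)
  urn[if (success) arm else other] <- urn[if (success) arm else other] + 1
  assignments[i] <- arm
}
table(assignments)                 # allocation drifts toward the better arm
```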
Addressing measurement error in covariates and misclassification in binary outcome variables within causal inference, the ATE.ERROR package implements inverse probability weighted estimation methods proposed by Shu and Yi (2017, <doi:10.1177/0962280217743777>; 2019, <doi:10.1002/sim.8073>). These methods correct errors to accurately estimate average treatment effects (ATE). The package includes two main functions: ATE.ERROR.Y() for handling misclassification in the outcome variable, and ATE.ERROR.XY() for correcting both outcome misclassification and covariate measurement error. It employs logistic regression for treatment assignment and uses bootstrap sampling to calculate standard errors and confidence intervals, with simulated datasets provided for practical demonstration.
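For orientation, the error-free inverse probability weighted ATE that these methods extend looks like this in base R (simulated data; not the package's functions):

```r
set.seed(1)
n <- 500
x <- rnorm(n)
tr <- rbinom(n, 1, plogis(0.5 * x))             # treatment indicator
y <- rbinom(n, 1, plogis(-0.5 + tr + 0.3 * x))  # binary outcome

e <- fitted(glm(tr ~ x, family = binomial))     # logistic propensity scores
mean(tr * y / e) - mean((1 - tr) * y / (1 - e)) # IPW estimate of the ATE
```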