This package provides comprehensive cytokine profiling analysis through quality control using biologically meaningful cutoffs on raw cytokine measurements and by testing for distributional symmetry to recommend appropriate transformations. Offers exploratory data analysis with summary statistics, enhanced boxplots, and barplots, along with univariate and multivariate analytical capabilities for in-depth cytokine profiling such as Principal Component Analysis based on Andrzej Maćkiewicz and Waldemar Ratajczak (1993) <doi:10.1016/0098-3004(93)90090-R>, Sparse Partial Least Squares Discriminant Analysis based on Lê Cao K-A, Boitard S, and Besse P (2011) <doi:10.1186/1471-2105-12-253>, Random Forest based on Breiman, L. (2001) <doi:10.1023/A:1010933404324>, and Extreme Gradient Boosting based on Tianqi Chen and Carlos Guestrin (2016) <doi:10.1145/2939672.2939785>.
Also abbreviates to "CCSeq". Finds clusters of colocalized sequences in .bed annotation files up to a specified cut-off distance. Two sequences are colocalized if they are within the cut-off distance of each other, and clusters are sets of sequences where each sequence is colocalized to at least one other sequence in the cluster. For a set of .bed annotation tables provided in a list along with a cut-off distance, the program will output a file containing the locations of each cluster. Annotated .bed files are from the pwmscan application at <https://ccg.epfl.ch/pwmtools/pwmscan.php>. Personal machines might crash or take excessively long depending on the number of annotated sequences in each file and whether chromsearch() or gensearch() is used.
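The clustering rule described above can be sketched as a single sorted sweep per chromosome. This is a minimal illustration, not the package's implementation: the function name `find_clusters`, the tuple layout, and the choice of measuring distance as the gap between the end of one interval and the start of the next are all assumptions made for the example.

```python
from collections import defaultdict


def find_clusters(records, cutoff):
    """Group intervals into clusters of colocalized sequences.

    records: iterable of (chrom, start, end) tuples, e.g. rows of .bed files.
    cutoff:  largest gap for which two intervals count as colocalized
             (assumed definition: start of one minus end of the other).
    Returns a list of (chrom, cluster_start, cluster_end, n_members).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in records:
        by_chrom[chrom].append((start, end))

    clusters = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        count = 1
        for start, end in intervals[1:]:
            if start - cur_end <= cutoff:
                # Close enough to the running cluster: extend it.
                cur_end = max(cur_end, end)
                count += 1
            else:
                # A lone sequence has no colocalized partner, so skip it.
                if count > 1:
                    clusters.append((chrom, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        if count > 1:
            clusters.append((chrom, cur_start, cur_end, count))
    return clusters


records = [
    ("chr1", 100, 110), ("chr1", 115, 125), ("chr1", 500, 510),
    ("chr2", 10, 20), ("chr2", 22, 30), ("chr2", 33, 40),
]
clusters = find_clusters(records, cutoff=10)
```

Because intervals are sorted by start, comparing each new start against the furthest end seen so far is enough to recover the connected groups under this gap definition.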
This package provides a regression framework for response variables which are continuous self-rating scales such as the Visual Analog Scale (VAS) used in pain assessment, or the Linear Analog Self-Assessment (LASA) scales in quality of life studies. These scales measure subjects' perception of an intangible quantity, and cannot be handled as ratio variables because of their inherent non-linearity. We treat them as ordinal variables, measured on a continuous scale. A function (the g function) connects the scale with an underlying continuous latent variable. The link function is the inverse of the CDF of the assumed underlying distribution of the latent variable. A variety of link functions are currently implemented. Such models are described in Manuguerra et al (2020) <doi:10.18637/jss.v096.i08>.
Includes functions for the construction of matched samples that are balanced and representative by design. Among others, these functions can be used for matching in observational studies with treated and control units, with cases and controls, in related settings with instrumental variables, and in discontinuity designs. Also, they can be used for the design of randomized experiments, for example, for matching before randomization. By default, designmatch uses the highs optimization solver, but its performance is greatly enhanced by the Gurobi optimization solver and its associated R interface. For their installation, please follow the instructions at <https://www.gurobi.com/documentation/quickstart.html> and <https://www.gurobi.com/documentation/7.0/refman/r_api_overview.html>. We have also included directions in the gurobi_installation file in the inst folder.
High performance trainers for parameterizing and clustering weighted data. The Gaussian mixture (GM) module includes the conventional EM (expectation maximization) trainer, the component-wise EM trainer, and the minimum-message-length EM trainer by Figueiredo and Jain (2002) <doi:10.1109/34.990138>. These trainers accept additional constraints on mixture weights, on covariance eigen ratios, and on which mixture components are subject to update. The K-means (KM) module offers clustering with the options of (i) deterministic and stochastic K-means++ initializations, (ii) upper bounds on cluster weights (sizes), (iii) Minkowski distances, (iv) cosine dissimilarity, (v) dense and sparse representation of data input. The package improves on typical implementations of GM and KM algorithms in various aspects. It is carefully crafted in multithreaded C++ for modeling large data for industry use.
Interlinearized glossed texts (IGT) are used in descriptive linguistics for representing a morphological analysis of a text through a morpheme-by-morpheme gloss. InterlineaR provides a set of functions that target several popular formats of IGT ('SIL Toolbox', 'EMELD XML') and that turn an IGT into a set of data frames following a relational model (the tables represent the different linguistic units: texts, sentences, words, morphemes). The same pieces of software ('SIL FLEX', 'SIL Toolbox') typically produce dictionaries of the morphemes used in the glosses. InterlineaR provides a function for turning the LIFT XML dictionary format into a set of data frames following a relational model in order to represent the dictionary entries, the sense(s) attached to the entries, the example(s) attached to senses, etc.
This package provides an efficient method to recover the missing block of an approximately low-rank matrix. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design [Cai T, Cai TT, Zhang A (2016) <doi:10.1080/01621459.2015.1021005>]. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. The main function in our package, smc.FUN(), is for recovery of the missing block A22 of an approximately low-rank matrix A given the other blocks A11, A12, A21.
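As background for the block recovery described above, the idealized noise-free case can be written down directly. The identity below holds when A is exactly low-rank and A11 already has the full rank of A. It is offered only as intuition; it is not claimed to be the estimator that smc.FUN() implements for the approximately low-rank setting.

```latex
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},
\qquad
\operatorname{rank}(A) = \operatorname{rank}(A_{11}) = r
\;\Longrightarrow\;
A_{22} = A_{21}\, A_{11}^{\dagger}\, A_{12}
```

Here the dagger denotes the Moore-Penrose pseudoinverse. The rank condition means every row of the lower blocks is a linear combination of rows of the upper blocks, which is what allows A22 to be determined from the three observed blocks.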
This package provides a framework to infer causality on binary data using techniques in frequent pattern mining and estimation statistics. Given a set of individual vectors S={x} where x(i) is a realization value of binary variable i, the framework infers empirical causal relations of binary variables i,j from S in the form of a causal graph G=(V,E) where V is a set of nodes representing binary variables and there is an edge from i to j in E if the variable i causes j. The framework determines dependency among variables and analyzes confounding factors before deciding whether i causes j. The publication of this package is at Chainarong Amornbunchornvej, Navaporn Surasvadi, Anon Plangprasopchok, and Suttipong Thajchayapong (2023) <doi:10.1016/j.heliyon.2023.e15947>.
Fits single-species (univariate) and multi-species (multivariate) non-spatial and spatial abundance models in a Bayesian framework using Markov Chain Monte Carlo (MCMC). Spatial models are fit using Nearest Neighbor Gaussian Processes (NNGPs). Details on NNGP models are given in Datta, Banerjee, Finley, and Gelfand (2016) <doi:10.1080/01621459.2015.1044091> and Finley, Datta, and Banerjee (2022) <doi:10.18637/jss.v103.i05>. Fits single-species and multi-species spatial and non-spatial versions of generalized linear mixed models (Gaussian, Poisson, Negative Binomial), N-mixture models (Royle 2004 <doi:10.1111/j.0006-341X.2004.00142.x>) and hierarchical distance sampling models (Royle, Dawson, Bates (2004) <doi:10.1890/03-3127>). Multi-species spatial models are fit using a spatial factor modeling approach with NNGPs for computational efficiency.
Implementation of the bootstrapping approach for the estimation of clustering stability and its application in estimating the number of clusters, as introduced by Yu et al (2016) <doi:10.1142/9789814749411_0007>. Implementation of the non-parametric bootstrap approach to assessing the stability of module detection in a graph, the extension for the selection of a parameter set that defines a graph from data in a way that optimizes stability and the corresponding visualization functions, as introduced by Tian et al (2021) <doi:10.1002/sam.11495>. Implementation of the out-of-bag stability estimation function and the k-select Smin-based k-selection function, as introduced by Liu et al (2022) <doi:10.1002/sam.11593>. Implementation of an ensemble clustering method based on k-means clustering, spectral clustering, and hierarchical clustering.
Pharmacokinetics is the study of drug absorption, distribution, metabolism, and excretion. A pharmacokinetic model explains how the drug concentration changes as the drug moves through the different compartments of the body. For pharmacokinetic modeling and analysis, it is essential to understand the basic pharmacokinetic parameters. All parameters are considered, but only some of the parameters are used in the model. Therefore, we need to convert the estimated parameters to the other parameters after fitting the specific pharmacokinetic model. This package is developed to help with this conversion. For a more detailed explanation of pharmacokinetic parameters, see "Gabrielsson and Weiner" (2007), "ISBN-10: 9197651001"; "Benet and Zia-Amirhosseini" (1995) <doi:10.1177/019262339502300203>; "Mould and Upton" (2012) <doi:10.1038/psp.2012.4>; "Mould and Upton" (2013) <doi:10.1038/psp.2013.14>.
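To make the idea of parameter conversion concrete, here is a small sketch for the simplest case, a one-compartment model with intravenous bolus dosing. The function name, argument names, and units are hypothetical and chosen for illustration. The relationships used (clearance = elimination rate constant × volume, half-life = ln 2 / rate constant, AUC = dose / clearance) are the standard textbook ones, not a transcription of the package's functions.

```python
import math


def one_compartment_summary(volume_l, k_elim_per_h, dose_mg):
    """Derive secondary parameters from fitted V and k (IV bolus, one compartment)."""
    clearance = k_elim_per_h * volume_l      # CL = k * V, in L/h
    half_life = math.log(2) / k_elim_per_h   # t1/2 = ln(2) / k, in h
    auc = dose_mg / clearance                # AUC(0-inf) = Dose / CL, in mg*h/L
    return {
        "clearance_l_per_h": clearance,
        "half_life_h": half_life,
        "auc_mg_h_per_l": auc,
    }


params = one_compartment_summary(volume_l=20.0, k_elim_per_h=0.1, dose_mg=100.0)
```

A model fitted in terms of volume and rate constant can thus be reported in terms of clearance and half-life without refitting.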
This package performs repeated nested cross-validation for Cox Proportional Hazards, Cox Lasso, Survival Random Forest, and their ensemble. Returns the internally validated concordance index, time-dependent area under the curve, Brier score, calibration slope, and a statistical test of whether the non-linear ensemble outperforms the baseline Cox model. In this way, it helps researchers to quantify the gain of using a more complex survival model, or to justify its redundancy. Equally, it shows the performance value of the non-linear and interaction terms, and may highlight the need for further feature transformation. Further details can be found in Shamsutdinova, Stamate, Roberts, & Stahl (2022) "Combining Cox Model and Tree-Based Algorithms to Boost Performance and Preserve Interpretability for Health Outcomes" <doi:10.1007/978-3-031-08337-2_15>, where the method is described as Ensemble 1.
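One of the metrics listed above, the concordance index, is easy to illustrate on its own. The sketch below is a plain quadratic-time version of Harrell's C for right-censored data. The function name is hypothetical, pairs with tied times are simply ignored (an assumption made for brevity), and it is not the package's validated implementation.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    times:  observed follow-up times
    events: 1 if the event was observed, 0 if censored
    risks:  predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


c_index = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.7, 0.8, 0.1],
)
```

In this toy example five pairs are comparable and four are ordered correctly, so the index is 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.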
This package provides robust and efficient methods for estimating causal effects in a target population using a multi-source dataset, including those of Dahabreh et al. (2019) <doi:10.1111/biom.13716>, Robertson et al. (2021) <doi:10.48550/arXiv.2104.05905>, and Wang et al. (2024) <doi:10.48550/arXiv.2402.02684>. The multi-source data can be a collection of trials, observational studies, or a combination of both, which have the same data structure (outcome, treatment, and covariates). The target population can be based on an internal dataset or an external dataset where only covariate information is available. The causal estimands available are average treatment effects and subgroup treatment effects. See Wang et al. (2025) <doi:10.1017/rsm.2025.5> for a detailed guide on using the package.
Computation of dendrometric and structural parameters from forest inventory data. The objective is to provide a user-friendly R package for researchers, ecologists, foresters, statisticians, loggers and other persons who deal with forest inventory data. Useful conversion of angle value from degree to radian, conversion from angle to slope (in percentage) and their reciprocals as well as principal angle determination are also included. Position and dispersion parameters usually found in forest studies are implemented. The package contains Fibonacci series, its extensions and the Golden Number computation. Useful references are Arcadius Y. J. Akossou, Soufianou Arzouma, Eloi Y. Attakpa, Noël H. Fonton and Kouami Kokou (2013) <doi:10.3390/d5010099> and W. Bonou, R. Glele Kakaï, A.E. Assogbadjo, H.N. Fonton, B. Sinsin (2009) <doi:10.1016/j.foreco.2009.05.032>.
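The angle and series utilities mentioned above are simple enough to sketch. The function names below are hypothetical and do not mirror the package's API. The principal angle is assumed to be the equivalent angle in the range [0, 360), and the Fibonacci series is assumed to start at 0, 1.

```python
import math


def angle_to_slope_percent(angle_deg):
    """Slope in percent for an inclination angle given in degrees."""
    return 100.0 * math.tan(math.radians(angle_deg))


def slope_percent_to_angle(slope_percent):
    """Inclination angle in degrees for a slope given in percent."""
    return math.degrees(math.atan(slope_percent / 100.0))


def principal_angle(angle_deg):
    """Equivalent angle in [0, 360) degrees (assumed convention)."""
    return angle_deg % 360.0


def fibonacci(n):
    """First n terms of the Fibonacci series, starting 0, 1."""
    terms = []
    a, b = 0, 1
    for _ in range(n):
        terms.append(a)
        a, b = b, a + b
    return terms


GOLDEN_NUMBER = (1.0 + math.sqrt(5.0)) / 2.0

slope = angle_to_slope_percent(45.0)
series = fibonacci(8)
```

A 45 degree incline corresponds to a 100 percent slope, and the ratio of successive Fibonacci terms approaches the golden number.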
This package provides functions for evaluating the stability of low-dimensional embeddings and cluster assignments in single-cell RNA sequencing (scRNA-seq) datasets. Starting from a principal component analysis (PCA) object, users can generate multiple replicates of t-Distributed Stochastic Neighbor Embedding (t-SNE) or Uniform Manifold Approximation and Projection (UMAP) embeddings. Embedding stability is quantified by computing pairwise Kendall's Tau correlations across replicates and summarizing the distribution of correlation coefficients. In addition to dimensionality reduction, scStability assesses clustering consistency using either Louvain or Leiden algorithms and calculating the Normalized Mutual Information (NMI) between all pairs of cluster assignments. For background on UMAP and t-SNE algorithms, see McInnes et al. (2020, <doi:10.21105/joss.00861>) and van der Maaten & Hinton (2008, <https://github.com/lvdmaaten/bhtsne>), respectively.
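The pairwise NMI comparison described above can be sketched in a few lines. This is an illustrative implementation only: the function name is hypothetical, and the normalization by the arithmetic mean of the two entropies is an assumption, since several NMI variants exist and the source does not say which one scStability uses.

```python
import math
from collections import Counter
from itertools import combinations


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two cluster assignments of the same cells."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label vectors must be non-empty and equal length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    mutual_info = 0.0
    for (a, b), n_ab in joint.items():
        mutual_info += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))

    entropy_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if entropy_a == 0.0 and entropy_b == 0.0:
        return 1.0  # both runs put every cell in a single cluster
    return 2.0 * mutual_info / (entropy_a + entropy_b)


# Three hypothetical clustering replicates of four cells.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
scores = [
    normalized_mutual_information(a, b) for a, b in combinations(runs, 2)
]
```

The first two replicates differ only in label names, so their NMI is 1, while the third splits the cells independently of the first two and scores 0. Summarizing such pairwise scores over many replicates gives a picture of clustering consistency.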
This package provides a framework for automated machine learning. Concretely, the focus is on the optimisation of bagging workflows. A bagging workflow is composed of three phases: (i) generation: which and how many predictive models to learn; (ii) pruning: after learning a set of models, the worst ones are cut off from the ensemble; and (iii) integration: how the models are combined for predicting a new observation. autoBagging optimises these processes by combining metalearning and a learning to rank approach to learn from metadata. It automatically ranks 63 bagging workflows by exploiting past performance and dataset characterization. A complete description of the method can be found in: Pinto, F., Cerqueira, V., Soares, C., Mendes-Moreira, J. (2017): "autoBagging: Learning to Rank Bagging Workflows with Metalearning" arXiv preprint arXiv:1706.09367.
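The three phases of a bagging workflow can be illustrated with a deliberately tiny example. Everything here is a toy assumption: the base "model" just predicts the mean target of its bootstrap sample, pruning keeps the models with the lowest validation error, and integration is a plain average. The sketch shows the shape of one workflow only; it does not reproduce the metalearning and ranking that autoBagging uses to choose among workflows.

```python
import random
from statistics import mean


def fit_mean_model(sample):
    """Toy base learner: always predicts the mean target of its sample."""
    fitted_mean = mean(y for _, y in sample)
    return lambda x: fitted_mean


def bagging_workflow(train, valid, n_models=10, keep=5, seed=0):
    rng = random.Random(seed)

    # (i) Generation: one model per bootstrap resample of the training data.
    models = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]
        models.append(fit_mean_model(sample))

    # (ii) Pruning: keep the models with the lowest validation error.
    def valid_error(model):
        return mean((model(x) - y) ** 2 for x, y in valid)

    kept = sorted(models, key=valid_error)[:keep]

    # (iii) Integration: average the predictions of the surviving models.
    def ensemble(x):
        return mean(model(x) for model in kept)

    return kept, ensemble


train = [(x, 2.0 * x) for x in range(10)]
valid = [(x, 2.0 * x) for x in range(10, 13)]
kept, ensemble = bagging_workflow(train, valid)
prediction = ensemble(5)
```

Each phase has several interchangeable options (how many models, which pruning rule, which combination rule), and a particular combination of choices is one workflow.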
This package provides a (mildly) opinionated set of functions to help assess medication adherence for researchers working with medication claims data. Medication adherence analyses have several complex steps that are often convoluted and can be time-intensive. The focus is to create a set of functions using "tidy principles" geared towards transparency, speed, and flexibility while working with adherence metrics. All functions perform exactly one task with an intuitive name so that a researcher can handle details (often achieved with vectorized solutions) while we handle non-vectorized tasks common to most adherence calculations such as adjusting fill dates and determining episodes of care. The methodologies referenced in this package come from Canfield SL, et al (2019) "Navigating the Wild West of Medication Adherence Reporting in Specialty Pharmacy" <doi:10.18553/jmcp.2019.25.10.1073>.
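Adjusting fill dates is the kind of sequential, non-vectorized step mentioned above: each fill's start depends on when the previous supply ran out. The sketch below shows one common convention, in which an early refill is pushed forward to the day after the previous supply ends, followed by a proportion-of-days-covered calculation. The function names, the shifting rule, and the use of proportion of days covered as the metric are assumptions for illustration, not the package's API.

```python
from datetime import date, timedelta


def adjust_fill_dates(fills):
    """Shift overlapping fills forward so supplies do not overlap.

    fills: list of (fill_date, days_supply) for one patient and drug.
    Returns a list of (adjusted_start, end) with inclusive end dates.
    """
    adjusted = []
    prev_end = None
    for fill_date, days_supply in sorted(fills):
        start = fill_date
        if prev_end is not None and start <= prev_end:
            # Early refill: assume the new supply starts once the old one ends.
            start = prev_end + timedelta(days=1)
        end = start + timedelta(days=days_supply - 1)
        adjusted.append((start, end))
        prev_end = end
    return adjusted


def proportion_days_covered(fills, window_start, window_end):
    """Share of days in the window with medication on hand."""
    covered = 0
    for start, end in adjust_fill_dates(fills):
        first = max(start, window_start)
        last = min(end, window_end)
        if first <= last:
            covered += (last - first).days + 1
    return covered / ((window_end - window_start).days + 1)


fills = [
    (date(2024, 1, 1), 30),
    (date(2024, 1, 25), 30),   # refilled six days early
    (date(2024, 3, 15), 30),   # refilled after a gap
]
adjusted = adjust_fill_dates(fills)
pdc = proportion_days_covered(fills, date(2024, 1, 1), date(2024, 3, 30))
```

In this example the second fill is moved from 25 January to 31 January, and 76 of the 90 days in the window are covered.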
This package provides a collection of functions for exploratory chemometrics of 2D spectroscopic data sets such as COSY (correlated spectroscopy) and HSQC (heteronuclear single quantum coherence) 2D NMR (nuclear magnetic resonance) spectra. ChemoSpec2D deploys methods aimed primarily at classification of samples and the identification of spectral features which are important in distinguishing samples from each other. Each 2D spectrum (a matrix) is treated as the unit of observation, and thus the physical sample in the spectrometer corresponds to the sample from a statistical perspective. In addition to chemometric tools, a few tools are provided for plotting 2D spectra, but these are not intended to replace the functionality typically available on the spectrometer. ChemoSpec2D takes many of its cues from ChemoSpec and tries to create consistent graphical output and to be very user friendly.
This package creates and fits staged event tree probability models, which are probabilistic graphical models capable of representing asymmetric conditional independence statements for categorical variables. Includes functions to create, plot and fit staged event trees from data, as well as many efficient structure learning algorithms. References: Carli F, Leonelli M, Riccomagno E, Varando G (2022) <doi:10.18637/jss.v102.i06>. Collazo R. A., Görgen C. and Smith J. Q. (2018, ISBN:9781498729604). Görgen C., Bigatti A., Riccomagno E. and Smith J. Q. (2018) <arXiv:1705.09457>. Thwaites P. A., Smith, J. Q. (2017) <arXiv:1510.00186>. Barclay L. M., Hutton J. L. and Smith J. Q. (2013) <doi:10.1016/j.ijar.2013.05.006>. Smith J. Q. and Anderson P. E. (2008) <doi:10.1016/j.artint.2007.05.004>.
This package contains tools for survey statistics (especially in educational assessment) for datasets with replication designs (jackknife, bootstrap, replicate weights; see Kolenikov, 2010; Pfefferman & Rao, 2009a, 2009b, <doi:10.1016/S0169-7161(09)70003-3>, <doi:10.1016/S0169-7161(09)70037-9>; Shao, 1996, <doi:10.1080/02331889708802523>). Descriptive statistics, linear and logistic regression, path models for manifest variables with measurement error correction and two-level hierarchical regressions for weighted samples are included. Statistical inference can be conducted for multiply imputed datasets and nested multiply imputed datasets and is particularly suited for the analysis of plausible values (for details see George, Oberwimmer & Itzlinger-Bruneforth, 2016; Bruneforth, Oberwimmer & Robitzsch, 2016; Robitzsch, Pham & Yanagida, 2016). The package development was supported by BIFIE (Federal Institute for Educational Research, Innovation and Development of the Austrian School System; Salzburg, Austria).
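The replication idea behind these designs can be shown with the simplest variant, the unweighted delete-one jackknife. The sketch below is an assumption-laden simplification: real survey designs use replicate weights and zone structures that are not modelled here, and the function name is hypothetical.

```python
import math


def jackknife_se(values, statistic):
    """Delete-one jackknife standard error of an arbitrary statistic."""
    n = len(values)
    # Recompute the statistic with each observation left out in turn.
    replicates = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    replicate_mean = sum(replicates) / n
    variance = (n - 1) / n * sum((r - replicate_mean) ** 2 for r in replicates)
    return math.sqrt(variance)


def sample_mean(xs):
    return sum(xs) / len(xs)


scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
se_mean = jackknife_se(scores, sample_mean)
```

For the sample mean, this reproduces the familiar standard error s / sqrt(n). The benefit of replication methods is that the same recipe applies to statistics without a closed-form variance, such as regression coefficients.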
This package provides a set of procedures for parametric and non-parametric modelling of the dependence structure of multivariate extreme-values. The statistical inference is performed with non-parametric estimators, likelihood-based estimators and Bayesian techniques. It adapts the methodologies of Beranger and Padoan (2015) <doi:10.48550/arXiv.1508.05561>, Marcon et al. (2016) <doi:10.1214/16-EJS1162>, Marcon et al. (2017) <doi:10.1002/sta4.145>, Marcon et al. (2017) <doi:10.1016/j.jspi.2016.10.004> and Beranger et al. (2021) <doi:10.1007/s10687-019-00364-0>. This package also allows for the modelling of spatial extremes using flexible max-stable processes. It provides simulation algorithms and fitting procedures relying on the Stephenson-Tawn likelihood as per Beranger et al. (2021) <doi:10.1007/s10687-020-00376-1>.
Finite element modeling (FEM) uses meshes of triangles to define surfaces. A surface within a triangle may be either linear or quadratic. In the order one case each node in the mesh is associated with a basis function and the basis is called the order one finite element basis. In the order two case each edge mid-point is also associated with a basis function. Functions are provided for smoothing, density function estimation, point evaluation, and plotting results. Two papers illustrating the finite element data analysis are Sangalli, L.M., Ramsay, J.O., Ramsay, T.O. (2013) <http://www.mox.polimi.it/~sangalli> and Bernardi, M.S, Carey, M., Ramsay, J. O., Sangalli, L. (2018) "Modelling spatial anisotropy via regression with partial differential regularization", Journal of Multivariate Analysis, 167, 15-30 <http://www.mox.polimi.it/~sangalli>.
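In the order one case, the three basis functions restricted to a single triangle are its barycentric coordinates, so evaluating a linear surface at a point reduces to a weighted average of the three nodal values. The sketch below illustrates this for one triangle. The function names are hypothetical and no mesh handling (such as locating the triangle that contains a point) is included.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of point p in triangle (a, b, c)."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    l1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    l2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return l1, l2, 1.0 - l1 - l2


def evaluate_linear(p, vertices, nodal_values):
    """Value at p of the linear surface defined by values at the vertices."""
    weights = barycentric(p, *vertices)
    return sum(w * v for w, v in zip(weights, nodal_values))


vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# Nodal values of f(x, y) = 2x + 3y + 1 at the three vertices.
nodal_values = [1.0, 3.0, 4.0]
value = evaluate_linear((0.25, 0.25), vertices, nodal_values)
```

Each basis function equals 1 at its own node and 0 at the other two, which is why a linear function is reproduced exactly from its nodal values. The order two case adds one basis function per edge mid-point and is not covered by this sketch.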
This package provides a comprehensive framework for building, evaluating, and visualizing regression models for analyzing viral load and CD4 (Cluster of Differentiation 4) lymphocytes data. It leverages the principles of the tidymodels ecosystem of Max Kuhn and Hadley Wickham (2020) <https://www.tidymodels.org> to offer a user-friendly experience in model development. This package includes functions for data preprocessing, feature engineering, model training, tuning, and evaluation, along with visualization tools to enhance the interpretation of model results. It is specifically designed for researchers in biostatistics, computational biology, and HIV research who aim to perform reproducible and rigorous analyses to gain insights into disease dynamics. The main focus is on improving the understanding of the relationships between viral load, CD4 lymphocytes, and other relevant covariates to contribute to HIV research and the visibility of vulnerable seropositive populations.
This package implements an approach aimed at assessing the accuracy and effectiveness of raw scores obtained in scales that contain locally dependent items. The program uses as input the calibration (structural) item estimates obtained from fitting extended unidimensional factor-analytic solutions in which the existing local dependencies are included. Measures of reliability (Omega) and information are proposed at three levels: (a) total score, (b) bivariate-doublet, and (c) item-by-item deletion, and are compared to those that would be obtained if all the items had been locally independent. All the implemented procedures can be obtained from: (a) linear factor-analytic solutions in which the item scores are treated as approximately continuous, and (b) non-linear solutions in which the item scores are treated as ordered-categorical. A detailed guide can be obtained at the following url.