CLUster Evaluation (CLUE) is a computational method for identifying the optimal number of clusters in a given time-course dataset clustered by cmeans or kmeans algorithms, and subsequently identifying key kinases or pathways from each cluster. Its implementation in R is called ClueR. See README on <https://github.com/PYangLab/ClueR> for more details. P Yang et al. (2015) <doi:10.1371/journal.pcbi.1004403>.
Compute distributional quantities for an Integrated Gamma (IG) or Integrated Gamma Limit (IGL) copula, such as a cdf and density. Compute corresponding conditional quantities such as the cdf and quantiles. Generate data from an IG or IGL copula. See the vignette for formulas, or for a derivation, see Coia, V (2017) "Forecasting of Nonlinear Extreme Quantiles Using Copula Models." PhD Dissertation, The University of British Columbia.
Gaussian process regression with an emphasis on kernels. Quantitative and qualitative inputs are accepted. Some pre-defined kernels are available, such as radial or tensor-sum for quantitative inputs, and compound symmetry, low rank, and group kernel for qualitative inputs. The user can define new kernels and composite kernels through a formula mechanism. Useful methods include parameter estimation by maximum likelihood, simulation, prediction and leave-one-out validation.
Fit and analysis of finite Mixtures of Mallows models with Spearman Distance for full and partial rankings with arbitrary missing positions. Inference is conducted within the maximum likelihood framework via Expectation-Maximization algorithms. Estimation uncertainty is tackled via diverse versions of bootstrapped and asymptotic confidence intervals. The most relevant reference for the methods is Crispino, Mollica, Astuti and Tardella (2023) <doi:10.1007/s11222-023-10266-8>.
This package provides a new method for clustering samples from multiple-modality data. The function M2SMF() jointly factorizes multiple similarity matrices into a shared sub-matrix and several modality-private sub-matrices, which are further used for clustering. Along with this method, we also provide functions to calculate the similarity matrix and to evaluate the best cluster number from the original data.
Model fitting, sampling and visualization for the (Hidden) Markov Random Field model with pairwise interactions and general interaction structure from Freguglia, Garcia & Bicas (2020) <doi:10.1002/env.2613>, which has many popular models used in 2-dimensional lattices as particular cases, like the Ising Model and Potts Model. A complete manuscript describing the package is available in Freguglia & Garcia (2022) <doi:10.18637/jss.v101.i08>.
This package provides some easy-to-use functions for time series analyses of (plant-) phenological data sets. These functions mainly deal with the estimation of combined phenological time series and are usually wrappers for functions that are already implemented in other R packages adapted to the special structure of phenological data and the needs of phenologists. Some date conversion functions to handle Julian dates are also provided.
Various quantile-based clustering algorithms: algorithm CU (Common theta and Unscaled variables), algorithm CS (Common theta and Scaled variables through lambda_j), algorithm VU (Variable-wise theta_j and Unscaled variables) and algorithm VW (Variable-wise theta_j and Scaled variables through lambda_j). Hennig, C., Viroli, C., Anderlucci, L. (2019) "Quantile-based clustering." Electronic Journal of Statistics. 13 (2) 4849 - 4883 <doi:10.1214/19-EJS1640>.
An implementation of the stratification index proposed by Zhou (2012) <DOI:10.1177/0081175012452207>. The package provides two functions, srank, which returns stratum-specific information, including population share and average percentile rank; and strat, which returns the stratification index and its approximate standard error. When a grouping factor is specified, strat also provides a detailed decomposition of the overall stratification into between-group and within-group components.
Inspired by the art and color research of Sanzo Wada (1883-1967), his "Dictionary Of Color Combinations" (2011, ISBN:978-4861522475), and the interactive site by Dain M. Blodorn Kim <https://github.com/dblodorn/sanzo-wada>, this package brings Wada's color combinations to R for easy use in data visualizations. This package honors 60 of Wada's color combinations: 20 duos, 20 trios, and 20 quads.
This package contains methods for the simulation of positive tempered stable distributions and related subordinators, including classical tempered stable, rapidly decreasing tempered stable, truncated stable, truncated tempered stable, generalized Dickman, truncated gamma, generalized gamma, and p-gamma. For details, see Dassios et al. (2019) <doi:10.1017/jpr.2019.6>, Dassios et al. (2020) <doi:10.1145/3368088>, Grabchak (2021) <doi:10.1016/j.spl.2020.109015>.
This package provides tools to simulate and analyze survival data with interval-, left-, right-, and uncensored observations under common parametric distributions, including "Weibull", "Exponential", "Log-Normal", "Log-Logistic", "Gamma", "Gompertz", "Normal", "Logistic", and "EMV". The package supports both direct maximum likelihood estimation and imputation-based methods, making it suitable for methodological research, simulation benchmarking, and teaching. A web-based companion app is also available for demonstration purposes.
This package provides a pipeline for estimating the average treatment effect (ATE) via semi-supervised learning. Outcome regression is fit with cross-fitting using various machine learning methods or a user-customized function. Doubly robust ATE estimation leverages both labeled and unlabeled data under a semi-supervised missing-data framework. For more details see Hou et al. (2021) <doi:10.48550/arxiv.2110.12336>. A detailed vignette is included.
This package contains R functions for simulating and estimating integer-valued trawl processes as described in the article Veraart (2019), "Modeling, simulation and inference for multivariate time series of counts using trawl processes", Journal of Multivariate Analysis, 169, pages 110-129, <doi:10.1016/j.jmva.2018.08.012>, and for simulating random vectors from the bivariate negative binomial and the bi- and trivariate logarithmic series distributions.
HERON is a software package for analyzing peptide binding array data. In addition to identifying significant binding probes, HERON also provides functions for finding epitopes (strings of consecutive peptides within a protein). HERON also calculates significance on the probe, epitope, and protein level by employing meta p-value methods. HERON is designed for obtaining calls on the sample level and calculates fractions of hits for different conditions.
This package performs outlier detection of sequences in a multiple sequence alignment using bootstrap of predefined distance metrics. Outlier sequences can make downstream analyses unreliable or make the alignments less accurate while they are being constructed. This package implements the OD-seq algorithm proposed by Jehl et al. <doi:10.1186/s12859-015-0702-1> for aligned sequences and a variant using string kernels for unaligned sequences.
Command line tool to extract the main content from a webpage, as done by the "Reader View" feature of most modern browsers. It's intended to be used with terminal RSS readers, to make the articles more readable on web browsers such as lynx. The code is closely adapted from the Firefox version and the output is expected to be mostly equivalent.
We implemented a Bayesian-statistics approach for subtraction of incoherent scattering from neutron total-scattering data. In this approach, the estimated background signal associated with incoherent scattering maximizes the posterior probability, which combines the likelihood of this signal in reciprocal and real spaces with a prior that favors smooth lines. A description of the approach can be found in Gagin and Levin (2014) <DOI:10.1107/S1600576714023796>.
This package performs cluster analysis using an ensemble clustering framework, Chiu & Talhouk (2018) <doi:10.1186/s12859-017-1996-y>. Results from a diverse set of algorithms are pooled together using methods such as majority voting, K-Modes, LinkCluE, and CSPA. There are options to compare cluster assignments across algorithms using internal and external indices, visualizations such as heatmaps, and significance testing for the existence of clusters.
Maximum likelihood estimation of an extended class of row-column (RC) association models for two-dimensional contingency tables, which are formulated by a condition of reduced rank on a matrix of extended association parameters; see Forcina (2019) <arXiv:1910.13848>. These parameters are defined by choosing the logit type for the row and column variables among four different options and a transformation derived from suitable divergence measures.
Functionalities for modelling functional data with multidimensional inputs, multivariate functional data, and non-separable and/or non-stationary covariance structure of function-valued processes. In addition, there are functionalities for functional regression models where the mean function depends on scalar and/or functional covariates and the covariance structure depends on functional covariates. The development version of the package can be found on <https://github.com/gpfda/GPFDA-dev>.
Designed for analyzing the Medical Information Mart for Intensive Care (MIMIC) dataset, a repository of freely accessible electronic health records. The MIMER (MIMIC-enabled Research) package offers a suite of data wrangling functions tailored specifically for preparing the dataset for research purposes, particularly in antimicrobial resistance (AMR) studies. It simplifies complex data manipulation tasks, allowing researchers to focus on their primary inquiries without being bogged down by wrangling complexities.
Extract, transform and load MITRE standards. This package gives you an approach to cybersecurity data sets. All data sets are built at runtime by downloading raw data from MITRE public services. MITRE <https://www.mitre.org/> is a government-funded research organization based in Bedford and McLean. The current version includes the most used standards as data frames. It also provides a list of nodes and edges with all relationships.
This package contains functions for the analysis of repeated measurement data using GEE. Data may contain missing values in the response and covariates. For parameter estimation through the Fisher Scoring algorithm, Mean Score and Inverse Probability Weighted methods combined with Multiple Imputation are used when there are missing values in the covariates or response. The reference for the mean score and inverse probability weighted methods is Wang et al. (2007) <doi:10.1093/biostatistics/kxl024>.