Fit finite mixtures of Plackett-Luce models for partial top rankings/orderings within the Bayesian framework. It provides MAP point estimates via the EM algorithm and posterior MCMC simulations via Gibbs sampling. It also fits the MLE as a special case of the noninformative Bayesian analysis with vague priors. In addition to inferential techniques, the package assists other fundamental phases of a model-based analysis for partial rankings/orderings, by including functions for data manipulation, simulation, descriptive summary, model selection and goodness-of-fit evaluation. Main references on the methods are Mollica and Tardella (2017) <doi:10.1007/s11336-016-9530-0> and Mollica and Tardella (2014) <doi:10.1002/sim.6224>.
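As a rough illustration of the model behind this entry, the sketch below evaluates the log-probability of a partial top ordering under a single Plackett-Luce component: items are chosen one at a time with probability proportional to their support among the items not yet ranked. The function name and the example support values are assumptions made for illustration; this is not the package's interface.

```python
import math

def plackett_luce_loglik(ordering, support):
    """Log-probability of a partial top ordering under a Plackett-Luce model.

    ordering: item indices from most to least preferred (top-t of K items)
    support:  positive support parameters, one per item
    """
    remaining = sum(support)  # total support of items not yet ranked
    loglik = 0.0
    for item in ordering:
        loglik += math.log(support[item]) - math.log(remaining)
        remaining -= support[item]  # the chosen item leaves the choice set
    return loglik

# Hypothetical support parameters for three items.
support = [0.5, 0.3, 0.2]
# Top-2 ordering: item 0 first, item 2 second, item 1 left unranked.
p_top2 = math.exp(plackett_luce_loglik([0, 2], support))
```

Here the probability factorizes as (0.5 / 1.0) * (0.2 / 0.5), which is why unranked items only enter through the denominators.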
Biclustering by "Factor Analysis for Bicluster Acquisition" (FABIA). FABIA is a model-based technique for biclustering, that is, clustering rows and columns simultaneously. Biclusters are found by factor analysis where both the factors and the loading matrix are sparse. FABIA is a multiplicative model that extracts linear dependencies between samples and feature patterns. It captures realistic non-Gaussian data distributions with heavy tails as observed in gene expression measurements. FABIA utilizes well-understood model selection techniques like the EM algorithm and variational approaches and is embedded into a Bayesian framework. FABIA ranks biclusters according to their information content and separates spurious biclusters from true biclusters. The code is written in C.
Fit generalized linear models with binomial responses using either an adjusted-score approach to bias reduction or maximum penalized likelihood where penalization is by Jeffreys invariant prior. These procedures return estimates with improved frequentist properties (bias, mean squared error) that are always finite even in cases where the maximum likelihood estimates are infinite (data separation). Fitting takes place by fitting generalized linear models on iteratively updated pseudo-data. The interface is essentially the same as glm. More flexibility is provided by the fact that custom pseudo-data representations can be specified and used for model fitting. Functions are provided for the construction of confidence intervals for the reduced-bias estimates.
This package provides a general purpose toolbox for personality, psychometric theory and experimental psychology. Functions are primarily for multivariate analysis and scale construction using factor analysis, principal component analysis, cluster analysis and reliability analysis, although others provide basic descriptive statistics. Item Response Theory is done using factor analysis of tetrachoric and polychoric correlations. Functions for analyzing data at multiple levels include within and between group statistics, including correlations and factor analysis. Functions for simulating and testing particular item and test structures are included. Several functions serve as a useful front end for structural equation modeling. Graphical displays of path diagrams, factor analysis and structural equation models are created using basic graphics.
Quantitative methods for benefit-risk analysis help to condense complex decisions into a univariate metric describing the overall benefit relative to risk. One approach is to use the multi-criteria decision analysis framework (MCDA), as in Mussen, Salek, and Walker (2007) <doi:10.1002/pds.1435>. Bayesian benefit-risk analysis incorporates uncertainty through posterior distributions which are inputs to the benefit-risk framework. The brisk package provides functions to assist with Bayesian benefit-risk analyses, such as MCDA. Users input posterior samples, utility functions, weights, and the package outputs quantitative benefit-risk scores. The posterior of the benefit-risk scores for each group can be compared. Some plotting capabilities are also included.
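To make the MCDA idea concrete, the sketch below computes a weighted-utility score for each posterior draw: every criterion is mapped to a utility in [0, 1], and the utilities are combined with normalized weights. All names, utility functions, weights and draws here are hypothetical and chosen only for illustration; they do not reflect the brisk package's functions.

```python
def mcda_scores(posterior_draws, utilities, weights):
    """Weighted-utility benefit-risk score for each posterior draw.

    posterior_draws: list of dicts mapping criterion name -> sampled value
    utilities:       dict mapping criterion name -> function into [0, 1]
    weights:         dict mapping criterion name -> non-negative weight
    """
    total = sum(weights.values())  # normalize so the weights sum to one
    return [
        sum(weights[c] / total * utilities[c](draw[c]) for c in weights)
        for draw in posterior_draws
    ]

# Hypothetical posterior draws for one treatment group.
draws = [
    {"response_rate": 0.60, "adverse_event_rate": 0.10},
    {"response_rate": 0.50, "adverse_event_rate": 0.20},
]
utilities = {
    "response_rate": lambda x: x,             # higher is better
    "adverse_event_rate": lambda x: 1.0 - x,  # lower is better
}
weights = {"response_rate": 3.0, "adverse_event_rate": 1.0}
scores = mcda_scores(draws, utilities, weights)
```

Because one score is produced per posterior draw, the resulting list is itself a sample from the posterior of the benefit-risk score and can be compared across groups.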
This package provides a suite of functions for rapid and flexible analysis of codon usage bias. It provides in-depth analysis at the codon level, including relative synonymous codon usage (RSCU), tRNA weight calculations, machine learning predictions for optimal or preferred codons, and visualization of codon-anticodon pairing. Additionally, it can calculate various gene-specific codon indices such as codon adaptation index (CAI), effective number of codons (ENC), fraction of optimal codons (Fop), tRNA adaptation index (tAI), mean codon stabilization coefficients (CSCg), and GC contents (GC/GC3s/GC4d). It also supports both standard and non-standard genetic code tables found in NCBI, as well as custom genetic code tables.
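As an illustration of one of the codon-level measures named above, the sketch below computes RSCU as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The two-family codon table and the function name are assumptions made to keep the example short; a real analysis would use a complete genetic code table.

```python
from collections import Counter

# Illustrative subset of the standard genetic code (two amino acids only).
SYNONYMOUS = {
    "Lys": ["AAA", "AAG"],
    "Phe": ["TTT", "TTC"],
}

def rscu(cds):
    """Relative synonymous codon usage for the codon families in SYNONYMOUS."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    result = {}
    for family in SYNONYMOUS.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined for this family
        expected = total / len(family)  # count under uniform synonymous usage
        for c in family:
            result[c] = counts[c] / expected
    return result

# Codons: AAA AAA AAG TTT TTC
values = rscu("AAAAAAAAGTTTTTC")
```

A value above 1 marks a codon used more often than expected under uniform synonymous usage, and a value below 1 marks one used less often.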
Coalescent simulators can rapidly simulate biological sequences evolving according to a given model of evolution. You can use this package to specify such models, to conduct the simulations and to calculate additional statistics from the results (Staab, Metzler, 2016 <doi:10.1093/bioinformatics/btw098>). It relies on existing simulators for doing the simulation, and currently supports the programs ms, msms and scrm. It also supports finite-sites mutation models by combining the simulators with the program seq-gen. Coala provides functions for calculating certain summary statistics, which can also be applied to actual biological data. One possibility to import data is through the PopGenome package (<https://github.com/pievos101/PopGenome>).
This tool identifies candidate drugs against a particular disease based on drug target set enrichment analysis. It assumes that the most effective drugs are those with a closer affinity in the protein-protein interaction network to the specified disease. See Gómez-Carballa et al. (2022) <doi:10.1016/j.envres.2022.112890> and Feng et al. (2022) <doi:10.7150/ijms.67815> for disease expression profiles; see Wishart et al. (2018) <doi:10.1093/nar/gkx1037> and Gaulton et al. (2017) <doi:10.1093/nar/gkw1074> for drug target information; see Kanehisa et al. (2021) <doi:10.1093/nar/gkaa970> for the details of the KEGG database.
An implementation of multiple-locus association mapping on a genome-wide scale. Eagle can handle inbred and outbred study populations, populations of arbitrary unknown complexity, and data larger than the memory capacity of the computer. Since Eagle is based on linear mixed models, it is best suited to the analysis of data on continuous traits. However, it can tolerate non-normal data. Eagle reports, as its findings, the best set of SNPs in strongest association with a trait. Users unfamiliar with R can perform an analysis by running OpenGUI(). This opens a web browser to the menu-driven user interface for the input of data, and for performing genome-wide analysis.
Run grass growth simulations using a grass growth model based on ModVege (Jouven, M., P. Carrère, and R. Baumont "Model Predicting Dynamics of Biomass, Structure and Digestibility of Herbage in Managed Permanent Pastures. 1. Model Description." (2006) <doi:10.1111/j.1365-2494.2006.00515.x>). The implementation in this package contains a few additions to the above cited version of ModVege, such as simulations of management decisions, and influences of snow cover. As such, the model is fit to simulate grass growth in mountainous regions, such as the Swiss Alps. The package also contains routines for calibrating the model and helpful tools for analysing model outputs and performance.
Interface to the HERE REST APIs <https://developer.here.com/develop/rest-apis>: (1) geocode and autosuggest addresses or reverse geocode POIs using the Geocoder API; (2) route directions, travel distance or time matrices and isolines using the Routing, Matrix Routing and Isoline Routing APIs; (3) request real-time traffic flow and incident information from the Traffic API; (4) request public transport connections and nearby stations from the Public Transit API; (5) request intermodal routes using the Intermodal Routing API; (6) get weather forecasts, reports on current weather conditions, astronomical information and alerts at a specific location from the Destination Weather API. Locations, routes and isolines are returned as sf objects.
This package provides two record linkage data sets on the Italian Survey on Household and Wealth, 2008 and 2010, a sample survey conducted by the Bank of Italy every two years. The 2010 survey covered 13,702 individuals, while the 2008 survey covered 13,734 individuals. The following categorical variables are included in this data set: year of birth, working status, employment status, branch of activity, town size, geographical area of birth, sex, whether or not Italian national, and highest educational level obtained. Unique identifiers are available to assess the accuracy of one's method. Please see Steorts (2015) <DOI:10.1214/15-BA965SI> to find more details about the data set.
This package provides tools for high-dimensional peaks-over-threshold inference and simulation of Brown-Resnick and extremal Student spatial extremal processes. These include optimization routines based on censored likelihood and gradient scoring, and exact simulation algorithms for max-stable and multivariate Pareto distributions based on rejection sampling. Fast multivariate Gaussian and Student distribution functions using separation-of-variable algorithm with quasi Monte Carlo integration are also provided. Key references include de Fondeville and Davison (2018) <doi:10.1093/biomet/asy026>, Thibaud and Opitz (2015) <doi:10.1093/biomet/asv045>, Wadsworth and Tawn (2014) <doi:10.1093/biomet/ast042> and Genz and Bretz (2009) <doi:10.1007/978-3-642-01689-9>.
Machine learning is widely used in information-systems design. Yet, training algorithms on imbalanced datasets may severely affect performance on unseen data. For example, in some cases in healthcare, financial, or internet-security contexts, certain sub-classes are difficult to learn because they are underrepresented in training data. This R package offers a flexible and efficient solution based on a new synthetic average neighborhood sampling algorithm ('SANSA'), which, in contrast to other solutions, introduces a novel "placement" parameter that can be tuned to adapt to each dataset's unique manifestation of the imbalance. More information about the algorithm's parameters can be found at Nasir et al. (2022) <https://murtaza.cc/SANSA/>.
This package provides a computing tool to automatically identify somatic mutation-driven immune cells. The operation modes include: i) inferring the relative abundance matrix of tumor-infiltrating immune cells and integrating it with a particular gene mutation status, ii) detecting differential immune cells with respect to the gene mutation status and converting the abundance matrix of significant differential immune cells into two binary matrices (one for up-regulated and one for down-regulated), iii) identifying somatic mutation-driven immune cells by comparing the gene mutation status with each immune cell in the binary matrices across all samples, and iv) visualizing the immune cell abundance of samples with different mutation status.
Three-step variable selection procedure based on random forests. Initially developed to handle high-dimensional data (for which the number of variables largely exceeds the number of observations), the package is very versatile and can treat most dimensions of data, for regression and supervised classification problems. The first step eliminates irrelevant variables from the dataset. The second step aims to select all variables related to the response, for interpretation purposes. The third step refines the selection by eliminating redundancy in the set of variables selected by the second step, for prediction purposes. See Genuer, R., Poggi, J.-M. and Tuleau-Malot, C. (2015) <https://journal.r-project.org/archive/2015-2/genuer-poggi-tuleaumalot.pdf>.
It is vital to assess the heterogeneity of treatment effects (HTE) when making health care decisions for an individual patient or a group of patients. Nevertheless, it remains challenging to evaluate HTE based on information collected from clinical studies that are often designed and conducted to evaluate the efficacy of a treatment for the overall population. The Bayesian framework offers a principled and flexible approach to estimate and compare treatment effects across subgroups of patients defined by their characteristics. This package allows users to explore a wide range of Bayesian HTE analysis models, and produce posterior inferences about HTE. See Wang et al. (2018) <DOI:10.18637/jss.v085.i07> for further details.
This package implements convex regression with interpretable sharp partitions (CRISP), which considers the problem of predicting an outcome variable on the basis of two covariates, using an interpretable yet non-additive model. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. More details are provided in Petersen, A., Simon, N., and Witten, D. (2016). Convex Regression with Interpretable Sharp Partitions. Journal of Machine Learning Research, 17(94): 1-31 <http://jmlr.org/papers/volume17/15-344/15-344.pdf>.
Computes the test statistic and p-value of the Cramer-von Mises and Anderson-Darling tests for some continuous distribution functions, as proposed by Chen and Balakrishnan (1995) <http://asq.org/qic/display-item/index.html?item=11407>. In addition to the classic distribution functions, the goodness-of-fit (GoF) test can be applied to a dataset that follows the extreme value distribution, without the user having to remember the formulas of the distribution/density functions. The Value at Risk (VaR) and Average VaR, two other important risk measures, are also estimated using well-known distribution functions. Pflug and Romisch (2007, ISBN: 9812707409) is a good reference for studying the properties of risk measures.
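For orientation, the sketch below computes the classical Cramer-von Mises (W²) and Anderson-Darling (A²) statistics for a sample against a fully specified continuous distribution function. It is a textbook version under stated assumptions: it does not reproduce the Chen and Balakrishnan procedure for estimated parameters, it does not compute p-values, and the function names and the Gumbel example are illustrative rather than taken from the package.

```python
import math

def cvm_ad_statistics(sample, cdf):
    """Classical Cramer-von Mises and Anderson-Darling statistics.

    sample: observed values
    cdf:    fully specified continuous distribution function
    """
    n = len(sample)
    u = sorted(cdf(x) for x in sample)  # probability integral transform
    # W^2 = 1/(12n) + sum_i (u_(i) - (2i - 1)/(2n))^2, with i = 1..n
    w2 = 1.0 / (12 * n) + sum(
        (u[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)
    )
    # A^2 = -n - (1/n) sum_i (2i - 1) [ln u_(i) + ln(1 - u_(n+1-i))]
    a2 = -n - sum(
        (2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
        for i in range(n)
    ) / n
    return w2, a2

def gumbel_cdf(x, loc=0.0, scale=1.0):
    """Distribution function of the Gumbel (extreme value) distribution."""
    return math.exp(-math.exp(-(x - loc) / scale))

# Uniform check: values sitting exactly on the expected quantiles.
w2_unif, a2_unif = cvm_ad_statistics([0.25, 0.75], lambda x: x)
# Hypothetical data tested against a standard Gumbel distribution.
w2_gum, a2_gum = cvm_ad_statistics([-0.4, 0.1, 0.6, 1.3, 2.2], gumbel_cdf)
```

In the uniform check the squared deviations vanish, so W² reduces to its minimum value 1/(12n); larger values of either statistic indicate a poorer fit.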
Replacement for nls() tools for working with nonlinear least squares problems. The calling structure is similar to, but much simpler than, that of the nls() function. Moreover, where nls() specifically does NOT deal with small or zero residual problems, nlmrt is quite happy to solve them. It also attempts to be more robust in finding solutions, thereby avoiding singular gradient messages that arise in the Gauss-Newton method within nls(). The Marquardt-Nash approach in nlmrt generally works more reliably to get a solution, though this may be one of a set of possibilities, and may also be statistically unsatisfactory. Added print and summary as of August 28, 2012.
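To show why a damped iteration copes with a zero-residual problem, the sketch below fits y = a * exp(b * x) to exact data with a generic textbook Levenberg-Marquardt loop. It is an assumption-laden illustration: the damping rule, the function name and the data are invented for this example, and it is not the Marquardt-Nash variant implemented in nlmrt.

```python
import math

def marquardt_fit(xs, ys, a, b, lam=1e-3, iters=100):
    """Fit y = a * exp(b * x) with a basic Levenberg-Marquardt iteration."""
    def sse(a, b):
        return sum((y - a * math.exp(b * x)) ** 2 for x, y in zip(xs, ys))

    current = sse(a, b)
    for _ in range(iters):
        # Accumulate the entries of J^T J and J^T r for the two parameters.
        jaa = jab = jbb = ga = gb = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e
            da, db = e, a * x * e  # partial derivatives of the model
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * r
            gb += db * r
        # Damped normal equations: (J^T J + lam * diag(J^T J)) step = J^T r.
        maa, mbb = jaa * (1.0 + lam), jbb * (1.0 + lam)
        det = maa * mbb - jab * jab
        if det == 0.0:
            break
        sa = (mbb * ga - jab * gb) / det
        sb = (maa * gb - jab * ga) / det
        try:
            trial = sse(a + sa, b + sb)
        except OverflowError:
            trial = math.inf  # treat an overflowing step as a failed step
        if trial < current:
            a, b, current = a + sa, b + sb, trial
            lam /= 10.0  # success: move toward Gauss-Newton
        else:
            lam *= 10.0  # failure: shorten and rotate toward steepest descent
        if current < 1e-20:
            break
    return a, b, current

# Exact (zero-residual) data generated from a = 2, b = 0.5.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a_hat, b_hat, final_sse = marquardt_fit(xs, ys, a=1.0, b=0.1)
```

The damping term keeps the linear system solvable even when the undamped Gauss-Newton step overshoots, which is the situation that typically produces singular gradient failures.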
Simulation methods for the Fisher Bingham distribution on the unit sphere, the matrix Bingham distribution on a Grassmann manifold, the matrix Fisher distribution on SO(3), and the bivariate von Mises sine model on the torus. The methods use an acceptance/rejection simulation algorithm for the Bingham distribution and are described fully by Kent, Ganeiber and Mardia (2018) <doi:10.1080/10618600.2017.1390468>. These methods supersede earlier MCMC simulation methods and are more general than earlier simulation methods. The methods can be slower in specific situations where there are existing non-MCMC simulation methods (see Section 8 of Kent, Ganeiber and Mardia (2018) <doi:10.1080/10618600.2017.1390468> for further details).
This package provides tools to compute and analyze the set of statistically-equivalent (Gaussian, linear) path models which generate the input precision or (partial) correlation matrix. This procedure is useful for understanding how statistical network models such as the Gaussian Graphical Model (GGM) perform as causal discovery tools. The statistical-equivalence set of a given GGM expresses the uncertainty we have about the sign, size and direction of directed relationships based on the weights matrix of the GGM alone. The derivation of the equivalence set and its use for understanding GGMs as causal discovery tools is described by Ryan, O., Bringmann, L.F., & Schuurman, N.K. (2022) <doi:10.31234/osf.io/ryg69>.
The soGGi package provides a toolset to create genomic interval aggregate/summary plots of signal or motif occurrence from BAM and bigWig files as well as PWM, rlelist, GRanges and GAlignments Bioconductor objects. soGGi allows for normalisation, transformation and arithmetic operation on and between summary plot objects as well as grouping and subsetting of plots by GRanges objects and user supplied metadata. Plots are created using the ggplot2 library to allow user defined manipulation of the returned plot object. Coupled together, soGGi features a broad set of methods to visualise genomics data in the context of groups of genomic intervals such as genes, superenhancers and transcription factor binding events.
This package implements methods that are useful in designing research studies and analyzing data, with particular emphasis on methods that are developed for or used within the behavioral, educational, and social sciences (broadly defined). That being said, many of the methods implemented within MBESS are applicable to a wide variety of disciplines. MBESS has a suite of functions for a variety of related topics, such as effect sizes, confidence intervals for effect sizes (including standardized effect sizes and noncentral effect sizes), sample size planning (from the accuracy in parameter estimation (AIPE), power analytic, equivalence, and minimum-risk point estimation perspectives), mediation analysis, various properties of distributions, and a variety of utility functions.