Fit finite mixtures of Plackett-Luce models for partial top rankings/orderings within the Bayesian framework. It provides MAP point estimates via EM algorithm and posterior MCMC simulations via Gibbs Sampling. It also fits MLE as a special case of the noninformative Bayesian analysis with vague priors. In addition to inferential techniques, the package assists other fundamental phases of a model-based analysis for partial rankings/orderings, by including functions for data manipulation, simulation, descriptive summary, model selection and goodness-of-fit evaluation. Main references on the methods are Mollica and Tardella (2017) <doi:10.1007/s11336-016-9530-0> and Mollica and Tardella (2014) <doi:10.1002/sim.6224>.
This package implements Time-Weighted Dynamic Time Warping (TWDTW), a measure for quantifying time series similarity. The TWDTW algorithm, described in Maus et al. (2016) <doi:10.1109/JSTARS.2016.2517118> and Maus et al. (2019) <doi:10.18637/jss.v088.i05>, is applicable to multi-dimensional time series of various resolutions. It is particularly suitable for comparing time series with seasonality for environmental and ecological data analysis, covering domains such as remote sensing imagery, climate data, hydrology, and animal movement. The twdtw package offers a user-friendly R interface, efficient Fortran routines for TWDTW calculations, flexible time weighting definitions, as well as utilities for time series preprocessing and visualization.
In the course of a genome-wide association study, the situation often arises that some phenotypes are known with greater precision than others. It could be that some individuals are known to harbor more micro-environmental variance than others. In the case of inbred strains of model organisms, it could be the case that more organisms were observed from some strains than others, so the strains with more organisms have better-estimated means. Package wISAM handles this situation by allowing for weighting of each observation according to residual variance. Specifically, the weight parameter to the function conduct_scan() takes the precision of each observation (one over the variance).
Famat is made to collect data about lists of genes and metabolites provided by user, and to visualize it through a Shiny app. Information collected is: - Pathways containing some of the user's genes and metabolites (obtained using a pathway enrichment analysis). - Direct interactions between user's elements inside pathways. - Information about elements (their identifiers and descriptions). - Go terms enrichment analysis performed on user's genes. The Shiny app is composed of: - information about genes, metabolites, and direct interactions between them inside pathways. - an heatmap showing which elements from the list are in pathways (pathways are structured in hierarchies). - hierarchies of enriched go terms using Molecular Function and Biological Process.
Biological molecules in a living organism seldom work individually. They usually interact each other in a cooperative way. Biological process is too complicated to understand without considering such interactions. Thus, network-based procedures can be seen as powerful methods for studying complex process. However, many methods are devised for analyzing individual genes. It is said that techniques based on biological networks such as gene co-expression are more precise ways to represent information than those using lists of genes only. This package is aimed to integrate the gene expression and biological network. A biological network is constructed from gene expression data and it is used for Gene Set Enrichment Analysis.
Fit generalized linear models with binomial responses using either an adjusted-score approach to bias reduction or maximum penalized likelihood where penalization is by Jeffreys invariant prior. These procedures return estimates with improved frequentist properties (bias, mean squared error) that are always finite even in cases where the maximum likelihood estimates are infinite (data separation). Fitting takes place by fitting generalized linear models on iteratively updated pseudo-data. The interface is essentially the same as glm. More flexibility is provided by the fact that custom pseudo-data representations can be specified and used for model fitting. Functions are provided for the construction of confidence intervals for the reduced-bias estimates.
This package provides a general purpose toolbox for personality, psychometric theory and experimental psychology. Functions are primarily for multivariate analysis and scale construction using factor analysis, principal component analysis, cluster analysis and reliability analysis, although others provide basic descriptive statistics. Item Response Theory is done using factor analysis of tetrachoric and polychoric correlations. Functions for analyzing data at multiple levels include within and between group statistics, including correlations and factor analysis. Functions for simulating and testing particular item and test structures are included. Several functions serve as a useful front end for structural equation modeling. Graphical displays of path diagrams, factor analysis and structural equation models are created using basic graphics.
Circadian rhythms are rhythms that oscillate about every 24 h, which has been observed in multiple physiological processes including core body temperature, hormone secretion, heart rate, blood pressure, and many others. Measuring circadian rhythm with wearables is based on a principle that there is increased movement during wake periods and reduced movement during sleep periods, and has been shown to be reliable and valid. This package can be used to extract nonparametric circadian metrics like intradaily variability (IV), interdaily stability (IS), and relative amplitude (RA); and parametric cosinor model and extended cosinor model coefficient. Details can be found in Junrui Di et al (2019) <doi:10.1007/s12561-019-09236-4>.
This package provides functions to calculate the relative crystallinity of starch by X-ray Diffraction (XRD) and Infrared Spectroscopy (FTIR). Starch is biosynthesized by plants in the form of granules semicrystalline. For XRD, the relative crystallinity is obtained by separating the crystalline peaks from the amorphous scattering region. For FTIR, the relative crystallinity is achieved by setting of a Gaussian holocrystalline-peak in the 800-1300 cm-1 region of FTIR spectrum of starch which is divided into amorphous region and crystalline region. The relative crystallinity of native starch granules varies from 14 of 45 percent. This package was supported by FONDECYT 3150630 and CIPA Conicyt-Regional R08C1002 is gratefully acknowledged.
To integrate multiple GSEA studies, we propose a hybrid strategy, iGSEA-AT, for choosing random effects (RE) versus fixed effect (FE) models, with an attempt to achieve the potential maximum statistical efficiency as well as stability in performance in various practical situations. In addition to iGSEA-AT, this package also provides options to perform integrative GSEA with testing based on a FE model (iGSEA-FE) and testing based on a RE model (iGSEA-RE). The approaches account for different set sizes when testing a database of gene sets. The function is easy to use, and the three approaches can be applied to both binary and continuous phenotypes.
This package provides a collection of functions for processing and analyzing metabolite data. The namesake function mrbin() converts 1D or 2D Nuclear Magnetic Resonance data into a matrix of values suitable for further data analysis and performs basic processing steps in a reproducible way. Negative values, a common issue in such data, can be replaced by positive values (<doi:10.1021/acs.jproteome.0c00684>). All used parameters are stored in a readable text file and can be restored from that file to enable exact reproduction of the data at a later time. The function fia() ranks features according to their impact on classifier models, especially artificial neural network models.
Biclustering by "Factor Analysis for Bicluster Acquisition" (FABIA). FABIA is a model-based technique for biclustering, that is clustering rows and columns simultaneously. Biclusters are found by factor analysis where both the factors and the loading matrix are sparse. FABIA is a multiplicative model that extracts linear dependencies between samples and feature patterns. It captures realistic non-Gaussian data distributions with heavy tails as observed in gene expression measurements. FABIA utilizes well understood model selection techniques like the EM algorithm and variational approaches and is embedded into a Bayesian framework. FABIA ranks biclusters according to their information content and separates spurious biclusters from true biclusters. The code is written in C.
An elaborate molecular evolutionary framework that facilitates straightforward simulation of codon genetic sequences subjected to different degrees and/or patterns of Darwinian selection. The model is built upon the fitness landscape paradigm of Sewall Wright, as popularised by the mutation-selection model of Halpern and Bruno. This enables realistic evolutionary process of living organisms to be reproducible seamlessly. For example, an Ornstein-Uhlenbeck fitness update algorithm is incorporated herein. Consequently, otherwise complex biological processes, such as the effect of the interplay between genetic drift and fitness landscape fluctuations on the inference of diversifying selection, may now be investigated with minimal effort. Frequency-dependent and stochastic fitness landscape update techniques are available.
Quantitative methods for benefit-risk analysis help to condense complex decisions into a univariate metric describing the overall benefit relative to risk. One approach is to use the multi-criteria decision analysis framework (MCDA), as in Mussen, Salek, and Walker (2007) <doi:10.1002/pds.1435>. Bayesian benefit-risk analysis incorporates uncertainty through posterior distributions which are inputs to the benefit-risk framework. The brisk package provides functions to assist with Bayesian benefit-risk analyses, such as MCDA. Users input posterior samples, utility functions, weights, and the package outputs quantitative benefit-risk scores. The posterior of the benefit-risk scores for each group can be compared. Some plotting capabilities are also included.
Coalescent simulators can rapidly simulate biological sequences evolving according to a given model of evolution. You can use this package to specify such models, to conduct the simulations and to calculate additional statistics from the results (Staab, Metzler, 2016 <doi:10.1093/bioinformatics/btw098>). It relies on existing simulators for doing the simulation, and currently supports the programs ms', msms and scrm'. It also supports finite-sites mutation models by combining the simulators with the program seq-gen'. Coala provides functions for calculating certain summary statistics, which can also be applied to actual biological data. One possibility to import data is through the PopGenome package (<https://github.com/pievos101/PopGenome>).
This package provides a suite of functions for rapid and flexible analysis of codon usage bias. It provides in-depth analysis at the codon level, including relative synonymous codon usage (RSCU), tRNA weight calculations, machine learning predictions for optimal or preferred codons, and visualization of codon-anticodon pairing. Additionally, it can calculate various gene- specific codon indices such as codon adaptation index (CAI), effective number of codons (ENC), fraction of optimal codons (Fop), tRNA adaptation index (tAI), mean codon stabilization coefficients (CSCg), and GC contents (GC/GC3s/GC4d). It also supports both standard and non-standard genetic code tables found in NCBI, as well as custom genetic code tables.
It is a novel tool used to identify the candidate drugs against a particular disease based on the drug target set enrichment analysis. It assumes the most effective drugs are those with a closer affinity in the protein-protein interaction network to the specified disease. (See Gómez-Carballa et al. (2022) <doi: 10.1016/j.envres.2022.112890> and Feng et al. (2022) <doi: 10.7150/ijms.67815> for disease expression profiles; see Wishart et al. (2018) <doi: 10.1093/nar/gkx1037> and Gaulton et al. (2017) <doi: 10.1093/nar/gkw1074> for drug target information; see Kanehisa et al. (2021) <doi: 10.1093/nar/gkaa970> for the details of KEGG database.).
An implementation of multiple-locus association mapping on a genome-wide scale. Eagle can handle inbred and outbred study populations, populations of arbitrary unknown complexity, and data larger than the memory capacity of the computer. Since Eagle is based on linear mixed models, it is best suited to the analysis of data on continuous traits. However, it can tolerate non-normal data. Eagle reports, as its findings, the best set of snp in strongest association with a trait. For users unfamiliar with R, to perform an analysis, run OpenGUI()'. This opens a web browser to the menu-driven user interface for the input of data, and for performing genome-wide analysis.
Run grass growth simulations using a grass growth model based on ModVege (Jouven, M., P. Carrère, and R. Baumont "Model Predicting Dynamics of Biomass, Structure and Digestibility of Herbage in Managed Permanent Pastures. 1. Model Description." (2006) <doi:10.1111/j.1365-2494.2006.00515.x>). The implementation in this package contains a few additions to the above cited version of ModVege, such as simulations of management decisions, and influences of snow cover. As such, the model is fit to simulate grass growth in mountainous regions, such as the Swiss Alps. The package also contains routines for calibrating the model and helpful tools for analysing model outputs and performance.
Interface to the HERE REST APIs <https://developer.here.com/develop/rest-apis>: (1) geocode and autosuggest addresses or reverse geocode POIs using the Geocoder API; (2) route directions, travel distance or time matrices and isolines using the Routing', Matrix Routing and Isoline Routing APIs; (3) request real-time traffic flow and incident information from the Traffic API; (4) find request public transport connections and nearby stations from the Public Transit API; (5) request intermodal routes using the Intermodal Routing API; (6) get weather forecasts, reports on current weather conditions, astronomical information and alerts at a specific location from the Destination Weather API. Locations, routes and isolines are returned as sf objects.
This package provides two record linkage data sets on the Italian Survey on Household and Wealth, 2008 and 2010, a sample survey conducted by the Bank of Italy every two years. The 2010 survey covered 13,702 individuals, while the 2008 survey covered 13,734 individuals. The following categorical variables are included in this data set: year of birth, working status, employment status, branch of activity, town size, geographical area of birth, sex, whether or not Italian national, and highest educational level obtained. Unique identifiers are available to assess the accuracy of oneâ s method. Please see Steorts (2015) <DOI:10.1214/15-BA965SI> to find more details about the data set.
This package provides tools for high-dimensional peaks-over-threshold inference and simulation of Brown-Resnick and extremal Student spatial extremal processes. These include optimization routines based on censored likelihood and gradient scoring, and exact simulation algorithms for max-stable and multivariate Pareto distributions based on rejection sampling. Fast multivariate Gaussian and Student distribution functions using separation-of-variable algorithm with quasi Monte Carlo integration are also provided. Key references include de Fondeville and Davison (2018) <doi:10.1093/biomet/asy026>, Thibaud and Opitz (2015) <doi:10.1093/biomet/asv045>, Wadsworth and Tawn (2014) <doi:10.1093/biomet/ast042> and Genz and Bretz (2009) <doi:10.1007/978-3-642-01689-9>.
Machine learning is widely used in information-systems design. Yet, training algorithms on imbalanced datasets may severely affect performance on unseen data. For example, in some cases in healthcare, financial, or internet-security contexts, certain sub-classes are difficult to learn because they are underrepresented in training data. This R package offers a flexible and efficient solution based on a new synthetic average neighborhood sampling algorithm ('SANSA'), which, in contrast to other solutions, introduces a novel â placementâ parameter that can be tuned to adapt to each datasets unique manifestation of the imbalance. More information about the algorithm's parameters can be found at Nasir et al. (2022) <https://murtaza.cc/SANSA/>.
This package provides a computing tool is developed to automated identify somatic mutation-driven immune cells. The operation modes including: i) inferring the relative abundance matrix of tumor-infiltrating immune cells and integrating it with a particular gene mutation status, ii) detecting differential immune cells with respect to the gene mutation status and converting the abundance matrix of significant differential immune cell into two binary matrices (one for up-regulated and one for down-regulated), iii) identifying somatic mutation-driven immune cells by comparing the gene mutation status with each immune cell in the binary matrices across all samples, and iv) visualization of immune cell abundance of samples in different mutation status..