This package provides a collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between the target variable and predictor. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. And it creates automated reports that support these three tasks.
Forensic applications of pedigree analysis, including likelihood ratios for relationship testing, general relatedness inference, marker simulation, and power analysis. forrel is part of the pedsuite', a collection of packages for pedigree analysis, further described in the book Pedigree Analysis in R (Vigeland, 2021, ISBN:9780128244302). Several functions deal specifically with power analysis in missing person cases, implementing methods described in Vigeland et al. (2020) <doi:10.1016/j.fsigen.2020.102376>. Data import from the Familias software (Egeland et al. (2000) <doi:10.1016/S0379-0738(00)00147-X>) is supported through the pedFamilias
package.
An implementation of regression models with partial differential regularizations, making use of the Finite Element Method. The models efficiently handle data distributed over irregularly shaped domains and can comply with various conditions at the boundaries of the domain. A priori information about the spatial structure of the phenomenon under study can be incorporated in the model via the differential regularization. See Sangalli, L. M. (2021) <doi:10.1111/insr.12444> "Spatial Regression With Partial Differential Equation Regularisation" for an overview. The release 1.1-9 requires R (>= 4.2.0) to be installed on windows machines.
Given a group of genomes and their relationship with each other, the package clusters the genomes and selects the most representative members of each cluster. Additional data can be provided to the prioritize certain genomes. The results can be printed out as a list or a new phylogeny with graphs of the trees and distance distributions also available. For detailed introduction see: Thomas H Clarke, Lauren M Brinkac, Granger Sutton, and Derrick E Fouts (2018), GGRaSP
: a R-package for selecting representative genomes using Gaussian mixture models, Bioinformatics, bty300, <doi:10.1093/bioinformatics/bty300>.
The risk plot may be one of the most commonly used figures in tumor genetic data analysis. We can conclude the following two points: Comparing the prediction results of the model with the real survival situation to see whether the survival rate of the high-risk group is lower than that of the low-level group, and whether the survival time of the high-risk group is shorter than that of the low-risk group. The other is to compare the heat map and scatter plot to see the correlation between the predictors and the outcome.
The number of clusters (k) is needed to start all the partitioning clustering algorithms. An optimal value of this input argument is widely determined by using some internal validity indices. Since most of the existing internal indices suggest a k value which is computed from the clustering results after several runs of a clustering algorithm they are computationally expensive. On the contrary, the package kpeaks enables to estimate k before running any clustering algorithm. It is based on a simple novel technique using the descriptive statistics of peak counts of the features in a data set.
Sharing statistical methods or simulation frameworks through shiny applications often requires workflows for handling data. To help save and display simulation results, the postgresUI()
and postgresServer()
functions in mmints help with persistent data storage using a PostgreSQL
database. The mmints package also offers data upload functionality through the csvUploadUI()
and csvUploadServer()
functions which allow users to upload data, view variables and their types, and edit variable types before fitting statistical models within the shiny application. These tools aim to enhance efficiency and user interaction in shiny based statistical and simulation applications.
Estimates exponential-family random graph models for multilevel network data, assuming the multilevel structure is observed. The scope, at present, covers multilevel models where the set of nodes is nested within known blocks. The estimation method uses Monte-Carlo maximum likelihood estimation (MCMLE) methods to estimate a variety of canonical or curved exponential family models for binary random graphs. MCMLE methods for curved exponential-family random graph models can be found in Hunter and Handcock (JCGS, 2006). The package supports parallel computing, and provides methods for assessing goodness-of-fit of models and visualization of networks.
This package provides a collection of tools to fit and work with trophic Species Distribution Models. Trophic Species Distribution Models combine knowledge of trophic interactions with Bayesian structural equation models that model each species as a function of its prey (or predators) and environmental conditions. It exploits the topological ordering of the known trophic interaction network to predict species distribution in space and/or time, where the prey (or predator) distribution is unavailable. The method implemented by the package is described in Poggiato, Andréoletti, Pollock and Thuiller (2022) <doi:10.22541/au.166853394.45823739/v1>.
Many tools for data analysis are not available in R, but are present in public repositories like conda. The Herper package provides a comprehensive set of functions to interact with the conda package managament system. With Herper users can install, manage and run conda packages from the comfort of their R session. Herper also provides an ad-hoc approach to handling external system requirements for R packages. For people developing packages with python conda dependencies we recommend using basilisk (https://bioconductor.org/packages/release/bioc/html/basilisk.html) to internally support these system requirments pre-hoc.
Here we present Link-HD, an approach to integrate heterogeneous datasets, as a generalization of STATIS-ACT (“Structuration des Tableaux A Trois Indices de la Statistique–Analyse Conjointe de Tableaux”), a family of methods to join and compare information from multiple subspaces. However, STATIS-ACT has some drawbacks since it only allows continuous data and it is unable to establish relationships between samples and features. In order to tackle these constraints, we incorporate multiple distance options and a linear regression based Biplot model in order to stablish relationships between observations and variable and perform variable selection.
This package provides a collection of functions and classes which serve as the foundation for packages such as PharmacoGx and RadioGx. It was created to abstract shared functionality to increase ease of maintainability and reduce code repetition in current and future Gx suite programs. Major features include a CoreSet
class, from which RadioSet and PharmacoSet are derived, along with get and set methods for each respective slot. Additional functions related to fitting and plotting dose response curves, quantifying statistical correlation and calculating AUC or SF are included.
The DHARMa package uses a simulation-based approach to create readily interpretable scaled (quantile) residuals for fitted (generalized) linear mixed models. Moreover, externally created simulations, e.g. posterior predictive simulations from Bayesian software such as JAGS, STAN, or BUGS can be processed as well. The resulting residuals are standardized to values between 0 and 1 and can be interpreted as intuitively as residuals from a linear regression. The package also provides a number of plot and test functions for typical model misspecification problems, such as over/underdispersion, zero-inflation, and residual spatial, phylogenetic and temporal autocorrelation.
It covers various approaches to analysis of variance, provides an assumption testing section in order to provide a decision diagram that allows selecting the most appropriate technique. It provides the classical analysis of variance, the nonparametric equivalent of Kruskal Wallis, and the Bayesian approach. These results are shown in an interactive shiny panel, which allows modifying the arguments of the tests, contains interactive graphics and presents automatic conclusions depending on the tests in order to contribute to the interpretation of these analyzes. AovBay
uses Stan and FactorBayes
for Bayesian analysis and Highcharts for interactive charts.
This package provides a beginners toolbox to help those in ecology who want to deepen their understanding or utilize Bioacoustics in their work. The package has a number of utilizations from calculating frequency from waveform, performing operations in dB
, and determining acoustic range of recorders. The majority of this package is based on key concepts learned from the K. Lisa Yang Center for Conservation Bioacoustics at Cornell University and their associated course: Introduction to Bioacoustics course. More information can be found within the walk through vignettes at <https://github.com/MattyD797/bioSNR/tree/main/vignettes>
.
The Bayesian MCMC estimation of parameters for Thomas-type cluster point process with various inhomogeneities. It allows for inhomogeneity in (i) distribution of parent points, (ii) mean number of points in a cluster, (iii) cluster spread. The package also allows for the Bayesian MCMC algorithm for the homogeneous generalized Thomas process. The cluster size is allowed to have a variance that is greater or less than the expected value (cluster sizes are over or under dispersed). Details are described in DvoŠák, RemeÅ¡, Beránek & MrkviÄ ka (2022) <arXiv
: 10.48550/arXiv.2205.07946>
.
This package provides functions and data files to help CE Public-Use Microdata (PUMD) users calculate annual estimated expenditure means, standard errors, and quantiles according to the methods used by the CE with PUMD. For more information on the CE please visit <https://www.bls.gov/cex>. For further reading on CE estimate calculations please see the CE Calculation section of the U.S. Bureau of Labor Statistics (BLS) Handbook of Methods at <https://www.bls.gov/opub/hom/cex/calculation.htm>. For further information about CE PUMD please visit <https://www.bls.gov/cex/pumd.htm>.
It provides the ability to generate images from documents of different types. Three main features are provided: functions for generating document thumbnails, functions for performing visual tests of documents and a function for updating fields and table of contents of a Microsoft Word or RTF document. In order to work, LibreOffice
must be installed on the machine and or Microsoft Word'. If the latter is available, it can be used to produce PDF documents or images identical to the originals; otherwise, LibreOffice
is used and the rendering can be sometimes different from the original documents.
Automated time series forecasting developed by Microsoft Finance. The Microsoft Finance Time Series Forecasting Framework, aka Finn, can be used to forecast any component of the income statement, balance sheet, or any other area of interest by finance. Any numerical quantity over time, Finn can be used to forecast it. While it can be applied outside of the finance domain, Finn was built to meet the needs of financial analysts to better forecast their businesses within a company, and has a lot of built in features that are specific to the needs of financial forecasters. Happy forecasting!
This package provides a set of wrappers around rjags functions to run Bayesian analyses in JAGS (specifically, via libjags'). A single function call can control adaptive, burn-in, and sampling MCMC phases, with MCMC chains run in sequence or in parallel. Posterior distributions are automatically summarized (with the ability to exclude some monitored nodes if desired) and functions are available to generate figures based on the posteriors (e.g., predictive check plots, traceplots). Function inputs, argument syntax, and output format are nearly identical to the R2WinBUGS'/'R2OpenBUGS
packages to allow easy switching between MCMC samplers.
This package provides general linear model facilities (single y-variable, multiple x-variables with arbitrary mixture of continuous and categorical and arbitrary interactions) for cross-species data. The method is, however, based on the nowadays rather uncommon situation in which uncertainty about a phylogeny is well represented by adopting a single polytomous tree. The theory is in A. Grafen (1989, Proc. R. Soc. B 326, 119-157) and aims to cope with both recognised phylogeny (closely related species tend to be similar) and unrecognised phylogeny (a polytomy usually indicates ignorance about the true sequence of binary splits).
This package provides functions for the integrated analysis of protein-protein interaction networks and the detection of functional modules. Different datasets can be integrated into the network by assigning p-values of statistical tests to the nodes of the network. E.g. p-values obtained from the differential expression of the genes from an Affymetrix array are assigned to the nodes of the network. By fitting a beta-uniform mixture model and calculating scores from the p-values, overall scores of network regions can be calculated and an integer linear programming algorithm identifies the maximum scoring subnetwork.
This package provides a fast reimplementation of several density-based algorithms of the DBSCAN family. It includes the clustering algorithms DBSCAN (density-based spatial clustering of applications with noise) and hierarchical DBSCAN (HDBSCAN), the ordering algorithm ordering points to identify the clustering structure (OPTICS), shared nearest neighbor clustering, and the outlier detection algorithms local outlier factor (LOF) and global-local outlier score from hierarchies (GLOSH). The implementations use the kd-tree data structure for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided.
Functionalities to simulate space-time data and to estimate dynamic-spatial panel data models. Estimators implemented are the BCML (Elhorst (2010), <doi:10.1016/j.regsciurbeco.2010.03.003>), the MML (Elhorst (2010) <doi:10.1016/j.regsciurbeco.2010.03.003>) and the INLA Bayesian estimator (Lindgren and Rue, (2015) <doi:10.18637/jss.v063.i19>; Bivand, Gomez-Rubio and Rue, (2015) <doi:10.18637/jss.v063.i20>) adapted to panel data. The package contains functions to replicate the analyses of the scientific article entitled "Agricultural Productivity in Space" (Baldoni and Esposti (2021), <doi:10.1111/ajae.12155>)).