An interface to build machine learning models for classification and regression problems. mikropml implements the ML pipeline described by Topçuoğlu et al. (2020) <doi:10.1128/mBio.00434-20> with reasonable default options for data preprocessing, hyperparameter tuning, cross-validation, testing, model evaluation, and interpretation steps. See the website <https://www.schlosslab.org/mikropml/> for more information, documentation, and examples.
Semi-parametric approach for sparse canonical correlation analysis which can handle mixed data types: continuous, binary and truncated continuous. Bridge functions are provided to connect Kendall's tau to latent correlation under the Gaussian copula model. The methods are described in Yoon, Carroll and Gaynanova (2020) <doi:10.1093/biomet/asaa007> and Yoon, Mueller and Gaynanova (2021) <doi:10.1080/10618600.2021.1882468>.
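As an illustration of what a bridge function does, the simplest case (two continuous variables) has a well-known closed form under the Gaussian copula: the latent correlation r and Kendall's tau satisfy r = sin(pi * tau / 2). The sketch below covers only that case; the function names are hypothetical and are not the package's API, and the bridge functions for binary and truncated variables are more involved and are not shown.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Assumes no ties, which is the usual situation for continuous data.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    score = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        score += (s > 0) - (s < 0)  # +1 concordant, -1 discordant
    n_pairs = len(x) * (len(x) - 1) // 2
    return score / n_pairs


def bridge_continuous(tau):
    """Map Kendall's tau to the latent Gaussian-copula correlation
    for a continuous/continuous pair."""
    return math.sin(math.pi * tau / 2)


x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
tau = kendall_tau(x, y)
latent_r = bridge_continuous(tau)
print(tau, latent_r)
```

Because the map depends only on ranks, it is unchanged by monotone transformations of either variable, which is what makes the rank-based approach suitable for copula models.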
This package provides a comprehensive implementation of Petersen-type estimators and their many variants for two-sample capture-recapture studies. A conditional likelihood approach is used that allows for tag loss; non-reporting of tags; reward tags; categorical, geographical and temporal stratification; partial stratification; reverse capture-recapture; and continuous variables in modeling the probability of capture. Many examples from fisheries management are presented.
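For orientation, the basic two-sample estimator that these methods generalize can be written in closed form. The sketch below shows the classical Lincoln-Petersen estimate and Chapman's bias-corrected variant; it is not the package's conditional likelihood approach, and the function and argument names (n1 marked in the first sample, n2 caught in the second sample, m2 marked animals recaptured) are chosen here for illustration.

```python
def petersen(n1, n2, m2):
    """Lincoln-Petersen abundance estimate: N = n1 * n2 / m2."""
    if m2 <= 0:
        raise ValueError("need at least one recapture (m2 > 0)")
    if m2 > min(n1, n2):
        raise ValueError("recaptures cannot exceed either sample size")
    return n1 * n2 / m2


def chapman(n1, n2, m2):
    """Chapman's variant, which remains defined when m2 == 0 and
    reduces small-sample bias."""
    if m2 < 0 or m2 > min(n1, n2):
        raise ValueError("m2 must lie between 0 and min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1


# Hypothetical study: 200 fish tagged, 150 caught later, 30 of them tagged.
print(petersen(200, 150, 30))
print(chapman(200, 150, 30))
```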
Makes it easy to push data to Power BI using R and the Power BI REST APIs (see <https://docs.microsoft.com/en-us/rest/api/power-bi/>). A set of functions for turning data frames into Power BI datasets and refreshing these datasets is provided. Administrative tasks such as monitoring refresh statuses and pulling metadata about workspaces and users are also supported.
Fast computation of multivariate analyses of small (10s to 100s of markers) to big (1000s to 100000s of markers) genotype data. Runs Principal Component Analysis allowing for centering, z-score standardization and scaling for genetic drift, projection of ancient samples to modern genetic space, and multivariate tests for differences in group location (Permutation-Based Multivariate Analysis of Variance) and dispersion (Permutation-Based Multivariate Analysis of Dispersion).
This package implements the Seinhorst model to analyze the relationship between initial nematode densities and plant growth response using nonlinear least squares estimation. The package provides tools for model fitting, prediction, and visualization, facilitating the study of plant-nematode interactions. Model parameters can be estimated or set to predefined values based on Seinhorst (1986) <doi:10.1007/978-1-4613-2251-1_11>.
Bayesian Tensor Factorization for decomposition of tensor data sets using the trilinear CANDECOMP/PARAFAC (CP) factorization, with automatic component selection. The complete data analysis pipeline is provided, including functions and recommendations for data normalization and model definition, as well as missing value prediction and model visualization. The method performs factorization for three-way tensor datasets and the inference is implemented with Gibbs sampling.
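To make the trilinear CP structure concrete: a three-way tensor X is approximated as X[i][j][k] = sum over r of A[i][r] * B[j][r] * C[k][r], where A, B and C are factor matrices sharing the same number of components. The sketch below only rebuilds a tensor from given factors; the Bayesian inference, Gibbs sampling and automatic component selection described above are not shown, and the function name is hypothetical.

```python
def cp_reconstruct(A, B, C):
    """Rebuild a three-way tensor from CP factor matrices.

    A is I x R, B is J x R, C is K x R (lists of rows); the result is a
    nested list X with X[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r].
    """
    rank = len(A[0])
    for M in (A, B, C):
        if any(len(row) != rank for row in M):
            raise ValueError("all factor matrices must have the same number of columns")
    return [
        [
            [sum(a[r] * b[r] * c[r] for r in range(rank)) for c in C]
            for b in B
        ]
        for a in A
    ]


# Two components; tensor of shape 2 x 2 x 1.
A = [[1, 0], [0, 2]]
B = [[1, 1], [2, 0]]
C = [[1, 3]]
X = cp_reconstruct(A, B, C)
print(X)
```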
This package implements an algorithm for generating maps, known as tile maps, in which each region is represented by a single tile of the same shape and size. The algorithm was first proposed in "Generating Tile Maps" by Graham McNeill and Scott Hale (2017) <doi:10.1111/cgf.13200>. Functions allow users to generate, plot, and compare square or hexagon tile maps.
FHIR R4 bundles in JSON format are derived from https://synthea.mitre.org/downloads. Transformation inspired by a kaggle notebook published by Dr Alexander Scarlat, https://www.kaggle.com/code/drscarlat/fhir-starter-parse-healthcare-bundles-into-tables. This is a very limited illustration of some basic parsing and reorganization processes. Additional tooling will be required to move beyond the Synthea data illustrations.
ClustIRR analyzes repertoires of B- and T-cell receptors. It starts by identifying communities of immune receptors with similar specificities, based on the sequences of their complementarity-determining regions (CDRs). Next, it employs a Bayesian probabilistic model to quantify differential community occupancy (DCO) between repertoires, allowing the identification of communities that expand or contract in response to, for example, infection or cancer treatment.
The funOmics package aggregates or summarizes omics data into higher-level functional representations such as GO term gene sets or KEGG metabolic pathways. The aggregated data matrix represents functional activity scores that facilitate the analysis of functional molecular sets while reducing dimensionality and providing easier and faster biological interpretations. Coordinated functional activity scores can be as informative as single molecules!
This package provides a collection of microRNAs/targets from external resources, including validated microRNA-target databases (miRecords, miRTarBase and TarBase), predicted microRNA-target databases (DIANA-microT, ElMMo, MicroCosm, miRanda, miRDB, PicTar, PITA and TargetScan) and microRNA-disease/drug databases (miR2Disease, Pharmaco-miR VerSe and PhenomiR).
Helps enable adaptive management by codifying knowledge in the form of models generated from numerous analyses and data sets. Facilitates this process by storing all models and data sets in a single object that can be updated and saved, thus tracking changes in knowledge through time. A shiny application called AM Model Manager (modelMgr()) enables the use of these functions via a GUI.
Calculate ActiGraph counts from the X, Y, and Z axes of a triaxial accelerometer. This work was inspired by Neishabouri et al., who published the article "Quantification of Acceleration as Activity Counts in ActiGraph Wearables" on February 24, 2022. The article is available at <https://pubmed.ncbi.nlm.nih.gov/35831446> and the Python implementation of this code at <https://github.com/actigraph/agcounts>.
This package provides a number of functions to access the National Renewable Energy Laboratory Alternate Fuel Locator API <https://developer.nrel.gov/docs/transportation/alt-fuel-stations-v1/>. The Alternate Fuel Locator shows the location of alternate fuel stations in the United States and Canada. This package also includes the data from the US Department of Energy Alternate Fuel database as a data set.
This package provides functionality to automatically detect groove locations via a Bayesian changepoint detection method to be used in the data preprocessing step of forensic bullet matching algorithms. The methods in this package are based on those in Stephens (1994) <doi:10.2307/2986119>. Bayesian changepoint detection will simply be an option in the function from the package bulletxtrctr which identifies the groove locations.
Utilities for Bratteli graphs. A tree is an example of a Bratteli graph. The package provides a function which generates a LaTeX file that renders the given Bratteli graph. It also provides functions to compute the dimensions of the vertices, the intrinsic kernels and the intrinsic distances. Intrinsic kernels and distances were introduced by Vershik (2014) <doi:10.1007/s10958-014-1958-0>.
Responsive and modern HTML card essentials for shiny applications and dashboards. This novel card component in Bootstrap provides a flexible and extensible content container with multiple variants and options for building robust R-based apps, e.g. for graph building or machine learning projects. The features rely on a combination of jQuery <https://jquery.com> and CSS styles to improve the card functionality.
Management of and data extraction from camera trap data in wildlife studies. The package provides a workflow for storing and sorting camera trap photos (and videos), tabulates records of species and individuals, and creates detection/non-detection matrices for occupancy and spatial capture-recapture analyses with great flexibility. In addition, it can visualise species activity data and provides simple mapping functions with GIS export.
Generates colors from palettes defined in the colormap module of Node.js (see <https://github.com/bpostlethwaite/colormap> for more information). In total it provides 44 distinct palettes made from sequential and/or diverging colors. In addition to the predefined palettes, you can also specify your own set of colors. There are also scale functions that can be used with ggplot2.
This package creates project-specific directory and file templates that are written to a .Rprofile file. Upon starting a new R session, these templates can be used to streamline the creation of new directories that are standardized to the user's preferences and can include the initiation of a git repository, an RStudio R project, and project-local dependency management with the renv package.
Testing functions for covariance matrices. These tests include high-dimensional homogeneity of covariance matrix testing described by Schott (2007) <doi:10.1016/j.csda.2007.03.004> and high-dimensional one-sample tests of covariance matrix structure described by Fisher et al. (2010) <doi:10.1016/j.jmva.2010.07.004>. Covariance matrix tests use C++ to speed performance and allow larger data sets.
This package provides a local haplotyping visualization toolbox to capture major patterns of co-inheritance between clusters of linked variants, whilst connecting findings to phenotypic and demographic traits across individuals. crosshap enables users to explore and understand genomic variation across a trait-associated region. For an example of successful local haplotype analysis, see Marsh et al. (2022) <doi:10.1007/s00122-022-04045-8>.
This package contains a function called dmur() which accepts four parameters: the possible values, the probabilities of those values, the selling cost and the preparation cost. The dmur() function generates various numeric decision parameters, namely MEMV (maximum, i.e. optimum, expected monetary value), best choice, EPPI (expected profit with perfect information), EVPI (expected value of perfect information) and EOL (expected opportunity loss), which facilitate effective decision-making.
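These quantities can be sketched with textbook definitions. The payoff rule below is an assumption, since the description does not state it: preparing q units when the realised value is d yields selling * min(q, d) - preparation * q. The function name and return keys are hypothetical and do not reproduce dmur() itself.

```python
def decision_summary(values, probs, selling, preparation):
    """Expected-value decision quantities for a discrete set of outcomes.

    Assumed payoff: selling * min(stock, demand) - preparation * stock.
    """
    if not values or len(values) != len(probs):
        raise ValueError("values and probs must be non-empty and equal in length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")

    def payoff(stock, demand):
        return selling * min(stock, demand) - preparation * stock

    outcomes = list(zip(values, probs))
    # Expected monetary value of each candidate choice.
    emv = {q: sum(p * payoff(q, d) for d, p in outcomes) for q in values}
    best = max(emv, key=emv.get)
    # Expected profit if the outcome were known before choosing.
    eppi = sum(p * max(payoff(q, d) for q in values) for d, p in outcomes)
    # Expected opportunity loss (regret) of each candidate choice.
    eol = {
        q: sum(p * (max(payoff(a, d) for a in values) - payoff(q, d))
               for d, p in outcomes)
        for q in values
    }
    return {"EMV": emv, "best": best, "MEMV": emv[best],
            "EPPI": eppi, "EVPI": eppi - emv[best], "EOL": eol}


result = decision_summary([10, 20, 30], [0.2, 0.5, 0.3], selling=5, preparation=3)
print(result)
```

Under these definitions the smallest expected opportunity loss equals the EVPI, which gives a quick consistency check on the output.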