The package conducts pathway testing from untargeted metabolomics data. It requires the user to supply feature-level test results, from case-control testing, regression, or other suitable feature-level tests for the study design. Weights are given to metabolic features based on how many metabolites they could potentially match. The package can combine positive and negative mode results in pathway tests.
Researchers commonly need to summarize scientific information, a process known as evidence synthesis. The first stage of a synthesis process (such as a systematic review or meta-analysis) is to download a list of references from academic search engines such as Web of Knowledge or Scopus. The traditional approach to systematic review is then to sort these data manually, first by locating and removing duplicated entries, and then screening to remove irrelevant content by viewing titles and abstracts (in that order). revtools provides interfaces for each of these tasks. An alternative approach, however, is to draw on tools from machine learning to visualise patterns in the corpus. In this case, you can use revtools to render ordinations of text drawn from article titles, keywords and abstracts, and interactively select or exclude individual references, words or topics.
Flexible framework for ecological restoration planning. It aims to identify priority areas for restoration efforts using optimization algorithms (based on Justeau-Allaire et al. 2021 <doi:10.1111/1365-2664.13803>). Priority areas can be identified by maximizing landscape indices, such as the effective mesh size (Jaeger 2000 <doi:10.1023/A:1008129329289>), or the integral index of connectivity (Pascual-Hortal & Saura 2006 <doi:10.1007/s10980-006-0013-z>). Additionally, constraints can be used to ensure that priority areas exhibit particular characteristics (e.g., ensure that particular places are not selected for restoration, ensure that priority areas form a single contiguous network). Furthermore, multiple near-optimal solutions can be generated to explore multiple options in restoration planning. The package leverages the Choco-solver software to perform optimization using constraint programming (CP) techniques (<https://choco-solver.org/>).
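As an illustration of the kind of landscape index mentioned above, the following sketch computes the effective mesh size of a small raster as the sum of squared patch areas divided by the total landscape area, then finds the single cell whose restoration raises it most. It is a minimal Python sketch and not the package's API. The function names, the unit cell area, the 4-neighbour patch rule, the use of the full grid as the landscape area, and the brute-force search (in place of constraint programming) are all assumptions made for illustration.

```python
def patch_sizes(grid):
    """Return the sizes of 4-connected habitat patches (cells equal to 1)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Flood fill from this unvisited habitat cell.
            stack = [(r, c)]
            seen[r][c] = True
            size = 0
            while stack:
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and grid[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            sizes.append(size)
    return sizes


def effective_mesh_size(grid):
    """Sum of squared patch areas over total area (assumed: every cell has area 1)."""
    total_area = len(grid) * len(grid[0])
    return sum(s * s for s in patch_sizes(grid)) / total_area


def best_single_restoration(grid):
    """Brute force: try restoring each degraded cell (0) and keep the best one."""
    best_cell, best_value = None, effective_mesh_size(grid)
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value != 0:
                continue
            trial = [list(line) for line in grid]
            trial[r][c] = 1
            score = effective_mesh_size(trial)
            if score > best_value:
                best_cell, best_value = (r, c), score
    return best_cell, best_value


# Hypothetical 3x3 landscape: 1 = habitat, 0 = degraded.
landscape = [
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
]
print(effective_mesh_size(landscape))
print(best_single_restoration(landscape))
```

In this toy landscape, restoring the cell at row 1, column 0 joins the two-cell patch to the isolated cell below it, which is why it gives the largest gain. Exhaustive search of this kind does not scale and cannot express constraints such as contiguity, which is where a constraint programming solver becomes useful.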
The implemented R6 class SCM aims to simplify working with structural causal models. The missing data mechanism can be defined as a part of the structural model. The class contains methods for 1) defining a structural causal model via functions, text or conditional probability tables, 2) printing basic information on the model, 3) plotting the graph for the model using packages igraph or qgraph, 4) simulating data from the model, 5) applying an intervention, 6) checking the identifiability of a query using the R packages causaleffect and dosearch, 7) defining the missing data mechanism, 8) simulating incomplete data from the model according to the specified missing data mechanism and 9) checking the identifiability in a missing data problem using the R package dosearch. In addition, there are functions for running experiments and doing counterfactual inference using simulation.
This package performs statistical testing to compare predictive models based on multiple observations of the A statistic (also known as Area Under the Receiver Operating Characteristic Curve, or AUC). Specifically, it implements a testing method based on the equivalence between the A statistic and the Wilcoxon statistic. For more information, see Hanley and McNeil (1982) <doi:10.1148/radiology.143.1.7063747>.
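The equivalence between the A statistic and the Wilcoxon statistic can be checked numerically. The sketch below, in Python, computes the AUC twice: once as the fraction of (positive, negative) score pairs that are ordered correctly, and once from the Wilcoxon rank-sum of the positive scores. It illustrates only the equivalence, not the package's testing procedure; the function names and the example scores are hypothetical.

```python
from itertools import product


def auc_pairwise(pos_scores, neg_scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def auc_from_ranks(pos_scores, neg_scores):
    """Same quantity from the Wilcoxon rank-sum statistic, using midranks for ties."""
    pooled = sorted(list(pos_scores) + list(neg_scores))
    midrank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; assign their average.
        midrank[pooled[i]] = (i + 1 + j) / 2.0
        i = j
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    rank_sum = sum(midrank[s] for s in pos_scores)
    # Mann-Whitney U derived from the rank sum of the positive class.
    u = rank_sum - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Hypothetical classifier scores for positive and negative cases.
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.4, 0.1]
print(auc_pairwise(positives, negatives))
print(auc_from_ranks(positives, negatives))
```

Both functions return the same value for any input, because the Mann-Whitney U statistic counts exactly the correctly ordered pairs, with ties contributing one half.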
Provides access to the species list from the List of the Birds of Peru (Plenge, M. A., 2023) <https://sites.google.com/site/boletinunop/checklist>. This publication is one of the most comprehensive reviews of Peru's bird diversity. The dataset includes detailed species accounts and is structured for direct use in R.
Bayesian seemingly unrelated regression with general variable selection and dense/sparse covariance matrix. The sparse seemingly unrelated regression is described in Bottolo et al. (2021) <doi:10.1111/rssc.12490>, the software paper is in Zhao et al. (2021) <doi:10.18637/jss.v100.i11>, and the model with random effects is described in Zhao et al. (2024) <doi:10.1093/jrsssc/qlad102>.
Use BirdNET, a state-of-the-art deep learning classifier, to automatically identify (bird) sounds. Analyze bioacoustic datasets without any computer science background using a pre-trained model or a custom trained classifier. Predict bird species occurrence based on location and week of the year. Kahl, S., Wood, C. M., Eibl, M., & Klinck, H. (2021) <doi:10.1016/j.ecoinf.2021.101236>.
Covariance is ubiquitous across the disciplines of statistics. We provide a rich collection of geometric and inferential tools for convenient analysis of covariance structures, including distance measures, mean covariance estimators, covariance hypothesis tests for one-sample and two-sample cases, and covariance estimation. For an introduction to covariance in multivariate statistical analysis, see Schervish (1987) <doi:10.1214/ss/1177013111>.
Generate multivariate color palettes to represent two-dimensional or three-dimensional data in graphics (in contrast to standard color palettes that represent just one variable). You tell colors3d how to map color space onto your data, and it gives you a color for each data point. You can then use these colors to make plots in base R, ggplot2, or other graphics frameworks.
This package provides API access to the Government of Canada Vehicle Recalls Database <https://tc.api.canada.ca/en/detail?api=VRDB> used by the Defect Investigations and Recalls Division for vehicles, tires, and child car seats. The API wrapper provides access to recall summary information searched using make, model, and year range, as well as detailed recall information searched using recall number.
Estimation of incidence and case fatality for a chronic disease, given partial information, using a multi-state model. Given data on age-specific mortality and either incidence or prevalence, Bayesian inference is used to estimate the posterior distributions of incidence, case fatality, and functions of these such as prevalence. The methods are described in Jackson et al. (2023) <doi:10.1093/jrsssa/qnac015>.
This package performs calculations with tree taper (or stem profile) equations, including model fitting. The package implements the methods from García, O. (2015) "Dynamic modelling of tree form" <http://mcfns.net/index.php/Journal/article/view/MCFNS7.1_2>. The models are parsimonious, describe well the tree bole shape over its full length, and are consistent with wood formation mechanisms through time.
The FisherEM algorithm, proposed by Bouveyron & Brunet (2012) <doi:10.1007/s11222-011-9249-9>, is an efficient method for the clustering of high-dimensional data. FisherEM models and clusters the data in a discriminative and low-dimensional latent subspace. It also provides a low-dimensional representation of the clustered data. A sparse version of the FisherEM algorithm is also provided.
The ability to tune models is important. finetune enhances the tune package by providing more specialized methods for finding reasonable values of model tuning parameters. Two racing methods described by Kuhn (2014) <doi:10.48550/arXiv.1405.6974> are included. An iterative search method using generalized simulated annealing (Bohachevsky, Johnson and Stein, 1986) <doi:10.1080/00401706.1986.10488128> is also included.
Anonymized data from surveys conducted by Forwards <https://forwards.github.io/>, the R Foundation task force on women and other under-represented groups. Currently, the package contains a single data set of responses to a survey of attendees at useR! 2016 <https://www.r-project.org/useR-2016/>, the R user conference held at Stanford University, Stanford, California, USA, June 27 - June 30 2016.
Ease the transition between R vectors and markdown text. With gluedown and rmarkdown, users can create traditional vectors in R, glue those strings together with the markdown syntax, and print those formatted vectors directly to the document. This package primarily uses GitHub Flavored Markdown (GFM), an offshoot of the unambiguous CommonMark specification by John MacFarlane (2019) <https://spec.commonmark.org/>.
Utilizes methods of the PyMongo Python library to initialize, insert and query GeoJSON data (see <https://github.com/mongodb/mongo-python-driver> for more information on PyMongo). Furthermore, it allows the user to validate GeoJSON objects and to use the console for MongoDB (bulk) commands. The reticulate package provides the R interface to Python modules, classes and functions.
This package provides functions and data that support a course emphasizing statistical issues of inference and generalizability. The functions are designed to make it straightforward to illustrate the use of cross-validation, the training/test approach, simulation, and model-based estimates of accuracy. Methods considered are Generalized Additive Modeling, Linear and Quadratic Discriminant Analysis, Tree-based methods, and Random Forests.
This package creates styled tables for data presentation. Export to HTML, LaTeX, RTF, Word, Excel, and PowerPoint. Simple, modern interface to manipulate borders, size, position, captions, colours, text styles and number formatting. Table cells can span multiple rows and/or columns. Includes a huxreg function for creation of regression tables, and quick_* one-liners to print data to a new document.
Data from the United States Center for Medicare and Medicaid Services (CMS) is included in this package. There are ICD-9 and ICD-10 diagnostic and procedure codes, and lists of the chapter and sub-chapter headings and the ranges of ICD codes they encompass. There are also two sample datasets. These data are used by the icd package for finding comorbidities.
This package provides a pipeline to annotate chromatography peaks from the IDSL.IPA workflow <doi:10.1021/acs.jproteome.2c00120> with molecular formulas of a prioritized chemical space using an isotopic profile matching approach. The IDSL.UFA workflow only requires mass spectrometry level 1 (MS1) data for formula annotation. The IDSL.UFA method is described in <doi:10.1021/acs.analchem.2c00563>.
This package provides a set of tools designed to enhance transparency and understanding of date-time manipulation functions from the lubridate package. It provides detailed feedback about the operations performed by lubridate functions, allowing users to better comprehend and debug their code. These insights serve as both a learning tool for newcomers and a debugging aid for programmers working with date-time data.
This package provides code management functions, NLP tools, a Monty Hall simulator, and an implementation of my own variable reduction technique, called Feed Reduction. The Feed Reduction technique is not yet published. It uses a series of binary neural networks to reduce data into N dimensions, where N is the number of possible values of the response variable.
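For readers unfamiliar with what a Monty Hall simulator computes, the following Python sketch estimates the win rate of the "switch" and "stay" strategies by repeated random play. It is a generic illustration, not the package's function; the name `monty_hall_win_rate`, its parameters, and the three-door setup are assumptions.

```python
import random


def monty_hall_win_rate(switch, trials=10000, seed=0):
    """Estimate the probability of winning the car under one fixed strategy."""
    rng = random.Random(seed)  # seeded so that repeated runs agree
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)   # door hiding the car
        pick = rng.randrange(3)  # contestant's first choice
        # The host opens a door that is neither the pick nor the car.
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            # Move to the one remaining closed door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == car
    return wins / trials


print("switch:", monty_hall_win_rate(switch=True))
print("stay:  ", monty_hall_win_rate(switch=False))
```

With enough trials the estimates approach the theoretical values of 2/3 for switching and 1/3 for staying, since switching wins exactly when the first pick was not the car.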