Supports the analysis of oceanographic data recorded by Argo autonomous drifting profiling floats. Functions are provided to (a) download and cache data files, (b) subset data in various ways, (c) handle quality-control flags and (d) plot the results according to oceanographic conventions. A shiny app is provided for easy exploration of datasets. The package is designed to work well with the oce package, providing a wide range of processing capabilities that are particular to oceanographic analysis. See Kelley, Harbin, and Richards (2021) <doi:10.3389/fmars.2021.635922> for more on the scientific context and applications.
This package provides a set of functions to allow analysis of count data (such as faecal egg count data) using Bayesian MCMC methods. Returns information on the possible values for mean count, coefficient of variation and zero inflation (true prevalence) present in the data. A complete faecal egg count reduction test (FECRT) model is implemented, which returns inference on the true efficacy of the drug from the pre- and post-treatment data provided, using non-parametric bootstrapping as well as using Bayesian MCMC. Functions to perform power analyses for faecal egg counts (including FECRT) are also provided.
Fit growth models to otoliths and/or tagging data, using the RTMB package and maximum likelihood. The otoliths (or similar measurements of age) provide direct observed coordinates of age and length. The tagging data provide information about the observed length at release and length at recapture at a later time, where the age at release is unknown and estimated as a vector of parameters. The growth models provided by this package can be fitted to otoliths only, tagging data only, or a combination of the two. Growth variability can be modelled as constant or increasing with length.
Provides classifiers for discrete and continuous variables based on the Naive Bayes and Fuzzy Naive Bayes hypotheses. These methods were developed by researchers belonging to the Laboratory of Technologies for Virtual Teaching and Statistics (LabTEVE) and the Laboratory of Applied Statistics to Image Processing and Geoprocessing (LEAPIG) at the Federal University of Paraiba, Brazil. They considered several statistical distributions, and their papers were published in the scientific literature; see, for instance, the Gaussian classifier using fuzzy parameters proposed by Moraes, Ferreira and Machado (2021) <doi:10.1007/s40815-020-00936-4>.
Write beautiful yet customizable letters in R Markdown and directly obtain the finished PDF. Smooth generation of PDFs is realized by rmarkdown, the pandoc-letter template and the KOMA-Script letter class. KOMA-Script provides enhanced replacements for the standard LaTeX classes with emphasis on typography and versatility. KOMA-Script is particularly useful for international writers as it handles various paper formats well, provides layouts for many common window envelope types (e.g. German, US, French, Japanese) and lets you define your own layouts. The package comes with a default letter layout based on DIN 5008B.
Provides a web-based graphical user interface (GUI) for producing Biplot representations from news scraped from digital newspapers, using the Bayesian approach of Latent Dirichlet Allocation (LDA) and machine learning algorithms. Contains LDA methods described by Blei, Ng and Jordan (2003) <https://jmlr.org/papers/volume3/blei03a/blei03a.pdf>, and Biplot methods described by Gabriel (1971) <doi:10.1093/biomet/58.3.453> and Galindo-Villardon (1986) <https://diarium.usal.es/pgalindo/files/2012/07/Questiio.pdf>.
This package provides a tidy workflow for landscape-scale analysis. multilandr offers tools to generate landscapes at multiple spatial scales and compute landscape metrics, primarily using the landscapemetrics package. It also features utility functions for plotting and analyzing multi-scale landscapes, exploring correlations between metrics, filtering landscapes based on specific conditions, generating landscape gradients for a given metric, and preparing datasets for further statistical analysis. Documentation about multilandr is provided in an introductory vignette included in this package and in the paper by Huais (2024) <doi:10.1007/s10980-024-01930-z>; see citation("multilandr") for details.
An exploratory and heuristic approach for specification search in Structural Equation Modeling. The basic idea is to subsample the original data and then search for optimal models on each subset. Optimality is defined through two objectives: model fit and parsimony. As these objectives are conflicting, we apply a multi-objective optimization method, specifically NSGA-II, to obtain optimal models for the whole range of model complexities. From these optimal models, we consider only the relevant model specifications (structures), i.e., those that are both stable (occur frequently) and parsimonious, and use those to infer a causal model.
Interactive R package with an intuitive Shiny-based graphical interface for alternative splicing quantification and integrative analyses of alternative splicing and gene expression based on The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression project (GTEx), Sequence Read Archive (SRA) and user-provided data. The tool interactively performs survival, dimensionality reduction and median- and variance-based differential splicing and gene expression analyses that benefit from the incorporation of clinical and molecular sample-associated features (such as tumour stage or survival). Interactive visual access to genomic mapping and functional annotation of selected alternative splicing events is also included.
Helps visualize what is summarized in Pearson's correlation coefficient. That is, it visualizes its main constituent, namely the distances of the single values to their respective mean. The visualization thereby shows what the etymology of the word correlation contains: in pairwise combination, bringing back (see the package vignette for more details). I hope that the correlatio package may benefit some people in understanding and critically evaluating what Pearson's correlation coefficient summarizes in a single number, i.e., to what degree and why Pearson's correlation coefficient may (or may not) be warranted as a measure of association.
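As a rough illustration of the idea described above (not the correlatio package's own code or API), the following Python sketch computes the per-value distances to the respective means and shows how their pairwise products add up to Pearson's correlation coefficient. The function name `pearson_constituents` is hypothetical and exists only for this example.

```python
import math


def pearson_constituents(x, y):
    """Return (r, dx, dy, products) for two equal-length numeric sequences.

    dx and dy hold each value's distance to its own mean; products holds
    the pairwise products of those distances, whose sum is the numerator
    of Pearson's r. Hypothetical helper, for illustration only.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    products = [a * b for a, b in zip(dx, dy)]
    # Denominator: square root of the product of the summed squared distances.
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("r is undefined when x or y is constant")
    return sum(products) / denom, dx, dy, products


r, dx, dy, products = pearson_constituents([1, 2, 3, 4], [2, 4, 6, 8])
print(r, products)
```

Pairs whose distances share a sign contribute positive products and pull r towards 1; pairs with opposite signs pull it towards -1.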
Creates (encodes) molecular "license-plates" from sequences and decodes the "license-plates" back to sequences. While initially created for transfer RNA-derived small fragments (tRFs), this tool can be used for any genomic sequences, including but not limited to tRFs and microRNAs. For details, see Pliatsika V, Loher P, Telonis AG, Rigoutsos I (2016) <doi:10.1093/bioinformatics/btw194>. It can also be used to annotate tRFs; for details, see Loher P, Telonis AG, Rigoutsos I (2017) <doi:10.1038/srep41184>.
An open source software package written in the R statistical language. It consists of a set of decision-making tools for conducting missing person searches. In particular, it allows computing the optimal LR threshold for declaring potential matches in DNA-based database searches. More recently, mispitools has incorporated LRs based on preliminary investigation data. The statistical weight of different traces of evidence, such as biological sex, age and hair color, is presented. For citing mispitools please use the following references: Marsico and Caridi, 2023 <doi:10.1016/j.fsigen.2023.102891> and Marsico, Vigeland et al. 2021 <doi:10.1016/j.fsigen.2021.102519>.
An implementation of the cross-validated difference in means (CVDM) test by Desmarais and Harden (2014) <doi:10.1007/s11135-013-9884-7> (see also Harden and Desmarais, 2011 <doi:10.1177/1532440011408929>) and the cross-validated median fit (CVMF) test by Desmarais and Harden (2012) <doi:10.1093/pan/mpr042>. These tests use leave-one-out cross-validated log-likelihoods to assist in selecting among model estimations. You can also utilize data from Golder (2010) <doi:10.1177/0010414009341714> and Joshi & Mason (2008) <doi:10.1177/0022343308096155> that are included to facilitate examples from real-world analysis.
Implements and enhances the performance of spatial fuzzy clustering using Fuzzy Geographically Weighted Clustering with various optimization algorithms, mainly from the book Nature-Inspired Optimization Algorithms by Xin-She Yang (2014) <ISBN:9780124167438>. The optimization algorithms help to overcome the clustering inconsistency of the traditional approach. A choice of distance measures is also provided in order to increase the quality of clustering results. Fuzzy Geographically Weighted Clustering with a nature-inspired optimization algorithm was first developed by Arie Wahyu Wijayanto and Ayu Purwarianti (2014) <doi:10.1109/CITSM.2014.7042178> using the Artificial Bee Colony algorithm.
This package implements multi-study learning algorithms such as merging, the study-specific ensemble (trained-on-observed-studies ensemble), the study strap, the covariate-matched study strap, covariate-profile similarity weighting, and stacking weights. Embedded within the caret framework, this package allows for a wide range of single-study learners (e.g., neural networks, lasso, random forests). The package offers over 20 default similarity measures and allows for specification of custom similarity measures for covariate-profile similarity weighting and an accept/reject step. This implements methods described in Loewinger, Kishida, Patil, and Parmigiani (2019) <doi:10.1101/856385>.
Reconstruct phylogenetic trees from discrete data. Inapplicable character states are handled using the algorithm of Brazeau, Guillerme and Smith (2019) <doi:10.1093/sysbio/syy083> with the "Morphy" library, under equal or implied step weights. Contains a "shiny" user interface for interactive tree search and exploration of results, including character visualization, rogue taxon detection, tree space mapping, and cluster consensus trees (Smith 2022a, b) <doi:10.1093/sysbio/syab099>, <doi:10.1093/sysbio/syab100>. Profile Parsimony (Faith and Trueman, 2001) <doi:10.1080/10635150118627>, Successive Approximations (Farris, 1969) <doi:10.2307/2412182> and custom optimality criteria are implemented.
This package implements functions to retrieve the nearest genes around a peak, annotate the genomic region of the peak, and estimate the statistical significance of overlap among ChIP peak data sets. It also incorporates the GEO database so that users can compare their own dataset with those deposited in the database. The comparison can be used to infer cooperative regulation and thus can be used to generate hypotheses. Several visualization functions are implemented to summarize the coverage of the peak experiment, average profile and heatmap of peaks binding to TSS regions, genomic annotation, distance to TSS, and overlap of peaks or genes.
Utilities for working with hourly air quality monitoring data with a focus on small particulates (PM2.5). A compact data model is structured as a list with two dataframes. A meta dataframe contains spatial and measuring device metadata associated with deployments at known locations. A data dataframe contains a datetime column followed by columns of measurements associated with each "device-deployment". Algorithms to calculate NowCast and the associated Air Quality Index (AQI) are defined by the US Environmental Protection Agency AirNow program: <https://document.airnow.gov/technical-assistance-document-for-the-reporting-of-daily-air-quailty.pdf>.
The kernel of this Rcpp based package is an efficient implementation of the generalized gradient projection method for spline function based constrained maximum likelihood estimator for interval censored survival data (Wu, Yuan; Zhang, Ying. Partially monotone tensor spline estimation of the joint distribution function with bivariate current status data. Ann. Statist. 40, 2012, 1609-1636 <doi:10.1214/12-AOS1016>). The key function computes the density function of the joint distribution of event time and the marker and returns the receiver operating characteristic (ROC) curve for the interval censored survival data as well as area under the curve (AUC).
Facilitates the conversion of KML (Keyhole Markup Language) files to Shapefiles while preserving attribute values. It provides a straightforward interface for users to import KML data, extract relevant attributes, and export them into the widely compatible Shapefile format. The package ensures accurate representation of spatial data while maintaining the integrity of associated attribute information. For details, see Flores, G. (2021) <doi:10.1007/978-3-030-63665-4_15>. Whether for spatial analysis, visualization, or data interoperability, it simplifies the conversion process for working with geospatial datasets.
Convenient wrapper functions for the analysis of matrix-assisted laser desorption/ionization-time-of-flight (MALDI-TOF) spectra data in order to select only representative spectra (also called cherry-pick). The package covers the preprocessing and dereplication steps (based on Strejcek, Smrhova, Junkova and Uhlik (2018) <doi:10.3389/fmicb.2018.01294>) needed to cluster MALDI-TOF spectra before the final cherry-picking step. It enables the easy exclusion of spectra and/or clusters to accommodate complex cherry-picking strategies. Alternatively, cherry-picking using taxonomic identification MALDI-TOF data is made easy with functions to import inconsistently formatted reports.
The goal of snpsettest is to provide simple tools that perform set-based association tests (e.g., gene-based association tests) using GWAS (genome-wide association study) summary statistics. A set-based association test in this package is based on the statistical model described in VEGAS (versatile gene-based association study), which combines the effects of a set of SNPs accounting for linkage disequilibrium between markers. This package uses a different approach from the original VEGAS implementation to compute set-level p values more efficiently, as described in <https://github.com/HimesGroup/snpsettest/wiki/Statistical-test-in-snpsettest>.
CellTrails is an unsupervised algorithm for the de novo chronological ordering, visualization and analysis of single-cell expression data. CellTrails makes use of a geometrically motivated concept of lower-dimensional manifold learning, which exhibits a multitude of virtues that counteract intrinsic noise of single cell data caused by drop-outs, technical variance, and redundancy of predictive variables. CellTrails enables the reconstruction of branching trajectories and provides an intuitive graphical representation of expression patterns along all branches simultaneously. It allows the user to define and infer the expression dynamics of individual and multiple pathways towards distinct phenotypes.
This package provides tools for the calculation of common biodiversity indices from count data. Additionally, it incorporates bootstrapping techniques to generate multiple samples, facilitating the estimation of confidence intervals around these indices. Furthermore, the package allows for the exploration of how variation in these indices changes with differing numbers of sites, making it a useful tool with which to begin an ecological analysis. Methods are based on the following references: Chao et al. (2014) <doi:10.1890/13-0133.1>, Chao and Colwell (2022) <doi:10.1002/9781119902911.ch2>, Hsieh, Ma, and Chao (2016) <doi:10.1111/2041-210X.12613>.
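To make the bootstrapping idea above concrete, here is a minimal Python sketch, not the package's implementation. It assumes the Shannon index as a stand-in for "common biodiversity indices", resamples sites with replacement, pools the counts, and reads a percentile confidence interval off the resampled estimates. The names `shannon_index` and `bootstrap_ci`, the resampling unit (sites) and the percentile method are all assumptions made for illustration.

```python
import math
import random


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over species with count > 0."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one individual")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bootstrap_ci(site_counts, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the pooled Shannon index.

    site_counts is a list of per-site count vectors that share one species
    order. Each replicate draws len(site_counts) sites with replacement and
    pools their counts (assumes every site holds at least one individual).
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n_sites = len(site_counts)
    estimates = []
    for _ in range(n_boot):
        sample = [site_counts[rng.randrange(n_sites)] for _ in range(n_sites)]
        pooled = [sum(column) for column in zip(*sample)]
        estimates.append(shannon_index(pooled))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Made-up counts for three species at four sites.
sites = [[12, 3, 0], [8, 5, 1], [15, 0, 2], [6, 6, 6]]
print(shannon_index([sum(col) for col in zip(*sites)]), bootstrap_ci(sites))
```

Calling `bootstrap_ci` on subsets of the sites of different sizes gives a rough view of how the interval width changes with the number of sites.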