This package provides a suite of functions for processing and visualizing taxonomic data. It includes functionality to clean and transform taxonomic data, categorize it into hierarchical ranks (such as Phylum, Class, Order, Family, and Genus), and calculate the relative abundance of each category. The package also generates a color palette for visual representation of the taxonomic data, allowing users to easily identify and differentiate between taxonomic groups. Additionally, it features a river plot visualization that displays the distribution of individuals across the taxonomic ranks.
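As a rough illustration of the relative-abundance step, the sketch below counts individuals per category at one taxonomic rank and divides by the total. It is written in Python rather than R, and the function name `relative_abundance`, the input layout and the sample records are all hypothetical; it is not the package's own interface.

```python
from collections import Counter


def relative_abundance(records, rank):
    """Return {category: share of records} for one taxonomic rank.

    `records` is a list of dicts mapping rank names to category labels
    (a hypothetical input layout, not the package's format).
    Records that lack the rank are skipped.
    """
    counts = Counter(r[rank] for r in records if r.get(rank))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Hypothetical sample: four individuals with Phylum and Genus labels.
sample = [
    {"Phylum": "Firmicutes", "Genus": "Bacillus"},
    {"Phylum": "Firmicutes", "Genus": "Clostridium"},
    {"Phylum": "Proteobacteria", "Genus": "Escherichia"},
    {"Phylum": "Firmicutes", "Genus": "Bacillus"},
]
by_phylum = relative_abundance(sample, "Phylum")
```

The same call with `"Genus"` gives the shares at the genus level, so one function covers every rank in the hierarchy.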
This package contains miscellaneous functions useful for managing NetCDF files (see <https://en.wikipedia.org/wiki/NetCDF>), getting the moon phase and the times of sunrise and sunset, computing tide levels, analysing and reconstructing periodic time series of temperature with an irregular sinusoidal pattern, showing scales and wind roses in plots with a change of text color, running the Metropolis-Hastings algorithm for Bayesian MCMC analysis, plotting graphs or boxplots with error bars, searching files on disk by their names or their content, and reading the contents of all files in a folder at once.
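For readers unfamiliar with the Metropolis-Hastings algorithm mentioned above, here is a minimal random-walk sketch in Python. It is a generic textbook version with a symmetric Gaussian proposal, not the package's implementation; the function name, its parameters and the standard-normal target are assumptions chosen for illustration.

```python
import math
import random


def metropolis_hastings(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a one-dimensional target.

    `log_density` returns the log of the (unnormalised) target density.
    The proposal is symmetric, so the acceptance ratio reduces to
    p(proposal) / p(current).
    """
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


# Hypothetical target: a standard normal, up to an additive constant.
samples = metropolis_hastings(lambda x: -0.5 * x * x, start=0.0, n_samples=20000)
```

Working with log densities avoids underflow for targets with very small density values, which is why the ratio is formed as a difference before exponentiating.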
Automate the detection of gaps and elevations in mapped sequencing read coverage using a 2D pattern-matching algorithm. ProActive detects, characterizes and visualizes read coverage patterns in both genomes and metagenomes. Optionally, users may provide gene annotations associated with their genome or metagenome in the form of a .gff file. In this case, ProActive will generate an additional output table containing the gene annotations found within the detected regions of gapped and elevated read coverage. Additionally, users can search for gene annotations of interest in the output read coverage plots.
Machine learning provides algorithms that can learn from data and make inferences or predictions. Stochastic automata are a class of input/output devices which can model components. This work provides an implementation of an inference algorithm for stochastic automata which is similar to the Viterbi algorithm. Moreover, we specify a learning algorithm using the expectation-maximization technique and provide a more efficient implementation of the Baum-Welch algorithm for stochastic automata. This work is based on "Inference and learning in stochastic automata" by Karl-Heinz Zimmermann (2017) <doi:10.12732/ijpam.v115i3.15>.
Efficient Markov chain Monte Carlo (MCMC) algorithms for fully Bayesian estimation of time-varying parameter models with shrinkage priors, both dynamic and static. Details on the algorithms used are provided in Bitto and Frühwirth-Schnatter (2019) <doi:10.1016/j.jeconom.2018.11.006>, Cadonna et al. (2020) <doi:10.3390/econometrics8020020> and Knaus and Frühwirth-Schnatter (2023) <doi:10.48550/arXiv.2312.10487>. For details on the package, please see Knaus et al. (2021) <doi:10.18637/jss.v100.i13>. For the multivariate extension, see the shrinkTVPVAR package.
This package provides functions for importing external vector images and drawing them as part of R plots. This package is different from the grImport package because, where that package imports PostScript format images, this package imports SVG format images. Furthermore, this package imports a specific subset of SVG, so external images must be preprocessed using a package like rsvg to produce SVG that this package can import. SVG features that are not supported by R graphics, such as gradient fills, can be imported and then exported via the gridSVG package.
This package provides a simple tool to quantify the amount of transmission of an infectious disease of interest occurring within and between population groups. bumblebee uses counts of observed directed transmission pairs, identified phylogenetically from deep-sequence data or from epidemiological contacts, to quantify transmission flows within and between population groups accounting for sampling heterogeneity. Population groups might include: geographical areas (e.g. communities, regions), demographic groups (e.g. age, gender) or arms of a randomized clinical trial. See the bumblebee website for statistical theory, documentation and examples <https://magosil86.github.io/bumblebee/>.
Color values in R are often represented as strings of hexadecimal colors or named colors. This package offers fast conversion of these color representations to either an array of red/green/blue/alpha values or to the packed integer format used in native raster objects. Functions for conversion are also exported at the C level for use in other packages. This fast conversion of colors is implemented using an order-preserving minimal perfect hash derived from Majewski et al. (1996) "A Family of Perfect Hashing Methods" <doi:10.1093/comjnl/39.6.547>.
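The two output forms described above can be sketched in a few lines of Python. This is only an illustration of the conversion itself: it parses hexadecimal strings directly, does not handle named colors, and does not use the perfect hash the package relies on. The function names are hypothetical, and the byte order used for packing (red in the lowest byte) is an assumption made for this sketch.

```python
def hex_to_rgba(color):
    """Convert '#RRGGBB' or '#RRGGBBAA' to an (r, g, b, a) tuple of 0-255 ints."""
    if not color.startswith("#"):
        raise ValueError(f"expected a leading '#': {color!r}")
    digits = color[1:]
    if len(digits) == 6:
        digits += "FF"  # no alpha given: treat as fully opaque
    if len(digits) != 8:
        raise ValueError(f"unsupported color string: {color!r}")
    return tuple(int(digits[i:i + 2], 16) for i in range(0, 8, 2))


def pack_rgba(r, g, b, a):
    """Pack four 0-255 channels into one 32-bit value, red in the lowest byte.

    The byte order is an assumption made for this sketch.
    """
    return r | (g << 8) | (b << 16) | (a << 24)


orange = hex_to_rgba("#FF8000")
packed = pack_rgba(*orange)
```

Packing four channels into one integer lets a whole image be stored as a flat integer array, which is the main reason a packed format is attractive for raster data.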
The purpose is to account for the random displacements (jittering) of true survey household cluster center coordinates in geostatistical analyses of Demographic and Health Surveys program (DHS) data. Adjustment for jittering can be implemented either in the spatial random effect, or in the raster/distance based covariates, or in both. Detailed information about the methods behind the package functionality can be found in two preprints. Umut Altay, John Paige, Andrea Riebler, Geir-Arne Fuglstad (2022) <arXiv:2202.11035v2>. Umut Altay, John Paige, Andrea Riebler, Geir-Arne Fuglstad (2022) <arXiv:2211.07442v1>.
Grey zones occur locally in an agreement table due to the subjective evaluation of raters, driven by factors such as a lack of uniform guidelines, differences between the raters' levels of expertise, or low variability among the levels of the categorical variable. It is important to detect grey zones since they cause a negative bias in the estimate of the agreement level. This package provides a function for detecting the existence of grey zones in two-way inter-rater agreement tables (Demirhan and Yilmaz (2023) <doi:10.1186/s12874-022-01759-7>).
Class imbalance usually damages the performance of classifiers. Thus, it is important to treat data before applying a classifier algorithm. This package includes recent resampling algorithms from the literature: (Barua et al. 2014) <doi:10.1109/tkde.2012.232>; (Das et al. 2015) <doi:10.1109/tkde.2014.2324567>; (Zhang et al. 2014) <doi:10.1016/j.inffus.2013.12.003>; (Gao et al. 2014) <doi:10.1016/j.neucom.2014.02.006>; (Almogahed et al. 2014) <doi:10.1007/s00500-014-1484-5>. It also includes a useful interface to perform oversampling.
Analysis of dichotomous and continuous response data using latent factors with both the 1PL LSIRM and the 2PL LSIRM, as described in Jeon et al. (2021) <doi:10.1007/s11336-021-09762-5>. It includes the original 1PL LSIRM and 2PL LSIRM for binary response data and their extensions for continuous response data. Bayesian model selection with a spike-and-slab prior and methods for handling missing values under the missing at random and missing completely at random assumptions are also supported. Various diagnostic plots are available to inspect the latent space and summaries of the estimated parameters.
Vitamin and mineral deficiencies continue to be a significant public health problem. This is particularly critical in developing countries, where deficiencies in vitamin A, iron, iodine, and other micronutrients lead to adverse health consequences. Cross-sectional surveys are helpful in answering questions related to the magnitude and distribution of deficiencies of selected vitamins and minerals. This package provides tools for calculating and determining select vitamin and mineral deficiencies based on World Health Organization (WHO) guidelines found at <https://www.who.int/teams/nutrition-and-food-safety/databases/vitamin-and-mineral-nutrition-information-system>.
The SoundexBR package provides an algorithm for encoding names into phonetic codes, as pronounced in Portuguese. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling. The algorithm mainly encodes consonants; a vowel will not be encoded unless it is the first letter. The resulting Soundex code is a four-character string composed of one letter followed by three numerical digits: the letter is the first letter of the name, and the digits encode the remaining consonants.
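The code structure described above (first letter kept, consonants mapped to digits, vowels skipped, result padded to four characters) can be sketched as follows. Note the assumption: this is the classic English-language Soundex consonant table, not the Portuguese adaptation the package implements, and it only considers unaccented letters A-Z. The names `CODES` and `soundex` are chosen for this sketch.

```python
# Classic Soundex consonant groups (English rules, used here as a stand-in).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return a four-character code: first letter plus three digits."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's code blocks a repeat
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        if ch not in "HW":  # H and W do not separate equal codes; vowels do
            prev = code
    # Pad with zeros, then truncate to one letter plus three digits.
    return (first + "".join(digits) + "000")[:4]
```

Under these rules the spelling variants "Souza" and "Sousa" both map to `S200`, which is the kind of match the encoding is meant to enable.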
Using the adjustment method from Benjamini & Hochberg (1995) <doi:10.1111/j.2517-6161.1995.tb02031.x>, this package determines which variables are significant under repeated testing, given a dataframe of p values and a user-defined "q" threshold. It then returns the original dataframe along with a significance column where an asterisk denotes a significant p value after FDR calculation, and NA denotes all other p values. This package uses the Benjamini & Hochberg method specifically as described in Lee, S., & Lee, D. K. (2018) <doi:10.4097/kja.d.18.00242>.
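A minimal Python sketch of the Benjamini-Hochberg step-up rule may help make the significance column concrete. It follows the textbook procedure (find the largest rank k whose sorted p value is at most k/m times q, then flag every p value up to that rank); whether the package computes its result in exactly this way is an assumption. The function name `bh_significant` is hypothetical, and `None` stands in for R's NA.

```python
def bh_significant(p_values, q=0.05):
    """Flag p values that pass the Benjamini-Hochberg step-up rule.

    Returns a list aligned with `p_values`: "*" for significant entries,
    None otherwise.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank  # keep the largest rank that passes
    flags = [None] * m
    for idx in order[:cutoff_rank]:
        flags[idx] = "*"
    return flags


flags = bh_significant([0.041, 0.001, 0.216, 0.008, 0.039], q=0.05)
```

Because the rule is step-up, a p value that fails its own rank threshold is still flagged when a larger p value passes at a higher rank, so the loop must scan all ranks rather than stop at the first failure.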
Data exploration and modelling is a process in which many data artifacts are produced, such as subsets, data aggregates, plots, statistical models, different versions of data sets and different versions of results. Archivist helps to store and manage artifacts created in R. It allows you to store selected artifacts as binary files together with their metadata and relations. Archivist allows sharing artifacts with others. It can look for already created artifacts by using their class, name, date of creation or other properties. It also makes it easy to restore such artifacts.
Finite mixture models are a popular technique for modelling unobserved heterogeneity or for approximating general distribution functions in a semi-parametric way. They are used in many different areas such as astronomy, biology, economics, marketing or medicine. This package implements popular robust mixture regression methods based on different algorithms, including: fleximix, finite mixture models and latent class regression; CTLERob, component-wise adaptive trimming likelihood estimation; mixbi, bi-square estimation; mixL, Laplacian distribution; mixt, t-distribution; TLE, trimmed likelihood estimation. Among the implemented algorithms, CTLERob stands for Component-wise adaptive Trimming Likelihood Estimation based mixture regression; mixbi stands for mixture regression based on bi-square estimation; mixL stands for mixture regression based on Laplacian distribution; TLE stands for Trimmed Likelihood Estimation based mixture regression. For more details on the algorithms, please refer to the references below. References: Chun Yu, Weixin Yao, Kun Chen (2017) <doi:10.1002/cjs.11310>. Neykov N, Filzmoser P, Dimova R et al. (2007) <doi:10.1016/j.csda.2006.12.024>. Bai X, Yao W, Boyer JE (2012) <doi:10.1016/j.csda.2012.01.016>. Wennan Chang, Xinyu Zhou, Yong Zang, Chi Zhang, Sha Cao (2020) <arXiv:2005.11599>.
Reads, writes, and edits EXIF and other file metadata using ExifTool <https://exiftool.org/>, returning read results as a data frame. ExifTool supports many different metadata formats including EXIF, GPS, IPTC, XMP, JFIF, GeoTIFF, ICC Profile, Photoshop IRB, FlashPix, AFCP and ID3, Lyrics3, as well as the maker notes of many digital cameras by Canon, Casio, DJI, FLIR, FujiFilm, GE, GoPro, HP, JVC/Victor, Kodak, Leaf, Minolta/Konica-Minolta, Motorola, Nikon, Nintendo, Olympus/Epson, Panasonic/Leica, Pentax/Asahi, Phase One, Reconyx, Ricoh, Samsung, Sanyo, Sigma/Foveon and Sony.
This package provides unsupervised selection and clustering of microarray data using mixture models. Following the methods described in McLachlan, Bean and Peel (2002) <doi:10.1093/bioinformatics/18.3.413>, a subset of genes is selected based on the likelihood ratio statistic for the test of one versus two components when fitting mixtures of t-distributions to the expression data for each gene. The dimensionality of this gene subset is further reduced through the use of mixtures of factor analyzers, allowing the tissue samples to be clustered by fitting mixtures of normal distributions.
DNA methylation of 5-methylcytosine (5mC) is the result of a multi-step, enzyme-dependent process. Predicting these sites in vitro is laborious, time-consuming and costly. The Gb5mC-Pred package is an in-silico pipeline for predicting DNA sequences containing 5mC sites. It uses a machine learning approach based on Stochastic Gradient Boosting to predict the sequences with 5mC sites. This package has been developed using the concept of Navarez and Roxas (2022) <doi:10.1109/TCBB.2021.3082184>.
By analyzing time series, it is possible to observe significant changes in the behavior of observations that frequently characterize events. Events present themselves as anomalies, change points, or motifs. In the literature, there are several methods for detecting events. However, searching for a suitable time series method is a complex task, especially considering that the nature of events is often unknown. This work presents Harbinger, a framework for integrating and analyzing event detection methods. Harbinger contains several state-of-the-art methods described in Salles et al. (2020) <doi:10.5753/sbbd.2020.13626>.
This package contains functions for applying the horseshoe prior to high-dimensional linear regression, yielding the posterior mean and credible intervals, amongst other things. The key parameter tau can be equipped with a prior or estimated via maximum marginal likelihood estimation (MMLE). The main function, horseshoe, is for linear regression. In addition, there are functions specifically for the sparse normal means problem, allowing for faster computation of, for example, the posterior mean and posterior variance. Finally, there is a function available to perform variable selection, using either a form of thresholding or credible intervals.
This is open-source software designed specifically for text mining in the Persian language. It allows users to examine word frequencies, download data for analysis, and generate word clouds. This tool is particularly useful for researchers and analysts working with Persian language data. This package mainly makes use of the PersianStemmer (Safshekan, R., et al. (2019). <https://CRAN.R-project.org/package=PersianStemmer>), udpipe (Wijffels, J., et al. (2023). <https://CRAN.R-project.org/package=udpipe>), and shiny (Chang, W., et al. (2023). <https://CRAN.R-project.org/package=shiny>) packages.
Plot both fixed and random effects of linear mixed models (multilevel models) in a single spaghetti plot. The package allows users to visualize the effect of a predictor on a criterion between different levels of a grouping variable. Additionally, confidence intervals can be displayed for fixed effects. The calculation of predicted values of random effects allows only models with one random intercept and/or one random slope to be plotted. Confidence intervals and predicted values of fixed effects are computed using the ggpredict function from the ggeffects package; see Lüdecke, D. (2018) <doi:10.21105/joss.00638>.