Analyze data from next-generation sequencing experiments on genomic samples. CLONETv2 offers a set of functions to compute allele-specific copy number and clonality from segmented data and SNP position pileup. The package also calculates the clonality of single nucleotide variants given read counts at mutated positions. The package has been developed at the Laboratory of Computational and Functional Oncology, Department of CIBIO, University of Trento (Italy), under the supervision of Prof. Francesca Demichelis. References: Prandi et al. (2014) <doi:10.1186/s13059-014-0439-6>; Carreira et al. (2014) <doi:10.1126/scitranslmed.3009448>; Romanel et al. (2015) <doi:10.1126/scitranslmed.aac9511>.
This package provides a flexible container to transport and manipulate complex sets of data. These data may consist of multiple data files and associated metadata and ancillary files. Individual data objects have associated system-level metadata, and data files are linked together using the OAI-ORE standard resource map, which describes the relationships between the files. The OAI-ORE standard is described at <https://www.openarchives.org/ore/>. Data packages can be serialized and transported as structured files that have been created following the BagIt specification. The BagIt specification is described at <https://tools.ietf.org/html/draft-kunze-bagit-08>.
This package implements choice models based on economic theory, including estimation using Markov chain Monte Carlo (MCMC), prediction, and more. Its usability is inspired by ideas from the tidyverse. Models include versions of the Hierarchical Multinomial Logit and Multiple Discrete-Continuous (Volumetric) models with and without screening. The foundations of these models are described in Allenby, Hardt and Rossi (2019) <doi:10.1016/bs.hem.2019.04.002>. Models with conjunctive screening are described in Kim, Hardt, Kim and Allenby (2022) <doi:10.1016/j.ijresmar.2022.04.001>. Models with set-size variation are described in Hardt and Kurz (2020) <doi:10.2139/ssrn.3418383>.
This package provides a toolkit for the analysis and management of data for genes in the so-called "Human Leukocyte Antigen" (HLA) region. Functions extract reference data from the Anthony Nolan HLA Informatics Group/ImmunoGeneTics HLA GitHub repository (ANHIG/IMGTHLA) <https://github.com/ANHIG/IMGTHLA>, validate Genotype List (GL) Strings, convert between UNIFORMAT and GL String Code (GLSC) formats, translate HLA alleles and GLSCs across ImmunoPolymorphism Database (IPD) IMGT/HLA Database release versions, identify differences between pairs of alleles at a locus, generate customized, multi-position sequence alignments, trim and convert allele names across nomenclature epochs, and extend existing data-analysis methods.
This package provides functionality for working with raster-like quadtrees (also called "region quadtrees"), which allow for variable-sized cells. The package allows for flexibility in the quadtree creation process. Several functions defining how to split and aggregate cells are provided, and custom functions can be written for both of these processes. In addition, quadtrees can be created using other quadtrees as "templates", so that the new quadtree's structure is identical to the template quadtree. The package also includes functionality for modifying quadtrees, querying values, saving quadtrees to a file, and calculating least-cost paths using the quadtree as a resistance surface.
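The splitting idea behind a region quadtree can be sketched in a few lines of Python. This is only an illustration, not the package's implementation: the `Node` class, the `build` and `leaves` helpers, the range-based split rule, and the mean-based aggregation are all assumptions made for this example, and the grid is assumed to be square with a power-of-two side length.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One square cell of the quadtree (hypothetical structure for this sketch)."""
    row: int
    col: int
    size: int
    value: Optional[float] = None          # set only on leaves
    children: List["Node"] = field(default_factory=list)


def build(grid, row=0, col=0, size=None, max_range=1.0):
    """Split a square block into four quadrants until its value range is <= max_range."""
    if size is None:
        size = len(grid)
    cells = [grid[r][c] for r in range(row, row + size) for c in range(col, col + size)]
    node = Node(row, col, size)
    # Split rule (assumed): stop when the block is homogeneous enough or is a single cell.
    if size == 1 or max(cells) - min(cells) <= max_range:
        node.value = sum(cells) / len(cells)   # aggregation rule (assumed): mean
        return node
    half = size // 2
    for dr in (0, half):
        for dc in (0, half):
            node.children.append(build(grid, row + dr, col + dc, half, max_range))
    return node


def leaves(node):
    """Collect all leaf cells below a node."""
    if not node.children:
        return [node]
    out = []
    for child in node.children:
        out.extend(leaves(child))
    return out


grid = [
    [1, 1, 5, 9],
    [1, 1, 7, 3],
    [2, 2, 4, 4],
    [2, 2, 4, 4],
]
tree = build(grid, max_range=1.0)
leaf_sizes = sorted(leaf.size for leaf in leaves(tree))
```

Only the heterogeneous top-right quadrant is split down to single cells here, which is the variable-sized-cell behaviour the description refers to.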
RcppArmadillo implementation for the Matlab code of the Variational Mode Decomposition and Two-Dimensional Variational Mode Decomposition. For more information, see (i) Variational Mode Decomposition by K. Dragomiretskiy and D. Zosso in IEEE Transactions on Signal Processing, vol. 62, no. 3, pp. 531-544, Feb. 1, 2014, <doi:10.1109/TSP.2013.2288675>; (ii) Two-Dimensional Variational Mode Decomposition by Dragomiretskiy, K., Zosso, D. (2015), In: Tai, XC., Bae, E., Chan, T.F., Lysaker, M. (eds) Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2015. Lecture Notes in Computer Science, vol 8932. Springer, <doi:10.1007/978-3-319-14612-6_15>.
This package is Cytometry dATa anALYSis Tools (CATALYST). Mass cytometry, such as cytometry by time of flight (CyTOF), uses heavy metal isotopes rather than fluorescent tags as reporters to label antibodies, thereby substantially decreasing spectral overlap and allowing for examination of over 50 parameters at the single-cell level. While spectral overlap is significantly less pronounced in CyTOF than in flow cytometry, spillover due to detection sensitivity, isotopic impurities, and oxide formation can impede data interpretability. CATALYST was designed to provide a pipeline for preprocessing of cytometry data, including normalization using bead standards, single-cell deconvolution, and bead-based compensation.
Allows access to selected services that are part of the Google Adwords API <https://developers.google.com/adwords/api/docs/guides/start>. Google Adwords is an online advertising service by Google that delivers ads to users. This package offers an authentication process using OAUTH2. Currently, there are two methods of accessing the API, depending on the type of request. One method uses SOAP requests, which require building an XML structure that is then sent to the API. These are used for the ManagedCustomerService and the TargetingIdeaService. The second method is by building AWQL queries for the reporting side of the Google Adwords API.
Simplify bivariate and regression analyses by automating result generation, including summary tables, statistical tests, and customizable graphs. It supports tests for continuous and dichotomous data, as well as stepwise regression for linear, logistic, and Firth penalized logistic models. While not a substitute for tailored analysis, BiVariAn accelerates workflows and is expanding features like multilingual interpretations of results. The methods for selecting significant statistical tests, as well as the predictor selection in prediction functions, can be referenced in the works of Marc Kery (2003) <doi:10.1890/0012-9623(2003)84[92:NORDIG]2.0.CO;2> and Rainer Puhr (2017) <doi:10.1002/sim.7273>.
Dates are commonly represented in many different formats: the order of day, month, and year can differ, different separators ("-", "/", or whitespace) can be used, months can be given as numbers, names, or abbreviations, and years can be given as two digits or four. datefixR takes dates in all these different formats and converts them to R's built-in date class. If datefixR cannot standardize a date, for example because it is too malformed, then the user is told which date cannot be standardized and the corresponding ID for the row. datefixR also allows the imputation of missing days and months with user-controlled behavior.
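The general idea of trying several candidate formats and reporting the rows that fail can be sketched in Python as follows. This is not how datefixR is implemented: the `CANDIDATE_FORMATS` list and the `standardize` and `standardize_column` helpers are hypothetical names for this illustration, imputation of missing days and months is left out, and month-name parsing assumes an English locale.

```python
from datetime import date, datetime

# Candidate layouts tried in order (an assumed, deliberately short list).
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%d %b %Y", "%d-%m-%y"]


def standardize(raw):
    """Return a date for the first format that parses, or None if none does."""
    cleaned = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None


def standardize_column(rows):
    """Parse (row_id, raw_date) pairs; collect failures with their row IDs."""
    fixed, failures = {}, []
    for row_id, raw in rows:
        parsed = standardize(raw)
        if parsed is None:
            failures.append((row_id, raw))   # report which row could not be fixed
        else:
            fixed[row_id] = parsed
    return fixed, failures


rows = [(1, "2021-03-04"), (2, "04/03/2021"), (3, "4 March 2021"), (4, "not a date")]
fixed, failures = standardize_column(rows)
```

The first three rows resolve to the same calendar date, while the fourth is returned in `failures` together with its row ID so the user can inspect it.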
Implementation of the double/debiased machine learning framework of Chernozhukov et al. (2018) <doi:10.1111/ectj.12097> for partially linear regression models, partially linear instrumental variable regression models, interactive regression models and interactive instrumental variable regression models. DoubleML allows estimation of the nuisance parts in these models by machine learning methods and computation of the Neyman orthogonal score functions. DoubleML is built on top of mlr3 and the mlr3 ecosystem. The object-oriented implementation of DoubleML based on the R6 package is very flexible. More information is available in the publication in the Journal of Statistical Software: <doi:10.18637/jss.v108.i03>.
Lactation curve modeling plays a central role in dairy production, supporting management decisions and the selection of animals with superior productivity and resilience. The package EMOTIONS fits 47 models for lactation curves and creates ensemble models using model averaging based on the Akaike information criterion, the Bayesian information criterion, root mean square percentage error, mean squared error, variance of the predictions, cosine similarity for each model's predictions, and Bayesian Model Average. The daily production values predicted through the ensemble models can be used to estimate resilience indicators in the package. Additionally, the package allows the graphical visualization of the model ranks and the predicted lactation curves.
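As an illustration of information-criterion model averaging in general, the standard Akaike-weight formula, w_i = exp(-0.5 * (AIC_i - AIC_min)) / sum_j exp(-0.5 * (AIC_j - AIC_min)), can be sketched in Python. Whether EMOTIONS uses exactly this weighting is an assumption; the function names and the AIC and prediction values below are made up for the example.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values into normalized Akaike weights."""
    best = min(aic_values)
    raw = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(raw)
    return [r / total for r in raw]


def ensemble_prediction(predictions, weights):
    """Weighted average of per-model daily predictions."""
    n_days = len(predictions[0])
    return [
        sum(w * model[d] for w, model in zip(weights, predictions))
        for d in range(n_days)
    ]


# Illustrative (invented) AIC values and daily milk-yield predictions for three models.
aic = [100.0, 102.0, 110.0]
preds = [
    [20.0, 25.0, 22.0],
    [21.0, 24.0, 23.0],
    [18.0, 27.0, 20.0],
]
weights = akaike_weights(aic)
ensemble = ensemble_prediction(preds, weights)
```

Models with lower AIC receive larger weights, and each ensemble value lies between the smallest and largest individual prediction for that day.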
We implement various classical tests for the composite hypothesis of testing the fit to the family of gamma distributions, such as the Kolmogorov-Smirnov test, the Cramer-von Mises test, the Anderson-Darling test and the Watson test. For each test a parametric bootstrap procedure is implemented, as considered in Henze, Meintanis & Ebner (2012) <doi:10.1080/03610926.2010.542851>. The recent procedures presented in Henze, Meintanis & Ebner (2012) <doi:10.1080/03610926.2010.542851> and Betsch & Ebner (2019) <doi:10.1007/s00184-019-00708-7> are also implemented. Estimation of the parameters of the gamma law is implemented using the method of Bhattacharya (2001) <doi:10.1080/00949650108812100>.
Converts table-like objects to stand-alone PDF or PNG. Can be used to embed tables and arbitrary content in PDF or Word documents. Provides a low-level R interface for creating LaTeX code, e.g. command(), and a high-level interface for creating PDF documents, e.g. as.pdf.data.frame(). Extensive customization is available via mid-level functions, e.g. as.tabular(). See also package?latexpdf. Support for PNG is experimental; see as.png.data.frame. Adapted from metrumrg <https://r-forge.r-project.org/R/?group_id=1215>. Requires a compatible installation of pdflatex, e.g. <https://miktex.org/>.
This function obtains a Random Number Generator (RNG) setting, or a collection of RNG settings, that replicates the required parameter(s) of a distribution for a time series of data. Consider the case of reproducing a time series data set of size 20 that follows an autoregressive (AR) model with phi = 0.8 and standard deviation equal to 1. When one checks the parameters estimated from the output of the arima.sim() function, it is possible that a single trial, or even several, will not recover the precise parameters. This function enables one to look for the ideal RNG setting for a simulation that will accurately reproduce the desired parameters.
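The seed-search idea can be sketched in Python for the AR(1) case mentioned above (n = 20, phi = 0.8, standard deviation 1). This is a simplified illustration under stated assumptions, not the package's code: the helper names are hypothetical, Python's `random.Random` stands in for R's RNG, and phi is estimated with a plain lag-1 least-squares slope rather than a full ARIMA fit.

```python
import random


def simulate_ar1(phi, sd, n, seed, burn_in=100):
    """Simulate an AR(1) series x_t = phi * x_{t-1} + e_t with a fixed seed."""
    rng = random.Random(seed)
    x = 0.0
    series = []
    for t in range(burn_in + n):
        x = phi * x + rng.gauss(0.0, sd)
        if t >= burn_in:                 # discard the burn-in period
            series.append(x)
    return series


def estimate_phi(series):
    """Lag-1 least-squares estimate of phi on the mean-centered series (assumed estimator)."""
    mean = sum(series) / len(series)
    centered = [v - mean for v in series]
    num = sum(a * b for a, b in zip(centered[:-1], centered[1:]))
    den = sum(a * a for a in centered[:-1])
    return num / den


def search_seed(phi, sd, n, seeds):
    """Return the seed whose simulated series has an estimate closest to the target phi."""
    best_seed, best_error = None, float("inf")
    for seed in seeds:
        error = abs(estimate_phi(simulate_ar1(phi, sd, n, seed)) - phi)
        if error < best_error:
            best_seed, best_error = seed, error
    return best_seed, best_error


best_seed, best_error = search_seed(phi=0.8, sd=1.0, n=20, seeds=range(1, 501))
```

Because every simulation is seeded, rerunning `simulate_ar1` with `best_seed` reproduces the same series, which is what makes the selected setting reusable in a later simulation study.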
This package provides tools for building reinforcement learning (RL) models specifically tailored for Two-Alternative Forced Choice (TAFC) tasks, commonly employed in psychological research. These models build upon the foundational principles of model-free reinforcement learning detailed in Sutton and Barto (2018) <ISBN:9780262039246>. The package allows for the intuitive definition of RL models using simple if-else statements. Our approach to constructing and evaluating these computational models is informed by the guidelines proposed in Wilson & Collins (2019) <doi:10.7554/eLife.49547>. Example datasets included with the package are sourced from the work of Mason et al. (2024) <doi:10.3758/s13423-023-02415-x>.
Intended to analyse recordings from multiple microphones (e.g., backpack microphones in a captive setting). It allows users to align recordings even if there is non-linear drift of several minutes between them. A call detection and assignment pipeline can be used to find vocalisations and assign them to the vocalising individuals (even if the vocalisation is picked up on multiple microphones). The tracing and measurement functions allow for detailed analysis of the vocalisations and filtering of noise. Finally, the package includes a function to run spectrographic cross correlation, which can be used to compare vocalisations. It also includes multiple other functions related to analysis of vocal behaviour.
This package provides functions and a workflow for easily and powerfully calculating the specificity, sensitivity and ROC curves of biomarker combinations. It allows ranking and selecting multi-marker signatures as well as finding the best-performing sub-signatures, now also from single-cell RNA-seq datasets. The method used was first published as a Shiny app and described in Mazzara et al. (2017) <doi:10.1038/srep45477>, further described in Bombaci & Rossi (2019) <doi:10.1007/978-1-4939-9164-8_16>, and widely expanded as a package as presented in the bioRxiv preprint Ferrari et al. <doi:10.1101/2022.01.17.476603>.
Easily automate the following tasks to describe data frames: Summarise the distributions, and labelled missings of variables graphically and using descriptive statistics. For surveys, compute and summarise reliabilities (internal consistencies, retest, multilevel) for psychological scales. Combine this information with metadata (such as item labels and labelled values) that is derived from R attributes. To do so, the package relies on rmarkdown partials, so you can generate HTML, PDF, and Word documents. Codebooks are also available as tables (CSV, Excel, etc.) and in JSON-LD, so that search engines can find your data and index the metadata. The metadata are also available at your fingertips via RStudio Addins.
This package provides a comprehensive visualization toolkit built with coders of all skill levels and color-vision impaired audiences in mind. It allows creation of finely-tuned, publication-quality figures from single function calls. Visualizations include scatter plots, compositional bar plots, violin, box, and ridge plots, and more. Customization ranges from size and title adjustments to discrete-group circling and labeling, hidden data overlay upon cursor hovering via ggplotly() conversion, and many more, all with simple, discrete inputs. Color blindness friendliness is powered by legend adjustments (enlarged keys), and by allowing the use of shapes or letter-overlay in addition to the carefully selected dittoColors().
This package provides wrappers for various machine learning models. In applied machine learning, there is a strong belief that we need to strike a balance between interpretability and accuracy. However, in the field of interpretable machine learning, there are more and more new ideas for explaining black-box models that are implemented in R. DALEXtra creates DALEX (Biecek (2018) <arXiv:1806.08915>) explainers for many types of models, including those created using the Python scikit-learn and keras libraries and the Java h2o library. Important parts of the package are the Champion-Challenger analysis and an innovative approach to model performance across subsets of test data, presented in a Funnel Plot.
This package provides functions to compute coefficients measuring the dependence of two or more variables. The functions can be deployed to gain information about functional dependencies of the variables, with emphasis on monotone functions. The statistics describe how well one response variable can be approximated by a monotone function of other variables. In regression analysis, variable selection is an important issue. In this framework the functions could be useful tools in modeling the regression function. Detailed explanations on the subject can be found in the papers Liebscher (2014) <doi:10.2478/demo-2014-0004>; Liebscher (2017) <doi:10.1515/demo-2017-0012>; Liebscher (2019, submitted).
Implementation of Das Gupta's standardisation and decomposition of population rates, as set out in "Standardization and decomposition of rates: A user's manual", Das Gupta (1993) <https://www2.census.gov/library/publications/1993/demographics/p23-186.pdf>. The goal of these methods is to calculate adjusted rates based on compositional factors and quantify the contribution of each factor to the difference in crude rates between populations. The package offers functionality to handle various scenarios for any number of factors and populations, where said factors can comprise vectors across sub-populations (including cross-classified population breakdowns), and with the option to specify user-defined rate functions.
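For the simplest case, a rate that is the product of two factors, rate = a * b, the decomposition can be worked through in Python. The sketch below covers only this two-factor, two-population case, takes differences as population 2 minus population 1, and uses a hypothetical function name and invented factor values; the package itself handles any number of factors and populations.

```python
def two_factor_decomposition(a1, b1, a2, b2):
    """Decompose rate2 - rate1, where rate = a * b, into per-factor effects.

    Each factor's effect is its change between populations, evaluated with
    the other factor held at the average of its two population values.
    """
    rate1 = a1 * b1
    rate2 = a2 * b2
    effect_a = (b1 + b2) / 2 * (a2 - a1)   # contribution of factor a
    effect_b = (a1 + a2) / 2 * (b2 - b1)   # contribution of factor b
    return {
        "difference": rate2 - rate1,
        "effect_a": effect_a,
        "effect_b": effect_b,
    }


# Invented factor values for two populations.
result = two_factor_decomposition(a1=0.5, b1=0.25, a2=0.75, b2=0.5)
```

The two effects add up exactly to the difference in crude rates, since a2*b2 - a1*b1 = (b1 + b2)/2 * (a2 - a1) + (a1 + a2)/2 * (b2 - b1).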
This package provides a toolbox for estimating vector fields from intensive longitudinal data and constructing potential landscapes thereafter. The vector fields can be estimated with two nonparametric methods: the Multivariate Vector Field Kernel Estimator (MVKE) by Bandi & Moloche (2018) <doi:10.1017/S0266466617000305> and the Sparse Vector Field Consensus (SparseVFC) algorithm by Ma et al. (2013) <doi:10.1016/j.patcog.2013.05.017>. The potential landscapes can be constructed with a simulation-based approach with the simlandr package (Cui et al., 2021) <doi:10.31234/osf.io/pzva3>, or the Bhattacharya et al. (2011) method for path integration <doi:10.1186/1752-0509-5-85>.