The jsonlite package provides a fast JSON parser and generator optimized for statistical data and the web. It offers flexible, robust, high performance tools for working with JSON in R and is particularly powerful for building pipelines and interacting with a web API. In addition to converting JSON data from/to R objects, jsonlite contains functions to stream, validate, and prettify JSON data. The unit tests included with the package verify that all edge cases are encoded and decoded consistently for use with dynamic data in systems and applications.
This is a package to provide infrastructure for managing package parameters. Parameters are easy to get in relevant functions within a package, and rrror is thrown if a parameter is missing. Developers are able to register parameters and set their default value in a config file that is part of the package in YAML format, and users are able to override parameters using their own YAML. Users get an exception when trying to override a parameter that was not registered, and can load multiple parameters to the current environment.
Retro is a modern, pragmatic set of Forths drawing influence from many sources. It clean, elegant, tiny, easy to grasp, and adaptable to many tasks.
It's not a traditional Forth. Drawing influence from colorForth, it uses prefixes to guide the compiler. From Joy and Factor, it uses quotations (anonymous, nestable functions) and combinators (functions that operate on functions) for much of the stack and flow control. It also adds vocabularies for working with strings, arrays, and other data types. Source files are written in Unu, allowing for simple, literate sources.
Automated assessment of accuracy and geographical status of georeferenced biological data. The methods rely on reference regions, namely checklists and range maps. Includes functions to obtain data from the Global Biodiversity Information Facility <https://www.gbif.org/> and from the Global Inventory of Floras and Traits <https://gift.uni-goettingen.de/home>. Alternatively, the user can input their own data. Furthermore, provides easy visualisation of the data and the results through the plotting functions. Especially suited for large datasets. The reference for the methodology is: Arlé et al. (under review).
In practical applications, the assumptions underlying generalized linear models frequently face violations, including incorrect specifications of the outcome variable's distribution or omitted predictors. These deviations can render the results of standard generalized linear models unreliable. As the sample size increases, what might initially appear as minor issues can escalate to critical concerns. To address these challenges, we adopt a permutation-based inference method tailored for generalized linear models. This approach offers robust estimations that effectively counteract the mentioned problems, and its effectiveness remains consistent regardless of the sample size.
This package provides functions to calculate the out-of-bag learning curve for random forests for any measure that is available in the mlr package. Supported random forest packages are randomForest
and ranger and trained models of these packages with the train function of mlr'. The main function is OOBCurve()
that calculates the out-of-bag curve depending on the number of trees. With the OOBCurvePars()
function out-of-bag curves can also be calculated for mtry', sample.fraction and min.node.size for the ranger package.
This package implements a sequential imputation framework using Bayesian Mixed-Effects Trees ('SBMTrees') for handling missing data in longitudinal studies. The package supports a variety of models, including non-linear relationships and non-normal random effects and residuals, leveraging Dirichlet Process priors for increased flexibility. Key features include handling Missing at Random (MAR) longitudinal data, imputation of both covariates and outcomes, and generating posterior predictive samples for further analysis. The methodology is designed for applications in epidemiology, biostatistics, and other fields requiring robust handling of missing data in longitudinal settings.
In the past decade, genome-scale metabolic reconstructions have widely been used to comprehend the systems biology of metabolic pathways within an organism. Different GSMs are constructed using various techniques that require distinct steps, but the input data, information conversion and software tools are neither concisely defined nor mathematically or programmatically formulated in a context-specific manner.The tool that quantitatively and qualitatively specifies each reconstruction steps and can generate a template list of reconstruction steps dynamically selected from a reconstruction step reservoir, constructed based on all available published papers.
TensorFlow
SIG Addons <https://www.tensorflow.org/addons> is a repository of community contributions that conform to well-established API patterns, but implement new functionality not available in core TensorFlow
'. TensorFlow
natively supports a large number of operators, layers, metrics, losses, optimizers, and more. However, in a fast moving field like Machine Learning, there are many interesting new developments that cannot be integrated into core TensorFlow
(because their broad applicability is not yet clear, or it is mostly used by a smaller subset of the community).
The easylift package provides a convenient tool for genomic liftover operations between different genome assemblies. It seamlessly works with Bioconductor's GRanges objects and chain files from the UCSC Genome Browser, allowing for straightforward handling of genomic ranges across various genome versions. One noteworthy feature of easylift is its integration with the BiocFileCache
package. This integration automates the management and caching of chain files necessary for liftover operations. Users no longer need to manually specify chain file paths in their function calls, reducing the complexity of the liftover process.
This package provides a simple framework to facilitate the comparison of pipelines involving various steps and parameters. The `pipelineDefinition`
class represents pipelines as, minimally, a set of functions consecutively executed on the output of the previous one, and optionally accompanied by step-wise evaluation and aggregation functions. Given such an object, a set of alternative parameters/methods, and benchmark datasets, the `runPipeline`
function then proceeds through all combinations arguments, avoiding recomputing the same step twice and compiling evaluations on the fly to avoid storing potentially large intermediate data.
r-pathview
is a tool set for pathway based data integration and visualization. It maps and renders a wide variety of biological data on relevant pathway graphs. All users need is to supply their data and specify the target pathway. This package automatically downloads the pathway graph data, parses the data file, maps user data to the pathway, and render pathway graph with the mapped data. In addition, r-pathview
also seamlessly integrates with pathway and gene set (enrichment) analysis tools for large-scale and fully automated analysis.
Estimation and inference methods for bounding average treatment effects (on the treated) that are valid under an unconfoundedness assumption. The bounds are designed to be robust in challenging situations, for example, when the conditioning variables take on a large number of different values in the observed sample, or when the overlap condition is violated. This robustness is achieved by only using limited "pooling" of information across observations. For more details, see the paper by Lee and Weidner (2021), "Bounding Treatment Effects by Pooling Limited Information across Observations," <arXiv:2111.05243>
.
Wraps the CRFsuite library <https://github.com/chokkan/crfsuite> allowing users to fit a Conditional Random Field model and to apply it on existing data. The focus of the implementation is in the area of Natural Language Processing where this R package allows you to easily build and apply models for named entity recognition, text chunking, part of speech tagging, intent recognition or classification of any category you have in mind. Next to training, a small web application is included in the package to allow you to easily construct training data.
Analysis of agreement for nominal data between two raters using the Delta model. This model is proposed as an alternative to the widespread measure Cohen kappa coefficient, which performs poorly when the marginal distributions are very asymmetric (Martin-Andres and Femia-Marzo (2004), <doi:10.1348/000711004849268>; Martin-Andres and Femia-Marzo (2008) <doi:10.1080/03610920701669884>). The package also contains a function to perform a massive analysis of multiple raters against a gold standard. A shiny app is also provided to obtain the measures of nominal agreement between two raters.
The American Association Research (AACR) Project Genomics Evidence Neoplasia Information Exchange (GENIE) BioPharma
Collaborative represents a multi-year, multi-institution effort to build a pan-cancer repository of linked clinico-genomic data. The genomic and clinical data are provided in multiple releases (separate releases for each cancer cohort with updates following data corrections), which are stored on the data sharing platform Synapse <https://www.synapse.org/>. The genieBPC
package provides a seamless way to obtain the data corresponding to each release from Synapse and to prepare datasets for analysis.
We provide an efficient implementation for two-step multi-source transfer learning algorithms in high-dimensional generalized linear models (GLMs). The elastic-net penalized GLM with three popular families, including linear, logistic and Poisson regression models, can be fitted. To avoid negative transfer, a transferable source detection algorithm is proposed. We also provides visualization for the transferable source detection results. The details of methods can be found in "Tian, Y., & Feng, Y. (2023). Transfer learning under high-dimensional generalized linear models. Journal of the American Statistical Association, 118(544), 2684-2697.".
Data sets related to the Islas Malvinas /// Sets de datos relacionados a las Islas Malvinas - La Nación Argentina ratifica su legà tima e imprescriptible soberanà a sobre las islas Malvinas, Georgias del Sur y Sándwich del Sur y los espacios marà timos e insulares correspondientes, por ser parte integrante del territorio nacional. La recuperación de dichos territorios y el ejercicio pleno de la soberanà a, respetando el modo de vida de sus habitantes y conforme a los principios del Derecho Internacional, constituyen un objetivo permanente e irrenunciable del pueblo argentino.
Modelling Multivariate Binary Data with Blocks of Specific One-Factor Distribution. Variables are grouped into independent blocks. Each variable is described by two continuous parameters (its marginal probability and its dependency strength with the other block variables), and one binary parameter (positive or negative dependency). Model selection consists in the estimation of the repartition of the variables into blocks. It is carried out by the maximization of the BIC criterion by a deterministic (faster) algorithm or by a stochastic (more time consuming but optimal) algorithm. Tool functions facilitate the model interpretation.
This package provides a collection of data structures and methods for handling volumetric brain imaging data, with a focus on functional magnetic resonance imaging (fMRI
). Provides efficient representations for three-dimensional and four-dimensional neuroimaging data through sparse and dense array implementations, memory-mapped file access for large datasets, and spatial transformation capabilities. Implements methods for image resampling, spatial filtering, region of interest analysis, and connected component labeling. General introduction to fMRI
analysis can be found in Poldrack et al. (2024, "Handbook of functional MRI data analysis", <ISBN:9781108795760>).
We present Platypus', an open-source software platform providing a user-friendly interface to investigate B-cell receptor and T-cell receptor repertoires from scSeq
experiments. Platypus provides a framework to automate and ease the analysis of single-cell immune repertoires while also incorporating transcriptional information involving unsupervised clustering, gene expression and gene ontology. This R version of Platypus is part of the ePlatypus
ecosystem for computational analysis of immunogenomics data: Yermanos et al. (2021) <doi:10.1093/nargab/lqab023>, Cotet et al. (2023) <doi:10.1093/bioinformatics/btad553>.
The use of overparameterization is proposed with combinatorial analysis to test a broader spectrum of possible ARIMA models. In the selection of ARIMA models, the most traditional methods such as correlograms or others, do not usually cover many alternatives to define the number of coefficients to be estimated in the model, which represents an estimation method that is not the best. The popstudy package contains several tools for statistical analysis in demography and time series based in Shryock research (Shryock et. al. (1980) <https://books.google.co.cr/books?id=8Oo6AQAAMAAJ>).
Processor for selected ion flow tube mass spectrometer (SIFT-MS) output file from breath analysis. It allows the filtering of the SIFT output file (i.e., variation over time of the target analyte concentration) and the following analysis for the determination of: maximum, average, and standard deviation value of target concentration measured at each exhalation, and the respiratory rate over the measurement. Additionally, it is possible to align the SIFT-MS data with other on-line techniques such as cardio pulmonary exercise test (CPET) for a comprehensive characterization of breath samples.
Assessment of the distributions of baseline continuous and categorical variables in randomised trials. This method is based on the Carlisle-Stouffer method with Monte Carlo simulations. It calculates p-values for each trial baseline variable, as well as combined p-values for each trial - these p-values measure how compatible are distributions of trials baseline variables with random sampling. This package also allows for graphically plotting the cumulative frequencies of computed p-values. Please note that code was partly adapted from Carlisle JB, Loadsman JA. (2017) <doi:10.1111/anae.13650>.