The easylift package provides a convenient tool for genomic liftover operations between different genome assemblies. It seamlessly works with Bioconductor's GRanges objects and chain files from the UCSC Genome Browser, allowing for straightforward handling of genomic ranges across various genome versions. One noteworthy feature of easylift is its integration with the BiocFileCache package. This integration automates the management and caching of chain files necessary for liftover operations. Users no longer need to manually specify chain file paths in their function calls, reducing the complexity of the liftover process.
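A minimal sketch, assuming the easylift(x, to, chain) calling pattern and the hg19-to-hg38 chain file shown below; with BiocFileCache integration, the chain argument can be omitted once the file has been cached:

```r
library(GenomicRanges)
library(easylift)

gr <- GRanges(seqnames = "chr2",
              ranges = IRanges(start = 50000, end = 50500))
genome(gr) <- "hg19"                              # easylift reads the source assembly from here
easylift(gr, to = "hg38",
         chain = "hg19ToHg38.over.chain.gz")      # local chain file; a cached chain can be used instead
```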
This package provides a simple framework to facilitate the comparison of pipelines involving various steps and parameters. The `pipelineDefinition` class represents pipelines as, minimally, a set of functions executed consecutively, each on the output of the previous one, optionally accompanied by step-wise evaluation and aggregation functions. Given such an object, a set of alternative parameters/methods, and benchmark datasets, the `runPipeline` function then proceeds through all combinations of arguments, avoiding recomputing the same step twice and compiling evaluations on the fly to avoid storing potentially large intermediate data.
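The combinatorial part can be pictured with plain R. The snippet below only illustrates how alternative arguments multiply into pipeline runs and is not the package's own interface; the step and method names are made up for the example:

```r
# Hypothetical alternatives for two pipeline steps (names are illustrative only):
alternatives <- list(
  filtering  = c("none", "mad", "quantile"),
  clustering = c("kmeans", "louvain")
)
combos <- expand.grid(alternatives, stringsAsFactors = FALSE)
nrow(combos)   # 6 runs; runs sharing the same filtering can reuse that step's output
combos
```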
The use of structured elicitation to inform decision making has grown dramatically in recent decades; however, judgements from multiple experts must be aggregated into a single estimate. Empirical evidence suggests that mathematical aggregation provides more reliable estimates than enforcing behavioural consensus on group estimates. aggreCAT provides state-of-the-art mathematical aggregation methods for elicitation data, including those defined in Hanea et al. (2021) <doi:10.1371/journal.pone.0256919>. The package also provides functions to visualise and evaluate the performance of your aggregated estimates on validation data.
In practical applications, the assumptions underlying generalized linear models frequently face violations, such as a misspecified outcome distribution or omitted predictors. These deviations can render the results of standard generalized linear models unreliable, and what might initially appear as a minor issue can become a critical concern as the sample size increases. To address these challenges, we adopt a permutation-based inference method tailored for generalized linear models. This approach yields robust estimates that counteract these problems, and it remains effective regardless of the sample size.
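To make the idea concrete, here is a simplified, package-agnostic sketch of permutation inference for one GLM coefficient: the predictor of interest is permuted under the null and the observed test statistic is compared against its permutation distribution. The package's actual procedure may differ, e.g. in how nuisance covariates are handled:

```r
set.seed(1)
n <- 200
x <- rnorm(n); z <- rnorm(n)
y <- rpois(n, exp(0.3 * x + 0.5 * z))
# Observed z-statistic for the coefficient of x
obs <- coef(summary(glm(y ~ x + z, family = poisson)))["x", "z value"]
# Permutation distribution: shuffle x (a simplification that assumes x is independent of z)
perm <- replicate(999,
  coef(summary(glm(y ~ sample(x) + z, family = poisson)))["sample(x)", "z value"])
(1 + sum(abs(perm) >= abs(obs))) / (1 + length(perm))   # two-sided permutation p-value
```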
This package provides functions to calculate the out-of-bag learning curve for random forests, for any measure that is available in the mlr package. Supported are random forest models from the randomForest and ranger packages, trained with the train() function of mlr. The main function, OOBCurve(), calculates the out-of-bag curve as a function of the number of trees. With the OOBCurvePars() function, out-of-bag curves can also be calculated for mtry, sample.fraction and min.node.size for the ranger package.
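A minimal sketch of the typical workflow, assuming the OOBCurve(model, measures, task, data) signature described in the package documentation; keep.inbag = TRUE is needed so that ranger stores the out-of-bag information:

```r
library(mlr)
library(ranger)
library(OOBCurve)

lrn <- makeLearner("regr.ranger", keep.inbag = TRUE, num.trees = 500)
mod <- train(lrn, bh.task)                               # Boston housing task shipped with mlr
res <- OOBCurve(mod, measures = list(mse), task = bh.task,
                data = getTaskData(bh.task))
plot(res[, 1], type = "l", xlab = "Number of trees", ylab = "OOB MSE")
```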
In the past decade, genome-scale metabolic reconstructions (GSMs) have been widely used to understand the systems biology of metabolic pathways within an organism. Different GSMs are constructed using various techniques that require distinct steps, but the input data, information conversion and software tools are neither concisely defined nor mathematically or programmatically formulated in a context-specific manner. This tool quantitatively and qualitatively specifies each reconstruction step and can generate a template list of reconstruction steps, dynamically selected from a reconstruction step reservoir built from all available published papers.
Stock-and-flow models are a computational method from the field of system dynamics. They represent how systems change over time and are mathematically equivalent to ordinary differential equations. sdbuildR (system dynamics builder) provides an intuitive interface for constructing stock-and-flow models without requiring extensive domain knowledge. Models can quickly be simulated and revised, supporting iterative development. sdbuildR simulates models in R and Julia, where Julia offers unit support and large-scale ensemble simulations. Additionally, sdbuildR can import models created in Insight Maker (<https://insightmaker.com/>).
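The equivalence with ordinary differential equations can be illustrated by writing a one-stock model directly as an ODE with the general-purpose deSolve package; this is not sdbuildR's own interface, and the stock, flow and parameter names are made up for the example:

```r
library(deSolve)  # generic ODE solver, used here only to illustrate the equivalence
# One stock (population) with an inflow (births) and an outflow (deaths):
# d(population)/dt = births - deaths
model <- function(t, state, pars) {
  with(as.list(c(state, pars)), {
    births <- birth_rate * population
    deaths <- death_rate * population
    list(c(population = births - deaths))
  })
}
out <- ode(y = c(population = 100), times = seq(0, 50, by = 1),
           func = model, parms = c(birth_rate = 0.05, death_rate = 0.03))
plot(out)
```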
TensorFlow SIG Addons <https://www.tensorflow.org/addons> is a repository of community contributions that conform to well-established API patterns but implement new functionality not available in core TensorFlow. TensorFlow natively supports a large number of operators, layers, metrics, losses, optimizers, and more. However, in a fast-moving field like machine learning, there are many interesting new developments that cannot be integrated into core TensorFlow (because their broad applicability is not yet clear, or because they are mostly used by a smaller subset of the community).
The United Nations Sustainable Development Goals (SDGs) have become an important guideline for organisations to monitor and plan their contributions to social, economic, and environmental transformations. The text2sdg package is an open-source tool that identifies SDGs in text using scientifically developed query systems, opening up the opportunity to monitor any type of text-based data, such as scientific output or corporate publications. For more information see Meier, Mata & Wulff (2025) <doi:10.32614/RJ-2024-005> and Wulff, Meier & Mata (2024) <doi:10.1007/s11625-024-01516-3>.
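A minimal sketch of labelling a few documents, assuming the detect_sdg() entry point and an `sdg` output column as described in the package documentation:

```r
library(text2sdg)
docs <- c("Our project expands access to clean drinking water in rural communities.",
          "The company reports annual progress on renewable energy and climate action.")
hits <- detect_sdg(docs)   # match documents against the bundled query systems
table(hits$sdg)            # how often each goal was detected (assumed column name)
```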
The purpose of this package is to identify traits in a dataset that can separate groups. This is done on two levels. First, clustering is performed using an implementation of sparse K-means. Second, the generated clusters are used to predict outcomes of groups of individuals based on their distribution of observations over the clusters. Because the clusters carrying separating information are defined by a small number of variables, the method reduces the complexity of the data and emphasizes only the part that actually matters.
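The first level can be illustrated with a generic sparse K-means implementation from the sparcl package; this is not this package's own function, the data are simulated, and the second, predictive step is only sketched in comments:

```r
library(sparcl)   # generic sparse K-means (Witten & Tibshirani), used here for illustration
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
x[1:50, 1:3] <- x[1:50, 1:3] + 2          # only 3 of 20 variables separate two latent groups
fit <- KMeansSparseCluster(x, K = 2, wbounds = 4)[[1]]
round(fit$ws, 2)                          # variable weights: uninformative variables shrink towards 0
# Second level (schematic): tabulate each group's distribution over the clusters
# and use those frequencies as predictors of a group-level outcome.
table(cluster = fit$Cs, group = rep(1:2, each = 50))
```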
Circular layout is an efficient way to visualise huge amounts of information. This package provides an implementation of circular layout generation in R as well as an enhancement of available software. Its flexibility is based on the usage of low-level graphics functions, so that self-defined high-level graphics can easily be implemented by users for specific purposes. Together with the seamless connection to the powerful computational and visual environment of R, it gives users more convenience and freedom to design figures for a better understanding of the complex patterns behind multi-dimensional data.
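A small example of the low-level building blocks (sector initialization, a track, and points drawn inside each cell); the data are simulated purely for illustration:

```r
library(circlize)
set.seed(1)
df <- data.frame(sectors = rep(letters[1:4], each = 100),
                 x = runif(400), y = rnorm(400))
circos.initialize(df$sectors, x = df$x)      # allocate one sector per category
circos.track(df$sectors, x = df$x, y = df$y,
             panel.fun = function(x, y) {
               circos.points(x, y, pch = 16, cex = 0.3)   # low-level call drawn cell by cell
             })
circos.clear()                               # reset the circular layout
```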
Estimation and inference methods for bounding average treatment effects (on the treated) that are valid under an unconfoundedness assumption. The bounds are designed to be robust in challenging situations, for example, when the conditioning variables take on a large number of different values in the observed sample, or when the overlap condition is violated. This robustness is achieved by only using limited "pooling" of information across observations. For more details, see the paper by Lee and Weidner (2021), "Bounding Treatment Effects by Pooling Limited Information across Observations," <arXiv:2111.05243>.
Analysis of agreement for nominal data between two raters using the Delta model. This model is proposed as an alternative to the widely used Cohen's kappa coefficient, which performs poorly when the marginal distributions are very asymmetric (Martin-Andres and Femia-Marzo (2004) <doi:10.1348/000711004849268>; Martin-Andres and Femia-Marzo (2008) <doi:10.1080/03610920701669884>). The package also contains a function to perform a massive analysis of multiple raters against a gold standard. A Shiny app is also provided to obtain the measures of nominal agreement between two raters.
The American Association for Cancer Research (AACR) Project Genomics Evidence Neoplasia Information Exchange (GENIE) BioPharma Collaborative represents a multi-year, multi-institution effort to build a pan-cancer repository of linked clinico-genomic data. The genomic and clinical data are provided in multiple releases (separate releases for each cancer cohort, with updates following data corrections), which are stored on the data sharing platform Synapse <https://www.synapse.org/>. The genieBPC package provides a seamless way to obtain the data corresponding to each release from Synapse and to prepare datasets for analysis.
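A sketch of pulling one cohort release; the function names, cohort label and version string below are recalled from the package documentation and should be verified against ?pull_data_synapse (a Synapse account with GENIE BPC data access is required):

```r
library(genieBPC)
set_synapse_credentials()                              # authenticate to Synapse interactively
nsclc <- pull_data_synapse("NSCLC", version = "v2.0-public")
str(nsclc, max.level = 2)                              # clinical and genomic data frames for the release
```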
We provide an efficient implementation of two-step multi-source transfer learning algorithms in high-dimensional generalized linear models (GLMs). The elastic-net penalized GLM with three popular families, including linear, logistic and Poisson regression models, can be fitted. To avoid negative transfer, a transferable source detection algorithm is proposed. We also provide visualization of the transferable source detection results. Details of the methods can be found in Tian, Y., & Feng, Y. (2023), "Transfer learning under high-dimensional generalized linear models", Journal of the American Statistical Association, 118(544), 2684-2697.
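A schematic example with simulated target and source data; the list-based target/source structure and the argument and element names are assumptions based on the package documentation and should be checked against ?glmtrans:

```r
library(glmtrans)
set.seed(1)
p <- 20
beta <- c(rep(0.5, 5), rep(0, p - 5))
make_data <- function(n, shift = 0) {
  x <- matrix(rnorm(n * p), n, p)
  list(x = x, y = rbinom(n, 1, plogis(x %*% (beta + shift))))
}
target <- make_data(100)                                        # small target sample
source <- list(make_data(500), make_data(500, shift = 0.05))    # two larger source samples
fit <- glmtrans(target, source, family = "binomial",
                transfer.source.id = "auto")                    # detect transferable sources
fit$transfer.source.id                                          # sources judged transferable (assumed name)
```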
Data sets related to the Islas Malvinas. The Argentine Nation ratifies its legitimate and imprescriptible sovereignty over the Islas Malvinas, Georgias del Sur and Sándwich del Sur islands and the corresponding maritime and insular areas, as they are an integral part of the national territory. The recovery of these territories and the full exercise of sovereignty, respecting the way of life of their inhabitants and in accordance with the principles of international law, constitute a permanent and unrelinquishable objective of the Argentine people.
Modelling Multivariate Binary Data with Blocks of Specific One-Factor Distribution. Variables are grouped into independent blocks. Each variable is described by two continuous parameters (its marginal probability and its dependency strength with the other variables of its block) and one binary parameter (positive or negative dependency). Model selection consists of estimating the assignment of the variables to blocks, carried out by maximising the BIC criterion with either a deterministic (faster) algorithm or a stochastic (more time consuming but optimal) algorithm. Tool functions facilitate the interpretation of the model.
This package provides a collection of data structures and methods for handling volumetric brain imaging data, with a focus on functional magnetic resonance imaging (fMRI). Provides efficient representations for three-dimensional and four-dimensional neuroimaging data through sparse and dense array implementations, memory-mapped file access for large datasets, and spatial transformation capabilities. Implements methods for image resampling, spatial filtering, region of interest analysis, and connected component labeling. General introduction to fMRI analysis can be found in Poldrack et al. (2024, "Handbook of functional MRI data analysis", <ISBN:9781108795760>).
This package provides a suite of tools for the comprehensive visualization of multi-omics data, including genomics, transcriptomics, and proteomics. Offers user-friendly functions to generate publication-quality plots, thereby facilitating the exploration and interpretation of complex biological datasets. Supports seamless integration with popular R visualization frameworks and is well-suited for both exploratory data analysis and the presentation of final results. Key formats and methods are presented in Huang, S., et al. (2024) "The Born in Guangzhou Cohort Study enables generational genetic discoveries" <doi:10.1038/s41586-023-06988-4>.
We present Platypus, an open-source software platform providing a user-friendly interface to investigate B-cell receptor and T-cell receptor repertoires from single-cell sequencing (scSeq) experiments. Platypus provides a framework to automate and ease the analysis of single-cell immune repertoires while also incorporating transcriptional information, including unsupervised clustering, gene expression and gene ontology. This R version of Platypus is part of the ePlatypus ecosystem for computational analysis of immunogenomics data: Yermanos et al. (2021) <doi:10.1093/nargab/lqab023>, Cotet et al. (2023) <doi:10.1093/bioinformatics/btad553>.
Pharmacometric tools for common data analytical tasks; closed-form solutions for calculating concentrations at given times after dosing based on compartmental PK models (1-compartment, 2-compartment and 3-compartment, covering infusions, zero- and first-order absorption, and lag times, after single doses and at steady state, per Bertrand & Mentre (2008) <https://www.facm.ucl.ac.be/cooperation/Vietnam/WBI-Vietnam-October-2011/Modelling/Monolix32_PKPD_library.pdf>); parametric simulation from NONMEM-generated parameter estimates and other output; and parsing, tabulating and plotting results generated by Perl-speaks-NONMEM (PsN).
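As an illustration of the closed-form approach, the standard textbook solution for a one-compartment model with first-order absorption after a single oral dose is shown below in plain R; the function and argument names are made up for the example and are not the package's own API:

```r
# C(t) = bioav * dose * ka / (V * (ka - ke)) * (exp(-ke * t) - exp(-ka * t)), with ke = CL / V
conc_1cmt_oral <- function(t, dose, bioav = 1, ka, CL, V) {
  ke <- CL / V
  bioav * dose * ka / (V * (ka - ke)) * (exp(-ke * t) - exp(-ka * t))
}
curve(conc_1cmt_oral(x, dose = 100, ka = 1.5, CL = 5, V = 50),
      from = 0, to = 24, xlab = "Time (h)", ylab = "Concentration (mg/L)")
```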
Overparameterization combined with combinatorial analysis is proposed to test a broader spectrum of possible ARIMA models. Traditional selection methods, such as correlograms, usually consider only a few alternatives for the number of coefficients to be estimated, which can lead to suboptimal models. The popstudy package also contains several tools for statistical analysis in demography and time series based on Shryock's research (Shryock et al. (1980) <https://books.google.co.cr/books?id=8Oo6AQAAMAAJ>).
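A package-agnostic sketch of the combinatorial idea: fit every ARIMA(p, d, q) in a small grid and keep the order with the lowest AIC. This is only an illustration, not popstudy's own routine:

```r
y <- AirPassengers
grid <- expand.grid(p = 0:3, d = 0:1, q = 0:3)    # all candidate (p, d, q) combinations
aics <- apply(grid, 1, function(o) {
  fit <- try(arima(y, order = unname(o)), silent = TRUE)
  if (inherits(fit, "try-error")) NA else AIC(fit)
})
grid[which.min(aics), ]                            # order of the best-fitting candidate model
```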
Processor for selected ion flow tube mass spectrometry (SIFT-MS) output files from breath analysis. It allows filtering of the SIFT-MS output (i.e., the variation over time of the target analyte concentration) and the subsequent determination of the maximum, average, and standard deviation of the target concentration measured at each exhalation, as well as the respiratory rate over the measurement. Additionally, the SIFT-MS data can be aligned with other on-line techniques, such as cardiopulmonary exercise testing (CPET), for a comprehensive characterization of breath samples.
Assessment of the distributions of baseline continuous and categorical variables in randomised trials. The method is based on the Carlisle-Stouffer approach with Monte Carlo simulations. It calculates p-values for each baseline variable of a trial, as well as combined p-values for each trial; these p-values measure how compatible the distributions of the trial's baseline variables are with random sampling. The package also allows the cumulative frequencies of the computed p-values to be plotted. Please note that the code was partly adapted from Carlisle JB, Loadsman JA. (2017) <doi:10.1111/anae.13650>.
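For intuition, the Stouffer step that combines per-variable p-values into a single trial-level p-value can be written in a few lines of plain R; this is a generic illustration of the combination rule for one-sided p-values, not the package's exact code:

```r
stouffer <- function(p) {
  z <- qnorm(p)                       # convert one-sided p-values to z-scores
  pnorm(sum(z) / sqrt(length(z)))     # combined p-value from the summed z-scores
}
stouffer(c(0.40, 0.55, 0.03, 0.72))
```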