This package provides methods for spatial risk calculations, focusing on efficient determination of the sum of observations within a circle of a given radius. These methods are particularly relevant for applications such as insurance, where recent European Commission regulations require the calculation of the maximum insured value of fire risk policies for all buildings that are partly or fully located within a 200 m radius. The underlying problem is described by Church (1974) <doi:10.1007/BF01942293>.
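The core quantity described above, the sum of observations within a fixed radius of each location, can be illustrated with a minimal brute-force sketch. It is not the package's method: it assumes each building is a single point, compares every pair of points, and uses a spherical Earth approximation. The names `haversine_m` and `sum_within_radius` and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; a spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def sum_within_radius(points, radius_m=200.0):
    """For each (lon, lat, value) point, sum the values of all points within radius_m.

    Brute force, O(n^2): every point is compared with every other point.
    """
    totals = []
    for lon_c, lat_c, _ in points:
        total = sum(
            value
            for lon, lat, value in points
            if haversine_m(lon_c, lat_c, lon, lat) <= radius_m
        )
        totals.append(total)
    return totals


# Hypothetical insured values: two buildings about 40 m apart, one about 1 km away.
buildings = [
    (4.8950, 52.3700, 100.0),
    (4.8955, 52.3702, 250.0),
    (4.9100, 52.3700, 500.0),
]
totals = sum_within_radius(buildings)
```

The largest entry of `totals` is the highest concentration among circles centred on a building; finding the best circle centred anywhere is the harder problem the description refers to.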
This package is used for the detection of differentially expressed genes (DEGs) from the comparison of two biological conditions (treated vs. untreated, diseased vs. normal, mutant vs. wild-type) among different levels of gene expression (transcriptome, translatome, proteome), using several statistical methods: Rank Product, Translational Efficiency, t-test, Limma, ANOTA, DESeq, edgeR. It can also plot the results as scatterplots, histograms, MA plots, standard deviation (SD) plots, and coefficient of variation (CV) plots.
This package provides a class and subclasses for storing non-scalar objects in matrix entries. This is akin to a ragged array but the raggedness is in the third dimension, much like a bumpy surface--hence the name. Of particular interest is the BumpyDataFrameMatrix, where each entry is a Bioconductor data frame. This allows us to naturally represent multivariate data in a format that is compatible with two-dimensional containers like the SummarizedExperiment and MultiAssayExperiment objects.
Function-oriented Make-like declarative pipelines for statistics and data science are supported in the targets R package. As an extension to targets, the tarchetypes package provides convenient user-side functions to make targets easier to use. By establishing reusable archetypes for common kinds of targets and pipelines, these functions help express complicated reproducible pipelines concisely and compactly. The methods in this package were influenced by the drake R package by Will Landau (2018) <doi:10.21105/joss.00550>.
This package provides a pipeline to discern RNA structure at and proximal to the site of protein binding within regions of the transcriptome defined by the user. CLIP protein-binding data can be input as either aligned BAM or peak-called bedGraph files. RNA structure can either be predicted internally from sequence or users have the option to input their own RNA structure data. RNA structure binding profiles can be visually and quantitatively compared across multiple formats.
Estimate fish length-at-age models using MCMC analysis with rstan models. This package allows a multimodel approach to growth fitting to be applied to length-at-age data and is supported by further analyses to determine model selection and result presentation. The core methods of this package are presented in Smart and Grammer (2021) "Modernising fish and shark growth curves with Bayesian length-at-age models". PLOS ONE 16(2): e0246734 <doi:10.1371/journal.pone.0246734>.
Evolutionary black box optimization algorithms building on the bbotk package. miesmuschel offers both ready-to-use optimization algorithms and their fundamental building blocks, which can be used to manually construct specialized optimization loops. The Mixed Integer Evolution Strategies as described by Li et al. (2013) <doi:10.1162/EVCO_a_00059> can be implemented, as well as the multi-objective optimization algorithm NSGA-II by Deb, Pratap, Agarwal, and Meyarivan (2002) <doi:10.1109/4235.996017>.
Useful functions for one-sample (individual level data) Mendelian randomization and instrumental variable analyses. The package includes implementations of the Sanderson and Windmeijer (2016) <doi:10.1016/j.jeconom.2015.06.004> conditional F-statistic, the multiplicative structural mean model of Hernán and Robins (2006) <doi:10.1097/01.ede.0000222409.00878.37>, and the two-stage predictor substitution and two-stage residual inclusion estimators explained by Terza et al. (2008) <doi:10.1016/j.jhealeco.2007.09.009>.
This package provides a collection of functions that primarily produce graphics to aid in a Propensity Score Analysis (PSA). Functions include: cat.psa and box.psa to test balance within strata of categorical and quantitative covariates, circ.psa for a representation of the estimated effect size by stratum, loess.psa that provides a graphic and loess based effect size estimate, and various balance functions that provide measures of the balance achieved via a PSA in a categorical covariate.
This package provides a wrapped LASSO approach that integrates an ensemble learning strategy to help select efficient, stable, and high-confidence variables from omics-based data. It uses a bagging strategy in combination with a parametric method or an inflection point search method to determine the cut-off threshold. The package can integrate and vote on variables generated from multiple LASSO models to determine the optimal candidates. See Luo H, Zhao Q, et al. (2020) <doi:10.1126/scitranslmed.aax7533> for more details.
Finds the most likely originating tissue(s) and developmental stage(s) of tissue-specific RNA sequencing data. The package identifies both pure transcriptomes and mixtures of transcriptomes. The most likely identity is found through comparisons of the sequencing data with high-throughput in situ hybridisation patterns. Typical uses are the identification of cancer cell origins, validation of cell culture strain identities, validation of single-cell transcriptomes, and validation of identity and purity of flow-sorting and dissection sequencing products.
This package implements fast Monte Carlo simulations for goodness-of-fit (GOF) tests for discrete distributions. This includes tests based on the Chi-squared statistic, the log-likelihood-ratio (G^2) statistic, the Freeman-Tukey (Hellinger-distance) statistic, the Kolmogorov-Smirnov statistic, the Cramer-von Mises statistic as described in Choulakian, Lockhart and Stephens (1994) <doi:10.2307/3315828>, and the root-mean-square statistic, see Perkins, Tygert, and Ward (2011) <doi:10.1016/j.amc.2011.03.124>.
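As an illustration of the Monte Carlo idea, the sketch below estimates a p-value for the Chi-squared statistic by repeatedly sampling from the hypothesised discrete distribution. It is a plain, unoptimised version that assumes all hypothesised probabilities are strictly positive; the function names `chi_squared` and `mc_gof_pvalue` are hypothetical and are not the package's interface.

```python
import random


def chi_squared(counts, probs):
    """Pearson Chi-squared statistic for observed counts against hypothesised probabilities."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def mc_gof_pvalue(counts, probs, n_sim=2000, seed=0):
    """Monte Carlo p-value: share of simulated statistics at least as large as the observed one.

    Assumes every entry of probs is strictly positive.
    """
    rng = random.Random(seed)
    n = sum(counts)
    k = len(probs)
    observed = chi_squared(counts, probs)
    exceed = 0
    for _ in range(n_sim):
        # Draw a sample of size n from the null distribution and tabulate it.
        sim = [0] * k
        for category in rng.choices(range(k), weights=probs, k=n):
            sim[category] += 1
        if chi_squared(sim, probs) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_sim + 1)


uniform = [1 / 3, 1 / 3, 1 / 3]
p_balanced = mc_gof_pvalue([10, 10, 10], uniform, n_sim=200)
p_skewed = mc_gof_pvalue([30, 0, 0], uniform, n_sim=200)
```

Swapping `chi_squared` for another statistic (for example G^2 or root-mean-square) would leave the simulation loop unchanged.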
An implementation of several functions for feature extraction in ordinal time series datasets. Specifically, some of the features proposed by Weiss (2019) <doi:10.1080/01621459.2019.1604370> can be computed. These features can be used to perform inferential tasks or to feed machine learning algorithms for ordinal time series, among others. The package also includes some interesting datasets containing financial time series. Practitioners from a broad variety of fields could benefit from the general framework provided by otsfeatures.
This package provides tools to conduct interpretable sensitivity analyses for weighted estimators, introduced in Huang (2024) <doi:10.1093/jrsssa/qnae012> and Hartman and Huang (2024) <doi:10.1017/pan.2023.12>. The package allows researchers to generate the set of recommended sensitivity summaries to evaluate the sensitivity in their underlying weighting estimators to omitted moderators or confounders. The tools can be flexibly applied in causal inference settings (i.e., in external and internal validity contexts) or survey contexts.
Using Gaussian graphical models we propose a novel approach to perform pathway analysis using gene expression. Given the structure of a graph (a pathway) we introduce two statistical tests to compare the mean and the concentration matrices between two groups. Specifically, these tests can be performed on the graph and on its connected components (cliques). The package is based on the method described in Massa M.S., Chiogna M., Romualdi C. (2010) <doi:10.1186/1752-0509-4-121>.
This package contains functions for a variational Bayesian method for sparse PCA proposed by Ning (2020) <arXiv:2102.00305>. There are two algorithms: the PX-CAVI algorithm (which assumes the loadings matrix is jointly row-sparse) and the batch PX-CAVI algorithm (which does not make this assumption). The outputs of the main function, VBsparsePCA(), include the mean and covariance of the loadings matrix, the score functions, the variable selection results, and the estimated variance of the random noise.
A tandem repeat in DNA is two or more adjacent, approximate copies of a pattern of nucleotides. Tandem Repeats Finder is a program to locate and display tandem repeats in DNA sequences. In order to use the program, the user submits a sequence in FASTA format. The output consists of two files: a repeat table file and an alignment file. Submitted sequences may be of arbitrary length. Repeats with pattern size in the range from 1 to 2000 bases are detected.
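To make the definition concrete, here is a minimal sketch that finds only exact tandem repeats with a backreference regular expression. It is a simplification: Tandem Repeats Finder also detects approximate copies, which this sketch does not attempt. The function name `exact_tandem_repeats` and its parameters are hypothetical.

```python
import re


def exact_tandem_repeats(seq, min_unit=2, max_unit=6):
    """Return (start, unit, copies) for runs of two or more adjacent identical units.

    Only exact copies are found; the shortest unit that repeats at a position is preferred.
    """
    # Group 1 is the candidate unit (lazy, so shorter units are tried first);
    # \1+ requires at least one further adjacent copy of that unit.
    pattern = re.compile(r"([ACGT]{%d,%d}?)\1+" % (min_unit, max_unit))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


hits = exact_tandem_repeats("TTACGACGACGTT")
```

Here `hits` contains a single entry: the unit `ACG` repeated three times starting at offset 2.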
The package CellBarcode performs Cellular DNA Barcode analysis. It can handle all kinds of DNA barcodes, as long as the barcode is within a single sequencing read and has a pattern that can be matched by a regular expression. CellBarcode can handle barcodes with flexible lengths, with or without UMI (unique molecular identifier). This tool can also be used for pre-processing some amplicon data such as CRISPR gRNA screening, immune repertoire sequencing, and metagenome data.
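The idea of locating a barcode with a regular expression can be sketched as follows. This is not CellBarcode's interface: the function `count_barcodes`, the flanking sequences `ACTG` and `CCGT`, and the reads are all hypothetical, chosen only to show a variable-length barcode captured between two fixed flanks.

```python
import re
from collections import Counter


def count_barcodes(reads, pattern):
    """Count barcodes captured by group 1 of pattern; reads without a match are skipped."""
    regex = re.compile(pattern)
    counts = Counter()
    for read in reads:
        match = regex.search(read)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical layout: fixed 5' flank, a barcode of 4 to 8 bases, fixed 3' flank.
BARCODE_PATTERN = r"ACTG([ACGT]{4,8}?)CCGT"

reads = [
    "GGACTGATCGTTCCGTA",   # barcode ATCGTT
    "TTGACTGATCGTTCCGTAA", # same barcode, different read context
    "ACTGGGCATCCGT",       # shorter barcode GGCAT
    "AAAAAAAA",            # no flanks, so no barcode
]
barcode_counts = count_barcodes(reads, BARCODE_PATTERN)
```

A UMI could be handled the same way by adding a second capturing group to the pattern.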
Routines to handle family data with a Pedigree object. The initial purpose was to create correlation structures that describe family relationships such as kinship and identity-by-descent, which can be used to model family data in mixed effects models, such as in the coxme function. Also includes a tool for Pedigree drawing which is focused on producing compact layouts without intervention. Recent additions include utilities to trim the Pedigree object with various criteria, and kinship for the X chromosome.
SpatialCPie is an R package designed to facilitate cluster evaluation for spatial transcriptomics data by providing intuitive visualizations that display the relationships between clusters in order to guide the user during cluster identification and other downstream applications. The package is built around a shiny "gadget" to allow the exploration of the data with multiple plots in parallel and an interactive UI. The user can easily toggle between different cluster resolutions in order to choose the most appropriate visual cues.
SpectralTAD is an R package designed to identify Topologically Associated Domains (TADs) from Hi-C contact matrices. It applies a modified version of spectral clustering with a sliding window to quickly detect TADs. The function works on a range of different formats of contact matrices and returns a bed file of TAD coordinates. The method does not require users to adjust any parameters to work, but it gives them control over the number of hierarchical levels to be returned.
This package provides functions to estimate a factor model using discrete and continuous proxy variables. The function dproxyme estimates a factor model of discrete proxy variables using an EM algorithm (Dempster, Laird, Rubin (1977) <doi:10.1111/j.2517-6161.1977.tb01600.x>; Hu (2008) <doi:10.1016/j.jeconom.2007.12.001>; Hu (2017) <doi:10.1016/j.jeconom.2017.06.002>). The function cproxyme estimates a linear factor model (Cunha, Heckman, and Schennach (2010) <doi:10.3982/ECTA6551>).
The fusion learning method uses a model selection algorithm to learn from multiple data sets across different experimental platforms through group penalization. The responses of interest may include a mix of discrete and continuous variables. The responses may share the same set of predictors; however, the models and parameters differ across different platforms. Integrating information from different data sets can enhance the power of model selection. The package is based on Xin Gao, Raymond J. Carroll (2017) <arXiv:1610.00667v1>.
This package provides a curated taxonomic backbone of mammal species recorded in Peru, based on the checklist published by Pacheco and collaborators (2021) <doi:10.15381/rpb.v28i4.21019>. The package includes standardized species data, occurrence records by ecological regions, endemic status, and tools for validating and matching scientific names through exact and approximate string procedures. It is designed as a lightweight and reliable reference for ecological, environmental, biogeographical, and conservation workflows that require verified species information for Peruvian mammals.
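Matching names through exact and then approximate string procedures might look like the sketch below. It uses Python's standard `difflib` similarity ratio as a stand-in, which is an assumption and not necessarily the procedure the package uses. The function `match_name`, the cutoff of 0.85, and the three-name `CHECKLIST` are illustrative only and are not taken from the package's data.

```python
import difflib

# Illustrative subset of accepted names; not the package's backbone.
CHECKLIST = ["Tremarctos ornatus", "Vicugna vicugna", "Lagothrix flavicauda"]


def match_name(name, checklist=CHECKLIST, cutoff=0.85):
    """Return (matched_name, kind), where kind is 'exact', 'approximate' or 'unmatched'."""
    cleaned = " ".join(name.split())  # trim and collapse repeated whitespace
    if cleaned in checklist:
        return cleaned, "exact"
    # Fall back to the single closest name above the similarity cutoff.
    close = difflib.get_close_matches(cleaned, checklist, n=1, cutoff=cutoff)
    if close:
        return close[0], "approximate"
    return None, "unmatched"


exact = match_name("Tremarctos  ornatus ")
fuzzy = match_name("Tremarctos ornatos")
missing = match_name("Panthera leo")
```

Trying the exact match first keeps correctly spelled names from being reassigned to a similar neighbour, and the cutoff controls how aggressive the approximate step is.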