It provides functions to generate a correlation matrix from a genetic dataset and to use this matrix to predict the phenotype of an individual from the phenotypes of the remaining individuals through kriging. Kriging is a geostatistical method for optimal prediction, also known as best linear unbiased prediction. It predicts the value of a variable at an unobserved location as a weighted sum of the values of the variable at observed locations. Intuitively, it works as a reverse linear regression: instead of computing the correlation (univariate regression coefficients are simply scaled correlations) between a dependent variable Y and independent variables X, it uses the known correlation between X and Y to predict Y.
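A minimal sketch of the prediction step in Python, assuming simple kriging: phenotypes are centred to mean zero and the correlation submatrix of the observed individuals is invertible. The weights w solve K_obs w = k, where k holds the correlations between the target and the observed individuals, and the prediction is the weighted sum of observed phenotypes. The names `solve` and `krige_one` are hypothetical, not the package's API.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def krige_one(corr, y, target):
    """Predict y[target] from the other centred phenotypes (y[target] is ignored)."""
    others = [i for i in range(len(y)) if i != target]
    k_obs = [[corr[i][j] for j in others] for i in others]  # correlations among observed
    k_tgt = [corr[target][j] for j in others]               # correlations with the target
    weights = solve(k_obs, k_tgt)
    return sum(w * y[j] for w, j in zip(weights, others))
```

For example, with two individuals correlated at 0.5, `krige_one([[1.0, 0.5], [0.5, 1.0]], [0.0, 2.0], 0)` returns 1.0: the known correlation, not a fitted regression, determines the weight.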
It estimates the parameters of a partially linear censored regression model via maximum penalized likelihood using the ECME algorithm. The model belongs to the semiparametric class, which includes a parametric and a nonparametric component. The error term is assumed to follow a scale mixture of normal (SMN) distributions, a family that includes well-known heavy-tailed distributions such as the Student-t distribution, among others. To examine the performance of the fitted model, case-deletion and local influence techniques are provided to show its robustness against outlying and influential observations. This work is based on Ferreira, C. S., & Paula, G. A. (2017) <doi:10.1080/02664763.2016.1267124>, but considers the SMN family.
Database search is the most widely used approach for peptide and protein identification in mass spectrometry-based proteomics studies. Our previous study showed that sample-specific protein databases derived from RNA-Seq data can better approximate the real protein pools in the samples and thus improve protein identification. More importantly, single nucleotide variations, short insertions and deletions, and novel junctions identified from RNA-Seq data make the protein database more complete and sample-specific. Here, we report an R package, customProDB, that enables the easy generation of customized databases from RNA-Seq data for proteomics search. This work bridges genomics and proteomics studies and facilitates cross-omics data integration.
This package simulates the regulation of ceRNA (competing endogenous RNA) expression levels after an expression level change in one or more miRNAs/mRNAs. The methodology adopted by the package has the potential to incorporate any ceRNA (circRNA, lincRNA, etc.) into a miRNA:target interaction network. The package basically distributes miRNA expression over the available ceRNAs, where each ceRNA attracts miRNAs in proportion to its amount. However, the package can also use multiple parameters that modify the miRNA effect on its target (seed type, binding energy, binding location, etc.). The functions handle the given dataset as a graph object, and the processes progress via edge and node variables.
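The proportional rule described above can be sketched in Python as follows. This is a minimal illustration, not the package's implementation: `distribute_mirna` is a hypothetical name, and the optional `efficiency` factors are an assumed, simplified stand-in for modifiers such as seed type or binding energy.

```python
def distribute_mirna(mirna_amount, targets, efficiency=None):
    """Split a miRNA amount over competing targets in proportion to their amounts.

    targets: dict mapping target name -> expression amount.
    efficiency: optional dict of hypothetical multiplicative binding modifiers
    (1.0 for any target not listed).
    """
    efficiency = efficiency or {}
    attraction = {name: amount * efficiency.get(name, 1.0) for name, amount in targets.items()}
    total = sum(attraction.values())
    if total == 0:
        return {name: 0.0 for name in targets}
    return {name: mirna_amount * a / total for name, a in attraction.items()}
```

With 100 units of miRNA, `distribute_mirna(100, {"geneA": 1, "geneB": 3})` gives geneA 25.0 and geneB 75.0. Raising geneB from 3 to 7 cuts geneA's share to 12.5, which is the competition effect the package propagates over a whole interaction network.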
Allows plotting data on bathymetric maps using ggplot2. Plotting oceanographic spatial data is made as simple as possible, but also flexible for custom modifications. Data that contain geographic information from anywhere around the globe can be plotted on maps generated by the basemap() or qmap() functions using ggplot2 layers separated by the + operator. The package uses spatial shape ('sf') and raster ('stars') files, geospatial packages for R to manipulate them, and the ggplot2 package to plot them. The package ships with low-resolution spatial data files; higher-resolution files for detailed maps are stored in the ggOceanMapsLargeData repository on GitHub and downloaded automatically when needed.
Most of the time, floating-point arithmetic does approximately the right thing. When summing or multiplying numbers that differ greatly in magnitude, however, floating-point results can be inaccurate. This package implements the Kahan (1965) sum <doi:10.1145/363707.363723>, the Neumaier (1974) sum <doi:10.1002/zamm.19740540106>, pairwise summation (adapted from NumPy; see Castaldo (2008) <doi:10.1137/070679946> for a discussion of accuracy), and arbitrary-precision summation (adapted from fsum in Python; Shewchuk (1997) <https://people.eecs.berkeley.edu/~jrs/papers/robustr.pdf>). In addition, products are computed in long double precision, or converted into a sum of logarithms, for accuracy.
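To illustrate the compensated-summation idea, here is a sketch of the textbook Kahan and Neumaier algorithms in Python; it is not the package's code. Both carry a running compensation term `c` for the low-order bits lost in each addition. Neumaier's variant also handles the case where the incoming term is larger in magnitude than the running total, which Kahan's does not.

```python
def kahan_sum(xs):
    total, c = 0.0, 0.0  # c holds the (negated) low-order bits lost so far
    for x in xs:
        y = x - c            # correct the next term by the previous error
        t = total + y
        c = (t - total) - y  # what was lost when adding y to total
        total = t
    return total


def neumaier_sum(xs):
    total, c = 0.0, 0.0  # c accumulates the low-order bits lost so far
    for x in xs:
        t = total + x
        if abs(total) >= abs(x):
            c += (total - t) + x  # low-order bits of x were lost
        else:
            c += (x - t) + total  # low-order bits of total were lost
        total = t
    return total + c
```

For the input `[1.0, 1e100, 1.0, -1e100]`, `kahan_sum` returns 0.0 while `neumaier_sum` returns 2.0, the exact result (which Python's `math.fsum` also gives).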
Data in multidimensional systems are obtained from operational systems and transformed to fit the new structure. Frequently, the goal is to transform a flat table into a star schema. Such transformations can be carried out with professional extract, transform and load (ETL) tools or with data-transformation tools intended for end users, but either approach requires a lot of work. The main objective of this package is to define transformations that make it easy to obtain star schemas from flat tables. In addition, it includes basic data cleaning, dimension enrichment, incremental data refresh and query operations, adapted to this context.
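A minimal sketch of the core flat-table-to-star transformation in Python: each chosen group of descriptive columns becomes a dimension table with surrogate keys, and the fact table keeps the measures plus foreign keys to the dimensions. The function `flat_to_star` and its argument layout are hypothetical and do not reflect the package's interface.

```python
def flat_to_star(rows, dimensions, measures):
    """Split flat rows into dimension tables and a fact table.

    rows: list of dicts (the flat table).
    dimensions: dict mapping dimension name -> list of attribute columns.
    measures: list of numeric columns kept in the fact table.
    """
    dim_tables = {name: {} for name in dimensions}  # name -> {attribute tuple: surrogate key}
    facts = []
    for row in rows:
        fact = {}
        for name, cols in dimensions.items():
            attrs = tuple(row[c] for c in cols)
            table = dim_tables[name]
            if attrs not in table:
                table[attrs] = len(table) + 1  # assign the next surrogate key
            fact[name + "_key"] = table[attrs]
        fact.update({m: row[m] for m in measures})
        facts.append(fact)
    # Materialise each dimension as a list of records that include the surrogate key.
    dims = {}
    for name, table in dim_tables.items():
        cols = dimensions[name]
        dims[name] = [{"key": k, **dict(zip(cols, attrs))} for attrs, k in table.items()]
    return dims, facts
```

Deduplicating attribute combinations into dimension rows is what removes the redundancy of the flat table; cleaning, enrichment and incremental refresh would be layered on top of this step.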
Generates interactive Jellyfish plots to visualize spatiotemporal tumor evolution by integrating sample and phylogenetic trees into a unified plot. This approach provides an intuitive way to analyze tumor heterogeneity and evolution over time and across anatomical locations. The Jellyfish plot visualization design was first introduced by Lahtinen, Lavikka, et al. (2023, <doi:10.1016/j.ccell.2023.04.017>). This package also supports visualizing results from ClonEvol, a tool developed by Dang et al. (2017, <doi:10.1093/annonc/mdx517>) for analyzing clonal evolution from multi-sample sequencing data. The clonevol package is not available on CRAN but can be installed from its GitHub repository (<https://github.com/hdng/clonevol>).
This package provides a set of vectorised functions to calculate medical equations used in transplantation, focused mainly on transplantation of abdominal organs. These functions include donor and recipient risk indices as used by NHS Blood & Transplant, OPTN/UNOS and Eurotransplant, tools for quantifying HLA mismatches, functions for calculating estimated glomerular filtration rate (eGFR), a function to calculate the APRI (AST to platelet ratio index) score used in initial screening of suitability to receive a transplant from a hepatitis C seropositive donor, and some biochemical unit converter functions. All functions are designed to work with either US or international units. References for the equations are provided in the vignettes and function documentation.
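For reference, a sketch of the widely used APRI formula in Python: (AST / upper limit of normal for AST) divided by the platelet count in 10^9/L, multiplied by 100. The function name is hypothetical, and the upper limit of normal is a required argument because it varies between laboratories.

```python
def apri(ast_iu_per_l, ast_upper_limit_normal, platelets_10e9_per_l):
    """AST to platelet ratio index (APRI).

    ast_iu_per_l: aspartate aminotransferase, IU/L.
    ast_upper_limit_normal: the laboratory's upper limit of normal for AST, IU/L.
    platelets_10e9_per_l: platelet count in 10^9/L (numerically equal to 10^3/uL).
    """
    if platelets_10e9_per_l <= 0:
        raise ValueError("platelet count must be positive")
    return (ast_iu_per_l / ast_upper_limit_normal) / platelets_10e9_per_l * 100
```

Because 10^9/L and 10^3/µL are numerically identical, the same call works for platelet counts reported in either convention; for example, AST 80 IU/L with an upper limit of 40 IU/L and 200 platelets gives an APRI of 1.0. A vectorised version would simply map this function over paired inputs.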
Computes the ATM (Attractor Transition Matrix) structure and the tree-like structure describing the cell differentiation process (based on the Threshold Ergodic Set concept introduced by Serra and Villani), starting from Boolean networks with the synchronous updating scheme of the BoolNet R package. TESs (Threshold Ergodic Sets) are the mathematical abstractions that represent the different cell types arising during ontogenesis. TESs and the powerful model of biological differentiation based on Boolean networks to which they belong were first described in Villani M, Barbieri A, Serra R (2011) "A Dynamical Model of Genetic Networks for Cell Differentiation". PLOS ONE 6(3): e17703.
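A minimal Python sketch of the ATM computation, assuming the usual definitions: a synchronous Boolean network maps each state to its successor, attractors are the cycles eventually reached from every state, and ATM entry (i, j) is the fraction of single-node flips, applied to states of attractor i, that drive the network into attractor j. The rule encoding (a list of Python functions) is hypothetical and unrelated to BoolNet's data structures; deriving TESs by thresholding the ATM is not shown.

```python
from itertools import product


def step(state, rules):
    """Synchronous update: every node reads the previous state."""
    return tuple(int(rule(state)) for rule in rules)


def attractor_of(state, rules):
    """Iterate until a state repeats; return the cycle as a sorted tuple of states."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state, rules)
    return tuple(sorted(seen[seen.index(state):]))


def attractors(rules):
    """All attractors, found by running the network from every possible state."""
    return sorted({attractor_of(s, rules) for s in product((0, 1), repeat=len(rules))})


def atm(rules):
    """Attractor Transition Matrix under single-node flip perturbations."""
    atts = attractors(rules)
    index = {a: i for i, a in enumerate(atts)}
    m = [[0.0] * len(atts) for _ in atts]
    for a in atts:
        trials = [(s, k) for s in a for k in range(len(rules))]
        for s, k in trials:
            flipped = s[:k] + (1 - s[k],) + s[k + 1:]
            m[index[a]][index[attractor_of(flipped, rules)]] += 1 / len(trials)
    return atts, m
```

For a two-node network in which each node copies the other, there are two fixed points and one 2-cycle; every flip from a fixed point lands in the cycle, while flips from the cycle split evenly between the two fixed points.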
Bayesian data analysis usually incurs long runtimes and cumbersome custom code. A pipeline toolkit tailored to Bayesian statisticians, the jagstargets R package leverages targets and R2jags to ease this burden. jagstargets makes it easy to set up scalable JAGS pipelines that automatically parallelize the computation and skip expensive steps when the results are already up to date. Minimal custom code is required, and there is no need to manually configure branching, so usage is much easier than with targets alone. For the underlying methodology, please refer to the documentation of targets <doi:10.21105/joss.02959> and JAGS (Plummer 2003) <https://www.r-project.org/conferences/DSC-2003/Proceedings/Plummer.pdf>.
This package provides a framework for deconvolution, alignment and postprocessing of one-dimensional (1D) nuclear magnetic resonance (NMR) spectra, resulting in a data matrix of aligned signal integrals. The deconvolution part uses the algorithm described in Koh et al. (2009) <doi:10.1016/j.jmr.2009.09.003>. The alignment part is based on functions from the speaq package, described in Beirnaert et al. (2018) <doi:10.1371/journal.pcbi.1006018> and Vu et al. (2011) <doi:10.1186/1471-2105-12-405>. A detailed description and evaluation of an early version of the package, MetaboDecon1D v0.2.2, can be found in Haeckl et al. (2021) <doi:10.3390/metabo11070452>.
This package allows biologists to judge, first, whether the sequence surrounding the polymorphism is a good match to a motif and, second, how much information is gained or lost in one allele of the polymorphism relative to another. It offers a choice of algorithms for interrogating genomes with motifs from public sources:
a weighted-sum probability matrix;
log-probabilities;
weighted by relative entropy.
This package can predict effects for novel or previously described variants in public databases, making it suitable for tasks beyond the scope of its original design. Lastly, it can be used to interrogate any genome curated within Bioconductor.
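As an illustration of the log-probability option, here is a minimal Python sketch that scores a sequence against a position probability matrix relative to a uniform background, and expresses an allele's effect as the difference in scores between the alternative and reference sequences. The matrix values, the 0.25 background, and the function names are assumptions for illustration; the package's exact scoring and normalisation may differ.

```python
import math


def log_score(seq, pwm, background=0.25):
    """Sum over positions of log2(p(base) / background)."""
    return sum(math.log2(pos[base] / background) for pos, base in zip(pwm, seq))


def allele_effect(ref_seq, alt_seq, pwm):
    """Positive values: the alternative allele matches the motif better."""
    return log_score(alt_seq, pwm) - log_score(ref_seq, pwm)


# Hypothetical 3-position motif; each position's probabilities sum to 1.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
```

A C-to-G change at the second position, `allele_effect("ACT", "AGT", PWM)`, is positive because G is the preferred base there. Real matrices usually need pseudocounts so that no probability is exactly zero.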
This package provides a highly efficient R tool suite for credit modeling, analysis and visualization. It contains infrastructure functionality such as data exploration and preparation, missing-value treatment, outlier treatment, variable derivation, variable selection, dimensionality reduction, grid search for hyperparameters, data mining and visualization, model evaluation, strategy analysis, etc. This package is designed to make the development of binary classification models (machine-learning-based models as well as credit scorecards) simpler and faster. References: (1) Refaat, M. (2011). Credit Risk Scorecard: Development and Implementation Using SAS (ISBN: 9781447511199); (2) Bezdek, J. C. FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences (ISSN 0098-3004), <doi:10.1016/0098-3004(84)90020-7>.
This package aims to support the teaching and learning of descriptive and inferential statistics. The functions contained in the package estadistica cover the basic concepts studied in an introductory statistics course. Many concepts are illustrated with dynamic graphs or web apps to make them easier to understand. See: Esteban et al. (2005, ISBN: 9788497323741), Newbold et al. (2019, ISBN: 9781292315034), Murgui et al. (2002, ISBN: 9788484424673).
This package provides R wrappers of several on-target and off-target scoring methods for CRISPR guide RNAs (gRNAs). The following nucleases are supported: SpCas9, AsCas12a, enAsCas12a, and RfxCas13d (CasRx). The available on-target cutting efficiency scoring methods are RuleSet1, Azimuth, DeepHF, DeepCpf1, enPAM+GB, and CRISPRscan. Both the CFD and MIT scoring methods are available for off-target specificity prediction. The package also provides a Lindel-derived score to predict the probability that a gRNA produces frameshift-inducing indels with the Cas9 nuclease. Note that DeepHF, DeepCpf1 and enPAM+GB are not available on Windows machines.
This is a collection of tools for assessment of feature importance and feature effects. Key functions are:
feature_importance() for assessment of global-level feature importance;
ceteris_paribus() for calculation of what-if plots;
partial_dependence() for partial dependence plots;
conditional_dependence() for conditional dependence plots;
accumulated_dependence() for accumulated local effects plots;
aggregate_profiles() and cluster_profiles() for aggregation of ceteris paribus profiles;
generic print() and plot() for better usability of selected explainers;
generic plotD3() for interactive, D3-based explanations;
generic describe() for explanations in natural language.
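Two of the ideas listed above can be sketched in a few lines of Python: permutation-based global feature importance (the mean increase in loss after shuffling one feature) and a ceteris paribus (what-if) profile (varying one feature over a grid while holding the others fixed). This is a sketch of the general techniques under the assumption of a callable model and list-based rows; it is not the package's implementation, and all names are hypothetical.

```python
import random


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(model, X, y, feature, loss=mse, n_perm=10, seed=0):
    """Mean increase in loss after randomly permuting one feature column."""
    rng = random.Random(seed)
    base = loss(y, [model(row) for row in X])
    increases = []
    for _ in range(n_perm):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
        increases.append(loss(y, [model(row) for row in X_perm]) - base)
    return sum(increases) / n_perm


def ceteris_paribus(model, row, feature, grid):
    """What-if profile: vary one feature over a grid, holding the others fixed."""
    return [(v, model(row[:feature] + [v] + row[feature + 1:])) for v in grid]
```

For a model that depends only on feature 0, permuting feature 1 leaves every prediction unchanged, so its importance is exactly zero. Averaging ceteris paribus profiles over many rows yields a partial dependence curve.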
This code provides several functions for cleaning and analyzing continuous glucose monitor (CGM) data. Currently it works with Dexcom, iPro 2, Diasend, Libre, or Carelink data. The cleandata() function takes a directory of CGM data files and prepares them for analysis. cgmvariables() iterates through a directory of cleaned CGM data files and produces a single spreadsheet with data for each file in either rows or columns. The column format of this spreadsheet is compatible with REDCap data upload. cgmreport() also iterates through a directory of cleaned data and produces PDFs of individual and aggregate AGP (ambulatory glucose profile) plots. Please visit <https://github.com/childhealthbiostatscore/R-Packages/> to download the new-user guide.
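To make the per-file summary step concrete, here is a hypothetical Python sketch of the kind of variables a summary such as cgmvariables() might compute for one cleaned trace. It assumes readings in mg/dL, equally spaced in time (so percentages of readings approximate percentages of time), and the commonly used 70 to 180 mg/dL target range. The function name and the metric set are assumptions, not the package's actual output columns.

```python
import statistics


def cgm_summary(glucose_mg_dl, low=70, high=180):
    """Summary metrics for one cleaned CGM trace (readings in mg/dL)."""
    readings = [g for g in glucose_mg_dl if g is not None]  # drop missing readings
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    n = len(readings)
    mean = statistics.fmean(readings)
    sd = statistics.stdev(readings)
    return {
        "mean_glucose": mean,
        "sd": sd,
        "cv_percent": 100 * sd / mean,
        "percent_time_in_range": 100 * sum(low <= g <= high for g in readings) / n,
        "percent_time_below": 100 * sum(g < low for g in readings) / n,
        "percent_time_above": 100 * sum(g > high for g in readings) / n,
    }
```

Running such a function over every file in a directory and writing one row (or column) per file would give a spreadsheet of the kind described above.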
This package provides methods to perform Joint graph Regularized Single-Cell Kullback-Leibler Sparse Non-negative Matrix Factorization ('jrSiCKLSNMF', pronounced "junior sickles NMF") on quality-controlled single-cell multimodal omics count data. jrSiCKLSNMF specifically deals with dual-assay scRNA-seq and scATAC-seq data. This package contains functions to extract meaningful latent factors that are shared across omics modalities. These factors enable accurate cell-type clustering and facilitate visualizations. Methods for preprocessing and clustering, as well as mini-batch updates and other adaptations for larger datasets, are also included. For further details on the methods used in this package, please see Ellis, Roy, and Datta (2023) <doi:10.3389/fgene.2023.1179439>.
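The Kullback-Leibler building block can be illustrated with plain single-matrix KL-NMF using the classic Lee-Seung multiplicative updates, sketched here in pure Python. This deliberately omits the joint multi-modal factorisation, the graph regularisation and the sparsity terms that distinguish jrSiCKLSNMF, so it is a simplified sketch rather than the package's objective. It assumes the count matrix has no all-zero row or column, which keeps W H strictly positive.

```python
import math
import random


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def kl_divergence(v, wh):
    """Generalized KL divergence D(V || WH), treating 0 * log(0) as 0."""
    total = 0.0
    for vrow, wrow in zip(v, wh):
        for x, y in zip(vrow, wrow):
            total += (x * math.log(x / y) if x > 0 else 0.0) - x + y
    return total


def kl_nmf(v, k, iters=200, seed=0):
    """Multiplicative updates for V ~ W H under KL divergence (W: n x k, H: k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Update H: H[a][j] *= sum_i W[i][a] V[i][j] / (WH)[i][j], divided by sum_i W[i][a].
        wh = matmul(w, h)
        ratio = [[v[i][j] / wh[i][j] for j in range(m)] for i in range(n)]
        w_colsum = [sum(w[i][a] for i in range(n)) for a in range(k)]
        h = [[h[a][j] * sum(w[i][a] * ratio[i][j] for i in range(n)) / w_colsum[a]
              for j in range(m)] for a in range(k)]
        # Update W with the symmetric rule, using the new H.
        wh = matmul(w, h)
        ratio = [[v[i][j] / wh[i][j] for j in range(m)] for i in range(n)]
        h_rowsum = [sum(h[a][j] for j in range(m)) for a in range(k)]
        w = [[w[i][a] * sum(ratio[i][j] * h[a][j] for j in range(m)) / h_rowsum[a]
              for a in range(k)] for i in range(n)]
    return w, h
```

The multiplicative form keeps every factor non-negative automatically, and these updates do not increase the KL divergence, which is why they are a common starting point for count-data factorisations.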
This package provides functions for simulating and estimating kinship-related dispersal. It is based on the methods described in M. Jasper, T.L. Schmidt, N.W. Ahmad, S.P. Sinkins & A.A. Hoffmann (2019) "A genomic approach to inferring kinship reveals limited intergenerational dispersal in the yellow fever mosquito" <doi:10.1111/1755-0998.13043>. It assumes an additive variance model of dispersal in two dimensions, compatible with Wright's neighbourhood area. Simple and composite dispersal simulations are supplied, as well as the functions needed to estimate parent-offspring dispersal for simulated or empirical data and to undertake sampling design for future field studies of dispersal. For ease of use, an integrated Shiny app is also included.
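Under an additive variance model, independent dispersal components add their variances, so a composite kernel built from components with axial sigmas σ1 and σ2 has axial sigma sqrt(σ1² + σ2²); Wright's neighbourhood area is 4πσ². The Python sketch below illustrates these relations, assuming isotropic Gaussian displacement in two dimensions. All function names are hypothetical, and the package's own kernels and estimators may differ.

```python
import math
import random


def composite_sigma(sigmas):
    """Axial sigma of the sum of independent dispersal components (variances add)."""
    return math.sqrt(sum(s * s for s in sigmas))


def neighbourhood_area(sigma):
    """Wright's neighbourhood area, 4 * pi * sigma^2."""
    return 4 * math.pi * sigma ** 2


def simulate_dispersal(sigma, n, seed=0):
    """Parent-offspring displacements (dx, dy), isotropic Gaussian in 2D."""
    rng = random.Random(seed)
    return [(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(n)]


def estimate_sigma(displacements):
    """Axial sigma from 2D displacements: sqrt(sum(dx^2 + dy^2) / (2n))."""
    n = len(displacements)
    return math.sqrt(sum(dx * dx + dy * dy for dx, dy in displacements) / (2 * n))
```

For instance, two independent dispersal steps with axial sigmas 3 and 4 combine to an axial sigma of 5, and simulating many displacements at a known sigma and re-estimating it recovers that value closely.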
An HTML widget that randomly tours 2D projections of numerical data. A random walk through projections of the data is shown. The user can manipulate the plot to use specified axes, or turn on Guided Tour mode to find an informative projection of the data. Groups within the data can be hidden or shown, as can particular axes. Points can be brushed, and the selection can be linked to other widgets using crosstalk. The underlying method to produce the random walk and projection pursuit uses Langevin dynamics. The widget can be used from within R, or included in a self-contained R Markdown or Quarto document or presentation, or used in a Shiny app.
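The geometric core of a tour, a random orthonormal 2D frame onto which the data are projected and which then drifts step by step, can be sketched in Python. The drift here is a simple Gaussian jitter followed by Gram-Schmidt re-orthonormalisation, which is an assumed simplification; it is not the Langevin dynamics the widget uses, and all names are hypothetical.

```python
import math
import random


def orthonormal_frame(u, v):
    """Gram-Schmidt: turn two vectors into an orthonormal 2-frame (e1, e2)."""
    nu = math.sqrt(sum(a * a for a in u))
    e1 = [a / nu for a in u]
    dot = sum(a * b for a, b in zip(v, e1))
    w = [b - dot * a for a, b in zip(e1, v)]  # remove the e1 component from v
    nw = math.sqrt(sum(a * a for a in w))
    e2 = [a / nw for a in w]
    return e1, e2


def random_frame(d, rng):
    return orthonormal_frame([rng.gauss(0, 1) for _ in range(d)],
                             [rng.gauss(0, 1) for _ in range(d)])


def jitter(frame, step, rng):
    """One step of a simple random walk on frames: perturb, then re-orthonormalise."""
    e1, e2 = frame
    return orthonormal_frame([a + rng.gauss(0, step) for a in e1],
                             [a + rng.gauss(0, step) for a in e2])


def project(points, frame):
    """Coordinates of each d-dimensional point in the 2D frame."""
    e1, e2 = frame
    return [(sum(p * a for p, a in zip(pt, e1)), sum(p * b for p, b in zip(pt, e2)))
            for pt in points]
```

Calling `jitter` repeatedly with a small step and re-projecting after each call produces the smoothly changing scatterplot that a tour animates.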
R6 classes to model traditional life insurance contracts like annuities, whole life insurances or endowments. Such life insurance contracts provide a guaranteed interest and are not directly linked to the performance of a particular investment vehicle, but they typically provide (discretionary) profit participation. This package provides a framework to model such contracts in a very generic (cash-flow-based) way and includes modelling profit participation schemes, dynamic increases or more general contract layers, as well as contract changes (like sum increases or premium waivers). All relevant quantities like premium decomposition, reserves and benefits over the whole contract period are calculated and potentially exported to Excel. Mortality rates are given using the MortalityTables package.
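A minimal Python sketch of the cash-flow view, assuming annual time steps, a flat guaranteed interest rate i, and annual death probabilities q_x taken from some mortality table. The expected present value is the sum over t of CF_t · v^t · tpx, with v = 1/(1+i) and tpx the probability of surviving t years. Function names are hypothetical; costs, profit participation and contract changes are omitted.

```python
def survival_probs(qx):
    """t-year survival probabilities tpx for t = 0..len(qx), from annual death probabilities."""
    probs = [1.0]
    for q in qx:
        probs.append(probs[-1] * (1 - q))
    return probs


def expected_present_value(cash_flows, qx, interest):
    """EPV of cash flows paid at times t = 0, 1, ... if the insured is alive at time t."""
    v = 1 / (1 + interest)  # annual discount factor
    tpx = survival_probs(qx)
    if len(cash_flows) > len(tpx):
        raise ValueError("need a death probability for every year before the last payment")
    return sum(cf * v ** t * tpx[t] for t, cf in enumerate(cash_flows))
```

With no mortality and no interest, three payments of 1 are worth exactly 3. A payment of 100 due in one year with a 10% chance of death in that year, and no interest, is worth 90. Reserves follow from the same machinery by discounting only the remaining cash flows.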
Labels are a common construct in statistical software, providing a human-readable description of a variable. While variable names are succinct, quick to type, and follow a language's naming conventions, labels may be more illustrative and may use plain text and spaces. R does not provide native support for labels. Some packages, however, have made this feature available. Most notably, the Hmisc package provides labelling methods for a number of different objects. Due to design decisions, these methods are not all exported, and so are unavailable for use in package development. The labelVector package supports labels for atomic vectors in a lightweight design that is suitable for use in other packages.
This package provides a test of multivariate normality of an unknown sample that does not require estimation of the nuisance parameters (the mean and covariance matrix). Rather, a sequence of transformations removes these nuisance parameters and results in a set of sample matrices that are positive definite. These matrices are uniformly distributed on the space of positive definite matrices in the unit hyper-rectangle if and only if the original data are multivariate normal (Fairweather, 1973, Doctoral dissertation, University of Washington). The package performs a goodness-of-fit test of this hypothesis. In addition to the test, functions in the package give visualizations of the support region of positive definite matrices for bivariate samples.