This package provides functions to support compatibility between Maelstrom R packages and the Opal environment. Opal is the OBiBa core database application for biobanks. It is used to build data repositories that integrate data collected from multiple sources. Opal Maelstrom is a specific implementation of this software. This Opal client is specifically designed to interact with Opal Maelstrom distributions to perform operations on the R server side. The user must have adequate credentials. Please see <https://opaldoc.obiba.org/> for complete documentation.
Allows the user to perform ANOVA tests (in a strict sense: a continuous, normally distributed Y variable and one or more factorial/categorical X variables). The type of sum of squares (1, 2 or 3), the types of variables (fixed or random) and their relationships (crossed or nested) can all be specified with the sole function of the package, FullyParamANOVA(). The resulting outputs are the same as in SAS software. A dataset (Butterfly) for testing the function is also included.
An implementation of split-population duration regression models. Unlike regular duration models, split-population duration models are mixture models that accommodate the presence of a sub-population that is not at risk for failure, e.g. cancer patients who have been cured by treatment. This package implements Weibull and Loglogistic forms for the duration component, and focuses on data with time-varying covariates. These models were originally formulated in Boag (1949) and Berkson and Gage (1952), and extended in Schmidt and Witte (1989).
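The mixture structure described above can be sketched as a population survivor function in which a share of subjects never fails and the rest follow a Weibull or log-logistic duration. This is a minimal sketch, not the package's code: the parameterisations S(t) = exp(-(lam*t)^p) and S(t) = 1/(1 + (lam*t)^p), and the names `at_risk` and `split_population_survival`, are assumptions chosen for illustration.

```python
import math


def weibull_survival(t: float, lam: float, p: float) -> float:
    """Weibull survivor function, assumed form S(t) = exp(-(lam * t) ** p)."""
    return math.exp(-((lam * t) ** p))


def loglogistic_survival(t: float, lam: float, p: float) -> float:
    """Log-logistic survivor function, assumed form S(t) = 1 / (1 + (lam * t) ** p)."""
    return 1.0 / (1.0 + (lam * t) ** p)


def split_population_survival(t: float, at_risk: float, base_survival) -> float:
    """Population survival: a (1 - at_risk) share never fails, the rest follows base_survival(t)."""
    if not 0.0 <= at_risk <= 1.0:
        raise ValueError("at_risk must be a probability")
    return (1.0 - at_risk) + at_risk * base_survival(t)


# Example: 70% of subjects are at risk of failure, with a Weibull duration component.
s = split_population_survival(2.0, 0.7, lambda t: weibull_survival(t, lam=0.5, p=1.2))
```

As t grows, the population survival levels off at the cured share (1 - at_risk) instead of falling to zero, which is what distinguishes these models from regular duration models. The sketch ignores covariates; in a regression version, `at_risk` and `lam` would each be linked to (possibly time-varying) covariates.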
Acquire hourly meteorological data from stations located all over the world. There is a wealth of data available, with historic weather data accessible from nearly 30,000 stations. The available data is automatically downloaded from a data repository and processed into a tibble for the exact range of years requested. A relative humidity approximation is provided using the August-Roche-Magnus formula, which was adapted from Alduchov and Eskridge (1996) <doi:10.1175/1520-0450(1996)035<0601:IMFAOS>2.0.CO;2>.
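The August-Roche-Magnus approximation mentioned above can be sketched as the ratio of the vapour pressure at the dew point to the saturation vapour pressure at the air temperature. This assumes the coefficients a = 17.625 and b = 243.04 degrees Celsius recommended by Alduchov and Eskridge (1996); how the package itself handles edge cases (for example, values above 100%) is not stated and is not reproduced here.

```python
import math

# Magnus coefficients recommended by Alduchov and Eskridge (1996); temperatures in degrees Celsius.
A = 17.625
B = 243.04


def relative_humidity(temp_c: float, dew_point_c: float) -> float:
    """Approximate relative humidity (%) from air temperature and dew point, both in deg C."""
    # The saturation vapour pressure prefactor (about 6.1094 hPa) cancels in the ratio e(Td) / es(T).
    return 100.0 * math.exp(A * dew_point_c / (B + dew_point_c)) / math.exp(A * temp_c / (B + temp_c))
```

When the dew point equals the air temperature the air is saturated and the function returns 100%; an air temperature of 20 °C with a dew point of 10 °C gives roughly 52.5%.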
Provides a range of functions with multiple criteria for cutting phylogenetic trees at any evolutionary depth. It enables users to cut trees in either orientation, rootwardly (from root to tips) or tipwardly (from tips to root), or to define a specific time interval of interest. It can also be used to create multiple tree pieces of equal temporal width. Moreover, it allows the assessment of novel temporal rates for various phylogenetic indexes, which can be quickly displayed graphically.
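The idea of cutting a tree over a time interval can be sketched on a tree stored as a list of branches, each with start and end times measured from the root. This is a language-neutral illustration under that assumed representation, not the package's R interface; the names `cut_interval` and `equal_width_pieces` are hypothetical.

```python
import math
from typing import List, Tuple

# (parent, child, start_time, end_time); times measured from the root (root = 0).
Edge = Tuple[str, str, float, float]


def cut_interval(edges: List[Edge], t_start: float, t_end: float) -> List[Edge]:
    """Keep the parts of branches that fall inside [t_start, t_end], clipping their times."""
    if t_start >= t_end:
        raise ValueError("t_start must be smaller than t_end")
    piece = []
    for parent, child, start, end in edges:
        # A branch overlaps the slice if it starts before the slice ends and ends after it starts.
        if start < t_end and end > t_start:
            piece.append((parent, child, max(start, t_start), min(end, t_end)))
    return piece


def equal_width_pieces(edges: List[Edge], width: float) -> List[List[Edge]]:
    """Split the tree into consecutive slices of equal temporal width, starting at the root."""
    depth = max(end for _, _, _, end in edges)
    n_pieces = math.ceil(depth / width)
    return [cut_interval(edges, k * width, (k + 1) * width) for k in range(n_pieces)]
```

In this representation a rootward cut at depth t is `cut_interval(edges, 0, t)`, and for an ultrametric tree of total depth D a tipward cut reaching back d time units from the tips is `cut_interval(edges, D - d, D)`.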
This package provides a new batch effect correction method based on Projection to Latent Structures Discriminant Analysis named “PLSDA-batch” to correct data prior to any downstream analysis. PLSDA-batch estimates latent components related to treatment and batch effects to remove batch variation. The method is multivariate, non-parametric and performs dimension reduction. Combined with centered log ratio transformation for addressing uneven library sizes and compositional structure, PLSDA-batch addresses all characteristics of microbiome data that existing correction methods have ignored so far.
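The centered log ratio (CLR) transformation mentioned above replaces each count by its log minus the mean log of its sample. The sketch below shows only this preprocessing step, not PLSDA-batch itself; the pseudocount used to avoid log(0) is an assumption, since the description does not say how zeros are handled.

```python
import math
from typing import List, Sequence


def clr(counts: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """Centered log-ratio transform of one sample: log(x_j) minus the mean of the logs."""
    # A pseudocount is added because log(0) is undefined; its size is an analysis choice.
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]
```

Without a pseudocount, multiplying every count in a sample by the same library-size factor leaves the CLR values unchanged, which is why the transformation helps with uneven library sizes; the transformed values of each sample also sum to zero.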
The Racket CS implementation, which uses "Chez Scheme" as its core compiler and runtime system, has been the default Racket VM implementation since Racket 8.0. It performs better than the Racket BC implementation for most programs. On systems for which Racket CS cannot generate machine code, this package uses a variant of its "portable bytecode" backend specialized for word size and endianness.
Using the Racket VM packages directly is not recommended: instead, install the racket-minimal or racket packages.
Genomic selection is a specialized form of marker-assisted selection. The package contains functions to select important genetic markers and to predict phenotypes from fitted training data using an integrated model framework (Guha Majumdar et al. (2019) <doi:10.1089/cmb.2019.0223>) that combines one additive model (sparse additive models by Ravikumar et al. (2009) <doi:10.1111/j.1467-9868.2009.00718.x>) and one non-additive model (HSIC Lasso by Yamada et al. (2014) <doi:10.1162/NECO_a_00537>).
Function to identify haplotypes within QTL (Quantitative Trait Loci). A haplotype is a combination of SNPs (Single Nucleotide Polymorphisms) within the QTL. This function groups together all individuals of a population that share the same haplotype. Each group contains individuals with the same allele at each SNP, whether or not data are missing. Thus, haplotyper groups individuals that, once imputed, have a non-zero probability of carrying the same alleles across the entire sequence of SNPs. Moreover, haplotyper calculates this probability from relative frequencies.
This package provides functions for Bayesian analysis of data from randomized experiments with non-compliance. The functions are based on the models described in Imbens and Rubin (1997) <doi:10.1214/aos/1034276631>. Currently only two types of outcome models are supported: binary outcomes and normally distributed outcomes. Models can be fit with and without the exclusion restriction and/or the strong access monotonicity assumption. Models are fit using the data augmentation algorithm as described in Tanner and Wong (1987) <doi:10.2307/2289457>.
This package provides functions complementary to the packages nicheROVER and SIBER, allowing the user to extract Bayesian estimates from data objects created by nicheROVER and SIBER. Please see the following publications for detailed methods on nicheROVER and SIBER: Hansen et al. (2015) <doi:10.1890/14-0235.1>, Jackson et al. (2011) <doi:10.1111/j.1365-2656.2011.01806.x>, and Layman et al. (2007) <doi:10.1890/0012-9658(2007)88[42:CSIRPF]2.0.CO;2>.
Proteins reside either in the cell plasma or in the cell membrane. A membrane protein spans the membrane at least once. Given the amino acid sequence of a membrane protein, the tool PureseqTM (<https://github.com/PureseqTM/pureseqTM_package>, as described in "Efficient And Accurate Prediction Of Transmembrane Topology From Amino acid sequence only.", Wang, Qing, et al. (2019), <doi:10.1101/627307>) can predict its topology. This package allows one to use PureseqTM from R.
Dynamic interaction refers to spatial-temporal associations in the movements of two (or more) animals. This package provides tools for calculating a suite of indices used for quantifying dynamic interaction with wildlife telemetry data. For more information on each of the methods employed, see the references within. Since version 0.3, the package also provides tools for automating contact analysis in large tracking datasets. Since version 1.0, it uses the move2 class of objects for working with tracking datasets.
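As an illustration of what a dynamic interaction index measures, the sketch below computes a proximity-style index: the proportion of simultaneous fixes at which two animals are within a distance threshold of each other. Pairing fixes by identical timestamps and the function name `proximity_index` are simplifying assumptions; the package's own indices and time-matching rules may differ.

```python
import math
from typing import List, Tuple

Fix = Tuple[float, float, float]  # (timestamp, x, y) in a projected coordinate system


def proximity_index(track_a: List[Fix], track_b: List[Fix], delta: float) -> float:
    """Share of simultaneous fixes at which the two animals are within `delta` of each other."""
    positions_b = {t: (x, y) for t, x, y in track_b}
    # Pair fixes by identical timestamps; real data usually needs a time tolerance instead.
    pairs = [((x, y), positions_b[t]) for t, x, y in track_a if t in positions_b]
    if not pairs:
        raise ValueError("the tracks share no simultaneous fixes")
    close = sum(1 for (xa, ya), (xb, yb) in pairs if math.hypot(xa - xb, ya - yb) <= delta)
    return close / len(pairs)
```

Only fixes present in both tracks contribute, so a fix recorded for just one animal is ignored rather than counted as "apart".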
This package provides Bayesian PCA, Probabilistic PCA, Nipals PCA, Inverse Non-Linear PCA and the conventional SVD PCA. A cluster based method for missing value estimation is included for comparison. BPCA, PPCA and NipalsPCA may be used to perform PCA on incomplete data as well as for accurate missing value estimation. A set of methods for printing and plotting the results is also provided. All PCA methods make use of the same data structure (pcaRes) to provide a common interface to the PCA results.
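To show how NIPALS PCA differs from the conventional SVD route, here is a textbook NIPALS sketch on complete data using NumPy: it extracts one component at a time by alternating regressions and then deflates the data matrix. It is not the package's implementation and omits the missing-value handling (skipping absent entries in each regression) that lets NIPALS work on incomplete data.

```python
import numpy as np


def nipals_pca(X, n_components, tol=1e-12, max_iter=1000):
    """Textbook NIPALS PCA on complete data; returns (scores, loadings)."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # column-centre, as conventional PCA does
    scores, loadings = [], []
    for _ in range(n_components):
        # Start the score vector from the column with the largest remaining variance.
        t = X[:, np.argmax(X.var(axis=0))].copy()
        for _ in range(max_iter):
            p = X.T @ t / (t @ t)    # regress columns on the score -> loading
            p /= np.linalg.norm(p)   # normalise the loading to unit length
            t_new = X @ p            # project rows on the loading -> new score
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        X = X - np.outer(t, p)       # deflate: remove the fitted component
        scores.append(t)
        loadings.append(p)
    return np.column_stack(scores), np.column_stack(loadings)
```

On complete data the loadings agree, up to sign, with the right singular vectors returned by `np.linalg.svd` of the centred matrix.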
GNU Emacs is an extensible and highly customizable text editor. It is based on an Emacs Lisp interpreter with extensions for text editing. Emacs has been extended in essentially all areas of computing, giving rise to a vast array of packages supporting, e.g., email, IRC and XMPP messaging, spreadsheets, remote server editing, and much more. Emacs includes extensive documentation on all aspects of the system, from basic editing to writing large Lisp programs. It has full Unicode support for nearly all human languages.
Provides two functions to help with calculating feature selection stability. Lump is a function that groups subset vectors into a data frame, padding shorter vectors with NA so that they all have the same length. ASM is a function that takes a data frame of subset vectors and the original vector of features as inputs, and calculates the stability of the feature selection. ASM uses the Adjusted Stability Measure proposed in Lustgarten, Gopalakrishnan, & Visweswaran (2009) <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815476/>.
Algorithms to build set partitions and commutator matrices and their use in the construction of multivariate d-Hermite polynomials; estimation and derivation of theoretical vector moments and vector cumulants of multivariate distributions; conversion formulae for multivariate moments and cumulants. Applications to estimation and derivation of multivariate measures of skewness and kurtosis; estimation and derivation of asymptotic covariances for d-variate Hermite polynomials, multivariate moments and cumulants and measures of skewness and kurtosis. The formulae implemented are discussed in Terdik (2021, ISBN:9783030813925), "Multivariate Statistical Methods".
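The link between set partitions and moment-cumulant conversion can be seen in the scalar case, where the n-th cumulant is a sum over all partitions pi of {1, ..., n}: kappa_n = sum over pi of (-1)^(|pi|-1) (|pi|-1)! times the product over blocks B of mu_|B|. The sketch below implements only this univariate special case with raw moments; the package's vector moments, commutator matrices and Kronecker-product forms are not reproduced.

```python
from math import factorial, prod
from typing import Iterator, List, Sequence


def set_partitions(items: List[int]) -> Iterator[List[List[int]]]:
    """Yield every partition of `items` into non-empty blocks (Bell-number many)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Insert `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + partition


def cumulant_from_moments(n: int, moments: Sequence[float]) -> float:
    """n-th cumulant from raw moments, where moments[k] = E[X**k] and moments[0] = 1."""
    total = 0.0
    for partition in set_partitions(list(range(n))):
        b = len(partition)
        weight = (-1) ** (b - 1) * factorial(b - 1)
        total += weight * prod(moments[len(block)] for block in partition)
    return total
```

For the standard normal distribution (raw moments 1, 0, 1, 0, 3) this gives kappa_2 = 1 and kappa_4 = 0, and for the unit exponential distribution (raw moments k!) it gives kappa_n = (n-1)!.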
Spatial versions of Regression Discontinuity Designs (RDDs) are becoming increasingly popular as tools for causal inference. However, conducting state-of-the-art analyses often involves tedious and time-consuming steps. This package offers comprehensive functionality for executing all required spatial and econometric tasks in a streamlined manner. Moreover, it equips researchers with tools for performing essential placebo and balancing checks. Because researchers do not have to rely on the APIs of external GIS software, analyses are easier to replicate, which raises the standard for spatial RDDs.
World Flora Online is an online flora of all known plants, available from <https://www.worldfloraonline.org/>. Methods are provided for matching a list of plant names (scientific names, taxonomic names, botanical names) against a static copy of the World Flora Online Taxonomic Backbone data, which can be downloaded from the World Flora Online website. The World Flora Online Taxonomic Backbone is an updated version of The Plant List (<http://www.theplantlist.org/>), a working list of plant names that has been static since 2013.
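Name matching against a static backbone can be sketched as an exact lookup after normalising case and whitespace, followed by a fuzzy fallback. The sketch uses Python's `difflib` similarity ratio as a stand-in for whatever fuzzy method the package applies; the backbone names in the usage note and the `cutoff` value are illustrative assumptions.

```python
import difflib
from typing import List, Optional, Tuple


def normalise(name: str) -> str:
    """Collapse runs of whitespace and lower-case a name before comparison."""
    return " ".join(name.split()).lower()


def match_name(name: str, backbone: List[str], cutoff: float = 0.85) -> Tuple[Optional[str], str]:
    """Match a plant name against backbone names: exact (case/space-insensitive) first, then fuzzy."""
    lookup = {normalise(n): n for n in backbone}
    query = normalise(name)
    if query in lookup:
        return lookup[query], "exact"
    # Fuzzy fallback: difflib ranks candidates by SequenceMatcher similarity ratio.
    close = difflib.get_close_matches(query, list(lookup), n=1, cutoff=cutoff)
    if close:
        return lookup[close[0]], "fuzzy"
    return None, "none"
```

With a backbone containing "Faidherbia albida", the misspelling "Faiderbia albida" is returned as a fuzzy match, while a name with no close counterpart returns `(None, "none")`, so it can be flagged for manual checking.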
scMultiSim simulates paired single cell RNA-seq, single cell ATAC-seq and RNA velocity data, while incorporating mechanisms of gene regulatory networks, chromatin accessibility and cell-cell interactions. It allows users to tune various parameters controlling the amount of each biological factor, the variation of gene-expression levels, the influence of chromatin accessibility on RNA-seq data, and so on. It can be used to benchmark computational methods for single cell multi-omics data and to assist in the design of wet-lab experiments.
The arrangement of hypotheses in a hierarchical structure appears in many research fields and often indicates different resolutions at which data can be viewed. This raises the question of at which resolution level the signal is best interpreted. treeclimbR provides a flexible method to select optimal resolution levels (potentially different levels in different parts of the tree), rather than cutting the tree at an arbitrary level. treeclimbR uses a tuning parameter to generate candidate resolutions and then selects the optimal one from these candidates.
BayesPrism includes deconvolution and embedding learning modules. The deconvolution module builds a prior from cell type-specific expression profiles obtained by scRNA-seq, and uses it to jointly estimate the posterior distribution of cell type composition and cell type-specific gene expression from bulk RNA-seq expression of tumor samples. The embedding learning module uses expectation-maximization (EM) to approximate the tumor expression as a linear combination of malignant gene programs, while conditioning on the expression and fractions of non-malignant cells inferred by the deconvolution module.
This package provides a pipeline with high specificity and sensitivity for extracting proteins from the RefSeq database (National Center for Biotechnology Information). Manual identification of gene families is highly time-consuming and laborious, requiring an iterative process of manual and computational analysis to identify members of a given family. The pipeline implements an automatic approach for identifying gene families based on the conserved domains that specifically define each family. See Die et al. (2018) <doi:10.1101/436659> for more information and examples.
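The domain-based selection step can be sketched as a set test: a protein is kept as a family member only if its annotated domains include every domain that defines the family. The sketch assumes domain annotations have already been obtained (the annotation step is not shown), and the accessions and domain labels in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def select_family_members(protein_domains: Dict[str, Set[str]], defining_domains: Iterable[str]) -> List[str]:
    """Return accessions whose annotated domains include every domain that defines the family."""
    required = set(defining_domains)
    # `required <= domains` is true when all required domains are present in the annotation.
    return sorted(acc for acc, domains in protein_domains.items() if required <= domains)
```

For example, with hypothetical proteins annotated as {"NB-ARC", "LRR"}, {"NB-ARC"} and {"TIR", "NB-ARC", "LRR"}, requiring both "NB-ARC" and "LRR" keeps the first and third proteins; extra domains do not exclude a protein.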
This package performs Modal Clustering (MAC), including Hierarchical Modal Clustering (HMAC), along with a parallel implementation (PHMAC) that runs over several processors. These model-based non-parametric clustering techniques can extract clusters in very high dimensions with arbitrary density shapes. By default, clustering is performed over several resolutions and the results are summarised as a hierarchical tree. Associated plot functions are also provided. A package vignette provides many examples. This version adheres to the CRAN policy of not spawning more than two child processes by default.
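The core idea of modal clustering can be sketched with a Gaussian kernel density estimate: every point climbs to a local mode by the modal EM fixed-point update (which, for equal-weight Gaussian kernels, coincides with the mean-shift update), and points reaching the same mode form a cluster. This is a minimal serial sketch, not the package's code; `sigma` plays the role of the resolution, and repeating the procedure with increasing `sigma` merges modes into the kind of hierarchy described above.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ascend_to_mode(x: Point, data: List[Point], sigma: float, tol: float = 1e-9, max_iter: int = 1000) -> Point:
    """Fixed-point (modal EM / mean-shift) ascent of a Gaussian kernel density estimate."""
    for _ in range(max_iter):
        # E-step: weight of each kernel centre given the current point.
        w = [math.exp(-_sq_dist(x, xi) / (2.0 * sigma ** 2)) for xi in data]
        total = sum(w)
        # M-step: move to the weighted mean of the data.
        new_x = tuple(sum(wi * xi[d] for wi, xi in zip(w, data)) / total for d in range(len(x)))
        if _sq_dist(new_x, x) < tol ** 2:
            return new_x
        x = new_x
    return x


def modal_clusters(data: List[Point], sigma: float, merge_tol: float = 1e-3):
    """Label each point by the density mode it climbs to; returns (labels, modes)."""
    modes: List[Point] = []
    labels: List[int] = []
    for xi in data:
        m = ascend_to_mode(xi, data, sigma)
        for k, mk in enumerate(modes):
            if _sq_dist(m, mk) < merge_tol ** 2:
                labels.append(k)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return labels, modes
```

With two tight, well-separated groups of points, a small `sigma` yields two modes, while a `sigma` much larger than the gap yields a single mode. The number of clusters is never fixed in advance; it follows from the density at the chosen resolution.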