This package provides methods to calculate diagnostics for multicollinearity among predictors in a linear or generalized linear model. It also provides methods to visualize those diagnostics following Friendly & Kwan (2009), "Where's Waldo: Visualizing Collinearity Diagnostics", <doi:10.1198/tast.2009.0012>. These include better tabular presentation of collinearity diagnostics that highlights the important numbers, a semi-graphic tableplot of the diagnostics to make warning and danger levels more salient, and a "collinearity biplot" of the smallest dimensions of predictor space, where collinearity is most apparent.
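As a rough illustration of one such diagnostic, the sketch below computes variance inflation factors (VIFs) as the diagonal of the inverse correlation matrix of the predictors. It is written in plain Python rather than R, is not the package's implementation, and all function names and the toy data are assumptions made for this example.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length predictor columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    p = len(columns)
    return [[sum(a * b for a, b in zip(scaled[i], scaled[j]))
             for j in range(p)] for i in range(p)]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def variance_inflation_factors(columns):
    """VIF_j equals the j-th diagonal entry of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Toy data (hypothetical): x2 is nearly a copy of x1, x3 is roughly unrelated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
x3 = [2.0, -1.0, 0.5, 3.0, -2.0, 1.0]
vifs = variance_inflation_factors([x1, x2, x3])
```

In this toy data the first two predictors get very large VIFs while the third stays close to 1, which is the kind of contrast the tabular and graphical displays are meant to make visible.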
The package provides functions to create and use transcript-centric annotation databases/packages. The annotations for the databases are fetched directly from Ensembl using their Perl API. The functionality and data are similar to those of the TxDb packages from the GenomicFeatures package, but, in addition to retrieving all gene/transcript models and annotations from the database, the ensembldb package also provides a filter framework that allows retrieving annotations for specific entries, such as genes encoded on a chromosome region or transcript models of lincRNA genes.
This package provides statistical tests for label-free LC-MS/MS data by spectral counts, to discover differentially expressed proteins between two biological conditions. Three tests are available: Poisson GLM regression, quasi-likelihood GLM regression, and the negative binomial of the edgeR package. The three models admit blocking factors to control for nuisance variables. To ensure a good level of reproducibility, a post-test filter is available, in which one may set the minimum effect size considered biologically relevant and the minimum expression of the most abundant condition.
Implementation of the BRIk, FABRIk and FDEBRIk algorithms to initialise k-means. These methods are intended for the clustering of multivariate and functional data, respectively. They make use of the Modified Band Depth and the bootstrap to identify appropriate initial seeds for k-means, which are proven to be better options than many techniques in the literature; see Torrente and Romo (2021) <doi:10.1007/s00357-020-09372-3>. The package makes use of the functions kma and kma.similarity from the archived package fdakma, by Alice Parodi et al.
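To give a sense of the depth notion involved, here is a minimal sketch of the Modified Band Depth with bands formed by pairs of curves, written in plain Python. It only illustrates the depth computation; the bootstrap and seed-selection steps of the algorithms are not reproduced, and the function name and toy curves are assumptions of this example.

```python
from itertools import combinations

def modified_band_depth(curves):
    """For each curve, average over all pairs of sample curves the fraction of
    time points at which the curve lies inside the band spanned by the pair."""
    n_points = len(curves[0])
    pairs = list(combinations(range(len(curves)), 2))
    depths = []
    for x in curves:
        total = 0.0
        for i, j in pairs:
            inside = sum(
                1 for k in range(n_points)
                if min(curves[i][k], curves[j][k]) <= x[k] <= max(curves[i][k], curves[j][k])
            )
            total += inside / n_points
        depths.append(total / len(pairs))
    return depths

# Three flat toy curves (hypothetical): the middle one is the deepest.
curves = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
depths = modified_band_depth(curves)
```

Here the middle curve lies inside every band and gets depth 1, while each outer curve lies inside only the two bands it helps to form and gets depth 2/3. Deeper observations are more central, which is what makes depth useful for choosing representative seeds.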
This package provides various tools for clustering multivariate angular data on the torus. It provides angular adaptations of usual clustering methods such as k-means clustering, computes pairwise angular distances that can be used as input for distance-based clustering algorithms, and implements clustering based on the conformal prediction framework. Options for the conformal scores include scores based on a kernel density estimate, multivariate von Mises mixtures, and naive k-means clusters. Moreover, the package provides some basic data handling tools for angular data.
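The idea of a pairwise angular distance can be sketched as follows in plain Python. The choice of an L2 combination of per-coordinate angular differences is an assumption of this example; the package's own distance definitions may differ, and the function names are hypothetical.

```python
import math

def angular_difference(a, b):
    """Smallest absolute difference between two angles (radians), in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def torus_distance(p, q):
    """L2 combination of the per-coordinate angular differences of two points."""
    return math.sqrt(sum(angular_difference(a, b) ** 2 for a, b in zip(p, q)))

def pairwise_distances(points):
    """Symmetric matrix of torus distances, usable by distance-based clustering."""
    n = len(points)
    return [[torus_distance(points[i], points[j]) for j in range(n)] for i in range(n)]

# Two points that look far apart in raw coordinates but are close on the torus.
points = [(0.1, 0.1), (2 * math.pi - 0.1, 0.1), (math.pi, math.pi)]
dist = pairwise_distances(points)
```

The first two points differ by about 0.2 radians once wrap-around is taken into account, whereas a naive Euclidean distance on the raw angles would place them almost a full turn apart.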
This package provides functions to help with analysis of longitudinal data featuring irregular observation times, where the observation times may be associated with the outcome process. There are functions to quantify the degree of irregularity, fit inverse-intensity weighted Generalized Estimating Equations (Lin H, Scharfstein DO, Rosenheck RA (2004) <doi:10.1111/j.1467-9868.2004.b5543.x>), perform multiple outputation (Pullenayegum EM (2016) <doi:10.1002/sim.6829>) and fit semi-parametric joint models (Liang Y (2009) <doi:10.1111/j.1541-0420.2008.01104.x>).
This package provides an R interface to Julia, which is a high-level, high-performance dynamic programming language for numerical computing; see <https://julialang.org/> for more information. It provides a high-level interface as well as a low-level interface. Using the high-level interface, you can call any Julia function just like any R function, with automatic type conversion. Using the low-level interface, you can deal with C-level SEXP directly while enjoying the convenience of using a high-level programming language like Julia.
A client for the NASA POWER global meteorology, surface solar energy and climatology data API. POWER (Prediction Of Worldwide Energy Resources) data are freely available for download with varying spatial resolutions dependent on the original data and with several temporal resolutions depending on the POWER parameter and community. This work is funded through the NASA Earth Science Directorate Applied Science Program. For more on the data themselves, the methodologies used in creating them, a web-based data viewer and web access, please see <https://power.larc.nasa.gov/>.
Classification-based analysis of DNA sequences to taxonomic groupings. This package primarily implements the Naive Bayesian Classifier from the Ribosomal Database Project. This approach has traditionally been used to classify 16S rRNA gene sequences to bacterial taxonomic outlines; however, it can be used for any type of gene sequence. The method was originally described by Wang, Garrity, Tiedje, and Cole in Applied and Environmental Microbiology 73(16):5261-7 <doi:10.1128/AEM.00062-07>. The package also provides functions to read in FASTA-formatted sequence data.
This package implements the routines and algorithms developed and analysed in "Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists", Chan, L., Silverman, B. W., Vincent, K. (2019) <arXiv:1902.05156>. This package explicitly handles situations where there are pairs of lists which have no observed individuals in common. It deals correctly with parameters whose estimated values can be considered as being negative infinity. It also addresses other possible issues of non-existence and non-identifiability of maximum likelihood estimates.
This package provides useful tools for both users and developers of packages for fitting Bayesian models or working with output from Bayesian models. The primary goals of the package are to:
Efficiently convert between many different useful formats of draws (samples) from posterior or prior distributions.
Provide consistent methods for operations commonly performed on draws, for example, subsetting, binding, or mutating draws.
Provide various summaries of draws in convenient formats.
Provide lightweight implementations of state-of-the-art posterior inference diagnostics.
This package provides probability mass, distribution, quantile, random-variate generation, and method-of-moments parameter-estimation functions for the Delaporte distribution with parameterization based on Vose (2008). The Delaporte is a discrete probability distribution which can be considered the convolution of a negative binomial distribution with a Poisson distribution. Alternatively, it can be considered a counting distribution with both Poisson and negative binomial components. It has been studied in actuarial science as a frequency distribution which has more variability than the Poisson, but less than the negative binomial.
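The convolution view can be made concrete with a short plain-Python sketch that builds the probability mass function by summing over the ways a count can split between the two components. The parameter names alpha (gamma shape), beta (gamma scale) and lam (Poisson rate) are assumptions of this example, and the direct summation is chosen for clarity, not speed; it is not the package's algorithm.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of k events with rate lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def negbin_pmf(k, alpha, beta):
    """Negative binomial seen as a gamma(shape=alpha, scale=beta) mixture of Poissons."""
    log_p = (math.lgamma(alpha + k) - math.lgamma(alpha) - math.lgamma(k + 1)
             + k * math.log(beta / (1 + beta)) - alpha * math.log(1 + beta))
    return math.exp(log_p)

def delaporte_pmf(n, alpha, beta, lam):
    """P(N = n) as the convolution of the negative binomial and Poisson parts."""
    return sum(negbin_pmf(i, alpha, beta) * poisson_pmf(n - i, lam)
               for i in range(n + 1))

# Check on illustrative parameters: the mass sums to one and the mean is
# lam + alpha * beta, the sum of the component means.
alpha, beta, lam = 2.0, 1.5, 3.0
probs = [delaporte_pmf(n, alpha, beta, lam) for n in range(150)]
total_mass = sum(probs)
mean = sum(n * p for n, p in enumerate(probs))
```

Because the two components are independent, the mean and variance add: the mean is lam + alpha * beta and the variance is lam + alpha * beta * (1 + beta) under this parameterization.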
This package provides tools for the analysis of reverse-phase protein arrays (RPPAs), which are also known as tissue lysate arrays or simply lysate arrays. The package's primary purpose is to input a set of quantification files representing dilution series of samples and control points taken from scanned RPPA slides, determine a relative log concentration value for each valid dilution series present in each slide, and provide graphical visualization of the input and output data and their relationships. Other optional features include generation of quality control scores for judging the quality of the input data, spatial adjustment of sample points based on controls added to the slides, and various types of normalization of calculated values across a set of slides. The package was derived from a previous package named SuperCurve. For a detailed description of data inputs and outputs, usage information, and a list of related papers describing methods used in the package, please review the vignette Guide_to_RPPASPACE. The package is described in "RPPA SPACE: an R package for normalization and quantitation of Reverse-Phase Protein Array data", Bioinformatics Nov 15;38(22):5131-5133 <doi:10.1093/bioinformatics/btac665>.
Utility functions to download and process data produced by the ALARM Project, including the 2020 redistricting files of Kenny and McCartan (2021) <https://alarm-redist.org/posts/2021-08-10-census-2020/> and the 50-State Redistricting Simulations of McCartan, Kenny, Simko, Garcia, Wang, Wu, Kuriwaki, and Imai (2022) <doi:10.7910/DVN/SLCD3E>. The package extends the data introduced in McCartan, Kenny, Simko, Garcia, Wang, Wu, Kuriwaki, and Imai (2022) <doi:10.1038/s41597-022-01808-2> to also include states with only a single district.
Package created to facilitate the calculation and distribution of the Socio-Material Territorial Index (Índice Socio Material Territorial, ISMT), developed by Observatorio de Ciudades UC. The full methodology is available at "ISMT" (<https://ideocuc-ocuc.hub.arcgis.com/datasets/6ed956450cfc4293b7d90df3ce3474e4/about>) [Observatorio de Ciudades UC (2019)].
The function get_parameters() is intended to be used within a Docker container to automatically read keyword arguments from a .json file. A tool.yaml file contains specifications of these keyword arguments, which are then passed as input to containerized R tools in the tool-runner framework (<https://github.com/hydrocode-de/tool-runner>). A template for a containerized R tool, which can be used as a basis for developing new tools, is available at <https://github.com/VForWaTer/tool_template_r>.
Conducts a cointegration test for high-dimensional vector autoregressions (VARs) of order k based on the large N,T asymptotics of Bykhovskaya and Gorin (2022) <doi:10.48550/arXiv.2202.07150>. The implemented test is a modification of the Johansen likelihood ratio test. In the absence of cointegration the test converges to the partial sum of the Airy-1 point process. This package contains simulated quantiles of the first ten partial sums of the Airy-1 point process that are precise up to the first three digits.
Neural network framework based on Generalized Additive Models from Hastie & Tibshirani (1990, ISBN:9780412343902), which trains a different neural network to estimate the contribution of each feature to the response variable. The networks are trained independently, leveraging the local scoring and backfitting algorithms to ensure that the Generalized Additive Model converges and is additive. The resulting neural network is a highly accurate and interpretable deep learning model, which can be used for high-risk AI practices where decision-making should be based on accountable and interpretable algorithms.
The past decade has demonstrated an increased need to better understand risks leading to systemic crises. This framework offers scholars, practitioners and policymakers a useful toolbox to explore such risks in financial systems. Specifically, this framework provides popular econometric and network measures to monitor systemic risk and to measure the consequences of regulatory decisions. These systemic risk measures are based on the frameworks of Adrian and Brunnermeier (2016) <doi:10.1257/aer.20120555> and Billio, Getmansky, Lo and Pelizzon (2012) <doi:10.1016/j.jfineco.2011.12.010>.
Sparse modeling provides a means of selecting a small number of non-zero effects from a possibly large number of candidate effects. This package includes a suite of methods for sparse modeling: estimation via EM or MCMC, approximate confidence intervals with nominal coverage, and diagnostic and summary plots. The method can implement sparse linear regression and sparse probit regression. Beyond regression analyses, applications include subgroup analysis, particularly for conjoint experiments, and panel data. Future versions will include extensions to models with truncated outcomes, propensity score, and instrumental variable analysis.
This package creates a local Lightning Memory-Mapped Database ('LMDB') of many commonly used taxonomic authorities and provides functions that can quickly query this data. Supported taxonomic authorities include the Integrated Taxonomic Information System ('ITIS'), National Center for Biotechnology Information ('NCBI'), Global Biodiversity Information Facility ('GBIF'), Catalogue of Life ('COL'), and Open Tree Taxonomy ('OTT'). Name and identifier resolution using LMDB can be hundreds of times faster than either relational databases or internet-based queries. Precise data provenance information for data derived from naming providers is also included.
Dunn's test computes stochastic dominance and reports pairwise comparisons. This is done following a Kruskal-Wallis test (Kruskal and Wallis, 1952). It employs Dunn's z-test-statistic approximations for rank statistics, conducting k(k-1)/2 comparisons among the k groups. The null hypothesis assumes that the probability of a randomly selected value from the first group being larger than one from the second group is one half, similar to the Wilcoxon-Mann-Whitney test. Dunn's test serves as a test for median difference and takes into account tied ranks.
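The rank-based z statistics can be sketched in plain Python as below. Each statistic is the difference in mean ranks divided by a standard error that includes the usual correction for ties. The function names are hypothetical, and reporting an unadjusted two-sided normal p-value is a choice of this sketch, not necessarily the package's default.

```python
import math
from itertools import combinations

def midranks(values):
    """Rank values 1..N, giving tied values the average of their ranks.
    Also returns the size of each group of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes

def dunn_z(groups):
    """Return {(i, j): (z, p)} for all k(k-1)/2 pairs of groups."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks, tie_sizes = midranks(pooled)
    mean_ranks, offset = [], 0
    for g in groups:
        mean_ranks.append(sum(ranks[offset:offset + len(g)]) / len(g))
        offset += len(g)
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (12 * (n_total - 1))
    base = n_total * (n_total + 1) / 12 - tie_term
    result = {}
    for i, j in combinations(range(len(groups)), 2):
        se = math.sqrt(base * (1 / len(groups[i]) + 1 / len(groups[j])))
        z = (mean_ranks[i] - mean_ranks[j]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided, unadjusted
        result[(i, j)] = (z, p)
    return result

# Three toy groups (hypothetical data), hence 3 * 2 / 2 = 3 comparisons.
comparisons = dunn_z([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

In practice the p-values from the k(k-1)/2 comparisons would normally be adjusted for multiple testing, which this sketch leaves out.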
Calculates several entropy metrics for spatial data inspired by Boltzmann's entropy formula. It includes metrics introduced by Cushman for landscape mosaics (Cushman (2015) <doi:10.1007/s10980-015-0305-2>), and landscape gradients and point patterns (Cushman (2021) <doi:10.3390/e23121616>); by Zhao and Zhang for landscape mosaics (Zhao and Zhang (2019) <doi:10.1007/s10980-019-00876-x>); and by Gao et al. for landscape gradients (Gao et al. (2018) <doi:10.1111/tgis.12315>; Gao and Li (2019) <doi:10.1007/s10980-019-00854-3>).
Noise in time-series data significantly affects the accuracy of Machine Learning (ML) models (Artificial Neural Network and Support Vector Regression are considered here). Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) decomposes the time-series data into sub-series and helps to improve model performance. The resulting models can achieve higher prediction accuracy than the traditional ML models. Two models are provided here for time-series forecasting. More information may be obtained from Garai and Paul (2023) <doi:10.1016/j.iswa.2023.200202>.