Implementation of the BRIk, FABRIk and FDEBRIk algorithms to initialise k-means. These methods are intended for the clustering of multivariate and functional data, respectively. They make use of the Modified Band Depth and the bootstrap to identify appropriate initial seeds for k-means, which are proven to be better options than many techniques in the literature; see Torrente and Romo (2021) <doi:10.1007/s00357-020-09372-3>. The package makes use of the functions kma and kma.similarity from the archived package fdakma, by Alice Parodi et al.
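As a rough illustration of the Modified Band Depth mentioned above, the following Python sketch computes the depth of each observation by brute force over all pairs (bands of two curves). It is not the package's implementation; the function name `modified_band_depth` and the list-of-lists input are assumptions made for this example, and the bootstrap and seeding steps are not shown.

```python
from itertools import combinations


def modified_band_depth(data):
    """Return the Modified Band Depth (bands of 2 curves) of each row in data.

    data: list of n observations, each a list of d coordinates.
    For every pair of observations, the depth of x accumulates the
    proportion of coordinates where x lies between the pair.
    """
    n = len(data)
    d = len(data[0])
    pairs = list(combinations(range(n), 2))
    depths = []
    for x in data:
        total = 0.0
        for j, k in pairs:
            inside = sum(
                1
                for t in range(d)
                if min(data[j][t], data[k][t]) <= x[t] <= max(data[j][t], data[k][t])
            )
            total += inside / d
        depths.append(total / len(pairs))
    return depths


if __name__ == "__main__":
    sample = [[0, 0], [1, 1], [2, 2]]
    # The middle observation lies inside every band, so it is the deepest.
    print(modified_band_depth(sample))
```

Observations with higher depth are more central, which is why a depth measure is a natural ingredient for choosing representative seeds.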
This package provides various tools for clustering multivariate angular data on the torus. It provides angular adaptations of usual clustering methods such as k-means clustering, computes pairwise angular distances that can be used as input for distance-based clustering algorithms, and implements clustering based on the conformal prediction framework. Options for the conformal scores include scores based on a kernel density estimate, multivariate von Mises mixtures, and naive k-means clusters. Moreover, the package provides some basic data handling tools for angular data.
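To make the idea of pairwise angular distances concrete, here is a minimal Python sketch. It assumes angles in radians and combines the per-coordinate wrapped differences with a Euclidean norm; both choices, and the function names, are assumptions for illustration rather than the package's interface.

```python
import math


def angular_difference(a, b):
    """Shortest arc length between two angles (radians), in [0, pi]."""
    diff = abs(a - b) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff)


def torus_distance(p, q):
    """Distance between two points on the torus (assumed: L2 norm of wrapped differences)."""
    return math.sqrt(sum(angular_difference(a, b) ** 2 for a, b in zip(p, q)))


def pairwise_distances(points):
    """Full symmetric distance matrix, usable by distance-based clustering."""
    return [[torus_distance(p, q) for q in points] for p in points]


if __name__ == "__main__":
    pts = [(0.1, 0.0), (2 * math.pi - 0.1, 0.0), (math.pi, math.pi)]
    for row in pairwise_distances(pts):
        print([round(v, 3) for v in row])
```

The first two points are close on the torus even though their raw coordinates differ by almost 2π, which is the reason a wrapped distance is needed.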
Extract toxicological and chemical information from databases maintained by scientific agencies and resources, including the Comparative Toxicogenomics Database <https://ctdbase.org/>, the Integrated Chemical Environment <https://ice.ntp.niehs.nih.gov/>, the Integrated Risk Information System <https://cfpub.epa.gov/ncea/iris/>, Provisional Peer-Reviewed Toxicity Values <https://www.epa.gov/pprtv/provisional-peer-reviewed-toxicity-values-pprtvs-assessments>, the CompTox Chemicals Dashboard Resource Hub <https://www.epa.gov/comptox-tools/comptox-chemicals-dashboard-resource-hub>, PubChem <https://pubchem.ncbi.nlm.nih.gov/>, and others.
This package provides functions to help with analysis of longitudinal data featuring irregular observation times, where the observation times may be associated with the outcome process. There are functions to quantify the degree of irregularity, fit inverse-intensity weighted Generalized Estimating Equations (Lin H, Scharfstein DO, Rosenheck RA (2004) <doi:10.1111/j.1467-9868.2004.b5543.x>), perform multiple outputation (Pullenayegum EM (2016) <doi:10.1002/sim.6829>) and fit semi-parametric joint models (Liang Y (2009) <doi:10.1111/j.1541-0420.2008.01104.x>).
This package provides an R interface to Julia, a high-level, high-performance dynamic programming language for numerical computing; see <https://julialang.org/> for more information. It provides a high-level interface as well as a low-level interface. Using the high-level interface, you can call any Julia function just like any R function, with automatic type conversion. Using the low-level interface, you can deal with C-level SEXP directly while enjoying the convenience of a high-level programming language like Julia.
An API client for the NASA POWER global meteorology, surface solar energy and climatology data API. POWER (Prediction Of Worldwide Energy Resources) data are freely available for download with varying spatial resolutions dependent on the original data and with several temporal resolutions depending on the POWER parameter and community. This work is funded through the NASA Earth Science Directorate Applied Science Program. For more on the data themselves, the methodologies used in creating them, a web-based data viewer and web access, please see <https://power.larc.nasa.gov/>.
Classification-based analysis of DNA sequences to taxonomic groupings. This package primarily implements the Naive Bayesian Classifier from the Ribosomal Database Project. This approach has traditionally been used to classify 16S rRNA gene sequences to bacterial taxonomic outlines; however, it can be used for any type of gene sequence. The method was originally described by Wang, Garrity, Tiedje, and Cole in Applied and Environmental Microbiology 73(16):5261-7 <doi:10.1128/AEM.00062-07>. The package also provides functions to read in FASTA-formatted sequence data.
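The general idea of a naive Bayesian sequence classifier can be sketched in a few lines of Python. This is a deliberately simplified multinomial model over k-mers with add-one smoothing and a uniform prior over taxa; it is not the Ribosomal Database Project algorithm or this package's code, and the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter


def kmers(seq, k):
    """All overlapping words of length k in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(labelled, k):
    """Count k-mers per taxon from (sequence, taxon) pairs."""
    counts = {}
    for seq, taxon in labelled:
        counts.setdefault(taxon, Counter()).update(kmers(seq, k))
    return counts


def classify(seq, counts, k):
    """Return the taxon with the highest smoothed log-likelihood."""
    vocab = 4 ** k  # number of possible DNA words of length k
    best, best_score = None, -math.inf
    for taxon, c in counts.items():
        total = sum(c.values())
        score = sum(math.log((c[w] + 1) / (total + vocab)) for w in kmers(seq, k))
        if score > best_score:
            best, best_score = taxon, score
    return best


if __name__ == "__main__":
    model = train([("AAAAAAAAAA", "A_rich"), ("ACGTACGTAC", "mixed")], k=3)
    print(classify("AAAAAA", model, k=3))
    print(classify("CGTACG", model, k=3))
```

Each query word contributes independently to the score, which is the "naive" assumption that keeps training and classification cheap.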
This package implements the routines and algorithms developed and analysed in "Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists" by Chan, L., Silverman, B. W., and Vincent, K. (2019) <arXiv:1902.05156>. This package explicitly handles situations where there are pairs of lists which have no observed individuals in common. It deals correctly with parameters whose estimated values can be considered as being negative infinity. It also addresses other possible issues of non-existence and non-identifiability of maximum likelihood estimates.
This package provides multiple water chemistry-based models and published empirical models in one standard format. Functions can be chained together to model a complete treatment process and are designed to work in a tidyverse workflow. Models are primarily based on these sources: Benjamin, M. M. (2002, ISBN:147862308X); Crittenden, J. C., Trussell, R., Hand, D., Howe, J. K., & Tchobanoglous, G., Borchardt, J. H. (2012, ISBN:9781118131473); and USEPA (2001) <https://www.epa.gov/sites/default/files/2017-03/documents/wtp_model_v._2.0_manual_508.pdf>.
This package provides statistical tests for label-free LC-MS/MS data by spectral counts, to discover differentially expressed proteins between two biological conditions. Three tests are available: Poisson GLM regression, quasi-likelihood GLM regression, and the negative binomial of the edgeR package. The three models admit blocking factors to control for nuisance variables. To assure a good level of reproducibility, a post-test filter is available, where we may set the minimum effect size considered biologically relevant and the minimum expression of the most abundant condition.
The package provides functions to create and use transcript-centric annotation databases/packages. The annotations for the databases are fetched directly from Ensembl using their Perl API. The functionality and data are similar to those of the TxDb packages from the GenomicFeatures package, but, in addition to retrieving all gene/transcript models and annotations from the database, the ensembldb package also provides a filter framework that allows retrieving annotations for specific entries, such as genes encoded on a chromosome region or transcript models of lincRNA genes.
Utility functions to download and process data produced by the ALARM Project, including 2020 redistricting files (Kenny and McCartan (2021) <https://alarm-redist.org/posts/2021-08-10-census-2020/>) and the 50-State Redistricting Simulations of McCartan, Kenny, Simko, Garcia, Wang, Wu, Kuriwaki, and Imai (2022) <doi:10.7910/DVN/SLCD3E>. The package extends the data introduced in McCartan, Kenny, Simko, Garcia, Wang, Wu, Kuriwaki, and Imai (2022) <doi:10.1038/s41597-022-01808-2> to also include states with only a single district.
Package created to facilitate the calculation and distribution of the Socio-Material Territorial Index (ISMT), developed by the Observatorio de Ciudades UC. The full methodology is available at "ISMT" (<https://ideocuc-ocuc.hub.arcgis.com/datasets/6ed956450cfc4293b7d90df3ce3474e4/about>) [Observatorio de Ciudades UC (2019)].
The function get_parameters() is intended to be used within a Docker container to read keyword arguments from a .json file automatically. A tool.yaml file contains specifications for these keyword arguments, which are then passed as input to containerized R tools in the [tool-runner framework](<https://github.com/hydrocode-de/tool-runner>). A template for a containerized R tool, which can be used as a basis for developing new tools, is available at <https://github.com/VForWaTer/tool_template_r>.
Conducts a cointegration test for high-dimensional vector autoregressions (VARs) of order k based on the large N,T asymptotics of Bykhovskaya and Gorin (2022) <doi:10.48550/arXiv.2202.07150>. The implemented test is a modification of the Johansen likelihood ratio test. In the absence of cointegration the test converges to the partial sum of the Airy-1 point process. This package contains simulated quantiles of the first ten partial sums of the Airy-1 point process that are precise up to the first three digits.
Neural network framework based on Generalized Additive Models from Hastie & Tibshirani (1990, ISBN:9780412343902), which trains a different neural network to estimate the contribution of each feature to the response variable. The networks are trained independently, leveraging the local scoring and backfitting algorithms to ensure that the Generalized Additive Model converges and is additive. The resulting neural network is a highly accurate and interpretable deep learning model, which can be used for high-risk AI practices where decision-making should be based on accountable and interpretable algorithms.
The past decade has demonstrated an increased need to better understand risks leading to systemic crises. This framework offers scholars, practitioners and policymakers a useful toolbox to explore such risks in financial systems. Specifically, this framework provides popular econometric and network measures to monitor systemic risk and to measure the consequences of regulatory decisions. These systemic risk measures are based on the frameworks of Adrian and Brunnermeier (2016) <doi:10.1257/aer.20120555> and Billio, Getmansky, Lo and Pelizzon (2012) <doi:10.1016/j.jfineco.2011.12.010>.
Sparse modeling provides a means of selecting a small number of non-zero effects from a large possible number of candidate effects. This package includes a suite of methods for sparse modeling: estimation via EM or MCMC, approximate confidence intervals with nominal coverage, and diagnostic and summary plots. The method can implement sparse linear regression and sparse probit regression. Beyond regression analyses, applications include subgroup analysis, particularly for conjoint experiments, and panel data. Future versions will include extensions to models with truncated outcomes, propensity score, and instrumental variable analysis.
This package creates a local Lightning Memory-Mapped Database ('LMDB') of many commonly used taxonomic authorities and provides functions that can quickly query this data. Supported taxonomic authorities include the Integrated Taxonomic Information System ('ITIS'), National Center for Biotechnology Information ('NCBI'), Global Biodiversity Information Facility ('GBIF'), Catalogue of Life ('COL'), and Open Tree Taxonomy ('OTT'). Name and identifier resolution using LMDB can be hundreds of times faster than either relational databases or internet-based queries. Precise data provenance information for data derived from naming providers is also included.
The software formalises a framework for classification and survival model evaluation in R. There are four stages: data transformation, feature selection, model training, and prediction. The requirements of variable types and variable order are fixed, but specialised variables for functions can also be provided. The framework is wrapped in a driver loop that reproducibly carries out a number of cross-validation schemes. Functions for differential mean, differential variability, and differential distribution are included. Additional functions may be developed by the user, by creating an interface to the framework.
DriverNet is a package to predict functionally important driver genes in cancer by integrating genome data (mutation and copy number variation data) and transcriptome data (gene expression data). The different kinds of data are combined by an influence graph, which is a gene-gene interaction network deduced from pathway data. A greedy algorithm is used to find the possible driver genes, which may be mutated in a larger number of patients and whose mutations push the gene expression values of the connected genes to extreme values.
The iNETgrate package provides functions to build a correlation network in which nodes are genes. DNA methylation and gene expression data are integrated to define the connections between genes. This network is used to identify modules (clusters) of genes. The biological information in each of the resulting modules is represented by an eigengene. These biological signatures can be used as features, e.g., for classification of patients into risk categories. The resulting biological signatures are very robust and give a holistic view of the underlying molecular changes.
This package provides probability mass, distribution, quantile, random-variate generation, and method-of-moments parameter-estimation functions for the Delaporte distribution with parameterization based on Vose (2008). The Delaporte is a discrete probability distribution which can be considered the convolution of a negative binomial distribution with a Poisson distribution. Alternatively, it can be considered a counting distribution with both Poisson and negative binomial components. It has been studied in actuarial science as a frequency distribution which has more variability than the Poisson, but less than the negative binomial.
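The convolution view of the Delaporte distribution can be written out directly. The Python sketch below assumes a parameterization with negative binomial shape `alpha` and scale `beta` plus a Poisson rate `lam`; these names and the helper functions are assumptions for this example and do not reproduce the package's functions. Under this parameterization the mean is `lam + alpha * beta` and the variance is `lam + alpha * beta * (1 + beta)`.

```python
import math


def neg_binomial_pmf(k, alpha, beta):
    """P(N = k) for a negative binomial with shape alpha and scale beta."""
    log_p = (
        math.lgamma(alpha + k) - math.lgamma(alpha) - math.lgamma(k + 1)
        + k * math.log(beta / (1 + beta))
        - alpha * math.log(1 + beta)
    )
    return math.exp(log_p)


def poisson_pmf(k, lam):
    """P(N = k) for a Poisson with rate lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def delaporte_pmf(n, alpha, beta, lam):
    """Delaporte mass at n as the convolution of the two components."""
    return sum(
        neg_binomial_pmf(i, alpha, beta) * poisson_pmf(n - i, lam)
        for i in range(n + 1)
    )


if __name__ == "__main__":
    alpha, beta, lam = 2.0, 1.5, 3.0
    probs = [delaporte_pmf(n, alpha, beta, lam) for n in range(200)]
    mean = sum(n * p for n, p in enumerate(probs))
    print(round(sum(probs), 6), round(mean, 6))
```

The direct sum costs O(n) per evaluation, which is fine for illustration; a production implementation would typically use a more careful numerical scheme.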
This package provides useful tools for both users and developers of packages for fitting Bayesian models or working with output from Bayesian models. The primary goals of the package are to: efficiently convert between many different useful formats of draws (samples) from posterior or prior distributions; provide consistent methods for operations commonly performed on draws, for example, subsetting, binding, or mutating draws; provide various summaries of draws in convenient formats; and provide lightweight implementations of state-of-the-art posterior inference diagnostics.
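As an example of what a posterior inference diagnostic computes, the Python sketch below implements the classic Gelman-Rubin potential scale reduction factor (R-hat) for a single parameter sampled in several chains. It is the basic textbook form, not a state-of-the-art variant and not this package's code; the function name and the list-of-chains input are assumptions for illustration.

```python
import math
import statistics


def rhat(chains):
    """Classic potential scale reduction factor for equal-length chains.

    chains: list of chains, each a list of draws of one parameter.
    Values near 1 suggest the chains agree; larger values suggest they do not.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain variance; B: between-chain variance scaled by n.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    var_hat = (n - 1) / n * within + between / n
    return math.sqrt(var_hat / within)


if __name__ == "__main__":
    agreeing = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
    shifted = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]
    print(rhat(agreeing), rhat(shifted))
```

Chains that explore different regions inflate the between-chain term, so the statistic grows well above 1 for the shifted example.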