The ProteinGymR package provides analysis-ready data resources from ProteinGym, generated by Notin et al., 2023. ProteinGym comprises a collection of benchmarks for evaluating the performance of models predicting the effect of point mutations. This package provides access to 1. deep mutational scanning (DMS) scores from 217 assays measuring the impact of all possible amino acid substitutions across 186 proteins, 2. AlphaMissense pathogenicity scores for ~1.6 M substitutions in the ProteinGym DMS data, and 3. five performance metrics for 62 variant prediction models in a zero-shot setting.
Animalcules is an R package for utilizing up-to-date data analytics, visualization methods, and machine learning models to provide users with an easy-to-use interactive microbiome analysis framework. It can be used as a standalone software package or users can explore their data with the accompanying interactive R Shiny application. Traditional microbiome analyses such as alpha/beta diversity and differential abundance analysis are enhanced, while new methods like biomarker identification are introduced by animalcules. Powerful interactive and dynamic figures generated by animalcules enable users to understand their data better and discover new insights.
This variant of the Racket BC (``before Chez'' or ``bytecode'') implementation is not recommended for general use. It uses CGC (a ``Conservative Garbage Collector''), which was succeeded as default in PLT Scheme version 370 (which translates to 3.7 in the current versioning scheme) by the 3M variant, which in turn was succeeded in version 8.0 by the Racket CS implementation.
Racket CGC is primarily used for bootstrapping Racket BC [3M]. It may also be used for embedding applications without the annotations needed in C code to use the 3M garbage collector.
The primary function makeCPMSampler() generates a sampler function which performs the correlated pseudo-marginal method of Deligiannidis, Doucet and Pitt (2017) <arXiv:1511.04992>. If the rho= argument of makeCPMSampler() is set to 0, then the generated sampler function performs the original pseudo-marginal method of Andrieu and Roberts (2009) <DOI:10.1214/07-AOS574>. The sampler function is constructed with the user's choice of prior, parameter proposal distribution, and the likelihood approximation scheme. Note that this algorithm is not automatically tuned--each one of these arguments must be carefully chosen.
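The role of rho can be illustrated with a minimal Python sketch of a correlated pseudo-marginal sampler. This is not the package's implementation or API: the function names, the toy latent-variable model (latent x ~ N(0, 1), y | x ~ N(theta + x, 1)), the N(0, 10^2) prior, and the random-walk proposal are all assumptions made for illustration. The auxiliary normals u drive the likelihood estimate; with rho = 0 they are redrawn independently at every step, which recovers the original pseudo-marginal method.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def log_lik_hat(theta, u, y):
    """Monte Carlo estimate of log p(y | theta), driven by auxiliary normals u.

    Toy model (an assumption): latent x ~ N(0, 1), y_i | x ~ N(theta + x, 1).
    Each row of u supplies the latent draws for one observation.
    """
    total = 0.0
    for y_i, u_i in zip(y, u):
        terms = [log_normal_pdf(y_i, theta + x, 1.0) for x in u_i]
        m = max(terms)  # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(t - m) for t in terms) / len(terms))
    return total


def cpm_sampler(y, n_iter, rho, n_particles=8, step=0.5, seed=0):
    """Hypothetical correlated pseudo-marginal Metropolis-Hastings sampler."""
    rng = random.Random(seed)
    theta = 0.0
    u = [[rng.gauss(0, 1) for _ in range(n_particles)] for _ in y]
    log_post = log_lik_hat(theta, u, y) + log_normal_pdf(theta, 0.0, 10.0)
    scale = math.sqrt(1.0 - rho * rho)
    samples = []
    for _ in range(n_iter):
        # Symmetric random-walk proposal for the parameter.
        theta_new = theta + step * rng.gauss(0, 1)
        # Correlated proposal for the auxiliary variables; it leaves N(0, I)
        # invariant, so it does not appear in the acceptance ratio.
        u_new = [[rho * v + scale * rng.gauss(0, 1) for v in row] for row in u]
        log_post_new = (log_lik_hat(theta_new, u_new, y)
                        + log_normal_pdf(theta_new, 0.0, 10.0))
        accept_prob = math.exp(min(0.0, log_post_new - log_post))
        if rng.random() < accept_prob:
            theta, u, log_post = theta_new, u_new, log_post_new
        samples.append(theta)
    return samples


if __name__ == "__main__":
    draws = cpm_sampler([1.0, 1.2, 0.8], n_iter=500, rho=0.9)
    print(sum(draws) / len(draws))
```

As in the description above, nothing here is tuned automatically: the step size, the number of particles, and rho all have to be chosen by the user.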
This package implements a clustering algorithm similar to K-Means. It has two main advantages: (a) the estimator is resistant to outliers, meaning that its results are still correct when there are atypical values in the sample, and (b) the estimator is efficient; roughly speaking, if there are no outliers in the sample, results will be similar to those obtained by a classic algorithm (K-Means). The clustering procedure is carried out by minimizing the overall robust scale, the so-called tau scale (see Gonzalez, Yohai and Zamar (2019) <arxiv:1906.08198>).
This package implements quantile smoothing. It contains a dataset used to produce human chromosomal ideograms for plotting purposes and a collection of arrays that contain data for chromosome 14 of 3 colorectal tumors. The package provides functions for painting chromosomal icons, chromosomes or chromosomal ideograms, and other types of plots. Quantsmooth offers options like converting chromosomal ids to their numeric form, retrieving the human chromosomal length from NCBI data, retrieving regions of interest in a vector of intensities using quantile smoothing, determining cytoband position based on the location of the probe, and other useful tools.
Computing comorbidity indices and scores such as the weighted Charlson score (Charlson, 1987 <doi:10.1016/0021-9681(87)90171-8>) and the Elixhauser comorbidity score (Elixhauser, 1998 <doi:10.1097/00005650-199801000-00004>) using ICD-9-CM or ICD-10 codes (Quan, 2005 <doi:10.1097/01.mlr.0000182534.19832.83>). Australian and Swedish modifications of the Charlson Comorbidity Index are available as well (Sundararajan, 2004 <doi:10.1016/j.jclinepi.2004.03.012> and Ludvigsson, 2021 <doi:10.2147/CLEP.S282475>), together with different weighting algorithms for both the Charlson and Elixhauser comorbidity scores.
In personalized medicine, one wants to know, for a given patient and his or her outcome for a predictor (pre-treatment variable), how likely it is that a treatment will be more beneficial than an alternative treatment. This package allows for the quantification of the predictive causal association (i.e., the association between the predictor variable and the individual causal effect of the treatment) and related metrics. Part of this software has been developed using funding provided from the European Union's 7th Framework Programme for research, technological development and demonstration under Grant Agreement no 602552.
Create interactive flow maps using the FlowmapBlue TypeScript library <https://github.com/FlowmapBlue/FlowmapBlue>, which is a free tool for representing aggregated numbers of movements between geographic locations as flow maps. It is used to visualize urban mobility, commuting behavior, bus, subway and air travel, bicycle sharing, human and bird migration, refugee flows, freight transportation, trade, supply chains, scientific collaboration, epidemiological and historical data and many other topics. The package allows users either to create standalone flow maps in the form of htmlwidgets and save them in HTML files, or to integrate flow maps into Shiny applications.
After testing for biased treatment assignment in an observational study using an unaffected outcome, the sensitivity analysis is constrained to be compatible with that test. The package uses the optimization software gurobi obtainable from <https://www.gurobi.com/>, together with its associated R package, also called gurobi; see: <https://www.gurobi.com/documentation/7.0/refman/installing_the_r_package.html>. The method is a substantial computational and practical enhancement of a concept introduced in Rosenbaum (1992) Detecting bias with confidence in observational studies Biometrika, 79(2), 367-374 <doi:10.1093/biomet/79.2.367>.
Enables the user to find the country, region, district, city, coordinates, zip code, time zone, ISP, domain name, connection type, area code, weather, Mobile Country Code, Mobile Network Code, mobile brand name, elevation, usage type, address type, IAB category and Autonomous System information that any IP address or hostname originates from. Both IPv4 and IPv6 are supported. Please visit <https://www.ip2location.com> to learn more. You may also want to visit <https://lite.ip2location.com> for a free database download. This package requires the IP2Location Python module. At the terminal, please run pip install IP2Location to install the module.
It provides miscellaneous sequence analysis functions for describing episodes in individual sequences, measuring association between domains in multidimensional sequence analysis (see Piccarreta (2017) <doi:10.1177/0049124115591013>), heat maps of sequence data, Globally Interdependent Multidimensional Sequence Analysis (see Robette et al (2015) <doi:10.1177/0081175015570976>), smoothing sequences for index plots (see Piccarreta (2012) <doi:10.1177/0049124112452394>), coding sequences for Qualitative Harmonic Analysis (see Deville (1982)), measuring stress from multidimensional scaling factors (see Piccarreta and Lior (2010) <doi:10.1111/j.1467-985X.2009.00606.x>), symmetrical (or canonical) Partial Least Squares (see Bry (1996)).
This package provides a wavelet-based LSTM model, a type of neural network architecture that uses the wavelet technique to pre-process the input data before passing it through a Long Short-Term Memory (LSTM) network. The wavelet-based LSTM model is a powerful approach that combines the benefits of wavelet analysis and LSTM networks to improve the accuracy of predictions in various applications. This package has been developed using the algorithms of Anjoy and Paul (2017) <DOI:10.1007/s00521-017-3289-9> and Paul and Garai (2021) <doi:10.1007/s00500-021-06087-4>.
seq.hotSPOT provides a resource for designing effective sequencing panels to help improve mutation capture efficacy for ultradeep sequencing projects. Using SNV datasets, this package designs custom panels for any tissue of interest and identifies the genomic regions likely to contain the most mutations. Establishing efficient targeted sequencing panels can allow researchers to study mutation burden in tissues at high depth without the economic burden of whole-exome or whole-genome sequencing. This tool was developed to make high-depth sequencing panels to study low-frequency clonal mutations in clinically normal and cancerous tissues.
Shows statistics about the bytes contained in a file as a circle graph of deviations from the mean in sigma increments. The function can be useful for statistically analyzing the content of files at a glance: text files are shown as a green centered crown, compressed and encrypted files should be shown as equally distributed variations with a very low CV (sigma/mean), and other types of files can be classified between these two categories depending on their text vs binary content, which can be useful to quickly determine how information is stored inside them (databases, multimedia files, etc).
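The statistics behind such a graph can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the package's code: the function name byte_stats is hypothetical, and it assumes that the quantity being plotted is the count of each of the 256 possible byte values, with sigma taken as the population standard deviation of those counts.

```python
import math
from collections import Counter


def byte_stats(data: bytes):
    """Per-byte-value counts summarized as mean, sigma, CV and z-scores."""
    counts = Counter(data)
    # One count per possible byte value, including values that never occur.
    freqs = [counts.get(b, 0) for b in range(256)]
    mean = sum(freqs) / 256
    sigma = math.sqrt(sum((f - mean) ** 2 for f in freqs) / 256)
    cv = sigma / mean if mean > 0 else float("nan")
    # Deviation of each byte value's count from the mean, in units of sigma.
    deviations = [(f - mean) / sigma if sigma > 0 else 0.0 for f in freqs]
    return {"mean": mean, "sigma": sigma, "cv": cv, "deviations": deviations}


if __name__ == "__main__":
    text_like = b"hello world" * 100
    flat = bytes(range(256)) * 4  # every byte value equally frequent
    print(byte_stats(text_like)["cv"])
    print(byte_stats(flat)["cv"])
```

Plain text concentrates on a small set of byte values and so yields a large CV, whereas data in which all byte values are about equally frequent, as expected for compressed or encrypted files, yields a CV near zero.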
This package implements two algorithms for detecting Bull and Bear markets in stock prices: the algorithm of Pagan and Sossounov (2002, <doi:10.1002/jae.664>) and the algorithm of Lunde and Timmermann (2004, <doi:10.1198/073500104000000136>). The package also contains functions for printing out the dating of the Bull and Bear states of the market, the descriptive statistics of the states, and functions for plotting the results. For the sake of convenience, the package includes the monthly and daily data on the prices (not adjusted for dividends) of the S&P 500 stock market index.
This package provides novel dendroclimatological methods, primarily used by the tree-ring research community. There are four core functions. The first one is daily_response(), which finds the optimal sequence of days that are related to one or more tree-ring proxy records. A similar function is daily_response_seascorr(), which implements partial correlations in the analysis of daily response functions. For the enthusiast of monthly data, there is the monthly_response() function. The last core function is compare_methods(), which effectively compares several linear and nonlinear regression algorithms on the task of climate reconstruction.
An interface for training Fuzzy DBScan with both Fuzzy Core and Fuzzy Border. The package provides a method to initialize and run the algorithm and a function to predict new data with the help of R6. The package is built upon the paper "Fuzzy Extensions of the DBScan algorithm" from Ienco and Bordogna (2018) <doi:10.1007/s00500-016-2435-0>. A predict function assigns new data according to the same criteria as the algorithm itself. However, the prediction function freezes the algorithm to preserve the trained cluster structure and treats each new prediction object individually.
This package provides tools for statistical analysis using partitioning-based least squares regression as described in Cattaneo, Farrell and Feng (2019a, <arXiv:1804.04916>) and Cattaneo, Farrell and Feng (2019b, <arXiv:1906.00202>): lsprobust() for nonparametric point estimation of regression functions and their derivatives and for robust bias-corrected (pointwise and uniform) inference; lspkselect() for data-driven selection of the IMSE-optimal number of knots; lsprobust.plot() for regression plots with robust confidence intervals and confidence bands; lsplincom() for estimation and inference for linear combinations of regression functions from different groups.
This package performs hybrid multiple testing that incorporates method selection and assumption evaluations into the analysis using EBP estimates obtained by Grenander density estimation. For instance, for a 3-group comparison analysis, hybrid multiple testing considers EBPs as weighted EBPs between the F-test and the H-test, with EBPs from the Shapiro-Wilk test of normality as the weight. Instead of using EBPs from the F-test only or from the H-test only, this methodology combines both types of EBPs through EBPs from the Shapiro-Wilk test of normality. This methodology then uses the law of total EBPs.
Allows the construction of selection indices based on estimated breeding values in animal and plant breeding and the calculation of several analytic measures to assess their impact on genetic and phenotypic progress. The methodology thereby allows analyzing the genetic gain of traits in the breeding goal which are not part of the actual index and automatically computes several analytic measures. It further allows retrospectively deriving realized economic weights from observed genetic trends. The framework is described in Simianer, H., Heise, J., Rensing, S., Pook, T., Geibel, J. and Reimer, C. (2023) <doi:10.1186/s12711-023-00807-0>.
This package provides functions for cost-optimal control charts with a focus on health care applications. Compared to assumptions in traditional control chart theory, here, we allow random shift sizes, random repair and random sampling times. The package focuses on X-bar charts with a sample size of 1 (representing the monitoring of a single patient at a time). The methods are described in Zempleni et al. (2004) <doi:10.1002/asmb.521>, Dobi and Zempleni (2019) <doi:10.1002/qre.2518> and Dobi and Zempleni (2019) <http://ac.inf.elte.hu/Vol_049_2019/129_49.pdf>.
Computes A-, MV-, D- and E-optimal or near-optimal row-column designs for two-colour cDNA microarray experiments using the linear fixed effects and mixed effects models where the interest is in a comparison of all pairwise treatment contrasts. The algorithms used in this package are based on the array exchange and treatment exchange algorithms adapted from Debusho, Gemechu and Haines (2016, unpublished) after adjusting for the row-column design setup. The package also provides an optional graphical user interface (GUI) based on the R package tcltk to ensure that it is user friendly.
Computational infrastructure for biogeography, community ecology, and biodiversity conservation (Daru et al. 2020) <doi:10.1111/2041-210X.13478>. It is based on the methods described in Daru et al. (2020) <doi:10.1038/s41467-020-15921-6>. The original conceptual work is described in Daru et al. (2017) <doi:10.1016/j.tree.2017.08.013> on patterns and processes of biogeographical regionalization. Additionally, the package contains fast and efficient functions to compute more standard conservation measures such as phylogenetic diversity, phylogenetic endemism, evolutionary distinctiveness and global endangerment, as well as compositional turnover (e.g., beta diversity).