This package provides gradient-based MCMC sampling algorithms for use with the MCMC engine provided by the nimble package. This includes two versions of Hamiltonian Monte Carlo (HMC) No-U-Turn (NUTS) sampling, and (under development) Langevin samplers. The `NUTS_classic` sampler implements the original HMC-NUTS algorithm as described in Hoffman and Gelman (2014) <doi:10.48550/arXiv.1111.4246>. The `NUTS` sampler is a modern version of HMC-NUTS sampling matching the HMC sampler available in version 2.32.2 of Stan (Stan Development Team, 2023). In addition, convenience functions are provided for generating and modifying MCMC configuration objects which employ HMC sampling.
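A minimal sketch of how such a configuration might be assembled, assuming a `configureHMC()` convenience helper that assigns the NUTS sampler to the continuous parameters (function and argument names follow the package documentation as I recall it and are illustrative rather than authoritative):

```r
library(nimble)
library(nimbleHMC)

code <- nimbleCode({
  mu ~ dnorm(0, sd = 10)
  sigma ~ dunif(0, 10)
  for (i in 1:N) y[i] ~ dnorm(mu, sd = sigma)
})

set.seed(1)
model <- nimbleModel(code, constants = list(N = 20),
                     data = list(y = rnorm(20)),
                     inits = list(mu = 0, sigma = 1),
                     buildDerivs = TRUE)        # HMC needs model derivatives

conf   <- configureHMC(model)                   # assumed helper: NUTS sampler on continuous nodes
mcmc   <- buildMCMC(conf)
cmodel <- compileNimble(model)
cmcmc  <- compileNimble(mcmc, project = model)
samples <- runMCMC(cmcmc, niter = 2000, nburnin = 1000)
```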
PepMapViz is a versatile R visualization package that provides comprehensive tools for mapping peptides to protein sequences, identifying distinct domains and regions of interest, accentuating mutations, and highlighting post-translational modifications, all while enabling comparisons across diverse experimental conditions. Potential applications of PepMapViz include visualizing cross-software mass spectrometry results at the peptide level, for specific proteins and domains, in a linearized format; showing post-translational modification coverage across different experimental conditions to help unravel insights into disease mechanisms; and visualizing major histocompatibility complex-presented peptides in different antibody regions to predict immunogenicity in antibody drug development.
Hail is an open-source, general-purpose, Python-based data analysis tool with additional data types and methods for working with genomic data; see <https://hail.is/>. Hail is built to scale and has first-class support for multi-dimensional structured data, like the genomic data in a genome-wide association study (GWAS). Hail is exposed as a Python library, using primitives for distributed queries and linear algebra implemented in Scala, Spark, and increasingly C++. sparkhail is an R extension that uses the sparklyr package. The idea is to help R users access Hail functionality with the well-known tidyverse syntax; see <https://www.tidyverse.org/>.
We develop a new class of distribution-free multiple testing rules for false discovery rate (FDR) control under general dependence. A key element in our proposal is a symmetrized data aggregation (SDA) approach that incorporates the dependence structure via sample splitting, data screening and information pooling. The proposed SDA filter first constructs a sequence of ranking statistics that fulfill global symmetry properties, and then chooses a data-driven threshold along the ranking to control the FDR. For more information, see the accompanying paper: Du et al. (2020), "False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation", <arXiv:2002.11992>.
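As a schematic (not the package's own implementation), the symmetry of the ranking statistics is what makes a data-driven, knockoff-style threshold possible: the negative statistics serve as an estimate of the number of false positives among the positive ones. A minimal illustration in R, with W standing in for the symmetric ranking statistics:

```r
# W: symmetric ranking statistics (large positive values suggest signal);
# under the null, each W_j is roughly symmetric about zero.
sda_threshold <- function(W, alpha = 0.05) {
  ts <- sort(abs(W[W != 0]))
  for (t in ts) {
    fdp_hat <- (1 + sum(W <= -t)) / max(sum(W >= t), 1)  # estimated false discovery proportion
    if (fdp_hat <= alpha) return(t)
  }
  Inf  # no threshold achieves the target FDR level
}

set.seed(1)
W <- c(rnorm(900), rnorm(100, mean = 3))  # 900 nulls, 100 signals
t_hat <- sda_threshold(W, alpha = 0.1)
selected <- which(W >= t_hat)
```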
This package provides a set of functions devoted to multivariate exploratory statistics on textual data. Classical methods such as correspondence analysis and agglomerative hierarchical clustering are available. Chronologically constrained agglomerative hierarchical clustering enriched with labelled-by-words trees is offered. Given a division of the corpus into parts, their characteristic words and documents are identified. Further, access to FactoMineR functions is straightforward; two of them are relevant in the textual domain. MFA() handles multiple lexical tables, allowing applications such as dealing with multilingual corpora as well as simultaneously analyzing both open-ended and closed questions in surveys. See <http://xplortext.unileon.es> for examples.
DEPRECATED. Do not start building new projects based on this package. The (in-house) APD file format was initially developed to store Affymetrix probe-level data, e.g. normalized CEL intensities. Chip types can be added to an APD file and, similar to the methods in the affxparser package, this package provides methods to read APDs organized by units (probesets). In addition, the probe elements can be arranged optimally such that the elements are guaranteed to be read in order when, for instance, data are read unit by unit. This speeds up reading substantially. This package supports the Aroma framework and should not be used elsewhere.
This package performs adjustments of a user-supplied independence loglikelihood function using a robust sandwich estimator of the parameter covariance matrix, based on the methodology in Chandler and Bate (2007) <doi:10.1093/biomet/asm015>. This can be used for cluster-correlated data when interest lies in the parameters of the marginal distributions, or for performing inferences that are robust to certain types of model misspecification. Functions for profiling the adjusted loglikelihoods are also provided, as are functions for calculating and plotting confidence intervals for single model parameters and confidence regions for pairs of model parameters. Nested models can be compared using an adjusted likelihood ratio test.
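A minimal sketch of the adjustment workflow, assuming the package's main entry point is an adjust_loglik()-style function that takes a loglikelihood returning per-observation contributions together with a cluster indicator (argument names are my reading of the documentation and may differ in detail):

```r
library(chandwich)

# Independence loglikelihood: one contribution per observation
loglik_fn <- function(pars, y) dnorm(y, mean = pars[1], sd = exp(pars[2]), log = TRUE)

set.seed(1)
y  <- rnorm(50, mean = 2, sd = 1)
cl <- rep(1:10, each = 5)                     # 10 clusters of 5 correlated observations

fit <- adjust_loglik(loglik = loglik_fn, y = y, cluster = cl,
                     p = 2, init = c(0, 0))   # sandwich-adjusted loglikelihood surface
summary(fit)
conf_intervals(fit)                           # intervals based on the adjusted loglikelihood
```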
Website generator with HTML summaries for predictive models. This package uses DALEX explainers to describe global model behavior. We can see how well models behave (tabs: Model Performance, Auditor), how much each variable contributes to predictions (tabs: Variable Response) and which variables are the most important for a given model (tabs: Variable Importance). We can also compare Concept Drift for pairs of models (tabs: Drifter). Additionally, the data available on the website can easily be recreated in the current R session. Work on this package was financially supported by the NCN Opus grant 2017/27/B/ST6/01307 at Warsaw University of Technology, Faculty of Mathematics and Information Science.
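A minimal sketch of generating such a website from a DALEX explainer (the dataset and model are illustrative, and the modelDown() call reflects my reading of the package documentation):

```r
library(DALEX)
library(modelDown)

# Fit a simple classifier on a dataset shipped with DALEX
model <- glm(survived ~ ., data = titanic_imputed, family = "binomial")

explainer <- explain(model,
                     data = titanic_imputed[, colnames(titanic_imputed) != "survived"],
                     y = titanic_imputed$survived,
                     label = "logistic regression")

# Writes a static HTML website summarising the model into the working directory
modelDown(explainer)
```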
The Markowitz criterion is a multicriteria decision-making method that stands out in risk and uncertainty analysis in contexts where probabilities are known. This approach represents an evolution of Pascal's criterion by incorporating the dimension of variability. In this framework, the expected value reflects the anticipated return, while the standard deviation serves as a measure of risk. The markowitz package provides a practical and accessible tool for implementing this method, enabling researchers and professionals to perform analyses without complex calculations. Thus, the package facilitates the application of the Markowitz criterion. More details on the method can be found in Octave Jokung-Nguéna (2001, ISBN 2100055372).
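To make the criterion concrete, here is a small worked example of the arithmetic it describes, in plain R rather than the package's own interface: each alternative is scored by its expected return and its standard deviation under known state probabilities.

```r
# Payoff matrix: rows = alternatives, columns = states of nature
payoffs <- rbind(A1 = c(100, 50, -20),
                 A2 = c( 80, 60,  10))
probs <- c(0.5, 0.3, 0.2)                            # known state probabilities

ev   <- as.vector(payoffs %*% probs)                 # expected return (Pascal's criterion)
risk <- sqrt(as.vector((payoffs - ev)^2 %*% probs))  # risk: standard deviation of the return

data.frame(alternative = rownames(payoffs), expected_return = ev, risk = risk)
# Markowitz criterion: prefer alternatives with high expected return and low risk.
```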
This package provides spatially balanced survey designs using the quasi-random number method described in Robinson et al. (2013) <doi:10.1111/biom.12059> and adjusted in Robinson et al. (2017) <doi:10.1016/j.spl.2017.05.004>. Designs using MBHdesign can: 1) accommodate legacy sites without substantial detrimental effects on spatial balance (Foster et al., 2017 <doi:10.1111/2041-210X.12782>); and 2) be based on points or transects (Foster et al. 2020 <doi:10.1111/2041-210X.13321>) and produce clustered samples (Foster et al., in press). Additional information about use of the package itself is given in Foster (2021) <doi:10.1111/2041-210X.13535>.
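A minimal sketch of drawing a quasi-random, spatially balanced sample over a gridded study area, assuming a quasiSamp()-style function that accepts a matrix of potential sites and per-site inclusion probabilities (argument names are my reading of the package documentation):

```r
library(MBHdesign)

# Potential sites: a regular grid over the unit square
sites <- as.matrix(expand.grid(x = seq(0, 1, length.out = 100),
                               y = seq(0, 1, length.out = 100)))
n <- 20
incl_probs <- rep(n / nrow(sites), nrow(sites))  # equal inclusion probabilities summing to n

samp <- quasiSamp(n = n, potential.sites = sites, inclusion.probs = incl_probs)
head(samp)  # sampled site coordinates and their inclusion probabilities
```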
Procedures for testing for group-wide signal in clusters of variables. Tests can be performed for single groups in isolation (univariate) or for multiple groups together (multivariate). Specific tests include the exact and approximate (un)selective likelihood ratio tests described in Reid et al. (2015), and the selective F test and marginal screening prototype test of Reid and Tibshirani (2015). The user may pre-specify columns to be included in prototype formation, or allow the function to select them itself; a mixture of the two is also possible. Any variable selection is accounted for using the selective inference framework. Options are provided for non-sampling and hit-and-run null reference distributions.
The synchrosqueezed wavelet transform is implemented. The package is a translation of the MATLAB Synchrosqueezing Toolbox, version 1.1, originally developed by Eugene Brevdo (2012). The C code for curve_ext was authored by Jianfeng Lu, and translated to Fortran by Dongik Jang. Synchrosqueezing is based on the papers: [1] Daubechies, I., Lu, J. and Wu, H. T. (2011) Synchrosqueezed wavelet transforms: An empirical mode decomposition-like tool. Applied and Computational Harmonic Analysis, 30, 243-261. [2] Thakur, G., Brevdo, E., Fučkar, N. S. and Wu, H.-T. (2013) The Synchrosqueezing algorithm for time-varying spectral analysis: Robustness properties and new paleoclimate applications. Signal Processing, 93, 1079-1094.
Identify and understand clusters of points (typically representing the locations of places or events) stored in simple-features (SF) objects. This is useful for analysing, for example, hot-spots of crime events. The package emphasises producing results from point SF data in a single step using reasonable default values for all other arguments, to aid rapid data analysis by users who are starting out. Functions available include kernel density estimation (for details, see Yip (2020) <doi:10.22224/gistbok/2020.1.12>), analysis of spatial association (Getis and Ord (1992) <doi:10.1111/j.1538-4632.1992.tb00261.x>) and hot-spot classification (Chainey (2020) ISBN:158948584X).
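A minimal sketch of the single-step workflow on an sf point layer, using simulated points as a stand-in for real event data (the hotspot_*() function names reflect my reading of the package documentation):

```r
library(sf)
library(sfhotspot)

# Simulated stand-in for real event data: 500 random points in a 10 km square (projected CRS)
robberies <- st_as_sf(data.frame(x = runif(500, 0, 10000), y = runif(500, 0, 10000)),
                      coords = c("x", "y"), crs = 32616)

kde <- hotspot_kde(robberies)     # kernel density estimate with automatic cell size and bandwidth
gi  <- hotspot_gistar(robberies)  # Getis-Ord Gi* measure of local spatial association
head(kde)
```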
Selection index is an efficient and accurate method for the selection of animals. This package is useful for the construction of selection indices. It uses mixed- and random-model least squares analysis to estimate the heritability of traits and the genetic correlation between traits. The package uses the sire model, with sire treated as a random effect. The genetic and phenotypic (co)variances, along with the relative economic values, are used to construct the selection index for any number of traits. It also estimates the accuracy of the index and the genetic gain expected for different traits. Fisher (1936) <doi:10.1111/j.1469-1809.1936.tb02137.x>.
Wavelet decomposition followed by random forest (RF) regression is applied for time series forecasting. The maximum overlap discrete wavelet transform (MODWT) algorithm was chosen as it works for series of any length. The series is first divided into training and testing sets. For each of the wavelet-decomposed series, a supervised machine learning approach, namely random forest, is employed to train the model. This package also provides accuracy metrics in the form of Root Mean Square Error (RMSE) and Mean Absolute Prediction Error (MAPE). This package is based on the algorithm of Ding et al. (2021) <doi:10.1007/s11356-020-12298-3>.
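A schematic of the wavelet-plus-random-forest idea described above, not the package's own interface: decompose the series with a MODWT-based multiresolution analysis, fit a random forest to lagged values of each component, and recombine the component forecasts (the wavelets and randomForest packages are used purely for illustration).

```r
library(wavelets)
library(randomForest)

y <- as.numeric(AirPassengers)
dec <- mra(y, filter = "haar", n.levels = 3, method = "modwt")  # additive MODWT-based MRA
series <- c(dec@D, dec@S[3])                                    # details D1-D3 plus smooth S3

one_step <- sapply(series, function(s) {
  s <- as.numeric(s)
  lagmat <- as.data.frame(embed(s, 5))                # current value plus 4 lags
  names(lagmat) <- c("y", paste0("lag", 1:4))
  fit <- randomForest(y ~ ., data = lagmat)
  newd <- setNames(as.data.frame(t(tail(s, 4)[4:1])), paste0("lag", 1:4))
  predict(fit, newd)                                  # one-step-ahead forecast of this component
})
sum(one_step)  # recombine component forecasts into a forecast of the original series
```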
Mass-spectrometry based UPS proteomics data sets from Ramus C, Hovasse A, Marcellin M, Hesse AM, Mouton-Barbosa E, Bouyssie D, Vaca S, Carapito C, Chaoui K, Bruley C, Garin J, Cianferani S, Ferro M, Van Dorsselaer A, Burlet-Schiltz O, Schaeffer C, Coute Y, Gonzalez de Peredo A. Spiked proteomic standard dataset for testing label-free quantitative software and statistical methods. Data Brief. 2015 Dec 17;6:286-94, and Giai Gianetto, Q., Combes, F., Ramus, C., Bruley, C., Coute, Y., Burger, T. (2016). Calibration plot for proteomics: A graphical tool to visually check the assumptions underlying FDR control in quantitative experiments. Proteomics, 16(1), 29-32.
This package is designed for the import, quality control, analysis, and visualization of methylation data generated using Sequenom's MassArray platform. The tools herein contain a highly detailed amplicon prediction for optimal assay design. Also included are quality control measures of data, such as primer dimer and bisulfite conversion efficiency estimation. Methylation data are calculated using the same algorithms contained in the EpiTyper software package. Additionally, automatic SNP-detection can be used to flag potentially confounded data from specific CG sites. Visualization includes barplots of methylation data as well as UCSC Genome Browser-compatible BED tracks. Multiple assays can be positionally combined for integrated analysis.
Calculates key indicators such as fertility rates (Total Fertility Rate (TFR), General Fertility Rate (GFR), and Age Specific Fertility Rate (ASFR)) using Demographic and Health Survey (DHS) women/individual data, childhood mortality probabilities and rates such as Neonatal Mortality Rate (NNMR), Post-neonatal Mortality Rate (PNNMR), Infant Mortality Rate (IMR), Child Mortality Rate (CMR), and Under-five Mortality Rate (U5MR), and adult mortality indicators such as the Age Specific Mortality Rate (ASMR), Age Adjusted Mortality Rate (AAMR), Age Specific Maternal Mortality Rate (ASMMR), Age Adjusted Maternal Mortality Rate (AAMMR), Age Specific Pregnancy Related Mortality Rate (ASPRMR), Age Adjusted Pregnancy Related Mortality Rate (AAPRMR), Maternal Mortality Ratio (MMR) and Pregnancy Related Mortality Ratio (PRMR). In addition to the indicators, the DHS.rates package estimates sampling errors indicators such as Standard Error (SE), Design Effect (DEFT), Relative Standard Error (RSE) and Confidence Interval (CI). The package is developed according to the DHS methodology of calculating the fertility indicators and the childhood mortality rates outlined in the "Guide to DHS Statistics" (Croft, Trevor N., Aileen M. J. Marshall, Courtney K. Allen, et al. 2018, <https://dhsprogram.com/Data/Guide-to-DHS-Statistics/index.cfm>) and the DHS methodology of estimating the sampling errors indicators outlined in the "DHS Sampling and Household Listing Manual" (ICF International 2012, <https://dhsprogram.com/pubs/pdf/DHSM4/DHS6_Sampling_Manual_Sept2012_DHSM4.pdf>).
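A minimal sketch of computing fertility and childhood mortality indicators, with function and argument names as I recall them from the package documentation; the IRdata and BRdata objects stand for DHS recode files, which must be obtained from the DHS Program and are not bundled with the package.

```r
library(DHS.rates)

# `IRdata` and `BRdata` stand for a DHS women's (IR) and births (BR) recode file,
# read into R (e.g. with haven::read_dta()); placeholders only.
tfr  <- fert(IRdata, Indicator = "tfr")   # total fertility rate with sampling errors
asfr <- fert(IRdata, Indicator = "asfr")  # age-specific fertility rates
u5mr <- chmort(BRdata)                    # NNMR, PNNMR, IMR, CMR and U5MR
```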
Addressing measurement error in covariates and misclassification in binary outcome variables within causal inference, the ATE.ERROR package implements the inverse probability weighted estimation methods proposed by Shu and Yi (2017, <doi:10.1177/0962280217743777>; 2019, <doi:10.1002/sim.8073>). These methods correct errors to accurately estimate average treatment effects (ATE). The package includes two main functions: ATE.ERROR.Y() for handling misclassification in the outcome variable and ATE.ERROR.XY() for correcting both outcome misclassification and covariate measurement error. It employs logistic regression for treatment assignment and uses bootstrap sampling to calculate standard errors and confidence intervals, with simulated datasets provided for practical demonstration.
Gene Symbols or Ensembl Gene IDs are converted using the Bimap interface in AnnotationDbi in convertId2(), but that function is only provided as a fallback mechanism for the most common use cases in data analysis. The main function in the package is convert.bm(), which queries BioMart using the full capacity of the API provided through the biomaRt package. Presets and defaults are provided for convenience, but all "marts", "filters" and "attributes" can be set by the user. Function convert.alias() converts Gene Symbols to Aliases and vice versa, and function likely_symbol() attempts to determine the most likely current Gene Symbol.
This package performs biomedical named entity recognition, Unified Medical Language System (UMLS) concept mapping, and negation detection using the Python spaCy, scispaCy, and medspaCy packages, and transforms extracted data into a wide format for inclusion in machine learning models. The development of the scispaCy package is described by Neumann (2019) <doi:10.18653/v1/W19-5034>. The medspaCy package uses ConText, an algorithm for determining the context of clinical statements described by Harkema (2009) <doi:10.1016/j.jbi.2009.05.002>. clinspacy also supports entity embeddings from scispaCy and UMLS cui2vec concept embeddings developed by Beam (2018) <arXiv:1804.01486>.
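A minimal sketch of extracting entities from a clinical note, assuming a clinspacy() entry point and an initialisation helper as I recall them from the package documentation (the Python dependencies are loaded through reticulate and downloaded on first use):

```r
library(clinspacy)

clinspacy_init()  # loads the scispaCy/medspaCy pipeline via reticulate

note <- "Patient denies chest pain but reports shortness of breath. History of type 2 diabetes."
entities <- clinspacy(note)  # one row per recognised entity, with UMLS CUI, negation and other attributes
head(entities)
```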
This package provides utility functions, distributions, and fitting methods for Bayesian Spatial Capture-Recapture (SCR) and Open Population Spatial Capture-Recapture (OPSCR) modelling using the nimble package (de Valpine et al. 2017 <doi:10.1080/10618600.2016.1172487>). Development of the package was motivated primarily by the need for flexible and efficient analysis of large-scale SCR data (Bischof et al. 2020 <doi:10.1073/pnas.2011383117>). Computational methods and techniques implemented in nimbleSCR include those discussed in Turek et al. 2021 <doi:10.1002/ecs2.3385>, among others. For a recent application of nimbleSCR, see Milleret et al. (2021) <doi:10.1098/rsbl.2021.0128>.
Supports risk assessors in performing the entry step of a quantitative Pest Risk Assessment. It allows estimation of the amount of a plant pest entering a risk assessment area (in terms of founder populations) through the calculation of the imported commodities that could be potential pathways of pest entry, and the development of a pathway model. Two Shiny apps based on the functionalities of the package are included that simplify the process of assessing the risk of entry of plant pests. The approach is based on the work of the European Food Safety Authority (EFSA PLH Panel et al., 2018) <doi:10.2903/j.efsa.2018.5350>.
The sequence detector in this package contains a specific automaton model that can be used to learn and detect data and process sequences. The automaton model is capable of learning and tracing sequences, and is described in Krleža, Vrdoljak, Brčić (2019) <doi:10.1109/ACCESS.2019.2955245>. This research has been partly supported under the Competitiveness and Cohesion Operational Programme from the European Regional Development Fund, as part of the Integrated Anti-Fraud System project no. KK.01.2.1.01.0041. This research has also been partly supported by the European Regional Development Fund under the grant KK.01.1.1.01.0009.