This package provides a higher-level interface to the torch package for defining, training, and fine-tuning neural networks, including networks of configurable depth, powered by code generation. It currently supports feedforward (multi-layer perceptron) and recurrent architectures, namely Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) networks. The package reduces boilerplate torch code while enabling seamless integration with torch. Its model-fitting methods also bridge to the major ML frameworks in R, namely the tidymodels ecosystem, enabling the use of parsnip model specifications, workflows, recipes, and tuning tools.
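A minimal sketch of the tidymodels bridge described above. The parsnip functions shown (mlp(), set_engine(), set_mode(), fit()) are standard tidymodels API; the engine name "torchmlp" is a placeholder assumption standing in for whatever engine this package actually registers.

library(parsnip)
# Hypothetical engine name; substitute the engine this package registers.
spec <- mlp(hidden_units = 16, epochs = 25) |>
  set_engine("torchmlp") |>
  set_mode("regression")
fitted <- fit(spec, mpg ~ ., data = mtcars)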
This package provides a collection of miscellaneous helper functions for running multilevel/mixed models in lme4. It aims to provide functions for common tasks when estimating multilevel models, such as computing the intraclass correlation and design effect, centering variables, estimating the proportion of variance explained at each level, pseudo-R-squared, random intercept and slope reliabilities, tests for homogeneity of variance at level 1, and cluster-robust and bootstrap standard errors. The tests and statistics reported in the package are from Raudenbush & Bryk (2002, ISBN:9780761919049), Hox et al. (2018, ISBN:9781138121362), and Snijders & Bosker (2012, ISBN:9781849202015).
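As a by-hand illustration of two of the quantities listed above, the intraclass correlation and a Kish-style design effect can be computed directly from an lme4 fit, using only lme4 and base R (no functions from this package):

library(lme4)
m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
vc <- as.data.frame(VarCorr(m))
tau00  <- vc$vcov[vc$grp == "Subject"]    # level-2 intercept variance
sigma2 <- vc$vcov[vc$grp == "Residual"]   # level-1 residual variance
icc <- tau00 / (tau00 + sigma2)           # intraclass correlation
n_bar <- mean(table(sleepstudy$Subject))  # average cluster size
deff  <- 1 + (n_bar - 1) * icc            # design effect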
This package provides tools for data-driven statistical analysis using local polynomial regression and kernel density estimation methods as described in Calonico, Cattaneo and Farrell (2018, <doi:10.1080/01621459.2017.1285776>): lprobust() for local polynomial point estimation and robust bias-corrected inference, lpbwselect() for local polynomial bandwidth selection, kdrobust() for kernel density point estimation and robust bias-corrected inference, kdbwselect() for kernel density bandwidth selection, and nprobust.plot() for plotting results. The main methodological and numerical features of this package are described in Calonico, Cattaneo and Farrell (2019, <doi:10.18637/jss.v091.i08>).
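A short usage sketch on simulated data, using the functions named above and assuming they follow the estimator(y, x) calling pattern:

library(nprobust)
set.seed(42)
x <- runif(500)
y <- sin(4 * x) + rnorm(500, sd = 0.3)
est <- lprobust(y, x)    # local polynomial estimates with robust bias-corrected CIs
summary(est)
bws <- lpbwselect(y, x)  # data-driven bandwidth selection
nprobust.plot(est)       # plot the fit with confidence bands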
Calculate common types of tables for weighted survey data. Options include topline and (2-way and 3-way) crosstab tables of categorical or ordinal data, as well as summary tables of weighted numeric variables. Optionally, include the margin of error at selected confidence levels, taking the design effect into account. The design effect is calculated as described by Kish (1965) <doi:10.1002/bimj.19680100122>, beginning on page 257. Output takes the form of tibbles (simple data frames). This package conveniently handles labelled data, such as that commonly produced by Stata and SPSS. Complex survey design is not supported at this time.
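The Kish design effect and the resulting margin of error can be reproduced by hand in base R; the sketch below mirrors the calculation described above on simulated weights rather than calling the package itself:

set.seed(1)
w <- runif(800, 0.5, 2)                           # survey weights (simulated)
n <- length(w)
deff <- n * sum(w^2) / sum(w)^2                   # Kish (1965) design effect
p <- 0.5                                          # worst-case proportion
moe <- 1.96 * sqrt(deff) * sqrt(p * (1 - p) / n)  # 95% margin of error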
This package implements two tests for whether two toolmarks share a common source. The chumbley_non_random() test follows the paper "An Improved Version of a Tool Mark Comparison Algorithm" by Hadler and Morris (2017) <doi:10.1111/1556-4029.13640>. This is an extension of the Chumbley score previously described in "Validation of Tool Mark Comparisons Obtained Using a Quantitative, Comparative, Statistical Algorithm" by Chumbley et al. (2010) <doi:10.1111/j.1556-4029.2010.01424.x>. fixed_width_no_modeling() is based on correlation measures in a diamond-shaped area of the toolmark, as described in Hadler (2017).
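A hedged usage sketch: the exact input structure for chumbley_non_random() is an assumption here (two profiles of toolmark surface values), so the call is left commented; consult the package documentation for the real signature.

library(toolmaRk)
# Simulated surface profiles standing in for two scanned toolmarks
mark1 <- data.frame(y = sin(seq(0, 20, 0.01)) + rnorm(2001, sd = 0.05))
mark2 <- data.frame(y = sin(seq(0, 20, 0.01)) + rnorm(2001, sd = 0.05))
# Assumed two-profile interface:
# res <- chumbley_non_random(mark1, mark2)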
The model, developed at the Vienna University of Technology, is a lumped conceptual rainfall-runoff model following the structure of the HBV model. The model can also be run in a semi-distributed fashion and with dual representation of the soil layer. It runs on a daily or shorter time step and consists of a snow routine, a soil moisture routine and a flow routing routine. See Parajka, J., R. Merz, G. Bloeschl (2007) <DOI:10.1002/hyp.6253> Uncertainty and multiple objective calibration in regional water balance modelling: case study in 320 Austrian catchments, Hydrological Processes, 21, 435-446.
RNA degradation is monitored through measurement of RNA abundance after inhibiting RNA synthesis. This package has functions and example scripts to facilitate (1) data normalization, (2) data modeling using constant decay-rate or time-dependent decay-rate models, (3) the evaluation of treatment or genotype effects, and (4) plotting of the data and models. Data normalization: functions and scripts simplify normalization to the initial (T0) RNA abundance, and a method is provided to correct for artificial inflation of Reads per Million (RPM) abundance in global assessments as the total size of the RNA pool decreases. Modeling: normalized data are then modeled using maximum likelihood to fit parameters. For making treatment or genotype comparisons (up to four), the modeling step models all possible treatment effects on each gene by repeating the modeling with constraints on the model parameters (e.g., the decay rates of treatments A and B are modeled once constrained to be equal and again allowed to vary independently). Model selection: the AICc value is calculated for each model, and the model with the lowest AICc is chosen. Modeling results of selected models are then compiled into a single data frame. Graphical plotting: functions are provided to easily visualize decay data, fitted models, or half-life distributions using ggplot2 package functions.
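As a generic illustration of the constant decay-rate model (base R only, not this package's API), a single gene's normalized abundance can be fit and converted to a half-life:

# Constant decay-rate model: abundance(t) = exp(-k * t), data normalized to T0
t_min <- c(0, 7.5, 15, 30, 60, 120)                    # minutes after blocking synthesis
abund <- exp(-0.02 * t_min) * exp(rnorm(6, sd = 0.05)) # simulated measurements
fit <- nls(abund ~ exp(-k * t_min), start = list(k = 0.01))
half_life <- log(2) / coef(fit)[["k"]]                 # half-life in minutes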
This package provides functions for performing quick observations or evaluations of data, including a variety of ways to list objects by size, class, etc. The functions seqle and reverse.seqle mimic the base rle but can search for linear sequences. The function splatnd allows the user to generate zero-argument commands without the need for makeActiveBinding. Functions are provided to convert from any base to any other base, and to find the n-th greatest max or n-th least min. In addition, there are functions which mimic Unix shell commands, including head, tail, pushd, and popd. Various other goodies are included as well.
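For contrast, base rle() only detects runs of equal values; per the description, seqle() generalizes this to runs forming a linear sequence. The seqle call is shown as a comment since its exact signature is an assumption:

rle(c(1, 1, 1, 5, 5))$lengths    # 3 2: runs of equal values only
# Assumed interface, by analogy with rle():
# seqle(c(1, 2, 3, 10, 11))      # would report the runs 1,2,3 and 10,11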
This package implements the Polynomial Maximization Method (PMM) for parameter estimation in linear and time series models when error distributions deviate from normality. The PMM2 variant achieves lower-variance parameter estimates than ordinary least squares (OLS) when errors exhibit significant skewness. Includes methods for linear regression, AR/MA/ARMA/ARIMA models, and bootstrap inference. Methodology described in Zabolotnii, Warsza, and Tkachenko (2018) <doi:10.1007/978-3-319-77179-3_75>, Zabolotnii, Tkachenko, and Warsza (2022) <doi:10.1007/978-3-031-03502-9_37>, and Zabolotnii, Tkachenko, and Warsza (2023) <doi:10.1007/978-3-031-25844-2_21>.
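The setting where PMM2 is claimed to improve on OLS can be simulated in base R. Only the skewed-error data and the OLS baseline are shown here; the package's own fitting function is not named in this sketch:

set.seed(1)
x <- rnorm(200)
e <- rgamma(200, shape = 2, rate = 1) - 2   # mean-zero, right-skewed errors
y <- 1 + 2 * x + e
ols <- lm(y ~ x)                            # OLS baseline
summary(ols)$coefficients                   # PMM2 targets lower-variance estimates here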
Single unified interface for end-to-end modelling of regression, categorical and time-to-event (survival) outcomes. Models created using familiar are self-contained: their use does not require additional information such as baseline survival, feature clustering, or feature transformation and normalisation parameters. Model performance, calibration, risk group stratification, (permutation) variable importance, individual conditional expectation, partial dependence, and more are assessed automatically as part of the evaluation process, exported in tabular format and plotted; they may also be computed manually using export and plot functions. Where possible, metrics and values obtained during the evaluation process come with confidence intervals.
An optim-style implementation of the Stochastic Quasi-Gradient Differential Evolution (SQG-DE) optimization algorithm first published by Sala, Baldanzini, and Pierini (2018; <doi:10.1007/978-3-319-72926-8_27>). This optimization algorithm fuses the robustness of the population-based global optimization algorithm "Differential Evolution" with the efficiency of gradient-based optimization. The derivative-free algorithm uses population members to build stochastic gradient estimates, without any additional objective function evaluations. Sala, Baldanzini, and Pierini argue this algorithm is useful for difficult optimization problems under a tight function evaluation budget. This package can run SQG-DE in parallel or sequentially.
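"optim-style" refers to the calling convention of base R's optim(); the sketch below shows that convention on the Rosenbrock function, which, per the description, the package's SQG-DE entry point mirrors:

rosenbrock <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2
optim(par = c(-1, 1), fn = rosenbrock)  # base-R interface the package emulates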
This package provides a comprehensive collection of tools for creating, manipulating and visualising pedigrees and genetic marker data. Pedigrees can be read from text files or created on the fly with built-in functions. A range of utilities enable modifications like adding or removing individuals, breaking loops, and merging pedigrees. An online tool for creating pedigrees interactively, based on pedtools, is available at <https://magnusdv.shinyapps.io/quickped>. pedtools is the hub of the pedsuite, a collection of packages for pedigree analysis. A detailed presentation of the pedsuite is given in the book Pedigree Analysis in R (Vigeland, 2021, ISBN:9780128244302).
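A minimal sketch of the on-the-fly construction mentioned above; nuclearPed() is a documented pedtools constructor, with nch setting the number of children:

library(pedtools)
x <- nuclearPed(nch = 2)  # founder couple with two children
summary(x)
plot(x)                   # draw the pedigree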
The algorithm implemented in this package was designed to quickly estimate the distribution of the log-rank test statistic, especially for heavily unbalanced groups. VALORATE estimates the null distribution and the p-value of the log-rank test based on a recent formulation. For a given number of alterations that define the size of the survival groups, the estimation involves a weighted sum of distributions that are conditional on a co-occurrence term where mutations and events are both present. The estimation of the conditional distributions is quite fast, allowing the analysis of large datasets in a few minutes <https://bioinformatics.mx/index.php/bioinfo-tools/>.
This is a supportive data package for the software package gage. However, the data supplied here are also useful for gene set or pathway analysis, or for microarray data analysis in general. This package provides two demo microarray datasets: GSE16873 (a breast cancer dataset from GEO) and BMP6 (originally published as a demo dataset for GAGE, also registered as GSE13604 in GEO). It also includes commonly used gene set data based on KEGG pathways and GO terms for major research species, including human, mouse, rat and budding yeast. Mapping data between common gene IDs for budding yeast are also included.
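Typical use is to load the demo data and gene sets for a downstream gage analysis. The dataset names below follow the gage vignettes; treat them as assumptions if package versions differ:

library(gageData)
data(gse16873)        # breast cancer demo expression data (GEO: GSE16873)
data(kegg.sets.hs)    # human KEGG pathway gene sets
# Downstream, with the gage package:
# res <- gage::gage(gse16873, gsets = kegg.sets.hs, ref = ..., samp = ...)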
While gene signatures are frequently used to predict phenotypes (e.g. to predict the prognosis of cancer patients), it is not always clear how optimal or meaningful they are (cf. the paper by David Venet, Jacques E. Dumont, and Vincent Detours, "Most Random Gene Expression Signatures Are Significantly Associated with Breast Cancer Outcome"). Based on suggestions in that paper, SigCheck accepts a data set (as an ExpressionSet) and a gene signature, and compares its performance on survival and/or classification tasks against (a) random gene signatures of the same length; (b) known, related and unrelated gene signatures; and (c) permuted data and/or metadata.
Scalable implementation of generalized linear mixed models with a highly optimized C++ implementation and integration with Genomic Data Structure (GDS) files. It is designed for single-variant tests and set-based aggregate tests in large-scale Phenome-wide Association Studies (PheWAS) with millions of variants and samples, controlling for sample structure and case-control imbalance. The implementation is based on the SAIGE R package (v0.45, Zhou et al. 2018 and Zhou et al. 2020), and it is extended to include the state-of-the-art ACAT-O set-based tests. Benchmarks show that SAIGEgds is significantly faster than the SAIGE R package.
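A sketch of the two-step workflow (fit the null model, then run the association scan). The function names below follow the SAIGEgds documentation as best recalled and should be treated as assumptions; a real GDS genotype file is required, so the calls are left commented:

library(SAIGEgds)
# nullmod <- seqFitNullGLMM_SPA(pheno ~ age + sex, data = phen,
#                               gdsfile = "geno.gds", trait.type = "binary")
# assoc   <- seqAssocGLMM_SPA("geno.gds", nullmod)   # single-variant tests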
Implementation of the nonparametric bounds for the average causal effect under an instrumental variable model by Balke and Pearl (Bounds on Treatment Effects from Studies with Imperfect Compliance, JASA, 1997, 92, 439, 1171-1176, <doi:10.2307/2965583>). The package can calculate bounds for a binary outcome, a binary treatment/phenotype, and an instrument with either 2 or 3 categories. The package implements bounds for situations where these 3 variables are measured in the same dataset (trivariate data) or where the outcome and instrument are measured in one study and the treatment/phenotype and instrument are measured in another study (bivariate data).
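A hedged sketch of the trivariate case: the input is assumed, as in the package vignette, to be a table of conditional probabilities P(X = x, Y = y | Z = z) for a binary instrument, with each z-slice summing to 1 (values below are illustrative):

library(bpbounds)
p <- array(c(0.0064, 0.9936, 0.0000, 0.0000,
             0.0028, 0.0010, 0.1972, 0.7990),
           dim = c(2, 2, 2),
           dimnames = list(x = c(0, 1), y = c(0, 1), z = c(0, 1)))
res <- bpbounds(as.table(p))  # assumed interface: bounds from trivariate probabilities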
Calculates the carbon footprint of dairy farms based on methodologies of the International Dairy Federation and the Intergovernmental Panel on Climate Change. Includes tools for single-farm and batch analysis, report generation, and visualization. Methods follow International Dairy Federation (2022) "The IDF global Carbon Footprint standard for the dairy sector" (Bulletin of the IDF n° 520/2022) <doi:10.56169/FKRK7166> and IPCC (2019) "2019 Refinement to the 2006 IPCC Guidelines for National Greenhouse Gas Inventories, Chapter 10: Emissions from Livestock and Manure Management" <https://www.ipcc-nggip.iges.or.jp/public/2019rf/pdf/4_Volume4/19R_V4_Ch10_Livestock.pdf> guidelines.
This package provides a Bayesian framework for parameter inference in differential equations. This approach offers a rigorous methodology for parameter inference as well as modeling the link between unobservable model states and parameters, and observable quantities. Provides templates for the DE model, the observation model and data likelihood, and the model parameters and their prior distributions. A Markov chain Monte Carlo (MCMC) procedure processes these inputs to estimate the posterior distributions of the parameters and any derived quantities, including the model trajectories. Further functionality is provided to facilitate MCMC diagnostics and the visualisation of the posterior distributions of model parameters and trajectories.
Easily perform a Monte Carlo simulation to evaluate the cost and the carbon, ecological, and water footprints of a set of ideal diets. Pre-processing tools are also available to quickly treat the data, along with basic statistical features to analyze the simulation results, including the ability to establish confidence intervals for selected parameters, such as nutrients and price/emissions. A standard version of the datasets employed is included as well, allowing users easy access to customization. This package brings to R the Python software initially developed by Vandevijvere, Young, Mackay, Swinburn and Gahegan (2018) <doi:10.1186/s12966-018-0648-6>.
Tailored explicitly for Experience Sampling Method (ESM) data, this package contains a suite of functions designed to simplify preprocessing steps and the creation of subsequent reports. It empowers users to extract critical insights during preprocessing, conduct thorough data quality assessments (e.g., design and sampling scheme checks, compliance rate, careless responses), and generate visualizations and concise summary tables tailored specifically for ESM data. Additionally, it streamlines the creation of informative and interactive preprocessing reports, enabling researchers to transparently share their dataset preprocessing methodologies. Finally, it is part of a larger ecosystem which includes a framework and a web gallery (<https://preprocess.esmtools.com/>).
Real-time quantitative polymerase chain reaction (qPCR) data by Guescini et al. (2008) <doi:10.1186/1471-2105-9-326> in tidy format. This package provides two data sets where the amplification efficiency has been modulated: either by changing the amplification mix concentration, or by increasing the concentration of IgG, a PCR inhibitor. Original raw data files: <https://static-content.springer.com/esm/art%3A10.1186%2F1471-2105-9-326/MediaObjects/12859_2008_2311_MOESM1_ESM.xls> and <https://static-content.springer.com/esm/art%3A10.1186%2F1471-2105-9-326/MediaObjects/12859_2008_2311_MOESM5_ESM.xls>.
Characterisation and calibration of single or multiple Ion Selective Electrodes (ISEs); activity estimation of experimental samples. Implements methods described in: Dillingham, P.W., Radu, T., Diamond, D., Radu, A. and McGraw, C.M. (2012) <doi:10.1002/elan.201100510>, Dillingham, P.W., Alsaedi, B.S.O. and McGraw, C.M. (2017) <doi:10.1109/ICSENS.2017.8233898>, Dillingham, P.W., Alsaedi, B.S.O., Radu, A., and McGraw, C.M. (2019) <doi:10.3390/s19204544>, and Dillingham, P.W., Alsaedi, B.S.O., Granados-Focil, S., Radu, A., and McGraw, C.M. (2020) <doi:10.1021/acssensors.9b02133>.
The sample mean and standard deviation are two commonly used statistics in meta-analyses, but some trials report other summary statistics, such as the median and quartiles, so researchers need to transform that information back to the sample mean and standard deviation. This package implements sample mean estimators by Luo et al. (2016) <arXiv:1505.05687>, sample standard deviation estimators by Wan et al. (2014) <arXiv:1407.8038>, and the best linear unbiased estimators (BLUEs) of location and scale parameters by Yang et al. (2018, submitted), all based on sample-quantile summaries, for use in a meta-analysis.
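The flavor of these estimators for the {first quartile, median, third quartile, sample size} scenario, written out by hand. The formulas follow Wan et al. (2014) and Luo et al. (2016) as best recalled; verify against the papers before use:

q1 <- 10; m <- 14; q3 <- 20; n <- 50
# Wan et al. (2014): SD from the interquartile range
sd_hat <- (q3 - q1) / (2 * qnorm((0.75 * n - 0.125) / (n + 0.25)))
# Luo et al. (2016): mean as an n-weighted blend of quartile midpoint and median
w <- 0.7 + 0.39 / n
mean_hat <- w * (q1 + q3) / 2 + (1 - w) * m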