Traditional and spatial capture-mark-recapture analysis with multiple non-invasive marks. The models implemented in multimark combine encounter history data arising from two different non-invasive "marks", such as images of left-sided and right-sided pelage patterns of bilaterally asymmetrical species, to estimate abundance and related demographic parameters while accounting for imperfect detection. Bayesian models are specified using simple formulae and fitted using Markov chain Monte Carlo. Addressing deficiencies in currently available software, multimark also provides a user-friendly interface for performing Bayesian multimodel inference using non-spatial or spatial capture-recapture data consisting of a single conventional mark or multiple non-invasive marks. See McClintock (2015) <doi:10.1002/ece3.1676> and Maronde et al. (2020) <doi:10.1002/ece3.6990>.
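A minimal usage sketch for the package described above, assuming the multimarkClosed() interface and the bundled bobcat encounter-history data; the MCMC settings are illustrative.

```r
# Sketch only: assumes multimarkClosed() and the 'bobcat' data as documented.
library(multimark)
data(bobcat)                                # left/right-sided photo encounter histories
fit <- multimarkClosed(Enc.Mat = bobcat,    # closed-population abundance model
                       iter = 10000, burnin = 2000)
summary(fit$mcmc)                           # posterior summaries, including abundance
```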
This package provides gradient-based MCMC sampling algorithms for use with the MCMC engine provided by the nimble package. This includes two versions of Hamiltonian Monte Carlo (HMC) No-U-Turn (NUTS) sampling, and (under development) Langevin samplers. The `NUTS_classic` sampler implements the original HMC-NUTS algorithm as described in Hoffman and Gelman (2014) <doi:10.48550/arXiv.1111.4246>. The `NUTS` sampler is a modern version of HMC-NUTS sampling matching the HMC sampler available in version 2.32.2 of Stan (Stan Development Team, 2023). In addition, convenience functions are provided for generating and modifying MCMC configuration objects that employ HMC sampling. Functionality of the nimbleHMC package is described further in Turek et al. (2024) <doi:10.21105/joss.06745>.
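A minimal sketch of the one-call interface, assuming nimbleHMC() mirrors nimble's nimbleMCMC(); the toy regression model is illustrative.

```r
# Sketch only: toy linear regression sampled with NUTS HMC.
library(nimble)
library(nimbleHMC)

code <- nimbleCode({
  b0 ~ dnorm(0, sd = 10)
  b1 ~ dnorm(0, sd = 10)
  for (i in 1:N) {
    y[i] ~ dnorm(b0 + b1 * x[i], sd = 1)
  }
})

set.seed(1)
x <- rnorm(10)
samples <- nimbleHMC(code,
                     constants = list(N = 10, x = x),
                     data = list(y = 2 + 0.5 * x + rnorm(10)),
                     inits = list(b0 = 0, b1 = 0),
                     niter = 2000, nburnin = 1000)
```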
This is a computational package designed to identify the most sensitive interactions within a network, i.e., those that must be estimated most accurately in order to produce qualitatively robust predictions of the response to a press perturbation. This is accomplished by enumerating the number of sign switches (and their magnitude) in the net effects matrix when an edge experiences uncertainty. The package produces data and visualizations when uncertainty is associated with one or more edges in the network, according to a variety of distributions. The software assumes the network is described by a system of differential equations, but requires as input only a numerical Jacobian matrix evaluated at an equilibrium point. This package is based on Koslicki, D., & Novak, M. (2017) <doi:10.1007/s00285-017-1163-0>.
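The core quantity can be illustrated in a few lines of base R (a generic sketch, not the package's own interface): for a Jacobian A evaluated at equilibrium, the net effects matrix is -solve(A), and sign switches appear when an uncertain edge changes the sign of any entry after re-inversion.

```r
# Generic sketch of press-perturbation sensitivity, not the package API.
A <- matrix(c(-1.0,  0.5,  0.0,
              -0.4, -1.0,  0.3,
               0.0, -0.2, -1.0), nrow = 3, byrow = TRUE)  # Jacobian at equilibrium
net <- -solve(A)                 # net effects of a sustained (press) perturbation

A2 <- A
A2[1, 2] <- 1.5 * A[1, 2]        # perturb one edge to mimic estimation uncertainty
net2 <- -solve(A2)
sum(sign(net) != sign(net2))     # number of qualitative (sign) switches
```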
Creation of an individual claims simulator which generates various features of non-life insurance claims. An initial set of test parameters, designed to mirror the experience of an Auto Liability portfolio, was set up and applied by default to generate a realistic test data set of individual claims (see vignette). The simulated data set then allows practitioners to back-test the validity of various reserving models and to prove and/or disprove certain actuarial assumptions made in claims modelling. The distributional assumptions used to generate this data set can be easily modified by users to match their own experience. Reference: Avanzi B, Taylor G, Wang M, Wong B (2020) "SynthETIC: an individual insurance claim simulator with feature control" <doi:10.48550/arXiv.2008.05693>.
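A minimal sketch of the modular simulation pipeline, assuming the claim_frequency()/claim_occurrence()/claim_size()/claim_notification() chain documented in the vignette; default test parameters are used throughout.

```r
# Sketch only: first steps of the modular pipeline with default parameters.
library(SynthETIC)
set.seed(20200131)
n_vec <- claim_frequency()                   # claims per occurrence period
occur <- claim_occurrence(n_vec)             # occurrence times within periods
sizes <- claim_size(n_vec)                   # ground-up claim sizes
notif <- claim_notification(n_vec, sizes)    # notification delays
```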
This package provides functions for the computation of F-, f- and D-statistics (e.g., Fst, hierarchical F-statistics, Patterson's F2, F3, F3*, F4 and D parameters) in population genomics studies from allele count or Pool-Seq read count data, and for the fitting, building and visualization of admixture graphs. The package also includes several utilities to manipulate Pool-Seq data stored in standard formats (e.g., vcf files or sync files generated by the PoPoolation software) and to perform conversion to alternative formats (as used by the BayPass and SelEstim software). As of version 2.0, the package also includes utilities to manipulate standard allele count data (e.g., stored in TreeMix, BayPass and SelEstim format).
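A minimal sketch, assuming the vcf2pooldata()/computeFST() workflow from the package documentation; the file name and pool sizes are illustrative.

```r
# Sketch only: read pooled allele counts from a vcf and estimate genome-wide Fst.
library(poolfstat)
pdat <- vcf2pooldata(vcf.file = "pools.vcf.gz",
                     poolsizes = rep(50, 4))   # haploid sizes of the four pools
fst <- computeFST(pdat)
fst$FST                                        # genome-wide Fst estimate
```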
The analysis and visualization of alternative splicing (AS) events from RNA sequencing data remains challenging. SpliceWiz is a user-friendly, performance-optimized R package for AS analysis: it processes alignment BAM files to quantify read counts across splice junctions, quantifies intron retention using an IRFinder-based approach, and supports identification of novel splicing events. It introduces a novel normalized-coverage visualization that allows differential AS to be compared across conditions. SpliceWiz features a shiny-based GUI facilitating interactive exploration of results, including gene ontology enrichment. Performance is achieved through multi-threaded processing of BAM files and a new COV file format for fast recall of sequencing coverage. Overall, SpliceWiz streamlines AS analysis, enabling reliable identification of functionally relevant AS events for further characterization.
The biomarker data set by Vermeulen et al. (2009) <doi:10.1016/S1470-2045(09)70154-8> is provided. The data themselves were published by Ruijter et al. (2013) <doi:10.1016/j.ymeth.2012.08.011>. The original data set may be downloaded from <https://medischebiologie.nl/wp-content/uploads/2019/02/qpcrdatamethods.zip>. This data set comes from a real-time quantitative polymerase chain reaction (qPCR) experiment and comprises the raw fluorescence data of 24,576 amplification curves, covering 59 genes of interest and 5 reference genes. Each gene was assessed on 366 neuroblastoma complementary DNA (cDNA) samples and on 18 standard dilution series samples (10-fold 5-point dilution series x 3 replicates + no-template controls (NTC) x 3 replicates).
Analyze and compare conversations using various similarity measures including topic, lexical, semantic, structural, stylistic, sentiment, participant, and timing similarities. Supports both pairwise conversation comparisons and analysis of multiple dyads. Methods are based on established research: Topic modeling: Blei et al. (2003) <doi:10.1162/jmlr.2003.3.4-5.993>; Landauer et al. (1998) <doi:10.1080/01638539809545028>; Lexical similarity: Jaccard (1912) <doi:10.1111/j.1469-8137.1912.tb05611.x>; Semantic similarity: Salton & Buckley (1988) <doi:10.1016/0306-4573(88)90021-0>; Mikolov et al. (2013) <doi:10.48550/arXiv.1301.3781>; Pennington et al. (2014) <doi:10.3115/v1/D14-1162>; Structural and stylistic analysis: Graesser et al. (2004) <doi:10.1075/target.21131.ryu>; Sentiment analysis: Rinker (2019) <https://github.com/trinker/sentimentr>.
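As a concrete illustration of the lexical component, Jaccard similarity is simply the shared vocabulary divided by the combined vocabulary (a generic sketch, not the package's own interface):

```r
# Generic Jaccard lexical similarity between two utterances.
jaccard <- function(a, b) {
  ta <- unique(tolower(unlist(strsplit(a, "\\W+"))))
  tb <- unique(tolower(unlist(strsplit(b, "\\W+"))))
  length(intersect(ta, tb)) / length(union(ta, tb))
}
jaccard("I love hiking in the mountains",
        "We went hiking in the hills")   # 3 shared / 9 total = 0.33
```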
TimeScape is an automated tool for navigating temporal clonal evolution data. The key attributes of this implementation involve the enumeration of clones, their evolutionary relationships and their shifting dynamics over time. TimeScape requires two inputs: (i) the clonal phylogeny and (ii) the clonal prevalences. Optionally, TimeScape accepts a data table of targeted mutations observed in each clone and their allele prevalences over time. The output is the TimeScape plot showing clonal prevalence vertically, time horizontally, and the plot height optionally encoding tumour volume during tumour-shrinking events. At each sampling time point (denoted by a faint white line), the height of each clone accurately reflects its proportionate prevalence. These prevalences form the anchors for Bézier curves that visually represent the dynamic transitions between time points.
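A minimal sketch, assuming the timescape() entry point and the input columns described in the package vignette; the toy phylogeny and prevalences are illustrative.

```r
# Sketch only: two time points, three clones, simple chain phylogeny.
library(timescape)
tree_edges  <- data.frame(source = c("1", "2"), target = c("2", "3"))
clonal_prev <- data.frame(timepoint   = rep(c("T0", "T1"), each = 3),
                          clone_id    = rep(c("1", "2", "3"), 2),
                          clonal_prev = c(0.7, 0.2, 0.1, 0.1, 0.4, 0.5))
timescape(clonal_prev = clonal_prev, tree_edges = tree_edges)
```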
This package contains functions carrying out adaptive procedures using a mixed scaling approach to establish bioequivalence for in-vitro permeation test (IVPT) data. Currently, the package provides procedures based on a parallel replicate design and balanced data, according to the U.S. Food and Drug Administration's "Draft Guidance on Acyclovir" <https://www.accessdata.fda.gov/drugsatfda_docs/psg/Acyclovir_topical%20cream_RLD%2021478_RV12-16.pdf>. Potvin et al. (2008) <doi:10.1002/pst.294> provides the basis for the adaptive design (see Method B). For a comprehensive overview of the method, refer to Lim et al. (2023) <doi:10.1002/pst.2333>. This package reflects the views of the authors and should not be construed to represent the views or policies of the U.S. Food and Drug Administration.
Compute a cyclist's Eddington number, including efficiently computing the cumulative E over a vector of rides. A cyclist's Eddington number <https://en.wikipedia.org/wiki/Arthur_Eddington#Eddington_number_for_cycling> is the maximum number E such that the cyclist has ridden at least E miles on each of E distinct days. The algorithm in this package improves on the conventional approach: because it does not require an initial sort of the data, both summary and cumulative statistics are computed in linear time. These functions may also be used for computing h-indices for authors, a metric described by Hirsch (2005) <doi:10.1073/pnas.0507655102>. Both are specific applications of computing the side length of a Durfee square <https://en.wikipedia.org/wiki/Durfee_square>.
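The linear-time idea is easy to state in base R (a generic sketch, independent of the package's own functions): tally each ride into a count array capped at n, then scan from the top for the largest E with at least E qualifying days.

```r
# Generic linear-time Eddington number: no sorting required.
eddington <- function(rides) {
  n <- length(rides)
  tally <- integer(n)                  # tally[i] counts rides of floor length i (capped at n)
  for (d in pmin(floor(rides), n)) if (d >= 1) tally[d] <- tally[d] + 1
  running <- 0
  for (i in n:1) {                     # running = number of rides of at least i miles
    running <- running + tally[i]
    if (running >= i) return(i)
  }
  0
}
eddington(c(2.3, 5.1, 5.6, 3.0))       # 3: three days with rides of at least 3 miles
```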
This package provides a wrapper around the LIBLINEAR C/C++ library for machine learning (available at <https://www.csie.ntu.edu.tw/~cjlin/liblinear/>). LIBLINEAR is a simple library for solving large-scale regularized linear classification and regression. It currently supports L2-regularized classification (such as logistic regression, L2-loss linear SVM and L1-loss linear SVM), L1-regularized classification (such as L2-loss linear SVM and logistic regression), and L2-regularized support vector regression (with L1- or L2-loss). The main features of LiblineaR include multi-class classification (one-vs-the-rest, and the Crammer & Singer method), cross-validation for model selection, probability estimates (logistic regression only), and weights for unbalanced data. Model estimation is particularly fast compared to other libraries.
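A minimal sketch using the documented LiblineaR() interface; type = 0 selects L2-regularized logistic regression.

```r
# L2-regularized logistic regression on iris, with class probabilities.
library(LiblineaR)
x <- iris[, 1:4]
y <- iris$Species
m <- LiblineaR(data = x, target = y, type = 0, cost = 1)
p <- predict(m, newx = x, proba = TRUE)      # probabilities: logistic models only
table(predicted = p$predictions, actual = y)
```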
Statistical Analyses and Pooling after Multiple Imputation. A large variety of repeated statistical analyses can be performed and subsequently pooled. Available analyses include, among others, Levene's test, odds and risk ratios, one-sample proportions, differences between proportions, and linear and logistic regression models. Functions can also be used in combination with the pipe operator. More statistical analyses and pooling functions will be added over time. Heymans (2007) <doi:10.1186/1471-2288-7-33>. Eekhout (2017) <doi:10.1186/s12874-017-0404-7>. Wiel (2009) <doi:10.1093/biostatistics/kxp011>. Marshall (2009) <doi:10.1186/1471-2288-9-57>. Sidi (2021) <doi:10.1080/00031305.2021.1898468>. Lott (2018) <doi:10.1080/00031305.2018.1473796>. Grund (2021) <doi:10.31234/osf.io/d459g>.
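The pooling step itself follows Rubin's rules; a generic sketch (not the package's own interface) makes the computation explicit.

```r
# Generic Rubin's rules: combine m per-imputation estimates and variances.
pool_rubin <- function(est, var_within) {
  m <- length(est)
  qbar <- mean(est)                 # pooled estimate
  ubar <- mean(var_within)          # average within-imputation variance
  b    <- var(est)                  # between-imputation variance
  tvar <- ubar + (1 + 1/m) * b      # total variance
  c(estimate = qbar, se = sqrt(tvar))
}
pool_rubin(est = c(1.02, 0.97, 1.10), var_within = c(0.040, 0.050, 0.045))
```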
Fits Bayesian time-course models for model-based network meta-analysis (MBNMA) that allow inclusion of multiple time points from studies. Repeated measures over time are accounted for within studies by applying different time-course functions, following the method of Pedder et al. (2019) <doi:10.1002/jrsm.1351>. The method allows synthesis of studies with multiple follow-up measurements and can account for the time course of single or multiple treatment comparisons. Several general time-course functions are provided; others may be added by the user. Various characteristics can be flexibly added to the models, such as correlation between time points and shared class effects. The consistency of direct and indirect evidence in the network can be assessed using unrelated mean effects models and/or by node-splitting.
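A minimal sketch, assuming the mb.network()/mb.run() workflow and the osteopain example data described in the package documentation; the linear time-course choice is illustrative.

```r
# Sketch only: network setup and a first-order polynomial time-course MBNMA.
library(MBNMAtime)
network <- mb.network(osteopain)                  # long-format repeated measures
fit <- mb.run(network, fun = tpoly(degree = 1))   # linear time-course function
summary(fit)
```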
A full dynamic system to describe and forecast the spread and severity of a developing pandemic, based on available data: the numbers of infections, hospitalizations, deaths and recoveries notified each day. The system consists of three transitions (infection-infection, infection-hospital and hospital-death/recovery), whose intensities are dynamic and estimated using non-parametric local linear estimators. The package can be used to provide forecasts and survival indicators, such as the median time spent in hospital and the probability that a patient who has been in hospital for a given number of days leaves it alive. Methods are described in Gámiz, Mammen, Martínez-Miranda, and Nielsen (2024) <doi:10.48550/arXiv.2308.09918> and <doi:10.48550/arXiv.2308.09919>.
This package implements methods for obtaining kernel density estimates subject to a variety of shape constraints (unimodality, bimodality, symmetry, tail monotonicity, bounds, and constraints on the number of inflection points). Enforcing constraints can eliminate unwanted waves or kinks in the estimate, which improves its subjective appearance and can also improve statistical performance. The main function scdensity() is very similar to the density() function in 'stats', allowing shape-restricted estimates to be obtained with little effort. The methods implemented in this package are described in Wolters and Braun (2017) <doi:10.1080/03610918.2017.1288247>, Wolters (2012) <doi:10.18637/jss.v047.i06>, and Hall and Huang (2002) <https://www3.stat.sinica.edu.tw/statistica/j12n4/j12n41/j12n41.htm>. See the scdensity() help page for full citations.
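A minimal sketch mirroring density(); the constraint argument name follows the package documentation and is assumed here.

```r
# Sketch only: a shape-constrained estimate from a lumpy sample.
library(scdensity)
set.seed(1)
x <- c(rnorm(80), rnorm(20, mean = 1.5))
fit <- scdensity(x, constraint = "unimodal")   # eliminates spurious extra modes
plot(fit)
```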
Several generalized / directional Fixed Sequence Multiple Testing Procedures (FSMTPs) are developed for testing a sequence of pre-ordered hypotheses while controlling the FWER, FDR and directional error (mdFWER). All three FWER-controlling generalized FSMTPs are designed under arbitrary dependence and allow any number of acceptances. Two FDR-controlling generalized FSMTPs are designed under arbitrary dependence and independence, respectively, and allow more acceptances, up to a pre-specified number. Two mdFWER-controlling directional FSMTPs are likewise designed under arbitrary dependence and independence, and can also make directional decisions based on the signs of the test statistics. The main functions for each proposed generalized / directional FSMTP calculate adjusted p-values and critical values, respectively. For users' convenience, the functions also provide an output option for printing decision rules.
Several Goodness-of-Fit (GoF) tests for copulae are provided. A new hybrid test from Zhang et al. (2016) <doi:10.1016/j.jeconom.2016.02.017> is implemented, which can combine any of the individual tests in the package, e.g. Genest et al. (2009) <doi:10.1016/j.insmatheco.2007.10.005>. Estimation methods for the margins are provided, and all the tests support both parameter estimation and predefined values. Parameters are estimated by pseudo maximum likelihood, but if this fails the estimation switches automatically to inversion of Kendall's tau. For reproducibility of results, the functions support the definition of seeds. All tests also support automated parallelization of the bootstrapping tasks. The package provides an interface for implementing new GoF tests by submitting the test statistic.
The MsQuality package provides functionality to calculate quality metrics for mass spectrometry-derived spectral data at the per-sample level. MsQuality relies on the mzQC framework of quality metrics defined by the Human Proteome Organization-Proteomics Standards Initiative (HUPO-PSI). These metrics quantify the quality of spectral raw files using a controlled vocabulary. The package is especially aimed at users who acquire mass spectrometry data on a large scale (e.g. data sets from clinical settings consisting of several thousands of samples). MsQuality calculates low-level quality metrics that require only minimal information on the mass spectrometry data: retention time, m/z values, and associated intensities. It relies on the Spectra package, or alternatively the MsExperiment package, and its infrastructure to store spectral data.
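A minimal sketch, assuming the calculateMetrics() entry point named in the package vignette; the mzML file name is illustrative.

```r
# Sketch only: compute per-sample mzQC metrics from a Spectra object.
library(Spectra)
library(MsQuality)
sps <- Spectra("sample.mzML", source = MsBackendMzR())     # raw spectral data
metrics <- calculateMetrics(object = sps,
                            metrics = qualityMetrics(sps)) # applicable metrics
```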
This package provides a comprehensive and automated workflow for managing multicollinearity in data frames with numeric and/or categorical variables. The package integrates five robust methods into a single function: (1) target encoding of categorical variables based on response values (Micci-Barreca, 2001 <doi:10.1145/507533.507538>); (2) automated feature prioritization to preserve key predictors during filtering; (3 and 4) pairwise correlation and VIF filtering across all variable types (numeric-numeric, numeric-categorical, and categorical-categorical); (5) adaptive correlation and VIF thresholds. Together, these methods enable reliable multicollinearity management in most use cases while maintaining model integrity. The package also supports parallel processing and progress tracking via the packages future and progressr, and provides seamless integration with the tidymodels ecosystem through a dedicated recipe step.
This package provides facilities to perform the chi-square and G-square tests of independence, calculate the retrospective power of the traditional chi-square test, compute permutation and Monte Carlo p-values, and obtain measures of association for tables of any size, such as Phi, Phi corrected, the odds ratio with 95 percent CI and p-value, Yule's Q and Y, the adjusted contingency coefficient, Cramer's V, V corrected, V standardised, bias-corrected V, W, Cohen's w, and Goodman-Kruskal's lambda and tau. It also calculates standardised, moment-corrected standardised, and adjusted standardised residuals (and their significance), as well as the Quetelet Index, the IJ association factor, and adjusted standardised counts. It also computes the chi-square-maximising version of the input table. Different outputs are returned in nicely formatted tables.
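For orientation, two of the listed measures are direct functions of the chi-square statistic; a generic base-R sketch (not the package's own interface):

```r
# Generic computation of Cohen's w and Cramer's V from a contingency table.
tab <- matrix(c(20, 10, 5, 15), nrow = 2)
chi <- chisq.test(tab, correct = FALSE)
n <- sum(tab)
w <- sqrt(unname(chi$statistic) / n)                           # Cohen's w
v <- sqrt(unname(chi$statistic) / (n * (min(dim(tab)) - 1)))   # Cramer's V
```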
The vector autoregressive (VAR) model is a fundamental and effective approach to multivariate time series analysis. Unlike the standard ordinary least squares method, shrinkage estimation methods can be applied to high-dimensional VAR models whose dimensionality exceeds the number of observations. This integrative package delivers nonparametric, parametric, and semiparametric methods in a unified and consistent manner, such as the multivariate ridge regression of Golub, Heath, and Wahba (1979) <doi:10.2307/1268518>, a James-Stein type nonparametric shrinkage method of Opgen-Rhein and Strimmer (2007) <doi:10.1186/1471-2105-8-S2-S3>, and Bayesian estimation methods using noninformative and informative priors of Lee, Choi, and S.-H. Kim (2016) <doi:10.1016/j.csda.2016.03.007> and Ni and Sun (2005) <doi:10.1198/073500104000000622>.
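The core idea can be shown generically (not this package's API): when the dimension exceeds the sample size, X'X is singular, and a ridge penalty restores a unique VAR coefficient estimate.

```r
# Generic ridge estimation of a VAR(1) coefficient matrix with p > n.
set.seed(1)
p <- 20; n <- 15
Y <- matrix(rnorm(n * p), n, p)        # observations at times 1..n
X <- rbind(rep(0, p), Y[-n, ])         # lag-1 design (first lag set to zero)
lambda <- 0.5
B_ridge <- solve(crossprod(X) + lambda * diag(p), crossprod(X, Y))
dim(B_ridge)                           # p x p coefficient matrix
```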
The successor to the AlphaSim software for breeding program simulation [Faux et al. (2016) <doi:10.3835/plantgenome2016.02.0013>]. Used for stochastic simulations of breeding programs to the level of DNA sequence for every individual. Contained is a wide range of functions for modeling common tasks in a breeding program, such as selection and crossing. These functions allow for constructing simulations of highly complex plant and animal breeding programs via scripting in the R software environment. Such simulations can be used to evaluate overall breeding program performance and conduct research into breeding program design, such as implementation of genomic selection. Included is the Markovian Coalescent Simulator ('MaCS') for fast simulation of biallelic sequences according to a population demographic history [Chen et al. (2009) <doi:10.1101/gr.083634.108>].
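A minimal sketch of one selection cycle, using core functions documented in the package; parameter values are illustrative.

```r
# Sketch only: simulate founders, define a trait, select and cross.
library(AlphaSimR)
founders <- runMacs(nInd = 100, nChr = 10, segSites = 500)  # coalescent founder haplotypes
SP <- SimParam$new(founders)
SP$addTraitA(nQtlPerChr = 100)        # additive trait architecture
SP$setVarE(h2 = 0.5)                  # phenotype heritability
pop <- newPop(founders)
pop <- selectCross(pop, nInd = 20, nCrosses = 100)  # select best 20, make 100 crosses
```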
This package provides access to word predictability estimates from large language models (LLMs) based on transformer architectures, via integration with the Hugging Face ecosystem <https://huggingface.co/>. The package interfaces with pre-trained neural networks and supports both causal/auto-regressive LLMs (e.g., 'GPT-2') and masked/bidirectional LLMs (e.g., 'BERT') to compute the probability of words, phrases, or tokens given their linguistic context. For details on GPT-2 and causal models, see Radford et al. (2019) <https://storage.prod.researchhub.com/uploads/papers/2020/06/01/language-models.pdf>; for details on BERT and masked models, see Devlin et al. (2019) <doi:10.48550/arXiv.1810.04805>. By enabling straightforward estimation of word predictability, the package facilitates research in psycholinguistics, computational linguistics, and natural language processing (NLP).