This package implements methods that are useful in designing research studies and analyzing data, with particular emphasis on methods that are developed for or used within the behavioral, educational, and social sciences (broadly defined). That being said, many of the methods implemented within MBESS are applicable to a wide variety of disciplines. MBESS has a suite of functions for a variety of related topics, such as effect sizes, confidence intervals for effect sizes (including standardized effect sizes and noncentral effect sizes), sample size planning (from the accuracy in parameter estimation (AIPE), power analytic, equivalence, and minimum-risk point estimation perspectives), mediation analysis, various properties of distributions, and a variety of utility functions.
It is vital to assess the heterogeneity of treatment effects (HTE) when making health care decisions for an individual patient or a group of patients. Nevertheless, it remains challenging to evaluate HTE based on information collected from clinical studies that are often designed and conducted to evaluate the efficacy of a treatment for the overall population. The Bayesian framework offers a principled and flexible approach to estimate and compare treatment effects across subgroups of patients defined by their characteristics. This package allows users to explore a wide range of Bayesian HTE analysis models, and produce posterior inferences about HTE. See Wang et al. (2018) <DOI:10.18637/jss.v085.i07> for further details.
This package implements convex regression with interpretable sharp partitions (CRISP), which considers the problem of predicting an outcome variable on the basis of two covariates, using an interpretable yet non-additive model. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. More details are provided in Petersen, A., Simon, N., and Witten, D. (2016). Convex Regression with Interpretable Sharp Partitions. Journal of Machine Learning Research, 17(94): 1-31 <http://jmlr.org/papers/volume17/15-344/15-344.pdf>.
Computes the test statistic and p-value of the Cramer-von Mises and Anderson-Darling tests for some continuous distribution functions, as proposed by Chen and Balakrishnan (1995) <http://asq.org/qic/display-item/index.html?item=11407>. In addition to the classic distribution functions, the goodness-of-fit (GoF) test can be applied to a dataset that follows the extreme value distribution, without the user having to remember the formulas of the distribution/density functions. The package also calculates the Value at Risk (VaR) and Average VaR, two important risk measures that are estimated using well-known distribution functions. Pflug and Romisch (2007, ISBN: 9812707409) is a good reference for studying the properties of risk measures.
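As a rough illustration of the two risk measures mentioned above, the following Python sketch computes VaR and Average VaR for a normally distributed loss. It is not the package's interface: the function names `normal_var` and `normal_avar` are hypothetical, and the convention that large losses are bad and the level alpha is close to 1 is an assumption.

```python
from statistics import NormalDist


def normal_var(alpha, mu=0.0, sigma=1.0):
    """Value at Risk: the alpha-quantile of a normal loss distribution."""
    return NormalDist(mu, sigma).inv_cdf(alpha)


def normal_avar(alpha, mu=0.0, sigma=1.0):
    """Average VaR: expected loss given that the loss exceeds VaR at level alpha."""
    z = NormalDist().inv_cdf(alpha)
    # Closed form for the normal case: mu + sigma * phi(z) / (1 - alpha).
    return mu + sigma * NormalDist().pdf(z) / (1.0 - alpha)


if __name__ == "__main__":
    print(normal_var(0.95), normal_avar(0.95))
```

Average VaR is never smaller than VaR at the same level, since it averages over the tail beyond the quantile.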
Replacement for nls() tools for working with nonlinear least squares problems. The calling structure is similar to, but much simpler than, that of the nls() function. Moreover, where nls() specifically does NOT deal with small or zero residual problems, nlmrt is quite happy to solve them. It also attempts to be more robust in finding solutions, thereby avoiding singular gradient messages that arise in the Gauss-Newton method within nls(). The Marquardt-Nash approach in nlmrt generally works more reliably to get a solution, though this may be one of a set of possibilities, and may also be statistically unsatisfactory. Added print and summary as of August 28, 2012.
This package provides tools to compute and analyze the set of statistically-equivalent (Gaussian, linear) path models which generate the input precision or (partial) correlation matrix. This procedure is useful for understanding how statistical network models such as the Gaussian Graphical Model (GGM) perform as causal discovery tools. The statistical-equivalence set of a given GGM expresses the uncertainty we have about the sign, size and direction of directed relationships based on the weights matrix of the GGM alone. The derivation of the equivalence set and its use for understanding GGMs as causal discovery tools are described by Ryan, O., Bringmann, L.F., & Schuurman, N.K. (2022) <doi:10.31234/osf.io/ryg69>.
Simulation methods for the Fisher Bingham distribution on the unit sphere, the matrix Bingham distribution on a Grassmann manifold, the matrix Fisher distribution on SO(3), and the bivariate von Mises sine model on the torus. The methods use an acceptance/rejection simulation algorithm for the Bingham distribution and are described fully by Kent, Ganeiber and Mardia (2018) <doi:10.1080/10618600.2017.1390468>. These methods supersede earlier MCMC simulation methods and are more general than earlier simulation methods. The methods can be slower in specific situations where there are existing non-MCMC simulation methods (see Section 8 of Kent, Ganeiber and Mardia (2018) <doi:10.1080/10618600.2017.1390468> for further details).
The soGGi package provides a toolset to create genomic interval aggregate/summary plots of signal or motif occurrence from BAM and bigWig files as well as PWM, rlelist, GRanges and GAlignments Bioconductor objects. soGGi allows for normalisation, transformation and arithmetic operation on and between summary plot objects as well as grouping and subsetting of plots by GRanges objects and user supplied metadata. Plots are created using the ggplot2 library to allow user defined manipulation of the returned plot object. Coupled together, soGGi features a broad set of methods to visualise genomics data in the context of groups of genomic intervals such as genes, superenhancers and transcription factor binding events.
This package provides functions for implementing the Generalized Bayesian Optimal Phase II (G-BOP2) design using various Particle Swarm Optimization (PSO) algorithms, including: PSO-Default, based on Kennedy and Eberhart (1995) <doi:10.1109/ICNN.1995.488968>, "Particle Swarm Optimization"; PSO-Quantum, based on Sun, Xu, and Feng (2004) <doi:10.1109/ICCIS.2004.1460396>, "A Global Search Strategy of Quantum-Behaved Particle Swarm Optimization"; PSO-Dexp, based on Stehlík et al. (2024) <doi:10.1016/j.asoc.2024.111913>, "A Double Exponential Particle Swarm Optimization with Non-Uniform Variates as Stochastic Tuning and Guaranteed Convergence to a Global Optimum with Sample Applications to Finding Optimal Exact Designs in Biostatistics"; and PSO-GO.
Combining genomic prediction with Monte Carlo simulation, three different strategies are implemented to select parental lines for multiple traits in plant breeding. The selection strategies are: (i) GEBV-O, which considers only the genomic estimated breeding values (GEBVs) of the candidate individuals; (ii) GD-O, which considers only the genomic diversity (GD) of the candidate individuals; and (iii) GEBV-GD, which considers both GEBV and GD. These strategies are described in Chung PY, Liao CT (2020) <doi:10.1371/journal.pone.0243159>. A multi-trait genomic best linear unbiased prediction (MT-GBLUP) model is used to simultaneously estimate the GEBVs of the target traits, and then a selection index is adopted to evaluate the composite performance of an individual.
Hidden Markov Model (HMM) based on symmetric lambda distribution framework is implemented for the study of return time-series in the financial market. Major features in the S&P500 index, such as regime identification, volatility clustering, and anti-correlation between return and volatility, can be extracted from HMM cleanly. Univariate symmetric lambda distribution is essentially a location-scale family of exponential power distribution. Such distribution is suitable for describing highly leptokurtic time series obtained from the financial market. It provides a theoretically solid foundation to explore such data where the normal distribution is not adequate. The HMM implementation follows closely the book: "Hidden Markov Models for Time Series", by Zucchini, MacDonald, Langrock (2016).
Microbial growth is often measured by growth curves, i.e., tables of population sizes and times of measurement. This package uses such growth curve data to determine the duration of the "microbial lag phase", i.e., the time needed for microbes to restart divisions. It implements the most commonly used methods to calculate the lag duration; these methods are discussed and described in Opalek et al. (2022). Citation: Smug, B. J., Opalek, M., Necki, M., & Wloch-Salamon, D. (2024). Microbial lag calculator: A shiny-based application and an R package for calculating the duration of microbial lag phase. Methods in Ecology and Evolution, 15, 301-307 <doi:10.1111/2041-210X.14269>.
Generates chronological and ordered p-plots for data vectors or vectors of p-values. The p-plot visualizes the evolution of the p-value of a significance test across the sampled data. It allows for assessing the consistency of the observed effects, for detecting the presence of potential moderator variables, and for estimating the influence of outlier values on the observed results. For non-significant findings, it can diagnose patterns indicative of underpowered study designs. The p-plot can thus either back the binary accept-vs-reject decision of common null-hypothesis significance tests, or it can qualify this decision and stimulate additional empirical work to arrive at more robust and replicable statistical inferences.
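The idea of tracking a p-value as observations accumulate can be sketched in Python as follows. This is only an illustration under stated assumptions, not the package's method: the function name `running_p_values` is hypothetical, and a two-sided one-sample z-type test with the sample standard deviation (a normal approximation) stands in for whatever significance test is actually of interest.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def running_p_values(data, mu0=0.0, min_n=3):
    """Two-sided p-value of a one-sample z-type test after each new observation.

    Assumption: normal approximation with the sample standard deviation.
    """
    p_values = []
    for n in range(min_n, len(data) + 1):
        sample = data[:n]
        se = stdev(sample) / sqrt(n)
        diff = mean(sample) - mu0
        if se == 0.0:
            # Degenerate sample: no variability at all.
            p_values.append(1.0 if diff == 0.0 else 0.0)
            continue
        z = diff / se
        p_values.append(2.0 * (1.0 - NormalDist().cdf(abs(z))))
    return p_values


if __name__ == "__main__":
    print(running_p_values([1.2, 0.8, 1.1, 0.9, 1.3, 1.0]))
```

Plotting the returned values against the sample size, in the order the data were collected, gives a curve in the spirit of a chronological p-plot.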
DVB-T dongles based on the Realtek RTL2832U can be used as a cheap software defined radio, since the chip allows transferring the raw I/Q samples to the host. rtl-sdr provides drivers for this purpose.
The default Linux driver managing DVB-T dongles as TV devices doesn't work for SDR purposes and clashes with this package. Therefore you must prevent the kernel from loading it automatically by adding the following line to your system configuration:
    (kernel-arguments '("modprobe.blacklist=dvb_usb_rtl28xxu"))
To install the rtl-sdr udev rules, you must extend 'udev-service-type' with this package. E.g.:
    (udev-rules-service 'rtl-sdr rtl-sdr)
Perform hierarchical Bayesian Aldrich-McKelvey scaling using Hamiltonian Monte Carlo via 'Stan'. Aldrich-McKelvey ('AM') scaling is a method for estimating the ideological positions of survey respondents and political actors on a common scale using positional survey data. The hierarchical versions of the Bayesian AM model included in this package outperform other versions both in terms of yielding meaningful posterior distributions for respondent positions and in terms of recovering true respondent positions in simulations. The package contains functions for preparing data, fitting models, extracting estimates, plotting key results, and comparing models using cross-validation. The original version of the default model is described in Bølstad (2024) <doi:10.1017/pan.2023.18>.
Optimal k Nearest Neighbours Ensemble is an ensemble of base k nearest neighbour models, each constructed on a bootstrap sample with a random subset of features. For a test point "x", the k closest observations are identified in each base k nearest neighbour model, and a stepwise regression is fitted to them to predict the output value of "x". The final predicted value of "x" is the mean of the estimates given by all the models. The implemented model takes training and test datasets and trains the model on the training data to predict the test data. See Ali, A., Hamraz, M., Kumam, P., Khan, D.M., Khalil, U., Sulaiman, M. and Khan, Z. (2020) <DOI:10.1109/ACCESS.2020.3010099>.
This package provides functions for pooling/combining the results (i.e., p-values) from (dependent) hypothesis tests. Included are Fisher's method, Stouffer's method, the inverse chi-square method, the Bonferroni method, Tippett's method, and the binomial test. Each method can be adjusted based on an estimate of the effective number of tests or by using an empirically derived null distribution based on pseudo replicates. For Fisher's, Stouffer's, and the inverse chi-square method, direct generalizations based on multivariate theory are also available (leading to Brown's method, Strube's method, and the generalized inverse chi-square method). An introduction can be found in Cinar and Viechtbauer (2022) <doi:10.18637/jss.v101.i01>.
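To make the pooling idea concrete, the following Python sketch implements the textbook forms of four of the listed methods (Fisher, Stouffer, Bonferroni, Tippett). It assumes independent tests with one-sided p-values strictly between 0 and 1, and it omits the adjustments for dependence that the package provides. The function names are hypothetical and are not the package's interface.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def chi2_sf_even_df(x, df):
    """Survival function of a chi-square variable with even df (closed form)."""
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return exp(-half) * total


def fisher(p):
    """Fisher: -2 * sum(log p_i) follows chi-square with 2k df under the null."""
    stat = -2.0 * sum(log(p_i) for p_i in p)
    return chi2_sf_even_df(stat, 2 * len(p))


def stouffer(p):
    """Stouffer: sum of z-scores divided by sqrt(k) is standard normal under the null."""
    z = sum(NormalDist().inv_cdf(1.0 - p_i) for p_i in p) / sqrt(len(p))
    return 1.0 - NormalDist().cdf(z)


def bonferroni(p):
    """Bonferroni: smallest p-value multiplied by the number of tests, capped at 1."""
    return min(1.0, len(p) * min(p))


def tippett(p):
    """Tippett: probability that the minimum of k uniform p-values is this small."""
    return 1.0 - (1.0 - min(p)) ** len(p)


if __name__ == "__main__":
    p_values = [0.02, 0.10, 0.30]
    for method in (fisher, stouffer, bonferroni, tippett):
        print(method.__name__, method(p_values))
```

With a single p-value, each of these functions returns that p-value unchanged, which is a convenient sanity check.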
COCOA is a method for understanding epigenetic variation among samples. COCOA can be used with epigenetic data that includes genomic coordinates and an epigenetic signal, such as DNA methylation and chromatin accessibility data. To describe the method on a high level, COCOA quantifies inter-sample variation with either a supervised or unsupervised technique then uses a database of "region sets" to annotate the variation among samples. A region set is a set of genomic regions that share a biological annotation, for instance transcription factor (TF) binding regions, histone modification regions, or open chromatin regions. COCOA can identify region sets that are associated with epigenetic variation between samples and increase understanding of variation in your data.
Quantification and differential analysis of mass-spectrometry proteomics data, with probabilistic recovery of information from missing values. Estimates the detection probability curve (DPC), which relates the probability of successful detection to the underlying expression level of each peptide, and uses it to incorporate peptide missing values into protein quantification and into subsequent differential expression analyses. The package produces objects suitable for downstream analysis in limma. The package accepts peptide-level data with missing values and produces complete protein quantifications without missing values. The uncertainty introduced by missing value imputation is propagated through to the limma analyses using variance modeling and precision weights. The package name "limpa" is an acronym for "Linear Models for Proteomics Data".
This package provides functions for estimating models using a Hierarchical Bayesian (HB) framework. The flexibility comes in allowing the user to specify the likelihood function directly instead of assuming predetermined model structures. Types of models that can be estimated with this code include the family of discrete choice models (Multinomial Logit, Mixed Logit, Nested Logit, Error Components Logit and Latent Class) as well as ordered response models like ordered probit and ordered logit. In addition, the package allows for flexibility in specifying parameters as either fixed (non-varying across individuals) or random with continuous distributions. Parameter distributions supported include normal, positive/negative log-normal, positive/negative censored normal, and the Johnson SB distribution. Kenneth Train's Matlab and Gauss code for doing Hierarchical Bayesian estimation has served as the basis for a few of the functions included in this package. These Matlab/Gauss functions have been rewritten to be optimized within R. Considerable code has been added to increase the flexibility and usability of the code base. Train's original Gauss and Matlab code can be found here: <http://elsa.berkeley.edu/Software/abstracts/train1006mxlhb.html>. See Train's chapter on HB in Discrete Choice with Simulation here: <http://elsa.berkeley.edu/books/choice2.html>; and his paper on using HB with non-normal distributions here: <http://eml.berkeley.edu//~train/trainsonnier.pdf>. The authors would also like to thank the invaluable contributions of Stephane Hess and the Choice Modelling Centre: <https://cmc.leeds.ac.uk/>.
This package provides functions for simulating Markov chains using the Barker proposal to compute Markov chain Monte Carlo (MCMC) estimates of expectations with respect to a target distribution on a real-valued vector space. The Barker proposal, described in Livingstone and Zanella (2022) <doi:10.1111/rssb.12482>, is a gradient-based MCMC algorithm inspired by the Barker accept-reject rule. It combines the robustness of simpler MCMC schemes, such as random-walk Metropolis, with the efficiency of gradient-based methods, such as the Metropolis adjusted Langevin algorithm. The key function provided by the package is sample_chain(), which allows sampling a Markov chain with a specified target distribution as its stationary distribution. The chain is sampled by generating proposals and accepting or rejecting them using a Metropolis-Hastings acceptance rule. During an initial warm-up stage, the parameters of the proposal distribution can be adapted, with adapters available both to tune the scale of the proposals by coercing the average acceptance rate to a target value and to tune the shape of the proposals to match covariance estimates under the target distribution. As well as the default Barker proposal, the package also provides implementations of alternative proposal distributions, such as (Gaussian) random walk and Langevin proposals. Optionally, if BridgeStan's R interface <https://roualdes.us/bridgestan/latest/languages/r.html>, available on GitHub <https://github.com/roualdes/bridgestan>, is installed, then BridgeStan can be used to specify the target distribution to sample from.
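The propose-then-accept cycle described above can be sketched in plain Python. This is a minimal illustration of a Barker proposal with a Metropolis-Hastings correction, assuming a fixed isotropic Gaussian scale and no warm-up adaptation. The names `barker_step` and `run_chain` and the standard normal target are hypothetical and do not reflect the package's sample_chain() interface.

```python
import math
import random


def log1pexp(a):
    """Numerically stable log(1 + exp(a))."""
    return a + math.log1p(math.exp(-a)) if a > 0 else math.log1p(math.exp(a))


def barker_step(x, log_density, grad_log_density, scale, rng):
    """One Barker-proposal step followed by a Metropolis-Hastings correction."""
    g_x = grad_log_density(x)
    y = []
    for x_i, g_i in zip(x, g_x):
        z = rng.gauss(0.0, scale)
        # Move by +z with probability 1 / (1 + exp(-z * g_i)), otherwise by -z.
        p_plus = math.exp(-log1pexp(-z * g_i))
        y.append(x_i + z if rng.random() < p_plus else x_i - z)
    g_y = grad_log_density(y)
    # Log of pi(y) q(y, x) / (pi(x) q(x, y)); the Gaussian factors cancel.
    log_ratio = log_density(y) - log_density(x)
    for x_i, y_i, gx_i, gy_i in zip(x, y, g_x, g_y):
        d = y_i - x_i
        log_ratio += log1pexp(-d * gx_i) - log1pexp(d * gy_i)
    accept = rng.random() < math.exp(min(0.0, log_ratio))
    return (y, True) if accept else (x, False)


def run_chain(x0, log_density, grad_log_density, scale, n_iter, rng):
    """Run the chain and return all states plus the empirical acceptance rate."""
    x, states, n_accept = list(x0), [], 0
    for _ in range(n_iter):
        x, accepted = barker_step(x, log_density, grad_log_density, scale, rng)
        n_accept += accepted
        states.append(x)
    return states, n_accept / n_iter


def std_normal_log_density(x):
    """Example target (assumption): standard normal, up to an additive constant."""
    return -0.5 * sum(v * v for v in x)


def std_normal_grad(x):
    """Gradient of the example log density."""
    return [-v for v in x]


if __name__ == "__main__":
    states, rate = run_chain([3.0, -3.0], std_normal_log_density,
                             std_normal_grad, scale=1.0, n_iter=5000,
                             rng=random.Random(0))
    print(f"acceptance rate: {rate:.2f}")
```

The sketch keeps the proposal scale fixed. The adapters described above would instead adjust the scale and shape of the proposal during the warm-up stage.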
This package provides methods for (auto)covariance/correlation function estimation in change point regression with stationary errors circumventing the pre-estimation of the underlying signal of the observations. Generic, first-order, (m+1)-gapped, difference-based autocovariance function estimator is based on M. Levine and I. Tecuapetla-Gómez (2023) <doi:10.48550/arXiv.1905.04578>. Bias-reducing, second-order, (m+1)-gapped, difference-based estimator is based on I. Tecuapetla-Gómez and A. Munk (2017) <doi:10.1111/sjos.12256>. Robust autocovariance estimator for change point regression with autoregressive errors is based on S. Chakar et al. (2017) <doi:10.3150/15-BEJ782>. It also includes a general projection-based method for covariance matrix estimation.
In the USA, companies file different forms with the U.S. Securities and Exchange Commission (SEC) through EDGAR (Electronic Data Gathering, Analysis, and Retrieval system). The EDGAR automated database system collects all the necessary filings and makes them publicly available. This package facilitates retrieving, storing, searching, and parsing all the available filings on the EDGAR server. It downloads filings from the SEC server in bulk with a single query. Additionally, it provides various useful functions: it extracts 8-K triggering events, extracts the "Business (Item 1)" and "Management's Discussion and Analysis (Item 7)" sections of annual statements, searches filings for desired keywords, provides sentiment measures, parses filing header information, and provides an HTML view of SEC filings.
This package performs dose assignment and trial simulation for the FBCRM (Fully Bayesian Continual Reassessment Method) and MFBCRM (Mixture Fully Bayesian Continual Reassessment Method) phase I clinical trial designs. These trial designs extend the Continual Reassessment Method (CRM) and Bayesian Model Averaging Continual Reassessment Method (BMA-CRM) by allowing the prior toxicity skeleton itself to be random, with posterior distributions obtained from Markov Chain Monte Carlo. On average, the FBCRM and MFBCRM methods outperformed the CRM and BMA-CRM methods in terms of selecting an optimal dose level across thousands of randomly generated simulation scenarios. Details on the methods and results of this simulation study are available on request, and the manuscript is currently under review.