This package provides tools to compute and analyze the set of statistically-equivalent (Gaussian, linear) path models which generate the input precision or (partial) correlation matrix. This procedure is useful for understanding how statistical network models such as the Gaussian Graphical Model (GGM) perform as causal discovery tools. The statistical-equivalence set of a given GGM expresses the uncertainty we have about the sign, size and direction of directed relationships based on the weights matrix of the GGM alone. The derivation of the equivalence set and its use for understanding GGMs as causal discovery tools is described by Ryan, O., Bringmann, L.F., & Schuurman, N.K. (2022) <doi: 10.31234/osf.io/ryg69>.
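For intuition, the GGM weights matrix referred to above is the matrix of partial correlations obtained by standardizing the precision matrix. A minimal base R sketch of that relationship (prec2pcor() is an illustrative helper, not a function from this package):

prec2pcor <- function(omega) {
  d <- 1 / sqrt(diag(omega))
  pcor <- -outer(d, d) * omega   # rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)
  diag(pcor) <- 1
  pcor
}
omega <- matrix(c(1, -0.3, 0, -0.3, 1, -0.4, 0, -0.4, 1), 3, 3)  # example precision matrix
prec2pcor(omega)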
Simulation methods for the Fisher Bingham distribution on the unit sphere, the matrix Bingham distribution on a Grassmann manifold, the matrix Fisher distribution on SO(3), and the bivariate von Mises sine model on the torus. The methods use an acceptance/rejection simulation algorithm for the Bingham distribution and are described fully by Kent, Ganeiber and Mardia (2018) <doi:10.1080/10618600.2017.1390468>. These methods supersede, and are more general than, earlier MCMC-based simulation methods, although they can be slower in specific situations where non-MCMC alternatives exist (see Section 8 of the same paper for further details).
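The acceptance/rejection idea can be sketched in a few lines of base R; the naive uniform envelope below is for illustration only and is not the package's efficient envelope construction (rbingham_naive() is a hypothetical name):

# Draw from the Bingham density f(x) proportional to exp(x' A x) on the unit sphere.
rbingham_naive <- function(n, A) {
  M <- max(eigen(A, symmetric = TRUE)$values)   # sup of x' A x over the sphere
  out <- matrix(NA_real_, n, nrow(A))
  i <- 1
  while (i <= n) {
    z <- rnorm(nrow(A))
    x <- z / sqrt(sum(z^2))                     # uniform proposal on the sphere
    if (runif(1) < exp(drop(t(x) %*% A %*% x) - M)) {
      out[i, ] <- x
      i <- i + 1
    }
  }
  out
}
samples <- rbingham_naive(100, diag(c(2, 0, -2)))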
The soGGi package provides a toolset to create genomic interval aggregate/summary plots of signal or motif occurrence from BAM and bigWig files, as well as from PWM, RleList, GRanges, and GAlignments Bioconductor objects. soGGi allows for normalisation, transformation, and arithmetic operations on and between summary plot objects, as well as grouping and subsetting of plots by GRanges objects and user-supplied metadata. Plots are created using the ggplot2 library to allow user-defined manipulation of the returned plot object. Taken together, soGGi features a broad set of methods to visualise genomics data in the context of groups of genomic intervals such as genes, super-enhancers, and transcription factor binding events.
This package implements methods that are useful in designing research studies and analyzing data, with particular emphasis on methods that are developed for or used within the behavioral, educational, and social sciences (broadly defined). That being said, many of the methods implemented within MBESS are applicable to a wide variety of disciplines. MBESS has a suite of functions for a variety of related topics, such as effect sizes, confidence intervals for effect sizes (including standardized effect sizes and noncentral effect sizes), sample size planning (from the accuracy in parameter estimation (AIPE), power analytic, equivalence, and minimum-risk point estimation perspectives), mediation analysis, various properties of distributions, and a variety of utility functions.
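As one example of the kind of computation involved, a confidence interval for a standardized mean difference can be formed by inverting the noncentral t distribution. The sketch below uses base R only and is illustrative rather than MBESS's actual interface (ci_smd() is a hypothetical helper):

ci_smd <- function(smd, n1, n2, conf = 0.95) {
  df <- n1 + n2 - 2
  scale <- sqrt(n1 * n2 / (n1 + n2))
  lambda <- smd * scale                        # observed noncentrality
  lo <- uniroot(function(ncp) pt(lambda, df, ncp) - (1 + conf) / 2, c(-50, 50))$root
  hi <- uniroot(function(ncp) pt(lambda, df, ncp) - (1 - conf) / 2, c(-50, 50))$root
  c(lower = lo, upper = hi) / scale            # back-transform ncp bounds to the d scale
}
ci_smd(0.5, 40, 40)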
This package provides functions for implementing the Generalized Bayesian Optimal Phase II (G-BOP2) design using various Particle Swarm Optimization (PSO) algorithms, including: - PSO-Default, based on Kennedy and Eberhart (1995) <doi:10.1109/ICNN.1995.488968>, "Particle Swarm Optimization"; - PSO-Quantum, based on Sun, Xu, and Feng (2004) <doi:10.1109/ICCIS.2004.1460396>, "A Global Search Strategy of Quantum-Behaved Particle Swarm Optimization"; - PSO-Dexp, based on Stehlík et al. (2024) <doi:10.1016/j.asoc.2024.111913>, "A Double Exponential Particle Swarm Optimization with Non-Uniform Variates as Stochastic Tuning and Guaranteed Convergence to a Global Optimum with Sample Applications to Finding Optimal Exact Designs in Biostatistics"; - and PSO-GO.
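For reference, the default Kennedy-Eberhart velocity update that these variants build on fits in a short, generic R minimizer; this is an illustrative sketch, not the package's implementation:

pso <- function(f, lower, upper, n = 30, iters = 100, w = 0.7, c1 = 1.5, c2 = 1.5) {
  d <- length(lower)
  x <- sapply(1:d, function(j) runif(n, lower[j], upper[j]))  # particle positions
  v <- matrix(0, n, d)                                        # particle velocities
  pbest <- x
  pval <- apply(x, 1, f)
  g <- pbest[which.min(pval), ]                               # global best position
  for (it in 1:iters) {
    r1 <- matrix(runif(n * d), n, d)
    r2 <- matrix(runif(n * d), n, d)
    v <- w * v + c1 * r1 * (pbest - x) + c2 * r2 * (matrix(g, n, d, byrow = TRUE) - x)
    x <- x + v
    fx <- apply(x, 1, f)
    better <- fx < pval                                       # update personal bests
    pbest[better, ] <- x[better, ]
    pval[better] <- fx[better]
    g <- pbest[which.min(pval), ]
  }
  list(par = g, value = min(pval))
}
pso(function(z) sum(z^2), lower = c(-5, -5), upper = c(5, 5))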
Combining genomic prediction with Monte Carlo simulation, three different strategies are implemented to select parental lines for multiple traits in plant breeding. The selection strategies include (i) GEBV-O, which considers only the genomic estimated breeding values (GEBVs) of the candidate individuals; (ii) GD-O, which considers only the genomic diversity (GD) of the candidate individuals; and (iii) GEBV-GD, which considers both GEBV and GD. These methods are described in Chung PY, Liao CT (2020) <doi:10.1371/journal.pone.0243159>. A multi-trait genomic best linear unbiased prediction (MT-GBLUP) model is used to simultaneously estimate GEBVs of the target traits, and then a selection index is adopted to evaluate the composite performance of an individual.
A Hidden Markov Model (HMM) based on the symmetric lambda distribution framework is implemented for the study of return time series in the financial market. Major features of the S&P 500 index, such as regime identification, volatility clustering, and anti-correlation between return and volatility, can be extracted from the HMM cleanly. The univariate symmetric lambda distribution is essentially a location-scale family of the exponential power distribution. Such a distribution is suitable for describing highly leptokurtic time series obtained from the financial market, and it provides a theoretically solid foundation to explore such data where the normal distribution is not adequate. The HMM implementation follows closely the book "Hidden Markov Models for Time Series" by Zucchini, MacDonald, and Langrock (2016).
Microbial growth is often measured by growth curves, i.e., tables of population sizes and measurement times. This package allows such growth curve data to be used to determine the duration of the "microbial lag phase", i.e., the time needed for microbes to restart divisions. It implements the most commonly used methods to calculate the lag duration; these methods are discussed and described in Opalek et al. (2022). Citation: Smug, B. J., Opalek, M., Necki, M., & Wloch-Salamon, D. (2024). Microbial lag calculator: A shiny-based application and an R package for calculating the duration of microbial lag phase. Methods in Ecology and Evolution, 15, 301–307 <doi:10.1111/2041-210X.14269>.
Generates chronological and ordered p-plots for data vectors or vectors of p-values. The p-plot visualizes the evolution of the p-value of a significance test across the sampled data. It allows for assessing the consistency of the observed effects, for detecting the presence of potential moderator variables, and for estimating the influence of outlier values on the observed results. For non-significant findings, it can diagnose patterns indicative of underpowered study designs. The p-plot can thus either back the binary accept-vs-reject decision of common null-hypothesis significance tests, or it can qualify this decision and stimulate additional empirical work to arrive at more robust and replicable statistical inferences.
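A chronological p-plot is easy to sketch in base R; the following illustrates the idea and is not this package's plotting routine:

set.seed(1)
x <- rnorm(60, mean = 0.3)                          # simulated data with a small true effect
n <- 5:length(x)
p <- sapply(n, function(k) t.test(x[1:k])$p.value)  # p-value recomputed as data accrue
plot(n, p, type = "l", log = "y", xlab = "sample size", ylab = "p-value")
abline(h = 0.05, lty = 2)                           # conventional significance threshold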
Perform hierarchical Bayesian Aldrich-McKelvey scaling using Hamiltonian Monte Carlo via 'Stan'. Aldrich-McKelvey ('AM') scaling is a method for estimating the ideological positions of survey respondents and political actors on a common scale using positional survey data. The hierarchical versions of the Bayesian AM model included in this package outperform other versions both in terms of yielding meaningful posterior distributions for respondent positions and in terms of recovering true respondent positions in simulations. The package contains functions for preparing data, fitting models, extracting estimates, plotting key results, and comparing models using cross-validation. The original version of the default model is described in Bølstad (2024) <doi:10.1017/pan.2023.18>.
Optimal k Nearest Neighbours Ensemble is an ensemble of base k nearest neighbour models, each constructed on a bootstrap sample with a random subset of features. For a test point "x", the k closest observations are identified in each base k nearest neighbour model, and a stepwise regression is fitted to them to predict the output value of "x". The final predicted value of "x" is the mean of the estimates given by all the models. The implemented model takes training and test datasets, trains on the training data, and predicts the test data. Ali, A., Hamraz, M., Kumam, P., Khan, D.M., Khalil, U., Sulaiman, M. and Khan, Z. (2020) <DOI:10.1109/ACCESS.2020.3010099>.
This package provides functions for pooling/combining the results (i.e., p-values) from (dependent) hypothesis tests. Included are Fisher's method, Stouffer's method, the inverse chi-square method, the Bonferroni method, Tippett's method, and the binomial test. Each method can be adjusted based on an estimate of the effective number of tests or using an empirically derived null distribution based on pseudo replicates. For Fisher's, Stouffer's, and the inverse chi-square method, direct generalizations based on multivariate theory are also available (leading to Brown's method, Strube's method, and the generalized inverse chi-square method). An introduction can be found in Cinar and Viechtbauer (2022) <doi:10.18637/jss.v101.i01>.
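For intuition, Fisher's method for k independent tests reduces to two lines of base R (the package's implementations add the dependence adjustments described above):

p <- c(0.021, 0.18, 0.44)       # p-values from three tests
statistic <- -2 * sum(log(p))   # chi-squared with 2k degrees of freedom under the joint null
pchisq(statistic, df = 2 * length(p), lower.tail = FALSE)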
COCOA is a method for understanding epigenetic variation among samples. COCOA can be used with epigenetic data that includes genomic coordinates and an epigenetic signal, such as DNA methylation and chromatin accessibility data. At a high level, COCOA quantifies inter-sample variation with either a supervised or an unsupervised technique and then uses a database of "region sets" to annotate the variation among samples. A region set is a set of genomic regions that share a biological annotation, for instance transcription factor (TF) binding regions, histone modification regions, or open chromatin regions. COCOA can identify region sets that are associated with epigenetic variation between samples and increase understanding of variation in your data.
Quantification and differential analysis of mass-spectrometry proteomics data, with probabilistic recovery of information from missing values. Estimates the detection probability curve (DPC), which relates the probability of successful detection to the underlying expression level of each peptide, and uses it to incorporate peptide missing values into protein quantification and into subsequent differential expression analyses. The package produces objects suitable for downstream analysis in limma. The package accepts peptide-level data with missing values and produces complete protein quantifications without missing values. The uncertainty introduced by missing value imputation is propagated through to the limma analyses using variance modeling and precision weights. The package name "limpa" is an acronym for "Linear Models for Proteomics Data".
DVB-T dongles based on the Realtek RTL2832U can be used as a cheap software defined radio, since the chip allows transferring the raw I/Q samples to the host. rtl-sdr provides drivers for this purpose.
The default Linux driver managing DVB-T dongles as TV devices doesn't work for SDR purposes and clashes with this package. Therefore you must prevent the kernel from loading it automatically by adding the following line to your system configuration:
(kernel-arguments '("modprobe.blacklist=dvb_usb_rtl28xxu"))
To install the rtl-sdr udev rules, you must extend 'udev-service-type' with this package. E.g.: (udev-rules-service 'rtl-sdr rtl-sdr)
This package provides methods for (auto)covariance/correlation function estimation in change point regression with stationary errors, circumventing the pre-estimation of the underlying signal of the observations. A generic, first-order, (m+1)-gapped, difference-based autocovariance function estimator is based on M. Levine and I. Tecuapetla-Gómez (2023) <doi:10.48550/arXiv.1905.04578>. A bias-reducing, second-order, (m+1)-gapped, difference-based estimator is based on I. Tecuapetla-Gómez and A. Munk (2017) <doi:10.1111/sjos.12256>. A robust autocovariance estimator for change point regression with autoregressive errors is based on S. Chakar et al. (2017) <doi:10.3150/15-BEJ782>. It also includes a general projection-based method for covariance matrix estimation.
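The classical first-order idea that these estimators generalize fits in a few lines of base R (an illustrative sketch, not this package's interface): for y_i = f(x_i) + e_i with f smooth, squared first differences estimate the error variance without estimating f:

set.seed(1)
x <- seq(0, 1, length.out = 200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
sum(diff(y)^2) / (2 * (length(y) - 1))   # difference-based estimate of sigma^2 = 0.09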
In the USA, companies file different forms with the U.S. Securities and Exchange Commission (SEC) through EDGAR (Electronic Data Gathering, Analysis, and Retrieval system). The EDGAR database automated system collects all the different necessary filings and makes them publicly available. This package facilitates retrieving, storing, searching, and parsing of all the available filings on the EDGAR server. It downloads filings from the SEC server in bulk with a single query. Additionally, it provides various useful functions: extracting 8-K triggering events, extracting the "Business (Item 1)" and "Management's Discussion and Analysis (Item 7)" sections of annual statements, searching filings for desired keywords, providing sentiment measures, parsing filing header information, and providing an HTML view of SEC filings.
This package performs dose assignment and trial simulation for the FBCRM (Fully Bayesian Continual Reassessment Method) and MFBCRM (Mixture Fully Bayesian Continual Reassessment Method) phase I clinical trial designs. These trial designs extend the Continual Reassessment Method (CRM) and Bayesian Model Averaging Continual Reassessment Method (BMA-CRM) by allowing the prior toxicity skeleton itself to be random, with posterior distributions obtained from Markov Chain Monte Carlo. On average, the FBCRM and MFBCRM methods outperformed the CRM and BMA-CRM methods in terms of selecting an optimal dose level across thousands of randomly generated simulation scenarios. Details on the methods and results of this simulation study are available on request, and the manuscript is currently under review.
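As background, the classical one-parameter CRM update that FBCRM extends can be approximated on a grid in base R. The sketch below keeps the skeleton fixed and is purely illustrative, since FBCRM's point is precisely to place a prior on the skeleton itself:

skeleton <- c(0.05, 0.10, 0.20, 0.35)   # fixed prior toxicity skeleton
dose <- c(2, 2, 2); tox <- c(0, 1, 0)   # doses given and toxicity outcomes so far
target <- 0.25
a <- seq(-3, 3, length.out = 601)       # grid over the model parameter
prior <- dnorm(a, 0, sqrt(2))
lik <- sapply(a, function(ai) { p <- skeleton[dose]^exp(ai); prod(p^tox * (1 - p)^(1 - tox)) })
post <- prior * lik / sum(prior * lik)
p_hat <- sapply(skeleton, function(s) sum(s^exp(a) * post))  # posterior mean toxicity per dose
which.min(abs(p_hat - target))          # next recommended dose level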
This package provides a tool for computing probabilities and other quantities that are relevant in selecting performance criteria for discrete trial training. The main function, miebl(), computes Bayesian and frequentist probabilities and bounds for each of n possible performance criterion choices when attempting to determine a student's true mastery level by counting their number of successful attempts at displaying learning among n trials. The reporting function miebl_re() takes output from miebl() and prepares it into a brief report for a specific criterion. miebl_cp() combines 2 to 5 distributions of true mastery level given performance criterion in one plot for comparison. Ramos (2025) <doi:10.1007/s40617-025-01058-9>.
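The core computation behind such criteria can be sketched with a conjugate Beta-binomial model in base R (illustrative only, not the package's implementation): with a Beta(1, 1) prior on true mastery p, the posterior after s successes in n trials is Beta(1 + s, 1 + n - s):

n <- 10; s <- 9                                    # 9 successes in 10 trials
pbeta(0.8, 1 + s, 1 + n - s, lower.tail = FALSE)   # P(mastery > 0.8 | data)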
The classical two-sample t-test works well for normally distributed data or data with large sample sizes. The tcfu() and tt() tests implemented in this package provide better type-I-error control and more accurate power when testing the equality of two-sample means for skewed populations having unequal variances. These tests are especially useful when the sample sizes are moderate. The tcfu() test uses the Cornish-Fisher expansion to achieve a better approximation to the true percentiles. The tt() test provides transformations of the Welch t-statistic so that the sampling distribution becomes more symmetric. For more technical details, please refer to Zhang (2019) <http://hdl.handle.net/2097/40235>.
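For reference, the Welch statistic that tt() transforms is straightforward to compute in base R (the argument conventions of tcfu() and tt() themselves may differ from this sketch):

set.seed(1)
x <- rexp(30); y <- rexp(25, rate = 1.2)   # two skewed samples with unequal variances
(mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))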
Consolidates and calculates different sets of time-series features from multiple R and Python packages including Rcatch22 Henderson, T. (2021) <doi:10.5281/zenodo.5546815>, feasts O'Hara-Wild, M., Hyndman, R., and Wang, E. (2021) <https://CRAN.R-project.org/package=feasts>, tsfeatures Hyndman, R., Kang, Y., Montero-Manso, P., Talagala, T., Wang, E., Yang, Y., and O'Hara-Wild, M. (2020) <https://CRAN.R-project.org/package=tsfeatures>, tsfresh Christ, M., Braun, N., Neuffer, J., and Kempa-Liehr A.W. (2018) <doi:10.1016/j.neucom.2018.03.067>, TSFEL Barandas, M., et al. (2020) <doi:10.1016/j.softx.2020.100456>, and Kats Facebook Infrastructure Data Science (2021) <https://facebookresearch.github.io/Kats/>.
This package provides functions for estimating models using a Hierarchical Bayesian (HB) framework. The flexibility comes in allowing the user to specify the likelihood function directly instead of assuming predetermined model structures. Types of models that can be estimated with this code include the family of discrete choice models (Multinomial Logit, Mixed Logit, Nested Logit, Error Components Logit, and Latent Class) as well as ordered response models like ordered probit and ordered logit. In addition, the package allows for flexibility in specifying parameters as either fixed (non-varying across individuals) or random with continuous distributions. Parameter distributions supported include normal, positive/negative log-normal, positive/negative censored normal, and the Johnson SB distribution. Kenneth Train's Matlab and Gauss code for doing Hierarchical Bayesian estimation has served as the basis for a few of the functions included in this package. These Matlab/Gauss functions have been rewritten to be optimized within R. Considerable code has been added to increase the flexibility and usability of the code base. Train's original Gauss and Matlab code can be found here: <http://elsa.berkeley.edu/Software/abstracts/train1006mxlhb.html>. See Train's chapter on HB in Discrete Choice with Simulation here: <http://elsa.berkeley.edu/books/choice2.html>, and his paper on using HB with non-normal distributions here: <http://eml.berkeley.edu//~train/trainsonnier.pdf>. The authors would also like to thank the invaluable contributions of Stephane Hess and the Choice Modelling Centre: <https://cmc.leeds.ac.uk/>.
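To illustrate what specifying the likelihood directly looks like, the function below computes per-observation multinomial logit choice probabilities for two alternatives; its shape is illustrative and is not the signature this package requires:

mnl_likelihood <- function(beta, x1, x2, choice) {
  u1 <- as.matrix(x1) %*% beta            # utility of alternative 1
  u2 <- as.matrix(x2) %*% beta            # utility of alternative 2
  p1 <- exp(u1) / (exp(u1) + exp(u2))     # logit probability of choosing alternative 1
  ifelse(choice == 1, p1, 1 - p1)         # likelihood contribution per observation
}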
This package provides various basis expansions for flexible regression modeling, including random Fourier features (Rahimi & Recht, 2007) <https://proceedings.neurips.cc/paper_files/paper/2007/file/013a006f03dbc5392effeb8f18fda755-Paper.pdf>, exact kernel / Gaussian process feature maps, Bayesian Additive Regression Trees (BART) (Chipman et al., 2010) <doi:10.1214/09-AOAS285> prior features, and a helpful interface for n-way interactions. The provided functions may be used within any modeling formula, allowing the use of kernel methods and other basis expansions in modeling functions that do not otherwise support them. Along with the basis expansions, a number of kernel functions are also provided, which support kernel arithmetic to form new kernels. Basic ridge regression functionality is included as well.
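A minimal random Fourier features sketch in base R (names here are illustrative, not this package's interface): for a Gaussian kernel with bandwidth sigma, cosine features with Gaussian-distributed frequencies approximate the kernel as an inner product:

rff <- function(X, D = 200, sigma = 1) {
  X <- as.matrix(X)
  W <- matrix(rnorm(ncol(X) * D, sd = 1 / sigma), ncol(X), D)  # random frequencies
  b <- runif(D, 0, 2 * pi)                                     # random phases
  sqrt(2 / D) * cos(sweep(X %*% W, 2, b, "+"))
}
X <- matrix(rnorm(20), 10, 2)
Z <- rff(X)
Z %*% t(Z)   # approximates the kernel matrix exp(-||x - y||^2 / (2 * sigma^2))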
This package is a comprehensive tool for the analysis of India Meteorological Department (IMD) NetCDF rainfall data, specifically designed to process high-resolution daily gridded rainfall datasets. It provides four key functions to process IMD NetCDF rainfall data and create rasters for various temporal scales, including annual, seasonal, monthly, and weekly rainfall. For method details see Malik, A. (2019) <DOI:10.1007/s12517-019-4454-5>. It supports different aggregation methods, such as sum, min, max, mean, and standard deviation. These functions are designed for spatio-temporal analysis of rainfall patterns, trend analysis, geostatistical modeling of rainfall variability, and identification of rainfall anomalies and extreme events, and their outputs can serve as inputs for hydrological and agricultural models.
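The kind of temporal aggregation described can be sketched with the terra package (the file name and date range below are placeholders, and this is not this package's own interface):

library(terra)
r <- rast("imd_rainfall_daily.nc")       # daily gridded rainfall stack
time(r) <- seq(as.Date("2020-01-01"), by = "day", length.out = nlyr(r))
annual <- tapp(r, index = "years", fun = sum)     # annual rainfall totals
monthly <- tapp(r, index = "months", fun = mean)  # mean rainfall by calendar month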