Differential exon usage test for RNA-Seq data via an empirical Bayes shrinkage method for the dispersion parameter that utilizes inclusion-exclusion data to analyze the propensity to skip an exon across groups. The input data consist of two matrices in which each row represents an exon and each column a biological sample. The first matrix counts, for each sample, the reads expressing the exon. The second matrix counts the reads that either express the exon or explicitly skip it, a.k.a. the total count matrix. Dividing the first matrix by the second yields proportions representing each sample's propensity to express rather than skip the exon.
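A minimal base-R sketch of this input layout (the matrices, sample names, and values below are illustrative, not part of the package API):

    inclusion <- matrix(c(12, 30, 8, 25, 40, 9), nrow = 2,
                        dimnames = list(c("exon1", "exon2"), c("s1", "s2", "s3")))
    total <- matrix(c(20, 50, 20, 50, 60, 20), nrow = 2,
                    dimnames = list(c("exon1", "exon2"), c("s1", "s2", "s3")))
    psi <- inclusion / total  # per-sample propensity to express each exon
    psi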
Implementations of several multiple testing procedures that control the family-wise error rate (FWER) designed specifically for discrete tests. Included are discrete adaptations of the Bonferroni, Holm, Hochberg and Šidák procedures as described in the papers Döhler (2010) "Validation of credit default probabilities using multiple-testing procedures" <doi:10.21314/JRMV.2010.062> and Zhu & Guo (2019) "Family-Wise Error Rate Controlling Procedures for Discrete Data" <doi:10.1080/19466315.2019.1654912>. The main procedures of this package take as input the results of a test procedure from package DiscreteTests or a set of observed p-values and their discrete support under their nulls. A shortcut function to apply discrete procedures directly to data is also provided.
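For orientation, a minimal base-R illustration of the continuous Holm baseline that these discrete procedures sharpen by exploiting the attainable p-value supports (the p-values are made up):

    p <- c(0.001, 0.004, 0.019, 0.095, 0.201)
    p.adjust(p, method = "holm") <= 0.05  # rejections at FWER level 0.05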
Presents two methods to estimate the parameters 'mu', 'sigma', and 'tau' of an ex-Gaussian distribution: Quantile Maximization Likelihood Estimation ('QMLE') and Bayesian estimation. The QMLE method allows a choice among three estimation algorithms: neldermead ('NEMD'), fminsearch ('FMIN'), and nlminb ('NLMI'). For more details about the methods, refer to the following references: Brown, S., & Heathcote, A. (2003) <doi:10.3758/BF03195527>; McCormack, P. D., & Wright, N. M. (1964) <doi:10.1037/h0083285>; Van Zandt, T. (2000) <doi:10.3758/BF03214357>; El Haj, A., Slaoui, Y., Solier, C., & Perret, C. (2021) <doi:10.19139/soic-2310-5070-1251>; Gilks, W. R., Best, N. G., & Tan, K. K. C. (1995) <doi:10.2307/2986138>.
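A base-R sketch of the distribution itself: an ex-Gaussian draw is a normal draw plus an independent exponential draw (the parameter values below are illustrative):

    set.seed(1)
    mu <- 400; sigma <- 40; tau <- 100
    rt <- rnorm(1000, mean = mu, sd = sigma) + rexp(1000, rate = 1 / tau)
    hist(rt, breaks = 40, main = "Simulated ex-Gaussian response times")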
This package provides a fast C++ implementation of the design-based Diffusion Decision Model (DDM) and the Linear Ballistic Accumulation (LBA) model. It enables the user to optimise choice response time models by connecting with the Differential Evolution Markov Chain Monte Carlo (DE-MCMC) sampler implemented in the ggdmc package. The package fuses hierarchical modelling, Bayesian inference, choice response time models, and factorial designs, allowing users to build their own design-based models. For more information on the underlying models, see Voss, Rothermund, and Voss (2004) <doi:10.3758/BF03196893>, Ratcliff and McKoon (2008) <doi:10.1162/neco.2008.12-06-420>, and Brown and Heathcote (2008) <doi:10.1016/j.cogpsych.2007.12.002>.
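To fix ideas, a base-R sketch of a single LBA trial under illustrative parameter values (a conceptual demo of the model, not the package's interface):

    set.seed(1)
    A <- 0.5; b <- 1; v <- c(2, 1); s <- 1   # start range, threshold, drift means, drift sd
    start <- runif(2, 0, A)                  # uniform start points, one per accumulator
    drift <- pmax(rnorm(2, v, s), 1e-6)      # sampled drift rates (negatives truncated here)
    time <- (b - start) / drift              # time for each accumulator to reach threshold
    c(choice = which.min(time), rt = min(time))  # winning accumulator and decision time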
This package provides a simple-to-use, intuitive, and extensible interface to several stochastic simulation algorithms for generating simulated trajectories of finite-population continuous-time models. Currently it implements Gillespie's exact stochastic simulation algorithm (direct method) and several approximate methods (explicit tau-leap, binomial tau-leap, and optimized tau-leap). The package also contains a library of template models that can be run as demos and easily customized and extended. Currently the following models are included: the Decaying-Dimerization reaction set, a linear chain system, a logistic growth model, the Lotka predator-prey model, the Rosenzweig-MacArthur predator-prey model, the Kermack-McKendrick SIR model, and a metapopulation SIRS model. See Pineda-Krch (2008) <doi:10.18637/jss.v025.i12>.
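The direct method at the heart of the package can be sketched in a few lines of base R for a single decay reaction X -> 0 with rate constant 0.5 (the package's interface and template models are far richer than this):

    set.seed(42)
    x <- 100; t <- 0; k <- 0.5
    times <- t; states <- x
    while (x > 0) {
      a <- k * x                   # propensity of the decay reaction
      t <- t + rexp(1, rate = a)   # exponential waiting time to the next event
      x <- x - 1                   # fire the reaction
      times <- c(times, t); states <- c(states, x)
    }
    plot(times, states, type = "s", xlab = "time", ylab = "X")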
An integrative toolbox for word embedding research that provides: (1) a collection of pre-trained static word vectors in the compressed .RData format <https://psychbruce.github.io/WordVector_RData.pdf>; (2) a group of functions to process, analyze, and visualize word vectors; (3) a range of tests to examine conceptual associations, including the Word Embedding Association Test <doi:10.1126/science.aal4230> and the Relative Norm Distance <doi:10.1073/pnas.1720347115>, with permutation tests of significance; and (4) a set of training methods to locally train (static) word vectors from text corpora, including Word2Vec <doi:10.48550/arXiv.1301.3781>, GloVe <doi:10.3115/v1/D14-1162>, and FastText <doi:10.48550/arXiv.1607.04606>.
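Association tests such as the Word Embedding Association Test are built on cosine similarities between word vectors; a base-R illustration with made-up 3-dimensional vectors:

    cos_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
    king <- c(0.50, 0.70, 0.10)    # toy "word vectors", not real embeddings
    queen <- c(0.45, 0.72, 0.15)
    cos_sim(king, queen)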
This tool fits a non-parametric Bayesian model called a "hierarchically coupled mixture model with local dependence (HCMM-LD)" to the original microdata in order to generate synthetic microdata for privacy protection. The non-parametric feature of the adopted model is useful for capturing the joint distribution of the original input data in a highly flexible manner, leading to synthetic data whose distributional features are similar to those of the input data. The package allows the original input data to have missing values, which it imputes from the posterior predictive distribution, so no missing values remain in the synthetic output. The method builds on the work of Murray and Reiter (2016) <doi:10.1080/01621459.2016.1174132>.
This package provides a web-based shiny interface for the StepReg package that enables stepwise regression analysis across linear, generalized linear (including logistic, Poisson, Gamma, and negative binomial), and Cox models. It supports forward, backward, bidirectional, and best-subset selection under a range of criteria. The package also extends stepwise regression to multivariate settings, allowing multiple dependent variables to be modeled simultaneously. Users can explore and combine multiple selection strategies and criteria to optimize model selection. For enhanced robustness, the package offers optional randomized forward selection to reduce overfitting, and a data-splitting workflow for more reliable post-selection inference. Additional features include logging and visualization of the selection process, as well as the ability to export results in common formats.
Facilitates the analysis of SNP (single nucleotide polymorphism) and silicodart (presence/absence) data. dartR.popgen provides a suite of functions to analyse such data in a population genetics context. It provides several functions to calculate population genetic metrics and to study population structure. Quite a few functions need additional software to run (gl.run.structure(), gl.blast(), gl.LDNe()); the help pages describe in detail how to download and link this software so the functions can run it. dartR.popgen is part of the dartRverse suite of packages. Gruber et al. (2018) <doi:10.1111/1755-0998.12745>. Mijangos et al. (2022) <doi:10.1111/2041-210X.13918>.
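A hedged sketch of calling one of these wrappers; only the function name comes from the description above, the argument names are assumptions, and the external STRUCTURE binary must be installed and linked as the help pages describe:

    library(dartR.popgen)
    # gl is a genlight object of SNP genotypes; exec points to the STRUCTURE binary
    # res <- gl.run.structure(gl, k.range = 2:5, exec = "/path/to/structure")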
This package provides a function that implements the acceptance-rejection method in an optimized manner to generate pseudo-random observations for discrete or continuous random variables. Proposed by von Neumann J. (1951), <https://mcnp.lanl.gov/pdf_files/>, the function is optimized to work in parallel on Unix-based operating systems and performs well on Windows systems. The implemented acceptance-rejection method optimizes the probability of generating observations from the desired random variable; the user need only provide the probability function (discrete case) or probability density function (continuous case). The implementation is based on the references CASELLA, George et al. (2004) <https://www.jstor.org/stable/4356322>, NEAL, Radford M. (2003) <https://www.jstor.org/stable/3448413>, and Bishop, Christopher M. (2006, ISBN: 978-0387310732).
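The underlying algorithm is easy to sketch in base R; here a Beta(2, 2) target is sampled through a Uniform(0, 1) proposal with envelope constant M = 1.5 (the package's own function adds optimization and parallelism on top of this idea):

    set.seed(1)
    f <- function(x) dbeta(x, 2, 2)  # target density, maximum 1.5 at x = 0.5
    M <- 1.5                         # envelope: f(x) <= M * dunif(x) on [0, 1]
    draws <- numeric(0)
    while (length(draws) < 1000) {
      y <- runif(1)                                    # proposal draw
      if (runif(1) <= f(y) / M) draws <- c(draws, y)   # accept with probability f(y)/M
    }
    hist(draws, breaks = 30, freq = FALSE); curve(f(x), add = TRUE)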
This package provides functions to access data from public RESTful APIs, including Nager.Date, the World Bank API, and the REST Countries API, retrieving real-time or historical data related to Indonesia, such as holidays, economic indicators, and international demographic and geopolitical indicators. The package also includes a curated collection of open datasets focused on Indonesia, covering topics such as consumer prices, poverty probability, food prices by region, tourism destinations, and minimum wage statistics. The package supports reproducible research and teaching by integrating reliable international APIs and structured datasets from public, academic, and government sources. For more information on the APIs, see: Nager.Date <https://date.nager.at/Api>, World Bank API <https://datahelpdesk.worldbank.org/knowledgebase/articles/889392>, and REST Countries API <https://restcountries.com/>.
This package provides a framework to help construct R data packages in a reproducible manner. Potentially time-consuming processing of raw data sets into analysis-ready data sets is done reproducibly and decoupled from the usual R CMD build process, so that data sets can be processed into R objects in the data package, and the data package can then be shared, built, and installed by others without the need to repeat computationally costly data processing. The package maintains data provenance by turning the data processing scripts into package vignettes, as well as enforcing documentation and version checking of included data objects. Data packages can be version controlled on GitHub and used to share data for manuscripts, collaboration, and reproducible research.
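A hedged sketch of the typical workflow; datapackage_skeleton() and package_build() are the package's entry points, while the script contents, package name, and object name here are illustrative:

    library(DataPackageR)
    script <- file.path(tempdir(), "process_raw.R")
    writeLines("analysis_ready <- data.frame(x = 1:3)", script)   # toy processing script
    datapackage_skeleton(name = "mydata", path = tempdir(),
                         code_files = script, r_object_names = "analysis_ready")
    package_build(file.path(tempdir(), "mydata"))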
Identification of putative causal variants in genome-wide association studies with trio and duo families. The package calculates the W feature statistics from KnockoffTrio and p-values from the family-based association test (FBAT) using trio and/or duo data. Compared to previous versions, a significant improvement in Version 1.1.0 allows the package to be applied not only to trio families but also to duo families. The package implements the methods in the paper: Yang, Y., Wang, C., Liu, L., Buxbaum, J., He, Z., & Ionita-Laza, I. (2022). KnockoffTrio: A knockoff framework for the identification of putative causal variants in genome-wide association studies with trio design. The American Journal of Human Genetics, 109(10), 1761-1776.
This package provides tools for calculating I-Scores, a simple way to measure how successful minor political parties are at influencing the major parties in their environment. I-Scores are designed to be a more comprehensive measurement of minor party success than vote share and legislative seats won, the current standard measurements, which do not reflect the strategies that most minor parties employ. The procedure leverages the Manifesto Project's NLP model to identify the issue areas that sentences discuss, see Burst et al. (2024) <doi:10.25522/manifesto.manifestoberta.56topics.context.2024.1.1>, and the Wordfish algorithm to estimate the relative positions that platforms take on those issue areas, see Slapin and Proksch (2008) <doi:10.1111/j.1540-5907.2008.00338.x>.
Two-stage designs for single-arm phase II trials with time-to-event endpoints (e.g., clinical trials on immunotherapies among cancer patients) can be calculated using this package. Two notable advantages of the package: 1) it provides flexible choices among three design methods (optimal, minmax, and admissible), and 2) the power of the design is more accurately calculated using the exact variance in the one-sample log-rank test. The package can be used for 1) planning the sample sizes and other design parameters, and 2) conducting the interim and final analyses for the Go/No-go decisions. More details about the design method can be found in Wu J, Chen L, Wei J, Weiss H, Chauhan A (2020) <doi:10.1002/pst.1983>.
Develop, evaluate, and score multiple choice examinations, psychological scales, questionnaires, and similar types of data involving sequences of choices among one or more sets of answers. This version of the package should be considered brand new: almost all of the functions have been changed, including their argument lists. See the file NEWS.Rd in the inst folder for more information. Using the package does not require any formal statistical knowledge beyond what would be provided by a first course in statistics in a social science department. There the user would encounter the concept of probability and how it is used to model data and make decisions, and would become familiar with basic mathematical and statistical notation. Most of the output is in graphical form.
motifcounter provides motif matching, motif counting, and motif enrichment functionality based on position frequency matrices. The main features of the package include the use of higher-order background models and accounting for self-overlapping motif matches when determining motif enrichment. The background model captures dinucleotide (or higher-order nucleotide) composition adequately, which may reduce model biases and misleading results compared to simple GC background models. When conducting a motif enrichment analysis based on the motif match count, the package relies on a compound Poisson distribution or, alternatively, a combinatorial model. These distributions account for self-overlapping motif structures, as exemplified by repeat-like or palindromic motifs, and allow the p-value and fold-enrichment to be determined for a set of observed motif matches.
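A base-R sketch of the basic matching step that such analyses build on: convert a position frequency matrix to log-odds scores against a uniform background and score every sequence window (the 4 x 3 matrix and the sequence are made up; the package itself adds higher-order backgrounds and overlap-aware enrichment statistics on top):

    pfm <- matrix(c(8, 1, 0, 1,  1, 8, 1, 0,  0, 1, 8, 1), nrow = 4,
                  dimnames = list(c("A", "C", "G", "T"), NULL))  # consensus ACG
    pwm <- log2((pfm + 1) / colSums(pfm + 1)[col(pfm)] / 0.25)   # pseudocount 1, uniform bg
    s <- strsplit("ACGTACG", "")[[1]]
    idx <- match(s, rownames(pfm))
    scores <- sapply(1:(length(s) - ncol(pwm) + 1), function(i)
      sum(pwm[cbind(idx[i:(i + ncol(pwm) - 1)], seq_len(ncol(pwm)))]))
    scores  # per-position match scores; high values indicate motif hits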
Estimation of bifurcating autoregressive models of any order, p, BAR(p), as well as several types of bias correction for the least squares estimators of the autoregressive parameters as described in Zhou and Basawa (2005) <doi:10.1016/j.spl.2005.04.024> and Elbayoumi and Mostafa (2020) <doi:10.1002/sta4.342>. Currently, the supported bias correction methods include bootstrap (single, double, and fast-double) bias correction and linear-bias-function-based bias correction. Functions for generating and plotting bifurcating autoregressive data from any BAR(p) model are also included. This new version adds several types of bias-corrected and uncorrected confidence intervals for the least squares estimators of the autoregressive parameters as described in Elbayoumi and Mostafa (2023) <doi:10.6339/23-JDS1092>.
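A base-R sketch of the BAR(1) recursion (the coefficients and noise scale are illustrative): each cell t passes its observation to daughter cells 2t and 2t + 1 through the same autoregression, giving a binary tree of correlated observations.

    set.seed(7)
    n_gen <- 6                        # number of generations in the tree
    x <- numeric(2^n_gen - 1); x[1] <- 0
    a <- 0.2; b <- 0.5                # intercept and autoregressive coefficient
    for (t in 1:(2^(n_gen - 1) - 1)) {
      x[2 * t] <- a + b * x[t] + rnorm(1, sd = 0.1)      # first daughter
      x[2 * t + 1] <- a + b * x[t] + rnorm(1, sd = 0.1)  # second daughter
    }
    head(x)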
This package provides a comprehensive set of wrapper functions for the analysis of multiplex metabarcode data. It includes robust wrappers for Cutadapt and DADA2 to trim primers, filter reads, perform amplicon sequence variant (ASV) inference, and assign taxonomy. The package can handle single metabarcode datasets, datasets with two pooled metabarcodes, or multiple datasets simultaneously. The final output is a matrix per metabarcode, containing both ASV abundance data and associated taxonomic assignments. An optional function converts these matrices into phyloseq and taxmap objects. For more information on DADA2, including how DADA2 infers sample sequences, see Callahan et al. (2016) <doi:10.1038/nmeth.3869>. For more details on the demulticoder R package, see Sudermann et al. (2025) <doi:10.1094/PHYTO-02-25-0043-FI>.
This package provides a simulator for reticulate evolution under a birth-death-hybridization process. Here the birth-death process is extended to reticulate evolution by allowing hybridization events to occur. The general-purpose simulator allows the modeling of three different reticulate patterns: lineage generative hybridization, lineage neutral hybridization, and lineage degenerative hybridization. Users can also specify hybridization events to be dependent on a trait value or genetic distance. We also extend some phylogenetic tree utility and plotting functions to networks. Two different stopping conditions are allowed: simulating to a fixed time or to a fixed number of taxa. When simulating to a fixed number of taxa, the user can simulate under the Generalized Sampling Approach, which properly simulates phylogenies when assuming a uniform prior on the root age.
Fast, flexible, and user-friendly tools for distribution comparison through direct density ratio estimation. The estimated density ratio can be used for covariate shift adjustment, outlier detection, change-point detection, classification, and evaluation of synthetic data quality. The package implements multiple non-parametric estimation techniques (unconstrained least-squares importance fitting, ulsif(); Kullback-Leibler importance estimation procedure, kliep(); spectral density ratio estimation, spectral(); kernel mean matching, kmm(); and least-squares hetero-distributional subspace search, lhss()), with automatic tuning of hyperparameters. Helper functions are available for two-sample testing and for visualizing the density ratios. For a general overview of density ratio estimation, see Sugiyama et al. (2012) <doi:10.1017/CBO9781139035613>; the help files provide references for the specific estimation techniques.
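A hedged sketch of a typical call; ulsif() is named above, but the two-data-frame calling pattern is an assumption based on common usage, so check the help files:

    library(densityratio)
    set.seed(1)
    num <- data.frame(x = rnorm(200, mean = 1))  # numerator sample
    den <- data.frame(x = rnorm(200, mean = 0))  # denominator sample
    fit <- ulsif(num, den)                       # estimate the density ratio num/den
    summary(fit)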
This package provides R with access to the Glottolog database <https://glottolog.org/> and additional functionality for linguistic mapping. The Glottolog database contains a catalogue of the world's languages. This package helps researchers make linguistic maps, following the philosophy of the Cross-Linguistic Linked Data project <https://clld.org/>, which facilitates uniform access to the data across publications. A tutorial for this package is available on its GitHub pages <https://docs.ropensci.org/lingtypology/> and in the package vignette. Maps created by this package can be used both for research and for teaching linguistics. In addition, the package provides the ability to download data from typological databases such as WALS, AUTOTYP, and others, and to create your own database website.
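A hedged sketch of the core mapping call; map.feature() is lingtypology's main mapping function, while the language names and feature values here are illustrative:

    library(lingtypology)
    map.feature(languages = c("Adyghe", "Kabardian", "Russian"),
                features = c("polysynthetic", "polysynthetic", "fusional"))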
We propose a consistent monitoring procedure to detect a structural change from a cointegrating relationship to a spurious relationship. The procedure is based on residuals from modified least squares estimation, using either Fully Modified, Dynamic, or Integrated Modified OLS. It is inspired by Chu et al. (1996) <doi:10.2307/2171955> in that it is based on parameter estimation on a pre-break "calibration" period only, rather than on sequential estimation over the full sample. See the discussion paper <doi:10.2139/ssrn.2624657> for further information. This package provides the monitoring procedures for both the cointegration and the stationarity case (the latter being a special case of the former), as well as printing and plotting methods for a clear presentation of the results.
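A base-R illustration of the monitored setting, in which a pair is cointegrated on a calibration period and turns spurious after a break (purely synthetic data, not the package's interface):

    set.seed(3)
    n <- 200; brk <- 120
    x <- cumsum(rnorm(n))                        # I(1) regressor
    e <- c(rnorm(brk), cumsum(rnorm(n - brk)))   # stationary errors, then I(1)
    y <- 1 + 2 * x + e                           # cointegration breaks at t = brk + 1
    plot(y - 1 - 2 * x, type = "l", ylab = "true residual")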
An implementation of ggplot2 methods to present the composition of the Solvency II Solvency Capital Requirement (SCR) as a series of concentric circle parts. Solvency II (Solvency 2) is European insurance legislation that came into force through the delegated acts of October 10, 2014 <https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ%3AL%3A2015%3A012%3ATOC>. Additional files defining the structure of the Standard Formula (SF) method of the SCR calculation are provided. The structure files can be adapted for localization or for insurance companies that use Internal Models (IM). Options are available for combining smaller components, horizontal and vertical scaling, rotation, and plotting only some circle parts. With outlines and connectors, several SCR compositions can be compared, for example in ORSA scenarios (Own Risk and Solvency Assessment).