Likelihood-based approaches to estimate linear regression parameters and treatment effects in the presence of endogeneity. Specifically, this package includes James Heckman's classical simultaneous equation models: the sample selection model for outcome selection bias and the hybrid model with structural shift for endogenous treatment. For more information, see the seminal paper of Heckman (1978) <DOI:10.3386/w0177>, which provides the details of these models. This package accommodates repeated measures on subjects with a working independence approach. The hybrid model further accommodates treatment effect modification.
This package implements a method of iteratively collapsing the rows of a contingency table, two at a time, by selecting the pair of categories whose combination yields a new table with the smallest loss of chi-squared, as described by Greenacre, M.J. (1988) <doi:10.1007/BF01901670>. The result is compatible with the class of object returned by the stats package's hclust() function and can be used similarly (plotted as a dendrogram, cut, etc.). Additional functions are provided for automatic cutting and diagnostic plotting.
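The collapsing procedure described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the package's interface: the function names (`chi_squared`, `merge_rows`, `collapse_once`, `collapse_all`) and the list-of-rows table layout are assumptions made for this example.

```python
from itertools import combinations

def chi_squared(table):
    """Pearson chi-squared statistic of a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

def merge_rows(table, i, j):
    """Return a new table in which rows i and j are summed into one row (appended last)."""
    merged = [a + b for a, b in zip(table[i], table[j])]
    return [row for k, row in enumerate(table) if k not in (i, j)] + [merged]

def collapse_once(table, labels):
    """Merge the pair of rows whose combination loses the least chi-squared."""
    base = chi_squared(table)
    best = None
    for i, j in combinations(range(len(table)), 2):
        loss = base - chi_squared(merge_rows(table, i, j))
        if best is None or loss < best[0]:
            best = (loss, i, j)
    loss, i, j = best
    kept = [label for k, label in enumerate(labels) if k not in (i, j)]
    return merge_rows(table, i, j), kept + [labels[i] + "+" + labels[j]], loss

def collapse_all(table, labels):
    """Repeat the pairwise merge until one row remains; return (merged label, loss) per step."""
    history = []
    while len(table) > 1:
        table, labels, loss = collapse_once(table, labels)
        history.append((labels[-1], loss))
    return history

# Rows "a" and "b" have identical profiles, so merging them loses no chi-squared.
history = collapse_all([[10, 20], [20, 40], [30, 10]], ["a", "b", "c"])
```

The sequence of merges and their losses is the kind of information a dendrogram-style object would record; how the package actually stores it is not shown here.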
Matrix eQTL is designed for fast eQTL analysis on large datasets. Matrix eQTL can test for association between genotype and gene expression using linear regression with either additive or ANOVA genotype effects. The models can include covariates to account for factors such as population stratification, gender, and clinical variables. It also supports models with heteroscedastic and/or correlated errors, false discovery rate estimation and separate treatment of local (cis) and distant (trans) eQTLs. For more details see Shabalin (2012) <doi:10.1093/bioinformatics/bts163>.
Several methods have been developed to integrate structural equation modeling techniques with network data analysis to examine the relationship between network and non-network data. Both node-based and edge-based information can be extracted from the network data to be used as observed variables in structural equation modeling. To facilitate the application of these methods, model specification can be performed in the familiar syntax of the lavaan package, ensuring ease of use for researchers. Technical details and examples can be found at <https://bigsem.psychstat.org>.
When people make decisions, they may do so using a wide variety of decision rules. This package allows users to easily create obfuscation games to test the obfuscation hypothesis. It provides an easy-to-use interface and multiple options designed to vary the difficulty of the game and tailor it to the user's needs. For more detail, see Chorus et al. (2021), Obfuscation maximization-based decision-making: Theory, methodology and first empirical evidence, Mathematical Social Sciences, 109, 28-44, <doi:10.1016/j.mathsocsci.2020.10.002>.
This package implements conjugate power priors for efficient Bayesian analysis of normal data. Power priors allow principled incorporation of historical information while controlling the degree of borrowing through a discounting parameter (Ibrahim and Chen (2000) <doi:10.1214/ss/1009212519>). This package provides closed-form conjugate representations for both univariate and multivariate normal data using Normal-Inverse-Chi-squared and Normal-Inverse-Wishart distributions, eliminating the need for MCMC sampling. The conjugate framework builds upon standard Bayesian methods described in Gelman et al. (2013, ISBN:978-1439840955).
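As a rough illustration of why no MCMC is needed in the univariate case, the sketch below applies the standard Normal-Inverse-Chi-squared update with the historical likelihood raised to a discounting power. It assumes a proper initial prior; the `NIX` class, the `nix_update` function, and the `weight` argument are hypothetical names for this example and are not taken from the package.

```python
from dataclasses import dataclass

@dataclass
class NIX:
    """Normal-Inverse-Chi-squared parameters (mean, prior sample size, dof, scale)."""
    mu: float
    kappa: float
    nu: float
    sigma2: float

def nix_update(prior, data, weight=1.0):
    """Conjugate update with the likelihood raised to the power `weight`.

    weight=1.0 is the ordinary Bayesian update; 0 < weight < 1 discounts the
    data, which is how a power prior borrows from historical observations.
    """
    n = len(data)
    ybar = sum(data) / n
    ss = sum((y - ybar) ** 2 for y in data)
    n_eff = weight * n                      # discounted effective sample size
    kappa = prior.kappa + n_eff
    mu = (prior.kappa * prior.mu + n_eff * ybar) / kappa
    nu = prior.nu + n_eff
    sigma2 = (prior.nu * prior.sigma2
              + weight * ss
              + prior.kappa * n_eff / kappa * (ybar - prior.mu) ** 2) / nu
    return NIX(mu, kappa, nu, sigma2)

initial = NIX(mu=0.0, kappa=1.0, nu=1.0, sigma2=1.0)
historical = [1.0, 2.0, 3.0]
current = [4.0, 6.0]

# Power prior: historical data discounted by 0.5, then a full update with current data.
power_prior = nix_update(initial, historical, weight=0.5)
posterior = nix_update(power_prior, current)
```

With `weight=0` the historical data are ignored and the initial prior is returned unchanged; with `weight=1` the result matches pooling the historical and current data.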
This package provides functions to calculate exact critical values, statistical power, expected time to signal, and required sample sizes for performing exact sequential analysis. All these calculations can be done for either Poisson or binomial data, for continuous or group sequential analyses, and for different types of rejection boundaries. In the case of group sequential analyses, the group sizes do not have to be specified in advance and the alpha spending can be set arbitrarily. For regression versions of the methods, Monte Carlo and asymptotic methods are used.
Representation-dependent gene-level operations for genetic and evolutionary algorithms with real-coded genes are collected in this package. The common feature of the gene operations is that all of them are useful for derivative-free optimization algorithms. At the moment the package implements initialization, mutation, crossover, and replication operations for differential evolution as described in Price, Kenneth V., Storn, Rainer M. and Lampinen, Jouni A. (2005) <doi:10.1007/3-540-31306-0>. In addition, several (more recent) methods for determining the scale factor are provided.
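For readers unfamiliar with differential evolution, the following sketch shows one common variant of the mutation and crossover steps (the classic DE/rand/1 mutation with binomial crossover). It is an illustration under assumptions, not the package's code: the function names and argument names are invented for this example, and the package may implement other variants.

```python
import random

def de_rand_1_mutation(population, target_index, scale_factor, rng):
    """DE/rand/1 mutant: base + F * (a - b), using three distinct members other than the target."""
    candidates = [k for k in range(len(population)) if k != target_index]
    r1, r2, r3 = rng.sample(candidates, 3)
    base, a, b = population[r1], population[r2], population[r3]
    return [x + scale_factor * (y - z) for x, y, z in zip(base, a, b)]

def binomial_crossover(target, mutant, crossover_rate, rng):
    """Take each gene from the mutant with probability CR; one random gene is always taken."""
    forced = rng.randrange(len(target))
    return [
        m if (j == forced or rng.random() < crossover_rate) else t
        for j, (t, m) in enumerate(zip(target, mutant))
    ]

rng = random.Random(42)
population = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(6)]
mutant = de_rand_1_mutation(population, target_index=0, scale_factor=0.8, rng=rng)
trial = binomial_crossover(population[0], mutant, crossover_rate=0.9, rng=rng)
```

In a full optimizer the trial vector would then compete with the target vector, and the better of the two would survive into the next generation.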
CAGE is a widely used high-throughput assay for measuring transcription start site (TSS) activity. CAGEfightR is an R/Bioconductor package for performing a wide range of common data analysis tasks for CAGE and 5'-end data in general. Core functionality includes: import of CAGE TSSs (CTSSs), tag (or unidirectional) clustering for TSS identification, bidirectional clustering for enhancer identification, annotation with transcript and gene models, correlation of TSS and enhancer expression, calculation of TSS shapes, quantification of CAGE expression as expression matrices and genome browser visualization.
Are you spending too much time fetching and managing clinical trial data? Struggling with complex queries and bulk data extraction? What if you could simplify this process with just a few lines of code? Introducing clintrialx - Fetch clinical trial data from sources like ClinicalTrials.gov <https://clinicaltrials.gov/> and the Clinical Trials Transformation Initiative - Access to Aggregate Content of ClinicalTrials.gov database <https://aact.ctti-clinicaltrials.org/>, supporting pagination and bulk downloads. Also, you can generate HTML reports based on the data obtained from the sources!
This package provides functions that compute probabilistic excursion sets, contour credibility regions, contour avoiding regions, and simultaneous confidence bands for latent Gaussian random processes and fields. The package also contains functions that calculate these quantities for models estimated with the INLA package. The main references for excursions are Bolin and Lindgren (2015) <doi:10.1111/rssb.12055>, Bolin and Lindgren (2017) <doi:10.1080/10618600.2016.1228537>, and Bolin and Lindgren (2018) <doi:10.18637/jss.v086.i05>. These can be generated by the citation function in R.
This package provides a color palette generator inspired by Mexican politics, with colors ranging from red on the left to gray in the middle and green on the right. Palette options range from only a few colors to several colors, with both discrete and continuous options to offer the greatest flexibility to the user. This package allows for a range of applications, from mapping brief discrete scales (e.g., four colors for Morena, PRI, and PAN) to continuous interpolated arrays including dozens of shades graded from red to green.
The Pearson-ICA algorithm is a mutual information-based method for blind separation of statistically independent source signals. It has been shown that the minimization of mutual information leads to iterative use of score functions, i.e. derivatives of log densities. The Pearson system allows adaptive modeling of score functions. The flexibility of the Pearson system makes it possible to model a wide range of source distributions including asymmetric distributions. The algorithm is designed especially for problems with asymmetric sources but it works for symmetric sources as well.
Universal and robust algorithm for solving the total alkalinity-pH equation presented in G. Munhoven (2013) <doi:10.5194/gmd-6-1367-2013> and G. Munhoven (2021) <doi:10.5194/gmd-2020-447>. The total alkalinity-pH equation relates total alkalinity and pH for a given set of acid-base concentrations in a given water sample, including carbonic acid. This package is particularly useful in marine chemistry involving dissolved inorganic carbon. The original Fortran package can be found at <doi:10.5281/zenodo.4328965>.
Visual contour and 2D point and contour plots for binary classification modeling under algorithms such as glm, rf, gbm, nnet and svm, presented over two dimensions generated by famd and mca methods. Package FactoMineR for multivariate reduction functions and package MBA for interpolation functions are used. The package can be used to visualize the discriminant power of input variables and algorithmic modeling, explore outliers, compare algorithm behaviour, etc. It was initially created for teaching purposes, but it also has many practical uses under the XAI paradigm.
This package provides a collection of functions for left-censored missing data imputation. Left-censoring is a special case of the missing not at random (MNAR) mechanism that generates non-responses in proteomics experiments. The package also contains functions to artificially generate peptide/protein expression data (log-transformed) as random draws from a multivariate Gaussian distribution as well as a function to generate missing data (both randomly and non-randomly). For comparison purposes, the package also contains several wrapper functions for the imputation of non-responses that are missing at random.
This is a package for converting natural language text into tokens. It includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, tweets, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the stringi and Rcpp packages for fast yet correct tokenization in UTF-8 encoding.
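To make the term "shingled n-grams" concrete, here is a small sketch of word tokenization followed by shingling. The function names, the lowercase normalization, the `\w+` word pattern, and the output order are all assumptions made for this illustration; the package's own tokenizers handle Unicode text more carefully and may order results differently.

```python
import re

def tokenize_words(text):
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())

def shingled_ngrams(words, n=3, n_min=1):
    """Return every contiguous run of n_min to n words, joined by single spaces."""
    grams = []
    for size in range(n_min, n + 1):
        for start in range(len(words) - size + 1):
            grams.append(" ".join(words[start:start + size]))
    return grams

words = tokenize_words("The quick, brown fox.")
grams = shingled_ngrams(words, n=2, n_min=1)
```

Here `grams` holds the four single words followed by the three overlapping two-word shingles.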
This package provides simple, flexible assertions on data.frame or data.table objects with verbose output for vetting. While other assertion packages apply towards more general use-cases, assertable is tailored towards tabular data. It includes functions to check variable names and values, whether the dataset contains all combinations of a given set of unique identifiers, and whether it is a certain length. In addition, assertable includes utility functions to check the existence of target files and to efficiently import multiple tabular data files into one data.table.
The function missForest in this package is used to impute missing values, particularly in the case of mixed-type data. It uses a random forest trained on the observed values of a data matrix to predict the missing values. It can be used to impute continuous and/or categorical data, including complex interactions and non-linear relations. It yields an OOB imputation error estimate without the need of a test set or elaborate cross-validation. It can be run in parallel to save computation time.
Guile-Reader is a simple framework for building readers for GNU Guile.
The idea is to make it easy to build procedures that extend Guile’s read procedure. Readers supporting various syntax variants can easily be written, possibly by re-using existing “token readers” of a standard Scheme reader. For example, it is used to implement Skribilo’s R5RS-derived document syntax.
Guile-Reader’s approach is similar to Common Lisp’s “read table”, but hopefully more powerful and flexible (for instance, one may instantiate as many readers as needed).
An interface to Azure Data Explorer, also known as Kusto, a fast, distributed data exploration service from Microsoft: <https://azure.microsoft.com/en-us/products/data-explorer/>. Includes DBI and dplyr interfaces, with the latter modelled after the dbplyr package, whereby queries are translated from R into the native KQL query language and executed lazily. On the admin side, the package extends the object framework provided by AzureRMR to support creation and deletion of databases, and management of database principals. Part of the AzureR family of packages.
This package provides a simple interface to the Microsoft Graph API <https://learn.microsoft.com/en-us/graph/overview>. Graph is a comprehensive framework for accessing data in various online Microsoft services. This package was originally intended to provide an R interface only to the Azure Active Directory part, with a view to supporting interoperability of R and Azure: users, groups, registered apps and service principals. However, it has since been expanded into a more general tool for interacting with Graph. Part of the AzureR family of packages.
Usually, it is difficult to plot choropleth maps for Bangladesh in R. The bangladesh package provides ready-to-use shapefiles for different administrative regions of Bangladesh (e.g., Division, District, Upazila, and Union). This package helps users to draw thematic maps of administrative regions of Bangladesh easily as it comes with the sf objects for the boundaries. It also provides functions allowing users to efficiently get specific area maps and center coordinates for regions. Users can also search for a specific area and calculate the centroids of those areas.
This package provides methods for probabilistic reconciliation of hierarchical forecasts of time series. The available methods include analytical Gaussian reconciliation (Corani et al., 2021) <doi:10.1007/978-3-030-67664-3_13>, MCMC reconciliation of count time series (Corani et al., 2024) <doi:10.1016/j.ijforecast.2023.04.003>, Bottom-Up Importance Sampling (Zambon et al., 2024) <doi:10.1007/s11222-023-10343-y>, and methods for the reconciliation of mixed hierarchies (Mix-Cond and TD-cond) (Zambon et al., 2024) <https://proceedings.mlr.press/v244/zambon24a.html>.