Are you spending too much time fetching and managing clinical trial data? Struggling with complex queries and bulk data extraction? What if you could simplify this process with just a few lines of code? Introducing clintrialx - Fetch clinical trial data from sources like ClinicalTrials.gov <https://clinicaltrials.gov/> and the Clinical Trials Transformation Initiative - Access to Aggregate Content of ClinicalTrials.gov database <https://aact.ctti-clinicaltrials.org/>, supporting pagination and bulk downloads. Also, you can generate HTML reports based on the data obtained from the sources!
This package provides functions that compute probabilistic excursion sets, contour credibility regions, contour avoiding regions, and simultaneous confidence bands for latent Gaussian random processes and fields. The package also contains functions that calculate these quantities for models estimated with the INLA package. The main references for excursions are Bolin and Lindgren (2015) <doi:10.1111/rssb.12055>, Bolin and Lindgren (2017) <doi:10.1080/10618600.2016.1228537>, and Bolin and Lindgren (2018) <doi:10.18637/jss.v086.i05>. These can be generated by the citation function in R.
This package provides a color palette generator inspired by Mexican politics, with colors ranging from red on the left to gray in the middle and green on the right. Palettes range from a few colors to many, and come in both discrete and continuous variants to give the user the greatest flexibility. This package allows for a range of applications, from mapping brief discrete scales (e.g., four colors for Morena, PRI, and PAN) to continuous interpolated arrays including dozens of shades graded from red to green.
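The continuous red-to-gray-to-green interpolation described above can be sketched as follows. This is an illustration in Python, not the package's code: the function names and the three anchor colors are hypothetical, and plain linear interpolation in RGB space is an assumption about how such a palette could be built.

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of integers to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def interpolate_palette(anchors, n):
    """Return n colors evenly spaced along the piecewise-linear path through anchors."""
    if n < 2 or len(anchors) < 2:
        raise ValueError("need at least two anchors and two output colors")
    stops = [hex_to_rgb(a) for a in anchors]
    segments = len(stops) - 1
    colors = []
    for i in range(n):
        pos = i / (n - 1) * segments      # position along the path, 0..segments
        k = min(int(pos), segments - 1)   # index of the segment containing pos
        t = pos - k                       # fraction of the way through that segment
        lo, hi = stops[k], stops[k + 1]
        colors.append(rgb_to_hex(tuple(round(a + (b - a) * t) for a, b in zip(lo, hi))))
    return colors


# Hypothetical anchors: pure red, mid gray, pure green.
palette = interpolate_palette(["#ff0000", "#808080", "#00ff00"], 5)
```

With an odd number of output colors the middle entry falls exactly on the gray anchor; a discrete palette would simply pick a handful of fixed colors instead of interpolating.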
The Pearson-ICA algorithm is a mutual information-based method for blind separation of statistically independent source signals. It has been shown that the minimization of mutual information leads to iterative use of score functions, i.e. derivatives of log densities. The Pearson system allows adaptive modeling of score functions. The flexibility of the Pearson system makes it possible to model a wide range of source distributions including asymmetric distributions. The algorithm is designed especially for problems with asymmetric sources but it works for symmetric sources as well.
Universal and robust algorithm for solving the total alkalinity-pH equation presented in G. Munhoven (2013) <doi:10.5194/gmd-6-1367-2013> and G. Munhoven (2021) <doi:10.5194/gmd-2020-447>. The total alkalinity-pH equation relates total alkalinity and pH for a given set of acid-base concentrations in a given water sample, including carbonic acid. This package is particularly useful in marine chemistry involving dissolved inorganic carbon. The original Fortran package can be found at <doi:10.5281/zenodo.4328965>.
Visual contour and 2D point and contour plots for binary classification modeling under algorithms such as glm, rf, gbm, nnet and svm, presented over two dimensions generated by famd and mca methods. Package FactoMineR for multivariate reduction functions and package MBA for interpolation functions are used. The package can be used to visualize the discriminant power of input variables and algorithmic modeling, explore outliers, compare algorithm behaviour, etc. It was created initially for teaching purposes, but it also has many practical uses under the XAI paradigm.
This package provides a collection of functions for left-censored missing data imputation. Left-censoring is a special case of the missing not at random (MNAR) mechanism that generates non-responses in proteomics experiments. The package also contains functions to artificially generate peptide/protein expression data (log-transformed) as random draws from a multivariate Gaussian distribution as well as a function to generate missing data (both randomly and non-randomly). For comparison purposes, the package also contains several wrapper functions for the imputation of non-responses that are missing at random.
This is a package for converting natural language text into tokens. It includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, tweets, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the stringi and Rcpp packages for fast yet correct tokenization in UTF-8 encoding.
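To make the idea of shingled n-grams concrete, here is a minimal sketch in Python. It is not the package's implementation or interface: the function names, the parameter names `n_min` and `n`, and the whitespace-based word splitting are all assumptions chosen for illustration.

```python
def word_tokens(text):
    """Lowercase the text and split it on whitespace (a deliberately naive word tokenizer)."""
    return text.lower().split()


def shingled_ngrams(words, n_min, n):
    """Return every contiguous run of n_min..n words, ordered by start position."""
    out = []
    for start in range(len(words)):
        for size in range(n_min, n + 1):
            if start + size <= len(words):
                out.append(" ".join(words[start:start + size]))
    return out


tokens = word_tokens("The quick brown fox")
bigrams = shingled_ngrams(tokens, 2, 2)
# bigrams == ["the quick", "quick brown", "brown fox"]
uni_and_bigrams = shingled_ngrams(tokens, 1, 2)
```

"Shingled" here means that consecutive n-grams overlap, like roof shingles: each one starts one word after the previous one.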
This package provides simple, flexible assertions on data.frame or data.table objects with verbose output for vetting. While other assertion packages apply towards more general use-cases, assertable is tailored towards tabular data. It includes functions to check variable names and values, whether the dataset contains all combinations of a given set of unique identifiers, and whether it is a certain length. In addition, assertable includes utility functions to check the existence of target files and to efficiently import multiple tabular data files into one data.table.
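The check for "all combinations of a given set of unique identifiers" can be sketched as below. This is a Python illustration under stated assumptions, not the assertable API: the function name, the list-of-dicts table representation, and the sample data are hypothetical.

```python
from itertools import product


def missing_id_combinations(rows, ids):
    """Return the identifier combinations that are expected but absent from rows.

    rows: list of dicts, one per record.
    ids:  dict mapping column name -> list of expected values.
    """
    columns = list(ids)
    # Combinations actually present in the data.
    seen = {tuple(row[c] for c in columns) for row in rows}
    # Full Cartesian product of the expected identifier values.
    expected = product(*(ids[c] for c in columns))
    return [combo for combo in expected if combo not in seen]


rows = [
    {"country": "A", "year": 2000},
    {"country": "A", "year": 2001},
    {"country": "B", "year": 2000},
]
missing = missing_id_combinations(rows, {"country": ["A", "B"], "year": [2000, 2001]})
# missing == [("B", 2001)]
```

An assertion helper would typically raise an error listing the missing combinations when the returned list is non-empty, which is what gives such checks their verbose output.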
The function missForest in this package is used to impute missing values, particularly in the case of mixed-type data. It uses a random forest trained on the observed values of a data matrix to predict the missing values. It can be used to impute continuous and/or categorical data, including complex interactions and non-linear relations. It yields an OOB imputation error estimate without the need of a test set or elaborate cross-validation. It can be run in parallel to save computation time.
Guile-Reader is a simple framework for building readers for GNU Guile.
The idea is to make it easy to build procedures that extend Guile’s read procedure. Readers supporting various syntax variants can easily be written, possibly by re-using existing “token readers” of a standard Scheme reader. For example, it is used to implement Skribilo’s R5RS-derived document syntax.
Guile-Reader’s approach is similar to Common Lisp’s “read table”, but hopefully more powerful and flexible (for instance, one may instantiate as many readers as needed).
An interface to Azure Data Explorer, also known as Kusto, a fast, distributed data exploration service from Microsoft: <https://azure.microsoft.com/en-us/products/data-explorer/>. Includes DBI and dplyr interfaces, with the latter modelled after the dbplyr package, whereby queries are translated from R into the native KQL query language and executed lazily. On the admin side, the package extends the object framework provided by AzureRMR to support creation and deletion of databases, and management of database principals. Part of the AzureR family of packages.
This package provides a simple interface to the Microsoft Graph API <https://learn.microsoft.com/en-us/graph/overview>. Graph is a comprehensive framework for accessing data in various online Microsoft services. This package was originally intended to provide an R interface only to the Azure Active Directory part, with a view to supporting interoperability of R and Azure: users, groups, registered apps and service principals. However, it has since been expanded into a more general tool for interacting with Graph. Part of the AzureR family of packages.
Usually, it is difficult to plot choropleth maps for Bangladesh in R. The bangladesh package provides ready-to-use shapefiles for different administrative regions of Bangladesh (e.g., Division, District, Upazila, and Union). This package helps users to draw thematic maps of administrative regions of Bangladesh easily as it comes with the sf objects for the boundaries. It also provides functions allowing users to efficiently get specific area maps and center coordinates for regions. Users can also search for a specific area and calculate the centroids of those areas.
Shed light on black box machine learning models with the help of model performance, variable importance, global surrogate models, ICE profiles, partial dependence (Friedman J. H. (2001) <doi:10.1214/aos/1013203451>), accumulated local effects (Apley D. W. (2016) <doi:10.48550/arXiv.1612.08468>), further effects plots, interaction strength, and variable contribution breakdown (Gosiewska and Biecek (2019) <doi:10.48550/arXiv.1903.11420>). All tools are implemented to work with case weights and allow for stratified analysis. Furthermore, multiple flashlights can be combined and analyzed together.
Publication-ready regional gene locus plots similar to those produced by the web interface LocusZoom <https://my.locuszoom.org>, but running locally in R. Genetic or genomic data with gene annotation tracks are plotted via R base graphics, ggplot2 or plotly, allowing flexibility and easy customisation including laying out multiple locus plots on the same page. It uses the LDlink API <https://ldlink.nih.gov/?tab=apiaccess> to query linkage disequilibrium data from the 1000 Genomes Project and can overlay this on plots <doi:10.1093/bioadv/vbaf006>.
This package provides a computationally efficient solution for generating optimal experimental designs in Accelerated Life Testing (ALT). Leveraging a Particle Swarm Optimization (PSO)-based hybrid algorithm, the package identifies optimal test plans that minimize estimation variance under specified failure models and stress profiles. For more details, see Lee et al. (2025), Optimal Robust Strategies for Accelerated Life Tests and Fatigue Testing of Polymer Composite Materials <doi:10.1214/25-AOAS2075>, and Hoang (2025), Model-Robust Minimax Design of Accelerated Life Tests via PSO-based Hybrid Algorithm, Master Thesis, Unpublished.
Maximum likelihood estimates are obtained via an EM algorithm with either a first-order or a fully exponential Laplace approximation as documented by Broatch and Karl (2018) <doi:10.48550/arXiv.1710.05284>, Karl, Yang, and Lohr (2014) <doi:10.1016/j.csda.2013.11.019>, and by Karl (2012) <doi:10.1515/1559-0410.1471>. Karl and Zimmerman <doi:10.1016/j.jspi.2020.06.004> use this package to illustrate how the home field effect estimator from a mixed model can be biased under nonrandom scheduling.
This package provides access to coded election programmes from the Manifesto Corpus and to the Manifesto Project's Main Dataset and routines to analyse this data. The Manifesto Project <https://manifesto-project.wzb.eu> collects and analyses election programmes across time and space to measure the political preferences of parties. The Manifesto Corpus contains the collected and annotated election programmes in the Corpus format of the package tm to enable easy use of text processing and text mining functionality. Specific functions for scaling of coded political texts are included.
A critical first step in systematic literature reviews and mining of academic texts is to identify relevant texts from a range of sources, particularly databases such as Web of Science or Scopus. These databases often export in different formats or with different metadata tags. synthesisr expands on the tools outlined by Westgate (2019) <doi:10.1002/jrsm.1374> to import bibliographic data from a range of formats (such as bibtex, ris, or ciw) in a standard way, and allows merging and deduplication of the resulting dataset.
Analyzes shooting data with respect to group shape, precision, and accuracy. This includes graphical methods, descriptive statistics, and inference tests using standard, but also non-parametric and robust statistical methods. Implements distributions for radial error in bivariate normal variables. Works with files exported by OnTarget PC/TDS, Silver Mountain e-target, ShotMarker e-target, SIUS e-target, or Taran, as well as with custom data files in text format. Supports inference from range statistics such as extreme spread. Includes a set of web-based graphical user interfaces.
This is a package for parsing Affymetrix files (CDF, CEL, CHP, BPMAP, BAR). It provides methods for fast and memory efficient parsing of Affymetrix files using the Affymetrix Fusion SDK. Both ASCII- and binary-based files are supported. Currently, there are methods for reading chip definition files (CDF) and cell intensity files (CEL). These files can be read either in full or in part. For example, probe signals from a few probesets can be extracted very quickly from a set of CEL files into a convenient list structure.
This package provides a set of tools for the statistical analysis of data using:
normal linear models;
generalized linear models;
negative binomial regression models as an alternative to Poisson regression models in the presence of overdispersion;
beta-binomial and random-clumped binomial regression models as an alternative to binomial regression models in the presence of overdispersion;
zero-inflated and zero-altered regression models to deal with excess zeros in count data;
generalized nonlinear models;
generalized estimating equations for cluster correlated data.
This package provides an interface for working with large matrices stored in files, not in computer memory. It supports multiple non-character data types (double, integer, logical and raw) of various sizes (e.g. 8 and 4 byte real values). Access to parts of the matrix is done by indexing, exactly as with usual R matrices. It supports very large matrices; the package has been tested on multi-terabyte matrices. It allows for more than 2^32 rows or columns, and allows for quick addition of extra columns to a filematrix.
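The general idea of a matrix that lives in a file and is accessed by indexing can be sketched with Python's standard library. This is an illustration only: the class name, the column-by-column layout of 8-byte little-endian doubles, and the temporary file path are assumptions, not a description of the package's actual file format.

```python
import mmap
import os
import struct
import tempfile


class FileMatrix:
    """Tiny file-backed matrix of 8-byte doubles, stored column by column (hypothetical layout)."""

    ITEM = struct.calcsize("<d")  # 8 bytes per value

    def __init__(self, path, nrow, ncol):
        self.nrow, self.ncol = nrow, ncol
        # Create a zero-filled file large enough to hold the whole matrix.
        with open(path, "wb") as f:
            f.truncate(nrow * ncol * self.ITEM)
        self._file = open(path, "r+b")
        # Map the file so values are read and written in place, not loaded wholesale.
        self._map = mmap.mmap(self._file.fileno(), 0)

    def _offset(self, i, j):
        if not (0 <= i < self.nrow and 0 <= j < self.ncol):
            raise IndexError("matrix index out of range")
        return (j * self.nrow + i) * self.ITEM

    def __getitem__(self, ij):
        return struct.unpack_from("<d", self._map, self._offset(*ij))[0]

    def __setitem__(self, ij, value):
        struct.pack_into("<d", self._map, self._offset(*ij), value)

    def close(self):
        self._map.flush()
        self._map.close()
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "demo.bin")
fm = FileMatrix(path, nrow=3, ncol=2)
fm[2, 1] = 4.5          # written straight into the mapped file
value = fm[2, 1]
untouched = fm[0, 0]    # never written, so still zero
fm.close()
```

Storing columns contiguously is one layout under which appending a column only means extending the file, which fits the description's mention of quick addition of extra columns.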