Circus is an R package for annotation, analysis and visualization of circRNA data. Users can annotate their circRNA candidates with host genes and the gene features they are spliced from, and discriminate between known and as yet unknown splice junctions. Circular-to-linear ratios of circRNAs can be calculated, and a number of descriptive plots can be easily generated.
This package provides routines for the statistical analysis of landmark shapes, including Procrustes analysis, graphical displays, principal components analysis, permutation and bootstrap tests, thin-plate spline transformation grids and comparing covariance matrices. See Dryden, I.L. and Mardia, K.V. (2016). Statistical shape analysis, with Applications in R (2nd Edition), John Wiley and Sons.
This package contains a collection of various functions to assist in R programming, such as tools to assist in developing, updating, and maintaining R and R packages, calculating the logit and inverse logit transformations, tests for whether a value is missing, empty or contains only NA and NULL values, and many more.
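As a rough illustration of the logit and inverse logit transformations mentioned above, here is a minimal base R sketch. The helper names `logit_sketch` and `inv_logit_sketch` are hypothetical and are not taken from the package; base R's `qlogis()` and `plogis()` compute the same quantities.

```r
# Hypothetical helpers for illustration only; not the package's own functions.
logit_sketch <- function(p) log(p / (1 - p))        # maps (0, 1) to the real line
inv_logit_sketch <- function(x) 1 / (1 + exp(-x))   # maps the real line back to (0, 1)

p <- c(0.1, 0.5, 0.9)
x <- logit_sketch(p)

# The round trip recovers the original probabilities (up to floating-point error).
stopifnot(isTRUE(all.equal(inv_logit_sketch(x), p)))

# Base R's logistic quantile and distribution functions agree with the sketch.
stopifnot(isTRUE(all.equal(x, qlogis(p))))
stopifnot(isTRUE(all.equal(inv_logit_sketch(x), plogis(x))))
```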
Transform coordinates from a specified source to a specified target map projection. This uses the PROJ library directly, by wrapping the PROJ package, which leverages libproj, or otherwise the proj4 package. The reproj() function is generic, so methods may be added to remove the need for an explicit source definition. If proj4 is in use, reproj() handles the conversion of angular units where necessary. The package is intended primarily for transforming generic data formats while leveraging the underlying PROJ library directly. (Some transformations are not possible with PROJ and are instead provided by the GDAL library, a limitation which users of this package should be aware of.) The PROJ library is available at <https://proj.org/>.
This package provides a toolkit for analyzing classifier performance by using receiver operating characteristic (ROC) curves. Performance may be assessed on a single classifier or multiple ones simultaneously, making it suitable for comparisons. In addition, different metrics allow the evaluation of local performance when working within restricted ranges of sensitivity and specificity. For details on the different implementations, see McClish D. K. (1989) <doi:10.1177/0272989X8900900307>, Vivo J.-M., Franco M. and Vicari D. (2018) <doi:10.1007/S11634-017-0295-9>, Jiang Y., et al. (1996) <doi:10.1148/radiology.201.3.8939225>, Franco M. and Vivo J.-M. (2021) <doi:10.3390/math9212826> and Carrington, André M., et al. (2020) <doi:10.1186/s12911-019-1014-6>.
Algorithms for the spatial stratification of landscapes, sampling and modeling of spatially-varying phenomena. These algorithms offer a simple framework for the stratification of geographic space based on raster layers representing landscape factors and/or factor scales. The stratification process follows a hierarchical approach, which is based on first-level units (i.e., classification units) and second-level units (i.e., stratification units). Nonparametric techniques make it possible to measure the correspondence between the geographic space and the landscape configuration represented by the units. These correspondence metrics are useful to define sampling schemes and to model the spatial variability of environmental phenomena. The theoretical background of the algorithms and code examples are presented in Fuentes et al. (2022) <doi:10.32614/RJ-2022-036>.
Simulates, fits, and predicts long-memory and anti-persistent time series, possibly mixed with ARMA, regression, and transfer-function components. Exact methods (MLE, forecasting, simulation) are used. Bug reports should be filed via GitHub (at <https://github.com/JQVeenstra/arfima>), where the development version of this package lives; it can be installed using devtools.
This package provides functions for exploring and visualising estimation results obtained with BayesX, a free software for estimating structured additive regression models (<https://www.uni-goettingen.de/de/bayesx/550513.html>). In addition, it provides functions to read, write and manipulate the map objects that are required in spatial analyses performed with BayesX.
Estimation, prediction, and simulation of nonstationary Gaussian processes with modular covariate-based covariance functions. Sources of nonstationarity, such as spatial mean, variance, geometric anisotropy, smoothness, and nugget, can be considered based on spatial characteristics. An induced compactly supported nonstationary covariance function is provided, enabling fast and memory-efficient computations when handling densely sampled domains.
Many modern C/C++ development tools in the clang toolchain, such as clang-tidy or clangd, rely on the presence of a compilation database in JSON format <https://clang.llvm.org/docs/JSONCompilationDatabase.html>. This package temporarily injects additional build flags into the R build process to generate such a compilation database.
Draws stylized choropleth maps -- hexagonal maps and triangular multiclass hex maps -- for New Zealand District Health Boards and Regional Council areas. These allow faceted, coloured displays of quantitative information for comparison across District Health Boards or Regional Councils. The preprint Lumley (2019) <arXiv:1912.04435> is based on the methods in this package.
Populate data from an R environment into .doc and .docx templates. Create a template document in a program such as Word, and add strings encased in guillemet characters to create flags («example»). Use getDictionary() to create a dictionary of flags and replacement values, then call docket() to generate a populated document.
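The flag-and-dictionary idea described above can be sketched in base R on plain character strings. This is only an illustration of the substitution step under stated assumptions: it does not call getDictionary() or docket(), it does not read or write Word files, and the names `flag`, `fill_flags`, `template`, and `dictionary` are hypothetical.

```r
# Build a flag by wrapping a name in guillemets (U+00AB and U+00BB).
flag <- function(name) paste0("\u00ab", name, "\u00bb")

# Hypothetical stand-in for a template: plain text rather than a .docx file.
template <- c(
  paste0("Dear ", flag("name"), ","),
  paste0("Your order ", flag("order_id"), " ships on ", flag("date"), ".")
)

# Hypothetical dictionary: one replacement value per flag.
dictionary <- data.frame(
  flag = flag(c("name", "order_id", "date")),
  replacement = c("Ada", "A-1042", "2 May"),
  stringsAsFactors = FALSE
)

fill_flags <- function(text, dictionary) {
  for (i in seq_len(nrow(dictionary))) {
    # fixed = TRUE treats the flag as a literal string, not a regular expression.
    text <- gsub(dictionary$flag[i], dictionary$replacement[i], text, fixed = TRUE)
  }
  text
}

filled <- fill_flags(template, dictionary)
print(filled)
```

Keeping the flags and their replacements in a two-column table makes it easy to check, before populating a document, that every flag has a value.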
This package provides tools for simulating draws from continuous time processes with well-defined exponential family random graph (ERGM) equilibria, i.e. ERGM generating processes (EGPs). A number of EGPs are supported, including the families identified in Butts (2023) <doi:10.1080/0022250X.2023.2180001>, as are functions for hazard calculation and timing calibration.
The Explainable Ensemble Trees (e2tree) approach was proposed by Aria et al. (2024) <doi:10.1007/s00180-022-01312-6>. It aims to explain and interpret decision tree ensemble models using a single tree-like structure. e2tree is a new way of explaining an ensemble tree trained with the randomForest or xgboost packages.
Estimate gender from names in Spanish and Portuguese. Works with vectors and dataframes. The estimation works not only for first names but also for full names. The package relies on a compilation of common names in both languages, each paired with its most frequently associated gender; these compilations are used as lookup tables for gender inference.
This package provides an interface to HDFql <https://www.hdfql.com/> and helper functions for reading data from and writing data to HDF5 files. HDFql provides a high-level language for managing HDF5 data that is platform independent. For more information, see the reference manual <https://www.hdfql.com/resources/HDFqlReferenceManual.pdf>.
Calculates the intraclass correlation coefficient (ICC) for assessing reproducibility of interval-censored data with two repeated measurements (Kovacic and Varnai (2014) <doi:10.1097/EDE.0000000000000139>). The ICC is estimated by maximum likelihood from a model with one fixed and one random effect (both intercepts). Help with model checking (normality of subject means and residuals) is provided.
This package contains functions for fitting a joinpoint proportional hazards model to relative survival or cause-specific survival data, including estimates of joinpoint years at which survival trends have changed and trend measures in the hazard and cumulative survival scale. See Yu et al. (2009) <doi:10.1111/j.1467-985X.2009.00580.x>.
This package provides the tables from the Sean Lahman Baseball Database as a set of R data.frames. It uses the data on pitching, hitting and fielding performance and other tables from 1871 through 2024, as recorded in the 2025 version of the database. Documentation examples show how many baseball questions can be investigated.
Simulation and estimation of univariate and multivariate log-GARCH models. The main functions of the package are: lgarchSim(), mlgarchSim(), lgarch() and mlgarch(). The first two functions simulate from a univariate and a multivariate log-GARCH model, respectively, whereas the latter two estimate a univariate and multivariate log-GARCH model, respectively.
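To make the univariate model concrete, here is a minimal base R sketch that simulates a log-GARCH(1,1) process, assuming the common specification eps_t = sigma_t * z_t with ln(sigma_t^2) = omega + alpha * ln(eps_{t-1}^2) + beta * ln(sigma_{t-1}^2) and standard normal z_t. It does not use lgarchSim(); the function name `sim_log_garch11`, its arguments, and the parameter values are hypothetical choices for illustration.

```r
# Hypothetical simulator for illustration; not the package's lgarchSim().
sim_log_garch11 <- function(n, omega = 0, alpha = 0.1, beta = 0.8, burn_in = 100) {
  total <- n + burn_in
  z <- rnorm(total)                 # standard normal innovations (an assumption)
  log_sigma2 <- numeric(total)      # ln(sigma_t^2); arbitrary starting value of 0
  eps <- numeric(total)
  eps[1] <- exp(log_sigma2[1] / 2) * z[1]
  for (t in 2:total) {
    # Log-variance recursion: linear in the lagged log squared error and log variance.
    log_sigma2[t] <- omega + alpha * log(eps[t - 1]^2) + beta * log_sigma2[t - 1]
    eps[t] <- exp(log_sigma2[t] / 2) * z[t]
  }
  keep <- (burn_in + 1):total       # discard the burn-in period
  data.frame(eps = eps[keep], sigma = exp(log_sigma2[keep] / 2))
}

set.seed(1)
sim <- sim_log_garch11(500)
head(sim)
```

Because the recursion is written on the log scale, the conditional standard deviation is positive for any real parameter values, which is one reason the log specification is attractive.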
Computes efficient data distributions from highly inconsistent datasets with many missing values using multi-set intersections. Based upon hash functions, mulset can quickly identify intersections from very large matrices of input vectors across columns and rows and thus provides a scalable solution for dealing with missing values. See Tomic et al. (2019) <doi:10.1101/545186>.
Supports visual interpretation of hierarchical composite endpoints (HCEs). HCEs are complex constructs used as primary endpoints in clinical trials, combining outcomes of different types into ordinal endpoints, in which each patient contributes the most clinically important event (one and only one) to the analysis. See Karpefors M et al. (2022) <doi:10.1177/17407745221134949>.
This package provides tools for analyzing metabolic pathway completeness, abundance, and transcripts using KEGG Orthology (KO) data from (meta)genomic and (meta)transcriptomic studies. Supports both completeness (presence/absence) and abundance-weighted analyses. Includes built-in KEGG reference datasets. For more details see Li et al. (2023) <doi:10.1038/s41467-023-42193-7>.
Procedures to fit species distribution models from occurrence records and environmental variables, using glmnet for model fitting. Model structure is the same as for the Maxent Java package, version 3.4.0, with the same feature types and regularization options. See the Maxent website <http://biodiversityinformatics.amnh.org/open_source/maxent> for more details.