When working with big data sets, RAM conservation is critically important. However, it is not always enough to just monitor the size of the objects created. So-called "copy-on-modify" behavior, characteristic of R, means that some expressions or functions may require an unexpectedly large amount of RAM overhead. For example, replacing a single value in a matrix duplicates that matrix in the back-end, so the task requires twice as much RAM as the matrix itself uses. This package makes it easy to monitor the total and peak RAM used so that developers can quickly identify and eliminate RAM-hungry code.
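The gap between final and peak memory described above can be illustrated outside R. The following is a minimal Python sketch, not this package's interface: the helper names `measure_memory` and `replace_one_value` are hypothetical, and an explicit buffer copy stands in for R's copy-on-modify duplication.

```python
import tracemalloc


def measure_memory(func):
    """Run func; return (result, bytes still held, peak bytes) as traced by tracemalloc."""
    tracemalloc.start()
    try:
        result = func()
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, current, peak


def replace_one_value(n):
    """Change one byte by first copying the whole buffer (stand-in for copy-on-modify)."""
    original = bytearray(n)
    modified = bytearray(original)  # a full duplicate lives alongside the original
    modified[0] = 1
    del original  # only the modified copy survives the call
    return modified


N = 1_000_000  # illustrative buffer size in bytes (an assumption)
_, final_bytes, peak_bytes = measure_memory(lambda: replace_one_value(N))
print(f"final: {final_bytes} bytes, peak: {peak_bytes} bytes")
```

Watching only the size of the surviving object would report roughly `N` bytes, while the peak is about twice that, which is the kind of hidden overhead the description refers to.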
This package provides Sensory and Consumer Data mapping and analysis <doi:10.14569/IJACSA.2017.081266>. The mapping visualization offers several options: a choice of dimension reduction methods and prediction models ranging from linear to nonlinear regressions. A smoothed version of the map, computed with a locally weighted regression algorithm, is available. A selection process for map stability is provided. A shiny application is included; it presents an easy GUI for the implemented functions as well as a tool for comparing fitted models using several criteria. Basic analyses such as characterization of products, panelists, and sessions, as well as consumer segmentation, are also available.
Emergent SOM (ESOM) is an enhancement of self-organizing maps (SOMs) that gains the property of emergence through self-organization. The result of the projection by ESOM is a grid of neurons which can be visualised as a three-dimensional landscape in the form of the Umatrix. Further details can be found in the referenced publications (see url). This package offers tools for calculating and visualising the ESOM as well as the Umatrix, Pmatrix, and UStarMatrix. All the functionality is also available through graphical user interfaces implemented in 'shiny'. Based on the recognized data structures, the method can be used to generate new data.
Data type and tools for working with matrices having precision weights and missing data. This package provides a common representation and tools that can be used with many types of high-throughput data. The meaning of the weights is compatible with usage in the base R function "lm" and the package "limma". Calibrate weights to account for known predictors of precision. Find rows with excess variability. Perform differential testing and find rows with the largest confident differences. Find PCA-like components of variation even with many missing values, rotated so that individual components may be meaningfully interpreted. DelayedArray matrices and BiocParallel are supported.
Calculations of the most common metrics of automated advertisement and plotting of them with trend and forecast. Calculations and descriptions of metrics are taken from the support documentation of different RTB platforms. Plotting and forecasting are based on the packages 'forecast', described in Rob J Hyndman and George Athanasopoulos (2021) "Forecasting: Principles and Practice" <https://otexts.com/fpp3/> and Rob J Hyndman et al "Documentation for 'forecast'" (2003) <https://pkg.robjhyndman.com/forecast/>, and 'ggplot2', described in Hadley Wickham et al "Documentation for 'ggplot2'" (2015) <https://ggplot2.tidyverse.org/>, and Hadley Wickham, Danielle Navarro, and Thomas Lin Pedersen (2015) "ggplot2: Elegant Graphics for Data Analysis" <https://ggplot2-book.org/>.
Bootstrap-based goodness-of-fit tests. The package allows rigorous statistical tests of whether a chosen model family is correct, based on the marked empirical process. The implemented algorithms are described in Dikta and Scheer (2021) <doi:10.1007/978-3-030-73480-0> and can be applied to generalized linear models without any further implementation effort. As long as certain linearity conditions are fulfilled, the resampling scheme is also applicable beyond generalized linear models. This is reflected in the software architecture, which allows the resampling scheme to be reused by implementing only certain interfaces for models that are not supported natively by the package.
This package provides several novel exact hypothesis tests with minimal assumptions on the errors. The tests are exact, meaning that their p-values are correct for the given sample sizes (the p-values are not derived from asymptotic analysis). The test for stochastic inequality is for ordinal comparisons based on two independent samples and requires no assumptions on the errors. The other tests include tests for the mean and variance of a single sample and for comparing means in independent samples. All these tests only require that the data have known bounds (such as percentages that lie in [0,100]). These bounds are part of the input.
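To see why known bounds alone can yield a guarantee that holds at any sample size, consider Hoeffding's inequality. The sketch below is not one of this package's tests; it is a simple, conservative stand-in chosen for illustration, and the function name `hoeffding_upper_pvalue` is hypothetical. It tests the null hypothesis that the mean is at most `mu0` for independent observations in `[lower, upper]`.

```python
import math


def hoeffding_upper_pvalue(sample, mu0, lower, upper):
    """Finite-sample p-value bound for H0: mean <= mu0, given data in [lower, upper].

    Uses Hoeffding's inequality: P(mean(X) - mu >= t) <= exp(-2 n t^2 / (upper - lower)^2).
    Assumes independent observations; the bounds are part of the input.
    """
    if any(x < lower or x > upper for x in sample):
        raise ValueError("all observations must lie within the stated bounds")
    n = len(sample)
    gap = sum(sample) / n - mu0
    if gap <= 0:
        return 1.0  # no evidence against H0 in this direction
    return math.exp(-2.0 * n * gap ** 2 / (upper - lower) ** 2)


# Hypothetical data: 50 percentages, all equal to 70, tested against mu0 = 50.
p_value = hoeffding_upper_pvalue([70.0] * 50, mu0=50.0, lower=0.0, upper=100.0)
print(p_value)
```

No distributional assumption beyond the bounds enters the calculation, which is why the resulting p-value is valid for the given sample size rather than only asymptotically.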
High Dynamic Range (HDR) images support a large range in luminosity between the lightest and darkest regions of an image. To capture this range, data in HDR images is often stored as floating point numbers and in formats that capture more data and channels than standard image types. This package supports reading and writing two types of HDR images: PFM (Portable Float Map) and OpenEXR images. HDR images can be converted to lower dynamic ranges (for viewing) using tone-mapping. A number of tone-mapping algorithms are included which are based on Reinhard (2002) "Photographic tone reproduction for digital images" <doi:10.1145/566654.566575>.
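As a rough illustration of tone-mapping, here is a minimal Python sketch of the simple global operator from Reinhard (2002), applied to a flat list of luminance values. It is not this package's implementation: the function name `reinhard_global` and the defaults `key=0.18` and `eps=1e-6` are assumptions made for the example.

```python
import math


def reinhard_global(luminance, key=0.18, eps=1e-6):
    """Compress non-negative HDR luminances into [0, 1) with the global Reinhard operator."""
    n = len(luminance)
    # Log-average luminance of the scene; eps avoids log(0) for black pixels.
    log_avg = math.exp(sum(math.log(eps + lw) for lw in luminance) / n)
    # Scale the scene so that its log-average maps to the chosen key value.
    scaled = [key * lw / log_avg for lw in luminance]
    # Compress: bright values approach 1, dark values stay nearly linear.
    return [l / (1.0 + l) for l in scaled]


hdr = [0.01, 1.0, 100.0, 10000.0]  # hypothetical luminances spanning six orders of magnitude
ldr = reinhard_global(hdr)
print(ldr)
```

The mapping is monotone, so the ordering of brightness is preserved while the six-orders-of-magnitude input range is squeezed into a displayable interval.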
Penalized and non-penalized maximum likelihood estimation of smooth transition vector autoregressive models with various types of transition weight functions, conditional distributions, and identification methods. Constrained estimation with various types of constraints is available. Residual-based model diagnostics, forecasting, simulations, and calculation of impulse response functions, generalized impulse response functions, and generalized forecast error variance decompositions. See Heather Anderson, Farshid Vahid (1998) <doi:10.1016/S0304-4076(97)00076-6>, Helmut Lütkepohl, Aleksei Netšunajev (2017) <doi:10.1016/j.jedc.2017.09.001>, Markku Lanne, Savi Virolainen (2025) <doi:10.48550/arXiv.2403.14216>, Savi Virolainen (2025) <doi:10.48550/arXiv.2404.19707>.
The main janitor functions can: perfectly format data.frame column names; provide quick counts of variable combinations (i.e., frequency tables and crosstabs); and isolate duplicate records. Other janitor functions nicely format the tabulation results. These tabulate-and-report functions approximate popular features of SPSS and Excel. This package follows the principles of the "tidyverse" and works well with the pipe function %>%. janitor was built with beginning-to-intermediate R users in mind and is optimized for user-friendliness. Advanced R users can already do everything covered here, but with janitor they can do it faster and save their thinking for the fun stuff.
This package implements Collective And Point Anomaly (CAPA) Fisch, Eckley, and Fearnhead (2022) <doi:10.1002/sam.11586>, Multi-Variate Collective And Point Anomaly (MVCAPA) Fisch, Eckley, and Fearnhead (2021) <doi:10.1080/10618600.2021.1987257>, Proportion Adaptive Segment Selection (PASS) Jeng, Cai, and Li (2012) <doi:10.1093/biomet/ass059>, and Bayesian Abnormal Region Detector (BARD) Bardwell and Fearnhead (2015) <doi:10.1214/16-BA998>. These methods are for the detection of anomalies in time series data. Further information regarding the use of this package along with detailed examples can be found in Fisch, Grose, Eckley, Fearnhead, and Bardwell (2024) <doi:10.18637/jss.v110.i01>.
This package provides tools to calibrate, validate, and make predictions with the General Unified Threshold model of Survival adapted for Bee species. The model is presented in the publications by Baas, J., Goussen, B., Miles, M., Preuss, T.G., Roessink, I. (2022) <doi:10.1002/etc.5423> and Baas, J., Goussen, B., Taenzler, V., Roeben, V., Miles, M., Preuss, T.G., van den Berg, S., Roessink, I. (2024) <doi:10.1002/etc.5871>, and is based on the GUTS framework of Jager, T., Albert, C., Preuss, T.G. and Ashauer, R. (2011) <doi:10.1021/es103092a>. The authors are grateful to Bayer A.G. for its financial support.
Several implementations of non-parametric stable bootstrap-based techniques to determine the number of components for Partial Least Squares linear or generalized linear regression models as well as sparse Partial Least Squares linear or generalized linear regression models. The package collects techniques that were published in a book chapter (Magnanensi et al. 2016, 'The Multiple Facets of Partial Least Squares and Related Methods', <doi:10.1007/978-3-319-40643-5_18>) and two articles (Magnanensi et al. 2017, 'Statistics and Computing', <doi:10.1007/s11222-016-9651-4>; Magnanensi et al. 2021, 'Frontiers in Applied Mathematics and Statistics', <doi:10.3389/fams.2021.693126>).
This package provides functions for loading large (10M+ lines) CSV and other delimited files, similar to read.csv, but typically faster and using less memory than the standard R loader. While not entirely general, it covers many common use cases when the types of columns in the CSV file are known in advance. In addition, the package provides a class 'int64', which represents 64-bit integers exactly when reading from a file. The latter is useful when working with 64-bit integer identifiers exported from databases. The CSV file loader supports common column types including 'integer', 'double', 'string', and 'int64', leaving further type transformations to the user.
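The need for an exact 64-bit integer type comes from the limits of double precision: a double carries a 53-bit significand, so integers above 2^53 cannot all be represented. The short Python sketch below illustrates the problem in general terms; it does not use this package, and the identifier value is made up for the example.

```python
import struct

big_id = 2**53 + 1  # a hypothetical database identifier just beyond double precision

# Storing the identifier as a double silently changes it.
as_double = float(big_id)
survives_as_double = int(as_double) == big_id

# Storing it as a signed 64-bit integer keeps it exact.
packed = struct.pack("<q", big_id)
survives_as_int64 = struct.unpack("<q", packed)[0] == big_id

print(survives_as_double, survives_as_int64)  # False True
```

Two distinct identifiers that differ only in their lowest bits can therefore collapse to the same double, which is why reading such columns as 64-bit integers matters.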
Copernicus Atmosphere Monitoring Service (CAMS) radiations service provides time series of global, direct, and diffuse irradiations on horizontal surface, and direct irradiation on normal plane for the actual weather conditions as well as for clear-sky conditions. The geographical coverage is the field-of-view of the Meteosat satellite, roughly speaking Europe, Africa, Atlantic Ocean, Middle East. The time coverage of data is from 2004-02-01 up to 2 days ago. Data are available with a time step ranging from 15 min to 1 month. For license terms and to create an account, please see <http://www.soda-pro.com/web-services/radiation/cams-radiation-service>.
Testing and documenting code that communicates with remote databases can be painful. Although the interaction with R is usually relatively simple (e.g. data frames passed to and from a database), such code relies on a separate service and the data stored there, so tests can be difficult to set up, unsustainable in a continuous integration environment, or impossible without replicating an entire production cluster. This package addresses that by allowing you to make recordings of your database interactions and then play them back while testing (or in other contexts), all without needing to spin up or have access to the database your code would typically connect to.
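The record-and-playback idea can be sketched in a few lines. The Python example below is only an illustration of the general technique, not this package's interface: the classes `RecordingConnection` and `ReplayConnection` are hypothetical, and an in-memory SQLite database stands in for the remote service.

```python
import json
import sqlite3


class RecordingConnection:
    """Wraps a live connection and stores every query result on a 'tape'."""

    def __init__(self, conn):
        self.conn = conn
        self.tape = {}

    def query(self, sql):
        rows = [list(row) for row in self.conn.execute(sql).fetchall()]
        self.tape[sql] = rows  # remember the answer, keyed by the query text
        return rows


class ReplayConnection:
    """Answers queries from a recorded tape; no database is needed."""

    def __init__(self, tape):
        self.tape = tape

    def query(self, sql):
        if sql not in self.tape:
            raise KeyError(f"no recording for query: {sql}")
        return self.tape[sql]


# Recording phase: run once against the real (here: in-memory) database.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE t (id INTEGER, label TEXT)")
live.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])
recorder = RecordingConnection(live)
recorded = recorder.query("SELECT id, label FROM t ORDER BY id")
saved = json.dumps(recorder.tape)  # the tape could be written to a fixture file
live.close()

# Playback phase: the database is gone, the test still gets the same rows.
replayed = ReplayConnection(json.loads(saved)).query("SELECT id, label FROM t ORDER BY id")
print(replayed)
```

Keying recordings by the query text keeps the sketch short; a real tool would also need to handle parameters, connection details, and statements that modify data.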
Traditional phasing programs are limited to diploid organisms. Our method modifies the Li and Stephens algorithm with Markov chain Monte Carlo (MCMC) approaches, and builds a generic framework that allows haplotype searches in a multiple infection setting. This package is primarily developed as part of the Pf3k project, which is a global collaboration using the latest sequencing technologies to provide a high-resolution view of natural variation in the malaria parasite Plasmodium falciparum. Parasite DNA is extracted from patient blood samples, which often contain more than one parasite strain, with unknown proportions. This package is used for deconvoluting mixed haplotypes and reporting the mixture proportions from each sample.
Computes a series of indices commonly used in the fields of economic geography, economic complexity, and evolutionary economics to describe the location, distribution, spatial organization, structure, and complexity of economic activities. Functions include basic spatial indicators such as the location quotient, the Krugman specialization index, the Herfindahl or the Shannon entropy indices but also more advanced functions to compute different forms of normalized relatedness between economic activities or network-based measures of economic complexity. Most of the functions use matrix calculus and are based on bipartite (incidence) matrices consisting of region - industry pairs. These are described in Balland (2017) <http://econ.geo.uu.nl/peeg/peeg1709.pdf>.
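The basic indicators named above follow directly from a region-by-industry matrix. The Python sketch below shows the textbook formulas for the location quotient, the Herfindahl index, and Shannon entropy. It is not this package's code: the function names are hypothetical, the natural logarithm is an assumption (other bases are also common), and every row and column total is assumed to be non-zero.

```python
import math


def location_quotient(matrix):
    """LQ[i][j] = (x_ij / region total) / (industry total / grand total)."""
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    grand_total = sum(row_totals)
    return [
        [(x / row_totals[i]) / (col_totals[j] / grand_total) for j, x in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


def herfindahl(values):
    """Sum of squared shares; 1 means full concentration in one activity."""
    total = sum(values)
    return sum((v / total) ** 2 for v in values)


def shannon_entropy(values):
    """Entropy of the share distribution, using the natural log (an assumption)."""
    total = sum(values)
    return -sum((v / total) * math.log(v / total) for v in values if v > 0)


# Hypothetical employment counts: rows are regions, columns are industries.
employment = [[10, 0],
              [10, 20]]
lq = location_quotient(employment)
print(lq)
```

In this toy matrix the first region has all of its employment in the first industry, which holds only half of total employment, so its location quotient there is 2: the region is twice as specialised in that industry as the economy as a whole.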
This package provides functions to prepare and analyse eye tracking data of reading exercises. The functions allow some basic data preparation and code fixations as first and second pass. First passes can be further divided into forward and rereading. The package further allows for aggregating fixation times per AOI or per AOI and per type of pass (first forward, first rereading, second). These methods are based on Hyönä, Lorch, and Rinck (2003) <doi:10.1016/B978-044451020-4/50018-9> and Hyönä and Lorch (2004) <doi:10.1016/j.learninstruc.2004.01.001>. It is also possible to convert between metric length and visual degrees.
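The conversion between metric length and visual degrees rests on simple trigonometry: an object of a given size, centred on the line of sight at a given viewing distance, subtends an angle of 2 * atan(size / (2 * distance)). The Python sketch below shows that standard formula and its inverse; the function names are hypothetical and this is not this package's interface.

```python
import math


def length_to_visual_degrees(size, distance):
    """Visual angle in degrees of an object of `size` seen from `distance` (same units)."""
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))


def visual_degrees_to_length(degrees, distance):
    """Inverse conversion: the length that subtends `degrees` at `distance`."""
    return 2.0 * distance * math.tan(math.radians(degrees) / 2.0)


# Hypothetical setup: a 1 cm wide word viewed from 60 cm.
angle = length_to_visual_degrees(1.0, 60.0)
back = visual_degrees_to_length(angle, 60.0)
print(angle, back)
```

Both lengths must be given in the same unit, and the result depends on the viewing distance, so that distance has to be known for each recording.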
This package provides a suite of convenient tools for social network analysis geared toward students, entry-level users, and non-expert practitioners. 'ideanet' features unique functions for the processing and measurement of sociocentric and egocentric network data. These functions automatically generate node- and system-level measures commonly used in the analysis of these types of networks. Outputs from these functions maximize the ability of novice users to employ network measurements in further analyses while making all users less prone to common data analytic errors. Additionally, 'ideanet' features an R Shiny graphic user interface that allows novices to explore network data with minimal need for coding.
Imports indicator data provided by the Ministry of Education (MoE), Spain (Ministerio de Educación y Formación Profesional, MEFD). The data are stored at <https://www.educacionyfp.gob.es/servicios-al-ciudadano/estadisticas/no-universitaria.html>. Includes functions for reading, downloading, and selecting data for the main series. This package is not sponsored or supported by the MoE, Spain.
This package performs predictions of totals and weighted sums, or finite population block kriging, on spatial data using the methods in Ver Hoef (2008) <doi:10.1007/s10651-007-0035-y>. The primary outputs are an estimate of the total, mean, or weighted sum in the region, an estimated prediction variance, and a plot of the predicted and observed values. This is useful primarily to users with ecological data that are counts or densities measured on some sites in a finite area of interest. Spatial prediction for the total count or average density in the entire region can then be done using the functions in this package.
This package performs an analysis of time-to-event clinical trial data using various "win time" methods, including 'ewt', 'ewtr', 'rmt', 'max', 'wtr', 'rwtr', and 'pwt'. These methods are used to calculate and compare treatment effects on ordered composite endpoints. The package handles event times, event indicators, and treatment arm indicators and supports calculations on observed and resampled data. Detailed explanations of each method and usage examples are provided in "Use of win time for ordered composite endpoints in clinical trials" by Troendle et al. (2024) <https://pubmed.ncbi.nlm.nih.gov/38417455/>. For more information, see the package documentation or the vignette titled "Introduction to wintime".
Our pipeline, MICSQTL, utilizes scRNA-seq reference and bulk transcriptomes to estimate cellular composition in the matched bulk proteomes. The expression of genes and proteins at either bulk level or cell type level can be integrated by the Angle-based Joint and Individual Variation Explained (AJIVE) framework. Meanwhile, MICSQTL can perform cell-type-specific quantitative trait loci (QTL) mapping to proteins or transcripts based on the input of bulk expression data and the estimated cellular composition per molecule type, without the need for single cell sequencing. We use matched transcriptome-proteome data from human brain frontal cortex tissue samples to demonstrate the input and output of our tool.