This package implements fast, scalable optimization algorithms for fitting topic models ("grade of membership" models) and non-negative matrix factorizations to count data. The methods exploit the special relationship between the multinomial topic model (also known as "probabilistic latent semantic indexing") and Poisson non-negative matrix factorization. The package provides tools to compare, annotate and visualize model fits, including functions to efficiently create "structure plots" and identify key features in topics. The fastTopics package is a successor to the CountClust package. For more information, see <doi:10.48550/arXiv.2105.13440> and <doi:10.1186/s13059-023-03067-9>. Please also see the GitHub repository for additional vignettes not included in the CRAN version of the package.
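For a quick start, a minimal sketch of the intended workflow on a toy count matrix (the exact argument names should be checked against the package documentation):
  library(fastTopics)
  set.seed(1)
  counts <- matrix(rpois(100 * 50, lambda = 1), 100, 50)   # toy documents-by-terms count matrix
  fit <- fit_topic_model(counts, k = 4)                    # multinomial topic model with 4 topics
  structure_plot(fit)                                      # "structure plot" of the topic proportions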
Power and Sample Size for Health Researchers is a Shiny application that brings together a series of functions related to sample size and power calculations for common analyses in the healthcare field. It provides functionality to calculate power and sample size to estimate or test hypotheses for means and proportions (including tests for correlated groups, equivalence, non-inferiority and superiority), associations, correlation coefficients, regression coefficients (linear, logistic, gamma, and Cox), linear mixed models, Cronbach's alpha, interobserver agreement, intraclass correlation coefficients, limits of agreement on Bland-Altman plots, and the area under the curve, sensitivity and specificity incorporating the prevalence of disease. You can also use the online version at <https://hcpa-unidade-bioestatistica.shinyapps.io/PSS_Health/>.
Implements a promising, yet little explored, protocol for bioacoustical analysis: the eigensound method of MacLeod, Krieger and Jones (2013) <doi:10.4404/hystrix-24.1-6299>. Eigensound is a multidisciplinary method focused on the direct comparison of stereotyped sounds from different species. 'SoundShape', in turn, provides the tools required to go from sound waves to a Principal Components Analysis, using tools drawn from traditional bioacoustics (i.e., the tuneR and seewave packages), geometric morphometrics (i.e., the geomorph package) and multivariate analysis (e.g., the stats package). For more information, please see Rocha and Romano (2021) and check the 'SoundShape' repository on GitHub for news and updates <https://github.com/p-rocha/SoundShape>.
Uses the Distorted Wave Born Approximation (DWBA) to compute the acoustic backscattering of an object whose geometry is described by a volumetric mesh composed of tetrahedrons. The computation is carried out efficiently through an analytical 3D integration that yields, for each tetrahedron, a solution expressed in terms of elementary functions. Note that this method is only valid for objects whose acoustic properties, such as density and sound speed, do not vary significantly from those of the surrounding medium. (See Lavia, Cascallares and Gonzalez (2023), "TetraScatt model: Born approximation for the estimation of acoustic dispersion of fluid-like objects of arbitrary geometries", arXiv preprint <arXiv:2312.16721>.)
This package estimates crop water demand. As an example, data from the TerraClimate dataset (<https://www.climatologylab.org/terraclimate.html>), calibrated with automatic weather stations of the National Meteorological Institute of Brazil, are supplied at a coarse spatial resolution for computing crop water demand. The user also has the option to download the variables directly from the TerraClimate repository with the download.terraclimate function and so access the original TerraClimate products. If the user considers it necessary to calibrate the variables, another function is provided to do so. Lastly, the crop water demand estimation implemented in this package can be run for the entire Brazilian territory with the TerraClimate dataset.
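For orientation only, the core quantity behind crop water demand is crop evapotranspiration, ETc = Kc x ET0 (FAO-56); the base-R lines below illustrate that relationship and are not this package's interface:
  et0 <- c(4.1, 4.3, 3.9)   # reference evapotranspiration, mm/day (example values)
  kc  <- 0.75               # crop coefficient for the current growth stage
  etc <- kc * et0           # crop water demand (crop evapotranspiration), mm/day
  etc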
Calculates marginal effects and conducts process analysis in exponential family random graph models (ERGMs). Includes functions to conduct mediation and moderation analyses and to diagnose multicollinearity. URL: <https://github.com/sduxbury/ergMargins>. BugReports: <https://github.com/sduxbury/ergMargins/issues>. Duxbury, Scott W (2021) <doi:10.1177/0049124120986178>. Long, J. Scott, and Sarah Mustillo (2018) <doi:10.1177/0049124118799374>. Mize, Trenton D. (2019) <doi:10.15195/v6.a4>. Karlson, Kristian Bernt, Anders Holm, and Richard Breen (2012) <doi:10.1177/0081175012444861>. Duxbury, Scott W (2018) <doi:10.1177/0049124118782543>. Duxbury, Scott W, and Jenna Wertsching (2023) <doi:10.1016/j.socnet.2023.02.003>. Huang, Peng, and Carter Butts (2023) <doi:10.1016/j.socnet.2023.07.001>.
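A hedged sketch of a typical workflow (the function names ergm.AME() and vif.ergm() are recalled from the package documentation and should be verified there):
  library(ergm)
  library(ergMargins)
  data("faux.mesa.high", package = "ergm")
  fit <- ergm(faux.mesa.high ~ edges + nodematch("Grade") + nodematch("Sex"))
  ergm.AME(fit, var1 = "nodematch.Grade")   # average marginal effect of grade homophily
  vif.ergm(fit)                             # multicollinearity diagnostics for the ERGM terms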
Some functions of ade4 and stats are combined in order to obtain a partition of the rows of a data table whose columns may be quantitative, qualitative or frequency variables. First, a principal axes method is performed; then a combination of Ward agglomerative hierarchical classification and K-means is applied, using some of the first coordinates obtained from the principal axes method. In order to permit different weights for the elements to be clustered, the function 'kmeansW', programmed in C++, is included; it is a modification of 'kmeans'. Some graphical functions include the option 'gg = FALSE'. When 'gg = TRUE', they use the ggplot2 and ggrepel packages to avoid the superposition of labels.
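A base-R illustration of the strategy just described (principal axes, Ward clustering, then K-means consolidation); it uses stats only and is not the package's own code:
  pc  <- prcomp(scale(iris[, 1:4]))$x[, 1:2]          # coordinates on the first principal axes
  hc  <- hclust(dist(pc), method = "ward.D2")         # Ward agglomerative hierarchical clustering
  grp <- cutree(hc, k = 3)
  ctr <- aggregate(pc, by = list(grp), FUN = mean)[, -1]
  km  <- kmeans(pc, centers = as.matrix(ctr))         # consolidate the Ward partition with K-means
  table(km$cluster)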
This package provides a collection of functions for processing data exported from Gen5 2.06. Gen5 is data analysis software for BioTek plate readers <https://www.biotek.com/products/software-robotics-software/gen5-microplate-reader-and-imager-software/>. The package contains functions for data cleaning, modeling and plotting using data exported from Gen5 version 2.06. It produces technically correct data, as defined by de Jonge and van der Loo (2013) <https://cran.r-project.org/doc/contrib/de_Jonge+van_der_Loo-Introduction_to_data_cleaning_with_R.pdf>, for customized analysis, and includes Boltzmann fitting for general kinetic analysis. See <https://www.github.com/yanxianUCSB/gen5helper> for more information, documentation and examples.
An implementation of the generalized consensus estimators used by the International Bureau of Weights and Measures (BIPM) to assign the reference value in a key comparison exercise; the methods can also be applied to any interlaboratory study. Given a set of different sources, primary laboratories, or measurement methods, this package provides an evaluation of the variance components according to the selected statistical method for consensus building. It also implements the comparison among different consensus builders and evaluates the participating methods or sources against the consensus reference value. The methods are based on a diverse set of references, e.g. DerSimonian and Laird (1986) <doi:10.1016/0197-2456(86)90046-2>; for a complete list, see the reference section of the package documentation.
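As background, a base-R sketch of the DerSimonian-Laird (1986) consensus value (illustrative only, not this package's interface); x holds the laboratory means and u their standard uncertainties:
  x  <- c(10.1, 10.4, 9.8, 10.2)
  u  <- c(0.20, 0.15, 0.25, 0.10)
  w  <- 1 / u^2
  xb <- sum(w * x) / sum(w)                                              # fixed-effect weighted mean
  Q  <- sum(w * (x - xb)^2)                                              # Cochran's heterogeneity statistic
  tau2 <- max(0, (Q - (length(x) - 1)) / (sum(w) - sum(w^2) / sum(w)))   # between-laboratory variance
  wr <- 1 / (u^2 + tau2)
  sum(wr * x) / sum(wr)                                                  # DerSimonian-Laird consensus value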
Interface to TensorFlow <https://www.tensorflow.org/>, an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single 'API'. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.
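A minimal sketch (assumes the TensorFlow backend has already been installed, e.g. with install_tensorflow()):
  library(tensorflow)
  a <- tf$constant(matrix(1:6, nrow = 2), dtype = tf$float32)
  b <- tf$constant(matrix(1:6, nrow = 3), dtype = tf$float32)
  tf$matmul(a, b)   # tensors (multidimensional arrays) flowing through a matrix-multiplication node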
Easily construct prompts and associated logic for interacting with large language models (LLMs). tidyprompt introduces the concept of prompt wraps, which are building blocks that you can use to quickly turn a simple prompt into a complex one. Prompt wraps do not just modify the prompt text, but also add extraction and validation functions that will be applied to the response of the LLM. This ensures that the user gets the desired output. tidyprompt can add various features to prompts and their evaluation by LLMs, such as structured output, automatic feedback, retries, reasoning modes, autonomous R function calling, and R code generation and evaluation. It is designed to be compatible with any LLM provider that offers chat completion.
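A hedged sketch of the prompt-wrap idea (the function names below are recalled from the package documentation and should be verified; the provider also requires valid API credentials):
  library(tidyprompt)
  "What is 2 + 2? Reply with just the number." |>
    answer_as_integer() |>                  # prompt wrap: adds extraction and validation of an integer answer
    send_prompt(llm_provider_openai())      # send to an LLM provider offering chat completion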
Students learning both econometrics and R may find the introduction to both challenging. The wooldridge data package aims to lighten the task by efficiently loading any data set found in the text with a single command. Data sets have been compressed to a fraction of their original size. Documentation files contain page numbers, the original source, time of publication, and notes from the author suggesting avenues for further analysis and research. If one needs an introduction to R model syntax, a vignette contains solutions to examples from chapters of the text. Data sets are from the 7th edition (Wooldridge 2020, ISBN-13 978-1-337-55886-0), and are backwards compatible with all previous versions of the text.
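For example, to load a data set from the text and run a textbook wage regression:
  library(wooldridge)
  data("wage1")
  summary(lm(log(wage) ~ educ + exper + tenure, data = wage1))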
In computationally demanding data analysis pipelines, the targets R package (2021, <doi:10.21105/joss.02959>) maintains an up-to-date set of results while skipping tasks that do not need to rerun. This process increases both speed and trust in the final results. However, it also overwrites old output with new output, so past results disappear by default. To preserve historical output, the gittargets package captures version-controlled snapshots of the data store, and each snapshot links to the underlying commit of the source code. That way, when the user rolls the code back to a previous branch or commit, gittargets can recover the data contemporaneous with that commit so that all targets remain up to date.
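A hedged sketch of the intended workflow inside a targets project under Git (verify the function names against the gittargets documentation):
  library(targets)
  library(gittargets)
  tar_make()           # bring all targets up to date
  tar_git_init()       # put the data store under version control
  tar_git_snapshot()   # snapshot the data store, linked to the current code commit
  # ...later, after rolling the code back to an earlier branch or commit:
  tar_git_checkout()   # recover the data snapshot for that commit (a ref argument may be required)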
This package provides a small set of functions to perform a joint calibration of totals and quantiles. The calibration for totals is based on Deville and Särndal (1992) <doi:10.1080/01621459.1992.10475217>, and the calibration for quantiles is based on Harms and Duchesne (2006) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X20060019255>. The package uses standard calibration via the 'survey', 'sampling' or 'laeken' packages. In addition, entropy balancing via the 'ebal' package and empirical likelihood based on codes from Wu (2005) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2005002/article/9051-eng.pdf> can be used. See the paper by Beręsewicz and Szymkowiak (2023) for details <arXiv:2308.13281>.
This package provides functions to classify mass spectra into known categories and to determine discriminant mass-to-charge values. It includes easy-to-use functions for preprocessing mass spectra, functions to determine discriminant mass-to-charge values (m/z) from a library of mass spectra corresponding to different categories, and functions to predict the category (species, phenotypes, etc.) associated with a mass spectrum from a list of selected mass-to-charge values. If you use this package in your research, please cite the associated publication (<doi:10.1016/j.eswa.2025.128796>). For a comprehensive guide, additional applications, and detailed examples of using this package, please visit our GitHub repository (<https://github.com/agodmer/MSclassifR_examples>).
We propose a pair of summary measures for the predictive power of a prediction function based on a regression model. The regression model can be linear or nonlinear; parametric, semi-parametric, or nonparametric; and correctly specified or misspecified. The first measure, R-squared, is an extension of the classical R-squared statistic for a linear model and quantifies the prediction function's ability to capture the variability of the response. The second measure, L-squared, quantifies the prediction function's bias for predicting the mean regression function. Used together, they give a complete summary of the predictive power of a prediction function. Please refer to Gang Li and Xiaoyan Wang (2016) <arXiv:1611.03063> for more details.
Highly multiplexed imaging acquires the single-cell expression of selected proteins in a spatially resolved fashion. These measurements can be visualised across multiple length scales. First, pixel-level intensities represent the spatial distributions of feature expression at the highest resolution. Second, after segmentation, expression values or cell-level metadata (e.g. cell-type information) can be visualised on segmented cell areas. This package contains functions for the visualisation of multiplexed read-outs and cell-level information obtained by multiplexed imaging technologies. The main functions of this package allow (1) the visualisation of pixel-level information across multiple channels, (2) the display of cell-level information (expression and/or metadata) on segmentation masks, and (3) gating and visualisation of single cells.
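A hedged sketch using the package's bundled example data (argument and channel names should be checked against the cytomapper documentation):
  library(cytomapper)
  data("pancreasSCE", "pancreasImages", "pancreasMasks")
  # 1. pixel-level intensities across multiple channels
  plotPixels(pancreasImages, colour_by = c("H3", "CD99"))
  # 2. cell-level expression or metadata displayed on segmentation masks
  plotCells(pancreasMasks, object = pancreasSCE,
            img_id = "ImageNb", cell_id = "CellNb", colour_by = "CellType")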
This package provides normalization, testing for differential variability and differential methylation, and gene set testing for data from Illumina's Infinium HumanMethylation arrays. The normalization procedure is subset-quantile within-array normalization (SWAN), which allows Infinium I and II type probes on a single array to be normalized together. The test for differential variability is based on an empirical Bayes version of Levene's test. Differential methylation testing is performed using RUV, which can adjust for systematic errors of unknown origin in high-dimensional data by using negative control probes. Gene ontology analysis takes into account the number of probes per gene on the array, as well as multi-gene associated probes.
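A hedged sketch of the main steps on the minfiData example array (the function names SWAN(), varFit() and gometh() should be verified against the missMethyl documentation):
  library(missMethyl)
  library(minfi)
  library(minfiData)                         # provides the example RGChannelSet RGsetEx
  mSet   <- preprocessRaw(RGsetEx)
  mSetSw <- SWAN(mSet)                       # subset-quantile within-array normalization
  # design <- model.matrix(~ group)          # group: a factor describing the samples (user-supplied)
  # fitvar <- varFit(getM(mSetSw), design)   # empirical Bayes test for differential variability
  # go <- gometh(sig.cpg, all.cpg)           # GO testing accounting for probes per gene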
This package is designed to work with text written in Bahasa Malaysia. It provides functions and data sets that make working with Bahasa Malaysia text much easier. For word stemming in particular, Malay words are looked up in a dictionary and "extra suffixes" are then removed, as explained in Khan, Rehman Ullah, Fitri Suraya Mohamad, Muh Inam UlHaq, Shahren Ahmad Zadi Adruce, Philip Nuli Anding, Sajjad Nawaz Khan, and Abdulrazak Yahya Saleh Al-Hababi (2017) <https://ijrest.net/vol-4-issue-12.html>. The package includes a dictionary of Malay words that may be used to perform word stemming, a dataset of Malay stop words, a dataset of sentiment words and a dataset of normalized words.
This package contains efficient implementations of Discrete Optimal Transport algorithms for the computation of Kantorovich-Wasserstein distances between pairs of large spatial maps (Bassetti, Gualandi, Veneroni (2020), <doi:10.1137/19M1261195>). All the algorithms are based on an ad hoc implementation of the Network Simplex algorithm. The package has four main helper functions: compareOneToOne() (to compare two spatial maps), compareOneToMany() (to compare a reference map with a list of other maps), compareAll() (to compute a matrix of distances between a list of maps), and focusArea() (to compute the KWD distance within a focus area). For non-convex maps, the helper functions first build the convex hull of the input bins and pad the weights with zeros.
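A hedged sketch for comparing two spatial histograms defined on the same grid of bins (the argument names follow my reading of the helper functions listed above and should be checked against the package documentation):
  library(SpatialKWD)
  set.seed(1)
  coords <- as.matrix(expand.grid(x = 1:10, y = 1:10))    # bin coordinates on a regular grid
  storage.mode(coords) <- "integer"
  weights <- matrix(runif(2 * nrow(coords)), ncol = 2)    # the two maps to compare
  d <- compareOneToOne(coords, Weights = weights)
  d$distance                                              # Kantorovich-Wasserstein distance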
This package implements comprehensive test data engineering methods as described in Shojima (2022, ISBN:978-9811699856). Provides statistical techniques for engineering and processing test data: Classical Test Theory (CTT) with reliability coefficients for continuous ability assessment; Item Response Theory (IRT) including Rasch, 2PL, and 3PL models with item/test information functions; Latent Class Analysis (LCA) for nominal clustering; Latent Rank Analysis (LRA) for ordinal clustering with automatic determination of cluster numbers; Biclustering methods including infinite relational models for simultaneous clustering of examinees and items without predefined cluster numbers; and Bayesian Network Models (BNM) for visualizing inter-item dependencies. Features local dependence analysis through LRA and biclustering, parameter estimation, dimensionality assessment, and network structure visualization for educational, psychological, and social science research.
When added to an existing shiny app, users may subset any developer-chosen R data.frame on the fly. That is, users are empowered to slice and dice data by applying multiple (order-specific) filters, combined with the AND (&) operator, and get real-time updates on the number of rows affected/available along the way. Any downstream processes that leverage this data source (like tables, plots, or statistical procedures) will therefore re-render after new filters are applied. The shiny module's user interface has a minimalist aesthetic so that the focus can be on the data and other visuals. In addition to returning a reactive (filtered) data.frame, IDEAFilter also returns the dplyr filter statements used to actually slice the data.
Manages, builds and computes statistics and datasets for the construction of quarterly (sub-annual) life tables by exploiting micro-data from either a general or an insured population. References: Pavía and Lledó (2022) <doi:10.1111/rssa.12769>. Pavía and Lledó (2023) <doi:10.1017/asb.2023.16>. Pavía and Lledó (2025) <doi:10.1371/journal.pone.0315937>. Acknowledgements: The authors wish to thank Conselleria de Educación, Universidades y Empleo, Generalitat Valenciana (grants AICO/2021/257; CIAICO/2024/031), Ministerio de Ciencia e Innovación (grant PID2021-128228NB-I00) and Fundación Mapfre (grant 'Modelización espacial e intra-anual de la mortalidad en España. Una herramienta automática para el cálculo de productos de vida') for supporting this research.
Sample surveys use scientific methods to draw inferences about population parameters by observing a representative part of the population, called a sample. SRSWOR (Simple Random Sampling Without Replacement) is one of the most widely used probability sampling designs, wherein every unit has an equal chance of being selected and units are not repeated. This function draws multiple SRSWOR samples from a finite population and estimates the population parameter, i.e. the population total, using the Horvitz-Thompson (HT), ratio, and regression estimators. Repeated simulations (e.g., 500 times) are used to assess and compare the estimators using metrics such as percent relative bias (%RB) and percent relative root mean square error (%RRMSE). For details on sampling methodology, see Cochran (1977), "Sampling Techniques" <https://archive.org/details/samplingtechniqu0000coch_t4x6>.
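A base-R illustration of the simulation logic described above (not this package's interface): repeated SRSWOR samples, Horvitz-Thompson estimates of the population total, then %RB and %RRMSE:
  set.seed(123)
  N <- 1000; n <- 100
  y <- rgamma(N, shape = 2, scale = 50)    # study variable of a finite population
  true_total <- sum(y)
  est <- replicate(500, {
    s <- sample(N, n)                      # SRSWOR: equal probabilities, no repetition
    N * mean(y[s])                         # Horvitz-Thompson estimate of the total
  })
  RB    <- 100 * (mean(est) - true_total) / true_total
  RRMSE <- 100 * sqrt(mean((est - true_total)^2)) / true_total
  c(RB = RB, RRMSE = RRMSE)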