Intense parallel workloads can be difficult to monitor. Packages crew.cluster, clustermq, and future.batchtools distribute hundreds of worker processes over multiple computers. If a worker process exhausts its available memory, it may terminate silently, leaving the underlying problem difficult to detect or troubleshoot. Using the autometric package, a worker can proactively monitor itself in a detached background thread. The worker process runs normally while the thread writes to a log every few seconds. If the worker terminates unexpectedly, autometric can read and visualize the log file to reveal potential resource-related reasons for the crash. The autometric package borrows heavily from the methods of the ps and psutil packages.
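As a rough illustration of the self-monitoring pattern described above, the following Python sketch starts a daemon thread that appends a timestamped peak-memory sample to a log file at a fixed interval. Each line is flushed so that the last samples survive if the process dies abruptly. This is not autometric's implementation or log format. The helper names `start_monitor` and `read_log` are hypothetical, and the standard-library `resource` module it relies on is available only on Unix-like systems.

```python
import os
import resource
import tempfile
import threading
import time


def start_monitor(log_path, interval=2.0):
    """Start a daemon thread that appends one resource sample per interval.

    Hypothetical helper, not autometric's API. Each line is
    "unix_time,pid,peak_rss" where peak_rss is ru_maxrss
    (kilobytes on Linux, bytes on macOS).
    """
    stop = threading.Event()

    def loop():
        with open(log_path, "a") as log:
            while not stop.is_set():
                peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
                log.write(f"{time.time():.3f},{os.getpid()},{peak}\n")
                log.flush()  # hand each sample to the OS so it survives an abrupt exit
                stop.wait(interval)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return stop, thread


def read_log(log_path):
    """Parse the log back into (time, pid, peak_rss) tuples for inspection."""
    with open(log_path) as log:
        return [tuple(float(x) for x in line.split(",")) for line in log if line.strip()]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "worker.log")
    stop, thread = start_monitor(path, interval=0.05)
    work = [bytearray(1_000_000) for _ in range(20)]  # stand-in for the real workload
    time.sleep(0.3)
    stop.set()
    thread.join()
    print(len(read_log(path)), "samples logged")
```

The thread only samples and writes, so the main workload is left alone. After a crash, another process can call `read_log` on the file to see how memory evolved up to the last sample.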
Copula-based regression models for multivariate censored data, including bivariate right-censored data, bivariate interval-censored data, and right/interval-censored semi-competing risks data. The package currently supports Clayton, Gumbel, Frank, Joe, AMH and Copula2 copula models. For marginal models, it supports parametric (Weibull, Loglogistic, Gompertz) and semiparametric (Cox and transformation) models. It includes methods for convenient prediction and plotting, and also provides a bivariate time-to-event simulation function and an information ratio-based goodness-of-fit test for copulas. Method details can be found in Sun et al. (2019) Lifetime Data Analysis, Sun et al. (2021) Biostatistics, Sun et al. (2022) Statistical Methods in Medical Research, Sun et al. (2022) Biometrics, and Sun et al. (2023+) JRSSC.
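To show how a copula ties two marginal survival models together, here is a minimal Python sketch. It uses the Clayton copula C(u, v) = (u^-θ + v^-θ - 1)^(-1/θ), θ > 0, applied to two Weibull marginal survival functions to obtain a joint survival probability. It also gives Kendall's tau, which is θ/(θ + 2) for Clayton. The function names and the (shape, scale) Weibull parameterization are assumptions made for illustration. This is not the package's fitting code, which estimates θ and the margins from censored data.

```python
import math


def clayton(u, v, theta):
    """Clayton copula C(u, v) for theta > 0 (u, v in (0, 1])."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def weibull_survival(t, shape, scale):
    """Marginal survival S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def joint_survival(t1, t2, theta, margin1, margin2):
    """S(t1, t2) = C(S1(t1), S2(t2)); each margin is a (shape, scale) pair."""
    return clayton(weibull_survival(t1, *margin1), weibull_survival(t2, *margin2), theta)


def kendall_tau_clayton(theta):
    """Kendall's tau implied by a Clayton copula parameter."""
    return theta / (theta + 2.0)


print(joint_survival(1.0, 1.5, 2.0, (1.5, 2.0), (1.2, 3.0)))
print(kendall_tau_clayton(2.0))
```

Because the Clayton copula is positively quadrant dependent, the joint survival always lies between the independence value S1·S2 and the upper Fréchet bound min(S1, S2).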
This package provides a dataset containing several color naming conventions established by multiple sources, along with associated color metadata. The package also provides related helper functions for mapping among the different Lego color naming conventions and between Lego colors, hex colors, and R color names, making it easy to convert any color palette to one based on existing Lego colors while keeping as close to the original color palette as possible. The functions use nearest color matching based on Euclidean distance in RGB space. Naming conventions for color mapping include those from BrickLink (<https://www.bricklink.com>), The Lego Group (<https://www.lego.com>), LDraw (<https://www.ldraw.org/>), and Peeron (<http://www.peeron.com/>).
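The nearest-color rule is simple to state in code. The Python sketch below converts hex codes to RGB triples and picks the palette entry with the smallest squared Euclidean distance. Squared distance selects the same winner as true distance. The palette names and hex values are hypothetical placeholders, not official Lego color data, and the function names are not the package's API.

```python
def hex_to_rgb(code):
    """'#RRGGBB' -> (r, g, b) integers in 0..255."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def nearest_color(target_hex, palette):
    """Name of the palette color closest to target in RGB space."""
    r, g, b = hex_to_rgb(target_hex)

    def squared_distance(item):
        pr, pg, pb = hex_to_rgb(item[1])
        return (pr - r) ** 2 + (pg - g) ** 2 + (pb - b) ** 2

    return min(palette.items(), key=squared_distance)[0]


# Hypothetical palette for illustration only (NOT real Lego color values).
palette = {"red": "#CC0000", "blue": "#0033CC", "white": "#FFFFFF", "black": "#111111"}

# Map an arbitrary palette onto the available colors.
print([nearest_color(h, palette) for h in ["#E41A1C", "#377EB8"]])
```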
This package provides built-in datasets and three functions: mobility_index, nonStanTest and linkedLives. The mobility_index function calculates lifecourse fluidity, whilst the nonStanTest and linkedLives functions allow the user to determine the probability that the observed sequence data arose by chance. The linkedLives function accounts for the fact that some individuals may have identical sequences. The datasets provide sequence data on marital status (maritalData) and mobility (mydata) for a selected group of individuals from the British Household Panel Survey (BHPS). In addition, a third dataset (myHouseID) provides personal and household IDs for 100 individuals from the BHPS.
This package implements the framework presented in Cucci, D. A., Voirol, L., Khaghani, M. and Guerrier, S. (2023) <doi:10.1109/TIM.2023.3267360>, which allows users to analyze the impact of sensor error modeling on the performance of integrated navigation (sensor fusion) based on inertial measurement unit (IMU), Global Positioning System (GPS), and barometer data. The framework relies on Monte Carlo simulations in which a vanilla Extended Kalman filter is coupled with realistic, user-configurable noise generation mechanisms to recover a reference trajectory from noisy measurements. Several statistical metrics of the solution, aggregated over hundreds of simulated realizations, provide reasonable estimates of the system's expected performance in real-world conditions.
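The Monte Carlo idea can be shown at toy scale: simulate many noisy realizations, run a filter on each, and aggregate an error metric. The Python sketch below swaps in a scalar linear Kalman filter tracking a one-dimensional random walk. It is not an Extended Kalman filter and uses no IMU, GPS, or barometer model. All names and noise settings (`q` for process variance, `r` for measurement variance) are illustrative assumptions.

```python
import math
import random


def simulate_run(rng, steps, q, r):
    """One realization: random-walk truth, noisy measurements, scalar Kalman filter.

    Returns the mean squared error of the filter and of the raw measurements.
    """
    truth, estimate, p = 0.0, 0.0, 1.0
    filt_sq = meas_sq = 0.0
    for _ in range(steps):
        truth += rng.gauss(0.0, math.sqrt(q))      # process noise drives the true state
        z = truth + rng.gauss(0.0, math.sqrt(r))   # noisy sensor reading
        p += q                                     # predict: x_k = x_{k-1}, variance grows
        k = p / (p + r)                            # Kalman gain
        estimate += k * (z - estimate)             # correct with the innovation
        p *= 1.0 - k                               # posterior variance
        filt_sq += (estimate - truth) ** 2
        meas_sq += (z - truth) ** 2
    return filt_sq / steps, meas_sq / steps


def monte_carlo_rmse(runs=200, steps=100, q=0.01, r=1.0, seed=1):
    """Aggregate RMSE over many independent realizations."""
    rng = random.Random(seed)
    results = [simulate_run(rng, steps, q, r) for _ in range(runs)]
    filt = math.sqrt(sum(f for f, _ in results) / runs)
    meas = math.sqrt(sum(m for _, m in results) / runs)
    return filt, meas


if __name__ == "__main__":
    filt_rmse, meas_rmse = monte_carlo_rmse()
    print(f"filter RMSE {filt_rmse:.3f} vs raw measurement RMSE {meas_rmse:.3f}")
```

Averaging over many realizations, rather than judging a single run, is what turns the simulated error into a stable estimate of expected performance.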
This package provides several direct search optimization algorithms based on the simplex method, i.e. algorithms that do not use the derivative of the cost function and instead proceed by updating a simplex. The following algorithms are available: the fixed shape simplex method of Spendley, Hext and Himsworth (unconstrained optimization with a fixed shape simplex, 1962) <doi:10.1080/00401706.1962.10490033>, the variable shape simplex method of Nelder and Mead (unconstrained optimization with a variable shape simplex, 1965) <doi:10.1093/comjnl/7.4.308>, and Box's complex method (constrained optimization with a variable shape simplex, 1965) <doi:10.1093/comjnl/8.1.42>.
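For readers unfamiliar with how a variable-shape simplex moves, here is a compact Python sketch of the Nelder-Mead method. It uses the textbook reflection (1), expansion (2), contraction (0.5) and shrink (0.5) coefficients. It does not use the package's own interface, and it covers neither the fixed-shape Spendley-Hext-Himsworth method nor Box's constrained complex method. The stopping rule, based on the spread of function values, and the initial step size are assumptions.

```python
def nelder_mead(f, x0, step=0.5, tol=1e-10, max_iter=2000):
    """Minimize f from x0 with the Nelder-Mead variable-shape simplex method."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):  # initial simplex: x0 plus one step along each axis
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(coef):
            """Point centroid + coef * (centroid - worst)."""
            return [c + coef * (c - w) for c, w in zip(centroid, worst)]

        xr = along(1.0)  # reflection
        fr = f(xr)
        if values[0] <= fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        elif fr < values[0]:
            xe = along(2.0)  # expansion
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        else:
            # outside contraction if the reflection improved on the worst point, else inside
            xc = along(0.5) if fr < values[-1] else along(-0.5)
            fc = f(xc)
            if fc < min(fr, values[-1]):
                simplex[-1], values[-1] = xc, fc
            else:  # shrink all vertices halfway toward the best one
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    i_best = min(range(n + 1), key=values.__getitem__)
    return simplex[i_best], values[i_best]


print(nelder_mead(lambda p: (p[0] - 1) ** 2 + 2 * (p[1] + 3) ** 2, [0.0, 0.0]))
```

Only function values are ever compared, which is what makes the method derivative-free.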
An extension to ggplot2 and magick. It contains three groups of functions. Functions in the first group draw ggplot2-based plots: geom_shading_bar() draws bar plots with shading colors in each bar; geom_rect_cm(), geom_circle_cm() and geom_ellipse_cm() draw rectangles, circles and ellipses with centimeters as their unit, so their sizes do not change when the coordinate system or the aspect ratio changes; annotation_transparent_text() draws labels with transparent text; and annotation_shading_polygon() draws irregular polygons with shading colors. Functions in the second group generate coordinates for regular shapes and apply linear transformations. Functions in the third group are magick-based functions that facilitate image processing.
Protein expression data can be analyzed with Principal Component Analysis (PCA), and this R package is designed to streamline that analysis. It performs PCA and generates biplots and scree plots for graphical visualization. Optionally, it supports grouping/clustering visualization with PCA loadings and confidence ellipses. With this package, researchers can quickly explore complex protein datasets, interpret variance contributions, and visualize sample clustering through intuitive biplots. For more details, see Jolliffe (2001) <doi:10.1007/b98835>, Gabriel (1971) <doi:10.1093/biomet/58.3.453>, Zhang et al. (2024) <doi:10.1038/s41467-024-53239-9>, and Anandan et al. (2022) <doi:10.1038/s41598-022-07781-5>.
This package provides tools to apply Ensemble Empirical Mode Decomposition (EEMD) for cyclostratigraphy purposes. Its main components are a new algorithm, extricate, that performs EEMD in seconds; a linear interpolation algorithm using the greatest rational common divisor of depth or time; different algorithms to compute instantaneous amplitude, frequency and ratios of frequencies; and functions to verify and visualise the outputs. The functions were developed during the CRASH project (Checking the Reproducibility of Astrochronology in the Hauterivian). When using the package for publication, please cite Wouters, S., Crucifix, M., Sinnesael, M., Da Silva, A.C., Zeeden, C., Zivanovic, M., Boulvain, F., Devleeschouwer, X., 2022, "A decomposition approach to cyclostratigraphic signal processing". Earth-Science Reviews 225 (103894). <doi:10.1016/j.earscirev.2021.103894>.
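One plausible reading of "linear interpolation using the greatest rational common divisor of depth" is to choose the largest grid spacing on which every original sample depth lies exactly, then interpolate linearly onto that grid. The Python sketch below does that with exact `Fraction` arithmetic. It is an interpretation for illustration only, not the package's algorithm, and the function names are hypothetical. It assumes strictly increasing depths with at least two samples.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def rational_gcd(a, b):
    """GCD of two positive rationals: gcd(p/q, r/s) = gcd(p*s, r*q) / (q*s)."""
    return Fraction(gcd(a.numerator * b.denominator, b.numerator * a.denominator),
                    a.denominator * b.denominator)


def common_step(depths):
    """Largest step that puts every sampled depth exactly on a regular grid."""
    exact = [Fraction(str(d)) for d in depths]  # str() keeps the decimal as written
    diffs = [b - a for a, b in zip(exact, exact[1:])]
    return reduce(rational_gcd, diffs)


def resample(depths, values):
    """Linearly interpolate (depth, value) samples onto the common-step grid."""
    step = common_step(depths)
    start = Fraction(str(depths[0]))
    n = int((Fraction(str(depths[-1])) - start) / step)
    grid = [float(start + i * step) for i in range(n + 1)]
    out, j = [], 0
    for x in grid:
        while j < len(depths) - 2 and depths[j + 1] < x:
            j += 1  # advance to the interval [depths[j], depths[j + 1]] containing x
        x0, x1 = depths[j], depths[j + 1]
        out.append(values[j] + (values[j + 1] - values[j]) * (x - x0) / (x1 - x0))
    return step, grid, out


print(resample([0.0, 0.3, 0.75], [0.0, 3.0, 6.0]))
```

Working with exact rationals avoids floating-point drift when computing the common step, so every original depth reappears on the grid unchanged.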
This package implements fast, scalable optimization algorithms for fitting topic models ("grade of membership" models) and non-negative matrix factorizations to count data. The methods exploit the special relationship between the multinomial topic model (also known as "probabilistic latent semantic indexing") and Poisson non-negative matrix factorization. The package provides tools to compare, annotate and visualize model fits, including functions to efficiently create "structure plots" and identify key features in topics. The fastTopics package is a successor to the CountClust package. For more information, see <doi:10.48550/arXiv.2105.13440> and <doi:10.1186/s13059-023-03067-9>. Please also see the GitHub repository for additional vignettes not included in the package on CRAN.
Power and Sample Size for Health Researchers is a Shiny application that brings together a series of functions for sample size and power calculations in common analyses in the healthcare field. It can calculate power and sample size to estimate or test hypotheses for means and proportions (including tests for correlated groups, equivalence, non-inferiority and superiority), associations, correlation coefficients, regression coefficients (linear, logistic, gamma, and Cox), linear mixed models, Cronbach's alpha, interobserver agreement, intraclass correlation coefficients, limits of agreement on Bland-Altman plots, the area under the curve, and sensitivity and specificity incorporating the prevalence of disease. You can also use the online version at <https://hcpa-unidade-bioestatistica.shinyapps.io/PSS_Health/>.
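As an example of the kind of calculation such a tool performs, the Python sketch below applies two standard normal-approximation formulas. The first gives the per-group sample size for a two-sided comparison of two means, n = 2((z_{1-α/2} + z_{1-β})σ/Δ)². The second gives the sample size to estimate a proportion within a margin of error, n = z²p(1 - p)/E². These are textbook formulas, not necessarily the exact methods used by the application, which may apply t-distribution or other corrections. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_two_means(delta, sd, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


def n_estimate_proportion(p, margin, conf=0.95):
    """n needed to estimate a proportion p within +/- margin (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(n_per_group_two_means(delta=0.5, sd=1.0))   # 63
print(n_estimate_proportion(p=0.5, margin=0.05))  # 385
```

Sample sizes are rounded up, since a fractional participant would leave the target power or precision just out of reach.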
Implements a promising yet little-explored protocol for bioacoustical analysis, the eigensound method by MacLeod, Krieger and Jones (2013) <doi:10.4404/hystrix-24.1-6299>. Eigensound is a multidisciplinary method focused on the direct comparison of stereotyped sounds from different species. SoundShape, in turn, provides the tools required to go from sound waves to Principal Components Analysis, using tools drawn from traditional bioacoustics (e.g. the tuneR and seewave packages), geometric morphometrics (e.g. the geomorph package) and multivariate analysis (e.g. the stats package). For more information, please see Rocha and Romano (2021) and check the SoundShape repository on GitHub for news and updates <https://github.com/p-rocha/SoundShape>.
Uses the Distorted Wave Born Approximation (DWBA) to compute acoustic backscattering from an object whose geometry is represented by a volumetric mesh of tetrahedrons. The computation is done efficiently through an analytical 3D integration that yields a solution expressed in terms of elementary functions for each tetrahedron. Note that this method is only valid for objects whose acoustic properties, such as density and sound speed, do not vary significantly from those of the surrounding medium. (See Lavia, Cascallares and Gonzalez, J. D. (2023). TetraScatt model: Born approximation for the estimation of acoustic dispersion of fluid-like objects of arbitrary geometries. arXiv preprint <arXiv:2312.16721>.)
This package estimates crop water demand. As an example, data from the TerraClimate dataset (<https://www.climatologylab.org/terraclimate.html>), calibrated with automatic weather stations of the National Meteorological Institute of Brazil, are included at a coarse spatial resolution for estimating crop water demand. However, the user also has the option to download the variables directly from the TerraClimate repository with the download.terraclimate function and access the original TerraClimate products. If the user considers it necessary to calibrate the variables, another function is provided for that purpose. Lastly, the crop water demand estimation in this package can be run for the entire Brazilian territory with the TerraClimate dataset.
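The description does not state the formula used. A common way to express crop water demand is the FAO-56 single crop coefficient form, ETc = Kc × ET0, optionally compared against precipitation to obtain an unmet demand. The Python sketch below shows that arithmetic under this assumption only. The package may use a different or more detailed method, and the function names and monthly inputs are hypothetical.

```python
def crop_evapotranspiration(et0_mm, kc):
    """ETc = Kc * ET0 for each period (mm); FAO-56 single crop coefficient form."""
    return [e * k for e, k in zip(et0_mm, kc)]


def water_deficit(etc_mm, precip_mm):
    """Water demand not met by precipitation in each period (mm), floored at zero."""
    return [max(0.0, e - p) for e, p in zip(etc_mm, precip_mm)]


# Hypothetical monthly inputs for one grid cell: reference ET (mm), crop coefficient, rainfall (mm).
et0 = [120.0, 110.0, 100.0]
kc = [0.4, 1.15, 0.7]
rain = [150.0, 60.0, 20.0]
etc = crop_evapotranspiration(et0, kc)
print(etc, water_deficit(etc, rain))
```

This simple balance uses total rainfall. A fuller treatment would use effective precipitation and soil water storage.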
Interact with the Europeana Data Model via a variety of API endpoints that provide access to digital collections from thousands of institutions around Europe. This translates to millions of Cultural Heritage Objects in the form of images, text, video, sound and 3D, accompanied by rich metadata. The Data Model's design principles are based on the core principles and best practices of the Semantic Web and Linked Data efforts to which Europeana contributes (see, e.g., Doerr, Martin, et al. The Europeana Data Model (EDM). World Library and Information Congress: 76th IFLA General Conference and Assembly. Vol. 10. 2010.). The package also provides methods for bulk downloads of specific subsets of items, including both their metadata and their associated media files.
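To give a feel for what querying one of these endpoints involves, here is a minimal Python sketch against what is assumed to be the public Europeana Search API (`https://api.europeana.eu/record/v2/search.json`). It uses the `wskey`, `query` and `rows` parameters and an `items` field in the JSON response. Verify these against the current Europeana API documentation before relying on them. The helper names are hypothetical and unrelated to the package's own functions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the public Europeana Search API; verify against current docs.
SEARCH_URL = "https://api.europeana.eu/record/v2/search.json"


def build_search_url(query, api_key, rows=12):
    """Compose a search request; `wskey` carries the user's API key."""
    params = {"wskey": api_key, "query": query, "rows": rows}
    return f"{SEARCH_URL}?{urlencode(params)}"


def search(query, api_key, rows=12):
    """Run the search and return the list of item records (assumed `items` field)."""
    with urlopen(build_search_url(query, api_key, rows)) as response:
        return json.load(response).get("items", [])


print(build_search_url("Vermeer", "YOUR_API_KEY", rows=5))
```

Calling `search("Vermeer", key)` requires a valid API key and network access. Building the URL separately keeps the request easy to inspect and test offline.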
Calculates marginal effects and conducts process analysis in exponential family random graph models (ERGM). Includes functions to conduct mediation and moderation analyses and to diagnose multicollinearity. URL: <https://github.com/sduxbury/ergMargins>. BugReports: <https://github.com/sduxbury/ergMargins/issues>. Duxbury, Scott W (2021) <doi:10.1177/0049124120986178>. Long, J. Scott, and Sarah Mustillo (2018) <doi:10.1177/0049124118799374>. Mize, Trenton D. (2019) <doi:10.15195/v6.a4>. Karlson, Kristian Bernt, Anders Holm, and Richard Breen (2012) <doi:10.1177/0081175012444861>. Duxbury, Scott W (2018) <doi:10.1177/0049124118782543>. Duxbury, Scott W, Jenna Wertsching (2023) <doi:10.1016/j.socnet.2023.02.003>. Huang, Peng, Carter Butts (2023) <doi:10.1016/j.socnet.2023.07.001>.
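The core quantity behind ERGM marginal effects can be sketched from the fact that, conditional on the rest of the network, an ERGM tie probability is logistic in the change statistics δ: P(tie) = 1/(1 + exp(-θ·δ)). The marginal effect of change statistic k is then θ_k p(1 - p), averaged over dyads. The Python sketch below computes that simplified average marginal effect. It does not reproduce the package's functions, its handling of network dependence, or its standard errors. All names and example values are hypothetical.

```python
import math


def tie_probability(theta, change_stats):
    """Conditional tie probability: logistic in the change statistics."""
    eta = sum(t * d for t, d in zip(theta, change_stats))
    return 1.0 / (1.0 + math.exp(-eta))


def average_marginal_effect(theta, dyads, k):
    """Average over dyads of dP/d(delta_k) = theta_k * p * (1 - p)."""
    effects = []
    for delta in dyads:
        p = tie_probability(theta, delta)
        effects.append(theta[k] * p * (1.0 - p))
    return sum(effects) / len(effects)


# Hypothetical coefficients (edges, covariate) and change statistics for three dyads.
theta = [-2.0, 0.8]
dyads = [[1.0, 0.0], [1.0, 1.5], [1.0, 3.0]]
print(average_marginal_effect(theta, dyads, k=1))
```

Unlike a raw coefficient, the marginal effect is on the probability scale, which is what makes effects comparable across models and mediation steps.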
Some functions of ade4 and stats are combined in order to obtain a partition of the rows of a data table whose columns are quantitative, qualitative or frequency variables. First, a principal axes method is performed; then a combination of Ward agglomerative hierarchical classification and K-means is performed, using some of the first coordinates obtained from the principal axes method. To allow the elements being clustered to carry different weights, the function kmeansW, programmed in C++, is included; it is a modification of kmeans. Some graphical functions include the option gg=FALSE. When gg=TRUE, they use the ggplot2 and ggrepel packages to avoid overlapping labels.
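The weighting idea can be illustrated with a small Python sketch of weighted k-means. Points are assigned to their nearest centroid as usual, but each centroid is recomputed as the weighted mean of its members, which minimizes Σ w_i ||x_i - c||². The starting centers stand in for those a Ward step would supply. This is a generic Lloyd-style sketch, not the C++ kmeansW code, and the function name is hypothetical.

```python
def weighted_kmeans(points, weights, centers, max_iter=100):
    """Lloyd-style k-means where each point carries a weight.

    `centers` are starting centroids (e.g. from a hierarchical step);
    updated centroids are weighted means of their members.
    """
    centers = [list(c) for c in centers]
    dim = len(points[0])

    def nearest(p):
        return min(range(len(centers)),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))

    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(len(centers)):
            members = [i for i, lab in enumerate(labels) if lab == j]
            total = sum(weights[i] for i in members)
            if total > 0:  # leave an empty cluster's centroid unchanged
                centers[j] = [sum(weights[i] * points[i][d] for i in members) / total
                              for d in range(dim)]
    return labels, centers


print(weighted_kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], [1, 3, 1, 1], [[0, 0], [10, 10]]))
```

A heavily weighted point pulls its cluster centroid toward itself; in the example the first centroid ends at (0, 0.75) rather than the unweighted (0, 0.5).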
This package provides a collection of functions for processing data exported from Gen5 2.06. Gen5 is data analysis software for BioTek plate readers <https://www.biotek.com/products/software-robotics-software/gen5-microplate-reader-and-imager-software/>. This package contains functions for data cleaning, modeling and plotting using data exported from Gen5 version 2.06. It exports technically correct data, as defined by Edwin de Jonge and Mark van der Loo (2013) <https://cran.r-project.org/doc/contrib/de_Jonge+van_der_Loo-Introduction_to_data_cleaning_with_R.pdf>, for customized analysis. It also provides Boltzmann fitting for general kinetic analysis. See <https://www.github.com/yanxianUCSB/gen5helper> for more information, documentation and examples.
An implementation of the International Bureau of Weights and Measures (BIPM) generalized consensus estimators used to assign the reference value in a key comparison exercise. It can also be applied to any interlaboratory study. Given a set of sources (primary laboratories or measurement methods), this package evaluates the variance components according to the selected statistical method for consensus building. It also compares different consensus builders and evaluates the participating methods or sources against the consensus reference value. The methods draw on a diverse set of references, including DerSimonian-Laird (1986) <doi:10.1016/0197-2456(86)90046-2>; for a complete list of references, see the references section of the package documentation.
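As a concrete example of one consensus builder, the Python sketch below implements the DerSimonian-Laird estimator. It computes the between-source variance τ² by the method of moments from Cochran's Q, then takes the weighted mean with weights 1/(u_i² + τ²) as the consensus value, with standard uncertainty (Σ w_i*)^(-1/2). It is a generic sketch, not the package's interface, and the function name is hypothetical. The degrees of equivalence shown in the usage line are simply x_i minus the consensus value.

```python
import math


def dersimonian_laird(x, u):
    """DerSimonian-Laird consensus value from results x with standard uncertainties u.

    Returns (consensus value, its standard uncertainty, between-source variance tau^2).
    """
    w = [1.0 / ui ** 2 for ui in u]
    sw = sum(w)
    mean_fixed = sum(wi * xi for wi, xi in zip(w, x)) / sw
    q = sum(wi * (xi - mean_fixed) ** 2 for wi, xi in zip(w, x))  # Cochran's Q
    denom = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(x) - 1)) / denom)  # truncated at zero
    w_star = [1.0 / (ui ** 2 + tau2) for ui in u]
    value = sum(wi * xi for wi, xi in zip(w_star, x)) / sum(w_star)
    return value, math.sqrt(1.0 / sum(w_star)), tau2


results, uncertainties = [10.1, 9.8, 10.6, 10.0], [0.1, 0.2, 0.15, 0.1]
ref, u_ref, tau2 = dersimonian_laird(results, uncertainties)
print(ref, u_ref, tau2, [xi - ref for xi in results])  # last list: degrees of equivalence
```

When the sources agree within their stated uncertainties, τ² is zero and the result reduces to the ordinary inverse-variance weighted mean.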
Easily construct prompts and associated logic for interacting with large language models (LLMs). tidyprompt introduces the concept of prompt wraps, which are building blocks that you can use to quickly turn a simple prompt into a complex one. Prompt wraps do not just modify the prompt text, but also add extraction and validation functions that will be applied to the response of the LLM. This helps ensure that the user gets the desired output. tidyprompt can add various features to prompts and their evaluation by LLMs, such as structured output, automatic feedback, retries, reasoning modes, autonomous R function calling, and R code generation and evaluation. It is designed to be compatible with any LLM provider that offers chat completion.
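The prompt-wrap concept pairs a text modifier with an extractor/validator, plus retries with automatic feedback. It can be sketched in a few lines of Python. The `PromptWrap` and `Prompt` classes below are hypothetical illustrations of the idea, not tidyprompt's R API. The "LLM" is any callable that maps a prompt string to a reply string, so a scripted stub can stand in for a real provider.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PromptWrap:
    """Hypothetical: a text modifier plus an extractor/validator for the reply."""
    modify: Callable[[str], str]
    extract: Callable[[str], object]  # should raise ValueError when the reply is unusable


@dataclass
class Prompt:
    text: str
    wraps: List[PromptWrap] = field(default_factory=list)

    def wrap(self, w):
        """Return a new prompt with one more wrap applied on top."""
        return Prompt(self.text, self.wraps + [w])

    def render(self):
        text = self.text
        for w in self.wraps:
            text = w.modify(text)
        return text

    def send(self, llm, max_tries=3):
        message = self.render()
        for _ in range(max_tries):
            reply = llm(message)
            try:
                result = reply
                for w in self.wraps:  # run extractors in the order the wraps were added
                    result = w.extract(result)
                return result
            except ValueError as err:
                # automatic feedback: tell the model what went wrong, then retry
                message += f"\n\nYour answer was invalid ({err}). Try again."
        raise RuntimeError("no valid reply after retries")


answer_as_integer = PromptWrap(
    modify=lambda t: t + "\n\nRespond with only an integer.",
    extract=lambda r: int(r.strip()),
)
```

For example, `Prompt("What is 6 * 7?").wrap(answer_as_integer).send(llm)` returns the integer 42 once the model replies with a parseable number. Any unparseable reply triggers a feedback message and another attempt.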
Students learning both econometrics and R may find the introduction to both challenging. The wooldridge data package aims to lighten the task by efficiently loading any data set found in the text with a single command. Data sets have been compressed to a fraction of their original size. Documentation files contain page numbers, the original source, time of publication, and notes from the author suggesting avenues for further analysis and research. If one needs an introduction to R model syntax, a vignette contains solutions to examples from chapters of the text. Data sets are from the 7th edition (Wooldridge 2020, ISBN-13 978-1-337-55886-0), and are backwards compatible with all previous versions of the text.
In computationally demanding data analysis pipelines, the targets R package (2021, <doi:10.21105/joss.02959>) maintains an up-to-date set of results while skipping tasks that do not need to rerun. This increases both speed and trust in the final product. However, it also overwrites old output with new output, so past results disappear by default. To preserve historical output, the gittargets package captures version-controlled snapshots of the data store, and each snapshot links to the underlying commit of the source code. That way, when the user rolls back the code to a previous branch or commit, gittargets can recover the data contemporaneous with that commit so that all targets remain up to date.
A small package containing functions to perform a joint calibration of totals and quantiles. The calibration for totals is based on Deville and Särndal (1992) <doi:10.1080/01621459.1992.10475217>, and the calibration for quantiles is based on Harms and Duchesne (2006) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X20060019255>. The package uses standard calibration via the survey, sampling or laeken packages. In addition, entropy balancing via the ebal package and empirical likelihood based on code from Wu (2005) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2005002/article/9051-eng.pdf> can be used. See the paper by Beręsewicz and Szymkowiak (2023) for details <arXiv:2308.13281>.
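To make "calibration of totals" concrete, the Python sketch below performs linear (chi-square distance) calibration in the Deville-Särndal framework. Design weights d_i are adjusted to w_i = d_i(1 + x_iᵀλ), where λ solves (Σ d_i x_i x_iᵀ)λ = T_x - Σ d_i x_i. This makes the weighted sample reproduce known population totals. Here x_i = (1, z_i), so the weights match a known population size and the known total of one auxiliary variable z. The quantile part (Harms and Duchesne) is not shown, and the function name is hypothetical rather than part of the package's API.

```python
def linear_calibration(d, z, pop_size, pop_total):
    """Linear (chi-square distance) calibration of design weights d so that
    sum(w) == pop_size and sum(w * z) == pop_total."""
    # M = sum d_i x_i x_i^T with x_i = (1, z_i)
    m11 = sum(d)
    m12 = sum(di * zi for di, zi in zip(d, z))
    m22 = sum(di * zi * zi for di, zi in zip(d, z))
    r1, r2 = pop_size - m11, pop_total - m12  # gap between known and estimated totals
    det = m11 * m22 - m12 * m12
    lam1 = (r1 * m22 - m12 * r2) / det        # solve the 2x2 system by Cramer's rule
    lam2 = (m11 * r2 - m12 * r1) / det
    return [di * (1.0 + lam1 + lam2 * zi) for di, zi in zip(d, z)]


d = [10.0, 10.0, 10.0, 10.0]   # hypothetical design weights
z = [1.0, 2.0, 3.0, 4.0]       # hypothetical auxiliary variable
w = linear_calibration(d, z, pop_size=50.0, pop_total=130.0)
print(w, sum(w), sum(wi * zi for wi, zi in zip(w, z)))
```

The calibration equations hold exactly by construction, since Σ w_i x_i = Σ d_i x_i + Mλ = T_x. Linear calibration can produce negative weights, which is one reason alternatives such as entropy balancing or empirical likelihood are also offered.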
We propose a pair of summary measures for the predictive power of a prediction function based on a regression model. The regression model can be linear or nonlinear, parametric, semi-parametric, or nonparametric, and correctly specified or mis-specified. The first measure, R-squared, is an extension of the classical R-squared statistic for a linear model, quantifying the prediction function's ability to capture the variability of the response. The second measure, L-squared, quantifies the prediction function's bias for predicting the mean regression function. When used together, they give a complete summary of the predictive power of a prediction function. Please refer to Gang Li and Xiaoyan Wang (2016) <arXiv:1611.03063> for more details.