This package provides a collection of recycled and modified R functions to aid in file manipulation, data exploration, wrangling, optimization, and object manipulation. Other functions aid in convenient data visualization, loop progression, software packaging, and installation.
This package provides functions for creating and annotating a composite plot in ggplot2. Offers background themes and shortcut plotting functions that produce figures appropriate for the format of scientific journals. Some methods are described in Min and Zhou (2021) <doi:10.3389/fgene.2021.802894>.
This package provides a flexible moving average algorithm for modeling drug exposure in pharmacoepidemiology studies, as presented in the article: Ouchi, D., Giner-Soriano, M., Gómez-Lumbreras, A., Vedia Urgell, C., Torres, F., & Morros, R. (2022). "Automatic Estimation of the Most Likely Drug Combination in Electronic Health Records Using the Smooth Algorithm: Development and Validation Study." JMIR Medical Informatics, 10(11), e37976. <doi:10.2196/37976>.
Bayesian analysis of censored linear mixed-effects models that replace Gaussian assumptions with a flexible class of distributions, such as the scale mixture of normal family of distributions, with a damped exponential correlation structure to account for within-subject autocorrelation among irregularly observed measures. For more details, see Kelin Zhong, Fernanda L. Schumacher, Luis M. Castro, Victor H. Lachos (2025) <doi:10.1002/sim.10295>.
This tiny package contains one function, smirnov(), which calculates two scaled taxonomic coefficients, Txy (coefficient of similarity) and Txx (coefficient of originality). These two characteristics may be used to analyze similarities between any number of taxonomic groups, and also to assess the uniqueness of a given taxon. The output of smirnov() can be used as a distance measure: convert it to a distance with "as.dist(1 - smirnov(x))".
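The similarity-to-distance step quoted in the entry above can be illustrated outside R. The following is a minimal Python sketch under stated assumptions: the similarity matrix is hypothetical, the function name `similarity_to_distance` is made up for illustration, and nothing here reproduces smirnov() itself. It only shows the `1 - similarity` conversion over the lower triangle, which is the part of a square matrix that R's as.dist() keeps.

```python
def similarity_to_distance(sim):
    """Return the lower-triangle distances 1 - s_ij for i > j.

    `sim` is a square list-of-lists similarity matrix (hypothetical data).
    The diagonal is ignored, mirroring how R's as.dist() keeps only the
    lower triangle of a square matrix.
    """
    n = len(sim)
    if any(len(row) != n for row in sim):
        raise ValueError("similarity matrix must be square")
    return {(i, j): 1.0 - sim[i][j] for i in range(n) for j in range(i)}


# Hypothetical similarities between three taxonomic groups.
sim = [
    [0.90, 0.50, 0.25],
    [0.50, 0.80, 0.75],
    [0.25, 0.75, 0.70],
]
dist = similarity_to_distance(sim)
```

The resulting dictionary maps each pair of group indices to its distance, so more similar groups end up closer together.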
This package provides the SMOTE with Boosting (SMOTEWB) algorithm. See F. Sağlam, M. A. Cengiz (2022) <doi:10.1016/j.eswa.2022.117023>. It is a SMOTE-based resampling technique which creates synthetic data on the links between nearest neighbors. SMOTEWB uses boosting weights to determine where to generate new samples and automatically decides the number of neighbors for each sample. It is robust to noise and outperforms most of the alternatives according to the Matthews correlation coefficient metric. Alternative resampling methods are also available in the package.
Flexible multidimensional scaling (MDS) methods and extensions to the package smacof. This package contains various functions, wrappers, methods and classes for fitting, plotting and displaying a large number of different flexible MDS models. These are: Torgerson scaling (Torgerson, 1958, ISBN:978-0471879459) with powers, Sammon mapping (Sammon, 1969, <doi:10.1109/T-C.1969.222678>) with ratio and interval optimal scaling, Multiscale MDS (Ramsay, 1977, <doi:10.1007/BF02294052>) with ratio and interval optimal scaling, s-stress MDS (ALSCAL; Takane, Young & De Leeuw, 1977, <doi:10.1007/BF02293745>) with ratio and interval optimal scaling, elastic scaling (McGee, 1966, <doi:10.1111/j.2044-8317.1966.tb00367.x>) with ratio and interval optimal scaling, r-stress MDS (De Leeuw, Groenen & Mair, 2016, <https://rpubs.com/deleeuw/142619>) with ratio, interval, splines and nonmetric optimal scaling, power-stress MDS (POST-MDS; Buja & Swayne, 2002, <doi:10.1007/s00357-001-0031-0>) with ratio and interval optimal scaling, restricted power-stress (Rusch, Mair & Hornik, 2021, <doi:10.1080/10618600.2020.1869027>) with ratio and interval optimal scaling, approximate power-stress with ratio optimal scaling (Rusch, Mair & Hornik, 2021, <doi:10.1080/10618600.2020.1869027>), Box-Cox MDS (Chen & Buja, 2013, <https://jmlr.org/papers/v14/chen13a.html>), local MDS (Chen & Buja, 2009, <doi:10.1198/jasa.2009.0111>), curvilinear component analysis (Demartines & Herault, 1997, <doi:10.1109/72.554199>), curvilinear distance analysis (Lee, Lendasse & Verleysen, 2004, <doi:10.1016/j.neucom.2004.01.007>), nonlinear MDS with optimal dissimilarity powers functions (De Leeuw, 2024, <https://github.com/deleeuw/smacofManual/blob/main/smacofPO(power)/smacofPO.pdf>), sparsified (power) MDS and sparsified multidimensional (power) distance analysis aka extended curvilinear (power) component analysis and extended curvilinear (power) distance analysis (Rusch, 2024, <doi:10.57938/355bf835-ddb7-42f4-8b85-129799fc240e>). Some functions are suitably flexible to allow any other sensible combination of explicit power transformations for weights, distances and input proximities with implicit ratio, interval, splines or nonmetric optimal scaling of the input proximities. Most functions use a Majorization-Minimization algorithm. Currently the methods are only available for one-mode two-way data (symmetric dissimilarity matrices).
Perform two-dimensional smoothing for spatial fields using FFT and the convolution theorem (see Gilleland 2013, <doi:10.5065/D61834G2>).
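As a rough illustration of the convolution theorem that the entry above relies on, the following Python sketch smooths a small grid by multiplying transforms instead of convolving directly. It is a minimal sketch under stated assumptions: it uses a naive discrete Fourier transform rather than a real FFT, the kernel has the same shape as the field and wraps around the edges (circular convolution), and the function names and data are hypothetical. It does not reproduce the package's interface or its handling of boundaries.

```python
import cmath


def dft(seq, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [
        sum(seq[k] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out


def dft2(grid, inverse=False):
    """2-D transform: apply the 1-D transform to rows, then to columns."""
    rows = [dft(row, inverse) for row in grid]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def smooth_fft(field, kernel):
    """Circular convolution of `field` with `kernel` via the convolution theorem."""
    f_hat, k_hat = dft2(field), dft2(kernel)
    product = [[f * k for f, k in zip(fr, kr)] for fr, kr in zip(f_hat, k_hat)]
    return [[v.real for v in row] for row in dft2(product, inverse=True)]


# Hypothetical 4x4 field with a single spike, and a 2x2 averaging kernel
# embedded in a 4x4 grid so both arrays have the same shape.
field = [[0.0] * 4 for _ in range(4)]
field[1][1] = 16.0
kernel = [[0.0] * 4 for _ in range(4)]
for a in (0, 1):
    for b in (0, 1):
        kernel[a][b] = 0.25

smoothed = smooth_fft(field, kernel)
```

Because the kernel weights sum to one, the smoothing spreads the spike over neighbouring cells while preserving the field's total. A practical implementation would use an FFT library and typically pad the field to limit wrap-around effects.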
Allows users to easily build custom Docker images <https://docs.docker.com/> from Amazon Web Services SageMaker <https://aws.amazon.com/sagemaker/> using Amazon Web Services CodeBuild <https://aws.amazon.com/codebuild/>.
This package provides a collection of methods for smoothing numerical data, commencing with a port of the MATLAB Gaussian window smoothing function. In addition, several functions typically used in smoothing of financial data are included.
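To give a sense of what Gaussian window smoothing involves, here is a minimal Python sketch of a Gaussian-weighted moving average. Several things are assumptions rather than facts about the package: the window formula w(n) = exp(-0.5 (alpha n / ((L - 1) / 2))^2) follows a commonly documented Gaussian window definition, the default `alpha=2.5` and the edge handling (truncating and renormalizing the window) are illustrative choices, and the function names are hypothetical.

```python
import math


def gaussian_window(length, alpha=2.5):
    """Gaussian weights centred on the middle of the window (assumed formula)."""
    half = (length - 1) / 2
    return [math.exp(-0.5 * (alpha * (i - half) / half) ** 2) for i in range(length)]


def smooth(values, length=5, alpha=2.5):
    """Gaussian-weighted moving average; the window is truncated at the edges."""
    if length < 3 or length % 2 == 0:
        raise ValueError("length must be an odd integer >= 3")
    weights = gaussian_window(length, alpha)
    half = length // 2
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for offset, w in zip(range(-half, half + 1), weights):
            j = i + offset
            if 0 <= j < len(values):  # skip positions outside the series
                num += w * values[j]
                den += w
        out.append(num / den)  # renormalize by the weights actually used
    return out


# Hypothetical series with a single spike.
series = [0.0, 0.0, 10.0, 0.0, 0.0]
smoothed_series = smooth(series)
```

Renormalizing by the weights actually used keeps a constant series unchanged, at the cost of less smoothing near the ends of the series.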
Mosaic diagram, scatterplot matrix, Andrews curves, parallel coordinate diagram, radar diagram, and Chernoff plots as Shiny apps, which allow the order of variables to be changed interactively. The apps are intended as teaching examples.
This package provides the filtering algorithms for the state space models on the Stiefel manifold as well as the corresponding sampling algorithms for uniform, vector Langevin-Bingham and matrix Langevin-Bingham distributions on the Stiefel manifold.
Simple class to hold the contents of a SMET file as specified in Bavay (2021) <https://code.wsl.ch/snow-models/meteoio/-/blob/master/doc/SMET_specifications.pdf>. The numerical meteorological measurements are all based on MKS (SI) units, and timestamps are standardized to UTC.
Exploratory analysis of any input data, describing the structure and the relationships present in the data. The package automatically selects the variables and computes the related descriptive statistics. Information value, weight of evidence, custom tables, summary statistics, and graphical techniques are provided for both numeric and categorical predictors.
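Weight of evidence (WoE) and information value (IV), mentioned in the entry above, can be sketched as follows. This is a minimal Python illustration under stated assumptions: it uses the common convention WoE = ln(share of goods / share of bads) per bin and IV = sum of (share of goods - share of bads) x WoE, the sign convention may differ from the package's, the bin counts are hypothetical, and every bin is assumed to contain at least one good and one bad case. The function name `woe_iv` is made up for illustration.

```python
import math


def woe_iv(bins):
    """Compute per-bin weight of evidence and the total information value.

    `bins` is a list of (n_good, n_bad) counts, one pair per bin, with all
    counts strictly positive (no smoothing for empty cells is applied).
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        p_good = good / total_good  # share of all goods falling in this bin
        p_bad = bad / total_bad     # share of all bads falling in this bin
        woe = math.log(p_good / p_bad)
        woes.append(woe)
        iv += (p_good - p_bad) * woe
    return woes, iv


# Hypothetical predictor split into two bins.
woes, iv = woe_iv([(80, 20), (20, 80)])
```

A predictor whose bins all have the same good-to-bad ratio gets a WoE of zero in every bin and an IV of zero, which is why IV is often used as a quick screen for predictive strength.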
This package provides flexible hazard ratio curves allowing non-linear relationships between continuous predictors and survival. To better understand the effects that each continuous covariate has on the outcome, results are expressed in terms of hazard ratio curves, taking a specific covariate value as reference. Confidence bands for these curves are also derived.
Introduces a fast and efficient Surrogate Variable Analysis algorithm that captures variation from unknown sources (batch effects) in high-dimensional data sets. The algorithm builds on the irwsva.build function of the sva package and proposes a revision of it that runs an order of magnitude faster with no loss of accuracy.
Fast computation of multivariate analyses of small (10s to 100s markers) to big (1000s to 100000s) genotype data. Runs Principal Component Analysis allowing for centering, z-score standardization and scaling for genetic drift, projection of ancient samples to modern genetic space and multivariate tests for differences in group location (Permutation-Based Multivariate Analysis of Variance) and dispersion (Permutation-Based Multivariate Analysis of Dispersion).
Preview spatial data as leaflet maps with minimal effort. smartmap is optimized for interactive use and distinguishes itself from similar packages because it does not need real spatial (sp or sf) objects as input; instead, it tries to automatically coerce everything that looks like spatial data to sf objects or leaflet maps. For example, it supports direct mapping of: a vector containing a single coordinate pair, a two-column matrix, a data.frame with longitude and latitude columns, or the path or URL to a (possibly compressed) shapefile.
Implementation of the SIC epsilon-telescope method, either using single or distributional (multiparameter) regression. Includes classical regression with normally distributed errors and robust regression, where the errors are from the Laplace distribution. The "smooth generalized normal distribution" is used, where the estimation of an additional shape parameter allows the user to move smoothly between both types of regression. See O'Neill and Burke (2022) "Robust Distributional Regression with Automatic Variable Selection" <arXiv:2212.07317> for more details. This package also contains the data analyses from O'Neill and Burke (2023) "Variable selection using a smooth information criterion for distributional regression models" <doi:10.1007/s11222-023-10204-8>.
Inference techniques for the Fay-Herriot model.
This package provides a set of functions to build a scoring model from beginning to end, guiding the user through an efficient and organized development process and significantly reducing the time spent on data exploration, variable selection, feature engineering, binning and model selection, among other recurrent tasks. The package also incorporates monotonic and customized binning, offers scaling capabilities that transform logistic coefficients into points for better business understanding, and calculates and visualizes classic performance metrics of a classification model.
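The idea of scaling a logistic model's output into points can be sketched with the widely used "points to double the odds" convention. This is a minimal Python illustration under stated assumptions: the parameter values (20 points to double the odds, a base score of 600 at odds of 50:1) are hypothetical, the function names are made up for illustration, and the package's own scaling options and sign conventions may differ.

```python
import math


def scaling_constants(pdo=20.0, base_score=600.0, base_odds=50.0):
    """Return (factor, offset) so that score = offset + factor * ln(odds).

    `pdo` is the number of points that doubles the odds; `base_score` is the
    score assigned at `base_odds`. All defaults are hypothetical.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


def score(log_odds, factor, offset):
    """Map a model's log-odds (the logistic linear predictor) to points."""
    return offset + factor * log_odds


factor, offset = scaling_constants()
score_at_base = score(math.log(50.0), factor, offset)
score_at_double = score(math.log(100.0), factor, offset)
```

With these constants, a case at the base odds receives the base score, and doubling the odds adds exactly `pdo` points, which is what makes the resulting scale easy to communicate.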
Data practitioners regularly use the R and Python programming languages to prepare data for analyses. Thus, they encode important data preprocessing decisions in R and Python code. The smallsets package subsequently decodes these decisions into a Smallset Timeline, a static, compact visualisation of data preprocessing decisions (Lucchesi et al. (2022) <doi:10.1145/3531146.3533175>). The visualisation consists of small data snapshots of different preprocessing steps. The smallsets package builds this visualisation from a user's dataset and preprocessing code located in an R, R Markdown, Python, or Jupyter Notebook file. Users simply add structured comments with snapshot instructions to the preprocessing code. One optional feature in smallsets requires installation of the Gurobi optimisation software and gurobi R package, available from <https://www.gurobi.com>. More information regarding the optional feature and gurobi installation can be found in the smallsets vignette.
This package provides functions used in courses taught by Dr. Small at Drew University.
Fit univariate right-, left- or interval-censored regression models under the scale mixture of normal distributions.