This package implements the SparseStep
model for solving regression problems with a sparsity constraint on the parameters. The SparseStep
regression model was proposed in Van den Burg, Groenen, and Alfons (2017) <arXiv:1701.06967>
. In the model, a regularization term is added to the regression problem which approximates the counting norm of the parameters. By iteratively improving the approximation a sparse solution to the regression problem can be obtained. In this package both the standard SparseStep
algorithm is implemented as well as a path algorithm which uses golden section search to determine solutions with different values for the regularization parameter.
This package contains efficient implementations of Discrete Optimal Transport algorithms for the computation of Kantorovich-Wasserstein distances between pairs of large spatial maps (Bassetti, Gualandi, Veneroni (2020), <doi:10.1137/19M1261195>). All the algorithms are based on an ad-hoc implementation of the Network Simplex algorithm. The package has four main helper functions: compareOneToOne()
(to compare two spatial maps), compareOneToMany()
(to compare a reference map with a list of other maps), compareAll()
(to compute a matrix of distances between a list of maps), and focusArea()
(to compute the KWD distance within a focus area). In non-convex maps, the helper functions first build the convex-hull of the input bins and pad the weights with zeros.
This package provides a graph community detection algorithm that aims to be performant on large graphs and robust, returning consistent results across runs. SpeakEasy
2 (SE2), the underlying algorithm, is described in Chris Gaiteri, David R. Connell & Faraz A. Sultan et al. (2023) <doi:10.1186/s13059-023-03062-0>. The core algorithm is written in C', providing speed and keeping the memory requirements low. This implementation can take advantage of multiple computing cores without increasing memory usage. SE2 can detect community structure across scales, making it a good choice for biological data, which often has hierarchical structure. Graphs can be passed to the algorithm as adjacency matrices using base R matrices, the Matrix library, igraph graphs, or any data that can be coerced into a matrix.
Fit latent variable models with the GEV distribution as the data likelihood and the GEV parameters following latent Gaussian processes. The models in this package are built using the template model builder TMB in R, which has the fast ability to integrate out the latent variables using Laplace approximation. This package allows the users to choose in the fit function which GEV parameter(s) is considered as a spatially varying random effect following a Gaussian process, so the users can fit spatial GEV models with different complexities to their dataset without having to write the models in TMB by themselves. This package also offers methods to sample from both fixed and random effects posteriors as well as the posterior predictive distributions at different spatial locations. Methods for fitting this class of models are described in Chen, Ramezan, and Lysy (2024) <doi:10.48550/arXiv.2110.07051>
.
This package provides a multidimensional dataset of students performance assessment in high school physics. The SPHERE dataset was collected from 497 students in four public high schools specifically measuring their conceptual understanding, scientific ability, and attitude toward physics [see Santoso et al. (2024) <doi:10.17632/88d7m2fv7p.1>]. The data collection was conducted using some research based assessments established by the physics education research community. They include the Force Concept Inventory, the Force and Motion Conceptual Evaluation, the Rotational and Rolling Motion Conceptual Survey, the Fluid Mechanics Concept Inventory, the Mechanical Waves Conceptual Survey, the Thermal Concept Evaluation, the Survey of Thermodynamic Processes and First and Second Laws, the Scientific Abilities Assessment Rubrics, and the Colorado Learning Attitudes about Science Survey. Students attributes related to gender, age, socioeconomic status, domicile, literacy, physics identity, and test results administered using teachers developed items are also reported in this dataset.
Spatial downscaling of climate data (Global Circulation Models/Regional Climate Models) using quantile-quantile bias correction technique.
This package provides a set of functions for obtaining positional parameters and magnitude difference between components of binary and multiple stellar systems from series of speckle images.
This package implements the diffusion map method of dimensionality reduction and spectral method of combining multiple diffusion maps, including creation of the spectra and visualization of maps.
Utility functions for spectroscopy. 1. Functions to simulate spectra for use in teaching or testing. 2. Functions to process files created by LoggerPro
and SpectraSuite
software.
An R data package containing setlists from all Bruce Springsteen concerts over 1973-2021. Also includes all his song details such as lyrics and albums. Data extracted from: <http://brucebase.wikidot.com/>.
Inspect interactively the spatially-resolved transcriptomics data from the 10x Genomics Visium platform as well as data from the Maynard, Collado-Torres et al, Nature Neuroscience, 2021 project analyzed by Lieber Institute for Brain Development (LIBD) researchers and collaborators.
Visualization and analysis of Vectra Immunoflourescent data. Options for calculating both the univariate and bivariate Ripley's K are included. Calculations are performed using a permutation-based approach presented by Wilson et al. <doi:10.1101/2021.04.27.21256104>.
This package provides functions for differential gene expression analysis of gene expression time-course data. Natural cubic spline regression models are used. Identified genes may further be used for pathway enrichment analysis and/or the reconstruction of time dependent gene regulatory association networks.
This package provides functions to generate or sample from all possible splits of features or variables into a number of specified groups. Also computes the best split selection estimator (for low-dimensional data) as defined in Christidis, Van Aelst and Zamar (2019) <arXiv:1812.05678>
.
This package provides sparse vectors powered by ALTREP (Alternative Representations for R Objects) that behave like regular vectors, and can thus be used in data frames. Also provides tools to convert between sparse matrices and data frames with sparse columns and functions to interact with sparse vectors.
Corrects the spelling of a given word in English using a modification of Peter Norvig's spell correct algorithm (see <http://norvig.com/spell-correct.html>) which handles up to three edits. The algorithm tries to find the spelling with maximum probability of intended correction out of all possible candidate corrections from the original word.
Making specification curve analysis easy, fast, and pretty. It improves upon existing offerings with additional features and tidyverse integration. Users can easily visualize and evaluate how their models behave under different specifications with a high degree of customization. For a description and applications of specification curve analysis see Simonsohn, Simmons, and Nelson (2020) <doi:10.1038/s41562-020-0912-z>.
This package performs estimation and testing of the treatment effect in a 2-group randomized clinical trial with a quantitative, dichotomous, or right-censored time-to-event endpoint. The method improves efficiency by leveraging baseline predictors of the endpoint. The inverse probability weighting technique of Robins, Rotnitzky, and Zhao (JASA, 1994) is used to provide unbiased estimation when the endpoint is missing at random.
Spatially-aware quality control (QC) software for both spot-level and artifact-level QC in spot-based spatial transcripomics, such as 10x Visium. These methods calculate local (nearest-neighbors) mean and variance of standard QC metrics (library size, unique genes, and mitochondrial percentage) to identify outliers spot and large technical artifacts. Scales linearly with the number of spots and is designed to be used with SpatialExperiment
objects.
SpatialCPie
is an R package designed to facilitate cluster evaluation for spatial transcriptomics data by providing intuitive visualizations that display the relationships between clusters in order to guide the user during cluster identification and other downstream applications. The package is built around a shiny "gadget" to allow the exploration of the data with multiple plots in parallel and an interactive UI. The user can easily toggle between different cluster resolutions in order to choose the most appropriate visual cues.
SpectralTAD
is an R package designed to identify Topologically Associated Domains (TADs) from Hi-C contact matrices. It uses a modified version of spectral clustering that uses a sliding window to quickly detect TADs. The function works on a range of different formats of contact matrices and returns a bed file of TAD coordinates. The method does not require users to adjust any parameters to work and gives them control over the number of hierarchical levels to be returned.
This package provides methods for spatial risk calculations. It offers an efficient approach to determine the sum of all observations within a circle of a certain radius. This might be beneficial for insurers who are required (by a recent European Commission regulation) to determine the maximum value of insured fire risk policies of all buildings that are partly or fully located within a circle of a radius of 200m. See Church (1974) <doi:10.1007/BF01942293> for a description of the problem.
Deconvolution of spatial transcriptomics data based on neural networks and single-cell RNA-seq data. SpatialDDLS
implements a workflow to create neural network models able to make accurate estimates of cell composition of spots from spatial transcriptomics data using deep learning and the meaningful information provided by single-cell RNA-seq data. See Torroja and Sanchez-Cabo (2019) <doi:10.3389/fgene.2019.00978> and Mañanes et al. (2024) <doi:10.1093/bioinformatics/btae072> to get an overview of the method and see some examples of its performance.
Selection of spatially balanced samples. In particular, the implemented sampling designs allow to select probability samples well spread over the population of interest, in any dimension and using any distance function (e.g. Euclidean distance, Manhattan distance). For more details, Pantalone F, Benedetti R, and Piersimoni F (2022) <doi:10.18637/jss.v103.c02>, Benedetti R and Piersimoni F (2017) <doi:10.1002/bimj.201600194>, and Benedetti R and Piersimoni F (2017) <arXiv:1710.09116>
. The implementation has been done in C++ through the use of Rcpp and RcppArmadillo
'.