Spatial forecast verification refers to verifying weather forecasts when the verification set (forecast and observations) is on a spatial field, usually a high-resolution gridded spatial field. Most of the functions here require the forecast and observed fields to be gridded and on the same grid. For a thorough review of most of the methods in this package, please see Gilleland et al. (2009) <doi: 10.1175/2009WAF2222269.1> and for a tutorial on some of the main functions available here, see Gilleland (2022) <doi: 10.5065/4px3-5a05>.
SpotClean is a computational method to adjust for spot swapping in spatial transcriptomics data. Recent spatial transcriptomics experiments utilize slides containing thousands of spots with spot-specific barcodes that bind mRNA. Ideally, unique molecular identifiers at a spot measure spot-specific expression, but this is often not the case due to bleed from nearby spots, an artifact we refer to as spot swapping. SpotClean estimates the contamination rate in observed data and decontaminates the spot swapping effect, thus increasing the sensitivity and precision of downstream analyses.
Some R functions, such as optim(), require a function and its gradient to be passed as separate arguments. When these are expensive to calculate, it may be much faster to calculate the function (fn) and gradient (gr) together, since they often share many calculations (chain rule). This package allows the user to pass in a single function that returns both the function and gradient, then splits (hence 'splitfngr') them so the results can be accessed separately. The functions provided allow this to be done with any number of functions/values, not just for functions and gradients.
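The splitting idea described above can be sketched in a language-agnostic way. The Python snippet below is only an illustration of the caching technique, not the package's API: `split_fn_gr` and `quadratic_with_gradient` are hypothetical names introduced here, and the example assumes the optimizer asks for the value and the gradient at the same point in succession.

```python
def split_fn_gr(fngr):
    """Split a combined function into separate value and gradient callables.

    `fngr(x)` must return a (value, gradient) pair. The most recent result
    is cached, so calling fn(x) and then gr(x) at the same x evaluates
    `fngr` only once.
    """
    cache = {"x": None, "result": None}

    def evaluate(x):
        key = tuple(x)
        if cache["x"] != key:
            cache["x"] = key
            cache["result"] = fngr(x)
        return cache["result"]

    def fn(x):
        return evaluate(x)[0]

    def gr(x):
        return evaluate(x)[1]

    return fn, gr


calls = {"count": 0}  # counts how often the expensive function really runs


def quadratic_with_gradient(x):
    calls["count"] += 1
    residuals = [xi - 3.0 for xi in x]  # work shared by value and gradient
    value = sum(r * r for r in residuals)
    gradient = [2.0 * r for r in residuals]
    return value, gradient


fn, gr = split_fn_gr(quadratic_with_gradient)
point = [1.0, 5.0]
value = fn(point)      # triggers the single combined evaluation
gradient = gr(point)   # served from the cache
```

Only the last evaluation point is cached here, which is usually enough when an optimizer requests the value and gradient back to back; a real implementation might need a different caching policy.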
This package provides an interface to the Spectator Earth API <https://api.spectator.earth/>, mainly for obtaining the acquisition plans and satellite overpasses for Sentinel-1, Sentinel-2, Landsat-8 and Landsat-9 satellites. Current position and trajectory can also be obtained for a much larger set of satellites. It is also possible to search the archive for available images over the area of interest for a given (past) period, get the URL links to download the whole image tiles, or alternatively download the image for just the area of interest based on selected spectral bands.
This package implements the algorithm described in Barron, M., and Li, J. (Not yet published). This algorithm clusters samples from multiple ordered populations, links the clusters across the conditions and identifies marker genes for these changes. The package was designed for scRNA-Seq data but is also applicable to many other data types; simply replace cells with samples and genes with variables. The package also contains functions for estimating the parameters for SparseMDC as outlined in the paper. We recommend that users further select their marker genes using the magnitude of the cluster centers.
This package creates a wrapper for the SuiteSparse routines that execute the Takahashi equations. These equations compute the elements of the inverse of a sparse matrix at locations where its Cholesky factor is structurally non-zero. The resulting matrix is known as a sparse inverse subset. Some helper functions are also implemented. Support for spam matrices is currently limited and will be implemented in the future. See Rue and Martino (2007) <doi:10.1016/j.jspi.2006.07.016> and Zammit-Mangion and Rougier (2018) <doi:10.1016/j.csda.2018.02.001> for the application of these equations to statistics.
Hail is an open-source, general-purpose, Python-based data analysis tool with additional data types and methods for working with genomic data, see <https://hail.is/>. Hail is built to scale and has first-class support for multi-dimensional structured data, like the genomic data in a genome-wide association study (GWAS). Hail is exposed as a Python library, using primitives for distributed queries and linear algebra implemented in Scala, Spark, and increasingly C++. sparkhail is an R extension built on the sparklyr package. The idea is to help R users use Hail functionality with the well-known tidyverse syntax, see <https://www.tidyverse.org/>.
This package provides an elastic net penalized maximum likelihood estimator for structural equation models (SEM). The package implements `lasso` and `elastic net` (l1/l2) penalized SEM and estimates the model parameters with an efficient block coordinate ascent algorithm that maximizes the penalized likelihood of the SEM. Hyperparameters are inferred from cross-validation (CV). A Stability Selection (STS) function is also available to provide accurate causal effect selection. The software achieves high accuracy through a `Network Generative Pre-trained Transformer` (Network GPT) Framework with two steps: 1) pre-train the model to generate a complete (fully connected) graph; and 2) use the complete graph as the initial state to fit the `elastic net` penalized SEM.
Sparse principal component analysis (SPCA) attempts to find sparse weight vectors (loadings), i.e., weight vectors with only a few active (nonzero) values. This approach provides better interpretability for the principal components in high-dimensional data settings, because the principal components are formed as a linear combination of only a few of the original variables. This package provides efficient routines to compute SPCA. Specifically, a variable projection solver is used to compute the sparse solution. In addition, a fast randomized accelerated SPCA routine and a robust SPCA routine are provided. Robust SPCA can capture grossly corrupted entries in the data. The methods are discussed in detail by N. Benjamin Erichson et al. (2018) <arXiv:1804.00341>.
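To illustrate what "only a few active (nonzero) values" means in practice, here is a minimal Python sketch of soft-thresholding, a standard way in which an l1 penalty drives small weights to exactly zero. It is an assumption that the package's solver uses this operator in this form; the function name `soft_threshold` and the numbers below are purely illustrative.

```python
import math


def soft_threshold(weights, penalty):
    """Shrink each weight toward zero by `penalty`; small weights become 0."""
    shrunk = []
    for w in weights:
        magnitude = abs(w) - penalty
        if magnitude <= 0.0:
            shrunk.append(0.0)  # weight is inactive after shrinkage
        else:
            shrunk.append(math.copysign(magnitude, w))  # keep the sign
    return shrunk


dense = [0.8, -0.05, 0.3, -1.2, 0.1]   # hypothetical dense loading vector
sparse = soft_threshold(dense, 0.25)   # hypothetical penalty level
active = [i for i, w in enumerate(sparse) if w != 0.0]
print(active)  # indices of the variables that still contribute
```

A larger penalty leaves fewer active variables, which is the interpretability trade-off the description refers to.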
The analysis and visualization of alternative splicing (AS) events from RNA sequencing data remains challenging. SpliceWiz is a user-friendly and performance-optimized R package for AS analysis. It processes alignment BAM files to quantify read counts across splice junctions, performs IRFinder-based intron retention quantitation, and supports novel splicing event identification. We introduce a novel visualization for AS using normalized coverage, thereby allowing visualization of differential AS across conditions. SpliceWiz features a shiny-based GUI facilitating interactive data exploration of results including gene ontology enrichment. It is performance optimized with multi-threaded processing of BAM files and a new COV file format for fast recall of sequencing coverage. Overall, SpliceWiz streamlines AS analysis, enabling reliable identification of functionally relevant AS events for further characterization.
`SPOTlight` provides a method to deconvolute spatial transcriptomics spots using a seeded NMF approach along with visualization tools to assess the results. Spatially resolved gene expression profiles are key to understanding tissue organization and function. However, novel spatial transcriptomics (ST) profiling techniques lack single-cell resolution and require a combination with single-cell RNA sequencing (scRNA-seq) information to deconvolute the spatially indexed datasets. Leveraging the strengths of both data types, we developed SPOTlight, a computational tool that enables the integration of ST with scRNA-seq data to infer the location of cell types and states within a complex tissue. SPOTlight is centered around a seeded non-negative matrix factorization (NMF) regression, initialized using cell-type marker genes and non-negative least squares (NNLS) to subsequently deconvolute ST capture locations (spots).
Implementation of various estimation methods for dynamic factor models (DFMs) including principal components analysis (PCA) Stock and Watson (2002) <doi:10.1198/016214502388618960>, 2Stage Giannone et al. (2008) <doi:10.1016/j.jmoneco.2008.05.010>, expectation-maximisation (EM) Banbura and Modugno (2014) <doi:10.1002/jae.2306>, and the novel EM-sparse approach for sparse DFMs Mosley et al. (2023) <arXiv:2303.11892>. Options to use classic multivariate Kalman filter and smoother (KFS) equations from Shumway and Stoffer (1982) <doi:10.1111/j.1467-9892.1982.tb00349.x> or fast univariate KFS equations from Koopman and Durbin (2000) <doi:10.1111/1467-9892.00186>, and options for independent and identically distributed (IID) white noise or auto-regressive (AR(1)) idiosyncratic errors. Algorithms coded in C++ and linked to R via RcppArmadillo.
Social network analysis is becoming commonplace in many social science disciplines, but access to useful network data, especially among marginalized populations, remains a formidable challenge. This package mitigates that problem by providing tools to simulate spatial Bernoulli networks as proposed in Carter T. Butts (2002, ISBN:978-0-493-72676-2), "Spatial models of large-scale interpersonal networks." Using this package, network analysts can simulate a spatial point process or sequence with a given number of nodes inside a geographical boundary and estimate the probability of tie formation between all node pairs. When simulating a network, an analyst can choose between five spatial interaction functions. The package also enables quick comparison of summary statistics for simulated networks and provides simple-to-use plotting methods for its classes that return plots which can be further refined with the ggplot2 package.
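The simulation workflow described above (place nodes in space, turn each pairwise distance into a tie probability, then draw independent ties) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the package's interface: the unit-square boundary, the function names, and the attenuated power-law form with its parameter values are all chosen here for illustration only.

```python
import math
import random


def attenuated_power_law(distance, base_prob=0.9, scale=5.0, exponent=2.0):
    """Assumed spatial interaction function: tie probability decays with distance."""
    return base_prob / (1.0 + scale * distance) ** exponent


def simulate_spatial_bernoulli(n_nodes, seed=0):
    """Place nodes uniformly in the unit square and draw independent ties."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n_nodes)]
    edges = []
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            distance = math.dist(points[i], points[j])
            # One Bernoulli draw per node pair, with distance-dependent probability
            if rng.random() < attenuated_power_law(distance):
                edges.append((i, j))
    return points, edges


points, edges = simulate_spatial_bernoulli(20, seed=1)
print(len(points), "nodes,", len(edges), "ties")
```

Swapping in a different decay function changes how strongly the network favors short-range ties, which is the role the choice of spatial interaction function plays.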
Many complex diseases are known to be affected by the interactions between genetic variants and environmental exposures beyond the main genetic and environmental effects. Existing Bayesian methods for gene-environment (G×E) interaction studies are challenged by the high-dimensional nature of the study and the complexity of environmental influences. We have developed a novel and powerful semi-parametric Bayesian variable selection method that can accommodate linear and nonlinear G×E interactions simultaneously (Ren et al. (2020) <doi:10.1002/sim.8434>). Furthermore, the proposed method can conduct structural identification by distinguishing nonlinear interactions from the main-effects-only case within a Bayesian framework. Spike-and-slab priors are incorporated on both the individual and group levels to shrink coefficients corresponding to irrelevant main and interaction effects to exactly zero. The Markov chain Monte Carlo algorithms of the proposed and alternative methods are efficiently implemented in C++.
This package performs non-parametric tests of parametric specifications. Five tests are available. Specific bandwidth and kernel methods can be chosen along with many other options. Allows parallel computing to quickly compute p-values based on the bootstrap. Methods implemented in the package are H.J. Bierens (1982) <doi:10.1016/0304-4076(82)90105-1>, J.C. Escanciano (2006) <doi:10.1017/S0266466606060506>, P.L. Gozalo (1997) <doi:10.1016/S0304-4076(97)86571-2>, P. Lavergne and V. Patilea (2008) <doi:10.1016/j.jeconom.2007.08.014>, P. Lavergne and V. Patilea (2012) <doi:10.1198/jbes.2011.07152>, J.H. Stock and M.W. Watson (2006) <doi:10.1111/j.1538-4616.2007.00014.x>, C.F.J. Wu (1986) <doi:10.1214/aos/1176350142>, J. Yin, Z. Geng, R. Li, H. Wang (2010) <https://www.jstor.org/stable/24309002> and J.X. Zheng (1996) <doi:10.1016/0304-4076(95)01760-7>.
Perform variable selection for the spatial Poisson regression model under the adaptive elastic net penalty. Spatial count data with covariates is the input. We use a spatial Poisson regression model to link the spatial counts and covariates. For maximization of the likelihood under adaptive elastic net penalty, we implemented the penalized quasi-likelihood (PQL) and the approximate penalized loglikelihood (APL) methods. The proposed methods can automatically select important covariates, while adjusting for possible spatial correlations among the responses. More details are available in Xie et al. (2018, <arXiv:1809.06418>). The package also contains the Lyme disease dataset, which consists of the disease case data from 2006 to 2011, and demographic data and land cover data in Virginia. The Lyme disease case data were collected by the Virginia Department of Health. The demographic data (e.g., population density, median income, and average age) are from the 2010 census. Land cover data were obtained from the Multi-Resolution Land Cover Consortium for 2006.
Automatic generation and selection of spatial predictors for spatial regression with Random Forest. Spatial predictors are surrogates of variables driving the spatial structure of a response variable. The package offers two methods to generate spatial predictors from a distance matrix among training cases: 1) Moran's Eigenvector Maps (MEMs; Dray, Legendre, and Peres-Neto 2006 <DOI:10.1016/j.ecolmodel.2006.02.015>): computed as the eigenvectors of a weighted matrix of distances; 2) RFsp (Hengl et al. <DOI:10.7717/peerj.5518>): columns of the distance matrix used as spatial predictors. Spatial predictors help minimize the spatial autocorrelation of the model residuals and facilitate an honest assessment of the importance scores of the non-spatial predictors. Additionally, functions to reduce multicollinearity, identify relevant variable interactions, tune random forest hyperparameters, assess model transferability via spatial cross-validation, and explore model results via partial dependence curves and interaction surfaces are included in the package. The modelling functions are built around the highly efficient ranger package (Wright and Ziegler 2017 <DOI:10.18637/jss.v077.i01>).
This package provides methods and data for cluster detection and disease mapping.
Various functions for creating spherical coordinate system plots via extensions to rgl.
Graphs (or networks) and graph component calculations for spatial locations in 1D, 2D, 3D etc.
Estimation of functional linear mixed models for irregularly or sparsely sampled data based on functional principal component analysis.
Stores and eases the manipulation of spectra and associated data, with dedicated classes for spatial and soil-related data.
This package implements the method of general semiparametric maximum likelihood estimation for logistic models in case-mother control-mother designs.
SparseGrid is a package to create sparse grids for numerical integration, based on code from www.sparse-grids.de.