Introduces novel, accurate hybrid methods that combine geostatistical and machine learning approaches for spatial predictive modelling. The package contains two commonly used geostatistical methods, two machine learning methods, four hybrid methods and two averaging methods. For each method, two functions are provided: one assesses the predictive errors and accuracy of the method based on cross-validation; the other generates spatial predictions using the method. For details please see: Li, J., Potter, A., Huang, Z., Daniell, J. J. and Heap, A. (2010) <https://ecat.ga.gov.au/geonetwork/srv/eng/catalog.search#/metadata/71407>; Li, J., Heap, A. D., Potter, A., Huang, Z. and Daniell, J. (2011) <doi:10.1016/j.csr.2011.05.015>; Li, J., Heap, A. D., Potter, A. and Daniell, J. (2011) <doi:10.1016/j.envsoft.2011.07.004>; Li, J., Potter, A., Huang, Z. and Heap, A. (2012) <https://ecat.ga.gov.au/geonetwork/srv/eng/catalog.search#/metadata/74030>.
This package provides tools implementing an automated version of the graphic double integration technique (GDI) for volume estimation, along with other related utilities for paleontological image analysis. GDI was first employed by Jerison (1973) <ISBN:9780323141086> and Hurlburt (1999) <doi:10.1080/02724634.1999.10011145> and is primarily used for volume or mass estimation of (extinct) animals. The gdi package aims to make this technique as convenient and versatile as possible. Its core functions provide utilities for automatically measuring diameters from digital silhouettes provided as image files and for calculating volume via graphic double integration with simple elliptical, superelliptical (following Motani 2001 <doi:10.1666/0094-8373(2001)027%3C0735:EBMFST%3E2.0.CO;2>) or complex cross-sectional models. Additionally, the package provides functions for estimating the center of mass position (COM), the moment of inertia (I) of 3D shapes and the second moments of area (Ix, Iy, Iz) of 2D cross-sections, as well as for visualizing results.
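As a rough illustration of the elliptical variant of graphic double integration, a self-contained base-R sketch (not the gdi package's own interface):

    # Treat the body as a stack of elliptical slices of equal length; d_lat and
    # d_dors are the paired lateral and dorsal diameters measured per slice.
    gdi_elliptical <- function(d_lat, d_dors, slice_length) {
      stopifnot(length(d_lat) == length(d_dors))
      sum(pi * (d_lat / 2) * (d_dors / 2) * slice_length)
    }
    # e.g. 100 slices, each 1 length unit long:
    # gdi_elliptical(runif(100, 8, 12), runif(100, 6, 10), slice_length = 1)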
We provide tools to estimate the individualized interval-valued dose rule (I2DR) that maximizes the expected beneficial clinical outcome for each individual and returns an optimal interval-valued dose, using the jump Q-learning (JQL) method. Jump Q-learning directly models the conditional mean of the response given the dose level and the baseline covariates via jump penalized least squares regression within the Q-learning framework. We develop a dynamic-programming search algorithm to find the optimal I2DR with time complexity O(n^2) and space complexity O(n). To alleviate the effects of misspecification of the Q-function, a residual jump Q-learning method is further proposed to estimate the optimal I2DR. The output includes the best partition of the entire dosage range of interest, the regression coefficients of each partition, the value function under the estimated I2DR, and a Wald-type confidence interval for the value function constructed via the bootstrap.
The main goal of this package is to draw the membership function of the fuzzy p-value, defined as a fuzzy set on the unit interval, for the following three problems: (1) testing crisp hypotheses based on fuzzy data, see Filzmoser and Viertl (2004) <doi:10.1007/s001840300269>; (2) testing fuzzy hypotheses based on crisp data, see Parchami et al. (2010) <doi:10.1007/s00362-008-0133-4>; and (3) testing fuzzy hypotheses based on fuzzy data, see Parchami et al. (2012) <doi:10.1007/s00362-010-0353-2>. In all cases, the fuzziness of the data and/or the fuzziness of the boundary of the null fuzzy hypothesis is propagated through the p-value function and produces a fuzzy p-value. When the p-value is fuzzy, it is more appropriate to consider a fuzzy significance level for the problem. Therefore, the comparison of the fuzzy p-value with the fuzzy significance level is carried out by a fuzzy ranking method in this package.
This package provides methods and tools for (pre-)processing of metabolomics datasets (i.e. peak matrices), including filtering, normalisation, missing value imputation, scaling, and signal drift and batch effect correction. Filtering methods are based on the fraction of missing values (across samples or features), the Relative Standard Deviation (RSD) calculated from the Quality Control (QC) samples, and the blank samples. Normalisation methods include Probabilistic Quotient Normalisation (PQN) and normalisation to total signal intensity. A unified user interface to several commonly used missing value imputation algorithms is also provided. Supported methods are: k-nearest neighbours (knn), random forests (rf), Bayesian PCA missing value estimator (bpca), mean or median value of the given feature, and a constant small value. The generalised logarithm (glog) transformation algorithm is available to stabilise the variance across low- and high-intensity mass spectral features. Finally, this package provides an implementation of the Quality Control-Robust Spline Correction (QCRSC) algorithm for signal drift and batch effect correction of mass spectrometry-based datasets.
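For instance, probabilistic quotient normalisation can be sketched in a few lines of base R (an illustration of the method only, not this package's interface):

    # rows = features, columns = samples; the reference spectrum defaults to the
    # feature-wise median (in practice it would typically come from QC samples).
    pqn <- function(mat, ref = apply(mat, 1, median, na.rm = TRUE)) {
      quotients <- sweep(mat, 1, ref, "/")                  # feature-wise quotients
      factors <- apply(quotients, 2, median, na.rm = TRUE)  # per-sample dilution factor
      sweep(mat, 2, factors, "/")                           # rescale each sample
    }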
Incorporating node-level covariates in community detection has gained increasing attention in recent years. This package provides a function implementing the novel community detection algorithm Network-Adjusted Covariates for Community Detection (NAC), which is designed to detect latent community structure in graphs with node-level information, i.e., covariates. The algorithm can handle models such as the degree-corrected stochastic block model (DCSBM) with covariates. NAC specifically addresses the discrepancy between the community structure inferred from the adjacency information and the community structure inferred from the covariate information. For more details, please refer to the reference paper: Yaofang Hu and Wanjie Wang (2023) <arXiv:2306.15616>. In addition to NAC, this package includes several other existing community detection algorithms that are compared to NAC in the reference paper: Spectral Clustering On Ratios-of-Eigenvectors (SCORE), network-based regularized spectral clustering (Net-based), covariate-based spectral clustering (Cov-based), covariate-assisted spectral clustering (CAclustering) and semidefinite programming (SDP).
Mortality rates are typically provided in an abridged format, i.e., by age groups 0, [1, 5], [5, 10], [10, 15], and so on. Some applications require a detailed (single-year) age description. Despite the large number of approaches proposed in the literature, only a few methods perform well at both younger and older ages. For example, the 6-term Lagrange interpolation function is well suited to mortality interpolation at younger ages (with irregular intervals), but not at older ages. The Karup-King method, on the other hand, performs well at older ages but is not suitable for younger ones. Interested readers can find a full discussion of the two methods in the book by Shryock, Siegel, and Associates (1993). The Q2q package combines the two methods to allow for the interpolation of mortality rates across all age groups. It begins by applying each method independently, and the resulting curves are then linked using an error averaged over five ages between the two partial curves.
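For reference, the 6-term Lagrange interpolation underlying the young-age part of the procedure can be written in a few lines of R (an illustration of the formula, not the Q2q interface):

    # Evaluate the Lagrange polynomial through six (age, value) knots at age x.
    lagrange6 <- function(x, ages, values) {
      stopifnot(length(ages) == 6, length(values) == 6)
      sum(sapply(seq_along(ages), function(i) {
        others <- ages[-i]
        values[i] * prod((x - others) / (ages[i] - others))
      }))
    }
    # e.g. a single-age value at age 3 from abridged knots at ages 0, 1, 5, 10, 15, 20:
    # lagrange6(3, ages = c(0, 1, 5, 10, 15, 20),
    #           values = c(0.040, 0.010, 0.004, 0.003, 0.004, 0.006))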
Implementations of the kernel measure of multi-sample dissimilarity (KMD) between several samples using K-nearest neighbor graphs and minimum spanning trees. The KMD measures the dissimilarity between multiple samples, based on observations from them. It converges to a population quantity (depending on the kernel) that lies between 0 and 1. A small value indicates that the samples come from the same distribution, and a large value indicates that the corresponding distributions are different. The population quantity is 0 if and only if all distributions are the same, and 1 if and only if all distributions are mutually singular. The package also implements tests based on KMD of H0: the M distributions are equal against H1: not all the distributions are equal. Both a permutation test and an asymptotic test are available. These tests are consistent against all alternatives where at least two samples have different distributions. For more details on KMD and the associated tests, see Huang, Z. and B. Sen (2022) <arXiv:2210.00634>.
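A hedged usage sketch; the function names KMD() and KMD_test() and their arguments reflect my reading of the package and should be checked against its documentation:

    # library(KMD)
    # pooled observations X with sample labels Y (M = 2 samples here):
    # X <- rbind(matrix(rnorm(100), ncol = 2), matrix(rnorm(100, mean = 1), ncol = 2))
    # Y <- rep(1:2, each = 50)
    # KMD(X, Y, M = 2, Knn = 1)        # estimated dissimilarity, between 0 and 1
    # KMD_test(X, Y, M = 2, Knn = 1)   # test of H0: all distributions are equal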
The kernelSmoothing() function allows you to aggregate geolocated data onto a square grid and smooth them. It calculates either a classical kernel smoothing (conservative) or a geographically weighted median. The function has four major call modes. The first, kernelSmoothing(obs, epsg, cellsize, bandwidth), performs classical kernel smoothing on an automatic grid. The second, kernelSmoothing(obs, epsg, cellsize, bandwidth, quantiles), computes a geographically weighted median on an automatic grid. The third, kernelSmoothing(obs, epsg, cellsize, bandwidth, centroids), performs classical kernel smoothing on a user-supplied grid. The fourth, kernelSmoothing(obs, epsg, cellsize, bandwidth, quantiles, centroids), computes a geographically weighted median on a user-supplied grid. References: Brunsdon, C. et al. (2002), Geographically weighted summary statistics: a framework for localised exploratory data analysis, Computers, Environment and Urban Systems <doi:10.1016/S0198-9715(01)00009-6>; Diggle (2003), Statistical Analysis of Spatial and Spatio-Temporal Point Patterns, Third Edition, pp. 83-86 <doi:10.1080/13658816.2014.937718>.
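A minimal sketch following the call modes listed above; the argument names come from that description and the exact types expected by the package (e.g. epsg as a character code) are assumptions:

    obs <- data.frame(x = runif(500, 0, 5000),
                      y = runif(500, 0, 5000),
                      income = rexp(500, 1 / 20000))
    # classical kernel smoothing, automatic grid:
    # res <- kernelSmoothing(obs, epsg = "2154", cellsize = 200, bandwidth = 400)
    # geographically weighted median, automatic grid:
    # med <- kernelSmoothing(obs, epsg = "2154", cellsize = 200, bandwidth = 400, quantiles = 0.5)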
Developed for computing the probability density function, the cumulative distribution function and the quantile function, for random generation, for drawing Q-Q plots, and for estimating the parameters of 24 G-families of statistical distributions via the maximum product spacing approach introduced in <https://www.jstor.org/stable/2345411>. The set of families contains: beta G distribution, beta exponential G distribution, beta extended G distribution, exponentiated G distribution, exponentiated exponential Poisson G distribution, exponentiated generalized G distribution, exponentiated Kumaraswamy G distribution, gamma type I G distribution, gamma type II G distribution, gamma uniform G distribution, gamma-X generated of log-logistic family of G distribution, gamma-X family of modified beta exponential G distribution, geometric exponential Poisson G distribution, generalized beta G distribution, generalized transmuted G distribution, Kumaraswamy G distribution, log gamma type I G distribution, log gamma type II G distribution, Marshall Olkin G distribution, Marshall Olkin Kumaraswamy G distribution, modified beta G distribution, odd log-logistic G distribution, truncated-exponential skew-symmetric G distribution, and Weibull G distribution.
Calculate the magnetic field at a given location and time according to the World Magnetic Model (WMM). Both the main field and secular variation components are returned. This functionality is useful for physicists and geophysicists who need orthogonal components from WMM. Currently, this package supports annualized time inputs between 2000 and 2025. If desired, users can specify which WMM version to use, e.g., the original WMM2015 release or the recent out-of-cycle WMM2015 release. Methods used to implement WMM, including the Gauss coefficients for each release, are described in the following publications: Chulliat et al (2020) <doi:10.25923/ytk1-yx35>, Chulliat et al (2019) <doi:10.25921/xhr3-0t19>, Chulliat et al (2015) <doi:10.7289/V5TB14V7>, Maus et al (2010) <https://www.ngdc.noaa.gov/geomag/WMM/data/WMMReports/WMM2010_Report.pdf>, McLean et al (2004) <https://www.ngdc.noaa.gov/geomag/WMM/data/WMMReports/TRWMM_2005.pdf>, and Macmillan et al (2000) <https://www.ngdc.noaa.gov/geomag/WMM/data/WMMReports/wmm2000.pdf>.
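A hedged usage sketch; the function name GetMagneticFieldWMM() and its arguments are my recollection of the package interface and should be verified against the documentation:

    # library(wmm)
    # field at longitude 240, latitude -80, 100 km above the ellipsoid, mid-2022:
    # GetMagneticFieldWMM(lon = 240, lat = -80, height = 1e5,
    #                     time = 2022.5, wmmVersion = "WMM2020")
    # returns the orthogonal main-field components plus the secular variation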
An implementation of the Similarity-First Search algorithm (SFS), a combinatorial algorithm which can be used to solve the seriation problem and to recognize some structured weighted graphs. The SFS algorithm is a generalization to weighted graphs of the graph search algorithm Lexicographic Breadth-First Search (Lex-BFS), a variant of Breadth-First Search. SFS reduces to Lex-BFS when applied to binary matrices (or, equivalently, unweighted graphs). Hence this library can also be used for Lex-BFS applications such as recognition of graph classes like chordal or unit interval graphs. In fact, the SFS seriation algorithm implemented in this package is a multisweep algorithm, which consists of repeating a finite number of SFS iterations (at most n sweeps for a matrix of size n). If the data matrix has a Robinsonian structure, then the ranking returned by the multisweep SFS algorithm is a Robinson ordering of the input matrix. Otherwise the algorithm can be used as a heuristic to return a ranking that partially satisfies the Robinson property.
This package provides a toolbox to derive flexible cutoffs for fit indices in Covariance-based Structural Equation Modeling, based on the paper by Niemand & Mai (2018) <doi:10.1007/s11747-018-0602-9>. Flexible cutoffs are an alternative to fixed cutoffs - rules of thumb - for judging fit indices such as CFI or SRMR. It has been demonstrated that these flexible cutoffs perform better than fixed cutoffs in grey areas where misspecification is not easy to detect. The package provides an alternative to the tool at <https://flexiblecutoffs.org>, as it allows tailoring flexible cutoffs to a given dataset and model, which is so far not available in the tool. The package simulates fit indices based on a given dataset and model and then estimates the flexible cutoffs. Some useful functions, e.g., to determine the GoF or BoF nature of a fit index, are provided. So far, additional options for relative use (is one model better than another?) are provided in an exploratory manner.
This package provides a framework for analysing state sequences with probabilistic suffix trees (PST), the construction that stores variable length Markov chains (VLMC). Besides functions for learning and optimizing VLMC models, the PST library includes many additional tools for analysing sequence data with these models: visualization tools, functions for sequence prediction and artificial sequence generation, as well as for context and pattern mining. The package is specifically adapted to the field of social sciences, as it allows VLMC models to be learned from sets of individual sequences possibly containing missing values, and accounts for case weights. The library also allows computing the probabilistic divergence between two models, and fitting segmented VLMC, where sub-models fitted to distinct strata of the learning sample are stored in a single PST. This software results from research work carried out within the framework of the Swiss National Centre of Competence in Research LIVES, which is financed by the Swiss National Science Foundation. The authors are grateful to the Swiss National Science Foundation for its financial support.
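A hedged usage sketch; the pstree()/prune() workflow and argument names follow my reading of the package and TraMineR conventions, so treat them as assumptions:

    # library(TraMineR); library(PST)
    # seqs <- seqdef(mvad[, 17:86])              # monthly activity sequences (TraMineR example data)
    # pst  <- pstree(seqs, L = 5)                # learn a PST of maximal depth 5
    # pst  <- prune(pst, gain = "G2", C = 1.20)  # prune to a parsimonious VLMC
    # predict(pst, seqs[1:5, ])                  # sequence likelihoods under the fitted model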
Basic infrastructure and several algorithms for the 1d-4d bin packing problem. This package provides a set of C-level classes and solvers for the 1d-4d bin packing problem, and an R-level solver for the 4d bin packing problem, which is a wrapper over the C-level 4d bin packing problem solver. The 4d bin packing problem solver aims to solve the bin packing problem, a.k.a. the container loading problem, with an additional constraint on weight. Given a set of rectangular-shaped items and a set of rectangular-shaped bins with weight limits, the solver looks for an orthogonal packing solution that minimizes the number of bins and maximizes volume utilization. Each rectangular-shaped item i = 1, .., n is characterized by length l_i, depth d_i, height h_i, and weight w_i, and each rectangular-shaped bin j = 1, .., m is specified similarly by length l_j, depth d_j, height h_j, and weight limit w_j. Items can be rotated into any orthogonal direction, and no further restrictions are implied.
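To make the problem data concrete, a small sketch of the item and bin specification described above (plain data frames; the solver's actual input format and interface are not shown here and would be assumptions):

    # One row per item: length l, depth d, height h, weight w.
    items <- data.frame(id = 1:4,
                        l = c(40, 30, 30, 20),
                        d = c(30, 30, 20, 20),
                        h = c(20, 20, 20, 10),
                        w = c(8, 6, 5, 2))
    # One row per bin: dimensions plus weight limit w.
    bins <- data.frame(id = 1, l = 60, d = 40, h = 40, w = 25)
    # A feasible packing places items orthogonally (rotations allowed) within the
    # bin dimensions without exceeding the weight limit, minimizing the number of
    # bins used and maximizing volume utilization.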
This package provides efficient Markov chain Monte Carlo (MCMC) algorithms for dynamic shrinkage processes, which extend global-local shrinkage priors to the time series setting by allowing shrinkage to depend on its own past. These priors yield locally adaptive estimates, useful for time series and regression functions with irregular features. The package includes full MCMC implementations for trend filtering using dynamic shrinkage on signal differences, producing locally constant or linear fits with adaptive credible bands. Also included are models with static shrinkage and normal-inverse-Gamma priors for comparison. Additional tools cover dynamic regression with time-varying coefficients and B-spline models with shrinkage on basis differences, allowing for flexible curve-fitting with unequally spaced data. Some support for heteroscedastic errors, outlier detection, and change point estimation is also included. Methods in this package are described in Kowal et al. (2019) <doi:10.1111/rssb.12325>, Wu et al. (2024) <doi:10.1080/07350015.2024.2362269>, Schafer and Matteson (2024) <doi:10.1080/00401706.2024.2407316>, and Cho and Matteson (2024) <doi:10.48550/arXiv.2408.11315>.
This package provides propensity score weighting methods to control for confounding in causal inference with dichotomous treatments and continuous/binary outcomes. It includes the following functional modules: (1) visualization of the propensity score distribution in both treatment groups with a mirror histogram, (2) covariate balance diagnosis, (3) propensity score model specification test, (4) weighted estimation of the treatment effect, and (5) augmented estimation of the treatment effect with outcome regression. The weighting methods include the inverse probability weight (IPW) for estimating the average treatment effect (ATE), the IPW for the average treatment effect of the treated (ATT), the IPW for the average treatment effect of the controls (ATC), the matching weight (MW), the overlap weight (OVERLAP), and the trapezoidal weight (TRAPEZOIDAL). Sandwich variance estimation is provided to adjust for the sampling variability of the estimated propensity score. These methods are discussed by Hirano et al (2003) <DOI:10.1111/1468-0262.00442>, Lunceford and Davidian (2004) <DOI:10.1002/sim.1903>, Li and Greene (2013) <DOI:10.1515/ijb-2012-0030>, and Li et al (2016) <DOI:10.1080/01621459.2016.1260466>.
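The weight definitions above can be illustrated with a short base-R helper (an illustration of the standard formulas, not this package's interface); ps denotes the estimated propensity score P(Z = 1 | X) and Z the binary treatment indicator:

    # e.g. ps <- fitted(glm(Z ~ X1 + X2, family = binomial, data = dat))
    make_weights <- function(Z, ps, type = c("ATE", "ATT", "ATC", "MW", "OVERLAP")) {
      type <- match.arg(type)
      switch(type,
        ATE     = Z / ps + (1 - Z) / (1 - ps),
        ATT     = Z + (1 - Z) * ps / (1 - ps),
        ATC     = Z * (1 - ps) / ps + (1 - Z),
        MW      = pmin(ps, 1 - ps) / (Z * ps + (1 - Z) * (1 - ps)),
        OVERLAP = Z * (1 - ps) + (1 - Z) * ps)
    }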
Bayes factors represent the ratio of probabilities assigned to data by competing scientific hypotheses. However, one drawback of Bayes factors is their dependence on prior specifications that define null and alternative hypotheses. Additionally, there are challenges in their computation. To address these issues, we define Bayes factor functions (BFFs) directly from common test statistics. BFFs express Bayes factors as a function of the prior densities used to define the alternative hypotheses. These prior densities are centered on standardized effects, which serve as indices for the BFF. Therefore, BFFs offer a summary of evidence in favor of alternative hypotheses that correspond to a range of scientifically interesting effect sizes. Such summaries remove the need for arbitrary thresholds to determine "statistical significance." BFFs are available in closed form and can be easily computed from z, t, chi-squared, and F statistics. They depend on hyperparameters "r" and "tau^2", which determine the shape and scale of the prior distributions defining the alternative hypotheses. Plots of BFFs versus effect size provide informative summaries of hypothesis tests that can be easily aggregated across studies.
This package provides statistical tests and algorithms for the detection of change points in time series and point processes - particularly for changes in the mean in time series and for changes in the rate and in the variance in point processes. References - Michael Messer, Marietta Kirchner, Julia Schiemann, Jochen Roeper, Ralph Neininger and Gaby Schneider (2014), A multiple filter test for the detection of rate changes in renewal processes with varying variance <doi:10.1214/14-AOAS782>. Stefan Albert, Michael Messer, Julia Schiemann, Jochen Roeper, Gaby Schneider (2017), Multi-scale detection of variance changes in renewal processes in the presence of rate change points <doi:10.1111/jtsa.12254>. Michael Messer, Kaue M. Costa, Jochen Roeper and Gaby Schneider (2017), Multi-scale detection of rate changes in spike trains with weak dependencies <doi:10.1007/s10827-016-0635-3>. Michael Messer, Stefan Albert and Gaby Schneider (2018), The multiple filter test for change point detection in time series <doi:10.1007/s00184-018-0672-1>. Michael Messer, Hendrik Backhaus, Albrecht Stroh and Gaby Schneider (2019+) Peak detection in time series.
Artificial selection through selective breeding is an efficient way to induce changes in traits of interest in experimental populations. This package (sra) provides a set of tools to analyse artificial-selection response datasets. The data typically record, for several generations, the average value of a trait in a population, the variance of the trait, the population size, and the average value of the parents that were chosen to breed. sra implements two families of models aimed at describing the dynamics of the genetic architecture of the trait during the selection response. The first family relies on purely descriptive (phenomenological) models, based on an autoregressive framework. The second family provides different mechanistic models, accounting e.g. for inbreeding, mutations, genetic and environmental canalization, or epistasis. The parameters underlying the dynamics of the time series are estimated by maximum likelihood. The sra package thus provides (i) a wrapper for the R functions mle() and optim() for conveniently fitting a predetermined set of models, and (ii) functions to plot and analyse the output of the models.
Fits the Bayesian multinomial probit model via Markov chain Monte Carlo. The multinomial probit model is often used to analyze the discrete choices made by individuals recorded in survey data. Examples where the multinomial probit model may be useful include the analysis of product choice by consumers in market research and the analysis of candidate or party choice by voters in electoral studies. The MNP package can also fit the model with different choice sets for each individual, and with complete or partial individual orderings of the available alternatives from the choice set. The estimation is based on the efficient marginal data augmentation algorithm developed by Imai and van Dyk (2005), "A Bayesian Analysis of the Multinomial Probit Model Using Marginal Data Augmentation", Journal of Econometrics, Vol. 124, No. 2 (February), pp. 311-334 <doi:10.1016/j.jeconom.2004.02.002>. Detailed examples are given in Imai and van Dyk (2005), "MNP: R Package for Fitting the Multinomial Probit Model", Journal of Statistical Software, Vol. 14, No. 3 (May), pp. 1-32 <doi:10.18637/jss.v014.i03>.
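A minimal usage sketch; the data frame voters and its columns are hypothetical, so the call is shown in comment form:

    # library(MNP)
    # fit <- mnp(choice ~ age + income, data = voters, n.draws = 2000, verbose = TRUE)
    # summary(fit)                           # posterior summaries of the coefficients
    # predict(fit, newdata = voters[1:5, ])  # predicted choice probabilities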
An implementation of a collection of tools for exact Bayesian inference with discrete time series. This package contains functions that can be used for prediction, model selection, estimation, segmentation/change-point detection and other statistical tasks. Specifically, the functions provided can be used for the exact computation of the prior predictive likelihood of the data, for the identification of the a posteriori most likely (MAP) variable-memory Markov models, for calculating the exact posterior probabilities and the AIC and BIC scores of these models, for prediction with respect to log-loss and 0-1 loss, and for segmentation/change-point detection. Example data sets from finance, genetics, animal communication and meteorology are also provided. Detailed descriptions of the underlying theory and algorithms can be found in [Kontoyiannis et al., Bayesian Context Trees: Modelling and exact inference for discrete time series, Journal of the Royal Statistical Society: Series B (Statistical Methodology), April 2022; available at <arXiv:2007.14900> [stat.ME], July 2020] and [Lungu et al., Change-point Detection and Segmentation of Discrete Data using Bayesian Context Trees, <arXiv:2203.04341> [stat.ME], March 2022].
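A hedged usage sketch; the BCT() call below, taking the series as a character string plus a maximum context depth, reflects my reading of the package and should be checked against the documentation:

    # library(BCT)
    # x <- paste(sample(0:2, 1000, replace = TRUE), collapse = "")  # ternary series
    # BCT(x, depth = 5)   # a posteriori most likely variable-memory Markov model(s)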
Stochastic Newton Sampler (SNS) is a Metropolis-Hastings-based Markov chain Monte Carlo sampler for twice-differentiable, log-concave probability density functions (PDFs), where the proposal density is a multivariate Gaussian obtained from a second-order Taylor-series expansion of the log-density around the current point. The mean of the Gaussian proposal is the full Newton-Raphson step from the current point. A Boolean flag allows switching from SNS to Newton-Raphson optimization (by choosing the mean of the proposal function as the next point); this can be used during burn-in to get close to the mode of the PDF (which is unique due to concavity). For high-dimensional densities, mixing can be improved via a state-space partitioning strategy, in which SNS is applied to disjoint subsets of the state space, wrapped in a Gibbs cycle. Numerical differentiation is available when analytical expressions for the gradient and Hessian are not available, and facilities for validation and numerical differentiation of the log-density are provided. Note: formerly available versions of the MfUSampler package can be obtained from the archive <https://cran.r-project.org/src/contrib/Archive/MfUSampler/>.
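A hedged usage sketch for a bivariate Gaussian target; the sns.run() interface and the expectation that fghEval returns list(f, g, h) are assumptions to be verified against the package documentation:

    # library(sns)
    # fghEval <- function(x) {
    #   list(f = -0.5 * sum(x^2),     # log-density (up to an additive constant)
    #        g = -x,                  # gradient
    #        h = -diag(length(x)))    # Hessian
    # }
    # draws <- sns.run(init = c(0, 0), fghEval = fghEval, niter = 500, nnr = 10)
    # the first nnr iterations run in non-stochastic Newton-Raphson mode (burn-in aid)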
Computes various geospatial indices of socioeconomic deprivation and disparity in the United States. Some indices are considered "spatial" because they consider the values of neighboring (i.e., adjacent) census geographies in their computation, while other indices are "aspatial" because they only consider the value within each census geography. Two types of aspatial neighborhood deprivation indices (NDI) are available: (1) one based on Messer et al. (2006) <doi:10.1007/s11524-006-9094-x>, and (2) one based on Andrews et al. (2020) <doi:10.1080/17445647.2020.1750066> and Slotman et al. (2022) <doi:10.1016/j.dib.2022.108002>, who use variables chosen by Roux and Mair (2010) <doi:10.1111/j.1749-6632.2009.05333.x>. Both are a decomposition of multiple demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates (ACS-5; 2006-2010 onward). Using data from the ACS-5 (2005-2009 onward), the package can also compute indices of racial or ethnic residential segregation, including but not limited to those discussed in Massey & Denton (1988) <doi:10.1093/sf/67.2.281>, and additional indices of socioeconomic disparity.
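A hedged usage sketch; the function name messer() and its arguments follow my reading of the package, and pulling ACS-5 data typically requires a Census API key:

    # library(ndi)
    # messer_ga <- messer(state = "GA", year = 2020)   # NDI after Messer et al. (2006)
    # head(messer_ga$ndi)                              # tract-level index values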