Analytical methods to locate and characterise ecotones, ecosystems and environmental patchiness along ecological gradients. Methods are implemented for isolated sampling or for space/time series. The package includes Detrended Correspondence Analysis (Hill & Gauch (1980) <doi:10.1007/BF00048870>), fuzzy clustering (De Cáceres et al. (2010) <doi:10.1080/01621459.1963.10500845>), biodiversity indices (Jost (2006) <doi:10.1111/j.2006.0030-1299.14714.x>), and network analyses (Epskamp et al. (2012) <doi:10.18637/jss.v048.i04>), as well as tools to explore the number of clusters in the data. Functions to produce synthetic ecological datasets are also provided.
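The biodiversity indices discussed by Jost (2006) can be expressed as Hill numbers (effective numbers of species). The following is a minimal Python sketch of that idea, not the package's own code; the function name hill_number is hypothetical and chosen only for illustration.

```python
import math

def hill_number(abundances, q):
    """Effective number of species (Hill number) of order q.

    Hypothetical helper for illustration; not part of any package API.
    """
    total = sum(abundances)
    # Relative abundances; species with zero counts are dropped.
    p = [a / total for a in abundances if a > 0]
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

counts = [50, 30, 15, 5]
print(hill_number(counts, 0))  # order 0: species richness
print(hill_number(counts, 1))  # order 1: exp(Shannon entropy)
print(hill_number(counts, 2))  # order 2: inverse Simpson concentration
```

For a perfectly even community all orders give the same value, the number of species, which is a quick sanity check on any implementation.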
Fits mixed Poisson regression models (Poisson-Inverse Gaussian or Negative-Binomial) to data sets with count response variables. The models can have a varying precision parameter, where a linear regression structure (through a link function) is assumed to hold on the precision parameter. The Expectation-Maximization algorithm for both these models (Poisson Inverse Gaussian and Negative Binomial) is an important contribution of this package. Another important feature of this package is the set of functions to perform global and local influence analysis. See Barreto-Souza and Simas (2016) <doi:10.1007/s11222-015-9601-6> for further details.
Implementation of analytical models for estimating streamflow depletion due to groundwater pumping, and other related tools. Functions are broadly split into two groups: (1) analytical streamflow depletion models, which estimate streamflow depletion for a single stream reach resulting from groundwater pumping; and (2) depletion apportionment equations, which distribute estimated streamflow depletion among multiple stream reaches within a stream network. See Zipper et al. (2018) <doi:10.1029/2018WR022707> for more information on depletion apportionment equations and Zipper et al. (2019) <doi:10.1029/2018WR024403> for more information on analytical depletion functions, which combine analytical models and depletion apportionment equations.
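The two groups of functions described above can be illustrated with a short Python sketch. It assumes the Glover and Balmer (1954) solution as one example of an analytical depletion model and inverse-distance weighting as one example of an apportionment equation; the description does not say which models the package implements, and both function names are hypothetical.

```python
import math

def glover_depletion_fraction(distance, storativity, transmissivity, time):
    """Fraction of the pumping rate supplied by streamflow depletion.

    Glover and Balmer form: erfc(sqrt(S * d^2 / (4 * T * t))).
    Units must be consistent (e.g. m, m^2/day, days). Illustrative only.
    """
    if time <= 0:
        return 0.0
    arg = math.sqrt(storativity * distance ** 2 / (4.0 * transmissivity * time))
    return math.erfc(arg)

def apportion_inverse_distance(distances, weight=2.0):
    """Split depletion among reaches in proportion to 1 / distance^weight."""
    inverse = [1.0 / d ** weight for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]

# Assumed example values: well 250 m from the stream, S = 0.1, T = 100 m^2/day.
fraction = glover_depletion_fraction(250.0, 0.1, 100.0, time=365.0)
shares = apportion_inverse_distance([250.0, 500.0, 1000.0])
print(fraction, [fraction * s for s in shares])
```

Multiplying the single-reach depletion fraction by the apportionment shares gives a per-reach estimate, which is the sense in which the two groups of functions combine.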
Facilitates basic and equation-based analyses of some important soil properties related to the soil chemical environment and nutrient availability to plants. Freundlich H (1907). <doi:10.1515/zpch-1907-5723>. Datta SP, Bhadoria PBS (1999). <doi:10.1002%2F%28SICI%291522-2624%28199903%29162%3A2%3C183%3A%3AAID-JPLN183%3E3.0.CO%3B2-A> "Boron adsorption and desorption in some acid soils of West Bengal, India". Langmuir I (1918). <doi:10.1021/ja02242a004> "The adsorption of gases on plane surfaces of glass, mica, and platinum". Khasawneh FE (1971). <doi:10.2136/sssaj1971.03615995003500030029x> "Solution ion activity and plant growth".
This package provides tools for simulating spatially dependent predictors (continuous or binary), which are used to generate scalar outcomes in a (generalized) linear model framework. Continuous predictors are generated using traditional multivariate normal distributions or Gauss Markov random fields with several correlation function approaches (e.g., see Rue (2001) <doi:10.1111/1467-9868.00288> and Furrer and Sain (2010) <doi:10.18637/jss.v036.i10>), while binary predictors are generated using a Boolean model (see Cressie and Wikle (2011, ISBN: 978-0-471-69274-4)). Parameter vectors exhibiting spatial clustering can also be easily specified by the user.
This package provides a set of fast and convenient functions to help conduct accessibility analyses. Given a pre-computed travel cost matrix and a land use dataset (containing the location of jobs, healthcare and population, for example), the package allows one to calculate accessibility levels and accessibility poverty and inequality. The package covers the majority of the most commonly used accessibility measures (such as cumulative opportunities, gravity-based and floating catchment area methods), as well as the most frequently used inequality and poverty metrics (such as the Palma ratio, the concentration and Theil indices and the FGT family of measures).
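As an illustration of the simplest measure named above, the following Python sketch computes a cumulative opportunities measure from a travel cost matrix and a land use table. The data layout (dictionaries keyed by origin and destination) and the function name are assumptions made for this example; they are not the package's interface.

```python
def cumulative_opportunities(travel_costs, opportunities, cutoff):
    """Sum the opportunities reachable from each origin within a cost cutoff.

    travel_costs: dict mapping (origin, destination) -> travel cost
    opportunities: dict mapping destination -> number of opportunities
    Hypothetical helper for illustration only.
    """
    access = {}
    for (origin, destination), cost in travel_costs.items():
        access.setdefault(origin, 0)
        if cost <= cutoff:
            access[origin] += opportunities.get(destination, 0)
    return access

# Assumed toy data: travel times in minutes and job counts per zone.
travel_costs = {("A", "X"): 10, ("A", "Y"): 40, ("B", "X"): 25, ("B", "Y"): 20}
jobs = {"X": 100, "Y": 50}
print(cumulative_opportunities(travel_costs, jobs, cutoff=30))
```

Gravity-based measures follow the same pattern but replace the hard cutoff with a decay weight applied to each destination's opportunities.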
Extends the base classes and methods of the caret package for integration of base learners. The user can input the number of different base learners, and specify the final learner, along with the train-validation-test data partition split ratio. The predictions on unseen new data are the result of ensemble meta-learning <https://machinelearningmastery.com/stacking-ensemble-machine-learning-with-python/> of the heterogeneous learners, which aims to reduce the generalization error of the predictive models. It significantly lowers the barrier for practitioners without specialist expertise to apply heterogeneous ensemble learning techniques to their everyday predictive problems.
When analyzing data, plots are a helpful tool for visualizing data and interpreting statistical models. This package provides a set of simple tools for building plots incrementally, starting with an empty plot region, and adding bars, data points, regression lines, error bars, gradient legends, density distributions in the margins, and even pictures. The package builds further on R graphics by simply combining functions and settings in order to reduce the amount of code the user has to write. As a result, the package does not use formula input or special syntax, but can be used in combination with default R plot functions.
Computation of a cubic B-spline basis for arbitrary knots. It also provides the 1st and 2nd derivatives, as well as the integral of the basis elements. It is used by the author to fit penalized B-spline models, see e.g. Jullion, A. and Lambert, P. (2006) <doi:10.1016/j.csda.2006.09.027>, Lambert, P. and Eilers, P.H.C. (2009) <doi:10.1016/j.csda.2008.11.022> and, more recently, Lambert, P. (2021) <doi:10.1016/j.csda.2021.107250>. It is inspired by the algorithm developed by de Boor, C. (1977) <doi:10.1137/0714026>.
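A cubic B-spline basis for arbitrary knots can be evaluated with the Cox-de Boor recursion. The Python sketch below shows that textbook recursion only; it is not the package's algorithm, it does not compute derivatives or integrals, and the function name and knot vector are assumptions for illustration.

```python
def bspline_basis(x, knots, i, degree=3):
    """Value at x of the i-th B-spline basis function of the given degree.

    Textbook Cox-de Boor recursion; terms with a zero-width knot span are
    treated as zero. Hypothetical helper for illustration only.
    """
    if degree == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_width = knots[i + degree] - knots[i]
    right_width = knots[i + degree + 1] - knots[i + 1]
    left = 0.0
    if left_width > 0:
        left = (x - knots[i]) / left_width * bspline_basis(x, knots, i, degree - 1)
    right = 0.0
    if right_width > 0:
        right = ((knots[i + degree + 1] - x) / right_width
                 * bspline_basis(x, knots, i + 1, degree - 1))
    return left + right

# Assumed knot vector: boundary knots repeated four times for a cubic basis.
knots = [0, 0, 0, 0, 1, 2, 3, 3, 3, 3]
n_basis = len(knots) - 3 - 1
row = [bspline_basis(1.5, knots, i) for i in range(n_basis)]
print(row, sum(row))
```

Inside the knot range the basis functions are non-negative and sum to one, which gives a simple check that the knots and indices are set up consistently.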
An R interface to version 0.3 of the ROPTLIB optimization library (see <https://www.math.fsu.edu/~whuang2/> for more information). Optimize real-valued functions over manifolds such as Stiefel, Grassmann, and Symmetric Positive Definite matrices. For details see Martin et al. (2020) <doi:10.18637/jss.v093.i01>. Note that the optional ldr package used in some of this package's examples can be obtained from either JSS <https://www.jstatsoft.org/index.php/jss/article/view/v061i03/2886> or from the CRAN archives <https://cran.r-project.org/src/contrib/Archive/ldr/ldr_1.3.3.tar.gz>.
Spike and slab regression with a variety of residual error distributions corresponding to Gaussian, Student T, probit, logit, SVM, and a few others. Spike and slab regression is Bayesian regression with prior distributions containing a point mass at zero. The posterior updates the amount of mass on this point, leading to a posterior distribution that is actually sparse, in the sense that if you sample from it many coefficients are actually zeros. Sampling from this posterior distribution is an elegant way to handle Bayesian variable selection and model averaging. See <DOI:10.1504/IJMMNO.2014.059942> for an explanation of the Gaussian case.
NEON data packages can be accessed through the NEON Data Portal <https://www.neonscience.org> or through the NEON Data API (see <https://data.neonscience.org/data-api> for documentation). Data delivered from the Data Portal are provided as monthly zip files packaged within a parent zip file, while individual files can be accessed from the API. This package provides tools that aid in discovering, downloading, and reformatting data prior to use in analyses. This includes downloading data via the API, merging data tables by type, and converting formats. For more information, see the readme file at <https://github.com/NEONScience/NEON-utilities>.
Gain seamless access to origin-destination (OD) data from the Spanish Ministry of Transport, hosted at <https://www.transportes.gob.es/ministerio/proyectos-singulares/estudios-de-movilidad-con-big-data/opendata-movilidad>. This package simplifies the management of these large datasets by providing tools to download zone boundaries, handle associated origin-destination data, and process it efficiently with the duckdb database interface. Local caching minimizes repeated downloads, streamlining workflows for researchers and analysts. Extensive documentation is available at <https://ropenspain.github.io/spanishoddata/index.html>, offering guides on creating static and dynamic mobility flow visualizations and transforming large datasets into analysis-ready formats.
Set of tools to compute metrics and indices for climate analysis. The package provides functions to compute extreme indices, evaluate the agreement between models and combine these models into an ensemble. Multi-model time series of climate indices can be computed either after averaging the 2-D fields from different models provided they share a common grid or by combining time series computed on the model native grid. Indices can be assigned weights and/or combined to construct new indices. The package makes use of some of the methods described in: N. Manubens et al. (2018) <doi:10.1016/j.envsoft.2018.01.018>.
Fit calibration curves for clinical prediction models and calculate several associated metrics (Eavg, E50, E90, Emax). Ideally predicted probabilities from a prediction model should align with observed probabilities. Calibration curves relate predicted probabilities (or a transformation thereof) to observed outcomes via a flexible non-linear smoothing function. pmcalibration allows users to choose between several smoothers (regression splines, generalized additive models/GAMs, lowess, loess). Both binary and time-to-event outcomes are supported. See Van Calster et al. (2016) <doi:10.1016/j.jclinepi.2015.12.005>; Austin and Steyerberg (2019) <doi:10.1002/sim.8281>; Austin et al. (2020) <doi:10.1002/sim.8570>.
This package provides a tool to calculate Cardiovascular Risk Scores in large data frames as published in Perez-Vicencio, et al (2024) <doi:10.1136/openhrt-2024-002755>. Cardiovascular risk scores are statistical tools used to assess an individual's likelihood of developing a cardiovascular disease based on various risk factors, such as age, gender, blood pressure, cholesterol levels, and smoking. Here we bring together the six most commonly used in the emergency department. Using 'RiskScorescvd', you can calculate all the risk scores in an extended dataset in seconds. PCE (ASCVD) described in Goff, et al (2013) <doi:10.1161/01.cir.0000437741.48606.98>. EDACS described in Mark DG, et al (2016) <doi:10.1016/j.jacc.2017.11.064>. GRACE described in Fox KA, et al (2006) <doi:10.1136/bmj.38985.646481.55>. HEART is described in Mahler SA, et al (2017) <doi:10.1016/j.clinbiochem.2017.01.003>. SCORE2/OP described in SCORE2 working group and ESC Cardiovascular risk collaboration (2021) <doi:10.1093/eurheartj/ehab309>. TIMI described in Antman EM, et al (2000) <doi:10.1001/jama.284.7.835>. SCORE2-Diabetes described in SCORE2-Diabetes working group and ESC Cardiovascular risk collaboration (2023) <doi:10.1093/eurheartj/ehab260>. SCORE2/OP with CKD add-on described in Kunihiro M et al (2022) <doi:10.1093/eurjpc/zwac176>.
We implement causal decomposition analysis using methods proposed by Park, Lee, and Qin (2022) and Park, Kang, and Lee (2023), which provide researchers with multiple-mediator imputation, single-mediator imputation, and product-of-coefficients regression approaches to estimate the initial disparity, disparity reduction, and disparity remaining (<doi:10.1177/00491241211067516>; <doi:10.1177/00811750231183711>). We also implement sensitivity analysis for causal decomposition using R-squared values as sensitivity parameters (Park, Kang, Lee, and Ma, 2023 <doi:10.1515/jci-2022-0031>). Finally, we include individualized causal decomposition and sensitivity analyses proposed by Park, Kang, and Lee (2025+) <doi:10.48550/arXiv.2506.19010>.
Detects spatial clusters of abnormal values on multivariate or functional data (Frévent et al. (2022) <doi:10.32614/RJ-2022-045>). See also: Frévent et al. (2023) <doi:10.1093/jrsssc/qlad017>, Smida et al. (2022) <doi:10.1016/j.csda.2021.107378>, Frévent et al. (2021) <doi:10.1016/j.spasta.2021.100550>, Cucala et al. (2019) <doi:10.1016/j.spasta.2018.10.002>, Cucala et al. (2017) <doi:10.1016/j.spasta.2017.06.001>, Jung and Cho (2015) <doi:10.1186/s12942-015-0024-6>, Kulldorff et al. (2009) <doi:10.1186/1476-072X-8-58>.
This package contains tools for exploring Hardy-Weinberg equilibrium for diallelic genetic marker data. All classical tests (chi-square, exact, likelihood-ratio and permutation tests) for Hardy-Weinberg equilibrium are included in the package, as well as functions for power computation and for the simulation of marker data under equilibrium and disequilibrium. Routines for dealing with markers on the X-chromosome are included. Functions for testing equilibrium in the presence of missing data by using multiple imputation are also provided. Implements several graphics for exploring the equilibrium status of a large set of diallelic markers: ternary plots with acceptance regions, log-ratio plots and Q-Q plots.
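The chi-square test mentioned above can be sketched in a few lines of Python for a single diallelic marker. This is the textbook test without continuity correction, written for illustration; the function name is hypothetical and the package's own routines may differ (for example in corrections or in how X-chromosomal markers are handled).

```python
import math

def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg equilibrium (1 degree of freedom).

    Returns (statistic, p_value). No continuity correction is applied and
    both alleles are assumed to be present. Illustrative helper only.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Assumed example genotype counts (AA, AB, BB).
print(hwe_chisq(298, 489, 213))
```

Genotype counts that match the expected proportions exactly, such as 25, 50 and 25, give a statistic of zero and a p-value of one.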
Offering enhanced statistical power compared to traditional hypothesis testing methods, informative hypothesis testing allows researchers to explicitly model their expectations regarding the relationships among parameters. An important software tool for this framework is 'restriktor'. The mmirestriktor package provides shiny web applications to implement some of the basic functionality of 'restriktor'. The mmirestriktor() function launches a shiny application for fitting and analyzing models with constraints. The FbarCards() function launches a card game application which can help build intuition about informative hypothesis testing. The iht_interpreter() helps interpret informative hypothesis testing results based on guidelines in Vanbrabant and Rosseel (2020) <doi:10.4324/9780429273872-14>.
Distance multivariance is a measure of dependence which can be used to detect and quantify dependence of arbitrarily many random vectors. The necessary functions are implemented in this package and examples are given. It includes: distance multivariance, distance multicorrelation, dependence structure detection, tests of independence and copula versions of distance multivariance based on the Monte Carlo empirical transform. Detailed references are given in the package description; as a starting point for the theoretical background we refer to: B. Böttcher, Dependence and Dependence Structures: Estimation and Visualization Using the Unifying Concept of Distance Multivariance. Open Statistics, Vol. 1, No. 1 (2020), <doi:10.1515/stat-2020-0001>.
The computational complexity of the implemented algorithm for Kendall's correlation is O(n log(n)), which is faster than the base R implementation with a computational complexity of O(n^2). For small vectors (i.e., fewer than 100 observations), the time difference is negligible. However, for larger vectors, the speed difference can be substantial and the numerical difference is minimal. The references are Knight (1966) <doi:10.2307/2282833>, Abrevaya (1999) <doi:10.1016/S0165-1765(98)00255-9>, Christensen (2005) <doi:10.1007/BF02736122> and Emara (2024) <https://learningcpp.org/>. This implementation is described in Vargas Sepulveda (2025) <doi:10.1371/journal.pone.0326090>.
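The O(n log(n)) cost comes from counting discordant pairs with a merge sort instead of comparing every pair. The Python sketch below illustrates that idea under the simplifying assumption that neither vector contains ties; it is not the package's implementation, and the function names are hypothetical.

```python
def _sort_and_count(values):
    """Merge sort that also returns the number of inversions in values."""
    if len(values) < 2:
        return list(values), 0
    mid = len(values) // 2
    left, inv_left = _sort_and_count(values[:mid])
    right, inv_right = _sort_and_count(values[mid:])
    merged, inversions = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] precedes every remaining element of left.
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions

def kendall_tau_no_ties(x, y):
    """Kendall's tau for vectors without ties, in O(n log n) time."""
    n = len(x)
    # Order y by x; each inversion left in y is one discordant pair.
    y_by_x = [yi for _, yi in sorted(zip(x, y))]
    _, discordant = _sort_and_count(y_by_x)
    return 1.0 - 4.0 * discordant / (n * (n - 1))

print(kendall_tau_no_ties([1, 2, 3, 4], [1, 3, 2, 4]))
```

Handling ties requires additional correction terms, which this sketch deliberately leaves out.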
Life expectancy is highly correlated over time among countries and between males and females. These associations can be used to improve forecasts. Here we have implemented a method that first forecasts female life expectancy based on the gap between female life expectancy in a country and the record level of female life expectancy in the world. Second, to forecast male life expectancy, the gap between male life expectancy and female life expectancy in a country is analysed. We named this method the Double-Gap model. For a detailed description of the method see Pascariu et al. (2018) <doi:10.1016/j.insmatheco.2017.09.011>.
Hierarchical multistate models are considered to perform the analysis of independent/clustered semi-competing risks data. The package allows users to choose the specification for model components from a range of options, giving them substantial flexibility, including: accelerated failure time or proportional hazards regression models; parametric or non-parametric specifications for baseline survival functions and cluster-specific random effects distribution; a Markov or semi-Markov specification for terminal event following non-terminal event. While estimation is mainly performed within the Bayesian paradigm, the package also provides the maximum likelihood estimation approach for several parametric models. The package also includes functions for univariate survival analysis as complementary analysis tools.