An implementation of the Ordered Forest estimator as developed in Lechner & Okasa (2019) <arXiv:1907.02436>. The Ordered Forest flexibly estimates the conditional probabilities of models with ordered categorical outcomes (so-called ordered choice models). Beyond what common machine learning algorithms offer, the orf package provides functions for estimating marginal effects and for statistical inference on them, and thus provides output similar to that of standard econometric models for ordered choice. The core forest algorithm relies on the fast C++ forest implementation from the ranger package (Wright & Ziegler, 2017) <arXiv:1508.04409>.
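As a rough illustration of how class probabilities for an ordered outcome can be obtained from cumulative estimates, the Python sketch below takes estimates of P(Y <= m | x) for m = 1..M-1, differences them, sets negative differences to zero, and renormalises. It is not the orf package's API. The function name `class_probabilities` is hypothetical, and the cumulative estimates are assumed to come from some separate regression step such as one forest per category boundary.

```python
def class_probabilities(cumulative):
    """Turn estimates of P(Y <= m | x), m = 1..M-1, into M class probabilities.

    Assumption: the inputs are independent estimates, so they need not be
    monotone; negative differences are clipped to zero and the result is
    rescaled to sum to one.
    """
    bounds = [0.0] + list(cumulative) + [1.0]
    raw = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
    clipped = [max(r, 0.0) for r in raw]
    total = sum(clipped)  # at least 1, because the raw differences sum to 1
    return [c / total for c in clipped]


if __name__ == "__main__":
    # Monotone input: plain differencing.
    print(class_probabilities([0.2, 0.5]))
    # Non-monotone input: the negative difference is clipped before rescaling.
    print(class_probabilities([0.6, 0.5, 0.9]))
```

Clipping and rescaling keeps every probability in [0, 1] even when the separately estimated cumulative probabilities cross.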
Numerical derivatives through finite-difference approximations can be calculated using the pnd package, which offers parallel capabilities and optimal step-size selection to improve accuracy. Its functions facilitate efficient computation of derivatives, gradients, Jacobians, and Hessians, allowing for more function evaluations to reduce both the mathematical (truncation) and machine (rounding) errors. Designed for compatibility with the numDeriv package, which has not received updates in several years, it introduces advanced features such as computing derivatives of arbitrary order, improving the accuracy of Hessian approximations by avoiding repeated differencing, and parallelising slow functions on Windows, Mac, and Linux.
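The finite-difference idea can be sketched in a few lines of Python. This is a minimal illustration, not the pnd API. The names `central_difference` and `gradient` are hypothetical, and the default step is a common rule of thumb (cube root of machine epsilon) and not the optimal step-size selection the package describes.

```python
import math


def central_difference(f, x, h=None):
    """Second-order central-difference estimate of f'(x)."""
    if h is None:
        # Rule-of-thumb step (an assumption): cube root of machine epsilon,
        # scaled by the magnitude of x.
        eps = 2.0 ** -52
        h = eps ** (1.0 / 3.0) * max(abs(x), 1.0)
    return (f(x + h) - f(x - h)) / (2.0 * h)


def gradient(f, x, h=None):
    """Gradient of a scalar function of a list of floats, one axis at a time."""
    grad = []
    for i in range(len(x)):
        def along_axis(t, i=i):
            shifted = list(x)
            shifted[i] = t
            return f(shifted)
        grad.append(central_difference(along_axis, x[i], h))
    return grad


if __name__ == "__main__":
    print(central_difference(math.sin, 1.0), math.cos(1.0))
    print(gradient(lambda v: v[0] ** 2 + 3.0 * v[1], [2.0, 1.0]))
```

A smaller step reduces the truncation error but amplifies rounding error, which is why the step size matters for accuracy.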
This package provides tools for Topological Data Analysis. The package focuses on statistical analysis of persistent homology and density clustering. For that, this package provides an R interface for the efficient algorithms of the C++ libraries GUDHI <https://project.inria.fr/gudhi/software/>, Dionysus <https://www.mrzv.org/software/dionysus/>, and PHAT <https://bitbucket.org/phat-code/phat/>. This package also implements methods from Fasy et al. (2014) <doi:10.1214/14-AOS1252> and Chazal et al. (2015) <doi:10.20382/jocg.v6i2a8> for analyzing the statistical significance of persistent homology features.
Plant ecologists often need to collect "traits" data about plant species, which are often scattered among various databases. TR8 contains a set of tools that automatically retrieve some of those functional traits data for plant species from publicly available databases (The Ecological Flora of the British Isles, LEDA traitbase, Ellenberg values for Italian Flora, Mycorrhizal intensity databases, BROT, PLANTS, Jepson Flora Project). The TR8 name, inspired by "car plates" jokes, was chosen since it both recalls the main object of the package and is extremely short to type.
Analyze count time series with excess zeros. Two types of statistical models are supported: Markov regression and state-space models. They are also known as observation-driven and parameter-driven models, respectively, in the time series literature. The functions used for Markov regression or observation-driven models can also be used to fit ordinary regression models with independent data under the zero-inflated Poisson (ZIP) or zero-inflated negative binomial (ZINB) assumption. The package also contains miscellaneous functions to compute the density, distribution, and quantile functions of the ZIP and ZINB distributions and to generate random numbers from them.
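To make the ZIP distribution concrete, the Python sketch below computes its probability mass function and cumulative distribution function and draws random values. It is an illustration only, not the package's functions. The names `zip_pmf`, `zip_cdf`, and `zip_sample`, and the parameter names `lam` (Poisson mean) and `pi` (probability of a structural zero), are assumptions.

```python
import math
import random


def zip_pmf(k, lam, pi):
    """P(Y = k) when Y is 0 with probability pi and Poisson(lam) otherwise."""
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    extra_zero = pi if k == 0 else 0.0
    return extra_zero + (1.0 - pi) * poisson


def zip_cdf(k, lam, pi):
    """P(Y <= k), by summing the mass function."""
    return sum(zip_pmf(j, lam, pi) for j in range(0, k + 1))


def zip_sample(lam, pi, rng):
    """Draw one value: a structural zero, or a Poisson draw by inversion."""
    if rng.random() < pi:
        return 0
    u = rng.random()
    k = 0
    p = math.exp(-lam)
    cum = p
    # Stop once the cumulative mass passes u (or the terms underflow to 0).
    while u > cum and p > 0.0:
        k += 1
        p *= lam / k
        cum += p
    return k


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [zip_sample(2.0, 0.3, rng) for _ in range(10)]
    print(draws, zip_pmf(0, 2.0, 0.3))
```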
This package provides methods for learning causal relationships among a set of foreground variables X based on signals from a (potentially much larger) set of background variables Z, which are known non-descendants of X. The confounder blanket learner (CBL) uses sparse regression techniques to simultaneously perform many conditional independence tests, with complementary pairs stability selection to guarantee finite sample error control. CBL is sound and complete with respect to a so-called "lazy oracle", and works with both linear and nonlinear systems. For details, see Watson & Silva (2022) <arXiv:2205.05715>.
This package contains different algorithms and construction methods for optimal Latin hypercube designs (LHDs) with flexible sizes. The package is comprehensive in that it can generate maximin distance LHDs, maximum projection LHDs, and orthogonal and nearly orthogonal LHDs. Detailed comparisons and a summary of all the algorithms and construction methods in this package can be found in Hongzhi Wang, Qian Xiao and Abhyuday Mandal (2021) <doi:10.48550/arXiv.2010.09154>. This package is particularly useful in the area of Design and Analysis of Experiments (DAE), and more specifically in the design of computer experiments.
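The Python sketch below shows what a Latin hypercube design and the maximin distance criterion look like in practice. It is not one of the package's algorithms. It uses a naive random search, and the names `random_lhd`, `min_pairwise_distance`, and `best_of_random` are hypothetical.

```python
import itertools
import math
import random


def random_lhd(n, k, rng):
    """n-by-k Latin hypercube: every column is a permutation of 1..n."""
    columns = []
    for _ in range(k):
        col = list(range(1, n + 1))
        rng.shuffle(col)
        columns.append(col)
    # Transpose the columns into a list of n design points (rows).
    return [[columns[j][i] for j in range(k)] for i in range(n)]


def min_pairwise_distance(design):
    """Maximin criterion: smallest Euclidean distance between two rows."""
    return min(math.dist(a, b) for a, b in itertools.combinations(design, 2))


def best_of_random(n, k, tries, rng):
    """Naive search: keep the random LHD with the largest minimum distance."""
    candidates = (random_lhd(n, k, rng) for _ in range(tries))
    return max(candidates, key=min_pairwise_distance)


if __name__ == "__main__":
    design = best_of_random(n=5, k=2, tries=200, rng=random.Random(1))
    print(design, min_pairwise_distance(design))
```

Dedicated search algorithms typically find better designs than random search for the same budget. The sketch only makes the criterion concrete.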
Estimation of generalised additive P-spline regression models using the separation of overlapping precision matrices (SOP) method. Estimation is based on the equivalence between P-splines and linear mixed models, and variance/smoothing parameters are estimated by restricted maximum likelihood (REML). The package enables users to estimate P-spline models with overlapping penalties. Based on the work described in Rodriguez-Alvarez et al. (2015) <doi:10.1007/s11222-014-9464-2>, Rodriguez-Alvarez et al. (2019) <doi:10.1007/s11222-018-9818-2>, and Eilers and Marx (1996) <doi:10.1214/ss/1038425655>.
The Truncated Factor Model is a statistical model designed to handle specific data structures in data analysis. This R package focuses on the Sparse Online Principal Component Estimation method, which is used to calculate quantities such as the loading matrix and the specific variance matrix for truncated data, thereby better explaining the relationship between common factors and original variables. The package also provides other estimation methods for comparison with the Sparse Online Principal Component Estimation method. The philosophy of the package is described in thesis. (2023) <doi:10.1007/s00180-022-01270-z>.
Fits boundary line models to datasets as proposed by Webb (1972) <doi:10.1080/00221589.1972.11514472> and makes statistical inferences about their parameters. Provides additional tools for testing datasets for evidence of boundary presence and selecting initial starting values for model optimization prior to fitting the boundary line models. It also includes tools for conducting post-hoc analyses such as predicting boundary values and identifying the most limiting factor (Miti, Milne, Giller, Lark (2024) <doi:10.1016/j.fcr.2024.109365>). This ensures a comprehensive analysis for datasets that exhibit upper boundary structures.
Diffusion Weighted Imaging (DWI) is a Magnetic Resonance Imaging modality that measures diffusion of water in tissues like the human brain. The package contains R functions to process diffusion-weighted data. The functionality includes diffusion tensor imaging (DTI), diffusion kurtosis imaging (DKI), modeling for high angular resolution diffusion weighted imaging (HARDI) using Q-ball reconstruction and tensor mixture models, several methods for structural adaptive smoothing including POAS and msPOAS, and streamline fiber tracking for tensor and tensor mixture models. The package provides functionality to manipulate and visualize results in 2D and 3D.
Goodness-of-fit tests for selection of r in the r-largest order statistics (GEVr) model. Goodness-of-fit tests for threshold selection in the Generalized Pareto distribution (GPD). Random number generation and density functions for the GEVr distribution. Profile likelihood for return level estimation using the GEVr and Generalized Pareto distributions. P-value adjustments for sequential, multiple testing error control. Non-stationary fitting of GEVr and GPD. Bader, B., Yan, J. & Zhang, X. (2016) <doi:10.1007/s11222-016-9697-3>. Bader, B., Yan, J. & Zhang, X. (2018) <doi:10.1214/17-AOAS1092>.
For multiscale analysis, this package carries out ensemble patch transform, its visualization and multiscale decomposition. The detailed procedure is described in Kim et al. (2020), and Oh and Kim (2020). D. Kim, G. Choi, H.-S. Oh, Ensemble patch transformation: a flexible framework for decomposition and filtering of signal, EURASIP Journal on Advances in Signal Processing 30 (2020) 1-27 <doi:10.1186/s13634-020-00690-7>. H.-S. Oh, D. Kim, Image decomposition by bidimensional ensemble patch transform, Pattern Recognition Letters 135 (2020) 173-179 <doi:10.1016/j.patrec.2020.03.029>.
Estimation of Rosenthal's fail safe number including confidence intervals. The relevant papers are the following. Konstantinos C. Fragkos, Michail Tsagris and Christos C. Frangos (2014). "Publication Bias in Meta-Analysis: Confidence Intervals for Rosenthal's Fail-Safe Number". International Scholarly Research Notices, Volume 2014. <doi:10.1155/2014/825383>. Konstantinos C. Fragkos, Michail Tsagris and Christos C. Frangos (2017). "Exploring the distribution for the estimator of Rosenthal's fail-safe number of unpublished studies in meta-analysis". Communications in Statistics-Theory and Methods, 46(11):5672--5684. <doi:10.1080/03610926.2015.1109664>.
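Rosenthal's fail-safe number has a simple closed form that can be sketched in Python. For k studies with standard normal deviates Z_i, it is (sum of Z_i)^2 / z_alpha^2 - k, where z_alpha is the one-tailed critical value. The function name `rosenthal_fail_safe_n` is hypothetical, and the confidence intervals discussed in the cited papers are not reproduced here.

```python
from statistics import NormalDist


def rosenthal_fail_safe_n(z_scores, alpha=0.05):
    """Point estimate of Rosenthal's fail-safe number.

    z_scores: one standard normal deviate per study.
    alpha: one-tailed significance level (0.05 gives z_alpha of about 1.645).
    """
    k = len(z_scores)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return sum(z_scores) ** 2 / z_alpha ** 2 - k


if __name__ == "__main__":
    # Made-up z-scores, for illustration only.
    print(rosenthal_fail_safe_n([2.0, 1.5, 2.5, 1.0]))
```

The result is the number of unpublished null-result studies that would be needed to bring the combined one-tailed test down to the significance threshold.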
This package provides advanced algorithms for analyzing pointcloud data in forestry applications. Key features include fast voxelization of large datasets and segmentation of point clouds into forest floor, understorey, canopy, and wood components. The package enables efficient processing of large-scale forest pointcloud data, offering insights into forest structure, connectivity, and fire risk assessment. Point clouds are read from .xyz input files. For more details, see Ferrara & Arrizza (2025) <https://hdl.handle.net/20.500.14243/533471>. For single tree segmentation details, see Ferrara et al. (2018) <doi:10.1016/j.agrformet.2018.04.008>.
This tool computes the probability of detection (POD) curve and the limit of detection (LOD), i.e. the number of copies of the target DNA sequence required to ensure a 95 % probability of detection (LOD95). Other quantiles of the LOD can be specified. This is a reimplementation of the mathematical-statistical modelling of the validation of qualitative polymerase chain reaction (PCR) methods within a single laboratory as provided by the commercial tool PROLab <http://quodata.de/>. The modelling itself has been described by Uhlig et al. (2015) <doi:10.1007/s00769-015-1112-9>.
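The link between a POD curve and the LOD95 can be illustrated with a deliberately simplified model. The Python sketch below assumes POD(x) = 1 - exp(-lam * x), where x is the nominal number of target copies and `lam` is a detection-efficiency parameter. This functional form is an assumption made for illustration, and may differ from the model in Uhlig et al. (2015) or in the package. The names `pod` and `lod` are hypothetical.

```python
import math


def pod(copies, lam):
    """Probability of detection at a given nominal copy number (assumed model)."""
    return 1.0 - math.exp(-lam * copies)


def lod(lam, prob=0.95):
    """Copy number at which the assumed POD curve reaches `prob`."""
    return -math.log(1.0 - prob) / lam


if __name__ == "__main__":
    lam = 1.0  # hypothetical value: every copy is amplifiable
    print(lod(lam), pod(lod(lam), lam))
    print(lod(lam, prob=0.5))  # another quantile of the LOD
```

Under this assumed model, with `lam = 1` the LOD95 is -ln(0.05), roughly three copies.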
Soil health is a crucial indicator because the overall performance of soil ecosystem services and productivity relies heavily on it. Determining an overall soil quality index requires the evaluation of soil physical, chemical, and biological parameters. The package calculates the soil quality index using three commonly used methods: linear scoring, regression-based, and principal component-based soil quality indexing. The package has been developed using the concepts of Bastida et al. (2008) <doi:10.1016/j.geoderma.2008.08.007> and Doran and Parkin (1994) <doi:10.2136/sssaspecpub35.c1>.
Parametric time warping aligns patterns. It aims to put corresponding features at the same locations. The algorithm searches for an optimal polynomial describing the warping. It is possible to align one sample to a reference, several samples to the same reference, or several samples to several references. One can choose between calculating individual warpings, or one global warping for a set of samples and one reference. Two optimization criteria are implemented: the root-mean-square (RMS) error and the weighted cross-correlation (WCC). Warping of both peak profiles and peak lists is supported.
This package provides an implementation of efficient approximate leave-one-out (LOO) cross-validation for Bayesian models fit using Markov chain Monte Carlo, as described in <doi:10.1007/s11222-016-9696-4>. The approximation uses Pareto smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. As a byproduct of the calculations, we also obtain approximate standard errors for estimated predictive errors and for the comparison of predictive errors between models. The package also provides methods for using stacking and other model weighting techniques to average Bayesian predictive distributions.
Package ACV (short for Affine Cross-Validation) offers an improved time-series cross-validation loss estimator which utilizes both in-sample and out-of-sample forecasting performance via a carefully constructed affine weighting scheme. Under the assumption of stationarity, the estimator is the best linear unbiased estimator of the out-of-sample loss. Besides that, the package also offers improved versions of Diebold-Mariano and Ibragimov-Muller tests of equal predictive ability which deliver more power relative to their conventional counterparts. For more information, see the accompanying article Stanek (2021) <doi:10.2139/ssrn.3996166>.
Sequential Poisson sampling is a variation of Poisson sampling for drawing probability-proportional-to-size samples with a given number of units, and is commonly used for price-index surveys. This package gives functions to draw stratified sequential Poisson samples according to the method by Ohlsson (1998, ISSN:0282-423X), as well as other order sample designs by Rosén (1997, <doi:10.1016/S0378-3758(96)00186-3>), and generate appropriate bootstrap replicate weights according to the generalized bootstrap method by Beaumont and Patak (2012, <doi:10.1111/j.1751-5823.2011.00166.x>).
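The core of sequential Poisson sampling can be sketched in Python as follows. Each unit gets a uniform random number divided by its target inclusion probability p_i = n * x_i / sum(x), and the n units with the smallest ratios are selected. This is an illustration, not the package's interface. The function name `sequential_poisson_sample` is hypothetical, stratification is omitted, and units with p_i >= 1 (take-all units) are rejected instead of being handled separately.

```python
import random


def sequential_poisson_sample(sizes, n, rng):
    """Return the sorted indices of n units drawn proportional to size."""
    if not 0 < n <= len(sizes):
        raise ValueError("n must be between 1 and the number of units")
    if any(x <= 0 for x in sizes):
        raise ValueError("all sizes must be positive")
    total = sum(sizes)
    probs = [n * x / total for x in sizes]
    if any(p >= 1.0 for p in probs):
        # Simplifying assumption: no take-all units in this sketch.
        raise ValueError("a unit has inclusion probability >= 1")
    # Ranking variable: uniform draw divided by the inclusion probability.
    ranks = [rng.random() / p for p in probs]
    order = sorted(range(len(sizes)), key=lambda i: ranks[i])
    return sorted(order[:n])


if __name__ == "__main__":
    sizes = [1, 2, 3, 4, 10, 5, 6]  # made-up size measures
    print(sequential_poisson_sample(sizes, 3, random.Random(42)))
```

Unlike ordinary Poisson sampling, the sample size is always exactly n, because units are ranked and not included independently.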
Facilities for constructing variance dispersion graphs, fraction-of-design-space plots and similar graphics for exploring the properties of experimental designs. The design region is explored via random sampling, which allows for more flexibility than traditional variance dispersion graphs. A formula interface is leveraged to provide access to complex model formulae. Graphics can be constructed simultaneously for multiple experimental designs and/or multiple model formulae. Instead of using pointwise optimization to find the minimum and maximum scaled prediction variance curves, which can be inaccurate and time consuming, this package uses quantile regression as an alternative.
This is a set of minimization tools (maximum likelihood estimation and least squares fitting) to solve examples in Johan Gabrielsson and Dan Weiner's book "Pharmacokinetic and Pharmacodynamic Data Analysis - Concepts and Applications" 5th ed. (ISBN:9198299107). Examples include linear and nonlinear compartmental models, turn-over model, single or multiple dosing bolus/infusion/oral models, allometry, toxicokinetics, reversible metabolism, in-vitro/in-vivo extrapolation, enterohepatic circulation, metabolite modeling, Emax model, inhibitory model, tolerance model, oscillating response model, enantiomer interaction model, effect compartment model, drug-drug interaction model, receptor occupancy model, and rebound phenomena model.
This data management package provides some helper classes for publicly available data sources (HMD, DESTATIS) in Demography. Similar to ideas developed in the Bioconductor project <https://bioconductor.org>, we strive to encapsulate data in easy-to-use S4 objects. If the original data is provided in a text file, the resulting S4 object contains all information from that text file, but in structured form (header, footer, etc.). The classes also provide methods to make a subset for selected calendar years or selected regions. The resulting subset objects still contain the original header and footer information.