This package provides a comprehensive set of tools for working with order statistics, including functions for simulating order statistics, censored samples (Type I and Type II), and record values from various continuous distributions. Additionally, it offers functions to compute moments (mean, variance, skewness, kurtosis) of order statistics for several continuous distributions. These tools assist researchers and statisticians in understanding and analyzing the properties of order statistics and related data. The methods and algorithms implemented in this package are based on several published works, including Ahsanullah et al (2013, ISBN:9789491216831), Arnold and Balakrishnan (2012, ISBN:1461236444), Harter and Balakrishnan (1996, ISBN:9780849394522), Balakrishnan and Sandhu (1995) <doi:10.1080/00031305.1995.10476150>, Genç (2012) <doi:10.1007/s00362-010-0320-y>, Makouei et al (2021) <doi:10.1016/j.cam.2021.113386> and Nagaraja (2013) <doi:10.1016/j.spl.2013.06.028>.
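The sampling schemes named above can be illustrated with a minimal Python sketch. It is not the package's interface: all function names are hypothetical, and the exponential example data are an assumption chosen only for illustration.

```python
import random

def order_statistics(sample):
    """Return the order statistics X_(1) <= ... <= X_(n) of a sample."""
    return sorted(sample)

def type2_censored(sample, r):
    """Type II censoring: only the r smallest observations are recorded."""
    return sorted(sample)[:r]

def type1_censored(sample, t):
    """Type I censoring: only observations not exceeding the fixed time t are recorded."""
    return sorted(x for x in sample if x <= t)

def upper_records(sequence):
    """Upper record values: observations that exceed every earlier observation."""
    records = []
    for x in sequence:
        if not records or x > records[-1]:
            records.append(x)
    return records

# Illustrative data (assumption): ten draws from a unit-rate exponential distribution.
rng = random.Random(1)
sample = [rng.expovariate(1.0) for _ in range(10)]
first_four = type2_censored(sample, 4)
```

Under Type II censoring the number of observed values is fixed in advance, whereas under Type I censoring it depends on how many observations fall below the cut-off time.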
Network meta-analysis tools based on the contrast-based approach using multivariate meta-analysis and meta-regression models (Noma et al. (2025) <doi:10.1101/2025.09.15.25335823>). Comprehensive analysis tools for network meta-analysis and meta-regression (e.g., synthesis analysis, ranking analysis, and creating league tables) are available through simple commands. For inconsistency assessment, the local and global inconsistency tests based on the Higgins design-by-treatment interaction model are available. In addition, the side-splitting methods and Jackson's random inconsistency model can be applied. Standard graphical tools for network meta-analysis, including network plots, ranked forest plots, and transitivity analyses, are also provided. For the synthesis analyses, Noma-Hamura's improved REML (restricted maximum likelihood)-based methods (Noma et al. (2023) <doi:10.1002/jrsm.1652> <doi:10.1002/jrsm.1651>) are adopted as the default methods.
A package for computing the Multidimensional Poverty Index (MPI) using the Alkire-Foster method. Given N individuals, each with D indicators of deprivation, the package computes the MPI value, which represents the degree of poverty in a population. The inputs are 1) an N by D matrix whose element (i,j) indicates whether individual i is deprived in indicator j (1 is deprived and 0 is not deprived), and 2) the deprivation threshold. The main output is the MPI value, which ranges between zero and one. The MPI value approaches one if almost all people are deprived in all indicators, and approaches zero if almost no people are deprived in any indicator. Please see Alkire S., Chatterjee, M., Conconi, A., Seth, S. and Ana Vaz (2014) <doi:10.35648/20.500.12413/11781/ii039> for the Alkire-Foster methodology.
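As a rough illustration of the calculation described above, the following Python sketch computes the headcount ratio H, the average intensity A among the poor, and MPI = H x A. It is a sketch under stated assumptions, not the package's code: the function name `mpi` is hypothetical, indicators are weighted equally unless weights are supplied, and a person is counted as poor when their weighted deprivation score is at least the threshold `k`.

```python
def mpi(deprivation, k, weights=None):
    """Return (H, A, MPI) for an N x D 0/1 deprivation matrix.

    deprivation: list of N rows, each a list of D values (1 = deprived, 0 = not).
    k: deprivation threshold on the weighted score, between 0 and 1.
    weights: optional indicator weights summing to 1 (equal weights assumed if omitted).
    """
    n = len(deprivation)
    d = len(deprivation[0])
    if weights is None:
        weights = [1.0 / d] * d  # assumption: equal indicator weights

    # Weighted deprivation score of each person.
    scores = [sum(w * x for w, x in zip(weights, row)) for row in deprivation]

    # Scores of the people identified as multidimensionally poor.
    poor = [s for s in scores if s >= k]
    if not poor:
        return 0.0, 0.0, 0.0

    headcount = len(poor) / n          # H: share of people who are poor
    intensity = sum(poor) / len(poor)  # A: average score among the poor
    return headcount, intensity, headcount * intensity

example = [
    [1, 1, 1, 1],  # deprived in every indicator
    [1, 1, 0, 0],  # deprived in half of the indicators
    [0, 0, 0, 0],  # not deprived
    [1, 0, 0, 0],  # deprived in one indicator only
]
H, A, M = mpi(example, k=0.5)
```

With this toy matrix, two of the four people reach the threshold, so H = 0.5, their average score is A = 0.75, and the resulting MPI is 0.375.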
This package performs parametric and non-parametric estimation and simulation for multi-state discrete-time semi-Markov processes. For the parametric estimation, several discrete distributions are considered for the sojourn times: Uniform, Geometric, Poisson, Discrete Weibull and Negative Binomial. The non-parametric estimation concerns the sojourn time distributions, where no assumptions are made on the shape of the distributions. Moreover, the estimation can be done on the basis of one or several sample paths, with or without censoring at the beginning and/or at the end of the sample paths. The implemented methods are described in Barbu, V.S., Limnios, N. (2008) <doi:10.1007/978-0-387-73173-5>, Barbu, V.S., Limnios, N. (2008) <doi:10.1080/10485250701261913> and Trevezas, S., Limnios, N. (2011) <doi:10.1080/10485252.2011.555543>. Estimation and simulation of discrete-time k-th order Markov chains are also considered.
Paired mass distance (PMD) analysis proposed in Yu, Olkowicz and Pawliszyn (2018) <doi:10.1016/j.aca.2018.10.062> and PMD-based reactomics analysis proposed in Yu and Petrick (2020) <doi:10.1038/s42004-020-00403-z> for gas/liquid chromatography-mass spectrometry (GC/LC-MS) based non-targeted analysis. PMD analysis includes the GlobalStd algorithm and structure/reaction directed analysis. The GlobalStd algorithm can find independent peaks in m/z-retention time profiles based on retention time hierarchical cluster analysis and frequency analysis of paired mass distances within retention time groups. Structure directed analysis can be used to find potential relationships among those independent peaks in different retention time groups based on the frequency of paired mass distances. Reactomics analysis can also be performed to build PMD networks, assign sources and discover biomarker reactions. GUIs for PMD analysis are also included as shiny applications.
This package performs linear regression with respect to a data-driven convex loss function that is chosen to minimize the asymptotic covariance of the resulting M-estimator. The convex loss function is estimated in 5 steps: (1) form an initial OLS (ordinary least squares) or LAD (least absolute deviation) estimate of the regression coefficients; (2) use the resulting residuals to obtain a kernel estimator of the error density; (3) estimate the score function of the errors by differentiating the logarithm of the kernel density estimate; (4) compute the L2 projection of the estimated score function onto the set of decreasing functions; (5) take a negative antiderivative of the projected score function estimate. Newton's method (with Hessian modification) is then used to minimize the convex empirical risk function. Further details of the method are given in Feng et al. (2024) <doi:10.48550/arXiv.2403.16688>.
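Steps (4) and (5) above can be sketched in a few lines of Python. This is a simplified illustration, not the package's implementation: it assumes the estimated score has already been evaluated on a grid, uses an unweighted least-squares projection (pool-adjacent-violators) in place of whatever weighting the method actually uses, and integrates with the trapezoidal rule. Both function names are hypothetical.

```python
def project_decreasing(values):
    """Unweighted least-squares projection onto non-increasing sequences (PAVA)."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge blocks while an earlier block mean is below a later one,
        # since that violates the non-increasing constraint.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

def negative_antiderivative(grid, score):
    """Trapezoidal negative antiderivative of score on grid, starting at 0."""
    loss = [0.0]
    for i in range(1, len(grid)):
        step = grid[i] - grid[i - 1]
        loss.append(loss[-1] - 0.5 * step * (score[i] + score[i - 1]))
    return loss

grid = [-1.0, 0.0, 1.0, 2.0]
raw_score = [3.0, 1.0, 2.0, 0.0]          # toy values, not a real score estimate
projected = project_decreasing(raw_score)  # pools the violating pair (1.0, 2.0)
loss = negative_antiderivative(grid, projected)
```

Because the projected score is non-increasing, its negative antiderivative has non-decreasing slope, which is why the resulting loss is convex.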
This package implements a likelihood-based hypothesis testing approach for assessing causal mediation. Described in Millstein, Chen, and Breton (2016) <DOI:10.1093/bioinformatics/btw135>, it can be used to test for mediation of a known causal association between a DNA variant, the 'instrumental variable', and a clinical outcome or phenotype by gene expression or DNA methylation, the potential mediator. Another example would be testing mediation of the effect of a drug on a clinical outcome by the molecular target. The hypothesis test generates a p-value or permutation-based FDR value with confidence intervals to quantify uncertainty in the causal inference. The outcome can be represented by either a continuous or binary variable, the potential mediator is continuous, and the instrumental variable can be continuous or binary and is not limited to a single variable but may be a design matrix representing multiple variables.
Collective matrix factorization (CMF) finds joint low-rank representations for a collection of matrices with shared row or column entities. This code learns a variational Bayesian approximation for CMF, supporting multiple likelihood potentials and missing data, while identifying both factors shared by multiple matrices and factors private for each matrix. For further details on the method see Klami et al. (2014) <arXiv:1312.5921>. The package can also be used to learn Bayesian canonical correlation analysis (CCA) and group factor analysis (GFA) models, both of which are special cases of CMF. This is likely to be useful for people looking for CCA and GFA solutions supporting missing data and non-Gaussian likelihoods. See Klami et al. (2013) <https://research.cs.aalto.fi/pml/online-papers/klami13a.pdf> and Virtanen et al. (2012) <http://proceedings.mlr.press/v22/virtanen12.html> for details on Bayesian CCA and GFA, respectively.
General P-splines are non-uniform B-splines penalized by a general difference penalty, proposed by Li and Cao (2022) <arXiv:2201.06808>. Constructible on arbitrary knots, they extend the standard P-splines of Eilers and Marx (1996) <doi:10.1214/ss/1038425655>. They are also related to the O-splines of O'Sullivan (1986) <doi:10.1214/ss/1177013525> via a sandwich formula that links a general difference penalty to a derivative penalty. The package includes routines for setting up and handling difference and derivative penalties. It also fits P-splines and O-splines to (x, y) data (optionally weighted) for a grid of smoothing parameter values in the automatic search intervals of Li and Cao (2023) <doi:10.1007/s11222-022-10178-z>. It aims to facilitate other packages to implement P-splines or O-splines as a smoothing tool in their model estimation framework.
Fetch Australian Clean Energy Regulator data on carbon credits, safeguard mechanism facilities, renewable energy certificates, and greenhouse gas reporting. Provides tidy access to the Australian Carbon Credit Unit ('ACCU') Scheme project register, Safeguard Mechanism baselines and covered emissions, Large-scale Renewable Energy Target ('LRET') power station accreditations, Small-scale Renewable Energy Scheme ('SRES') installation data, the National Greenhouse and Energy Reporting ('NGER') scheme, and Quarterly Carbon Market Reports <https://cer.gov.au/markets/reports-and-data>. Includes a post-Chubb ACCU integrity layer (Chubb 2022 Independent Review), Safeguard reform handling (declining industry baselines from July 2023), National Greenhouse and Energy Reporting scope discipline (Scope 1 / Scope 2 market vs location / Climate Active), reconciliation against the Quarterly Carbon Market Report, and reproducibility helpers (snapshot pinning, SHA-256 cache integrity, session manifest, optional Zenodo deposit). Data is published by the Clean Energy Regulator under a Creative Commons Attribution 4.0 International licence.
This package implements the doubly robust distribution balancing weighting proposed by Katsumata (2024) <doi:10.1017/psrm.2024.23>, which improves the augmented inverse probability weighting (AIPW) by estimating propensity scores with estimating equations suitable for the pre-specified parameter of interest (e.g., the average treatment effects or the average treatment effects on the treated) and estimating outcome models with the estimated inverse probability weights. It also implements the covariate balancing propensity score proposed by Imai and Ratkovic (2014) <doi:10.1111/rssb.12027> and the entropy balancing weighting proposed by Hainmueller (2012) <doi:10.1093/pan/mpr025>, both of which use covariate balancing conditions in propensity score estimation. The point estimate of the parameter of interest and its uncertainty as well as coefficients for propensity score estimation and outcome regression are produced using the M-estimation. The same functions can be used to estimate average outcomes in missing outcome cases.
Index of Multiple Deprivation for UK nations at various geographical levels. In England, deprivation data is for Lower Layer Super Output Areas, Middle Layer Super Output Areas, Wards, and Local Authorities based on data from <https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019>. In Wales, deprivation data is for Lower Layer Super Output Areas, Middle Layer Super Output Areas, Wards, and Local Authorities based on data from <https://gov.wales/welsh-index-multiple-deprivation-full-index-update-ranks-2019>. In Scotland, deprivation data is for Data Zones, Intermediate Zones, and Council Areas based on data from <https://simd.scot>. In Northern Ireland, deprivation data is for Super Output Areas and Local Government Districts based on data from <https://www.nisra.gov.uk/statistics/deprivation/northern-ireland-multiple-deprivation-measure-2017-nimdm2017>. The IMD package also provides the composite UK index developed by <https://github.com/mysociety/composite_uk_imd>.
Accumulated Local Effects (ALE) were initially developed as a model-agnostic approach for global explanations of the results of black-box machine learning algorithms. ALE has a key advantage over other approaches like partial dependence plots (PDP) and SHapley Additive exPlanations (SHAP): its values represent a clean functional decomposition of the model. As such, ALE values are not affected by the presence or absence of interactions among variables in a model. Moreover, its computation is relatively rapid. This package reimplements the algorithms for calculating ALE data and develops highly interpretable visualizations for plotting these ALE values. It also extends the original ALE concept to add bootstrap-based confidence intervals and ALE-based statistics that can be used for statistical inference. For more details, see Okoli, Chitu. 2023. "Statistical Inference Using Machine Learning and Classical Techniques Based on Accumulated Local Effects (ALE)." arXiv. <doi:10.48550/arXiv.2310.09877>.
There is no ophthalmic researcher who has not had headaches from the handling of visual acuity entries. Different notations, untidy entries. This shall now be a matter of the past. Eye makes it as easy as pie to work with VA data: cleaning and conversion between Snellen, logMAR, ETDRS letters, and qualitative visual acuity shall never pester you again. The eye package automates the pesky task of counting the number of patients and eyes, and can help to clean data with easy re-coding for right and left eyes. It also contains functions to help reshape eye-side-specific variables between wide and long format. Visual acuity conversion is based on Schulze-Bonsel et al. (2006) <doi:10.1167/iovs.05-0981>, Gregori et al. (2010) <doi:10.1097/iae.0b013e3181d87e04>, Beck et al. (2003) <doi:10.1016/s0002-9394(02)01825-1> and Bach (2007) <https://michaelbach.de/sci/acuity.html>.
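To give a feel for the conversions mentioned above, here is a minimal Python sketch. It is not the eye package's code: the function names are hypothetical, logMAR is taken as the negative base-10 logarithm of the Snellen fraction, and the ETDRS letter score uses the commonly cited approximation 85 - 50 x logMAR. Rounding rules and the handling of qualitative entries (such as counting fingers) are left out.

```python
import math

def snellen_to_logmar(snellen):
    """Convert a Snellen fraction such as '20/40' or '6/12' to logMAR."""
    numerator, denominator = snellen.split("/")
    return -math.log10(float(numerator) / float(denominator))

def logmar_to_etdrs(logmar):
    """Approximate ETDRS letter score from logMAR (assumed rule: 85 - 50 * logMAR)."""
    return round(85 - 50 * logmar)

def etdrs_to_logmar(letters):
    """Inverse of the assumed approximation above."""
    return (85 - letters) / 50

logmar_20_40 = snellen_to_logmar("20/40")     # about 0.30
letters_20_40 = logmar_to_etdrs(logmar_20_40)  # about 70 letters
```

The metric notation 6/12 gives the same logMAR value as 20/40, since only the ratio matters.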
This package provides statistical methods for auditing as implemented in JASP for Audit (Derks et al., 2021 <doi:10.21105/joss.02733>). First, the package makes it easy for an auditor to plan a statistical sample, select the sample from the population, and evaluate the misstatement in the sample in compliance with international auditing standards. Second, the package provides statistical methods for auditing data, including tests of digit distributions and repeated values. Finally, the package includes methods for auditing algorithms on the aspect of fairness and bias. Next to classical statistical methodology, the package implements Bayesian equivalents of these methods whose statistical underpinnings are described in Derks et al. (2021) <doi:10.1111/ijau.12240>, Derks et al. (2024) <doi:10.2308/AJPT-2021-086>, Derks et al. (2022) <doi:10.31234/osf.io/8nf3e>, Derks et al. (2024) <doi:10.31234/osf.io/tgq5z>, and Derks et al. (2025) <doi:10.31234/osf.io/b8tu2>.
Univariate and multivariate statistical quality control (SQC) tools that complete and extend the SQC techniques available in R. Apart from integrating different R packages devoted to SQC ('qcc','MSQC'), it provides nonparametric tools that are highly useful when the Gaussian assumption is not met. This package computes standard univariate control charts for individual measurements, X-bar, S, R, p, np, c, u, EWMA and CUSUM. In addition, it includes functions to perform multivariate control charts such as Hotelling T2, MEWMA and MCUSUM. As a representative feature, multivariate nonparametric alternatives based on data depth are implemented in this package: r, Q and S control charts. In addition, Phase I and II control charts for functional data are included. This package also allows the estimation of the most complete set of capability indices from first to fourth generation, covering the nonparametric alternatives, and produces the corresponding capability analysis graphical outputs, including the process capability plots. See Flores et al. (2021) <doi:10.32614/RJ-2021-034>.
The standard Difference-in-Differences (DID) setup involves two periods and two groups: a treated group and an untreated group. Many applications of DID methods involve more than two periods and have individuals that are treated at different points in time. This package contains tools for computing average treatment effect parameters in Difference-in-Differences setups with more than two periods and with variation in treatment timing using the methods developed in Callaway and Sant'Anna (2021) <doi:10.1016/j.jeconom.2020.12.001>. The main parameters are group-time average treatment effects, which are the average treatment effect for a particular group at a particular time. These can be aggregated into a smaller number of treatment effect parameters, and the package deals with the cases where there is selective treatment timing, dynamic treatment effects, calendar time effects, or combinations of these. There are also functions for testing the Difference-in-Differences assumption, and plotting group-time average treatment effects.
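For orientation, the standard two-period, two-group case mentioned at the start of this description can be written as a short Python sketch. This is only the textbook 2x2 comparison of mean changes, not the group-time estimator of Callaway and Sant'Anna; the function name `did_2x2` and the toy numbers are hypothetical.

```python
from statistics import mean

def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Canonical 2x2 DID: change in the treated group minus change in the untreated group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Toy outcomes (assumption): the treated mean rises by 3, the untreated mean by 1.
effect = did_2x2(
    treated_pre=[1, 2, 3],
    treated_post=[4, 5, 6],
    control_pre=[1, 1, 1],
    control_post=[2, 2, 2],
)
```

The change in the untreated group stands in for what would have happened to the treated group without treatment, which is the parallel trends idea that the multi-period methods generalise.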
An introduction to some novel, accurate hybrid methods that combine geostatistical and machine learning methods for spatial predictive modelling. It contains two commonly used geostatistical methods, two machine learning methods, four hybrid methods and two averaging methods. For each method, two functions are provided. One function is for assessing the predictive errors and accuracy of the method based on cross-validation. The other one is for generating spatial predictions using the method. For details please see: Li, J., Potter, A., Huang, Z., Daniell, J. J. and Heap, A. (2010) <https://ecat.ga.gov.au/geonetwork/srv/eng/catalog.search#/metadata/71407>; Li, J., Heap, A. D., Potter, A., Huang, Z. and Daniell, J. (2011) <doi:10.1016/j.csr.2011.05.015>; Li, J., Heap, A. D., Potter, A. and Daniell, J. (2011) <doi:10.1016/j.envsoft.2011.07.004>; and Li, J., Potter, A., Huang, Z. and Heap, A. (2012) <https://ecat.ga.gov.au/geonetwork/srv/eng/catalog.search#/metadata/74030>.
This package provides methods and tools for (pre-)processing of metabolomics datasets (i.e. peak matrices), including filtering, normalisation, missing value imputation, scaling, and signal drift and batch effect correction methods. Filtering methods are based on: the fraction of missing values (across samples or features); Relative Standard Deviation (RSD) calculated from the Quality Control (QC) samples; the blank samples. Normalisation methods include Probabilistic Quotient Normalisation (PQN) and normalisation to total signal intensity. A unified user interface for several commonly used missing value imputation algorithms is also provided. Supported methods are: k-nearest neighbours (knn), random forests (rf), Bayesian PCA missing value estimator (bpca), mean or median value of the given feature and a constant small value. The generalised logarithm (glog) transformation algorithm is available to stabilise the variance across low and high intensity mass spectral features. Finally, this package provides an implementation of the Quality Control-Robust Spline Correction (QCRSC) algorithm for signal drift and batch effect correction of mass spectrometry-based datasets.
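Two of the steps listed above, RSD-based filtering on QC samples and Probabilistic Quotient Normalisation, can be sketched in plain Python as follows. This is an illustrative simplification rather than the package's implementation: the function names and the 30% RSD cut-off are assumptions, the PQN reference is taken as the per-feature median over all supplied samples, and missing values and zero intensities are not handled.

```python
from statistics import mean, median, stdev

def rsd_percent(values):
    """Relative standard deviation in percent (assumes a positive mean)."""
    return 100.0 * stdev(values) / mean(values)

def filter_by_qc_rsd(qc_matrix, max_rsd=30.0):
    """Return indices of features whose RSD across QC samples is at most max_rsd.

    qc_matrix: rows are QC samples, columns are features.
    max_rsd: cut-off in percent (30 is an assumed default, not the package's).
    """
    n_features = len(qc_matrix[0])
    keep = []
    for j in range(n_features):
        column = [row[j] for row in qc_matrix]
        if rsd_percent(column) <= max_rsd:
            keep.append(j)
    return keep

def pqn_normalise(matrix):
    """Probabilistic Quotient Normalisation of a samples x features matrix."""
    n_features = len(matrix[0])
    # Reference profile: median intensity of each feature across samples.
    reference = [median(row[j] for row in matrix) for j in range(n_features)]
    normalised = []
    for row in matrix:
        # The median quotient against the reference estimates the dilution factor.
        factor = median(x / r for x, r in zip(row, reference))
        normalised.append([x / factor for x in row])
    return normalised

# Toy peak matrix (assumption): the same profile at three dilutions.
peaks = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [4.0, 8.0, 12.0]]
normalised = pqn_normalise(peaks)
kept = filter_by_qc_rsd([[10.0, 1.0], [10.0, 3.0], [10.0, 5.0]])
```

In the toy matrix every sample is a scaled copy of the same profile, so PQN maps all three rows onto the reference profile.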
Incorporating node-level covariates for community detection has gained increasing attention in recent years. This package provides a function implementing the novel community detection algorithm known as Network-Adjusted Covariates for Community Detection (NAC), which is designed to detect latent community structure in graphs with node-level information, i.e., covariates. This algorithm can handle models such as the degree-corrected stochastic block model (DCSBM) with covariates. NAC specifically addresses the discrepancy between the community structure inferred from the adjacency information and the community structure inferred from the covariate information. For more detailed information, please refer to the reference paper: Yaofang Hu and Wanjie Wang (2023) <arXiv:2306.15616>. In addition to NAC, this package includes several other existing community detection algorithms that are compared to NAC in the reference paper. These algorithms are Spectral Clustering On Ratios-of-Eigenvectors (SCORE), network-based regularized spectral clustering (Net-based), covariate-based spectral clustering (Cov-based), covariate-assisted spectral clustering (CAclustering) and semidefinite programming (SDP).
Mortality rates are typically provided in an abridged format, i.e., by age groups 0, [1, 5], [5, 10], [10, 15], and so on. Some applications necessitate a detailed (single) age description. Despite the large number of approaches proposed in the literature, only a few methods perform well at both younger and older ages. For example, the 6-term Lagrange interpolation function is well suited to mortality interpolation at younger ages (with irregular intervals), but not at older ages. The Karup-King method, on the other hand, performs well at older ages but is not suitable for younger ones. Interested readers can find a full discussion of the two stated methods in the book Shryock, Siegel, and Associates (1993). The Q2q package combines the two methods to allow for the interpolation of mortality rates across all age groups. It begins by implementing each method independently, and then the resulting curves are linked using a 5-age averaged error between the two partial curves.
Implementations of the kernel measure of multi-sample dissimilarity (KMD) between several samples using K-nearest neighbor graphs and minimum spanning trees. The KMD measures the dissimilarity between multiple samples, based on the observations from them. It converges to the population quantity (depending on the kernel) which is between 0 and 1. A small value indicates the multiple samples are from the same distribution, and a large value indicates the corresponding distributions are different. The population quantity is 0 if and only if all distributions are the same, and 1 if and only if all distributions are mutually singular. The package also implements the tests based on KMD for H0: the M distributions are equal against H1: not all the distributions are equal. Both permutation test and asymptotic test are available. These tests are consistent against all alternatives where at least two samples have different distributions. For more details on KMD and the associated tests, see Huang, Z. and B. Sen (2022) <arXiv:2210.00634>.
The kernelSmoothing() function allows you to grid and smooth geolocated data. It calculates a classical kernel smoothing (conservative) or a geographically weighted median. There are four major call modes of the function. The first call mode is kernelSmoothing(obs, epsg, cellsize, bandwidth) for a classical kernel smoothing and automatic grid. The second call mode is kernelSmoothing(obs, epsg, cellsize, bandwidth, quantiles) for a geographically weighted median and automatic grid. The third call mode is kernelSmoothing(obs, epsg, cellsize, bandwidth, centroids) for a classical kernel smoothing and user grid. The fourth call mode is kernelSmoothing(obs, epsg, cellsize, bandwidth, quantiles, centroids) for a geographically weighted median and user grid. References: C. Brunsdon et al. (2002), "Geographically weighted summary statistics: a framework for localised exploratory data analysis", Computers, Environment and Urban Systems <doi:10.1016/S0198-9715(01)00009-6>; Diggle (2003), Statistical Analysis of Spatial and Spatio-Temporal Point Patterns, Third Edition, pp. 83-86 <doi:10.1080/13658816.2014.937718>.
Developed for computing the probability density function, cumulative distribution function, and quantile function, generating random variates, drawing Q-Q plots, and estimating the parameters of 24 G-families of statistical distributions via the maximum product spacing approach introduced in <https://www.jstor.org/stable/2345411>. The set of families contains: beta G distribution, beta exponential G distribution, beta extended G distribution, exponentiated G distribution, exponentiated exponential Poisson G distribution, exponentiated generalized G distribution, exponentiated Kumaraswamy G distribution, gamma type I G distribution, gamma type II G distribution, gamma uniform G distribution, gamma-X generated of log-logistic family of G distribution, gamma-X family of modified beta exponential G distribution, geometric exponential Poisson G distribution, generalized beta G distribution, generalized transmuted G distribution, Kumaraswamy G distribution, log gamma type I G distribution, log gamma type II G distribution, Marshall Olkin G distribution, Marshall Olkin Kumaraswamy G distribution, modified beta G distribution, odd log-logistic G distribution, truncated-exponential skew-symmetric G distribution, and Weibull G distribution.
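The maximum product spacing idea can be illustrated with a small Python sketch. It is deliberately much simpler than the package: instead of a G-family it fits a plain exponential distribution with a single rate parameter, and it maximises the sum of log spacings with a golden-section search on the log-rate. The function names, the search bracket and the tolerance are all assumptions made for this example.

```python
import math

def mps_objective(rate, data):
    """Sum of log spacings of the fitted exponential CDF at the ordered data."""
    xs = sorted(data)
    # Survival values S(x) = exp(-rate * x), padded with S = 1 at 0 and S = 0 at infinity.
    survival = [1.0] + [math.exp(-rate * x) for x in xs] + [0.0]
    total = 0.0
    for upper, lower in zip(survival, survival[1:]):
        # Each spacing F(x_(i)) - F(x_(i-1)) equals S(x_(i-1)) - S(x_(i));
        # a tiny floor guards against log(0) when observations are tied.
        total += math.log(max(upper - lower, 1e-300))
    return total

def mps_fit_exponential(data, lo=1e-3, hi=1e3, tol=1e-10):
    """Maximise the spacing objective over the rate by golden-section search."""
    a, b = math.log(lo), math.log(hi)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc = mps_objective(math.exp(c), data)
    fd = mps_objective(math.exp(d), data)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; reuse c as the new upper interior point.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = mps_objective(math.exp(c), data)
        else:
            # The maximum lies in [c, b]; reuse d as the new lower interior point.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = mps_objective(math.exp(d), data)
    return math.exp((a + b) / 2.0)

# Deterministic toy data (assumption): exact quantiles of an exponential with rate 2.
true_rate = 2.0
n = 50
data = [-math.log(1.0 - i / (n + 1)) / true_rate for i in range(1, n + 1)]
estimate = mps_fit_exponential(data)
```

The toy data are chosen so that all spacings are equal at the true rate, which is where the product of spacings is largest, so the search should recover a rate close to 2.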