The PP package includes estimation of (MLE, WLE, MAP, EAP, ROBUST) person parameters for the 1,2,3,4-PL model and the GPCM (generalized partial credit model). The parameters are estimated under the assumption that the item parameters are known and fixed. The package is useful e.g. in the case that items from an item pool / item bank with known item parameters are administered to a new population of test-takers and an ability estimation for every test-taker is needed.
This package provides functions to select samples using PPS (probability proportional to size) sampling. The package also includes a function for stratified simple random sampling, a function to compute joint inclusion probabilities for Sampford's method of PPS sampling, and a few utility functions. The user's guide pps-ug.pdf is included in the .../pps/doc directory. The methods are described in standard survey sampling theory books such as Cochran's "Sampling Techniques"; see the user's guide for references.
This package provides a toolbox for deterministic, probabilistic and privacy-preserving record linkage techniques. Combines the functionality of the Merge ToolBox
(<https://www.record-linkage.de>) with current privacy-preserving techniques.
This package contains linear and nonlinear regression methods based on partial least squares and penalization techniques. Model parameters are selected via cross-validation, and confidence intervals ans tests for the regression coefficients can be conducted via jackknifing.
This package performs partial principal component analysis of a large sparse matrix. The matrix may be stored as a list of matrices to be concatenated (implicitly) horizontally. Useful application includes cases where the number of total nonzero entries exceed the capacity of 32 bit integers (e.g., with large Single Nucleotide Polymorphism data).
The Predictive Power Score (PPS) is an asymmetric, data-type-agnostic score that can detect linear or non-linear relationships between two variables. The score ranges from 0 (no predictive power) to 1 (perfect predictive power). PPS can be useful for data exploration purposes, in the same way correlation analysis is. For more information on PPS, see <https://github.com/paulvanderlaken/ppsr>.
This package implements recently developed projection pursuit algorithms for finding optimal linear cluster separators. The clustering algorithms use optimal hyperplane separators based on minimum density, Pavlidis et. al (2016) <http://jmlr.org/papers/volume17/15-307/15-307.pdf>; minimum normalised cut, Hofmeyr (2017) <doi:10.1109/TPAMI.2016.2609929>; and maximum variance ratio clusterability, Hofmeyr and Pavlidis (2015) <doi:10.1109/SSCI.2015.116>.
Efficient statistical inference of two-sample MR (Mendelian Randomization) analysis. It can account for the correlated instruments and the horizontal pleiotropy, and can provide the accurate estimates of both causal effect and horizontal pleiotropy effect as well as the two corresponding p-values. There are two main functions in the PPMR package. One is PMR_individual()
for individual level data, the other is PMR_summary()
for summary data.
Calculates the Probability Plot Correlation Coefficient (PPCC) between a continuous variable X and a specified distribution. The corresponding composite hypothesis test that was first introduced by Filliben (1975) <doi: 10.1080/00401706.1975.10489279> can be performed to test whether the sample X is element of either the Normal, log-Normal, Exponential, Uniform, Cauchy, Logistic, Generalized Logistic, Gumbel (GEVI), Weibull, Generalized Extreme Value, Pearson III (Gamma 2), Mielke's Kappa, Rayleigh or Generalized Logistic Distribution. The PPCC test is performed with a fast Monte-Carlo simulation.
This package implements data processing described in <doi:10.1126/sciadv.abk3283> to align modern differentially private data with formatting of older US Census data releases. The primary goal is to read in Census Privacy Protected Microdata Files data in a reproducible way. This includes tools for aggregating to relevant levels of geography by creating geographic identifiers which match the US Census Bureau's numbering. Additionally, there are tools for grouping race numeric identifiers into categories, consistent with OMB (Office of Management and Budget) classifications. Functions exist for downloading and linking to existing sources of privacy protected microdata.
Reconstruction of paleoclimate niches using phylogenetic comparative methods and projection reconstructed niches onto paleoclimate maps. The user can specify various models of trait evolution or estimate the best fit model, include fossils, use one or multiple phylogenies for inference, and make animations of shifting suitable habitat through time. This model was first used in Lawing and Polly (2011), and further implemented in Lawing et al (2016) and Rivera et al (2020). Lawing and Polly (2011) <doi:10.1371/journal.pone.0028554> "Pleistocene climate, phylogeny and climate envelope models: An integrative approach to better understand species response to climate change" Lawing et al (2016) <doi:10.1086/687202> "Including fossils in phylogenetic climate reconstructions: A deep time perspective on the climatic niche evolution and diversification of spiny lizards (Sceloporus)" Rivera et al (2020) <doi:10.1111/jbi.13915> "Reconstructing historical shifts in suitable habitat of Sceloporus lineages using phylogenetic niche modelling.".
This package provides functionality for Bayesian analysis of replication studies using power prior approaches (Pawel et al., 2023) <doi:10.1007/s11749-023-00888-5>.
This package provides users not only with a function to readily calculate the higher-order partial and semi-partial correlations but also with statistics and p-values of the correlation coefficients.
This package provides functions are available to calibrate designs over a range of posterior and predictive thresholds, to plot the various design options, and to obtain the operating characteristics of optimal accuracy and optimal efficiency designs.
This is an implementation of the partial profile score feature selection (PPSFS) approach to generalized linear (interaction) models. The PPSFS is highly scalable even for ultra-high-dimensional feature space. See the paper by Xu, Luo and Chen (2021, <doi:10.4310/21-SII706>).
This package provides methods for fitting point processes with parameters of generalised additive model (GAM) form are provided. For an introduction to point processes see Cox, D.R & Isham, V. (Point Processes, 1980, CRC Press), GAMs see Wood, S.N. (2017) <doi:10.1201/9781315370279>, and the fitting approach see Wood, S.N., Pya, N. & Safken, B. (2016) <doi:10.1080/01621459.2016.1180986>.
An implementation of the one-step privacy-protecting method for estimating the overall and site-specific hazard ratios using inverse probability weighted Cox models in distributed data network studies, as proposed by Shu, Yoshida, Fireman, and Toh (2019) <doi: 10.1177/0962280219869742>. This method only requires sharing of summary-level riskset tables instead of individual-level data. Both the conventional inverse probability weights and the stabilized weights are implemented.
This package implements linear and generalized linear models for provider profiling, incorporating both fixed and random effects. For large-scale providers, the linear profiled-based method and the SerBIN
method for binary data reduce the computational burden. Provides post-modeling features, such as indirect and direct standardization measures, hypothesis testing, confidence intervals, and post-estimation visualization. For more information, see Wu et al. (2022) <doi:10.1002/sim.9387>.
In the era of big data, data redundancy and distributed characteristics present novel challenges to data analysis. This package introduces a method for estimating optimal subsets of redundant distributed data, based on PPCDT (Conjunction of Power and P-value in Distributed Settings). Leveraging PPC technology, this approach can efficiently extract valuable information from redundant distributed data and determine the optimal subset. Experimental results demonstrate that this method not only enhances data quality and utilization efficiency but also assesses its performance effectively. The philosophy of the package is described in Guo G. (2020) <doi:10.1007/s00180-020-00974-4>.
Generates chronological and ordered p-plots for data vectors or vectors of p-values. The p-plot visualizes the evolution of the p-value of a significance test across the sampled data. It allows for assessing the consistency of the observed effects, for detecting the presence of potential moderator variables, and for estimating the influence of outlier values on the observed results. For non-significant findings, it can diagnose patterns indicative of underpowered study designs. The p-plot can thus either back the binary accept-vs-reject decision of common null-hypothesis significance tests, or it can qualify this decision and stimulate additional empirical work to arrive at more robust and replicable statistical inferences.
Stochastic block model used for dynamic graphs represented by Poisson processes. To model recurrent interaction events in continuous time, an extension of the stochastic block model is proposed where every individual belongs to a latent group and interactions between two individuals follow a conditional inhomogeneous Poisson process with intensity driven by the individualsâ latent groups. The model is shown to be identifiable and its estimation is based on a semiparametric variational expectation-maximization algorithm. Two versions of the method are developed, using either a nonparametric histogram approach (with an adaptive choice of the partition size) or kernel intensity estimators. The number of latent groups can be selected by an integrated classification likelihood criterion. Y. Baraud and L. Birgé (2009). <doi:10.1007/s00440-007-0126-6>. C. Biernacki, G. Celeux and G. Govaert (2000). <doi:10.1109/34.865189>. M. Corneli, P. Latouche and F. Rossi (2016). <doi:10.1016/j.neucom.2016.02.031>. J.-J. Daudin, F. Picard and S. Robin (2008). <doi:10.1007/s11222-007-9046-7>. A. P. Dempster, N. M. Laird and D. B. Rubin (1977). <http://www.jstor.org/stable/2984875>. G. Grégoire (1993). <http://www.jstor.org/stable/4616289>. L. Hubert and P. Arabie (1985). <doi:10.1007/BF01908075>. M. Jordan, Z. Ghahramani, T. Jaakkola and L. Saul (1999). <doi:10.1023/A:1007665907178>. C. Matias, T. Rebafka and F. Villers (2018). <doi:10.1093/biomet/asy016>. C. Matias and S. Robin (2014). <doi:10.1051/proc/201447004>. H. Ramlau-Hansen (1983). <doi:10.1214/aos/1176346152>. P. Reynaud-Bouret (2006). <doi:10.3150/bj/1155735930>.
This package implements the copula-based estimator for univariate long-range dependent processes, introduced in Pumi et al. (2023) <doi:10.1007/s00362-023-01418-z>. Notably, this estimator is capable of handling missing data and has been shown to perform exceptionally well, even when up to 70% of data is missing (as reported in <arXiv:2303.04754>
) and has been found to outperform several other commonly applied estimators.
This package provides a suite of diagnostic tools for univariate point processes. This includes tools for simulating and fitting both common and more complex temporal point processes. We also include functions to visualise these point processes and collect existing diagnostic tools of Brown et al. (2002) <doi:10.1162/08997660252741149> and Wu et al. (2021) <doi:10.1002/9781119821588.ch7>, which can be used to assess the fit of a chosen point process model.
This package implements the Bi-objective Lexicographical Classification method and Performance Assessment Ratio at 10% metric for algorithm classification. Constructs matrices representing algorithm performance under multiple criteria, facilitating decision-making in algorithm selection and evaluation. Analyzes and compares algorithm performance based on various metrics to identify the most suitable algorithms for specific tasks. This package includes methods for algorithm classification and evaluation, with examples provided in the documentation. Carvalho (2019) presents a statistical evaluation of algorithmic computational experimentation with infeasible solutions <doi:10.48550/arXiv.1902.00101>
. Moreira and Carvalho (2023) analyze power in preprocessing methodologies for datasets with missing values <doi:10.1080/03610918.2023.2234683>.