Specialized solvers for combinatorial optimization problems in the Subset Sum family. The solvers differ from the mainstream in offering options to (i) restrict subset size, (ii) bound subset elements, (iii) mine real-value multisets with a predefined subset-sum error, and (iv) find one or more subsets in limited time. A novel algorithm for mining the one-dimensional Subset Sum induces algorithms for the multi-Subset Sum and the multidimensional Subset Sum. The multi-threaded framework for the latter offers exact algorithms for the multidimensional Knapsack and the Generalized Assignment problems. Historical updates include (a) a renewed implementation of the multi-Subset Sum, multidimensional Knapsack and Generalized Assignment solvers; (b) the option of bounding the solution space in the multidimensional Subset Sum; (c) fundamental data-structure and architectural changes for enhanced cache locality and a better chance of SIMD vectorization; (d) the option of mapping a floating-point instance to a compressed 64-bit integer instance with user-controlled precision loss, which can yield substantial speedups thanks to the dimension reduction and efficient compressed-integer arithmetic via bit manipulations; (e) a distributed computing infrastructure for the multidimensional Subset Sum; (f) an arbitrary-precision, zero-margin-of-error multidimensional Subset Sum accelerated by a simplified Bloom filter. The package contains a copy of xxHash from <https://github.com/Cyan4973/xxHash>. The package vignette (<doi:10.48550/arXiv.1612.04484>) details a few historical updates. Functions prefixed with aux (auxiliary) are independent implementations of published algorithms for solving optimization problems less relevant to Subset Sum.
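As a minimal illustration of the core task (our own brute-force sketch, not the package's optimized API), the following base R function enumerates the subsets of a real-value multiset whose sums fall within a margin of error of a target, optionally restricted to a fixed subset size:

    # Brute force over all 2^n - 1 non-empty index subsets; the package
    # solves the same problem at scale with specialized algorithms.
    subset_sum_bf <- function(v, target, me, size = NULL) {
      n <- length(v)
      hits <- list()
      for (mask in 1:(2^n - 1)) {
        idx <- which(bitwAnd(mask, 2^(0:(n - 1))) != 0)
        if (!is.null(size) && length(idx) != size) next
        if (abs(sum(v[idx]) - target) <= me) hits[[length(hits) + 1]] <- idx
      }
      hits
    }
    set.seed(42)
    v <- round(runif(12), 3)
    subset_sum_bf(v, target = 2, me = 0.01, size = 4)  # may be empty for some targets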
In various domains, many datasets exhibit both high variable dependency and group structures, which necessitates their simultaneous estimation. This package provides functions for two subgroup identification methods based on penalized functions, both of which utilize factor model structures to adapt to data with cross-sectional dependency. The first method is the Subgroup Identification with Latent Factor Structure Method (SILFSM) that we proposed. By employing Center-Augmented Regularization and factor structures, the SILFSM effectively eliminates data dependencies while identifying subgroups within datasets. For this model, we offer optimization functions based on two different methods: Coordinate Descent and our newly developed Difference of Convex-Alternating Direction Method of Multipliers (DC-ADMM) algorithms; the latter can be applied to cases where the distance function in Center-Augmented Regularization takes L1 and L2 forms. The other method is the Factor-Adjusted Pairwise Fusion Penalty (FA-PFP) model, which incorporates factor augmentation into the Pairwise Fusion Penalty (PFP) developed by Ma, S. and Huang, J. (2017) <doi:10.1080/01621459.2016.1148039>. Additionally, we provide a function for the Standard CAR (S-CAR) method, which does not consider the dependency and serves for comparative analysis with the other approaches. Furthermore, functions based on the Bayesian Information Criterion (BIC) of the SILFSM and the FA-PFP method are also included in SILFS for selecting tuning parameters. For more details on the Subgroup Identification with Latent Factor Structure Method, please refer to He et al. (2024) <doi:10.48550/arXiv.2407.00882>.
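Schematically (see He et al. 2024 for the exact objective), a Center-Augmented Regularization penalty has the form

    \lambda \sum_{i=1}^{n} \min_{1 \le k \le K} d(\alpha_i, \gamma_k),

which pulls each individual effect \alpha_i toward the nearest of K candidate subgroup centers \gamma_k; the DC-ADMM algorithm handles the cases where d is the L1 or L2 distance.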
In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (i.e., the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (i.e., the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, alpha, on the type I error. Although the NP paradigm has a century-long history in hypothesis testing, it has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than alpha do not satisfy the type I error control objective, because the resulting classifiers are still likely to have type I errors much larger than alpha. As a result, the NP paradigm has not been properly implemented for many classification scenarios in practice. In this work, we develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, including popular methods such as logistic regression, support vector machines, and random forests. Powered by this umbrella algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands, motivated by the popular receiver operating characteristic (ROC) curves. NP-ROC bands help choose, in a data-adaptive way, among different NP classifiers and compare them.
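The key step of the umbrella algorithm can be sketched in a few lines of base R (our illustration, not the package's interface): train any scoring classifier, then pick the classification threshold from held-out class-0 scores as an order statistic whose rank is calibrated by a binomial tail bound.

    np_threshold <- function(scores0, alpha = 0.05, delta = 0.05) {
      # scores0: scores of held-out class-0 observations.
      # The k-th smallest score violates the alpha bound with probability
      # P(Binom(n, 1 - alpha) >= k); take the smallest k keeping this <= delta.
      n <- length(scores0)
      viol <- pbinom(0:(n - 1), n, 1 - alpha, lower.tail = FALSE)
      k <- which(viol <= delta)[1]
      if (is.na(k)) stop("too few class-0 observations for this alpha/delta")
      sort(scores0)[k]
    }

Observations scoring above the returned threshold are classified as class 1, which keeps the type I error below alpha with probability at least 1 - delta.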
Background - Traditional gene set enrichment analyses are typically limited to a few ontologies and do not account for the interdependence of gene sets or terms, resulting in overcorrected p-values. To address these challenges, we introduce mulea, an R package offering comprehensive overrepresentation and functional enrichment analysis. Results - mulea employs a progressive empirical false discovery rate (eFDR) method, specifically designed for interconnected biological data, to accurately identify significant terms within diverse ontologies. mulea expands beyond traditional tools by incorporating a wide range of ontologies, encompassing Gene Ontology, pathways, regulatory elements, genomic locations, and protein domains. This flexibility enables researchers to tailor enrichment analysis to their specific questions, such as identifying enriched transcriptional regulators in gene expression data or overrepresented protein domains in protein sets. To facilitate seamless analysis, mulea provides gene sets (in standardised GMT format) for 27 model organisms, covering 22 ontology types from 16 databases and various identifiers, resulting in almost 900 files. Additionally, the muleaData ExperimentData Bioconductor package simplifies access to these pre-defined ontologies. Finally, mulea's architecture allows for easy integration of user-defined ontologies, or GMT files from external sources (e.g., MSigDB or Enrichr), expanding its applicability across diverse research areas. Conclusions - mulea is distributed as a CRAN R package. It offers researchers a powerful and flexible toolkit for functional enrichment analysis, addressing limitations of traditional tools with its progressive eFDR and by supporting a variety of ontologies. Overall, mulea fosters the exploration of diverse biological questions across various model organisms.
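A typical overrepresentation run then looks roughly as follows; this is a sketch assuming the ora()/run_test() interface and argument names shown in the package vignette (check the documentation for exact signatures), and the GMT file name and gene vectors are placeholders:

    library(mulea)
    tf <- read_gmt("Transcription_factor_TFLink_GeneSymbol.gmt")  # placeholder file
    significant_genes <- c("TP53", "BRCA1", "EGFR")               # your hits
    all_genes <- c(significant_genes, "GAPDH", "ACTB", "MYC")     # your universe
    ora_model <- ora(gmt = tf,
                     element_names = significant_genes,
                     background_element_names = all_genes,
                     p_value_adjustment_method = "eFDR",
                     number_of_permutations = 10000)
    ora_results <- run_test(ora_model)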
Collection of R functions to do purely presence-only species distribution modeling with isolation forest (iForest) and its variations, such as Extended Isolation Forest and SCiForest. See the details of these methods in the references: Liu, F.T., Ting, K.M. and Zhou, Z.H. (2008) <doi:10.1109/ICDM.2008.17>, Hariri, S., Kind, M.C. and Brunner, R.J. (2019) <doi:10.1109/TKDE.2019.2947676>, Liu, F.T., Ting, K.M. and Zhou, Z.H. (2010) <doi:10.1007/978-3-642-15883-4_18>, Guha, S., Mishra, N., Roy, G. and Schrijvers, O. (2016) <https://proceedings.mlr.press/v48/guha16.html>, Cortes, D. (2021) <doi:10.48550/arXiv.2110.13402>. Additionally, Shapley values are used to explain model inputs and outputs. See details in the references: Shapley, L.S. (1953) <doi:10.1515/9781400881970-018>, Lundberg, S.M. and Lee, S.I. (2017) <https://dm-gatech.github.io/CS8803-Fall2018-DML-Papers/shapley.pdf>, Molnar, C. (2020) <ISBN:978-0-244-76852-2>, Štrumbelj, E. and Kononenko, I. (2014) <doi:10.1007/s10115-013-0679-x>. itsdm also provides functions to diagnose variable response, analyze variable importance, draw spatial dependence of variables, and examine variable contribution. As utilities, the package includes a few functions to download bioclimatic variables, including WorldClim version 2.0 (see Fick, S.E. and Hijmans, R.J. (2017) <doi:10.1002/joc.5086>) and CMCC-BioClimInd (see Noce, S., Caporaso, L. and Santini, M. (2020) <doi:10.1038/s41597-020-00726-5>).
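The anomaly score at the core of these variants is the one from Liu et al. (2008): for a point x and subsample size n,

    s(x, n) = 2^{-E[h(x)] / c(n)},  with  c(n) = 2 H_{n-1} - \frac{2(n-1)}{n},

where h(x) is the path length of x in an isolation tree, E[h(x)] is its average over trees, and H_m denotes the m-th harmonic number. Scores near 1 flag atypical points, so in presence-only modeling low anomaly scores indicate conditions similar to the presence records.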
Stochastic block model used for dynamic graphs represented by Poisson processes. To model recurrent interaction events in continuous time, an extension of the stochastic block model is proposed where every individual belongs to a latent group and interactions between two individuals follow a conditional inhomogeneous Poisson process with intensity driven by the individuals' latent groups. The model is shown to be identifiable and its estimation is based on a semiparametric variational expectation-maximization algorithm. Two versions of the method are developed, using either a nonparametric histogram approach (with an adaptive choice of the partition size) or kernel intensity estimators. The number of latent groups can be selected by an integrated classification likelihood criterion. Y. Baraud and L. Birgé (2009). <doi:10.1007/s00440-007-0126-6>. C. Biernacki, G. Celeux and G. Govaert (2000). <doi:10.1109/34.865189>. M. Corneli, P. Latouche and F. Rossi (2016). <doi:10.1016/j.neucom.2016.02.031>. J.-J. Daudin, F. Picard and S. Robin (2008). <doi:10.1007/s11222-007-9046-7>. A. P. Dempster, N. M. Laird and D. B. Rubin (1977). <http://www.jstor.org/stable/2984875>. G. Grégoire (1993). <http://www.jstor.org/stable/4616289>. L. Hubert and P. Arabie (1985). <doi:10.1007/BF01908075>. M. Jordan, Z. Ghahramani, T. Jaakkola and L. Saul (1999). <doi:10.1023/A:1007665907178>. C. Matias, T. Rebafka and F. Villers (2018). <doi:10.1093/biomet/asy016>. C. Matias and S. Robin (2014). <doi:10.1051/proc/201447004>. H. Ramlau-Hansen (1983). <doi:10.1214/aos/1176346152>. P. Reynaud-Bouret (2006). <doi:10.3150/bj/1155735930>.
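In symbols, writing z_i for the latent group of individual i, the events of each pair (i, j) follow an inhomogeneous Poisson process with intensity

    \lambda_{ij}(t) = \alpha_{z_i z_j}(t),

so all pairs with the same pair of group memberships (q, l) share one intensity function \alpha_{ql}(t), which is estimated nonparametrically by adaptive histograms or kernel smoothers.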
Estimation, based on conditional maximum likelihood, of the quadratic exponential model proposed by Bartolucci, F. & Nigro, V. (2010, Econometrica) <DOI:10.3982/ECTA7531> and of a simplified and a modified version of this model. The quadratic exponential model is suitable for the analysis of binary longitudinal data when state dependence (beyond the effect of the covariates and a time-fixed individual intercept) has to be taken into account. Therefore, this is an alternative to the dynamic logit model, having the advantage of easily allowing conditional inference in order to eliminate the individual intercepts and thereby obtain consistent estimates of the parameters of main interest (for the covariates and the lagged response). The simplified version of this model does not distinguish, as the original model does, between the last time occasion and the previous occasions. The modified version formulates the interaction terms differently and may be used to test for state dependence in an easy way, as shown in Bartolucci, F., Nigro, V. & Pigini, C. (2018, Econometric Reviews) <DOI:10.1080/07474938.2015.1060039>. The package also includes estimation of the dynamic logit model by a pseudo conditional estimator based on the quadratic exponential model, as proposed by Bartolucci, F. & Nigro, V. (2012, Journal of Econometrics) <DOI:10.1016/j.jeconom.2012.03.004>. For large time dimensions of the panel, the computation of the proposed models involves a recursive function from Krailo, M. D. & Pike, M. C. (1984, Journal of the Royal Statistical Society, Series C (Applied Statistics)) and Bartolucci, F., Valentini, F. & Pigini, C. (2021, Computational Economics) <DOI:10.1007/s10614-021-10218-2>.
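For orientation, the dynamic logit model that the quadratic exponential model serves as an alternative to is

    p(y_{it} = 1 \mid \alpha_i, x_{it}, y_{i,t-1}) = \frac{\exp(\alpha_i + x_{it}'\beta + y_{i,t-1}\gamma)}{1 + \exp(\alpha_i + x_{it}'\beta + y_{i,t-1}\gamma)},

where \alpha_i is the time-fixed individual intercept that conditional inference eliminates and \gamma measures state dependence.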
This package provides methods that use flexible variants of multidimensional scaling (MDS), which incorporate parametric nonlinear distance transformations and trade off goodness-of-fit against structure considerations to find optimal hyperparameters, also known as structure optimized proximity scaling (STOPS) (Rusch, Mair & Hornik, 2023, <doi:10.1007/s11222-022-10197-w>). The package contains various functions, wrappers, methods and classes for fitting, plotting and displaying different 1-way MDS models with ratio, interval, or ordinal optimal scaling in a STOPS framework. These cover essentially the functionality of the package smacofx, including Torgerson (classical) scaling with power transformations of dissimilarities, SMACOF MDS with powers of dissimilarities, Sammon mapping with powers of dissimilarities, elastic scaling with powers of dissimilarities, spherical SMACOF with powers of dissimilarities, (ALSCAL) s-stress MDS with powers of dissimilarities, r-stress MDS, MDS with powers of dissimilarities and configuration distances, elastic scaling with powers of dissimilarities and configuration distances, Sammon mapping with powers of dissimilarities and configuration distances, power-stress MDS (POST-MDS), approximate power stress, Box-Cox MDS, local MDS, Isomap, curvilinear component analysis (CLCA), curvilinear distance analysis (CLDA), and sparsified (power) multidimensional scaling and (power) multidimensional distance analysis (experimental models from smacofx influenced by CLCA). All of these models can also be fit by optimizing over hyperparameters based on goodness-of-fit only (i.e., no structure considerations). The package further contains functions for optimization, specifically the adaptive Luus-Jaakola algorithm and a wrapper for Bayesian optimization with a treed Gaussian process with jumps to linear models, as well as functions for various c-structuredness indices. Hyperparameter optimization can be done with a number of techniques, but we recommend either Bayesian optimization or particle swarm. For using "Kriging", users need to install a version of the archived DiceOptim R package.
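Schematically (Rusch, Mair & Hornik, 2023 give the exact definition), a STOPS objective aggregates badness-of-fit with c-structuredness indices I_k of the configuration X(\theta),

    \text{stoploss}(\theta) = v_0 \cdot \text{fit}(X(\theta)) + \sum_k v_k \cdot I_k(X(\theta)),

and the transformation hyperparameters \theta are chosen to optimize this aggregate rather than fit alone.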
Statistical power and minimum required sample size calculations for (1) testing a proportion (one-sample) against a constant, (2) testing a mean (one-sample) against a constant, (3) testing the difference between two proportions (independent samples), (4) testing the difference between two means or groups (parametric and non-parametric tests for independent and paired samples), (5) testing a correlation (one-sample) against a constant, (6) testing the difference between two correlations (independent samples), (7) testing a single coefficient in multiple linear regression, logistic regression, and Poisson regression (with standardized or unstandardized coefficients, with no covariates or covariate adjusted), (8) testing an indirect effect (with standardized or unstandardized coefficients, with no covariates or covariate adjusted) in mediation analysis (Sobel, Joint, and Monte Carlo tests), (9) testing an R-squared against zero in linear regression, (10) testing an R-squared difference against zero in hierarchical regression, (11) testing an eta-squared or f-squared (for main and interaction effects) against zero in analysis of variance (could be one-way, two-way, and three-way), (12) testing an eta-squared or f-squared (for main and interaction effects) against zero in analysis of covariance (could be one-way, two-way, and three-way), (13) testing an eta-squared or f-squared (for between, within, and interaction effects) against zero in one-way repeated measures analysis of variance (with non-sphericity correction and repeated measures correlation), and (14) testing goodness-of-fit or independence for contingency tables. The alternative hypothesis can be formulated as "not equal", "less", "greater", "non-inferior", "superior", or "equivalent" in (1), (2), (3), and (4); as "not equal", "less", or "greater" in (5), (6), (7) and (8); but always as "greater" in (9), (10), (11), (12), (13), and (14). Reference: Bulus and Polat (2023) <https://osf.io/ua5fc>.
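For instance, case (4) with two independent means reduces, in its simplest parametric form, to the classical calculation that base R's power.t.test() also performs; the package generalizes this pattern across the designs listed above:

    # Minimum n per group to detect a standardized mean difference of 0.5
    # with 80% power at a two-sided 5% significance level.
    power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)
    # n is about 63.8, i.e., 64 participants per group.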
This package provides tools for semantic segmentation of geospatial data using convolutional neural network-based deep learning. Utility functions allow for creating masks, image chips, data frames listing image chips in a directory, and DataSets for use within DataLoaders. Additional functions are provided to serve as checks during the data preparation and training process. A UNet architecture can be defined with 4 blocks in the encoder, a bottleneck block, and 4 blocks in the decoder. The UNet can accept a variable number of input channels, and the user can define the number of feature maps produced in each encoder and decoder block and the bottleneck. Users can also choose to (1) replace all rectified linear unit (ReLU) activation functions with leaky ReLU or swish, (2) implement attention gates along the skip connections, (3) implement squeeze and excitation modules within the encoder blocks, (4) add residual connections within all blocks, (5) replace the bottleneck with a modified atrous spatial pyramid pooling (ASPP) module, and/or (6) implement deep supervision using predictions generated at each stage in the decoder. A unified focal loss framework is implemented after Yeung et al. (2022) <doi:10.1016/j.compmedimag.2021.102026>. We have also implemented assessment metrics using the luz package, including F1-score, recall, and precision. Trained models can be used to predict to spatial data without the need to generate chips from larger spatial extents. Functions are available for performing accuracy assessment. The package relies on 'torch' for implementing deep learning, which does not require the installation of a Python environment. Raster geospatial data are handled with 'terra'. Models can be trained using a Compute Unified Device Architecture (CUDA)-enabled graphics processing unit (GPU); however, multi-GPU training is not supported by 'torch' in R.
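As a small illustration of the assessment side (our own base R sketch, not the package's luz-based implementation), the reported metrics reduce in the binary case to confusion-matrix ratios:

    seg_metrics <- function(pred, ref) {
      # pred, ref: 0/1 vectors (e.g., flattened rasters) of predicted and
      # reference class labels.
      tp <- sum(pred == 1 & ref == 1)
      fp <- sum(pred == 1 & ref == 0)
      fn <- sum(pred == 0 & ref == 1)
      precision <- tp / (tp + fp)
      recall <- tp / (tp + fn)
      f1 <- 2 * precision * recall / (precision + recall)
      c(precision = precision, recall = recall, f1 = f1)
    }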
This package provides a variety of methods to estimate and visualize distributional differences in terms of effect sizes. Particular emphasis is placed on evaluating differences between two or more distributions across the entire scale, rather than at a single point (e.g., differences in means). For example, Probability-Probability (PP) plots display the difference between two or more distributions, matched by their empirical CDFs (see Ho and Reardon, 2012; <doi:10.3102/1076998611411918>), allowing for examinations of where on the scale distributional differences are largest or smallest. The area under the PP curve (AUC) is an effect-size metric, corresponding to the probability that a randomly selected observation from the x-axis distribution will have a higher value than a randomly selected observation from the y-axis distribution. Binned effect size plots are also available, in which the distributions are split into bins (set by the user) and separate effect sizes (Cohen's d) are produced for each bin - again providing a means to evaluate the consistency (or lack thereof) of the difference between two or more distributions at different points on the scale. Evaluation of empirical CDFs is also provided, with built-in arguments for providing annotations to help evaluate distributional differences at specific points (e.g., semi-transparent shading). All functions take a consistent argument structure. Calculation of specific effect sizes is also possible. The following effect sizes are estimable: (a) Cohen's d, (b) Hedges' g, (c) percentage above a cut, (d) transformed (normalized) percentage above a cut, (e) area under the PP curve, and (f) the V statistic (see Ho, 2009; <doi:10.3102/1076998609332755>), which essentially transforms the area under the curve to standard deviation units. By default, effect sizes are calculated for all possible pairwise comparisons, but a reference group (distribution) can be specified.
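Two of these effect sizes are compact enough to state directly; the sketch below (ours, not the package's implementation) estimates the PP-curve AUC as the probability that a random draw from x exceeds a random draw from y, counting ties as one half, and then, assuming Ho's (2009) form, the V statistic as sqrt(2) times the standard normal quantile of that AUC:

    auc_es <- function(x, y) {
      # P(X > Y) + 0.5 * P(X == Y), averaged over all pairs.
      mean(outer(x, y, ">")) + 0.5 * mean(outer(x, y, "=="))
    }
    v_stat <- function(x, y) sqrt(2) * qnorm(auc_es(x, y))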
This package serves diverse purposes, such as biomarker confirmation, novel biomarker discovery, constructing predictive models, model-based prediction, and validation. It handles binary, continuous, and time-to-event outcomes at the sample or patient level. - Biomarker confirmation utilizes established functions like glm() from 'stats', coxph() from 'survival', and surv_fit() and ggsurvplot() from 'survminer'. - Biomarker discovery and variable selection are facilitated by three LASSO-related functions, LASSO2(), LASSO_plus(), and LASSO2plus(), leveraging the glmnet R package with additional steps. - Eight versatile modeling functions are offered, each designed for predictive models across various outcomes and data types. 1) LASSO2(), LASSO_plus(), LASSO2plus(), and LASSO2_reg() perform variable selection using LASSO methods and construct predictive models based on the selected variables. 2) XGBtraining() employs XGBoost for model building and is the only function not involving variable selection. 3) Functions like LASSO2_XGBtraining(), LASSOplus_XGBtraining(), and LASSO2plus_XGBtraining() combine LASSO-related variable selection with XGBoost for model construction. - All models support prediction and validation, requiring a testing dataset comparable to the training dataset. Additionally, the package introduces XGpred() for risk prediction based on survival data, with the XGpred_predict() function available for predicting risk groups in new datasets. The methodology is based on our new algorithms and various references: - Hastie et al. (1992, ISBN 0-534-16765-9), - Therneau et al. (2000, ISBN 0-387-98784-3), - Kassambara et al. (2021) <https://CRAN.R-project.org/package=survminer>, - Friedman et al. (2010) <doi:10.18637/jss.v033.i01>, - Simon et al. (2011) <doi:10.18637/jss.v039.i05>, - Harrell (2023) <https://CRAN.R-project.org/package=rms>, - Harrell (2023) <https://CRAN.R-project.org/package=Hmisc>, - Chen and Guestrin (2016) <arXiv:1603.02754>, - Aoki et al. (2023) <doi:10.1200/JCO.23.01115>.
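The LASSO-based selectors build on 'glmnet'; the standalone sketch below shows the underlying mechanism (a cross-validated lasso followed by keeping the predictors with non-zero coefficients), not the exact signatures of LASSO2() and its relatives:

    library(glmnet)
    set.seed(1)
    x <- matrix(rnorm(100 * 20), 100, 20)
    colnames(x) <- paste0("v", 1:20)
    y <- rbinom(100, 1, plogis(x[, 1] - x[, 2]))
    cv <- cv.glmnet(x, y, family = "binomial")  # cross-validated lasso path
    b <- coef(cv, s = "lambda.1se")
    selected <- rownames(b)[as.vector(b) != 0]
    setdiff(selected, "(Intercept)")            # variables carried forward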
First, we provide functions to calculate the partial derivatives of the first-passage time diffusion probability density function (PDF) and cumulative distribution function (CDF) with respect to the first-passage time t (only for the PDF), the upper barrier a, the drift rate v, the relative starting point w, the non-decision time t0, the inter-trial variability of the drift rate sv, the inter-trial variability of the relative starting point sw, and the inter-trial variability of the non-decision time st0. In addition, the PDF and CDF themselves are provided. Most calculations are done on the logarithmic scale for numerical stability. Since the PDF, CDF, and their derivatives are represented as infinite series, we give the user the option to control the approximation errors with the argument 'precision'. For the numerical integration we use the C library cubature by Johnson, S. G. (2005-2013) <https://github.com/stevengj/cubature>. Numerical integration is required whenever sv, sw, and/or st0 is not zero. Note that numerical integration reduces the speed of the computation, and the precision can no longer be guaranteed; therefore, whenever numerical integration is used, an estimate of the approximation error is provided in the output list. Note: the large number of contributors (ctb) is due to copying many C/C++ code chunks from the GNU Scientific Library (GSL). Second, we provide methods to sample from the first-passage time distribution, with or without user-defined truncation from above. The first method is a new adaptive rejection sampler building on the works of Gilks and Wild (1992; <doi:10.2307/2347565>) and Hartmann and Klauer (in press). The second method is a rejection sampler provided by Drugowitsch (2016; <doi:10.1038/srep20490>). The third method is an inverse transformation sampler. The fourth method is a "pseudo" adaptive rejection sampler that builds on the first method. For more details see the corresponding help files.
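For reference, in one common parameterization with drift rate v, upper barrier a, and relative starting point w, the first-passage density at the lower boundary admits the classical large-time series representation (Navarro & Fuss, 2009):

    f(t \mid v, a, w) = \frac{\pi}{a^2} \exp\!\left(-v a w - \frac{v^2 t}{2}\right) \sum_{k=1}^{\infty} k \sin(k \pi w) \exp\!\left(-\frac{k^2 \pi^2 t}{2 a^2}\right),

and it is the truncation of such series that the 'precision' argument controls.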
Self-reported health, happiness, attitudes, and other statuses or perceptions are often the subject of biases that may come from different sources. For example, the evaluation of an individual's own health may depend on previous medical diagnoses, functional status, and symptoms and signs of illness, as well as on life-style behaviors, including contextual social, gender-specific, age-specific, linguistic and other cultural factors (Jylha 2009 <doi:10.1016/j.socscimed.2009.05.013>; Oksuzyan et al. 2019 <doi:10.1016/j.socscimed.2019.03.002>). The hopit package offers versatile functions for analyzing different self-reported ordinal variables, and for helping to estimate their biases. Specifically, the package provides a function to fit a generalized ordered probit model that regresses original self-reported status measures on two sets of independent variables (King et al. 2004 <doi:10.1017/S0003055403000881>; Jurges 2007 <doi:10.1002/hec.1134>; Oksuzyan et al. 2019 <doi:10.1016/j.socscimed.2019.03.002>). The first set of variables (e.g., health variables) included in the regression are individual statuses and characteristics that are directly related to the self-reported variable. In the case of self-reported health, these could be chronic conditions, mobility level, difficulties with daily activities, performance on grip strength tests, anthropometric measures, and lifestyle behaviors. The second set of independent variables (threshold variables) is used to model cut-points between adjacent self-reported response categories as functions of individual characteristics, such as gender, age group, education, and country (Oksuzyan et al. 2019 <doi:10.1016/j.socscimed.2019.03.002>). The model helps to adjust for specific socio-demographic and cultural differences in how the continuous latent health is projected onto the ordinal self-rated measure. The fitted model can be used to calculate an individual predicted latent status variable, a latent index, and standardized latent coefficients; and makes it possible to reclassify a categorical status measure that has been adjusted for inter-individual differences in reporting behavior.
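In practice a fit looks roughly like this; the argument names latent.formula and thresh.formula and the bundled healthsurvey example data follow our reading of the package documentation, and the specific variable names are assumptions to be checked against the docs:

    library(hopit)
    # Sketch: health variables drive the latent scale; socio-demographic
    # variables drive the reporting thresholds (variable names assumed).
    fit <- hopit(latent.formula = health ~ hypertension + high_cholesterol +
                   heart_attack_or_stroke + poor_mobility + poor_memory,
                 thresh.formula = ~ sex + ageclass + country,
                 data = healthsurvey)
    # From the fitted object one can derive the latent index, standardized
    # latent coefficients, and reporting-adjusted health categories.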
Nuclear magnetic resonance (NMR) is a highly versatile analytical technique for studying molecular configuration, conformation, and dynamics, especially of biomacromolecules such as proteins. The Biological Magnetic Resonance Data Bank ('BMRB') is a repository for data from NMR spectroscopy on proteins, peptides, nucleic acids, and other biomolecules. BMRB currently offers an R package, RBMRB, to fetch data; however, it does not easily support downloading individual data files and storing them in a local directory. When using RBMRB, the data are stored as an R object, which fundamentally hinders NMR researchers from accessing the rich information in the raw data, for example, the metadata. Here, the BMRBr File Downloader ('BMRBr') offers a more fundamental, low-level downloader, which downloads the original deposited .str format file. This type of file contains information such as the entry title, authors, citation, protein sequences, and so on. Many factors affect NMR experiment outputs, such as temperature and resonance sensitivity; approximately 40% of the entries in the BMRB have chemical shift accuracy problems [1,2]. Unfortunately, current reference correction methods depend heavily on the availability of assigned protein chemical shifts or protein structure. This is what my current research project aims to solve, and it will be included in a future release of the package. The current version of the package is sufficient and robust enough for downloading individual BMRB data files from the BMRB database <http://www.bmrb.wisc.edu>. The functionalities of this package include, but are not limited to: * simplifying NMR research by combining data downloading and result analysis; * allowing NMR data to reach a broader audience that can utilize more than just chemical shifts, but also metadata; * offering reference-corrected data for entries without assignment or structure information (future release). Reference: [1] E.L. Ulrich, H. Akutsu, J.F. Doreleijers, Y. Harano, Y.E. Ioannidis, J. Lin, et al., BioMagResBank, Nucl. Acids Res. 36 (2008) D402-8. <doi:10.1093/nar/gkm957>. [2] L. Wang, H.R. Eghbalnia, A. Bahrami, J.L. Markley, Linear analysis of carbon-13 chemical shift differences and its application to the detection and correction of errors in referencing and spin system identifications, J. Biomol. NMR. 32 (2005) 13-22. <doi:10.1007/s10858-005-1717-0>.
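Usage is then a single call per entry, roughly as below; the function name bmrb_download() matches our reading of the package, but the argument names are assumptions, so consult the help files:

    library(BMRBr)
    # Download the deposited .str file for one BMRB entry into the working
    # directory; the file keeps the raw metadata (title, authors, sequences, ...).
    bmrb_download(4020, output_dir = "./")  # entry ID and argument name assumed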
This package implements a new class of model selection strategies for mixed model selection, covering linear and generalized linear mixed models. The idea involves a procedure to isolate a subgroup of what are known as correct models (of which the optimal model is a member). This is accomplished by constructing a statistical fence, or barrier, to carefully eliminate incorrect models. Once the fence is constructed, the optimal model is selected from among those within the fence according to a criterion which can be made flexible. References: 1. Jiang J., Rao J.S., Gu Z., Nguyen T. (2008), Fence Methods for Mixed Model Selection. The Annals of Statistics, 36(4): 1669-1692. <DOI:10.1214/07-AOS517> <https://projecteuclid.org/euclid.aos/1216237296>. 2. Jiang J., Nguyen T., Rao J.S. (2009), A Simplified Adaptive Fence Procedure. Statistics and Probability Letters, 79, 625-629. <DOI:10.1016/j.spl.2008.10.014> <https://www.researchgate.net/publication/23991417_A_simplified_adaptive_fence_procedure> 3. Jiang J., Nguyen T., Rao J.S. (2010), Fence Method for Nonparametric Small Area Estimation. Survey Methodology, 36(1), 3-11. <http://publications.gc.ca/collections/collection_2010/statcan/12-001-X/12-001-x2010001-eng.pdf>. 4. Jiming Jiang, Thuan Nguyen and J. Sunil Rao (2011), Invisible fence methods and the identification of differentially expressed gene sets. Statistics and Its Interface, Volume 4, 403-415. <http://www.intlpress.com/site/pub/files/_fulltext/journals/sii/2011/0004/0003/SII-2011-0004-0003-a014.pdf>. 5. Thuan Nguyen & Jiming Jiang (2012), Restricted fence method for covariate selection in longitudinal data analysis. Biostatistics, 13(2), 303-314. <DOI:10.1093/biostatistics/kxr046> <https://academic.oup.com/biostatistics/article/13/2/303/263903/Restricted-fence-method-for-covariate-selection-in>. 6. Thuan Nguyen, Jie Peng, Jiming Jiang (2014), Fence Methods for Backcross Experiments. Statistical Computation and Simulation, 84(3), 644-662. <DOI:10.1080/00949655.2012.721885> <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3891925/>. 7. Jiang, J. (2014), The fence methods, in Advances in Statistics, Hindawi Publishing Corp., Cairo. <DOI:10.1155/2014/830821>. 8. Jiming Jiang and Thuan Nguyen (2015), The Fence Methods, World Scientific, Singapore. <https://www.abebooks.com/9789814596060/Fence-Methods-Jiming-Jiang-981459606X/plp>.
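Schematically, with Q(M) a measure of lack of fit and \tilde{M} the full model, the fence consists of the models M satisfying

    Q(M) - Q(\tilde{M}) \le c \, \hat{\sigma}_{M, \tilde{M}},

where \hat{\sigma}_{M, \tilde{M}} estimates the standard deviation of the left-hand side and the tuning constant c can be chosen adaptively; the optimal model is then selected from within the fence, e.g., by parsimony.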
We provide a collection of classical tests and recent normal-reference tests for comparing high-dimensional mean vectors, covering the two-sample and the general linear hypothesis testing (GLHT) problems. Some existing tests for the two-sample problem [see Bai, Zhidong, and Hewa Saranadasa. (1996) <https://www.jstor.org/stable/24306018>; Chen, Song Xi, and Ying-Li Qin. (2010) <doi:10.1214/09-aos716>; Srivastava, Muni S., and Meng Du. (2008) <doi:10.1016/j.jmva.2006.11.002>; Srivastava, Muni S., Shota Katayama, and Yutaka Kano. (2013) <doi:10.1016/j.jmva.2012.08.014>]. Normal-reference tests for the two-sample problem [see Zhang, Jin-Ting, Jia Guo, Bu Zhou, and Ming-Yen Cheng. (2020) <doi:10.1080/01621459.2019.1604366>; Zhang, Jin-Ting, Bu Zhou, Jia Guo, and Tianming Zhu. (2021) <doi:10.1016/j.jspi.2020.11.008>; Zhang, Liang, Tianming Zhu, and Jin-Ting Zhang. (2020) <doi:10.1016/j.ecosta.2019.12.002>; Zhang, Liang, Tianming Zhu, and Jin-Ting Zhang. (2023) <doi:10.1080/02664763.2020.1834516>; Zhang, Jin-Ting, and Tianming Zhu. (2022) <doi:10.1080/10485252.2021.2015768>; Zhang, Jin-Ting, and Tianming Zhu. (2022) <doi:10.1007/s42519-021-00232-w>; Zhu, Tianming, Pengfei Wang, and Jin-Ting Zhang. (2023) <doi:10.1007/s00180-023-01433-6>]. Some existing tests for the GLHT problem [see Fujikoshi, Yasunori, Tetsuto Himeno, and Hirofumi Wakaki. (2004) <doi:10.14490/jjss.34.19>; Srivastava, Muni S., and Yasunori Fujikoshi. (2006) <doi:10.1016/j.jmva.2005.08.010>; Yamada, Takayuki, and Muni S. Srivastava. (2012) <doi:10.1080/03610926.2011.581786>; Schott, James R. (2007) <doi:10.1016/j.jmva.2006.11.007>; Zhou, Bu, Jia Guo, and Jin-Ting Zhang. (2017) <doi:10.1016/j.jspi.2017.03.005>]. Normal-reference tests for the GLHT problem [see Zhang, Jin-Ting, Jia Guo, and Bu Zhou. (2017) <doi:10.1016/j.jmva.2017.01.002>; Zhang, Jin-Ting, Bu Zhou, and Jia Guo. (2022) <doi:10.1016/j.jmva.2021.104816>; Zhu, Tianming, Liang Zhang, and Jin-Ting Zhang. (2022) <doi:10.5705/ss.202020.0362>; Zhu, Tianming, and Jin-Ting Zhang. (2022) <doi:10.1007/s00180-021-01110-6>; Zhang, Jin-Ting, and Tianming Zhu. (2022) <doi:10.1016/j.csda.2021.107385>].
Companion package with functions, data sets, and examples for the book Patrice Bertail and Anna Dudek (2025), Bootstrap for Dependent Data, with an R package (by Bernard Desgraupes and Karolina Marek), submitted. Kreiss, J.-P. and Paparoditis, E. (2003) <doi:10.1214/aos/1074290332> Politis, D.N., and White, H. (2004) <doi:10.1081/ETC-120028836> Patton, A., Politis, D.N., and White, H. (2009) <doi:10.1080/07474930802459016> Tsybakov, A. B. (2018) <doi:10.1007/b13794> Bickel, P., and Sakov, A. (2008) <doi:10.1214/18-AOS1803> Götze, F. and Račkauskas, A. (2001) <doi:10.1214/lnms/1215090074> Politis, D. N., Romano, J. P., & Wolf, M. (1999, ISBN:978-0-387-98854-2) Carlstein, E. (1986) <doi:10.1214/aos/1176350057> Künsch, H. (1989) <doi:10.1214/aos/1176347265> Liu, R. and Singh, K. (1992) <https://www.stat.purdue.edu/docs/research/tech-reports/1991/tr91-07.pdf> Politis, D.N. and Romano, J.P. (1994) <doi:10.1080/01621459.1994.10476870> Politis, D.N. and Romano, J.P. (1992) <https://www.stat.purdue.edu/docs/research/tech-reports/1991/tr91-07.pdf> Bertail, P. and Dudek, A.E. (2022) <doi:10.3150/23-BEJ1683> Dudek, A.E., Leśkow, J., Paparoditis, E. and Politis, D. (2014a) <https://ideas.repec.org/a/bla/jtsera/v35y2014i2p89-114.html> Beran, R. (1997) <doi:10.1023/A:1003114420352> Efron, B. and Tibshirani, R. (1993, ISBN:9780429246593) Bickel, P. J., Götze, F. and van Zwet, W. R. (1997) <doi:10.1007/978-1-4614-1314-1_17> Davison, A. C. and Hinkley, D. (1997) <doi:10.2307/1271471> Falk, M., & Reiss, R. D. (1989) <doi:10.1007/BF00354758> Lahiri, S. N. (2003) <doi:10.1007/978-1-4757-3803-2> Shimizu, K. (2017) <doi:10.1007/978-3-8348-9778-7> Park, J.Y. (2003) <doi:10.1111/1468-0262.00471> Kirch, C. and Politis, D. N. (2011) <doi:10.48550/arXiv.1211.4732> Bertail, P. and Dudek, A.E. (2024) <doi:10.3150/23-BEJ1683> Dudek, A. E. (2015) <doi:10.1007/s00184-014-0505-9> Dudek, A. E. (2018) <doi:10.1080/10485252.2017.1404060> Bertail, P., Clémençon, S. (2006a) <https://ideas.repec.org/p/crs/wpaper/2004-47.html> Bertail, P. and Clémençon, S. (2006, ISBN:978-0-387-36062-1) Radulović, D. (2006) <doi:10.1007/BF02603005> Bertail, P., Politis, D. N., Rhomari, N. (2000) <doi:10.1080/02331880008802701> Nordman, D.J., Lahiri, S.N. (2004) <doi:10.1214/009053604000000779> Politis, D.N., Romano, J.P. (1993) <doi:10.1006/jmva.1993.1085> Hurvich, C. M. and Zeger, S. L. (1987, ISBN:978-1-4612-0099-4) Bertail, P. and Dudek, A. (2021) <doi:10.1214/20-EJS1787> Bertail, P., Clémençon, S. and Tressou, J. (2015) <doi:10.1111/jtsa.12105> Asmussen, S. (1987) <doi:10.1007/978-3-662-11657-9> Efron, B. (1979) <doi:10.1214/aos/1176344552> Gray, H., Schucany, W. and Watkins, T. (1972) <doi:10.2307/2335521> Quenouille, M.H. (1949) <doi:10.1111/j.2517-6161.1949.tb00023.x> Quenouille, M. H. (1956) <doi:10.2307/2332914> Prakasa Rao, B. L. S. and Kulperger, R. J. (1989) <https://www.jstor.org/stable/25050735> Rajarshi, M.B. (1990) <doi:10.1007/BF00050835> Dudek, A.E., Maiz, S. and Elbadaoui, M. (2014) <doi:10.1016/j.sigpro.2014.04.022> Beran, R. (1986) <doi:10.1214/aos/1176349847> Maritz, J. S. and Jarrett, R. G. (1978) <doi:10.2307/2286545> Bertail, P., Politis, D., Romano, J. (1999) <doi:10.2307/2670177> Bertail, P. and Clémençon, S. (2006b) <doi:10.1007/0-387-36062-X_1> Radulović, D. (2004) <doi:10.1007/BF02603005> Hurd, H.L., Miamee, A.G. (2007) <doi:10.1002/9780470182833> Bühlmann, P. (1997) <doi:10.2307/3318584> Choi, E., Hall, P. (2000) <doi:10.1111/1467-9868.00244> Efron, B., Tibshirani, R. (1993, ISBN:9780429246593) Bertail, P., Clémençon, S. and Tressou, J. (2009) <doi:10.1007/s10687-009-0081-y> Bertail, P., Medina-Garay, A., De Lima-Medina, F. and Jales, I. (2024) <doi:10.1080/02331888.2024.2344670>.
Queries data from WHOIS servers.
This package provides Rusty Object Notation (RON).