Utilize an orthogonality constrained optimization algorithm of Wen & Yin (2013) <DOI:10.1007/s10107-012-0584-1> to solve a variety of dimension reduction problems in the semiparametric framework, such as Ma & Zhu (2012) <DOI:10.1080/01621459.2011.646925>, Ma & Zhu (2013) <DOI:10.1214/12-AOS1072>, Sun, Zhu, Wang & Zeng (2019) <DOI:10.1093/biomet/asy064> and Zhou, Zhu & Zeng (2021) <DOI:10.1093/biomet/asaa087>. The package also implements some existing dimension reduction methods such as hMave by Xia, Zhang, & Xu (2010) <DOI:10.1198/jasa.2009.tm09372> and partial SAVE by Feng, Wen & Zhu (2013) <DOI:10.1080/01621459.2012.746065>. It also serves as a general purpose optimization solver for problems with orthogonality constraints, i.e., in Stiefel manifold. Parallel computing for approximating the gradient is enabled through OpenMP'.
Calculates a comprehensive list of features from profile hidden Markov models (HMMs) of proteins. Adapts and ports features for use with HMMs instead of Position Specific Scoring Matrices, in order to take advantage of more accurate multiple sequence alignment by programs such as HHBlits Remmert et al. (2012) <DOI:10.1038/nmeth.1818> and HMMer Eddy (2011) <DOI:10.1371/journal.pcbi.1002195>. Features calculated by this package can be used for protein fold classification, protein structural class prediction, sub-cellular localization and protein-protein interaction, among other tasks. Some examples of features extracted are found in Song et al. (2018) <DOI:10.3390/app8010089>, Jin & Zhu (2021) <DOI:10.1155/2021/8629776>, Lyons et al. (2015) <DOI:10.1109/tnb.2015.2457906> and Saini et al. (2015) <DOI:10.1016/j.jtbi.2015.05.030>.
When comparing single cases to control populations and no parameters are known researchers and clinicians must estimate these with a control sample. This is often done when testing a case's abnormality on some variable or testing abnormality of the discrepancy between two variables. Appropriate frequentist and Bayesian methods for doing this are here implemented, including tests allowing for the inclusion of covariates. These have been developed first and foremost by John Crawford and Paul Garthwaite, e.g. in Crawford and Howell (1998) <doi:10.1076/clin.12.4.482.7241>, Crawford and Garthwaite (2005) <doi:10.1037/0894-4105.19.3.318>, Crawford and Garthwaite (2007) <doi:10.1080/02643290701290146> and Crawford, Garthwaite and Ryan (2011) <doi:10.1016/j.cortex.2011.02.017>. The package is also equipped with power calculators for each method.
This package provides methods for building self-organizing maps (SOMs) with a number of distinguishing features such automatic centroid detection and cluster visualization using starbursts. For more details see the paper "Improved Interpretability of the Unified Distance Matrix with Connected Components" by Hamel and Brown (2011) in <ISBN:1-60132-168-6>. The package provides user-friendly access to two models we construct: (a) a SOM model and (b) a centroid based clustering model. The package also exposes a number of quality metrics for the quantitative evaluation of the map, Hamel (2016) <doi:10.1007/978-3-319-28518-4_4>. Finally, we reintroduced our fast, vectorized training algorithm for SOM with substantial improvements. It is about an order of magnitude faster than the canonical, stochastic C implementation <doi:10.1007/978-3-030-01057-7_60>.
English is the native language for only 5% of the World population. Also, only 17% of us can understand this text. Moreover, the Latin alphabet is the main one for merely 36% of the total. The early computer era, now a very long time ago, was dominated by the US. Due to the proliferation of the internet, smartphones, social media, and other technologies and communication platforms, this is no longer the case. This package replaces base R string functions (such as grep(), tolower(), sprintf(), and strptime()) with ones that fully support the Unicode standards related to natural language and date-time processing. It also fixes some long-standing inconsistencies, and introduces some new, useful features. Thanks to ICU (International Components for Unicode) and stringi', they are fast, reliable, and portable across different platforms.
Inference procedures accommodate a flexible range of hazard ratio patterns with a two-sample semi-parametric model. This model contains the proportional hazards model and the proportional odds model as sub-models, and accommodates non-proportional hazards situations to the extreme of having crossing hazards and crossing survivor functions. Overall, this package has four major functions: 1) the parameter estimation, namely short-term and long-term hazard ratio parameters; 2) 95 percent and 90 percent point-wise confidence intervals and simultaneous confidence bands for the hazard ratio function; 3) p-value of the adaptive weighted log-rank test; 4) p-values of two lack-of-fit tests for the model. See the included "read_me_first.pdf" for brief instructions. In this version (1.1), there is no need to sort the data before applying this package.
This package provides an arsenal of R functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in R and RStudio and which use formulas and versatile summary statistics for summary tables and models. The primary functions include
tableby, a Table-1-like summary of multiple variable types by the levels of one or more categorical variables;paired, a Table-1-like summary of multiple variable types paired across two time points;modelsum, which performs simple model fits on one or more endpoints for many variables (univariate or adjusted for covariates);freqlist, a powerful frequency table across many categorical variables;comparedf, a function for comparingdata.frames; andwrite2, a function to output tables to a document.
Computation of adherence to medications from Electronic Health care Data and visualization of individual medication histories and adherence patterns. The package implements a set of S3 classes and functions consistent with current adherence guidelines and definitions. It allows the computation of different measures of adherence (as defined in the literature, but also several original ones), their publication-quality plotting, the estimation of event duration and time to initiation, the interactive exploration of patient medication history and the real-time estimation of adherence given various parameter settings. It scales from very small datasets stored in flat CSV files to very large databases and from single-thread processing on mid-range consumer laptops to parallel processing on large heterogeneous computing clusters. It exposes a standardized interface allowing it to be used from other programming languages and platforms, such as Python.
Statistical methods for ROC surface analysis in three-class classification problems for clustered data and in presence of covariates. In particular, the package allows to obtain covariate-specific point and interval estimation for: (i) true class fractions (TCFs) at fixed pairs of thresholds; (ii) the ROC surface; (iii) the volume under ROC surface (VUS); (iv) the optimal pairs of thresholds. Methods considered in points (i), (ii) and (iv) are proposed and discussed in To et al. (2022) <doi:10.1177/09622802221089029>. Referring to point (iv), three different selection criteria are implemented: Generalized Youden Index (GYI), Closest to Perfection (CtP) and Maximum Volume (MV). Methods considered in point (iii) are proposed and discussed in Xiong et al. (2018) <doi:10.1177/0962280217742539>. Visualization tools are also provided. We refer readers to the articles cited above for all details.
Fits a geographically weighted regression model using zero inflated probability distributions. Has the zero inflated negative binomial distribution (zinb) as default, but also accepts the zero inflated Poisson (zip), negative binomial (negbin) and Poisson distributions. Can also fit the global versions of each regression model. Da Silva, A. R. & De Sousa, M. D. R. (2023). "Geographically weighted zero-inflated negative binomial regression: A general case for count data", Spatial Statistics <doi:10.1016/j.spasta.2023.100790>. Brunsdon, C., Fotheringham, A. S., & Charlton, M. E. (1996). "Geographically weighted regression: a method for exploring spatial nonstationarity", Geographical Analysis, <doi:10.1111/j.1538-4632.1996.tb00936.x>. Yau, K. K. W., Wang, K., & Lee, A. H. (2003). "Zero-inflated negative binomial mixed regression modeling of over-dispersed count data with extra zeros", Biometrical Journal, <doi:10.1002/bimj.200390024>.
This package provides functions and example datasets for phytosociological analysis, forest inventory, biomass and carbon estimation, and visualization of vegetation data. Includes functions to compute structural parameters [phytoparam(), summary.param(), stats()], estimate above-ground biomass and carbon [AGB()], stratify wood volume by diameter at breast height (DBH) classes [stratvol()], generate collector and rarefaction curves [collector.curve(), rarefaction()], and visualize basal areas on quadrat maps [BAplot(), including rectangular plots and individual coordinates]. Several example datasets are provided to demonstrate the functionality of these tools. For more details see FAO (1981, ISBN:92-5-101132-X) "Manual of forest inventory", IBGE (2012, ISBN:9788524042720) "Manual técnico da vegetação brasileira" and Heringer et al. (2020) "Phytosociology in R: A routine to estimate phytosociological parameters" <doi:10.22533/at.ed.3552009033>.
Processes amino acid alignments produced by the IPD-IMGT/HLA (Immuno Polymorphism-ImMunoGeneTics/Human Leukocyte Antigen) Database to identify user-defined amino acid residue motifs shared across HLA alleles, HLA alleles, or HLA haplotypes, and calculates frequencies based on HLA allele frequency data. SSHAARP (Searching Shared HLA Amino Acid Residue Prevalence) uses Generic Mapping Tools (GMT) software and the GMT R package to generate global frequency heat maps that illustrate the distribution of each user-defined map around the globe. SSHAARP analyzes the allele frequency data described by Solberg et al. (2008) <doi:10.1016/j.humimm.2008.05.001>, a global set of 497 population samples from 185 published datasets, representing 66,800 individuals total. Users may also specify their own datasets, but file conventions must follow the prebundled Solberg dataset, or the mock haplotype dataset.
DNA methylation signatures are usually based on multivariate approaches that require hundreds of sites for predictions. CimpleG is a method for the detection of small CpG methylation signatures used for cell-type classification and deconvolution. CimpleG is time efficient and performs as well as top performing methods for cell-type classification of blood cells and other somatic cells, while basing its prediction on a single DNA methylation site per cell type (but users can also select more sites if they so wish). Users can train cell type classifiers ('CimpleG based, and others) and directly apply these in a deconvolution of cell mixes context. Altogether, CimpleG provides a complete computational framework for the delineation of DNAm signatures and cellular deconvolution. For more details see Maié et al. (2023) <doi:10.1186/s13059-023-03000-0>.
Estimation of a density from grouped (tabulated) summary statistics evaluated in each of the big bins (or classes) partitioning the support of the variable. These statistics include class frequencies and central moments of order one up to four. The log-density is modelled using a linear combination of penalised B-splines. The multinomial log-likelihood involving the frequencies adds up to a roughness penalty based on the differences in the coefficients of neighbouring B-splines and the log of a root-n approximation of the sampling density of the observed vector of central moments in each class. The so-obtained penalized log-likelihood is maximized using the EM algorithm to get an estimate of the spline parameters and, consequently, of the variable density and related quantities such as quantiles, see Lambert, P. (2021) <arXiv:2107.03883> for details.
This package provides functions for (1) soil water retention (SWC) and unsaturated hydraulic conductivity (Ku) (van Genuchten-Mualem (vGM or vG) [1, 2], Peters-Durner-Iden (PDI) [3, 4, 5], Brooks and Corey (bc) [8]), (2) fitting of parameter for SWC and/or Ku using Shuffled Complex Evolution (SCE) optimisation and (3) calculation of soil hydraulic properties (Ku and soil water contents) based on the simplified evaporation method (SEM) [6, 7]. Main references: [1] van Genuchten (1980) <doi:10.2136/sssaj1980.03615995004400050002x>, [2] Mualem (1976) <doi:10.1029/WR012i003p00513>, [3] Peters (2013) <doi:10.1002/wrcr.20548>, [4] Iden and Durner (2013) <doi:10.1002/2014WR015937>, [5] Peters (2014) <doi:10.1002/2014WR015937>, [6] Wind G. P. (1966), [7] Peters and Durner (2008) <doi:10.1016/j.jhydrol.2008.04.016> and [8] Brooks and Corey (1964).
The model estimates air pollution removal by dry deposition on trees. It also estimates or uses hourly values for aerodynamic resistance, boundary layer resistance, canopy resistance, stomatal resistance, cuticular resistance, mesophyll resistance, soil resistance, friction velocity and deposition velocity. It also allows plotting graphical results for a specific time period. The pollutants are nitrogen dioxide, ozone, sulphur dioxide, carbon monoxide and particulate matter. Baldocchi D (1994) <doi:10.1093/treephys/14.7-8-9.1069>. Farquhar GD, von Caemmerer S, Berry JA (1980) Planta 149: 78-90. Hirabayashi S, Kroll CN, Nowak DJ (2015) i-Tree Eco Dry Deposition Model. Nowak DJ, Crane DE, Stevens JC (2006) <doi:10.1016/j.ufug.2006.01.007>. US EPA (1999) PCRAMMET User's Guide. EPA-454/B-96-001. Weiss A, Norman JM (1985) Agricultural and Forest Meteorology 34: 205â 213.
Perform state and parameter inference, and forecasting, in stochastic state-space systems using the ctsmTMB class. This class, built with the R6 package, provides a user-friendly interface for defining and handling state-space models. Inference is based on maximum likelihood estimation, with derivatives efficiently computed through automatic differentiation enabled by the TMB'/'RTMB packages (Kristensen et al., 2016) <doi:10.18637/jss.v070.i05>. The available inference methods include Kalman filters, in addition to a Laplace approximation-based smoothing method. For further details of these methods refer to the documentation of the CTSMR package <https://ctsm.info/ctsmr-reference.pdf> and Thygesen (2025) <doi:10.48550/arXiv.2503.21358>. Forecasting capabilities include moment predictions and stochastic path simulations, both implemented in C++ using Rcpp (Eddelbuettel et al., 2018) <doi:10.1080/00031305.2017.1375990> for computational efficiency.
Variable selection methods have been extensively developed for analyzing highdimensional omics data within both the frequentist and Bayesian frameworks. This package provides implementations of the spike-and-slab quantile (group) LASSO which have been developed along the line of Bayesian hierarchical models but deeply rooted in frequentist regularization methods by utilizing Expectationâ Maximization (EM) algorithm. The spike-and-slab quantile LASSO can handle data irregularity in terms of skewness and outliers in response variables, compared to its non-robust alternative, the spike-and-slab LASSO, which has also been implemented in the package. In addition, procedures for fitting the spike-and-slab quantile group LASSO and its non-robust counterpart have been implemented in the form of quantile/least-square varying coefficient mixed effect models for high-dimensional longitudinal data. The core module of this package is developed in C++'.
Density estimation for possibly large data sets and conditional/unconditional random number generation or bootstrapping with distribution element trees. The function det.construct translates a dataset into a distribution element tree. To evaluate the probability density based on a previously computed tree at arbitrary query points, the function det.query is available. The functions det1 and det2 provide density estimation and plotting for one- and two-dimensional datasets. Conditional/unconditional smooth bootstrapping from an available distribution element tree can be performed by det.rnd'. For more details on distribution element trees, see: Meyer, D.W. (2016) <arXiv:1610.00345> or Meyer, D.W., Statistics and Computing (2017) <doi:10.1007/s11222-017-9751-9> and Meyer, D.W. (2017) <arXiv:1711.04632> or Meyer, D.W., Journal of Computational and Graphical Statistics (2018) <doi:10.1080/10618600.2018.1482768>.
Collection of data sets from various assessments that can be used to evaluate psychometric models. These data sets have been analyzed in the following papers that introduced new methodology as part of the application section: Jimenez, A., Balamuta, J. J., & Culpepper, S. A. (2023) <doi:10.1111/bmsp.12307>, Culpepper, S. A., & Balamuta, J. J. (2021) <doi:10.1080/00273171.2021.1985949>, Yinghan Chen et al. (2021) <doi:10.1007/s11336-021-09750-9>, Yinyin Chen et al. (2020) <doi:10.1007/s11336-019-09693-2>, Culpepper, S. A. (2019a) <doi:10.1007/s11336-019-09683-4>, Culpepper, S. A. (2019b) <doi:10.1007/s11336-018-9643-8>, Culpepper, S. A., & Chen, Y. (2019) <doi:10.3102/1076998618791306>, Culpepper, S. A., & Balamuta, J. J. (2017) <doi:10.1007/s11336-015-9484-7>, and Culpepper, S. A. (2015) <doi:10.3102/1076998615595403>.
It is a hybrid spatial model that combines the variable selection capabilities of stepwise regression methods with the predictive power of the Geographically Weighted Regression(GWR) model.The developed hybrid model follows a two-step approach where the stepwise variable selection method is applied first to identify the subset of predictors that have the most significant impact on the response variable, and then a GWR model is fitted using those selected variables for spatial prediction at test or unknown locations. For method details,see Leung, Y., Mei, C. L. and Zhang, W. X. (2000).<DOI:10.1068/a3162>.This hybrid spatial model aims to improve the accuracy and interpretability of GWR predictions by selecting a subset of relevant variables through a stepwise selection process.This approach is particularly useful for modeling spatially varying relationships and improving the accuracy of spatial predictions.
Conditional graphical lasso estimator is an extension of the graphical lasso proposed to estimate the conditional dependence structure of a set of p response variables given q predictors. This package provides suitable extensions developed to study datasets with censored and/or missing values. Standard conditional graphical lasso is available as a special case. Furthermore, the package provides an integrated set of core routines for visualization, analysis, and simulation of datasets with censored and/or missing values drawn from a Gaussian graphical model. Details about the implemented models can be found in Augugliaro et al. (2023) <doi: 10.18637/jss.v105.i01>, Augugliaro et al. (2020b) <doi: 10.1007/s11222-020-09945-7>, Augugliaro et al. (2020a) <doi: 10.1093/biostatistics/kxy043>, Yin et al. (2001) <doi: 10.1214/11-AOAS494> and Stadler et al. (2012) <doi: 10.1007/s11222-010-9219-7>.
Partitioning clustering algorithms divide data sets into k subsets or partitions so-called clusters. They require some initialization procedures for starting the algorithms. Initialization of cluster prototypes is one of such kind of procedures for most of the partitioning algorithms. Cluster prototypes are the centers of clusters, i.e. centroids or medoids, representing the clusters in a data set. In order to initialize cluster prototypes, the package inaparc contains a set of the functions that are the implementations of several linear time-complexity and loglinear time-complexity methods in addition to some novel techniques. Initialization of fuzzy membership degrees matrices is another important task for starting the probabilistic and possibilistic partitioning algorithms. In order to initialize membership degrees matrices required by these algorithms, a number of functions based on some traditional and novel initialization techniques are also available in the package inaparc'.
Best Orthogonalized Subset Selection (BOSS) is a least-squares (LS) based subset selection method, that performs best subset selection upon an orthogonalized basis of ordered predictors, with the computational effort of a single ordinary LS fit. This package provides a highly optimized implementation of BOSS and estimates a heuristic degrees of freedom for BOSS, which can be plugged into an information criterion (IC) such as AICc in order to select the subset from candidates. It provides various choices of IC, including AIC, BIC, AICc, Cp and GCV. It also implements the forward stepwise selection (FS) with no additional computational cost, where the subset of FS is selected via cross-validation (CV). CV is also an option for BOSS. For details see: Tian, Hurvich and Simonoff (2021), "On the Use of Information Criteria for Subset Selection in Least Squares Regression", <arXiv:1911.10191>.