This package provides a general toolkit for downloading, managing, analyzing, and presenting data from the U.S. Census, including SF1 (Decennial short-form), SF3 (Decennial long-form), and the American Community Survey (ACS). Confidence intervals provided with ACS data are converted to standard errors to be bundled with estimates in complex acs objects. The package provides new methods to conduct standard operations on acs objects and to present and plot data in statistically appropriate ways.
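A minimal Python sketch of the conversion described above, assuming (as the Census Bureau documents for the ACS) that published margins of error correspond to 90% confidence intervals. The function names are hypothetical and this is not the acs package's R interface. The root-sum-of-squares rule for an aggregated total is the usual approximation for independent estimates.

```python
import math

# ACS margins of error (MOEs) are published at the 90% confidence level,
# so the matching standard-normal critical value is about 1.645.
Z90 = 1.645


def moe_to_se(moe: float, z: float = Z90) -> float:
    """Convert a published margin of error to a standard error."""
    return moe / z


def se_of_sum(ses: list[float]) -> float:
    """Approximate SE of a sum of independent estimates (root sum of squares)."""
    return math.sqrt(sum(se ** 2 for se in ses))


# Two made-up estimates with their published MOEs.
estimates = [1200.0, 800.0]
moes = [164.5, 329.0]
ses = [moe_to_se(m) for m in moes]  # approximately [100.0, 200.0]
total = sum(estimates)              # 2000.0
total_se = se_of_sum(ses)           # approximately 223.6
```

Bundling the standard error with each estimate, rather than the MOE, is what lets later arithmetic on the estimates carry their uncertainty along.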
This LPE library is used for significance analysis of microarray data with a small number of replicates. It uses resampling-based FDR adjustment and gives less conservative results than the traditional BH or BY procedures. Accepted input is raw data in .txt format from MAS4, MAS5 or dChip; data can also be supplied after normalization. The LPE library is primarily used for analyzing data between two conditions. To use it for paired data, see the LPEP library; for multiple conditions, use the HEM library.
This package provides a framework, described in Elder et al. (2022) <arXiv:2203.01897>, for testing the multivariate point null hypothesis. After the user selects a parameter of interest and defines the assumed data-generating mechanism, this information should be encoded in functions for the parameter estimator and its corresponding influence curve. Implementations for some combinations of parameter and data-generating mechanism are included in this package and are explained in detail in the article.
This code provides a method to fit the hidden compact representation model as well as to identify the causal direction in discrete data. We implement an effective solution to recover this hidden compact representation under the likelihood framework. Please see "Causal Discovery from Discrete Data using Hidden Compact Representation" (NIPS 2018) by Ruichu Cai, Jie Qiao, Kun Zhang, Zhenjie Zhang and Zhifeng Hao (2018) <https://nips.cc/Conferences/2018/Schedule?showEvent=11274> for a description of some of our methods.
Variable selection techniques are essential tools for model selection and estimation in high-dimensional statistical models. Through this publicly available package, we provide a unified environment to carry out variable selection using iterative sure independence screening (SIS) (Fan and Lv (2008) <doi:10.1111/j.1467-9868.2008.00674.x>) and all of its variants in generalized linear models (Fan and Song (2009) <doi:10.1214/10-AOS798>) and the Cox proportional hazards model (Fan, Feng and Wu (2010) <doi:10.1214/10-IMSCOLL606>).
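The core screening step of SIS for a linear model ranks predictors by the magnitude of their marginal correlation with the response and keeps the top d. A minimal NumPy sketch under that assumption follows; it omits the iterative refinement and the GLM and Cox variants the package provides, and the function name, the simulated data and the choice d = 10 are illustrative.

```python
import numpy as np


def sis_screen(X: np.ndarray, y: np.ndarray, d: int) -> np.ndarray:
    """Indices of the d columns of X with the largest absolute marginal correlation with y."""
    # Standardize so that Xs.T @ ys / n is the vector of sample correlations.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    scores = np.abs(Xs.T @ ys) / len(y)
    # Sort descending and keep the top d.
    return np.argsort(scores)[::-1][:d]


rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
# Only columns 5 and 42 carry signal in this simulated example.
y = 3.0 * X[:, 5] - 2.0 * X[:, 42] + rng.standard_normal(n)
keep = sis_screen(X, y, d=10)
```

Screening reduces p = 1000 candidates to a handful before any penalized fit is run, which is what makes the approach practical when p is much larger than n.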
The Uniform Error Index is a weighted average of different error measures. It combines the output of several error functions and gives more robust and stable error values. This package computes the Uniform Error Index from ten different loss functions, such as Error Square, Square of Square Error, Quasi Likelihood Error, LogR-Square, Absolute Error and Absolute Square Error. The weights are determined using the Principal Component Analysis (PCA) algorithm of Yeasin and Paul (2024) <doi:10.1007/s11227-023-05542-3>.
This package provides functions to assist in the processing and exploration of data from environmental monitoring programs. The package name stands for "water quality" and reflects the original focus on time series data for physical and chemical properties of water, as well as the biota. It is intended for programs that sample approximately monthly, quarterly or annually at discrete stations, a feature of many legacy data sets. Most of the functions should be useful for analyzing time series of similar frequency regardless of the subject matter.
This package provides fast and easy access to German census grid data from the 2011 and 2022 censuses <https://www.zensus2022.de/>, including a wide range of socio-economic indicators at multiple spatial resolutions (100m, 1km, 10km). It enables efficient download, processing, and analysis of large census datasets covering population, households, families, dwellings, and buildings. Harmonized data structures allow direct comparison with the 2011 census, supporting temporal and spatial analyses. The package also facilitates conversion of data into common formats for spatial analysis and mapping ('terra', 'sf', 'ggplot2').
This package aims to streamline and accelerate the process of saving and loading R objects, improving speed and compression compared to other methods. The package provides two compression formats: the qs2 format, which uses R serialization via the C API while optimizing compression and disk I/O, and the qdata format, featuring custom serialization for slightly faster performance and better compression. Additionally, the qs2 format can be directly converted to the standard RDS format, ensuring long-term compatibility with future versions of R.
This package provides a new method for the interpretable characterization of heterogeneous treatment effects in terms of decision rules, via an extensive exploration of heterogeneity patterns with an ensemble-of-trees approach that enforces high stability in the discovery. It relies on a two-stage pseudo-outcome regression and is supported by theoretical convergence guarantees. See Bargagli-Stoffi, F. J., Cadei, R., Lee, K., & Dominici, F. (2023) Causal rule ensemble: Interpretable Discovery and Inference of Heterogeneous Treatment Effects. arXiv preprint <doi:10.48550/arXiv.2009.09036>.
ISM (Interpretive Structural Modeling) was developed by Warfield in 1974. ISM is the process of organizing distinct or related elements into a simplified and organized format. Hence, ISM is a methodology that seeks the interrelationships among the various elements considered and endows them with a hierarchical, multilevel structure. To run this package, the user needs to provide a VAXO matrix converted into 0s and 1s. Warfield, J.N. (1974) <doi:10.1109/TSMC.1974.5408524> (E-ISSN: 2168-2909).
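In standard ISM practice, the 0/1 matrix obtained from the VAXO entries is closed under transitivity to give the final reachability matrix, which is then partitioned into hierarchy levels. The Python sketch below assumes the VAXO-to-binary conversion has already been done; the function names are hypothetical and are not this package's API.

```python
import numpy as np


def reachability(adj: np.ndarray) -> np.ndarray:
    """Final reachability matrix: reflexive-transitive closure via Warshall's algorithm."""
    r = adj.astype(bool) | np.eye(len(adj), dtype=bool)
    for k in range(len(r)):
        # i reaches j if i reaches k and k reaches j.
        r = r | (r[:, [k]] & r[[k], :])
    return r.astype(int)


def level_partition(r: np.ndarray) -> list[list[int]]:
    """Peel off levels: an element belongs to the current top level when its
    reachability set equals its intersection with the antecedent set."""
    remaining = set(range(len(r)))
    levels = []
    while remaining:
        level = []
        for i in remaining:
            reach = {j for j in remaining if r[i, j]}
            ante = {j for j in remaining if r[j, i]}
            if reach == reach & ante:
                level.append(i)
        levels.append(sorted(level))
        remaining -= set(level)
    return levels


# Element 0 influences 1, and 1 influences 2.
adj = np.array([[0, 1, 0],
                [0, 0, 1],
                [0, 0, 0]])
r = reachability(adj)
levels = level_partition(r)  # [[2], [1], [0]]
```

Because the closure is transitive, each pass always finds at least one element, so the loop terminates; element 2 ends up at the top level since everything else leads to it.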
Computes the implied weights of linear regression models for estimating average causal effects and provides diagnostics based on these weights. These diagnostics rely on the analyses in Chattopadhyay and Zubizarreta (2023) <doi:10.1093/biomet/asac058> where several regression estimators are represented as weighting estimators, in connection with inverse probability weighting. lmw provides tools to diagnose representativeness, balance, extrapolation, and influence for these models, clarifying the target population of inference. Tools are also available to simplify estimating treatment effects for specific target populations of interest.
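For an OLS fit, every coefficient is a linear combination of the outcomes, with weights given by the corresponding row of (X'X)^{-1}X'. The NumPy sketch below computes these implied weights for the treatment coefficient in a model with an intercept, a binary treatment and one covariate, on simulated data. The function name is hypothetical, and this is neither lmw's interface nor a reproduction of the specific estimators in the paper.

```python
import numpy as np


def implied_weights(X: np.ndarray, col: int) -> np.ndarray:
    """Weights w with w @ y equal to the OLS coefficient in column col of X."""
    # Row `col` of (X'X)^{-1} X', computed without forming an explicit inverse.
    return np.linalg.solve(X.T @ X, X.T)[col]


rng = np.random.default_rng(1)
z = np.array([1] * 20 + [0] * 30)            # binary treatment indicator
x = rng.standard_normal(z.size) + 0.5 * z    # covariate, shifted in the treated group
y = 1.5 * z + x + rng.standard_normal(z.size)
X = np.column_stack([np.ones(z.size), z, x])

w = implied_weights(X, col=1)
tau_hat = w @ y   # identical to the OLS treatment coefficient
```

Because w @ X is the unit vector for the treatment column, the treated weights sum to 1, the control weights sum to -1, and the covariate is exactly mean-balanced under the weights; this is the sense in which a regression estimator can be read as a weighting estimator.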
NNS (Nonlinear Nonparametric Statistics) leverages partial moments (the fundamental elements of variance that asymptotically approximate the area under f(x)) to provide a robust foundation for nonlinear analysis while maintaining linear equivalences. NNS delivers a comprehensive suite of advanced statistical techniques, including: Numerical integration, Numerical differentiation, Clustering, Correlation, Dependence, Causal analysis, ANOVA, Regression, Classification, Seasonality, Autoregressive modeling, Normalization, Stochastic dominance and Advanced Monte Carlo sampling. All routines are based on: Viole, F. and Nawrocki, D. (2013), Nonlinear Nonparametric Statistics: Using Partial Moments (ISBN: 1490523995).
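Partial moments split a moment about a target into the part below the target and the part above it. The NumPy sketch below (illustrative function names, not the NNS R API) checks the basic identity behind the phrase "elements of variance": with the mean as target, the degree-2 lower and upper partial moments add up to the population variance.

```python
import numpy as np


def lpm(degree: int, target: float, x: np.ndarray) -> float:
    """Lower partial moment (degree >= 1): mean of max(target - x, 0) ** degree."""
    return float(np.mean(np.maximum(target - x, 0.0) ** degree))


def upm(degree: int, target: float, x: np.ndarray) -> float:
    """Upper partial moment (degree >= 1): mean of max(x - target, 0) ** degree."""
    return float(np.mean(np.maximum(x - target, 0.0) ** degree))


x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mu = float(x.mean())          # 5.0
below = lpm(2, mu, x)         # 1.5
above = upm(2, mu, x)         # 2.5
variance = below + above      # 4.0, equal to np.var(x)
```

Keeping the two halves separate is what allows asymmetric, nonlinear behavior to be analyzed, while their sum recovers the familiar linear quantity.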
This package provides a facility to generate balanced semi-Latin rectangles with any cell size (preferably up to ten) and a given number of treatments; see Uto, N.P. and Bailey, R.A. (2020). "Balanced Semi-Latin rectangles: properties, existence and constructions for block size two". Journal of Statistical Theory and Practice, 14(3), 1-11, <doi:10.1007/s42519-020-00118-3>. It also provides a facility to generate partially balanced semi-Latin rectangles for cell sizes 2, 3 and 4 for any number of treatments.
Efficient Bayesian implementations of probit, logit, multinomial logit and binomial logit models. Functions for plotting and tabulating the estimation output are available as well. Estimation is based on Gibbs sampling, where the Markov chain Monte Carlo algorithms are based on the latent variable representations and marginal data augmentation algorithms described in "Gregor Zens, Sylvia Frühwirth-Schnatter & Helga Wagner (2023). Ultimate Pólya Gamma Samplers - Efficient MCMC for possibly imbalanced binary and categorical data, Journal of the American Statistical Association <doi:10.1080/01621459.2023.2259030>".
This is an implementation of the Generalized Discrimination Score (also known as Two Alternatives Forced Choice Score, 2AFC) for various representations of forecasts and verifying observations. The Generalized Discrimination Score is a generic forecast verification framework which can be applied to any of the following verification contexts: dichotomous, polychotomous (ordinal and nominal), continuous, probabilistic, and ensemble. A comprehensive description of the Generalized Discrimination Score, including all equations used in this package, is provided by Mason and Weigel (2009) <doi:10.1175/MWR-D-10-05069.1>.
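For the simplest context, binary observations verified against continuous or probabilistic forecasts, the 2AFC score is the probability that a randomly chosen event received a higher forecast than a randomly chosen non-event, which equals the area under the ROC curve; 0.5 indicates no discrimination. A brute-force Python sketch of that special case only (the function name is hypothetical and is not this package's API):

```python
from itertools import product


def two_afc_binary(forecasts: list[float], observed: list[int]) -> float:
    """2AFC score for binary observations: the share of (event, non-event) pairs
    in which the event received the higher forecast; ties count as one half."""
    events = [f for f, o in zip(forecasts, observed) if o == 1]
    non_events = [f for f, o in zip(forecasts, observed) if o == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for fe, fn in product(events, non_events):
        if fe > fn:
            score += 1.0
        elif fe == fn:
            score += 0.5
    return score / (len(events) * len(non_events))


score = two_afc_binary([0.9, 0.7, 0.4, 0.4, 0.2], [1, 0, 1, 0, 0])  # 4.5 / 6 = 0.75
```

The other verification contexts (ordinal, nominal, continuous observations and ensembles) generalize this pairwise comparison and are not covered by the sketch.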
Column Text Format (CTF) is a new tabular data format designed for simplicity and performance. CTF is the simplest column store you can imagine: a plain text file for each column in a table, plus a metadata file. The underlying plain text means the data is human-readable and familiar to programmers, unlike specialized binary formats. CTF is faster than row-oriented formats like CSV when loading a subset of the columns in a table. This package provides functions to read and write CTF data from R.
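To illustrate why a column-per-file layout makes loading a subset of columns cheap, here is a toy Python sketch of the idea. The file names (`<column>.txt`, `metadata.json`) and the metadata fields are invented for illustration and are not the CTF specification or this package's R functions.

```python
import json
import tempfile
from pathlib import Path


def write_columns(table: dict[str, list], directory: Path) -> None:
    """Write each column to its own text file, one value per line, plus metadata.
    Hypothetical layout; assumes values contain no newline characters."""
    directory.mkdir(parents=True, exist_ok=True)
    meta = {"columns": []}
    for name, values in table.items():
        (directory / f"{name}.txt").write_text("".join(f"{v}\n" for v in values))
        kind = "int" if all(isinstance(v, int) for v in values) else "str"
        meta["columns"].append({"name": name, "type": kind})
    (directory / "metadata.json").write_text(json.dumps(meta))


def read_columns(directory: Path, wanted: list[str]) -> dict[str, list]:
    """Read only the requested columns; other column files are never opened."""
    meta = json.loads((directory / "metadata.json").read_text())
    types = {c["name"]: c["type"] for c in meta["columns"]}
    out = {}
    for name in wanted:
        lines = (directory / f"{name}.txt").read_text().splitlines()
        out[name] = [int(s) for s in lines] if types[name] == "int" else lines
    return out


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "cars"
    write_columns({"id": [1, 2, 3], "model": ["a", "b", "c"], "mpg": [21, 30, 18]}, folder)
    subset = read_columns(folder, ["id", "mpg"])  # {'id': [1, 2, 3], 'mpg': [21, 30, 18]}
```

A row-oriented CSV reader would have to scan every field of every row to extract two columns; here the `model` file is simply skipped.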
Functions, S4 classes/methods and a graphical user interface (GUI) to design surveys to substantiate freedom from disease using a modified hypergeometric function (see Cameron and Baldock, 1997, <doi:10.1016/s0167-5877(97)00081-0>). Herd sensitivities are computed according to sampling strategies "individual sampling" or "limited sampling" (see M. Ziller, T. Selhorst, J. Teuffert, M. Kramer and H. Schlueter, 2002, <doi:10.1016/S0167-5877(01)00245-8>). Methods to compute the a-posteriori alpha-error are implemented. Risk-based targeted sampling is supported.
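With a perfect test, the probability that a sample of n animals from a herd of N containing d infected animals includes at least one infected animal follows from the hypergeometric distribution. The package itself uses a modified hypergeometric function that also accounts for imperfect tests (Cameron and Baldock, 1997); the Python sketch below covers only the simpler perfect-test case, and its function names are hypothetical.

```python
from math import comb


def detection_probability(N: int, d: int, n: int) -> float:
    """P(at least one infected animal among n sampled from N, d infected), perfect test."""
    if n > N:
        raise ValueError("sample size cannot exceed population size")
    # comb(N - d, n) counts samples drawn entirely from the uninfected animals.
    return 1.0 - comb(N - d, n) / comb(N, n)


def min_sample_size(N: int, d: int, confidence: float = 0.95) -> int:
    """Smallest n whose detection probability reaches the target confidence."""
    for n in range(N + 1):
        if detection_probability(N, d, n) >= confidence:
            return n
    raise ValueError("target confidence is not attainable")


# Herd of 100 with a design prevalence of 5 infected animals.
n_needed = min_sample_size(100, 5, 0.95)  # 45
```

If a sample of this size is all negative, the hypothesis "at least 5 infected animals" can be rejected at roughly the 5% level, which is the logic behind substantiating freedom from disease.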
Diagnostic tools such as residual analysis and global, local and total-local influence analysis for the multivariate random intercept Poisson generalized log gamma model are available in this package. The package also includes estimation by the maximum likelihood method; for details, see Fabio, L. C.; Villegas, C. L.; Carrasco, J. M. F. and de Castro, M. (2023) <doi:10.1080/03610926.2021.1939380> and Fábio, L. C.; Villegas, C.; Mamun, A. S. M. A. and Carrasco, J. M. F. (2025) <doi:10.28951/bjb.v43i1.728>.
Population dynamic models underpin a range of analyses and applications in ecology and epidemiology. The various approaches for analysing population dynamic models (MPMs, IPMs, ODEs, POMPs, PVA) each require the model to be defined in a different way. This makes it difficult to combine different modelling approaches and data types to solve a given problem. pop aims to provide a flexible and easy-to-use common interface for constructing population dynamic models and enabling them to be fitted and analysed in many different ways.
Optogenetics is a new tool to study neuronal circuits that have been genetically modified to allow stimulation by flashes of light. This package implements the methodological framework, Point-process Response model for Optogenetics (PRO), for analyzing data from these experiments. This method provides explicit nonlinear transformations to link the flash point-process with the spiking point-process. Such response functions can be used to provide important and interpretable scientific insights into the properties of the biophysical process that governs neural spiking in response to optogenetic stimulation.
In the big data setting, working data sets are often distributed across multiple machines. However, classical statistical methods are usually developed for estimation or inference on a single data set. We employ a novel parallel quasi-likelihood method in generalized linear models that makes the variances of the different sub-estimators relatively similar. Estimates are obtained from projection subsets of the data and later combined using suitably chosen weights. The philosophy of the package is described in Guo G. (2020) <doi:10.1007/s00180-020-00974-4>.
This package implements a method for fitting a bounded probability distribution to quantiles (for example, quantiles stated by an expert); see Bornkamp and Ickstadt (2009) for details. For this purpose B-splines are used, and the density is obtained by penalized least squares based on a Brier entropy penalty. The package provides methods for fitting the distribution as well as methods for evaluating the underlying density and CDF. In addition, methods for plotting the distribution, drawing random numbers and calculating quantiles of the obtained distribution are provided.
The method implemented in this package performs bottom-up hierarchical clustering, using a Dirichlet Process (infinite mixture) to model uncertainty in the data and Bayesian model selection to decide at each step which clusters to merge. This avoids several limitations of traditional methods, such as having to decide how many clusters there should be and how to choose a principled distance metric. This implementation accepts multinomial (i.e., discrete, with 2+ categories) or time-series data. This version also includes a randomised algorithm that is more efficient for larger data sets.