Urban water and sanitation survey dataset collected by Water and Sanitation for the Urban Poor (WSUP) with technical support from Valid International. These citywide surveys collect data that allow water and sanitation service levels across the entire city to be characterised, while also allowing more detailed data to be collected in areas of the city of particular interest. The surveys are intended to generate useful information for others working in the water and sanitation sector. The current release includes datasets collected from a survey conducted in Dhaka, Bangladesh in March 2017. This survey is one of a series that WSUP plans to conduct in cities where it operates, including Accra, Ghana; Nakuru, Kenya; Antananarivo, Madagascar; Maputo, Mozambique; and Lusaka, Zambia. This package will be updated as the surveys in other cities are completed and their datasets are made available.
The experiment selector cross-validated targeted maximum likelihood estimator (ES-CVTMLE) aims to select the experiment that optimizes the bias-variance tradeoff for estimating a causal average treatment effect (ATE), where candidate experiments may consist of a randomized controlled trial (RCT) alone or an RCT combined with real-world data. Using cross-validation, the ES-CVTMLE separates the selection of the optimal experiment from the estimation of the ATE for the chosen experiment. The estimated bias term in the selector is a function of the difference between the conditional mean outcome under control in the RCT and that in the combined experiment. To help include only truly unbiased external data in the analysis, the estimated average treatment effect on a negative control outcome may be added to the bias term in the selector. For more details about this method, please see Dang et al. (2022) <arXiv:2210.05802>.
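The core idea of the selector, choosing the experiment with the smallest estimated squared bias plus variance, can be illustrated with a minimal Python sketch. It assumes that bias and variance estimates for each candidate experiment are already available and that the negative-control-outcome effect is simply added to the bias before squaring; the cross-validation and TMLE steps of the actual ES-CVTMLE are omitted, and the names `Candidate`, `selector_score`, and `select_experiment` are hypothetical, not part of the package.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """One candidate experiment (hypothetical container, not the package's API)."""
    name: str
    bias: float              # estimated bias, e.g. difference in control-arm conditional means
    variance: float          # estimated variance of the ATE estimator for this experiment
    nco_effect: float = 0.0  # estimated ATE on a negative control outcome (optional)

def selector_score(c: Candidate, use_nco: bool = False) -> float:
    """Plug-in bias^2 + variance criterion (assumes the two bias terms are summed before squaring)."""
    bias = c.bias + (c.nco_effect if use_nco else 0.0)
    return bias ** 2 + c.variance

def select_experiment(candidates, use_nco: bool = False) -> Candidate:
    """Return the candidate with the smallest estimated bias-variance criterion."""
    return min(candidates, key=lambda c: selector_score(c, use_nco))

rct = Candidate("RCT only", bias=0.0, variance=0.04)
combined = Candidate("RCT + RWD", bias=0.05, variance=0.01, nco_effect=0.3)
print(select_experiment([rct, combined]).name)                # RCT + RWD
print(select_experiment([rct, combined], use_nco=True).name)  # RCT only
```

In this toy example the combined experiment wins on variance until the negative control outcome reveals a larger bias, at which point the RCT alone is preferred.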
Toolbox to process raw data from closed-loop flux chamber (or tent) setups into ecosystem gas fluxes usable for analysis. It goes from a data frame of gas concentration over time (which can contain several measurements) and a metadata file indicating which measurement was done when, to a data frame of ecosystem gas fluxes including quality diagnostics. It is organized with one function per step, maximizing user flexibility and backward compatibility. Several models for estimating fluxes from the raw data are available: exponential as described in Zhao et al. (2018) <doi:10.1016/j.agrformet.2018.08.022>, exponential as described in Hutchinson and Mosier (1981) <doi:10.2136/sssaj1981.03615995004500020017x>, quadratic, and linear. Other functions include quality assessment, plotting for visual checks, calculation of fluxes based on setup-specific parameters (chamber size, plot area, ...), calculation of gross primary production and transpiration rate, and light response curves.
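As an illustration of the simplest (linear) model, the sketch below fits a least-squares slope to concentration over time and converts it into an areal flux with the ideal gas law. It is a minimal sketch, assuming concentrations in ppm, chamber volume in m³, plot area in m², temperature in °C and pressure in kPa; the function names are hypothetical, and the package's exponential and quadratic models, unit handling, and quality checks are not reproduced.

```python
import numpy as np

R_GAS = 8.314  # universal gas constant, J mol^-1 K^-1

def linear_slope(time_s, conc_ppm):
    """Least-squares slope of concentration vs. time (ppm per second)."""
    slope, _intercept = np.polyfit(np.asarray(time_s, float), np.asarray(conc_ppm, float), 1)
    return slope

def flux_umol_m2_s(slope_ppm_s, volume_m3, area_m2, temp_c, pressure_kpa):
    """Convert a concentration slope into an areal flux via the ideal gas law."""
    temp_k = temp_c + 273.15
    # moles of air in the chamber: n = PV / (RT), with P converted to Pa
    n_air = pressure_kpa * 1000.0 * volume_m3 / (R_GAS * temp_k)
    # 1 ppm = 1 umol gas per mol air, so slope * n_air gives umol s^-1
    return slope_ppm_s * n_air / area_m2

t = np.arange(0, 61, 10)   # seconds
co2 = 420.0 + 0.5 * t      # synthetic, perfectly linear rise
slope = linear_slope(t, co2)
print(round(slope, 6))     # 0.5
print(flux_umol_m2_s(slope, volume_m3=0.125, area_m2=0.0625, temp_c=20.0, pressure_kpa=101.3))
```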
This package provides basic functionality to calculate the position of satellites given a known state vector. It includes implementations of the SGP4 and SDP4 simplified perturbation models to propagate orbital state vectors, as well as utilities to read TLE files and convert coordinates between different frames of reference. Several of the package's functions (including the high-precision numerical orbit propagator) require the coefficients and data included in the asteRiskData package, available in a drat repository. To install this data package, run 'install.packages("asteRiskData", repos="https://rafael-ayala.github.io/drat/")'. References: Felix R. Hoots, Ronald L. Roehrich and T.S. Kelso (1988) <https://celestrak.org/NORAD/documentation/spacetrk.pdf>. David Vallado, Paul Crawford, Richard Hujsak and T.S. Kelso (2006) <doi:10.2514/6.2006-6753>. Felix R. Hoots, Paul W. Schumacher Jr. and Robert A. Glover (2004) <doi:10.2514/1.9161>.
This package provides functions to compute fuzzy versions of species occurrence patterns based on presence-absence data (including inverse distance interpolation, trend surface analysis, and prevalence-independent favourability obtained from probability of presence), as well as pair-wise fuzzy similarity (based on fuzzy logic versions of commonly used similarity indices) among those occurrence patterns. It also includes functions for model consensus and comparison (overlap and fuzzy similarity, fuzzy loss, fuzzy gain), and for data preparation, such as obtaining unique abbreviations of species names, defining the background region, cleaning and gridding (thinning) point occurrence data onto raster maps, selecting among (pseudo)absences to address survey bias, converting species lists (long format) to presence-absence tables (wide format), transposing part of a data frame, selecting relevant variables for models, assessing the false discovery rate, and analysing and dealing with multicollinearity. The package was initially described in Barbosa (2015) <doi:10.1111/2041-210X.12372>.
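Two of the ingredients mentioned above can be sketched in a few lines: favourability, which rescales the odds of presence by the ratio of presences to absences in the sample so that a probability equal to prevalence maps to 0.5, and a fuzzy Jaccard similarity computed as the sum of element-wise minima over the sum of maxima. These are common textbook formulations assumed here for illustration; the function names are hypothetical and the package may define its fuzzy indices differently in detail.

```python
import numpy as np

def favourability(p, n_presences, n_absences):
    """Favourability: odds of presence rescaled by the sample's presence/absence ratio."""
    p = np.asarray(p, float)
    odds = p / (1.0 - p)  # assumes 0 <= p < 1
    return odds / (n_presences / n_absences + odds)

def fuzzy_jaccard(a, b):
    """Fuzzy Jaccard similarity: sum of minima over sum of maxima of membership values."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    return np.minimum(a, b).sum() / np.maximum(a, b).sum()

# A probability equal to the sample prevalence (20 / 100) maps to favourability 0.5.
print(round(float(favourability(0.2, n_presences=20, n_absences=80)), 6))  # 0.5
print(round(float(fuzzy_jaccard([1, 0.5, 0], [0.5, 0.5, 1])), 6))          # 0.4
```

With crisp 0/1 vectors the fuzzy Jaccard index reduces to the ordinary Jaccard index.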
This package implements a physics-informed one-dimensional convolutional neural network (CNN1D-PINN) for estimating the complete soil water retention curve (SWRC) as a continuous function of matric potential, from soil texture, organic carbon, bulk density, and depth. The network architecture ensures strict monotonic decrease of volumetric water content with increasing suction by construction, through cumulative integration of non-negative slope outputs (monotone integral architecture). Four physics-based residual constraints adapted from Norouzi et al. (2025) <doi:10.1029/2024WR038149> are embedded in the loss function: (S1) linearity at the dry end (pF in [5, 7.6]); (S2) non-negativity at pF = 6.2; (S3) non-positivity at pF = 7.6; and (S4) a near-zero derivative in the saturated plateau region (pF in [-2, -0.3]). Includes tools for data preparation, model training, dense prediction, performance metrics, texture classification, and publication-quality visualisation.
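The monotone integral idea can be sketched in a few lines of NumPy: unconstrained outputs are passed through softplus to make them positive, multiplied by the pF step, and cumulatively summed, so that water content can only fall as suction rises. This is a minimal sketch under stated assumptions: softplus as the non-negativity map, random numbers standing in for network outputs, and hypothetical function names. The package's network, physics-based loss terms, and training are not shown, and nothing here bounds water content from below.

```python
import numpy as np

def softplus(x):
    """Numerically stable softplus, mapping any real number to a positive value."""
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

def monotone_retention_curve(theta_sat, raw_slopes, pf_grid):
    """Integrate non-negative slopes so water content can only decrease with pF."""
    pf_grid = np.asarray(pf_grid, float)
    slopes = softplus(np.asarray(raw_slopes, float))  # one positive slope per pF interval
    drops = slopes * np.diff(pf_grid)                 # water lost over each interval
    return theta_sat - np.concatenate(([0.0], np.cumsum(drops)))

rng = np.random.default_rng(0)
pf = np.linspace(-2.0, 7.6, 97)                   # pF grid with a 0.1 step
raw = rng.normal(loc=-4.0, size=pf.size - 1)      # stand-in for unconstrained network outputs
theta = monotone_retention_curve(0.45, raw, pf)
print(bool(np.all(np.diff(theta) < 0)))  # True
```

Because monotonicity comes from the construction rather than from a penalty, it holds for any values of the raw outputs.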
Screen for and analyze non-linear sparse direct effects in the presence of unobserved confounding using spectral deconfounding techniques (Ćevid, Bühlmann, and Meinshausen (2020) <https://jmlr.org/papers/v21/19-545.html>; Guo, Ćevid, and Bühlmann (2022) <doi:10.1214/21-AOS2152>). These methods have been shown to give good estimates of the true direct effect when many covariates are observed (e.g., in high-dimensional settings) and the confounding is fairly dense. Even when these assumptions are violated, little appears to be lost: the deconfounded models will, in general, estimate a function closer to the true one than classical least squares optimization. SDModels provides the functions SDAM() for Spectrally Deconfounded Additive Models (Scheidegger, Guo, and Bühlmann (2025) <doi:10.1145/3711116>) and SDForest() for Spectrally Deconfounded Random Forests (Ulmer, Scheidegger, and Bühlmann (2025) <doi:10.1080/10618600.2025.2569602>).
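The spectral deconfounding idea of Ćevid, Bühlmann, and Meinshausen (2020) can be sketched with NumPy: the "trim" transform caps the singular values of X at their median, and a regression is then fitted to the transformed data. This sketch assumes a full-rank, linear setting and uses ordinary least squares where the paper uses the Lasso; the function names are hypothetical, and SDAM() and SDForest() apply the transform to additive models and random forests rather than to a linear fit.

```python
import numpy as np

def trim_transform(X):
    """Spectral 'trim' transform F = U diag(min(d, median(d)) / d) U^T."""
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    tau = np.median(d)
    shrink = np.minimum(d, tau) / d  # assumes X has full rank (no zero singular values)
    return U @ np.diag(shrink) @ U.T

def deconfounded_least_squares(X, y):
    """Least squares on the transformed data (F X, F y); ordinary lstsq in place of the Lasso."""
    F = trim_transform(X)
    coef, *_ = np.linalg.lstsq(F @ X, F @ y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n, p, q = 300, 50, 2
H = rng.normal(size=(n, q))                        # hidden confounders
X = H @ rng.normal(size=(q, p)) + rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [1.0, -0.5, 2.0]                        # sparse direct effect
y = X @ beta + H @ np.array([3.0, -3.0]) + rng.normal(size=n)
print(np.round(deconfounded_least_squares(X, y)[:3], 2))
```

Capping the largest singular values dampens the few strong directions that dense confounding induces in X, while leaving the remaining directions untouched.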
General linear modeling with multiple responses (MANCOVA). An overall p-value for each model term is calculated by the 50-50 MANOVA method by Langsrud (2002) <doi:10.1111/1467-9884.00320>, which handles collinear responses. Rotation testing, described by Langsrud (2005) <doi:10.1007/s11222-005-4789-5>, is used to compute adjusted single response p-values according to familywise error rates and false discovery rates (FDR). The approach to FDR is described in the appendix of Moen et al. (2005) <doi:10.1128/AEM.71.4.2086-2094.2005>. Unbalanced designs are handled by Type II sums of squares as argued in Langsrud (2003) <doi:10.1023/A:1023260610025>. Furthermore, the Type II philosophy is extended to continuous design variables as described in Langsrud et al. (2007) <doi:10.1080/02664760701594246>. This means that the method is invariant to scale changes and that common pitfalls are avoided.
Datasets and functions for the book "Modélisation statistique par la pratique avec R", F. Bertrand, E. Claeys and M. Maumy-Bertrand (2019, ISBN:9782100793525, Dunod, Paris). The first chapter of the book is an introduction to the R statistical software. The second chapter deals with correlation analysis: Pearson, Spearman and Kendall simple, multiple and partial correlation coefficients. New wrapper functions for permutation tests and bootstrapping of correlation matrices are provided with the package. The third chapter is dedicated to data exploration with factorial analyses (PCA, CA, MCA, MDA) and clustering. The fourth chapter is dedicated to regression analysis, detailing model fitting and diagnostics. The exercises focus on analysis of covariance, logistic regression, Poisson regression, and two-way analysis of variance for fixed or random factors. Various example datasets are shipped with the package, for instance on Pokémon, World of Warcraft, household tasks, and food nutrition analyses.
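The package's own wrapper functions are R functions. Purely as an illustration of the underlying technique, here is a minimal Python sketch of a two-sided permutation test for Pearson's correlation. The function name, the default of 999 permutations, and the add-one p-value correction are choices made for this sketch, not details taken from the book or package.

```python
import numpy as np

def permutation_test_corr(x, y, n_perm=999, seed=0):
    """Two-sided permutation test for Pearson's r; returns (r, p-value)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    r_obs = np.corrcoef(x, y)[0, 1]
    exceed = 0
    for _ in range(n_perm):
        # Shuffling y breaks any association while keeping both marginals intact.
        r_perm = np.corrcoef(x, rng.permutation(y))[0, 1]
        if abs(r_perm) >= abs(r_obs):
            exceed += 1
    # Count the observed statistic as one of the permutations.
    return r_obs, (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(42)
x = rng.normal(size=30)
y = 0.8 * x + rng.normal(scale=0.5, size=30)
r, p = permutation_test_corr(x, y)
print(round(r, 3), p)
```

Applying the same test to every pair of columns yields a permutation p-value matrix to accompany a correlation matrix.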
Multilevel models (mixed effects models) are the statistical tool of choice for analyzing multilevel data (Searle et al., 2009). These models account for the correlated nature of observations within higher-level units by adding group-level error terms that augment the single residual error term of a standard OLS regression. Multilevel and mixed effects models often require specialized data pre-processing, as well as post-estimation derivations and graphics, to gain insight into model results. The package presented here, mlmtools, is a suite of pre- and post-estimation tools for multilevel models in R. The package implements post-estimation tools designed to work with models estimated using the lmer() function of lme4 (Bates et al., 2015), which fits linear mixed effects regression models. References: Searle, S. R., Casella, G., & McCulloch, C. E. (2009, ISBN:978-0470009598). Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015) <doi:10.18637/jss.v067.i01>.
The methods in this package are new non-parametric methods based on sequential normal scores (SNS; Conover et al. (2017) <doi:10.1080/07474946.2017.1360091>), designed for sequences of observations, usually time series data, which may occur singly or in batches and may be univariate or multivariate. These methods are designed to detect changes in the process over the observed time, which may occur as changes in location (mean or median), changes in scale (standard deviation or variance), or other changes of interest in the distribution of the observations. They are usually applied to large data sets, so computations need to be simple enough to run in a reasonable time on a computer and to be easily updated as each new observation (or batch of observations) becomes available. Further details and examples of SNS are presented in Conover et al. (2019) <arXiv:1901.04443>.
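A minimal sketch of the sequential normal score idea, assuming the definition in which each new observation is ranked among all observations seen so far (using mid-ranks for ties) and mapped to a standard normal quantile at (rank - 0.5)/i. The scores are then fed to a one-sided CUSUM chart. The reference value k = 0.5 and threshold h = 4 are arbitrary illustrative choices, both function names are hypothetical, and the batch and multivariate versions covered by the package are not shown.

```python
from statistics import NormalDist

def sequential_normal_scores(xs):
    """Normal score of each observation, ranked only against the data seen so far."""
    inv_cdf = NormalDist().inv_cdf
    scores = []
    for i, x in enumerate(xs, start=1):
        seen = xs[:i]
        n_less = sum(v < x for v in seen)
        n_equal = sum(v == x for v in seen)       # includes x itself
        rank = n_less + (n_equal + 1) / 2         # mid-rank handles ties
        scores.append(inv_cdf((rank - 0.5) / i))  # (rank - 0.5)/i lies strictly in (0, 1)
    return scores

def cusum_upper(scores, k=0.5, h=4.0):
    """One-sided upper CUSUM on the scores; returns the index of the first alarm or None."""
    s = 0.0
    for t, z in enumerate(scores):
        s = max(0.0, s + z - k)
        if s > h:
            return t
    return None

data = [1.0] * 30 + [10.0 + j for j in range(30)]  # upward shift after index 29
z = sequential_normal_scores(data)
print(z[0], cusum_upper(z))
```

Because each score depends only on ranks, the chart needs no assumption about the in-control distribution, and each new score is computed from the data already seen.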
User-friendly analysis of hierarchical multinomial processing tree (MPT) models that are often used in cognitive psychology. Implements the latent-trait MPT approach (Klauer, 2010) <DOI:10.1007/s11336-009-9141-0> and the beta-MPT approach (Smith & Batchelder, 2010) <DOI:10.1016/j.jmp.2009.06.007> to model heterogeneity of participants. MPT models are conveniently specified by an .eqn file as used by other MPT software, and data are provided as a .csv file or directly in R. Models are fitted either by calling JAGS or with an MPT-tailored Gibbs sampler in C++ (only for nonhierarchical and beta-MPT models). Provides tests of heterogeneity and MPT-tailored summaries and plotting functions. Detailed documentation is available in Heck, Arnold, & Arnold (2018) <DOI:10.3758/s13428-017-0869-7>, and a tutorial on MPT modeling can be found in Schmidt, Erdfelder, & Heck (2023) <DOI:10.1037/met0000561>.
We developed a lightweight machine learning tool for RNA profiling of acute lymphoblastic leukemia (ALL); however, it can be used for any problem where multiple classes need to be identified from multidimensional data. The methodology is described in Makinen V-P, Rehn J, Breen J, Yeung D, White DL (2022) Multi-cohort transcriptomic subtyping of B-cell acute lymphoblastic leukemia, International Journal of Molecular Sciences 23:4574, <doi:10.3390/ijms23094574>. The classifier contains optimized mean profiles of the classes (centroids) as observed in the training data, and new samples are matched to these centroids using the shortest Euclidean distance. Centroids derived from a dataset of 1,598 ALL patients are included, but users can also train the models with their own data. The output includes both numerical and visual presentations of the classification results. Samples with mixed features from multiple classes or atypical values are also identified.
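The matching step described above, in which each new sample is assigned to the nearest centroid by Euclidean distance, can be sketched in Python. Here centroids are plain per-class means, whereas the package uses optimized centroids trained on its own data; the detection of mixed or atypical samples is not shown, and the function names are hypothetical.

```python
import numpy as np

def train_centroids(X, labels):
    """Class centroids as per-class mean profiles (plain means, no optimisation step)."""
    X = np.asarray(X, float)
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    centroids = np.vstack([X[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def classify(X_new, classes, centroids):
    """Assign each sample to the centroid with the smallest Euclidean distance."""
    X_new = np.asarray(X_new, float)
    # distances has shape (n_samples, n_classes)
    distances = np.linalg.norm(X_new[:, None, :] - centroids[None, :, :], axis=2)
    return [classes[i] for i in distances.argmin(axis=1)], distances

X = [[0, 0], [0, 1], [10, 10], [10, 11]]
y = ["A", "A", "B", "B"]
classes, cents = train_centroids(X, y)
pred, dist = classify([[1, 1], [9, 9]], classes, cents)
print(pred)  # ['A', 'B']
```

Returning the full distance matrix, not just the winning label, leaves room for flagging samples that sit close to more than one centroid.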
This package provides functions for computing test subscores using different methods in both classical test theory (CTT) and item response theory (IRT). It supports three types of subscoring methods within the frameworks of CTT and IRT: (1) Wainer's augmentation method (Wainer et al., 2001) <doi:10.4324/9781410604729>, (2) Haberman's subscoring methods (Haberman, 2008) <doi:10.3102/1076998607302636>, and (3) Yen's objective performance index (OPI; Yen, 1987) <https://www.ets.org/research/policy_research_reports/publications/paper/1987/hrap>. It also includes functions to compute the Proportional Reduction of Mean Squared Errors (PRMSEs) in Haberman's methods, which are used to examine whether test subscores have added value. In addition, the package includes a function to assess the local independence assumption of IRT with Yen's Q3 statistic (Yen, 1984 <doi:10.1177/014662168400800201>; Yen, 1993 <doi:10.1111/j.1745-3984.1993.tb00423.x>).
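Yen's Q3 is the correlation, across persons, between item residuals (observed minus model-expected scores). The sketch below computes it under a Rasch model with abilities and difficulties treated as known, which is a simplifying assumption: in practice these are estimated from the data, and the package may use other IRT models. The function names are hypothetical.

```python
import numpy as np

def rasch_expected(theta, b):
    """Expected item scores under a Rasch model: P(X=1) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def yen_q3(responses, expected):
    """Yen's Q3: correlations between item residuals (observed minus expected) across persons."""
    residuals = np.asarray(responses, float) - np.asarray(expected, float)
    return np.corrcoef(residuals, rowvar=False)

rng = np.random.default_rng(7)
theta = rng.normal(size=500)                 # person abilities (treated as known)
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])    # item difficulties (treated as known)
P = rasch_expected(theta, b)
X = (rng.random(P.shape) < P).astype(int)    # simulated 0/1 responses
q3 = yen_q3(X, P)
off_diag = q3[~np.eye(len(b), dtype=bool)]
print(np.round(off_diag.max(), 3))
```

Under local independence the off-diagonal Q3 values scatter around zero; large positive values for an item pair suggest a shared dependency beyond the latent trait.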
The igblastr package provides functions to conveniently install and use a local IgBLAST installation from within R. The package also includes a set of built-in IgBLAST-compatible germline databases from OGRDB, the AIRR Community’s Open Germline Receptor Database, for various organisms. It provides functions to create additional IgBLAST-compatible germline databases using reference sequences retrieved from IMGT/V-QUEST or local FASTA files supplied by the user. When possible, the FWR/CDR boundaries on the V alleles (a.k.a. "internal data") are computed and stored in the germline database, so they can be used as a replacement for the internal data shipped with IgBLAST. IgBLAST is described at <https://pubmed.ncbi.nlm.nih.gov/23671333/>. IgBLAST web interface: <https://www.ncbi.nlm.nih.gov/igblast/>. OGRDB: <https://ogrdb.airr-community.org/>. IMGT/V-QUEST download site: <https://www.imgt.org/download/V-QUEST/>.
During the design stage of an observational study, matching or weighting methods are applied. However, when many background variables are present, deciding which variables to prioritize for matching or weighting is not trivial. Joint treatment-outcome variable importance plots are therefore provided to guide variable selection. These plots enhance variable comparisons via unadjusted bias curves derived under the omitted variable bias framework, and they translate variable importance into recommended values for tuning parameters in existing methods. Post-matching and post-weighting plots can also be used to visualize and assess the quality of the observational study design. The motivation for and derivation of the method are presented in "Prioritizing Variables for Observational Study Design using the Joint Variable Importance Plot" by Liao et al. (2024) <doi:10.1080/00031305.2024.2303419>. See the package paper by Liao and Pimentel (2024) <doi:10.21105/joss.06093> for a beginner-friendly introduction.
This package contains functions for multiple imputation that complement existing functionality in R. In particular, several imputation methods for the mice package (van Buuren & Groothuis-Oudshoorn, 2011, <doi:10.18637/jss.v045.i03>) are implemented. Main features of the miceadds package include plausible value imputation (Mislevy, 1991, <doi:10.1007/BF02294457>), multilevel imputation for variables at any level or with any number of hierarchical and non-hierarchical levels (Grund, Luedtke & Robitzsch, 2018, <doi:10.1177/1094428117703686>; van Buuren, 2018, Ch.7, <doi:10.1201/9780429492259>), imputation using partial least squares (PLS) for high-dimensional predictors (Robitzsch, Pham & Yanagida, 2016), nested multiple imputation (Rubin, 2003, <doi:10.1111/1467-9574.00217>), substantive model compatible imputation (Bartlett et al., 2015, <doi:10.1177/0962280214521348>), and features for the generation of synthetic datasets (Reiter, 2005, <doi:10.1111/j.1467-985X.2004.00343.x>; Nowok, Raab, & Dibben, 2016, <doi:10.18637/jss.v074.i11>).
This package provides functions to compute various clinical scores used in healthcare. These include the Charlson Comorbidity Index (CCI), which predicts 10-year survival in patients with multiple comorbidities; the EPICES score, an individual indicator of precariousness that considers its multidimensional nature; the MELD score for chronic liver disease severity; the Alternative Fistula Risk Score (a-FRS) for postoperative pancreatic fistula risk; and the Distal Pancreatectomy Fistula Risk Score (D-FRS) for risk following distal pancreatectomy. For detailed methodology, refer to Charlson et al. (1987) <doi:10.1016/0021-9681(87)90171-8>, Sass et al. (2006) <doi:10.1007/s10332-006-0131-5>, Kamath et al. (2001) <doi:10.1053/jhep.2001.22172>, Kim et al. (2008) <doi:10.1056/NEJMoa0801209>, Kim et al. (2021) <doi:10.1053/j.gastro.2021.08.050>, Mungroop et al. (2019) <doi:10.1097/SLA.0000000000002620>, and de Pastena et al. (2023) <doi:10.1097/SLA.0000000000005497>.
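As an example of one of these scores, here is a sketch of the classic MELD formula in its UNOS formulation: 3.78 ln(bilirubin) + 11.2 ln(INR) + 9.57 ln(creatinine) + 6.43, with laboratory values in mg/dL, values below 1.0 set to 1.0, and creatinine capped at 4.0 mg/dL. The package may implement a different or updated MELD variant, so treat this as an illustration of the arithmetic rather than a reference implementation. The function name and the dialysis flag are hypothetical.

```python
import math

def meld_score(bilirubin_mg_dl, inr, creatinine_mg_dl, dialysis_twice_past_week=False):
    """Classic (UNOS-style) MELD score, rounded half up to the nearest integer."""
    bili = max(bilirubin_mg_dl, 1.0)   # values below 1.0 are set to 1.0
    inr = max(inr, 1.0)
    creat = max(creatinine_mg_dl, 1.0)
    if dialysis_twice_past_week:
        creat = 4.0                    # recent dialysis sets creatinine to 4.0 mg/dL
    creat = min(creat, 4.0)            # creatinine is capped at 4.0 mg/dL
    score = 3.78 * math.log(bili) + 11.2 * math.log(inr) + 9.57 * math.log(creat) + 6.43
    return math.floor(score + 0.5)

print(meld_score(1.0, 1.0, 1.0))  # 6
print(meld_score(2.0, 1.0, 1.0))  # 9
```

The lower bound of 1.0 keeps the logarithms non-negative, so the minimum possible score is 6.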
Extensive functions for bivariate copula (bicopula) computations and related operations in bicopula theory. The lower, upper, product, and select other bicopulas are implemented, along with operations including the diagonal, survival copula, dual of a copula, co-copula, and numerical bicopula density. Level sets and horizontal and vertical sections are supported. Numerical derivatives and inverses of a bicopula are provided, through which simulation is implemented. Bicopula composition, convex combination, asymmetry extension, and products are also provided. Support extends to the Kendall function as well as its L-moments. Kendall Tau, Spearman Rho and Footrule, Gini Gamma, Blomqvist Beta, Hoeffding Phi, Schweizer-Wolff Sigma, tail dependency, tail order, skewness, and bivariate L-moments are implemented, and positive/negative quadrant dependency and left (right) increasing (decreasing) properties are available. Other features include Kullback-Leibler divergence, the Vuong procedure, spectral measure, and L-comoments for fit and inference, L-comoment ratio diagrams, maximum likelihood, and AIC, BIC, and RMSE for goodness-of-fit.
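Several of the listed objects have simple closed forms. The sketch below implements the lower (W), upper (M), and product (P) bicopulas, the diagonal, survival copula, dual, and co-copula, plus Spearman's rho computed by numerical integration as rho = 12 ∫∫ C(u, v) du dv - 3. It is a plain NumPy illustration with hypothetical function names, not the package's API.

```python
import numpy as np

# Fréchet-Hoeffding lower bound (W), upper bound (M), and independence (P) copulas.
def W(u, v):
    return np.maximum(u + v - 1.0, 0.0)

def M(u, v):
    return np.minimum(u, v)

def P(u, v):
    return u * v

def diagonal(C, t):
    """Diagonal section delta(t) = C(t, t)."""
    return C(t, t)

def survival(C):
    """Survival copula: u + v - 1 + C(1 - u, 1 - v)."""
    return lambda u, v: u + v - 1.0 + C(1.0 - u, 1.0 - v)

def dual(C):
    """Dual of a copula: u + v - C(u, v)."""
    return lambda u, v: u + v - C(u, v)

def co_copula(C):
    """Co-copula: 1 - C(1 - u, 1 - v)."""
    return lambda u, v: 1.0 - C(1.0 - u, 1.0 - v)

def spearman_rho(C, n=200):
    """Spearman's rho = 12 * (integral of C over the unit square) - 3, via the midpoint rule."""
    g = (np.arange(n) + 0.5) / n
    U, V = np.meshgrid(g, g)
    return 12.0 * C(U, V).mean() - 3.0

print(round(spearman_rho(P), 4), round(spearman_rho(M), 4), round(spearman_rho(W), 4))
```

The three printed values approximate 0, 1, and -1, the Spearman's rho values for independence and for perfect positive and negative dependence.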
Computes fungible coefficients and Monte Carlo data. Underlying theory for these functions is described in the following publications: Waller, N. (2008). Fungible Weights in Multiple Regression. Psychometrika, 73(4), 691-703, <DOI:10.1007/s11336-008-9066-z>. Waller, N. & Jones, J. (2009). Locating the Extrema of Fungible Regression Weights. Psychometrika, 74(4), 589-602, <DOI:10.1007/s11336-008-9087-7>. Waller, N. G. (2016). Fungible Correlation Matrices: A Method for Generating Nonsingular, Singular, and Improper Correlation Matrices for Monte Carlo Research. Multivariate Behavioral Research, 51(4), 554-568. Jones, J. A. & Waller, N. G. (2015). The normal-theory and asymptotic distribution-free (ADF) covariance matrix of standardized regression coefficients: theoretical extensions and finite sample behavior. Psychometrika, 80, 365-378, <DOI:10.1007/s11336-013-9380-y>. Waller, N. G. (2018). Direct Schmid-Leiman transformations and rank-deficient loadings matrices. Psychometrika, 83, 858-870, <DOI:10.1007/s11336-017-9599-0>.
Approximate Bayesian regularization using Gaussian approximations. The input is a vector of estimates and a Gaussian error covariance matrix of the key parameters. Bayesian shrinkage is then applied to obtain parsimonious solutions. The method is described in Karimova, van Erp, Leenders, and Mulder (2024) <DOI:10.31234/osf.io/2g8qm>. Gibbs samplers are used for model fitting. The supported shrinkage priors are Gaussian (ridge) priors, Laplace (lasso) priors (Park and Casella, 2008 <DOI:10.1198/016214508000000337>), and horseshoe priors (Carvalho et al., 2010 <DOI:10.1093/biomet/asq017>). These priors include an option for grouped regularization of different subsets of parameters (Meier et al., 2008 <DOI:10.1111/j.1467-9868.2007.00627.x>). F priors are used for the penalty parameters lambda^2 (Mulder and Pericchi, 2018 <DOI:10.1214/17-BA1092>), which corresponds to half-Cauchy priors on lambda (Carvalho, Polson, and Scott, 2010 <DOI:10.1093/biomet/asq017>).
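For the simplest prior in the list, a Gaussian (ridge) prior with a fixed variance lambda^2, the Gaussian approximation gives a closed-form posterior that shows how the input estimates are shrunk according to their error covariance Sigma: the posterior precision is Sigma^-1 + I / lambda^2, and the posterior mean is that precision's inverse times Sigma^-1 times the estimates. This sketch assumes lambda^2 is fixed; the package instead places F priors on lambda^2, supports lasso and horseshoe priors, and fits models by Gibbs sampling. The function name is hypothetical.

```python
import numpy as np

def ridge_posterior(beta_hat, Sigma, lam2):
    """Posterior mean/cov of beta given beta_hat ~ N(beta, Sigma) and prior beta ~ N(0, lam2 * I)."""
    beta_hat = np.asarray(beta_hat, float)
    Sigma_inv = np.linalg.inv(np.asarray(Sigma, float))
    precision = Sigma_inv + np.eye(len(beta_hat)) / lam2  # posterior precision
    cov = np.linalg.inv(precision)
    mean = cov @ Sigma_inv @ beta_hat                     # shrinks beta_hat toward zero
    return mean, cov

Sigma = np.diag([0.1, 0.1])
mean, cov = ridge_posterior([1.0, 2.0], Sigma, lam2=0.1)
print(mean)
```

Here the prior variance equals the error variance, so each estimate is shrunk halfway toward zero; as lambda^2 grows the posterior mean approaches the input estimates.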
The msPurity R package was developed to (1) assess the spectral quality of fragmentation spectra by evaluating the "precursor ion purity", (2) process fragmentation spectra, and (3) perform spectral matching. What is precursor ion purity? What we call "precursor ion purity" is a measure of the contribution of a selected precursor peak in an isolation window used for fragmentation. The simple calculation involves dividing the intensity of the selected precursor peak by the total intensity of the isolation window. When assessing MS/MS spectra, this calculation is done before and after the MS/MS scan of interest, and the purity is interpolated at the recorded time of the MS/MS acquisition. Additionally, isotopic peaks can be removed, low-abundance peaks thought to contribute little to the resulting MS/MS spectra are removed, and the isolation efficiency of the mass spectrometer can be used to normalise the intensities used for the calculation.
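The purity calculation and the time interpolation described above can be sketched as follows. The ppm tolerance used to decide which peaks belong to the precursor, the explicit window bounds, and the function names are assumptions for this sketch; isotope removal, low-abundance filtering, and isolation-efficiency normalisation are omitted.

```python
import numpy as np

def precursor_purity(mz, intensity, precursor_mz, window_low, window_high, ppm_tol=10.0):
    """Precursor intensity divided by total intensity inside the isolation window."""
    mz = np.asarray(mz, float)
    intensity = np.asarray(intensity, float)
    in_window = (mz >= window_low) & (mz <= window_high)
    total = intensity[in_window].sum()
    if total == 0.0:
        return 0.0
    tol = precursor_mz * ppm_tol * 1e-6  # ppm tolerance converted to an m/z width
    is_precursor = in_window & (np.abs(mz - precursor_mz) <= tol)
    return intensity[is_precursor].sum() / total

def interpolate_purity(t_before, purity_before, t_after, purity_after, t_msms):
    """Linearly interpolate purity at the MS/MS acquisition time (requires t_before < t_after)."""
    return float(np.interp(t_msms, [t_before, t_after], [purity_before, purity_after]))

before = precursor_purity([500.0, 500.5, 501.0], [80.0, 10.0, 10.0], 500.0, 499.5, 501.5)
after = precursor_purity([500.0, 500.5, 501.0], [60.0, 20.0, 20.0], 500.0, 499.5, 501.5)
print(before, after, interpolate_purity(10.0, before, 12.0, after, 11.0))
```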
Functionality for reliability estimates. For unidimensional tests: coefficient alpha, Guttman's lambda-2/-4/-6, the greatest lower bound, and coefficient omega_u (unidimensional), in a Bayesian and a frequentist version. For multidimensional tests: omega_t (total) and omega_h (hierarchical). The results include confidence and credible intervals, the probability of a coefficient being larger than a cutoff, and a check for the factor models necessary for the omega coefficients. The Bayesian unidimensional estimates, except for omega_u, are obtained by sampling from the posterior inverse Wishart distribution for the covariance matrix-based measures (see Murphy, 2007, <https://groups.seas.harvard.edu/courses/cs281/papers/murphy-2007.pdf>). The Bayesian omegas (u, t, and h) are obtained by Gibbs sampling from the conditional posterior distributions of (1) the single-factor model, (2) the second-order factor model, (3) the bi-factor model, and (4) the correlated factor model (Lee, 2007, <doi:10.1002/9780470024737>).
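For orientation, the frequentist point estimates of two of the listed unidimensional coefficients, alpha and Guttman's lambda-2, can be computed directly from the item covariance matrix. This sketch covers point estimates only (no intervals and none of the Bayesian sampling), and the function names are hypothetical.

```python
import numpy as np

def cronbach_alpha(X):
    """Coefficient alpha from an (n persons x k items) score matrix."""
    C = np.cov(np.asarray(X, float), rowvar=False)
    k = C.shape[0]
    return k / (k - 1) * (1.0 - np.trace(C) / C.sum())  # C.sum() = variance of the sum score

def guttman_lambda2(X):
    """Guttman's lambda-2, which is never smaller than alpha."""
    C = np.cov(np.asarray(X, float), rowvar=False)
    k = C.shape[0]
    total = C.sum()                       # variance of the sum score
    off = C - np.diag(np.diag(C))         # off-diagonal covariances only
    lambda1 = 1.0 - np.trace(C) / total
    return lambda1 + np.sqrt(k / (k - 1) * (off ** 2).sum()) / total

X_same = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]])  # perfectly consistent items
rng = np.random.default_rng(3)
latent = rng.normal(size=(200, 1))
X = latent + rng.normal(scale=1.0, size=(200, 6))                # six noisy parallel items
print(round(cronbach_alpha(X), 3), round(guttman_lambda2(X), 3))
```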
Programmatic connection to the OpenAltimetry API <https://openaltimetry.earthdatacloud.nasa.gov/data/openapi/swagger-ui/index.html/> to download and process ATL03 (Global Geolocated Photon Data), ATL06 (Land Ice Height), ATL07 (Sea Ice Height), ATL08 (Land and Vegetation Height), ATL10 (Sea Ice Freeboard), ATL12 (Ocean Surface Height), and ATL13 (Inland Water Surface Height) ICESat-2 altimeter data. The user can download the data by selecting a bounding box from a global 1- or 5-degree grid using a Shiny application. The ICESat-2 mission collects altimetry data of the Earth's surface. The sole instrument on ICESat-2 is the Advanced Topographic Laser Altimeter System (ATLAS), which measures ice sheet elevation change and sea ice thickness while also generating an estimate of global vegetation biomass. ICESat-2 continues the important observations of ice-sheet elevation change, sea-ice freeboard, and vegetation canopy height begun by ICESat in 2003.