Implements a new stopping rule to detect anomalies in the covariance structure of high-dimensional online data. The detection procedure can be applied to Gaussian or non-Gaussian data with a large number of components. Moreover, it allows both spatial and temporal dependence in the data. The dependence can be estimated by a data-driven procedure. The threshold level in the stopping rule can be determined for a pre-selected average run length. More details can be found in Li, L. and Li, J. (2020) "Online Change-Point Detection in High-Dimensional Covariance Structure with Application to Dynamic Networks" <arXiv:1911.07762>.
Implementation of the semi-parametric proportional-hazards (PH) cure model of Sy and Taylor (2000) <doi:10.1111/j.0006-341X.2000.00227.x> extended to time-varying covariates. Estimation and variable selection are based on the methodology described in Beretta and Heuchenne (2019) <doi:10.1080/02664763.2018.1554627>; confidence intervals of the parameter estimates may be computed using a bootstrap approach. Moreover, data following the PH cure model may be simulated using a method similar to Hendry (2014) <doi:10.1002/sim.5945>, where the event times are generated on a continuous scale from a piecewise exponential distribution conditional on time-varying covariates.
An implementation of the data processing and data analysis portion of the PepSAVI-MS pipeline, which is currently under development by the Hicks laboratory at the University of North Carolina. The statistical analysis package presented herein provides a collection of software tools used to facilitate the prioritization of putative bioactive peptides from a complex biological matrix. Tools are provided to deconvolute mass spectrometry features into a single representation for each peptide charge state, filter compounds to include only those possibly contributing to the observed bioactivity, and prioritize the remaining compounds for those most likely contributing to each bioactivity data set.
Estimate specification models for the state-dependent level of an optimal quantile/expectile forecast. Wald tests and the test of overidentifying restrictions are implemented. The estimated specification model can be plotted. The package contains two data sets with forecasts and realizations: the daily accumulated precipitation at London, UK from the high-resolution model of the European Centre for Medium-Range Weather Forecasts (ECMWF, <https://www.ecmwf.int/>) and GDP growth Greenbook data by the US Federal Reserve. See Schmidt, Katzfuss and Gneiting (2015) <arXiv:1506.01917> for more details on the identification and estimation of a directive behind a point forecast.
Multivariate ordered probit model, i.e. the extension of the scalar ordered probit model where the observed variables have dimension greater than one. Estimation of the parameters is done via maximization of the pairwise likelihood, a special case of the composite likelihood obtained as the product of bivariate marginal distributions. The package uses the Fortran 77 subroutine SADMVN by Alan Genz, with minor adaptations made by Adelchi Azzalini in his "mnormt" package, for evaluating the two-dimensional Gaussian integrals involved in the pairwise log-likelihood. Optimization of the latter objective function is performed via a quasi-Newton box-constrained optimization algorithm, as implemented in nlminb.
This package provides functions for making run charts, Shewhart control charts and Pareto charts for continuous quality improvement. Included control charts are: I, MR, Xbar, S, T, C, U, U', P, P', and G charts. Non-random variation in the form of minor to moderate persistent shifts in data over time is identified by the Anhoej rules for unusually long runs and unusually few crossings [Anhoej, Olesen (2014) <doi:10.1371/journal.pone.0113825>]. Non-random variation in the form of larger, possibly transient, shifts is identified by Shewhart's 3-sigma rule [Mohammed, Worthington, Woodall (2008) <doi:10.1136/qshc.2004.012047>].
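As a rough illustration of Shewhart's 3-sigma rule on an individuals (I) chart, the Python sketch below estimates sigma from the average moving range divided by the standard constant 1.128 and flags points outside the limits. The function names and the sample data are hypothetical, and the code is not taken from the package itself.

```python
from statistics import mean

def i_chart_limits(values):
    """Return (centre, lower, upper) 3-sigma limits for an individuals chart."""
    centre = mean(values)
    # Moving ranges: absolute differences between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # 1.128 is the d2 constant for moving ranges of two observations.
    sigma = mean(moving_ranges) / 1.128
    return centre, centre - 3 * sigma, centre + 3 * sigma

def signals(values):
    """Indices of points falling outside the 3-sigma limits."""
    _, lower, upper = i_chart_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Made-up measurements with one large transient shift.
data = [10, 11, 9, 10, 12, 10, 9, 11, 10, 25, 10, 11]
print(i_chart_limits(data))
print(signals(data))
```

Only the tenth observation (index 9) lies beyond the upper limit here, which is the kind of large, possibly transient shift the 3-sigma rule is meant to catch.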
This package creates a wrapper for the SuiteSparse routines that execute the Takahashi equations. These equations compute the elements of the inverse of a sparse matrix at locations where its Cholesky factor is structurally non-zero. The resulting matrix is known as a sparse inverse subset. Some helper functions are also implemented. Support for spam matrices is currently limited and will be implemented in the future. See Rue and Martino (2007) <doi:10.1016/j.jspi.2006.07.016> and Zammit-Mangion and Rougier (2018) <doi:10.1016/j.csda.2018.02.001> for the application of these equations to statistics.
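To make the idea of a sparse inverse subset concrete, here is a minimal pure-Python sketch of the Takahashi recursion on a small dense-stored matrix. It assumes a symmetric positive-definite input, factorizes it as L D L^T, and uses numerically non-zero entries of L as a stand-in for the structural pattern (exact cancellation in L would break that assumption). The function names are hypothetical and nothing here reflects the SuiteSparse routines the package wraps.

```python
def ldl(a):
    """Factor a symmetric positive-definite matrix as L D L^T (L unit lower)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(low[j][k] ** 2 * d[k] for k in range(j))
        low[j][j] = 1.0
        for i in range(j + 1, n):
            s = sum(low[i][k] * low[j][k] * d[k] for k in range(j))
            low[i][j] = (a[i][j] - s) / d[j]
    return low, d

def sparse_inverse_subset(a):
    """Entries (i, j), i <= j, of inverse(a) on the non-zero pattern of L."""
    n = len(a)
    low, d = ldl(a)
    z = {}

    def get(r, c):
        # The inverse is symmetric, so only the upper triangle is stored.
        return z[(min(r, c), max(r, c))]

    for i in range(n - 1, -1, -1):
        below = [k for k in range(i + 1, n) if low[k][i] != 0.0]
        # Work from the last column back to the diagonal of row i.
        for j in sorted([i] + below, reverse=True):
            s = sum(low[k][i] * get(k, j) for k in below)
            z[(i, j)] = (1.0 / d[i] if i == j else 0.0) - s
    return z

# Tridiagonal example: its full inverse is dense, but only the
# tridiagonal entries are computed.
a = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
subset = sparse_inverse_subset(a)
print(subset)
```

For this matrix the entry at position (0, 2) of the inverse is non-zero but is never computed, because the corresponding entry of the Cholesky factor is zero.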
Data from multi-environment agronomic trials, which are often carried out by plant breeders, can be analyzed with the tools offered by this package, such as the Additive Main effects and Multiplicative Interaction model or AMMI (Gauch 1992, ISBN:9780444892409) and the Site Regression model or SREG (Cornelius 1996, <doi:10.1201/9780367802226>). Since these methods perform poorly in the presence of outliers and missing values, this package includes robust versions of the AMMI model (Rodrigues 2016, <doi:10.1093/bioinformatics/btv533>), and also imputation techniques specifically developed for this kind of data (Arciniegas-Alarcón 2014, <doi:10.2478/bile-2014-0006>).
The HURRECON model estimates wind speed, wind direction, enhanced Fujita scale wind damage, and duration of EF0 to EF5 winds as a function of hurricane location and maximum sustained wind speed. Results may be generated for a single site or an entire region. Hurricane track and intensity data may be imported directly from the US National Hurricane Center's HURDAT2 database. For details on the original version of the model written in Borland Pascal, see: Boose, Chamberlin, and Foster (2001) <doi:10.1890/0012-9615(2001)071[0027:LARIOH]2.0.CO;2> and Boose, Serrano, and Foster (2004) <doi:10.1890/02-4057>.
Different algorithms to perform approximate joint diagonalization of a finite set of square matrices. Depending on the algorithm, an orthogonal or non-orthogonal diagonalizer is found. These algorithms are particularly useful in the context of blind source separation. Original publications of the algorithms can be found in Ziehe et al. (2004), Pham and Cardoso (2001) <doi:10.1109/78.942614>, Souloumiac (2009) <doi:10.1109/TSP.2009.2016997>, Vollgraff and Obermayer <doi:10.1109/TSP.2006.877673>. An example of application in the context of Brain-Computer Interfaces EEG denoising can be found in Gouy-Pailler et al. (2010) <doi:10.1109/TBME.2009.2032162>.
The primary purpose of lavaan.mi is to extend the functionality of the R package 'lavaan', which implements structural equation modeling (SEM). When incomplete data have been multiply imputed, the imputed data sets can be analyzed by lavaan using complete-data estimation methods, but results must be pooled across imputations (Rubin, 1987, <doi:10.1002/9780470316696>). The lavaan.mi package automates the pooling of point and standard-error estimates, as well as a variety of test statistics, using a familiar interface that allows users to fit an SEM to multiple imputations as they would to a single data set using the lavaan package.
Identification of ring borders on scanned image sections from dendrochronological samples. Processing of image reflectances to produce gray matrices and time series of smoothed gray values. Luminance data are plotted on segmented images so that users can either visually identify ring borders or check the automatic detection. Routines to visually include/exclude ring borders on the R graphical devices, or automatically detect ring borders using a linear detection algorithm. This algorithm detects ring borders according to positive/negative extreme values in the smoothed time series of gray values. Most of the in-package routines can be recursively implemented using the multiDetect() function.
The cartogram heatmaps generated by the included methods are an alternative to choropleth maps for the United States and are based on work by the Washington Post graphics department in their report on "The states most threatened by trade" (<http://www.washingtonpost.com/wp-srv/special/business/states-most-threatened-by-trade/>). "State bins" preserve as much of the geographic placement of the states as possible but have the look and feel of a traditional heatmap. Functions are provided that allow for use of a binned, discrete scale, a continuous scale or manually specified colors depending on what is needed for the underlying data.
This package provides functions to filter GPS/Argos locations and to assess the sample size for the analysis of animal distributions. The filters remove temporal and spatial duplicates, fixes located at a given height from the estimated high tide line, and locations with high error as described in Shimada et al. (2012) <doi:10.3354/meps09747> and Shimada et al. (2016) <doi:10.1007/s00227-015-2771-0>. Sample size for the analysis of animal distributions can be assessed by the conventional area-based approach or the alternative probability-based approach as described in Shimada et al. (2021) <doi:10.1111/2041-210X.13506>.
Designed to simplify and streamline the process of reading and processing large volumes of data in R, this package offers a collection of functions tailored for bulk data operations. It enables users to efficiently read multiple sheets from Microsoft Excel and Google Sheets workbooks, as well as various CSV files from a directory. The data is returned as organized data frames, facilitating further analysis and manipulation. Ideal for handling extensive data sets or batch processing tasks, bulkreadr empowers users to manage data in bulk effortlessly, saving time and effort in data preparation workflows. Additionally, the package seamlessly works with labelled data from SPSS and Stata.
Provides convenient access to more than 17 million records from the Chilean Census 2017 database. The datasets were imported from the official DVD provided by the Chilean National Bureau of Statistics (INE) using the REDATAM converter created by Pablo De Grande, and the package also includes the maps accompanying these datasets. The package is intentionally documented in ASCII-only Spanish so that it works without problems on different platforms.
Estimates RxC (R by C) vote transfer matrices (ecological contingency tables) from aggregate data using the model described in Forcina et al. (2012), as an extension of the model proposed in Brown and Payne (1986). Allows incorporation of covariates. References: Brown, P. and Payne, C. (1986). "Aggregate data, ecological regression and voting transitions". Journal of the American Statistical Association, 81, 453-460. <DOI:10.1080/01621459.1986.10478290>. Forcina, A., Gnaldi, M. and Bracalente, B. (2012). "A revised Brown and Payne model of voting behaviour applied to the 2009 elections in Italy". Statistical Methods & Applications, 21, 109-119. <DOI:10.1007/s10260-011-0184-x>.
This package provides functions to apply spatial fuzzy unsupervised classification and to visualize and interpret the results. This method is well suited when the user wants to analyze data with a fuzzy clustering algorithm and to account for the spatial dimension of the dataset. In addition, indexes for estimating the spatial consistency and classification quality are proposed. The methods were originally proposed in the field of brain imaging (see Cai et al. 2007 <doi:10.1016/j.patcog.2006.07.011> and Zhao et al. 2013 <doi:10.1016/j.dsp.2012.09.016>) and recently applied in geography (see Gelb and Apparicio <doi:10.4000/cybergeo.36414>).
This package provides functions to estimate the intrinsic dimension of a dataset via likelihood-based approaches. Specifically, the package implements the TWO-NN and Gride estimators and the Hidalgo Bayesian mixture model. In addition, the first reference contains an extended vignette on the usage of the TWO-NN and Hidalgo models. References: Denti (2023, <doi:10.18637/jss.v106.i09>); Allegra et al. (2020, <doi:10.1038/s41598-020-72222-0>); Denti et al. (2022, <doi:10.1038/s41598-022-20991-1>); Facco et al. (2017, <doi:10.1038/s41598-017-11873-y>); Santos-Fernandez et al. (2021, <doi:10.1038/s41598-022-20991-1>).
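A minimal sketch of the idea behind the TWO-NN estimator may help here. Under the model's assumption of locally uniform density, the ratio of the second to the first nearest-neighbour distance follows a Pareto distribution whose shape parameter is the intrinsic dimension, so a maximum-likelihood estimate is n divided by the sum of the log ratios. The Python code below uses brute-force distances and hypothetical names; it is not the package's implementation, which may differ in details such as trimming or the fitting method.

```python
import math
import random

def two_nn_mle(points):
    """Maximum-likelihood TWO-NN estimate of the intrinsic dimension."""
    log_ratios = []
    for i, p in enumerate(points):
        # Brute-force distances from p to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        log_ratios.append(math.log(r2 / r1))
    # Pareto shape MLE: number of points over the sum of log ratios.
    return len(points) / sum(log_ratios)

# Simulated points on a straight line embedded in three dimensions,
# so the intrinsic dimension is 1 although each point has 3 coordinates.
random.seed(1)
line = [(t, 2 * t, -t) for t in (random.random() for _ in range(200))]
d_hat = two_nn_mle(line)
print(round(d_hat, 2))
```

The estimate should land close to 1 for this example, since the data vary along a single direction regardless of the embedding dimension.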
This package provides modular, graph-based agents powered by large language models (LLMs) for intelligent task execution in R. Supports structured workflows for tasks such as forecasting, data visualization, feature engineering, data wrangling, data cleaning, 'SQL', code generation, weather reporting, and research-driven question answering. Each agent performs iterative reasoning: recommending steps, generating R code, executing, debugging, and explaining results. Includes built-in support for packages such as 'tidymodels', 'modeltime', 'plotly', 'ggplot2', and 'prophet'. Designed for analysts, developers, and teams building intelligent, reproducible AI workflows in R. Compatible with LLM providers such as 'OpenAI', 'Anthropic', 'Groq', and 'Ollama'. Inspired by the Python package 'langagent'.
Use probability theory under the Bayesian framework for calculating the risk of selecting candidates in a multi-environment context. Contained are functions used to fit a Bayesian multi-environment model (based on the available presets), extract posterior values and maximum posterior values, compute the variance components, check the model's convergence, and calculate the probabilities. For both across and within-environments scopes, the package computes the probability of superior performance and the pairwise probability of superior performance. Furthermore, the probability of superior stability and the pairwise probability of superior stability across environments are estimated. A joint probability of superior performance and stability is also provided.
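The probabilities described above can be approximated from posterior draws by simple counting. The Python sketch below assumes, for illustration only, that the probability of superior performance is the share of draws in which a candidate ranks among the top k, and that the pairwise probability is the share of draws in which one candidate beats another. The function names, candidate labels and draws are made up, and the package's exact definitions may differ.

```python
def prob_superior_performance(draws, names, top_k):
    """Share of posterior draws in which each candidate ranks in the top k."""
    counts = {name: 0 for name in names}
    for draw in draws:
        ranked = sorted(range(len(names)), key=lambda g: draw[g], reverse=True)
        for g in ranked[:top_k]:
            counts[names[g]] += 1
    return {name: c / len(draws) for name, c in counts.items()}

def pairwise_prob(draws, i, j):
    """Share of posterior draws in which candidate i outperforms candidate j."""
    return sum(draw[i] > draw[j] for draw in draws) / len(draws)

# Four made-up posterior draws of the genotypic values of three candidates.
names = ["G1", "G2", "G3"]
draws = [
    [5.1, 4.8, 4.0],
    [5.0, 5.2, 4.1],
    [5.3, 4.9, 4.2],
    [4.9, 5.0, 5.1],
]
print(prob_superior_performance(draws, names, top_k=1))
print(pairwise_prob(draws, 0, 1), pairwise_prob(draws, 0, 2))
```

With real output from a fitted model there would be thousands of draws rather than four, but the counting logic stays the same.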
Compositional data consisting of three parts can be color mapped with a ternary color scale. Such a scale is provided by the tricolore package with options for discrete and continuous colors, mean-centering and scaling. See Jonas Schöley (2021) "The centered ternary balance scheme. A technique to visualize surfaces of unbalanced three-part compositions" <doi:10.4054/DemRes.2021.44.19>, Jonas Schöley, Frans Willekens (2017) "Visualizing compositional data on the Lexis surface" <doi:10.4054/DemRes.2017.36.21>, and Ilya Kashnitsky, Jonas Schöley (2018) "Regional population structures at a glance" <doi:10.1016/S0140-6736(18)31194-2>.
This package implements the whitening methods (ZCA, PCA, Cholesky, ZCA-cor, and PCA-cor) discussed in Kessy, Lewin, and Strimmer (2018) "Optimal whitening and decorrelation", <doi:10.1080/00031305.2016.1277159>, as well as the whitening approach to canonical correlation analysis allowing negative canonical correlations described in Jendoubi and Strimmer (2019) "A whitening approach to probabilistic canonical correlation analysis for omics data integration", <doi:10.1186/s12859-018-2572-9>. The package also offers functions to simulate random orthogonal matrices, compute (correlation) loadings and explained variation. It also contains four example data sets (extended UCI wine data, TCGA LUSC data, nutrimouse data, extended pitprops data).
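As a small worked example of one of the listed methods, the Python sketch below performs Cholesky whitening on a 2x2 covariance matrix, assuming the common definition in which the whitening matrix W is the transpose of the lower Cholesky factor L of the precision matrix (so that L L^T equals the inverse covariance). The helper names and the covariance values are hypothetical, and the code is not the package's implementation.

```python
import math

def cholesky_whitening_2x2(sigma):
    """Whitening matrix W = L^T, where L L^T is the inverse of sigma."""
    (a, b), (_, d) = sigma
    det = a * d - b * b
    # Precision matrix entries (inverse of the 2x2 covariance).
    p11, p12, p22 = d / det, -b / det, a / det
    # Lower Cholesky factor L of the precision matrix.
    l11 = math.sqrt(p11)
    l21 = p12 / l11
    l22 = math.sqrt(p22 - l21 * l21)
    # W is the transpose of L, hence upper triangular.
    return [[l11, l21], [0.0, l22]]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(x):
    return [list(row) for row in zip(*x)]

sigma = [[4.0, 2.0], [2.0, 3.0]]  # made-up covariance matrix
w = cholesky_whitening_2x2(sigma)
# A valid whitening matrix satisfies W sigma W^T = identity.
check = matmul(matmul(w, sigma), transpose(w))
print(check)
```

The other whitening methods named above differ only in the choice of W; each one satisfies the same identity condition checked in the last step.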
Calculates key indicators such as fertility rates (Total Fertility Rate (TFR), General Fertility Rate (GFR), and Age Specific Fertility Rate (ASFR)) using Demographic and Health Survey (DHS) women/individual data, childhood mortality probabilities and rates such as Neonatal Mortality Rate (NNMR), Post-neonatal Mortality Rate (PNNMR), Infant Mortality Rate (IMR), Child Mortality Rate (CMR), and Under-five Mortality Rate (U5MR), and adult mortality indicators such as the Age Specific Mortality Rate (ASMR), Age Adjusted Mortality Rate (AAMR), Age Specific Maternal Mortality Rate (ASMMR), Age Adjusted Maternal Mortality Rate (AAMMR), Age Specific Pregnancy Related Mortality Rate (ASPRMR), Age Adjusted Pregnancy Related Mortality Rate (AAPRMR), Maternal Mortality Ratio (MMR) and Pregnancy Related Mortality Ratio (PRMR). In addition to the indicators, the DHS.rates package estimates sampling error indicators such as Standard Error (SE), Design Effect (DEFT), Relative Standard Error (RSE) and Confidence Interval (CI). The package is developed according to the DHS methodology of calculating the fertility indicators and the childhood mortality rates outlined in the "Guide to DHS Statistics" (Croft, Trevor N., Aileen M. J. Marshall, Courtney K. Allen, et al. 2018, <https://dhsprogram.com/Data/Guide-to-DHS-Statistics/index.cfm>) and the DHS methodology of estimating the sampling error indicators outlined in the "DHS Sampling and Household Listing Manual" (ICF International 2012, <https://dhsprogram.com/pubs/pdf/DHSM4/DHS6_Sampling_Manual_Sept2012_DHSM4.pdf>).
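To show how the fertility indicators relate to one another, here is a simplified Python sketch using the textbook definitions: the ASFR is births per 1,000 woman-years of exposure in an age group, and the TFR is the sum of the ASFRs over the seven five-year age groups (15-49), multiplied by five and expressed per woman. The function names, the birth counts, the exposure figures and the standard error are made up, and the sketch ignores the survey weights and exposure calculations of the actual DHS methodology.

```python
def asfr(births, woman_years):
    """Age-specific fertility rates per 1,000 woman-years of exposure."""
    return [1000 * b / e for b, e in zip(births, woman_years)]

def tfr(asfr_per_1000, width=5):
    """Total fertility rate per woman from rates for equal-width age groups."""
    return width * sum(asfr_per_1000) / 1000

def relative_standard_error(estimate, standard_error):
    """Relative standard error as a proportion of the estimate."""
    return standard_error / estimate

# Made-up births and woman-years of exposure for age groups 15-19 to 45-49.
births = [50, 180, 200, 150, 90, 40, 10]
exposure = [1000] * 7

rates = asfr(births, exposure)
total = tfr(rates)
print(rates)
print(total)
print(relative_standard_error(total, 0.12))  # 0.12 is a hypothetical SE
```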