Computes interdaily stability (IS), intradaily variability (IV) and the relative amplitude (RA) from actigraphy data as described in Blume et al. (2016) <doi: 10.1016/j.mex.2016.05.006> and van Someren et al. (1999) <doi: 10.3109/07420529908998724>. It also computes L5 (i.e. the 5 hours with the lowest average actigraphy amplitude) and M10 (the 10 hours with the highest average amplitude) as well as the respective start times. The flex versions will also compute the L-value for a user-defined number of minutes. IS describes the strength of coupling of a rhythm to supposedly stable zeitgebers. It varies between 0 (Gaussian noise) and 1 for perfect IS. IV describes the fragmentation of a rhythm, i.e. the frequency and extent of transitions between rest and activity. It is near 0 for a perfect sine wave, about 2 for Gaussian noise, and may be even higher when a definite ultradian period of about 2 hrs is present. RA is the relative amplitude of a rhythm. Note that to obtain reliable results, actigraphy data should cover a reasonable number of days.
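For readers new to these metrics, here is a minimal sketch of the underlying definitions computed directly from hourly-binned activity data; it illustrates the formulas from van Someren et al. (1999) rather than this package's own interface, and all object names are illustrative.

    # Hourly activity counts covering several full days (values are simulated).
    set.seed(1)
    act <- abs(sin(2 * pi * (0:(24 * 7 - 1)) / 24)) * 100 + abs(rnorm(24 * 7, sd = 10))
    n <- length(act)
    hourly_means <- tapply(act, (seq_len(n) - 1) %% 24, mean)  # average 24-h profile
    grand_mean <- mean(act)

    # IS: variance of the 24-h average profile relative to the overall variance.
    IS <- (n * sum((hourly_means - grand_mean)^2)) / (24 * sum((act - grand_mean)^2))

    # IV: mean squared successive difference relative to the overall variance.
    IV <- (n * sum(diff(act)^2)) / ((n - 1) * sum((act - grand_mean)^2))

    # M10/L5: the 10/5 consecutive hours with highest/lowest average activity
    # (windows wrap around midnight); RA is their relative amplitude.
    roll <- function(x, w) sapply(seq_along(x), function(i)
      mean(x[((i - 1 + 0:(w - 1)) %% length(x)) + 1]))
    M10 <- max(roll(hourly_means, 10))
    L5  <- min(roll(hourly_means, 5))
    RA  <- (M10 - L5) / (M10 + L5)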
Several tests for high dimensional generalized linear models have been proposed recently. In this package, we implemented a new test called adaptive sum of powered score (aSPU) for high dimensional generalized linear models, which is often more powerful than existing methods across a wide range of scenarios. We also implemented permutation-based versions of several existing methods for research purposes. We recommend that users use the aSPU test for their real testing problems. You can learn more about the tests implemented in the package via the following papers: 1. Pan, W., Kim, J., Zhang, Y., Shen, X. and Wei, P. (2014) <DOI:10.1534/genetics.114.165035> A powerful and adaptive association test for rare variants, Genetics, 197(4). 2. Guo, B., and Chen, S. X. (2016) <DOI:10.1111/rssb.12152>. Tests for high dimensional generalized linear models. Journal of the Royal Statistical Society: Series B. 3. Goeman, J. J., Van Houwelingen, H. C., and Finos, L. (2011) <DOI:10.1093/biomet/asr016>. Testing against a high-dimensional alternative in the generalized linear model: asymptotic type I error control. Biometrika, 98(2).
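As a rough illustration of the SPU/aSPU idea in Pan et al. (2014), the sketch below computes sum-of-powered-score statistics for a toy linear model with no nuisance covariates and combines them with permutation p-values; it is not this package's interface, and the simple score vector used here is an assumption made for brevity.

    set.seed(1)
    n <- 100; p <- 50
    X <- matrix(rnorm(n * p), n, p)
    y <- rnorm(n)                                    # data generated under the null

    spu <- function(U, gamma) if (is.finite(gamma)) sum(U^gamma) else max(abs(U))
    gammas <- c(1, 2, 3, 4, Inf)
    U_obs <- drop(crossprod(X, y - mean(y)))         # score vector
    T_obs <- sapply(gammas, spu, U = U_obs)

    B <- 1000                                        # permutations
    T_perm <- t(replicate(B, {
      yb <- sample(y)
      sapply(gammas, spu, U = drop(crossprod(X, yb - mean(yb))))
    }))

    # Per-gamma permutation p-values; aSPU takes their minimum and calibrates it
    # against the same minimum recomputed within the permutations.
    p_spu <- sapply(seq_along(gammas), function(k)
      (1 + sum(abs(T_perm[, k]) >= abs(T_obs[k]))) / (B + 1))
    p_within <- sapply(seq_along(gammas), function(k)
      rank(-abs(T_perm[, k]), ties.method = "min") / B)
    p_aspu <- (1 + sum(apply(p_within, 1, min) <= min(p_spu))) / (B + 1)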
This package provides a collection of analysis-ready datasets for the U.S. Geological Survey - Idaho National Laboratory (USGS-INL) groundwater and surface-water monitoring networks, administered by the USGS-INL Project Office in cooperation with the U.S. Department of Energy. The data collected from wells and surface-water stations at the Idaho National Laboratory and surrounding areas have been used to describe the effects of waste disposal on water contained in the eastern Snake River Plain aquifer, located in the southeastern part of Idaho, and the availability of water for long-term consumptive and industrial use. The package includes long-term monitoring records dating back to 1922. Geospatial data describing the areas from which samples were collected or observations were made are also included in the package. Bundling these data into a single package greatly reduces the data-processing burden for researchers and provides a way to distribute the data along with their documentation in a standard format. Geospatial datasets are made available in a common projection and datum, and geohydrologic data have been structured to facilitate analysis.
Assists actuaries and other insurance modellers in pricing, reserving and capital modelling for non-life insurance and reinsurance. Provides functions that help model excess levels, capping and pure Incurred But Not Reported claims (pure IBNR). Includes capped mean, exposure curves and increased limit factor (ILF) curves for LogNormal, Gamma, Pareto, Sliced LogNormal-Pareto and Sliced Gamma-Pareto distributions. Includes mean, probability density function (pdf), cumulative distribution function (cdf) and inverse cumulative distribution function for Sliced LogNormal-Pareto and Sliced Gamma-Pareto distributions. Includes calculation of pure IBNR exposure with LogNormal and Gamma distributions for the reporting delay. Includes three shiny tools: one to simulate insurance claims applying reinsurance structures, one to fit generalised linear models, and one to fit claims frequency or severity distributions. Methods used in the package refer to Free for All by Yiannis Parizas (2023) <https://www.theactuary.com/2023/03/02/free-all>; Escaping the triangle by Yiannis Parizas (2019) <https://www.theactuary.com/features/2019/06/2019/06/05/escaping-triangle>; Taken to excess by Yiannis Parizas (2019) <https://www.theactuary.com/features/2019/03/2019/03/06/taken-excess>.
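The capped mean and increased limit factor concepts mentioned above can be illustrated with a small Monte Carlo sketch; this is not the package's API, and the LogNormal parameters and limits below are arbitrary.

    set.seed(1)
    sev <- rlnorm(1e6, meanlog = 9, sdlog = 1.5)         # simulated ground-up severities

    capped_mean <- function(x, cap) mean(pmin(x, cap))   # E[min(X, cap)]

    base_limit <- 1e5
    ilf <- function(limit) capped_mean(sev, limit) / capped_mean(sev, base_limit)
    ilf(5e5)   # ILF for a 500k limit relative to a 100k base limit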
Doubly robust estimation and inference of the log hazard ratio under the Cox marginal structural model with informative censoring. The method is an augmented inverse probability weighted estimator that involves three working models: one for the conditional failure time T, one for the conditional censoring time C, and one for the propensity score. Both the models for T and C can depend on a binary treatment A and additional baseline covariates Z, while the propensity score model depends only on Z. With the help of cross-fitting techniques, the estimator achieves the rate doubly robust property, which allows the use of most machine learning or non-parametric methods for all three working models, something not permitted in classic inverse probability weighting or doubly robust estimators. When the proportional hazards assumption is violated, CoxAIPW estimates a causal estimand that is a weighted average of the time-varying log hazard ratio. Reference: Luo, J. (2023). Statistical Robustness - Distributed Linear Regression, Informative Censoring, Causal Inference, and Non-Proportional Hazards [Unpublished doctoral dissertation]. University of California San Diego; Luo & Xu (2022) <doi:10.48550/arXiv.2206.02296>; Rava (2021) <https://escholarship.org/uc/item/8h1846gs>.
Routines for estimating tree fiber (tracheid) length distributions in the standing tree based on increment core samples. Two types of data can be used with the package: increment core data measured by means of an optical fiber analyzer (OFA), such as the Kajaani Fiber Lab, or data measured by microscopy. Increment core data analyzed by OFAs consist of the cell lengths of both cut and uncut fibres (tracheids) and fines (such as ray parenchyma cells), without being able to identify which cells are cut or whether they are fines or fibres. The microscopy-measured data consist of the observed lengths of the uncut fibres in the increment core. A censored version of a mixture of the fine and fiber length distributions is proposed to fit the OFA data, under distributional assumptions (Svensson et al., 2006) <doi:10.1111/j.1467-9469.2006.00501.x>. The package offers two choices for the assumed underlying density functions of the true fiber (fine) lengths of those fibers (fines) that at least partially appear in the increment core: the generalized gamma and the log normal densities.
Label-free bottom-up proteomics expression data is often affected by data heterogeneity and missing values. Normalization and missing value imputation are commonly used techniques to address these issues and make the dataset suitable for further downstream analysis. This package provides an optimal combination of normalization and imputation methods for the dataset. The package utilizes three normalization methods and three imputation methods. The statistical evaluation measures named pooled coefficient of variation, pooled estimate of variance and pooled median absolute deviation are used for selecting the best combination of normalization and imputation method for the given dataset. The user can also visualize the results by using the various plots available in this package, and can perform differential expression analysis between two sample groups with the function included in the package. The three normalization methods, three imputation methods and three evaluation measures were chosen for this study based on the research papers published by Välikangas et al. (2016) <doi:10.1093/bib/bbw095>, Jin et al. (2021) <doi:10.1038/s41598-021-81279-4> and Srivastava et al. (2023) <doi:10.2174/1574893618666230223150253>.
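As a sketch of how one of the evaluation measures works, the pooled coefficient of variation can be computed per protein within each sample group and then averaged; the code below is illustrative only (the data, group labels, and pooling choice are assumptions, not this package's functions).

    set.seed(1)
    expr <- matrix(rlnorm(200 * 6), nrow = 200,
                   dimnames = list(paste0("prot", 1:200), paste0("sample", 1:6)))
    group <- rep(c("A", "B"), each = 3)

    cv <- function(v) sd(v, na.rm = TRUE) / mean(v, na.rm = TRUE)

    # Per-protein CV within each group, pooled by averaging over proteins and groups;
    # lower values indicate less residual technical variation after processing.
    per_protein_cv <- sapply(unique(group), function(g)
      apply(expr[, group == g, drop = FALSE], 1, cv))
    pooled_cv <- mean(per_protein_cv, na.rm = TRUE)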
Fast and multi-threaded implementation of isolation forest (Liu, Ting, Zhou (2008) <doi:10.1109/ICDM.2008.17>), extended isolation forest (Hariri, Kind, Brunner (2018) <doi:10.48550/arXiv.1811.02141>), SCiForest (Liu, Ting, Zhou (2010) <doi:10.1007/978-3-642-15883-4_18>), fair-cut forest (Cortes (2021) <doi:10.48550/arXiv.2110.13402>), robust random-cut forest (Guha, Mishra, Roy, Schrijvers (2016) <http://proceedings.mlr.press/v48/guha16.html>), and customizable variations of them, for isolation-based outlier detection, clustered outlier detection, distance or similarity approximation (Cortes (2019) <doi:10.48550/arXiv.1910.12362>), isolation kernel calculation (Ting, Zhu, Zhou (2018) <doi:10.1145/3219819.3219990>), and imputation of missing values (Cortes (2019) <doi:10.48550/arXiv.1911.06646>), based on random or guided decision tree splitting, and providing different metrics for scoring anomalies based on isolation depth or density (Cortes (2021) <doi:10.48550/arXiv.2111.11639>). Provides simple heuristics for fitting the model to categorical columns and handling missing data, and offers options for varying between random and guided splits, and for using different splitting criteria.
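A minimal usage sketch for outlier detection follows; the argument names are taken from the package's documented interface as far as known here, so check ?isolation.forest before relying on them.

    library(isotree)
    set.seed(1)
    X <- matrix(rnorm(500 * 2), ncol = 2)
    X[1, ] <- c(6, 6)                      # plant an obvious outlier

    model  <- isolation.forest(X, ntrees = 100)
    scores <- predict(model, X)            # higher score = more anomalous
    which.max(scores)                      # should point at row 1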
This package provides a class of methods that combine dimension reduction and clustering of continuous, categorical or mixed-type data (Markos, Iodice D'Enza and van de Velden 2019; <DOI:10.18637/jss.v091.i10>). For continuous data, the package contains implementations of factorial K-means (Vichi and Kiers 2001; <DOI:10.1016/S0167-9473(00)00064-5>) and reduced K-means (De Soete and Carroll 1994; <DOI:10.1007/978-3-642-51175-2_24>); both methods that combine principal component analysis with K-means clustering. For categorical data, the package provides MCA K-means (Hwang, Dillon and Takane 2006; <DOI:10.1007/s11336-004-1173-x>), i-FCB (Iodice D'Enza and Palumbo 2013, <DOI:10.1007/s00180-012-0329-x>) and Cluster Correspondence Analysis (van de Velden, Iodice D'Enza and Palumbo 2017; <DOI:10.1007/s11336-016-9514-0>), which combine multiple correspondence analysis with K-means. For mixed-type data, it provides mixed Reduced K-means and mixed Factorial K-means (van de Velden, Iodice D'Enza and Markos 2019; <DOI:10.1002/wics.1456>), which combine PCA for mixed-type data with K-means.
Early insights in probability theory were largely influenced by questions about gambling and games of chance, as noted by Blitzstein and Hwang (2019, ISBN:978-1138369917). In modern times, playing cards continue to serve as an effective teaching tool for probability, statistics, and even R programming, as demonstrated by Grolemund (2014, ISBN:978-1449359010). The mmcards package offers a collection of utility functions designed to aid in the creation, manipulation, and utilization of playing card decks in multiple formats. These include a standard 52-card deck, as well as alternative decks such as decks defined by custom anonymous functions and custom interleaved decks. Optimized for the development of educational shiny applications, the package is particularly useful for teaching statistics and probability through card-based games. Functions include shuffle_deck(), which creates either a shuffled standard deck or a shuffled custom alternative deck; deal_card(), which takes a deck and returns a list object containing both the dealt card and the updated deck; and i_deck(), which adds image paths to card objects, further enriching the package's utility in the development of interactive shiny application card games.
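A brief usage sketch based on the functions named above; the list-element names are assumptions, so consult the package documentation for the exact return structure.

    library(mmcards)

    deck  <- shuffle_deck()            # shuffled standard 52-card deck
    dealt <- deal_card(deck)           # list holding the dealt card and the updated deck
    dealt$dealt_card                   # the card that was dealt (element name assumed)
    deck  <- dealt$updated_deck        # keep dealing from the remaining cards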
Implementation of uniformity tests on the circle and (hyper)sphere. The main function of the package is unif_test(), which conveniently collects more than 35 tests for assessing uniformity on S^{p-1} = {x in R^p : ||x|| = 1}, p >= 2. The test statistics are implemented in the unif_stat() function, which allows computing several statistics for different samples within a single call, thus facilitating Monte Carlo experiments. Furthermore, the unif_stat_MC() function allows parallelizing them in a simple way. The asymptotic null distributions of the statistics are available through the function unif_stat_distr(). The core of sphunif is coded in C++ by relying on the Rcpp package. The package also provides several novel datasets and enables replicating the data applications/simulations in García-Portugués et al. (2021) <doi:10.1007/978-3-030-69944-4_12>, García-Portugués et al. (2023) <doi:10.3150/21-BEJ1454>, García-Portugués et al. (2024) <doi:10.48550/arXiv.2108.09874>, and Fernández-de-Marcos and García-Portugués (2024) <doi:10.48550/arXiv.2405.13531>.
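A short usage sketch follows; the value passed to type is an assumption about the available test names, so see ?unif_test for the full list.

    library(sphunif)
    set.seed(1)

    # 50 directions drawn uniformly on the circle S^1, stored as a 50 x 2 matrix.
    theta <- runif(50, 0, 2 * pi)
    X <- cbind(cos(theta), sin(theta))

    unif_test(X, type = "Rayleigh")   # should not reject uniformity here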
Suns-Voc (or Isc-Voc) curves can provide the current-voltage (I-V) characteristics of the diode of photovoltaic cells without the effect of series resistance. Here, Suns-Voc curves can be constructed with outdoor time-series I-V curves [1,2,3] of full-size photovoltaic (PV) modules instead of having to be measured in the lab. Time series of four different power loss modes can be calculated based on obtained Isc-Voc curves. This material is based upon work supported by the U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy (EERE) under Solar Energy Technologies Office (SETO) Agreement Number DE-EE0008172. Jennifer L. Braid is supported by the U.S. Department of Energy (DOE) Office of Energy Efficiency and Renewable Energy administered by the Oak Ridge Institute for Science and Education (ORISE) for the DOE. ORISE is managed by Oak Ridge Associated Universities (ORAU) under DOE contract number DE-SC0014664. [1] Wang, M. et al, 2018. <doi:10.1109/PVSC.2018.8547772>. [2] Walters et al, 2018 <doi:10.1109/PVSC.2018.8548187>. [3] Guo, S. et al, 2016. <doi:10.1117/12.2236939>.
For the risk, progression, and response to treatment of many complex diseases, it has been increasingly recognized that gene-environment interactions play important roles beyond the main genetic and environmental effects. In practical interaction analyses, outliers in response variables and covariates are not uncommon. In addition, missingness in environmental factors is routinely encountered in epidemiological studies. The developed package consists of five robust approaches to address the outlier problem, among which two approaches can also accommodate missingness in environmental factors. Both continuous and right censored responses are considered. The proposed approaches are based on penalization and sparse boosting techniques for identifying important interactions, which are realized using efficient algorithms. Beyond gene-environment analysis, the developed package can also be adopted to conduct analysis of interactions between other types of low-dimensional and high-dimensional data. (Mengyun Wu et al (2017), <doi:10.1080/00949655.2018.1523411>; Mengyun Wu et al (2017), <doi:10.1002/gepi.22055>; Yaqing Xu et al (2018), <doi:10.1080/00949655.2018.1523411>; Yaqing Xu et al (2019), <doi:10.1016/j.ygeno.2018.07.006>; Mengyun Wu et al (2021), <doi:10.1093/bioinformatics/btab318>).
Hospitals, hospital systems, and even trauma systems that provide care to injured patients may not be aware of robust metrics that can help gauge the efficacy of their programs in saving the lives of injured patients. traumar provides robust functions driven by the academic literature to automate the calculation of relevant metrics for individuals wishing to measure the performance of their trauma center or trauma system. traumar also provides some helper functions for the data analysis journey. Users can refer to the following publications for descriptions of the methods used in traumar. TRISS methodology, including probability of survival, and the W, M, and Z Scores - Flora (1978) <doi:10.1097/00005373-197810000-00003>, Boyd et al. (1987, PMID:3106646), Llullaku et al. (2009) <doi:10.1186/1749-7922-4-2>, Singh et al. (2011) <doi:10.4103/0974-2700.86626>, Baker et al. (1974, PMID:4814394), and Champion et al. (1989) <doi:10.1097/00005373-198905000-00017>. For the Relative Mortality Metric, see Napoli et al. (2017) <doi:10.1080/24725579.2017.1325948>, Schroeder et al. (2019) <doi:10.1080/10903127.2018.1489021>, and Kassar et al. (2016) <doi:10.1177/00031348221093563>.
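For orientation, the W and Z scores cited above (Flora 1978; Boyd et al. 1987) compare observed with expected survivors given TRISS probabilities of survival; the sketch below shows the textbook calculation with simulated data and is not traumar's interface.

    set.seed(1)
    ps       <- runif(200, 0.5, 0.99)         # TRISS probabilities of survival
    survived <- rbinom(200, 1, ps)            # observed outcomes

    A <- sum(survived)                        # actual survivors
    E <- sum(ps)                              # expected survivors
    N <- length(ps)

    W <- (A - E) / (N / 100)                  # excess survivors per 100 patients
    Z <- (A - E) / sqrt(sum(ps * (1 - ps)))   # standardized excess survival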
Simulates and fits biological geometries. biogeom incorporates several novel universal parametric equations that can generate the profiles of bird eggs, flowers, linear and lanceolate leaves, seeds, starfish, and tree-rings (Gielis (2003) <doi:10.3732/ajb.90.3.333>; Shi et al. (2020) <doi:10.3390/sym12040645>), three growth-rate curves representing the ontogenetic growth trajectories of animals and plants against time, and the axially symmetrical and integral forms of all these functions (Shi et al. (2017) <doi:10.1016/j.ecolmodel.2017.01.012>; Shi et al. (2021) <doi:10.3390/sym13081524>). The optimization method proposed by Nelder and Mead (1965) <doi:10.1093/comjnl/7.4.308> is used to estimate model parameters. biogeom includes several real data sets of the boundary coordinates of natural shapes, including avian eggs, fruit, lanceolate and ovate leaves, tree rings, seeds, and sea stars, and can potentially be applied to other natural shapes. biogeom can quantify the conspecific or interspecific similarity of natural outlines, and provides information with important ecological and evolutionary implications for the growth and form of living organisms. Please see Shi et al. (2022) <doi:10.1111/nyas.14862> for details.
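The Gielis (2003) superformula underlying several of the shape equations mentioned above can be sketched directly; the parameter values below are arbitrary and this is not biogeom's interface.

    # Superformula radius as a function of the polar angle phi.
    gielis <- function(phi, m, a, b, n1, n2, n3)
      (abs(cos(m * phi / 4) / a)^n2 + abs(sin(m * phi / 4) / b)^n3)^(-1 / n1)

    phi <- seq(0, 2 * pi, length.out = 720)
    r   <- gielis(phi, m = 5, a = 1, b = 1, n1 = 3, n2 = 7, n3 = 7)
    plot(r * cos(phi), r * sin(phi), type = "l", asp = 1,
         xlab = "x", ylab = "y", main = "Superformula profile (illustrative)")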
The card game War is simple in its rules but can be lengthy. In another domain, the nonparametric bootstrap test with pooled resampling (nbpr) methods, as outlined in Dwivedi, Mallawaarachchi, and Alvarado (2017) <doi:10.1002/sim.7263>, is optimal for comparing paired or unpaired means in non-normal data, especially for small sample size studies. However, many researchers are unfamiliar with these methods. The bootwar package bridges this gap by enabling users to grasp the concepts of nbpr via Boot War, a variation of the card game War designed for small samples. The package provides functions like score_keeper() and play_round() to streamline gameplay and scoring. Once a predetermined number of rounds concludes, users can employ the analyze_game() function to derive game results. This function leverages the npboottprm package's nonparboot() to report nbpr results and, for comparative analysis, also reports results from the stats package's t.test() function. Additionally, bootwar features an interactive shiny web application, bootwar(). This offers a user-centric interface to experience Boot War, enhancing understanding of nbpr methods across various distributions, sample sizes, numbers of bootstrap resamples, and confidence intervals.
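A minimal way to try the package is to launch the shiny app named above; the call below assumes no arguments are required, which may not hold for every version.

    library(bootwar)
    bootwar()   # launch the interactive Boot War shiny application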
This package provides functions to fit regression models for bounded continuous and discrete responses. In the case of bounded continuous responses (e.g., proportions and rates), available models are the flexible beta (Migliorati, S., Di Brisco, A. M., Ongaro, A. (2018) <doi:10.1214/17-BA1079>), the variance-inflated beta (Di Brisco, A. M., Migliorati, S., Ongaro, A. (2020) <doi:10.1177/1471082X18821213>), the beta (Ferrari, S.L.P., Cribari-Neto, F. (2004) <doi:10.1080/0266476042000214501>), and their augmented versions to handle the presence of zero/one values (Di Brisco, A. M., Migliorati, S. (2020) <doi:10.1002/sim.8406>). In the case of bounded discrete responses (e.g., bounded counts, such as the number of successes in n trials), available models are the flexible beta-binomial (Ascari, R., Migliorati, S. (2021) <doi:10.1002/sim.9005>), the beta-binomial, and the binomial. Inference is carried out with a Bayesian approach based on the Hamiltonian Monte Carlo (HMC) algorithm (Gelman, A., Carlin, J. B., Stern, H. S., Rubin, D. B. (2014) <doi:10.1201/b16018>). Besides, functions to compute residuals, posterior predictives, goodness of fit measures, convergence diagnostics, and graphical representations are provided.
Gaussian processes (GPs) have been widely used to model spatial data, spatio-temporal data, and computer experiments in diverse areas of statistics including spatial statistics, spatio-temporal statistics, uncertainty quantification, and machine learning. This package creates basic tools for fitting and prediction based on GPs with spatial data, spatio-temporal data, and computer experiments. Key characteristics of this GP tool include: (1) the comprehensive implementation of various covariance functions including the Matérn family and the Confluent Hypergeometric family with isotropic form, tensor form, and automatic relevance determination form, where the isotropic form is widely used in spatial statistics, the tensor form is widely used in design and analysis of computer experiments and uncertainty quantification, and the automatic relevance determination form is widely used in machine learning; (2) implementations via Markov chain Monte Carlo (MCMC) algorithms and optimization algorithms for GP models with all the implemented covariance functions. The methods for fitting and prediction are mainly implemented in a Bayesian framework; (3) model evaluation via Fisher information and predictive metrics such as predictive scores; (4) built-in functionality for simulating GPs with all the implemented covariance functions; (5) unified implementation to allow easy specification of various GPs.
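As an illustration of one member of the Matérn family mentioned above, the sketch below evaluates the isotropic Matérn covariance with smoothness 3/2; it shows the model itself rather than this package's interface, and the parameter names are illustrative.

    # Matérn covariance with smoothness nu = 3/2.
    matern32 <- function(d, sigma2 = 1, range = 1) {
      a <- sqrt(3) * d / range
      sigma2 * (1 + a) * exp(-a)
    }

    d <- seq(0, 5, by = 0.1)
    plot(d, matern32(d, range = 1.5), type = "l",
         xlab = "distance", ylab = "covariance")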
This package provides a framework for the optimization of breeding programs via optimum contribution selection and mate allocation. It offers an easy-to-use set of functions for computing optimum contributions of selection candidates and the population genetic parameters to be optimized. These parameters can be estimated using pedigree or genotype information, and include kinships, kinships at native haplotype segments, and breed composition of crossbred individuals. They are suitable for managing genetic diversity, removing introgressed genetic material, and accelerating genetic gain. Additionally, functions are provided for computing genetic contributions from ancestors, inbreeding coefficients, the native effective size, the native genome equivalent, and pedigree completeness, and for preparing and plotting pedigrees. The methods are described in: Wellmann, R., and Pfeiffer, I. (2009) <doi:10.1017/S0016672309000202>; Wellmann, R., and Bennewitz, J. (2011) <doi:10.2527/jas.2010-3709>; Wellmann, R., Hartwig, S., Bennewitz, J. (2012) <doi:10.1186/1297-9686-44-34>; de Cara, M. A. R., Villanueva, B., Toro, M. A., Fernandez, J. (2013) <doi:10.1111/mec.12560>; Wellmann, R., Bennewitz, J., Meuwissen, T.H.E. (2014) <doi:10.1017/S0016672314000196>; Wellmann, R. (2019) <doi:10.1186/s12859-018-2450-5>.
Efficient estimation of the population-level causal effects of stochastic interventions on a continuous-valued exposure. Both one-step and targeted minimum loss estimators are implemented for the counterfactual mean value of an outcome of interest under an additive modified treatment policy, a stochastic intervention that may depend on the natural value of the exposure. To accommodate settings with outcome-dependent two-phase sampling, procedures incorporating inverse probability of censoring weighting are provided to facilitate the construction of inefficient and efficient one-step and targeted minimum loss estimators. The causal parameter and its estimation were first described by Díaz and van der Laan (2013) <doi:10.1111/j.1541-0420.2011.01685.x>, while the multiply robust estimation procedure and its application to data from two-phase sampling designs are detailed in NS Hejazi, MJ van der Laan, HE Janes, PB Gilbert, and DC Benkeser (2020) <doi:10.1111/biom.13375>. The software package implementation is described in NS Hejazi and DC Benkeser (2020) <doi:10.21105/joss.02447>. Estimation of nuisance parameters may be enhanced through the Super Learner ensemble model in sl3, available for download from GitHub using remotes::install_github("tlverse/sl3").
This package provides a Scannerless GLR parser/parser generator. GLR stands for "generalized LR", where L stands for "left-to-right" and R stands for "rightmost (derivation)". For more information see <https://en.wikipedia.org/wiki/GLR_parser>. This parser is based on the Tomita (1987) algorithm (the paper can be found at <https://aclanthology.org/P84-1073.pdf>). The original dparser package documentation can be found at <https://dparser.sourceforge.net/>. This allows you to add mini-languages to R (like rxode2's ODE mini-language, Wang, Hallow, and James 2015 <DOI:10.1002/psp4.12052>) or to parse other languages like NONMEM to automatically translate them to R code. To use this in your code, add LinkingTo: dparser in your DESCRIPTION file and, instead of using #include <dparse.h>, use #include <dparser.h>. The package also provides an R-based port of the make_dparser <https://dparser.sourceforge.net/d/make_dparser.cat> command called mkdparser(). Additionally, you can parse an arbitrary grammar within R using the dparse() function, which works on most OSes and is mainly for grammar testing. The fastest parsing, of course, occurs at the C level and is the suggested approach.
Package implements the EDNE-test for equivalence according to Hoffelder et al. (2015) <DOI:10.1080/10543406.2014.920344>. "EDNE" abbreviates "Euclidean Distance between the Non-standardized Expected values". The EDNE-test for equivalence is a multivariate two-sample equivalence test. The distance measure of the test is the Euclidean distance. The test is an asymptotically valid test for the family of distributions fulfilling the assumptions of the multivariate central limit theorem (see Hoffelder et al., 2015). The function EDNE.EQ() implements the EDNE-test for equivalence according to Hoffelder et al. (2015). The function EDNE.EQ.dissolution.profiles() implements a variant of the EDNE-test for equivalence analyses of dissolution profiles (see Suarez-Sharp et al., 2020 <DOI:10.1208/s12248-020-00458-9>). EDNE.EQ.dissolution.profiles() checks whether the quadratic mean of the differences of the expected values of both dissolution profile populations is statistically significantly smaller than 10 [% of label claim]. The current regulatory standard approach for equivalence analyses of dissolution profiles is the similarity factor f2. The statistical hypotheses underlying EDNE.EQ.dissolution.profiles() coincide with the hypotheses for f2 (see Hoffelder et al., 2015; Suarez-Sharp et al., 2020).
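A usage sketch only: the argument layout below (one matrix of dissolution profiles per group, rows = vessels, columns = time points) is a guess, so check ?EDNE.EQ.dissolution.profiles for the real interface before use.

    library(EDNE.EQ)
    set.seed(1)
    # Two groups of dissolution profiles: 12 vessels x 4 time points, in % of label claim.
    ref  <- matrix(rnorm(48, mean = c(35, 55, 75, 90), sd = 3), nrow = 12, byrow = TRUE)
    test <- matrix(rnorm(48, mean = c(37, 57, 77, 91), sd = 3), nrow = 12, byrow = TRUE)

    EDNE.EQ.dissolution.profiles(ref, test)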
The ConNEcT approach investigates the pairwise association strength of binary time series by calculating contingency measures and depicts the results in a network. The package includes features to explore and visualize the data. To calculate the pairwise concurrent or temporally sequenced relationship between the variables, the package provides seven contingency measures (proportion of agreement, classical and corrected Jaccard, Cohen's kappa, phi correlation coefficient, odds ratio, and log odds ratio); however, others can easily be implemented. The package also includes non-parametric significance tests that can be applied to test whether the contingency value quantifying the relationship between the variables is significantly higher than chance level. Most importantly, this test accounts for auto-dependence and relative frequency. See Bodner et al. (2021) <doi:10.1111/bmsp.12222>. Finally, a network can be drawn. Variables depict the nodes of the network, with the node size adapted to the prevalence. The association strength between the variables defines the undirected (concurrent) or directed (temporally sequenced) links between the nodes. The results of the non-parametric significance test can be included by depicting either all links or only the significant ones. For a tutorial, see Bodner et al. (2021) <doi:10.3758/s13428-021-01760-w>.
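Two of the contingency measures listed above can be written down directly for a pair of binary time series; the sketch below uses the standard definitions and is not this package's API.

    set.seed(1)
    x <- rbinom(200, 1, 0.3)
    y <- rbinom(200, 1, 0.3)

    # 2 x 2 contingency counts for the concurrent relationship.
    n11 <- sum(x == 1 & y == 1); n10 <- sum(x == 1 & y == 0)
    n01 <- sum(x == 0 & y == 1); n00 <- sum(x == 0 & y == 0)

    jaccard <- n11 / (n11 + n10 + n01)                                      # classical Jaccard
    phi <- (n11 * n00 - n10 * n01) /
           sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))      # phi coefficient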
Summarizes data frames by calculating various statistical measures, including measures of central tendency, dispersion, skewness, kurtosis, and normality tests. The package leverages the moments package for calculating statistical moments and related measures, the dplyr package for data manipulation, and the nortest package for normality testing. DataSum includes functions such as getmode() for finding the mode(s) of a data vector, shapiro_normality_test() for performing the Shapiro-Wilk test (Shapiro & Wilk 1965 <doi:10.1093/biomet/52.3-4.591>) (or the Anderson-Darling test when the data length is outside the valid range for the Shapiro-Wilk test) (Stephens 1974 <doi:10.1080/01621459.1974.10480196>), Datum() for generating a comprehensive summary of a data vector with various statistics (including data type, sample size, mean, mode, median, variance, standard deviation, maximum, minimum, range, skewness, kurtosis, and normality test result) (Joanes & Gill 1998 <doi:10.1111/1467-9884.00122>), and DataSumm() for applying the Datum() function to each column of a data frame. Emphasizing the importance of normality testing, the package provides robust tools to validate whether data follow a normal distribution, a fundamental assumption in many statistical analyses and models.
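A short usage sketch built around the functions named above; exact argument names and the shape of the returned summaries may differ, so see the package documentation.

    library(DataSum)
    set.seed(1)
    x  <- rnorm(100)
    df <- data.frame(a = rnorm(100), b = rexp(100))

    getmode(x)                   # mode(s) of a data vector
    shapiro_normality_test(x)    # Shapiro-Wilk (or Anderson-Darling) normality test
    Datum(x)                     # comprehensive summary of a single vector
    DataSumm(df)                 # Datum() applied to every column of a data frame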