This package provides an R driver for Apache Drill <https://drill.apache.org> that can connect to an Apache Drill cluster <https://drill.apache.org/docs/installing-drill-on-the-cluster> or drillbit <https://drill.apache.org/docs/embedded-mode-prerequisites>, return the results of SQL queries as data frames, and check the current configuration status. More information about Apache Drill is available at <https://drill.apache.org/docs>.
For cleaning and analysis of graphs, such as animal closing force measurements. forceR was initially written and optimized to deal with insect bite force measurements, but can be used for any time series. Includes a full workflow to load, plot and crop data, correct amplifier and baseline drifts, identify individual peak shapes (bites), rescale (normalize) peak curves, and find best polynomial fits to describe and analyze force curve shapes.
The currently used confidence interval (CI) methods have limitations when the test statistics are asymmetrical (chi-square test, F-test) or the model functions are non-linear. These limitations can be overcome by using likelihood functions for interval estimation. The inteli package now supports interval estimation for the mean, variance, variance ratio, binomial distribution, Poisson distribution, odds ratio, risk difference, and relative risk, together with plots of their likelihood functions. Testing functions are also provided.
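As an illustration of likelihood-based interval estimation, the following minimal Python sketch computes a likelihood interval for a Poisson mean by finding where the log-likelihood falls a fixed amount below its maximum. It is not the inteli API: the function names, the bisection search, and the default cutoff of 1.92 (half of 3.84, the 0.95 quantile of a chi-square distribution with one degree of freedom) are assumptions made for this example, and the package may use a different cutoff.

```python
import math


def poisson_loglik(lam, total, n):
    """Log-likelihood of rate `lam` for n Poisson counts summing to `total`.

    The additive constant that does not depend on `lam` is dropped.
    """
    return total * math.log(lam) - n * lam


def likelihood_interval(total, n, drop=1.92):
    """Return (lower, upper): rates whose log-likelihood is `drop` below the maximum.

    Hypothetical helper for illustration; requires total > 0.
    """
    mle = total / n
    target = poisson_loglik(mle, total, n) - drop

    def solve(lo, hi):
        # Bisection: the log-likelihood crosses `target` exactly once in [lo, hi].
        lo_above = poisson_loglik(lo, total, n) > target
        for _ in range(200):
            mid = (lo + hi) / 2
            if (poisson_loglik(mid, total, n) > target) == lo_above:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = solve(1e-12, mle)            # crossing to the left of the maximum
    upper = solve(mle, 10 * mle + 10)    # crossing to the right of the maximum
    return lower, upper


if __name__ == "__main__":
    # Ten observations with a total count of 30, so the estimate is 3.0.
    low, high = likelihood_interval(total=30, n=10)
    print(low, high)
```

Because the Poisson log-likelihood is skewed, the resulting interval extends further above the estimate than below it, which a symmetric interval cannot reflect.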
This package provides an implementation of a kernel-embedding-of-probability test for elliptical distribution. It is an asymptotic test for elliptical distribution under general alternatives, with the location and shape parameters assumed to be unknown. Some by-products are also provided, including the transformation between rectangular and polar coordinates and two product-type kernel functions. See Tang and Li (2024) <doi:10.48550/arXiv.2306.10594> for details.
Fits keyword assisted topic models (keyATM) using collapsed Gibbs samplers. The keyATM combines latent Dirichlet allocation (LDA) models with a small number of keywords selected by researchers in order to improve the interpretability and topic classification of the LDA. The keyATM can also incorporate covariates and directly model time trends. The keyATM was proposed in Eshima, Imai, and Sasaki (2024) <doi:10.1111/ajps.12779>.
Easily import the MI-SUVI data sets. The user can import data sets with full metrics, percentiles, Z-scores, or rankings. Data is available at both the County and Zip Code Tabulation Area (ZCTA) levels. This package also includes a function to import shape files for easy mapping and a function to access the full technical documentation. All data is sourced from the Michigan Department of Health and Human Services.
Handles neutrosophic data in single-valued form, using score, accuracy and certainty functions to calculate ranks of Single Valued Neutrosophic Sets (SVNS), to compute the Mann-Whitney test, and to perform a post-hoc test after the null hypothesis has been rejected by the Neutrosophic Statistics Kruskal-Wallis test. For more information see Miari, Mahmoud; Anan, Mohamad Taher; Zeina, Mohamed Bisher (2022) <https://digitalrepository.unm.edu/nss_journal/vol51/iss1/60/>.
Plot the daily and cumulative number of downloads of your packages. It is designed to be slightly more convenient than several similar programs. If you want to run it each morning, you do not need to keep typing in the names of your packages. It also combines the daily and cumulative counts in one run, so you do not need to run separate programs to get both types of information.
Calculates phenological cycle and anomalies using a non-parametric approach applied to time series of vegetation indices derived from remote sensing data or field measurements. The package implements basic and high-level functions for manipulating vector data (numerical series) and raster data (satellite derived products). Processing of very large raster files is supported. For more information, please check the following paper: Chávez et al. (2023) <doi:10.3390/rs15010073>.
Robust nonparametric bootstrap and permutation tests for goodness of fit, distribution equivalence, location, correlation, and regression problems, as described in Helwig (2019a) <doi:10.1002/wics.1457> and Helwig (2019b) <doi:10.1016/j.neuroimage.2019.116030>. Univariate and multivariate tests are supported. For each problem, exact tests and Monte Carlo approximations are available. Five different nonparametric bootstrap confidence intervals are implemented. Parallel computing is implemented via the parallel package.
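To make the idea of a Monte Carlo approximation to a permutation test concrete, here is a minimal Python sketch of a two-sample location test. It is only an illustration under stated assumptions: the function name, the plain difference-in-means statistic, and the default number of permutations are choices made for this example, not the package's interface or its robust test statistics.

```python
import random


def permutation_test_mean_diff(x, y, n_perm=9999, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means.

    Hypothetical helper for illustration; group labels are reshuffled
    `n_perm` times and the observed statistic is compared with the
    reshuffled ones.
    """
    rng = random.Random(seed)
    n_x = len(x)
    pooled = list(x) + list(y)
    n_y = len(pooled) - n_x
    observed = abs(sum(x) / n_x - sum(y) / n_y)

    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of observations to groups
        stat = abs(sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / n_y)
        if stat >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1

    # Counting the observed arrangement itself keeps the p-value above zero.
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    print(permutation_test_mean_diff([1, 2, 3, 4, 5], [11, 12, 13, 14, 15]))
```

An exact test would enumerate every reassignment of the pooled observations instead of sampling them, which is only feasible for small samples.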
This package implements the copula-based estimator for univariate long-range dependent processes, introduced in Pumi et al. (2023) <doi:10.1007/s00362-023-01418-z>. Notably, this estimator can handle missing data and has been shown to perform exceptionally well even when up to 70% of the data are missing (as reported in <arXiv:2303.04754>). It has also been found to outperform several other commonly applied estimators.
Offers a wide variety of techniques, such as graphics, recoding, and regression models, for a comprehensive analysis of patient-reported outcomes (PRO). Especially novel is the broad range of regression models based on the beta-binomial distribution, which are useful for analyzing binomial data with over-dispersion in cross-sectional, longitudinal, or multidimensional response studies (see Najera-Zuloaga J., Lee D.-J. and Arostegui I. (2019) <doi:10.1002/bimj.201700251>).
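The over-dispersion that motivates the beta-binomial distribution can be checked numerically. The Python sketch below uses the standard shape-parameter form of the beta-binomial probability mass function; the function names and the (a, b) parametrization are assumptions for this example, and the package may parametrize the distribution differently.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """P(K = k) for a beta-binomial with n trials and shape parameters a, b."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def variance(pmf_values):
    """Variance of a distribution on 0..n given its probabilities."""
    mean = sum(k * p for k, p in enumerate(pmf_values))
    return sum((k - mean) ** 2 * p for k, p in enumerate(pmf_values))


if __name__ == "__main__":
    n, a, b = 10, 2.0, 2.0
    pmf = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
    p = a / (a + b)                   # mean success probability
    binomial_var = n * p * (1 - p)    # variance of a binomial with the same mean
    print(variance(pmf), binomial_var)
```

With these values the beta-binomial variance is larger than that of a binomial distribution with the same mean, which is the over-dispersion the regression models are meant to capture.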
This package provides functions to aid the identification of probable/possible duplicates in Plant Genetic Resources (PGR) collections using passport databases comprising information records of each constituent sample. These include methods for cleaning the data, creating a searchable Key Word in Context (KWIC) index of keywords associated with sample records, and identifying nearly identical records with similar information by fuzzy, phonetic and semantic matching of keywords.
Utilities for single nucleotide polymorphism (SNP) based kinship analysis testing and evaluation. The skater package contains functions for importing, parsing, and analyzing pedigree data, performing relationship degree inference, benchmarking relationship degree classification, and summarizing identity by descent (IBD) segment data. Package functions and methods are described in Turner et al. (2021) "skater: An R package for SNP-based Kinship Analysis, Testing, and Evaluation" <doi:10.1101/2021.07.21.453083>.
Similarity regression, evaluating the probability of association between sets of ontological terms and a binary response vector. A no-association model is compared with one in which the log odds of a true response is linked to the semantic similarity between terms and a latent characteristic ontological profile; see 'Phenotype Similarity Regression for Identifying the Genetic Determinants of Rare Diseases', Greene et al. (2016) <doi:10.1016/j.ajhg.2016.01.008>.
Stationary subspace analysis (SSA) is a blind source separation (BSS) variant where stationary components are separated from non-stationary components. Several SSA methods for multivariate time series are provided here (Flumian et al. (2021); Hara et al. (2010) <doi:10.1007/978-3-642-17537-4_52>) along with functions to simulate time series with time-varying variance and autocovariance (Patilea and Raissi (2014) <doi:10.1080/01621459.2014.884504>).
Working efficiently with structured query language ('SQL') scripts often requires the creation of temporary tables, and there are few clean and simple R SQL execution approaches that allow you to complete this kind of work within the R environment. This package seeks to give SQL implementations in R a little love by providing functions that allow you to deploy complex SQL scripts within a typical R workflow.
Sample size calculation to detect dynamic treatment regime (DTR) effects based on change in clinical attachment level (CAL) outcomes from a study of non-surgical chronic periodontitis treatments. The experiment is performed under a Sequential Multiple Assignment Randomized Trial (SMART) design. The clustered tooth (sub-unit) level CAL outcomes are skewed, spatially-referenced, and non-randomly missing. The implemented algorithm is available in Xu et al. (2019+) <arXiv:1902.09386>.
Computes a point pattern in R^2 or on a graph that is representative of a collection of many data patterns. The result is an approximate barycenter (also known as Fréchet mean or prototype) based on a transport-transform metric. Possible choices include Optimal SubPattern Assignment (OSPA) and Spike Time metrics. Details can be found in Müller, Schuhmacher and Mateu (2020) <doi:10.1007/s11222-020-09932-y>.
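For readers unfamiliar with the OSPA metric mentioned above, the following Python sketch computes it for two small planar point patterns using the commonly cited definition (cut-off distances, optimal assignment, and a penalty for unmatched points). The function name and defaults are assumptions for this example, the assignment is found by brute force and is therefore only suitable for a handful of points, and the variant used by the package may differ in its normalization.

```python
import math
from itertools import permutations


def ospa(xs, ys, cutoff=1.0, order=1):
    """OSPA distance between two point patterns given as lists of (x, y) tuples.

    Hypothetical helper for illustration: distances are capped at `cutoff`,
    points of the smaller pattern are optimally assigned to the larger one,
    and each unmatched point is charged the full cutoff.
    """
    if len(xs) > len(ys):
        xs, ys = ys, xs  # make xs the smaller pattern
    m, n = len(xs), len(ys)
    if n == 0:
        return 0.0  # two empty patterns are identical

    # Brute-force search over all assignments of xs to distinct points of ys.
    best = min(
        sum(min(cutoff, math.dist(x, ys[j])) ** order for x, j in zip(xs, perm))
        for perm in permutations(range(n), m)
    )
    return ((best + cutoff ** order * (n - m)) / n) ** (1 / order)


if __name__ == "__main__":
    print(ospa([(0.0, 0.0)], [(0.0, 0.0), (5.0, 5.0)], cutoff=1.0))
```

A barycenter in this setting is a pattern that approximately minimizes the summed distances of this kind to all data patterns; the sketch covers only the distance, not that optimization.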
This package provides a simple approach for constructing the dynamic materials model suggested by Prasad and Gegel (1984) <doi:10.1007/BF02664902>. It can also easily generate various processing maps based on this model. The calculation results contain the full set of material constants, information about the power dissipation efficiency factor, and rheological properties, and can be exported in full for further analysis and customized plots.
Mixed type vectors are useful for combining semantically similar classes. Some examples of semantically related classes include time across different granularities (e.g. daily, monthly, annual) and probability distributions (e.g. Normal, Uniform, Poisson). These groups of vector types typically share common statistical operations which vary in results with the attributes of each vector. The vecvec data structure facilitates efficient storage and computation across multiple vectors within the same object.
Toolkit to support and perform discrete event simulations with and without resource constraints in the context of health technology assessments (HTA). The package focuses on cost-effectiveness modelling and aims to be submission-ready to relevant HTA bodies in alignment with NICE TSD 15 <https://sheffield.ac.uk/nice-dsu/tsds/patient-level-simulation>. More details and examples can be found on the package website <https://jsanchezalv.github.io/WARDEN/>.
Enables researchers to sample redistricting plans from a pre-specified target distribution using Sequential Monte Carlo and Markov Chain Monte Carlo algorithms. The package allows for the implementation of various constraints in the redistricting process such as geographic compactness and population parity requirements. Tools for analysis such as computation of various summary statistics and plotting functionality are also included. The package implements the SMC algorithm of McCartan and Imai (2023) <doi:10.1214/23-AOAS1763>, the enumeration algorithm of Fifield, Imai, Kawahara, and Kenny (2020) <doi:10.1080/2330443X.2020.1791773>, the Flip MCMC algorithm of Fifield, Higgins, Imai and Tarr (2020) <doi:10.1080/10618600.2020.1739532>, the Merge-split/Recombination algorithms of Carter et al. (2019) <doi:10.48550/arXiv.1911.01503> and DeFord et al. (2021) <doi:10.1162/99608f92.eb30390f>, and the Short-burst optimization algorithm of Cannon et al. (2020) <doi:10.48550/arXiv.2011.02288>.
The R Analytic Tool To Learn Easily (Rattle) provides a collection of utility functions for the data scientist. This package (v5.6.0) supports the companion graphical interface, which aims to provide a simple and intuitive introduction to R for data science, allowing a user to quickly load data from a CSV file, transform and explore the data, and build and evaluate models. A key aspect of the GUI is that all R commands are logged and commented through the log tab. The log can be saved as a standalone R script file, and it serves as an aid for the user to learn R or to copy-and-paste directly into R itself. If you want to use the older Rattle implementing the GUI in RGtk2 (which is no longer available from CRAN), please install the Rattle package v5.5.1. See rattle.togaware.com for instructions on installing the modern Rattle graphical user interface.