Identify and understand clusters of points (typically representing the locations of places or events) stored in simple-features (SF) objects. This is useful for analysing, for example, hot-spots of crime events. The package emphasises producing results from point SF data in a single step using reasonable default values for all other arguments, to aid rapid data analysis by users who are starting out. Functions available include kernel density estimation (for details, see Yip (2020) <doi:10.22224/gistbok/2020.1.12>), analysis of spatial association (Getis and Ord (1992) <doi:10.1111/j.1538-4632.1992.tb00261.x>) and hot-spot classification (Chainey (2020) ISBN:158948584X).
The Wavelet Decomposition followed by Random Forest Regression (RF) models have been applied for time series forecasting. The maximum overlap discrete wavelet transform (MODWT) algorithm was chosen as it works for any length of the series. The series is first divided into training and testing sets. In each of the wavelet decomposed series, the supervised machine learning approach namely random forest was employed to train the model. This package also provides accuracy metrics in the form of Root Mean Square Error (RMSE) and Mean Absolute Prediction Error (MAPE). This package is based on the algorithm of Ding et al. (2021) <DOI: 10.1007/s11356-020-12298-3>.
Some response-adaptive randomization methods commonly found in literature are included in this package. These methods include the randomized play-the-winner rule for binary endpoint (Wei and Durham (1978) <doi:10.2307/2286290>), the doubly adaptive biased coin design with minimal variance strategy for binary endpoint (Atkinson and Biswas (2013) <doi:10.1201/b16101>, Rosenberger and Lachin (2015) <doi:10.1002/9781118742112>) and maximal power strategy targeting Neyman allocation for binary endpoint (Tymofyeyev, Rosenberger, and Hu (2007) <doi:10.1198/016214506000000906>) and RSIHR allocation with each letter representing the first character of the names of the individuals who first proposed this rule (Youngsook and Hu (2010) <doi:10.1198/sbr.2009.0056>, Bello and Sabo (2016) <doi:10.1080/00949655.2015.1114116>), A-optimal Allocation for continuous endpoint (Sverdlov and Rosenberger (2013) <doi:10.1080/15598608.2013.783726>), Aa-optimal Allocation for continuous endpoint (Sverdlov and Rosenberger (2013) <doi:10.1080/15598608.2013.783726>), generalized RSIHR allocation for continuous endpoint (Atkinson and Biswas (2013) <doi:10.1201/b16101>), Bayesian response-adaptive randomization with a control group using the Thall \& Wathen method for binary and continuous endpoints (Thall and Wathen (2007) <doi:10.1016/j.ejca.2007.01.006>) and the forward-looking Gittins index rule for binary and continuous endpoints (Villar, Wason, and Bowden (2015) <doi:10.1111/biom.12337>, Williamson and Villar (2019) <doi:10.1111/biom.13119>).
This package provides a comprehensive set of tools for descriptive statistics, graphical data exploration, outlier detection, homoscedasticity testing, and multiple comparison procedures. Includes manual implementations of Levene's test, Bartlett's test, and the Fligner-Killeen test, as well as post hoc comparison methods such as Tukey, Scheffé, Games-Howell, Brunner-Munzel, and others. This version introduces two new procedures: the Jonckheere-Terpstra trend test and the Jarque-Bera test with Glinskiy's (2024) correction. Designed for use in teaching, applied statistical analysis, and reproducible research. Additionally you can find a post hoc Test Planner, which helps you to make a decision on which procedure is most suitable.
Addressing measurement error in covariates and misclassification in binary outcome variables within causal inference, the ATE.ERROR package implements inverse probability weighted estimation methods proposed by Shu and Yi (2017, <doi:10.1177/0962280217743777>; 2019, <doi:10.1002/sim.8073>). These methods correct errors to accurately estimate average treatment effects (ATE). The package includes two main functions: ATE.ERROR.Y() for handling misclassification in the outcome variable and ATE.ERROR.XY() for correcting both outcome misclassification and covariate measurement error. It employs logistic regression for treatment assignment and uses bootstrap sampling to calculate standard errors and confidence intervals, with simulated datasets provided for practical demonstration.
This package provides computational tools to generate efficient blocked and unblocked fractional factorial designs for two-level and three-level factors using the generalized Minimum Aberration (MA) criterion and related optimization algorithms. Methodological foundations include the general theory of minimum aberration as described by Cheng and Tang (2005) <doi:10.1214/009053604000001228>, and the catalogue of three-level regular fractional factorial designs developed by Xu (2005) <doi:10.1007/s00184-005-0408-x>. The main functions dol2() and dol3() generate blocked two-level and three-level fractional factorial designs, respectively, using beam search, optimization-based ranking, confounding assessment, and structured output suitable for complete factorial situations.
Gene Symbols or Ensembl Gene IDs are converted using the Bimap interface in AnnotationDbi in convertId2() but that function is only provided as fallback mechanism for the most common use cases in data analysis. The main function in the package is convert.bm() which queries BioMart using the full capacity of the API provided through the biomaRt package. Presets and defaults are provided for convenience but all "marts", "filters" and "attributes" can be set by the user. Function convert.alias() converts Gene Symbols to Aliases and vice versa and function likely_symbol() attempts to determine the most likely current Gene Symbol.
This package performs biomedical named entity recognition, Unified Medical Language System (UMLS) concept mapping, and negation detection using the Python spaCy', scispaCy', and medspaCy packages, and transforms extracted data into a wide format for inclusion in machine learning models. The development of the scispaCy package is described by Neumann (2019) <doi:10.18653/v1/W19-5034>. The medspacy package uses ConText', an algorithm for determining the context of clinical statements described by Harkema (2009) <doi:10.1016/j.jbi.2009.05.002>. Clinspacy also supports entity embeddings from scispaCy and UMLS cui2vec concept embeddings developed by Beam (2018) <arXiv:1804.01486>.
This package provides utility functions, distributions, and fitting methods for Bayesian Spatial Capture-Recapture (SCR) and Open Population Spatial Capture-Recapture (OPSCR) modelling using the nimble package (de Valpine et al. 2017 <doi:10.1080/10618600.2016.1172487 >). Development of the package was motivated primarily by the need for flexible and efficient analysis of large-scale SCR data (Bischof et al. 2020 <doi:10.1073/pnas.2011383117 >). Computational methods and techniques implemented in nimbleSCR include those discussed in Turek et al. 2021 <doi:10.1002/ecs2.3385>; among others. For a recent application of nimbleSCR, see Milleret et al. (2021) <doi:10.1098/rsbl.2021.0128>.
Supports risk assessors in performing the entry step of the quantitative Pest Risk Assessment. It allows the estimation of the amount of a plant pest entering a risk assessment area (in terms of founder populations) through the calculation of the imported commodities that could be potential pathways of pest entry, and the development of a pathway model. Two Shiny apps based on the functionalities of the package are included, that simplify the process of assessing the risk of entry of plant pests. The approach is based on the work of the European Food Safety Authority (EFSA PLH Panel et al., 2018) <doi:10.2903/j.efsa.2018.5350>.
The goal of dynamicpv is to provide a simple way to calculate (net) present values and outputs from health economic models (especially cost-effectiveness and budget impact) in discrete time that reflect dynamic pricing and dynamic uptake. Dynamic pricing is also known as life cycle pricing; dynamic uptake is also known as multiple or stacked cohorts, or dynamic disease prevalence. Shafrin (2024) <doi:10.1515/fhep-2024-0014> provides an explanation of dynamic value elements, in the context of Generalized Cost Effectiveness Analysis, and Puls (2024) <doi:10.1016/j.jval.2024.03.006> reviews challenges of incorporating such dynamic value elements. This package aims to reduce those challenges.
The introduction of the broom package has made converting model objects into data frames as simple as a single function. While the broom package focuses on providing tidy data frames that can be used in advanced analysis, it deliberately stops short of providing functionality for reporting models in publication-ready tables. pixiedust provides this functionality with a programming interface intended to be similar to ggplot2's system of layers with fine tuned control over each cell of the table. Options for output include printing to the console and to the common markdown formats (markdown, HTML, and LaTeX). With a little pixiedust (and happy thoughts) tables can really fly.
This package implements a two-stage estimation approach for Cox regression using five-parameter M-spline functions to model the baseline hazard. It allows for flexible hazard shapes and model selection based on log-likelihood criteria as described in Teranishi et al.(2025). In addition, the package provides functions for constructing and evaluating B-spline copulas based on five M-spline or I-spline basis functions, allowing users to flexibly model and compute bivariate dependence structures. Both the copula function and its density can be evaluated. Furthermore, the package supports computation of dependence measures such as Kendall's tau and Spearman's rho, derived analytically from the copula parameters.
An R Shiny application for visual and statistical exploration and web communication of archaeological spatial data, either remains or sites. It offers interactive 3D and 2D visualisations (cross sections and maps of remains, timeline of the work made in a site) which can be exported in SVG and HTML formats. It performs simple spatial statistics (convex hull, regression surfaces, 2D kernel density estimation) and allows exporting data to other online applications for more complex methods. archeoViz can be used offline locally or deployed on a server, either with interactive input of data or with a static data set. Example is provided at <https://analytics.huma-num.fr/archeoviz/en>.
Original idea was presented in the thesis "A statistical analysis tool for agricultural research" to obtain the degree of Master on science, National Engineering University (UNI), Lima-Peru. Some experimental data for the examples come from the CIP and others research. Agricolae offers extensive functionality on experimental design especially for agricultural and plant breeding experiments, which can also be useful for other purposes. It supports planning of lattice, Alpha, Cyclic, Complete Block, Latin Square, Graeco-Latin Squares, augmented block, factorial, split and strip plot designs. There are also various analysis facilities for experimental data, e.g. treatment comparison procedures and several non-parametric tests comparison, biodiversity indexes and consensus cluster.
Fits tractable fully parametric odds-based regression models for survival data, including proportional odds (PO), accelerated failure time (AFT), accelerated odds (AO), and General Odds (GO) models in overall survival frameworks. Given at least an R function specifying the survivor, hazard rate and cumulative distribution functions, any user-defined parametric distribution can be fitted. We applied and evaluated a minimum of seventeen (17) various baseline distributions that can handle different failure rate shapes for each of the four different proposed odds-based regression models. For more information see Bennet et al., (1983) <doi:10.1002/sim.4780020223>, and Muse et al., (2022) <doi:10.1016/j.aej.2022.01.033>.
These are data sets for the hit TV show, RuPaul's Drag Race. Data right now include episode-level data, contestant-level data, and episode-contestant-level data. This is a work in progress, and a love letter of a kind to RuPaul's Drag Race and the performers that have appeared on the show. This may not be the most productive use of my time, but I have tenure and what are you going to do about it? I think there is at least some value in this package if it allows the show's fandom to learn more about the R programming language around its contents.
Conduct multi-locus genome-wide association study under the framework of multi-locus random-SNP-effect mixed linear model (mrMLM). First, each marker on the genome is scanned. Bonferroni correction is replaced by a less stringent selection criterion for significant test. Then, all the markers that are potentially associated with the trait are included in a multi-locus genetic model, their effects are estimated by empirical Bayes and all the nonzero effects were further identified by likelihood ratio test for true QTL. Wen YJ, Zhang H, Ni YL, Huang B, Zhang J, Feng JY, Wang SB, Dunwell JM, Zhang YM, Wu R (2018) <doi:10.1093/bib/bbw145>.
Package for estimating, analyzing, and forecasting multi-country macro-finance affine term structure models (ATSMs). All setups build on the single-country unspanned macroeconomic risk framework from Joslin, Priebsch, and Singleton (2014, JF) <doi:10.1111/jofi.12131>. Multicountry extensions by Jotikasthira, Le, and Lundblad (2015, JFE) <doi:10.1016/j.jfineco.2014.09.004>, Candelon and Moura (2023, EM) <doi:10.1016/j.econmod.2023.106453>, and Candelon and Moura (2024, JFEC) <doi:10.1093/jjfinec/nbae008> are also available. The package also provides tools for bias correction as in Bauer Rudebusch and Wu (2012, JBES) <doi:10.1080/07350015.2012.693855>, bootstrap analysis, and several graphical/numerical outputs.
Computes the probability density function, cumulative distribution function, quantile function, random numbers and measures of inference for the following general families of distributions (each family defined in terms of an arbitrary cdf G): Marshall Olkin G distributions, exponentiated G distributions, beta G distributions, gamma G distributions, Kumaraswamy G distributions, generalized beta G distributions, beta extended G distributions, gamma G distributions, gamma uniform G distributions, beta exponential G distributions, Weibull G distributions, log gamma G I distributions, log gamma G II distributions, exponentiated generalized G distributions, exponentiated Kumaraswamy G distributions, geometric exponential Poisson G distributions, truncated-exponential skew-symmetric G distributions, modified beta G distributions, and exponentiated exponential Poisson G distributions.
An object is called "outlier" if it remarkably deviates from the other objects in a data set. Outlier detection is the process to find outliers by using the methods that are based on distance measures, clustering and spatial methods (Ben-Gal, 2005 <ISBN 0-387-24435-2>). It is one of the intensively studied research topics for identification of novelties, frauds, anomalies, deviations or exceptions in addition to its use for outlier removing in data processing. This package provides the implementations of some novel approaches to detect the outliers based on typicality degrees that are obtained with the soft partitioning clustering algorithms such as Fuzzy C-means and its variants.
This package provides functions to estimate and interpret the alpha-NOMINATE ideal point model developed in Carroll et al. (2013, <doi:10.1111/ajps.12029>). alpha-NOMINATE extends traditional spatial voting frameworks by allowing for a mixture of Gaussian and quadratic utility functions, providing flexibility in modeling political actors preferences. The package uses Markov Chain Monte Carlo (MCMC) methods for parameter estimation, supporting robust inference about individuals ideological positions and the shape of their utility functions. It also contains functions to simulate data from the model and to calculate the probability of a vote passing given the ideal points of the legislators/voters and the estimated location of the choice alternatives.
This package contains Bayesian implementations of the Mixed-Effects Accelerated Failure Time (MEAFT) models for censored data. Those can be not only right-censored but also interval-censored, doubly-interval-censored or misclassified interval-censored. The methods implemented in the package have been published in Komárek and Lesaffre (2006, Stat. Modelling) <doi:10.1191/1471082X06st107oa>, Komárek, Lesaffre and Legrand (2007, Stat. in Medicine) <doi:10.1002/sim.3083>, Komárek and Lesaffre (2007, Stat. Sinica) <https://www3.stat.sinica.edu.tw/statistica/oldpdf/A17n27.pdf>, Komárek and Lesaffre (2008, JASA) <doi:10.1198/016214507000000563>, Garcà a-Zattera, Jara and Komárek (2016, Biometrics) <doi:10.1111/biom.12424>.
This package provides a class of Bayesian beta regression models for the analysis of continuous data with support restricted to an unknown finite support. The response variable is modeled using a four-parameter beta distribution with the mean or mode parameter depending linearly on covariates through a link function. When the response support is known to be (0,1), the above class of models reduce to traditional (0,1) supported beta regression models. Model choice is carried out via the logarithm of the pseudo marginal likelihood (LPML), the deviance information criterion (DIC), and the Watanabe-Akaike information criterion (WAIC). See Zhou and Huang (2022) <doi:10.1016/j.csda.2021.107345>.