This package provides a comprehensive suite of statistical tools for analyzing, simulating, and computing properties of the Topp-Leone Cauchy Rayleigh (TLCAR) distribution, a flexible distribution that combines features of the Topp-Leone, Cauchy, and Rayleigh distributions and is suited to modeling complex, heterogeneous data across scientific domains. See Atchadé, M.N., Bogninou, M.J., and Djibril, A.M. (2023) <doi:10.1007/s44199-023-00066-4> and Atchadé, M.N., Bogninou, M.J., and Djibril, A.M. (2024) <doi:10.1007/s44199-023-00069-1> for further details.
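As a rough illustration, distribution packages of this kind usually export d/p/q/r-style functions. A hypothetical sketch, assuming the names dtlcar()/rtlcar() and an (alpha, beta, theta) parameterisation, none of which are confirmed by the package documentation:

    library(TLCAR)

    x <- seq(0.01, 5, by = 0.01)
    dens <- dtlcar(x, alpha = 2, beta = 1, theta = 0.5)    # hypothetical density call
    samp <- rtlcar(1000, alpha = 2, beta = 1, theta = 0.5) # hypothetical sampler

    hist(samp, freq = FALSE, breaks = 50)
    lines(x, dens)   # overlay the density on the simulated sample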
This tool provides functions to load, segment, and classify zooplankton images. The image processing algorithms and machine learning classifiers in this package will be direct ports of an early Python implementation, available at <https://github.com/arickGrootveld/ZooID>, once they are added. The model weights and datasets that will accompany this package (also not yet included) can be found at Arick Grootveld, Eva R. Kozak, Carmen Franco-Gordo (2023) <doi:10.5281/zenodo.7979996>.
This package provides flexible Bayesian estimation of Infinite Mixtures of Infinite Factor Analysers (IMIFA) and related models, for nonparametrically clustering high-dimensional data. The IMIFA model conducts Bayesian nonparametric model-based clustering with factor analytic covariance structures, without recourse to model selection criteria for choosing the number of clusters or cluster-specific latent factors, mostly via efficient Gibbs updates. Model-specific diagnostic tools are also provided, as well as many options for plotting results, conducting posterior inference on parameters of interest, posterior predictive checking, and quantifying uncertainty.
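A minimal sketch of a typical IMIFA workflow, assuming the mcmc_IMIFA()/get_IMIFA_results() interface and the olive data shipped with the package; tuning arguments are omitted:

    library(IMIFA)

    data(olive)                                  # Italian olive oil data
    sim <- mcmc_IMIFA(olive, method = "IMIFA",   # fully nonparametric variant
                      n.iters = 5000)
    res <- get_IMIFA_results(sim)                # posterior summaries and diagnostics
    summary(res)
    plot(res, plot.meth = "zlabels")             # inspect the inferred clustering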
This package performs QTL mapping in a mixed model framework with separate detection and localization stages. The first stage detects the number of QTL on each chromosome based on the genetic variation due to grouped markers on the chromosome; the second stage uses this information to determine the most likely QTL positions. The mixed model can accommodate general fixed and random effects, including spatial effects in field trials and pedigree effects. It is applicable to backcrosses, doubled haploids, recombinant inbred lines, F2 intercrosses, and association mapping populations.
Decomposition of (income) inequality by population subgroups. For a decomposition on a single variable, the mean log deviation can be used (see Mookherjee and Shorrocks (1982) <DOI:10.2307/2232673>). For a decomposition on multiple variables, a regression-based technique can be used (see Fields (2003) <DOI:10.1016/s0147-9121(03)22001-x>). Recentered influence function regression is provided for marginal effects on the (income or wealth) distribution (see Firpo et al. (2009) <DOI:10.3982/ECTA6822>). Some inequality functions are extended to handle weights and/or missing values.
This package implements an efficient algorithm for solving sparse-penalized support vector machines with kernel density convolution. It is designed for high-dimensional classification tasks, supporting lasso (L1) and elastic-net penalties for sparse feature selection and providing options for tuning kernel bandwidth and penalty weights. It is applicable to fields such as bioinformatics, image analysis, and text classification, where high-dimensional data commonly arise. Learn more about the methodology and algorithm in Wang, Zhou, Gu, and Zou (2023) <doi:10.1109/TIT.2022.3222767>.
This package provides methods for fitting various extreme value distributions whose parameters take generalised additive model (GAM) form. For details of the distributions see Coles, S.G. (2001) <doi:10.1007/978-1-4471-3675-0>, for GAMs see Wood, S.N. (2017) <doi:10.1201/9781315370279>, and for the fitting approach see Wood, S.N., Pya, N. & Safken, B. (2016) <doi:10.1080/01621459.2016.1180986>. Details of how evgam works and various examples are given in Youngman, B.D. (2022) <doi:10.18637/jss.v103.i03>.
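A small sketch of fitting a GEV whose location and scale vary smoothly with a covariate; the simulated data and smooth specifications are illustrative only:

    library(evgam)

    set.seed(1)
    df <- data.frame(x = runif(500))
    df$y <- -log(-log(runif(500))) * (1 + df$x)  # toy Gumbel-like block maxima

    # one formula per GEV parameter: location, log-scale, shape
    fit <- evgam(list(y ~ s(x), ~ s(x), ~ 1), data = df, family = "gev")
    summary(fit)
    plot(fit)                                    # fitted smooths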
Helper functions provide an accurate imputation algorithm for reconstructing missing segments in multivariate data streams. Inspired by single-shot learning, it reconstructs a missing segment by identifying the first similar segment in the stream. One column of data must be available as a constraint column; the values of columns can be characters (A, B, C, etc.). The imputed dataset is returned as a .csv file. For more details see Reza Rawassizadeh (2019) <doi:10.1109/TKDE.2019.2914653>.
This package provides functions for simulating and estimating parameters of various growth models, including the Logistic, Exponential, Theta-logistic, Von Bertalanffy, and Gompertz models. The package supports both simulated and real data analysis, including parameter estimation, visualization, and calculation of global and local estimates. The methods are based on research described by Md Aktar Ul Karim and Amiya Ranjan Bhowmick (2022) <https://www.researchsquare.com/article/rs-2363586/v1>. An interactive web application is also available at <https://gpem-r.shinyapps.io/GPEM-R/>.
An S4 class and several functions, which utilize internally stored datasets and gauging data, enable 1d water level interpolation. The S4 class (WaterLevelDataFrame) structures the computation and visualisation of 1d water level information along the German federal waterways Elbe and Rhine. hyd1d delivers 1d water level data, extracted from the FLYS database, and validated gauging data, extracted from the hydrological database WISKI7, package-internally. For near-real-time computations, gauging data are queried externally from the PEGELONLINE REST API <https://pegelonline.wsv.de/webservice/dokuRestapi>.
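A minimal sketch of the intended workflow, following the WaterLevelDataFrame()/waterLevel() interface described above (the station range, in river kilometres, is illustrative):

    library(hyd1d)

    wldf <- WaterLevelDataFrame(river   = "Elbe",
                                time    = as.POSIXct("2016-12-21"),
                                station = seq(257, 262, by = 0.1))
    wldf <- waterLevel(wldf)   # interpolate from internal FLYS and gauging data
    summary(wldf)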
This package performs Gaussian process regression with heteroskedastic noise following the model by Binois, M., Gramacy, R., Ludkovski, M. (2016) <doi:10.48550/arXiv.1611.05902>, with implementation details in Binois, M. & Gramacy, R. B. (2021) <doi:10.18637/jss.v098.i13>. The input-dependent noise is modeled as another Gaussian process. Replicated observations are encouraged as they yield computational savings. Sequential design procedures based on the integrated mean square prediction error and lookahead heuristics are provided, as well as fast update functions for adding new observations.
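A brief sketch using mleHetGP() on a replicated 1-d design with input-dependent noise (toy data):

    library(hetGP)

    set.seed(1)
    X <- matrix(rep(seq(0, 1, length.out = 10), each = 5), ncol = 1)       # 5 replicates per site
    Z <- drop(sin(8 * X)) + rnorm(nrow(X), sd = 0.05 + 0.2 * drop(X))      # noise grows with x

    fit  <- mleHetGP(X = X, Z = Z)                  # joint mean and noise GPs
    xnew <- matrix(seq(0, 1, length.out = 100), ncol = 1)
    pred <- predict(fit, x = xnew)                  # pred$mean, pred$sd2, pred$nugs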
This package provides a set of evolutionary algorithms to solve many-objective optimization problems. Hybridization between the algorithms is also facilitated. Available algorithms are: SMS-EMOA <doi:10.1016/j.ejor.2006.08.008>; NSGA-III <doi:10.1109/TEVC.2013.2281535>; MO-CMA-ES <doi:10.1145/1830483.1830573>. The following many-objective benchmark problems are also provided: DTLZ1-DTLZ4 from Deb, et al. (2001) <doi:10.1007/1-84628-137-7_6> and WFG4-WFG9 from Huband, et al. (2005) <doi:10.1109/TEVC.2005.861417>.
Estimate diagnostic classification models (also called cognitive diagnostic models) with 'Stan'. Diagnostic classification models are confirmatory latent class models, as described by Rupp et al. (2010, ISBN: 978-1-60623-527-0). Automatically generate 'Stan' code for the general loglinear cognitive diagnosis model proposed by Henson et al. (2009) <doi:10.1007/s11336-008-9089-5> and other subtypes that introduce additional model constraints. Using the generated 'Stan' code, estimate the model and evaluate its performance using model fit indices, information criteria, and reliability metrics.
Generate and analyze Optimal Channel Networks (OCNs): oriented spanning trees reproducing all scaling features characteristic of real, natural river networks. As such, they can be used in a variety of numerical experiments in the fields of hydrology, ecology and epidemiology. See Carraro et al. (2020) <doi:10.1002/ece3.6479> for a presentation of the package; Rinaldo et al. (2014) <doi:10.1073/pnas.1322700111> for a theoretical overview on the OCN concept; Furrer and Sain (2010) <doi:10.18637/jss.v036.i10> for the construct used.
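A compact sketch of the typical pipeline (lattice size and aggregation threshold are illustrative):

    library(OCNet)

    set.seed(1)
    OCN <- create_OCN(25, 25)            # optimize an OCN on a 25 x 25 lattice
    OCN <- landscape_OCN(OCN)            # derive the elevation field
    OCN <- aggregate_OCN(OCN, thrA = 5)  # aggregate into a river network
    draw_simple_OCN(OCN)                 # plot the resulting network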
This package provides a simple-to-use summary function that works with pipes and displays nicely in the console. The default summary statistics may be modified by the user, as can the default formatting. Support for data frames and vectors is included, and users can implement their own skim methods for specific object types as described in a vignette. Default summaries include support for inline spark graphs. Instructions for managing these on specific operating systems are given in the "Using skimr" vignette and the README.
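For example:

    library(skimr)

    skim(iris)   # type-grouped summary of a data frame, pipe-friendly

    # customise the default statistics for numeric columns
    my_skim <- skim_with(numeric = sfl(mad = mad), append = TRUE)
    my_skim(iris)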
Complex machine learning models are often hard to interpret. However, in many situations it is crucial to understand and explain why a model made a specific prediction. Shapley values constitute the only prediction explanation framework with a solid theoretical foundation. Previously known methods for estimating Shapley values do, however, assume feature independence. This package implements methods that account for any feature dependence, and thereby produces more accurate estimates of the true Shapley values. An accompanying Python wrapper ('shaprpy') is available through PyPI.
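A rough sketch of explaining predictions with a dependence-aware approach; note that argument names (e.g. the baseline argument, prediction_zero vs. phi0) have changed across shapr versions, so treat this as indicative rather than exact:

    library(shapr)

    model   <- lm(mpg ~ cyl + disp + hp, data = mtcars)
    x_train <- mtcars[, c("cyl", "disp", "hp")]

    # the Gaussian approach models feature dependence when estimating Shapley values
    expl <- explain(model,
                    x_explain       = x_train[1:5, ],
                    x_train         = x_train,
                    approach        = "gaussian",
                    prediction_zero = mean(mtcars$mpg))  # baseline; phi0 in newer versions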
Real-time quantitative polymerase chain reaction (qPCR) data sets by Boggy et al. (2008) <doi:10.1371/journal.pone.0012355>. This package provides a dilution series for one PCR target: a random sequence that minimizes secondary structure and off-target primer binding. The data set is a six-point, ten-fold dilution series. For each concentration there are two replicates. Each amplification curve is 40 cycles long. Original raw data file: <https://journals.plos.org/plosone/article/file?type=supplementary&id=10.1371/journal.pone.0012355.s004>.
Provides functions for overlapping clustering, fuzzy clustering, and interval-valued data manipulation. The package implements the following algorithms: OKM (Overlapping K-means) from Cleuziou, G. (2007) <doi:10.1109/icpr.2008.4761079>; NEOKM (Non-exhaustive overlapping K-means) from Whang, J. J., Dhillon, I. S., and Gleich, D. F. (2015) <doi:10.1137/1.9781611974010.105>; Fuzzy C-means from Bezdek, J. C. (1981) <doi:10.1007/978-1-4757-0450-1>; Fuzzy I-Cmeans from de A.T. De Carvalho, F. (2005) <doi:10.1016/j.patrec.2006.08.014>.
This package provides a general-purpose computational engine for data analysis: drake rebuilds intermediate data objects when their dependencies change and skips work when the results are already up to date. Not every execution starts from scratch, there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website <https://docs.ropensci.org/drake/> and the online manual <https://books.ropensci.org/drake/>.
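A minimal plan illustrating the skip-if-up-to-date behavior (assumes a file data.csv with columns x and y exists):

    library(drake)

    plan <- drake_plan(
      raw   = read.csv(file_in("data.csv")),   # tracked input file
      model = lm(y ~ x, data = raw),
      coefs = coef(model)
    )

    make(plan)   # first run builds everything
    make(plan)   # second run skips targets whose dependencies are unchanged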
The fftab package stores Fourier coefficients in a tibble and allows their manipulation in various ways. Functions are available for converting between complex, rectangular ('re', 'im'), and polar ('mod', 'arg') representations, as well as for extracting components as vectors or matrices. Inputs can include vectors, time series, and arrays of arbitrary dimensions, which are restored to their original form when inverting the transform. Since fftab stores Fourier frequencies as columns in the tibble, many standard operations on spectral data can be easily performed using tidy packages like 'dplyr'.
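A hypothetical sketch of this workflow; fftab(), to_polar(), and ifftab() below are names assumed from the description, not verified exports:

    library(fftab)
    library(dplyr)

    x  <- sin(2 * pi * 3 * seq(0, 1, length.out = 64))  # 3-cycle tone, 64 samples
    ft <- fftab(x)              # assumed: FFT result as a tibble keyed by frequency

    ft |>
      to_polar() |>             # assumed: adds 'mod'/'arg' columns
      filter(mod > 1e-8)        # ordinary tidy verbs apply, since it is a tibble

    x_back <- ifftab(ft)        # assumed: inverse transform restores the vector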
We implement various tests for the composite hypothesis of testing the fit to the family of inverse Gaussian distributions. Included are methods presented by Allison, J.S., Betsch, S., Ebner, B., and Visagie, I.J.H. (2022) <doi:10.48550/arXiv.1910.14119>, as well as two tests from Henze and Klar (2002) <doi:10.1023/A:1022442506681>. Additionally, the package implements a test proposed by Baringhaus and Gaigall (2015) <doi:10.1016/j.jmva.2015.05.013>. For each test a parametric bootstrap procedure is implemented.
This package provides a data generator for multivariate non-normal data in R. It combines two different methods of generating non-normal data: one with user-specified multivariate skewness and kurtosis (more details can be found in the paper: Qu, Liu, & Zhang, 2019 <doi:10.3758/s13428-019-01291-5>), and the other with given marginal skewness and kurtosis, the widely used method of Vale and Maurelli. It also contains a function to calculate univariate and multivariate skewness and kurtosis (Mardia's test).
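A short sketch, assuming the mnonr()/mardia() interface as documented (argument names should be verified against the package manual):

    library(mnonr)

    Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)

    # multivariate non-normal data with target multivariate skewness and kurtosis
    x <- mnonr(n = 1000, p = 2, ms = 3, mk = 61, Sigma = Sigma)

    mardia(x)   # univariate and multivariate (Mardia's test) skewness/kurtosis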
This package provides lightweight utilities for nucleic acid melting curve analysis, an important method in the life sciences and diagnostics. The software can be used for the analysis and presentation of melting curve data from microbead-based assays (surface melting curve analysis) and reactions in solution (e.g., quantitative PCR (qPCR), real-time isothermal amplification). Further information is described in detail in two publications in The R Journal [<https://journal.r-project.org/archive/2013-2/roediger-bohm-schimke.pdf>; <https://journal.r-project.org/archive/2015-1/RJ-2015-1.pdf>].
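An illustrative sketch on a simulated melting curve, assuming the mcaSmoother() preprocessing and diffQ() melting-temperature functions named in the package documentation:

    library(MBmca)

    # simulated melting curve: fluorescence drops around Tm = 78 degrees Celsius
    temp <- seq(65, 95, by = 0.25)
    fluo <- 1 / (1 + exp(0.8 * (temp - 78))) + rnorm(length(temp), sd = 0.005)

    curve_s <- mcaSmoother(temp, fluo)    # smooth and preprocess the raw curve
    res <- diffQ(curve_s, plot = TRUE)    # Tm from the negative first derivative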
This package performs smoothed (and non-smoothed) principal/independent components analysis of functional data. Various functional pre-whitening approaches are implemented as discussed in Vidal and Aguilera (2022) "Novel whitening approaches in functional settings", <doi:10.1002/sta4.516>. Further whitening representations of functional data can be derived in terms of a few principal components, providing an avenue to explore hidden structures in low dimensional settings: see Vidal, Rosso and Aguilera (2021) "Bi-smoothed functional independent component analysis for EEG artifact removal", <doi:10.3390/math9111243>.