This package provides a class of methods that combine dimension reduction and clustering of continuous, categorical, or mixed-type data (Markos, Iodice D'Enza and van de Velden 2019; <DOI:10.18637/jss.v091.i10>). For continuous data, the package contains implementations of factorial K-means (Vichi and Kiers 2001; <DOI:10.1016/S0167-9473(00)00064-5>) and reduced K-means (De Soete and Carroll 1994; <DOI:10.1007/978-3-642-51175-2_24>), both of which combine principal component analysis with K-means clustering. For categorical data, the package provides MCA K-means (Hwang, Dillon and Takane 2006; <DOI:10.1007/s11336-004-1173-x>), i-FCB (Iodice D'Enza and Palumbo 2013; <DOI:10.1007/s00180-012-0329-x>) and Cluster Correspondence Analysis (van de Velden, Iodice D'Enza and Palumbo 2017; <DOI:10.1007/s11336-016-9514-0>), which combine multiple correspondence analysis with K-means. For mixed-type data, it provides mixed reduced K-means and mixed factorial K-means (van de Velden, Iodice D'Enza and Markos 2019; <DOI:10.1002/wics.1456>), which combine PCA for mixed-type data with K-means.
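A minimal sketch of running reduced K-means on continuous data, assuming this is the clustrd package and that the fitting function is cluspca() with nclus, ndim, and method arguments (none of these names appear above, so treat them all as assumptions):

    # a hedged sketch: package and function names are assumptions
    library(clustrd)
    X <- scale(matrix(rnorm(100 * 6), nrow = 100))          # toy continuous data
    fit <- cluspca(X, nclus = 3, ndim = 2, method = "RKM")  # reduced K-means
    summary(fit)                                            # clusters and dimensions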
Early insights in probability theory were largely influenced by questions about gambling and games of chance, as noted by Blitzstein and Hwang (2019, ISBN:978-1138369917). In modern times, playing cards continue to serve as an effective teaching tool for probability, statistics, and even R programming, as demonstrated by Grolemund (2014, ISBN:978-1449359010). The mmcards package offers a collection of utility functions designed to aid in the creation, manipulation, and utilization of playing card decks in multiple formats. These include a standard 52-card deck, as well as alternative decks, such as decks defined by custom anonymous functions and custom interleaved decks. Optimized for the development of educational shiny applications, the package is particularly useful for teaching statistics and probability through card-based games. Functions include shuffle_deck(), which creates either a shuffled standard deck or a shuffled custom alternative deck; deal_card(), which takes a deck and returns a list object containing both the dealt card and the updated deck; and i_deck(), which adds image paths to card objects, further enriching the package's utility in the development of interactive shiny card games.
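A minimal sketch of the deal-and-update workflow these functions describe; the list element names used here (dealt_card, updated_deck) are assumptions, since the description only states that deal_card() returns both pieces:

    library(mmcards)
    deck <- shuffle_deck()      # shuffled standard 52-card deck
    res  <- deal_card(deck)     # list holding the dealt card and the rest
    res$dealt_card              # element names are assumptions
    deck <- res$updated_deck    # keep dealing from the updated deck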
Implementation of uniformity tests on the circle and (hyper)sphere. The main function of the package is unif_test(), which conveniently collects more than 35 tests for assessing uniformity on S^{p-1} = {x in R^p : ||x|| = 1}, p >= 2. The test statistics are implemented in the unif_stat() function, which allows computing several statistics for different samples within a single call, thus facilitating Monte Carlo experiments. Furthermore, the unif_stat_MC() function allows running such Monte Carlo experiments in parallel in a simple way. The asymptotic null distributions of the statistics are available through the function unif_stat_distr(). The core of sphunif is coded in C++ by relying on the Rcpp package. The package also provides several novel datasets and ensures the replicability of the data applications/simulations in García-Portugués et al. (2021) <doi:10.1007/978-3-030-69944-4_12>, García-Portugués et al. (2023) <doi:10.3150/21-BEJ1454>, Fernández-de-Marcos and García-Portugués (2024) <doi:10.1016/j.spl.2024.110218>, and García-Portugués et al. (2025) <doi:10.1080/01621459.2025.2566414>.
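A minimal sketch of testing uniformity on the circle S^1; the argument names (data, type) and the test name "PCvM" are assumptions, as the description names the functions but not their signatures:

    library(sphunif)
    set.seed(1)
    theta <- runif(50, 0, 2 * pi)          # angles drawn uniformly
    X <- cbind(cos(theta), sin(theta))     # 50 points on S^1 in R^2
    unif_test(data = X, type = "PCvM")     # one of the collected tests (assumed name)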
Suns-Voc (or Isc-Voc) curves can provide the current-voltage (I-V) characteristics of the diode of photovoltaic cells without the effect of series resistance. Here, Suns-Voc curves can be constructed from outdoor time-series I-V curves [1,2,3] of full-size photovoltaic (PV) modules instead of having to be measured in the lab. Time series of four different power loss modes can be calculated based on the obtained Isc-Voc curves. This material is based upon work supported by the U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy (EERE) under Solar Energy Technologies Office (SETO) Agreement Number DE-EE0008172. Jennifer L. Braid is supported by the U.S. Department of Energy (DOE) Office of Energy Efficiency and Renewable Energy administered by the Oak Ridge Institute for Science and Education (ORISE) for the DOE. ORISE is managed by Oak Ridge Associated Universities (ORAU) under DOE contract number DE-SC0014664. [1] Wang, M. et al. (2018) <doi:10.1109/PVSC.2018.8547772>. [2] Walters et al. (2018) <doi:10.1109/PVSC.2018.8548187>. [3] Guo, S. et al. (2016) <doi:10.1117/12.2236939>.
For the risk, progression, and response to treatment of many complex diseases, it has been increasingly recognized that gene-environment interactions play important roles beyond the main genetic and environmental effects. In practical interaction analyses, outliers in response variables and covariates are not uncommon. In addition, missingness in environmental factors is routinely encountered in epidemiological studies. The developed package consists of five robust approaches to address the outlier problem, two of which can also accommodate missingness in environmental factors. Both continuous and right-censored responses are considered. The proposed approaches are based on penalization and sparse boosting techniques for identifying important interactions, which are realized using efficient algorithms. Beyond gene-environment analysis, the developed package can also be adopted to analyze interactions between other types of low-dimensional and high-dimensional data. (Mengyun Wu et al. (2017) <doi:10.1080/00949655.2018.1523411>; Mengyun Wu et al. (2017) <doi:10.1002/gepi.22055>; Yaqing Xu et al. (2018) <doi:10.1080/00949655.2018.1523411>; Yaqing Xu et al. (2019) <doi:10.1016/j.ygeno.2018.07.006>; Mengyun Wu et al. (2021) <doi:10.1093/bioinformatics/btab318>).
The card game War is simple in its rules but can be lengthy. In another domain, the nonparametric bootstrap test with pooled resampling (nbpr) methods, as outlined in Dwivedi, Mallawaarachchi, and Alvarado (2017) <doi:10.1002/sim.7263>, is optimal for comparing paired or unpaired means in non-normal data, especially for small-sample-size studies. However, many researchers are unfamiliar with these methods. The bootwar package bridges this gap by enabling users to grasp the concepts of nbpr via Boot War, a variation of the card game War designed for small samples. The package provides functions like score_keeper() and play_round() to streamline gameplay and scoring. Once a predetermined number of rounds concludes, users can employ the analyze_game() function to derive game results. This function leverages the npboottprm package's nonparboot() to report nbpr results and, for comparative analysis, also reports results from the stats package's t.test() function. Additionally, bootwar features an interactive shiny web application, bootwar(). The app offers a user-centric interface to experience Boot War, enhancing understanding of nbpr methods across various distributions, sample sizes, numbers of bootstrap resamples, and confidence intervals.
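A minimal sketch of trying the package; only the interactive app launcher is called here, since the signatures of play_round(), score_keeper(), and analyze_game() are not given above:

    library(bootwar)
    # launch the shiny app named in the description; the
    # play_round()/score_keeper()/analyze_game() workflow is driven in-app
    bootwar()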
This package is used to simulate and fit biological geometries. biogeom incorporates several novel universal parametric equations that can generate the profiles of bird eggs, flowers, linear and lanceolate leaves, seeds, starfish, and tree-rings (Gielis (2003) <doi:10.3732/ajb.90.3.333>; Shi et al. (2020) <doi:10.3390/sym12040645>), three growth-rate curves representing the ontogenetic growth trajectories of animals and plants against time, and the axially symmetrical and integral forms of all these functions (Shi et al. (2017) <doi:10.1016/j.ecolmodel.2017.01.012>; Shi et al. (2021) <doi:10.3390/sym13081524>). The optimization method proposed by Nelder and Mead (1965) <doi:10.1093/comjnl/7.4.308> is used to estimate model parameters. biogeom includes several real data sets of the boundary coordinates of natural shapes, including avian eggs, fruit, lanceolate and ovate leaves, tree rings, seeds, and sea stars, and can potentially be applied to other natural shapes. biogeom can quantify the conspecific or interspecific similarity of natural outlines, and provides information with important ecological and evolutionary implications for the growth and form of living organisms. Please see Shi et al. (2022) <doi:10.1111/nyas.14862> for details.
This package provides functions to fit regression models for bounded continuous and discrete responses. For bounded continuous responses (e.g., proportions and rates), the available models are the flexible beta (Migliorati, S., Di Brisco, A. M., Ongaro, A. (2018) <doi:10.1214/17-BA1079>), the variance-inflated beta (Di Brisco, A. M., Migliorati, S., Ongaro, A. (2020) <doi:10.1177/1471082X18821213>), the beta (Ferrari, S.L.P., Cribari-Neto, F. (2004) <doi:10.1080/0266476042000214501>), and their augmented versions to handle the presence of zero/one values (Di Brisco, A. M., Migliorati, S. (2020) <doi:10.1002/sim.8406>). For bounded discrete responses (e.g., bounded counts, such as the number of successes in n trials), the available models are the flexible beta-binomial (Ascari, R., Migliorati, S. (2021) <doi:10.1002/sim.9005>), the beta-binomial, and the binomial. Inference is carried out with a Bayesian approach based on the Hamiltonian Monte Carlo (HMC) algorithm (Gelman, A., Carlin, J. B., Stern, H. S., Rubin, D. B. (2014) <doi:10.1201/b16018>). Besides, functions to compute residuals, posterior predictives, goodness-of-fit measures, convergence diagnostics, and graphical representations are provided.
The main objective of the ViSEAGO package is to carry out data mining of biological functions and establish links between genes involved in a study. We developed ViSEAGO in R to facilitate functional Gene Ontology (GO) analysis of complex experimental designs with multiple comparisons of interest. It makes it possible to study large-scale datasets together and to visualize GO profiles so as to capture biological knowledge. The acronym stands for three major concepts of the analysis: Visualization, Semantic similarity and Enrichment Analysis of Gene Ontology. It provides access to the latest GO annotations, which are retrieved from one of the NCBI EntrezGene, Ensembl or UniProt databases for several species. Using available R packages and novel developments, ViSEAGO extends classical functional GO analysis to focus on functional coherence by aggregating closely related biological themes while studying multiple datasets at once. It provides both a synthetic and a detailed view using interactive functionalities that respect the GO graph structure and ensure the functional coherence supplied by semantic similarity. ViSEAGO has been successfully applied to several datasets from different species with a variety of biological questions. Results can be easily shared between bioinformaticians and biologists, enhancing reporting capabilities while maintaining reproducibility.
Implement and fit a variety of short-memory (SM) and long-memory (LM) models from a very broad family of exponential generalized autoregressive conditional heteroskedasticity (EGARCH) models, such as MEGARCH (modified EGARCH), FIEGARCH (fractionally integrated EGARCH), FIMLog-GARCH (fractionally integrated modulus Log-GARCH), and more. The FIMLog-GARCH, as part of the EGARCH family, is discussed in Feng et al. (2023) <https://econpapers.repec.org/paper/pdnciepap/156.htm>. For convenience and for the purpose of comparison, a variety of other popular SM and LM GARCH-type models are implemented as well, such as an APARCH model, a fractionally integrated APARCH (FIAPARCH) model, standard GARCH and fractionally integrated GARCH (FIGARCH) models, GJR-GARCH and FIGJR-GARCH models, and TGARCH and FITGARCH models. Also provided are dual models with simultaneous modelling of the mean, including dual long-memory models with a fractionally integrated autoregressive moving average (FARIMA) model in the mean and a long-memory model in the variance, as well as semiparametric volatility model extensions. Parametric models and parametric model parts are fitted through quasi-maximum-likelihood estimation. Furthermore, common forecasting and backtesting functions for value-at-risk (VaR) and expected shortfall (ES) based on the package's models are provided.
Gaussian processes (GPs) have been widely used to model spatial data, spatio-temporal data, and computer experiments in diverse areas of statistics including spatial statistics, spatio-temporal statistics, uncertainty quantification, and machine learning. This package creates basic tools for fitting and prediction based on GPs with spatial data, spatio-temporal data, and computer experiments. Key characteristics of this GP tool include: (1) the comprehensive implementation of various covariance functions including the Matérn family and the Confluent Hypergeometric family with isotropic form, tensor form, and automatic relevance determination form, where the isotropic form is widely used in spatial statistics, the tensor form is widely used in design and analysis of computer experiments and uncertainty quantification, and the automatic relevance determination form is widely used in machine learning; (2) implementations via Markov chain Monte Carlo (MCMC) algorithms and optimization algorithms for GP models with all the implemented covariance functions, where the methods for fitting and prediction are mainly implemented in a Bayesian framework; (3) model evaluation via Fisher information and predictive metrics such as predictive scores; (4) built-in functionality for simulating GPs with all the implemented covariance functions; (5) unified implementation to allow easy specification of various GPs.
This package provides a framework for the optimization of breeding programs via optimum contribution selection and mate allocation. It offers an easy-to-use set of functions for the computation of optimum contributions of selection candidates and of the population genetic parameters to be optimized. These parameters can be estimated using pedigree or genotype information, and include kinships, kinships at native haplotype segments, and the breed composition of crossbred individuals. They are suitable for managing genetic diversity, removing introgressed genetic material, and accelerating genetic gain. Additionally, functions are provided for computing genetic contributions from ancestors, inbreeding coefficients, the native effective size, the native genome equivalent, and pedigree completeness, and for preparing and plotting pedigrees. The methods are described in: Wellmann, R., and Pfeiffer, I. (2009) <doi:10.1017/S0016672309000202>; Wellmann, R., and Bennewitz, J. (2011) <doi:10.2527/jas.2010-3709>; Wellmann, R., Hartwig, S., Bennewitz, J. (2012) <doi:10.1186/1297-9686-44-34>; de Cara, M. A. R., Villanueva, B., Toro, M. A., Fernandez, J. (2013) <doi:10.1111/mec.12560>; Wellmann, R., Bennewitz, J., Meuwissen, T.H.E. (2014) <doi:10.1017/S0016672314000196>; Wellmann, R. (2019) <doi:10.1186/s12859-018-2450-5>.
Efficient estimation of the population-level causal effects of stochastic interventions on a continuous-valued exposure. Both one-step and targeted minimum loss estimators are implemented for the counterfactual mean value of an outcome of interest under an additive modified treatment policy, a stochastic intervention that may depend on the natural value of the exposure. To accommodate settings with outcome-dependent two-phase sampling, procedures incorporating inverse probability of censoring weighting are provided to facilitate the construction of inefficient and efficient one-step and targeted minimum loss estimators. The causal parameter and its estimation were first described by Díaz and van der Laan (2013) <doi:10.1111/j.1541-0420.2011.01685.x>, while the multiply robust estimation procedure and its application to data from two-phase sampling designs are detailed in NS Hejazi, MJ van der Laan, HE Janes, PB Gilbert, and DC Benkeser (2020) <doi:10.1111/biom.13375>. The software package implementation is described in NS Hejazi and DC Benkeser (2020) <doi:10.21105/joss.02447>. Estimation of nuisance parameters may be enhanced through the Super Learner ensemble model in sl3, available for download from GitHub using remotes::install_github("tlverse/sl3").
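A minimal sketch of estimating the counterfactual mean under an additive shift, assuming the package exports a fitting function txshift() taking baseline covariates W, exposure A, outcome Y, and shift delta (the function and argument names are assumptions, since the description does not name them):

    # a hedged sketch; function and argument names are assumptions
    library(txshift)
    set.seed(1)
    n <- 200
    W <- rnorm(n)                         # baseline covariate
    A <- rnorm(n, mean = W)               # continuous-valued exposure
    Y <- rbinom(n, 1, plogis(A - W))      # outcome
    est <- txshift(W = W, A = A, Y = Y, delta = 0.5)  # additive MTP: A + 0.5
    est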
Work with and download road traffic casualty data from Great Britain. Enables access to the UK's official road safety statistics, STATS19. Enables users to specify a download directory for the data, which can be set permanently by adding `STATS19_DOWNLOAD_DIRECTORY=/path/to/a/dir` to your `.Renviron` file, which can be opened with `usethis::edit_r_environ()`. The data is provided as a series of `.csv` files. This package downloads, reads in and formats the data, making it suitable for analysis. See the stats19 vignette for details. Data is available from 1979 to 2024. See the official data series at <https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-accidents-safety-data>. The package is described in a paper in the Journal of Open Source Software (Lovelace et al. 2019) <doi:10.21105/joss.01181>. See Gilardi et al. (2022) <doi:10.1111/rssa.12823>, Vidal-Tortosa et al. (2021) <doi:10.1016/j.jth.2021.101291>, Tait et al. (2023) <doi:10.1016/j.aap.2022.106895>, and León et al. (2025) <doi:10.18637/jss.v114.i09> for examples of how the data can be used for methodological and empirical research.
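A minimal sketch of the setup described above, plus a hedged download call; get_stats19() and its arguments are assumptions, as the description does not name the download function:

    # permanent download directory, as described above
    usethis::edit_r_environ()
    # then add a line such as: STATS19_DOWNLOAD_DIRECTORY=/path/to/a/dir

    # a hedged example of downloading and reading one year of data
    library(stats19)
    crashes <- get_stats19(year = 2022, type = "collision")  # assumed interface
    head(crashes)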
This package provides a Scannerless GLR parser/parser generator. Note that GLR stands for "generalized LR", where L stands for "left-to-right" and R stands for "rightmost (derivation)". For more information see <https://en.wikipedia.org/wiki/GLR_parser>. This parser is based on the Tomita (1987) algorithm (the paper can be found at <https://aclanthology.org/P84-1073.pdf>). The original dparser package documentation can be found at <https://dparser.sourceforge.net/>. This allows you to add mini-languages to R (like rxode2's ODE mini-language; Wang, Hallow, and James 2015 <DOI:10.1002/psp4.12052>) or to parse other languages like NONMEM to automatically translate them to R code. To use this in your code, add LinkingTo: dparser in your DESCRIPTION file and, instead of using #include <dparse.h>, use #include <dparser.h>. This also provides an R-based port of the make_dparser <https://dparser.sourceforge.net/d/make_dparser.cat> command, called mkdparser(). Additionally, you can parse an arbitrary grammar within R using the dparse() function, which works on most OSes and is mainly for grammar testing. The fastest parsing, of course, occurs at the C level, which is the suggested approach.
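The linking setup described above, shown concretely:

    In DESCRIPTION:
        LinkingTo: dparser
    In your C code:
        #include <dparser.h>   /* instead of #include <dparse.h> */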
This package implements the EDNE-test for equivalence according to Hoffelder et al. (2015) <DOI:10.1080/10543406.2014.920344>. "EDNE" abbreviates "Euclidean Distance between the Non-standardized Expected values". The EDNE-test for equivalence is a multivariate two-sample equivalence test whose distance measure is the Euclidean distance. The test is an asymptotically valid test for the family of distributions fulfilling the assumptions of the multivariate central limit theorem (see Hoffelder et al., 2015). The function EDNE.EQ() implements the EDNE-test for equivalence according to Hoffelder et al. (2015). The function EDNE.EQ.dissolution.profiles() implements a variant of the EDNE-test for equivalence analyses of dissolution profiles (see Suarez-Sharp et al., 2020 <DOI:10.1208/s12248-020-00458-9>). EDNE.EQ.dissolution.profiles() checks whether the quadratic mean of the differences of the expected values of both dissolution profile populations is statistically significantly smaller than 10 [% of label claim]. The current regulatory standard approach for equivalence analyses of dissolution profiles is the similarity factor f2. The statistical hypotheses underlying EDNE.EQ.dissolution.profiles() coincide with the hypotheses for f2 (see Hoffelder et al., 2015; Suarez-Sharp et al., 2020).
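A minimal sketch of calling the dissolution-profile variant on two toy samples; the argument layout (two sample matrices, rows = profiles, columns = time points) is an assumption, since the description names the functions but not their signatures:

    # a hedged sketch; the argument layout is an assumption
    library(EDNE.EQ)
    set.seed(1)
    X <- matrix(rnorm(12 * 4, mean = c(35, 55, 75, 90)), ncol = 4, byrow = TRUE)  # test batch
    Y <- matrix(rnorm(12 * 4, mean = c(37, 56, 74, 91)), ncol = 4, byrow = TRUE)  # reference batch
    EDNE.EQ.dissolution.profiles(X, Y)  # tests quadratic mean difference < 10 [% of label claim]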
The ConNEcT approach investigates the pairwise association strength of binary time series by calculating contingency measures and depicts the results in a network. The package includes features to explore and visualize the data. To calculate the pairwise concurrent or temporally sequenced relationships between the variables, the package provides seven contingency measures (proportion of agreement, classical and corrected Jaccard, Cohen's kappa, phi correlation coefficient, odds ratio, and log odds ratio); others can easily be implemented. The package also includes non-parametric significance tests that can be applied to test whether the contingency value quantifying the relationship between the variables is significantly higher than chance level. Most importantly, this test accounts for auto-dependence and relative frequency; see Bodner et al. (2021) <doi:10.1111/bmsp.12222>. Finally, a network can be drawn. Variables are depicted as the nodes of the network, with the node size adapted to the prevalence. The association strength between the variables defines the undirected (concurrent) or directed (temporally sequenced) links between the nodes. The results of the non-parametric significance test can be included by depicting either all links or only the significant ones. For a tutorial, see Bodner et al. (2021) <doi:10.3758/s13428-021-01760-w>.
This is a mainly instrumental package meant to allow other packages whose core is written in C++ to read, write and manipulate matrices in a binary format so that the memory used for them is no more than strictly needed. Its functionality is already included in parallelpam and scellpam, so if you have installed either of these, you do not need to install jmatrix. Using just the needed memory is not always possible with R matrices or vectors, since by default they are of double type. Attempts like the float package have been made, but to use them you have to coerce a matrix already loaded in R memory to a float matrix, and only then can you delete the original. The problem arises when your computer does not have enough memory to hold the matrix in the first place, so you are forced to load it in chunks. This is the problem this package tries to address (with partial success, since this is a difficult problem: R is not a strictly typed language, and strict typing is in any case quite hard to achieve in an interpreted language). This package allows the creation and manipulation of full, sparse and symmetric matrices of any standard data type.
Label-free bottom-up proteomics expression data is often affected by data heterogeneity and missing values. Normalization and missing value imputation are commonly used techniques to address these issues and make the dataset suitable for further downstream analysis. This package provides an optimal combination of normalization and imputation methods for the dataset. The package utilizes three normalization methods and three imputation methods. The statistical evaluation measures named pooled coefficient of variation, pooled estimate of variance, and pooled median absolute deviation are used for selecting the best combination of normalization and imputation methods for the given dataset. The user can also visualize the results by using the various plots available in this package. The user can also perform differential expression analysis between two sample groups with the function included in this package. The three normalization methods, three imputation methods and three evaluation measures were chosen for this study based on the research papers published by Välikangas et al. (2016) <doi:10.1093/bib/bbw095>, Jin et al. (2021) <doi:10.1038/s41598-021-81279-4> and Srivastava et al. (2023) <doi:10.2174/1574893618666230223150253>. This work was published by Sakthivel et al. (2025) <doi:10.1021/acs.jproteome.4c00552>.
The restricted optimal design method is implemented to optimally allocate a set of items that require calibration to a group of examinees. The optimization process is based on the method described in detail by Ul Hassan and Miller in their works published in 2019 <doi:10.1177/0146621618824854> and 2021 <doi:10.1016/j.csda.2021.107177>. To use the method, preliminary item characteristics must be provided as input. These characteristics can either be expert guesses or be based on a previous calibration with a small number of examinees. The item characteristics should be described in the form of parameters for an Item Response Theory (IRT) model. These models can include the Rasch model, the 2-parameter logistic model, the 3-parameter logistic model, or a mixture of these models. The output consists of a set of rules for each item that determine which examinees should be assigned to each item. The efficiency or gain achieved through the optimal design is quantified by comparing it to a random allocation. This comparison allows for an assessment of how much improvement or advantage is gained by using the optimal design approach. This work was supported by the Swedish Research Council (Vetenskapsrådet), Grant 2019-02706.
This package provides methods to unify the different ways of creating predictive models and their different predictive formats for classification and regression. It includes methods such as K-Nearest Neighbors Schliep, K. P. (2004) <doi:10.5282/ubm/epub.1769>, Decision Trees Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone (2017) <doi:10.1201/9781315139470>, ADA Boosting Esteban Alfaro, Matias Gamez, Noelia García (2013) <doi:10.18637/jss.v054.i02>, Extreme Gradient Boosting Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>, Random Forest Breiman (2001) <doi:10.1023/A:1010933404324>, Neural Networks Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Support Vector Machines Bennett, K. P. & Campbell, C. (2000) <doi:10.1145/380995.380999>, Bayesian Methods Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (1995) <doi:10.1201/9780429258411>, Linear Discriminant Analysis Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Quadratic Discriminant Analysis Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Logistic Regression Dobson, A. J., & Barnett, A. G. (2018) <doi:10.1201/9781315182780> and Penalized Logistic Regression Friedman, J. H., Hastie, T., & Tibshirani, R. (2010) <doi:10.18637/jss.v033.i01>.
Adaptive Sparse Multi-block Partial Least Square, a supervised algorithm, is an extension of Sparse Multi-block Partial Least Square that allows different quantiles to be used in different blocks of different partial least square components to decide the proportion of features to be retained. The best combination of quantiles can be chosen from a set of user-defined quantile combinations by cross-validation. By doing this, it enables feature selection for different blocks, and the selected features can then be further used to predict the outcome. For example, in biomedical applications, clinical covariates plus different types of omics data, such as microbiome, metabolome, mRNA, methylation, and copy number variation data, might be predictive of patient outcomes such as survival time or response to therapy. Different types of data could be put in different blocks and, along with survival time, used to fit the model. The fitted model can then be used to predict survival for new samples with the corresponding clinical covariates and omics data. In addition, Adaptive Sparse Multi-block Partial Least Square Discriminant Analysis is also included, which extends Adaptive Sparse Multi-block Partial Least Square to classify categorical outcomes.
Estimation and inference using the Generalized Maximum Entropy (GME) and Generalized Cross Entropy (GCE) framework, a flexible method for solving ill-posed inverse problems and for parameter estimation under uncertainty (Golan, Judge, and Miller (1996, ISBN:978-0471145925), "Maximum Entropy Econometrics: Robust Estimation with Limited Data"). The package includes routines for generalized cross entropy estimation of linear models, including the implementation of a GME-GCE two-step approach. Diagnostic tools and options to incorporate prior information through support and prior distributions are available (Macedo, Cabral, Afreixo, Macedo and Angelelli (2025) <doi:10.1007/978-3-031-97589-9_21>). In particular, support spaces can be defined by the user or be internally computed based on the ridge trace or on the distribution of standardized regression coefficients. Different optimization methods for the objective function can be used. An adaptation of the normalized entropy aggregation (Macedo and Costa (2019) <doi:10.1007/978-3-030-26036-1_2>, "Normalized entropy aggregation for inhomogeneous large-scale data") and a two-stage maximum entropy approach for time series regression (Macedo (2022) <doi:10.1080/03610918.2022.2057540>) are also available. Suitable for applications in econometrics, health, signal processing, and other fields requiring robust estimation under data constraints.
This package provides a collection of tools to handle microsatellite data of any ploidy (and samples of mixed ploidy) where allele copy number is not known in partially heterozygous genotypes. It can import and export data in ABI GeneMapper, Structure, ATetra, Tetrasat/Tetra, GenoDive, SPAGeDi, POPDIST, STRand, and binary presence/absence formats. It can calculate pairwise distances between individuals using a stepwise mutation model or an infinite alleles model, with or without taking ploidies and allele frequencies into account. These distances can be used for the calculation of clonal diversity statistics or for further analysis in R. Allelic diversity statistics and the Polymorphic Information Content are also available. polysat can assist the user in estimating the ploidy of samples, and it can estimate allele frequencies in populations, calculate pairwise or global differentiation statistics based on those frequencies, and export allele frequencies to SPAGeDi and adegenet. Functions are also included for assigning alleles to isoloci in cases where one pair of microsatellite primers amplifies alleles from two or more independently segregating isoloci. polysat is described by Clark and Jasieniuk (2011) <doi:10.1111/j.1755-0998.2011.02985.x> and Clark and Schreier (2017) <doi:10.1111/1755-0998.12639>.
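A hedged sketch of an import-then-distance workflow; read.GeneMapper(), Usatnts()<-, Loci(), and meandistance.matrix() are assumptions (the description names formats, not functions), and the input file is hypothetical:

    # a hedged sketch; function names and the input file are assumptions
    library(polysat)
    gendata <- read.GeneMapper("GeneMapperOutput.txt")  # hypothetical file
    Usatnts(gendata) <- rep(2, length(Loci(gendata)))   # repeat lengths for the stepwise model
    d <- meandistance.matrix(gendata)                   # pairwise distance matrix
    d[1:4, 1:4]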