For the risk, progression, and response to treatment of many complex diseases, it has been increasingly recognized that gene-environment interactions play important roles beyond the main genetic and environmental effects. In practical interaction analyses, outliers in response variables and covariates are not uncommon. In addition, missingness in environmental factors is routinely encountered in epidemiological studies. The developed package consists of five robust approaches for addressing outlier problems, among which two approaches can also accommodate missingness in environmental factors. Both continuous and right-censored responses are considered. The proposed approaches are based on penalization and sparse boosting techniques for identifying important interactions, and are realized using efficient algorithms. Beyond gene-environment analysis, the package can also be adopted to analyze interactions between other types of low-dimensional and high-dimensional data. (Mengyun Wu et al (2017), <doi:10.1080/00949655.2018.1523411>; Mengyun Wu et al (2017), <doi:10.1002/gepi.22055>; Yaqing Xu et al (2018), <doi:10.1080/00949655.2018.1523411>; Yaqing Xu et al (2019), <doi:10.1016/j.ygeno.2018.07.006>; Mengyun Wu et al (2021), <doi:10.1093/bioinformatics/btab318>).
The card game War is simple in its rules but can be lengthy. In another domain, the nonparametric bootstrap test with pooled resampling (nbpr) methods, as outlined in Dwivedi, Mallawaarachchi, and Alvarado (2017) <doi:10.1002/sim.7263>, are optimal for comparing paired or unpaired means in non-normal data, especially in small-sample studies. However, many researchers are unfamiliar with these methods. The bootwar package bridges this gap by enabling users to grasp the concepts of nbpr via Boot War, a variation of the card game War designed for small samples. The package provides functions like score_keeper() and play_round() to streamline gameplay and scoring. Once a predetermined number of rounds concludes, users can employ the analyze_game() function to derive game results. This function leverages the npboottprm package's nonparboot() to report nbpr results and, for comparative analysis, also reports results from the stats package's t.test() function. Additionally, bootwar features an interactive shiny web application, bootwar(), which offers a user-centric interface to experience Boot War, enhancing understanding of nbpr methods across various distributions, sample sizes, numbers of bootstrap resamples, and confidence intervals.
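A hedged sketch of a complete session follows; only the function names come from the description above, while every argument shown is an assumption rather than the package's documented signature:

    library(bootwar)
    # ... play a predetermined number of rounds via play_round(), with
    # score_keeper() tracking each side's cumulative scores ...
    # After the final round, compare the two sides (made-up scores;
    # positional arguments are an assumption, not the documented API):
    player_scores   <- c(8, 3, 10, 5, 7, 2, 9, 4, 6, 11)
    computer_scores <- c(5, 9, 2, 7, 6, 10, 3, 8, 11, 4)
    analyze_game(player_scores, computer_scores)  # nbpr and t.test() results
    # bootwar()  # or explore the same workflow in the Shiny app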
biogeom is used to simulate and fit biological geometries. It incorporates several novel universal parametric equations that can generate the profiles of bird eggs, flowers, linear and lanceolate leaves, seeds, starfish, and tree-rings (Gielis (2003) <doi:10.3732/ajb.90.3.333>; Shi et al. (2020) <doi:10.3390/sym12040645>), three growth-rate curves representing the ontogenetic growth trajectories of animals and plants against time, and the axially symmetrical and integral forms of all these functions (Shi et al. (2017) <doi:10.1016/j.ecolmodel.2017.01.012>; Shi et al. (2021) <doi:10.3390/sym13081524>). The optimization method proposed by Nelder and Mead (1965) <doi:10.1093/comjnl/7.4.308> is used to estimate model parameters. biogeom includes several real data sets of the boundary coordinates of natural shapes, including avian eggs, fruit, lanceolate and ovate leaves, tree rings, seeds, and sea stars, and can potentially be applied to other natural shapes. biogeom can quantify the conspecific or interspecific similarity of natural outlines, and provides information with important ecological and evolutionary implications for the growth and form of living organisms. Please see Shi et al. (2022) <doi:10.1111/nyas.14862> for details.
This package provides functions to fit regression models for bounded continuous and discrete responses. For bounded continuous responses (e.g., proportions and rates), the available models are the flexible beta (Migliorati, S., Di Brisco, A. M., Ongaro, A. (2018) <doi:10.1214/17-BA1079>), the variance-inflated beta (Di Brisco, A. M., Migliorati, S., Ongaro, A. (2020) <doi:10.1177/1471082X18821213>), the beta (Ferrari, S.L.P., Cribari-Neto, F. (2004) <doi:10.1080/0266476042000214501>), and their augmented versions, which handle the presence of zero/one values (Di Brisco, A. M., Migliorati, S. (2020) <doi:10.1002/sim.8406>). For bounded discrete responses (e.g., bounded counts, such as the number of successes in n trials), the available models are the flexible beta-binomial (Ascari, R., Migliorati, S. (2021) <doi:10.1002/sim.9005>), the beta-binomial, and the binomial. Inference follows a Bayesian approach based on the Hamiltonian Monte Carlo (HMC) algorithm (Gelman, A., Carlin, J. B., Stern, H. S., Rubin, D. B. (2014) <doi:10.1201/b16018>). In addition, functions to compute residuals, posterior predictives, goodness-of-fit measures, convergence diagnostics, and graphical representations are provided.
The main objective of the ViSEAGO package is to carry out data mining of biological functions and establish links between genes involved in the study. We developed ViSEAGO in R to facilitate functional Gene Ontology (GO) analysis of complex experimental designs with multiple comparisons of interest. It makes it possible to study large-scale datasets together and to visualize GO profiles to capture biological knowledge. The acronym stands for three major concepts of the analysis: Visualization, Semantic similarity and Enrichment Analysis of Gene Ontology. It provides access to the latest GO annotations, which are retrieved from one of the NCBI EntrezGene, Ensembl or Uniprot databases for several species. Using available R packages and novel developments, ViSEAGO extends classical functional GO analysis to focus on functional coherence by aggregating closely related biological themes while studying multiple datasets at once. It provides both a synthetic and a detailed view using interactive functionalities respecting the GO graph structure and ensuring functional coherence supplied by semantic similarity. ViSEAGO has been successfully applied to several datasets from different species with a variety of biological questions. Results can be easily shared between bioinformaticians and biologists, enhancing reporting capabilities while maintaining reproducibility.
Gaussian processes ('GPs') have been widely used to model spatial data, spatio-temporal data, and computer experiments in diverse areas of statistics including spatial statistics, spatio-temporal statistics, uncertainty quantification, and machine learning. This package creates basic tools for fitting and prediction based on GPs with spatial data, spatio-temporal data, and computer experiments. Key characteristics for this GP tool include: (1) the comprehensive implementation of various covariance functions including the Matérn family and the Confluent Hypergeometric family with isotropic form, tensor form, and automatic relevance determination form, where the isotropic form is widely used in spatial statistics, the tensor form is widely used in design and analysis of computer experiments and uncertainty quantification, and the automatic relevance determination form is widely used in machine learning; (2) implementations via Markov chain Monte Carlo ('MCMC') algorithms and optimization algorithms for GP models with all the implemented covariance functions. The methods for fitting and prediction are mainly implemented in a Bayesian framework; (3) model evaluation via Fisher information and predictive metrics such as predictive scores; (4) built-in functionality for simulating GPs with all the implemented covariance functions; (5) unified implementation to allow easy specification of various GPs.
This package provides a framework for the optimization of breeding programs via optimum contribution selection and mate allocation. It offers an easy-to-use set of functions for computing optimum contributions of selection candidates and the population genetic parameters to be optimized. These parameters can be estimated using pedigree or genotype information, and include kinships, kinships at native haplotype segments, and breed composition of crossbred individuals. They are suitable for managing genetic diversity, removing introgressed genetic material, and accelerating genetic gain. Additionally, functions are provided for computing genetic contributions from ancestors, inbreeding coefficients, the native effective size, the native genome equivalent, and pedigree completeness, and for preparing and plotting pedigrees. The methods are described in: Wellmann, R., and Pfeiffer, I. (2009) <doi:10.1017/S0016672309000202>; Wellmann, R., and Bennewitz, J. (2011) <doi:10.2527/jas.2010-3709>; Wellmann, R., Hartwig, S., Bennewitz, J. (2012) <doi:10.1186/1297-9686-44-34>; de Cara, M. A. R., Villanueva, B., Toro, M. A., Fernandez, J. (2013) <doi:10.1111/mec.12560>; Wellmann, R., Bennewitz, J., Meuwissen, T.H.E. (2014) <doi:10.1017/S0016672314000196>; Wellmann, R. (2019) <doi:10.1186/s12859-018-2450-5>.
Efficient estimation of the population-level causal effects of stochastic interventions on a continuous-valued exposure. Both one-step and targeted minimum loss estimators are implemented for the counterfactual mean value of an outcome of interest under an additive modified treatment policy, a stochastic intervention that may depend on the natural value of the exposure. To accommodate settings with outcome-dependent two-phase sampling, procedures incorporating inverse probability of censoring weighting are provided to facilitate the construction of inefficient and efficient one-step and targeted minimum loss estimators. The causal parameter and its estimation were first described by Díaz and van der Laan (2013) <doi:10.1111/j.1541-0420.2011.01685.x>, while the multiply robust estimation procedure and its application to data from two-phase sampling designs are detailed in NS Hejazi, MJ van der Laan, HE Janes, PB Gilbert, and DC Benkeser (2020) <doi:10.1111/biom.13375>. The software package implementation is described in NS Hejazi and DC Benkeser (2020) <doi:10.21105/joss.02447>. Estimation of nuisance parameters may be enhanced through the Super Learner ensemble model in 'sl3', available for download from GitHub using remotes::install_github("tlverse/sl3").
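A hedged sketch of estimating the counterfactual mean under an additive shift; the txshift() call and its argument names below are assumptions consistent with the description, not a verified API:

    library(txshift)
    set.seed(429)
    n <- 200
    W <- replicate(2, rbinom(n, 1, 0.5))            # baseline covariates
    A <- rnorm(n, mean = 2 * W[, 1], sd = 1)        # continuous exposure
    Y <- rbinom(n, 1, plogis(A + W[, 1] - W[, 2]))  # binary outcome
    # Counterfactual mean of Y under the policy A + 0.5 (one-step estimator)
    est <- txshift(W = W, A = A, Y = Y, delta = 0.5, estimator = "onestep")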
This package provides a scannerless GLR parser/parser generator. GLR stands for "generalized LR", where L stands for "left-to-right" and R stands for "rightmost (derivation)". For more information see <https://en.wikipedia.org/wiki/GLR_parser>. This parser is based on the Tomita (1987) algorithm. (The paper can be found at <https://aclanthology.org/P84-1073.pdf>.) The original dparser package documentation can be found at <https://dparser.sourceforge.net/>. This allows you to add mini-languages to R (like rxode2's ODE mini-language, Wang, Hallow, and James 2015 <DOI:10.1002/psp4.12052>) or to parse other languages like NONMEM to automatically translate them to R code. To use this in your code, add a LinkingTo dparser in your DESCRIPTION file and, instead of using #include <dparse.h>, use #include <dparser.h>. This package also provides an R-based port of the make_dparser <https://dparser.sourceforge.net/d/make_dparser.cat> command, called mkdparser(). Additionally, you can parse an arbitrary grammar within R using the dparse() function, which works on most OSes and is mainly for grammar testing. The fastest parsing, of course, occurs at the C level, which is the suggested approach.
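A minimal sketch of grammar testing with dparse(), assuming it accepts the path to a grammar file and returns a parser function (the toy grammar and the calling convention are illustrative assumptions, not the package's verified interface):

    library(dparser)
    # A toy dparser grammar for trivial additive expressions
    gram <- tempfile(fileext = ".g")
    writeLines("E: E '+' E | 'a';", gram)
    parser <- dparse(gram)   # build a parser from the grammar file
    # parser("a+a")          # then parse input against the grammar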
This package implements the EDNE-test for equivalence according to Hoffelder et al. (2015) <DOI:10.1080/10543406.2014.920344>. "EDNE" abbreviates "Euclidean Distance between the Non-standardized Expected values". The EDNE-test for equivalence is a multivariate two-sample equivalence test whose distance measure is the Euclidean distance. The test is an asymptotically valid test for the family of distributions fulfilling the assumptions of the multivariate central limit theorem (see Hoffelder et al., 2015). The function EDNE.EQ() implements the EDNE-test for equivalence according to Hoffelder et al. (2015). The function EDNE.EQ.dissolution.profiles() implements a variant of the EDNE-test for equivalence analyses of dissolution profiles (see Suarez-Sharp et al., 2020 <DOI:10.1208/s12248-020-00458-9>). EDNE.EQ.dissolution.profiles() checks whether the quadratic mean of the differences of the expected values of both dissolution profile populations is statistically significantly smaller than 10 [% of label claim]. The current regulatory standard approach for equivalence analyses of dissolution profiles is the similarity factor f2. The statistical hypotheses underlying EDNE.EQ.dissolution.profiles() coincide with the hypotheses for f2 (see Hoffelder et al., 2015; Suarez-Sharp et al., 2020).
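A hedged usage sketch, assuming EDNE.EQ.dissolution.profiles() takes two matrices whose rows are individual dissolution profiles measured at common time points (both the package name in the library() call and the argument layout are assumptions):

    library(EDNE.EQ)
    set.seed(1)
    # 12 reference and 12 test profiles measured at 4 time points
    REF  <- matrix(rnorm(48, mean = c(40, 60, 80, 95), sd = 2),
                   nrow = 12, byrow = TRUE)
    TEST <- matrix(rnorm(48, mean = c(42, 61, 79, 96), sd = 2),
                   nrow = 12, byrow = TRUE)
    EDNE.EQ.dissolution.profiles(REF, TEST)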
The ConNEcT approach investigates the pairwise association strength of binary time series by calculating contingency measures and depicts the results in a network. The package includes features to explore and visualize the data. To calculate the pairwise concurrent or temporally sequenced relationships between the variables, the package provides seven contingency measures (proportion of agreement, classical and corrected Jaccard, Cohen's kappa, phi correlation coefficient, odds ratio, and log odds ratio); however, others can easily be implemented. The package also includes non-parametric significance tests that can be applied to test whether the contingency value quantifying the relationship between the variables is significantly higher than chance level. Most importantly, this test accounts for auto-dependence and relative frequency; see Bodner et al. (2021) <doi:10.1111/bmsp.12222>. Finally, a network can be drawn. Variables are depicted as the nodes of the network, with the node size adapted to the prevalence. The association strength between the variables defines the undirected (concurrent) or directed (temporally sequenced) links between the nodes. The results of the non-parametric significance test can be included by depicting either all links or only the significant ones. For a tutorial, see Bodner et al. (2021) <doi:10.3758/s13428-021-01760-w>.
Summarizing data frames by calculating various statistical measures, including measures of central tendency, dispersion, skewness, kurtosis, and normality tests. The package leverages the moments package for calculating statistical moments and related measures, the dplyr package for data manipulation, and the nortest package for normality testing. DataSum includes functions such as getmode() for finding the mode(s) of a data vector, shapiro_normality_test() for performing the Shapiro-Wilk test (Shapiro & Wilk 1965 <doi:10.1093/biomet/52.3-4.591>) (or the Anderson-Darling test when the data length is outside the valid range for the Shapiro-Wilk test) (Stephens 1974 <doi:10.1080/01621459.1974.10480196>), Datum() for generating a comprehensive summary of a data vector with various statistics (including data type, sample size, mean, mode, median, variance, standard deviation, maximum, minimum, range, skewness, kurtosis, and normality test result) (Joanes & Gill 1998 <doi:10.1111/1467-9884.00122>), and DataSumm() for applying the Datum() function to each column of a data frame. Emphasizing the importance of normality testing, the package provides robust tools to validate whether data follow a normal distribution, a fundamental assumption in many statistical analyses and models.
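A short sketch using the function names from the description; the assumption here is that each function needs no arguments beyond the data themselves:

    library(DataSum)
    x <- rnorm(100)
    getmode(x)        # mode(s) of the vector
    Datum(x)          # full summary of a single vector
    DataSumm(mtcars)  # Datum() applied to every column of a data frame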
This package is mainly an instrumental package meant to allow other packages whose core is written in C++ to read, write and manipulate matrices in a binary format, so that the memory used for them is no more than strictly needed. Its functionality is already inside 'parallelpam' and 'scellpam', so if you have installed either of these, you do not need to install 'jmatrix'. Using just the needed memory is not always true of R matrices or vectors, since by default they are of double type. Attempts such as the float package have been made, but to use them you have to coerce a matrix already loaded in R memory to a float matrix, and only then can you delete the original. The problem arises when your computer does not have enough memory to hold the matrix in the first place, so you are forced to load it in chunks. This is the problem this package tries to address (with partial success, but this is a difficult problem since R is not a strictly typed language, and strict typing is in any case quite hard to achieve in an interpreted language). This package allows the creation and manipulation of full, sparse and symmetric matrices of any standard data type.
The restricted optimal design method is implemented to optimally allocate a set of items that require calibration to a group of examinees. The optimization process is based on the method described in detail by Ul Hassan and Miller in their works published in 2019 <doi:10.1177/0146621618824854> and 2021 <doi:10.1016/j.csda.2021.107177>. To use the method, preliminary item characteristics must be provided as input. These characteristics can either be expert guesses or be based on a previous calibration with a small number of examinees. The item characteristics should be described in the form of parameters for an Item Response Theory (IRT) model. These models can include the Rasch model, the 2-parameter logistic model, the 3-parameter logistic model, or a mixture of these models. The output consists of a set of rules for each item that determine which examinees should be assigned to each item. The efficiency or gain achieved through the optimal design is quantified by comparing it to a random allocation. This comparison allows for an assessment of how much improvement or advantage is gained by using the optimal design approach. This work was supported by the Swedish Research Council (Vetenskapsrådet) Grant 2019-02706.
This package provides methods to unify the different ways of creating predictive models and their different predictive formats for classification and regression. It includes methods such as K-Nearest Neighbors Schliep, K. P. (2004) <doi:10.5282/ubm/epub.1769>, Decision Trees Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone (2017) <doi:10.1201/9781315139470>, ADA Boosting Esteban Alfaro, Matias Gamez, Noelia García (2013) <doi:10.18637/jss.v054.i02>, Extreme Gradient Boosting Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>, Random Forest Breiman (2001) <doi:10.1023/A:1010933404324>, Neural Networks Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Support Vector Machines Bennett, K. P. & Campbell, C. (2000) <doi:10.1145/380995.380999>, Bayesian Methods Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (1995) <doi:10.1201/9780429258411>, Linear Discriminant Analysis Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Quadratic Discriminant Analysis Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Logistic Regression Dobson, A. J., & Barnett, A. G. (2018) <doi:10.1201/9781315182780> and Penalized Logistic Regression Friedman, J. H., Hastie, T., & Tibshirani, R. (2010) <doi:10.18637/jss.v033.i01>.
Adaptive Sparse Multi-block Partial Least Square, a supervised algorithm, is an extension of Sparse Multi-block Partial Least Square, which allows different quantiles to be used in different blocks of different partial least square components to decide the proportion of features to be retained. The best combination of quantiles can be chosen from a set of user-defined quantile combinations by cross-validation. By doing this, it enables feature selection for different blocks, and the selected features can then be further used to predict the outcome. For example, in biomedical applications, clinical covariates plus different types of omics data such as microbiome, metabolome, mRNA data, methylation data, and copy number variation data might be predictive of patient outcomes such as survival time or response to therapy. Different types of data can be put in different blocks and, along with survival time, used to fit the model. The fitted model can then be used to predict survival for new samples with the corresponding clinical covariates and omics data. In addition, Adaptive Sparse Multi-block Partial Least Square Discriminant Analysis is also included, which extends Adaptive Sparse Multi-block Partial Least Square to classifying categorical outcomes.
This package provides a collection of tools to handle microsatellite data of any ploidy (and samples of mixed ploidy) where allele copy number is not known in partially heterozygous genotypes. It can import and export data in ABI 'GeneMapper', 'Structure', 'ATetra', 'Tetrasat'/'Tetra', 'GenoDive', 'SPAGeDi', 'POPDIST', 'STRand', and binary presence/absence formats. It can calculate pairwise distances between individuals using a stepwise mutation model or infinite alleles model, with or without taking ploidies and allele frequencies into account. These distances can be used for the calculation of clonal diversity statistics or for further analysis in R. Allelic diversity statistics and Polymorphic Information Content are also available. polysat can assist the user in estimating the ploidy of samples, and it can estimate allele frequencies in populations, calculate pairwise or global differentiation statistics based on those frequencies, and export allele frequencies to 'SPAGeDi' and 'adegenet'. Functions are also included for assigning alleles to isoloci in cases where one pair of microsatellite primers amplifies alleles from two or more independently segregating isoloci. polysat is described by Clark and Jasieniuk (2011) <doi:10.1111/j.1755-0998.2011.02985.x> and Clark and Schreier (2017) <doi:10.1111/1755-0998.12639>.
This package provides tools to help download, process and analyse the UK road collision data collected using the STATS19 form. The datasets are provided as CSV files with detailed road safety information about the circumstances of car crashes and other incidents on the roads resulting in casualties in Great Britain from 1979 to present. Tables are available on collisions with the circumstances (e.g. speed limit of road), information about vehicles involved (e.g. type of vehicle), and casualties (e.g. age). The statistics relate only to events on public roads that were reported to the police, and subsequently recorded, using the STATS19 collision reporting form. See the Department for Transport website <https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-accidents-safety-data> for more information on these datasets. The package is described in a paper in the Journal of Open Source Software (Lovelace et al. 2019) <doi:10.21105/joss.01181>. See Gilardi et al. (2022) <doi:10.1111/rssa.12823>, Vidal-Tortosa et al. (2021) <doi:10.1016/j.jth.2021.101291>, and Tait et al. (2023) <doi:10.1016/j.aap.2022.106895> for examples of how the data can be used for methodological and empirical road safety research.
Several functions are provided for dose-response (or concentration-response) characterization from omics data. DRomics is especially dedicated to omics data obtained using a typical dose-response design, favoring a great number of tested doses (or concentrations) rather than a great number of replicates (no need for replicates). DRomics provides functions 1) to check, normalize and/or transform data, 2) to select monotonic or biphasic significantly responding items (e.g. probes, metabolites), 3) to choose the best-fit model among a predefined family of monotonic and biphasic models to describe each selected item, and 4) to derive a benchmark dose or concentration and a typology of response from each fitted curve. In the available version, data are supposed to be single-channel microarray data in log2, RNAseq data in raw counts, or already pretreated continuous omics data (such as metabolomic data) in log scale. In order to link responses across biological levels based on a common method, DRomics also handles apical data as long as they are continuous and follow a normal distribution for each dose or concentration, with a common standard error. For further details see Delignette-Muller et al (2023) <DOI:10.24072/pcjournal.325> and Larras et al (2018) <DOI:10.1021/acs.est.8b04752>.
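A hedged sketch of the four-step workflow on RNAseq counts; the function names used below (RNAseqdata(), itemselect(), drcfit(), bdcalc()) are assumptions about the package's API, since the description names only the steps:

    library(DRomics)
    o <- RNAseqdata("counts.txt")                    # 1) import, check, normalize
    s <- itemselect(o, select.method = "quadratic")  # 2) select responding items
    f <- drcfit(s)                                   # 3) best-fit model per item
    r <- bdcalc(f)                                   # 4) benchmark doses/concentrations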
It is used to construct run sequences with minimum changes for half replicates of two-level factorial run orders. Experimenters can save time and resources by minimizing the number of changes in the levels of individual factors and therefore the total number of changes. It consists of the function minimal_hrtlf(). This technique can be employed for any half replicate of a two-level factorial run order where the number of factors is greater than two. In Design of Experiments (DOE) theory, the two levels of a factor can be represented as integers, e.g. -1 for low and 1 for high. The user is expected to enter the total number of factors to be considered in the experiment. minimal_hrtlf() provides the required run sequences for the input number of factors. The output also gives the number of changes of each factor along with the total number of changes in the run sequence. Due to restricted randomization, the minimally changed run sequences of half replicates of two-level factorial run orders will be affected by trend effects. The output also provides the trend factor value of the run order. The trend factor value lies between 0 and 1; the higher the value, the lesser the influence of trend effects on the run order.
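A short sketch; the description names only minimal_hrtlf(), and treating the number of factors as its single argument is an assumption (the library() call is omitted because the package name is not stated above):

    # Run sequence for a half replicate of a 2^5 factorial (5 factors)
    res <- minimal_hrtlf(5)
    res   # run order, per-factor and total numbers of changes, trend factor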
This package provides implementations of various methods of Functional Data Analysis (FDA) and Empirical Dynamics. The core of this package is Functional Principal Component Analysis (FPCA), a key technique for functional data analysis, for sparsely or densely sampled random trajectories and time courses, via the Principal Analysis by Conditional Estimation (PACE) algorithm. This core algorithm yields covariance and mean functions, eigenfunctions and principal component scores, for both functional data and derivatives, and for both dense (functional) and sparse (longitudinal) sampling designs. For sparse designs, it provides fitted continuous trajectories with confidence bands, even for subjects with very few longitudinal observations. PACE is a viable and flexible alternative to random effects modeling of longitudinal data. There is also a Matlab version (PACE) that contains some methods not available in fdapace and vice versa. Updates to fdapace were supported by grants from NIH ECHO and NSF DMS-1712864 and DMS-2014626. Please cite our package if you use it (you may run the command citation("fdapace") to get the citation format and bibtex entry). References: Wang, J.L., Chiou, J., Müller, H.G. (2016) <doi:10.1146/annurev-statistics-041715-033624>; Chen, K., Zhang, X., Petersen, A., Müller, H.G. (2017) <doi:10.1007/s12561-015-9137-5>.
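A brief sketch of running FPCA on sparse longitudinal data; FPCA() taking per-subject lists of observations (Ly) and time points (Lt) reflects the package's core routine, though the simulation around it is purely illustrative:

    library(fdapace)
    set.seed(1)
    Ly <- list(); Lt <- list()
    for (i in 1:50) {                       # 50 subjects, 2-6 observations each
      ti <- sort(runif(sample(2:6, 1)))
      Lt[[i]] <- ti
      Ly[[i]] <- sin(2 * pi * ti) + rnorm(length(ti), sd = 0.1)
    }
    fit <- FPCA(Ly, Lt)                     # PACE: mean, covariance, scores
    # plot(fit)                             # diagnostic plots of the fit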
Download geospatial data available from several federated data sources (mainly sources maintained by the US Federal government). Currently, the package enables extraction from nine datasets: The National Elevation Dataset digital elevation models (<https://www.usgs.gov/3d-elevation-program> 1 and 1/3 arc-second; USGS); The National Hydrography Dataset (<https://www.usgs.gov/national-hydrography/national-hydrography-dataset>; USGS); The Soil Survey Geographic (SSURGO) database from the National Cooperative Soil Survey (<https://websoilsurvey.sc.egov.usda.gov/>; NCSS), which is led by the Natural Resources Conservation Service (NRCS) under the USDA; the Global Historical Climatology Network (<https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily>; GHCN), coordinated by National Climatic Data Center at NOAA; the Daymet gridded estimates of daily weather parameters for North America, version 4, available from the Oak Ridge National Laboratory's Distributed Active Archive Center (<https://daymet.ornl.gov/>; DAAC); the International Tree Ring Data Bank; the National Land Cover Database (<https://www.mrlc.gov/>; NLCD); the Cropland Data Layer from the National Agricultural Statistics Service (<https://www.nass.usda.gov/Research_and_Science/Cropland/SARS1a.php>; NASS); and the PAD-US dataset of protected area boundaries (<https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-data-overview>; USGS).
According to a phenomenon known as "the wisdom of the crowds," combining point estimates from multiple judges often provides a more accurate aggregate estimate than using a point estimate from a single judge. However, if the judges use shared information in their estimates, the simple average will over-emphasize this common component at the expense of the judges' private information. Asa Palley & Ville Satopää (2021) "Boosting the Wisdom of Crowds Within a Single Judgment Problem: Selective Averaging Based on Peer Predictions" <https://papers.ssrn.com/sol3/Papers.cfm?abstract_id=3504286> propose a procedure for calculating a weighted average of the judges' individual estimates such that the resulting aggregate estimate appropriately combines the judges' collective information within a single estimation problem. The authors use both simulation and data from six experimental studies to illustrate that the weighting procedure outperforms existing averaging-like methods, such as the equally weighted average, trimmed average, and median. This aggregate estimate -- known as "the knowledge-weighted estimate" -- inputs a) judges' estimates of a continuous outcome (E) and b) predictions of others' average estimate of this outcome (P). In this R package, the function knowledge_weighted_estimate(E,P) implements the knowledge-weighted estimate. Its use is illustrated with a simple stylized example and on real-world experimental data.
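A stylized example along the lines the description mentions; the signature knowledge_weighted_estimate(E, P) comes from the text above, while the numbers are made up (the library() call is omitted because the package name is not stated):

    # One entry per judge: own estimate (E) and prediction of the
    # others' average estimate (P)
    E <- c(50, 134, 206, 290, 326, 374)
    P <- c(26, 92, 116, 218, 218, 206)
    knowledge_weighted_estimate(E, P)  # knowledge-weighted aggregate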
Phylogenetic comparative methods represent models of continuous trait data associated with the tips of a phylogenetic tree. Examples of such models are Gaussian continuous time branching stochastic processes such as Brownian motion (BM) and Ornstein-Uhlenbeck (OU) processes, which regard the data at the tips of the tree as an observed (final) state of a Markov process starting from an initial state at the root and evolving along the branches of the tree. The PCMBase R package provides a general framework for manipulating such models. This framework consists of an application programming interface for specifying data and model parameters, and efficient algorithms for simulating trait evolution under a model and calculating the likelihood of model parameters for an assumed model and trait data. The package implements a growing collection of models, which currently includes BM, OU, BM/OU with jumps, two-speed OU, as well as mixed Gaussian models, in which different types of the above models can be associated with different branches of the tree. The PCMBase package is limited to trait simulation and likelihood calculation of (mixed) Gaussian phylogenetic models; the PCMFit package provides functionality for fitting these models to tree and trait data. The package web-site <https://venelin.github.io/PCMBase/> provides access to the documentation and other resources.
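A hedged sketch of the specify-simulate-likelihood cycle described above; the functions PCM(), PCMSim() and PCMLik() are assumptions about the API consistent with this framework, not verified signatures:

    library(ape)
    library(PCMBase)
    tree  <- ape::rtree(20)                   # random 20-tip phylogeny
    model <- PCM("BM", k = 2)                 # bivariate Brownian motion
    X <- PCMSim(tree, model, X0 = c(0, 0))    # traits at all tree nodes
    # Likelihood of the model given the trait values at the tips
    PCMLik(X[, seq_along(tree$tip.label)], tree, model)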