This package is mainly instrumental: it allows other packages whose core is written in C++ to read, write and manipulate matrices in a binary format, so that the memory used for them is no more than strictly needed. Its functionality is already included in parallelpam and scellpam, so if you have installed either of these you do not need to install jmatrix. Using only the needed memory is not guaranteed with R matrices or vectors, since by default they are of double type. Attempts such as the float package exist, but to use them you have to coerce a matrix already loaded into R memory to a float matrix and then delete the original. The problem arises when your computer does not have enough memory to hold the matrix in the first place, so you are forced to load it in chunks. This is the problem this package tries to address (with partial success, since R is not a strictly typed language, and strict typing is in any case hard to achieve in an interpreted language). The package allows the creation and manipulation of full, sparse and symmetric matrices of any standard data type.
The restricted optimal design method is implemented to optimally allocate a set of items that require calibration to a group of examinees. The optimization process is based on the method described in detail by Ul Hassan and Miller in their works published in (2019) <doi:10.1177/0146621618824854> and (2021) <doi:10.1016/j.csda.2021.107177>. To use the method, preliminary item characteristics must be provided as input. These characteristics can either be expert guesses or based on previous calibration with a small number of examinees. The item characteristics should be described in the form of parameters for an Item Response Theory (IRT) model. These models can include the Rasch model, the 2-parameter logistic model, the 3-parameter logistic model, or a mixture of these models. The output consists of a set of rules for each item that determine which examinees should be assigned to each item. The efficiency or gain achieved through the optimal design is quantified by comparing it to a random allocation. This comparison allows for an assessment of how much improvement or advantage is gained by using the optimal design approach. This work was supported by the Swedish Research Council (Vetenskapsrådet) Grant 2019-02706.
This package provides methods to unify the different ways of creating predictive models and their different predictive formats for classification and regression. It includes methods such as K-Nearest Neighbors Schliep, K. P. (2004) <doi:10.5282/ubm/epub.1769>, Decision Trees Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone (2017) <doi:10.1201/9781315139470>, ADA Boosting Esteban Alfaro, Matias Gamez, Noelia García (2013) <doi:10.18637/jss.v054.i02>, Extreme Gradient Boosting Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>, Random Forest Breiman (2001) <doi:10.1023/A:1010933404324>, Neural Networks Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Support Vector Machines Bennett, K. P. & Campbell, C. (2000) <doi:10.1145/380995.380999>, Bayesian Methods Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (1995) <doi:10.1201/9780429258411>, Linear Discriminant Analysis Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Quadratic Discriminant Analysis Venables, W. N., & Ripley, B. D. (2002) <ISBN:0-387-95457-0>, Logistic Regression Dobson, A. J., & Barnett, A. G. (2018) <doi:10.1201/9781315182780> and Penalized Logistic Regression Friedman, J. H., Hastie, T., & Tibshirani, R. (2010) <doi:10.18637/jss.v033.i01>.
Adaptive Sparse Multi-block Partial Least Square, a supervised algorithm, is an extension of the Sparse Multi-block Partial Least Square, which allows different quantiles to be used in different blocks of different partial least square components to decide the proportion of features to be retained. The best combinations of quantiles can be chosen from a set of user-defined quantile combinations by cross-validation. This enables feature selection for different blocks, and the selected features can then be used to predict the outcome. For example, in biomedical applications, clinical covariates plus different types of omics data such as microbiome, metabolome, mRNA data, methylation data and copy number variation data might be predictive of patient outcomes such as survival time or response to therapy. Different types of data can be placed in different blocks and, along with survival time, used to fit the model. The fitted model can then be used to predict survival for new samples with the corresponding clinical covariates and omics data. In addition, Adaptive Sparse Multi-block Partial Least Square Discriminant Analysis is included, which extends Adaptive Sparse Multi-block Partial Least Square to classify a categorical outcome.
This package provides a collection of tools to handle microsatellite data of any ploidy (and samples of mixed ploidy) where allele copy number is not known in partially heterozygous genotypes. It can import and export data in ABI GeneMapper, Structure, ATetra, Tetrasat/Tetra, GenoDive, SPAGeDi, POPDIST, STRand, and binary presence/absence formats. It can calculate pairwise distances between individuals using a stepwise mutation model or infinite alleles model, with or without taking ploidies and allele frequencies into account. These distances can be used for the calculation of clonal diversity statistics or used for further analysis in R. Allelic diversity statistics and Polymorphic Information Content are also available. polysat can assist the user in estimating the ploidy of samples, and it can estimate allele frequencies in populations, calculate pairwise or global differentiation statistics based on those frequencies, and export allele frequencies to SPAGeDi and adegenet. Functions are also included for assigning alleles to isoloci in cases where one pair of microsatellite primers amplifies alleles from two or more independently segregating isoloci. polysat is described by Clark and Jasieniuk (2011) <doi:10.1111/j.1755-0998.2011.02985.x> and Clark and Schreier (2017) <doi:10.1111/1755-0998.12639>.
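A minimal usage sketch, not taken from the package manual: the file name and metadata values below are placeholders, and argument defaults may differ in the installed version.
    library(polysat)
    # Import ABI GeneMapper genotype output and declare basic metadata.
    gendata <- read.GeneMapper("GeneMapperExample.txt")
    Usatnts(gendata) <- c(2, 3)   # repeat lengths, one per locus (placeholder values)
    Ploidies(gendata) <- 4        # assume tetraploid samples for illustration
    # Pairwise distances between individuals under a stepwise mutation model.
    dists <- meandistance.matrix(gendata, distmetric = Bruvo.distance)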
This package provides tools to help download, process and analyse the UK road collision data collected using the STATS19 form. The datasets are provided as CSV files with detailed road safety information about the circumstances of car crashes and other incidents on the roads resulting in casualties in Great Britain from 1979 to present. Tables are available on collisions, with the circumstances (e.g. speed limit of road), information about vehicles involved (e.g. type of vehicle), and casualties (e.g. age). The statistics relate only to events on public roads that were reported to the police, and subsequently recorded, using the STATS19 collision reporting form. See the Department for Transport website <https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-accidents-safety-data> for more information on these datasets. The package is described in a paper in the Journal of Open Source Software (Lovelace et al. 2019) <doi:10.21105/joss.01181>. See Gilardi et al. (2022) <doi:10.1111/rssa.12823>, Vidal-Tortosa et al. (2021) <doi:10.1016/j.jth.2021.101291>, and Tait et al. (2023) <doi:10.1016/j.aap.2022.106895> for examples of how the data can be used for methodological and empirical road safety research.
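A brief, hedged sketch of a typical download-and-read call; the function and argument names reflect the package's documented interface as we understand it, and the accepted "type" values may vary across versions.
    library(stats19)
    # Download, read and format one year of collision records.
    crashes_2019 <- get_stats19(year = 2019, type = "collision")
    nrow(crashes_2019)   # number of recorded collisions in that year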
Several functions are provided for dose-response (or concentration-response) characterization from omics data. DRomics is especially dedicated to omics data obtained using a typical dose-response design, favoring a great number of tested doses (or concentrations) rather than a great number of replicates (replicates are not needed). DRomics provides functions 1) to check, normalize and/or transform data, 2) to select monotonic or biphasic significantly responding items (e.g. probes, metabolites), 3) to choose the best-fit model among a predefined family of monotonic and biphasic models to describe each selected item, and 4) to derive a benchmark dose or concentration and a typology of response from each fitted curve. In the available version, data are expected to be single-channel microarray data in log2, RNAseq data in raw counts, or already pretreated continuous omics data (such as metabolomic data) in log scale. In order to link responses across biological levels based on a common method, DRomics also handles apical data as long as they are continuous and follow a normal distribution for each dose or concentration, with a common standard error. For further details see Delignette-Muller et al (2023) <DOI:10.24072/pcjournal.325> and Larras et al (2018) <DOI:10.1021/acs.est.8b04752>.
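As an illustration, a typical RNAseq workflow might chain the four steps as below; this is a hedged sketch in which the file name and option values are placeholders, so consult the package vignette for the authoritative interface.
    library(DRomics)
    o <- RNAseqdata("RNAseq_counts.txt", transfo.method = "vst")  # 1) import and transform raw counts
    s <- itemselect(o, select.method = "quadratic", FDR = 0.05)   # 2) select significantly responding items
    f <- drcfit(s)                                                # 3) best-fit model for each selected item
    r <- bmdcalc(f, z = 1)                                        # 4) benchmark dose/concentration per curve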
It is used to construct run sequences with minimum changes for a half replicate of a two-level factorial run order. The experimenter can save time and resources by minimizing the number of changes in the levels of each individual factor and therefore the total number of changes. It consists of the function minimal_hrtlf(). The technique can be applied to any half replicate of a two-level factorial run order where the number of factors is greater than two. In Design of Experiments (DOE) theory, the two levels of a factor can be represented as integers, e.g. -1 for low and 1 for high. The user is expected to enter the total number of factors to be considered in the experiment; minimal_hrtlf() then provides the required run sequences for that number of factors. The output also gives the number of changes of each factor along with the total number of changes in the run sequence. Due to restricted randomization, the minimally changed run sequences of a half replicate of a two-level factorial run order will be affected by trend effects, so the output also provides the trend factor value of the run order. The trend factor value lies between 0 and 1; the higher the value, the smaller the influence of trend effects on the run order.
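For example, assuming the function takes the number of factors as its single argument (an assumption based on the description above), a call for a five-factor experiment might look like:
    # Minimally changed run sequence for a half replicate with 5 factors;
    # the single-argument call is assumed, see the function's help page.
    minimal_hrtlf(5)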
Exploits dynamical seasonal forecasts in order to provide information relevant to stakeholders at the seasonal timescale. The package contains process-based methods for forecast calibration, bias correction, statistical and stochastic downscaling, optimal forecast combination and multivariate verification, as well as basic and advanced tools to obtain tailored products. This package was developed in the context of the ERA4CS project MEDSCOPE and the H2020 S2S4E project, and includes contributions from the ArticXchange project funded by EU-PolarNet 2. Pérez-Zanón et al. (2022) <doi:10.5194/gmd-15-6115-2022>. Doblas-Reyes et al. (2005) <doi:10.1111/j.1600-0870.2005.00104.x>. Mishra et al. (2018) <doi:10.1007/s00382-018-4404-z>. Sanchez-Garcia et al. (2019) <doi:10.5194/asr-16-165-2019>. Straus et al. (2007) <doi:10.1175/JCLI4070.1>. Terzago et al. (2018) <doi:10.5194/nhess-18-2825-2018>. Torralba et al. (2017) <doi:10.1175/JAMC-D-16-0204.1>. D'Onofrio et al. (2014) <doi:10.1175/JHM-D-13-096.1>. Verfaillie et al. (2017) <doi:10.5194/gmd-10-4257-2017>. Van Schaeybroeck et al. (2019) <doi:10.1016/B978-0-12-812372-0.00010-8>. Yiou et al. (2013) <doi:10.1007/s00382-012-1626-3>.
This package provides a versatile package that provides implementation of various methods of Functional Data Analysis (FDA) and Empirical Dynamics. The core of this package is Functional Principal Component Analysis (FPCA), a key technique for functional data analysis, for sparsely or densely sampled random trajectories and time courses, via the Principal Analysis by Conditional Estimation (PACE) algorithm. This core algorithm yields covariance and mean functions, eigenfunctions and principal component scores, for both functional data and derivatives, and for both dense (functional) and sparse (longitudinal) sampling designs. For sparse designs, it provides fitted continuous trajectories with confidence bands, even for subjects with very few longitudinal observations. PACE is a viable and flexible alternative to random effects modeling of longitudinal data. There is also a Matlab version (PACE) that contains some methods not available in fdapace and vice versa. Updates to fdapace were supported by grants from NIH Echo and NSF DMS-1712864 and DMS-2014626. Please cite our package if you use it (you may run the command citation("fdapace") to get the citation format and bibtex entry). References: Wang, J.L., Chiou, J., Müller, H.G. (2016) <doi:10.1146/annurev-statistics-041715-033624>; Chen, K., Zhang, X., Petersen, A., Müller, H.G. (2017) <doi:10.1007/s12561-015-9137-5>.
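A sketch of sparse-design FPCA on simulated longitudinal data; the FPCA(Ly, Lt, optns) interface is the package's documented entry point, while the simulation and the option values shown are assumptions for illustration.
    library(fdapace)
    set.seed(1)
    n  <- 50
    Lt <- lapply(seq_len(n), function(i) sort(runif(sample(2:6, 1))))          # irregular, sparse time points
    Ly <- lapply(Lt, function(t) sin(2 * pi * t) + rnorm(length(t), sd = 0.2)) # noisy observations
    fit <- FPCA(Ly, Lt, optns = list(dataType = "Sparse"))
    plot(fit)              # mean function, eigenfunctions, fitted trajectories
    citation("fdapace")    # citation format and BibTeX entry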
Download geospatial data available from several federated data sources (mainly sources maintained by the US Federal government). Currently, the package enables extraction from nine datasets: The National Elevation Dataset digital elevation models (<https://www.usgs.gov/3d-elevation-program> 1 and 1/3 arc-second; USGS); The National Hydrography Dataset (<https://www.usgs.gov/national-hydrography/national-hydrography-dataset>; USGS); The Soil Survey Geographic (SSURGO) database from the National Cooperative Soil Survey (<https://websoilsurvey.sc.egov.usda.gov/>; NCSS), which is led by the Natural Resources Conservation Service (NRCS) under the USDA; the Global Historical Climatology Network (<https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-daily>; GHCN), coordinated by National Climatic Data Center at NOAA; the Daymet gridded estimates of daily weather parameters for North America, version 4, available from the Oak Ridge National Laboratory's Distributed Active Archive Center (<https://daymet.ornl.gov/>; DAAC); the International Tree Ring Data Bank; the National Land Cover Database (<https://www.mrlc.gov/>; NLCD); the Cropland Data Layer from the National Agricultural Statistics Service (<https://www.nass.usda.gov/Research_and_Science/Cropland/SARS1a.php>; NASS); and the PAD-US dataset of protected area boundaries (<https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-data-overview>; USGS).
According to a phenomenon known as "the wisdom of the crowds," combining point estimates from multiple judges often provides a more accurate aggregate estimate than using a point estimate from a single judge. However, if the judges use shared information in their estimates, the simple average will over-emphasize this common component at the expense of the judges' private information. Asa Palley & Ville Satopää (2021) "Boosting the Wisdom of Crowds Within a Single Judgment Problem: Selective Averaging Based on Peer Predictions" <https://papers.ssrn.com/sol3/Papers.cfm?abstract_id=3504286> propose a procedure for calculating a weighted average of the judges' individual estimates such that the resulting aggregate estimate appropriately combines the judges' collective information within a single estimation problem. The authors use both simulation and data from six experimental studies to illustrate that the weighting procedure outperforms existing averaging-like methods, such as the equally weighted average, trimmed average, and median. This aggregate estimate -- known as "the knowledge-weighted estimate" -- inputs a) judges' estimates of a continuous outcome (E) and b) predictions of others' average estimate of this outcome (P). In this R package, the function knowledge_weighted_estimate(E,P) implements the knowledge-weighted estimate. Its use is illustrated with a simple stylized example and on real-world experimental data.
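A stylized example of the documented interface; the numbers below are invented purely for illustration.
    # Six judges estimate an unknown quantity and predict the others' average estimate.
    E <- c(50, 134, 206, 290, 326, 233)   # judges' own estimates (illustrative values)
    P <- c(26,  92, 116, 218, 218, 224)   # predictions of the others' average estimate
    knowledge_weighted_estimate(E, P)     # knowledge-weighted aggregate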
Phylogenetic comparative methods represent models of continuous trait data associated with the tips of a phylogenetic tree. Examples of such models are Gaussian continuous time branching stochastic processes such as Brownian motion (BM) and Ornstein-Uhlenbeck (OU) processes, which regard the data at the tips of the tree as an observed (final) state of a Markov process starting from an initial state at the root and evolving along the branches of the tree. The PCMBase R package provides a general framework for manipulating such models. This framework consists of an application programming interface for specifying data and model parameters, and efficient algorithms for simulating trait evolution under a model and calculating the likelihood of model parameters for an assumed model and trait data. The package implements a growing collection of models, which currently includes BM, OU, BM/OU with jumps, two-speed OU as well as mixed Gaussian models, in which different types of the above models can be associated with different branches of the tree. The PCMBase package is limited to trait simulation and likelihood calculation of (mixed) Gaussian phylogenetic models. The PCMFit package provides functionality for fitting these models to tree and trait data. The package web-site <https://venelin.github.io/PCMBase/> provides access to the documentation and other resources.
Generalized meta-analysis is a technique for estimating parameters associated with a multiple regression model through meta-analysis of studies which may have information only on partial sets of the regressors. It estimates the effects of each variable while fully adjusting for all other variables that are measured in at least one of the studies. Using algebraic relationships between regression parameters in different dimensions, a set of moment equations is specified for estimating the parameters of a maximal model through information available on sets of parameter estimates from a series of reduced models available from the different studies. The specification of the equations requires a reference dataset to estimate the joint distribution of the covariates. These equations are solved using the generalized method of moments approach, with the optimal weighting of the equations taking into account uncertainty associated with estimates of the parameters of the reduced models. The proposed framework is implemented using an iteratively reweighted least squares algorithm for fitting generalized linear regression models. For more details about the method, please see the pre-print version of the manuscript on generalized meta-analysis by Prosenjit Kundu, Runlong Tang and Nilanjan Chatterjee (2018) <doi:10.1093/biomet/asz030>. The current version (0.2.0) is updated to address some of the stability issues in the previous version (0.1).
Estimation of the leftmost informative set of gross returns (i.e., the informative set). The procedure to compute the informative set adjusts the method proposed by Mariani et al. (2022a) <doi:10.1007/s11205-020-02440-6> and Mariani et al. (2022b) <doi:10.1007/s10287-022-00422-2> to gross returns of financial assets. This is accomplished through an adaptive algorithm that identifies sub-groups of gross returns in each iteration by approximating their distribution with a sequence of two-component log-normal mixtures. These sub-groups emerge when a significant change in the distribution occurs below the median of the financial returns, with their boundary termed the "change point" of the mixture. The process concludes when no further change points are detected. The outcome encompasses the parameters of the leftmost mixture distributions and the change points of the analyzed financial time series. The functionalities of the INFOSET package include: (i) modelling the asset distribution and detecting the parameters that describe left-tail behaviour (infoset function), (ii) clustering, (iii) labelling of the financial series for predictive and classification purposes through a Left Risk measure based on the first change point (LR_cp function), and (iv) portfolio construction (ptf_construction function). The package also provides a specific function to construct rolling windows of different lengths and degrees of overlap.
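A rough sketch of how the named functions might be chained; only the function names come from the description above, and the argument shown is hypothetical.
    # 'returns' is assumed to be a numeric vector or matrix of gross returns.
    fit  <- infoset(returns)            # leftmost mixture parameters and change points
    risk <- LR_cp(returns)              # Left Risk measure based on the first change point
    ptf  <- ptf_construction(returns)   # portfolio construction step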
Extracts features from biological sequences. It contains most features presented in related work and also includes features that have not been introduced before. It extracts numerous features from nucleotide and peptide sequences. Each feature converts the input sequences to discrete numbers in order to use them as predictors in machine learning models. There are many features and much information hidden inside a sequence. Using the package, users can convert biological sequences to discrete models based on chosen properties. References: iLearn: Z. Chen et al. (2019) <DOI:10.1093/bib/bbz041>. iFeature: Z. Chen et al. (2018) <DOI:10.1093/bioinformatics/bty140>. rDNAse: <https://CRAN.R-project.org/package=rDNAse>. PseKRAAC: Y. Zuo et al., "PseKRAAC: a flexible web server for generating pseudo K-tuple reduced amino acids composition" (2017) <DOI:10.1093/bioinformatics/btw564>. iDNA6mA-PseKNC: P. Feng et al., "iDNA6mA-PseKNC: Identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC" (2019) <DOI:10.1016/j.ygeno.2018.01.005>. I. Dubchak et al., "Prediction of protein folding class using global description of amino acid sequence" (1995) <DOI:10.1073/pnas.92.19.8700>. W. Chen et al., "Identification and analysis of the N6-methyladenosine in the Saccharomyces cerevisiae transcriptome" (2015) <DOI:10.1038/srep13859>.
This package provides functions to access and download data from the Open Case Studies <https://www.opencasestudies.org/> repositories on GitHub <https://github.com/opencasestudies>. Different functions enable users to grab the data they need at different sections in the case study, as well as download the whole case study repository. All the user needs to do is input the name of the case study being worked on. The package relies on the httr::GET() function to access files through the GitHub API. The functions usethis::use_zip() and usethis::create_from_github() are used to clone and/or download the case study repositories. To cite an individual case study, please see the respective README file at <https://github.com/opencasestudies/>. <https://github.com/opencasestudies/ocs-bp-rural-and-urban-obesity> <https://github.com/opencasestudies/ocs-bp-air-pollution> <https://github.com/opencasestudies/ocs-bp-vaping-case-study> <https://github.com/opencasestudies/ocs-bp-opioid-rural-urban> <https://github.com/opencasestudies/ocs-bp-RTC-wrangling> <https://github.com/opencasestudies/ocs-bp-RTC-analysis> <https://github.com/opencasestudies/ocs-bp-youth-disconnection> <https://github.com/opencasestudies/ocs-bp-youth-mental-health> <https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard> <https://github.com/opencasestudies/ocs-bp-co2-emissions> <https://github.com/opencasestudies/ocs-bp-diet>.
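For instance, a whole case-study repository can be cloned directly with usethis; the repository below is one of those listed above, and the destination directory is a placeholder.
    # Clone the CO2 emissions case study into a local projects directory.
    usethis::create_from_github("opencasestudies/ocs-bp-co2-emissions", destdir = "~/projects")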
Identification, model fitting and estimation for time series with periodic structure. Additionally, procedures for simulation of periodic processes and real data sets are included. Hurd, H. L., Miamee, A. G. (2007) <doi:10.1002/9780470182833> Box, G. E. P., Jenkins, G. M., Reinsel, G. (1994) <doi:10.1111/jtsa.12194> Brockwell, P. J., Davis, R. A. (1991, ISBN:978-1-4419-0319-8) Bretz, F., Hothorn, T., Westfall, P. (2010, ISBN: 9780429139543) Westfall, P. H., Young, S. S. (1993, ISBN:978-0-471-55761-6) Bloomfield, P., Hurd, H. L.,Lund, R. (1994) <doi:10.1111/j.1467-9892.1994.tb00181.x> Dehay, D., Hurd, H. L. (1994, ISBN:0-7803-1023-3) Vecchia, A. (1985) <doi:10.1080/00401706.1985.10488076> Vecchia, A. (1985) <doi:10.1111/j.1752-1688.1985.tb00167.x> Jones, R., Brelsford, W. (1967) <doi:10.1093/biomet/54.3-4.403> Makagon, A. (1999) <https://www.math.uni.wroc.pl/~pms/files/19.2/Article/19.2.5.pdf> Sakai, H. (1989) <doi:10.1111/j.1467-9892.1991.tb00069.x> Gladyshev, E. G. (1961) <https://www.mathnet.ru/php/archive.phtml?wshow=paper&jrnid=dan&paperid=24851> Ansley (1979) <doi:10.1093/biomet/66.1.59> Hurd, H. L., Gerr, N. L. (1991) <doi:10.1111/j.1467-9892.1991.tb00088.x>.
This package provides a set of functions to implement decision-making systems based on the W.A.S.P.A.S. method (Weighted Aggregated Sum Product Assessment), Chakraborty and Zavadskas (2012) <doi:10.5755/j01.eee.122.6.1810>. The package offers functions that analyze and validate the raw data, which must be entered in a determined format; extract specific vectors and matrices from this raw database; normalize the input data; calculate rankings by the intermediate methods; apply the lambda parameter for the main method; and a function that does everything at once. The package includes an example database called choppers, which shows the user how the input data should be organized so that everything works as recommended by the multiple-criteria decision methods this package implements. The data consist of a set of alternatives, which will be ranked; a set of choice criteria; a matrix of values for each alternative-criterion pair; a vector of weights associated with the criteria, since certain criteria are considered more important than others; and a vector that defines each criterion as cost or benefit. This last vector determines the calculation formula, as there are criteria for which we want the highest possible value (e.g. durability) and others for which we want the lowest possible value (e.g. price).
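As a purely hypothetical illustration (the package's actual function names are not given above, so waspas_full() and its lambda argument are placeholder names), the all-in-one call might look like:
    # 'choppers' is the example database mentioned above; function and argument names are placeholders.
    data(choppers)
    result <- waspas_full(choppers, lambda = 0.5)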
Analysis of Q methodology, used to identify distinct perspectives existing within a group. This methodology is used across social, health and environmental sciences to understand diversity of attitudes, discourses, or decision-making styles (for more information, see <https://qmethod.org/>). A single function runs the full analysis. Each step can also be run separately using the corresponding functions: for automatic flagging of Q-sorts (manual flagging is optional), for statement scores, for distinguishing and consensus statements, and for general characteristics of the factors. The package allows the user to choose either principal components or centroid factor extraction, manual or automatic flagging, a number of mathematical methods for rotation (or none), and a number of correlation coefficients for the initial correlation matrix, among many other options. Additional functions are available to import and export data (from raw *.CSV, HTMLQ and FlashQ *.CSV, PQMethod *.DAT and easy-htmlq *.JSON files), to print and plot, to import raw data from individual *.CSV files, and to make printable cards. The package also offers functions to print Q cards and to generate Q distributions for study administration. See further details in the package documentation, and in the web pages below, which include a cookbook, guidelines for more advanced analysis (how to perform manual flagging or change the sign of factors), data management, and a graphical user interface (GUI) for online and offline use.
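A minimal sketch of the single-function analysis; the qmethod() call and its main arguments follow the package as we understand it, and the dataset is simulated here purely for illustration.
    library(qmethod)
    # Simulated forced-distribution Q-sort data: 20 statements sorted by 12 participants.
    set.seed(2)
    qdata <- as.data.frame(replicate(12, sample(rep(-3:3, c(1, 2, 4, 6, 4, 2, 1)))))
    results <- qmethod(qdata, nfactors = 3, rotation = "varimax")
    summary(results)   # factor loadings, statement scores, distinguishing statements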
For measurement instruments consisting of polytomously scored items, the psychological distances between categories are usually assumed to be equal. According to Muraki, this assumption must be tested; in the process of examining it, fit indices are obtained and evaluated. This package removes the need for that assumption. With this package, the converted scale values of all items in a measurement instrument can be calculated by estimating a category parameter set for each item, so the calculations can be made without relying on a common category parameter set. Through this package, the psychological distances of the items are scaled. Scaling a separate category parameter set for each item differentiates the scores obtained from the item categories, and the total score of an individual on the measurement instrument can be calculated according to this scaling of item score categories. In this way, the position of individuals on the construct measured by an instrument consisting of polytomously scored items can be revealed more accurately, the results obtained about individuals can be made more sensitive, the differences between individuals can be revealed more accurately, and more accurate evidence can be obtained regarding the psychometric properties of the measurement instrument.
Allows computing and visualising convective parameters commonly used in the operational prediction of severe convective storms. The core algorithm is based on highly optimized C++ code linked into R via Rcpp. The efficient engine allows deriving thermodynamic and kinematic parameters from large numerical datasets, such as reanalyses or operational Numerical Weather Prediction models, in a reasonable amount of time. The package has been developed since 2017 by research meteorologists specializing in severe thunderstorms. The most relevant methods used in the package are based on the following publications: Stipanuk (1973) <https://apps.dtic.mil/sti/pdfs/AD0769739.pdf>, McCann et al. (1994) <doi:10.1175/1520-0434(1994)009%3C0532:WNIFFM%3E2.0.CO;2>, Bunkers et al. (2000) <doi:10.1175/1520-0434(2000)015%3C0061:PSMUAN%3E2.0.CO;2>, Corfidi et al. (2003) <doi:10.1175/1520-0434(2003)018%3C0997:CPAMPF%3E2.0.CO;2>, Showalter (1953) <doi:10.1175/1520-0477-34.6.250>, Coffer et al. (2019) <doi:10.1175/WAF-D-19-0115.1>, Gropp and Davenport (2019) <doi:10.1175/WAF-D-17-0150.1>, Czernecki et al. (2019) <doi:10.1016/j.atmosres.2019.05.010>, Taszarek et al. (2020) <doi:10.1175/JCLI-D-20-0346.1>, Sherburn and Parker (2014) <doi:10.1175/WAF-D-13-00041.1>, Romanic et al. (2022) <doi:10.1016/j.wace.2022.100474>.
This package provides a user-friendly tool to fit Bayesian regression models. It can fit three types of Bayesian models using individual-level, summary-level, and individual plus pedigree-level (single-step) data for both genomic prediction/selection (GS) and genome-wide association studies (GWAS). It was designed to estimate joint effects and genetic parameters for a complex trait, including: (1) fixed effects and coefficients of covariates, (2) environmental random effects and their corresponding variances, (3) genetic variance, (4) residual variance, (5) heritability, (6) genomic estimated breeding values (GEBV) for both genotyped and non-genotyped individuals, (7) SNP effect sizes, (8) phenotypic/genetic variance explained (PVE) for single or multiple SNPs, (9) posterior probability of association of the genomic window (WPPA), and (10) posterior inclusion probability (PIP). The functionality is not limited to these features; we will keep enriching the package. References: Meuwissen et al. (2001) <doi:10.1093/genetics/157.4.1819>; Gustavo et al. (2013) <doi:10.1534/genetics.112.143313>; Habier et al. (2011) <doi:10.1186/1471-2105-12-186>; Yi et al. (2008) <doi:10.1534/genetics.107.085589>; Zhou et al. (2013) <doi:10.1371/journal.pgen.1003264>; Moser et al. (2015) <doi:10.1371/journal.pgen.1004969>; Lloyd-Jones et al. (2019) <doi:10.1038/s41467-019-12653-0>; Henderson (1976) <doi:10.2307/2529339>; Fernando et al. (2014) <doi:10.1186/1297-9686-46-50>.
Framework to facilitate patient subtyping with similarity network fusion and meta clustering. The similarity network fusion (SNF) algorithm was introduced by Wang et al. (2014) in <doi:10.1038/nmeth.2810>. SNF is a data integration approach that can transform high-dimensional and diverse data types into a single similarity network suitable for clustering with minimal loss of information from each initial data source. The meta clustering approach was introduced by Caruana et al. (2006) in <doi:10.1109/ICDM.2006.103>. Meta clustering involves generating a wide range of cluster solutions by adjusting clustering hyperparameters, then clustering the solutions themselves into a manageable number of qualitatively similar solutions, and finally characterizing representative solutions to find ones that are best for the user's specific context. This package provides a framework to easily transform multi-modal data into a wide range of similarity network fusion-derived cluster solutions as well as to visualize, characterize, and validate those solutions. Core package functionality includes easy customization of distance metrics, clustering algorithms, and SNF hyperparameters to generate diverse clustering solutions; calculation and plotting of associations between features, between patients, and between cluster solutions; and standard cluster validation approaches including resampled measures of cluster stability, standard metrics of cluster quality, and label propagation to evaluate generalizability in unseen data. Associated vignettes guide the user through using the package to identify patient subtypes while adhering to best practices for unsupervised learning.