This package provides a flexible statistical framework for network-valued data analysis. It leverages the complexity of the space of distributions on graphs by using the permutation framework for inference as implemented in the flipr package. Currently, only the two-sample testing problem is covered and generalization to k samples and regression will be added in the future as well. It is a 4-step procedure where the user chooses a suitable representation of the networks, a suitable metric to embed the representation into a metric space, one or more test statistics to target specific aspects of the distributions to be compared and a formula to compute the permutation p-value. Two types of inference are provided: a global test answering whether there is a difference between the distributions that generated the two samples and a local test for localizing differences on the network structure. The latter is assumed to be shared by all networks of both samples. References: Lovato, I., Pini, A., Stamm, A., Vantini, S. (2020) "Model-free two-sample test for network-valued data" <doi:10.1016/j.csda.2019.106896>; Lovato, I., Pini, A., Stamm, A., Taquet, M., Vantini, S. (2021) "Multiscale null hypothesis testing for network-valued data: Analysis of brain networks of patients with autism" <doi:10.1111/rssc.12463>.
This package provides a pipeline for identifying differentially methylated CpG sites using Hidden Markov Model in bisulfite sequencing data. DNA methylation studies have enabled researchers to understand methylation patterns and their regulatory roles in biological processes and disease. However, only a limited number of statistical approaches have been developed to provide formal quantitative analysis. Specifically, a few available methods do identify differentially methylated CpG (DMC) sites or regions (DMR), but they suffer from limitations that arise mostly due to challenges inherent in bisulfite sequencing data. These challenges include: (1) that read-depths vary considerably among genomic positions and are often low; (2) both methylation and autocorrelation patterns change as regions change; and (3) CpG sites are distributed unevenly. Furthermore, there are several methodological limitations: almost none of these tools is capable of comparing multiple groups and/or working with missing values, and only a few allow continuous or multiple covariates. The last of these is of great interest among researchers, as the goal is often to find which regions of the genome are associated with several exposures and traits. To tackle these issues, we have developed an efficient DMC identification method based on Hidden Markov Models (HMMs) called “DMCHMM” which is a three-step approach (model selection, prediction, testing) aiming to address the aforementioned drawbacks.
This package is an implementation the Quantitative Set Analysis for Gene Expression (QuSAGE) method described in (Yaari G. et al, Nucl Acids Res, 2013). This is a novel Gene Set Enrichment-type test, which is designed to provide a faster, more accurate, and easier to understand test for gene expression studies. qusage accounts for inter-gene correlations using the Variance Inflation Factor technique proposed by Wu et al. (Nucleic Acids Res, 2012). In addition, rather than simply evaluating the deviation from a null hypothesis with a single number (a P value), qusage quantifies gene set activity with a complete probability density function (PDF). From this PDF, P values and confidence intervals can be easily extracted. Preserving the PDF also allows for post-hoc analysis (e.g., pair-wise comparisons of gene set activity) while maintaining statistical traceability. Finally, while qusage is compatible with individual gene statistics from existing methods (e.g., LIMMA), a Welch-based method is implemented that is shown to improve specificity. The QuSAGE package also includes a mixed effects model implementation, as described in (Turner JA et al, BMC Bioinformatics, 2015), and a meta-analysis framework as described in (Meng H, et al. PLoS Comput Biol. 2019). For questions, contact Chris Bolen (cbolen1@gmail.com) or Steven Kleinstein (steven.kleinstein@yale.edu).
Implementation of multisource exchangeability models for Bayesian analyses of prespecified subgroups arising in the context of basket trial design and monitoring. The R basket package facilitates implementation of the binary, symmetric multi-source exchangeability model (MEM) with posterior inference arising from both exact computation and Markov chain Monte Carlo sampling. Analysis output includes full posterior samples as well as posterior probabilities, highest posterior density (HPD) interval boundaries, effective sample sizes (ESS), mean and median estimations, posterior exchangeability probability matrices, and maximum a posteriori MEMs. In addition to providing "basketwise" analyses, the package includes similar calculations for "clusterwise" analyses for which subgroups are combined into meta-baskets, or clusters, using graphical clustering algorithms that treat the posterior exchangeability probabilities as edge weights. In addition plotting tools are provided to visualize basket and cluster densities as well as their exchangeability. References include Hyman, D.M., Puzanov, I., Subbiah, V., Faris, J.E., Chau, I., Blay, J.Y., Wolf, J., Raje, N.S., Diamond, E.L., Hollebecque, A. and Gervais, R (2015) <doi:10.1056/NEJMoa1502309>; Hobbs, B.P. and Landin, R. (2018) <doi:10.1002/sim.7893>; Hobbs, B.P., Kane, M.J., Hong, D.S. and Landin, R. (2018) <doi:10.1093/annonc/mdy457>; and Kaizer, A.M., Koopmeiners, J.S. and Hobbs, B.P. (2017) <doi:10.1093/biostatistics/kxx031>.
This package performs power and sample size calculation for non-proportional hazards model using the Fleming-Harrington family of weighted log-rank tests. The sequentially calculated log-rank test score statistics are assumed to have independent increments as characterized in Anastasios A. Tsiatis (1982) <doi:10.1080/01621459.1982.10477898>. The mean and variance of log-rank test score statistics are calculated based on Kaifeng Lu (2021) <doi:10.1002/pst.2069>. The boundary crossing probabilities are calculated using the recursive integration algorithm described in Christopher Jennison and Bruce W. Turnbull (2000, ISBN:0849303168). The package can also be used for continuous, binary, and count data. For continuous data, it can handle missing data through mixed-model for repeated measures (MMRM). In crossover designs, it can estimate direct treatment effects while accounting for carryover effects. For binary data, it can design Simon's 2-stage, modified toxicity probability-2 (mTPI-2), and Bayesian optimal interval (BOIN) trials. For count data, it can design group sequential trials for negative binomial endpoints with censoring. Additionally, it facilitates group sequential equivalence trials for all supported data types. Moreover, it can design adaptive group sequential trials for changes in sample size, error spending function, number and spacing or future looks. Finally, it offers various options for adjusted p-values, including graphical and gatekeeping procedures.
This package provides a minimal library specifically designed to make the estimation of Machine Learning (ML) techniques as easy and accessible as possible, particularly within the framework of the Knowledge Discovery in Databases (KDD) process in data mining. The package provides essential tools to structure and execute each stage of a predictive or classification modeling workflow, aligning closely with the fundamental steps of the KDD methodology, from data selection and preparation, through model building and tuning, to the interpretation and evaluation of results using Sensitivity Analysis. The MLwrap workflow is organized into four core steps; preprocessing(), build_model(), fine_tuning(), and sensitivity_analysis(). It also includes global and pairwise interaction analysis based on Friedmanâ s H-statistic to support a more detailed interpretation of complex feature relationships.These steps correspond, respectively, to data preparation and transformation, model construction, hyperparameter optimization, and sensitivity analysis. The user can access comprehensive model evaluation results including fit assessment metrics, plots, predictions, and performance diagnostics for ML models implemented through Neural Networks', Random Forest', XGBoost (Extreme Gradient Boosting), and Support Vector Machines (SVM) algorithms. By streamlining these phases, MLwrap aims to simplify the implementation of ML techniques, allowing analysts and data scientists to focus on extracting actionable insights and meaningful patterns from large datasets, in line with the objectives of the KDD process.
Performance of functional kriging, cokriging, optimal sampling and simulation for spatial prediction of functional data. The framework of spatial prediction, optimal sampling and simulation are extended from scalar to functional data. SpatFD is based on the Karhunen-Loève expansion that allows to represent the observed functions in terms of its empirical functional principal components. Based on this approach, the functional auto-covariances and cross-covariances required for spatial functional predictions and optimal sampling, are completely determined by the sum of the spatial auto-covariances and cross-covariances of the respective score components. The package provides new classes of data and functions for modeling spatial dependence structure among curves. The spatial prediction of curves at unsampled locations can be carried out using two types of predictors, and both of them report, the respective variances of the prediction error. In addition, there is a function for the determination of spatial locations sampling configuration that ensures minimum variance of spatial functional prediction. There are also two functions for plotting predicted curves at each location and mapping the surface at each time point, respectively. References Bohorquez, M., Giraldo, R., and Mateu, J. (2016) <doi:10.1007/s10260-015-0340-9>, Bohorquez, M., Giraldo, R., and Mateu, J. (2016) <doi:10.1007/s00477-016-1266-y>, Bohorquez M., Giraldo R. and Mateu J. (2021) <doi:10.1002/9781119387916>.
This package provides a suite of mixed-integer linear programming (MILP) model builders and solversâ including Gurobi', HiGHS', Symphony', GNU Linear Programming Kit (GLPK)', and lpSolve'â for automated test assembly (ATA) in multistage testing (MST). Offers filtering of decision variables through itemâ module eligibility and the application of explicit bounds to simplify the MILP model and accelerate the optimization process. Supports bottom up, top down, and hybrid assembly strategies; enemy-item and enemy-stimulus exclusions; stimulus all in/all out or partial selection; anchor item/stimulus specification; and item exposure control. Accommodates both single-objective and multi-objective optimization ('weighted sum', maximin', capped maximin', minimax', and goal programming'). Enables simultaneous assembly of multiple panels with item and stimulus content balancing and exposure control. Provides analytical evaluation of assembled MST performance within seconds. Includes tools for diagnosing infeasible optimization models by systematically identifying sources of infeasibility and reformulating models with slack variables to restore feasibility.Methods implemented in this package build on established work in optimal test assembly (van der Linden, 2005 <doi:10.1007/0-387-29054-0>), item-set constrained test assembly (van der Linden, 2000 <doi:10.1177/01466210022031697>), hybrid assembly (Xiong, 2018 <doi:10.1177/0146621618762739>), recursion-based analytic methods (Lim et al., 2021 <doi:10.1111/jedm.12276>), and classification evaluation (Rudner, 2000 <doi:10.7275/an9m-2035>; Rudner, 2005 <doi:10.7275/56a5-6b14>).
This package provides Bayesian estimation and forecasting of dynamic panel data using Bayesian Panel Vector Autoregressions with hierarchical prior distributions. The models include country-specific VARs that share a global prior distribution that extend the model by JarociŠski (2010) <doi:10.1002/jae.1082>. Under this prior expected value, each country's system follows a global VAR with country-invariant parameters. Further flexibility is provided by the hierarchical prior structure that retains the Minnesota prior interpretation for the global VAR and features estimated prior covariance matrices, shrinkage, and persistence levels. Bayesian forecasting is developed for models including exogenous variables, allowing conditional forecasts given the future trajectories of some variables and restricted forecasts assuring that rates are forecasted to stay positive and less than 100. The package implements the model specification, estimation, and forecasting routines, facilitating coherent workflows and reproducibility. It also includes automated pseudo-out-of-sample forecasting and computation of forecasting performance measures. Beautiful plots, informative summary functions, and extensive documentation complement all this. An extraordinary computational speed is achieved thanks to employing frontier econometric and numerical techniques and algorithms written in C++'. The bpvars package is aligned regarding objects, workflows, and code structure with the R packages bsvars by Woźniak (2024) <doi:10.32614/CRAN.package.bsvars> and bsvarSIGNs by Wang & Woźniak (2025) <doi:10.32614/CRAN.package.bsvarSIGNs>, and they constitute an integrated toolset. Copyright: 2025 International Labour Organization.
Encode/Decode base64', with support for JSON format, using two functions: j_encode() and j_decode(). Base64 is a group of similar binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation, used when there is a need to encode binary data that needs to be stored and transferred over media that are designed to deal with textual data, ensuring that the data will remain intact and without modification during transport. <https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding> On the other side, JSON (JavaScript Object Notation) is a lightweight data-interchange format. Easy to read, write, parse and generate. It is based on a subset of the JavaScript Programming Language. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. JSON structure is built around name:value pairs and ordered list of values. <https://www.json.org> The first function, j_encode(), let you transform a data.frame or list to a base64 encoded JSON (or JSON string). The j_decode() function takes a base64 string (could be an encoded JSON) and transform it to a data.frame (or list, depending of the JSON structure).
Miscellaneous printing of numeric or statistical results in R Markdown or Quarto documents according to guidelines of the "Publication Manual" of the American Psychological Association (2020, ISBN: 978-1-4338-3215-4). These guidelines are usually referred to as APA style (<https://apastyle.apa.org/>) and include specific rules on the formatting of numbers and statistical test results. APA style has to be implemented when submitting scientific reports in a wide range of research fields, especially in the social sciences. The default output of numbers in the R console or R Markdown and Quarto documents does not meet the APA style requirements, and reformatting results manually can be cumbersome and error-prone. This package covers the automatic conversion of R objects to textual representations that meet the APA style requirements, which can be included in R Markdown or Quarto documents. It covers some basic statistical tests (t-test, ANOVA, correlation, chi-squared test, Wilcoxon test) as well as some basic number printing manipulations (formatting p-values, removing leading zeros for numbers that cannot be greater than one, and others). Other packages exist for formatting numbers and tests according to the APA style guidelines, such as papaja (<https://cran.r-project.org/package=papaja>) and apa (<https://cran.r-project.org/package=apa>), but they do not offer all convenience functionality included in prmisc'. The vignette has an overview of most of the functions included in the package.
Computes a confidence interval for a specified linear combination of the regression parameters in a linear regression model with iid normal errors with known variance when there is uncertain prior information that a distinct specified linear combination of the regression parameters takes a given value. This confidence interval, found by numerical nonlinear constrained optimization, has the required minimum coverage and utilizes this uncertain prior information through desirable expected length properties. This confidence interval has the following three practical applications. Firstly, if the error variance has been accurately estimated from previous data then it may be treated as being effectively known. Secondly, for sufficiently large (dimension of the response vector) minus (dimension of regression parameter vector), greater than or equal to 30 (say), if we replace the assumed known value of the error variance by its usual estimator in the formula for the confidence interval then the resulting interval has, to a very good approximation, the same coverage probability and expected length properties as when the error variance is known. Thirdly, some more complicated models can be approximated by the linear regression model with error variance known when certain unknown parameters are replaced by estimates. This confidence interval is described in Mainzer, R. and Kabaila, P. (2019) <doi:10.32614/RJ-2019-026>, and is a member of the family of confidence intervals proposed by Kabaila, P. and Giri, K. (2009) <doi:10.1016/j.jspi.2009.03.018>.
This database contains necessary data relevant to medical costs on obesity throughout the United States. This database, in form of an R package, could output necessary data frames relevant to obesity costs, where the clients could easily manipulate the output using difference parameters, e.g. relative risks for each illnesses. This package contributes to parts of our published journal named "Modeling the Economic Cost of Obesity Risk and Its Relation to the Health Insurance Premium in the United States: A State Level Analysis". Please use the following citation for the journal: Woods Thomas, Tatjana Miljkovic (2022) "Modeling the Economic Cost of Obesity Risk and Its Relation to the Health Insurance Premium in the United States: A State Level Analysis" <doi:10.3390/risks10100197>. The database is composed of the following main tables: 1. Relative_Risks: (constant) Relative risks for a given disease group with a risk factor of obesity; 2. Disease_Cost: (obesity_cost_disease) Supplementary output with all variables related to individual disease groups in a given state and year; 3. Full_Cost: (obesity_cost_full) Complete output with all variables used to make cost calculations, as well as cost calculations in a given state and year; 4. National_Summary: (obesity_cost_national_summary) National summary cost calculations in a given year. Three functions are included to assist users in calling and adjusting the mentioned tables and they are data_load(), data_produce(), and rel_risk_fun().
This package provides functions to generate and analyze spatially-explicit individual-based multistate movements in rivers, heterogeneous and homogeneous spaces. This is done by incorporating landscape bias on local behaviour, based on resistance rasters. Although originally conceived and designed to simulate trajectories of species constrained to linear habitats/dendritic ecological networks (e.g. river networks), the simulation algorithm is built to be highly flexible and can be applied to any (aquatic, semi-aquatic or terrestrial) organism, independently on the landscape in which it moves. Thus, the user will be able to use the package to simulate movements either in homogeneous landscapes, heterogeneous landscapes (e.g. semi-aquatic animal moving mainly along rivers but also using the matrix), or even in highly contrasted landscapes (e.g. fish in a river network). The algorithm and its input parameters are the same for all cases, so that results are comparable. Simulated trajectories can then be used as mechanistic null models (Potts & Lewis 2014, <DOI:10.1098/rspb.2014.0231>) to test a variety of Movement Ecology hypotheses (Nathan et al. 2008, <DOI:10.1073/pnas.0800375105>), including landscape effects (e.g. resources, infrastructures) on animal movement and species site fidelity, or for predictive purposes (e.g. road mortality risk, dispersal/connectivity). The package should be relevant to explore a broad spectrum of ecological phenomena, such as those at the interface of animal behaviour, management, landscape and movement ecology, disease and invasive species spread, and population dynamics.
Collective matrix factorization (a.k.a. multi-view or multi-way factorization, Singh, Gordon, (2008) <doi:10.1145/1401890.1401969>) tries to approximate a (potentially very sparse or having many missing values) matrix X as the product of two low-dimensional matrices, optionally aided with secondary information matrices about rows and/or columns of X', which are also factorized using the same latent components. The intended usage is for recommender systems, dimensionality reduction, and missing value imputation. Implements extensions of the original model (Cortes, (2018) <arXiv:1809.00366>) and can produce different factorizations such as the weighted implicit-feedback model (Hu, Koren, Volinsky, (2008) <doi:10.1109/ICDM.2008.22>), the weighted-lambda-regularization model, (Zhou, Wilkinson, Schreiber, Pan, (2008) <doi:10.1007/978-3-540-68880-8_32>), or the enhanced model with implicit features (Rendle, Zhang, Koren, (2019) <arXiv:1905.01395>), with or without side information. Can use gradient-based procedures or alternating-least squares procedures (Koren, Bell, Volinsky, (2009) <doi:10.1109/MC.2009.263>), with either a Cholesky solver, a faster conjugate gradient solver (Takacs, Pilaszy, Tikk, (2011) <doi:10.1145/2043932.2043987>), or a non-negative coordinate descent solver (Franc, Hlavac, Navara, (2005) <doi:10.1007/11556121_50>), providing efficient methods for sparse and dense data, and mixtures thereof. Supports L1 and L2 regularization in the main models, offers alternative most-popular and content-based models, and implements functionality for cold-start recommendations and imputation of 2D data.
Linear dynamic panel data modeling based on linear and nonlinear moment conditions as proposed by Holtz-Eakin, Newey, and Rosen (1988) <doi:10.2307/1913103>, Ahn and Schmidt (1995) <doi:10.1016/0304-4076(94)01641-C>, and Arellano and Bover (1995) <doi:10.1016/0304-4076(94)01642-D>. Estimation of the model parameters relies on the Generalized Method of Moments (GMM) and instrumental variables (IV) estimation, numerical optimization (when nonlinear moment conditions are employed) and the computation of closed form solutions (when estimation is based on linear moment conditions). One-step, two-step and iterated estimation is available. For inference and specification testing, Windmeijer (2005) <doi:10.1016/j.jeconom.2004.02.005> and doubly corrected standard errors (Hwang, Kang, Lee, 2021 <doi:10.1016/j.jeconom.2020.09.010>) are available. Additionally, serial correlation tests, tests for overidentification, and Wald tests are provided. Functions for visualizing panel data structures and modeling results obtained from GMM estimation are also available. The plot methods include functions to plot unbalanced panel structure, coefficient ranges and coefficient paths across GMM iterations (the latter is implemented according to the plot shown in Hansen and Lee, 2021 <doi:10.3982/ECTA16274>). For a more detailed description of the GMM-based functionality, please see Fritsch, Pua, Schnurbus (2021) <doi:10.32614/RJ-2021-035>. For more details on the IV-based estimation routines, see Fritsch, Pua, and Schnurbus (WP, 2024) and Han and Phillips (2010) <doi:10.1017/S026646660909063X>.
The efficient treatment and convenient analysis of experimental high-throughput (omics) data gets facilitated through this collection of diverse functions. Several functions address advanced object-conversions, like manipulating lists of lists or lists of arrays, reorganizing lists to arrays or into separate vectors, merging of multiple entries, etc. Another set of functions provides speed-optimized calculation of standard deviation (sd), coefficient of variance (CV) or standard error of the mean (SEM) for data in matrixes or means per line with respect to additional grouping (eg n groups of replicates). A group of functions facilitate dealing with non-redundant information, by indexing unique, adding counters to redundant or eliminating lines with respect redundancy in a given reference-column, etc. Help is provided to identify very closely matching numeric values to generate (partial) distance matrixes for very big data in a memory efficient manner or to reduce the complexity of large data-sets by combining very close values. Other functions help aligning a matrix or data.frame to a reference using partial matching or to mine an experimental setup to extract patterns of replicate samples. Many times large experimental datasets need some additional filtering, adequate functions are provided. Convenient data normalization is supported in various different modes, parameter estimation via permutations or boot-strap as well as flexible testing of multiple pair-wise combinations using the framework of limma is provided, too. Batch reading (or writing) of sets of files and combining data to arrays is supported, too.
Tetra-allele cross often referred as four-way cross or double cross or four-line cross are those type of mating designs in which every cross is obtained by mating amongst four inbred lines. A tetra-allele cross can be obtained by crossing the resultant of two unrelated diallel crosses. A common triallel cross involving four inbred lines A, B, C and D can be symbolically represented as (A X B) X (C X D) or (A, B, C, D) or (A B C D) etc. Tetra-allele cross can be broadly categorized as Complete Tetra-allele Cross (CTaC) and Partial Tetra-allele Crosses (PTaC). Rawlings and Cockerham (1962)<doi:10.2307/2527461> firstly introduced and gave the method of analysis for tetra-allele cross hybrids using the analysis method of single cross hybrids under the assumption of no linkage. The set of all possible four-way mating between several genotypes (individuals, clones, homozygous lines, etc.) leads to a CTaC. If there are N number of inbred lines involved in a CTaC, the the total number of crosses, T = N*(N-1)*(N-2)*(N-3)/8. When more number of lines are to be considered, the total number of crosses in CTaC also increases. Thus, it is almost impossible for the investigator to carry out the experimentation with limited available resource material. This situation lies in taking a fraction of CTaC with certain underlying properties, known as PTaC.
The actfts package provides tools for performing autocorrelation analysis of time series data. It includes functions to compute and visualize the autocorrelation function (ACF) and the partial autocorrelation function (PACF). Additionally, it performs the Dickey-Fuller, KPSS, and Phillips-Perron unit root tests to assess the stationarity of time series. Theoretical foundations are based on Box and Cox (1964) <doi:10.1111/j.2517-6161.1964.tb00553.x>, Box and Jenkins (1976) <isbn:978-0-8162-1234-2>, and Box and Pierce (1970) <doi:10.1080/01621459.1970.10481180>. Statistical methods are also drawn from Kolmogorov (1933) <doi:10.1007/BF00993594>, Kwiatkowski et al. (1992) <doi:10.1016/0304-4076(92)90104-Y>, and Ljung and Box (1978) <doi:10.1093/biomet/65.2.297>. The package integrates functions from forecast (Hyndman & Khandakar, 2008) <https://CRAN.R-project.org/package=forecast>, tseries (Trapletti & Hornik, 2020) <https://CRAN.R-project.org/package=tseries>, xts (Ryan & Ulrich, 2020) <https://CRAN.R-project.org/package=xts>, and stats (R Core Team, 2023) <https://stat.ethz.ch/R-manual/R-devel/library/stats/html/00Index.html>. Additionally, it provides visualization tools via plotly (Sievert, 2020) <https://CRAN.R-project.org/package=plotly> and reactable (Glaz, 2023) <https://CRAN.R-project.org/package=reactable>. The package also incorporates macroeconomic datasets from the U.S. Bureau of Economic Analysis: Disposable Personal Income (DPI) <https://fred.stlouisfed.org/series/DPI>, Gross Domestic Product (GDP) <https://fred.stlouisfed.org/series/GDP>, and Personal Consumption Expenditures (PCEC) <https://fred.stlouisfed.org/series/PCEC>.
Knowledge graphs enable to efficiently visualize and gain insights into large-scale data analysis results, as p-values from multiple studies or embedding data matrices. The usual workflow is a user providing a data frame of association studies results and specifying target nodes, e.g. phenotypes, to visualize. The knowledge graph then shows all the features which are significantly associated with the phenotype, with the edges being proportional to the association scores. As the user adds several target nodes and grouping information about the nodes such as biological pathways, the construction of such graphs soon becomes complex. The kgraph package aims to enable users to easily build such knowledge graphs, and provides two main features: first, to enable building a knowledge graph based on a data frame of concepts relationships, be it p-values or cosine similarities; second, to enable determining an appropriate cut-off on cosine similarities from a complete embedding matrix, to enable the building of a knowledge graph directly from an embedding matrix. The kgraph package provides several display, layout and cut-off options, and has already proven useful to researchers to enable them to visualize large sets of p-value associations with various phenotypes, and to quickly be able to visualize embedding results. Two example datasets are provided to demonstrate these behaviors, and several live shiny applications are hosted by the CELEHS laboratory and Parse Health, as the KESER Mental Health application <https://keser-mental-health.parse-health.org/> based on Hong C. (2021) <doi:10.1038/s41746-021-00519-z>.
Implementation of default Bayes factors for testing statistical hypotheses under various statistical models. The package is intended for applied quantitative researchers in the social and behavioral sciences, medical research, and related fields. The Bayes factor tests can be executed for statistical models such as univariate and multivariate normal linear models, correlation analysis, generalized linear models, special cases of linear mixed models, survival models, relational event models. Parameters that can be tested are location parameters (e.g., group means, regression coefficients), variances (e.g., group variances), and measures of association (e.g,. polychoric/polyserial/biserial/tetrachoric/product moments correlations), among others. The statistical underpinnings are described in O'Hagan (1995) <DOI:10.1111/j.2517-6161.1995.tb02017.x>, De Santis and Spezzaferri (2001) <DOI:10.1016/S0378-3758(00)00240-8>, Mulder and Xin (2022) <DOI:10.1080/00273171.2021.1904809>, Mulder and Gelissen (2019) <DOI:10.1080/02664763.2021.1992360>, Mulder (2016) <DOI:10.1016/j.jmp.2014.09.004>, Mulder and Fox (2019) <DOI:10.1214/18-BA1115>, Mulder and Fox (2013) <DOI:10.1007/s11222-011-9295-3>, Boeing-Messing, van Assen, Hofman, Hoijtink, and Mulder (2017) <DOI:10.1037/met0000116>, Hoijtink, Mulder, van Lissa, and Gu (2018) <DOI:10.1037/met0000201>, Gu, Mulder, and Hoijtink (2018) <DOI:10.1111/bmsp.12110>, Hoijtink, Gu, and Mulder (2018) <DOI:10.1111/bmsp.12145>, and Hoijtink, Gu, Mulder, and Rosseel (2018) <DOI:10.1037/met0000187>. When using the packages, please refer to the package Mulder et al. (2021) <DOI:10.18637/jss.v100.i18> and the relevant methodological papers.
Cooperative game theory models decision-making situations in which a group of agents, called players, may achieve certain benefits by cooperating to reach an optimal outcome. It has great potential in different fields, since it offers a scenario to analyze and solve problems in which cooperation is essential to achieve a common goal. The TUGLab (Transferable Utility Games Laboratory) R package contains a set of scripts that could serve as a helpful complement to the books and other materials used in courses on cooperative game theory, and also as a practical tool for researchers working in this field. The TUGLab project was born in 2006 trying to highlight the geometrical aspects of the theory of cooperative games for 3 and 4 players. TUGlabWeb is an online platform on which the basic functions of TUGLab are implemented, and it is being used all over the world as a resource in degree, master's and doctoral programs. This package is an extension of the first versions and enables users to work with games in general (computational restrictions aside). The user can check properties of games, compute well-known games and calculate several set-valued and single-valued solutions such as the core, the Shapley value, the nucleolus or the core-center. The package also illustrates how the Shapley value flexibly adapts to various cooperative game settings, including weighted players and coalitions, a priori unions, and restricted communication structures. In keeping with the original philosophy of the first versions, special emphasis is placed on the graphical representation of the solution concepts for 3 and 4 players.
Gene lists derived from the results of genomic analyses are rich in biological information. For instance, differentially expressed genes (DEGs) from a microarray or RNA-Seq analysis are related functionally in terms of their response to a treatment or condition. Gene lists can vary in size, up to several thousand genes, depending on the robustness of the perturbations or how widely different the conditions are biologically. Having a way to associate biological relatedness between hundreds and thousands of genes systematically is impractical by manually curating the annotation and function of each gene. Over-representation analysis (ORA) of genes was developed to identify biological themes. Given a Gene Ontology (GO) and an annotation of genes that indicate the categories each one fits into, significance of the over-representation of the genes within the ontological categories is determined by a Fisher's exact test or modeling according to a hypergeometric distribution. Comparing a small number of enriched biological categories for a few samples is manageable using Venn diagrams or other means for assessing overlaps. However, with hundreds of enriched categories and many samples, the comparisons are laborious. Furthermore, if there are enriched categories that are shared between samples, trying to represent a common theme across them is highly subjective. goSTAG uses GO subtrees to tag and annotate genes within a set. goSTAG visualizes the similarities between the over-representation of DEGs by clustering the p-values from the enrichment statistical tests and labels clusters with the GO term that has the most paths to the root within the subtree generated from all the GO terms in the cluster.
The Super Imposition by Translation and Rotation (SITAR) model is a shape-invariant nonlinear mixed effect model that fits a natural cubic spline mean curve to the growth data and aligns individual-specific growth curves to the underlying mean curve via a set of random effects (see Cole, 2010 <doi:10.1093/ije/dyq115> for details). The non-Bayesian version of the SITAR model can be fit by using the already available R package sitar'. Unlike the sitar package which allows modelling of a single outcome only, the bsitar package offers great flexibility in fitting models of varying complexities, including joint modelling of multiple outcomes such as height and weight (multivariate model). Additionally, the bsitar package allows for the simultaneous analysis of an outcome separately for subgroups defined by a factor variable such as gender. This is achieved by fitting separate models for each subgroup (for example males and females for gender variable). An advantage of this approach is that posterior draws for each subgroup are part of a single model object, making it possible to compare coefficients across subgroups and test hypotheses. Since the bsitar package is a front-end to the R package brms', it offers excellent support for post-processing of posterior draws via various functions that are directly available from the brms package. In addition, the bsitar package includes various customized functions that allow for the visualization of distance (increase in size with age) and velocity (change in growth rate as a function of age), as well as the estimation of growth spurt parameters such as age at peak growth velocity and peak growth velocity.