This package provides estimation methods for markets in equilibrium and disequilibrium. It supports the estimation of an equilibrium model and four disequilibrium models, with both correlated and independent shocks, and provides post-estimation analysis tools such as aggregation, marginal effect, and shortage calculations. See Karapanagiotis (2024) <doi:10.18637/jss.v108.i02> for an overview of the functionality and examples. The estimation methods are based on the full information maximum likelihood techniques of Maddala and Nelson (1974) <doi:10.2307/1914215>, implemented using the analytic derivative expressions calculated in Karapanagiotis (2020) <doi:10.2139/ssrn.3525622>. Standard errors can be adjusted for heteroscedasticity or clustering. The equilibrium estimation constitutes a case of a system of linear, simultaneous equations; the disequilibrium models instead replace the market-clearing condition with a non-linear, short-side rule and allow for different specifications of price dynamics.
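A hedged usage sketch: diseq_basic() and the multi-part formula below follow the package's documented style, but the variable names and exact syntax are assumptions for illustration.

    # hypothetical: fit the basic disequilibrium model; the left-hand side
    # names quantity, price, subject and time variables, and the right-hand
    # side gives the demand and supply specifications separated by '|'
    library(markets)
    fit <- diseq_basic(
      Q | P | ID | TREND ~ P + X_d | P + X_s,  # assumed formula layout
      data = market_data                       # assumed data frame
    )
    summary(fit)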
The goal of the SwimmeR package is to provide means of acquiring, and then analyzing, data from swimming (and diving) competitions. To that end SwimmeR allows results to be read in from .html sources, like Hy-Tek real time results pages, .pdf files, ISL results, Omega results, and (on a development basis) .hy3 files. Once read in, SwimmeR can convert swimming times (performances) between the computationally useful format of seconds reported to the 100ths place (e.g. 95.37) and the conventional reporting format (1:35.37) used in the swimming community. SwimmeR can also score meets in a variety of formats with user-defined point values, convert times between courses ('LCM', 'SCM', 'SCY'), and draw single elimination brackets, as well as providing a suite of tools for cleaning swimming data. This is a developmental package, not yet mature.
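The time conversion in miniature, sketched in base R; these helper names are illustrative, not SwimmeR's exported functions.

    # seconds (95.37) to conventional format ("1:35.37") and back
    to_clock <- function(s) sprintf("%d:%05.2f", s %/% 60, s %% 60)
    to_secs  <- function(t) {
      p <- as.numeric(strsplit(t, ":")[[1]])
      p[1] * 60 + p[2]
    }
    to_clock(95.37)     # "1:35.37"
    to_secs("1:35.37")  # 95.37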
This package implements a suite of methods to preprocess data from PTR-TOF-MS instruments (HDF5 format) and generates the sample-by-feature table of peak intensities, together with the sample and feature metadata (as a single ExpressionSet object for subsequent statistical analysis). The package also provides useful tools for cohort management, such as analyzing data progressively, visualization tools, and quality control. The steps include calibration, expiration detection, peak detection and quantification, feature alignment, missing value imputation, and feature annotation. Applications to exhaled air and cell culture in headspace are described in the vignettes and examples. This package was used for the data analysis of the Gassin Delyle study on adults undergoing invasive mechanical ventilation in the intensive care unit due to severe COVID-19 or non-COVID-19 acute respiratory distress syndrome (ARDS), which identified four potential biomarkers of the infection.
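A hedged pipeline sketch; the step functions named below are assumptions recalled from the package's workflow and should be checked against its documentation.

    library(ptairMS)
    # assumed workflow: build a study set from a directory of HDF5 files,
    # then detect/quantify peaks and align features across samples
    ptrSet <- createPtrSet(dir = "data/ptr_h5", setName = "study")  # assumed
    ptrSet <- detectPeak(ptrSet)                                    # assumed
    eset   <- alignSamples(ptrSet)  # assumed; yields the ExpressionSet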
Computation of various confidence intervals (Altman et al. (2000), ISBN:978-0-727-91375-3; Hedderich and Sachs (2018), ISBN:978-3-662-56657-2) including bootstrapped versions (Davison and Hinkley (1997), ISBN:978-0-511-80284-3) as well as Hsu (Hedderich and Sachs (2018), ISBN:978-3-662-56657-2), permutation (Janssen (1997), <doi:10.1016/S0167-7152(97)00043-6>), bootstrap (Davison and Hinkley (1997), ISBN:978-0-511-80284-3), intersection-union (Sozu et al. (2015), ISBN:978-3-319-22005-5) and multiple imputation (Barnard and Rubin (1999), <doi:10.1093/biomet/86.4.948>) t-tests; furthermore, computation of intersection-union z-tests as well as multiple imputation Wilcoxon tests. Graphical visualization by volcano and Bland-Altman plots (Bland and Altman (1986), <doi:10.1016/S0140-6736(86)90837-8>; Shieh (2018), <doi:10.1186/s12874-018-0505-y>).
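The bootstrap idea behind the bootstrapped intervals and tests, sketched in base R (not this package's interface):

    set.seed(1)
    x <- rnorm(30, mean = 10); y <- rnorm(30, mean = 11)
    # percentile bootstrap confidence interval for the difference in means
    boot_diff <- replicate(2000,
      mean(sample(x, replace = TRUE)) - mean(sample(y, replace = TRUE)))
    quantile(boot_diff, c(0.025, 0.975))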
Sample size estimation for bio-equivalence trials is supported through a simulation-based approach that extends the Two One-Sided Tests (TOST) procedure. The methodology provides flexibility in hypothesis testing, accommodates multiple treatment comparisons, and accounts for correlated endpoints. Users can model complex trial scenarios, including parallel and crossover designs, intra-subject variability, and different equivalence margins. Monte Carlo simulations enable accurate estimation of power and type I error rates, ensuring well-calibrated study designs. The statistical framework builds on established methods for equivalence testing and multiple hypothesis testing in bio-equivalence studies, as described in Schuirmann (1987) <doi:10.1007/BF01068419>, Mielke et al. (2018) <doi:10.1080/19466315.2017.1371071>, Shieh (2022) <doi:10.1371/journal.pone.0269128>, and Sozu et al. (2015) <doi:10.1007/978-3-319-22005-5>. Comprehensive documentation and vignettes guide users through implementation and interpretation of results.
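The simulation idea in miniature: a base-R Monte Carlo estimate of TOST power under assumed margins (a sketch of the methodology, not the package's interface):

    set.seed(42)
    tost_reject <- function(n, delta = 0, sd = 0.5, margin = 0.2, alpha = 0.05) {
      x <- rnorm(n, 0, sd); y <- rnorm(n, delta, sd)
      p_lo <- t.test(y, x, mu = -margin, alternative = "greater")$p.value
      p_hi <- t.test(y, x, mu =  margin, alternative = "less")$p.value
      max(p_lo, p_hi) < alpha  # reject both one-sided nulls
    }
    mean(replicate(5000, tost_reject(n = 100)))  # power when truly equivalent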
This package provides a set of functions to implement the SpiceFP approach, which is iterative. It involves transforming the functional predictors into several candidate explanatory matrices (based on contingency tables), with which edge matrices encoding contiguity constraints are associated. Generalized fused lasso regression is performed at each iteration in order to identify the best candidate matrix, the best class intervals, and the related coefficients. The approach stops when the maximal number of iterations is reached or when the retained coefficients are all zero. Supplementary functions allow retrieval of the coefficients of any candidate matrix, or the mean of the coefficients over several candidates. The methods in this package are described in Girault Gnanguenon Guesse, Patrice Loisel, Bénedicte Fontez, Thierry Simonneau, Nadine Hilgert (2021) "An exploratory penalized regression to identify combined effects of functional variables - Application to agri-environmental issues" <https://hal.archives-ouvertes.fr/hal-03298977>.
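The core of one iteration in miniature: a fused lasso with 2-D contiguity constraints over a small grid of class intervals, using the genlasso package (a sketch of the principle, not SpiceFP's implementation):

    library(genlasso)
    set.seed(1)
    X <- matrix(rnorm(100 * 25), 100, 25)  # 100 obs x (5 x 5 interval classes)
    beta <- as.vector(outer(1:5, 1:5, function(i, j) 2 * (i > 3 & j > 3)))
    y <- as.numeric(X %*% beta + rnorm(100))
    fit <- fusedlasso2d(y, X, dim1 = 5, dim2 = 5)  # contiguity constraints
    round(coef(fit, lambda = 1)$beta, 2)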
The "ussher" data set is drawn from original chronological textual historic events. Commonly known as James Ussher's Annals of the World, the source text was originally written in Latin in 1650, and published in English translation in 1658.The data are classified by index, year, epoch (or one of the 7 ancient "Ages of the World"), Biblical source book if referenced (rarely), as well as alternate dating mechanisms, such as "Anno Mundi" (age of the world) or "Julian Period" (dates based upon the Julian calendar). Additional file "usshfull" includes variables that may be of further interest to historians, such as Southern Kingdom and Northern Kingdom discrepant dates, and the original amalgamated dating mechanic used by Ussher in the original text. The raw data can also be called using "usshraw", as described in: Ussher, J. (1658) <https://archive.org/stream/AnnalsOfTheWorld/Annals_djvu.txt>.
Designed to streamline data analysis and statistical testing, reducing the length of R scripts while generating well-formatted outputs in 'pdf', 'Microsoft Word', and 'Microsoft Excel' formats. In essence, the package contains functions which are sophisticated wrappers around existing R functions that are called by using the f_ (user f_riendly) prefix followed by the normal function name. This first version of the rfriend package focuses primarily on data exploration, including tools for creating summary tables, f_summary(); performing data transformations, f_boxcox(), in part based on 'MASS' boxcox() and 'rcompanion'; and f_bestNormalize(), which wraps and extends functionality from the 'bestNormalize' package. Furthermore, rfriend can automatically (or on request) generate visualizations such as boxplots, f_boxplot(), QQ-plots, f_qqnorm(), histograms, f_hist(), and density plots. Additionally, the package includes four statistical test functions: f_aov(), f_kruskal_test(), f_glm(), and f_chisq_test() for sequential testing and visualisation of the stats functions aov(), kruskal.test(), glm() and chisq.test(). These functions support testing multiple response variables and predictors, while also handling assumption checks, data transformations, and post hoc tests. Post hoc results are automatically summarized in a table using the compact letter display (cld) format for easy interpretation. The package also provides a function for model comparison, f_model_comparison(), and several utility functions to simplify common R tasks. For example, f_clear() clears the workspace and restarts R with a single command; f_setwd() sets the working directory to match the directory of the current script; f_theme() quickly changes RStudio themes; and f_factors() converts multiple columns of a data frame to factors, and much more. If you encounter any issues or have feature requests, please feel free to contact me via email.
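A brief sketch using the function names listed above; the exact signatures are assumptions.

    library(rfriend)
    f_summary(iris)                                    # formatted summary table
    f_boxplot(Sepal.Length ~ Species, data = iris)     # assumed interface
    res <- f_aov(Sepal.Length ~ Species, data = iris)  # checks + post hoc cld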
Data Analysis using Bootstrap-Coupled ESTimation. Estimation statistics is a simple framework that avoids the pitfalls of significance testing. It uses familiar statistical concepts: means, mean differences, and error bars. More importantly, it focuses on the effect size of one's experiment/intervention, as opposed to a false dichotomy engendered by P values. An estimation plot has two key features: 1. It presents all datapoints as a swarmplot, which orders each point to display the underlying distribution. 2. It presents the effect size as a bootstrap 95% confidence interval on a separate but aligned axis. Estimation plots are introduced in Ho et al., Nature Methods 2019, 1548-7105. <doi:10.1038/s41592-019-0470-3>. The free-to-view PDF is located at <https://www.nature.com/articles/s41592-019-0470-3.epdf?author_access_token=Euy6APITxsYA3huBKOFBvNRgN0jAjWel9jnR3ZoTv0Pr6zJiJ3AA5aH4989gOJS_dajtNr1Wt17D0fh-t4GFcvqwMYN03qb8C33na_UrCUcGrt-Z0J9aPL6TPSbOxIC-pbHWKUDo2XsUOr3hQmlRew%3D%3D>.
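A hedged sketch of an estimation-plot workflow; the loader/effect-size/plot chain follows the package's documented style, but argument details are assumptions.

    library(dabestr)
    db  <- load(iris, x = Species, y = Sepal.Length,
                idx = c("setosa", "versicolor"))  # assumed loader interface
    eff <- mean_diff(db)   # bootstrap mean difference
    dabest_plot(eff)       # swarmplot with aligned 95% CI axis (assumed name)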
Estimation and comparison of the performances of diagnostic tests in multi-reader multi-case studies where true case statuses (or ground truths) are known and one or more readers provide test ratings for multiple cases. Reader performance metrics are provided for area under and expected utility of ROC curves, likelihood ratio of positive or negative tests, and sensitivity and specificity. ROC curves can be estimated empirically or with binormal or binormal likelihood-ratio models. Statistical comparisons of diagnostic tests are based on the ANOVA model of Obuchowski-Rockette and the unified framework of Hillis (2005) <doi:10.1002/sim.2024>. The ANOVA can be conducted with data from a full factorial, nested, or partially paired study design; with random or fixed readers or cases; and with covariances estimated by the DeLong method, jackknifing, or an unbiased method. See also Smith and Hillis (2020) <doi:10.1117/12.2549075>.
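A hedged call sketch; mrmc() and empirical_auc() follow the package's naming, but the exact signature is an assumption, and 'ratings' is an assumed data frame with truth, rating, test, reader and case columns.

    library(MRMCaov)
    est <- mrmc(empirical_auc(truth, rating),
                test = test, reader = reader, case = case, data = ratings)
    summary(est)  # Obuchowski-Rockette ANOVA comparisons of the tests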
This package provides a set of functions useful in the analysis of 3D genomic interactions. It includes the import of standard HiC data formats into R and HiC normalisation procedures. The main objective of this package is to improve the visualization and quantification of the analysis of HiC contacts through aggregation. The package allows the import of 1D genomics data, such as peaks from ATAC-Seq or ChIP-Seq, to create potential couples between features of interest under user-defined parameters, such as the distance between pairs of features of interest. It then allows the extraction of contact values from the HiC data for these couples, to perform Aggregated Peak Analysis (APA) for visualization, and also to compare normalized contact values between conditions. Overall, the package integrates 1D genomics data with 3D genomics data, providing easy access to HiC contact values.
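The aggregation step in miniature, in base R (a sketch of the APA principle, not this package's functions):

    # average HiC submatrices centered on each couple of interest
    apa <- function(mat, couples, flank = 5) {
      wins <- lapply(couples, function(ij)
        mat[(ij[1] - flank):(ij[1] + flank),
            (ij[2] - flank):(ij[2] + flank)])
      Reduce(`+`, wins) / length(wins)
    }
    m <- matrix(rpois(400 * 400, 2), 400, 400)  # toy contact matrix
    apa(m, list(c(50, 120), c(200, 310)))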
Build and use B-splines for interpolation and regression. In the case of regression, equality constraints as well as monotonicity and/or positivity of the B-spline weights can be imposed. Moreover, knot positions can lie on a regular grid or be part of the optimized parameters (in addition to the spline weights). To this end, bspline is able to calculate the Jacobian of the basis vectors as a function of the knot positions. Users are provided with functions calculating spline values at arbitrary points. These functions can be differentiated and integrated to obtain B-splines calculating derivatives/integrals at any point. B-splines of this package can simultaneously operate on a series of curves sharing the same set of knots. bspline is written with computing performance in mind, which is why the basis and Jacobian calculations are implemented in C++. The rest is implemented in R, but without notable impact on computing speed.
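What the basis construction amounts to, sketched with the standard splines package (bspline's own C++ routines build this kind of basis, plus its Jacobian with respect to the knots):

    library(splines)
    knots <- seq(0, 1, length.out = 7)
    x <- seq(0, 1, length.out = 101)
    B <- splineDesign(knots = c(rep(0, 3), knots, rep(1, 3)), x = x, ord = 4)
    matplot(x, B, type = "l")  # cubic B-spline basis on a shared knot set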
This package provides functions for implementing the novel algorithm CASCORE, which is designed to detect latent community structure in graphs with node covariates. The algorithm can handle models such as the covariate-assisted degree-corrected stochastic block model (CADCSBM). CASCORE specifically addresses disagreement between the community structure inferred from the adjacency information and that inferred from the covariate information. For more detailed information, please refer to the reference paper: Yaofang Hu and Wanjie Wang (2022) <arXiv:2306.15616>. In addition to CASCORE, this package includes several classical community detection algorithms that are compared to CASCORE in our paper. These algorithms are: Spectral Clustering On Ratios-of-Eigenvectors (SCORE), normalized PCA, ordinary PCA, network-based clustering, covariates-based clustering, and covariate-assisted spectral clustering (CASC). By providing these additional algorithms, the package enables users to compare their performance with CASCORE in community detection tasks.
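A hypothetical call; the function is assumed to share the algorithm's name, and the signature is an assumption for illustration.

    library(CASCORE)
    set.seed(1)
    A <- matrix(rbinom(400, 1, 0.1), 20, 20)
    A <- 1 * (A | t(A)); diag(A) <- 0                 # toy symmetric adjacency
    X <- matrix(rnorm(20 * 3), 20, 3)                 # node covariates
    labels <- CASCORE(Adj = A, Covariate = X, K = 2)  # assumed signature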
This data contains a large variety of information on players and their current attributes on Fantasy Premier League <https://fantasy.premierleague.com/>. In particular, it contains a `next_gw_points` (next gameweek points) value for each player given their attributes in the current week. Rows represent player-gameweeks, i.e. for each player there is a row for each gameweek. This makes the data suitable for modelling a player's next gameweek points, given attributes such as form, total points, and cost at the current gameweek. This data can therefore be used to create Fantasy Premier League bots that may use a machine learning algorithm and a linear programming solver (for example) to return the best possible transfers and team to pick for each gameweek, thereby fully automating the decision making process in Fantasy Premier League. This package simply supplies the required data for such a task.
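A modeling sketch with a simulated stand-in data frame; apart from `next_gw_points`, the column names here are assumptions.

    set.seed(1)
    fpl <- data.frame(next_gw_points = rpois(200, 3),
                      form           = runif(200, 0, 10),
                      total_points   = rpois(200, 50),
                      cost           = runif(200, 4, 13))
    fit <- lm(next_gw_points ~ form + total_points + cost, data = fpl)
    summary(fit)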
This package provides constrained joint maximum likelihood estimation algorithms for item factor analysis (IFA) based on multidimensional item response theory models. So far, we provide functions for exploratory and confirmatory IFA based on the multidimensional two-parameter logistic (M2PL) model for binary response data. Compared with traditional estimation methods for IFA, the methods implemented in this package scale better to data with large numbers of respondents, items, and latent factors. The computation is facilitated by the OpenMP multiprocessing API. For more information, please refer to: 1. Chen, Y., Li, X., & Zhang, S. (2018). Joint Maximum Likelihood Estimation for High-Dimensional Exploratory Item Factor Analysis. Psychometrika, 1-23. <doi:10.1007/s11336-018-9646-5>; 2. Chen, Y., Li, X., & Zhang, S. (2019). Structured Latent Factor Analysis for Large-scale Data: Identifiability, Estimability, and Their Implications. Journal of the American Statistical Association, <doi:10.1080/01621459.2019.1635485>.
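A hedged sketch; mirtjml_expr() matches the package's exploratory-IFA naming, but its signature is an assumption.

    library(mirtjml)
    set.seed(1)
    resp <- matrix(rbinom(1000 * 20, 1, 0.5), 1000, 20)  # binary responses
    fit  <- mirtjml_expr(resp, K = 2)  # exploratory M2PL, 2 latent factors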
netgwas is a multi-core R package that contains a set of tools based on copula graphical models for accomplishing three interrelated goals in genetics and genomics in a unified way: (1) linkage map construction, (2) construction of linkage disequilibrium networks, and (3) exploration of high-dimensional genotype-phenotype networks and genotype-phenotype-environment interaction networks. The netgwas package can deal with biparental inbreeding and outbreeding species with any ploidy level, namely diploid (2 sets of chromosomes), triploid (3 sets of chromosomes), tetraploid (4 sets of chromosomes) and so on. We target high-dimensional data where the number of variables p is considerably larger than the sample size n (p >> n). The computations are memory-optimized using sparse matrix output. netgwas implements the methodological developments in Behrouzi and Wit (2017) <doi:10.1111/rssc.12287> and Behrouzi and Wit (2017) <doi:10.1093/bioinformatics/bty777>.
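A hedged sketch; netmap(), netsnp() and netphenogeno() mirror the three goals above, with assumed signatures, and real calls will likely need cross-type and ploidy options.

    library(netgwas)
    set.seed(1)
    G <- matrix(sample(0:2, 60 * 30, replace = TRUE), 60, 30)  # toy genotypes
    map <- netmap(G)   # (1) linkage map construction (assumed call)
    ld  <- netsnp(G)   # (2) linkage disequilibrium network (assumed call)
    # (3) genotype-phenotype(-environment) network: netphenogeno(data)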
The GENESIS package provides methodology for estimating, inferring, and accounting for population and pedigree structure in genetic analyses. The current implementation provides functions to perform PC-AiR (Conomos et al., 2015, Gen Epi) and PC-Relate (Conomos et al., 2016, AJHG). PC-AiR performs a Principal Components Analysis on genome-wide SNP data for the detection of population structure in a sample that may contain known or cryptic relatedness. Unlike standard PCA, PC-AiR accounts for relatedness in the sample to provide accurate ancestry inference that is not confounded by family structure. PC-Relate uses ancestry-representative principal components to adjust for population structure/ancestry and accurately estimate measures of recent genetic relatedness such as kinship coefficients, IBD sharing probabilities, and inbreeding coefficients. Additionally, functions are provided to perform efficient variance component estimation and mixed model association testing for both quantitative and binary phenotypes.
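A workflow sketch using GENESIS's documented function names; genoData, genoIter, kinMat and pheno are assumed to be prepared beforehand, and argument details are approximate.

    library(GENESIS)
    pcs  <- pcair(genoData, kinobj = kinMat, divobj = kinMat)  # ancestry PCs
    rel  <- pcrelate(genoIter, pcs = pcs$vectors[, 1:2])       # relatedness
    null <- fitNullModel(pheno, outcome = "trait",
                         covars = c("PC1", "PC2"),
                         cov.mat = pcrelateToMatrix(rel), family = "gaussian")
    assoc <- assocTestSingle(genoIter, null.model = null)      # association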
Learn vector representations of sentences, paragraphs or documents by using the Paragraph Vector algorithms, namely the distributed bag of words (PV-DBOW) and the distributed memory (PV-DM) model. Top2vec finds clusters in text documents by combining document and word embedding techniques with density-based clustering. It does this by embedding documents in the semantic space as defined by the doc2vec algorithm. Next it maps these document embeddings to a lower-dimensional space using the Uniform Manifold Approximation and Projection (UMAP) dimensionality-reduction technique and finds dense areas in that space using a Hierarchical Density-Based Clustering technique (HDBSCAN). These dense areas are the topic clusters, each of which can be represented by a topic vector that is an aggregate of the document embeddings of the documents in that cluster. In the same semantic space, similar words can be found which are representative of the topic.
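A hedged sketch; paragraph2vec() takes a doc_id/text data frame in the package's documented style, while the clustering step is summarized in a comment because its exact interface is not shown here.

    library(doc2vec)
    txt <- data.frame(doc_id = c("d1", "d2", "d3"),
                      text   = c("water quality in lakes",
                                 "lake water and fish habitat",
                                 "road traffic and air pollution"),
                      stringsAsFactors = FALSE)
    model <- paragraph2vec(txt, type = "PV-DBOW", dim = 10,
                           iter = 20, min_count = 1)
    emb <- as.matrix(model, which = "docs")  # document embeddings
    # top2vec then applies UMAP + HDBSCAN on such embeddings to find topics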
Network analysis plays an important role in numerous application domains, including biomedicine. Estimating the number of communities is a fundamental and critical issue in network analysis, yet most existing studies assume that the number of communities is known a priori or lack rigorous theoretical guarantees of estimation consistency. This package implements a regularized network embedding model that simultaneously estimates the community structure and the number of communities in a unified formulation. The model equips network embedding with a novel composite regularization term, which pushes each embedding vector towards its community center and collapses similar community centers with each other. A rigorous theoretical analysis establishes asymptotic consistency for both community detection and estimation of the number of communities. Reference: Ren, M., Zhang, S. and Wang, J. (2022). "Consistent Estimation of the Number of Communities via Regularized Network Embedding". Biometrics, <doi:10.1111/biom.13815>.
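The composite penalty idea in miniature, as base R (an illustration of the two terms described above, not the package's code):

    # shrink embeddings toward their centers, and fuse nearby centers
    penalty <- function(V, centers, memb, lam1, lam2) {
      p1 <- sum(rowSums((V - centers[memb, ])^2))  # pull toward own center
      d  <- as.matrix(dist(centers))
      p2 <- sum(d[upper.tri(d)])                   # collapse similar centers
      lam1 * p1 + lam2 * p2
    }
    V <- matrix(rnorm(10 * 2), 10, 2); C <- matrix(rnorm(3 * 2), 3, 2)
    penalty(V, C, memb = sample(1:3, 10, TRUE), lam1 = 0.5, lam2 = 0.1)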
Combines taxonomic classifications of high-throughput 16S rRNA gene sequences with reference proteomes of archaeal and bacterial taxa to generate amino acid compositions of community reference proteomes. Calculates chemical metrics including carbon oxidation state ('Zc'), stoichiometric oxidation and hydration state ('nO2' and 'nH2O'), H/C, N/C, O/C, and S/C ratios, grand average of hydropathicity ('GRAVY'), isoelectric point ('pI'), protein length, and average molecular weight of amino acid residues. Uses precomputed reference proteomes for archaea and bacteria derived from the Genome Taxonomy Database ('GTDB'). Also includes reference proteomes derived from the NCBI Reference Sequence ('RefSeq') database and manual mapping from the RDP Classifier training set to RefSeq taxonomy, as described by Dick and Tan (2023) <doi:10.1007/s00248-022-01988-9>. Processes taxonomic classifications in RDP Classifier format or OTU tables in phyloseq-class objects from the Bioconductor package 'phyloseq'.
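One listed metric in miniature: GRAVY as the mean Kyte-Doolittle hydropathy over residues (a base-R illustration, not the package's interface):

    kd <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5, Q = -3.5,
            E = -3.5, G = -0.4, H = -3.2, I = 4.5, L = 3.8, K = -3.9,
            M = 1.9, F = 2.8, P = -1.6, S = -0.8, T = -0.7, W = -0.9,
            Y = -1.3, V = 4.2)
    gravy <- function(seq) mean(kd[strsplit(seq, "")[[1]]])
    gravy("MKVL")  # toy peptide: (1.9 - 3.9 + 4.2 + 3.8) / 4 = 1.5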
This package provides methods for simultaneous clustering and dimensionality reduction, such as Double k-means, Reduced k-means, Factorial k-means, and Clustering with Disjoint PCA, as well as methods exclusively for dimensionality reduction: Disjoint PCA and Disjoint FA. The statistical methods implemented refer to the following articles: de Soete G., Carroll J. (1994) "K-means clustering in a low-dimensional Euclidean space" <doi:10.1007/978-3-642-51175-2_24>; Vichi M. (2001) "Double k-means Clustering for Simultaneous Classification of Objects and Variables" <doi:10.1007/978-3-642-59471-7_6>; Vichi M., Kiers H.A.L. (2001) "Factorial k-means analysis for two-way data" <doi:10.1016/S0167-9473(00)00064-5>; Vichi M., Saporta G. (2009) "Clustering and disjoint principal component analysis" <doi:10.1016/j.csda.2008.05.028>; Vichi M. (2017) "Disjoint factor analysis with cross-loadings" <doi:10.1007/s11634-016-0263-9>.
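The reduced k-means idea in a compact alternating-least-squares sketch: cluster in a low-dimensional subspace, then update the loadings by orthogonal Procrustes (an illustration of the principle, not this package's code):

    set.seed(1)
    X <- scale(as.matrix(iris[, 1:4]), scale = FALSE)
    A <- svd(X)$v[, 1:2]                    # initialize loadings via PCA
    for (it in 1:15) {
      km <- kmeans(X %*% A, centers = 3, nstart = 10)  # cluster in subspace
      Y  <- km$centers[km$cluster, ]        # low-dimensional fitted points
      s  <- svd(crossprod(X, Y))            # orthogonal Procrustes update
      A  <- s$u %*% t(s$v)
    }
    table(km$cluster, iris$Species)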
Runs ecological niche models over all combinations of user-defined settings (i.e., tuning), performs cross validation to evaluate models, and returns data tables to aid in selection of optimal model settings that balance goodness-of-fit and model complexity. Also has functions to partition data spatially (or not) for cross validation, to plot multiple visualizations of results, to run null models to estimate significance and effect sizes of performance metrics, and to calculate range overlap between model predictions, among others. The package was originally built for Maxent models (Phillips et al. 2006, Phillips et al. 2017), but the current version allows possible extensions for any modeling algorithm. The extensive vignette, which guides users through most package functionality but unfortunately has a file size too big for CRAN, can be found on the package's GitHub Pages website: <https://jamiemkass.github.io/ENMeval/articles/ENMeval-2.0-vignette.html>.
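A hedged sketch around the package's main entry point, ENMevaluate(); occs, envs and bg are assumed to be prepared occurrence coordinates, predictor rasters and background points.

    library(ENMeval)
    res <- ENMevaluate(occs = occs, envs = envs, bg = bg,
                       algorithm = "maxnet", partitions = "block",
                       tune.args = list(fc = c("L", "LQ"), rm = 1:3))
    res@results  # evaluation metrics across the tuning settings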
Easily explore data by plotting graphs with a few lines of code. Use these ggplot() wrappers to quickly draw graphs of scatter/dots with box-whiskers, violins or SD error bars, data distributions, before-after graphs, factorial ANOVA and more. Customise graphs in many ways, for example, by choosing from colour blind-friendly palettes (12 discrete, 3 continuous and 2 divergent palettes). Use the simple code for ANOVA as ordinary (lm()) or mixed-effects linear models (lmer()), including randomised-block or repeated-measures designs, and fit non-linear outcomes as generalised additive models (gam) using mgcv. Obtain estimated marginal means and perform post-hoc comparisons on fitted models (via emmeans()). Also includes small datasets for practising code and teaching basics before users move on to more complex designs. See vignettes for details on usage <https://grafify.shenoylab.com/>. Citation: <doi:10.5281/zenodo.5136508>.
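A hedged sketch; these helper names follow grafify's documented naming style, but the exact signatures are assumptions.

    library(grafify)
    plot_scatterbox(iris, Species, Sepal.Length)      # dots with box-whiskers
    mod <- simple_anova(iris, Sepal.Length, Species)  # assumed ANOVA helper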
This package provides functions to compute p-values based on permutation tests. Regression, ANOVA and ANCOVA, omnibus F-tests, and marginal unilateral and bilateral t-tests are available. Several methods to handle nuisance variables are implemented (Kherad-Pajouh, S., & Renaud, O. (2010) <doi:10.1016/j.csda.2010.02.015>; Kherad-Pajouh, S., & Renaud, O. (2014) <doi:10.1007/s00362-014-0617-3>; Winkler, A. M., Ridgway, G. R., Webster, M. A., Smith, S. M., & Nichols, T. E. (2014) <doi:10.1016/j.neuroimage.2014.01.060>). An extension for the comparison of signals recorded under different experimental conditions (e.g. EEG/ERP signals) is provided. Several corrections for multiple testing are possible, including the cluster-mass statistic (Maris, E., & Oostenveld, R. (2007) <doi:10.1016/j.jneumeth.2007.03.024>) and the threshold-free cluster enhancement (Smith, S. M., & Nichols, T. E. (2009) <doi:10.1016/j.neuroimage.2008.03.061>).
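A brief sketch with aovperm(); the np argument (number of permutations) follows the package's style but is treated as an assumption here.

    library(permuco)
    set.seed(1)
    d <- data.frame(y = rnorm(60), g = gl(3, 20), z = rnorm(60))
    mod <- aovperm(y ~ g + z, data = d, np = 2000)  # permutation ANCOVA
    mod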