Probabilistic record linkage without direct identifiers using only diagnosis codes. The method is detailed in: Hejblum, Weber, Liao, Palmer, Churchill, Szolovits, Murphy, Kohane & Cai (2019) <doi:10.1038/sdata.2018.298>; Zhang, Hejblum, Weber, Palmer, Churchill, Szolovits, Murphy, Liao, Kohane & Cai (2021) <doi:10.1093/jamia/ocab187>.
Computes the degrees of freedom of the lasso, elastic net, generalized elastic net and adaptive lasso based on the generalized path seeking algorithm. The optimal model can be selected by model selection criteria including Mallows Cp, bias-corrected AIC (AICc), generalized cross validation (GCV) and BIC.
Generates mid-upper arm circumference (MUAC) and body mass index (BMI) for-age z-scores and percentiles based on the LMS method for children and adolescents up to 19 years, which can be used to assess nutritional and health status and to define the risk of adverse health events.
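As an illustration of the LMS method mentioned above, the following Python sketch computes a z-score and percentile from a measurement and a set of LMS parameters. It is not the package's API: the function names `lms_zscore` and `zscore_to_percentile` and the parameter values in the usage line are hypothetical, and the sketch omits any adjustment for extreme values that a particular reference standard may specify.

```python
import math
from statistics import NormalDist


def lms_zscore(x, L, M, S):
    """Z-score of measurement x given LMS parameters.

    L is the Box-Cox power, M the median, S the coefficient of variation
    for the child's age and sex (taken from a reference table).
    """
    if x <= 0 or M <= 0 or S <= 0:
        raise ValueError("x, M and S must be positive")
    if abs(L) < 1e-12:
        # Limiting case of the Box-Cox transform as L approaches 0.
        return math.log(x / M) / S
    return ((x / M) ** L - 1.0) / (L * S)


def zscore_to_percentile(z):
    """Percentile (0 to 100) under the standard normal distribution."""
    return 100.0 * NormalDist().cdf(z)


# Hypothetical LMS values, for illustration only (not from a real reference table).
z = lms_zscore(x=18.0, L=-1.5, M=16.0, S=0.1)
print(round(z, 2), round(zscore_to_percentile(z), 1))
```

A measurement equal to the median M always yields a z-score of 0 and therefore the 50th percentile, which is a quick sanity check for any reference table lookup.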
Utilities for unambiguous, neat and legible representation of data (dates, time stamps, numbers, percentages and strings) for presentation of analysis, aiming for elegance and consistency. The purpose of this package is to format data so that it is better suited for presentation and for automation jobs that report numbers.
This package provides functions for nominal data mining based on bipartite graphs, which build a pipeline for analysis and missing value imputation. Methods are mainly from the paper Jafari, Mohieddin, et al. (2021) <doi:10.1101/2021.03.18.436040>; some new ones are also included.
Partial Least Squares Path Modeling (PLS-PM), Tenenhaus, Esposito Vinzi, Chatelin, Lauro (2005) <doi:10.1016/j.csda.2004.03.005>, analysis for both metric and non-metric data, as well as REBUS analysis, Esposito Vinzi, Trinchera, Squillacciotti, and Tenenhaus (2008) <doi:10.1002/asmb.728>.
The Prais-Winsten estimator (Prais & Winsten, 1954) takes into account AR(1) serial correlation of the errors in a linear regression model. The procedure recursively estimates the coefficients and the error autocorrelation of the specified model until sufficient convergence of the AR(1) coefficient is attained.
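The iteration described above can be sketched in Python for the simplest case of one regressor plus an intercept. This is a minimal illustration under stated assumptions, not the package's implementation: the function names, the convergence tolerance, and the choice of estimating the AR(1) coefficient by regressing residuals on their lag are all assumptions made here.

```python
import math


def ols_two_columns(c, x, y):
    """Least squares for y ~ b0*c + b1*x (no extra intercept), via 2x2 normal equations."""
    scc = sum(ci * ci for ci in c)
    scx = sum(ci * xi for ci, xi in zip(c, x))
    sxx = sum(xi * xi for xi in x)
    scy = sum(ci * yi for ci, yi in zip(c, y))
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = scc * sxx - scx * scx
    return (scy * sxx - scx * sxy) / det, (scc * sxy - scx * scy) / det


def estimate_rho(x, y, b0, b1):
    """AR(1) coefficient from regressing residuals on their first lag."""
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    return num / den if den > 0 else 0.0


def prais_winsten(x, y, tol=1e-8, max_iter=100):
    """Iterate between coefficient and rho estimates until rho stabilizes."""
    n = len(y)
    b0, b1 = ols_two_columns([1.0] * n, x, y)  # start from plain OLS
    rho = 0.0
    for _ in range(max_iter):
        rho_new = estimate_rho(x, y, b0, b1)
        if abs(rho_new) >= 1.0:
            raise ValueError("estimated rho is outside (-1, 1)")
        # Prais-Winsten transform: the first observation is rescaled and kept,
        # the remaining observations are quasi-differenced.
        w = math.sqrt(1.0 - rho_new ** 2)
        c_star = [w] + [1.0 - rho_new] * (n - 1)
        x_star = [w * x[0]] + [x[t] - rho_new * x[t - 1] for t in range(1, n)]
        y_star = [w * y[0]] + [y[t] - rho_new * y[t - 1] for t in range(1, n)]
        b0, b1 = ols_two_columns(c_star, x_star, y_star)
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return b0, b1, rho
```

Keeping the rescaled first observation is what distinguishes this transform from the Cochrane-Orcutt procedure, which drops it.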
This package implements the pcgen algorithm, which is a modified version of the standard pc-algorithm, with specific conditional independence tests and modified orientation rules. pcgen extends the approach of Valente et al. (2010) <doi:10.1534/genetics.109.112979> with reconstruction of direct genetic effects.
This package contains functions to identify tree-ring borders based on X-ray micro-density profiles and a Graphical User Interface (GUI) to visualize density profiles and correct tree-ring borders. Campelo F, Mayer K, Grabner M. (2019) <doi:10.1016/j.dendro.2018.11.002>.
Rogue ("wildcard") taxa are leaves with uncertain phylogenetic position. Their position may vary from tree to tree under inference methods that yield a tree set (e.g. bootstrapping, Bayesian tree searches, maximum parsimony). The presence of rogue taxa in a tree set can potentially remove all information from a consensus tree. The information content of a consensus tree - a function of its resolution and branch support values - can often be increased by removing rogue taxa. Rogue provides an explicitly information-theoretic approach to rogue detection (Smith 2022) <doi:10.1093/sysbio/syab099>, and an interface to RogueNaRok (Aberer et al. 2013) <doi:10.1093/sysbio/sys078>.
This package provides an intuitive and user-friendly interface for working with emojis in 'R'. It allows users to search, insert, and manage emojis by keyword, category, or through an interactive 'shiny'-based drop-down. The package enables integration of emojis into R scripts, 'R Markdown', 'Quarto', shiny apps, and ggplot2 plots. Also includes built-in mappings for commit messages, useful for version control. It builds on established emoji libraries and Unicode standards, adding expressiveness and visual cues to documentation, user interfaces, and reports. For more details see Emojipedia (2024) <https://emojipedia.org> and GitHub Emoji Cheat Sheet <https://github.com/ikatyang/emoji-cheat-sheet/tree/master>.
This package provides an interface to simulate metabolic reconstruction from the BiGG database and other metabolic reconstruction databases. The package facilitates flux balance analysis (FBA) and the sampling of feasible flux distributions. Metabolic networks and estimated fluxes can be visualized with hypergraphs.
SeqGL is a group lasso-based algorithm to extract transcription factor sequence signals from ChIP, DNase and ATAC-seq profiles. This package presents a method which uses group lasso to discriminate between bound and non-bound genomic regions to accurately identify transcription factors bound at the specific regions.
Pando leverages multi-modal single-cell measurements to infer gene regulatory networks using a flexible linear model-based framework. By modeling the relationship between TF-binding site pairs with the expression of target genes, Pando simultaneously infers gene modules and sets of regulatory regions for each transcription factor.
This package utilizes a Semi-parametric Differential Abundance/expression analysis (SDA) method for metabolomics and proteomics data from mass spectrometry as well as single-cell RNA sequencing data. SDA is able to robustly handle non-normally distributed data and provides a clear quantification of the effect size.
Fetches monthly financial tables and banking sector data published on the official website of the Banking Regulation and Supervision Agency of Turkey and also enables you to save them as an Excel file. It is an R implementation of the Python package <https://pypi.org/project/bddkdata/>.
Makes difficult operations easy. Includes these types of functions: shorthand, type conversion, data wrangling, and workflow. Also includes some helpful data objects: NA strings, U.S. state list, color-blind charting colors. Built and shared by Oliver Wyman Actuarial Consulting. Accepting proposed contributions through GitHub.
This package provides algorithms to fit linear regression models under several popular penalization techniques and functional linear regression models based on Majorizing-Minimizing (MM) and Alternating Direction Method of Multipliers (ADMM) techniques. See Boyd et al. (2010) <doi:10.1561/2200000016> for a complete introduction to the method.
Read and process a large delimited file block by block. A block consists of all the contiguous rows that have the same value in the first field. The result can be returned as a list or a data.table, or even directly printed to an output file.
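The block definition above (contiguous rows sharing the same first field) can be illustrated with a short Python sketch. It is not the package's interface: the function name `iter_blocks` and the sample data are hypothetical. Rows are streamed, so only one block is held in memory at a time.

```python
import csv
import io
from itertools import groupby


def iter_blocks(lines, delimiter=","):
    """Yield (key, rows) for each run of contiguous rows with the same first field."""
    reader = csv.reader(lines, delimiter=delimiter)
    non_empty = (row for row in reader if row)  # skip blank lines
    for key, rows in groupby(non_empty, key=lambda row: row[0]):
        yield key, list(rows)


# Hypothetical sample data; a real use would pass an open file handle instead.
sample = io.StringIO("a,1\na,2\nb,3\na,4\n")
for key, rows in iter_blocks(sample):
    print(key, len(rows))
```

Because grouping is by contiguity rather than by value, a key that reappears after a different key starts a new block; a file sorted on its first field therefore produces exactly one block per key.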
Contains a function designed for the joint segmentation in the mean of several correlated series. The method is described in the paper X. Collilieux, E. Lebarbier and S. Robin. A factor model approach for the joint segmentation with between-series correlation (2015) <arXiv:1505.05660>.
An implementation of the initial guided analytics for parameter testing and controlband extraction framework. Functions are available for continuous and categorical target variables as well as for generating standardized reports of the conducted analysis. See <https://github.com/stefan-stein/igate> for more information on the technology.
Back-end connections to LattE (<https://www.math.ucdavis.edu/~latte/>) for counting lattice points and integration inside convex polytopes and 4ti2 (<http://www.4ti2.de/>) for algebraic, geometric, and combinatorial problems on linear spaces and front-end tools facilitating their use in the R ecosystem.
Use a glmmkin class object (GMMAT package) from the null model to perform generalized linear mixed model-based single-variant and variant set main effect tests, gene-environment interaction tests, and joint tests for association, as proposed in Wang et al. (2020) <doi:10.1002/gepi.22351>.
Penalized orthogonal-components regression (POCRE) is a supervised dimension reduction method for high-dimensional data. It sequentially constructs orthogonal components (with selected features) which are maximally correlated to the response residuals. POCRE can also construct common components for multiple responses and thus build up latent-variable models.