This R package provides power calculations via internal simulation methods. The package also provides a frontend to the now abandoned PBAT program (developed by Christoph Lange), and reads in the corresponding output and displays results and figures when appropriate. The license of this R package itself is GPL. However, to have the program interact with the PBAT program for some functionality of the R package, users must additionally obtain the PBAT program from Christoph Lange, and accept his license. Both the data analysis and power calculations have command line and graphical interfaces using tcltk.
Fits penalized linear mixed models that correct for unobserved confounding factors. plmmr infers and corrects for the presence of unobserved confounding effects such as population stratification and environmental heterogeneity. It then fits a linear model via penalized maximum likelihood. Originally designed for the multivariate analysis of single nucleotide polymorphisms (SNPs) measured in a genome-wide association study (GWAS), plmmr eliminates the need for subpopulation-specific analyses and post-analysis p-value adjustments. Functions for the appropriate processing of PLINK files are also supplied. For examples, see the package homepage <https://pbreheny.github.io/plmmr/>.
This package provides functions for causal structure learning and causal inference using graphical models. The main algorithms for causal structure learning are PC (for observational data without hidden variables), FCI and RFCI (for observational data with hidden variables), and GIES (for a mix of data from observational studies (i.e. observational data) and data from experiments involving interventions (i.e. interventional data) without hidden variables). For causal inference the IDA algorithm, the Generalized Backdoor Criterion (GBC), the Generalized Adjustment Criterion (GAC) and some related functions are implemented. Functions for incorporating background knowledge are provided.
The SALSO algorithm is an efficient randomized greedy search method to find a point estimate for a random partition based on a loss function and posterior Monte Carlo samples. The algorithm is implemented for many loss functions, including the Binder loss and a generalization of the variation of information loss, both of which allow for unequal weights on the two types of clustering mistakes. Efficient implementations are also provided for Monte Carlo estimation of the posterior expected loss of a given clustering estimate. See Dahl, Johnson, Müller (2022) <doi:10.1080/10618600.2022.2069779>.
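The posterior expected loss mentioned above can be illustrated with a minimal Python sketch. It assumes an unnormalized, pairwise form of the Binder loss and estimates its expectation by averaging over posterior samples. The function names and the weights `w_merge` and `w_split` are hypothetical and are not the package's API, and the package may scale or parameterize the loss differently.

```python
from itertools import combinations


def binder_loss(estimate, truth, w_merge=1.0, w_split=1.0):
    """Weighted count of item pairs on which two clusterings disagree.

    w_merge: cost of putting a pair together when `truth` separates it.
    w_split: cost of separating a pair that `truth` puts together.
    Both weights are illustrative names, not taken from the package.
    """
    loss = 0.0
    for i, j in combinations(range(len(estimate)), 2):
        together_est = estimate[i] == estimate[j]
        together_truth = truth[i] == truth[j]
        if together_est and not together_truth:
            loss += w_merge
        elif together_truth and not together_est:
            loss += w_split
    return loss


def expected_binder_loss(estimate, samples, w_merge=1.0, w_split=1.0):
    """Monte Carlo estimate: average loss of `estimate` over posterior samples."""
    total = sum(binder_loss(estimate, s, w_merge, w_split) for s in samples)
    return total / len(samples)


# Three posterior draws of a partition of three items (cluster labels per item).
posterior_samples = [[0, 0, 1], [0, 0, 0], [0, 1, 1]]
print(expected_binder_loss([0, 0, 1], posterior_samples))
```

A search method such as SALSO would evaluate a quantity of this kind for many candidate partitions. The sketch only scores a single candidate.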
This package provides a collection of functions for Kronecker structured covariance estimation and testing under the array normal model. For estimation, maximum likelihood and Bayesian equivariant estimation procedures are implemented. For testing, a likelihood ratio testing procedure is available. This package also contains additional functions for manipulating and decomposing tensor data sets. This work was partially supported by NSF grant DMS-1505136. Details of the methods are described in Gerard and Hoff (2015) <doi:10.1016/j.jmva.2015.01.020> and Gerard and Hoff (2016) <doi:10.1016/j.laa.2016.04.033>.
This package provides a tool that allows users to estimate tree height in the long-term forest experiments in Sweden. It utilizes the multilevel nonlinear mixed-effect height models developed for the forest experiments and consists of four functions for the main species, other conifer species, and other broadleaves. Each function within the system returns a data frame that includes the input data and the estimated heights for any missing values. See Ogana et al. (2023) <doi:10.1016/j.foreco.2023.120843> and Arias-Rodil et al. (2015) <doi:10.1371/JOURNAL.PONE.0143521>.
Cross-Species Investigation and Analysis (CoSIA) is a package that provides researchers with an alternative methodology for comparing across species and tissues using normal wild-type RNA-Seq Gene Expression data from Bgee. Using RNA-Seq Gene Expression data, CoSIA provides multiple visualization tools to explore the transcriptome diversity and variation across genes, tissues, and species. CoSIA uses the Coefficient of Variation, Shannon Entropy, and Specificity to calculate transcriptome diversity and variation. CoSIA also provides additional conversion tools and utilities to provide a streamlined methodology for cross-species comparison.
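As a rough illustration of two of the metrics named above, the following Python sketch computes a coefficient of variation and a Shannon entropy for one gene's expression values across tissues. The function names, the use of the population standard deviation, and the base-2 logarithm are assumptions made for the example. CoSIA's own definitions may differ.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form, an assumption)."""
    return statistics.pstdev(values) / statistics.mean(values)


def shannon_entropy(values):
    """Entropy in bits of the values after normalizing them to proportions."""
    total = sum(values)
    proportions = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in proportions)


# Hypothetical expression values of one gene in four tissues.
uniform_expression = [5.0, 5.0, 5.0, 5.0]
tissue_specific_expression = [10.0, 0.0, 0.0, 0.0]

print(coefficient_of_variation(uniform_expression), shannon_entropy(uniform_expression))
print(coefficient_of_variation(tissue_specific_expression), shannon_entropy(tissue_specific_expression))
```

Evenly expressed genes give low variation and high entropy, while a gene expressed in a single tissue gives high variation and zero entropy.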
mbQTL is a statistical R package for simultaneous 16S rRNA, 16S rDNA (microbial) and variant, SNP, SNV (host) relationship, correlation, and regression studies. We apply linear, logistic and correlation-based statistics to identify the relationships of taxa, genus, species and variant, SNP, SNV in the infected host. We produce various statistical significance measures such as P values, FDR, BC and probability estimation to show the significance of these relationships. Further, we provide various visualization functions for ease and clarification of the results of these analyses. The package is compatible with dataframe, MRexperiment and text formats.
The Biomarker Optimal Segmentation System R package, bossR, is designed for precision medicine, helping to identify individual traits using biomarkers. It focuses on determining the most effective cutoff value for a continuous biomarker, which is crucial for categorizing patients into two groups with distinctly different clinical outcomes. The package simultaneously finds the optimal cutoff from given candidate values and tests its significance. Simulation studies demonstrate that bossR offers statistical power and false positive control non-inferior to the permutation approach (considered the gold standard in this field), while being hundreds of times faster.
Inference functionalities for distributed-lag linear structural equation models (DLSEMs). DLSEMs are Markovian structural causal models where each factor of the joint probability distribution is a distributed-lag linear regression with constrained lag shapes (Magrini, 2018 <doi:10.2478/bile-2018-0012>; Magrini et al., 2019 <doi:10.1007/s11135-019-00855-z>). DLSEMs account for temporal delays in the dependence relationships among the variables through a single parameter per covariate, thus making dynamic causal inference feasible. Endpoint-constrained quadratic, quadratic decreasing, linearly decreasing and gamma lag shapes are available.
Simultaneously detect the number and locations of change points in piecewise linear models under stationary Gaussian noise, allowing for autocorrelated random noise. The core idea is to transform the problem of detecting change points into the detection of local extrema (local maxima and local minima) through kernel smoothing and differentiation of the data sequence, see Cheng et al. (2020) <doi:10.1214/20-EJS1751>. A fast algorithm with low computational cost, called dSTEM, is introduced to detect change points based on the STEM algorithm in D. Cheng and A. Schwartzman (2017) <doi:10.1214/16-AOS1458>.
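The core idea (smooth, differentiate, then look for local extrema) can be sketched in Python for the simplest case of a single jump in a noise-free sequence. This is not the dSTEM algorithm: the function names, the Gaussian kernel, the bandwidth, and the fixed threshold that stands in for a significance test are all assumptions made for illustration.

```python
import math


def gaussian_smooth(y, bandwidth):
    """Kernel-smooth a sequence, renormalizing the kernel near the boundaries."""
    radius = int(4 * bandwidth)
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / bandwidth) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(y)):
        num = 0.0
        den = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(y):
                num += w * y[j]
                den += w
        smoothed.append(num / den)
    return smoothed


def candidate_change_points(y, bandwidth=3.0, threshold=0.05):
    """Indices i where the first difference of the smoothed data has a local extremum.

    An index i means the change lies between y[i] and y[i + 1]. The threshold
    is a crude placeholder for a proper significance test.
    """
    s = gaussian_smooth(y, bandwidth)
    d = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    found = []
    for i in range(1, len(d) - 1):
        is_max = d[i] > d[i - 1] and d[i] > d[i + 1]
        is_min = d[i] < d[i - 1] and d[i] < d[i + 1]
        if (is_max or is_min) and abs(d[i]) > threshold:
            found.append(i)
    return found


# Piecewise constant signal with one upward jump between index 49 and 50.
signal = [0.0] * 50 + [1.0] * 50
print(candidate_change_points(signal))
```

With noisy data the threshold would need to reflect the noise level, and a change in slope rather than a jump would show up as an extremum of the second derivative instead.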
With the functions in this package you can check the validity of the following financial instrument identifiers: FIGI (Financial Instrument Global Identifier <https://www.openfigi.com/about/figi>), CUSIP (Committee on Uniform Security Identification Procedures <https://www.cusip.com/identifiers.html#/CUSIP>), ISIN (International Securities Identification Number <https://www.cusip.com/identifiers.html#/ISIN>), SEDOL (Stock Exchange Daily Official List <https://www2.lseg.com/SEDOL-masterfile-service-tech-guide-v8.6>). You can also calculate the FIGI checksum of 11-character strings, which can be useful if you want to create your own FIGI identifiers.
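To illustrate the kind of check involved, here is a minimal Python sketch of ISIN validation. It expands letters to numbers (A=10 through Z=35) and applies the Luhn algorithm to the resulting digit string. The function name `isin_is_valid` is hypothetical and is not the package's API, and the sketch does not verify that the country prefix is a real country code.

```python
def isin_is_valid(isin: str) -> bool:
    """Check the format and Luhn check digit of a 12-character ISIN."""
    if len(isin) != 12 or not isin.isascii() or not isin.isalnum():
        return False
    isin = isin.upper()
    if not isin[:2].isalpha() or not isin[-1].isdigit():
        return False
    # Expand each character to its base-36 value: '0'-'9' stay, 'A' -> 10, ..., 'Z' -> 35.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    # Luhn: starting from the right, double every second digit.
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


print(isin_is_valid("US0378331005"))  # a well-formed ISIN
print(isin_is_valid("US0378331006"))  # same body, wrong check digit
```

CUSIP, SEDOL and FIGI use different weighting schemes, so each identifier type needs its own check routine.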
Fit penalized multivariable linear mixed models with a single random effect to control for population structure in genetic association studies. The goal is to fit many genetic variants simultaneously, in order to select markers that are independently associated with the response. Can also handle prior annotation information, for example, rare variants, in the form of variable weights. For more information, see the website below and the accompanying paper: Bhatnagar et al., "Simultaneous SNP selection and adjustment for population structure in high dimensional prediction models", 2020, <DOI:10.1371/journal.pgen.1008766>.
This is the central location for data and tools for the development, maintenance, analysis, and deployment of the International Soil Radiocarbon Database (ISRaD). ISRaD was developed as a collaboration between the U.S. Geological Survey Powell Center and the Max Planck Institute for Biogeochemistry. This R package provides tools for accessing and manipulating ISRaD data, compiling local data using the ISRaD data structure, and simple query and reporting functions for ISRaD. For more detailed information visit the ISRaD website at: <https://soilradiocarbon.org/>.
Reproduces the harmonized DB of the ESTAT survey of the same name. The survey data is served as separate spreadsheets with noticeable differences in the collected attributes. The tool presented here carries out a series of instructions that harmonize the attributes in terms of name, meaning, and occurrence, while also introducing a series of new variables, instrumental to adding value to the product. Outputs include one harmonized table with all the years, and three separate geometries, corresponding to the theoretical point, the GPS location where the measurement was made, and the 250 m east-facing transect.
Assists biological researchers in assembling taxonomically and marker-focused molecular sequence data sets. MACER accepts a list of genera as a user input and uses NCBI-GenBank and BOLD as resources to download and assemble molecular sequence datasets. These datasets are then assembled by marker, aligned, trimmed, and cleaned. The use of this package allows the publication of specific parameters to ensure reproducibility. The MACER package has four core functions, and an example run-through using all of these functions can be found in the associated repository <https://github.com/rgyoung6/MACER_example>.
An implementation of the nodiv algorithm, see Borregaard, M.K., Rahbek, C., Fjeldsaa, J., Parra, J.L., Whittaker, R.J. & Graham, C.H. 2014. Node-based analysis of species distributions. Methods in Ecology and Evolution 5(11): 1225-1235. <DOI:10.1111/2041-210X.12283>. Package for phylogenetic analysis of species distributions. The main function goes through each node in the phylogeny, compares the distributions of the two descendant nodes, and compares the result to a null model. This highlights nodes where major distributional divergence has occurred. The distributional divergence for these nodes is mapped.
This package provides an implementation of a rare variant association test that utilizes protein tertiary structure to increase signal and to identify likely causal variants. Performs structure-guided collapsing, which leads to local tests that borrow information from neighboring variants on a protein and that provide association information on a variant-specific level. For details of the implemented method see West, R. M., Lu, W., Rotroff, D. M., Kuenemann, M., Chang, S-M., Wagner, M. J., Buse, J. B., Motsinger-Reif, A., Fourches, D., and Tzeng, J-Y. (2019) <doi:10.1371/journal.pcbi.1006722>.
An algorithm for nonlinear global optimization based on the variable neighbourhood trust region search (VNTRS) algorithm proposed by Bierlaire et al. (2009) "A Heuristic for Nonlinear Global Optimization" <doi:10.1287/ijoc.1090.0343>. The algorithm combines variable neighbourhood exploration with a trust-region framework to efficiently search the solution space. It can terminate a local search early if the iterates are converging toward a previously visited local optimum or if further improvement within the current region is unlikely. In addition to global optimization, the algorithm can also be applied to identify multiple local optima.
The sqldf function is typically passed a single argument which is an SQL select statement where the table names are ordinary R data frame names. sqldf transparently sets up a database, imports the data frames into that database, performs the SQL statement and returns the result using a heuristic to determine which class to assign to each column of the returned data frame. The sqldf or read.csv.sql functions can also be used to read filtered files into R even if the original files are larger than R itself can handle.
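The workflow described above (set up a database, import the tabular data, run the SQL statement, return the result) can be sketched in Python with an in-memory SQLite database. This is a language-neutral illustration rather than sqldf itself. The helper `query_records` and its parameters are hypothetical, and it does not reproduce sqldf's heuristic for assigning column classes.

```python
import sqlite3


def query_records(table_name, columns, rows, sql):
    """Load rows into a temporary in-memory table, run `sql`, return dict rows."""
    conn = sqlite3.connect(":memory:")
    try:
        column_list = ", ".join(f'"{c}"' for c in columns)
        conn.execute(f'CREATE TABLE "{table_name}" ({column_list})')
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows
        )
        cursor = conn.execute(sql)
        names = [description[0] for description in cursor.description]
        return [dict(zip(names, row)) for row in cursor.fetchall()]
    finally:
        # The database is discarded once the query result has been collected.
        conn.close()


scores = [("a", 1), ("b", 5), ("c", 9)]
result = query_records(
    "scores",
    ["name", "score"],
    scores,
    "SELECT name, score FROM scores WHERE score > 3 ORDER BY score",
)
print(result)
```

The caller only supplies data and a select statement. The database is created and thrown away behind the scenes, which is the sense in which the setup is transparent.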
Combines a generalized linear model with an additional tree part on the same scale. A four-step procedure is proposed to fit the model and test the joint effect of the selected tree part while adjusting for confounding factors. An ensemble procedure based on bagging is also proposed to improve prediction accuracy, and several importance scores are computed for variable selection. See Cyprien Mbogning et al. (2014) <doi:10.1186/2043-9113-4-6> and Cyprien Mbogning et al. (2015) <doi:10.1159/000380850> for an overview of all the methods implemented in this package.
This package provides a utility to quickly obtain clean and tidy men's basketball play-by-play data. Provides functions to access live play-by-play and box score data from ESPN <https://www.espn.com> with shot locations when available. It is also a full NBA Stats API <https://www.nba.com/stats/> wrapper and a scraping and aggregating interface for Ken Pomeroy's men's college basketball statistics website <https://kenpom.com>. It provides users with an active subscription the capability to scrape the website tables and analyze the data for themselves.
Combines Latent Dirichlet Allocation (LDA) and Bayesian multinomial time series methods in a two-stage analysis to quantify dynamics in high-dimensional temporal data. LDA decomposes multivariate data into lower-dimension latent groupings, whose relative proportions are modeled using generalized Bayesian time series models that include abrupt changepoints and smooth dynamics. The methods are described in Blei et al. (2003) <doi:10.1162/jmlr.2003.3.4-5.993>, Western and Kleykamp (2004) <doi:10.1093/pan/mph023>, Venables and Ripley (2002, ISBN-13:978-0387954578), and Christensen et al. (2018) <doi:10.1002/ecy.2373>.
Create and manipulate numeric list ('nlist') objects. An nlist is an S3 list of uniquely named numeric objects. A numeric object is an integer or double vector, matrix or array. An nlists object is an S3 class list of nlist objects with the same names, dimensionalities and typeofs. Numeric list objects are of interest because they are the raw data inputs for analytic engines such as JAGS, STAN and TMB. Numeric list objects, which are useful for storing multiple realizations of simulated data sets, can be converted to coda::mcmc and coda::mcmc.list objects.