Models categorical time series through a Markov chain when (a) covariates are predictors for transitioning into the next state/symbol and (b) the dependence on past states has variable length. The probability of transitioning to the next state in the Markov chain is defined by a multinomial regression whose parameters depend on the past states of the chain; moreover, the number of past states needed to predict the next state also depends on the observed states themselves. See Zambom, Kim, and Garcia (2022) <doi:10.1111/jtsa.12615>.
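A minimal sketch of how a variable-length context can select multinomial-logit parameters, assuming a hypothetical `contexts` dictionary that maps past-state suffixes (most recent state last) to per-state intercepts and covariate coefficients. This is an illustration of the idea, not the package's estimator: the longest matching suffix of the history picks the parameter set, so the effective memory length depends on the observed states.

```python
import math

def find_context(history, contexts, max_depth):
    """Return the longest suffix of `history` (most recent state last) that is a
    key of `contexts`, looking back at most `max_depth` states."""
    for depth in range(min(max_depth, len(history)), -1, -1):
        suffix = tuple(history[len(history) - depth:])
        if suffix in contexts:
            return suffix
    raise KeyError("the empty context () must be present in `contexts`")

def transition_probs(history, covariates, contexts, states, max_depth):
    """Multinomial-logit probabilities of the next state given the selected context.
    States absent from a context's parameters act as the reference category (score 0)."""
    ctx = find_context(history, contexts, max_depth)
    params = contexts[ctx]
    scores = {}
    for s in states:
        intercept, coefs = params.get(s, (0.0, [0.0] * len(covariates)))
        scores[s] = intercept + sum(c * x for c, x in zip(coefs, covariates))
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {s: math.exp(v - top) for s, v in scores.items()}
    total = sum(weights.values())
    return ctx, {s: w / total for s, w in weights.items()}
```

With contexts `()`, `("b",)` and `("a", "b")`, the history `["b", "b"]` falls back to the one-step context `("b",)`, while `["a", "a", "b"]` uses the two-step context.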
Cross-Species Investigation and Analysis (CoSIA) is a package that provides researchers with an alternative methodology for comparing across species and tissues using normal wild-type RNA-Seq gene expression data from Bgee. Using these data, CoSIA provides multiple visualization tools to explore transcriptome diversity and variation across genes, tissues, and species. CoSIA uses the coefficient of variation, Shannon entropy, and specificity to quantify transcriptome diversity and variation. CoSIA also provides conversion tools and utilities that streamline cross-species comparison.
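A minimal sketch of two of the diversity measures named above, applied to one gene's expression values across tissues. It assumes the sample standard deviation for the coefficient of variation and base-2 logarithms for entropy; CoSIA's exact conventions (population vs sample standard deviation, log base, normalization) may differ, and the specificity measure is not shown.

```python
import math
from statistics import mean, stdev

def coefficient_of_variation(expression):
    """Sample standard deviation divided by the mean (needs >= 2 values and a nonzero mean)."""
    return stdev(expression) / mean(expression)

def shannon_entropy(expression):
    """Entropy in bits of an expression profile normalized to proportions."""
    total = sum(expression)
    proportions = [v / total for v in expression if v > 0]
    return -sum(p * math.log2(p) for p in proportions)
```

A gene expressed equally in four tissues has maximal entropy (2 bits) and zero variation, while a gene expressed in a single tissue has zero entropy.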
mbQTL is a statistical R package for simultaneously studying relationships (correlation and regression) between microbial data (16S rRNA, 16S rDNA) and host genetic variants (SNPs, SNVs). We apply linear, logistic, and correlation-based statistics to identify relationships between taxa (genus, species) and variants (SNPs, SNVs) in the infected host. We report various statistical significance measures, such as p-values, FDR, BC, and probability estimates, to show the significance of these relationships. We also provide visualization functions that ease interpretation of the results of these analyses. The package is compatible with data frame, MRexperiment, and text formats.
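Testing many taxon-variant pairs at once makes multiple-testing control central. As an illustration of one standard FDR procedure, here is a minimal Benjamini-Hochberg adjustment over a list of p-values; whether mbQTL computes its FDR this way is an assumption, not something the description states.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The running minimum keeps the adjusted values monotone in the original p-values, which is why two different raw p-values can share the same q-value.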
Facilitate the analysis of data related to aquatic ecology, specifically the establishment of carbon budgets. Currently, the package supports the following analyses: (i) calculation of greenhouse gas fluxes from trace gas analyzer data, using the method described in Lin et al. (2024); and (ii) calculation of dissolved oxygen (DO) metabolism from dissolved oxygen data logger data, using the method described in Staehr et al. (2010). Yong et al. (2024) <doi:10.5194/bg-21-5247-2024>. Staehr et al. (2010) <doi:10.4319/lom.2010.8.0628>.
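A minimal sketch of a closed-chamber gas flux calculation, assuming the common approach of fitting a straight line to concentration over time and converting with the ideal gas law. The function names are hypothetical, and the method of Lin et al. (2024) used by the package may add steps (for example, choosing the fitting window) that are not reproduced here.

```python
R = 8.314  # universal gas constant, J mol^-1 K^-1

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def chamber_flux(times_s, conc_ppm, volume_m3, area_m2, pressure_pa, temp_k):
    """Flux in umol m^-2 s^-1 from a linear fit of chamber concentration vs time."""
    slope = ols_slope(times_s, conc_ppm)        # umol mol^-1 s^-1
    molar_density = pressure_pa / (R * temp_k)  # mol of air per m^3 (ideal gas law)
    return slope * molar_density * volume_m3 / area_m2
```

Multiplying the mixing-ratio slope by the molar density of air and by the chamber's volume-to-area ratio gives the amount of gas exchanged per unit area per unit time.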
This package provides a highly scientific and utterly addictive bird point count simulator to test statistical assumptions, aid survey design, and have fun while doing it (Solymos 2024 <doi:10.1007/s42977-023-00183-2>). The simulations follow time-removal and distance sampling models based on Matsuoka et al. (2012) <doi:10.1525/auk.2012.11190>, Solymos et al. (2013) <doi:10.1111/2041-210X.12106>, and Solymos et al. (2018) <doi:10.1650/CONDOR-18-32.1>, and sound attenuation experiments by Yip et al. (2017) <doi:10.1650/CONDOR-16-93.1>.
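A toy version of a point count simulation, assuming, as in the removal/distance (QPAD-style) framework cited above, that a bird is counted if it gives at least one cue during the count (cue rate `phi`) and is perceived with half-normal probability exp(-(d/tau)^2) at distance d. All function names are hypothetical; the package's simulator models far more detail.

```python
import math
import random

def availability(phi, duration):
    """Time-removal model: probability of at least one cue within `duration`."""
    return 1.0 - math.exp(-phi * duration)

def perceptibility(tau, r_max):
    """Half-normal distance function averaged over a circle of radius `r_max`."""
    c = (r_max / tau) ** 2
    return (1.0 - math.exp(-c)) / c

def simulate_detections(n_birds, phi, tau, duration, r_max, seed=0):
    """Count detected birds when birds are placed uniformly in the survey circle."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_birds):
        d = r_max * math.sqrt(rng.random())          # uniform position in the circle
        available = rng.expovariate(phi) < duration  # first cue falls within the count
        perceived = rng.random() < math.exp(-(d / tau) ** 2)
        if available and perceived:
            detected += 1
    return detected
```

Because availability and perception are simulated independently here, the detected fraction should approach `availability(...) * perceptibility(...)`, which is the kind of assumption a simulator lets you stress-test.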
This package provides a small set of functions for managing R environments, with defaults designed to encourage usage patterns that scale well to larger code bases. It provides: import_from(), a flexible way to assign bindings that defaults to the current environment; include(), a vectorized alternative to base::source() that also defaults to the current environment; and attach_eval() and attach_source(), ways to evaluate expressions in attached environments. Together, these (and other) functions provide a robust alternative to base::library() and base::source().
This package provides tools to download datasets of German elections covering local, state, federal, mayoral, European Parliament, and county (Kreistag) elections, with federal county-level coverage from 1953 and other families extending through 2025. The package supplies turnout, vote shares, and derived indicators at the municipal and county level, including geographically harmonized datasets that account for changes in municipal boundaries over time and incorporate mail-in voting districts. Bundled data includes county-level INKAR covariates (1995-2022) and municipality-level Zensus 2022 indicators. Data is sourced from <https://github.com/awiedem/german_election_data>.
This package provides a convenient R interface to the Genotype-Tissue Expression (GTEx) Portal API. The GTEx project is a comprehensive public resource for studying tissue-specific gene expression and regulation in human tissues. Through systematic analysis of RNA sequencing data from 54 non-diseased tissue sites across nearly 1000 individuals, GTEx provides crucial insights into the relationship between genetic variation and gene expression. These data are accessible through the GTEx Portal API, which enables programmatic access to human gene expression data. For more information on the API, see <https://gtexportal.org/api/v2/redoc>.
Tree height is an important dendrometric variable and forms the basis of the vertical structure of a forest stand. This package helps fit and validate various nonlinear height-diameter models to assess the relationship between tree height and diameter at breast height for conifer trees. It implements the Naslund, Curtis, Michailoff, Meyer, Power, Michaelis-Menten, and Wykoff nonlinear models, using the algorithms of Huang et al. (1992) <doi:10.1139/x92-172> and Zeide et al. (1993) <doi:10.1093/forestscience/39.3.594>.
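As an illustration of one of the listed models, the Naslund form is commonly written H = 1.3 + D^2 / (a + bD)^2, with H in metres, D the diameter at breast height, and 1.3 m the breast height. The sketch below swaps in a simple linearization, D / sqrt(H - 1.3) = a + bD, fitted by ordinary least squares; the package presumably fits the nonlinear model directly, so treat this as a quick way to get starting values rather than its method.

```python
import math

BREAST_HEIGHT = 1.3  # metres

def naslund_height(d, a, b):
    """Naslund height-diameter model."""
    return BREAST_HEIGHT + d ** 2 / (a + b * d) ** 2

def fit_naslund_linearized(diameters, heights):
    """Estimate (a, b) by regressing D / sqrt(H - 1.3) on D (requires H > 1.3)."""
    xs = list(diameters)
    ys = [d / math.sqrt(h - BREAST_HEIGHT) for d, h in zip(diameters, heights)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b
```

Least squares on the transformed scale weights residuals differently from a fit on the height scale, which is why nonlinear fitting is usually preferred for final estimates.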
This package contains all the data and functions used to produce the book. We do not expect most people to use the package for any reason other than simple access to the JAGS model files and the data, and perhaps to run some of the simple examples. The authors of the book are David Lucy (now sadly deceased) and James Curran. It is anticipated that a manuscript will be provided to Taylor and Francis around February 2020, with bibliographic details to follow at that point. Until then, further information can be obtained by emailing James Curran.
Allows the simultaneous analysis of responses and response times in an Item Response Theory (IRT) modelling framework. Supports variable person speed functions (intercept, trend, quadratic) and covariates for item and person (random) parameters. Data missing by design can be specified. Parameter estimation is done with an MCMC algorithm. LNIRT replaces the package CIRT, which was written by Rinke Klein Entink. For reference, see the paper by Fox, Klein Entink and Van der Linden (2007), "Modeling of Responses and Response Times with the Package cirt", Journal of Statistical Software, <doi:10.18637/jss.v020.i07>.
Sentiment analysis is a popular text mining technique that attempts to determine the emotional state expressed in a text. We provide a new implementation of a common method for computing sentiment, in which words are scored as positive or negative according to a dictionary lookup and the sum of those scores is returned for each document. We use the Hu and Liu sentiment dictionary (Hu and Liu, 2004) <doi:10.1145/1014052.1014073> to determine sentiment. The scoring function is vectorized by document, and scores for multiple documents are computed in parallel via 'OpenMP'.
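A minimal sketch of dictionary-based scoring as described above: each token counts +1 if it is in the positive list, -1 if it is in the negative list, and the document score is the sum. The two tiny word sets are hypothetical stand-ins for the Hu and Liu dictionary, the tokenizer is a simple assumption, and the OpenMP parallelism is not reproduced.

```python
import re

# Hypothetical stand-ins for the Hu and Liu positive/negative word lists.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}

def score_document(text):
    """Sum of +1 (positive word) and -1 (negative word) over the document's tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)

def score_documents(documents):
    """Vectorized over documents: one score per input text."""
    return [score_document(doc) for doc in documents]
```

For example, "Good product, bad battery, great screen" scores 1 + (-1) + 1 = 1.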
This R package provides power calculations via internal simulation methods. The package also provides a frontend to the now-abandoned PBAT program (developed by Christoph Lange), reading in the corresponding output and displaying results and figures when appropriate. The license of this R package itself is GPL. However, to use the functionality that interacts with PBAT, users must additionally obtain the PBAT program from Christoph Lange and accept his license. Both the data analysis and power calculations have command line and graphical interfaces using tcltk.
Fits penalized linear mixed models that correct for unobserved confounding factors. plmmr infers and corrects for the presence of unobserved confounding effects such as population stratification and environmental heterogeneity. It then fits a linear model via penalized maximum likelihood. Originally designed for the multivariate analysis of single nucleotide polymorphisms (SNPs) measured in genome-wide association studies (GWAS), plmmr eliminates the need for subpopulation-specific analyses and post-analysis p-value adjustments. Functions for processing PLINK files are also supplied. For examples, see the package homepage <https://pbreheny.github.io/plmmr/>.
This package provides functions for causal structure learning and causal inference using graphical models. The main algorithms for causal structure learning are PC (for observational data without hidden variables), FCI and RFCI (for observational data with hidden variables), and GIES (for a mix of observational data and interventional data, i.e. data from experiments involving interventions, without hidden variables). For causal inference, the IDA algorithm, the Generalized Backdoor Criterion (GBC), the Generalized Adjustment Criterion (GAC), and some related functions are implemented. Functions for incorporating background knowledge are also provided.
The SALSO algorithm is an efficient randomized greedy search method to find a point estimate for a random partition based on a loss function and posterior Monte Carlo samples. The algorithm is implemented for many loss functions, including the Binder loss and a generalization of the variation of information loss, both of which allow for unequal weights on the two types of clustering mistakes. Efficient implementations are also provided for Monte Carlo estimation of the posterior expected loss of a given clustering estimate. See Dahl, Johnson, and Müller (2022) <doi:10.1080/10618600.2022.2069779>.
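A minimal sketch of the weighted Binder loss and its Monte Carlo posterior expectation, with clusterings given as label sequences. The weight convention (`a` for pairs together in the sample but split in the estimate, `b` for the reverse) is an assumption of this sketch and may not match the package's parameterization; SALSO's greedy search over partitions is not shown, only the quantity it minimizes.

```python
from itertools import combinations

def binder_loss(truth, estimate, a=1.0, b=1.0):
    """Weighted Binder loss between two clusterings given as label sequences.
    a: cost per pair together in `truth` but split in `estimate`;
    b: cost per pair split in `truth` but together in `estimate`."""
    loss = 0.0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_est = estimate[i] == estimate[j]
        if same_truth and not same_est:
            loss += a
        elif same_est and not same_truth:
            loss += b
    return loss

def expected_binder_loss(estimate, posterior_samples, a=1.0, b=1.0):
    """Monte Carlo estimate of the posterior expected Binder loss of `estimate`."""
    return sum(binder_loss(s, estimate, a, b) for s in posterior_samples) / len(posterior_samples)
```

The loss depends only on which items share a cluster, so relabelling clusters (for example `[0, 0, 1]` vs `[5, 5, 7]`) costs nothing.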
This package provides a tool for estimating tree heights in the long-term forest experiments in Sweden. It uses the multilevel nonlinear mixed-effects height models developed for these experiments and consists of four functions for the main species, other conifer species, and other broadleaves. Each function returns a data frame that includes the input data and the estimated heights for any missing values. See Ogana et al. (2023) <doi:10.1016/j.foreco.2023.120843> and Arias-Rodil et al. (2015) <doi:10.1371/JOURNAL.PONE.0143521>.
This package provides a collection of functions for Kronecker structured covariance estimation and testing under the array normal model. For estimation, maximum likelihood and Bayesian equivariant estimation procedures are implemented. For testing, a likelihood ratio testing procedure is available. This package also contains additional functions for manipulating and decomposing tensor data sets. This work was partially supported by NSF grant DMS-1505136. Details of the methods are described in Gerard and Hoff (2015) <doi:10.1016/j.jmva.2015.01.020> and Gerard and Hoff (2016) <doi:10.1016/j.laa.2016.04.033>.
The sqldf function is typically passed a single argument which is an SQL select statement where the table names are ordinary R data frame names. sqldf transparently sets up a database, imports the data frames into that database, performs the SQL statement and returns the result using a heuristic to determine which class to assign to each column of the returned data frame. The sqldf or read.csv.sql functions can also be used to read filtered files into R even if the original files are larger than R itself can handle.
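The same idea can be sketched outside R: load in-memory tables into a temporary SQLite database, run one SELECT, and return the rows. This is only an analogue of what sqldf does (sqldf's default backend is SQLite, but its column-class heuristics are not reproduced), and `sql_on_tables` is a hypothetical helper built on Python's standard `sqlite3` module.

```python
import sqlite3

def sql_on_tables(query, **tables):
    """Run `query` against lists of dicts passed as keyword arguments (name=rows)."""
    con = sqlite3.connect(":memory:")  # temporary database, discarded afterwards
    try:
        for name, rows in tables.items():
            cols = list(rows[0].keys())
            col_list = ", ".join(f'"{c}"' for c in cols)
            con.execute(f'CREATE TABLE "{name}" ({col_list})')
            placeholders = ", ".join("?" for _ in cols)
            con.executemany(
                f'INSERT INTO "{name}" VALUES ({placeholders})',
                [tuple(r[c] for c in cols) for r in rows],
            )
        cur = con.execute(query)
        names = [d[0] for d in cur.description]
        return [dict(zip(names, row)) for row in cur.fetchall()]
    finally:
        con.close()
```

Usage: `sql_on_tables("SELECT species, COUNT(*) AS n FROM iris GROUP BY species", iris=rows)`, where the table name in the SQL matches the keyword argument.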
The Biomarker Optimal Segmentation System R package, bossR, is designed for precision medicine, helping to identify individual traits using biomarkers. It focuses on determining the most effective cutoff value for a continuous biomarker, which is crucial for categorizing patients into two groups with distinctly different clinical outcomes. The package simultaneously finds the optimal cutoff from given candidate values and tests its significance. Simulation studies demonstrate that bossR offers statistical power and false positive control non-inferior to the permutation approach (considered the gold standard in this field), while being hundreds of times faster.
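For context, here is a minimal sketch of the permutation baseline mentioned above, not of bossR's own method. It assumes the absolute difference in mean outcome between the two groups as the statistic; the best cutoff maximizes it over the candidates, and the p-value compares that maximum with maxima from shuffled outcomes, so the search over cutoffs is accounted for.

```python
import random

def mean_difference(biomarker, outcome, cutoff):
    """|mean outcome above cutoff - mean outcome at or below cutoff|, or None if a group is empty."""
    low = [y for x, y in zip(biomarker, outcome) if x <= cutoff]
    high = [y for x, y in zip(biomarker, outcome) if x > cutoff]
    if not low or not high:
        return None
    return abs(sum(high) / len(high) - sum(low) / len(low))

def best_cutoff(biomarker, outcome, candidates):
    """Return (statistic, cutoff) for the candidate with the largest statistic."""
    scored = [(mean_difference(biomarker, outcome, c), c) for c in candidates]
    return max((s, c) for s, c in scored if s is not None)

def permutation_p_value(biomarker, outcome, candidates, n_perm=999, seed=0):
    """Permutation p-value for the maximal statistic over all candidate cutoffs."""
    observed, _ = best_cutoff(biomarker, outcome, candidates)
    rng = random.Random(seed)
    shuffled = list(outcome)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        stat, _ = best_cutoff(biomarker, shuffled, candidates)
        if stat >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Every candidate is re-evaluated for every permutation, which is the cost that makes this approach slow compared with the faster method the description reports.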
Simultaneously detect the number and locations of change points in piecewise linear models under stationary Gaussian noise, allowing for autocorrelated noise. The core idea is to transform change point detection into the detection of local extrema (local maxima and local minima) through kernel smoothing and differentiation of the data sequence; see Cheng et al. (2020) <doi:10.1214/20-EJS1751>. A fast, computationally inexpensive algorithm called dSTEM is introduced to detect change points, based on the STEM algorithm of D. Cheng and A. Schwartzman (2017) <doi:10.1214/16-AOS1458>.
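A simplified sketch of the "change points as local extrema" idea: a slope change in a piecewise linear signal shows up as a bump in the (smoothed) second derivative. Here the second derivative is approximated by second differences and smoothed with a Gaussian kernel, and local maxima of its absolute value above a fixed threshold are reported. The bandwidth and threshold are hypothetical tuning values; dSTEM's significance testing, which sets thresholds from theory, is not implemented.

```python
import math

def gaussian_kernel(bandwidth):
    """Normalized Gaussian weights on integer offsets -4h..4h."""
    half_width = int(4 * bandwidth)
    w = [math.exp(-0.5 * (k / bandwidth) ** 2) for k in range(-half_width, half_width + 1)]
    s = sum(w)
    return [x / s for x in w]

def smooth(seq, kernel):
    """Convolve `seq` with a symmetric kernel, truncating at the edges."""
    h = len(kernel) // 2
    n = len(seq)
    out = []
    for i in range(n):
        out.append(sum(w * seq[i + k - h] for k, w in enumerate(kernel) if 0 <= i + k - h < n))
    return out

def second_differences(y):
    """Discrete second derivative; endpoints set to 0."""
    return [0.0] + [y[i - 1] - 2 * y[i] + y[i + 1] for i in range(1, len(y) - 1)] + [0.0]

def detect_slope_changes(y, bandwidth=3.0, threshold=0.05):
    """Indices of local maxima of |smoothed second derivative| above `threshold`."""
    d2 = smooth(second_differences(y), gaussian_kernel(bandwidth))
    peaks = []
    for i in range(1, len(d2) - 1):
        a = abs(d2[i])
        if a >= threshold and a > abs(d2[i - 1]) and a >= abs(d2[i + 1]):
            peaks.append(i)
    return peaks
```

With noisy data, smoothing trades localization for stability, and the threshold controls false detections; choosing it rigorously is what the cited STEM theory addresses.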
Inference functionalities for distributed-lag linear structural equation models (DLSEMs). DLSEMs are Markovian structural causal models where each factor of the joint probability distribution is a distributed-lag linear regression with constrained lag shapes (Magrini, 2018 <doi:10.2478/bile-2018-0012>; Magrini et al., 2019 <doi:10.1007/s11135-019-00855-z>). DLSEMs account for temporal delays in the dependence relationships among the variables through a single parameter per covariate, which makes dynamic causal inference feasible. Endpoint-constrained quadratic, quadratic decreasing, linearly decreasing, and gamma lag shapes are available.
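A minimal sketch of why a constrained lag shape needs only one parameter per covariate: once the shape weights w_j are fixed, the contribution of x to y_t is theta * sum_j w_j x_{t-j}, so theta can be estimated by regressing y on the pre-weighted covariate. The linearly decreasing weights used here, w_j = (L + 1 - j) / (L + 1), are a hypothetical parameterization chosen for illustration, not necessarily the package's definition.

```python
def linear_decreasing_weights(max_lag):
    """Hypothetical linearly decreasing lag shape on lags 0..max_lag."""
    return [(max_lag + 1 - j) / (max_lag + 1) for j in range(max_lag + 1)]

def distributed_lag(x, theta, weights):
    """y_t = theta * sum_j weights[j] * x[t - j], using only available past values."""
    return [
        theta * sum(w * x[t - j] for j, w in enumerate(weights) if t - j >= 0)
        for t in range(len(x))
    ]

def estimate_theta(x, y, weights):
    """Least-squares estimate of the single scale parameter (no intercept)."""
    z = distributed_lag(x, 1.0, weights)
    return sum(zi * yi for zi, yi in zip(z, y)) / sum(zi * zi for zi in z)
```

An impulse in x at time 0 therefore produces a response that decays linearly over the lag window, scaled by theta.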
With the functions in this package you can check the validity of the following financial instrument identifiers: FIGI (Financial Instrument Global Identifier <https://www.openfigi.com/about/figi>), CUSIP (Committee on Uniform Security Identification Procedures <https://www.cusip.com/identifiers.html#/CUSIP>), ISIN (International Securities Identification Number <https://www.cusip.com/identifiers.html#/ISIN>), SEDOL (Stock Exchange Daily Official List <https://www2.lseg.com/SEDOL-masterfile-service-tech-guide-v8.6>). You can also calculate the FIGI checksum of 11-character strings, which can be useful if you want to create your own FIGI identifiers.
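As an example of how such identifiers are validated, here is a minimal sketch of the CUSIP check digit (the "modulus 10 double-add-double" scheme): characters map to values (digits as-is, A=10 through Z=35, and `*`, `@`, `#` as 36 to 38), every second value is doubled, the digits of each value are summed, and the check digit brings the total to a multiple of 10. FIGI, ISIN, and SEDOL use their own checksum rules, which are not reproduced here, and these function names are hypothetical rather than the package's API.

```python
def _char_value(ch):
    """Numeric value of a CUSIP character."""
    if ch in "0123456789":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    special = {"*": 36, "@": 37, "#": 38}
    if ch in special:
        return special[ch]
    raise ValueError(f"invalid CUSIP character: {ch!r}")

def cusip_check_digit(base8):
    """Check digit for the first eight characters of a CUSIP."""
    total = 0
    for i, ch in enumerate(base8.upper()):
        v = _char_value(ch)
        if i % 2 == 1:  # double every second character (2nd, 4th, ...)
            v *= 2
        total += v // 10 + v % 10
    return (10 - total % 10) % 10

def is_valid_cusip(code):
    """True if `code` has nine characters and a matching check digit."""
    if len(code) != 9 or code[8] not in "0123456789":
        return False
    try:
        return cusip_check_digit(code[:8]) == int(code[8])
    except ValueError:
        return False
```

For instance, Apple's CUSIP 037833100 validates, while changing its last digit to 1 does not.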
Fit penalized multivariable linear mixed models with a single random effect to control for population structure in genetic association studies. The goal is to fit many genetic variants simultaneously in order to select markers that are independently associated with the response. The package can also handle prior annotation information (for example, rare variants) in the form of variable weights. For more information, see the package website and the accompanying paper: Bhatnagar et al., "Simultaneous SNP selection and adjustment for population structure in high dimensional prediction models", 2020, <DOI:10.1371/journal.pgen.1008766>.