Fits generalized linear models or regressions with autoregressive moving-average (ARMA) errors for time series data. The package makes it easy to incorporate constraints on the model's coefficients. The model is specified by an objective function (Gaussian, Binomial or Poisson) or an ARMA order (p,q), a vector of bound constraints for the coefficients (e.g., beta1 > 0), and optionally restrictions among coefficients (e.g., beta1 > beta2). The references for this package are the same as those of the stats package for the glm() and arima() functions; see Brockwell, P. J. and Davis, R. A. (1996, ISBN-10: 9783319298528). For the different optimizers implemented, it is recommended to consult the documentation of the corresponding packages.
An interface to DifferentialEquations.jl <https://diffeq.sciml.ai/dev/> from the R programming language. It has unique high-performance methods for solving ordinary differential equations (ODE), stochastic differential equations (SDE), delay differential equations (DDE), differential-algebraic equations (DAE), and more. Much of the functionality, including features like adaptive time stepping in SDEs, is unique and allows for multiple orders of magnitude speedup over more common methods. GPUs are supported, including CUDA (NVIDIA), AMD, Intel oneAPI, and Apple Metal (M-series) devices. diffeqr attaches an R interface onto the package, allowing seamless use of this tooling by R users. For more information, see Rackauckas and Nie (2017) <doi:10.5334/jors.151>.
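A minimal sketch of the wrapped workflow, assuming Julia and DifferentialEquations.jl are installed (the setup/solve pattern follows the package's documented interface; the ODE itself is an arbitrary example):

    library(diffeqr)
    de <- diffeqr::diffeq_setup()            # initialise Julia and DifferentialEquations.jl
    f <- function(u, p, t) -0.5 * u          # right-hand side: du/dt = -0.5 u
    prob <- de$ODEProblem(f, 1.0, c(0.0, 10.0))
    sol <- de$solve(prob)
    plot(sol$t, unlist(sol$u), type = "l")   # solution trajectory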
Converts TXT and XML data curated by the United States Patent and Trademark Office (USPTO). Allows conversion of bulk data after downloading directly from the USPTO bulk data website, eliminating the need for users to wrangle multiple data formats to get large patent databases into a tidy, rectangular format. Data details can be found on the USPTO website <https://bulkdata.uspto.gov/>. Currently, all three formats (1. TXT data, 1976-2001; 2. XML format 1 data, 2002-2004; 3. XML format 2 data, 2005-current) can be converted to rectangular, CSV format. Relevant literature that uses data from the USPTO includes Wada (2020) <doi:10.1007/s11192-020-03674-4> and Plaza & Albert (2008) <doi:10.1007/s11192-007-1763-3>.
Implementation of the remote effects spatial process (RESP) model for teleconnection. The RESP model is a geostatistical model that allows a spatially-referenced variable (like average precipitation) to be influenced by covariates defined on a remote domain (like sea surface temperatures). The RESP model is introduced in Hewitt et al. (2018) <doi:10.1002/env.2523>. Sample code for working with the RESP model is available at <https://jmhewitt.github.io/research/resp_example>. This material is based upon work supported by the National Science Foundation under grant number AGS 1419558. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Intuitive framework for identifying spatially variable genes (SVGs) via edgeR, a popular method for performing differential expression analyses. Based on pre-annotated spatial clusters as summarized spatial information, DESpace models gene expression using a negative binomial (NB) distribution, via edgeR, with spatial clusters as covariates. SVGs are then identified by testing the significance of spatial clusters. The method is flexible and robust, and is faster than most SVG methods. Furthermore, to the best of our knowledge, it is the only SVG approach that allows (i) performing an SVG test on each individual spatial cluster, hence identifying the key regions of the tissue affected by spatial variability, and (ii) jointly fitting multiple samples, targeting genes with consistent spatial patterns across replicates.
This package provides an exact goodness-of-fit test for multinomial data with fixed probabilities. It can be used to determine whether a set of counts fits a given expected ratio. To see whether a set of observed counts fits an expectation, one can examine all possible outcomes with xmulti() or a random sample of them with xmonte() and find the probability of an observation deviating from the expectation by at least as much as the observed counts. As a measure of deviation from the expected, one can use the log-likelihood ratio, the multinomial probability, or the classic chi-square statistic. A histogram of the test statistic can also be plotted and compared with the asymptotic curve.
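A minimal sketch, with arbitrary example counts, of testing observations against a 1:2:1 expectation (the ntrials argument for xmonte() is an assumption based on the package documentation):

    library(XNomial)
    obs <- c(20, 35, 25)                   # observed counts (hypothetical data)
    expected <- c(1, 2, 1)                 # expected ratio
    xmulti(obs, expected)                  # exact test: enumerate all possible outcomes
    xmonte(obs, expected, ntrials = 1e5)   # Monte Carlo approximation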
For spatial data analysis; provides exploratory spatial analysis tools, spatial regression, spatial econometric, and disease mapping models, model diagnostics, and special methods for inference with small area survey data (e.g., the American Community Survey (ACS)) and censored population health monitoring data. Models are pre-specified using the Stan programming language, a platform for Bayesian inference using Markov chain Monte Carlo (MCMC). References: Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>; Donegan (2021) <doi:10.31219/osf.io/3ey65>; Donegan (2022) <doi:10.21105/joss.04716>; Donegan, Chun and Hughes (2020) <doi:10.1016/j.spasta.2020.100450>; Donegan, Chun and Griffith (2021) <doi:10.3390/ijerph18136856>; Morris et al. (2019) <doi:10.1016/j.sste.2019.100301>.
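A minimal sketch with simulated, non-spatial data, assuming geostan's stan_glm() accepts a plain data frame for a baseline (non-spatial) fit; the spatial models (e.g., stan_car(), stan_esf()) build on the same formula interface but additionally require connectivity information:

    library(geostan)
    set.seed(1)
    df <- data.frame(x = rnorm(100))
    df$y <- 2 + 0.5 * df$x + rnorm(100)                      # simulated outcome
    fit <- stan_glm(y ~ x, data = df, family = gaussian())   # MCMC via Stan
    print(fit)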
This package provides a collection of functions for estimating spatial and spatio-temporal regression models. Moran eigenvectors are used as spatial basis functions to efficiently approximate spatially dependent Gaussian processes (i.e., random effects eigenvector spatial filtering; see Murakami and Griffith 2015 <doi:10.1007/s10109-015-0213-7>). The implemented models include linear regression with residual spatial dependence, spatially/spatio-temporally varying coefficient models (Murakami et al., 2017, 2024; <doi:10.1016/j.spasta.2016.12.001>, <doi:10.48550/arXiv.2410.07229>), spatially filtered unconditional quantile regression (Murakami and Seya, 2019 <doi:10.1002/env.2556>), and Gaussian and non-Gaussian spatial mixed models via compositional warping (Murakami et al. 2021 <doi:10.1016/j.spasta.2021.100520>).
SEM Trees and SEM Forests -- an extension of model-based decision trees and forests to Structural Equation Models (SEM). SEM trees hierarchically split empirical data into homogeneous groups, each sharing similar data patterns with respect to a SEM, by recursively selecting optimal predictors of these differences. SEM forests are an extension of SEM trees: they are ensembles of SEM trees, each built on a random sample of the original data. By aggregating over a forest, we obtain measures of variable importance that are more robust than measures from single trees. A description of the method was published by Brandmaier, von Oertzen, McArdle, & Lindenberger (2013) <doi:10.1037/a0030001> and Arnold, Voelkle, & Brandmaier (2020) <doi:10.3389/fpsyg.2020.564403>.
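A minimal sketch, with simulated data and a lavaan measurement model, of growing a SEM tree that uses a grouping variable as a candidate split predictor (the data-generating details are arbitrary assumptions):

    library(lavaan)
    library(semtree)
    set.seed(123)
    n <- 200
    grp <- factor(sample(c("A", "B"), n, replace = TRUE))
    f <- rnorm(n) + ifelse(grp == "B", 0.8, 0)              # latent factor shifted in group B
    dat <- data.frame(x1 = f + rnorm(n), x2 = f + rnorm(n),
                      x3 = f + rnorm(n), grp = grp)
    fit <- lavaan::cfa('F =~ x1 + x2 + x3', data = dat)     # template SEM
    tree <- semtree(model = fit, data = dat)                # non-model variables become predictors
    plot(tree)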
This package provides raw files recorded on different Liquid Chromatography Mass Spectrometry (LC-MS) instruments. All included MS instruments are manufactured by Thermo Fisher Scientific and belong to the Orbitrap Tribrid or Q Exactive Orbitrap family of instruments. Despite their common origin and shared hardware components, e.g., the Orbitrap mass analyser, these instruments tend to write data in different "dialects" of a shared binary file format (.raw). The intention behind tartare is to provide complex but slim real-world files that can be used to make code robust with respect to this diversity. In other words, it is intended for enhanced unit testing. The package is intended to be used with the rawrr package and the Spectra MsBackends.
This package provides visualizations for SHAP (SHapley Additive exPlanations) such as waterfall plots, force plots, various types of importance plots, dependence plots, and interaction plots. These plots act on a shapviz object created from a matrix of SHAP values and a corresponding feature dataset. Wrappers for the R packages xgboost, lightgbm, fastshap, shapr, h2o, treeshap, DALEX, and kernelshap are added for convenience. By separating visualization and computation, it is possible to display factor variables in graphs, even if the SHAP values are calculated by a model that requires numerical features. The plots are inspired by those provided by the shap package in Python, but there is no dependency on it.
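A minimal sketch using the xgboost wrapper (the data set, number of boosting rounds, and chosen plots are arbitrary examples):

    library(xgboost)
    library(shapviz)
    X <- data.matrix(iris[, -1])                        # numeric feature matrix
    dtrain <- xgboost::xgb.DMatrix(X, label = iris$Sepal.Length)
    fit <- xgboost::xgb.train(data = dtrain, nrounds = 50)
    shp <- shapviz(fit, X_pred = X)                     # SHAP values via the xgboost wrapper
    sv_importance(shp)                                  # importance plot
    sv_waterfall(shp, row_id = 1)                       # waterfall plot for one observation
    sv_dependence(shp, v = "Petal.Length")              # dependence plot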
This package provides functions for computing and visualizing generalized canonical discriminant analyses and canonical correlation analysis for a multivariate linear model. Traditional canonical discriminant analysis is restricted to a one-way MANOVA design and is equivalent to canonical correlation analysis between a set of quantitative response variables and a set of dummy variables coded from the factor variable. The candisc package generalizes this to higher-way MANOVA designs for all factors in a multivariate linear model, computing canonical scores and vectors for each term. The graphic functions provide low-rank (1D, 2D, 3D) visualizations of terms in an mlm via the plot.candisc and heplot.candisc methods. Related plots are now provided for canonical correlation analysis when all predictors are quantitative.
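A minimal sketch of a one-way canonical discriminant analysis on the iris data (a standard example chosen here for illustration, not taken from the package text):

    library(candisc)
    mod <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species,
              data = iris)                       # multivariate linear model
    cd <- candisc(mod, term = "Species")         # canonical scores and vectors for the term
    plot(cd)                                     # low-rank (2D) display via plot.candisc
    heplot(cd)                                   # HE plot in canonical space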
This package provides functions for building cognitive maps based on qualitative data. Inputs are textual sources (articles, transcriptions of qualitative interviews with agents, ...). These sources have been coded using relations and are linked to (i) a table describing the variables (or concepts) used for the coding and (ii) a table describing the sources (typology of agents, ...). Main outputs are Individual Cognitive Maps (ICM), Social Cognitive Maps (all sources or groups of sources), and a list of quotes linked to relations. This package is linked to the work done during the PhD of Frederic M. Vanwindekens (CRA-W / UCL), defended on 13 May 2014 at the University of Louvain, in collaboration with the Walloon Agricultural Research Centre (project MIMOSA, MOERMAN fund).
"Evolutionary Virtual Education" - evolved - provides multiple tools to help educators (especially at the graduate level or in advanced undergraduate level courses) apply inquiry-based learning in general evolution classes. In particular, the tools provided include functions that simulate evolutionary processes (e.g., genetic drift, natural selection within a single locus) or concepts (e.g. Hardy-Weinberg equilibrium, phylogenetic distribution of traits). More than only simulating, the package also provides tools for students to analyze (e.g., measuring, testing, visualizing) datasets with characteristics that are common to many fields related to evolutionary biology. Importantly, the package is heavily oriented towards providing tools for inquiry-based learning - where students follow scientific practices to actively construct knowledge. For additional details, see package's vignettes.
Deconvolution of thermal decay curves allows you to quantify proportions of biomass components in plant litter. Thermal decay curves derived from thermogravimetric analysis (TGA) are imported, modified, and then modelled in a three- or four-part mixture model using the Fraser-Suzuki function. The output is estimates for weights of pseudo-components corresponding to hemicellulose, cellulose, and lignin. For more information see: Müller-Hagedorn, M. and Bockhorn, H. (2007) <doi:10.1016/j.jaap.2006.12.008>, Órfão, J. J. M. and Figueiredo, J. L. (2001) <doi:10.1016/S0040-6031(01)00634-7>, and Yang, H. and Yan, R. and Chen, H. and Zheng, C. and Lee, D. H. and Liang, D. T. (2006) <doi:10.1021/ef0580117>.
This package provides a framework of tools to summarise, visualise, and explore longitudinal data. It builds upon the tidy time series data frames used in the tsibble package, and is designed to integrate with the tidyverse and tidyverts (time series) ecosystems. The methods implemented include calculating features for understanding longitudinal data, such as summary statistics (quantiles, medians, and numeric ranges), sampling individual series, identifying individual series representative of a group, and extending the facet system in ggplot2 to facilitate exploration of samples of data. These methods are fully described in the paper "brolgar: An R package to Browse Over Longitudinal Data Graphically and Analytically in R", Nicholas Tierney, Dianne Cook, Tania Prvan (2020) <doi:10.32614/RJ-2022-023>.
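A minimal sketch using the wages data bundled with brolgar, assuming the facet_sample() arguments shown here; it plots a small random sample of individual series:

    library(brolgar)
    library(ggplot2)
    ggplot(wages, aes(x = xp, y = ln_wages, group = id)) +
      geom_line() +
      facet_sample(n_per_facet = 5, n_facets = 12)   # 12 panels of 5 series each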
The main purpose of this package is to make it easy for users to interact with jMetrik, an open source application for psychometric analysis. For example, it allows users to write data frames to file in a format that can be used by jMetrik. It also allows users to read *.jmetrik files (e.g., output from an analysis) for follow-up analysis in R. The *.jmetrik format is a flat file that includes a multiline header and the data as comma-separated values. The header includes metadata about the file and one row per variable with the following information in each row: variable name, data type, item scoring, special data codes, and variable label.
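A minimal sketch of a round trip; the function names jmetrikWrite() and jmetrikRead() and their arguments are assumptions based on the package's stated purpose, not verified signatures:

    library(jmetrik)
    df <- data.frame(id = 1:5,
                     item1 = c(1, 0, 1, 1, 0),
                     item2 = c(0, 0, 1, 1, 1))
    jmetrikWrite(x = df, fileName = "example.jmetrik")  # write a data frame for jMetrik (assumed API)
    dat <- jmetrikRead(fileName = "example.jmetrik")    # read a *.jmetrik file back into R (assumed API)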
In the generalized Roy model, the marginal treatment effect (MTE) can be used as a building block for constructing conventional causal parameters such as the average treatment effect (ATE) and the average treatment effect on the treated (ATT). Given a treatment selection equation and an outcome equation, the function mte() estimates the MTE via the semiparametric local instrumental variables method or the normal selection model. The function mte_at() evaluates MTE at different values of the latent resistance u with a given X = x, and the function mte_tilde_at() evaluates MTE projected onto the estimated propensity score. The function ace() estimates population-level average causal effects such as ATE, ATT, or the marginal policy relevant treatment effect.
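A minimal sketch with simulated data; the mte(selection = , outcome = , data = ) formula interface follows the package's documented usage, while the data-generating process is an arbitrary example with essential heterogeneity:

    library(localIV)
    set.seed(1)
    n <- 2000
    x <- rnorm(n); z <- rnorm(n)                 # covariate and instrument
    u <- rnorm(n)                                # latent resistance to treatment
    d <- as.integer(0.5 * x + z > u)             # treatment selection
    y <- 1 + x + 0.8 * d + 0.5 * d * u + rnorm(n)
    df <- data.frame(y, d, x, z)
    fit <- mte(selection = d ~ x + z, outcome = y ~ x, data = df)
    # fit can then be passed to mte_at(), mte_tilde_at(), or ace() as described above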
Identification of the most appropriate pharmacotherapy for each patient based on genomic alterations is a major challenge in personalized oncology. PANACEA is a collection of personalized anti-cancer drug prioritization approaches utilizing network methods. The methods utilize personalized "driverness" scores from driveR to rank drugs, mapping these onto a protein-protein interaction network. The "distance-based" method ranks drugs by combining these scores with the network distances between drugs and genes. The "RWR" method propagates these scores via a random-walk-with-restart framework to rank the drugs. The methods are described in detail in Ulgen E, Ozisik O, Sezerman OU. 2023. PANACEA: network-based methods for pharmacotherapy prioritization in personalized oncology. Bioinformatics <doi:10.1093/bioinformatics/btad022>.
Stepwise regression is a statistical technique used for model selection. This package streamlines stepwise regression analysis by supporting multiple regression types, incorporating popular selection strategies, and offering essential metrics. It enables users to apply multiple selection strategies and metrics in a single function call, visualize variable selection processes, and export results in various formats. However, StepReg should not be used for statistical inference unless the variable selection process is explicitly accounted for, as it can compromise the validity of the results. This limitation does not apply when StepReg is used for prediction purposes. We validated StepReg's accuracy using public datasets within the SAS software environment. Additionally, StepReg features an interactive Shiny application to enhance usability and accessibility.
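A minimal sketch, assuming the stepwise() interface from recent StepReg documentation (the argument names type, strategy, and metric may differ in older versions); it applies two strategies and two metrics in a single call:

    library(StepReg)
    res <- stepwise(formula = mpg ~ ., data = mtcars,
                    type = "linear",
                    strategy = c("forward", "backward"),
                    metric = c("AIC", "BIC"))
    res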
Gumbel distribution functions (De Haan L. (2007) <doi:10.1007/0-387-34471-3>) implemented with the techniques of automatic differentiation (Griewank A. (2008) <isbn:978-0-89871-659-7>). With this tool, a user should be able to quickly model extreme events for which the Gumbel distribution is the domain of attraction. The package makes available the density function, the distribution function, the quantile function, and a random generating function. In addition, it supports gradient functions. The package combines Adept (C++ templated automatic differentiation) (Hogan R. (2017) <doi:10.5281/zenodo.1004730>) and Eigen (templated matrix-vector library) for fast computations of both objective functions and exact gradients. It relies on RcppEigen for easy access to Eigen and bindings to R.
In gene-expression microarray studies, for example, one generally obtains a list of dozens or hundreds of genes that differ in expression between samples and then asks, "What does all of this mean biologically?" Alternatively, gene lists can be derived conceptually in addition to experimentally. For instance, one might want to analyze a group of genes known as housekeeping genes. The work of the Gene Ontology (GO) Consortium <geneontology.org> provides a way to address that question. GO organizes genes into hierarchical categories based on biological process, molecular function, and subcellular localization. The role of GoMiner is to automate the mapping between a list of genes and GO, and to provide a statistical summary of the results as well as a visualization.
Techniques from a particular branch of spatial statistics, termed geographically-weighted (GW) models. GW models suit situations when data are not described well by some global model, but where there are spatial regions where a suitably localised calibration provides a better description. GWmodel includes functions to calibrate: GW summary statistics (Brunsdon et al., 2002) <doi:10.1016/s0198-9715(01)00009-6>, GW principal components analysis (Harris et al., 2011) <doi:10.1080/13658816.2011.554838>, GW discriminant analysis (Brunsdon et al., 2007) <doi:10.1111/j.1538-4632.2007.00709.x>, and various forms of GW regression (Brunsdon et al., 1996) <doi:10.1111/j.1538-4632.1996.tb00936.x>; some of which are provided in basic and robust (outlier resistant) forms.
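A minimal sketch of a basic GW regression, assuming the DubVoter data shipped with GWmodel and the variable names shown here; bw.gwr() selects an adaptive bandwidth and gwr.basic() fits the model:

    library(GWmodel)
    data(DubVoter)                                 # loads Dub.voter (Dublin voter turnout data)
    bw <- bw.gwr(GenEl2004 ~ DiffAdd + Unempl, data = Dub.voter,
                 approach = "AICc", kernel = "bisquare", adaptive = TRUE)
    fit <- gwr.basic(GenEl2004 ~ DiffAdd + Unempl, data = Dub.voter,
                     bw = bw, kernel = "bisquare", adaptive = TRUE)
    print(fit)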
This package provides a set of functions designed to quickly generate results of a multiple choice test. Generates detailed global results, lists for anonymous feedback and personalised result feedback (in LaTeX and/or PDF format), as well as item statistics like Cronbach's alpha or discriminatory power. klausuR also includes a plugin for the R GUI and IDE RKWard, providing graphical dialogs for its basic features. The respective R package rkward cannot be installed directly from a repository, as it is a part of RKWard. To make full use of this feature, please install RKWard from <https://rkward.kde.org> (plugins are detected automatically). Due to some restrictions on CRAN, the full package sources are only available from the project homepage.