"Evolutionary Virtual Education" - evolved - provides multiple tools to help educators (especially at the graduate level or in advanced undergraduate level courses) apply inquiry-based learning in general evolution classes. In particular, the tools provided include functions that simulate evolutionary processes (e.g., genetic drift, natural selection within a single locus) or concepts (e.g. Hardy-Weinberg equilibrium, phylogenetic distribution of traits). More than only simulating, the package also provides tools for students to analyze (e.g., measuring, testing, visualizing) datasets with characteristics that are common to many fields related to evolutionary biology. Importantly, the package is heavily oriented towards providing tools for inquiry-based learning - where students follow scientific practices to actively construct knowledge. For additional details, see package's vignettes.
Deconvolution of thermal decay curves allows you to quantify proportions of biomass components in plant litter. Thermal decay curves derived from thermogravimetric analysis (TGA) are imported, modified, and then modelled in a three- or four-part mixture model using the Fraser-Suzuki function. The output is estimates for weights of pseudo-components corresponding to hemicellulose, cellulose, and lignin. For more information see: Müller-Hagedorn, M. and Bockhorn, H. (2007) <doi:10.1016/j.jaap.2006.12.008>, Órfão, J. J. M. and Figueiredo, J. L. (2001) <doi:10.1016/S0040-6031(01)00634-7>, and Yang, H., Yan, R., Chen, H., Zheng, C., Lee, D. H. and Liang, D. T. (2006) <doi:10.1021/ef0580117>.
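For reference, the four-parameter Fraser-Suzuki curve used for each pseudo-component can be written directly in R (a sketch of the published formula, not the package's internal code; parameter names are illustrative):

# Fraser-Suzuki function: h = height, s = skew, p = peak position,
# w = width; x is temperature. Returns the rate of mass loss at x.
fraser_suzuki <- function(x, h, s, p, w) {
  inner <- 1 + 2 * s * (x - p) / w
  # the curve is defined only where the log argument is positive
  ifelse(inner > 0, h * exp(-log(2) / s^2 * log(pmax(inner, 1e-12))^2), 0)
}
curve(fraser_suzuki(x, h = 0.004, s = -0.25, p = 600, w = 60), from = 400, to = 800)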
This package provides a framework of tools to summarise, visualise, and explore longitudinal data. It builds upon the tidy time series data frames used in the tsibble package, and is designed to integrate with the tidyverse and tidyverts (time series) ecosystems. The methods implemented include calculating features for understanding longitudinal data, such as summary statistics (quantiles, medians, and numeric ranges), sampling individual series, identifying individual series representative of a group, and extending the facet system in ggplot2 to facilitate exploration of samples of data. These methods are fully described in the paper "brolgar: An R Package to Browse Over Longitudinal Data Graphically and Analytically in R", Nicholas Tierney, Dianne Cook, Tania Prvan (2022) <doi:10.32614/RJ-2022-023>.
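A typical call, assuming the wages longitudinal data bundled with brolgar (a minimal sketch):

library(brolgar)
library(ggplot2)
# Spaghetti plot of each worker's wage trajectory, split across facets
# so that individual series remain readable.
ggplot(wages, aes(x = xp, y = ln_wages, group = id)) +
  geom_line() +
  facet_sample()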
The main purpose of this package is to make it easy for useRs to interact with jMetrik, an open source application for psychometric analysis. For example, it allows useRs to write data frames to file in a format that can be used by jMetrik. It also allows useRs to read *.jmetrik files (e.g. output from an analysis) for follow-up analysis in R. The *.jmetrik format is a flat file that includes a multiline header and the data as comma separated values. The header includes metadata about the file and one row per variable with the following information in each row: variable name, data type, item scoring, special data codes, and variable label.
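A minimal round trip might look as follows, assuming the jmetrikWrite() and jmetrikRead() function names (check the package manual for the exact arguments):

library(jmetrik)
mydata <- data.frame(id = 1:5, item1 = c(1, 0, 1, 1, 0))
# Write a data frame to the *.jmetrik flat-file format...
jmetrikWrite(x = mydata, fileName = "mydata.jmetrik")
# ...and read a *.jmetrik file back into R for follow-up analysis.
results <- jmetrikRead(fileName = "mydata.jmetrik")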
In the generalized Roy model, the marginal treatment effect (MTE) can be used as a building block for constructing conventional causal parameters such as the average treatment effect (ATE) and the average treatment effect on the treated (ATT). Given a treatment selection equation and an outcome equation, the function mte() estimates the MTE via the semiparametric local instrumental variables method or the normal selection model. The function mte_at() evaluates the MTE at different values of the latent resistance u with a given X = x, and the function mte_tilde_at() evaluates the MTE projected onto the estimated propensity score. The function ace() estimates population-level average causal effects such as the ATE, the ATT, or the marginal policy-relevant treatment effect.
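A sketch of the intended workflow (the formulas and the toydata object are illustrative; consult the package documentation for the exact arguments):

library(localIV)
# d = treatment indicator, z = instrument, y = outcome.
fit <- mte(selection = d ~ x + z, outcome = y ~ x, data = toydata)
# Evaluate the MTE over the latent resistance u at given covariate values,
# then aggregate to a population-level estimand such as the ATE.
mte_vals <- mte_at(u = seq(0.05, 0.95, 0.1), model = fit)
ate <- ace(fit, estimand = "ate")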
Identification of the most appropriate pharmacotherapy for each patient based on genomic alterations is a major challenge in personalized oncology. PANACEA is a collection of personalized anti-cancer drug prioritization approaches utilizing network methods. The methods utilize personalized "driverness" scores from driveR to rank drugs, mapping these onto a protein-protein interaction network. The "distance-based" method ranks each drug using these scores together with the distances between drugs and genes. The "RWR" method propagates the scores via a random-walk-with-restart framework to rank the drugs. The methods are described in detail in Ulgen E, Ozisik O, Sezerman OU. 2023. PANACEA: network-based methods for pharmacotherapy prioritization in personalized oncology. Bioinformatics <doi:10.1093/bioinformatics/btad022>.
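The random-walk-with-restart step can be illustrated generically (the standard RWR iteration, not PANACEA's exact implementation):

# W: column-normalized adjacency matrix of the PPI network;
# p0: initial "driverness" score vector; r: restart probability.
rwr <- function(W, p0, r = 0.5, tol = 1e-8, max_iter = 1000) {
  p <- p0
  for (i in seq_len(max_iter)) {
    p_new <- (1 - r) * (W %*% p) + r * p0
    if (sum(abs(p_new - p)) < tol) break
    p <- p_new
  }
  as.numeric(p)  # stationary scores used to rank drugs
}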
Stepwise regression is a statistical technique used for model selection. This package streamlines stepwise regression analysis by supporting multiple regression types, incorporating popular selection strategies, and offering essential metrics. It enables users to apply multiple selection strategies and metrics in a single function call, visualize variable selection processes, and export results in various formats. However, StepReg should not be used for statistical inference unless the variable selection process is explicitly accounted for, as it can compromise the validity of the results. This limitation does not apply when StepReg is used for prediction purposes. We validated StepReg's accuracy using public datasets within the SAS software environment. Additionally, StepReg features an interactive Shiny application to enhance usability and accessibility.
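A typical call, assuming the stepwise() interface of recent StepReg releases (a sketch, not definitive usage):

library(StepReg)
# Linear regression on mtcars, comparing two selection strategies
# under the AIC metric in a single call.
res <- stepwise(formula = mpg ~ ., data = mtcars,
                type = "linear",
                strategy = c("forward", "bidirection"),
                metric = "AIC")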
While data from randomized experiments remain the gold standard for causal inference, estimation of causal estimands from observational data is possible through various confounding adjustment methods. However, the challenge of unmeasured confounding remains a concern in causal inference, where failure to account for unmeasured confounders can lead to biased estimates of causal estimands. Sensitivity analysis within the framework of causal inference can help adjust for possible unmeasured confounding. In `causens`, three main methods are implemented: adjustment via sensitivity functions (Brumback, Hernán, Haneuse, and Robins (2004) <doi:10.1002/sim.1657> and Li, Shen, Wu, and Li (2011) <doi:10.1093/aje/kwr096>), and Bayesian parametric modelling and Monte Carlo approaches (McCandless and Gustafson (2017) <doi:10.1002/sim.7298>).
Gumbel distribution functions (De Haan L. (2007) <doi:10.1007/0-387-34471-3>) implemented with the techniques of automatic differentiation (Griewank A. (2008) <isbn:978-0-89871-659-7>). With this tool, a user should be able to quickly model extreme events for which the Gumbel distribution is the domain of attraction. The package makes available the density function, the distribution function, the quantile function, and a random generation function. In addition, it supports gradient functions. The package combines Adept (C++ templated automatic differentiation) (Hogan R. (2017) <doi:10.5281/zenodo.1004730>) and Eigen (templated matrix-vector library) for fast computations of both objective functions and exact gradients. It relies on RcppEigen for easy access to Eigen and bindings to R.
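For reference, the Gumbel log-density and its exact gradient in the location and scale parameters are short enough to state directly (a plain-R sketch of the textbook formulas, not the package's Adept-based code):

# Gumbel log-density: z = (x - mu) / sigma; log f = -log(sigma) - z - exp(-z)
dgumbel_log <- function(x, mu, sigma) {
  z <- (x - mu) / sigma
  -log(sigma) - z - exp(-z)
}
# Analytic gradient of the log-density for a single observation x.
dgumbel_grad <- function(x, mu, sigma) {
  z <- (x - mu) / sigma
  c(dmu    = (1 - exp(-z)) / sigma,
    dsigma = -1 / sigma + z * (1 - exp(-z)) / sigma)
}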
In gene-expression microarray studies, for example, one generally obtains a list of dozens or hundreds of genes that differ in expression between samples and then asks, "What does all of this mean biologically?" Alternatively, gene lists can be derived conceptually in addition to experimentally; for instance, one might want to analyze a group of genes known as housekeeping genes. The work of the Gene Ontology (GO) Consortium <geneontology.org> provides a way to address that question. GO organizes genes into hierarchical categories based on biological process, molecular function, and subcellular localization. The role of GoMiner is to automate the mapping between a list of genes and GO, and to provide a statistical summary of the results as well as a visualization.
Techniques from a particular branch of spatial statistics, termed geographically-weighted (GW) models. GW models suit situations where data are not described well by some global model, but where there are spatial regions in which a suitably localised calibration provides a better description. GWmodel includes functions to calibrate: GW summary statistics (Brunsdon et al., 2002) <doi:10.1016/s0198-9715(01)00009-6>, GW principal components analysis (Harris et al., 2011) <doi:10.1080/13658816.2011.554838>, GW discriminant analysis (Brunsdon et al., 2007) <doi:10.1111/j.1538-4632.2007.00709.x> and various forms of GW regression (Brunsdon et al., 1996) <doi:10.1111/j.1538-4632.1996.tb00936.x>; some of which are provided in basic and robust (outlier-resistant) forms.
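A typical GW regression workflow, assuming the bw.gwr() and gwr.basic() interfaces and the bundled London house-price data (a sketch; see the package documentation for the expected spatial data classes):

library(GWmodel)
data(LondonHP)  # loads londonhp, a SpatialPointsDataFrame
# Select an adaptive bandwidth by cross-validation, then calibrate
# a basic GW regression of purchase price on floor area.
bw  <- bw.gwr(PURCHASE ~ FLOORSZ, data = londonhp,
              approach = "CV", kernel = "bisquare", adaptive = TRUE)
fit <- gwr.basic(PURCHASE ~ FLOORSZ, data = londonhp,
                 bw = bw, kernel = "bisquare", adaptive = TRUE)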
This package provides a set of functions designed to quickly generate results of a multiple choice test. Generates detailed global results, lists for anonymous feedback and personalised result feedback (in LaTeX and/or PDF format), as well as item statistics like Cronbach's alpha or discriminatory power. klausuR also includes a plugin for the R GUI and IDE RKWard, providing graphical dialogs for its basic features. The respective R package rkward cannot be installed directly from a repository, as it is a part of RKWard. To make full use of this feature, please install RKWard from <https://rkward.kde.org> (plugins are detected automatically). Due to some restrictions on CRAN, the full package sources are only available from the project homepage.
Real-time quantitative polymerase chain reaction (qPCR) data sets by Lievens et al. (2012) <doi:10.1093/nar/gkr775>. Provides one single tabular tidy data set in long format, encompassing three dilution series, targeted against the soybean Lectin endogene. Each dilution series was assayed in one of the following PCR-efficiency-modifying conditions: no PCR inhibition, inhibition by isopropanol, and inhibition by tannic acid. The inhibitors were co-diluted along with the dilution series. The co-dilution series consists of a five-point, five-fold serial dilution. For each concentration there are 18 replicates. Each amplification curve is 60 cycles long. The original raw data file is available in the Supplementary Data section at Nucleic Acids Research Online <doi:10.1093/nar/gkr775>.
An implementation of ranked sparsity methods, including penalized regression methods such as the sparsity-ranked lasso, its non-convex alternatives, and elastic net, as well as the sparsity-ranked Bayesian Information Criterion. As described in Peterson and Cavanaugh (2022) <doi:10.1007/s10182-021-00431-7>, ranked sparsity is a philosophy with methods primarily useful for variable selection in the presence of prior informational asymmetry, which occurs in the context of trying to perform variable selection in the presence of interactions and/or polynomials. Ultimately, this package attempts to facilitate dealing with cumbersome interactions and polynomials while not avoiding them entirely. Typically, models selected under ranked sparsity principles will also be more transparent, having fewer falsely selected interactions and polynomials than other methods.
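A minimal call, assuming the sparseR() interface (a sketch; the k and poly arguments for interaction order and polynomial degree are assumptions to verify against the manual):

library(sparseR)
# Sparsity-ranked lasso: main effects, pairwise interactions (k = 1)
# and second-order polynomials (poly = 2) are all candidates, with
# interactions and polynomials penalized more heavily by rank.
fit <- sparseR(Sepal.Length ~ ., data = iris, k = 1, poly = 2)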
Cell differentiation processes are achieved through a continuum of hierarchical intermediate cell-states that might be captured by single-cell RNA-seq. Existing computational approaches for the assessment of cell-state hierarchies from single-cell data might be formalized under a general workflow composed of i) a metric to assess cell-to-cell similarities (combined or not with a dimensionality reduction step), and ii) a graph-building algorithm (optionally making use of a cells-clustering step). The Sincell R package implements a methodological toolbox allowing flexible workflows under such a framework. Furthermore, Sincell contributes new algorithms to provide cell-state hierarchies with statistical support while accounting for stochastic factors in single-cell RNA-seq. Graphical representations and functional association tests are provided to interpret hierarchies.
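The two-step workflow above maps onto function calls roughly as follows, assuming the sc_* function names used in the Sincell vignette (a sketch):

library(sincell)
set.seed(1)
expr <- matrix(abs(rnorm(100 * 20)), nrow = 100)  # genes x cells
so <- sc_InitializingSincellObject(expr)
# i) cell-to-cell distance matrix, ii) graph-building step (here an MST).
so <- sc_distanceObj(so, method = "euclidean")
so <- sc_GraphBuilderObj(so, graph.algorithm = "MST")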
This package provides a collection of datasets of human-computer interaction (HCI) experiments. Each dataset is from an HCI paper, with all fields described and the original publication linked. All paper authors of included data have consented to the inclusion of their data in this package. The datasets include data from a range of HCI studies, such as pointing tasks, user experience ratings, and steering tasks. Dataset sources: Bergström et al. (2022) <doi:10.1145/3490493>; Dalsgaard et al. (2021) <doi:10.1145/3489849.3489853>; Larsen et al. (2019) <doi:10.1145/3338286.3340115>; Lilija et al. (2019) <doi:10.1145/3290605.3300676>; Pohl and Murray-Smith (2013) <doi:10.1145/2470654.2481307>; Pohl and Mottelson (2022) <doi:10.3389/frvir.2022.719506>.
Facilitates spatial and general latent Gaussian modeling using integrated nested Laplace approximation via the INLA package (<https://www.r-inla.org>). Additionally, extends the GAM-like model class to more general nonlinear predictor expressions, and implements a log Gaussian Cox process likelihood for modeling univariate and spatial point processes based on ecological survey data. Model components are specified with general inputs and mapping methods to the latent variables, and the predictors are specified via general R expressions, with separate expressions for each observation likelihood model in multi-likelihood models. A prediction method based on fast Monte Carlo sampling allows posterior prediction of general expressions of the latent variables. Ecology-focused introduction in Bachl, Lindgren, Borchers, and Illian (2019) <doi:10.1111/2041-210X.13168>.
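A minimal non-spatial model, assuming the bru() component syntax (a sketch; bare component names default to linear effects of same-named covariates):

library(inlabru)
df <- data.frame(x = rnorm(100))
df$y <- 2 + 1.5 * df$x + rnorm(100, sd = 0.1)
# Components map inputs to latent effects; the default predictor
# is the sum of the components.
cmp <- y ~ Intercept(1) + x
fit <- bru(cmp, data = df, family = "gaussian")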
This package provides a wrapper of the Non-Metric Space Library (NMSLIB, <https://github.com/nmslib/nmslib>), which according to the authors "is an efficient cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The goal of the NMSLIB Library is to create an effective and comprehensive toolkit for searching in generic non-metric spaces. Being comprehensive is important, because no single method is likely to be sufficient in all cases. Also note that exact solutions are hardly efficient in high dimensions and/or non-metric spaces. Hence, the main focus is on approximate methods". The wrapper also includes Approximate Kernel k-Nearest-Neighbor functions based on the NMSLIB Python Library.
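Usage follows the Python API fairly closely; a sketch assuming the NMSlib R6 class and its Knn_Query() method (argument names are assumptions to check against the reference manual):

library(nmslibR)
set.seed(1)
m <- matrix(runif(1000), nrow = 100)  # 100 points in 10 dimensions
# Build an HNSW index in an L2 space, then query the 5 approximate
# nearest neighbours of the first point.
idx <- NMSlib$new(input_data = m, method = "hnsw", space = "l2")
idx$Knn_Query(query_data_row = m[1, ], k = 5)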
Generates artificial point patterns marked by their spatial and temporal signatures. The resulting point cloud may exhibit inherent interactions between both signatures. The simulation integrates microsimulation (Holm, E., (2017) <doi:10.1002/9781118786352.wbieg0320>) and agent-based models (Bonabeau, E., (2002) <doi:10.1073/pnas.082080899>), beginning with the configuration of movement characteristics for the specified agents (referred to as 'walkers') and their interactions within the simulation environment. These interactions (Quaglietta, L. and Porto, M., (2019) <doi:10.1186/s40462-019-0154-8>) result in specific spatiotemporal patterns that can be visualized, analyzed, and used for various analytical purposes. Given the growing scarcity of detailed spatiotemporal data across many domains, this package provides an alternative data source for applications in social and life sciences.
This package provides a comprehensive analysis tool for metabolomics data. It consists of a variety of functional modules, including several new modules: a pre-processing module for normalization and imputation, an exploratory data analysis module for dimension reduction and source-of-variation analysis, a classification module with a new deep-learning method and other machine-learning methods, a prognosis module with Cox-PH and the neural-network-based Cox-nnet method, and a pathway analysis module to visualize pathways and interpret metabolite-pathway relationships. References: H. Paul Benton <http://www.metabolomics-forum.com/index.php?topic=281.0>; Jeff Xia <https://github.com/cangfengzhe/Metabo/blob/master/MetaboAnalyst/website/name_match.R>; Travers Ching, Xun Zhu, Lana X. Garmire (2018) <doi:10.1371/journal.pcbi.1006076>.
A general regression neural network (GRNN) is a variant of a radial basis function network characterized by fast single-pass learning. tsfgrnn allows you to forecast time series using a GRNN model (Francisco Martinez et al. (2019) <doi:10.1007/978-3-030-20521-8_17>; Francisco Martinez et al. (2022) <doi:10.1016/j.neucom.2021.12.028>). When the forecasting horizon is greater than 1, two multi-step ahead forecasting strategies can be used. The model built is autoregressive, that is, it is based only on the observations of the time series. You can consult and plot how each prediction was computed. It is also possible to assess the forecasting accuracy of the model using rolling-origin evaluation.
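A typical forecast, assuming the grnn_forecasting() interface (a sketch):

library(tsfgrnn)
# Forecast 12 months ahead, inspect how the prediction was built,
# and assess accuracy with rolling-origin evaluation.
pred <- grnn_forecasting(AirPassengers, h = 12)
plot(pred)
ro <- rolling_origin(pred, h = 6)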
This package implements a low-dimensional visualization of a set of cytometry samples, in order to visually assess the distances between them. This, in turn, can greatly help the user to identify quality issues like batch effects or outlier samples, and/or check the presence of potential sample clusters that might align with the experimental design. The CytoMDS algorithm combines, on the one hand, the concept of Earth Mover's Distance (EMD), a.k.a. the Wasserstein metric, and, on the other hand, the Multi-Dimensional Scaling (MDS) algorithm for the low-dimensional projection. The package also provides diagnostic tools for checking the quality of the MDS projection, as well as tools to help with the interpretation of the axes of the projection.
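The projection step itself is ordinary metric MDS; as a generic illustration (base R, not the CytoMDS API), a matrix of pairwise EMD values can be embedded in two dimensions with cmdscale():

# D stands in for a symmetric matrix of pairwise EMD values between samples.
set.seed(1)
D <- as.matrix(dist(matrix(rnorm(50), nrow = 10)))
coords <- cmdscale(as.dist(D), k = 2)  # 2-D coordinates preserving distances
plot(coords, xlab = "MDS dim 1", ylab = "MDS dim 2")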
Builds networks from RNA sequences, extracts characteristics of these networks using a threshold methodology, and uses those characteristics to classify the sequences into classes. Four data sets are present in the BASiNET package, "sequences", "sequences2", "sequences-predict" and "sequences2-predict", with 11, 10, 11 and 11 sequences respectively. These sequences were taken from the data set used in the article (Li, Aimin; Zhang, Junying; Zhou, Zhongyin, 2014) <doi:10.1186/1471-2105-15-311> and are used to run the examples. BASiNET was published in Nucleic Acids Research (Ito, Eric; Katahira, Isaque; Vicente, Fábio; Pereira, Felipe; Lopes, Fabrício, 2018) <doi:10.1093/nar/gky462>.
Streamlines Quarto workflows by providing tools for consistent project setup and documentation. Enables portability through reusable metadata, automated project structure creation, and standardized templates. Features include enhanced project initialization, pre-formatted Quarto documents, comprehensive data protection settings, custom styling, and structured documentation generation. Designed to improve efficiency and collaboration in R data science projects by reducing repetitive setup tasks while maintaining consistent formatting across multiple documents. The Posit team provides many valuable resources with in-depth explanations of customizing Quarto templates and theme styling: <https://quarto.org/docs/output-formats/html-themes.html#customizing-themes> and <https://quarto.org/docs/output-formats/html-themes-more.html>; see also the Bootstrap community's GitHub at <https://github.com/twbs/bootstrap/blob/main/scss/_variables.scss>.