This package provides a collection of R functions that are widely used by the Petersen Lab. Included are functions for various purposes, including evaluating the accuracy of judgments and predictions, scoring assessments, generating correlation matrices, converting data between various types, data management, psychometric evaluation, extensions related to latent variable modeling, various plotting capabilities, and other miscellaneous useful functions. By making the package available, we hope to make our methods reproducible and replicable by others, and to help others perform their data processing and analysis more easily and efficiently. The codebase is provided in Petersen (2025) <doi:10.5281/zenodo.7602890> and on CRAN <doi:10.32614/CRAN.package.petersenlab>. The package is described in "Principles of Psychological Assessment: With Applied Examples in R" (Petersen, 2024, 2025a) <doi:10.1201/9781003357421>, <doi:10.25820/work.007199>, <doi:10.5281/zenodo.6466589> and in "Fantasy Football Analytics: Statistics, Prediction, and Empiricism Using R" (Petersen, 2025b).
This is an easy-to-use package for downloading, organizing, and integratively analyzing RNA expression data in GDC, with an emphasis on deciphering the lncRNA-mRNA-related ceRNA regulatory network in cancer. Three databases of lncRNA-miRNA interactions (spongeScan, starBase, and miRcode) and three databases of mRNA-miRNA interactions (miRTarBase, starBase, and miRcode) are incorporated into the package for ceRNA network construction. limma, edgeR, and DESeq2 can be used to identify differentially expressed genes/miRNAs. Functional enrichment analyses, including GO, KEGG, and DO, can be performed based on the clusterProfiler and DO packages. Both univariate CoxPH and KM survival analyses of multiple genes can be performed with the package. Besides routine visualization functions such as volcano plots, bar plots, and KM plots, a few simple Shiny apps are provided to facilitate visualization of results on a local webpage.
This package performs a permutation test using the difference in the restricted mean survival time (RMST) between groups as a summary measure of the survival time distribution. When the sample size is less than 50 per group, the commonly used asymptotic test for the RMST comparison has been shown to have a non-negligible inflation of the type I error rate. Permutation tests can be useful in such situations. However, when the permutation test is applied to the RMST comparison, particularly with small samples, there are cases where the survival function in either group cannot be defined because of censoring in the permutation process. Horiguchi and Uno (2020) <doi:10.1002/sim.8565> examined six workable solutions to this numerical issue, and the package performs permutation tests implementing all six methods from the paper whenever the issue arises during the permutation process. The result of the asymptotic test is also provided for reference.
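To make the mechanics concrete, here is a minimal Python sketch, not the package's R interface. The RMST up to a truncation time tau is computed as the area under the Kaplan-Meier curve on [0, tau], and group labels are shuffled to build the permutation distribution of the RMST difference. The function names are hypothetical. The sketch returns None when every subject in a group has left the risk set before tau and the last one was censored, which is the undefined case described above. Such permutations are simply dropped here; that is a naive placeholder and not necessarily one of the six methods of Horiguchi and Uno (2020).

```python
import random

def km_rmst(times, events, tau):
    """RMST up to tau: area under the Kaplan-Meier curve on [0, tau].

    Returns None when the curve is undefined before tau, i.e. every
    subject has left the risk set and the last one was censored.
    """
    # Sort by time; at tied times, events are processed before censorings.
    data = sorted(zip(times, events), key=lambda te: (te[0], -te[1]))
    at_risk, surv, area, prev_t, i = len(data), 1.0, 0.0, 0.0, 0
    while i < len(data) and data[i][0] <= tau:
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        area += surv * (t - prev_t)  # S is flat between distinct times
        prev_t = t
        surv *= 1 - deaths / at_risk
        at_risk -= removed
    if prev_t < tau:
        if at_risk == 0 and surv > 0:
            return None  # survival function undefined on (prev_t, tau]
        area += surv * (tau - prev_t)
    return area

def rmst_diff(times, events, groups, tau):
    """RMST(group 1) - RMST(group 0), or None if either is undefined."""
    r = {}
    for g in (0, 1):
        idx = [k for k, gk in enumerate(groups) if gk == g]
        r[g] = km_rmst([times[k] for k in idx], [events[k] for k in idx], tau)
        if r[g] is None:
            return None
    return r[1] - r[0]

def rmst_permutation_test(times, events, groups, tau, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the RMST difference."""
    obs = rmst_diff(times, events, groups, tau)
    if obs is None:
        raise ValueError("RMST undefined in the observed data; choose a smaller tau")
    rng = random.Random(seed)
    perm, hits, used = list(groups), 0, 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        d = rmst_diff(times, events, perm, tau)
        if d is None:  # naive placeholder: drop the permutation
            continue
        used += 1
        hits += abs(d) >= abs(obs) - 1e-12
    return obs, (hits + 1) / (used + 1), used
```

Returning the number of usable permutations alongside the p-value makes it visible how often the undefined case occurred, which is exactly the situation the six published remedies are designed to handle.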
Access Datastream content through <https://product.datastream.com/dswsclient/Docs/Default.aspx>. Datastream is our historical financial database, with over 35 million individual instruments or indicators across all major asset classes, including over 19 million active economic indicators. It features 120 years of data across 175 countries: the information you need to interpret market trends, economic cycles, and the impact of world events. Data spans bond indices, bonds, commodities, convertibles, credit default swaps, derivatives, economics, energy, equities, equity indices, ESG, estimates, exchange rates, fixed income, funds, fundamentals, interest rates, and investment trusts. Unique content includes I/B/E/S Estimates, Worldscope Fundamentals, point-in-time data, and Reuters Polls. Alongside the content sits a set of powerful analytical tools for exploring relationships between different asset types, with a library of customizable analytical functions. In-house time series can also be uploaded using the package to commingle with Datastream-maintained datasets, to be used with these analytical tools, and to be displayed in Datastream's flexible charting facilities in Microsoft Office.
This package provides data clustering based on the admixture ratios (Q matrix) of population structure. The framework is based on an iterative pruning procedure that clusters data by splitting a given population into subclusters until the stopping criteria are met, as in the ipPCA, iNJclust, and IPCAPS frameworks. The package also provides a function to retrieve a phylogenetic tree by constructing a neighbor-joining tree from a similarity matrix between clusters. Given multiple Q matrices with varying numbers of ancestors (K), the framework defines the similarity between clusters i and j as the minimum number K* at which the majority of members of the two clusters fall into different clusters. K* reflects the minimum number of ancestors needed to split clusters i and j into different clusters when individuals are assigned to K* clusters based on their maximum admixture ratio. The package is described in Chainarong Amornbunchornvej, Pongsakorn Wangkumhang, and Sissades Tongsima (2020) <doi:10.1101/2020.03.21.001206>.
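A minimal Python sketch of the K* similarity, under one reading of the description above. Each individual is assigned to the ancestry with its largest admixture ratio, and K* is taken as the smallest K at which the majority assignment of cluster i differs from that of cluster j. The function names, the dictionary-of-Q-matrices input, and this majority rule are illustrative assumptions, not the package's API. The resulting K* values could then fill the between-cluster matrix used for the neighbor-joining tree.

```python
def argmax(row):
    """Index of the largest admixture ratio (first one on ties)."""
    return max(range(len(row)), key=row.__getitem__)

def majority_label(labels):
    """Most frequent label; ties resolved by first occurrence."""
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return max(counts, key=counts.get)

def k_star(q_by_k, members_i, members_j):
    """Smallest K whose max-ratio assignment separates the two clusters.

    q_by_k maps K -> Q matrix (one row of K admixture ratios per individual);
    members_i / members_j are row indices of the two clusters.
    Returns None if no K in q_by_k separates them.
    """
    for k in sorted(q_by_k):
        q = q_by_k[k]
        lab_i = majority_label([argmax(q[m]) for m in members_i])
        lab_j = majority_label([argmax(q[m]) for m in members_j])
        if lab_i != lab_j:
            return k
    return None
```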
Fits single-species, multi-species, and integrated non-spatial and spatial occupancy models using Markov chain Monte Carlo (MCMC). Models are fit using Polya-Gamma data augmentation detailed in Polson, Scott, and Windle (2013) <doi:10.1080/01621459.2013.829001>. Spatial models are fit using either Gaussian processes or Nearest Neighbor Gaussian Processes (NNGP) for large spatial datasets. Details on NNGP models are given in Datta, Banerjee, Finley, and Gelfand (2016) <doi:10.1080/01621459.2015.1044091> and Finley, Datta, and Banerjee (2022) <doi:10.18637/jss.v103.i05>. Provides functionality for data integration of multiple single-species occupancy data sets using a joint likelihood framework. Details on data integration are given in Miller, Pacifici, Sanderlin, and Reich (2019) <doi:10.1111/2041-210X.13110>. Details on single-species and multi-species models are found in MacKenzie, Nichols, Lachman, Droege, Royle, and Langtimm (2002) <doi:10.1890/0012-9658(2002)083[2248:ESORWD]2.0.CO;2> and Dorazio and Royle (2005) <doi:10.1198/016214505000000015>, respectively.
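For orientation, the basic single-species occupancy likelihood of MacKenzie et al. (2002) can be written in a few lines. A site is occupied with probability psi, and an occupied site is detected on each visit with probability p. An all-zero detection history is therefore explained either by occupancy with repeated non-detection or by the site being unoccupied. This Python sketch uses constant psi and p and a crude grid-search maximum likelihood estimate. It is not how the package fits models (it uses Bayesian MCMC with Polya-Gamma augmentation and covariates), and the function names and example histories are hypothetical.

```python
import math

def site_likelihood(history, psi, p):
    """Likelihood of one site's detection history (list of 0/1 over visits)."""
    detect = math.prod(p if y else 1 - p for y in history)
    like = psi * detect        # site occupied, detections as observed
    if not any(history):
        like += 1 - psi        # all-zero history can also mean unoccupied
    return like

def occupancy_log_likelihood(histories, psi, p):
    return sum(math.log(site_likelihood(h, psi, p)) for h in histories)

# Crude grid-search MLE on hypothetical data (4 sites x 3 visits).
histories = [[1, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
grid = [k / 100 for k in range(1, 100)]
psi_hat, p_hat = max(
    ((a, b) for a in grid for b in grid),
    key=lambda ab: occupancy_log_likelihood(histories, *ab),
)
```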
An implementation of the multi-task Gaussian processes with common mean framework. Two main algorithms, called Magma and MagmaClust, are available to perform predictions for supervised learning problems, in particular for time series or any functional/continuous data applications. The corresponding articles were proposed, respectively, by Arthur Leroy, Pierre Latouche, Benjamin Guedj and Servane Gey (2022) <doi:10.1007/s10994-022-06172-1>, and Arthur Leroy, Pierre Latouche, Benjamin Guedj and Servane Gey (2023) <https://jmlr.org/papers/v24/20-1321.html>. These approaches leverage the learning of cluster-specific mean processes, which are common across similar tasks, to provide enhanced prediction performance (even far from data) at a linear computational cost (in the number of tasks). MagmaClust is a generalisation of Magma where the tasks are simultaneously clustered into groups, each associated with a specific mean process. User-oriented functions in the package are organised into training, prediction and plotting functions. Some basic features (classic kernels, training, prediction) of standard Gaussian processes are also implemented.
This package performs one-way tests in independent groups designs, including homoscedastic and heteroscedastic tests. These are one-way analysis of variance (ANOVA), Welch's heteroscedastic F test, Welch's heteroscedastic F test with trimmed means and Winsorized variances, Brown-Forsythe test, Alexander-Govern test, James second order test, Kruskal-Wallis test, Scott-Smith test, Box F test, Johansen F test, generalized tests equivalent to parametric bootstrap and fiducial tests, Alvandi's F test, Alvandi's generalized p-value, approximate F test, B square test, Cochran test, Weerahandi's generalized F test, modified Brown-Forsythe test, adjusted Welch's heteroscedastic F test, Welch-Aspin test, and permutation F test. The package also performs pairwise comparisons and provides graphical approaches. In addition, it includes Student's t test, Welch's t test and the Mann-Whitney U test for two samples. Moreover, it assesses variance homogeneity and normality of data in each group via tests and plots (Dag et al., 2018, <https://journal.r-project.org/archive/2018/RJ-2018-022/RJ-2018-022.pdf>).
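As an illustration of one of the listed tests, here is a minimal Python sketch of the Welch heteroscedastic F statistic using the standard textbook formulas. With group sizes n_i, means m_i and sample variances s_i^2, the weights are w_i = n_i / s_i^2. The statistic is compared with an F distribution on k - 1 and (k^2 - 1) / (3 * lambda) degrees of freedom. The function name is hypothetical and this is not the package's R interface.

```python
from statistics import mean, variance

def welch_f(groups):
    """Welch's heteroscedastic one-way F test statistic and degrees of freedom."""
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    w = [ni / variance(g) for ni, g in zip(n, groups)]  # precision weights
    total_w = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, m)) / total_w  # weighted grand mean
    a = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    lam = sum((1 - wi / total_w) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    b = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)
    return a / b, k - 1, (k ** 2 - 1) / (3 * lam)
```

With two groups the statistic reduces to the square of Welch's t and the denominator degrees of freedom to the Welch-Satterthwaite value. A p-value can then be taken from the upper tail of the F distribution, for example with `scipy.stats.f.sf(f, df1, df2)` if SciPy is available.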
The purpose of this library is to compute the optimal charging cost function for an electric vehicle (EV). It is well known that the charging function of an EV is a concave function that can be approximated by a piecewise linear function: the higher the state of charge, the slower the charging process. The other important function is the one that gives the electricity price. This function is usually step-wise, since the price of electricity depends on the time of day. Hence, charging an EV to a certain state of charge at minimum cost is not a trivial problem. This library implements an algorithm to compute the optimal charging cost function; that is, for a given state of charge r (between 0 and 1), it plots the minimum cost that must be paid to charge the EV to r. The details of the algorithm are described in González-Rodríguez et al. (2023) <https://inria.hal.science/hal-04362876v1>.
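To show why the problem is not trivial, here is a discretized Python sketch, not the algorithm of González-Rodríguez et al. (2023). Its assumptions are stated up front. Time is split into slots of length dt with a step-wise price per unit of state of charge. In each slot the EV either charges for the whole slot or idles. While charging, the state of charge advances along a concave piecewise-linear curve given by hypothetical breakpoints (time, SOC). Because early charging is fast and later charging is slow, which slots are chosen matters, and a dynamic program over "number of slots charged so far" finds the minimum cost exactly for this simplified model. All names are hypothetical.

```python
def soc_after(t, breakpoints):
    """SOC after t time units of charging from empty (piecewise-linear interpolation)."""
    if t >= breakpoints[-1][0]:
        return breakpoints[-1][1]
    for (t0, s0), (t1, s1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t <= t1:
            return s0 + (t - t0) / (t1 - t0) * (s1 - s0)

def min_charging_cost(prices, dt, breakpoints, r):
    """Minimum cost to reach SOC >= r using the given price slots (inf if impossible)."""
    n = len(prices)
    soc = [soc_after(j * dt, breakpoints) for j in range(n + 1)]
    inf = float("inf")
    dp = [0.0] + [inf] * n  # dp[j]: min cost so far having charged in j slots
    for k, price in enumerate(prices):
        new = dp[:]  # option 1: idle in slot k
        for j in range(k + 1):
            if dp[j] < inf:  # option 2: charge in slot k as the (j+1)-th charging slot
                cost = dp[j] + price * (soc[j + 1] - soc[j])
                new[j + 1] = min(new[j + 1], cost)
        dp = new
    feasible = [dp[j] for j in range(n + 1) if soc[j] >= r - 1e-12]
    return min(feasible) if feasible else inf
```

Evaluating `min_charging_cost` on a grid of r values between 0 and 1 traces out a discretized version of the optimal charging cost function described above.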
This package provides a quantitative and automated tool to extract (palaeo)biological information (e.g., measurements, velocities, and similarity metrics) from the analysis of tetrapod trackways. Methods implemented in the package draw from several sources, including Alexander (1976) <doi:10.1038/261129a0>, Batschelet (1981, ISBN:9780120810505), Benhamou (2004) <doi:10.1016/j.jtbi.2004.03.016>, Bovet and Benhamou (1988) <doi:10.1016/S0022-5193(88)80038-9>, Cheung et al. (2007) <doi:10.1007/s00422-007-0158-0>, Cheung et al. (2008) <doi:10.1007/s00422-008-0251-z>, Cleasby et al. (2019) <doi:10.1007/s00265-019-2761-1>, Farlow et al. (1981) <doi:10.1038/294747a0>, Ostrom (1972) <doi:10.1016/0031-0182(72)90049-1>, Rohlf (2008) <https://sbmorphometrics.org/>, Rohlf (2009) <https://sbmorphometrics.org/>, Ruiz and Torices (2013) <doi:10.1080/10420940.2012.759115>, Scrucca et al. (2016) <doi:10.32614/RJ-2016-021>, and Thulborn and Wade (1984) <https://www.museum.qld.gov.au/collections-and-research/memoirs/nature-21/mqm-n21-2-11-thulborn-wade>.
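As an example of the kind of velocity estimate drawn from these sources, here is a Python sketch of Alexander's (1976) speed formula, v = 0.25 * g^0.5 * SL^1.67 * h^-1.17. Here SL is stride length and h is hip height, both in metres, and v is in m/s. Hip height is usually not observed directly in trackways, so it is estimated from footprint length. The default multiplier of 4 below is a commonly used assumption and should be treated as a tunable parameter. The function names are hypothetical and do not reflect the package's API.

```python
import math

def hip_height_from_footprint(footprint_length, ratio=4.0):
    """Hip height estimated as ratio * footprint length (ratio is an assumption)."""
    return ratio * footprint_length

def alexander_speed(stride_length, hip_height, g=9.81):
    """Alexander (1976) speed estimate in m/s from stride length and hip height (m)."""
    return 0.25 * math.sqrt(g) * stride_length ** 1.67 * hip_height ** -1.17

# Example: 0.3 m footprints with a 1.8 m stride.
v = alexander_speed(1.8, hip_height_from_footprint(0.3))
```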
This package provides a framework for dynamically combining forecasting models for time series forecasting tasks. It leverages machine learning models from other packages to automatically combine expert advice using metalearning and other state-of-the-art forecast combination approaches. The predictive methods receive a data matrix as input, representing an embedded time series, and return a predictive ensemble model. The ensemble uses the generic functions predict() and forecast() to forecast future values of the time series. Moreover, an ensemble can be updated using methods such as update_weights() or update_base_models(). A complete description of the methods can be found in: Cerqueira, V., Torgo, L., Pinto, F., and Soares, C. "Arbitrated Ensemble for Time Series Forecasting." To appear in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer International Publishing, 2017; and Cerqueira, V., Torgo, L., and Soares, C. "Arbitrated Ensemble for Solar Radiation Forecasting." International Work-Conference on Artificial Neural Networks. Springer, 2017 <doi:10.1007/978-3-319-59153-7_62>.
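The "embedded time series" input can be illustrated with a short Python sketch. Each row holds k consecutive values, and the last column is treated as the target that the preceding lags predict. The second function shows a simple inverse-error weighted combination of base-model forecasts. That weighting is a deliberately simple stand-in, not the package's arbitrated (meta-learned) weighting scheme. Both function names are hypothetical.

```python
def embed_series(y, k):
    """Time-delay embedding: rows [y[t-k+1], ..., y[t]]; last column is the target."""
    if not 1 <= k <= len(y):
        raise ValueError("k must be between 1 and len(y)")
    return [list(y[t - k + 1:t + 1]) for t in range(k - 1, len(y))]

def combine(forecasts, recent_errors):
    """Weighted average of base forecasts, weights proportional to 1 / recent error.

    Stand-in for meta-learned weights; errors must be strictly positive.
    """
    inv = [1.0 / e for e in recent_errors]
    total = sum(inv)
    return sum(f * w / total for f, w in zip(forecasts, inv))
```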
This package extends sparse matrix and vector classes from the Matrix package by providing:
- Methods and operators that work natively on CSR formats (compressed sparse row, a.k.a. RsparseMatrix), such as slicing/sub-setting, assignment, rbind(), mathematical operators for CSR and COO such as addition or sqrt(), and methods such as diag();
- Multi-threaded matrix multiplication and cross-product for many <sparse, dense> types, including the float32 type from float;
- Coercion methods between pairs of classes which are not present in Matrix, such as from dgCMatrix to ngRMatrix, as well as convenience conversion functions;
- Utility functions for sparse matrices, such as sorting the indices or removing zero-valued entries;
- Fast transposes that work by outputting in the opposite storage format;
- Faster replacements for many Matrix methods for all sparse types, such as slicing and elementwise multiplication;
- Convenience functions for sparse objects, such as mapSparse or a shorter show method.
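The "fast transposes that work by outputting in the opposite storage format" point can be made concrete with a small Python sketch of the CSR layout (data, column indices, row pointers). The CSR arrays of a matrix A are, unchanged, the CSC arrays of its transpose, so transposing only swaps the shape and reinterprets the arrays. Row slicing in CSR is also cheap because each row is a contiguous range. This is a plain-list illustration of the storage formats with hypothetical function names, not the package's R API.

```python
def dense_to_csr(a):
    """Return (data, indices, indptr, shape) for a dense list-of-lists matrix."""
    data, indices, indptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                data.append(v)
                indices.append(j)  # column index of each stored value
        indptr.append(len(data))   # row i occupies data[indptr[i]:indptr[i+1]]
    return data, indices, indptr, (len(a), len(a[0]) if a else 0)

def csr_row(csr, i):
    """Dense copy of row i; only that row's contiguous slice is touched."""
    data, indices, indptr, (_, ncol) = csr
    row = [0] * ncol
    for p in range(indptr[i], indptr[i + 1]):
        row[indices[p]] = data[p]
    return row

def transpose_csr(csr):
    """Same arrays, reinterpreted as CSC of the transpose; only the shape changes."""
    data, indices, indptr, (nrow, ncol) = csr
    return data, indices, indptr, (ncol, nrow)

def csc_to_dense(data, indices, indptr, shape):
    """Expand CSC arrays (indptr over columns, indices are row numbers)."""
    nrow, ncol = shape
    out = [[0] * ncol for _ in range(nrow)]
    for j in range(ncol):
        for p in range(indptr[j], indptr[j + 1]):
            out[indices[p]][j] = data[p]
    return out
```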
Statistical hypothesis testing of pattern heterogeneity via differences in underlying distributions across multiple contingency tables. Five tests are included: the comparative chi-squared test (Song et al. 2014) <doi:10.1093/nar/gku086> (Zhang et al. 2015) <doi:10.1093/nar/gkv358>, the Sharma-Song test (Sharma et al. 2021) <doi:10.1093/bioinformatics/btab240>, the heterogeneity test, the marginal-change test (Sharma et al. 2020) <doi:10.1145/3388440.3412485>, and the strength test (Sharma et al. 2020) <doi:10.1145/3388440.3412485>. Under the null hypothesis that row and column variables are statistically independent and joint distributions are equal, all of their test statistics asymptotically follow a chi-squared distribution. A comprehensive type analysis categorizes the relation among the contingency tables into type null, 0, 1, and 2 (Sharma et al. 2020) <doi:10.1145/3388440.3412485>. The tests can identify heterogeneous patterns that differ in either the first order (marginal) or the second order (differential departure from independence). Second-order differences reveal more fundamental changes than first-order differences across heterogeneous patterns.
Perform non-bipartite matching and matched randomization. A "bipartite" matching uses two separate groups, e.g. smokers being matched to nonsmokers or cases being matched to controls. A "non-bipartite" matching creates mates from one big group, e.g. 100 hospitals being randomized for a two-arm cluster randomized trial, or 5000 children who have been exposed to various levels of secondhand smoke and are being paired to form a greater exposure vs. lesser exposure comparison. At the core of a non-bipartite matching is an N x N distance matrix for N potential mates. The distance between two units expresses a measure of similarity or quality as mates (the lower, the better). The gendistance() and distancematrix() functions assist in creating this matrix. The nonbimatch() function creates the matching that minimizes the total sum of distances between mates; hence, it is referred to as an "optimal" matching. The assign.grp() function aids in performing a matched randomization. Note that bipartite matching can be performed using the prevent option in gendistance().
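To make "optimal" concrete, here is a small Python sketch that finds the non-bipartite matching with the minimum total distance for an even number of units. It uses an exact dynamic program over subsets: always pair the lowest-indexed unmatched unit with some other unmatched unit. This takes exponential time, so it is only for illustration on small N; the package's nonbimatch() relies on a dedicated polynomial-time matching algorithm. The second function sketches a matched randomization by flipping a fair coin within each pair. Both names are hypothetical.

```python
import random
from functools import lru_cache

def optimal_nonbipartite_match(d):
    """Exact minimum-total-distance pairing for an N x N distance matrix (N even)."""
    n = len(d)
    if n % 2:
        raise ValueError("need an even number of units")
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def best(mask):  # mask: bit set of units already matched
        if mask == full:
            return 0.0, ()
        i = next(k for k in range(n) if not mask >> k & 1)  # lowest unmatched unit
        best_cost, best_pairs = float("inf"), ()
        for j in range(i + 1, n):
            if not mask >> j & 1:
                cost, pairs = best(mask | 1 << i | 1 << j)
                cost += d[i][j]
                if cost < best_cost:
                    best_cost, best_pairs = cost, ((i, j),) + pairs
        return best_cost, best_pairs

    return best(0)

def matched_randomization(pairs, seed=None):
    """Within each pair, randomly assign one unit to arm 'A' and the other to 'B'."""
    rng = random.Random(seed)
    arms = {}
    for a, b in pairs:
        if rng.random() < 0.5:
            a, b = b, a
        arms[a], arms[b] = "A", "B"
    return arms
```

For units at positions 0, 1, 10 and 11 with absolute-difference distances, the optimal matching pairs (0, 1) and (2, 3) with a total distance of 2.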
geneXtendeR optimizes the functional annotation of ChIP-seq peaks by exploring relative differences in annotating ChIP-seq peak sets to variable-length gene bodies. In contrast to prior techniques, geneXtendeR considers peak annotations beyond just the closest gene, allowing users to see peak summary statistics for the first-closest gene, second-closest gene, ..., n-closest gene whilst ranking the output according to biologically relevant events and iteratively comparing the fidelity of peak-to-gene overlap across a user-defined range of upstream and downstream extensions on the original boundaries of each gene's coordinates. Since different ChIP-seq peak callers produce different differentially enriched peaks with a large variance in peak length distribution and total peak count, annotating peak lists with their nearest genes can often be a noisy process. As such, the goal of geneXtendeR is to robustly link differentially enriched peaks with their respective genes, thereby aiding experimental follow-up and validation in designing primers for a set of prospective gene candidates during qPCR.
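The core idea of annotating a peak to its first-, second-, ..., n-closest gene after extending gene boundaries can be sketched in Python as follows. Upstream extension is strand-aware: it moves the start coordinate for "+" genes and the end coordinate for "-" genes. Distance is zero when the peak overlaps the extended gene body and otherwise equals the gap between the intervals. The function names, the tuple layouts, and this exact distance definition are illustrative assumptions, not geneXtendeR's implementation.

```python
def extend_gene(start, end, strand, upstream, downstream):
    """Extend gene coordinates; upstream depends on strand orientation."""
    if strand == "+":
        return start - upstream, end + downstream
    return start - downstream, end + upstream

def interval_distance(a_start, a_end, b_start, b_end):
    """0 if the closed intervals overlap, otherwise the gap between them."""
    if a_end < b_start:
        return b_start - a_end
    if b_end < a_start:
        return a_start - b_end
    return 0

def n_closest_genes(peak, genes, n, upstream=0, downstream=0):
    """Rank (distance, gene) for a peak after extending each gene body."""
    ps, pe = peak
    scored = []
    for name, start, end, strand in genes:
        gs, ge = extend_gene(start, end, strand, upstream, downstream)
        scored.append((interval_distance(ps, pe, gs, ge), name))
    scored.sort()
    return scored[:n]
```

Repeating `n_closest_genes` over a range of upstream and downstream extensions gives a simple way to see how peak-to-gene assignments shift as gene bodies grow.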
Researchers often use the bootstrap to understand a sample drawn from a population with an unknown distribution. The exact bootstrap method is a practical tool for exploring the distribution of small-sample data. For a sample of size n, the exact bootstrap method generates the entire space of n to the power of n (n^n) ordered resamples and calculates all realizations of the selected statistic. The exactamente package includes functions for implementing two bootstrap methods: the exact bootstrap and the regular bootstrap. The exact_bootstrap() function applies the exact bootstrap method following the methodology outlined in Kisielinska (2013) <doi:10.1007/s00180-012-0350-0>. The regular_bootstrap() function offers a more traditional bootstrap approach, where users can determine the number of resamples. The e_vs_r() function allows users to directly compare results from these bootstrap methods. To enhance the user experience, exactamente includes the function exactamente_app(), which launches an interactive Shiny web application. This application facilitates exploration and comparison of the bootstrap methods, providing options for modifying various parameters and visualizing results.
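A minimal Python sketch of the two approaches, not the exactamente R API. The exact bootstrap enumerates all n^n ordered resamples with `itertools.product` and evaluates the statistic on each. The regular bootstrap draws a chosen number of resamples with replacement. The function names mirror the package's functions only for readability and are hypothetical Python equivalents.

```python
import random
from itertools import product
from statistics import mean

def exact_bootstrap(x, stat=mean):
    """Statistic evaluated on all len(x) ** len(x) ordered resamples."""
    return [stat(r) for r in product(x, repeat=len(x))]

def regular_bootstrap(x, b, stat=mean, seed=None):
    """Statistic evaluated on b random resamples drawn with replacement."""
    rng = random.Random(seed)
    return [stat(rng.choices(x, k=len(x))) for _ in range(b)]
```

For the sample [1, 2, 3], the exact bootstrap yields 27 resampled means whose average is the sample mean, 2. Their variance is 2/9, the plug-in variance 2/3 divided by n = 3. The regular bootstrap only approximates these values, which is the comparison e_vs_r() is described as supporting. Because n^n grows very quickly, the exact version is practical only for small n.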
This package includes functions to assess the performance of risk models. It contains functions for the various measures that are used in empirical studies, including univariate and multivariate odds ratios (OR) of the predictors, the c-statistic (or area under the receiver operating characteristic (ROC) curve (AUC)), the Hosmer-Lemeshow goodness of fit test, the reclassification table, net reclassification improvement (NRI) and integrated discrimination improvement (IDI). Also included are functions to create plots, such as risk distributions, ROC curves, calibration plots, discrimination box plots and predictiveness curves. In addition to functions to assess the performance of risk models, the package includes functions to obtain weighted and unweighted risk scores as well as predicted risks using logistic regression analysis. These logistic regression functions are specifically written for models that include genetic variables, but they can also be applied to models that are based on non-genetic risk factors only. Finally, the package includes a function to construct a simulated dataset with genotypes, genetic risks, and disease status for a hypothetical population, which is used for the evaluation of genetic risk models.
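Two of the listed measures have short definitions that a Python sketch makes concrete. The c-statistic is the proportion of (case, non-case) pairs in which the case has the higher predicted risk, with ties counting one half. The IDI is the change in discrimination slope between an old and a new model, where the slope is the mean predicted risk in cases minus that in non-cases. The function names are hypothetical and this is not the package's R API.

```python
def c_statistic(risk, outcome):
    """Concordance (AUC): P(risk_case > risk_noncase), ties counted as 1/2."""
    cases = [r for r, y in zip(risk, outcome) if y == 1]
    controls = [r for r, y in zip(risk, outcome) if y == 0]
    score = 0.0
    for c in cases:
        for k in controls:
            score += 1.0 if c > k else 0.5 if c == k else 0.0
    return score / (len(cases) * len(controls))

def discrimination_slope(risk, outcome):
    """Mean predicted risk in cases minus mean predicted risk in non-cases."""
    cases = [r for r, y in zip(risk, outcome) if y == 1]
    controls = [r for r, y in zip(risk, outcome) if y == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)

def idi(risk_old, risk_new, outcome):
    """Integrated discrimination improvement of the new model over the old one."""
    return discrimination_slope(risk_new, outcome) - discrimination_slope(risk_old, outcome)
```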
This package provides functions that facilitate the harmonization of data from multiple datasets. Data harmonization involves taking data sources with differing values, writing coding instructions to create a harmonized set of values, and then making those data modifications. psHarmonize assists with the data modification once the harmonization instructions are written. The coding instructions are written by the user to create a "harmonization sheet". This sheet catalogs variable names and domains (e.g. clinical, behavioral, outcomes), provides R code instructions for mapping or converting data, specifies the variable name in the harmonized data set, and tracks notes. The package then harmonizes the source datasets according to the harmonization sheet to create a harmonized dataset. Once harmonization is finished, the package also provides functions that create descriptive statistics using RMarkdown. Data harmonization guidelines have been described by Fortier I, Raina P, Van den Heuvel ER, et al. (2017) <doi:10.1093/ije/dyw075>. Additional details of our R package have been described by Stephen JJ, Carolan P, Krefman AE, et al. (2024) <doi:10.1016/j.patter.2024.101003>.
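The harmonization-sheet workflow can be sketched in Python as follows. Each sheet row names a source study, a source variable, a domain, the harmonized variable, and a conversion rule. Applying the sheet to every record yields one harmonized table. In psHarmonize the rules are R code stored in the sheet; here they are Python callables. The field names, example studies, and values are hypothetical.

```python
# Hypothetical harmonization sheet: one row per (study, source variable) mapping.
SHEET = [
    {"study": "s1", "source_item": "sex", "item": "sex", "domain": "demographic",
     "code": lambda v: {"M": "male", "F": "female"}[v]},
    {"study": "s1", "source_item": "ht_cm", "item": "height_cm", "domain": "clinical",
     "code": lambda v: v},
    {"study": "s2", "source_item": "gender", "item": "sex", "domain": "demographic",
     "code": lambda v: {1: "male", 2: "female"}[v]},
    {"study": "s2", "source_item": "ht_in", "item": "height_cm", "domain": "clinical",
     "code": lambda v: v * 2.54},  # inches to centimetres
]

def harmonize(datasets, sheet):
    """Apply each study's sheet rows to its records; return one harmonized list."""
    harmonized = []
    for study, records in datasets.items():
        rules = [row for row in sheet if row["study"] == study]
        for rec in records:
            new = {"study": study, "id": rec["id"]}
            for rule in rules:
                new[rule["item"]] = rule["code"](rec[rule["source_item"]])
            harmonized.append(new)
    return harmonized
```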
DNA methylation is an epigenetic modification involved in genomic stability, gene regulation, development and disease. DNA methylation occurs mainly through the addition of a methyl group to cytosines, for example to cytosines in a CpG dinucleotide context (CpG stands for a cytosine followed by a guanine). Tissue-specific methylation patterns lead to genomic regions with different characteristic methylation levels. For example, in vertebrates, CpG islands (regions with high CpG content) that are associated with promoter regions of expressed genes tend to be unmethylated. MethEvolSIM is model-based simulation software for the generation and modification of cytosine methylation patterns along a given tree, which can be a genealogy of cells within an organism, a coalescent tree of DNA sequences sampled from a population, or a species tree. The simulations are based on an extension of the model of Grosser & Metzler (2020) <doi:10.1186/s12859-020-3438-5> and allow for changes of the methylation states at single cytosine positions as well as simultaneous changes of methylation frequencies in genomic structures like CpG islands.
This package provides methods for analyzing and using quartets displayed on a collection of gene trees, primarily to make inferences about the species tree or network under the multi-species coalescent model. These include quartet hypothesis tests for the model, as developed by Mitchell et al. (2019) <doi:10.1214/19-EJS1576>; simplex plots of quartet concordance factors, as presented by Allman et al. (2020) <doi:10.1101/2020.02.13.948083>; species tree inference methods based on the quartet distances of Rhodes (2019) <doi:10.1109/TCBB.2019.2917204> and Yourdkhani and Rhodes (2019) <doi:10.1007/s11538-020-00773-4>; the NANUQ algorithm for inference of level-1 species networks of Allman et al. (2019) <doi:10.1186/s13015-019-0159-2>; the TINNIK algorithm for inference of the tree of blobs of an arbitrary network of Allman et al. (2022) <doi:10.1007/s00285-022-01838-9>; and NANUQ+ routines for resolving multifurcations in the tree of blobs to cycles, as in Allman et al. (2024) (forthcoming). The software announcement is Rhodes et al. (2020) <doi:10.1093/bioinformatics/btaa868>.
Assists in the plotting and functional smoothing of traits measured over time and the extraction of features from these traits, implementing the SET (Smoothing and Extraction of Traits) method described in Brien et al. (2020) Plant Methods, 16. Smoothing of growth trends for individual plants using natural cubic smoothing splines or P-splines is available for removing transient effects, and segmented smoothing is available to deal with discontinuities in growth trends. There are graphical tools for assessing the adequacy of trait smoothing, both when using this package and when using other packages, such as those that fit nonlinear growth models. A range of per-unit (plant, pot, plot) growth traits or features can be extracted from the data, including single time points, interval growth rates and other growth statistics, such as maximum growth or days to maximum growth. The package also has tools adapted to inputting data from high-throughput phenotyping facilities, such as a LemnaTec Scanalyzer 3D (see <https://www.youtube.com/watch?v=MRAF_mAEa7E/> for more information). The package growthPheno can also be installed from <http://chris.brien.name/rpackages/>.
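Interval growth rates, one of the extracted traits mentioned above, follow standard definitions. The absolute growth rate (AGR) over an interval is the change in size divided by the elapsed time. The relative growth rate (RGR) is the change in log size divided by the elapsed time. This Python sketch computes both from raw observations; the package would typically apply them to smoothed values. The function name is hypothetical and not the package's API.

```python
import math

def interval_growth_rates(times, sizes):
    """Absolute and relative growth rates for each consecutive pair of observations."""
    agr, rgr = [], []
    for (t0, y0), (t1, y1) in zip(zip(times, sizes), zip(times[1:], sizes[1:])):
        dt = t1 - t0
        agr.append((y1 - y0) / dt)                      # units of size per day
        rgr.append((math.log(y1) - math.log(y0)) / dt)  # per day; sizes must be > 0
    return agr, rgr

# Maximum growth and the interval in which it occurs:
agr, rgr = interval_growth_rates([0, 1, 2, 3], [1.0, 2.0, 4.0, 5.0])
max_agr = max(agr)
max_interval = agr.index(max_agr)  # index of the interval with the fastest growth
```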
The differences in the RNA types being sequenced affect the resulting sequencing profiles. mRNA-seq data are enriched with reads derived from exons, while GRO-, nucRNA- and chrRNA-seq show substantially broader coverage of both exonic and intronic regions. The presence of intronic reads in GRO-seq-type data makes it possible to computationally identify and quantify all de novo continuous regions of transcription distributed across the genome. This type of data, however, is more challenging to interpret and less commonly used than mRNA-seq. One of the challenges for primary transcript detection concerns the simultaneous transcription of closely spaced genes, which needs to be properly divided into individually transcribed units. The R package transcriptR combines RNA-seq data with ChIP-seq data of histone modifications that mark active Transcription Start Sites (TSSs), such as H3K4me3 or H3K9/14Ac, to overcome this challenge. The advantage of this approach over using, for example, gene annotations is that it is data driven and is therefore also able to deal with novel and case-specific events.
Enhancing cross-language compatibility within the RStudio environment and supporting seamless language understanding, the deepRstudio package leverages the DeepL API (see <https://www.deepl.com/docs-api>) to enable fast, accurate, and affordable translation of code comments, documents, and text. This package can translate selected text into English (EN), as well as from English into various languages, namely Japanese (JA), Chinese (ZH), Spanish (ES), French (FR), Russian (RU), Portuguese (PT), and Indonesian (ID). Because much text is written in English, the emphasis is on translation from English. It is also designed for developers working on multilingual projects and for data analysts collaborating with international teams, simplifying the translation process and making code more accessible and comprehensible to people with diverse language backgrounds. The package uses the rstudioapi package and the DeepL API, has a simple implementation, and is run from addins or via shortcuts in RStudio. With just a few steps, content can be translated between supported languages, promoting better collaboration and expanding the global reach of work. The functionality of this package works only in RStudio, via rstudioapi.
Conduct numerous exploratory analyses in an instant with a point-and-click interface. With one simple command, this tool launches a Shiny App on the local machine. Drag and drop variables in a data set to categorize them as possible independent, dependent, moderating, or mediating variables. Then run dozens (or hundreds) of analyses instantly to uncover any statistically significant relationships among variables. Any relationship thus uncovered should be tested in follow-up studies. This tool is designed only to facilitate exploratory analyses and should NEVER be used for p-hacking. Many of the functions used in this package are previous versions of functions in the R packages kim and ezr. Selected References: Chang et al. (2021) <https://CRAN.R-project.org/package=shiny>. Dowle et al. (2021) <https://CRAN.R-project.org/package=data.table>. Kim (2023) <https://jinkim.science/docs/kim.pdf>. Kim (2021) <doi:10.5281/zenodo.4619237>. Kim (2020) <https://CRAN.R-project.org/package=ezr>. Simmons et al. (2011) <doi:10.1177/0956797611417632>. Tingley et al. (2019) <https://CRAN.R-project.org/package=mediation>. Wickham et al. (2020) <https://CRAN.R-project.org/package=ggplot2>.