Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is a technique to assess genome-wide chromatin accessibility by probing open chromatin with hyperactive mutant Tn5 transposase, which inserts sequencing adapters into open regions of the genome. ATACseqTFEA is an improvement on the current computational method that detects differential activity of transcription factors (TFs). ATACseqTFEA uses not only the difference in open-region information but also, with emphasis, the difference in TF footprints (cutting sites or insertion sites). ATACseqTFEA provides an easy, rigorous way to broadly assess TF activity changes between two conditions.
This package provides a set of utilities for matching products in different classification codes used in international trade research. It supports concordance between the Harmonized System (HS0, HS1, HS2, HS3, HS4, HS5, HS combined), the Standard International Trade Classification (SITC1, SITC2, SITC3, SITC4), the North American Industry Classification System (NAICS combined), as well as the Broad Economic Categories (BEC), the International Standard of Industrial Classification (ISIC), and the Standard Industrial Classification (SIC). It also provides code nomenclature/descriptions look-up, Rauch classification look-up (via concordance to SITC2), and trade elasticity look-up (via concordance to HS0 or SITC3 codes).
In meta-regression, studies sometimes report multiple effects that are correlated, so cluster-robust standard errors must be computed. Because the clusters are unbalanced, the wild bootstrap is suggested. See Oczkowski E. and Doucouliagos H. (2015). "Wine prices and quality ratings: a meta-regression analysis". American Journal of Agricultural Economics, 97(1): 103--121. <doi:10.1093/ajae/aau057> and Cameron A. C., Gelbach J. B. and Miller D. L. (2008). "Bootstrap-based improvements for inference with clustered errors". The Review of Economics and Statistics, 90(3): 414--427. <doi:10.1162/rest.90.3.414>.
This package provides functions for examining measurement invariance via equivalence testing. The traditionally used RMSEA (Root Mean Square Error of Approximation) cutoff values are adjusted based on simulation results. In addition, a projection-based method is implemented to test the equality of latent factor means across groups without assuming the equality of intercepts. For more information, see Yuan, K. H., & Chan, W. (2016) <doi:10.1037/met0000080>, Deng, L., & Yuan, K. H. (2016) <doi:10.1007/s11336-015-9491-8>, and Jiang, G., Mai, Y., & Yuan, K. H. (2017) <doi:10.3389/fpsyg.2017.01823>.
Testing soil for its content of organic carbon and available macro- and micro-nutrients is a crucial part of soil fertility assessment. This package computes some routinely tested soil properties, viz. organic carbon (C), total nitrogen (N), available N, mineral N, available phosphorus (P), available potassium (K), available iron (Fe), available zinc (Zn), available manganese (Mn), available copper (Cu), and available nickel (Ni) in soil, based on laboratory analysis data obtained by the most commonly followed protocols. It can also draw standard curves based on absorption/emission vs. concentration data, and give out unknown concentrations from absorption/emission readings.
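The standard-curve idea mentioned above can be illustrated with a minimal sketch. It assumes a straight-line relationship between reading and concentration, fitted by ordinary least squares; the function names and the numbers are hypothetical and are not taken from the package, which may fit its curves differently.

```python
def fit_standard_curve(concentrations, readings):
    """Fit reading = intercept + slope * concentration by ordinary least squares."""
    n = len(concentrations)
    if n != len(readings) or n < 2:
        raise ValueError("need at least two paired standards")
    mean_x = sum(concentrations) / n
    mean_y = sum(readings) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, readings))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_reading(reading, slope, intercept):
    """Invert the fitted line to estimate an unknown concentration."""
    if slope == 0:
        raise ValueError("a flat curve cannot be inverted")
    return (reading - intercept) / slope


# Hypothetical standards (ppm) and their absorbance readings.
standards_ppm = [0.0, 1.0, 2.0, 4.0, 8.0]
absorbance = [0.02, 0.12, 0.22, 0.42, 0.82]

slope, intercept = fit_standard_curve(standards_ppm, absorbance)
unknown_ppm = concentration_from_reading(0.32, slope, intercept)
print(f"slope={slope:.4f} intercept={intercept:.4f} unknown={unknown_ppm:.2f} ppm")
```

Readings outside the range covered by the standards are extrapolations and should be treated with caution.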
Forecasting competitions are of increasing importance as a means to learn best practices and gain knowledge. Data leakage is one of the most common issues found in competitions. Data leaks can happen when the training data contains information about the test data. Examples include randomly chosen blocks of time series concatenated to form a new time series, scale shifts, repeating patterns in time series, and white noise added to the original time series to form a new time series. The tsdataleaks package can be used to detect data leakages in a collection of time series.
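One way to look for the kinds of leak listed above is to slide a window along a candidate series and flag segments that correlate almost perfectly with the final observations of another series. The sketch below is an illustration under that assumption; the function names, threshold, and data are hypothetical, and this is not presented as the algorithm used by tsdataleaks.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def find_matching_windows(target_tail, candidate, threshold=0.999):
    """Return (start_index, correlation) for each window of `candidate`
    whose correlation with `target_tail` reaches the threshold."""
    width = len(target_tail)
    hits = []
    for start in range(len(candidate) - width + 1):
        r = pearson(target_tail, candidate[start:start + width])
        if r >= threshold:
            hits.append((start, r))
    return hits


# Hypothetical data: positions 3..6 of series_b are a scale-shifted copy
# (2 * x + 1) of the last four points of series_a.
series_a = [5, 7, 6, 9, 12, 10, 14, 13]
series_b = [1, 3, 2, 25, 21, 29, 27, 4, 6]

hits = find_matching_windows(series_a[-4:], series_b)
print(hits)
```

Because correlation is invariant to positive scaling and shifting, this check catches scale-shifted copies as well as exact ones; added noise lowers the correlation, so the threshold would need to be relaxed for that case.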
This package detects differential methylation. It exploits a Bayesian hidden Markov model that incorporates location dependence among genomic loci, unlike most existing methods that assume independence among observations. Bayesian priors are applied to permit information sharing across an entire chromosome for improved power of detection. The direct output of the package is the best sequence of methylation states, eliminating the need for a subjective, and often arbitrary, p-value threshold for determining significance. Finally, the methodology does not require replication in either or both of the two comparison groups.
This package provides R users with direct access to genomic and clinical data from the cBioPortal web resource via user-friendly functions that wrap cBioPortal's existing API endpoints <https://www.cbioportal.org/api/swagger-ui/index.html>. Users can browse and query genomic data on mutations, copy number alterations and fusions, as well as data on tumor mutational burden ('TMB'), microsatellite instability status ('MSI'), FACETS and select clinical data points (depending on the study). See <https://www.cbioportal.org/> and Gao et al., (2013) <doi:10.1126/scisignal.2004088> for more information on the cBioPortal web resource.
Implementation of selected Tidyverse functions within DataSHIELD, an open-source federated analysis solution in R. Currently, DataSHIELD contains very limited tools for data manipulation, so the aim of this package is to improve the researcher experience by implementing essential functions for data manipulation, including subsetting, filtering, grouping, and renaming variables. This is the serverside package, which should be installed on the server holding the data and is used in conjunction with the clientside package dsTidyverseClient, which is installed in the local R environment of the analyst. For more information, see <https://www.tidyverse.org/> and <https://datashield.org/>.
This package implements analytical methods for multidimensional plant traits, including Competitors-Stress tolerators-Ruderals strategy analysis using leaf traits, Leaf-Height-Seed strategy analysis, Niche Periodicity Table analysis, and Trait Network analysis. Provides functions for data analysis, visualization, and network metrics calculation. Methods are based on Grime (1974) <doi:10.1038/250026a0>, Pierce et al. (2017) <doi:10.1111/1365-2435.12882>, Westoby (1998) <doi:10.1023/A:1004327224729>, Yang et al. (2022) <doi:10.1016/j.foreco.2022.120540>, Winemiller et al. (2015) <doi:10.1111/ele.12462>, He et al. (2020) <doi:10.1016/j.tree.2020.06.003>.
Two new methods for multigroup analysis and simulation of data. The first technique, multigroup PCA (mgPCA), is a multivariate exploration approach that takes into account the structure of groups and/or different types of variables. The second technique, Multigroup Dimensionality Reduction (MDR), is another multivariate exploration method that is based on projections. In addition, a method called Single Dimension Exploration (SDE) is included for exploring the data. It helps to observe the behavior of multigroup data with respect to certain variables of interest.
This package provides functions and graphics for projecting daily incidence based on past incidence and estimates of the serial interval and reproduction number. Projections are based on a branching process using a Poisson-distributed number of new cases per day, similar to the model used for estimating R in EpiEstim or in earlyR, and described by Nouvellet et al. (2017) <doi:10.1016/j.epidem.2017.02.012>. The package provides the S3 class projections, which extends matrix, with accessors and additional helpers for handling, subsetting, merging, or adding these objects, as well as dedicated printing and plotting methods.
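A Poisson branching process of this kind can be sketched in a few lines. The example assumes the usual renewal formulation, in which the expected number of new cases on a day is the reproduction number times past incidence weighted by the serial interval distribution; the function names, parameter values, and the simple Poisson sampler are illustrative assumptions and not the package's interface.

```python
import math
import random


def poisson_draw(rate, rng):
    """Draw from Poisson(rate) with Knuth's multiplication method.
    Adequate for the small rates in this sketch; exp(-rate) underflows for very large rates."""
    if rate <= 0:
        return 0
    limit = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def project_incidence(past_incidence, serial_interval, reproduction_number,
                      n_days, rng):
    """Simulate one trajectory of daily incidence.
    serial_interval[k] is the probability that the serial interval is k + 1 days."""
    cases = list(past_incidence)
    for _ in range(n_days):
        # Past cases weighted by the serial interval distribution.
        infectiousness = sum(
            weight * cases[-(lag + 1)]
            for lag, weight in enumerate(serial_interval)
            if lag < len(cases)
        )
        cases.append(poisson_draw(reproduction_number * infectiousness, rng))
    return cases[len(past_incidence):]


# Hypothetical inputs: four days of observed incidence, a three-day
# serial interval distribution, and R = 1.5.
rng = random.Random(1)
trajectories = [
    project_incidence([1, 2, 4, 6], [0.2, 0.5, 0.3], 1.5, 7, rng)
    for _ in range(100)
]
mean_by_day = [sum(day) / len(day) for day in zip(*trajectories)]
print([round(m, 1) for m in mean_by_day])
```

Stacking many simulated trajectories as columns, one row per projected day, gives the kind of matrix-shaped result that the description above says the projections class extends.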
Estimation of the required sample size to validate a risk model for binary outcomes, based on the sample size equations proposed by Pavlou et al. (2021) <doi:10.1177/09622802211007522>. For precision-based sample size calculations, the user is required to enter the anticipated values of the C-statistic and outcome prevalence, which can be obtained from a previous study. The user also needs to specify the required precision (standard error) for the C-statistic, the calibration slope and the calibration-in-the-large. The calculations are valid under the assumption of marginal normality for the distribution of the linear predictor.
"The Soil Texture Wizard" is a set of R functions designed to produce texture triangles (also called texture plots, texture diagrams, texture ternary plots), and to classify and transform soil texture data. These functions allow plotting virtually any soil texture triangle (classification) in any triangle geometry (isosceles, right-angled triangles, etc.). This set of functions is expected to be useful to people using soil texture data from different soil texture classifications or different particle size systems. Many (> 15) texture triangles from all around the world are predefined in the package. A simple text-based graphical user interface is provided: soiltexture_gui().
This package performs a multiscale analysis of a nonparametric regression or nonparametric regressions with time series errors. In case of one regression, with the help of this package it is possible to detect the regions where the trend function is increasing or decreasing. In case of multiple regressions, the test identifies regions where the trend functions are different from each other. See Khismatullina and Vogt (2020) <doi:10.1111/rssb.12347>, Khismatullina and Vogt (2022) <doi:10.48550/arXiv.2209.10841> and Khismatullina and Vogt (2023) <doi:10.1016/j.jeconom.2021.04.010> for more details on theory and applications.
Facilitates the analysis of inter-limb and intra-limb coordination in human movement. It provides functions for calculating the phase angle between two segments, enabling researchers and practitioners to quantify the coordination patterns within and between limbs during various motor tasks. See Needham, R., Naemi, R., & Chockalingam, N. (2014) <doi:10.1016/j.jbiomech.2013.12.032>; Needham, R., Naemi, R., & Chockalingam, N. (2015) <doi:10.1016/j.jbiomech.2015.07.023>; Tepavac, D., & Field-Fote, E. C. (2001) <doi:10.1123/jab.17.3.259>; and Park, J.H., Lee, H., Cho, Js. et al. (2021) <doi:10.1038/s41598-020-80237-w>.
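One common way to quantify coordination between two segments is vector coding, in which an angle is derived at each time step from the change in the distal segment angle relative to the change in the proximal segment angle. The sketch below assumes that formulation and maps the result to the range 0 to 360 degrees; the function name and data are hypothetical, and the package's own functions may compute the phase angle differently.

```python
import math


def coupling_angles(proximal_deg, distal_deg):
    """Angle (degrees, 0 to 360) of the vector joining consecutive points
    on the proximal-vs-distal angle-angle diagram."""
    if len(proximal_deg) != len(distal_deg):
        raise ValueError("segment angle series must have equal length")
    angles = []
    for i in range(len(proximal_deg) - 1):
        d_proximal = proximal_deg[i + 1] - proximal_deg[i]
        d_distal = distal_deg[i + 1] - distal_deg[i]
        # atan2 handles all four quadrants; the modulo maps (-180, 180] to [0, 360).
        angles.append(math.degrees(math.atan2(d_distal, d_proximal)) % 360.0)
    return angles


# Hypothetical segment angles (degrees) sampled at four instants.
proximal = [10.0, 12.0, 14.0, 14.0]
distal = [5.0, 7.0, 7.0, 5.0]

gamma = coupling_angles(proximal, distal)
print([round(g, 1) for g in gamma])
```

In this example the first step has both segments rotating equally in the same direction, the second has only the proximal segment moving, and the third has only the distal segment moving in the negative direction. A step in which neither segment moves yields 0 here, which a real analysis would probably want to treat as undefined.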
This package provides a tool developed with the Golem framework that offers an easier way to check cell differences between two data frames. The user provides two data frames for comparison, selects the ID variables identifying each row of the input data, then clicks a button to perform the comparison. Several R package functions are used to describe the data and perform the comparison in the server of the application. The main ones are comparedf() from arsenal and skim() from skimr. For more details, see the description of comparedf() from the arsenal package and that of skim() from the skimr package.
This package provides tools for statistical analysis using partitioning-based least squares regression as described in Cattaneo, Farrell and Feng (2020a, <doi:10.48550/arXiv.1804.04916>) and Cattaneo, Farrell and Feng (2020b, <doi:10.48550/arXiv.1906.00202>): lsprobust() for nonparametric point estimation of regression functions and their derivatives and for robust bias-corrected (pointwise and uniform) inference; lspkselect() for data-driven selection of the IMSE-optimal number of knots; lsprobust.plot() for regression plots with robust confidence intervals and confidence bands; lsplincom() for estimation and inference for linear combinations of regression functions from different groups.
Computes A-, MV-, D- and E-optimal or near-optimal row-column designs for two-colour cDNA microarray experiments using the linear fixed effects and mixed effects models where the interest is in a comparison of all pairwise treatment contrasts. The algorithms used in this package are based on the array exchange and treatment exchange algorithms of Debusho, Gemechu and Haines (2018) <doi:10.1080/03610918.2018.1429617>, adapted to the row-column design setup. The package also provides an optional graphical user interface (GUI), built with the R package tcltk, to make it more user friendly.
Create an interactive pizza chart visualizing a specific player's statistics across various attributes in a sports dataset. The chart is constructed based on input parameters: data, a dataframe containing player data for any sport; player_stats_col, a vector specifying the names of the columns from the dataframe that will be used to create slices in the pizza chart, with statistics ranging between 0 and 100; name_col, specifying the name of the column in the dataframe that contains the player names; and player_name, representing the specific player whose statistics will be visualized in the chart, serving as the chart title.
The MSstatsLOBD package allows calculation and visualization of limit of blank (LOB) and limit of detection (LOD). We define the LOB as the highest apparent concentration of a peptide expected when replicates of a blank sample containing no peptides are measured. The LOD is defined as the measured concentration value for which the probability of falsely claiming the absence of a peptide in the sample is 0.05, given a probability 0.05 of falsely claiming its presence. These functionalities were previously a part of the MSstats package. The methodology is described in Galitzine (2018) <doi:10.1074/mcp.RA117.000322>.
Fit and simulate bivariate correlated frailty models with proportional hazard structure. Frailty distributions such as gamma and lognormal are supported for semiparametric procedures. Frailty variances of the two subjects can be varied or equal. Details on the models are available in the book of Wienke (2011, ISBN:978-1-4200-7388-1). The bivariate gamma fit is obtained using the approach given in Iachine (1995) with modifications. The lognormal fit is based on the approach by Ripatti and Palmgren (2000) <doi:10.1111/j.0006-341X.2000.01016.x>. Frailty distributions such as gamma, inverse Gaussian and power variance frailty models are supported for the parametric approach.
This package provides functions representing some useful empirical and data-driven models of heat loss, corrosion diagnostics, reliability and predictive maintenance of pipeline systems. The package is an option for technical engineering departments of heat generating and heat transfer companies that use or plan to use regulatory calculations in their activities. Methods are described in Timashev et al. (2016) <doi:10.1007/978-3-319-25307-7>, A.C.Reddy (2017) <doi:10.1016/j.matpr.2017.07.081>, Minenergo (2008) <https://docs.cntd.ru/document/902148459>, Minenergo (2005) <https://docs.cntd.ru/document/1200035568>, Xing LU. (2014) <doi:10.1080/23744731.2016.1258371>.
Samples large data such that spectral clustering is possible while preserving density information in edge weights. More specifically, given a matrix of coordinates as input, SamSPECTRAL first builds the communities to sample the data points. Then, it builds a graph and after weighting the edges by conductance computation, the graph is passed to a classic spectral clustering algorithm to find the spectral clusters. The last stage of SamSPECTRAL is to combine the spectral clusters. The resulting "connected components" estimate biological cell populations in the data. See the vignette for more details on how to use this package, some illustrations, and simple examples.