This package performs complex string operations compactly and efficiently. It supports string interpolation jointly with over 50 string operations. It also enhances regular string functions (like grep() and co).
Extract glyph information from font data, and translate the outline curves to flattened paths or tessellated polygons. The converted data is returned as a data.frame in easy-to-plot format.
A streamgraph is a type of stacked area chart. It represents the evolution of a numeric variable for several groups. Areas are usually displayed around a central axis, and edges are rounded to give a flowing shape. This package provides an htmlwidget for building streamgraph visualizations.
This package provides a collection of functions to search and download street view imagery ('Mapillary' <https://www.mapillary.com/developer/api-documentation>) and to extract, quantify, and visualize visual features. It also provides functions to generate Qualtrics surveys in TXT format from the collected street views for various research purposes.
Offers a comprehensive approach for analysing stratified 2x2 contingency tables. It calculates odds ratios and 95% confidence intervals, and conducts chi-squared, Cochran-Mantel-Haenszel, Mantel-Haenszel, and Breslow-Day-Tarone tests. The package is particularly useful in fields like epidemiology and social sciences where stratified analysis is essential. The package also provides interpretative insights into the results, aiding in the understanding of statistical outcomes.
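As an illustration of the kind of stratified estimate mentioned above, the following is a minimal Python sketch of the Mantel-Haenszel pooled odds ratio. The function name and the (a, b, c, d) cell layout are assumptions made for this example; they are not the package's interface.

```python
def mantel_haenszel_or(tables):
    """Pooled odds ratio across strata.

    Each stratum is a tuple (a, b, c, d) for the 2x2 table
        exposed:   a cases, b non-cases
        unexposed: c cases, d non-cases
    (hypothetical layout chosen for this sketch).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d          # stratum size
        numerator += a * d / n     # weighted "concordant" products
        denominator += b * c / n   # weighted "discordant" products
    return numerator / denominator


strata = [(10, 20, 5, 40), (15, 10, 10, 20)]
pooled = mantel_haenszel_or(strata)
```

With a single stratum the estimate reduces to the ordinary odds ratio a*d / (b*c).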
This package provides a pilot matching design to automatically stratify and match large datasets. The manual_stratify() function allows users to manually stratify a dataset based on categorical variables of interest, while the auto_stratify() function does so automatically by allocating a held-aside (pilot) data set, fitting a prognostic score (see Hansen (2008) <doi:10.1093/biomet/asn004>) on the pilot set, and stratifying the data set based on prognostic score quantiles. The strata_match() function then does optimal matching of the data set in parallel within strata.
This package provides tools for testing, monitoring and dating structural changes in (linear) regression models. It features tests/methods from the generalized fluctuation test framework as well as from the F test (Chow test) framework. This includes methods to fit, plot and test fluctuation processes (e.g., CUSUM, MOSUM, recursive/moving estimates) and F statistics, respectively. It is possible to monitor incoming data online using fluctuation processes. Finally, the breakpoints in regression models with structural changes can be estimated together with confidence intervals. Emphasis is always given to methods for visualizing the data.
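To make the idea of a fluctuation process concrete, here is a minimal Python sketch of an OLS-based CUSUM process for the simplest case, an intercept-only model. The function name is hypothetical and the code is not taken from the package; it only shows how cumulated residuals drift away from zero when the mean shifts.

```python
import math
import statistics


def ols_cusum(y):
    """Scaled cumulative sums of residuals from the model y_i = mu + e_i."""
    n = len(y)
    mean = statistics.fmean(y)      # OLS estimate of the intercept
    sigma = statistics.stdev(y)     # residual standard deviation (n - 1 df)
    process = []
    running = 0.0
    for value in y:
        running += value - mean     # cumulate the residuals
        process.append(running / (sigma * math.sqrt(n)))
    return process


# Toy series (made up for this sketch) whose mean shifts after the fifth observation.
series = [0.1, -0.1, 0.0, 0.2, -0.2, 2.1, 1.9, 2.0, 2.2, 1.8]
process = ols_cusum(series)
peak = max(range(len(process)), key=lambda i: abs(process[i]))
```

The process returns to zero at the end because OLS residuals sum to zero, and its largest excursion sits at the last observation before the shift.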
This package provides an efficient method to recover the missing block of an approximately low-rank matrix. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design [Cai T, Cai TT, Zhang A (2016) <doi:10.1080/01621459.2015.1021005>]. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. The main function in our package, smc.FUN(), is for recovery of the missing block A22 of an approximately low-rank matrix A given the other blocks A11, A12, A21.
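The block structure described above can be written out as follows. The identity on the right holds in the idealized, exactly low-rank case; it is stated here only to show why the observed blocks can determine the missing one. The estimator in the cited paper addresses the approximately low-rank setting and is not reproduced here.

```latex
\[
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},
\qquad
\operatorname{rank}(A) = \operatorname{rank}(A_{11}) = r
\;\Longrightarrow\;
A_{22} = A_{21} A_{11}^{\dagger} A_{12},
\]
```

Here $A_{11}^{\dagger}$ denotes the Moore-Penrose pseudoinverse of the observed block $A_{11}$.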
This package provides drop-in replacements for functions from the stringr package, with the same user interface. These functions have no external dependencies and can be copied directly into your package code using the staticimports package.
This package provides functions for stratified sampling and assigning custom labels to data, ensuring randomness within groups. The package supports various sampling methods such as stratified, cluster, and systematic sampling. It allows users to apply transformations and customize the sampling process. This package can be useful for statistical analysis and data preparation tasks.
This package aims to quantify and remove putative double strand DNA from a strand-specific RNA sample. There are also options and methods to plot the positive/negative proportions of all sliding windows, which allow users to have an idea of how much the sample was contaminated and the appropriate threshold to be used for filtering.
An open source platform for validation and process control. Tools to analyze data from internal validation of forensic short tandem repeat (STR) kits are provided. The tools are developed to provide the necessary data to conform with guidelines for internal validation issued by the European Network of Forensic Science Institutes (ENFSI) DNA Working Group, and the Scientific Working Group on DNA Analysis Methods (SWGDAM). A front-end graphical user interface is provided. More information about each function can be found in the respective help documentation.
Random Forest-like tree ensemble that works with groups of predictor variables. When building a tree, a number of variables is taken randomly from each group separately, thus ensuring that it considers variables from each group for the splits. Useful when rows contain information about different things (e.g. user information and product information) and it's not sensible to make a prediction with information from only one group of variables, or when there are far more variables from one group than the other and it's desired to have groups appear evenly on trees. Trees are grown using the C5.0 algorithm rather than the usual CART algorithm. Supports parallelization (multithreaded), missing values in predictors, and categorical variables (without doing One-Hot encoding in the processing). Can also be used to create a regular (non-stratified) Random Forest-like model, but made up of C5.0 trees and with some additional control options. As it's built with C5.0 trees, it works only for classification (not for regression).
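The per-group sampling of candidate variables described above can be sketched in Python as follows. The group names, variable names, and function name are invented for this example, and the sketch covers only the sampling step, not the tree-growing algorithm.

```python
import random


def sample_variables_by_group(groups, per_group, rng):
    """Draw up to `per_group` variables at random from every group."""
    chosen = []
    for variables in groups.values():
        k = min(per_group, len(variables))   # a group may be smaller than the quota
        chosen.extend(rng.sample(variables, k))
    return chosen


# Hypothetical predictor groups.
groups = {
    "user": ["age", "country", "tenure"],
    "product": ["price", "category"],
}
candidates = sample_variables_by_group(groups, per_group=1, rng=random.Random(0))
```

Each call yields candidates from every group, so no tree is built from one group's variables alone.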
Regression-based ranking of pathogen strains with respect to their contributions to natural epidemics, using demographic and genetic data sampled in the course of the epidemics. This package also includes the GMCPIC test.
Characterize daily stream discharge and water quality data and subsample water quality data. Given dates, discharge, and water quality measurements, streamsampler can find gaps, get summary statistics, and subsample according to common stream sampling protocols. Stream sampling protocols are described in Lee et al. (2016) <doi:10.1016/j.jhydrol.2016.08.059> and Lee et al. (2019) <doi:10.3133/sir20195084>.
The Structstrings package implements the widely used dot bracket annotation for storing base pairing information in structured RNA. Structstrings uses the infrastructure provided by the Biostrings package and derives the DotBracketString and related classes from the BString class. From these, base pair tables can be produced for in-depth analysis. In addition, the loop indices of the base pairs can be retrieved as well. For better efficiency, information conversion is implemented in C, inspired to a large extent by the ViennaRNA package.
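How a dot bracket string maps to a base pair table can be shown with a short Python sketch. This is an illustration only: the function name is hypothetical, only round brackets are handled, and it does not mirror the package's C implementation.

```python
def base_pairs(dot_bracket):
    """Return sorted 1-based (opening, closing) positions of paired bases."""
    open_positions = []   # stack of unmatched "(" positions
    pairs = []
    for pos, symbol in enumerate(dot_bracket, start=1):
        if symbol == "(":
            open_positions.append(pos)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((open_positions.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


hairpin = base_pairs("((..))")
```

For the hairpin "((..))" this pairs position 1 with 6 and position 2 with 5; the two dots are unpaired bases.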
Implementation of analytical models for estimating streamflow depletion due to groundwater pumping, and other related tools. Functions are broadly split into two groups: (1) analytical streamflow depletion models, which estimate streamflow depletion for a single stream reach resulting from groundwater pumping; and (2) depletion apportionment equations, which distribute estimated streamflow depletion among multiple stream reaches within a stream network. See Zipper et al. (2018) <doi:10.1029/2018WR022707> for more information on depletion apportionment equations and Zipper et al. (2019) <doi:10.1029/2018WR024403> for more information on analytical depletion functions, which combine analytical models and depletion apportionment equations.
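One classic analytical model of the first kind is the Glover and Balmer (1954) solution, in which the depletion fraction is the complementary error function of a term built from well-to-stream distance, storativity, transmissivity, and time. The Python sketch below assumes that model; the function and parameter names are chosen for this example and may not match the package's functions.

```python
import math


def glover_fraction(distance, storativity, transmissivity, time):
    """Fraction of the pumping rate supplied by streamflow depletion.

    Assumes the Glover-Balmer setting: a fully penetrating straight stream,
    a homogeneous aquifer, and consistent units (e.g. metres and days).
    """
    if time <= 0:
        return 0.0   # no depletion before pumping starts
    z = math.sqrt(storativity * distance ** 2 / (4.0 * transmissivity * time))
    return math.erfc(z)


# Illustrative values (not from any source): well 100 m from the stream.
early = glover_fraction(100.0, 0.1, 100.0, 10.0)
late = glover_fraction(100.0, 0.1, 100.0, 100.0)
```

The fraction grows towards 1 as pumping continues, which is why `late` exceeds `early`.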
An extensive set of data (pre-)processing and analysis methods and tools for metabolomics and other omics, with a strong emphasis on statistics and machine learning. This toolbox allows the user to build extensive and standardised workflows for data analysis. The methods and tools have been implemented using class-based templates provided by the struct (Statistics in R Using Class-based Templates) package. The toolbox includes pre-processing methods (e.g. signal drift and batch correction, normalisation, missing value imputation and scaling), univariate (e.g. t-test, various forms of ANOVA, Kruskal–Wallis test and more) and multivariate statistical methods (e.g. PCA and PLS, including cross-validation and permutation testing) as well as machine learning methods (e.g. Support Vector Machines). The STATistics Ontology (STATO) has been integrated and implemented to provide standardised definitions for the different methods, inputs and outputs.
Includes bases for litholog generation: graphical functions based on R base graphics, interval management functions and svg importation functions among others. Also includes stereographic projection functions, and other functions made to deal with large datasets while keeping options to get into the details of the data. When using for publication please cite Sebastien Wouters, Anne-Christine Da Silva, Frederic Boulvain and Xavier Devleeschouwer, 2021. The R Journal 13:2, 153-178. The palaeomagnetism functions are based on: Tauxe, L., 2010. Essentials of Paleomagnetism. University of California Press. <https://earthref.org/MagIC/books/Tauxe/Essentials/>; Allmendinger, R. W., Cardozo, N. C., and Fisher, D., 2013, Structural Geology Algorithms: Vectors & Tensors: Cambridge, England, Cambridge University Press, 289 pp.; Cardozo, N., and Allmendinger, R. W., 2013, Spherical projections with OSXStereonet: Computers & Geosciences, v. 51, no. 0, p. 193 - 205, <doi:10.1016/j.cageo.2012.07.021>.
Computes the reliability of (normal) stress-strength models and builds two-sided or one-sided confidence intervals according to different approximate procedures.
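For orientation, the quantity being estimated can be computed directly when the parameters are known. The Python sketch below assumes independent normal stress and strength and evaluates R = P(stress < strength); the function name is hypothetical, and the package's confidence interval procedures are not reproduced.

```python
import math
from statistics import NormalDist


def stress_strength_reliability(mu_stress, sd_stress, mu_strength, sd_strength):
    """P(stress < strength) for independent normal stress and strength."""
    # strength - stress is normal with this mean and standard deviation
    mean_diff = mu_strength - mu_stress
    sd_diff = math.sqrt(sd_stress ** 2 + sd_strength ** 2)
    return NormalDist().cdf(mean_diff / sd_diff)


# Illustrative parameter values (not from any source).
reliability = stress_strength_reliability(10.0, 3.0, 13.0, 4.0)
```

When both means are equal the reliability is 0.5, and it approaches 1 as the strength mean moves above the stress mean.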
Pass named and unnamed character vectors into specified positions in strings. This represents an attempt to replicate some of Python's string formatting.
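For reference, the Python behaviour being replicated looks like this: str.format() fills unnamed placeholders in order from positional arguments and named placeholders from keyword arguments. The template text is made up for this example.

```python
# "{}" placeholders take positional arguments in order;
# "{year}" takes the keyword argument of the same name.
template = "{} was born in {year} in {}."
text = template.format("Ada", "London", year=1815)
print(text)  # Ada was born in 1815 in London.
```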
This package provides tools for using the StreamCat and LakeCat API and interacting with the StreamCat and LakeCat database. Convenience functions in the package wrap the API for StreamCat on <https://api.epa.gov/StreamCat/streams/metrics>.
The stress addition approach is an alternative to the traditional concentration addition or effect addition models. It allows the modelling of tri-phasic concentration-response relationships either as single toxicant experiments, in combination with an environmental stressor or as mixtures of two toxicants. See Liess et al. (2019) <doi:10.1038/s41598-019-51645-4> and Liess et al. (2020) <doi:10.1186/s12302-020-00394-7>.
Univariate stratification of survey populations with a generalization of the Lavallee-Hidiroglou method of stratum construction. The generalized method takes into account a discrepancy between the stratification variable and the survey variable. The determination of the optimal boundaries also incorporates, if desired, an anticipated non-response, a take-all stratum for large units, a take-none stratum for small units, and a certainty stratum to ensure that some specific units are in the sample. The well-known cumulative root frequency rule of Dalenius and Hodges and the geometric rule of Gunning and Horgan are also implemented.
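The cumulative root frequency rule can be sketched in a few lines of Python: bin the stratification variable, cumulate the square roots of the bin frequencies, and cut that cumulative scale into equal parts. The function name, the equal-width binning, and the choice to place boundaries on bin edges are assumptions of this sketch, not the package's implementation.

```python
import math


def cum_root_freq_boundaries(values, n_strata, n_bins):
    """Stratum boundaries from the cumulative sqrt(frequency) scale.

    Assumes the values are not all identical (bin width must be positive).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[index] += 1
    cumulative = []
    total = 0.0
    for count in counts:
        total += math.sqrt(count)
        cumulative.append(total)
    boundaries = []
    for h in range(1, n_strata):
        target = total * h / n_strata
        # first bin whose cumulative sqrt(f) reaches the target
        k = next(i for i, c in enumerate(cumulative) if c >= target)
        boundaries.append(lo + (k + 1) * width)  # upper edge of that bin
    return boundaries


cuts = cum_root_freq_boundaries(list(range(100)), n_strata=2, n_bins=5)
```

Finer bins give boundaries that track the target points on the cumulative scale more closely.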