This package provides a web-based Shiny interface for the StepReg package, which enables stepwise regression analysis across linear, generalized linear (including logistic, Poisson, Gamma, and negative binomial), and Cox models. It supports forward, backward, bidirectional, and best-subset selection under a range of criteria. The package also extends stepwise regression to multivariate settings, allowing multiple dependent variables to be modeled simultaneously. Users can explore and combine multiple selection strategies and criteria to optimize model selection. For enhanced robustness, the package offers optional randomized forward selection to reduce overfitting and a data-splitting workflow for more reliable post-selection inference. Additional features include logging and visualization of the selection process, as well as the ability to export results in common formats.
The StockDistFit package provides functions for fitting probability distributions to stock price data. The package uses maximum likelihood estimation to find the best-fitting distribution for a given stock. It also offers a function that fits several distributions to one or more assets, compares them using the Akaike Information Criterion (AIC), and selects the best distribution. References: Siew et al. (2008) <https://www.jstage.jst.go.jp/article/jappstat/37/1/37_1_1/_pdf/-char/ja> and Benth et al. (2008) <https://books.google.co.ke/books?hl=en&lr=&id=MHNpDQAAQBAJ&oi=fnd&pg=PR7&dq=Stochastic+modeling+of+commodity+prices+using+the+Variance+Gamma+(VG)+model.+&ots=YNIL2QmEYg&sig=XZtGU0lp4oqXHVyPZ-O8x5i7N3w&redir_esc=y#v=onepage&q&f=false>.
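The "fit by maximum likelihood, then pick the lowest AIC" workflow can be sketched in Python. This is a minimal illustration, not StockDistFit's API. It assumes a hypothetical candidate set of two distributions (normal and Laplace), chosen only because both have closed-form MLEs, and fits them to log returns rather than raw prices. AIC = 2k - 2 log L, where k is the number of fitted parameters. Lower AIC is better.

```python
import math
import statistics


def log_returns(prices):
    """Log returns r_t = log(P_t / P_{t-1}) from a list of prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def fit_normal(x):
    """Closed-form normal MLE; returns ((mu, sigma), log-likelihood)."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)  # MLE divides by n, not n - 1
    loglik = -n / 2 * math.log(2 * math.pi * sigma ** 2) - n / 2
    return (mu, sigma), loglik


def fit_laplace(x):
    """Closed-form Laplace MLE: location = median, scale = mean absolute deviation."""
    n = len(x)
    m = statistics.median(x)
    b = sum(abs(v - m) for v in x) / n
    loglik = -n * math.log(2 * b) - n
    return (m, b), loglik


def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2 log L (lower is better)."""
    return 2 * k - 2 * loglik


def best_fit(x):
    """Fit every candidate, score it by AIC, and return (best_name, scores)."""
    fits = {"normal": fit_normal(x), "laplace": fit_laplace(x)}
    scores = {name: aic(ll, len(params)) for name, (params, ll) in fits.items()}
    return min(scores, key=scores.get), scores
```

For a price series `prices`, call `best_fit(log_returns(prices))`. The returns must not be constant, because a zero scale makes the log-likelihood undefined.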
Random Forest-like tree ensemble that works with groups of predictor variables. When building a tree, a number of variables is taken at random from each group separately, which ensures that the splits consider variables from every group. This is useful in two situations. The first is when rows contain information about different things (e.g. user information and product information) and it is not sensible to make a prediction with information from only one group of variables. The second is when one group has far more variables than the other and the groups should appear evenly in the trees. Trees are grown using the C5.0 algorithm rather than the usual CART algorithm. Supports parallelization (multithreading), missing values in predictors, and categorical variables (without one-hot encoding them during processing). Can also be used to create a regular (non-stratified) Random Forest-like model made up of C5.0 trees, with some additional control options. Because it is built with C5.0 trees, it works only for classification (not for regression).
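The per-group variable sampling described above can be sketched in Python. This is a simplified illustration of the stratified sampling step only. It does not grow C5.0 trees, and the function name, group names, and variable names are hypothetical. Each group is sampled on its own, so every group contributes candidate variables no matter how large the other groups are.

```python
import random


def sample_per_group(groups, per_group, rng=None):
    """Draw up to `per_group` variables at random from each group separately.

    `groups` maps a group name to its list of variable names. Sampling each
    group on its own guarantees that every group is represented.
    """
    rng = rng or random.Random()
    chosen = []
    for variables in groups.values():
        k = min(per_group, len(variables))  # a small group contributes all its variables
        chosen.extend(rng.sample(variables, k))
    return chosen


# Hypothetical example: user-level and product-level predictors.
groups = {
    "user": ["age", "income", "region"],
    "product": ["price", "category"],
}
print(sample_per_group(groups, per_group=2, rng=random.Random(42)))
```

In a full ensemble, a draw like this would be repeated for every tree, and only the chosen variables would be offered to the tree-growing algorithm.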
Settings and functions to extend the knitr Stata engine.
Graphical and computational methods that can be used to assess the stability of results from supervised statistical learning.
Regression-based ranking of pathogen strains with respect to their contributions to natural epidemics, using demographic and genetic data sampled in the course of the epidemics. This package also includes the GMCPIC test.
Collection of spatial transcriptomics datasets stored in SpatialExperiment Bioconductor format, for use in examples, demonstrations, and tutorials. The datasets come from several different platforms and have been obtained from various publicly available sources. Several datasets include images and/or reference annotation labels.
S4 class wrappers for ODBC and pool DBI connections. Also provides some utilities to paste small datasets to the clipboard and to rename columns. It is used by the stacomiR package for connections to the database. Development versions of stacomiR are available on R-Forge.
This package provides functions to compute standardized differences for numeric, binary, and categorical variables on Apache Spark DataFrames using 'sparklyr'. The implementation mirrors the methods used in the stddiff package but operates on distributed data. See Zhicheng Du, Yuantao Hao (2022) <doi:10.32614/CRAN.package.stddiff> for reference.
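For the numeric and binary cases, standardized differences are commonly defined as (mean1 - mean0) / sqrt((s1^2 + s0^2) / 2) and (p1 - p0) / sqrt((p1(1 - p1) + p0(1 - p0)) / 2). The Python sketch below computes both on in-memory lists rather than Spark DataFrames. It assumes these common definitions and signed results, and it omits the multivariate categorical case. Exact agreement with stddiff's output is an assumption, not something confirmed here.

```python
import math
import statistics


def std_diff_numeric(treated, control):
    """Signed standardized difference of means, pooling the two sample variances."""
    m1, m0 = statistics.mean(treated), statistics.mean(control)
    v1, v0 = statistics.variance(treated), statistics.variance(control)  # n - 1 denominators
    return (m1 - m0) / math.sqrt((v1 + v0) / 2)


def std_diff_binary(treated, control):
    """Signed standardized difference of proportions for 0/1 indicators."""
    p1 = sum(treated) / len(treated)
    p0 = sum(control) / len(control)
    return (p1 - p0) / math.sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)
```

Both formulas depend only on per-group counts, sums, and sums of squares. This is why they adapt well to distributed data: the aggregates can be computed per partition and combined before the final division.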
Characterize daily stream discharge and water quality data, and subsample water quality data. Given dates, discharge, and water quality measurements, streamsampler can find gaps, compute summary statistics, and subsample according to common stream sampling protocols. Stream sampling protocols are described in Lee et al. (2016) <doi:10.1016/j.jhydrol.2016.08.059> and Lee et al. (2019) <doi:10.3133/sir20195084>.
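Finding gaps in a daily record and thinning it to a fixed sampling frequency can be sketched in Python. This is an illustration under stated assumptions, not streamsampler's interface. The function names are hypothetical. "Keep the first observation in each calendar month" is a stand-in protocol chosen for simplicity, not one of the protocols from Lee et al.

```python
from datetime import date, timedelta


def find_gaps(dates):
    """Return (first_missing_day, last_missing_day) for each run of missing days."""
    observed = sorted(set(dates))
    gaps = []
    for prev, curr in zip(observed, observed[1:]):
        if (curr - prev).days > 1:  # at least one day is missing between them
            gaps.append((prev + timedelta(days=1), curr - timedelta(days=1)))
    return gaps


def subsample_monthly(records):
    """Keep the earliest (date, value) record in each calendar month (hypothetical protocol)."""
    kept = {}
    for d, value in sorted(records):
        kept.setdefault((d.year, d.month), (d, value))  # first record per month wins
    return list(kept.values())


days = [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 5)]
print(find_gaps(days))
```

A real protocol would typically also condition on discharge (for example, adding storm-event samples). The monthly rule here only shows the mechanics of subsampling by date.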
This package provides a collection of statistical and geometrical tools including the aligned rank transform (ART; Higgins et al. 1990 <doi:10.4148/2475-7772.1443>; Peterson 2002 <doi:10.22237/jmasm/1020255240>; Wobbrock et al. 2011 <doi:10.1145/1978942.1978963>), 2-D histograms and histograms with overlapping bins, a function for making all possible formulae within a set of constraints, amongst others.
The goal of statcodelists is to promote the reuse and exchange of statistical information and related metadata by making the internationally standardized SDMX code lists available to R users. SDMX has been published as an ISO International Standard (ISO 17369). The metadata definitions, including the code lists, are updated regularly according to the standard. The authoritative version of the code lists made available in this package is <https://sdmx.org/?page_id=3215/>.
The Structstrings package implements the widely used dot-bracket annotation for storing base pairing information in structured RNA. Structstrings uses the infrastructure provided by the Biostrings package and derives the DotBracketString and related classes from the BString class. From these, base pair tables can be produced for in-depth analysis. In addition, the loop indices of the base pairs can be retrieved. For better efficiency, information conversion is implemented in C, inspired to a large extent by the ViennaRNA package.
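Converting a dot-bracket string into a base pair table is a stack-matching problem. The Python sketch below assumes a ViennaRNA-style table: index 0 holds the sequence length, entry i holds the 1-based partner of position i, and 0 marks an unpaired base. It keeps a separate stack for each bracket type ("()", "<>", "[]", "{}") so that simple pseudoknots can be represented. This is not Structstrings' API, and its output layout may differ.

```python
# Map each closing bracket to its opening bracket.
PAIRS = {")": "(", ">": "<", "]": "[", "}": "{"}


def pair_table(structure):
    """Return [n, partner_1, ..., partner_n] for a dot-bracket string (0 = unpaired)."""
    n = len(structure)
    table = [0] * (n + 1)
    table[0] = n
    stacks = {opener: [] for opener in PAIRS.values()}  # one stack per bracket type
    for i, ch in enumerate(structure, start=1):
        if ch in stacks:
            stacks[ch].append(i)
        elif ch in PAIRS:
            stack = stacks[PAIRS[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {i}")
            j = stack.pop()  # most recent unmatched opener of the same type
            table[i], table[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at position {stack[-1]}")
    return table


print(pair_table("((..))"))
```

Matching each closer against the stack of its own bracket type is what allows crossing pairs such as "(<)>" to be encoded, which a single shared stack would reject.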
Implementation of analytical models for estimating streamflow depletion due to groundwater pumping, and other related tools. Functions are broadly split into two groups: (1) analytical streamflow depletion models, which estimate streamflow depletion for a single stream reach resulting from groundwater pumping; and (2) depletion apportionment equations, which distribute estimated streamflow depletion among multiple stream reaches within a stream network. See Zipper et al. (2018) <doi:10.1029/2018WR022707> for more information on depletion apportionment equations and Zipper et al. (2019) <doi:10.1029/2018WR024403> for more information on analytical depletion functions, which combine analytical models and depletion apportionment equations.
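A depletion apportionment equation splits an estimated total depletion among stream reaches. The Python sketch below assumes a simple inverse-distance weighting, where reach i receives the fraction (1/d_i^w) / sum_j (1/d_j^w) and d_i is a hypothetical well-to-reach distance. Setting w = 1 gives inverse distance and w = 2 gives inverse distance squared. The function names are hypothetical. The sketch does not reproduce the package's own functions or its analytical depletion models.

```python
def inverse_distance_fractions(distances, w=1.0):
    """Fraction of total depletion assigned to each reach, weighting by 1 / d**w."""
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")
    weights = [1.0 / d ** w for d in distances]
    total = sum(weights)
    return [wt / total for wt in weights]  # fractions sum to 1


def apportion(total_depletion, distances, w=1.0):
    """Split a total depletion estimate among reaches using inverse-distance fractions."""
    return [total_depletion * f for f in inverse_distance_fractions(distances, w)]


# Hypothetical example: three reaches at distances 1, 1 and 2 from the well.
print(apportion(10.0, [1.0, 1.0, 2.0]))
```

A larger w concentrates depletion on the closest reaches. This makes w a convenient single knob for comparing apportionment assumptions.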
An extensive set of data (pre-)processing and analysis methods and tools for metabolomics and other omics, with a strong emphasis on statistics and machine learning. This toolbox allows the user to build extensive and standardised workflows for data analysis. The methods and tools have been implemented using class-based templates provided by the struct (Statistics in R Using Class-based Templates) package. The toolbox includes pre-processing methods (e.g. signal drift and batch correction, normalisation, missing value imputation and scaling), univariate methods (e.g. t-test, various forms of ANOVA, Kruskal–Wallis test and more), multivariate statistical methods (e.g. PCA and PLS, including cross-validation and permutation testing), and machine learning methods (e.g. Support Vector Machines). The STATistics Ontology (STATO) has been integrated and implemented to provide standardised definitions for the different methods, inputs and outputs.
Includes the basics for litholog generation: graphical functions based on R base graphics, interval management functions, and SVG import functions, among others. Also includes stereographic projection functions and other functions designed to handle large datasets while keeping the option to look into the details of the data. When using for publication please cite Sebastien Wouters, Anne-Christine Da Silva, Frederic Boulvain and Xavier Devleeschouwer, 2021. The R Journal 13:2, 153-178. The palaeomagnetism functions are based on: Tauxe, L., 2010. Essentials of Paleomagnetism. University of California Press. <https://earthref.org/MagIC/books/Tauxe/Essentials/>; Allmendinger, R. W., Cardozo, N. C., and Fisher, D., 2013, Structural Geology Algorithms: Vectors & Tensors: Cambridge, England, Cambridge University Press, 289 pp.; Cardozo, N., and Allmendinger, R. W., 2013, Spherical projections with OSXStereonet: Computers & Geosciences, v. 51, no. 0, p. 193-205, <doi:10.1016/j.cageo.2012.07.021>.
This package provides non-statistical utilities used by the software developed by the Statnet Project.
Functions for estimating the reliability of (normal) stress-strength models and for building two-sided or one-sided confidence intervals according to different approximate procedures.
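For independent normal stress X ~ N(mu_x, sd_x^2) and strength Y ~ N(mu_y, sd_y^2), the difference Y - X is normal with mean mu_y - mu_x and variance sd_x^2 + sd_y^2. The reliability is therefore R = P(X < Y) = Phi((mu_y - mu_x) / sqrt(sd_x^2 + sd_y^2)), where Phi is the standard normal CDF. The Python sketch below computes this value and a plug-in estimate from samples. The function names are hypothetical. The package's approximate confidence-interval procedures are not described here, so none are reproduced.

```python
import math
from statistics import NormalDist, mean, stdev


def normal_reliability(mu_stress, sd_stress, mu_strength, sd_strength):
    """R = P(stress < strength) for independent normal stress and strength."""
    z = (mu_strength - mu_stress) / math.sqrt(sd_stress ** 2 + sd_strength ** 2)
    return NormalDist().cdf(z)  # standard normal CDF evaluated at z


def reliability_from_samples(stress, strength):
    """Plug-in estimate: replace the true parameters with sample means and SDs."""
    return normal_reliability(mean(stress), stdev(stress), mean(strength), stdev(strength))


print(normal_reliability(0.0, 3.0, 5.0, 4.0))  # z = 5 / 5 = 1, so this is Phi(1)
```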
Get programmatic access to data from the Czech public budgeting and accounting database, Státní pokladna <https://monitor.statnipokladna.gov.cz/>.
Pass named and unnamed character vectors into specified positions in strings. This is an attempt to replicate some of Python's string formatting.
This package provides functions for retrieving general and specific data from the Norwegian Parliament, through the Norwegian Parliament API at <https://data.stortinget.no>.
This package provides R bindings for the Stencila Schema <https://schema.stenci.la>. This package is primarily aimed at R developers wanting to programmatically generate, or modify, executable documents.
This is my collection of R Markdown templates, mostly for compilation to PDF. They are useful for all things academic and professional if you use R Markdown for documents such as your CV, articles, and manuscripts.
Univariate and multivariate normal data simulation. The package also supplies a brief summary of the analysis for each experiment/design: independent samples; one-way and two-way ANOVA; paired samples (t-test and regression); repeated measures (ANOVA and multiple regression); and clinical assays.