Provides sets of functions and methods to learn and practice data science using the idea of algorithmic trading. The main goal is to process information within a "Decision Support System" to come up with analysis or predictions. There are several utilities, such as dynamic and adaptive risk management using reinforcement learning, and even functions to generate predictions of price changes using pattern recognition deep regression learning. Summary of methods used: Awesome H2O tutorials: <https://github.com/h2oai/awesome-h2o>, Market Type research of Van Tharp Institute: <https://vantharp.com/>, Reinforcement Learning R package: <https://CRAN.R-project.org/package=ReinforcementLearning>.
This package provides tools and demonstrates methods for working with individual undergraduate student-level records (registrar's data) in R. Tools include filters for program codes, data sufficiency, and timely completion. Methods include gathering blocs of records, computing quantitative metrics such as graduation rate, and creating charts to visualize comparisons. midfieldr interacts with practice data provided in midfielddata, an R data package available at <https://midfieldr.github.io/midfielddata/>. midfieldr also interacts with the full MIDFIELD database for users who have access. This work is supported by the US National Science Foundation through grant numbers 1545667 and 2142087.
It includes a test for multivariate normality, a test for uniformity on the d-dimensional sphere, non-parametric two- and k-sample tests, random generation of points from the Poisson kernel-based density, and a clustering algorithm for spherical data. For more information see Saraceno G., Markatou M., Mukhopadhyay R. and Golzy M. (2024) <doi:10.48550/arXiv.2402.02290>, Markatou, M. and Saraceno, G. (2024) <doi:10.48550/arXiv.2407.16374>, Ding, Y., Markatou, M. and Saraceno, G. (2023) <doi:10.5705/ss.202022.0347>, and Golzy, M. and Markatou, M. (2020) <doi:10.1080/10618600.2020.1740713>.
This package provides a high-level plotting system, compatible with `ggplot2` objects and maps from `sf`, `terra`, `raster`, and `sp`. It is built primarily on the grid package. The objective of the package is to provide a plotting system that is built for speed and modularity. This is useful for quick visualizations when testing code and for plotting multiple figures to the same device from sources that may be independent of one another (i.e., different functions or modules that create the visualizations). The suggested package fastshp can be installed from the repository (<https://PredictiveEcology.r-universe.dev>).
This package provides functions and data sets inspired by data sharpening - data perturbation to achieve improved performance in nonparametric estimation, as described in Choi, E., Hall, P. and Rousson, V. (2000). Capabilities for enhanced local linear regression function and derivative estimation are included, as well as an asymptotically correct iterated data sharpening estimator for any degree of local polynomial regression estimation. A cross-validation-based bandwidth selector is included which, in concert with the iterated sharpener, will often provide superior performance, according to a median integrated squared error criterion. Sample data sets are provided to illustrate function usage.
This package provides methods for generating modelled parametric Tropical Cyclone (TC) spatial hazard fields and time series output at point locations from TC tracks. R's ability to use fast C++ code via the Rcpp package, together with the wide range of spatial analysis tools in the terra package, makes it an attractive open source environment in which to study TCs. This package estimates TC vortex wind and pressure fields using parametric equations originally coded in Python by TCRM <https://github.com/GeoscienceAustralia/tcrm> and then coded in CUDA C++ by TCwindgen <https://github.com/CyprienBosserelle/TCwindgen>.
Algebraic procedures for analyses of multiple social networks are delivered with this package. multiplex makes it possible, among other things, to create and manipulate multiplex, multimode, and multilevel network data in different formats. Effective ways are available to treat multiple networks with routines that combine algebraic systems like the partially ordered semigroup with decomposition procedures or semiring structures with the relational bundles occurring in different types of multivariate networks. multiplex also provides an algebraic approach for affiliation networks through Galois derivations between families of the pairs of subsets in the two domains of the network, with visualization options.
A common task faced by researchers is the creation of APA style (i.e., American Psychological Association style) tables from statistical output. In R, a large number of function calls are often needed to obtain all of the desired information for a single APA style table. In addition, the process of manually creating APA style tables in a word processor is prone to transcription errors. This package creates Word files (.doc files) containing APA style tables for several types of analyses. Using this package minimizes transcription errors and reduces the number of commands needed by the user.
Fast C++ implementation of Dynamic Time Warping for time series dissimilarity analysis, with applications in environmental monitoring and sensor data analysis, climate science, signal processing and pattern recognition, and financial data analysis. Built upon the ideas presented in Benito and Birks (2020) <doi:10.1111/ecog.04895>, provides tools for analyzing time series of varying lengths and structures, including irregular multivariate time series. Key features include individual variable contribution analysis, restricted permutation tests for statistical significance, and imputation of missing data via GAMs. Additionally, the package provides an ample set of tools to prepare and manage time series data.
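As an illustration of the dynamic time warping idea mentioned above, the following is a minimal Python sketch of the classic dynamic-programming recurrence for two univariate series. It is not the package's implementation or API: the function name `dtw_distance` and the choice of absolute difference as the local cost are assumptions made for this example only.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two 1-D numeric sequences (hypothetical helper)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost: absolute difference
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat b[j-1]
                cost[i][j - 1],      # repeat a[i-1]
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]


if __name__ == "__main__":
    # Series of different lengths can still be aligned.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the alignment may repeat samples, two series that differ only by a stretched segment get a cost of zero, which is what makes the measure usable for time series of varying lengths.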
Dataset and functions to explore quality of literary novels. The package is a part of the Riddle of Literary Quality project, and it contains the data of a reader survey about fiction in Dutch, a description of the novels the readers rated, and the results of stylistic measurements of the novels. The package also contains functions to combine, analyze, and visualize these data. For more details, see: Eder M, van Zundert J, Lensink S, van Dalen-Oskam K (2022). Replicating The Riddle of Literary Quality: The litRiddle package for R. In _Digital Humanities 2022: Conference Abstracts_, 636-637.
Convenience functions and datasets to be used with Practical Multilevel Modeling using R. The package includes functions for calculating group means, group mean centered variables, and displaying some basic missing data information. A function for computing robust standard errors for linear mixed models based on Liang and Zeger (1986) <doi:10.1093/biomet/73.1.13> and Bell and McCaffrey (2002) <https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2002002/article/9058-eng.pdf?st=NxMjN1YZ> is included, as well as a function for checking for level-one homoskedasticity (Raudenbush & Bryk, 2002, ISBN:076191904X).
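Group means and group mean centered variables, as mentioned above, can be sketched in a few lines of Python. This is only an illustration of the computation, not the package's functions: the name `group_mean_center` and its list-based interface are hypothetical.

```python
from collections import defaultdict


def group_mean_center(values, groups):
    """Return (group means, group-mean-centered values); a hypothetical helper."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    # Accumulate the per-group sum and count in a single pass.
    for value, group in zip(values, groups):
        sums[group] += value
        counts[group] += 1
    means = {group: sums[group] / counts[group] for group in sums}
    # Centering subtracts each observation's own group mean.
    centered = [value - means[group] for value, group in zip(values, groups)]
    return means, centered


if __name__ == "__main__":
    means, centered = group_mean_center([1, 3, 10, 20], ["a", "a", "b", "b"])
    print(means, centered)
```

After centering, the values within each group sum to zero, so the centered variable carries only within-group variation.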
An implementation of data analysis tools for samples of symmetric or Hermitian positive definite matrices, such as collections of covariance matrices or spectral density matrices. The tools in this package can be used to perform: (i) intrinsic wavelet transforms for curves (1D) or surfaces (2D) of Hermitian positive definite matrices with applications to dimension reduction, denoising and clustering in the space of Hermitian positive definite matrices; and (ii) exploratory data analysis and inference for samples of positive definite matrices by means of intrinsic data depth functions and rank-based hypothesis tests in the space of Hermitian positive definite matrices.
This package provides functionalities for performing stability analysis of genotype by environment interaction (GEI) to identify superior and stable genotypes across diverse environments. It implements Eberhart and Russell's ANOVA method (1966) (<doi:10.2135/cropsci1966.0011183X000600010011x>), Finlay and Wilkinson's Joint Linear Regression method (1963) (<doi:10.1071/AR9630742>), Wricke's Ecovalence (1962, 1964), Shukla's stability variance parameter (1972) (<doi:10.1038/hdy.1972.87>), Kang's simultaneous selection for high yield and stability (1991) (<doi:10.2134/agronj1991.00021962008300010037x>), Additive Main Effects and Multiplicative Interaction (AMMI) method and Genotype plus Genotypes by Environment (GGE) Interaction methods.
The Subsemble algorithm is a general subset ensemble prediction method, which can be used for small, moderate, or large datasets. Subsemble partitions the full dataset into subsets of observations, fits a specified underlying algorithm on each subset, and uses a unique form of k-fold cross-validation to output a prediction function that combines the subset-specific fits. An oracle result provides a theoretical performance guarantee for Subsemble. The paper, "Subsemble: An ensemble method for combining subset-specific algorithm fits" is authored by Stephanie Sapp, Mark J. van der Laan & John Canny (2014) <doi:10.1080/02664763.2013.864263>.
The package ptairData contains two raw datasets from Proton-Transfer-Reaction Time-of-Flight mass spectrometer acquisitions (PTR-TOF-MS), in the HDF5 format: one from the exhaled air of two healthy volunteers with three replicates, and one from the cell culture headspace of two mycobacteria species and one control (culture medium only) with two replicates. These datasets are used in the examples and in the vignette of the ptairMS package (PTR-TOF-MS data pre-processing). They are also used to generate the ptrSet objects in the ptairMS data: exhaledPtrset and mycobacteriaSet.
This package provides density functions for the joint distribution of choice, response time and confidence for discrete confidence judgments as well as functions for parameter fitting, prediction and simulation for various dynamical models of decision confidence. All models are explained in detail by Hellmann et al. (2023; Preprint available at <https://osf.io/9jfqr/>, published version: <doi:10.1037/rev0000411>). Implemented models are the dynaViTE model, the dynWEV model, the 2DSD model (Pleskac & Busemeyer, 2010, <doi:10.1037/a0019737>), and various race models. C++ code for dynWEV and 2DSD is based on the rtdists package by Henrik Singmann.
The functions include constructing composite indicators, imputing missing data, and evaluating imputation techniques. Additionally, different tools for data normalization are provided. Detailed methodologies of the Indicator package are: OECD/European Union/EC-JRC (2008), "Handbook on Constructing Composite Indicators: Methodology and User Guide", OECD Publishing, Paris, <DOI:10.1787/533411815016>, Matteo Mazziotta & Adriano Pareto (2018), "Measuring Well-Being Over Time: The Adjusted Mazziotta-Pareto Index Versus Other Non-compensatory Indices" <DOI:10.1007/s11205-017-1577-5> and De Muro P., Mazziotta M., Pareto A. (2011), "Composite Indices of Development and Poverty: An Application to MDGs" <DOI:10.1007/s11205-010-9727-z>.
Kernel-based methods are powerful methods for integrating heterogeneous types of data. mixKernel aims at providing methods to combine kernels for unsupervised exploratory analysis. Different solutions are provided to compute a meta-kernel, in a consensus way or in a way that best preserves the original topology of the data. mixKernel also integrates kernel PCA to visualize similarities between samples in a non-linear space and from the multiple-source point of view <doi:10.1093/bioinformatics/btx682>. A method to select (as well as functions to display) important variables is also provided <doi:10.1093/nargab/lqac014>.
This package provides tools to assist in safely applying user-generated objective and derivative functions to optimization programs. These are primarily function minimization methods with at most bounds and masks on the parameters. It provides a way to check the basic computation of objective functions that the user provides, along with proposed gradient and Hessian functions, as well as to wrap such functions to avoid failures when inadmissible parameters are provided. Further tools check bounds and masks, check scaling or optimality conditions, and perform an axial search to seek lower points on the objective function surface. Forward, central and backward gradient approximation codes are included.
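The forward, central and backward gradient approximations named above are standard finite-difference formulas. The Python sketch below illustrates them under stated assumptions: the function name `grad_approx`, the fixed step size `h`, and the `method` argument are hypothetical and do not reproduce the package's interface or its step-size rules.

```python
def grad_approx(f, x, h=1e-6, method="central"):
    """Finite-difference gradient of f at x (a list of floats); hypothetical helper."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h   # step forward along coordinate i
        x_minus[i] -= h  # step backward along coordinate i
        if method == "forward":
            g = (f(x_plus) - fx) / h
        elif method == "backward":
            g = (fx - f(x_minus)) / h
        elif method == "central":
            g = (f(x_plus) - f(x_minus)) / (2 * h)
        else:
            raise ValueError("method must be 'forward', 'backward' or 'central'")
        grad.append(g)
    return grad


if __name__ == "__main__":
    # Example objective with known gradient (2 * x0, 3).
    objective = lambda p: p[0] ** 2 + 3 * p[1]
    print(grad_approx(objective, [1.0, 2.0], method="central"))
```

Comparing such an approximation with a user-supplied analytic gradient is one simple way to check that the gradient function is consistent with the objective; the central difference costs two evaluations per parameter but is typically more accurate than the one-sided variants.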
This package provides Azure Active Directory (AAD) authentication functionality for R users of Microsoft's Azure cloud <https://azure.microsoft.com/>. Use this package to obtain OAuth 2.0 tokens for services including Azure Resource Manager, Azure Storage and others. It supports both AAD v1.0 and v2.0, as well as multiple authentication methods, including device code and resource owner grant. Tokens are cached in a user-specific directory obtained using the rappdirs package. The interface is based on the OAuth framework in the httr package, but customised and streamlined for Azure. Part of the AzureR family of packages.
This package provides several functions to identify and analyse miRNA sponges, including popular methods for identifying miRNA sponge interactions, two types of global ceRNA regulation prediction methods and four types of context-specific prediction methods (Li Y et al. (2017) <doi:10.1093/bib/bbx137>), which are based on miRNA-messenger RNA regulation alone, or on integrating heterogeneous data, respectively. In addition, for predicted ceRNA relationship pairs, this package provides several downstream analysis algorithms, including regulatory network analysis and functional annotation analysis, as well as survival prognosis analysis based on expression of ceRNA ternary pairs.
Easy installation, loading and management of high-performance packages for statistical computing and data manipulation in R. The core fastverse consists of 4 packages: data.table, collapse, kit and magrittr, that jointly only depend on Rcpp. The fastverse can be freely and permanently extended with additional packages, both globally or for individual projects. Separate package verses can also be created. Fast packages for many common tasks such as time series, dates and times, strings, spatial data, statistics, data serialization, larger-than-memory processing, and compilation of R code are listed in the README file: <https://github.com/fastverse/fastverse#suggested-extensions>.
This package provides functions and datasets to support Valliant, Dever, and Kreuter (2018), <doi:10.1007/978-3-319-93632-1>, "Practical Tools for Designing and Weighting Survey Samples". Contains functions for sample size calculation for survey samples using stratified or clustered one-, two-, and three-stage sample designs, and single-stage audit sample designs. Functions are included that will group geographic units accounting for distances apart and measures of size. Other functions compute variance components for multistage designs, sample sizes in two-phase designs, and a stopping rule for ending data collection. A number of example data sets are included.
Objects to manipulate sequential and seasonal time series. Sequential time series based on time instants and time durations are handled. Both can be regularly or unevenly spaced (overlapping durations are allowed). Only POSIX* formats are used for dates and times. The following classes are provided: POSIXcti, POSIXctp, TimeIntervalDataFrame, TimeInstantDataFrame, SubtimeDataFrame; methods to switch from one class to another and to modify the time support of series (hourly time series to daily time series, for instance) are also defined. The tools provided can be used, for instance, to handle environmental monitoring data (not always produced on a regular time base).