This package provides functions for estimating a (c)DCC-GARCH model in large dimensions, based on the publications by Engle et al. (2017) <doi:10.1080/07350015.2017.1345683> and Nakagawa et al. (2018) <doi:10.3390/ijfs6020052>. The estimation method combines the composite likelihood method of Pakel et al. (2014) <http://paneldataconference2015.ceu.hu/Program/Cavit-Pakel.pdf> with (non-)linear shrinkage estimation of covariance matrices by Ledoit and Wolf (2004, 2015, 2016) (<doi:10.1016/S0047-259X(03)00096-4>, <doi:10.1214/12-AOS989>, <doi:10.1016/j.jmva.2015.04.006>).
Algebraic procedures for the analysis of multiple social networks are delivered with this package. multiplex makes it possible, among other things, to create and manipulate multiplex, multimode, and multilevel network data in different formats. Effective ways are available to treat multiple networks with routines that combine algebraic systems, such as the partially ordered semigroup with decomposition procedures or semiring structures, with the relational bundles occurring in different types of multivariate networks. multiplex also provides an algebraic approach to affiliation networks through Galois derivations between families of the pairs of subsets in the two domains of the network, with visualization options.
This package provides an automatic aggregation tool to manage point data privacy, intended to be helpful for the production of official spatial data and for researchers. The package pursues data accuracy at the smallest possible areas while preventing the disclosure of individual information. The methodology, based on hierarchical geographic data structures, performs aggregation and local suppression of point data to ensure privacy, as described in Lagonigro, R., Oller, R., Martori, J.C. (2017) <doi:10.2436/20.8080.02.55>. The data structures are created following the guidelines for grid datasets from the European Forum for Geography and Statistics.
Estimate the Šesták-Berggren kinetic model (degradation model) from experimental data. A closed-form (analytic) solution to the degradation model is implemented as a non-linear fit, allowing for the extrapolation of the degradation of a drug product, both in time and in temperature. Parametric bootstrap, with kinetic parameters drawn from the multivariate t-distribution, and analytical formulae (the delta method) are available options to calculate the confidence and prediction intervals. The results (modelling, extrapolations, and statistical intervals) can be visualised with multiple plots. The examples illustrate accelerated stability modelling in drug and vaccine development.
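For illustration, a minimal base-R sketch of this kind of kinetic fit, using a simple first-order model with an Arrhenius rate as a stand-in for the full Šesták-Berggren form; all names and values are illustrative and not this package's API:

```r
## Simulate degradation at three temperatures, then recover the kinetic parameters.
R_gas <- 8.314                                    # gas constant, J/(mol*K)
dat <- expand.grid(t = seq(0, 24, 2), temp = c(298, 313, 328))
set.seed(1)
k <- exp(13.8 - 50 * 1000 / (R_gas * dat$temp))   # Arrhenius rate, Ea in kJ/mol
dat$y <- 1 - exp(-k * dat$t) + rnorm(nrow(dat), sd = 0.005)
fit <- nls(y ~ 1 - exp(-exp(lnA - Ea * 1000 / (R_gas * temp)) * t),
           data = dat, start = list(lnA = 14, Ea = 52))
coef(fit)    # fitted lnA and Ea support extrapolation in time and temperature
```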
This package provides a novel interpretable machine learning-based framework to automate the development of a clinical scoring model for predefined outcomes. The framework consists of six modules: variable ranking with machine learning, variable transformation, score derivation, model selection, domain knowledge-based score fine-tuning, and performance evaluation. The details are described in our research paper <doi:10.2196/21798>. Users and clinicians can seamlessly generate parsimonious sparse-score risk models (i.e., risk scores), which can be easily implemented and validated in clinical practice. We hope to see its application in various medical case studies.
Noise filter based on determining the proportion of neighboring points. A false point is rejected if it has only a few neighbors, but accepted if the proportion of neighbors within a rectangular frame is high. The size of the rectangular frame, as well as the cut-off value, i.e. the minimum proportion of neighboring points, may be supplied or can be calculated automatically. Originally designed for cleaning heart-rate data, but suitable for filtering any slowly changing physiological variable. For more information see Signer (2010) <doi:10.1111/j.2041-210X.2009.00010.x>.
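The idea can be sketched in a few lines of base R (the function and argument names here are illustrative, not this package's interface):

```r
## Keep a point only if enough of the points in its time window also fall
## within +/- h of its value, i.e. the proportion of neighbours is high.
filter_neighbours <- function(time, y, w = 10, h = 5, cutoff = 0.4) {
  vapply(seq_along(y), function(i) {
    in_time  <- abs(time - time[i]) <= w          # points in the time window
    in_frame <- in_time & abs(y - y[i]) <= h      # ... that are also close in value
    sum(in_frame) / sum(in_time) >= cutoff        # neighbour proportion
  }, logical(1))
}
set.seed(1)
hr <- c(rnorm(100, 70, 2), 160, rnorm(100, 72, 2))  # heart rate with one spike
hr[!filter_neighbours(seq_along(hr), hr)]           # the spike is flagged
```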
This package provides a family of novel beta mixture models (BMMs), developed by Majumdar et al. (2022) <doi:10.48550/arXiv.2211.01938>, to appositely model beta-valued cytosine-guanine dinucleotide (CpG) sites, to objectively identify methylation state thresholds, and to identify differentially methylated CpG (DMC) sites using a model-based clustering approach. The family of beta mixture models employs different parameter constraints applicable to different study settings. The EM algorithm is used for parameter estimation, with a novel approximation during the M-step providing tractability and ensuring computational feasibility.
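As a rough illustration of the model class (not the package's constrained estimators or its M-step approximation), a bare-bones two-component beta-mixture EM might look like:

```r
set.seed(1)
y <- c(rbeta(300, 2, 10), rbeta(200, 10, 2))   # 'unmethylated' and 'methylated' sites
p <- 0.5; a <- c(1, 5); b <- c(5, 1)           # mixing weight and shape parameters
nll <- function(par, w) -sum(w * dbeta(y, exp(par[1]), exp(par[2]), log = TRUE))
for (it in 1:50) {
  r <- p * dbeta(y, b[1], b[2]) /
       (p * dbeta(y, b[1], b[2]) + (1 - p) * dbeta(y, a[1], a[2]))  # E-step
  a <- exp(optim(log(a), nll, w = 1 - r)$par)  # weighted MLE, component 1
  b <- exp(optim(log(b), nll, w = r)$par)      # weighted MLE, component 2
  p <- mean(r)                                 # updated mixing weight
}
round(c(p = p, a = a, b = b), 2)
```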
Large-scale phenotypic data processing is essential in research: researchers need to eliminate outliers from their data in order to obtain true and reliable results. Best linear unbiased prediction (BLUP) is a standard method for estimating the random effects of a mixed model. It can be used to process phenotypic data collected under different conditions and is widely used in animal and plant breeding. Phenotype removes outliers from phenotypic data and performs best linear unbiased prediction (BLUP), helping researchers quickly complete phenotypic data analysis. See H.P. Piepho (2008) <doi:10.1007/s10681-007-9449-8>.
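The BLUP idea itself can be seen with any mixed-model fit; a minimal illustration with lme4 (not this package's interface):

```r
library(lme4)
fit <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy)
head(ranef(fit)$Subject)   # BLUPs of the subject-level random effects
```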
This package provides a fast and flexible framework for agglomerative partitioning. partition uses an approach called Direct-Measure-Reduce to create new variables that maintain the user-specified minimum level of information. Each reduced variable is also interpretable: the original variables map to one and only one variable in the reduced data set. partition is also flexible: how variables are selected for reduction, how information loss is measured, and the way data are reduced can all be customized. partition is based on the Partition framework discussed in Millstein et al. (2020) <doi:10.1093/bioinformatics/btz661>.
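A minimal sketch of a typical call, assuming the documented interface (see ?partition for the authoritative arguments):

```r
library(partition)
df  <- as.data.frame(matrix(rnorm(100 * 10), ncol = 10))
prt <- partition(df, threshold = 0.5)  # keep >= 50% information per reduced variable
partition_scores(prt)                  # the reduced data set
mapping_key(prt)                       # which originals map to which reduced variable
```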
Exporting shiny applications with shinylive allows you to run them entirely in a web browser, without the need for a separate R server. The traditional way of deploying shiny applications involves a separate server and client: the server runs R and shiny, and clients connect via the web browser. When an application is deployed with shinylive, R and shiny run in the web browser (via webR): the browser is effectively both the client and the server for the application. This allows a shiny application exported by shinylive to be hosted by a static web server.
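A minimal export sketch (the paths are illustrative; httpuv::runStaticServer() is just one way to preview the result):

```r
# install.packages("shinylive")
shinylive::export(appdir = "myapp", destdir = "site")  # app directory -> static site
httpuv::runStaticServer("site")                        # serve locally for a preview
```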
Doubly robust estimation for the mean of an arbitrarily transformed survival time under covariate-induced dependent left truncation and noninformative right censoring. The functions truncAIPW(), truncAIPW_cen1(), and truncAIPW_cen2() compute the doubly robust estimators under the scenario without censoring and under the two censoring scenarios, respectively. The package also contains three simulated data sets, simu, simu_c1, and simu_c2, which are used to illustrate the usage of the functions in this package. Reference: Wang, Y., Ying, A., Xu, R. (2022) "Doubly robust estimation under covariate-induced dependent left truncation" <arXiv:2208.06836>.
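A cautious usage sketch; the argument lists are deliberately left out here because they are documented in the package itself:

```r
library(truncAIPW)
data("simu", package = "truncAIPW")   # simulated data, no censoring
# est <- truncAIPW(simu, ...)         # doubly robust estimator, no censoring
# see ?truncAIPW_cen1 / ?truncAIPW_cen2 for the two censoring scenarios,
# illustrated with the simu_c1 and simu_c2 data sets
```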
An implementation of methods related to sparse clustering and variable importance in clustering. The package currently allows one to perform sparse k-means clustering with a group penalty, so that it automatically selects groups of numerical features. It also allows sparse clustering and variable selection on mixed data (categorical and numerical features) by preprocessing each categorical feature as a group of numerical features. Several methods for visualizing and exploring the results are also provided. M. Chavent, J. Lacaille, A. Mourer and M. Olteanu (2020) <https://www.esann.org/sites/default/files/proceedings/2020/ES2020-103.pdf>.
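The mixed-data preprocessing step can be sketched with base R (the package wraps this, plus the group-penalised k-means itself):

```r
d <- data.frame(num = rnorm(10),
                cat = factor(sample(c("a", "b", "c"), 10, replace = TRUE)))
Z <- model.matrix(~ cat - 1, d)   # one group of dummy columns per categorical feature
X <- cbind(scale(d$num), Z)       # numeric + grouped dummies, ready for clustering
```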
Provides sets of functions and methods to learn and practice data science using the ideas of algorithmic trading. The main goal is to process information within a "Decision Support System" to come up with analyses or predictions. There are several utilities, such as dynamic and adaptive risk management using reinforcement learning, and even functions to generate predictions of price changes using pattern recognition and deep regression learning. Summary of methods used: Awesome H2O tutorials: <https://github.com/h2oai/awesome-h2o>, Market Type research of the Van Tharp Institute: <https://vantharp.com/>, Reinforcement Learning R package: <https://CRAN.R-project.org/package=ReinforcementLearning>.
This package provides tools and demonstrates methods for working with individual undergraduate student-level records (registrar's data) in R. Tools include filters for program codes, data sufficiency, and timely completion. Methods include gathering blocs of records, computing quantitative metrics such as graduation rate, and creating charts to visualize comparisons. midfieldr interacts with practice data provided in midfielddata, an R data package available at <https://midfieldr.github.io/midfielddata/>. midfieldr also interacts with the full MIDFIELD database for users who have access. This work is supported by the US National Science Foundation through grant numbers 1545667 and 2142087.
It includes tests for multivariate normality, tests for uniformity on the d-dimensional sphere, non-parametric two- and k-sample tests, random generation of points from the Poisson kernel-based density, and a clustering algorithm for spherical data. For more information see Saraceno G., Markatou M., Mukhopadhyay R. and Golzy M. (2024) <doi:10.48550/arXiv.2402.02290>, Markatou, M. and Saraceno, G. (2024) <doi:10.48550/arXiv.2407.16374>, Ding, Y., Markatou, M. and Saraceno, G. (2023) <doi:10.5705/ss.202022.0347>, and Golzy, M. and Markatou, M. (2020) <doi:10.1080/10618600.2020.1740713>.
Detrending multivariate time-series to approximate stationarity when dealing with intensive longitudinal data, prior to Vector Autoregressive (VAR) or multilevel-VAR estimation. Classical VAR assumes weak stationarity (constant first two moments), and deterministic trends induce spurious autocorrelation, biasing Granger-causality and impulse-response analyses. All functions operate on raw panel data and write detrended columns back to the data set, but differ in the level at which the trend is estimated. See, for instance, Wang & Maxwell (2015) <doi:10.1037/met0000030>; Burger et al. (2022) <doi:10.4324/9781003111238-13>; Epskamp et al. (2018) <doi:10.1177/2167702617744325>.
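The simplest version of this preprocessing, person-level linear detrending, is easy to sketch in base R (the package's own functions offer more options):

```r
set.seed(1)
panel <- data.frame(id = rep(1:2, each = 50), time = rep(1:50, 2))
panel$mood <- 0.05 * panel$time + rnorm(100)      # deterministic trend + noise
panel$mood_detr <- unlist(lapply(split(panel, panel$id), function(d)
  resid(lm(mood ~ time, data = d))))              # residuals = detrended series
```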
This package provides functions and data sets inspired by data sharpening - data perturbation to achieve improved performance in nonparametric estimation, as described in Choi, E., Hall, P. and Rousson, V. (2000). Capabilities for enhanced local linear regression function and derivative estimation are included, as well as an asymptotically correct iterated data sharpening estimator for any degree of local polynomial regression estimation. A cross-validation-based bandwidth selector is included which, in concert with the iterated sharpener, will often provide superior performance, according to a median integrated squared error criterion. Sample data sets are provided to illustrate function usage.
This package provides methods for generating modelled parametric tropical cyclone (TC) spatial hazard fields and time series output at point locations from TC tracks. The ability to call fast C++ code via the Rcpp package, together with the wide range of spatial analysis tools in the terra package, makes R an attractive open-source environment for studying TCs. This package estimates TC vortex wind and pressure fields using parametric equations originally implemented in Python by TCRM <https://github.com/GeoscienceAustralia/tcrm> and then in CUDA C++ by TCwindgen <https://github.com/CyprienBosserelle/TCwindgen>.
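As an illustration of what a parametric vortex looks like, here is the classic Holland (1980) gradient-wind profile in plain R; the package's own equations, ported from TCRM/TCwindgen, may differ in detail:

```r
holland_wind <- function(r, dP = 5000, Rm = 40e3, B = 1.6,
                         rho = 1.15, f = 5e-5) {
  x <- (Rm / r)^B                      # shape term of the pressure profile
  sqrt(B * dP / rho * x * exp(-x) + (r * f / 2)^2) - r * abs(f) / 2
}
r <- seq(5e3, 300e3, by = 5e3)         # radii from the storm centre, metres
v <- holland_wind(r)                   # gradient-level wind speed, m/s
r[which.max(v)]                        # the maximum sits near Rm, as expected
```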
The package ptairData contains two raw datasets from Proton-Transfer-Reaction Time-of-Flight mass spectrometry (PTR-TOF-MS) acquisitions, in the HDF5 format: one from the exhaled air of two healthy volunteers with three replicates, and one from the cell culture headspace of two mycobacteria species and one control (culture medium only) with two replicates. These datasets are used in the examples and in the vignette of the ptairMS package (PTR-TOF-MS data pre-processing). They are also used to generate the ptrSet objects in the ptairMS data: exhaledPtrset and mycobacteriaSet.
A common task faced by researchers is the creation of APA style (i.e., American Psychological Association style) tables from statistical output. In R, a large number of function calls are often needed to obtain all of the desired information for a single APA style table, and the process of manually creating APA style tables in a word processor is prone to transcription errors. This package creates Word files (.doc files) containing APA style tables for several types of analyses. Using this package minimizes transcription errors and reduces the number of commands needed by the user.
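A single hedged example, assuming the apaTables-style interface this description matches (check the package index for the full set of apa.*.table() functions):

```r
library(apaTables)
apa.cor.table(attitude, filename = "correlations.doc")  # APA correlation table in Word
```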
High-performance principal component analysis routines that operate directly on bigmemory::big.matrix objects. The package avoids materialising large matrices in memory by streaming data through BLAS and LAPACK kernels, and provides helpers to derive scores, loadings, correlations, and contribution diagnostics, including utilities that stream results into bigmemory-backed matrices for file-based workflows. Additional interfaces expose scalable singular value decomposition, robust PCA, and robust SVD algorithms, so that users can explore large matrices while tempering the influence of outliers. Scalable principal component analysis is also implemented, following Elgamal, Yabandeh, Aboulnaga, Mustafa, and Hefeeda (2015) <doi:10.1145/2723372.2751520>.
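The streaming idea can be sketched with bigmemory alone; this package's helper names differ, so this is only the underlying pattern:

```r
library(bigmemory)
X <- as.big.matrix(matrix(rnorm(1e4 * 50), 1e4, 50))  # stands in for a file-backed matrix
XtX <- matrix(0, 50, 50)
for (rows in split(1:1e4, ceiling(1:1e4 / 1000)))     # stream 1000-row blocks
  XtX <- XtX + crossprod(X[rows, ])                   # accumulate X'X via BLAS
ev <- eigen(XtX / (1e4 - 1))   # eigenbasis for PCA (data left uncentered here)
```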
This package provides tools for converting and imputing date values to the ISO 8601 standard format and for reconciling differences between two versions of a data set. The package automatically detects date patterns within data frame columns and converts them to consistent ISO-formatted dates, with optional imputation of missing day or month components based on user-defined rules. It also includes functionality to identify inserted, deleted, and updated records, as well as column- and value-level changes, when comparing old and new versions of a data frame. Only one date format may be applied within a single column.
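The underlying conversion and imputation idea, sketched in base R (the package automates format detection and the imputation rules):

```r
x <- c("2020-06", "2019-11-03")                  # one value lacks a day component
x <- ifelse(nchar(x) == 7, paste0(x, "-15"), x)  # impute mid-month by a fixed rule
as.Date(x)                                       # valid ISO 8601 dates
```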
Fast C++ implementation of dynamic time warping for time series dissimilarity analysis, with applications in environmental monitoring and sensor data analysis, climate science, signal processing and pattern recognition, and financial data analysis. Built upon the ideas presented in Benito and Birks (2020) <doi:10.1111/ecog.04895>, the package provides tools for analyzing time series of varying lengths and structures, including irregular multivariate time series. Key features include individual variable contribution analysis, restricted permutation tests for statistical significance, and imputation of missing data via GAMs. Additionally, the package provides an ample set of tools to prepare and manage time series data.
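To make the computation concrete, here is a bare-bones dynamic time warping cost in base R; the package's C++ version is far faster and adds the features listed above:

```r
dtw_cost <- function(a, b) {
  n <- length(a); m <- length(b)
  D <- matrix(Inf, n + 1, m + 1); D[1, 1] <- 0
  for (i in 1:n) for (j in 1:m)
    D[i + 1, j + 1] <- abs(a[i] - b[j]) +
      min(D[i, j], D[i, j + 1], D[i + 1, j])  # match, insertion, deletion
  D[n + 1, m + 1]
}
dtw_cost(sin(seq(0, 2 * pi, length.out = 50)),
         sin(seq(0, 2 * pi, length.out = 60)))  # small despite unequal lengths
```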
Dataset and functions to explore the quality of literary novels. The package is part of the Riddle of Literary Quality project and contains the data of a reader survey about fiction in Dutch, a description of the novels the readers rated, and the results of stylistic measurements of the novels. The package also contains functions to combine, analyze, and visualize these data. For more details, see: Eder M, van Zundert J, Lensink S, van Dalen-Oskam K (2022). Replicating The Riddle of Literary Quality: The litRiddle package for R. In _Digital Humanities 2022: Conference Abstracts_, 636-637.