This package provides a universal, user-friendly, single-cell and bulk RNA sequencing visualization toolkit that allows highly customizable creation of color blindness friendly, publication-quality figures. dittoSeq accepts both SingleCellExperiment (SCE) and Seurat objects, and supports the import and usage, via conversion to an SCE, of SummarizedExperiment or DGEList bulk data. Visualizations include dimensionality reduction plots, heatmaps, scatterplots, percent composition or expression across groups, and more. Customizations range from size and title adjustments to automatic generation of annotations for heatmaps, overlay of trajectory analysis onto any dimensionality reduction plot, hidden data overlay upon cursor hovering via ggplotly conversion, and many more, all with simple, discrete inputs. Color blindness friendliness is powered by legend adjustments (enlarged keys), and by allowing the use of shapes or letter-overlay in addition to the carefully selected dittoColors().
Age-Period-Cohort (APC) analyses are used to differentiate relevant drivers for long-term developments. The APCtools package offers visualization techniques and general routines to simplify the workflow of an APC analysis. Sophisticated functions are available both for descriptive and regression model-based analyses. For the former, we use density (or ridgeline) matrices and (hexagonally binned) heatmaps as innovative visualization techniques building on the concept of Lexis diagrams. Model-based analyses build on the separation of the temporal dimensions based on generalized additive models, where a tensor product interaction surface (usually between age and period) is utilized to represent the third dimension (usually cohort) on its diagonal. Such tensor product surfaces can also be estimated while accounting for further covariates in the regression model. See Weigert et al. (2021) <doi:10.1177/1354816620987198> for methodological details.
This package provides a collection of datasets and simplified functions for an introductory (geo)statistics module at University College London. Provides functionality for compositional, directional and spatial data, including ternary diagrams, Wulff and Schmidt stereonets, and ordinary kriging interpolation. Implements logistic and (additive and centred) logratio transformations. Computes vector averages and concentration parameters for the von Mises distribution. Includes a collection of natural and synthetic fractals, and a simulator for deterministic chaos using a magnetic pendulum example. The main purpose of these functions is pedagogical. Researchers can find more complete alternatives for these tools in other packages such as compositions, robCompositions, sp, gstat and RFOC. All the functions are written in plain R, with no compiled code and a minimal number of dependencies. Theoretical background and worked examples are available at <https://tinyurl.com/UCLgeostats/>.
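The additive and centred logratio transformations mentioned above can be sketched in a few lines. This is a minimal illustration in Python rather than the package's R code; the function names `clr` and `alr` are chosen here for the example and are not taken from the package.

```python
import math


def clr(composition):
    """Centred logratio: log of each part minus the mean of the logs."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def alr(composition):
    """Additive logratio: log of each part relative to the last part."""
    ref = math.log(composition[-1])
    return [math.log(x) - ref for x in composition[:-1]]


# Hypothetical three-part composition (all parts must be strictly positive).
sample = [0.2, 0.3, 0.5]
clr_sample = clr(sample)
alr_sample = alr(sample)
```

Both transformations depend only on ratios between parts, so rescaling the composition (for example from proportions to percentages) leaves the result unchanged.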
Biodiversity is a multifaceted concept covering different levels of organization from genes to ecosystems. iNEXT.3D extends iNEXT to include three dimensions (3D) of biodiversity, i.e., taxonomic diversity (TD), phylogenetic diversity (PD) and functional diversity (FD). This package provides functions to compute standardized 3D diversity estimates with a common sample size or sample coverage. A unified framework based on Hill numbers and their generalizations (Hill-Chao numbers) is used to quantify 3D. All 3D estimates are in the same units of species/lineage equivalents and can be meaningfully compared. The package features size- and coverage-based rarefaction and extrapolation sampling curves to facilitate rigorous comparison of 3D diversity across individual assemblages. Asymptotic 3D diversity estimates are also provided. See Chao et al. (2021) <doi:10.1111/2041-210X.13682> for more details.
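As a rough illustration of the Hill numbers underlying the taxonomic dimension, the sketch below computes the empirical (plug-in) Hill number of order q from raw abundances. It is written in Python for illustration only; it does not reproduce the package's rarefaction, extrapolation or asymptotic estimators, and the function name `hill_number` is an assumption of this example.

```python
import math


def hill_number(abundances, q):
    """Empirical Hill number of order q (effective number of species)."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


# Hypothetical assemblage dominated by one species.
counts = [90, 5, 5]
richness = hill_number(counts, 0)   # q = 0: species richness
shannon = hill_number(counts, 1)    # q = 1: Shannon diversity
simpson = hill_number(counts, 2)    # q = 2: Simpson diversity
```

For a perfectly even assemblage all orders return the number of species; for an uneven one the value decreases as q grows, because higher orders give more weight to dominant species.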
Given two unbiased samples of patient level data on cost and effectiveness for a pair of treatments, make head-to-head treatment comparisons by (i) generating the bivariate bootstrap resampling distribution of ICE uncertainty for a specified value of the shadow price of health, lambda, (ii) forming the wedge-shaped ICE confidence region with specified confidence fraction within [0.50, 0.99] that is equivariant with respect to changes in lambda, (iii) coloring the bootstrap outcomes within the above confidence wedge with economic preferences from an ICE map with specified values of lambda, beta and gamma parameters, (iv) displaying VAGR and ALICE acceptability curves, and (v) illustrating variation in ICE preferences by displaying potentially non-linear indifference (iso-preference) curves from an ICE map with specified values of lambda, beta and either gamma or eta parameters.
Testing and documenting code that communicates with remote servers can be painful. Dealing with authentication, server state, and other complications can make testing seem too costly to bother with. But it doesn't need to be that hard. This package enables one to test all of the logic on the R side of the API in your package without requiring access to the remote service. Importantly, it provides three contexts that mock the network connection in different ways, as well as testing functions to assert that HTTP requests were, or were not, made. It also allows one to safely record real API responses to use as test fixtures. The ability to save responses and load them offline also enables one to write vignettes and other dynamic documents that can be distributed without access to a live server.
The peak fitting of spectral data is performed using the framework of the EM algorithm. We adapted the EM algorithm for the peak fitting of spectral data sets by considering the weight of the intensity corresponding to the measurement energy steps (Matsumura, T., Nagamura, N., Akaho, S., Nagata, K., & Ando, Y. (2019, 2021 and 2023) <doi:10.1080/14686996.2019.1620123>, <doi:10.1080/27660400.2021.1899449>, <doi:10.1080/27660400.2022.2159753>). The package efficiently estimates the parameters of a Gaussian mixture model by iterating between the E-step and the M-step until the parameters converge to a locally optimal solution. This package can support the investigation of peak shift with two advantages: (1) a large amount of data can be processed at high speed; and (2) stable and automatic calculation can be easily performed.
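The intensity-weighted EM idea described above can be sketched as follows. This is a minimal Python illustration under the assumption that each energy step acts as a data point weighted by its measured intensity; it is not the package's implementation, and the names `gaussian` and `weighted_em` as well as the synthetic spectrum are hypothetical.

```python
import math


def gaussian(x, mu, sigma):
    """Normal density at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def weighted_em(energies, intensities, mus, sigmas, n_iter=200):
    """Fit a Gaussian mixture to a spectrum, weighting each energy step by its intensity."""
    k = len(mus)
    mus, sigmas = list(mus), list(sigmas)
    pis = [1.0 / k] * k
    total = sum(intensities)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each energy step.
        resp = []
        for x in energies:
            dens = [pis[j] * gaussian(x, mus[j], sigmas[j]) for j in range(k)]
            norm = sum(dens) or 1e-300
            resp.append([d / norm for d in dens])
        # M-step: intensity-weighted updates of mixing ratio, mean and width.
        for j in range(k):
            nj = sum(w * r[j] for w, r in zip(intensities, resp))
            mus[j] = sum(w * r[j] * x for w, r, x in zip(intensities, resp, energies)) / nj
            var = sum(w * r[j] * (x - mus[j]) ** 2
                      for w, r, x in zip(intensities, resp, energies)) / nj
            sigmas[j] = math.sqrt(max(var, 1e-12))
            pis[j] = nj / total
    return pis, mus, sigmas


# Synthetic two-peak spectrum on a regular energy grid (illustrative values only).
energies = [i * 0.05 for i in range(201)]
intensities = [0.6 * gaussian(x, 3.0, 0.5) + 0.4 * gaussian(x, 7.0, 0.8) for x in energies]
pis, mus, sigmas = weighted_em(energies, intensities, mus=[2.0, 8.0], sigmas=[1.0, 1.0])
```

As with any EM fit, the result is a local optimum and depends on the initial peak positions and widths supplied by the caller.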
Energy-Vorticity theory (EVT) is the fundamental theory to describe processes in the atmosphere by combining conserved quantities from hydrodynamics and thermodynamics. The package meteoEVT provides functions to calculate many energetic and vortical quantities, like potential vorticity, Bernoulli function and dynamic state index (DSI) [e.g. Weber and Nevir, 2008, <doi:10.1111/j.1600-0870.2007.00272.x>], for given gridded data, like ERA5 reanalyses. These quantities can be studied directly or can be used for many applications in meteorology, e.g., the objective identification of atmospheric fronts. For this purpose, separate functions are provided that allow the detection of fronts based on the thermic front parameter [Hewson, 1998, <doi:10.1017/S1350482798000553>], the F diagnostic [Parfitt et al., 2017, <doi:10.1002/2017GL073662>] and the DSI [Mack et al., 2022, <arXiv:2208.11438>].
This Haskell package is intended for those who are tired of keeping long lists of dependencies to the same essential libraries in each package as well as the endless imports of the same APIs all over again.
It also supports the modern tendencies in the language.
To solve those problems this package does the following:
Reexport the original APIs under the Rebase namespace.
Export all the possible non-conflicting symbols from the Rebase.Prelude module.
Give priority to the modern practices in the conflicting cases.
The policy behind the package is only to reexport the non-ambiguous and non-controversial APIs, which the community has obviously settled on. The package is intended to rapidly evolve with the contribution from the community, with the missing features being added with pull-requests.
This package provides a recently proposed Bayesian BIN model that disentangles the underlying processes that enable forecasters and forecasting methods to improve, decomposing forecasting accuracy into three components: bias, partial information, and noise. By describing the differences between two groups of forecasters, the model allows the user to carry out useful inference, such as calculating the posterior probabilities of the treatment reducing bias, diminishing noise, or increasing information. It also provides insight into how much tamping down bias and noise in judgment or enhancing the efficient extraction of valid information from the environment improves forecasting accuracy. This package provides easy access to the BIN model. For further information refer to the paper Ville A. Satopää, Marat Salikhov, Philip E. Tetlock, and Barbara Mellers (2021) "Bias, Information, Noise: The BIN Model of Forecasting" <doi:10.1287/mnsc.2020.3882>.
Calculates conditional exact tests (Fisher's exact test, Blaker's exact test, or exact McNemar's test) and unconditional exact tests (including score-based tests on differences in proportions, ratios of proportions, and odds ratios, and Boschloo's test) with appropriate matching confidence intervals, and provides power and sample size calculations. Gives melded confidence intervals for the binomial case (Fay, et al, 2015, <DOI:10.1111/biom.12231>). Gives boundary-optimized rejection region test (Gabriel, et al, 2018, <DOI:10.1002/sim.7579>), an unconditional exact test for the situation where the controls are all expected to fail. Gives confidence intervals compatible with exact McNemar's or sign tests (Fay and Lumbard, 2021, <DOI:10.1002/sim.8829>). For review of these kinds of exact tests see Fay and Hunsberger (2021, <DOI:10.1214/21-SS131>).
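To make one of the tests above concrete, the exact McNemar test reduces to a binomial test with success probability 0.5 on the discordant pairs. The sketch below is a plain Python illustration of that p-value only; it does not compute the matching confidence intervals the package provides, and the function name is an assumption of this example.

```python
from math import comb


def exact_mcnemar_pvalue(b, c):
    """Two-sided exact McNemar p-value from the two discordant pair counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence either way
    k = min(b, c)
    # Lower tail of Binomial(n, 0.5), doubled for the two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical paired study: 0 pairs changed one way, 5 pairs the other way.
p_value = exact_mcnemar_pvalue(0, 5)
```

Concordant pairs do not enter the calculation, which is why only the two off-diagonal counts of the paired 2x2 table are needed.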
Regression methods to quantify the relation between two measurement methods are provided by this package. In particular it addresses regression problems with errors in both variables and without repeated measurements. It implements the Clinical and Laboratory Standards Institute (CLSI) recommendations (see J. A. Budd et al. (2018, <https://clsi.org/standards/products/method-evaluation/documents/ep09/>)) for analytical method comparison and bias estimation using patient samples. Furthermore, algorithms for Theil-Sen and equivariant Passing-Bablok estimators are implemented, see F. Dufey (2020, <doi:10.1515/ijb-2019-0157>) and J. Raymaekers and F. Dufey (2022, <arXiv:2202.08060>). Further, the robust M-Deming and MM-Deming (experimental) methods are available, see G. Pioda (2021, <arXiv:2105.04628>). A comprehensive overview of the implemented methods and references can be found in the manual pages mcrPioda-package and mcreg.
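For orientation, the classical Theil-Sen estimator takes the median of all pairwise slopes, which makes the fitted line resistant to a few outlying samples. The following Python sketch shows that textbook form only; it is not the algorithm of the cited papers or the package's implementation, and the data are made up for illustration.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Classical Theil-Sen line: median pairwise slope, median residual intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
              if x2 != x1]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


# Hypothetical method comparison: y = 1 + 2x with one grossly wrong measurement.
method_a = [1.0, 2.0, 3.0, 4.0, 5.0]
method_b = [3.0, 5.0, 7.0, 9.0, 100.0]
intercept, slope = theil_sen(method_a, method_b)
```

An ordinary least squares fit to the same data would be pulled strongly towards the outlier, whereas the median-based estimate is driven by the majority of the pairs.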
The OLStrajr package provides comprehensive functions for ordinary least squares (OLS) trajectory analysis and case-by-case OLS regression as outlined in Carrig, Wirth, and Curran (2004) <doi:10.1207/S15328007SEM1101_9> and Rogosa and Saner (1995) <doi:10.3102/10769986020002149>. It encompasses two primary functions, OLStraj() and cbc_lm(). The OLStraj() function simplifies the estimation of individual growth curves over time via OLS regression, with options for visualizing both group-level and individual-level growth trajectories and support for linear and quadratic models. The cbc_lm() function facilitates case-by-case OLS estimates and provides unbiased mean population intercept and slope estimators by averaging OLS intercepts and slopes across cases. It further offers standard error calculations across bootstrap replicates and computation of 95% confidence intervals based on empirical distributions from the resampling processes.
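The case-by-case approach described above amounts to fitting a separate OLS line to each case and then averaging the per-case intercepts and slopes. The Python sketch below illustrates only that averaging step for linear trajectories; it omits the bootstrap standard errors and confidence intervals, and the helper names and data are hypothetical rather than the package's API.

```python
def ols_line(times, values):
    """Intercept and slope of a simple OLS regression of values on times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return mean_v - slope * mean_t, slope


def case_by_case_means(times, cases):
    """Fit one OLS line per case and average the intercepts and slopes."""
    fits = [ols_line(times, values) for values in cases]
    mean_intercept = sum(f[0] for f in fits) / len(fits)
    mean_slope = sum(f[1] for f in fits) / len(fits)
    return mean_intercept, mean_slope, fits


# Two hypothetical cases measured at four common time points.
times = [0.0, 1.0, 2.0, 3.0]
cases = [[1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 4.0, 6.0]]
mean_intercept, mean_slope, fits = case_by_case_means(times, cases)
```

Keeping the per-case fits alongside the averages makes it straightforward to plot individual trajectories next to the group-level line.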
This is a C/C++ based package for advanced data transformation and statistical computing in R that is extremely fast, class-agnostic, robust and programmer friendly. Core functionality includes a rich set of S3 generic grouped and weighted statistical functions for vectors, matrices and data frames, which provide efficient low-level vectorizations, OpenMP multithreading, and skip missing values by default. These are integrated with fast grouping and ordering algorithms (also callable from C), and efficient data manipulation functions. The package also provides a flexible and rigorous approach to time series and panel data in R. It further includes fast functions for common statistical procedures, detailed (grouped, weighted) summary statistics, powerful tools to work with nested data, fast data object conversions, functions for memory efficient R programming, and helpers to effectively deal with variable labels, attributes, and missing data.
We use the Alternating Direction Method of Multipliers (ADMM) for parameter estimation in high-dimensional, single-modality mediation models. To improve the sensitivity and specificity of estimated mediation effects, we offer the sure independence screening (SIS) function for dimension reduction. The available penalty options include Lasso, Elastic Net, Pathway Lasso, and Network-constrained Penalty. The methods employed in the package are based on Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2011) <doi:10.1561/2200000016>, Fan, J., & Lv, J. (2008) <doi:10.1111/j.1467-9868.2008.00674.x>, Li, C., & Li, H. (2008) <doi:10.1093/bioinformatics/btn081>, Tibshirani, R. (1996) <doi:10.1111/j.2517-6161.1996.tb02080.x>, Zhao, Y., & Luo, X. (2022) <doi:10.4310/21-sii673>, and Zou, H., & Hastie, T. (2005) <doi:10.1111/j.1467-9868.2005.00503.x>.
The core of the package is cvr2.ipflasso(), an extension of glmnet to be used when the (large) set of available predictors is partitioned into several modalities which potentially differ with respect to their information content in terms of prediction. For example, in biomedical applications patient outcome such as survival time or response to therapy may have to be predicted based on, say, mRNA data, miRNA data, methylation data, CNV data, clinical data, etc. The clinical predictors are on average often much more important for outcome prediction than the mRNA data. The ipflasso method takes this problem into account by using different penalty parameters for predictors from different modalities. The ratio between the different penalty parameters can be chosen from a set of optional candidates by cross-validation or alternatively generated from the input data.
This package facilitates machine learning (ML) based analysis for physics education research (PER). The implemented machine learning technique is random forest, with item response theory (IRT) used for feature selection and a genetic algorithm (GA) for hyperparameter tuning. The data analyzed here are available in the CRAN repository through the spheredata package. SPHERE stands for Students Performance in Physics Education Research (PER). The students are eleventh graders learning physics under the high school curriculum. We follow the stream of multidimensional student assessment as probed by some research-based assessments in PER. The goal is to predict the students' performance at the end of the learning process. Three learning domains are measured: conceptual understanding, scientific ability, and scientific attitude. Furthermore, demographic backgrounds and potential variables predicting students' performance in physics are also demonstrated.
Set of sequence analysis tools for manipulating, describing and rendering categorical sequences, and more generally mining sequence data in the field of social sciences. Although this sequence analysis package is primarily intended for state or event sequences that describe time use or life courses such as family formation histories or professional careers, its features also apply to many other kinds of categorical sequence data. It accepts many different sequence representations as input and provides tools for converting sequences from one format to another. It offers several functions for describing and rendering sequences, for computing distances between sequences with different metrics (among which optimal matching), original dissimilarity-based analysis tools, and functions for extracting the most frequent event subsequences and identifying the most discriminating ones among them. A user's guide can be found on the TraMineR web page.
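Optimal matching, one of the distance metrics mentioned above, is an edit distance in which insertions, deletions and substitutions each carry a cost. The Python sketch below uses a constant substitution cost and a single indel cost; these cost values and the function name are assumptions chosen for illustration and are not presented as the package's defaults.

```python
def optimal_matching(seq_a, seq_b, indel=1.0, sub_cost=2.0):
    """Minimum total cost of turning seq_a into seq_b by indels and substitutions."""
    n, m = len(seq_a), len(seq_b)
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * indel
    for j in range(1, m + 1):
        dist[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if seq_a[i - 1] == seq_b[j - 1] else sub_cost
            dist[i][j] = min(dist[i - 1][j] + indel,      # delete from seq_a
                             dist[i][j - 1] + indel,      # insert into seq_a
                             dist[i - 1][j - 1] + sub)    # match or substitute
    return dist[n][m]


# Two hypothetical yearly state sequences (S = single, M = married).
career_a = ["S", "S", "M", "M"]
career_b = ["S", "M", "M", "M"]
distance = optimal_matching(career_a, career_b)
```

In applied work the substitution costs are often state-specific (for example derived from transition rates), in which case `sub_cost` would be replaced by a lookup in a cost matrix.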
Fit flexible and fully parametric hazard regression models to survival data with single event type or multiple competing causes via logistic and multinomial regression. Our formulation allows for arbitrary functional forms of time and its interactions with other predictors for time-dependent hazards and hazard ratios. From the fitted hazard model, we provide functions to readily calculate and plot cumulative incidence and survival curves for a given covariate profile. This approach accommodates any log-linear hazard function of prognostic time, treatment, and covariates, and readily allows for non-proportionality. We also provide a plot method for visualizing incidence density via population time plots. Based on the case-base sampling approach of Hanley and Miettinen (2009) <DOI:10.2202/1557-4679.1125>, Saarela and Arjas (2015) <DOI:10.1111/sjos.12125>, and Saarela (2015) <DOI:10.1007/s10985-015-9352-x>.
This package provides a standardized framework to support the selection and evaluation of parametric survival models for time-to-event data. Includes tools for visualizing survival data, checking proportional hazards assumptions (Grambsch and Therneau, 1994, <doi:10.1093/biomet/81.3.515>), comparing parametric (Ishak and colleagues, 2013, <doi:10.1007/s40273-013-0064-3>), spline (Royston and Parmar, 2002, <doi:10.1002/sim.1203>) and cure models, examining hazard functions, and evaluating model extrapolation. Methods are consistent with recommendations in the NICE Decision Support Unit Technical Support Documents (14 and 21 <https://sheffield.ac.uk/nice-dsu/tsds/survival-analysis>). Results are structured to facilitate integration into decision-analytic models, and reports can be generated with rmarkdown. The package builds on existing tools including flexsurv (Jackson, 2016, <doi:10.18637/jss.v070.i08>) and flexsurvcure for estimating cure models.
Numerous time series admit autoregressive moving average (ARMA) representations, in which the errors are uncorrelated but not necessarily independent. These models are called weak ARMA, as opposed to the standard ARMA models, also called strong ARMA models, in which the error terms are supposed to be independent and identically distributed (iid). This package allows the study of nonlinear time series models through weak ARMA representations. It determines identification, estimation and validation for ARMA models and for AR and MA models in particular. Functions can also be used in the strong case. This package also works on white noises by omitting arguments p, q, ar and ma. See Francq, C. and Zakoïan, J. (1998) <doi:10.1016/S0378-3758(97)00139-0> and Boubacar Maïnassara, Y. and Saussereau, B. (2018) <doi:10.1080/01621459.2017.1380030> for more details.
This package provides tools for detecting cellwise outliers and robust methods to analyze data which may contain them. Contains the implementation of the algorithms described in Rousseeuw and Van den Bossche (2018) <doi:10.1080/00401706.2017.1340909> (open access), Hubert et al. (2019) <doi:10.1080/00401706.2018.1562989> (open access), Raymaekers and Rousseeuw (2021) <doi:10.1080/00401706.2019.1677270> (open access), Raymaekers and Rousseeuw (2021) <doi:10.1007/s10994-021-05960-5> (open access), Raymaekers and Rousseeuw (2021) <doi:10.52933/jdssv.v1i3.18> (open access), Raymaekers and Rousseeuw (2022) <doi:10.1080/01621459.2023.2267777> (open access), and Rousseeuw (2022) <doi:10.1016/j.ecosta.2023.01.007> (open access). Examples can be found in the vignettes: "DDC_examples", "MacroPCA_examples", "wrap_examples", "transfo_examples", "DI_examples", "cellMCD_examples", "Correspondence_analysis_examples", and "cellwise_weights_examples".
Enhancing T cell receptor (TCR) sequence analysis, ClusTCR2, based on the ClusTCR Python program, leverages Hamming distance to compare the complementarity-determining region 3 (CDR3) sequences for sequence similarity, variable gene (V gene) and length. The second step employs the Markov Cluster Algorithm to identify clusters within an undirected graph, providing a summary of amino acid motifs and a matrix for generating network plots. Tailored for single-cell RNA-seq data with integrated TCR-seq information, ClusTCR2 is integrated into the Single Cell TCR and Expression Grouped Ontologies (STEGO) R application, or STEGO.R. See the two publications for more details: Sebastiaan Valkiers, Max Van Houcke, Kris Laukens, Pieter Meysman (2021) <doi:10.1093/bioinformatics/btab446>, Kerry A. Mullan, My Ha, Sebastiaan Valkiers, Nicky de Vrij, Benson Ogunjimi, Kris Laukens, Pieter Meysman (2023) <doi:10.1101/2023.09.27.559702>.
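The first step described above, comparing CDR3 sequences of equal length and identical V gene by Hamming distance, can be sketched as below. This is an illustrative Python example and not the package's code; the distance threshold of 1, the function names and the example records are assumptions made for this sketch, and the Markov clustering step is not shown.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def similar_pairs(records, max_distance=1):
    """Index pairs of (cdr3, v_gene) records sharing V gene and length within max_distance."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            (seq_i, v_i), (seq_j, v_j) = records[i], records[j]
            if v_i == v_j and len(seq_i) == len(seq_j) \
                    and hamming(seq_i, seq_j) <= max_distance:
                pairs.append((i, j))
    return pairs


# Hypothetical CDR3 amino acid sequences with their V genes.
records = [
    ("CASSLGQETQYF", "TRBV7-9"),
    ("CASSLGQDTQYF", "TRBV7-9"),   # one mismatch to the first record
    ("CASSLGQETQYF", "TRBV5-1"),   # same sequence, different V gene
    ("CASSPGTQYF", "TRBV7-9"),     # different length
]
edges = similar_pairs(records)
```

The resulting index pairs can be read as the edges of an undirected graph, which is the kind of input a graph clustering step would then operate on.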
This package provides a two-stage procedure for the denoising and clustering of a stack of noisy images acquired over time. Clustering only assumes that the data contain an unknown but small number of dynamic features. The method first denoises the signals using local spatial and full temporal information. The clustering step uses the previous output to aggregate voxels based on the knowledge of their spatial neighborhood. Both steps use a single key tool based on the statistical comparison of the difference of two signals with the null signal. No assumption is therefore required on the shape of the signals. The data are assumed to be normally distributed (or at least follow a symmetric distribution) with a known constant variance. Working pixelwise, the method can be time-consuming depending on the size of the data array, but it harnesses the power of multicore CPUs.