Fast, flexible and user-friendly tools for distribution comparison through direct density ratio estimation. The estimated density ratio can be used for covariate shift adjustment, outlier detection, change-point detection, classification and evaluation of synthetic data quality. The package implements multiple non-parametric estimation techniques (unconstrained least-squares importance fitting, ulsif(); Kullback-Leibler importance estimation procedure, kliep(); spectral density ratio estimation, spectral(); kernel mean matching, kmm(); and least-squares hetero-distributional subspace search, lhss()) with automatic tuning of hyperparameters. Helper functions are available for two-sample testing and visualizing the density ratios. For a general overview of density ratio estimation, see Sugiyama et al. (2012) <doi:10.1017/CBO9781139035613>; the help files give references on the specific estimation techniques.
This package provides R with the Glottolog database <https://glottolog.org/> and additional functionality for linguistic mapping. The Glottolog database contains a catalogue of the languages of the world. This package helps researchers make linguistic maps following the philosophy of the Cross-Linguistic Linked Data project <https://clld.org/>, which facilitates uniform access to the data across publications. A tutorial for this package is available on GitHub pages <https://docs.ropensci.org/lingtypology/> and in the package vignette. Maps created by this package can be used for both research and linguistic teaching. In addition, the package can download data from typological databases such as WALS, AUTOTYP and some others, and can create your own database website.
We propose a consistent monitoring procedure to detect a structural change from a cointegrating relationship to a spurious relationship. The procedure is based on residuals from modified least squares estimation, using either Fully Modified, Dynamic or Integrated Modified OLS. It is inspired by Chu et al. (1996) <DOI:10.2307/2171955> in that it is based on parameter estimation on a pre-break "calibration" period only, rather than being based on sequential estimation over the full sample. See the discussion paper <DOI:10.2139/ssrn.2624657> for further information. This package provides the monitoring procedures for both the cointegration and the stationarity case (while the latter is just a special case of the former one) as well as printing and plotting methods for a clear presentation of the results.
An implementation of 'ggplot2' methods to present the composition of the Solvency II Solvency Capital Requirement (SCR) as a series of concentric circle-parts. Solvency II (Solvency 2) is European insurance legislation that came into force through the delegated acts of October 10, 2014 <https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ%3AL%3A2015%3A012%3ATOC>. Additional files defining the structure of the Standard Formula (SF) method of the SCR calculation are provided. The structure files can be adapted for localization or for insurance companies that use Internal Models (IM). Options are available for combining smaller components, horizontal and vertical scaling, rotation, and plotting only some circle-parts. With outlines and connectors, several SCR compositions can be compared, for example in ORSA scenarios (Own Risk and Solvency Assessment).
This package provides a simple, informative and powerful test (mvnTest()) for multivariate normality proposed by Zhou and Shao (2014) <doi:10.1080/02664763.2013.839637>, which combines kurtosis with the Shapiro-Wilk test, is easy for biomedical researchers to understand, and is easy to implement in all dimensions. This package also contains some other multivariate normality tests including Fattorini's FA test (faTest()), Mardia's skewness and kurtosis test (mardia()), Henze-Zirkler's test (mhz()), Bowman and Shenton's test (msk()), Royston's H test (msw()), and Villasenor-Alva and Gonzalez-Estrada's test (msw()). Empirical power calculation functions for these tests are also provided. In addition, this package includes some functions to generate several types of multivariate distributions mentioned in Zhou and Shao (2014).
The StockDistFit package provides functions for fitting probability distributions to stock price data. The package uses maximum likelihood estimation to find the best-fitting distribution for a given stock. It also offers a function that fits several distributions to one or more assets, compares them using the Akaike Information Criterion (AIC), and picks the best distribution. References are as follows: Siew et al. (2008) <https://www.jstage.jst.go.jp/article/jappstat/37/1/37_1_1/_pdf/-char/ja> and Benth et al. (2008) <https://books.google.co.ke/books?hl=en&lr=&id=MHNpDQAAQBAJ&oi=fnd&pg=PR7&dq=Stochastic+modeling+of+commodity+prices+using+the+Variance+Gamma+(VG)+model.+&ots=YNIL2QmEYg&sig=XZtGU0lp4oqXHVyPZ-O8x5i7N3w&redir_esc=y#v=onepage&q&f=false>.
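The fit-then-compare workflow described above can be sketched as follows. This is a minimal illustration, not the StockDistFit implementation: it assumes only two candidate distributions (normal and Laplace, chosen here because their maximum likelihood estimates have closed forms), and the function names and simulated returns are hypothetical.

```python
import math
import random
import statistics

def normal_aic(x):
    """AIC of a normal distribution fitted by maximum likelihood (2 parameters)."""
    n = len(x)
    mu = statistics.fmean(x)
    var = sum((v - mu) ** 2 for v in x) / n  # the MLE divides by n, not n - 1
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return 2 * 2 - 2 * loglik

def laplace_aic(x):
    """AIC of a Laplace distribution fitted by maximum likelihood (2 parameters)."""
    n = len(x)
    loc = statistics.median(x)                 # MLE of the location
    scale = sum(abs(v - loc) for v in x) / n   # MLE of the scale
    loglik = -n * (math.log(2 * scale) + 1)
    return 2 * 2 - 2 * loglik

def best_by_aic(x):
    """Return the name of the lowest-AIC candidate and all AIC scores."""
    scores = {"normal": normal_aic(x), "laplace": laplace_aic(x)}
    return min(scores, key=scores.get), scores

# Hypothetical daily log returns, simulated from a normal distribution.
rng = random.Random(1)
returns = [rng.gauss(0.0, 0.02) for _ in range(2000)]
best, scores = best_by_aic(returns)
print(best, scores)
```

Both candidates have the same number of parameters here, so the AIC comparison reduces to comparing maximized log-likelihoods; the penalty term matters once candidates differ in complexity.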
Estimation and inference methods for large-scale mean and quantile regression models via stochastic (sub-)gradient descent (S-subGD) algorithms. The inference procedure handles cross-sectional data sequentially: (i) updating the parameter estimate with each incoming "new observation", (ii) aggregating it as a Polyak-Ruppert average, and (iii) computing an asymptotically pivotal statistic for inference through random scaling. The methodology used in the SGDinference package is described in detail in the following papers: (i) Lee, S., Liao, Y., Seo, M.H. and Shin, Y. (2022) <doi:10.1609/aaai.v36i7.20701> "Fast and robust online inference with stochastic gradient descent via random scaling". (ii) Lee, S., Liao, Y., Seo, M.H. and Shin, Y. (2023) <arXiv:2209.14502> "Fast Inference for Quantile Regression with Tens of Millions of Observations".
An implementation of functions to generate and plot postestimation quantities after estimating Bayesian regression models using Markov chain Monte Carlo (MCMC). Functionality includes the estimation of the Precision-Recall curves (see Beger, 2016 <doi:10.2139/ssrn.2765419>), the implementation of the observed values method of calculating predicted probabilities by Hanmer and Kalkan (2013) <doi:10.1111/j.1540-5907.2012.00602.x>, the implementation of the average value method of calculating predicted probabilities (see King, Tomz, and Wittenberg, 2000 <doi:10.2307/2669316>), and the generation and plotting of first differences to summarize typical effects across covariates (see Long 1997, ISBN:9780803973749; King, Tomz, and Wittenberg, 2000 <doi:10.2307/2669316>). This package can be used with MCMC output generated by any Bayesian estimation tool including 'JAGS', 'BUGS', 'MCMCpack', and 'Stan'.
Test for monotonicity in financial variables sorted by portfolios. It is conventional practice in empirical research to form portfolios of assets ranked by a certain sort variable. A t-test is then used to consider the mean return spread between the portfolios with the highest and lowest values of the sort variable. Yet comparing only the average returns on the top and bottom portfolios does not provide a sufficient way to test for a monotonic relation between expected returns and the sort variable. This package provides nonparametric tests for the full set of monotonic patterns by Patton, A. and Timmermann, A. (2010) <doi:10.1016/j.jfineco.2010.06.006> and compares the proposed results with extant alternatives such as t-tests, Bonferroni bounds, and multivariate inequality tests through empirical applications and simulations.
seqArchRplus facilitates downstream analyses of promoter sequence architectures/clusters identified by seqArchR (or any other tool/method). With additional available information such as the TPM values and interquantile widths (IQWs) of the CAGE tag clusters, seqArchRplus can order the input promoter clusters by their shape (IQWs), and write the cluster information as browser/IGV track files. Provided visualizations are of two kinds: per sample/stage and per cluster visualizations. Those of the first kind include plot panels for each sample showing per cluster shape, TPM and other score distributions, sequence logos, and peak annotations. Those of the second kind include per cluster chromosome-wise and strand distributions, motif occurrence heatmaps and GO term enrichments. Additionally, seqArchRplus can generate HTML reports for easy viewing and comparison of promoter architectures between samples/stages.
Implementation of the Future API <doi:10.32614/RJ-2021-048> on top of the mirai package <doi:10.5281/zenodo.7912722>. By using this package, you get to take advantage of the benefits of mirai plus everything else that future and the Futureverse add on top of it. It allows you to process futures, as defined by the future package, in parallel out of the box, on your local machine or across remote machines. Contrary to back-ends relying on the parallel package (e.g. 'multisession') and socket connections, 'mirai_cluster' and 'mirai_multisession', provided here, can run more than 125 parallel R processes. As a reminder, regardless of which future backend is used, the code does not have to change, it gives identical results, and it behaves exactly the same.
This package provides a network-based systems biology tool for flexible identification of phenotype-specific subpathways in cancer gene expression data with multiple categories (such as multiple subtypes or developmental stages of cancer). Subtype Set Enrichment Analysis (SubSEA) and Dynamic Changed Subpathway Analysis (DCSA) are developed to flexibly identify subtype-specific and dynamically changed subpathways, respectively. The operation modes include extraction of subpathways from biological pathways, inference of subpathway activities in the context of gene expression data, identification of subtype-specific subpathways with SubSEA, identification of dynamically changed subpathways associated with the cancer developmental stage with DCSA, and visualization of the activities of the resulting subpathways using box plots and heat maps. These capabilities enable the tool to find specific abnormal subpathways in cancer datasets with multi-phenotype samples.
This package provides a set of functions to estimate rank and factor loadings of time series tensor factor models. A tensor is a multidimensional array. To analyze high-dimensional tensor time series, the factor model is a major dimension reduction tool. TensorPreAve provides functions to estimate the rank of core tensors and factor loading spaces of tensor time series. More specifically, a pre-averaging method that accumulates information from tensor fibres is used to estimate the factor loading spaces. The estimated directions corresponding to the strongest factors are then used for projecting the data for a potentially improved re-estimation of the factor loading spaces themselves. A new rank estimation method that utilizes correlation information from the projected data is also implemented. See Chen and Lam (2023) <arXiv:2208.04012> for more details.
This package provides cross-validation tools for adsorption isotherm models, supporting both linear and non-linear forms. Current methods cover commonly used isotherms including the Freundlich, Langmuir, and Temkin models. This package implements K-fold and leave-one-out cross-validation (LOOCV) with optional clustering-based fold assignment to preserve underlying data structures during validation. Model predictive performance is assessed using mean squared error (MSE), with optional graphical visualization of fold-wise MSEs to support intuitive evaluation of model accuracy. This package is intended to facilitate rigorous model validation in adsorption studies and aid researchers in selecting robust isotherm models. For more details, see Montgomery et al. (2012) <isbn: 978-0-470-54281-1>, Lumumba et al. (2024) <doi:10.11648/j.ajtas.20241305.13>, and Yates et al. (2022) <doi:10.1002/ecm.1557>.
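The K-fold procedure described above can be sketched for the linear form of the Freundlich isotherm, log(q) = log(K_F) + (1/n) log(C). This is a minimal illustration under stated assumptions, not the package's code: the function names and data are hypothetical, folds are assigned by simple interleaving rather than clustering, and the MSE is computed on the log scale.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def kfold_mse_freundlich(conc, uptake, k=5):
    """Fold-wise MSE of the linearized Freundlich model on held-out points."""
    xs = [math.log(c) for c in conc]
    ys = [math.log(q) for q in uptake]
    n = len(xs)
    fold_mse = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in test_idx]
        fold_mse.append(sum(errors) / len(errors))
    return fold_mse

# Hypothetical noise-free data generated from q = 2 * C ** 0.5.
conc = [float(c) for c in range(1, 11)]
uptake = [2.0 * c ** 0.5 for c in conc]
mse_per_fold = kfold_mse_freundlich(conc, uptake, k=5)
print(mse_per_fold)
```

Setting k equal to the number of observations turns the same loop into leave-one-out cross-validation.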
Inspects provenance collected by the rdt or rdtLite packages, or other tools providing compatible PROV JSON output created by the execution of a script, and finds differences between two provenance collections. Factors under examination include the hardware and software used to execute the script, versions of attached libraries, use of global variables, modified inputs and outputs, and changes in main and sourced scripts. Based on detected changes, provExplainR can be used to study how these factors affect the behavior of the script and to generate a promising diagnosis of the causes of different script results. More information about rdtLite and associated tools is available at <https://github.com/End-to-end-provenance/> and in Barbara Lerner, Emery Boose, and Luis Perez (2018), Using Introspection to Collect Provenance in R, Informatics, <doi:10.3390/informatics5010012>.
This package provides a comprehensive bundle of utilities for the estimation of probability of informed trading models: original PIN in Easley and O'Hara (1992) and Easley et al. (1996); Multilayer PIN (MPIN) in Ersan (2016); Adjusted PIN (AdjPIN) in Duarte and Young (2009); and volume-synchronized PIN (VPIN) in Easley et al. (2011, 2012). Implementations of various estimation methods suggested in the literature are included. Additional features comprise posterior probabilities, an implementation of an expectation-maximization (EM) algorithm, and PIN decomposition into layers and into bad/good components. Versatile data simulation tools and trade classification algorithms are among the supplementary utilities. The package provides fast, compact, and precise utilities to tackle the sophisticated, error-prone, and time-consuming estimation procedure of informed trading, using only raw trade-level data.
This package provides a comprehensive toolkit for calculating a suite of common vegetation indices (VIs) derived from remote sensing imagery. VIs are used to quantify vegetation characteristics such as biomass, leaf area index (LAI) and photosynthetic activity, which are key parameters in various ecological, agricultural, and environmental studies. Applications of this package include biomass estimation, crop monitoring, forest management, land use and land cover change analysis, and climate change studies. For method details, see Deb, D., Deb, S., Chakraborty, D., Singh, J.P., Singh, A.K., Dutta, P. and Choudhury, A. (2020) <doi:10.1080/10106049.2020.1756461>. Using this R package, users can extract and analyze critical information from remote sensing imagery, enhancing their comprehension of vegetation dynamics and their importance in global ecosystems. The package includes the function vegetation_indices().
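As an illustration of what a vegetation index computes, the widely used Normalized Difference Vegetation Index (NDVI) is (NIR - Red) / (NIR + Red). The sketch below is an assumption-laden example: the description above does not list which indices vegetation_indices() returns, and the function name, input layout and reflectance values here are hypothetical.

```python
def ndvi(nir, red):
    """Per-pixel NDVI from near-infrared and red reflectance values.

    Pixels where nir + red is zero are returned as NaN.
    """
    result = []
    for n, r in zip(nir, red):
        denom = n + r
        result.append((n - r) / denom if denom != 0 else float("nan"))
    return result

# Hypothetical reflectances: dense vegetation, sparse vegetation, no signal.
nir_band = [0.50, 0.30, 0.0]
red_band = [0.10, 0.20, 0.0]
print(ndvi(nir_band, red_band))
```

Values near 1 indicate dense green vegetation, while values near 0 or below indicate bare soil, water or built-up surfaces.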
Metadynamics is a state-of-the-art biomolecular simulation technique. The Plumed program (Tribello, G.A. et al. (2014) <doi:10.1016/j.cpc.2013.09.018>) makes it possible to perform metadynamics using various simulation codes. The results of metadynamics done in Plumed can be analyzed by 'metadynminer'. The package reads 1D and 2D metadynamics hills files from Plumed. It uses a fast algorithm by Hosek, P. and Spiwok, V. (2016) <doi:10.1016/j.cpc.2015.08.037> to calculate a free energy surface from hills. Minima can be located and plotted on the free energy surface. Transition states can be analyzed by the Nudged Elastic Band method by Henkelman, G. and Jonsson, H. (2000) <doi:10.1063/1.1323224>. Free energy surfaces, minima and transition paths can be plotted to produce publication-quality images.
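The idea of computing a free energy surface from hills can be sketched in one dimension as the negative sum of the deposited Gaussian hills. This is a plain direct summation for standard (not well-tempered) metadynamics, not the fast algorithm of Hosek and Spiwok; the function name and the (center, width, height) tuple layout are assumptions made for illustration and do not reflect the Plumed hills file format.

```python
import math

def free_energy_1d(hills, grid):
    """Free energy estimate on a grid as the negative sum of Gaussian hills.

    hills is a list of (center, width, height) tuples; grid is a list of
    collective-variable values at which the surface is evaluated.
    """
    surface = []
    for s in grid:
        bias = sum(
            height * math.exp(-((s - center) ** 2) / (2.0 * width ** 2))
            for center, width, height in hills
        )
        surface.append(-bias)
    return surface

# Hypothetical hills deposited around two basins at -1 and +1.
hills = [(-1.0, 0.3, 1.0), (-0.9, 0.3, 1.0), (1.0, 0.3, 1.0)]
grid = [i / 10.0 for i in range(-20, 21)]
surface = free_energy_1d(hills, grid)
print(min(surface), grid[surface.index(min(surface))])
```

The cost of this direct sum grows with the number of hills times the number of grid points, which is the motivation for faster algorithms on long simulations.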
Expands iNEXT to include the estimation of sample completeness and evenness. The package provides simple functions to perform the following four-step biodiversity analysis: STEP 1: Assessment of sample completeness profiles. STEP 2a: Analysis of size-based rarefaction and extrapolation sampling curves to determine whether the asymptotic diversity can be accurately estimated. STEP 2b: Comparison of the observed and the estimated asymptotic diversity profiles. STEP 3: Analysis of non-asymptotic coverage-based rarefaction and extrapolation sampling curves. STEP 4: Assessment of evenness profiles. The analyses in STEPs 2a, 2b and 3 are mainly based on the previous iNEXT package. Refer to the iNEXT package for details. This package mainly focuses on the computation for STEPs 1 and 4. See Chao et al. (2020) <doi:10.1111/1440-1703.12102> for statistical background.
This package performs a series of offline and/or online change-point detection algorithms for 1) univariate mean: <doi:10.1214/20-EJS1710>, <arXiv:2006.03283>; 2) univariate polynomials: <doi:10.1214/21-EJS1963>; 3) univariate and multivariate nonparametric settings: <doi:10.1214/21-EJS1809>, <doi:10.1109/TIT.2021.3130330>; 4) high-dimensional covariances: <doi:10.3150/20-BEJ1249>; 5) high-dimensional networks with and without missing values: <doi:10.1214/20-AOS1953>, <arXiv:2101.05477>, <arXiv:2110.06450>; 6) high-dimensional linear regression models: <arXiv:2010.10410>, <arXiv:2207.12453>; 7) high-dimensional vector autoregressive models: <arXiv:1909.06359>; 8) high-dimensional self-exciting point processes: <arXiv:2006.03572>; 9) dependent dynamic nonparametric random dot product graphs: <arXiv:1911.07494>; 10) univariate mean against adversarial attacks: <arXiv:2105.10417>.
Univariate feature selection and compound covariate methods under the Cox model with high-dimensional features (e.g., gene expressions). Available are survival data for non-small-cell lung cancer patients with gene expressions (Chen et al 2007 New Engl J Med) <DOI:10.1056/NEJMoa060096>, statistical methods in Emura et al (2012 PLoS ONE) <DOI:10.1371/journal.pone.0047627>, Emura & Chen (2016 Stat Methods Med Res) <DOI:10.1177/0962280214533378>, and Emura et al (2019) <DOI:10.1016/j.cmpb.2018.10.020>. Algorithms for generating correlated gene expressions are also available. Estimation of survival functions via copula-graphic (CG) estimators is also implemented, which is useful for sensitivity analyses under dependent censoring (Yeh et al 2023 Biomedicines) <DOI:10.3390/biomedicines11030797> and factorial survival analyses (Emura et al 2024 Stat Methods Med Res) <DOI:10.1177/09622802231215805>.
If results from a meta-GWAS are used for validation in one of the cohorts that was included in the meta-analysis, this will yield biased (i.e. too optimistic) results. The validation cohort needs to be independent from the meta-Genome-Wide-Association-Study (meta-GWAS) results. MetaSubtract subtracts the results of the respective cohort from the meta-GWAS results analytically, without having to redo the meta-GWAS analysis using the leave-one-out methodology. It can handle different meta-analysis methods and takes into account whether single or double genomic control correction was applied to the original meta-analysis. It can be used for whole GWAS, but also for a limited set of genetic markers. For an application, see Nolte I.M. et al. (2017) <doi:10.1038/ejhg.2017.50>.
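The analytic subtraction can be sketched for the simplest case, a fixed-effects inverse-variance weighted meta-analysis with no genomic control correction. This is an illustrative sketch under those assumptions, not the MetaSubtract code, and the function name and numbers are hypothetical. With weights w = 1/se^2, the meta-analysis weight is the sum of the cohort weights, so removing one cohort amounts to subtracting its weight and its weighted effect.

```python
import math

def subtract_cohort(beta_meta, se_meta, beta_cohort, se_cohort):
    """Remove one cohort from an inverse-variance weighted meta-analysis.

    Returns the effect estimate and standard error of the remaining cohorts.
    """
    w_meta = 1.0 / se_meta ** 2
    w_cohort = 1.0 / se_cohort ** 2
    w_rest = w_meta - w_cohort
    if w_rest <= 0:
        raise ValueError("cohort weight must be smaller than the meta-analysis weight")
    beta_rest = (w_meta * beta_meta - w_cohort * beta_cohort) / w_rest
    return beta_rest, math.sqrt(1.0 / w_rest)

# Hypothetical marker: cohort A (0.10, se 0.05) and cohort B (0.30, se 0.10)
# combine to beta = 0.14 with se = sqrt(1 / 500).
beta_meta, se_meta = 0.14, math.sqrt(1.0 / 500.0)
print(subtract_cohort(beta_meta, se_meta, 0.30, 0.10))
```

Subtracting cohort B recovers cohort A's own estimate, which is the result a leave-one-out re-analysis would give in this two-cohort setting.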
Plan optimal sample size allocation and go/no-go decision rules for phase II/III drug development programs with time-to-event, binary or normally distributed endpoints when assuming fixed treatment effects or a prior distribution for the treatment effect, using methods from Kirchner et al. (2016) <doi:10.1002/sim.6624> and Preussler (2020). Optimal is in the sense of maximal expected utility, where the utility is a function taking into account the expected cost and benefit of the program. It is possible to extend to more complex settings with bias correction (Preussler S et al. (2020) <doi:10.1186/s12874-020-01093-w>), multiple phase III trials (Preussler et al. (2019) <doi:10.1002/bimj.201700241>), multi-arm trials (Preussler et al. (2019) <doi:10.1080/19466315.2019.1702092>), and multiple endpoints (Kieser et al. (2018) <doi:10.1002/pst.1861>).
This package implements novel nonparametric approaches to address biases and confounding when comparing treatments or exposures in observational studies of outcomes. While designed and appropriate for use in studies involving medicine and the life sciences, the package can be used in other situations involving outcomes with multiple confounders. The package implements a family of methods for non-parametric bias correction when comparing treatments in observational studies, including survival analysis settings, where competing risks and/or censoring may be present. The approach extends to bias-corrected personalized predictions of treatment outcome differences, and analysis of heterogeneity of treatment effect-sizes across patient subgroups. For further details, please see: Lauve NR, Nelson SJ, Young SS, Obenchain RL, Lambert CG. LocalControl: An R Package for Comparative Safety and Effectiveness Research. Journal of Statistical Software. 2020. p. 1–32. Available from <doi:10.18637/jss.v096.i04>.