The detection of troubling approximate collinearity in a multiple linear regression model is a classical problem in Econometrics. The objective of this package is to detect it using the variance inflation factor redefined and the scatterplot between the variance inflation factor and the coefficient of variation. For more details see Salmerón R., Garcà a C.B. and Garcà a J. (2018) <doi:10.1080/00949655.2018.1463376>, Salmerón, R., Rodrà guez, A. and Garcà a C. (2020) <doi:10.1007/s00180-019-00922-x>, Salmerón, R., Garcà a, C.B, Rodrà guez, A. and Garcà a, C. (2022) <doi:10.32614/RJ-2023-010>, Salmerón, R., Garcà a, C.B. and Garcà a, J. (2024) <doi:10.1007/s10614-024-10575-8> and Salmerón, R., Garcà a, C.B, Garcà a J. (2023, working paper) <doi:10.48550/arXiv.2005.02245>
.
This package provides a system for describing and manipulating the many models that are generated in causal inference and data analysis projects, as based on the causal theory and criteria of Austin Bradford Hill (1965) <doi:10.1177/003591576505800503>. This system includes the addition of formal attributes that modify base `R` objects, including terms and formulas, with a focus on variable roles in the "do-calculus" of modeling, as described in Pearl (2010) <doi:10.2202/1557-4679.1203>. For example, the definition of exposure, outcome, and interaction are implicit in the roles variables take in a formula. These premises allow for a more fluent modeling approach focusing on variable relationships, and assessing effect modification, as described by VanderWeele
and Robins (2007) <doi:10.1097/EDE.0b013e318127181b>. The essential goal is to help contextualize formulas and models in causality-oriented workflows.
This package implements v2 of the B.L.S. API for requests of survey information and time series data through 3-tiered API that allows users to interact with the raw API directly, create queries through a functional interface, and re-shape the data structures returned to fit common uses. The API definition is located at: <https://www.bls.gov/developers/api_signature_v2.htm>.
Estimation of hierarchical Bayesian vector autoregressive models following Kuschnig & Vashold (2021) <doi:10.18637/jss.v100.i14>. Implements hierarchical prior selection for conjugate priors in the fashion of Giannone, Lenza & Primiceri (2015) <doi:10.1162/REST_a_00483>. Functions to compute and identify impulse responses, calculate forecasts, forecast error variance decompositions and scenarios are available. Several methods to print, plot and summarise results facilitate analysis.
An R interface for the remote file hosting service Box (<https://www.box.com/>). In addition to uploading and downloading files, this package includes functions which mirror base R operations for local files, (e.g. box_load()
, box_save()
, box_read()
, box_setwd()
, etc.), as well as git style functions for entire directories (e.g. box_fetch()
, box_push()
).
This package provides a comprehensive framework for time series omics analysis, integrating changepoint detection, smooth and shape-constrained trends, and uncertainty quantification. It supports gene- and transcript-level inferences, p-value aggregation for improved power, and both case-only and case-control designs. It includes an interactive shiny interface. The methods are described in Yates et al. (2024) <doi:10.1101/2024.12.22.630003>.
Conditioned Latin hypercube sampling, as published by Minasny and McBratney
(2006) <DOI:10.1016/j.cageo.2005.12.009>. This method proposes to stratify sampling in presence of ancillary data. An extension of this method, which propose to associate a cost to each individual and take it into account during the optimisation process, is also proposed (Roudier et al., 2012, <DOI:10.1201/b12728>).
Draws systematic samples from a population that follows linear trend. The function returns a matrix comprising of the required samples as its column vectors. The samples produced are highly efficient and the inter sampling variance is minimum. The scheme will be useful in various field like Bioinformatics where the samples are expensive and must be precise in reflecting the population by possessing least sampling variance.
This package provides an implementation of a mixture of hidden Markov models (HMMs) for discrete sequence data in the Discrete Bayesian HMM Clustering (DBHC) algorithm. The DBHC algorithm is an HMM Clustering algorithm that finds a mixture of discrete-output HMMs while using heuristics based on Bayesian Information Criterion (BIC) to search for the optimal number of HMM states and the optimal number of clusters.
This package provides functions of five estimation method for ED50 (50 percent effective dose) are provided, and they are respectively Dixon-Mood method (1948) <doi:10.2307/2280071>, Choi's original turning point method (1990) <doi:10.2307/2531453> and it's modified version given by us, as well as logistic regression and isotonic regression. Besides, the package also supports comparison between two estimation results.
This package provides a simple wrapper around the ical.js library executing Javascript code via V8 (the Javascript engine driving the Chrome browser and Node.js and accessible via the V8 R package). This package enables users to parse iCalendar
files ('.ics', .ifb', .iCal
', .iFBf
') into lists and data.frames to ultimately do statistics on events, meetings, schedules, birthdays, and the like.
This package contains a set of functions to create data libraries, generate data dictionaries, and simulate a data step. The libname()
function will load a directory of data into a library in one line of code. The dictionary()
function will generate data dictionaries for individual data frames or an entire library. And the datestep()
function will perform row-by-row data processing.
This package performs Bayesian linear regression and forecasting in astronomy. The method accounts for heteroscedastic errors in both the independent and the dependent variables, intrinsic scatters (in both variables) and scatter correlation, time evolution of slopes, normalization, scatters, Malmquist and Eddington bias, upper limits and break of linearity. The posterior distribution of the regression parameters is sampled with a Gibbs method exploiting the JAGS library.
Due to lack of proper inference procedure and software, the ordinary linear regression model is seldom used in practice for the analysis of right censored data. This paper presents an S-Plus/R program that implements a recently developed inference procedure (Jin, Lin and Ying, 2006) <doi:10.1093/biomet/93.1.147> for the accelerated failure time model based on the least-squares principle.
Classification method obtained through linear programming. It is advantageous with respect to the classical developments when the distribution of the variables involved is unknown or when the number of variables is much greater than the number of individuals. Mathematical details behind the method are published in Nueda, et al. (2022) "LPDA: A new classification method based on linear programming". <doi:10.1371/journal.pone.0270403>.
This package provides new functions info()
, warn()
and error()
, similar to message()
, warning()
and stop()
respectively. However, the new functions can have a level associated with them, so that when executed the global level option determines whether they are shown or not. This allows debug modes, outputting more information. The can also output all messages to a log file.
The oblique decision tree (ODT) uses linear combinations of predictors as partitioning variables in a decision tree. Oblique Decision Random Forest (ODRF) is an ensemble of multiple ODTs generated by feature bagging. Both can be used for classification and regression as supplements to the classical CART of Breiman (1984) <DOI:10.1201/9781315139470> and Random Forest of Breiman (2001) <DOI:10.1023/A:1010933404324> respectively.
Ordered homogeneity pursuit lasso (OHPL) algorithm for group variable selection proposed in Lin et al. (2017) <DOI:10.1016/j.chemolab.2017.07.004>. The OHPL method exploits the homogeneity structure in high-dimensional data and enjoys the grouping effect to select groups of important variables automatically. This feature makes it particularly useful for high-dimensional datasets with strongly correlated variables, such as spectroscopic data.
This package provides a system for fast, accurate, and flexible whole genome bisulfite sequencing (WGBS) data analysis of two-condition comparisons. Principal Component BiSulfite
, PCBS', assigns methylated loci eigenvector values from the treatment-delineating principal component in lieu of running millions of pairwise statistical tests, which dramatically increases analysis flexibility and reduces computational requirements. Methods: <https://katlande.github.io/PCBS/articles/Differential_Methylation.html>.
We present a penalized log-density estimation method using Legendre polynomials with lasso penalty to adjust estimate's smoothness. Re-expressing the logarithm of the density estimator via a linear combination of Legendre polynomials, we can estimate parameters by maximizing the penalized log-likelihood function. Besides, we proposed an implementation strategy that builds on the coordinate decent algorithm, together with the Bayesian information criterion (BIC).
This package implements recently developed projection pursuit algorithms for finding optimal linear cluster separators. The clustering algorithms use optimal hyperplane separators based on minimum density, Pavlidis et. al (2016) <http://jmlr.org/papers/volume17/15-307/15-307.pdf>; minimum normalised cut, Hofmeyr (2017) <doi:10.1109/TPAMI.2016.2609929>; and maximum variance ratio clusterability, Hofmeyr and Pavlidis (2015) <doi:10.1109/SSCI.2015.116>.
Spatial Stochastic Frontier Analysis (SSFA) is an original method for controlling the spatial heterogeneity in Stochastic Frontier Analysis (SFA) models, for cross-sectional data, by splitting the inefficiency term into three terms: the first one related to spatial peculiarities of the territory in which each single unit operates, the second one related to the specific production features and the third one representing the error term.
We provide a flexible Zero-inflated Poisson-Gamma Model (ZIPG) by connecting both the mean abundance and the variability to different covariates, and build valid statistical inference procedures for both parameter estimation and hypothesis testing. These functions can be used to analyze microbiome count data with zero-inflation and overdispersion. The model is discussed in Jiang et al (2023) <doi:10.1080/01621459.2022.2151447>.
This package provides tooling to group dates by a variety of periods including: yearly, monthly, by second, by week of the month, and more. The groups are defined in such a way that they also represent the distance between dates in terms of the period. This extracts valuable information that can be used in further calculations that rely on a specific temporal spacing between observations.