This package provides a collection of methods for both rank-based and least-squares estimation of the Accelerated Failure Time (AFT) model. For rank-based estimation, it provides approaches that include the computationally efficient Gehan weight and general weights such as the log-rank weight. Details of the rank-based estimation can be found in Chiou et al. (2014) <doi:10.1007/s11222-013-9388-2> and Chiou et al. (2015) <doi:10.1002/sim.6415>. For least-squares estimation, the estimating equation is solved with generalized estimating equations (GEE). Moreover, in multivariate cases, the working correlation structure for dependence can be specified in the GEE setting. Details on the least-squares estimation can be found in Chiou et al. (2014) <doi:10.1007/s10985-014-9292-x>.
This package performs causal mediation analysis for count and zero-inflated count data with or without a post-treatment confounder; calculates power to detect prespecified causal mediation effects, direct effects, and total effects; and performs sensitivity analysis when there is a treatment-induced mediator-outcome confounder, as described by Cheng, J., Cheng, N.F., Guo, Z., Gregorich, S., Ismail, A.I., Gansky, S.A. (2018) <doi:10.1177/0962280216686131>. It implements an Instrumental Variable (IV) method to estimate the controlled (natural) direct and mediation effects and computes bootstrap confidence intervals, as described by Guo, Z., Small, D.S., Gansky, S.A., Cheng, J. (2018) <doi:10.1111/rssc.12233>. This software was made possible by Grant R03DE028410 from the National Institute of Dental and Craniofacial Research, a component of the National Institutes of Health.
This is a cross-platform linear-model-to-SQL compiler. It generates SQL from linear and generalized linear models. Its interface consists of a single function, modelc(), which takes the output of the lm() or glm() functions (or any object with the same signature) and outputs a SQL character vector representing the predictions on the scale of the response variable, as described in Dunn & Smyth (2018) <doi:10.1007/978-1-4419-0118-7> and originating in Nelder & Wedderburn (1972) <doi:10.2307/2344614>. The resultant SQL can be included in a SELECT statement and returns output similar to that of predict.glm() or predict.lm(), assuming numeric types are represented in the database with sufficient precision. Currently the log and identity link functions are supported.
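A minimal usage sketch, assuming the single exported modelc() function described above and R's built-in mtcars data (the table and column names in the final query are placeholders):

    library(modelc)

    # Fit an ordinary linear model in R
    fit <- lm(mpg ~ wt + hp, data = mtcars)

    # Compile the fitted model to a SQL expression (a character vector)
    sql <- modelc(fit)

    # Embed the expression in a SELECT statement against a hypothetical table
    query <- paste("SELECT", sql, "AS predicted_mpg FROM cars")
    cat(query)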
Stop-signal task data for go and stop trials are generated per participant. The simulation process is based on the general non-independent horse-race model with either a fixed stop-signal delay or the tracking method. The go and stop processes are each assumed to follow an exponentially modified Gaussian (ExG) or shifted Wald (SW) distribution. The output data can be converted to BEESTS software input data, enabling researchers to test and evaluate various brain stopping processes manifested by the ExG or SW distributional parameters of interest. Methods are described in: Soltanifar M (2020) <https://hdl.handle.net/1807/101208>, Matzke D, Love J, Wiecki TV, Brown SD, Logan GD and Wagenmakers E-J (2013) <doi:10.3389/fpsyg.2013.00918>, and Logan GD, Van Zandt T, Verbruggen F, Wagenmakers EJ (2014) <doi:10.1037/a0035230>.
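For illustration, an exponentially modified Gaussian finishing time is simply the sum of a normal and an exponential component; a generic base R sketch (not this package's interface, with placeholder parameter values) is:

    # ExG finishing times: Normal(mu, sigma) plus Exponential with mean tau (in ms)
    n     <- 1000
    mu    <- 440; sigma <- 60; tau <- 80
    go_rt <- rnorm(n, mean = mu, sd = sigma) + rexp(n, rate = 1 / tau)
    summary(go_rt)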
Enables the user to calculate Value at Risk (VaR) and Expected Shortfall (ES) by means of various parametric and semiparametric GARCH-type models. For the latter, the estimation of the nonparametric scale function is carried out by means of a data-driven smoothing approach. Model quality, in terms of forecasting VaR and ES, can be assessed by means of various backtesting methods, such as the traffic light test for VaR and a newly developed traffic light test for ES. The approaches implemented in this package are described in, e.g., Feng Y., Beran J., Letmathe S. and Ghosh S. (2020) <https://ideas.repec.org/p/pdn/ciepap/137.html> as well as Letmathe S., Feng Y. and Uhde A. (2021) <https://ideas.repec.org/p/pdn/ciepap/141.html>.
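As a generic illustration of the two risk measures (independent of this package's interface): at a confidence level of 97.5%, the empirical one-day VaR is the corresponding quantile of the loss distribution and the ES is the mean loss beyond that quantile.

    # Empirical VaR and ES from daily losses (positive values = losses)
    losses <- -diff(log(EuStockMarkets[, "DAX"]))  # negative log returns of a built-in series
    alpha  <- 0.975
    VaR    <- quantile(losses, probs = alpha)
    ES     <- mean(losses[losses > VaR])
    c(VaR = unname(VaR), ES = ES)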
Reconstructing gene regulatory networks and transcription factor activity is crucial to understanding biological processes and holds potential for developing personalized treatment. Yet it remains an open problem, as state-of-the-art algorithms are often unable to handle large amounts of data. Furthermore, many existing methods predict numerous false positives and are unable to integrate other sources of information, such as previously known interactions. Here we introduce KBoost, an algorithm that uses kernel PCA regression, boosting and Bayesian model averaging for fast and accurate reconstruction of gene regulatory networks. KBoost can also use a prior network built on previously known transcription factor targets. We have benchmarked KBoost on three different datasets against other high-performing algorithms. The results show that our method compares favourably to other methods across datasets.
An implementation of the ALFAM2 dynamic emission model for ammonia volatilization from field-applied animal slurry (manure with dry matter below about 15%). The model can be used to predict cumulative emission and emission rate of ammonia following field application of slurry. Predictions may be useful for emission inventory calculations, fertilizer management, assessment of mitigation strategies, or research aimed at understanding ammonia emission. Default parameter sets include effects of application method, slurry composition, and weather. The model structure is based on a simplified representation of the physical-chemical slurry-soil-atmosphere system. See Hafner et al. (2018) <doi:10.1016/j.atmosenv.2018.11.034> for information on the model and Hafner et al. (2019) <doi:10.1016/j.agrformet.2017.11.027> for more on the measurement data used for parameter development.
This package introduces a novel PRS model that enhances prediction accuracy by utilising GxE (gene-environment interaction) effects. The package performs Genome Wide Association Studies (GWAS) and Genome Wide Environment Interaction Studies (GWEIS) using a discovery dataset, and can obtain polygenic risk scores (PRSs) for a target sample. Finally, it predicts the risk values of each individual in the target sample. Users have the choice of using existing models (Li et al., 2015) <doi:10.1093/annonc/mdu565>, (Pandis et al., 2013) <doi:10.1093/ejo/cjt054>, (Peyrot et al., 2018) <doi:10.1016/j.biopsych.2017.09.009> and (Song et al., 2022) <doi:10.1038/s41467-022-32407-9>, as well as newly proposed models for genomic risk prediction (refer to the URL for more details).
This resource provides tools to create, compare, and post-process spatial isotope assignment models of animal origin. It generates probability-of-origin maps for individuals based on user-provided tissue and environment isotope values (e.g., as generated by IsoMAP, Bowen et al. (2013) <doi:10.1111/2041-210X.12147>), using the framework established in Bowen et al. (2010) <doi:10.1146/annurev-earth-040809-152429>. The package isocat can then quantitatively compare and cluster these maps to group individuals of similar origin. It also includes techniques for applying four approaches (cumulative sum, odds ratio, quantile only, and quantile simulation) with which users can summarize geographic origins and probable distance traveled by individuals. Several of the functions included in this package are established in Campbell et al. (2020) <doi:10.1515/ami-2020-0004>.
Runs a Shiny web application that merges raw qPCR fluorescence data with related metadata to visualize species presence/absence detection patterns and assess data quality. The application calculates threshold values from raw fluorescence data using the second-derivative method, Luu-The et al. (2005) <doi:10.2144/05382RR05>, and utilizes the 'chipPCR' package by Rödiger, Burdukiewicz, & Schierack (2015) <doi:10.1093/bioinformatics/btv205> to calculate Cq values. The application can connect to a custom-developed MySQL database to populate the application's interface. The application allows users to interact with visualizations such as a dynamic map, amplification curves and standard curves that allow zooming and/or filtering. It also enables the generation of customized exportable reports based on filtered mapping data.
Intended to create standard human-in-the-loop validity tests for typical automated content analysis such as topic modeling and dictionary-based methods. This package offers a standard workflow with functions to prepare, administer and evaluate a human-in-the-loop validity test. This package provides functions for validating topic models using word intrusion, topic intrusion (Chang et al. 2009, <https://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models>) and word set intrusion (Ying et al. 2021) <doi:10.1017/pan.2021.33> tests. This package also provides functions for generating gold-standard data which are useful for validating dictionary-based methods. The default settings of all generated tests match those suggested in Chang et al. (2009) and Song et al. (2020) <doi:10.1080/10584609.2020.1723752>.
Carries out model-based clustering, classification and discriminant analysis using five different models. The models are all based on the generalized hyperbolic distribution. The first model, MGHD (Browne and McNicholas (2015) <doi:10.1002/cjs.11246>), is the classical mixture of generalized hyperbolic distributions. The MGHFA (Tortora et al. (2016) <doi:10.1007/s11634-015-0204-z>) is the mixture of generalized hyperbolic factor analyzers for high-dimensional data sets. The MSGHD is the mixture of multiple scaled generalized hyperbolic distributions, the cMSGHD is an MSGHD with convex contour plots, and the MCGHD, the mixture of coalesced generalized hyperbolic distributions, is a new, more flexible model (Tortora et al. (2019) <doi:10.1007/s00357-019-09319-3>). The paper related to the software can be found at <doi:10.18637/jss.v098.i03>.
Estimate the correlation between two irregular time series that are not necessarily sampled on identical time points. This program is also applicable to the situation of two evenly spaced time series that are not on the same time grid. BINCOR is based on a novel estimation approach proposed by Mudelsee (2010, 2014) to estimate the correlation between two climate time series with different timescales. The idea is that autocorrelation (an AR1 process) makes it possible to correlate values obtained at different time points. BINCOR contains four functions: bin_cor() (the main function to build the binned time series), plot_ts() (to plot and compare the irregular and binned time series), cor_ts() (to estimate the correlation between the binned time series) and ccf_ts() (to estimate the cross-correlation between the binned time series).
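A minimal usage sketch built on the four functions above; the input format (two-column matrices of time and value) and the argument layout are assumptions, not the documented signatures:

    library(BINCOR)

    # Two irregularly sampled series as (time, value) matrices (simulated here)
    t1  <- sort(runif(80, 0, 100)); t2 <- sort(runif(60, 0, 100))
    ts1 <- cbind(t1, sin(t1 / 10) + rnorm(80, sd = 0.2))
    ts2 <- cbind(t2, sin(t2 / 10) + rnorm(60, sd = 0.3))

    binned <- bin_cor(ts1, ts2)   # build the binned time series (assumed call)
    cor_ts(binned)                # correlation between the binned series (assumed call)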
An implementation of common statistical analysis and models with differential privacy (Dwork et al., 2006a) <doi:10.1007/11681878_14> guarantees. The package contains, for example, functions providing differentially private computations of mean, variance, median, histograms, and contingency tables. It also implements some statistical models and machine learning algorithms such as linear regression (Kifer et al., 2012) <https://proceedings.mlr.press/v23/kifer12.html> and SVM (Chaudhuri et al., 2011) <https://jmlr.org/papers/v12/chaudhuri11a.html>. In addition, it implements some popular randomization mechanisms, including the Laplace mechanism (Dwork et al., 2006a) <doi:10.1007/11681878_14>, Gaussian mechanism (Dwork et al., 2006b) <doi:10.1007/11761679_29>, analytic Gaussian mechanism (Balle & Wang, 2018) <https://proceedings.mlr.press/v80/balle18a.html>, and exponential mechanism (McSherry & Talwar, 2007) <doi:10.1109/FOCS.2007.66>.
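As a generic illustration of the Laplace mechanism (independent of this package's interface): a query with L1 sensitivity Delta is released with additive Laplace noise of scale Delta/epsilon.

    # Generic Laplace mechanism: release f(x) + Laplace(0, sensitivity / epsilon)
    laplace_mechanism <- function(true_value, sensitivity, epsilon) {
      scale <- sensitivity / epsilon
      u <- runif(length(true_value)) - 0.5   # base R has no rlaplace(); use inverse transform
      true_value - scale * sign(u) * log(1 - 2 * abs(u))
    }

    # Example: privately release the mean of values known to lie in [0, 1]
    x <- runif(100)
    laplace_mechanism(mean(x), sensitivity = 1 / length(x), epsilon = 0.5)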
Post-construction fatality monitoring studies at wind facilities are based on data from searches for bird and bat carcasses in plots beneath turbines. Because carcasses can fall outside of the searched area, area correction (AC) estimates are calculated to quantify the proportion of fatalities that fall within the searched area versus those that fall outside of it. This package provides two likelihood-based methods and one physics-based method (Hull and Muir (2010) <doi:10.1080/14486563.2010.9725253>, Huso and Dalthorp (2014) <doi:10.1002/jwmg.663>) to estimate the carcass fall distribution. There are also functions for calculating the proportion of area searched within one-unit annuli, log-logistic distribution functions, and truncated distribution functions.
The BUSseq R package fits an interpretable Bayesian hierarchical model---Batch Effects Correction with Unknown Subtypes for scRNA-seq Data (BUSseq)---to correct batch effects in the presence of unknown cell types. BUSseq is able to simultaneously correct batch effects, cluster cell types, and account for the count nature, overdispersion, dropout events, and cell-specific sequencing depth of scRNA-seq data. After correcting the batch effects with BUSseq, the corrected values can be used for downstream analysis as if all cells were sequenced in a single batch. BUSseq can integrate read count matrices obtained from different scRNA-seq platforms and allows cell types to be measured in some but not all of the batches, as long as the experimental design fulfills the conditions listed in our manuscript.
This package contains a function that imports data from a CSV file, or uses manually entered data in the format (x, y, weight), and plots the appropriate ACC vs. LOI graph and LMA graph. The main function is plotLMA(sourcefile, header), which takes a data set and plots the appropriate LMA and ACC graphs. If no source file (a string) is passed, a manual data-entry window is opened. The header parameter indicates by TRUE/FALSE (FALSE by default) whether the source CSV file has a header row. The dataset should contain only one independent variable (x) and one dependent variable (y) and can contain a weight for each observation.
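A minimal usage sketch based on the description above (the file name is a placeholder):

    # Plot the ACC vs. LOI and LMA graphs from a CSV file laid out as (x, y, weight)
    plotLMA("observations.csv", TRUE)   # TRUE: the CSV file has a header row

    # Per the description, calling the function without a source file
    # opens the manual data-entry window
    plotLMA()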
This package covers many important models used in marketing and micro-econometrics applications, including Bayes Regression (univariate or multivariate dependent variable), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity, and Bayesian Analysis of Aggregate Random Coefficient Logit Models.
This package provides a set of functions to access the ARDECO (Annual Regional Database of the European Commission) data directly from the official ARDECO public repository through the ARDECO APIs. The APIs are completely transparent to the user, and the provided functions give direct access to the ARDECO data. The ARDECO database is a collection of variables related to demography, employment, the labour market, domestic product, and capital formation. Each variable can be exposed in one or more units of measure and refers to total values as well as additional dimensions such as economic sector, gender, and age class. Data can also be aggregated at the country level according to the tercet classes defined by EUROSTAT. A description of the ARDECO database can be found at the following URL: <https://urban.jrc.ec.europa.eu/ardeco>.
MDS is a statistical tool for the reduction of dimensionality, using as input a distance matrix of dimensions n × n. When n is large, classical algorithms suffer from computational problems and the MDS configuration cannot be obtained. With this package, we address these problems by means of six algorithms, two of them original proposals: - Landmark MDS, proposed by De Silva V. and J.B. Tenenbaum (2004). - Interpolation MDS, proposed by Delicado P. and C. Pachón-García (2021) <arXiv:2007.11919> (original proposal). - Reduced MDS, proposed by Paradis E. (2018). - Pivot MDS, proposed by Brandes U. and C. Pich (2007). - Divide-and-conquer MDS, proposed by Delicado P. and C. Pachón-García (2021) <arXiv:2007.11919> (original proposal). - Fast MDS, proposed by Yang, T., J. Liu, L. McMillan and W. Wang (2006).
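For comparison, the classical (non-scalable) configuration that these algorithms approximate can be obtained for small n directly with base R's cmdscale(), which takes an n × n distance matrix:

    # Classical MDS on a small distance matrix (feasible only for small n)
    d   <- dist(scale(USArrests))   # 50 x 50 distance matrix
    cfg <- cmdscale(d, k = 2)       # two-dimensional MDS configuration
    plot(cfg, xlab = "Dim 1", ylab = "Dim 2")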
Genome-wide association studies (GWAS) have been widely used for identifying common variants associated with complex diseases. Due to the small effect sizes of common variants, the power to detect individual risk variants is generally low. Complementary to SNP-level analysis, a variety of gene-based association tests have been proposed. However, the power of existing gene-based tests often depends on the underlying genetic model, and it is not known a priori which test is optimal. Here we propose the COMBined Association Test (COMBAT) to incorporate strengths from multiple existing gene-based tests, including VEGAS, GATES and simpleM. Compared to individual tests, COMBAT shows higher overall performance and robustness across a wide range of genetic models. The algorithm behind this method is described in Wang et al. (2017) <doi:10.1534/genetics.117.300257>.
Chaos theory has been hailed as a revolution of thought and has attracted ever-increasing attention from scientists in diverse disciplines. Chaotic systems are nonlinear deterministic dynamic systems that can exhibit erratic and apparently random motion. A relevant field within chaos theory and nonlinear time series analysis is the detection of chaotic behaviour from empirical time series data. One of the main features of chaos is the well-known sensitivity to initial values. Methods for testing the hypothesis of chaos try to quantify this sensitivity by estimating the Lyapunov exponents. The DChaos package provides useful tools and efficient algorithms that robustly test the hypothesis of chaos based on the Lyapunov exponent, in order to determine whether the data-generating process behind a time series behaves chaotically or not.
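As a worked illustration of the underlying quantity (not the package's estimator): for the logistic map x[t+1] = r * x[t] * (1 - x[t]), the Lyapunov exponent is the orbit average of log|r * (1 - 2 * x[t])|, and it is positive in the chaotic regime (e.g., r = 4, where it equals log 2).

    # Lyapunov exponent of the logistic map by averaging log|f'(x)| along an orbit
    r <- 4; x <- 0.2; n <- 10000
    log_deriv <- numeric(n)
    for (t in seq_len(n)) {
      log_deriv[t] <- log(abs(r * (1 - 2 * x)))
      x <- r * x * (1 - x)
    }
    mean(log_deriv)   # approximately log(2) = 0.693 > 0, indicating chaos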
Reliability analysis and maintenance optimization using hidden Markov models (HMM). HMMs are used to model the state of a system that is not directly observable; instead, certain indicators (signals) of the true situation are provided via a control system. A hidden Markov model can provide key information about system dependability, such as the reliability of the system and related measures. An estimation procedure is implemented based on the Baum-Welch algorithm. Classical structures such as K-out-of-N systems and shock models are illustrated. Finally, the maintenance of the system is considered in the HMM context, and two functions for new preventive maintenance strategies are provided. Maintenance efficiency is measured in terms of expected cost. Methods are described in Gamiz, Limnios, and Segovia-Garcia (2023) <doi:10.1016/j.ejor.2022.05.006>.
This package provides a computational method that infers copy number variations (CNV) in cancer scRNA-seq data and reconstructs the tumor phylogeny. It integrates signals from gene expression, allelic ratio, and population haplotype structures to accurately infer allele-specific CNVs in single cells and reconstruct their lineage relationship. It does not require tumor/normal-paired DNA or genotype data, but operates solely on the donor scRNA-seq data (for example, 10x Cell Ranger output). It can be used to:
detect allele-specific copy number variations from single cells
differentiate tumor versus normal cells in the tumor microenvironment
infer the clonal architecture and evolutionary history of profiled tumors
For details on the method, see Gao et al. (2022) in Nature Biotechnology.