Training of neural networks for classification and regression tasks using mini-batch gradient descent. Special features include a function for training autoencoders, which can be used to detect anomalies, and some related plotting functions. Multiple activation functions are supported, including tanh, relu, step and ramp. For the use of the step and ramp activation functions in detecting anomalies using autoencoders, see Hawkins et al. (2002) <doi:10.1007/3-540-46145-0_17>. Furthermore, several loss functions are supported, including robust ones such as Huber and pseudo-Huber loss, as well as L1 and L2 regularization. The possible options for optimization algorithms are RMSprop, Adam and SGD with momentum. The package contains a vectorized C++ implementation that facilitates fast training through mini-batch learning.
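As a rough illustration of the autoencoder-based anomaly detection workflow described above, the sketch below trains an autoencoder on scaled numeric data and flags the observations with the largest reconstruction errors. The package is not named here, so the function autoencoder(), its arguments, and the 95th-percentile threshold are assumptions made purely for illustration and may not match the documented interface.

    # Hypothetical sketch: function and argument names are assumed, not documented.
    X <- as.matrix(scale(iris[, 1:4]))                # numeric features only
    ae <- autoencoder(X, hidden.layers = c(8, 2, 8),  # bottleneck of size 2
                      loss.type = "pseudo-huber",     # robust loss, as described above
                      optim.type = "adam")            # Adam optimizer
    rec <- predict(ae, X)                             # reconstructed inputs (assumed accessor)
    err <- rowSums((X - rec)^2)                       # per-observation reconstruction error
    anomalies <- which(err > quantile(err, 0.95))     # flag the top 5% as anomalies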
This package provides a toolkit for Flux Balance Analysis and related metabolic modeling techniques. Functions are provided for: parsing models in tabular format, converting parsed metabolic models to input formats for common linear programming solvers, and evaluating and applying gene-protein-reaction mappings. In addition, there are wrappers to parse a model, select a solver, find the metabolic fluxes, and return the results applied to the original model. Compared to other packages in this field, this package puts a much heavier focus on providing reusable components that can be used in the design and implementation of new techniques, in particular those that involve large parameter sweeps. For a background on the theory, see "What is Flux Balance Analysis?" <doi:10.1038/nbt.1614>.
This package provides a novel searching scheme for the tuning parameter in high-dimensional penalized regression. We propose a new estimate of the regularization parameter based on an estimated lower bound of the proportion of false null hypotheses (Meinshausen and Rice (2006) <doi:10.1214/009053605000000741>). The bound is estimated by applying the empirical null distribution of the higher criticism statistic, a second-level significance test, which is constructed from dependent p-values obtained by a multi-split regression and aggregation method (Jeng, Zhang and Tzeng (2019) <doi:10.1080/01621459.2018.1518236>). An estimate of the tuning parameter in penalized regression is then chosen to correspond to this lower bound on the proportion of false null hypotheses. Several penalized regression methods are provided within the multi-split algorithm.
Mixed models for repeated measures (MMRM) are a popular choice for analyzing longitudinal continuous outcomes in randomized clinical trials and beyond; see Cnaan, Laird and Slasor (1997) <doi:10.1002/(SICI)1097-0258(19971030)16:20%3C2349::AID-SIM667%3E3.0.CO;2-E> for a tutorial and Mallinckrodt, Lane, Schnell, Peng and Mancuso (2008) <doi:10.1177/009286150804200402> for a review. This package implements MMRM based on the marginal linear model without random effects using Template Model Builder ('TMB'), which enables fast and robust model fitting. Users can specify a variety of covariance matrices, weight observations, fit models with restricted or standard maximum likelihood inference, perform hypothesis testing with Satterthwaite or Kenward-Roger adjustment, and extract least-squares means estimates using 'emmeans'.
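A minimal sketch of a typical analysis with this package, using the example data set that ships with it (the column names below follow the package's own examples) and an unstructured covariance for visits within subject:

    library(mmrm)
    library(emmeans)

    fit <- mmrm(FEV1 ~ ARMCD * AVISIT + us(AVISIT | USUBJID),
                data = fev_data, reml = TRUE)
    summary(fit)                    # Satterthwaite-adjusted tests by default
    emmeans(fit, ~ ARMCD | AVISIT)  # least-squares means by arm and visit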
Parameter estimation and classification for Gaussian Mixture Models (GMMs) in the presence of missing data. This package complements existing implementations by allowing for both missing elements in the input vectors and full (as opposed to strictly diagonal) covariance matrices. Estimation is performed using an expectation conditional maximization algorithm that accounts for missingness of both the cluster assignments and the vector components. The output includes the marginal cluster membership probabilities; the mean and covariance of each cluster; the posterior probabilities of cluster membership; and a completed version of the input data, with missing values imputed to their posterior expectations. For additional details, please see McCaw ZR, Julienne H, Aschard H. "Fitting Gaussian mixture models on incomplete data." <doi:10.1186/s12859-022-04740-9>.
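A sketch of the intended workflow: simulate incomplete data from a two-component GMM, then fit the model. The function names rGMM() and FitGMM() and their arguments are assumptions and may not match the package's documented interface.

    library(MGMM)
    set.seed(1)
    # Simulate 500 bivariate observations from 2 clusters with ~10% missing entries
    # (the 'miss' argument is assumed).
    X <- rGMM(n = 500, d = 2, k = 2, miss = 0.1,
              means = list(c(-2, -2), c(2, 2)))
    fit <- FitGMM(X, k = 2)  # ECM estimation that accounts for the missing entries
    fit                      # printed means, covariances, and membership probabilities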
This package contains functions for the construction and visualization of various families of proximity catch digraphs (PCDs; see Ceyhan (2005) ISBN:978-3-639-19063-2), and for computing the graph invariants used to test patterns of segregation and association against complete spatial randomness (CSR) or uniformity in one-, two-, and three-dimensional cases. The package also has tools for generating points from these spatial patterns. The graph invariants used in testing spatial point data are the domination number (Ceyhan (2011) <doi:10.1080/03610921003597211>) and arc density (Ceyhan et al. (2006) <doi:10.1016/j.csda.2005.03.002>; Ceyhan et al. (2007) <doi:10.1002/cjs.5550350106>). The PCD families considered are Arc-Slice PCDs, Proportional-Edge PCDs, and Central Similarity PCDs.
Most price indexes are made with a two-step procedure, where period-over-period elementary indexes are first calculated for a collection of elementary aggregates at each point in time, and then aggregated according to a price index aggregation structure. These indexes can then be chained together to form a time series that gives the evolution of prices with respect to a fixed base period. This package contains a collection of functions that revolve around this workflow, making it easy to build standard price indexes and implement the methods described by Balk (2008, <doi:10.1017/CBO9780511720758>), von der Lippe (2007, <doi:10.3726/978-3-653-01120-3>), and the CPI manual (2020, <doi:10.5089/9781484354841.069>) for bilateral price indexes.
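To make the two-step workflow concrete, the sketch below computes elementary indexes from price relatives and then aggregates them with fixed weights. The package is not named here, so the functions elemental_index(), aggregation_structure(), aggregate(), and chain() are hypothetical names used only to illustrate the workflow described above.

    # Hypothetical sketch of the two-step index workflow; function names are assumed.
    relatives <- data.frame(
      period = rep(c("2024-01", "2024-02"), each = 4),
      ea     = rep(c("food", "food", "fuel", "fuel"), times = 2),
      rel    = c(1, 1, 1, 1, 1.02, 0.98, 1.10, 1.05)  # period-over-period price relatives
    )
    elem  <- elemental_index(relatives$rel,
                             period = relatives$period, ea = relatives$ea)  # step 1
    pias  <- aggregation_structure(list(c("all", "all"), c("food", "fuel")),
                                   weights = c(0.7, 0.3))                   # index tree
    index <- aggregate(elem, pias)                                          # step 2
    chain(index)  # chain period-over-period indexes into a fixed-base series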
Data science methods used in wind energy applications. Current functionalities include creating a multi-dimensional power curve model, performing power curve function comparison, covariate matching, and energy decomposition. Relevant works for the developed functions are: funGP() - Prakash et al. (2022) <doi:10.1080/00401706.2021.1905073>, AMK() - Lee et al. (2015) <doi:10.1080/01621459.2014.977385>, tempGP() - Prakash et al. (2022) <doi:10.1080/00401706.2022.2069158>, ComparePCurve() - Ding et al. (2021) <doi:10.1016/j.renene.2021.02.136>, deltaEnergy() - Latiffianti et al. (2022) <doi:10.1002/we.2722>, syncSize() - Latiffianti et al. (2022) <doi:10.1002/we.2722>, imptPower() - Latiffianti et al. (2022) <doi:10.1002/we.2722>, All other functions - Ding (2019, ISBN:9780429956508).
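As a short illustration of the power-curve functions listed above, the sketch below fits a kernel power curve with AMK() and predicts power for held-out data. The data frame and the argument names are assumptions; only the function name comes from the list above.

    library(DSWE)
    # 'scada' is a hypothetical turbine data frame; column names are assumed.
    train_x <- as.matrix(scada[1:5000, c("wind_speed", "air_density")])
    train_y <- scada$power[1:5000]
    test_x  <- as.matrix(scada[5001:6000, c("wind_speed", "air_density")])
    pred <- AMK(trainX = train_x, trainY = train_y, testX = test_x)  # predicted power at test_x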
This package provides deterministic forecasting for weekly, monthly, quarterly, and yearly time series using the Generalized Adaptive Capped Estimator. The method includes preprocessing for missing and extreme values, extraction of multiple growth components (including long-term, short-term, rolling, and drift-based signals), volatility-aware asymmetric capping, optional seasonal adjustment via damped and normalized seasonal factors, and a recursive forecast formulation with moderated growth. The package includes a user-facing forecasting interface and a plotting helper for visualization. Related forecasting background is discussed in Hyndman and Athanasopoulos (2021) <https://otexts.com/fpp3/> and Hyndman and Khandakar (2008) <doi:10.18637/jss.v027.i03>. The method extends classical extrapolative forecasting approaches and is suited for operational and business planning contexts where stability and interpretability are important.
Analysis of dyadic network and relational data using additive and multiplicative effects (AME) models. The basic model includes regression terms, the covariance structure of the social relations model (Warner, Kenny and Stoto (1979) <DOI:10.1037/0022-3514.37.10.1742>, Wong (1982) <DOI:10.2307/2287296>), and multiplicative factor models (Hoff (2009) <DOI:10.1007/s10588-008-9040-4>). Several different link functions accommodate different relational data structures, including binary/network data, normal relational data, zero-inflated positive outcomes using a tobit model, ordinal relational data and data from fixed-rank nomination schemes. Several of these link functions are discussed in Hoff, Fosdick, Volfovsky and Stovel (2013) <DOI:10.1017/nws.2013.17>. Development of this software was supported in part by NIH grant R01HD067509.
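A minimal sketch of an AME fit to a binary network, assuming the model-fitting function is ame() with a 'model' argument selecting the link and 'R' the rank of the multiplicative effects:

    library(amen)
    set.seed(1)
    n <- 30
    Y <- matrix(rbinom(n * n, 1, 0.2), n, n)  # toy binary relational matrix
    diag(Y) <- NA                             # self-ties are undefined
    fit <- ame(Y, model = "bin", R = 2)       # binary link, rank-2 multiplicative effects
    summary(fit)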
This package provides a set of functions to perform distribution-free Bayesian analyses. Included are Bayesian analogues to the frequentist Mann-Whitney U test, the Wilcoxon Signed-Ranks test, Kendall's Tau Rank Correlation Coefficient, Goodman and Kruskal's Gamma, McNemar's Test, the binomial test, the sign test, the median test, as well as distribution-free methods for testing contrasts among conditions and for computing Bayes factors for hypotheses. The package also includes procedures to estimate the power of distribution-free Bayesian tests based on data simulations using various probability models for the data. These functions provide data analysts with a set of Bayesian procedures that avoids parametric assumptions about measurement error and is robust to the problem of extreme outlier scores.
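For example, a Bayesian analogue of the Mann-Whitney U test might be run as below; the function name dfba_mann_whitney() and its E/C argument names are assumptions and may not match the package's documented interface.

    library(DFBA)
    set.seed(1)
    treatment <- rlnorm(25, meanlog = 1.2)  # toy scores, experimental condition
    control   <- rlnorm(25, meanlog = 1.0)  # toy scores, control condition
    out <- dfba_mann_whitney(E = treatment, C = control)  # distribution-free Bayesian test
    out  # posterior summary and Bayes factor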
This package provides a comprehensive framework for visualizing associations and interaction structures in matrix-formatted data using Generalized Association Plots (GAP). The package implements multiple proximity computation methods (e.g., correlation, distance metrics), ordering techniques including hierarchical clustering (HCT) and Rank-2-Ellipse (R2E) seriation, and optional flipping strategies to enhance visual symmetry. It supports a variety of covariate-based color annotations, allows flexible customization of layout and output, and is suitable for analyzing multivariate data across domains such as social sciences, genomics, and medical research. The method is based on Generalized Association Plots introduced by Chen (2002) <https://www3.stat.sinica.edu.tw/statistica/J12N1/J12N11/J12N11.html> and further extended by Wu, Tien, and Chen (2010) <doi:10.1016/j.csda.2008.09.029>.
Nested-loop cross-validation for classification, used to estimate the misclassification error rate. The package supports several methodologies for feature selection (random forest, Student's t-test, and limma) and provides an interface to the following classification methods in the MLInterfaces package: linear and quadratic discriminant analysis, random forest, bagging, prediction analysis for microarrays, generalized linear models, and support vector machines (svm and ksvm). Visualizations to assess the quality of the classifier are included: a plot of the ranks of the features, a scores plot for a specific classification algorithm and number of features, the misclassification rate for the different numbers of features and classification algorithms tested, and a ROC plot. For further details about the methodology, please check Markus Ruschhaupt, Wolfgang Huber, Annemarie Poustka, and Ulrich Mansmann (2004) <doi:10.2202/1544-6115.1078>.
This package provides a collection of functions to perform core tasks within Energy Trading and Risk Management (ETRM). Calculation of maximum smoothness forward price curves for electricity and natural gas contracts with flow delivery, as presented in F. E. Benth, S. Koekebakker, and F. Ollmar (2007) <doi:10.3905/jod.2007.694791> and F. E. Benth, J. S. Benth, and S. Koekebakker (2008) <doi:10.1142/6811>. Portfolio insurance trading strategies for price risk management in the forward market, see F. Black (1976) <doi:10.1016/0304-405X(76)90024-6>, T. Bjork (2009) <https://EconPapers.repec.org/RePEc:oxp:obooks:9780199574742>, F. Black and R. W. Jones (1987) <doi:10.3905/jpm.1987.409131> and H. E. Leland (1980) <http://www.jstor.org/stable/2327419>.
Analysis and visualization of similarities between epilepsy ontologies based on text mining results by comparing ranked lists of co-occurring drug terms in the BioASQ corpus. The ranked result lists of neurological drug terms co-occurring with terms from the epilepsy ontologies EpSO, ESSO, EPILONT, EPISEM and FENICS undergo further analysis. The source data to create the ranked lists of drug names is produced using the text mining workflows described in Mueller, Bernd and Hagelstein, Alexandra (2016) <doi:10.4126/FRL01-006408558>, Mueller, Bernd et al. (2017) <doi:10.1007/978-3-319-58694-6_22>, Mueller, Bernd and Rebholz-Schuhmann, Dietrich (2020) <doi:10.1007/978-3-030-43887-6_52>, and Mueller, Bernd et al. (2022) <doi:10.1186/s13326-021-00258-w>.
This package provides a set of objects and functions for Bayes Linear emulation and history matching. Core functionality includes automated training of emulators to data, diagnostic functions to ensure suitability, and a variety of proposal methods for generating waves of points. For details on the mathematical background, there are many papers available on the topic (see the references attached to function help files or the references below); for details of the functions in this package, consult the manual or help files. Iskauskas, A, et al. (2024) <doi:10.18637/jss.v109.i10>. Bower, R.G., Goldstein, M., and Vernon, I. (2010) <doi:10.1214/10-BA524>. Craig, P.S., Goldstein, M., Seheult, A.H., and Smith, J.A. (1997) <doi:10.1007/978-1-4612-2290-3_2>.
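A compact sketch of the emulation and history-matching loop; the function names emulator_from_data() and generate_new_runs(), the target format, and the argument layout are assumptions and may not match the package's documented interface.

    library(hmer)
    # 'runs' is a hypothetical design data frame with inputs x1, x2 and a model output y.
    ranges  <- list(x1 = c(0, 1), x2 = c(0, 1))
    ems     <- emulator_from_data(runs, output_names = "y", ranges = ranges)
    targets <- list(y = list(val = 0.5, sigma = 0.05))              # observation to match
    wave2   <- generate_new_runs(ems, n_points = 100, z = targets)  # next wave of proposals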
MSiP is a computational approach to predict protein-protein interactions from large-scale affinity purification mass spectrometry (AP-MS) data. This approach includes both spoke and matrix models for interpreting AP-MS data in a network context. The "spoke" model considers only bait-prey interactions, whereas the "matrix" model assumes that each of the identified proteins (baits and prey) in a given AP-MS experiment interacts with each of the others. The spoke model has a high false-negative rate, whereas the matrix model has a high false-positive rate. Although both statistical models have merit, a combination of the two has been shown to increase the performance of machine learning classifiers in discriminating between true and false positive interactions.
Quasi likelihood-based methods for estimating linear and log-linear Poisson Network Autoregression models with p lags and covariates. Tools for testing the linearity versus several non-linear alternatives. Tools for simulation of multivariate count distributions, from linear and non-linear PNAR models, by using a specific copula construction. References include: Armillotta, M. and K. Fokianos (2023). "Nonlinear network autoregression". Annals of Statistics, 51(6): 2526--2552. <doi:10.1214/23-AOS2345>. Armillotta, M. and K. Fokianos (2024). "Count network autoregression". Journal of Time Series Analysis, 45(4): 584--612. <doi:10.1111/jtsa.12728>. Armillotta, M., Tsagris, M. and Fokianos, K. (2024). "Inference for Network Count Time Series with the R Package PNAR". The R Journal, 15/4: 255--269. <doi:10.32614/RJ-2023-094>.
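A brief sketch of fitting a linear PNAR model with one lag; the function name lin_estimnarpq(), its y/W/p arguments, and the crime example data follow the accompanying R Journal paper as recalled and should be treated as assumptions.

    library(PNAR)
    data(crime)     # counts per node over time (dataset name as recalled)
    data(crime_W)   # row-normalized adjacency matrix of the network
    fit <- lin_estimnarpq(y = crime, W = crime_W, p = 1)  # quasi-likelihood fit, 1 lag
    summary(fit)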
Utilizes the Reliability-Adjusted Product Indicator (RAPI) method to estimate effects among latent variables, thus allowing for more precise definition and analysis of mediation and moderation models. Our simulation studies reveal that while silp may exhibit instability with smaller sample sizes and lower reliability scores (e.g., N = 100, omega = 0.7), implementing a nearest positive definite matrix correction and bootstrap confidence interval estimation can significantly ameliorate this volatility. When these adjustments are applied, silp achieves estimates comparable in quality to those derived from LMS. In conclusion, the silp package is a valuable tool for researchers seeking to explore complex relational structures between variables without resorting to commercial software. Cheung et al. (2021) <doi:10.1007/s10869-020-09717-0>; Hsiao et al. (2018) <doi:10.1177/0013164416679877>.
Fit Bayesian generalized (non-)linear multivariate multilevel models using Stan for full Bayesian inference. A wide range of distributions and link functions are supported, allowing users to fit -- among others -- linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, and even self-defined mixture models all in a multilevel context. Further modeling options include non-linear and smooth terms, auto-correlation structures, censored data, meta-analytic standard errors, and quite a few more. In addition, all parameters of the response distribution can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs. Model fit can easily be assessed and compared with posterior predictive checks and leave-one-out cross-validation.
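For instance, a multilevel Poisson model for the epilepsy data set bundled with the package can be fit and checked as follows:

    library(brms)
    fit <- brm(count ~ zAge + zBase * Trt + (1 | patient),
               data = epilepsy, family = poisson())
    summary(fit)
    pp_check(fit)  # posterior predictive check
    loo(fit)       # approximate leave-one-out cross-validation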
Most analyses of Affymetrix GeneChip data (including traditional 3' arrays, exon arrays, and the Human Transcriptome Array 2.0) are based on point estimates of expression levels and ignore the uncertainty of such estimates. By propagating uncertainty to downstream analyses we can improve results from microarray analyses. For the first time, the puma package makes a suite of uncertainty propagation methods available to a general audience. In addition to calculating gene expression from Affymetrix 3' arrays, puma also provides methods to process exon arrays and produces gene and isoform expression estimates for alternative splicing studies. puma also offers improvements in terms of scope and speed of execution over previously available uncertainty propagation methods. Included are summarisation, differential expression detection, clustering and PCA methods, together with useful plotting functions.
Classification using Richard A. Harshman's Parallel Factor Analysis-1 (Parafac) model or Parallel Factor Analysis-2 (Parafac2) model fit to a three-way or four-way data array. See Harshman and Lundy (1994): <doi:10.1016/0167-9473(94)90132-5>. Uses component weights from one mode of a Parafac or Parafac2 model as features to tune parameters for one or more classification methods via a k-fold cross-validation procedure. Allows for constraints on different tensor modes. Supports penalized logistic regression, support vector machine, random forest, feed-forward neural network, regularized discriminant analysis, and gradient boosting machine. Supports binary and multiclass classification. Predicts class labels or class probabilities and calculates multiple classification performance measures. Implements parallel computing via the 'parallel', 'doParallel', and 'doRNG' packages.
Estimate a total causal effect from observational data under linearity and causal sufficiency. The observational data are assumed to be generated from a linear structural equation model (SEM) with independent and additive noise. The underlying causal DAG associated with the SEM is required to be known up to a maximally oriented partially directed acyclic graph (MPDAG), which is a general class of graphs consisting of both directed and undirected edges, including CPDAGs (i.e., essential graphs) and DAGs. Such graphs are usually obtained with structure learning algorithms with added background knowledge. The program is able to estimate every identified effect, including single and multiple treatment variables. Moreover, the resulting estimate has the minimal asymptotic covariance (and hence shortest confidence intervals) among all estimators that are based on the sample covariance.
Allows the user to estimate dynamic model averaging, dynamic model selection, and median probability models. The original methods are implemented, as well as selected further modifications of these methods. In particular, the user can choose between recursive moment estimation and an exponentially weighted moving average for variance updating. Inclusion probabilities can be modified using 'Google Trends' data. The code is written in a way that minimises the computational burden (which is quite an obstacle for dynamic model averaging if many variables are used). For example, this package allows for parallel computations and Occam's window approach. The package is designed to be especially useful in economics and finance. Main reference: Raftery, A.E., Karny, M., Ettler, P. (2010) <doi:10.1198/TECH.2009.08104>.
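A minimal sketch of a dynamic model averaging call; the function name fDMA() and its alpha/lambda forgetting-factor and initvar arguments are assumptions and may not match the package's documented interface.

    library(fDMA)
    set.seed(1)
    TT <- 200
    x  <- cbind(x1 = rnorm(TT), x2 = rnorm(TT), x3 = rnorm(TT))  # toy predictors
    y  <- 0.5 * x[, "x1"] + rnorm(TT)                            # toy dependent variable
    dma <- fDMA(y = y, x = x, alpha = 0.99, lambda = 0.99, initvar = 1)
    dma  # posterior inclusion probabilities, forecasts, and model diagnostics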