This package provides methods and tools for forecasting univariate time series using the NARFIMA (Neural AutoRegressive Fractionally Integrated Moving Average) model. It combines neural networks with fractional differencing to capture both nonlinear patterns and long-term dependencies. The NARFIMA model supports seasonal adjustment, Box-Cox transformations, optional exogenous variables, and the computation of prediction intervals. In addition to the NARFIMA model, this package provides alternative forecasting models including NARIMA (Neural ARIMA), NBSTS (Neural Bayesian Structural Time Series), and NNaive (Neural Naive) for performance comparison across different modeling approaches. The methods are based on algorithms introduced by Chakraborty et al. (2025) <doi:10.48550/arXiv.2509.06697>.
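For orientation, the fractional differencing that ARFIMA-type models build on is usually written with the binomial expansion of the backshift operator B (standard notation, not taken from the package; the neural specification itself is given in Chakraborty et al. (2025)):

    \[
      (1 - B)^d x_t \;=\; \sum_{k=0}^{\infty} \binom{d}{k} (-B)^k x_t,
      \qquad
      \binom{d}{k} \;=\; \frac{\Gamma(d+1)}{\Gamma(k+1)\,\Gamma(d-k+1)},
    \]

so a non-integer d (long memory for 0 < d < 0.5) produces the slowly decaying autocorrelations that the neural component then combines with nonlinear structure.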
Clustering is unsupervised and exploratory in nature. Yet, it can be performed through penalized regression with grouping pursuit. In this package, we provide two algorithms for fitting the penalized regression-based clustering (PRclust) with non-convex grouping penalties, such as the group truncated lasso, MCP, and SCAD. One algorithm is based on a quadratic penalty and the difference convex method. The other algorithm is based on the difference convex method and ADMM, called DC-ADMM, which is more efficient. Generalized cross-validation and a stability-based method are provided to select the tuning parameters. The Rand index, adjusted Rand index, and Jaccard index are provided to estimate the agreement between estimated cluster memberships and the truth.
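Schematically, the grouping-pursuit formulation behind PRclust is usually written as (a sketch of the general objective, not the package's exact parameterization):

    \[
      \min_{\mu_1, \dots, \mu_n} \; \frac{1}{2} \sum_{i=1}^{n} \lVert x_i - \mu_i \rVert_2^2
      \;+\; \lambda \sum_{i < j} p\!\left( \lVert \mu_i - \mu_j \rVert_2 \right),
    \]

where p(.) is one of the non-convex grouping penalties above (truncated lasso, MCP, SCAD); observations whose fitted centroids mu_i are fused to a common value form one cluster.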
An implementation of the sample size computation method for network models proposed by Constantin et al. (2023) <doi:10.1037/met0000555>. The implementation takes the form of a three-step recursive algorithm designed to find an optimal sample size given a model specification and a performance measure of interest. It starts with a Monte Carlo simulation step for computing the performance measure and a statistic at various sample sizes selected from an initial sample size range. It continues with a monotone curve-fitting step for interpolating the statistic across the entire sample size range. The final step employs stratified bootstrapping to quantify the uncertainty around the fitted curve.
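A minimal sketch of the three-step idea in base R, using a made-up performance measure (power to detect a correlation of 0.3) and isotonic regression as a stand-in for the monotone curve fit; this illustrates the workflow only and is not the package's API:

    perf_at_n <- function(n, reps = 200) {
      mean(replicate(reps, {                      # Monte Carlo at sample size n
        x <- rnorm(n); y <- 0.3 * x + rnorm(n)
        cor.test(x, y)$p.value < 0.05
      }))
    }
    ns   <- seq(50, 400, by = 50)                 # initial sample size range
    perf <- sapply(ns, perf_at_n)                 # step 1: simulate the statistic
    fit  <- isoreg(ns, perf)                      # step 2: monotone curve fit
    boot <- replicate(100, {                      # step 3: bootstrap the curve
      i <- sample(seq_along(ns), replace = TRUE)
      f <- isoreg(ns[i], perf[i])
      approx(f$x, f$yf, xout = ns, rule = 2, ties = mean)$y
    })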
TEMPoral TEnsor Decomposition (TEMPTED) is a dimension reduction method for multivariate longitudinal data with varying temporal sampling. It formats the data into a temporal tensor and decomposes it into a summation of low-dimensional components, each consisting of a subject loading vector, a feature loading vector, and a continuous temporal loading function. These loadings provide a low-dimensional representation of subjects or samples and can be used to identify features associated with clusters of subjects or samples. TEMPTED provides the flexibility of allowing subjects to have different temporal sampling, so time points do not need to be binned, and missing time points do not need to be imputed.
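Schematically, the decomposition described above can be written as (notation chosen here purely for illustration):

    \[
      X_i(f, t) \;\approx\; \sum_{r=1}^{R} \lambda_r \, a_{ir} \, b_{fr} \, \psi_r(t),
    \]

where, for component r, a_{.r} is the subject loading vector, b_{.r} the feature loading vector, and psi_r(.) the continuous temporal loading function, evaluated at whatever time points subject i was actually sampled.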
This package is a versatile tool for predicting time series data using Long Short-Term Memory (LSTM) models. It is specifically designed to handle time series with an exogenous variable, allowing users to indicate whether data were available for a particular period. The package encompasses various functionalities, including hyperparameter tuning, custom loss function support, model evaluation, and one-step-ahead forecasting. With an emphasis on ease of use and flexibility, it empowers users to explore, evaluate, and deploy LSTM models for accurate time series predictions and forecasting in diverse applications. More details can be found in Garai and Paul (2023) <doi:10.1016/j.iswa.2023.200202>.
R7RS-small Scheme library for reading and writing the RSV (Rows of String Values) data format, a very simple binary format for storing tables of strings. It is an alternative to formats such as CSV (Comma-Separated Values) and TSV (Tab-Separated Values). Its main benefit is that the strings are represented as Unicode encoded as UTF-8, and the value and row separators are byte values that never occur in UTF-8, so the strings do not need any error-prone escaping and can therefore be written and read verbatim.
The format is specified at https://github.com/Stenway/RSV-Specification and demonstrated at https://www.youtube.com/watch?v=tb_70o6ohMA.
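Independently of the library, the format is simple enough to sketch. The writer below is an illustration in R; the terminator bytes (0xFF after each value, 0xFD after each row) follow my reading of the linked specification and should be verified against it:

    write_rsv <- function(rows, path) {
      con <- file(path, "wb")
      on.exit(close(con))
      for (row in rows) {
        for (value in row) {
          writeBin(charToRaw(enc2utf8(value)), con)  # the value's UTF-8 bytes, verbatim
          writeBin(as.raw(0xFF), con)                # value terminator, never valid in UTF-8
        }
        writeBin(as.raw(0xFD), con)                  # row terminator, never valid in UTF-8
      }
    }
    write_rsv(list(c("id", "name"), c("1", "Ada")), tempfile(fileext = ".rsv"))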
This package provides tools to estimate tail area-based false discovery rates as well as local false discovery rates for a variety of null models (p-values, z-scores, correlation coefficients, t-scores). The proportion of null values and the parameters of the null distribution are adaptively estimated from the data. In addition, the package contains functions for non-parametric density estimation (Grenander estimator), for monotone regression (isotonic regression and antitonic regression with weights), for computing the greatest convex minorant (GCM) and the least concave majorant (LCM), for the half-normal and correlation distributions, and for computing empirical higher criticism (HC) scores and the corresponding decision threshold.
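A hedged usage sketch with simulated p-values; the component names read from the result (qval, lfdr, param) follow the package documentation as I recall it and can be confirmed with str(res):

    library(fdrtool)
    p   <- c(runif(900), rbeta(100, 1, 50))      # made-up data: mostly null, some signal
    res <- fdrtool(p, statistic = "pvalue")
    head(res$qval)                               # tail area-based false discovery rates
    head(res$lfdr)                               # local false discovery rates
    res$param                                    # estimated null proportion and null parameters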
The empirical cumulative average deviation function introduced by the author is used to develop both the Ad- and Ud-plots. The Ad-plot can reveal symmetry, skewness, and outliers of the data distribution, including anomalies. The Ud-plot, created by slightly modifying the Ad-plot, is exceptional in assessing normality, outperforming the normal QQ-plot, the normal PP-plot, and their derivations. The d-value, which quantifies the degree of proximity between the Ud-plot and the graph of the estimated normal density function, helps guide decisions on confirming normality. A full description of this methodology can be found in Wijesuriya (2025) <doi:10.1080/03610926.2024.2440583>.
This package provides a tool that implements the clustering algorithms from mothur (Schloss PD et al. (2009) <doi:10.1128/AEM.01541-09>). clustur makes use of the cluster() and make.shared() commands from mothur. Our cluster() function has five different algorithms implemented: 'OptiClust', 'furthest', 'nearest', 'average', and 'weighted'. OptiClust is an optimized clustering method for Operational Taxonomic Units; to learn more, see Westcott SL, Schloss PD (2017) <doi:10.1128/mspheredirect.00073-17>. The make.shared() command is always applied at the end of the clustering command. This functionality allows clustering and abundance data to be generated efficiently.
This package contains implementations of the integrative Cox model with uncertain event times proposed by Wang et al. (2020) <doi:10.1214/19-AOAS1287>, the regularized Cox cure rate model with uncertain event status proposed by Wang et al. (2023) <doi:10.1007/s12561-023-09374-w>, and other survival analysis routines, including the Cox cure rate model proposed by Kuk and Chen (1992) <doi:10.1093/biomet/79.3.531> fitted via the EM algorithm proposed by Sy and Taylor (2000) <doi:10.1111/j.0006-341X.2000.00227.x>, and the regularized Cox cure rate model with elastic-net penalty following Masud et al. (2018) <doi:10.1177/0962280216677748>.
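For orientation, the mixture formulation underlying Cox cure rate models is commonly written as (standard form, not the package's exact notation):

    \[
      S_{\mathrm{pop}}(t \mid x, z) \;=\; \pi(z) \;+\; \bigl(1 - \pi(z)\bigr)\, S_u(t \mid x),
    \]

where pi(z) is the cure probability (typically given a logistic link) and S_u(t | x) is the survival function of the uncured, modeled with a Cox proportional hazards structure; the EM algorithm of Sy and Taylor (2000) treats the unknown cure status as missing data.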
This package implements multiple variants of the Information Bottleneck ('IB') method for clustering datasets containing continuous, categorical (nominal/ordinal) and mixed-type variables. The package provides deterministic, agglomerative, generalized, and standard IB clustering algorithms that preserve relevant information while forming interpretable clusters. The Deterministic Information Bottleneck is described in Costa et al. (2024) <doi:10.48550/arXiv.2407.03389>. The standard IB method originates from Tishby et al. (2000) <doi:10.48550/arXiv.physics/0004057>, the agglomerative variant from Slonim and Tishby (1999) <https://papers.nips.cc/paper/1651-agglomerative-information-bottleneck>, and the generalized IB from Strouse and Schwab (2017) <doi:10.1162/NECO_a_00961>.
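For reference, the standard IB objective of Tishby et al. (2000) is (standard form; the variants provided by the package modify it as described above):

    \[
      \min_{p(t \mid x)} \; I(X; T) \;-\; \beta \, I(T; Y),
    \]

where T is the (soft) cluster assignment, I(.;.) denotes mutual information, and beta trades off compression of X against preservation of information about the relevance variable Y; the deterministic variant replaces I(X;T) with the entropy H(T), which yields hard cluster assignments.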
This package contains functions that allow Bayesian meta-analysis (1) with binomial data, counts (y) and total counts (n), or (2) with user-supplied point estimates and associated variances. Case (1) provides an analysis based on the logit transformation of the sample proportion. This methodology is also appropriate for combining data from sample surveys and related sources. The functions can calculate the corresponding similarity matrix. More details can be found in Cahoy and Sedransk (2023), Cahoy and Sedransk (2022) <doi:10.1007/s42519-018-0027-2>, Evans and Sedransk (2001) <doi:10.1093/biomet/88.3.643>, and Malec and Sedransk (1992) <doi:10.1093/biomet/79.3.593>.
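For case (1), the logit transformation and its usual large-sample variance are (standard results, stated here only for orientation):

    \[
      \hat{\theta} \;=\; \log\!\frac{y}{n - y},
      \qquad
      \widehat{\operatorname{Var}}(\hat{\theta}) \;\approx\; \frac{1}{y} + \frac{1}{n - y},
    \]

which places the binomial counts on the same estimate-plus-variance scale as the user-supplied inputs of case (2).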
This package provides a facility to generate various classes of fractional designs for order-of-addition experiments, namely fractional order-of-addition orthogonal arrays; see Voelkel, Joseph G. (2019), "The design of order-of-addition experiments," Journal of Quality Technology, 51(3), 230-241, <doi:10.1080/00224065.2019.1569958>. It also provides a facility to construct component orthogonal arrays; see Jian-Feng Yang, Fasheng Sun and Hongquan Xu (2020), "A Component Position Model, Analysis and Design for Order-of-Addition Experiments," Technometrics, <doi:10.1080/00401706.2020.1764394>. Generation of fractional designs for order-of-addition mixture experiments is supported, as is the analysis of data from such experiments.
These guidelines are meant to provide pragmatic yet rigorous help to drug developers and decision makers, since they are shaped by three fundamental ingredients: the clinically determined margin of detriment on OS that is unacceptably high (delta null); the benefit on OS that is plausible given the mechanism of action of the novel intervention (delta alt); and the quantity of information (i.e., survival events) it is feasible to accrue given the clinical and drug development setting. The proposed guidelines facilitate transparent discussions between stakeholders, focusing on the risks of erroneous decisions and what might be an acceptable trade-off between power and the false positive error rate.
Fit a logistic regression model using Firth's bias reduction method, equivalent to penalization of the log-likelihood by the Jeffreys prior. Confidence intervals for regression coefficients can be computed by penalized profile likelihood. Firth's method was proposed as an ideal solution to the problem of separation in logistic regression; see Heinze and Schemper (2002) <doi:10.1002/sim.1047>. If needed, the bias reduction can be turned off such that ordinary maximum likelihood logistic regression is obtained. Two new modifications of Firth's method, FLIC and FLAC, lead to unbiased predictions and are now available in the package as well; see Puhr et al. (2017) <doi:10.1002/sim.7273>.
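A short usage sketch on a made-up data frame; the formula-plus-data call is the package's basic interface, while the exact argument for switching off the bias reduction should be checked against ?logistf:

    library(logistf)
    d   <- data.frame(y = rbinom(50, 1, 0.3), x1 = rnorm(50), x2 = rnorm(50))
    fit <- logistf(y ~ x1 + x2, data = d)   # Firth-penalized fit
    summary(fit)                            # coefficients with penalized profile likelihood CIs
    # Ordinary maximum likelihood is recovered by turning off the bias reduction
    # (argument name from memory: firth = FALSE; see ?logistf).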
Classification of pediatric tumors into biologically defined subtypes is challenging, and multifaceted approaches are needed. For this aim, we developed a diagnostic classifier based on DNA methylation profiles. We offer MethPed as an easy-to-use toolbox that allows researchers and clinical diagnosticians to test single samples as well as large cohorts for subclass prediction of pediatric brain tumors. The current version of MethPed can classify the following tumor diagnoses/subgroups: Diffuse Intrinsic Pontine Glioma (DIPG), Ependymoma, Embryonal tumors with multilayered rosettes (ETMR), Glioblastoma (GBM), Medulloblastoma (MB) - Group 3 (MB_Gr3), Group 4 (MB_Gr4), Group WNT (MB_WNT), Group SHH (MB_SHH) - and Pilocytic Astrocytoma (PiloAstro).
An automated and streamlined workflow for predictive climate mapping using climate station data. The workflow operates within a directory the user provides a path to; otherwise tempdir() is used. It enables quick and relatively easy creation of resilient and reproducible climate models, predictions, and climate maps, shortening the usually long and complicated work of predictive modelling. For more information, please see the provided URL. Many methods in this package are new, but the main method is based on a workflow from Meyer (2019) <doi:10.1016/j.ecolmodel.2019.108815> and Meyer (2022) <doi:10.1038/s41467-022-29838-9>; however, it was generalized and adjusted in the context of this package.
An R-based application for exploratory data analysis of global EvapoTranspiration (ET) datasets. evapoRe enables users to download, validate, visualize, and analyze multi-source ET data across various spatio-temporal scales. The package also offers calculation methods for estimating potential ET (PET), including temperature-based, combined-type, and radiation-based approaches described in Oudin et al. (2005) <doi:10.1016/j.jhydrol.2004.08.026>. evapoRe supports hydrological modeling, climate studies, agricultural research, and other data-driven fields by facilitating access to ET data and offering powerful analysis capabilities. Users can seamlessly integrate the package into their research applications and explore diverse ET data at different resolutions.
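As a point of reference, the temperature-based formulation of Oudin et al. (2005) is commonly written as below (quoted from memory; the package documentation gives the exact implementation):

    \[
      PET \;=\;
      \begin{cases}
        \dfrac{R_e}{\lambda \rho} \cdot \dfrac{T_a + 5}{100}, & T_a + 5 > 0, \\
        0, & \text{otherwise},
      \end{cases}
    \]

where R_e is the extraterrestrial radiation, lambda the latent heat of vaporization, rho the density of water, and T_a the mean daily air temperature.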
Three sets of data and functions for informing ecosystem restoration decisions, particularly in the context of the U.S. Army Corps of Engineers. First, model parameters are compiled as a data set with associated metadata for over 300 habitat suitability models developed by the U.S. Fish and Wildlife Service (USFWS 1980, <https://www.fws.gov/policy-library/870fw1>). Second, functions are provided for conducting habitat suitability analyses, both for the models described above and for generic user-specified model parameterizations. Third, a suite of decision support tools is included for conducting cost-effectiveness and incremental cost analyses (Robinson et al. 1995, IWR Report 95-R-1, U.S. Army Corps of Engineers).
This package implements a long-term forecast model called the "Jubilee-Tectonic model" to forecast future returns of the U.S. stock market, Treasury yield, and gold price. The five-factor model forecasts the 10-year and 20-year future equity returns with high R-squared above 80 percent. It is based on linear growth and mean reversion characteristics in the U.S. stock market. This model also enhances the CAPE model by introducing the hypothesis that there are fault lines in the historical CAPE, which can be calibrated and corrected through statistical learning. In addition, it contains a module for business cycles, optimal interest rate, and recession forecasts.
This package creates interactive trees that can be included in Shiny apps and R Markdown documents. A tree allows one to represent hierarchical data (e.g. the contents of a directory). It is similar to the shinyTree package but offers more features and options, such as the grid extension, restricting the drag-and-drop behavior, and settings for the search functionality. It is possible to attach data to the nodes of a tree and then to retrieve these data in Shiny when a node is selected. The package also provides a Shiny gadget for manipulating one or more folders, and a Shiny module for navigating the server-side file system.
This package provides stochastic EM algorithms for latent variable models with a high-dimensional latent space. So far, we provide functions for confirmatory item factor analysis based on the multidimensional two-parameter logistic (M2PL) model and the generalized multidimensional partial credit model. These functions scale well for problems with many latent traits (e.g., thirty or even more) and are virtually tuning-free. The computation is facilitated by multiprocessing via the OpenMP API. For more information, please refer to: Zhang, S., Chen, Y., & Liu, Y. (2018). An Improved Stochastic EM Algorithm for Large-scale Full-information Item Factor Analysis. British Journal of Mathematical and Statistical Psychology. <doi:10.1111/bmsp.12153>.
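For reference, the M2PL item response function has the standard form (notation chosen here for illustration):

    \[
      P\bigl(Y_{ij} = 1 \mid \boldsymbol{\theta}_i\bigr)
      \;=\;
      \frac{\exp\bigl(\mathbf{a}_j^{\top} \boldsymbol{\theta}_i + d_j\bigr)}
           {1 + \exp\bigl(\mathbf{a}_j^{\top} \boldsymbol{\theta}_i + d_j\bigr)},
    \]

where theta_i is the latent trait vector of respondent i (possibly thirty or more dimensions) and a_j and d_j are the loading vector and intercept of item j; the confirmatory structure fixes selected entries of a_j to zero.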
Obtain least-squares means for linear, generalized linear, and mixed models. Compute contrasts or linear functions of least-squares means, and comparisons of slopes. Plots and compact letter displays. Least-squares means were proposed in Harvey, W (1960) "Least-squares analysis of data with unequal subclass numbers", Tech Report ARS-20-8, USDA National Agricultural Library, and discussed further in Searle, Speed, and Milliken (1980) "Population marginal means in the linear model: An alternative to least squares means", The American Statistician 34(4), 216-221 <doi:10.1080/00031305.1980.10483031>. NOTE: lsmeans now relies primarily on code in the emmeans package. lsmeans will be archived in the near future.
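A brief usage sketch on a made-up data set (as noted above, lsmeans() now forwards to code in emmeans):

    library(lsmeans)
    d   <- data.frame(trt = gl(3, 10), block = gl(5, 2, 30), y = rnorm(30))
    fit <- lm(y ~ trt + block, data = d)
    lsmeans(fit, "trt")              # least-squares means for each treatment level
    lsmeans(fit, pairwise ~ trt)     # the same means plus pairwise comparisons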
The main function, plot_GMM, is used for plotting output from Gaussian mixture models (GMMs), including both densities and overlaid mixture-weight component curves from the fitted GMM. The package also includes the function plot_cut_point, which plots the cut point (mu) from the GMM over a histogram of the distribution, with several color options. Finally, the package includes the function plot_mix_comps, which is used in the plot_GMM function and can also be used to create a custom plot overlaying mixture component curves from GMMs. For the plot_mix_comps function, usage will most often consist of specifying the "fun" argument within "stat_function" in a ggplot2 object.
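A sketch of that stat_function pattern on simulated data; the argument names passed to plot_mix_comps (mu, sigma, lam) are quoted from memory and should be checked against the package help page:

    library(ggplot2)
    library(plotGMM)
    x <- c(rnorm(200, 0, 1), rnorm(200, 4, 1.5))   # made-up two-component mixture
    ggplot(data.frame(x = x), aes(x)) +
      geom_histogram(aes(y = after_stat(density)), bins = 40, alpha = 0.4) +
      stat_function(fun = plot_mix_comps, colour = "red",
                    args = list(mu = 0, sigma = 1, lam = 0.5)) +
      stat_function(fun = plot_mix_comps, colour = "blue",
                    args = list(mu = 4, sigma = 1.5, lam = 0.5))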