Identifies clusters of individual longitudinal trajectories. In the spirit of Leffondre et al. (2004), the procedure involves mapping each trajectory to a point in the space of measures. In this context, a measure is a quantity meant to capture a certain characteristic feature of the trajectory. The points in the space of measures are then clustered using a version of the Spectral Clustering algorithm.
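As a rough illustration of the first step, the sketch below maps one trajectory to a point in a small space of measures. The five measures and the function name `trajectory_measures` are illustrative assumptions, not the package's actual set of measures or API, and the clustering step is not shown.

```python
from statistics import mean, pstdev


def trajectory_measures(times, values):
    """Map one trajectory to a point in a small, illustrative space of measures."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching (time, value) pairs")
    t_bar, y_bar = mean(times), mean(values)
    # Ordinary least-squares slope of value against time.
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("observation times must not all be equal")
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    return {
        "range": max(values) - min(values),
        "mean": y_bar,
        "sd": pstdev(values),
        "net_change": values[-1] - values[0],
        "slope": sxy / sxx,
    }
```

Each trajectory becomes one such point, and the resulting points are what a clustering algorithm would then group.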
Semiparametric modeling of lifetime data with crossing survival curves via the Yang and Prentice model, with the baseline hazard/odds modeled with Bernstein polynomials. Details about the model can be found in Demarqui et al. (2019) <arXiv:1910.04475>. Model fitting can be carried out via both maximum likelihood and Bayesian approaches. The package also provides point and interval estimation for the crossing survival times.
Autoencoding Random Forests ('RFAE') provide a method to autoencode mixed-type tabular data using Random Forests ('RF'), which involves projecting the data to a latent feature space of user-chosen dimensionality (usually a lower dimension), and then decoding the latent representations back into the input space. The encoding stage is useful for feature engineering and data visualisation tasks, akin to how principal component analysis ('PCA') is used, and the decoding stage is useful for compression and denoising tasks. At its core, RFAE is a post-processing pipeline on a trained random forest model. This means that it can accept any trained RF of ranger object type: 'RF', 'URF' or 'ARF'. Because of this, it inherits Random Forests' robust performance and capacity to seamlessly handle mixed-type tabular data. For more details, see Vu et al. (2025) <doi:10.48550/arXiv.2505.21441>.
Least Angle Regression ("LAR") is a model selection algorithm; a useful and less greedy version of traditional forward selection methods. A simple modification of the LAR algorithm implements Tibshirani's Lasso; the Lasso modification of LARS calculates the entire Lasso path of coefficients for a given problem at the cost of a single least squares fit. Another LARS modification efficiently implements epsilon Forward Stagewise linear regression.
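To make the last idea concrete, here is a minimal sketch of plain incremental (epsilon) forward stagewise regression. It is the generic textbook procedure, not the efficient LARS-based implementation the description refers to; the function name, the default `epsilon` and the assumption that the columns of `X` and the response `y` are already centred are illustrative choices.

```python
def forward_stagewise(X, y, epsilon=0.01, n_steps=2000):
    """Incremental forward stagewise regression.

    X is a list of rows (assumed centred column-wise); y is a list of
    centred responses. Returns the coefficient vector after at most
    n_steps moves of size epsilon.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    for _ in range(n_steps):
        # Inner product of each predictor with the current residual.
        corr = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
        j = max(range(p), key=lambda k: abs(corr[k]))
        if abs(corr[j]) < 1e-12:
            break  # residual is numerically orthogonal to every predictor
        # Nudge the most correlated coefficient by a small fixed amount.
        step = epsilon if corr[j] > 0 else -epsilon
        beta[j] += step
        for i in range(n):
            residual[i] -= step * X[i][j]
    return beta
```

Because each move is tiny, many steps are needed; computing the same path at far lower cost is what the LARS modification is for.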
An R interface for the remote file hosting service Box (<https://www.box.com/>). In addition to uploading and downloading files, this package includes functions which mirror base R operations for local files (e.g. box_load(), box_save(), box_read(), box_setwd(), etc.), as well as git-style functions for entire directories (e.g. box_fetch(), box_push()).
Estimation of hierarchical Bayesian vector autoregressive models following Kuschnig & Vashold (2021) <doi:10.18637/jss.v100.i14>. Implements hierarchical prior selection for conjugate priors in the fashion of Giannone, Lenza & Primiceri (2015) <doi:10.1162/REST_a_00483>. Functions to compute and identify impulse responses, calculate forecasts, forecast error variance decompositions and scenarios are available. Several methods to print, plot and summarise results facilitate analysis.
This package implements v2 of the B.L.S. API for requests of survey information and time series data through a 3-tiered API that allows users to interact with the raw API directly, create queries through a functional interface, and re-shape the data structures returned to fit common uses. The API definition is located at: <https://www.bls.gov/developers/api_signature_v2.htm>.
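For orientation, the sketch below shows what a raw request at the lowest tier might look like. It only builds the request and does not send it. The endpoint URL and the JSON field names (`seriesid`, `startyear`, `endyear`, `registrationkey`) are given as I understand the public v2 signature and should be checked against the API definition linked above; the helper name `build_bls_request` is hypothetical and is not part of the package.

```python
import json
from urllib.request import Request

# Assumed v2 time series endpoint; verify against the API definition page.
BLS_V2_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"


def build_bls_request(series_ids, start_year, end_year, registration_key=None):
    """Build (but do not send) a POST request for one or more series."""
    payload = {
        "seriesid": list(series_ids),
        "startyear": str(start_year),
        "endyear": str(end_year),
    }
    if registration_key:
        # Optional key for registered users.
        payload["registrationkey"] = registration_key
    return Request(
        BLS_V2_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would perform the call; the higher tiers described above would then wrap query construction and reshape the JSON response.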
This package provides a comprehensive framework for time series omics analysis, integrating changepoint detection, smooth and shape-constrained trends, and uncertainty quantification. It supports gene- and transcript-level inferences, p-value aggregation for improved power, and both case-only and case-control designs. It includes an interactive shiny interface. The methods are described in Yates et al. (2024) <doi:10.1101/2024.12.22.630003>.
Conditioned Latin hypercube sampling, as published by Minasny and McBratney (2006) <DOI:10.1016/j.cageo.2005.12.009>. This method proposes to stratify sampling in the presence of ancillary data. An extension of this method, which associates a cost with each individual and takes it into account during the optimisation process, is also provided (Roudier et al., 2012, <DOI:10.1201/b12728>).
Draws systematic samples from a population that follows a linear trend. The function returns a matrix comprising the required samples as its column vectors. The samples produced are highly efficient and the inter-sampling variance is minimal. The scheme will be useful in various fields, such as Bioinformatics, where samples are expensive and must reflect the population precisely by having the least sampling variance.
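The matrix layout can be illustrated with ordinary linear systematic sampling, shown below as a baseline. This is not the package's scheme: under a linear trend, the means of consecutive ordinary systematic samples differ by the trend slope, which is the variance that trend-aware schemes aim to remove. The function names are hypothetical, and the population size is assumed to be a multiple of the sample size.

```python
def systematic_samples(population, n):
    """Return all k = N / n ordinary systematic samples of size n.

    The result is an n x k matrix (list of rows); column r is the
    sample that starts at unit r and then takes every k-th unit.
    """
    N = len(population)
    if n <= 0 or N % n != 0:
        raise ValueError("population size must be a positive multiple of n")
    k = N // n
    return [[population[r + i * k] for r in range(k)] for i in range(n)]


def column(matrix, r):
    """Extract sample r (a column) from the matrix of samples."""
    return [row[r] for row in matrix]
```

For a population `1, 2, ..., 12` and `n = 3`, the four columns are `[1, 5, 9]`, `[2, 6, 10]`, `[3, 7, 11]` and `[4, 8, 12]`, whose means rise by one from column to column.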
This package provides an implementation of a mixture of hidden Markov models (HMMs) for discrete sequence data in the Discrete Bayesian HMM Clustering (DBHC) algorithm. The DBHC algorithm is an HMM Clustering algorithm that finds a mixture of discrete-output HMMs while using heuristics based on Bayesian Information Criterion (BIC) to search for the optimal number of HMM states and the optimal number of clusters.
This package provides functions for five ED50 (50 percent effective dose) estimation methods: the Dixon-Mood method (1948) <doi:10.2307/2280071>, Choi's original turning point method (1990) <doi:10.2307/2531453> and its modified version given by us, as well as logistic regression and isotonic regression. The package also supports comparison between two estimation results.
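Of the five methods, isotonic regression is the easiest to sketch. The example below fits a non-decreasing response curve with the weighted pool-adjacent-violators algorithm and reads off the dose at which the fitted response rate reaches 0.5. The linear interpolation between dose levels and both function names are assumptions made for illustration; the package's own estimator may differ in these details.

```python
def pava(values, weights):
    """Weighted pool-adjacent-violators: non-decreasing fit to `values`."""
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for m, _, c in blocks:
        fitted.extend([m] * c)
    return fitted


def ed50_isotonic(doses, responders, trials):
    """Dose at which the isotonic fit crosses 0.5 (doses sorted ascending).

    Returns None if the fitted rate never reaches 0.5.
    """
    rates = [r / t for r, t in zip(responders, trials)]
    fit = pava(rates, trials)
    for i, p in enumerate(fit):
        if p >= 0.5:
            if i == 0:
                return doses[0]
            # Here fit[i - 1] < 0.5 <= fit[i]; interpolate linearly (assumption).
            frac = (0.5 - fit[i - 1]) / (p - fit[i - 1])
            return doses[i - 1] + frac * (doses[i] - doses[i - 1])
    return None
```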
This package provides implementations of computationally efficient maximum likelihood parameter estimation algorithms for models representing linear dynamical systems. Currently, two such algorithms (one offline and one online) are implemented for the single-output cumulative structural equation model with an additive-noise output measurement equation and assumptions of normality and independence. The corresponding scientific papers are referenced in the descriptions of the functions implementing these algorithms.
This package provides a simple wrapper around the ical.js library, executing JavaScript code via V8 (the JavaScript engine driving the Chrome browser and Node.js and accessible via the V8 R package). This package enables users to parse iCalendar files ('.ics', '.ifb', '.iCal', '.iFBf') into lists and data.frames to ultimately do statistics on events, meetings, schedules, birthdays, and the like.
Due to the lack of proper inference procedures and software, the ordinary linear regression model is seldom used in practice for the analysis of right censored data. This paper presents an S-Plus/R program that implements a recently developed inference procedure (Jin, Lin and Ying, 2006) <doi:10.1093/biomet/93.1.147> for the accelerated failure time model based on the least-squares principle.
This package performs Bayesian linear regression and forecasting in astronomy. The method accounts for heteroscedastic errors in both the independent and the dependent variables, intrinsic scatters (in both variables) and scatter correlation, time evolution of slopes, normalization, scatters, Malmquist and Eddington bias, upper limits and break of linearity. The posterior distribution of the regression parameters is sampled with a Gibbs method exploiting the JAGS library.
Classification method obtained through linear programming. It is advantageous with respect to the classical developments when the distribution of the variables involved is unknown or when the number of variables is much greater than the number of individuals. Mathematical details behind the method are published in Nueda et al. (2022) "LPDA: A new classification method based on linear programming" <doi:10.1371/journal.pone.0270403>.
This package contains a set of functions to create data libraries, generate data dictionaries, and simulate a data step. The libname() function will load a directory of data into a library in one line of code. The dictionary() function will generate data dictionaries for individual data frames or an entire library. And the datastep() function will perform row-by-row data processing.
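The two ideas of row-by-row processing and a data dictionary can be sketched outside R as follows. The helpers `run_datastep`, `describe_columns` and `add_bmi`, and the column names used, are hypothetical illustrations and do not reproduce the package's functions or their arguments.

```python
def run_datastep(rows, step):
    """Process rows one at a time, in order, as a data step would.

    `step` receives a copy of each row (a dict) and may add or change fields.
    """
    out = []
    for row in rows:
        current = dict(row)  # copy so the input rows are left untouched
        step(current)
        out.append(current)
    return out


def describe_columns(rows):
    """Build a small data dictionary: value types and missing counts per column."""
    summary = {}
    for row in rows:
        for name, value in row.items():
            entry = summary.setdefault(name, {"types": set(), "missing": 0})
            if value is None:
                entry["missing"] += 1
            else:
                entry["types"].add(type(value).__name__)
    return summary


def add_bmi(row):
    """Example step: derive a new column from two existing ones."""
    row["bmi"] = round(row["weight_kg"] / row["height_m"] ** 2, 1)
```

Calling `run_datastep(rows, add_bmi)` returns new rows with a `bmi` field, and `describe_columns` on the result lists each column with the types it holds.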
This package performs multiple imputation of missing data using an ensemble super learner built with the tidymodels framework. For each incomplete column, a stacked ensemble of candidate learners is trained on a bootstrap sample of the observed data and used to generate imputations via predictive mean matching (continuous), probability draws (binary), or cumulative probability draws (categorical). Supports parallelism across imputed datasets via the future framework.
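The predictive mean matching step for continuous columns can be sketched as below, assuming predictions from some fitted learner are already available. The function name `pmm_impute`, the donor pool size `k` and the use of absolute distance between predictions are illustrative assumptions rather than the package's implementation.

```python
import random


def pmm_impute(pred_observed, y_observed, pred_missing, k=5, rng=None):
    """Predictive mean matching for one continuous column.

    For each missing case, find the k observed cases whose predictions are
    closest to the missing case's prediction, then draw one of them at
    random and borrow its observed value.
    """
    rng = rng or random.Random()
    imputed = []
    for p in pred_missing:
        # Indices of the k nearest observed cases in prediction space.
        donors = sorted(
            range(len(y_observed)),
            key=lambda i: abs(pred_observed[i] - p),
        )[:k]
        imputed.append(y_observed[rng.choice(donors)])
    return imputed
```

Because imputations are always observed values, they stay within the range of the data; repeating the draw across bootstrap fits is what produces multiple imputed datasets.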
This package provides new functions info(), warn() and error(), similar to message(), warning() and stop() respectively. However, the new functions can have a level associated with them, so that when executed the global level option determines whether they are shown or not. This allows debug modes, outputting more information. They can also output all messages to a log file.
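The level mechanism can be sketched as follows. Several details are assumptions made for illustration: lower numbers mean more important messages, a message is shown when its level does not exceed the global level, every message is appended to the log file regardless of level, and `error()` raises only when it is shown. The names `set_level` and `_emit` are hypothetical.

```python
import sys

# Global settings, standing in for a global option.
_state = {"level": 1, "log_file": None}


def set_level(level, log_file=None):
    """Set the global level and, optionally, a log file path."""
    _state["level"] = level
    _state["log_file"] = log_file


def _emit(prefix, text, level):
    """Log the message, then show it if its level is within the global level."""
    line = f"{prefix}: {text}"
    if _state["log_file"]:
        with open(_state["log_file"], "a", encoding="utf-8") as fh:
            fh.write(line + "\n")
    shown = level <= _state["level"]
    if shown:
        print(line, file=sys.stderr)
    return shown


def info(text, level=1):
    return _emit("INFO", text, level)


def warn(text, level=1):
    return _emit("WARNING", text, level)


def error(text, level=1):
    if _emit("ERROR", text, level):
        raise RuntimeError(text)
    return False
```

Raising the global level, for example with `set_level(3)`, acts as a debug mode: messages tagged with levels 2 and 3 start to appear.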
Ordered homogeneity pursuit lasso (OHPL) algorithm for group variable selection proposed in Lin et al. (2017) <DOI:10.1016/j.chemolab.2017.07.004>. The OHPL method exploits the homogeneity structure in high-dimensional data and enjoys the grouping effect to select groups of important variables automatically. This feature makes it particularly useful for high-dimensional datasets with strongly correlated variables, such as spectroscopic data.
We present a penalized log-density estimation method using Legendre polynomials with a lasso penalty to adjust the estimate's smoothness. Re-expressing the logarithm of the density estimator via a linear combination of Legendre polynomials, we can estimate parameters by maximizing the penalized log-likelihood function. Besides, we propose an implementation strategy that builds on the coordinate descent algorithm, together with the Bayesian information criterion (BIC).
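The objective being maximized can be sketched as below. The sketch assumes data rescaled to [-1, 1], a log-density of the form sum_j theta_j P_j(x) minus a log normalising constant computed by a midpoint rule, and an unscaled lasso penalty `lam * sum(|theta_j|)`; these choices and all function names are illustrative assumptions, and the coordinate descent optimiser itself is not shown.

```python
import math


def legendre(degree, x):
    """Legendre polynomial P_degree(x) via the three-term recurrence."""
    if degree == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, degree):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def linear_part(x, theta):
    """sum_j theta_j P_j(x) for j = 1..len(theta)."""
    return sum(t * legendre(j, x) for j, t in enumerate(theta, start=1))


def log_normaliser(theta, grid=2000):
    """Log of the integral of exp(linear_part) over [-1, 1] (midpoint rule)."""
    h = 2.0 / grid
    total = sum(math.exp(linear_part(-1.0 + (i + 0.5) * h, theta)) for i in range(grid))
    return math.log(total * h)


def penalized_loglik(data, theta, lam):
    """Log-likelihood of the data minus the lasso penalty on theta."""
    log_z = log_normaliser(theta)
    loglik = sum(linear_part(x, theta) - log_z for x in data)
    return loglik - lam * sum(abs(t) for t in theta)
```

With all coefficients at zero the density is uniform on [-1, 1]; a larger `lam` pushes more coefficients to exactly zero and so yields a smoother estimate.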
This package provides a system for fast, accurate, and flexible whole genome bisulfite sequencing (WGBS) data analysis of two-condition comparisons. Principal Component BiSulfite, 'PCBS', assigns methylated loci eigenvector values from the treatment-delineating principal component in lieu of running millions of pairwise statistical tests, which dramatically increases analysis flexibility and reduces computational requirements. Methods: <https://katlande.github.io/PCBS/articles/Differential_Methylation.html>.
This package implements recently developed projection pursuit algorithms for finding optimal linear cluster separators. The clustering algorithms use optimal hyperplane separators based on minimum density, Pavlidis et al. (2016) <http://jmlr.org/papers/volume17/15-307/15-307.pdf>; minimum normalised cut, Hofmeyr (2017) <doi:10.1109/TPAMI.2016.2609929>; and maximum variance ratio clusterability, Hofmeyr and Pavlidis (2015) <doi:10.1109/SSCI.2015.116>.