Column Text Format (CTF) is a new tabular data format designed for simplicity and performance. CTF is the simplest column store you can imagine: plain text files for each column in a table, and a metadata file. The underlying plain text means the data is human readable and familiar to programmers, unlike specialized binary formats. CTF is faster than row-oriented formats like CSV when loading a subset of the columns in a table. This package provides functions to read and write CTF data from R.
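The column-per-file idea can be illustrated with a minimal sketch. The file names (`<column>.txt`, `metadata.json`), the JSON metadata layout, and the functions `write_columns` and `read_columns` are assumptions made for illustration; they are not the actual CTF specification or this package's API.

```python
import json
import tempfile
from pathlib import Path


def write_columns(table, directory):
    """Write each column to its own text file (one value per line) plus a metadata file.

    Hypothetical layout for illustration only; values must not contain newlines.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    lengths = {len(values) for values in table.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    for name, values in table.items():
        text = "".join(f"{value}\n" for value in values)
        (directory / f"{name}.txt").write_text(text, encoding="utf-8")
    metadata = {"columns": list(table), "rows": lengths.pop() if lengths else 0}
    (directory / "metadata.json").write_text(json.dumps(metadata), encoding="utf-8")


def read_columns(directory, names):
    """Load only the requested columns; the other column files are never opened."""
    directory = Path(directory)
    metadata = json.loads((directory / "metadata.json").read_text(encoding="utf-8"))
    unknown = [name for name in names if name not in metadata["columns"]]
    if unknown:
        raise KeyError(f"unknown columns: {unknown}")
    # Values come back as strings; type information would live in the metadata.
    return {
        name: (directory / f"{name}.txt").read_text(encoding="utf-8").splitlines()
        for name in names
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_columns({"id": [1, 2, 3], "name": ["a", "b", "c"]}, tmp)
        print(read_columns(tmp, ["name"]))
```

Reading a subset touches only the files for the requested columns, which is where a column store gains over a row-oriented file that must be scanned in full.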
Functions, S4 classes/methods and a graphical user interface (GUI) to design surveys to substantiate freedom from disease using a modified hypergeometric function (see Cameron and Baldock, 1997, <doi:10.1016/s0167-5877(97)00081-0>). Herd sensitivities are computed according to sampling strategies "individual sampling" or "limited sampling" (see M. Ziller, T. Selhorst, J. Teuffert, M. Kramer and H. Schlueter, 2002, <doi:10.1016/S0167-5877(01)00245-8>). Methods to compute the a-posteriori alpha-error are implemented. Risk-based targeted sampling is supported.
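As a rough illustration of the sampling logic, the following sketch uses the plain hypergeometric distribution with a perfect diagnostic test. This is a simplification and an assumption: the modified hypergeometric function of Cameron and Baldock additionally accounts for imperfect test sensitivity, which is not modelled here. The function names are hypothetical and not part of this package.

```python
from math import comb


def detection_probability(herd_size, n_diseased, n_sampled):
    """Probability that a simple random sample (without replacement)
    contains at least one diseased animal, assuming a perfect test."""
    if not 0 <= n_diseased <= herd_size:
        raise ValueError("n_diseased must lie between 0 and herd_size")
    if not 0 <= n_sampled <= herd_size:
        raise ValueError("n_sampled must lie between 0 and herd_size")
    # comb() returns 0 when more animals are sampled than healthy ones exist,
    # so the detection probability is then exactly 1.
    p_all_healthy = comb(herd_size - n_diseased, n_sampled) / comb(herd_size, n_sampled)
    return 1.0 - p_all_healthy


def minimal_sample_size(herd_size, n_diseased, target):
    """Smallest sample size whose detection probability reaches the target,
    or None if no sample size does (for example when n_diseased is 0)."""
    for n_sampled in range(herd_size + 1):
        if detection_probability(herd_size, n_diseased, n_sampled) >= target:
            return n_sampled
    return None
```

For example, `minimal_sample_size(100, 5, 0.95)` returns the number of animals to test in a herd of 100 so that 5 diseased animals would be detected with probability of at least 0.95 under these simplified assumptions.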
Diagnostic tools such as residual analysis and global, local and total-local influence for the multivariate model derived from the random intercept Poisson generalized log gamma model are available in this package. Estimation by the maximum likelihood method is also included. For details see Fabio, L. C.; Villegas, C. L.; Carrasco, J. M. F. and de Castro, M. (2023) <doi:10.1080/03610926.2021.1939380> and Fábio, L. C.; Villegas, C.; Mamun, A. S. M. A. and Carrasco, J. M. F. (2025) <doi:10.28951/bjb.v43i1.728>.
Optogenetics is a new tool to study neuronal circuits that have been genetically modified to allow stimulation by flashes of light. This package implements the methodological framework, Point-process Response model for Optogenetics (PRO), for analyzing data from these experiments. This method provides explicit nonlinear transformations to link the flash point-process with the spiking point-process. Such response functions can be used to provide important and interpretable scientific insights into the properties of the biophysical process that governs neural spiking in response to optogenetic stimulation.
In the big data setting, working data sets are often distributed on multiple machines. However, classical statistical methods are often developed to solve the problems of single estimation or inference. We employ a novel parallel quasi-likelihood method in generalized linear models, to make the variances between different sub-estimators relatively similar. Estimates are obtained from projection subsets of data and later combined by suitably-chosen unknown weights. The philosophy of the package is described in Guo G. (2020) <doi:10.1007/s00180-020-00974-4>.
Population dynamic models underpin a range of analyses and applications in ecology and epidemiology. The various approaches for analysing population dynamics models (MPMs, IPMs, ODEs, POMPs, PVA) each require the model to be defined in a different way. This makes it difficult to combine different modelling approaches and data types to solve a given problem. pop aims to provide a flexible and easy-to-use common interface for constructing population dynamic models and enabling them to be fitted and analysed in many different ways.
This package implements a method for fitting a bounded probability distribution to quantiles (for example stated by an expert); see Bornkamp and Ickstadt (2009) for details. For this purpose B-splines are used, and the density is obtained by penalized least squares based on a Brier entropy penalty. The package provides methods for fitting the distribution as well as methods for evaluating the underlying density and cdf. In addition, methods for plotting the distribution, drawing random numbers and calculating quantiles of the obtained distribution are provided.
The method implemented in this package performs bottom-up hierarchical clustering, using a Dirichlet Process (infinite mixture) to model uncertainty in the data and Bayesian model selection to decide at each step which clusters to merge. This avoids several limitations of traditional methods, for example having to decide how many clusters there should be and how to choose a principled distance metric. This implementation accepts multinomial (i.e. discrete, with 2+ categories) or time-series data. This version also includes a randomised algorithm which is more efficient for larger data sets.
This package provides functions useful in the design and ANOVA of experiments. The content falls into the following groupings:
data,
factor manipulation functions,
design functions,
ANOVA functions,
matrix functions,
projector and canonical efficiency functions, and
miscellaneous functions.
There is a vignette called DesignNotes describing how to use the design functions for randomizing and assessing designs. The ANOVA functions facilitate the extraction of information when the Error function has been used in the call to aov.
Single-cell RNA-sequencing (scRNA-seq) is widely used to explore cellular variation. The analysis of scRNA-seq data often starts from clustering cells into subpopulations. This initial step has a high impact on downstream analyses, and hence it is important that it be accurate. However, no unsupervised metric has been designed for scRNA-seq to evaluate clustering performance. Hence, we propose the clustering deviation index (CDI), an unsupervised metric based on the modeling of scRNA-seq UMI counts to evaluate clustering of cells.
The Well-Plate Maker (WPM) is a shiny application deployed as an R package. Functions for command-line/script use are also available. The WPM allows users to generate well plate maps to carry out their experiments while improving the handling of batch effects. In particular, it helps control the "plate effect" thanks to its ability to randomize samples over multiple well plates. The algorithm for placing the samples is inspired by the backtracking algorithm: the samples are placed at random while respecting specific spatial constraints.
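A minimal sketch of backtracking placement on a single plate follows. The spatial constraint used here (horizontally or vertically adjacent wells must not hold samples from the same group), the one-sample-per-well restriction, and the function `place_samples` are assumptions chosen for illustration; they are not WPM's actual constraints or interface.

```python
import random


def place_samples(groups, n_rows, n_cols, rng=None):
    """Fill a plate so that orthogonally adjacent wells never share a group.

    `groups` maps a group label to its number of samples. Returns the plate
    as a list of rows of group labels, or None if no valid layout exists.
    """
    rng = rng or random.Random()
    remaining = dict(groups)  # group label -> samples still to place
    if sum(remaining.values()) != n_rows * n_cols:
        raise ValueError("this sketch expects exactly one sample per well")
    plate = [[None] * n_cols for _ in range(n_rows)]

    def fill(index):
        if index == n_rows * n_cols:
            return True  # every well is filled
        row, col = divmod(index, n_cols)
        # Wells are filled row by row, so only the left and upper
        # neighbours have been assigned at this point.
        blocked = set()
        if col > 0:
            blocked.add(plate[row][col - 1])
        if row > 0:
            blocked.add(plate[row - 1][col])
        candidates = [g for g, k in remaining.items() if k > 0 and g not in blocked]
        rng.shuffle(candidates)  # random placement among the allowed groups
        for group in candidates:
            plate[row][col] = group
            remaining[group] -= 1
            if fill(index + 1):
                return True
            remaining[group] += 1  # undo the choice and backtrack
        plate[row][col] = None
        return False

    return plate if fill(0) else None
```

Calling `place_samples({"A": 4, "B": 4, "C": 4}, 3, 4)` yields a randomized 3 x 4 layout in which no group appears in two neighbouring wells.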
The Desirable Dietary Pattern (DDP)/PPH score measures the variety of food consumption. The (weighted) score is calculated based on the type of food. This package calculates the DDP/PPH score faster than the traditional manual calculation described by BKP (2017) <http://bkp.pertanian.go.id/storage/app/uploads/public/5bf/ca9/06b/5bfca906bc654274163456.pdf> and more simply than the nutrition survey <http://www.nutrisurvey.de>. The weights and baseline values are derived from the 2017 Indonesian national survey.
Treatments of a one-way layout that are equivalent to a control can be selected with this package. Bonferroni adjusted "two one-sided t-tests" (TOST) and related simultaneous confidence intervals are given for either differences or ratios of means of normally distributed data. For the case of equal variances and balanced sample sizes for the treatment groups, the single-step procedure of Bofinger and Bofinger (1995) <doi:10.1111/j.2517-6161.1995.tb02058.x> can be chosen. For non-normal data, the Wilcoxon test is applied.
Multiple testing procedures for heterogeneous and discrete tests as described in Döhler and Roquain (2020) <doi:10.1214/20-EJS1771>. The main algorithms of the paper are available as continuous, discrete and weighted versions. They take as input the results of a test procedure from package 'DiscreteTests', or a set of observed p-values and their discrete support under their nulls. A shortcut function to obtain such p-values and supports is also provided, along with wrappers that allow discrete procedures to be applied directly to data.
This package provides quick and easy access to official spatial data from Germany's Federal Agency for Cartography and Geodesy (BKG) <https://gdz.bkg.bund.de/>. Interfaces various web feature services (WFS) and download servers. Allows retrieval, caching and filtering with a wide range of open geodata products, including administrative or non-administrative boundaries, land cover, elevation models, geographic names, and points of interest covering Germany. Can be particularly useful for linking regional statistics to their spatial representations and streamlining workflows that involve spatial data of Germany.
Builds and runs C++ code for classes that encapsulate pairs of a state space model and a particle filtering algorithm. Algorithms include the Bootstrap Filter from Gordon et al. (1993) <doi:10.1049/ip-f-2.1993.0015>, the generic SISR filter, the Auxiliary Particle Filter from Pitt et al. (1999) <doi:10.2307/2670179>, and a variety of Rao-Blackwellized particle filters inspired by Andrieu et al. (2002) <doi:10.1111/1467-9868.00363>. For more details on the C++ library 'pf', see Brown (2020) <doi:10.21105/joss.02599>.
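To make the bootstrap filter concrete, here is a minimal sketch for a toy linear-Gaussian model. The model (an AR(1) state observed with Gaussian noise), the parameter names, and the function `bootstrap_filter` are assumptions for illustration; the sketch is independent of the 'pf' library and does not reflect its interface.

```python
import math
import random


def bootstrap_filter(observations, n_particles=500, phi=0.9,
                     sigma_x=1.0, sigma_y=1.0, rng=None):
    """Filtered state means for x_t = phi * x_{t-1} + N(0, sigma_x^2),
    y_t = x_t + N(0, sigma_y^2), with |phi| < 1."""
    rng = rng or random.Random()
    # Initial particles come from the stationary distribution of the state.
    sd0 = sigma_x / math.sqrt(1.0 - phi ** 2)
    particles = [rng.gauss(0.0, sd0) for _ in range(n_particles)]
    means = []
    for t, y in enumerate(observations):
        if t > 0:
            # Propagate: the bootstrap filter proposes from the state transition.
            particles = [phi * x + rng.gauss(0.0, sigma_x) for x in particles]
        # Weight each particle by the observation likelihood (log scale,
        # shifted by the maximum for numerical stability).
        log_w = [-0.5 * ((y - x) / sigma_y) ** 2 for x in particles]
        max_log_w = max(log_w)
        weights = [math.exp(lw - max_log_w) for lw in log_w]
        total = sum(weights)
        means.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # Multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means
```

The returned list holds one weighted-mean estimate of the hidden state per observation. Resampling at every step is the simplest choice; practical implementations often resample only when the effective sample size drops.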
Developed to perform the estimation and inference for regression coefficient parameters in longitudinal marginal models using the method of quadratic inference functions. Like generalized estimating equations, this method is also a quasi-likelihood inference method. It has been shown that the method gives consistent estimators of the regression coefficients even if the correlation structure is misspecified, and it is more efficient than GEE when the correlation structure is misspecified. Based on Qu, A., Lindsay, B.G. and Li, B. (2000) <doi:10.1093/biomet/87.4.823>.
This package provides researchers with a simple set of diagnostic tools for monitoring the progress and reliability of raters conducting content coding tasks. Goehring (2024) <https://bengoehring.github.io/improving-content-analysis-tools-for-working-with-undergraduate-research-assistants.pdf> argues that supervisors---especially supervisors of small teams---should utilize computational tools to monitor reliability in real time. As such, this package provides easy-to-use functions for calculating inter-rater reliability statistics and measuring the reliability of one coder compared to the rest of the team.
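As an illustration of what an inter-rater reliability statistic measures, the sketch below computes percent agreement and Cohen's kappa for two raters. The choice of these two statistics and the function names are assumptions; the description does not state which statistics this package computes or how its functions are named.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of items on which two raters assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both raters must code the same non-empty set of items")
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    observed = percent_agreement(codes_a, codes_b)
    n = len(codes_a)
    counts_a, counts_b = Counter(codes_a), Counter(codes_b)
    # Agreement expected by chance if both raters coded independently
    # with their own marginal category frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used a single, identical category throughout
    return (observed - expected) / (1.0 - expected)
```

Comparing one coder against each teammate in turn with such a statistic is one simple way to spot a rater who diverges from the rest of the team.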
This package provides methods for fast access to large ASCII files. Currently the following file formats are supported: comma separated format (CSV) and fixed width format. It is assumed that the files are too large to fit into memory, although the package can also be used to efficiently access files that do fit into memory. Methods are provided to access and process files blockwise. Furthermore, an opened file can be accessed as one would an ordinary data.frame. The LaF vignette gives an overview of the functionality provided.
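Blockwise processing can be sketched as follows: the file is read a fixed number of rows at a time, so only one block is held in memory while a running statistic is updated. The functions `iter_blocks` and `column_mean` are hypothetical names used for illustration and are not the LaF interface.

```python
import csv
import io
from itertools import islice


def iter_blocks(handle, block_size):
    """Yield successive lists of at most `block_size` rows from a CSV file handle."""
    reader = csv.DictReader(handle)
    while True:
        block = list(islice(reader, block_size))
        if not block:
            return
        yield block


def column_mean(handle, column, block_size=1000):
    """Mean of a numeric column, computed without loading the whole file."""
    total = 0.0
    count = 0
    for block in iter_blocks(handle, block_size):
        total += sum(float(row[column]) for row in block)
        count += len(block)
    return total / count if count else float("nan")


if __name__ == "__main__":
    data = io.StringIO("x,y\n1,10\n2,20\n3,30\n")
    print(column_mean(data, "y", block_size=2))
```

The same pattern works with a handle returned by `open()` on a file that is too large to fit into memory.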
PAA imports single color (protein) microarray data that has been saved in gpr file format - esp. ProtoArray data. After preprocessing (background correction, batch filtering, normalization) univariate feature preselection is performed (e.g., using the "minimum M statistic" approach - hereinafter referred to as "mMs"). Subsequently, a multivariate feature selection is conducted to discover biomarker candidates. For this purpose, either a frequency-based backwards elimination approach or ensemble feature selection can be used. PAA provides a complete toolbox of analysis tools including several different plots for results examination and evaluation.
This package provides a collection of tools to create, use and maintain modularized model code written in the modeling language GAMS (<https://www.gams.com/>). Out of the box, GAMS does not come with support for modularized model code. This package provides the tools necessary to convert a standard GAMS model to a modularized one by introducing a modularized code structure together with a naming convention which emulates local environments. In addition, this package provides tools to monitor the compliance of the model code with modular coding guidelines.
Generalized factor model is implemented for ultra-high dimensional data with mixed-type variables. Two algorithms, variational EM and alternate maximization, are designed to implement the generalized factor model, respectively. The factor matrix and loading matrix together with the number of factors can be well estimated. This model can be employed in social and behavioral sciences, economy and finance, and genomics, to extract interpretable nonlinear factors. For more details, see Wei Liu, Huazhen Lin, Shurong Zheng and Jin Liu (2023) <doi:10.1080/01621459.2021.1999818>.
This package provides a high-level interface for torch providing utilities to reduce the amount of code needed for common tasks, abstract away torch details and make the same code work on both the CPU and GPU. It's flexible enough to support expressing a large range of models. It's heavily inspired by fastai by Howard et al. (2020) <doi:10.48550/arXiv.2002.04688>, Keras by Chollet et al. (2015) and PyTorch Lightning by Falcon et al. (2019) <doi:10.5281/zenodo.3828935>.
Bayesian Model Averaging for linear models with a wide choice of (customizable) priors. Built-in priors include coefficient priors (fixed, hyper-g and empirical priors) and 5 kinds of model priors; models can be sampled by enumeration or by various MCMC approaches. Post-processing functions allow for inferring posterior inclusion and model probabilities, various moments, coefficient and predictive densities. Plotting functions are available for posterior model size, MCMC convergence, predictive and coefficient densities, best models representation, and BMA comparison. Also includes a Bayesian normal-conjugate linear model with Zellner's g prior, and assorted methods.