This package provides statistical methods for interval-censored (grouped) data. It supports the estimation of linear and linear mixed regression models with interval-censored dependent variables. Parameter estimates are obtained by a stochastic expectation maximization algorithm. Furthermore, the package enables the direct (without covariates) estimation of statistical indicators from interval-censored data via an iterative kernel density algorithm. Survey and Organisation for Economic Co-operation and Development (OECD) weights can be included in the direct estimation (see Walter, P. (2019) <doi:10.17169/refubium-1621>).
The fmcsR package introduces an efficient maximum common substructure (MCS) algorithm combined with a novel matching strategy that allows for atom and/or bond mismatches in the substructures shared between two small molecules. The resulting flexible MCSs (FMCSs) are often larger than strict MCSs, resulting in the identification of more common features in their source structures, as well as a higher sensitivity in finding compounds with weak structural similarities. The fmcsR package provides several utilities to use the FMCS algorithm for pairwise compound comparisons, structure similarity searching and clustering.
We propose a general ensemble classification framework, the RaSE algorithm, for the sparse classification problem. In the RaSE algorithm, for each weak learner, some random subspaces are generated and the optimal one is chosen to train the model on the basis of some criterion. To adapt to the problem, a novel criterion, the ratio information criterion (RIC), is proposed based on the Kullback-Leibler divergence. Besides minimizing RIC, multiple other criteria can be applied, for instance, minimizing the extended Bayesian information criterion (eBIC), the training error, the validation error, the cross-validation error, or the leave-one-out error. There are various choices of base classifier, for instance, linear discriminant analysis, quadratic discriminant analysis, k-nearest neighbour, logistic regression, decision trees, random forest, and support vector machines. The RaSE algorithm can also be applied to feature ranking, providing the importance of each feature based on its selected percentage in multiple subspaces. The RaSE framework can be extended to a general prediction framework, including both classification and regression. The selected percentages of variables can be used for variable screening. The latest version added the variable screening function for both regression and classification problems.
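The feature ranking step described above can be illustrated with a minimal Python sketch. It assumes the selected subspaces are already available as lists of feature indices (the name `selected_subspaces` and the example data are hypothetical) and only computes the selected percentage per feature. It does not implement the subspace search or any RaSE criterion, and it is not the package's interface.

```python
from collections import Counter


def selected_percentages(selected_subspaces, n_features):
    """Share of selected subspaces in which each feature appears."""
    counts = Counter()
    for subspace in selected_subspaces:
        counts.update(set(subspace))  # count each feature once per subspace
    total = len(selected_subspaces)
    return [counts[j] / total for j in range(n_features)]


# Hypothetical subspaces chosen by four weak learners over five features.
subspaces = [[0, 2], [0, 1, 2], [2, 4], [0, 2]]
percentages = selected_percentages(subspaces, n_features=5)

# Rank features from most to least frequently selected.
ranking = sorted(range(5), key=lambda j: percentages[j], reverse=True)
```

Features with a high selected percentage would be kept in a screening step; the cut-off to use is a modelling choice not fixed by this sketch.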
This package provides a Bayesian hybrid approach for inferring Directed Acyclic Graphs (DAGs) for continuous, discrete, and mixed data. The algorithm can use the graph inferred by another more efficient graph inference method as input; the input graph may contain false edges or undirected edges but can help reduce the search space to a more manageable size. A Bayesian Markov chain Monte Carlo algorithm is then used to infer the probability of direction and absence for the edges in the network. References: Martin and Fu (2019) <arXiv:1909.10678>.
The delta measure of agreement was originally proposed by Martín & Femia (2004) <DOI:10.1348/000711004849268>. Since then it has been considered as an agreement measure in different fields, since its behavior is usually better than that of the usual kappa index by Cohen (1960) <DOI:10.1177/001316446002000104>. The main issue with delta is that, contrary to kappa, it cannot be computed by hand. The current algorithm is based on Version 5 of the Delta Windows program, which can be found at <https://www.ugr.es/~bioest/software/delta/cmd.php?seccion=downloads>.
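For comparison, the kappa index mentioned above is simple enough to compute directly. The following Python sketch computes Cohen's kappa from a square agreement table; the function name and the example table are illustrative assumptions, and the delta measure itself (which requires an iterative fit) is not shown.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table (rows: rater A, columns: rater B)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical 2x2 table: 35 of 50 cases on the diagonal.
table = [[20, 5], [10, 15]]
kappa = cohen_kappa(table)
```

Here the observed agreement is 0.7 and the chance agreement is 0.5, so kappa is 0.4.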
This package implements the method of Hofmeyr, D.P. (2021) <DOI:10.1109/TPAMI.2019.2930501> for fast evaluation of univariate kernel smoothers based on recursive computations. Applications to the basic problems of density and regression function estimation are provided, as well as some projection pursuit methods for which the objective is based on non-parametric functionals of the projected density, or conditional density of a response given projected covariates. The package is accompanied by an instructive paper in the Journal of Statistical Software <doi:10.18637/jss.v101.i03>.
Estimates parameters in Mixture Transition Distribution (MTD) models, a class of high-order Markov chains. The set of relevant pasts (lags) is selected using either the Bayesian Information Criterion or the Forward Stepwise and Cut algorithms. Other model parameters (e.g. transition probabilities and oscillations) can be estimated via maximum likelihood estimation or the Expectation-Maximization algorithm. Additionally, hdMTD includes a perfect sampling algorithm that generates samples of an MTD model from its invariant distribution. For theory, see Ost & Takahashi (2023) <http://jmlr.org/papers/v24/22-0266.html>.
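To make the model class concrete, here is a minimal Python sketch of an MTD transition distribution in a common textbook form, where the next-state distribution is a weighted mixture of one transition matrix per relevant lag. All names and numbers are hypothetical, the sketch assumes no additional independent component, and it is not the hdMTD interface.

```python
def mtd_transition(past, weights, matrices):
    """Distribution of the next state given the states at the relevant lags.

    past[j]       : state observed at the j-th relevant lag
    weights[j]    : mixture weight of that lag (weights sum to 1)
    matrices[j][s]: row of transition probabilities out of state s for that lag
    """
    n_states = len(matrices[0][0])
    probs = [0.0] * n_states
    for state, weight, matrix in zip(past, weights, matrices):
        for a in range(n_states):
            probs[a] += weight * matrix[state][a]
    return probs


# Hypothetical two-state model with two relevant lags.
weights = [0.7, 0.3]
matrices = [
    [[0.9, 0.1], [0.2, 0.8]],  # transition matrix attached to the first lag
    [[0.5, 0.5], [0.4, 0.6]],  # transition matrix attached to the second lag
]
probs = mtd_transition(past=[0, 1], weights=weights, matrices=matrices)
```

Because each lag contributes its own small matrix, the number of parameters grows linearly with the number of lags rather than exponentially, which is what makes high-order chains tractable in this model class.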
Volume prediction is one of the challenging tasks in forestry research. This package is a comprehensive toolset designed for the fitting and validation of various linear and nonlinear allometric equations (Linear, Log-Linear, Inverse, Quadratic, Cubic, Compound, Power and Exponential) used in the prediction of conifer tree volume. This package is particularly useful for forestry professionals, researchers, and resource managers engaged in assessing and estimating the volume of coniferous trees. This package has been developed using the algorithm of Sharma et al. (2017) <doi:10.13140/RG.2.2.33786.62407>.
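As an illustration of one of the listed equation types, the sketch below fits a power model V = a * D^b by ordinary least squares on the log-log scale. This is a generic approach written in Python under stated assumptions: the function name, the use of diameter as the only predictor, and the synthetic data are all hypothetical, and the package may fit its models differently.

```python
import math


def fit_power_model(diameters, volumes):
    """Fit V = a * D**b by least squares on log V = log a + b * log D."""
    xs = [math.log(d) for d in diameters]
    ys = [math.log(v) for v in volumes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                          # slope on the log-log scale
    a = math.exp(mean_y - b * mean_x)      # back-transformed intercept
    return a, b


# Synthetic data generated from V = 0.0001 * D**2.5 (not real measurements).
diameters = [10.0, 20.0, 30.0, 40.0]
volumes = [0.0001 * d ** 2.5 for d in diameters]
a, b = fit_power_model(diameters, volumes)
```

On noise-free data the fit recovers the generating coefficients; with real measurements the back-transformed intercept is biased and is often corrected, which this sketch does not do.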
Facilitates access to the International Union for Conservation of Nature (IUCN) Red List of Threatened Species, a comprehensive global inventory of species at risk of extinction. This package streamlines the process of determining conservation status by matching species names with Red List data, providing tools to easily query and retrieve conservation statuses. Designed to support biodiversity research and conservation planning, this package relies on data from the iucnrdata package, available on GitHub <https://github.com/PaulESantos/iucnrdata>. To install the data package, use pak::pak('PaulESantos/iucnrdata').
An option is one of the financial derivatives, and its pricing is an important problem in practice. Stock price processes are represented as geometric Brownian motion [Black (1973) <doi:10.1086/260062>] or jump diffusion processes [Kou (2002) <doi:10.1287/mnsc.48.8.1086.166>]. In this package, algorithms and visualizations are implemented by the Monte Carlo method to calculate European option prices for three equations: geometric Brownian motion, jump diffusion processes, and a model in which jumps among companies affect each other.
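The simplest of these cases, a European call under geometric Brownian motion, can be sketched in a few lines of Python. The function names and parameter values below are illustrative assumptions, not the package's interface; the Black-Scholes closed form is included only as a sanity check for the simulated price, and the jump diffusion models are not covered.

```python
import math
import random


def mc_european_call(s0, strike, rate, sigma, maturity, n_paths, seed=0):
    """Monte Carlo price of a European call under geometric Brownian motion."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * sigma ** 2) * maturity
    vol = sigma * math.sqrt(maturity)
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp(drift + vol * z)   # terminal stock price
        total += max(s_t - strike, 0.0)        # call payoff
    return math.exp(-rate * maturity) * total / n_paths


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s0, strike, rate, sigma, maturity):
    """Closed-form price, used here only to check the simulation."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    return s0 * normal_cdf(d1) - strike * math.exp(-rate * maturity) * normal_cdf(d2)


mc_price = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=100_000)
bs_price = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
```

Only the terminal price is simulated because the payoff of a European option does not depend on the path; models with jumps would need an extra jump term in the exponent.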
Latent group structures are a common challenge in panel data analysis. Disregarding group-level heterogeneity can introduce bias. Conversely, estimating individual coefficients for each cross-sectional unit is inefficient and may lead to high uncertainty. This package addresses the issue of unobservable group structures by implementing the pairwise adaptive group fused Lasso (PAGFL) by Mehrabani (2023) <doi:10.1016/j.jeconom.2022.12.002>. PAGFL identifies latent group structures and group-specific coefficients in a single step. On top of that, we extend the PAGFL to time-varying coefficient functions.
Handling of behavioural data from the Ethoscope platform (Geissmann, Garcia Rodriguez, Beckwith, French, Jamasb and Gilestro (2017) <DOI:10.1371/journal.pbio.2003026>). Ethoscopes (<https://giorgiogilestro.notion.site/Ethoscope-User-Manual-a9739373ae9f4840aa45b277f2f0e3a7>) are an open source/open hardware framework made of interconnected Raspberry Pis (<https://www.raspberrypi.org>) designed to quantify the behaviour of multiple small animals in a distributed and real-time fashion. The default tracking algorithm records primary variables such as xy coordinates, dimensions and speed. This package is part of the rethomics framework <https://rethomics.github.io/>.
Convenient tools for exchanging files securely from within R. By encrypting the content, safe passage of files (shipment) can be provided by common but insecure carriers such as ftp and email. Because the approach is based on asymmetric cryptography, no management of shared secrets is needed to make a secure shipment as long as authentic public keys are available. Public keys used for secure shipments may also be obtained from external providers as part of the overall process. Transportation of files will require that relevant services such as ftp and email servers are available.
Comprehensive analysis and forecasting of univariate time series using automatic time series models of many kinds. Harvey AC (1989) <doi:10.1017/CBO9781107049994>. Pedregal DJ and Young PC (2002) <doi:10.1002/9780470996430>. Durbin J and Koopman SJ (2012) <doi:10.1093/acprof:oso/9780199641178.001.0001>. Hyndman RJ, Koehler AB, Ord JK, and Snyder RD (2008) <doi:10.1007/978-3-540-71918-2>. Gómez V, Maravall A (2000) <doi:10.1002/9781118032978>. Pedregal DJ, Trapero JR and Holgado E (2024) <doi:10.1016/j.ijforecast.2023.09.004>.
This package provides a compendium of new geometries, coordinate systems, statistical transformations, scales and fonts for ggplot2, including splines, 1d and 2d densities, univariate average shifted histograms, a new map coordinate system based on the PROJ.4-library along with geom_cartogram() that mimics the original functionality of geom_map(), formatters for "bytes", a stat_stepribbon() function, increased plotly compatibility and the StateFace open source font from ProPublica. Further new functionality includes lollipop charts, dumbbell charts, the ability to encircle points and coordinate-system-based text annotations.
This package works as a prelude replacement for Haskell, providing more functionality and types out of the box than the standard prelude (such as common data types like ByteString and Text), as well as removing common "gotchas", like partial functions and lazy I/O. The guiding principle here is:
If something is safe to use in general and has no expected naming conflicts, expose it.
If something should not always be used, or has naming conflicts, expose it from another module in the hierarchy.
Bayesian approaches for analyzing multivariate data in ecology. Estimation is performed using Markov Chain Monte Carlo (MCMC) methods via JAGS. Three types of models may be fitted: 1) With explanatory variables only, boral fits independent column Generalized Linear Models (GLMs) to each column of the response matrix; 2) With latent variables only, boral fits a purely latent variable model for model-based unconstrained ordination; 3) With explanatory and latent variables, boral fits correlated column GLMs with latent variables to account for any residual correlation between the columns of the response matrix.
P-values and no/lowest observed (adverse) effect concentration values derived from the closure principle computational approach test (Lehmann, R. et al. (2015) <doi:10.1007/s00477-015-1079-4>) are provided. The package contains functions to generate intersection hypotheses according to the closure principle (Bretz, F., Hothorn, T., Westfall, P. (2010) <doi:10.1201/9781420010909>), an implementation of the computational approach test (Ching-Hui, C., Nabendu, P., Jyh-Jiuan, L. (2010) <doi:10.1080/03610918.2010.508860>) and the combination of both, that is, the closure principle computational approach test.
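The closure principle itself can be sketched independently of the local test. The Python sketch below enumerates all intersection hypotheses for a set of elementary hypotheses and applies the closed-testing decision rule; the function names and the local p-values are hypothetical placeholders (in the package, local p-values would come from the computational approach test, which is not implemented here).

```python
from itertools import combinations


def intersection_hypotheses(elementary):
    """All non-empty intersections of the elementary hypotheses, largest first."""
    result = []
    for size in range(len(elementary), 0, -1):
        result.extend(combinations(elementary, size))
    return result


def closure_rejects(target, local_p_values, alpha=0.05):
    """Reject `target` only if every intersection containing it is rejected locally."""
    return all(
        p <= alpha
        for subset, p in local_p_values.items()
        if target in subset
    )


hypotheses = intersection_hypotheses(["H1", "H2", "H3"])

# Hypothetical local p-values, one per intersection hypothesis.
local_p = {
    ("H1", "H2", "H3"): 0.01,
    ("H1", "H2"): 0.02,
    ("H1", "H3"): 0.03,
    ("H2", "H3"): 0.20,
    ("H1",): 0.01,
    ("H2",): 0.04,
    ("H3",): 0.30,
}
reject_h1 = closure_rejects("H1", local_p)
reject_h2 = closure_rejects("H2", local_p)
```

In this example H2 has a small elementary p-value but is not rejected, because the intersection of H2 and H3 is not rejected at the 0.05 level.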
This package provides a collection of acceleration schemes for proximal gradient methods for estimating penalized regression parameters described in Goldstein, Studer, and Baraniuk (2016) <arXiv:1411.3406>. Schemes such as the Fast Iterative Shrinkage and Thresholding Algorithm (FISTA) by Beck and Teboulle (2009) <doi:10.1137/080716542> and the adaptive stepsize rule introduced in Wright, Nowak, and Figueiredo (2009) <doi:10.1109/TSP.2009.2016892> are included. You provide the objective function and proximal mappings, and the package takes care of issues like stepsize selection, acceleration, and stopping conditions for you.
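A bare-bones version of FISTA shows the division of labour described above: the caller supplies the gradient of the smooth part and the proximal mapping of the penalty. This Python sketch uses a fixed step size and a fixed iteration count, so it omits the adaptive stepsize and stopping rules the package provides; all names and the toy problem are assumptions made for illustration.

```python
import math


def soft_threshold(x, t):
    """Proximal mapping of t * ||x||_1."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in x]


def fista(grad_f, prox_g, x0, step, n_iter=500):
    """Minimise f(x) + g(x) with fixed step size (step <= 1 / Lipschitz constant)."""
    x = list(x0)
    y = list(x0)
    t = 1.0
    for _ in range(n_iter):
        g = grad_f(y)
        # Gradient step on f followed by the proximal step on g.
        x_new = prox_g([yi - step * gi for yi, gi in zip(y, g)], step)
        # Momentum update of Beck and Teboulle.
        t_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x


# Toy problem: f(x) = 0.5 * sum(d_i * (x_i - b_i)**2), g(x) = lam * ||x||_1.
d = [1.0, 4.0, 2.0]
b = [3.0, -2.0, 0.1]
lam = 1.0

solution = fista(
    grad_f=lambda x: [di * (xi - bi) for di, xi, bi in zip(d, x, b)],
    prox_g=lambda v, step: soft_threshold(v, step * lam),
    x0=[0.0, 0.0, 0.0],
    step=1.0 / max(d),
)
```

The toy problem is separable, so the exact minimiser is known coordinate-wise as the soft-thresholding of `b[i]` at `lam / d[i]`, which gives `[2.0, -1.75, 0.0]` and makes the result easy to check.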
An ensemble of algorithms that enable the clustering of networks and data matrices (such as counts, categorical or continuous) with different types of generative models. Model selection and clustering are performed in combination by optimizing the Integrated Classification Likelihood (which is equivalent to minimizing the description length). Several models are available such as: Stochastic Block Model, degree corrected Stochastic Block Model, Mixtures of Multinomial, Latent Block Model. The optimization is performed thanks to a combination of greedy local search and a genetic algorithm (see <arXiv:2002.11577> for more details).
This is a stochastic framework that combines biochemical reaction networks with extended Kalman filter and Rauch-Tung-Striebel smoothing. This framework allows investigating the dynamics of cell differentiation from high-dimensional clonal tracking data subject to measurement noise, false negative errors, and systematically unobserved cell types. Our tool can provide statistical support to biologists in gene therapy clonal tracking studies for a deeper understanding of clonal reconstitution dynamics. Further details on the methods can be found in L. Del Core et al. (2022) <doi:10.1101/2022.07.08.499353>.
Data are partitioned (clustered) into k clusters "around medoids", which is a more robust version of K-means implemented in the function pam() in the cluster package. The PAM algorithm is described in Kaufman and Rousseeuw (1990) <doi:10.1002/9780470316801>. Please refer to the pam() function documentation for more references. Clustered data is plotted as a split heatmap allowing visualisation of representative "group-clusters" (medoids) in the data as separated fractions of the graph while those "sub-clusters" are visualised as a traditional heatmap based on hierarchical clustering.
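The idea of partitioning around medoids can be illustrated with a simplified Python sketch. It starts from the first k points and repeatedly accepts any medoid swap that lowers the total distance, which captures the swap idea but is not the full BUILD and SWAP procedure of Kaufman and Rousseeuw, and it is not how pam() is implemented. All names and data are hypothetical.

```python
def total_cost(points, medoids, dist):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def k_medoids(points, k, dist):
    """Simplified swap search: accept any improving swap until none remains."""
    medoids = list(range(k))  # naive start: the first k points
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in range(len(points)):
                if candidate in medoids:
                    continue
                # Try replacing the i-th medoid with a non-medoid point.
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    # Assign each point to the cluster of its nearest medoid.
    labels = [
        min(range(k), key=lambda c: dist(p, points[medoids[c]]))
        for p in points
    ]
    return medoids, labels, best


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
medoids, labels, cost = k_medoids(points, k=2, dist=lambda a, b: abs(a - b))
```

Unlike K-means centroids, the medoids are always actual data points (here the values 2.0 and 11.0), which is what allows them to be shown as representative rows in a heatmap.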
Conduct dsep tests (piecewise SEM) of a directed, or mixed, acyclic graph without latent variables (but possibly with implicitly marginalized or conditioned latent variables that create dependent errors) based on linear, generalized linear, or additive models with or without a nesting structure for the data. Also included are functions to do dsep tests step-by-step, exploratory path analysis, and Monte Carlo X2 probabilities. This package accompanies Shipley, B. (2026). Cause and Correlation in Biology: A User's Guide to Path Analysis, Structural Equations and Causal Inference (3rd edition). Cambridge University Press.
Facilitate the management of data from knowledge resources that are frequently used alone or together in research environments. In TKCat, knowledge resources are manipulated as modeled database (MDB) objects. These objects provide access to the data tables along with a general description of the resource and a detailed data model documenting the tables, their fields and their relationships. These MDBs are then gathered in catalogs that can be easily explored and shared. Finally, TKCat provides tools to easily subset, filter and combine MDBs and create new catalogs suited for specific needs.