This package provides tools for econometric analysis and economic modelling with the traditional two-input Constant Elasticity of Substitution (CES) function and with nested CES functions with three and four inputs. The econometric estimation can be done using the Kmenta approximation or non-linear least squares with various gradient-based or global optimisation algorithms. Some of these algorithms can constrain the parameters to certain ranges, e.g. to economically meaningful values. Furthermore, the non-linear least-squares estimation can be combined with a grid search for the rho parameter(s). The estimation methods are described in Henningsen et al. (2021) <doi:10.4337/9781788976480.00030>.
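For example, a minimal sketch of an estimation call, assuming the package's cesEst() interface and a hypothetical data frame dat with output y and inputs x1 and x2:

    library(micEconCES)
    # non-linear least squares (the package's default optimisation method)
    fit <- cesEst("y", c("x1", "x2"), data = dat)
    summary(fit)
    # Kmenta approximation instead of non-linear least squares
    fitK <- cesEst("y", c("x1", "x2"), data = dat, method = "Kmenta")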
Utilizing a combination of machine learning models (Random Forest, Naive Bayes, K-Nearest Neighbor, Support Vector Machines, Extreme Gradient Boosting, and Linear Discriminant Analysis) and a deep Artificial Neural Network model, MBMethPred can predict medulloblastoma subgroups (wingless (WNT), sonic hedgehog (SHH), Group 3, and Group 4) from DNA methylation beta values. See Sharif Rahmani E, Lawarde A, Lingasamy P, Moreno SV, Salumets A and Modhukur V (2023), MBMethPred: a computational framework for the accurate classification of childhood medulloblastoma subgroups using data integration and AI-based approaches. Front. Genet. 14:1233657. <doi:10.3389/fgene.2023.1233657> for more details.
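As an illustration, a hedged sketch of the one-model-per-function interface; the function name and argument names below follow our reading of the package documentation and should be treated as assumptions:

    library(MBMethPred)
    # train a random forest on the bundled methylation data with 10-fold CV;
    # NewData would be a matrix of beta values for new samples (assumed interface)
    rf <- RandomForestModel(SplitRatio = 0.8, CV = 10, NCores = 1, NewData = NULL)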
Stochastic blockmodeling of one-mode and linked networks as presented in Škulj and Žiberna (2022) <doi:10.1016/j.socnet.2022.02.001>. The optimization is done via the CEM (Classification Expectation Maximization) algorithm, which can be initialized with random partitions or with the results of the k-means algorithm. The development of this package is financially supported by the Slovenian Research Agency (<https://www.arrs.si/>) within the research program P5-0168 and the research projects J7-8279 (Blockmodeling multilevel and temporal networks) and J5-2557 (Comparison and evaluation of different approaches to blockmodeling dynamic networks by simulations with application to Slovenian co-authorship networks).
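A hedged sketch of a typical call; M is a hypothetical one-mode network matrix, and the argument names for the number of blocks and random restarts are assumptions based on the description:

    library(StochBlock)
    # fit a stochastic blockmodel with 2 blocks, restarting the CEM
    # algorithm from 10 random partitions (k/rep names assumed)
    res <- stochBlock(M, k = 2, rep = 10)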
Sequential Monte Carlo (SMC) algorithms for fitting a generalised additive mixed model (GAMM) to surface-enhanced resonance Raman spectroscopy (SERRS) spectra, using the method of Moores et al. (2016) <arXiv:1604.07299>. Multivariate observations of SERRS are highly collinear and lend themselves to a reduced-rank representation. The GAMM separates the SERRS signal into three components: a sequence of Lorentzian, Gaussian, or pseudo-Voigt peaks; a smoothly-varying baseline; and additive white noise. The parameters of each component of the model are estimated iteratively using SMC. The posterior distributions of the parameters given the observed spectra are represented as a population of weighted particles.
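For instance, a minimal sketch assuming the package's fitSpectraSMC() interface, with hypothetical inputs (a vector of wavenumbers, a matrix of observed spectra, initial peak locations, and a list of prior hyperparameters; the argument order is an assumption):

    library(serrsBayes)
    # fit the GAMM by sequential Monte Carlo
    result <- fitSpectraSMC(wavenumbers, spectra, peaks, priors)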
This package implements the SparseStep model for solving regression problems with a sparsity constraint on the parameters. The SparseStep regression model was proposed in Van den Burg, Groenen, and Alfons (2017) <arXiv:1701.06967>. In the model, a regularization term that approximates the counting norm of the parameters is added to the regression problem. By iteratively improving the approximation, a sparse solution to the regression problem can be obtained. The package implements both the standard SparseStep algorithm and a path algorithm that uses golden section search to determine solutions for different values of the regularization parameter.
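A minimal sketch, assuming the package's sparsestep() and path.sparsestep() interfaces with default settings:

    library(sparsestep)
    X <- as.matrix(mtcars[, -1])
    y <- mtcars$mpg
    fit  <- sparsestep(X, y)        # single fit at a given regularization level
    path <- path.sparsestep(X, y)   # golden-section path over the regularization parameter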
This package implements automated binning of numeric variables and factors with respect to a dichotomous target variable. Two approaches are provided: an implementation of fine and coarse classing that merges granular classes and levels step by step, and a tree-like approach that iteratively segments the initial bins via binary splits. Both procedures merge or split bins based on similar weight of evidence (WOE) values and stop via an information value (IV) based criterion. The package can be used with single variables or an entire data frame. It provides flexible tools for exploring different binning solutions and for deploying them to (new) data.
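For example, binning all variables of the bundled germancredit data against its dichotomous target, a sketch following the package's documented workflow:

    library(woeBinning)
    data(germancredit)
    # fine/coarse classing of every predictor against the target variable
    binning <- woe.binning(germancredit, 'creditability', germancredit)
    woe.binning.plot(binning)
    # apply the binning solution to (new) data
    germancredit.binned <- woe.binning.deploy(germancredit, binning)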
Pulls together a collection of datasets from Miguel de Carvalho's research articles, including, for example: - de Carvalho (2012) <doi:10.1016/j.jspi.2011.08.016>; - de Carvalho et al (2012) <doi:10.1080/03610926.2012.709905>; - de Carvalho et al (2012) <doi:10.1016/j.econlet.2011.09.007>; - de Carvalho and Davison (2014) <doi:10.1080/01621459.2013.872651>; - de Carvalho and Rua (2017) <doi:10.1016/j.ijforecast.2015.09.004>; - de Carvalho et al (2023) <doi:10.1002/sta4.560>; - de Carvalho et al (2022) <doi:10.1007/s13253-021-00469-9>; - Palacios et al (2024) <doi:10.1214/24-BA1420>.
This package provides two functions that generate source code implementing the predict function of fitted glm objects. In this version, code can be generated for either C or Java. The idea is to provide a tool for the easy and fast deployment of glm predictive models into production. The source code generated by this package implements two functions/methods: one implements the equivalent of predict(type="response"), while the other implements predict(type="link"). Source code is written to disk as a .c or .java file in the specified path. In the case of C, an .h file is also generated.
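A short sketch; glm2c() and glm2java() are the package's two generators, though the exact argument names here are assumptions:

    library(glm.deploy)
    fit <- glm(am ~ mpg + wt, data = mtcars, family = binomial)
    # emit the .c/.h pair, or the .java class, into a target directory
    glm2c(fit, path = tempdir())
    glm2java(fit, path = tempdir())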
Utilities for reading data from the Human Mortality Database (<https://www.mortality.org>), Human Fertility Database (<https://www.humanfertility.org>), and similar databases from the web or locally into an R session as data.frame objects. These are the two most widely used sources of demographic data for studying basic demographic change and trends and for developing new demographic methods. Other supported databases at this time include the Human Fertility Collection (<https://www.fertilitydata.org>), The Japanese Mortality Database (<https://www.ipss.go.jp/p-toukei/JMD/index-en.html>), and the Canadian Human Mortality Database (<http://www.bdlc.umontreal.ca/chmd/>). Arguments and data are standardized.
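For example, downloading Swedish male period life tables from the HMD (a free account at mortality.org is required; the credentials below are placeholders):

    library(HMDHFDplus)
    mlt <- readHMDweb(CNTRY = "SWE", item = "mltper_1x1",
                      username = "user@example.com", password = "********")
    head(mlt)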
Simulates categorical maps on actual geographical realms, starting from either empty landscapes or landscapes provided by the user (e.g. land use maps). It allows users to tweak or create landscapes while retaining a high degree of control over their features, without the hassle of specifying each location attribute. In this it differs from other tools that generate null or neutral landscapes in a theoretical space. The basic algorithm currently implemented uses a simple agent-style/cellular-automaton growth model, with no rules (apart from areas of exclusion) and a von Neumann neighbourhood (four cells, a.k.a. the Rook case). Outputs are raster datasets exportable to any common GIS format.
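A minimal sketch of growing one class on an empty landscape, assuming the package's makeClass() interface and raster objects from the raster package:

    library(landscapeR)
    library(raster)
    # empty 33 x 33 landscape of zeros
    r <- raster(matrix(0, 33, 33), xmn = 0, xmx = 10, ymn = 0, ymx = 10)
    # grow 5 patches of roughly 10 cells each
    r2 <- makeClass(r, npatch = 5, size = 10)
    plot(r2)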
There are three distinct approaches for phase error correction: a single linear model with a choice of optimization functions, multiple linear models with a choice of optimization functions, and a shrinkage-based method. The methodology is based on our new algorithms and various references (Binczyk et al. (2015) <doi:10.1186/1475-925X-14-S2-S5>, Chen et al. (2002) <doi:10.1016/S1090-7807(02)00069-1>, de Brouwer (2009) <doi:10.1016/j.jmr.2009.09.017>, Džakula (2000) <doi:10.1006/jmre.2000.2123>, Ernst (1969) <doi:10.1016/0022-2364(69)90003-1>, Liland et al. (2010) <doi:10.1366/000370210792434350>).
cytoKernel implements a kernel-based score test to identify differentially expressed features in high-dimensional biological experiments. This approach can be applied across many different types of high-dimensional biological data, including gene expression data and dimensionally reduced cytometry-based marker expression data. In this R package, we implement functions that compute the feature-wise p values and their corresponding adjusted p values. Additionally, it computes the feature-wise shrunken effect sizes. Further, it calculates the percent of differentially expressed features and plots a user-friendly heatmap with the top differentially expressed features on the rows and samples on the columns.
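A hedged sketch of the main entry point; the argument names follow our reading of the package documentation and should be treated as assumptions (expr is a hypothetical features-by-samples matrix, group a two-level factor of sample conditions):

    library(cytoKernel)
    # kernel-based score test for differential expression
    res <- CytoK(object = expr, group_factor = group)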
Bayes Watch fits an array of Gaussian Graphical Mixture Models to groupings of homogeneous data in time, called regimes, which are modeled as the observed states of a Markov process with unknown transition probabilities. In doing so, Bayes Watch defines a posterior distribution on a vector of regime assignments, which yields meaningful expressions for the probability of every possible change-point. Bayes Watch also allows for an effective and efficient fault detection system that assesses which features in the data were most responsible for a given change-point. For further details, see: Alexander C. Murph et al. (2023) <doi:10.48550/arXiv.2310.02940>.
The main function calculates confidence intervals (CIs) for mixed models, utilizing both classical estimators from the lmer() function in the lme4 package and robust estimators from the rlmer() function in the robustlmm package, as well as the varComprob() function in the robustvarComp package. Three methods are available: the classical Wald method, the wild bootstrap, and the parametric bootstrap. Bootstrap methods offer flexibility in obtaining lower and upper bounds through percentile or BCa methods. More details are given in Mason, F., Cantoni, E., & Ghisletta, P. (2021) <doi:10.5964/meth.6607> and Mason, F., Cantoni, E., & Ghisletta, P. (2024) <doi:10.1037/met0000643>.
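As an illustration only: fitting a classical mixed model with lme4 and then calling the package's CI routine. The wrapper name ci_mixed() below is hypothetical, since the package's actual function name and signature are not given here:

    library(lme4)
    fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
    # hypothetical call: wild-bootstrap CIs with BCa bounds
    ci <- ci_mixed(fit, method = "wild", bounds = "BCa", nsim = 1000)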
Lake morphometry metrics are used by limnologists to understand, among other things, the ecological processes in a lake. Traditionally, these metrics are calculated by hand, with planimeters, and increasingly with commercial GIS products. All of these methods work; however, they are either outdated, difficult to reproduce, or require expensive licenses to use. The lakemorpho package provides the tools to calculate a typical suite of these metrics from an input elevation model and lake polygon. The metrics currently supported are: fetch, major axis, minor axis, major/minor axis ratio, maximum length, maximum width, mean width, maximum depth, mean depth, shoreline development, shoreline length, surface area, and volume.
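For example, a sketch following the package's documented workflow with its bundled example data:

    library(lakemorpho)
    data(lakes)   # exampleLake polygon and exampleElev elevation raster
    inputLM <- lakeSurroundTopo(exampleLake, exampleElev)
    # all supported metrics at once; fetch bearing and point density as inputs
    calcLakeMetrics(inputLM, bearing = 45, pointDens = 250)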
This package provides a computationally efficient solution for generating optimal experimental designs in Accelerated Life Testing (ALT). Leveraging a Particle Swarm Optimization (PSO)-based hybrid algorithm, the package identifies optimal test plans that minimize estimation variance under specified failure models and stress profiles. For more details, see Lee et al. (2025), Optimal Robust Strategies for Accelerated Life Tests and Fatigue Testing of Polymer Composite Materials, submitted to Annals of Applied Statistics, <https://imstat.org/journals-and-publications/annals-of-applied-statistics/annals-of-applied-statistics-next-issues/>, and Hoang (2025), Model-Robust Minimax Design of Accelerated Life Tests via PSO-based Hybrid Algorithm, Master Thesis, Unpublished.
Numerically solve and plot solutions of a parametric ordinary differential equation model of growth, death, and respiration of macroinvertebrate and algae taxa dependent on pre-defined environmental factors. The model (version 1.0) is introduced in Schuwirth, N. and Reichert, P., (2013) <DOI:10.1890/12-0591.1>. This package includes model extensions and the core functions introduced and used in Schuwirth, N. et al. (2016) <DOI:10.1111/1365-2435.12605>, Kattwinkel, M. et al. (2016) <DOI:10.1021/acs.est.5b04068>, Mondy, C. P., and Schuwirth, N. (2017) <DOI:10.1002/eap.1530>, and Paillex, A. et al. (2017) <DOI:10.1111/fwb.12927>.
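A short sketch using the package's toy example model; the helper name streambugs.example.model.toy() follows the package vignette, but treat the details as assumptions:

    library(streambugs)
    m <- streambugs.example.model.toy()
    # integrate the ODE system, using the compiled (C) solver
    res <- run.streambugs(y.names = m$y.names, times = m$times,
                          par = m$par, inp = m$inp, C = TRUE)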
This package provides utility functions that enhance the parallel package and support the built-in parallel backends of the future package. For example, availableCores gives the number of CPU cores available to your R process as given by R options and environment variables, including those set by job schedulers on high-performance compute clusters. If none is set, it will fall back to parallel::detectCores. Another example is makeClusterPSOCK, which is backward compatible with parallel::makePSOCKcluster while doing a better job in setting up remote cluster workers without the need for configuring the firewall to do port-forwarding to your local computer.
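For example:

    library(parallelly)
    # respects HPC scheduler limits, cgroups, R options, and env variables
    ncores <- availableCores()
    cl <- makeClusterPSOCK(ncores)
    parallel::parLapply(cl, 1:100, sqrt)
    parallel::stopCluster(cl)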
Create quick and easy dot-and-whisker plots of regression results. It takes as input either (1) a coefficient table in standard form or (2) one fitted model object, or a list of them, of any type that has methods implemented in the parameters package. It returns ggplot objects that can be further customized using tools from the ggplot2 package. The package also includes helper functions for tasks such as rescaling coefficients or relabeling predictor variables. For more methodological discussion of the visualization and data management methods used in this package, see Kastellec and Leoni (2007) <doi:10.1017/S1537592707072209> and Gelman (2008) <doi:10.1002/sim.3107>.
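For example, comparing coefficients across two fitted models:

    library(dotwhisker)
    m1 <- lm(mpg ~ wt + cyl + disp, data = mtcars)
    m2 <- lm(mpg ~ wt + cyl, data = mtcars)
    # returns a ggplot object that can be customized further
    dwplot(list(m1, m2)) + ggplot2::theme_minimal()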
Flow data are of growing interest in diverse fields including trade, migration, knowledge diffusion, disease spread, and transportation. The package provides an effective visual aid for learning the pattern of flow, called the halfcircle diagram. The flow between two nodes placed on the center line of a circle is represented by a half circle drawn from the origin to the destination in a clockwise direction. By changing the order of nodes, the halfcircle diagram enables users to examine the complex relationship between bidirectional flows and potential determinants. Furthermore, the halfmeancenter function, which calculates the (un)weighted mean center of half circles, makes comparisons easier.
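A hedged sketch; the argument structure below (a flow table plus a node attribute table) is an assumption based on the description:

    library(halfcircle)
    # flow: data frame of origin, destination, value; node: node attributes
    halfcircle(flow, node)
    halfmeancenter(flow, node)   # (un)weighted mean centers of half circles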
This package provides a dataframe validation framework for package builders who use dataframes as function parameters. It performs checks on column names, coerces data types, and checks grouping to make sure user inputs conform to a specification provided by the package author. It provides a mechanism for package authors to automatically document supported dataframe inputs and selectively dispatch to functions depending on the format of a dataframe, much like S3 does for classes. It also contains developer tools that make working with and documenting dataframe specifications easier, helping package developers improve their documentation and simplifying parameter validation where dataframes are used as function parameters.
Neighbourhood functions are key components of local-search algorithms such as Simulated Annealing or Threshold Accepting. These functions take a solution and return a slightly modified copy of it, i.e. a neighbour. The package provides a function neighbourfun() that constructs such neighbourhood functions, based on parameters such as admissible ranges for elements in a solution. Supported are numeric and logical solutions. The algorithms were originally created for portfolio-optimisation applications, but can be used for other models as well. Several recipes for neighbour computations are taken from "Numerical Methods and Optimization in Finance" by M. Gilli, D. Maringer and E. Schumann (2019, ISBN:978-0128150658).
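For example, a neighbourhood for a numeric solution whose elements must stay within [0, 0.2]; in the default setup the sum of the solution is held constant, which suits budget-constrained portfolio weights:

    library(neighbours)
    nb <- neighbourfun(min = 0, max = 0.2, stepsize = 0.005)
    x  <- rep(0.1, 10)   # a feasible solution
    x1 <- nb(x)          # a random neighbour of x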
This package implements a partial linear semiparametric mixed-effects model (PLSMM) featuring a random intercept and applies a lasso penalty to both the fixed effects and the coefficients associated with the nonlinear function. The model also accommodates interactions between the nonlinear function and a grouping variable, allowing for the capture of group-specific nonlinearities. Nonlinear functions are modeled using a set of basis functions. Estimation is conducted using a penalized Expectation-Maximization algorithm, and the package offers flexibility in choosing between various information criteria for model selection. Post-selection inference is carried out using a debiasing method, while inference on the nonlinear functions employs a bootstrap approach.
This package implements the approach described in Fong and Grimmer (2016) <https://aclweb.org/anthology/P/P16/P16-1151.pdf> for automatically discovering latent treatments from a corpus and estimating the average marginal component effect (AMCE) of each treatment. The data are divided into a training set and a test set. The supervised Indian Buffet Process (sibp) is used to discover latent treatments in the training set. The fitted model is then applied to the test set to infer the values of the latent treatments in the test set. Finally, Y is regressed on the latent treatments in the test set to estimate the causal effect of each treatment.
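A hedged sketch of the workflow; the hyperparameter names follow our reading of the package documentation and should be treated as assumptions (X is a hypothetical document-term matrix, Y the outcome, train.ind the training-set row indices):

    library(texteffect)
    fit  <- sibp(X, Y, K = 2, alpha = 4, sigmasq.n = 0.8, train.ind = train.ind)
    amce <- sibp_amce(fit, X, Y)   # AMCE estimates from the test set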