Método simples e eficiente de geolocalizar dados no Brasil. O pacote é baseado em conjuntos de dados espaciais abertos de endereços brasileiros, utilizando como fonte principal o Cadastro Nacional de Endereços para Fins Estatà sticos (CNEFE). O CNEFE é publicado pelo Instituto Brasileiro de Geografia e Estatà stica (IBGE), órgão oficial de estatà sticas e geografia do Brasil. (A simple and efficient method for geolocating data in Brazil. The package is based on open spatial datasets of Brazilian addresses, primarily using the Cadastro Nacional de Endereços para Fins Estatà sticos (CNEFE), published by the Instituto Brasileiro de Geografia e Estatà stica (IBGE), Brazil's official statistics and geography agency.).
Raman and (FT)IR spectral analysis tool for plastic particles and other environmental samples (Cowger et al. 2021, <doi:10.1021/acs.analchem.1c00123>). With read_any(), Open Specy provides a single function for reading individual, batch, or map spectral data files like .asp, .csv, .jdx, .spc, .spa, .0, and .zip. process_spec() simplifies processing spectra, including smoothing, baseline correction, range restriction and flattening, intensity conversions, wavenumber alignment, and min-max normalization. Spectra can be identified in batch using an onboard reference library (Cowger et al. 2020, <doi:10.1177/0003702820929064>) using match_spec(). A Shiny app is available via run_app() or online at <https://www.openanalysis.org/openspecy/>.
Launch an application by a simple click without opening R or RStudio. The package has 3 functions of which only one is essential in its use, `shiny.exe()`. It generates a script in the open shiny project then create a shortcut in the same folder that allows you to launch the app by clicking.If you set `host = public'`, the application will be launched on the public server to which you are connected. Thus, all other devices connected to the same server will be able to access the application through the link of your `IPv4` extended by the port. You can stop the application by leaving the terminal opened by the shortcut.
This package provides elastic net penalized maximum likelihood estimator for structural equation models (SEM). The package implements `lasso` and `elastic net` (l1/l2) penalized SEM and estimates the model parameters with an efficient block coordinate ascent algorithm that maximizes the penalized likelihood of the SEM. Hyperparameters are inferred from cross-validation (CV). A Stability Selection (STS) function is also available to provide accurate causal effect selection. The software achieves high accuracy performance through a `Network Generative Pre-trained Transformer` (Network GPT) Framework with two steps: 1) pre-trains the model to generate a complete (fully connected) graph; and 2) uses the complete graph as the initial state to fit the `elastic net` penalized SEM.
Systematic 3D interaction calls and differential analysis for Hi-C and HiChIP. The HiC-DC+ (Hi-C/HiChIP direct caller plus) package enables principled statistical analysis of Hi-C and HiChIP data sets – including calling significant interactions within a single experiment and performing differential analysis between conditions given replicate experiments – to facilitate global integrative studies. HiC-DC+ estimates significant interactions in a Hi-C or HiChIP experiment directly from the raw contact matrix for each chromosome up to a specified genomic distance, binned by uniform genomic intervals or restriction enzyme fragments, by training a background model to account for random polymer ligation and systematic sources of read count variation.
Utility functions to download and process data produced by the ALARM Project, including 2020 redistricting files Kenny and McCartan (2021) <https://alarm-redist.org/posts/2021-08-10-census-2020/> and the 50-State Redistricting Simulations of McCartan, Kenny, Simko, Garcia, Wang, Wu, Kuriwaki, and Imai (2022) <doi:10.7910/DVN/SLCD3E>. The package extends the data introduced in McCartan, Kenny, Simko, Garcia, Wang, Wu, Kuriwaki, and Imai (2022) <doi:10.1038/s41597-022-01808-2> to also include states with only a single district. The package also includes the Japanese 2022 redistricting files from the 47-Prefecture Redistricting Simulations of Miyazaki, Yamada, Yatsuhashi, and Imai (2022) <doi:10.7910/DVN/Z9UKSH>.
Enable users to evaluate long-term trends using a Generalized Additive Modeling (GAM) approach. The model development includes selecting a GAM structure to describe nonlinear seasonally-varying changes over time, incorporation of hydrologic variability via either a river flow or salinity, the use of an intervention to deal with method or laboratory changes suspected to impact data values, and representation of left- and interval-censored data. The approach has been applied to water quality data in the Chesapeake Bay, a major estuary on the east coast of the United States to provide insights to a range of management- and research-focused questions. Methodology described in Murphy (2019) <doi:10.1016/j.envsoft.2019.03.027>.
Fits Bayesian dose-response model-based network meta-analysis (MBNMA) that incorporate multiple doses within an agent by modelling different dose-response functions, as described by Mawdsley et al. (2016) <doi:10.1002/psp4.12091>. By modelling dose-response relationships this can connect networks of evidence that might otherwise be disconnected, and can improve precision on treatment estimates. Several common dose-response functions are provided; others may be added by the user. Various characteristics and assumptions can be flexibly added to the models, such as shared class effects. The consistency of direct and indirect evidence in the network can be assessed using unrelated mean effects models and/or by node-splitting at the treatment level.
This package provides functions used to fit and test the phenology of species based on counts. Based on Girondot, M. (2010) <doi:10.3354/esr00292> for the phenology function, Girondot, M. (2017) <doi:10.1016/j.ecolind.2017.05.063> for the convolution of negative binomial, Girondot, M. and Rizzo, A. (2015) <doi:10.2993/etbi-35-02-337-353.1> for Bayesian estimate, Pfaller JB, ..., Girondot M (2019) <doi:10.1007/s00227-019-3545-x> for tag-loss estimate, Hancock J, ..., Girondot M (2019) <doi:10.1016/j.ecolmodel.2019.04.013> for nesting history, Laloe J-O, ..., Girondot M, Hays GC (2020) <doi:10.1007/s00227-020-03686-x> for aggregating several seasons.
A workflow is an object that can bundle together your pre-processing, modeling, and post-processing requests. For example, if you have a recipe and parsnip model, these can be combined into a workflow. The advantages are:
You don’t have to keep track of separate objects in your workspace.
The recipe prepping and model fitting can be executed using a single call to
fit().If you have custom tuning parameter settings, these can be defined using a simpler interface when combined with
tune.In the future, workflows will be able to add post-processing operations, such as modifying the probability cutoff for two-class models.
In the single cell World, which includes flow cytometry, mass cytometry, single-cell RNA-seq (scRNA-seq), and others, there is a need to improve data visualisation and to bring analysis capabilities to researchers even from non-technical backgrounds. scDataviz attempts to fit into this space, while also catering for advanced users. Additonally, due to the way that scDataviz is designed, which is based on SingleCellExperiment, it has a plug and play feel, and immediately lends itself as flexibile and compatibile with studies that go beyond scDataviz. Finally, the graphics in scDataviz are generated via the ggplot engine, which means that users can add on features to these with ease.
Dual Scaling, developed by Professor Shizuhiko Nishisato (1994, ISBN: 0-9691785-3-6), is a fundamental technique in multivariate analysis used for data scaling and correspondence analysis. Its utility lies in its ability to represent multidimensional data in a lower-dimensional space, making it easier to visualize and understand underlying patterns in complex data. This technique has been implemented to handle various types of data, including Contingency and Frequency data (CF), Multiple-Choice data (MC), Sorting data (SO), Paired-Comparison data (PC), and Rank-Order data (RO), providing users with a powerful tool to explore relationships between variables and observations in various fields, from sociology to ecology, enabling deeper and more efficient analysis of multivariate datasets.
This package implements two estimations related to the foundations of info metrics applied to ecological inference. These methodologies assess the lack of disaggregated data and provide an approach to obtaining disaggregated territorial-level data. For more details, see the following references: Fernández-Vázquez, E., Dà az-Dapena, A., Rubiera-Morollón, F. et al. (2020) "Spatial Disaggregation of Social Indicators: An Info-Metrics Approach." <doi:10.1007/s11205-020-02455-z>. Dà az-Dapena, A., Fernández-Vázquez, E., Rubiera-Morollón, F., & Vinuela, A. (2021) "Mapping poverty at the local level in Europe: A consistent spatial disaggregation of the AROPE indicator for France, Spain, Portugal and the United Kingdom." <doi:10.1111/rsp3.12379>.
Package for training interpretable machine learning models. Historically, the most interpretable machine learning models were not very accurate, and the most accurate models were not very interpretable. Microsoft Research has developed an algorithm called the Explainable Boosting Machine (EBM) which has both high accuracy and interpretable characteristics. EBM uses machine learning techniques like bagging and boosting to breathe new life into traditional GAMs (Generalized Additive Models). This makes them as accurate as random forests and gradient boosted trees, and also enhances their intelligibility and editability. Details on the EBM algorithm can be found in the paper by Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad (2015, <doi:10.1145/2783258.2788613>).
The Piece-wise exponential (Additive Mixed) Model (PAMM; Bender and others (2018) <doi: 10.1177/1471082X17748083>) is a powerful model class for the analysis of survival (or time-to-event) data, based on Generalized Additive (Mixed) Models (GA(M)Ms). It offers intuitive specification and robust estimation of complex survival models with stratified baseline hazards, random effects, time-varying effects, time-dependent covariates and cumulative effects (Bender and others (2019)), as well as support for left-truncated data as well as competing risks, recurrent events and multi-state settings. pammtools provides tidy workflow for survival analysis with PAMMs, including data simulation, transformation and other functions for data preprocessing and model post-processing as well as visualization.
Automatic generation and selection of spatial predictors for Random Forest models fitted to spatially structured data. Spatial predictors are constructed from a distance matrix among training samples using Moran's Eigenvector Maps (MEMs; Dray, Legendre, and Peres-Neto 2006 <DOI:10.1016/j.ecolmodel.2006.02.015>) or the RFsp approach (Hengl et al. <DOI:10.7717/peerj.5518>). These predictors are used alongside user-supplied explanatory variables in Random Forest models. The package provides functions for model fitting, multicollinearity reduction, interaction identification, hyperparameter tuning, evaluation via spatial cross-validation, and result visualization using partial dependence and interaction plots. Model fitting relies on the ranger package (Wright and Ziegler 2017 <DOI:10.18637/jss.v077.i01>).
Enables off-the-shelf functionality for fully Bayesian, nonstationary Gaussian process modeling. The approach to nonstationary modeling involves a closed-form, convolution-based covariance function with spatially-varying parameters; these parameter processes can be specified either deterministically (using covariates or basis functions) or stochastically (using approximate Gaussian processes). Stationary Gaussian processes are a special case of our methodology, and we furthermore implement approximate Gaussian process inference to account for very large spatial data sets (Finley, et al (2017) <doi:10.48550/arXiv.1702.00434>). Bayesian inference is carried out using Markov chain Monte Carlo methods via the "nimble" package, and posterior prediction for the Gaussian process at unobserved locations is provided as a post-processing step.
This package provides a suite of Bayesian MI-LASSO for variable selection methods for multiply-imputed datasets. The package includes four Bayesian MI-LASSO models using shrinkage (Multi-Laplace, Horseshoe, ARD) and Spike-and-Slab (Spike-and-Laplace) priors, along with tools for model fitting via MCMC, four-step projection predictive variable selection, and hyperparameter calibration. Methods are suitable for both continuous and binary covariates under missing-at-random or missing-completely-at-random assumptions. See Zou, J., Wang, S. and Chen, Q. (2025), Bayesian MI-LASSO for Variable Selection on Multiply-Imputed Data. ArXiv, 2211.00114. <doi:10.48550/arXiv.2211.00114> for more details. We also provide the frequentist`s MI-LASSO function.
This package provides a suite of functions for performing analyses, based on a multiverse approach, for conditioning data. Specifically, given the appropriate data, the functions are able to perform t-tests, analyses of variance, and mixed models for the provided data and return summary statistics and plots. The function is also able to return for all those tests p-values, confidence intervals, and Bayes factors. The methods are described in Lonsdorf, Gerlicher, Klingelhofer-Jens, & Krypotos (2022) <doi:10.1016/j.brat.2022.104072>. Since November 2025, this package contains code from the ez R package (Copyright (c) 2016-11-01, Michael A. Lawrence <mike.lwrnc@gmail.com>), originally distributed under the GPL (equal and above 2) license.
This package provides a toolbox to facilitate the calculation of political system indicators for researchers. This package offers a variety of basic indicators related to electoral systems, party systems, elections, and parliamentary studies, as well as others. Main references are: Loosemore and Hanby (1971) <doi:10.1017/S000712340000925X>; Gallagher (1991) <doi:10.1016/0261-3794(91)90004-C>; Laakso and Taagepera (1979) <doi:10.1177/001041407901200101>; Rae (1968) <doi:10.1177/001041406800100305>; HirschmaÅ (1945) <ISBN:0-520-04082-1>; Kesselman (1966) <doi:10.2307/1953769>; Jones and Mainwaring (2003) <doi:10.1177/13540688030092002>; Rice (1925) <doi:10.2307/2142407>; Pedersen (1979) <doi:10.1111/j.1475-6765.1979.tb01267.x>; SANTOS (2002) <ISBN:85-225-0395-8>.
Flexible stochastic tree ensemble software. Robust implementations of Bayesian Additive Regression Trees (BART) (Chipman, George, McCulloch (2010) <doi:10.1214/09-AOAS285>) for supervised learning and Bayesian Causal Forests (BCF) (Hahn, Murray, Carvalho (2020) <doi:10.1214/19-BA1195>) for causal inference. Enables model serialization and parallel sampling and provides a low-level interface for custom stochastic forest samplers. Includes the grow-from-root algorithm for accelerated forest sampling (He and Hahn (2021) <doi:10.1080/01621459.2021.1942012>), a log-linear leaf model for forest-based heteroskedasticity (Murray (2020) <doi:10.1080/01621459.2020.1813587>), and the cloglog BART model of Alam and Linero (2025) <doi:10.48550/arXiv.2502.00606> for ordinal outcomes.
The NCBI Gene Expression Omnibus (GEO) represents the largest public repository of microarray data. However, finding data of interest can be challenging using current tools. GEOmetadb is an attempt to make access to the metadata associated with samples, platforms, and datasets much more feasible. This is accomplished by parsing all the NCBI GEO metadata into a SQLite database that can be stored and queried locally. GEOmetadb is simply a thin wrapper around the SQLite database along with associated documentation. Finally, the SQLite database is updated regularly as new data is added to GEO and can be downloaded at will for the most up-to-date metadata. GEOmetadb paper: http://bioinformatics.oxfordjournals.org/cgi/content/short/24/23/2798 .
This R package supports the handling and analysis of imaging mass cytometry and other highly multiplexed imaging data. The main functionality includes reading in single-cell data after image segmentation and measurement, data formatting to perform channel spillover correction and a number of spatial analysis approaches. First, cell-cell interactions are detected via spatial graph construction; these graphs can be visualized with cells representing nodes and interactions representing edges. Furthermore, per cell, its direct neighbours are summarized to allow spatial clustering. Per image/grouping level, interactions between types of cells are counted, averaged and compared against random permutations. In that way, types of cells that interact more (attraction) or less (avoidance) frequently than expected by chance are detected.
Survival analysis is employed to model the time it takes for events to occur. Survival model examines the relationship between survival and one or more predictors, usually termed covariates in the survival-analysis literature. To this end, Cox-proportional (Cox-PH) hazard rate model introduced in a seminal paper by Cox (1972) <doi:10.1111/j.2517-6161.1972.tb00899.x>, is a broadly applicable and the most widely used method of survival analysis. This package can be used to estimate the effect of fixed and time-dependent covariates and also to compute the survival probabilities of the lactation of dairy animal. This package has been developed using algorithm of Klein and Moeschberger (2003) <doi:10.1007/b97377>.