Essential Biodiversity Variables (EBV) are state variables with dimensions on time, space, and biological organization that document biodiversity change. Freely available ecosystem remote sensing products (ERSP) are downloaded and integrated with data for national or regional domains to derive indicators for EBV in the class ecosystem structure (Pereira et al., 2013) <doi:10.1126/science.1229931>, including horizontal ecosystem extents, fragmentation, and information-theory indices. To process ERSP, users must provide a polygon or geographic administrative data map. Downloadable ERSP include Global Surface Water (Peckel et al., 2016) <doi:10.1038/nature20584>, Forest Change (Hansen et al., 2013) <doi:10.1126/science.1244693>, and Continuous Tree Cover data (Sexton et al., 2013) <doi:10.1080/17538947.2013.786146>.
This package provides various functions for parameter estimation of one-dimensional stable distributions and their mixtures. It implements a diverse set of estimation methods, including quantile-based approaches, regression methods based on the empirical characteristic function (empirical, kernel, and recursive), and maximum likelihood estimation. For mixture models, it provides stochastic expectationâ maximization (SEM) algorithms and Bayesian estimation methods using sampling and importance sampling to overcome the long burn-in period of Markov Chain Monte Carlo (MCMC) strategies. The package also includes tools and statistical tests for analyzing whether a dataset follows a stable distribution. Some of the implemented methods are described in Hajjaji, O., Manou-Abi, S. M., and Slaoui, Y. (2024) <doi:10.1080/02664763.2024.2434627>.
Process GPS and accelerometry data to generate walk bouts. A walk bout is a period of activity with accelerometer movement matching the patterns of walking with corresponding GPS measurements that confirm travel. The inputs of the walkboutr package are individual-level accelerometry and GPS data. The outputs of the model are walk bouts with corresponding times, duration, and summary statistics on the sample population, which collapse all personally identifying information. These bouts can be used to measure walking both as an outcome of a change to the built environment or as a predictor of health outcomes such as a cardioprotective behavior. Kang B, Moudon AV, Hurvitz PM, Saelens BE (2017) <doi:10.1016/j.trd.2017.09.026>.
Computes and integrates daily potential evapotranspiration (PET) and a soil water balance model. It allows users to estimate and predict the wet season calendar, including onset, cessation, and duration, based on an agroclimatic approach for a specified period. This functionality helps in managing agricultural water resources more effectively. For detailed methodologies, users can refer to Allen et al. (1998, ISBN:92-5-104219-5); Allen (2005, ISBN:9780784408056); Doorenbos and Pruitt (1975, ISBN:9251002797); Guo et al. (2016) <doi:10.1016/j.envsoft.2015.12.019>; Hargreaves and Samani (1985) <doi:10.13031/2013.26773>; Priestley and Taylor (1972) <https://journals.ametsoc.org/view/journals/apme/18/7/1520-0450_1979_018_0898_tptema_2_0_co_2.xml>.
Fit (generalized) linear regression models in each leaf node of a tree. The tree is constructed using clinical variables only. The linear regression models are constructed using (high-dimensional) omics variables only. The leaf-node-specific regression models are estimated using the penalized likelihood including a standard ridge (L2) penalty and a fusion penalty that links the leaf-node-specific regression models to one another. The intercepts of the leaf nodes reflect the effects of the clinical variables and are left unpenalized. The tree, fitted with the clinical variables only, should be constructed outside of the package with the rpart R package. See Goedhart and others (2024) <doi:10.48550/arXiv.2411.02396> for details on the method.
Random forests are a statistical learning method widely used in many areas of scientific research essentially for its ability to learn complex relationships between input and output variables and also its capacity to handle high-dimensional data. However, current random forests approaches are not flexible enough to handle longitudinal data. In this package, we propose a general approach of random forests for high-dimensional longitudinal data. It includes a flexible stochastic model which allows the covariance structure to vary over time. Furthermore, we introduce a new method which takes intra-individual covariance into consideration to build random forests. The method is fully detailled in Capitaine et.al. (2020) <doi:10.1177/0962280220946080> Random forests for high-dimensional longitudinal data.
This package provides a collection of parametric and nonparametric methods for the analysis of survival data. Parametric families implemented include Gompertz-Makeham, exponential and generalized Pareto models and extended models. The package includes an implementation of the nonparametric maximum likelihood estimator for arbitrary truncation and censoring pattern based on Turnbull (1976) <doi:10.1111/j.2517-6161.1976.tb01597.x>, along with graphical goodness-of-fit diagnostics. Parametric models for positive random variables and peaks over threshold models based on extreme value theory are described in Rootzén and Zholud (2017) <doi:10.1007/s10687-017-0305-5>; Belzile et al. (2021) <doi:10.1098/rsos.202097> and Belzile et al. (2022) <doi:10.1146/annurev-statistics-040120-025426>.
This package provides visual citations containing the metadata of a scientific paper and a QR code. A visual citation is a banner containing title, authors, journal and year of a publication. This package can create such banners based on BibTeX and BibLaTeX references or call the reference metadata from Crossref'-API. The banners include a QR code pointing to the DOI'. The resulting HTML object or PNG image can be included in a presentation to point the audience to good resources for further reading. Styling is possible via predefined designs or via custom CSS'. This package is not intended as replacement for proper reference manager packages, but a tool to enrich scientific presentation slides and conference posters.
Sankey diagrams are a powerfull and visually attractive way to visualize the flow of conservative substances through a system. They typically consists of a network of nodes, and fluxes between them, where the total balance in each internal node is 0, i.e. input equals output. Sankey diagrams are typically used to display energy systems, material flow accounts etc. Unlike so-called alluvial plots, Sankey diagrams also allow for cyclic flows: flows originating from a single node can, either direct or indirect, contribute to the input of that same node. This package, named after the Greek aphorism Panta Rhei (everything flows), provides functions to create publication-quality diagrams, using data in tables (or spread sheets) and a simple syntax.
OSTA.data is a companion package for the "Orchestrating Spatial Transcriptomics Analysis" (OSTA) with Bioconductor online book. Throughout OSTA, we rely on a set of publicly available datasets that cover different sequencing- and imaging-based platforms, such as Visium, Visium HD, Xenium (10x Genomics) and CosMx (NanoString). In addition, we rely on scRNA-seq (Chromium) data for tasks, e.g., spot deconvolution and label transfer (i.e., supervised clustering). These data been deposited in an Open Storage Framework (OSF) repository, and can be queried and downloaded using functions from the osfr package. For convenience, we have implemented OSTA.data to query and retrieve data from our OSF node, and cache retrieved Zip archives using BiocFileCache'.
Extraction, preparation, visualisation and analysis of TERN AusPlots ecosystem monitoring data. Direct access to plot-based data on vegetation and soils across Australia, including physical sample barcode numbers. Simple function calls extract the data and merge them into species occurrence matrices for downstream analysis, or calculate things like basal area and fractional cover. TERN AusPlots is a national field plot-based ecosystem surveillance monitoring method and dataset for Australia. The data have been collected across a national network of plots and transects by the Terrestrial Ecosystem Research Network (TERN - <https://www.tern.org.au>), an Australian Government NCRIS-enabled project, and its Ecosystem Surveillance platform (<https://www.tern.org.au/tern-land-observatory/ecosystem-surveillance-and-environmental-monitoring/>).
This package provides a collection of functions for top-down exploratory data analysis of spectral data including nuclear magnetic resonance (NMR), infrared (IR), Raman, X-ray fluorescence (XRF) and other similar types of spectroscopy. Includes functions for plotting and inspecting spectra, peak alignment, hierarchical cluster analysis (HCA), principal components analysis (PCA) and model-based clustering. Robust methods appropriate for this type of high-dimensional data are available. ChemoSpec is designed for structured experiments, such as metabolomics investigations, where the samples fall into treatment and control groups. Graphical output is formatted consistently for publication quality plots. ChemoSpec is intended to be very user friendly and to help you get usable results quickly. A vignette covering typical operations is available.
"This package quantifies the provenance of sediments in a catchment or study area. Based on a characterization of the sediment sources and the end sediment mixtures, a mixing model algorithm is applied to the sediment mixtures to estimate the relative contribution of each potential source. The package includes several graphs to help users in their data understanding, such as box plots, correlation, PCA, and LDA graphs. In addition, new developments such as the Consensus Ranking (CR), Consistent Tracer Selection (CTS), and Linear Variability Propagation (LVP) methods are included to correctly apply the fingerprinting technique and increase dataset and model understanding. A new method based on Conservative Balance (CB) method has also been included to enable the use of isotopic tracers.".
Latent Class Analysis of phenotypic measurements in pedigrees and model selection based on one of two methods: likelihood-based cross-validation and Bayesian Information Criterion. Computation of individual and triplet child-parents weights in a pedigree is performed using an upward-downward algorithm. The model takes into account the familial dependence defined by the pedigree structure by considering that a class of a child depends on his parents classes via triplet-transition probabilities of the classes. The package handles the case where measurements are available on all subjects and the case where measurements are available only on symptomatic (i.e. affected) subjects. Distributions for discrete (or ordinal) and continuous data are currently implemented. The package can deal with missing data.
This package provides efficient methods to compute co-occurrence matrices, pointwise mutual information (PMI) and singular value decomposition (SVD). In the biomedical and clinical settings, one challenge is the huge size of databases, e.g. when analyzing data of millions of patients over tens of years. To address this, this package provides functions to efficiently compute monthly co-occurrence matrices, which is the computational bottleneck of the analysis, by using the RcppAlgos package and sparse matrices. Furthermore, the functions can be called on SQL databases, enabling the computation of co-occurrence matrices of tens of gigabytes of data, representing millions of patients over tens of years. Partly based on Hong C. (2021) <doi:10.1038/s41746-021-00519-z>.
This package provides a framework that joins topic modeling and sentiment analysis of textual data. The package implements a fast Gibbs sampling estimation of Latent Dirichlet Allocation (Griffiths and Steyvers (2004) <doi:10.1073/pnas.0307752101>) and Joint Sentiment/Topic Model (Lin, He, Everson and Ruger (2012) <doi:10.1109/TKDE.2011.48>). It offers a variety of helpers and visualizations to analyze the result of topic modeling. The framework also allows enriching topic models with dates and externally computed sentiment measures. A flexible aggregation scheme enables the creation of time series of sentiment or topical proportions from the enriched topic models. Moreover, a novel method jointly aggregates topic proportions and sentiment measures to derive time series of topical sentiment.
This package implements the network clustering algorithm described in Newman (2006) <doi:10.1103/PhysRevE.74.036104>. The complete iterative algorithm comprises of two steps. In the first step, the network is expressed in terms of its leading eigenvalue and eigenvector and recursively partition into two communities. Partitioning occurs if the maximum positive eigenvalue is greater than the tolerance (10e-5) for the current partition, and if it results in a positive contribution to the Modularity. Given an initial separation using the leading eigen step, rSpectral then continues to maximise for the change in Modularity using a fine-tuning step - or variate thereof. The first stage here is to find the node which, when moved from one community to another, gives the maximum change in Modularity. This nodeâ s community is then fixed and we repeat the process until all nodes have been moved. The whole process is repeated from this new state until the change in the Modularity, between the new and old state, is less than the predefined tolerance. A slight variant of the fine-tuning step, which can improve speed of the calculation, is also provided. Instead of moving each node into each community in turn, we only consider moves of neighbouring nodes, found in different communities, to the community of the current node of interest. The two steps process is repeatedly applied to each new community found, subdivided each community into two new communities, until we are unable to find any division that results in a positive change in Modularity.
The ORFhunteR package is a R and C++ library for an automatic determination and annotation of open reading frames (ORF) in a large set of RNA molecules. It efficiently implements the machine learning model based on vectorization of nucleotide sequences and the random forest classification algorithm. The ORFhunteR package consists of a set of functions written in the R language in conjunction with C++. The efficiency of the package was confirmed by the examples of the analysis of RNA molecules from the NCBI RefSeq and Ensembl databases. The package can be used in basic and applied biomedical research related to the study of the transcriptome of normal as well as altered (for example, cancer) human cells.
This package provides functionality for calculating pregnancy-related dates and tracking medications during pregnancy and fertility treatment. Calculates due dates from various starting points including last menstrual period and IVF (In Vitro Fertilisation) transfer dates, determines pregnancy progress on any given date, and identifies when specific pregnancy weeks are reached. Includes medication tracking capabilities for individuals undergoing fertility treatment or during pregnancy, allowing users to monitor remaining doses and quantities needed over specified time periods. Designed for those tracking their own pregnancies or supporting partners through the process, making use of options to personalise output messages. For details on due date calculations, see <https://www.acog.org/clinical/clinical-guidance/committee-opinion/articles/2017/05/methods-for-estimating-the-due-date>.
An implementation of the generalized power analysis for the local average treatment effect (LATE), proposed by Bansak (2020) <doi:10.1214/19-STS732>. Power analysis is in the context of estimating the LATE (also known as the complier average causal effect, or CACE), with calculations based on a test of the null hypothesis that the LATE equals 0 with a two-sided alternative. The method uses standardized effect sizes to place a conservative bound on the power under minimal assumptions. Package allows users to recover power, sample size requirements, or minimum detectable effect sizes. Package also allows users to work with absolute effects rather than effect sizes, to specify an additional assumption to narrow the bounds, and to incorporate covariate adjustment.
Sparse principal component analysis (SPCA) attempts to find sparse weight vectors (loadings), i.e., a weight vector with only a few active (nonzero) values. This approach provides better interpretability for the principal components in high-dimensional data settings. This is, because the principal components are formed as a linear combination of only a few of the original variables. This package provides efficient routines to compute SPCA. Specifically, a variable projection solver is used to compute the sparse solution. In addition, a fast randomized accelerated SPCA routine and a robust SPCA routine is provided. Robust SPCA allows to capture grossly corrupted entries in the data. The methods are discussed in detail by N. Benjamin Erichson et al. (2018) <arXiv:1804.00341>.
Estimates time varying regression effects under Cox type models in survival data using classification and regression tree. The codes in this package were originally written in S-Plus for the paper "Survival Analysis with Time-Varying Regression Effects Using a Tree-Based Approach," by Xu, R. and Adak, S. (2002) <doi:10.1111/j.0006-341X.2002.00305.x>, Biometrics, 58: 305-315. Development of this package was supported by NIH grants AG053983 and AG057707, and by the UCSD Altman Translational Research Institute, NIH grant UL1TR001442. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The example data are from the Honolulu Heart Program/Honolulu Asia Aging Study (HHP/HAAS).
This package provides multiple water chemistry-based models and published empirical models in one standard format. As many models have been included as possible, however, users should be aware that models have varying degrees of accuracy and applicability. To learn more, read the references provided below for the models implemented. Functions can be chained together to model a complete treatment process and are designed to work in a tidyverse workflow. Models are primarily based on these sources: Benjamin, M. M. (2002, ISBN:147862308X), Crittenden, J. C., Trussell, R., Hand, D., Howe, J. K., & Tchobanoglous, G., Borchardt, J. H. (2012, ISBN:9781118131473), USEPA. (2001) <https://www.epa.gov/sites/default/files/2017-03/documents/wtp_model_v._2.0_manual_508.pdf>.
This package provides an extension to ggplot2 (Wickham, 2016, <doi:10.1007/978-3-319-24277-4>) for creating two types of continuous confidence interval plots (Violin CI and Gradient CI plots), typically for the sample mean. These plots contain multiple user-defined confidence areas with varying colours, defined by the underlying t-distribution used to compute standard confidence intervals for the mean of the normal distribution when the variance is unknown. Two types of plots are available, a gradient plot with rectangular areas, and a violin plot where the shape (horizontal width) is defined by the probability density function of the t-distribution. These visualizations are studied in (Helske, Helske, Cooper, Ynnerman, and Besancon, 2021) <doi:10.1109/TVCG.2021.3073466>.