This package provides methods for the computation of surface/image texture indices using a geostatistics-based approach (Trevisani et al. (2023) <doi:10.1016/j.geomorph.2023.108838>). It provides various functions for the computation of surface texture indices (e.g., omnidirectional roughness and roughness anisotropy), including ones based on the robust MAD estimator. The kernels included in the software also permit calculating the surface/image texture indices directly from the input surface (i.e., without detrending) by using increments of order 2. It also provides the new radial roughness index (RRI), an improvement on the popular topographic roughness index (TRI). The framework can be easily extended with ad hoc surface/image texture indices.
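As a rough standalone illustration of the idea of robust, MAD-based local roughness (not the package's own geostatistical indices), one can take the median absolute deviation of elevations in a moving window with terra:

```r
library(terra)

# Crude roughness proxy: MAD of elevations in a 3x3 moving window.
# This sketches the general concept only; it is not an implementation
# of the package's indices (which use spatial increments).
r <- rast(system.file("ex/elev.tif", package = "terra"))
rough <- focal(r, w = 3, fun = mad, na.rm = TRUE)
plot(rough)
```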
A workflow is an object that can bundle together your pre-processing, modeling, and post-processing requests. For example, if you have a recipe and a parsnip model, these can be combined into a workflow. The advantages are:
- You don't have to keep track of separate objects in your workspace.
- The recipe prepping and model fitting can be executed using a single call to fit().
- If you have custom tuning parameter settings, these can be defined using a simpler interface when combined with tune.
- In the future, workflows will be able to add post-processing operations, such as modifying the probability cutoff for two-class models.
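For illustration, a minimal sketch of this pattern with the tidymodels packages (the recipe and model choices below are arbitrary):

```r
library(recipes)
library(parsnip)
library(workflows)

# Pre-processing recipe: normalize all numeric predictors
rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors())

# Model specification
mod <- linear_reg() |> set_engine("lm")

# Bundle both into a workflow; prepping and fitting happen in one call
wf <- workflow() |>
  add_recipe(rec) |>
  add_model(mod)

wf_fit <- fit(wf, data = mtcars)
```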
Survival analysis is employed to model the time it takes for events to occur. Survival models examine the relationship between survival time and one or more predictors, usually termed covariates in the survival-analysis literature. To this end, the Cox proportional hazards (Cox PH) model, introduced in a seminal paper by Cox (1972) <doi:10.1111/j.2517-6161.1972.tb00899.x>, is a broadly applicable and the most widely used method of survival analysis. This package can be used to estimate the effect of fixed and time-dependent covariates and to compute survival probabilities of lactation in dairy animals. It was developed using the algorithms of Klein and Moeschberger (2003) <doi:10.1007/b97377>.
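The package's own interface is not shown in this description; as a generic illustration of the underlying model, here is a Cox PH fit with the survival package (dataset and covariates are arbitrary):

```r
library(survival)

# Cox proportional hazards model on the built-in lung dataset
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(fit)

# Estimated survival curve implied by the fitted model
plot(survfit(fit), xlab = "time", ylab = "survival probability")
```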
Essential Biodiversity Variables (EBV) are state variables with dimensions of time, space, and biological organization that document biodiversity change. Freely available ecosystem remote sensing products (ERSP) are downloaded and integrated with data for national or regional domains to derive indicators for EBV in the class ecosystem structure (Pereira et al., 2013) <doi:10.1126/science.1229931>, including horizontal ecosystem extents, fragmentation, and information-theory indices. To process ERSP, users must provide a polygon or geographic administrative data map. Downloadable ERSP include Global Surface Water (Pekel et al., 2016) <doi:10.1038/nature20584>, Forest Change (Hansen et al., 2013) <doi:10.1126/science.1244693>, and Continuous Tree Cover data (Sexton et al., 2013) <doi:10.1080/17538947.2013.786146>.
Process GPS and accelerometry data to generate walk bouts. A walk bout is a period of activity whose accelerometer movement matches the patterns of walking and whose corresponding GPS measurements confirm travel. The inputs of the walkboutr package are individual-level accelerometry and GPS data. The outputs are walk bouts with corresponding times and durations, plus summary statistics on the sample population that collapse all personally identifying information. These bouts can be used to measure walking both as an outcome of changes to the built environment and as a predictor of health outcomes, e.g., as a cardioprotective behavior. See Kang B, Moudon AV, Hurvitz PM, Saelens BE (2017) <doi:10.1016/j.trd.2017.09.026>.
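A sketch of the intended call pattern; the function and column names below are assumptions drawn from the package's documented workflow and should be checked against the vignette:

```r
library(walkboutr)

# Assumed inputs (column layouts are assumptions, not verified API):
# gps_data:              time, latitude, longitude, speed
# accelerometry_counts:  time, activity_counts
bouts <- identify_walk_bouts_in_gps_and_accelerometry_data(
  gps_data, accelerometry_counts
)

# Population-level summary with identifying information collapsed
summary_stats <- summarize_walk_bouts(bouts)
```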
The NCBI Gene Expression Omnibus (GEO) represents the largest public repository of microarray data. However, finding data of interest can be challenging using current tools. GEOmetadb is an attempt to make access to the metadata associated with samples, platforms, and datasets much more feasible. This is accomplished by parsing all the NCBI GEO metadata into a SQLite database that can be stored and queried locally. GEOmetadb is simply a thin wrapper around the SQLite database along with associated documentation. The SQLite database is updated regularly as new data are added to GEO and can be downloaded at will for the most up-to-date metadata. GEOmetadb paper: <http://bioinformatics.oxfordjournals.org/cgi/content/short/24/23/2798>.
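A typical session (the SQL query is arbitrary):

```r
library(GEOmetadb)

# Download (or locate) the SQLite snapshot of the GEO metadata
sqlfile <- getSQLiteFile()

# Query it like any local SQLite database
con <- dbConnect(SQLite(), sqlfile)
dbGetQuery(con, "SELECT gse, title FROM gse LIMIT 5")
dbDisconnect(con)
```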
This R package supports the handling and analysis of imaging mass cytometry and other highly multiplexed imaging data. The main functionality includes reading in single-cell data after image segmentation and measurement, formatting the data to perform channel spillover correction, and a number of spatial analysis approaches. First, cell-cell interactions are detected via spatial graph construction; these graphs can be visualized with cells as nodes and interactions as edges. Furthermore, each cell's direct neighbours are summarized to allow spatial clustering. Per image/grouping level, interactions between types of cells are counted, averaged, and compared against random permutations. In that way, types of cells that interact more (attraction) or less (avoidance) frequently than expected by chance are detected.
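A sketch of this workflow, assuming imcRtools-style helpers (the exact names and arguments are assumptions to verify against the package documentation):

```r
library(imcRtools)

# spe: a SpatialExperiment with cell centroids and image IDs
# Build a k-nearest-neighbour spatial graph per image
spe <- buildSpatialGraph(spe, img_id = "sample_id",
                         type = "knn", k = 10)

# Count interactions between cell types and compare them
# against random permutations (attraction vs. avoidance)
out <- testInteractions(spe, group_by = "sample_id",
                        label = "celltype",
                        colPairName = "knn_interaction_graph")
```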
Computes daily potential evapotranspiration (PET) and integrates it with a soil water balance model. It allows users to estimate and predict the wet season calendar, including onset, cessation, and duration, based on an agroclimatic approach for a specified period. This functionality helps in managing agricultural water resources more effectively. For detailed methodologies, users can refer to Allen et al. (1998, ISBN:92-5-104219-5); Allen (2005, ISBN:9780784408056); Doorenbos and Pruitt (1975, ISBN:9251002797); Guo et al. (2016) <doi:10.1016/j.envsoft.2015.12.019>; Hargreaves and Samani (1985) <doi:10.13031/2013.26773>; Priestley and Taylor (1972) <https://journals.ametsoc.org/view/journals/apme/18/7/1520-0450_1979_018_0898_tptema_2_0_co_2.xml>.
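As a standalone numeric illustration (not the package's own interface), the Hargreaves and Samani (1985) reference evapotranspiration can be written as:

```r
# Hargreaves & Samani (1985):
#   PET = 0.0023 * Ra * (Tmean + 17.8) * sqrt(Tmax - Tmin)
# with Ra (extraterrestrial radiation) expressed as equivalent
# evaporation in mm/day
hargreaves_pet <- function(tmin, tmax, ra) {
  tmean <- (tmin + tmax) / 2
  0.0023 * ra * (tmean + 17.8) * sqrt(tmax - tmin)
}

hargreaves_pet(tmin = 18, tmax = 31, ra = 14.5)  # mm/day
```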
Computational topology, which enables topological data analysis (TDA), makes pervasive use of abstract mathematical objects called simplicial complexes; see Edelsbrunner and Harer (2010) <doi:10.1090/mbk/069>. Several R packages and other software libraries used through an R interface construct and use data structures that represent simplicial complexes, including mathematical graphs viewed as 1-dimensional complexes. This package provides coercers (converters) between these data structures. Currently supported structures are complete lists of simplices as used by TDA; the simplex trees of Boissonnat and Maria (2014) <doi:10.1007/s00453-014-9887-3> as implemented in simplextree and in Python GUDHI (by way of reticulate); and the graph classes of igraph and network, by way of the intergraph package.
This package provides a collection of parametric and nonparametric methods for the analysis of survival data. Parametric families implemented include Gompertz-Makeham, exponential, and generalized Pareto models, along with extended models. The package includes an implementation of the nonparametric maximum likelihood estimator for arbitrary truncation and censoring patterns based on Turnbull (1976) <doi:10.1111/j.2517-6161.1976.tb01597.x>, along with graphical goodness-of-fit diagnostics. Parametric models for positive random variables and peaks-over-threshold models based on extreme value theory are described in Rootzén and Zholud (2017) <doi:10.1007/s10687-017-0305-5>, Belzile et al. (2021) <doi:10.1098/rsos.202097>, and Belzile et al. (2022) <doi:10.1146/annurev-statistics-040120-025426>.
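As a standalone illustration of the parametric side (not the package's own API), the Gompertz-Makeham hazard combines an exponentially increasing term with a constant background:

```r
# Gompertz-Makeham hazard: h(t) = alpha * exp(beta * t) + lambda
gm_hazard <- function(t, alpha, beta, lambda) {
  alpha * exp(beta * t) + lambda
}

# Hazard rising exponentially with age over a constant background rate
curve(gm_hazard(x, alpha = 1e-4, beta = 0.09, lambda = 5e-4),
      from = 30, to = 100, xlab = "age", ylab = "hazard")
```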
Random forests are a statistical learning method widely used in many areas of scientific research, largely for their ability to learn complex relationships between input and output variables and to handle high-dimensional data. However, current random forest approaches are not flexible enough to handle longitudinal data. This package proposes a general approach of random forests for high-dimensional longitudinal data. It includes a flexible stochastic model which allows the covariance structure to vary over time. Furthermore, it introduces a new method which takes intra-individual covariance into consideration to build random forests. The method is fully detailed in Capitaine et al. (2020), "Random forests for high-dimensional longitudinal data" <doi:10.1177/0962280220946080>.
This package provides visual citations containing the metadata of a scientific paper and a QR code. A visual citation is a banner containing the title, authors, journal, and year of a publication. This package can create such banners based on BibTeX and BibLaTeX references, or fetch the reference metadata from the Crossref API. The banners include a QR code pointing to the DOI. The resulting HTML object or PNG image can be included in a presentation to point the audience to good resources for further reading. Styling is possible via predefined designs or via custom CSS. This package is not intended as a replacement for proper reference-manager packages, but as a tool to enrich scientific presentation slides and conference posters.
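This description matches the namedropR package; a sketch assuming its drop_name() entry point (argument names are assumptions to check against the help page):

```r
library(namedropR)

# Render a visual citation banner (PNG with QR code) from a BibTeX file;
# argument names are assumptions -- see ?drop_name for the actual signature
drop_name(bib = "references.bib",
          cite_key = "Smith2020",
          export_as = "png")
```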
Sankey diagrams are a powerful and visually attractive way to visualize the flow of conservative substances through a system. They typically consist of a network of nodes and fluxes between them, where the total balance in each internal node is 0, i.e., input equals output. Sankey diagrams are typically used to display energy systems, material flow accounts, etc. Unlike so-called alluvial plots, Sankey diagrams also allow for cyclic flows: flows originating from a single node can contribute, directly or indirectly, to the input of that same node. This package, named after the Greek aphorism Panta Rhei (everything flows), provides functions to create publication-quality diagrams using data in tables (or spreadsheets) and a simple syntax.
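A minimal sketch assuming the package's sankey() entry point takes node and flow tables (the column layouts below follow the vignette's conventions and are assumptions):

```r
library(PantaRhei)

# Nodes and the (conservative) flows between them
nodes <- data.frame(ID = c("A", "B", "C"),
                    x  = c(1, 2, 3),
                    y  = c(0, 0, 0))
flows <- data.frame(from     = c("A", "B"),
                    to       = c("B", "C"),
                    quantity = c(10, 6))

sankey(nodes, flows)
```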
Extraction, preparation, visualisation and analysis of TERN AusPlots ecosystem monitoring data. Direct access to plot-based data on vegetation and soils across Australia, including physical sample barcode numbers. Simple function calls extract the data and merge them into species occurrence matrices for downstream analysis, or calculate quantities such as basal area and fractional cover. TERN AusPlots is a national field plot-based ecosystem surveillance monitoring method and dataset for Australia. The data have been collected across a national network of plots and transects by the Terrestrial Ecosystem Research Network (TERN, <https://www.tern.org.au>), an Australian Government NCRIS-enabled project, and its Ecosystem Surveillance platform (<https://www.tern.org.au/tern-land-observatory/ecosystem-surveillance-and-environmental-monitoring/>).
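A sketch of the basic extraction workflow (arguments are assumptions; see ?get_ausplots for the available filters):

```r
library(ausplotsR)

# Download AusPlots plot-based data
ap <- get_ausplots()

# Merge point-intercept records into a species occurrence
# (presence/absence) matrix for downstream analysis
pa <- species_table(ap$veg.PI, m_kind = "PA")
```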
This package provides a collection of functions for top-down exploratory data analysis of spectral data, including nuclear magnetic resonance (NMR), infrared (IR), Raman, X-ray fluorescence (XRF) and other similar types of spectroscopy. It includes functions for plotting and inspecting spectra, peak alignment, hierarchical cluster analysis (HCA), principal components analysis (PCA) and model-based clustering. Robust methods appropriate for this type of high-dimensional data are available. ChemoSpec is designed for structured experiments, such as metabolomics investigations, where the samples fall into treatment and control groups. Graphical output is formatted consistently for publication-quality plots. ChemoSpec is intended to be very user friendly and to help you get usable results quickly. A vignette covering typical operations is available.
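A sketch of a typical session (argument details are assumptions; consult the vignette for current signatures):

```r
library(ChemoSpec)

# Import spectra, assigning groups from file names
spec <- files2SpectraObject(gr.crit = c("treatment", "control"),
                            fileExt = "\\.csv$")

sumSpectra(spec)           # quick structural summary
pca <- c_pcaSpectra(spec)  # classical PCA
plotScores(spec, pca)      # score plot coloured by group
```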
Latent Class Analysis of phenotypic measurements in pedigrees, with model selection based on one of two methods: likelihood-based cross-validation or the Bayesian Information Criterion. Computation of individual and triplet child-parents weights in a pedigree is performed using an upward-downward algorithm. The model takes into account the familial dependence defined by the pedigree structure by considering that a child's class depends on its parents' classes via triplet-transition probabilities of the classes. The package handles the case where measurements are available on all subjects and the case where measurements are available only on symptomatic (i.e., affected) subjects. Distributions for discrete (or ordinal) and continuous data are currently implemented. The package can deal with missing data.
This package provides efficient methods to compute co-occurrence matrices, pointwise mutual information (PMI) and singular value decomposition (SVD). In the biomedical and clinical settings, one challenge is the huge size of databases, e.g., when analyzing data of millions of patients over tens of years. To address this, this package provides functions to efficiently compute monthly co-occurrence matrices, which is the computational bottleneck of the analysis, by using the RcppAlgos package and sparse matrices. Furthermore, the functions can be called on SQL databases, enabling the computation of co-occurrence matrices from tens of gigabytes of data, representing millions of patients over tens of years. Partly based on Hong C. (2021) <doi:10.1038/s41746-021-00519-z>.
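As a standalone illustration of the PMI computation itself (not the package's optimized routines):

```r
# Pointwise mutual information from a symmetric co-occurrence matrix:
#   PMI(i, j) = log( p(i, j) / (p(i) * p(j)) )
pmi <- function(co) {
  p_ij <- co / sum(co)
  p_i  <- rowSums(co) / sum(co)
  p_j  <- colSums(co) / sum(co)
  log(p_ij / outer(p_i, p_j))
}

co <- matrix(c(10, 2, 2, 15), nrow = 2,
             dimnames = list(c("a", "b"), c("a", "b")))
pmi(co)
```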
This package provides a framework that joins topic modeling and sentiment analysis of textual data. The package implements fast Gibbs sampling estimation of Latent Dirichlet Allocation (Griffiths and Steyvers (2004) <doi:10.1073/pnas.0307752101>) and of the Joint Sentiment/Topic Model (Lin, He, Everson and Ruger (2012) <doi:10.1109/TKDE.2011.48>). It offers a variety of helpers and visualizations to analyze the results of topic modeling. The framework also allows enriching topic models with dates and externally computed sentiment measures. A flexible aggregation scheme enables the creation of time series of sentiment or topical proportions from the enriched topic models. Moreover, a novel method jointly aggregates topic proportions and sentiment measures to derive time series of topical sentiment.
Several classes for moment-based models are defined. The classes are defined for moment conditions derived from a single equation or a system of equations; the conditions can also be expressed as functions or formulas. Several methods are offered to facilitate the development of different estimation techniques. The methods currently provided are the Generalized Method of Moments (Hansen 1982 <doi:10.2307/1912775>), for single equations and systems of equations, and Generalized Empirical Likelihood (Smith 1997 <doi:10.1111/j.0013-0133.1997.174.x>; Kitamura 1997 <doi:10.1214/aos/1069362388>; Newey and Smith 2004 <doi:10.1111/j.1468-0262.2004.00482.x>; Anatolyev 2005 <doi:10.1111/j.1468-0262.2005.00601.x>).
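A generic base-R sketch of the GMM idea (not the package's own classes): estimating a mean and variance from two moment conditions with an identity weighting matrix:

```r
# Moment conditions: E[x - mu] = 0 and E[(x - mu)^2 - sigma2] = 0
set.seed(1)
x <- rnorm(500, mean = 2, sd = 1.5)

gbar <- function(theta) {
  c(mean(x - theta[1]),
    mean((x - theta[1])^2 - theta[2]))
}

# GMM criterion Q(theta) = gbar' W gbar with W = identity
Q <- function(theta) sum(gbar(theta)^2)
optim(c(0, 1), Q)$par  # approximately (2, 1.5^2)
```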
An implementation of the generalized power analysis for the local average treatment effect (LATE) proposed by Bansak (2020) <doi:10.1214/19-STS732>. Power analysis is in the context of estimating the LATE (also known as the complier average causal effect, or CACE), with calculations based on a test of the null hypothesis that the LATE equals 0 against a two-sided alternative. The method uses standardized effect sizes to place a conservative bound on the power under minimal assumptions. The package allows users to recover power, sample size requirements, or minimum detectable effect sizes. It also allows users to work with absolute effects rather than effect sizes, to specify an additional assumption to narrow the bounds, and to incorporate covariate adjustment.
Sparse principal component analysis (SPCA) attempts to find sparse weight vectors (loadings), i.e., weight vectors with only a few active (nonzero) values. This approach provides better interpretability for the principal components in high-dimensional data settings, because the principal components are formed as a linear combination of only a few of the original variables. This package provides efficient routines to compute SPCA. Specifically, a variable projection solver is used to compute the sparse solution. In addition, a fast randomized accelerated SPCA routine and a robust SPCA routine are provided. Robust SPCA allows capturing grossly corrupted entries in the data. The methods are discussed in detail by N. Benjamin Erichson et al. (2018) <arXiv:1804.00341>.
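This description corresponds to the sparsepca package; a sketch assuming its spca()/rspca() routines (tuning values are arbitrary and signatures should be verified):

```r
library(sparsepca)

X <- scale(as.matrix(USArrests))

# Sparse PCA via the variable projection solver;
# alpha controls the sparsity-inducing penalty
out <- spca(X, k = 2, alpha = 1e-3)
out$loadings   # sparse loadings

# Randomized accelerated variant for larger matrices
out_r <- rspca(X, k = 2, alpha = 1e-3)
```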
Estimates time-varying regression effects under Cox-type models in survival data using classification and regression trees. The code in this package was originally written in S-Plus for the paper "Survival Analysis with Time-Varying Regression Effects Using a Tree-Based Approach" by Xu, R. and Adak, S. (2002) <doi:10.1111/j.0006-341X.2002.00305.x>, Biometrics, 58: 305-315. Development of this package was supported by NIH grants AG053983 and AG057707, and by the UCSD Altman Translational Research Institute, NIH grant UL1TR001442. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The example data are from the Honolulu Heart Program/Honolulu Asia Aging Study (HHP/HAAS).
The ORFhunteR package is an R and C++ package for the automatic determination and annotation of open reading frames (ORFs) in a large set of RNA molecules. It efficiently implements a machine learning model based on the vectorization of nucleotide sequences and the random forest classification algorithm. The ORFhunteR package consists of a set of functions written in the R language in conjunction with C++. The efficiency of the package was confirmed by example analyses of RNA molecules from the NCBI RefSeq and Ensembl databases. The package can be used in basic and applied biomedical research related to the study of the transcriptome of normal as well as altered (for example, cancer) human cells.
This package implements the network clustering algorithm described in Newman (2006) <doi:10.1103/PhysRevE.74.036104>. The complete iterative algorithm comprises two steps. In the first step, the network is expressed in terms of its leading eigenvalue and eigenvector and recursively partitioned into two communities. Partitioning occurs if the maximum positive eigenvalue is greater than the tolerance (10e-5) for the current partition, and if it results in a positive contribution to the modularity. Given an initial separation using the leading-eigenvector step, rSpectral then continues to maximise the change in modularity using a fine-tuning step, or a variant thereof. The first stage here is to find the node which, when moved from one community to another, gives the maximum change in modularity. This node's community is then fixed, and the process is repeated until all nodes have been moved. The whole process is repeated from this new state until the change in modularity between the new and old states is less than the predefined tolerance. A slight variant of the fine-tuning step, which can improve the speed of the calculation, is also provided: instead of moving each node into each community in turn, only moves of neighbouring nodes, found in different communities, to the community of the current node of interest are considered. The two-step process is repeatedly applied to each new community found, subdividing each community into two new communities, until no division results in a positive change in modularity.
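The description above does not name rSpectral's exported functions; as a reference illustration of the same Newman (2006) leading-eigenvector approach, igraph's implementation can be called as follows:

```r
library(igraph)

# Zachary's karate club network, a standard community-detection example
g <- make_graph("Zachary")

# Newman's leading-eigenvector method (igraph's implementation)
com <- cluster_leading_eigen(g)
membership(com)   # community assignment per node
modularity(com)   # modularity of the partition
```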