Routines for model-based cluster analysis of functional data with optional covariates. The idea is to cluster functional subjects (often called functional objects) into homogeneous groups by using spline smoothers (for the functional data) together with scalar covariates. The spline coefficients and the covariates are modelled as a multivariate Gaussian mixture, where the number of mixture components corresponds to the number of clusters. The model parameters are estimated by maximizing the observed mixture likelihood via an EM algorithm (Arnqvist and Sjöstedt de Luna, 2019) <doi:10.48550/arXiv.1904.10265>. The clustering method is used to analyze annual lake sediment data from Lake Kassjön (northern Sweden), which cover more than 6400 years and can be seen as historical records of weather and climate.
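A conceptual sketch of the modelling idea (not the package's own EM algorithm with covariates): represent each observed curve by its B-spline coefficients and cluster those coefficient vectors with a multivariate Gaussian mixture, here via mclust. The simulated curves and the number of clusters are assumptions for illustration.

    # Conceptual sketch: B-spline coefficients clustered with a Gaussian mixture
    library(splines)
    library(mclust)

    set.seed(1)
    t_grid <- seq(0, 1, length.out = 50)
    B <- bs(t_grid, df = 6)                      # common B-spline basis
    curves <- replicate(40, {
      shift <- sample(c(0, 0.3), 1)              # two latent groups
      sin(2 * pi * t_grid + shift) + rnorm(50, sd = 0.1)
    })

    # One coefficient vector per curve, obtained by least squares on the basis
    coefs <- t(apply(curves, 2, function(y) coef(lm(y ~ B - 1))))

    fit <- Mclust(coefs, G = 2)                  # multivariate Gaussian mixture
    table(fit$classification)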
It offers a sophisticated and versatile tool for creating and evaluating artificial-intelligence-based neural network models tailored for regression analysis on datasets with continuous target variables. Leveraging the power of neural networks, it allows users to experiment with various hidden-neuron configurations across two layers, optimizing model performance through 5-fold or 10-fold cross-validation. The package normalizes input data to ensure efficient training and assesses model accuracy using key metrics such as R squared (R2), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Percentage Error (PER). By storing and visualizing the best-performing models, it provides a comprehensive solution for precise and efficient regression modeling, making it an invaluable tool for data scientists and researchers aiming to harness AI for predictive analytics.
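A minimal base-R sketch of the reported accuracy metrics, computed from observed and predicted values of a held-out fold; the function name is illustrative and the PER definition is assumed to be MAPE-style.

    regression_metrics <- function(obs, pred) {
      rmse <- sqrt(mean((obs - pred)^2))                          # Root Mean Square Error
      mae  <- mean(abs(obs - pred))                               # Mean Absolute Error
      r2   <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)  # R squared
      per  <- 100 * mean(abs(obs - pred) / abs(obs))              # Percentage Error (assumed MAPE-style)
      c(R2 = r2, RMSE = rmse, MAE = mae, PER = per)
    }

    set.seed(1)
    obs  <- rnorm(100, mean = 10)
    pred <- obs + rnorm(100, sd = 0.5)
    regression_metrics(obs, pred)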
Penalized regression methods, such as the lasso and elastic net, are used in many biomedical applications when simultaneous regression coefficient estimation and variable selection are desired. However, missing data complicate the implementation of these methods, particularly when missingness is handled using multiple imputation. Applying a variable selection algorithm to each imputed dataset will likely lead to different sets of selected predictors, making it difficult to ascertain a final active set without resorting to ad hoc combination rules. miselect presents the Stacked Adaptive Elastic Net (saenet) and Grouped Adaptive LASSO (galasso) for continuous and binary outcomes, developed by Du et al. (2022) <doi:10.1080/10618600.2022.2035739>. By construction, these methods force selection of the same variables across the multiply imputed datasets. miselect also provides cross-validated variants of these methods.
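A sketch of the motivating problem (not miselect's own interface): impute with mice, run the lasso separately on each completed dataset, and observe that the selected variables typically differ across imputations.

    library(mice)
    library(glmnet)

    data(nhanes)                                 # small example data with missingness
    imp <- mice(nhanes, m = 5, printFlag = FALSE)

    selected <- lapply(seq_len(5), function(i) {
      d   <- complete(imp, i)
      x   <- as.matrix(d[, c("age", "hyp", "chl")])
      fit <- cv.glmnet(x, d$bmi)
      cf  <- as.matrix(coef(fit, s = "lambda.min"))
      rownames(cf)[cf[, 1] != 0]
    })
    selected                                     # active sets often differ across imputations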
This package provides a method for the multiresolution analysis of spatial fields and images to capture scale-dependent features. mrbsizeR is based on scale space smoothing and uses differences of smooths at neighbouring scales to find features at different scales. Bayesian analysis is used to infer which of the captured features are credible. The scale space multiresolution analysis has three steps: (1) Bayesian signal reconstruction. (2) Using differences of smooths, scale-dependent features of the reconstructed signal can be found. (3) Posterior credibility analysis of the differences of smooths created. The method was first proposed by Holmström, Pasanen, Furrer, and Sain (2011) <DOI:10.1016/j.csda.2011.04.011> and extended in Flury, Gerber, Schmid and Furrer (2021) <DOI:10.1016/j.spasta.2020.100483>.
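A generic 1-D illustration of step (2), differences of smooths at neighbouring scales; the kernel smoother and bandwidths are assumptions for illustration, not mrbsizeR's Bayesian machinery.

    set.seed(1)
    x <- seq(0, 1, length.out = 200)
    y <- sin(8 * pi * x) + 0.5 * sin(2 * pi * x) + rnorm(200, sd = 0.2)

    bandwidths <- c(0.01, 0.05, 0.2, 1)          # small to large scales
    smooths <- lapply(bandwidths, function(h)
      ksmooth(x, y, kernel = "normal", bandwidth = h, x.points = x)$y)

    # Each difference isolates features living between two neighbouring scales
    details <- Map(function(fine, coarse) fine - coarse,
                   smooths[-length(smooths)], smooths[-1])
    str(details)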
The Satellite Application Facility on Climate Monitoring (CM SAF) is a ground segment of the European Organization for the Exploitation of Meteorological Satellites (EUMETSAT) and one of EUMETSAT's Satellite Application Facilities. The CM SAF contributes to the sustainable monitoring of the climate system by providing essential climate variables related to the energy and water cycle of the atmosphere (<https://www.cmsaf.eu>). It is a joint cooperation of eight National Meteorological and Hydrological Services. The cmsafvis R package provides a collection of R operators for the analysis and visualization of CM SAF NetCDF data. CM SAF climate data records are provided free of charge via <https://wui.cmsaf.eu/safira>. Detailed information and test data are provided on the CM SAF webpage (<http://www.cmsaf.eu/R_toolbox>).
Functions to read in and analyze education survey and assessment data from the National Center for Education Statistics (NCES) <https://nces.ed.gov/>, including National Assessment of Educational Progress (NAEP) data <https://nces.ed.gov/nationsreportcard/>, and data from the International Assessment Database of the Organisation for Economic Co-operation and Development (OECD) <https://www.oecd.org/>, including the Programme for International Student Assessment (PISA), the Teaching and Learning International Survey (TALIS), and the Programme for the International Assessment of Adult Competencies (PIAAC), and of the International Association for the Evaluation of Educational Achievement (IEA) <https://www.iea.nl/>, including the Trends in International Mathematics and Science Study (TIMSS), TIMSS Advanced, the Progress in International Reading Literacy Study (PIRLS), the International Civic and Citizenship Study (ICCS), the International Computer and Information Literacy Study (ICILS), and the Civic Education Study (CivEd).
Analyse light spectra for visual and non-visual (often called melanopic) needs, wrapped up in a Shiny app. Spectran allows the import of spectra in various CSV forms, provides a wide range of example spectra, and even supports the creation of custom spectral power distributions. The goal of the app is to provide easy access to, and a visual overview of, the spectral calculations underlying common parameters used in the field. It is thus ideal for educational purposes or for creating presentation-ready graphs in lighting research and application. Spectran uses equations and action spectra described in CIE S026 (2018) <doi:10.25039/S026.2018>, DIN/TS 5031-100 (2021) <doi:10.31030/3287213>, and ISO/CIE 23539 (2023) <doi:10.25039/IS0.CIE.23539.2023>.
Render R Markdown to Markdown (without using knitr), and Markdown to lightweight HTML or LaTeX documents with the commonmark package (instead of Pandoc). Some Markdown features missing from commonmark are also supported, such as raw HTML or LaTeX blocks, LaTeX math, superscripts, subscripts, footnotes, element attributes, and appendices, but not all Pandoc Markdown features are (or will be) supported. With additional JavaScript and CSS, you can also create HTML slides and articles. This package can be viewed as a trimmed-down version of R Markdown and knitr. It does not aim at rich Markdown features or a large variety of output formats (the primary formats are HTML and LaTeX). Book and website projects of multiple input documents are also supported.
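A minimal sketch of the underlying commonmark conversion (not this package's full R Markdown pipeline): turn a Markdown snippet into HTML and LaTeX with the commonmark package it builds on.

    library(commonmark)

    md <- "## Example\n\nSome *emphasis*, `code`, and a [link](https://cran.r-project.org)."
    cat(markdown_html(md))    # lightweight HTML output
    cat(markdown_latex(md))   # LaTeX output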
This package provides functions to access data from public RESTful APIs, including 'Nager.Date', the 'World Bank API', and the 'REST Countries API', retrieving real-time or historical data related to China, such as holidays, economic indicators, and international demographic and geopolitical indicators. Additionally, the package includes one of the largest curated collections of open datasets focused on China and Hong Kong, covering topics such as air quality, demographics, input-output tables, epidemiology, political structure, names, and social indicators. The package supports reproducible research and teaching by integrating reliable international APIs and structured datasets from public, academic, and government sources. For more information on the APIs, see: Nager.Date <https://date.nager.at/Api>, World Bank API <https://datahelpdesk.worldbank.org/knowledgebase/articles/889392>, and REST Countries API <https://restcountries.com/>.
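A sketch of a direct call to the Nager.Date public-holiday endpoint with jsonlite (not this package's wrapper functions); the v3 endpoint path and field names are assumptions based on the API documentation linked above.

    library(jsonlite)

    # Public holidays for China in a given year, straight from the REST API
    holidays <- fromJSON("https://date.nager.at/api/v3/PublicHolidays/2024/CN")
    head(holidays[, c("date", "localName", "name")])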
An approach and software for modelling marine and freshwater ecosystems. It is articulated entirely around trophic levels. EcoTroph's key displays are bivariate plots, with trophic levels as the abscissa and biomass flows or related quantities as the ordinate. Thus, trophic ecosystem functioning can be modelled as a continuous flow of biomass surging up the food web, from lower to higher trophic levels, due to predation and ontogenetic processes. Such an approach, wherein species as such disappear, may be viewed as the ultimate stage in the use of the trophic level metric for ecosystem modelling, providing a simplified but potentially useful caricature of ecosystem functioning and the impacts of fishing. This version contains a catch trophic spectrum analysis (CTSA) function and corrected versions of the mf.diagnosis and create.ETmain functions.
Two implementations of canonical correlation analysis (CCA) that are based on iterated regression. By choosing the appropriate regression algorithm for each data domain, it is possible to enforce sparsity, non-negativity or other kinds of constraints on the projection vectors. Multiple canonical variables are computed sequentially using a generalized deflation scheme, where the additional correlation not explained by previous variables is maximized. nscancor() is used to analyze paired data from two domains, and has the same interface as cancor() from the stats package (plus some extra parameters). mcancor() is appropriate for analyzing data from three or more domains. See <https://sigg-iten.ch/learningbits/2014/01/20/canonical-correlation-analysis-under-constraints/> and Sigg et al. (2007) <doi:10.1109/MLSP.2007.4414315> for more details.
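A sketch of the classical interface that nscancor() mirrors: cancor() from the stats package applied to paired data from two domains. The extra arguments of nscancor() (which supply the per-domain regression algorithm and hence the constraints) are only described here, not reproduced.

    set.seed(1)
    x <- matrix(rnorm(100 * 5), 100, 5)          # domain 1
    y <- cbind(x[, 1] + rnorm(100, sd = 0.5),    # domain 2, correlated with domain 1
               matrix(rnorm(100 * 3), 100, 3))

    cc <- cancor(x, y)
    cc$cor                                       # canonical correlations
    # nscancor() accepts the same x and y, plus extra parameters that choose the
    # regression algorithm (and thus the constraints) used for each data domain.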
Practitioners of Bayesian statistics often use Markov chain Monte Carlo (MCMC) samplers to sample from a posterior distribution. This package determines whether the MCMC sample is large enough to yield reliable estimates of the target distribution. In particular, it calculates a Gelman-Rubin convergence diagnostic using stable and consistent estimators of Monte Carlo variance. Additionally, it uses the connection between an MCMC sample's effective sample size and the Gelman-Rubin diagnostic to produce a threshold for terminating MCMC simulation. Finally, it informs the user whether enough samples have been collected and, if necessary, estimates the number of samples needed for a desired level of accuracy. The theory underlying these methods can be found in "Revisiting the Gelman-Rubin Diagnostic" by Vats and Knudson (2018) <arXiv:1812.09384>.
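A toy illustration of the quantities involved, using the classical Gelman-Rubin diagnostic and effective sample size from the coda package; this is not the package's own stable-variance estimator, only the familiar baseline it improves on.

    library(coda)

    set.seed(1)
    run_chain <- function(n) {                   # AR(1) chain as a stand-in for MCMC output
      x <- numeric(n)
      for (i in 2:n) x[i] <- 0.7 * x[i - 1] + rnorm(1)
      mcmc(x)
    }
    chains <- mcmc.list(run_chain(5000), run_chain(5000), run_chain(5000))

    gelman.diag(chains)                          # potential scale reduction factor
    effectiveSize(chains)                        # effective sample size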
This package provides a unifying framework for managing and deploying shiny applications that consist of modules, where an "app" is a tab-based workflow that guides a user step-by-step through an analysis. The shinymgr app builder "stitches" shiny modules together so that outputs from one module serve as inputs to the next, creating an analysis pipeline that is easy to implement and maintain. Users of shinymgr apps can save analyses as an RDS file that fully reproduces the analytic steps and can be ingested into an R Markdown report for rapid reporting. In short, developers use the shinymgr framework to write modules and seamlessly combine them into shiny apps, and users of these apps can execute reproducible analyses that can be incorporated into reports for rapid dissemination.
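A minimal example of a shiny module of the kind the framework stitches together, where the module's return value can feed the next step; function and id names are illustrative, not shinymgr's own templates.

    library(shiny)

    histUI <- function(id) {
      ns <- NS(id)
      tagList(sliderInput(ns("bins"), "Bins", 5, 50, 20), plotOutput(ns("plot")))
    }
    histServer <- function(id, data) {
      moduleServer(id, function(input, output, session) {
        output$plot <- renderPlot(hist(data, breaks = input$bins))
        reactive(input$bins)                     # returned value can serve as input to the next module
      })
    }

    ui <- fluidPage(histUI("step1"))
    server <- function(input, output, session) histServer("step1", rnorm(500))
    # shinyApp(ui, server)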
High-performance variant of apply() for a fixed set of functions. The considerable speedup of this implementation comes at the cost of universality: user-defined functions cannot be used with this package. However, about 20 of the most commonly used functions are available. They can be divided into three types: reducing functions (like mean(), sum() etc., giving a scalar when applied to a vector), mapping functions (like normalise(), cumsum() etc., giving a vector of the same length as the input vector) and, finally, vector-reducing functions (like diff(), which produces a result vector of a length different from that of the input vector). Optional or mandatory additional arguments required by some functions (e.g. the norm type for norm()) can be passed as named arguments in '...'.
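The three function categories illustrated with base apply(); this package provides a faster drop-in for such calls when restricted to its fixed set of supported functions.

    m <- matrix(rnorm(20), nrow = 4)

    apply(m, 1, mean)     # reducing: one scalar per row
    apply(m, 1, cumsum)   # mapping: output length equals input length
    apply(m, 1, diff)     # vector-reducing: output length differs from input length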
Allows the user to run the Adaptive Correlated Spike and Slab (ACSS) algorithm, the corresponding INdependent Spike and Slab (INSS) algorithm, and the Giannone, Lenza and Primiceri (GLP) algorithm with adaptive burn-in. All three algorithms are used to fit high-dimensional data sets with either a sparse structure or a dense structure with smaller contributions from all predictors. The state-of-the-art GLP algorithm is described in Giannone, D., Lenza, M., & Primiceri, G. E. (2021, ISBN:978-92-899-4542-4) "Economic predictions with big data: The illusion of sparsity". The two new algorithms, the ACSS algorithm and the INSS algorithm, and a discussion of their performance can be found in Yang, Z., Khare, K., & Michailidis, G. (2024, submitted to the Journal of Business & Economic Statistics) "Bayesian methodology for adaptive sparsity and shrinkage in regression".
Propagate uncertainty from several estimates when combining them via a function. This is done by using the parametric bootstrap to simulate values from the distribution of each estimate, building up an empirical distribution of the combined parameter. Finally, either the percentile method or the highest density interval is used to derive a confidence interval for the combined parameter with the desired coverage. Gaussian copulas are used when parameters are assumed to be dependent/correlated. References: Davison and Hinkley (1997, ISBN:0-521-57471-4) for the parametric bootstrap and percentile method, Gelman et al. (2014, ISBN:978-1-4398-4095-5) for the highest density interval, and Stockdale et al. (2020) <doi:10.1016/j.jhep.2020.04.008> for an example of combining conditional prevalences.
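A base-R sketch of the core idea with illustrative numbers: simulate each estimate from its assumed sampling distribution, combine the draws via the function of interest, and take a percentile interval for the combined parameter.

    set.seed(1)
    B <- 10000

    # Two independent estimates with standard errors (normal sampling distributions assumed)
    p1 <- rnorm(B, mean = 0.30, sd = 0.03)       # e.g. a prevalence
    p2 <- rnorm(B, mean = 0.60, sd = 0.05)       # e.g. a conditional proportion

    combined <- p1 * p2                          # the combining function of interest
    quantile(combined, c(0.025, 0.975))          # 95% percentile confidence interval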
Estimation of gas transport properties (viscosity, diffusion, thermal conductivity) using Chapman-Enskog theory (Chapman and Larmor 1918, <doi:10.1098/rsta.1918.0005>) and of the second virial coefficient (Vargas et al. 2001, <doi:10.1016/s0378-4371(00)00362-9>) using the Lennard-Jones (12-6) potential. Corrections up to third order are taken into account for viscosity and thermal conductivity. It is also possible to calculate the binary diffusion coefficients of polar and non-polar gases in non-polar bath gases (Brown et al. 2011, <doi:10.1016/j.pecs.2010.12.001>). Sixteen collision integrals are calculated to four-digit accuracy over the reduced temperature range [0.3, 400] using an interpolation function of Kim and Monroe (2014, <doi:10.1016/j.jcp.2014.05.018>).
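A hedged sketch of the first-order Chapman-Enskog viscosity using a Neufeld-style empirical fit for the (2,2) reduced collision integral; the fit constants and the Lennard-Jones parameters below are textbook approximations for illustration, not this package's four-digit collision integrals or its higher-order corrections.

    omega22 <- function(Tstar) {
      1.16145 / Tstar^0.14874 +
        0.52487 * exp(-0.77320 * Tstar) +
        2.16178 * exp(-2.43787 * Tstar)
    }

    # First-order viscosity in micropoise: eta = 26.69 * sqrt(M * T) / (sigma^2 * Omega),
    # with M in g/mol, T in K, sigma in Angstrom (illustrative N2 parameters)
    eta_chapman_enskog <- function(T, M, sigma, eps_k) {
      26.69 * sqrt(M * T) / (sigma^2 * omega22(T / eps_k))
    }
    eta_chapman_enskog(T = 300, M = 28.01, sigma = 3.68, eps_k = 91.5)  # roughly 178 micropoise for N2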
This package provides functions to load vector data and derive landscape connectivity metrics in habitat or matrix systems. Additionally, it includes an approach to assess the contribution of individual patches to overall landscape connectivity, enabling the prioritization of habitat patches. The computation of landscape connectivity and patch importance is very useful in landscape ecology research. The metrics available are: number of components, number of links, size of the largest component, mean size of components, class coincidence probability, landscape coincidence probability, characteristic path length, expected cluster size, area-weighted flux, and integral index of connectivity. Pascual-Hortal, L., and Saura, S. (2006) <doi:10.1007/s10980-006-0013-z>; Urban, D., and Keitt, T. (2001) <doi:10.2307/2679983>; Laita, A., Kotiaho, J., and Monkkonen, M. (2011) <doi:10.1007/s10980-011-9620-4>.
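A conceptual illustration with igraph of a few of the listed graph-based metrics (number of links, number of components, size of the largest component) computed from a simple patch link list; this is not the package's own workflow, which starts from vector spatial data.

    library(igraph)

    links <- data.frame(from = c("A", "B", "D"), to = c("B", "C", "E"))
    g <- graph_from_data_frame(links, directed = FALSE,
                               vertices = data.frame(name = LETTERS[1:6]))

    gsize(g)                       # number of links
    components(g)$no               # number of components
    max(components(g)$csize)       # size of the largest component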
Computes indirect effects, conditional effects, and conditional indirect effects in a structural equation model or path model after model fitting, with no need to define any user parameters or label any paths in the model syntax, using the approach presented in Cheung and Cheung (2024) <doi:10.3758/s13428-023-02224-z>. Can also form bootstrap confidence intervals by doing bootstrapping only once and reusing the bootstrap estimates in all subsequent computations. Supports bootstrap confidence intervals for standardized (partially or completely) indirect effects, conditional effects, and conditional indirect effects as described in Cheung (2009) <doi:10.3758/BRM.41.2.425> and Cheung, Cheung, Lau, Hui, and Vong (2022) <doi:10.1037/hea0001188>. Model fitting can be done by structural equation modeling using lavaan() or regression using lm().
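A generic illustration of a simple indirect effect (a * b) estimated from two lm() fits and bootstrapped with the boot package; this is not the package's own workflow, which operates on a fitted lavaan or lm model without user-defined parameters and reuses a single set of bootstrap estimates across all effects.

    library(boot)

    set.seed(1)
    n <- 200
    x <- rnorm(n)
    m <- 0.5 * x + rnorm(n)
    y <- 0.4 * m + 0.2 * x + rnorm(n)
    dat <- data.frame(x, m, y)

    indirect <- function(d, idx) {
      d <- d[idx, ]
      a <- coef(lm(m ~ x, data = d))["x"]        # path x -> m
      b <- coef(lm(y ~ m + x, data = d))["m"]    # path m -> y, controlling for x
      unname(a * b)
    }
    bt <- boot(dat, indirect, R = 999)
    boot.ci(bt, type = "perc")                   # percentile bootstrap CI for a * b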
There are two main functions: (1) to estimate the power of testing for linkage using an affected sib pair design, as a function of the recurrence risk ratios, using analytical power formulae implemented in R and based on a Mathematica notebook created by Martin Farrall; and (2) to examine how the power of the transmission disequilibrium test (TDT) depends on the disease allele frequency, the marker allele frequency, the strength of the linkage disequilibrium, and the magnitude of the genetic effect, using an R program that implements the power formulae of Abel and Muller-Myhsok (1998). These formulae allow one to quickly compute the power of the TDT approach under a variety of conditions. This R program was also modeled on Martin Farrall's Mathematica notebook.
The first goal of this package is to provide a multitude of tree models, i.e., functions that generate rooted binary trees with a given number of leaves. Second, the package allows for an easy evaluation and comparison of tree shape statistics by estimating their power to differentiate between different tree models. Please note that this R package was developed alongside the manuscript "Tree balance in phylogenetic models" by S. J. Kersting, K. Wicke, and M. Fischer (2024) <doi:10.48550/arXiv.2406.05185>, which provides further background and the respective mathematical definitions. This project was supported by the project ArtIGROW, which is a part of the WIR!-Alliance ArtIFARM – Artificial Intelligence in Farming funded by the German Federal Ministry of Education and Research (No. 03WIR4805).
Implementation of statistical methods for the estimation of toroidal diffusions. Several diffusive models are provided, most of them belonging to the Langevin family of diffusions on the torus. Specifically, the wrapped normal and von Mises processes are included, which can be seen as toroidal analogues of the Ornstein-Uhlenbeck diffusion. A collection of methods for approximate maximum likelihood estimation, organized in four blocks, is given: (i) based on the exact transition probability density, obtained as the numerical solution to the Fokker-Planck equation; (ii) based on wrapped pseudo-likelihoods; (iii) based on specific analytic approximations by wrapped processes; (iv) based on maximum likelihood of the stationary densities. The package allows the replicability of the results in García-Portugués et al. (2019) <doi:10.1007/s11222-017-9790-2>.
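An Euler-Maruyama sketch of a Langevin-type diffusion on the circle with a von Mises-style drift, wrapped to (-pi, pi]; the drift form and parameters are illustrative, not the package's own simulators or estimators.

    set.seed(1)
    wrap <- function(theta) ((theta + pi) %% (2 * pi)) - pi

    simulate_vm_diffusion <- function(n, dt, alpha, mu, sigma, theta0 = 0) {
      theta <- numeric(n)
      theta[1] <- theta0
      for (i in 2:n) {
        drift <- -alpha * sin(theta[i - 1] - mu)                 # attraction towards mu on the circle
        theta[i] <- wrap(theta[i - 1] + drift * dt + sigma * sqrt(dt) * rnorm(1))
      }
      theta
    }

    path <- simulate_vm_diffusion(n = 5000, dt = 0.01, alpha = 2, mu = 0, sigma = 1)
    hist(path, breaks = 50)   # concentrates around mu, as a circular analogue of the OU process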
Under a different representation of the multivariate normal (MVN) probability, we can use the Vecchia approximation to sample the integrand at linear complexity with respect to n. Additionally, both the SOV algorithm from Genz (1992) and the exponential-tilting method from Botev (2017) can be adapted to linear complexity. The reference for the method implemented in this package is Jian Cao and Matthias Katzfuss (2024) "Linear-Cost Vecchia Approximation of Multivariate Normal Probabilities" <doi:10.48550/arXiv.2311.09426>. Two major references for the development of our method are Alan Genz (1992) "Numerical Computation of Multivariate Normal Probabilities" <doi:10.1080/10618600.1992.10477010> and Z. I. Botev (2017) "The Normal Law Under Linear Restrictions: Simulation and Estimation via Minimax Tilting" <doi:10.48550/arXiv.1603.04166>.
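For orientation, the standard (non-Vecchia) computation of an MVN probability via mvtnorm::pmvnorm, whose cost grows quickly with n and which the Vecchia approach scales down to linear complexity; the exponential covariance on a 1-D grid is illustrative.

    library(mvtnorm)

    n <- 50
    s <- seq(0, 1, length.out = n)
    Sigma <- exp(-abs(outer(s, s, "-")) / 0.3)   # exponential correlation on a grid

    pmvnorm(lower = rep(-1, n), upper = rep(1, n),
            mean = rep(0, n), sigma = Sigma)     # P(-1 < X_i < 1 for all i)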
Most human genes have multiple promoters that control the expression of different isoforms. The use of these alternative promoters enables the regulation of isoform expression pre-transcriptionally. Alternative promoters have been found to be important in a wide range of cell types and diseases. proActiv is an R package that enables the analysis of promoters from RNA-seq data. proActiv uses aligned reads as input and generates counts and normalized promoter activity estimates for each annotated promoter. In particular, proActiv accepts junction files from TopHat2 or STAR, or BAM files, as inputs. These estimates can then be used to identify which promoters are active, which are inactive, and which change their activity across conditions. proActiv also allows visualization of promoter activity across conditions.