This package provides functions to access data from the BrasilAPI', REST Countries API', Nager.Date API', and World Bank API', related to Brazil's postal codes, banks, holidays, company registrations, international country indicators, public holidays information, and economic development data. Additionally, the package includes curated datasets related to Brazil, covering topics such as demographic data (males and females by state and year), river levels, environmental emission factors, film festivals, and yellow fever outbreak records. The package supports research and analysis focused on Brazil by integrating open APIs with high-quality datasets from multiple domains. For more information on the APIs, see: BrasilAPI <https://brasilapi.com.br/>, Nager.Date <https://date.nager.at/Api>, World Bank API <https://datahelpdesk.worldbank.org/knowledgebase/articles/889392>, and REST Countries API <https://restcountries.com/>.
The main goal of this package is drawing the membership function of the fuzzy p-value which is defined as a fuzzy set on the unit interval for three following problems: (1) testing crisp hypotheses based on fuzzy data, (2) testing fuzzy hypotheses based on crisp data, and (3) testing fuzzy hypotheses based on fuzzy data. In all cases, the fuzziness of data or/and the fuzziness of the boundary of null fuzzy hypothesis transported via the p-value function and causes to produce the fuzzy p-value. If the p-value is fuzzy, it is more appropriate to consider a fuzzy significance level for the problem. Therefore, the comparison of the fuzzy p-value and the fuzzy significance level is evaluated by a fuzzy ranking method in this package.
This package provides tools for Bayesian power analysis and assurance calculations using the statistical frameworks of brms and INLA'. Includes simulation-based approaches, support for multiple decision rules (direction, threshold, ROPE), sequential designs, and visualisation helpers. Methods are based on Kruschke (2014, ISBN:9780124058880) "Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan", O'Hagan & Stevens (2001) <doi:10.1177/0272989X0102100307> "Bayesian Assessment of Sample Size for Clinical Trials of Cost-Effectiveness", Kruschke (2018) <doi:10.1177/2515245918771304> "Rejecting or Accepting Parameter Values in Bayesian Estimation", Rue et al. (2009) <doi:10.1111/j.1467-9868.2008.00700.x> "Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations", and Bürkner (2017) <doi:10.18637/jss.v080.i01> "brms: An R Package for Bayesian Multilevel Models using Stan".
Multi-stage selection is practiced in numerous fields of life and social sciences and particularly in breeding. A special characteristic of multi-stage selection is that candidates are evaluated in successive stages with increasing intensity and effort, and only a fraction of the superior candidates is selected and promoted to the next stage. For the optimum design of such selection programs, the selection gain plays a crucial role. It can be calculated by integration of a truncated multivariate normal (MVN) distribution. While mathematical formulas for calculating the selection gain and the variance among selected candidates were developed long time ago, solutions for numerical calculation were not available. This package can also be used for optimizing multi-stage selection programs for a given total budget and different costs of evaluating the candidates in each stage.
Computes various stability parameters from Additive Main Effects and Multiplicative Interaction (AMMI) analysis results such as Modified AMMI Stability Value (MASV), Sums of the Absolute Value of the Interaction Principal Component Scores (SIPC), Sum Across Environments of Genotype-Environment Interaction Modelled by AMMI (AMGE), Sum Across Environments of Absolute Value of Genotype-Environment Interaction Modelled by AMMI (AV_(AMGE)), AMMI Stability Index (ASI), Modified ASI (MASI), AMMI Based Stability Parameter (ASTAB), Annicchiarico's D Parameter (DA), Zhang's D Parameter (DZ), Averages of the Squared Eigenvector Values (EV), Stability Measure Based on Fitted AMMI Model (FA), Absolute Value of the Relative Contribution of IPCs to the Interaction (Za). Further calculates the Simultaneous Selection Index for Yield and Stability from the computed stability parameters. See the vignette for complete list of citations for the methods implemented.
We introduce an R function one_two_sample() which can deal with one and two (normal) samples, Ying-Ying Zhang, Yi Wei (2012) <doi:10.2991/asshm-13.2013.29>. For one normal sample x, the function reports descriptive statistics, plot, interval estimation and test of hypothesis of x. For two normal samples x and y, the function reports descriptive statistics, plot, interval estimation and test of hypothesis of x and y, respectively. It also reports interval estimation and test of hypothesis of mu1-mu2 (the difference of the means of x and y) and sigma1^2 / sigma2^2 (the ratio of the variances of x and y), tests whether x and y are from the same population, finds the correlation coefficient of x and y if x and y have the same length.
The developed function generates soil salinity indices using satellite data, utilizing multiple spectral bands such as Blue, Green, Red, Near-Infrared (NIR), and Shortwave Infrared (SWIR1, SWIR2). It computes 24 different salinity indices crucial for monitoring and analyzing salt-affected soils efficiently. For more details see, Rani, et al. (2022). <DOI: 10.1007/s12517-022-09682-3>. One of the key features of the developed function is its flexibility. Users can provide any combination of the required spectral bands, and the function will automatically calculate only the relevant indices based on the available data. This dynamic capability ensures that users can maximize the utility of their data without the need for all spectral bands, making the package versatile and user-friendly. Outputs are provided as GeoTIFF file format, facilitating easy integration with GIS workflows.
Develop and evaluate treatment rules based on: (1) the standard indirect approach of split-regression, which fits regressions separately in both treatment groups and assigns an individual to the treatment option under which predicted outcome is more desirable; (2) the direct approach of outcome-weighted-learning proposed by Yingqi Zhao, Donglin Zeng, A. John Rush, and Michael Kosorok (2012) <doi:10.1080/01621459.2012.695674>; (3) the direct approach, which we refer to as direct-interactions, proposed by Shuai Chen, Lu Tian, Tianxi Cai, and Menggang Yu (2017) <doi:10.1111/biom.12676>. Please see the vignette for a walk-through of how to start with an observational dataset whose design is understood scientifically and end up with a treatment rule that is trustworthy statistically, along with an estimation of rule benefit in an independent sample.
Top-Down mass spectrometry aims to identify entire proteins as well as their (post-translational) modifications or ions bound (eg Chen et al (2018) <doi:10.1021/acs.analchem.7b04747>). The pattern of internal fragments (Haverland et al (2017) <doi:10.1007/s13361-017-1635-x>) may reveal important information about the original structure of the proteins studied (Skinner et al (2018) <doi:10.1038/nchembio.2515> and Li et al (2018) <doi:10.1038/nchem.2908>). However, the number of possible internal fragments gets huge with longer proteins and subsequent identification of internal fragments remains challenging, in particular since the the accuracy of measurements with current mass spectrometers represents a limiting factor. This package attempts to deal with the complexity of internal fragments and allows identification of terminal and internal fragments from deconvoluted mass-spectrometry data.
Multi-state models are essential tools in longitudinal data analysis. One primary goal of these models is the estimation of transition probabilities, a critical metric for predicting clinical prognosis across various stages of diseases or medical conditions. Traditionally, inference in multi-state models relies on the Aalen-Johansen (AJ) estimator which is consistent under the Markov assumption. However, in many practical applications, the Markovian nature of the process is often not guaranteed, limiting the applicability of the AJ estimator in more complex scenarios. This package extends the landmark Aalen-Johansen estimator (Putter, H, Spitoni, C (2018) <doi:10.1177/0962280216674497>) incorporating presmoothing techniques described by Soutinho, Meira-Machado and Oliveira (2020) <doi:10.1080/03610918.2020.1762895>, offering a robust alternative for estimating transition probabilities in non-Markovian multi-state models with multiple states and potential reversible transitions.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. This is a another exporter for org-mode that translates Org-mode file to beautiful PDF file EXAMPLE ORG FILE HEADER: #+title:Readme ox-notes #+author: Matthias David #+options: toc:nil #+ou:Zoom #+quand: 20/2/2021 #+projet: ox-minutes #+absent: C. Robert,T. tartanpion #+present: K. Soulet,I. Payet #+excuse:Sophie Fonsec,Karine Soulet #+logo: logo.png
In self-reported or anonymised data the user often encounters heaped data, i.e. data which are rounded (to a possibly different degree of coarseness). While this is mostly a minor problem in parametric density estimation the bias can be very large for non-parametric methods such as kernel density estimation. This package implements a partly Bayesian algorithm treating the true unknown values as additional parameters and estimates the rounding parameters to give a corrected kernel density estimate. It supports various standard bandwidth selection methods. Varying rounding probabilities (depending on the true value) and asymmetric rounding is estimable as well: Gross, M. and Rendtel, U. (2016) (<doi:10.1093/jssam/smw011>). Additionally, bivariate non-parametric density estimation for rounded data, Gross, M. et al. (2016) (<doi:10.1111/rssa.12179>), as well as data aggregated on areas is supported.
Lag-sequential analysis is a method of assessing of patterns (what tends to follow what?) in sequences of codes. The codes are typically for discrete behaviors or states. The functions in this package read a stream of codes, or a frequency transition matrix, and produce a variety of lag sequential statistics, including transitional frequencies, expected transitional frequencies, transitional probabilities, z values, adjusted residuals, Yule's Q values, likelihood ratio tests of stationarity across time and homogeneity across groups or segments, transformed kappas for unidirectional dependence, bidirectional dependence, parallel and nonparallel dominance, and significance levels based on both parametric and randomization tests. The methods are described in Bakeman & Quera (2011) <doi:10.1017/CBO9781139017343>, O'Connor (1999) <doi:10.3758/BF03200753>, Wampold & Margolin (1982) <doi:10.1037/0033-2909.92.3.755>, and Wampold (1995, ISBN:0-89391-919-5).
The document converter pandoc <https://pandoc.org/> is widely used in the R community. One feature of pandoc is that it can produce and consume JSON-formatted abstract syntax trees (AST). This allows to transform a given source document into JSON-formatted AST, alter it by so called filters and pass the altered JSON-formatted AST back to pandoc'. This package provides functions which allow to write such filters in native R code. Although this package is inspired by the Python package pandocfilters <https://github.com/jgm/pandocfilters/>, it provides additional convenience functions which make it simple to use the pandocfilters package as a report generator. Since pandocfilters inherits most of it's functionality from pandoc it can create documents in many formats (for more information see <https://pandoc.org/>) but is also bound to the same limitations as pandoc'.
This package provides a unified method, called M statistic, is provided for detecting phylogenetic signals in continuous traits, discrete traits, and multi-trait combinations. Blomberg and Garland (2002) <doi:10.1046/j.1420-9101.2002.00472.x> provided a widely accepted statistical definition of the phylogenetic signal, which is the "tendency for related species to resemble each other more than they resemble species drawn at random from the tree". The M statistic strictly adheres to the definition of phylogenetic signal, formulating an index and developing a method of testing in strict accordance with the definition, instead of relying on correlation analysis or evolutionary models. The novel method equivalently expressed the textual definition of the phylogenetic signal as an inequality equation of the phylogenetic and trait distances and constructed the M statistic. Also, there are more distance-based methods under development.
Estimate a suite of normalizing transformations, including a new adaptation of a technique based on ranks which can guarantee normally distributed transformed data if there are no ties: ordered quantile normalization (ORQ). ORQ normalization combines a rank-mapping approach with a shifted logit approximation that allows the transformation to work on data outside the original domain. It is also able to handle new data within the original domain via linear interpolation. The package is built to estimate the best normalizing transformation for a vector consistently and accurately. It implements the Box-Cox transformation, the Yeo-Johnson transformation, three types of Lambert WxF transformations, and the ordered quantile normalization transformation. It estimates the normalization efficacy of other commonly used transformations, and it allows users to specify custom transformations or normalization statistics. Finally, functionality can be integrated into a machine learning workflow via recipes.
Includes bases for litholog generation: graphical functions based on R base graphics, interval management functions and svg importation functions among others. Also include stereographic projection functions, and other functions made to deal with large datasets while keeping options to get into the details of the data. When using for publication please cite Sebastien Wouters, Anne-Christine Da Silva, Frederic Boulvain and Xavier Devleeschouwer, 2021. The R Journal 13:2, 153-178. The palaeomagnetism functions are based on: Tauxe, L., 2010. Essentials of Paleomagnetism. University of California Press. <https://earthref.org/MagIC/books/Tauxe/Essentials/>; Allmendinger, R. W., Cardozo, N. C., and Fisher, D., 2013, Structural Geology Algorithms: Vectors & Tensors: Cambridge, England, Cambridge University Press, 289 pp.; Cardozo, N., and Allmendinger, R. W., 2013, Spherical projections with OSXStereonet: Computers & Geosciences, v. 51, no. 0, p. 193 - 205, <doi: 10.1016/j.cageo.2012.07.021>.
doubletrouble aims to identify duplicated genes from whole-genome protein sequences and classify them based on their modes of duplication. The duplication modes are i. segmental duplication (SD); ii. tandem duplication (TD); iii. proximal duplication (PD); iv. transposed duplication (TRD) and; v. dispersed duplication (DD). Transposon-derived duplicates (TRD) can be further subdivided into rTRD (retrotransposon-derived duplication) and dTRD (DNA transposon-derived duplication). If users want a simpler classification scheme, duplicates can also be classified into SD- and SSD-derived (small-scale duplication) gene pairs. Besides classifying gene pairs, users can also classify genes, so that each gene is assigned a unique mode of duplication. Users can also calculate substitution rates per substitution site (i.e., Ka and Ks) from duplicate pairs, find peaks in Ks distributions with Gaussian Mixture Models (GMMs), and classify gene pairs into age groups based on Ks peaks.
Some tools to assist with converting International Organization for Standardization (ISO) standard 11784 (ISO11784) animal ID codes between 4 recognised formats commonly displayed on Passive Integrated Transponder (PIT) tag readers. The most common formats are 15 digit decimal, e.g., 999123456789012, and 13 character hexadecimal dot format, e.g., 3E7.1CBE991A14. These are referred to in this package as isodecimal and isodothex. The other two formats are the raw hexadecimal representation of the ISO11784 binary structure (see <https://en.wikipedia.org/wiki/ISO_11784_and_ISO_11785>). There are two flavours of this format, a left and a right variation. Which flavour a reader happens to output depends on if the developers decided to reverse the binary number or not before converting to hexadecimal, a decision based on the fact that the PIT tags will transmit their binary code Least Significant Bit (LSB) first, or backwards basically.
Offers a rich and diverse collection of datasets focused on the brain, nervous system, and related disorders. The package includes clinical, experimental, neuroimaging, behavioral, cognitive, and simulated data on conditions such as Parkinson's disease, Alzheimer's disease, dementia, epilepsy, schizophrenia, autism spectrum disorder, attention deficit, hyperactivity disorder, Tourette's syndrome, traumatic brain injury, gliomas, migraines, headaches, sleep disorders, concussions, encephalitis, subarachnoid hemorrhage, and mental health conditions. Datasets cover structural and functional brain data, cross-sectional and longitudinal MRI imaging studies, neurotransmission, gene expression, cognitive performance, intelligence metrics, sleep deprivation effects, treatment outcomes, brain-body relationships across species, neurological injury patterns, and acupuncture interventions. Data sources include peer-reviewed studies, clinical trials, military health records, sports injury databases, and international comparative studies. Designed for researchers, neuroscientists, clinicians, psychologists, data scientists, and students, this package facilitates exploratory data analysis, statistical modeling, and hypothesis testing in neuroscience and neuroepidemiology.
This package provides functions to find edges for bibliometric networks like bibliographic coupling network, co-citation network and co-authorship network. The weights of network edges can be calculated according to different methods, depending on the type of networks, the type of nodes, and what you want to analyse. These functions are optimized to be be used on large dataset. The package contains functions inspired by: Leydesdorff, Loet and Park, Han Woo (2017) <doi:10.1016/j.joi.2016.11.007>; Perianes-Rodriguez, Antonio, Ludo Waltman, and Nees Jan Van Eck (2016) <doi:10.1016/j.joi.2016.10.006>; Sen, Subir K. and Shymal K. Gan (1983) <http://nopr.niscair.res.in/handle/123456789/28008>; Shen, Si, Zhu, Danhao, Rousseau, Ronald, Su, Xinning and Wang, Dongbo (2019) <doi:10.1016/j.joi.2019.01.012>; Zhao, Dangzhi and Strotmann, Andreas (2008) <doi:10.1002/meet.2008.1450450292>.
The Genetic Algorithm (GA) is used to perform changepoint analysis in time series data. The package also includes an extended island version of GA, as described in Lu, Lund, and Lee (2010, <doi:10.1214/09-AOAS289>). By mimicking the principles of natural selection and evolution, GA provides a powerful stochastic search technique for solving combinatorial optimization problems. In changepointGA', each chromosome represents a changepoint configuration, including the number and locations of changepoints, hyperparameters, and model parameters. The package employs genetic operatorsâ selection, crossover, and mutationâ to iteratively improve solutions based on the given fitness (objective) function. Key features of changepointGA include encoding changepoint configurations in an integer format, enabling dynamic and simultaneous estimation of model hyperparameters, changepoint configurations, and associated parameters. The detailed algorithmic implementation can be found in the package vignettes and in the paper of Li (2024, <doi:10.48550/arXiv.2410.15571>).
Cancer cells accumulate DNA mutations as result of DNA damage and DNA repair processes. This computational framework is aimed at deciphering DNA mutational signatures operating in cancer. The framework includes modules that support raw data import and processing, mutational signature extraction, and results interpretation and visualization. The framework accepts widely used file formats storing information about DNA variants, such as Variant Call Format files. The framework performs Non-Negative Matrix Factorization to extract mutational signatures explaining the observed set of DNA mutations. Bootstrapping is performed as part of the analysis. The framework supports parallelization and is optimized for use on multi-core systems. The software was described by Fantini D et al (2020) <doi:10.1038/s41598-020-75062-0> and is based on a custom R-based implementation of the original MATLAB WTSI framework by Alexandrov LB et al (2013) <doi:10.1016/j.celrep.2012.12.008>.
Quantifying systematic heterogeneity in meta-analysis using R. The M statistic aggregates heterogeneity information across multiple variants to, identify systematic heterogeneity patterns and their direction of effect in meta-analysis. It's primary use is to identify outlier studies, which either show "null" effects or consistently show stronger or weaker genetic effects than average across, the panel of variants examined in a GWAS meta-analysis. In contrast to conventional heterogeneity metrics (Q-statistic, I-squared and tau-squared) which measure random heterogeneity at individual variants, M measures systematic (non-random) heterogeneity across multiple independently associated variants. Systematic heterogeneity can arise in a meta-analysis due to differences in the study characteristics of participating studies. Some of the differences may include: ancestry, allele frequencies, phenotype definition, age-of-disease onset, family-history, gender, linkage disequilibrium and quality control thresholds. See <https://magosil86.github.io/getmstatistic/> for statistical statistical theory, documentation and examples.