Infer constant and stochastic, time-dependent parameters to consider intrinsic stochasticity of a dynamic model and/or to analyze model structure modifications that could reduce model deficits. The concept is based on inferring time-dependent parameters as stochastic processes in the form of Ornstein-Uhlenbeck processes jointly with inferring constant model parameters and parameters of the Ornstein-Uhlenbeck processes. The package also contains functions to sample from and calculate densities of Ornstein-Uhlenbeck processes. References: Tomassini, L., Reichert, P., Kuensch, H.-R. Buser, C., Knutti, R. and Borsuk, M.E. (2009), A smoothing algorithm for estimating stochastic, continuous-time model parameters and its application to a simple climate model, Journal of the Royal Statistical Society: Series C (Applied Statistics) 58, 679-704, <doi:10.1111/j.1467-9876.2009.00678.x> Reichert, P., and Mieleitner, J. (2009), Analyzing input and structural uncertainty of nonlinear dynamic models with stochastic, time-dependent parameters. Water Resources Research, 45, W10402, <doi:10.1029/2009WR007814> Reichert, P., Ammann, L. and Fenicia, F. (2021), Potential and challenges of investigating intrinsic uncertainty of hydrological models with time-dependent, stochastic parameters. Water Resources Research 57(8), e2020WR028311, <doi:10.1029/2020WR028311> Reichert, P. (2022), timedeppar: An R package for inferring stochastic, time-dependent model parameters, in preparation.
Efficiently implementing two complementary methodologies for discovering motifs in functional data: ProbKMA and FunBIalign. Cremona and Chiaromonte (2023) "Probabilistic K-means with Local Alignment for Clustering and Motif Discovery in Functional Data" <doi:10.1080/10618600.2022.2156522> is a probabilistic K-means algorithm that leverages local alignment and fuzzy clustering to identify recurring patterns (candidate functional motifs) across and within curves, allowing different portions of the same curve to belong to different clusters. It includes a family of distances and a normalization to discover various motif types and learns motif lengths in a data-driven manner. It can also be used for local clustering of misaligned data. Di Iorio, Cremona, and Chiaromonte (2023) "funBIalign: A Hierarchical Algorithm for Functional Motif Discovery Based on Mean Squared Residue Scores" <doi:10.48550/arXiv.2306.04254> applies hierarchical agglomerative clustering with a functional generalization of the Mean Squared Residue Score to identify motifs of a specified length in curves. This deterministic method includes a small set of user-tunable parameters. Both algorithms are suitable for single curves or sets of curves. The package also includes a flexible function to simulate functional data with embedded motifs, allowing users to generate benchmark datasets for validating and comparing motif discovery methods.
Monolix is a tool for running mixed effects model using saem'. This tool allows you to convert Monolix models to rxode2 (Wang, Hallow and James (2016) <doi:10.1002/psp4.12052>) using the form compatible with nlmixr2 (Fidler et al (2019) <doi:10.1002/psp4.12445>). If available, the rxode2 model will read in the Monolix data and compare the simulation for the population model individual model and residual model to immediately show how well the translation is performing. This saves the model development time for people who are creating an rxode2 model manually. Additionally, this package reads in all the information to allow simulation with uncertainty (that is the number of observations, the number of subjects, and the covariance matrix) with a rxode2 model. This is complementary to the babelmixr2 package that translates nlmixr2 models to Monolix and can convert the objects converted from monolix2rx to a full nlmixr2 fit. While not required, you can get/install the lixoftConnectors package in the Monolix installation, as described at the following url <https://monolixsuite.slp-software.com/r-functions/2024R1/installation-and-initialization>. When lixoftConnectors is available, Monolix can be used to load its model library instead manually setting up text files (which only works with old versions of Monolix').
This package contains functions for hidden Markov models with observations having extra zeros as defined in the following two publications, Wang, T., Zhuang, J., Obara, K. and Tsuruoka, H. (2016) <doi:10.1111/rssc.12194>; Wang, T., Zhuang, J., Buckby, J., Obara, K. and Tsuruoka, H. (2018) <doi:10.1029/2017JB015360>. The observed response variable is either univariate or bivariate Gaussian conditioning on presence of events, and extra zeros mean that the response variable takes on the value zero if nothing is happening. Hence the response is modelled as a mixture distribution of a Bernoulli variable and a continuous variable. That is, if the Bernoulli variable takes on the value 1, then the response variable is Gaussian, and if the Bernoulli variable takes on the value 0, then the response is zero too. This package includes functions for simulation, parameter estimation, goodness-of-fit, the Viterbi algorithm, and plotting the classified 2-D data. Some of the functions in the package are based on those of the R package HiddenMarkov by David Harte. This updated version has included an example dataset and R code examples to show how to transform the data into the objects needed in the main functions. We have also made changes to increase the speed of some of the functions.
This package provides functions are included for recalling AQL (Acceptable Quality Level or Acceptance Quality Level) Based single, double, and multiple attribute sampling plans from the Military Standard (MIL-STD-105E) - American National Standards Institute/American Society for Quality (ANSI/ASQ Z1.4) tables and for retrieving variable sampling plans from Military Standard (MIL-STD-414) - American National Standards Institute/American Society for Quality (ANSI/ASQ Z1.9) tables. The sources for these tables are listed in the URL: field. Also included are functions for computing the OC (Operating Characteristic) and ASN (Average Sample Number) coordinates for the attribute plans it recalls, and functions for computing the estimated proportion nonconforming and the maximum allowable proportion nonconforming for variable sampling plans. The MIL-STD AQL Sampling schemes were the most used and copied set of standards in the world. They are intended to be used for sampling a stream of lots, and were used in contract agreements between supplier and customer companies. When the US military dropped support of MIL-STD 105E and 414, The American National Standards Institute (ANSI) and the International Standards Organization (ISO) adopted the standard with few changes or no changes to the central tables. This package is useful because its computer implementation of these tables duplicates that available in other commercial software and subscription online calculators.
Motivation: The understanding of cancer mechanism requires the identification of genes playing a role in the development of the pathology and the characterization of their role (notably oncogenes and tumor suppressors). Results: We present an R/bioconductor package called MoonlightR which returns a list of candidate driver genes for specific cancer types on the basis of TCGA expression data. The method first infers gene regulatory networks and then carries out a functional enrichment analysis (FEA) (implementing an upstream regulator analysis, URA) to score the importance of well-known biological processes with respect to the studied cancer type. Eventually, by means of random forests, MoonlightR predicts two specific roles for the candidate driver genes: i) tumor suppressor genes (TSGs) and ii) oncogenes (OCGs). As a consequence, this methodology does not only identify genes playing a dual role (e.g. TSG in one cancer type and OCG in another) but also helps in elucidating the biological processes underlying their specific roles. In particular, MoonlightR can be used to discover OCGs and TSGs in the same cancer type. This may help in answering the question whether some genes change role between early stages (I, II) and late stages (III, IV) in breast cancer. In the future, this analysis could be useful to determine the causes of different resistances to chemotherapeutic treatments.
Generates Muller plot from parental/genealogy/phylogeny information and population/abundance/frequency dynamics data. Muller plots are plots which combine information about succession of different OTUs (genotypes, phenotypes, species, ...) and information about dynamics of their abundances (populations or frequencies) over time. They are powerful and fascinating tools to visualize evolutionary dynamics. They may be employed also in study of diversity and its dynamics, i.e. how diversity emerges and how changes over time. They are called Muller plots in honor of Hermann Joseph Muller which used them to explain his idea of Muller's ratchet (Muller, 1932, American Naturalist). A big difference between Muller plots and normal box plots of abundances is that a Muller plot depicts not only the relative abundances but also succession of OTUs based on their genealogy/phylogeny/parental relation. In a Muller plot, horizontal axis is time/generations and vertical axis represents relative abundances of OTUs at the corresponding times/generations. Different OTUs are usually shown with polygons with different colors and each OTU originates somewhere in the middle of its parent area in order to illustrate their succession in evolutionary process. To generate a Muller plot one needs the genealogy/phylogeny/parental relation of OTUs and their abundances over time. MullerPlot package has the tools to generate Muller plots which clearly depict the origin of successors of OTUs.
Supervised, multivariate, and non-parametric discretization algorithm based on tree ensembles learning and moment matching optimization. This version of the algorithm relies on random forest algorithm to learn a large set of split points that conserves the relationship between attributes and the target class, and on moment matching optimization to transform this set into a reduced number of cut points matching as well as possible statistical properties of the initial set of split points. For each attribute to be discretized, the set S of its related split points extracted through random forest is mapped to a reduced set C of cut points of size k. This mapping relies on minimizing, for each continuous attribute to be discretized, the distance between the four first moments of S and the four first moments of C subject to some constraints. This non-linear optimization problem is performed using k values ranging from 2 to max_splits', and the best solution returned correspond to the value k which optimum solution is the lowest one over the different realizations. ForestDisc is a generalization of RFDisc discretization method initially proposed by Berrado and Runger (2009) <doi:10.1109/AICCSA.2009.5069327>, and improved by Berrado et al. in 2012 by adopting the idea of moment matching optimization related by Hoyland and Wallace (2001) <doi: 10.1287/mnsc.47.2.295.9834>.
Creating, and refining data nuggets. Data nuggets reduce a large dataset into a small collection of nuggets of data, each containing a center (location), weight (importance), and scale (variability) parameter. Data nugget centers are created by choosing observations in the dataset which are as equally spaced apart as possible. Data nugget weights are created by counting the number observations closest to a given data nugget center. We then say the data nugget contains these observations and the data nugget center is recalculated as the mean of these observations. Data nugget scales are created by calculating the trace of the covariance matrix of the observations contained within a data nugget divided by the dimension of the dataset. Data nuggets are refined by splitting data nuggets which have scales or shapes (defined as the ratio of the two largest eigenvalues of the covariance matrix of the observations contained within the data nugget) Reference paper: [1] Beavers, T. E., Cheng, G., Duan, Y., Cabrera, J., Lubomirski, M., Amaratunga, D., & Teigler, J. E. (2024). Data Nuggets: A Method for Reducing Big Data While Preserving Data Structure. Journal of Computational and Graphical Statistics, 1-21. [2] Cherasia, K. E., Cabrera, J., Fernholz, L. T., & Fernholz, R. (2022). Data Nuggets in Supervised Learning. \emphIn Robust and Multivariate Statistical Methods: Festschrift in Honor of David E. Tyler (pp. 429-449). Cham: Springer International Publishing.
Genomic signatures represent unique features within a species DNA, enabling the differentiation of species and offering broad applications across various fields. This package provides essential tools for calculating these specific signatures, streamlining the process for researchers and offering a comprehensive and time-saving solution for genomic analysis.The amino acid contents are identified based on the work published by Sandberg et al. (2003) <doi:10.1016/s0378-1119(03)00581-x> and Xiao et al. (2015) <doi:10.1093/bioinformatics/btv042>. The Average Mutual Information Profiles (AMIP) values are calculated based on the work of Bauer et al. (2008) <doi:10.1186/1471-2105-9-48>. The Chaos Game Representation (CGR) plot visualization was done based on the work of Deschavanne et al. (1999) <doi:10.1093/oxfordjournals.molbev.a026048> and Jeffrey et al. (1990) <doi:10.1093/nar/18.8.2163>. The GC content is calculated based on the work published by Nakabachi et al. (2006) <doi:10.1126/science.1134196> and Barbu et al. (1956) <https://pubmed.ncbi.nlm.nih.gov/13363015>. The Oligonucleotide Frequency Derived Error Gradient (OFDEG) values are computed based on the work published by Saeed et al. (2009) <doi:10.1186/1471-2164-10-S3-S10>. The Relative Synonymous Codon Usage (RSCU) values are calculated based on the work published by Elek (2018) <https://urn.nsk.hr/urn:nbn:hr:217:686131>.
Assists in automating the selection of terms to include in mixed models when asreml is used to fit the models. Procedures are available for choosing models that conform to the hierarchy or marginality principle, for fitting and choosing between two-dimensional spatial models using correlation, natural cubic smoothing spline and P-spline models. A history of the fitting of a sequence of models is kept in a data frame. Also used to compute functions and contrasts of, to investigate differences between and to plot predictions obtained using any model fitting function. The content falls into the following natural groupings: (i) Data, (ii) Model modification functions, (iii) Model selection and description functions, (iv) Model diagnostics and simulation functions, (v) Prediction production and presentation functions, (vi) Response transformation functions, (vii) Object manipulation functions, and (viii) Miscellaneous functions (for further details see asremlPlus-package in help). The asreml package provides a computationally efficient algorithm for fitting a wide range of linear mixed models using Residual Maximum Likelihood. It is a commercial package and a license for it can be purchased from VSNi <https://vsni.co.uk/> as asreml-R', who will supply a zip file for local installation/updating (see <https://asreml.kb.vsni.co.uk/>). It is not needed for functions that are methods for alldiffs and data.frame objects. The package asremPlus can also be installed from <http://chris.brien.name/rpackages/>.
This package provides methods for working with dose-finding clinical trials. We provide implementations of many dose-finding clinical trial designs, including the continual reassessment method (CRM) by O'Quigley et al. (1990) <doi:10.2307/2531628>, the toxicity probability interval (TPI) design by Ji et al. (2007) <doi:10.1177/1740774507079442>, the modified TPI (mTPI) design by Ji et al. (2010) <doi:10.1177/1740774510382799>, the Bayesian optimal interval design (BOIN) by Liu & Yuan (2015) <doi:10.1111/rssc.12089>, EffTox by Thall & Cook (2004) <doi:10.1111/j.0006-341X.2004.00218.x>; the design of Wages & Tait (2015) <doi:10.1080/10543406.2014.920873>, and the 3+3 described by Korn et al. (1994) <doi:10.1002/sim.4780131802>. All designs are implemented with a common interface. We also offer optional additional classes to tailor the behaviour of all designs, including avoiding skipping doses, stopping after n patients have been treated at the recommended dose, stopping when a toxicity condition is met, or demanding that n patients are treated before stopping is allowed. By daisy-chaining together these classes using the pipe operator from magrittr', it is simple to tailor the behaviour of a dose-finding design so it behaves how the trialist wants. Having provided a flexible interface for specifying designs, we then provide functions to run simulations and calculate dose-paths for future cohorts of patients.
This package provides a collection of methods to estimate parameters of different tempered stable distributions (TSD). Currently, there are seven different tempered stable distributions to choose from: Tempered stable subordinator distribution, classical TSD, generalized classical TSD, normal TSD, modified TSD, rapid decreasing TSD, and Kim-Rachev TSD. The package also provides functions to compute density and probability functions and tools to run Monte Carlo simulations. This package has already been used for the estimation of tempered stable distributions (Massing (2023) <arXiv:2303.07060>). The following references form the theoretical background for various functions in this package. References for each function are explicitly listed in its documentation: Bianchi et al. (2010) <doi:10.1007/978-88-470-1481-7_4> Bianchi et al. (2011) <doi:10.1137/S0040585X97984632> Carrasco (2017) <doi:10.1017/S0266466616000025> Feuerverger (1981) <doi:10.1111/j.2517-6161.1981.tb01143.x> Hansen et al. (1996) <doi:10.1080/07350015.1996.10524656> Hansen (1982) <doi:10.2307/1912775> Hofert (2011) <doi:10.1145/2043635.2043638> Kawai & Masuda (2011) <doi:10.1016/j.cam.2010.12.014> Kim et al. (2008) <doi:10.1016/j.jbankfin.2007.11.004> Kim et al. (2009) <doi:10.1007/978-3-7908-2050-8_5> Kim et al. (2010) <doi:10.1016/j.jbankfin.2010.01.015> Kuechler & Tappe (2013) <doi:10.1016/j.spa.2013.06.012> Rachev et al. (2011) <doi:10.1002/9781118268070>.
The Statistical Learning Theory (SLT) provides the theoretical background to ensure that a supervised algorithm generalizes the mapping f:X -> Y given f is selected from its search space bias F. This formal result depends on the Shattering coefficient function N(F,2n) to upper bound the empirical risk minimization principle, from which one can estimate the necessary training sample size to ensure the probabilistic learning convergence and, most importantly, the characterization of the capacity of F, including its under and overfitting abilities while addressing specific target problems. In this context, we propose a new approach to estimate the maximal number of hyperplanes required to shatter a given sample, i.e., to separate every pair of points from one another, based on the recent contributions by Har-Peled and Jones in the dataset partitioning scenario, and use such foundation to analytically compute the Shattering coefficient function for both binary and multi-class problems. As main contributions, one can use our approach to study the complexity of the search space bias F, estimate training sample sizes, and parametrize the number of hyperplanes a learning algorithm needs to address some supervised task, what is specially appealing to deep neural networks. Reference: de Mello, R.F. (2019) "On the Shattering Coefficient of Supervised Learning Algorithms" <arXiv:1911.05461>; de Mello, R.F., Ponti, M.A. (2018, ISBN: 978-3319949888) "Machine Learning: A Practical Approach on the Statistical Learning Theory".
Which uses Twitter APIs for the necessary data in sentiment analysis, acts as a middleware with the approved Twitter Application. A special access key is given to users who subscribe to the application with their Twitter account. With this special access key, the user defined keyword for sentiment analysis can be searched in twitter recent searches and results can be obtained( more information <https://github.com/hakkisabah/tsentiment> ). In addition, a service named tsentiment-services has been developed to provide all these operations ( for more information <https://github.com/hakkisabah/tsentiment-services> ). After the successful results obtained and in line with the permissions given by the user, the results of the analysis of the word cloud and bar graph saved in the user folder directory can be seen. In each analysis performed, the previous analysis visual result is deleted and this is the basic information you need to know as a practice rule. tsentiment package provides a free service that acts as a middleware for easy data extraction from Twitter, and in return, the user rate limit is reduced by 30 requests from the total limit and the remaining requests are used. These 30 requests are reserved for use in application analytics. For information about endpoints, you can refer to the limit information in the "GET search/tweets" row in the Endpoints column in the list at <https://developer.twitter.com/en/docs/twitter-api/v1/rate-limits>.
Helps calculate statistical values commonly used in meta-analysis. It provides several methods to compute different forms of standardized mean differences, as well as other values such as standard errors and standard deviations. The methods used in this package are described in the following references: Altman D G, Bland J M. (2011) <doi:10.1136/bmj.d2090> Borenstein, M., Hedges, L.V., Higgins, J.P.T. and Rothstein, H.R. (2009) <doi:10.1002/9780470743386.ch4> Chinn S. (2000) <doi:10.1002/1097-0258(20001130)19:22%3C3127::aid-sim784%3E3.0.co;2-m> Cochrane Handbook (2011) <https://handbook-5-1.cochrane.org/front_page.htm> Cooper, H., Hedges, L. V., & Valentine, J. C. (2009) <https://psycnet.apa.org/record/2009-05060-000> Cohen, J. (1977) <https://psycnet.apa.org/record/1987-98267-000> Ellis, P.D. (2009) <https://www.psychometrica.de/effect_size.html> Goulet-Pelletier, J.-C., & Cousineau, D. (2018) <doi:10.20982/tqmp.14.4.p242> Hedges, L. V. (1981) <doi:10.2307/1164588> Hedges L. V., Olkin I. (1985) <doi:10.1016/C2009-0-03396-0> Murad M H, Wang Z, Zhu Y, Saadi S, Chu H, Lin L et al. (2023) <doi:10.1136/bmj-2022-073141> Mayer M (2023) <https://search.r-project.org/CRAN/refmans/confintr/html/ci_proportion.html> Stackoverflow (2014) <https://stats.stackexchange.com/questions/82720/confidence-interval-around-binomial-estimate-of-0-or-1> Stackoverflow (2018) <https://stats.stackexchange.com/q/338043>.
This package implements the Oaxaca-Blinder decomposition method and generalizations of it that decompose differences in distributional statistics beyond the mean. The function ob_decompose() decomposes differences in the mean outcome between two groups into one part explained by different covariates (composition effect) and into another part due to differences in the way covariates are linked to the outcome variable (structure effect). The function further divides the two effects into the contribution of each covariate and allows for weighted doubly robust decompositions. For distributional statistics beyond the mean, the function performs the recentered influence function (RIF) decomposition proposed by Firpo, Fortin, and Lemieux (2018). The function dfl_decompose() divides differences in distributional statistics into an composition effect and a structure effect using inverse probability weighting as introduced by DiNardo, Fortin, and Lemieux (1996). The function also allows to sequentially decompose the composition effect into the contribution of single covariates. References: Firpo, Sergio, Nicole M. Fortin, and Thomas Lemieux. (2018) <doi:10.3390/econometrics6020028>. "Decomposing Wage Distributions Using Recentered Influence Function Regressions." Fortin, Nicole M., Thomas Lemieux, and Sergio Firpo. (2011) <doi:10.3386/w16045>. "Decomposition Methods in Economics." DiNardo, John, Nicole M. Fortin, and Thomas Lemieux. (1996) <doi:10.2307/2171954>. "Labor Market Institutions and the Distribution of Wages, 1973-1992: A Semiparametric Approach." Oaxaca, Ronald. (1973) <doi:10.2307/2525981>. "Male-Female Wage Differentials in Urban Labor Markets." Blinder, Alan S. (1973) <doi:10.2307/144855>. "Wage Discrimination: Reduced Form and Structural Estimates.".
Implementation of the Factorized Binary Search (FaBiSearch) methodology for the estimation of the number and the location of multiple change points in the network (or clustering) structure of multivariate high-dimensional time series. The method is motivated by the detection of change points in functional connectivity networks for functional magnetic resonance imaging (fMRI) data. FaBiSearch uses non-negative matrix factorization (NMF), an unsupervised dimension reduction technique, and a new binary search algorithm to identify multiple change points. It requires minimal assumptions. Lastly, we provide interactive, 3-dimensional, brain-specific network visualization capability in a flexible, stand-alone function. This function can be conveniently used with any node coordinate atlas, and nodes can be color coded according to community membership, if applicable. The output is an elegantly displayed network laid over a cortical surface, which can be rotated in the 3-dimensional space. The main routines of the package are detect.cps(), for multiple change point detection, est.net(), for estimating a network between stationary multivariate time series, net.3dplot(), for plotting the estimated functional connectivity networks, and opt.rank(), for finding the optimal rank in NMF for a given data set. The functions have been extensively tested on simulated multivariate high-dimensional time series data and fMRI data. For details on the FaBiSearch methodology, please see Ondrus et al. (2021) <arXiv:2103.06347>. For a more detailed explanation and applied examples of the fabisearch package, please see Ondrus and Cribben (2022), preprint.
Brings together a comprehensive collection of R packages providing access to API functions and curated datasets from Argentina, Brazil, Chile, Colombia, and Peru. Includes real-time and historical data through public RESTful APIs ('Nager.Date', World Bank API, REST Countries API, and country-specific APIs) and extensive curated collections of open datasets covering economics, demographics, public health, environmental data, political indicators, social metrics, and cultural information. Designed to provide researchers, analysts, educators, and data scientists with centralized access to Latin American data sources, facilitating reproducible research, comparative analysis, and teaching applications focused on these five major Latin American countries. Included packages: - ArgentinAPI': API functions and curated datasets for Argentina covering exchange rates, inflation, political figures, national holidays and more. - BrazilDataAPI': API functions and curated datasets for Brazil covering postal codes, banks, economic indicators, holidays, company registrations and more. - ChileDataAPI': API functions and curated datasets for Chile covering financial indicators ('UF', UTM, Dollar, Euro, Yen, Copper, Bitcoin, IPSA index), holidays and more. - ColombiAPI': API functions and curated datasets for Colombia covering geographic locations, cultural attractions, economic indicators, demographic data, national holidays and more. - PeruAPIs': API functions and curated datasets for Peru covering economic indicators, demographics, national holidays, administrative divisions, electoral data, biodiversity and more. For more information on the APIs, see: Nager.Date <https://date.nager.at/Api>, World Bank API <https://datahelpdesk.worldbank.org/knowledgebase/articles/889392>, REST Countries API <https://restcountries.com/>, ArgentinaDatos API <https://argentinadatos.com/>, BrasilAPI <https://brasilapi.com.br/>, FINDIC <https://findic.cl/>, and API-Colombia <https://api-colombia.com/>.
This package implements state-of-the-art algorithms for the Bayesian analysis of Structural Vector Autoregressions (SVARs) identified by sign, zero, and narrative restrictions. The core model is based on a flexible Vector Autoregression with estimated hyper-parameters of the Minnesota prior and the dummy observation priors as in Giannone, Lenza, Primiceri (2015) <doi:10.1162/REST_a_00483>. The sign restrictions are implemented employing the methods proposed by Rubio-Ramà rez, Waggoner & Zha (2010) <doi:10.1111/j.1467-937X.2009.00578.x>, while identification through sign and zero restrictions follows the approach developed by Arias, Rubio-Ramà rez, & Waggoner (2018) <doi:10.3982/ECTA14468>. Furthermore, our tool provides algorithms for identification via sign and narrative restrictions, in line with the methods introduced by Antolà n-Dà az and Rubio-Ramà rez (2018) <doi:10.1257/aer.20161852>. Users can also estimate a model with sign, zero, and narrative restrictions imposed at once. The package facilitates predictive and structural analyses using impulse responses, forecast error variance and historical decompositions, forecasting and conditional forecasting, as well as analyses of structural shocks and fitted values. All this is complemented by colourful plots, user-friendly summary functions, and comprehensive documentation including the vignette by Wang & Woźniak (2024) <doi:10.48550/arXiv.2501.16711>. The bsvarSIGNs package is aligned regarding objects, workflows, and code structure with the R package bsvars by Woźniak (2024) <doi:10.32614/CRAN.package.bsvars>, and they constitute an integrated toolset. It was granted the Di Cook Open-Source Statistical Software Award by the Statistical Society of Australia in 2024.
It generates summary statistics on the input dataset using different descriptive univariate statistical measures on entire data or at a group level. Though there are other packages which does similar job but each of these are deficient in one form or other, in the measures generated, in treating numeric, character and date variables alike, no functionality to view these measures on a group level or the way the output is represented. Given the foremost role of the descriptive statistics in any of the exploratory data analysis or solution development, there is a need for a more constructive, structured and refined version over these packages. This is the idea behind the package and it brings together all the required descriptive measures to give an initial understanding of the data quality, distribution in a faster,easier and elaborative way.The function brings an additional capability to be able to generate these statistical measures on the entire dataset or at a group level. It calculates measures of central tendency (mean, median), distribution (count, proportion), dispersion (min, max, quantile, standard deviation, variance) and shape (skewness, kurtosis). Addition to these measures, it provides information on the data type, count on no. of rows, unique entries and percentage of missing entries. More importantly the measures are generated based on the data types as required by them,rather than applying numerical measures on character and data variables and vice versa. Output as a dataframe object gives a very neat representation, which often is useful when working with a large number of columns. It can easily be exported as csv and analyzed further or presented as a summary report for the data.
Transfer learning, aiming to use auxiliary domains to help improve learning of the target domain of interest when multiple heterogeneous datasets are available, has always been a hot topic in statistical machine learning. The recent transfer learning methods with statistical guarantees mainly focus on the overall parameter transfer for supervised models in the ideal case with the informative auxiliary domains with overall similarity. In contrast, transfer learning for unsupervised graph learning is in its infancy and largely follows the idea of overall parameter transfer as for supervised learning. In this package, the transfer learning for several complex graphical models is implemented, including Tensor Gaussian graphical models, non-Gaussian directed acyclic graph (DAG), and Gaussian graphical mixture models. Notably, this package promotes local transfer at node-level and subgroup-level in DAG structural learning and Gaussian graphical mixture models, respectively, which are more flexible and robust than the existing overall parameter transfer. As by-products, transfer learning for undirected graphical model (precision matrix) via D-trace loss, transfer learning for mean vector estimation, and single non-Gaussian learning via topological layer method are also included in this package. Moreover, the aggregation of auxiliary information is an important issue in transfer learning, and this package provides multiple user-friendly aggregation methods, including sample weighting, similarity weighting, and most informative selection. Reference: Ren, M., Zhen Y., and Wang J. (2022) <arXiv:2211.09391> "Transfer learning for tensor graphical models". Ren, M., He X., and Wang J. (2023) <arXiv:2310.10239> "Structural transfer learning of non-Gaussian DAG". Zhao, R., He X., and Wang J. (2022) <https://jmlr.org/papers/v23/21-1173.html> "Learning linear non-Gaussian directed acyclic graph with diverging number of nodes".
Gene Set Enrichment Analysis is a very powerful and interesting computational method that allows an easy correlation between differential expressed genes and biological processes. Unfortunately, although it was designed to help researchers to interpret gene expression data it can generate huge amounts of results whose biological meaning can be difficult to interpret. Many available tools rely on the hierarchically structured Gene Ontology (GO) classification to reduce reundandcy in the results. However, due to the popularity of GSEA many more gene set collections, such as those in the Molecular Signatures Database are emerging. Since these collections are not organized as those in GO, their usage for GSEA do not always give a straightforward answer or, in other words, getting all the meaninful information can be challenging with the currently available tools. For these reasons, GSEAmining was born to be an easy tool to create reproducible reports to help researchers make biological sense of GSEA outputs. Given the results of GSEA, GSEAmining clusters the different gene sets collections based on the presence of the same genes in the leadind edge (core) subset. Leading edge subsets are those genes that contribute most to the enrichment score of each collection of genes or gene sets. For this reason, gene sets that participate in similar biological processes should share genes in common and in turn cluster together. After that, GSEAmining is able to identify and represent for each cluster: - The most enriched terms in the names of gene sets (as wordclouds) - The most enriched genes in the leading edge subsets (as bar plots). In each case, positive and negative enrichments are shown in different colors so it is easy to distinguish biological processes or genes that may be of interest in that particular study.
API bindings to the Geospatial Data Abstraction Library ('GDAL', <https://gdal.org>). Implements the GDAL Raster and Vector Data Models. Bindings are implemented with Rcpp modules. Exposed C++ classes and stand-alone functions wrap much of the GDAL API and provide additional functionality. Calling signatures resemble the native C, C++ and Python APIs provided by the GDAL project. Class GDALRaster encapsulates a GDALDataset and its raster band objects. Class GDALVector encapsulates an OGRLayer and the GDALDataset that contains it. Initial bindings are provided to the unified gdal command line interface added in GDAL 3.11. C++ stand-alone functions provide bindings to most GDAL "traditional" raster and vector utilities, including OGR facilities for vector geoprocessing, several algorithms, as well as the Geometry API ('GEOS via GDAL headers), the Spatial Reference Systems API, and methods for coordinate transformation. Bindings to the Virtual Systems Interface ('VSI') API implement standard file system operations abstracted for URLs, cloud storage services, Zip'/'GZip'/'7z'/'RAR', in-memory files, as well as regular local file systems. This provides a single interface for operating on file system objects that works the same for any storage backend. A custom raster calculator evaluates a user-defined R expression on a layer or stack of layers, with pixel x/y available as variables in the expression. Raster combine() identifies and counts unique pixel combinations across multiple input layers, with optional raster output of the pixel-level combination IDs. Basic plotting capability is provided for raster and vector display. gdalraster leans toward minimalism and the use of simple, lightweight objects for holding raw data. Currently, only minimal S3 class interfaces have been implemented for selected R objects that contain spatial data. gdalraster may be useful in applications that need scalable, low-level I/O, or prefer a direct GDAL API.