This package provides a comprehensive interface for accessing diverse public data about Colombia through multiple APIs and curated datasets. The package integrates four APIs: API-Colombia for Colombian-specific data including geography, culture, tourism, and government information; the World Bank API for economic and demographic indicators; Nager.Date for public holidays; and the REST Countries API for general country information. Users can explore many aspects of Colombia, such as geographic locations, cultural attractions, economic indicators, demographic data, and public holidays. Additionally, ColombiAPI includes curated datasets covering Bogota air stations, business and holiday dates, public schools, Colombian coffee exports, cannabis licenses, Medellin rainfall, and malls in Bogota, as well as datasets on indigenous languages, student admissions and school statistics, forest liana mortality, municipal and regional data, connectivity and digital infrastructure, program graduates, vehicle counts, international visitors, and GDP projections. Together these datasets provide a rich, multifaceted view of Colombian social, economic, environmental, and technological information, making ColombiAPI a single entry point for exploring Colombia's diverse data landscape. For more information on the APIs, see: API-Colombia <https://api-colombia.com/>, Nager.Date <https://date.nager.at/Api>, World Bank API <https://datahelpdesk.worldbank.org/knowledgebase/articles/889392>, and REST Countries API <https://restcountries.com/>.
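As an illustration of the kind of REST service the package wraps, the underlying API-Colombia endpoints can be queried directly with httr and jsonlite; the endpoint path below is taken from the public documentation at <https://api-colombia.com/> and should be treated as an assumption that may change between API versions.

```r
# Sketch: query the API-Colombia REST service directly (endpoint path
# assumed from the public docs; the package wraps such calls for you).
library(httr)
library(jsonlite)

resp <- GET("https://api-colombia.com/api/v1/Country/Colombia")
stop_for_status(resp)
country <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
country$name        # "Colombia"
country$population  # population figure served by the API
```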
Provides distance measures (GDM1, GDM2, Sokal-Michener, Bray-Curtis, and measures for symbolic interval-valued data), cluster quality indices (Calinski-Harabasz, Baker-Hubert, Hubert-Levine, Silhouette, Krzanowski-Lai, Hartigan, Gap, Davies-Bouldin), data normalization formulas (metric data, interval-valued symbolic data), data generation (typical and non-typical data), the HINoV method, replication analysis, linear ordering methods, spectral clustering, agreement indices between two partitions, and plot functions (for categorical and symbolic interval-valued data). (MILLIGAN, G.W., COOPER, M.C. (1985) <doi:10.1007/BF02294245>, HUBERT, L., ARABIE, P. (1985) <doi:10.1007/BF01908075>, RAND, W.M. (1971) <doi:10.1080/01621459.1971.10482356>, JAJUGA, K., WALESIAK, M. (2000) <doi:10.1007/978-3-642-57280-7_11>, MILLIGAN, G.W., COOPER, M.C. (1988) <doi:10.1007/BF01897163>, JAJUGA, K., WALESIAK, M., BAK, A. (2003) <doi:10.1007/978-3-642-55721-7_12>, DAVIES, D.L., BOULDIN, D.W. (1979) <doi:10.1109/TPAMI.1979.4766909>, CALINSKI, T., HARABASZ, J. (1974) <doi:10.1080/03610927408827101>, HUBERT, L. (1974) <doi:10.1080/01621459.1974.10480191>, TIBSHIRANI, R., WALTHER, G., HASTIE, T. (2001) <doi:10.1111/1467-9868.00293>, BRECKENRIDGE, J.N. (2000) <doi:10.1207/S15327906MBR3502_5>, WALESIAK, M., DUDEK, A. (2008) <doi:10.1007/978-3-540-78246-9_11>).
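For orientation, the Calinski-Harabasz index named above can be computed from first principles in a few lines of base R; this sketch only illustrates the statistic itself and is not the package's own index function.

```r
# Illustrative base-R computation of the Calinski-Harabasz index for a
# k-means partition (not the package API).
set.seed(1)
x  <- rbind(matrix(rnorm(100, 0), ncol = 2), matrix(rnorm(100, 4), ncol = 2))
cl <- kmeans(x, centers = 2)

ch_index <- function(x, cluster) {
  n <- nrow(x); k <- length(unique(cluster))
  grand   <- colMeans(x)
  centers <- apply(x, 2, function(col) tapply(col, cluster, mean))
  sizes   <- as.numeric(table(cluster))
  # between-group and within-group sums of squares
  bgss <- sum(sizes * rowSums((centers - matrix(grand, k, ncol(x), byrow = TRUE))^2))
  wgss <- sum((x - centers[cluster, ])^2)
  (bgss / (k - 1)) / (wgss / (n - k))
}
ch_index(x, cl$cluster)
```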
We developed an inference tool based on approximate Bayesian computation (ABC) to decipher network data and assess the strength of the inferred links between the network's actors. It is a new multi-level ABC approach. At the first level, the method captures the global properties of the network, such as a scale-free structure and clustering coefficients, whereas the second level targets local properties, including the probability of each pair of genes being linked. Until now, ABC algorithms have rarely been used in this setting and, owing to the computational overhead, their application was limited to a small number of genes. In contrast, our algorithm was designed to cope with this issue and has a low computational cost. It can be used, for instance, to elucidate gene regulatory networks, an important step towards understanding normal cell physiology and complex pathological phenotypes. Reverse engineering consists of using gene expression over time or over different experimental conditions to discover the structure of the gene network in a targeted cellular process. The fact that gene expression data are usually noisy, highly correlated, and high-dimensional explains the need for specific statistical methods to reverse engineer the underlying network.
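The flavor of the approach can be conveyed with a toy single-level ABC rejection sampler that matches one global summary statistic (the clustering coefficient) of a scale-free network; this is a deliberately simplified sketch, not the package's multi-level algorithm.

```r
# Toy ABC rejection: infer the preferential-attachment power of a
# scale-free network from its global clustering coefficient.
library(igraph)

set.seed(42)
obs   <- sample_pa(200, power = 1.2, m = 2, directed = FALSE)
s_obs <- transitivity(obs, type = "global")   # observed summary statistic

n_sim <- 2000; tol <- 0.01
prior <- runif(n_sim, 0.5, 2)                 # prior over the power parameter
accepted <- vapply(prior, function(p) {
  g <- sample_pa(200, power = p, m = 2, directed = FALSE)
  abs(transitivity(g, type = "global") - s_obs) < tol
}, logical(1))
summary(prior[accepted])                      # approximate posterior sample
```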
Infer constant and stochastic, time-dependent parameters to account for the intrinsic stochasticity of a dynamic model and/or to analyze model structure modifications that could reduce model deficits. The concept is based on inferring time-dependent parameters as stochastic processes in the form of Ornstein-Uhlenbeck processes, jointly with inferring constant model parameters and the parameters of the Ornstein-Uhlenbeck processes. The package also contains functions to sample from and calculate densities of Ornstein-Uhlenbeck processes. References: Tomassini, L., Reichert, P., Kuensch, H.-R., Buser, C., Knutti, R. and Borsuk, M.E. (2009), A smoothing algorithm for estimating stochastic, continuous-time model parameters and its application to a simple climate model, Journal of the Royal Statistical Society: Series C (Applied Statistics) 58, 679-704, <doi:10.1111/j.1467-9876.2009.00678.x>; Reichert, P. and Mieleitner, J. (2009), Analyzing input and structural uncertainty of nonlinear dynamic models with stochastic, time-dependent parameters, Water Resources Research, 45, W10402, <doi:10.1029/2009WR007814>; Reichert, P., Ammann, L. and Fenicia, F. (2021), Potential and challenges of investigating intrinsic uncertainty of hydrological models with time-dependent, stochastic parameters, Water Resources Research 57(8), e2020WR028311, <doi:10.1029/2020WR028311>; Reichert, P. (2022), timedeppar: An R package for inferring stochastic, time-dependent model parameters, in preparation.
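For intuition, an Ornstein-Uhlenbeck path can be sampled with its exact transition density in base R; the package provides its own (more general) OU sampling and density functions, so the helper below is purely illustrative.

```r
# Exact-discretization sampler for an OU process used as a
# time-dependent parameter theta(t) (illustrative helper only).
ou_sim <- function(n, dt, mu, gamma, sigma, x0 = mu) {
  x <- numeric(n); x[1] <- x0
  a <- exp(-gamma * dt)                          # AR(1) coefficient
  sd_step <- sigma * sqrt((1 - a^2) / (2 * gamma))
  for (i in 2:n) x[i] <- mu + (x[i - 1] - mu) * a + rnorm(1, 0, sd_step)
  x
}
set.seed(1)
plot(ou_sim(500, dt = 0.01, mu = 0, gamma = 2, sigma = 1),
     type = "l", xlab = "time step", ylab = "theta(t)")
```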
Efficiently implements two complementary methodologies for discovering motifs in functional data: ProbKMA and funBIalign. Cremona and Chiaromonte (2023) "Probabilistic K-means with Local Alignment for Clustering and Motif Discovery in Functional Data" <doi:10.1080/10618600.2022.2156522> is a probabilistic K-means algorithm that leverages local alignment and fuzzy clustering to identify recurring patterns (candidate functional motifs) across and within curves, allowing different portions of the same curve to belong to different clusters. It includes a family of distances and a normalization to discover various motif types, and learns motif lengths in a data-driven manner. It can also be used for local clustering of misaligned data. Di Iorio, Cremona, and Chiaromonte (2023) "funBIalign: A Hierarchical Algorithm for Functional Motif Discovery Based on Mean Squared Residue Scores" <doi:10.48550/arXiv.2306.04254> applies hierarchical agglomerative clustering with a functional generalization of the Mean Squared Residue Score to identify motifs of a specified length in curves. This deterministic method includes a small set of user-tunable parameters. Both algorithms are suitable for single curves or sets of curves. The package also includes a flexible function to simulate functional data with embedded motifs, allowing users to generate benchmark datasets for validating and comparing motif discovery methods.
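The idea of embedding motifs in synthetic curves can be sketched in base R as follows; the package's simulation function is far more flexible, and all names here are ad hoc.

```r
# Embed one shared motif at random positions in a few random-walk curves.
set.seed(7)
len <- 200; motif_len <- 30
motif  <- sin(seq(0, 2 * pi, length.out = motif_len))   # shared pattern
curves <- replicate(5, cumsum(rnorm(len, sd = 0.1)))    # background curves
starts <- sample(1:(len - motif_len), 5)
for (j in 1:5) {
  idx <- starts[j]:(starts[j] + motif_len - 1)
  curves[idx, j] <- curves[idx, j] + motif              # embed the motif
}
matplot(curves, type = "l", lty = 1, ylab = "x(t)")
```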
Monolix is a tool for running mixed effects models using 'saem'. This package allows you to convert Monolix models to rxode2 (Wang, Hallow and James (2016) <doi:10.1002/psp4.12052>) in a form compatible with nlmixr2 (Fidler et al. (2019) <doi:10.1002/psp4.12445>). If available, the rxode2 model will read in the Monolix data and compare simulations for the population, individual, and residual models to show immediately how well the translation performs. This saves model development time for people who would otherwise create an rxode2 model manually. Additionally, the package reads in all the information needed to simulate with uncertainty (that is, the number of observations, the number of subjects, and the covariance matrix) with an rxode2 model. This is complementary to the babelmixr2 package, which translates nlmixr2 models to Monolix and can convert the objects produced by monolix2rx to a full nlmixr2 fit. While not required, you can get/install the lixoftConnectors package in the Monolix installation, as described at <https://monolixsuite.slp-software.com/r-functions/2024R1/installation-and-initialization>. When lixoftConnectors is available, Monolix can be used to load its model library instead of manually setting up text files (which only works with old versions of Monolix).
Ports the Stata ado package tost, which provides a suite of commands to perform two one-sided tests for equivalence following the approach of Schuirmann (1987) <doi:10.1007/BF01068419>. Commands are provided for t tests on means, z tests on proportions, McNemar's (1947) <doi:10.1007/BF02295996> test on proportions and related tests, tests on regression coefficients from OLS linear regression (not yet implementing all of the current regression options from the Stata tostregress command, e.g., survey regression options, estimation options, etc.), Wilcoxon's (1945) <doi:10.2307/3001968> signed rank tests, and Wilcoxon-Mann-Whitney (1947) <doi:10.1214/aoms/1177730491> rank sum tests, supporting inference about equivalence for a number of paired and unpaired, parametric and nonparametric study designs and data types. Each command tests a null hypothesis that the samples were drawn from populations differing by at least plus or minus some researcher-defined tolerance, which can be defined in units of the data or rank units (Delta), or in units of the test statistic's distribution (epsilon), except for tost.rrp() and tost.rrpi(). Sufficient evidence rejects this null hypothesis in favor of equivalence within the tolerance. Equivalence intervals for all tests may be defined symmetrically or asymmetrically.
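The TOST logic itself is easy to reproduce with two one-sided t tests in base R, which may help clarify what the ported commands compute; this mirrors the approach, not the package's exact interface.

```r
# TOST for means in base R: equivalence is declared when both one-sided
# tests reject, i.e. the (1 - 2*alpha) CI for the mean difference lies
# inside (-Delta, +Delta).
set.seed(3)
x <- rnorm(40, mean = 0.1); y <- rnorm(40, mean = 0.0)
Delta <- 0.5                                   # equivalence tolerance
lower <- t.test(x, y, mu = -Delta, alternative = "greater")$p.value
upper <- t.test(x, y, mu =  Delta, alternative = "less")$p.value
p_tost <- max(lower, upper)                    # TOST p-value
p_tost < 0.05                                  # TRUE -> equivalent within Delta
```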
This package contains functions for hidden Markov models with observations having extra zeros, as defined in the following two publications: Wang, T., Zhuang, J., Obara, K. and Tsuruoka, H. (2016) <doi:10.1111/rssc.12194>; Wang, T., Zhuang, J., Buckby, J., Obara, K. and Tsuruoka, H. (2018) <doi:10.1029/2017JB015360>. The observed response variable is either univariate or bivariate Gaussian conditional on the presence of events, and the extra zeros mean that the response variable takes the value zero when nothing is happening. Hence the response is modelled as a mixture of a Bernoulli variable and a continuous variable: if the Bernoulli variable takes the value 1, the response is Gaussian, and if it takes the value 0, the response is zero as well. The package includes functions for simulation, parameter estimation, goodness-of-fit, the Viterbi algorithm, and plotting the classified 2-D data. Some of the functions in the package are based on those of the R package HiddenMarkov by David Harte. This updated version includes an example dataset and R code examples showing how to transform the data into the objects needed by the main functions, and some functions have been revised to run faster.
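The zero-inflated emission scheme can be illustrated by simulating a two-state chain in base R, where each state has its own event probability and Gaussian parameters; the package's simulation functions are more general, so treat this as a sketch.

```r
# Simulate zero-inflated Gaussian emissions from a 2-state Markov chain:
# the response is 0 whenever the Bernoulli event indicator is 0.
set.seed(11)
P  <- rbind(c(0.95, 0.05), c(0.10, 0.90))   # state transition matrix
p  <- c(0.2, 0.8)                           # event probability per state
mu <- c(0, 3); sd <- c(1, 0.5)              # Gaussian emission per state
n  <- 500
s  <- numeric(n); s[1] <- 1
for (t in 2:n) s[t] <- sample(1:2, 1, prob = P[s[t - 1], ])
event <- rbinom(n, 1, p[s])                 # Bernoulli presence indicator
y <- ifelse(event == 1, rnorm(n, mu[s], sd[s]), 0)
plot(y, type = "h", col = s)
```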
This package provides functions for recalling AQL (Acceptable Quality Level or Acceptance Quality Level) based single, double, and multiple attribute sampling plans from the Military Standard (MIL-STD-105E) - American National Standards Institute/American Society for Quality (ANSI/ASQ Z1.4) tables, and for retrieving variable sampling plans from the Military Standard (MIL-STD-414) - American National Standards Institute/American Society for Quality (ANSI/ASQ Z1.9) tables. The sources for these tables are listed in the URL field. Also included are functions for computing the OC (Operating Characteristic) and ASN (Average Sample Number) coordinates for the attribute plans it recalls, and functions for computing the estimated proportion nonconforming and the maximum allowable proportion nonconforming for variable sampling plans. The MIL-STD AQL sampling schemes were the most used and copied set of standards in the world. They are intended for sampling a stream of lots and were used in contract agreements between supplier and customer companies. When the US military dropped support of MIL-STD-105E and 414, the American National Standards Institute (ANSI) and the International Standards Organization (ISO) adopted the standards with few or no changes to the central tables. This package is useful because its implementation of these tables duplicates what is available in commercial software and subscription-based online calculators.
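The OC coordinates for a single attribute plan follow directly from the binomial distribution; here is a base-R sketch with an example plan (the n and c below are illustrative values, not ones recalled from the tables).

```r
# OC curve of a single attribute sampling plan: a lot is accepted when the
# number of nonconforming items in a sample of n is at most c.
n <- 80; c <- 2                               # example plan
p  <- seq(0, 0.10, by = 0.001)                # proportion nonconforming
Pa <- pbinom(c, size = n, prob = p)           # probability of acceptance
plot(p, Pa, type = "l", xlab = "proportion nonconforming",
     ylab = "P(accept)", main = "OC curve, n = 80, c = 2")
```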
Motivation: Understanding cancer mechanisms requires identifying genes that play a role in the development of the pathology and characterizing their role (notably oncogenes and tumor suppressors). Results: We present an R/Bioconductor package called MoonlightR which returns a list of candidate driver genes for specific cancer types on the basis of TCGA expression data. The method first infers gene regulatory networks and then carries out a functional enrichment analysis (FEA) (implementing an upstream regulator analysis, URA) to score the importance of well-known biological processes with respect to the studied cancer type. Finally, by means of random forests, MoonlightR predicts two specific roles for the candidate driver genes: i) tumor suppressor genes (TSGs) and ii) oncogenes (OCGs). As a consequence, this methodology not only identifies genes playing a dual role (e.g., TSG in one cancer type and OCG in another) but also helps elucidate the biological processes underlying their specific roles. In particular, MoonlightR can be used to discover OCGs and TSGs in the same cancer type. This may help answer whether some genes change role between early stages (I, II) and late stages (III, IV) in breast cancer. In the future, this analysis could be useful for determining the causes of different resistances to chemotherapeutic treatments.
Generates Muller plots from parental/genealogy/phylogeny information and population/abundance/frequency dynamics data. Muller plots combine information about the succession of different OTUs (genotypes, phenotypes, species, ...) with information about the dynamics of their abundances (populations or frequencies) over time. They are powerful and fascinating tools for visualizing evolutionary dynamics. They may also be employed in the study of diversity and its dynamics, i.e., how diversity emerges and how it changes over time. They are called Muller plots in honor of Hermann Joseph Muller, who used them to explain his idea of Muller's ratchet (Muller, 1932, American Naturalist). A big difference between Muller plots and normal box plots of abundances is that a Muller plot depicts not only the relative abundances but also the succession of OTUs based on their genealogy/phylogeny/parental relation. In a Muller plot, the horizontal axis is time/generations and the vertical axis represents the relative abundances of OTUs at the corresponding times/generations. Different OTUs are usually shown as polygons of different colors, and each OTU originates somewhere in the middle of its parent's area in order to illustrate succession in the evolutionary process. To generate a Muller plot one needs the genealogy/phylogeny/parental relation of the OTUs and their abundances over time. The MullerPlot package provides the tools to generate Muller plots which clearly depict the origin of successor OTUs.
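Concretely, the two required inputs can be sketched as plain R objects; the column names below are illustrative and not the package's required format.

```r
# The two ingredients of a Muller plot, as ad hoc R objects:
# 1) who descends from whom, and 2) abundances over time.
parental <- data.frame(parent = c(NA, "A", "A"),
                       child  = c("A", "B", "C"))
abund <- data.frame(time = rep(0:3, each = 3),
                    otu  = rep(c("A", "B", "C"), times = 4),
                    n    = c(10, 0, 0,  8, 2, 0,  5, 4, 1,  2, 6, 2))
# Each child polygon would open inside its parent's area at the first
# time point where its abundance becomes positive.
```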
Supervised, multivariate, and non-parametric discretization algorithm based on tree ensemble learning and moment matching optimization. This version of the algorithm relies on the random forest algorithm to learn a large set of split points that conserves the relationship between attributes and the target class, and on moment matching optimization to transform this set into a reduced number of cut points matching as well as possible the statistical properties of the initial set of split points. For each attribute to be discretized, the set S of its related split points extracted through random forest is mapped to a reduced set C of cut points of size k. This mapping relies on minimizing, for each continuous attribute, the distance between the first four moments of S and the first four moments of C, subject to some constraints. This non-linear optimization problem is solved for k ranging from 2 to 'max_splits', and the best solution returned corresponds to the value of k whose optimal solution is lowest across the different realizations. ForestDisc is a generalization of the RFDisc discretization method initially proposed by Berrado and Runger (2009) <doi:10.1109/AICCSA.2009.5069327> and improved by Berrado et al. in 2012 by adopting the idea of moment matching optimization proposed by Hoyland and Wallace (2001) <doi:10.1287/mnsc.47.2.295.9834>.
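A condensed base-R sketch of the two stages follows, using randomForest to harvest split points for one attribute and optim() with a plain least-squares moment loss standing in for the package's constrained optimizer.

```r
# Stage 1: harvest split points on one attribute from a random forest.
# Stage 2: choose k cut points whose first four moments match the set's.
library(randomForest)
set.seed(5)
rf <- randomForest(Species ~ ., data = iris, ntree = 100)
splits <- unlist(lapply(1:100, function(k) {
  tr <- getTree(rf, k)
  tr[tr[, "split var"] == 3, "split point"]   # splits on Petal.Length
}))

moments <- function(v) c(mean(v), var(v),
                         mean(scale(v)^3), mean(scale(v)^4))
target <- moments(splits)
loss <- function(cuts) sum((moments(cuts) - target)^2)
cuts <- sort(optim(quantile(splits, c(.25, .5, .75)), loss)$par)  # k = 3
cuts
```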
Creates and refines data nuggets. Data nuggets reduce a large dataset into a small collection of nuggets of data, each containing a center (location), weight (importance), and scale (variability) parameter. Data nugget centers are created by choosing observations in the dataset which are as equally spaced apart as possible. Data nugget weights are created by counting the number of observations closest to a given data nugget center. We then say the data nugget contains these observations, and the data nugget center is recalculated as the mean of these observations. Data nugget scales are created by calculating the trace of the covariance matrix of the observations contained within a data nugget, divided by the dimension of the dataset. Data nuggets are refined by splitting data nuggets whose scales or shapes (defined as the ratio of the two largest eigenvalues of the covariance matrix of the observations contained within the data nugget) are too large. Reference papers: [1] Beavers, T. E., Cheng, G., Duan, Y., Cabrera, J., Lubomirski, M., Amaratunga, D., & Teigler, J. E. (2024). Data Nuggets: A Method for Reducing Big Data While Preserving Data Structure. Journal of Computational and Graphical Statistics, 1-21. [2] Cherasia, K. E., Cabrera, J., Fernholz, L. T., & Fernholz, R. (2022). Data Nuggets in Supervised Learning. In Robust and Multivariate Statistical Methods: Festschrift in Honor of David E. Tyler (pp. 429-449). Cham: Springer International Publishing.
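Given a set of centers, the weight and scale definitions above translate directly into base R; this sketch omits the center selection, recalculation, and splitting steps.

```r
# Weights and scales of data nuggets for a fixed set of centers.
set.seed(9)
x <- matrix(rnorm(1000 * 2), ncol = 2)
centers <- x[sample(nrow(x), 50), ]                 # crude center choice
assign <- apply(x, 1, function(p)
  which.min(colSums((t(centers) - p)^2)))           # nearest center
weights <- tabulate(assign, nbins = 50)             # observations per nugget
scales <- sapply(1:50, function(j) {
  pts <- x[assign == j, , drop = FALSE]
  if (nrow(pts) < 2) return(0)
  sum(diag(cov(pts))) / ncol(x)                     # trace / dimension
})
head(cbind(weights, scales))
```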
Genomic signatures represent unique features within a species' DNA, enabling the differentiation of species and offering broad applications across various fields. This package provides essential tools for calculating these signatures, streamlining the process for researchers and offering a comprehensive, time-saving solution for genomic analysis. The amino acid contents are identified based on the work published by Sandberg et al. (2003) <doi:10.1016/s0378-1119(03)00581-x> and Xiao et al. (2015) <doi:10.1093/bioinformatics/btv042>. The Average Mutual Information Profiles (AMIP) values are calculated based on the work of Bauer et al. (2008) <doi:10.1186/1471-2105-9-48>. The Chaos Game Representation (CGR) plot visualization is based on the work of Deschavanne et al. (1999) <doi:10.1093/oxfordjournals.molbev.a026048> and Jeffrey et al. (1990) <doi:10.1093/nar/18.8.2163>. The GC content is calculated based on the work published by Nakabachi et al. (2006) <doi:10.1126/science.1134196> and Barbu et al. (1956) <https://pubmed.ncbi.nlm.nih.gov/13363015>. The Oligonucleotide Frequency Derived Error Gradient (OFDEG) values are computed based on the work published by Saeed et al. (2009) <doi:10.1186/1471-2164-10-S3-S10>. The Relative Synonymous Codon Usage (RSCU) values are calculated based on the work published by Elek (2018) <https://urn.nsk.hr/urn:nbn:hr:217:686131>.
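As a small example of one of these signatures, GC content can be computed from a raw DNA string in base R; the package works from sequence files, so this only shows the underlying formula.

```r
# GC content (%) of a DNA string: share of G and C among A/C/G/T bases.
gc_content <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  100 * sum(bases %in% c("G", "C")) / sum(bases %in% c("A", "C", "G", "T"))
}
gc_content("ATGCGCGATTACGGC")   # 60
```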
Assists in automating the selection of terms to include in mixed models when asreml is used to fit the models. Procedures are available for choosing models that conform to the hierarchy or marginality principle, and for fitting and choosing between two-dimensional spatial models using correlation, natural cubic smoothing spline, and P-spline models. A history of the fitting of a sequence of models is kept in a data frame. It can also be used to compute functions and contrasts of, investigate differences between, and plot predictions obtained using any model-fitting function. The content falls into the following natural groupings: (i) Data, (ii) Model modification functions, (iii) Model selection and description functions, (iv) Model diagnostics and simulation functions, (v) Prediction production and presentation functions, (vi) Response transformation functions, (vii) Object manipulation functions, and (viii) Miscellaneous functions (for further details see asremlPlus-package in help). The asreml package provides a computationally efficient algorithm for fitting a wide range of linear mixed models using Residual Maximum Likelihood. It is a commercial package sold as 'asreml-R'; a license for it can be purchased from VSNi <https://vsni.co.uk/>, who will supply a zip file for local installation/updating (see <https://asreml.kb.vsni.co.uk/>). It is not needed for functions that are methods for alldiffs and data.frame objects. The asremlPlus package can also be installed from <http://chris.brien.name/rpackages/>.
This package provides methods for working with dose-finding clinical trials. We provide implementations of many dose-finding clinical trial designs, including the continual reassessment method (CRM) by O'Quigley et al. (1990) <doi:10.2307/2531628>, the toxicity probability interval (TPI) design by Ji et al. (2007) <doi:10.1177/1740774507079442>, the modified TPI (mTPI) design by Ji et al. (2010) <doi:10.1177/1740774510382799>, the Bayesian optimal interval (BOIN) design by Liu & Yuan (2015) <doi:10.1111/rssc.12089>, EffTox by Thall & Cook (2004) <doi:10.1111/j.0006-341X.2004.00218.x>, the design of Wages & Tait (2015) <doi:10.1080/10543406.2014.920873>, and the 3+3 design described by Korn et al. (1994) <doi:10.1002/sim.4780131802>. All designs are implemented with a common interface. We also offer optional additional classes to tailor the behaviour of all designs, including avoiding skipping doses, stopping after n patients have been treated at the recommended dose, stopping when a toxicity condition is met, or demanding that n patients are treated before stopping is allowed. By daisy-chaining these classes together using the pipe operator from 'magrittr', it is simple to tailor the behaviour of a dose-finding design so it behaves how the trialist wants, as shown in the sketch below. Having provided a flexible interface for specifying designs, we then provide functions to run simulations and calculate dose-paths for future cohorts of patients.
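Here is a sketch of the daisy-chaining interface; the selector names (get_dfcrm(), dont_skip_doses(), stop_at_n(), fit(), recommended_dose()) follow the package's published examples, but the exact arguments are assumptions to verify against the current documentation.

```r
# Sketch: a CRM design that never skips doses when escalating and stops
# after 21 patients, fitted to an outcome string (N = no tox, T = tox).
library(escalation)
library(magrittr)

model <- get_dfcrm(skeleton = c(0.05, 0.12, 0.25, 0.40, 0.55),
                   target = 0.25) %>%
  dont_skip_doses(when_escalating = TRUE) %>%
  stop_at_n(n = 21)

fit <- model %>% fit("1NNN 2NTN")
recommended_dose(fit)
```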
This package provides a collection of methods to estimate parameters of different tempered stable distributions (TSD). Currently, there are seven different tempered stable distributions to choose from: tempered stable subordinator distribution, classical TSD, generalized classical TSD, normal TSD, modified TSD, rapid decreasing TSD, and Kim-Rachev TSD. The package also provides functions to compute density and probability functions and tools to run Monte Carlo simulations. This package has already been used for the estimation of tempered stable distributions (Massing (2023) <arXiv:2303.07060>). The following references form the theoretical background for various functions in this package. References for each function are explicitly listed in its documentation: Bianchi et al. (2010) <doi:10.1007/978-88-470-1481-7_4>; Bianchi et al. (2011) <doi:10.1137/S0040585X97984632>; Carrasco (2017) <doi:10.1017/S0266466616000025>; Feuerverger (1981) <doi:10.1111/j.2517-6161.1981.tb01143.x>; Hansen et al. (1996) <doi:10.1080/07350015.1996.10524656>; Hansen (1982) <doi:10.2307/1912775>; Hofert (2011) <doi:10.1145/2043635.2043638>; Kawai & Masuda (2011) <doi:10.1016/j.cam.2010.12.014>; Kim et al. (2008) <doi:10.1016/j.jbankfin.2007.11.004>; Kim et al. (2009) <doi:10.1007/978-3-7908-2050-8_5>; Kim et al. (2010) <doi:10.1016/j.jbankfin.2010.01.015>; Kuechler & Tappe (2013) <doi:10.1016/j.spa.2013.06.012>; Rachev et al. (2011) <doi:10.1002/9781118268070>.
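For the tempered stable subordinator, the classic exponential-tilting rejection scheme (cf. Hofert 2011) is short enough to sketch in base R using Kanter's representation of the positive stable law; this illustrates the distribution itself, not the package's estimation interface.

```r
# Positive alpha-stable sampler (Kanter's representation, Laplace
# transform exp(-s^alpha)), then exponential tilting by rejection:
# accept S with probability exp(-theta * S).
rpstable <- function(n, alpha) {
  u <- runif(n, 0, pi); e <- rexp(n)
  sin(alpha * u) / sin(u)^(1 / alpha) *
    (sin((1 - alpha) * u) / e)^((1 - alpha) / alpha)
}
rtempstable <- function(n, alpha, theta) {
  out <- numeric(0)
  while (length(out) < n) {
    s <- rpstable(n, alpha)
    out <- c(out, s[runif(n) < exp(-theta * s)])   # rejection step
  }
  out[1:n]
}
set.seed(2)
hist(rtempstable(5000, alpha = 0.6, theta = 1), breaks = 60,
     main = "Tempered stable subordinator")
```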
The Statistical Learning Theory (SLT) provides the theoretical background to ensure that a supervised algorithm generalizes the mapping f: X -> Y, given that f is selected from its search space bias F. This formal result depends on the Shattering coefficient function N(F, 2n) to upper bound the empirical risk minimization principle, from which one can estimate the necessary training sample size to ensure probabilistic learning convergence and, most importantly, characterize the capacity of F, including its under- and overfitting abilities while addressing specific target problems. In this context, we propose a new approach to estimate the maximal number of hyperplanes required to shatter a given sample, i.e., to separate every pair of points from one another, based on recent contributions by Har-Peled and Jones in the dataset partitioning scenario, and use this foundation to analytically compute the Shattering coefficient function for both binary and multi-class problems. As main contributions, one can use our approach to study the complexity of the search space bias F, estimate training sample sizes, and parametrize the number of hyperplanes a learning algorithm needs to address a supervised task, which is especially appealing for deep neural networks. References: de Mello, R.F. (2019) "On the Shattering Coefficient of Supervised Learning Algorithms" <arXiv:1911.05461>; de Mello, R.F., Ponti, M.A. (2018, ISBN: 978-3319949888) "Machine Learning: A Practical Approach on the Statistical Learning Theory".
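As a worked illustration of how the shattering coefficient enters sample-size reasoning, the classic Sauer-Shelah bound for linear classifiers (VC dimension d + 1 in R^d) can be plugged into one common form of the Vapnik generalization bound; note this is the textbook bound, not the tighter hyperplane-based estimate proposed in the paper, and the constants vary across presentations.

```r
# Sauer-Shelah bound on the shattering coefficient of linear classifiers
# in R^d: N(F, 2n) <= sum_{i=0}^{d+1} choose(2n, i).
shatter_bound <- function(n, d) sum(choose(2 * n, 0:(d + 1)))

# One common form of the Vapnik generalization-gap bound (constants
# differ across texts): sqrt(4/n * (log(4 * N(F, 2n)) - log(delta))).
gap_bound <- function(n, d, delta = 0.05)
  sqrt(4 / n * (log(4 * shatter_bound(n, d)) - log(delta)))

gap_bound(n = 1e3, d = 10)   # loose at n = 1,000 ...
gap_bound(n = 1e5, d = 10)   # ... and tightens as n grows
```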
Uses the Twitter APIs to obtain the data needed for sentiment analysis, acting as middleware with the approved Twitter application. A special access key is given to users who subscribe to the application with their Twitter account. With this access key, a user-defined keyword can be searched in recent tweets and the results retrieved (for more information see <https://github.com/hakkisabah/tsentiment>). In addition, a service named tsentiment-services has been developed to provide all these operations (for more information see <https://github.com/hakkisabah/tsentiment-services>). After a successful run, and in line with the permissions granted by the user, the word cloud and bar chart produced by the analysis are saved in the user's folder directory. Each new analysis deletes the previous analysis' saved visuals; this is the basic rule of use to keep in mind. tsentiment provides a free service that acts as middleware for easy data extraction from Twitter; in return, the user's rate limit is reduced by 30 requests, which are reserved for application analytics, and the remaining requests are available for use. For information about endpoints, see the limit information in the "GET search/tweets" row of the endpoints list at <https://developer.twitter.com/en/docs/twitter-api/v1/rate-limits>.
Helps calculate statistical values commonly used in meta-analysis. It provides several methods to compute different forms of standardized mean differences, as well as other values such as standard errors and standard deviations. The methods used in this package are described in the following references: Altman, D.G., Bland, J.M. (2011) <doi:10.1136/bmj.d2090>; Borenstein, M., Hedges, L.V., Higgins, J.P.T. and Rothstein, H.R. (2009) <doi:10.1002/9780470743386.ch4>; Chinn, S. (2000) <doi:10.1002/1097-0258(20001130)19:22%3C3127::aid-sim784%3E3.0.co;2-m>; Cochrane Handbook (2011) <https://handbook-5-1.cochrane.org/front_page.htm>; Cooper, H., Hedges, L.V., & Valentine, J.C. (2009) <https://psycnet.apa.org/record/2009-05060-000>; Cohen, J. (1977) <https://psycnet.apa.org/record/1987-98267-000>; Ellis, P.D. (2009) <https://www.psychometrica.de/effect_size.html>; Goulet-Pelletier, J.-C., & Cousineau, D. (2018) <doi:10.20982/tqmp.14.4.p242>; Hedges, L.V. (1981) <doi:10.2307/1164588>; Hedges, L.V., Olkin, I. (1985) <doi:10.1016/C2009-0-03396-0>; Murad, M.H., Wang, Z., Zhu, Y., Saadi, S., Chu, H., Lin, L. et al. (2023) <doi:10.1136/bmj-2022-073141>; Mayer, M. (2023) <https://search.r-project.org/CRAN/refmans/confintr/html/ci_proportion.html>; Stack Exchange (2014) <https://stats.stackexchange.com/questions/82720/confidence-interval-around-binomial-estimate-of-0-or-1>; Stack Exchange (2018) <https://stats.stackexchange.com/q/338043>.
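As an example of the kind of quantity computed, Cohen's d with Hedges' small-sample correction (Hedges 1981) follows from the standard formulas; this base-R sketch mirrors those formulas rather than the package's interface.

```r
# Standardized mean difference from summary statistics: Cohen's d with
# pooled SD, Hedges' correction factor J, and the usual SE approximation.
smd <- function(m1, m2, sd1, sd2, n1, n2) {
  sp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  d  <- (m1 - m2) / sp                       # Cohen's d
  J  <- 1 - 3 / (4 * (n1 + n2) - 9)          # Hedges' correction
  se <- sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2)))
  c(d = d, g = J * d, se_d = se)
}
smd(m1 = 10.5, m2 = 9.8, sd1 = 2.1, sd2 = 2.4, n1 = 50, n2 = 48)
```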
This package provides a comprehensive framework for analyzing agricultural nutrient balances across multiple spatial scales (county, HUC8', HUC2') with integration of wastewater treatment plant ('WWTP') effluent loads for both nitrogen and phosphorus. Supports classification of spatial units as nutrient sources, sinks, or balanced areas based on agricultural surplus and deficit calculations. Includes visualization tools, spatial transition probability analysis, and nutrient flow network mapping. Built-in datasets include agricultural nutrient balance data from the Nutrient Use Geographic Information System ('NuGIS'; The Fertilizer Institute and Plant Nutrition Canada, 1987-2016) <https://nugis.tfi.org/tabular_data/> and U.S. Environmental Protection Agency ('EPA') wastewater discharge data from the ECHO Discharge Monitoring Report ('DMR') Loading Tool (2007-2016) <https://echo.epa.gov/trends/loading-tool/water-pollution-search>. Data are downloaded on demand from the Open Science Framework ('OSF') repository to minimize package size while maintaining full functionality. The integrated manureshed framework methodology is described in Akanbi et al. (2025) <doi:10.1016/j.resconrec.2025.108697>. Designed for nutrient management planning, environmental analysis, and circular economy research at watershed/administrative to national scales. This material is based upon financial support by the National Science Foundation, EEC Division of Engineering Education and Centers, NSF Engineering Research Center for Advancing Sustainable and Distributed Fertilizer Production (CASFER), NSF 20-553 Gen-4 Engineering Research Centers award 2133576. We thank Dr. Robert D. Sabo (U.S. Environmental Protection Agency) for his valuable contributions to the conceptual development and review of this work.
This package implements the Oaxaca-Blinder decomposition method and generalizations of it that decompose differences in distributional statistics beyond the mean. The function ob_decompose() decomposes differences in the mean outcome between two groups into one part explained by different covariates (composition effect) and another part due to differences in the way covariates are linked to the outcome variable (structure effect). The function further divides the two effects into the contribution of each covariate and allows for weighted doubly robust decompositions. For distributional statistics beyond the mean, the function performs the recentered influence function (RIF) decomposition proposed by Firpo, Fortin, and Lemieux (2018). The function dfl_decompose() divides differences in distributional statistics into a composition effect and a structure effect using inverse probability weighting, as introduced by DiNardo, Fortin, and Lemieux (1996). It also allows the composition effect to be decomposed sequentially into the contributions of single covariates. References: Firpo, Sergio, Nicole M. Fortin, and Thomas Lemieux. (2018) <doi:10.3390/econometrics6020028>. "Decomposing Wage Distributions Using Recentered Influence Function Regressions."; Fortin, Nicole M., Thomas Lemieux, and Sergio Firpo. (2011) <doi:10.3386/w16045>. "Decomposition Methods in Economics."; DiNardo, John, Nicole M. Fortin, and Thomas Lemieux. (1996) <doi:10.2307/2171954>. "Labor Market Institutions and the Distribution of Wages, 1973-1992: A Semiparametric Approach."; Oaxaca, Ronald. (1973) <doi:10.2307/2525981>. "Male-Female Wage Differentials in Urban Labor Markets."; Blinder, Alan S. (1973) <doi:10.2307/144855>. "Wage Discrimination: Reduced Form and Structural Estimates."
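The core two-fold decomposition of a mean gap is compact enough to sketch in base R on simulated data; ob_decompose() additionally handles reference-group choice, covariate-level detail, reweighting, and RIF statistics.

```r
# Two-fold Oaxaca-Blinder decomposition with group B's coefficients as
# reference: gap = (Xbar_A - Xbar_B)'b_B + Xbar_A'(b_A - b_B).
set.seed(4)
n <- 1000
g <- rbinom(n, 1, 0.5)                         # group indicator
x <- rnorm(n, mean = 1 + 0.5 * g)              # covariate differs by group
y <- 2 + (1 + 0.3 * g) * x + rnorm(n)          # returns differ by group

fa <- lm(y ~ x, subset = g == 1); fb <- lm(y ~ x, subset = g == 0)
xa <- c(1, mean(x[g == 1])); xb <- c(1, mean(x[g == 0]))
explained   <- sum((xa - xb) * coef(fb))       # composition effect
unexplained <- sum(xa * (coef(fa) - coef(fb))) # structure effect
c(gap = mean(y[g == 1]) - mean(y[g == 0]),
  explained = explained, unexplained = unexplained)
```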
Implementation of the Factorized Binary Search (FaBiSearch) methodology for estimating the number and location of multiple change points in the network (or clustering) structure of multivariate high-dimensional time series. The method is motivated by the detection of change points in functional connectivity networks for functional magnetic resonance imaging (fMRI) data. FaBiSearch uses non-negative matrix factorization (NMF), an unsupervised dimension reduction technique, and a new binary search algorithm to identify multiple change points. It requires minimal assumptions. Lastly, we provide interactive, 3-dimensional, brain-specific network visualization capability in a flexible, stand-alone function. This function can be conveniently used with any node coordinate atlas, and nodes can be color coded according to community membership, if applicable. The output is an elegantly displayed network laid over a cortical surface, which can be rotated in 3-dimensional space. The main routines of the package are detect.cps(), for multiple change point detection, est.net(), for estimating a network between stationary multivariate time series, net.3dplot(), for plotting the estimated functional connectivity networks, and opt.rank(), for finding the optimal rank in NMF for a given data set. The functions have been extensively tested on simulated multivariate high-dimensional time series data and fMRI data. For details on the FaBiSearch methodology, please see Ondrus et al. (2021) <arXiv:2103.06347>. For a more detailed explanation and applied examples of the fabisearch package, please see Ondrus and Cribben (2022), preprint.
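An outline of the workflow using the routines named above, on simulated data; the arguments shown (the lambda threshold in particular) are assumptions to verify against the package documentation.

```r
# Sketch of the fabisearch workflow (arguments are illustrative; check
# the package docs for exact signatures).
library(fabisearch)
set.seed(6)
Y <- matrix(rnorm(600 * 20), nrow = 600)   # 600 time points, 20 nodes
cps <- detect.cps(Y)                       # multiple change point detection
net <- est.net(Y[1:300, ], lambda = 0.5)   # network for one stationary block
# net.3dplot(net) would render the network over a cortical surface,
# given a node coordinate atlas and optional community colors.
```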
Brings together a comprehensive collection of R packages providing access to API functions and curated datasets from Argentina, Brazil, Chile, Colombia, and Peru. Includes real-time and historical data through public RESTful APIs ('Nager.Date', the World Bank API, the REST Countries API, and country-specific APIs) and extensive curated collections of open datasets covering economics, demographics, public health, environmental data, political indicators, social metrics, and cultural information. Designed to provide researchers, analysts, educators, and data scientists with centralized access to Latin American data sources, facilitating reproducible research, comparative analysis, and teaching applications focused on these five major Latin American countries. Included packages:
- 'ArgentinAPI': API functions and curated datasets for Argentina covering exchange rates, inflation, political figures, national holidays, and more.
- 'BrazilDataAPI': API functions and curated datasets for Brazil covering postal codes, banks, economic indicators, holidays, company registrations, and more.
- 'ChileDataAPI': API functions and curated datasets for Chile covering financial indicators ('UF', UTM, Dollar, Euro, Yen, Copper, Bitcoin, IPSA index), holidays, and more.
- 'ColombiAPI': API functions and curated datasets for Colombia covering geographic locations, cultural attractions, economic indicators, demographic data, national holidays, and more.
- 'PeruAPIs': API functions and curated datasets for Peru covering economic indicators, demographics, national holidays, administrative divisions, electoral data, biodiversity, and more.
For more information on the APIs, see: Nager.Date <https://date.nager.at/Api>, World Bank API <https://datahelpdesk.worldbank.org/knowledgebase/articles/889392>, REST Countries API <https://restcountries.com/>, ArgentinaDatos API <https://argentinadatos.com/>, BrasilAPI <https://brasilapi.com.br/>, FINDIC <https://findic.cl/>, and API-Colombia <https://api-colombia.com/>.