Heuristic methods for solving routing problems in warehouse management. The package includes several heuristics, such as the Midpoint, Return, S-Shape and Semi-Optimal heuristics, for determining the picker's route in order picking. The heuristics aim to provide acceptable travel distances while considering warehouse layout constraints such as aisles and shelves. It also includes an implementation of the COPRAS (COmplex PRoportional ASsessment) method for supporting the selection of locations to be visited by the picker in shared storage systems. The package is designed to facilitate more efficient warehouse routing and logistics operations. See: Bartholdi, J. J., Hackman, S. T. (2019). "WAREHOUSE & DISTRIBUTION SCIENCE. Release 0.98.1." The Supply Chain & Logistics Institute. H. Milton Stewart School of Industrial and Systems Engineering. Georgia Institute of Technology. <https://www.warehouse-science.com/book/editions/wh-sci-0.98.1.pdf>.
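To give a flavour of the kind of calculation such heuristics perform, below is a minimal, deliberately simplified sketch of an S-shape route length for a single-block warehouse; the layout assumptions and the function name are illustrative only and do not reflect the package's API.

    # Simplified S-shape travel distance for a single-block warehouse.
    # Assumptions (illustrative only): the depot sits at the front of aisle 1,
    # adjacent aisles are 'aisle_width' apart, every aisle has length
    # 'aisle_length', and any aisle containing a pick is traversed end to end;
    # if an odd number of aisles is visited, the last (farthest) aisle is
    # entered and left from the front, up to its deepest pick.
    s_shape_distance <- function(pick_aisles, pick_depths,
                                 aisle_length, aisle_width) {
      visited <- sort(unique(pick_aisles))
      n_visited <- length(visited)
      # Cross-aisle travel: out to the farthest visited aisle and back.
      horizontal <- 2 * (max(visited) - 1) * aisle_width
      if (n_visited %% 2 == 0) {
        vertical <- n_visited * aisle_length
      } else {
        last <- max(visited)
        deepest <- max(pick_depths[pick_aisles == last])
        vertical <- (n_visited - 1) * aisle_length + 2 * deepest
      }
      horizontal + vertical
    }

    # Example: picks in aisles 1, 3 and 4 at various depths (metres).
    s_shape_distance(pick_aisles = c(1, 3, 3, 4),
                     pick_depths = c(5, 12, 20, 8),
                     aisle_length = 30, aisle_width = 4)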
Visualization of antibody titer scores is valuable for the examination of vaccination effects. AntibodyTiters visualizes antibody titers of all or selected patients. This package also produces empty Excel files in a specified format, in which users can fill in experimental data for visualization. Excel files with toy data can also be produced, so that users can see how the data are visualized before obtaining real data. The data should contain titer scores at pre-vaccination, after the 1st shot, after the 2nd shot, and at least one additional sampling point. Patients with missing values can be included. The first two sampling points (pre-vaccination and after the 1st shot) are plotted discretely, whereas the following points are plotted on a continuous time scale that starts from the day of the second shot. The half-life of the titer can also be calculated for each pair of sampling points.
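For instance, assuming exponential decay between two sampling points, the half-life follows directly from the two titer values and the time between them; a minimal base-R sketch (not the package's own function) is:

    # Half-life of an antibody titer between two sampling points, assuming
    # exponential decay between them (illustrative only, not the package API).
    titer_half_life <- function(day1, titer1, day2, titer2) {
      (day2 - day1) * log(2) / log(titer1 / titer2)
    }

    # A titer dropping from 800 to 200 over 60 days gives a half-life of 30 days.
    titer_half_life(day1 = 0, titer1 = 800, day2 = 60, titer2 = 200)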
This package contains functions that allow Bayesian inference on a parameter of some widely-used exponential models. The functions can generate independent samples from the closed-form posterior distribution using the inverse stable prior. Inverse stable is a non-conjugate prior for a parameter of an exponential subclass of discrete and continuous data distributions (e.g. Poisson, exponential, inverse gamma, double exponential (Laplace), half-normal/half-Gaussian, etc.). The prior class provides flexibility in capturing a wide array of prior beliefs (right-skewed and left-skewed) as modulated by a parameter that is bounded in (0,1). The generated samples can be used to simulate the prior and posterior predictive distributions. More details can be found in Cahoy and Sedransk (2019) <doi:10.1007/s42519-018-0027-2>. The package can also be used as a teaching demo for introductory Bayesian courses.
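Once posterior samples of the rate parameter have been generated (by whichever sampler is used), simulating the posterior predictive distribution only requires drawing new data from the model at each sampled parameter value; a generic base-R sketch for the Poisson case (illustrative only, with stand-in posterior draws rather than the package's own output) is:

    # Generic posterior predictive simulation for a Poisson model, given a
    # vector of posterior draws of the rate parameter (however obtained).
    # Illustrative sketch only; not the package's interface.
    set.seed(1)
    lambda_post <- rgamma(5000, shape = 12, rate = 3)  # stand-in posterior draws
    y_pred <- rpois(length(lambda_post), lambda = lambda_post)

    # Summaries of the posterior predictive distribution of a new observation.
    mean(y_pred)
    quantile(y_pred, c(0.025, 0.975))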
This package provides tools for performing routine analysis and plotting tasks with environmental data from the System Wide Monitoring Program of the National Estuarine Research Reserve System <https://cdmo.baruch.sc.edu/>. This package builds on the functionality of the SWMPr package <https://cran.r-project.org/package=SWMPr>, which is used to retrieve and organize the data. The combined set of tools addresses common challenges associated with continuous time series data for environmental decision making, and is intended for use in annual reporting activities. References: Beck, Marcus W. (2016) <ISSN 2073-4859> <https://journal.r-project.org/archive/2016-1/beck.pdf>; Rudis, Bob (2014) <https://rud.is/b/2014/11/16/moving-the-earth-well-alaska-hawaii-with-r/>; United States Environmental Protection Agency (2015) <https://cfpub.epa.gov/si/si_public_record_Report.cfm?Lab=OWOW&dirEntryId=327030>.
Sample size estimation for multi-reader multi-case (MRMC) studies based on the Obuchowski-Rockette (OR) methodology is implemented. The function can calculate sample sizes where the endpoint of interest in the study is either the ROC AUC (area under the receiver operating characteristic curve) or sensitivity. The package can also return sample sizes for studies expected to have a clustering effect (e.g., multiple pulmonary nodules per patient). All calculations assume that the study design is fully crossed (paired-reader, paired-case), where each reader reads/interprets each case, and that there are two interventions/imaging modalities/techniques in the study. In addition to MRMC studies, it can also be used to estimate sample sizes for standalone studies where sensitivity or AUC is the primary endpoint. The methods implemented are based on the methods described in Zhou et al. (2011) <doi:10.1002/9780470906514> and Obuchowski (2000) <doi:10.1097/EDE.0b013e3181a663cc>.
Speed up common tasks, particularly logical or relational comparisons and routine follow-up tasks such as finding indices and subsetting. Inspired by mathematics, where something like 3 < x < 6 is a standard, elegant and clear way to assert that x is both greater than 3 and less than 6 (see for example <https://en.wikipedia.org/wiki/Relational_operator>), a chaining operator is implemented. The chaining operator, %c%, allows multiple relational operations, quoted on the right-hand side, to be applied to the same object on the left-hand side. The %e% operator allows something like set-builder notation (see for example <https://en.wikipedia.org/wiki/Set-builder_notation>) to be used on the right-hand side. All operators have built-in prefixes defined for all, subset, and which to reduce the amount of code needed for common tasks, such as returning the values that are true.
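A minimal re-implementation of the idea behind the chaining operator, written here only to illustrate the concept (the package's own %c% may differ in syntax and behaviour), could look like:

    # Illustrative chaining operator: applies several quoted relational tests
    # to the same left-hand-side object and combines them with AND.
    # A sketch of the idea only; the package's %c% may behave differently.
    `%c%` <- function(lhs, rhs) {
      tests <- strsplit(rhs, "&", fixed = TRUE)[[1]]
      res <- lapply(trimws(tests), function(op) {
        eval(parse(text = paste("lhs", op)))
      })
      Reduce(`&`, res)
    }

    x <- c(1, 4, 5, 7)
    x %c% "> 3 & < 6"    # FALSE TRUE TRUE FALSE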
A supervised learning algorithm inputs a training set and outputs a prediction function, which can be used on a test set. If each data point belongs to a subset (such as geographic region, year, etc.), then how do we know if subsets are similar enough that we can get accurate predictions on one subset after training on Other subsets? And how do we know if training on All subsets would improve prediction accuracy, relative to training on the Same subset? SOAK, Same/Other/All K-fold cross-validation, <doi:10.48550/arXiv.2410.08643>, can be used to answer these questions by fixing a test subset, training models on Same/Other/All subsets, and then comparing test error rates (Same versus Other and Same versus All). The package also provides code for estimating how many training samples are required to get accurate predictions on a test set.
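The core comparison can be sketched in a few lines of base R: fix a test subset, train on the Same, Other, and All subsets, and compare test errors. Everything below (data, column names, the linear-model learner) is a made-up illustration and not the package's API.

    # Sketch of a Same/Other/All comparison for one test subset, using a linear
    # model as a stand-in learner. Illustrative only; not the package's API.
    set.seed(1)
    dat <- data.frame(subset = rep(c("A", "B", "C"), each = 100), x = rnorm(300))
    dat$y <- 2 * dat$x + rnorm(300)

    test_subset <- "A"
    is_test <- dat$subset == test_subset
    test_idx <- sample(which(is_test), 30)          # held-out rows from subset A

    train_sets <- list(
      Same  = setdiff(which(is_test), test_idx),    # remaining rows of A
      Other = which(!is_test),                      # rows of B and C
      All   = setdiff(seq_len(nrow(dat)), test_idx) # everything except the test rows
    )

    errors <- sapply(train_sets, function(idx) {
      fit <- lm(y ~ x, data = dat[idx, ])
      pred <- predict(fit, newdata = dat[test_idx, ])
      mean((pred - dat$y[test_idx])^2)              # test mean squared error
    })
    errors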
Enables computationally efficient parameter estimation by variational Bayesian methods for various diagnostic classification models (DCMs). DCMs are a class of discrete latent variable models for classifying respondents into latent classes that typically represent distinct combinations of skills they possess. Recently, to meet the growing need for large-scale diagnostic measurement in the fields of educational, psychological, and psychiatric measurement, variational Bayesian inference has been developed as a computationally efficient alternative to Markov chain Monte Carlo methods, e.g., Yamaguchi and Okada (2020a) <doi:10.1007/s11336-020-09739-w>, Yamaguchi and Okada (2020b) <doi:10.3102/1076998620911934>, Yamaguchi (2020) <doi:10.1007/s41237-020-00104-w>, Oka and Okada (2023) <doi:10.1007/s11336-022-09884-4>, and Yamaguchi and Martinez (2023) <doi:10.1111/bmsp.12308>. To facilitate their application, variationalDCM provides a collection of recently proposed variational Bayesian estimation methods for various DCMs.
We present a range of simulations to aid researchers in determining appropriate sample sizes when performing critical thermal limits studies (e.g., CTmin/CTmax experiments). A number of wrapper functions are provided for plotting and summarising outputs from these simulations. This package is presented in van Steenderen, C.J.M., Sutton, G.F., Owen, C.A., Martin, G.D., and Coetzee, J.A. Sample size assessments for thermal physiology studies: An R package and R Shiny application. 2023. Physiological Entomology. <doi:10.1111/phen.12416>. The GUI version of this package is available on the R Shiny online server at <https://clarkevansteenderen.shinyapps.io/ThermalSampleR_Shiny/>, or it is accessible via GitHub at <https://github.com/clarkevansteenderen/ThermalSampleR_Shiny/>. We would like to thank Grant Duffy (University of Otago, Dunedin, New Zealand) for granting us permission to use the source code for the Test of Total Equivalency function.
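The underlying idea, resampling measured thermal limits at increasing sample sizes and watching how the confidence interval of the mean narrows, can be sketched in base R; the data and function below are illustrative only and not the package's own simulations.

    # Bootstrap the width of the 95% CI of the mean CTmin at several sample sizes.
    # Illustrative sketch only; the package's simulations are more elaborate.
    set.seed(1)
    ctmin <- rnorm(40, mean = 5, sd = 1.5)          # stand-in CTmin measurements

    ci_width <- function(n, data, n_boot = 1000) {
      boot_means <- replicate(n_boot, mean(sample(data, n, replace = TRUE)))
      diff(quantile(boot_means, c(0.025, 0.975)))
    }

    sizes <- c(5, 10, 20, 30, 40)
    sapply(sizes, ci_width, data = ctmin)           # CI width shrinks with n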
Compute the frequency distribution of a search term in a series of texts. For example, Arthur Conan Doyle wrote a total of 60 Sherlock Holmes stories, comprising 54 short stories and 4 longer novels. I wanted to test my own subjective impression that, in many of the stories, Sherlock Holmes's popularity was used as bait to induce the reader to read a story that is essentially not primarily a Sherlock Holmes story. I used the term "Holmes" as a search pattern, since Watson would frequently address him by name, or use his name to describe something that he was doing. My hypothesis is that the frequency distribution of the search pattern "Holmes" is a good proxy for the degree to which a story is or is not truly a Sherlock Holmes story. The results are presented in a manuscript that is available as a vignette and online at <https://barryzee.github.io/Concordance/index.html>.
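Counting the occurrences of a search pattern across a set of texts needs nothing beyond base R; the following sketch of the idea uses two short made-up excerpts and is not the package's own code.

    # Count occurrences of a search pattern in each of several texts and
    # normalise by text length. Illustrative sketch; not the package's functions.
    texts <- c(
      scandal   = "Holmes smiled. To Sherlock Holmes she is always the woman.",
      redheaded = "I had called upon my friend, Mr. Sherlock Holmes, one day."
    )

    count_pattern <- function(text, pattern = "Holmes") {
      hits <- gregexpr(pattern, text, fixed = TRUE)[[1]]
      if (hits[1] == -1) 0L else length(hits)
    }

    counts <- vapply(texts, count_pattern, integer(1))
    words  <- vapply(strsplit(texts, "\\s+"), length, integer(1))
    counts / words    # pattern frequency per word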
This package provides functions for eleven procedures for determining the number of factors, including functions for parallel analysis and the minimum average partial test. There are also functions for conducting principal components analysis, principal axis factor analysis, maximum likelihood factor analysis, image factor analysis, and extension factor analysis, all of which can take raw data or correlation matrices as input and with options for conducting the analyses using Pearson correlations, Kendall correlations, Spearman correlations, gamma correlations, or polychoric correlations. Varimax rotation, promax rotation, and Procrustes rotations can be performed. Additional functions focus on the factorability of a correlation matrix, the congruences between factors from different datasets, the assessment of local independence, the assessment of factor solution complexity, and internal consistency. Auerswald & Moshagen (2019, ISSN:1939-1463); Field, Miles, & Field (2012, ISBN:978-1-4462-0045-2); Mulaik (2010, ISBN:978-1-4200-9981-2); O'Connor (2000, <doi:10.3758/bf03200807>); O'Connor (2001, ISSN:0146-6216).
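The logic of parallel analysis, one of the procedures implemented, is simple enough to sketch in base R: compare the eigenvalues of the observed correlation matrix with those from random data of the same dimensions. The function below is an illustrative minimal version, not the package's far more complete routine.

    # Minimal parallel analysis: retain components whose observed eigenvalues
    # exceed the mean eigenvalues of random data of the same size.
    parallel_analysis <- function(x, n_sim = 100) {
      n <- nrow(x); p <- ncol(x)
      obs <- eigen(cor(x), only.values = TRUE)$values
      rand <- replicate(n_sim, {
        eigen(cor(matrix(rnorm(n * p), n, p)), only.values = TRUE)$values
      })
      data.frame(observed = obs,
                 random   = rowMeans(rand),
                 retain   = obs > rowMeans(rand))
    }

    set.seed(1)
    dat <- matrix(rnorm(200 * 6), 200, 6)
    dat[, 2] <- dat[, 1] + rnorm(200, sd = 0.5)   # induce one strong component
    parallel_analysis(dat)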
Weather indices represent the overall weekly effect of a weather variable on crop yield throughout the cropping season. This package contains functions that can convert weekly weather data into yearly weighted weather indices, with weights being the correlation coefficients between the weekly weather data over the years and crop yield over the years. This can be done for an individual weather variable and for two weather variables at a time as an interaction effect. This method was first devised by Jain, RC, Agrawal, R, and Jha, MP (1980), "Effect of climatic variables on rice yield and its forecast", MAUSAM, 31(4), 591–596, <doi:10.54302/mausam.v31i4.3477>. Later, the method has been used by various researchers, and the latest application can be found in Gupta, AK, Sarkar, KA, Dhakre, DS, & Bhattacharya, D (2022), "Weather Based Potato Yield Modelling using Statistical and Machine Learning Technique", Environment and Ecology, 40(3B), 1444–1449, <https://www.environmentandecology.com/volume-40-2022>.
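The construction of such an index can be sketched directly from the description: weight each week's value by the correlation, over years, between that week's data and yield, then sum within each year. The code below is an illustrative base-R sketch with simulated data, not the package's interface.

    # Weighted weather index: for each year, sum the weekly values weighted by
    # the correlation (over years) between that week's data and crop yield.
    weighted_weather_index <- function(weekly, yield) {
      # 'weekly' is a years x weeks matrix, 'yield' a vector of yearly yields.
      weights <- apply(weekly, 2, cor, y = yield)
      as.vector(weekly %*% weights)
    }

    set.seed(1)
    weekly <- matrix(rnorm(20 * 15, mean = 25, sd = 3), nrow = 20)  # 20 years, 15 weeks
    yield  <- 2 + 0.4 * weekly[, 5] + rnorm(20)                     # week 5 matters
    weighted_weather_index(weekly, yield)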
QuaternaryProd is an R package that performs causal reasoning on biological networks, including publicly available networks such as STRINGdb. QuaternaryProd is an open-source alternative to commercial products such as Ingenuity Pathway Analysis. For a given set of differentially expressed genes, QuaternaryProd computes the significance of upstream regulators in the network by performing causal reasoning using the Quaternary Dot Product Scoring Statistic (Quaternary Statistic), the Ternary Dot Product Scoring Statistic (Ternary Statistic) and Fisher's exact test (enrichment test). The Quaternary Statistic handles signed, unsigned and ambiguous edges in the network. Ambiguity arises when the direction of causality is unknown, or when the source node (e.g., a protein) has edges with conflicting signs for the same target gene. The Ternary Statistic, on the other hand, provides causal reasoning using the signed and unambiguous edges only. The vignette provides more details on the Quaternary Statistic and illustrates an example of how to perform causal reasoning using STRINGdb.
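The simplest of the three statistics, the enrichment test, is Fisher's exact test on a 2x2 table of regulated versus non-regulated genes among the targets of a candidate regulator versus the rest of the network; in base R (the counts below are made up for illustration):

    # Enrichment test for one candidate regulator: Fisher's exact test on the
    # overlap between its downstream targets and the differentially expressed
    # genes. Counts are invented for illustration.
    tab <- matrix(c(30, 70,      # targets: DE, not DE
                    170, 1730),  # non-targets: DE, not DE
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("target", "non-target"), c("DE", "not DE")))
    fisher.test(tab, alternative = "greater")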
In searching for research articles, we often want to obtain lists of references from across studies, and also obtain lists of articles that cite a particular study. In systematic reviews, this supplementary search technique is known as 'citation chasing': forward citation chasing looks for all records citing one or more articles of known relevance; backward citation chasing looks for all records referenced in one or more articles. Traditionally, this process would be done manually, and the resulting records would need to be checked one by one against the included studies in a review to identify potentially relevant records that should be included. This package contains functions to automate this process by making use of the Lens.org API. An input article list can be used to return a list of all referenced records, and/or all citing records in the Lens.org database (consisting of PubMed, PubMed Central, CrossRef, Microsoft Academic Graph and CORE; <https://www.lens.org>).
Providing easy, portable access to NASA EarthData products through the use of bearer tokens. Much of NASA's public data catalog, hosted and maintained by its 12 Distributed Active Archive Centers ('DAACs'), is now made available on Amazon Web Services S3 storage. However, accessing this data through the standard S3 API is restricted to compute resources running inside the us-west-2 data center in Portland, Oregon, which allows NASA to avoid being charged data egress rates. This package provides public access to the data from any networked device by using the EarthData login application programming interface (API), <https://www.earthdata.nasa.gov/eosdis/science-system-description/eosdis-components/earthdata-login>, providing convenient authentication and access to cloud-hosted NASA EarthData products. This makes access to a wide range of earth observation data from any location straightforward and compatible with R packages that are widely used with cloud-native earth observation data (such as 'terra', 'sf', etc.).
This collection of gene representation-independent functions implements the population layer of extended evolutionary and genetic algorithms and its support. The population layer consists of functions for initializing, logging, observing, and evaluating a population of genes, as well as for computing the next population. For parallel evaluation of a population of genes, 4 execution models - named Sequential, MultiCore, FutureApply, and Cluster - are provided. They are implemented by configuring the lapply() function. The execution model FutureApply can be externally configured as recommended by Bengtsson (2021) <doi:10.32614/RJ-2021-048>. Configurable acceptance rules and cooling schedules (see Kirkpatrick, S., Gelatt, C. D. J, and Vecchi, M. P. (1983) <doi:10.1126/science.220.4598.671>, and Aarts, E., and Korst, J. (1989, ISBN:0-471-92146-7)) offer simulated annealing or greedy randomized approximate search procedure elements. Adaptive crossover and mutation rates depending on population statistics generalize the approach of Stanhope, S. A. and Daida, J. M. (1996, ISBN:0-18-201-031-7).
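The execution-model idea, evaluating a population of genes with an interchangeable lapply()-style backend, can be sketched as follows; the dispatcher below is an illustration of the concept rather than the package's own configuration mechanism (the parallel and future.apply calls are standard).

    # Sketch of an interchangeable execution model for evaluating a population
    # of genes. Illustrative only; the package configures this differently.
    make_apply <- function(model = c("Sequential", "MultiCore", "FutureApply")) {
      model <- match.arg(model)
      switch(model,
        Sequential  = lapply,
        MultiCore   = function(X, FUN, ...) parallel::mclapply(X, FUN, ...),
        FutureApply = function(X, FUN, ...) future.apply::future_lapply(X, FUN, ...)
      )
    }

    evaluate_population <- function(population, fitness, model = "Sequential") {
      apply_fun <- make_apply(model)
      unlist(apply_fun(population, fitness))
    }

    # Toy population of real-coded genes and a simple fitness function.
    population <- replicate(8, runif(5), simplify = FALSE)
    evaluate_population(population, fitness = function(g) sum(g^2))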
In the field of stratified sampling design, this package offers an approach for determining the best stratification of a sampling frame, the one that ensures the minimum sample cost under the condition of satisfying precision constraints in a multivariate and multidomain case. This approach is based on the use of a genetic algorithm: each solution (i.e. a particular partition in strata of the sampling frame) is considered as an individual in a population; the fitness of all individuals is evaluated by applying the Bethel-Chromy algorithm to calculate the sampling size satisfying precision constraints on the target estimates. Functions in the package allow the user to: (a) analyse the results of the optimisation step; (b) assign the new strata labels to the sampling frame; (c) select a sample from the new frame according to the best allocation. Functions for the execution of the genetic algorithm are a modified version of the functions in the genalg package. M. Ballin, G. Barcaroli (2020) <arXiv:2004.09366> "R package SamplingStrata: new developments and extension to Spatial Sampling".
The least-squares Monte Carlo (LSM) simulation method is a popular method for the approximation of the value of early and multiple exercise options. LSMRealOptions provides implementations of the LSM simulation method to value American option products and capital investment projects through real options analysis. LSMRealOptions values capital investment projects with cash flows dependent upon underlying state variables that are stochastically evolving, providing analysis of the timing and critical values at which investment is optimal. LSMRealOptions provides flexibility in the stochastic processes followed by underlying assets, the number of state variables, basis functions and underlying asset characteristics, allowing a broad range of assets to be valued through the LSM simulation method. Real options projects can further be valued whilst considering construction periods, time-varying initial capital expenditures and path-dependent operational flexibility, including the ability to temporarily shut down or permanently abandon projects after initial investment has occurred. The LSM simulation method was first presented in the seminal work of Longstaff and Schwartz (2001) <doi:10.1093/rfs/14.1.113>.
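The core of the LSM method of Longstaff and Schwartz, regressing discounted continuation values on basis functions of the current state and exercising when the immediate payoff exceeds the fitted continuation value, can be sketched for a plain American put; this is an illustrative stand-alone example, not the package's interface or its real-options extensions.

    # Least-squares Monte Carlo for an American put (Longstaff-Schwartz style).
    # Minimal illustrative sketch; not the package's interface.
    set.seed(1)
    S0 <- 36; K <- 40; r <- 0.06; sigma <- 0.2; TT <- 1
    n_steps <- 50; n_paths <- 10000; dt <- TT / n_steps

    # Simulate geometric Brownian motion paths.
    Z <- matrix(rnorm(n_paths * n_steps), n_paths, n_steps)
    S <- S0 * exp(t(apply((r - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * Z, 1, cumsum)))

    payoff <- pmax(K - S, 0)
    cash <- payoff[, n_steps]                    # exercise value at maturity

    for (t_i in (n_steps - 1):1) {
      disc_cash <- cash * exp(-r * dt)           # continuation value of every path
      itm <- which(payoff[, t_i] > 0)            # regress on in-the-money paths only
      if (length(itm) > 1) {
        x <- S[itm, t_i]
        fit <- lm(disc_cash[itm] ~ x + I(x^2))   # quadratic basis functions
        exercise <- itm[payoff[itm, t_i] > fitted(fit)]
        disc_cash[exercise] <- payoff[exercise, t_i]
      }
      cash <- disc_cash
    }
    mean(cash * exp(-r * dt))                    # roughly 4.4-4.5 for these inputs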
Recast is a state-of-the-art navigation mesh construction toolset for games.
It is automatic, which means that you can throw any level geometry at it and you will get a robust mesh out.
It is fast, which means swift turnaround times for level designers.
The Recast process starts with constructing a voxel mold from the level geometry and then casting a navigation mesh over it. The process consists of three steps: building the voxel mold, partitioning the mold into simple regions, and peeling off the regions as simple polygons.
Recast is accompanied by Detour, a path-finding and spatial reasoning toolkit. You can use any navigation mesh with Detour, but of course the data generated with Recast fits perfectly.
Detour offers a simple static navigation mesh, which is suitable for many simple cases, as well as a tiled navigation mesh, which allows you to plug in and out pieces of the mesh. The tiled mesh allows you to create systems where you stream new navigation data in and out as the player progresses through the level, or you may regenerate tiles as the world changes.
An implementation of a method for extending a logistic regression model beyond linear effects of the covariates. The extension is constructed by first equating the logistic regression model to a naive Bayes model where all the margins are specified to follow natural exponential distributions conditional on Y, that is, a model for Y given X that is specified through the distribution of X given Y, where the columns of X are assumed to be mutually independent conditional on Y. Subsequently, the model is expanded by adding vine copulas to relax the assumption of mutual independence, where pair-copulas are added in a stage-wise, forward selection manner. Some heuristics are employed during the process of selecting edges, as well as the families of pair-copula models. After each component is added, the parameters are updated by a (smaller) number of gradient steps to maximise the likelihood. When the algorithm has stopped adding edges, based on the criterion that a new edge should improve the likelihood more than k times the number of new parameters, the parameters are updated with a larger number of gradient steps, or until convergence.
Normally, building a GODB is fairly complicated, involving downloading multiple database files and using these to build e.g. a MySQL database. Accessing this database is also complicated, involving an intimate knowledge of the database in order to construct reliable queries. Here we have a more modest goal, generating GOGOA3, which is a stripped-down version of the GODB that is restricted to human genes as designated by the HUGO Gene Nomenclature Committee (HGNC) (see <https://geneontology.org/>). This can be built in a matter of seconds from 2 easily downloaded files (see <https://current.geneontology.org/products/pages/downloads.html> and <https://geneontology.org/docs/download-ontology/>), and it can be queried by e.g. w <- which(GOGOA3[,"HGNC"] %in% hgncList), where GOGOA3 is a matrix representing the minimalist GODB and hgncList is a list of gene identifiers. This database will be used in my upcoming package GoMiner, which is based on my previous publication (see Zeeberg, B.R., Feng, W., Wang, G. et al. (2003) <doi:10.1186/gb-2003-4-4-r28>). Relevant .RData files are available from GitHub (<https://github.com/barryzee/GO>).
An approach to identify microbiome biomarkers for time-to-event data by discovering microbiome features that predict survival and classify subjects into risk groups. Classifiers are constructed as a linear combination of important microbiome features and, if necessary, treatment effects. Several methods are implemented to estimate the microbiome risk score, such as the LASSO method of Robert Tibshirani (1998) <doi:10.1002/(SICI)1097-0258(19970228)16:4%3C385::AID-SIM380%3E3.0.CO;2-3>, the elastic net approach of Hui Zou and Trevor Hastie (2005) <doi:10.1111/j.1467-9868.2005.00503.x>, supervised principal component analysis of Wold Svante et al. (1987) <doi:10.1016/0169-7439(87)80084-9>, and supervised partial least squares analysis by Inge S. Helland <https://www.jstor.org/stable/4616159>. A sensitivity analysis on the quantile used for the classification can also be performed to check the deviation of the classification groups based on the quantile specified. Large-scale cross-validation can be performed in order to investigate the most frequently selected microbiome features and for internal validation. During the evaluation process, validation is assessed using the hazard ratio (HR) distribution of the test set, and inference is mainly based on resampling and permutation techniques.
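For the LASSO component, the standard glmnet workflow gives a flavour of how such a microbiome risk score can be built and used to split subjects into risk groups; the data below are simulated and this is not this package's interface.

    # LASSO Cox model as a microbiome risk score: a generic glmnet sketch,
    # not this package's interface. Data are simulated for illustration.
    library(glmnet)
    library(survival)

    set.seed(1)
    n <- 200; p <- 50
    x <- matrix(rnorm(n * p), n, p,
                dimnames = list(NULL, paste0("taxon_", seq_len(p))))
    time   <- rexp(n, rate = exp(0.5 * x[, 1] - 0.4 * x[, 2]))
    status <- rbinom(n, 1, 0.8)                  # 1 = event, 0 = censored

    cvfit <- cv.glmnet(x, Surv(time, status), family = "cox")
    risk  <- predict(cvfit, newx = x, s = "lambda.min", type = "link")

    # Classify subjects into risk groups at the median risk score.
    group <- ifelse(risk > median(risk), "high", "low")
    table(group)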
This package provides several confidence interval and testing procedures using event-specific win ratios for semi-competing risks data with non-terminal and terminal events, as developed in Yang et al. (2021) <doi:10.1002/sim.9266>. Compared with conventional methods for survival data, these procedures are designed to utilize more data for improved inference with semi-competing risks data. The event-specific win ratios were introduced in Yang and Troendle (2021) <doi:10.1177/1740774520972408>. In this package, the event-specific win ratios and confidence intervals are obtained for each event type, and several testing procedures are developed for the global null of no treatment effect on either terminal or non-terminal events. Furthermore, a test of the proportional hazards assumption, under which the event-specific win ratios converge to the hazard ratios, and a test of equal hazard ratios are provided. For summarizing the treatment effect on all events, confidence intervals for linear combinations of the event-specific win ratios are available using pre-determined or data-driven weights. Asymptotic properties of these inference procedures are discussed in Yang et al. (2021) <doi:10.1002/sim.9266>. Also, transformations are used to yield better control of the type I error rates for moderately sized data sets.
This package provides a data set and functions for exploration of the Multiple Indicator Cluster Survey (MICS) 2017-18 Household questionnaire data for Punjab, Pakistan. The results of the present survey are critically important for the purposes of SDG monitoring, as the survey produces information on 32 global SDG indicators. The data were collected from 53,840 households selected at the second stage with systematic random sampling out of a sample of 2,692 clusters selected using probability proportional to size sampling. Six questionnaires were used in the survey: (1) a household questionnaire to collect basic demographic information on all de jure household members (usual residents), the household, and the dwelling; (2) a water quality testing questionnaire administered in three households in each cluster of the sample; (3) a questionnaire for individual women administered in each household to all women age 15-49 years; (4) a questionnaire for individual men administered in every second household to all men age 15-49 years; (5) an under-5 questionnaire, administered to mothers (or caretakers) of all children under 5 living in the household; and (6) a questionnaire for children age 5-17 years, administered to the mother (or caretaker) of one randomly selected child age 5-17 years living in the household (<http://www.mics.unicef.org/surveys>).