Derivation of indexes for benchmarking purposes. A methodology with a flexible number of constituents is implemented. Functions for market-capitalization-weighted and volume-weighted indexes with a fixed number of constituents are also available. The main function of the package, indexComp(), provides the derived index, suitable for analysis purposes. The functions indexUpdate(), indexMemberSelection() and indexMembersUpdate() are components of indexComp() and enable one to construct and continuously update an index, e.g. for display on a website. The methodology behind the functions is introduced in Trimborn and Haerdle (2018) <doi:10.1016/j.jempfin.2018.08.004>.
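As a rough illustration of a market-capitalization-weighted index with a fixed number of constituents, the following Python sketch selects the k largest constituents and computes a Laspeyres-type index value. It is not the flexible-constituent methodology of Trimborn and Haerdle (2018) and does not reproduce indexComp(); the function names, the base value of 1000 and the sample data are all hypothetical.

```python
def top_k_by_market_cap(prices, supply, k):
    """Return the k constituents with the largest market capitalization."""
    caps = {name: prices[name] * supply[name] for name in prices}
    return sorted(caps, key=caps.get, reverse=True)[:k]


def cap_weighted_index(prices_now, prices_base, supply, members, base_value=1000.0):
    """Laspeyres-type index: value of the base basket now, relative to the base date."""
    value_now = sum(prices_now[m] * supply[m] for m in members)
    value_base = sum(prices_base[m] * supply[m] for m in members)
    return base_value * value_now / value_base


# Hypothetical data: three assets, index built from the two largest.
prices_base = {"A": 10.0, "B": 5.0, "C": 1.0}
prices_now = {"A": 11.0, "B": 5.0, "C": 3.0}
supply = {"A": 100.0, "B": 100.0, "C": 100.0}

members = top_k_by_market_cap(prices_base, supply, k=2)
index_value = cap_weighted_index(prices_now, prices_base, supply, members)
```

Asset "C" triples in price but does not move the index, because it is outside the two selected constituents until the membership is updated.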
Constructs a classification (decision) tree from survival data with high-dimensional covariates. The method is a robust version of the logrank tree, where the variance is stabilized. The main function "uni.tree" returns a classification tree for a given survival dataset. The inner nodes (splitting criterion) are selected by minimizing the P-value of the two-sample score tests. Terminal nodes are declared (stopping criterion) using a P-value threshold given by a user-specified argument. This tree construction algorithm is proposed by Emura et al. (2021, in review).
Automatically generates 'testthat' code files from offensive programming test cases. Generated test files are complete and ready to run. Using 'wyz.code.testthat' you will save a lot of time, reduce the number of errors in test case production, be able to test generated files immediately without any need to view or modify them, and achieve zero latency between code implementation and industrial testing. As with 'testthat', you may complete the provided test cases according to your needs to push testing further, but this need is nearly void when using 'wyz.code.offensiveProgramming'.
Provides a common set of simplified web scraping tools for working with the NHS Data Dictionary <https://datadictionary.nhs.uk/data_elements_overview.html>. The intended usage is to retrieve key lookups from the data elements section of the NHS Data Dictionary. The benefit of providing them through this package is that the lookups are the live lookups on the website and do not need to be maintained. This package was commissioned by the NHS-R community <https://nhsrcommunity.com/> to provide this consistency of lookups. The OpenSafely lookups have now been added <https://www.opencodelists.org/docs/>.
Implementation of selected 'Tidyverse' functions within 'DataSHIELD', an open-source federated analysis solution in R. Currently, 'DataSHIELD' contains very limited tools for data manipulation, so the aim of this package is to improve the researcher experience by implementing essential functions for data manipulation, including subsetting, filtering, grouping, and renaming variables. This is the clientside package, which should be installed locally and is used in conjunction with the serverside package 'dsTidyverse', which is installed on the remote server holding the data. For more information, see <https://www.tidyverse.org/>, <https://datashield.org/> and <https://github.com/molgenis/ds-tidyverse>.
Allows users to create time series of tropical storm exposure histories for chosen counties for a number of hazard metrics (wind, rain, distance from the storm, etc.). This package interacts with data available through the hurricaneexposuredata package, which is available in a drat repository. To access this data package, see the instructions at <https://github.com/geanders/hurricaneexposure>. The size of the hurricaneexposuredata package is approximately 20 MB. This work was supported in part by grants from the National Institute of Environmental Health Sciences (R00ES022631), the National Science Foundation (1331399), and a NASA Applied Sciences Program/Public Health Program Grant (NNX09AV81G).
Missing data imputation based on the missForest algorithm (Stekhoven, Daniel J (2012) <doi:10.1093/bioinformatics/btr597>) with adaptations for prediction settings. The function missForest() is used to impute a (training) dataset with missing values and to learn imputation models that can be later used for imputing new observations. The function missForestPredict() is used to impute one or multiple new observations (test set) using the models learned on the training data. For more details see Albu, E., Gao, S., Wynants, L., & Van Calster, B. (2024). missForestPredict--Missing data imputation for prediction settings <doi:10.48550/arXiv.2407.03379>.
When multiple Cox proportional hazards models are fitted to clinical data (time in months or years, and status) and a set of differentially expressed genes, the results (hazard ratios, z-scores and p-values) can be used to create gene-expression signatures. Weights are calculated from the survival p-values of the genes and are used to compute the expression value of the signature across the selected genes for every patient in a cohort. Single or multiple univariate or multivariate Cox proportional hazards survival analyses of the patients in one cohort can then be performed using the gene-expression signature and visualized using our survival plots.
This package provides an extensive and curated collection of datasets related to the digestive system, stomach, intestines, liver, pancreas, and associated diseases. This package includes clinical trials, observational studies, experimental datasets, cohort data, and case series involving gastrointestinal disorders such as gastritis, ulcers, pancreatitis, liver cirrhosis, colon cancer, colorectal conditions, Helicobacter pylori infection, irritable bowel syndrome, intestinal infections, and post-surgical outcomes. The datasets support educational, clinical, and research applications in gastroenterology, public health, epidemiology, and biomedical sciences. Designed for researchers, clinicians, data scientists, students, and educators interested in digestive diseases, the package facilitates reproducible analysis, modeling, and hypothesis testing using real-world and historical data.
Provides a comprehensive mapping table of metabolites linked to WikiPathways pathways. The tables include HMDB, KEGG, ChEBI, Drugbank, PubChem compound, ChemSpider, KNApSAcK, and Wikidata IDs plus CAS and InChIKey. The tables are provided for each of the 25 species ("Anopheles gambiae", "Arabidopsis thaliana", "Bacillus subtilis", "Bos taurus", "Caenorhabditis elegans", "Canis familiaris", "Danio rerio", "Drosophila melanogaster", "Equus caballus", "Escherichia coli", "Gallus gallus", "Gibberella zeae", "Homo sapiens", "Hordeum vulgare", "Mus musculus", "Mycobacterium tuberculosis", "Oryza sativa", "Pan troglodytes", "Plasmodium falciparum", "Populus trichocarpa", "Rattus norvegicus", "Saccharomyces cerevisiae", "Solanum lycopersicum", "Sus scrofa", "Zea mays"). The tables can be used for Metabolite Set Enrichment Analysis.
Automated flagging of common spatial and temporal errors in biological and paleontological collection data, for use in conservation, ecology and paleontology. Includes automated tests to easily flag (and exclude) records assigned to country or province centroids, the open ocean, the headquarters of the Global Biodiversity Information Facility, urban areas or the locations of biodiversity institutions (museums, zoos, botanical gardens, universities). It also identifies per-species outlier coordinates, zero coordinates, identical latitude/longitude and invalid coordinates, and implements an algorithm to identify data sets with a significant proportion of rounded coordinates. Especially suited for large data sets. The reference for the methodology is: Zizka et al. (2019) <doi:10.1111/2041-210X.13152>.
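The simplest of the checks listed above (invalid, zero and identical coordinates) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the package's implementation: the function name, the flag labels and the rule that "zero" means both latitude and longitude are exactly 0 are hypothetical choices.

```python
def flag_record(lat, lon):
    """Return a list of flag labels for one occurrence record (hypothetical labels)."""
    # Invalid: missing values or values outside the valid geographic ranges.
    if lat is None or lon is None or not (-90 <= lat <= 90) or not (-180 <= lon <= 180):
        return ["invalid"]
    flags = []
    # Zero: assumed here to mean the point (0, 0), a common data-entry default.
    if lat == 0 and lon == 0:
        flags.append("zero")
    # Equal: identical latitude and longitude often indicates a copy error.
    if lat == lon:
        flags.append("equal")
    return flags


# Hypothetical records as (latitude, longitude) pairs.
records = [(48.1, 11.6), (0, 0), (12.5, 12.5), (95.0, 10.0)]
flagged = [flag_record(lat, lon) for lat, lon in records]
clean = [rec for rec, flags in zip(records, flagged) if not flags]
```

Returning the flags rather than dropping records keeps the decision to exclude with the user, which mirrors the "flag (and exclude)" wording of the description.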
Dominance analysis is a method for comparing the relative importance of predictors in multiple regression models: ordinary least squares, generalized linear models, hierarchical linear models, beta regression and dynamic linear models. The main principles and methods of dominance analysis are described in Budescu, D. V. (1993) <doi:10.1037/0033-2909.114.3.542> and Azen, R., & Budescu, D. V. (2003) <doi:10.1037/1082-989X.8.2.129> for ordinary least squares regression. Subsequently, the extensions for multivariate regression, logistic regression and hierarchical linear models were described in Azen, R., & Budescu, D. V. (2006) <doi:10.3102/10769986031002157>, Azen, R., & Traxel, N. (2009) <doi:10.3102/1076998609332754> and Luo, W., & Azen, R. (2013) <doi:10.3102/1076998612458319>, respectively.
Estimation and testing methods for dependently truncated data. Semi-parametric methods are based on Emura et al. (2011) <Stat Sinica 21:349-67>, Emura & Wang (2012) <doi:10.1016/j.jmva.2012.03.012>, and Emura & Murotani (2015) <doi:10.1007/s11749-015-0432-8>. Parametric approaches are based on Emura & Konno (2012) <doi:10.1007/s00362-014-0626-2> and Emura & Pan (2017) <doi:10.1007/s00362-017-0947-z>. A regression approach is based on Emura & Wang (2016) <doi:10.1007/s10463-015-0526-9>. Quasi-independence tests are based on Emura & Wang (2010) <doi:10.1016/j.jmva.2009.07.006>. Right-truncated data for Japanese male centenarians are given by Emura & Murotani (2015) <doi:10.1007/s11749-015-0432-8>.
Targets parameters that solve Ordinary Differential Equations (ODEs) driven by a vector of cumulative hazard functions. The package provides a method for estimating these parameters using an estimator defined by a corresponding Stochastic Differential Equation (SDE) system driven by cumulative hazard estimates. By providing cumulative hazard estimates as input, the package gives estimates of the parameter as output, along with pointwise (co)variances derived from an asymptotic expression. Examples of parameters that can be targeted in this way include the survival function, the restricted mean survival function, cumulative incidence functions, among others; see Ryalen, Stensrud, and Røysland (2018) <doi:10.1093/biomet/asy035>, and further applications in Stensrud, Røysland, and Ryalen (2019) <doi:10.1111/biom.13102> and Ryalen et al. (2021) <doi:10.1093/biostatistics/kxab009>.
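For the survival function, the idea can be made concrete: S solves dS(t) = -S(t) dA(t), with A the cumulative hazard, and plugging in the jumps of a cumulative hazard estimate gives the recursion S(t) = S(t-) - S(t-) ΔA(t). The Python sketch below uses Nelson-Aalen increments as input, in which case the recursion reproduces the Kaplan-Meier estimator. It is a minimal illustration of this one parameter only; the function names are hypothetical and the (co)variance estimation described above is not covered.

```python
def nelson_aalen_increments(times, events):
    """Return (time, d/n) pairs: events over number at risk at each distinct event time."""
    data = sorted(zip(times, events))
    n = len(data)
    increments = []
    i = 0
    while i < n:
        t = data[i][0]
        at_risk = n - i  # subjects with observed time >= t
        d = 0
        j = i
        while j < n and data[j][0] == t:
            d += data[j][1]  # events are coded 1, censorings 0
            j += 1
        if d > 0:
            increments.append((t, d / at_risk))
        i = j
    return increments


def survival_from_hazard(increments):
    """Solve dS = -S dA step by step over the jumps of the cumulative hazard estimate."""
    s = 1.0
    curve = []
    for t, d_a in increments:
        s += -s * d_a
        curve.append((t, s))
    return curve


# Hypothetical sample: four subjects, the second one censored at time 2.
increments = nelson_aalen_increments([1, 2, 3, 4], [1, 0, 1, 1])
curve = survival_from_hazard(increments)
```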
Survival analysis is employed to model time-to-event data. This package examines the relationship between survival and one or more predictors, termed covariates, which can include both treatment variables (e.g., season of birth, represented by indicator functions) and continuous variables. To this end, the Cox proportional hazards (Cox-PH) model, introduced by Cox in 1972, is a widely applicable and commonly used method for survival analysis. This package enables the estimation of the effect of randomization for the treatment variable to account for potential confounders, providing adjustment when estimating the association with exposure. It accommodates both fixed and time-dependent covariates and computes survival probabilities for lactation periods in dairy animals. The package is built upon the algorithm developed by Klein and Moeschberger (2003) <DOI:10.1007/b97377>.
This package contains the functions to use the econometric methods in the paper Bryzgalova, Huang, and Julliard (2023) <doi:10.1111/jofi.13197>. In this package, we provide a novel Bayesian framework for analyzing linear asset pricing models: simple, robust, and applicable to high-dimensional problems. For a stand-alone model, we provide functions including BayesianFM() and BayesianSDF() to deliver reliable price of risk estimates for both tradable and nontradable factors. For competing factors and possibly nonnested models, we provide functions including continuous_ss_sdf(), continuous_ss_sdf_v2(), and dirac_ss_sdf_pvalue() to analyze high-dimensional models. If you use this package, please cite the paper. We are thankful to Yunan Ding and Jingtong Zhang for their research assistance. Any errors or omissions are the responsibility of the authors.
This package implements variable screening techniques for ultra-high dimensional regression settings. Techniques for independent (iid) data, varying-coefficient models, and longitudinal data are implemented. The package currently contains three screening functions: screenIID(), screenLD() and screenVCM(), and six functions for simulating datasets: simulateDCSIS(), simulateLD(), simulateMVSIS(), simulateMVSISNY(), simulateSIRS() and simulateVCM(). The package is based on the work of Li-Ping ZHU, Lexin LI, Runze LI, and Li-Xing ZHU (2011) <DOI:10.1198/jasa.2011.tm10563>, Runze LI, Wei ZHONG, & Liping ZHU (2012) <DOI:10.1080/01621459.2012.695654>, Jingyuan LIU, Runze LI, & Rongling WU (2014) <DOI:10.1080/01621459.2013.850086>, Hengjian CUI, Runze LI, & Wei ZHONG (2015) <DOI:10.1080/01621459.2014.920256>, and Wanghuan CHU, Runze LI and Matthew REIMHERR (2016) <DOI:10.1214/16-AOAS912>.
Computes elliptical joint confidence regions at a specified confidence level. It provides the flexibility to estimate either classical or robust confidence regions, which can be visualized in 2D or 3D plots. The classical approach assumes normality and uses the mean and covariance matrix to define the confidence regions. Alternatively, the robustified version employs estimators such as the minimum covariance determinant (MCD) and the M-estimator, making the regions less sensitive to outliers and departures from normality. Furthermore, the functions allow users to group the dataset based on categorical variables and estimate separate confidence regions for each group. This capability is particularly useful for exploring potential differences or similarities across subgroups within a dataset. For details, see Varmuza and Filzmoser (2009, ISBN:978-1-4200-5947-2), Johnson and Wichern (2007, ISBN:0-13-187715-1), and Raymaekers and Rousseeuw (2019) <DOI:10.1080/00401706.2019.1677270>.
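One common classical construction in two dimensions treats a point x as inside the region when its squared Mahalanobis distance from the sample mean, computed with the sample covariance matrix, does not exceed the chi-square quantile with 2 degrees of freedom, which has the closed form -2 ln(1 - level). The Python sketch below illustrates that construction only. It is not the package's code, the function names are hypothetical, the chi-square cutoff is a large-sample approximation, and the robust (MCD, M-estimator) variants are not shown.

```python
import math


def mean_and_cov(points):
    """Sample mean and (unbiased) covariance of 2D points, returned as ((mx, my), (sxx, sxy, syy))."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance using the closed-form inverse of a 2x2 matrix."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def in_region(point, mean, cov, level=0.95):
    """True if the point lies inside the ellipse at the given confidence level."""
    threshold = -2.0 * math.log(1.0 - level)  # chi-square quantile, 2 degrees of freedom
    return mahalanobis_sq(point, mean, cov) <= threshold


# Hypothetical data: four corner points of a square.
mean, cov = mean_and_cov([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)])
inside = in_region((1.0, 1.0), mean, cov)
outside = in_region((10.0, 10.0), mean, cov)
```

Group-wise regions, as described above, would amount to calling mean_and_cov() separately on the points of each category.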
The modified Adult Treatment Panel III (ATP-III) guidelines proposed by the American Heart Association (AHA) and the National Heart, Lung and Blood Institute (NHLBI) are widely used for the clinical diagnosis of metabolic syndrome. The AHA-NHLBI criteria advise using parameters such as waist circumference (WC), systolic blood pressure (SBP), diastolic blood pressure (DBP), fasting plasma glucose (FPG), triglycerides (TG) and high-density lipoprotein cholesterol (HDLC) for the diagnosis of metabolic syndrome. Each parameter has to be interpreted based on the proposed cut-offs, making the diagnosis slightly complex and error-prone. This package incorporates the modified ATP-III guidelines and aids the easy and quick diagnosis of metabolic syndrome in busy healthcare settings and for research purposes. The modified ATP-III-AHA-NHLBI criteria for the diagnosis are described by Grundy et al. (2005) <doi:10.1161/CIRCULATIONAHA.105.169404>.
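To show why the cut-off logic is easy to get wrong by hand, here is a minimal Python sketch of the five-criterion rule as commonly summarized from Grundy et al. (2005): metabolic syndrome is indicated when at least three criteria are met. The function name and argument names are hypothetical and do not reflect the package's interface. The sketch assumes measurements in cm, mmHg and mg/dL, and deliberately ignores the drug-treatment alternatives and the ethnicity-specific waist cut-offs that the full criteria include.

```python
def metabolic_syndrome(sex, wc_cm, sbp, dbp, fpg_mgdl, tg_mgdl, hdlc_mgdl):
    """Return (diagnosis, number of criteria met); simplified, treatment status ignored."""
    male = sex == "male"
    criteria = [
        wc_cm >= (102 if male else 88),    # elevated waist circumference
        tg_mgdl >= 150,                    # elevated triglycerides
        hdlc_mgdl < (40 if male else 50),  # reduced HDL cholesterol
        sbp >= 130 or dbp >= 85,           # elevated blood pressure
        fpg_mgdl >= 100,                   # elevated fasting plasma glucose
    ]
    met = sum(criteria)
    return met >= 3, met


# Hypothetical patients.
patient_a = metabolic_syndrome("male", wc_cm=105, sbp=135, dbp=80,
                               fpg_mgdl=95, tg_mgdl=160, hdlc_mgdl=45)
patient_b = metabolic_syndrome("female", wc_cm=80, sbp=120, dbp=80,
                               fpg_mgdl=90, tg_mgdl=100, hdlc_mgdl=55)
```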
Load and analyze up-to-date worldwide time series data of reported cases of the Novel Coronavirus Disease (COVID-19) from different sources, including the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) data repository <https://github.com/CSSEGISandData/COVID-19> and "Our World in Data" <https://github.com/owid/>, among several others. The datasets reporting the COVID-19 cases are available in two main modalities: as time series sequences and as aggregated data for the last day with greater spatial resolution. Several analysis, visualization and modelling functions are available in the package that allow the user to compute and visualize the total number of cases, total number of changes and growth rate, globally or for a specific geographical location, to generate models using these trends, to generate interactive visualizations, and to generate a Susceptible-Infected-Recovered (SIR) model for the disease spread.
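For readers unfamiliar with the SIR model mentioned above, the following Python sketch integrates the standard SIR equations dS/dt = -beta S I / N, dI/dt = beta S I / N - gamma I, dR/dt = gamma I with a simple forward Euler scheme. It is a generic illustration, not the package's fitting routine: the function name, the parameter values and the step size are hypothetical, and no model is fitted to data here.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Forward Euler integration of the SIR model; returns a list of (t, S, I, R)."""
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history


# Hypothetical scenario: 1000 people, 10 initially infected, basic reproduction number 3.
history = simulate_sir(s0=990, i0=10, r0=0, beta=0.3, gamma=0.1, days=100)
peak_time, _, peak_infected, _ = max(history, key=lambda row: row[2])
```

Forward Euler is chosen for brevity; a production analysis would normally use an adaptive ODE solver.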
Automatically returns 24 logistic models, including 13 individual models and 11 ensembles of models, for logistic data. The package also returns 25 plots, 5 tables, and a summary report. The package automatically builds all 24 models, reports all results, and provides graphics to show how the models performed. This can be used for a wide range of data, such as sports or medical data. The package includes medical data (the Pima Indians data set) and information about the performance of LeBron James. The package can be used to analyze many other examples, such as stock market data. The package automatically returns many values for each model, such as True Positive Rate, True Negative Rate, False Positive Rate, False Negative Rate, Positive Predictive Value, Negative Predictive Value, F1 Score, and Area Under the Curve. The package also returns 36 Receiver Operating Characteristic (ROC) curves for each of the 24 models.
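Most of the per-model values listed above follow directly from the four cells of a binary confusion matrix. The Python sketch below shows the standard definitions; the function name and the example counts are hypothetical, it does not reproduce the package's output format, and Area Under the Curve is omitted because it requires predicted scores rather than counts. No guard against empty classes (division by zero) is included.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard rates derived from a binary confusion matrix."""
    tpr = tp / (tp + fn)  # true positive rate (sensitivity, recall)
    tnr = tn / (tn + fp)  # true negative rate (specificity)
    fpr = fp / (fp + tn)  # false positive rate
    fnr = fn / (fn + tp)  # false negative rate
    ppv = tp / (tp + fp)  # positive predictive value (precision)
    npv = tn / (tn + fn)  # negative predictive value
    f1 = 2.0 * ppv * tpr / (ppv + tpr)  # harmonic mean of precision and recall
    return {"TPR": tpr, "TNR": tnr, "FPR": fpr, "FNR": fnr,
            "PPV": ppv, "NPV": npv, "F1": f1}


# Hypothetical counts for one model on a test set of 100 observations.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```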
Tools for the management of deterministic and stochastic projects. In a deterministic context, it obtains the duration of a project and the slack of each activity, as well as a schedule of activity times (Castro, Gómez & Tejada (2007) <doi:10.1016/j.orl.2007.01.003>). It also allows the management of resources. When the project is finished and the actual duration of each activity is known, it can determine how long the project was delayed and allocate the delay fairly among the activities (Bergantiños, Valencia-Toledo & Vidal-Puga (2018) <doi:10.1016/j.dam.2017.08.012>). In a stochastic context, it can estimate the average duration of the project and plot the density of this duration, as well as the density of the earliest and latest times of the chosen activities. As in the deterministic case, it can allocate the delay observed once the project has been carried out.
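The deterministic part (project duration and slack per activity) corresponds to the classic critical path method: a forward pass gives earliest start times, a backward pass gives latest finish times, and the total slack is the difference. The Python sketch below illustrates that textbook procedure under the assumption that the precedence relations form an acyclic graph. The function name and input format are hypothetical, and neither the scheduling method of Castro, Gómez & Tejada (2007) nor the delay allocation rules are reproduced.

```python
def critical_path(durations, predecessors):
    """Return (project duration, total slack per activity) for an acyclic project."""
    # Topological order: every activity appears after all of its predecessors.
    order, visited = [], set()

    def visit(activity):
        if activity in visited:
            return
        visited.add(activity)
        for pred in predecessors.get(activity, []):
            visit(pred)
        order.append(activity)

    for activity in durations:
        visit(activity)

    # Forward pass: earliest start is the latest earliest finish among predecessors.
    earliest_start = {}
    for a in order:
        earliest_start[a] = max(
            (earliest_start[p] + durations[p] for p in predecessors.get(a, [])),
            default=0,
        )
    project_duration = max(earliest_start[a] + durations[a] for a in durations)

    # Backward pass: latest finish is the earliest latest start among successors.
    successors = {a: [] for a in durations}
    for a, preds in predecessors.items():
        for p in preds:
            successors[p].append(a)
    latest_finish = {}
    for a in reversed(order):
        latest_finish[a] = min(
            (latest_finish[s] - durations[s] for s in successors[a]),
            default=project_duration,
        )

    slack = {a: latest_finish[a] - durations[a] - earliest_start[a] for a in durations}
    return project_duration, slack


# Hypothetical project: B and C follow A, and D follows both B and C.
durations = {"A": 3, "B": 2, "C": 4, "D": 1}
predecessors = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
project_duration, slack = critical_path(durations, predecessors)
```

Activities with zero slack (here A, C and D) form the critical path; only B can be delayed, by up to 2 time units, without delaying the project.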
Generally, soil functionality is characterized by its capability to sustain microbial activity, nutritional element supply and structural stability, and to aid crop production. Since soil functions can be linked to 80% of ecosystem services, conservation of degraded land should strive to restore not only the capacity of soil to sustain flora but also ecosystem provisions. The primary ecosystem services of soil are carbon sequestration, food or biomass production, provision of microbial habitat, and nutrient recycling. However, the actual magnitude of soil functions provided by agricultural land uses has never been quantified. Nutrient supply capacity (NSC) is a measure of nutrient dynamics in restored land uses. Carbon accumulation proficiency (CAP) is a measure of ecosystem carbon sequestration. Biological activity index (BAI) is the average of the responses of all enzyme activities in treated land over control/reference land. The CAP parameter investigates how land uses may affect carbon flows, retention, and sequestration. The CAP provides a signal for C cycles, flows, and the system's relative operational supremacy.
ZygosityPredictor predicts how many copies of a gene are affected by small variants. In addition to the basic calculation of the affected copy number of a variant, ZygosityPredictor can integrate the influence of several variants on a gene and ultimately state whether and how many wild-type copies of the gene are left. This information proves to be of particular use in the context of translational medicine. For example, in cancer genomes, ZygosityPredictor can address whether unmutated copies of tumor-suppressor genes are present. Beyond this, it is possible to make this statement for all genes of an organism. ZygosityPredictor was primarily developed to handle SNVs and INDELs (hereafter referred to as small variants) of somatic and germline origin. In order not to overlook severe effects outside of the small-variant context, it has been extended with the assessment of large-scale deletions, which cause losses of whole genes or parts of them.