This package provides functions to access and download data from the Open Case Studies <https://www.opencasestudies.org/> repositories on GitHub <https://github.com/opencasestudies>. Different functions enable users to grab the data they need at different sections of the case study, as well as to download the whole case study repository. All the user needs to do is input the name of the case study being worked on. The package relies on the httr::GET() function to access files through the GitHub API. The functions usethis::use_zip() and usethis::create_from_github() are used to clone and/or download the case study repositories. To cite an individual case study, please see the respective README file at <https://github.com/opencasestudies/>. <https://github.com/opencasestudies/ocs-bp-rural-and-urban-obesity> <https://github.com/opencasestudies/ocs-bp-air-pollution> <https://github.com/opencasestudies/ocs-bp-vaping-case-study> <https://github.com/opencasestudies/ocs-bp-opioid-rural-urban> <https://github.com/opencasestudies/ocs-bp-RTC-wrangling> <https://github.com/opencasestudies/ocs-bp-RTC-analysis> <https://github.com/opencasestudies/ocs-bp-youth-disconnection> <https://github.com/opencasestudies/ocs-bp-youth-mental-health> <https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard> <https://github.com/opencasestudies/ocs-bp-co2-emissions> <https://github.com/opencasestudies/ocs-bp-diet>.
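Since the package works by querying the GitHub API through httr::GET(), that mechanism can be sketched directly. The snippet below is only an illustration of the underlying mechanism (the repository and directory names are examples), not a call to the package's exported functions:

```r
# Illustrative sketch: list the contents of a case-study data/ directory
# through the GitHub API, the same mechanism the package uses via httr::GET().
library(httr)

repo <- "ocs-bp-co2-emissions"   # example case-study repository name
url  <- paste0("https://api.github.com/repos/opencasestudies/",
               repo, "/contents/data")
resp <- GET(url)
stop_for_status(resp)
files <- vapply(content(resp), function(x) x$name, character(1))
print(files)
```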
Identification, model fitting and estimation for time series with periodic structure. Additionally, procedures for simulation of periodic processes and real data sets are included. References: Hurd, H. L., Miamee, A. G. (2007) <doi:10.1002/9780470182833>; Box, G. E. P., Jenkins, G. M., Reinsel, G. (1994) <doi:10.1111/jtsa.12194>; Brockwell, P. J., Davis, R. A. (1991, ISBN:978-1-4419-0319-8); Bretz, F., Hothorn, T., Westfall, P. (2010, ISBN:9780429139543); Westfall, P. H., Young, S. S. (1993, ISBN:978-0-471-55761-6); Bloomfield, P., Hurd, H. L., Lund, R. (1994) <doi:10.1111/j.1467-9892.1994.tb00181.x>; Dehay, D., Hurd, H. L. (1994, ISBN:0-7803-1023-3); Vecchia, A. (1985) <doi:10.1080/00401706.1985.10488076>; Vecchia, A. (1985) <doi:10.1111/j.1752-1688.1985.tb00167.x>; Jones, R., Brelsford, W. (1967) <doi:10.1093/biomet/54.3-4.403>; Makagon, A. (1999) <https://www.math.uni.wroc.pl/~pms/files/19.2/Article/19.2.5.pdf>; Sakai, H. (1989) <doi:10.1111/j.1467-9892.1991.tb00069.x>; Gladyshev, E. G. (1961) <https://www.mathnet.ru/php/archive.phtml?wshow=paper&jrnid=dan&paperid=24851>; Ansley (1979) <doi:10.1093/biomet/66.1.59>; Hurd, H. L., Gerr, N. L. (1991) <doi:10.1111/j.1467-9892.1991.tb00088.x>.
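As a hedged illustration of what "simulation of periodic processes" means here, the base-R sketch below simulates a periodic AR(1) series with assumed seasonally varying coefficients; it is generic and does not use this package's interface:

```r
# Minimal sketch: periodic AR(1) with period T = 12,
#   x_t = phi(t mod T) * x_{t-1} + e_t,
# where the autoregressive coefficient varies by season (assumed values).
set.seed(1)
T.per <- 12
n     <- 240
phi   <- 0.5 + 0.3 * sin(2 * pi * (1:T.per) / T.per)  # assumed periodic coefficients
x     <- numeric(n)
e     <- rnorm(n)
for (t in 2:n) {
  s    <- ((t - 1) %% T.per) + 1
  x[t] <- phi[s] * x[t - 1] + e[t]
}
plot(ts(x, frequency = T.per))
```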
This package provides a set of functions to implement decision-making systems based on the W.A.S.P.A.S. method (Weighted Aggregated Sum Product Assessment) of Chakraborty and Zavadskas (2012) <doi:10.5755/j01.eee.122.6.1810>. The package offers functions that analyze and validate the raw data, which must be entered in a determined format; extract specific vectors and matrices from this raw database; normalize the input data; calculate rankings by the intermediate methods; apply the lambda parameter for the main method; and a function that does everything at once. The package has an example database called choppers, with which the user can see how the input data should be organized so that everything works as recommended by the multiple-criteria decision methods this package implements. The data consist of a set of alternatives, which will be ranked; a set of choice criteria; a matrix of values for each alternative-criterion pair; a vector of weights associated with the criteria, since certain criteria are considered more important than others; and a vector that defines each criterion as cost or benefit. This last vector determines the calculation formula, as there are criteria for which we want the highest possible value (e.g. durability) and others for which we want the lowest possible value (e.g. price).
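A minimal base-R sketch of the W.A.S.P.A.S. calculation itself may help; the data, weights, and lambda below are hypothetical, and the code does not use this package's functions or its choppers data:

```r
# Sketch of the WASPAS computation: alternatives in rows, criteria in columns.
X       <- matrix(c(250, 200, 300,    # price (cost criterion)
                      8,   7,   9),   # durability (benefit criterion)
                  nrow = 3, dimnames = list(paste0("alt", 1:3),
                                            c("price", "durability")))
weights <- c(0.6, 0.4)                # assumed criterion weights (sum to 1)
type    <- c("cost", "benefit")       # cost/benefit flag per criterion

# Normalize: benefit criteria by x / max(x), cost criteria by min(x) / x.
N <- X
for (j in seq_len(ncol(X))) {
  N[, j] <- if (type[j] == "benefit") X[, j] / max(X[, j]) else min(X[, j]) / X[, j]
}

lambda <- 0.5                                      # blend of the two methods
Q1 <- N %*% weights                                # weighted sum model (WSM)
Q2 <- apply(N, 1, function(r) prod(r ^ weights))   # weighted product model (WPM)
Q  <- lambda * Q1 + (1 - lambda) * Q2              # WASPAS score; higher is better
sort(Q[, 1], decreasing = TRUE)
```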
Analysis of Q methodology, used to identify distinct perspectives existing within a group. This methodology is used across social, health and environmental sciences to understand diversity of attitudes, discourses, or decision-making styles (for more information, see <https://qmethod.org/>). A single function runs the full analysis. Each step can be run separately using the corresponding functions: for automatic flagging of Q-sorts (manual flagging is optional), for statement scores, for distinguishing and consensus statements, and for general characteristics of the factors. The package allows the user to choose either principal components or centroid factor extraction, manual or automatic flagging, a number of mathematical methods for rotation (or none), and a number of correlation coefficients for the initial correlation matrix, among many other options. Additional functions are available to import and export data (from raw *.CSV, HTMLQ and FlashQ *.CSV, PQMethod *.DAT and easy-htmlq *.JSON files), to print and plot, to import raw data from individual *.CSV files, to make printable Q cards, and to generate Q distributions for study administration. See further details in the package documentation, and in the web pages below, which include a cookbook, guidelines for more advanced analysis (how to perform manual flagging or change the sign of factors), data management, and a graphical user interface (GUI) for online and offline use.
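A minimal run might look as follows, assuming the package's bundled lipset example data and the qmethod() entry point described in its documentation:

```r
# One-call analysis of the classic Lipset dataset (bundled with the package);
# nfactors and rotation are example choices, not defaults to rely on.
library(qmethod)
data(lipset)
results <- qmethod(lipset[[1]], nfactors = 3, rotation = "varimax")
summary(results)
plot(results)
```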
For measurement instruments consisting of polytomously scored items, it is commonly assumed that the psychological distances between the response categories are equal. According to Muraki, this assumption must be tested; in examining it, fit indices are obtained and evaluated. This package removes the need for that assumption. With this package, the converted scale values of all items in a measurement instrument can be calculated by estimating a separate category parameter set for each item, so the calculations can be made without relying on a common category parameter set. Through this package, the psychological distances of the items are scaled. Scaling a category parameter set for each item differentiates the scores obtained from the categories of the items. The total measurement instrument score of an individual can also be calculated according to this scaling of item score categories. This package thus allows the position of individuals on the structure measured by an instrument consisting of polytomously scored items to be revealed more accurately. In this way, the results obtained about individuals can be made more sensitive, and the differences between individuals can be revealed more accurately. It can also be argued that more accurate evidence can be obtained regarding the psychometric properties of the measurement instruments.
Allows computing and visualising convective parameters commonly used in the operational prediction of severe convective storms. The core algorithm is based on highly optimized C++ code linked into R via 'Rcpp'. This highly efficient engine allows deriving thermodynamic and kinematic parameters from large numerical datasets, such as reanalyses or operational Numerical Weather Prediction models, in a reasonable amount of time. The package has been developed since 2017 by research meteorologists specializing in severe thunderstorms. The most relevant methods used in the package are based on the following publications: Stipanuk (1973) <https://apps.dtic.mil/sti/pdfs/AD0769739.pdf>, McCann et al. (1994) <doi:10.1175/1520-0434(1994)009%3C0532:WNIFFM%3E2.0.CO;2>, Bunkers et al. (2000) <doi:10.1175/1520-0434(2000)015%3C0061:PSMUAN%3E2.0.CO;2>, Corfidi et al. (2003) <doi:10.1175/1520-0434(2003)018%3C0997:CPAMPF%3E2.0.CO;2>, Showalter (1953) <doi:10.1175/1520-0477-34.6.250>, Coffer et al. (2019) <doi:10.1175/WAF-D-19-0115.1>, Gropp and Davenport (2019) <doi:10.1175/WAF-D-17-0150.1>, Czernecki et al. (2019) <doi:10.1016/j.atmosres.2019.05.010>, Taszarek et al. (2020) <doi:10.1175/JCLI-D-20-0346.1>, Sherburn and Parker (2014) <doi:10.1175/WAF-D-13-00041.1>, Romanic et al. (2022) <doi:10.1016/j.wace.2022.100474>.
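As a generic illustration of one kinematic parameter of the kind the package computes (not its actual API), the sketch below derives 0-6 km bulk wind shear from a hypothetical wind profile:

```r
# Self-contained sketch: 0-6 km bulk wind shear from an invented profile.
h  <- c(0, 500, 1000, 2000, 3000, 6000)   # height above ground [m]
ws <- c(5, 8, 11, 14, 17, 25)             # wind speed [m/s]
wd <- c(160, 180, 200, 220, 230, 240)     # wind direction [deg]
u  <- -ws * sin(wd * pi / 180)            # zonal wind component
v  <- -ws * cos(wd * pi / 180)            # meridional wind component
i0 <- which(h == 0); i6 <- which(h == 6000)
bulk_shear_0_6 <- sqrt((u[i6] - u[i0])^2 + (v[i6] - v[i0])^2)  # [m/s]
bulk_shear_0_6
```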
Framework to facilitate patient subtyping with similarity network fusion and meta clustering. The similarity network fusion (SNF) algorithm was introduced by Wang et al. (2014) in <doi:10.1038/nmeth.2810>. SNF is a data integration approach that can transform high-dimensional and diverse data types into a single similarity network suitable for clustering with minimal loss of information from each initial data source. The meta clustering approach was introduced by Caruana et al. (2006) in <doi:10.1109/ICDM.2006.103>. Meta clustering involves generating a wide range of cluster solutions by adjusting clustering hyperparameters, then clustering the solutions themselves into a manageable number of qualitatively similar solutions, and finally characterizing representative solutions to find ones that are best for the user's specific context. This package provides a framework to easily transform multi-modal data into a wide range of similarity network fusion-derived cluster solutions as well as to visualize, characterize, and validate those solutions. Core package functionality includes easy customization of distance metrics, clustering algorithms, and SNF hyperparameters to generate diverse clustering solutions; calculation and plotting of associations between features, between patients, and between cluster solutions; and standard cluster validation approaches including resampled measures of cluster stability, standard metrics of cluster quality, and label propagation to evaluate generalizability in unseen data. Associated vignettes guide the user through using the package to identify patient subtypes while adhering to best practices for unsupervised learning.
SCANVIS is a set of annotation-dependent tools for analyzing splice junctions and their read support as predetermined by an alignment tool of choice (for example, the STAR aligner). SCANVIS assesses each junction's relative read support (RRS) by relating it to the context of local split reads aligning to annotated transcripts. SCANVIS also annotates each splice junction by indicating whether the junction is supported by annotation or not, and if not, what type of junction it is (e.g. exon skipping, alternative 5' or 3' events, novel exons). Unannotated junctions are further annotated by indicating whether they induce a frame shift or not. SCANVIS includes a visualization function to generate static sashimi-style plots depicting relative read support and the number of split reads using arc thickness and arc heights, making it easy for users to spot well-supported junctions. These plots also clearly delineate unannotated junctions from annotated ones using designated color schemes, and users can highlight splice junctions of choice. Variants and/or a read profile are also incorporated into the plot if the user supplies variants in BED format and/or the BAM file. One further feature of the visualization function is that users can submit multiple samples of a certain disease or cohort to generate a single plot; this occurs via a "merge" function wherein junction details over multiple samples are merged to generate a single sashimi plot, which is useful when contrasting cohorts (e.g. disease vs control).
This package provides a user-friendly tool to fit Bayesian regression models. It can fit three types of Bayesian models, using individual-level, summary-level, or individual plus pedigree-level (single-step) data, for both genomic prediction/selection (GS) and genome-wide association study (GWAS). It was designed to estimate joint effects and genetic parameters for a complex trait, including: (1) fixed effects and coefficients of covariates, (2) environmental random effects and their corresponding variance, (3) genetic variance, (4) residual variance, (5) heritability, (6) genomic estimated breeding values (GEBV) for both genotyped and non-genotyped individuals, (7) SNP effect size, (8) phenotypic/genetic variance explained (PVE) for single or multiple SNPs, (9) posterior probability of association of the genomic window (WPPA), and (10) posterior inclusive probability (PIP). The functions are not limited to these; we will keep enriching the package with more features. References: Lilin Yin et al. (2025) <doi:10.18637/jss.v114.i06>; Meuwissen et al. (2001) <doi:10.1093/genetics/157.4.1819>; Gustavo et al. (2013) <doi:10.1534/genetics.112.143313>; Habier et al. (2011) <doi:10.1186/1471-2105-12-186>; Yi et al. (2008) <doi:10.1534/genetics.107.085589>; Zhou et al. (2013) <doi:10.1371/journal.pgen.1003264>; Moser et al. (2015) <doi:10.1371/journal.pgen.1004969>; Lloyd-Jones et al. (2019) <doi:10.1038/s41467-019-12653-0>; Henderson (1976) <doi:10.2307/2529339>; Fernando et al. (2014) <doi:10.1186/1297-9686-46-50>.
To date, thousands of single nucleotide polymorphisms (SNPs) have been found to be associated with complex traits and diseases. However, the vast majority of these disease-associated SNPs lie in the non-coding part of the genome, and are likely to affect regulatory elements, such as enhancers and promoters, rather than the function of a protein. Thus, to understand the molecular mechanisms underlying genetic traits and diseases, it becomes increasingly important to study the effect of a SNP on nearby molecular traits such as chromatin environment or transcription factor (TF) binding. Towards this aim, we developed SNPhood, a user-friendly Bioconductor R package to investigate and visualize the local neighborhood of a set of SNPs of interest in NGS data such as chromatin marks or transcription factor binding sites from ChIP-Seq or RNA-Seq experiments. SNPhood comprises a set of easy-to-use functions to extract, normalize and summarize reads for a genomic region, perform various data quality checks, normalize read counts using additional input files, and cluster and visualize the regions according to the binding pattern. The regions around each SNP can be binned in a user-defined fashion to allow for analysis of very broad patterns as well as a detailed investigation of specific binding shapes. Furthermore, SNPhood supports the integration with genotype information to investigate and visualize genotype-specific binding patterns. Finally, SNPhood can be employed for determining, investigating, and visualizing allele-specific binding patterns around the SNPs of interest.
This package provides tools for transport planning with an emphasis on spatial transport data and non-motorized modes. The package was originally developed to support the 'Propensity to Cycle Tool', a publicly available strategic cycle network planning tool (Lovelace et al. 2017) <doi:10.5198/jtlu.2016.862>, but has since been extended to support public transport routing and accessibility analysis (Moreno-Monroy et al. 2017) <doi:10.1016/j.jtrangeo.2017.08.012> and routing with locally hosted routing engines such as OSRM (Lowans et al. 2023) <doi:10.1016/j.enconman.2023.117337>. The main functions are for creating and manipulating geographic "desire lines" from origin-destination (OD) data (building on the od package); calculating routes on the transport network locally and via interfaces to routing services such as <https://cyclestreets.net/> (Desjardins et al. 2021) <doi:10.1007/s11116-021-10197-1>; and calculating route segment attributes such as bearing. The package implements the travel flow aggregation method described in Morgan and Lovelace (2020) <doi:10.1177/2399808320942779> and the OD jittering method described in Lovelace et al. (2022) <doi:10.32866/001c.33873>. Further information on the package's aim and scope can be found in the vignettes and in a paper in the R Journal (Lovelace and Ellison 2018) <doi:10.32614/RJ-2018-053>, and in a paper outlining the landscape of open source software for geographic methods in transport planning (Lovelace, 2021) <doi:10.1007/s10109-020-00342-2>.
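A hedged sketch of the desire-line step: od2line() converts an OD table into geographic lines. The toy zones and OD table below are invented for illustration:

```r
# Build a tiny set of zone points and an OD table, then create desire lines.
library(stplanr)
library(sf)
zones <- st_sf(code = c("A", "B", "C"),
               geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 0)),
                                 st_point(c(0, 1)), crs = 4326))
od <- data.frame(orig = c("A", "A"), dest = c("B", "C"), trips = c(50, 20))
desire_lines <- od2line(flow = od, zones = zones)  # one line per OD pair
plot(desire_lines["trips"])
```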
Hospitals, hospital systems, and even trauma systems that provide care to injured patients may not be aware of robust metrics that can help gauge the efficacy of their programs in saving the lives of injured patients. traumar provides robust functions driven by the academic literature to automate the calculation of relevant metrics for individuals desiring to measure the performance of their trauma center or even a trauma system. traumar also provides some helper functions for the data analysis journey. Users can refer to the following publications for descriptions of the methods used in 'traumar'. TRISS methodology, including probability of survival, and the W, M, and Z Scores - Flora (1978) <doi:10.1097/00005373-197810000-00003>, Boyd et al. (1987, PMID:3106646), Llullaku et al. (2009) <doi:10.1186/1749-7922-4-2>, Singh et al. (2011) <doi:10.4103/0974-2700.86626>, Baker et al. (1974, PMID:4814394), and Champion et al. (1989) <doi:10.1097/00005373-198905000-00017>. For the Relative Mortality Metric, see Napoli et al. (2017) <doi:10.1080/24725579.2017.1325948>, Schroeder et al. (2019) <doi:10.1080/10903127.2018.1489021>, and Kassar et al. (2016) <doi:10.1177/00031348221093563>. For more information about methods to calculate over- and under-triage in trauma hospital populations and samples, please see the following publications - Peng & Xiang (2016) <doi:10.1016/j.ajem.2016.08.061>, Beam et al. (2022) <doi:10.23937/2474-3674/1510136>, Roden-Foreman et al. (2017) <doi:10.1097/JTN.0000000000000283>.
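The TRISS probability of survival and the W score can be sketched directly from their definitions; the coefficients below are the widely cited blunt-trauma values and the patient data are invented, so this is an illustration rather than the package's implementation:

```r
# Illustrative TRISS logic: logistic probability of survival, then the W
# score (excess survivors per 100 patients relative to TRISS expectation).
b0 <- -0.4499; b_rts <- 0.8085; b_iss <- -0.0835; b_age <- -1.7430  # blunt coefficients
rts   <- c(7.84, 5.97, 7.84)   # Revised Trauma Score
iss   <- c(9, 25, 4)           # Injury Severity Score
age55 <- c(0, 1, 0)            # 1 if age >= 55, else 0
b  <- b0 + b_rts * rts + b_iss * iss + b_age * age55
ps <- 1 / (1 + exp(-b))        # probability of survival per patient
survived <- c(1, 0, 1)         # observed outcomes (invented)
W <- (sum(survived) - sum(ps)) / (length(ps) / 100)
W
```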
This package provides tools for exploring the topography of 3d triangle meshes. The functions were developed with dental surfaces in mind, but could be applied to any triangle mesh of class 'mesh3d'. More specifically, doolkit allows the user to isolate the border of a mesh, or a subpart of the mesh using the polygon networks method; crop a mesh; compute basic descriptors (elevation, orientation, footprint area); compute slope, angularity and relief index (Ungar and Williamson (2000) <https://palaeo-electronica.org/2000_1/gorilla/issue1_00.htm>; Boyer (2008) <doi:10.1016/j.jhevol.2008.08.002>), inclination and occlusal relief index or gamma (Guy et al. (2013) <doi:10.1371/journal.pone.0066142>), OPC (Evans et al. (2007) <doi:10.1038/nature05433>), OPCR (Wilson et al. (2012) <doi:10.1038/nature10880>), DNE (Bunn et al. (2011) <doi:10.1002/ajpa.21489>; Pampush et al. (2016) <doi:10.1007/s10914-016-9326-0>), form factor (Horton (1932) <doi:10.1029/TR013i001p00350>), basin elongation (Schum (1956) <doi:10.1130/0016-7606(1956)67[597:EODSAS]2.0.CO;2>), lemniscate ratio (Chorley et al. (1957) <doi:10.2475/ajs.255.2.138>), enamel-dentine distance (Guy et al. (2015) <doi:10.1371/journal.pone.0138802>; Thiery et al. (2017) <doi:10.3389/fphys.2017.00524>), absolute crown strength (Schwartz et al. (2020) <doi:10.1098/rsbl.2019.0671>), relief rate (Thiery et al. (2019) <doi:10.1002/ajpa.23916>) and area-relative curvature; draw cumulative profiles of a topographic variable; and map a variable over a 3d triangle mesh.
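As one concrete illustration, the relief index of Boyer (2008) cited above can be computed from first principles on a toy surface; this sketch mimics the mesh3d layout (a vertex matrix and triangular faces) and does not call doolkit's own functions:

```r
# Relief index (Boyer 2008): ln(sqrt(3D surface area) / sqrt(2D footprint area)).
crossv <- function(a, b)   # 3d cross product
  c(a[2]*b[3] - a[3]*b[2], a[3]*b[1] - a[1]*b[3], a[1]*b[2] - a[2]*b[1])
tri_area <- function(p1, p2, p3) 0.5 * sqrt(sum(crossv(p2 - p1, p3 - p1)^2))

vb <- rbind(x = c(0, 1, 0), y = c(0, 0, 1), z = c(0, 0.2, 0.4))  # toy vertices
it <- matrix(c(1, 2, 3), ncol = 1)                               # one triangular face
area3d <- sum(apply(it, 2, function(f)
  tri_area(vb[, f[1]], vb[, f[2]], vb[, f[3]])))
vb2d <- vb; vb2d["z", ] <- 0                                     # footprint projection
area2d <- sum(apply(it, 2, function(f)
  tri_area(vb2d[, f[1]], vb2d[, f[2]], vb2d[, f[3]])))
rfi <- log(sqrt(area3d) / sqrt(area2d))
rfi
```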
The R-package bayespm implements Bayesian Statistical Process Control and Monitoring (SPC/M) methodology. These methods utilize available prior information and/or historical data, providing efficient online quality monitoring of a process, in terms of identifying moderate/large transient shifts (i.e., outliers) or persistent shifts of medium/small size in the process. These self-starting, sequentially updated tools can also run under complete absence of any prior information. The Predictive Control Charts (PCC) are introduced for the quality monitoring of data from any discrete or continuous distribution that is a member of the regular exponential family. The Predictive Ratio CUSUMs (PRC) are introduced for Binomial, Poisson and Normal data (a later version of the library will cover all the remaining distributions from the regular exponential family). The PCC targets transient process shifts of typically large size (a.k.a. outliers), while PRC is focused on detecting persistent (structural) shifts that might be of medium or even small size. Apart from monitoring, both PCC and PRC provide the sequentially updated posterior inference for the monitored parameter. Bourazas, K., Kiagias, D. and Tsiamyrtzis, P. (2022) "Predictive Control Charts (PCC): A Bayesian approach in online monitoring of short runs" <doi:10.1080/00224065.2021.1916413>; Bourazas, K., Sobas, F. and Tsiamyrtzis, P. (2023) "Predictive ratio CUSUM (PRC): A Bayesian approach in online change point detection of short runs" <doi:10.1080/00224065.2022.2161434>; Bourazas, K., Sobas, F. and Tsiamyrtzis, P. (2023) "Design and properties of the predictive ratio cusum (PRC) control charts" <doi:10.1080/00224065.2022.2161435>.
Considers ambiguity in probabilistic descriptions by replacing a parametric probabilistic description of uncertainty with a non-parametric set of probability distributions in the form of a Density Ratio Class. This is of particular interest in Bayesian inference. The Density Ratio Class is particularly suited for this purpose as it is invariant under Bayesian inference, marginalization, and propagation through a deterministic model. Here, invariant means that the result of the operation applied to a Density Ratio Class is again a Density Ratio Class. In particular, the invariance under Bayesian inference enables iterative learning within the same framework of Density Ratio Classes. The use of imprecise probabilities in general, and Density Ratio Classes in particular, leads to intervals of characteristics of probability distributions, such as cumulative distribution functions, quantiles, and means. The package is based on a sample of the distribution proportional to the upper bound of the class. Typically this will be a sample from the posterior in Bayesian inference. Based on such a sample, the package provides functions to calculate lower and upper class boundaries, and lower and upper bounds of cumulative distribution functions and quantiles. Rinderknecht, S.L., Albert, C., Borsuk, M.E., Schuwirth, N., Kuensch, H.R. and Reichert, P. (2014) "The effect of ambiguous prior knowledge on Bayesian model parameter inference and prediction." Environmental Modelling & Software. 62, 300-315, 2014. <doi:10.1016/j.envsoft.2014.08.020>. Sriwastava, A. and Reichert, P. "Robust Bayesian Estimation of Value Function Parameters using Imprecise Priors." Submitted. <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4973574>.
Epistasis, commonly defined as the interaction between genetic loci, is known to play an important role in the phenotypic variation of complex traits. As a result, many statistical methods have been developed to identify genetic variants that are involved in epistasis, and nearly all of these approaches carry out this task by focusing on analyzing one trait at a time. Previous studies have shown that jointly modeling multiple phenotypes can often dramatically increase statistical power for association mapping. In this package, we present the multivariate MArginal ePIstasis Test ('mvMAPIT'), a multi-outcome generalization of a recently proposed epistatic detection method which seeks to detect marginal epistasis, or the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with conventional explicit search based methods. Our proposed mvMAPIT builds upon this strategy by taking advantage of the correlation structure between traits to improve the identification of variants involved in epistasis. We formulate mvMAPIT as a multivariate linear mixed model and develop a multi-trait variance component estimation algorithm for efficient parameter inference and P-value computation. Together with reasonable model approximations, our proposed approach is scalable to moderately sized genome-wide association studies. Crawford et al. (2017) <doi:10.1371/journal.pgen.1006869>. Stamp et al. (2023) <doi:10.1093/g3journal/jkad118>.
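A hedged usage sketch follows; the mvmapit() call, its transposed input layout, and the pvalues return component are assumptions based on the package vignette, and the data are simulated:

```r
# Sketch: run the multivariate marginal epistasis test on simulated data.
library(mvMAPIT)
set.seed(1)
n <- 100; p <- 50
X <- matrix(rbinom(n * p, 2, 0.3), nrow = n)     # simulated genotypes (0/1/2)
colnames(X) <- paste0("snp", seq_len(p))
Y <- matrix(rnorm(n * 2), nrow = n)              # two simulated traits
colnames(Y) <- c("trait1", "trait2")
fit <- mvmapit(t(X), t(Y))                       # markers/traits in rows (assumed)
head(fit$pvalues)                                # per-variant P values (assumed)
```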
Cross validation informed Relaxed LASSO (or more generally elastic net), gradient boosting machine ('xgboost'), Random Forest ('RandomForestSRC'), Oblique Random Forest ('aorsf'), Artificial Neural Network (ANN), Recursive Partitioning ('RPART') or stepwise regression models are fit. Cross validation leave-out samples (leading to nested cross validation) or bootstrap out-of-bag samples are used to evaluate and compare performance between these models, with results presented in tabular or graphical form. Calibration plots can also be generated, again based upon (outer nested) cross validation or bootstrap leave-out (out-of-bag) samples. For some datasets, for example when the design matrix is not of full rank, glmnet may have very long run times when fitting the relaxed lasso model; from our experience when fitting Cox models on data with many predictors and many patients, this can make it difficult to get solutions from either glmnet() or cv.glmnet(). This may be remedied by using the path=TRUE option when calling glmnet() and cv.glmnet(). Within the glmnetr package the path=TRUE approach is taken by default. Other packages doing similar things include nestedcv <https://cran.r-project.org/package=nestedcv> and glmnetSE <https://cran.r-project.org/package=glmnetSE>, which may provide different functionality when performing a nested CV. Use of glmnetr has many similarities to the glmnet package, and it could be helpful for the user of glmnetr to also become familiar with the glmnet package <https://cran.r-project.org/package=glmnet>, with the vignettes "An Introduction to glmnet" and "The Relaxed Lasso" being especially useful in this regard.
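The workaround described above can be sketched directly with glmnet on simulated data; per the description, glmnetr takes this path=TRUE approach by default:

```r
# Relaxed lasso with the relaxed path obtained by refitting glmnet along the
# path (path = TRUE), as recommended above for difficult design matrices.
library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)   # simulated predictors
y <- rnorm(100)                         # simulated outcome
cvfit <- cv.glmnet(x, y, relax = TRUE, path = TRUE)
plot(cvfit)
```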
Processing and analysis of field collected or simulated sprinkler system catch data (depths) to characterize irrigation uniformity and efficiency using standard and other measures. Standard measures include the Christiansen coefficient of uniformity (CU) as found in Christiansen, J.E. (1942, ISBN:0138779295, "Irrigation by Sprinkling"); and distribution uniformity (DU), potential efficiency of the low quarter (PELQ), and application efficiency of the low quarter (AELQ) that are implementations of measures of the same notation in Keller, J. and Merriam, J.L. (1978) "Farm Irrigation System Evaluation: A Guide for Management" <https://pdf.usaid.gov/pdf_docs/PNAAG745.pdf>. spreval::DU.lh is similar to spreval::DU but is the distribution uniformity of the low half instead of the low quarter as in DU. spreval::PELQT is a version of spreval::PELQ adapted for traveling systems instead of lateral move or solid-set sprinkler systems. The function spreval::eff is analogous to the method used to compute application efficiency for furrow irrigation presented in Walker, W. and Skogerboe, G.V. (1987, ISBN:0138779295, "Surface Irrigation: Theory and Practice"), which uses piecewise integration of infiltrated depth compared against soil-moisture deficit (SMD), when the argument "target" is set equal to SMD. The other functions contained in the package provide graphical representation of sprinkler system uniformity, and other standard univariate parametric and non-parametric statistical measures as applied to sprinkler system catch depths. A sample data set of field test data spreval::catchcan (catch depths) is provided and is used in examples and vignettes. Agricultural systems are emphasized, but this package can be used for landscape irrigation evaluation, and a landscape (turf) vignette is included as an example application.
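The two core uniformity measures can be sketched from their standard definitions on hypothetical catch depths; the package's own functions (and the spreval::catchcan data) provide the full workflow:

```r
# Christiansen CU and low-quarter DU from invented catch depths.
depths <- c(10.2, 11.5, 9.8, 8.9, 12.1, 10.7, 9.5, 11.0)          # depths [mm]
cu <- 100 * (1 - sum(abs(depths - mean(depths))) / sum(depths))   # Christiansen CU
lq <- sort(depths)[seq_len(length(depths) %/% 4)]                 # low-quarter depths
du <- 100 * mean(lq) / mean(depths)                               # distribution uniformity
c(CU = cu, DU = du)
```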
Fits (excess) hazard, relative mortality ratio or marginal intensity models with multidimensional penalized splines allowing for time-dependent effects, non-linear effects and interactions between several continuous covariates. In survival and net survival analysis, in addition to modelling the effect of time (via the baseline hazard), one often has to deal with several continuous covariates and model their functional forms, their time-dependent effects, and their interactions. Model specification therefore becomes a complex problem, and penalized regression splines represent an appealing solution to that problem, as splines offer the required flexibility while penalization limits overfitting issues. Current implementations of penalized survival models can be slow or unstable and sometimes lack some key features, like taking into account expected mortality to provide net survival and excess hazard estimates. In contrast, survPen provides an automated, fast, and stable implementation (thanks to explicit calculation of the derivatives of the likelihood) and offers a unified framework for multidimensional penalized hazard and excess hazard models. Later versions (>2.0.0) include penalized models for the relative mortality ratio, and the marginal intensity in the recurrent event setting. survPen may be of interest to those who 1) analyse any kind of time-to-event data: mortality, disease relapse, machinery breakdown, unemployment, etc.; 2) wish to describe the associated hazard and to understand which predictors impact its dynamics; 3) wish to model the relative mortality ratio between a cohort and a reference population; 4) wish to describe the marginal intensity for recurrent event data. See Fauvernier et al. (2019a) <doi:10.21105/joss.01434> for an overview of the package and Fauvernier et al. (2019b) <doi:10.1111/rssc.12368> for the method.
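A minimal fit might look as follows, assuming the bundled datCancer example data and the smf() smooth constructor shown in the package documentation:

```r
# Penalized tensor-product smooth of follow-up time and age at diagnosis.
library(survPen)
data(datCancer)
mod <- survPen(~ smf(fu, age), data = datCancer, t1 = fu, event = dead)
summary(mod)
```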
The single cell mapper (scMappR) R package contains a suite of bioinformatic tools that provide experimentally relevant cell-type specific information for a list of differentially expressed genes (DEGs). The function "scMappR_and_pathway_analysis" re-ranks DEGs to generate cell-type specificity scores called cell-weighted fold-changes (cwFold-changes). Users input a list of DEGs, normalized counts, and a signature matrix into this function. scMappR then re-weights bulk DEGs by cell-type specific expression from the signature matrix, cell-type proportions from RNA-seq deconvolution, and the ratio of cell-type proportions between the two conditions to account for changes in cell-type proportion. With cwFold-changes calculated, scMappR uses two approaches to utilize cwFold-changes for cell-type specific pathway analysis. The "process_dgTMatrix_lists" function in the scMappR package contains an automated scRNA-seq processing pipeline where users input scRNA-seq count data, which is made compatible with scMappR and other R packages that analyze scRNA-seq data. We further used this pipeline to store hundreds of regularly updated signature matrices. The functions "tissue_by_celltype_enrichment", "tissue_scMappR_internal", and "tissue_scMappR_custom" combine these consistently processed scRNA-seq count data with gene-set enrichment tools to allow for cell-type marker enrichment of a generic gene list (e.g. GWAS hits). Reference: Sokolowski,D.J., Faykoo-Martinez,M., Erdman,L., Hou,H., Chan,C., Zhu,H., Holmes,M.M., Goldenberg,A. and Wilson,M.D. (2021) Single-cell mapper (scMappR): using scRNA-seq to infer cell-type specificities of differentially expressed genes. NAR Genomics and Bioinformatics. 3(1). lqab011. <doi:10.1093/nargab/lqab011>.
Prepares data for statistical analysis (e.g., analysis of variance; ANOVA) by enabling the user to easily and quickly merge (using the file_merge() function) raw data files into one merged table and then aggregate the merged table (using the prep() function) into a finalized table, while keeping track of and summarizing every step of the preparation. The finalized table contains several possibilities for dependent measures of the dependent variable. Most suitable when measuring variables on an interval or ratio scale (e.g., reaction times) and/or discrete values such as accuracy. Main functions included are file_merge() and prep(). The file_merge() function vertically merges individual data files (in a long format), in which each line is a single observation, into one single dataset. The prep() function aggregates the single dataset according to any combination of grouping variables (i.e., between-subjects and within-subjects independent variables, respectively), and returns a data frame with a number of dependent measures for further analysis for each cell according to the combination of provided grouping variables. Dependent measures for each cell include, among others, means before and after rejecting all values according to a flexible standard deviation criterion, the number of rejected values according to the flexible standard deviation criterion, proportions of rejected values according to the flexible standard deviation criterion, the number of values before rejection, means after rejecting values according to procedures described in Van Selst & Jolicoeur (1994; suitable when measuring reaction times), standard deviations, medians, means according to any percentile (e.g., 0.05, 0.25, 0.75, 0.95), and harmonic means. The data frame prep() returns can also be exported as a txt file to be used for statistical analysis in other statistical programs.
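The core idea of the flexible standard deviation criterion can be sketched in base R; this is an illustration of the computation, not prep()'s actual argument list:

```r
# Per-cell means before and after rejecting values beyond an SD criterion.
rt   <- c(420, 450, 480, 1500, 430, 460)   # one subject-by-condition cell [ms]
sdc  <- 2                                  # flexible SD rejection criterion
keep <- abs(rt - mean(rt)) <= sdc * sd(rt) # here the 1500 ms outlier is rejected
c(mean_raw      = mean(rt),
  mean_trimmed  = mean(rt[keep]),
  n_rejected    = sum(!keep),
  prop_rejected = mean(!keep))
```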
Calculate multiple biotic indices using diatoms from environmental samples. Diatom species are recognized by their species name using a heuristic search, and their ecological data is retrieved from multiple sources. The package includes the number/shape of chloroplasts, diversity indices, size classes, ecological guilds, and multiple biotic indices. It outputs both a data frame with all the results and plots of all the obtained data in a defined output folder. - Sample data was taken from Nicolosi Gelis, Cochero & Gómez (2020, <doi:10.1016/j.ecolind.2019.105951>). - The package uses the Diat.Barcode database for morphological and ecological information by Rimet & Couchez (2012, <doi:10.1051/kmae/2012018>), and the combined classification of guilds and size classes established by B-Béres et al. (2017, <doi:10.1016/j.ecolind.2017.07.007>). - Current diatom-based biotic indices include the DES index by Descy (1979) - EPID index by Dell'Uomo (1996, ISBN:3950009002) - IDAP index by Prygiel & Coste (1993, <doi:10.1007/BF00028033>) - ID-CH index by Hürlimann & Niederhauser (2007) - IDP index by Gómez & Licursi (2001, <doi:10.1023/A:1011415209445>) - ILM index by Leclercq & Maquet (1987) - IPS index by Coste (1982) - LOBO index by Lobo, Callegaro, & Bender (2002, ISBN:9788585869908) - SLA by Sládeček (1986, <doi:10.1002/aheh.19860140519>) - TDI index by Kelly & Whitton (1995, <doi:10.1007/BF00003802>) - SPEAR(herbicide) index by Wood, Mitrovic, Lim, Warne, Dunlop, & Kefford (2019, <doi:10.1016/j.ecolind.2018.12.035>) - PBIDW index by Castro-Roa & Pinilla-Agudelo (2014) - DISP index by Stenger-Kovács et al. (2018, <doi:10.1016/j.ecolind.2018.07.026>) - EDI index by Chamorro et al. (2024, <doi:10.1021/acsestwater.4c00126>) - DDI index by Álvarez-Blanco et al. (2013, <doi:10.1007/s10661-012-2607-z>) - PDISE index by Kahlert et al. (2023, <doi:10.1007/s10661-023-11378-4>).
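A hedged one-call usage sketch follows; the diaThorAll() entry point and the diat_sampleData object are names assumed from the package documentation, and the function computes the indices and writes results and plots to an output folder:

```r
# Run the full diatom-index pipeline on the bundled sample data (assumed names).
library(diathor)
data("diat_sampleData")
results <- diaThorAll(diat_sampleData)
head(results)
```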
Analysis of task-related functional magnetic resonance imaging (fMRI) activity at the level of individual participants is commonly based on general linear modelling (GLM), which allows us to estimate to what extent the blood oxygenation level dependent (BOLD) signal can be explained by the task response predictors specified in the GLM model. The predictors are constructed by convolving the hypothesised timecourse of neural activity with an assumed hemodynamic response function (HRF). To get valid and precise estimates of task response, it is important to construct a model of neural activity that best matches actual neuronal activity. The construction of models is most often driven by predefined assumptions on the components of brain activity and their duration based on the task design and the specific aims of the study. However, our assumptions about the onset and duration of component processes might be wrong and can also differ across brain regions. This can result in inappropriate or suboptimal models, poor fitting of the model to the actual data, and invalid estimations of brain activity. Here we present an approach in which theoretically driven models of task response are used to define constraints, based on which the final model is derived computationally using the actual data. Specifically, we developed autohrf, a package for the R programming language that allows for data-driven estimation of HRF models. The package uses genetic algorithms to efficiently search for models that fit the underlying data well. The package uses automated parameter search to find the onset and duration of task predictors which result in the highest fitness of the resulting GLM based on the fMRI signal under predefined restrictions. We evaluate the usefulness of the autohrf package on publicly available datasets of task-related fMRI activity. Our results suggest that by using autohrf users can find better task-related brain activity models in a quick and efficient manner.
Simulation methods for phylogenetic trees where (i) all tips are sampled at one time point or (ii) tips are sampled sequentially through time. (i) For sampling at one time point, simulations are performed under a constant rate birth-death process, conditioned on having a fixed number of final tips (sim.bd.taxa()), or a fixed age (sim.bd.age()), or a fixed age and number of tips (sim.bd.taxa.age()). When conditioning on the number of final tips, the method allows for shifts in rates and mass extinction events during the birth-death process (sim.rateshift.taxa()). The function sim.bd.age() (and sim.rateshift.taxa() without extinction) allows the speciation rate to change in a density-dependent way. The LTT plots of the simulations can be displayed using LTT.plot(), LTT.plot.gen() and LTT.average.root(). TreeSim further samples trees with n final tips from a set of trees generated by the common sampling algorithm stopping when a fixed number m>>n of tips is first reached (sim.gsa.taxa()). This latter method is appropriate for m-tip trees generated under a big class of models (details in the sim.gsa.taxa() man page). For incomplete phylogenies, the missing speciation events can be added through simulations (corsim()). (ii) sim.rateshift.taxa() is generalized to sim.bdsky.stt() for serially sampled trees, where the trees are conditioned on either the number of sampled tips or the age. Furthermore, for a multitype-branching process with sequential sampling, trees on a fixed number of tips can be simulated using sim.bdtypes.stt.taxa(). This function further allows simulating under epidemiological models with an exposed class. The function sim.genespeciestree() simulates coalescent gene trees within birth-death species trees, and sim.genetree() simulates coalescent gene trees.
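A small example of case (i), using the sim.bd.taxa() signature described in its man page: simulate ten constant-rate birth-death trees, each conditioned on 20 extant tips.

```r
# Ten complete birth-death trees with 20 tips each (rates are example values).
library(TreeSim)
set.seed(1)
trees <- sim.bd.taxa(n = 20, numbsim = 10, lambda = 2, mu = 0.5,
                     frac = 1, complete = TRUE)
plot(trees[[1]])   # trees[[1]] is an ape 'phylo' object
```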