NuPoP is an R package for Nucleosome Positioning Prediction. This package is built upon a duration hidden Markov model proposed in Xi et al. (2010) and Wang et al. (2008). The core of the package is written in Fortran. In addition to the R package, a stand-alone Fortran software tool is available at https://github.com/jipingw; the Fortran code has the same functionality as the R package. Note: NuPoP has two separate functions for predicting nucleosome positioning, one for models trained on MNase maps and the other for models trained on chemical maps. The latter is implemented for four species (yeast, S. pombe, mouse and human) and trained based on our recent publications. We noticed there is another package, nuCpos, developed by another group for predicting nucleosome positioning and trained on chemical maps. A report comparing recent versions of NuPoP with nuCpos can be found at https://github.com/jiping/NuPoP_doc. More information can be found, and will be posted, at https://github.com/jipingw/NuPoP.
The Satellite Application Facility on Climate Monitoring (CM SAF) is a ground segment of the European Organization for the Exploitation of Meteorological Satellites (EUMETSAT) and one of EUMETSAT's Satellite Application Facilities. The CM SAF contributes to the sustainable monitoring of the climate system by providing essential climate variables related to the energy and water cycle of the atmosphere (<https://www.cmsaf.eu>). It is a joint cooperation of eight National Meteorological and Hydrological Services. The cmsaf R package includes a Shiny-based interface for easy application of the cmsafops and cmsafvis packages: the CM SAF R Toolbox. The Toolbox offers an easy way to prepare, manipulate, analyse and visualize CM SAF NetCDF-formatted data. Other CF-conforming NetCDF data with time, longitude and latitude dimensions should be applicable, but there is no guarantee of error-free application. CM SAF climate data records are provided for free via <https://wui.cmsaf.eu/safira>. Detailed information and test data are provided on the CM SAF webpage (<http://www.cmsaf.eu/R_toolbox>).
Glycolipid mass spectrometry technology has the potential to accurately identify individual bacterial species from polymicrobial samples. To develop bacterial identification algorithms (e.g., machine learning) using this glycolipid technology, it is necessary to generate a large number of diverse in-silico polymicrobial mass spectra that resemble real mass spectra. MGMS2 (Membrane Glycolipid Mass Spectrum Simulator) generates such in-silico mass spectra, accounting for errors in m/z (mass-to-charge ratio), variances of intensity values, occasionally missing signature ions, and noise peaks. It estimates summary statistics of monomicrobial mass spectra for each strain or species and simulates polymicrobial glycolipid mass spectra from these summary statistics. References: Ryu, S.Y., Wendt, G.A., Chandler, C.E., Ernst, R.K. and Goodlett, D.R. (2019) <doi:10.1021/acs.analchem.9b03340> "Model-based Spectral Library Approach for Bacterial Identification via Membrane Glycolipids." Gibb, S. and Strimmer, K. (2012) <doi:10.1093/bioinformatics/bts447> "MALDIquant: a versatile R package for the analysis of mass spectrometry data."
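The perturbations listed above (m/z error, intensity variation, missing signature ions, noise peaks) can be illustrated with a minimal Python sketch. It is not MGMS2's algorithm or API: the function names, the Gaussian m/z error, the log-normal intensity variation, the uniform noise peaks and the abundance-weighted pooling of species are all assumptions made for illustration.

```python
import numpy as np


def simulate_spectrum(peaks, mz_sd=0.1, log_int_sd=0.2, p_missing=0.1,
                      n_noise=5, noise_level=0.05, mz_range=(1000.0, 2500.0),
                      rng=None):
    """Hypothetical sketch: perturb (m/z, intensity) peaks into one spectrum."""
    rng = np.random.default_rng() if rng is None else rng
    peaks = np.asarray(peaks, dtype=float).reshape(-1, 2)
    # Each signature ion is dropped independently with probability p_missing.
    kept = peaks[rng.random(len(peaks)) >= p_missing]
    # Gaussian error on m/z; multiplicative log-normal variation on intensity.
    mz = kept[:, 0] + rng.normal(0.0, mz_sd, len(kept))
    intensity = kept[:, 1] * np.exp(rng.normal(0.0, log_int_sd, len(kept)))
    # Noise peaks at uniformly random m/z positions with small intensities.
    noise = np.column_stack([rng.uniform(mz_range[0], mz_range[1], n_noise),
                             rng.uniform(0.0, noise_level, n_noise)])
    spectrum = np.vstack([np.column_stack([mz, intensity]), noise])
    # Return peaks ordered by m/z.
    return spectrum[np.argsort(spectrum[:, 0])]


def simulate_mixture(species_peaks, weights, **kwargs):
    """Pool each species' peaks, scaled by its assumed abundance, then perturb."""
    scaled = [np.asarray(p, dtype=float).reshape(-1, 2) * [1.0, w]
              for p, w in zip(species_peaks, weights)]
    return simulate_spectrum(np.vstack(scaled), **kwargs)
```

Setting all error parameters to zero returns the input peaks unchanged, which is a convenient sanity check before turning the perturbations on.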
Transfers/imputes statistics among Spanish spatial polygons (census sections or postal code areas) from different moments in time (2001-2023) without the need for spatial files, simply by linking statistics to the ID codes of the spatial units. The data available for the census sections of the partition/division (cartography) in force at one moment in time are transferred to the census sections of another partition/division using the geometric approach (also known as areal weighting or polygon overlay). References: Goerlich (2022) <doi:10.12842/WPIVIE_0322>. Pavía and Cantarino (2017a, b) <doi:10.1111/gean.12112>, <doi:10.1016/j.apgeog.2017.06.021>. Pérez and Pavía (2024a, b) <doi:10.4995/CARMA2024.2024.17796>, <doi:10.38191/iirr-jorr.24.057>. Acknowledgements: The authors wish to thank Consellería de Educación, Cultura, Universidades y Empleo, Generalitat Valenciana (grant CIACIO/2023/031), Consellería de Educación, Universidades y Empleo, Generalitat Valenciana (grant AICO/2021/257), Ministerio de Economía e Innovación (grant PID2021-128228NB-I00) and Fundación Mapfre for supporting this research.
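The geometric (areal weighting) approach assumes that a count is spread uniformly over the area of its source section, so each target section receives a source value in proportion to the share of the source area it overlaps. A minimal Python sketch, assuming the overlap shares are already available as a lookup table keyed by section IDs; the IDs and shares below are made up and this is not the package's interface.

```python
from collections import defaultdict


def areal_weighting(source_values, overlaps):
    """Transfer counts from source to target sections by overlapping area share.

    source_values: {source_id: count}
    overlaps: iterable of (source_id, target_id, share), where share is the
        fraction of the source section's area lying inside the target section.
    """
    target = defaultdict(float)
    for src, tgt, share in overlaps:
        # Uniform-density assumption: the count moves with the area.
        target[tgt] += share * source_values[src]
    return dict(target)


# Hypothetical example: section "A" is split 60/40, "B" falls entirely in "T2".
transferred = areal_weighting(
    {"A": 100.0, "B": 50.0},
    [("A", "T1", 0.6), ("A", "T2", 0.4), ("B", "T2", 1.0)],
)
```

When the shares of every source section sum to one, the total count is preserved across the two partitions.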
The latest guidelines proposed by an International Expert Consensus are used for the clinical diagnosis of Metabolic Associated Fatty Liver Disease (MAFLD). The new definition takes hepatic steatosis (determined by elastography, histology, or the biomarker-based fatty liver index) as a major criterion. In addition, race, gender, body mass index (BMI), waist circumference (WC), fasting plasma glucose (FPG), systolic blood pressure (SBP), diastolic blood pressure (DBP), triglycerides (TG), high-density lipoprotein cholesterol (HDLC), homeostatic model assessment of insulin resistance (HOMAIR) and high-sensitivity C-reactive protein (HsCRP) are used for the diagnosis of MAFLD. Each parameter has to be interpreted against the proposed cut-offs, which makes the diagnosis somewhat complex and error-prone. This package incorporates the latest international expert consensus guidelines and aids the easy and quick diagnosis of MAFLD based on FibroScan in busy healthcare settings, as well as for research purposes. The new definition of MAFLD according to the International Consensus Statement is described by Eslam M et al. (2020) <doi:10.1016/j.jhep.2020.03.039>.
A fast and consistent tool for arranging microdata and visualizing official figures and statistics from the National University of Colombia <https://unal.edu.co>. It includes a library of graphical functions, both static and interactive, offering numerous types of charts with a highly configurable and simple syntax. Among these are HTML tables, series, bar and pie charts, maps, etc., all supported by JavaScript libraries. It provides the capability to transition from the interactive to the dynamic world and from one library to another without changing function or syntax.
Covariate measurement error correction is implemented by means of regression calibration (Carroll RJ, Ruppert D, Stefanski LA & Crainiceanu CM (2006, ISBN:1584886331)), efficient regression calibration (Spiegelman D, Carroll RJ & Kipnis V (2001) <doi:10.1002/1097-0258(20010115)20:1%3C139::AID-SIM644%3E3.0.CO;2-K>) and maximum likelihood estimation (Bartlett JW, De Stavola BL & Frost C (2009) <doi:10.1002/sim.3713>). Outcome measurement error correction is implemented by means of the method of moments (Buonaccorsi JP (2010, ISBN:1420066560)) and the efficient method of moments (Keogh RH, Carroll RJ, Tooze JA, Kirkpatrick SI & Freedman LS (2014) <doi:10.1002/sim.7011>). Standard errors of the corrected estimators are estimated by means of the delta method (Rosner B, Spiegelman D & Willett WC (1990) <doi:10.1093/oxfordjournals.aje.a115715> and Rosner B, Spiegelman D & Willett WC (1992) <doi:10.1093/oxfordjournals.aje.a116453>), the Fieller method described by Buonaccorsi JP (2010, ISBN:1420066560), and the bootstrap (Carroll RJ, Ruppert D, Stefanski LA & Crainiceanu CM (2006, ISBN:1584886331)).
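To show what regression calibration corrects, here is a minimal Python sketch for the simplest case only: one error-prone covariate with classical, non-differential error, a linear outcome model, and one replicate measurement. It is not this package's interface. Under these assumptions the naive slope is attenuated by the reliability ratio lambda = var(X) / var(W), which can be estimated as the slope of the replicate W2 on W1; dividing by it gives the calibrated slope.

```python
import numpy as np


def ols_slope(x, y):
    """Slope of the simple linear regression of y on x."""
    x_c = x - x.mean()
    return float(np.dot(x_c, y - y.mean()) / np.dot(x_c, x_c))


def regression_calibration_slope(y, w1, w2):
    """Correct the naive slope of y on w1 using a replicate measurement w2.

    Assumes w_k = x + u_k with independent errors u_k of equal variance,
    so cov(w1, w2) / var(w1) estimates var(x) / var(w1).
    """
    naive = ols_slope(w1, y)
    reliability = ols_slope(w1, w2)
    return naive / reliability, naive, reliability
```

With a true slope of 2 and error variance equal to the covariate variance, the naive slope should come out near 1 and the corrected one near 2. Standard errors would still need the delta method or the bootstrap, as described above.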
Fitting multivariate covariance generalized linear models (McGLMs) to data. McGLM is a general framework for non-normal multivariate data analysis, designed to handle multivariate response variables along with a wide range of temporal and spatial correlation structures, defined in terms of a covariance link function combined with a matrix linear predictor involving known matrices. The models take non-normality into account in the conventional way by means of a variance function, and the mean structure is modelled by means of a link function and a linear predictor. The models are fitted using an efficient Newton scoring algorithm based on quasi-likelihood and Pearson estimating functions, using only second-moment assumptions. This provides a unified approach to a wide variety of response variable types and covariance structures, including multivariate extensions of repeated measures, time series, longitudinal, spatial and spatio-temporal structures. The package offers a user-friendly interface for fitting McGLMs, similar to R's glm() function. See Bonat (2018) <doi:10.18637/jss.v084.i04> for more information and examples.
This package implements techniques to estimate the unknown quantities of two-component admixture models, where the two components can belong to any distribution (note that in the case of multinomial mixtures, the two components must belong to the same family). Estimation methods depend on the assumptions made about the unknown component density; see Bordes and Vandekerkhove (2010) <doi:10.3103/S1066530710010023>, Patra and Sen (2016) <doi:10.1111/rssb.12148>, and Milhaud, Pommeret, Salhi and Vandekerkhove (2024) <doi:10.3150/23-BEJ1593>. In practice, one can estimate both the mixture weight and the unknown component density in a wide variety of frameworks. In addition, hypothesis tests can be performed in one- and two-sample settings to test the unknown component density (see Milhaud, Pommeret, Salhi and Vandekerkhove (2022) <doi:10.1016/j.jspi.2021.05.010>, and Milhaud, Pommeret, Salhi and Vandekerkhove (2024) <doi:10.3150/23-BEJ1593>). Finally, clustering of unknown mixture components is also feasible in a K-sample setting (see Milhaud, Pommeret, Salhi and Vandekerkhove (2024) <https://jmlr.org/papers/v25/23-0914.html>).
This package performs the analysis of completely randomized designs (CRD), randomized block designs (RBD) and Latin square designs (LSD), experiments in double and triple factorial schemes (in CRD and RBD), experiments in split-plot (subdivided plot) schemes (in CRD and RBD), and joint analysis of experiments in CRD and RBD, as well as linear regression analysis and tests for two samples. The package performs analysis of variance, checks of ANOVA assumptions, and multiple comparison tests of means or regression, according to Pimentel-Gomes (2009, ISBN: 978-85-7133-055-9), nonparametric tests (Conover, 1999, ISBN: 0471160687), tests for two samples, joint analysis of experiments according to Ferreira (2018, ISBN: 978-85-7269-566-4) and generalized linear models (glm) for the binomial and Poisson families in CRD and RBD (Carvalho, FJ (2019), <doi:10.14393/ufu.te.2019.1244>). It can also be used to obtain descriptive measures and graphics, in addition to correlations and creative graphics used in agricultural sciences (Agronomy, Zootechnics, Food Science and related areas). See Shimizu, G. D., Marubayashi, R. Y. P., Goncalves, L. S. A. (2025) <doi:10.4025/actasciagron.v47i1.73889>.
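As a reminder of what the analysis of variance for a completely randomized design computes, here is a minimal Python sketch of the textbook one-way F statistic. It is not this package's function; the p-value would come from the upper tail of the F distribution with (k - 1, n - k) degrees of freedom.

```python
def crd_anova(groups):
    """One-way ANOVA F statistic for a completely randomized design.

    groups: list of lists, one list of responses per treatment.
    Returns (F statistic, treatment df, residual df).
    """
    all_obs = [y for g in groups for y in g]
    n, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    # Treatment (between-group) sum of squares.
    ss_treat = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Residual (within-group) sum of squares.
    ss_resid = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    df_treat, df_resid = k - 1, n - k
    f_stat = (ss_treat / df_treat) / (ss_resid / df_resid)
    return f_stat, df_treat, df_resid
```

For three treatments with responses [1, 2, 3], [4, 5, 6] and [7, 8, 9], the treatment mean square is 27 and the residual mean square is 1, giving F = 27 on (2, 6) degrees of freedom.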
The original idea was presented in Varghese et al. (2020, 74(1):35-42), "Bayesian State-space Implementation of Schaefer Production Model for Assessment of Stock Status for Multi-gear Fishery". Marine fisheries governance and management practices are essential to ensure the sustainability of marine resources. A widely accepted management strategy towards this goal is to derive sustainable fish harvest levels based on the status of the marine fish stock. Various fish stock assessment models that describe biomass dynamics using time series data on fish catch and fishing effort are generally used for this purpose. In a complex multi-species marine fishery, where different species are caught by a number of fishing gears and each gear harvests several species, it is difficult to obtain the fishing effort corresponding to each fish species. Since the capacity of the gears varies, the effort made to catch a resource cannot be taken as the simple sum of the efforts expended by the different fishing gears. This necessitates standardising fishing effort to a common unit.
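The biomass dynamics behind the Schaefer production model can be sketched in a few lines of Python. This is only the deterministic process equation, B[t+1] = B[t] + r B[t] (1 - B[t]/K) - C[t], with maximum sustainable yield rK/4 reached at B = K/2. A Bayesian state-space version typically adds process and observation error and a catchability coefficient linking standardised CPUE to biomass; none of that, nor the package's interface, is shown here, and the floor at zero biomass is a choice made for this sketch.

```python
def schaefer_biomass(b0, r, k, catches):
    """Project biomass with the discrete Schaefer surplus-production model."""
    biomass = [b0]
    for c in catches:
        b = biomass[-1]
        # Logistic surplus production minus the catch, floored at zero.
        biomass.append(max(b + r * b * (1.0 - b / k) - c, 0.0))
    return biomass


def msy(r, k):
    """Maximum sustainable yield of the Schaefer model, attained at B = K/2."""
    return r * k / 4.0
```

For example, a stock at carrying capacity K = 1000 with r = 0.5 drops to 900 after a catch of 100 and then recovers by 45 units in the next catch-free year.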
An efficient unified nonconvex penalized estimation algorithm for Gaussian (linear), binomial logit (logistic), Poisson, multinomial logit, and Cox proportional hazards regression models. The unified algorithm is based on the convex-concave procedure and can be applied to most existing nonconvex penalties. The algorithm also supports a convex penalty, the least absolute shrinkage and selection operator (LASSO). Supported nonconvex penalties include smoothly clipped absolute deviation (SCAD), minimax concave penalty (MCP), truncated LASSO penalty (TLP), clipped LASSO (CLASSO), sparse ridge (SRIDGE), modified bridge (MBRIDGE) and modified log (MLOG). For high-dimensional data (data sets with many variables), the algorithm selects relevant variables, producing a parsimonious regression model. See Kim, D., Lee, S. and Kwon, S. (2018) <arXiv:1811.05061>, Lee, S., Kwon, S. and Kim, Y. (2016) <doi:10.1016/j.csda.2015.08.019>, and Kwon, S., Lee, S. and Kim, Y. (2015) <doi:10.1016/j.csda.2015.07.001>. (This research is funded by the Julian Virtue Professorship from the Center for Applied Research at Pepperdine Graziadio Business School and the National Research Foundation of Korea.)
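For reference, a minimal Python sketch of the textbook SCAD (with a = 3.7) and MCP penalties evaluated at a single coefficient. The package's own parametrization and defaults may differ. The helper `concave_part` (a name chosen for this sketch) illustrates a common way to set up the convex-concave procedure: the penalty is written as the convex LASSO term lambda*|t| plus a concave remainder, and the remainder is linearized at the current estimate in each iteration.

```python
def scad(t, lam, a=3.7):
    """Smoothly clipped absolute deviation penalty at coefficient t."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    return lam ** 2 * (a + 1) / 2


def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty at coefficient t."""
    t = abs(t)
    if t <= gamma * lam:
        return lam * t - t ** 2 / (2 * gamma)
    return gamma * lam ** 2 / 2


def concave_part(penalty, t, lam, **kw):
    """Remainder in the split penalty(t) = lam * |t| + concave_part(t)."""
    return penalty(t, lam, **kw) - lam * abs(t)
```

Both penalties behave like the LASSO near zero and become flat for large coefficients, which is what reduces the bias on large effects.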
An implementation of a single-index regression for optimizing individualized dose rules from an observational study. To model interaction effects between baseline covariates and a treatment variable defined on a continuum, we employ two-dimensional penalized spline regression on an index-treatment domain, where the index is defined as a linear combination of the covariates (a single-index). An unspecified main effect for the covariates is allowed, which can also be modeled through a parametric model. A unique contribution of this work is the parsimonious single-index parametrization specifically defined for the interaction effect term. We refer to Park, Petkova, Tarpey, and Ogden (2020) <doi:10.1111/biom.13320> (for the case of a discrete treatment) and Park, Petkova, Tarpey, and Ogden (2021) "A single-index model with a surface-link for optimizing individualized dose rules" <arXiv:2006.00267v2> for details of the method. The model can take a member of the exponential family as a response variable and can also take an ordinal categorical response. The main function of this package is simsl().
Constructs genotype x environment interaction (GxE) models where G is a weighted sum of genetic variants (genetic score) and E is a weighted sum of environments (environmental score) using the alternating optimization algorithm by Jolicoeur-Martineau et al. (2017) <arXiv:1703.08111>. This approach has greatly enhanced predictive power over traditional GxE models, which include only a single genetic variant and a single environmental exposure. Although this approach was originally designed for GxE modelling, it is flexible and does not require the use of genetic and environmental variables. It can also handle more than 2 latent variables (rather than just G and E) and interactions of 3 or more ways. The LEGIT model produces highly interpretable results and is very parameter-efficient, so it can be used even with small sample sizes (n < 250). Tools to determine the type of interaction (vantage sensitivity, diathesis-stress or differential susceptibility), with any number of genetic variants or environments, are available <arXiv:1712.04058>. The software can now produce mixed-effects LEGIT models through the lme4 package.
Calculate unified measures that quantify the effect of a covariate on a binary dependent variable (e.g., for meta-analyses). This can be particularly important if the estimation results are obtained with different models/estimators (e.g., linear probability model, logit, probit, ...) and/or with different transformations of the explanatory variable of interest (e.g., linear, quadratic, interval-coded, ...). The calculated unified measures are: (a) semi-elasticities of linear, quadratic, or interval-coded covariates and (b) effects of linear, quadratic, interval-coded, or categorical covariates when a linear or quadratic covariate changes between distinct intervals, the reference category of a categorical variable or the reference interval of an interval-coded variable needs to be changed, or some categories of a categorical covariate or some intervals of an interval-coded covariate need to be grouped together. Approximate standard errors of the unified measures are also calculated. All methods that are implemented in this package are described in the vignette "Extracting and Unifying Semi-Elasticities and Effect Sizes from Studies with Binary Dependent Variables" that is included in this package.
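As an illustration of what a semi-elasticity measures, here is a minimal Python sketch for a covariate that enters the model linearly, assuming the semi-elasticity is defined as x * dP/dx, i.e. the change in the probability (in percentage points) associated with a 1% increase in x. It covers the linear probability, logit and probit models only; quadratic and interval-coded covariates and the standard errors handled by the package are omitted, and the function is not the package's API. The argument `intercept` is a stand-in for the linear predictor of all other terms.

```python
import math


def semi_elasticity(beta, x, model="lpm", intercept=0.0):
    """Semi-elasticity x * dP/dx for a covariate entering linearly."""
    z = intercept + beta * x
    if model == "lpm":
        dp_dz = 1.0
    elif model == "logit":
        p = 1.0 / (1.0 + math.exp(-z))
        dp_dz = p * (1.0 - p)
    elif model == "probit":
        # Standard normal density at the linear predictor.
        dp_dz = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    else:
        raise ValueError(f"unknown model: {model}")
    return beta * x * dp_dz
```

In the linear probability model this reduces to beta * x, which is why estimates from different link functions only become comparable after such a transformation.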
A major challenge in estimating treatment decision rules from a randomized clinical trial dataset with covariates measured at baseline lies in detecting relatively small treatment effect modification-related variability (i.e., the treatment-by-covariates interaction effects on treatment outcomes) against a relatively large non-treatment-related variability (i.e., the main effects of covariates on treatment outcomes). The class of Single-Index Models with Multiple-Links is a novel single-index model specifically designed to estimate a single-index (a linear combination) of the covariates associated with the treatment effect modification-related variability, while allowing a nonlinear association with the treatment outcomes via flexible link functions. The models provide a flexible regression approach to developing treatment decision rules based on patients' data measured at baseline. We refer to Park, Petkova, Tarpey, and Ogden (2020) <doi:10.1016/j.jspi.2019.05.008> and Park, Petkova, Tarpey, and Ogden (2020) <doi:10.1111/biom.13320> (which allows an unspecified X main effect) for details of the method. The main function of this package is simml().
In some situations where researchers would like to demonstrate causal effects, it is hard to obtain a sample size that would allow for a well-powered randomized controlled trial. Single case designs are experimental designs that can be used to demonstrate causal effects with only one participant or with only a few participants. The scdtb package provides a suite of tools for analyzing data from studies that use single case designs. The nap() function can be used to compute the nonoverlap of all pairs as outlined by the What Works Clearinghouse (2022) <https://ies.ed.gov/ncee/wwc/Handbooks>. The package also offers the mixed_model_analysis() and cross_lagged() functions, which implement mixed effects models and cross-lagged analyses as described in Maric & van der Werff (2020) <doi:10.4324/9780429273872-9>. The randomization_test() function implements randomization tests based on methods presented in Onghena (2020) <doi:10.4324/9780429273872-8>. The scdtb() Shiny application can be used to upload single case design data and access various scdtb tools for plotting and analysis.
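The nonoverlap of all pairs statistic compares every baseline observation with every treatment observation and reports the share of pairs that show improvement, counting ties as half. A minimal Python sketch of that definition follows; it is not scdtb's nap(), whose handling of direction and ties may differ, and the argument `increase_is_improvement` is a name chosen for this sketch.

```python
def nap(baseline, treatment, increase_is_improvement=True):
    """Nonoverlap of All Pairs: share of (baseline, treatment) pairs improving."""
    score = 0.0
    for a in baseline:
        for b in treatment:
            diff = b - a if increase_is_improvement else a - b
            if diff > 0:
                score += 1.0  # treatment point improves on baseline point
            elif diff == 0:
                score += 0.5  # ties count as half an improvement
    return score / (len(baseline) * len(treatment))
```

With baseline [1, 2, 3] and treatment [3, 4, 5], 8 of the 9 pairs improve and one is tied, so NAP = 8.5 / 9, about 0.94.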
Third-order response surface designs (M. Hemavathi, Shashi Shekhar, Eldho Varghese, Seema Jaggi, Bikas Sinha & Nripes Kumar Mandal (2022) "Theoretical developments in response surface designs: an informative review and further thoughts" <DOI:10.1080/03610926.2021.1944213>) are classified into two types, viz. designs suitable for sequential experimentation and designs for non-sequential experimentation (M. Hemavathi, Eldho Varghese, Shashi Shekhar & Seema Jaggi (2022) "Sequential asymmetric third order rotatable designs (SATORDs)" <DOI:10.1080/02664763.2020.1864817>). The sequential experimentation approach involves conducting the trials step by step, whereas in the non-sequential approach all runs are executed in one go. This package contains functions named STORDs() and NSTORDs() for generating sequential/non-sequential third-order rotatable designs (TORDs), as given in Das, M. N. and V. L. Narasimham (1962) "Construction of rotatable designs through balanced incomplete block designs" <DOI:10.1214/aoms/1177704374>, along with the randomized layout. It also contains a function named Pred3.var() for generating the variance of the predicted response as well as the moment matrix based on a third-order response surface model.
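The quantities that Pred3.var() reports can be illustrated with the generic formulas for a full third-order polynomial: with model matrix X over N design points and f(x) the vector of monomials up to degree 3, the moment matrix is X'X / N and the prediction variance is Var(y_hat(x)) / sigma^2 = f(x)' (X'X)^{-1} f(x). The Python sketch below is not Pred3.var(); the 4 x 4 full factorial used in the usage note is only a full-rank demonstration design, not a rotatable design constructed by STORDs() or NSTORDs().

```python
import itertools

import numpy as np


def third_order_terms(x):
    """Expand one design point into all monomials of total degree <= 3."""
    x = np.asarray(x, dtype=float)
    terms = [1.0]
    for degree in (1, 2, 3):
        for idx in itertools.combinations_with_replacement(range(len(x)), degree):
            terms.append(float(np.prod(x[list(idx)])))
    return np.array(terms)


def prediction_variance(design, point):
    """Return (moment matrix X'X/N, Var(y_hat(point)) / sigma^2)."""
    X = np.array([third_order_terms(d) for d in design])
    moment = X.T @ X / X.shape[0]
    f = third_order_terms(point)
    var = float(f @ np.linalg.solve(X.T @ X, f))
    return moment, var


levels = [-1.5, -0.5, 0.5, 1.5]
design = list(itertools.product(levels, levels))  # demo design, 16 runs
moment, var = prediction_variance(design, (0.3, -0.7))
```

With two factors the full third-order model has 10 terms, so the moment matrix is 10 x 10. For a rotatable design the prediction variance would depend only on the distance of the point from the design centre.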
Allows users to create and deploy workflows composed of multiple functions on Function-as-a-Service (FaaS) cloud computing platforms. The FaaSr package makes it simpler for R developers to use FaaS platforms by providing the following functionality: 1) parsing and validating a JSON-based payload compliant with the FaaSr schema, supporting multiple FaaS platforms; 2) invoking user functions written in R in a Docker container (derived from rocker), using a list generated by the parser as the argument; 3) downloading/uploading files from/to S3 buckets using simple primitives; 4) logging to files in S3 buckets; 5) triggering downstream actions, supporting multiple FaaS platforms; 6) generating FaaS-specific API calls to simplify registering a user's workflow with a FaaS platform. Supported FaaS platforms: Apache OpenWhisk <https://openwhisk.apache.org/>, GitHub Actions <https://github.com/features/actions>, and Amazon Web Services (AWS) Lambda <https://aws.amazon.com/lambda/>. Supported cloud data storage for persistent storage: Amazon Web Services (AWS) Simple Storage Service (S3) <https://aws.amazon.com/s3/>.
Conduct multi-locus genome-wide association studies under the framework of the multi-locus random-SNP-effect mixed linear model (mrMLM). First, each marker on the genome is scanned, with Bonferroni correction replaced by a less stringent selection criterion for the significance test. Then, all markers potentially associated with the trait are included in a multi-locus genetic model, their effects are estimated by empirical Bayes, and all nonzero effects are further tested by a likelihood ratio test to identify significant QTL. The program can run on desktop or laptop computers. The methods in this software are very effective if almost all marker genotypes in the association mapping population are homozygous; if there are many heterozygous marker genotypes, the IIIVmrMLM software is recommended. Wen YJ, Zhang H, Ni YL, Huang B, Zhang J, Feng JY, Wang SB, Dunwell JM, Zhang YM, Wu R (2018, <doi:10.1093/bib/bbw145>), and Li M, Zhang YW, Zhang ZC, Xiang Y, Liu MH, Zhou YH, Zuo JF, Zhang HQ, Chen Y, Zhang YM (2022, <doi:10.1016/j.molp.2022.02.012>).
This package provides a scalable and fast method for estimating joint Species Distribution Models (jSDMs) for big community data, including eDNA data. The package estimates a full (i.e., non-latent) jSDM with different response distributions (including the traditional multivariate probit model). The package allows users to perform variation partitioning (VP) / ANOVA on the fitted models to separate the contributions of environmental, spatial, and biotic associations. In addition, the total R-squared can be further partitioned per species and site to reveal the internal metacommunity structure; see Leibold et al. <doi:10.1111/oik.08618>. The internal structure can then be regressed against environmental and spatial distinctiveness, richness, and traits to analyze metacommunity assembly processes. The package includes support for accounting for spatial autocorrelation and the option to fit responses using deep neural networks instead of a standard linear predictor. As described in Pichler & Hartig (2021) <doi:10.1111/2041-210X.13687>, scalability is achieved by using a Monte Carlo approximation of the joint likelihood implemented via PyTorch and reticulate, which can be run on CPUs or GPUs.
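To make the idea of a Monte Carlo approximation of the joint likelihood concrete, here is a minimal Python sketch for one site under a multivariate probit model: latent values z ~ N(eta, Sigma) are sampled and the probability of the observed presence/absence vector is estimated as the fraction of samples whose signs match it. This crude frequency estimator is an assumption chosen for clarity and is not the package's implementation; in particular, gradient-based fitting needs a smooth surrogate for the indicator used here.

```python
import numpy as np


def mvp_site_likelihood(y, eta, sigma, n_samples=100_000, rng=None):
    """Crude Monte Carlo estimate of P(Y = y) under a multivariate probit.

    Species j is present (y_j = 1) when its latent value z_j is positive.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.multivariate_normal(eta, sigma, size=n_samples)
    # A sample matches when every species' sign agrees with the observation.
    match = np.all((z > 0) == (np.asarray(y) == 1), axis=1)
    return float(match.mean())
```

Off-diagonal entries of Sigma encode the residual (biotic) associations between species; with Sigma equal to the identity, the estimate approaches the product of independent probit probabilities.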
Whole genome single-cell DNA sequencing (scDNA-seq) enables characterization of copy number profiles at the cellular level. This circumvents the averaging effects associated with bulk-tissue sequencing and has increased resolution yet decreased ambiguity in deconvolving cancer subclones and elucidating cancer evolutionary history. ScDNA-seq data is, however, sparse, noisy, and highly variable even within a homogeneous cell population, due to the biases and artifacts that are introduced during the library preparation and sequencing procedure. Here, we propose SCOPE, a normalization and copy number estimation method for scDNA-seq data. The distinguishing features of SCOPE include: (i) utilization of cell-specific Gini coefficients for quality control and for identification of normal/diploid cells, which are further used as negative control samples in a Poisson latent factor model for normalization; (ii) modeling of GC content bias using an expectation-maximization algorithm embedded in the Poisson generalized linear models, which accounts for the different copy number states along the genome; (iii) a cross-sample iterative segmentation procedure to identify breakpoints that are shared across cells from the same genetic background.
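The cell-specific Gini coefficient mentioned in (i) summarizes how unevenly a cell's reads are spread across genomic bins: 0 means perfectly even coverage, and values near 1 mean the reads are concentrated in a few bins. A minimal Python sketch of the standard formula follows; how SCOPE bins reads and which thresholds it applies are not shown.

```python
import numpy as np


def gini(counts):
    """Gini coefficient of a cell's binned read counts (0 = perfectly even)."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    # Closed form of sum_i sum_j |x_i - x_j| / (2 * n^2 * mean) for sorted x.
    ranks = np.arange(1, n + 1)
    return float(np.sum((2 * ranks - n - 1) * x) / (n * x.sum()))
```

For example, counts [1, 1, 1, 1] give 0, while [0, 0, 0, 1] give 0.75.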
The Bayesian Markov renewal mixed models take sequentially observed categorical data with continuous duration times, which can be either state durations or inter-state durations. These models comprehensively analyze the stochastic dynamics of both state transitions and duration times under the influence of multiple exogenous factors and random individual effects. The default setting flexibly models the transition probabilities using Dirichlet mixtures and the duration times using gamma mixtures. It also provides the flexibility of modeling the categorical sequences using Bayesian Markov mixed models alone, either ignoring the duration times altogether or discretizing them into multiples of a user-specified unit that enter the sequence as an additional category. The package allows extensive inference on the state transition probabilities and the duration times, as well as relevant plots and graphs. It also includes a synthetic data set to demonstrate the required format of the input data set and the utility of various functions. Methods for Bayesian Markov renewal mixed models are described in: Abhra Sarkar et al. (2018) <doi:10.1080/01621459.2018.1423986> and Yutong Wu et al. (2022) <doi:10.1093/biostatistics/kxac050>.
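The role of the Dirichlet prior on transition probabilities can be seen in the simplest conjugate case: a first-order Markov chain with an independent Dirichlet(alpha, ..., alpha) prior on each row, for which the posterior mean is (n_ij + alpha) / (n_i + K * alpha), where n_ij counts transitions from state i to state j. The Python sketch below is that textbook calculation only; it has no covariates, random effects, mixtures or duration model, and is not the package's estimator.

```python
import numpy as np


def posterior_mean_transitions(sequences, n_states, alpha=1.0):
    """Posterior mean transition matrix under independent Dirichlet row priors."""
    counts = np.zeros((n_states, n_states))
    for seq in sequences:
        # Count observed transitions between consecutive states.
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    return (counts + alpha) / (counts.sum(axis=1, keepdims=True) + n_states * alpha)
```

For the single sequence [0, 1, 0, 1, 1] with two states and alpha = 1, the posterior mean rows are [0.25, 0.75] and [0.5, 0.5].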
We solve nonlinear least squares problems with optional equality and/or inequality constraints. Nonlinear iterations are globalized with a backtracking method. Linear problems are solved by dense QR decomposition from LAPACK, which can limit the size of the problems treated. On the other hand, we avoid the condition number degradation that happens in the classical quadratic programming approach. Inequality constraints are treated at each nonlinear iteration with the NNLS method (by Lawson and Hanson). We provide an original function lsi_ln for solving linear least squares problems with inequality constraints in the least-norm sense. Thus, if the Jacobian of the problem is rank deficient, a solution can still be provided; however, truncation errors are probable in this case. Equality constraints are treated by using a basis of the null space. The user-defined function calculating residuals must return a list containing the residual vector (not its squared sum) and the Jacobian. If the Jacobian is not in the returned list, the numDeriv package is used to calculate a finite-difference version of the Jacobian. The NLSIC method was first published in Sokol et al. (2012) <doi:10.1093/bioinformatics/btr716>.
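The unconstrained core of such a method, Gauss-Newton steps computed from a dense QR factorization and globalized by backtracking, can be sketched in Python as below. This is a simplified illustration, not NLSIC: the NNLS-based treatment of inequality constraints, the null-space treatment of equality constraints and the least-norm handling of rank-deficient Jacobians are omitted, and the dictionary keys "res" and "jac" are hypothetical stand-ins for the list returned by the user's residual function.

```python
import numpy as np


def gauss_newton(fun, x0, max_iter=200, tol=1e-10, c=1e-4):
    """Minimize 0.5 * ||r(x)||^2 by Gauss-Newton steps with backtracking.

    fun(x) returns a dict with the residual vector under "res" and its
    Jacobian under "jac" (hypothetical keys chosen for this sketch).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        out = fun(x)
        r = np.asarray(out["res"], dtype=float)
        jac = np.asarray(out["jac"], dtype=float)
        # Linearized problem min ||jac @ d + r|| solved by dense QR
        # (assumes the Jacobian has full column rank).
        q, rmat = np.linalg.qr(jac)
        qtr = q.T @ r
        d = np.linalg.solve(rmat, -qtr)
        if np.linalg.norm(d) < tol:
            break
        f0 = 0.5 * (r @ r)
        slope = -(qtr @ qtr)  # directional derivative of f along d (negative)
        # Backtracking: halve the step until the Armijo sufficient-decrease test holds.
        alpha = 1.0
        while alpha > 1e-12:
            r_new = np.asarray(fun(x + alpha * d)["res"], dtype=float)
            if 0.5 * (r_new @ r_new) <= f0 + c * alpha * slope:
                break
            alpha /= 2.0
        x = x + alpha * d
    return x
```

For example, with residuals r(x) = (10 (x2 - x1^2), 1 - x1) and starting point (-1.2, 1), full Gauss-Newton steps initially increase the residual norm, so the backtracking shortens them; the iterations should then approach the solution (1, 1).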