This package implements the synthetic control group method for comparative case studies as described in Abadie and Gardeazabal (2003) and Abadie, Diamond, and Hainmueller (2010, 2011, 2014). The synthetic control method allows for effect estimation in settings where a single unit (a state, country, firm, etc.) is exposed to an event or intervention. It provides a data-driven procedure to construct synthetic control units based on a weighted combination of comparison units that approximates the characteristics of the unit that is exposed to the intervention. A combination of comparison units often provides a better comparison for the unit exposed to the intervention than any comparison unit alone.
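The core construction can be illustrated in a few lines of plain R. This is a sketch of the idea only, not the Synth package's own interface; the function name synth_weights and the softmax parameterization are ours:

    # Choose nonnegative weights summing to one so that a weighted combination
    # of comparison units (columns of X0) matches the treated unit's
    # pre-intervention characteristics X1 as closely as possible.
    synth_weights <- function(X1, X0) {
      J <- ncol(X0)
      obj <- function(v) {
        w <- exp(v) / sum(exp(v))      # softmax keeps w on the unit simplex
        sum((X1 - X0 %*% w)^2)
      }
      v <- optim(rep(0, J), obj)$par
      exp(v) / sum(exp(v))
    }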
Partial application is the process of reducing the arity of a function by fixing one or more arguments, thus creating a new function lacking the fixed arguments. The curry package provides three different ways of performing partial function application: fixing arguments from either end of the argument list (currying and tail currying) or fixing multiple named arguments (partial application). This functionality is exposed through the %<%, %-<%, and %><% operators, which allow for a programming style comparable to modern functional languages. Compared to other implementations, such as purrr::partial(), the operators in curry compose functions with named arguments, which aids autocompletion, among other things.
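A minimal sketch of the three operators (semantics as described above; consult the package documentation for exact behavior):

    library(curry)
    add3 <- function(x, y, z) x + y + z
    (add3 %<% 1)(2, 3)             # curry: fix x = 1, so 1 + 2 + 3 = 6
    (add3 %-<% 10)(1, 2)           # tail curry: fix z = 10, so 1 + 2 + 10 = 13
    (add3 %><% list(y = 5))(1, 2)  # partial: fix y by name, so 1 + 5 + 2 = 8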
This package provides a Python-based pipeline for extraction of species occurrence data through the usage of large language models. Includes validation tools designed to handle model hallucinations for a scientific, rigorous use of LLMs. Currently supports usage of GPT models, with more planned, including local and non-proprietary models. For more details on the methodology used, please consult the references listed under each function, such as Kent, A. et al. (1995) <doi:10.1002/asi.5090060209>, van Rijsbergen, C.J. (1979, ISBN:978-0408709293), Levenshtein, V.I. (1966) <https://nymity.ch/sybilhunting/pdf/Levenshtein1966a.pdf> and Krippendorff, K. (2011) <https://repository.upenn.edu/handle/20.500.14332/2089>.
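As a flavor of the string-distance validation cited above, base R's adist() computes the Levenshtein (edit) distance directly; the package's own validation tools are more elaborate, and the species names below are illustrative:

    extracted <- c("Panthera leo", "Pantera leo", "Felis catus")
    reference <- "Panthera leo"
    adist(extracted, reference)  # edit distances; 0 indicates an exact match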
This package provides comprehensive high-level tools for composite indicator construction and analysis. It is a "development environment" for composite indicators and scoreboards, which includes utilities for construction (indicator selection, denomination, imputation, data treatment, normalisation, weighting and aggregation) and analysis (multivariate analysis, correlation plotting, shortcuts for principal component analysis, global sensitivity analysis, and more). A composite indicator is completely encapsulated inside a single hierarchical list called a "coin". This allows a fast and efficient workflow, as well as making quick copies, testing methodological variations and making comparisons. The package also includes many plotting options, both statistical (scatter plots, distribution plots) and for presenting results.
This package provides functions to compute small area estimates based on a basic area- or unit-level model. The model is fit using restricted maximum likelihood, or in a hierarchical Bayesian way. In the latter case, numerical integration is used to average over the posterior density for the between-area variance. The output includes the model fit, small area estimates and corresponding mean squared errors, as well as some model selection measures. Additional functions provide means to compute aggregate estimates and mean squared errors, to minimally adjust the small area estimates to benchmarks at a higher aggregation level, and to graphically compare different sets of small area estimates.
This package provides a collection of helper functions and illustrative datasets to support learning and teaching of data science with R. The package is designed as a companion to the book <https://book-data-science-r.netlify.app>, making key data science techniques accessible to individuals with minimal coding experience. Functions include tools for data partitioning, performance evaluation, and data transformations (e.g., z-score and min-max scaling). The included datasets are curated to highlight practical applications in data exploration, modeling, and multivariate analysis. An early inspiration for the package came from an ancient Persian idiom about "eating the liver," symbolizing deep and immersive engagement with knowledge.
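For instance, min-max scaling (one of the transformations mentioned) maps a variable onto [0, 1]; a plain-R illustration, noting that the package's own function names may differ:

    minmax_scale <- function(x) (x - min(x)) / (max(x) - min(x))
    minmax_scale(c(10, 20, 40))  # 0.0000000 0.3333333 1.0000000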
Simulates events from one-dimensional nonhomogeneous Poisson point processes (NHPPPs) as per Trikalinos and Sereda (2024, <doi:10.48550/arXiv.2402.00358>; 2024, <doi:10.1371/journal.pone.0311311>). Functions are based on three algorithms that provably sample from a target NHPPP: the time-transformation of a homogeneous Poisson process (of intensity one) via the inverse of the integrated intensity function (Cinlar, E., "Theory of Stochastic Processes" (1975, ISBN:0486497996)); the generation of a Poisson number of order statistics from a fixed density function; and the thinning of a majorizing NHPPP via an acceptance-rejection scheme (Lewis, P.A.W. and Shedler, G.S. (1979) <doi:10.1002/nav.3800260304>).
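The thinning algorithm, for example, can be sketched in a few lines of plain R (an illustration of the method, not the package's API; the function name thin_nhppp is ours):

    # Sample candidates from a homogeneous Poisson process with majorizing
    # rate lambda_max, then accept each point with probability
    # lambda(t) / lambda_max.
    thin_nhppp <- function(lambda, lambda_max, t_min, t_max) {
      n <- rpois(1, lambda_max * (t_max - t_min))   # candidate count
      candidates <- sort(runif(n, t_min, t_max))    # homogeneous PP points
      keep <- runif(n) < lambda(candidates) / lambda_max
      candidates[keep]
    }
    # Example: intensity 1 + sin(t) on [0, 10], majorized by constant rate 2
    events <- thin_nhppp(function(t) 1 + sin(t), 2, 0, 10)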
This package implements the generalized semi-supervised elastic-net. This method extends the supervised elastic-net, and thus offers a practical solution to the problem of feature selection in semi-supervised contexts. Its mathematical formulation is presented from a general perspective, covering a wide range of models. We focus on linear and logistic responses, but the implementation could easily be extended to other losses in generalized linear models. We develop a flexible and fast implementation, written in C++ using RcppArmadillo and integrated into R via Rcpp modules. See Culp, M. (2013) <doi:10.1080/10618600.2012.657139> for references on the Joint Trained Elastic-Net.
The distributions of the weight of evidence (log Bayes factor) favouring case over noncase status in a test dataset (or test folds generated by cross-validation) can be used to quantify the performance of a diagnostic test (McKeigue (2019) <doi:10.1177/0962280218776989>). The package can be used with any test dataset on which you have observed case-control status and have computed prior and posterior probabilities of case status using a model learned on a training dataset. To quantify how the predictor will behave as a risk stratifier, the quantiles of the distributions of weight of evidence in cases and controls can be calculated and plotted.
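Concretely, each subject's weight of evidence is the posterior log odds minus the prior log odds; a plain-R sketch (not the package's interface):

    weight_of_evidence <- function(prior, posterior) {
      log(posterior / (1 - posterior)) - log(prior / (1 - prior))
    }
    weight_of_evidence(prior = 0.1, posterior = 0.5)  # log(9), about 2.2 nats
    # Quantiles of this quantity in cases and in controls characterize the
    # predictor's performance as a risk stratifier.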
An R package for analysis of transcript and translation features through manipulation of sequence data and NGS data such as Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed; as the name hints, it was made with the investigation of ribosomal patterns over Open Reading Frames (ORFs) as its primary use case. ORFik is extremely fast through its use of C++, data.table and GenomicRanges. The package allows reassigning transcript start sites using CAGE-Seq data, automatic shifting of Ribo-Seq reads, finding Open Reading Frames for whole genomes, and much more.
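A minimal usage sketch, assuming ORFik's findORFs() function (consult the Bioconductor documentation for the full signature):

    library(ORFik)
    # Find open reading frames in a nucleotide sequence; returns ORF
    # coordinates using the default start/stop codon definitions.
    findORFs("ATGGGTAATTAA")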
This package provides a tool for non-linear mapping (non-linear regression) using a mixture-of-regressions model and an inverse regression strategy. The methods include the GLLiM model (see Deleforge et al. (2015) <doi:10.1007/s11222-014-9461-5>) based on Gaussian mixtures, and a robust version of GLLiM named SLLiM (see Perthame et al. (2016) <doi:10.1016/j.jmva.2017.09.009>) based on a mixture of Generalized Student distributions. The methods also include BLLiM (see Devijver et al. (2017) <arXiv:1701.07899>), an extension of GLLiM with a sparse block-diagonal structure for large covariance matrices (particularly interesting for transcriptomic data).
An R implementation of Matthew Thomas's Python library 'inteq'. First, this solves Fredholm integral equations of the first kind ($f(s) = \int_a^b K(s, y) g(y) dy$) using methods described by Twomey (1963) <doi:10.1145/321150.321157>. Second, this solves Volterra integral equations of the first kind ($f(s) = \int_0^s K(s, t) g(t) dt$) using methods from Betto and Thomas (2021) <doi:10.48550/arXiv.2106.08496>. Third, this solves Volterra integral equations of the second kind ($g(s) = f(s) + \int_a^s K(s, y) g(y) dy$) using methods from Linz (1969) <doi:10.1137/0706034>.
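To see why first-kind problems need care, here is an illustrative plain-R discretization of the Fredholm equation with ridge regularization; this is not the inteq API, and the kernel and regularization parameter are made up for the example:

    a <- 0; b <- 1; n <- 100
    y <- seq(a, b, length.out = n); w <- (b - a) / n     # quadrature weights
    K <- outer(y, y, function(s, t) exp(-abs(s - t)))    # example kernel
    g_true <- sin(pi * y)
    f <- K %*% g_true * w                                # synthetic data f(s)
    A <- K * w                                           # discretized operator
    lambda <- 1e-6                                       # ridge parameter
    g_hat <- solve(crossprod(A) + lambda * diag(n), crossprod(A, f))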
This package provides a robust collection of functions tailored for microbial ecology analysis, encompassing both data analysis and visualization. It introduces an encapsulation feature that streamlines the workflow into a single summary object: once this object is configured, users can execute a wide range of analyses with a single line of code, requiring only two essential parameters for setup. The package delivers comprehensive outputs, including analysis objects, statistical outcomes, and visualization-ready data, enhancing the efficiency of research workflows. Designed with user-friendliness in mind, it caters to both novices and seasoned researchers, offering an intuitive interface coupled with adaptable customization options to meet diverse analytical needs.
An implementation of Horn's technique for numerically and graphically evaluating the components or factors retained in a principal components analysis (PCA) or common factor analysis (FA). Horn's method contrasts eigenvalues produced through a PCA or FA on a number of random data sets of uncorrelated variables, with the same number of variables and observations as the experimental or observational data set, to produce eigenvalues for components or factors that are adjusted for sample error-induced inflation. Components with adjusted eigenvalues greater than one are retained. paran may also be used to conduct parallel analysis following Glorfeld's (1995) suggestions to reduce the likelihood of over-retention.
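The core of Horn's procedure is compact enough to sketch in plain R; the paran package's own interface adds options such as Glorfeld's centile-based retention and graphical output:

    horn_parallel <- function(x, reps = 100) {
      n <- nrow(x); p <- ncol(x)
      obs <- eigen(cor(x), only.values = TRUE)$values
      rand <- replicate(reps,
        eigen(cor(matrix(rnorm(n * p), n, p)), only.values = TRUE)$values)
      adjusted <- obs - (rowMeans(rand) - 1)  # remove sampling-error inflation
      which(adjusted > 1)                     # indices of retained components
    }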
Offers a systematic way for conditional reporting of figures and tables for many variables (and bivariate combinations thereof), typically from survey data. Contains interactive 'ggiraph'-based (<https://CRAN.R-project.org/package=ggiraph>) plotting functions and data frame-based summary tables (bivariate significance tests, frequencies/proportions, unique open-ended responses, etc.) with many arguments for customization, and extensions possible. Uses a global options() system for neatly reducing redundant code. Also contains tools for immediate saving of objects, returning a hashed link to each object, which is useful for creating download links to high-resolution images upon rendering in 'Quarto'. Suitable for highly customized reports, primarily intended for survey research.
A three-step variable selection procedure based on random forests. Initially developed to handle high-dimensional data (where the number of variables largely exceeds the number of observations), the package is very versatile and can handle data of most dimensions, for regression and supervised classification problems. The first step eliminates irrelevant variables from the dataset. The second step selects all variables related to the response, for interpretation purposes. The third step refines the selection by eliminating redundancy among the variables selected in the second step, for prediction purposes. See Genuer, R., Poggi, J.-M. and Tuleau-Malot, C. (2015) <https://journal.r-project.org/articles/RJ-2015-018/>.
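A hedged usage sketch: the main entry point is assumed to be VSURF(), with the three result fields below mirroring the three steps (check the package documentation for exact names):

    library(VSURF)
    fit <- VSURF(x = iris[, 1:4], y = iris[, 5])
    fit$varselect.thres   # step 1: variables kept after thresholding
    fit$varselect.interp  # step 2: variables selected for interpretation
    fit$varselect.pred    # step 3: parsimonious set for prediction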
This package performs analyses of corneal data obtained from a Placido disk corneal topographer, including the calculation of the Placido irregularity indices and subsequent analysis. It is intended to be easy to use by a practitioner, providing a simple interface and yielding easily interpretable results. A corneal topographer is an ophthalmic clinical device that obtains measurements of the cornea (the anterior part of the eye). A Placido disk corneal topographer makes use of the Placido disk (Rowsey et al. (1981) <doi:10.1001/archopht.1981.03930011093022>), which produces a circular pattern of measurement nodes. The raw information measured by such a topographer is used by practitioners to analyze curvatures, to study optical aberrations, or to diagnose specific conditions of the eye (e.g. keratoconus, an important corneal disease). The rPACI package allows the calculation of the corneal irregularity indices described in Castro-Luna et al. (2020) <doi:10.1016/j.clae.2019.12.006>, Ramos-Lopez et al. (2013) <doi:10.1097/OPX.0b013e3182843f2a>, and Ramos-Lopez et al. (2011) <doi:10.1097/opx.0b013e3182279ff8>. It provides a simple interface to read corneal topography data files as exported by a typical Placido disk topographer, to compute the irregularity indices mentioned before, and to display summary plots that are easy for a clinician to interpret.
Multivariate tool for analyzing genome-wide association study results in the form of univariate summary statistics. The goal of bmass is to comprehensively test all possible multivariate models given the phenotypes and datasets provided. Multivariate models are determined by assigning each phenotype as either Unassociated (U), Directly associated (D), or Indirectly associated (I) with the genetic variant of interest. Test results for each model are presented in the form of Bayes factors, thereby allowing direct comparisons between models. The underlying framework implemented here is based on the modeling developed in "A Unified Framework for Association Analysis with Multiple Related Phenotypes", M. Stephens (2013) <doi:10.1371/journal.pone.0065245>.
Biologically Explainable Machine Learning Framework for Phenotype Prediction using omics data, described in Chen and Schwarz (2017) <doi:10.48550/arXiv.1712.00336>. Identifying reproducible and interpretable biological patterns from high-dimensional omics data is a critical factor in understanding the risk mechanism of complex disease. As such, explainable machine learning can offer biological insight in addition to personalized risk scoring. In this process, a feature space of biological pathways is generated, which can subsequently be analyzed using WGCNA methods (described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>).
This software downloads and manages air quality data from the European Environment Agency (EEA) dataflow (<https://www.eea.europa.eu/data-and-maps/data/aqereporting-9>). See the web page <https://eeadmz1-downloads-webapp.azurewebsites.net/> for details on the EEA's Air Quality Download Service. The package allows dynamically mapping the stations, summarising and time-aggregating the measurements, and building spatial interpolation maps. See the web page <https://www.eea.europa.eu/en> for further information on EEA activities and history. Further details, as well as an extended vignette of the main functions included in the package, are available on the GitHub web page dedicated to the project.
Reads EXIF data using ExifTool <https://exiftool.org> and returns results as a data frame. ExifTool is a platform-independent Perl library plus a command-line application for reading, writing and editing meta information in a wide variety of files. ExifTool supports many different metadata formats including EXIF, GPS, IPTC, XMP, JFIF, GeoTIFF, ICC Profile, Photoshop IRB, FlashPix, AFCP and ID3, as well as the maker notes of many digital cameras by Canon, Casio, FLIR, FujiFilm, GE, HP, JVC/Victor, Kodak, Leaf, Minolta/Konica-Minolta, Motorola, Nikon, Nintendo, Olympus/Epson, Panasonic/Leica, Pentax/Asahi, Phase One, Reconyx, Ricoh, Samsung, Sanyo, Sigma/Foveon and Sony.
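A short usage sketch, assuming exifr's read_exif() and a working ExifTool/Perl installation; the directory and tag names are illustrative:

    library(exifr)
    files <- list.files("photos", pattern = "\\.jpg$", full.names = TRUE)
    meta <- read_exif(files, tags = c("DateTimeOriginal",
                                      "GPSLatitude", "GPSLongitude"))
    head(meta)  # one row per file, one column per tag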
Tabacchi et al. (2011) published a very detailed study producing a uniform system of functions to estimate tree volume and phytomass components (stem, branches, stool). The estimates of the 2005 Italian forest inventory (<https://www.inventarioforestale.org/it/>) are based on these functions. The study documents the domain of applicability of each function and the equations to quantify the accuracy of individual as well as aggregated estimates. This package makes these functions available in the R environment. Version 2 exposes two distinct functions for individual and summary estimates. To facilitate access to the functions, tree species identification is now based on EPPO species codes (<https://data.eppo.int/>).
Generate the optimal maximin distance, minimax distance (only for low dimensions), and maximum projection designs within the class of Latin hypercube designs efficiently for computer experiments. Generate Pareto front optimal designs for each pair of the three criteria, and for all three criteria together, within the class of Latin hypercube designs efficiently. Provide criterion computing functions. References of this package can be found in Morris, M. D. and Mitchell, T. J. (1995) <doi:10.1016/0378-3758(94)00035-T>, Lu, L., Anderson-Cook, C. M. and Robinson, T. J. (2011) <doi:10.1198/Tech.2011.10087>, and Joseph, V. R., Gul, E., and Ba, S. (2015) <doi:10.1093/biomet/asv002>.
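The maximin distance criterion, for example, is simple to state in plain R (an illustration, not the package's own criterion functions):

    # A design scores better under the maximin criterion when its smallest
    # pairwise inter-point distance is larger.
    maximin_criterion <- function(design) min(dist(design))
    # A random Latin hypercube with 10 runs in 2 dimensions:
    n <- 10; d <- 2
    lhd <- apply(matrix(runif(n * d), n, d), 2, function(u) (rank(u) - 0.5) / n)
    maximin_criterion(lhd)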
Data integration web application for biobanks by 'OBiBa'. Opal is the core database application for biobanks. Participant data, once collected from any data source, must be integrated and stored in a central data repository under a uniform model. Opal is such a central repository. It can import, process, validate, query, analyze, report, and export data. Opal is typically used in a research center to analyze the data acquired at assessment centers. Its ultimate purpose is to achieve seamless data-sharing among biobanks. This Opal client allows interacting with Opal web services and performing operations on the R server side. DataSHIELD administration tools are also provided.
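A hedged connection sketch, assuming opalr's opal.login() and opal.logout() functions; the server URL and credentials are placeholders:

    library(opalr)
    o <- opal.login(username = "user", password = "password",
                    url = "https://opal.example.org")
    # ... interact with Opal web services, run code on the R server side ...
    opal.logout(o)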