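Machine learning methods in analytical chemistry: Random Forest applied to petroleum evaluation (original title: Métodos de aprendizagem de máquina em química analítica: Floresta Randômica aplicada na avaliação de petróleo)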
Date
2019-11-29
Authors
Lovatti, Betina Pires Oliveira
Publisher
Universidade Federal do Espírito Santo
Abstract
Technological development has equipped chemical laboratories with instruments capable of extracting more information from samples, which has especially affected Analytical Chemistry. Multivariate statistical methods, which belong to a growing area of Analytical Chemistry called Chemometrics, help to explore the full potential of these new instruments. At the forefront of Chemometrics is a new machine learning method, Random Forest (RF), whose applications are aimed at modeling complex matrices such as petroleum. The complexity of petroleum stems from the wide variation in the composition of its constituents, which gives it distinct physicochemical properties. These compositional variations can be observed by spectroscopic techniques such as mid-infrared (MIR) spectroscopy and hydrogen (1H NMR) and carbon-13 (13C NMR) nuclear magnetic resonance, which can extract information about petroleum at the molecular level. Through chemometric methods, this chemical information can be related to the physicochemical properties of petroleum. The present work therefore aims to classify petroleum samples using spectroscopic techniques combined with machine learning methods, to explore the potential of RF when combined with variable selection methods, and to identify in this algorithm the variables that contribute most to the classification of oil samples.
The results showed that RF was able to discriminate petroleum samples according to the Maximum Pour Point (PFM) from 1H and 13C NMR data. It was also possible to identify the variables that contributed most to the modeling, among which a balance between aromatic and saturated compounds was observed. In a second application, RF was efficient in discriminating 1H and 13C NMR spectra according to the total acid number (TAN) of the oil, especially when combined with Principal Component Analysis (PCA) and Fisher's Discriminant (FD). The identification of the most important variables for discrimination showed a slightly greater contribution from the aromatic region. In the third application, the pattern recognition methods PCA and k-Nearest Neighbors were efficient in identifying oil profiles from MIR data. This process provides information on the chemical similarity of oils without the need for complete oil characterization.
Keywords
Machine learning, Random Forest, Petroleum, Crude oil, Reduction of variables, NMR