Spectral classification of PAHs via physics-informed deep learning
Date
2026-03-19
Authors
Silva, Geovani Victor Soares da
Publisher
Universidade Federal do Espírito Santo
Abstract
The identification of Polycyclic Aromatic Hydrocarbons (PAHs) in astrophysical environments relies on comparing theoretical spectra computed by Density Functional Theory (DFT) with experimental infrared observations. However, the harmonic approximation employed in DFT calculations neglects anharmonic effects, such as non-uniform frequency shifts, Fermi resonances, and combination bands, producing a domain shift that compromises the generalization of deep learning models trained exclusively on theoretical data. This thesis proposes and validates a Spectroscopy-Guided Data Augmentation (SGDA) strategy, based on the stochastic simulation of physical artifacts, to overcome this limitation without requiring costly anharmonic calculations. The methodology rests on three pillars: (i) optimization of the spectral resolution, set at a Full Width at Half Maximum (FWHM) of 6.0 cm⁻¹, which maximizes the geometric separability among chemical classes; (ii) a physics-informed transformation pipeline, including elastic distortion of the frequency axis, insertion of synthetic peaks, and band masking, algorithmically formalized to ensure reproducibility; and (iii) a One-Dimensional Convolutional Neural Network (1D-CNN) based on the Inception architecture, adapted for multiscale spectral feature extraction. The model was trained on 10,775 theoretical spectra from the NASA Ames PAH IR Spectroscopic Database (PAHdb) and evaluated on 84 matrix-isolation experimental spectra. The Physics-Informed strategy achieved a weighted F1-Score of 0.826 on the experimental test set, significantly outperforming both the harmonic baseline (0.567) and the linear Bjerrum augmentation approach (0.558).
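The transformation pipeline named in pillar (ii) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the Lorentzian line shape, the function names, and every numeric parameter other than the 6.0 cm⁻¹ FWHM (shift amplitude, number of knots, peak count and height, mask width) are assumptions chosen only for illustration.

```python
import numpy as np

FWHM = 6.0  # cm^-1, the resolution stated in the abstract


def lorentzian_profiles(grid, centers, fwhm=FWHM):
    """Unit-height Lorentzian profiles, one column per center (assumed line shape)."""
    gamma = fwhm / 2.0  # half width at half maximum
    diff = grid[:, None] - np.asarray(centers, dtype=float)[None, :]
    return gamma**2 / (diff**2 + gamma**2)


def broaden(freqs, intens, grid, fwhm=FWHM):
    """Turn a stick spectrum (frequencies, intensities) into a broadened curve."""
    return lorentzian_profiles(grid, freqs, fwhm) @ np.asarray(intens, dtype=float)


def elastic_distort(grid, spectrum, rng, max_shift=15.0, n_knots=6):
    """Warp the frequency axis with a smooth, non-uniform random shift.

    max_shift and n_knots are hypothetical; they are kept small enough that
    the warped axis stays monotonically increasing, as np.interp requires.
    """
    knots = np.linspace(grid[0], grid[-1], n_knots)
    shifts = rng.uniform(-max_shift, max_shift, size=n_knots)
    warped = grid + np.interp(grid, knots, shifts)
    # A feature originally at grid[i] now sits at warped[i]; resample on grid.
    return np.interp(grid, warped, spectrum)


def add_synthetic_peaks(grid, spectrum, rng, max_peaks=3, rel_height=0.1):
    """Insert a few weak extra bands at random positions (hypothetical limits)."""
    n = int(rng.integers(0, max_peaks + 1))
    centers = rng.uniform(grid[0], grid[-1], size=n)
    heights = rng.uniform(0.0, rel_height, size=n) * spectrum.max()
    return spectrum + lorentzian_profiles(grid, centers) @ heights


def mask_band(grid, spectrum, rng, max_width=100.0):
    """Zero out one random contiguous window of the spectrum."""
    width = rng.uniform(0.0, max_width)
    start = rng.uniform(grid[0], grid[-1] - width)
    out = spectrum.copy()
    out[(grid >= start) & (grid <= start + width)] = 0.0
    return out


def augment(grid, spectrum, rng):
    """Apply the three transformations in sequence, then rescale to unit maximum."""
    out = elastic_distort(grid, spectrum, rng)
    out = add_synthetic_peaks(grid, out, rng)
    out = mask_band(grid, out, rng)
    peak = out.max()
    return out / peak if peak > 0 else out


# Toy usage: two made-up bands on a 1 cm^-1 grid, not data from PAHdb.
grid = np.arange(400.0, 4000.0, 1.0)
spectrum = broaden([1600.0, 3050.0], [1.0, 0.5], grid)
augmented = augment(grid, spectrum, np.random.default_rng(0))
```

Passing an explicit random generator keeps each augmented spectrum reproducible from its seed, which is one simple way to meet the reproducibility goal the abstract mentions.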
Interpretability analyses via Grad-CAM demonstrated that the network bases its decisions on chemically coherent spectral regions: the shoulders of the C–H stretching band (∼3050 cm⁻¹) for neutral PAHs, the skeletal deformation modes (1100–1600 cm⁻¹) for PANHs, and signal suppression at high frequencies (> 1700 cm⁻¹) for ionic species. The t-SNE analysis confirmed that the model reduces the distance between the theoretical and experimental domains in latent space, with reductions of up to 36% in centroid distance for the PAH Cation class. The results validate the hypothesis that incorporating physical knowledge into neural network training constitutes an effective domain adaptation strategy for computational spectroscopy, opening perspectives for the automated analysis of observational data from the James Webb Space Telescope (JWST).
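The centroid-distance comparison reported above can be made concrete with a short Python sketch. The abstract does not state which embedding space or metric was used, so the Euclidean distance between per-domain mean vectors, the function names, and the toy coordinates below are all assumptions for illustration only.

```python
import numpy as np


def centroid_distance(theory, experiment):
    """Euclidean distance between the mean embeddings of two domains (assumed metric)."""
    theory = np.asarray(theory, dtype=float)
    experiment = np.asarray(experiment, dtype=float)
    return float(np.linalg.norm(theory.mean(axis=0) - experiment.mean(axis=0)))


def relative_reduction(before, after):
    """Fractional decrease of a distance, e.g. 0.2 means a 20% reduction."""
    return 1.0 - after / before


# Toy 2-D embeddings for one class; these are made-up points, not thesis results.
theory = [[0.0, 0.0], [2.0, 0.0]]            # centroid (1, 0)
experiment_baseline = [[3.0, 4.0], [5.0, 4.0]]   # centroid (4, 4)
experiment_augmented = [[3.4, 2.2], [3.4, 4.2]]  # centroid (3.4, 3.2)

d_before = centroid_distance(theory, experiment_baseline)
d_after = centroid_distance(theory, experiment_augmented)
reduction = relative_reduction(d_before, d_after)
```

Computing the quantity per class, as the abstract does for the PAH Cation class, only requires filtering both embedding sets by label before calling `centroid_distance`.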
Keywords
Polycyclic aromatic hydrocarbons, Deep learning, Infrared spectroscopy, Physics-informed data augmentation, Convolutional neural networks, Domain shift, Astrochemistry