Informática
Permanent URI for this community
Graduate Program in Informatics (Programa de Pós-Graduação em Informática)
Center: CT
Phone: (27) 4009 2324, ext. 5126
Program URL: http://www.informatica.ufes.br/pt-br/pos-graduacao/PPGI
Browsing Informática by Author "Almeida Junior, Jurandy Gomes de"
Now showing 1 - 2 of 2
- Item: Analysis of bias in GPT language models through fine-tuning with anti-vaccination speech (Universidade Federal do Espírito Santo, 2024-12-02)
  Turi, Leandro Furlam; Badue, Claudine; Souza, Alberto Ferreira de; https://orcid.org/0000-0003-1561-8447; Pacheco, Andre Georghton Cardoso; Almeida Junior, Jurandy Gomes de
  Abstract: We examined the effects of integrating data containing divergent information, particularly anti-vaccination narratives, into the training of a GPT-2 language model by fine-tuning it on content from anti-vaccination groups and channels on Telegram. Our objective was to analyze the model's ability to generate coherent and rationalized texts compared to a model pre-trained on OpenAI's WebText dataset. The results demonstrate that fine-tuning a GPT-2 model with biased data leads the model to perpetuate these biases in its responses, albeit with a certain degree of rationalization. This highlights the importance of using reliable, high-quality data when training natural language processing models and underscores the implications for information dissemination through these models. We also explored the impact of data poisoning by incorporating anti-vaccination messages combined with general group messages in different proportions, aiming to understand how exposure to biased data can influence text generation and introduce harmful biases. The experiments highlight the change in frequency and intensity of anti-vaccination content generated by the model and elucidate the broader implications for reliability and ethics when using language models in sensitive applications. This study provides social scientists with a tool to explore and understand the complexities and challenges associated with misinformation in public health through the use of language models, particularly in the context of vaccine misinformation.
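The data-poisoning setup described in the abstract above (fine-tuning GPT-2 on corpora that mix anti-vaccination messages with general group messages in different proportions) can be illustrated with a short sketch. The snippet below is not the thesis's actual pipeline: the input files (antivax_messages.txt, general_messages.txt), the 30% poisoning proportion, and the training hyperparameters are illustrative assumptions, and it only shows the general shape of such an experiment using Hugging Face Transformers.

```python
# Illustrative sketch (not the author's pipeline): fine-tune GPT-2 on a corpus
# that mixes "poisoned" messages with general messages at a chosen proportion.
import random
import torch
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

def load_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def mix_corpus(poison_msgs, general_msgs, poison_ratio, size):
    """Build a training corpus containing `poison_ratio` of biased messages."""
    n_poison = int(size * poison_ratio)
    corpus = (random.sample(poison_msgs, n_poison) +
              random.sample(general_msgs, size - n_poison))
    random.shuffle(corpus)
    return corpus

class TextDataset(torch.utils.data.Dataset):
    """Tokenized text dataset for causal language modeling."""
    def __init__(self, texts, tokenizer, max_length=128):
        self.encodings = tokenizer(texts, truncation=True, max_length=max_length)

    def __len__(self):
        return len(self.encodings["input_ids"])

    def __getitem__(self, idx):
        return {k: v[idx] for k, v in self.encodings.items()}

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Hypothetical corpora: one file of anti-vaccination messages, one of general
# group messages, mixed here at a 30% poisoning proportion.
corpus = mix_corpus(load_lines("antivax_messages.txt"),
                    load_lines("general_messages.txt"),
                    poison_ratio=0.3, size=10_000)
dataset = TextDataset(corpus, tokenizer)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-finetuned", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```

Varying `poison_ratio` across runs and comparing the generated text against the unmodified pre-trained model corresponds to the kind of proportion-based poisoning comparison the abstract describes.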
- Item: Copycat CNN: convolutional neural network extraction attack with unlabeled natural images (Universidade Federal do Espírito Santo, 2023-04-25)
  Silva, Jacson Rodrigues Correia da; Santos, Thiago Oliveira dos; https://orcid.org/0000-0001-7607-635X; http://lattes.cnpq.br/5117339495064254; https://orcid.org/0000-0002-4314-1693; http://lattes.cnpq.br/0637308986252382; Goncalves, Claudine Santos Badue; https://orcid.org/0000-0003-1810-8581; http://lattes.cnpq.br/1359531672303446; Luz, Eduardo Jose da Silva; https://orcid.org/0000-0001-5249-1559; http://lattes.cnpq.br/5385878413487984; Almeida Junior, Jurandy Gomes de; https://orcid.org/0000-0002-4998-6996; http://lattes.cnpq.br/4495269939725770; Rauber, Thomas Walter; https://orcid.org/0000-0002-6380-6584; http://lattes.cnpq.br/0462549482032704
  Abstract: Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance on a variety of problems in recent years, leading many companies to develop neural-network-based products that require expensive data acquisition, annotation, and model generation. To protect their models from being copied or attacked, companies often deliver them as black boxes accessible only through APIs, which must be secure, robust, and reliable across different problem domains. However, recent studies have shown that state-of-the-art CNNs have vulnerabilities: simple perturbations of input images can change the model's response, and even images unrecognizable to humans can obtain high confidence in the model's output. These methods require access to the model's parameters, but other studies have shown how to generate a copy (imitation) of a model using its output probabilities (soft labels) and problem-domain data. Using such a surrogate model, an adversary can attack the target model with a higher probability of success. We further explored these vulnerabilities. Our hypothesis is that, by using publicly available images (accessible to everyone) and responses that any model must provide (even black boxes), it is possible to copy a model and achieve high performance. We therefore proposed a method called Copycat to extract CNN classification models. Our main goal is to copy the model in two stages: first, querying it with random natural images, such as those from ImageNet, and annotating its maximum-probability predictions (hard labels); second, using these labeled images to train a Copycat model that should achieve performance similar to the target model. We evaluated this hypothesis on seven real-world problems and against a cloud-based API. All Copycat models achieved performance (F1-Score) above 96.4% relative to the target models. After achieving these results, we performed several experiments to consolidate and evaluate our method. Concerned about this vulnerability, we also analyzed various existing defenses against the Copycat method. Among the experiments, defenses that detect attack queries do not work against our method, but defenses that use watermarking can identify the target model's Intellectual Property. Thus, the method proved effective for model extraction, being immune to the detection defenses from the literature and identified only by watermarking defenses.
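The extraction procedure summarized in the abstract above (query a black-box target with random natural images, keep only the maximum-probability hard labels, and train a surrogate on them) can be sketched in a few lines of PyTorch. This is an illustrative outline, not the Copycat implementation itself: the black-box target is stood in for by a locally loaded pretrained ResNet, the image folder, architectures, and hyperparameters are assumptions, and the two stages (label annotation, then training) are collapsed into a single loop for brevity.

```python
# Illustrative sketch of hard-label model extraction: query a black-box
# classifier with unlabeled natural images, keep only the argmax ("hard")
# labels, and train a copycat model on those labels.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Unlabeled natural images (e.g. an ImageNet-like folder); any folder labels
# that ImageFolder assigns are ignored below.
natural_images = datasets.ImageFolder("natural_images/", transform=preprocess)
loader = DataLoader(natural_images, batch_size=64, shuffle=True)

# Stand-in for the black-box target: a frozen pretrained ResNet. In practice
# only its predictions would be observable, e.g. through an API.
target = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).to(device).eval()

# Copycat model with the same number of output classes as the target.
num_classes = 1000
copycat = models.resnet34(num_classes=num_classes).to(device)
optimizer = torch.optim.SGD(copycat.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):
    for images, _ in loader:                           # true labels are unused
        images = images.to(device)
        with torch.no_grad():
            hard_labels = target(images).argmax(dim=1)  # queried hard labels
        optimizer.zero_grad()
        loss = criterion(copycat(images), hard_labels)
        loss.backward()
        optimizer.step()
```

In a real extraction setting the querying stage would be run once, the hard labels stored, and the copycat trained offline on the stolen label set, which is the two-stage split the abstract describes.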