One paper has been accepted by IEEE TNNLS.

One paper entitled “Multiobjective Simulated Annealing-Based Stopwords Substitution for Rubbish Text Attack” has been accepted by IEEE Transactions on Neural Networks and Learning Systems (TNNLS).


Title: Multiobjective Simulated Annealing-Based Stopwords Substitution for Rubbish Text Attack

Authors: Chen Li, Xinghao Yang, Ao Wang, Yongshun Gong, Baodi Liu, and Weifeng Liu*

Modern Natural Language Processing (NLP) models exhibit extreme sensitivity to text adversarial examples, while their opposite weakness, insensitivity to text rubbish examples, remains greatly underestimated. Text rubbish examples are heavily modified sentences that appear semantically confusing to humans yet leave the model's prediction unchanged; they are significant for model robustness evaluation, improvement, and interpretation. Existing methods usually design a single-objective optimization that simultaneously maximizes the modification rate and the model confidence using simple text modification strategies such as word deletion and preposition substitution. However, single-objective optimization easily falls into local optima because the objectives conflict, and the simple modification strategies greatly limit the diversity of rubbish examples. To address these problems, we propose a Multi-Objective Simulated Annealing-based Stopword Substitution (MOSA-S2) algorithm with three major merits. First, MOSA-S2 replaces input words with meaningless stopwords and employs importance-based composite perturbation to simulate word substitution, enhancing the quality and diversity of rubbish example generation. Second, we formulate a multi-objective simulated annealing method that adaptively determines the priority of word replacements, escaping local optima with a controlled probability and balancing multiple objectives via Pareto dominance. Third, we design a grammatically constrained variant that enhances the readability of rubbish text while maximizing its semantic deviation from the original to mislead human judgment. We evaluate the effectiveness and efficiency of our method on six text datasets by attacking seven popular neural models.
Extensive experimental results demonstrate the superiority of MOSA-S2 and reveal a striking fact: modern NLP models may not fully comprehend textual semantics, as they make the same prediction, often with even higher confidence, for nonsensical text sequences. More broadly, extended evaluations on large language models reveal that undersensitivity is a pervasive vulnerability beyond conventional DNNs, positioning rubbish text attacks as a crucial complement to existing LLM safety research.
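To make the core idea concrete, below is a minimal, self-contained sketch of multi-objective simulated annealing with stopword substitution and Pareto-dominance acceptance, in the spirit of the abstract. It is not the paper's implementation: the stopword list is abbreviated, `model_confidence` is a hypothetical stand-in for querying a real victim model, and the cooling schedule and acceptance rule are illustrative assumptions.

```python
import math
import random

# Abbreviated stopword list for illustration; the paper presumably uses a
# standard NLP stopword vocabulary.
STOPWORDS = ["the", "a", "an", "of", "to", "in", "and", "is", "that"]

def model_confidence(tokens):
    # Hypothetical stand-in: a real rubbish-text attack would query the
    # target NLP model for its confidence in the original label.
    return 1.0 - 0.01 * sum(t in STOPWORDS for t in tokens)

def objectives(tokens, original):
    # Two objectives to maximize: modification rate and model confidence.
    modified = sum(a != b for a, b in zip(tokens, original))
    return (modified / len(original), model_confidence(tokens))

def dominates(a, b):
    # Pareto dominance: a is at least as good in every objective
    # and strictly better in at least one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def mosa_s2_sketch(sentence, steps=500, t0=1.0, cooling=0.99):
    original = sentence.split()
    current = list(original)
    archive = [(objectives(current, original), list(current))]
    temp = t0
    for _ in range(steps):
        # Propose a neighbor by substituting one word with a stopword.
        candidate = list(current)
        i = random.randrange(len(candidate))
        candidate[i] = random.choice(STOPWORDS)
        f_cur = objectives(current, original)
        f_new = objectives(candidate, original)
        if dominates(f_new, f_cur):
            current = candidate
        else:
            # Escape local optima: accept a dominated move with a
            # temperature-controlled probability.
            delta = sum(c - n for c, n in zip(f_cur, f_new))
            if random.random() < math.exp(-max(delta, 0.0) / temp):
                current = candidate
        # Maintain a non-dominated archive of rubbish candidates.
        f_cur = objectives(current, original)
        if not any(dominates(f, f_cur) for f, _ in archive):
            archive = [(f, s) for f, s in archive if not dominates(f_cur, f)]
            archive.append((f_cur, list(current)))
        temp *= cooling
    return archive
```

The archive returned holds the Pareto front of (modification rate, confidence) trade-offs, from which heavily modified sentences that still keep the model confident can be selected as rubbish examples.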

