Protecting Intellectual Property and Privacy in AI Models.
Recent research has shed light on the vulnerabilities of AI models when it comes to memorizing and reproducing copyrighted and sensitive content from their training data. This raises significant concerns for intellectual property rights holders, AI developers, and privacy advocates.
In today’s blog post I try to delves into the findings of the research, the implications for various stakeholders, and the steps AI makers can take to prevent such situations from happening.
The paper “Scalable Extraction of Training Data from (Production) Language Models” presents several surprising findings:
- Extensive Memorization by Language Models: Large language models (LLMs) like GPT-Neo, LLaMA, and ChatGPT have been found to memorize and reproduce large portions of their training data.
- Effectiveness of Extraction Attacks: Researchers were successful in executing extraction attacks on various models, including those considered more secure or aligned with human values, like ChatGPT.
- Quantity and Nature of Extracted Data: The sheer volume of data that could be extracted, including sensitive and personal information, was remarkable.
The researchers discovered that by strategically prompting ChatGPT with repetitive keywords, such as the word “poem,” they could coerce the chatbot into divulging its training data. This approach aimed to make ChatGPT “diverge” from its chatbot role and revert to its original language modeling objective. Although much of the generated text was nonsensical, in some instances, ChatGPT reverted to copying outputs directly from its training data.
Disturbingly, the extracted training data included not only academic papers and boilerplate text from websites but also personal information from real individuals. Approximately 16.9% of the generated text contained memorized Personally Identifying Information (PII), and a staggering 85.8% of instances with potential PII turned out to be authentic.
To confirm the authenticity of the extracted information, the researchers compiled their own dataset of text from the internet. This verification process underscored the genuine nature of the personal information obtained through the adversarial prompting of ChatGPT.
As we marvel at the capabilities of language models like ChatGPT, it is crucial to acknowledge and address potential vulnerabilities. The findings of this study serve as a reminder that even advanced AI models are not immune to unintended consequences. Ongoing efforts to enhance the security and transparency of such models are imperative to ensure their responsible and ethical use and this needs the focus of everyone it’s not just a technology problem. It also shows the importance of working with a AI technology stack and model thats built on foundations of ethics and trust.