OpenAI has developed an incredibly advanced AI chatbot that has become a viral sensation. It is one of the fastest-growing applications ever launched and provides humanlike AI interactions to over 100 million users. But how did it get so advanced? And what exactly is the data privacy problem with ChatGPT and AI?
One consequence of the rapid progress of artificial intelligence is an under-discussed privacy risk for anyone who has ever posted on the internet. ChatGPT is simply the most prominent example of technology that learns from billions of your words, without your consent.
How You Taught AI
Any look at generative AI requires a basic understanding of language models. These technologies are the foundation of ChatGPT and systems like it. However, for language models to become the technology we see today, they need enormous amounts of data fed into them to train them, or, put better, to teach them.
The more data fed into a language model, the more capable it becomes. So how much data was poured into OpenAI's viral chatbot? Estimates put it at around 300 billion words. Yet the words themselves are not the data privacy problem with ChatGPT and AI; where they were found is.
OpenAI scraped these words from all across the internet. Books, articles, websites, and blog posts were all used to feed the GPT language model. What makes it all the more unsettling is that this text could contain your personal information, including your own words, collected without any consent at all.
If you have ever written a blog post or commented on an article, the chance that your words were used to teach ChatGPT is very high. And that is undoubtedly a problem.
ChatGPT and GDPR
The General Data Protection Regulation (GDPR) is European Union legislation that grants individuals rights over their personal data. It is designed to protect people's privacy and let them decide, at their own discretion, how their personal data is used. The main target of the GDPR? Entities that collect massive amounts of data.
The GDPR is landmark legislation that seeks to protect our information in an age when everything is so easily accessible. Yet there is no way of knowing whether OpenAI complies with it. Moreover, people in other countries don't have the same protection from online data collection.
One of the rights the GDPR grants individuals is the right to be forgotten, also known as the right to erasure. In the case of ChatGPT, there is no clear account of what happened to the data collected for the language model. OpenAI has no procedure for telling you whether your data was collected or, more concerning, what was done with it.
A Privacy Policy That Digs a Deeper Hole
The way ChatGPT accumulates data is concerning enough, but it doesn't end there. Using ChatGPT is a data risk in its own right: by entering prompts, you may be exposing sensitive information.
For example, if a company uses the tool and enters sensitive information, that information is now stored on the platform. That data can then be used to train the tool further, and could even be regurgitated in response to someone else's prompts. Moreover, according to its privacy policy, ChatGPT already collects information from users, including IP addresses, browser types, settings, and data on how they use the website.
Ultimately, the entire premise of creating and sustaining artificial intelligence relies on your words and information. Yet consent over that data remains a major point of contention for the technology's continued development.
The Only Real Answer
Artificial intelligence will not go away; if anything, it will only become more advanced and more prominent. With Microsoft and Google locked in an AI arms race, there is no denying that this problem is here to stay.
The answer has already been raised by ChatGPT's own creator, who has stated that AI, and the companies creating it, need regulation. The GDPR was a brilliant step in protecting people's data and their right to privacy, but AI's prominence cements why that regulatory momentum must only grow in the coming years.