From the course: Everyday AI Concepts

Who owns what the machine learned?

- In 2024, Google's AI system mistakenly recommended eating rocks for better health. This information was easy to track down to a comedy news website. The AI system just vacuumed up this data and rolled it into its foundation model. But what if it was a real news site with real health advice? Can this system just use that information as part of its chat system? Generative AI can compose music in someone else's voice or generate artwork that looks like an artist's style. So, are these systems simply copying another artist's work, or are they using it for inspiration?

A lot of this material is covered by something called copyright. Copyright is a legal protection that allows the owner to assert certain rights. That's why you can't just copy a local newspaper and then sell it as your own. But copyright doesn't give you absolute protection. You can still check out a book from a library without paying the publisher. It's also why university students can freely watch movies in class. These copyright exceptions are called fair use. It means that you can use protected material as long as you're fair to the owner. In general, you can't do things that will harm the value of the protected material.

Generative AI companies like OpenAI have claimed that their systems fall under a fair use exception. They argue that they're acting like a library, that they're not copying and selling the information, but instead the system is just inspired by the original work. Fair use generally creates a balance between sharing protected material and giving creators incentives to create new material. Determining if these systems are considered fair use is a big unanswered question. If these systems are found to be using copyrighted material, then they will be breaking a law.

So think of it this way: search engines like Google used to point you to a website that matched your question. Now, when you search for something, you can see a Google summary of your answer.
You don't even have to visit the website that provided that answer. Newer systems can provide almost all the information you need without leaving the chat. If you write about travel, answer online questions, or suggest that people eat rocks, your content will be included in the chat. If you don't visit the sites that supplied the information, then what incentives do authors have to answer questions? Why would photographers and artists keep creating images if AI can make realistic-looking copies? Why would creators bother making new content if companies gather all the data for their benefit? If companies are allowed to freely use this material, that might completely change the way that people create content. Either one of these outcomes will have a huge impact on how you interact with these systems. If you create content, then you might be in danger of having another company benefit from your work. On the other hand, if you're trying to use a foundation model, then you need to be aware that some of the training data might be protected by copyright. It's important to understand these issues. That way you can protect your own content or know when you're using work based on someone else's material.
