Please Don't Cite Me

The Ethics of Computer-Generated Content

From image generators to code autocomplete, generative AI is reshaping what it means to create. But as adoption accelerates, our existing legal and ethical frameworks are struggling to keep up.

This is a lightly edited version of my final essay for CS 3001: Computing & Society at Georgia Tech, written in Fall 2022 (just weeks before ChatGPT’s public release).

Artificial intelligence has existed for much longer than many of us realize. By the mid-1950s, computers were powerful enough for researchers to finally experiment with what we now call AI. Still, the idea of a thinking machine has been a staple of science fiction since the early 1900s (think of the Tin Man from The Wizard of Oz). Naturally, as computers grew more capable, the idea captured the interest of many researchers who believed it would launch our entire species into the future. The 1980s brought the next big push in the field, with the popularization of deep learning, expert systems, and heavy financial investment from the Japanese government.1 However, it could be argued that artificial intelligence only truly materialized in the public eye in the late 1990s, when then-World Chess Champion Garry Kasparov was defeated by IBM’s Deep Blue.

In recent years, computers have gotten faster, smaller, and more efficient. Simultaneously, AI startup funding doubled every three years from 2011 to 2021.2 With all this advancement and investment, it is no surprise that we find ourselves in 2022 surrounded by all sorts of AI-powered systems. From voice assistants to autonomous driver aids to automatic email completion, these systems genuinely simplify our lives. With these tools becoming trivial to access, many questions arise about the ethics of their usage. This essay explores the ethics and ownership of generative AI as it stands today.

One of the more popular uses for computer-generated content is image generation, where an AI takes a text prompt and produces an original image based on it. A vast array of AI image-generation services is available to the public today, like Midjourney, Stable Diffusion, and OpenAI’s DALL-E.3 These services can generate convincing images from even the most random prompts in just a few seconds. While these particular image generators are built on newer diffusion models, many generative services are based on a subclass of neural networks known as generative adversarial networks (GANs). These networks have two distinct components: one generates roughly random images (hence the “generative” title), and one judges them relative to an input set (“adversarial”). Over time, both components improve, with the generative part creating more convincing content and the adversarial part getting better at identifying the generated content.4
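The adversarial back-and-forth described above can be illustrated with a deliberately tiny sketch. Everything here is a toy assumption on my part: the “generator” is a single learnable shift applied to noise, and the “discriminator” is a logistic classifier on scalars, nothing like the deep networks real services use. But the alternating update loop is the same shape.

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w, b):
    """Probability that a sample is real (logistic regression on scalars)."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def train_toy_gan(steps=2000, lr=0.05, batch=64):
    shift = 0.0          # generator parameter: how far to shift the noise
    w, b = 1.0, 0.0      # discriminator parameters
    for _ in range(steps):
        real = rng.normal(4.0, 1.0, batch)          # "real" data, centered at 4
        fake = rng.normal(0.0, 1.0, batch) + shift  # generated data

        # Discriminator step: gradient ascent on
        # log p(real is real) + log p(fake is fake).
        p_real = discriminator(real, w, b)
        p_fake = discriminator(fake, w, b)
        w += lr * np.mean((1 - p_real) * real - p_fake * fake)
        b += lr * np.mean((1 - p_real) - p_fake)

        # Generator step: gradient ascent on log p(fake fools the
        # discriminator), which drags the shift toward the real data.
        p_fake = discriminator(fake, w, b)
        shift += lr * np.mean((1 - p_fake) * w)
    # The learned shift drifts toward the real mean as the two sides compete.
    return shift
```

Neither component is ever shown an explicit target; each only learns by trying to beat the other, which is the core of the adversarial idea.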

In the context of images, these networks are trained on the near-endless supply of images available on the internet (photographs, artwork, and so on). From these images, the network “learns” what a cat looks like, for instance. Then, faced with a new prompt, it attempts to imitate its inputs, producing a range of output images. Although the initial results may not be convincing, one major advantage of these systems is speed: an image arrives within a few seconds, so the user is free to tune various parameters and keywords to get closer to what they had in mind. Image generation isn’t all these networks are good for, however. Models like OpenAI’s GPT-3 can generate paragraphs of text from single-line prompts, and GitHub’s Copilot can auto-complete and generate code (having been trained on the public code repositories on GitHub).5 It’s incredible to think that these aren’t artifacts of a far-away future, but tools that people are using today.

However, these services also present a very real risk to the general public. One such concern is misinformation. Being able to generate a full article from a single line of text makes it significantly easier to produce articles full of fake and misleading information. A study of 830 participants found that people could not reliably differentiate between poems written by humans and those generated by GPT-2 (GPT-3’s predecessor).6 It has also been repeatedly demonstrated that humans tend to be overconfident in their abilities.7 That overconfidence, applied to spotting computer-generated content, leaves us significantly more vulnerable to being misled by it.

Computer-generated content also poses a new question of ownership and copyright. It is presently unclear where we draw the line between computer-generated and computer-assisted content. For instance, if a photographer uses the Auto Levels button in Photoshop or Lightroom to adjust the lighting on their digital photo, it doesn’t make sense for them to lose their copyright over it. Something that might be closer to that line, though, is the inpainting and outpainting services offered by OpenAI’s DALL-E8 and others.

Inpainting is the process of replacing parts of an existing image with something else. An in-production example is Google’s Magic Eraser, available on their Pixel 6 and 7 smartphones. The user simply selects whoever they want to remove from a photo; using the rest of the image as context, the phone infers a plausible substitute background and generates a (mostly) seamless patch, making it easy to remove photobombers.9 Compared to doing the same job manually (a very precise selection followed by deliberate clone-stamping of the surroundings to create a convincing replacement), this system involves minimal human intervention. There’s an argument to be made that this could fall into the category of AI-generated content instead of AI-assisted.
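Magic Eraser relies on learned models, but the basic idea of filling a hole from its surroundings can be sketched with a classical technique. The snippet below is a naive stand-in of my own, not Google’s method: it repeatedly averages each masked pixel with its four neighbours until the surrounding colours diffuse into the hole.

```python
import numpy as np

def diffuse_fill(image, mask, iterations=500):
    """Naively inpaint the masked region by repeatedly replacing each
    masked pixel with the average of its four neighbours, letting the
    surrounding pixels gradually diffuse into the hole."""
    img = image.astype(float).copy()
    img[mask] = img[~mask].mean()  # seed the hole with a neutral value
    for _ in range(iterations):
        neighbours = (
            np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0) +
            np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1)
        ) / 4.0
        img[mask] = neighbours[mask]  # only the hole is allowed to change
    return img

# Punch a hole in a flat grey image, then fill it back in.
image = np.full((32, 32), 128.0)
mask = np.zeros_like(image, dtype=bool)
mask[12:20, 12:20] = True
image[mask] = 255.0  # the "photobomber"
restored = diffuse_fill(image, mask)
```

A real eraser uses a generative model to hallucinate plausible texture rather than smearing neighbours together, which is exactly why its output starts to feel generated rather than merely assisted.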

Outpainting is a similar idea, but instead of replacing parts of the image, the service generates content beyond the original canvas of the image. For instance, DALL-E can take the Mona Lisa (a portrait-orientation picture) and fill in the surroundings and environment to make it landscape-oriented. Although most of the content in the new image is now entirely AI-generated, it could be argued (however weakly) that it’s still AI-assisted since a human had to decide how large the outpainting region should be.

This distinction between computer-assisted and computer-generated is significant because it can affect whether or not the work is copyrightable. Many countries only recognize copyright for work “created by a human,”10 but this still leaves the line between AI-assisted and AI-generated ambiguous. Current U.S. copyright law recognizes the former as owned by the human author, but the latter as ineligible for copyright.11 It is clear that the legal landscape surrounding creativity is shifting and that copyright law around the world will need to adapt.

It may seem that this distinction is a minor one, but it has far-reaching implications for how these services are used commercially. AI is already being used in video games and music,12 13 and hypothetically these works could fall into the public domain since they lack a human author. If such works are deemed unprotectable by copyright law, the companies that sell them stand to lose a lot of money. At present, there is limited incentive for companies to invest in these automated systems while they are unsure whether the resulting creations will be protected by copyright in the future.

Another option lawmakers have is to recognize the programmer of the tool as the copyright holder. For instance, OpenAI would own all copyrights to images generated by DALL-E. Although this option initially looks viable, it doesn’t translate all that well to other analogous cases. For instance, would the copyright of this essay belong to Google because I’m using Google Docs to write it, or to Apple because I’m typing it on my MacBook? Neither of these options makes sense, and current copyright law recognizes that, attributing it to me, the user of these programs and tools. But, as established earlier, this may fall apart when applied to AI-generated content, since the user’s contribution is simply pressing a button or typing a handful of words.

A popular use of these image-generation services is applying an artist’s style to a different context. For instance, you could get pictures of modern-day cityscapes drawn in the style of Van Gogh. However, this introduces its own nuanced issues. Much of the independent art world runs on commissions, where an artist gets paid to create a piece in their unique style. Generative systems introduce the possibility of a new kind of piracy: art style piracy.

Greg Rutkowski is a Polish digital artist with a distinctive fantasy-oriented art style. Artists like Rutkowski are concerned about the ethical implications of open-source programs like Stable Diffusion because their work is often scraped from the internet for training these networks without permission or proper attribution. Rutkowski’s name has been used as a prompt by Stable Diffusion 93,000 times.14 Although Rutkowski’s initial reaction was that it might be a good way to reach new audiences, searching his name online mostly brought back artwork that was not his. This issue is further exacerbated by the fact that artists do not currently have any effective means to opt out of these training sets beyond just emailing the authors of each service and hoping for the best.

A related issue is that present-day systems have a tendency to overfit, regenerating what they learned instead of creating something truly “original.”15 Ask for a picture of a dog riding a dragon, and the model is likely to combine multiple existing photos of dogs and dragons into something that appears original without truly being unique. A telling example is when image-generation services occasionally reproduce the semblance of an artist’s signature.16 The counterpoint is that we don’t consider it “stealing” when human artists study each other’s work, so why should training an AI on that work be any different? To me, the distinction comes from the fact that these networks do not presently synthesize or learn much beyond the surface content. When an artist does a lighting study, the intent isn’t to reproduce the piece so much as to understand the creative decisions the original artist made. Modern systems lack this deeper level of insight.

Accidental piracy isn’t limited to art, either; Microsoft, GitHub, and OpenAI are being sued for violating copyright and open-source licenses in their GitHub Copilot service.17 GitHub Copilot is for code what DALL-E is for images: given prompts written as code comments, it will auto-complete the full implementation of a function. The system was trained on the wide array of open-source projects on GitHub, with the intent of minimizing the duplicated code developers need to write. The issue here is two-fold. First is the aforementioned tendency of modern systems to regurgitate content from their training data (albeit mildly transformed to fit the context). Second (and this is the crux of the lawsuit) is that GitHub Copilot tends not to include the original open-source license that the code was published under.

Open-source licensing is a complicated and intricate system. Open-source software is built on the idea that when many skilled developers look at the same problem, the solution becomes that much more reliable and well-built.18 However, public source code does not mean public domain. The original authors still hold the copyright to their source code, and anyone who wants to use it needs a license to do so. To handle the legalese, there are a handful of standardized open-source licenses that authors can apply to their codebases, but most of them do not permit reproducing code without proper attribution.19 Without that attribution, programmers have little motivation to keep publishing their code on the internet, much as artists have little motivation to keep publishing their works.

I believe the sourcing of training data for AI-driven solutions is one of the major ethical problems in the field today. Training a network requires a positively staggering amount of high-quality data, and it is easy to look at the internet (with its prelabelled images and text) and just scrape the content from there. The fact that computer scientists choose to use copyrighted content in their training data may not be an issue by itself. However, when combined with the fact that these systems tend to reproduce parts of their training examples in their output, it very quickly violates any copyright or license that prevents that sort of use. The lawsuit against GitHub Copilot has the potential to set a legal precedent, not just for Copilot but for all companies that offer generative services built on copyrighted content without attribution.

With all this accidental infringement, it may seem hard to justify continuing to develop these systems. However, their premise is far from flawed. Consider the case of Copilot. Much of the work developers do is repetitive: practically every Python Flask project is structured the same way and begins with the same boilerplate. If a system can automate this boilerplate effectively and unobtrusively, programmers can focus their efforts on solving unsolved problems. Text-generation models like GPT-3 could make games more immersive and engaging, with NPCs talking like actual people instead of repeating the same canned voicelines. Writers can use models to quickly generate creative writing prompts. BuzzFeed published a quiz that uses your responses to generate an original Pokémon with an image-generation AI trained exclusively on Pokémon.20 Creatives have already started using AI art generation to rapidly create moodboards and even concept art for different ideas. There is a lot of productivity to be unlocked with these new tools, but it is important that they are built ethically.
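To make the Flask point concrete, here is the kind of near-universal skeleton a code assistant can reproduce on demand (the route and handler names here are just illustrative, not any project in particular):

```python
from flask import Flask

app = Flask(__name__)

@app.route("/health")
def health():
    # Flask serializes a returned dict into a JSON response
    return {"status": "ok"}

if __name__ == "__main__":
    app.run(debug=True)
```

Almost every Flask tutorial and project starts with a variation of this, which is exactly what makes it such an easy and low-risk target for automation: there is nothing creative left to infringe.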

That isn’t to say companies are unaware of the risks of the services they are creating. OpenAI famously declined to publish its GPT-2 model21 until it had run smaller-scale tests and created safeguards to prevent (or at least mitigate) its use in creating fake news and propaganda. Under the current open beta of GPT-3, OpenAI’s usage policy forbids, among other things, generating content for harmful industries like gambling, misusing personal data, and influencing politics.22 23

Dance Diffusion (an AI that generates music) ensures that its training sets include only copyright-free music.24 This is a very effective way to avoid misappropriating copyrighted work, at least in the interim while these models occasionally include large sections of their training data in their output. The approach is better for Harmonai (its creators), who avoid lawsuits like the one facing GitHub Copilot, and better for songwriters who use Dance Diffusion, who need not worry about the legality of the generated content.

In the art world, many platforms (like Shutterstock and Newgrounds) have begun outright banning AI-generated work. The goal is to deter people from flooding their sites with AI artwork, but it raises an open question: how do you detect AI-generated content and separate it from human work? With today’s image generators it is still fairly straightforward, since there are telltale artifacts like misproportioned objects. But as the Köbis & Mossink study shows, we will soon reach a point where the two are indistinguishable.

Contrary to the other platforms, DeviantArt has decided to embrace AI art generation.25 They recognize that these systems are here to stay and will only get better with time. Instead of trying to detect and block the artwork, they chose to add a way for artists to opt out of having their work be used in training datasets. They also have a dedicated AI art tag, and users can choose to hide all posts with that tag if they do not want to see any AI-generated art. I think this approach is more realistic and might offer a better long-term solution than the never-ending chase of detecting and blocking it.

In conclusion, I think AI-generated content is extremely powerful. It has tangible benefits for many different fields and could easily enable faster, better workflows. At the same time, the way it is being explored right now borders on negligent with respect to intellectual property. As is often the case with new technology, present-day laws aren’t equipped to handle all the new gray areas its usage has created. In particular, I believe the Copilot lawsuit will have a profound impact on how lawmakers and companies perceive generative AI.

Footnotes

  1. Anyoha, R. (2017, August 28). The History of Artificial Intelligence. Science in the News. https://sitn.hms.harvard.edu/flash/2017/history-artificial-intelligence/

  2. Statista. (2021, July 22). Artificial intelligence (AI) startup funding worldwide from 2011 to 2021 (in billion U.S. dollars), by quarter [Graph]. In Statista. Retrieved November 24, 2022, from https://www.statista.com/statistics/943151/ai-funding-worldwide-by-quarter/

  3. Growcoot, M. (2022, October 21). The Best AI Image Generators in 2022. PetaPixel. https://petapixel.com/best-ai-image-generators/

  4. Google Developers. (2022, July 18). Overview of GAN Structure | Machine Learning. Google Developers. Retrieved November 24, 2022, from https://developers.google.com/machine-learning/gan/gan_structure

  5. GitHub. (2021). GitHub Copilot · Your AI pair programmer · GitHub. GitHub. Retrieved November 15, 2022, from https://github.com/features/copilot

  6. Köbis, N., & Mossink, L. D. (2021, January). Artificial intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry. Computers in Human Behavior, 114. https://doi.org/10.1016/j.chb.2020.106553

  7. Wood, J. (2012). Unskilled and Unaware of It [Microwave Bytes Back]. IEEE Microwave Magazine, 13(1), 186–187. https://ieeexplore.ieee.org/document/6132260

  8. Robertson, A. (2022, April 6). OpenAI’s DALL-E AI image generator can now edit pictures, too. The Verge. https://www.theverge.com/2022/4/6/23012123/openai-clip-dalle-2-ai-text-to-image-generator-testing

  9. Wan, J. (2022, October 18). How to use Magic Eraser on the Google Pixel. ZDNET. https://www.zdnet.com/article/how-to-use-magic-eraser-on-the-google-pixel/

  10. Guadamuz, A. (2017). Artificial intelligence and copyright. WIPO Magazine, 2017(5). https://www.wipo.int/wipo_magazine/en/2017/05/article_0003.html

  11. Hristov, K. (2017). Artificial intelligence and the copyright dilemma. IDEA: The Journal of the Franklin Pierce Center for Intellectual Property, 57(3), 431–454. https://heinonline.org/HOL/Page?handle=hein.journals/idea57&id=449&collection=journals&index=

  12. Sturm, B. L. T., Iglesias, M., Ben-Tal, O., Miron, M., & Gómez, E. (2019). Artificial Intelligence and Music: Open Questions of Copyright Law and Engineering Praxis. Arts, 8(3), 115. https://www.mdpi.com/2076-0752/8/3/115

  13. Wu, A. J. (1997). From Video Games to Artificial Intelligence: Assigning Copyright Ownership to Works Generated by Increasingly Sophisticated Computer Programs. AIPLA Quarterly Journal, 25(1), 131–180. https://heinonline.org/HOL/Page?handle=hein.journals/aiplaqj25&id=139&collection=journals&index=

  14. Heikkilä, M. (2022, September 16). This artist is dominating AI-generated art. And he’s not happy about it. MIT Technology Review. https://www.technologyreview.com/2022/09/16/1059598/this-artist-is-dominating-ai-generated-art-and-hes-not-happy-about-it/

  15. Gray, K. (2022, October 22). How Do Game Developers And Artists Feel About The Rise Of AI Art? Nintendo Life. https://www.nintendolife.com/features/how-do-game-developers-and-artists-feel-about-the-rise-of-ai-art

  16. Palmer, RJ [@arvalis]. (2022, August 13). What makes this AI different is that it’s explicitly trained on current working artists. [Tweet]. Twitter. https://twitter.com/arvalis/status/1558623546879778816

  17. Vincent, J. (2022, November 8). The lawsuit that could rewrite the rules of AI copyright. The Verge. https://www.theverge.com/2022/11/8/23446821/microsoft-openai-github-copilot-class-action-lawsuit-ai-copyright-violation-training-data

  18. Pearson, H. E. (2000, June). Open source licences: Open source — the death of proprietary systems? Computer Law & Security Review, 16(3), 151–159. https://www.sciencedirect.com/science/article/pii/S0267364900889062

  19. Open Source Initiative. (n.d.). Licenses & Standards. Open Source Initiative. Retrieved November 25, 2022, from https://opensource.org/licenses

  20. Stopera, D., & Woolf, M. (2022, January 13). Answer These 5 Questions And We’ll Make You A Completely Unique AI Generated Fake Pokémon. BuzzFeed. https://www.buzzfeed.com/daves4/pokemon-ai-quiiz

  21. Vincent, J. (2019, November 7). OpenAI has published the text-generating AI it said was too dangerous to share. The Verge. https://www.theverge.com/2019/11/7/20953040/openai-text-generation-ai-gpt-2-full-model-release-1-5b-parameters

  22. OpenAI API. (n.d.). OpenAI API. Retrieved November 15, 2022, from https://beta.openai.com/docs/usage-policies/use-case-policy

  23. Chatterjee, P. (2022, March 16). OpenAI plans to tackle GPT-3’s safety issues. Analytics India Magazine. https://analyticsindiamag.com/finally-openai-plans-to-tackle-gpt-3s-safety-issues/

  24. Davies, T. (2022, September 26). Harmonai’s Dance Diffusion, Open-Source AI Audio Generation Tool For Music Producers. Weights & Biases. https://wandb.ai/wandb_gen/audio/reports/Harmonai-s-Dance-Diffusion-Open-Source-AI-Audio-Generation-Tool-For-Music-Producers—VmlldzoyNjkwOTM1

  25. Wiggers, K. (2022, November 11). DeviantArt provides a way for artists to opt out of AI art generators. TechCrunch. https://techcrunch.com/2022/11/11/deviantart-provides-a-way-for-artists-to-opt-out-of-ai-art-generators/