Having hit a double milestone recently for this site (six months and over 100,000 human-written words of content about the ethics of comic books!), I thought it was time for something different: examining the promise and peril of AI through my experience of researching and writing about comic book ethics.
The image above is from Rise of the Powers of X Vol 1, issue #5, 2024, written by Kieron Gillen, art by Luciano Vecchio, colors by David Curiel, and lettering by Clayton Cowles. It depicts Enigma, a human elevated to AI godhood (and having a very bad day). Amusingly, it also captures my frustration with generative AI – which relates back to my impetus in creating this site.
But first, a mea culpa – when I first launched the site, I used a bunch of AI-generated images of superhero-like characters on the headers of some of the posts. My intention was to ridicule the use of generative AI. But I realized belatedly that wasn’t obvious, as my explanation for these images was a bit buried on my About Site page. My thanks to Jed MacKay for thoughtfully pointing this out.
I removed those pictures from the site but will reintroduce and intersperse them down below with a direct explanation of why I found the AI mistakes amusing. Here’s one of my favorites to start us off, in response to a specific prompt of a “Viking Thor reading a medieval philosophy book” … I didn’t know they also carried cell phones back then!

Generative AI Background
UPDATE: I’ve provided some updated experience with the latest GenAI tools in my 1st Anniversary post. Spoiler alert: things have gotten worse than what I describe below.
Of course, artificial intelligence (AI) can mean a lot of very different things. At its core, it is simply machine-learning approaches that try to replicate (and potentially expand) on human cognitive abilities. I realize it may not always seem this way, but fundamentally AI is “good” at exactly the same sorts of things that human brains are good at – pattern recognition, inference-drawing, and learning through prediction errors.
The promise of AI has always been to do an as-good or better job of this than humans – by removing human cognitive biases from their training. As I explain in the bias section of my Ethics 101 page, human cognition is subject to all sorts of biases. Some of these we can try and correct for, but it is challenging as bias is a fundamental component of how our brains evolved (in other words, it is a “feature”, not a “bug” – although in some cases it does seem to be a byproduct). As a research field, machine-learning was initially all about overcoming our inherent biological limitations – combined with enhanced processing power and speed.
Sadly, it hasn’t turned out that way so far.
Most peoples’ experience with AI is through the current crop of commercial generative AI online tools (the “G” in ChatGPT, for example). These chatbots are based on large language models (LLMs) that can generate human-like text in response to user prompts. But where do these large training datasets come from? Existing human-written text – mainly scraped off the internet without creator consent (so, books, articles, websites, discussion forums, etc.). These chatbots are then subsequently “refined” by human programmers – ostensibly to “improve their conversational abilities”. In practice, this means two things – trying to prevent the chatbot from providing harmful information to users (for example, how to hurt yourself or others) while also enhancing the likelihood that users will continue to interact with the tool.
As an aside, chatbots don’t “learn” language through words like we do. They instead rely on smaller units (“tokens”), which are words, fragments of words, and symbols. These are then run through statistical language models to predict the next most likely token in sequence. In this sense, they are really just plausibility engines – they don’t understand the content they have been trained on, they are just predicting the next most likely token, based on experience. AI image generation tools work much the same way, relying typically on raster or vector data from existing images and illustrations available online in their training.
It’s not hard to see how this brought us to a bad place. These models are trained on human-biased material, try to fit it to a simplistic model that doesn’t reflect the complexity of the source material, and then refine the result to further keep humans engaged (which means, pander to our preferences and biases!). As a result, two of the common features of generative AI are “hallucinations” (where it provides non-factual responses to a query) and an almost sycophantic communication style (that is, it is always flattering the questioner).
Neither of these are accidents. The flattery is part of the programming to make them more palatable to humans. And hallucinations are actually a core part of how chatbot’s generative processes work. To quote Open AI’s co-founder Andrej Karpathy: “Hallucination is not a bug, it is LLM’s greatest feature”. Again, these tools are really just plausible prediction machines, not factfully-true evidence providers. They lack real-world understanding and are optimized for coherence over accuracy (as I experienced in my testing down below). This combination drastically lowers the value proposition of commonly available AI chatbots and image generation tools.
From a simple prompt of “Daredevil holding up a light bulb” … I didn’t expect it to be magically illuminated!

The evidence is also clear that this factually-wrong hallucination problem is getting worse on successive iterations of these tools. This makes intuitive sense when you consider that each model is based on a training dataset that has a cut-off date some time in the recent past. As more artificially generated content is posted to the internet from all such tools (often masquerading as human content), it is now being incorporated into subsequent iterations (i.e., the bias problem is getting worse). And the increasingly desperate attempts by human programmers to ostensibly “correct” these biases and errors is in some cases only making it worse.
It’s important to note that just because something is not done well doesn’t mean that it can’t still displace high quality work. I feel particularly bad for illustrators, as commercial enterprises appear to be increasingly relying on error-prone generative AI tools to create imagery for their products or campaigns (that, ironically, were all trained on skilled human-drawn source material originally). And the same is potentially true for all creative industries where AI is now doing seemingly passable-enough work (but actually isn’t).
My experience with AI and why I created this site
As I explained on my About Me page, my interest in exploring moral philosophy and ethics through comic book stories is long-standing. Having thought through the ethics of comic characters over the years, I wondered if the current generative AI tools could provide good summaries. And so, circa late 2024, I asked a bunch of chatbots exactly that.
Oh, and here’s what happened when I asked an AI image generation tool for “Clea reading a magic book” … it was definitely magical alright:

When you scratch below the surface, the chatbot responses on the ethics of comic characters were pretty bad. Oh, the chatbots all gave very well written responses, and they seemed to make sense, as they made arguments for and against the application of different normative ethics theories to a given character. But chatbot responses often reminded me of the opening to Charles Dickens’ Tale of Two Cities: “It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, …”. Dickens was trying to make a point – but modern AI can’t seem to help itself. Given the vast volume of often contradictory information on comics (and ethics), the responses would invariability ping-pong back and forth, trying to support every possible ethical theory for a given character.
That said, AI chatbots did fairly well when describing a character who has classic virtue ethics (and even sometimes brought up care ethics when relevant, which impressed me). It struggled more with deontology, as they only seemed to refer to Kantian ethics (often calling it that, but sometimes using the broader deontology term while still only referring to the narrower Kantian sense). Consequentialism (as a term) was rarely if ever used, as most AI chatbots typically limited themselves to utilitarianism alone (which is leaving a lot of ground uncovered, especially for all the non-utilitarian but still consequentialist villains and anti-heroes).
On the rare occasions where an AI chatbot got off the fence and articulated a clear preference for one normative ethics theory over the others, I often disagreed with its conclusions. For example, one of the chatbots told me that Professor X (Charles Xavier) was primarily deontological in his ethics. You could make an argument for that position from the 1960s and 70s comics (although even then it is debatable). But it is completely untenable to claim that for one of the most utilitarian characters these last 40 years or so. I suspect in this case the chatbot’s LLM was biased by sources that were largely historical and not modern (as a lot of more recent comic content is hidden behind paywalls now, and unreadable during AI pre-training). As the training material for all chatbots are not disclosed, it’s impossible to know for certain.
The poor showing of AI chatbots overall provided an impetus for me to actually sit down and begin to create this site – and to do my best to provide accurate information on both ethics and comics. It should go without saying, but I don’t use any AI-generated content in the writing of this site.
In the interest of fairness: I have continued to probe AI chatbots more recently and have found Anthropic’s Claude’s latest build (Sonnet 4) has improved considerably – and is the best of the bunch. I found it accurately proposed some less-common ethical theories that I had also flagged in my character overviews. Note that I specifically block AI chatbots from scrolling my site, and my reviews were only published after the pre-training cut-off date for Sonnet 4 (prior to March 2025, I blocked all robots, including search engine indexers, while I built this site). Sonnet 4 still gets muddled sometimes about how outcomes are considered in both consequentialism and deontology, but it now does a passable job for some characters.
When I started planning the site, I wondered if AI image generators would do a better job at creating images than the chatbots did for ethics. I would describe the appearance of a character and ask them to be posed a certain way (e.g., reading a book, or sitting like Rodin’s The Thinker). I soon got ridiculous examples, as you can see scattered across this page (and below, for the Vision … who was obviously thinking of something unexpected).

Overall, these images are about as accurate as the ethics descriptions the chatbots did of the characters. At a cursory glance, they may seem to describe them reasonably well. But when you look at the details, there is a lot of nonsense there.
For these reasons, and in support of comic artists, I have removed all AI-generated images on my character ethics posts. There are still some older generic and philosopher AI images on some of my background pages (including ones that came with template I purchased) that I am looking to replace with non-AI open source images.
Use of AI as a collaborative/critical tool
Online proponents of AI chatbots often claim that they don’t use them in a passive way looking for answers, but rather as a collaborative tool providing critical feedback in their work. That sounds pretty scary to me, given they don’t understand the material they are imperfectly mimicking. But I tried that too – by running my posted character ethics overviews through two of the best AI chatbots for technical reviews (Anthropic’s Claude and Open AI’s ChatGPT). And I specifically asked them to critique my posted analyses.
As an aside, I had higher hopes here, given the reported abilities of these tools to analyze and condense technical documents into briefer summaries while maintaining appropriate context. I also wished it would work. In my working career, I commonly developed large, complex, and innovative research funding programs to strategically address unmet human health and biomedical research needs. These programs had to be extensively reviewed by experts within and outside my agency during the design phase. I was forced to defend everything from the explicit need, the science, the researcher capacity, the potential impacts, the mechanisms proposed, the tools used, the feasibility, the risks, etc. I got pretty good at anticipating many of the common objections (and proactively building solutions into my designs). But there was incredible value in having my presumptions challenged by experts in each of the domains above. Exhausting at times, but it certainly led to better initiatives being built.
But despite my asking these chatbots for critiques, they both inevitably started out their responses with high praise for the quality of my work, both in terms of the writing and the ethical analysis. I am deeply suspicious of their flattery, given these chatbots’ well-known sycophantic nature. I suspect I would have gotten the same over-the-top praise to any generic input.
A common next step for these chatbots was to zero-in on any areas where I provided a possible counter argument to my main thesis. They would praise me for considering these counter-factuals (despite many of them being obvious, but whatever) and then go on to suggest that I didn’t provide sufficient arguments against them. That could be a fair criticism. My analyses are quite lengthy, and while I try to be fair, I also don’t want to spend any more time than I need to on counter-claims. But as these AI tools kept consistently offering up these same sorts of critiques on each of my posts, it looks to me like this is a programming trick where the chatbots glom on to something easy to spot (since I had already made the point for them in my own posts!).

So, were there any apparently new insights into my writing? Yes, actually. Typically, these two tools did move on to finish their critiques with suggesting novel interpretations that I hadn’t considered. Many of the ones Claude came up with gave me pause initially, as they seemed quite intelligent and well considered on the surface – offering a perspective that had never occurred to me. So, had I found a suitable replacement for my previous work experience of receiving targeted feedback from content experts challenging my assumptions?
No. On closer consideration, most of the AI chatbot suggestions were actually not well considered or reasonable after all (although they certainly seemed that way initially). A good example is what came up for my post on the ethics of the very well-established character, Spider-Man (Peter Parker). While reflecting back to me my observation of the inconsistent elements of the character’s ethics over the years (acknowledging consequentialist, deontological, and virtue ethics elements), Claude offered a seemingly novel thought: could this actually be why the character is so popular? It went on to expound that human beings are not entirely consistent in how they make moral judgments (something I certainly agree with, as I wrote on my Ethics 101 page). It then explicitly suggested that the internal conflicts within Peter between these modes of ethical thinking contributed to making him seem more “real” – and thus relatable – to comic readers.
I will admit, this seemingly complex reasoning stopped me cold for a moment.
But upon thinking about it further, I realized that the chatbot had committed a classic logical fallacy (specifically, the fallacy of composition) by not understanding the underlying structure of the source material (despite my having described it in the post). The chatbot’s argument might well have been true if Spider-Man had been written by a single author who had incorporated all three classic ethical frameworks into her stories. But the inconsistency comes from different authors, in different time periods, each focusing on a specific single ethical framework in their respective stories. In essence, the different ethical frameworks come up sequentially, not concurrently, in the composition of Spider-Man stories. You wouldn’t come away from all that thinking you just read a very “real” character – you would think you have come across a very inconsistent character!
It is just like the common error I used see when I taught statistics – applying the wrong kind of statistical test that ignores the composition of your data (or worse, assumes a pattern that isn’t there). Like having two independent groups but applying a paired Student’s t-test, leading to a spurious positive finding. As an aside, I’ve always been a big stickler on inspecting one’s data, as you will see in these two respective analyses from my earlier whiskyanalysis.com and flashlightreviews.ca review sites. Both sites also have articles linking you to a fun example of why you need to visualize data before applying statistical tests: Anscombe’s Quartet.
Of course, none of that is an option with AI chatbots – you don’t really get to look under the hood to see what they are doing. But the evidence is overwhelming (and increasing) that you can’t trust them.
Is there any value to AI?
Of course. Getting back to my original point: when they are trained on accurate and unbiased data, and optimized to prioritize accuracy over plausibility (or flattery). Imagine what it could potentially achieve in scientific or engineering fields, applying the core cognitive abilities of humans on much more massive datasets than our minds can handle?
It is really just the commercially available chatbots that give AI a bad name. There are in fact no end of unbiased AI tools in development for specific research fields. Brain imaging is always the first example that I think of – the overwhelming volume and complexity makes it very hard for us understand how the brain works (even with our tools to simplify the analysis). Personalized medicine is another good example, including identifying who is the most likely to benefit from a given treatment. Climate science, with its inordinate complexity to unravel. Protein visualization and design. The list goes on where there is so much potential for machine-learning approaches to exceed our human cognitive limitations.
I could even see the benefit of a carefully constructed, evidence-based mental health chatbot, given our innate tendency to anthropomorphize chat responses. This could be extremely helpful in cases where it is challenging for humans to provide treatment (for disorders that manifest with opposition to the therapist, or purposefully trying to manipulate and deceive them). But the risks are higher here too, when dealing with vulnerable populations, so serious safeguards would need to be built in.
Which reminds me of what happened when I asked for an AI image of “Daredevil on a psychiatrist’s couch” for my Daredevil Ethics page … certainly not the pose I was expecting:

It is a shame that biased commercial AI chatbots have given the field such a bad name – especially when you consider the huge environmental costs their operational entails – all to give you potentially misleading or outright inaccurate information. Think of the good that could be done with those existing massive computing resources instead (not to mention the computing cycles and power wasted mining bitcoin).
Alright, time to get off my soap box. To sum up, all of this reminds of a great quote I once heard:
The greatest enemy of knowledge is not ignorance; it is the illusion of knowledge.
That feels like exactly where we have landed with current commercially available generative AI. As an aside, that quote is an example of the false attribution fallacy. I have seen it attributed to Carl Sagan, Isaac Asimov and Stephen Hawking over the years, but it was actually the US historian Daniel J. Boorstin who phrased it that way (although to be fair, all three famous scientists did say similar things about the perils of ignorance).
Ethics is hard. Thinking about ethics is hard. Applying ethical thinking to the vast trove of comic book stories is hard (but fun!). After all, why were comics created if not to tell morality tales? Everywhere, people of conscience are carefully crafting comic stories with an intent to explore different ethical perspectives, with all the quandaries and conundrums than entails. There is value in carefully examining those stories, to see where they may take us.
I hope you enjoy this human-crafted site.
UPDATE: My view of current LLM chatbots has soured even further since this post was written, as there is now evidence that the latest versions are more inaccurate than ever – while expressing even greater confidence in their results. See my 1st Anniversary post for details.
See my Glossary post for a list of the key philosophical concepts and related links on this site.



That was interesting, thanks.
I don’t have much experience with AI, but I find the AI answers at the top of google are usually pretty good. Was that one of the ones you tested?
Ah, that’s a little different. The Google Gemini responses at the top of search queries still report content from actual factual search results in its database (with links). That’s different from most chatbots (built exclusively from LLMs) that work only from a “fuzzy” recollection of their training documents.
So, with commonly available information, it does a good job of giving you the “correct” answer. But when it can’t find that, it falls back into LLM hallucination mode, and gives you a plausible answer. I recently ran a search looking for the sequence on my dishwasher to disable the beeping feature on button press (it had been off, somehow accidentally turned back on). Google didn’t know my model number – but the Gemini response at the top quoted a common sequence for that manufacturer (from a link) as a direct response to my request, specifically stating this was for my model (in bold). But it wasn’t. And when I followed the link, my model was not one of the ones listed (and it didn’t work). But Gemini had expressed fake confidence that it was for my model specifically (because it had no knowledge on my model number whatsoever).
As long as Google’s search index returns high-quality results, the Gemini results are good. But when it has rely on its LLM, it doesn’t do a great job. From testing I’ve seen online, Microsoft’s Co-Pilot, Google’s Gemini, and Meta AI consistently underperform next to ChatGPT and Claude.
Great piece, Eric. You really nailed what’s been bugging me. Like a lot of us who have been watching this whole AI craze unfold from the sidelines, your experience mirrors what I’ve been seeing. The promises are always huge, but when you actually dig into what these chatbots produce, it’s mostly dressed-up BS that sounds impressive but falls apart under scrutiny.
I particularly liked your point about the “illusion of knowledge” – that’s exactly what we’re dealing with. These tools give people confidence in answers that are fundamentally unreliable. I’ve seen colleagues relying on ChatGPT for techincal summaries, and half the time the citations are made up or the conclusions are completely backwards.
The environmental angle really gets me too. We’re burning through massive amounts of electricity to run these glorified autocomplete systems that can’t even get basic facts right! Meanwhile, there are legitimate scientific applications where AI could actually make a difference – protein folding, climate modeling, medical diagnostics – but all the money and attention is going to these consumer chatbots that primarily serve to make people lazier thinkers.
Keep up the good work on the site. It’s refreshing to see someone actually doing the hard work of real analysis instead of just asking a machine to do it for them.
P.S.: I’ve noticed you’ve added a section on DC comics, and I’ve spotted an Absolute Wonder Woman image in your category section. Does this mean you will be branching out to DC shortly?
Thanks, it really is shocking how many errors current chatbots make.
And yes, I did add a section on DC comics to my overview page, to describe the history of the Multiverse there (and how it differs from Marvel). I am nearing the end of the detailed character overviews I was planning, and will soon start focusing more on individuals creators and specific stories. These will include DC characters, but also Image comics and other independent, creator-owned series.
And yes, the first one up will be Absolute Wonder Woman by Kelly Thompson, Hayden Sherman, Jordie Bellaire, and Becca Carey. You have good eyes to have spotted that category icon! I hope to have it up in a few days.