Testing DALL-E by creating single panel cartoons

I tested DALL-E for a specific real-world use case: producing single-panel cartoons. My testing uncovered several promising aspects, some problems that need to be addressed, and an interesting testing technique for DALL-E and ChatGPT-like applications. I tried summarizing my findings in a blog post like an engineer would, but that turned out to be a bit too boring to share. Instead, I am going to walk you through my findings along with one or two cartoons that illustrate each point. That way, at least a few people will read this post 😛

Summary

Seemingly near, yet so far. I think DALL-E is at the point where a hobbyist cartoonist with image editing software can produce cartoons quickly. However, we are still far away from replacing professional cartoonists. See the section “The finer points” to understand the gap between hobbyist and professional cartoonists.

So near, yet so far!
Everything appears so near and yet we are so far!

The above cartoon was a “mistake” that I have repurposed. I wanted a scientist standing and a chef looking into the telescope, for a different caption. However, DALL-E had other plans. I think the nice drawing coupled with the absurd chef hats does make the point of this section – seemingly near, yet so far!

The setup for my experiments

Engineers and testers read this blog. This section is to please them. If you are here just for DALL-E, skip this section.

Kangaroo wearing a raincoat standing next to a giant teapot.
A world of infinite potential, and we still favour the practical over the intriguing.

My testing is limited to a field that I sort of know well – cartooning. Apart from running Qxf2, I like drawing cartoons … especially when I am angry. A couple of decades ago, I was a cartoonist on the now defunct Sulekha.com. I landed the column because it was the early days of “Internet portals” and digital art, so their editor let me get away with it. While I definitely sucked at it, at least I was on the path to being considered a professional.

I limited my tests to produce only single panel cartoons. Why? Because comic strips need continuity between images and are going to be hard for DALL-E at this stage. I truly wanted to discover if DALL-E was good enough to be used somewhere within the world of cartooning.

As part of the test, I set out to produce 3 cartoons every day within 60 minutes. I did this for 2 weeks straight.

My process, for the most part, involved the following:
1. Ask ChatGPT for 3 random words
2. Use these 3 words to come up with cartoon ideas.
3. Write the caption
4. Clean the caption and make it tighter using ChatGPT
5. Use DALL-E to create the images
5a. Create prompt
5b. Iterate until I am satisfied
6. Select the best image
7. Spend less than 2 minutes post-processing the selected image

In this post, I am sharing 16 cartoons produced by this process that fit the narrative of this post. I do have more cartoons but there would be little point in sharing those. I have not shared any “total failures” in this post. If an image was bad, I just moved on to trying to get a better image.

Side note: While doing step 1, I noticed ChatGPT has a bias towards certain words. But that is for another post.

Speech bubbles and text are a struggle

DALL-E struggles with speech bubbles and text. The text almost always has typos. The speech bubbles are rarely placed in the right spot and are usually not the right size. I have seen some exceptions (see the planned obsolescence cartoon below), but they are few and far between.

Man threatening a horse and an elephant to stand diagonally from him
I am truly sorry for this, chess fans!

My prompt asked DALL-E to make this cartoon in the style of the Amar Chitra Katha comics that I grew up reading. DALL-E did a good job of getting the drawing style right.

Working around speech bubbles and text

One way to work around the poor spelling is to generate the image first and then edit it to add the text, as in the cartoon below. As you can see, I did not have the right spacing to place the text properly. But with enough retries, you can get an image with enough space for your text.

We come in peas.
The dangers of homophones

Homophones are words that sound alike but are spelt differently, like “peas” and “peace”. Visualizing a surreal cartoon like this is pretty hard even for a human, yet DALL-E did a surprisingly decent job here. It did have some difficulty drawing peas, but after enough prompt magic, it got something resembling peas. BTW, I find this cartoon hilarious ¯\_(ツ)_/¯. The fact that the human race could get wiped out because of a quirk in English makes it so funny for me.

Mistakes in the drawing

DALL-E drawings often contain simple mistakes. In the cartoon below, you can see the signpost is not one vertical line but is broken into two vertical lines. Similarly, the background has random text that is not Hindi but looks similar to it. I also had to fix the word “Status” on the signpost in post-processing.

Voter
The Indian Voter.

I asked for a cartoon in the style of RK Laxman, one of my favourite cartoonists. The result was not a good match, but I still like the way this drawing looks.

Random insertions without understanding context

Another thing that spoils cartoons is that DALL-E likes to randomly insert things that are not in its prompt. For example, in both cartoons in this section, DALL-E kept inserting humans, which takes away from the intended humour. There is no easy way to work around this limitation. I kept asking for new images and finally chose one that was OK.

Underwater mermaid office with bicycles
We’ve got bikes everywhere on campus – it’s just like being at Google!

I (secretly) like laughing at startups that do something because “Google does it”. This drawing was produced after some 6-7 attempts.

Robots standing in line to buy ice cream
Little did they know we’d outdo them at consumerism too.

I really like the way this cartoon came out. I had asked for it to be in the style of The New Yorker, and DALL-E gets that style really well. I also like the angle and perspective within this drawing. There are minor mistakes, such as one of the robots in line already eating an ice cream.

DALL-E understands cartoon tropes

I was pleasantly surprised that DALL-E understood cartoon tropes – death, man stranded on an island, etc.

At that moment, Aria realized that planned obsolescence was not a conspiracy theory.

It is a miracle that DALL-E got the spelling of the words in the speech bubble right!!

Prompting is a skill

DALL-E especially struggles when objects in an image have some sort of relationship. It often gets the relationship wrong. If I asked for a “dinosaur eating a pizza and riding a skateboard”, DALL-E would produce a skateboard painted with a picture of a dinosaur eating a pizza. I am sure that with better prompts, I could work around this problem.

Evolution to jealousy
Jealousy

I wanted to show us evolving into “green-eyed monsters”, i.e., jealous beings, because of social media. However, my prompt was probably not good enough, so DALL-E seems to have interpreted it as a “green monster” or something. Also notice that it has randomly inserted numbers. And this was the best image from the dozen-plus prompts I tried for this cartoon idea.

Perspectives are a challenge

DALL-E thinks it knows perspective but often just produces whatever perspective it likes. Several times, it gave me one-point perspective when I asked for two-point perspective. But you can work around this by simply prompting it again.

1-point perspective drawing of a CEO walking down a corridor
The CEO strolled down the corridor, ready to enlighten folks on the virtues of multiple perspectives.

An engineering drawing joke. For those who don’t get the cartoon, this image was drawn in 1-point perspective. You know, the kind of perspective most “open-minded” CEOs like me tend to have 😉

You can describe your camera angle

Another nice aspect of using DALL-E is that you can specify the camera angle for a picture. For the most part, DALL-E likes side views in two-point perspective, but you can change that.

Pirate and parrot looking at a lighthouse projecting a giant parrot
Looks like Marvel’s scraping the barrel, eh, Squawkbeard?

ChatGPT told me Squawkbeard would be a funny name for a pirate’s parrot. I produced this image without specifying a style, and asked for the camera to be behind the pirate.

Mimicking artists

DALL-E is a mixed bag at mimicking artists. I tried 20+ of my favourite cartoonists. It copied some exceedingly well, while others were total misses.

Man standing on a pyramid of people
I reached the top all by myself!

The cartoon was supposed to be in XKCD style – clearly DALL-E failed. But the drawing style does remind me of another creator. I do not remember their name though.

Dinosaurs in a Manhattan rooftop restaurant.
There is old money. And then there is really old money.

This one was meant to be in the style of The New Yorker, and DALL-E did well. The New Yorker style seems to be something DALL-E handles consistently well.

Stitch multiple images

I also discovered that simple images can be decomposed into parts that you request separately. Below is a picture composed of a painting and its frame. I simply asked DALL-E for each part separately and then stitched them together.

Prehistoric painting of a buffalo being shot by arrows
Buffalo being downvoted

I do want to improve this drawing over time. The joke has not come out well at all. I wanted to show a prehistoric drawing being misinterpreted with a very modern caption. But this failure of mine is a side-effect of trying to produce work on a deadline.

Embrace the serendipity

One point I want to emphasize is that the “mistakes” DALL-E makes can often be the starting point for new cartoon ideas. I did enjoy that. It was a new aspect of cartooning for me.

Man looking at a cave drawing that looks like the wimpy kid character
Darwin stumbles upon his lesser known theory: ‘Survival of the Wimpiest’

The image for this cartoon came before the idea. I had asked DALL-E for something else, and it gave me this image. Almost immediately, I thought of Darwin and “Diary of a Wimpy Kid”, which led to the caption.

The finer points

There are a couple of aspects that separate a professional cartoonist from a hobbyist like me – point of focus and timing. A good drawing moves the reader’s attention to where the cartoonist intends. Good cartoonists are always aware of where they want the viewer’s focus to be. Similarly, a good cartoon acts as a keyframe that makes the viewer imagine what happened before and/or after what is shown in the image. DALL-E sucks at both.

For example, in the cartoon above, I would have drawn Darwin literally stumbling with a light in hand. The facial expression would be one of realization. The light in the cave would only illuminate some portions of Darwin’s face and the drawing of the wimpy kid. There is no way you can get today’s DALL-E to draw that.

But I am not going to be too critical of DALL-E in these aspects. After all, as a human, I cannot get these aspects right either.

Man in art gallery full of mediocre paintings
Art got easier to make. Everyone made “Art”.

I feel this way about software too. As software has become easier to make, the amount of poor quality software being produced is too damn high!

BTW, I spent some time deciding on the style of this drawing. I finally settled on Cyanide and Happiness because it felt fitting. There is some happiness with how much AI has progressed and some – well, you know!


A testing technique as a takeaway

I had fun taking DALL-E for a spin. This is admittedly an extremely quirky piece of testing. You might not have noticed, but I did cover several aspects of quality that a single-panel cartoonist might use to evaluate DALL-E’s capabilities. Totally bragging here – it takes skill to simplify the tests and make them look so natural.

And there is potential in the testing technique I used. Generalized, it looks like this:
1. Identify a solid use case.
2. Get someone just below professional level in that field.
3. Let that person further constrain the problem so that the AI has a chance to succeed.
4. Let that person work with the AI for a sufficiently long period to produce their work, making observations along the way.
5. Finally, have the AI engineers generalize the findings and then work on making improvements.

As a nice side-effect, you might even discover other interesting tangents, like the bias I found ChatGPT had towards certain words when I used it to get starting points for cartoon ideas.

Note: Why one level below professional? Because, at this early stage, they are better judges of what counts as “useful progress” than professionals. A professional cartoonist would probably have rejected DALL-E since its timing and point of focus are just not good enough. Someone like me could look past that and still persist.


Hire technical testers from Qxf2

This was a whimsical piece of testing. But most of the testing Qxf2 does is pretty serious and technical. We are good testers, technical engineers and interesting people. Our experience extends to testing software that most testers normally cannot. We have tested AI/ML models, custom RAG implementations, data pipelines, infrastructure as code, etc. If you want good testing integrated into your team’s workflow, contact us.

