ChatGPT’s camera functionality has always interested me. The long-term vision for AI vision is much grander (think robot eyes), but the fact that right now we have AI assistants that can look through your camera and perceive the world around you is already a massive step for technology.
With this interest, I wanted to expand into handwriting. You’ll see mine is trash, to put it lightly, so I wanted to see if ChatGPT could (A) interpret it and (B) execute the prompts written in it.
It’s time to “write” history with this experiment.
Clear Writing:
We started strong with no prompt and clear writing. ChatGPT can read clearly printed handwriting, as you can see above. I figured this would be the case, so no surprises here, although it did count thumbs as fingers and wanted to expand on that.
Messy Writing:
Can you read the note above? Maybe, maybe not. What I find interesting is that ChatGPT only got “Tom Brady’s” name wrong, thinking it looks like “Pam Beesly”… Which I mean, if you squint your eyes, you can definitely see!
(hmmm… am I unconsciously thinking about Pam Beesly?)
On a serious note, writing isn’t always clear and perfect, and if your writing is intended to go into an LLM at a later date, it is crucial that the LLM reads it correctly. This matters in studying especially. You don’t want to be studying the wrong material!
Upside Down Writing:
I imagine this functionality would be used in a Tom Cruise-esque situation. Perhaps hanging upside down from the Burj Khalifa with the codes to a nuclear reactor in his hands. There is no real purpose or point here, but it is worth noting that ChatGPT got it right. Every single word.
There is a purpose to the next directionally challenged experiment, however…
Sideways Writing:
This doesn’t happen all the time, but it does happen: uploading landscape pictures as vertical ones. Or, for a reason that you may never know, your picture just comes out sideways??? As a result, reading the text is an absolute horror show of neck movement. You become an amateur contortionist, and then an entry-level Apple Genius as you press the landscape mode button.
The GPT did perfectly transcribe the message and the prompt here. We can guess how the GPT did it (computer magic), but I wanted to ask anyway to see the explanation, see below:
“Sure! When you upload a sideways image with text, I use a vision model that automatically detects and interprets the text, regardless of orientation. It processes the image as if it were upright, applies optical character recognition (OCR) to extract the words—even handwritten ones—and then restructures them into normal, readable sentences.”
- His Holiness, ChatGPT
Computer magic it is.
In all seriousness, that explanation does give some insight into how it works. While we humans would physically tilt our heads to read that, the computer does the rotation in its “mind”. It’s an interesting difference in how we reach the same conclusion. Really though, each party takes its easiest path: humans could use the GPT’s method (our minds), but we have the means not to. Our necks.
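To make the “rotate it in your mind” idea a bit more concrete, here is a toy sketch in Python. It assumes the “image” is just a small grid of characters, and the helper names `rotate_clockwise` and `rotate_counterclockwise` are made up for this illustration. It is not how ChatGPT’s vision model actually works; it only shows that a sideways picture holds the same information as an upright one.

```python
def rotate_clockwise(grid):
    """Rotate a list of equal-length strings 90 degrees clockwise."""
    # Reverse the row order, then read each column as a new row.
    return ["".join(column) for column in zip(*reversed(grid))]


def rotate_counterclockwise(grid):
    """Rotate a list of equal-length strings 90 degrees counterclockwise."""
    # Read each column as a new row, then reverse the row order.
    return ["".join(column) for column in zip(*grid)][::-1]


# A tiny stand-in for an upright note (hypothetical example data).
upright_note = ["HI", "OK"]

# Simulate the photo coming out sideways, then undo it "mentally".
sideways_note = rotate_clockwise(upright_note)
restored_note = rotate_counterclockwise(sideways_note)
```

Rotating one way and then back returns the original grid, which is the software version of tilting your head and straightening up again.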
A story prompt:
Really weird story. Thank you for that ChatGPT.
It did leave out my requirement of mentioning wingsuits in the story.
I asked why it left out wingsuits:
You're right to call that out—and great eye. I did miss "wingsuits" from your list, but not because I couldn’t read it. I actually could see it clearly on the page, just made a mistake by overlooking it when incorporating the items into the story.
Thanks for pointing it out—next time, I’ll make sure every item gets its proper spotlight (and maybe a turtle in a wingsuit dive-bombing for pizza).
- His Holiness, ChatGPT
It did the thing, guys. What’s the thing, you ask? Forgetting to do what you asked! This is called an omission error, and it can happen because of prompt complexity, limits on model attention, or the model prioritizing some parts of the input over others.
Anyways, it’s to be expected, and I am certainly happy with the output, even if it is quite weird.
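If you want to catch omissions like this without rereading the whole story, a simple check can help. The following is a minimal sketch, assuming you have the required items and the generated story as plain text. The function name `find_omitted_items` and the sample data are made up for illustration and are not the actual prompt or story from this experiment.

```python
def find_omitted_items(required_items, story):
    """Return the required items that never appear in the story.

    Uses a case-insensitive substring match, so it is deliberately
    simple: it will not treat "wingsuit" and "wingsuits" as the same.
    """
    story_lower = story.lower()
    return [item for item in required_items if item.lower() not in story_lower]


# Hypothetical example data, not the real prompt or output.
required = ["turtles", "pizza", "wingsuits"]
story = "The turtles ordered a pizza and ate it on the roof."

missing = find_omitted_items(required, story)
print(missing)
```

A check like this only tells you that a word is absent. It cannot tell you why the model skipped it.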
An Error Prompt:
This one was a flub. It read the division sign as an equal sign. However, what happened under the hood could be one of two things: (1) the GPT simply misread the division sign as an equal sign, or (2) the GPT was influenced by the prompt and assumed that the equation was wrong.
I am hoping it’s not the second reason, as “when you assume, you make a ___ out of you and me.” Perhaps ChatGPT is human after all.
Spell Checking Prompt:
Here’s one with a prompt telling it to correct my purposefully bad spelling. It did a terrific job, correctly identifying the words it needed to and then correcting their spelling. This use case is really good for children learning to spell on their own. I was impressed!
Decoding Prompt:
Finally, I asked ChatGPT if it could do some decoding, which, of course, it successfully did. I don’t know who wouldn’t be able to do that simple decoding. But it does impress me that it did it so quickly, and I wonder where else this functionality could be used.
(Solving the Zodiac case?)
Actual Use Cases For This
Surprisingly, people still do write things down: in schools, at home, and at work. We even have an entire age range of people who prefer writing to typing.
The use case of this technology is that it’s the connector between something we enjoy (writing) and something that helps us (digital records). Do you remember Moleskines? Of course you do, but do you remember the Moleskines that came with camera-equipped pens, which recorded what you were writing to your iPhone? It’s like that, but much easier.
Here are the use cases:
Logging your grocery lists, or any list for that matter.
Uploading journals or diaries so that you can index and search them later.
Indexing old notes.
Taking notes in class and then automatically creating flashcards from them. More generally, having your LLM act as your educational assistant.
Signature analysis.
Help to solve difficult handwritten problems or decode messages.
Decoding far-away handwritten text.
Decoding messy handwritten text, primarily doctors’ notes.
And there has to be more…
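For the journal and old-notes ideas in the list above, the “index and search” part can be surprisingly small once the handwriting has been transcribed to text. Here is a minimal sketch, assuming each entry is already plain text keyed by date. The names `build_index` and `search` and the sample entries are made up for illustration.

```python
import re
from collections import defaultdict


def build_index(entries):
    """Map each lowercase word to the set of entry dates that contain it."""
    index = defaultdict(set)
    for date, text in entries.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(date)
    return index


def search(index, query):
    """Return the sorted dates of entries containing every query word."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Hypothetical transcribed journal entries.
journal = {
    "2024-01-03": "Bought groceries and wrote flashcards.",
    "2024-01-05": "Flashcards for biology class.",
}

journal_index = build_index(journal)
print(search(journal_index, "biology flashcards"))
```

The search is only as good as the transcription, so a misread name (hello, Pam Beesly) would be indexed under the wrong word.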
Reader Questions:
What’s your favorite part about ChatGPT’s camera features?
Do you think an LLM reading handwriting is valuable in this day and age?
Would you upload your own handwriting into an LLM?
Thanks for reading!
Thanks for this. I wondered if vision models would be able to read my handwriting. I want to keep a hand-written journal (I enjoy the act of writing and it’s good to practice fine motor skills), but I want to be able to search it. I wondered if this would solve my problem and now I know it likely can. Cheers!
Very useful. Thank you