The History of Optical Character Recognition

Optical Character Recognition, also known as OCR, has long lived in the shadows. People have seen it as just another everyday tool, like many others we use.

But OCR is much more than that.

It is a technology that has made our lives much easier and our workload much lighter.

Imagine hours and hours of transcribing all paper files into digital ones… it’s painful and time-consuming.

Thanks to OCR, we are able to digitalize paper files in no time, in a few clicks.

All you need is an OCR tool (reader) and a device to take a photo of the printed file.

It can be your phone, tablet, or laptop with a working camera.

Then, you just upload the photo to the OCR tool you use and you’ll have the file digitalized in a few seconds.

It’s that easy!

Can you imagine a world without OCR technology?

We can’t, and we don’t want to.

But there was a pre-OCR world, and people had to work much harder to digitalize all paper files.

Speaking of the past, OCR wasn’t always what we know today. A century ago, it was far less sophisticated, and it has steadily improved over time.

The Beginnings of Optical Character Recognition

The roots of OCR technology go back to the First World War and to telegraphy.

In 1914, the physicist Emanuel Goldberg developed a machine that could read characters and convert them into standard telegraph code.

Later, in the late 1920s, he developed an early electronic system for retrieving documents, known as the “Statistical Machine”.

The machine automated the search for records stored on microfilm by using optical code recognition, so documents no longer had to be located by hand.

Goldberg received a US patent for the “Statistical Machine” in 1931, and IBM later acquired it, an early sign of business interest in reducing the time spent on paper data extraction and conversion.

The Reading Machine by Gustav Tauschek

Goldberg was not the only one working on OCR at the time.

Gustav Tauschek developed another early OCR device, known as the Reading Machine, in the late 1920s, and obtained a German patent for it in 1929.

This machine used template matching with a photodetector to detect the letters on a picture.

Apart from the photodetector, there was also a disk with holes in the form of letters that rotated on the interior side of the objective lens.

To read a file, a picture with text was passed in front of the machine’s window.

When the image of a letter and a letter-shaped hole matched in form, the printing drum rotated to that letter, and the letter was printed on paper.
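The matching step described above can be illustrated in software. The following is a minimal sketch of template matching, not a model of Tauschek’s actual mechanism: the 5x5 glyphs, the `TEMPLATES` table, and the `match_score` and `recognize` functions are all hypothetical names made up for this illustration. Each stored template plays the role of one letter-shaped hole, and the input is assigned to the template it agrees with on the most pixels.

```python
# Hypothetical 5x5 glyph templates; '#' marks ink, '.' marks blank paper.
# Each template stands in for one letter-shaped hole on the rotating disk.
TEMPLATES = {
    "I": [
        "#####",
        "..#..",
        "..#..",
        "..#..",
        "#####",
    ],
    "L": [
        "#....",
        "#....",
        "#....",
        "#....",
        "#####",
    ],
    "T": [
        "#####",
        "..#..",
        "..#..",
        "..#..",
        "..#..",
    ],
}


def match_score(image, template):
    """Count the pixels on which the image and the template agree."""
    return sum(
        img_px == tpl_px
        for img_row, tpl_row in zip(image, template)
        for img_px, tpl_px in zip(img_row, tpl_row)
    )


def recognize(image, templates=TEMPLATES):
    """Return the letter whose template agrees with the image on the most pixels."""
    return max(templates, key=lambda letter: match_score(image, templates[letter]))


# A slightly damaged "T": one ink pixel in the stem is missing.
noisy_t = [
    "#####",
    "..#..",
    "..#..",
    ".....",
    "..#..",
]

print(recognize(noisy_t))  # prints "T"
```

Because the comparison is made pixel by pixel against fixed shapes, a matcher like this only works for the font its templates were drawn from, which is one way to see why early machines were tied to a single typeface.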

The Evolution of Optical Character Recognition

After the first OCR machines appeared, the technology kept developing over the following decades.

Here’s a chronological list of key milestones throughout the years, up to today.

  • 1951 - David Shepard invented “Gismo”, a machine that could recognize all letters of the Latin alphabet produced by a standard typewriter. Later, it evolved into the Farrington Machine.
  • 1974 - Ray Kurzweil founded Kurzweil Computer Products, which developed the first omni-font OCR system, able to read text in virtually any font. It was later paired with a CCD flatbed scanner and a text-to-speech synthesizer in the Kurzweil Reading Machine.
  • 1992 - Apple announced the Newton MessagePad, which shipped the following year. This handheld device brought handwriting recognition to consumers, meaning it could recognize handwritten text.
  • 2006 - Google began sponsoring the OCR engine Tesseract, which HP had released as open source a year earlier, and the cooperation led to a great evolution of the technology. Over time, OCR moved toward learning to recognize patterns from data instead of relying on language rules built into a device.
  • 2019 - OCR can recognize letters and numbers in a wide range of languages, printed or handwritten.

The Problems with the First OCR Machines

It’s clear that the first OCR machines were complex to use and not as sophisticated as the technology we have today.

First and foremost, to use OCR, one needed a dedicated machine, whereas today we only need a smartphone and an internet connection.

Second, the machines were very limited in their function.

They could recognize only one font at a time and had to be trained with images of each character in order to recognize it.

Last, they could only read fonts that were specially designed to be read by machines.

How Has OCR Changed Throughout the Years?

The first OCR machines had a fairly complex design and limited function.

They could recognize only one character at a time.

They were slow and imprecise, and they had to be given sample images of the characters before they could recognize them.

On top of that, people needed an entire machine just to read printed texts.

Over the years, the machines evolved. They started reading faster and more precisely, and they could handle several fonts at once.

Later, they started to read handwritten text in addition to printed text.

They could also read a wider range of characters, not only the ones they had been trained to recognize.

In the 21st century, OCR technology has changed drastically, for two reasons.

First, OCR was combined with neural networks, which led to it being able to learn to recognize patterns on its own, without the need to insert language rules into the machine.

Second, OCR became available online, so there was no longer a need for a dedicated machine.

It evolved into a cloud-based service that could be accessed via mobile and desktop apps.

Today, we have sophisticated and highly accurate OCR technology that can extract text from photos, digitalize printed files in seconds, and recognize a wide range of languages and fonts.

One example is the online OCR tool here. Try it and see how OCR works in real time.

As good as it is today, we know OCR technology will continue to evolve.

We expect to see new advances in the future, especially since AI and Machine Learning are taking over the world.

The Future of Optical Character Recognition

No one knows what the future holds, but we believe it’s going to bring a lot of advancements to OCR technology.

Who knows… it may combine OCR even more closely with AI and Machine Learning.

Whatever happens, we are sure it’s going to be better than what we have today.

OCR will continue to be an essential part of businesses, helping them digitalize their paper files and records and extract text from pictures.

Moreover, it can play an important role in environmental protection and in the fight against deforestation, as it reduces paper use in offices and helps businesses go paperless.