How to extract text from an image via the CLI

Published at: Aug 01 2021
Updated at: Aug 01 2021
Reading time: 1min

I came across a post by Simon Willison that described how to use the tesseract CLI command to extract text from an image. I took the opportunity to fiddle around with some shell scripting and added an extract-text-from-image command to my dotfiles. It uses tesseract to analyze an image, writes the recognized text to a txt file, copies that text to the clipboard, and deletes the txt file again.

The tesseract command was available on my machine. If it's not available on yours, install it via your favorite package manager.
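
If you're not sure whether tesseract is installed, a small guard like the following can tell you. This is a minimal sketch: `require_command` is a hypothetical helper name, and the install commands in the comments are the usual Homebrew and apt package names, which may differ on your system.

```shell
# Hypothetical helper: succeeds if the given command is on the PATH,
# otherwise prints a hint to stderr and returns 1.
require_command() {
  if ! command -v "$1" > /dev/null 2>&1; then
    echo "$1 is not installed." >&2
    return 1
  fi
}

# Typical install commands (assumed package names, adjust for your system):
#   macOS (Homebrew):  brew install tesseract
#   Debian/Ubuntu:     sudo apt install tesseract-ocr
require_command tesseract || echo "Install tesseract first."
```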

Terminal session showing the `extract-text-from-image` command

Find the latest iteration of extract-text-from-image below:

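function extract-text-from-image() {
  if [ $# -eq 0 ]; then
    echo "Please specify the file you want to scan."
    echo "  -> extract-text-from-image /some/path/image.png"
    return 1
  fi

  # Keep the helper variables local so they don't leak into the shell session.
  local TARGET_DIR FILENAME FILENAME_WITHOUT_EXTENSION
  TARGET_DIR=$(dirname "$1")
  FILENAME=$(basename -- "$1")
  FILENAME_WITHOUT_EXTENSION="${FILENAME%.*}"

  # tesseract appends ".txt" to the output base itself.
  # Note: an existing txt file with the same base name next to the image
  # gets overwritten and then removed.
  tesseract "$1" "$TARGET_DIR/$FILENAME_WITHOUT_EXTENSION" -l eng txt || return 1
  # pbcopy is the macOS clipboard command.
  pbcopy < "$TARGET_DIR/$FILENAME_WITHOUT_EXTENSION.txt"
  rm "$TARGET_DIR/$FILENAME_WITHOUT_EXTENSION.txt"
  echo "🎉 Text copied to clipboard!"
}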
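
The temporary txt file isn't strictly necessary. As a sketch of an alternative (not what my dotfiles use): tesseract accepts `stdout` as its output base and then prints the recognized text instead of writing a file, so the result can be piped straight into the clipboard. The function name `extract_text_to_clipboard` and the `CLIPBOARD_CMD` variable are made up for this example; the variable defaults to pbcopy and can point to any command that reads from standard input.

```shell
# Sketch: same idea without the temporary txt file.
# CLIPBOARD_CMD is a hypothetical override; it defaults to macOS's pbcopy.
extract_text_to_clipboard() {
  if [ $# -eq 0 ]; then
    echo "Please specify the file you want to scan." >&2
    return 1
  fi

  local text
  # "stdout" as the output base makes tesseract print the text
  # instead of writing "<base>.txt".
  text=$(tesseract "$1" stdout -l eng) || return 1

  # Hand the recognized text to the clipboard command via stdin.
  printf '%s\n' "$text" | "${CLIPBOARD_CMD:-pbcopy}"
}
```

Usage is the same as before: `extract_text_to_clipboard /some/path/image.png`. Because nothing is written next to the image, there's no risk of clobbering an existing txt file.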

I don't think I'll use this command daily, but I'm amazed by this piece of CLI magic.

Edit: Wesley Martin shared the macOCR project with me and it's also worth a look!


About Stefan Judis

Frontend nerd with over ten years of experience, freelance dev, "Today I Learned" blogger, conference speaker, and Open Source maintainer.
