How to extract text from an image via the CLI
I came across a post by Simon Willison that described how to use the tesseract CLI command to extract text from an image. I took the opportunity to fiddle around with some shell scripting and added an extract-text-from-image command to my dotfiles. It uses tesseract to analyze an image, creates a txt file with the embedded text, copies the embedded text to the clipboard, and deletes the txt file again.
The tesseract command was available on my machine. If it's not available on yours, install it via your favorite package manager.
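A quick way to check whether tesseract is installed before running anything is sketched below. The `require_command` helper is a hypothetical name, not part of my dotfiles, and the install commands in the comments are common examples that assume Homebrew or apt; your package manager may differ.

```shell
# Hypothetical helper: succeed if the given command is on PATH,
# otherwise print a hint to stderr and fail.
require_command() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "$1 is not installed." >&2
  return 1
}

# Typical install commands (assumptions, pick the one for your system):
#   brew install tesseract          # macOS with Homebrew
#   sudo apt install tesseract-ocr  # Debian/Ubuntu
require_command tesseract || echo "Install tesseract first."
```

If the check fails, install the package and run it again.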
Find the latest iteration of extract-text-from-image below:
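function extract-text-from-image() {
  # Bail out early when no image path is given.
  if [ $# -eq 0 ]; then
    echo "Please specify the file you want to scan."
    echo " -> extract-text-from-image /some/path/image.png"
    return 1
  fi

  # Keep the helper variables local so they don't leak into the shell session.
  local TARGET_DIR="$(dirname "$1")"
  local FILENAME="$(basename -- "$1")"
  local FILENAME_WITHOUT_EXTENSION="${FILENAME%.*}"

  # tesseract appends ".txt" to the output base name on its own.
  tesseract "$1" "$TARGET_DIR/$FILENAME_WITHOUT_EXTENSION" -l eng txt || return 1

  # Copy the recognized text to the clipboard (pbcopy is macOS-only) and clean up.
  pbcopy < "$TARGET_DIR/$FILENAME_WITHOUT_EXTENSION.txt"
  rm "$TARGET_DIR/$FILENAME_WITHOUT_EXTENSION.txt"
  echo "Text copied to clipboard!"
}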
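A possible variation skips the temporary txt file entirely. This is only a sketch: it assumes that tesseract writes the recognized text to standard output when the output base is the literal word stdout, and that pbcopy (macOS) is available. The function name `extract_text_to_clipboard` is hypothetical.

```shell
# Sketch: OCR an image and copy the result to the clipboard without a temp file.
# Assumes `tesseract <image> stdout` prints the text and pbcopy exists (macOS).
extract_text_to_clipboard() {
  if [ $# -eq 0 ]; then
    echo "Please specify the file you want to scan."
    return 1
  fi

  # Capture the OCR output first so a tesseract failure stops the function.
  local text
  text=$(tesseract "$1" stdout -l eng) || return 1

  printf '%s' "$text" | pbcopy
  echo "Text copied to clipboard!"
}
```

Capturing the output in a variable before piping keeps the early return on failure that the file-based version has; a plain `tesseract ... | pbcopy` pipeline would report pbcopy's exit status instead.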
I don't think I'll use this command daily, but I'm amazed by this piece of CLI magic.
Edit: Wesley Martin shared the macOCR project with me and it's also worth a look!