A simple and lightweight tool for extracting text from a screenshot/image (on the fly)
Screen capture uses the fastest available hardware path on each supported platform (X11, Windows). On Windows, DXGI Desktop Duplication acquires desktop frames directly on the GPU and copies them into a staging texture mapped for CPU read, avoiding any GDI software rasterization. On X11, the image returned by `XGetImage` goes through a direct 32bpp packed-pixel fast path (a single memcpy-equivalent row scan). It falls back to the generic `XGetPixel` path only when the pixel format does not match the expected mask layout. The screen is then kept in memory as a single RGBA buffer for the entire session, and all cropping, annotation rendering, and encoding operate on that buffer without re-capturing.
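The fast-path/fallback split for X11 can be sketched as a converter that checks the channel masks once, then either uses fixed shifts or derives each channel from its mask. This is a minimal sketch, not oshot's code: `PixelMasks`, `kExpected` (assumed here to be `0x00RRGGBB`), and `to_rgba` are hypothetical. The fallback uses mask shifting as a stand-in for per-pixel `XGetPixel` calls, and no Xlib is needed to compile it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an XImage's red_mask/green_mask/blue_mask.
struct PixelMasks {
    uint32_t red, green, blue;
};

// Assumed layout for the fast path: 0x00RRGGBB in each 32-bit word.
constexpr PixelMasks kExpected{0x00FF0000u, 0x0000FF00u, 0x000000FFu};

// Generic path: extract one channel by shifting down to the mask's lowest
// set bit. Assumes each mask is 8 bits wide.
static uint8_t channel(uint32_t px, uint32_t mask) {
    if (mask == 0) return 0;
    int shift = 0;
    while (((mask >> shift) & 1u) == 0) ++shift;
    return static_cast<uint8_t>((px & mask) >> shift);
}

// Convert a 32bpp packed image (row stride given in 32-bit words) into a
// tightly packed RGBA byte buffer.
std::vector<uint8_t> to_rgba(const uint32_t* src, int width, int height,
                             int stride_words, PixelMasks m) {
    std::vector<uint8_t> out(static_cast<std::size_t>(width) * height * 4);
    // Decided once per image; the branch below is loop-invariant.
    const bool fast = m.red == kExpected.red && m.green == kExpected.green &&
                      m.blue == kExpected.blue;
    for (int y = 0; y < height; ++y) {
        const uint32_t* row = src + static_cast<std::size_t>(y) * stride_words;
        uint8_t* dst = out.data() + static_cast<std::size_t>(y) * width * 4;
        for (int x = 0; x < width; ++x) {
            const uint32_t px = row[x];
            if (fast) {
                // Fast path: fixed shifts, no mask analysis.
                dst[4 * x + 0] = static_cast<uint8_t>(px >> 16);
                dst[4 * x + 1] = static_cast<uint8_t>(px >> 8);
                dst[4 * x + 2] = static_cast<uint8_t>(px);
            } else {
                // Fallback: derive each channel from its mask.
                dst[4 * x + 0] = channel(px, m.red);
                dst[4 * x + 1] = channel(px, m.green);
                dst[4 * x + 2] = channel(px, m.blue);
            }
            dst[4 * x + 3] = 255; // screen captures are opaque
        }
    }
    return out;
}
```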
The fullscreen overlay is a borderless windowed surface rather than exclusive fullscreen. This avoids implicit GPU mode switches and the display state corruption they can leave behind on abnormal exit. This behavior can also be changed in the configuration file.
OCR, barcode scanning, and font loading are all on-demand, and none of them is initialized at startup. Tesseract and ZBar are configured only when the user triggers an extraction. The Tesseract engine instance is reused across extractions within a session and is re-initialized only when the model or data path changes. Before OCR runs, the Tesseract page segmentation mode is chosen in O(1) from area and aspect-ratio heuristics, which avoids full-page layout analysis on small single-word or single-line regions.
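Both ideas can be sketched without linking Tesseract: a lazily initialized session that re-initializes only when its key changes, and an O(1) segmentation-mode choice from the selection's size. The `Psm` values mirror `tesseract::PageSegMode` (`PSM_AUTO` = 3, `PSM_SINGLE_BLOCK` = 6, `PSM_SINGLE_LINE` = 7, `PSM_SINGLE_WORD` = 8), so a result could be cast and passed to `TessBaseAPI::SetPageSegMode`. The thresholds, `choose_psm`, and `OcrSession` are hypothetical and are not oshot's actual values or types.

```cpp
#include <string>

// Numeric values mirror tesseract::PageSegMode.
enum class Psm : int { Auto = 3, SingleBlock = 6, SingleLine = 7, SingleWord = 8 };

// O(1): decided from the selection rectangle alone. Thresholds are illustrative.
Psm choose_psm(int w, int h) {
    if (w <= 0 || h <= 0) return Psm::Auto;
    const double aspect = static_cast<double>(w) / h;
    const long long area = static_cast<long long>(w) * h;
    if (h <= 60 && aspect <= 4.0) return Psm::SingleWord; // small, compact box
    if (h <= 80 && aspect > 4.0) return Psm::SingleLine;  // short and wide
    if (area <= 800LL * 600) return Psm::SingleBlock;     // medium region
    return Psm::Auto;                                     // large: full layout analysis
}

// Hypothetical lazy holder: nothing happens until the first extraction, and
// the engine is re-created only when the model or data path changes.
struct OcrSession {
    std::string model, data_path;
    int inits = 0;      // counts (re)initializations, for illustration
    bool ready = false;

    void ensure(const std::string& m, const std::string& path) {
        if (ready && m == model && path == data_path) return; // reuse the engine
        // A real implementation would call TessBaseAPI::Init(path, m) here.
        model = m;
        data_path = path;
        ready = true;
        ++inits;
    }
};
```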
Annotation geometry is rendered entirely through ImGui draw lists on the GPU. CPU-side pixel rasterization is used only when baking annotations into the saved image. That rasterizer uses Bresenham's line algorithm, O(max(Δx, Δy)), and the midpoint circle algorithm, O(radius), rather than naive scanline fills.
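The two rasterization algorithms named above can be sketched on a plain pixel buffer. This is a generic textbook version, not oshot's rasterizer. `Canvas`, `draw_line`, and `draw_circle` are hypothetical names, and colors are packed 32-bit values.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Minimal pixel buffer standing in for the in-memory screenshot.
struct Canvas {
    int w, h;
    std::vector<uint32_t> px;
    Canvas(int w_, int h_) : w(w_), h(h_), px(static_cast<std::size_t>(w_) * h_, 0) {}
    void put(int x, int y, uint32_t c) {
        if (x >= 0 && y >= 0 && x < w && y < h) px[static_cast<std::size_t>(y) * w + x] = c;
    }
};

// Bresenham's line: integer-only error term, one plotted pixel per step along
// the major axis, so O(max(|dx|, |dy|)) work.
void draw_line(Canvas& cv, int x0, int y0, int x1, int y1, uint32_t c) {
    const int dx = std::abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    const int dy = -std::abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    for (;;) {
        cv.put(x0, y0, c);
        if (x0 == x1 && y0 == y1) break;
        const int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
}

// Midpoint circle: walks one octant (O(radius) steps) and mirrors each
// point into the other seven octants.
void draw_circle(Canvas& cv, int cx, int cy, int r, uint32_t c) {
    int x = r, y = 0, d = 1 - r;
    while (x >= y) {
        cv.put(cx + x, cy + y, c); cv.put(cx + y, cy + x, c);
        cv.put(cx - y, cy + x, c); cv.put(cx - x, cy + y, c);
        cv.put(cx - x, cy - y, c); cv.put(cx - y, cy - x, c);
        cv.put(cx + y, cy - x, c); cv.put(cx + x, cy - y, c);
        ++y;
        if (d <= 0) {
            d += 2 * y + 1;       // midpoint inside: keep x
        } else {
            --x;                  // midpoint outside: step inward
            d += 2 * (y - x) + 1;
        }
    }
}
```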
Pencil stroke point reduction uses a squared-distance threshold, comparing `dx² + dy² > 4.0` rather than computing `sqrt`. This keeps the per-mouse-move check O(1) with no square-root call, and it keeps the point array small no matter how long the user draws.
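A minimal sketch of that check follows. The `4.0` threshold comes from the text above and corresponds to a 2-pixel minimum spacing. `Point` and `add_stroke_point` are hypothetical names.

```cpp
#include <vector>

struct Point { float x, y; };

// Keep a new sample only if it lies more than 2 px (sqrt(4.0)) from the last
// kept point. Comparing squared distances avoids a sqrt per mouse-move event.
bool add_stroke_point(std::vector<Point>& stroke, Point p) {
    if (!stroke.empty()) {
        const float dx = p.x - stroke.back().x;
        const float dy = p.y - stroke.back().y;
        if (dx * dx + dy * dy <= 4.0f) return false; // too close: drop it
    }
    stroke.push_back(p);
    return true;
}
```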
Grayscale conversion for barcode scanning uses integer-only ITU-R BT.601 weights, `(77r + 150g + 29b) >> 8`, rather than floating-point luminance coefficients, keeping the O(w×h) pixel walk entirely in the integer pipeline.
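The weights 77, 150, and 29 sum to 256, so `>> 8` is an exact division by 256 and pure white maps to 255. The sketch below produces the kind of 8-bit grayscale buffer that ZBar accepts as a `Y800` image. `rgba_to_gray` is a hypothetical name, and the input is assumed to be tightly packed RGBA.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One pass over w*h pixels using integer multiply, add, and shift only.
std::vector<uint8_t> rgba_to_gray(const uint8_t* rgba, std::size_t pixels) {
    std::vector<uint8_t> gray(pixels);
    for (std::size_t i = 0; i < pixels; ++i) {
        const uint32_t r = rgba[4 * i], g = rgba[4 * i + 1], b = rgba[4 * i + 2];
        gray[i] = static_cast<uint8_t>((77 * r + 150 * g + 29 * b) >> 8);
    }
    return gray;
}
```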
Monitor detection is O(monitors), querying only the list of attached outputs and comparing cursor coordinates against their bounding rectangles, never touching pixel data.
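Monitor detection reduces to a point-in-rectangle test per output. With GLFW, the rectangles could come from `glfwGetMonitors`, `glfwGetMonitorPos`, and `glfwGetVideoMode`, although whether oshot uses exactly those calls is an assumption. In this sketch, `Monitor` and `monitor_at` are hypothetical names, and rectangles are treated as half-open.

```cpp
#include <cstddef>
#include <vector>

// Position and size of one output in virtual-desktop coordinates.
struct Monitor { int x, y, w, h; };

// O(monitors): return the index of the output containing the cursor, or -1.
// Only geometry is compared; no pixel data is read.
int monitor_at(const std::vector<Monitor>& mons, int cx, int cy) {
    for (std::size_t i = 0; i < mons.size(); ++i) {
        const Monitor& m = mons[i];
        if (cx >= m.x && cx < m.x + m.w && cy >= m.y && cy < m.y + m.h)
            return static_cast<int>(i);
    }
    return -1; // cursor outside every output
}
```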
The font cache is an O(log n) lookup keyed on `(path, size)`, so repeated renders of the same annotation text at the same size never trigger atlas rebuilds or filesystem access.
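An ordered map gives exactly this O(log n) behavior, because `std::map` is a balanced tree keyed here on a `(path, size)` pair. In this sketch, `Font` and `FontCache` are hypothetical, and the load counter stands in for the file I/O and atlas build that would happen on a miss.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical font handle; a real one would own glyph/atlas data.
struct Font { std::string path; float size; };

class FontCache {
public:
    // O(log n) find; the expensive load happens only on a cache miss.
    const Font& get(const std::string& path, float size) {
        const auto key = std::make_pair(path, size);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++loads_; // stands in for filesystem access + atlas rebuild
            it = cache_.emplace(key, Font{path, size}).first;
        }
        return it->second; // std::map references stay valid across inserts
    }
    int loads() const { return loads_; }

private:
    std::map<std::pair<std::string, float>, Font> cache_;
    int loads_ = 0;
};
```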
Image downscaling for oversized sources uses `stbir_resize_uint8_linear`, a cache-friendly separable linear filter that processes pixels in a single O(w×h) pass with SIMD-friendly memory access patterns.
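Before the resize call, the target size has to be chosen. The sketch below assumes "oversized" means larger than a maximum box, which the text does not specify. It computes an aspect-preserving fit with integer arithmetic only and never upscales. `Size` and `fit_within` are hypothetical. The resulting dimensions would be the `output_w`/`output_h` arguments of stb_image_resize2's `stbir_resize_uint8_linear(in, in_w, in_h, 0, out, out_w, out_h, 0, STBIR_RGBA)`, where a stride of 0 means tightly packed rows.

```cpp
#include <algorithm>

struct Size { int w, h; };

// Largest size with the source's aspect ratio that fits inside `limit`.
// Sources that already fit are returned unchanged. Assumes positive dimensions.
Size fit_within(Size src, Size limit) {
    if (src.w <= limit.w && src.h <= limit.h) return src;
    const long long sw = src.w, sh = src.h;
    // Compare src.w/src.h with limit.w/limit.h without floating point.
    if (sw * limit.h >= sh * limit.w) {
        // Width is the binding constraint.
        return {limit.w, static_cast<int>(std::max(1LL, sh * limit.w / sw))};
    }
    // Height is the binding constraint.
    return {static_cast<int>(std::max(1LL, sw * limit.h / sh)), limit.h};
}
```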
VSync is user-configurable, allowing the overlay to drop to uncapped rendering on systems where the compositor introduces latency.
External dependencies are kept minimal: image loading, resizing, and writing use single-header stb libraries compiled only into the translation units that need them, with no transitive system library requirements beyond what the platform already provides.
Package names may vary by distribution and package manager.
If a package is not found, try searching by its base name (e.g., libglfw3-dev → glfw).
- `libx11-dev`
- `libxcb-dev`
- `libpng-dev`
- `libglfw3-dev`
- `libtesseract` (including the necessary language models, e.g. `tesseract-ocr-eng`)
- `libzbar-dev`
- `libappindicator3-dev`
- `grim` (Wayland only)
- `wl-clipboard` (Wayland only)
Build with `make`:

```sh
$ git clone https://github.com/Toni500github/oshot/
$ cd oshot/
$ make
# Optionally move the binary to a directory in your $PATH (preferably one in your home directory)
$ ./build/release/oshot
```

Or build with CMake and Ninja:

```sh
$ git clone https://github.com/Toni500github/oshot/
$ cd oshot/
$ mkdir build2 && cd build2
$ cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release
$ ninja
# Optionally move the binary to a directory in your $PATH (preferably one in your home directory)
$ ./oshot
```

Tesseract uses a separate language model file (`.traineddata`) for each language.
You can store these files anywhere you like, as long as the path is configured correctly.
- Download the required language model(s) from the official Tesseract repository:
  https://github.com/tesseract-ocr/tessdata
- Place the downloaded `.traineddata` files in one of the following locations:
  - The `models/` directory next to the `oshot` binary (recommended)
  - Any other directory of your choice (configure the path in the config file)
- Configure the language data path in `config.toml`:
  - Windows: `%APPDATA%/oshot/config.toml`
  - Linux: `~/.config/oshot/config.toml`

  Set the `ocr-path` variable to the directory containing the `.traineddata` files. Example:

  ```toml
  # Works on Windows too
  ocr-path = "~/Downloads/oshot/models"
  ```
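The example value starts with `~`, and the text notes that this form also works on Windows. That implies a home-directory expansion step, although the README does not say how oshot performs it. The sketch below assumes a leading `~` is replaced with `HOME` on Linux or `USERPROFILE` on Windows. `expand_tilde` and `home_dir` are hypothetical helpers, and the `~user` form is deliberately left unexpanded.

```cpp
#include <cstdlib>
#include <string>

// Replace a leading "~" with the given home directory. The home directory is
// passed in so the logic can be tested independently of the environment.
std::string expand_tilde(const std::string& path, const std::string& home) {
    if (path.empty() || path[0] != '~') return path;
    if (path.size() > 1 && path[1] != '/' && path[1] != '\\') return path; // "~user": not handled
    return home + path.substr(1);
}

// HOME on Linux; USERPROFILE is the usual equivalent on Windows.
std::string home_dir() {
    if (const char* h = std::getenv("HOME")) return h;
    if (const char* p = std::getenv("USERPROFILE")) return p;
    return {};
}
```

Usage: `expand_tilde(value, home_dir())` would turn `~/Downloads/oshot/models` into an absolute path before it is handed to Tesseract.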
- Windows:
  - If the screen flickers black when oshot starts (or oshot fails to launch), try the following steps:
    - Download MesaForWindows-x64-20.1.8.7z
    - Extract the `opengl32.dll` file into the directory where `oshot.exe` is located
    - Try to launch it again
- If oshot reports library linking errors when you try to run it, use the AppImage release instead.
- If copying text to the clipboard doesn't work, launch `oshot --tray` and then start oshot from the system tray.

If errors persist, please open an issue and include a screenshot or the text of the error printed in the console when running oshot.