OCR

Methods on this page are called as client.ocr.<method>(...), where client is either a synchronous Goodmem instance or an asynchronous AsyncGoodmem instance, initialized as shown below:

from goodmem import Goodmem
client = Goodmem(base_url='http://localhost:8080', api_key='gm_...')

from goodmem import AsyncGoodmem
client = AsyncGoodmem(base_url='http://localhost:8080', api_key='gm_...')

Run OCR on a document or image

ocr.document(*, content: str = None, end_page: int = None, file_path: str = None, format: OcrInputFormat = None, include_markdown: bool = None, include_raw_json: bool = None, start_page: int = None) -> OcrDocumentResponse

Run OCR on a document or image. Accepts either file_path (path to a local file) or content (base64-encoded bytes). When using file_path, the file is read and base64-encoded automatically, and format is inferred from the file extension if not provided.

Parameters

content (str, optional) — Base64-encoded document bytes. Mutually exclusive with file_path.
end_page (int, optional) — 0-based inclusive end page
file_path (str, optional) — Path to a local file to OCR. Mutually exclusive with content.
format (OcrInputFormat, optional) — Input format hint (AUTO, PDF, TIFF, PNG, JPEG, BMP)
include_markdown (bool, optional) — Include markdown rendering in the response
include_raw_json (bool, optional) — Include raw OCR JSON payload in the response
start_page (int, optional) — 0-based inclusive start page
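
Because start_page and end_page are 0-based and inclusive, page numbers read off a PDF viewer need shifting by one. The sketch below uses a hypothetical helper, build_ocr_kwargs (not part of the goodmem package), that base64-encodes raw bytes for content and converts a 1-based page range. It makes no assumption about what validation the server performs.

```python
import base64


def build_ocr_kwargs(data: bytes, first_page: int, last_page: int) -> dict:
    """Hypothetical helper: build keyword arguments for client.ocr.document.

    first_page and last_page are 1-based and inclusive, as a person would
    read them off a PDF viewer.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return {
        # content takes base64 text, not raw bytes.
        "content": base64.b64encode(data).decode("ascii"),
        # Shift to the 0-based inclusive indices the method expects.
        "start_page": first_page - 1,
        "end_page": last_page - 1,
    }


kwargs = build_ocr_kwargs(b"%PDF-1.7 ...", first_page=3, last_page=5)
print(kwargs["start_page"], kwargs["end_page"])  # 2 4
# result = client.ocr.document(format="PDF", **kwargs)
```

The final call is left commented out because it needs a running server and a configured client.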

Returns

OcrDocumentResponse — Returns extracted text and layout information.

Example

import base64

# Read the image and base64-encode it for the content parameter.
with open("image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

result = client.ocr.document(
    content=image_b64,
    format="PNG",
    include_markdown=True,
)
print(f"Pages: {result.page_count}")
for page in result.pages:
    if page.page:
        print(page.page.markdown)

Async usage: client.ocr exposes the same methods on AsyncGoodmem; use await / async for as needed.
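
As a rough illustration of the async form, the sketch below runs OCR on several local files concurrently. The function name ocr_files is hypothetical, and the code assumes that client is an AsyncGoodmem instance, so that each client.ocr.document(...) call returns an awaitable.

```python
import asyncio


async def ocr_files(client, paths):
    """Run OCR on several local files concurrently.

    client is assumed to be an AsyncGoodmem instance, so each
    client.ocr.document(...) call returns an awaitable.
    """
    tasks = [client.ocr.document(file_path=p, include_markdown=True) for p in paths]
    # gather preserves input order, so results[i] belongs to paths[i].
    return await asyncio.gather(*tasks)
```

From inside another coroutine this would be called as results = await ocr_files(client, ["a.pdf", "b.png"]). Whether the server benefits from concurrent requests depends on the deployment.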

Data Models

All data models are pydantic v2 models. Fields are shown with their Python attribute names; JSON responses use camelCase aliases (e.g., owner_id → ownerId).
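
To make the naming rule concrete, here is a small sketch of the usual snake_case to camelCase conversion. The function snake_to_camel is an illustration only; it assumes the aliases follow the standard rule, so the alias of any particular field should be checked against an actual JSON response.

```python
def snake_to_camel(name: str) -> str:
    """Convert a snake_case attribute name to its camelCase form."""
    head, *rest = name.split("_")
    # The first word stays lowercase; later words get an initial capital.
    return head + "".join(part.capitalize() for part in rest)


print(snake_to_camel("owner_id"))      # ownerId
print(snake_to_camel("wall_time_ms"))  # wallTimeMs
```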

OcrInputFormat

String enum: "AUTO" · "PDF" · "TIFF" · "PNG" · "JPEG" · "BMP"
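
When passing content rather than file_path, no extension is available for inference, so the caller may want to choose a format value explicitly. The sketch below shows one way to do that from a file name. The helper guess_format and its extension table are hypothetical; the client's own inference rules are not documented here and may differ.

```python
from pathlib import Path

# Illustrative mapping only; the client's own inference rules may differ.
_EXTENSION_TO_FORMAT = {
    ".pdf": "PDF",
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".png": "PNG",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".bmp": "BMP",
}


def guess_format(file_path: str) -> str:
    """Hypothetical helper: pick an OcrInputFormat value from a file name."""
    suffix = Path(file_path).suffix.lower()
    # Fall back to AUTO when the extension is not recognized.
    return _EXTENSION_TO_FORMAT.get(suffix, "AUTO")


print(guess_format("scan.JPG"))     # JPEG
print(guess_format("report.docx"))  # AUTO
```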

OcrDocumentResponse

Response containing page-ordered OCR results.

detected_format (OcrInputFormat) — Detected input format
page_count (int) — Number of pages processed after applying the range
pages (list[OcrPageResult]) — Ordered per-page OCR results
timings (DocumentTimings) — Aggregate timing statistics

OcrPageResult

Per-page OCR result containing output or error status.

page_index (int) — 0-based page index
page (OcrPage, optional) — OCR output for the page
status (RpcStatus, optional) — Error status for the page
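
Since a page result carries either output or an error status, callers usually want to separate the two before reading page fields. A minimal sketch follows, assuming each item exposes the OcrPageResult attributes listed above; split_pages is a hypothetical name.

```python
def split_pages(pages):
    """Separate pages with OCR output from pages that report an error.

    Each item is expected to look like OcrPageResult: page_index, plus
    optional page and status attributes.
    """
    succeeded, failed = [], []
    for result in pages:
        if result.page is not None:
            succeeded.append(result)
        else:
            # status is optional too, so guard before reading its fields.
            message = result.status.message if result.status else None
            failed.append((result.page_index, message))
    return succeeded, failed
```

With a response in hand, this would be used as succeeded, failed = split_pages(result.pages).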

DocumentTimings

Aggregate timing statistics for an OCR request.

wall_time_ms (int) — End-to-end request time (ms)
sum_queue_wait_ms (int) — Sum of per-page queue wait times (ms)
sum_render_ms (int) — Sum of per-page render times (ms)
sum_ocr_ms (int) — Sum of per-page OCR times (ms)
sum_page_total_ms (int) — Sum of per-page total times (ms)
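
One way to read these numbers is as each stage's share of the summed per-page time. The sketch below computes those shares with a hypothetical helper, stage_shares. Whether the three stage sums account for the whole of sum_page_total_ms is not stated here, so the shares may not add up to exactly 1.

```python
def stage_shares(timings):
    """Return each stage's share of the summed per-page time.

    timings is expected to expose the DocumentTimings fields.
    """
    total = timings.sum_page_total_ms
    if total <= 0:
        # Nothing was processed, so there is nothing to divide.
        return {}
    return {
        "queue_wait": timings.sum_queue_wait_ms / total,
        "render": timings.sum_render_ms / total,
        "ocr": timings.sum_ocr_ms / total,
    }
```

For a response, stage_shares(result.timings) gives a quick view of where processing time went.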

OcrPage

OCR output for a single page.

raw_json (str, optional) — Raw OCR JSON payload when requested
markdown (str, optional) — Markdown rendering when requested
layout (OcrLayout) — Parsed layout output
timings (PageTimings) — Timing breakdown for the page
image (ImageInfo) — Rendered image metadata

RpcStatus

Status payload for per-page OCR errors.

code (int) — gRPC status code as defined by google.rpc.Code
message (str, optional) — Human-readable error message
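
For logging, it can help to turn the numeric code into its google.rpc.Code name. The sketch below hard-codes a subset of those values in a hypothetical RpcCode enum to stay dependency-free; the sample message is made up for the example.

```python
from enum import IntEnum


class RpcCode(IntEnum):
    """Subset of google.rpc.Code values, copied here for illustration."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    RESOURCE_EXHAUSTED = 8
    INTERNAL = 13
    UNAVAILABLE = 14


def describe_status(code: int, message=None) -> str:
    """Format an RpcStatus-like (code, message) pair for logging."""
    try:
        name = RpcCode(code).name
    except ValueError:
        # Codes outside the subset above are shown numerically.
        name = f"CODE_{code}"
    return f"{name}: {message}" if message else name


print(describe_status(4, "example message"))  # DEADLINE_EXCEEDED: example message
print(describe_status(16))                    # CODE_16
```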

OcrLayout

Parsed OCR layout output for a page.

cells (list[OcrCell]) — Layout cells in reading order

PageTimings

Per-page timing breakdown for OCR processing.

queue_wait_ms (int) — Time spent waiting in the render queue (ms)
render_ms (int) — Time spent rendering the page image (ms)
ocr_ms (int) — Time spent running OCR (ms)
total_ms (int) — Total page processing time (ms)

ImageInfo

Rendered image metadata for an OCR page.

width_px (int) — Rendered image width in pixels
height_px (int) — Rendered image height in pixels
dpi (int) — Rendering DPI
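
Pixel dimensions and DPI together give the physical size of the rendered page, since inches equal pixels divided by dots per inch. A minimal sketch, with rendered_size_inches as a hypothetical helper name:

```python
def rendered_size_inches(image):
    """Return (width, height) in inches from ImageInfo-like fields."""
    if image.dpi <= 0:
        raise ValueError("dpi must be positive")
    # inches = pixels / (pixels per inch)
    return image.width_px / image.dpi, image.height_px / image.dpi
```

For example, a 2550 by 3300 pixel image rendered at 300 DPI corresponds to 8.5 by 11 inches.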

OcrCell

Single OCR layout element.

bbox (BoundingBox, optional) — Bounding box in page coordinates
category_label (str) — Raw category label emitted by OCR
category (OcrCategory) — Normalized OCR category
text (str) — OCR text content
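
A common use of cells is to rebuild the body text of a page while dropping running headers and footers. The sketch below does that with a hypothetical helper, body_text. It accepts category values given either as plain strings or as enum members, since the exact runtime type of the field is an assumption here.

```python
SKIPPED_CATEGORIES = {"PAGE_HEADER", "PAGE_FOOTER"}


def _category_name(category):
    # Works for plain strings and for enum members alike.
    return getattr(category, "value", category)


def body_text(cells):
    """Join cell text in the given order, skipping page headers and footers."""
    parts = [
        cell.text
        for cell in cells
        if cell.text and _category_name(cell.category) not in SKIPPED_CATEGORIES
    ]
    return "\n".join(parts)
```

Cells are kept in the order they arrive, which matches the reading order of page.layout.cells.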

BoundingBox

Bounding box coordinates in page space.

x1 (float) — Left coordinate
y1 (float) — Top coordinate
x2 (float) — Right coordinate
y2 (float) — Bottom coordinate
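
The four coordinates are enough to derive a box's size and to test whether a point falls inside it. The sketch below assumes x2 >= x1 and y2 >= y1, that is, y grows downward from the top of the page, which the field names suggest but do not guarantee. Both helper names are hypothetical.

```python
def bbox_size(bbox):
    """Return (width, height) of a BoundingBox-like object."""
    return bbox.x2 - bbox.x1, bbox.y2 - bbox.y1


def bbox_contains(bbox, x, y):
    """Return True when the point (x, y) lies inside or on the box edge."""
    return bbox.x1 <= x <= bbox.x2 and bbox.y1 <= y <= bbox.y2
```

Because bbox is optional on a cell, callers should check that it is not None before using either helper.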

OcrCategory

String enum: "UNSPECIFIED" · "CAPTION" · "FOOTNOTE" · "FORMULA" · "LIST_ITEM" · "PAGE_FOOTER" · "PAGE_HEADER" · "PICTURE" · "SECTION_HEADER" · "TABLE" · "TEXT" · "TITLE" · "OTHER" · "UNKNOWN"
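
To get a quick profile of a page, the categories of its cells can be tallied. A minimal sketch follows; category_counts is a hypothetical helper, and it normalizes enum members to their string value so that the keys match the names listed above.

```python
from collections import Counter


def category_counts(cells):
    """Count cells per category, keyed by the category's string value."""
    return Counter(getattr(cell.category, "value", cell.category) for cell in cells)
```

For a page with output, category_counts(page.page.layout.cells) returns a Counter such as one entry for "TITLE" and several for "TEXT".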
