GoodMem

OCR

Methods on this page are called as client.ocr.<method>(...) where client is either a synchronous Goodmem or asynchronous AsyncGoodmem instance initialized below:

# Synchronous client
from goodmem import Goodmem
client = Goodmem(base_url='http://localhost:8080', api_key='gm_...')

# Asynchronous client (create one or the other, not both)
from goodmem import AsyncGoodmem
client = AsyncGoodmem(base_url='http://localhost:8080', api_key='gm_...')

Run OCR on a document or image

ocr.document(*, content: str = None, end_page: int = None, file_path: str = None, format: OcrInputFormat = None, include_markdown: bool = None, include_raw_json: bool = None, start_page: int = None) -> OcrDocumentResponse

Run OCR on a document or image. Accepts either file_path (path to a local file) or content (base64-encoded bytes). When using file_path, the file is read and base64-encoded automatically, and format is inferred from the file extension if not provided.
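
When the bytes are already in memory (a download or an upload buffer, for example), the content path can be prepared by hand. The following is a minimal sketch; encode_for_ocr is a hypothetical helper introduced here for illustration and is not part of the SDK.

```python
import base64


def encode_for_ocr(data: bytes) -> str:
    """Return base64 text suitable for the `content` parameter.

    Hypothetical helper, not part of the SDK.
    """
    # b64encode returns bytes; decode to get the str that `content` expects.
    return base64.b64encode(data).decode("ascii")


# Usage (assumes `client` was created as shown at the top of this page
# and `pdf_bytes` holds the raw document bytes):
# result = client.ocr.document(content=encode_for_ocr(pdf_bytes), format="PDF")
```

Because content and file_path are mutually exclusive, pass only one of them in a given call.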

Parameters
  • content (str, optional) — Base64-encoded document bytes. Mutually exclusive with file_path.
  • end_page (int, optional) — 0-based inclusive end page
  • file_path (str, optional) — Path to a local file to OCR. Mutually exclusive with content.
  • format (OcrInputFormat, optional) — Input format hint (AUTO, PDF, TIFF, PNG, JPEG, BMP)
  • include_markdown (bool, optional) — Include markdown rendering in the response
  • include_raw_json (bool, optional) — Include raw OCR JSON payload in the response
  • start_page (int, optional) — 0-based inclusive start page

Returns

OcrDocumentResponse — Returns extracted text and layout information.

Example
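import base64

# Read the image and base64-encode it for the `content` parameter.
with open("image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

result = client.ocr.document(
    content=image_b64,
    format="PNG",
    include_markdown=True,
)
print(f"Pages: {result.page_count}")
for page in result.pages:
    # page.page is optional; skip pages that carry no OCR output.
    if page.page:
        print(page.page.markdown)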
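
Since start_page and end_page are 0-based and inclusive, page numbers as a person would read them off a PDF viewer need shifting by one. A small sketch of that conversion follows; page_range_kwargs and the file name report.pdf are hypothetical and used only for illustration.

```python
def page_range_kwargs(first_page: int, last_page: int) -> dict:
    """Convert 1-based page numbers to start_page/end_page keyword arguments.

    Hypothetical helper, not part of the SDK.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    # Both bounds are 0-based and inclusive, so each shifts down by one.
    return {"start_page": first_page - 1, "end_page": last_page - 1}


# Pages 3 through 5, as numbered in a viewer:
kwargs = page_range_kwargs(3, 5)
print(kwargs)
# result = client.ocr.document(file_path="report.pdf", **kwargs)
```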


Async usage: client.ocr exposes the same methods on AsyncGoodmem; use await / async for as needed.
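
As a rough illustration of the async form, the sketch below runs OCR on several files concurrently. It assumes that client is an AsyncGoodmem instance and that concurrent calls on one client are acceptable, which this page does not state; ocr_files and the file names are hypothetical.

```python
import asyncio


async def ocr_files(client, paths):
    """Run OCR on several files concurrently; results keep the input order.

    `client` is assumed to be an AsyncGoodmem instance. Hypothetical helper.
    """
    # Each call returns an awaitable; gather awaits them all together.
    tasks = [client.ocr.document(file_path=p, include_markdown=True) for p in paths]
    return await asyncio.gather(*tasks)


# Usage sketch (file names are placeholders):
# async def main():
#     client = AsyncGoodmem(base_url='http://localhost:8080', api_key='gm_...')
#     responses = await ocr_files(client, ["a.pdf", "b.png"])
#     for response in responses:
#         print(response.page_count)
#
# asyncio.run(main())
```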


Data Models

All data models are pydantic v2 models. Fields are shown with their Python attribute names; JSON responses use camelCase aliases (e.g., owner_id → ownerId).
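
The naming rule behind the aliases can be sketched in plain Python. This only illustrates the snake_case to camelCase convention; the actual aliases are whatever the models declare, and to_camel here is a hypothetical helper rather than an SDK function.

```python
def to_camel(name: str) -> str:
    """Convert a snake_case attribute name to a camelCase JSON key."""
    head, *rest = name.split("_")
    # The first segment stays lowercase; later segments are capitalized.
    return head + "".join(part.capitalize() for part in rest)


print(to_camel("page_count"))  # pageCount
```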

OcrInputFormat

String enum: "AUTO" · "PDF" · "TIFF" · "PNG" · "JPEG" · "BMP"
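
When passing content rather than file_path, no extension is available for inference, so the caller chooses the format. One way to pick a value from a file name is sketched below. The extension table is an assumption made for this example; the SDK's own inference rules may differ.

```python
from pathlib import Path

# Hypothetical extension table; the SDK's own inference rules may differ.
_EXTENSION_TO_FORMAT = {
    ".pdf": "PDF",
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".png": "PNG",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".bmp": "BMP",
}


def guess_format(path: str) -> str:
    """Return an OcrInputFormat value for a file name, or "AUTO" if unknown."""
    return _EXTENSION_TO_FORMAT.get(Path(path).suffix.lower(), "AUTO")


print(guess_format("scan.TIF"))  # TIFF
```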

OcrDocumentResponse

Response containing page-ordered OCR results.

  • detected_format (OcrInputFormat) — Detected input format
  • page_count (int) — Number of pages processed after applying the range
  • pages (list[OcrPageResult]) — Ordered per-page OCR results
  • timings (DocumentTimings) — Aggregate timing statistics

OcrPageResult

Per-page OCR result containing output or error status.

  • page_index (int) — 0-based page index
  • page (OcrPage, optional) — OCR output for the page
  • status (RpcStatus, optional) — Error status for the page
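
Because both page and status are optional, callers usually need to separate pages that produced output from pages that failed. A minimal sketch, assuming a result carries either page or status; split_pages is a hypothetical helper that works on any object shaped like OcrDocumentResponse.

```python
def split_pages(response):
    """Separate successful pages from failed ones.

    Returns (ok, failed): `ok` is a list of (page_index, page) pairs and
    `failed` is a list of (page_index, status) pairs.
    """
    ok, failed = [], []
    for result in response.pages:
        if result.page is not None:
            ok.append((result.page_index, result.page))
        else:
            failed.append((result.page_index, result.status))
    return ok, failed


# Usage:
# ok, failed = split_pages(result)
# for index, status in failed:
#     print(f"page {index} failed: {status.message if status else 'unknown'}")
```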

DocumentTimings

Aggregate timing statistics for an OCR request.

  • wall_time_ms (int) — End-to-end request time (ms)
  • sum_queue_wait_ms (int) — Sum of per-page queue wait times (ms)
  • sum_render_ms (int) — Sum of per-page render times (ms)
  • sum_ocr_ms (int) — Sum of per-page OCR times (ms)
  • sum_page_total_ms (int) — Sum of per-page total times (ms)
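
A few derived figures can make these fields easier to read. The sketch below is one possible interpretation, not something the SDK provides: timing_summary is hypothetical, and reading an overlap value above 1 as a sign of concurrent page processing is an assumption.

```python
def timing_summary(timings, page_count: int) -> dict:
    """Derive simple ratios from a DocumentTimings-like object."""
    pages = max(page_count, 1)  # avoid dividing by zero on empty results
    total = timings.sum_page_total_ms
    wall = timings.wall_time_ms
    return {
        # Mean per-page processing time in milliseconds.
        "avg_page_ms": total / pages,
        # Fraction of summed page time spent in OCR itself.
        "ocr_share": timings.sum_ocr_ms / total if total else 0.0,
        # Summed page time relative to wall time.
        "overlap": total / wall if wall else 0.0,
    }


# Usage:
# print(timing_summary(result.timings, result.page_count))
```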

OcrPage

OCR output for a single page.

  • raw_json (str, optional) — Raw OCR JSON payload when requested
  • markdown (str, optional) — Markdown rendering when requested
  • layout (OcrLayout) — Parsed layout output
  • timings (PageTimings) — Timing breakdown for the page
  • image (ImageInfo) — Rendered image metadata

RpcStatus

Status payload for per-page OCR errors.

  • code (int) — gRPC status code as defined by google.rpc.Code
  • message (str, optional) — Human-readable error message
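
For logging, the integer code can be mapped back to its google.rpc.Code name. The sketch below lists the canonical codes in a local enum; RpcCode and describe_status are names introduced for this example and are not part of the SDK.

```python
from enum import IntEnum


class RpcCode(IntEnum):
    """Canonical status codes as defined by google.rpc.Code."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


def describe_status(status) -> str:
    """Format an RpcStatus-like object as 'NAME: message'."""
    try:
        name = RpcCode(status.code).name
    except ValueError:
        # Fall back to the raw number for codes outside the canonical set.
        name = f"CODE_{status.code}"
    return f"{name}: {status.message or 'no message'}"
```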

OcrLayout

Parsed OCR layout output for a page.

  • cells (list[OcrCell]) — Layout cells in reading order
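
Since cells arrive in reading order, plain text for a page can be recovered by joining them in sequence. A minimal sketch; layout_text is a hypothetical helper, and the blank-line separator is a choice made for this example.

```python
def layout_text(layout, separator: str = "\n\n") -> str:
    """Join the text of all non-empty cells, preserving reading order."""
    return separator.join(cell.text for cell in layout.cells if cell.text)


# Usage:
# for result in response.pages:
#     if result.page:
#         print(layout_text(result.page.layout))
```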

PageTimings

Per-page timing breakdown for OCR processing.

  • queue_wait_ms (int) — Time spent waiting in the render queue (ms)
  • render_ms (int) — Time spent rendering the page image (ms)
  • ocr_ms (int) — Time spent running OCR (ms)
  • total_ms (int) — Total page processing time (ms)
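
Per-page timings are handy for spotting outliers in a long document. The sketch below finds the slowest page among those that produced output; slowest_page is a hypothetical helper that takes the pages list of a response.

```python
def slowest_page(pages):
    """Return (page_index, total_ms) for the slowest successful page, or None."""
    timed = [
        (result.page_index, result.page.timings.total_ms)
        for result in pages
        if result.page is not None  # failed pages carry no timings
    ]
    return max(timed, key=lambda item: item[1], default=None)


# Usage:
# print(slowest_page(response.pages))
```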

ImageInfo

Rendered image metadata for an OCR page.

  • width_px (int) — Rendered image width in pixels
  • height_px (int) — Rendered image height in pixels
  • dpi (int) — Rendering DPI
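
The pixel dimensions and DPI together give the physical size of the rendered page. A short sketch of that arithmetic; physical_size_inches is a hypothetical helper.

```python
def physical_size_inches(image) -> tuple:
    """Return (width, height) in inches for an ImageInfo-like object."""
    # inches = pixels / (pixels per inch)
    return (image.width_px / image.dpi, image.height_px / image.dpi)


# A US Letter page rendered at 200 DPI is 1700 x 2200 pixels:
# physical_size_inches(page.image) would give (8.5, 11.0)
```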

OcrCell

Single OCR layout element.

  • bbox (BoundingBox, optional) — Bounding box in page coordinates
  • category_label (str) — Raw category label emitted by OCR
  • category (OcrCategory) — Normalized OCR category
  • text (str) — OCR text content

BoundingBox

Bounding box coordinates in page space.

  • x1 (float) — Left coordinate
  • y1 (float) — Top coordinate
  • x2 (float) — Right coordinate
  • y2 (float) — Bottom coordinate
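
With (x1, y1) as the top-left corner and (x2, y2) as the bottom-right, the usual box geometry follows directly. The helpers below are hypothetical and make no assumption about the units of page space, which this page does not specify.

```python
def bbox_size(bbox) -> tuple:
    """Return (width, height) of a BoundingBox-like object."""
    return (bbox.x2 - bbox.x1, bbox.y2 - bbox.y1)


def bbox_center(bbox) -> tuple:
    """Return the (x, y) midpoint of a BoundingBox-like object."""
    return ((bbox.x1 + bbox.x2) / 2, (bbox.y1 + bbox.y2) / 2)


# Usage:
# for cell in page.layout.cells:
#     if cell.bbox:  # bbox is optional on OcrCell
#         print(cell.category, bbox_size(cell.bbox))
```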

OcrCategory

String enum: "UNSPECIFIED" · "CAPTION" · "FOOTNOTE" · "FORMULA" · "LIST_ITEM" · "PAGE_FOOTER" · "PAGE_HEADER" · "PICTURE" · "SECTION_HEADER" · "TABLE" · "TEXT" · "TITLE" · "OTHER" · "UNKNOWN"
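
The normalized category makes it straightforward to filter cells, for instance to drop running headers and footers or to build an outline from headings. The sketch below is one way to do that. The choice of which categories count as furniture or headings is made for this example, and the helpers are hypothetical; they accept categories given either as plain strings or as enum members.

```python
# Categories treated as page furniture or headings in this sketch;
# this grouping is an illustrative choice, not something the SDK prescribes.
_FURNITURE = {"PAGE_HEADER", "PAGE_FOOTER"}
_HEADINGS = {"TITLE", "SECTION_HEADER"}


def _category_name(cell) -> str:
    # Works for plain strings and for enum members exposing `.value`.
    return str(getattr(cell.category, "value", cell.category))


def body_cells(cells) -> list:
    """Return cells that are not running page headers or footers."""
    return [c for c in cells if _category_name(c) not in _FURNITURE]


def outline(cells) -> list:
    """Return the text of title and section-header cells, in reading order."""
    return [c.text for c in cells if _category_name(c) in _HEADINGS]
```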