OCR
Methods on this page are called as client.ocr.<method>(...) where client is either a synchronous Goodmem or asynchronous AsyncGoodmem instance initialized below:
from goodmem import Goodmem
client = Goodmem(base_url='http://localhost:8080', api_key='gm_...')

from goodmem import AsyncGoodmem
client = AsyncGoodmem(base_url='http://localhost:8080', api_key='gm_...')

Run OCR on a document or image
Run OCR on a document or image. Accepts either file_path (path to a local file) or content (base64-encoded bytes). When using file_path, the file is read and base64-encoded automatically, and format is inferred from the file extension if not provided.
- content (str, optional) — Base64-encoded document bytes. Mutually exclusive with file_path.
- end_page (int, optional) — 0-based inclusive end page
- file_path (str, optional) — Path to a local file to OCR. Mutually exclusive with content.
- format (OcrInputFormat, optional) — Input format hint (AUTO, PDF, TIFF, PNG, JPEG, BMP)
- include_markdown (bool, optional) — Include markdown rendering in the response
- include_raw_json (bool, optional) — Include raw OCR JSON payload in the response
- start_page (int, optional) — 0-based inclusive start page
OcrDocumentResponse — Returns extracted text and layout information.
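When the bytes do not come from a local path, the work that file_path does automatically can be approximated by hand. The sketch below is not the SDK's implementation: encode_for_ocr and the _EXT_TO_FORMAT table are hypothetical names, and the extension mapping is an assumption based on the OcrInputFormat values listed on this page.

```python
import base64
from pathlib import Path

# Assumed extension-to-format mapping; the SDK's own table may differ.
_EXT_TO_FORMAT = {
    ".pdf": "PDF",
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".png": "PNG",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".bmp": "BMP",
}


def encode_for_ocr(path):
    """Return (content, format) for client.ocr.document(content=..., format=...)."""
    p = Path(path)
    # The content parameter expects base64 text, not raw bytes.
    content = base64.b64encode(p.read_bytes()).decode("ascii")
    # Unknown extensions fall back to the AUTO hint (assumed to defer
    # format detection to the service).
    fmt = _EXT_TO_FORMAT.get(p.suffix.lower(), "AUTO")
    return content, fmt
```

The returned pair can then be passed as client.ocr.document(content=content, format=fmt). Pass either content or file_path, never both.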
import base64

# Read the image and base64-encode it for the content parameter.
with open("image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

result = client.ocr.document(
    content=image_b64,
    format="PNG",
    include_markdown=True,
)
print(f"Pages: {result.page_count}")
for page in result.pages:
    # page.page is optional; a page that failed carries page.status instead.
    if page.page:
        print(page.page.markdown)

Async usage: client.ocr exposes the same methods on AsyncGoodmem; use await / async for as needed.
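With the asynchronous client, a call might look like the sketch below. It assumes that document on AsyncGoodmem is a coroutine accepting the same keyword arguments as the synchronous method; ocr_markdown is a hypothetical helper, not part of the SDK.

```python
import asyncio


async def ocr_markdown(client, file_path):
    """Run OCR on a local file and return the markdown of each successful page."""
    # Assumption: AsyncGoodmem.ocr.document is awaitable and takes the same
    # keyword arguments as the synchronous method documented above.
    result = await client.ocr.document(
        file_path=file_path,
        include_markdown=True,
    )
    # Skip pages whose optional page field is unset.
    return [entry.page.markdown for entry in result.pages if entry.page]


# Usage (needs a reachable server and an AsyncGoodmem client):
#   pages = asyncio.run(ocr_markdown(client, "scan.pdf"))
```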
Data Models
All data models are pydantic v2 models. Fields are shown with their Python attribute names; JSON responses use camelCase aliases (e.g., owner_id → ownerId).
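The naming convention can be illustrated in plain Python. to_camel is a hypothetical helper written for this example only; it shows the snake_case to camelCase pattern and is not claimed to be how the SDK generates its aliases.

```python
def to_camel(name):
    """Convert a snake_case attribute name to a camelCase JSON key."""
    head, *rest = name.split("_")
    # The first segment stays lowercase; later segments are capitalized.
    return head + "".join(part.capitalize() for part in rest)


print(to_camel("owner_id"))  # ownerId
```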
OcrInputFormat
String enum: "AUTO" · "PDF" · "TIFF" · "PNG" · "JPEG" · "BMP"
OcrDocumentResponse
Response containing page-ordered OCR results.
- detected_format (OcrInputFormat) — Detected input format
- page_count (int) — Number of pages processed after applying the range
- pages (list[OcrPageResult]) — Ordered per-page OCR results
- timings (DocumentTimings) — Aggregate timing statistics
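Because start_page and end_page are both 0-based and inclusive, the number of pages a range selects is end - start + 1. The sketch below works through that arithmetic. pages_in_range is a hypothetical helper, and both the defaulting of omitted bounds and the clamping of an oversized end_page are assumptions, not documented service behavior.

```python
def pages_in_range(total_pages, start_page=None, end_page=None):
    """Count pages selected by a 0-based inclusive [start_page, end_page] range."""
    # Assumption: omitted bounds default to the first and last page.
    start = 0 if start_page is None else start_page
    end = total_pages - 1 if end_page is None else end_page
    # Assumption: an end past the last page is clamped rather than rejected.
    end = min(end, total_pages - 1)
    # An inverted range selects nothing.
    return max(0, end - start + 1)


print(pages_in_range(10, start_page=2, end_page=4))  # 3
```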
OcrPageResult
Per-page OCR result containing output or error status.
- page_index (int) — 0-based page index
- page (OcrPage, optional) — OCR output for the page
- status (RpcStatus, optional) — Error status for the page
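Since each entry carries either output or an error status, callers will usually want to separate the two. A minimal sketch, assuming an entry without a page value should be treated as failed; split_pages is a hypothetical helper name.

```python
def split_pages(result):
    """Separate the pages of an OcrDocumentResponse into successes and failures."""
    ok, failed = [], []
    for entry in result.pages:
        if entry.page is not None:
            ok.append((entry.page_index, entry.page))
        else:
            # status is optional too, so guard before reading its fields.
            status = entry.status
            code = status.code if status is not None else None
            message = status.message if status is not None else None
            failed.append((entry.page_index, code, message))
    return ok, failed
```

A caller could then log each (page_index, code, message) tuple in failed and continue with the pages in ok.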
DocumentTimings
Aggregate timing statistics for an OCR request.
- wall_time_ms (int) — End-to-end request time (ms)
- sum_queue_wait_ms (int) — Sum of per-page queue wait times (ms)
- sum_render_ms (int) — Sum of per-page render times (ms)
- sum_ocr_ms (int) — Sum of per-page OCR times (ms)
- sum_page_total_ms (int) — Sum of per-page total times (ms)
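One way to read these aggregates is as shares of the summed page time. The sketch below uses a hypothetical helper, timing_summary. Treating a sum_page_total_ms to wall_time_ms ratio above 1 as a sign of concurrent page processing is an interpretation, not something this page documents.

```python
def timing_summary(t):
    """Summarize a DocumentTimings-like object as shares of summed page time."""
    total = t.sum_page_total_ms
    if total <= 0:
        return {}
    return {
        "queue_wait_share": t.sum_queue_wait_ms / total,
        "render_share": t.sum_render_ms / total,
        "ocr_share": t.sum_ocr_ms / total,
        # Interpretation only: a value above 1 suggests pages overlapped in time.
        "page_time_over_wall_time": (
            total / t.wall_time_ms if t.wall_time_ms > 0 else None
        ),
    }
```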
OcrPage
OCR output for a single page.
- raw_json (str, optional) — Raw OCR JSON payload when requested
- markdown (str, optional) — Markdown rendering when requested
- layout (OcrLayout) — Parsed layout output
- timings (PageTimings) — Timing breakdown for the page
- image (ImageInfo) — Rendered image metadata
RpcStatus
Status payload for per-page OCR errors.
- code (int) — gRPC status code as defined by google.rpc.Code
- message (str, optional) — Human-readable error message
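The integer in code can be turned into a readable name using the values defined by google.rpc.Code. The enum below reproduces those standard values locally; RpcCode and describe_status are hypothetical names for this sketch and are not exported by the SDK.

```python
from enum import IntEnum


class RpcCode(IntEnum):
    """Canonical status codes as defined by google.rpc.Code."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


def describe_status(status):
    """Format an RpcStatus-like object as 'NAME: message'."""
    try:
        name = RpcCode(status.code).name
    except ValueError:
        # Keep unrecognized codes visible instead of failing.
        name = f"CODE_{status.code}"
    return f"{name}: {status.message}" if status.message else name
```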
OcrLayout
Parsed OCR layout output for a page.
- cells (list[OcrCell]) — Layout cells in reading order
PageTimings
Per-page timing breakdown for OCR processing.
- queue_wait_ms (int) — Time spent waiting in the render queue (ms)
- render_ms (int) — Time spent rendering the page image (ms)
- ocr_ms (int) — Time spent running OCR (ms)
- total_ms (int) — Total page processing time (ms)
ImageInfo
Rendered image metadata for an OCR page.
- width_px (int) — Rendered image width in pixels
- height_px (int) — Rendered image height in pixels
- dpi (int) — Rendering DPI
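Pixel dimensions and DPI together give the physical size of the rendered page, since inches = pixels / dpi. A small sketch of that arithmetic follows; physical_size_inches is a hypothetical helper, not an SDK function.

```python
def physical_size_inches(image):
    """Return (width_in, height_in) for an ImageInfo-like object."""
    if image.dpi <= 0:
        raise ValueError("dpi must be positive")
    return image.width_px / image.dpi, image.height_px / image.dpi
```

For example, a 1700 x 2200 pixel render at 200 DPI corresponds to 8.5 x 11 inches.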
OcrCell
Single OCR layout element.
- bbox (BoundingBox, optional) — Bounding box in page coordinates
- category_label (str) — Raw category label emitted by OCR
- category (OcrCategory) — Normalized OCR category
- text (str) — OCR text content
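Because cells arrive in reading order, plain text can be rebuilt by joining their text fields and filtering on category. The sketch below drops running headers and footers. body_text is a hypothetical helper and the choice of skipped categories is only an example. It accepts category as either a plain string or an enum member, since this page does not say which form the model exposes.

```python
DEFAULT_SKIP = frozenset({"PAGE_HEADER", "PAGE_FOOTER"})


def body_text(page, skip=DEFAULT_SKIP):
    """Join the text of an OcrPage's cells, omitting the skipped categories."""
    parts = []
    for cell in page.layout.cells:
        # Works whether category is a str or an enum with a .value attribute.
        category = str(getattr(cell.category, "value", cell.category))
        if category in skip:
            continue
        if cell.text:
            parts.append(cell.text)
    return "\n".join(parts)
```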
BoundingBox
Bounding box coordinates in page space.
- x1 (float) — Left coordinate
- y1 (float) — Top coordinate
- x2 (float) — Right coordinate
- y2 (float) — Bottom coordinate
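With (x1, y1) as the top-left corner and (x2, y2) as the bottom-right, width and height follow by subtraction. The helpers below are hypothetical and make no assumption about the unit of page coordinates, which this page does not specify.

```python
def bbox_size(box):
    """Return (width, height) of a BoundingBox-like object."""
    return box.x2 - box.x1, box.y2 - box.y1


def bbox_area(box):
    """Return the area of the box, treating inverted boxes as empty."""
    width, height = bbox_size(box)
    return max(0.0, width) * max(0.0, height)
```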
OcrCategory
String enum: "UNSPECIFIED" · "CAPTION" · "FOOTNOTE" · "FORMULA" · "LIST_ITEM" · "PAGE_FOOTER" · "PAGE_HEADER" · "PICTURE" · "SECTION_HEADER" · "TABLE" · "TEXT" · "TITLE" · "OTHER" · "UNKNOWN"
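A quick way to inspect what the OCR found on a page is to tally its cells by normalized category. A minimal sketch with a hypothetical helper, category_counts, again accepting category as either a string or an enum member:

```python
from collections import Counter


def category_counts(page):
    """Count the cells of an OcrPage by normalized category name."""
    return Counter(
        str(getattr(cell.category, "value", cell.category))
        for cell in page.layout.cells
    )
```

Calling category_counts(page).most_common() lists the categories from most to least frequent.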