ODF

Supported formats

ODT (OpenDocument Text)

Description

ODF (OpenDocument Format) is an open, XML-based file format standard for office documents. This backend currently supports only text documents (.odt).

Info: Available in Contextal Platform 1.0 and later.

Features

All text and images are extracted from the document for further processing. Extracted images additionally undergo OCR.

Symbols

Object

  • ODT → the document is a text document
  • ODS → the document is a spreadsheet document (not supported)
  • LIMITS_REACHED → limits were triggered while processing the document
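
As an illustration of how the ODT and ODS subtypes can be told apart, the sketch below reads the `mimetype` entry that an ODF package (a ZIP archive) carries and maps it to a symbol. This is not the backend's actual implementation; the names `odf_subtype` and `make_package` are hypothetical helpers introduced only for this example.

```python
import io
import zipfile
from typing import Optional

# MIME types defined by the OpenDocument standard for these two document kinds.
ODF_SUBTYPES = {
    "application/vnd.oasis.opendocument.text": "ODT",
    "application/vnd.oasis.opendocument.spreadsheet": "ODS",
}


def odf_subtype(data: bytes) -> Optional[str]:
    """Return "ODT" or "ODS" for an ODF package, or None if unrecognized.

    Hypothetical helper: the real backend may detect subtypes differently.
    """
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as package:
            mimetype = package.read("mimetype").decode("ascii", "replace").strip()
    except (zipfile.BadZipFile, KeyError):
        # Not a ZIP archive, or the archive has no "mimetype" entry.
        return None
    return ODF_SUBTYPES.get(mimetype)


def make_package(mimetype: str) -> bytes:
    """Build a tiny in-memory package for demonstration (not a valid document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", mimetype)
    return buffer.getvalue()


if __name__ == "__main__":
    print(odf_subtype(make_package("application/vnd.oasis.opendocument.text")))
    print(odf_subtype(b"not a zip archive"))
```

The lookup table keeps the mapping explicit, so an unsupported or unknown MIME type simply yields `None` rather than raising.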

Children

  • TOOBIG → this child object was truncated or not stored because it exceeds the limits
  • NOT_FOUND → this child object was referenced in the document but was not found in the document container

Example Metadata

{
  "org": "ctx",
  "object_id": "9f2f078cc3eae5d9eea5f4093a1298d51d9e5b29eebf6e08006a956296b2bd34",
  "object_type": "ODF",
  "object_subtype": "ODT",
  "recursion_level": 1,
  "size": 257093,
  "hashes": {
    "md5": "078d243ff98629015376455339133740",
    "sha1": "a9d15e223af670c669110b88b6b2f4375f2bc81c",
    "sha256": "9f2f078cc3eae5d9eea5f4093a1298d51d9e5b29eebf6e08006a956296b2bd34",
    "sha512": "34ba82ce91fb9d26f0a693321af28437c1971b6ffb285f81d4f73dfe5cee82d0dc42e90cbb2590aba82d56398c6cbeba5ee4a29221ff897d3e4221a657d4b4d8"
  },
  "ctime": 1726228508.805513,
  "ok": {
    "symbols": [
      "ODT"
    ],
    "object_metadata": {
      "_backend_version": "1.0.0",
      "manifest_version": "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0",
      "properties": {
        "creation-date": "2024-06-05T14:36:00Z",
        "creator": "John Doe",
        "date": "2024-06-05T14:36:00Z",
        "editing-cycles": "2",
        "editing-duration": "PT0S",
        "generator": "MicrosoftOffice/15.0 MicrosoftWord",
        "initial-creator": "John Doe"
      }
    },
    [...]
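
A minimal sketch of reading such a record in Python follows. It uses a trimmed copy of the fields shown in the example metadata (the elided parts are left out), and the `summarize` function is a hypothetical helper, not part of the platform.

```python
import json
from datetime import datetime

# Trimmed copy of the example record; only fields shown in the documentation.
record_json = """
{
  "object_type": "ODF",
  "object_subtype": "ODT",
  "size": 257093,
  "ok": {
    "symbols": ["ODT"],
    "object_metadata": {
      "properties": {
        "creation-date": "2024-06-05T14:36:00Z",
        "generator": "MicrosoftOffice/15.0 MicrosoftWord"
      }
    }
  }
}
"""


def summarize(record: dict) -> dict:
    """Collect a few fields of interest, tolerating missing keys."""
    ok = record.get("ok", {})
    props = ok.get("object_metadata", {}).get("properties", {})
    created = props.get("creation-date")
    return {
        "subtype": record.get("object_subtype"),
        "symbols": ok.get("symbols", []),
        "generator": props.get("generator"),
        # The example timestamp uses the "YYYY-MM-DDTHH:MM:SSZ" form.
        "created": datetime.strptime(created, "%Y-%m-%dT%H:%M:%SZ") if created else None,
    }


summary = summarize(json.loads(record_json))

if __name__ == "__main__":
    print(summary)
```

Using `.get` with defaults keeps the helper usable on records where the backend reported no `properties` block.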

Example Queries

object_type == "ODF"
  && @match_object_meta($properties.generator regex("MicrosoftWord"))
  && @has_child(object_type == "Text"
    && @match_object_meta($natural_language_sentiment.compound < 0)
  )
  • This query matches an ODF text document whose generator metadata entry contains the substring MicrosoftWord (case-sensitive) and from which a Text object with negative sentiment was extracted.
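
To make the logic of the query easier to follow, here is a rough Python equivalent operating on plain dictionaries. The data layout is an assumption made for illustration only: in particular, the `children` list and the flattened `object_metadata` key are hypothetical and do not reflect how the platform stores or evaluates objects.

```python
import re


def matches(obj: dict) -> bool:
    """Mirror the three conditions of the example query on a plain dict."""
    # Condition 1: the object itself is an ODF document.
    if obj.get("object_type") != "ODF":
        return False

    # Condition 2: the generator property contains "MicrosoftWord"
    # (re.search is case-sensitive by default).
    generator = obj.get("object_metadata", {}).get("properties", {}).get("generator", "")
    if not re.search("MicrosoftWord", generator):
        return False

    # Condition 3: at least one Text child has a negative compound sentiment.
    # "children" is a hypothetical field used only in this sketch.
    return any(
        child.get("object_type") == "Text"
        and child.get("object_metadata", {})
        .get("natural_language_sentiment", {})
        .get("compound", 0) < 0
        for child in obj.get("children", [])
    )


example = {
    "object_type": "ODF",
    "object_metadata": {"properties": {"generator": "MicrosoftOffice/15.0 MicrosoftWord"}},
    "children": [
        {
            "object_type": "Text",
            "object_metadata": {"natural_language_sentiment": {"compound": -0.4}},
        }
    ],
}

if __name__ == "__main__":
    print(matches(example))
```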

Configuration Options

  • max_processed_size → maximum size of the input object that will be processed (default: 262144000)
  • max_children → maximum number of child objects to create (default: 100)
  • max_child_output_size → maximum size of a single child object (default: 41943040)
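
The sketch below puts the default values in perspective. It assumes the size limits are expressed in bytes, which the list above does not state explicitly, and it shows a hypothetical check of a child's size against `max_child_output_size`. Whether the real backend uses a strict or inclusive comparison is also an assumption.

```python
MIB = 1024 * 1024

# Default values copied from the configuration options above.
DEFAULT_LIMITS = {
    "max_processed_size": 262144000,
    "max_children": 100,
    "max_child_output_size": 41943040,
}


def in_mib(size: int) -> float:
    """Convert a size to MiB, assuming the value is given in bytes."""
    return size / MIB


def child_symbols(child_size: int, limits: dict = DEFAULT_LIMITS) -> list:
    """Hypothetical check: flag a child that exceeds the per-child size limit."""
    # Assumption: a strict "greater than" comparison triggers TOOBIG.
    return ["TOOBIG"] if child_size > limits["max_child_output_size"] else []


if __name__ == "__main__":
    for name in ("max_processed_size", "max_child_output_size"):
        print(name, in_mib(DEFAULT_LIMITS[name]), "MiB")
    print(child_symbols(50 * MIB))
```

Under the bytes assumption, the defaults correspond to 250 MiB for the input object and 40 MiB for each child.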