ODF
Supported formats
ODT (OpenDocument Text)
Description
ODF (Open Document Format) is an open, XML-based file format standard for office documents. This backend currently supports only text documents (.odt).
Info: Available in Contextal Platform 1.0 and later.
Features
All text and images are extracted from the document for further processing. Extracted images additionally undergo OCR.
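To make the extraction step more concrete, here is a minimal Python sketch of pulling paragraph text and embedded images out of an .odt file. It is not the backend's implementation. It relies on the fact that an .odt file is a ZIP package whose body lives in content.xml, and it assumes that images are stored under Pictures/, which is common but not required. The function name extract_odt is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of text elements (text:p, text:span, ...) in OpenDocument XML.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def extract_odt(data: bytes):
    """Return (paragraphs, image_names) for the raw bytes of an .odt file.

    Illustrative only: headings, tables, and other details are ignored.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
        # Join the text of each paragraph, including nested spans.
        paragraphs = [
            "".join(node.itertext())
            for node in root.iter(f"{{{TEXT_NS}}}p")
        ]
        # Assumption: embedded images are stored under Pictures/.
        images = [
            name for name in zf.namelist()
            if name.startswith("Pictures/") and not name.endswith("/")
        ]
    return paragraphs, images
```

In this sketch the returned paragraphs would correspond to extracted text, and each listed image would be a candidate for OCR.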
Symbols
Object
ODT → the document is a text document
ODS → the document is a spreadsheet document (not supported)
LIMITS_REACHED → limits were triggered while processing the document
Children
TOOBIG → this child object was truncated or not stored because it exceeds the limits
NOT_FOUND → this child object was referenced in the document but was not found in the document container
Example Metadata
{
  "org": "ctx",
  "object_id": "9f2f078cc3eae5d9eea5f4093a1298d51d9e5b29eebf6e08006a956296b2bd34",
  "object_type": "ODF",
  "object_subtype": "ODT",
  "recursion_level": 1,
  "size": 257093,
  "hashes": {
    "md5": "078d243ff98629015376455339133740",
    "sha1": "a9d15e223af670c669110b88b6b2f4375f2bc81c",
    "sha256": "9f2f078cc3eae5d9eea5f4093a1298d51d9e5b29eebf6e08006a956296b2bd34",
    "sha512": "34ba82ce91fb9d26f0a693321af28437c1971b6ffb285f81d4f73dfe5cee82d0dc42e90cbb2590aba82d56398c6cbeba5ee4a29221ff897d3e4221a657d4b4d8"
  },
  "ctime": 1726228508.805513,
  "ok": {
    "symbols": [
      "ODT"
    ],
    "object_metadata": {
      "_backend_version": "1.0.0",
      "manifest_version": "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0",
      "properties": {
        "creation-date": "2024-06-05T14:36:00Z",
        "creator": "John Doe",
        "date": "2024-06-05T14:36:00Z",
        "editing-cycles": "2",
        "editing-duration": "PT0S",
        "generator": "MicrosoftOffice/15.0 MicrosoftWord",
        "initial-creator": "John Doe"
      }
    },
    [...]
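As an illustration of how a consumer might read the fields shown above, the following Python sketch parses a trimmed copy of the example and pulls out a few document properties. The trimming and the variable names are assumptions made for this sketch. Only keys that appear in the example are used.

```python
import json
from datetime import datetime

# Trimmed copy of the example metadata; only the fields used below are kept.
record = json.loads("""
{
  "object_type": "ODF",
  "object_subtype": "ODT",
  "ok": {
    "symbols": ["ODT"],
    "object_metadata": {
      "properties": {
        "creation-date": "2024-06-05T14:36:00Z",
        "editing-cycles": "2",
        "generator": "MicrosoftOffice/15.0 MicrosoftWord"
      }
    }
  }
}
""")

props = record["ok"]["object_metadata"]["properties"]

generator = props.get("generator", "")
# Property values are strings in the example, so convert explicitly.
editing_cycles = int(props.get("editing-cycles", "0"))
# Replace the trailing "Z" so fromisoformat() also works before Python 3.11.
created = datetime.fromisoformat(props["creation-date"].replace("Z", "+00:00"))
is_text_document = "ODT" in record["ok"]["symbols"]
```

Using .get() with defaults is a deliberate choice here, since this page does not state which properties are guaranteed to be present.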
Example Queries
object_type == "ODF"
  && @match_object_meta($properties.generator regex("MicrosoftWord"))
  && @has_child(object_type == "Text"
    && @match_object_meta($natural_language_sentiment.compound < 0)
  )
- This query matches an ODF text document whose generator metadata entry contains the substring MicrosoftWord (case sensitive) and from which a Text object with negative sentiment was extracted.
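The logic of the query can be restated as a plain Python predicate. This is only a rough sketch and does not show how the platform evaluates queries. The function name matches and the flattened dictionary layout (object_metadata directly on each object, children passed as a list) are assumptions made for illustration.

```python
import re


def matches(obj: dict, children: list) -> bool:
    """Hypothetical restatement of the example query over plain dicts."""
    # object_type == "ODF"
    if obj.get("object_type") != "ODF":
        return False

    # $properties.generator regex("MicrosoftWord"), a case-sensitive search
    meta = obj.get("object_metadata", {})
    generator = meta.get("properties", {}).get("generator", "")
    if not isinstance(generator, str) or re.search("MicrosoftWord", generator) is None:
        return False

    # @has_child(...): at least one Text child with negative compound sentiment
    for child in children:
        if child.get("object_type") != "Text":
            continue
        sentiment = child.get("object_metadata", {}).get("natural_language_sentiment", {})
        compound = sentiment.get("compound")
        if isinstance(compound, (int, float)) and compound < 0:
            return True
    return False
```

All three conditions must hold at once: a document from another generator, or one whose Text children all have non-negative sentiment, would not match.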
Configuration Options
max_processed_size → maximum size of the input object that will be processed (default: 262144000)
max_children → maximum number of child objects to create (default: 100)
max_child_output_size → maximum size of a single child object (default: 41943040)
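The default values are easier to read when converted to MiB. The Python sketch below does that conversion and shows one way the three limits could be checked together. It assumes the sizes are byte counts and that a limit is exceeded only when a value is strictly greater than it. Neither assumption is stated on this page, and the helper names to_mib and exceeded_limits are hypothetical.

```python
MIB = 1024 * 1024

# Default values as listed in the configuration options.
DEFAULTS = {
    "max_processed_size": 262144000,
    "max_children": 100,
    "max_child_output_size": 41943040,
}


def to_mib(size: int) -> float:
    """Convert a size to MiB (assumption: sizes are byte counts)."""
    return size / MIB


def exceeded_limits(input_size: int, child_sizes: list, limits: dict = DEFAULTS) -> list:
    """Return the names of the options that the given sizes would exceed.

    Assumption: a limit counts as exceeded only when strictly greater.
    """
    hit = []
    if input_size > limits["max_processed_size"]:
        hit.append("max_processed_size")
    if len(child_sizes) > limits["max_children"]:
        hit.append("max_children")
    if any(size > limits["max_child_output_size"] for size in child_sizes):
        hit.append("max_child_output_size")
    return hit
```

Under the byte assumption, the defaults correspond to 250 MiB for max_processed_size and 40 MiB for max_child_output_size.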