Gzip

Description

Gzip is a compression format particularly popular in Unix environments, but widely supported across different operating systems and applications. This backend can decompress gzip objects for further processing.

Available in Contextal Platform 1.0 and later.

Features

This backend extracts metadata from member headers and supports single and multi-member gzip streams.
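As an illustration of the kind of per-member metadata involved, the sketch below decodes the fixed 10-byte gzip member header described in RFC 1952. It is not the backend's own code: the function name `parse_member_header` is hypothetical, and the dictionary keys are chosen only to mirror the example metadata shown on this page.

```python
import gzip
import io
import struct

# Flag bits of the FLG byte, as defined in RFC 1952.
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 0x01, 0x02, 0x04, 0x08, 0x10


def parse_member_header(data: bytes) -> dict:
    """Decode the fixed 10-byte header of the gzip member at the start of data.

    Hypothetical helper: key names mirror the example metadata, not a real API.
    """
    if len(data) < 10:
        raise ValueError("truncated gzip header")
    # Layout: magic (2 bytes), method, flags, mtime (4 bytes), xfl, os.
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<HBBIBB", data[:10])
    if magic != 0x8B1F or method != 8:
        raise ValueError("not a deflate-compressed gzip member")
    return {
        "extra_flags": xfl,
        "has_comment": bool(flags & FCOMMENT),
        "has_extra": bool(flags & FEXTRA),
        "has_name": bool(flags & FNAME),
        "is_text": bool(flags & FTEXT),
        "os": os_id,
        "ts": mtime,
    }


# Usage: build a member that stores a file name and a fixed timestamp.
buf = io.BytesIO()
with gzip.GzipFile(filename="a.txt", mode="wb", fileobj=buf, mtime=123) as f:
    f.write(b"hello")
header = parse_member_header(buf.getvalue())
```

The optional fields that follow the fixed header (extra data, file name, comment, header CRC) are only flagged here, not parsed.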

Symbols

Object

  • GZIP_MULTI_MEMBER → the stream contains more than one member, as produced by something akin to cat a.gz b.gz > multi.gz (typically seen with compressed logs)
  • GZIP_TRAILING_GARBAGE → extra data exists after the logical end of the stream
  • LIMITS_REACHED → a configured limit was reached while processing the stream
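To make the multi-member and trailing-data conditions concrete, here is a minimal Python sketch that walks a gzip stream member by member using the standard `zlib` module. The function name `scan_gzip` and its return shape are assumptions made for illustration; the backend's actual detection logic may differ.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"


def scan_gzip(data: bytes) -> tuple[bytes, int, bool]:
    """Return (decompressed bytes, member count, trailing-data flag).

    Hypothetical helper, not the backend's implementation.
    """
    chunks, members, rest = [], 0, data
    while rest.startswith(GZIP_MAGIC):
        # wbits = 16 + MAX_WBITS tells zlib to expect a gzip wrapper.
        decomp = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
        try:
            chunk = decomp.decompress(rest)
        except zlib.error:
            break  # looked like a member but did not decode
        if not decomp.eof:
            break  # member is truncated
        chunks.append(chunk)
        members += 1
        rest = decomp.unused_data  # bytes after the end of this member
    # Anything left that is not another complete member counts as trailing data.
    return b"".join(chunks), members, len(rest) > 0


# Usage: concatenating two gzip files yields a two-member stream.
a = gzip.compress(b"first\n")
b = gzip.compress(b"second\n")
multi = scan_gzip(a + b)
garbage = scan_gzip(a + b"junk")
```

In this sketch a stream with more than one member would correspond to GZIP_MULTI_MEMBER, and a true trailing-data flag to GZIP_TRAILING_GARBAGE.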

Children

  • TOOBIG → the stream was not extracted as it exceeds the limits

Example Metadata

{
  "org": "ctx",
  "object_id": "947c0bd817d9f88b318f1eb86825786334e6a831d6fa5b5a8237a77ade202f67",
  "object_type": "Gzip",
  "object_subtype": null,
  "recursion_level": 1,
  "size": 179607,
  "hashes": {
    "sha256": "947c0bd817d9f88b318f1eb86825786334e6a831d6fa5b5a8237a77ade202f67",
    "md5": "064279a8438f6dec06d6fb5c899aa68e",
    "sha1": "4f8e97b80c6a760456ff94dd2b545ccf3210f79b",
    "sha512": "54889c6828c70c729903d63d5411e37bda942b8dade5894ac4e6f1d00a88c9c2d5869cd40e5bce0300b733031162eb4041f0d28403f2b23123266be408e6c59e"
  },
  "ctime": 1713293055.136413,
  "ok": {
    "symbols": [],
    "object_metadata": {
      "_backend_version": "1.0.0",
      "has_comment": false,
      "has_extra": false,
      "has_name": false,
      "members": [
        {
          "extra_flags": 0,
          "has_comment": false,
          "has_extra": false,
          "is_text": false,
          "os": 255,
          "ts": 0
        }
      ],
      "total_members": 1
    },
    "children": [
      {
        "org": "ctx",
        "object_id": "55e153d19cee0f23b7367850f3e5978c480c41c55c2c540dbbebfbaa4970ca81",
        "object_type": "HTML",
        "object_subtype": null,
        "recursion_level": 2,
        "size": 241797,
        "hashes": {
          "md5": "0dbbb837dab2998d4f4937f6e8d6cba7",
          "sha1": "6aa1b7cc4ef34eeb432d64463206e960c5268ccf",
          "sha256": "55e153d19cee0f23b7367850f3e5978c480c41c55c2c540dbbebfbaa4970ca81",
          "sha512": "ae6e2f511c59eb56aefbd2f950ec932cfbc70fbd63fbd82cb2cdf9d6605021e4204c4d4e875247734317987559d519a41bed28e11161ca4754e3474ba8875703"
        },
        "ctime": 1713293055.136413,
        "relation_metadata": {
          "compression_factor": 1.3462559922497452,
          "input_size": 179607,
          "output_size": 241797
        },
        [...]

Example Queries

object_type == "Gzip"
&& @has_child(@match_relation_meta($compression_factor > 10))
  • This matches a Gzip object that has at least one child whose compression factor is greater than 10.
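The example metadata is consistent with compression_factor being output_size divided by input_size (241797 / 179607 ≈ 1.346). That relationship is inferred from the sample values rather than stated by the platform, so treat the Python sketch below as an illustration; the helper names are hypothetical.

```python
def compression_factor(input_size: int, output_size: int) -> float:
    """Ratio of decompressed to compressed size (assumed definition)."""
    if input_size <= 0:
        raise ValueError("input_size must be positive")
    return output_size / input_size


def is_highly_compressed(relation: dict, threshold: float = 10) -> bool:
    """Mirror the query's condition: compression_factor > threshold."""
    factor = compression_factor(relation["input_size"], relation["output_size"])
    return factor > threshold


# Values taken from the relation_metadata in the example metadata.
relation = {"input_size": 179607, "output_size": 241797}
factor = compression_factor(**relation)
matches = is_highly_compressed(relation)
```

With the sample values the factor is well under 10, so the example object would not match the query.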

Configuration Options

  • max_headers → maximum number of member headers that will be processed and stored in metadata (default: 64)
  • max_child_input_size → maximum input (compressed) size of a child object (default: 262144000)
  • max_child_output_size → maximum output (decompressed) size of a child object (default: 262144000)
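An output size limit of this kind is commonly enforced by capping how many bytes the decompressor may produce. The Python sketch below shows one way to do that with the standard `zlib` module, assuming the limit is expressed in bytes; `decompress_bounded` is a hypothetical name and this is not the backend's implementation.

```python
import gzip
import zlib
from typing import Optional


def decompress_bounded(data: bytes, max_output: int) -> Optional[bytes]:
    """Decompress the first gzip member, or return None if the output
    would exceed max_output bytes (hypothetical helper)."""
    decomp = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    # Ask for one byte more than allowed so an overrun can be detected
    # without inflating the whole stream.
    out = decomp.decompress(data, max_output + 1)
    if len(out) > max_output:
        return None  # comparable to a child reported as TOOBIG
    return out


# Usage: 1000 bytes of output fit a 1000-byte limit but not a 999-byte one.
blob = gzip.compress(b"A" * 1000)
within_limit = decompress_bounded(blob, 1000)
over_limit = decompress_bounded(blob, 999)
```

Capping the output this way keeps memory use bounded even for inputs with a very high compression factor.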