The Document Resource is used to access the most basic element in Stax.ai: a document. The Document Resource contains all metadata for the document.
Resource Schema
{
"_id": "", // [String] 24-char unique document ID
"team": "", // [String] 24-char unique team ID
"stack": "", // [String] 24-char unique stack ID
"name": "", // [String] Document name
"thumbnail": "", // [String] Base-64 encoded string of the thumbnail of the first page
"source": "", // [String] Document source (uploaded, email address, fax number)
"reviewed": false, // [Boolean] True if document has been viewed
"archived": false, // [Boolean] True if document is archived
"urgent": false, // [Boolean] True if document is of high priority
"deadline": null, // [ISODate] Deadline date-time if exists
"flagged": false, // [Boolean] True if flagged by a module or worker
"assignee": "", // [String] 24-char unique user ID
"checksum": "", // [String] SHA-512 checksum of extracted document content
"original" : "", // [String] Document ID of original if this is a duplicate
"keywords": [ "", ... ], // [List.String] Extracted keywords if applicable
"receivedOn": ISODate(), // [ISODate] When this document was received
"lastModified": ISODate(), // [ISODate] When this document was last modified
// List of page access keys
"pages": [
"", // [String] Page access key (1st page)
...
],
// Extracted words
"words": [
{
"x": 0.0, // [Double] Horizontal position of top-left corner of word relative to page width
"y": 0.0, // [Double] Vertical position of top-left corner of word relative to page height
"width": 0.0, // [Double] Width of word relative to page width
"height": 0.0, // [Double] Height of word relative to page height
"angle": 0.0, // [Double] Angle of word in degrees (counterclockwise positive)
"page": 0, // [Integer] Page index
"group": 0, // [Integer] Group index within page
"line": 0, // [Integer] Line index within group
"word": 0, // [Integer] Word index within line
"class": 0, // [Integer] Class index (paragraph, abstract, etc.)
"text": "", // [String] Word text as a string
},
...
],
// Document "Fields" (metadata)
"metadata": [
{
"key": "", // [String] Field key
"value": "", // [String] Field value
"tags": [ 0, ... ], // [List.Integer] List of word indices
"pages": [ 0, ... ], // [List.Integer] List of pages the value is in
"confidence": 1.0, // [Double] Prediction confidence (0-1) if applicable
},
...
],
// Extracted/labeled bounding boxes
"bounds": [
{
"label": "", // [String] Label or class
"data": {}, // [Object] Any attached data
"page": 0, // [Integer] Page this box is in
"x": 0.0, // [Double] Horizontal position of top-left corner relative to the page width
"y": 0.0, // [Double] Vertical position of top-left corner relative to the page height
"width": 0.0, // [Double] Width relative to the page width
"height": 0.0, // [Double] Height relative to the page height
},
...
],
// Files attached to document
"attachments": [
{
"filename": "", // [String] Attachment file name
"document": "", // [String] Document ID if the attachment is a Stax.ai document
"filepath": "", // [String] Access key for downloading the file if it is a true attachment
},
...
],
// User notes on the document
"notes": [
{
"page": 0, // [Integer] Page index
"tags": [ "", ... ], // [List.String] User IDs of tagged users
"x": 0.0, // [Double] Horizontal position of note relative to page width
"y": 0.0, // [Double] Vertical position of note relative to page height
// Comment thread is an array of objects
"thread": [
{
"text": "", // [String] Comment
"user": "", // [String] User ID
"timestamp": ISODate(), // [ISODate] When this comment was posted
},
...
]
},
...
],
// Sharing configuration
"sharing": {
"public": false, // [Boolean] True if document can be accessed publicly
// Password protection
"protected": {
"emails": [ "", ... ], // [List.String] List of emails with access to file
"teams": [ "", ... ], // [List.String] List of IDs of teams with access
"password": "", // [String] Password to access document
},
// Log of protected access by email address
"access": [
{
"email": "", // [String] Email address
"usage": 0, // [Integer] Number of times accessed with email
},
...
]
},
// Internal document processing job data
"job": {
"working": false, // [Boolean] True if actively being processed
"success": true, // [Boolean] True if job processing is successful
"status": "sort", // [String] Step of job processing pipeline (one of preproc, read, or sort)
"error": "", // [String] Error message if job processing failed
... // Other internal variables - IGNORE
},
// History of document modifications
"time": [
{
"event": "", // [String] Description of modification event
"stamp": ISODate(), // [ISODate] Modification timestamp
"user": "", // [String] User ID of user that made the modification
},
...
]
}
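The `page`, `group`, `line`, and `word` indices on each entry in `words` are enough to rebuild readable text from a document. The sketch below is a minimal Python illustration, assuming the Document Resource has already been retrieved and parsed into a plain `dict` (retrieval is not covered here) and assuming that sorting by these indices approximates reading order. `page_text` is a hypothetical helper, not part of any Stax.ai SDK.

```python
from collections import defaultdict


def page_text(doc: dict) -> dict[int, str]:
    """Hypothetical helper: rebuild plain text per page from doc["words"].

    Words are ordered by (group, line, word) within each page. Each line
    becomes one text line; words within a line are joined with spaces.
    """
    # page -> (group, line) -> list of (word index, text)
    lines = defaultdict(lambda: defaultdict(list))
    for w in doc.get("words", []):
        lines[w["page"]][(w["group"], w["line"])].append((w["word"], w["text"]))

    result = {}
    for page in sorted(lines):
        rendered = []
        for key in sorted(lines[page]):
            ordered = sorted(lines[page][key])  # sort by word index
            rendered.append(" ".join(text for _, text in ordered))
        result[page] = "\n".join(rendered)
    return result


# Usage: word objects trimmed to the keys the helper reads.
doc = {"words": [
    {"page": 0, "group": 0, "line": 1, "word": 0, "text": "Total"},
    {"page": 0, "group": 0, "line": 0, "word": 1, "text": "Invoice"},
    {"page": 0, "group": 0, "line": 0, "word": 0, "text": "ACME"},
]}
print(page_text(doc))  # {0: 'ACME Invoice\nTotal'}
```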
Fields (Metadata)
Document fields are stored in the metadata array inside the Document resource. Fields are key-value pairs that can be labeled manually by users, automatically from extracted information, or programmatically.
Each field object has string key and value properties. It also has a tags property: a list of integers giving the indices of the words (see the "words" property) that make up the field value.
The pages property is an array of integers giving the page indices (0-indexed) on which the field value appears. When the field was predicted, the confidence property holds the prediction confidence between 0 and 1.
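Because `tags` holds indices into the document's `words` array, the words behind a field can be recovered directly. A minimal sketch, assuming the document is already a parsed `dict`; `tagged_text` and `tagged_pages` are hypothetical helpers. The joined text is not guaranteed to equal `value` exactly (for example, if a user edited the value), so treat it as a cross-check rather than a replacement.

```python
def tagged_text(doc: dict, field: dict) -> str:
    """Hypothetical helper: join the text of the words a field's tags point to."""
    words = doc.get("words", [])
    return " ".join(words[i]["text"] for i in field.get("tags", []))


def tagged_pages(doc: dict, field: dict) -> list[int]:
    """Hypothetical helper: sorted, de-duplicated pages of the tagged words."""
    words = doc.get("words", [])
    return sorted({words[i]["page"] for i in field.get("tags", [])})


# Usage: word objects trimmed to the keys the helpers read.
doc = {"words": [
    {"page": 0, "text": "Due"},
    {"page": 0, "text": "2024-01-31"},
    {"page": 1, "text": "Net"},
    {"page": 1, "text": "30"},
]}
field = {"key": "Terms", "value": "Net 30", "tags": [2, 3], "pages": [1]}
print(tagged_text(doc, field))   # Net 30
print(tagged_pages(doc, field))  # [1]
```

Comparing `tagged_pages` with the field's own `pages` is a cheap consistency check for fields that have tags.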
The general schema of a field object is as follows:
{
"key": "", // [String] Field key
"value": "", // [String] Field value
"tags": [ 0, ... ], // [List.Integer] List of word indices
"pages": [ 0, ... ], // [List.Integer] List of pages the value is in
"confidence": 1.0, // [Double] Prediction confidence (0-1) if applicable
}
Fields with duplicate keys
Stax.ai supports multiple fields with the same key. This allows the same key to appear on different pages, or the same data to be labeled more than once. It can also be used to store a list of values under one key by adding one field per value, each with that key.
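Since duplicate keys are allowed, a consumer that wants one list of values per key has to group the `metadata` array itself. A minimal sketch in Python; `fields_by_key` and `add_list_field` are hypothetical helpers, and the empty `tags` and `pages` on newly added fields are an assumption for values that are not tied to specific words. The helpers only modify an in-memory `dict`; saving changes back to Stax.ai is out of scope.

```python
from collections import defaultdict


def fields_by_key(doc: dict) -> dict[str, list[str]]:
    """Hypothetical helper: collect every value under each key, in array order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for field in doc.get("metadata", []):
        grouped[field["key"]].append(field["value"])
    return dict(grouped)


def add_list_field(doc: dict, key: str, values: list[str]) -> None:
    """Hypothetical helper: store a list as repeated fields sharing one key.

    Assumption: new fields carry no word tags or pages.
    """
    metadata = doc.setdefault("metadata", [])
    for value in values:
        metadata.append({"key": key, "value": value, "tags": [], "pages": []})


# Usage
doc = {"metadata": []}
add_list_field(doc, "PO Number", ["1001", "1002"])
doc["metadata"].append({"key": "Vendor", "value": "ACME", "tags": [], "pages": []})
print(fields_by_key(doc))  # {'PO Number': ['1001', '1002'], 'Vendor': ['ACME']}
```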
Bounding Boxes
Bounding boxes are used to label non-textual information in a document, such as signatures, images, figures, and fields. Bounding boxes are also used to label form fields for the static form data extraction modules.
The general schema of a bounding box object is as follows:
{
"label": "", // [String] Label or class
"data": {}, // [Object] Any attached data
"page": 0, // [Integer] Page this box is in
"x": 0.0, // [Double] Horizontal position of top-left corner relative to the page width
"y": 0.0, // [Double] Vertical position of top-left corner relative to the page height
"width": 0.0, // [Double] Width relative to the page width
"height": 0.0, // [Double] Height relative to the page height
}
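Bounding boxes and words share the same relative coordinate system (positions and sizes as fractions of the page width and height), so the words covered by a box can be found without knowing the page's pixel dimensions. A minimal sketch, assuming a word belongs to a box when the word's center lies inside it; that rule is a choice made for this example, and the word `angle` is ignored. `words_in_box` is a hypothetical helper.

```python
def words_in_box(doc: dict, box: dict) -> list[str]:
    """Hypothetical helper: texts of words on the box's page whose center is inside it.

    Uses relative (0-1) coordinates for both words and box; ignores word angle.
    """
    x0, y0 = box["x"], box["y"]
    x1, y1 = x0 + box["width"], y0 + box["height"]
    found = []
    for w in doc.get("words", []):
        if w["page"] != box["page"]:
            continue
        cx = w["x"] + w["width"] / 2   # word center, relative to page width
        cy = w["y"] + w["height"] / 2  # word center, relative to page height
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            found.append(w["text"])
    return found


# Usage: word objects trimmed to the keys the helper reads.
doc = {"words": [
    {"page": 0, "x": 0.10, "y": 0.80, "width": 0.10, "height": 0.02, "text": "Signed:"},
    {"page": 0, "x": 0.60, "y": 0.10, "width": 0.10, "height": 0.02, "text": "Invoice"},
]}
box = {"label": "signature", "data": {}, "page": 0,
       "x": 0.05, "y": 0.75, "width": 0.40, "height": 0.10}
print(words_in_box(doc, box))  # ['Signed:']
```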
Customizable data for bounding boxes
Each bounding box has a data property where you can store any JSON-formatted data. This can be used to hold information regarding extracted images, signatures, etc.
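Because `data` must hold JSON-formatted data, it can be worth checking serializability before attaching a box to a document. A minimal sketch; `make_bound` is a hypothetical helper, the keys inside `data` in the usage (`signer`, `verified`) are made up for illustration, and rejecting boxes that extend past the page edge is an assumption rather than a documented Stax.ai rule.

```python
import json


def make_bound(label, page, x, y, width, height, data=None):
    """Hypothetical helper: build a bounding box object with validated fields.

    Raises ValueError if the box leaves the page (assumed rule) or if
    `data` is not strict JSON (e.g. sets, NaN).
    """
    data = {} if data is None else data
    if page < 0:
        raise ValueError("page index must be non-negative")
    if not (x >= 0 and y >= 0 and width >= 0 and height >= 0
            and x + width <= 1.0 and y + height <= 1.0):
        raise ValueError("box must lie within the page (relative 0-1 coordinates)")
    try:
        json.dumps(data, allow_nan=False)  # reject values JSON cannot represent
    except (TypeError, ValueError) as exc:
        raise ValueError(f"data is not JSON-serializable: {exc}") from exc
    return {"label": label, "data": data, "page": page,
            "x": x, "y": y, "width": width, "height": height}


# Usage: the keys inside `data` are illustrative only.
sig = make_bound("signature", 0, 0.05, 0.75, 0.40, 0.10,
                 data={"signer": "J. Doe", "verified": False})
print(sig["data"])  # {'signer': 'J. Doe', 'verified': False}
```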
Attachments
Each Stax.ai document can have one or more attached files. An attachment can either be another Stax.ai document (in which case the attachment is just a link) or an actual file attached to the document.
To link another Stax.ai document as an attachment, set the attachment's document property to that document's ID. If the attachment is an actual file, its filepath property can be used to download it. Stax.ai does not currently support manually uploading attachments to documents.
The general schema of an attachment object is:
{
"filename": "", // [String] Attachment file name
"document": "", // [String] Document ID if the attachment is a Stax.ai document
"filepath": "", // [String] Access key for downloading the file if it is a true attachment
}
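A consumer typically needs to treat the two kinds of attachments differently: follow the link for Stax.ai documents, and download the file for true attachments. A minimal sketch; `split_attachments` is hypothetical, it treats an attachment with a non-empty `document` as a link even if `filepath` is also set (an assumption), and it skips entries that have neither. Downloading via `filepath` is not shown because the download mechanism is not described in this reference.

```python
def split_attachments(doc: dict) -> tuple[list[str], list[dict]]:
    """Hypothetical helper: separate linked Stax.ai documents from file attachments.

    Returns (linked document IDs, file attachment objects). Entries with
    neither a document ID nor a filepath are skipped.
    """
    linked, files = [], []
    for att in doc.get("attachments", []):
        if att.get("document"):      # assumption: a document ID means "link"
            linked.append(att["document"])
        elif att.get("filepath"):    # true file attachment
            files.append(att)
    return linked, files


# Usage: the ID and access key below are placeholders.
doc = {"attachments": [
    {"filename": "contract.pdf", "document": "0123456789abcdef01234567", "filepath": ""},
    {"filename": "photo.jpg", "document": "", "filepath": "placeholder-access-key"},
]}
linked, files = split_attachments(doc)
print(linked)                          # ['0123456789abcdef01234567']
print([f["filename"] for f in files])  # ['photo.jpg']
```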
Additional resource properties
The Document Resource has additional properties that are not detailed in this reference because they are intended only for internal use by the Stax.ai portal.