id: The document ID, also called upload_id. It is a UUID. Use it to fetch the document with GET /documents/{upload_id}. Store the id in your database, because there is currently no GET /documents/list API to list all documents.

status: Once uploaded, the status is UN_PARSED. After a series of processing steps, the status reaches one of the final statuses:

UN_PARSED = 1 file uploaded or collection has no document
# final statuses
ELEMENT_PARSED = 300 analysis of the document has succeeded
ERROR_STATUSES (< 0) error occurred during analysis

Poll GET /documents/{upload_id} at an interval of 10s. Processing generally takes 1-2 minutes to complete. The full list of statuses:

UN_PARSED = 1 file uploaded or collection has no document
LINK_UN_PARSED = 10 file link submitted
PARSING = 12 parsing, mainly used for collection
LINK_DOWNLOADING = 15 file link downloading
PDF_CONVERTING = 20 docx to pdf converting
PDF_CONVERTED = 30 docx to pdf success
TEXT_PARSING = 40 text embedding (when element parsing times out after 2 min)
ELEMENT_PARSING = 50 element embedding
INSIGHT_CALLBACK = 70 element parse success
TEXT_PARSED = 210 text embedding success
ELEMENT_PARSED = 300 element embedding success
TEXT_PARSE_ERROR = -1 text embedding failed
ELEMENT_PARED_ERROR = -2 element embedding failed
PDF_CONVERT_ERROR = -3 docx to pdf failed
LINK_DOWNLOAD_ERROR = -4 file link download failed
EXCEED_SIZE_ERROR = -5 file size exceeds limit
EXCEED_TOKENS_ERROR = -6 token limit exceeded
PAGE_PACKAGE_NOT_ENOUGH_ERROR = -9 page package not enough
PAGE_LIMIT_ERROR = -10 page limit error
TITLE_COMPLETE_ERROR = -11 complete title failed
READ_TMP_FILE_ERROR = -12 read tmp file error
OCR_PAGE_LIMIT_ERROR = -13 ocr page limit error
CONTENT_POLICY_ERROR = -14 content security check did not pass
CONTENT_DECODE_ERROR = -15 file content decode error
HTML_CONVERT_ERROR = -16 html convert error
HTML_EMPTY_BODY_ERROR = -17 content is empty
HTML_PARSE_ERROR = -18 html parse error
HTML_DOWNLOAD_ERROR = -19 html download error from website
PACKAGE_NOT_ENOUGH_ERROR = -25 package not enough
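
The polling described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an official client: `fetch_status` is a hypothetical callable that performs GET /documents/{upload_id} and returns the integer `status` field, and the 300-second timeout is an assumed value. Only the 10-second interval and the definition of the final statuses come from the text.

```python
import time
from typing import Callable

ELEMENT_PARSED = 300  # final success status, taken from the list above


def is_final(status: int) -> bool:
    """Return True for the final statuses named in the text:
    ELEMENT_PARSED (300) or any error status (< 0)."""
    return status == ELEMENT_PARSED or status < 0


def poll_until_final(
    fetch_status: Callable[[], int],   # hypothetical: wraps GET /documents/{upload_id}
    interval_s: float = 10.0,          # interval suggested by the text
    timeout_s: float = 300.0,          # assumption: give up after 5 minutes
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch_status repeatedly until a final status is seen."""
    waited = 0.0
    while True:
        status = fetch_status()
        if is_final(status):
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"document still at status {status} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

A caller would treat a return value of 300 as success and look up any negative value in the error list above. Injecting `sleep` keeps the loop easy to exercise without real delays.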
curl --location --request POST 'https://dev.your-api-server.com/documents/website' \
--header 'Content-Type: application/json' \
--data-raw '{
    "website": "http://example.com",
    "collection_id": "stringstringstringstringstringstring"
}'
{
    "id": "string",
    "status": 1,
    "name": "string",
    "created_at": 0,
    "type": "collection"
}
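
For readers who prefer Python to curl, the same request might be built as shown below. This is a sketch using only the standard library. The host is the placeholder from the curl example, the function names `build_website_request` and `submit_website` are hypothetical, and any authentication the server requires is not shown in the source and is therefore omitted here.

```python
import json
import urllib.request

# Placeholder host copied from the curl example; replace with your server.
BASE_URL = "https://dev.your-api-server.com"


def build_website_request(
    website: str, collection_id: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) the POST /documents/website request."""
    body = json.dumps(
        {"website": website, "collection_id": collection_id}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/documents/website",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_website(website: str, collection_id: str) -> dict:
    """Send the request and return the decoded JSON response body."""
    req = build_website_request(website, collection_id)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The `id` field of the returned dictionary is the upload_id to use with GET /documents/{upload_id}, and `status` starts at 1 (UN_PARSED) as in the sample response.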