Data Sources

List Data Sources for a Knowledge Base
knowledge_bases.data_sources.list(str knowledge_base_uuid, DataSourceListParams **kwargs) -> DataSourceListResponse
GET /v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources
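As a usage sketch (the client object and the response envelope key are assumptions, not confirmed by this reference), the UUIDs of a knowledge base's data sources could be collected like this:

```python
def data_source_uuids(response: dict) -> list[str]:
    """Collect the uuid of each data source from a list response.

    Assumes the body nests sources under "knowledge_base_data_sources",
    which mirrors the route name but is an assumption here.
    """
    sources = response.get("knowledge_base_data_sources", [])
    return [s["uuid"] for s in sources if "uuid" in s]

# resp = client.knowledge_bases.data_sources.list("kb-1234")  # "client" is hypothetical
```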
Add Data Source to a Knowledge Base
knowledge_bases.data_sources.create(str path_knowledge_base_uuid, DataSourceCreateParams **kwargs) -> DataSourceCreateResponse
POST /v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources
Delete a Data Source from a Knowledge Base
knowledge_bases.data_sources.delete(str data_source_uuid, DataSourceDeleteParams **kwargs) -> DataSourceDeleteResponse
DELETE /v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources/{data_source_uuid}
Create Presigned URLs for Data Source File Upload
knowledge_bases.data_sources.create_presigned_urls(DataSourceCreatePresignedURLsParams **kwargs) -> DataSourceCreatePresignedURLsResponse
POST /v2/gen-ai/knowledge_bases/data_sources/file_upload_presigned_urls
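The presigned-URL flow needs per-file metadata. A sketch describing local files with the same field names as APIFileUploadDataSource (whether the request itself uses these exact names is an assumption); note the uint64 size travels as a string:

```python
import os

def file_upload_entry(path: str) -> dict:
    """Describe a local file for a presigned-URL request.

    size_in_bytes is serialized as a string (format uint64), matching
    the APIFileUploadDataSource model.
    """
    return {
        "original_file_name": os.path.basename(path),
        "size_in_bytes": str(os.stat(path).st_size),
    }

# entries = [file_upload_entry(p) for p in ["report.pdf", "notes.txt"]]
# urls = client.knowledge_bases.data_sources.create_presigned_urls(files=entries)  # param name assumed
```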
Update Data Source options
knowledge_bases.data_sources.update(str path_data_source_uuid, DataSourceUpdateParams **kwargs) -> DataSourceUpdateResponse
PUT /v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources/{data_source_uuid}
Models
class APIFileUploadDataSource:

File to upload as data source for knowledge base.

original_file_name: Optional[str]

The original file name

size_in_bytes: Optional[str]

The size of the file in bytes

format: uint64
stored_object_key: Optional[str]

The object key the file was stored as

class APIKnowledgeBaseDataSource:

Data Source configuration for Knowledge Bases

aws_data_source: Optional[AwsDataSource]

AWS S3 Data Source for Display

bucket_name: Optional[str]

S3 bucket name

item_path: Optional[str]
region: Optional[str]

Region of bucket

bucket_name: Optional[str]

Name of storage bucket - Deprecated, moved to data_source_details

chunking_algorithm: Optional[Literal["CHUNKING_ALGORITHM_UNKNOWN", "CHUNKING_ALGORITHM_SECTION_BASED", "CHUNKING_ALGORITHM_HIERARCHICAL", "CHUNKING_ALGORITHM_SEMANTIC", "CHUNKING_ALGORITHM_FIXED_LENGTH"]]

The chunking algorithm to use for processing data sources.

Note: This feature requires enabling the knowledgebase enhancements feature preview flag.

Accepts one of the following:
"CHUNKING_ALGORITHM_UNKNOWN"
"CHUNKING_ALGORITHM_SECTION_BASED"
"CHUNKING_ALGORITHM_HIERARCHICAL"
"CHUNKING_ALGORITHM_SEMANTIC"
"CHUNKING_ALGORITHM_FIXED_LENGTH"
chunking_options: Optional[ChunkingOptions]

Configuration options for the chunking algorithm.

Note: This feature requires enabling the knowledgebase enhancements feature preview flag.

child_chunk_size: Optional[int]

Child chunk size for the hierarchical algorithm

format: int64
max_chunk_size: Optional[int]

Maximum chunk size for the section-based and fixed-length algorithms

format: int64
parent_chunk_size: Optional[int]

Parent chunk size for the hierarchical algorithm

format: int64
semantic_threshold: Optional[float]

Threshold for the semantic algorithm

format: float
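The option fields above each apply to specific algorithms. As a client-side sanity check (the mapping restates the field descriptions; the helper itself is not part of the SDK), one might verify that a chunking_options dict only carries keys relevant to the chosen algorithm:

```python
# Which chunking_options keys apply to which algorithm, per the field
# descriptions: hierarchical uses parent/child sizes, section-based and
# fixed-length use max_chunk_size, semantic uses semantic_threshold.
RELEVANT_OPTIONS = {
    "CHUNKING_ALGORITHM_SECTION_BASED": {"max_chunk_size"},
    "CHUNKING_ALGORITHM_FIXED_LENGTH": {"max_chunk_size"},
    "CHUNKING_ALGORITHM_HIERARCHICAL": {"parent_chunk_size", "child_chunk_size"},
    "CHUNKING_ALGORITHM_SEMANTIC": {"semantic_threshold"},
}

def irrelevant_options(algorithm: str, options: dict) -> list[str]:
    """Return any option keys that do not apply to the chosen algorithm."""
    allowed = RELEVANT_OPTIONS.get(algorithm, set())
    return sorted(k for k in options if k not in allowed)
```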
created_at: Optional[datetime]

Creation date / time

format: date-time
dropbox_data_source: Optional[DropboxDataSource]

Dropbox Data Source for Display

folder: Optional[str]
file_upload_data_source: Optional[APIFileUploadDataSource]

File to upload as data source for knowledge base.

original_file_name: Optional[str]

The original file name

size_in_bytes: Optional[str]

The size of the file in bytes

format: uint64
stored_object_key: Optional[str]

The object key the file was stored as

google_drive_data_source: Optional[GoogleDriveDataSource]

Google Drive Data Source for Display

folder_id: Optional[str]
folder_name: Optional[str]

Name of the selected folder if available

item_path: Optional[str]

Path of folder or object in bucket - Deprecated, moved to data_source_details

last_datasource_indexing_job: Optional[APIIndexedDataSource]
completed_at: Optional[datetime]

Timestamp when data source completed indexing

format: date-time
data_source_uuid: Optional[str]

Uuid of the indexed data source

error_details: Optional[str]

A detailed error description

error_msg: Optional[str]

A string code providing a hint as to which part of the system experienced an error

failed_item_count: Optional[str]

Total count of files that have failed

format: uint64
indexed_file_count: Optional[str]

Total count of files that have been indexed

format: uint64
indexed_item_count: Optional[str]

Total count of items that have been indexed

format: uint64
removed_item_count: Optional[str]

Total count of items that have been removed

format: uint64
skipped_item_count: Optional[str]

Total count of items that have been skipped

format: uint64
started_at: Optional[datetime]

Timestamp when data source started indexing

format: date-time
status: Optional[Literal["DATA_SOURCE_STATUS_UNKNOWN", "DATA_SOURCE_STATUS_IN_PROGRESS", "DATA_SOURCE_STATUS_UPDATED", "DATA_SOURCE_STATUS_PARTIALLY_UPDATED", "DATA_SOURCE_STATUS_NOT_UPDATED", "DATA_SOURCE_STATUS_FAILED", "DATA_SOURCE_STATUS_CANCELLED"]]
Accepts one of the following:
"DATA_SOURCE_STATUS_UNKNOWN"
"DATA_SOURCE_STATUS_IN_PROGRESS"
"DATA_SOURCE_STATUS_UPDATED"
"DATA_SOURCE_STATUS_PARTIALLY_UPDATED"
"DATA_SOURCE_STATUS_NOT_UPDATED"
"DATA_SOURCE_STATUS_FAILED"
"DATA_SOURCE_STATUS_CANCELLED"
total_bytes: Optional[str]

Total size of files in data source in bytes

format: uint64
total_bytes_indexed: Optional[str]

Total size of files in data source in bytes that have been indexed

format: uint64
total_file_count: Optional[str]

Total file count in the data source

format: uint64
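Because the uint64 counters above are serialized as strings, arithmetic on them needs a conversion first. A small sketch computing byte-level progress from an APIIndexedDataSource-shaped dict:

```python
def indexing_progress(job: dict) -> float:
    """Fraction of bytes indexed; counters arrive as strings (format uint64)."""
    total = int(job.get("total_bytes") or 0)
    done = int(job.get("total_bytes_indexed") or 0)
    return done / total if total else 0.0
```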
region: Optional[str]

Region code - Deprecated, moved to data_source_details

spaces_data_source: Optional[APISpacesDataSource]

Spaces Bucket Data Source

bucket_name: Optional[str]

Spaces bucket name

item_path: Optional[str]
region: Optional[str]

Region of bucket

updated_at: Optional[datetime]

Last modified

format: date-time
uuid: Optional[str]

Unique id of knowledge base

web_crawler_data_source: Optional[APIWebCrawlerDataSource]

Web Crawler Data Source

base_url: Optional[str]

The base URL to crawl.

crawling_option: Optional[Literal["UNKNOWN", "SCOPED", "PATH", "DOMAIN", "SUBDOMAINS", "SITEMAP"]]

Options for specifying how URLs found on pages should be handled.

  • UNKNOWN: Default unknown value
  • SCOPED: Only include the base URL.
  • PATH: Crawl the base URL and linked pages within the URL path.
  • DOMAIN: Crawl the base URL and linked pages within the same domain.
  • SUBDOMAINS: Crawl the base URL and linked pages for any subdomain.
  • SITEMAP: Crawl URLs discovered in the sitemap.
Accepts one of the following:
"UNKNOWN"
"SCOPED"
"PATH"
"DOMAIN"
"SUBDOMAINS"
"SITEMAP"
embed_media: Optional[bool]

Whether to ingest and index media (images, etc.) on web pages.

exclude_tags: Optional[SequenceNotStr[str]]

HTML tags to exclude from web pages while crawling

class APISpacesDataSource:

Spaces Bucket Data Source

bucket_name: Optional[str]

Spaces bucket name

item_path: Optional[str]
region: Optional[str]

Region of bucket

class APIWebCrawlerDataSource:

Web Crawler Data Source

base_url: Optional[str]

The base URL to crawl.

crawling_option: Optional[Literal["UNKNOWN", "SCOPED", "PATH", "DOMAIN", "SUBDOMAINS", "SITEMAP"]]

Options for specifying how URLs found on pages should be handled.

  • UNKNOWN: Default unknown value
  • SCOPED: Only include the base URL.
  • PATH: Crawl the base URL and linked pages within the URL path.
  • DOMAIN: Crawl the base URL and linked pages within the same domain.
  • SUBDOMAINS: Crawl the base URL and linked pages for any subdomain.
  • SITEMAP: Crawl URLs discovered in the sitemap.
Accepts one of the following:
"UNKNOWN"
"SCOPED"
"PATH"
"DOMAIN"
"SUBDOMAINS"
"SITEMAP"
embed_media: Optional[bool]

Whether to ingest and index media (images, etc.) on web pages.

exclude_tags: Optional[SequenceNotStr[str]]

HTML tags to exclude from web pages while crawling
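A sketch building a web_crawler_data_source body, validating crawling_option against the accepted values listed above (the builder is illustrative, not part of the SDK):

```python
# Accepted crawl scopes, taken from the crawling_option Literal above.
CRAWLING_OPTIONS = {"UNKNOWN", "SCOPED", "PATH", "DOMAIN", "SUBDOMAINS", "SITEMAP"}

def web_crawler_data_source(base_url: str, crawling_option: str = "SCOPED",
                            embed_media: bool = False) -> dict:
    """Build a web_crawler_data_source body; rejects unknown crawl scopes."""
    if crawling_option not in CRAWLING_OPTIONS:
        raise ValueError(f"unknown crawling_option: {crawling_option!r}")
    return {
        "base_url": base_url,
        "crawling_option": crawling_option,
        "embed_media": embed_media,
    }
```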

class AwsDataSource:

AWS S3 Data Source

bucket_name: Optional[str]

S3 bucket name

item_path: Optional[str]
key_id: Optional[str]

The AWS Key ID

region: Optional[str]

Region of bucket

secret_key: Optional[str]

The AWS Secret Key