Data Sources

List Data Sources for a Knowledge Base

client.knowledgeBases.dataSources.list(, ?, ?): DataSourceListResponse { knowledge_base_data_sources, links, meta }

GET /v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources

Add Data Source to a Knowledge Base

client.knowledgeBases.dataSources.create(, ?, ?): DataSourceCreateResponse { knowledge_base_data_source }

POST /v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources

Delete a Data Source from a Knowledge Base

client.knowledgeBases.dataSources.delete(, , ?): DataSourceDeleteResponse { data_source_uuid, knowledge_base_uuid }

DELETE /v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources/{data_source_uuid}

Create Presigned URLs for Data Source File Upload

client.knowledgeBases.dataSources.createPresignedURLs(?, ?): DataSourceCreatePresignedURLsResponse { request_id, uploads }

POST /v2/gen-ai/knowledge_bases/data_sources/file_upload_presigned_urls
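The four operations above differ only in HTTP method and path, so they can be summarized as a small route table. The following is a minimal sketch, not part of the SDK: `dataSourceRoutes`, `Route`, and the `baseUrl` parameter are hypothetical names introduced for illustration, and the host in the usage line is a placeholder, since this section documents only the paths.

```typescript
type HttpMethod = "GET" | "POST" | "DELETE";

interface Route {
  method: HttpMethod;
  url: string;
}

// Hypothetical helper: builds the method and URL for each documented operation.
// `baseUrl` is an assumption supplied by the caller; the reference lists paths only.
function dataSourceRoutes(baseUrl: string) {
  const root = `${baseUrl}/v2/gen-ai/knowledge_bases`;
  // Path parameters are percent-encoded so unexpected characters cannot alter the path.
  const enc = encodeURIComponent;
  return {
    list: (knowledgeBaseUuid: string): Route => ({
      method: "GET",
      url: `${root}/${enc(knowledgeBaseUuid)}/data_sources`,
    }),
    create: (knowledgeBaseUuid: string): Route => ({
      method: "POST",
      url: `${root}/${enc(knowledgeBaseUuid)}/data_sources`,
    }),
    delete: (knowledgeBaseUuid: string, dataSourceUuid: string): Route => ({
      method: "DELETE",
      url: `${root}/${enc(knowledgeBaseUuid)}/data_sources/${enc(dataSourceUuid)}`,
    }),
    createPresignedURLs: (): Route => ({
      method: "POST",
      url: `${root}/data_sources/file_upload_presigned_urls`,
    }),
  };
}

// Usage with a placeholder host and placeholder identifiers.
const routes = dataSourceRoutes("https://api.example.com");
const deleteRoute = routes.delete("kb-123", "ds-456");
```

Note that the presigned-URL route takes no knowledge base UUID in its path, unlike the other three.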

Models

APIFileUploadDataSource { original_file_name, size_in_bytes, stored_object_key }

File to upload as data source for knowledge base.

original_file_name?: string

The original file name

size_in_bytes?: string

The size of the file in bytes

format: uint64

stored_object_key?: string

The object key the file was stored as
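Fields marked uint64, such as size_in_bytes, are typed as string, presumably because a 64-bit unsigned integer can exceed the largest integer a JavaScript number represents exactly (2^53 - 1). A minimal sketch of converting such a field to a bigint follows; `parseUint64` is a hypothetical helper name, not an SDK function.

```typescript
// Largest value representable as an unsigned 64-bit integer.
const UINT64_MAX = BigInt("18446744073709551615");

// Hypothetical helper: converts an optional uint64 string field to a bigint.
// Returns undefined when the field is absent and throws on malformed input.
function parseUint64(value: string | undefined): bigint | undefined {
  if (value === undefined) return undefined;
  // Accept only plain decimal digits: no sign, whitespace, or exponent.
  if (!/^\d+$/.test(value)) {
    throw new RangeError(`not a uint64 string: ${value}`);
  }
  const parsed = BigInt(value);
  if (parsed > UINT64_MAX) {
    throw new RangeError(`value exceeds uint64 range: ${value}`);
  }
  return parsed;
}

// Usage: a 1 MiB file reported by the API as a string.
const sizeInBytes = parseUint64("1048576");
```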

APIKnowledgeBaseDataSource { aws_data_source, bucket_name, created_at, 10 more }

Data Source configuration for Knowledge Bases

aws_data_source?: AwsDataSource { bucket_name, item_path, region }

AWS S3 Data Source for Display

bucket_name?: string

Spaces bucket name

item_path?: string

region?: string

Region of bucket

bucket_name?: string

Name of storage bucket - Deprecated, moved to data_source_details

created_at?: string

Creation date / time

format: date-time

dropbox_data_source?: DropboxDataSource { folder }

Dropbox Data Source for Display

folder?: string

file_upload_data_source?: APIFileUploadDataSource { original_file_name, size_in_bytes, stored_object_key }

File to upload as data source for knowledge base.

original_file_name?: string

The original file name

size_in_bytes?: string

The size of the file in bytes

format: uint64

stored_object_key?: string

The object key the file was stored as

google_drive_data_source?: GoogleDriveDataSource { folder_id, folder_name }

Google Drive Data Source for Display

folder_id?: string

folder_name?: string

Name of the selected folder if available

item_path?: string

Path of folder or object in bucket - Deprecated, moved to data_source_details

last_datasource_indexing_job?: APIIndexedDataSource { completed_at, data_source_uuid, error_details, 11 more }

completed_at?: string

Timestamp when data source completed indexing

format: date-time

data_source_uuid?: string

UUID of the indexed data source

error_details?: string

A detailed error description

error_msg?: string

A string code providing a hint as to which part of the system experienced an error

failed_item_count?: string

Total count of files that have failed

format: uint64

indexed_file_count?: string

Total count of files that have been indexed

format: uint64

indexed_item_count?: string

Total count of files that have been indexed

format: uint64

removed_item_count?: string

Total count of files that have been removed

format: uint64

skipped_item_count?: string

Total count of files that have been skipped

format: uint64

started_at?: string

Timestamp when data source started indexing

format: date-time

status?: "DATA_SOURCE_STATUS_UNKNOWN" | "DATA_SOURCE_STATUS_IN_PROGRESS" | "DATA_SOURCE_STATUS_UPDATED" | 4 more

Accepts one of the following:

"DATA_SOURCE_STATUS_UNKNOWN"

"DATA_SOURCE_STATUS_IN_PROGRESS"

"DATA_SOURCE_STATUS_UPDATED"

"DATA_SOURCE_STATUS_PARTIALLY_UPDATED"

"DATA_SOURCE_STATUS_NOT_UPDATED"

"DATA_SOURCE_STATUS_FAILED"

"DATA_SOURCE_STATUS_CANCELLED"

total_bytes?: string

Total size of files in data source in bytes

format: uint64

total_bytes_indexed?: string

Total size of files in data source in bytes that have been indexed

format: uint64

total_file_count?: string

Total file count in the data source

format: uint64
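The status and byte counters of last_datasource_indexing_job can be combined into a simple progress summary. The sketch below is illustrative only: `isDataSourceStatus`, `isIndexingFinished`, and `indexedPercent` are hypothetical helpers, and treating every status other than IN_PROGRESS and UNKNOWN as finished is an assumption, since this reference does not state which statuses are terminal.

```typescript
// The seven status values listed for APIIndexedDataSource.
const DATA_SOURCE_STATUSES = [
  "DATA_SOURCE_STATUS_UNKNOWN",
  "DATA_SOURCE_STATUS_IN_PROGRESS",
  "DATA_SOURCE_STATUS_UPDATED",
  "DATA_SOURCE_STATUS_PARTIALLY_UPDATED",
  "DATA_SOURCE_STATUS_NOT_UPDATED",
  "DATA_SOURCE_STATUS_FAILED",
  "DATA_SOURCE_STATUS_CANCELLED",
] as const;

type DataSourceStatus = (typeof DATA_SOURCE_STATUSES)[number];

// Only the fields this sketch reads; the full model has more.
interface IndexingJobLike {
  status?: string;
  total_bytes?: string;
  total_bytes_indexed?: string;
}

// Type guard: narrows an arbitrary value to one of the documented statuses.
function isDataSourceStatus(value: unknown): value is DataSourceStatus {
  return DATA_SOURCE_STATUSES.some((s) => s === value);
}

// Assumption: a job is finished unless its status is IN_PROGRESS, UNKNOWN,
// missing, or not one of the documented values.
function isIndexingFinished(job: IndexingJobLike): boolean {
  if (!isDataSourceStatus(job.status)) return false;
  return (
    job.status !== "DATA_SOURCE_STATUS_IN_PROGRESS" &&
    job.status !== "DATA_SOURCE_STATUS_UNKNOWN"
  );
}

// Whole-number percentage of bytes indexed, or undefined when it cannot be
// computed (a counter is missing or total_bytes is zero).
function indexedPercent(job: IndexingJobLike): number | undefined {
  if (job.total_bytes === undefined || job.total_bytes_indexed === undefined) {
    return undefined;
  }
  const total = BigInt(job.total_bytes);
  const indexed = BigInt(job.total_bytes_indexed);
  if (total === BigInt(0)) return undefined;
  // BigInt division truncates toward zero, so the result is rounded down.
  return Number((indexed * BigInt(100)) / total);
}
```

The counters are parsed with BigInt because they arrive as uint64 strings; the final percentage is small enough to convert to a number safely.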

region?: string

Region code - Deprecated, moved to data_source_details

spaces_data_source?: APISpacesDataSource { bucket_name, item_path, region }

Spaces Bucket Data Source

bucket_name?: string

Spaces bucket name

item_path?: string

region?: string

Region of bucket

updated_at?: string

Last modified

format: date-time

uuid?: string

Unique id of knowledge base

web_crawler_data_source?: APIWebCrawlerDataSource { base_url, crawling_option, embed_media, exclude_tags }

WebCrawlerDataSource

base_url?: string

The base URL to crawl.

crawling_option?: "UNKNOWN" | "SCOPED" | "PATH" | 3 more

Options for specifying how URLs found on pages should be handled.

UNKNOWN: Default unknown value
SCOPED: Only include the base URL.
PATH: Crawl the base URL and linked pages within the URL path.
DOMAIN: Crawl the base URL and linked pages within the same domain.
SUBDOMAINS: Crawl the base URL and linked pages for any subdomain.
SITEMAP: Crawl URLs discovered in the sitemap.

Accepts one of the following:

"UNKNOWN"

"SCOPED"

"PATH"

"DOMAIN"

"SUBDOMAINS"

"SITEMAP"

embed_media?: boolean

Whether to ingest and index media (images, etc.) on web pages.

exclude_tags?: Array<string>

Tags to exclude from web pages during crawling
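APIKnowledgeBaseDataSource carries one optional field per source type, all of them optional. A minimal sketch of finding which ones are populated on a given record is shown below; `populatedSources` and `DataSourceLike` are hypothetical names, and the sketch returns every populated field rather than assuming exactly one is set, since the reference does not say so.

```typescript
// The per-source-type fields documented on APIKnowledgeBaseDataSource.
const SOURCE_FIELDS = [
  "aws_data_source",
  "dropbox_data_source",
  "file_upload_data_source",
  "google_drive_data_source",
  "spaces_data_source",
  "web_crawler_data_source",
] as const;

type SourceField = (typeof SOURCE_FIELDS)[number];

// Minimal stand-in for the model: each source field is optional.
type DataSourceLike = Partial<Record<SourceField, object | null>>;

// Hypothetical helper: lists the source fields that are present and non-null.
function populatedSources(dataSource: DataSourceLike): SourceField[] {
  return SOURCE_FIELDS.filter((field) => dataSource[field] != null);
}

// Usage: a record backed by a Dropbox folder (the folder value is a placeholder).
const kinds = populatedSources({ dropbox_data_source: { folder: "/docs" } });
```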

APISpacesDataSource { bucket_name, item_path, region }

Spaces Bucket Data Source

bucket_name?: string

Spaces bucket name

item_path?: string

region?: string

Region of bucket

APIWebCrawlerDataSource { base_url, crawling_option, embed_media, exclude_tags }

WebCrawlerDataSource

base_url?: string

The base URL to crawl.

crawling_option?: "UNKNOWN" | "SCOPED" | "PATH" | 3 more

Options for specifying how URLs found on pages should be handled.

UNKNOWN: Default unknown value
SCOPED: Only include the base URL.
PATH: Crawl the base URL and linked pages within the URL path.
DOMAIN: Crawl the base URL and linked pages within the same domain.
SUBDOMAINS: Crawl the base URL and linked pages for any subdomain.
SITEMAP: Crawl URLs discovered in the sitemap.

Accepts one of the following:

"UNKNOWN"

"SCOPED"

"PATH"

"DOMAIN"

"SUBDOMAINS"

"SITEMAP"

embed_media?: boolean

Whether to ingest and index media (images, etc.) on web pages.

exclude_tags?: Array<string>

Tags to exclude from web pages during crawling
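The crawling_option descriptions can be read as a scope test applied to each discovered link. The sketch below is one possible interpretation of those descriptions, not the service's actual crawler logic: `isInScope` is a hypothetical function, the subdomain rule (same host, or a host ending in "." plus the base host) is an assumption, and SITEMAP and UNKNOWN return false because they cannot be decided from the two URLs alone.

```typescript
type CrawlingOption =
  | "UNKNOWN"
  | "SCOPED"
  | "PATH"
  | "DOMAIN"
  | "SUBDOMAINS"
  | "SITEMAP";

// Hypothetical scope check: would `candidateUrl` be crawled given `baseUrl`
// and the selected option, under the assumptions stated above?
function isInScope(
  baseUrl: string,
  candidateUrl: string,
  option: CrawlingOption
): boolean {
  const base = new URL(baseUrl);
  const candidate = new URL(candidateUrl);
  switch (option) {
    case "SCOPED":
      // Only the base URL itself.
      return candidate.href === base.href;
    case "PATH":
      // Same host, and the path sits under the base path.
      return (
        candidate.hostname === base.hostname &&
        candidate.pathname.startsWith(base.pathname)
      );
    case "DOMAIN":
      // Any page on the same host.
      return candidate.hostname === base.hostname;
    case "SUBDOMAINS":
      // Assumption: the base host or any host nested beneath it.
      return (
        candidate.hostname === base.hostname ||
        candidate.hostname.endsWith("." + base.hostname)
      );
    default:
      // SITEMAP depends on the sitemap contents; UNKNOWN has no defined scope.
      return false;
  }
}

// Usage with placeholder URLs.
const underDocs = isInScope(
  "https://example.com/docs/",
  "https://example.com/docs/api",
  "PATH"
);
```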

AwsDataSource { bucket_name, item_path, key_id, 2 more }

AWS S3 Data Source

bucket_name?: string

Spaces bucket name

item_path?: string

key_id?: string

The AWS Key ID

region?: string

Region of bucket

secret_key?: string

The AWS Secret Key