Data Sources

List Data Sources for a Knowledge Base
knowledge_bases.data_sources.list(str knowledge_base_uuid, DataSourceListParams **kwargs) -> DataSourceListResponse
GET /v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources
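As a usage sketch (the client object and the response envelope key are assumptions, not confirmed by this reference), the UUIDs of a knowledge base's data sources could be collected like this:

```python
def data_source_uuids(response: dict) -> list[str]:
    """Collect the uuid of each data source from a list response.

    Assumes the body nests sources under "knowledge_base_data_sources",
    which mirrors the route name but is an assumption here.
    """
    sources = response.get("knowledge_base_data_sources", [])
    return [s["uuid"] for s in sources if "uuid" in s]

# resp = client.knowledge_bases.data_sources.list("kb-1234")  # "client" is hypothetical
```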
Add Data Source to a Knowledge Base
knowledge_bases.data_sources.create(str path_knowledge_base_uuid, DataSourceCreateParams **kwargs) -> DataSourceCreateResponse
POST /v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources
Delete a Data Source from a Knowledge Base
knowledge_bases.data_sources.delete(str data_source_uuid, DataSourceDeleteParams **kwargs) -> DataSourceDeleteResponse
DELETE /v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources/{data_source_uuid}
Create Presigned URLs for Data Source File Upload
knowledge_bases.data_sources.create_presigned_urls(DataSourceCreatePresignedURLsParams **kwargs) -> DataSourceCreatePresignedURLsResponse
POST /v2/gen-ai/knowledge_bases/data_sources/file_upload_presigned_urls
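The presigned-URL flow needs per-file metadata. A sketch describing local files with the same field names as APIFileUploadDataSource (whether the request itself uses these exact names is an assumption); note the uint64 size travels as a string:

```python
import os

def file_upload_entry(path: str) -> dict:
    """Describe a local file for a presigned-URL request.

    size_in_bytes is serialized as a string (format uint64), matching
    the APIFileUploadDataSource model.
    """
    return {
        "original_file_name": os.path.basename(path),
        "size_in_bytes": str(os.stat(path).st_size),
    }

# entries = [file_upload_entry(p) for p in ["report.pdf", "notes.txt"]]
# urls = client.knowledge_bases.data_sources.create_presigned_urls(files=entries)  # param name assumed
```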
Update Data Source options
knowledge_bases.data_sources.update(str path_data_source_uuid, DataSourceUpdateParams **kwargs) -> DataSourceUpdateResponse
PUT /v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources/{data_source_uuid}
Models
class APIFileUploadDataSource:

File to upload as data source for knowledge base.

original_file_name: Optional[str]

The original file name

size_in_bytes: Optional[str]

The size of the file in bytes

format: uint64
stored_object_key: Optional[str]

The object key the file was stored as

class APIKnowledgeBaseDataSource:

Data Source configuration for Knowledge Bases

aws_data_source: Optional[AwsDataSource]

AWS S3 Data Source for Display

bucket_name: Optional[str]

S3 bucket name

item_path: Optional[str]
region: Optional[str]

Region of bucket

bucket_name: Optional[str]

Name of storage bucket - Deprecated, moved to data_source_details

chunking_algorithm: Optional[Literal["CHUNKING_ALGORITHM_UNKNOWN", "CHUNKING_ALGORITHM_SECTION_BASED", "CHUNKING_ALGORITHM_HIERARCHICAL", "CHUNKING_ALGORITHM_SEMANTIC", "CHUNKING_ALGORITHM_FIXED_LENGTH"]]

The chunking algorithm to use for processing data sources.

Note: This feature requires enabling the knowledgebase enhancements feature preview flag.

Accepts one of the following:
"CHUNKING_ALGORITHM_UNKNOWN"
"CHUNKING_ALGORITHM_SECTION_BASED"
"CHUNKING_ALGORITHM_HIERARCHICAL"
"CHUNKING_ALGORITHM_SEMANTIC"
"CHUNKING_ALGORITHM_FIXED_LENGTH"
chunking_options: Optional[ChunkingOptions]

Configuration options for the chunking algorithm.

Note: This feature requires enabling the knowledgebase enhancements feature preview flag.

child_chunk_size: Optional[int]

Child chunk size for the hierarchical algorithm

format: int64
max_chunk_size: Optional[int]

Maximum chunk size for the section-based and fixed-length algorithms

format: int64
parent_chunk_size: Optional[int]

Parent chunk size for the hierarchical algorithm

format: int64
semantic_threshold: Optional[float]

Threshold for the semantic algorithm

format: float
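The option fields above each apply to specific algorithms. As a client-side sanity check (the mapping restates the field descriptions; the helper itself is not part of the SDK), one might verify that a chunking_options dict only carries keys relevant to the chosen algorithm:

```python
# Which chunking_options keys apply to which algorithm, per the field
# descriptions: hierarchical uses parent/child sizes, section-based and
# fixed-length use max_chunk_size, semantic uses semantic_threshold.
RELEVANT_OPTIONS = {
    "CHUNKING_ALGORITHM_SECTION_BASED": {"max_chunk_size"},
    "CHUNKING_ALGORITHM_FIXED_LENGTH": {"max_chunk_size"},
    "CHUNKING_ALGORITHM_HIERARCHICAL": {"parent_chunk_size", "child_chunk_size"},
    "CHUNKING_ALGORITHM_SEMANTIC": {"semantic_threshold"},
}

def irrelevant_options(algorithm: str, options: dict) -> list[str]:
    """Return any option keys that do not apply to the chosen algorithm."""
    allowed = RELEVANT_OPTIONS.get(algorithm, set())
    return sorted(k for k in options if k not in allowed)
```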
created_at: Optional[datetime]

Creation date / time

format: date-time
dropbox_data_source: Optional[DropboxDataSource]

Dropbox Data Source for Display

folder: Optional[str]
file_upload_data_source: Optional[APIFileUploadDataSource]

File to upload as data source for knowledge base.

original_file_name: Optional[str]

The original file name

size_in_bytes: Optional[str]

The size of the file in bytes

format: uint64
stored_object_key: Optional[str]

The object key the file was stored as

google_drive_data_source: Optional[GoogleDriveDataSource]

Google Drive Data Source for Display

folder_id: Optional[str]
folder_name: Optional[str]

Name of the selected folder if available

item_path: Optional[str]

Path of folder or object in bucket - Deprecated, moved to data_source_details

last_datasource_indexing_job: Optional[APIIndexedDataSource]
completed_at: Optional[datetime]

Timestamp when data source completed indexing

format: date-time
data_source_uuid: Optional[str]

Uuid of the indexed data source

error_details: Optional[str]

A detailed error description

error_msg: Optional[str]

A string code providing a hint as to which part of the system experienced an error

failed_item_count: Optional[str]

Total count of files that have failed

format: uint64
indexed_file_count: Optional[str]

Total count of files that have been indexed

format: uint64
indexed_item_count: Optional[str]

Total count of items that have been indexed

format: uint64
removed_item_count: Optional[str]

Total count of items that have been removed

format: uint64
skipped_item_count: Optional[str]

Total count of items that have been skipped

format: uint64
started_at: Optional[datetime]

Timestamp when data source started indexing

format: date-time
status: Optional[Literal["DATA_SOURCE_STATUS_UNKNOWN", "DATA_SOURCE_STATUS_IN_PROGRESS", "DATA_SOURCE_STATUS_UPDATED", "DATA_SOURCE_STATUS_PARTIALLY_UPDATED", "DATA_SOURCE_STATUS_NOT_UPDATED", "DATA_SOURCE_STATUS_FAILED", "DATA_SOURCE_STATUS_CANCELLED"]]
Accepts one of the following:
"DATA_SOURCE_STATUS_UNKNOWN"
"DATA_SOURCE_STATUS_IN_PROGRESS"
"DATA_SOURCE_STATUS_UPDATED"
"DATA_SOURCE_STATUS_PARTIALLY_UPDATED"
"DATA_SOURCE_STATUS_NOT_UPDATED"
"DATA_SOURCE_STATUS_FAILED"
"DATA_SOURCE_STATUS_CANCELLED"
total_bytes: Optional[str]

Total size of files in data source in bytes

format: uint64
total_bytes_indexed: Optional[str]

Total size of files in data source in bytes that have been indexed

format: uint64
total_file_count: Optional[str]

Total file count in the data source

format: uint64
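Because the uint64 counters above are serialized as strings, arithmetic on them needs a conversion first. A small sketch computing byte-level progress from an APIIndexedDataSource-shaped dict:

```python
def indexing_progress(job: dict) -> float:
    """Fraction of bytes indexed; counters arrive as strings (format uint64)."""
    total = int(job.get("total_bytes") or 0)
    done = int(job.get("total_bytes_indexed") or 0)
    return done / total if total else 0.0
```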
region: Optional[str]

Region code - Deprecated, moved to data_source_details

spaces_data_source: Optional[APISpacesDataSource]

Spaces Bucket Data Source

bucket_name: Optional[str]

Spaces bucket name

item_path: Optional[str]
region: Optional[str]

Region of bucket

updated_at: Optional[datetime]

Last modified

format: date-time
uuid: Optional[str]

Unique id of knowledge base

web_crawler_data_source: Optional[APIWebCrawlerDataSource]

Web Crawler Data Source

base_url: Optional[str]

The base URL to crawl.

crawling_option: Optional[Literal["UNKNOWN", "SCOPED", "PATH", "DOMAIN", "SUBDOMAINS", "SITEMAP"]]

Options for specifying how URLs found on pages should be handled.

  • UNKNOWN: Default unknown value
  • SCOPED: Only include the base URL.
  • PATH: Crawl the base URL and linked pages within the URL path.
  • DOMAIN: Crawl the base URL and linked pages within the same domain.
  • SUBDOMAINS: Crawl the base URL and linked pages for any subdomain.
  • SITEMAP: Crawl URLs discovered in the sitemap.
Accepts one of the following:
"UNKNOWN"
"SCOPED"
"PATH"
"DOMAIN"
"SUBDOMAINS"
"SITEMAP"
embed_media: Optional[bool]

Whether to ingest and index media (images, etc.) on web pages.

exclude_tags: Optional[SequenceNotStr[str]]

HTML tags to exclude from web pages while crawling

class APISpacesDataSource:

Spaces Bucket Data Source

bucket_name: Optional[str]

Spaces bucket name

item_path: Optional[str]
region: Optional[str]

Region of bucket

class APIWebCrawlerDataSource:

Web Crawler Data Source

base_url: Optional[str]

The base URL to crawl.

crawling_option: Optional[Literal["UNKNOWN", "SCOPED", "PATH", "DOMAIN", "SUBDOMAINS", "SITEMAP"]]

Options for specifying how URLs found on pages should be handled.

  • UNKNOWN: Default unknown value
  • SCOPED: Only include the base URL.
  • PATH: Crawl the base URL and linked pages within the URL path.
  • DOMAIN: Crawl the base URL and linked pages within the same domain.
  • SUBDOMAINS: Crawl the base URL and linked pages for any subdomain.
  • SITEMAP: Crawl URLs discovered in the sitemap.
Accepts one of the following:
"UNKNOWN"
"SCOPED"
"PATH"
"DOMAIN"
"SUBDOMAINS"
"SITEMAP"
embed_media: Optional[bool]

Whether to ingest and index media (images, etc.) on web pages.

exclude_tags: Optional[SequenceNotStr[str]]

HTML tags to exclude from web pages while crawling
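A sketch building a web_crawler_data_source body, validating crawling_option against the accepted values listed above (the builder is illustrative, not part of the SDK):

```python
# Accepted crawl scopes, taken from the crawling_option Literal above.
CRAWLING_OPTIONS = {"UNKNOWN", "SCOPED", "PATH", "DOMAIN", "SUBDOMAINS", "SITEMAP"}

def web_crawler_data_source(base_url: str, crawling_option: str = "SCOPED",
                            embed_media: bool = False) -> dict:
    """Build a web_crawler_data_source body; rejects unknown crawl scopes."""
    if crawling_option not in CRAWLING_OPTIONS:
        raise ValueError(f"unknown crawling_option: {crawling_option!r}")
    return {
        "base_url": base_url,
        "crawling_option": crawling_option,
        "embed_media": embed_media,
    }
```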

class AwsDataSource:

AWS S3 Data Source

bucket_name: Optional[str]

S3 bucket name

item_path: Optional[str]
key_id: Optional[str]

The AWS Key ID

region: Optional[str]

Region of bucket

secret_key: Optional[str]

The AWS Secret Key