Add Data Source to a Knowledge Base

client.knowledgeBases.dataSources.create(knowledgeBaseUuid: string, body?: DataSourceCreateParams { aws_data_source, chunking_algorithm, chunking_options, 3 more }, options?: RequestOptions): DataSourceCreateResponse { knowledge_base_data_source }
POST /v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources

To add a data source to a knowledge base, send a POST request to /v2/gen-ai/knowledge_bases/{knowledge_base_uuid}/data_sources.

Parameters
knowledgeBaseUuid: string
body: DataSourceCreateParams { aws_data_source, chunking_algorithm, chunking_options, 3 more }
aws_data_source?: AwsDataSource { bucket_name, item_path, key_id, 2 more }

AWS S3 Data Source

bucket_name?: string

Spaces bucket name

item_path?: string
key_id?: string

The AWS Key ID

region?: string

Region of bucket

secret_key?: string

The AWS Secret Key

chunking_algorithm?: "CHUNKING_ALGORITHM_UNKNOWN" | "CHUNKING_ALGORITHM_SECTION_BASED" | "CHUNKING_ALGORITHM_HIERARCHICAL" | 2 more

The chunking algorithm to use for processing data sources.

Note: This feature requires enabling the knowledgebase enhancements feature preview flag.

Accepts one of the following:
"CHUNKING_ALGORITHM_UNKNOWN"
"CHUNKING_ALGORITHM_SECTION_BASED"
"CHUNKING_ALGORITHM_HIERARCHICAL"
"CHUNKING_ALGORITHM_SEMANTIC"
"CHUNKING_ALGORITHM_FIXED_LENGTH"
chunking_options?: ChunkingOptions

Configuration options for the chunking algorithm.

Note: This feature requires enabling the knowledgebase enhancements feature preview flag.

child_chunk_size?: number

Hierarchical options

format: int64
max_chunk_size?: number

Section_Based and Fixed_Length options

format: int64
parent_chunk_size?: number

Hierarchical options

format: int64
semantic_threshold?: number

Semantic options

format: float
knowledge_base_uuid?: string

Knowledge base id

spaces_data_source?: APISpacesDataSource { bucket_name, item_path, region }

Spaces Bucket Data Source

bucket_name?: string

Spaces bucket name

item_path?: string
region?: string

Region of bucket

web_crawler_data_source?: APIWebCrawlerDataSource { base_url, crawling_option, embed_media, exclude_tags }

WebCrawlerDataSource

base_url?: string

The base URL to crawl.

crawling_option?: "UNKNOWN" | "SCOPED" | "PATH" | 3 more

Options for specifying how URLs found on pages should be handled.

  • UNKNOWN: Default unknown value
  • SCOPED: Only include the base URL.
  • PATH: Crawl the base URL and linked pages within the URL path.
  • DOMAIN: Crawl the base URL and linked pages within the same domain.
  • SUBDOMAINS: Crawl the base URL and linked pages for any subdomain.
  • SITEMAP: Crawl URLs discovered in the sitemap.
Accepts one of the following:
"UNKNOWN"
"SCOPED"
"PATH"
"DOMAIN"
"SUBDOMAINS"
"SITEMAP"
embed_media?: boolean

Whether to ingest and index media (images, etc.) on web pages.

exclude_tags?: Array<string>

HTML tags to exclude from web pages while crawling.
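
The chunking fields above are tied to specific algorithms: per their descriptions, parent_chunk_size and child_chunk_size are hierarchical options, max_chunk_size applies to section-based and fixed-length chunking, and semantic_threshold to semantic chunking. The following is a minimal sketch of a helper that keeps only the options relevant to the chosen algorithm. The local types and the pickChunkingOptions name are illustrative assumptions, not SDK exports, and this page does not say whether the API rejects or ignores options that do not match the algorithm.

```typescript
// Local mirrors of the documented request fields (hypothetical, not SDK exports).
type ChunkingAlgorithm =
  | 'CHUNKING_ALGORITHM_UNKNOWN'
  | 'CHUNKING_ALGORITHM_SECTION_BASED'
  | 'CHUNKING_ALGORITHM_HIERARCHICAL'
  | 'CHUNKING_ALGORITHM_SEMANTIC'
  | 'CHUNKING_ALGORITHM_FIXED_LENGTH';

interface ChunkingOptions {
  child_chunk_size?: number;
  max_chunk_size?: number;
  parent_chunk_size?: number;
  semantic_threshold?: number;
}

// Which option fields each algorithm uses, taken from the field descriptions above.
const keysByAlgorithm: Record<ChunkingAlgorithm, Array<keyof ChunkingOptions>> = {
  CHUNKING_ALGORITHM_UNKNOWN: [],
  CHUNKING_ALGORITHM_SECTION_BASED: ['max_chunk_size'],
  CHUNKING_ALGORITHM_HIERARCHICAL: ['parent_chunk_size', 'child_chunk_size'],
  CHUNKING_ALGORITHM_SEMANTIC: ['semantic_threshold'],
  CHUNKING_ALGORITHM_FIXED_LENGTH: ['max_chunk_size'],
};

// Hypothetical helper: copy only the options that the algorithm uses.
function pickChunkingOptions(
  algorithm: ChunkingAlgorithm,
  options: ChunkingOptions,
): ChunkingOptions {
  const picked: ChunkingOptions = {};
  for (const key of keysByAlgorithm[algorithm]) {
    const value = options[key];
    if (value !== undefined) picked[key] = value;
  }
  return picked;
}

// Example request body fragment for hierarchical chunking.
const algorithm: ChunkingAlgorithm = 'CHUNKING_ALGORITHM_HIERARCHICAL';
const body = {
  chunking_algorithm: algorithm,
  chunking_options: pickChunkingOptions(algorithm, {
    parent_chunk_size: 1000,
    child_chunk_size: 350,
    max_chunk_size: 750, // dropped: not a hierarchical option
  }),
};
```

The resulting body can be merged with one of the data source objects (for example spaces_data_source) before it is passed to create.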

Returns
DataSourceCreateResponse { knowledge_base_data_source }

Information about a newly created knowledge base data source

knowledge_base_data_source?: APIKnowledgeBaseDataSource { aws_data_source, bucket_name, chunking_algorithm, 12 more }

Data Source configuration for Knowledge Bases

aws_data_source?: AwsDataSource { bucket_name, item_path, region }

AWS S3 Data Source for Display

bucket_name?: string

Spaces bucket name

item_path?: string
region?: string

Region of bucket

bucket_name?: string

Name of storage bucket - Deprecated, moved to data_source_details

chunking_algorithm?: "CHUNKING_ALGORITHM_UNKNOWN" | "CHUNKING_ALGORITHM_SECTION_BASED" | "CHUNKING_ALGORITHM_HIERARCHICAL" | 2 more

The chunking algorithm to use for processing data sources.

Note: This feature requires enabling the knowledgebase enhancements feature preview flag.

Accepts one of the following:
"CHUNKING_ALGORITHM_UNKNOWN"
"CHUNKING_ALGORITHM_SECTION_BASED"
"CHUNKING_ALGORITHM_HIERARCHICAL"
"CHUNKING_ALGORITHM_SEMANTIC"
"CHUNKING_ALGORITHM_FIXED_LENGTH"
chunking_options?: ChunkingOptions { child_chunk_size, max_chunk_size, parent_chunk_size, semantic_threshold }

Configuration options for the chunking algorithm.

Note: This feature requires enabling the knowledgebase enhancements feature preview flag.

child_chunk_size?: number

Hierarchical options

format: int64
max_chunk_size?: number

Section_Based and Fixed_Length options

format: int64
parent_chunk_size?: number

Hierarchical options

format: int64
semantic_threshold?: number

Semantic options

format: float
created_at?: string

Creation date / time

format: date-time
dropbox_data_source?: DropboxDataSource { folder }

Dropbox Data Source for Display

folder?: string
file_upload_data_source?: APIFileUploadDataSource { original_file_name, size_in_bytes, stored_object_key }

File to upload as data source for knowledge base.

original_file_name?: string

The original file name

size_in_bytes?: string

The size of the file in bytes

format: uint64
stored_object_key?: string

The object key the file was stored as

google_drive_data_source?: GoogleDriveDataSource { folder_id, folder_name }

Google Drive Data Source for Display

folder_id?: string
folder_name?: string

Name of the selected folder if available

item_path?: string

Path of folder or object in bucket - Deprecated, moved to data_source_details

last_datasource_indexing_job?: APIIndexedDataSource { completed_at, data_source_uuid, error_details, 11 more }
completed_at?: string

Timestamp when data source completed indexing

format: date-time
data_source_uuid?: string

Uuid of the indexed data source

error_details?: string

A detailed error description

error_msg?: string

A string code providing a hint as to which part of the system experienced an error

failed_item_count?: string

Total count of files that have failed

format: uint64
indexed_file_count?: string

Total count of files that have been indexed

format: uint64
indexed_item_count?: string

Total count of files that have been indexed

format: uint64
removed_item_count?: string

Total count of files that have been removed

format: uint64
skipped_item_count?: string

Total count of files that have been skipped

format: uint64
started_at?: string

Timestamp when data source started indexing

format: date-time
status?: "DATA_SOURCE_STATUS_UNKNOWN" | "DATA_SOURCE_STATUS_IN_PROGRESS" | "DATA_SOURCE_STATUS_UPDATED" | 4 more
Accepts one of the following:
"DATA_SOURCE_STATUS_UNKNOWN"
"DATA_SOURCE_STATUS_IN_PROGRESS"
"DATA_SOURCE_STATUS_UPDATED"
"DATA_SOURCE_STATUS_PARTIALLY_UPDATED"
"DATA_SOURCE_STATUS_NOT_UPDATED"
"DATA_SOURCE_STATUS_FAILED"
"DATA_SOURCE_STATUS_CANCELLED"
total_bytes?: string

Total size of files in data source in bytes

format: uint64
total_bytes_indexed?: string

Total size of files in data source in bytes that have been indexed

format: uint64
total_file_count?: string

Total file count in the data source

format: uint64
region?: string

Region code - Deprecated, moved to data_source_details

spaces_data_source?: APISpacesDataSource { bucket_name, item_path, region }

Spaces Bucket Data Source

bucket_name?: string

Spaces bucket name

item_path?: string
region?: string

Region of bucket

updated_at?: string

Last modified

format: date-time
uuid?: string

Unique id of knowledge base

web_crawler_data_source?: APIWebCrawlerDataSource { base_url, crawling_option, embed_media, exclude_tags }

WebCrawlerDataSource

base_url?: string

The base URL to crawl.

crawling_option?: "UNKNOWN" | "SCOPED" | "PATH" | 3 more

Options for specifying how URLs found on pages should be handled.

  • UNKNOWN: Default unknown value
  • SCOPED: Only include the base URL.
  • PATH: Crawl the base URL and linked pages within the URL path.
  • DOMAIN: Crawl the base URL and linked pages within the same domain.
  • SUBDOMAINS: Crawl the base URL and linked pages for any subdomain.
  • SITEMAP: Crawl URLs discovered in the sitemap.
Accepts one of the following:
"UNKNOWN"
"SCOPED"
"PATH"
"DOMAIN"
"SUBDOMAINS"
"SITEMAP"
embed_media?: boolean

Whether to ingest and index media (images, etc.) on web pages.

exclude_tags?: Array<string>

HTML tags to exclude from web pages while crawling.
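
The counters in last_datasource_indexing_job are typed as string with format uint64, so they need to be parsed before any arithmetic. The following is a minimal sketch of one way to summarize a job. The IndexingJob interface is a local mirror of a few documented fields, the function names are hypothetical, and treating a missing counter as 0 is an assumption of this sketch rather than documented API behavior.

```typescript
// Local mirror of a few documented APIIndexedDataSource fields (not the SDK's exported type).
interface IndexingJob {
  status?: string;
  failed_item_count?: string;
  indexed_item_count?: string;
  total_bytes?: string;
  total_bytes_indexed?: string;
}

// Largest integer a JavaScript number represents exactly (2^53 - 1).
const MAX_SAFE = 9007199254740991;

// Parse a uint64-as-string field. Assumption: a missing or empty field counts as 0.
function parseCount(value: string | undefined): number {
  if (value === undefined || value === '') return 0;
  if (!/^\d+$/.test(value)) throw new Error(`not an unsigned integer: ${value}`);
  const n = Number(value);
  // Values above 2^53 - 1 would lose precision; a caller could switch to BigInt instead.
  if (n > MAX_SAFE) throw new Error(`too large for a number: ${value}`);
  return n;
}

// Hypothetical helper: reduce a job to the figures most useful for a progress display.
function summarizeIndexingJob(job: IndexingJob) {
  const totalBytes = parseCount(job.total_bytes);
  const indexedBytes = parseCount(job.total_bytes_indexed);
  return {
    inProgress: job.status === 'DATA_SOURCE_STATUS_IN_PROGRESS',
    failedItems: parseCount(job.failed_item_count),
    indexedItems: parseCount(job.indexed_item_count),
    // Fraction of bytes indexed so far; 0 when the total is missing or zero.
    fractionIndexed: totalBytes > 0 ? indexedBytes / totalBytes : 0,
  };
}
```

A caller would pass dataSource.knowledge_base_data_source?.last_datasource_indexing_job to summarizeIndexingJob once that field is present in the response.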

Add Data Source to a Knowledge Base
import Gradient from '@digitalocean/gradient';

const client = new Gradient({
  accessToken: 'My Access Token',
});

const dataSource = await client.knowledgeBases.dataSources.create('123e4567-e89b-12d3-a456-426614174000');

console.log(dataSource.knowledge_base_data_source);
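
The call above passes no body. As a sketch of what a request with a data source might look like, the block below builds a body for a web crawler source from the documented fields. The webCrawlerBody helper, its local types, the default of 'SCOPED', the embed_media value of false, and the example URL are all illustrative assumptions, not SDK exports or documented defaults.

```typescript
// Local mirror of the documented crawling_option values (not an SDK export).
type CrawlingOption = 'UNKNOWN' | 'SCOPED' | 'PATH' | 'DOMAIN' | 'SUBDOMAINS' | 'SITEMAP';

interface WebCrawlerBody {
  web_crawler_data_source: {
    base_url: string;
    crawling_option: CrawlingOption;
    embed_media: boolean;
    exclude_tags: string[];
  };
}

// Hypothetical helper: assemble the request body for a web crawler data source.
function webCrawlerBody(
  baseUrl: string,
  crawlingOption: CrawlingOption = 'SCOPED', // assumed default for this sketch only
  excludeTags: string[] = [],
): WebCrawlerBody {
  return {
    web_crawler_data_source: {
      base_url: baseUrl,
      crawling_option: crawlingOption,
      embed_media: false, // skip images and other media in this sketch
      exclude_tags: excludeTags,
    },
  };
}

// PATH crawls the base URL and linked pages within the URL path.
const body = webCrawlerBody('https://docs.example.com/guides', 'PATH', ['nav', 'footer']);

// With the client configured as in the example above, the body goes in as the second argument:
// const dataSource = await client.knowledgeBases.dataSources.create(knowledgeBaseUuid, body);
```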
Returns Examples
{
  "knowledge_base_data_source": {
    "aws_data_source": {
      "bucket_name": "example name",
      "item_path": "example string",
      "region": "example string"
    },
    "bucket_name": "example name",
    "chunking_algorithm": "CHUNKING_ALGORITHM_SECTION_BASED",
    "chunking_options": {
      "child_chunk_size": 350,
      "max_chunk_size": 750,
      "parent_chunk_size": 1000,
      "semantic_threshold": 0.5
    },
    "created_at": "2023-01-01T00:00:00Z",
    "dropbox_data_source": {
      "folder": "example string"
    },
    "file_upload_data_source": {
      "original_file_name": "example name",
      "size_in_bytes": "12345",
      "stored_object_key": "example string"
    },
    "google_drive_data_source": {
      "folder_id": "123e4567-e89b-12d3-a456-426614174000",
      "folder_name": "example name"
    },
    "item_path": "example string",
    "last_datasource_indexing_job": {
      "completed_at": "2023-01-01T00:00:00Z",
      "data_source_uuid": "123e4567-e89b-12d3-a456-426614174000",
      "error_details": "example string",
      "error_msg": "example string",
      "failed_item_count": "12345",
      "indexed_file_count": "12345",
      "indexed_item_count": "12345",
      "removed_item_count": "12345",
      "skipped_item_count": "12345",
      "started_at": "2023-01-01T00:00:00Z",
      "status": "DATA_SOURCE_STATUS_UNKNOWN",
      "total_bytes": "12345",
      "total_bytes_indexed": "12345",
      "total_file_count": "12345"
    },
    "region": "example string",
    "spaces_data_source": {
      "bucket_name": "example name",
      "item_path": "example string",
      "region": "example string"
    },
    "updated_at": "2023-01-01T00:00:00Z",
    "uuid": "123e4567-e89b-12d3-a456-426614174000",
    "web_crawler_data_source": {
      "base_url": "example string",
      "crawling_option": "UNKNOWN",
      "embed_media": true,
      "exclude_tags": [
        "example string"
      ]
    }
  }
}