All AI-powered software applications require data to perform their jobs. Since large language models (LLMs) are pre-trained on a considerable volume of publicly accessible information, some use cases such as a basic chatbot may not require anything more than a user's input 'prompt' to generate a useful output.

However, in many cases, applications need to connect with additional data: highly specific and detailed data, very recent or constantly updating data, or secure data that simply isn't directly available to LLMs.

Here are some examples:

Enterprise wiki to power a search assistant
Support articles for customer service automation
Customer feedback to perform sentiment analysis
Product details to generate purchase recommendations
News and press releases for summarization & fact checking
Financial reports for cost analysis
Technical documents for compliance & risk assessment

In these cases, you will need to connect external data sources to your application.

What are data sources?

Data sources allow you to bring your own data to Griptape Cloud. You point us at your data, and we make it accessible to LLM-powered applications.

Connecting to external data -- by creating a data source -- is the first step of building a retrieval-powered AI application. Once you create a data source, you can make it available to your application by adding it to a knowledge base.

Griptape data sources extract, ingest, and prepare your data so that it can be retrieved and used by LLMs. This is an important step because LLMs work best with data when it is represented in a particular way. These formats often differ from how the information is presented to human users or even other software applications. For example, the text of a web page must be cleaned to remove extraneous information, annotated with metadata, segmented into chunks, and converted into vector embeddings before it can be stored in a suitable database.
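To make the preparation steps above concrete, here is a minimal sketch of cleaning, annotating, chunking, and embedding a web page using only the Python standard library. It is an illustration under stated assumptions, not how Griptape Cloud implements the process: the function names, the chunk size and overlap, and the metadata keys are all hypothetical, and toy_embedding is a hashed bag-of-words stand-in for a real embedding model.

```python
import hashlib
import math
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_html(html):
    """Strip markup and collapse whitespace (the 'cleaning' step)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def chunk_words(text, size=50, overlap=10):
    """Split text into overlapping word windows (sizes are arbitrary assumptions)."""
    if size <= overlap:
        raise ValueError("size must be greater than overlap")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks


def toy_embedding(text, dims=64):
    """Hashed bag-of-words vector, L2-normalized. NOT a real embedding model."""
    vec = [0.0] * dims
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def prepare(html, source_url):
    """Clean, chunk, annotate with metadata, and embed one page."""
    text = clean_html(html)
    return [
        {
            "text": chunk,
            "metadata": {"source": source_url, "chunk_index": i},  # hypothetical keys
            "vector": toy_embedding(chunk),
        }
        for i, chunk in enumerate(chunk_words(text))
    ]
```

Each record returned by prepare pairs a chunk of text with its metadata and vector, which is the general shape of data that a vector database stores.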

Typically, developers must build, deploy, and operate this process themselves, which can be time-consuming, complicated, and costly. In Griptape Cloud, this process is automated for you.

Developer guide resource: Creating data sources

How to create a data source

Follow these steps to create a data source. For this example, we will create a data source from a web page.

Navigate to the Data Sources screen.
Click Create data source.
Select a type of data source. For this example, choose Web Page.
Give your data source a name and a description (optional).
Enter the URL of a web page that you want to use as a data source, for example https://www.griptape.ai.
Click Create to submit the form.
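If you want to check your inputs before filling in the form, the fields above can be modeled as a small data structure. This is a sketch only: WebPageDataSourceForm and its validate method are hypothetical names, not part of Griptape Cloud or its API, and the rule that the name is required is an assumption based on only the description being marked optional.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class WebPageDataSourceForm:
    """Hypothetical mirror of the form fields; not a Griptape Cloud type."""

    name: str
    url: str
    description: str = ""  # optional, as in the form
    source_type: str = "Web Page"

    def validate(self):
        """Return a list of problems; an empty list means the inputs look usable."""
        errors = []
        if not self.name.strip():
            errors.append("name is required")  # assumed rule
        parsed = urlparse(self.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("url must be an absolute http(s) URL")
        return errors


form = WebPageDataSourceForm(name="Griptape home page", url="https://www.griptape.ai")
problems = form.validate()
```

Here problems is an empty list because both the name and the URL pass the checks.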

What's happening?

Once you have created the data source, we automatically begin extracting, cleaning, transforming, and storing your data in a data lake so that it can subsequently be loaded into an LLM-compatible database index. This process is known as a data job. It can take anywhere from a few seconds to several minutes or more, depending on how much data the source contains.

You will be directed to the data source detail page, where you can observe the job status while the job is in progress, as well as view and edit details such as the name, description, and source URLs.
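Observing a job until it finishes can also be expressed in code as a polling loop. The sketch below is generic and makes several assumptions: the status strings SUCCEEDED and FAILED are hypothetical, and get_status is any function you supply that returns the current status. It does not call Griptape Cloud itself.

```python
import time


def wait_for_job(get_status, poll_seconds=2.0, timeout_seconds=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it reports a terminal state or the timeout passes.

    The terminal status names are assumptions for illustration only.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in ("SUCCEEDED", "FAILED"):
            return status
        if clock() >= deadline:
            raise TimeoutError("job did not finish within %s seconds" % timeout_seconds)
        sleep(poll_seconds)


# Usage with a stubbed status source standing in for a real status lookup.
_statuses = iter(["QUEUED", "RUNNING", "SUCCEEDED"])
final_status = wait_for_job(lambda: next(_statuses), sleep=lambda seconds: None)
```

Because a data job can take from seconds to minutes, the timeout and polling interval are left as parameters rather than fixed values.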

When your underlying data changes, you can select Refresh from the Actions menu to update your data source. Additionally, you can schedule periodic updates to your data source. This can be helpful for sources that update frequently.

How to use a data source

The next step in using your data source is to make it available to an application for data retrieval. To do this, add it to a knowledge base.

Navigate to the Knowledge Bases screen.
Click Create knowledge base.
Select the Griptape Cloud knowledge base type.
Give your knowledge base a name and a description (optional).
Select the data source(s) you want to include in the knowledge base.
Click Create to submit the form.

You will be directed to the knowledge base detail page while the knowledge base job proceeds. This typically takes just a few moments. Once your knowledge base is ready, the data it contains becomes available for applications to retrieve through Griptape assistants or structures such as agents.

You can perform a test query by selecting the Query tab and entering some information that you know is in your data. The result will be a 'raw' response that contains the embedded text and other query parameters. This feature is useful for quick testing and debugging.
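To give a feel for what a test query does conceptually, the following sketch ranks stored text chunks against a question and returns the best matches with their scores. It is a simplified stand-in, not the knowledge base's actual retrieval method or response format: it uses word-count vectors and cosine similarity instead of learned embeddings, and the function names and result keys are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Word-count vector; a simple stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query(chunks, question, top_k=2):
    """Return the top_k chunks most similar to the question, with scores."""
    q = term_vector(question)
    scored = [{"text": c, "score": cosine(term_vector(c), q)} for c in chunks]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:top_k]


chunks = [
    "Griptape Cloud hosts knowledge bases.",
    "Refunds are processed within five days.",
    "Data sources feed knowledge bases.",
]
results = query(chunks, "How long do refunds take?", top_k=1)
```

Querying with wording that you know appears in your data, as suggested above, is what makes the matching chunk rank first in a check like this.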

With these steps completed, you can now connect your data to an assistant or structure that will be able to query it programmatically. See Getting Started with Assistants for more information.

Types of data sources

The following types of data sources are supported.

Web Page

Scrape the text of publicly available web pages by providing their URLs.

Amazon S3

Connect Amazon S3 objects by providing their S3 URIs. Supported file types include PDF, CSV, Markdown, and most text-based file types.
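An S3 URI has the form s3://bucket/key. As a quick way to check a URI before entering it, the sketch below splits one into its bucket and object key. The helper name parse_s3_uri and the example URI are hypothetical, and the function only checks the URI's shape; it does not contact Amazon S3 or confirm that the object exists.

```python
def parse_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key); raise ValueError if malformed."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError("S3 URI must start with s3://")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError("S3 URI must include both a bucket and an object key")
    return bucket, key


# Hypothetical example URI.
bucket, key = parse_s3_uri("s3://my-bucket/reports/2024/q1.pdf")
```

The URI is split on the first slash after the prefix, so slashes inside the object key are preserved.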

Google Drive

Connect individual Google Drive files or entire folders. Supported file types include Google Apps files such as Docs, Sheets, and Slides, as well as most text-based file types such as PDF, CSV, and Markdown.

Atlassian Confluence

Connect to your Confluence wiki by providing the URL of the site, space, or page.

Data Lake

Connect files from your Griptape Cloud data lake by providing their bucket and asset names. Supported file types include PDF, CSV, Markdown, and most text-based file types.

Custom Data Source

Connect to any data by selecting a Griptape Cloud structure that is configured as a data source.
