All AI-powered software applications require data to perform their jobs. Large language models (LLMs) are already infused with a considerable volume of publicly accessible information, and so a basic chatbot application may not need anything other than a user's input 'prompt' to generate a useful output.
However, in many cases, you need an application to use highly specific and detailed data, data that is very recent or constantly updating, or sensitive data that simply isn't available to LLMs.
Here are some examples:
Enterprise wiki to power a search assistant
Support articles for customer service automation
Customer feedback to perform sentiment analysis
Product details to generate purchase recommendations
News and press releases for summarization & fact checking
Financial reports for cost analysis
Technical documents for compliance & risk assessment
In these cases, you will need to connect external data sources to your application.
What are data sources?
Data sources allow you to bring your own data to Griptape Cloud. You point us at your data, and we make it accessible to LLM-powered applications.
Connecting to external data -- by creating a data source -- is the first step of building a retrieval-powered AI application. Once you create a data source, you can make it available to your application by adding it to a knowledge base.
Griptape data sources extract, ingest, and prepare your data so that it can be retrieved and used by LLMs. This is an important step because LLMs work best with data when it is represented in a particular way. These formats often differ from how the information is presented to human users or even other software applications. For example, the text of a web page must be cleaned to remove extraneous information, annotated with metadata, segmented into chunks, and converted into vector embeddings before it can be stored in a suitable database.
Typically, developers must deploy and operate this process themselves. It can be time-consuming, error-prone, and costly. In Griptape Cloud, this process is automated for you.
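To give a sense of what this preparation involves when done by hand, here is a minimal sketch of the clean, annotate, chunk, and embed steps described above. It is not how Griptape Cloud implements them. The function names, the chunk sizes, and the metadata fields are all assumptions made for illustration, and toy_embedding is a hash-based stand-in for a real embedding model.

```python
import hashlib
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_html(html):
    """Strip markup and collapse runs of whitespace into single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def chunk_words(text, max_words=50, overlap=10):
    """Split text into word windows that overlap by `overlap` words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embedding(text, dims=8):
    """Stand-in for a real embedding model: hashes each word into one of `dims` buckets."""
    vec = [0.0] * dims
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dims] += 1.0
    return vec


def prepare(html, source_url):
    """Run the whole sketch: clean, chunk, annotate with metadata, embed."""
    text = clean_html(html)
    return [
        {
            "text": chunk,
            "metadata": {"source": source_url, "chunk_index": i},  # assumed fields
            "embedding": toy_embedding(chunk),
        }
        for i, chunk in enumerate(chunk_words(text))
    ]
```

Each record returned by prepare pairs a chunk of cleaned text with its metadata and a vector, which is roughly the shape of data a vector database expects. A production pipeline would replace toy_embedding with a call to an actual embedding model.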
Developer guide resource: Creating data sources
How to create a data source
Follow these steps to create a data source. For this example, we will create a data source from a web page.
Navigate to the Data Sources screen.
Click Create data source.
Select a type of data source. For this example, choose Web Page.
Give your data source a name and a description (optional).
Enter the URL of a web page that you want to use as a data source, for example https://www.griptape.ai.
Click Create to submit the form.
What's happening?
Once you have created the data source, we automatically begin extracting, cleaning, transforming, and storing your data in a data lake so that it can subsequently be loaded into an LLM-compatible database index. This process is known as a data job. It can take anywhere from a few seconds to several minutes or more, depending on how much data the source contains.
While this job is in progress, you will be directed to the data source detail page where you can observe the job status as well as view and edit details such as the name, description, and source URLs.
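If you want to wait on a data job from a script rather than watching the detail page, a generic polling loop is one way to do it. The sketch below is not tied to any Griptape Cloud API. fetch_status is a hypothetical callable that you would supply yourself, and the status names "SUCCEEDED" and "FAILED" are assumptions, not documented values.

```python
import time


def wait_for_job(fetch_status, poll_seconds=2.0, timeout_seconds=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a terminal state or the timeout elapses.

    fetch_status is a hypothetical, caller-supplied function that returns the
    job's current status as a string.
    """
    terminal = {"SUCCEEDED", "FAILED"}  # assumed status names
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"job still {status!r} after {timeout_seconds} seconds"
            )
        sleep(poll_seconds)
```

The sleep and clock parameters exist only so the loop can be tested without real delays. In normal use you would leave them at their defaults.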
When your underlying data changes, you can select Refresh from the Actions menu to update your data source. Additionally, you can schedule periodic updates to your data source. This can be helpful for sources that update frequently.
How to use a data source
The next step in using your data source is to make it available to an application for data retrieval. To do this, add it to a knowledge base.
Navigate to the Knowledge Bases screen.
Click Create knowledge base.
Select the Griptape Cloud knowledge base type.
Give your knowledge base a name and a description (optional).
Select the data source(s) you want to include in the knowledge base.
Click Create to submit the form.
You will be directed to the knowledge base detail page while the knowledge base job proceeds. This typically takes just a few moments. Once your knowledge base is ready, the data it contains becomes available for applications to retrieve via Griptape assistants, or structures such as agents.
You can perform a test query by selecting the Query tab and entering some information that you know is in your data. The result will be a 'raw' response that contains the embedded text and other query parameters. This feature is useful for quick testing and debugging.
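Conceptually, a test query like this compares your input against the stored chunks and returns the closest matches. The sketch below illustrates that idea with a toy bag-of-words embedding and cosine similarity. It is an assumption-laden illustration, not the Griptape Cloud query implementation, and the "text" and "score" fields are invented for the example rather than taken from the actual response format.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def query(chunks, question, top_k=2):
    """Return the top_k chunks most similar to the question, best first."""
    q = embed(question)
    scored = [{"text": c, "score": cosine(q, embed(c))} for c in chunks]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:top_k]
```

Querying with text you know is in your data, as suggested above, should put the matching chunk at the top of the results. If it does not, that points to a problem in how the data was extracted or chunked.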
With these steps completed, you can now connect your data to an assistant or structure that will be able to query it programmatically. See Getting Started with Assistants for more information.
Types of data sources
The following types of data sources are supported.
Web Page
Scrape the text of publicly available web pages by providing their URLs.
Amazon S3
Connect Amazon S3 objects by providing their S3 URIs. Supported file types include PDF, CSV, Markdown, and most text-based file types.
Google Drive
Connect individual Google Drive files or entire folders. Supported file types include Google Apps files such as Docs, Sheets, and Slides, as well as most text-based file types such as PDF, CSV, and Markdown.
Atlassian Confluence
Connect to your Confluence wiki by providing the URL of the site, space, or page.
Data Lake
Connect files from your Griptape Cloud data lake by providing their bucket and asset names. Supported file types include PDF, CSV, Markdown, and most text-based file types.
Custom Data Source
Connect to any data by selecting a Griptape Cloud structure that is configured as a data source.