How to Build Your Own ChatGPT Using Enterprise Data

Chatting with your Large Datasets using Azure Cognitive Search, Azure OpenAI, and ChatGPT

In today’s blog, we’re going to cover how you can leverage Azure Cognitive Search, Azure OpenAI, and ChatGPT to sift through large amounts of structured and unstructured data.

Introduction

ChatGPT, OpenAI’s advanced language model, is an incredibly powerful tool when it comes to answering queries or generating human-like text. But what if you want to interact with a large amount of your own data, perhaps data that is specific to your organization and not necessarily available on the Internet? This is where Azure Cognitive Search comes in: it helps you index and search your own data using advanced AI models.

In this guide, we will explain how you can use these powerful tools to build a web application that enables you to chat with your own data.

How It Works

Here’s an overview of the architecture of the system we’re going to implement:

You start by asking a question through a web application interface.
The question is then forwarded to ChatGPT, which generates a query.
This query is sent to Azure Cognitive Search, which fetches relevant data from your datasets.
This data is brought back into the model, which then generates an answer to the initial question based on this relevant data.
You then receive a response to your question from the ChatGPT model.
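
The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the demo repository’s code: generate_query, search, and generate_answer are hypothetical callables standing in for the ChatGPT and Azure Cognitive Search calls, and the fake_* functions are toy stand-ins so the flow can be run without any Azure resources.

```python
from typing import Callable, List


def answer_question(
    question: str,
    generate_query: Callable[[str], str],
    search: Callable[[str], List[str]],
    generate_answer: Callable[[str, List[str]], str],
) -> str:
    """Retrieve-then-answer flow; all three callables are hypothetical stand-ins."""
    # Step 2: the model turns the user's question into a search query.
    query = generate_query(question)
    # Step 3: the search service returns passages relevant to that query.
    passages = search(query)
    # Steps 4-5: the model answers the original question from those passages.
    return generate_answer(question, passages)


# Toy stand-ins so the sketch runs locally (assumptions, not real service calls).
def fake_generate_query(question: str) -> str:
    return question.lower().rstrip("?")


def fake_search(query: str) -> List[str]:
    documents = [
        "The health plan covers annual eye exams.",
        "Dental cleanings are covered twice a year.",
    ]
    words = set(query.split())
    # Keep any document that shares at least one word with the query.
    return [d for d in documents if words & set(d.lower().rstrip(".").split())]


def fake_generate_answer(question: str, passages: List[str]) -> str:
    return passages[0] if passages else "I don't know."


if __name__ == "__main__":
    print(answer_question("Are eye exams covered?",
                          fake_generate_query, fake_search, fake_generate_answer))
```

Passing the three stages in as callables keeps the orchestration separate from the services, so the stand-ins could later be swapped for real model and search clients without changing answer_question.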

Setting Up Your Environment

To get started, you need to have Azure OpenAI and Azure Cognitive Search enabled in your Azure subscription. Once these services are ready, you can use the azure-search-openai-demo repository, which sets up the necessary infrastructure for you.

The repository deploys several resources, including:

Azure OpenAI service
Azure Cognitive Search
Azure App Service plan
Azure App Service
Azure Form Recognizer
Azure Blob Storage, with some PDF files as sample data for the demo.

You can deploy the repository via GitHub Codespaces or VS Code Remote Containers. For the purpose of this guide, we will use GitHub Codespaces.

Customizing Your Data

Before deploying the repository, you might want to replace the sample data with your own. To do this:

Go to the data folder in the repository.
Delete the existing PDF files.

Upload your own PDF files by right-clicking on the data folder and selecting Upload.

Remember that the data you upload here is what you will be able to query using the web application.
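
If you prefer to script the swap rather than use the file explorer, the following is a minimal sketch. The function name replace_sample_pdfs and both folder arguments are assumptions for illustration; the repository does not ship this helper.

```python
import shutil
from pathlib import Path
from typing import List


def replace_sample_pdfs(data_dir: Path, source_dir: Path) -> List[str]:
    """Delete PDFs in data_dir, copy PDFs from source_dir, return the new names."""
    # Remove the sample PDFs that ship with the repository.
    for old_pdf in data_dir.glob("*.pdf"):
        old_pdf.unlink()
    # Copy your own PDFs in; files with other extensions are skipped.
    for new_pdf in sorted(source_dir.glob("*.pdf")):
        shutil.copy2(new_pdf, data_dir / new_pdf.name)
    return sorted(p.name for p in data_dir.glob("*.pdf"))
```

Point source_dir at a folder other than data_dir, since the first loop deletes every PDF it finds in data_dir.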

Customizing Your ChatGPT Prompt

In the notebook folder, there’s a notebook named Chat-Retrieve-Refine.ipynb. This notebook contains the prompt you will be using for your ChatGPT model. By default, the prompt is set up to ask questions about a healthcare plan, which is related to the sample data provided. However, you can customize the prompt to suit your needs.
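
To give a feel for what customizing the prompt involves, here is a minimal sketch of a prompt template that combines a topic, retrieved sources, and the user’s question. The template wording, PROMPT_TEMPLATE, and build_prompt are hypothetical; the actual prompt in the notebook may be structured differently.

```python
from typing import List

# Hypothetical template; the wording in the demo notebook may differ.
PROMPT_TEMPLATE = (
    "You are an assistant that answers questions about {topic}.\n"
    "Answer using only the sources below. If the answer is not in the "
    "sources, say you don't know.\n\n"
    "Sources:\n{sources}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(topic: str, sources: List[str], question: str) -> str:
    """Fill the template, numbering each source so it can be referred to."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(sources, start=1))
    return PROMPT_TEMPLATE.format(topic=topic, sources=numbered, question=question)


if __name__ == "__main__":
    print(build_prompt(
        "the company travel policy",
        ["Flights must be booked 14 days ahead."],
        "How early should I book a flight?",
    ))
```

Adapting the demo to your own data would then mostly mean changing the topic and instructions (here, swapping the healthcare plan for your own domain) while keeping the slot for retrieved sources.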

Deploying the Repository

Once you’ve set up your data and prompt, it’s time to deploy the repository.

Open GitHub Codespaces and initialize a new environment with the azd init -t azure-search-openai-demo command.

Define an environment name, select your subscription, and specify the location.
Run azd up to deploy the project.

The deployment creates a new resource group and deploys all the services, including the ChatGPT model. This process may take several minutes.

Next, let’s verify which ChatGPT model Azure OpenAI is using to process our company data. To do so, open the Azure AI Studio portal (https://oai.azure.com/portal) and look at the models section.

As shown in the screenshot above, ChatGPT 3.5 Turbo is deployed in our environment.

Congratulations! You now have a fully functional web application where you can chat with your data using Azure Cognitive Search and ChatGPT. The application fetches relevant information from your dataset and generates an appropriate response using ChatGPT. This can be a powerful tool for organizations dealing with large, complex datasets.

The Benefits of Integrating ChatGPT and Azure Cognitive Search

The integration offers a number of benefits for enterprises, including:

Improved data analysis: It can be used to analyze data in a more sophisticated way than traditional methods. For example, it can be used to identify patterns and trends in data and to generate insights that would be difficult or impossible to find using other methods.
Increased efficiency: It can automate many of the tasks involved in data analysis, such as data cleaning and preparation. This can free up employees to focus on more strategic and value-added activities.
Reduced costs: It can help enterprises to reduce the costs associated with data analysis. For example, it can be used to replace expensive data analysts and consultants.
Improved decision-making: It can help enterprises to make better decisions by providing them with access to insights that would not be available otherwise. This can lead to improved efficiency, profitability, and customer satisfaction.

Use Cases for ChatGPT and Azure Cognitive Search

ChatGPT and Azure Cognitive Search can be used in a variety of use cases, including:

Customer service: It can be used to provide customer service by answering questions, resolving issues, and providing support.
Sales and marketing: It can be used to generate leads, qualify prospects, and close deals.
Product development: It can be used to gather feedback from customers, identify new product opportunities, and improve existing products.
Risk management: It can be used to identify and mitigate risks, and to ensure compliance with regulations.
Fraud detection: ChatGPT can be used to detect fraud and other malicious activity.
Compliance: ChatGPT can be used to ensure compliance with regulations, such as those governing privacy and data protection.

Microsoft Azure OpenAI Service and Data Privacy

Here are some common questions and answers related to data, privacy, and security for the Azure OpenAI Service (ChatGPT):

Q: What type of data does the Azure OpenAI Service process?
A: It processes user-submitted prompts, the completions generated by the service, user-provided training and validation data, and results data from the training process.
Q: How does the Azure OpenAI Service use and store this data?
A: The service uses this data to provide its services, to monitor for misuse, and to maintain the quality and security of its services. It does not store prompts or completions in the model during operations.
Q: Are there any data privacy concerns with using Azure OpenAI (ChatGPT)?
A: The service doesn’t use customer data to retrain models. Training data provided by the customer is used only to fine-tune the customer’s own model and is not used by Microsoft to train or improve any Microsoft models. You can also protect your data with your own encryption keys: customer-managed keys (CMK) encrypt all customer data stored at rest in the service, such as data uploaded for fine-tuning.
- Quote from Microsoft:
  
  Prompts and completions. The prompts and completions data may be temporarily stored by the Azure OpenAI Service in the same region as the resource for up to 30 days. This data is encrypted and is only accessible to authorized Microsoft employees for (1) debugging purposes in the event of a failure, and (2) investigating patterns of abuse and misuse to determine if the service is being used in a manner that violates the applicable product terms. Note: When a customer is approved for modified abuse monitoring, prompts and completions data are not stored, and thus Microsoft employees have no access to the data.
Q: What mechanisms does Azure OpenAI have in place for data privacy and security?
A: Azure OpenAI uses encryption and content filtering, and it temporarily stores prompts and completions (for up to 30 days) to monitor for misuse. It also supports customer-managed keys (CMK) for encrypting data at rest.
Q: Can customers opt out of the logging and human review process?
A: Yes, Microsoft allows customers who meet additional Limited Access eligibility criteria to apply to modify the Azure OpenAI content management features.
Q: Does Azure OpenAI use customer data to train their models?
A: No, Microsoft does not use customer data to train, retrain or improve the models in the Azure OpenAI Service.
Q: Is customer data processed by Azure OpenAI sent to OpenAI?
A: No, all customer data sent to Azure OpenAI remains within the Azure OpenAI service.
Q: What happens if Microsoft needs access to customer data?
A: In rare cases where Microsoft personnel need access to customer data, the Customer Lockbox feature provides an interface for customers to review and approve or reject these requests.
Q: Is customer data logged with content filtering?
A: No. Content filtering works differently from abuse monitoring and does not require logging or storing data.

These answers provide a brief overview, but you can find more detailed information in Microsoft’s data processing, privacy, and security documents for Azure OpenAI.

Azure OpenAI security layers: recap

Security Controls: Azure OpenAI vs OpenAI

Conclusion

The integration of ChatGPT and Azure Cognitive Search offers a powerful and versatile solution for enterprises that are looking to improve their data analysis capabilities. By combining the strengths of these two technologies, businesses can gain a competitive edge by making better decisions, improving customer service, and driving innovation.

Mahmoud A. ATALLAH

Microsoft MVP | Speaker | Azure Service Delivery Lead at Bespin Global MEA, helping customers build successful Azure practices. Talks about #AzureCloud and #AI