SNOWFLAKE CERTIFIED SOLUTION

Understanding Customer Reviews using Snowflake Cortex

classified_reviews AS (
  SELECT
    review_id,
    review_text,
    AI_CLASSIFY(
      review_text,

      [
        'Sizing issue',
        'Color issue',
        'Fabric quality issue',
        'Washing problem',
        'Pricing issue'
      ]
    ) as classification
  FROM clothing_issue_reviews
)
classified_reviews AS (
  SELECT
    review_id,
    review_text,
    AI_CLASSIFY(
      review_text,

      [
        'Sizing issue',
        'Color issue',
        'Fabric quality issue',
        'Washing problem',
        'Pricing issue'
      ]
    ) as classification
  FROM clothing_issue_reviews
)
classified_reviews AS (
  SELECT
    review_id,
    review_text,
    AI_CLASSIFY(
      review_text,

      [
        'Sizing issue',
        'Color issue',
        'Fabric quality issue',
        'Washing problem',
        'Pricing issue'
      ]
    ) as classification
  FROM clothing_issue_reviews
)
classified_reviews AS (
  SELECT
    review_id,
    review_text,
    AI_CLASSIFY(
      review_text,

      [
        'Sizing issue',
        'Color issue',
        'Fabric quality issue',
        'Washing problem',
        'Pricing issue'
      ]
    ) as classification
  FROM clothing_issue_reviews
)
classified_reviews AS (
  SELECT
    review_id,
    review_text,
    AI_CLASSIFY(
      review_text,

      [
        'Sizing issue',
        'Color issue',
        'Fabric quality issue',
        'Washing problem',
        'Pricing issue'
      ]
    ) as classification
  FROM clothing_issue_reviews
)

Overview

Understanding customer feedback is critical for businesses, but analyzing large volumes of unstructured text can be challenging. In this solution, you will use Cortex AISQL to systematically gain insights from unstructured customer feedback.

This solution template allows you to leverage multiple AISQL functions to answer different use case questions upon customer reviews.

Context

Tasty Bytes is a global e-commerce company selling different merchandise. They collect customer reviews to gain insights into the how their products are performing.

In this notebook, we will leverage multiple AISQL functions to answer different use case questions upon customer reviews.

Step 1:  Set up your environment and data

Let's begin by running the query below. It sets the proper context for this session. It also creates and populates two tables, PRODUCT_REVIEWS and PRODUCT_CATALOG, with sample data for our analysis.

SQL as IMPORT_DATA_SQL


USE ROLE SNOWFLAKE_LEARNING_ROLE;

-- use the existing database and schema
USE DATABASE SNOWFLAKE_LEARNING_DB;

SET schema_name = CONCAT(current_user(), '_CUSTOMER_REVIEW_ANALYSIS_WITH_AISQL');

CREATE SCHEMA IF NOT EXISTS IDENTIFIER($schema_name);

USE SCHEMA IDENTIFIER($schema_name);

/*--
• file format and stage creation
--*/

CREATE OR REPLACE FILE FORMAT csv_ff

  TYPE = 'csv'
  SKIP_HEADER = 1;

CREATE OR REPLACE STAGE s3load

  COMMENT = 'Quickstarts S3 Stage Connection'
  URL = 's3://sfquickstarts/misc/aisql/ecommerce_customer_review/'
  FILE_FORMAT = csv_ff;


/*-- • raw zone table build --*/


CREATE OR REPLACE TABLE customer_data
(
  CUSTOMER_ID VARCHAR(16777216),
  CUSTOMER_SEGMENT VARCHAR(16777216),
  JOIN_DATE DATE,
  LIFETIME_VALUE NUMBER(38,2),
  PREVIOUS_PURCHASES NUMBER(38,0),
  AGE_RANGE VARCHAR(16777216),
  GENDER VARCHAR(16777216),
  PREFERRED_CATEGORY VARCHAR(16777216)
);

CREATE OR REPLACE TABLE product_catalog
(
  PRODUCT_ID VARCHAR(16777216),
  PRODUCT_NAME VARCHAR(16777216),
  CATEGORY VARCHAR(16777216),
  SUBCATEGORY VARCHAR(16777216),
  MANUFACTURER VARCHAR(16777216),
  PRICE NUMBER(38,2),
  RELEASE_DATE DATE,
  REVIEW_COUNT NUMBER(38,0)
);

CREATE OR REPLACE TABLE product_reviews
(
  REVIEW_ID VARCHAR(16777216),
  PRODUCT_ID VARCHAR(16777216),
  CUSTOMER_ID VARCHAR(16777216),
  REVIEW_TEXT VARCHAR(16777216),
  RATING NUMBER(38,0),
  REVIEW_DATE DATE,
  PURCHASE_DATE DATE,
  VERIFIED_PURCHASE BOOLEAN,
  HELPFUL_VOTES NUMBER(38,0)
);

/*-- • raw zone table load --*/

COPY INTO customer_data
FROM @s3load/customer_data.csv
ON_ERROR = CONTINUE;

COPY INTO product_catalog
FROM @s3load/product_catalog.csv
ON_ERROR = CONTINUE;

COPY INTO product_reviews
FROM @s3load/product_reviews.csv
ON_ERROR = CONTINUE;

-- setup completion note
SELECT 'Setup is complete' AS note;

-- Quick preview of the table
SELECT *
FROM product_reviews
LIMIT 25;

SQL as CHECK_DATA_SQL

-- Quick preview of the table 
SELECT *
FROM product_reviews
LIMIT 25;

Step 2: Correlate sentiment with ratings

As a first step, let's perform a quick sanity check. We'll use the SNOWFLAKE.CORTEX.SENTIMENT function to score the sentiment of each review. We can then check if it correlates with the user-provided star rating to see if they align.

SQL as SENTIMENT_CHECK_SQL

WITH EXTRACTED_SENTIMENT AS (
    SELECT 
        RATING,
        SNOWFLAKE.CORTEX.SENTIMENT(REVIEW_TEXT) AS SENTIMENT
    FROM PRODUCT_REVIEWS
)
SELECT CORR(SENTIMENT, RATING) AS SENTIMENT_RATING_CORRELATION
FROM EXTRACTED_SENTIMENT;

Step 3: Find top issues in a category

Now, let's dig deeper. Suppose you want to know what the biggest complaints are for 'Electronics'. You can focus on the ones with negative sentiments, and use AI_AGG to analyze all relevant reviews and aggregate the common themes into a single summary.

-- The text may not display fully in the SQL cell. Please hover or double-click on the SQL cell to view the full text.

SELECT 
  AI_AGG(
    REVIEW_TEXT, 
    'What are the top 3 most common product issues reported in these reviews?'
  ) AS TOP_ISSUES
FROM PRODUCT_REVIEWS pr
JOIN PRODUCT_CATALOG pc ON pr.product_id = pc.product_id
WHERE pc.category = 'Electronics'
AND SNOWFLAKE.CORTEX.SENTIMENT(REVIEW_TEXT) < 0;

Now, we'll print the result to a dataframe for easier reading.

# to view the result
df = AGG_TOP_ISSUES_SQL.to_pandas()
print(df['TOP_ISSUES'].iloc)

Step 4:  Identify the most common issues

To answer this question, we start with filtering to Clothing category. Another way to identify comments that mentioned product issue is to leverage our latest AI_FILTER to conduct filtering using natural language.

The next step we use the AI_AGG function to get a list of all product issues mentioned.

-- Create temporary table on the filtered result to be re-used in next step analytics.

-- Create temporary table on the filtered result to be re-used in next step analytics.
CREATE OR REPLACE TEMP TABLE filtered_product_reviews AS
SELECT *
FROM product_reviews
WHERE AI_FILTER(PROMPT('This review mentions a product issue or complaint: {0}', review_text));

-- Leverage AI_AGG functions to find the common issues mentioned.
-- The text may not display fully in the SQL cell. Please hover around or double check on the SQL cell to view the full text.
SELECT
  AI_AGG(
    review_text,
    'Analyze these clothing product reviews and provide a comprehensive list of all product issues mentioned. Format your response as a bulleted list of issues with their approximate frequency in percentage.'
  ) as clothing_issues
FROM filtered_product_reviews pr
JOIN product_catalog pc ON pr.product_id = pc.product_id
WHERE pc.category = 'Clothing';

Now, we'll print the result to a dataframe for easier reading.

SELECT CLOTHING_ISSUES
FROM COMMON_ISSUE_SQL
LIMIT 1;

Step 5:  Productionalize the pipeline

With the issues suggested through the AI_AGG function pipeline above, we can now leverage AI_CLASSIFY to turn into continuous data pipeline to keep classify the reviews.

WITH clothing_issue_reviews AS (
  SELECT
    pr.review_id,
    pr.review_text
  FROM filtered_product_reviews pr
  JOIN product_catalog pc
    ON pr.product_id = pc.product_id
  WHERE
    pc.category = 'Clothing'
), classified_reviews AS (
  SELECT
    review_id,
    review_text,
    AI_CLASSIFY(
      review_text,
      [
        'Sizing issue',
        'Color issue',
        'Fabric quality issue',
        'Washing problem',
        'Pricing issue'
      ]
    ) AS classification
  FROM clothing_issue_reviews
)
SELECT
  review_id,
  review_text,
  classification:labels::text AS issue_category
FROM classified_reviews;

Step 6:  Generate responses to customer complaints

Finally, let's close the loop. You can use AI_COMPLETE to help your support team draft empathetic and relevant responses to negative reviews, improving customer satisfaction at scale.

WITH clothing_issue_reviews AS (

  SELECT 
    pr.review_id,
    pr.review_text
  FROM filtered_product_reviews pr
  JOIN product_catalog pc ON pr.product_id = pc.product_id
  WHERE pc.category = 'Clothing'
)
SELECT
    review_id,
    review_text,
    AI_COMPLETE('llama4-maverick', 'Please draft a concise response to the customer complaints below. Please only include the draft and nothing else: ' || review_text) as response
  FROM clothing_issue_reviews

Key Takeaways

  • End-to-End Workflow: You can chain Cortex AI functions together (SENTIMENT -> AI_AGG -> AI_CLASSIFY -> AI_COMPLETE) to build a powerful analysis pipeline entirely within Snowflake.

  • Insight from Unstructured Data: You don't need complex data science tools to extract valuable insights from text. All of this was done with familiar SQL.

  • Automate and Scale: By identifying common issues and creating classifiers, you can automate the process of tracking feedback and responding to customers more efficiently.

SNOWFLAKE CERTIFIED SOLUTION

This solution was created by an in-house Snowflake expert and has been verified to work with current Snowflake instances as of the date of publication.

Solution not working as expected? Contact our team for assistance.

SHARE SOLUTION

what’s next?

Explore more developer content and build your skills.