Query Optimization in Snowflake (QOSF)

Skyline Query to identify problematic queries in Snowflake

Selecting the correct Warehouse size in Snowflake can be tricky. If your queries are spilling to Remote Storage, then you either need to rewrite the query or increase the Warehouse size. But how do you identify the problematic queries that are running on an undersized Warehouse? Identifying these queries is a Maximum Vector problem that can be solved using a Skyline query. A Skyline query finds the points in a vector space that are not dominated by any other point. Here is how a Skyline query can be used to identify problematic Snowflake queries.
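
To make the dominance test concrete, here is a minimal, self-contained sketch over a handful of hard-coded (spill, warehouse size) points. The values and the column names spilled_gb and warehouse_size are made up for illustration. A point survives only when no other point spills at least as much on a warehouse of the same size or smaller.

-- Toy example with hypothetical values, runnable on its own
with points (spilled_gb, warehouse_size) as (
  select * from values (4, 1), (2, 1), (3, 2), (5, 4)
)
select spilled_gb, warehouse_size
from points as o
where not exists (
  select 1
  from points as i
  where i.warehouse_size <= o.warehouse_size
    and i.spilled_gb >= o.spilled_gb
    and (i.warehouse_size < o.warehouse_size or i.spilled_gb > o.spilled_gb)
);
-- Returns (4, 1) and (5, 4); (2, 1) and (3, 2) are dominated by (4, 1).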


Skyline Query

with query_history as (
  select
    query_text
    , bytes_spilled_to_remote_storage
    -- rank Warehouse sizes numerically so they can be compared in the skyline;
    -- sizes above X-Large (2X-Large and up) would need additional branches
    , decode(warehouse_size, 'X-Small', 1, 'Small', 2, 'Medium', 3, 'Large', 4, 'X-Large', 5) as warehouse_size
  from snowflake.account_usage.query_history
  where warehouse_size is not null and bytes_spilled_to_remote_storage > 1000
  order by start_time desc
  limit 10000 -- look at the most recent 10,000 spilling queries
)
select
  query_text
  , bytes_spilled_to_remote_storage
  , warehouse_size
from query_history as o
where not exists ( -- Skyline: keep only the worst performing queries, i.e. those not dominated by any other query
  select 1
  from query_history as i
  where i.warehouse_size <= o.warehouse_size
    and i.bytes_spilled_to_remote_storage >= o.bytes_spilled_to_remote_storage
    and (i.warehouse_size < o.warehouse_size or i.bytes_spilled_to_remote_storage > o.bytes_spilled_to_remote_storage)
);
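
Reading the result: each row returned is a query for which no other recent query spilled at least as much to Remote Storage while running on a Warehouse of the same size or smaller. These are the first candidates for a rewrite or a larger Warehouse.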

Query Output

QUERY_TEXT        BYTES_SPILLED_TO_REMOTE_STORAGE (GB)  WAREHOUSE_SIZE
select * from …   4                                     1
select * from …   5                                     4
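
As a follow-up, one way to decide where a Warehouse resize pays off is to aggregate remote spill per Warehouse. This is only a sketch, not part of the original query; the 7-day window and the GB rounding are arbitrary choices.

-- Hypothetical follow-up: total remote spill per Warehouse over the last 7 days.
-- Columns come from snowflake.account_usage.query_history; the window is arbitrary.
select
  warehouse_name
  , warehouse_size
  , count(*) as queries_spilling
  , round(sum(bytes_spilled_to_remote_storage) / power(1024, 3), 2) as total_spilled_gb
from snowflake.account_usage.query_history
where start_time >= dateadd(day, -7, current_timestamp())
  and bytes_spilled_to_remote_storage > 0
group by warehouse_name, warehouse_size
order by total_spilled_gb desc;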
