Time series data is not always perfectly aligned. Timestamps may be slightly off, or there may be a delay between cause and effect. This can make connecting two sets of ordered data challenging. ASOF joins are a tool for solving this and other similar problems.
One of the problems that ASOF joins are used to solve is finding the value of a varying property at a specific point in time.
More generally, however, ASOF joins embody some common temporal analytic semantics, which can be cumbersome and slow to implement in standard SQL.
Let’s start with a Stock Portofolio example.
We have a table stock_prices of stock prices with timestamps:
| SYMBOL | PRICE | STOCK_PRICE_ASOF |
|---|---|---|
| TSLA | 9.93 | 2024-02-02 14:29:10.753 |
| TSLA | 9.62 | 2024-02-02 14:29:52.806 |
| TSLA | 9.61 | 2024-02-02 14:30:43.379 |
| TSLA | 9.93 | 2024-02-02 14:32:35.397 |
| TSLA | 9.68 | 2024-02-02 14:41:38.263 |
| TSLA | 9.05 | 2024-02-02 14:45:00.480 |
| TSLA | 9.06 | 2024-02-02 14:45:32.556 |
| TSLA | 9.88 | 2024-02-02 15:04:29.979 |
| TSLA | 9.72 | 2024-02-02 15:06:40.404 |
| TSLA | 9.44 | 2024-02-02 15:23:07.893 |
| TSLA | 9.13 | 2024-02-02 15:23:21.083 |
We have another table stock_portfolio containing portfolio holdings at various points in time:
| SYMBOL | NUMBER_OF_SHARES | TRANSACTION_TIMESTAMP |
|---|---|---|
| TSLA | 100.00 | 2024-02-02 14:29:20.264 |
| TSLA | 77.00 | 2024-02-02 14:30:51.040 |
| TSLA | 56.00 | 2024-02-02 14:45:41.192 |
| TSLA | 86.00 | 2024-02-02 15:06:48.475 |
| TSLA | 60.00 | 2024-02-02 15:23:12.587 |
Note that the
transaction_timestampfromstock_portfoliodoes not align with thestock_price_asofin thestock_pricestable.
We can compute the value of each holding at that point in time by finding the most recent price before the holding’s timestamp by using an ASOF join.
select
stock_portfolio.symbol
, price as stock_price
, number_of_shares
, transaction_timestamp
, stock_price_asof
, price * number_of_shares as value
from stock_portfolio
asof join stock_prices
match_condition (stock_portfolio.transaction_timestamp >= stock_prices.stock_price_asof)
on stock_prices.symbol = stock_portfolio.symbol ;
This attaches the value of the holding at that time to each row.
| SYMBOL | STOCK_PRICE | NUMBER_OF_SHARES | TRANSACTION_TIMESTAMP | STOCK_PRICE_ASOF | VALUE |
|---|---|---|---|---|---|
| TSLA | 9.93 | 100.00 | 2024-02-02 14:29:20.264 | 2024-02-02 14:29:10.753 | 993.0000 |
| TSLA | 9.61 | 77.00 | 2024-02-02 14:30:51.040 | 2024-02-02 14:30:43.379 | 739.9700 |
| TSLA | 9.06 | 56.00 | 2024-02-02 14:45:41.192 | 2024-02-02 14:45:32.556 | 507.3600 |
| TSLA | 9.72 | 86.00 | 2024-02-02 15:06:48.475 | 2024-02-02 15:06:40.404 | 835.9200 |
| TSLA | 9.44 | 60.00 | 2024-02-02 15:23:12.587 | 2024-02-02 15:23:07.893 | 566.4000 |
It essentially executes a function defined by looking up nearby values in the stock_prices table.
Standard SQL can implement this kind of join, but you need to use an effectivity aka state table using a windowing function as following:
with stock_price_effectivity as (
select
symbol
, stock_price_asof as start_time
, lead (stock_price_asof) over (partition by symbol order by start_time asc) as end_time
, price
from stock_prices
)
select
stock_portfolio.symbol
, price
, number_of_shares * price as value
from state
inner join stock_portfolio on state.symbol = stock_portfolio.symbol
and stock_portfolio.transaction_timestamp between start_time and end_time;
Note that we are first calculating the
effectivityorstateof stock price at each of the interval using thestock_price_asoftimestamp. This is an expensive operation.
Here is what the output of the stock_price_effectivity will look like:
| SYMBOL | START_TIME | END_TIME | PRICE |
|---|---|---|---|
| TSLA | 2024-02-02 14:29:10.753 | 2024-02-02 14:29:52.806 | 9.93 |
| TSLA | 2024-02-02 14:29:52.806 | 2024-02-02 14:30:43.379 | 9.62 |
| TSLA | 2024-02-02 14:30:43.379 | 2024-02-02 14:32:35.397 | 9.61 |
| TSLA | 2024-02-02 14:32:35.397 | 2024-02-02 14:41:38.263 | 9.93 |
| TSLA | 2024-02-02 14:41:38.263 | 2024-02-02 14:45:00.480 | 9.68 |
| TSLA | 2024-02-02 14:45:00.480 | 2024-02-02 14:45:32.556 | 9.05 |
| TSLA | 2024-02-02 14:45:32.556 | 2024-02-02 15:04:29.979 | 9.06 |
| TSLA | 2024-02-02 15:04:29.979 | 2024-02-02 15:06:40.404 | 9.88 |
| TSLA | 2024-02-02 15:06:40.404 | 2024-02-02 15:23:07.893 | 9.72 |
| TSLA | 2024-02-02 15:23:07.893 | 2024-02-02 15:23:21.083 | 9.44 |
| TSLA | 2024-02-02 15:23:21.083 | 9.13 |