Event Analysis

Hashquery provides out-of-the-box support for event analytics such as funnels and retention curves.

Modeling event analytics

We recommend defining a model’s activity schema using Model.with_activity_schema(). This attaches semantic information to your model describing how event analytics functions should be applied to it. It requires:

  • group – This splits events by the actor that performs them, such as user_id or customer_id.

  • timestamp – When the event was detected, such as created_at or timestamp.

  • event_key – Typically this is a column like event_name or event_type denoting the kind of event that occurred.
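
For example, attaching an activity schema to an existing events model might look like the minimal sketch below. The column names user_id, created_at, and event_name are placeholders for your own columns, and the group and timestamp keyword names are assumed to mirror the requirement names above.

events = events.with_activity_schema(
  # placeholder column references -- substitute the columns from your own table
  group=attr.user_id,
  timestamp=attr.created_at,
  event_key=attr.event_name,
)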

Funnels

The simplest event analytics primitive is Model.funnel(). Funnel takes a list of events and returns a summary table indicating how many distinct actors (split by activity_schema.group) reached each stage of the funnel.

events.funnel(
  # replace these with the events sequence you care about
  top_of_funnel="users",
  steps=["ad_impression", "visit", "purchase"],
)

This may result in a table like the following:

| step | count |
| --- | --- |
| users | 30,040 |
| ad_impression | 27,028 |
| visit | 8,794 |
| purchase | 2,341 |

This means 30,040 distinct users exist in our dataset, and 27,028 of them were served at least one ad. Of those users, 8,794 went to the website and then 2,341 made at least one purchase.

Note that if a user purchased something, but didn’t see an ad before their purchase, they wouldn’t be included in the funnel. They would only be counted in the users row. Users must match every preceding step, in order, to be counted in a given step of the funnel. For this reason, funnel counts never increase as you move down the table.

Comparing Funnels

Let’s imagine we want to compare how well one funnel performs compared to another. For example, we may be running two ad campaigns at once and be curious which one is driving more conversions.

Let’s call the event of seeing the first ad ad_impression_a and the event of seeing the second ad ad_impression_b. If your events table does not capture this distinction in a single column, you might form the event key with an expression like the following, which concatenates the event name with the user’s A/B test group:

events = events.with_activity_schema(
  # ...existing group and timestamp configuration
  event_key=func.cases(
    ((attr.event_name == "ad_impression"), attr.event_name + "_" + rel.user.ab_test_group),
    other=attr.event_name,
  )
)

Once we have our events set up, let’s define two funnels:

funnel_a = events.funnel(["ad_impression_a", "visit", "purchase"])
funnel_b = events.funnel(["ad_impression_b", "visit", "purchase"])

and then we’ll join them together to form a new table:

funnels = (
  funnel_a.with_join_one(
    funnel_b,
    named="funnel_b",
    # join rows with the same step name, treating the two ad impression
    # variants as the same step so their counts line up side by side
    condition=func.or_(
      attr.step == rel.funnel_b.step,
      func.and_(
        attr.step == "ad_impression_a",
        rel.funnel_b.step == "ad_impression_b",
      )
    )
  )
  # keep the step name and the two counts, renamed for readability
  .pick(
    attr.step,
    attr.count.named("A funnel"),
    rel.funnel_b.count.named("B funnel"),
  )
)

| step | A funnel | B funnel |
| --- | --- | --- |
| count | 15,020 | 15,020 |
| ad_impression | 14,308 | 12,720 |
| visit | 2,594 | 6,200 |
| purchase | 2,101 | 240 |

Looks like ad B may have been more evocative, as more people clicked through to the website, but it led to fewer purchases overall. Maybe the ad misinformed users? Meanwhile, while A didn’t capture as many clicks, it led to a lot of purchases. Perhaps we could make a new ad that has the benefits of both. Time to contact marketing.

Retention/Survival Curves

In the future, Hashquery will provide Model.retention to build retention and survival curves quickly. For now, you will need to use .match_steps to form your own retention curves.
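
As a rough sketch of that approach (using .match_steps, described in the next section), you might match a hypothetical signup event against a later return_visit event and compute per-user retention signals, which you could then bucket and aggregate into a curve. The event names here are placeholders, and attr.group is assumed to refer to the grouping column of the matched table:

per_user_retention = events.match_steps(["signup", "return_visit"]).pick(
  attr.group,  # assumed to be the actor column of the matched table
  # how long each actor took to come back; NULL if they never returned
  (rel.return_visit.timestamp - rel.signup.timestamp).named("time_to_return"),
  # whether the actor returned at all
  (rel.return_visit.timestamp != None).named("retained"),
)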

Advanced: .match_steps

Model.match_steps() is the core building block that other analytics functions use. It allows you to efficiently collect and join together an ordered list of events for a user.

Similar to .funnel, we pass it an ordered list of steps to match:

events.match_steps([
  "ad_impression",
  "visit",
  "purchase",
])

Unlike .funnel, which aggregates into a summary table, .match_steps returns a table with one row per activity_schema.group. Each row has columns representing each event matched by the analysis, modeled as a JOIN to the original events table. Effectively this creates a series of “first after” temporal joins.

That’s a little hard to picture, so let’s draw it out.

Imagine our events table looks like the following, containing a single user. This table now includes all the properties of our events, not just the ones we added to .with_activity_schema.

| group | timestamp | event_name | value | event_id |
| --- | --- | --- | --- | --- |
| user_a | 2024-01-01 | ad_impression | -$0.04 | a1 |
| user_a | 2024-01-02 | visit | NULL | b2 |
| user_a | 2024-01-03 | visit | NULL | c3 |
| user_a | 2024-01-04 | purchase | +$40 | d4 |
| user_a | 2024-01-05 | ad_impression | -$0.05 | e5 |

Now we run .match_steps(["ad_impression", "purchase"]). For each user, this will construct one row. Let’s look at the user_a row:

| group | ad_impression.timestamp | ad_impression.value | ad_impression.event_id | purchase.timestamp | purchase.value | purchase.event_id |
| --- | --- | --- | --- | --- | --- | --- |
| user_a | 2024-01-01 | -$0.04 | a1 | 2024-01-04 | +$40 | d4 |

As you can see, we selected the first matching step for ad_impression and joined that event in, and then selected the first matching purchase event after that and joined that in. We can access each event’s data through rel.ad_impression and rel.purchase respectively.

For a user who doesn’t match all the steps, that relation’s data will contain all NULLs, similar to an unmatched LEFT OUTER JOIN in SQL.

This table is sometimes not that useful on its own, but you can perform lots of neat aggregations on top of it. For example, we could compute the time between ad and purchase (rel.purchase.timestamp - rel.ad_impression.timestamp), or look for correlations between how much we paid for an ad (rel.ad_impression.value) and whether the user converted (rel.purchase.timestamp != None).
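
As a sketch of those follow-up analyses, using the same .pick call shown earlier (the labels passed to .named() are just illustrative names):

matched = events.match_steps(["ad_impression", "visit", "purchase"])
per_user = matched.pick(
  # time from the first ad to the purchase that followed it
  (rel.purchase.timestamp - rel.ad_impression.timestamp).named("time_to_purchase"),
  # what we paid for that ad
  rel.ad_impression.value.named("ad_cost"),
  # whether the actor converted at all
  (rel.purchase.timestamp != None).named("converted"),
)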