Models

Attention

Hashquery is currently in public preview. Here are some caveats to keep in mind for early adopters:

Backwards compatibility may not be preserved between version upgrades, and you may need to update your package regularly.
There may be differences in the SQL logic generated by hashquery compared to the Hashboard app.
You may encounter dialect-specific SQL syntax errors or other unexpected errors. If you come across one, let us know by reporting an issue.

class Model( connection=None, table=None, *, schema=None, sql_query=None, sql_query_refs: Dict[str, Model | Callable[[str], Model]] | None = None, )

with_connection( connection: Connection, ) → Model: Returns a new Model which runs inside of the provided Connection. If the model already had a connection attached, this overwrites it.

with_source( table=None, *, schema=None, sql_query=None, sql_query_refs=None, ) → Model: Returns a new Model which uses the provided content as the underlying table. If the model already had a source attached, this overwrites it.

with_attributes(

*args: ColumnExpression | str,

**kwargs: ColumnExpression | str,

) → Model: Returns a new Model with the provided column expressions included as attributes, accessible via attr.<name>. If a string is passed, it will be interpreted as a column name (aka column(str)).

with_primary_key( expression: ColumnExpression, ) → Model: Returns a new Model with the provided column expression configured as the primary key. This should be a unique value across all records in the source.

with_measures(

*args: ColumnExpression,

**kwargs: ColumnExpression,

) → Model: Returns a new Model with the provided column expressions included as measure definitions, accessible via msr.<name>. This does not perform any aggregation on its own, this only attaches a definition for later use.

Returns a new Model with a new property which can be used to reference the properties of the joined Model. Records are aligned using foreign_key and/or condition. Attributes on joined relations can be referenced with rel.<name>.<attr_name>.

Similar to with_measures and with_attributes, with_join_one has no performance cost on its own. No JOIN statement is added to queries unless the relation is actually referenced.

If no records match, NULL values are filled in for the missing columns, unless drop_unmatched=True is passed.

with_activity_schema( *, group: ColumnExpression, timestamp: ColumnExpression, event_key: ColumnExpression, ) → Model

Returns a new Model configured for event analysis.

Parameters:

group – Used to split event sequences into distinct groups. Typically this is a single attribute, representing a unique value for each actor that invokes the event, such as user_id or customer_id.
timestamp – Column used to order events. Typically this is a timestamp representing when the event was detected, such as created_at or timestamp.
event_key – A column representing the name of the event. Typically this is a column like event_name or event_type.

aggregate( *, measures: List[ColumnExpression] = None, groups: List[ColumnExpression] = None, ) → Model: Returns a new model aggregated into measures split up by groups. Analogous to SELECT *groups, *measures FROM … GROUP BY *groups.

Returns a new Model with a new property that represents the records matched to a sequence of steps, aggregated by group.

Parameters:

steps – The sequence of steps to analyze. If a string is passed, it will be matched to values in the event_key column. If a boolean column expression is passed, an event will be considered a match if it passes the condition. A string step value can be a tuple to denote (step_value, output_name) which can be used to provide unique names when matching the same step multiple times.
group – The column to group the analysis by. This is typically a unique identifier for a user/customer/etc.
timestamp – The temporal column to order the events by.
event_key – The column representing the name of the event.
partition_start_events – A list of column expressions to partition the funnel analysis by. These grouping expressions are applied only to the first event in the journey.
time_limit – The maximum time allowed between the initial step and any subsequent step. If a user takes longer than this time for a given step, future steps are not matched.

Useful for defining funnels, retention, or temporal joins.

Returns a new Model which performs a funnel analysis on the provided sequence of steps.

Parameters:

steps – The sequence of steps to analyze. If a string is passed, it will be matched to values in the event_key column. If a boolean column expression is passed, an event will be considered a match if it passes the condition. A string step value can be a tuple to denote (step_value, output_name) which can be used to provide unique names when matching the same step multiple times.
group – The column to group the funnel analysis by. This is typically a unique identifier for a user/customer/etc.
timestamp – The temporal column to order the events by.
event_key – The column representing the name of the event.
time_limit –

The maximum time allowed between the initial funnel step and any subsequent step.
If a user takes longer than this time for a given step, they are not counted in the

funnel for that step.
partition_start_events – A list of column expressions to partition the first events in the funnel analysis by. This can result in single entities being evaluated and counted multiple times in the funnel analysis – once per value of the partitions.
partition_matches – A list of column expressions to group a cohort of users together. The funnel will split the aggregated counts of each step into separate groups for each of these expressions. This can be used to further break out the entities flowing through the funnel.
top_of_funnel –
Determines where and how the funnel “starts”. If an index, the funnel begins at that step’s index. All steps will be matched, this only affects the output table. If a string, the funnel includes an extra step which represents the count of all evaluated entities.

The name will match the passed string.

The default is 0, meaning the funnel starts on the first step.

Example:

events # this is presorted only for clarity, it need not be sorted
'''
user_id     event                   timestamp
----------------------------------------------
0           ad_impression           2024-01-01
0           visit                   2024-01-02
0           purchase                2024-01-04

1           ad_impression           2024-01-01
1           visit                   2024-01-02
1           purchase                2024-01-03
1           purchase                2024-01-04

2           ad_impression           2024-01-01
2           visit                   2024-01-02

3           ad_impression           2024-01-01
3           visit                   2024-01-02

4           ad_impression           2024-01-01

5           visit                   2024-01-01
5           purchase                2024-01-02

6           other_event             2024-01-01
'''

events.funnel(
    top_of_funnel="users",
    steps=["ad_impression", "visit", "purchase"]
)
'''
step                 count
------------------------------
users                7
ad_impression        5
visit                5
purchase             2
'''
# `users` is 7 because there are 7 unique users.
# `ad_impression` is 5 because of those 7 unique users, 5 of them saw an ad:
#        This is users 0, 1, 2, 3, and 4.
#        Users 5 and 6 did not see ads, so they are not included.
# `visit` is 4 because of the 5 users who saw an ad, 4 of them went on to visit:
#        This is users 0, 1, 2, and 3.
#        User 5 visited, but not after seeing an ad, so they are not included in the funnel.
# `purchase` is 2 because of the 4 users who saw an ad, then visited, 2 of them went on to purchase:
#        This is users 0 and 1. User 1 made two purchases but is only counted once.
#        User 5 purchased, but not after seeing an ad and visiting, so they are not included in the funnel.

Returns a new Model which performs a funnel analysis on the provided sequence of steps, and computes the conversion rate.

Parameters:

steps – The sequence of steps to analyze. If a string is passed, it will be matched to values in the event_key column. If a boolean column expression is passed, an event will be considered a match if it passes the condition. A string step value can be a tuple to denote (step_value, output_name) which can be used to provide unique names when matching the same step multiple times.
group – The column to group the funnel analysis by. This is typically a unique identifier for a user/customer/etc.
timestamp – The temporal column to order the events by.
event_key – The column representing the name of the event.
time_limit –

The maximum time allowed between the initial funnel step and any subsequent step.
If a user takes longer than this time for a given step, they are not counted in the

funnel for that step.
partition_start_events – A list of column expressions to partition the first events in the funnel analysis by. This can result in single entities being evaluated and counted multiple times in the funnel analysis – once per value of the partitions.
partition_matches – A list of named boolean conditions to group a cohort of users together. The funnel will split the aggregated counts of each step into separate groups for each of these expressions. This can be used to further break out the entities flowing through the funnel.

pick( *columns: ColumnExpression, ) → Model: Returns a new Model with only the included attributes.

filter( condition: ColumnExpression, ) → Model: Returns a new Model with records filtered to only those which match the given condition.

sort( sort: ColumnExpression, dir: Literal['asc', 'desc'] = 'asc', nulls: Literal['auto', 'first', 'last'] = 'auto', ) → Model: Returns a new Model with records ordered by the provided column. Sort direction dir can be either “asc” or “desc” (defaults to “asc”). Nulls ordering nulls can be “first”, “last”, or “auto” (“first” when ascending, “last” when descending) (defaults to “auto”)

limit(count: int, *, offset: int | None = 0) → Model: Returns a new Model with only the first N records.

union_all(other: Model) → Model

Returns a new model with its records merged with another Model, using a sql UNION ALL operator. This always results in a new CTE, so any joins will be _flattened_.

The columns of each model must exactly align – if they do not, you may want to use .pick on one or both models to reduce them to a series of columns which do.

fold( ids: List[ColumnExpression], values: List[ColumnExpression], key_name: str | None = 'key', value_name: str | None = 'value', ) → Model

Transforms a N-separate column expressions into a key/value pair, where the key_name column becomes the name of the input column expression and value_name becomes the result value. All values expressions must have the same type. The ids columns are untouched.

This multiplies the count of records by len(values). This removes any attribute not specified in ids or values.

This can be used to “unpivot” or “melt” a dataset, from a wide format to a long format.

For example:

sales
'''
year        income       expenses
---------------------------------
2023        $150K        $30K
2024        $500K        $130K
'''

sales.fold(
    ids=[a.year],
    values=[a.income, a.expenses],
    key_name="type",
    value_name="value"
)
'''
year        type         value
------------------------------
2023        income       $150K
2024        income       $500K
2023        expenses     $30K
2024        expenses     $130K
'''

run( *, freshness: datetime | Literal['latest'] | None = None, print_warnings: bool = True, print_exec_stats: bool = False, ): Fetches the final table for the model, returning a RunResults structure which contains the executed sql query, the results, and additional metadata.

df( *, freshness: datetime | Literal['latest'] | None = None, print_warnings: bool = True, print_exec_stats: bool = False, ): Fetches the final table for the model as a pandas’ dataframe. This compiles and runs a query within the model’s database, and returns an object which can be used to view result rows and query metadata.

sql( *, freshness: datetime | Literal['latest'] | None = None, print_warnings: bool = True, )

Compiles the SQL that would be run if you executed this Model with run and returns it as a string. Nothing will be sent to the database.

The returned SQL string will not include parameterization, and so it may be prone to SQL injection if you were to execute it directly. If your intent is to execute this query, use .run or .df instead.

with_custom_meta(name: str, value: Any) → Model: Returns a new Model with the custom metadata attached. Hashquery will never read or write to this key, making it a good spot to put any custom configuration or encode semantic information about the Model which you want to use.

get_custom_meta(name: str): Returns a value from the custom metadata dictionary for this model, or None if the key does not exist. You can set custom metadata properties using .with_custom_meta().

class Source: Represents the underlying data table for a Model. Consumers should not interact with this class directly, instead modifying the table through methods on the model.

class RunResults(*, print_warnings: bool = True, print_exec_stats: bool = False)

Represents the records and metadata from an invocation of run.

property sql_query: str: The SQL query that was executed.

property df: DataFrame: The result records as a pandas DataFrame.

property py_records: List[Dict]: The result records as a Python list of dictionaries.

exception RunResultsError(msg: str, phase: Literal['compile', 'data'])

Indicates a problem occurred when executing a Model with run or compile_sql.

The .phase property will indicate whether this error occurred during SQL compilation or data fetching.