Reusing functions and nodes¶
A common question on Slack: “I want to run the same logic for several regions / datasets / model variants – what is the Hamilton way?” Hamilton has four answers, and the right one depends on how the variation is shaped.
This page walks through them in order from simplest to most advanced:
Reuse a function module across multiple Drivers – the data is what varies, the dataflow is the same.
Override a module with another that has the same function names – one or two specific functions need to be swapped (e.g. for testing or for a different runtime context).
Use @subdag – you want the same transformation graph evaluated several times inside one Driver, with different inputs or config.
Use @parameterized_subdag – the variation is large enough that writing one @subdag per case becomes tedious. (Advanced.)
Every code sample below is taken from a runnable example in the examples folder, so you can copy any of them, run them locally, and adapt them.
1. Reuse a function module across multiple Drivers¶
If the dataflow is the same and only the data changes, you do not need any decorator – you just import the function module and build a Driver wherever you need one. This is the most common form of reuse and the one to reach for first.
The
feature_engineering_multiple_contexts
example shows this pattern across an offline ETL and an online FastAPI
service: features.py is written once, then driven from two contexts.
The offline ETL builds a Driver and executes it on a batch of rows; the
online server builds another Driver from the same module and executes it
per-request.
When to reach for this pattern:
The same feature definitions need to run in batch and in a request handler.
You want to share code between training and inference.
You want different teams to consume the same canonical module with their own inputs.
What you do not need: any Hamilton-specific decorator. The reuse is just ordinary Python imports plus building a Driver per context.
2. Override a module to swap same-named functions¶
Sometimes you want most of a dataflow to stay the same and only swap one or two functions – for example, replacing a real data loader with a mock one in tests, or switching between two implementations of the same business rule.
By default Hamilton refuses to build a DAG when two modules define
functions with the same name, because the resulting graph would be
ambiguous. The
module_overrides
example shows how to opt in to a “later wins” rule with
Builder.allow_module_overrides():
examples/module_overrides/module_a.py¶

```python
def foo() -> str:
    return "This is module a."
```

examples/module_overrides/module_b.py¶

```python
def foo() -> str:
    return "This is module b."
```
examples/module_overrides/run.py¶

```python
import module_a
import module_b

from hamilton import driver

if __name__ == "__main__":
    dr = (
        driver.Builder()
        .with_modules(
            module_a,
            module_b,
        )
        .allow_module_overrides()
        .build()
    )
    print("builder: ", dr.execute(inputs={}, final_vars=["foo"]))
```
When allow_module_overrides() is set, the function from the
later-imported module wins, so the example above prints
"This is module b.".
When to reach for this pattern:
You have a stable dataflow but want a small, well-named seam for swapping in a test double, a mock data source, or an environment-specific function.
You want the swap to be visible in the Driver-construction code, rather than buried inside a function or a config flag.
When not to reach for this pattern:
If many functions need to vary, prefer keeping the variations in distinct modules and choosing which one to import. Module overrides are best as a surgical tool.
3. @subdag – repeat the same transform inside one Driver¶
Sometimes you want the same transformation graph evaluated several times inside the same DAG, each time with a different input or configuration – for example, computing unique-user counts at daily / weekly / monthly grains across two regions.
The @subdag decorator from hamilton.function_modifiers does this
declaratively. From the source documentation:
@subdag enables you to rerun components of your DAG with varying parameters. That is, it enables you to “chain” what you could express with a Driver into a single DAG.
The
reusing_functions
example computes unique_users for two regions and three time grains.
The shared logic lives in unique_users.py:
examples/reusing_functions/unique_users.py¶

```python
import pandas as pd

_grain_mapping = {"day": "D", "week": "W", "month": "M"}


def _validate_grain(grain: str):
    assert grain in ["day", "week", "month"]


def filtered_interactions(website_interactions: pd.DataFrame, region: str) -> pd.DataFrame:
    return website_interactions[website_interactions.region == region]


def unique_users(filtered_interactions: pd.DataFrame, grain: str) -> pd.Series:
    """Counts unique users at the requested time grain."""
    _validate_grain(grain)
    return filtered_interactions.resample(_grain_mapping[grain])["user_id"].nunique()
```
Then in reusable_subdags.py, each @subdag declaration creates one
named instance of that subgraph, with its own inputs and config:
examples/reusing_functions/reusable_subdags.py¶

```python
@subdag(
    unique_users,
    inputs={"grain": value("day")},
    config={"region": "US"},
)
def daily_unique_users_US(unique_users: pd.Series) -> pd.Series:
    return unique_users
```
Each decorated function:
Takes the output of its sub-DAG as its argument. Above, the sub-DAG ends in unique_users, so the wrapping function receives unique_users: pd.Series and returns it (perhaps after post-processing).
Receives inputs={"grain": value("day")} – this binds the sub-DAG input grain to the literal "day" for this instance only.
Receives config={"region": "US"} – this scopes Hamilton’s @config.when selection to "US" for this sub-DAG.
The same module then defines five more analogous functions (weekly_*,
monthly_*, the CA variants), giving twelve nodes that all reuse the
same underlying definitions.
Two parameters worth knowing:
namespace – a string prefix for the nodes that @subdag materialises. By default Hamilton uses the wrapping function’s name, which is normally what you want.
external_inputs – declare any function parameter that comes from outside the sub-DAG (e.g. from the surrounding DAG). This makes the boundary between the sub-DAG and its surroundings explicit.
When to reach for this pattern:
You want one Driver, one visualised DAG, and one execute call to produce all the variants – rather than a Python for loop over many Drivers in your application code.
You want lineage and execution metadata for every variant captured by Hamilton, not by a wrapper script.
4. @parameterized_subdag – many subdags at once (advanced)¶
If you have many subdags that differ only along a small number of
parameters, writing one @subdag declaration per case becomes verbose.
@parameterized_subdag is syntactic sugar that produces several subdags
from a single decorator – analogous to how @parameterize produces
several nodes from one function.
From the source documentation:
```python
@parameterized_subdag(
    feature_modules,
    from_datasource_1={"inputs": {"data": value("datasource_1.csv")}},
    from_datasource_2={"inputs": {"data": value("datasource_2.csv")}},
    from_datasource_3={
        "inputs": {"data": value("datasource_3.csv")},
        "config": {"filter": "only_even_client_ids"},
    },
)
def feature_engineering(feature_df: pd.DataFrame) -> pd.DataFrame:
    return feature_df
```
Each entry below the decorator becomes one subdag, all built from the same
feature_modules but with different inputs / config.
The Hamilton source itself includes a deliberate warning on this decorator:
Think about whether this feature is really the one you want – often times, verbose, static DAGs are far more readable than very concise, highly parameterized DAGs.
In practice: prefer the explicit form from section 3 until the repetition
genuinely hurts. Reach for @parameterized_subdag when the parameter
list comes from elsewhere (e.g. a config file resolved with @resolve)
or when you have a dozen-plus near-identical subdags.
The full reference for both decorators lives in Hamilton’s decorator API documentation.
Choosing between the four patterns¶
A short decision tree:
The data varies, the code does not → just build another Driver from the same module (section 1).
One or two named functions need to be swapped → put the swaps in another module and use allow_module_overrides() (section 2).
You want N copies of the same transform graph in one DAG → use @subdag (section 3).
You have many copies and the parameter list is itself data → consider @parameterized_subdag (section 4).
In practice, most production Hamilton projects rely heavily on (1), use (2) sparingly for testing seams, reach for (3) when modeling per-segment or per-grain pipelines, and treat (4) as an advanced tool.
Where to go from here¶
Walk through the runnable examples linked above: feature_engineering_multiple_contexts, module_overrides, and reusing_functions.
Read Code Organization for the module layout that makes these patterns natural.
For an end-to-end deep-dive on subdags and reuse, see the Hamilton March 2024 Meetup tutorial notebook.