Reusing functions and nodes¶

A common question on Slack: “I want to run the same logic for several regions / datasets / model variants – what is the Hamilton way?” Hamilton has four answers, and the right one depends on how the variation is shaped.

This page walks through them in order from simplest to most advanced:

  1. Reuse a function module across multiple Drivers – the data is what varies, the dataflow is the same.

  2. Override a module with another that has the same function names – one or two specific functions need to be swapped (e.g. for testing or for a different runtime context).

  3. Use @subdag – you want the same transformation graph evaluated several times inside one Driver, with different inputs or config.

  4. Use @parameterized_subdag – the variation is large enough that writing one @subdag per case becomes tedious. (Advanced.)

Most code samples below are taken from runnable examples in the examples folder, so you can copy them, run them locally, and adapt them. The one exception is the section 4 sample, which is quoted from the decorator's source documentation.

1. Reuse a function module across multiple Drivers¶

If the dataflow is the same and only the data changes, you do not need any decorator – you just import the function module and build a Driver wherever you need one. This is the most common form of reuse and the one to reach for first.

The feature_engineering_multiple_contexts example shows this pattern across an offline ETL and an online FastAPI service: features.py is written once, then driven from two contexts. The offline ETL builds a Driver and executes it on a batch of rows; the online server builds another Driver from the same module and executes it per-request.

When to reach for this pattern:

  • The same feature definitions need to run in batch and in a request handler.

  • You want to share code between training and inference.

  • You want different teams to consume the same canonical module with their own inputs.

What you do not need: any Hamilton-specific decorator. The reuse is just ordinary Python imports plus building a Driver per context.

2. Override a module to swap same-named functions¶

Sometimes you want most of a dataflow to stay the same and only swap one or two functions – for example, replacing a real data loader with a mock one in tests, or switching between two implementations of the same business rule.

By default Hamilton refuses to build a DAG when two modules define functions with the same name, because the resulting graph would be ambiguous. The module_overrides example shows how to opt in to a “later wins” rule with Builder.allow_module_overrides():

examples/module_overrides/module_a.py¶
def foo() -> str:
    return "This is module a."
examples/module_overrides/module_b.py¶
def foo() -> str:
    return "This is module b."
examples/module_overrides/run.py¶
import module_a
import module_b

from hamilton import driver

if __name__ == "__main__":
    dr = (
        driver.Builder()
        .with_modules(
            module_a,
            module_b,
        )
        .allow_module_overrides()
        .build()
    )

    print("builder: ", dr.execute(inputs={}, final_vars=["foo"]))

When allow_module_overrides() is set, the function from the module listed later in with_modules() wins, so in the example above foo resolves to "This is module b.".

When to reach for this pattern:

  • You have a stable dataflow but want a small, well-named seam for swapping in a test double, a mock data source, or an environment-specific function.

  • You want the swap to be visible in the Driver-construction code, rather than buried inside a function or a config flag.

When not to reach for this pattern:

  • If many functions need to vary, prefer keeping the variations in distinct modules and choosing which one to import. Module overrides are best as a surgical tool.

3. @subdag – repeat the same transform inside one Driver¶

Sometimes you want the same transformation graph evaluated several times inside the same DAG, each time with a different input or configuration – for example, computing unique-user counts at daily / weekly / monthly grains across two regions.

The @subdag decorator from hamilton.function_modifiers does this declaratively. From the source documentation:

@subdag enables you to rerun components of your DAG with varying parameters. That is, it enables you to “chain” what you could express with a Driver into a single DAG.

The reusing_functions example computes unique_users for two regions and three time grains. The shared logic lives in unique_users.py:

examples/reusing_functions/unique_users.py¶
import pandas as pd

_grain_mapping = {"day": "D", "week": "W", "month": "M"}


def _validate_grain(grain: str):
    assert grain in ["day", "week", "month"]


def filtered_interactions(website_interactions: pd.DataFrame, region: str) -> pd.DataFrame:
    return website_interactions[website_interactions.region == region]


def unique_users(filtered_interactions: pd.DataFrame, grain: str) -> pd.Series:
    """Counts the distinct users seen in each period of the given grain."""
    _validate_grain(grain)
    return filtered_interactions.resample(_grain_mapping[grain])["user_id"].nunique()

Then in reusable_subdags.py, each @subdag declaration creates one named instance of that subgraph, with its own inputs and config:

One @subdag invocation from examples/reusing_functions/reusable_subdags.py¶
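import pandas as pd
import unique_users

from hamilton.function_modifiers import subdag, value


@subdag(
    unique_users,  # the module that holds the shared logic
    inputs={"grain": value("day")},
    config={"region": "US"},
)
def daily_unique_users_US(unique_users: pd.Series) -> pd.Series:
    return unique_users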

In each declaration:

  • The decorated function takes the output of its sub-DAG as its argument. Above, the sub-DAG ends in unique_users, so the wrapping function receives unique_users: pd.Series and returns it (perhaps after post-processing).

  • inputs={"grain": value("day")} binds the sub-DAG input grain to the literal "day" for this instance only.

  • config={"region": "US"} sets the configuration key region to "US" for this sub-DAG only. Here that value feeds the region parameter of filtered_interactions; the same mechanism would also drive any @config.when selection inside the sub-DAG.

The same module then defines five more analogous functions (the weekly and monthly US variants and the three CA variants), giving six result nodes, one per region and grain, that all reuse the same underlying definitions.

Two parameters worth knowing:

  • namespace – a string prefix for the nodes that @subdag materialises. By default Hamilton uses the wrapping function’s name, which is normally what you want.

  • external_inputs – declare any function parameter that comes from outside the sub-DAG (e.g. from the surrounding DAG). This makes the boundary between the sub-DAG and its surroundings explicit.

When to reach for this pattern:

  • You want one Driver, one visualised DAG, and one execute call to produce all the variants – rather than a Python for loop over many Drivers in your application code.

  • You want lineage and execution metadata for every variant captured by Hamilton, not by a wrapper script.

4. @parameterized_subdag – many subdags at once (advanced)¶

If you have many subdags that differ only along a small number of parameters, writing one @subdag declaration per case becomes verbose. @parameterized_subdag is syntactic sugar that produces several subdags from a single decorator – analogous to how @parameterize produces several nodes from one function.

From the source documentation:

@parameterized_subdag(
    feature_modules,
    from_datasource_1={"inputs": {"data": value("datasource_1.csv")}},
    from_datasource_2={"inputs": {"data": value("datasource_2.csv")}},
    from_datasource_3={
        "inputs": {"data": value("datasource_3.csv")},
        "config": {"filter": "only_even_client_ids"},
    },
)
def feature_engineering(feature_df: pd.DataFrame) -> pd.DataFrame:
    return feature_df

Each keyword argument after feature_modules becomes one subdag, named after its keyword. All of them are built from the same feature_modules but with different inputs / config.

The Hamilton source itself includes a deliberate warning on this decorator:

Think about whether this feature is really the one you want – often times, verbose, static DAGs are far more readable than very concise, highly parameterized DAGs.

In practice: prefer the explicit form from section 3 until the repetition genuinely hurts. Reach for @parameterized_subdag when the parameter list comes from elsewhere (e.g. a config file resolved with @resolve) or when you have a dozen-plus near-identical subdags.

The full reference for both decorators lives at:

Choosing between the four patterns¶

A short decision tree:

  • The data varies, the code does not → just build another Driver from the same module (section 1).

  • One or two named functions need to be swapped → put the swaps in another module and use allow_module_overrides() (section 2).

  • You want N copies of the same transform graph in one DAG → use @subdag (section 3).

  • You have many copies and the parameter list is itself data → consider @parameterized_subdag (section 4).

In practice, most production Hamilton projects rely heavily on (1), use (2) sparingly for testing seams, reach for (3) when modeling per-segment or per-grain pipelines, and treat (4) as an advanced tool.

Where to go from here¶