dataloader

Reference Documentation

class hamilton.function_modifiers.dataloader

Decorator for specifying a data loading function within the Hamilton framework. This decorator is used to annotate functions that load data, allowing them to be treated specially in the Hamilton DAG (Directed Acyclic Graph). The decorated function should return a tuple containing the loaded data and a dictionary of metadata about the loading process.

The dataloader decorator captures the loading metadata and checks that the function’s return type is annotated as a tuple whose first element is the loaded data and whose second element is a dictionary of metadata about the loading process.

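Downstream functions need only depend on the type of the loaded data, not on the tuple: they receive the first element (the data) and never handle the metadata dictionary.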

Example Usage:

Assuming you have a function that loads data from a JSON file and you want to expose the metadata in your Hamilton DAG to be captured in the Hamilton UI / adapters:

import pandas as pd
from hamilton.function_modifiers import dataloader


@dataloader()  # the parentheses are required
def load_json_data(json_path: str = "data/my_data.json") -> tuple[pd.DataFrame, dict]:
    '''Loads a dataframe from a JSON file.

    :return: A tuple containing two elements:
        - The first element is the loaded JSON data as a dataframe.
        - The second element is a dictionary of metadata about the loading process.
    '''
    # Load the data
    data = pd.read_json(json_path)

    # Metadata about the loading process
    metadata = {"source": json_path, "format": "json"}

    return data, metadata
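As a minimal sketch of the downstream side, a function that consumes the loaded data can declare a parameter named after the loader and annotate it with the data type only. The function `row_count` below is hypothetical and not part of Hamilton; it assumes it lives in the same module as `load_json_data`.

```python
import pandas as pd


def row_count(load_json_data: pd.DataFrame) -> int:
    """Counts the rows of the loaded dataframe (hypothetical downstream function)."""
    # Only the dataframe arrives here; the metadata dictionary is not passed along.
    return len(load_json_data)
```

Because the parameter is typed as `pd.DataFrame` rather than as a tuple, the downstream function is unaffected by whatever metadata the loader chooses to return.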

generate_nodes(fn: Callable, config) → List[Node]

Generates two nodes and tags them appropriately.

The first node is fn itself, registered under a slightly different name; it returns the full tuple. The second node uses the original function name but returns only the first element of the tuple produced by the first node.
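The two-node split can be illustrated in plain Python. This is a conceptual sketch only, not Hamilton's implementation: the helper `split_loader`, the `.loader` name suffix, and the dictionary standing in for a list of nodes are all assumptions made for illustration, and tagging is left out.

```python
from typing import Any, Callable, Dict, Tuple


def split_loader(fn: Callable[..., Tuple[Any, dict]]) -> Dict[str, Callable[..., Any]]:
    """Returns two callables that mimic the two generated nodes."""
    # Hypothetical naming scheme for the first node (the full loader).
    loader_name = f"{fn.__name__}.loader"

    def data_only(loaded: Tuple[Any, dict]) -> Any:
        # Second node: keep only the first element of the loader's tuple.
        return loaded[0]

    return {loader_name: fn, fn.__name__: data_only}


def load_numbers() -> Tuple[list, dict]:
    """Toy loader returning data plus metadata."""
    return [1, 2, 3], {"source": "in-memory"}


nodes = split_loader(load_numbers)
loaded = nodes["load_numbers.loader"]()  # full (data, metadata) tuple
data = nodes["load_numbers"](loaded)     # data only
```

The metadata remains available on the first node for adapters that want it, while anything depending on `load_numbers` sees just the data.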

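Parameters:

fn – the decorated data-loading function.
config – the configuration passed in when the nodes are generated.

Returns:

A list containing the two generated nodes.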

validate(fn: Callable)

Validates that the output type is correctly annotated, that is, as a tuple whose second element is a dictionary.
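A check of this kind can be sketched with the standard library alone. The helper `check_loader_annotation` and its error messages are hypothetical; Hamilton's own validation may differ in what it accepts and in the exception it raises.

```python
import inspect
from typing import Callable, Tuple, get_args, get_origin


def check_loader_annotation(fn: Callable) -> None:
    """Raises ValueError unless fn is annotated to return a (data, dict) tuple."""
    annotation = inspect.signature(fn).return_annotation
    if annotation is inspect.Signature.empty:
        raise ValueError(f"{fn.__name__} has no return annotation")
    # Accepts both typing.Tuple[...] and the built-in tuple[...] form.
    if get_origin(annotation) is not tuple:
        raise ValueError(f"{fn.__name__} must be annotated to return a tuple")
    args = get_args(annotation)
    # The second element must be dict, Dict, or a parameterized Dict[...].
    if len(args) != 2 or not (args[1] is dict or get_origin(args[1]) is dict):
        raise ValueError(f"{fn.__name__} must return a 2-tuple ending in a dict")


def good_loader() -> Tuple[list, dict]:
    return [], {}


def bad_loader() -> list:
    return []


check_loader_annotation(good_loader)  # passes silently
```

Calling `check_loader_annotation(bad_loader)` raises `ValueError`, since `list` is not a tuple annotation.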