h_ray.RayGraphAdapter
The graph adapter to delegate execution of the individual nodes in an Apache Hamilton graph to Ray.
- class hamilton.plugins.h_ray.RayGraphAdapter(result_builder: ResultMixin, ray_init_config: Dict[str, Any] = None, shutdown_ray_on_completion: bool = False)
Class representing what’s required to make Hamilton run on Ray.
This walks the graph and translates it to run on Ray.
Use pip install sf-hamilton[ray] to get the dependencies required to run this.
Use this if:
you want to utilize multiple cores on a single machine, or you want to scale to larger data set sizes with a Ray cluster that you can connect to.
Note (1): you are still constrained by machine memory size with Ray; you can’t just scale to any dataset size.
Note (2): serialization costs can outweigh the benefits of parallelism, so you should benchmark your code to see if it’s worth it.
Notes on scaling:
Multi-core on single machine ✅
Distributed computation on a Ray cluster ✅
Scales to any size of data ⛔️; you are LIMITED by the memory on the instance/computer 💻.
Function return object types supported:
Works for any Python object that can be serialized by the Ray framework. ✅
Pandas?
⛔️ Ray DOES NOT do anything special about Pandas.
CAVEATS
Serialization costs can outweigh the benefits of parallelism, so you should benchmark your code to see if it’s worth it.
DISCLAIMER – this class is experimental, so signature changes are a possibility!
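To get a rough feel for the serialization caveat before moving a workload to Ray, the sketch below times a serialization round trip against the computation itself. It uses the standard-library `pickle` module purely as a stand-in; Ray's own serialization path differs, so treat the numbers as an order-of-magnitude hint only. The helper name `serialization_overhead` and the sample workload are hypothetical.

```python
import pickle
import time


def serialization_overhead(obj, compute, repeats=5):
    """Return (pickle round-trip seconds, compute seconds), each the best of `repeats` runs."""

    def best_of(fn):
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            timings.append(time.perf_counter() - start)
        return min(timings)

    # Cost of shipping the object out and back (stand-in for Ray's serialization).
    round_trip = best_of(lambda: pickle.loads(pickle.dumps(obj)))
    # Cost of the actual work performed on the object.
    work = best_of(lambda: compute(obj))
    return round_trip, work


# Hypothetical workload: a cheap reduction over a moderately large list.
data = list(range(200_000))
round_trip_s, work_s = serialization_overhead(data, sum)
print(f"pickle round trip: {round_trip_s:.6f}s, compute: {work_s:.6f}s")
```

If the round trip takes about as long as the computation, or longer, running that node remotely is unlikely to pay off.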
- __init__(result_builder: ResultMixin, ray_init_config: Dict[str, Any] = None, shutdown_ray_on_completion: bool = False)
Constructor.
You can pass a ResultMixin object to the constructor to control the return type that gets produced by running on Ray.
- Parameters:
result_builder – Required. An implementation of base.ResultMixin.
ray_init_config – keyword arguments for ray.init; allows you to connect to an existing cluster or start a new one with custom configuration (https://docs.ray.io/en/latest/ray-core/api/doc/ray.init.html)
shutdown_ray_on_completion – by default the cluster is left running; set this to True to shut it down once execution completes
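A minimal sketch of how these constructor parameters might be put together, assuming the `sf-hamilton[ray]` extra is installed. The helpers `make_ray_init_config` and `build_driver` are hypothetical names introduced for illustration; `address` and `num_cpus` are `ray.init` keyword arguments, and `base.DictResult` is used here as one possible `ResultMixin` implementation.

```python
from typing import Any, Dict


def make_ray_init_config(connect_to_existing: bool, num_cpus: int = 2) -> Dict[str, Any]:
    """Build keyword arguments that the adapter forwards to ray.init()."""
    if connect_to_existing:
        # "auto" asks Ray to look for a cluster that is already running.
        return {"address": "auto"}
    # Otherwise start a local Ray instance limited to `num_cpus` cores.
    return {"num_cpus": num_cpus}


def build_driver(*modules, connect_to_existing: bool = False):
    """Create a Hamilton driver whose nodes are executed through Ray."""
    # Imported lazily so this snippet can be loaded without Hamilton or Ray installed.
    from hamilton import base, driver
    from hamilton.plugins import h_ray

    adapter = h_ray.RayGraphAdapter(
        result_builder=base.DictResult(),  # results come back as a plain dict
        ray_init_config=make_ray_init_config(connect_to_existing),
        shutdown_ray_on_completion=True,  # shut Ray down once execution finishes
    )
    return driver.Driver({}, *modules, adapter=adapter)
```

Calling `build_driver(my_functions)` with a module of Hamilton functions (here a hypothetical `my_functions`) would return a driver on which `execute([...])` can be called as usual. Since this class is marked experimental, check the signature against your installed version.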
- do_build_result(outputs: Dict[str, Any]) → Any
Builds the result and brings it back to this running process.
- Parameters:
outputs – the dictionary of key -> Union[ray object reference | value]
- Returns:
The type of object returned by self.result_builder.
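The `outputs` dictionary may mix object references with plain values. As an analogy only, and not the adapter's actual implementation, the sketch below uses `concurrent.futures.Future` from the standard library as a stand-in for a Ray object reference (with Ray, a reference would be resolved with `ray.get`). The function `resolve_outputs` is a hypothetical helper.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Dict


def resolve_outputs(outputs: Dict[str, Any]) -> Dict[str, Any]:
    """Replace every pending reference with its value; leave plain values untouched."""
    return {
        name: value.result() if isinstance(value, Future) else value
        for name, value in outputs.items()
    }


with ThreadPoolExecutor(max_workers=2) as pool:
    outputs = {
        # Computed in a worker and held as a reference until resolved.
        "doubled": pool.submit(lambda x: x * 2, 21),
        # Already a plain value in this process.
        "constant": 10,
    }
    resolved = resolve_outputs(outputs)

print(resolved)
```

Once every entry is a concrete value, a result builder can assemble the final object in the calling process.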
- static do_check_edge_types_match(type_from: Type, type_to: Type) → bool
Method that checks whether two types are equivalent. This is used when the function graph is being created.
- Parameters:
type_from – The type of the node that is the source of the edge.
type_to – The type of the node that is the destination of the edge.
- Return bool:
Whether or not they are equivalent
- do_remote_execute(*, execute_lifecycle_for_node: Callable, node: Node, **kwargs: Dict[str, Any]) → Any
Function that is called as we walk the graph to determine how to execute a Hamilton function.
- Parameters:
execute_lifecycle_for_node – wrapper function that executes lifecycle hooks and methods
node – the node that is being executed.
kwargs – the arguments that should be passed to the function.
- Returns:
A Ray object reference.
- static do_validate_input(node_type: Type, input_value: Any) → bool
Method that checks whether an input value matches an expected type.
- Parameters:
node_type – The type of the node.
input_value – The value that we want to validate.
- Returns:
Whether or not the input value matches the expected type.
- post_graph_execute(*args, **kwargs)
Runs after graph execution. This is where the cluster can optionally be shut down, as controlled by shutdown_ray_on_completion.