DagsterDocs

Solids#

Solids are the functional unit of work in Dagster. A solid's responsibility is to read its inputs, perform an action, and emit outputs. Multiple solids can be connected to create a Pipeline.

solids

Relevant APIs#

NameDescription
@solidThe decorator used to define solids. The decorated function is called the compute_fn. The decorator returns a SolidDefinition
InputDefinitionInputDefinitions define the inputs to a solid compute function. These are defined on the input_defs argument to the @solid decorator
OutputDefinitionOutputDefinitions define the outputs of a solid compute function. These are defined on the output_defs argument to the @solid decorator
SolidDefinitionClass for solids. You almost never want to use initialize this class directly. Instead, you should use the @solid which returns a SolidDefinition

Overview#

Solids are used to define computations. Solids can later be assembled into Pipelines. Solids generally perform one specific action and are used for batch computations. For example, you can use a solid to:

  • Derive a data set from some other data sets.
  • Execute a database query.
  • Initiate a Spark job in a remote cluster.
  • Query an API and store the result in a data warehouse.
  • Send an email or Slack message.

By default, all solids in a pipeline execute in the same process. In production environments, Dagster is usually configured so that each solid executes in its own process.

Solids have several important properties:

  • Inputs and Outputs: Solids have defined inputs and outputs, which can be optionally typed. These types are validated at runtime.
  • Configurable: Solids can be configured, using a strongly typed configuration system.
  • Dependencies: Solids inputs can depend on the outputs from other solids. A solid will not execute until all of its inputs have been resolved successfully. The dependency structure is defined using a Pipeline.
  • Emit an Event Stream: Solids can emit a stream of structured events, such as AssetMaterializations. These events can be viewed in Dagit, Dagster's UI tool.
  • Individually testable: See the testing page for more detail.

Defining a solid#

To define a solid, use the @solid decorator. The decorated function is called the compute_fn.

@solid
def my_solid():
    return "hello"

Inputs and Outputs#

Each solid has a set of inputs and outputs, which define the data it consumes and produces. Inputs and outputs are used to define dependencies between solids and to pass data between solids.

Both definitions have a few important properties:

  • They are named.
  • They are optionally typed. These types are validated at runtime.
  • (Advanced) They can be linked to an IOManager, which defines how the output or input is stored and loaded. See the IOManager concept page for more info.

Inputs#

Inputs are passed as arguments to a solid's compute_fn. The value of an input can be passed from the output of another solid, or stubbed (hardcoded) using config.

The most common way to define inputs is just to add arguments to the decorated function:

@solid
def my_input_solid(abc, xyz):
    pass

A solid only starts to execute once all of its inputs have been resolved. Inputs can be resolved in two ways:

  • The upstream output that the input depends on has been successfully emitted and stored.
  • The input was stubbed through config.

You can use a Dagster Type to provide a function that validates a solid's input every time the solid runs. In this case, you use InputDefinitions corresponding to the decorated function arguments.

MyDagsterType = DagsterType(type_check_fn=lambda _, value: value % 2 == 0, name="MyDagsterType")


@solid(input_defs=[InputDefinition(name="abc", dagster_type=MyDagsterType)])
def my_typed_input_solid(abc):
    pass

Outputs#

Outputs are yielded from a solid's compute_fn. By default, all solids have a single output called "result".

When you have one output, you can return the output value directly.

@solid
def my_output_solid():
    return 5

To define multiple outputs, or to use a different output name than "result", you can provide OutputDefinitions to the @solid decorator.

When you have more than one output, you must yield an instance of the Output class to disambiguate between outputs.

@solid(
    output_defs=[
        OutputDefinition(name="first_output"),
        OutputDefinition(name="second_output"),
    ],
)
def my_multi_output_solid():
    yield Output(5, output_name="first_output")
    yield Output(6, output_name="second_output")

Like inputs, outputs can also have Dagster Types.

Solid Context#

When writing a solid, users can optionally provide a first parameter, context. When this parameter is supplied, Dagster will supply a context object to the body of the solid, which is an instance of SolidExecutionContext. The context provides access to:

  • Solid configuration (context.solid_config)
  • Loggers (context.log)
  • Resources (context.resources)
  • The current run ID: (context.run_id)

For example, to access the logger and log a info message:

@solid(config_schema={"name": str})
def context_solid(context):
    name = context.solid_config["name"]
    context.log.info(f"My name is {name}")

Solid Configuration#

All definitions in dagster expose a config_schema, making them configurable and parameterizable. The configuration system is explained in detail on Config Schema.

Solid definitions can specify a config_schema for the solid's configuration. The configuration is accessible through the solid context at runtime. Therefore, solid configuration can be used to specify solid behavior at runtime, making solids more flexible and reusable.

For example, we can define a solid where the API endpoint it queries is define through it's configuration:

@solid(config_schema={"api_endpoint": str})
def my_configurable_solid(context):
    api_endpoint = context.solid_config["api_endpoint"]
    data = requests.get(f"{api_endpoint}/data").json()
    return data

Using a solid#

Solids are used within a @pipeline. You can see more information on the Pipelines page. You can also execute a single solid, usually within a test context, using the execute_solid function. More information can be found at Testing single solid execution

Patterns#

Solid Factory#

You may find the need to create utilities that help generate solids. In most cases, you should parameterize solid behavior by adding solid configuration. You should reach for this pattern if you find yourself needing to vary the arguments to the @solid decorator or SolidDefinition themselves, since they cannot be modified based on solid configuration.

To create a solid factory, you define a function that returns a SolidDefinition, either directly or by decorating a function with the solid dectorator.

def x_solid(
    arg,
    name="default_name",
    input_defs=None,
    **kwargs,
):
    """
    Args:
        args (any): One or more arguments used to generate the nwe solid
        name (str): The name of the new solid.
        input_defs (list[InputDefinition]): Any input definitions for the new solid. Default: None.

    Returns:
        function: The new solid.
    """

    @solid(name=name, input_defs=input_defs or [InputDefinition("start", Nothing)], **kwargs)
    def _x_solid(context):
        # Solid logic here
        pass

    return _x_solid

FAQ#

Why "Solid"?#

Why is a solid called a "solid"? It is a journey from a novel concept, to a familiar acronym, and back to a word.

In a data management system, there are two broad categories of data: source data––meaning the data directly inputted by a user, gathered from an uncontrolled external system, or generated directly by a device––and computed data––data produced by computation over other data. Management of computed data is the primary concern of Dagster. Another name for computed data would be software-structured data. Or SSD. Given that SSD is already a well-known acronym for Solid State Drives we named our core concept for software-structured data a Solid.