What is Step Functions?

A serverless orchestration service that lets you visually assemble multiple AWS services and tasks into a "workflow" and run it.

With AWS Step Functions, you can create workflows, also called State machines, to build distributed applications, automate processes, orchestrate microservices, and create data and machine learning pipelines.

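When to use it

1. When you want to run multiple Lambda functions in sequence

For example, when you want to chain Lambda functions to run a process such as "fetch data → transform → save → notify".

2. When you need conditional branching

When processing should branch depending on the input data (Choice state).

3. When you need parallel processing

Run multiple processes at the same time and move on once all of them complete (Parallel state, Map state).

4. When you want robust error handling

Retries, timeouts, and fallback handling can be defined declaratively.

5. When you need a human approval step

Pause the workflow, wait for a human to approve, and then resume.

Source: Discover use cases for Step Functions workflows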

Use cases for Step Functions include data processing, machine learning, microservice orchestration, and IT and security automation.
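As a rough illustration of the declarative error handling in item 4, the following ASL sketch applies a timeout to a Task, retries it with exponential backoff, and routes any remaining error to a fallback state. The function name FetchData, the state names, and the retry values are hypothetical placeholders chosen for this example.

```json
{
  "Comment": "Sketch only: FetchData, the state names, and the retry values are hypothetical",
  "StartAt": "FetchData",
  "States": {
    "FetchData": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:region:account-id:function:FetchData",
      "TimeoutSeconds": 30,
      "Retry": [
        {
          "ErrorEquals": ["States.Timeout", "States.TaskFailed"],
          "IntervalSeconds": 2,
          "MaxAttempts": 3,
          "BackoffRate": 2.0
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": "$.error",
          "Next": "Fallback"
        }
      ],
      "Next": "Done"
    },
    "Fallback": {
      "Type": "Fail",
      "Error": "FetchFailed",
      "Cause": "FetchData still failed after all retries"
    },
    "Done": {
      "Type": "Succeed"
    }
  }
}
```

With these values the retries would wait roughly 2, 4, and 8 seconds before the Catch rule takes over.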

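What are the benefits?

Comparison with a Lambda orchestrator

Compared with calling other Lambda functions from inside a Lambda function:

Aspect	Orchestration inside Lambda	Step Functions
Error handling	Hand-written try-catch	Declarative Retry/Catch definitions
Execution time limit	15 minutes	Standard: 1 year, Express: 5 minutes
Visibility	Have to trace through logs	Inspect visually in the console
Cost while waiting	Billed as Lambda execution time	Standard: no charge while waiting (billed per state transition)
Parallel processing	Hand-written	Declarative with Parallel/Map states

Source: Streamlining AWS Serverless workflows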

The benefits of using Step Functions include reduced code complexity, improved maintainability, native AWS service integrations, cost optimization, and long-running processes

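Concrete benefits

Less code: orchestration logic is defined in JSON, so Lambda functions can focus on business logic
Visual debugging: the execution history is visualized step by step, so you can see at a glance where a failure occurred
Integrations with more than 220 AWS services: call DynamoDB, SQS, SNS, and others directly without going through Lambda
State management: Step Functions manages the workflow state, so you do not need to manage it yourself in a database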
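To make the direct service integration point concrete, here is a minimal sketch of a Task state that writes to DynamoDB without any Lambda function in between. The table name Orders and the attribute names are assumptions made up for this example.

```json
{
  "Comment": "Sketch only: the Orders table and its attributes are hypothetical",
  "StartAt": "SaveOrder",
  "States": {
    "SaveOrder": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:putItem",
      "Parameters": {
        "TableName": "Orders",
        "Item": {
          "orderId": { "S.$": "$.orderId" },
          "status": { "S": "RECEIVED" }
        }
      },
      "End": true
    }
  }
}
```

The execution role would also need permission for dynamodb:PutItem on that table.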

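When not to use Step Functions

Processing that fits in a single simple Lambda function
CPU-intensive processing of large data volumes → AWS Batch or EMR is a better fit
Complex data transformations → AWS Glue is a better fit

Source: Orchestrating Lambda functions with Step Functions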

The document also discusses scenarios where Step Functions might not be the best fit, such as simple applications, complex data processing, and CPU-intensive workloads.

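Basic concepts and glossary

Core concepts

Term	Description
State Machine	The definition of an entire workflow. Written in JSON and made up of multiple States
Workflow	Another name for a State Machine. A series of steps that reflects a business process
State	An individual step within a workflow. Receives input, processes it, and passes output to the next State
Execution	A running instance of a State Machine. Multiple Executions of one State Machine can run concurrently

Source: Learn about state machines in Step Functions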

Step Functions is based on state machines, which are also called workflows. Workflows are comprised of a series of event-driven steps.

Diagram: how the terms relate

┌────────────────────────────────────────────────────────────────┐
│                 State Machine (defined in ASL)                 │
│                                                                │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐     │
│  │  State   │──▶│  State   │──▶│  State   │──▶│  State   │     │
│  │  (Task)  │   │ (Choice) │   │  (Task)  │   │(Succeed) │     │
│  └──────────┘   └──────────┘   └──────────┘   └──────────┘     │
│       │              │              │                          │
│    Input          Input          Input                         │
│    Output         Output         Output                        │
│                                                                │
└────────────────────────────────────────────────────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │   Execution 1   │  ← execution instances
                       │   Execution 2   │
                       │   Execution 3   │
                       └─────────────────┘

Types of State

States fall into two broad categories.

Task State

A State that does the actual "work", such as calling an AWS service or an external API.

Source: Discovering workflow states

Using Task states, or actions in Workflow Studio, you can call third party services, invoke functions, and use hundreds of AWS service endpoints.

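Flow State

States that control the flow of the workflow. There are seven types.

State	Role	Use case
Choice	Conditional branching	Select the next State based on input values
Parallel	Parallel execution	Run multiple processes at the same time and wait for all to complete
Map	Iteration	Run the same processing for each element of an array
Wait	Waiting	Pause for a specified duration or until a specified date and time
Pass	Pass-through	Pass input to output unchanged (used for debugging and data transformation)
Succeed	Successful end	End the workflow as a success
Fail	Failed end	End the workflow as an error

Source: Discovering workflow states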

States are separated in Workflow Studio into Actions, also known as Task states, and seven Flow states.
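Several of the Flow states can be seen together in one small sketch: a Choice state branches on an input value, a Wait state pauses, a Pass state injects fixed data, and a Succeed state ends the workflow. The field $.total, the threshold of 1000, and the state names are hypothetical.

```json
{
  "Comment": "Sketch only: $.total, the threshold, and the state names are hypothetical",
  "StartAt": "CheckOrderTotal",
  "States": {
    "CheckOrderTotal": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.total",
          "NumericGreaterThanEquals": 1000,
          "Next": "WaitBeforeApproval"
        }
      ],
      "Default": "MarkApproved"
    },
    "WaitBeforeApproval": {
      "Type": "Wait",
      "Seconds": 60,
      "Next": "MarkApproved"
    },
    "MarkApproved": {
      "Type": "Pass",
      "Result": { "approved": true },
      "ResultPath": "$.decision",
      "Next": "Finished"
    },
    "Finished": {
      "Type": "Succeed"
    }
  }
}
```

An input such as {"total": 1500} would take the Wait branch, while {"total": 20} would fall through to the Default state directly.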

Trying each flow hands-on

Amazon States Language (ASL)

A JSON-based language for defining a State Machine.

{
  "StartAt": "FirstState",
  "States": {
    "FirstState": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:region:123456789012:function:MyFunction",
      "Next": "SecondState"
    },
    "SecondState": {
      "Type": "Succeed"
    }
  }
}

Source: Using Amazon States Language

The Amazon States Language is a JSON-based, structured language used to define your state machine, a collection of states, that can do work (Task states), determine which states to transition to next (Choice states), stop an execution with an error (Fail states), and so on.

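Data flow terms

Term	Description
Input	The JSON data a State receives: either the previous State's output or the initial input
Output	The JSON data a State passes to the next State
Variables	Variables that can be held within the workflow and referenced from later States

Source: Learn about state machines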

Each step can pass data to subsequent steps using variables and state output. Data stored in variables can be used by later steps. State output becomes the input for the very next step.
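The difference between variables and state output might be sketched as follows, assuming the JSONata query language. The first state stores a value with Assign, and a later state reads it back as $orderId. The input field orderId and the state names are hypothetical.

```json
{
  "Comment": "Sketch only: the orderId input field and the state names are hypothetical",
  "QueryLanguage": "JSONata",
  "StartAt": "RememberOrderId",
  "States": {
    "RememberOrderId": {
      "Type": "Pass",
      "Assign": {
        "orderId": "{% $states.input.orderId %}"
      },
      "Next": "UseOrderIdLater"
    },
    "UseOrderIdLater": {
      "Type": "Pass",
      "Output": {
        "message": "{% 'Processing order ' & $orderId %}"
      },
      "End": true
    }
  }
}
```

The variable stays available to later states even when the states in between replace the state output.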

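Query languages

Languages used for data transformation. There are two.

Language	Characteristics
JSONata	Recommended since it was added in late 2024 (around re:Invent 2024). Supports expressions, operators, and functions
JSONPath	The original approach. Uses fields such as InputPath, Parameters, and ResultPath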
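For comparison, here is a sketch of the same Lambda invocation written once in each query language. The first state uses the JSONPath fields (Parameters, OutputPath), and the second opts into JSONata at the state level and uses Arguments and Output instead. The state names and the $.order input field are hypothetical.

```json
{
  "Comment": "Sketch only: the state names and the $.order input field are hypothetical",
  "StartAt": "InvokeWithJSONPath",
  "States": {
    "InvokeWithJSONPath": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "MyFunction",
        "Payload.$": "$.order"
      },
      "OutputPath": "$.Payload",
      "Next": "InvokeWithJSONata"
    },
    "InvokeWithJSONata": {
      "Type": "Task",
      "QueryLanguage": "JSONata",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Arguments": {
        "FunctionName": "MyFunction",
        "Payload": "{% $states.input %}"
      },
      "Output": "{% $states.result.Payload %}",
      "End": true
    }
  }
}
```

In the JSONata state, expressions are wrapped in {% %} and the ".$" key suffix is no longer needed.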

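Service integration patterns

Three patterns for calling AWS services from a Task State.

Pattern	Behavior	Use case
Request Response (default)	Moves on as soon as the HTTP response is received	Kicking off asynchronous processing
Run a Job (.sync)	Waits for the job to complete	Waiting for Batch, Glue, or ECS tasks to finish
Wait for Callback (.waitForTaskToken)	Waits for a callback with the task token	Human approval, waiting for a response from an external system

Source: Learn about state machines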

When calling an AWS service, you use one of the following service integration patterns: Request a response (default), Run a job (.sync), Wait for a callback with a task token (.waitForTaskToken)
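The three patterns differ only in the suffix of the Resource ARN, which the following sketch tries to show with one state per pattern. All topic, job, and queue identifiers are hypothetical placeholders. Because it uses .sync and .waitForTaskToken, this definition assumes a Standard workflow.

```json
{
  "Comment": "Sketch only: topic, job, and queue identifiers are hypothetical placeholders",
  "StartAt": "NotifyStart",
  "States": {
    "NotifyStart": {
      "Comment": "Request Response: continues as soon as SNS returns its HTTP response",
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:region:account-id:ExampleTopic",
        "Message": "Batch job is starting"
      },
      "ResultPath": null,
      "Next": "RunBatchJob"
    },
    "RunBatchJob": {
      "Comment": "Run a Job (.sync): waits until the AWS Batch job completes",
      "Type": "Task",
      "Resource": "arn:aws:states:::batch:submitJob.sync",
      "Parameters": {
        "JobName": "example-job",
        "JobDefinition": "arn:aws:batch:region:account-id:job-definition/example-definition:1",
        "JobQueue": "arn:aws:batch:region:account-id:job-queue/example-queue"
      },
      "ResultPath": null,
      "Next": "WaitForApproval"
    },
    "WaitForApproval": {
      "Comment": "Wait for Callback: pauses until SendTaskSuccess or SendTaskFailure is called with the token",
      "Type": "Task",
      "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
      "Parameters": {
        "QueueUrl": "https://sqs.region.amazonaws.com/account-id/example-approval-queue",
        "MessageBody": {
          "Request": "Please approve",
          "TaskToken.$": "$$.Task.Token"
        }
      },
      "End": true
    }
  }
}
```

Whoever consumes the queue message would return the token through the SendTaskSuccess or SendTaskFailure API to resume the workflow.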

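Workflow types

Type	Max duration	Execution start rate	Billing	Use case
Standard	Up to 1 year	2,000/sec	Per state transition	Long-running or auditable processes
Express	Up to 5 minutes	100,000/sec	Number of executions and duration	High-frequency event processing, IoT

Source: What is Step Functions?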

Standard workflows have exactly-once workflow execution and can run for up to one year. Express workflows have at-least-once workflow execution and can run for up to five minutes.

Standard Workflow

A type designed for long-running, auditable workflows.

Source: Choosing workflow type in Step Functions

Standard Workflows are ideal for long-running (up to one year), durable, and auditable workflows. You can retrieve the full execution history using the Step Functions API for up to 90 days after your execution completes.

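Standard vs Express comparison

Item	Standard Workflow	Express Workflow
Maximum duration	1 year	5 minutes
Execution start rate	About 2,000/sec	About 100,000/sec
State transition rate	About 4,000/sec	Nearly unlimited
Billing unit	Per state transition	Number of executions, plus duration and memory used
Execution guarantee	Exactly-once	At-least-once (asynchronous) / At-most-once (synchronous)
Execution history	Retained for 90 days	Sent to CloudWatch Logs

Terms related to execution guarantees

Exactly-once

The execution guarantee model for Standard Workflows. Each task and state runs only once (unless Retry is configured).

Source: Choosing workflow type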

Standard Workflows follow an exactly-once model, where your tasks and states are never run more than once, unless you have specified Retry behavior in ASL.

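Well suited to processing that "must not run twice" (non-idempotent operations), such as payment processing or starting an EMR cluster.

At-least-once

The execution guarantee model for asynchronous Express Workflows. The same processing may run more than once.

Source: Choosing workflow type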

Express Workflows use an at-least-once model, so an execution could potentially run more than once. The at-least-once model makes Express Workflows better suited for orchestrating idempotent actions.

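At-most-once

The execution guarantee model for synchronous Express Workflows. The workflow is not re-run when an exception occurs.

Idempotency

The property that running the same operation any number of times leaves the result unchanged.

Operation	Idempotent?	Reason
DynamoDB PutItem	Idempotent	The item is simply overwritten under the same key
Payment processing	Non-idempotent	Running it twice charges twice
S3 PutObject	Idempotent	The object is simply overwritten under the same key
Sending email	Non-idempotent	Running it twice delivers two emails

State Transition

A state in the workflow completing and control moving on to the next state. This is the billing unit for Standard Workflows.

┌──────┐    ┌──────┐    ┌──────┐
│State1│───▶│State2│───▶│State3│
└──────┘    └──────┘    └──────┘
         ↑           ↑
        1st         2nd     = 2 state transitions in total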

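Execution History

A record of each state's input and output, timestamps, error information, and so on during a workflow execution.

Workflow type	Where history is stored	Retention
Standard	Inside Step Functions	90 days
Express	CloudWatch Logs	Depends on log settings

Source: Choosing workflow type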

Execution history data removed after 90 days. Execution history is not captured by Step Functions [for Express]. Logging must be enabled through Amazon CloudWatch Logs.
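Since Express workflows rely on CloudWatch Logs for their history, logging has to be configured on the state machine itself. The fragment below is a sketch of the loggingConfiguration portion of the input to the CreateStateMachine API. The log group name ExampleExpress-Logs is a hypothetical placeholder, as are the region and account ID.

```json
{
  "loggingConfiguration": {
    "level": "ALL",
    "includeExecutionData": true,
    "destinations": [
      {
        "cloudWatchLogsLogGroup": {
          "logGroupArn": "arn:aws:logs:region:account-id:log-group:/aws/vendedlogs/states/ExampleExpress-Logs:*"
        }
      }
    ]
  }
}
```

A level of ERROR instead of ALL would reduce log volume, at the cost of losing the per-state detail for successful executions.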

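The two kinds of Express Workflow

Type	Behavior	Use case
Asynchronous	Returns confirmation that the workflow started and does not wait for completion. Check results in CloudWatch Logs	Messaging, data processing that other services do not depend on
Synchronous	Waits until completion and returns the result	Microservice orchestration, API Gateway integration

Source: Choosing workflow type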

Synchronous Express Workflows start a workflow, wait until it completes, and then return the result.

Service integration pattern restrictions

Pattern	Standard	Express
Request Response	Yes	Yes
Run a Job (.sync)	Yes	No
Wait for Callback (.waitForTaskToken)	Yes	No

Express Workflows cannot use .sync or .waitForTaskToken.

Workflow type cannot be changed

The workflow type of a State Machine cannot be changed after it is created.

Source: Choosing workflow type

The workflow type can not be updated after you create a state machine.

Guidelines for choosing a workflow type

            ┌─────────────────────────────┐
            │ Runs longer than 5 minutes? │
            └─────────────────────────────┘
                           │
              ┌────────────┴────────────┐
              ▼                         ▼
             Yes                        No
              │                         │
              ▼                         ▼
        ┌──────────┐          ┌─────────────────────┐
        │ Standard │          │ High-rate workload? │
        └──────────┘          └─────────────────────┘
                                       │
                          ┌────────────┴────────────┐
                          ▼                         ▼
                         Yes                        No
                          │                         │
                          ▼                         ▼
                    ┌──────────┐              ┌──────────┐
                    │ Express  │              │ Standard │
                    └──────────┘              │ (audit   │
                          │                   │  needed) │
                          ▼                   └──────────┘
                 ┌─────────────────┐
                 │ Need the result │
                 │ immediately?    │
                 └─────────────────┘
                          │
             ┌────────────┴────────────┐
             ▼                         ▼
            Yes                        No
             │                         │
             ▼                         ▼
      ┌────────────┐           ┌─────────────┐
      │ Synchronous│           │Asynchronous │
      │  Express   │           │   Express   │
      └────────────┘           └─────────────┘

Parallel State

A Flow State that runs multiple processes at the same time and waits until all of them complete.

Source: Parallel workflow state

The Parallel state ("Type": "Parallel") can be used to add separate branches of execution in your state machine.

Basic behavior

                    ┌─────────────────┐
                    │  Parallel State │
                    └────────┬────────┘
                             │
           ┌─────────────────┼─────────────────┐
           ▼                 ▼                 ▼
      ┌─────────┐       ┌─────────┐       ┌─────────┐
      │ Branch1 │       │ Branch2 │       │ Branch3 │
      │  Task   │       │  Task   │       │  Task   │
      └────┬────┘       └────┬────┘       └────┬────┘
           │                 │                 │
           └─────────────────┼─────────────────┘
                             │
                             ▼
                  ┌─────────────────────┐
                  │ After all branches  │
                  │ finish → next State │
                  └─────────────────────┘

Source: Parallel workflow state

A Parallel state causes AWS Step Functions to execute each branch, starting with the state named in that branch’s StartAt field, as concurrently as possible, and wait until all branches terminate (reach a terminal state) before processing the Parallel state’s Next field.

Concrete example

When fetching customer information, run the address lookup and the phone number lookup at the same time:

{
  "Comment": "Parallel example: look up the address and phone number at the same time",
  "StartAt": "LookupCustomerInfo",
  "States": {
    "LookupCustomerInfo": {
      "Type": "Parallel",
      "End": true,
      "Branches": [
        {
          "StartAt": "LookupAddress",
          "States": {
            "LookupAddress": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:region:account-id:function:AddressFinder",
              "End": true
            }
          }
        },
        {
          "StartAt": "LookupPhone",
          "States": {
            "LookupPhone": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:region:account-id:function:PhoneFinder",
              "End": true
            }
          }
        }
      ]
    }
  }
}

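Input and output characteristics

Input: each branch receives a copy of the same input data
Output: the outputs of the branches are combined into an array

Input: [3, 2]
         │
    ┌────┴────┐
    ▼         ▼
   Add    Subtract
    │         │
    5         1
    │         │
    └────┬────┘
         ▼
Output: [5, 1]  ← returned as an array

Error handling

If any branch fails, the entire Parallel State fails and the other branches are stopped as well.

Source: Parallel workflow state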

If any branch fails, because of an unhandled error or by transitioning to a Fail state, the entire Parallel state is considered to have failed and all its branches are stopped.

Error handling can be defined with the Retry and Catch fields.
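As a sketch of what that could look like, the customer lookup can be given a Retry rule and a Catch rule on the Parallel state itself. A retry here re-runs the whole Parallel state, not just the branch that failed. The retry values and the LookupFailed and LookupSucceeded state names are hypothetical.

```json
{
  "Comment": "Sketch only: retry values and the LookupFailed/LookupSucceeded states are hypothetical",
  "StartAt": "LookupCustomerInfo",
  "States": {
    "LookupCustomerInfo": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "LookupAddress",
          "States": {
            "LookupAddress": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:region:account-id:function:AddressFinder",
              "End": true
            }
          }
        },
        {
          "StartAt": "LookupPhone",
          "States": {
            "LookupPhone": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:region:account-id:function:PhoneFinder",
              "End": true
            }
          }
        }
      ],
      "Retry": [
        {
          "ErrorEquals": ["States.TaskFailed"],
          "IntervalSeconds": 1,
          "MaxAttempts": 2,
          "BackoffRate": 2.0
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": "$.error",
          "Next": "LookupFailed"
        }
      ],
      "Next": "LookupSucceeded"
    },
    "LookupFailed": {
      "Type": "Fail",
      "Error": "LookupFailed",
      "Cause": "A lookup branch failed after retries"
    },
    "LookupSucceeded": {
      "Type": "Succeed"
    }
  }
}
```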

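Branch constraints

Each branch must be self-contained
A State inside a branch cannot transition to a State outside the branch
A State outside a branch cannot transition into the branch either

Parallel vs Map

Item	Parallel	Map
Purpose	Run different processes at the same time	Apply the same processing to multiple data items
Number of branches	Fixed at definition time	Determined dynamically by the number of elements in the input array
Input	The same input for every branch	Each array element is the input to one iteration

Parallel: run the address lookup and the phone lookup at the same time
Map:      run the same processing for each order in an order list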