Pipeline Editor

A visual, node-based interface for designing data pipelines. Drag and drop nodes, connect them with edges, and execute pipelines against Databricks Delta Live Tables.

Overview

The Pipeline Editor provides a no-code/low-code interface for building data pipelines. Users can visually design ETL workflows by dragging transformation nodes onto a canvas and connecting them to define data flow.

Pipelines are stored in the application database and can be executed against Databricks Delta Live Tables (DLT) for production workloads. The editor integrates with the Data Catalog for table selection and the SQL Editor for custom transformations.

Key Benefits

  • Visual pipeline design without writing code
  • Drag-and-drop node palette with common transformations
  • Real-time execution preview with sample data
  • Integration with Databricks Delta Live Tables
  • Pipeline sharing and collaboration
  • Version history and rollback (planned enhancement)

How It Works

The Pipeline Editor is built with React Flow for the visual canvas, Zustand for state management, and TanStack Query for persistence. Pipeline definitions are stored in PostgreSQL and can be executed via Databricks DLT APIs.
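
The wiring between these pieces can be sketched with a minimal canvas component. This is an illustrative sketch, not the production component: it assumes the Zustand store shown under State Management below is exported from a pipelineStore module, and it feeds React Flow's change events back into that store.

// Illustrative canvas component (component name and import path are assumptions)
import ReactFlow, {
  Background,
  Controls,
  MiniMap,
  applyNodeChanges,
  applyEdgeChanges,
  type NodeChange,
  type EdgeChange,
} from 'reactflow';
import 'reactflow/dist/style.css';

// usePipelineStore is the store shown under "State Management" below
import { usePipelineStore } from './pipelineStore';

export function PipelineCanvas() {
  const nodes = usePipelineStore((s) => s.nodes);
  const edges = usePipelineStore((s) => s.edges);
  const connectNodes = usePipelineStore((s) => s.connectNodes);
  const selectNode = usePipelineStore((s) => s.selectNode);

  // Apply React Flow's change events (drag, select, remove) back to the store
  const onNodesChange = (changes: NodeChange[]) =>
    usePipelineStore.setState((s) => ({ nodes: applyNodeChanges(changes, s.nodes) }));
  const onEdgesChange = (changes: EdgeChange[]) =>
    usePipelineStore.setState((s) => ({ edges: applyEdgeChanges(changes, s.edges) }));

  return (
    <ReactFlow
      nodes={nodes}
      edges={edges}
      onNodesChange={onNodesChange}
      onEdgesChange={onEdgesChange}
      onConnect={connectNodes}
      onNodeClick={(_, node) => selectNode(node)}
      fitView
    >
      <Background />
      <MiniMap />
      <Controls />
    </ReactFlow>
  );
}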

Architecture

Data Model

Pipelines are stored as JSON documents containing React Flow nodes and edges, along with metadata like name, description, and access controls.

-- Pipeline storage schema
CREATE TABLE pipelines (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name VARCHAR(255) NOT NULL,
  description TEXT,
  nodes JSONB NOT NULL,           -- React Flow nodes
  edges JSONB NOT NULL,           -- React Flow edges
  organization_id INTEGER REFERENCES organizations(id),
  created_by_id INTEGER REFERENCES users(id),
  access VARCHAR(50) DEFAULT 'private',  -- private, organization, public
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

// Example node structure (one entry in the nodes column)
{
  "id": "source-1",
  "type": "source",
  "position": { "x": 100, "y": 100 },
  "data": {
    "label": "Customer Data",
    "catalog": "main",
    "schema": "sales",
    "table": "customers"
  }
}
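
Edges are stored alongside the nodes in the same document; each edge simply references the ids of its source and target nodes. A minimal sketch (the ids are illustrative; the shape follows React Flow's Edge type):

// Example edge connecting the source node above to a downstream transform
import type { Edge } from 'reactflow';

const edge: Edge = {
  id: 'edge-source-1-filter-1',   // illustrative id
  source: 'source-1',             // upstream node id
  target: 'filter-1',             // downstream node id
};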

User Experience

The Pipeline Editor features a multi-panel layout with a node palette, canvas, properties panel, and execution console.

Features

  • Drag-and-drop node palette with common data transformations
  • Visual canvas with zoom, pan, and minimap navigation
  • Properties panel for configuring selected nodes
  • Edge connections representing data flow between nodes
  • Execution console with live output and logs
  • Save, rename, and share pipelines
  • Parallel sampling for data preview at each stage
  • Undo/redo for design changes

Node Types

Source Nodes

  • Unity Catalog Table
  • Delta Sharing Table
  • File Upload (CSV, JSON)
  • External Database

Transform Nodes

  • Select/Project Columns
  • Filter Rows
  • Join Tables
  • Aggregate/Group By
  • Custom SQL

Sink Nodes

  • Delta Table
  • Unity Catalog Table
  • File Export
  • Webhook/API
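
In the editor's TypeScript code, each node type carries its own payload in node.data. The sketch below shows what those payloads could look like; the source fields, the filter condition, and outputName mirror examples elsewhere on this page, while the remaining field names are assumptions.

// Illustrative data payloads for a few node types
import type { Node } from 'reactflow';

interface BaseNodeData {
  label: string;
  outputName: string;          // table this node produces (used by the DLT conversion below)
}

interface SourceNodeData extends BaseNodeData {
  catalog: string;
  schema: string;
  table: string;
}

interface FilterNodeData extends BaseNodeData {
  condition: string;           // SQL boolean expression, e.g. "country = 'US'"
}

interface CustomSqlNodeData extends BaseNodeData {
  query: string;               // SELECT statement authored in the SQL Editor
}

interface SinkNodeData extends BaseNodeData {
  catalog: string;
  schema: string;
  table: string;
  writeMode: 'append' | 'overwrite';
}

// A pipeline node is a React Flow node carrying one of these payloads
type SourceNode = Node<SourceNodeData>;
type FilterNode = Node<FilterNodeData>;
type CustomSqlNode = Node<CustomSqlNodeData>;
type SinkNode = Node<SinkNodeData>;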

Canvas Interactions

Mouse Controls

  • Drag from palette: Add new node
  • Click node: Select and show properties
  • Drag node: Reposition on canvas
  • Drag from handle: Create edge connection
  • Scroll wheel: Zoom in/out
  • Click + drag canvas: Pan view

Keyboard Shortcuts

  • Delete/Backspace: Remove selected node
  • Cmd+Z: Undo
  • Cmd+Shift+Z: Redo
  • Cmd+S: Save pipeline
  • Cmd+Enter: Run pipeline
  • Escape: Deselect
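
Shortcut handling can be sketched as a small hook. This is a minimal sketch assuming the editor supplies undo, redo, save, and run callbacks; Delete/Backspace removal can be delegated to React Flow's deleteKeyCode prop rather than handled here.

// Hypothetical shortcut hook; the callbacks are assumed to come from the editor component
import { useEffect } from 'react';

interface ShortcutHandlers {
  undo: () => void;
  redo: () => void;
  save: () => void;
  run: () => void;
}

export function useEditorShortcuts({ undo, redo, save, run }: ShortcutHandlers) {
  useEffect(() => {
    const onKeyDown = (e: KeyboardEvent) => {
      const mod = e.metaKey || e.ctrlKey;   // Cmd on macOS, Ctrl elsewhere
      if (!mod) return;
      const key = e.key.toLowerCase();

      if (key === 'z' && e.shiftKey) {
        e.preventDefault();
        redo();                             // Cmd+Shift+Z
      } else if (key === 'z') {
        e.preventDefault();
        undo();                             // Cmd+Z
      } else if (key === 's') {
        e.preventDefault();
        save();                             // Cmd+S
      } else if (key === 'enter') {
        e.preventDefault();
        run();                              // Cmd+Enter
      }
    };
    window.addEventListener('keydown', onKeyDown);
    return () => window.removeEventListener('keydown', onKeyDown);
  }, [undo, redo, save, run]);
}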

Backend Architecture

Pipeline definitions are persisted to PostgreSQL through the API routes below, while Zustand keeps the in-memory editing state on the client. Execution is handled through the Databricks Delta Live Tables APIs.

API Routes

Route                                 Method  Description
/api/pipelines                        GET     List user's pipelines
/api/pipelines                        POST    Create new pipeline
/api/pipelines/{id}                   GET     Load pipeline definition
/api/pipelines/{id}                   PUT     Save pipeline definition
/api/pipelines/{id}/clone             POST    Clone pipeline
/api/databricks/pipelines/{id}/start  POST    Trigger DLT execution
/api/databricks/pipelines/{id}/stop   POST    Stop running pipeline
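
A sketch of how the frontend might call these routes with TanStack Query follows; the hook names, fetch wrappers, and Pipeline shape are illustrative, while the paths and methods come from the table above.

// Illustrative TanStack Query hooks for loading and saving a pipeline
import { useQuery, useMutation, useQueryClient } from '@tanstack/react-query';
import type { Edge, Node } from 'reactflow';

interface Pipeline {
  id: string;
  name: string;
  nodes: Node[];
  edges: Edge[];
}

export function usePipeline(id: string) {
  // GET /api/pipelines/{id} — load a pipeline definition
  return useQuery({
    queryKey: ['pipelines', id],
    queryFn: async (): Promise<Pipeline> => {
      const res = await fetch(`/api/pipelines/${id}`);
      if (!res.ok) throw new Error(`Failed to load pipeline ${id}`);
      return res.json();
    },
  });
}

export function useSavePipeline(id: string) {
  const queryClient = useQueryClient();
  // PUT /api/pipelines/{id} — save the current nodes and edges
  return useMutation({
    mutationFn: async (definition: Pick<Pipeline, 'nodes' | 'edges'>) => {
      const res = await fetch(`/api/pipelines/${id}`, {
        method: 'PUT',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(definition),
      });
      if (!res.ok) throw new Error(`Failed to save pipeline ${id}`);
      return res.json() as Promise<Pipeline>;
    },
    onSuccess: () => queryClient.invalidateQueries({ queryKey: ['pipelines', id] }),
  });
}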

State Management

The Pipeline Editor uses Zustand for client-side state management, providing a simple API for updating nodes, edges, and selection state.

// Zustand store for pipeline state
import { create } from 'zustand';
import { addEdge, type Connection, type Edge, type Node } from 'reactflow';

// Simplified shapes for illustration; the stored pipeline record also carries
// name, description, and access metadata (see the schema above)
interface NodeData {
  label: string;
  [key: string]: unknown;
}

interface Pipeline {
  nodes: Node[];
  edges: Edge[];
}

interface PipelineStore {
  nodes: Node[];
  edges: Edge[];
  selectedNode: Node | null;

  // Node operations
  addNode: (node: Node) => void;
  updateNodeData: (nodeId: string, data: Partial<NodeData>) => void;
  removeNode: (nodeId: string) => void;

  // Edge operations
  connectNodes: (connection: Connection) => void;
  removeEdge: (edgeId: string) => void;

  // Selection
  selectNode: (node: Node | null) => void;

  // Persistence
  loadPipeline: (pipeline: Pipeline) => void;
  savePipeline: () => Pipeline;
}

export const usePipelineStore = create<PipelineStore>((set, get) => ({
  nodes: [],
  edges: [],
  selectedNode: null,

  addNode: (node) => set((state) => ({
    nodes: [...state.nodes, node]
  })),

  updateNodeData: (nodeId, data) => set((state) => ({
    nodes: state.nodes.map(n =>
      n.id === nodeId ? { ...n, data: { ...n.data, ...data } } : n
    )
  })),

  removeNode: (nodeId) => set((state) => ({
    // Remove the node and any edges attached to it
    nodes: state.nodes.filter(n => n.id !== nodeId),
    edges: state.edges.filter(e => e.source !== nodeId && e.target !== nodeId),
    selectedNode: state.selectedNode?.id === nodeId ? null : state.selectedNode,
  })),

  connectNodes: (connection) => set((state) => ({
    edges: addEdge(connection, state.edges)
  })),

  removeEdge: (edgeId) => set((state) => ({
    edges: state.edges.filter(e => e.id !== edgeId)
  })),

  selectNode: (node) => set({ selectedNode: node }),

  loadPipeline: (pipeline) => set({
    nodes: pipeline.nodes,
    edges: pipeline.edges,
  }),

  savePipeline: () => ({
    nodes: get().nodes,
    edges: get().edges,
  }),
}));

Delta Live Tables Execution

When a pipeline is executed, the visual definition is converted to DLT code and submitted to Databricks for processing.

// Convert visual pipeline to DLT code
import type { Edge, Node } from 'reactflow';

const convertToDLT = (nodes: Node[], edges: Edge[]): string => {
  const dltCode: string[] = [];

  // Process nodes in topological order so inputs are defined before use
  const sortedNodes = topologicalSort(nodes, edges);

  for (const node of sortedNodes) {
    switch (node.type) {
      case 'source':
        dltCode.push(`
@dlt.table(name="${node.data.outputName}")
def ${node.id.replace(/-/g, '_')}():
    return spark.table("${node.data.catalog}.${node.data.schema}.${node.data.table}")
`);
        break;

      case 'filter': {
        const inputNode = getInputNode(node, edges, nodes);
        dltCode.push(`
@dlt.table(name="${node.data.outputName}")
def ${node.id.replace(/-/g, '_')}():
    return dlt.read("${inputNode.data.outputName}").filter("${node.data.condition}")
`);
        break;
      }

      // ... more node types
    }
  }

  return dltCode.join('\n');
};
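
convertToDLT relies on topologicalSort and getInputNode, which are not defined above. Sketches of both follow, assuming the node list is passed to getInputNode alongside the edges (as in the call above); the signatures and error handling are illustrative, and the ordering uses Kahn's algorithm.

// topologicalSort orders nodes so each node appears after its inputs
const topologicalSort = (nodes: Node[], edges: Edge[]): Node[] => {
  const inDegree = new Map<string, number>(nodes.map((n) => [n.id, 0]));
  for (const edge of edges) {
    inDegree.set(edge.target, (inDegree.get(edge.target) ?? 0) + 1);
  }

  // Kahn's algorithm: repeatedly emit nodes with no remaining inputs
  const queue = nodes.filter((n) => inDegree.get(n.id) === 0);
  const sorted: Node[] = [];
  while (queue.length > 0) {
    const node = queue.shift()!;
    sorted.push(node);
    for (const edge of edges.filter((e) => e.source === node.id)) {
      const remaining = (inDegree.get(edge.target) ?? 0) - 1;
      inDegree.set(edge.target, remaining);
      if (remaining === 0) {
        queue.push(nodes.find((n) => n.id === edge.target)!);
      }
    }
  }
  return sorted; // a cyclic graph leaves nodes out; validate before executing
};

// getInputNode resolves the upstream node feeding a single-input transform
const getInputNode = (node: Node, edges: Edge[], nodes: Node[]): Node => {
  const incoming = edges.find((e) => e.target === node.id);
  if (!incoming) throw new Error(`Node ${node.id} has no input edge`);
  return nodes.find((n) => n.id === incoming.source)!;
};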

Enhancement Opportunities

The Pipeline Editor can be extended with additional features to improve productivity and enable more advanced use cases.

Version History

Track changes to pipeline definitions over time with ability to view diffs, compare versions, and rollback to previous states.

Real-time Collaboration

Enable multiple users to edit pipelines simultaneously with cursors, presence indicators, and conflict resolution.

Pipeline Templates

Provide pre-built templates for common patterns like ETL, CDC, medallion architecture, and ML feature pipelines.

Execution Scheduling

Schedule pipeline runs with cron expressions, dependencies, and monitoring with alerts for failures.

Data Quality Rules

Add data quality expectations to nodes with automatic validation and alerting when constraints are violated.

Custom Node Types

Allow developers to create custom node types with custom UIs for organization-specific transformations.