
Request Flow Architecture

A comprehensive guide to how data flows through FireFly Analytics using Service Principal (SPN) authentication. Users authenticate via any OAuth 2.0/OIDC provider (Okta in our case), while all Databricks API calls use organization-specific Service Principals.

Overview

FireFly Analytics acts as a secure intermediary between users and the Databricks platform. Authentication happens in two layers: a user layer, where an OAuth 2.0/OIDC provider (Okta in FireFly's case) establishes the session, and a Databricks layer, where each organization's Service Principal (SPN) authorizes every API call.

This document describes the complete request lifecycle, including user session validation, SPN token management, and API proxying to various Databricks services.

OAuth 2.0 / OIDC Compatible

While this documentation references Okta as the identity provider, the architecture is designed to work with any OAuth 2.0 or OIDC-compliant provider including Azure AD, Auth0, Google, Keycloak, or custom OIDC servers. The user authentication layer is decoupled from the Databricks SPN authentication, making it easy to swap providers.

SSO-SPN Authentication Model

Users authenticate via OAuth 2.0/OIDC (e.g., Okta), but Databricks operations use Service Principals:

  • Users don't need individual Databricks accounts
  • Centralized permission management via SPN per organization
  • Clear audit trail with organization-level access control
  • Simplified onboarding - just add users to your identity provider

Key Components

  • FireFly Frontend: React/Next.js application with TanStack Query for data fetching
  • Next.js API Routes: Server-side endpoints that handle all Databricks communication
  • Better-Auth: Authentication framework managing user sessions
  • PostgreSQL: Persistent storage for sessions, SPN credentials, and application data
  • OAuth 2.0/OIDC Provider: Any compliant IDP for user authentication (Okta, Azure AD, Auth0, etc.)
  • Service Principal: Databricks service principal for API authentication (per organization)
  • Databricks APIs: Unity Catalog, SQL, DBFS, and other platform services

High-Level Architecture

The following diagram shows the high-level request flow through all system components. Notice how the frontend never directly communicates with Databricks - all requests are proxied through the Next.js backend.

Request Flow Steps:

1. API Request: Frontend sends request to Next.js API route
2. Validate User Session: Better-Auth validates the session cookie
3. Lookup Session: Query PostgreSQL for session and user data
4. Verify OIDC Token: Validate user's SSO token if needed
5. Get SPN Credentials: Retrieve organization's Service Principal config
6. Read Encrypted SPN: Get encrypted SPN credentials from database
7. Get/Refresh SPN Token: Exchange credentials for Databricks access token
8. Make API Call: Call Databricks API with SPN bearer token
9. Response: Databricks returns data (Unity Catalog, SQL, etc.)
10. Return Data: API route returns response to frontend

Diagram legend: User Auth (OIDC), SPN Token, Databricks API, Response
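The request flow steps can be condensed into a single server-side function. The following is a minimal sketch, not FireFly's actual route handler: the `ProxyDeps` interface and its three methods are hypothetical stand-ins for the Better-Auth session check, the SPN token lookup, and the Databricks HTTP call, injected so the flow can be read (and tested) without any of those services.

```typescript
// Hypothetical types for illustration; none of these names come from
// Better-Auth or the Databricks SDK.
interface Session {
  userId: string;
  orgId: string;
}

interface ProxyDeps {
  // Steps 2-4: resolve the session cookie to a user, or null if invalid/expired.
  validateSession(cookie: string | undefined): Promise<Session | null>;
  // Steps 5-7: return a valid SPN access token for the organization.
  getSpnToken(orgId: string): Promise<string>;
  // Steps 8-9: call Databricks with the bearer token and return the parsed body.
  callDatabricks(path: string, bearerToken: string): Promise<unknown>;
}

interface ProxyResult {
  status: number;
  body: unknown;
}

// Steps 1 and 10 are the entry and exit of this function.
async function proxyRequest(
  deps: ProxyDeps,
  cookie: string | undefined,
  path: string,
): Promise<ProxyResult> {
  const session = await deps.validateSession(cookie);
  if (session === null) {
    // No Databricks call is made for unauthenticated users.
    return { status: 401, body: { error: "Session invalid or expired" } };
  }
  const token = await deps.getSpnToken(session.orgId);
  const data = await deps.callDatabricks(path, token);
  return { status: 200, body: data };
}
```

Because the SPN token is looked up by `orgId` rather than `userId`, every user in an organization reaches Databricks with the same identity, which matches the per-organization model described above.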

API Request Flow

Every API request from the frontend goes through a standardized flow that handles authentication, token management, and error handling. This ensures consistent security and user experience across all operations.

Request Sequence Diagram

The following sequence diagram shows the detailed flow of a typical API request, including user session validation via your OIDC provider and SPN token retrieval for Databricks API calls.

Request Phases

1. User Session Validation Phase

Every request begins with user session validation. The session cookie is extracted and verified against the database. The user's OIDC token is validated if needed. Invalid or expired sessions redirect to the identity provider login.

2. SPN Token Retrieval Phase

Once the user session is validated, the organization's Service Principal credentials are retrieved from the database. If the SPN access token is missing or expired, a new token is obtained from Databricks using the client_credentials grant and cached for future requests.
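As an illustration of the client_credentials exchange, the sketch below builds (but does not send) the token request. It assumes the workspace-level Databricks OAuth endpoint `/oidc/v1/token` with the `all-apis` scope and HTTP Basic client authentication; the function names and the example host are hypothetical, and decrypting the stored credentials is left out.

```typescript
interface TokenRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the OAuth 2.0 client_credentials request for a Service Principal.
// workspaceHost is a placeholder such as "example.cloud.databricks.com".
function buildSpnTokenRequest(
  workspaceHost: string,
  clientId: string,
  clientSecret: string,
): TokenRequest {
  return {
    url: `https://${workspaceHost}/oidc/v1/token`,
    method: "POST",
    headers: {
      // btoa handles Latin-1 input only, which is assumed here for IDs and secrets.
      Authorization: `Basic ${btoa(`${clientId}:${clientSecret}`)}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: "grant_type=client_credentials&scope=all-apis",
  };
}

interface CachedToken {
  accessToken: string;
  expiresAtMs: number;
}

// Converts the token response (expires_in is in seconds) into an absolute
// expiry timestamp so the cache can compare it against the clock.
function toCachedToken(
  response: { access_token: string; expires_in: number },
  nowMs: number,
): CachedToken {
  return {
    accessToken: response.access_token,
    expiresAtMs: nowMs + response.expires_in * 1000,
  };
}
```

Storing an absolute expiry rather than `expires_in` keeps the "missing or expired" check a single comparison at request time.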

3. Databricks API Call Phase

With a valid SPN access token, the request is forwarded to the appropriate Databricks API. The SPN's permissions determine what data the user can access.

4. Response Handling Phase

The API response is transformed as needed and sent back to the frontend. TanStack Query caches the response client-side for subsequent requests.

Unity Catalog API Flow

Unity Catalog operations are among the most common API calls in FireFly Analytics. Users browse catalogs, schemas, tables, and preview data - all through a consistent request pattern.

Unity Catalog Operations

  • List Catalogs: GET /api/2.1/unity-catalog/catalogs
  • List Schemas: GET /api/2.1/unity-catalog/schemas
  • List Tables: GET /api/2.1/unity-catalog/tables
  • Get Table Details: GET /api/2.1/unity-catalog/tables/{full_name}
  • Preview Data: Uses Statement Execution API for samples
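The list endpoints above take their parent as a query parameter: listing schemas requires `catalog_name`, and listing tables requires both `catalog_name` and `schema_name`. A small sketch of path builders follows; the helper names are hypothetical, and the paths are relative to the workspace URL.

```typescript
const UC_BASE = "/api/2.1/unity-catalog";

function listCatalogsPath(): string {
  return `${UC_BASE}/catalogs`;
}

function listSchemasPath(catalog: string): string {
  return `${UC_BASE}/schemas?catalog_name=${encodeURIComponent(catalog)}`;
}

function listTablesPath(catalog: string, schema: string): string {
  return (
    `${UC_BASE}/tables?catalog_name=${encodeURIComponent(catalog)}` +
    `&schema_name=${encodeURIComponent(schema)}`
  );
}

// full_name is the three-part name catalog.schema.table.
function getTablePath(catalog: string, schema: string, table: string): string {
  const fullName = [catalog, schema, table].join(".");
  return `${UC_BASE}/tables/${encodeURIComponent(fullName)}`;
}
```

For example, `getTablePath("main", "sales", "orders")` yields `/api/2.1/unity-catalog/tables/main.sales.orders`, where `main`, `sales`, and `orders` are made-up names.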

Catalog Browsing Sequence

This diagram shows the complete flow for browsing the Unity Catalog hierarchy, from listing catalogs to previewing table data.

Catalog Caching Strategy

Catalog metadata is cached at multiple levels to improve performance:

Server-Side Caching

  • unstable_cache with catalog tags
  • Revalidated on schema changes
  • Shared across all users

Client-Side Caching

  • TanStack Query with staleTime
  • Refetch on window focus
  • Per-user cache isolation

File Upload Flow

File uploads to Databricks (DBFS or Unity Catalog Volumes) follow a streaming pattern that handles both small and large files efficiently without overwhelming server memory.

Upload Considerations

  • Small files (<10MB): Single PUT request
  • Large files (≥10MB): Chunked upload with progress tracking
  • All uploads are streamed to avoid memory issues
  • Upload metadata stored in PostgreSQL for audit trail
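The size-based split can be sketched as a planning step that runs before any bytes are sent. This is an illustrative sketch only: the chunk size, the binary interpretation of 10MB, the choice to chunk a file of exactly 10MB, and the function names are assumptions, and the actual transfer to DBFS or a Volume is not shown.

```typescript
// Assumption: 10MB is read as 10 * 1024 * 1024 bytes, and a file of exactly
// this size takes the chunked path.
const CHUNK_THRESHOLD_BYTES = 10 * 1024 * 1024;

interface Chunk {
  index: number;
  start: number; // inclusive byte offset
  end: number; // exclusive byte offset
}

// Returns one chunk for small files, or fixed-size chunks for large files.
function planUpload(fileSize: number, chunkSize: number = CHUNK_THRESHOLD_BYTES): Chunk[] {
  if (fileSize < 0 || chunkSize <= 0) {
    throw new RangeError("fileSize must be >= 0 and chunkSize must be > 0");
  }
  if (fileSize < CHUNK_THRESHOLD_BYTES) {
    return [{ index: 0, start: 0, end: fileSize }];
  }
  const chunks: Chunk[] = [];
  for (let start = 0, index = 0; start < fileSize; start += chunkSize, index++) {
    chunks.push({ index, start, end: Math.min(start + chunkSize, fileSize) });
  }
  return chunks;
}

// Progress as a whole-number percentage of completed chunks.
function progressPercent(chunksDone: number, totalChunks: number): number {
  return totalChunks === 0 ? 100 : Math.round((chunksDone / totalChunks) * 100);
}
```

Planning by byte offsets means each chunk can be read from the source stream on demand, so the server never has to hold the whole file in memory.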

Upload Sequence Diagram

The following diagram shows both small and large file upload patterns, including chunked uploads for large datasets.

Service Principal Authentication

FireFly Analytics uses a two-layer authentication model: users authenticate via any OAuth 2.0/OIDC provider (Okta in our case), while all Databricks operations use organization-specific Service Principals. This separation keeps Databricks credentials on the server, out of the browser, and lets the identity provider be swapped without touching the Databricks integration.

Benefits of SSO-SPN Architecture

  • No Databricks accounts needed: Users only need their OIDC provider credentials
  • Centralized permissions: SPN permissions apply to all org users
  • Clear audit trail: All API calls traced to organization SPN
  • Simplified management: One SPN per organization to manage
  • Consistent access: All users in an org have same Databricks access
  • Easy onboarding: Add user to your identity provider, they immediately have access

SPN Authentication Sequence

This diagram shows the complete flow: user authentication via your OIDC provider is validated first, then the organization's Service Principal token is used for all Databricks API calls.

SPN Token Caching

Service Principal tokens are cached to avoid unnecessary token exchanges. The caching strategy includes:

  • In-memory cache: Fast lookup for active requests
  • Database backup: Encrypted tokens stored in PostgreSQL
  • Proactive refresh: Tokens refreshed 5 minutes before expiry
  • Per-organization isolation: Each org has its own SPN token
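The in-memory layer and the five-minute proactive refresh can be sketched as follows. The class and type names are hypothetical, the encrypted PostgreSQL backup is omitted, and the clock and token fetcher are injected so the behavior is easy to verify.

```typescript
const REFRESH_MARGIN_MS = 5 * 60 * 1000; // refresh 5 minutes before expiry

interface SpnToken {
  accessToken: string;
  expiresAtMs: number;
}

type SpnTokenFetcher = (orgId: string) => Promise<SpnToken>;

class SpnTokenCache {
  // One entry per organization, so tokens are never shared across orgs.
  private readonly tokens = new Map<string, SpnToken>();
  private readonly fetchToken: SpnTokenFetcher;
  private readonly now: () => number;

  constructor(fetchToken: SpnTokenFetcher, now: () => number = () => Date.now()) {
    this.fetchToken = fetchToken;
    this.now = now;
  }

  async get(orgId: string): Promise<string> {
    const cached = this.tokens.get(orgId);
    // Reuse the token only while more than the refresh margin remains.
    if (cached !== undefined && cached.expiresAtMs - this.now() > REFRESH_MARGIN_MS) {
      return cached.accessToken;
    }
    const fresh = await this.fetchToken(orgId);
    this.tokens.set(orgId, fresh);
    return fresh.accessToken;
  }
}
```

This sketch does not deduplicate concurrent refreshes: two requests arriving at the same moment with a stale token would both call the fetcher. Caching the in-flight promise per organization is one way to close that gap.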

SQL Execution Flow

SQL query execution uses the Databricks Statement Execution API, which supports both synchronous (short queries) and asynchronous (long-running queries) execution patterns.

SQL Execution Sequence

The following diagram shows the complete SQL execution flow, including handling for long-running queries with polling.

Execution Modes

Synchronous Mode

For queries completing within 50 seconds

  • Single request-response
  • Results returned immediately
  • Simpler client implementation

Asynchronous Mode

For long-running analytical queries

  • Statement ID returned immediately
  • Client polls for status/results
  • Supports query cancellation
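Both modes can share one code path. In the sketch below, the request body uses the Statement Execution API fields `wait_timeout` and `on_wait_timeout`: with `"50s"` and `"CONTINUE"`, a short query returns its result in the first response, and a longer one returns a statement ID to poll. The helper names, the backoff schedule, and the `maxPolls` limit are assumptions; the `poll` and `sleep` functions are injected rather than implemented.

```typescript
type StatementState = "PENDING" | "RUNNING" | "SUCCEEDED" | "FAILED" | "CANCELED" | "CLOSED";

interface StatementResponse {
  statement_id: string;
  status: { state: StatementState };
}

interface ExecuteBody {
  statement: string;
  warehouse_id: string;
  wait_timeout: string;
  on_wait_timeout: "CONTINUE" | "CANCEL";
}

// Body for POST /api/2.0/sql/statements.
function buildExecuteBody(statement: string, warehouseId: string): ExecuteBody {
  return {
    statement,
    warehouse_id: warehouseId,
    wait_timeout: "50s", // wait synchronously for up to 50 seconds
    on_wait_timeout: "CONTINUE", // then keep running and switch to polling
  };
}

function isTerminal(state: StatementState): boolean {
  return state !== "PENDING" && state !== "RUNNING";
}

// Polls until the statement reaches a terminal state or maxPolls is exhausted.
// poll is expected to wrap GET /api/2.0/sql/statements/{statement_id}.
async function waitForStatement(
  initial: StatementResponse,
  poll: (statementId: string) => Promise<StatementResponse>,
  sleep: (ms: number) => Promise<void>,
  maxPolls: number = 60,
): Promise<StatementResponse> {
  let current = initial;
  for (let i = 0; i < maxPolls && !isTerminal(current.status.state); i++) {
    await sleep(Math.min(1000 * 2 ** i, 10000)); // 1s, 2s, 4s, ... capped at 10s
    current = await poll(current.statement_id);
  }
  return current;
}
```

Cancellation would be a separate call to `POST /api/2.0/sql/statements/{statement_id}/cancel`, typically triggered when the user abandons the query.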

Database Interactions

PostgreSQL serves as the central data store for all authentication and application data. Understanding the database interaction patterns is crucial for performance optimization.

Database Schema Overview

Complete Request Lifecycle

This diagram shows the complete lifecycle of a request from user action to rendered response, including OIDC user validation (Okta in our case), SPN token management, and caching layers.

Lifecycle Summary

1. User Action

User interacts with the FireFly UI (click, form submit, etc.)

2. User Session Validation (OIDC)

Session cookie verified against PostgreSQL, OIDC token validated if needed

3. SPN Token Retrieval

Organization's Service Principal token retrieved from cache or refreshed via Databricks OAuth

4. Databricks API Call

Request proxied to Databricks with SPN bearer token

5. Response & Caching

Response cached (server and client), UI updated

Error Handling

Errors can occur at any stage of the request flow. The system implements consistent error handling to provide meaningful feedback to users while protecting sensitive information.

Authentication Errors

  • 401 Unauthorized: Session invalid or expired - redirect to login
  • 403 Forbidden: User lacks permission - show access denied
  • Token Refresh Failed: Clear session, force re-authentication

Databricks API Errors

  • 400 Bad Request: Invalid query syntax - show error message
  • 404 Not Found: Resource doesn't exist - show helpful message
  • 429 Rate Limited: Implement backoff and retry
  • 500+ Server Error: Show generic error, log details
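The status codes in the two lists above map naturally onto a small set of actions. The sketch below is one possible classifier; the `ErrorAction` type, the message strings, and the backoff constants (500 ms base, 30 s cap) are hypothetical choices rather than values from the FireFly codebase.

```typescript
type ErrorAction =
  | { kind: "redirect-to-login" }
  | { kind: "access-denied" }
  | { kind: "show-message"; message: string }
  | { kind: "retry"; delayMs: number }
  | { kind: "generic-error" };

// attempt is zero-based: 0 for the first failure, 1 for the second, and so on.
function classifyError(status: number, attempt: number, detail: string = ""): ErrorAction {
  if (status === 401) return { kind: "redirect-to-login" };
  if (status === 403) return { kind: "access-denied" };
  if (status === 400) return { kind: "show-message", message: detail || "Invalid request" };
  if (status === 404) {
    return { kind: "show-message", message: "The requested resource does not exist" };
  }
  if (status === 429) {
    // Exponential backoff: 500ms, 1s, 2s, ... capped at 30s.
    return { kind: "retry", delayMs: Math.min(500 * 2 ** attempt, 30000) };
  }
  // 500+ and anything unexpected: show a generic error; details are logged
  // server-side and never returned to the client.
  return { kind: "generic-error" };
}
```

Keeping the decision in one pure function makes it straightforward to apply the same rules in every API route.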

Client-Side Error Handling

  • TanStack Query automatic retries (3 retries by default, with exponential backoff)
  • Error boundaries for component-level failures
  • Toast notifications for transient errors
  • Error pages for unrecoverable failures

Conclusion

The SSO-SPN request flow architecture ensures secure, performant, and reliable communication between FireFly Analytics and Databricks. Every request follows these four high-level steps:

1. User Session Validation

Validate session cookie and verify user's OIDC token. Users authenticate via your identity provider - no Databricks account needed.

2. SPN Token Retrieval

Get organization's Service Principal credentials from database. Refresh SPN token via Databricks OAuth if expired.

3. Databricks API Call

Make API request to Databricks with SPN bearer token. Access Unity Catalog, SQL, DBFS, and other services.

4. Response & Caching

Return data to frontend with server and client caching. TanStack Query manages client-side cache invalidation.
