FireFly Analytics

Databricks Identity Authentication

A comprehensive guide to the Databricks Identity authentication architecture, covering OAuth flows, token management, session handling, and multi-organization support.

Overview

The Databricks Identity authentication system provides a flexible, secure way to authenticate users with Databricks workspaces and accounts. It supports two primary authentication modes:

  • Account OAuth: For administrative access to Databricks accounts, enabling workspace management and account-level operations
  • Workspace OAuth: For per-workspace authentication, providing direct access to specific Databricks workspaces

The system is built on top of Better-Auth, a modern authentication framework, and uses Postgres for data persistence. All OAuth tokens are encrypted at rest using AES-256-GCM encryption, and sessions are managed with secure, HTTP-only cookies.

Key Features

  • Dual OAuth flow support (Account and Workspace)
  • Multi-organization/multi-tenant architecture
  • Encrypted token storage with automatic refresh
  • Seamless organization switching without re-authentication
  • Session-based authentication with 30-day expiry
  • Per-user, per-workspace OAuth token isolation

High-Level Architecture

The following simplified diagram shows the high-level interaction between the application, Databricks platform, and various Databricks services. This provides a bird's eye view of how authentication flows through the system.

Note: This is a simplified overview. Detailed sequence diagrams for each flow are provided in subsequent sections.

Detailed Authentication Flow

The following diagram illustrates the complete authentication flow from initial login through session management and organization switching. Notice how both Account and Workspace OAuth flows converge into a unified session management system.

Database Schema

The authentication system uses Postgres with a carefully designed schema that supports multi-tenancy, multiple OAuth providers, and flexible workspace mapping. The schema is built with Drizzle ORM for type-safe database operations.

Entity Relationship Diagram

Core Tables

USER Table

Stores user profile information. A user can have multiple accounts (one per Databricks account or workspace) and belong to multiple organizations.

  • id: Unique user identifier (UUID)
  • email: User email address (unique)
  • name: User display name
  • emailVerified: Email verification status

ACCOUNT Table

Links users to their Databricks accounts or workspaces. Each account represents one OAuth connection to either a Databricks account or a specific workspace.

  • provider: Either "databricks-account" or "databricks-workspace"
  • accountId: Databricks account ID (for account OAuth)
  • workspaceId: Databricks workspace ID (for workspace OAuth)
  • workspaceUrl: Workspace URL (for workspace OAuth)

OAUTH_TOKEN Table

Stores encrypted OAuth tokens for each account. Tokens are encrypted using AES-256-GCM with a unique initialization vector (IV) per token. The encryption key is stored in an environment variable, separate from the database.

  • accessToken: Encrypted access token
  • refreshToken: Encrypted refresh token
  • expiresAt: Token expiration timestamp
  • scope: OAuth scopes granted

SESSION Table

Manages user sessions. Each session is associated with a user and optionally tracks the active organization. Sessions are valid for 30 days by default and include security metadata.

  • token: Random session token (stored in cookie)
  • activeOrganizationId: Currently selected organization
  • ipAddress: Client IP for security auditing
  • userAgent: Client user agent for device tracking

ORGANIZATION & WORKSPACE Tables

Support multi-tenancy by organizing users into organizations, with each organization containing one or more Databricks workspaces. Users can be members of multiple organizations with different roles.

  • ORGANIZATION: Logical grouping of users and workspaces
  • ORGANIZATION_MEMBER: Junction table with role-based access
  • WORKSPACE: Maps Databricks workspaces to organizations

Account OAuth Flow

The Account OAuth flow is designed for users who need administrative access to Databricks accounts. This flow enables account-level operations such as workspace management, user provisioning, and account configuration.

When to Use Account OAuth

  • Managing multiple workspaces across an account
  • Provisioning users and service principals
  • Configuring account-level settings and policies
  • Accessing account console features

OAuth Flow Sequence

The Account OAuth flow follows the OAuth 2.0 authorization code flow with PKCE (Proof Key for Code Exchange) for enhanced security. The flow involves redirecting users to Databricks for authentication, exchanging the authorization code for tokens, and securely storing the encrypted tokens in the database.

Key Steps Explained

1. OAuth Initiation

Better-Auth generates a random state parameter and a PKCE code verifier. The state is stored in a secure, HTTP-only cookie so that the callback can be checked against it, which prevents CSRF attacks. The code challenge (the base64url-encoded SHA-256 hash of the verifier) is sent to Databricks with the authorization request.
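
The values described in this step can be sketched with Node's built-in crypto module. This is a minimal illustration, not Better-Auth's actual implementation; the function names are hypothetical.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical helper names, shown for illustration only.

/** Generates a high-entropy PKCE code verifier (43 base64url characters). */
function generateCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

/** Derives the S256 code challenge: BASE64URL(SHA-256(verifier)). */
function deriveCodeChallenge(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

/** Generates an unguessable state value that is later compared on callback. */
function generateState(): string {
  return randomBytes(16).toString("hex");
}
```

The verifier stays on the server until the token exchange; only the challenge and the state travel through the browser in the authorization request.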

2. User Authentication

The user is redirected to Databricks Account Console for authentication. After successful login, Databricks redirects back with an authorization code.

3. Token Exchange

Better-Auth verifies the state parameter matches, then exchanges the authorization code for access and refresh tokens using the PKCE code verifier. This prevents authorization code interception attacks.

4. User Profile Fetching

Using the access token, the application fetches the user's profile from Databricks, including email, name, and account ID.

5. Database Operations

The user record is upserted (created or updated) in the USER table. An ACCOUNT record is created with provider set to "databricks-account". Tokens are encrypted using AES-256-GCM and stored in OAUTH_TOKEN table.

6. Session Creation

A new SESSION record is created with a random 32-byte token, 30-day expiration, and security metadata (IP address and user agent). The session token is set as an HTTP-only, Secure, SameSite=Lax cookie.
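
As a rough sketch of this step, the session row and its cookie could be produced as follows. The cookie name is taken from the validation section of this guide; the function names are hypothetical, and a real deployment would typically let Better-Auth set (and sign) the cookie rather than build the header by hand.

```typescript
import { randomBytes } from "node:crypto";

const SESSION_TTL_SECONDS = 30 * 24 * 60 * 60; // 30 days

interface NewSession {
  token: string;
  expiresAt: Date;
  ipAddress: string;
  userAgent: string;
}

/** Builds a session row with a random 32-byte token and a 30-day expiry. */
function createSession(ipAddress: string, userAgent: string, now: Date = new Date()): NewSession {
  return {
    token: randomBytes(32).toString("hex"),
    expiresAt: new Date(now.getTime() + SESSION_TTL_SECONDS * 1000),
    ipAddress,
    userAgent,
  };
}

/** Serializes the Set-Cookie value with the attributes listed above. */
function serializeSessionCookie(session: NewSession): string {
  return [
    `better-auth.session_token=${session.token}`,
    "Path=/",
    `Max-Age=${SESSION_TTL_SECONDS}`,
    "HttpOnly",
    "Secure",
    "SameSite=Lax",
  ].join("; ");
}
```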

Security Features

  • PKCE prevents authorization code interception
  • State parameter prevents CSRF attacks
  • Tokens encrypted at rest with AES-256-GCM
  • HTTP-only cookies prevent XSS token theft
  • Secure flag ensures HTTPS-only transmission
  • IP and user agent tracking for anomaly detection

Workspace OAuth Flow

The Workspace OAuth flow enables users to authenticate directly with specific Databricks workspaces. This is the recommended approach for most users who need to access workspace features like notebooks, SQL queries, and data catalogs.

When to Use Workspace OAuth

  • Accessing notebooks and collaborative features
  • Running SQL queries and data analysis
  • Exploring Unity Catalog and data assets
  • Per-workspace scoped permissions

OAuth Flow Sequence

Similar to Account OAuth, the Workspace OAuth flow uses OAuth 2.0 with PKCE. However, it authenticates against a specific workspace's OAuth endpoint and stores workspace-scoped tokens.

Workspace Selection

Before initiating the OAuth flow, users must specify which workspace they want to authenticate with. This is done through a workspace selector that accepts the workspace URL (e.g., https://company.cloud.databricks.com).
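
Because the workspace URL is user input, it is worth normalizing before it is used to build OAuth endpoints. The following is one possible sketch; the function name and the exact rules (HTTPS only, origin only) are assumptions rather than documented behavior of the selector.

```typescript
/**
 * Normalizes user input to a bare HTTPS origin, for example
 * "company.cloud.databricks.com/" becomes "https://company.cloud.databricks.com".
 * Hypothetical helper; throws on malformed or non-HTTPS input.
 */
function normalizeWorkspaceUrl(input: string): string {
  const trimmed = input.trim();
  // Add a scheme if the user typed only a host name.
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`;
  const url = new URL(withScheme); // throws TypeError on malformed input
  if (url.protocol !== "https:") {
    throw new Error("Workspace URL must use HTTPS");
  }
  // Drop any path, query string, or fragment; keep scheme and host only.
  return url.origin;
}
```

Restricting the value to an origin keeps stray paths or query strings out of the endpoints later derived from it.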

Organization Mapping

During workspace authentication, the system automatically maps the workspace to an organization. This enables:

  • Multi-workspace organizations: A single organization can contain multiple workspaces (e.g., dev, staging, prod)
  • User workspace access: Users see only workspaces they've authenticated with
  • Organization-level switching: Switch between organizations without re-authenticating to individual workspaces

Token Scoping

Unlike Account OAuth tokens, which provide broad account-level access, Workspace OAuth tokens are scoped to a specific workspace. This provides:

  • Enhanced security through principle of least privilege
  • Workspace-level permission enforcement
  • Isolated token compromise (breach of one token doesn't affect others)
  • Per-workspace token refresh and lifecycle management

ID Token Claims

Workspace OAuth returns an ID token (JWT) with claims that are decoded and stored:

  • sub: User ID within the workspace
  • email: User email address
  • name: User display name
  • workspace_id: Unique workspace identifier
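
To illustrate how these claims might be read, the sketch below decodes the payload segment of a JWT. It deliberately does not verify the signature, which a production system must do before trusting any claim; the function name is hypothetical, and the assumption that every claim arrives as a string is ours.

```typescript
interface WorkspaceIdTokenClaims {
  sub: string;
  email: string;
  name: string;
  workspace_id: string;
}

/**
 * Decodes (but does NOT verify) the payload of an ID token.
 * Illustrative only: verify the signature before relying on the result.
 */
function decodeIdTokenClaims(idToken: string): WorkspaceIdTokenClaims {
  const parts = idToken.split(".");
  const payloadSegment = parts[1];
  if (parts.length !== 3 || payloadSegment === undefined) {
    throw new Error("Malformed JWT: expected three segments");
  }
  const payload: unknown = JSON.parse(Buffer.from(payloadSegment, "base64url").toString("utf8"));
  if (typeof payload !== "object" || payload === null) {
    throw new Error("JWT payload is not an object");
  }
  const record = payload as Record<string, unknown>;
  // Assumption: each claim listed above is present as a string.
  const read = (key: string): string => {
    const value = record[key];
    if (typeof value !== "string") throw new Error(`Missing or non-string claim: ${key}`);
    return value;
  };
  return {
    sub: read("sub"),
    email: read("email"),
    name: read("name"),
    workspace_id: read("workspace_id"),
  };
}
```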

Token Storage & Security

Token storage is a critical security component. Our architecture ensures that OAuth tokens are never exposed to the client and are encrypted at rest in the database.

Storage Architecture

Encryption Strategy

All OAuth tokens (both access and refresh tokens) are encrypted before storage using AES-256-GCM (Galois/Counter Mode), which provides both confidentiality and authenticity.

Encryption Algorithm: AES-256-GCM

  • Key Size: 256 bits (32 bytes)
  • Mode: Galois/Counter Mode (GCM)
  • IV: Unique 12-byte initialization vector per token
  • Auth Tag: 16-byte authentication tag for integrity
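
The parameters above map directly onto Node's crypto API. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the storage layout (IV, then auth tag, then ciphertext, base64-encoded) is one reasonable choice rather than the layout the system necessarily uses.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_LENGTH = 12; // bytes, unique per token
const TAG_LENGTH = 16; // bytes, GCM authentication tag

/** Encrypts a token; output is base64(iv || tag || ciphertext) (assumed layout). */
function encryptToken(plaintext: string, key: Buffer): string {
  if (key.length !== 32) throw new Error("AES-256 requires a 32-byte key");
  const iv = randomBytes(IV_LENGTH);
  const cipher = createCipheriv("aes-256-gcm", key, iv, { authTagLength: TAG_LENGTH });
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

/** Decrypts a token; throws if the data or the tag has been tampered with. */
function decryptToken(encoded: string, key: Buffer): string {
  const data = Buffer.from(encoded, "base64");
  const iv = data.subarray(0, IV_LENGTH);
  const tag = data.subarray(IV_LENGTH, IV_LENGTH + TAG_LENGTH);
  const ciphertext = data.subarray(IV_LENGTH + TAG_LENGTH);
  const decipher = createDecipheriv("aes-256-gcm", key, iv, { authTagLength: TAG_LENGTH });
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because a fresh IV is drawn for every call, encrypting the same token twice yields different stored values, and any modification of the stored bytes makes decryption fail instead of returning corrupted plaintext.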

Key Management

  • Storage: Encryption key stored in environment variable
  • Rotation: Support for key rotation without downtime
  • Access: Only Better-Auth layer has access to encryption key
  • Never Logged: Plain-text tokens never appear in logs

Token Lifecycle

OAuth access tokens typically have a short lifetime (1-2 hours). Our system automatically handles token refresh:

  1. Application detects expired access token before API call
  2. Retrieves encrypted refresh token from database
  3. Decrypts refresh token using encryption service
  4. Exchanges refresh token for new access token with Databricks
  5. Encrypts and stores new tokens in database
  6. Updates expiration timestamp
  7. Proceeds with original API call using new token
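
The steps above can be sketched as a single function with its dependencies injected. Everything named here (the interfaces, `getValidAccessToken`, the 60-second expiry skew) is an illustrative assumption, not the system's actual code; the database, encryption service, and Databricks token endpoint are represented by the `deps` object.

```typescript
interface StoredTokens {
  accessToken: string; // encrypted
  refreshToken: string; // encrypted
  expiresAt: Date;
}

interface RefreshedTokens {
  accessToken: string;
  refreshToken: string;
  expiresInSeconds: number;
}

// Hypothetical dependency bundle standing in for the database,
// the encryption service, and the Databricks token endpoint.
interface TokenDeps {
  load(accountId: string): Promise<StoredTokens>;
  save(accountId: string, tokens: StoredTokens): Promise<void>;
  encrypt(plaintext: string): string;
  decrypt(ciphertext: string): string;
  exchangeRefreshToken(refreshToken: string): Promise<RefreshedTokens>;
  now(): Date;
}

// Assumption: refresh slightly early so a token cannot expire mid-request.
const EXPIRY_SKEW_MS = 60_000;

async function getValidAccessToken(accountId: string, deps: TokenDeps): Promise<string> {
  const stored = await deps.load(accountId);
  // Step 1: detect an expired (or nearly expired) access token.
  if (stored.expiresAt.getTime() - EXPIRY_SKEW_MS > deps.now().getTime()) {
    return deps.decrypt(stored.accessToken);
  }
  // Steps 2-4: decrypt the refresh token and exchange it.
  const fresh = await deps.exchangeRefreshToken(deps.decrypt(stored.refreshToken));
  // Steps 5-6: encrypt and persist the new tokens and expiry.
  await deps.save(accountId, {
    accessToken: deps.encrypt(fresh.accessToken),
    refreshToken: deps.encrypt(fresh.refreshToken),
    expiresAt: new Date(deps.now().getTime() + fresh.expiresInSeconds * 1000),
  });
  // Step 7: the caller proceeds with the fresh token.
  return fresh.accessToken;
}
```

Injecting the dependencies keeps the refresh logic testable without a database or network access.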

Token Security Best Practices

  • Never log plain-text tokens
  • Never send tokens to client-side JavaScript
  • Always use server-side token handling
  • Rotate encryption keys periodically
  • Monitor for unusual token usage patterns
  • Implement token revocation on logout

Session Management

Session management is the backbone of user authentication. Every API request must include a valid session cookie, which is verified against the database to ensure the user is authenticated and has permission to perform the requested action.

Session Verification Flow

Session Validation Steps

Every authenticated request goes through a multi-step validation process to ensure security and proper access control:

1. Cookie Extraction

Extract the better-auth.session_token cookie from the request. If missing, immediately return 401 Unauthorized.

2. Session Lookup

Query the SESSION table using the token. The query includes joins to fetch related user data in a single database round-trip.

3. Expiration Check

Compare the session's expiresAt timestamp with the current time. Expired sessions are deleted and the request is rejected.

4. Organization Validation

If the session has an activeOrganizationId, verify that the user is still a member of that organization. If not, clear the active org and require selection.

5. Context Building

Build a complete session context object containing user details, organization membership, role, and available workspaces.

6. Token Retrieval (if needed)

If the request requires making API calls to Databricks, retrieve the appropriate OAuth token (Account or Workspace) based on the active organization context.
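
Steps 1 through 4 can be illustrated with a small in-memory sketch. The types, the `validateSession` name, and the use of maps in place of Postgres queries are assumptions made for illustration; context building and token retrieval are omitted.

```typescript
interface SessionRow {
  token: string;
  userId: string;
  expiresAt: Date;
  activeOrganizationId: string | null;
}

type ValidationResult =
  | { ok: true; userId: string; organizationId: string | null }
  | { ok: false; status: 401; reason: "missing_cookie" | "unknown_session" | "expired" };

// In-memory stand-in for the SESSION and ORGANIZATION_MEMBER tables.
interface SessionStore {
  sessions: Map<string, SessionRow>;
  memberships: Set<string>; // entries of the form "userId:organizationId"
}

function validateSession(cookieToken: string | undefined, store: SessionStore, now: Date): ValidationResult {
  // 1. Cookie extraction: a missing cookie is rejected immediately.
  if (!cookieToken) return { ok: false, status: 401, reason: "missing_cookie" };
  // 2. Session lookup by token.
  const session = store.sessions.get(cookieToken);
  if (!session) return { ok: false, status: 401, reason: "unknown_session" };
  // 3. Expiration check: expired sessions are deleted and rejected.
  if (session.expiresAt.getTime() <= now.getTime()) {
    store.sessions.delete(cookieToken);
    return { ok: false, status: 401, reason: "expired" };
  }
  // 4. Organization validation: clear the active org if membership was revoked.
  const orgId = session.activeOrganizationId;
  if (orgId !== null && !store.memberships.has(`${session.userId}:${orgId}`)) {
    session.activeOrganizationId = null;
  }
  return { ok: true, userId: session.userId, organizationId: session.activeOrganizationId };
}
```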

Session Context Object

After successful validation, a session context object is created and passed to the request handler. This object contains all necessary information for authorization decisions:

interface SessionContext {
  userId: string;
  email: string;
  name: string;
  organizationId: string | null;
  organizationSlug: string | null;
  organizationRole: "admin" | "member" | "viewer" | null;
  availableWorkspaces: Array<{
    workspaceId: string;
    workspaceUrl: string;
    workspaceName: string;
  }>;
  sessionMetadata: {
    createdAt: Date;
    expiresAt: Date;
    ipAddress: string;
    userAgent: string;
  };
}

Automatic Token Refresh

When a request needs to use an OAuth token to call Databricks APIs, the session management system automatically checks token expiration and refreshes if necessary:

  1. Token Validation: Check if the stored access token has expired by comparing expiresAt with current time
  2. Refresh Decision: If expired, retrieve and decrypt the refresh token
  3. Token Exchange: Call Databricks token endpoint with the refresh token to get new access and refresh tokens
  4. Token Update: Encrypt and store the new tokens, update expiration timestamp
  5. Proceed: Use the fresh access token for the original API call

Session Expiration vs Token Expiration

It's important to distinguish between these two concepts:

  • Session Expiration: 30 days, determines if user is logged in
  • Token Expiration: ~1-2 hours, determines if OAuth tokens need refresh
  • A valid session can have expired tokens (which are automatically refreshed)
  • An expired session requires the user to re-authenticate, even if tokens are valid

Activity Tracking

After successful request processing, the session's last activity timestamp is updated. This enables:

  • Session activity monitoring and anomaly detection
  • Automatic cleanup of inactive sessions
  • User activity analytics
  • Security auditing and compliance reporting

Organization Switching

One of the most powerful features of the authentication system is seamless organization switching. Users who are members of multiple organizations can switch between them without re-authenticating, making it easy to work across different teams or projects.

Organization Switching Flow

How Organization Switching Works

Organization switching is implemented as a simple session update that changes the activeOrganizationId field. This design provides several benefits:

No Re-authentication Required

Users stay logged in and simply change context

Instant Context Switch

Single database UPDATE operation, sub-50ms latency

Automatic Token Selection

Subsequent API calls automatically use the correct workspace tokens

Cache Invalidation

Next.js cache is automatically invalidated to fetch fresh org-specific data

Organization Switcher UI

The organization switcher is typically placed in the application header or navigation bar. It displays:

  • Current active organization (highlighted)
  • List of all organizations the user is a member of
  • User's role in each organization (admin, member, viewer)
  • Number of workspaces in each organization

Client-Side State Management

Organization switching triggers coordinated updates across multiple layers:

1. Server-Side Session Update

The API endpoint updates the session's activeOrganizationId in the database

2. Next.js Cache Invalidation

Server-side cache tags are revalidated using revalidateTag(), ensuring subsequent requests fetch fresh data

3. TanStack Query Cache Update

Client-side React Query cache is invalidated for organization-specific queries, triggering automatic refetch

4. React State Update

Local component state is updated to reflect the new active organization

5. Data Refresh

The UI automatically fetches and displays data for the new organization

Permission Verification

Before allowing an organization switch, the system verifies that the user is actually a member of the target organization by checking the ORGANIZATION_MEMBER table. This prevents unauthorized access through direct API calls.
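
A minimal sketch of that check, assuming membership data has already been loaded into a map keyed by user ID. The names and the 403 response shape are illustrative, not the actual endpoint contract.

```typescript
interface SessionRecord {
  userId: string;
  activeOrganizationId: string | null;
}

type SwitchResult =
  | { ok: true; session: SessionRecord }
  | { ok: false; status: 403; error: string };

/**
 * Switches the active organization only if the user is a member of the target.
 * `memberships` stands in for a lookup against the ORGANIZATION_MEMBER table.
 */
function switchOrganization(
  session: SessionRecord,
  targetOrganizationId: string,
  memberships: ReadonlyMap<string, ReadonlySet<string>>,
): SwitchResult {
  const userOrgs = memberships.get(session.userId);
  if (!userOrgs || !userOrgs.has(targetOrganizationId)) {
    return { ok: false, status: 403, error: "Not a member of the target organization" };
  }
  // The switch itself is a single field update on the session.
  return { ok: true, session: { ...session, activeOrganizationId: targetOrganizationId } };
}
```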

Optimistic UI Updates

For the best user experience, the UI can implement optimistic updates:

  • Immediately update UI to show the new organization
  • Start loading state for new organization data
  • If the switch fails, revert to previous organization with error message
  • This provides instant feedback while the API call processes

Multi-Workspace Support

When a user switches to an organization that contains multiple workspaces, the system:

  1. Fetches all workspaces linked to the organization from the WORKSPACE table
  2. For each workspace, checks if the user has authenticated (has an ACCOUNT record)
  3. Displays only workspaces the user has access to in the workspace selector
  4. When the user makes API calls, automatically routes to the appropriate workspace based on context
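
Steps 1 to 3 amount to intersecting the organization's workspaces with the user's workspace-level accounts. The sketch below shows that filter over plain arrays; the row shapes are simplified from the schema described earlier, and the function name is hypothetical.

```typescript
interface WorkspaceRow {
  workspaceId: string;
  organizationId: string;
  workspaceUrl: string;
}

interface AccountRow {
  userId: string;
  provider: "databricks-account" | "databricks-workspace";
  workspaceId: string | null;
}

/** Returns the organization's workspaces that the user has authenticated with. */
function listAccessibleWorkspaces(
  organizationId: string,
  userId: string,
  workspaces: readonly WorkspaceRow[],
  accounts: readonly AccountRow[],
): WorkspaceRow[] {
  // Collect the workspace IDs for which the user holds a workspace OAuth account.
  const authenticated = new Set<string>();
  for (const account of accounts) {
    if (account.userId === userId && account.provider === "databricks-workspace" && account.workspaceId !== null) {
      authenticated.add(account.workspaceId);
    }
  }
  return workspaces.filter(
    (w) => w.organizationId === organizationId && authenticated.has(w.workspaceId),
  );
}
```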

Better-Auth Integration

Better-Auth is the authentication framework that powers the entire system. It provides a robust, type-safe abstraction over OAuth flows, session management, and token handling.

What Better-Auth Provides

OAuth Management

  • PKCE flow implementation
  • State parameter generation and verification
  • Token exchange handling
  • Provider configuration

Session Management

  • Session creation and validation
  • Cookie configuration and security
  • Session expiration handling
  • Multi-session support

Database Integration

  • Drizzle ORM integration
  • User and account management
  • Type-safe database operations
  • Migration support

Security Features

  • CSRF protection
  • XSS prevention
  • Token encryption
  • Secure cookie handling

Custom OAuth Providers

Better-Auth is extended with custom OAuth providers for both Databricks Account and Databricks Workspace authentication. Each provider is configured with:

Authorization Endpoint

URL where users are redirected for authentication

Account: https://accounts.cloud.databricks.com/oidc/v1/authorize
Workspace: {workspace_url}/oidc/v1/authorize

Token Endpoint

URL for exchanging authorization codes for tokens

Account: https://accounts.cloud.databricks.com/oidc/v1/token
Workspace: {workspace_url}/oidc/v1/token

User Info Endpoint

URL for fetching authenticated user profile

Account: https://accounts.cloud.databricks.com/oidc/v1/userinfo
Workspace: {workspace_url}/oidc/v1/userinfo
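
To show how a provider might combine the workspace authorization endpoint with the standard OAuth 2.0 and PKCE query parameters, here is a small sketch. The function name, parameter object, and the client ID, redirect URI, and scope values a caller would pass are all hypothetical; in practice Better-Auth assembles this URL from the provider configuration.

```typescript
interface AuthorizeParams {
  clientId: string;
  redirectUri: string;
  scope: string;
  state: string;
  codeChallenge: string;
}

/** Builds the redirect URL for {workspace_url}/oidc/v1/authorize. */
function buildWorkspaceAuthorizeUrl(workspaceUrl: string, params: AuthorizeParams): string {
  const url = new URL("/oidc/v1/authorize", workspaceUrl);
  url.search = new URLSearchParams({
    response_type: "code", // authorization code flow
    client_id: params.clientId,
    redirect_uri: params.redirectUri,
    scope: params.scope,
    state: params.state,
    code_challenge: params.codeChallenge,
    code_challenge_method: "S256", // SHA-256 based PKCE
  }).toString();
  return url.toString();
}
```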

Session Hooks and Middleware

Better-Auth provides hooks and middleware that allow custom logic to be injected at various points in the authentication flow:

  • onSignIn: Hook called after successful OAuth authentication, used to create organization memberships
  • onSessionVerify: Hook called on every session validation, used to attach organization context
  • onTokenRefresh: Hook called when OAuth tokens are refreshed, used for logging and monitoring
  • onSessionDelete: Hook called on logout, used to clean up resources and log activity

Postgres Database

Postgres provides the database that stores all authentication data. The system uses a serverless Postgres deployment that offers several advantages for this use case:

Serverless Architecture

Automatic scaling from zero to handle any load, with instant compute provisioning and pay-per-use pricing

Branching

Database branching for development and testing, enabling instant copies of production data without storage costs

Connection Pooling

Built-in connection pooling optimized for serverless functions, eliminating cold start connection overhead

High Availability

Automatic backups, point-in-time recovery, and 99.95% uptime SLA for production workloads

Drizzle ORM Integration

Drizzle ORM provides type-safe database access with excellent TypeScript integration. All database operations are:

  • Type-checked at compile time
  • Automatically validated against the schema
  • Optimized with query batching and caching
  • Easier to refactor with IDE support

Database Indexes

The schema includes carefully designed indexes to ensure fast query performance:

  • SESSION.token (unique): Fast session lookup by cookie token
  • USER.email (unique): Quick user lookup by email address
  • ACCOUNT.(userId, provider, workspaceId): Fast workspace account lookup
  • ORGANIZATION_MEMBER.(userId, organizationId): Efficient organization membership checks
  • WORKSPACE.organizationId: Quick workspace listing per organization

Connection Management

Next.js API routes are serverless functions that can scale to zero. Connection pooling is essential for:

  • Reusing database connections across function invocations
  • Avoiding connection limit exhaustion
  • Reducing latency from connection establishment
  • Supporting high concurrency without over-provisioning

Security Considerations

Security is paramount in an authentication system. Here are the key security measures implemented throughout the architecture:

OAuth PKCE

All OAuth flows use PKCE (Proof Key for Code Exchange) to prevent authorization code interception attacks. The code verifier stays on the server until the token exchange; only the derived challenge passes through the browser.

CSRF Protection

State parameters in OAuth flows prevent cross-site request forgery. State is stored in secure cookies and verified on callback.

Token Encryption at Rest

All OAuth tokens are encrypted using AES-256-GCM before storage. If the database alone is compromised, the tokens remain unreadable without the encryption key, which is stored separately.

HTTP-Only Cookies

Session tokens are stored in HTTP-only cookies, preventing JavaScript access and XSS-based token theft.

Server-Side Token Handling

OAuth tokens never reach the client. All Databricks API calls are proxied through Next.js API routes.

Automatic Token Refresh

Expired tokens are automatically refreshed server-side. Users never need to manually handle token expiration.

Security Best Practices

  • Always use HTTPS in production
  • Rotate encryption keys periodically
  • Implement rate limiting on authentication endpoints
  • Monitor for unusual authentication patterns
  • Keep dependencies updated to patch vulnerabilities
  • Use environment variables for all secrets
  • Enable database query logging for security audits
  • Implement IP allowlisting for admin operations

Conclusion

The Databricks Identity authentication architecture provides a robust, secure, and flexible foundation for building applications on top of Databricks. By combining Better-Auth's authentication framework with serverless Postgres, we achieve:

Security

AES-256-GCM encryption at rest, OAuth 2.0 with PKCE, and layered protections for sessions, cookies, and tokens

Flexibility

Support for both Account and Workspace OAuth, multi-organization architecture, and seamless switching

Scalability

Serverless architecture that scales from zero to millions of users without infrastructure management

The architecture is production-ready and battle-tested, powering applications that serve thousands of users across multiple organizations and workspaces. The separation of concerns, type-safe operations, and comprehensive error handling make it maintainable and reliable.
