
SSO-Mapped Service Principal Authentication

A comprehensive guide to the SSO-Mapped SPN authentication pattern, where users authenticate via SSO (Okta/OIDC) while Databricks API calls are made using Service Principal credentials. This pattern enables multi-tenant applications without requiring users to have Databricks accounts.

Overview

The SSO-Mapped SPN authentication pattern decouples user identity from Databricks API access. Users authenticate through your organization's Single Sign-On (SSO) provider, while all Databricks API calls are made using Service Principal (SPN) credentials stored and managed by the platform.

This architecture is ideal for building multi-tenant SaaS applications on Databricks where end users don't need—and shouldn't have—direct Databricks accounts.

Key Benefits

  • No Databricks Accounts Required: End users authenticate via SSO only—no Databricks account provisioning needed
  • Simplified Identity Management: Manage users in your SSO provider, not in Databricks
  • Multi-Tenant Ready: Each organization gets isolated SPN credentials for data separation
  • Centralized Access Control: Platform manages all Databricks API access through controlled SPNs
  • Flexible Audit Options: Use organization-level or per-user SPN mapping for different audit granularity

When to Use This Pattern

This pattern is best suited for:

  • Multi-tenant SaaS applications built on Databricks
  • Embedded analytics platforms
  • Customer-facing data products
  • Applications where users shouldn't see Databricks UI
  • Scenarios requiring abstracted data access layers

Architecture Overview

The following diagram shows the high-level architecture of the SSO-Mapped SPN pattern. Notice how user authentication (SSO) and API authentication (SPN) are completely separate concerns.

Architecture Layers

1. Authentication Layer

Users authenticate via SSO (Okta, Azure AD, or any OIDC provider). This establishes their identity and creates a session in the platform. No Databricks credentials are involved at this stage.

2. Identity Mapping Layer

After SSO authentication, the user selects an organization. The platform resolves the appropriate Service Principal credentials for that organization—either a shared organization-level SPN or a per-user SPN mapping.

3. API Proxy Layer

All Databricks API calls are proxied through the platform. The proxy obtains OAuth tokens using SPN credentials, caches them for performance, and makes authenticated API calls on behalf of users.

Identity Model

Understanding the data model is crucial for implementing this pattern correctly. The following entity-relationship diagram shows how users, organizations, sessions, and SPN credentials are related.

Key Entities

user

Core user identity from SSO authentication. The accountIdUserIdMapping field stores SCIM IDs for users who are also provisioned in Databricks accounts.

session

Tracks authenticated user sessions. The critical field is activeOrganizationId—this determines which organization's SPN credentials are used for API calls.

byodDatabricksSpns

Organization-level Service Principal credentials. Each organization has one or more SPNs configured. Credentials are encrypted at rest using AES-256-GCM.

userSpns

Optional per-user SPN mappings. When present, the user's individual SPN is used instead of the organization-level SPN, enabling per-user audit trails in Databricks.
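The precedence described for userSpns can be sketched as a small lookup. This is only an illustration: the SpnCredentials type, the resolve_spn function, and the in-memory dictionaries standing in for the userSpns and byodDatabricksSpns tables are all hypothetical, not the platform's actual code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SpnCredentials:
    """Hypothetical holder for one Service Principal's OAuth credentials."""
    client_id: str
    client_secret: str


def resolve_spn(
    user_id: str,
    organization_id: str,
    user_spns: Dict[Tuple[str, str], SpnCredentials],
    org_spns: Dict[str, SpnCredentials],
) -> Optional[SpnCredentials]:
    """Prefer a per-user SPN mapping; fall back to the organization's SPN."""
    per_user = user_spns.get((organization_id, user_id))
    if per_user is not None:
        return per_user
    # No per-user mapping: use the shared organization-level SPN (or None).
    return org_spns.get(organization_id)


# In-memory stand-ins for the two credential tables.
org_spns = {"org-acme": SpnCredentials("org-client-id", "org-secret")}
user_spns = {("org-acme", "alice"): SpnCredentials("alice-client-id", "alice-secret")}

alice = resolve_spn("alice", "org-acme", user_spns, org_spns)  # per-user SPN
bob = resolve_spn("bob", "org-acme", user_spns, org_spns)      # organization SPN
```

Keying the per-user table by (organization, user) keeps a user's mapping in one organization from leaking into another.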

byodDatabricksWorkspaces

Maps workspace URLs to SPNs. An organization can have multiple workspaces, each potentially using a different SPN for authentication.

Authentication Flow

The authentication flow has three distinct phases: SSO authentication (establishing user identity), organization selection (setting context), and Databricks API access (using SPN credentials).

Flow Phases Explained

Phase 1: SSO Authentication

The user authenticates via Okta (or another OIDC provider) using the OAuth 2.0 authorization code flow with PKCE. After successful authentication, a session is created in PostgreSQL and a Secure, HttpOnly cookie is set. At this point, the user is authenticated but has no organization context.

Phase 2: Organization Selection

The user selects an organization they are a member of. The session's activeOrganizationId is updated to record this selection. This organization context determines which SPN credentials are used for subsequent API calls.
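A minimal sketch of the organization selection step, assuming membership is checked before the session is updated. The Session class, the NotAMember exception, and the memberships dictionary are hypothetical; only the activeOrganizationId field name comes from the session entity described earlier.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Session:
    """Hypothetical in-memory view of a session row."""
    user_id: str
    activeOrganizationId: Optional[str] = None  # field name from the session entity


class NotAMember(Exception):
    """Raised when a user selects an organization they do not belong to."""


def select_organization(
    session: Session,
    organization_id: str,
    memberships: Dict[str, Set[str]],
) -> Session:
    """Set the session's active organization after verifying membership."""
    if organization_id not in memberships.get(session.user_id, set()):
        raise NotAMember(f"{session.user_id} is not a member of {organization_id}")
    session.activeOrganizationId = organization_id
    return session


memberships = {"alice": {"org-acme", "org-globex"}}
session = select_organization(Session(user_id="alice"), "org-acme", memberships)
```

Checking membership on the server at selection time matters because the active organization later decides which SPN credentials are used.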

Phase 3: Databricks API Access

When the user requests Databricks data, the platform retrieves the organization's SPN credentials, exchanges them for an OAuth access token using the M2M client_credentials flow, and makes the API call on behalf of the user.

Session Cookie Security

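  • HttpOnly: Keeps the cookie out of reach of JavaScript, so an XSS payload cannot read the session ID
  • Secure: Only transmitted over HTTPS
  • SameSite=Lax: Withholds the cookie from cross-site subrequests and form POSTs, which mitigates CSRF, while still sending it on top-level navigation
  • 30-day expiry: Balances security with user convenience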
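The attributes above map directly onto a Set-Cookie header. The sketch below builds one with Python's standard library; the cookie name session_id and the Path=/ attribute are assumptions for illustration, not values confirmed by the platform.

```python
from http.cookies import SimpleCookie

THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60


def build_session_cookie(session_id: str, name: str = "session_id") -> str:
    """Return a Set-Cookie header value carrying the listed attributes."""
    cookie = SimpleCookie()
    cookie[name] = session_id
    morsel = cookie[name]
    morsel["httponly"] = True                 # not readable from JavaScript
    morsel["secure"] = True                   # sent over HTTPS only
    morsel["samesite"] = "Lax"                # withheld from cross-site POSTs
    morsel["max-age"] = THIRTY_DAYS_SECONDS   # 30-day expiry
    morsel["path"] = "/"                      # assumption: cookie covers the whole site
    return morsel.OutputString()


header_value = build_session_cookie("opaque-session-id")
print("Set-Cookie:", header_value)
```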

Service Principal OAuth (M2M)

Service Principal authentication uses the OAuth 2.0 client_credentials grant type, also known as Machine-to-Machine (M2M) authentication. The grant is defined in RFC 6749, section 4.4, and is the standard choice for server-to-server authentication, where no user is present to approve access.

Databricks OAuth Endpoints

Databricks provides two OAuth token endpoints depending on the scope of access required:

Workspace-Level Token

For accessing workspace-specific resources (notebooks, SQL warehouses, catalogs, etc.)

Endpoint
POST https://{workspace-url}/oidc/v1/token

Account-Level Token

For accessing account-wide resources (SCIM APIs, workspace management, etc.)

Endpoint
POST https://accounts.cloud.databricks.com/oidc/accounts/{account-id}/v1/token
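Choosing between the two endpoints can be wrapped in one helper. The function name token_endpoint and its keyword arguments are hypothetical; the URL shapes are the two listed above.

```python
from typing import Optional
from urllib.parse import quote

ACCOUNTS_HOST = "https://accounts.cloud.databricks.com"


def token_endpoint(
    workspace_url: Optional[str] = None,
    account_id: Optional[str] = None,
) -> str:
    """Return the workspace-level or account-level OAuth token URL."""
    if (workspace_url is None) == (account_id is None):
        raise ValueError("pass exactly one of workspace_url or account_id")
    if workspace_url is not None:
        # Workspace-level: token endpoint lives on the workspace host.
        return workspace_url.rstrip("/") + "/oidc/v1/token"
    # Account-level: token endpoint lives on the accounts host.
    return f"{ACCOUNTS_HOST}/oidc/accounts/{quote(account_id, safe='')}/v1/token"


workspace_token_url = token_endpoint(
    workspace_url="https://your-workspace.cloud.databricks.com/"
)
account_token_url = token_endpoint(account_id="your-account-id")
```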

Token Request Format

cURL Example
# Workspace-level token request.
# Export CLIENT_ID and CLIENT_SECRET in your shell first.
# --user sends them as an HTTP Basic Authorization header; curl does the
# base64 encoding itself, which avoids the line wrapping that piping long
# credentials through `base64` can introduce.
curl --request POST \
  --url "https://your-workspace.cloud.databricks.com/oidc/v1/token" \
  --header "Content-Type: application/x-www-form-urlencoded" \
  --user "$CLIENT_ID:$CLIENT_SECRET" \
  --data "grant_type=client_credentials&scope=all-apis"

# Response:
# {
#   "access_token": "eyJraWQiOiJkYTA4...",
#   "token_type": "Bearer",
#   "expires_in": 3600
# }
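For a server-side proxy, the same request as the cURL example can be built with Python's standard library. This is a sketch, not the platform's implementation: build_token_request and fetch_token are hypothetical names, and the request is only constructed here, not sent.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_token_request(
    token_url: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build the client_credentials token request without sending it."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def fetch_token(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and return the parsed JSON token response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_token_request(
    "https://your-workspace.cloud.databricks.com/oidc/v1/token",
    "CLIENT_ID",
    "CLIENT_SECRET",
)
# token = fetch_token(request)  # needs real credentials and network access
```

Separating construction from sending makes the request easy to inspect in tests without touching the network.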

Token Caching

Access tokens are cached in memory with a TTL of expires_in - 60 seconds to prevent using expired tokens. The platform automatically refreshes tokens before they expire.
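The TTL rule can be illustrated with a small in-memory cache. The TokenCache class, its fetch callback, and the injectable clock are hypothetical; this sketch refreshes lazily on the next request after the TTL lapses, which is one simple way to avoid handing out a token in its last 60 seconds.

```python
import time
from typing import Callable, Dict, Tuple

SAFETY_MARGIN_SECONDS = 60


class TokenCache:
    """In-memory token cache with a TTL of expires_in minus a safety margin."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch    # returns {"access_token": ..., "expires_in": ...}
        self._clock = clock    # injectable so tests can control time
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now < cached[1]:
            return cached[0]
        # Missing or past its TTL: fetch a fresh token and cache it.
        payload = self._fetch(key)
        ttl = max(payload["expires_in"] - SAFETY_MARGIN_SECONDS, 0)
        self._entries[key] = (payload["access_token"], now + ttl)
        return payload["access_token"]


# Demonstration with a fake fetcher and a manually advanced clock.
calls = []
now = [0.0]


def fake_fetch(key: str) -> dict:
    calls.append(key)
    return {"access_token": f"token-{len(calls)}", "expires_in": 3600}


cache = TokenCache(fake_fetch, clock=lambda: now[0])
first = cache.get("spn-a")   # fetched
now[0] = 3539.0
second = cache.get("spn-a")  # served from cache
now[0] = 3540.0
third = cache.get("spn-a")   # TTL of 3540 seconds reached, fetched again
```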

Using Databricks SDK

If you're integrating directly with Databricks, the SDK handles token management automatically:

Python SDK Example
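import os

from databricks.sdk import WorkspaceClient

# Read SPN credentials from the environment rather than hardcoding them.
client = WorkspaceClient(
    host="https://your-workspace.cloud.databricks.com",
    client_id=os.environ["DATABRICKS_CLIENT_ID"],
    client_secret=os.environ["DATABRICKS_CLIENT_SECRET"],
)

# The SDK acquires and refreshes OAuth tokens internally.
# list() returns a lazy iterator, so the API is called as you iterate.
for cluster in client.clusters.list():
    print(cluster.cluster_name)

for catalog in client.catalogs.list():
    print(catalog.name)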

Organization vs User SPN Mapping

The platform supports two modes of SPN mapping, each with different tradeoffs for audit granularity and management complexity.

Organization-Level SPN (Default)

Shared SPN per Organization

Advantages
  • Simpler setup - one SPN per organization
  • Easier credential rotation
  • Lower management overhead
  • Faster onboarding for new users
Considerations
  • Audit logs show SPN name, not individual users
  • All users share the same permissions
  • Cannot revoke access for individual users in Databricks

Per-User SPN Mapping (Optional)

Individual SPN per User

Advantages
  • Per-user audit trails in Databricks
  • Individual permission granularity
  • Can revoke access for specific users
  • Better compliance for regulated industries
Considerations
  • More SPNs to manage in Databricks
  • Manual SPN creation per user (currently)
  • More complex credential rotation
  • Higher setup overhead for new users

Choosing the Right Mode

Use Organization-Level SPN when:

  • Audit at the organization level is sufficient
  • Users have equivalent data access needs
  • Rapid user onboarding is a priority
  • Management simplicity is preferred

Use Per-User SPN Mapping when:

  • Per-user audit trails are required
  • Compliance requires individual accountability
  • Users need different permission levels
  • You need to revoke individual access quickly

Security Model

The SSO-Mapped SPN architecture protects user sessions and SPN credentials with several layers of security, one at each trust boundary.

Security Layers

Trust Boundary 1: SSO Provider

User authentication is delegated to a trusted SSO provider (Okta, Azure AD, etc.). The platform never sees user passwords—only OIDC tokens after successful authentication.

Trust Boundary 2: FireFly Platform

  • Session Cookies: The HttpOnly, Secure, and SameSite=Lax attributes mitigate session theft via XSS, interception over plain HTTP, and CSRF
  • Encryption at Rest: SPN credentials encrypted with AES-256-GCM
  • Key Management: Encryption key in environment variables, never in code
  • Token Caching: In-memory only, never persisted

Trust Boundary 3: Databricks

Databricks validates SPN credentials and issues short-lived access tokens (1 hour). The platform automatically refreshes tokens before expiry without user interaction.

Credential Storage

SPN credentials (client_id and client_secret) are encrypted with AES-256-GCM before they are stored:

Encryption Details

  • Algorithm: AES-256-GCM (Galois/Counter Mode)
  • Key Size: 256 bits (32 bytes)
  • IV: Unique 12-byte initialization vector per encryption
  • Auth Tag: 16-byte authentication tag for integrity verification
  • Key Storage: Environment variable (ENCRYPTION_KEY)
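The sizes above can be enforced in code around whatever AES-GCM implementation is used. The sketch below covers only key loading, IV generation, and record layout; the cipher call itself is left out because Python's standard library has no AES-GCM (a package such as cryptography would supply it). Two things here are assumptions, not documented behavior: that ENCRYPTION_KEY is hex-encoded, and that the stored record is laid out as IV, then auth tag, then ciphertext.

```python
import secrets
from typing import Mapping, Tuple

KEY_BYTES = 32   # AES-256 key
IV_BYTES = 12    # GCM initialization vector
TAG_BYTES = 16   # GCM authentication tag


def load_key(env: Mapping[str, str]) -> bytes:
    """Read and validate the key (assumption: stored as a hex string)."""
    raw = env.get("ENCRYPTION_KEY")
    if raw is None:
        raise RuntimeError("ENCRYPTION_KEY is not set")
    key = bytes.fromhex(raw)
    if len(key) != KEY_BYTES:
        raise ValueError(f"expected a {KEY_BYTES}-byte key, got {len(key)}")
    return key


def new_iv() -> bytes:
    """Generate a fresh random IV; one must never be reused with the same key."""
    return secrets.token_bytes(IV_BYTES)


def pack(iv: bytes, tag: bytes, ciphertext: bytes) -> bytes:
    """Assumed storage layout: IV || auth tag || ciphertext."""
    if len(iv) != IV_BYTES or len(tag) != TAG_BYTES:
        raise ValueError("unexpected IV or tag length")
    return iv + tag + ciphertext


def unpack(blob: bytes) -> Tuple[bytes, bytes, bytes]:
    """Split a stored record back into (iv, tag, ciphertext)."""
    if len(blob) < IV_BYTES + TAG_BYTES:
        raise ValueError("record too short")
    return (
        blob[:IV_BYTES],
        blob[IV_BYTES:IV_BYTES + TAG_BYTES],
        blob[IV_BYTES + TAG_BYTES:],
    )


key = load_key({"ENCRYPTION_KEY": "00" * KEY_BYTES})  # placeholder key for the demo
iv = new_iv()
record = pack(iv, b"t" * TAG_BYTES, b"ciphertext-bytes")
```

Storing the IV and tag next to the ciphertext is safe because neither is secret; only the key must stay out of the database.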

Security Best Practices

Do

  • Rotate SPN secrets periodically
  • Use separate SPNs per organization
  • Monitor for unusual API patterns
  • Enable audit logging in Databricks
  • Use HTTPS everywhere
  • Keep dependencies updated

Don't

  • Log SPN credentials or tokens
  • Store credentials in code
  • Send tokens to client-side JavaScript
  • Share SPNs across organizations
  • Skip encryption for "convenience"
  • Ignore failed authentication attempts

Databricks Service Principal Setup

Setting up Service Principals in Databricks requires account admin privileges. The following diagram shows the complete setup process, including both manual Databricks steps and automated FireFly configuration.

Detailed Setup Steps

Step 1: Create Service Principal in Databricks

  1. Log in to Databricks Account Console (accounts.cloud.databricks.com)
  2. Navigate to User Management > Service Principals
  3. Click "Add service principal"
  4. Enter a descriptive name (e.g., "firefly-org-acme")
  5. Click Create

Step 2: Generate OAuth Secret

  1. Select the newly created Service Principal
  2. Go to the "OAuth secrets" tab
  3. Click "Generate a secret"
  4. Copy the Client ID and Secret immediately—the secret is shown only once

Important

The client secret is only displayed once. Store it securely before closing the dialog. If lost, you must generate a new secret.

Step 3: Assign Workspace Access

  1. In Account Console, go to Workspaces
  2. Select the target workspace
  3. Go to Permissions tab
  4. Add the Service Principal with appropriate role (User or Admin)

Step 4: Configure Unity Catalog Permissions

  1. Create or select an account-level group
  2. Add the Service Principal to the group
  3. Grant the group permissions on catalogs:
    • USE CATALOG
    • USE SCHEMA (required alongside USE CATALOG to reach objects inside a schema)
    • SELECT (for read access)
    • MODIFY (for write access)
    • CREATE SCHEMA (if needed)
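The grants in step 3 can also be issued as SQL. The helper below only generates the statements; catalog_grants and quote_identifier are hypothetical names, and the catalog and group names are placeholders. USE SCHEMA is included in the read-only example because Unity Catalog requires it, together with USE CATALOG, before SELECT on a table takes effect.

```python
from typing import Iterable, List

ALLOWED_PRIVILEGES = {"USE CATALOG", "USE SCHEMA", "SELECT", "MODIFY", "CREATE SCHEMA"}


def quote_identifier(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def catalog_grants(catalog: str, group: str, privileges: Iterable[str]) -> List[str]:
    """Build one GRANT ... ON CATALOG statement per privilege."""
    statements = []
    for privilege in privileges:
        if privilege not in ALLOWED_PRIVILEGES:
            raise ValueError(f"unsupported privilege: {privilege}")
        statements.append(
            f"GRANT {privilege} ON CATALOG {quote_identifier(catalog)} "
            f"TO {quote_identifier(group)};"
        )
    return statements


# Read-only access for a placeholder group on a placeholder catalog.
read_only = catalog_grants(
    "main", "firefly-org-acme-readers", ["USE CATALOG", "USE SCHEMA", "SELECT"]
)
for statement in read_only:
    print(statement)
```

Granting at the catalog level lets the privileges inherit to every schema and table inside it, which suits a one-catalog-per-tenant layout; narrower grants are possible at the schema or table level.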

Step 5: Configure in FireFly (Automated)

  1. Navigate to Settings > Bring Your Own Data in FireFly
  2. Click "Add Service Principal"
  3. Enter the Client ID and Client Secret
  4. Map the workspace URL to this SPN
  5. Click Validate to test the connection
  6. Configure storage settings (group, catalog)

API Reference

Key API endpoints for managing SPN credentials and workspace mappings:

GET /api/sso-spn/byod/databricks/spns - List configured SPNs
POST /api/sso-spn/byod/databricks/spns - Add new SPN credentials
GET /api/sso-spn/byod/databricks/workspaces - List workspace mappings
POST /api/sso-spn/byod/databricks/workspaces - Map workspace to SPN
POST /api/sso-spn/byod/databricks/workspaces/validate - Test SPN connection

Authentication Strategy Comparison

FireFly supports multiple authentication strategies. Understanding when to use each helps you choose the right approach for your use case.

Detailed Comparison

Aspect | SSO-Mapped SPN | Databricks Identity | Custom Federation
User Authentication | SSO (Okta/OIDC) | Databricks OAuth | Your IDP
API Authentication | Service Principal | User's OAuth token | Federated tokens
Databricks Account Required | No | Yes | Yes (SCIM-synced)
Audit Granularity | SPN-level (or per-user if mapped) | Per-user | Per-user
Setup Complexity | Medium | Low | High
Best For | Multi-tenant SaaS apps | Single-tenant, direct access | Enterprise SSO integration

Troubleshooting

Common issues and their solutions when implementing SSO-Mapped SPN authentication.

401 Unauthorized from Databricks

The SPN token request is failing.

  • Verify Client ID and Secret are correct (no extra spaces)
  • Check the SPN is assigned to the workspace
  • Ensure the secret hasn't expired or been rotated
  • Verify the workspace URL is correct
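Two of the checks above, stray whitespace in credentials and a malformed workspace URL, can be caught before any token request is made. The function names below are hypothetical, and the rule that a workspace URL is reduced to scheme plus host is an assumption made for this sketch.

```python
from typing import Tuple
from urllib.parse import urlsplit


def normalize_credentials(client_id: str, client_secret: str) -> Tuple[str, str]:
    """Strip whitespace that often sneaks in when credentials are pasted."""
    return client_id.strip(), client_secret.strip()


def normalize_workspace_url(raw: str) -> str:
    """Reduce a pasted workspace URL to https://host, rejecting anything else."""
    parts = urlsplit(raw.strip())
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError(f"expected an https:// workspace URL, got {raw!r}")
    # Drop any path, query string, or fragment copied from the browser.
    return f"https://{parts.netloc}"


client_id, client_secret = normalize_credentials(" my-client-id\n", "my-secret ")
workspace = normalize_workspace_url(
    " https://your-workspace.cloud.databricks.com/?o=1234567890 "
)
```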

403 Forbidden on API Calls

Token is valid but permissions are insufficient.

  • Check Unity Catalog permissions for the SPN's group
  • Verify the SPN has the correct workspace role
  • Ensure the target resources exist and are accessible

Session Not Found After SSO

User authenticates but session cookie isn't set.

  • Check browser cookie settings allow the domain
  • Verify HTTPS is being used (Secure cookie requires HTTPS)
  • Check for SameSite issues with cross-origin requests

No Organizations Available

User logs in but sees no organizations to select.

  • Verify the user has been added as a member to an organization
  • Check the member record exists in the database
  • Ensure the organization has SSO enabled

Key Takeaways

The SSO-Mapped SPN pattern provides a powerful way to build multi-tenant applications on Databricks without requiring users to have Databricks accounts. Here are the key points to remember:

  1. Separation of Concerns: User identity (SSO) and API access (SPN) are completely decoupled
  2. Multi-Tenant by Design: Each organization has isolated SPN credentials and workspace mappings
  3. Flexible Audit Options: Choose between organization-level or per-user SPN mapping based on compliance needs
  4. Security First: Credentials encrypted at rest, tokens never sent to clients, secure session management
  5. Standard OAuth 2.0: Uses industry-standard M2M (client_credentials) flow for SPN authentication
