SSO-Mapped Service Principal Authentication
A comprehensive guide to the SSO-Mapped SPN authentication pattern, where users authenticate via SSO (Okta/OIDC) while Databricks API calls are made using Service Principal credentials. This pattern enables multi-tenant applications without requiring users to have Databricks accounts.
Overview
The SSO-Mapped SPN authentication pattern decouples user identity from Databricks API access. Users authenticate through your organization's Single Sign-On (SSO) provider, while all Databricks API calls are made using Service Principal (SPN) credentials stored and managed by the platform.
This architecture is ideal for building multi-tenant SaaS applications on Databricks where end users don't need—and shouldn't have—direct Databricks accounts.
Key Benefits
- No Databricks Accounts Required: End users authenticate via SSO only—no Databricks account provisioning needed
- Simplified Identity Management: Manage users in your SSO provider, not in Databricks
- Multi-Tenant Ready: Each organization gets isolated SPN credentials for data separation
- Centralized Access Control: Platform manages all Databricks API access through controlled SPNs
- Flexible Audit Options: Use organization-level or per-user SPN mapping for different audit granularity
When to Use This Pattern
This pattern is best suited for:
- Multi-tenant SaaS applications built on Databricks
- Embedded analytics platforms
- Customer-facing data products
- Applications where users shouldn't see Databricks UI
- Scenarios requiring abstracted data access layers
Architecture Overview
The following diagram shows the high-level architecture of the SSO-Mapped SPN pattern. Notice how user authentication (SSO) and API authentication (SPN) are completely separate concerns.
Architecture Layers
1. Authentication Layer
Users authenticate via SSO (Okta, Azure AD, or any OIDC provider). This establishes their identity and creates a session in the platform. No Databricks credentials are involved at this stage.
2. Identity Mapping Layer
After SSO authentication, the user selects an organization. The platform resolves the appropriate Service Principal credentials for that organization—either a shared organization-level SPN or a per-user SPN mapping.
3. API Proxy Layer
All Databricks API calls are proxied through the platform. The proxy obtains OAuth tokens using SPN credentials, caches them for performance, and makes authenticated API calls on behalf of users.
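To make the proxy layer concrete, here is a minimal sketch of a proxy endpoint, assuming a FastAPI application and hypothetical helpers (get_workspace_url, get_spn_access_token) that wrap the credential resolution and token caching described later in this guide. The route shape and names are illustrative, not the actual FireFly API.

```python
import httpx
from fastapi import FastAPI, HTTPException, Request

app = FastAPI()

@app.get("/api/databricks/{path:path}")
async def proxy_databricks(path: str, request: Request):
    """Forward a user's request to Databricks using the active organization's SPN token."""
    session = request.state.session  # populated by SSO session middleware (assumed)
    if session.active_organization_id is None:
        raise HTTPException(status_code=400, detail="No organization selected")

    # Hypothetical helpers: resolve the workspace URL and a cached SPN access token
    # for the active organization (credential resolution and caching are sketched later).
    workspace_url = get_workspace_url(session.active_organization_id)
    token = get_spn_access_token(session.user_id, session.active_organization_id)

    # Call Databricks on behalf of the user and relay the response
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            f"{workspace_url}/api/{path}",  # illustrative path passthrough
            headers={"Authorization": f"Bearer {token}"},
        )
    return resp.json()
```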
Identity Model
Understanding the data model is crucial for implementing this pattern correctly. The following entity-relationship diagram shows how users, organizations, sessions, and SPN credentials are related.
Key Entities
user
Core user identity from SSO authentication. The accountIdUserIdMapping field stores SCIM IDs for users who are also provisioned in Databricks accounts.
session
Tracks authenticated user sessions. The critical field is activeOrganizationId—this determines which organization's SPN credentials are used for API calls.
byodDatabricksSpns
Organization-level Service Principal credentials. Each organization has one or more SPNs configured. Credentials are encrypted at rest using AES-256-GCM.
userSpns
Optional per-user SPN mappings. When present, the user's individual SPN is used instead of the organization-level SPN, enabling per-user audit trails in Databricks.
byodDatabricksWorkspaces
Maps workspace URLs to SPNs. An organization can have multiple workspaces, each potentially using a different SPN for authentication.
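To make these relationships concrete, the following sketch models the entities as Python dataclasses. Field names mirror the descriptions above; types, optionality, and class names are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class User:
    id: str
    email: str
    # SCIM IDs for users who are also provisioned in Databricks accounts
    account_id_user_id_mapping: Optional[dict] = None

@dataclass
class Session:
    id: str
    user_id: str
    # Determines which organization's SPN credentials are used for API calls
    active_organization_id: Optional[str] = None

@dataclass
class ByodDatabricksSpn:
    id: str
    organization_id: str
    client_id: str
    encrypted_client_secret: str  # AES-256-GCM ciphertext, never stored in plaintext

@dataclass
class UserSpn:
    user_id: str
    organization_id: str
    spn_id: str  # overrides the organization-level SPN when present

@dataclass
class ByodDatabricksWorkspace:
    organization_id: str
    workspace_url: str
    spn_id: str  # the SPN used to authenticate against this workspace
```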
Authentication Flow
The authentication flow has three distinct phases: SSO authentication (establishing user identity), organization selection (setting context), and Databricks API access (using SPN credentials).
Flow Phases Explained
Phase 1: SSO Authentication
User authenticates via Okta (or another OIDC provider) using standard OAuth 2.0 with PKCE. After successful authentication, a session is created in PostgreSQL and a secure HTTP-only cookie is set. At this point, the user is authenticated but has no organization context.
Phase 2: Organization Selection
User selects an organization they're a member of. The session's activeOrganizationId is updated to track this selection. This organization context determines which SPN credentials will be used for subsequent API calls.
Phase 3: Databricks API Access
When the user requests Databricks data, the platform retrieves the organization's SPN credentials, exchanges them for an OAuth access token using the M2M client_credentials flow, and makes the API call on behalf of the user.
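The Phase 3 token exchange is a standard client_credentials POST against the workspace token endpoint. Below is a minimal sketch using the requests library with placeholder credentials; it is equivalent to the curl example in the next section.

```python
import base64
import requests

def get_spn_token(workspace_url: str, client_id: str, client_secret: str) -> dict:
    """Exchange SPN credentials for a workspace-level OAuth access token (M2M flow)."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    resp = requests.post(
        f"{workspace_url}/oidc/v1/token",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Authorization": f"Basic {basic}",
        },
        data={"grant_type": "client_credentials", "scope": "all-apis"},
    )
    resp.raise_for_status()
    # {"access_token": "...", "token_type": "Bearer", "expires_in": 3600}
    return resp.json()
```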
Session Cookie Security
- HttpOnly: Prevents JavaScript access, blocking XSS attacks
- Secure: Only transmitted over HTTPS
- SameSite=Lax: Prevents CSRF while allowing navigation
- 30-day expiry: Balances security with user convenience
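A minimal sketch of setting a cookie with these attributes, assuming a FastAPI/Starlette Response object; the cookie name and helper are illustrative.

```python
from fastapi import Response

SESSION_COOKIE_MAX_AGE = 30 * 24 * 60 * 60  # 30 days, in seconds

def set_session_cookie(response: Response, session_id: str) -> None:
    """Attach the session cookie with the security attributes described above."""
    response.set_cookie(
        key="session_id",   # cookie name is illustrative
        value=session_id,
        httponly=True,      # HttpOnly: no JavaScript access
        secure=True,        # Secure: transmitted over HTTPS only
        samesite="lax",     # SameSite=Lax: blocks CSRF while allowing navigation
        max_age=SESSION_COOKIE_MAX_AGE,
    )
```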
Service Principal OAuth (M2M)
Service Principal authentication uses the OAuth 2.0 client_credentials grant type, also known as Machine-to-Machine (M2M) authentication. This is the industry standard for server-to-server authentication.
Databricks OAuth Endpoints
Databricks provides two OAuth token endpoints depending on the scope of access required:
Workspace-Level Token
For accessing workspace-specific resources (notebooks, SQL warehouses, catalogs, etc.)
```
POST https://{workspace-url}/oidc/v1/token
```

Account-Level Token
For accessing account-wide resources (SCIM APIs, workspace management, etc.)

```
POST https://accounts.cloud.databricks.com/oidc/accounts/{account-id}/v1/token
```

Token Request Format
```bash
# Workspace-level token request
curl --request POST \
  --url "https://your-workspace.cloud.databricks.com/oidc/v1/token" \
  --header "Content-Type: application/x-www-form-urlencoded" \
  --header "Authorization: Basic $(echo -n 'CLIENT_ID:CLIENT_SECRET' | base64)" \
  --data "grant_type=client_credentials&scope=all-apis"

# Response:
# {
#   "access_token": "eyJraWQiOiJkYTA4...",
#   "token_type": "Bearer",
#   "expires_in": 3600
# }
```

Token Caching
Access tokens are cached in memory with a TTL of expires_in - 60 seconds to prevent using expired tokens. The platform automatically refreshes tokens before they expire.
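A minimal in-memory cache implementing this TTL might look like the sketch below, keyed by SPN client ID and reusing the get_spn_token helper sketched in Phase 3 above; the platform's actual cache may differ.

```python
import time

# In-memory token cache: client_id -> (access_token, expiry_timestamp)
_token_cache: dict[str, tuple[str, float]] = {}

def get_cached_token(workspace_url: str, client_id: str, client_secret: str) -> str:
    """Return a cached token, refreshing it 60 seconds before it would expire."""
    cached = _token_cache.get(client_id)
    if cached and time.time() < cached[1]:
        return cached[0]

    token = get_spn_token(workspace_url, client_id, client_secret)  # sketched in Phase 3
    # Cache with TTL = expires_in - 60 so an expired token is never handed out
    expiry = time.time() + token["expires_in"] - 60
    _token_cache[client_id] = (token["access_token"], expiry)
    return token["access_token"]
```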
Using Databricks SDK
If you're integrating directly with Databricks, the SDK handles token management automatically:
```python
from databricks.sdk import WorkspaceClient

# SDK handles token acquisition and refresh automatically
client = WorkspaceClient(
    host="https://your-workspace.cloud.databricks.com",
    client_id="your-spn-client-id",
    client_secret="your-spn-client-secret"
)

# Make API calls - tokens are managed internally
clusters = client.clusters.list()
catalogs = client.catalogs.list()
```

Organization vs User SPN Mapping
The platform supports two modes of SPN mapping, each with different tradeoffs for audit granularity and management complexity.
Organization-Level SPN (Default)
Shared SPN per Organization
Advantages
- Simpler setup - one SPN per organization
- Easier credential rotation
- Lower management overhead
- Faster onboarding for new users
Considerations
- Audit logs show SPN name, not individual users
- All users share the same permissions
- Cannot revoke access for individual users in Databricks
Per-User SPN Mapping (Optional)
Individual SPN per User
Advantages
- Per-user audit trails in Databricks
- Individual permission granularity
- Can revoke access for specific users
- Better compliance for regulated industries
Considerations
- More SPNs to manage in Databricks
- Manual SPN creation per user (currently)
- More complex credential rotation
- Higher setup overhead for new users
Choosing the Right Mode
Use Organization-Level SPN when:
- Audit at the organization level is sufficient
- Users have equivalent data access needs
- Rapid user onboarding is a priority
- Management simplicity is preferred
Use Per-User SPN Mapping when:
- Per-user audit trails are required
- Compliance requires individual accountability
- Users need different permission levels
- You need to revoke individual access quickly
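Whichever mode you choose, credential lookup follows a single precedence rule: a per-user SPN mapping, when present, overrides the organization-level SPN. Below is a sketch, assuming the dataclasses from the Identity Model section and hypothetical lookup helpers.

```python
from typing import Optional

def resolve_spn_credentials(user_id: str, organization_id: str) -> ByodDatabricksSpn:
    """Pick the SPN used for Databricks API calls on behalf of this user."""
    # 1. A per-user mapping takes precedence when configured (per-user audit trail)
    user_spn: Optional[UserSpn] = find_user_spn(user_id, organization_id)  # hypothetical lookup
    if user_spn is not None:
        return load_spn(user_spn.spn_id)                                   # hypothetical lookup

    # 2. Fall back to the shared organization-level SPN (default mode)
    return load_organization_spn(organization_id)                          # hypothetical lookup
```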
Security Model
Security is paramount when dealing with authentication and API credentials. The SSO-Mapped SPN architecture implements multiple layers of security to protect user sessions and SPN credentials.
Security Layers
Trust Boundary 1: SSO Provider
User authentication is delegated to a trusted SSO provider (Okta, Azure AD, etc.). The platform never sees user passwords—only OIDC tokens after successful authentication.
Trust Boundary 2: FireFly Platform
- Session Cookies: HttpOnly, Secure, SameSite=Lax prevent common web attacks
- Encryption at Rest: SPN credentials encrypted with AES-256-GCM
- Key Management: Encryption key in environment variables, never in code
- Token Caching: In-memory only, never persisted
Trust Boundary 3: Databricks
Databricks validates SPN credentials and issues short-lived access tokens (1 hour). The platform automatically refreshes tokens before expiry without user interaction.
Credential Storage
SPN credentials (client_id and client_secret) are encrypted before storage using AES-256-GCM encryption:
Encryption Details
- Algorithm: AES-256-GCM (Galois/Counter Mode)
- Key Size: 256 bits (32 bytes)
- IV: Unique 12-byte initialization vector per encryption
- Auth Tag: 16-byte authentication tag for integrity verification
- Key Storage: Environment variable (ENCRYPTION_KEY)
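A minimal sketch of this scheme using the cryptography package: a unique 12-byte IV per encryption, the 16-byte GCM auth tag appended by the library, and the key read from ENCRYPTION_KEY (assumed hex-encoded here).

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# 256-bit key from the environment; hex encoding is an assumption for this sketch
_KEY = bytes.fromhex(os.environ["ENCRYPTION_KEY"])

def encrypt_secret(plaintext: str) -> bytes:
    """Encrypt an SPN secret: 12-byte IV || ciphertext || 16-byte GCM auth tag."""
    iv = os.urandom(12)  # unique IV per encryption
    ciphertext = AESGCM(_KEY).encrypt(iv, plaintext.encode(), None)  # tag appended by AESGCM
    return iv + ciphertext

def decrypt_secret(blob: bytes) -> str:
    """Reverse of encrypt_secret; raises if the auth tag check fails."""
    iv, ciphertext = blob[:12], blob[12:]
    return AESGCM(_KEY).decrypt(iv, ciphertext, None).decode()
```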
Security Best Practices
Do
- Rotate SPN secrets periodically
- Use separate SPNs per organization
- Monitor for unusual API patterns
- Enable audit logging in Databricks
- Use HTTPS everywhere
- Keep dependencies updated
Don't
- Log SPN credentials or tokens
- Store credentials in code
- Send tokens to client-side JavaScript
- Share SPNs across organizations
- Skip encryption for "convenience"
- Ignore failed authentication attempts
Databricks Service Principal Setup
Setting up Service Principals in Databricks requires account admin privileges. The following diagram shows the complete setup process, including both manual Databricks steps and automated FireFly configuration.
Detailed Setup Steps
Step 1: Create Service Principal in Databricks
- Log in to Databricks Account Console (accounts.cloud.databricks.com)
- Navigate to User Management > Service Principals
- Click "Add service principal"
- Enter a descriptive name (e.g., "firefly-org-acme")
- Click Create
Step 2: Generate OAuth Secret
- Select the newly created Service Principal
- Go to the "OAuth secrets" tab
- Click "Generate a secret"
- Copy the Client ID and Secret immediately—the secret is shown only once
Important
The client secret is only displayed once. Store it securely before closing the dialog. If lost, you must generate a new secret.
Step 3: Assign Workspace Access
- In Account Console, go to Workspaces
- Select the target workspace
- Go to Permissions tab
- Add the Service Principal with appropriate role (User or Admin)
Step 4: Configure Unity Catalog Permissions
- Create or select an account-level group
- Add the Service Principal to the group
- Grant the group permissions on catalogs (a programmatic sketch follows this list):
- USE CATALOG
- SELECT (for read access)
- MODIFY (for write access)
- CREATE SCHEMA (if needed)
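These grants can also be applied programmatically. The sketch below uses the Databricks Python SDK's Unity Catalog grants API with placeholder group and catalog names; it assumes the caller has sufficient privileges (account admin or catalog owner) and that your SDK version exposes the API as shown.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import PermissionsChange, Privilege, SecurableType

client = WorkspaceClient(
    host="https://your-workspace.cloud.databricks.com",
    client_id="your-admin-client-id",       # an admin-capable principal, not the new SPN
    client_secret="your-admin-client-secret",
)

# Grant the group the privileges listed above on the target catalog
client.grants.update(
    securable_type=SecurableType.CATALOG,
    full_name="your_catalog",                # placeholder catalog name
    changes=[
        PermissionsChange(
            principal="firefly-org-acme-group",  # placeholder group name
            add=[
                Privilege.USE_CATALOG,
                Privilege.SELECT,        # read access
                Privilege.MODIFY,        # write access
                Privilege.CREATE_SCHEMA, # if needed
            ],
        )
    ],
)
```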
Step 5: Configure in FireFly (Automated)
- Navigate to Settings > Bring Your Own Data in FireFly
- Click "Add Service Principal"
- Enter the Client ID and Client Secret
- Map the workspace URL to this SPN
- Click Validate to test the connection
- Configure storage settings (group, catalog)
API Reference
Key API endpoints for managing SPN credentials and workspace mappings:
Authentication Strategy Comparison
FireFly supports multiple authentication strategies. Understanding when to use each helps you choose the right approach for your use case.
Detailed Comparison
| Aspect | SSO-Mapped SPN | Databricks Identity | Custom Federation |
|---|---|---|---|
| User Authentication | SSO (Okta/OIDC) | Databricks OAuth | Your IDP |
| API Authentication | Service Principal | User's OAuth token | Federated tokens |
| Databricks Account Required | No | Yes | Yes (SCIM-synced) |
| Audit Granularity | SPN-level (or per-user if mapped) | Per-user | Per-user |
| Setup Complexity | Medium | Low | High |
| Best For | Multi-tenant SaaS apps | Single-tenant, direct access | Enterprise SSO integration |
Troubleshooting
Common issues and their solutions when implementing SSO-Mapped SPN authentication.
401 Unauthorized from Databricks
The SPN token request is failing.
- Verify Client ID and Secret are correct (no extra spaces)
- Check the SPN is assigned to the workspace
- Ensure the secret hasn't expired or been rotated
- Verify the workspace URL is correct
403 Forbidden on API Calls
Token is valid but permissions are insufficient.
- Check Unity Catalog permissions for the SPN's group
- Verify the SPN has the correct workspace role
- Ensure the target resources exist and are accessible
Session Not Found After SSO
User authenticates but session cookie isn't set.
- Check browser cookie settings allow the domain
- Verify HTTPS is being used (Secure cookie requires HTTPS)
- Check for SameSite issues with cross-origin requests
No Organizations Available
User logs in but sees no organizations to select.
- Verify the user has been added as a member to an organization
- Check the member record exists in the database
- Ensure the organization has SSO enabled
Key Takeaways
The SSO-Mapped SPN pattern provides a powerful way to build multi-tenant applications on Databricks without requiring users to have Databricks accounts. Here are the key points to remember:
Separation of Concerns
User identity (SSO) and API access (SPN) are completely decoupled
Multi-Tenant by Design
Each organization has isolated SPN credentials and workspace mappings
Flexible Audit Options
Choose between organization-level or per-user SPN mapping based on compliance needs
Security First
Credentials encrypted at rest, tokens never sent to clients, secure session management
Standard OAuth 2.0
Uses industry-standard M2M (client_credentials) flow for SPN authentication
Related Documentation
Learn more about how organizations and users are managed in the FireFly platform.