Scalability Architecture
A comprehensive guide to how FireFly Analytics scales to handle growing workloads, from application tier auto-scaling to Databricks Serverless SQL and isolated compute environments.
Overview
FireFly Analytics is designed to scale seamlessly from a handful of users to thousands, leveraging cloud-native patterns and Databricks' elastic compute capabilities. The platform scales at multiple levels: application tier, database tier, and compute tier.
This document covers the scalability architecture, including auto-scaling patterns, serverless compute, workspace isolation, and Databricks Apps that enable high performance at any scale.
Scalability Highlights
- Application auto-scaling: Next.js and Go proxy scale horizontally based on demand
- Serverless SQL: Databricks warehouses scale from 0 to N clusters automatically (nodes per cluster are fixed)
- Isolated compute: Databricks Apps run in containerized, isolated environments (2 vCPU, 6GB RAM)
- Workspace per organization: Each organization is configured with its own Databricks workspace by default
- Pay-per-use: Serverless architecture means you only pay for what you use
Scaling Architecture Overview
The following diagram shows how FireFly scales across all tiers, from user traffic through application processing to Databricks compute:
Application Tier
Next.js and Go proxy auto-scale based on CPU, memory, and request rate metrics.
Data Tier
PostgreSQL scales horizontally with read replicas for session and configuration data.
Databricks Tier
Serverless SQL warehouses scale horizontally by adding clusters on demand. Containerized apps provide isolation between users for the code and notebook editors.
Application Tier Scaling
The application tier consists of Next.js API routes and Go proxy servers, both designed for horizontal scaling with zero shared state.
Scaling Architecture
Next.js Application Scaling
Next.js can be deployed in multiple modes, each with different scaling characteristics:
Serverless Mode
Deploy as serverless functions (Vercel, AWS Lambda, Cloud Functions)
- Auto-scales to zero when idle
- Instant scaling on traffic spikes
- Pay-per-invocation pricing
- Cold start latency (~100-500ms)
Container Mode
Deploy as containers (Kubernetes, ECS, Cloud Run)
- Horizontal Pod Autoscaler (HPA)
- Predictable performance
- No cold starts with warm pods
- More control over resources
Stateless Design
Next.js instances are completely stateless, enabling seamless horizontal scaling:
- No in-memory sessions - all sessions stored in PostgreSQL
- No shared state between instances
- Any instance can handle any request
- Load balancer distributes traffic evenly
Go Proxy Scaling
The Go proxy is optimized for high concurrency and low resource usage:
Resource Efficiency
- Binary size: ~15MB
- Memory: ~50MB per instance
- Startup time: <1 second
Concurrency
- 10,000+ concurrent connections
- Goroutines for parallelism
- Efficient WebSocket handling
Performance
- Token decrypt: <1ms
- Request latency: <5ms overhead
- Low GC pause times (<1ms)
An example HorizontalPodAutoscaler configuration for the Go proxy:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: go-proxy-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: go-proxy
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
Databricks Serverless SQL
Databricks Serverless SQL Warehouses provide elastic compute that automatically scales based on query workload. This is the recommended compute option for FireFly Analytics.
Serverless SQL Architecture
Key Features
Cluster vs Node Scaling
Serverless SQL scales by adding more clusters, not by adding nodes to existing clusters. Each warehouse size has a fixed number of nodes per cluster. When query load increases, Databricks spins up additional clusters to handle concurrent queries.
Cluster Auto-Scaling
Warehouses scale clusters automatically based on query load:
- Scale to zero: No charges when idle (0 clusters)
- Instant scale-up: ~5 second cold start for new cluster
- Parallel queries: Multiple clusters for concurrent users
- Workload isolation: Heavy queries get dedicated clusters
Cost Optimization
Pay only for the compute you actually use:
- Per-second billing: Charges stop when queries complete
- No idle costs: Zero charges when scaled to zero
- Shared infrastructure: Databricks manages underlying clusters
- Predictable performance: SLA-backed query latency
Warehouse Sizing
Choose the right warehouse size based on your workload. Larger sizes provide more nodes per cluster for faster individual queries, while the warehouse's cluster count range governs how many queries can run concurrently.
Query Performance Tips
Use LIMIT for Previews
When previewing data, always use LIMIT to avoid scanning entire tables. FireFly automatically applies LIMIT 1000 for data previews.
Leverage Delta Lake Caching
Delta Lake automatically caches frequently accessed data. Repeated queries on the same tables benefit from cached results.
Filter Early
Apply WHERE clauses as early as possible in your queries. Predicates on partition columns are especially efficient.
Select Only Needed Columns
Avoid SELECT * when possible. Selecting only needed columns reduces data scanned and improves query performance.
Workspace Scaling & Isolation
By default, FireFly is designed to configure each organization with its own dedicated Databricks workspace. This provides the strongest isolation guarantees and simplifies access control management.
Default: One Workspace Per Organization
FireFly's default architecture maps each organization to its own Databricks workspace. This design provides:
- Complete isolation: No risk of data leakage between organizations
- Simple auditing: All activity in a workspace belongs to one org
- Independent scaling: Each org's compute is fully separate
- Clear billing: Costs are naturally separated by workspace
Multi-Org Per Workspace (Advanced)
FireFly can be modified to support multiple organizations sharing the same workspace, but this requires significant additional safeguards:
- Rigorous Unity Catalog permissions: Catalog-level grants must be carefully managed per SPN
- Enhanced auditing: Additional logging to track which org accessed what data
- Code review practices: All changes must be reviewed for potential cross-org data leakage
- SPN isolation: Each org must still have its own SPN with strict permission boundaries
- Regular security audits: Periodic reviews to ensure no permission drift
This configuration is not recommended unless you have specific requirements that necessitate shared workspace infrastructure.
Multi-Workspace Architecture
Common Workspace Patterns
Environment Separation
Separate workspaces for different environments:
- Production: Business-critical data access
- Staging: Pre-production testing
- Development: Experimentation and feature dev
Geographic Distribution
Workspaces in different regions for:
- Data residency: GDPR, data sovereignty
- Latency: Users closer to data
- Disaster recovery: Cross-region redundancy
Team Isolation
Separate workspaces for different teams:
- Cost allocation: Chargeback by workspace
- Access control: Team-specific permissions
- Resource limits: Per-workspace quotas
Workload Isolation
Separate workspaces for different workloads:
- ETL: Heavy batch processing
- BI/Analytics: Interactive queries
- ML/AI: GPU-intensive workloads
Workspace Isolation Guarantees
Each Databricks workspace provides strong isolation:
- Network isolation: Separate VPC/VNet per workspace (optional)
- Compute isolation: Separate clusters and warehouses
- Storage isolation: Separate managed storage
- Identity isolation: Separate user/SPN namespaces
Databricks Apps Isolation
Databricks Apps (like the VSCode code editor) run in isolated containers, providing secure compute for interactive workloads. FireFly embeds these apps using the Go proxy for transparent authentication.
Fixed Container Resources
All Databricks Apps run with fixed, standardized resources: 2 vCPU and 6 GB of RAM per app instance.
Custom resource configurations are not currently supported by Databricks Apps.
Container Isolation Architecture
Isolation Features
Process Isolation
Each app instance runs in its own container with:
- Separate PID namespace (processes isolated)
- Fixed resource limits (2 vCPU, 6GB RAM)
- No access to other containers or host system
Network Isolation
Network access is strictly controlled:
- Sandboxed network namespace
- Outbound access only to approved endpoints
- No inbound connections except through proxy
Ephemeral Storage
Storage is ephemeral - all data is lost on container restart:
- Filesystem cleared on every restart or timeout
- No persistent storage available
- Files read from Unity Catalog volumes (read-only)
- Local changes exist only during active session
Data Loss Warning
Important: Any files created or modified within a Databricks App are lost when the container restarts. Users should save important work to Unity Catalog volumes or external storage before ending their session.
Future Improvement: Volume Sync
A planned enhancement is to implement automatic file synchronization with Databricks Volumes:
- Auto-backup workspace files to user's Unity Catalog volume
- Restore files on container startup
- Periodic sync during active sessions
- Versioned backups for recovery
Embedded Apps Use Cases
Code Editor
VSCode-based editor for notebooks, Python, SQL with full IDE features (IntelliSense, debugging, Git).
Notebook Viewer
Read-only notebook rendering for viewing outputs and visualizations without execution capability.
Custom Apps
Build custom Databricks Apps for specialized workflows (data quality tools, ML experiments, dashboards).
Performance Monitoring
Effective scaling requires visibility into system performance. FireFly recommends monitoring these key metrics:
Application Metrics
- Request rate (req/sec)
- Response time (P50, P95, P99)
- Error rate (4xx, 5xx)
- Instance count (scaling)
- CPU/memory utilization
Database Metrics
- Query duration
- Connection pool usage
- Active connections
- Rows read/written
- Replication lag
Databricks Metrics
- Warehouse uptime/utilization
- Query queue depth
- Query duration by type
- SPN token refresh rate
- API error rate
User Experience
- Time to first byte (TTFB)
- Largest contentful paint (LCP)
- Page load time
- Query completion time
- Error page views
Conclusion
FireFly Analytics is designed for scalability at every layer. By combining auto-scaling application infrastructure, Databricks Serverless SQL, and workspace-per-organization isolation, the platform can grow seamlessly from small teams to enterprise deployments.
Horizontal Scaling
Stateless Next.js and Go proxy instances scale horizontally based on demand with zero manual intervention.
Elastic Compute
Databricks Serverless SQL scales from zero to handle any query workload with automatic resource management.
Cost Efficiency
Pay-per-use pricing and scale-to-zero capabilities ensure you only pay for what you actually use.