Scalability Architecture
A comprehensive guide to how FireFly Analytics scales to handle growing workloads, from application tier auto-scaling to Databricks Serverless SQL and isolated compute environments.
Overview
FireFly Analytics is designed to scale seamlessly from a handful of users to thousands, leveraging cloud-native patterns and Databricks' elastic compute capabilities. The platform scales at multiple levels: application tier, database tier, and compute tier.
This document covers the scalability architecture, including auto-scaling patterns, serverless compute, workspace isolation, and Databricks Apps that enable high performance at any scale.
Scalability Highlights
- Application auto-scaling: Next.js and Go proxy scale horizontally based on demand
- Serverless SQL: Databricks warehouses scale from 0 to N clusters automatically (nodes per cluster are fixed)
- Isolated compute: Databricks Apps run in containerized, isolated environments (2 vCPU, 6GB RAM)
- Workspace per organization: Each organization is configured with its own Databricks workspace by default
- Pay-per-use: Serverless architecture means you only pay for what you use
Scaling Architecture Overview
The following diagram shows how FireFly scales across all tiers, from user traffic through application processing to Databricks compute:
Application Tier
Next.js and Go proxy auto-scale based on CPU, memory, and request rate metrics.
Data Tier
PostgreSQL scales horizontally with read replicas for session and configuration data.
Databricks Tier
Serverless SQL warehouses scale horizontally by adding clusters on demand. Containerized apps provide isolation between users for the code and notebook editors.
Application Tier Scaling
The application tier consists of Next.js API routes and Go proxy servers, both designed for horizontal scaling with zero shared state.
Scaling Architecture
Next.js Application Scaling
Next.js can be deployed in multiple modes, each with different scaling characteristics:
Serverless Mode
Deploy as serverless functions (Vercel, AWS Lambda, Cloud Functions)
- Auto-scales to zero when idle
- Instant scaling on traffic spikes
- Pay-per-invocation pricing
- Cold start latency (~100-500ms)
Container Mode
Deploy as containers (Kubernetes, ECS, Cloud Run)
- Horizontal Pod Autoscaler (HPA)
- Predictable performance
- No cold starts with warm pods
- More control over resources
Stateless Design
Next.js instances are completely stateless, enabling seamless horizontal scaling:
- No in-memory sessions - all sessions stored in PostgreSQL
- No shared state between instances
- Any instance can handle any request
- Load balancer distributes traffic evenly
Go Proxy Scaling
The Go proxy is optimized for high concurrency and low resource usage:
Resource Efficiency
- Binary size: ~15MB
- Memory: ~50MB per instance
- Startup time: <1 second
Concurrency
- 10,000+ concurrent connections
- Goroutines for parallelism
- Efficient WebSocket handling
Performance
- Token decrypt: <1ms
- Request latency: <5ms overhead
- Low GC pause times (<1ms)
An example HorizontalPodAutoscaler configuration for the Go proxy:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: go-proxy-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: go-proxy
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
Databricks Serverless SQL
Databricks Serverless SQL Warehouses provide elastic compute that automatically scales based on query workload. This is the recommended compute option for FireFly Analytics.
Serverless SQL Architecture
Key Features
Cluster vs Node Scaling
Serverless SQL scales by adding more clusters, not by adding nodes to existing clusters. Each warehouse size has a fixed number of nodes per cluster. When query load increases, Databricks spins up additional clusters to handle concurrent queries.
Cluster Auto-Scaling
Warehouses scale clusters automatically based on query load:
- Scale to zero: No charges when idle (0 clusters)
- Instant scale-up: ~5 second cold start for new cluster
- Parallel queries: Multiple clusters for concurrent users
- Workload isolation: Heavy queries get dedicated clusters
Cost Optimization
Pay only for the compute you actually use:
- Per-second billing: Charges stop when queries complete
- No idle costs: Zero charges when scaled to zero
- Shared infrastructure: Databricks manages underlying clusters
- Predictable performance: SLA-backed query latency
Warehouse Sizing
Choose the right warehouse size based on your workload. Larger sizes provide more nodes per cluster for faster individual queries, while the warehouse's cluster count range governs how many queries can run concurrently.
Query Performance Tips
Use LIMIT for Previews
When previewing data, always use LIMIT to avoid scanning entire tables. FireFly automatically applies LIMIT 1000 for data previews.
Leverage Delta Lake Caching
Delta Lake automatically caches frequently accessed data. Repeated queries on the same tables benefit from cached results.
Filter Early
Apply WHERE clauses as early as possible in your queries. Predicates on partition columns are especially efficient.
Select Only Needed Columns
Avoid SELECT * when possible. Selecting only needed columns reduces data scanned and improves query performance.
Workspace Scaling & Isolation
By default, FireFly is designed to configure each organization with its own dedicated Databricks workspace. This provides the strongest isolation guarantees and simplifies access control management.
Default: One Workspace Per Organization
FireFly's default architecture maps each organization to its own Databricks workspace. This design provides:
- Complete isolation: No risk of data leakage between organizations
- Simple auditing: All activity in a workspace belongs to one org
- Independent scaling: Each org's compute is fully separate
- Clear billing: Costs are naturally separated by workspace
Multi-Org Per Workspace (Advanced)
FireFly can be modified to support multiple organizations sharing the same workspace, but this requires significant additional safeguards:
- Rigorous Unity Catalog permissions: Catalog-level grants must be carefully managed per SPN
- Enhanced auditing: Additional logging to track which org accessed what data
- Code review practices: All changes must be reviewed for potential cross-org data leakage
- SPN isolation: Each org must still have its own SPN with strict permission boundaries
- Regular security audits: Periodic reviews to ensure no permission drift
This configuration is not recommended unless you have specific requirements that necessitate shared workspace infrastructure.
Multi-Workspace Architecture
Common Workspace Patterns
Environment Separation
Separate workspaces for different environments:
- Production: Business-critical data access
- Staging: Pre-production testing
- Development: Experimentation and feature dev
Geographic Distribution
Workspaces in different regions for:
- Data residency: GDPR, data sovereignty
- Latency: Users closer to data
- Disaster recovery: Cross-region redundancy
Team Isolation
Separate workspaces for different teams:
- Cost allocation: Chargeback by workspace
- Access control: Team-specific permissions
- Resource limits: Per-workspace quotas
Workload Isolation
Separate workspaces for different workloads:
- ETL: Heavy batch processing
- BI/Analytics: Interactive queries
- ML/AI: GPU-intensive workloads
Workspace Isolation Guarantees
Each Databricks workspace provides strong isolation:
- Network isolation: Separate VPC/VNet per workspace (optional)
- Compute isolation: Separate clusters and warehouses
- Storage isolation: Separate managed storage
- Identity isolation: Separate user/SPN namespaces
Databricks Apps Isolation
Databricks Apps (like the VSCode code editor) run in isolated containers, providing secure compute for interactive workloads. FireFly embeds these apps using the Go proxy for transparent authentication.
Fixed Container Resources
All Databricks Apps run with fixed, standardized resources: 2 vCPU and 6 GB of RAM per app instance.
Custom resource configurations are not currently supported by Databricks Apps.
Container Isolation Architecture
Isolation Features
Process Isolation
Each app instance runs in its own container with:
- Separate PID namespace (processes isolated)
- Fixed resource limits (2 vCPU, 6GB RAM)
- No access to other containers or host system
Network Isolation
Network access is strictly controlled:
- Sandboxed network namespace
- Outbound access only to approved endpoints
- No inbound connections except through proxy
Ephemeral Storage
Storage is ephemeral - all data is lost on container restart:
- Filesystem cleared on every restart or timeout
- No persistent storage available
- Files read from Unity Catalog volumes (read-only)
- Local changes exist only during active session
Data Loss Warning
Important: Any files created or modified within a Databricks App are lost when the container restarts. Users should save important work to Unity Catalog volumes or external storage before ending their session.
Future Improvement: Volume Sync
A planned enhancement is to implement automatic file synchronization with Databricks Volumes:
- Auto-backup workspace files to user's Unity Catalog volume
- Restore files on container startup
- Periodic sync during active sessions
- Versioned backups for recovery
Embedded Apps Use Cases
Code Editor
VSCode-based editor for notebooks, Python, SQL with full IDE features (IntelliSense, debugging, Git).
Notebook Viewer
Read-only notebook rendering for viewing outputs and visualizations without execution capability.
Custom Apps
Build custom Databricks Apps for specialized workflows (data quality tools, ML experiments, dashboards).
Performance Monitoring
Effective scaling requires visibility into system performance. FireFly recommends monitoring these key metrics:
Application Metrics
- Request rate (req/sec)
- Response time (P50, P95, P99)
- Error rate (4xx, 5xx)
- Instance count (scaling)
- CPU/memory utilization
Database Metrics
- Query duration
- Connection pool usage
- Active connections
- Rows read/written
- Replication lag
Databricks Metrics
- Warehouse uptime/utilization
- Query queue depth
- Query duration by type
- SPN token refresh rate
- API error rate
User Experience
- Time to first byte (TTFB)
- Largest contentful paint (LCP)
- Page load time
- Query completion time
- Error page views
Conclusion
FireFly Analytics is designed for scalability at every layer. By combining auto-scaling application infrastructure, Databricks Serverless SQL, and workspace-per-organization isolation, the platform can grow seamlessly from small teams to enterprise deployments.
Horizontal Scaling
Stateless Next.js and Go proxy instances scale horizontally based on demand with zero manual intervention.
Elastic Compute
Databricks Serverless SQL scales from zero to handle any query workload with automatic resource management.
Cost Efficiency
Pay-per-use pricing and scale-to-zero capabilities ensure you only pay for what you actually use.