What it is
Azure Data Explorer (ADX), built on the Kusto engine developed internally at Microsoft, is an analytics service for time-series data, designed for high-volume ingestion and interactive analytical queries. Its columnar storage with aggressive compression (LZ4 + Huffman coding) is especially efficient for IoT data, where successive values of the same metric are strongly correlated in time.
ADX is not a transactional database: it is optimised for the "write once, read many times" access pattern typical of industrial telemetry. Queries are expressed in KQL (Kusto Query Language), a declarative pipe-based language with specialised primitives for time series.
Role in IN-SIGHT
- Central telemetry store: It receives vibration, temperature and door telemetry from the whole fleet, arriving from IoT Hub by way of Event Hub. Each record includes vehicle_id, pod_id, timestamp (a KQL datetime, which has 100-nanosecond resolution), metric and value.
- Health-metric computation: Scheduled queries compute the estimated RUL (Remaining Useful Life), bearing degradation indices and vibration energy percentiles per subsystem.
- Comparison with Golden Run: The cloud extended Kalman filter (EKF) queries ADX to obtain the vehicle baseline and compute the innovation (difference between predicted and measured state).
- Feeding the portal: KQL queries are the data source for the alert portal and the real-time technical dashboards via ADX's native REST API.
- Historical retention: Data stays in the "hot" tier (SSD, sub-second access) for 90 days and in the "cold" tier (Azure Blob Storage) indefinitely for audit and model retraining.
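The innovation mentioned in the Golden Run bullet can be illustrated with a minimal sketch. The function name, the sample values and the sign convention (measured minus predicted) are assumptions made for illustration; the actual EKF implementation is not described in this document.

```python
from typing import List, Sequence


def innovation(measured: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Return measured minus predicted, component by component."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    return [z - z_hat for z, z_hat in zip(measured, predicted)]


# Hypothetical values: bearing RMS and temperature for one pod.
baseline_prediction = [0.5, 40.0]  # predicted from the Golden Run baseline
measurement = [0.75, 42.5]         # latest values read from ADX
residual = innovation(measurement, baseline_prediction)
```

A residual close to zero means the vehicle behaves like its baseline; how large a residual must be to raise an alert is a design decision not covered here.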
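KQL query example: Detection of a bearing degradation trend over the last 7 days for a specific vehicle, using 1-hour averages per pod and a linear fit over each resulting series.
telemetry
| where vehicle_id == "TMB-5042"
    and metric == "bearing_rms"
    and timestamp > ago(7d)
// One series of hourly averages per pod; empty hours become nulls
| make-series avg_rms = avg(value) default = real(null)
    on timestamp from ago(7d) to now() step 1h
    by pod_id
// Interpolate empty hours, then fit a line to each series
| extend avg_rms = series_fill_linear(avg_rms)
| extend trend = series_fit_line_dynamic(avg_rms)
// A positive slope means the hourly RMS is rising
| extend slope = todouble(trend.slope)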
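To make the intent of the query concrete, the following sketch reproduces its two steps outside ADX: averaging samples into 1-hour bins, then fitting a least-squares line to the hourly averages. It is an illustration only, not how ADX evaluates the query; the helper names and sample data are hypothetical, and the line is fitted against the bin index, which assumes there are no missing hours.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def hourly_averages(samples):
    """Average (timestamp, value) samples into 1-hour bins, sorted by time."""
    bins = defaultdict(list)
    for ts, value in samples:
        bins[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return [(hour, sum(vals) / len(vals)) for hour, vals in sorted(bins.items())]


def fit_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
# Hypothetical data: two samples per hour, RMS rising by 0.1 per hour.
samples = [
    (start + timedelta(minutes=30 * k), 1.0 + 0.1 * (k // 2))
    for k in range(8)
]
averages = hourly_averages(samples)
slope = fit_slope([avg for _, avg in averages])  # RMS change per hourly bin
```

A persistently positive slope over the 7-day window is the kind of signal the query is meant to surface for a degrading bearing.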
Internal architecture
ADX organises data into immutable columnar extents (shards) of ~1 GB compressed. When a new batch of telemetry is ingested, the engine creates new extents and indexes them; it periodically merges them to optimise query performance.
Each extent's index includes the timestamp range and the min/max values of each column, which lets the query planner discard whole extents without reading them when the query filters by time range or by a specific vehicle.
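The pruning idea can be sketched as follows. This is a simplified model under stated assumptions, not ADX's actual index format: the `ExtentStats` class, its fields and the `candidate_extents` function are hypothetical, and only two columns are tracked.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ExtentStats:
    """Hypothetical per-extent metadata: min/max of two columns."""
    extent_id: str
    min_ts: datetime
    max_ts: datetime
    min_vehicle: str
    max_vehicle: str


def candidate_extents(extents, ts_from, ts_to, vehicle_id):
    """Keep only extents whose min/max ranges could contain matching rows."""
    return [
        e for e in extents
        if e.max_ts >= ts_from and e.min_ts <= ts_to           # time ranges overlap
        and e.min_vehicle <= vehicle_id <= e.max_vehicle       # id within column range
    ]


extents = [
    ExtentStats("e1", datetime(2024, 1, 1), datetime(2024, 1, 2), "TMB-5001", "TMB-5020"),
    ExtentStats("e2", datetime(2024, 1, 6), datetime(2024, 1, 7), "TMB-5030", "TMB-5060"),
    ExtentStats("e3", datetime(2024, 1, 6), datetime(2024, 1, 7), "TMB-5100", "TMB-5200"),
]
to_scan = candidate_extents(
    extents, datetime(2024, 1, 5), datetime(2024, 1, 8), "TMB-5042"
)
```

Here `e1` is discarded on the time range and `e3` on the vehicle range, so only `e2` has to be read. Min/max pruning is conservative: a surviving extent may still turn out to contain no matching rows, but a discarded extent never does.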