Unsupervised Event Detection

Recent advances in network equipment make it possible to continuously stream a wealth of information, pertaining to multiple protocols and layers of the stack, at a very fine spatial grain and at high frequency. Processing this deluge of telemetry data in real time offers new opportunities for network control and troubleshooting, but also poses serious challenges. In this demonstration, we tackle these challenges by applying streaming machine-learning techniques to the continuous flow of control- and data-plane telemetry data, with the goal of detecting BGP anomalies in real time. In particular, we implement an anomaly detection engine that leverages DenStream, an unsupervised clustering technique, and apply it to telemetry features collected from a large-scale testbed comprising tens of routers traversed by 1 Terabit/s worth of real application traffic.

Testbed

We have set up a full-scale testbed comprising 23 nodes on Cisco premises, which will be accessed remotely during the demo. The testbed, shown in the figure, replicates the traditional Clos topology of a Content Service Provider (CSP) datacenter. At the physical level, it comprises 8 leaf nodes interconnected via 4 spine nodes. For redundancy, each leaf is connected to each spine via 4 x 100 Gbps fiber links. At the operational level, the datacenter is designed with BGP as the only routing protocol, following the guidelines in RFC 7938.
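As a quick sanity check on the leaf-spine fabric described above, the following sketch enumerates the physical links and derives the resulting capacities. The node names (`leaf1`, `spine1`, ...) and helper functions are hypothetical and used only for illustration; only the counts (8 leaves, 4 spines, 4 x 100 Gbps per leaf-spine pair) come from the text.

```python
from itertools import product

# Counts taken from the testbed description; node names are hypothetical.
LEAVES = [f"leaf{i}" for i in range(1, 9)]    # 8 leaf nodes
SPINES = [f"spine{i}" for i in range(1, 5)]   # 4 spine nodes
LINKS_PER_PAIR = 4                            # redundant links per leaf-spine pair
LINK_GBPS = 100                               # capacity of each fiber link


def build_fabric():
    """Return one (leaf, spine, link_index) tuple per physical link."""
    return [
        (leaf, spine, k)
        for leaf, spine in product(LEAVES, SPINES)
        for k in range(LINKS_PER_PAIR)
    ]


def leaf_uplink_gbps(fabric, leaf):
    """Aggregate uplink capacity of a single leaf, in Gbps."""
    return sum(LINK_GBPS for l, _, _ in fabric if l == leaf)


fabric = build_fabric()
total_links = len(fabric)
total_gbps = total_links * LINK_GBPS
```

Under these counts the fabric has 8 x 4 x 4 = 128 leaf-spine links, and each leaf has 4 x 4 x 100 Gbps = 1.6 Tbps of uplink capacity.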

Every 5 seconds we collect a snapshot of the features from the leaves, spines and DRs (15 nodes in total), e.g. packets sent/received/dropped per interface, paths count, and number of BGP update messages, and we process these snapshots in real time.
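To illustrate how one such snapshot could be turned into input for a stream-learning algorithm, here is a minimal sketch that flattens a per-node snapshot into a fixed-order numeric vector. The snapshot layout, the counter names (`packets_sent`, `paths_count`, ...) and both helper functions are assumptions made for this example; they are not the actual telemetry schema or the code of this repository.

```python
def flatten_snapshot(snapshot):
    """Flatten a nested snapshot into a {feature_name: value} dict.

    Assumed (hypothetical) layout:
      {"interfaces": {ifname: {counter: value}}, "bgp": {counter: value}}
    """
    features = {}
    for ifname, counters in snapshot["interfaces"].items():
        for counter, value in counters.items():
            features[f"{ifname}.{counter}"] = float(value)
    for counter, value in snapshot["bgp"].items():
        features[f"bgp.{counter}"] = float(value)
    return features


def to_vector(features, feature_names):
    """Order features by a fixed name list; missing features default to 0.0."""
    return [features.get(name, 0.0) for name in feature_names]


# Example snapshot for one node (values are made up for illustration).
snapshot = {
    "interfaces": {"eth0": {"packets_sent": 10, "packets_dropped": 0}},
    "bgp": {"paths_count": 5, "update_messages": 2},
}
features = flatten_snapshot(snapshot)
feature_names = sorted(features)   # fix the column order once, reuse it for every snapshot
vector = to_vector(features, feature_names)
```

Fixing the feature order once matters because a clustering algorithm compares vectors position by position across successive 5-second snapshots.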

Methodology

In this demonstration, we leverage unsupervised techniques (such as DenStream [1]) to process model-driven telemetry (MDT) data in real time for online event detection. We make our code available as open-source software on GitHub [2], which is interesting per se since, to the best of our knowledge, no other fully functional DenStream implementation is available. We also make the datasets generated for the demonstration available on GitHub [3].
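For readers unfamiliar with DenStream, the sketch below shows the core idea in a heavily simplified form: samples are summarized by exponentially fading micro-clusters, and a sample is flagged when it cannot be absorbed by any core micro-cluster within radius `eps`. This is not the implementation in [2]. It omits the DBSCAN-based initialization and the periodic pruning of outlier micro-clusters described in [1], and the class names, the method `process`, and the default parameter values are assumptions chosen for illustration.

```python
import math


class MicroCluster:
    """Fading summary (weight, linear sum, squared sum) of nearby samples."""

    def __init__(self, point, lam):
        self.lam = lam
        self.weight = 1.0
        self.ls = list(point)                # weighted linear sum per dimension
        self.ss = [x * x for x in point]     # weighted squared sum per dimension

    def decay(self, dt=1):
        factor = 2.0 ** (-self.lam * dt)     # fading function 2^(-lambda * t)
        self.weight *= factor
        self.ls = [v * factor for v in self.ls]
        self.ss = [v * factor for v in self.ss]

    def center(self):
        return [v / self.weight for v in self.ls]

    def radius_with(self, point):
        """Radius the cluster would have if `point` were absorbed."""
        w = self.weight + 1.0
        var = 0.0
        for ls, ss, x in zip(self.ls, self.ss, point):
            mean = (ls + x) / w
            var += max((ss + x * x) / w - mean * mean, 0.0)
        return math.sqrt(var)

    def absorb(self, point):
        self.weight += 1.0
        self.ls = [ls + x for ls, x in zip(self.ls, point)]
        self.ss = [ss + x * x for ss, x in zip(self.ss, point)]


class DenStreamSketch:
    """Simplified DenStream-style detector (illustrative, not the code in [2])."""

    def __init__(self, eps, lam=0.01, beta=0.5, mu=4.0):
        self.eps = eps
        self.lam = lam
        self.threshold = beta * mu           # minimum weight of a core micro-cluster
        self.core = []
        self.outliers = []

    def _try_merge(self, clusters, point):
        nearest = min(clusters, key=lambda c: math.dist(c.center(), point), default=None)
        if nearest is not None and nearest.radius_with(point) <= self.eps:
            nearest.absorb(point)
            return nearest
        return None

    def process(self, point):
        """Consume one sample; return True if it is flagged as anomalous."""
        for cluster in self.core + self.outliers:
            cluster.decay()
        # Simplification: core clusters that faded below the threshold are dropped.
        self.core = [c for c in self.core if c.weight >= self.threshold]
        if self._try_merge(self.core, point) is not None:
            return False
        merged = self._try_merge(self.outliers, point)
        if merged is None:
            self.outliers.append(MicroCluster(point, self.lam))
        elif merged.weight > self.threshold:
            self.outliers.remove(merged)     # promote to core micro-cluster
            self.core.append(merged)
        return True


detector = DenStreamSketch(eps=1.0)
normal = [(0.1 * (i % 3), 0.1 * (i % 2)) for i in range(20)]
flags = [detector.process(p) for p in normal]
far_flag = detector.process((10.0, 10.0))    # far from the learned cluster
near_flag = detector.process((0.1, 0.0))     # close to the learned cluster
```

Because this sketch starts without an initialization phase, the first few samples are flagged until an outlier micro-cluster accumulates enough weight to be promoted to a core micro-cluster; after that, samples near the learned cluster are accepted and distant ones are flagged.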

References

[1] Feng Cao, Martin Ester, Weining Qian, and Aoying Zhou, "Density-based clustering over an evolving data stream with noise", 2006 SIAM Conference on Data Mining

[2] OutlierDenStream - GitHub - https://github.com/anrputina/OutlierDenStream

[3] Cisco-ie/telemetry - GitHub - https://github.com/cisco-ie/telemetry

[4] Andrian Putina, Dario Rossi, Albert Bifet, Steven Barth, Drew Pletcher, Cristina Precup, and Patrice Nivaggioli, "Unsupervised real-time detection of BGP anomalies leveraging high-rate and fine-grained telemetry data", 2018 INFOCOM, Demo Session

[5] Andrian Putina, Dario Rossi, Albert Bifet, Steven Barth, Drew Pletcher, Cristina Precup, and Patrice Nivaggioli, "Telemetry-based stream-learning of BGP anomalies", 2018 ACM SIGCOMM Workshop on Big Data Analytics and Machine Learning for Data Communication Networks (Big-DAMA'18)