This time, apart from the always inventive networking approaches, we also had talks on digital innovation, business transformation with FlashStack, the 2STiC programme initiative, and DevOps approaches to DC automation from design to deployment in a highly volatile ICT ecosystem.
Roman Vogel (PureStorage) opened the workshop by explaining why today's business issues revolve around business transformation and data-centric architectures. He introduced how Software-Defined Data Centres (SD-DC) help address these issues with "FlashStack": the old has to be merged with the new and silos have to be given up, but the right time for this process never explicitly arrives – so let's go ahead anyway.
The mechanism for this is the evolution towards a data-centric architecture: with a high degree of virtualisation, the whole stack of network, compute and storage can be automated. He explained that FlashStack combines Cisco UCS for compute, a Cisco fabric (Nexus and MDS FC switches), and Pure Storage arrays. The result is a system acting as a private cloud built on security, with policy-driven service layers that are customisable on demand. The anatomy of such a system consists of diskless, stateless servers, fast converged client networks with fast parallel protocols, and NVMe (Non-Volatile Memory Express) over fabric. Any protocol for block or object storage can be used, with NFS or S3 as the object store for analytics.
Current FlashStack solutions eliminate silos, simplify operations, scale with agility (on the compute and/or storage side), and provide transformative value, e.g. reduced complexity across the compute, network and storage layers.
Kamila Součková (ETH Zurich) researches P4 (Programming Protocol-independent Packet Processors) and presented "P4 in the wild: Line-rate packet forwarding of the SCION future Internet architecture".
She introduced SCION as a design providing route control, failure isolation, and explicit trust information for end-to-end communication. SCION represents a next generation of networks: in detail, it offers scalability through packet-carried forwarding state and a hierarchical design, network access control in which the end host selects paths (while the ISPs decide path availability), multi-tenancy through isolation domains (core and non-core ASes; failures stay within a domain), and built-in DoS protection. Native SCION is a new concept for both control and data plane: it replaces IP + BGP and gives end-host-controlled multipath "for free".
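To illustrate the idea of packet-carried forwarding state, here is a minimal Python sketch (a toy illustration, not SCION code; the packet layout and names are invented): the end host embeds the full sequence of hop fields in the packet, and each border router forwards by reading its own hop field instead of performing a routing-table lookup.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    """Toy packet carrying its own forwarding state (hypothetical layout)."""
    hops: list          # ordered egress interface IDs chosen by the end host
    cursor: int = 0     # index of the current hop field
    payload: bytes = b""

def forward(router_id: str, packet: Packet) -> int:
    """Each router reads *its* hop field -- no routing-table lookup needed."""
    egress = packet.hops[packet.cursor]
    packet.cursor += 1  # advance to the next hop field for the next router
    print(f"{router_id}: forwarding out of interface {egress}")
    return egress

# The end host picks the path (here: interfaces 2, 7, 1 at successive ASes).
pkt = Packet(hops=[2, 7, 1], payload=b"hello")
for router in ("br-AS1", "br-AS2", "br-AS3"):
    forward(router, pkt)
```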
Building on this generic concept, Kamila presented her work on building a SCION border router in P4 on NetFPGA. The work rests on four main objectives: (#1) to elaborate a ready-to-deploy concept that forwards data at 40 Gbps or more, is usable under real conditions and traffic, and integrates with the existing SCION infrastructure for control plane, monitoring and metrics; (#2) to package SCION as a library that is modular, portable, and includes high-performance P4 code for parsing, verification and data forwarding; (#3) to elaborate guidelines (as recommendations) for high speed in P4 by checking the critical path of the design; and (#4) to optimise the SCION protocol for hardware, i.e. to ask: "How can we adjust SCION to enable a more efficient implementation in HW?"
Many challenges in #1 to #4 were discussed; here is an extract of the highlighted points. FPGA-based hardware is a good basis for evolving P4 code, but not recommended for experimental P4 projects. Development proceeded iteratively: reuse, reduce, and recycle. Further, SCION does not fix the packet fields, so a dedicated parser has to be built for them, which can also be used for external monitoring of SCION. The most important insight from the parser work is not to parse the whole path, but just to save it; this, however, requires modifying the NetFPGA design towards a sub-parser, which in turn costs a lot of FPGA area and RAM. CAM tables are the main limiter of high speed in P4, and meeting the timing requirements is a frequent issue, so the critical path of the implemented design has to be checked. In the end, a dedicated end host with P4-enabled SmartNICs is desirable, so that processing happens on the host.
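A minimal Python sketch of that parsing insight (the header layout is hypothetical, not real SCION, and this is plain software rather than P4): read just enough of the header to learn the path length, then keep the path bytes verbatim instead of decoding every hop field.

```python
import struct

def parse_skip_path(packet: bytes):
    """Hypothetical SCION-like header: 1 byte version, 1 byte hop count,
    2 bytes payload length, then 8 bytes per hop field."""
    version, hop_count, payload_len = struct.unpack_from("!BBH", packet, 0)
    path_len = hop_count * 8
    # Key trick: do NOT decode the hop fields -- just slice and save them,
    # so they can be written back unchanged when the packet is forwarded.
    path = packet[4:4 + path_len]
    payload = packet[4 + path_len:4 + path_len + payload_len]
    return version, path, payload

pkt = struct.pack("!BBH", 1, 2, 5) + bytes(16) + b"hello"
version, path, payload = parse_skip_path(pkt)
print(version, len(path), payload)   # -> 1 16 b'hello'
```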
The project is mostly done; currently, parsing and validation remain challenging and the timing constraints are starting to fail. The FPGA deployment is not yet production-ready: the power requirements still have to be optimised in comparison to plain IP and, last but not least, it has to become faster, with 1 Tbps planned using multiple FPGA-enabled NICs.
Victor Reijs (SIDN) reported on future Internet activities at SIDN. He pointed to the 2STiC programme: Security, Stability and Transparency in inter-network Communication, a collaboration between NLnet Labs, SIDN Labs, SURF, TU Delft, UTwente, and UvA. The programme works on realistic, practical use cases and builds demonstrators and testbeds covering multi-domain, governance and trust aspects, and acts as a think tank providing publications, guidelines, evaluation expertise for new technologies, and experience with future Internet technologies.
Victor referred to Kamila's presentation and mentioned that SCION work is currently being done at SIDN: a 2STiC testbed/prototype in P4 is in place, used to determine the maturity of P4 implementations for future Internet technologies such as SCION, RINA and Content-Centric Networking. The P4 nodes of the testbed are Barefoot switches and servers with Netronome SmartNICs. In-band Network Telemetry (INT) was a first use case for the 2STiC testbed; the design of a data collector gathering data from the control and data planes, as well as the topology design, is part of the initial setup.
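To give a flavour of what such an INT collector does, here is a minimal hypothetical Python sketch (the per-hop record layout and the values are invented for the example, not taken from the INT specification): each P4 switch on the path appends a fixed-size metadata record, and the collector at the sink unpacks the stack to reconstruct per-hop latency.

```python
import struct

HOP_FMT = "!IQ"                     # hypothetical record: switch ID, ingress timestamp (ns)
HOP_LEN = struct.calcsize(HOP_FMT)  # 12 bytes per hop

def collect_int(metadata: bytes):
    """Unpack the per-hop telemetry stack appended by the switches."""
    hops = [struct.unpack_from(HOP_FMT, metadata, off)
            for off in range(0, len(metadata), HOP_LEN)]
    # Per-hop latency = difference between successive ingress timestamps.
    for (sw_a, t_a), (sw_b, t_b) in zip(hops, hops[1:]):
        print(f"switch {sw_a} -> switch {sw_b}: {(t_b - t_a) / 1e3:.1f} us")

# Example: three hops, 50 us and 120 us apart (fabricated values).
stack = (struct.pack(HOP_FMT, 1, 1_000_000)
         + struct.pack(HOP_FMT, 2, 1_050_000)
         + struct.pack(HOP_FMT, 3, 1_170_000))
collect_int(stack)
```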
Going forward, they see P4 as a tool: they will investigate the technologies mentioned above and strengthen community work by providing the testbed for use cases and demonstrators. In the context of porting protocols to hardware, some crucial questions remain: "How open are these opportunities?" and "How hardware-dependent is the P4 code?" Victor also argued that NDAs play an important role – NDAs for hardware can make cooperation more cumbersome.
Serge Monney (IBM) illustrated how next-generation IT technical support for cloud and storage can facilitate monitoring and troubleshooting of large-scale, virtualised, multi-tenant clouds and their DC environments. The evolution of virtualisation has increased DC complexity, so support needs to be proactive: pure anomaly detection and root-cause analysis no longer suffice. Furthermore, many thresholds and KPIs have to be managed successfully; at IBM, for example, up to 800 metrics are collected, and a high workload may be normal or may stem from an issue. Semi-automated approaches are thus no longer practical or sustainable and do not live up to next-generation cloud technical support. The conclusion is to pursue novel ML-based approaches for timely detection of IT issues and events in (virtualised) cloud/storage fabrics, together with the associated causal troubleshooting.
So the question arises: "what do we compare to what?", and is it true that "performance analysis is more art than science"? Serge introduced the building blocks of performance analysis as a pipeline: Source: DC => monitoring => thresholds => alerting => collecting raw data => time-series cross-correlation => ranking of similar time-series metrics => Target: support, focused on specific areas.
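The last two pipeline stages before handover to support can be sketched in a few lines of Python (a minimal illustration with fabricated data, using plain Pearson correlation at zero lag for simplicity – not IBM's implementation): correlate every candidate metric against the alerting metric and rank the candidates by correlation strength.

```python
import numpy as np

rng = np.random.default_rng(0)
alert = rng.normal(size=200)                    # metric that triggered the alert
metrics = {
    "disk_latency": alert * 0.9 + rng.normal(scale=0.3, size=200),   # causally related
    "cpu_load":     rng.normal(size=200),                            # unrelated
    "net_errors":   -alert * 0.7 + rng.normal(scale=0.5, size=200),  # inversely related
}

# Rank candidate metrics by absolute correlation with the alert signal.
ranking = sorted(
    ((name, np.corrcoef(alert, series)[0, 1]) for name, series in metrics.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, corr in ranking:
    print(f"{name:12s} r = {corr:+.2f}")
```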
Serge pointed out that the key to finding relevant metrics is their causal relationship with the alert, established via time-series cross-correlation. From this cross-correlation picture, image recognition can be performed with an autoencoder neural network whose output layer is trained to reproduce the input signal as closely as possible. If the reconstruction matches less than 60% of the data, an error is concluded, since the machine learning model is unable to reconstruct the signal from the most important metrics.
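That decision rule can be sketched as follows (a minimal illustration: the 60% figure comes from the talk, while the similarity measure, names and data are assumptions for the example, and the autoencoder itself is assumed to have already produced the reconstruction):

```python
import numpy as np

def reconstruction_match(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Fraction of variance the reconstruction explains (one possible 'match' measure)."""
    residual = np.sum((original - reconstructed) ** 2)
    total = np.sum((original - original.mean()) ** 2)
    return max(0.0, 1.0 - residual / total)

def is_anomalous(original, reconstructed, threshold=0.60) -> bool:
    """Flag an error when the autoencoder cannot reproduce the input well enough."""
    return reconstruction_match(original, reconstructed) < threshold

signal = np.sin(np.linspace(0, 6, 100))
good = signal + np.random.default_rng(1).normal(scale=0.05, size=100)
bad = np.zeros(100)                       # a failed reconstruction
print(is_anomalous(signal, good))         # False: pattern reproduced
print(is_anomalous(signal, bad))          # True: reconstruction failed
```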
Questions then came from the audience: with how many metrics was the system tested – two, three? And what about fingerprinting real issues once detected, so as to use them later against other data? He pointed out that once a definition is found – whether 1, 4, 10, 200 … or more metrics are involved – one can fingerprint and apply it per machine type and industry: collect a lot of performance data, tag it (machine type and workload), and then create a model.
Hanieh Rajabi (SWITCH) gave us a view of how to build an automatable data centre, from design to deployment. As a first step she introduced SWITCHengines and the cloud offerings, pointing out that IaaS, PaaS, SCALE-UP and community work are currently the "main pillars" of service delivery. Using SWITCHengines as an example, she showed how difficult automation is – we know how to swim, but not how to automate. She mentioned issues such as having less time for provisioning, configuring, updating and maintaining services, and how important it is to detect a problem as quickly as possible and to resolve it. She linked this to the goals of infrastructure as code and underlined benefits of automation: solutions to problems are proven through implementation, testing and measurement, which simplifies network operations for whole IT teams, not only for network engineers.
Hanieh then introduced the Clos DC topology at SWITCH, following the motto "scale the data centre like you scale the Internet". She explained a spine-leaf architecture using a standard L3 routing protocol (BGP), an L2 data plane built on VXLAN tunnels, and an L2 control plane based on the EVPN protocol. As background: a Clos topology comprises spine and leaf layers; servers are connected to leaf (top-of-rack) switches, and each leaf is connected to every spine, with no direct leaf-to-leaf or spine-to-spine links. She pointed out that the switches in the SWITCH Clos topology come from different vendors and that the topology is highly scalable: adding switches is like adding nodes to the Internet.
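A minimal Python sketch of that wiring rule (a generic two-tier Clos fabric, not SWITCH's actual cabling plan): every leaf connects to every spine, and no two switches of the same layer connect to each other.

```python
from itertools import product

def clos_links(num_spines: int, num_leaves: int):
    """Generate the full spine-leaf bipartite wiring of a two-tier Clos fabric."""
    spines = [f"spine{i}" for i in range(1, num_spines + 1)]
    leaves = [f"leaf{i}" for i in range(1, num_leaves + 1)]
    # Each leaf uplinks to every spine; there are no leaf-leaf or spine-spine links.
    return [(leaf, spine) for leaf, spine in product(leaves, spines)]

for link in clos_links(num_spines=2, num_leaves=4):
    print(*link)
# Scaling out means adding a leaf (more server ports) or a spine (more fabric
# bandwidth) without changing the role of any existing switch.
```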
Furthermore, all switches speak to each other via the standard routing protocol BGP; the reasons for using BGP are "one protocol for everything" and mature implementations (see RFC 7938). Routing at the host depends on how the servers are attached to the fabric: either through L2 bonding (which can be active/passive) or through L3 routing with a BGP policy under which servers accept only a default route and announce only their loopback address. The main reasons for using eBGP rather than iBGP are its robust, fully featured implementations (iBGP is limited in multipath support and varies across implementations) and, last but not least, that it is simpler to understand.
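That host-side policy boils down to two tiny filters, sketched here in Python with the standard ipaddress module (an illustration of the policy logic, not an actual BGP daemon configuration; the addresses are invented):

```python
import ipaddress

DEFAULT_ROUTE = ipaddress.ip_network("0.0.0.0/0")

def import_filter(prefix: str) -> bool:
    """The server accepts only the default route from the fabric."""
    return ipaddress.ip_network(prefix) == DEFAULT_ROUTE

def export_filter(prefix: str, loopback: str) -> bool:
    """The server announces only its own loopback address."""
    return ipaddress.ip_network(prefix) == ipaddress.ip_network(loopback)

print(import_filter("0.0.0.0/0"))                     # True  -> accepted
print(import_filter("10.0.0.0/8"))                    # False -> rejected
print(export_filter("192.0.2.7/32", "192.0.2.7/32"))  # True  -> announced
```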
Automation processes and procedures are meant to simplify network operations, complex configurations and device management while providing the business agility to adapt in a constantly changing environment. Automation can be approached in different ways: most devices can import a configuration file via TFTP; the configuration file can be generated dynamically (from templates, with declaratively defined infrastructure); or a testing environment can be used to evaluate configuration correctness. The software and tools used are ONIE (Open Network Install Environment) to bootstrap the switches, Jinja2 as the full-featured template engine, and Ansible for configuration automation, e.g. a single playbook with different roles. User-facing cloud network monitoring (availability, latency) and the measurement of Service Level Indicators/Objectives (SLI/SLO) for users from three different locations will be covered with Nagios and the Site24x7 monitoring platform.
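To give a flavour of the template-driven approach, here is a minimal Python/Jinja2 sketch (the template fragment and all values are invented for the example, not SWITCH's actual configuration): the declaratively defined infrastructure data is rendered into a per-switch configuration file.

```python
from jinja2 import Template

# Hypothetical fragment of a leaf-switch configuration template.
CONFIG_TEMPLATE = Template("""\
hostname {{ hostname }}
router bgp {{ asn }}
{% for spine in spines %}  neighbor {{ spine.ip }} remote-as {{ spine.asn }}
{% endfor %}""")

# Declaratively defined infrastructure: one dict per device (fabricated values).
leaf1 = {
    "hostname": "leaf1",
    "asn": 65001,
    "spines": [
        {"ip": "10.0.0.1", "asn": 65000},
        {"ip": "10.0.0.2", "asn": 65000},
    ],
}

print(CONFIG_TEMPLATE.render(**leaf1))
```

In an Ansible setup, the same template would typically be applied per host via the template module, with the per-device variables kept in the inventory.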
As a wrap-up, let's pay tribute to the female community of networkers who are working hard to get into all the hacks and technology details. It was great to see the interest growing each time, and we hope this trend will continue. As the SDN topic keeps shaping and reshaping itself, we are likewise adapting the topics to match recent trends, so that we can offer our community the most interesting and up-to-date research and implementation solutions. Thanks to all who came, and see you at the 13th SDN event. Don't wait for the announcement – you can already start sending us ideas and topics for presentations. Until next time, enjoy the summer.
Authors: Kurt Baumann (SWITCH), Irena Trajkovska (Cisco)