Does Google Compute Engine offer SR-IOV (Single Root I/O Virtualization)? - networking

Amazon / AWS EC2 offers SR-IOV (Single Root I/O Virtualization) instances, which it dubs "enhanced networking" -- does Google offer this on Compute Engine?
Specifically, are any GCE instance types able to bypass the hypervisor and have direct access to a multi-queue NIC?
SRV-IOV support is needed to take advantage of Scylla DB's architecture?
HN Discussion: https://news.ycombinator.com/item?id=10262719

Currently Google Compute Engine does not offer SR-IOV. That said, SR-IOV is not strictly necessary to take advantage of Scylla's architecture.
GCE offers multi-queue networking and it is possible to directly user-mode assign the virtio-net queues using Intel's DPDK. This should allow our virtio-net NIC to work with Scylla, although at least at one point DPDK made certain qemu specific assumptions with respect to virtio-net (in particular it assumed Tx/Rx queue depths of 256 descriptors; the virtio-net NIC in GCE currently advertises 16,384 entry queues although this is likely to change in the near future).
For applications like Scylla this should offer superior network performance and better in-guest compute overhead over utilizing the kernel TCP/IP stack.
Additionally, for all GCE instances with >= 1 cores (i.e., not fractional core instances) we offer multi-Gbps throughput subject to fabric availability. Latency is likely to be lowest in zones with Haswell processors. We do not currently guarantee specific network characteristics, but we offer up to 2 Gbps/core of network throughput shared between the virtual NIC and any attached persistent disk volumes (Local SSD throughput does not count against this limit). Throughput wise this makes 8-vCPU and larger instances comparable to EC2 Enhanced Networking.

At the moment, nothing that we offer is similar to AWS' "enhanced networking".
You are more than welcome posting this as a Feature Request on our Compute Engine Issue tracker though, so we can look at implementing a similar feature.

Related

Software Routing

"Commercial software routers from companies such as Vyatta can typically only attain transfer data at speeds of up to three gigabits per second. That isn’t fast enough to take advantage of the full speed of a typical network card, which operates at 10 gigabits per second." [1]
How is the speed of the network interface card relevant in this scenario? Aren't software routers connecting multiple Virtual Machines running on the same physical host? [2] Unless a PC has multiple network interface cards, it is unlikely that it functions as a packet switch between different physical hosts.
My interpretation suggests that there seem to exist two different kinds of software routing: (1) Embedding a real time operating system on an actual router. (2) Writing application layer code on a PC that can handle packets being transmitted between different virtual machines running on that very PC. Is this correct?
It depends on what your router is doing. If it's literally just looking at a static route table and forwarding packets out another interface, there isn't much hit in performance.
It's when you get into things like NAT, Crypto, QoS, SPI... that you will see performance degradation. Hardware vendors are usually using custom silicon to process the more advanced features, this allows for higher throughput packet forwarding.
Now that merchant silicon is fast enough and the open source applications are getting better, the performance gap is closing.
It really depends on your use case as far as what you want to use. I've gone with both and not seen performance hits, but the software versions weren't handling high throughput workloads.
Performance of the link from the virtual network to the physical eventually becomes important at any reasonable scale. You're right that, within the same physical host, things can be pretty quick, but that requires that one can get everything needed in one box.
While merchant silicon has come a long way in improving the performance of networking equipment, greater gains are taking place getting CPU's to handle networking tasks better. Both AMD and Intel have improved their architectures to the point where 10 Gbps forwarding is a reality. Intel has developed a specialized library (DPDK Wiki Page) that takes care of a lot of low-level networking functions at high performance.

Confusing about NFV implementation (Network Functions Virtualization)

I'm researching about SDN and NFV.
In the concept of NFV on Wikipedia , it says : "Network Functions Virtualization (NFV) is a network architecture concept that proposes using IT virtualization related technologies, to virtualize entire classes of network node functions into building blocks that may be connected, or chained, together to create communication services."==> first thing to consider that it will reduce the cost of facilities.
So in real life implementation, for example, how can we virtualize a network nodes like a router?
NFV was created for the networks to be capable to extend in a dynamically way(virtualize the router) , not a static way(buy a new router), that is we must implement the router functions in the server or a computer instead of buying and then adapting the new router to the current nextwork , in this case I don't see any different in this implementation , because buying a server to implement a virtualized router is not cheaper than buying a new router.
Can anyone explain this for me , or Am i wrong understanding the NFV concept?
Thanks.
SDN is just that, software defined networking. In a Hybrid SDN model SDN decouples the logic from the physical box, rendering the physical box a simple "forwarding" box. The logic rests with the SDN controller where developers create APIs that manage these forwarding boxes (we call them network elements now) with flow tables that get pushed to them. The benefit here is that the devices can now be configured and provisioned through this controller, as opposed to having to log into each and every box.
Then you have the cloud. A small office can literally get away with porting all of their apps and services into the cloud, doing away with most of their physical boxes. Of course you still need a LAN in the office and a way to get out to the Internet and eventually the cloud. You can even ask the cloud provider to provision load-balancing on specific applications, firewalls and content delivery services. So basically your office applications and most of the supporting LAN and databases can be safely ported to cloud providers.
When you said "...because buying a server to implement a virtualized router is not cheaper than buying a new router", it depends: As it's a virtualized resource, you can use this new server to run your router and another resource from your infrastructure, if the machine has more hardware capacity than you need for a single router.
In fact, you might not even need to buy a new machine, if you have your resources in a cloud like AWS (or your own private cloud), when you have need for more routers, you can just flexibly allocate more hardware resources and spawn a new router instance (scale out) and, whenever your router demand is lower than what you have allocated, you can reduce your number of routers (scale in) and stop losing money with an infrastructure that you are not using at the moment.
Consider that a really high level explanation, if you want to know the details about how a Virtual Network Function scales in and out in a NFV implementation, I recommend you to read the ETSI specification about how it should work: http://www.etsi.org/standards-search#page=1&search=&title=1&etsiNumber=1&content=0&version=1&onApproval=1&published=1&historical=0&startDate=1988-01-15&endDate=2017-04-13&harmonized=0&keyword=&TB=789,,832,,831,,795,,796,,800,,798,,799,,797,,828&stdType=&frequency=&mandate=&collection=&sort=3
Let me continue with your example of the router. Traditionally, these routers are vendor specific. For example, the major sellers are companies like Cisco, Juniper, etc. They are implemented on proprietary hardware and therefore if you want to buy a new router you need to buy from them only. Further, when they go into some problems, you need a dedicated engineer to repair them. Therefore, the telecommunication has to take care of high Capital Expenditure (COPEX) and Operational Expenditure (OPEX).
With NFV, the entire router function is implemented as a software and deployed on a general purpose servers (GPP) or cloud. These GPPs are relatively very cheap when compared to proprietary hardware. Thanks to cloud computing, even small companies can afford servers on Amazon and Google clouds. Because of cheap availability, COPEX is now relatively cheaper. Further, you don't need a dedicated engineer when the hardware goes into a problem, the same engineer who works for GPP server maintenance is enough. This way OPEX is reduced.
Now imagine, like routers there are many networking elements present in Telecommunication. If every networking element requires a dedicated engineer, how much a Teleco operator will be spending money. Apart from this, due to software implementation, suppose, when you have very high traffic than expected, you can just roll out a new router (software network function) on GPP or Cloud instead of completely buying a new router, which is very costly. As you already know, in the cloud you pay based on usage.
There are many more uses. To know more you need to read research papers.

GCE Instances Network connection speeds on internal network

Coming from a background of vSphere vm's with vNIC's defined on creation as I am do the GCE instances internal and public ip network connections use a particular virtualised NIC and if so what speed is it 100Mbit/s 1Gb or 10Gb?
I'm not so much interested in the bandwidth from the public internet in but more what kind of connection is possible between instances given networks can span regions
Is it right to think of a GCE project network as a logical 100Mbit/s 1Gb or 10Gb network spanning the atlantic I plug my instances into or should there be no minimum expectation because too many variables exist like noisy neighbours and inter region bandwidth not to mention physical distance?
The virtual network adapter advertised in GCE conforms to the virtio-net specification (specifically virtio-net 0.9.5 with multiqueue). Within the same zone we offer up to 2Gbps/core of network throughput. The NIC itself does not advertise a specific speed. Performance between zones and between regions is subject to capacity limits and quality-of-service within Google's WAN.
The performance relevant features advertised by our virtual NIC as of December 2015 are support for:
IPv4 TCP Transport Segmentation Offload
IPv4 TCP Large Receive Offload
IPv4 TCP/UDP Tx checksum calculation offload
IPv4 TCP/UDP Rx checksum verification offload
Event based queue signaling/interrupt suppression.
In our testing for best performance it is advantageous to enable of all of these features. Images supplied by Google will take advantage of all the features available in the shipping kernel (that is, some images ship with older kernels for stability and may not be able to take advantage of all of these features).
I can see up to 1Gb/s between instances within the same zone, but AFAIK that is not something which is guaranteed, especially for tansatlantic communication. Things might change in the future, so I'd suggest to follow official product announcements.
There have been a few enhancements in the years since the original question and answers were posted. In particular, the "2Gbps/core" (really, per vCPU) is still there but there is now a minimum cap of 10 Gbps for VMs with two or more vCPUs. The maximum cap is currently 32 Gbps, with 50 Gbps and 100 Gbps caps in the works.
The per-VM egress caps remain "guaranteed not to exceed" not "guaranteed to achieve."
In terms of achieving peak, trans-Atlantic performance, one suggestion would be the same as for any high-latency path. Ensure that your sources and destinations are tuned to allow sufficient TCP window to achieve the throughput you desire. In particular, this formula would be in effect:
Throughput <= WindowSize / RoundTripTime
Of course that too is a "guaranteed not to exceed" rather than a "guaranteed to achieve" thing. As was stated before "Performance between zones and between regions is subject to capacity limits and quality-of-service within Google's WAN."

Openstack: How to decide hardware capacity?

I'm reading some OpenStack material recently, but didn't get a chance to try yet. I got the sense that Openstack could management a large number of virtual machines via API or dashboard interface. User could easily create/start virtual machines.
Then I come out a confusion. As the underlying computer hardware might vary, some computer maybe only able to host one virtual machine, some maybe ten. When user start a virtual machine, does user manually or Openstack automatically designate a hardware computer to host the virtual machine? In either case, how to decide the hardware computer's capacity? Does Openstack provide the functionality to set capacity attribute of hardware computer?
When you run OpenStack, each physical machine (which OpenStack calls compute hosts) will periodically report how many CPUs it has and how much RAM it has, as well as how many CPUs and how much RAM have been allocated to virtual machines that are currently running.
The OpenStack scheduler uses this information to determine which compute host to run a VM on. First, it checks to see if a host has enough CPUs (by applying the CoreFilter) and enough RAM (by applying the RamFilter). Compute hosts that don't have enough CPUs or RAM available won't even be considered.
Once it has a set of candidate hosts that have enough CPU and RAM, the scheduler needs to pick one of them. By default, the scheduler will use a "spread-first" strategy, allocating VMs to machines that have the most amount of CPU/RAM that isn't currently allocated to VM. It's possible to change this strategy to a "fill-first" behavior, so that the compute host with the least amount of free resources will get allocated first. This is configured by setting the nova.scheduler.least_cost.compute_fill_first_cost_fn parameter.
For more information, see the chapter on scheduling in the OpenStack Compute Admin guide.

Developing Serverless Lan Chat Program Help!

I want to develop simple Serverless LAN Chat program just for fun. How can I do this ? What type Architecture should I use?
Last year I have worked on TCP,UDP Client/ Server application Project.It was simple (Server listens to certain port/socket and Client connect to server's port etc..) But I have no idea about how to develop "Serverless" LAN Chat program. How can I do this? UDP,TCP,Multicast,Broadcast? or Should program behave like both server and client?
The simplest way would be to use UDP and simply broadcast your messages all over the network.
A little bit more advanced version would be to only use the broadcast to discover other nodes in the network.
Every node maintains a list of known peers.
Messages are sent with TCP to all known peers.
When a node starts up, it sends out an UDP broadcast to discover other nodes.
When a node receives a discovery broadcast, it sends "itself" to the source of the broadcast, in order to make it self known. The receiving node adds the broadcaster to it's own list of known peers.
When a node drops out of the network, it sends another broadcast in order to inform the remaining nodes that they should remove the dropped client from their list.
You would also have to consider handling the dropping out of nodes without them informing the rest of the network.
The spread toolkit may be a bit overkill for what you want, but an interesting starting point.
From the blurb:
Spread is an open source toolkit that provides a high performance messaging service that is resilient to faults across local and wide area networks. Spread functions as a unified message bus for distributed applications, and provides highly tuned application-level multicast, group communication, and point to point support. Spread services range from reliable messaging to fully ordered messages with delivery guarantees.
Spread can be used in many distributed applications that require high reliability, high performance, and robust communication among various subsets of members. The toolkit is designed to encapsulate the challenging aspects of asynchronous networks and enable the construction of reliable and scalable distributed applications.
Spread consists of a library that user applications are linked with, a binary daemon which runs on each computer that is part of the processor group, and various utility and demonstration programs.
Some of the services and benefits provided by Spread:
Reliable and scalable messaging and group communication.
A very powerful but simple API simplifies the construction of distributed architectures.
Easy to use, deploy and maintain.
Highly scalable from one local area network to complex wide area networks.
Supports thousands of groups with different sets of members.
Enables message reliability in the presence of machine failures, process crashes and recoveries, and network partitions and merges.
Provides a range of reliability, ordering and stability guarantees for messages.
Emphasis on robustness and high performance.
Completely distributed algorithms with no central point of failure.
Apples iChat is an example of the very product you are envisioning. It uses Bonjour (apple's zero-conf networking protocol) to identify peers on a LAN. You can then chat or audio/video chat with them.
I'm not entirely sure how Bonjour works inside, but I know it uses multicast. Clients "register" services on the LAN, and the Bonjour protocol allows for each host to pull up a directory of hosts for a given service (all without central management).

Resources