My confusion is based on these 3 thoughts -
Is it 2^(number of all address pin available on the cpu)?
Is it the 2^(size of one specific register)?
Is it a hardware circuit which understands all the addresses within a range of addresses? Then what is it?
I'm not asking about virtual address space here, I don't know what it's called , maybe it's the physical address space of all physical devices including ram.
Besides even if I get a correct answer then I would like to ask, why does my cpu has 2^39 bits(512GB) of memory address space and 64KB+3 I/O memory space. This information is written on intel documentation for my system in package (intel core i3-4005U with an integrated lynx point-m PCH).
You are welcome to edit my question if I'm asking it wrong. Thank you.
The size of physical address space for a CPU is an arbitrary choice made by the designers. The width of cache tags and TLB entries have to be wide enough because caches are physically tagged (including L1d in most CPUs, including all Intel), and other internal things that deal with physical addresses. (Like the store buffer, for matching load addresses against outstanding stores. And also for matching stores against in-flight code addresses.)
All Haswell-client CPUs share the same core microarchitecture, so even though a laptop chip doesn't need that much, some single-socket non-Xeon desktops might. (I think this is true; saving a small amount of space and/or power by changing cache tag widths by 1 or 2 bits might plausible but IDK if Intel does that; I think they really only want to validate a design once.)
Remember that device memory (including PCIe card such as VGA or a Xeon Phi compute card) will normally be mapped into physical address space so the CPU can access it with loads/stores (after pointing virtual pages at those regions of physical address space). PCIe uses fixed width links and sends addresses as part of message "packets"; no extra pins are required for more addresses.
The DDR3 DRAM controllers have a number of address lines on each channel to send row/column addresses; it might be possible to leave one pin unused. It's very similar to other DDR versions; Wikipedia has diagram of some of the signals in the DDR4 article: https://en.wikipedia.org/wiki/DDR4_SDRAM#Command_encoding
Related
Page 526 of the textbook Operating Systems – Internals and Design Principles, eighth edition, by William Stallings, says the following:
At the lowest level, device drivers communicate directly with peripheral devices or their controllers or channels. A device driver is responsible for starting I/O operations on a device and processing the completion of an I/O request. For file operations, the typical devices controlled are disk and tape drives. Device drivers are usually considered to be part of the operating system.
Page 527 continues by saying the following:
The next level is referred to as the basic file system, or the physical I/O level. This is the primary interface with the environment outside of the computer system. It deals with blocks of data that are exchanged with disk or tape systems.
The functions of device drivers and basic file systems seem identical to me. As such, I'm not exactly sure how Stallings is differentiating them. What are the differences between these two?
EDIT
From page 555 of the ninth edition of the same textbook:
The next level is referred to as the basic file system, or the physical I/O level. This is the primary interface with the environment outside of the computer system. It deals with blocks of data that are exchanged with disk or tape systems. Thus, it is concerned with the placement of those blocks on the secondary storage device and on the buffering of those blocks in main memory. It does not understand the content of the data or the structure of the files involved. The basic file system is often considered part of the operating system.
Break this down into layer:
Layer 1) Physical I/O to a disk requires specifying the platter, sector and track to read or write to a block.
Layer 2) Logical I/O to a disk arranges the blocks in a numeric sequence and one reads or writes to a specific logical block number that gets translated into into the track/platter/sector.
Operating systems generally have support for a Logical I/O and physical I/O to the disk. That said, most disks these days do the logical to physical translation. O/S support for that is only needed for older disks.
If the device supports logical I/O the device driver performs the I/O. If the device only supports physical I/O the device driver usually handles both the Logical and Physical layers. Thus, the physical I/O layer only exists in drivers for disks that do not do logical I/O in hardware. If the disk supports logical I/O, there is no layer 1 in the driver.
All of the above is what is appears the your first quote is addressing.
Layer 3) Virtual I/O writes to a specific bytes or blocks (depending upon the O/S) to a file. This layer is usually handled outside the device driver. At this layer there are separate modules for each supported file system. Virtual I/O requests to all disks using the same file system go through the same module.
Handling Virtual I/O requires much more complexity than simply reading an writing disk blocks. The virtual I/O layer requires working with the underlying disk file system structure to allocate the blocks to a specific file.
This appears to be what is referred to in the second quote. What is confusing to me is why it is calling this the "physical I/O" layer instead of the "virtual I/O" layer.
Everywhere I have been Physical I/O and Logical I/O are the writing of raw blocks to a disk without regard to the file system on the disk.
"Commercial software routers from companies such as Vyatta can typically only attain transfer data at speeds of up to three gigabits per second. That isn’t fast enough to take advantage of the full speed of a typical network card, which operates at 10 gigabits per second." [1]
How is the speed of the network interface card relevant in this scenario? Aren't software routers connecting multiple Virtual Machines running on the same physical host? [2] Unless a PC has multiple network interface cards, it is unlikely that it functions as a packet switch between different physical hosts.
My interpretation suggests that there seem to exist two different kinds of software routing: (1) Embedding a real time operating system on an actual router. (2) Writing application layer code on a PC that can handle packets being transmitted between different virtual machines running on that very PC. Is this correct?
It depends on what your router is doing. If it's literally just looking at a static route table and forwarding packets out another interface, there isn't much hit in performance.
It's when you get into things like NAT, Crypto, QoS, SPI... that you will see performance degradation. Hardware vendors are usually using custom silicon to process the more advanced features, this allows for higher throughput packet forwarding.
Now that merchant silicon is fast enough and the open source applications are getting better, the performance gap is closing.
It really depends on your use case as far as what you want to use. I've gone with both and not seen performance hits, but the software versions weren't handling high throughput workloads.
Performance of the link from the virtual network to the physical eventually becomes important at any reasonable scale. You're right that, within the same physical host, things can be pretty quick, but that requires that one can get everything needed in one box.
While merchant silicon has come a long way in improving the performance of networking equipment, greater gains are taking place getting CPU's to handle networking tasks better. Both AMD and Intel have improved their architectures to the point where 10 Gbps forwarding is a reality. Intel has developed a specialized library (DPDK Wiki Page) that takes care of a lot of low-level networking functions at high performance.
I am reading Intel virtualization manual where manual says that if bit 6 of EPTP(a VM execution control field) is set, the processor will set the Accessed and dirty bits in relevant EPT entries according to some rules.
I am trying to understand that if processor sets A/D bits in EPT on access and modification of relevant pages how guest Operating will get benefit from this setting as guest Os has no access to EPT. In my understanding A/D bits are used by memory manager of the OS for optimization and swapping algorithms and there is no role of these bits in page walker.
I(being programmer of VMM) have to add code in VMM to search the relevant entry in GPA space and mark the bits accordingly?
If this is the case then how can we say that these bits are set with out the knowledge of VMM?
kvm way of dealing this issue will be a good answer also
In general, the guest OS would not benefit from the access and dirty bits in the EPT from being set. As you stated the guest does not typically have access to the EPT. This is purely for the hypervisor/VMM. It is analogous to the dirty and access bit in a process page table, the process does not use it, only the OS.
With regard to your second question, it is a bit unclear so I'm not sure what you are asking. However, the hardware will mark the access and dirty bits assuming it has been set up correctly, you do not have to do it manually.
Coming from a background of vSphere vm's with vNIC's defined on creation as I am do the GCE instances internal and public ip network connections use a particular virtualised NIC and if so what speed is it 100Mbit/s 1Gb or 10Gb?
I'm not so much interested in the bandwidth from the public internet in but more what kind of connection is possible between instances given networks can span regions
Is it right to think of a GCE project network as a logical 100Mbit/s 1Gb or 10Gb network spanning the atlantic I plug my instances into or should there be no minimum expectation because too many variables exist like noisy neighbours and inter region bandwidth not to mention physical distance?
The virtual network adapter advertised in GCE conforms to the virtio-net specification (specifically virtio-net 0.9.5 with multiqueue). Within the same zone we offer up to 2Gbps/core of network throughput. The NIC itself does not advertise a specific speed. Performance between zones and between regions is subject to capacity limits and quality-of-service within Google's WAN.
The performance relevant features advertised by our virtual NIC as of December 2015 are support for:
IPv4 TCP Transport Segmentation Offload
IPv4 TCP Large Receive Offload
IPv4 TCP/UDP Tx checksum calculation offload
IPv4 TCP/UDP Rx checksum verification offload
Event based queue signaling/interrupt suppression.
In our testing for best performance it is advantageous to enable of all of these features. Images supplied by Google will take advantage of all the features available in the shipping kernel (that is, some images ship with older kernels for stability and may not be able to take advantage of all of these features).
I can see up to 1Gb/s between instances within the same zone, but AFAIK that is not something which is guaranteed, especially for tansatlantic communication. Things might change in the future, so I'd suggest to follow official product announcements.
There have been a few enhancements in the years since the original question and answers were posted. In particular, the "2Gbps/core" (really, per vCPU) is still there but there is now a minimum cap of 10 Gbps for VMs with two or more vCPUs. The maximum cap is currently 32 Gbps, with 50 Gbps and 100 Gbps caps in the works.
The per-VM egress caps remain "guaranteed not to exceed" not "guaranteed to achieve."
In terms of achieving peak, trans-Atlantic performance, one suggestion would be the same as for any high-latency path. Ensure that your sources and destinations are tuned to allow sufficient TCP window to achieve the throughput you desire. In particular, this formula would be in effect:
Throughput <= WindowSize / RoundTripTime
Of course that too is a "guaranteed not to exceed" rather than a "guaranteed to achieve" thing. As was stated before "Performance between zones and between regions is subject to capacity limits and quality-of-service within Google's WAN."
I'm using IT Guru's Opnet to simulate different networks. I've run the basic HomeLAN scenario and by default it uses an ethernet connection running at a data rate of 20Kbps. Throughout the scenarios this is changed from 20K to 40K, then to 512K and then to a T1 line running at 1.544Mbps. My question is - does increasing the data rate for the line increase the throughput?
I have this graph output from the program to display my results:
Please note it's the image on the forefront which is of interest
In general, the signaling capacity of a data path is only one factor in the net throughput.
For example, TCP is known to be sensitive to latency. For any particular TCP implementation and path latency, there will be a maximum speed beyond which TCP cannot go regardless of the path's signaling capacity.
Also consider the source and destination of the traffic: changing the network capacity won't change the speed if the source is not sending the data any faster or if the destination cannot receive it any faster.
In the case of network emulators, also be aware that buffer sizes can affect throughput. The size of the network buffer must be at least as large as the signal rate multiplied by the latency (the Bandwidth Delay Product). I am not familiar with the particulars of Opnet, but I have seen other emulators where it is possible to set a buffer size too small to support the select rate and latency.
I have written a couple of articles related to these topics which may be helpful:
This one discusses common network bottlenecks: Common Network Performance Problems
This one discusses emulator configuration issues: Network Emulators