What is the advantage of using a CPU register for temporary data storage over using a memory location?

What is the advantage of using a CPU register for temporary data storage over using a memory location?

Registers are usually accessed in 1 CPU cycle (~0.3 nanoseconds).
L1 cache access is ~0.5 nanoseconds.
L2 cache access is ~7 nanoseconds.
DRAM access is ~200 nanoseconds.
So you are about 600x faster with registers than working with RAM.
Speeds referenced from here.
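As a rough illustration, here is a minimal C micro-benchmark sketch. The volatile accumulator forces a load/store through memory on every iteration, while the plain one is free to live in a register. Two caveats: the volatile traffic will usually still hit the L1 cache, so the measured gap reflects register-versus-cache rather than the full register-versus-DRAM figure above, and an aggressive optimizer may fold the register loop into a closed form, so treat the numbers as illustrative only.

```c
/* Sketch only: compare a memory-forced accumulator against one the
 * compiler may keep in a register. Results vary by CPU, compiler, flags. */
#include <stdio.h>
#include <time.h>

#define N 100000000UL

int main(void) {
    struct timespec t0, t1;
    volatile unsigned long mem_sum = 0;  /* every += goes through memory */
    unsigned long reg_sum = 0;           /* free to stay in a register   */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long i = 0; i < N; i++) mem_sum += i;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double mem_s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long i = 0; i < N; i++) reg_sum += i;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double reg_s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;

    /* Print reg_sum so the second loop is not trivially dead code. */
    printf("memory: %.3fs  register: %.3fs  (sum=%lu)\n", mem_s, reg_s, reg_sum);
    return 0;
}
```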

Related

Difference between two perf events for Intel processors

What is the difference between the following perf events for Intel processors:
UNC_CHA_DIR_UPDATE.HA: Counts only multi-socket cacheline Directory state updates memory writes issued from the HA pipe. This does not include memory write requests which are for I (Invalid) or E (Exclusive) cachelines.
UNC_CHA_DIR_UPDATE.TOR: Counts only multi-socket cacheline Directory state updates due to memory writes issued from the TOR pipe which are the result of remote transaction hitting the SF/LLC and returning data Core2Core. This does not include memory write requests which are for I (Invalid) or E (Exclusive) cachelines.
UNC_M2M_DIRECTORY_UPDATE.ANY: Counts when the M2M (Mesh to Memory) updates the multi-socket cacheline Directory to a new state.
The above descriptions of the perf events are taken from here.
In particular, if there is a directory update because of a memory write request coming from a remote socket, which perf event (if any) will account for it?
As per my understanding, since the CHA is responsible for handling requests coming from remote sockets via UPI, directory updates caused by remote requests should be reflected by UNC_CHA_DIR_UPDATE.HA or UNC_CHA_DIR_UPDATE.TOR. But when I run a program (described below), the UNC_M2M_DIRECTORY_UPDATE.ANY count is much larger (more than 34M), whereas the other two events have counts on the order of a few thousand. Since no writes are happening other than those coming from the remote socket, it seems that UNC_M2M_DIRECTORY_UPDATE.ANY (and not the other two events) measures the number of directory updates happening due to remote writes.
Description of the system
Intel Xeon Gold 6242 CPU (Intel Cascade Lake architecture)
4 sockets with each socket having PMEM attached to it
part of the PMEM is configured to be used as a system RAM on sockets 2 and 3
OS: Linux (kernel 5.4.0-72-generic)
Description of the program (a hedged C sketch follows the steps below):
Note: use numactl to bind the process to node 2, which is a DRAM node
Allocate two buffers of size 1GB each
Initialize these buffers
Move the second buffer to the PMEM attached to socket 3
Perform a data copy from the first buffer to the second buffer
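A minimal C sketch of those steps, assuming libnuma is available, node 2 is the DRAM node the process is bound to, and node 3 is the PMEM-backed node; this illustrates the shape of the experiment, not the author's actual program (error handling mostly omitted):

```c
/* Compile with -lnuma and run under:
 *   numactl --cpunodebind=2 --membind=2 ./a.out
 * Node numbers are assumptions taken from the description above. */
#include <numa.h>
#include <numaif.h>
#include <string.h>

#define BUF_SIZE (1UL << 30)  /* 1 GiB */

int main(void) {
    if (numa_available() < 0) return 1;

    /* 1. Allocate two 1 GiB buffers on node 2 (the DRAM node). */
    char *src = numa_alloc_onnode(BUF_SIZE, 2);
    char *dst = numa_alloc_onnode(BUF_SIZE, 2);
    if (!src || !dst) return 1;

    /* 2. Initialize the buffers so their pages are actually faulted in. */
    memset(src, 1, BUF_SIZE);
    memset(dst, 2, BUF_SIZE);

    /* 3. Migrate the second buffer's pages to the PMEM node (node 3). */
    unsigned long nodemask = 1UL << 3;
    mbind(dst, BUF_SIZE, MPOL_BIND, &nodemask, sizeof(nodemask) * 8,
          MPOL_MF_MOVE | MPOL_MF_STRICT);

    /* 4. Copy: each write to dst is now a write to the remote socket's
     * memory, which is what should drive the directory updates. */
    memcpy(dst, src, BUF_SIZE);

    numa_free(src, BUF_SIZE);
    numa_free(dst, BUF_SIZE);
    return 0;
}
```

Running this while counting the three uncore events should reproduce the comparison described above.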

Where does System.Runtime.Caching.ObjectCache cache data?

Where does System.Runtime.Caching.ObjectCache cache/store data when it is MemoryCache.Default?
Does it save the data in RAM or in the CPU's L1 cache?
How do I see that cached memory in Task Manager?
Yes, those are in-memory (or in-process) caches, and they store the data in the server's memory (RAM); whether the L1/L2 caches are involved, I have no idea. So if your worker process goes down, or IIS recycles (in the context of ASP.NET), all your cached data is gone.
On the other hand, you can also choose to use a distributed cache mechanism like Redis or Azure Mem Cache, where the data is stored on a separate server instance and not in your server process.
No, it has nothing to do with the processor caches (L1, L2, or others). It is just a caching solution, in the conceptual sense, whose data is held in memory.

Does setting the physical memory limit in iis 7.5 cause the garbage collector to operate more aggressively?

I am currently undertaking load testing of an ASP.NET 4.0 web application hosted on a 64-bit Windows Server 2008 machine (IIS 7.5).
The purpose of the load testing is to determine the maximum memory usage of the web application if every page is cached simultaneously.
To evaluate this, I set the output cache duration of the pages to 900 seconds and then request each publicly accessible URL via Xenu's Link Sleuth. This effectively requests 20,000 or so pages.
To monitor memory usage I am using both Windows performance monitor and Redgate memory profiler 7.0.
I have run the test twice: test 1 with the physical memory limit set to the default of 0, and test 2 with the physical memory limit set to 921,600 KB (900 MB).
Here is what I observed:
In both tests the application pool is never recycled.
In test 1, the worker process memory usage grows to 1,300 MB (above the memory limit set in test 2).
In test 2, memory usage grows to 720 MB.
In test 1, the unused memory allocated to .NET grows to 700 MB.
In test 2, it grows to 150 MB.
This leads me to my question: does setting the physical memory limit in IIS 7.5 cause the garbage collector to operate more aggressively?
If this is not the case what am I witnessing?
To find out whether garbage collection is really being triggered, you could watch the .NET memory performance counters.
At a minimum I'd watch the Gen 0 counter and compare the two scenarios.
http://msdn.microsoft.com/en-us/library/x2tyfybc.aspx

UNIX Domain sockets vs Shared Memory (Mapped File)

Can anyone tell how slow UNIX domain sockets are compared to shared memory (or the alternative, a memory-mapped file)?
Thanks.
It's more a question of design than of speed (shared memory is faster); domain sockets are definitely more UNIX-style and cause far fewer problems. In terms of choice, know beforehand:
Domain Sockets advantages
blocking and non-blocking mode and switching between them
you don't have to free them when tasks are completed
Domain sockets disadvantages
must read and write in a linear fashion
Shared Memory advantages
non-linear storage
will never block
multiple programs can access it
Shared Memory disadvantages
need locking implementation
need manual freeing, even if unused by any program
That's all I can think of right now. However, I'd go with domain sockets any day -- not to mention that it's then a lot easier to reimplement them to do distributed computing. The speed gain of shared memory will be lost because of the need for a safe design. However, if you know exactly what you're doing and use the proper kernel calls, you can achieve greater speed with shared memory.
In terms of speed, shared memory is definitely the winner. With sockets you will have at least two copies of the data: from the sending process to a kernel buffer, then from the kernel to the receiving process. With shared memory, the latency is only bound by the cache-coherency algorithm between the cores on the box.
As Kornel notes, though, dealing with shared memory is more involved, since you have to come up with your own synchronization/signalling scheme, which may add a delay depending on which route you go. Definitely use semaphores placed in the shared memory (implemented with a futex on Linux) to avoid system calls in the non-contended case.
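A minimal sketch of that scheme, assuming Linux with POSIX shared memory; the segment name /demo_shm and the channel layout are made up for illustration (error checks omitted for brevity):

```c
/* A process-shared POSIX semaphore placed inside the shared segment.
 * On Linux, glibc implements it with a futex, so sem_wait/sem_post
 * avoid system calls in the uncontended case. Link with -pthread. */
#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

struct channel {
    sem_t ready;       /* posted by the producer when data is valid */
    char  data[4096];  /* payload exchanged in place, with no copies */
};

int main(void) {
    /* Create the shared segment and size it. */
    int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(struct channel));
    struct channel *ch = mmap(NULL, sizeof(*ch),
                              PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* pshared=1 makes the semaphore usable across processes. */
    sem_init(&ch->ready, 1, 0);

    if (fork() == 0) {         /* child: consumer */
        sem_wait(&ch->ready);  /* blocks on the futex until posted */
        printf("consumer read: %s\n", ch->data);
        _exit(0);
    }

    /* parent: producer writes in place, then signals. */
    strcpy(ch->data, "hello through shared memory");
    sem_post(&ch->ready);

    wait(NULL);
    shm_unlink("/demo_shm");
    return 0;
}
```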
Both are inter-process communication (IPC) mechanisms.
UNIX domain sockets are used for communication between processes on one host, similar to how TCP sockets are used between different hosts.
Shared memory (SHM) is a piece of memory where you can put data and share it between processes.
SHM gives you random access via pointers; sockets can be written to and read from, but you cannot rewind or seek.
@Kornel Kisielewicz's answer is good, IMO. Just adding my own results here for sockets in general, not only Unix domain sockets.
Shared Memory
Performance is very high: no copies, raw access to the data. The fastest access for sure.
Synchronization is needed. The design is not so easy to set up for complex cases.
Fixed size. Growing shared memory is doable, but the memory has to be unmapped first, grown, and then remapped.
The signaling mechanism can be quite slow; see here: Boost.Interprocess notify() performance. This matters especially if you want to do lots of exchanges between processes. The signaling mechanism is also not so easy to set up.
Sockets
Easy to set up.
Can be used on different machines.
No complex synchronisation needed.
Size is not a problem if you use TCP. A simple design with a header containing the packet size, then send the data (see the framing sketch after this answer).
Ping/pong exchange is fast because it can be treated as a hardware interrupt by the OS.
Performance is average: a few copies of data are made.
High CPU consumption compared to shared memory. Socket calls are not that cheap if you use them a lot.
In my tests, exchanging small chunks of data (around 1 MByte/second) showed no real advantage for shared memory. I would even say that ping/pong exchanges were faster using TCP (due to its simple and efficient signaling mechanism). BUT when exchanging large amounts of data (around 200 MBytes/second), I saw 20% CPU consumption with sockets, compared to 3% CPU using shared memory. So it is a huge win for shared memory in terms of CPU, because read and write socket calls are not cheap.
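A sketch of the "header containing the packet size" framing mentioned in the list above, assuming a Unix socketpair standing in for a real connection; the helper names are made up for illustration:

```c
/* Length-prefixed framing over a stream socket: a 4-byte size header,
 * then the payload. Short writes are ignored for brevity on the send
 * side; the receive side handles short reads, which are routine. */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int send_msg(int fd, const void *buf, uint32_t len) {
    uint32_t hdr = len;
    if (write(fd, &hdr, sizeof(hdr)) != sizeof(hdr)) return -1;
    return write(fd, buf, len) == (ssize_t)len ? 0 : -1;
}

static ssize_t recv_msg(int fd, void *buf, uint32_t cap) {
    uint32_t len;
    if (read(fd, &len, sizeof(len)) != sizeof(len) || len > cap) return -1;
    ssize_t got = 0;
    while ((uint32_t)got < len) {  /* stream reads may return short */
        ssize_t n = read(fd, (char *)buf + got, len - got);
        if (n <= 0) return -1;
        got += n;
    }
    return got;
}

int main(void) {
    int sv[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    send_msg(sv[0], "ping", 4);
    char buf[16];
    ssize_t n = recv_msg(sv[1], buf, sizeof(buf));
    return (n == 4 && memcmp(buf, "ping", 4) == 0) ? 0 : 1;
}
```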
In this case, sockets are faster. Writing to shared memory is faster than any other IPC, but writing to a memory-mapped file and writing to shared memory are two completely different things.
When writing to a memory-mapped file, what you write into the mapping has to be "flushed" to the actual backing file (not exactly by you; the flush is done for you). So you copy your data into the shared mapping first, and then it is copied again (flushed) to the actual file, and that is extremely expensive, more than anything else, even more than writing to a socket. You gain nothing by doing that.
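A minimal sketch of the flush being described, assuming a file-backed mapping on a POSIX system; data.bin is a made-up example file:

```c
/* Writes land in the page cache through the mapping; msync() forces
 * them out to the backing file, which is the second cost the answer
 * describes (the kernel also writes back dirty pages on its own). */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.bin", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, 4096);
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    memcpy(p, "hello", 5);    /* first copy: into the shared mapping */
    msync(p, 4096, MS_SYNC);  /* second cost: flush to the file      */

    munmap(p, 4096);
    close(fd);
    return 0;
}
```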

ASP.NET - Single large web request triggers System.OutOfMemoryException - Still have plenty of available memory

Environment:
Windows Server 2003 (32-bit); IIS 6; ASP.NET 2.0 (3.5); 4 GB RAM; 1 worker process
We have a situation where a very large System.Xml.XmlDocument is being loaded into memory and then fed into a compiled XSL transform.
What is happening is that when a web request comes in, the server is sitting in an idle state with 2,500 MB of available system memory.
As the XML DOM is populated, the available memory drops by approximately 500 MB, at which point we get a System.OutOfMemoryException. At this point the system should theoretically still have 2,000 MB of memory available to service the request (according to Perfmon).
The related questions I have are:
1) At what level in the stack is this out-of-memory limit being hit? The OS? IIS? ASP.NET? The worker process? Is this a limit per individual web request?
2) Is this limit configurable somewhere?
3) Why can’t this web request access the full available system memory?
1) I would guess the worker process, but this should be configurable within IIS as a limit on the memory a worker process can use. Another factor is the bitness of your software; e.g., 32-bit has a hard limit of 4 GB, since that is the total address space.
2) Probably, but don't forget that memory fragmentation may get you to out-of-memory faster than you think; e.g., if there is a request for a contiguous 1,000 MB piece of memory, it may not be satisfiable even when that much total memory is free.
3) Have you examined dump data to see what is in memory when the exception gets thrown? If not, there are ways to take a snapshot of the memory to see what it looks like, as this may give you more clues about what is going on.
You are running in a process. A 32-bit process can only access 2 GB of memory by default. This task is sharing the address space with everything else running in the process, so this bit of code does not get the full 2 GB, even if that much is available.
There is also a 3 GB switch in the OS (the /3GB option in boot.ini on 32-bit Windows Server 2003); search MSDN for the details.
But realistically, you need to do this another way, possibly by switching to a SAX-style XML parser.
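To illustrate what a SAX-style (streaming) parser looks like, here is a minimal sketch using Expat in C; the original context is .NET, where XmlReader plays a comparable streaming role, so this is only an illustration of the technique, not the poster's fix:

```c
/* Expat invokes callbacks per element instead of building the whole
 * DOM in memory, so memory use stays proportional to nesting depth,
 * not document size. Compile with -lexpat. */
#include <expat.h>
#include <stdio.h>
#include <string.h>

static void start_elem(void *data, const XML_Char *name, const XML_Char **atts) {
    (void)data; (void)atts;
    printf("element: %s\n", name);  /* handle one element at a time */
}

int main(void) {
    const char *doc = "<root><item/><item/></root>";
    XML_Parser p = XML_ParserCreate(NULL);
    XML_SetStartElementHandler(p, start_elem);
    /* The document can also be fed in chunks from a stream. */
    if (XML_Parse(p, doc, (int)strlen(doc), 1) == XML_STATUS_ERROR)
        fprintf(stderr, "parse error\n");
    XML_ParserFree(p);
    return 0;
}
```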
I'm sure there are some bright heads here who can answer your specific questions, but have you asked yourself whether there is another way to do what you want? I specifically mean that you probably do not want to process a very large XML document; rather, you want to return something to the client. Could you rewrite the code to avoid this XML document altogether, or perhaps avoid loading it all into memory at the same time, and still produce the same end result?
1) Dunno. Check your logs.
2) IIS limits memory divvied out to websites/application pools. Check your settings.
3) Servers are all about uptime; if a single app hogs all the resources, everybody else suffers. That's why enterprise apps like IIS limit memory: to prevent runaways from taking down the entire server.
