Optimize mathematical library (libm) - math

Has anyone tried compiling glibc with -march=corei7 to see if there's any performance improvement over the version that ships by default with Linux x86_64 distributions? GCC is compiled with -march=i686. I think (not sure) that the mathematical library is also compiled the same way. Can anybody confirm this?

Most Linux distributions for x86 compile using only i686 instructions, but ask the compiler to schedule them for later processors. I haven't really followed later developments.
A long while back different versions of system libraries according to processor lines were common, but the performance differences were soon deemed too small for the cost. And machines got more uniform in performance meanwhile.
One thing that has to be remembered always is that today's machines are memory bound. I.e., today a memory access takes a few hundred times longer than an instruction, and the gap is growing. Not to mention that this machine (an oldish laptop, was top-of-the-line some 2 years back) has 4 cores (8 threads), all battling to get data/instructions from memory. Making the code run a tiny bit faster, so the CPU can wait longer for RAM, isn't very productive.

Related

Out-of-memory-error on Minecraft Server with 16G RAM

Please excuse my inexperience, this is my first time on the site.
I have a Dell PowerEdge r710 with 2 Xeon L5630 CPUs and 16G RAM installed. I'm trying to host a Minecraft 1.7.10 Forge Server that runs perfectly fine on my Desktop, but refuses to run properly on the server.
This machine is running Java 8, and runs perfectly otherwise. When running the application without the mods, it loads up without a hitch. As I add more mods, it gets worse. As far as my (very, very limited) knowledge goes, the order of JVM arguments doesn't matter, and didn't on my Desktop, but in order to get the application to even run I had to change the order in my .bat file. With all mods installed, the Out Of Memory Error occurs with a chunk loading error when around 41% spawn loaded.
This is the .bat file that I've made to start the server:
java -jar minecraft_server.jar -Xms512M -Xmx8192M nogui -XX:+HeapDumpOnOutOfMemory
This should load up perfectly fine, everything is compatible and tested on another machine, but the exact same setup will not run on the r710, saying Out Of Memory with more than double the desktop's allocated memory.
First, you should use Task Manager or a similar utility to make sure that the Java process is indeed using more than the amount you allocated with your arguments. Then I would recommend reading through this lovely post written by Cpw and posted on Reddit. If it doesn't help you with your current situation, it should at least give you a bit more information on Minecraft's memory footprint.
In a normal situation where you would be running Minecraft as a local server from your computer I would suggest taking a look at how much memory your GPU is taking up. Since you are running a server this is not relevant, but might still be useful to someone who stumbles upon this post so I will leave it here:
Your graphics card is probably the biggest address hog. Today's graphics adapters often contain a gigabyte or more of RAM, and every one of those bytes needs an address. To be fair, I doubt that many of those multi-gigabyte graphics cards are in 32-bit PCs, but even a 512MB video card will take a sizeable bite out of 4GB.
I am not quite familiar with running dedicated servers but another important thing that is worth mentioning is that in case you are on a 32-bit operating system you will only be able to take advantage of 4GB of your RAM due to architecture constraints.
Every byte of RAM requires its own address, and the processor limits the length of those addresses. A 32-bit processor uses addresses that are 32 bits long. There are only 4,294,967,296, or 4GB, possible 32-bit addresses.
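The arithmetic in the quoted passages can be checked directly. This is only a minimal Python sketch: the function name is made up for illustration, and the 512MB reservation is simply the figure from the quote above (real systems also reserve address space for other devices).

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits

GIB = 2 ** 30
MIB = 2 ** 20

total = addressable_bytes(32)
print(total)         # 4294967296
print(total // GIB)  # 4

# Hypothetical example: a 512MB video card mapped into the same 32-bit space
video = 512 * MIB
left_for_ram = total - video
print(left_for_ram / GIB)  # 3.5
```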
If all else fails you should try to seek help on one of the available Discord channels dedicated to Minecraft modding. This should be a rule in general actually, especially for general purpose problems that are difficult for others to reproduce. Here is a small list of three Discord communities dedicated to Minecraft modding that I have experience with:
Modded Minecraft - The one with most traffic so it can be a bit more difficult for your question to get noticed on busy days, but definitely the best moderated one from this list.
Modding Help - The smallest of the three. I don't have much experience with this one.
Mod Dev Cafe - This one has a decent size and a pretty good response rate, but be prepared for the usual facepalms and other unpleasantness common to younger admins and moderators. However if you are willing to look past that this is a good choice.

Will R take advantage of 64GB memory on Mac Pro running OSX?

I've been using R 3.1.2 on an early-2014 13" MacBook Air with 8GB and a 1.7GHz Intel Core i7, running OS X Mavericks.
Recently, I've started to work with substantially larger data frames (2+ million rows and 500+ columns) and I am running into performance issues. In Activity Monitor, I'm seeing virtual memory sizes of 64GB, 32GB paging files, etc. and the "memory pressure" indicator is red.
Can I use the "throw more hardware at it" approach for this problem? Since the MacBook Air tops out at 8GB physical memory, I was thinking about buying a Mac Pro with 64GB memory. Before I spend the $5K+, I wanted to ask if there are any inherent limitations in R other than the ones that I've read about here: R Memory Limits, or if anyone who has a Mac Pro has experienced any issues running R/RStudio on it. I've searched using Google and haven't come up with anything specific about running R on a Mac Pro.
Note that I realize I'll still be using 1 CPU core unless I rewrite my code. I'm just trying to solve the memory problem first.
Several thoughts:
1) It's a lot more cost-effective to use a cloud service like https://www.dominodatalab.com (not affiliated). Amazon AWS would also work; the benefit of Domino is that it takes the work out of managing the environment so you can focus on the data science.
2) You might want to redesign your processing pipeline so that not all your data needs to be loaded in memory at the same time (soon you will find you need 128 GB, and then what?). Read up on memory mapping, using databases, separating your pipeline into several steps that can be executed independently of each other, etc. (googling brought up http://user2007.org/program/presentations/adler.pdf). Running out of memory is a common problem when working with real-life datasets, and throwing more hardware at the problem is not always your best option (though sometimes it really can't be avoided).
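The second suggestion, not holding all the data in memory at once, can be sketched in a few lines. The example below is in Python rather than R and is only an illustration of the idea, not R-specific advice; the function name, column name, and in-memory sample data are all made up for the example.

```python
import csv
import io

def streaming_mean(lines, column):
    """Mean of one numeric column, reading one row at a time."""
    reader = csv.DictReader(lines)
    count = 0
    total = 0.0
    for row in reader:
        # Only the current row is held in memory, never the whole table
        total += float(row[column])
        count += 1
    return total / count if count else float("nan")

# Stand-in for a large file on disk; with a real file, pass open(path, newline="")
sample = io.StringIO("id,value\n1,10.0\n2,20.0\n3,60.0\n")
print(streaming_mean(sample, "value"))  # 30.0
```

Memory use here stays roughly constant no matter how many rows the input has, at the cost of only supporting computations that can be expressed as a single pass over the data.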

R Performance Differential (Solaris vs Windows)

I noticed an interesting problem. If I run the following code in R 2.12.0 (32-bit) on a Windows machine with a 3.00 GHz Core 2 Duo CPU and 2GB of RAM, it runs in less than one second. If I run it on a Unix box with sparc-sun-solaris2.10 (also 32-bit, though the Unix box could run 64-bit), it takes 84 seconds. The processor speed of the Unix box is 2.5 GHz. If I run top while the code is running, I notice that my R process takes only up to ~3.2% of available CPU states, even if more are available. Could this be part of the problem? I read the install manual, but nothing jumped out at me as the obvious solution to my problem. Is the Unix operating system somehow limiting available resources while Windows is not? Or is there some preferable way to compile R from source that was not done? I apologize if I have not given enough information to answer the problem; this is not really my area of expertise.
t0 <- proc.time()[[3]]
x <- rnorm(10000)
for (i in 1:10000) {
  sd(x)
}
print(proc.time()[[3]] - t0)
Processors such as the T1 or T2 have a number of cores, and each core has a number of strands (hardware-level context switching). If you can run a multithreaded application, you'll get a large throughput. A typical intended use case would be a Java based web server, processing e.g. 20-40 connections at the same time.
The downside of this type of processor is that the single-threaded performance of these SPARC chips is quite low. It looks like Oracle is aware of the issue; the current development on T4 focuses on improving the single-threaded speed.
The T1 processor exposes 32 logical CPUs to the operating system. If this was your case, and the displayed value was the percent of total computing power, 1/32 ~= 3.125%, which is close to what you saw.
To squeeze all the performance from a T1 processor, you need to make R use multiple CPUs, for example via the multicore package.
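The 1/32 figure above generalizes: if the tool reports usage as a percentage of all logical CPUs combined, a single fully busy thread can never show more than 100 divided by the number of logical CPUs. A tiny Python sketch of that arithmetic (the function name is hypothetical):

```python
def single_thread_share(logical_cpus: int) -> float:
    """Largest percentage of total CPU one fully busy thread can show."""
    return 100.0 / logical_cpus

print(single_thread_share(32))  # 3.125
print(single_thread_share(8))   # 12.5
```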

CPU usage different?

I have a basic question.
If I run an executable file (Release, Visual Studio 2010) on two computers with the same CPU speed running two different Windows operating systems, e.g. Windows 7 vs XP, should I expect to see different CPU usages when I measure it using the Task Manager? Is the CPU speed the only factor in measuring the CPU usage?
Thanks.
Sar
Different OSes? Yes.
Operating systems are the go-between for the programs you run and the bare metal they run on. As OSes change and evolve, they naturally add and remove features that consume resources. These may be things that run in the background, or they could be changes to the manner in which the OS speaks to the hardware.
Also, the measurement of CPU usage is done by the OS. There isn't a tachometer on chips saying "running at 87% of redline", but rather that "tach" is constructed largely by the OS.
After better understanding your situation: I would suggest taking a look at the Performance Monitor (perfmon.exe) which ships with both XP and Win7, and gets you much finer-grain detail about processor usage levels. Another (very good) option would be to consider running a profiler on your application on both OSes and compare the results. It would likely be the best option to specifically benchmark your application on both OSes.
Even on the same OS you should expect to see different usages, because there are so many factors that determine CPU usage.
The percentage of CPU usage listed in the task manager is not a very good indication of much of anything, except to say that a program either is, or is not using CPU. That particular statistic is derived from task switching statistics, and task switching is very sensitive to basically every single thing that's going on in a computer, from network access to memory speed to CPU temperature.
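The point made in the answers above, that a CPU usage figure is derived by the OS from its own accounting rather than read off the chip, can be illustrated with a small Python sketch. It compares the CPU time the OS charges to the current process against elapsed wall-clock time. The helper names are made up for the example, and this is not how Task Manager itself computes its number.

```python
import time

def cpu_utilization(work, *args):
    """Return cpu_seconds / wall_seconds for one call of work(*args)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0

def busy(n):
    # Pure computation: keeps the CPU occupied
    total = 0
    for i in range(n):
        total += i * i
    return total

def idle(seconds):
    # Sleeping: wall-clock time passes but almost no CPU time is charged
    time.sleep(seconds)

print(cpu_utilization(busy, 2_000_000))
print(cpu_utilization(idle, 0.2))
```

The busy call should report a ratio near 1 on an otherwise quiet machine and the idle call a ratio near 0, but the exact values depend on whatever else the OS is scheduling at the time, which is the reason the same program can show different usage on different systems.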

Does hyperthreading lead to unstable systems?

I'm building a PC with the new Intel i7 quad-core processor. With hyperthreading turned on, it will report 8 cores in Task Manager.
Some of my colleagues are saying that hyperthreading will make the system unreliable and suggest turning it off.
Can any of you good people enlighten me and the rest of the Stack Overflow users?
Follow-on: I've been using hyperthreading constantly, and it's been spot on. No instability whatsoever. I'm using:
Microsoft Server 2008 64 bit
Microsoft SQL Server 2008 64 bit
Microsoft Visual Studio 2008
Diskeeper Server
Lots of controls (Telerik, Dundas, Rebex, Resharper)
Stability isn't likely to be affected, since the abstraction is very low level and the OS just sees it as another CPU to provide work to. However, performance is another matter.
In all honesty I can't say if this is still the case, but at least when the HT-enabled CPUs first came out, there were known problems with at least some applications. For example, MySQL, and multi-threaded apps like the Java application I support for my day job were known to have decreased performance when HT was enabled. We always recommended it be removed, at least for our particular use case of a server-side enterprise application.
It's possible that this is no longer an issue, and in a desktop environment this is less likely to be a problem for most use cases. The ability to split work on the CPU generally would lead to more responsive applications when the CPU is heavily utilized. However, the context switching and overhead could be a detriment when the app is already heavily threaded and CPU-intensive, such as in the case of a database server.
Off the top of my head I can think of a few reasons your colleagues might say this.
There are several articles about SQL performance suffering under hyperthreading. I believe it winds up doing too much context switching or cache thrashing; I can't remember exactly.
Early on, going from single-proc to multi-proc (or, more likely for most people, hyperthreaded procs) brought many threading issues into the open: race conditions, deadlocks, etc., that they never saw before. Even though it's a code problem, some people blamed the procs.
Are they making the same claims about multi-core/multi-proc or just about hyperthreaded?
As for me, I've been developing on a hyperthreaded box for 4 years now; the only problem has been a UI deadlock issue of my own making.
Hyperthreading will mainly make a difference in scheduler behaviour and performance when threads are dispatched to the same CPU as opposed to different CPUs.
It will show up in a badly coded application that does not handle race conditions between threads.
So it is usually bad design or code that suddenly finds a failure mode.
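The kind of latent bug these answers describe, a race condition that only surfaces once threads really run side by side, can be sketched as a shared counter updated with and without a lock. This is a minimal Python illustration with made-up names, not code from any of the applications mentioned. In CPython the unsynchronized version may well produce the right total on a given run, which is exactly why such bugs can stay hidden until the hardware or scheduler changes.

```python
import threading

def run_counter(n_threads, n_increments, use_lock):
    """Increment a shared counter from several threads; return the final value."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(n_increments):
            if use_lock:
                with lock:
                    counter += 1
            else:
                # Unsynchronized read-modify-write: updates may be lost
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

expected = 4 * 50_000
print(run_counter(4, 50_000, use_lock=True) == expected)  # True
print(run_counter(4, 50_000, use_lock=False), "of", expected)
```

Only the locked variant is guaranteed to reach the expected total; the second line may or may not fall short depending on the interpreter version and timing.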
Unreliable? I doubt it. The only disadvantage of hyperthreading that I can think of is that if the OS is not aware of it, it might schedule two threads on one physical processor while other physical processors are idle, which will degrade performance.
There was a problem with SQL Server and hyperthreading for some queries, because SQL Server has its own scheduler; setting MAXDOP to 1 would solve that.
To whatever degree Windows is unstable, it's highly unlikely that hyperthreading contributes significantly (or it would have made big news by now.)
I've had a hyperthreading PC for a couple years now. Not that many cores, but it's worked fine for me.
Wish I had test data to prove your colleagues wrong, but it sounds like it's just my opinion versus theirs at this point. ;)
The threads in a hyperthreaded CPU share the same cache, and as such don't suffer from the cache consistency problems that a multiple cpu architecture can. Though, if the developer of a piece of software is programming with multiple cpus in mind, they will (or should) be writing with read semantics (iirc, that's the term). i.e. all writes are flushed from the cache immediately.
As far as I know, from the OS's point of view, it doesn't see hyperthreading as any different from having actual multiple cores. From the OS's point of view, there is no difference - it's isolated.
So, aside from the fact that hyperthreading's "extra cores" aren't "real" (in the strictly technical sense) and don't have the full performance of "real" CPU cores, I can't see that it'd be any less reliable. Slower, perhaps, in some rare instances, but not less reliable.
Of course, it depends on what you're running - I suppose some applications might get "down & dirty" with the CPU and hyperthreading might confuse them, but that's probably pretty rare.
I myself have been running a PC with hyperthreading for several years now, and I have seen no stability problems.
Sorry I don't have more concrete data!
I own an i7 system, and I haven't had any issues.
If it works w/ multiple cores, it works with hyperthreading.
The short answer: yes.
The long answer, as with almost every question, is "it depends". Depends on the OS, the software, the CPU revision, etc. I have personally had to disable hyperthreading on two occasions to get software working properly (one, with the Synergy application, and two, with the Windows NT 4.0 installer), but your mileage may vary.
As long as you get Windows installed detecting multiple HT cores from the beginning (it loads some relevant drivers and such), you can always disable (and re-enable) HT "after the fact". If you have bizarre stability issues with specific software that you can't resolve, it's not hard to disable HT to see if it has any impact.
I wouldn't disable it to start with because, frankly, it will probably work fine in 99.99% of your daily use. But be aware that yes, it can occasionally cause bizarre behaviors, so don't rule it out if you happen to be troubleshooting something very odd down the road.
Personally, I've found that hyperthreading, while not causing any problems, doesn't actually help all that much either. It might be like having an extra 0.1 of a processor. On my HT machine at work, I very seldom see my CPU go above 50%. I don't know if HT has gotten any better with newer processors like the i7, but I'm not optimistic.
Other than hearing a few reports about SQL Server, all I can report is positive. I get about 25% better performance on heavy multi-threaded apps with HT on. Have never run into a problem with it, and I'm using a first generation HT processor...
Late to the party, but for future reference:
I'm currently having an issue with this on SQL Server. Basically, my understanding is that hyperthreaded threads on the same processor share the same L1 and L2 cache, which can cause contention between the two. Citrix also appears to have this problem, from what I'm reading.
Slava Ok wrote a good blog post on it.
I'm here very late but found this page via Google. I may have discovered a very subtle problem. I have an i7 950 running 2003 Server and it's great. Initially I left hyperthreading on in the BIOS, but during some testing and pushing things hard, I ran a program called "crashme" by Carrette. This program tries to crash an OS by spawning a process and feeding it garbage to try and run. My dual Opteron setup ran it forever without a problem, but the 950 crashed within the hour. It didn't crash for anything else unless I did something stupid, so it was very surprising. On a whim I turned off HT and ran the program again. It ran all night, even with multiple instances of it. One anecdote doesn't mean much, but try it and see what happens. Also, it seems that the processor is slightly cooler at any given load if HT is turned off. YMMV.
