As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I work in computational fluid dynamics (CFD), but I don't know MPI very well.
Heavy CFD jobs require InfiniBand support, and people say that MVAPICH is usually much better than other MPI implementations. Is this true? Does anyone have real-world experience or references I can look at? And why would MVAPICH be better than Open MPI and the others? Is it written by an InfiniBand company, or what?
Thanks a lot!
So the answer is "probably not, and it doesn't matter anyway".
You write your code against the MPI API, and you can always install multiple MPI libraries and test against each, as you might with several LAPACK implementations. If one is consistently faster for your application, use it. But the MPI community is very performance-conscious, and the free competitors are all open source, publish their methods in papers, and publish lots of benchmarks. The friendly rivalry, combined with the openness, tends to mean that no implementation has a significant performance advantage for long.
On our big x86 cluster we've done "real world" and micro benchmarks with MPICH2, OpenMPI, MVAPICH2, and IntelMPI. Among the three open-source versions, there was no clear winner: in some cases one would win by 10-20%, in others it would lose by the same amount. On those few occasions where we were interested enough to dig into the details to find out why, it was often just a matter of defaults for things like eager limits or the crossover points between different collective algorithms, and by setting a couple of environment variables we got the performance difference between them down to within noise. In other cases a performance advantage persisted, but it was either not large enough or not consistent enough for us to investigate further.
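For concreteness, this is the kind of knob that was involved. The parameter names below illustrate real Open MPI and MVAPICH2 eager-limit tunables, but exact names, units, and defaults vary by version, so treat these as a sketch and check your installation's documentation before relying on them.

```shell
# Open MPI: MCA parameters can be passed on the mpirun command line
# (enumerate the ones your build knows with `ompi_info --all`)
mpirun --mca btl_openib_eager_limit 65536 -np 256 ./solver

# MVAPICH2: the eager/rendezvous switchover is an environment variable
export MV2_IBA_EAGER_THRESHOLD=65536
mpirun_rsh -np 256 -hostfile hosts ./solver
```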
(IntelMPI, which costs a significant amount of money, was noticeably and mostly consistently faster, although what we consider the big win there was substantially improved startup times for very large jobs.)
MVAPICH was one of the first MPI implementations to really go after InfiniBand performance, building on lots of experience with Myrinet, and it did have a significant advantage there for quite some time; there are probably benchmarks in which it still wins. But ultimately there was no consistent, important performance win, and we went with OpenMPI as our main open-source MPI option.
I would agree with Jonathan regarding the answer, and add a few points from a cluster-administration perspective.
As someone who at times dips into cluster administration, I would add that tuning InfiniBand on a large cluster is not an easy task. You have to make sure that the OFED stack sits well with your kernel, that the hardware is not faulty, that the switches perform as expected without sustained performance issues, that the application maps correctly onto the InfiniBand topology, and lots more.
The OpenMPI stack is considerably different from MPICH/MVAPICH. I find that OpenMPI's component architecture makes it easier to find and debug issues than the more monolithic architecture of MPICH/MVAPICH.
Speaking of vendors, recall that MPICH comes from the MCS division at Argonne.
Update: since version 3.1, MPICH supports OFED InfiniBand via the ib network module. Since version 3.2, MPICH also supports the Mellanox MXM interface.
MVAPICH is built on top of the MPICH sources by people from the Department of Computer Science and Engineering at Ohio State.
Many hardware vendors build on top of either MPICH or MVAPICH to provide InfiniBand support for their respective hardware. One example is Intel MPI; another is Voltaire MPI.
OpenMPI is developed by several teams supported by InfiniBand switch vendors like Cisco.
HP MPI used to be another very good MPI implementation for generic clusters; it is currently available from Platform.
CFD codes don't scale well.
I can't speak directly to MVAPICH2, but I would recommend using whatever MPI is native to your cluster. So if you are using a Cray machine, you would go with Cray's MPI. It works like magic. Using your vendor's recommended MPI makes a significant difference.
To directly answer your question: if your message size falls into the short range, MVAPICH2 has a sweet spot where it beats OpenMPI. I think your CFD codes may fall into this range. On large clusters I have found that something goes wrong with MVAPICH2's latency values when operating on over 2k PEs, but people don't run CFD on 2k PEs.
Ultimately, there is sufficient motivation to test this theory. Which code are you running, OpenFOAM or Fluent?
I need to make an internal application for personal use where I can write with a pen on paper and have it stored digitally.
It must have minimal hardware requirements, to avoid extra cost.
What services and devices are available for building such an application?
Introducing tablets, mobile, or touch-screen devices would overshoot the budget.
How can this be implemented cost-effectively?
Assuming platform and language agnosticism:
Hardware: the Raspberry Pi provides an open-source, bare-bones, inexpensive solution in the $5-30 (USD) range. You would, however, need to purchase appropriate peripherals on an as-needed basis (e.g., camera, enclosure, power source).
Software: the Python programming language features a wealth of robust libraries that would help you build such a computer-vision application without having to learn the nuts and bolts of computer-vision algorithms. I recommend searching Stack Exchange for discussions of OpenCV's applicability to handwriting recognition. Here are some resources to get you started:
OpenCV Documentation
A research paper exploring the idea in depth
Video demonstration
Another academic paper
Remember, there may be many ways to get the job done. Each approach carries merits and faults in terms of performance and practicality, and the question is likely still an area of very active research (machine learning, neural networks). My suggestion is that you weigh your priorities carefully and proceed accordingly (i.e., do you value the learning experience, or getting the job done?). I'll try to tag the question so it might attract more seasoned, precise answers on the software implementation.
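To make the computer-vision angle concrete, here is a toy sketch of the very first step such a pipeline would take: binarizing a captured grayscale frame so ink strokes can be separated from paper. This assumes only NumPy; a real system would use OpenCV functions (e.g., adaptive thresholding), and the tiny array and fixed threshold below are purely illustrative stand-ins for a camera frame.

```python
import numpy as np

# Stand-in for a grayscale camera frame: bright paper (~250),
# dark ink strokes (~30). Values are illustrative.
frame = np.array([
    [250, 248, 252, 251],
    [249,  30,  35, 250],
    [251,  28, 247, 249],
    [250, 252, 250, 248],
], dtype=np.uint8)

threshold = 128               # assumed fixed threshold; real code would
ink_mask = frame < threshold  # adapt it to lighting conditions

# The mask is True wherever the pen has written; downstream steps
# (stroke extraction, recognition) would operate on this mask.
print(int(ink_mask.sum()))  # 3 ink pixels in this toy frame
```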
x86 and AMD64 are the most important architectures for many computing environments (desktops, servers, and supercomputers). Obviously a JIT compiler should support both of them to gain acceptance.
Until recently, the SPARC architecture was the logical next step for a compiler, especially in the high-end server market. But now that Sun is dead, things are not so clear.
Oracle doesn't seem to be really interested in it, and some big projects are dropping support for the architecture (Ubuntu, for example). On the other hand, the OpenSPARC initiative, intended to open-source recent processors, is quite promising, meaning that many manufacturers could implement and use SPARC for free in the near future.
So, is SPARC still a good choice as the next target architecture for a JIT compiler? Or is it better to choose another one (POWER, ARM, MIPS, ...)?
I don't know any more than you about SPARC's future. I hope it has one; it's been tragic how many good architectures have died out while x86 has kept going.
But I would suggest you look at ARM as a target. It isn't present in big server hardware, but it's huge in the mobile market, and it powers all sorts of interesting little boxes, like my NAS and my ADSL router.
Your next target architecture should definitely be ARM: power consumption in large datacenters is a huge issue, and the next big thing will be trying to reduce it by using low-power CPUs; see Facebook's first attempt at this.
I understand the basics of networking, such as LANs. I know what many of the protocols are and how to build a client/server socket program in C. But what I really want is a very good understanding of how networks actually work, not only from a programming perspective but also from an application perspective. I am looking for some material (preferably a book) that will give me a solid foundation to build on. I am torn between becoming a programmer and a UNIX admin, so I really should learn and know how to apply networking fundamentals.
Does such a concise resource exist? Would it be better to go the more academic route by buying a networking book (such as those by Tanenbaum or Kurose), or to go the IT route, looking into network-admin texts or certification books?
Thank you all so much.
Here is the way I would recommend:
Learn how the Internet evolved; this will give you the reasons why it was needed.
Learn the different protocols: HTTP, telnet, SSH, and especially the secure ones such as SFTP and HTTPS.
Learn what sockets are and the types of sockets.
Learn how to do socket programming. I suggest you use Python sockets for this.
Learn the TCP/IP network stack; that will be beneficial.
Learn how routing works; this is important.
Get a sound knowledge of topics like DNS; it is very important.
Get VirtualBox, install various OSes, and try to internetwork them. Play around with the networking stack of each OS.
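The socket-programming step above can be tried in a few lines. This is a minimal sketch of the client/server basics using Python's standard library; the server handles a single connection on a thread in the same process so the whole thing runs without any setup.

```python
import socket
import threading

def serve_one(server_sock):
    """Accept a single client, echo one message back, and finish."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)   # read one message
        conn.sendall(data)       # echo it back unchanged

# Bind to port 0 so the OS picks a free port for us.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

t = threading.Thread(target=serve_one, args=(server,))
t.start()

# The client side: connect, send, and read the echo.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"hello")
reply = client.recv(1024)
client.close()
t.join()
server.close()

print(reply)  # b'hello'
```

The same accept/recv/send pattern underlies every TCP service; the C version with BSD sockets differs only in boilerplate.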
The late Richard Stevens' book is a masterpiece -- much more practical and immediately applicable than Tanenbaum's (I haven't studied Kurose). By the way, from the same author, I just as warmly recommend the books in the "TCP/IP Illustrated" series and Advanced Programming in the Unix Environment -- few books are more crucial to the wanna-be "programmer or Unix admin", save perhaps ESR's!
It's kind of unclear to me what exactly you're looking for, so I'm just going to throw this out there:
Start networking your own stuff together. Create a LAN. Go figure out how to create and manage a Linux firewall instead of a consumer one. Install Active Directory just for grins. Run your own DHCP and DNS servers on that Active Directory server.
Once you get that far, if you're still interested, start thinking about how you would plan your LAN if you had 500 computers. Learn about Virtual LANs (VLANs).
I think networking in particular is a great place to start tinkering, because A) no one gets hurt, and B) it's mostly free.
Whoa... networking is a seriously big field. To truly understand everything would require a PhD, or several.
Here are some of the aspects I think you need to learn.
1) You need to learn the history of networking. Many of the policies built into protocols were made due to the limitations of their time. Learn the history of a protocol to understand the "why" of how it works.
2) Programming is an excellent source of knowledge about how a network works at the lowest level. Learn to write some socket code in C; BSD sockets are a good place to start. You can find a lot of references for BSD sockets on the Internet.
3) *nix commands offer a wealth of knowledge about configuring and managing networks. Good network admins know a lot of tricks for building complicated networking operations using just the most basic network tools. The GNU networking tools are a good place to start.
4) If you're up for it, there are several certifications, like the MCSE and CCNA, which have modules on networking. These papers can be useful for gaining knowledge about a particular type of network. I learnt a lot about Windows NT domain models from sitting for the MCSE paper, even though I never really played around with domains much.
There are more aspects. Ask yourself: which do you like more?
A bit of personal experience:
I have worked as a software developer for 10 years. I am also the "unpaid" network guru in my office. Somehow, I have to wear more than one hat as a developer, because networking is part of the software I work on.
For the fundamentals, you may want to get the W. Richard Stevens classic, TCP/IP Illustrated, and possibly his other books as well. There will not be any more of them, either.
It sounds like the kind of understanding you're looking for is the kind that can only really be reached through experience. Each and every person will have a different way of looking at things, depending on what makes sense to them -- explanations can help, but there's no substitute for learning by actually solving problems.
What would be the optimal programming language (Perl, PHP, Java, Python, or something else) to implement a (multi-threaded?) TCP/IP socket server serving thousands of clients with streaming data?
Using C/C++ with libevent, we were streaming a sustained 800 Mbps to 30,000 active connections (two quad-core processors, seven threads, each running one event loop). Erlang is a reasonable choice too; it is far safer against programmer errors. But it cannot keep pace with event-driven C/C++... been there, and had to rewrite (hint: Erlang is written in C).
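The event-driven pattern libevent embodies (one loop multiplexing many non-blocking sockets) looks like this in any language; here is a minimal sketch using Python's `selectors` module, with a client in the same process driving one round trip. It is a toy, not a production server: a real one would buffer writes and handle partial sends.

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(server):
    """Callback for the listening socket: register the new client."""
    conn, _addr = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, echo)

def echo(conn):
    """Callback for a client socket: echo data, or clean up on EOF."""
    data = conn.recv(4096)
    if data:
        conn.sendall(data)  # a real server would buffer writes
    else:
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen(128)
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)
port = server.getsockname()[1]

# Drive one round trip from a client in the same process.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"stream")
for _ in range(5):              # a few loop turns: accept, then echo
    for key, _mask in sel.select(timeout=0.1):
        key.data(key.fileobj)   # dispatch to the registered callback
reply = client.recv(4096)
client.close()
sel.close()
server.close()

print(reply)  # b'stream'
```

One loop per core, as in the libevent setup above, is how this scales to tens of thousands of connections.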
Python with the Twisted framework:
www.twistedmatrix.com
Java with the xSocket or Apache MINA frameworks (which the Red5 Flash/video streaming media server is based on):
mina.apache.org
xsocket.sourceforge.net
They are all multithreaded, easy, and very powerful.
Erlang, of course :-) But then again, your requirements are not clear ;-)
It was designed from the ground up to handle multi-threaded networking applications. It originated at Ericsson, which uses Erlang in (some of) its networking products.
This doesn't precisely answer the question, but it will help answer future ones. The problem of connecting thousands of clients to the same server is known as the C10K problem. There you will find lots of answers and helpful information about setting up that kind of server.
Based on the sparse information given, I would say either C or Erlang.
What language are you most familiar with? What kind of problem set do you have? A lot depends on these questions. Most popular programming languages have good documentation for socket programming. It depends on taste; I prefer the C programming language. I'm sure some people will also chime in to offer Erlang as a good language to use. Again, it depends.
Apple already sells an optimum multi-threaded streaming media server.
http://www.apple.com/quicktime/streamingserver/
You might be able to buy it and save yourself a lot of work.
I can't tell from your question what you're trying to do, but buying a solution is usually optimal.
I am currently looking at a distributed cache solution.
If money was not an issue, which would you recommend?
www.scaleoutsoftware.com
ncache
memcacheddotnet
MS Velocity
Out of your selection I've only ever attempted to use memcached, and even then not via the C#/.NET libraries.
However, memcached technology is fairly well proven; just look at the sites that use it:
...The system is used by several very large, well-known sites including YouTube, LiveJournal, Slashdot, Wikipedia, SourceForge, ShowClix, GameFAQs, Facebook, Digg, Twitter, Fotolog, BoardGameGeek, NYTimes.com, deviantART, Jamendo, Kayak, VxV, ThePirateBay and Netlog.
I don't really see a reason to look at the other solutions.
One thing that people typically forget when evaluating solutions is dedicated support.
If you go with memcached you'll get none, because you're using completely open-source software that is not backed by any vendor. Yes, the core platform is well tested by virtue of its age, but the C# client libraries are probably much less so. And yes, you'll probably get some help on forums and the like, but there is no guarantee that responses will be fast, and no guarantee you'll get any responses at all.
I don't know what the support for NCache or the ScaleOut cache is like, but it's worth finding out before choosing either. I've dealt with many companies for support over the last few years, and the support is often outsourced to people who don't even work at the company (with no chance of getting to the people who do), which means no chance of getting quality or timely support. On the other hand, I've also dealt with companies that will escalate serious issues to the right people, fix important issues very fast, and ship you a personal patch.
One of those companies is Microsoft, which is one of the reasons that we use their software as our platform. If you have a production issue, then you can rely on their support. So my inclination would be to go with Velocity largely on this basis.
Possibly the most important thing, though, whichever cache you choose, is to abstract it behind your own interface (e.g., ICache), which will allow you to evaluate a number of them without holding up the rest of the development process. This means that even if your initial decision turns out not to work for you, you can switch without breaking much of the application.
(Note: I'm assuming here that all the caches have sufficient features to support what you need from them, and that they all have sufficient and broadly similar performance. This may not be a valid assumption, in which case you'll need to provide more detail in your question as to why not.)
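The abstract-it-behind-your-own-interface advice can be sketched in a few lines. The interface name and its two methods below are illustrative, not taken from any vendor API; the in-memory class is a trivial stand-in for a memcached-, NCache-, or Velocity-backed implementation of the same interface.

```python
from abc import ABC, abstractmethod

class Cache(ABC):
    """Your own cache abstraction; application code depends only on this."""
    @abstractmethod
    def get(self, key):
        ...
    @abstractmethod
    def put(self, key, value):
        ...

class InMemoryCache(Cache):
    """Trivial stand-in; a vendor-backed class would implement the
    same two methods, so swapping vendors touches only this class."""
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store.get(key)
    def put(self, key, value):
        self._store[key] = value

# Application code works against the abstraction, never the vendor API.
cache: Cache = InMemoryCache()
cache.put("user:42", {"name": "Ada"})
print(cache.get("user:42"))  # {'name': 'Ada'}
```

Switching vendors then means writing one new subclass and changing one constructor call, rather than touching every call site.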
You could also add Oracle Coherence to your list. It has both .NET and Java APIs.
From Microsoft: AppFabric
Commercial: NCache
Open source: Riak
We tried a couple; in the end we use the SQL session provider for ASP.NET/MVC. Yes, there is the overhead of the connection to the DB, but our DB server is very fast and the web farm has loads of capacity, so it's not an issue.
I'm very interested in Riak: it has a .NET client, is used by Yahoo, and can be scaled out to many, many servers.