Message passing solution - mpi

I am creating an application involving concurrent actors which communicate through pre-specified FIFO message queues (essentially a Kahn process network). Actors do not (MUST not) share memory.
I am relatively inexperienced in this field, and I would like to know whether third-party message-passing libraries (e.g. MPI implementations such as Open MPI) offer significant advantages over Linux message queues, which I am somewhat familiar with.
I do not need to support operating systems other than Linux or languages other than C/C++. The application should take advantage of a multi-processor system; however, the processes will reside on a single computer system and will not be distributed over a network.

Related

File management systems: device drivers and basic file systems

Page 526 of the textbook Operating Systems – Internals and Design Principles, eighth edition, by William Stallings, says the following:
At the lowest level, device drivers communicate directly with peripheral devices or their controllers or channels. A device driver is responsible for starting I/O operations on a device and processing the completion of an I/O request. For file operations, the typical devices controlled are disk and tape drives. Device drivers are usually considered to be part of the operating system.
Page 527 continues by saying the following:
The next level is referred to as the basic file system, or the physical I/O level. This is the primary interface with the environment outside of the computer system. It deals with blocks of data that are exchanged with disk or tape systems.
The functions of device drivers and basic file systems seem identical to me. As such, I'm not exactly sure how Stallings is differentiating them. What are the differences between these two?
EDIT
From page 555 of the ninth edition of the same textbook:
The next level is referred to as the basic file system, or the physical I/O level. This is the primary interface with the environment outside of the computer system. It deals with blocks of data that are exchanged with disk or tape systems. Thus, it is concerned with the placement of those blocks on the secondary storage device and on the buffering of those blocks in main memory. It does not understand the content of the data or the structure of the files involved. The basic file system is often considered part of the operating system.
Break this down into layers:
Layer 1) Physical I/O to a disk requires specifying the platter, track and sector in order to read or write a block.
Layer 2) Logical I/O to a disk arranges the blocks in a numeric sequence, and one reads or writes a specific logical block number that gets translated into the track/platter/sector.
Operating systems generally have support for both logical I/O and physical I/O to the disk. That said, most disks these days do the logical-to-physical translation themselves. O/S support for that is only needed for older disks.
If the device supports logical I/O the device driver performs the I/O. If the device only supports physical I/O the device driver usually handles both the Logical and Physical layers. Thus, the physical I/O layer only exists in drivers for disks that do not do logical I/O in hardware. If the disk supports logical I/O, there is no layer 1 in the driver.
All of the above is what your first quote appears to be addressing.
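The translation from layer 2 (logical block numbers) to layer 1 (physical location) can be sketched as simple arithmetic. This is a minimal illustration that assumes the classic cylinder/head/sector addressing convention with a fixed geometry; the names `Geometry`, `CHS`, `lba_to_chs` and `chs_to_lba` are made up for this example, and real disks with zoned recording do not expose such a uniform layout.

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    heads: int              # tracks per cylinder (one per platter surface)
    sectors_per_track: int  # sectors on each track


class CHS(NamedTuple):
    cylinder: int
    head: int
    sector: int  # sectors are traditionally numbered from 1


def lba_to_chs(lba: int, geo: Geometry) -> CHS:
    """Logical block number -> physical (cylinder, head, sector)."""
    cylinder, rest = divmod(lba, geo.heads * geo.sectors_per_track)
    head, sector0 = divmod(rest, geo.sectors_per_track)
    return CHS(cylinder, head, sector0 + 1)


def chs_to_lba(chs: CHS, geo: Geometry) -> int:
    """Physical (cylinder, head, sector) -> logical block number."""
    return (chs.cylinder * geo.heads + chs.head) * geo.sectors_per_track + (chs.sector - 1)


geo = Geometry(heads=16, sectors_per_track=63)  # assumed example geometry
print(lba_to_chs(0, geo))     # CHS(cylinder=0, head=0, sector=1)
print(lba_to_chs(1008, geo))  # CHS(cylinder=1, head=0, sector=1)
```

Whether this arithmetic runs in the driver or in the disk's own firmware is exactly the distinction drawn above.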
Layer 3) Virtual I/O writes specific bytes or blocks (depending upon the O/S) to a file. This layer is usually handled outside the device driver. At this layer there are separate modules for each supported file system. Virtual I/O requests to all disks using the same file system go through the same module.
Handling Virtual I/O requires much more complexity than simply reading and writing disk blocks. The virtual I/O layer requires working with the underlying disk file system structure to allocate the blocks to a specific file.
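To make the extra work of the virtual I/O layer concrete, here is a toy sketch of the bookkeeping it needs: a per-file list of logical blocks, consulted to turn a (file, byte offset) pair into a logical block number. Everything here (`ToyFileSystem`, `locate`, the 512-byte block size) is hypothetical and kept in memory; real file systems store this mapping on disk in their own structures.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 512  # bytes per logical block (assumed for this sketch)


class ToyFileSystem:
    """Hypothetical in-memory model: tracks which blocks belong to which file."""

    def __init__(self, total_blocks: int) -> None:
        self.free_blocks: List[int] = list(range(total_blocks))
        # file name -> logical block numbers, in file order
        self.block_map: Dict[str, List[int]] = {}

    def locate(self, name: str, offset: int, allocate: bool = False) -> Tuple[int, int]:
        """Translate (file, byte offset) into (logical block, offset inside block)."""
        if allocate:
            blocks = self.block_map.setdefault(name, [])
        else:
            blocks = self.block_map.get(name, [])
        index, within = divmod(offset, BLOCK_SIZE)
        while allocate and index >= len(blocks):
            if not self.free_blocks:
                raise OSError("disk full")
            blocks.append(self.free_blocks.pop(0))  # grow the file by one block
        if index >= len(blocks):
            raise ValueError("offset is past the end of the file")
        return blocks[index], within


fs = ToyFileSystem(total_blocks=8)
fs.locate("a.txt", 0, allocate=True)           # a.txt gets block 0
fs.locate("b.txt", 0, allocate=True)           # b.txt gets block 1
print(fs.locate("a.txt", 600, allocate=True))  # (2, 88)
print(fs.block_map["a.txt"])                   # [0, 2]
```

The second block of `a.txt` is not adjacent to the first, which is why a file-level request cannot be served by block arithmetic alone.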
This appears to be what is referred to in the second quote. What is confusing to me is why it is calling this the "physical I/O" layer instead of the "virtual I/O" layer.
Everywhere I have been, physical I/O and logical I/O have meant the writing of raw blocks to a disk without regard to the file system on the disk.

Intel SGX Threading and vs TCS

I'm trying to understand the difference between SGX threads enabled by TCS and untrusted threading provided by SDK.
If I understand correctly, TCS enables multiple logical processors to enter the same enclave. Each logical processor will have its own TCS and hence its own entry point (the OENTRY field in TCS). Each thread runs until an AEX happens or it reaches the end of the thread. However, these threads enabled by TCS have no way to synchronize with each other yet. At least, there is no SGX instruction for synchronization.
Then, on the other hand, the SGX SDK offers a set of Thread Synchronization Primitives, mainly mutex and condition variable. These primitives are not trusted since they're eventually served by OS.
My question is, are these Thread Synchronization Primitives meant to be used by TCS threads? If so, wouldn't this deteriorate the security? The OS is able to play with scheduling as it wishes.
First, let us deal with your somewhat unclear terminology of
SGX threads enabled by TCS and untrusted threading provided by SDK.
Inside an enclave, only "trusted" threads can execute. There is no "untrusted" threading inside an enclave. Possibly, the following sentence in the SDK Guide [1] misled you:
Creating threads inside the enclave is not supported. Threads that run inside the enclave are created within the (untrusted) application.
The untrusted application has to set up the TCS pages (for more background on TCS see [2]). So how can the TCS set up by the untrusted application be the foundation for trusted threads inside the enclave? [2] gives the answer:
EENTER is only guaranteed to perform controlled jumps inside an enclave’s code if the contents of all the TCS pages are measured.
By measuring the TCS pages, the integrity of the threads (the TCS defines the allowed entry points) can be verified through enclave attestation. So only known-good execution paths can be executed within the enclave.
Second, let us look at the synchronization primitives.
The SDK does offer synchronization primitives, which you say are not to be trusted because they are eventually served by the OS. Let's look at the description of these primitives in [1]:
sgx_spin_lock() and unlock operate solely within the enclave (using atomic operations), with no need for OS interaction (no OCALL). Using a spinlock, you could yourself implement higher-level primitives.
sgx_thread_mutex_init() also does not make an OCALL. The mutex data structure is initialized within the enclave.
sgx_thread_mutex_lock() and unlock potentially perform OCALLs. However, since the mutex data is within the enclave, they can always enforce correctness of locking within the secure enclave.
Looking at the descriptions of the mutex functions, my guess is that the OCALLs serve to implement non-busy waiting outside the enclave. This is indeed handled by the OS, and susceptible to attacks. The OS may choose not to wake a thread waiting outside the enclave. But it can also choose to interrupt a thread running inside an enclave. SGX does not protect against DoS attacks (Denial of Service) from the (potentially compromised) OS.
To summarize, spin-locks (and by extension any higher-level synchronization) can be implemented securely inside an enclave. However, SGX does not protect against DoS attacks, and therefore you cannot assume that a thread will run. This also applies to locking primitives: a thread waiting on a mutex might not be awakened when the mutex is freed. Realizing this inherent limitation, the SDK designers chose to use (untrusted) OCALLs to efficiently implement some synchronization primitives (i.e. non-busy waiting).
[1] SGX SDK Guide
[2] SGX Explained
qweruiop, regarding your question in the comment (my answer is too long for a comment):
I would still count that as a DoS attack: the OS, which manages the resources of enclaves, denies T access to the resource CPU processing time.
But I agree, you do have to design the other threads running in that enclave with the awareness that T might never run. The semantics are different from running threads on a platform you control. If you want to be absolutely sure that the condition variable is checked, you have to do so on a platform you control.
The sgx_status_t returned by each proxy function (e.g. when making an ECALL into an enclave) can return SGX_ERROR_OUT_OF_TCS. So the SDK should handle all threading for you - just make ECALLs from two different ("untrusted") threads A and B outside the enclave, and the execution flow should continue in two ("trusted") threads inside the enclave, each bound to a separate TCS (assuming 2 unused TCS are available).
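The binding of untrusted threads to TCS slots described above can be pictured with a toy model. This is plain Python and does not call the SGX SDK; `ToyEnclave` and its `ecall` method are hypothetical names, and the status values are just strings borrowed from the SDK's constant names. It only illustrates the counting behaviour: each concurrent entry takes one free TCS, and an entry with none free is refused.

```python
import threading

SGX_SUCCESS = "SGX_SUCCESS"
SGX_ERROR_OUT_OF_TCS = "SGX_ERROR_OUT_OF_TCS"


class ToyEnclave:
    """Toy model only: counts free TCS slots; it does not use the SGX SDK."""

    def __init__(self, tcs_count):
        self._free_tcs = threading.Semaphore(tcs_count)

    def ecall(self, body):
        # Entering the enclave binds the calling thread to one free TCS.
        if not self._free_tcs.acquire(blocking=False):
            return SGX_ERROR_OUT_OF_TCS
        try:
            body()
            return SGX_SUCCESS
        finally:
            self._free_tcs.release()  # the TCS is free again after exit


enclave = ToyEnclave(tcs_count=2)
both_inside = threading.Barrier(3)  # threads A, B and the main thread
may_leave = threading.Event()
results = []


def trusted_work():
    both_inside.wait(timeout=5)  # signal that this thread is inside
    may_leave.wait(timeout=5)    # stay inside until told to leave


workers = [
    threading.Thread(target=lambda: results.append(enclave.ecall(trusted_work)))
    for _ in range(2)
]
for w in workers:
    w.start()
both_inside.wait(timeout=5)          # A and B now each hold a TCS
third = enclave.ecall(lambda: None)  # no TCS left for a third entry
may_leave.set()
for w in workers:
    w.join()
after = enclave.ecall(lambda: None)  # slots were released on exit

print(results, third, after)
```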

cluster vs Grid vs Cloud

There are two questions:
1) What is the difference between cluster and Grid
2) What is the Cloud
I am not looking for conceptual definitions;
I found a lot of those by googling, but the problem is I still do not get it,
so I believe the answer I seek is different. From what I could research online, I am starting to think that
many article writers who try to explain this either do not understand it deeply enough themselves
or are not able to explain their knowledge to an average guy like myself (which is a common issue with very technical people).
Just to let you know my level: I am a computer programmer, .NET and LAMP, I can do basic admin on both
Linux flavors and Windows, I have hands on experience with Hyper-V and now researching Xen and XCP
to setup a test cloud based on two computers for learning purposes.
You do not have to read the info below; it is just my current understanding of cluster, grid and cloud. It is
there just to support my two questions, because I thought it would help you understand
what kind of mess is in my head right now and what answers I am looking for.
Thank you.
Two computers used for reference in my statements are "A" and "B"
specs for "A": 2-core Intel CPU, 8GB memory, 500GB disk
specs for "B": 2-core Intel CPU, 8GB memory, 500GB disk
Now I would like to look at A and B roles from Cluster, Grid and from Cloud angle.
Common definitions between Cluster and Grid
1) Cluster or Grid are 2 or more computers hooked up together; on the hardware level
they are hooked up through network cards, and on the software level
they use some kind of program implementing a message passing interface
to make it possible to send commands between nodes.
2) Cluster or Grid do NOT combine CPU power or memory between nodes, meaning
that in this simulation a Firefox browser running on A still has only one 2-core CPU,
8GB memory and a 500GB disk available.
Differences between Cluster and Grid:
1) Cluster only provides the failover part: if node A breaks while Firefox is running,
the cluster software will restart the Firefox process on node B.
2) Grid, however, is able to run software in parallel on multiple nodes at the same time,
provided that the software is coded with MPI in mind. It can also launch any software on any node
on demand (even if it is not written for MPI).
3) Grid is also able to combine different types of
nodes (Linux server, Windows XP, Xbox and PlayStation) into one Grid.
Cloud definition:
1) Cloud is not a technical term at all, it is just a short convenient word to describe
a computer of unlimited resources; it can also be called a Supercomputer, a Beast, an Ocean or Universe, but someone
said "Cloud" first and here we are.
2) Cloud can be based on Grids or on Clusters
3) From technical point of view Cloud is a software to combine hardware resources into one,
meaning that if I install Cloud software on Grid or Cluster then it will combine A and B
and I will get one Cloud like this: 4 core CPU, 16gb memory and 1000gb disk.
edited: 2013.04.02
item 3) was complete nonsense: cloud will NOT combine resources from many nodes into one huge resource, so in this case there will be no 4 core CPU, 16gb memory and 1000gb cloud.
Grid computing is designed to parcel out large workloads to many participating grid members--through software on each member which is expecting to hear that request for computation or for data, and to reply with its small piece of the overall puzzle. Applications must be written specifically for this approach to problem-solving. It can be heterogeneous because it's not the OS that matters but the software waiting to hear problem-solving requests.
The expectation of a cluster is that it can run the same executable image across any member node--any node can execute that code--which is what drives its requirement for homogeneity. You can write cluster-aware code which distributes workload throughout the cluster, but again you have to write your code to be cluster aware in order to take advantage of more than the redundancy features of a cluster. As most application vendors do not write cluster-aware code, the simple redundancy feature is all that's commonly used in cluster deployments, but that does not limit the architecture. Clusters can and do share their resources, and can collaborate on tasks simultaneously.
Cloud, as it's commonly defined, is neither of these, precisely, but it doesn't preclude them, either. Cloud computing assumes the ability to deploy an application without advance knowledge of its underlying operating system, or even control of that operating system, coupled with the ability to expand or reduce the processing and memory footprint available to that application without having to destroy and recreate that environment--all done with enough isolation that the application won't know or be able to know what other applications might be installed or running on its shared infrastructure, unless that access is approved by both application managers.
I would like to answer my question before this is closed as a duplicate, because I believe it can be very frustrating to find correct info in regards to clusters, grids and clouds, and I think this post can save time for many. If someone wants to challenge it please do so; otherwise I will mark it as the answer in 1 week.
1) There are many differences and there are none, it really depends on the technical context but
generally you can connect several nodes and call it a Grid or you can call it a Cluster. I would say a Grid is a Cluster with extended capabilities, such as the ability to connect heterogeneous nodes. Both Grid and Cluster will serve equally well as a scale-out platform. From a network engineer's or programmer's perspective, the difference in implementation or coding will be pretty big if the Grid connects heterogeneous nodes.
2) Now the first question was actually a prelude for second one and I believe it is best answered by
Matt Joyce in this post:
https://stackoverflow.com/a/15286488/2230126
I'll take a crack at it. I have been collecting and saving my notes, scripts, and programs since the year 2002 A.D. This is a chop and paste of my statements over the years. Here is a brain friendly memorization list:
The grid is the hardware and hardware specifications.
a. You plug into the router or switch and setup IP addresses and top-level domains over the internet (which is also known as ICANN).
b. This is like OSI level 1, 2, and 3.
The cluster is the kernel (software ring 0 or 1 if it's a virtual type thing going on).
a. The kernel is configured (compiled) to run a network stack that can handle sessions, permission, and account authentication.
b. You set up port to port communications usually over TCP/IP (like in the OSI model).
c. You setup iptables, pf, arp, and other OS level applications or shared objects.
d. You can setup ssh, kerberos, ldap, or some other PKI-database and protocol-socket combo.
e. This is like OSI level 4, 5, and 6.
The cloud is user-space applications.
a. The application processes talk to other application-processes within the cluster.
b. You setup process level permissions (via files, cgroups, and/or user-groups).
c. You setup mysql, redis, riak, Message Brokers, hadoop, apache, nginx, cron, java, haskell, erlang, and etcetera.
d. This is like OSI level 7.
The cloud floats over the cluster that grows from the grid. And actually visually think, cloud in the air, cluster in tree, and grid on the ground. Most of us creative types (which make all these technologies) are visual thinkers that can back it up with mathematical data and code. So always see if you can answer the riddle and correlate technological facsimiles to our physical realm here on Earth.
Intro
Grid, Cluster, and Cloud are three different words that mark their specific time in history. Their definitions have intersecting traits and they are modernly interchangeable. You just need to know when to apply the correct or associated word. For example, I was talking to some older M.D.s (medical doctors) and they wanted to know what the cloud was. So I told them that the cloud was a computer cluster that you rent over the internet. And Bingo, they got the idea within 10 seconds.
I will use a little bit of history in chronological prose.
Grid
The term grid is first used to represent one resource that is repeated across terrestrial landscape or space. The term is frequently used during the distribution of telegraphs where repeaters had to be placed on poles every N radii (plural for radius) to amplify the signal. Another example is the electrical grid that Thomas Edison and Nikola Tesla competitively started spreading around the Earth. Computers got really popular and they soon were expanded across The Grid to replace human telegraph (and telephone) operators.
The Grid is now a bunch of computers that can connect and terminate communication channels. The Grid is an infrastructure of computers that function for one goal, which is to run assembly (or binary) code.
Cluster
Foreseeing the power of computers and actually witnessing computers win wars (Turing's machine), DARPA (or ARPA which is the U.S.A. Military) stepped in.
DARPA started commissioning universities and colleges to utilize the Grid for multi-plexing communication methods (that use baud and protocols). Universities and colleges started making protocols to separate the different tasks that they wanted to carry out over the Grid and target the computers. That started the modern internet. In-house testing clusters were established in laboratories to simulate the grid. Clusters are great for orchestration. A job can be sub-divided over all or some of the slaves within a cluster. The military utilized the college and university's findings and applied the SOFTWARE to the Grid. There were some gotchas with clusters:
Must be same (or near same) hardware
Must have same operating system
The rules were strict because all the instruction-sets had to be the same passing over the CPUs. Clusters usually had a master and slave type relationship. A Cluster usually ran one unic (or unix) job at a time. Clusters had job-schedulers. Then clusters got more complex because hardware manufacturers started making parallel chip architectures (on top of the Von Neumann arch).
Clusters became more powerful. The Clusters inherited more complexity and people were doing more creative things. A Cluster could now do different jobs, tasks, processes, asynchronous processes, synchronized processes, and many more interesting things. One box (or computer node) could run more jobs. Now the Grid could be used for multiple purposes. The rate of software updates on clusters was faster than on the actual grid. Clusters were deployed locally on campuses. Clusters started superseding the grid because you could directly produce a public-facing stack that out-performed the (national) grid.
My Experience
I went to college during the late 1990s and 2000s and cluster was the word for a physical laboratory of multiple computers working as one virtual computer. Clusters were used for testing. Once your software worked on the cluster, then you could mv (move) it to the production-grade Grid. Then I witnessed network worms and computer viruses control zombie computers. These swarms of zombies could be used as one gigantic virtual cluster to run commands. Well, programmers started DIY (do it yourself) protocols and software like BitTorrent and Napster.
So leaping forward into the future, testing cluster softwares are starting to be replaced by Solaris jails, FreeBSD jails, Linux containers, QEMU, hyper-visors, VMWare, VirtualBox, Vagrant, and Docker.
Cloud
Cloud is a marketing term used to umbrella the hardware of different grids and the software of those clusters. Cloud is one big ubiquitous word used to advertise, promote, and profess all that cluster technology for monetary gains. Cloud is also an effort to wrap all those technologies under one singular word. The Cloud allows multi-tenanted processes to share a gigantic grid. The Cloud maximizes efficiency by sub-dividing the electricity, CPU, RAM, disk, and broadband, which get shared and paid for by consumers. A side effect is that those consumer subscriptions and/or pay-rates started producing profit. The Cloud also allows multiple users to install multiple operating systems that run multiple processes all in the software. So now we have acronyms like IaaS, PaaS, and SaaS. The Cloud can replace the start-up cost that was once so darn difficult to fund and bootstrap. The Cloud is a great solution for mock testing your software and building a consumer base for your business.
From another perspective, the Cloud triggers the brain of non-programmers to think a certain way. For example, the human resource department can comprehend and isolate what is presented in front of them.
So if you got the money, then you can purchase your share of the cloud experience and have easy support along with it. But if you have the skill-set, the time, the quick know-how, and the ability to install your own servers at co-locations, then do that because it is cheaper over the long run.
That is my narrative on the Grid vs Cluster vs Cloud.
I think this link well compared the Cluster and Grid.
As far as I know, there are some exceptions in the case of Clusters. YARN (Yahoo!) tries to handle multi-tenancy and distributed scheduling. Also Corona (Facebook) has distributed scheduling.

MPI and OpenMP on Desktop CPUs

I was just wondering how it is possible that OpenMP (shared memory) and MPI (distributed memory) can run on normal desktop CPUs like an i7, for example. Is there some kind of virtual machine that can simulate shared and distributed memory on these CPUs? I am asking because, when learning OpenMP and MPI, the structure of supercomputers is shown, with shared memory or different nodes for distributed memory, each node with its own processor and memory.
MPI assumes nothing about how and where MPI processes run. As far as MPI is concerned, processes are just entities that have a unique address known as their rank, and MPI gives them the ability to send and receive data in the form of messages. How exactly the messages are transferred is left to the implementation. The model is so general that MPI can run on virtually any platform imaginable.
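The rank-and-message model can be imitated in a few lines. The following is a toy stand-in, not MPI: `ToyComm`, `send`, `recv` and `ring_sum` are names invented here, the "ranks" are Python threads, and the transport is an in-memory FIFO queue per rank. The worker logic only ever talks in terms of ranks and messages, so the transport could be swapped without touching it.

```python
import queue
import threading


class ToyComm:
    """Toy stand-in for a communicator: one FIFO mailbox per rank. Not MPI."""

    def __init__(self, size):
        self.size = size
        self._mailboxes = [queue.Queue() for _ in range(size)]

    def send(self, value, dest):
        self._mailboxes[dest].put(value)

    def recv(self, rank):
        return self._mailboxes[rank].get(timeout=5)


def ring_sum(size):
    """Pass a token once around a ring; each rank adds its own rank number."""
    comm = ToyComm(size)
    result = []

    def worker(rank):
        if rank == 0:
            comm.send(0, dest=1 % size)     # start the token
            result.append(comm.recv(rank))  # wait for it to come back
        else:
            token = comm.recv(rank)
            comm.send(token + rank, dest=(rank + 1) % size)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


# 100 ranks work regardless of how many cores the machine has.
print(ring_sum(100))  # 4950
```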
OpenMP deals with shared memory programming using threads. Threads are just concurrent instruction flows that can access a shared memory space. They can execute in a timesharing fashion on a single CPU core or they can execute on multiple cores inside a single CPU chip, or they can be distributed among multiple CPUs connected together by some sophisticated network that allows them to access each other's memory.
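As a rough analogue of that thread model (OpenMP itself targets C, C++ and Fortran, so this is only an illustration in Python), the sketch below has several threads read one shared list and write into one shared result array. The function name `parallel_sum` and the interleaved work split are assumptions of this example; in CPython the global interpreter lock means it demonstrates the memory model, not a speedup.

```python
import threading


def parallel_sum(data, num_threads):
    """Sum `data` with several threads that all see the same memory."""
    partial = [0] * num_threads  # shared; each thread writes only its own slot

    def worker(tid):
        # Static split: thread tid handles elements tid, tid + num_threads, ...
        partial[tid] = sum(data[tid::num_threads])

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)  # combine the per-thread results


data = list(range(1000))
print(parallel_sum(data, 4))   # 499500
print(parallel_sum(data, 64))  # 499500, even with more threads than cores
```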
Given all that, MPI does not require that each process executes on a dedicated CPU core or that millions of cores should necessarily be put on separate boards connected with some high-speed network; performance does, as well as technical limitations. You can happily run a 100-process MPI job on a single CPU core: performance would be very, very bad, but it will still work (given enough memory is available). The same applies to OpenMP: it does not require that each thread is scheduled on a dedicated CPU core, but doing so gives the best performance.
That's why MPI and OpenMP are called abstractions - they are general enough that the execution hardware can vary greatly while source code is kept the same.
A modern multicore-CPU-based PC is a shared-memory computer. It is a sensible approximation to think of each core as a processor, and that they all have equal access to the same RAM. This approximation hides a lot of details of processor and chip architectures.
It has always (well, perhaps not always, but for almost as long as MPI has been around) been possible to use message-passing (of which MPI is one standard) on a shared-memory computer so that you can run the same MPI-enabled program as you would on a genuinely distributed-memory machine.
At the application level a programmer only cares about calls to MPI routines. At the systems level, on a cluster or supercomputer, the MPI run-time translates these calls into instructions to send stuff over the interconnect. On a shared-memory computer it could instead translate these calls into instructions to send stuff over the internal bus.
This is by no means a comprehensive introduction to the topics you've raised, but that's what Google and all the published sources out there are for.

MPI overhead in shared memory setup

I want to parallelize a program. It's not that difficult with threads working on one big data structure in shared memory.
But I want to be able to distribute it over a cluster, and I have to choose a technology to do that. MPI is one idea.
The question is: what overhead will MPI (or another technology) have if I skip implementing a specialized version for shared memory and let MPI handle all cases?
Update:
I want to grow a large data structure (game tree) simultaneously on many computers.
Most parts of it will be on only one cluster node, but some of it (the irregular top of the tree) will be shared and synchronized from time to time.
On shared memory machine I would like to have this achieved through shared memory.
Can this be done generically?
All the popular MPI implementations will communicate locally via shared memory. The performance is very good as long as you don't spend all your time packing and unpacking buffers (i.e. your design is reasonable). In fact, the design imposed upon you by MPI can perform better than most threaded implementations because the separate address space improves cache coherence. To consistently beat MPI, the threaded implementations have to be aware of the cache hierarchy and what the other cores are working on.
With good network hardware (like InfiniBand) the HCA is responsible for getting your buffers on and off the network so the CPU can do other things. Also, since many jobs are memory bandwidth limited, they will perform better using, e.g. 1 core on each socket across multiple nodes than when using multiple cores per socket.
It depends on the algorithm. Clearly, inter-node communication across a cluster is orders of magnitude slower than shared memory, whether as inter-process communication or multiple threads within a process. Therefore you want to minimize inter-node traffic, e.g. by duplicating data where possible and practicable, or by breaking the problem down in such a way that minimizes inter-node communication.
For 'embarrassingly' parallel algorithms with little inter-node communication it's an easy choice - these are problems like brute-force searching for an encryption key, where each node can crunch numbers for long periods and report back to a central node periodically, but no communication is required to test keys.
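The key-search case can be sketched to show why it needs so little communication: the key space is cut into disjoint chunks up front, each worker scans its own chunk in isolation, and the only message is the final report. In this minimal sketch the "nodes" are threads in one process standing in for cluster nodes, and `split_keyspace`, `search_chunk`, `brute_force` and the integer "keys" are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_keyspace(total_keys, num_nodes):
    """Divide range(total_keys) into num_nodes contiguous, disjoint chunks."""
    base, extra = divmod(total_keys, num_nodes)
    chunks, start = [], 0
    for node in range(num_nodes):
        stop = start + base + (1 if node < extra else 0)
        chunks.append(range(start, stop))
        start = stop
    return chunks


def search_chunk(chunk, is_correct_key):
    """Work done by one node: test its own keys, talking to nobody."""
    for key in chunk:
        if is_correct_key(key):
            return key
    return None


def brute_force(total_keys, num_nodes, is_correct_key):
    chunks = split_keyspace(total_keys, num_nodes)
    # Threads stand in for cluster nodes; each reports back exactly once.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        reports = pool.map(lambda c: search_chunk(c, is_correct_key), chunks)
        found = [key for key in reports if key is not None]
    return found[0] if found else None


SECRET = 48611  # made-up key for the demonstration
print(brute_force(100_000, 4, lambda k: k == SECRET))  # 48611
```

A problem like the shared top of a game tree does not split this cleanly, which is where the communication cost discussed above starts to matter.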

Resources