Difference between gLite and OpenNebula - grid

What is the difference between gLite and OpenNebula?
I intend to differences in structure and in usage. When to use the one and when the other?

I think g.lite is a DSL service that offers a download speed that is slower than other forms (maximum of 1.5 Mbps).
Whereas, OpenNebula is a cloud computing toolkit for managing heterogeneous or different distributed database center infrastructures.

gLite is a grid middleware, which -- very simply put -- takes care of (bulk) submission and processing of grid jobs (+ related tasks like scheduling, data handling, monitoring, information service and others).
OpenNebula is a cloud management framework, which -- again put simply -- provides you with virtual machines on demand, based on available templates.
So, you will use gLite if your workload comes in the form of jobs (think scripts), which can run in the target system (gLite is currently only supported in Debian 6 and Scinetific Linux 5 & 6). You will use OpenNebula if you need on-demand virtual machines running your own (or publically available vanilla) system images. You can of course preinstall any software you need in those images.)
I promise to give a better aimed answer if you say more about what your situation is and what you want to achieve. Both products are actually intended to provide different kind of service.

Related

What is missing in JuliaDB to use it as production database in a website backend?

I have some difficulties to understand the pros and the cons of using JuliaDB as a main backend database for a production website.
https://juliadb.org/
My use case is a collaborative data sciences platform. The client request 1 million unique visitors and 100 000 writes per day. Well... I wish so.
Implementing a SQL database mean that I need to "translate" the data science dataframes used for the calculus into SQL and backwards.
In the other hands JuliaDB is an end to end solution.
Regarding the different criterions for a website production database:
Julia natively hase concurrency:
Julia supports three main categories of features for concurrent and
parallel programming:
Asynchronous "tasks", or coroutines Multi-threading Distributed
computing Julia Tasks allow suspending and resuming computations for
I/O, event handling, producer-consumer processes, and similar
patterns. Tasks can synchronize through operations like wait and
fetch, and communicate via Channels.
Multi-threading functionality builds on tasks by allowing them to run
simultaneously on more than one thread or CPU core, sharing memory.
Finally, distributed computing runs multiple processes with separate
memory spaces, potentially on different machines. This functionality
is provided by the Distributed standard library as well as external
packages like MPI.jl and DistributedArrays.jl.
In the other hand, JuliaDB doc tells that they support parallel computing but does not give much details.
Can JuliaDB handle parallel connections and asynchronous operation, making it performant for lots of users using it in parrallel?
From your question it looks that what you need is a massively parallel data ingestion mechanism. You a software architecture that allows to concurrently collect data for huge amount of users.
Perhaps you should look at one of noSQL databases that provide horizontal scaling capability good example could be MongoDB (or perhaps a cloud equivalent such as DynamoDB).
If your data volume and parallelism is even higher you should consider streaming solution such as Apache Kafka.
On the other hand JuliaDB is totally on the other site of processing workflow. Once your massive data gets collected it ends up in an analytic process. In the recent years the most popular tool was the Hadoop stack with Apache Spark used for processing.
JuliaDB brings a new paradigm on the analytics step of data workflow. With this tool you can massively parallelize processing of huge data and hence you should consider JuliaDB as a nice alternative to Spark.

Is CouchDb suitable for use on client desktops (Windows 10)?

I want to support the users of my Windows 10 desktop application with:
Local Data (not having to perform a fetch from the cloud every time they want new data)
Offline support
Replication with a cloud database
There can potentially be multiple users (in the order of 10-100 but not 1000) simultaneously editing the same database. I would run CouchDb as a service (ie. in a separate process to my app).
To achieve the above I am considering installing CouchDb on each client desktop PC (all replicating to a single main cloud CouchDb instance) together with my application to achieve the above goals.
One of the reasons I am pursuing this line of thinking is that it would allow my application code to mostly just be written in a manner that interacts with local data and the sync/replication (which is probably quite complex) can be taken care of by CouchDb.
I am using CouchDb as a replacement for something that I often see is done by sqlite but I really want the replication ability of CouchDb (which sqlite does not have).
Is the above a scenario in which I can expect CouchDb to perform well or is there something that I am not considering?
We have multiple clients doing this successfully. This is a recommended use-case for CouchDB. The limiting factor will be the cloud vm configuration, but 100s to 1000s of clients should be no problem on a decent vm setup.
CouchDB is sometimes called "A replication protocol with a database attached to it." This sounds appropriate for your use case. However, installing CouchDB (or any external database/service), particularly as part of a client application, isn't necessarily trivial. That doesn't mean it should be avoided--it's just complexity to consider when making a choice.
You might consider lighter-weight options for your client. PouchDB is the go-to solution for web apps, and often mobile apps, as it syncs with CouchDB, but has much lower resource requirements (at the cost of not being multi-tenant). For a desktop app, that may be appropriate.
Ultimately, whether CouchDB (or PouchDB, or PouchDB Server, etc) is appropriate for your use case depends on which trade-offs you're willing to make.

What is the recommended way to interface with a CorDapp via an API?

I am designing a CorDapp, which would require user input as well as API integration, and I am considering various approaches to expose flows and vault queries to the outside world.
Default option seems to be to use Corda RPC. Unless I missed something, there are only Java bindings for it, which is effectively restricting the clients to only be JVM-based. This is somewhat inconvenient, and ideally I would like something like OpenAPI to make it more open and implementation-agnostic.
Another option is to use some kind of Corda RPC to OpenAPI proxy. I know about Braid, and I'm sure there are others. Braid seems to support deployment as a Corda service packed together with the flows into the CorDapp itself, effectively making it running embedded into the Corda JVM.
Braid can be deployed as a standalone proxy too, which I suppose is option three.
Instinctively I find the embedded mode more attractive, as it reduces the number of moving parts, as opposed to a standalone mode. However, I am concerned that such model may be in fact become discouraged at some point, either because Corda developers consider it to be a misuse of services facility, or because some organisations will not be keen to deploy such code onto their nodes, especially when they may be running multiple CorDapps. I would imagine anything deployed as part of Corda JVM would at least require more scrutiny due to potential impact on other things running there, which in turn would reduce the agility.
I wonder what approach to integrate with a CorDapp is actually recommended?
Edit 1: I know it is technically possible to embed a webserver into the node and expose a REST API from there, at least in the current version of Corda (4.3 at the time of writing). The question is more about whether it is a good idea to do so, or not, and why.
Take a look at the question I had asked on Stackoverflow regarding front end for CordApp. This might be of some help.
Following is the link -
"Corda: Can we develop Dapps that will be run by IIS webserver to talk to Corda platform?"
You can use any front-end technology you want.
As of Corda 3, your backend must be JVM-based, for two reasons:
You need to load various flow, state and other class definitions onto
the classpath to pass as arguments to flows, retrieve objects from the
vault, etc.
You need to use the CordaRPCClient library to create an
RPC connection to the node
If you really need to write your back-end
in another language, there are a few workarounds:
Create a thin Java webserver that sits between your main webserver and
the node. The Java webserver translates HTTP requests from the main
webserver into RPC calls to the node, and RPC responses from the node
into HTTP responses to the main webserver
This is the approach taken
by libraries such as Braid
Use a library such as GraalVM to compile
non-JVM languages to JVM bytecode
An example of writing a JVM
webserver in Javascript using GraalVM is available here:
https://github.com/nitesh7sid/cordapp-example-nodejs-server-graalvm

How does the Realm Mobile Platform scale?

You could say I am a fan of the Realm Mobile Platform. I'm using it and it seems to be working well.
However I am confused with how to operate it going to production. It seems to be deployed only to one server, and even the professional and enterprise editions are working on my single server.
Assuming Realm have thought of this (as Enterprise edition supports 'enterprise scaling) - how does this work if all clients point to my owned server URL?
Another question is how to monitor the load on that server.
Thanks!
The Professional Edition and the Enterprise Edition emit statsd compatible metrics which allow you to track the usage and load on each node in a Realm Object Server cluster. These metrics are also used internally inside the cluster in order to display statistics about the health of the cluster.
We are obviously still adding metrics as we understand more about our customer's use-cases, and fine-tuning the ones that we have.
With regards to the way the clustering works, we are currently implementing this according to an iterative process, where we add more and more features, and more and more resilience to the system with every passing day.
Basically, we have a logical load balancer process, which receives the incoming client connections, and then dispatches that to a node inside the cluster. This logical load balancer can be HA'd and LB'd itself as well, just like you would any regular WS connection handler. Handling many connections these days is easy. It's handling the quadratic merge algorithms that is expensive on the Realm Object Server, which is why the clustering is required for deployments at scale.

cluster vs Grid vs Cloud

There are two questions:
1) What is the difference between cluster and Grid
2) What is the Cloud
I am not looking for conceptual definitions,
I found a lot of that by googling but the problem is I still do not get it.
so I believe the answer I seek is different. From what I could re-search online I start to think that
many article writers who is trying to explain this either do not understand this deep enough themselves
or not able to explain their knowledge for an average guy like myself (which is common issue with very technical people).
Just to let you know my level: I am a computer programmer, .NET and LAMP, I can do basic admin on both
Linux flavors and Windows, I have hands on experience with Hyper-V and now researching Xen and XCP
to setup a test cloud based on two computers for learning purposes.
Below info you do not have to read, it is just my current understanding of cluster,grid and cloud it
just to support my two questions because I thought it would help to understand
what kind of mess is in my head right now and what answers I am looking for.
Thank you.
Two computers used for reference in my statements are "A" and "B"
specs for A: 2 core intel cpu, 8GB memory , 500gb disk
specs for "B": 2 core intel cpu, 8GB memory , 500gb disk,
Now I would like to look at A and B roles from Cluster, Grid and from Cloud angle.
Common definitions between Grid and Cloud
1) cluster or Grid are 2 or more computers hooked up together, on hardware level
they are hooked up though network cards and on a software level
it is using some kind of program implementing message passing interface
to make it possible to send commands between nodes.
2) cluster or Grid do NOT combine CPU power or memory between nodes, meaning
that in this simulation a FireFox browser running on A still has only one 2 cores cpu,
8GB memory and 500gb available.
Differences between Grid and Cloud:
1) Cluster only provides fail over part, if A node breaks while FireFox is running
the cluster software will re-start FireFox process on node B.
2) Grid however is able to run a software in parallel on multiple nodes at the same time
provided that software is coded with MPI in mind. It can also lunch any software on any node
on demand (even if it is not written for MPI)
3) Grid is also able to combine different type of
nodes, Linux Server, Windows XP, Xbox and Playstation into one Grid.
Cloud definition:
1) Cloud is not a technical term at all, it is just a short convenient word to describe
a computer of unlimited resources, it can aslo be called a Supercomputer, a Beast, an Ocean or Universe but someone
said "Cloud" first and here we are.
2) Cloud can be based on Grids or on Clusters
3) From technical point of view Cloud is a software to combine hardware resources into one,
meaning that if I install Cloud software on Grid or Cluster then it will combine A and B
and I will get one Cloud like this: 4 core CPU, 16gb memory and 1000gb disk.
edited: 2013.04.02
item 3) was a complete nonsense, cloud will NOT combine resources from many nodes into one huge resource, so in this case there will be no 4 core CPU, 16gb memory and 1000gb cloud.
Grid computing is designed to parcel out large workloads to many participating grid members--through software on each member which is expecting to hear that request for computation or for data, and to reply with it's small piece of the overall puzzle. Applications must be written specifically for this approach to problem-solving. It can be heterogeneous because it's not the OS that matters but the software waiting to hear problem-solving requests.
The expectation of a cluster is that it can run the same executable image across any member node--any node can execute that code--which is what drives its requirement for homogeneity. You can write cluster-aware code which distributes workload throughout the cluster, but again you have to write your code to be cluster aware in order to take advantage of more than the redundancy features of a cluster. As most application vendors do not write cluster-aware code, the simple redundancy feature is all that's commonly used in cluster deployments, but that does not limit the architecture. Clusters can and do share their resources, and can collaborate on tasks simultaneously.
Cloud, as it's commonly defined is neither of these, precisely, but it doesn't preclude them, either. Cloud computing assumes the ability to deploy an application without advanced knowledge of it's underlying operating system, or even control of that operating system, coupled with the ability to expand or reduce the processing and memory footprint available to that application without having to destroy and recreate that environment--all done with enough isolation that the application won't know or be able to know what other applications might be installed or running on it's shared infrastructure, unless that access is approved-of by both application managers.
I would like to answer my question before this is closed as a duplicate because I believe it can be very frustrating to find correct info in regards to clusters,grids and clouds and I think this post can save time for many. If someone wants to challenge it please do so, otherwise I will mark it as answer in 1 week.
1) There are many differences and there are none, it really depends on the technical context but
generally you can connect several nodes and call it a Grid or you can call it Cluster. I would say Grid is a Cluster with extended capabilities, such as ability to connect heterogeneous nodes. Both Grid and Cluster will serve as scale-out platform equally good. From Network Engineer and Programmer perspective the difference in implementation or coding will be pretty big if Gird connects heterogeneous nodes.
2) Now the first question was actually a prelude for second one and I believe it is best answered by
Matt Joyce in this post:
https://stackoverflow.com/a/15286488/2230126
I'll take a crack at it. I have been collecting and saving my notes, scripts, and programs since the year 2002 A.D. This is a chop and paste of my statements over the years. Here is a brain friendly memorization list:
The grid is the hardware and hardware specifications.
a. You plug into the router or switch and setup IP addresses and top-level domains over the internet (which is also known as ICANN).
b. This is like OSI level 1, 2, and 3.
The cluster is the kernel (software ring 0 or 1 if its a virtual type thing going on).
a. The kernel is configured (compiled) to run a network stack that can handle sessions, permission, and account authentication.
b. You set up port to port communications usually over TCP/IP (like in the OSI model).
c. You setup iptables, pf, arp, and other OS level applications or shared objects.
d. You can setup ssh, kerberos, ldap, or some other PKI-database and protocol-socket combo.
e. This is like OSI level 4, 5, and 6.
The cloud is user-space applications.
a. The application processes talk to other application-processes within the cluster.
b. You setup process level permissions (via files, cgroups, and/or user-groups).
c. You setup mysql, redis, riak, Message Brokers, hadoop, apache, nginx, cron, java, haskell, erlang, and etcetera.
d. This is like OSI level 7.
The cloud floats over the cluster that grows from the grid. And actually visually think, cloud in the air, cluster in tree, and grid on the ground. Most of us creative types (which make all these technologies) are visual thinkers that can back it up with mathematical data and code. So always see if you can answer the riddle and correlate technological facsimiles to our physical realm here on Earth.
Intro
Grid, Cluster, and Cloud are three different words that mark their specific time in history. Their definitions have intersecting traits and they are modernly interchangeable. You just need to know when to apply the correct or associated word. For example, I was talking to some older M.D.s (medical doctors) and they wanted to know what the cloud was. So I told them that the cloud was a computer cluster that you rent over the internet. And Bingo, they got the idea within 10 seconds.
I will use a little bit of history in chronological prose.
Grid
The term grid is first used to represent one resource that is repeated across terrestrial landscape or space. The term is frequently used during the distribution of telegraphs where repeaters had to be placed on poles every N radii (plural for radius) to amplify the signal. Another example is the electrical grid that Thomas Edison and Nikola Tesla competitively started spreading around the Earth. Computers got really popular and they soon were expanded across The Grid to replace human telegraph (and telephone) operators.
The Grid is now a bunch of computers that can connect and terminate communication channels. The Grid is an infrastructure of computers that function for one goal which is the run assembly (or binary) code.
Cluster
Farseeing the power of computers and actually witnessing computers win wars (Turing's machine), DARPA (or ARPA which is the U.S.A. Military) stepped in.
DARPA started commissioning universities and colleges to utilize the Grid for multi-plexing communication methods (that use baud and protocols). Universities and colleges started making protocols to separate the different tasks that they wanted to carry out over the Grid and target the computers. That started the modern internet. In-house testing clusters were established in laboratories to simulate the grid. Clusters are great for orchestration. A job can be sub-divided over all or some of the slaves within a cluster. The military utilized the college and university's findings and applied the SOFTWARE to the Grid. There were some gotchas with clusters:
Must be same (or near same) hardware
Must have same operating system
The rules were strict because all the instruction-sets had to be the same passing over the CPUs. Clusters usually had a master and slave type relationship. A Cluster usually ran one unic (or unix) job at a time. Clusters had job-schedulers. Then clusters got more complex because hardware manufacturers started making parallel chip architectures (on top of the Von Neumann arch).
Clusters become more powerful. The Clusters inherited more complexity and people were doing more creative things. Cluster could now do different jobs, tasks, processes, asynchronously processes, synchronized processes, and many more interesting things. One box (or computer node) could run more jobs. Now the Grid could be used for multiple purpose. The rate of software updates on clusters was faster than the actual grid. Clusters were deployed locally on campuses. Clusters started superseding the grid because you could directly produce a public facing stack that out-performed the (national) grid.
My Experience
I went to college during the late 1990s and 2000s and cluster was the word for a physical laboratory of multiple computers working as one virtual computer. Clusters were used for testing. Once your software worked on the cluster, then you could mv (move) it to the production grade Grid. Then I witness network worms and computer viruses control zombie computers. These swarm of zombies could be used as one gigantic virtual cluster used to run commands. Well programmers started DIY (do it yourself) protocols and software like bit-torrent and Napster.
So leaping forward into the future, testing cluster softwares are starting to be replaced by Solaris jails, FreeBSD jails, Linux containers, QEMU, hyper-visors, VMWare, VirtualBox, Vagrant, and Docker.
Cloud
Cloud is a marketing term used to umbrella the hardware of different grids and the software of those clusters. Cloud is one big ubiquitous word used to advertise, promote, and profess all that cluster technology for monetary gains. Cloud is also an effort to wrap all those technologies under one singular word. The Cloud allows multi-tenanted processes to share a gigantic grid. The Cloud maximizes efficiency by sub-dividing the electricity, CPU, RAM, DISK, Electricity, and broadband which gets shared and paid for by consumers. A side effect is that those consumer subscriptions and/or pay-rates started producing profit. The Cloud also allows multiple users to install multiple operating systems that run multiple processes all in the software. So now we have acronyms like IaaS, PaaS, and SasS. The Cloud can replace the start-up cost that was once so darn difficult to fund and bootstrap. The Cloud is a great solution for mock testing your software and building a consumer base for your business.
From another perspective, the Cloud triggers the brain of non-programmers to think a certain way. For example, the human resource department can comprehend and isolate what is presented in-front of them.
So if you got the money, then you can purchase your share of the cloud experience and have easy support along with it. But if you have the skill-set, the time, the quick know-how, and the ability to install your own servers at co-locations, then do that because it is cheaper over the long run.
That is my narrative on the Grid vs Cluster vs Cloud.
I think this link well compared the Cluster and Grid.
As I know, there are some exceptions in the case of Clusters. YARN (Yahoo!) tries to handle mutli-tenancy and distributed scheduling. Also Corona (Facebook) has distributed scheduling.

Resources