What is the difference between google's dataflow and talend ? - bigdata

I am trying to see through the technical performance of both technologies and put up a benchmark for that reason.
Could you guys help me with that ?
Thank you

It will be tricky to do that, google is serverless tech, and charges on a job/per minute basis and therefore in essence runs on there own servers. Where talend although can run aws and support a type of serverless tech this is not out of the box and would take sometime to come up with sensible comparisons at a deep technical level. For myself, if you don't need to get deep and dirty for performance and have relatively simplistic data transformations then I would tend to favor Google (if you don't mind lock-in and the price!) but otherwise I would go for Talend (good to be a java centric development house, and takes more time to make performant etc) Anyway that would be my take on it.

Related

Can meteor.js web framework support a social networking architecture effectively?

So I'm new to node.js, javascript frameworks, and meteor.com. I'm trying to learn how to build social networks, and I'm naive/struggling to understand why Meteor.js (meteor.com) wouldn't be able to do all the great things you see now that twitter, facebook, instagram are doing?
There's the comet technology between client/server, authentication configs, asynchronous coding for scaling and performance, and built on top of node.js.
I'm trying to learn more about long polling, comet, gridFS or how files are stored, and in general things like replication sets, and sharding to help with performance (esp since Redhat has this openshift platform that we can build our own private clouds with).
I have some computer science background, but it seems like magic, so what am I missing? If you all could think of a few buzz words that make a social network tick that Meteor.js doesn't support, what would it be?
I hear things about parallel and concurrency (webworkers fixes that in part, no?), websockets, that high level languages like python or java are better off supporting. There's only one to learn my answers, and thats by doing, but thought someone could sway me one way or the other via this thread. Thanks!
This question encompasses a really broad idea and just focusing on using meteor alone would solve this issue. Here are a few points to consider:
I don't think this framework would be a good starting point to learn long-polling, gridFS, etc etc. Meteor aims to be a framework that tends to be more of an ecosystem of packages e.g. you can certainly roll your own aformentioned strategies -- however for dynamic updates, Meteor uses its own Data Delivery Protocol (DDP) supported/implemented by (surprise) a good bunch of core packages such as Spark.
Parallel processing and concurrency can be better off done using other languages, but why not with? Since Meteor is largely based on node.js, and node.js is really good with the aforementioned stuff plus it can play very well with other languages so you could integrate smoothly. Meteor doesn't really require you purely rely on it, as other languages would say the same thing. It's all in the general engineering / planning for your project. There are already lots of really good stuff out there that rely on Meteor, join in! don't be afraid. It all boils down to planning (and the courage/perseverance to pull it off, of course).
Right now, we cannot tell if Meteor would be incapable of the usual great stuff by gigantic websites. Sure, we can do live updates, (its own kind of) publish/subscribe patterns, and powerful stuff to boost development (look at the seven core concepts of meteor to best understand this). It is not impossible to replicate what is already out there, really. We can only say it with uncertainty at the moment mainly because.. (see next point)
The framework is so young! it's still at 0.6.x at the time of writing. Please take time to look at the Meteor Roadmap to see how things are going in terms of broader support for persistence/databases, performance considerations, and the official DDP specification.
I hope I have answered your enquiry (and more, I hope). I'm really excited for meteor myself as it could easily be the next big thing. We have a couple of (for-)production projects using Meteor as well, so you're getting direct insight from a person who has done quite a bit of hacking (and tons of research and first-hand experience) in Meteor. Not that i'm saying i'm an expert or so, it's just so much fun to work with Meteor and i'm totally not kidding you.
Hope this helps!
P.S.: Fair warning though, resources and documentation is really sparse at this point. I try to contribute to the community as much as I can about it (one of my starting points is here, on SO).

When to choose LAMP over ASP.NET?

A friend wants to start a dating website, she wants me to help her. We still haven't discussed on what platform it'll be developed, but I'm thinking she'll suggest LAMP to save a buck (which is one reason already to chose over ASP.NET already). If the dating website does well, it'll potentially hold a large amount of data (I'm not sure if this would be another reason to consider either ASP.NET or LAMP).
Anyway, I ask this from an ASP.NET developer point of view. I have very little, almost null experience with LAMP, and I don't like it very much either, so if she decides to go with PHP odds are I won't help her. So what would be some good points to bring up when deciding which platform to develop on?
Please be objective, I don't want this to be argumentative or anything, try to stick to facts, not opinions alone.
Thanks!
What generally matter in that kind of choice is :
How much time will it require ?
How much money will it cost
Which is often linked to the time ^^
If you have a lot of experience with .NET and none with Linux/Apache/PHP/MySQL, choosing LAMP will mean that you'll need much more time : a whole lot of new stuff to learn.
It'll also mean that your code will probably not be as good as it would be with what you know.
After, the question is : do a couple of week "cost" more than a few licences ?
Only you and her can decide, there ;-)
If LAMP makes you queasy, you can try ASP.NET over Mono.
IMO the only good reason to move away from a programming environment that you are already experienced with is the one you already mentioned: cost.
You would use LAMP specifically to build appliances. If you're not building appliances, the software cost for ONE server is marginal, and is not worth the tradeoff for moving to a totally different development environment, IMO.
I think the first question is: Which is the target programming language and environment that you have experience with?
Imagine the site will become a success - how do you scale then? LAMP can scale, and so can WISC, but in both scenarios you need people who actually know the environment and who can secure it. If you don't know Linux and MySQL and PHP, how are you going to scale and secure it?
So even though LAMP may be significantly cheaper (The SQL Server license is the heavy part in the WISC stack), after the first hacker attack or downtime, that initial savings may seem marginal compared to the damage.
The other thing is of course the PHP vs. ASP.net/C# decision. If you don't know PHP, then it's a decision of "Not having the application at all" and "Having the application on an expensive stack", unless your partner of course decides to hire someone else to develop that.
Technically, both have their pros and cons, but there are huge websites built on both stacks, so it really boils down to "Which platform can you reliably/comfortably setup and maintain?"
I agree with Pascal. Go with what you feel comfortable with in completing the project and don't forget that YOUR TIME EQUALS MONEY. You have to put a $$ value on your time. LAMP may be cheaper up front but if it winds up taking 1000 extra manhours, then suddenly it's more expensive.
Also take into account the lost opportunity cost in not being able to bring something to market b/c you chose a technology you were not familiar with.
At the end, if the plans are for this to be a business that is successful, the cost of using ASP.NET should be negligible or else I would question the seriousness of the effort.
One argument for the Apache/MySQL/PHP stack is that it's available on most major platforms (Windows/Linux/Mac/BSD/...) and most webhosters provide it as well.
You also find many (as in "huge amounts") of good tutorials, books and other educational stuff about PHP/MySQL.
Apart from that all tools used in the LAMP stack are free (as in "free speech" and also as in "free beer"). ASP.NET is still a proprietary technology owned by Microsoft. I'm not a huge open source fan, but knowing that your tools will remain free to use in any way you want is quite nice.
Of course, if you have no experience with PHP at all and much exp. with ASP.NET it's easier for you to stick with ASP.
If your comfortable with Microsoft products there's nothing to stop you from developing code in .NET and using a free database (however you may need to find/develop a custom database adapter if you are not using free versions of SQL server or Oracle). If you are generating a lot of traffic you can swap out the data layer of your code and invest in a better performing database.
Time costs money and if you can develop a better product both from a user and maintenance/performance perspective it will serve you better in the long run.
Some hosting companies include the OS and flexible contracts so I would make fit from your prespective. The market's pretty competitve for that type of site and there's no point throwing a lot of money at it until you get some useful metrics for your site IMO.
The short answer is: it doesn't matter, unless the site is going to do something so amazingly different that one technology is obviously better suited. And I can't think of anything like that off the top of my head.
A big red flag is: if your friend is concerned about the extra $5/month for asp.net hosting instead of LAMP hosting, then you're probably not going to get paid. Ever.
Caveats aside, be realistic: what is the immediate goal? To get something working, or to design something on the scale of plentyoffish.com or facebook.com? [Facebook.com has about 44,000 servers at the moment]
So, what are the chances of your friend's dating web site exploding to the size where scaling is a concern? For most sites, the answer is "very close to zero" - because of the marketing effort required to drive that much traffic.
Now, what is the revenue stream? Is there any expectation that you will get paid to do this? Do you think the site will be profitable? Is the project fully funded?
Friendship is great, but don't let that keep you from asking the appropriate business and client-relationship questions. One sure way to ruin a friendship is to do some work for free and/or without thinking through the full extent of the project. Far too often, you think it is a one-time favor, while they think it is your job!
LAMP is only cheaper until you read the fine print. It's not better or worse technically, just different.
The WebsiteSpark/BizSpark programs will get you all the Microsoft software you need to get started, free for three years. If price is her driving concern, point her to those programs if she's willing to consider the ASP.NET platform.
Hosting will cost a fair amount either way, because for a full-service website you don't want to go shared. You'll need at least one dedicated server to support a dating site. The OS and database will be free either way if you go with one of the *Spark programs I mentioned.
As a small startup company you can get a free 3-year MSDN subscription (well, you have to pay $100 at the end of the 3 years). If you think .Net will be more efficient and this website will make money, seriously consider BizSpark.
Since you are looking for dating site, check out Markus Frind of plentyoffish.com he is running the largest dating site on .net platform with asp.net and sql.

Hosting ASP.NET - knowledgeable programmer, hosting noob - Scaling and Speed Question

Apologies for this huge question.....please bear with me and try to help :)
Previous employers have all had in house hosting or people other than me to deal with that side of stuff and all my personal projects (ie low traffic) have been comfortably handled by servergrid.com who allow any number of domains even in their basic package.
I am about to take on more serious projects and have little clue about hosting, the questions to ask and what to look for. Some basic research has been done but I am honestly confused by the number of metrics involved when main thing i care about is SPEED & SCALING.
I have noticed that servergrid db servers for instance shares many 100s DB users/server so I imagine a shared package where your paying just 2$/month for sql server, tho a bargain, is not going to scale beyond a hobby site.
So:
is moving to a dedicated or virtual dedicated server the simple answer to speed and the only real metric I need to worry about?
dedicated pricing is a big jump on servergrid - are there premium shared services that don't put a bazillion people on the server - it doesn't seem obvious from the sites, would it make a huge difference?
the landscape seems to changing in a big way - IIS7 and Server 2008 seem to have all these
features like Isolated Application Pools/ Hyper V, are these just BS hype or things that seriously help with scaling and speed?
Lastly cloud hosting (specifically http://www.rackspacecloud.com) - it runs .NET right, is it fundamentally architecturally different to anything else or just use of the word cloud for marketing? It looks v cool - but is it just normal hosting with a different billing model and a somewhat easier way to scale? Is this similar to the much hyped squarespace hosted blog/site system?
Sorry for my rambling style of question and would be deeply grateful for someone who can just in relatively plain english sweep away some of my basic misconceptions....
Thanks!
Okay, take a look at Amazon Web Services. They are very flexible in terms of infrastructure (both hardware and software) and I find their rates to be ok. Also, their business model revolves around "using" not "leasing" (ie you pay based on what you use, for how long, etc).
I think it's a good starting point.
Since your main concern is "speed" & "scale" you may also take a look at Windows Azure and SQL Azure
Windows Azure
A nice brief video explanation by Steve Marx.
What is Windows Azure
I would stay away from shared hosting for a "more serious" production deployment. Amazon's AWS is as good a place to start as any (rackspace has a similar service which now supports self-provisioning). Failing that, you might carefully evaluate how much scale you really need. If you know how many users you'll have and have any idea what their usage patterns will be, then get dedicated hosting to fit. If the number of your users is unknown and unpredictable, and their usage will be spiky, then go with AWS.
That would be my first-pass approach. YMMV, and it will take time to fine-tune your own approach.

Any experiences with Websphere Integration Developer (WID)?

My company (a large organization) is developing a "road-map" for evolving their rather old, tangled confederation of systems to an SOA model. A few people are pushing hard for using Websphere Integration Developer and Websphere Process Server as the defacto platform for developing future applications...because they feel IBM is a stable vendor, the tools are made for the enterprise, they drank the "business agility" BPEL kool-aid, etc.
Does anyone have positive or negative thoughts on this platform? Do the GUI tools help eliminate monotonous/redundant coding...or just obscure things and make things harder to maintain? Basically, do the benefits justify the complexity?
My experience with the IBM Java tool set is pure pain. Days to install lots of different versions of different components all incompatible with each other, discover a bug in component A get told to update to see if it fixes, updating component A breaks component B and C, get told to update these etc.
I find Eclipse with out the IBM extensions far more stable and quicker and provides more features (as its stable versions are a couple releases ahead of WID/RAD).
I would advise against going the IBM way for development tools. As for process server I have less experience but the people in my team using it seemed to enjoy it as much as I enjoyed WID. not a lot.
So far I havent been impressed by any tools with the "SOA" and/or "BPM" labels on them. My "roadmap" would be very very iterative to see some results with the archetecture as fast as possible while trying to grab some of the easy fruits. That way you gain your feel for what works for you and your people.
I would never let any vendor push me anywhere in the "scuplturing" of the architecture.
I agree with other users complaining about WID. The only reason we are using WID is that a decision was made a while back to use IBM products across the board by our sales department.
That's right, our sales department made the decision to use IBM products.
Development has been painful and frustrating. We have lots of stability problems with Process Server, sometimes it doesn't want to start or shutdown properly. Yeah you can easily draw processes in the IDE, but most any toolset provides that functionality these days. It is nothing special or unique to WID or IBM. IBM is a few iterations behind mainstream.
There are plenty of open source implementations out there that offer great support. Checkout JBoss or RedHat, they are pretty good. If that doesn't float your boat, you can always use Apache tools.
Walter
Developers don't choose WID, WMB, or WPS. Managers do, because IBM is a "stable vendor".
Look at JBoss, or K.I.S.S.
WID/WPS is actually pretty simple. The original intention was for analysts and business people to "compose" services (DO NOT LET THEM DO THIS!) so the UI is simple and easy.
Most of the work will be in defineing and implementing the back end services which depending on the platform will mostly involve wrapping existing code in SOA service.
The most important thing to bear in mind is that SOAP is technoligy and SOA is an architecture and a state of mind.
There is a zen to a succesful SOA implementation. Its all about "business services", if you have a service that you cannot describe to a business user in less than six words you have done it wrong! Ideally the service name alone should be enough to describe the functionality of the service.
If you end up with a service called "MyApp.GetContactData" described as "get name, addresses tel fax etc." then you are there. If You have a service called MyAppGetFaxNoFromOldSys" described as "Retrieve current-fax-nmbr from telephony table in legacy system" you are doomed!
Incidently most of the Websphere tooling for WS* is pretty nice. But I would recommend the very wonderful SAOPUI tool from http://www.eviware.com which is very good for compsing/reading WSDL based messages and also function as a useful test client or server.
Do the GUI tools help eliminate monotonous/redundant coding...or just obscure things and make things harder to maintain? Basically, do the benefits justify the complexity?
As a Developer, I find the tools at varying levels of being bug free. 6.0.1 was a pain, 6.2 is so much better. But once you develop with the tool, there is minimal effort to maintain it. I develop in hours what java developers take days to do. It is also easy to maintain as changes can be made very quickly. I cannot answer your question from the perspective of an architect or a Manager but i would agree with comments of some others here.

What are your experiences with Windows Workflow Foundation?

I am evaluating WF for use in line of business applications on the web, and I would love to hear some recent first-hand accounts of this technology.
My main interest here is in improving the maintainability of projects and maybe in increasing developer productivity when working on complex processes that change frequently.
I really like the idea of WF, however it seems to be relatively unknown and many older comments I've come across mention that it's overwhelmingly complex once you get into it.
If it's overdesigned to the point that it's unusable (or a bad tradeoff) for a small to medium-sized project, that's something that I need to know.
Of course, it has been out since late 2006, so perhaps it has matured. If that's the case, that's another piece of information that would be very helpful!
Thanks in advance!
Windows Workflow Foundation is a very capable product but still very much in its 1st version :-(
The main reasons for use include:
Visually modeling business requirements.
Separating your business logic from the business rules and externalizing rules as XML files.
Separating your business flow from your application by externalizing your workflows as XML files.
Creating long running processes with the automatic ability to react if nothing has happened for some extended period of time. For example an invoice not being paid.
Automatic persistence of long running workflows to keep resource usage down and allow a process and/or machine to restart.
Automatic tracking of workflows helping with business requirements.
WF comes as a library/framework so most of the time you need to write the host that instantiates the WF runtime. That said, using WCF hosted in IIS is a viable solution and saves a lot of work. However the WCF/WF coupling is less than perfect and needs some serious work. See here http://msmvps.com/blogs/theproblemsolver/archive/2008/08/06/using-a-transactionscopeactivity-with-a-wcf-receiveactivity.aspx for more details. Expect quite a few changes/enhancements in the next version.
WF (and WCF) are pretty central to a lot of the new stuff coming out of Microsoft. You can expect some interesting announcements during the PDC.
BTW keeping multiple versions of a workflow running takes a bit of work but that is mostly standard .NET. I just did a series of blog posts on the subject starting here: http://msmvps.com/blogs/theproblemsolver/archive/2008/09/10/versioning-long-running-workfows.aspx
About visually modeling business requirements.
In theory, this works quite well with a separation of intent and implementation. However, in practice, you will drop quite a few extra activities on a workflow purely for technical reasons, and that sort of defeats the purpose as You have to tell a business analyst to ignore half the shapes and lines.
Related question: When to use Windows Workflow Foundation? My answer there:
You may need WF only if any of the
following is true:
You have a long-running process.
You have a process that changes frequently.
You want a visual model of the process.
For more details, see Paul Andrew's
post: What to use Windows Workflow Foundation for?
Please do not confuse or relate WF
with visual programming of any kind.
It is wrong and can lead to very bad
architecture/design decisions.
So, if you have such requirements, then WF is a good candidate. Of course it is relatively complex, but mention that the problems that is trying to solve is also complex (and sometimes very complex). IMHO, it is very complex for example to dehydrate/rehydrate objects that have event handlers attached (with events that can be triggered when the object is not in memory).
I can not judge what you mean by "small to medium-sized project", but in general I would say that if your project has at least two requirements from the above list, then you can consider WF as a solution.
We've used WF in a large-ish SharePoint application and I can say it's OK. It has lots of power and flexibility. and, as Kevin mentions, once you grok the underlying concepts of workflows, you can do pretty much anything you want with it.
On the other hand, it has some really serious issues, like lack of versioning, which can really hurt your application in the future. We've been forced to deploy up to 3 parallel versions of the same workflow named xxx-v1, xxx-v2 and xxx-v3 to keep older instances running and have new instances use the updated versions. A real pain in the ass. Oh, and there are also some really non-intuitive concepts in there (correlation tokens, wtf??)
We had a project at work that I was involved in using Workflows.
The idea (from management), was that us programmers would write the Workflow Activities along with the "engine" and framework. Then non-programmers would take care of all the rest by compiling their own Workflows into dlls which the engine would automatically load.
Management was sold on this idea of non-programmers using Workflow to help develop software, and it was pretty much a complete waste of time. The problem we were trying to solve with this project was relatively complex and we knew from the very beginning that the software would have to be modified almost constantly (its calculations were dependent on other companies and governements).
The end result was that we were unable to make the Workflow modules generic enough for anyone else to use. So the programmers were the ones who were forced to work with the Workflows, and all the Workflows did was get in our way.
I've been using Workflow 4.0 for the last few months and although mostly impressed, I've found it extremely hard to learn.
For the most recent version (that comes with .NET 4.0 RC), there is next-to-no documentation on the web, in any books or no training courses available. I've only found articles relating to the now defunct 3.0 version. Even the MSDN documenation is light on the ground.
The workflow designer is not as intuitive as it should be by any means so learning is very hard. I've had to rely on answers from a single person on StackOverflow (thanks by the way Maurice!) - and I would be stuffed without his help.
So in summary, I think it has potential but you would be quite mad to learn it yet - wait for more training, documentation and books otherwise you will be going into it blind!
Last year we completed a working application with WF, now used as the backbone of an unbelievably huge system which is used by a very big bank for its mortgage process. The pe process has many steps starting from customer application to approval of credit.
Although it was a success, there were so many problems and crisis all along the way. And it wont worth the trouble for any smaller size projects.
I consider MS WF as a low-level workflow library rather than a fully fledged enterprise workflow product such as K2. It will enable you to build a workflow enabled application, but is not in itself a workflow application. My experiance of it in this capacity has been positive, although we have had to build a lot of our own infrastructure around it (a pub/sub framework, a worlkflow lifetime manager etc). A lot of the documentation out there is fairly simplistic and does not cover building up an enterprise workflow application based on MS WF.
Hard to learn. Quite flexible. Not to be confused with a visual tool for end users, only for programmers. Not sure if I like the dependancy property approach.
It really depends on what you want to do with it. I've only used it a little, but compared to more mature products like MetaStorm (I know technically it's a BPM, but there is still a workflow component), Process Choriographer and IBM MQ workflow, there's no comparison. It's just not mature enough. On the other hand it's free where the others are not and can probably get the job done. I don't know if I would place a multi-million dollar operation on it, but with smaller ones, I'd give it another shot. The real hurdle you are going to face is the change in thought process it requires. If you don't have developers that have worked with state systems before that can be a real hurdle.
Brian, I can't reply to your comment, but anyway, by versioning i mean making changes to the underlying code of the workflow without breaking already running instances, and gracefully applying updates to existing workflows. I'm not sure about 'stock' WF, but at least in SharePoint environment there's no concept of workflow versions so new versions have to be deployed as completely different workflows which becomes a maintenance nightmare.
This has nothing to do with 'rehydration', rehydration is the process by which you bring a 'dormant' workflow back to activity after some event or change in state. That is handled transparently by the workflow runtime.
WF ist integrated into SharePoint (WSS 3.0), and i have created quite a few workflows for various SharePoint-Websites, so i can tell about my experience of WF in SharePoint. Compared with other workflow-frameworks WF scores well. It's stable (i haven't experienced any mysterious errors), workflows are fairly easy to design (thank to the workflow-designer in Visual Studio) and you can use not only sequential but also state-machine workflows.
It's not perfect, of course, and a developer will definitly need some time to understand the concept (of i.e. the Activity Model); but it's definitely useable - even for "small tasks".
Never tried WFF, but I remember reading this article about WFF by Leon Bambrick where he basically says the whole genre of software development tools is nonsense. Might help you decide one way or the other.

Resources