Which Distribution CDH Vs HDP - cloudera

I worked with CDH a while back (about a year ago) and am planning to start again. Now there are CDH and HDP, and Hortonworks has been acquired by Cloudera.
Is HDP being actively developed? Or is CDH being actively developed?
Which distribution should I get started with?

Cloudera (the company behind CDH) and Hortonworks (the company behind HDP) have merged. The merged company is called Cloudera.
After the merger a new distribution was released, called the Cloudera Data Platform, or CDP for short.
Though both older platforms will still exist for a short while, all new users should go for CDP. This is the platform that is seeing all development focus.
Note that the situation may be more nuanced if your company is already a heavy user of either HDP/CDH but even in those cases the formal recommendation is still to go to CDP as soon as possible.
Full disclosure: I am an employee of Cloudera (formerly Hortonworks), the company behind CDP as well as HDP and CDH.


What is the difference between github enterprise 'cloud' and 'On premise'

We are investigating how to integrate our app with GitHub Enterprise.
There are two different deployment models: 'Cloud' and 'On Premise'.
I have been looking around but couldn't find the differences between the two.
Maybe there is no difference at all.
The basic difference is that GitHub Enterprise Server is software you deploy on a virtual machine you provision and control (on-premise here is a bit of a misnomer since your VM could be in AWS).
GitHub Enterprise Cloud, on the other hand, is an enterprise level of service on GitHub.com.
You'll find more here.

What can be achieved in Corda Enterprise that is not achievable in the Community version of Corda?

We are working with a client who is interested in developing an application using the Corda ledger. During the initial phase, from development through the first production rollout, the client wants to evaluate Corda's capabilities using the Community version. After the first production rollout, once Corda's capabilities have been demonstrated with the client's own customers, they want to turn this into an enterprise solution by procuring a Corda Enterprise license.
I am not getting much help in drawing a clear line between the Community and Enterprise versions of Corda.
What are the essential features which cannot be built using the Community version?
Who governs the Community version?
Is there any support provided for the Community version?
Can we create a distributed architecture using the Community version (Corda nodes located on different physical servers)?
Can we create a Corda network using Docker containers with the Community version?
Is there any detailed document that draws the line between the Community and Enterprise versions?
I have worked with the Community version of Corda to develop a PoC, where all nodes were located on the same server and were not truly distributed.
Corda Open Source and Enterprise are functionally identical. What Enterprise adds are the non-functional capabilities required for mission-critical enterprise applications, including performance, HA, HSM integration, enterprise database integration, 24x7 support, etc.
The community version is developed primarily by R3, and we also accept and encourage community contributions to the Corda Open Source project.
There is no official R3 production support for Open Source Corda. However, you can ask questions and look for solutions to your problems on our public Slack channel (stack.corda.net) and also here on StackOverflow.
You can operate a network of OS Corda with nodes on different servers without any problems.

Using Storm in Cloudera

I have been looking to use Storm which is available with Hortonworks 2.1 installation but in order to avoid installing Hortonworks in addition to a Cloudera installation (which has Spark in it), I tried to find a way to use Storm in Cloudera.
If one can use both Storm and Spark on a single platform then it will save additional resources required to have both Cloudera and Hortonworks installations on a machine.
You can use Storm with a Cloudera installation. You will have to install it on your own and maintain it as such. It will not be part of the Cloudera stack, but that should not stop you from using it along with Hadoop if you need it.
You can use Storm on any vendor platform. However, Storm cluster management is something you have to consider. Storm is not part of the CDH distribution. Cloudera Manager does not manage the lifecycle of the Storm services and configurations, nor does it monitor the Storm cluster, unless you are willing to write a Cloudera Manager extension yourself. In contrast, if you choose a distribution such as HDP, its Ambari management tool provides all of the above management features.
If you have a streaming project on CDH, you should strongly consider Apache Spark first, as it provides the same programming model for both batch and stream processing, so you do not need to learn a new API. However, Spark Streaming is micro-batch: events are collected over a short interval and processed together. Thus, in use cases that require sub-second, low-latency real-time processing, Storm is more suitable.
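To make the latency difference concrete, here is a minimal sketch in plain Python. It does not use the Spark or Storm APIs; the function names, the 1-second batch interval, and the arrival times are all hypothetical. It assumes a micro-batch engine emits results only at the end of each batch window, while a per-event engine handles each event as it arrives with a fixed processing delay.

```python
import math


def micro_batch_latencies(arrival_times, batch_interval):
    """Latency per event when results are emitted only at batch boundaries."""
    latencies = []
    for t in arrival_times:
        # An event is processed at the end of the batch window that contains it.
        batch_end = (math.floor(t / batch_interval) + 1) * batch_interval
        latencies.append(batch_end - t)
    return latencies


def per_event_latencies(arrival_times, processing_delay):
    """Latency per event when each event is handled individually on arrival."""
    return [processing_delay for _ in arrival_times]


# Hypothetical arrival times in seconds.
arrivals = [0.25, 0.5, 1.75, 2.0]
batch = micro_batch_latencies(arrivals, batch_interval=1.0)
single = per_event_latencies(arrivals, processing_delay=0.125)

print("micro-batch:", batch)   # waits for the end of the window
print("per-event:  ", single)  # bounded by the processing delay only
```

In this toy model an event can wait up to a full batch interval before it is processed, which is why a micro-batch engine is a poor fit when you need consistently sub-second latency.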
You can use Storm alongside Cloudera.
All the above are true, but why would you?
Spark includes Spark Streaming, which allows you to handle data processing and stream/event processing workloads using a single API. Spark/Streaming is already inside CDH.
So, why burden yourself with two different APIs?
You can install Apache Storm on the Cloudera VM.
For a basic setup and test run, follow the link below:
https://github.com/vrmorusu/StormOnClouderaVM/wiki/Apache-Storm-on-Cloudera-VM
This should get you started on developing Storm applications on Cloudera VM.

Neo4j replication alternative to Neo4j Enterprise edition?

It seems Neo4j High Availability is only available in the Enterprise edition, which is paid. Is there another way to achieve replication without that module (i.e. without cost)? Thanks for any help!
Update:
This answer has changed. Neo4j is now open core, so the Enterprise code is no longer dual-licensed - only the commercial license option remains.
You can find more details here: https://neo4j.com/open-core-and-neo4j/
Original Answer:
Enterprise is available as quid-pro-quo - if you put your code out under an open source license, then you get access to the open source Neo4j Enterprise free of charge. However, if you are closed source, Neo Tech charges a license fee. This fee is determined by your needs and your ability to pay - if you are a small outfit with no venture capital, it's still free, and then the licensing cost increases as your ability to pay back to the development of Neo4j increases.
If your application is open-source as you mention, then you are free to use Neo4j Enterprise without paying for it, simply download it at neo4j.org.
Actually Neo4j Enterprise is free under the open source AGPLv3 license.
Neo4j Inc can't modify the terms and still call it AGPL.
If you use Neo4j Enterprise as a server (like most people do) and communicate with it via its REST API or any of the official BOLT drivers then you never trigger AGPL's copyleft requirements.
In other words - the software that connects to it does not have to be open sourced.
You can download Neo4j Enterprise open source licensed binaries up to version 3.2.x from dist.neo4j.org. The links for the Windows and Unix packages are below. (Replace the version number for specific versions.)
http://dist.neo4j.org/neo4j-enterprise-3.2.8-windows.zip
http://dist.neo4j.org/neo4j-enterprise-3.2.8-unix.tar.gz
If you want Neo4j Enterprise 3.3.0 and later under its free open source license, you can build it from source like we do for our US government clients, or just grab the binaries from our free distribution site.
Check out the blog post if you want to understand why this has happened.
https://blog.igovsol.com/2017/11/14/Neo4j-330-is-out-but-where-are-the-open-source-enterprise-binaries.html

Alfresco Community Enterprise Feature Comparison

I've seen this question but the answers are simply not good enough. I've searched the web and could not find a clear listing of the main differences.
I am particularly surprised to see contradictions in the above link, that holds only 4 short answers.
So the question is, beyond support, what are (all) the differences between Alfresco Community and Enterprise editions (for the current versions of course)?
Are there functional or technical features that are available in the Enterprise edition but not in the Community edition?
I find it strange that it's so difficult to get a clear list. Looking at the forums to find this answer is not a serious option from a business perspective.
Until now, I found this link to be useful, but it's from 2009.
In particular, I find the platform support interesting, with the Community edition supporting only a LAMP-style stack:
Linux
MySQL
Tomcat
OpenLDAP
Firefox
And the enterprise edition supporting:
Windows
SQL Server
WebLogic, WebSphere
AD/Kerberos
IE and Safari
Apparently, these features are only available in the enterprise edition:
JMX monitoring
Runtime administration: What's that exactly? And what's in the community edition then?
Runtime indexing consistency check and update: What's in the community edition then?
High performance and availability: How is that implemented and what's in the community edition then?
Storage policies
Open source and proprietary technology stack support: which ones exactly? Which ones are supported in the community edition?
If anyone could guide me towards serious documentation about these differences, that would be great.
I also went through the wiki but could not find an answer to my questions in there.
The differences between Enterprise and Community vary in detail from version to version and are mainly visible to administrators. We see or maintain both flavors of Alfresco in midsize to very large environments, and I would say that which edition is best for you is more or less a question of taste and budget. Excellent infrastructure and Java skills are highly advisable for running either edition in production.
The technical differences are not so dramatic that you could not provide very similar functionality to users with either edition. So if you are actually facing this decision, you should focus on a good technical partner, the support services, and maybe the fact that you only get official patches with the Enterprise subscription, not with Community. BTW, Alfresco Enterprise is not Open Source, but this is not a real point of interest for most end users. You can access the code as a subscription customer, but it is not publicly available.
The main differences in features are already named more or less:
Administration
Enterprise has more views and settings in the admin web GUI. In Community you can access most configuration only from the command line. This may be a restriction, but in real life administrators prefer the command line and scripted automation.
Enterprise lets you change some Alfresco settings at runtime (most settings still require a restart). Some can be changed in the GUI and more through the JMX interface. You are also able to stop and start subsystems like the CIFS protocol server. We use this feature to switch a system into read-only mode. This is what is meant by "runtime administration". Community requires a restart of the service for most configuration changes. It is possible to work around this with advanced scripting (for example in Groovy) or by implementing modules.
Indexing
Runtime indexing consistency check and update is not the self-healing functionality you might expect. You will have to learn (at least for now) that you need to recreate the Alfresco index from time to time even in Enterprise environments, and that it is better to focus on good strategies for speeding up recreation or setting up standby indexes than on hunting failed indexing transactions using the check and update methods. For major document model changes you need to recreate the index anyway.
High performance and availability
This is mainly the cluster and replication functionality, which is no longer available in Community. It's similar to MS clusters: it's a lot of work for very little additional availability, since some concepts are missing. The price is high in terms of complexity and can end up costing you robustness. Even with Enterprise support it's a hard job to keep an Alfresco cluster running, so you need very good arguments for going this way. But of course: it's possible and available!
High performance: there shouldn't be any difference, and if there is, I'd be very curious about the explanation.
Technology stack
The main difference is the database support. In Community you can only choose between MySQL and Postgres (no Oracle or MS SQL for Community). All other technologies are independent of Enterprise or Community (AD, Kerberos, OS, browser, ...).
Java container: I believe over 95% of all Alfresco installations run in Tomcat. That's the configuration which is documented, tested and scales. Using WebLogic or WebSphere gives you no added value, only new challenges. Quite the contrary: you have to solve most issues yourself and can't benefit from others' experience.
Storage policies: I'm not entirely sure, and should check in 4.2.x, whether the Content Store Selector / storage policies feature is no longer available in Community, but it was there in the 3.x versions.
[Edit]: storage policies have been removed in Community 4.2.x:
NoSuchBeanDefinitionException: No bean named 'storeSelectorContentStoreBase' is defined
If there is a real need for this functionality, someone could re-enable the feature by coding a module for Community.
Regards
This page explains the difference between the editions:
https://wiki.alfresco.com/wiki/Enterprise_Edition
This page is the canonical, comprehensive list of the differences.
If you are considering an Enterprise Subscription and you have a question that isn't answered by what you can find on that page, you should talk to your account rep.
Well, regarding JMX monitoring:
Runtime administration: Alfresco Enterprise allows you to perform certain actions on Alfresco subsystems without restarting the server. This lets you work very quickly while debugging or developing, and also when making changes in a production environment. You can also access the JMX interface, which supports JMX Remoting.
There is no consistency check or update until you restart the server (during startup you have to validate/check/rebuild your indexes). There is an option in alfresco-global.properties (or the original repository.properties config file) for that. If you have some inconsistencies in the Alfresco Community index, you're gonna have a bad time xD.
Alfresco Enterprise has a specific license for clustering your architecture; the Community edition doesn't support such setups. Replicating and clustering Alfresco is one of the main improvements in performance/scalability/availability you can achieve.
The storage policies allow you to use Content Store selectors in Alfresco Enterprise. You can manage a primary and a secondary file store, and map/connect these stores in your architecture. The Community edition only allows you to use one content store at a time.
The technology stack includes everything inside Alfresco (Spring Framework, Apache Lucene/Solr, Tomcat, and so on), because with the Enterprise license you also get full support for everything inside the Alfresco package. The difference is that Community is based on daily builds, supported by the community, and therefore not guaranteed. Enterprise support helps you resolve many problems that you might encounter during development and in production, not only Alfresco-related ones but also configuration issues on supported platforms (Windows/Linux), your web application servers, and so on.
Hope it helps.
