Presto Interpreter in Zeppelin on EMR

Is it possible to add Presto interpreter to Zeppelin on AWS EMR 4.3 and if so, could someone please post the instructions? I have Presto-Sandbox and Zeppelin-Sandbox running on EMR.

There's no official Presto interpreter for Zeppelin, and the conclusion of the Jira ticket raised was that one isn't necessary, because you can just use the JDBC interpreter:
https://issues.apache.org/jira/browse/ZEPPELIN-27
I'm running a later EMR with Presto & Zeppelin, and the default set of interpreters doesn't include jdbc, but you can install it by SSHing to the master node and running:
sudo /usr/lib/zeppelin/bin/install-interpreter.sh --name jdbc
Even better is to use that as a bootstrap script.
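For example, the same one-liner wrapped as a script (a minimal sketch; where you host it, e.g. an S3 bucket, is up to you):
#!/bin/bash
# installs the Zeppelin JDBC interpreter on the EMR master node
sudo /usr/lib/zeppelin/bin/install-interpreter.sh --name jdbc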
Then you can add a new interpreter in Zeppelin.
Click the login-name drop down in the top right of Zeppelin
Click Interpreter
Click +Create
Give it a name like presto; the name determines the directive you use (%presto) on the first line of a paragraph in Zeppelin, unless you set it as the default interpreter.
The settings you need here are:
default.driver com.facebook.presto.jdbc.PrestoDriver
default.url jdbc:presto://<YOUR EMR CLUSTER MASTER DNS>:8889
default.user hadoop
Note there's no password provided, because the EMR environment should be using IAM roles and PPK keys etc. for authentication.
You will also need a dependency for the Presto JDBC driver jar. There are multiple ways to add dependencies in Zeppelin, but one easy way is via a Maven groupId:artifactId:version reference in the interpreter settings, under Dependencies, e.g. under Artifact:
com.facebook.presto:presto-jdbc:0.170
Note the version 0.170 corresponds to the version of Presto currently deployed on EMR, which will change in the future. You can see in the AWS EMR settings which version is being deployed to your cluster.
You can also get Zeppelin to connect directly to a catalog, or a catalog & schema, by appending them to the default.url setting, as per the Presto docs for the JDBC driver:
https://prestodb.io/docs/current/installation/jdbc.html
For example, using Presto with a Hive metastore that has a database called datakeep:
jdbc:presto://<YOUR EMR CLUSTER MASTER DNS>:8889/hive
OR
jdbc:presto://<YOUR EMR CLUSTER MASTER DNS>:8889/hive/datakeep
UPDATE Feb 2018
EMR 5.11.1 is using Presto 0.187, and there is an issue in the way the Zeppelin interpreter provides properties to the Presto driver, causing an error like Unrecognized connection property 'url'.
Currently the only solutions appear to be using an older driver version in the artifact, or manually uploading a patched Presto driver.
See https://github.com/prestodb/presto/issues/9254 and https://issues.apache.org/jira/browse/ZEPPELIN-2891
In my case, referencing an older driver (apparently it must be older than 0.180), e.g. com.facebook.presto:presto-jdbc:0.179, did not work: Zeppelin gave me an error about being unable to download dependencies. That's probably because Zeppelin's local Maven repo doesn't contain it; I'm not sure, and I gave up on that approach.
I can confirm that patching the driver works.
(Assuming you have Java & Maven installed.)
Clone the Presto GitHub repo.
Check out the release tag, e.g. git checkout 0.187
Make the edits as per the patch at https://groups.google.com/group/presto-users/attach/1231343dbdd09/presto-jdbc.diff?part=0.1&authuser=0
Build the jar using mvn clean package
Copy the jar to the Zeppelin machine, somewhere the zeppelin user has permission to read it.
In the interpreter, under the Dependencies - Artifacts section, use the absolute path to that jar file instead of a Maven reference. A consolidated sketch of these steps follows.
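Putting those steps together, a rough shell session might look like this (a sketch: the patch file name and destination path are placeholders, and the jar location assumes the repo's standard Maven layout):
# clone the Presto source and check out the deployed release tag
git clone https://github.com/prestodb/presto.git
cd presto
git checkout 0.187
# apply the patch downloaded from the mailing-list link above
git apply ~/presto-jdbc.diff
# build the driver module (and what it needs), skipping tests
mvn clean package -DskipTests -pl presto-jdbc -am
# copy the resulting jar somewhere the zeppelin user can read
cp presto-jdbc/target/presto-jdbc-0.187.jar /tmp/presto-jdbc-0.187-patched.jar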
There appears to be an issue passing the user to the presto driver, so just add it to the "default.url" jdbc connection string as a url parameter, e.g.
jdbc:presto://<YOUR EMR CLUSTER MASTER DNS>:8889?user=hadoop
Up and running. Meanwhile, it might be worth considering Athena as an alternative to Presto, given it's serverless and is effectively just a fork of Presto. It does have the limitation of supporting external Hive tables only, and they must be created in Athena's own catalog (or now in the AWS Glue catalog, also restricted to external tables).

Chris Kang has a good post on doing that in spark-shell, http://theckang.com/2016/spark-with-presto/. I don't see why you wouldn't be able to do that in Zeppelin. Another helpful post is about making sure you have the right Java version in EMR, http://queirozf.com/entries/update-java-to-jdk-8-on-amazon-elastic-mapreduce, as the current Presto version as of writing only runs on Java 8. I hope that sets you in the right direction.

Related

Migrating Data from Nexus2 to Nexus3 via Upgrade Agent

Based on the documentation, to upgrade Nexus2 to Nexus3 we can use the Upgrade Agent, but now I am wondering whether it's possible to use it for data migration. My use case is: I already have Nexus3 with data inside; for another project we are using Nexus2, and we now want to move its data to Nexus3. I'm just wondering whether migrating this way could cause configuration issues or overwrite blobs in Nexus3.
Has anyone tried it for migrating data from one instance to an already existing instance with data inside?
In the end I had to come up with this solution, as I am using Nexus OSS.
First, download the target repository from Nexus 2:
wget --user user --password pass --recursive --no-parent http://NEXUS2-URL/nexus/content/repositories/maven-releases/
then use this library to import them:
https://github.com/AlexLiue/nexus-repository-import-scripts (it supports NuGet, Maven, and NPM)
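For example, the Maven import script in that repo is invoked roughly like this (script name and flags as per its README; user, password, and repository URL are placeholders):
# run from the directory containing the artifacts downloaded above
./mavenimport.sh -u admin -p admin123 -r http://NEXUS3-URL/repository/maven-releases/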
I would check repository import; maybe that solves your problem.

Migrate existing Artifactory OSS installation to existing Artifactory PRO installation

I tried setting up a remote-repo on my PRO installation to replicate from the OSS installation but I get an error.
Error testing pull replication config: Replication to remote
open-source Artifactory instance is not supported.
Is there a script that can use the CLI to download each OSS artifact and upload to the PRO installation?
Or, do I need to purchase a PRO license, export the OSS version, and import into a new PRO installation, just to be able to replicate from one instance to the other?
I think your best option is to follow these instructions from the JFrog wiki.
Note that if you've already installed your new Pro and started uploading artifacts to it, you might need to run an export on each repo, do a "clean upgrade" as per the link, and import the repo data back in. Do not do a full export on your Pro, as the import will overwrite the OSS data you upgraded.
I ended up downloading all the artifacts from the OSS Artifactory (20GB) and wrote a simple script using the JFrog CLI to upload the files to the Pro Artifactory. No downtime, and I didn't have to modify a working server just to be compatible for replication.
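A minimal sketch of that kind of script (a single JFrog CLI call; the URL, credentials, and repository name are placeholders, and the artifacts are assumed to have been downloaded preserving the repository layout):
# recursively upload the downloaded tree into the matching repo on the Pro instance
jfrog rt upload "libs-release-local/*" libs-release-local/ --url=https://pro.example.com/artifactory --user=admin --password=secret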
There is this (new) wiki page that talks about updating OSS to Pro in place. I couldn't make it work; the license would not apply properly, and the start-up kept failing. I also didn't exactly want to do an "in-place" update, and instead tried to run the Pro version on a separate system against a copy of the OSS data.
The remaining method (barring manually re-importing all artifacts as Branson did above) is a full export/import. There don't seem to be clear instructions on how to do this (anymore); the "Upgrading Artifactory" wiki page no longer talks about migrating between installation types. It looks like there was a section for this there before, judging by the URL fragment in the OP's URL, but it's no longer there.
Having just completed this myself, this is the process that I followed. Note that in my case, the Pro version runs on another system.
Mount a sufficiently large drive to do a full system export
Prepare Pro instance - set up (temporary) admin password and enter the license key
On OSS instance - disable garbage collection and artifact cleanup in Administration→Artifactory→Advanced→Maintenance. I did this by simply setting both cron expressions to a date next year.
On OSS instance - disable the encryption as explained on this page (yes, this can only be done via a REST API call; see the example after this list). Failure to do so will likely land you on this problem, and you will have wasted your time.
On OSS instance - start the full system export ("Export System") in Administration→Artifactory→Import & Export→System. Check "Output Verbose Log" and have all other checkboxes unchecked.
If your database is of any decent size, the page will eventually show an "Oops" error. Ignore that, and keep monitoring the export process in the logs (artifactory-service.log).
Once the export is finished, detach the drive, attach it to the Pro instance, and mount the file system.
On Pro instance - start the full system import ("Import System") in Administration→Artifactory→Import & Export→System. Check "Output Verbose Log" and have all other checkboxes unchecked.
Again, the page will either time out with an "Oops" message, or show a login screen again. Ignore that, and monitor artifactory-service.log for the import progress. Don't touch the UI until the import completes. Once completed, your user database will be whatever it was in your OSS version. For me, the import took about 220% of the export time.
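For the disable-encryption step above, the REST call looks roughly like this (endpoint as documented by JFrog for deactivating key encryption; host and credentials are placeholders):
# deactivate Artifactory key encryption before running the export
curl -u admin:password -X POST http://oss-host:8081/artifactory/api/system/decrypt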

How to configure target url for BPM 8.5.6 Standard?

I'm trying to install IBM BPM 8.5.6 in a Linux environment with an Oracle database.
The steps I followed to install were:
Installed IBM Installation Manager using BPM PFS.
Installed WAS and BPM Process Center using Installation Manager.
Created 3 Oracle schemas: for the shared DB, the process server, and the performance server.
Configured the installation using the sample single-cluster Process Center file provided by IBM, using the BPMConfig -create option.
The installation was successful and I could see all tables being created. Then I started it using the BPMConfig -start option. That too completed successfully.
I didn't change any ports, so it should be using all default ports. Afterwards, when I try to access a console like http://servername:9080/ProcessAdmin or http://servername:9080/ProcessCenter or anything, I get a 404 error message:
Error 404: com.ibm.ws.webcontainer.servlet.exception.NoTargetForURIException: No target servlet configured for uri: /ProcessAdmin
Do I have to do anything else? Or what is the starting point or default URL to get to Process Portal or the admin console? The WAS admin console is working fine.
Any help is appreciated. Thanks.
Since you probably used a custom installation, you have to properly initialize the data by calling the following command:
bootstrapProcessServerData.bat -clusterName cluster_name
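Since the question describes a Linux environment, the shell variant of the same script applies; a sketch (the location varies by install, and cluster_name is a placeholder):
# on Linux the script lives under the BPM installation's bin directory
./bootstrapProcessServerData.sh -clusterName cluster_name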

Changing Git protocol for RStudio project already under version control in Windows

I love using RStudio for its built-in integration with version control systems. However, with RStudio on Windows, is there a way to change the Git protocol from HTTP to SSH, or vice versa, for a project already under version control, without first having to delete and recreate the project?
I might be missing something, but I originally cloned my repo using HTTP, which I subsequently found to be a massive pain because every time I want to push project changes to GitHub I have to re-enter my username and password. So I removed the project from version control (Project -> Project Options -> Git/SVN -> Version Control System: none) and then tried to re-add version control, hoping to use SSH, but it only allows you to go back to the original protocol you selected when creating the project in the first place.
The only way I have found to change the protocol is to delete the project and then create a new project from GitHub using the correct SSH parameters. I'd really like to be able to change a project's version control protocol from HTTP to SSH without deleting and re-cloning first.
Is this possible?
Check out git config and Git's configuration in general. You can configure several remotes, which is what makes the "distributed" aspect of git work, and RStudio simply uses whatever remotes the repository already has.
You can try just copying the whole repository (or just .git/config - keep a copy!) and checking what happens in your specific case when you change the configuration. Whether a given protocol works depends on lots of things that aren't under git's control, like firewall configurations en route and the configuration on the other end.
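For example, switching an existing origin remote from HTTPS to SSH is a one-liner from a shell (e.g. Git Bash on Windows) in the project directory; the repository path is a placeholder. RStudio picks up the new protocol without the project being recreated:
# inspect the current remote URLs
git remote -v
# point origin at the SSH URL instead of HTTPS
git remote set-url origin git@github.com:youruser/yourrepo.git
# verify the change
git remote -v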

Installing apex in solaris

I have never worked with Solaris or Linux before, and would like to find out how I can install Oracle Application Express from the command line in Solaris, after I have installed my Oracle Database 11g on Solaris as well.
I already have an idea of how to install the database.
I agree with the comments above; the APEX installation guide on the Oracle site is easy to follow. Basically, you will have to:
execute scripts in the Oracle database (first create some tablespaces, then run a SQL script that will install the APEX module)
deploy a file containing an APEX "listener" that you will have to quickly configure, and run a WAR (Java) file to run APEX
and that's it :)
Installation of Application Express (APEX) is largely operating-system agnostic. The process is based around running a number of scripts; when performing APEX installations, my primary tool is SQL*Plus.
Depending on the version/edition of the Oracle Database you install, you probably already have a version of APEX ready to use. You can check which version of APEX your database has by running the following SQL statement:
select version_no from apex_release
More information about installing Apex can be found in the documentation.
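As an illustration, the core installation boils down to a couple of SQL*Plus commands run from the unzipped APEX distribution directory; a minimal sketch using the documented default parameters (SYSAUX/TEMP tablespaces and the /i/ image prefix - adjust for your environment):
# start SQL*Plus as SYS from the apex directory
sqlplus / as sysdba
SQL> @apexins.sql SYSAUX SYSAUX TEMP /i/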
The documentation on the Oracle website is well done.
I installed APEX on Solaris 10 and it works fine.
Basically, you just need to execute a few SQL scripts (create some users, import data), and afterwards run a Java program to start the APEX listener.
Check the documentation; it should be understandable even without much Solaris knowledge.
