How to use R AWS packages in a Single Sign-On environment

I have searched online but cannot find a way to use packages such as aws.translate. My company accesses AWS using SSO and we cannot generate key pairs. I launched an EC2 instance where I run a Docker image that contains R and Python. I believe lots of corporate users face similar issues. Could someone please guide me on how to use those cloudyr packages in an SSO environment?
I installed aws.translate, aws.ec2metadata and aws.signature. use_credentials cannot find the .aws/credentials file, ecs.metadata() returns NULL, and I receive Bad Request (HTTP 400) if I call the translate function directly.

Related

How to download latest files from s3 bucket into local machine using Airflow

Is there a way to download the latest files from an S3 bucket to my local system using Airflow?
Since I am a newbie to Airflow, I don't have much idea how to proceed. Please assist.
Short answer: You could use S3KeySensor to detect when a certain key appears in an S3 bucket and then use S3Hook.read_key() to get the content of the key.
Assuming you are completely new to Airflow, I would suggest:
Start with the tutorial
Read up on Connections, Hooks, and Sensors
Use this example as a starting point for your own DAG
As a followup:
Browse the amazon provider package docs to see what else there is for working with AWS services
Look through other examples
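As a minimal sketch of the "latest file" part of the question (which the answer above does not spell out), the helper below picks the most recently modified object from a bucket listing. It assumes each entry is a dict with `Key` and `LastModified` fields, which is the shape of the `Contents` entries boto3 returns from `list_objects_v2`. The function name `latest_key` and the sample listing are hypothetical and only serve as illustration.

```python
from datetime import datetime, timezone
from typing import Iterable, Mapping, Optional


def latest_key(objects: Iterable[Mapping], suffix: str = "") -> Optional[str]:
    """Return the key of the most recently modified object, or None if nothing matches."""
    # Keep only the keys that end with the requested suffix (an empty suffix keeps all).
    candidates = [obj for obj in objects if obj["Key"].endswith(suffix)]
    if not candidates:
        return None
    # The newest object is the one with the largest LastModified timestamp.
    newest = max(candidates, key=lambda obj: obj["LastModified"])
    return newest["Key"]


# Hypothetical listing, for illustration only.
sample_listing = [
    {"Key": "exports/2023-01-01.csv", "LastModified": datetime(2023, 1, 1, tzinfo=timezone.utc)},
    {"Key": "exports/2023-01-03.csv", "LastModified": datetime(2023, 1, 3, tzinfo=timezone.utc)},
    {"Key": "exports/readme.txt", "LastModified": datetime(2023, 1, 5, tzinfo=timezone.utc)},
]

newest_csv = latest_key(sample_listing, suffix=".csv")
```

Inside a DAG task, the selected key could then be handed to S3Hook.read_key(), as the short answer suggests.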

Access S3 bucket from AWS EC2 via r command lines

I have tried a number of ways to put files onto my EC2 RStudio instance: uploading via PuTTY, adding Dropbox (via Louis Aslett's AMI, but only very small files sync), and using FileZilla and WinSCP. Unfortunately, after several months, I can get the files in, but they are corrupted on arrival.
There are some older questions related to this
To access S3 bucket from R
but they all quote packages that are no longer maintained or have no accepted answer.
This seems possible in Python using boto, and there are a lot more answered questions on this for Python.
I think AWS might prefer using S3, so I am trying that. Is there any R package that is current and works for getting files from S3 onto EC2? I have set up both EC2 and S3. Does anyone have the R code lines to use, and is there anything behind the scenes that needs to be done? I was told by a colleague that it is possible to do this in R with only 4 lines of code once the packages are installed, but she does not know how to actually do it.

Does there exist an OpenStack API with its implementation being JClouds?

I am trying to find if there exists an OpenStack REST API with its implementation being JClouds. I am willing to pay for someone to produce such a thing as an open source project.
SwiftProxy offers an OpenStack Swift implementation backed by Apache jclouds:
https://github.com/bouncestorage/swiftproxy
It supports multiple jclouds storage backends, including the local file system and many object stores.

Connecting to Analysis Services from R or Nodejs

I am trying to connect to Analysis Services from either R or Nodejs.
For R, I have found the following library:
https://github.com/overcoil/X4R
For Nodejs, I have found the following library:
https://github.com/rpbouman/xmla4js
The Analysis Services server is external; it is not on my local machine. Currently I am able to connect to it successfully from Excel using both Windows and basic authentication (username/password).
To access it from R or Nodejs, the following link says I need to configure HTTP access using IIS. But since the server is not local, how can I get the file msmdpump.dll and configure it?
In this link https://www.linkedin.com/grp/post/77616-265568694, at the end Sarah Lukens said that I need to follow the steps mentioned in https://msdn.microsoft.com/en-us/library/gg492140.aspx
Since I have not worked with SSAS before, I don't have much clarity. Can anyone please guide me in establishing the connection from R or Nodejs to Analysis Services? I just want to submit MDX queries and get the results.
thanks,
r karthik.
It seems there is no way to connect to your SSAS remotely without IIS. You have to share your msmdpump.dll in order to get access to your SSAS connection for third-party APIs via the xmla interface.

use julia language without internet connection (mirror?)

Problem:
I would like to make julia available for our developers on our corporate network, which has no internet access at all (no proxy), due to sensitive data.
As far as I understand julia is designed to use github.
For instance julia> Pkg.init() tries to access:
git://github.com/JuliaLang/METADATA.jl
Example:
I solved this problem for R by creating a local CRAN repository (rsync) and setting up a local webserver.
I also solved this problem for python the same way by creating a local PyPi repository (bandersnatch) + webserver.
Question:
Is there a way to create a local repository for metadata and packages for julia?
Thank you in advance.
Roman
Yes, one of the benefits from using the Julia package manager is that you should be able to fork METADATA and host it anywhere you'd like (and keep a branch where you can actually check new packages before allowing your clients to update). You might be one of the first people to actually set up such a system, so expect that you will need to submit some issues (or better yet; pull requests) in order to get everything working smoothly.
See the extra arguments to Pkg.init() where you specify the METADATA repo URL.
If you want a solution that is simpler to manage, I would also consider a two-tier setup where you install packages on one system (connected to the internet) and then copy the resulting ~/.julia directory to the restricted system. If the packages you use have binary dependencies, you might run into problems if you don't have similar systems on both sides, or if some of the dependencies are installed globally, but Pkg.build("Pkgname") might be helpful.
This is how I solved it (for now), using the second suggestion by ivarne. I use a two-tier setup with two networks: one connected to the internet (office network) and one air-gapped (development network).
System information: openSuSE-13.1 (both networks), julia-0.3.5 (both networks)
Tier one (office network)
installed julia on an NFS share, /sharename/local/julia.
soft linked /sharename/local/bin/julia to /sharename/local/julia/bin/julia
appended /sharename/local/bin/ to $PATH using a script in /etc/profile.d/scriptname.sh
created /etc/gitconfig on all office network machines: [url "https://"] insteadOf = git:// (to solve proxy server problems with github)
now every user on the office network can simply run # julia
Pkg.add("PackageName") is then used to install various packages.
The two networks are connected periodically (with certain security measures ssh, firewall, routing) for automated data exchange for a short period of time.
Tier two (development network)
installed julia on NFS share equal to tier one.
When the networks are connected I use a shell script with rsync -avz --delete to synchronize the .julia directory of tier one to tier two for every user.
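The synchronization step could be sketched in Python along the following lines. Only the `rsync -avz --delete` flags come from the description above; the function names, the `<home>/<user>/.julia` layout, the user name `alice`, and the host name `devhost` are assumptions made for illustration.

```python
import subprocess
from typing import List


def build_rsync_command(source_dir: str, target_dir: str) -> List[str]:
    """Build an rsync command that mirrors source_dir into target_dir."""
    # Trailing slashes make rsync copy the directory's contents rather than
    # nesting the source directory inside the target.
    return [
        "rsync",
        "-avz",      # archive mode, verbose, compress during transfer
        "--delete",  # remove files on the target that no longer exist on the source
        source_dir.rstrip("/") + "/",
        target_dir.rstrip("/") + "/",
    ]


def sync_julia_depot(user: str, source_home: str, target_home: str,
                     dry_run: bool = True) -> List[str]:
    """Mirror one user's .julia directory from tier one to tier two."""
    command = build_rsync_command(
        f"{source_home}/{user}/.julia",
        f"{target_home}/{user}/.julia",
    )
    if not dry_run:
        # check=True raises CalledProcessError if rsync exits with a non-zero status.
        subprocess.run(command, check=True)
    return command


# Dry run with hypothetical paths: this only builds the command, nothing is executed.
planned = sync_julia_depot("alice", "/home", "devhost:/home")
```

Keeping `dry_run=True` as the default makes it possible to inspect the planned command before `--delete` is allowed to remove anything on the target.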
Conclusion (so far):
It seems to work reasonably well.
As ivarne suggested, there are problems if installing a package on tier one involves more than just copying files (for example, compilation): the package won't run on tier two. This can be resolved with Pkg.build("Pkgname").
PackageCompiler.jl seems like the best tool for using modern Julia (v1.8) on secure systems. The following approach requires a build server with the same architecture as the deployment server, something your institution probably already uses for developing containers, etc.
Build a sysimage with PackageCompiler's create_sysimage()
Upload the build (sysimage and depot) along with the Julia binaries to the secure system
Alias a script to julia, similar to the following example:
#!/bin/bash
set -Eeu -o pipefail
unset JULIA_LOAD_PATH
export JULIA_PROJECT=/Path/To/Project
export JULIA_DEPOT_PATH=/Path/To/Depot
export JULIA_PKG_OFFLINE=true
/Path/To/julia -J/Path/To/sysimage.so "$@"
I've been able to run a research pipeline on my institution's secure system, for which there is a public version of the approach.

Resources