Accessing files from Google Cloud Storage in RStudio

I have been trying to create a connection between Google Cloud Storage and an RStudio Server (the one I spun up in Google Cloud), so that I can access the files in R and run some analysis on them.
I have found three different ways to do it on the web, but I haven't found much clarity around any of them so far:
1) Access the file by using the public URL specific to the file [this is not an option for me].
2) Mount the Google Cloud Storage bucket as a disk in the RStudio Server and access the files like any others on the server [I saw someone post about this method, but could not find any guide or material that shows how it's done].
3) Use the googleCloudStorageR package to get full access to the Cloud Storage bucket.
Option 3 looks like the standard way to do it, but I get the following error when I run the gcs_auth() command:
Error in gar_auto_auth(required_scopes, new_user = new_user, no_auto = no_auto, :
  Cannot authenticate -
  options(googleAuthR.scopes.selected) needs to be set to include
  https://www.googleapis.com/auth/devstorage.full_control or
  https://www.googleapis.com/auth/devstorage.read_write or
  https://www.googleapis.com/auth/cloud-platform
The guide on how to connect is at
https://github.com/cloudyr/googleCloudStorageR
but it says it requires a service-auth.json file to set the environment variables, along with various other keys and secret keys, and it does not really specify what these are.
If someone could explain how this is actually set up, or point me to a good guide on setting up the environment, I would be very grateful.
Thank you.

Before using any Google Cloud services you have to attach your card for billing.
So, assuming you have created the account, go to the Console. If you have not created a Project yet, create one, then in the sidebar find APIs & Services > Credentials.
Then:
1) Create a Service Account key and save the file as JSON; you can only download it once.
2) Create an OAuth 2.0 client ID: give the app a name, select the type "Web application", and download the JSON file.
Now, for Storage, go to the sidebar, find Storage and click on it.
Create a bucket and give it a name.
I have added a single image to the bucket; you can add one too for the purpose of the code below.
Let's look at how to download this image from storage; for other operations you can follow the link that you have given.
First, create an environment file called .Renviron in your working directory so that the JSON files are picked up automatically.
In the .Renviron file, reference the two downloaded JSON files like this:
GCS_AUTH_FILE="serviceaccount.json"
GAR_CLIENT_WEB_JSON="Oauthclient.json"
# R part
library(googleAuthR)
library(googleCloudStorageR)

# set the OAuth client and scopes before authenticating
gar_set_client(scopes = c("https://www.googleapis.com/auth/devstorage.read_write",
                          "https://www.googleapis.com/auth/cloud-platform"))
gcs_auth() # authenticate (uses the service account file from GCS_AUTH_FILE)

gcs_get_bucket("your_bucket_name")    # the bucket that you have created
gcs_global_bucket("your_bucket_name") # set it as the global bucket
gcs_get_global_bucket()               # check that it is set; you should get your bucket name

objects <- gcs_list_objects()         # the objects in the bucket
names(objects)
gcs_get_object(objects$name[[1]], saveToDisk = "abc.jpeg") # save the first object to disk
**Note:** if the JSON files do not get loaded, restart the session using .rs.restartR()
and check them using
Sys.getenv("GCS_AUTH_FILE")
Sys.getenv("GAR_CLIENT_WEB_JSON")
#they should print the file names
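If the object you need is, say, a CSV file rather than an image, you can also read it straight into R instead of saving it to disk. A rough sketch ("my_data.csv" is just a placeholder object name; whether the automatic parsing gives you a data frame depends on the object's content type):
# read an object directly into memory; parseObject = TRUE is the default
df <- gcs_get_object("my_data.csv")
head(df)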

You probably want the FUSE adaptor - this will allow you to mount your GCS bucket as a directory on your server.
Install gcsfuse on the R server.
Create a mount directory, e.g. /path/to/mnt.
Run gcsfuse your-bucket /path/to/mnt
Be aware, though, that read/write performance isn't great via FUSE.
Full documentation
https://cloud.google.com/storage/docs/gcs-fuse
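Once mounted, the bucket behaves like an ordinary directory, so the files can be read with plain R. A quick sketch (the mount path and file pattern are placeholders):
# list CSV files in the mounted bucket and read the first one
files <- list.files("/path/to/mnt", pattern = "\\.csv$", full.names = TRUE)
df <- read.csv(files[1])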

Related

How to make Azure batch see my data files?

I am new to Azure Batch. I am trying to use R in parallel with Azure Batch in RStudio to run code on a cluster. I am able to successfully start the cluster and get the example code to work properly. When I try to run my own code, I get an error saying that the cluster nodes cannot find my data files. Do I have to change my working directory to Azure Batch somehow?
Any information on how to do this is much appreciated.
I have figured out how to get Azure Batch to see my data files. I'm not sure if this is the most efficient way, but here is what I did.
Download a program called Microsoft Azure Storage Explorer which runs on my local computer.
Connect to my Azure storage using the storage name and primary storage key found in the Azure portal.
In Microsoft Azure Storage Explorer, find Blob Containers, right-click, and create a new container.
Upload the data files to that new container.
Right-click on a data file and choose Copy URL.
Paste the URL into R like this:
model_Data <- read.csv(paste('https://<STORAGE NAME HERE>.blob.core.windows.net/$root/k', k, '%20data%20file.csv', sep = ''), header = TRUE)
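If the paste() call gets hard to read, the same URL can be built a little more explicitly. A sketch under the same assumptions as above (the storage account placeholder is kept, and k is defined elsewhere in your script):
base_url <- "https://<STORAGE NAME HERE>.blob.core.windows.net/$root"
file_name <- sprintf("k%s data file.csv", k)   # e.g. "k1 data file.csv"
model_Data <- read.csv(paste0(base_url, "/", utils::URLencode(file_name)), header = TRUE)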

How should I deal with a Twitter auth token in a Shiny app?

I built a simple Shiny app that downloads tweets from a particular account and displays some simple statistics and graphs (sentiment analysis, word clouds, etc.). I used the rtweet package. I would like to publish it at https://www.shinyapps.io/. The app works as intended locally, using a Twitter auth token saved in my global environment.
How should I safely authorize my app when publishing it online? Hardcoding my API keys into the script feels like a terrible idea.
You could use library(secret) and add your API key to a vault. In your Shiny application you add a field where your private key needs to be provided, and with this key you can get the API key from the vault.
Alternatively, you can add a field in your app where the API key needs to be entered directly.
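As a rough sketch of the library(secret) idea (the vault directory, user name, and key handling below are illustrative placeholders, not something from this answer; check the package documentation for the exact workflow):
library(secret)
library(openssl)

vault <- "secret_vault"            # directory that will hold the encrypted secrets
dir.create(vault)
create_vault(vault)

key <- rsa_keygen()                # in practice, use your own existing key pair
add_user("me", key$pubkey, vault = vault)
add_secret("twitter_api_key", "YOUR-API-KEY", users = "me", vault = vault)

# inside the Shiny server, decrypt with the matching private key
api_key <- get_secret("twitter_api_key", key = key, vault = vault)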
I found the answers I needed by using these two sets of instructions together:
https://docs.ropensci.org/rtweet/articles/auth.html#save
How to pass environment variables to shinyapps
This allowed me to publish the app to shinyapps.io without hardcoding any secret information into the app. Instead I used the functions rtweet::rtweet_app and rtweet::auth_as like this at the top of the server.R file:
app <- rtweet::rtweet_app(bearer_token = Sys.getenv("MY_BEARER_TOKEN"))
rtweet::auth_as(app)
The part saying Sys.getenv("MY_BEARER_TOKEN") retrieves the token from an environment variable that you store according to recipe 2 above (the bearer token that you put in that .Renviron file comes from the Twitter developer platform and your app project there). The only thing to note regarding the recipe in link 2 is that you should not store the .Renviron file locally on your computer but in the app that you publish to shinyapps.io (as commented by the user Erik Iverson: "This worked for me after creating a copy of my .Renviron file in the root directory of my Shiny application").
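For reference, the .Renviron file deployed with the app then only needs a single line like the one below (the variable name has to match what server.R reads; the value is a placeholder):
MY_BEARER_TOKEN="paste-your-bearer-token-here"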

Firestore Run Functions Locally with Admin

I'm trying to run my Cloud Functions locally using the guide below:
https://firebase.google.com/docs/functions/local-emulator
I'd like to be able to use the Admin SDK in my local functions. I've downloaded JSON admin keys from the Service Accounts pane of the Google Cloud Console, and the guide says to add them using
export GOOGLE_APPLICATION_CREDENTIALS="path/to/key.json"
I generated the keys using
PROJECTNAME@appspot.gserviceaccount.com, which has
App Engine default service account credentials,
NOT
firebase-adminsdk-CODE@PROJECTNAME.iam.gserviceaccount.com, which has firebase-adminsdk credentials.
What I tried
I saved the key to a separate folder and provided the path relative to root. Then I executed this command in the terminal while in my functions folder. It didn't give me any response; it just went to the next line in the terminal.
export GOOGLE_APPLICATION_CREDENTIALS="/Users/[user]/Documents/[PROJECT]/Service_Account/file_name.json"
Questions:
Did I download/use the right JSON credentials?
Is there a certain place I need to save that .json file? Or can it be anywhere on my system?
Does that path need to be from root? Or relative to my functions folder?
Where do I need to execute this command?
Should it provide some sort of response that it worked? How do we know if it does?

Will deleting a file from the DAM (publish server) automatically delete it from the author server?

I've implemented code to delete uploaded files from DAM storage [CRXDE]. I have one doubt: will the code delete the file from the author server too? If not, how can I delete the file simultaneously from the author as well as the 4 publish servers?
With the code below, the file is getting removed from the publish CRXDE.
Code:
AssetManager assetManager = (AssetManager) resourceResolver.adaptTo(AssetManager.class);
String binaryPath = DamUtil.assetToBinaryPath(selectedFileName);
assetManager.removeAssetForBinary(binaryPath);
To replicate changes from publish instances back to author instances, you can use a mechanism called reverse replication. Normally you replicate changes from an author instance to a publish instance; this is the reverse of that operation, hence the name reverse replication.
Since it is a big topic, I would like to point you to the official Adobe documentation for more information on how to configure reverse replication:
Official (reverse) replication documentation by Adobe

Enable S3 bucket contents to always be public

I've managed to use S3FS to mount an Amazon S3 folder into my WordPress site. Basically, my gallery folder for NextGEN Gallery is a symlink to a mounted S3FS folder of the bucket, so when I upload an image, the file is automatically added to the S3 bucket.
I'm busy writing an Apache rewrite rule to replace the links so that gallery images are fetched from S3 instead, without having to hack or change anything in NextGEN, but one problem I'm finding is that images are not public by default on S3.
Is there a way to change a parent folder to make its children always be public, including new files as they are generated?
Is it possible or advisable to use a cron task to manually make a folder public using the S3 command line API?
I'm the lead developer and maintainer of the open-source project RioFS: a userspace filesystem to mount Amazon S3 buckets.
Our project is an alternative to the “s3fs” project; its main advantages compared to “s3fs” are simplicity, speed of operations, and bug-free code. Currently the project is in a “beta” state, but it has been running on several high-load file servers for quite some time.
We are seeking more people to join our project and help with testing. From our side, we offer quick bug fixes and will listen to your requests for new features.
Regarding your issue:
if you used RioFS, you could mount a bucket and have read/write access to it using the following command (assuming you have installed RioFS and have exported the AWSACCESSKEYID and AWSSECRETACCESSKEY environment variables):
riofs -o allow_other http://s3.amazonaws.com bucket_name /mnt/static.example.com
(please refer to the project description for the command-line arguments)
Please note that the project is still in development; there could still be a number of bugs left.
If you find that something doesn't work as expected, please file an issue report on the project's GitHub page.
Hope it helps, and we look forward to seeing you join our community!
I downloaded s3curl and used that to add the bucket policy to S3.
See this link: http://blog.travelmarx.com/2012/09/working-with-s3curl-and-amazon-s3.html
You can generate your bucket policies using the Amazon Policy Generator:
http://awspolicygen.s3.amazonaws.com/policygen.html
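If you would rather stay in R than use s3curl, here is a hedged sketch with the aws.s3 package (not mentioned in the answers above; the bucket name is a placeholder and your AWS credentials are assumed to be set as environment variables) that applies the same kind of public-read bucket policy:
library(aws.s3)

# a standard policy that makes every object in the bucket publicly readable
policy <- '{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::your-bucket-name/*"
  }]
}'

put_bucket_policy("your-bucket-name", policy = policy)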
