How to upload an R data frame to Google Drive? - r

I am using the googledrive package from CRAN. But the drive_upload() function only lets you upload a local file, not a data frame. Can anybody help with this?

Just save the data frame in question to a local file. The most basic options would be saving it as a CSV or as an RData file.
Example:
test <- data.frame(a = 1)
save(test, file = "test.RData")
rm(test)
load("test.RData")
exists("test")  # TRUE - the object is back
Since the OP clarified that using a temporary file is not possible, we could use a file connection instead.
test <- data.frame(a = 1)
tempFileCon <- file()
write.csv(test, file = tempFileCon)
And now we have the file connection in memory that we can pass to other functions. Caveat: address it with the literal connection object, not a quoted name like you would with an actual file.
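For example, to read the data back you pass the connection object itself (a minimal sketch; note there are no quotes around tempFileCon):
df_back <- read.csv(tempFileCon)  # read the CSV text back from the in-memory connection
close(tempFileCon)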

Unfortunately I can find no way to push the data frame up directly. But to document the basics this question touches on for others: the following code writes a local .csv and then bounces it up through googledrive (tidyverse) so that it ends up as a Google Sheet.
write_csv(iris, 'df_iris.csv')
drive_upload('df_iris.csv', type='spreadsheet')
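A small variation on the same idea (a sketch; assumes you have already authenticated with drive_auth()): write to a temporary file so no local copy is left behind, and give the Drive file an explicit name.
library(readr)
library(googledrive)

tmp <- tempfile(fileext = ".csv")
write_csv(iris, tmp)
drive_upload(tmp, name = "df_iris", type = "spreadsheet")
unlink(tmp)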

You can achieve this using gs_add_row from the googlesheets package. This function accepts data frames directly as input and uploads the data to the specified Google Sheet; no local file is required.
From the help section of ?gs_add_row:
"If input is two-dimensional, internally we call gs_add_row once per input row."

This can be done in two ways. As mentioned by others, you can create a local file and upload it. It is also possible to create a new spreadsheet in your Drive directly. This spreadsheet will be created in the main folder of your Drive; if you want it stored somewhere else, you can move it after creation.
# install the packages
install.packages(c("googledrive", "googlesheets4"))
# load the libraries
library(googledrive)
library(googlesheets4)
## With local storage
# Locally store the file
write.csv(x = iris, file = "iris.csv")
# Upload the file
drive_upload(media = "iris.csv", type = "spreadsheet")
## Direct storage
# Create an empty spreadsheet. It is stored as an object with a sheet_id and drive_id
ss <- gs4_create(name = "my_spreadsheet", sheets = "Sheet 1")
# Put the data.frame in the spreadsheet and provide the sheet_id so it can be found
sheet_write(data = iris, ss = ss, sheet = "Sheet 1")
# Move your spreadsheet to the desired location
drive_mv(file = ss, path = "my_creations/awesome location/")
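As a quick sanity check (a sketch using the same ss object created above), you can read the sheet back into R:
df_check <- read_sheet(ss, sheet = "Sheet 1")
head(df_check)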

Related

How to send gs4_create() output directly to a googledrive folder

I am trying to upload hundreds of Google Sheets using the new R googlesheets4 package and its gs4_create() function. I can successfully create files in the root of my Google Drive but fail to see how I can send them into a pre-existing folder on Google Drive.
See the following reprex:
df <- data.frame(a=1:10,b=letters[1:10])
googlesheets4::gs4_create(name = "TEST_FOLDER/testsheet", sheets = df)
It creates a file literally named "TEST_FOLDER/testsheet" in the root folder, while I want to create the file inside TEST_FOLDER.
I know I can use write_sheet() on files that already exist inside a folder, but I want to create new files, not write into pre-existing ones. I also know that googledrive::drive_upload() will let me upload csv files, but I do not like the format of the uploaded csvs: they arrive as plain-text sheets with no frozen first row, which is only possible through the googlesheets4 package. So back to my question:
How do I create Google Sheet files (in bulk) inside TEST_FOLDER?
First, you have to create the folder with drive_mkdir(name = "TEST_FOLDER") from the googledrive package. Once you have created it, I recommend working with the ids of the folder and the files. So, the next step, finding the folder id, would be:
folder_id <- drive_find(n_max = 10, pattern = "TEST_FOLDER")$id
* This works if you have only one folder called TEST_FOLDER in your Google Drive. If you have more than one, I would recommend copying/pasting the id directly, or identifying the id you want before assigning it to the folder_id object.
* If you don't want to do this step, you can also copy/paste the id directly from the Google Drive URL.
Once you have it, you can write a for loop to upload all the files. For example, supposing the data frames you want to upload are called sheet1, sheet2, ..., sheet10:
a <- rep("sheet",10)
b <- 1:10
names <- paste0(a,b)
for(x in names){
gs4_create(name = x, sheets = list(sheet1 = get(x)))
sheet_id <- drive_find(type = "spreadsheet", n_max = 10,
pattern = x)$id
sheet_id <- drive_find(type = "spreadsheet", n_max = 10,
pattern = x)$id
drive_mv(file = as_id(sheet_id), path = as_id(folder_id))
}
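For completeness, a hypothetical setup so the loop above has data frames for get() to find; it creates ten small data frames named sheet1 ... sheet10 in the global environment:
for (i in 1:10) {
  assign(paste0("sheet", i), data.frame(a = 1:10, b = letters[1:10]))
}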
NOTE: If you have too many files in the root folder of your Google Drive, finding the folder by pattern will take too much time. That's why I recommend working with ids. If you have this problem, you could create the folder manually, copy its id, and assign it to the folder_id object.
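If you go the manual route described in the note, the assignment is a single line (the id string below is a made-up placeholder; paste the real one from the Drive URL):
folder_id <- "1AbCdEfGhIjKlMnOpQrStUvWxYz123456"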

Reading csv into Rstudio from google cloud storage

I want to read a csv file from Google Cloud Storage with a function similar to read.csv.
I used the googleCloudStorageR library and can't find a function for that. I don't want to download the file; I just want to read it into the environment as a data frame.
If you download a .csv file, googleCloudStorageR will by default parse it into a data.frame for you - you can turn off this behaviour by specifying saveToDisk
# will make a data.frame
gcs_get_object("mtcars.csv")
# save to disk as a CSV
gcs_get_object("mtcars.csv", saveToDisk = "mtcars.csv")
You can specify your own parse function by supplying it via parseFunction
## default gives a warning about missing column name.
## custom parse function to suppress warning
f <- function(object){
  suppressWarnings(httr::content(object, encoding = "UTF-8"))
}
## get mtcars csv with custom parse function.
gcs_get_object("mtcars.csv", parseFunction = f)
I've tried running a sample csv file with the as.data.frame() function.
In order to run this code snippet, make sure you install the package (install.packages("data.table")) and load it with library("data.table").
Also be sure to wrap fread() inside as.data.frame() so the file is read from its location.
Here is the code snippet I ran that displays the data frame for my data set:
library("data.table")
MyData <- as.data.frame(fread(file = "$FILE_PATH", header = TRUE, sep = ','))
print(MyData)
Reading Data with TensorFlow:
There is one other way you can read a csv from your cloud storage, with the TensorFlow API. I assume you are accessing this data from a bucket? First, you need to install the "readr" and "cloudml" packages. Then point at the bucket with gs_data_dir("gs://your-bucket-name"), build the file path with file.path(data_dir, "something.csv"), and read the data with read_csv(file.path(data_dir, "something.csv")). If you want it formatted as a data frame it should look something like this.
library("data.table")
library(cloudml)
library(readr)
data_dir <- gs_data_dir("gs://your-bucket-name")
MyData <- as.data.frame(read_csv(file.path(data_dir, "something.csv")))
print(MyData)
Make sure you have properly authenticated access to your storage

Use R package "googledrive" to load in R a file from my googledrive

I have a file in my Google Drive that is an .xlsx. It is too big, so it is not automatically converted to a Google Sheet (that's why using the googlesheets package did not work). The file is so big I can't even preview it by clicking on it in my Google Drive; the only way to see it is to download it as an .xlsx file. While I could load it as an xlsx file, I am trying instead to use the googledrive package.
So far what I have is:
library(googledrive)
drive_find(n_max = 50)
drive_download("filename_without_extension.xlsx",type = "xlsx")
but I got the following error:
'file' does not identify at least one Drive file.
Maybe it is me not specifying the path where the file lives in the Drive, for example: Work\Data\Project1\filename.xlsx
Could you give me an idea on how to load in R the file called filename.xlsx that is nested in the drive like that?
I read the documentation but couldn't figure out how to do that. Thanks in advance.
You should be able to do this by:
library(googledrive)
drive_download("~/Work/Data/Project1/filename.xlsx")
The type parameter is only for Google native spreadsheets, and does not apply to raw files.
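To then get the workbook into R, a minimal follow-up sketch (assumes the readxl package; the Drive path is the one from the question and the local file name is arbitrary):
library(googledrive)
library(readxl)

drive_download("~/Work/Data/Project1/filename.xlsx",
               path = "filename.xlsx", overwrite = TRUE)
df <- read_xlsx("filename.xlsx")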
I want to share my way.
I do it this way because I keep updating the xlsx file. It is a query result that comes from an ERP.
So, when I tried to do it by Google Drive id, it gave me errors, because each time the ERP updates the file its id changes.
This is my context; yours can be absolutely different. This file changes just two or three times a month. Even though it is a "big" xlsx file (78-80K records with 19 factors), I use it for just seconds to calculate some values and then I can trash it. It does not make sense to store it (storing is more expensive than uploading).
library(googledrive)
library(googlesheets4)  # watch out: it is not the CRAN version yet (0.1.1.9000)
library(glue)           # used to build the download URL below
library(readxl)         # used to read the downloaded xlsx

drive_folder_owner <- "carlos.sxxx#xxxxxx.com"          # this is my account in this gDrive folder
drive_auth(email = drive_folder_owner)                  # previously authorized account
googlesheets4::sheets_auth(email = drive_folder_owner)  # yes, I know, should be the same, but they are different
d1 <- drive_find(pattern = "my_file.xlsx", type = drive_mime_type("xlsx"))  # find the file created by the ERP; the type shortens the search
meta <- drive_get(id = d1$id)[["drive_resource"]]        # get the metadata for the file in Google Drive
n_id <- glue("https://drive.google.com/open?id=", d1$id[[1]])                # build a path for reading
meta_name <- paste(getwd(), "/Files/", meta[[1]]$originalFilename, sep = "") # and a path to save it temporarily
drive_download(file = as_id(n_id), overwrite = TRUE, path = meta_name)       # now download and save locally
V_CMV <- data.frame(read_xlsx(meta_name))  # store to data frame
file.remove(meta_name)                     # delete from the R server
rm(d1, n_id)                               # delete temporary variables

Import excel from Azure blob using R

I have the basic setup done following the link below:
http://htmlpreview.github.io/?https://github.com/Microsoft/AzureSMR/blob/master/inst/doc/tutorial.html
There is a method, azureGetBlob(), which allows you to retrieve objects from the containers. However, it seems to only support the "raw" and "text" formats, which is not very useful for Excel. I've tested the connections etc.; I can retrieve .txt / .csv files but not .xlsx files.
Does anyone know any workaround for this?
Thanks
Azure Blob Storage has no notion of file type; a blob is just a name, and the extension only matters to the operating system. If you want to open an Excel file in R, you can use a third-party package such as readxl.
Workaround:
You could use the get-blob API to download the blob to a local path and then use readxl to read the file.
# install
install.packages("readxl")
# Loading
library("readxl")
# xls files
my_data <- read_excel("my_file.xls")
# xlsx files
my_data <- read_excel("my_file.xlsx")
Solved with the following code. Basically, read the blob as raw bytes, write those bytes to a temporary file on disk, then read that file into R:
library(xlsx)  # read.xlsx() below comes from the xlsx package
excel_bytes <- azureGetBlob(sc, storageAccount = "accountname", container = "containername", blob = blob_name, type = "raw")
q <- tempfile()
f <- file(q, 'wb')
writeBin(excel_bytes, f)
close(f)
result <- read.xlsx(q, sheetIndex = sheetIndex)
unlink(q)
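An equivalent sketch using readxl instead of the xlsx package (no Java dependency); giving the temp file an .xlsx extension lets readxl detect the format, and the sheet index is an assumption:
library(readxl)

q <- tempfile(fileext = ".xlsx")
writeBin(excel_bytes, q)
result <- read_excel(q, sheet = 1)
unlink(q)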

Write R data as csv directly to s3

I would like to be able to write data directly to a bucket in AWS S3 from a data.frame / data.table object as a csv file, without writing it to disk first and then uploading with the AWS CLI.
obj.to.write.s3 <- data.frame(cbind(x1=rnorm(1e6),x2=rnorm(1e6,5,10),x3=rnorm(1e6,20,1)))
At the moment I write the csv first, then upload it to an existing bucket, then remove the file, using:
fn <- 'new-file-name.csv'
write.csv(obj.to.write.s3,file=fn)
system(paste0('aws s3 cp ', fn, ' s3://my-bucket-name/', fn))
system(paste0('rm ',fn))
I would like a function that writes directly to S3. Is that possible?
In aws.s3 0.2.2 the s3write_using() (and s3read_using()) functions were added.
They make things much simpler:
s3write_using(iris, FUN = write.csv,
              bucket = "bucketname",
              object = "objectname")
The easiest solution is just to save the .csv in a tempfile(), which will be purged automatically when you close your R session.
If you need to only work in memory you can do this by doing write.csv() to a rawConnection:
# write to an in-memory raw connection
zz <- rawConnection(raw(0), "r+")
write.csv(iris, zz)
# upload the object to S3
aws.s3::put_object(file = rawConnectionValue(zz),
                   bucket = "bucketname", object = "iris.csv")
# close the connection
close(zz)
In case you're unsure, you can then check that this worked correctly by downloading the object from S3 and reading it back into R:
# check that it worked
## (option 1: save locally)
save_object(object = "iris.csv", bucket = "bucketname", file = "iris.csv")
read.csv("iris.csv")
## (option 2: keep in memory)
read.csv(text = rawToChar(get_object(object = "iris.csv", bucket = "bucketname")))
Sure, but 'saving to file' requires that your OS sees the desired target directory as an accessible filesystem. So in essence you "just" need to mount S3 as a filesystem; a quick search for how to mount S3 will turn up several options.
An alternative is writing to a temporary file and then using whatever tool you normally use to transfer files. You could wrap both operations in a simple helper function, as sketched below.
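A minimal sketch of such a helper, reusing the AWS CLI call from the question (bucket and object names are placeholders):
write_csv_to_s3 <- function(df, bucket, object) {
  tmp <- tempfile(fileext = ".csv")   # temporary local file, removed on exit
  on.exit(unlink(tmp))
  write.csv(df, tmp, row.names = FALSE)
  system(paste0("aws s3 cp ", shQuote(tmp), " s3://", bucket, "/", object))
}

write_csv_to_s3(obj.to.write.s3, "my-bucket-name", "new-file-name.csv")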
