I am trying to upload hundreds of Google Sheets with the new R googlesheets4 package, using the function gs4_create(). I can successfully create files in the root of my Google Drive but fail to see how I can send one into a pre-existing folder on Google Drive.
See the following reprex:
df <- data.frame(a = 1:10, b = letters[1:10])
googlesheets4::gs4_create(name = "TEST_FOLDER/testsheet", sheets = df)
It creates a file named "TEST_FOLDER/testsheet" in the root folder, while I want the file created inside TEST_FOLDER.
I know I can use write_sheet() on files that already exist inside a folder, but I want to create new files, not write into pre-existing ones. I also know that googledrive::drive_upload() will let me upload CSV files, but I do not like the format of the uploaded CSVs: they arrive as plain-text sheets with no frozen first row, which seems achievable only through the googlesheets4 package. So back to my question:
How do I create Google Sheets files (in bulk) inside TEST_FOLDER?
First, create the folder with drive_mkdir(name = "TEST_FOLDER") from the googledrive package. Once you have created it, I recommend working with the IDs of the folder and the files. So the next step, finding the ID, would be:
folder_id <- drive_find(n_max = 10, pattern = "TEST_FOLDER")$id
*This works only if you have a single folder called TEST_FOLDER in your Google Drive. If you have more than one, I recommend copying/pasting the ID directly, or identifying the ID you want before assigning it to the folder_id object.
*If you don't want to do this step, you can also copy/paste the ID directly from the Google Drive URL.
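Another option, if you know where the folder lives, is to resolve it by path with drive_get(); a small sketch, assuming TEST_FOLDER sits in the root of My Drive:

# "~/" anchors the path at the root of My Drive
folder_id <- googledrive::drive_get("~/TEST_FOLDER")$id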
Once you have it, you can write a for loop to upload all the files. For example, supposing your data frames are called sheet1, sheet2, ..., sheet10:
a <- rep("sheet",10)
b <- 1:10
names <- paste0(a,b)
for(x in names){
gs4_create(name = x, sheets = list(sheet1 = get(x)))
sheet_id <- drive_find(type = "spreadsheet", n_max = 10,
pattern = x)$id
sheet_id <- drive_find(type = "spreadsheet", n_max = 10,
pattern = x)$id
drive_mv(file = as_id(sheet_id), path = as_id(folder_id))
}
NOTE: If you have too many files in the root folder of your Google Drive, searching for files with drive_find() can take a very long time. That's why I recommend working with IDs. If you hit this problem, you can create the folder manually, copy its ID, and assign it to the folder_id object.
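For what it's worth, a tighter variant of the loop above: gs4_create() returns the new sheet's ID, so you can hand it straight to drive_mv() and skip the drive_find() search entirely. A sketch, reusing the names and folder_id objects from above:

library(googledrive)
library(googlesheets4)

for (x in names) {
  # gs4_create() returns the new sheet's ID, so no drive_find() is needed
  ss <- gs4_create(name = x, sheets = list(sheet1 = get(x)))
  drive_mv(file = ss, path = as_id(folder_id))  # move it straight into TEST_FOLDER
}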
I want to read the CSV file "mydata.csv" as input and create the output in the same directory using R. I have hard-coded the CSV input (Domain_test.csv) and output (MyData.csv) paths as below. But I will have to share the script and the corresponding CSV files with another user so that they can execute it and get the results. I want the user to be able to select whatever path they want, and run the script without hard-coding the input/output paths.
How should this be done in R?
# reading csv from this current directory
data <- read.csv("C:/Users/Desktop/input_output_directory/Domain_test.csv")
# generating the output in this same directory
write.csv(data, "C:/Users/Desktop/input_output_directory/MyData.csv", row.names = FALSE)
You can use
wd <- choose.dir(default = "", caption = "Select folder")
setwd(wd)
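Note that choose.dir() exists only on Windows; on other platforms, tcltk::tk_choose.dir() is a common substitute. A minimal sketch of the full flow, with file names taken from the question:

# let the user pick the folder interactively
wd <- if (.Platform$OS.type == "windows") {
  choose.dir(default = "", caption = "Select folder")
} else {
  tcltk::tk_choose.dir(caption = "Select folder")
}
setwd(wd)

# read and write relative to the chosen folder
data <- read.csv("Domain_test.csv")
write.csv(data, "MyData.csv", row.names = FALSE)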
At work I have files that are added to a folder path as they are received and accepted. They are .wdf files that I need to convert to CSV. I then want to combine the files into a single file that can be filtered by a column name. So I'm trying to pull a subset of the files from numerous folders based on extension and date, copy the ones I want into another folder, and then combine those.
File names that I want to pull are in the form of:
"//xyz/ExternalUsers/em/em18thjudic/uploaded_files/ACCEPTED_201907101310_UIXD#FGE18thJULDWC2Q2019.wdf"
I want all files in that path that end in .wdf and fall within a certain date range (currently the month of July). I would also prefer to pull only new files when I run the script, but I haven't figured that out yet. I can get it to pull files matching either the date or the file type, but not both.
I have tried using tapply with file.mtime to pull by date. This didn't work, so I tried pulling files whose names contain certain upload dates.
files <- list.files(
  path = "//sptw02/ExternalUsers/em",
  pattern = "\\.wdf$|._201907.",
  full.names = TRUE,
  recursive = TRUE
)
dirs <- dirname(files)
lastfiles <- tapply(files, dirs, function(v) v[which.max(file.mtime(v))])
What I've tried:
1) pattern = "\\.wdf$|._201907.",
2) pattern = c("(\\.wdf$,._201907.)"),
3) pattern = "\\.wdf"|"._201707.",
I can only get it to pull either files containing that date in the name or files with the .wdf extension.
I expect to grab only the files that match both patterns and to copy them into another folder. Instead it copies every file that matches either .wdf or _201907; I cannot get it to require both.
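For what it's worth, alternation (|) makes list.files() match either branch, which is why everything with .wdf or _201907 comes back. A single pattern that requires both the date and the extension in the same name would look like this sketch (the path is the one from the question; the destination folder is a placeholder):

# match names that contain "_201907" AND end in ".wdf"
files <- list.files(
  path = "//sptw02/ExternalUsers/em",
  pattern = "_201907.*\\.wdf$",
  full.names = TRUE,
  recursive = TRUE
)
# copy the matches into another folder (destination is hypothetical)
file.copy(files, to = "//sptw02/ExternalUsers/em/accepted_july")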
I have used googledrive functions successfully to access .xlsx spreadsheets on my own Google Drive, so
drive_download(file = "DIRECTOR_TM/Faculty/Faculty Productivity/Faculty productivity.xlsx",
overwrite=TRUE)
works and saves a local copy of the file for me to run analyses on.
Mid-year we switched to using Team Drives, and the equivalent
drive_download(file = "Director/Faculty/Faculty Productivity/Faculty productivity.xlsx",
overwrite=TRUE)
doesn't work - I get an error that says "Error: 'file' does not identify at least one Drive file."
So I have tried using the team_drive_get function - and am confused
Director <- team_drive_get("Director")
does work - I get a tibble with one observation. But the file I want is in a subdirectory of the "Director" team drive. So I tried
TeamDrive <- team_drive_get("Director/Faculty/Faculty Productivity/")
but the result is a tibble with 0 observations.
How do I get access to a file in a subdirectory on a team drive?
googledrive uses IDs to identify objects in a flattened file structure for your team, i.e., you don't need to know the subdirectory. If you know the name of your file, you just need to search the team drive and find the ID (your specific question, which is why I found this, is addressed below).
# environment variables
FILENAME <- "your_file_name"
TEAM_DRIVE_NAME <- "your_team_name_here"

# list every file on the team drive, then download the one matching FILENAME
gdrive_files_df <- drive_find(team_drive = TEAM_DRIVE_NAME)
drive_download(
  as_id(gdrive_files_df[gdrive_files_df$name == FILENAME, ]$id),
  overwrite = TRUE
)
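Note that drive_find() returns every file on the team drive, so if more than one file shares the name in FILENAME, the subsetting above yields several IDs; filter gdrive_files_df down to a single row before calling drive_download().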
Alternatively, this is what you can do if you do need to find the specific ID of a subdirectory (perhaps for an upload where there is no existing ID for the file).
# environment variables
FILEPATH <- "your_file_path"
TEAM_SUBDIRECTORY <- "your_subdirectory"

# grab the ID of your subdirectory (from gdrive_files_df above) and upload to that directory
drive_upload(
  FILEPATH,
  path = as_id(gdrive_files_df[gdrive_files_df$name == TEAM_SUBDIRECTORY, ]$id),
  name = FILENAME
)
I am using the googledrive package from CRAN. But the drive_upload() function lets you upload a local file, not a data frame. Can anybody help with this?
Just save the data frame in question to a local file. The most basic options would be saving to a CSV or to an RData file.
Example:
test <- data.frame(a = 1)
save(test, file = "test.RData")  # write the data frame to a local file
rm(test)                         # drop it from the session...
load("test.RData")               # ...and load it back to check the file
exists("test")
Since it has been clarified that a temporary file cannot be used, we could use a file connection instead.
test <- data.frame(a = 1)
tempFileCon <- file()                # an anonymous, in-memory file connection
write.csv(test, file = tempFileCon)  # write the CSV into the connection
And now we have the file connection in memory, which we can pass to other functions. Caveat: address it by its literal object name, not in quotation marks as you would an actual file.
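For example, a sketch of reading the data back from the connection (note the unquoted object name):

test2 <- read.csv(tempFileCon)  # pass the connection object itself, not "tempFileCon"
close(tempFileCon)              # release the connection when done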
Unfortunately I can find no way to push the data frame up directly. But, to document for others the basics this question touches upon: the following code writes a local .csv and then bounces it up through the googledrive package to express itself as a Google Sheet.
library(readr)        # for write_csv()
library(googledrive)

write_csv(iris, "df_iris.csv")
drive_upload("df_iris.csv", type = "spreadsheet")
You can achieve this using gs_add_row() from the googlesheets package. This function accepts data frames directly as an input parameter and uploads the data to the specified Google Sheet. Local files are not required.
From the help section of ?gs_add_row:
"If input is two-dimensional, internally we call gs_add_row once per input row."
This can be done in two ways. As mentioned by others, you can create a local file and upload it. It is also possible to create a new spreadsheet in your Drive directly. This spreadsheet will be created in the main folder of your Drive; if you want it stored somewhere else, you can move it after creation.
# install the packages
install.packages("googledrive", "googlesheets4")
# load the libraries
library(googledrive)
library(googlesheets4)
## With local storage
# Locally store the file
write.csv(x = iris, file = "iris.csv")
# Upload the file
drive_upload(media = "iris.csv", type='spreadsheet')
## Direct storage
# Create an empty spreadsheet. It is stored as an object with a sheet_id and drive_id
ss <- gs4_create(name = "my_spreadsheet", sheets = "Sheet 1")
# Put the data.frame in the spreadsheet and provide the sheet_id so it can be found
sheet_write(data = iris, ss = ss, sheet = "Sheet 1")
# Move your spreadsheet to the desired location
drive_mv(file = ss, path = "my_creations/awesome location/")
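Note the trailing slash on "my_creations/awesome location/": it tells drive_mv() that the target is a folder rather than a new file name.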
I have a file in my Google Drive that is an .xlsx. It is too big, so it is not automatically converted to a Google Sheet (that's why using the googlesheets package did not work). The file is so big that I can't even preview it by clicking on it in my Google Drive; the only way to see it is to download it as an .xlsx file. While I could load it locally as an .xlsx file, I am trying instead to use the googledrive package.
So far what I have is:
library(googledrive)
drive_find(n_max = 50)
drive_download("filename_without_extension.xlsx",type = "xlsx")
but I got the following error:
'file' does not identify at least one Drive file.
Maybe it is me not specifying the path where the file lives in the Drive, for example: Work\Data\Project1\filename.xlsx
Could you give me an idea on how to load in R the file called filename.xlsx that is nested in the drive like that?
I read the documentation but couldn't figure out how to do that. Thanks in advance.
You should be able to do this by:
library(googledrive)
drive_download("~/Work/Data/Project1/filename.xlsx")
The type parameter is only for Google native spreadsheets, and does not apply to raw files.
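A slightly fuller sketch of the same call, with an explicit local destination (the local file name and the readxl read-back are assumptions, not part of the original answer):

library(googledrive)

drive_download("~/Work/Data/Project1/filename.xlsx",
               path = "filename.xlsx",  # local destination
               overwrite = TRUE)
df <- readxl::read_xlsx("filename.xlsx")  # read the downloaded workbook into R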
I want to share my way.
I do it this way because I keep updating the xlsx file; it is a query result that comes from an ERP. When I tried to do it by Google Drive ID, it gave me errors, because each time the ERP updates the file, its ID changes.
This is my context; yours can be absolutely different. This file changes just two or three times a month. Even though it is a "big" xlsx file (78-80K records with 19 factors), I use it for just seconds to calculate some values and then I can trash it. It makes no sense to store it (storing is more expensive than uploading).
library(googledrive)
library(googlesheets4)  # watch out: not the CRAN version yet (0.1.1.9000)
library(glue)           # for building the download URL below
library(readxl)         # for reading the downloaded xlsx

drive_folder_owner <- "carlos.sxxx#xxxxxx.com"  # my account in this gDrive folder
drive_auth(email = drive_folder_owner)          # previously authorized account
googlesheets4::sheets_auth(email = drive_folder_owner)  # yes, I know, should be the same, but they are separate auths

# find the file created by the ERP; restricting the type shortens the search
d1 <- drive_find(pattern = "my_file.xlsx", type = drive_mime_type("xlsx"))
meta <- drive_get(id = d1$id)[["drive_resource"]]  # get the file's metadata from Google Drive
n_id <- glue("https://drive.google.com/open?id=", d1$id[[1]])  # a path for reading
meta_name <- paste(getwd(), "/Files/", meta[[1]]$originalFilename, sep = "")  # a path to temporarily save it

drive_download(file = as_id(n_id), overwrite = TRUE, path = meta_name)  # download and save locally
V_CMV <- data.frame(read_xlsx(meta_name))  # store in a data frame
file.remove(meta_name)  # delete the local copy from the server
rm(d1, n_id)            # delete temporary variables