get_file() error in Google Colab R environment

I have this error:
Error in get_file(fname = "flores.zip", origin = "https://drive.google.com/u/0/uc?export=download&confirm=sgo2&id=107ocoPLxNddbHp2MDsIWIYX9qb196WUv", : argument "file" is missing, with no default
Here is my code:
data_dir <- get_file(
  fname = "flores.zip",
  origin = "https://drive.google.com/u/0/uc?export=download&confirm=sgo2&id=107ocoPLxNddbHp2MDsIWIYX9qb196WUv",
  extract = TRUE
)
data_dir <- file.path(dirname(data_dir), "flores")
images <- list.files(data_dir, pattern = ".jpg", recursive = TRUE)
length(images)
It works in my desktop version of RStudio, but not in the Google Colab R environment.
Can anybody help me?
Thanks!
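One thing worth ruling out is a namespace clash: keras::get_file() takes fname and origin, so if another attached package in the Colab session masks the name with a function whose first argument is file, the call would fail with exactly this message. A minimal sketch of the namespaced call, under that assumption:

library(keras)

# Call get_file() through its namespace so a masking function from
# another attached package cannot intercept the call.
data_dir <- keras::get_file(
  fname = "flores.zip",
  origin = "https://drive.google.com/u/0/uc?export=download&confirm=sgo2&id=107ocoPLxNddbHp2MDsIWIYX9qb196WUv",
  extract = TRUE
)

Running find("get_file") in the Colab session shows which attached packages currently define that name.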

Related

googledrive::drive_mv gives error "Parent specified via 'path' is invalid: x Does not exist"

This is a weird one and I am hoping someone can figure it out. I have written a function that uses googlesheets4 and googledrive. One thing I'm trying to do is move a googledrive document (spreadsheet) from the base folder to a specified folder. I had this working perfectly yesterday, so I don't know what happened, as it just didn't work when I came in this morning.
The weird thing is that if I step through the function, it works fine. It's just when I run the function all at once that I get the error.
I am using a folder ID instead of a name and using drive_find to get the correct folder ID. I am also using a sheet ID instead of a name. The folder already exists and like I said, it was working yesterday.
outFolder <- 'exact_outFolder_name_without_slashes'
createGoogleSheets <- function(outFolder) {
  folder_id <- googledrive::drive_find(n_max = 10, pattern = outFolder)$id
  data <- data.frame(Name = c("Sally", "Sue"), Data = c("data1", "data2"))
  sheet_id <- NA
  nameDate <- NA
  tempData <- data.frame()
  for (i in 1:nrow(data)) {
    nameDate <- data[i, "Name"]
    tempData <- data[i, ]
    googlesheets4::gs4_create(name = nameDate, sheets = list(sheet1 = tempData)
    sheet_id <- googledrive::drive_find(type = "spreadsheet", n_max = 10, pattern = nameDate)$id
    googledrive::drive_mv(file = as_id(sheet_id), path = as_id(folder_id))
  } # end 'for'
} # end 'function'
I don't think this will be a reproducible example. The offending code is within the for loop that is within the function, and it works fine when I run through it step by step. folder_id is defined within the function but outside of the for loop; sheet_id is within the for loop. When I move folder_id into the for loop, it still doesn't work, although I don't know why it would change anything. These are just the things I have tried. I do have the proper authorization for Google Drive and googlesheets4 by using:
googledrive::drive_auth()
googlesheets4::gs4_auth(token = drive_token())
<error/rlang_error>
Error in `as_parent()`:
! Parent specified via `path` is invalid:
x Does not exist.
Backtrace:
  global createGoogleSheets(inputFile, outPath, addNames)
  googledrive::drive_mv(file = as_id(sheet_id), path = as_id(folder_id))
  googledrive:::as_parent(path)
Run `rlang::last_trace()` to see the full context.
Backtrace:
    x
    +-global createGoogleSheets(inputFile, outPath, addNames)
    +-googledrive::drive_mv(file = as_id(sheet_id), path = as_id(folder_id))
    \-googledrive:::as_parent(path)
      \-googledrive:::drive_abort(c(invalid_parent, x = "Does not exist."))
        \-cli::cli_abort(message = message, ..., .envir = .envir)
          \-rlang::abort(message, ..., call = call, use_cli_format = TRUE)
I have tried changing folder_id to the exact path of my Google Drive (W:/My Drive...) and got the same error. I should mention I have also tried deleting the folder and re-creating it fresh.
Anybody have any ideas?
Thank you in advance for your help!
I can't comment because I don't have the reputation yet, but I believe you're missing a parenthesis in your for-loop.
You need that SECOND parenthesis below:
for (i in 1:nrow(tempData) ) {
...
}
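Beyond the missing parenthesis, the drive_find() lookup inside the loop is fragile: pattern matching can return zero or several ids for a just-created sheet, which would hand drive_mv() an invalid file id. Since gs4_create() already returns the new sheet's id, a sketch that skips the lookup entirely (reusing outFolder and the example data from the question):

library(googledrive)
library(googlesheets4)

drive_auth()
gs4_auth(token = drive_token())

folder_id <- drive_find(n_max = 10, pattern = outFolder)$id
data <- data.frame(Name = c("Sally", "Sue"), Data = c("data1", "data2"))

for (i in 1:nrow(data)) {
  # gs4_create() returns the new sheet's id, so there is no need to
  # search for it with drive_find() afterwards.
  ss <- gs4_create(name = data[i, "Name"], sheets = list(sheet1 = data[i, ]))
  drive_mv(file = as_id(as.character(ss)), path = as_id(folder_id))
}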

How to get R to read a gdb file?

I am trying to get R to read in a gdb file. First thing I did was to find out its layers, which I did by running:
ogrListLayers("my_data.gdb")
It turns out my_data has two large layers. I have tried opening both but have had no success. Here is what I have tried so far:
1)
Wont_open <- readOGR(dsn = "D:/my_data.gdb", layer = "layer_1", dropNULLGeometries = F)
I have tried the above with and without the dropNULLGeometries argument and for both layers in my_data. When running this, I get the following error:
Error in readOGR(dsn = "D:/my_data.gdb", :
Unsupported field type: Binary
2)
Wont_open <- st_read(dsn = "D:/my_data.gdb", layer = "layer_1")
I have tried the above for both layers in my_data. When I run this, R simply stops working after about an hour.
3)
read_GDB_Layer <- function(dsn, layerName, overwrite = TRUE) {
  conversionDir <- tempdir()
  gdalUtils::ogr2ogr(src_datasource_name = dsn, dst_datasource_name = conversionDir,
                     f = "ESRI Shapefile", layer = layerName,
                     verbose = TRUE, overwrite = overwrite)
  # read.dbf() comes from the foreign package
  df <- foreign::read.dbf(file.path(conversionDir, paste0(layerName, ".gdbtable")))
  return(df)
}
Then,
Wont_open <- read_GDB_Layer(dsn = "D:/my_data.gdb", layerName = "layer_1")
I tried this for both layers, and also changed the .gdbtable extension in the function to .dbf to run it on both layers, and it still did not work. I got the following warning messages:
1: In gdal_setInstallation(search_path = NULL, rescan = FALSE, ignore.full_scan = TRUE, :
No GDAL installation found. Please install 'gdal' before continuing:
- www.gdal.org (no HDF4 support!)
- trac.osgeo.org/osgeo4w/ (with HDF4 support RECOMMENDED)
- www.fwtools.maptools.org (with HDF4 support)
2: In gdal_setInstallation(search_path = NULL, rescan = FALSE, ignore.full_scan = TRUE, :
If you think GDAL is installed, please run:
gdal_setInstallation(ignore.full_scan=FALSE)
The st_read() function worked for me, as pointed out by @sven-brandt.
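For reference, a minimal sf-based sketch of the approach that worked (layer names as in the question; the query argument is optional and LIMIT needs a reasonably recent GDAL):

library(sf)

# List the layers (name, geometry type, feature count) in the geodatabase.
st_layers("D:/my_data.gdb")

# Read one layer; passing an OGR SQL query can pull a small subset first,
# which is a cheap way to test a very large layer before a full read.
layer_1 <- st_read("D:/my_data.gdb", layer = "layer_1",
                   query = "SELECT * FROM layer_1 LIMIT 10")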

How to connect to Amazon Rekognition using paws package in R

I would like to connect to AWS Rekognition using R. The package "paws" on CRAN seems to cover this. However, it fails with the error "Error in get_region(): no region provided" despite the fact that the region is specified via Sys.setenv. Note that "image.jpg" is a local image converted to a base64 data URI with knitr's image_uri() and sent to the Rekognition API via the detect_labels() command of rekognition(), part of the paws package.
library(paws)
library(knitr)
Sys.setenv("AWS_ACCESS_KEY_ID" = "xxxxxx", "AWS_SECRET_ACCESS_KEY" = "xxxx", "AWS_DEFAULT_REGION"= "eu-west-2")
svc <- rekognition()
img_X <- image_uri("image.jpg")
svc$detect_labels(Image=img_X)
Error in get_region() : No region provided
Try Sys.setenv(AWS_REGION = "eu-west-2"). This worked for me.
Full code:
Sys.setenv(AWS_REGION = "eu-west-2")
library(paws.machine.learning)
svc <- paws.machine.learning::rekognition()

# Image in an S3 bucket
svc$detect_text(
  Image = list(
    S3Object = list(
      Bucket = "bucket",
      Name = "path_to_image"
    )
  )
)

# Local image
download.file("https://www.freecodecamp.org/news/content/images/2019/08/0_4ty0Adbdg4dsVBo3.png", "test.png", mode = "wb")
svc$detect_text(
  Image = list(
    Bytes = "test.png"
  )
)
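For the original detect_labels() call with a local file, reading the image into a raw vector is another way to fill Bytes; a sketch, assuming "image.jpg" exists in the working directory:

library(paws)

Sys.setenv(AWS_REGION = "eu-west-2")
svc <- rekognition()

# Read the file as raw bytes; paws encodes blob fields before sending.
img_bytes <- readBin("image.jpg", what = "raw",
                     n = file.info("image.jpg")$size)

labels <- svc$detect_labels(Image = list(Bytes = img_bytes))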

How to download .gz files from FTP server in R?

I am trying to download all .gz files from this link:
ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/
So far I tried this and I am not getting any results:
require(RCurl)
url <- "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/"
filenames <- getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE)
filenames <- strsplit(filenames, "\r\n")
filenames <- unlist(filenames)
I am getting this error:
Error in function (type, msg, asError = TRUE) :
Operation timed out after 300552 milliseconds with 0 out of 0 bytes received
Can someone please help with this?
Thanks
EDIT:
I tried to run it with the filenames provided to me below, so in my R script I have:
require(RCurl)
my_url <- "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/"
my_filenames <- c("bed_chr_11.bed.gz", ..., "bed_chr_9.bed.gz.md5")
my_filenames <- strsplit(my_filenames, "\r\n")
my_filenames <- unlist(my_filenames)
for (my_file in my_filenames) {
  download.file(paste0(my_url, my_file), destfile = file.path('/mydir', my_file))
}
And when I run the script I get these warnings:
trying URL 'ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/bed_chr_11.bed.gz'
Error in download.file(paste0(my_url, my_file), destfile = file.path("/mydir", :
cannot open URL 'ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/bed_chr_11.bed.gz'
In addition: Warning message:
In download.file(paste0(my_url, my_file), destfile = file.path("/mydir", :
URL 'ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/bed_chr_11.bed.gz': status was 'Timeout was reached'
Execution halted
The file names you're trying to access are
filenames <- c("bed_chr_11.bed.gz", "bed_chr_11.bed.gz.md5", "bed_chr_12.bed.gz",
"bed_chr_12.bed.gz.md5", "bed_chr_13.bed.gz", "bed_chr_13.bed.gz.md5",
"bed_chr_14.bed.gz", "bed_chr_14.bed.gz.md5", "bed_chr_15.bed.gz",
"bed_chr_15.bed.gz.md5", "bed_chr_16.bed.gz", "bed_chr_16.bed.gz.md5",
"bed_chr_17.bed.gz", "bed_chr_17.bed.gz.md5", "bed_chr_18.bed.gz",
"bed_chr_18.bed.gz.md5", "bed_chr_19.bed.gz", "bed_chr_19.bed.gz.md5",
"bed_chr_20.bed.gz", "bed_chr_20.bed.gz.md5", "bed_chr_21.bed.gz",
"bed_chr_21.bed.gz.md5", "bed_chr_22.bed.gz", "bed_chr_22.bed.gz.md5",
"bed_chr_AltOnly.bed.gz", "bed_chr_AltOnly.bed.gz.md5", "bed_chr_MT.bed.gz",
"bed_chr_MT.bed.gz.md5", "bed_chr_Multi.bed.gz", "bed_chr_Multi.bed.gz.md5",
"bed_chr_NotOn.bed.gz", "bed_chr_NotOn.bed.gz.md5", "bed_chr_PAR.bed.gz",
"bed_chr_PAR.bed.gz.md5", "bed_chr_Un.bed.gz", "bed_chr_Un.bed.gz.md5",
"bed_chr_X.bed.gz", "bed_chr_X.bed.gz.md5", "bed_chr_Y.bed.gz",
"bed_chr_Y.bed.gz.md5", "bed_chr_1.bed.gz", "bed_chr_1.bed.gz.md5",
"bed_chr_10.bed.gz", "bed_chr_10.bed.gz.md5", "bed_chr_2.bed.gz",
"bed_chr_2.bed.gz.md5", "bed_chr_3.bed.gz", "bed_chr_3.bed.gz.md5",
"bed_chr_4.bed.gz", "bed_chr_4.bed.gz.md5", "bed_chr_5.bed.gz",
"bed_chr_5.bed.gz.md5", "bed_chr_6.bed.gz", "bed_chr_6.bed.gz.md5",
"bed_chr_7.bed.gz", "bed_chr_7.bed.gz.md5", "bed_chr_8.bed.gz",
"bed_chr_8.bed.gz.md5", "bed_chr_9.bed.gz", "bed_chr_9.bed.gz.md5"
)
The files are big, so I didn't check the whole loop, but this worked at least for the first file. Add this to the end of your code.
my_url <- 'ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/BED/'
for (my_file in filenames) { # loop over the files
  # download each file, saving in a directory that you need to create on your own computer
  download.file(paste0(my_url, my_file), destfile = file.path('c:/users/josep/Documents/', my_file))
}
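On the timeout itself: download.file() is capped by R's timeout option, which defaults to 60 seconds, and these .gz files are large enough to exceed it; binary files should also be fetched with mode = "wb" on Windows. A sketch of the same loop with both adjustments:

options(timeout = 3600)  # raise the 60-second default to an hour per file

for (my_file in filenames) {
  download.file(paste0(my_url, my_file),
                destfile = file.path("c:/users/josep/Documents", my_file),
                mode = "wb")  # binary mode so the .gz files are not mangled
}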

Error in ocrFile function in AbbyyR package

While using the Abbyy cloud SDK for OCR, I keep getting the error below when I try to use the ocrFile() function inside the abbyyR package.
" Error in curl_download(finishedlist$resultUrl[res$id == finishedlist$id], :
Argument 'url' must be string. "
When I send the files to the cloud and process them, everything works fine, but when the cloud returns the files there is a problem downloading them. I thought it might be a network or certificate problem, but I can't solve it.
Thanks in advance
There is a problem in the source code: it needs as.character() around the URL.
I updated the ocrFile() function as follows:
install.packages("curl")
library(curl)
new_ocrFile<-function (file_path = "", output_dir = "./", exportFormat = c("txt",
"txtUnstructured", "rtf", "docx", "xlsx", "pptx", "pdfSearchable",
"pdfTextAndImages", "pdfa", "xml", "xmlForCorrectedImage",
"alto"), save_to_file = TRUE)
{
exportFormat <- match.arg(exportFormat)
res <- processImage(file_path = file_path, exportFormat = exportFormat)
while (!(any(as.character(res$id) == as.character(listFinishedTasks()$id)))) {
Sys.sleep(1)
}
finishedlist <- listFinishedTasks()
res$id <- as.character(res$id)
finishedlist$id <- as.character(finishedlist$id)
if (identical(save_to_file, FALSE)) {
res <- curl_fetch_memory(as.character(finishedlist$resultUrl[res$id ==
finishedlist$id]))
return(rawToChar(res$content))
}
curl_download(as.character(finishedlist$resultUrl[res$id == finishedlist$id]),
destfile = paste0(output_dir, unlist(strsplit(basename(file_path),
"[.]"))[1], ".", exportFormat))
}
I hope it helps.
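A brief usage sketch, assuming abbyyR credentials have already been registered with setapp() (the file name here is hypothetical):

library(abbyyR)
setapp(c("app_id", "app_password"))  # hypothetical credentials

# Return the recognized text directly instead of writing it to a file.
text <- new_ocrFile(file_path = "scan.jpg", exportFormat = "txt",
                    save_to_file = FALSE)
cat(text)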
