How to connect to Amazon Rekognition using the paws package in R

I would like to connect to AWS Rekognition from R. The "paws" package on CRAN seems to cover this, but it fails with the error "Error in get_region(): no region provided", even though the region is specified via Sys.setenv(). Note that "image.jpg" is a local image converted to base64 with knitr::image_uri() and sent to the Rekognition API via the detect_labels() operation of rekognition(), part of the paws package.
library(paws)
library(knitr)
Sys.setenv("AWS_ACCESS_KEY_ID" = "xxxxxx", "AWS_SECRET_ACCESS_KEY" = "xxxx", "AWS_DEFAULT_REGION"= "eu-west-2")
svc <- rekognition()
img_X <- image_uri("image.jpg")
svc$detect_labels(Image=img_X)
Error in get_region() : No region provided

Try Sys.setenv(AWS_REGION = "eu-west-2"). This worked for me.
Full code:
Sys.setenv(AWS_REGION = "eu-west-2")
library(paws.machine.learning)
svc <- paws.machine.learning::rekognition()
# image in S3 bucket
svc$detect_text(
  Image = list(
    S3Object = list(
      Bucket = "bucket",
      Name = "path_to_image"
    )
  )
)
# Local image
download.file("https://www.freecodecamp.org/news/content/images/2019/08/0_4ty0Adbdg4dsVBo3.png",'test.png', mode = 'wb')
svc$detect_text(
  Image = list(
    Bytes = "test.png"
  )
)
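To apply the same fix to the original detect_labels() call, here is a minimal sketch; it assumes the local "image.jpg" from the question, reads it into a raw vector (paws accepts raw vectors for Bytes fields), and the MaxLabels value is an arbitrary choice of mine:
Sys.setenv(AWS_REGION = "eu-west-2")
library(paws)
svc <- rekognition()
# Read the local image as raw bytes instead of a knitr data URI
img_bytes <- readBin("image.jpg", what = "raw", n = file.size("image.jpg"))
labels <- svc$detect_labels(Image = list(Bytes = img_bytes), MaxLabels = 10)
labels$Labels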

Related

Get data from Workspace in Azure ML Studio

I have a problem connecting to a workspace in Azure ML Studio. I am using the azuremlsdk library, but it doesn't work.
My code looks like:
library(azuremlsdk)
workspace_name = 'workspace_name'
subscription_id = 'subscription_id'
resource_group = 'resource_group'
ws <- get_workspace(name = workspace_name, subscription_id = subscription_id, resource_group = resource_group)
dataSet <- get_dataset_by_name(ws, name = "registration_name", version = "latest")
After I run that code I get an error.
I have no idea what is wrong. I tried to do the same in Python and it works fine with the same parameters; code:
from azureml.core import Workspace, Dataset
subscription_id = 'subscription_id'
resource_group = 'resource_group'
workspace_name = 'workspace_name'
workspace = Workspace(subscription_id, resource_group, workspace_name)
dataset = Dataset.get_by_name(workspace, name='registration_name')
Any idea how can I fix that bug?
I tried to reproduce the issue and found a solution. Follow the procedure below to access the dataset contents from the workspace.
data.set <- data.frame(installed.packages("azuremlsdk"))
data.set <- data.frame(installed.packages("remotes"))

# Install azuremlsdk from CRAN
remotes::install_cran('azuremlsdk', repos = 'http://cran.us.r-project.org', INSTALL_opts = c("--no-multiarch"))
library(azuremlsdk)

# Connect to the workspace; 'authentication' is an authentication object created beforehand
ws <- get_workspace(name = workspace_name, subscription_id = subscription_id, resource_group = resource_group, auth = authentication)
print(ws)

# Retrieve the registered dataset
dataSet <- get_dataset_by_name(ws, name = "registration_name", version = "latest")
dataSet
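For reference, a minimal sketch of how the `authentication` object passed to get_workspace() above could be created, assuming a service principal; the tenant ID, application ID, and secret below are placeholders:
library(azuremlsdk)
# Hypothetical service principal credentials; replace with your own values
authentication <- service_principal_authentication(
  tenant_id = "tenant_id",
  service_principal_id = "service_principal_id",
  service_principal_password = "service_principal_password"
)
# Alternatively, interactive browser-based login:
# authentication <- interactive_login_authentication(tenant_id = "tenant_id")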

How to get R to read a gdb file?

I am trying to get R to read in a gdb file. The first thing I did was to find out its layers, which I did by running:
ogrListLayers("my_data.gdb")
It turns out my_data has two large layers. I have tried opening both, but with no success. Here is what I have tried so far:
1)
Wont_open <- readOGR(dsn = "D:/my_data.gdb", layer = "layer_1", dropNULLGeometries = F)
I have tried the above with and without the dropNULLGeometries argument and for both layers in my_data. When running this, I get the following error:
Error in readOGR(dsn = "D:/my_data.gdb", :
Unsupported field type: Binary
2)
Wont_open <- st_read(dsn = "D:/my_data.gdb", layer = "layer_1")
I have tried the above for both layers in my_data. When I run this, R simply stops working after about 1 hour of having started the process.
3)
read_GDB_Layer <- function(dsn, layerName, overwrite = T){
  conversionDir <- tempdir()
  # Convert the requested layer to an ESRI Shapefile in a temporary directory
  gdalUtils::ogr2ogr(src_datasource_name = dsn, dst_datasource_name = conversionDir,
                     f = "ESRI Shapefile", layer = layerName, verbose = T, overwrite = overwrite)
  # Read the converted attribute table (read.dbf() is from the foreign package)
  df <- read.dbf(file.path(conversionDir, paste0(layerName, ".gdbtable")))
  return(df)
}
Then,
Wont_open <- read_GDB_Layer(dsn = "D:/my_data.gdb", layerName = "layer_1")
I tried this for both layers, and also changed the ".gdbtable" extension in the function to ".dbf", and it still did not work. I got the following warning messages:
1: In gdal_setInstallation(search_path = NULL, rescan = FALSE, ignore.full_scan = TRUE, :
No GDAL installation found. Please install 'gdal' before continuing:
- www.gdal.org (no HDF4 support!)
- trac.osgeo.org/osgeo4w/ (with HDF4 support RECOMMENDED)
- www.fwtools.maptools.org (with HDF4 support)
2: In gdal_setInstallation(search_path = NULL, rescan = FALSE, ignore.full_scan = TRUE, :
If you think GDAL is installed, please run:
gdal_setInstallation(ignore.full_scan=FALSE)
The st_read() function worked for me, as pointed out by @sven-brandt.
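For reference, a minimal sketch of that sf-based route, assuming the same paths and layer names used above:
library(sf)
# List the layers available in the file geodatabase
st_layers("D:/my_data.gdb")
# Read one layer; sf reads file geodatabases through GDAL's OpenFileGDB driver
layer_1 <- st_read(dsn = "D:/my_data.gdb", layer = "layer_1")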

get_file() error in Google Colab R environment

I have this error:
Error in get_file(fname = "flores.zip", origin = "https://drive.google.com/u/0/uc?export=download&confirm=sgo2&id=107ocoPLxNddbHp2MDsIWIYX9qb196WUv", : argument "file" is missing, with no default
Here is My code:
data_dir <- get_file(fname = "flores.zip",
                     origin = "https://drive.google.com/u/0/uc?export=download&confirm=sgo2&id=107ocoPLxNddbHp2MDsIWIYX9qb196WUv",
                     extract = TRUE)
data_dir <- file.path(dirname(data_dir), "flores")
images <- list.files(data_dir, pattern = ".jpg", recursive = TRUE)
length(images)
In my desktop version of RStudio it works, but not in the Google Colab R environment.
Can anybody help me?
Thanks!
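One thing that may be worth checking (an assumption on my part, since the post does not show which package provides get_file()): if another attached package in the Colab session masks keras::get_file() with a function whose required argument is named file, calling it with an explicit namespace avoids the conflict:
# Hypothetical fix: pin the call to keras in case get_file() is masked in Colab
data_dir <- keras::get_file(fname = "flores.zip",
                            origin = "https://drive.google.com/u/0/uc?export=download&confirm=sgo2&id=107ocoPLxNddbHp2MDsIWIYX9qb196WUv",
                            extract = TRUE)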

How can I correctly use the cluster plan in the R future (furrr) package

I am currently using furrr to organize the execution of my model. I use a data.frame to pass parameters to a function in an orderly way, and then use furrr::future_map() to map the function across all the parameters. The function works flawlessly with the sequential and multicore plans on my local machine (OSX).
Now, I want to test my code on my own cluster of AWS instances (just as shown here).
I created a function based on the code from the linked article:
make_cluster_ec2 <- function(public_ip){
  ssh_private_key_file <- Sys.getenv('PEM_PATH')
  github_pac <- Sys.getenv('PAC')
  cl_multi <- future::makeClusterPSOCK(
    workers = public_ip,
    user = "ubuntu",
    rshopts = c(
      "-o", "StrictHostKeyChecking=no",
      "-o", "IdentitiesOnly=yes",
      "-i", ssh_private_key_file
    ),
    rscript_args = c(
      "-e", shQuote("local({p <- Sys.getenv('R_LIBS_USER'); dir.create(p, recursive = TRUE, showWarnings = FALSE); .libPaths(p)})"),
      "-e", shQuote("install.packages('devtools')"),
      "-e", shQuote(glue::glue("devtools::install_github('user/repo', auth_token = '{github_pac}')"))
    ),
    dryrun = FALSE)
  return(cl_multi)
}
Then, I create the cluster object and check that it is connected to the right instance:
public_ids <- c('public_ip_1', 'public_ip_2')
cls <- make_cluster_ec2(public_ids)
f <- future(Sys.info())
And when I retrieve the value of f I get the specs of one of my remote instances, which indicates the socket is connected correctly:
> value(f)
sysname          "Linux"
release          "4.15.0-1037-aws"
version          "#39-Ubuntu SMP Tue Apr 16 08:09:09 UTC 2019"
nodename         "ip-xxx-xx-xx-xxx"
machine          "x86_64"
login            "ubuntu"
user             "ubuntu"
effective_user   "ubuntu"
But when I run my code using my cluster plan:
plan(list(tweak(cluster, workers = cls), multisession))

parameter_df %>%
  mutate(model_traj = furrr::future_pmap(list('lat' = latitude,
                                              'lon' = longitude,
                                              'height' = stack_height,
                                              'name_source' = facility_name,
                                              'id_source' = facility_id,
                                              'duration' = duration,
                                              'days' = seq_dates,
                                              'daily_hours' = daily_hours,
                                              'direction' = 'forward',
                                              'met_type' = 'reanalysis',
                                              'met_dir' = here::here('met'),
                                              'exec_dir' = here::here("Hysplit4/exec"),
                                              'cred' = list(creds)),
                                         dirtywind::hysplit_trajectory,
                                         .progress = TRUE)
  )
I get the following error:
Error in file(temp_file, "a") : cannot open the connection
In addition: Warning message:
In file(temp_file, "a") :
cannot open file '/var/folders/rc/rbmg32js2qlf4d7cd4ts6x6h0000gn/T//RtmpPvdbV3/filecf23390c093.txt': No such file or directory
I cannot figure out what is happening under the hood, and I cannot traceback() the error from my remote machines either. I have tested the connection with the examples in the article and things seem to run correctly. I am wondering why it is trying to create a temporary file during execution. What am I missing here?
(This is also an issue in the furrr repo)
Disable the progress bar, i.e. don't specify .progress = TRUE.
This is because .progress = TRUE assumes your R workers can write to a temporary file that the main R process created. This is typically only possible when you parallelize on the same machine.
A smaller example of this error is:
library(future)
## Set up a cluster with one worker running on another machine
cl <- makeClusterPSOCK(workers = "node2")
plan(cluster, workers = cl)
y <- furrr::future_map(1:2, identity, .progress = FALSE)
str(y)
## List of 2
## $ : int 1
## $ : int 2
y <- furrr::future_map(1:2, identity, .progress = TRUE)
## Error in file(temp_file, "a") : cannot open the connection
## In addition: Warning message:
## In file(temp_file, "a") :
## cannot open file '/tmp/henrik/Rtmp1HkyJ8/file4c4b864a028ac.txt': No such file or directory

"download.file" Incomplete and inconsistent downloads

I am trying to understand why I am getting inconsistent results when downloading CSV files from a website archive. I don't know if the problem is at my end, the other side, or just failed communications in between. Any suggestions are welcome.
I am using an R script to automate downloading CSV files by month and year from the HYCOM archives for analysis. The script generated the following URL:
trying URL 'http://ncss.hycom.org/thredds/ncss/GLBu0.08/reanalysis/3hrly?var=salinity&var=water_temp&var=water_u&var=water_v&latitude=13.875&longitude=-72.25&time_start=2012-05-01T00:00:00Z&time_end=2012-05-31T21:00:00Z&vertCoord=&accept=csv'
Running download.file() obtains the file successfully about half the time and fails otherwise. The successful run is logged below; the failed run was only captured as an image.
Successful Log
#download one month of data
MM = '05'
LastDay = ndays(paste(year,MM,'01',sep="-"))
H1 = paste( as shown in image)
H2 = '-01T00:00:00Z&time_end='
#H3 = 'T21:00:00Z&timeStride=1&vertCoord=&accept=csv'
H3 = 'T21:00:00Z&vertCoord=&accept=csv'
HtmlLink <- paste(H1,year,"-",MM,H2,year,"-",MM,"-",LastDay,H3,sep="")
dest = paste("../data/",year,MM,".csv",sep="")
download.file(url =HtmlLink ,destfile=dest,cacheOK=FALSE, method="auto")
trying URL 'as shown in image'
Content type 'text/plain;charset=UTF-8' length unknown
..................................................
................downloaded 666 KB
user system elapsed
28.278 6.605 5201.421
LOG OF FAILED RUN (captured only as an image in the original post)
You can/should turn the following into a function accepting parameters and replace the hardcoded values with said params (I used httr:::parse_query() to make the list):
library(httr)

URL <- "http://ncss.hycom.org/thredds/ncss/GLBu0.08/reanalysis/3hrly"

params <- list(var = "salinity",
               var = "water_temp",
               var = "water_u",
               var = "water_v",
               latitude = "13.875",
               longitude = "-72.25",
               time_start = "2012-05-01T00:00:00Z",
               time_end = "2012-05-31T21:00:00Z",
               vertCoord = "",
               accept = "csv")

dest_file <- "filename"

res <- GET(url = URL,
           query = params,
           timeout(360),
           write_disk(dest_file, overwrite = TRUE),
           verbose())

warn_for_status(res)
You can (eventually) remove the verbose() from that GET call, but it's helpful during debugging.
The main issue is that this server is s l o w and times out before the transfer is complete. Even the value of 360 might not be enough (you'll need to experiment).
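As a minimal sketch of that parameterization (the function name, argument names, and default timeout are my own choices, not from the original answer):
library(httr)

get_hycom_csv <- function(latitude, longitude, time_start, time_end, dest_file,
                          timeout_sec = 360) {
  # Duplicate 'var' entries in a list become repeated query parameters
  params <- list(var = "salinity", var = "water_temp", var = "water_u", var = "water_v",
                 latitude = latitude, longitude = longitude,
                 time_start = time_start, time_end = time_end,
                 vertCoord = "", accept = "csv")
  res <- GET(url = "http://ncss.hycom.org/thredds/ncss/GLBu0.08/reanalysis/3hrly",
             query = params,
             timeout(timeout_sec),
             write_disk(dest_file, overwrite = TRUE))
  warn_for_status(res)
  invisible(res)
}

# Example call for May 2012 at the point used in the question
get_hycom_csv(latitude = "13.875", longitude = "-72.25",
              time_start = "2012-05-01T00:00:00Z",
              time_end = "2012-05-31T21:00:00Z",
              dest_file = "../data/201205.csv")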
Many thanks to all for the help. The suggestion by hrbrmstr appears to be an elegant answer and I look forward to testing it. However, I was unable to install a working copy using the package manager. Installation from a local download also failed, since R complained that the package I downloaded from CRAN was a Windows version, not an OS X one. Yes, I repeated the download several times to make sure I had the right package.
As suggested by Cyrus Mohammadian, I tried the procedures in the curl library.
Running the same URL, download.file() transfers failed about 50% of the time. Using curl reduced the transfer times from 2000 seconds to 1000 seconds, with no failures in 12 tries.
## calculate number of days in month
ndays <- function(d) {
  last_days <- 28:31
  rev(last_days[which(!is.na(
    as.Date(paste(substr(d, 1, 8), last_days, sep = ''), '%Y-%m-%d')))])[1]
}

nlat = 13.875
elon = -72.25

# download one month of data
year = 2008
MM = '01'
LastDay = ndays(paste(year, MM, '01', sep = "-"))
H1 = paste('http://ncss.hycom.org/thredds/ncss/GLBu0.08/reanalysis/3hrly?var=salinity&var=water_temp&var=water_u&var=water_v&latitude=',
           nlat, '&longitude=', elon, '&time_start=', sep = "")
H2 = '-01T00:00:00Z&time_end='
H3 = 'T21:00:00Z&timeStride=1&vertCoord=&accept=csv'
HtmlLink <- paste(H1, year, "-", MM, H2, year, "-", MM, "-", LastDay, H3, sep = "")
dest = paste("../data/", year, MM, ".csv", sep = "")

# curl_download() is from the curl package
library(curl)
curl_download(url = HtmlLink, destfile = dest, quiet = FALSE, mode = "wb")
