Cannot access file on S3 using R

library(aws.s3)

access_key <- "**************"
secret_key <- "****************"
bucket <- "temp"
filename <- "test.csv"

Sys.setenv("AWS_ACCESS_KEY_ID"     = access_key,
           "AWS_SECRET_ACCESS_KEY" = secret_key)

buckets <- bucketlist()
getbucket(bucket)
usercsvobj <- get_object(bucket = "", "s3://part112017rscriptanddata/test.csv")
csvcharobj <- rawToChar(usercsvobj)
con <- textConnection(csvcharobj)
data <- read.csv(con)
I am able to see the contents of the bucket, but I fail to read the csv as a data frame.
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error>
<Code>PermanentRedirect</Code><Message>The bucket you are attempting to
access must be addressed using the specified endpoint. Please send all
future requests to this endpoint.</Message><Bucket>test.csv</Bucket>
<Endpoint>test.csv.s3.amazonaws.com</Endpoint>
<RequestId>76E9C6B03AC12D8D</RequestId>
<HostId>9Cnfif4T23sJVHJyNkx8xKgWa6/+
Uo0IvCAZ9RkWqneMiC1IMqVXCvYabTqmjbDl0Ol9tj1MMhw=</HostId></Error>"
I am using the CRAN version of the aws.s3 package.

I was able to read from an S3 bucket both in local R and via RStudio Server using:
data <- read.csv(textConnection(getURL("https://s3-eu-west-1.amazonaws.com/'yourbucket'/'yourFileName'")), sep = ",", header = TRUE)
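For what it's worth, here is a minimal aws.s3 sketch that avoids the redirect, under two assumptions drawn from the snippets above: the bucket is part112017rscriptanddata in eu-west-1, and the object key is simply test.csv.
library(aws.s3)

# Region and names below are assumptions taken from the snippets above
Sys.setenv("AWS_ACCESS_KEY_ID"     = access_key,
           "AWS_SECRET_ACCESS_KEY" = secret_key,
           "AWS_DEFAULT_REGION"    = "eu-west-1")

# Pass only the object key; the bucket name goes in its own argument
usercsvobj <- get_object(object = "test.csv", bucket = "part112017rscriptanddata")
data <- read.csv(textConnection(rawToChar(usercsvobj)))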

Related

XLSX data upload with RestRserve

I would like to use RestRserve to have a .xlsx file uploaded for processing. I have tried the below using a .csv with success, but slight modifications for .xlsx with get_file were not fruitful.
# r_bg() comes from the callr package
ps <- r_bg(function() {
  library(RestRserve)
  library(readr)
  library(xlsx)

  app = Application$new(content_type = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
  app$add_post(
    path = "/echo",
    FUN = function(request, response) {
      cnt <- request$get_file("xls")
      dt <- xlsx::read.xlsx(cnt, sheetIndex = 1, header = TRUE)
      response$set_body("some function")
    }
  )

  backend = BackendRserve$new()
  backend$start(app, http_port = 65080)
})
What have you tried? According to the documentation, the request$get_file() method returns a raw vector - a binary representation of the file. I'm not aware of R packages/functions that can read an xls/xlsx file directly from a raw vector (such functions probably exist, I just don't know of them).
Instead, you can write the body to a file and then read it the normal way:
library(RestRserve)
library(readxl)

app = Application$new()
app$add_post(
  path = "/xls",
  FUN = function(request, response) {
    fl = tempfile(fileext = '.xlsx')
    xls = request$get_file("xls")
    # need to drop attributes as writeBin()
    # can't write an object with attributes
    attributes(xls) = NULL
    writeBin(xls, fl)
    xls = readxl::read_excel(fl, sheet = 1)
    response$set_body("done")
  }
)

backend = BackendRserve$new()
backend$start(app, http_port = 65080)
Also note that the content_type argument is for response encoding, not for request decoding.
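To exercise that endpoint from the client side, a quick sketch with the httr package (the file name example.xlsx is a placeholder) posts the spreadsheet as a multipart form field named xls, matching request$get_file("xls") in the handler above:
library(httr)

# Send the spreadsheet as a multipart field called "xls"
resp <- POST("http://localhost:65080/xls",
             body = list(xls = upload_file("example.xlsx")),
             encode = "multipart")
content(resp, as = "text")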

Unable to read csv from S3 using R

I am trying to read a csv from an AWS S3 bucket. It's the same file that I was able to write to the bucket. When I read it I get an error. Below is the code for reading the csv:
s3BucketName <- "pathtobucket"
Sys.setenv("AWS_ACCESS_KEY_ID"     = "aaaa",
           "AWS_SECRET_ACCESS_KEY" = "vvvvv",
           "AWS_DEFAULT_REGION"    = "us-east-1")
bucketlist()

games <- aws.s3::get_object(object = "s3://path/data.csv", bucket = s3BucketName) %>%
  rawToChar() %>%
  readr::read_csv()
Below is the error I get
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>_data.csv</Key><RequestId>222</RequestId><HostId>333=</HostId></Error>
For reference, below is how I wrote the data to the bucket:
s3write_using(data, FUN = write.csv, object = "data.csv", bucket = s3BucketName)
You don't need to include the protocol (s3://) or the bucket name in the object parameter of the get_object function, just the object key (the filename with any prefixes). You should be able to do something like:
games <- aws.s3::get_object(object = "data.csv", bucket = s3BucketName)

Access Azure blob storage from R notebook

In Python, this is how I would access a csv from Azure blob storage:
storage_account_name = "testname"
storage_account_access_key = "..."
file_location = "wasb://example@testname.blob.core.windows.net/testfile.csv"

spark.conf.set(
  "fs.azure.account.key." + storage_account_name + ".blob.core.windows.net",
  storage_account_access_key)

df = spark.read.format('csv').load(file_location, header = True, inferSchema = True)
How can I do this in R? I cannot find any documentation...
The AzureStor package provides an R interface to Azure storage, including files, blobs and ADLSgen2.
endp <- storage_endpoint("https://acctname.blob.core.windows.net", key="access_key")
cont <- storage_container(endp, "mycontainer")
storage_download(cont, "myblob.csv", "local_filename.csv")
Note that this downloads to a file in local storage. From there, you can ingest the data into Spark using standard sparklyr methods.
Disclaimer: I'm the author of AzureStor.
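As a rough sketch of that sparklyr ingest step (the connection settings here are placeholders, not something AzureStor dictates):
library(sparklyr)

sc <- spark_connect(master = "local")   # placeholder; use your cluster's settings
df <- spark_read_csv(sc, name = "mydata",
                     path = "local_filename.csv",
                     header = TRUE, infer_schema = TRUE)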
If you do not want to keep a local copy, download to a tempfile and then read from it:
endp <- storage_endpoint("https://acctname.blob.core.windows.net", key="access_key")
cont <- storage_container(endp, "mycontainer")
fname <- tempfile()
storage_download(cont, "myblob.csv", fname)
df = read.csv(fname)

How can I save a binary file from cloud object storage to the notebook filesystem?

Frequently when working with files in IBM Cloud Object Storage from a Watson Studio notebook, I need to save the files to the notebook local file system where I can then access them from R functions.
Project-lib allows me to retrieve the file from Cloud Object Storage as a byte array; how can I save that byte array to a file?
library(projectLib)
project <- projectLib::Project$new(projectId = "secret", projectToken = "secret")
pc <- project$project_context

my.file <- project$get_file("myfile.csv.gz")
#
# Question: how do I save the file to disk ??
#
df = read.csv2("myfile.csv.gz", sep = "|",
               colClasses = c("ASSETUNIT_GLOBALID" = "character"))
I tried using save() but this was corrupting the data in the file.
The R function writeBin was the solution for me:
library(projectLib)
project <- projectLib::Project$new(projectId = "secret", projectToken = "secret")
pc <- project$project_context

my.file <- project$get_file("myfile.csv.gz")
#
# writeBin was the solution:
#
writeBin(my.file, 'myfile.csv.gz', size = NA_integer_,
         endian = .Platform$endian, useBytes = TRUE)

df = read.csv2("myfile.csv.gz", sep = "|",
               colClasses = c("ASSETUNIT_GLOBALID" = "character"))
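If you would rather not touch the disk at all, reading the gzipped bytes through an in-memory connection should also work; a sketch, assuming my.file holds the gzip-compressed CSV bytes returned by get_file():
# Decompress the raw vector on the fly instead of writing a temp file
con <- gzcon(rawConnection(my.file))
df <- read.csv2(con, sep = "|",
                colClasses = c("ASSETUNIT_GLOBALID" = "character"))
close(con)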

Using R to download SAS file from ftp-server

I am attempting to download some files onto my local machine from an FTP server. I have had success using the following method to move .txt and .csv files from the server, but not the .sas7bdat files that I need.
library(RCurl)

protocol <- "sftp"
server <- "ServerName"
userpwd <- "User:Pass"
tsfrFilename <- "/filepath/file.sas7bdat"
ouptFilename <- "out.sas7bdat"

# Run #
## Download Data
url <- paste0(protocol, "://", server, tsfrFilename)
data <- getURL(url = url, userpwd = userpwd)

## Create File
fconn <- file(ouptFilename)
writeLines(data, fconn)
close(fconn)
When I run the getURL command, however, I am met with the following error:
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
embedded nul in string:
Does anyone know of an alternative way to download a sas7bdat file from an FTP server to my local machine, or a way to alter my code above to successfully download the file? Thanks!
As @MrFlick suggested, I solved this problem using getBinaryURL() instead of getURL(). Also, I had to use the function write() instead of writeLines(). The result is as follows:
protocol <- "sftp"
server <- "ServerName"
userpwd <- "User:Pass"
tsfrFilename <- "/filepath/file.sas7bdat"
ouptFilename <- "out.sas7bdat"

# Run #
## Download Data
url <- paste0(protocol, "://", server, tsfrFilename)
data <- getBinaryURL(url = url, userpwd = userpwd)

## Create File
fconn <- file(ouptFilename)
write(data, fconn)
close(fconn)
Alternatively, to turn the downloaded data into an R data frame, one can use the haven library, as follows:
library(haven)
df_data <- read_sas(data)
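If reading from disk is preferred instead, writing the raw vector with writeBin() keeps the bytes intact, and read_sas() can then be pointed at the saved path (a sketch reusing the variables from the code above):
library(haven)

# writeBin() writes the raw vector byte-for-byte, with no text conversion
writeBin(data, ouptFilename)
df_data <- read_sas(ouptFilename)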
