I am trying to read a csv from an AWS S3 bucket. It's the same file that I was able to write to the bucket. When I read it I get an error. Below is the code for reading the csv:
s3BucketName <- "pathtobucket"
Sys.setenv("AWS_ACCESS_KEY_ID" = "aaaa",
           "AWS_SECRET_ACCESS_KEY" = "vvvvv",
           "AWS_DEFAULT_REGION" = "us-east-1")
bucketlist()
games <- aws.s3::get_object(object = "s3://path/data.csv", bucket = s3BucketName) %>%
  rawToChar() %>%
  readr::read_csv()
Below is the error I get:
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>_data.csv</Key><RequestId>222</RequestId><HostId>333=</HostId></Error>
For reference, below is how I wrote the data to the bucket:
s3write_using(data, FUN = write.csv, object = "data.csv", bucket = s3BucketName)
You don't need to include the protocol (s3://) or the bucket name in the object parameter of the get_object function, just the object key (the filename with any prefixes).
You should be able to do something like:
games <- aws.s3::get_object(object = "data.csv", bucket = s3BucketName)
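Alternatively, aws.s3 provides s3read_using() as the read-side counterpart of the s3write_using() call you used for the upload. A minimal sketch, assuming the object key is simply "data.csv":

# reads the object into a temp file and hands it to the reader function
games <- aws.s3::s3read_using(FUN = readr::read_csv, object = "data.csv", bucket = s3BucketName)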
I am trying to pass an argument to the write_csv function in R, but I can't seem to pass it correctly: the data does not get appended to the existing csv file in the S3 bucket.
Currently what I have is:
data %>%
  s3write_using(.,
                FUN = write_csv,
                bucket = "myBUCKET",
                object = "myLocationToSaveCSV.csv",
                append = TRUE) # append = TRUE is what I would like to pass to write_csv
I am able to write the data to the S3 bucket, but when I want to append data to the existing data it just rewrites the old data and nothing gets appended.
How can I pass extra parameters ... to write_csv(...) inside s3write_using?
R matches function arguments by position as well as by name, so try putting the additional arguments directly after FUN:
data %>%
  s3write_using(.,
                FUN = write_csv,
                append = TRUE,
                bucket = "myBUCKET",
                object = "myLocationToSaveCSV.csv")
If this doesn't work, you can try using a lambda:
data %>%
  s3write_using(.,
                FUN = \(x, file) write_csv(x, file, append = TRUE),
                bucket = "myBUCKET",
                object = "myLocationToSaveCSV.csv")
I would like to work with RestRserve to have a .xlsx file uploaded for processing. I have tried the below using a .csv with success, but some slight modifications for .xlsx with get_file were not fruitful.
# r_bg() comes from the callr package
ps <- r_bg(function() {
  library(RestRserve)
  library(readr)
  library(xlsx)
  app = Application$new(content_type = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
  app$add_post(
    path = "/echo",
    FUN = function(request, response) {
      cnt <- request$get_file("xls")
      dt <- xlsx::read.xlsx(cnt, sheetIndex = 1, header = TRUE)
      response$set_body("some function")
    }
  )
  backend = BackendRserve$new()
  backend$start(app, http_port = 65080)
})
According to the documentation, the request$get_file() method returns a raw vector, i.e. a binary representation of the file. I'm not aware of R packages/functions that can read an xls/xlsx file directly from a raw vector (such functions may exist, I just don't know of them).
Instead, you can write the body to a file and then read it the normal way:
library(RestRserve)
library(readxl)

app = Application$new()
app$add_post(
  path = "/xls",
  FUN = function(request, response) {
    fl = tempfile(fileext = '.xlsx')
    xls = request$get_file("xls")
    # need to drop attributes because writeBin()
    # can't write an object with attributes
    attributes(xls) = NULL
    writeBin(xls, fl)
    xls = readxl::read_excel(fl, sheet = 1)
    response$set_body("done")
  }
)
backend = BackendRserve$new()
backend$start(app, http_port = 65080)
Also note that the content_type argument controls response encoding, not request decoding.
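To exercise the endpoint from another R session, a sketch using the httr package (httr and the local test.xlsx file are my assumptions, not part of the original answer):

library(httr)

# upload the spreadsheet as a multipart form field named "xls",
# matching request$get_file("xls") on the server side
resp <- POST("http://localhost:65080/xls",
             body = list(xls = upload_file("test.xlsx")))
content(resp, as = "text")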
Writing to an S3 bucket from R using the aws.s3 library works like so:
s3write_using(iris, FUN = write.csv, object = "iris.csv", bucket = "some-bucket")
But I cannot get an optional argument to work, e.g. row.names = FALSE.
What I've tried so far
I have tried the following:
s3write_using(iris, FUN = write.csv, object = "iris.csv", bucket = "some-bucket", opts = list(row.names=FALSE))
s3write_using(iris, FUN = write.csv, object = "iris.csv", bucket = "some-bucket", opts = list("row.names"=FALSE))
# Seen here: https://github.com/cloudyr/aws.s3/issues/200
s3write_using(iris, FUN = write.csv, object = "iris.csv", bucket = "some-bucket", opts=list(headers = c('row.names' = 'FALSE')))
None of the above raises an error (or even a warning), but the resulting .csv has row names, which it shouldn't.
Question
How do I write to an S3 bucket from R using s3write_using and specify row.names = FALSE (or any other optional parameter)?
The manual for s3write_using says:
Optional additional arguments passed to put_object or save_object, respectively.
So the optional parameters are passed to other aws.s3 functions and not write.csv.
But you can always define your own function:
mywrite <- function(x, file) {
  write.csv(x, file, row.names = FALSE)
}
And then use this function in lieu of write.csv:
s3write_using(iris, FUN = mywrite, object = "iris.csv", bucket = "some-bucket")
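Equivalently, you can skip the named helper and pass an anonymous function inline:

s3write_using(iris,
              FUN = function(x, file) write.csv(x, file, row.names = FALSE),
              object = "iris.csv",
              bucket = "some-bucket")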
You can even make a function generator, that is, a function that returns a function wrapping write.csv with the necessary arguments:
customize.write.csv <- function(row.names = TRUE, ...) {
  function(x, file) {
    write.csv(x, file, row.names = row.names, ...)
  }
}
Which you can use like this:
s3write_using(iris,
              FUN = customize.write.csv(row.names = FALSE, dec = ","),
              object = "iris.csv",
              bucket = "some-bucket")
It looks like you can just add the options you want at the end; give it a try:
s3write_using(iris,
              FUN = write.csv,
              object = "iris.csv",
              bucket = "some-bucket",
              row.names = FALSE, dec = ",")
I am using someone else's package. As you can see, there is an ImportHistData argument in the function. I want to import the data from the environment as the rainfall object instead of from rainfall.txt. When I replace rainfall.txt with rainfall, I get this error:
Error in read.table(x, header = FALSE, fill = TRUE, na.strings = y) :
  'file' must be a character string or connection
So, to import the data from an object rather than from a text file, what should I do?
Original form of the function call:
DisagSimul(TimeScale = 1/4,
           BLpar = list(lambda = l, phi = f, kappa = k,
                        alpha = a, v = v, mx = mx, sx = NA),
           CellIntensityProp = list(Weibull = FALSE, iota = NA),
           RepetOpt = list(DistAllowed = 0.1, FacLevel1Rep = 20, MinLevel1Rep = 50,
                           TotalRepAllowed = 5000),
           NumOfSequences = 10,
           Statistics = list(print = TRUE, plot = FALSE),
           ExportSynthData = list(exp = TRUE, FileContent = c("AllDays"), file = "15min.txt"),
           ImportHistData = list("rainfall.txt", na.values = "NA", FileContent = c("AllDays"),
                                 DaysPerSeason = length(rainfall$Day)),
           PlotHyetographs = FALSE, RandSeed = 5)
Source of the ImportHistData part of the function:
ImportHistDataFun(mode = 1, x = ImportHistData$file,
                  y = ImportHistData$na.values, z = ImportHistData$FileContent[1],
                  w = TRUE, s = ImportHistData$DaysPerSeason, timescale = 1)
First, check the package documentation (?DisagSimul) to see if the method allows a data frame in memory to be used for the ImportHistData argument instead of reading from an external .txt file.
If the function is set up to only read a file from disk and you do not want to save your rainfall data frame permanently as a file, consider using a tempfile that exists only for the R session or until you call unlink():
# INITIALIZE TEMP FILE
tf <- tempfile(pattern = "", fileext = ".txt")

# EXPORT rainfall TO FILE
write.table(rainfall, tf, row.names = FALSE)

...

# USE TEMPFILE IN METHOD
DisagSimul(...,
           ImportHistData = list(tf, na.values = "NA", FileContent = c("AllDays"),
                                 DaysPerSeason = length(rainfall$Day)),
           ...)
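Once the simulation is done, remove the temp file as mentioned above:

# REMOVE TEMP FILE WHEN DONE
unlink(tf)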
access_key <- "**************"
secret_key <- "****************"
bucket <- "temp"
filename <- "test.csv"

Sys.setenv("AWS_ACCESS_KEY_ID" = access_key,
           "AWS_SECRET_ACCESS_KEY" = secret_key)

buckets <- bucketlist()
getbucket(bucket)
usercsvobj <- get_object(bucket = "", "s3://part112017rscriptanddata/test.csv")
csvcharobj <- rawToChar(usercsvobj)
con <- textConnection(csvcharobj)
data <- read.csv(con)
I am able to see the contents of the bucket, but fail to read the csv as a data frame.
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error>
<Code>PermanentRedirect</Code><Message>The bucket you are attempting to
access must be addressed using the specified endpoint. Please send all
future requests to this endpoint.</Message><Bucket>test.csv</Bucket>
<Endpoint>test.csv.s3.amazonaws.com</Endpoint>
<RequestId>76E9C6B03AC12D8D</RequestId>
<HostId>9Cnfif4T23sJVHJyNkx8xKgWa6/+
Uo0IvCAZ9RkWqneMiC1IMqVXCvYabTqmjbDl0Ol9tj1MMhw=</HostId></Error>"
I am using the CRAN version of the aws.s3 package.
I was able to read from an S3 bucket, both in local R and via RStudio Server, using:
library(RCurl)  # getURL() comes from the RCurl package

data <- read.csv(textConnection(getURL("https://s3-eu-west-1.amazonaws.com/'yourbucket'/'yourFileName")),
                 sep = ",", header = TRUE)
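For reference, the PermanentRedirect error in the question means the bucket must be addressed through its regional endpoint. If you want to stay with aws.s3 instead of RCurl, a sketch assuming the bucket lives in eu-west-1 (the region in the URL above; adjust to your bucket's actual region):

Sys.setenv("AWS_ACCESS_KEY_ID" = access_key,
           "AWS_SECRET_ACCESS_KEY" = secret_key,
           "AWS_DEFAULT_REGION" = "eu-west-1")

# pass the object key and the bucket name separately
usercsvobj <- aws.s3::get_object(object = "test.csv", bucket = "part112017rscriptanddata")
data <- read.csv(textConnection(rawToChar(usercsvobj)))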