Using R to put a PDF file into an S3 bucket - r

I was able to put a PDF file in an S3 bucket using 'put_object' from the 'aws.s3' package. The file I had stored on my local machine was uploaded to S3 successfully, but when I opened the PDF files in S3 they were all corrupted.
This is the code I'm using:
put_object('myfiles/myfile.pdf',
object = "myfile.pdf",
bucket = "myS3bucket")
Any suggestions on how this can be achieved?
Thanks

Related

Reading Tables from PDFs in S3 bucket using Camelot or Tabula packages: s3 URL

Can Python packages that pull tables from PDFs, such as Tabula and Camelot, read a PDF directly from an S3 bucket, the way Pandas can? For example, I can read a CSV file from the S3 bucket like this:
df = pd.read_csv("s3://us-east-1-name/Test/Testfile.csv")
I want to be able to do the same thing using Tabula or Camelot:
dfs = tabula.read_pdf("s3://us-east-1-name/Test/Testfile.pdf", pages='all')
tables = camelot.read_pdf("s3://us-east-1-name/Test/Testfile.pdf")
I get an "HTTP Error 403: Forbidden" or "[Errno 2] No such file or directory." But there is no issue with the S3 locations. Does anyone know how I can pass an S3 URL/API to Tabula or Camelot?
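One workaround that may help here is to download the PDF to a temporary local file first and pass that local path to Tabula or Camelot. A likely cause of the errors (an assumption, not confirmed by the question) is that these libraries expect a local path or a plain HTTP URL rather than an authenticated S3 request. The sketch below assumes a boto3 S3 client; the helper names `split_s3_url` and `download_to_temp` are hypothetical and not part of any library.

```python
import os
import tempfile


def split_s3_url(url):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    prefix = "s3://"
    if not url.startswith(prefix):
        raise ValueError("expected an s3:// URL")
    bucket, _, key = url[len(prefix):].partition("/")
    return bucket, key


def download_to_temp(s3_client, bucket, key):
    """Download the object to a temporary local file and return its path.

    s3_client is assumed to expose download_file(bucket, key, filename),
    as a boto3 S3 client does.
    """
    suffix = os.path.splitext(key)[1]  # keep the ".pdf" extension
    fd, local_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # the client opens the path itself
    s3_client.download_file(bucket, key, local_path)
    return local_path


# Usage (assumes boto3, tabula-py and camelot are installed and AWS
# credentials are configured):
#   import boto3, tabula, camelot
#   bucket, key = split_s3_url("s3://us-east-1-name/Test/Testfile.pdf")
#   path = download_to_temp(boto3.client("s3"), bucket, key)
#   dfs = tabula.read_pdf(path, pages="all")
#   tables = camelot.read_pdf(path)
#   os.remove(path)  # clean up the temporary copy when done
```

The client is passed in as an argument so the download step can be exercised with a stand-in object, without touching S3.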

Read Excel file in Amazon SageMaker using R Notebook

I have an S3 bucket named "Temp-Bucket". Inside it I have a folder named "folder".
I want to read a file named file1.xlsx. This file is inside the S3 bucket (Temp-Bucket) under the folder (folder). How do I read that file?
If you are using the R Kernel on the SageMaker Notebook Instance you can do the following:
library("readxl")
system("aws s3 cp s3://Temp-Bucket/folder/file1.xlsx .", intern = TRUE)
my_data <- read_excel("file1.xlsx")

r - read zip file from s3 using r

Is it possible to read a zipped SAS file (or any kind of file) from S3 using R?
Here is what I'm trying:
library(aws.s3)
library(haven)
s3read_using(FUN = read_sas(unzip(.)),
bucket = "s3://bucket/",
object = "file.zip") # and inside is a .sas7bdat file
but it's obviously not recognizing the ".". I have not found any good info on reading a .zip file from S3.
I was trying to read the zip file from S3 and store it on the local Linux system. Maybe you can try this, then unzip the file and read it.
library("aws.s3")
save_object("s3://mybucket/input/test.zip", file = "/home/test.zip", bucket = "mybucket")

How do I save CSV file as zip on s3 bucket using R?

I read a few files from S3 and do some manipulation on them. Now I want to save those CSV files as zip on S3 using R. How can I do that?
You can write the CSV as a gz file using write_csv and then push it to S3 using boto or the AWS CLI:
readr::write_csv(df, gzfile('sample.csv.gz'))
As mentioned by @sonny, you can save a compressed (gzip) file locally using either of the functions below:
readr::write_tsv(df, file.path(getwd(), "mtcars.tsv.gz"))
OR
readr::write_csv(mtcars, file.path(dir, "mtcars.csv.gz"))
Then use the code below to push it to S3:
system(paste0("aws s3 cp ", file_path, " ", s3_path))
Note: file_path should be the complete path to the file, including the file name.

save zipped file directly from the web in r

I am trying to save a zip file from the internet onto my computer. I can download the content straight into R with:
sfile <- "http://xweb.geos.ed.ac.uk/~smaccal1/ARCLake/v3_0/PL/ALID0001.zip"
temp <- tempfile()
download.file(sfile,temp)
From here, how can I then save that zipped file on my computer without having to open it in R by unzipping the folder and then using read.table
data <- read.table(unz(temp, "a1.dat"))
unlink(temp)
and then save that data. Essentially I would like to save the files directly from the web (still zipped). How can this be done?
You can use download.file to save the file in a specified location:
sfile <- "http://xweb.geos.ed.ac.uk/~smaccal1/ARCLake/v3_0/PL/ALID0001.zip"
download.file(sfile, destfile = "/path/to/myfile.zip", mode = "wb")  # binary mode keeps the zip intact on Windows
