How to write streamlit UploadedFile to temporary directory with original filename? - streamlit

Streamlit has a function that allows convenient upload of multiple files.
files = st.file_uploader('File upload', type=['txt'], accept_multiple_files=True)
Then files contains a list of UploadedFile objects, which are BytesIO-like. However, it is not clear how to get the original filenames and write the files to a temporary directory, or whether that approach would conflict with the way Streamlit operates, since it reruns the underlying script every time an action is performed.
I am using some tools that read files from a path given as a string; they expect the files to be on the hard drive.

You can access the name of the file with files[i].name and its content with files[i].read().
It looks like this in the end:
import os
import streamlit as st
files = st.file_uploader("File upload", type=["txt"], accept_multiple_files=True)
if len(files) == 0:
    st.error("No files were uploaded")

for i in range(len(files)):
    bytes_data = files[i].read()  # read the content of the file in binary
    print(files[i].name, bytes_data)
    with open(os.path.join("/tmp", files[i].name), "wb") as f:
        f.write(bytes_data)  # write this content elsewhere
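If you specifically want a temporary directory rather than a hardcoded "/tmp" path, a minimal standalone sketch using Python's tempfile module could look like the following (the processing step is left as a placeholder; note that because Streamlit reruns the script on every interaction, the directory is recreated and the files rewritten on each run):
import os
import tempfile

import streamlit as st

files = st.file_uploader("File upload", type=["txt"], accept_multiple_files=True)

# The directory (and everything in it) is deleted when the with-block exits,
# so any path-based processing has to happen inside the block.
with tempfile.TemporaryDirectory() as tmp_dir:
    paths = []
    for uploaded_file in files:
        path = os.path.join(tmp_dir, uploaded_file.name)  # keep the original filename
        with open(path, "wb") as f:
            f.write(uploaded_file.read())
        paths.append(path)
    # hand `paths` to the tools that expect real files on disk here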

Related

Save multiple HDF files from an ftp list in R, giving a different name to each according to the name of the ftp link

I have a text list of HDF files I need to download from an ftp server.
This is the (example) structure of the list:
ftp://username:password#ftppath/File#1_hh_mm_ss.HDF
ftp://username:password#ftppath/File#2_hh_mm_ss.HDF
ftp://username:password#ftppath/File#3_hh_mm_ss.HDF
...
I tried to download a single file with this basic script:
url = "ftp://username:password#ftppath/File#3_hh_mm_ss.HDF"
download.file(url, destfile = "Test1.HDF")
What I would like to do is download multiple files at once (i.e., the ones in the list above) and save them automatically, giving each file the name it has in the ftp link (i.e., File#1_hh_mm_ss.HDF, File#2_hh_mm_ss.HDF, File#3_hh_mm_ss.HDF).
Is anyone able to help?
Thanks!
EDIT:
I noticed that the list of files that I need to download also includes files from different FTP URLs, i.e.:
'ftp://username1:password1#ftppath/File#1_hh_mm_ss.HDF',
'ftp://username1:password1#ftppath/File#2_hh_mm_ss.HDF',
'ftp://username2:password2#ftppath/File#3_hh_mm_ss.HDF',
'ftp://username2:password2#ftppath/File#4_hh_mm_ss.HDF',
'ftp://username3:password3#ftppath/File#5_hh_mm_ss.HDF',
'ftp://username3:password3#ftppath/File#6_hh_mm_ss.HDF',
...
This makes everything more complicated.
Would it be possible, instead, to download all of the files from each ftp URL?
For example (simplified):
ftp://username1:password1#ftppath/File#1, File#2, File#3, File#4, ... .HDF #(DOWNLOAD ALL .HDF FILES in the ftp folder)
ftp://username2:password2#ftppath/File#1, File#2, File#3, File#4, ... .HDF #(DOWNLOAD ALL .HDF FILES in the ftp folder )
ftp://username3:password3#ftppath/File#1, File#2, File#3, File#4, ... .HDF #(DOWNLOAD ALL .HDF FILES in the ftp folder )
...
Thanks a lot for your help!
You can use a for loop; try this:
list_files <- list(
  "ftp://username:password#ftppath/File#1_hh_mm_ss.HDF",
  "ftp://username:password2#ftppath/File#2_hh_mm_ss.HDF",
  "ftp://username:password3#ftppath/File#3_hh_mm_ss.HDF",
  "...")

for (i in 1:length(list_files)) {
  file_name <- basename(list_files[[i]])
  #url <- paste0(base_url, file_name)
  destfile <- paste0("path/to/created_folder/", file_name)
  download.file(list_files[[i]], destfile = destfile)
}
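For the edited part of the question (downloading every .HDF file in an FTP folder rather than a fixed list), one hedged approach is to ask the server for a directory listing first and then loop over it. A rough sketch using RCurl, assuming the usual ftp://user:password@host/ URL form and placeholder paths:
library(RCurl)

base_url <- "ftp://username1:password1@ftppath/"  # placeholder credentials and path

# Request a bare file listing for the folder
listing <- getURL(base_url, ftp.use.epsv = FALSE, dirlistonly = TRUE)
file_names <- strsplit(listing, "\r?\n")[[1]]
hdf_files <- grep("\\.HDF$", file_names, value = TRUE, ignore.case = TRUE)

for (file_name in hdf_files) {
  download.file(paste0(base_url, file_name),
                destfile = file.path("path/to/created_folder", file_name),
                mode = "wb")  # binary mode, since HDF is a binary format
}
Repeat this per base URL (username1, username2, ...); FTP servers format listings differently, so the filtering step may need adjusting.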

Is there a way of reading shapefiles directly into R from an online source?

I am trying to find a way of loading shapefiles (.shp) from an online repository/folder/url directly into my global environment in R, for the purpose of making plots in ggplot2 using geom_sf. In the first instance I'm using my Google Drive to store these files but I'd ideally like to find a solution that works with any folder with a valid url and appropriate access rights.
So far I have tried a few options, the first 2 involving zipping the source folder on Google Drive where the shapefiles are stored and then downloading and unzipping it in some way. I have included reproducible examples using a small test shapefile:
Using utils::download.file() to retrieve the compressed folder and unzipping using either base::system('unzip..') or zip::unzip() (loosely following this thread: Downloading County Shapefile from ONS):
# Create destination data folder (if there isn't one)
if(!dir.exists('data')) dir.create('data')
# Download the zipped file/folder
download.file("https://drive.google.com/file/d/1BYTCT_VL8EummlAsH1xWCd5rC4bZHDMh/view?usp=sharing", destfile = "data/test_shp.zip")
# Unzip folder using unzip (fails)
unzip(zipfile = "data/test_shp.zip", exdir = "data/test_shp", junkpaths = TRUE)
# Unzip folder using system (also fails)
system("unzip data/test_shp.zip")
If you can't run the above code then FYI the 2 error messages are:
Warning message:
In unzip(zipfile = "data/test_shp.zip", exdir = "data/test_shp", :
error 1 in extracting from zip file
AND
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of data/test_shp.zip or
data/test_shp.zip.zip, and cannot find data/test_shp.zip.ZIP, period.
Worth noting here that I can't even manually unzip this folder outside R so I think there's something going wrong with the download.file() step.
Using the googledrive package:
library(googledrive)
library(sf)
# Create destination data folder (if there isn't one)
if(!dir.exists('data')) dir.create('data')
# Specify googledrive url:
test_shp = drive_get(as_id("https://drive.google.com/file/d/1BYTCT_VL8EummlAsH1xWCd5rC4bZHDMh/view?usp=sharing"))
# Download zipped folder
drive_download(test_shp, path = "data/test_shp.zip")
# Unzip folder
zip::unzip(zipfile = "data/test_shp.zip", exdir = "data/test_shp", junkpaths = TRUE)
# Load test.shp
test_shp <- read_sf("data/test_shp/test.shp")
And that works!
...Except it's still a hacky workaround, which requires me to zip, download, unzip and then use a separate function (such as sf::read_sf or st_read) to read the data into my global environment. And, as it's using the googledrive package, it's only going to work for files stored in Google Drive (not OneDrive, Dropbox and other URLs).
I've also tried sf::read_sf, st_read and fastshp::read.shp directly on the folder url but those approaches all fail as one might expect.
So, my question: is there a workflow for reading shapefiles stored online directly into R or should I stop looking? If there is not, but there is a way of expanding my above solution (2) beyond googledrive, I'd appreciate any tips on that too!
Note: I should also add that I have deliberately ignored any option requiring the package rgdal due to its imminent retirement, and so am looking for options that are at least somewhat future-proof (I understand all packages drop off the map at some point). Thanks in advance!
I ran into a similar problem recently, having to read in shapefiles directly from Dropbox into R.
As a result, this solution only applies for the case of Dropbox.
The first thing you will need to do is create a refreshable token for Dropbox using rdrop2, given recent changes from Dropbox that limit single token use to 4 hours. You can follow this SO post.
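As a rough sketch of the token-caching pattern (the file name is arbitrary, and the refresh-token specifics from the linked post may additionally require your own Dropbox app key and secret), it can look like this:
library(rdrop2)

# Run once interactively: authorise against Dropbox and cache the token to disk
refreshable_token <- drop_auth()
saveRDS(refreshable_token, "droptoken.rds")

# In later sessions or non-interactive scripts, reload it and pass it as dtoken
refreshable_token <- readRDS("droptoken.rds")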
Once you have set up your refreshable token, identify all the files in your spatial data folder on Dropbox using:
library(dplyr); library(stringr)  # for filter() and str_detect()
shp_files_on_db <- drop_dir("Dropbox path/to your/spatial data/", dtoken = refreshable_token) %>%
  filter(str_detect(name, "adm2"))
My 'spatial data' folder contained two sets of shapefiles – adm1 and adm2. I used the above code to choose only those associated with adm2.
Then create a vector of the names of the shp, csv, shx, dbf, cpg files in the 'spatial data' folder, as follows:
shp_filenames <- shp_files_on_db$name
I chose to read the shapefiles into a temporary directory, which avoids having to store the files on my disk – also useful in a Shiny implementation. I create this temporary directory as follows:
# create a new directory under tempdir
dir.create(dir1 <- file.path(tempdir(), "testdir"))
#If needed later on, you can delete this temporary directory
unlink(dir1, recursive = T)
#And test that it no longer exists
dir.exists(dir1)
Now download the Dropbox files to this temporary directory:
for (i in 1:length(shp_filenames)) {
  drop_download(paste0("Dropbox path/to your/spatial data/", shp_filenames[i]),
                dtoken = refreshable_token,
                local_path = dir1)
}
And finally, read in your shapefile as follows:
#path to the shapefile in the temporary directory
path1_shp <- paste0(dir1, "/myfile_adm2.shp")
#reading in the shapefile using the sf package - a recommended replacement for rgdal
shp1a <- st_read(path1_shp)
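On the broader question of reading shapefiles straight from an arbitrary URL: sf reads through GDAL, and GDAL's virtual file systems can stream over HTTP and look inside zip archives, so a sketch along the following lines may work for a zipped shapefile hosted at a plain https URL (the URL below is a placeholder; note this does not work for Google Drive 'view' sharing links, which return an HTML page rather than the file itself):
library(sf)

# /vsicurl/ streams the file over http(s); /vsizip/ reads inside the zip archive
zip_url <- "https://example.com/path/to/test_shp.zip"  # placeholder URL
test_shp <- read_sf(paste0("/vsizip//vsicurl/", zip_url))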

Use R package "googledrive" to load in R a file from my googledrive

I have a file in my Google Drive that is an xlsx. It is too big, so it is not automatically converted to a Google Sheet (that's why using the googlesheets package did not work). The file is so big that I can't even preview it by clicking on it in my Google Drive. The only way to see it is to download it as an .xlsx file. While I could load it that way, I am trying instead to use the googledrive package.
So far what I have is:
library(googledrive)
drive_find(n_max = 50)
drive_download("filename_without_extension.xlsx",type = "xlsx")
but I got the following error:
'file' does not identify at least one Drive file.
Maybe it is because I am not specifying the path where the file lives in the Drive. For example: Work\Data\Project1\filename.xlsx
Could you give me an idea on how to load in R the file called filename.xlsx that is nested in the drive like that?
I read the documentation but couldn't figure out how to do that. Thanks in advance.
You should be able to do this by:
library(googledrive)
drive_download("~/Work/Data/Project1/filename.xlsx")
The type parameter is only for Google native spreadsheets, and does not apply to raw files.
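If the goal is to get the data straight into R, one small hedged follow-up (the local file name below is just an example) is to download and then read the file with readxl:
library(googledrive)
library(readxl)

drive_download("~/Work/Data/Project1/filename.xlsx",
               path = "filename.xlsx", overwrite = TRUE)
my_data <- read_xlsx("filename.xlsx")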
I want to share my way.
I do it this way because I keep updating the xlsx file. It is a query result that comes from an ERP.
So, when I tried to do it by Google Drive Id, it gave me errors, because each time the ERP updates the file, its Id changes.
This is my context. Yours can be absolutely different. This file changes just two or three times a month. Even though it is a "big" xlsx file (78-80K records with 19 factors), I use it for just a few seconds to calculate some values and then I can trash it. It does not make any sense to store it (storing is more expensive than uploading).
library(googledrive)
library(googlesheets4) # watch out: it is not the CRAN version yet 0.1.1.9000
library(glue)   # for glue()
library(readxl) # for read_xlsx()

drive_folder_owner <- "carlos.sxxx#xxxxxx.com" # this is my account in this gDrive folder.
drive_auth(email = drive_folder_owner) # previously authorized account
googlesheets4::sheets_auth(email = drive_folder_owner) # Yes, I know, should be the same, but they are different.
d1 <- drive_find(pattern = "my_file.xlsx", type = drive_mime_type("xlsx")) # This is me finding the file created by the ERP; I shorten the search by using the type
meta <- drive_get(id = d1$id)[["drive_resource"]] # Get the metadata for the file in googledrive
n_id <- glue("https://drive.google.com/open?id=", d1$id[[1]]) # here I am creating a path for reading
meta_name <- paste(getwd(), "/Files/", meta[[1]]$originalFilename, sep = "") # and a path to temporarily save it
drive_download(file = as_id(n_id), overwrite = TRUE, path = meta_name) # Now read and save locally
V_CMV <- data.frame(read_xlsx(meta_name)) # store to data frame
file.remove(meta_name) # delete from R Server
rm(d1, n_id) # Delete temporary variables

Using 'R' and "aws.s3" how to push a directory to the cloud

I have a directory with subdirectories and many files that need to be pushed to Amazon S3. I am using the 'R' tool.
Is there a clean/easy way to say "push this directory and everything in it up to S3"? I am hoping to avoid pushing things up one at a time, and manually re-building the directory structures.
If you pass file names to put_object() using their full path names and then use those path names as their object keys, then you can implicitly recreate a directory structure. Basically like this (though you may want to change the filenames when using them as object keys in some way):
library("aws.s3")
lapply(dir(full.names = TRUE, recursive = TRUE), function(filename) {
  put_object(file = filename, object = filename, bucket = "mybucket")
})
There is also an experimental function s3sync() that should do this for a complete file tree (but it isn't widely tested):
s3sync()
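If you want the object keys to mirror the structure relative to a chosen root rather than carry the full local path, one variant of the same put_object() idea (the folder and bucket names here are hypothetical) is:
library("aws.s3")

local_root <- "my_directory"  # hypothetical folder to upload

files <- dir(local_root, full.names = TRUE, recursive = TRUE)
lapply(files, function(filename) {
  # key relative to local_root, e.g. "my_directory/sub/file.txt" -> "sub/file.txt"
  key <- sub(paste0("^", local_root, "/"), "", filename)
  put_object(file = filename, object = key, bucket = "mybucket")
})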

DEM to Raster for multiple files

I'm trying to design a program to help me convert 1000+ DEM files into USGS raster files, using the "arcpy.DEMtoRaster_Conversion" method in ArcGIS. My idea is to use an OpenFileDialog to allow multiple selection of these files, then use an array to save the names and use them as the inDEM, saving the outRaster in tif format.
file_path = tkFileDialog.askopenfilename(filetypes=(("DEM", "*.dem"),), multiple=1)
This is how I open multiple files in the dialog, but I'm not sure how to save them so as to carry out the following steps. Can someone help me?
This code will find all DEMs in a folder, apply the conversion function and save the output TIFFs to another folder:
#START USER INPUT
datadir = "Y:/input_rasters/"  # directory where dem files are located
outputdir = "Y:/output_rasters/"  # existing directory where output tifs are to be saved
#END USER INPUT

import os
import arcpy

arcpy.env.overwriteOutput = True
arcpy.env.workspace = datadir
arcpy.env.compression = "LZW"

DEMList = arcpy.ListFiles("*.dem")
for f in DEMList:
    print("starting %s" % f)
    rastername = os.path.join(datadir, f)
    outrastername = os.path.join(outputdir, f[:-4] + ".tif")
    arcpy.DEMToRaster_conversion(rastername, outrastername)
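If you would rather keep the file-dialog selection from the question, the value returned by askopenfilename(multiple=1) can be looped over directly. A hedged sketch along those lines (the output folder is just an example; on some Tk versions the multiple selection comes back as a single string rather than a tuple, so check what you get):
import os
import arcpy
import tkFileDialog

outputdir = "Y:/output_rasters/"  # existing folder for the output tifs (example)
arcpy.env.overwriteOutput = True

# multiple=1 lets the user pick several .dem files at once
file_paths = tkFileDialog.askopenfilename(filetypes=(("DEM", "*.dem"),), multiple=1)

for dem_path in file_paths:
    base = os.path.splitext(os.path.basename(dem_path))[0]
    out_raster = os.path.join(outputdir, base + ".tif")
    arcpy.DEMToRaster_conversion(dem_path, out_raster)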
