Get ASCII grid from compressed .gz file from URL in R

I am trying to download and gunzip grid files in ASCII format, compressed as .gz files, from a URL like this. I tried to get at the files via y <- gzcon(url("name-of-url")) and then gunzip(y), but gunzip reports that this is an invalid file. If I can decompress the file, I would like to read the .asc file with raster().
Any ideas how to solve this?

I don't know why unzip does not work on these files, but you can get at the contents as follows:
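URL = "https://opendata.dwd.de/climate_environment/CDC/grids_germany/annual/summer_days/grids_germany_annual_summer_days_1951_17.asc.gz"
# mode = "wb" keeps the compressed file byte-for-byte intact on every platform
download.file(URL, "grids_germany_annual_summer_days_1951_17.asc.gz", mode = "wb")
# gzfile() decompresses transparently when the connection is read
GZ = gzfile("grids_germany_annual_summer_days_1951_17.asc.gz")
# Read all lines; readLines(GZ, 10) would keep only the first 10
Lines = readLines(GZ)
writeLines(Lines, "grids_germany_annual_summer_days_1951_17.asc")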
Now you have an ascii text file.
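The same decompression can be sketched without holding the whole grid in memory as text. This is a minimal sketch, assuming the .gz file has already been downloaded. The helper name gunzip_file and the chunk size are made up for illustration; gzfile, readBin and writeBin are base R. The demonstration uses a tiny temporary file as a stand-in rather than the DWD grid.

```r
# Hypothetical helper: decompress a local .gz file to `dest` using base R only.
gunzip_file <- function(src, dest, chunk_size = 1e6) {
  con_in <- gzfile(src, open = "rb")      # decompresses transparently on read
  on.exit(close(con_in), add = TRUE)
  con_out <- file(dest, open = "wb")
  on.exit(close(con_out), add = TRUE)
  repeat {
    chunk <- readBin(con_in, what = "raw", n = chunk_size)
    if (length(chunk) == 0) break         # nothing left to decompress
    writeBin(chunk, con_out)
  }
  invisible(dest)
}

# Self-contained demonstration on a small temporary file (a stand-in, not real data)
tmp_gz  <- tempfile(fileext = ".asc.gz")
tmp_asc <- tempfile(fileext = ".asc")
con <- gzfile(tmp_gz, open = "wt")
writeLines(c("NCOLS 2", "NROWS 1", "1 2"), con)
close(con)

gunzip_file(tmp_gz, tmp_asc)
demo_lines <- readLines(tmp_asc)
```

For the file in the question, calling gunzip_file() with the downloaded .asc.gz name and the desired .asc name should leave a plain text grid that can then be passed to raster().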

Related

How do I download and extract a list of papers in LaTeX format from arXiv?

I have a list of papers that I'd like to extract from arXiv (I have the arXiv links / names of the arXiv files), but in LaTeX format. How can I do this in Python?
If we go to this page: https://arxiv.org/format/2010.11645
We can read the following text:
Source:
Delivered as a gzipped tar (.tar.gz) file if there are multiple files, otherwise as a PDF file, or a gzipped TeX, DVI, PostScript or HTML (.gz, .dvi.gz, .ps.gz or .html.gz) file depending on submission format. [ Download source ]
We can download the file by clicking on [ Download source ], but I have no idea what type of file I'm getting back. The filename is simply 2010.11645.
I'd like to download the file in LaTeX format (which I believe is .tex) and then convert it into .txt using pandoc. I believe I'd need to download the files via requests somehow?
How can I do this? Thanks!

How to read an Excel file from a folder without specifying the filename?

Is there a way to read an Excel file into R directly from a folder without having to specify the filename?
I'm using the readxl library and I have only 1 file in the folder, but the file's name changes monthly.
Thanks!
R needs the file path in any case, but you can obtain it without typing it in if you are absolutely sure that this is the only file in your folder.
See https://stat.ethz.ch/R-manual/R-devel/library/base/html/list.files.html
to learn more about how to list a directory and get the file names inside it.
If this isn't the only file but it is the only Excel file, you will have to check the extension of each file to decide which one to open.
As noted in other answers, this can be solved using the dir function.
Assuming the variable path contains the path of the directory in which the XLSX file is located, then the following will give you the full path to the file:
file_path = dir(path, pattern = '\\.xlsx$', full.names = TRUE)
Note that pattern uses regular expressions rather than glob format! Using the glob pattern *.xlsx might appear to work but it’s incorrect, only works by accident, and will also match other file names.
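Suppose your file is located inside a folder called myFolder on drive E:
library(readxl)
folderPath <- "E:/myFolder"
# "\\.xlsx$" matches only names that end in .xlsx; a bare ".xlsx" would also match e.g. "axlsx_notes.txt"
fileName <- grep("\\.xlsx$", dir(folderPath), value = TRUE)
# file.path() joins the parts with a single separator
filePath <- file.path(folderPath, fileName)
df <- read_xlsx(filePath)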
This code will get the name of your xlsx file inside folderPath each time you run it and then you can import it with readxl::read_xlsx. I assume there is only one xlsx file inside the folder.
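The "only one xlsx file" assumption can be checked explicitly instead of being taken on trust. The following is a minimal sketch using base R only. The helper name find_single_xlsx and the demo file names are hypothetical. It also skips the ~$ lock files that Excel creates next to an open workbook, which would otherwise count as a second match.

```r
# Hypothetical helper: return the path of the only .xlsx file in `folder`.
find_single_xlsx <- function(folder) {
  files <- list.files(folder, pattern = "\\.xlsx$", full.names = TRUE,
                      ignore.case = TRUE)
  # Skip the lock files Excel creates while a workbook is open (~$name.xlsx)
  files <- files[!startsWith(basename(files), "~$")]
  if (length(files) != 1) {
    stop("Expected exactly one .xlsx file in ", folder,
         ", found ", length(files))
  }
  files
}

# Demonstration with empty placeholder files in a temporary folder
demo_dir <- tempfile("xlsx_demo_")
dir.create(demo_dir)
file.create(file.path(demo_dir, "report_2024_05.xlsx"))
file.create(file.path(demo_dir, "notes.txt"))

found <- find_single_xlsx(demo_dir)
```

With readxl loaded, the result could be passed straight on, for example read_xlsx(find_single_xlsx(folderPath)). Stopping with an error when zero or several files match is a design choice here; picking the most recently modified file would be another reasonable option.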

How can I convert these characters to a CSV?

I tried to download a file from LSData, but it brings me to a page full of weird characters. The first few are:
7z¼¯'�DÙ™µUa�����b�������’³_èÚ†à]�&Jgl›Ü)ÉZKŒP7þò|¤ˆëÁëxŠ§u6²ã]’“Àé3lGê7ñ"!èÞ’ïjP³
l½Öv<¹-žøZ¹Æ âäùëOKä#;cÞ Žmï•&?^¢Ø"Á.=ù‚u|õ9žG<އ趽ÈËŒøÂtŠÍÝê/ÂG×à×–R§Ýj×zÛ¥™éwG—ï‘ývíõåò ÂÑ\‡W�ܱò§úßxlø¾Ö¾EºáPnÚR"økv§}6“SLÒ¢ø€m]-Ì«gÐáÅMŠWGU�µOÿDõ™}u¦HŠ_qŠ,/¦lÔ}Áô|,Òäêÿ2l«ª»°úö¡]+€™´í¿¢«|Ãw#êñ:t!
I have no clue what I'm looking at. How can I convert this entire page into a CSV, or in whatever file so I can use it in R?
It is a 7z-compressed archive (note the 7z at the very start of the data). Download it as a file and extract it with a 7z-capable tool such as 7-Zip to get the CSV file.

julia: how to read a bz2 compressed text file

In R, I can read a whole compressed text file into a character vector as
readLines("file.txt.bz2")
readLines transparently decompresses .gz and .bz2 files but also works with non-compressed files. Is there something analogous available in julia? I can do
text = open(f -> read(f, String), "file.txt")
but this cannot open compressed files. What is the preferred way to read bzip2 files? Is there any approach (besides manually checking the filename extension) that can deduce compression format automatically?
I don't know about anything automatic but this is how you could (create and) read a bz2 compressed file:
using CodecBzip2 # after ] add CodecBzip2
# Creating a dummy bz2 file
mystring = "Hello StackOverflow!"
mystring_compressed = transcode(Bzip2Compressor, mystring)
write("testfile.bz2", mystring_compressed)
# Reading and uncompressing it
compressed = read("testfile.bz2")
plain = transcode(Bzip2Decompressor, compressed)
String(plain) # "Hello StackOverflow!"
There are also streaming variants available. For more see CodecBzip2.jl.

Downloading Excel files in R

I am trying to download an Excel file (xls) from the Australian Bureau of Statistics in R with the following code. However, every time I try to run the line with the read_excel command, my session crashes.
library(readxl)
target <- 'http://www.ausstats.abs.gov.au/ausstats/meisubs.nsf/LatestTimeSeries/6202001/$FILE/6202001.xls'
path <- paste0(getwd(),"/","6202001.xls")
download.file(target, destfile = path)
#read_excel(path = path) << problem line
I think it might have something to do with the excel file pop-up when you go to put the link into a browser and download it that way, but I'm not sure!
Do I need to change the file before I go to read it at all?
Any help would be great.
Download the file in binary mode by setting the mode argument to "wb". The default for download.file is text mode ("w"), which can corrupt binary files such as .xls on Windows:
download.file(myurl, mydestfile, mode="wb")
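One way to tell whether a downloaded .xls arrived intact is to look at its first bytes before handing it to read_excel. This is a rough sketch, not part of the answer above. The helper name looks_like_xls is hypothetical. It relies on the fact that legacy .xls workbooks are stored in the OLE2 compound file container, whose files begin with the bytes D0 CF 11 E0 A1 B1 1A E1. Other OLE2 documents share that signature, so a match is a necessary check rather than proof of a valid workbook.

```r
# Hypothetical check: does the file start with the OLE2 signature used by .xls?
looks_like_xls <- function(path) {
  ole2_magic <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  con <- file(path, open = "rb")
  on.exit(close(con))
  header <- readBin(con, what = "raw", n = length(ole2_magic))
  identical(header, ole2_magic)
}

# Demonstration with two made-up files: one with the signature, one without
good <- tempfile(fileext = ".xls")
writeBin(as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1, 0x00)), good)
bad <- tempfile(fileext = ".xls")
writeLines("<html>not a workbook</html>", bad)

good_ok <- looks_like_xls(good)
bad_ok  <- looks_like_xls(bad)
```

If the check fails for a real download, the file was probably altered in transit or the server returned something other than the workbook, such as an HTML page.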
