I'm trying to download a long list of podcasts, but when I use the download.file command in R it corrupts the audio file into a bunch of crackling noises.
Could any of you recommend a dedicated audio-downloading package, or a download.file method better suited to downloading audio? I went through the methods listed in the help file ("auto", "internal", "wininet", "libcurl", "wget" and "curl"), but none worked.
The downloading portion of the code looks similar to this:
url <- "http://play.podtrac.com/npr-510289/npr.mc.tritondigital.com/NPR_510289/media/anon.npr-mp3/npr/pmoney/2016/06/20160603_pmoney_podcast.mp3?orgId=1&d=1121&p=510289&story=480606726&t=podcast&e=480606726&siteplayer=true&dl=1"
download.file(url = url, destfile = "test.mp3")
I attempted different audio files from different sites and had similar results.
Edit: In response to the question by VC.One, this is a URL to the initial section of the hex code. I included more than the couple of lines he requested because the first section looked like file information, which may or may not be relevant:
Try mode = "wb" in download.file(). I had the same issue you mentioned and this solved it for me.
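Since the question is about a long list of files, here is a minimal sketch of how that advice could be applied in a loop. The helper name download_all and the dest_dir argument are made up for illustration; only download.file() and its mode argument come from the thread.

```r
# Hypothetical helper: download every URL in `urls` into `dest_dir`,
# always in binary mode so the audio bytes are written unchanged.
download_all <- function(urls, dest_dir = tempdir()) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  dests <- character(length(urls))
  for (i in seq_along(urls)) {
    # Drop any query string or fragment before taking the file name.
    file_name <- basename(sub("[?#].*$", "", urls[i]))
    dests[i] <- file.path(dest_dir, file_name)
    download.file(urls[i], destfile = dests[i], mode = "wb", quiet = TRUE)
  }
  dests
}

# Possible usage, assuming `podcast_urls` is a character vector of links:
# files <- download_all(podcast_urls, "podcasts")
```

File names are taken from the URL path, so two URLs ending in the same name would overwrite each other; adjust the naming if that can happen in your list.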
This is the second time I have searched online for help with download.file and solved my problem by setting the mode argument, but I don't know the reason, or when and why to use it. I just saw the suggestion and passed the argument to get my problem solved.
The R help file on download.file() is a bit too brief and does not tell me when to use a particular mode:
mode character. The mode with which to write the file. Useful values are
"w", "wb" (binary), "a" (append) and "ab". Only used for the "internal"
method. (See also ‘Details’.)
To get the correct result I had to pass mode="wb" as below, but why? (Maybe it has something to do with the s in https, or should I just go with trial and error for now?)
fileUrl <-"https://d396qusza40orc.cloudfront.net/getdata%2Fjeff.jpg"
download.file(fileUrl, destfile = "./data/leekjpg.jpg", mode="wb")
I would like to get at least a rudimentary understanding of the method and mode arguments in download.file and would appreciate your explanations or suggested reading.
I am downloading more files, and it bothers me that I don't know when to pass some of the related arguments.
This is what the docs say:
If mode is not supplied and url ends in one of .gz, .bz2, .xz, .tgz,
.zip, .rda or .RData a binary transfer is done. Since Windows (unlike
Unix-alikes) does distinguish between text and binary files, care is
needed that other binary file types are transferred with mode = "wb".
unix A progress bar tracks the transfer. If the file length is known,
an equals sign represents 2% of the transfer completed: otherwise a
dot represents 10Kb. Code written to download binary files must use
mode = "wb", but the problems incurred by a text transfer will only be
seen on Windows.
Basically it is saying that "w" and "wb" behave the same on a Unix-like OS, because those systems do not differentiate between text and binary files, but Windows does.
On Windows, line endings are handled differently in text mode, which can alter the bytes of a binary file. To be safe I use "w" for text files and "wb" for files that are not supposed to be text, like jpg.
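To make the line-ending point concrete, here is a rough illustration. It assumes the usual Windows text-mode behaviour of writing every LF byte as CR LF. The function simulate_text_mode and the sample bytes are invented for the demonstration; nothing here calls download.file.

```r
# Illustration only: mimic what a Windows text-mode write does to a byte
# stream by expanding every LF (0x0a) into CR LF (0x0d 0x0a).
simulate_text_mode <- function(bytes) {
  ints <- as.integer(bytes)
  as.raw(unlist(lapply(ints, function(b) if (b == 10L) c(13L, 10L) else b)))
}

# Made-up sample bytes standing in for binary audio or image data.
original <- as.raw(c(0xff, 0xfb, 0x0a, 0x64, 0x0a, 0x00))
damaged <- simulate_text_mode(original)

print(original)
print(damaged)

# One extra byte appears for every 0x0a in the input.
length(damaged) - length(original)
```

In a text file those extra bytes are harmless, but in an mp3 or jpg every inserted byte shifts the data that follows, which is why binary files need mode = "wb".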
See the following code (Windows 10, R 3.6.3)
download.file(
url = "https://www6.ohiosos.gov/ords/f?p=VOTERFTP:DOWNLOAD::FILE:NO:2:P2_PRODUCT_NUMBER:363",
destfile = paste0("SWVF_1_22_", format(as.Date(Sys.Date()), "%Y%m%d"), ".txt.gz")
)
When I try to unzip the result with 7zip, it simply says "Data error."
The funniest thing is that this code used to work, going back to late Jan 2019.
In addition, if you go to the original URL in a browser, the file downloads fine and unzips properly with either 7zip or WinZip.
The files have the same size whichever way they are downloaded, which is why I never suspected corruption for more than a year. I've tried changing the destfile argument, as well as using tempfile(). Neither works.
This style of download.file call works fine for zip files. The problem occurs only with these .txt.gz files.
I have absolutely no idea what is going on, and whether my files can be recovered at all. Any advice?
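One thing that may be worth checking, going by the download.file documentation quoted earlier: the automatic binary transfer depends on how the URL ends, not on destfile. The sketch below tests that for this URL. The helper url_triggers_binary is hypothetical, and its extension list is copied from that documentation excerpt, so it may not match every R version.

```r
# Extensions taken from the documentation excerpt quoted earlier;
# the list may differ between R versions.
binary_suffix <- "\\.(gz|bz2|xz|tgz|zip|rda|RData)$"

# Hypothetical helper: TRUE if the URL itself ends in one of those suffixes.
url_triggers_binary <- function(url) grepl(binary_suffix, url)

url <- "https://www6.ohiosos.gov/ords/f?p=VOTERFTP:DOWNLOAD::FILE:NO:2:P2_PRODUCT_NUMBER:363"
dest <- paste0("SWVF_1_22_", format(Sys.Date(), "%Y%m%d"), ".txt.gz")

url_triggers_binary(url)   # FALSE: the URL ends in "363", not ".gz"
url_triggers_binary(dest)  # TRUE, but destfile is not what the docs say is checked

# Untested suggestion based on the above: request binary mode explicitly.
# download.file(url = url, destfile = dest, mode = "wb")
```

This is only a guess at the cause. It does not explain why the code worked earlier, and it says nothing about whether the files already downloaded can be repaired.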
I'm doing a project that requires going into a database of the Brazilian equivalent of the FTC and downloading a few files (which I will later process), and I want to automate this using R.
My problem is that when naming the file, I have to give it a file extension, and I don't know what it will be (usually it will be a scanned pdf, but sometimes it will be an html file). Here is an example:
https://sei.cade.gov.br/sei/modulos/pesquisa/md_pesq_processo_exibir.php?0c62g277GvPsZDAxAO1tMiVcL9FcFMR5UuJ6rLqPEJuTUu08mg6wxLt0JzWxCor9mNcMYP8UAjTVP9dxRfPBcbZvmE_iaYkTbpPedZsRpa1llf9W8WXxdUJxor5q0IiE
I want the first and the tenth file. Downloading them is easy:
download.file("https://sei.cade.gov.br/sei/modulos/pesquisa/md_pesq_documento_consulta_externa.php?DZ2uWeaYicbuRZEFhBt-n3BfPLlu9u7akQAh8mpB9yPDzrBMElK1BGz7u3NcOFP7-Z5s9oDvQR1K4ELVR_nmNlPto_G3CRD_y2Hu6JLvHZVV2LDxnr4dccffqX3xlEao", destfile = 'C:/teste/teste1', mode = 'wb')
download.file("https://sei.cade.gov.br/sei/modulos/pesquisa/md_pesq_documento_consulta_externa.php?DZ2uWeaYicbuRZEFhBt-n3BfPLlu9u7akQAh8mpB9yPaFy5S3krC8lTKjlRbfodOIg2NArJmAFS5PyUEHL3hnJYr8VG9zLGdNts6K99Ht673e_ZPr2gr3Cw7r8zJqRiH", destfile = 'C:/teste/teste2', mode = 'wb')
The thing is, I don't know which one is a pdf file and which one is an html file without manually trying to open them with another program. Is there any way to tell R to automatically add the correct file extension when downloading?
If you use the httr package, you can get the content-type header, which will help you decide what type of file it is. You can use the HEAD() function to get the headers of the files. For example, with your URLs:
urls <- c(
"https://sei.cade.gov.br/sei/modulos/pesquisa/md_pesq_documento_consulta_externa.php?DZ2uWeaYicbuRZEFhBt-n3BfPLlu9u7akQAh8mpB9yPDzrBMElK1BGz7u3NcOFP7-Z5s9oDvQR1K4ELVR_nmNlPto_G3CRD_y2Hu6JLvHZVV2LDxnr4dccffqX3xlEao",
"https://sei.cade.gov.br/sei/modulos/pesquisa/md_pesq_documento_consulta_externa.php?DZ2uWeaYicbuRZEFhBt-n3BfPLlu9u7akQAh8mpB9yPaFy5S3krC8lTKjlRbfodOIg2NArJmAFS5PyUEHL3hnJYr8VG9zLGdNts6K99Ht673e_ZPr2gr3Cw7r8zJqRiH"
)
You can write a helper function
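library(httr)  # provides HEAD() and headers()
# Return the Content-Type header of each URL, fetched with a HEAD request.
get_content_type <- function(urls) {
  unname(sapply(urls, function(u) headers(HEAD(u))[["content-type"]]))
}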
get_content_type(urls)
# [1] "application/pdf;" "text/html; charset=ISO-8859-1"
This returns the MIME type of each file, and you can grep for things like "pdf" to save as a PDF or "html" for web pages. I'm not sure what other types of files might be available. There is no "correct" file name for a given file type, so you'd need to make that decision yourself.
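The grep step could look something like the sketch below. The function ext_from_content_type and its fallback of an empty string are my own choices for illustration, not part of httr, and it only knows about the two types seen in this example.

```r
# Hypothetical mapping from a Content-Type header value to a file extension.
# Anything that is neither PDF nor HTML gets no extension.
ext_from_content_type <- function(content_type) {
  ct <- tolower(content_type)
  ifelse(grepl("pdf", ct, fixed = TRUE), ".pdf",
         ifelse(grepl("html", ct, fixed = TRUE), ".html", ""))
}

# The two header values returned for the example URLs:
types <- c("application/pdf;", "text/html; charset=ISO-8859-1")
exts <- ext_from_content_type(types)
print(exts)

# Possible usage when downloading, building destfile from the extension:
# download.file(urls[1], destfile = paste0("C:/teste/teste1", exts[1]), mode = "wb")
```

The function is vectorised, so it can be applied directly to the character vector of content types for all your URLs.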
I am trying to download some sound files through R (mostly mp3). I've started off using download.file() as below. However, the sound files downloaded this way sound horrible, as if they were playing way too fast. Any ideas?
download.file("http://www.mfiles.co.uk/mp3-downloads/frederic-chopin-piano-sonata-2-op35-3-funeral-march.mp3","test.mp3")
Even better than getting the above function to work: is there a way to download files without having to specify the extension? Sometimes I only have the redirecting page.
Thanks!
Try explicitly setting binary mode with mode="wb":
download.file("http://www.mfiles.co.uk/mp3-downloads/frederic-chopin-piano-sonata-2-op35-3-funeral-march.mp3",
tf <- tempfile(fileext = ".mp3"),
mode="wb")
(You can view the filename with cat(tf).)
I'm trying to download the zip file from this url:
url1 <- "http://www.clinicaltrials.gov/ct2/results?cond=%22acne%22&studyxml=true"
Here's my code:
tempZip <- tempfile()
download.file(url1, tempZip)
And here's the error I get:
Warning message:
In download.file(url1, tempZip) :
downloaded length 817445 != reported length 200
Any ideas?
EDIT: OK, after seeing agstudy's reply below, I found that the file was indeed being downloaded (it also appears to be the correct file size). Now the problem is when I try to unzip the file: it says the file is corrupted.
Maciej, I agree that it would be better to use a link with a .zip extension, however, there's no way to get that from this website.
OK, I figured out what was wrong. Because this URL does not have ".zip" at the end, the download.file function does not know to use a binary download. This code fixes the problem:
url1 <- "http://www.clinicaltrials.gov/ct2/results?cond=%22acne%22&studyxml=true"
tempZip <- tempfile()
download.file(url1, tempZip, mode="wb")
If you don't specify the mode argument, the downloaded zip file will be corrupt.
You don't have a direct link to the file, so R tries to download the web page rather than the file. Use a link that ends with '.zip'.
It may be useful to use the XML or RCurl package to scrape links to the datasets from this web page.