I need to download a lot of files from an "HTML" link using R.
The links look like:
http://bioinf-applied.charite.de/supernatural_new/src/download_mol.php?sn_id=SN00000001
with the number after id= incrementing for each subsequent file. I want to download the first 1000 files, from: ...id=SN00000001 to ...id=SN00001000
I'm trying to use a loop with a variable to download all those files, but I have no idea how to construct this code in R.
Something like this:
for(i in 1:1000){
x <- sprintf("%08d", i)
myPath <- paste0("http://bioinf-applied.charite.de/supernatural_new/src/download_mol.php?sn_id=SN", x)
download.file(myPath, paste0("SN", x, ".mol"))
}
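With a thousand requests, it can help if one failed download does not abort the whole loop. The following is a minimal sketch under stated assumptions, not a tested recipe for this particular server: the helper name download_all and the decision to skip failures with a warning are hypothetical additions, while the URL pattern and the download.file() call come from the code above.

```r
# Build all 1000 URLs and destination names at once; sprintf() is vectorised.
ids   <- sprintf("SN%08d", 1:1000)
urls  <- paste0("http://bioinf-applied.charite.de/supernatural_new/src/download_mol.php?sn_id=", ids)
dests <- paste0(ids, ".mol")

# Hypothetical helper: tries every download and records which ones succeeded.
download_all <- function(urls, dests) {
  ok <- logical(length(urls))
  for (i in seq_along(urls)) {
    ok[i] <- tryCatch({
      # download.file() returns 0 on success
      download.file(urls[i], dests[i], quiet = TRUE) == 0
    }, error = function(e) {
      warning("Failed to download ", urls[i], ": ", conditionMessage(e))
      FALSE
    })
  }
  ok
}

# Usage (needs network access, so it is left commented out here):
# ok <- download_all(urls, dests)
# urls[!ok]   # anything that should be retried
```

The returned logical vector makes it easy to retry only the URLs that failed.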
I am trying to extract a subset from multiple .grb2 files in the same directory and write each one to a CSV file. I can do it for one file (or a few) using the following set of commands:
GRIB <- brick("tmp2m.1989102800.time.grb2")
GRIB <- as.array(GRIB)
readGDAL("tmp2m.1989102800.time.grb2")
tmp2m.6hr <- GRIB[51,27,c(261:1232)]
str(tmp2m.6hr)
tmp2m.data <- data.frame(tmp2m.6hr)
write.csv(tmp2m.data,"tmp1.csv")
The commands above extract temperature values for a specific latitude ("51") and longitude ("27") and a specific time range ("c(261:1232)"), and write them to a CSV file.
Now I have hundreds of these files (with different file names, of course) in the same directory and I want to do the same for all of them. Clearly, I cannot do this one by one, changing the file name each time.
I have struggled a lot with this, but so far I have not managed to do it. Since I am new to R and my knowledge is limited, I would very much appreciate any help with this.
The simplest way would be to use a normal for loop:
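library(raster) # provides brick()
library(rgdal)  # provides readGDAL()
path <- "your file path here"
# full.names = TRUE keeps the directory in each name, so the files are found
# whatever the current working directory is
input.file.names <- dir(path, pattern = "\\.grb2$", full.names = TRUE)
output.file.names <- paste0(tools::file_path_sans_ext(input.file.names), ".csv")
for(i in seq_along(input.file.names)){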
GRIB <- brick(input.file.names[i])
GRIB <- as.array(GRIB)
readGDAL(input.file.names[i]) # edited line
tmp2m.6hr <- GRIB[51,27,c(261:1232)]
str(tmp2m.6hr)
tmp2m.data <- data.frame(tmp2m.6hr)
write.csv(tmp2m.data,output.file.names[i])
}
You could of course wrap the body of the for loop in a function and then use the standard lapply or the map function from purrr.
Note that this code writes a separate CSV file for each input file. If you want to append the data to a single file instead, check out write.table and its append argument.
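Both suggestions can be sketched as follows. This is only an illustration: extract_values and write_combined are hypothetical names, and extract_values reads plain text files of numbers as a stand-in for the brick()/as.array() extraction, so that the sketch runs without any GRIB data.

```r
# Hypothetical stand-in for the per-file extraction (brick(), as.array(), subsetting).
# Each input here is a plain text file of numbers, not a real .grb2 file.
extract_values <- function(file) {
  data.frame(source = basename(file), value = scan(file, quiet = TRUE))
}

# Append every file's rows to one CSV; the header is written only for the first file.
write_combined <- function(files, out_file) {
  for (k in seq_along(files)) {
    write.table(extract_values(files[k]), out_file, sep = ",",
                row.names = FALSE, col.names = (k == 1), append = (k > 1))
  }
  invisible(out_file)
}

# Small demonstration with two temporary input files.
demo_dir <- tempfile("demo")
dir.create(demo_dir)
writeLines(c("1.5", "2.5"), file.path(demo_dir, "a.txt"))
writeLines("3.5", file.path(demo_dir, "b.txt"))
files <- dir(demo_dir, pattern = "\\.txt$", full.names = TRUE)

out_file <- write_combined(files, file.path(demo_dir, "combined.csv"))
combined <- read.csv(out_file)

# The lapply alternative: collect everything in memory, then bind the pieces.
all_rows <- do.call(rbind, lapply(files, extract_values))
```

Keeping a source column in the combined file records which input each row came from.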
I have created the following code
library('XML')
library('rvest')
links <- c('https://www.google.com/',
'https://www.youtube.com/?gl=US',
'https://news.google.com/news/u/0/headlines?hl=en&ned=us')
for (i in 1:3){
html_object <- read_html(links[i])
write_xml(html_object, file="test.html")
}
I want to save all of these pages as HTML files, but my current code only saves one. I am guessing that it overwrites the same file three times in this example. How can I stop it from overwriting the same file? Ideally, I would like each HTML file to be named after its URL, but I am unable to figure out how to do that with multiple links. For example, my end result should be three HTML files titled 'https://www.google.com/', 'https://www.youtube.com/?gl=US', and 'https://news.google.com/news/u/0/headlines?hl=en&ned=us'.
What about using paste0() to create your filename in the for-loop?
for(i in seq_along(links)){
html_object <- read_html(links[i])
somefilename <- paste0("filename_", i, ".html")
write_xml(html_object, file = somefilename)
}
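If the file names should resemble the URLs, note that a raw URL cannot be used directly, because characters such as "/" and "?" are not allowed in file names on common systems. One possible approach, sketched here with a hypothetical helper url_to_filename (the replacement rules are an assumption, not part of the answer above), is to replace the unsafe characters:

```r
# Hypothetical helper: turn a URL into a file name that is safe on common systems.
url_to_filename <- function(url) {
  name <- sub("^https?://", "", url)           # drop the scheme
  name <- gsub("[^A-Za-z0-9._-]+", "_", name)  # replace runs of unsafe characters
  name <- gsub("^_+|_+$", "", name)            # trim leading/trailing underscores
  paste0(name, ".html")
}

links <- c("https://www.google.com/",
           "https://www.youtube.com/?gl=US",
           "https://news.google.com/news/u/0/headlines?hl=en&ned=us")

filenames <- url_to_filename(links)  # gsub() and sub() are vectorised
```

Inside the loop, write_xml(html_object, file = filenames[i]) would then give each page its own file.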
I am trying to download PDFs from a website using R.
I have a vector of the PDF URLs (urls) and a vector of destination file names (destinations),
e.g.:
urls <- c("http://website/name1.pdf", "http://website/name2.pdf")
destinations <- c("C:/username/name1.pdf", "C:/username/name2.pdf")
The code I am using is:
for(i in 1:length(urls)){
download.file(urls, destinations, mode="wb")}
However, when I run the code, R accesses the URL, downloads the first PDF, and repeats downloading the same PDF over and over again.
I have read the post "for loop on R function" and was wondering whether this has something to do with the function itself, or whether there is a problem with my loop.
The code is similar to the post "How to download multiple files using loop in R?", so I was wondering why it is not working and whether there is a better way to download multiple files using R.
I think your loop is mostly fine, except you forgot to index the urls and destinations objects.
Tangentially, I would recommend getting in the habit of using seq_along instead of 1:length() when defining for loops, because 1:length(x) evaluates to c(1, 0) when x is empty, so the loop body still runs.
for(i in seq_along(urls)){
download.file(urls[i], destinations[i], mode="wb")
}
Or using Map, as suggested by @docendodiscimus:
Map(function(u, d) download.file(u, d, mode="wb"), urls, destinations)
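The difference between seq_along and 1:length() only shows up when the vector is empty. A small illustration (the variable names are made up for this sketch):

```r
urls <- character(0)            # an empty vector, e.g. when no links were found

naive_index <- 1:length(urls)   # 1:0 counts down, giving the two indices 1 and 0
safe_index  <- seq_along(urls)  # an empty index vector

# Count how many times each loop body would run.
naive_runs <- 0
for (i in naive_index) naive_runs <- naive_runs + 1

safe_runs <- 0
for (i in safe_index) safe_runs <- safe_runs + 1
```

With 1:length() the body runs twice on an empty vector and download.file() would be called with missing values; with seq_along it is skipped entirely.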
I have written an R script that bins specific parameters of several .csv files in the same folder, using the smbinning package. When I execute the script, it produces detailed results, but I do not need all of them. I want to take a specific part of the results and write it to a .csv file automatically. Can someone tell me how I can do this?
My R script is as follows:
library(smbinning)
files <- list.files(pattern = "0.csv")
cutpoint <- rep(0,length(files))
for(i in 1:length(files)){
data <- read.csv(files[i],header=T)
df.train <- data.frame(data)
df.train_amp <-rbind(df.train)
cutpoint[i] <- smbinning(df=df.train_amp, y="cvflg",x="dwell")
}
result <- cbind(files,cutpoint)
write.csv(result,"result_dwell.csv")
You can use View(result) to check whether the variable contains exactly what you require. If it does not, there is something wrong in your logic.
R also has the sink function, which diverts console output to a file:
https://stat.ethz.ch/R-manual/R-devel/library/base/html/sink.html
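A minimal sketch of how sink is typically used. The data frame printed here is made up purely for illustration and is not real smbinning output:

```r
out_file <- tempfile(fileext = ".txt")

sink(out_file)            # start diverting console output to the file
tryCatch({
  # Illustrative values only; in practice this would print the wanted results.
  print(data.frame(file = c("a0.csv", "b0.csv"), cutpoint = c(10, 20)))
}, finally = sink())      # always restore normal console output

captured <- readLines(out_file)
```

Calling sink() with no arguments ends the diversion; wrapping the work in tryCatch with finally ensures that this happens even if an error occurs in between.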
I am using R version 3.03 on a Windows 7 OS and am trying to solve a problem. I am not sure if this is just me being stupid or if this is really a problem with my version of R.
Initial problem: I have a folder with 300+ CSV files and I need to write a function that reads in a user-specified number of files. So my idea was to use the list.files function to give me a list of the CSV files and then choose from this list, rather than having to reformat the user input to match the CSV file names.
pm <- function(directory, id = 1:332) {
setwd("C:/Users/cw/Documents")
setwd(directory)
x <- id[1]
x
files <- list.files()
#for (x in 1:length(id))
#data[i] <- read.csv(files[x], header=T)
#}
}
pm("specdata", 25:30)
So first I set the working directory, which works like a charm. Then I wanted to set x equal to the first element of id to obtain a starting point. Next I wanted to build a vector 'files' to choose the file names from.
Real problem: if I run the 'pm' function, R tells me that the object files does not exist. So what am I doing wrong (obviously I am doing something wrong)?
Thanks very much,
C
files is just a local variable that you declare inside your pm function. To use the results in your calling code, you should assign it to a variable (I used filelist here):
filelist <- pm("specdata", 25:30)
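Building on this, here is one possible way to finish the function so that it returns the data explicitly. It is only a sketch: read_selected is a hypothetical name, and it avoids setwd() by asking list.files for full paths. The demonstration uses a temporary folder with three small CSV files in place of the real data.

```r
# Hypothetical rewrite of pm(): no setwd(), and the result is returned explicitly.
read_selected <- function(directory, id) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  lapply(files[id], read.csv)  # one data frame per selected file
}

# Demonstration data: three small CSV files in a temporary folder.
demo_dir <- tempfile("specdata")
dir.create(demo_dir)
for (n in 1:3) {
  write.csv(data.frame(id = n, value = n * 10),
            file.path(demo_dir, sprintf("%03d.csv", n)), row.names = FALSE)
}

# The local variable files stays inside the function; only the return value
# is available to the caller.
selected <- read_selected(demo_dir, 2:3)
```

Because list.files returns names in alphabetical order, zero-padded file names such as 001.csv line up with the numeric ids.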