I have created the following function to read a csv file from a given URL:
get.bhav.copy <- function() {
  # first get the bhav copy
  today <- c(); ty <- c(); tm <- c(); tmu <- c(); td <- c()
  # build the URL from today's date
  today <- Sys.Date()
  ty  <- format(today, format = "%Y")
  tm  <- format(today, format = "%b")
  tmu <- toupper(tm)
  td  <- format(today, format = "%d")
  dynamic.URL <- paste("https://www.nseindia.com/content/historical/EQUITIES/", ty, "/", tmu, "/cm", td, tmu, ty, "bhav.csv.zip", sep = "")
  file.string <- paste("C:/Users/user/AppData/Local/Temp/cm", td, tmu, ty, "bhav.csv")
  download.file(dynamic.URL, "C:/Users/user/Desktop/bhav.csv.zip")
  bhav.copy <- read.csv(file.string)
  return(bhav.copy)
}
If I run the function, it immediately fails with a "file.string not found" error. But when I run it again after some time (a few seconds), it executes normally. I think that when download.file executes, control passes to read.csv, which tries to load a file that has not yet been properly saved. When I run it again after some time, it tries to overwrite the existing file, which it cannot, and read.csv properly loads the saved file.
I want the function to execute successfully the first time I run it. Is there a way, or a function, to defer the action of read.csv until the file has been properly saved? Something like this:
download.file(dynamic.URL, "C:/Users/user/Desktop/bhav.csv.zip")
wait......
bhav.copy <- read.csv(file.string)
Ignore the fact that the destfile in download.file is different from file.string; it is due to the way my system (Windows 7) handles it.
Very many thanks for your time and effort...
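For what it's worth, download.file() blocks until the transfer finishes, so the delay is more likely the unzip step that has to produce the file at file.string. Either way, a defensive pattern is to check download.file()'s return value (0 means success) and poll with file.exists() before reading. A minimal sketch, reusing dynamic.URL and file.string as they are built in the function above:
# minimal sketch: assumes dynamic.URL and file.string are built as in the function above
status <- download.file(dynamic.URL, "C:/Users/user/Desktop/bhav.csv.zip", mode = "wb")
if (status != 0) stop("download.file() reported a failure")

# poll for up to 30 seconds until the csv shows up on disk
for (i in 1:30) {
  if (file.exists(file.string)) break
  Sys.sleep(1)
}
if (!file.exists(file.string)) stop("csv never appeared at file.string")

bhav.copy <- read.csv(file.string)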
Related
I'm trying to check to see if a user has an up-to-date file version of London's Covid prevalence data in their working directory and if not, download it from here:
fileURL <-"https://api.coronavirus.data.gov.uk/v1/data?filters=areaType=region;areaName=London&structure=%7B%22areaType%22:%22areaType%22,%22areaName%22:%22areaName%22,%22areaCode%22:%22areaCode%22,%22date%22:%22date%22,%22newCasesBySpecimenDateRollingSum%22:%22newCasesBySpecimenDateRollingSum%22,%22newCasesBySpecimenDateRollingRate%22:%22newCasesBySpecimenDateRollingRate%22%7D&format=csv"
Pasting that URL into the browser downloads the csv file, but using download.file(URL, "data_.csv") creates junk. Why?
So far I have:
library(data.table)
library(lubridate)  # for today()
library(magrittr)   # for %>%

# Look for COVID file starting with "data_"
destfile <- list.files(pattern = "data_")

# Find file date
fileDate <- file.info(destfile)$ctime %>% as.Date()

if (!file.exists(destfile) | fileDate != today()) {
  res <- tryCatch(download.file(url = fileURL,
                                destfile = paste0("data_", today(), ".csv"),
                                method = "auto"),
                  error = function(e) 1)
  if (res != 1) COVIDdata <- data.table::fread(destfile) # This doesn't read the file
}
The script always downloads a file regardless of the date on the existing one, but it saves it in an unreadable format. I've resorted to downloading the file every time as follows:
COVIDdata <- data.table::fread(fileURL)
The downloaded file is unreadable junk rather than plain CSV text.
I think this is an issue with how download.file encodes the result; one way around it is to use fread to get the data and then write it with fwrite:
library(data.table) # fread / fwrite
library(lubridate)  # today()
library(magrittr)   # %>%

# Look for COVID file starting with "data_"
destfile <- list.files(pattern = "data_")

# Find file date
fileDate <- file.info(destfile)$ctime %>% as.Date()
#> [1] "2020-11-06" "2020-11-06" "2020-11-06"

if (!length(destfile) | max(fileDate) != today()) {
  COVIDdata <- fread(fileURL)
  fwrite(COVIDdata, file = paste0("data_", today(), ".csv"))
}
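One thing the snippet above does not do is load the data when an up-to-date file already exists on disk. A possible addition (a sketch, assuming the same data_YYYY-MM-DD.csv naming used above):
# sketch: otherwise load the most recent cached file instead of downloading again
if (length(destfile) && max(fileDate) == today()) {
  COVIDdata <- fread(destfile[which.max(fileDate)])
}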
I want to take financial data using an API.
Here is what I do:
#load jsons
library("rjson")
json_file <- "https://api.coindesk.com/v1/bpi/currentprice/USD.json"
json_data <- fromJSON(paste(readLines(json_file), collapse=""))
#get json content as data.frame
x = data.frame(json_data$time$updated,json_data$time$updatedISO,json_data$time$updateduk,json_data$bpi$USD)
x
But the main problem is that the information changes every minute, so I can't build up a history.
Is there a way to make R connect to this site on its own every minute (i.e. in real-time mode) and collect the data each time?
The collected data should be saved in C:/Myfolder.
Is it possible to do this?
Something like this could do it
library("rjson")
json_file <- "https://api.coindesk.com/v1/bpi/currentprice/USD.json"
numOfTimes <- 2L # how many times to run in total
sleepTime <- 60L # time to wait between iterations (in seconds)
iteration <- 0L
while (iteration < numOfTimes) {
  # gather data
  json_data <- fromJSON(paste(readLines(json_file), collapse = ""))
  # get json content as data.frame
  x <- data.frame(json_data$time$updated,
                  json_data$time$updatedISO,
                  json_data$time$updateduk,
                  json_data$bpi$USD)
  # create a file name to save in 'C:/Myfolder'
  # alternatively, create just one .csv file and update it in each iteration
  nameToSave <- paste('C:/Myfolder/',
                      gsub('\\D', '', format(Sys.time(), '%F%T')),
                      'json_data.csv', sep = '_')
  # save the file
  write.csv(x, nameToSave)
  # update counter and wait
  iteration <- iteration + 1L
  Sys.sleep(sleepTime)
}
Note that this requires an open R session (you could create a .exe or .bat file and have it run in the background).
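The alternative mentioned in the comment above, keeping just one .csv and updating it on each iteration, could look like this rough sketch (the single output file C:/Myfolder/json_data.csv is hypothetical):
# sketch: append each iteration's row(s) to one growing csv instead of a new file per run
outfile <- "C:/Myfolder/json_data.csv"         # hypothetical single output file
write.table(x, outfile,
            sep = ",",
            row.names = FALSE,
            col.names = !file.exists(outfile), # write the header only on the first pass
            append = file.exists(outfile))
This keeps the whole history in one file, at the cost of having to guard against writing the header more than once.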
Hi, I am new to R and struggling to understand where my script is going wrong. I am trying to import only the csv files that fall between the two dates Sdate and Fdate entered near the top of the script. The script runs without any errors but only pulls in the last file in the list. I am on Windows 10 and all the files are on the local machine. Any help will be appreciated. Thanks.
Sdate <- as.Date("2018-10-01")
Fdate <- as.Date("2018-10-30")
Ndate <- as.character.Date(seq.Date(from = as.Date(Sdate), to = as.Date(Fdate),
                                    by = "days"), format = "%Y%m%d")
for (i in Ndate) {
  MyData <- read.csv(
    file = paste('D:/Data/Merlin Data/Merlin BDD/T1/BDD_', i, '_T1.csv', sep = ""),
    header = TRUE, sep = ",")
}
The problem is that you overwrite your variable on every pass through the loop, so only the last file survives. You need to append each file to your data frame instead.
One solution is to create an initial data frame from the first file:
MyData <- read.csv(file='D:/Data/Merlin Data/Merlin BDD/T1/BDD_20181001_T1.csv', header=TRUE, sep=",")
and afterwards append the remaining files to it with rbind(). Since you have already read the first file, set Sdate to Sdate <- as.Date("2018-10-02").
You should then be able to read your data with:
for (i in Ndate) {
  new_day <- read.csv(file = paste('D:/Data/Merlin Data/Merlin BDD/T1/BDD_', i, '_T1.csv', sep = ""),
                      header = TRUE, sep = ",")
  MyData <- rbind(MyData, new_day)
}
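A more idiomatic alternative (a sketch under the same file-naming assumption) is to read every file into a list and bind the rows once, which avoids growing the data frame inside a loop:
# sketch: build every path, read each csv, then bind all rows in one step
files <- paste0('D:/Data/Merlin Data/Merlin BDD/T1/BDD_', Ndate, '_T1.csv')
MyData <- do.call(rbind, lapply(files, read.csv, header = TRUE, sep = ","))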
I see it is super-easy to grab a PDF file, save it, and fetch all the text from the file.
library(pdftools)
download.file("http://www2.sas.com/proceedings/sugi30/085-30.pdf", "sample.pdf", mode = "wb")
txt <- pdf_text("sample.pdf")
I am wondering how to loop through an array of PDF files, based on links, download each one, and scrape the text from each. I want to go to the following link.
http://www2.sas.com/proceedings/sugi30/toc.html#dp
Then I want to download each file from 'Paper 085-30:' to 'Paper 095-30:'. Finally, I want to scrape the text out of each file. How can I do that?
I would think it would be something like this, but I suspect the paste function is not set up correctly.
library(pdftools)
for(i in values){'085-30',' 086-30','087-30','088-30','089-30'
paste(download.file("http://www2.sas.com/proceedings/sugi30/"i".pdf", i".pdf", mode = "wb")sep = "", collapse = NULL)
}
You can get a list of pdfs using rvest.
library(rvest)
x <- read_html("http://www2.sas.com/proceedings/sugi30/toc.html#dp")
href <- x %>% html_nodes("a") %>% html_attr("href")
# char vector of links, use regular expression to fetch only papers
links <- href[grepl("^http://www2.sas.com/proceedings/sugi30/\\d{3}.*\\.pdf$", href)]
I've added some error handling, and don't forget to put the R session to sleep so you don't flood the server. In case a download is unsuccessful, the link is stored in a variable which you can inspect after the loop has finished, and you can then adapt your code or just download those files manually.
# write failed links to this variable
unsuccessful <- c()
for (link in links) {
  out <- tryCatch(download.file(url = link, destfile = basename(link), mode = "wb"),
                  error = function(e) e, warning = function(w) w)
  if (inherits(out, c("simpleError", "simpleWarning"))) {
    message(sprintf("Unable to download %s", link))
    unsuccessful <- c(unsuccessful, link)
  }
  sleep <- abs(rnorm(1, mean = 10, sd = 10))
  message(sprintf("Sleeping for %f seconds", sleep))
  Sys.sleep(sleep) # don't flood the server, sleep for a while
}
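Once the files are on disk, pulling the text out of each one can follow the same pdftools approach as in the question; a short sketch:
library(pdftools)

# sketch: extract the text from every pdf now sitting in the working directory
pdf_files <- list.files(pattern = "\\.pdf$")
texts <- lapply(pdf_files, pdf_text)  # one character vector per file, one element per page
names(texts) <- pdf_files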
I am trying to create some automated R scripts using the taskscheduleR library. I have created the following script:
library(lubridate)
setwd("C:/Users/Marc/Desktop/")
create_df <- function() {
  list <- c(1, 2, 3)
  df <- data.frame(list)
  x <- format(Sys.time(), "%S")
  name <- paste0("name_", x, ".csv")
  write.csv(df, name)
}
create_df()
That can be fired up with the following:
myscript <- "C:/Users/Marc/Dropbox/PROJECTEN/Lopend/taskschedulR_test/test.R"
taskscheduler_create(taskname = "myfancyscript", rscript = myscript,
schedule = "ONCE", starttime = format(Sys.time() + 62, "%H:%M"))
However, when I execute it, nothing happens. Any thoughts on how I can get this running?
It worked for me; I now have a .csv called "name_03". I keep the script inside the folder that the output goes into, unlike yours, which is in your Dropbox. You can check the event log by looking at the History tab in the Task Scheduler; type this into R to open it:
system("control schedtasks")