I'm trying to take a set of DOIs and have the doi.org website return the information in .bib format. The code below is supposed to do that and, crucially, append each new result to a .bib file. My understanding is that mode = "a" does the appending, but it doesn't: the last line of code prints out the contents of outFile, and it contains only the last .bib result.
What needs to be changed to make this work?
library(curl)
outFile <- tempfile(fileext = ".bib")
url1 <- "https://doi.org/10.1016/j.tvjl.2017.12.021"
url2 <- "https://doi.org/10.1016/j.yqres.2013.10.005"
h <- new_handle()
handle_setheaders(h, "accept" = "application/x-bibtex")
curl_download(url1, destfile = outFile, handle = h, mode = "a")
curl_download(url2, destfile = outFile, handle = h, mode = "a")
library(readr)
read_delim(outFile, delim = "\n")
It's not working for me with curl_download() either. Alternatively, you could download with curl() and use write() with append = TRUE.
Here is a solution for that, which can easily be used for as many URLs as you want to download the BibTeX from. You can execute this after your line 7.
library(dplyr)
library(purrr)
urls <- list(url1, url2)
walk(urls, ~ {
  curl(., handle = h) %>%
    readLines(warn = FALSE) %>%
    write(file = outFile, append = TRUE)
})
library(readr)
read_delim(outFile, delim = "\n")
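If curl_download() keeps overwriting the file regardless of mode, a variant of the same idea is to fetch each response into memory with curl_fetch_memory() and append the text yourself. A minimal sketch, reusing the handle h from above:
for (u in c(url1, url2)) {
  resp <- curl_fetch_memory(u, handle = h)  # fetch the BibTeX response into memory
  cat(rawToChar(resp$content), "\n", file = outFile, append = TRUE)  # append it to the .bib file
}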
I'm trying to loop through all the CSV files on an FTP site and upload the contents of CSVs with a certain filename to a database.
So far I've been able to
access the FTP using...
getURL(url, userpwd = userpwd, ftp.use.epsv = FALSE, dirlistonly = TRUE)
get a list of the filenames using...
unlist(strsplit(filenames, "\r\n"))
and create a dataframe with a list of the full URLs (e.g. ftp://sample#ftpserver.name.com/samplename.csv) using...
for (i in seq_along(myfiles)) {
url_list[i,] <- paste(url, myfiles[i], sep = '')
}
How do I loop through this dataframe, filtering for certain filenames, in order to create a new dataframe with all of the data from the relevant CSVs? (Half the files are named Type1SampleName and half are Type2SampleName.)
I would then upload this data to the database.
Thanks!
Since RCurl::getURL returns the raw HTTP response (here, the content of each CSV), consider extending your lapply call to pass the result into read.csv using its text argument:
# VECTOR OF URLs
urls <- paste0(url, myfiles[grep("Type1", myfiles)])
# LIST OF DATA FRAMES FROM EACH CSV
mydata <- lapply(urls, function(url) {
resp <- getURL(url, userpwd = userpwd, connecttimeout = 60)
read.csv(text = resp)
})
Alternatively, getURL supports a callback function with write argument:
Alternatively, if a value is supplied for the write parameter, this is returned. This allows the caller to create a handler within the call and get it back. This avoids having to explicitly create and assign it and then call getURL and then access the result. Instead, the 3 steps can be inlined in a single call.
# USER DEFINED METHOD
import_csv <- function(resp) read.csv(text = resp)
# LONG FORM NOTATION
mydata <- lapply(urls, function(url)
getURL(url, userpwd = userpwd, connecttimeout = 60, write = import_csv)
)
# SHORT FORM NOTATION
mydata <- lapply(urls, getURL, userpwd = userpwd, connecttimeout = 60, write = import_csv)
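If you then want a single data frame combining all the Type1 CSVs (as in the question), the list of data frames can be stacked afterwards. A minimal follow-up, assuming the CSVs share the same columns:
# Combine the list of per-file data frames into one data frame
alldata <- do.call(rbind, mydata)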
Just an update on how I finished this off and what worked for me in the end...
mydata <- lapply(urls, getURL, userpwd = userpwd, connecttimeout = 60)
Following on from the above...
i <- 1
while (i <= length(mydata)) {
  mydata1 <- paste0(mydata[[i]])
  bin <- read.csv(text = mydata1, header = FALSE, skip = 1)
  # Column renaming and formatting here
  # Uploading to database using RODBC here
  i <- i + 1
}
Thanks for the pointers, @Parfait - really appreciated.
Like most problems it looks straightforward after you've done it!
Some background for my question: This is an R script that a previous research assistant wrote, but he did not provide any guidance to me on using it for myself. After working through an R textbook, I attempted to use the code on my data files.
What this code is supposed to do is load multiple .csv files, delete certain items/columns from them, and then write the new cleaned .csv files to a specified directory.
When I run my code, I don't get any errors, but the code isn't doing anything. I originally thought this was a problem with file permissions, but I'm still having the problem after changing them. Not sure what to try next.
Here's the code:
library(data.table)
library(magrittr)
library(stringr)
# create a function to delete unnecessary variables from a CAFAS or PECFAS
# data set and save the reduced copy
del.items <- function(file)
{
  data <- read.csv(input = paste0("../data/pecfas|cafas/raw",
    str_match(pattern = "cafas|pecfas", string = file) %>% tolower, "/raw/", file),
    sep = ",", header = TRUE, na.strings = "", stringsAsFactors = FALSE,
    skip = 0, colClasses = "character", data.table = FALSE)
  data <- data[-grep(pattern = "^(CA|PEC)FAS_E[0-9]+(T(Initial|[0-9]+|Exit)|SP[a-z])_(G|S|Item)[0-9]+$",
    x = names(data))]
  write.csv(data, file = paste0("../data/pecfas|cafas/items-del",
    str_match(pattern = "cafas|pecfas", string = file) %>% tolower, "/items-del/",
    sub(pattern = "ExportData_", x = file, replacement = "")) %>% tolower,
    sep = ",", row.names = FALSE, col.names = TRUE)
}
# delete items from all cafas data sets
cafas.files <- list.files("../data/cafas/raw/", pattern = ".csv")
for (file in cafas.files){
del.items(file)
}
# delete items from all pecfas data sets
pecfas.files <- list.files("../data/pecfas/raw/", pattern = ".csv")
for (file in pecfas.files){
del.items(file)
}
Some background for my question: This is an R script that a previous research assistant wrote, but he did not provide any guidance to me on using it for myself. After working through an R textbook, I attempted to use the code on my data files.
What this code is supposed to do is load multiple .csv files, delete certain items/columns from them, and then write the new cleaned .csv files to a specified directory.
Currently, the files are being created in the right directory with the right file name, but the .csv files that are being created are empty.
I am currently getting the following warning message:
Warning in fread(input = paste0("data/", str_match(pattern = "CAFAS|PECFAS", : Starting data input on line 2 and discarding line 1 because it has too few or too many items to be column names or data: (variable names).
This is my code:
library(data.table)
library(magrittr)
library(stringr)
# create a function to delete unnecessary variables from a CAFAS or PECFAS
# data set and save the reduced copy
del.items <- function(file){
  data <- fread(input = paste0("data/", str_match(pattern = "CAFAS|PECFAS",
    string = file) %>% tolower, "/raw/", file), sep = ",", header = TRUE,
    na.strings = "", stringsAsFactors = FALSE, skip = 0,
    colClasses = "character", data.table = FALSE)
  data <- data[-grep(pattern = "^(CA|PEC)FAS_E[0-9]+(TR?(Initial|[0-9]+|Exit)|SP[a-z])_(G|S|Item)[0-9]+$",
    x = names(data))]
  write.csv(data, file = paste0("data/", str_match(pattern = "CAFAS|PECFAS",
    string = file) %>% tolower, "/items-del/",
    sub(pattern = "ExportData_", x = file, replacement = "")) %>% tolower,
    row.names = FALSE)
}
# delete items from all cafas data sets
cafas.files <- list.files("data/cafas/raw", pattern = ".csv")
for (file in cafas.files){
del.items(file)
}
# delete items from all pecfas data sets
pecfas.files <- list.files("data/pecfas/raw", pattern = ".csv")
for (file in pecfas.files){
del.items(file)
}
I have 100 text files in a folder. I can use the code below to read all the files and store them in myfile.
file_list <- list.files("C:/Users/User/Desktop/code/Test/", full=T)
file_con <- lapply(file_list, function(x){
return(read.table(x, head=F, quote = "\"", skip = 6, sep = ","))
})
myfile <- do.call(rbind, file_con)
My question is how I can read the first file in the Test folder before I read the second file. All the text file names are different and I cannot change them to, for example, numbers from 1 to 100. I was thinking that maybe I could add an integer in front of all my text files and then use a for loop to match and read each file, but is this possible?
I need to read the first file, do some calculation, and export the result into result.txt before reading the second file. Right now I'm doing it manually, and I have almost 800 files, so it would be a big trouble for me to sit and wait for it to compute. The code below is the one I currently use.
myfile = read.table("C:/Users/User/Desktop/code/Test/20081209014205.txt", header = FALSE, quote = "\"", skip = 0, sep = ",")
The following setup will read one file at a time, perform an analysis,
and save it back with a slightly modified name.
save_file_list <- structure(
.Data = gsub(
pattern = "\\.txt$",
replacement = "-e.txt",
x = file_list),
.Names = file_list)
your_function <- function(.file_content) {
## The analysis you want to do on the content of each file.
}
for (.file in file_list) {
.file_content <- read.table(
file = .file,
head = FALSE,
quote = "\"",
skip = 6,
sep = ",")
.result <- your_function(.file_content)
write.table(
x = .result,
file = save_file_list[.file])
}
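For illustration only (this placeholder is not part of the original answer), your_function can be anything that takes the data frame read from a file and returns the object you want written back, for example:
# Hypothetical example: return the means of the numeric columns
your_function <- function(.file_content) {
  colMeans(.file_content[sapply(.file_content, is.numeric)])
}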
Now I can read a file and do the calculation using
for (e in 1:100) {
  myfile = read.table(file_list[e], header = FALSE, quote = "\"", skip = 0, sep = ",")
  while (condition) {
    # Calculation ('condition' and the result x below are placeholders)
  }
  myresult <- file.path("C:/Users/User/Desktop/code/Result/", paste0("-", e, ".txt"))
  write.table(x, file = myresult, row.names = FALSE, col.names = FALSE, sep = ",")
}
Now my problem is: how can I make the output file have the same name as the original file, but with a -e value added at the end?
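One way to get that, reusing the gsub/sub idea from the answer above (assuming file_list holds the full paths and e is the loop index as in my code):
# Build the output name from the input file's own name, with "-e" inserted before ".txt"
out_name <- sub("\\.txt$", paste0("-", e, ".txt"), basename(file_list[e]))
myresult <- file.path("C:/Users/User/Desktop/code/Result", out_name)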
So I've got to write .csv files for each downloaded file, with the currencies from a bunch of countries, from the web. And I wanted them to be saved using their tickers.
So I did:
library(XML)  # needed for readHTMLTable
codigos = list("JPY", "RUB", "SGD", "BRL", "INR", "THB", "GBP", "EUR", "CHF")
for (i in 1:9){
  url1 = 'http://www.exchangerates.org.uk/'
  url2 = '-USD-exchange-rate-history-full.html'
  codigos = list("JPY", "RUB", "SGD", "BRL", "INR", "THB", "GBP", "EUR", "CHF")
  codigo = codigos[i]
  url <- paste(url1, codigo, url2, sep = "")
  download.file(url, destfile = 'codigo.html')
  dados <- readHTMLTable('codigo.html')
  write.csv(dados, file = "codigo.csv")
}
Although it can read each of the URLs altered by the loop, it doesn't download them or save the CSVs individually. During the process I can see each of them being "saved" to a file named codigo.html, and at the very end I get a single codigo.html and a single codigo.csv containing only the last country in the list.
The problem is that you're saving everything to the same filename. Each pass through the loop will overwrite the prior contents entirely.
Note also that readHTMLTable will take a URL. So perhaps something like this is in order:
for (i in 1:9){
  url1 = 'http://www.exchangerates.org.uk/'
  url2 = '-USD-exchange-rate-history-full.html'
  cod = codigos[i]
  url <- paste(url1, cod, url2, sep = "")
  dados <- readHTMLTable(url)
  # Create a unique name for each file
  filename <- paste(cod, 'csv', sep = '.')
  write.csv(dados, file = filename)
}
Instead of creating csv files on disk, you might be better off using a list to hold the data, so you can manipulate the list:
url1 <- 'http://www.exchangerates.org.uk/'
url2 <- '-USD-exchange-rate-history-full.html'
l <- lapply(codigos
, function(i) readHTMLTable(paste0(url1, i, url2))
)
names(l) <- codigos
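If you do still want one CSV per currency on disk, the named list can then be written out in a single pass; a minimal sketch, assuming l as built above:
# Write each currency's table(s) to its own file, named after the ticker
for (cod in names(l)) {
  write.csv(l[[cod]], file = paste0(cod, ".csv"))
}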
a) In your loop, the "url <- ..." line should go before "download.file(url, ...)":
url <- paste(url1, cod, url2, sep = "")
download.file(url, destfile='cod.html')
b) In your line "write.csv(url, file=nome)", nome must be in quotes:
write.csv(url, file= "nome")