Loop over web pages in R

I want to apply a loop to scrape data from multiple web pages in R. I'm running the following code:
city <- c("Spokane+Valley", "Spokane+-+West" , "Stanwood", "Steilacoom", "Stevenson", "Sudden+Valley", "Sultan", "Sumas", "Summit", "Summitview", "Sumner", "Sunnyside", "Sunnyslope", "Suquamish", "Tacoma+-+Central", "Tacoma+-+East", "Tacoma+-+NE", "Tacoma+-+NW", "Tacoma+-+SE", "Tacoma+-+South", "Tacoma+-+SW", "Tacoma+-+West", "Tanglewilde" , "Tenino", "Terrace+Heights", "Thrashers+Corner", "Tokeland", "Toledo" , "Toppenish", "Town+and+Country", "Tracyton" , "Trentwood", "Tukwila", "Tulalip+Bay" , "Tulalip+Indian+Reservation", "Tumwater", "Twisp", "Union+Gap" , "University+Place", "Vancouver", "Vancouver+Mall", "Veradale", "Walla+Walla", "Walla+Walla+East", "Waller", "Walnut+Grove", "Wapato", "Warden", "Washougal", "Wenatchee", "West+Clarkston-Highland", "West+Lake+Sammamish", "West+Longview", "West+Pasco", "West+Richland", "West+Side+Highway", "West+Valley", "Westport", "White+Center-Shorewood", "White+Salmon", "White+Swan", "Winlock", "Winslow", "Winthrop", "Woodinville", "Woodland", "Woodmont+Beach", "Yakima", "Yelm", "Zillah")
for(i in city){
url <- ("http://www.washingtongasprices.com/GasPriceSearch.aspx?typ=adv&fuel=D&srch=0&area=",i,"&site=Washington&station=All%20Stations&tme_limit=36")
}
But I'm getting this message:
Error: unexpected ',' in:
"for(i in city){
url <- ("http://www.washingtongasprices.com/GasPriceSearch.aspx?typ=adv&fuel=D&srch=0&area=","
How can I solve it?

You need to paste your URL together with paste0():
url <- paste0("http://www.washingtongasprices.com/GasPriceSearch.aspx?typ=adv&fuel=D&srch=0&area=",i,"&site=Washington&station=All%20Stations&tme_limit=36")
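For example, a minimal sketch that collects one URL per city into a character vector (using the city vector defined above):
urls <- character(length(city))
for (i in seq_along(city)) {
  # build the full search URL for each city and store it
  urls[i] <- paste0("http://www.washingtongasprices.com/GasPriceSearch.aspx?typ=adv&fuel=D&srch=0&area=",
                    city[i],
                    "&site=Washington&station=All%20Stations&tme_limit=36")
}
head(urls)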

Related

updating one data.table from another data.table in R

I'm trying to update one table (bigData, field smiles) using data from another table, but it produces an error:
bigData[
  (bigData$smiles == '' | is.null(bigData$smiles) | is.na(bigData$smiles))
    & bigData$compound_id %in% tmpCompounds$compound_id
  ,
  `:=` (
    smiles = dtChembl[dtChembl$chembl_id == compound_id, ]$canonical_smiles
    , comment = paste(comment, 'smiles added from chemblDB by chemblID;')
    , filteringStep = 12
  )
]
The error I get is:
Error in `[.data.table`(dtChembl, dtChembl$chembl_id == compound_id, ) :
i evaluates to a logical vector length 5832210 but there are 1088555 rows. Recycling of logical i is no longer allowed as it hides more bugs than is worth the rare convenience. Explicitly use rep(...,length=.N) if you really need to recycle.
In addition: Warning message:
In dtChembl$chembl_id == compound_id :
longer object length is not a multiple of shorter object length
I have solved the problem. In case anyone needs it, this is the solution:
bigData[
  dtChembl
  , on = .(compound_id = chembl_id)
  ,
  `:=` (
    smiles = canonical_smiles
    , comment = paste(comment, 'smiles added from chemblDB by chemblID;')
    , filteringStep = 12
  )
]
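For anyone unfamiliar with update joins, here is a minimal self-contained sketch with made-up toy tables (the values are hypothetical; only the X[Y, on = ..., := ...] pattern matters):
library(data.table)
# hypothetical toy tables, just to illustrate the update-join pattern
bigData  <- data.table(compound_id = c("A", "B", "C"), smiles = c("", NA, "CCO"))
dtChembl <- data.table(chembl_id = c("A", "B"), canonical_smiles = c("C1CC1", "CCN"))
# update bigData in place with the matching canonical_smiles from dtChembl
bigData[dtChembl, on = .(compound_id = chembl_id), smiles := i.canonical_smiles]
bigData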

Converting docx.files to pdf.files with docx2pdf

Not sure what I am doing wrong.
I want to convert multiple .docx files to .pdf files, each file into a separate one.
I decided to use the "doconv" package with the following command:
docx_files <- list.files(pattern=paste0("Protokollnr_"))[39:73]
docx_files %>% length
lapply(1:35, function(x) {
docx2pdf(input = docx_files[[x]],
output = tempfile(fileext = ".pdf"))})
The error message does not say anything specific, only that the file cannot be converted.
Should I have specified the file path? Right now I only give the file name in my working directory.
The object "docx_files" contains:
c("Protokollnr_1.docx", "Protokollnr_10.docx", "Protokollnr_11.docx",
"Protokollnr_12.docx", "Protokollnr_13.docx", "Protokollnr_14.docx",
"Protokollnr_15.docx", "Protokollnr_16.docx", "Protokollnr_17.docx",
"Protokollnr_18.docx", "Protokollnr_19.docx", "Protokollnr_2.docx",
"Protokollnr_20.docx", "Protokollnr_21.docx", "Protokollnr_22.docx",
"Protokollnr_23.docx", "Protokollnr_24.docx", "Protokollnr_25.docx",
"Protokollnr_26.docx", "Protokollnr_27.docx", "Protokollnr_28.docx",
"Protokollnr_29.docx", "Protokollnr_3.docx", "Protokollnr_30.docx",
"Protokollnr_31.docx", "Protokollnr_32.docx", "Protokollnr_33.docx",
"Protokollnr_34.docx", "Protokollnr_35.docx", "Protokollnr_4.docx",
"Protokollnr_5.docx", "Protokollnr_6.docx", "Protokollnr_7.docx",
"Protokollnr_8.docx", "Protokollnr_9.docx")
The error message is:
Error in docx2pdf(input = docx_files[[x]], output = tempfile(fileext = ".pdf")) :
could not convert C:/Users/Nadine/OneDrive/Documents/Arbeit_Büro_papa/Protokolle_Sallapulka/fertige_Protokolle/Protokollnr_1.docx
Many thanks,
Nadine
I'd recommend specifying the file path since the function requires the following format:
docx2pdf(input, output = gsub("\\.docx$", ".pdf", input))
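A minimal sketch of the batch conversion using full paths (assuming the .docx files sit in the current working directory and that writing each PDF next to its original is acceptable; the file pattern is adapted to match only .docx files):
library(doconv)
# list the .docx files with their full paths so docx2pdf() can locate them
docx_files <- list.files(pattern = "Protokollnr_.*\\.docx$", full.names = TRUE)
# convert each file, writing the PDF next to the original .docx
lapply(docx_files, function(f) {
  docx2pdf(input = f, output = gsub("\\.docx$", ".pdf", f))
})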

R: hide cells in DT::datatable based on condition

I am trying to create a datatable with child rows: the user will be able to click on a name and see a list of links related to that name. However, the number of items to show is different for each name.
> data1 <- data.frame(name = c("John", "Maria", "Afonso"),
a = c("abc", "def", "rty"),
b=c("ghj","lop",NA),
c=c("zxc","cvb",NA),
d=c(NA, "mko", NA))
> data1
name a b c d
1 John abc ghj zxc <NA>
2 Maria def lop cvb mko
3 Afonso rty <NA> <NA> <NA>
I am using varsExplore::datatable2 to hide specific columns:
varsExplore::datatable2(x=data1, vars=c("a","b","c","d"))
and it produces the result below.
Is it possible to modify DT::datatable in order to only render cells that are not "null"? So, for example, if someone clicked on "Afonso", the table would only render "rty", thus hiding "null" values for the other columns (for this row), while still showing those columns if the user clicked "Maria" (that doesn't have any "null").
(Should I try a different approach in order to achieve this behavior?)
A look into the inner workings of varsExplore::datatable2
Following your request I took a look into the varsExplore::datatable2 source code. I found out that varsExplore::datatable2 calls varsExplore:::.callback2 (the triple colon means it's not an exported function) to create the JavaScript code. This function also calls varsExplore:::.child_row_table2, which returns a JavaScript function format(row_data) that formats the row data into the table you see.
A proposed solution
I simply used my js knowledge to change the output of varsExplore:::.child_row_table2 and I came up with the following :
.child_row_table2 <- function(x, pos = NULL) {
  names_x <- paste0(names(x), ":")
  text <- "
  var format = function(d) {
    text = '<div><table >' +
  "
  for (i in seq_along(pos)) {
    text <- paste(text, glue::glue(
      " ( d[{pos[i]}]!==null ? ( '<tr>' +
          '<td>' + '{names_x[pos[i]]}' + '</td>' +
          '<td>' + d[{pos[i]}] + '</td>' +
          '</tr>' ) : '' ) + "))
  }
  paste0(text,
    "'</table></div>'
     return text;};"
  )
}
The only change I made was adding the d[{pos[i]}]!==null ? ... : '' check, which only shows column pos[i] when its value d[pos[i]] is not null.
Since loading the package and adding the function to the global environment won't do the trick, I forked the package on GitHub and committed the changes. You can now install it by running the following (the GitHub repo is a read-only CRAN mirror, so I can't submit a pull request):
devtools::install_github("moutikabdessabour/varsExplore")
EDIT
If you don't want to redownload the package, I found a solution: basically you'll need to override the datatable2 function.
First copy the source code into your R file located at path/to/your/Rfile:
# the data.table way
data.table::fwrite(list(capture.output(varsExplore::datatable2)), quote=F, sep='\n', file="path/to/your/Rfile", append=T)
# the baseR way
fileConn<-file("path/to/your/Rfile", open='a')
writeLines(capture.output(varsExplore::datatable2), fileConn)
close(fileConn)
Then you'll have to substitute the last line
DT::datatable(
x,
...,
escape = -2,
options = opts,
callback = DT::JS(.callback2(x = x, pos = c(0, pos)))
)
with:
DT::datatable(
x,
...,
escape = -2,
options = opts,
callback = DT::JS(gsub("('<tr>.+?(d\\[\\d+\\]).+?</tr>')" , "(\\2==null ? '' : \\1)", varsExplore:::.callback2(x = x, pos = c(0, pos))))
)
What this code does is add the JS condition using a regular expression.
Result

RSelenium - Storing data in an array

I'm extracting event descriptions from a list of events on my website.
Each event is an href link which goes to another page where we can find the image and the description of the event. I'm trying to store the image URL and the description of all events in an array, so I used the code below at the end of my loop, but I only get the image and the description of the last event looped:
m<-c(images_of_events)
n<-c( description_of_events)
cc<-remDr$findElement(using = "css", "[class = '_24er']")
cc<-remDr$getPageSource()
page_events<-read_html(cc[[1]][1])
links_events_data <- html_nodes(page_events, '._24er > table > tbody > tr > td > div > div._4dmk > a')
events_urls<-html_attr(links_events_data,"href")
# the loop over each event
for (i in events_urls) {
remDr$navigate(paste("localhost://www.mywebsite",i,sep=""))
#get image
imagewebElem <- remDr$findElement(using = "class", "scaledImageFitWidth")
images_of_events <- imagewebElem$getElementAttribute("src")
descriptionwebElem <- remDr$findElement(using = "css", "[class = '_63ew']")
descriptionwebElem <- remDr$getPageSource()
page_event_description <- read_html(descriptionwebElem[[1]][1])
events_desc <- html_nodes(page_event_description, '._63ew > span')
description_of_events <- html_text(events_desc)
m<-c(images_of_events)
n<-c( description_of_events)
}
To save values in an array in R you have to either:
1) create the array/data.frame with dta <- data.frame(m = c(), n = c()) and then fill it with dta[i, 1] <- images_of_events and dta[i, 2] <- description_of_events, where i is a numeric iterator, or
2) create the array/data.frame and use rbind to add values, like dta <- rbind(dta, data.frame(m = images_of_events, n = description_of_events)).
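Applied to the loop above, option 2 might look like this (a sketch that keeps the scraping calls from the question and collapses multi-part descriptions into a single string):
# accumulate results row by row instead of overwriting m and n each iteration
events <- data.frame(image = character(), description = character(), stringsAsFactors = FALSE)
for (i in events_urls) {
  remDr$navigate(paste0("localhost://www.mywebsite", i))
  imagewebElem <- remDr$findElement(using = "class", "scaledImageFitWidth")
  images_of_events <- imagewebElem$getElementAttribute("src")
  page_event_description <- read_html(remDr$getPageSource()[[1]])
  events_desc <- html_nodes(page_event_description, '._63ew > span')
  description_of_events <- html_text(events_desc)
  # append one row per event
  events <- rbind(events, data.frame(image = unlist(images_of_events),
                                     description = paste(description_of_events, collapse = " "),
                                     stringsAsFactors = FALSE))
}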

'con' is not a connection error in R

I am trying to use readLines in R but I am getting the error below:
orders1<-readLines(orders,2)
# Error in readLines(orders, 2) : 'con' is not a connection
Code:
orders<-read.csv("~/orders.csv")
orders
orders1<-readLines(orders,2)
orders1
Data:
id,item,quantity_ordered,item_cost
1,playboy roll,1,12
1,rockstar roll,1,10
1,spider roll,1,8
1,keystone roll,1,25
1,salmon sashimi,6,3
1,tuna sashimi,6,2.5
1,edamame,1,6
2,salmon skin roll,1,8
2,playboy roll,1,12
2,unobtanium roll,1,35
2,tuna sashimi,4,2.5
2,yellowtail hand roll,1,7
4,california roll,1,4
4,cucumber roll,1,3.5
5,unagi roll,1,6.5
5,firecracker roll,1,9
5,unobtanium roll,1,35
,chicken teriaki hibachi,1,7.95
,diet coke,1,1.95
I'm guessing you want this:
orders1 <- readLines( file("~/orders.csv") )
It's not clear why you want to do your own parsing or substitution, but that should give readLines a valid connection object.
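For context, read.csv() returns a data frame, not a connection, which is why passing orders to readLines() fails. readLines() also accepts a file path directly:
# readLines() needs a path or connection, not the data.frame that read.csv() returns
orders1 <- readLines("~/orders.csv", n = 2)  # first two raw lines as character strings
orders  <- read.csv("~/orders.csv")          # parsed data.frame, if that is what you need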
