I have a shiny app that takes in user input:
input$libraries is a reactive character vector produced by user input from:
output$libraries <- renderUI({
  checkboxGroupInput(inputId = "libraries",
                     label = strong("Select the libraries for which you would like to see part counts"),
                     choiceValues = LibraryIDs$libraryid,
                     choiceNames = LibraryNames$name,
                     selected = LibraryIDs$libraryid[1],
                     inline = TRUE)
})
I would like to select from my PostgreSQL database. I have a function set up as follows:
library(DBI)
library(RPostgreSQL)
library(dplyr)    # for tbl()/filter()/collect() used inside the query strings
library(dbplyr)

get_query <- function(querystring){
  # create a connection: load the PostgreSQL driver and connect
  drv <- dbDriver("PostgreSQL")
  con <- dbConnect(drv, dbname = "RosettaRelational",
                   host = "localhost", port = 5432,
                   user = "postgres", password = rstudioapi::askForPassword("Database password"))
  on.exit(dbDisconnect(con))
  # check for the existence of tables, must be created in pgAdmin4
  # dbExistsTable(con, "libraries")
  # parse and evaluate the query string; it can reference con because it is
  # evaluated inside this function's environment
  query <- eval(parse(text = querystring))
  return(query)
}
It takes in a string and parses it to evaluate the query.
Now when I try to query the database like this:
Names <- get_query(paste0("con %>% tbl('libraries') %>%
filter(libraryid %in% input$libraries) %>% select(name) %>% collect()"))
I get the error: object 'input' not found. I know it's not parsing the reactive character vector correctly. How should I change this to get it to work?
I tried:
Names <- get_query(paste0("con %>% tbl('libraries') %>%
filter(libraryid %in% '",input$libraries,"') %>% select(name) %>% collect()"))
but that only selects the first library in the vector even when the user selects multiple libraries. This works when input$libraries contains only a single value, for example when the input is an action button instead of checkboxes.
Basically, what I need is for input$libraries to look like c('111a','111b','211','311a') when it is passed into the string if the user selects 111a, 111b, 211 and 311a, instead of just '111a', which is what it is currently passing.
It seems from some testing on my side that your code
Names <- get_query(paste0("con %>% tbl('libraries') %>% filter(libraryid %in% '",input$libraries,"') %>% select(name) %>% collect()"))
will "vectorise" in its current form for multiple libraries in input$libraries. This will create a separate string for each library in input$libraries instead of one string containing all libraries. e.g.
> Names
[1] "con %>% tbl('libraries') %>% filter(libraryid %in% '111a') %>% select(name) %>% collect()"
[2] "con %>% tbl('libraries') %>% filter(libraryid %in% '111b') %>% select(name) %>% collect()"
Using my own data and your suggestion "for input$libraries to look like c('111a','111b','211','311a')" I adapted your code to
Names <- get_query(paste0("con %>% tbl('libraries') %>% filter(libraryid %in% c(",
                          paste0("'", input$libraries, "'", collapse = ", "),
                          ")) %>% select(name) %>% collect()"))
This should give you your required c('111a', '111b', '211', '311a').
It's not the most elegant, but it should work. You could also do the inner paste0() beforehand if it looks messy, as below:
libraries_comma_separated <- paste0("'", input$libraries, "'", collapse = ", ")
This will give you '111a', '111b', '211', '311a', and then you can do:
Names <- get_query(paste0("con %>% tbl('libraries') %>% filter(libraryid %in% c(", libraries_comma_separated, ")) %>% select(name) %>% collect()"))
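Since user-selected values end up pasted into a query string, one option worth sketching (not part of the original answer) is to let DBI build the quoted list instead of hand-rolling the quotes; DBI::ANSI() is used here only because con is not in scope where the string is built:

# Sketch: quote the selected ids with DBI instead of hand-built paste0() quoting.
library(DBI)
quoted_ids <- dbQuoteString(ANSI(), input$libraries)   # '111a', '111b', ...
in_clause  <- paste(quoted_ids, collapse = ", ")
Names <- get_query(paste0(
  "con %>% tbl('libraries') %>% filter(libraryid %in% c(", in_clause,
  ")) %>% select(name) %>% collect()"
))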
Related
I am trying to collect more tweets than is allowed in a single query, hence I am using a for loop to automate this.
library(httr)
library(jsonlite)
library(tidyverse)

tweets <- tibble()
for (i in 1:10) {
  # request one page of results
  response <- httr::GET(url = url_tweet,
                        httr::add_headers(.headers = headers),
                        query = params)
  obj <- httr::content(response, as = "text")
  json_data <- fromJSON(obj, flatten = TRUE)

  data_sep <- enframe(unlist(json_data)) %>%
    mutate(
      id2  = name %>% str_extract("[0-9]+$"),  # ensure unique rows
      name = name %>% str_remove("[0-9]+$") %>% str_remove("^data.")
    ) %>%
    pivot_wider(names_from = name, values_from = value) %>%
    select(tweet_id = id, text,
           user_id = includes.users.id, user_name = includes.users.username,
           likes = public_metrics.like_count, retweets = public_metrics.retweet_count,
           quotes = public_metrics.quote_count) %>%
    type_convert()

  tweets <- rbind(tweets, data_sep)
}
I have run each piece of the code individually and there is nothing wrong with any of it, but when I try to loop it I get this error:
Error in `select()`:
! Can't subset columns that don't exist.
x Column `id` doesn't exist.
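The error suggests that some responses in the loop contain no data.id field, so the pivoted tibble has no id column when select() runs. A minimal guard, sketched against the column names assumed above, would split the pipeline and skip such pages:

# Sketch: build the wide tibble first, then skip the page if the expected columns are missing.
wide <- enframe(unlist(json_data)) %>%
  mutate(
    id2  = name %>% str_extract("[0-9]+$"),
    name = name %>% str_remove("[0-9]+$") %>% str_remove("^data.")
  ) %>%
  pivot_wider(names_from = name, values_from = value)

if (!all(c("id", "text") %in% names(wide))) {
  message("Skipping iteration ", i, ": response contains no tweet data")
  next   # only valid inside the for loop above
}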
This question is a follow-up to an earlier problem posted on Stack Overflow.
I have these two example HTML files: url1.html; url2.html
url3.html is another example with more IPC codes.
In url2.html there is no (51) information, and in url1.html there is.
I'm using this code in R:
library(rvest)
library(tidyverse)
library(stringr)
x<-data.frame(
URL=c(1:2),
page=c(paste(readLines("url1.html"), collapse="\n"),
paste(readLines("url2.html"), collapse="\n"))
)
for (i in 1:nrow(x)) {
  html <- x$page[i] %>% unclass() %>% unlist()
  read_html(html, encoding = "ISO-8859-1") %>%
    rvest::html_elements(xpath = '//*[@id="principal"]/table[2]') %>%
    html_nodes(xpath = '//div[@id="classificacao0"]') %>%
    html_text(trim = TRUE) %>%
    str_replace_all("[\\n\\r\\t]+", "") %>%
    stringr::str_trim() -> tmp
  if (length(tmp) == 0) tmp <- "ND"
  x$ipc_0[i] <- tmp %>% str_replace_all("\\s+", " ") %>% str_replace_all(" \\)", "\\)")
}
for (i in 1:nrow(x)) {
  html <- x$page[i] %>% unclass() %>% unlist()
  read_html(html, encoding = "ISO-8859-1") %>%
    rvest::html_elements(xpath = '//*[@id="principal"]/table[2]') %>%
    html_nodes(xpath = '//div[@id="classificacao1"]') %>%
    html_text(trim = TRUE) %>%
    str_replace_all("[\\n\\r\\t]+", "") %>%
    stringr::str_trim() -> tmp
  if (length(tmp) == 0) tmp <- "ND"
  x$ipc_1[i] <- tmp %>% str_replace_all("\\s+", " ") %>% str_replace_all(" \\)", "\\)")
}
Result: partially correct
Desired result: create a new data frame with the following structure:
URL  IPC
1    B62B 1/16 (1968.09)...
1    B62B 1/00 (1968.09)...
2    ND
Problem: there are URLs that have the (51) code and others that do not. When a page has the (51) code, it can contain "n" ids of the form xpath='//div[@id="classificacao0"]'; the classificacao id can run from 0 to "n". How can I optimize this code to capture the necessary information without having to write a separate for (variable in vector) loop for each "n"?
Any idea how to solve this problem?
You can use a CSS attribute = value selector list with the ^ (starts with) operator to capture or exclude elements with specific id values.
Convert your current extraction code into a function which accepts (in this case) a URL as its argument. Extend the regex to remove the other characters not shown in your desired output.
Have that function return a tibble of the URL and the IPCs found, then wrap the whole thing in a map_dfr() call to generate a single data frame result.
library(rvest)
library(tidyverse)
urls <- sprintf("https://prequest.websiteseguro.com/example/url%i.html", 1:3)
get_ipc <- function(url) {
  ipc <- read_html(url, encoding = "ISO-8859-1") %>%
    html_elements("div[id^=classificacao]:not([id^=classificacaoc]) .normal > b") %>%
    html_text(trim = TRUE) %>%
    str_replace_all("[\\n\\r\\t]+|\\(|\\s{2,}|\\)", "") %>%
    stringr::str_trim()
  if (length(ipc) == 0) ipc <- "ND"
  return(tibble(url = url, ipc))
}
df <- purrr::map_dfr(urls, get_ipc)
print(df)
I'm building a scraper that pulls the name of a player and the years he played, for thousands of different players. I have built an otherwise successful function to do this, but unfortunately in some instances the table with the other half of the data I need (years played) does not exist. For these instances, I'd like to add a way to tell the scraper to skip those pages. Here is the code:
(note: the object "url_final" is the list of active webpage URLs of which there are many)
library(rvest)
library(curl)
library(tidyverse)
library(httr)
df <- map_dfr(.x = url_final,
.f = function(x){Sys.sleep(.3); cat(1);
fyr <- read_html(curl(x, handle = curl::new_handle("useragent" = "Mozilla/5.0"))) %>%
html_table() %>%
.[[1]]
fyr <- fyr %>%
select(1) %>%
mutate(name = str_extract(string = x, pattern = "(?<=cbb/players/).*?(?=-\\d\\.html)"))
})
Here is an example of an active page; you can recreate the scrape by replacing "url_final" in the .x argument of map_dfr with:
https://www.sports-reference.com/cbb/players/karl-aaker-1.html
Here is an example of one of the instances in which there is no table, which returns an error and breaks the scraping loop.
https://www.sports-reference.com/cbb/players/karl-aaker-1.html
How about adding tryCatch(), which will ignore any errors?
library(tidyverse)
library(rvest)
df <- map_dfr(.x = url_final,
.f = function(x){Sys.sleep(.3); cat(1);
tryCatch({
fyr <- read_html(curl::curl(x,
handle = curl::new_handle("useragent" = "Mozilla/5.0"))) %>%
html_table() %>% .[[1]]
fyr <- fyr %>%
select(1) %>%
mutate(name = str_extract(string = x,
pattern = "(?<=cbb/players/).*?(?=-\\d\\.html)"))
}, error = function(e) message('Skipping url ', x))
})
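An equivalent, slightly more compact pattern (a sketch, not part of the original answer) is purrr::possibly(), which wraps the scraping function and substitutes a default value whenever it errors; map_dfr() then simply drops the NULL results:

# Sketch with purrr::possibly(): failed pages return NULL and are dropped by map_dfr().
library(tidyverse)
library(rvest)

scrape_one <- function(x) {
  Sys.sleep(.3)
  read_html(curl::curl(x, handle = curl::new_handle("useragent" = "Mozilla/5.0"))) %>%
    html_table() %>%
    .[[1]] %>%
    select(1) %>%
    mutate(name = str_extract(x, "(?<=cbb/players/).*?(?=-\\d\\.html)"))
}

df <- map_dfr(url_final, possibly(scrape_one, otherwise = NULL))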
I am trying to load the Kapferer mine dataset into R using the igraph function "read_graph".
The code is very simple, however it throws an error.
test_g <-read_graph("http://vlado.fmf.uni-lj.si/pub/networks/data/ucinet/kapmine.dat", format = "dl")
Error in read.graph.dl(file, ...) : At foreign.c:3050 : syntax
error, unexpected $end, expecting DL in line 1, Parse error
As can be seen by following the link, the file does begin with DL. The only clue I can find about this is a message from 2015 which basically says to file a bug report.
Can DL files not be loaded by igraph at the moment, or is there some trick to it?
As there doesn't seem to be a clean way to load DL files, I have written a loader that seems to work well for the DL graphs on the Pajek website. The function is a bit scrappy and has not been extensively tested, but it may be useful to those who want to use certain graphs that are not available in a more common format. If there is more up-to-date information on these datasets, then this code can be ignored.
library(tidyverse)
library(igraph)

load_dl_graph <- function(file_path, directed){
  raw_mat <- readLines(file_path) %>%
    enframe()

  # locate the sections of the DL file
  row_labels_row    <- grep("ROW LABELS:",    raw_mat$value)
  column_labels_row <- grep("COLUMN LABELS:", raw_mat$value)
  level_labels_row  <- grep("LEVEL LABELS:",  raw_mat$value)
  data_table_row    <- grep("DATA:",          raw_mat$value)

  row_labels <- raw_mat %>%
    slice((row_labels_row + 1):(column_labels_row - 1)) %>%
    select(from = value)

  column_labels <- raw_mat %>%
    slice((column_labels_row + 1):(level_labels_row - 1)) %>% pull(value)

  table_levels <- raw_mat %>%
    slice((level_labels_row + 1):(data_table_row - 1)) %>% pull(value)

  # read the stacked adjacency matrices and tag each row with its level
  data_df <- raw_mat %>%
    slice((data_table_row + 1):nrow(.)) %>%
    select(value) %>%
    mutate(value = str_squish(value)) %>%
    separate(col = value, into = column_labels, sep = " ") %>%
    mutate(table_id = rep(1:length(table_levels), each = nrow(.) / length(table_levels)))

  # convert each level's matrix into an edge list, then into an igraph object
  tables_list <- 1:length(table_levels) %>%
    map(~{
      data_df %>%
        filter(table_id == .x) %>%
        select(-table_id) %>%
        bind_cols(row_labels, .) %>%
        pivot_longer(cols = 2:ncol(.), names_to = "to", values_to = "values") %>%
        filter(values == 1) %>%
        select(-values) %>%
        graph_from_data_frame(., directed = directed)
    })

  names(tables_list) <- table_levels
  return(tables_list)
}
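For what it's worth, a usage sketch on the dataset from the question (assuming kapmine.dat keeps the ROW LABELS:, COLUMN LABELS:, LEVEL LABELS: and DATA: sections this parser looks for):

# Sketch: each level in the DL file becomes one igraph object in the returned list.
kap_graphs <- load_dl_graph("http://vlado.fmf.uni-lj.si/pub/networks/data/ucinet/kapmine.dat",
                            directed = FALSE)
names(kap_graphs)   # the level labels read from the file
kap_graphs[[1]]     # igraph object for the first relation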
When trying to reproduce the example found in http://tidytextmining.com/twitter.html there's a problem.
Basically I want to adapt this part of the code
library(tidytext)
library(stringr)
reg <- "([^A-Za-z_\\d#@']|'(?![A-Za-z_\\d#@]))"
tidy_tweets <- tweets %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|http://[A-Za-z\\d]+|&amp;|&lt;|&gt;|RT", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))
in order to keep a data frame of tweets that still includes the stop words.
So I tried this:
tidy_tweets <- tweets %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|http://[A-Za-z\\d]+|&amp;|&lt;|&gt;|RT", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg)
tidy_tweets_sw <- filter(!word %in% stop_words$word, str_detect(tidy_tweets, "[a-z]"))
But that did not work, as I got the following error message:
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
I have tried to pass a vector version of both inputs to match, but to no avail.
Does anyone have a better idea?
Unsure but I think your problem is here:
tidy_tweets_sw <- filter(!word %in% stop_words$word, str_detect(tidy_tweets, "[a-z]"))
filter() has no clue what data you want to filter; this should work:
tidy_tweets_sw <- tidy_tweets %>% filter(!word %in% stop_words$word, str_detect(word, "[a-z]"))
You need to have the data in your filter statement as your first argument.
tidy_tweets <- tweets %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|http://[A-Za-z\\d]+|&amp;|&lt;|&gt;|RT", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg)
tidy_tweets_sw <- filter(tidy_tweets, !(word %in% stop_words$word), str_detect(word, "[a-z]"))
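For what it's worth, a common tidytext idiom for the stop-word step (a sketch, assuming the token column is still called word) is anti_join(), which leaves tidy_tweets untouched and produces the filtered copy in one step:

# Sketch: drop stop words via anti_join() instead of filter(%in%), then keep
# only tokens containing at least one letter.
library(dplyr)
library(stringr)
library(tidytext)   # provides the stop_words data frame

tidy_tweets_sw <- tidy_tweets %>%
  anti_join(stop_words, by = "word") %>%
  filter(str_detect(word, "[a-z]"))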