Error in open.connection(x, "rb") : HTTP error 405 - web-scraping

While trying to extract data from Glassdoor, I got the following error:
Error in open.connection(x, "rb") : HTTP error 405.
Here is the code:
rm(list=ls())
library("rvest")
htmlpage <- read_html("https://www.glassdoor.co.uk/Reviews/Google-Reviews-E9079.htm")
forecasthtml <- html_nodes(htmlpage, ".summary")
SelectorGadget was used to select just the headline of each review, which corresponds to the .summary selector in the code above.
Is it because extracting data from the site is not allowed, or is there a fundamental mistake in the code?
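There is no obvious mistake in the selector itself. One thing worth checking, assuming the page loads normally in a browser, is whether the server is rejecting the default client that read_html() uses; a minimal sketch that sends a browser-like User-Agent through httr and then parses the body (this only addresses the HTTP side, not whether Glassdoor's terms allow scraping):
library(httr)
library(rvest)
# Sketch: fetch the page with a browser-like User-Agent, then parse the body.
# The User-Agent string below is only an example; any common browser string should do.
url <- "https://www.glassdoor.co.uk/Reviews/Google-Reviews-E9079.htm"
resp <- GET(url, user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
if (status_code(resp) == 200) {
  htmlpage <- read_html(content(resp, as = "text", encoding = "UTF-8"))
  forecasthtml <- html_nodes(htmlpage, ".summary")
  html_text(forecasthtml)
} else {
  warning("Request failed with status ", status_code(resp))
}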

Related

Error while trying to read a single-cell genomic dataset

My code is
multiome <- load_object(file.path(dataset, "C:/Users/s/Desktop/Shahid/MOJITOO DATASETS/PBMC_Multiome.Rds"))
I get the following error:
Error in open.connection(3L, "rb") : cannot open the connection
In addition: Warning message:
In open.connection(3L, "rb") :
cannot open file 'kidney/C:/Users/s/Desktop/Shahid/MOJITOO DATASETS/PBMC_Multiome.Rds': Invalid argument
Called from: open.connection(3L, "rb")
I want to run the MOJITOO single-cell integration model, but I get the error shown above.
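The path in the warning ("kidney/C:/Users/...") suggests that file.path() is gluing the relative dataset directory onto an already absolute Windows path. A minimal sketch, assuming load_object() ultimately calls readRDS(), that passes the absolute path on its own:
# Sketch: pass the absolute path directly instead of combining it with the
# relative `dataset` directory via file.path(). readRDS() is assumed here as
# the underlying reader used by load_object().
rds_path <- "C:/Users/s/Desktop/Shahid/MOJITOO DATASETS/PBMC_Multiome.Rds"
stopifnot(file.exists(rds_path))  # fail early if the path itself is wrong
multiome <- readRDS(rds_path)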

Error in open.connection(x, "rb") : http error 403

My code returns the error message "Error in open.connection(x, "rb") : HTTP error 403" when scraping indeed.com.
for (i in 0:(count - 1)) {
  # progress$inc((i)/count, detail = paste0("https://www.indeed.com/jobs?q=", URLencode(job), "&start=", i*15))
  print(paste0("https://www.indeed.com/jobs?q=", job, "&start=", i * 15))
  page <- read_html(paste0("https://www.indeed.com/jobs?q=", URLencode(job), "&start=", i * 15))
  jobcards <- html_node(page, "#mosaic-provider-jobcards")
  job_links <- html_nodes(jobcards, 'a[id^="job"]')
}
This code worked well half a year ago. Is the failure due to an anti-crawler system, and is there anything I can do to fix it?
My program scrapes data from indeed.com for a given list of job titles and technologies, in order to analyse how frequently each technology is mentioned.
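A 403 from indeed.com is typically an anti-bot block rather than a coding error, so there is no guaranteed fix, but it is worth trying a browser-like User-Agent and slowing the loop down. A rough sketch of a single iteration, with `job` as an example value standing in for the variable used in the loop above:
library(httr)
library(rvest)
# Sketch: request one results page with a browser-like User-Agent and a delay.
# This may still be blocked; indeed.com actively filters automated traffic.
job <- "data scientist"  # example value, normally supplied by the caller
i <- 0
url <- paste0("https://www.indeed.com/jobs?q=", URLencode(job), "&start=", i * 15)
resp <- GET(url, user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
if (status_code(resp) == 200) {
  page <- read_html(content(resp, as = "text", encoding = "UTF-8"))
  jobcards <- html_node(page, "#mosaic-provider-jobcards")
  job_links <- html_nodes(jobcards, 'a[id^="job"]')
}
Sys.sleep(2)  # be polite between requests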

HTTP 403 error when using lookupUsers for a list of twitter handles

I have a list of Twitter handles in a CSV file and am trying to extract data for all of them. The CSV contains around 200 handles.
users <- read.csv("Twitter.csv")
users1 <- lookupUsers(users[1:nrow(users),1])
However, I am getting the following error:
Error in twInterfaceObj$doAPICall(paste("users", "lookup", sep = "/"), :
Forbidden (HTTP 403).
Does anybody know why I am getting this error and how I can fix it?
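A 403 from the users/lookup endpoint usually points to authentication rather than to the handles themselves: lookupUsers() needs a valid OAuth session, and the endpoint accepts at most 100 screen names per call. A sketch, with placeholder credentials, that authenticates first and batches the lookups:
library(twitteR)
# Sketch: authenticate before calling lookupUsers(); the four values below are
# placeholders for the credentials of your own Twitter app.
setup_twitter_oauth(consumer_key    = "YOUR_CONSUMER_KEY",
                    consumer_secret = "YOUR_CONSUMER_SECRET",
                    access_token    = "YOUR_ACCESS_TOKEN",
                    access_secret   = "YOUR_ACCESS_SECRET")
users <- read.csv("Twitter.csv", stringsAsFactors = FALSE)
handles <- as.character(users[[1]])
# users/lookup allows up to 100 screen names per request, so split the list.
batches <- split(handles, ceiling(seq_along(handles) / 100))
results <- lapply(batches, lookupUsers)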

Reading error status code in R and handling the exception

I am getting data from a server in JSON or XML format, and the server sends a status code in the header (e.g. 500, 200, 404). How can I read that status code and handle errors in R? A simple code sample or a reference would be enough.
If there is some other way around this, that would also work.
If you are just looking to collect the status codes, you only need to inspect what comes back.
library(httr)
GET("www.google.com")$status
# [1] 200
As a starting point for error handling, if you just want console warnings you can use warning():
info_get <- GET("www.google.com")
if (info_get$status == 200) {
warning(paste0("Response ", info_get$status, " received from target."))
}
# Warning message:
# Response 200 received from target.
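Beyond console warnings, it can help to turn non-2xx responses into proper R conditions and handle them with tryCatch(). A small sketch using httr's stop_for_status(), with httpbin.org as an example endpoint that returns a chosen status code:
library(httr)
# Sketch: stop_for_status() raises an error for any 4xx/5xx response;
# tryCatch() then decides what to do with it (here: log and return NULL).
fetch <- function(url) {
  tryCatch({
    resp <- GET(url)
    stop_for_status(resp)
    content(resp, as = "text", encoding = "UTF-8")
  }, error = function(e) {
    message("Request to ", url, " failed: ", conditionMessage(e))
    NULL
  })
}
fetch("https://httpbin.org/status/404")  # handled, returns NULL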

url_absolute error "not compatible with STRSXP" when using submit_form

I am trying to scrape the http://www.emedexpert.com/lists/brand-generic.shtml web page for brand and generic drug names.
library(httr)
library(rvest)
session <- read_html("http://www.emedexpert.com/lists/brand-generic.shtml")
form1 <- html_form(session)[[2]]
form2 <- set_values(form1, brand = "tylenol")
submit_form(session, form2)
However, this results in the error message:
Error in xml2::url_absolute(form$url, session$url) :
not compatible with STRSXP
Therefore, based on this answer to the same error message ("Error: not compatible with STRSXP" on submit_form with rvest), I added session$url as follows:
session$url <- "http://www.emedexpert.com/lists/brand-generic.shtml" # added from S.Ov
but I still get the same error message. So I also tried various permutations of setting form2$url, such as:
form2$url <- "http://www.emedexpert.com/lists/brand-generic.shtml"
form2$url <- ""
form2$url <- "/"
submit_form(session, form2)
At this point the error message goes away and I obtain a page containing most of the desired content. However, it seems to completely lack the table of brand and generic names.
Any suggestions?
Yes #hackR, RSelenium is not always the answer.
library(rvest)
url<-"http://www.emedexpert.com/lists/bg.php?myc"
page<-html_session(url)
table<-html_table(read_html(page))[[1]]
I hope this helps.
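For completeness, the original "not compatible with STRSXP" error came from passing a parsed document to submit_form() instead of a session. A sketch using the older rvest API from the question, with html_session() as the starting point:
library(rvest)
# Sketch: submit_form() expects a session, not a document from read_html(),
# which is the usual cause of the "not compatible with STRSXP" error.
session <- html_session("http://www.emedexpert.com/lists/brand-generic.shtml")
form1 <- html_form(session)[[2]]
form2 <- set_values(form1, brand = "tylenol")
result <- submit_form(session, form2)
# The returned session can then be parsed like any other page.
tables <- html_table(read_html(result), fill = TRUE)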
