Rcrawler package: ContentScraper Error

I have a problem with the ContentScraper function of the Rcrawler package. I would like to extract from this site some information about arrival and departure times and airports, as well as the price (I took inspiration from this site):
MY_Data=ContentScraper(CssPatterns = c(".leg",".price"), ManyPerPattern = T, Url = "http://www.skyscanner.it/trasporti/voli/rome/lond/180201?adults=1&children=0&adultsv2=1&childrenv2=&infants=0&cabinclass=economy&rtn=0&preferdirects=false&outboundaltsenabled=false&inboundaltsenabled=false&ref=day-view#results")
but I get this error:
Error in LinkExtractor(url = Ur, encod = encod) : object 'Extlinks' not found
I had a look at the LinkExtractor function, but I have no idea why it can't find Extlinks, since that object should be created by the function itself. Shouldn't it?
Could someone help me?
Thank you!

This website doesn't allow scraping, which may be one reason why your example doesn't work. You can check this on the web. I also recommend trying the rvest package, which is easier to use.
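If the site's terms allowed it and the results were plain HTML, a minimal rvest sketch would look something like the following (the CSS selectors are the ones from the question; since Skyscanner renders results with JavaScript, read_html alone will likely not see them):
library(rvest)
# fetch the page and pull the nodes matched by the question's selectors
page <- read_html("http://www.skyscanner.it/trasporti/voli/rome/lond/180201")
legs <- page %>% html_elements(".leg") %>% html_text2()
prices <- page %>% html_elements(".price") %>% html_text2()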

I have tried the same request using Rcrawler + the PhantomJS web driver, but with no result; there is some sort of JavaScript protection against non-human sessions:
library(Rcrawler)
br <- run_browser()
MY_Data <- ContentScraper(CssPatterns = c(".leg", ".price"), ManyPerPattern = T, Url = "https://www.skyscanner.it/trasporti/voli/rome/lond/?adults=1&children=0&adultsv2=1&childrenv2=&infants=0&cabinclass=economy&rtn=0&preferdirects=false&outboundaltsenabled=false&inboundaltsenabled=false&ref=day-view&oym=1903&selectedoday=01", browser = br, RenderingDelay = 5)
I retrieved the session screenshot, and I can confirm that the JavaScript which loads the results is stuck.
Using RSelenium + headless Chrome (with GPU enabled), I got a robot-check page (see images).
As a result, the only hope of getting this data legitimately is to use their API.
Rcrawler creator

Related

Calling Mastodon API in R Studio

Can someone please help me figure out what is going on? I used the mastodon library in RStudio to extract some data from the fediverse successfully a while ago. Here is the code I used:
tokens <- login("https://mastodon.social/",user = user,pass = password)
"user" is my email address.
It worked well initially, but trying it again, I am getting this annoying error message, which I do not understand:
Error in UseMethod("content", x) :
no applicable method for 'content' applied to an object of class "response"
Can any good samaritan out there who has used this library in RStudio help me figure out what is going on? I need to prepare a report on this project. Thanks in advance for your help.
It is possible that another package masked the function. Calling this in a fresh R session did work:
tokens <- login("https://mastodon.social/",user = user,pass = password)
tokens$instance
[1] "https://mastodon.social/"

Flatten facet.pivot from solr query in R

I'm trying to flatten a facet.pivot from a solr query.
I've come across this page: https://rdrr.io/github/ropensci/solr/man/pivot_flatten_tabular.html#heading-1
which says there is a function (pivot_flatten_tabular) that does it, but, after installing the package solrium, the function does not appear.
Any ideas why it is not working?
Maintainer of solrium here: I changed the package name from solr to solrium. The pivot_flatten_tabular function is still there; it's not exported, though. You can access it with the triple colon: solrium:::pivot_flatten_tabular
Here's an example facet pivot query with solrium:
cli <- SolrClient$new(host = "api.plos.org", path = "search", port = NULL)
solr_facet(cli, params = list(q = 'alcohol', facet.pivot = 'journal,subject',
                              facet.pivot.mincount = 10))
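A hedged sketch of inspecting the result (the $facet_pivot element name follows the solrium documentation; in recent versions the flattening appears to be applied internally, so each element should already be tabular):
res <- solr_facet(cli, params = list(q = 'alcohol', facet.pivot = 'journal,subject',
                                     facet.pivot.mincount = 10))
# the pivot results come back in the $facet_pivot element
res$facet_pivot
# the unexported helper is still reachable via ':::'
solrium:::pivot_flatten_tabular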

R: Error in get_map()/get_googlemap() from ggmap

I am trying to use ggmap to create a plot of vehicle car crashes by state. The map will have dots sized based on the number of car crashes in each state.
In particular, I am trying to recreate the USA plot shown in the visualizing clusters section of this blog post.
However, whenever I try to create the map, I get this error:
Error in aperm.default(map, c(2, 1, 3)) :
invalid first argument, must be an array
I have set up the Google API and can see that it is receiving hits. I have also enabled it and have the key.
In addition, I have installed ggmap from the GitHub account using this command:
devtools::install_github("dkahle/ggmap", ref = "tidyup", force = TRUE)
since the CRAN one isn't updated.
I have restarted and quit R several times as well, but the error persists.
Even if I simply run:
get_map()
it still results in the error:
Error in aperm.default(map, c(2, 1, 3)) :
invalid first argument, must be an array
Below is my code; it is similar to the code in the blog post:
mydata$State <- as.character(mydata$State)
mydata$MV.Number = as.numeric(mydata$MV.Number)
mydata = mydata[mydata$State != "Alaska", ]
mydata = mydata[mydata$State != "Hawaii", ]
devtools::install_github("dkahle/ggmap", ref = "tidyup", force = TRUE)
library(ggmap)
ggmap::register_google(key = "...") # my key is here
# geocode each state name into lon/lat columns
for (i in 1:nrow(mydata)) {
  latlon = geocode(mydata[i, 1])
  mydata$lon[i] = as.numeric(latlon[1])
  mydata$lat[i] = as.numeric(latlon[2])
}
mv_num_collisions = data.frame(mydata$MV.Number, mydata$lon, mydata$lat)
colnames(mv_num_collisions) = c('collisions', 'lon', 'lat')
usa_center = as.numeric(geocode("United States"))
USAMap = ggmap(get_googlemap(center = usa_center, scale = 2, zoom = 4),
               extent = "normal")
# note: circle_scale_amt is defined in the blog post and must be set
# before the next line will run
USAMap +
  geom_point(aes(x = lon, y = lat), data = mv_num_collisions, col = "orange",
             alpha = 0.4, size = mv_num_collisions$collisions * circle_scale_amt) +
  scale_size_continuous(range = range(mv_num_collisions$collisions))
I expect the map to output like this, but I cannot seem to get past this error.
If anyone can help, that would be great.
Please let me know if you need any more information.
Thank you.
This error is due to the Google key not having the appropriate API activity enabled for that key.
Go into the Google API console and enable the "Maps Static API", and it should work for you.
EDIT: Jan 2020 - I was doing some similar work and found that a similar API was failing because billing information had to be added to the project in the Google Cloud console before it would work.
Make sure to enable billing. You don't have to restrict the API key, but make sure all the APIs you need are enabled. If you want to search location names, you'll need the Geocoding API in addition to the Maps Static API. ggmap from CRAN is OK now (you don't need the GitHub version).
It's necessary to confirm your credit card in the Google API console; with this, your API key is activated and you can use ggmap normally.
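Once billing is active and the APIs are enabled, a minimal sketch like this can confirm the key works (the center coordinates are arbitrary placeholders):
library(ggmap)
ggmap::register_google(key = "YOUR_KEY") # placeholder; use your own key
# with the Maps Static API enabled and billing set up, this should
# return a map object instead of the aperm() error
map <- get_googlemap(center = c(lon = -96.5, lat = 39.5), zoom = 4, scale = 2)
ggmap(map)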

unused argument (key = "iris.hex")

Whenever I try to run this line, or any other line which uses key (following the document at http://h2o-release.s3.amazonaws.com/h2o/rel-lambert/5/docs-website/Ruser/rtutorial.html):
iris.hex = h2o.uploadFile(localH2O, path = irisPath, key = "iris.hex")
I get an error saying key is an unused argument.
This is the first time I am using H2O, and I am new to R as well. Please let me know what the function of key is, and why I get an error only when I run this. I could create a data frame with the following statements, but I would still like to understand this key error:
h2o.init(ip = "localhost", port = 54321, startH2O = TRUE)
irisPath = system.file("extdata", "iris.csv", package = "h2o")
iris.hex = h2o.uploadFile(path = irisPath, destination_frame = "iris.hex")
iris.data.frame <- as.data.frame(iris.hex)
summary(iris.data.frame)
H2O may be very good in various areas, but unfortunately the lack of documentation and tutorials makes it really difficult to learn.
I hope they watch these types of comments and improve their documentation.
At the very least, a tutorial on processing the 12 GB airlines data would help the many enthusiastic people who really want to explore H2O.
This is a very outdated version of the H2O docs and there have been some major API changes since H2O 3.0. The latest R docs can always be found at: http://h2o-release.s3.amazonaws.com/h2o/latest_stable_Rdoc.html
Our main docs landing page has a link to the latest R docs, Python docs, and a bunch of other links you may find useful. We also have a Google Group called h2ostream for posting new questions and searching through old questions. Welcome to H2O!
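For reference, a minimal sketch of the rename (the current argument name matches the question's own working code; the old form is from the outdated tutorial):
library(h2o)
h2o.init()
irisPath <- system.file("extdata", "iris.csv", package = "h2o")
# pre-3.0 API from the outdated tutorial (no longer works):
# iris.hex <- h2o.uploadFile(localH2O, path = irisPath, key = "iris.hex")
# current API: the connection argument is gone and 'key' is now 'destination_frame'
iris.hex <- h2o.uploadFile(path = irisPath, destination_frame = "iris.hex")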

What to do when a NOAA ERDDAP dataset is not found?

I'm trying to download some gridded ERDDAP data using the rnoaa package in R. While the data retrieval works perfectly for some datasets, I'm having problems getting the data for others. For example, when I run:
library(rnoaa)
ds.info <- erddap_info("noaa_pfeg_95de_54ab_a60a")
erddap_grid(ds.info,
            time = c("2005-01-01", "2015-01-01"),
            altitude = c(0, 0),
            latitude = c(3.25, 3.75),
            longitude = c(72.5, 73.25),
            fields = "all")
I get the following error:
`Error: (404) - Resource not found: /erddap/griddap/ncdcOwDly.csv (Currently unknown datasetID=ncdcOwDly)`.
The error is not really consistent, because the call sometimes works when I try different time spans. But I get it pretty much every single time I try to download data from the datasets noaa_pfeg_95de_54ab_a60a, noaa_pfeg_1a4b_0c2a_2365, and some others by NOAA-NCDC.
Because erddap_grid works for some datasets but not for others, I'm inclined to think it's not a bug. Maybe it is a problem with the ERDDAP server, or maybe something to do with my API key? Is there a way around it?
Update - 2015-01-10: It seems it is a server problem. When trying to download the data using the address generated by the web interface (see below), I get the same error. I guess I'll just have to wait until "they" fix the problem with the database.
http://coastwatch.pfeg.noaa.gov/erddap/griddap/ncdcOw6hr.csv?u[(2006-01-01):1:(2015-01-09T18:00:00Z)][(10.0):1:(10.0)][(3.25):1:(3.75)][(72.5):1:(73.25)],v[(2006-01-01):1:(2015-01-09T18:00:00Z)][(10.0):1:(10.0)][(3.25):1:(3.75)][(72.5):1:(73.25)]
ERDDAP servers often become overloaded and return 404 errors on some requests. These are public-facing servers that do heavy data lifting, after all.
So the answer here is to try again after waiting some time (please wait a while, to be nice to the ERDDAP administrators), and to contact the server administrator to make sure that your IP address has not been blacklisted for making too many requests.
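A hedged sketch of such a polite retry wrapper (fetch_with_retry and the wait time are invented for illustration; erddap_grid is the rnoaa call from the question):
library(rnoaa)
# retry an ERDDAP grid request a few times, sleeping between attempts
# so as not to hammer an already overloaded server
fetch_with_retry <- function(ds.info, ..., tries = 3, wait_sec = 60) {
  for (attempt in seq_len(tries)) {
    res <- tryCatch(erddap_grid(ds.info, ...), error = function(e) e)
    if (!inherits(res, "error")) return(res)
    message("Attempt ", attempt, " failed: ", conditionMessage(res))
    if (attempt < tries) Sys.sleep(wait_sec)
  }
  stop("All ", tries, " attempts failed; try again later")
}
dat <- fetch_with_retry(ds.info,
                        time = c("2005-01-01", "2015-01-01"),
                        altitude = c(0, 0),
                        latitude = c(3.25, 3.75),
                        longitude = c(72.5, 73.25),
                        fields = "all")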

Resources