I found this wonderful code on GitHub (https://github.com/rpodcast/nhl_analysis/blob/master/web-scraping/hockey-reference-boxscore-scratch.R). As I am new to R and more familiar with MATLAB, my goal was just to use the code to get the data I want. I copied the code from his GitHub and loaded every package that might be needed.
After executing the code in RStudio, I get this error:
table.stats <- readHTMLTable(full.url, header=FALSE)
Error: failed to load external entity "http://www.hockey-reference.com/boxscores/199511210BOS.html"
I tried to solve the problem using other Q&As from here, but wasn't able to. I also tried to rewrite it using the httr package instead of the RCurl package, but that doesn't work either.
I would really appreciate your help.
The code you're using was last updated 7 years ago, and websites frequently change their HTML structure, so old scraping code is not guaranteed to keep working.
Use the following code instead.
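library(rvest)
library(httr)
# Send a browser-like user agent with the request
ua <- user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36")
url <- 'https://www.hockey-reference.com/boxscores/199511210BOS.html'
# Note: rvest 1.0.0 renamed html_session() to session() and html_nodes() to html_elements()
session <- html_session(url, ua)
# Extract every <table> on the page as a list of data frames
session %>%
  html_nodes("table") %>%
  html_table()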
Some code that worked (still works flawlessly in R 4.1.3) crashes to desktop in 4.2.1.
This only happens when I try to open a table which contains a geometry column. If I select only non-geometry columns everything works fine but as soon as I try to get the geometry column, R unceremoniously dies.
(The PostgreSQL DB is definitely working with PostGIS.)
I've tried loading the data with:
DBI::dbReadTable()
RPostgreSQL::dbReadTable()
RPostgres::dbReadTable()
rpostgis::dbReadDataFrame()
RJDBC::dbReadTable()
sf::st_read() ## st_read(conn, query) is how it was coded in the first place.
## I also tried dbSendQuery()
conn <- dbConnect(drv = drv,
dbname = db2,
host = host_db1,
port = db_port,
user = tolower(Sys.info()[[7]]), ## capital letters may cause failed login
password = pw)
query = paste("SELECT * FROM some_table") # this worked in R 4.1.3 but doesn't in 4.2.1
conn %>% st_read(query = query)
The PC I'm working on runs Windows 10.
I have run out of ideas for what to test, since there is no error message. I can access the data as usual via dbvis, as well as any non-geometry column via queries, so I am pretty sure it isn't a syntax error or a system error. I would be happy for any advice (other than "just use R 4.1.3").
EDIT:
rsession-user.log:
2022-08-09T06:24:07.254357Z [rsession-user] ERROR CLIENT EXCEPTION (rsession-user): (TypeError) : Cannot read property 'O' of null;|||org/rstudio/studio/client/workbench/views/source/editors/text/AceEditor.java#4509::setScrollSpeed|||org/rstudio/studio/client/workbench/views/source/editors/text/AceEditorMonitor.java#46::monitor|||org/rstudio/studio/client/workbench/views/source/editors/text/AceEditorMonitor.java#70::execute|||com/google/gwt/core/client/impl/SchedulerImpl.java#140::execute|||com/google/gwt/core/client/impl/Impl.java#306::apply|||com/google/gwt/core/client/impl/Impl.java#345::entry0|||rstudio-0.js#-1::eval|||com/google/gwt/cell/client/AbstractEditableCell.java#41::viewDataMap|||Client-ID: 33e600bb-c1b1-46bf-b562-ab5cba070b0e|||User-Agent: Mozilla/5.0 (Windows NT 10.0 Win64 x64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.12.8 Chrome/69.0.3497.128 Safari/537.36
I have basic experience with R and I'm starting to use it for modelling. I came across an error that I am not able to solve, and would appreciate any help.
I am trying to read a .nc file, downloaded from the NOAA databases (https://www.psl.noaa.gov/cgi-bin/db_search/DBListFiles.pl?did=98&tid=91102&vid=1913), with the tidync() function from the tidync package. I always get the following error:
Error in RNetCDF::open.nc(x) : NetCDF: HDF error
I am able to open this same file with nc_open() from the ncdf4 package.
However, I next want to use hyper_array() from tidync, and an object opened with nc_open() can't be passed to hyper_array() because of its class.
I believe the error when opening with tidync() has to do with a version mismatch or some problem in my setup, as colleagues are using the same code and the same files and it works perfectly for them. I have tried uninstalling and reinstalling everything to rule out version errors (which have happened to me in the past), but it still didn't work.
Here are my version settings:
RStudio 2021.09.0+351 "Ghost Orchid" Release (077589bcad3467ae79f318afe8641a1899a51606, 2021-09-20) for Windows
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.12.8 Chrome/69.0.3497.128 Safari/537.36
I would appreciate it if someone could help me out, either by fixing the error or by showing how to convert the object's class so I can open the file with nc_open() and afterwards use hyper_array() on it.
Thanks!!
Code used:
library(tidync)
library(ncmeta)
library(plyr)
library(ggplot2)
library(ncdf4)
getwd()
oisstfile <- "C:/Users/marta/Documents/Ciência/Doutoramento/Doutoramento_BYT_Marta/R/R_Data/CTI/Guillem/Global_1980_2020/Global_1980_2020/pottmp.1980.nc"
oisst <- tidync(oisstfile)
oisst_data <- oisst %>% hyper_array()
trans <- attr(oisst_data, "transforms")
Gives error:
# Error in RNetCDF::open.nc(x) : NetCDF: HDF error
I'm trying to get JSON from a web API into R,
from the following URL:
http://wbes.srldc.in/Report/GetCurrentDayFullScheduleMaxRev?regionid=4&ScheduleDate=31-05-2020
When I try it in a browser, I get a proper response.
However, when I try it in R using any method, what I fetch is a different HTML page with no JSON in it.
I'm just starting with R and have no programming background. Please help.
I feel your pain. It seems the website you're trying to access checks the user agent to block scraping. I'd set a common user agent in httr (Chrome or Firefox are fine) and perform a single GET request. This should be enough to retrieve the MaxRevision value:
library(httr)
url <- "http://wbes.srldc.in/Report/GetCurrentDayFullScheduleMaxRev?regionid=4&ScheduleDate=31-05-2020"
ua <- "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
res <- GET(url, user_agent(ua))
content(res)
#> $MaxRevision
#> [1] 216
Created on 2020-06-01 by the reprex package (v0.3.0)
Out of nowhere I'm getting this 404 error when the browser requests jquery.min.map.
The funny thing is that I've never added this file to my solution.
Can anyone explain to me how to get rid of this error?
I have no idea where this file is being referenced, since I did not add a reference to it.
Request URL:http://localhost:22773/Scripts/jquery.min.map
Request Method:GET
Status Code:404 Not Found
Request Headers
Accept:*/*
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Connection:keep-alive
Host:localhost:22773
Referer:http://localhost:22773/Manager/ControlPanel.aspx
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.62 Safari/537.36
Response Headers
Cache-Control:private
Content-Length:4992
Content-Type:text/html; charset=utf-8
Date:Tue, 10 Sep 2013 17:37:40 GMT
Server:Microsoft-IIS/8.0
X-Powered-By:ASP.NET
X-SourceFiles:=?UTF-8?B?YzpcdXNlcnNcYWRtaW5pc3RyYXRvclxkb2N1bWVudHNcdmlzdWFsIHN0dWRpbyAyMDEyXFByb2plY3RzXEFsdW1DbG91ZFxBbHVtQ2xvdWRcU2NyaXB0c1xqcXVlcnkubWluLm1hcA==?=
Source maps are like favicons: something browsers will request on their own in some circumstances.
Typically, JavaScript is minified on production servers, which makes debugging it there difficult.
A source map is a file that maps minified JavaScript back to its original, unminified source. It's up to developers whether to include them on their websites.
In Chrome, you have to activate this functionality for the browser to attempt to download the source map for a minified script. It is then easier to debug client-side.
Basically, you can't get rid of this error other than by providing the source maps.
Anyways, see: http://www.html5rocks.com/en/tutorials/developertools/sourcemaps/
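On the question of where the file is being referenced: one way to check is to search the served scripts for the sourceMappingURL comment that triggers the request. This is only a minimal sketch, assuming a POSIX shell; the helper name find_sourcemap_refs and the demo file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list .js files under a directory that contain a
# sourceMappingURL comment (the line that makes the browser ask for a .map file).
find_sourcemap_refs() {
    find "$1" -type f -name '*.js' -exec grep -l 'sourceMappingURL=' {} +
}

# Demo on a throwaway directory; file names and contents are made up.
scripts_dir="$(mktemp -d)"
printf '%s\n' '/*! demo */' '//@ sourceMappingURL=jquery.min.map' 'var a=1;' > "$scripts_dir/jquery.min.js"
printf '%s\n' 'var b=2;' > "$scripts_dir/site.js"
find_sourcemap_refs "$scripts_dir"    # prints only the path ending in jquery.min.js
rm -rf "$scripts_dir"
```

In a real project the argument would be the folder holding the scripts, such as the Scripts directory seen in the request URL.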
I just fixed this in my own app.
Files that you copy from a CDN often have the sourcemap line at the bottom. For axios.min.js, it's
//# sourceMappingURL=axios.min.map
Just remove that line and you won't get that error. Better still, use the version they provide for local loading.
I came across this when developing something without reliable internet access, so I needed the local version. Removing that line solved the problem.
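Removing that line by hand can also be scripted. The following is a rough sketch, assuming the comment sits on its own line in the `//# sourceMappingURL=` (or older `//@`) form; the helper name strip_sourcemap_comment and the demo file contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: delete the source map comment line from a local copy
# of a minified script, so the browser stops requesting the missing .map file.
strip_sourcemap_comment() {
    target_file="$1"
    tmp_file="$(mktemp)" || return 1
    # Matches both "//# sourceMappingURL=" and the older "//@ sourceMappingURL=".
    sed '/^\/\/[#@] sourceMappingURL=/d' "$target_file" > "$tmp_file" || return 1
    # Write back in place so the original file keeps its permissions.
    cat "$tmp_file" > "$target_file"
    rm -f "$tmp_file"
}

# Demo on a throwaway file; the contents are made up.
demo_dir="$(mktemp -d)"
printf '%s\n' '!function(){console.log("hi")}();' '//# sourceMappingURL=axios.min.map' > "$demo_dir/axios.min.js"
strip_sourcemap_comment "$demo_dir/axios.min.js"
cat "$demo_dir/axios.min.js"    # only the code line remains
rm -rf "$demo_dir"
```

It is probably wise to keep a backup of the file before running something like this against a real project.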
I'm running some tests with PhantomJS / CasperJS on Ubuntu against Google Analytics, and I'm having trouble getting GA to correctly recognize the language settings that I'm sending in the HTTP request headers.
No matter what I enter in my Accept-Language header, I end up with GA classifying the language as "c".
I'm sure my Accept-Language headers are correct. Here's an example:
ACCEPT-ENCODING:gzip, deflate
CONNECTION:Keep-Alive
ACCEPT-LANGUAGE:en-US
USER-AGENT:Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1
ACCEPT:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
After long hours of trial and error I found out that C is in fact the default setting of the LANG environment variable in Ubuntu itself:
LANG=C.UTF-8
I can in fact change how Google Analytics classifies the language by altering my environment variables with the following command on the command line:
export LC_ALL="en_US.UTF-8"
It does not work if I only set "export LC_LANG" or "LANGUAGE". I am not sure why.
But how do I control this setting from inside PhantomJS / CasperJS? I can't, and don't want to, change my environment variables from the CLI for each PhantomJS run, since I test multiple languages at once in large numbers.
Has anyone experienced this and can help?
I managed to find a hack-ish solution to this problem. I simply use the following command from the CLI:
$ LC_ALL=en-gb phantomjs script.js
and that passes the Accept-Language correctly to Google Analytics.
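Since the question mentions testing several languages at once, the same per-command assignment can be wrapped in a loop. This is a minimal sketch, not a tested PhantomJS setup: the helper name run_with_locale and the locale list are assumptions, and a stand-in command replaces phantomjs script.js so the block runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper: run one command with LC_ALL set for that process only.
# Usage: run_with_locale LOCALE COMMAND [ARGS...]
run_with_locale() {
    locale_name="$1"
    shift
    env LC_ALL="$locale_name" "$@"
}

# Demo: a stand-in command prints what the child process sees.
# In real use the command would be: phantomjs script.js
for demo_locale in en_US.UTF-8 en_GB.UTF-8 de_DE.UTF-8; do
    run_with_locale "$demo_locale" sh -c 'echo "child sees LC_ALL=$LC_ALL"'
done
```

Because the assignment applies only to the child process, runs with different locales can be started side by side (for example by adding & after each call and a final wait) without changing the parent shell's environment.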
I think the problem is that CasperJS does not pass the request headers on to PhantomJS correctly.
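As for why setting only LC_LANG or LANGUAGE had no effect in the question, one possible explanation is the standard POSIX lookup order: LC_ALL overrides the individual LC_* category variables, which in turn override LANG. LC_LANG is not one of the standard locale variables, and LANGUAGE is a GNU gettext extension that only affects message translations. Whether PhantomJS follows exactly this order is an assumption, not something confirmed in this thread. The hypothetical function effective_locale below only sketches that lookup.

```shell
#!/bin/sh
# Hypothetical helper: print the locale a POSIX program would use for one
# category (e.g. LC_MESSAGES), following the order LC_ALL > LC_<category> > LANG.
effective_locale() {
    category="$1"
    eval "category_value=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' "C"
    fi
}

# Demo in subshells so the surrounding environment is left untouched.
# With only LANG and LANGUAGE set, LANGUAGE is ignored by this lookup:
( unset LC_ALL LC_MESSAGES; LANG=C.UTF-8; LANGUAGE=en_US; effective_locale LC_MESSAGES )   # prints C.UTF-8
# LC_ALL wins over LANG:
( LC_ALL=en_US.UTF-8; LANG=C.UTF-8; effective_locale LC_MESSAGES )                          # prints en_US.UTF-8
```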