I'm using phantomjs (2.0) to try and take a screenshot of the following website: http://www.langorigami.com/art/gallery/gallery.php?tag=birds&name=annas_hummingbird_3 but am getting the following errors for each image:
[DEBUG] Network - Resource request error: 202 ( "Error downloading http://www.langorigami.com/header/logo.gif - server replied: Forbidden" ) URL: "http://www.langorigami.com/header/logo.gif"
Any idea why and/or how to remedy?
From the site's Copyright & Usage page:
...“the Content” are protected by copyright and are the property of Robert J. Lang unless specifically noted. This includes (but is not limited to) articles, text, photographs, images ...
It seems the site's builders have put anti-scraping measures in place. See my answer on web-scraping protection measures.
Could you post the essential scraping code you've written so far?
I'm trying to make a request like the one shown here:
(https://learn.microsoft.com/en-us/graph/api/user-list?view=graph-rest-1.0&tabs=http#code-try-15)
Tried to see if it works on their graph explorer ---> https://developer.microsoft.com/en-us/graph/graph-explorer
I need specifically to make this request work ---> https://graph.microsoft.com/v1.0/users?$search="displayName:wa"
But I get an error suggesting that I didn't add the ConsistencyLevel header, even though I did, in multiple ways. It's annoying :))
============================================
Update:
I logged in to my student Microsoft account and now I get this:
picture
You missed the "?" in the url
====================Update====================
It can work, but you need to send the request with a valid access token. I suspect the self-contained test tool ran into an issue when you tested it there.
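As a rough sketch of what the request needs, the snippet below only builds the URL and headers and does not send anything. The function name and the `<ACCESS_TOKEN>` placeholder are made up for illustration. It assumes, as the error message suggests, that `$search` on users requires the `ConsistencyLevel: eventual` header alongside a bearer token.

```python
from urllib.parse import quote, urlencode

GRAPH_USERS_URL = "https://graph.microsoft.com/v1.0/users"


def build_user_search_request(term, access_token):
    """Return (url, headers) for a $search on displayName (hypothetical helper)."""
    # "$" and ":" are left unescaped so the URL matches the form in the question;
    # the double quotes around the search clause are percent-encoded as %22.
    query = urlencode(
        {"$search": f'"displayName:{term}"'}, safe="$:", quote_via=quote
    )
    # The "?" separates the path from the query string.
    url = f"{GRAPH_USERS_URL}?{query}"
    headers = {
        "Authorization": f"Bearer {access_token}",  # placeholder token, not a real one
        "ConsistencyLevel": "eventual",
    }
    return url, headers


url, headers = build_user_search_request("wa", "<ACCESS_TOKEN>")
print(url)
print(headers)
```

Any HTTP client can then send a GET to `url` with `headers`, provided the token is valid for the tenant.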
I'd like to scrape team advanced stats from stats.nba.com.
My current code to get the XHR file where the data is stored is :
library(httr)
library(jsonlite)
nba <- GET('https://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=11%2F12%2F2019&DateTo=&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Advanced&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision=')
I get the URL via these steps in Chrome:
Inspect -> Network -> XHR
The code throws this error:
Error in curl::curl_fetch_memory(url, handle = handle) :
LibreSSL SSL_read: SSL_ERROR_SYSCALL, errno 60
I also tried it with custom advanced filters on the website, which either result in the same error or in the code running forever. I'm not that great at web scraping, so I would appreciate it if anyone could point out what the issue is here.
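This does not address the SSL error, but it can help to see what the long URL is actually asking for. A small sketch in Python (not the R used above; the helper name is made up) that decodes the query string and keeps only the parameters that carry a value:

```python
from urllib.parse import parse_qsl, urlsplit

# The request URL from the question, split across lines for readability.
URL = (
    "https://stats.nba.com/stats/leaguedashteamstats?"
    "Conference=&DateFrom=11%2F12%2F2019&DateTo=&Division=&GameScope="
    "&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Advanced"
    "&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame"
    "&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N"
    "&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange="
    "&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision="
)


def non_empty_params(url):
    """Return the decoded query parameters that have a non-empty value."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return {name: value for name, value in pairs if value != ""}


params = non_empty_params(URL)
for name, value in params.items():
    print(f"{name} = {value}")
```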
I have had a good look at this. It looks like this site goes to some lengths to prevent scraping, and won't give you the json from that url unless you provide it with cookies that are generated by a back-and-forth between your browser's javascript and their own servers. They also monitor request timings with New Relic technology and are therefore likely to block your IP if you scrape multiple pages. It wouldn't be impossible, but very, very hard.
If you are desperate for the data you could look into using the NBA API, which requires a sign-up but is free to use for 1000 requests per day.
The other option is to automate a browser using RSelenium to get the html of the fully rendered pages.
Of course, if you only want this one page, you can just copy the html from your Chrome's inspector, then use rvest::read_html(readClipboard())
I'm an AMP beginner from Japan.
I'm having trouble with an error that appears when I enable AMP on my WordPress site.
I could activate the AMP plugin and display the AMP version of a post without problems.
But the following error appeared in my browser console when I added #development=1 to the end of the post's AMP URL to check whether the page is valid AMP for Google.
Failed to load resource: the server responded with a status of 404 ()
https://cdn.ampproject.org/v0/validator_minified.js.sourcemap
When I opened the URL from the error message, the following error page was displayed.
Google
404. That’s an error.
The requested URL /v0/validator_minified.js.sourcemap was not found on this server. That’s all we know.
I guess that, because of this error page, Google may not recognize the generated page as valid AMP.
But I have no idea how to resolve the 404 error and can't make any more progress.
In other words, I'd like to know how to resolve the 404 error so that Google recognizes my AMP page.
If you have a solution or any hints, I'd be very grateful if you could share them.
Thanks in advance.
Clear the cache on your server and delete your logs. Block malicious IP traffic: if you see an IP making around 30 connections, block that IP.
Alternatively, use a proxy or a similar mitigation.
I desperately need your help because I have a very unusual problem with my programs:
I am receiving this error when I am trying to debug my ASP.NET or MVC Application on local IIS Server 7.5:
"Unable to start debugging on the web server. The debug request could not be processed by the server due to invalid syntax. "
I tried literally every solution I could find in google until this day. I spent 12 hours trying to figure this out. Without luck.
Error happens only with this address: http:// localhost/AspDemo not with this one: https:// localhost/AspDemo. Basically, I can debug like always if I put https instead of http.
https is disabled on my local IIS server :), I mean it is set to ignore.
Moreover I CAN open my sites with using BOTH protocols in IIS Control Panel
If I disable ASP.NET Debugger in Properties of my App in Visual Studio, Application runs fine.
This is the error from the httperr log:
2014-10-30 00:23:46 ::1%0 2977 ::1%0 80 - - - 400 - Verb -
I am not sure where exactly, but in some other log I saw reference to something like this: Error 400 "Bad Request - Invalid Verb"
A week ago ALL the applications I now have problems with were working perfectly with the ASP.NET debugger from Visual Studio. I have no idea what happened.
I suspect some update. Because updates lately messed with my custom bootloader as well.
When I enable tracing - log is empty with http
Fiddler log is empty as well, maybe there is some config I can use in Fiddler to produce some more logs? (It logs of course with modified machine.config when I use https)
I would be eternally grateful for your help. If you need more logs, please don't hesitate to ask.
I would also like to mention that yes, I saw similar posts on this site, but none of them described a problem close enough to mine.
/edit
From what I was able to observe, the error happens BEFORE the debugger accesses the machine.config file. Can you tell me how to catch errors at that moment?
==== /edit 2 ====
Anyone? No one knows the answer?
Recently I was able to find the complete error message in: C:\Users\\AppData\Local\Temp\Visual Studio Web Debugger.log
http://localhost/MVCDemo/debugattach.aspx
Status code=400 (Bad Request)
Protocol version=1.1
Cached=False
Connection=close
Content-Length=326
Content-Type=text/html; charset=us-ascii
Date=Fri, 31 Oct 2014 03:44:14 GMT
Server=Microsoft-HTTPAPI/2.0
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
<HTML><HEAD><TITLE>Bad Request</TITLE>
<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
<BODY><h2>Bad Request - Invalid Verb</h2>
<hr><p>HTTP Error 400. The request verb is invalid.</p>
</BODY></HTML>
In Process Monitor, I found this one as well:
<event>
<ProcessIndex>1339</ProcessIndex>
<Time_of_Day>04:40:33,3661553</Time_of_Day>
<Process_Name>devenv.exe</Process_Name>
<PID>10768</PID>
<Operation>CreateFile</Operation>
<Path>C:\Users\<UserName>\AppData\Roaming\Microsoft\VisualStudio\12.0\Debugger\StepIntoFilterData.ini</Path>
<Result>PATH NOT FOUND</Result>
<Detail>Desired Access: Read Data/List Directory, Read Attributes, Synchronize, Disposition: Open, Options: Sequential Access, Synchronous IO Non-Alert, Non-Directory File, Attributes: n/a, ShareMode: Read, Delete, AllocationSize: n/a</Detail>
</event>
I really, really need help with this.
=== /edit 3 ===
Second error is not relevant, I just checked and it is present with https as well (C:\Users\\AppData\Roaming\Microsoft\VisualStudio\12.0\Debugger\StepIntoFilterData.in), and debugging as https localhost works perfectly.
=== /edit 4 ===
Here are Process Monitor logs captured during Visual Studio Debugging HTTP (not working) and HTTPS (working)
(Test performed on random MVC Tutorial)
Use CTRL+F and Look for "Visual Studio Web Debugger.log" in logs to get the idea when it is happening. In https log is good, in http log returns Invalid Verb error mentioned before.
HTTP (not working):
https://www.dropbox.com/s/7b26ybogtyqlico/LogFile%20HTTP%20NOT%20Working.CSV?dl=0
HTTPS (working):
https://www.dropbox.com/s/ggsj57v97ky90e6/LogFile%20HTTPS%20Working.CSV?dl=0
I might be wrong, but I think the key here is that only HTTP doesn't work and only with VS Debugger, everything else is just fine (HTTP and HTTPS without debugging and HTTPS with debugging.)
(It happens with every solution on IIS, including new ones. All of these solutions used to work, and some of them have not even been changed since they last worked. The IIS config didn't change either.)
I do not see the name of the verb in any of the descriptions here, but my guess is that the VS debugger is using a special verb (DEBUG) rather than the standard GET, HEAD, POST, etc. If you re-installed IIS after VS, the ISAPI mapping probably got nuked. This may solve the problem:
https://msdn.microsoft.com/en-us/library/ms165022%28v=vs.90%29.aspx
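One way to test this guess outside Visual Studio is to send a request with the DEBUG verb by hand and look at the status code. The sketch below is a minimal probe in Python; the function name is made up, and it does not reproduce the extra headers or authentication that the real debugger sends, so it only shows how the server reacts to the verb itself.

```python
import http.client


def probe_verb(host, port, path, verb, timeout=5.0):
    """Send one request with an arbitrary HTTP verb and return the status code."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(verb, path)
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status
    finally:
        conn.close()
```

For example, `probe_verb("localhost", 80, "/MVCDemo/debugattach.aspx", "DEBUG")` returning 400 while the same call with `"GET"` does not would point at the verb being rejected before the request reaches ASP.NET.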
When attempting to upload any number of documents, including very small files, the upload seems to succeed but then redirects to an error page indicating the following:
/_layouts/error.aspx?ErrorText=The%20HTTP%20verb%20POST%20used%20to%20access%20path%20%27%2F%5Fvti%5Fbin%2Fshtml%2Edll%2FSiteCollectionDocuments%2FForms%2FUpload%2Easpx%27%20is%20not%20allowed%2E
The HTTP verb POST used to access path '/_vti_bin/shtml.dll/SiteCollectionDocuments/Forms/Upload.aspx' is not allowed.
Any ideas as to why HTTP POST would be denied for this operation?
Update:
Navigating directly to /_vti_bin/shtml.dll/SiteCollectionDocuments/Forms/Upload.aspx gives:
The XML page cannot be displayed
Cannot view XML input using style sheet. Please correct the error and then click the Refresh button, or try again later.
An invalid character was found in text content. Error processing resource 'http://sitename/...
MZ
Error in event log looks like this:
Critical error has occured but the exception object has already been cleared
Current Url: /_vti_bin/shtml.dll/SiteCollectionDocuments/Forms/Upload.aspx
User Login: xxxxxxx
User is Authenticated: True
Performance Counters
% Processor Time Total: 0
Processor Queue Length: 1
ASP.NET Request Queued Total: 1
.NET CLR Exceptions, # of Exceps Thrown: 55
PATH_INFO: /_vti_bin/shtml.dll/SiteCollectionDocuments/Forms/Upload.aspx
PATH_TRANSLATED: C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\isapi\shtml.dll
The solution turned out to be removing the wildcard application mapping in IIS.
The url which receives the document upload via HTTP POST /_vti_bin/shtml.dll/SiteCollectionDocuments/Forms/Upload.aspx was being incorrectly mapped to C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\aspnet_isapi.dll and thus failing.
Update:
This solution tended to break other functionality, such as document deletion, and was scrapped during testing.
As it turns out, there was an HTTP module that was causing this url to be processed incorrectly. I added a bypass for /_vti_bin/shtml.dll/SiteCollectionDocuments/Forms/Upload.aspx and this solved the issue with no side effects.
The supported methods of uploading documents to SharePoint are:
Using web services (extensive example here)
Using RPC (example here)
Using the object model (example here)
Are you able to use one of these methods? If not can you please edit your question with more information about why and some sample code?
My guess is that the HTTP POST method isn't working because it's for internal SharePoint use only.