The requested URL /_Incapsula_Resource was not found on this server - web-scraping

While using Simple HTML DOM to get some data from a site, I got this error:
Not Found
The requested URL /_Incapsula_Resource was not found on this server.
I have used this library many times, but this is the first time this has happened, and the URL works fine when pasted into the browser.
Can you suggest a solution? Thanks.

Use this to get the HTTP status code:
$code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
If it is 404, the requested URL could not be found.
Note: remember to run this line before curl_close($ch);

Related

Unable to crawl a website using Scrapy, but the same website can be requested using the Scrapy shell with the same settings

I am trying to crawl the website https://www.rightmove.co.uk/properties/105717104#/?channel=RES_NEW
but I get a 410 error:
INFO: Ignoring response <410 https://www.rightmove.co.uk/properties/105717104>: HTTP status code is not handled or not allowed
I am just trying to find the properties that have been sold, using the notification on the page: "This property has been removed by the agent."
I know the website has not blocked me, because I can use the Scrapy shell to get the data, and view(response) works fine too. I can also go directly to the same URL in a web browser, so the 410 doesn't make sense. I can also crawl other pages from the same domain,
i.e. the pages without the notification "This property has been removed by the agent."
Any help would be much appreciated.
It seems that when a listing has been marked as removed by an agent on Rightmove, the website returns status code 410 Gone (which is quite weird). To handle this, simply do something like this in your request:
def start_requests(self):
    yield scrapy.Request(
        url='https://www.rightmove.co.uk/properties/105717104#/?channel=RES_NEW',
        meta={
            'handle_httpstatus_list': [410],
        },
    )
EDIT
Explanation: By default, Scrapy only handles responses whose status code is in the range 200-299, since 2XX means the response was successful. In your case you got a 4XX status code, which signals an error, so the response was ignored. By passing handle_httpstatus_list = [410], we tell Scrapy that we also want it to handle 410 responses and not only 200-299.
Here are the docs: https://docs.scrapy.org/en/latest/topics/spider-middleware.html#std-reqmeta-handle_httpstatus_list
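Once 410 responses reach the callback, the spider still has to tell removed listings apart from live ones. Below is a minimal sketch of that check. The names classify_listing and REMOVED_NOTICE are hypothetical, and it assumes the notice text appears literally in the downloaded HTML, which may not hold if the page renders it with JavaScript.

```python
# Hypothetical helper: the notice text is taken from the question above.
REMOVED_NOTICE = "This property has been removed by the agent."


def classify_listing(status, body):
    """Return 'removed', 'active', or 'unknown' for a fetched listing page."""
    # A 410 that carries the notice is treated as a removed listing.
    if status == 410 and REMOVED_NOTICE in body:
        return "removed"
    # Any 2XX response is treated as a live listing.
    if 200 <= status < 300:
        return "active"
    # Everything else (other errors, 410 without the notice) stays undecided.
    return "unknown"


print(classify_listing(410, "<p>" + REMOVED_NOTICE + "</p>"))
```

In a Scrapy callback this could be called as classify_listing(response.status, response.text).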

Why am I getting a 505 error code from the server?

I looked up the 505 response code and saw that it means "The Web server (running the Web site) does not support, or refuses to support, the HTTP protocol version specified by the client".
The URL I am trying to access on the web server is https://query.yahooapis.com/v1/public/yql?q=select * from yahoo.finance.quote where symbol %3D"msft"&diagnostics=true&env=store
I was able to open the URL in a browser and see that it returned XML data. However, when I tried to open an HttpsURLConnection to it, I got a 505 response code. The code for doing so is:
URL url = new URL(params[0]);
URLConnection connection = url.openConnection();
HttpsURLConnection httpConnect = (HttpsURLConnection) connection;
int responseCode = httpConnect.getResponseCode();
I inspected the value of params[0] at runtime and saw that it contained the right URL. Does anyone know how I can fix this issue? The web server should support HTTPS because that link works, so I don't understand why a 505 error is occurring.
java.net.URL will allow you to create an invalid URL with spaces, which results in this error. Spaces in query parameters should be encoded as +, so make your URL:
https://query.yahooapis.com/v1/public/yql?q=select+*+from+yahoo+finance.quote+where+symbol+%3D"msft"&diagnostics=true&env=store
This request is still incorrect, and will result in a 400 ("expecting table got 'finance'"). From these questions:
Getting data from Yahoo Finance
Issue with AngularJS and financial quote
I believe you want:
https://query.yahooapis.com/v1/public/yql?q=select+*+from+yahoo.finance.quote+where+symbol+%3D"msft"&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys
I think I found the problem. I looked online and saw that someone else had the same problem, and that it was caused by the spaces inside the URL. I am not sure how to resolve this, though.
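One way to avoid hand-editing the spaces is to let a library encode the query parameters. The sketch below uses Python's urllib.parse.urlencode purely as an illustration (the question itself is in Java, where java.net.URLEncoder plays a similar role). build_query_url and BASE_URL are hypothetical names, and the query string is an assumed reading of the one in the question.

```python
from urllib.parse import urlencode

BASE_URL = "https://query.yahooapis.com/v1/public/yql"


def build_query_url(yql, env):
    """Return BASE_URL plus percent-encoded parameters (spaces become '+')."""
    params = {"q": yql, "diagnostics": "true", "env": env}
    # urlencode escapes each value, so no raw spaces or quotes reach the server.
    return BASE_URL + "?" + urlencode(params)


url = build_query_url(
    'select * from yahoo.finance.quote where symbol = "msft"',
    "store://datatables.org/alltableswithkeys",
)
print(url)
```

Encoding each parameter value separately, rather than the finished URL, keeps the separators ? and & intact.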

Response Redirect URL returns HTTP Error 400 - Bad Request

I'm a noob when it comes to ASP.NET. I know a few basic commands, such as Response.Redirect("URL"), to redirect my application web page to a different location.
However, I receive HTTP Error 400 - Bad Request whenever I try to use the code shown below:
Response.Redirect(Server.UrlEncode(this.Downloadlink));
where this.Downloadlink is a user-defined property that returns something like this:
http://mdn.vatsag.net/fp;files/DOWNLOAD/VTSetup.exe
If I paste this link into the browser, the .exe download pops up (which means the link is good).
However, this error occurs when I use the ASP.NET code.
Any response on this issue or its cause is deeply appreciated.
See here: http://www.kirit.com/Response.Redirect%20and%20encoded%20URIs
In short: if you want a quick fix, remove the part of your code that URL-encodes the URL!
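To see why encoding the whole link breaks the redirect, here is a small sketch in Python (not ASP.NET). It assumes Server.UrlEncode escapes reserved characters in roughly the same way as urllib.parse.quote with safe=""; the exact output of the two may differ, so treat it only as an illustration of the effect.

```python
from urllib.parse import quote, urlsplit

link = "http://mdn.vatsag.net/fp;files/DOWNLOAD/VTSetup.exe"

# Encoding the entire URL also escapes ':' and '/', so the scheme and host
# are no longer recognizable as URL structure.
encoded = quote(link, safe="")

print(encoded)
print(repr(urlsplit(link).scheme))     # scheme parsed from the original link
print(repr(urlsplit(encoded).scheme))  # scheme parsed from the encoded string
```

Because the encoded string no longer parses as an absolute URL, a client receiving it as a redirect target has no scheme or host to follow.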

HTTP method GET is not supported by URL

When I run the servlet, it shows error 405, "HTTP method GET is not supported", but after refreshing, the change that the servlet is supposed to make has occurred. Why is this happening?
Please help me.
Thanks
Perhaps you have set method='post' and are making a POST request while trying to handle it with doGet? Try removing method='post' or overriding the doPost method. If this is not the issue, please post your code.

Could it be that the Google URL Shortener API key works only on one computer?

I'm using the Google URL Shortener from an ASP.NET website. It works fine from my localhost, but on the test server I get the following error:
System.Net.WebException: The remote server returned an error: (403) Forbidden.
at System.Net.HttpWebRequest.GetResponse()
at GoogleUrlShortnerApi.Shorten(String url)
I'm using the exact code that is shown here:
http://www.jphellemons.nl/post/Google-URL-shortener-API-%28googl%29-C-sharp-class-C.aspx
Could it be that the key works only on my local computer, and not any other computer? I have obtained another key (using another Google account), but this one gives me the same error (403) both on my local computer, and on the test server.
I doubt very much that the API key is linked to a particular PC. You need to check the requests, both the URL and the headers, that your program is sending out; they must differ in some way between the two machines. Is your server behind some kind of proxy, e.g. Apache? If it is not configured correctly, it might also be mangling the request. Also make sure your requests are encoded correctly.
I made a few modifications, following a tutorial by Scott Mitchell, and changed the following lines of code:
First, instead of:
string post = "{\"longUrl\": \"" + url + "\"}";
I used:
string post = string.Format(@"{{""longUrl"": ""{0}""}}", url);
Second, I commented out these two lines:
request.ServicePoint.Expect100Continue = false;
request.Headers.Add("Cache-Control", "no-cache");
I don't know why, but suddenly it started working. I wanted to see which of the three changes had fixed the problem, so I reverted them one at a time, and - TADA - it still works, even with all three original lines restored! So I really don't know what caused the problem, but since the code works with those two lines commented out and the other modification in place, I am leaving it that way.
I hope this answer helps someone sometime...
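Both versions of the post string above build the JSON body by pasting the URL into a template, which produces invalid JSON if the URL ever contains a double quote or backslash. Nothing in this thread shows that this caused the 403, but a JSON serializer removes that risk. The sketch below is in Python rather than C#, and build_payload and build_payload_naive are hypothetical names.

```python
import json


def build_payload(long_url):
    """Serialize the request body; json.dumps escapes quotes and backslashes."""
    return json.dumps({"longUrl": long_url})


def build_payload_naive(long_url):
    """String concatenation, as in the snippets above: no escaping is applied."""
    return '{"longUrl": "' + long_url + '"}'


tricky = 'http://example.com/?q="quoted"'

# The serialized payload round-trips back to the original URL.
print(json.loads(build_payload(tricky))["longUrl"] == tricky)

# The concatenated payload cannot be parsed when the URL contains quotes.
try:
    json.loads(build_payload_naive(tricky))
except json.JSONDecodeError:
    print("naive payload is not valid JSON")
```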
