Always getting status 429 through scrapy

I always get status 429 when requesting a URL through Scrapy, but status 200 when I open the same URL in a browser. Is this a preventive measure by the domain to block scraping of their site, or is it a problem with my settings?
As far as I know, status 429 means Too Many Requests. I have tried setting concurrent requests to 1, and it still doesn't work.
I hope someone can give me some feedback on this.
Thanks, all.

Would you be able to share the domain you're attempting to scrape? That would help me debug your issue.
Thanks
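
Until the domain is known, a few Scrapy settings are commonly tried when a site answers 200 to a browser but 429 to a crawler. The following is only a sketch of a `settings.py` fragment: the setting names are Scrapy's own, but every value (delays, retry count, User-Agent string) is an assumption to tune, and none of it is guaranteed to get past a site that deliberately blocks scrapers.

```python
# settings.py fragment (sketch). Setting names are Scrapy's; all values are assumptions.

# One request at a time, as already tried in the question.
CONCURRENT_REQUESTS = 1
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Pause between requests. Scrapy randomizes this delay by default.
DOWNLOAD_DELAY = 5  # seconds (assumed value)

# Let the AutoThrottle extension adapt the delay to observed latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5   # seconds (assumed value)
AUTOTHROTTLE_MAX_DELAY = 60    # seconds (assumed value)

# Retry 429 responses a few times instead of giving up immediately.
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]

# Some sites treat Scrapy's default User-Agent differently from a browser's.
# This browser-like string is a placeholder, not a recommendation.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
```

If the 429 persists with a slow, single-connection crawl and a browser-like User-Agent, the block is more likely deliberate on the site's side than a matter of settings.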

Related

HTTP error 414 returning from the payment gateway

I have run into a problem that I hope you can help me with.
The following happens:
I'm testing the payment gateway of my website. Once I am in the checkout (about to make the payment) and go back, I am returned to the page to finalize the purchase (so far so good).
The problem is that, after doing that, some options on the page seem to disappear, and when I try to go to the home page
I get a 414 error (URL too long).
I cannot access the page again until I exit the browser, clear the cache, and reopen it.
I know this can be worked around by modifying the Apache httpd configuration, but that does not seem like the correct fix to me.
I use WordPress with WooCommerce and the CoinPayments gateway.
I have not included my web address because I do not know whether the forum allows it. Thank you very much in advance.

Public LinkedIn profile url returns server Status code 999

LinkedIn Consumer Support asked me to post this question here for #LinkedIn developers to answer.
I have seen there are multiple questions about this 999 status code, but they are all API related. My question is not API related.
Here it is. On my website I have a link to my public LinkedIn profile: https://nl.linkedin.com/in/jpcornelissen/nl
The broken link checker plugin on my website tells me that this link is broken with the error: SERVER RESPONSE: HTTP/1.1 999 Request denied
Why is that? The page is accessible, so it should return status code 200, not 999. Status 999 is not even an existing HTTP status code.
The issue is not plugin related. You also get the 999 status code if you check with http://tools.seobook.com/server-header-checker/
Regards

Maybe you can find the answer here: 999 Error Code on HEAD request to LinkedIn, or here: https://social.msdn.microsoft.com/Forums/vstudio/en-US/5a4f8eb5-bf1b-4776-b4bb-4baef621838f/999-non-standard-linked-in-error?forum=csharpgeneral
It seems that LinkedIn blocks "bad" requests with this non-standard 999 status code (it would be better to respond with 400 Bad Request and an explanation). Some reports attribute it to the HEAD method (which is similar to GET but does not request the body), to a missing header (Accept-Encoding), to the User-Agent header, or to the source IP (for example, Heroku).
Only LinkedIn can explain why. Chances are they will not, because of security through obscurity.
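
To test the reported causes one at a time, a link checker could send GET instead of HEAD and set the two headers mentioned above. The sketch below only builds such a request with Python's standard library and does not send it. The helper name `build_profile_request` and both header values are assumptions for illustration, and nothing here is known to avoid the 999 response.

```python
import urllib.request


def build_profile_request(url: str) -> urllib.request.Request:
    """Build a GET request carrying the headers the reports mention (hypothetical helper)."""
    headers = {
        # Placeholder browser-like value; an assumption, not a known-good string.
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
        # "identity" asks for an uncompressed body, so no decompression step is needed.
        "Accept-Encoding": "identity",
    }
    # Explicit GET rather than HEAD, since HEAD is one of the suspected triggers.
    return urllib.request.Request(url, headers=headers, method="GET")


req = build_profile_request("https://nl.linkedin.com/in/jpcornelissen/nl")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; a non-2xx answer such as 999 would then surface as `urllib.error.HTTPError`, whose `code` attribute holds the status.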

Update:
It looks like this is a common occurrence. My investigation ended when I realized the resource wasn't available on the main LinkedIn servers either: when inspecting the profile badges page, the script we need also returns status code 999. I've opened a ticket; hopefully they will provide a more stable means with full HTML+CSS options in the future :)

410 gone error response from IIS

I've recently come across an issue with IIS responding with a 410 status code when a request's query string exceeds 500 characters.
The application is an ASP.NET web app using the classic managed pipeline.
I've already tried increasing maxRequestLength, maxUrlLength, and maxQueryStringLength to their maximum of 2097151, to no avail.
I don't understand why the server would respond with a 410 status code.
Any help or insight would be greatly appreciated.

Correct HTTP status code when resource is available but not accessible because of permissions

I am building a RESTful protocol for Dynamic Carpooling applications, for my Computer Science thesis.
In the Protocol I also have to formally specify the HTTP status code for each operation. I've got this "privacy related" problem. Suppose the following:
GET /api/persons/angela/location
Retrieves the current position of user "angela".
Obviously, not everybody should be able to obtain a result. Only Angela herself and a driver who is going to pick her up should be able to know it.
I cannot decide whether to return a 404 Not Found or a 401 Unauthorized here.
Any hints? What would be the best one and why?

According to Wikipedia (and RFC 2616), a 401 code is used when a page exists but requires authentication; 403 is for a page where authenticating won't change anything. (In the wild, 403 usually means the permissions on something are wrong, whereas a 401 will prompt the user for a username/password). 404 is for where the document simply doesn't exist.
In your case, it seems like 401 is the most appropriate code, since there is some way of authenticating the users who DO have access to the page.

If authorization credentials are provided in the request and the requester does not have permissions to access this resource then you should return 403.
If no authorization credentials are provided in the request then you should return 401.
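
The rule in this answer can be sketched as a small Python function. The name `access_status`, the set of allowed users, and the use of a plain username as "credentials" are all assumptions for illustration; a real service would validate a token or session instead.

```python
from typing import Optional, Set


def access_status(username: Optional[str], allowed: Set[str]) -> int:
    """Return the HTTP status for a request to a protected resource (hypothetical helper).

    username is None when the request carried no credentials at all.
    """
    if username is None:
        return 401  # no credentials supplied: ask the client to authenticate
    if username not in allowed:
        return 403  # authenticated, but not permitted to see this resource
    return 200      # authenticated and permitted


# Hypothetical access list for GET /api/persons/angela/location
allowed_users = {"angela", "driver_picking_up_angela"}

print(access_status(None, allowed_users))       # 401
print(access_status("bob", allowed_users))      # 403
print(access_status("angela", allowed_users))   # 200
```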

Definitely NOT 404. 404 is just Not Found.
401 is access denied.
403 is forbidden.
I would go with 401.

For my part, I would use 400 Bad Request.
My application never navigates to inaccessible resources programmatically.
Filtering by the user's permissions and hiding inaccessible resources is good user experience, in my opinion.
If my server receives a request for an inaccessible resource, it means someone is trying to do something they should not.
That is why I choose 400 Bad Request in my applications.

Unauthorized error when surfing to files

My IIS is configured to use WindowsAuthentication.
When I surf to the file:
I can see the file perfectly, but when I check with Charles (an HTTP debugger) I see the following result.
[Screenshot: http://img155.imageshack.us/img155/6428/capturea.jpg]
The problem is that when a browser does this, it retries (apparently up to 3 times), but when .NET (Spring.NET) tries this, it crashes after the first attempt, saying that I'm unauthorized.
Does anyone have a solution for this? I've been struggling with this problem for weeks now.

This looks normal to me. You can learn more about the conversation between client and server from this KB article:
http://support.microsoft.com/kb/264921
Then you will know why those 401 messages are there.
Regards,
