HTTP user agent to avoid cookie warning (YouTube)

I wrote my own RSS reader, and it is not able to discover feeds on YouTube. The reason is that YouTube redirects to a separate page for its cookie warning, and the RSS discovery URL is not on that page. I tested with curl and the redirect did not happen, so I changed the user agent in my application to curl/7.8.6.0, and now my app can handle YouTube discovery. The problem is solved, but this feels like a bad workaround. Best practice for a user agent string is
appname/version framework/version (comment)
or something similar, which is what I was using before. Now I am forced to lie to every website I visit about which application I am using, just to make one site work. I could write a bespoke discovery function just for YouTube and change the URL to youtube-nocookie, or change the user agent only for that request (see the sketch below), but that is extra work, and extra processing cycles on the user's machine for every discovery attempt.
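A per-request override would look roughly like this (a minimal sketch, not the code I actually ship):
// Spoof the user agent for this one request only; the client's default is untouched
var request = new HttpRequestMessage(HttpMethod.Get, url);
request.Headers.UserAgent.ParseAdd("curl/7.8.6.0");
HttpResponseMessage response = await this.http.SendAsync(request);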
Is there a standard way to inform a website that the request is coming from a piece of code and not a human? How do web crawlers deal with this?
I am using Godot Mono as the framework for the application; you can find the full source here. This question relates to the GetHtml() function in the DiscoveryService, which is a standard .NET HttpClient call:
// Fetch the page; give up on any non-success status code
HttpResponseMessage response = await this.http.GetAsync(url);
if (!response.IsSuccessStatusCode) return null;
// Return the body as a string (the HTML that is scanned for the feed link)
using HttpContent content = response.Content;
return await content.ReadAsStringAsync();
The HttpClient is instantiated in the constructor, and the user agent string is set there too.
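That setup looks roughly like this (again a sketch, not the exact code from the repo):
// In the constructor: this user agent is sent with every request the client makes
this.http = new HttpClient();
this.http.DefaultRequestHeaders.UserAgent.ParseAdd("porifera/1.0.0.0");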
If the user agent is set to anything other than curl/someversion, YouTube does not return the requested page and instead returns a page asking the user whether it is OK for it to save a bunch of cookies.
The issue can be reproduced easily enough with curl; just do
curl <some youtube channel url> | grep rss+xml
and
curl -A "porifera/1.0.0.0" <some youtube channel url> | grep rss+xml
porifera is the name of the application I am developing; it seems you can write pretty much anything there.

Related

How to get items from headers by learning from initiators, using requests in Python?

I am trying to get the fingerprint that can be seen in this snapshot.
I tried searching for the fingerprint, but it's not in the response or the cookies. I am wondering how this fingerprintjs works, so that I can imitate it and return the fingerprint item.
The website is https://alfagift.id/
When you take a look at the network tab, especially the categories requests, there's a preflight and an XHR initiated by https://alfagift.id/_nuxt/ca268e7.js
I've tried doing a request:
import requests
resp = requests.get("https://alfagift.id/")
resp.cookies
Nothing seems to be returning the fingerprint that's needed.
Can anyone show me how to get the fingerprint?
This file renders and executes the fingerprinting script on the client side: https://alfagift.id/_nuxt/f9d159c.js
Proof:
__fpjs_d_m||Math.random()>=.001))try{var t=new XMLHttpRequest;t.open("get","https://m1.openfpcdn.io/fingerprintjs/v3.3.3/npm-monitoring",!0),t.send()}catch(t){console.error(t)}}(),[4,vt(r)];case 1:return t.sent(),[2,gt(L(ft,{debug:n},
Used library: https://github.com/fingerprintjs/fingerprintjs
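Because the value is computed by JavaScript in the browser, a plain HTTP client will never see it; one way to observe it is to drive a real browser instead. A sketch in C# with Selenium, assuming the page exposes the FingerprintJS global (a webpack-bundled build may not, in which case you would have to inject the library yourself):
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

using var driver = new ChromeDriver();
driver.Navigate().GoToUrl("https://alfagift.id/");
// FingerprintJS v3 exposes load()/get(); run it in the page and hand back visitorId
var visitorId = (string)((IJavaScriptExecutor)driver).ExecuteAsyncScript(
    @"var done = arguments[arguments.length - 1];
      FingerprintJS.load().then(fp => fp.get()).then(r => done(r.visitorId));");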

Making an HTTP request with a blank user agent

I'm troubleshooting an issue that I think may be related to request filtering. Specifically, it seems every connection to a site made with a blank user agent string is shown a 403 error. I can generate other 403 errors on the server by doing things like trying to browse a directory with no default document while directory browsing is turned off. I can also generate a 403 error by using the Modify Headers extension for Google Chrome to set my user agent string to the Baidu spider string, which I know has been blocked.
What I can't seem to do is generate a request with a BLANK user agent string to try that. The extensions I've looked at require something in that field. Is there a tool or method I can use to make a GET or POST request to a website with a blank user agent string?
I recommend trying a CLI tool like cURL or a UI tool like Postman. You can carefully craft each header, parameter, and value that you place in your HTTP request, and fully trace the end-to-end request-response exchange.
This example, straight from the cURL docs on user agents, shows how you can play around with setting the user agent via the CLI.
curl --user-agent "Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)" [URL]
In Postman it's just as easy: tinker with the headers and params as needed. You can also click the "Code" link on the right-hand side and view the request as HTTP when you want to see what will actually be sent.
You can also use a heap of other HTTP tools, such as Paw and Insomnia, all of which are well suited to the task at hand.
One last tip: in the Chrome dev tools, you can right-click a specific request in the Network tab and copy it as cURL. You can then paste the cURL command and modify it as needed. In Postman you can import a request by pasting raw text, and Postman will interpret the cURL command for you, which is particularly handy.
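On the blank-agent point specifically: with curl you can send the header with an empty value by terminating it with a semicolon (-H "User-Agent;"), or strip it entirely with -H "User-Agent:". If a short script is easier, note that .NET's HttpClient adds no User-Agent header at all unless you set one; a minimal sketch (the URL is a placeholder):
using System;
using System.Net.Http;

using var client = new HttpClient();
// HttpClient sends no User-Agent by default; add an empty one explicitly
// in case the filter distinguishes a blank header from a missing one
client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "");
HttpResponseMessage response = await client.GetAsync("https://example.com/");
Console.WriteLine((int)response.StatusCode); // expect 403 if blank agents are filtered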

Connection to a site made by Http Post - how to check if connection is still alive?

I'm developing a scraping app to extract some information from a site. To get that information I have to be logged in to that site.
So I use HTTP POST to pass the data needed for login using FormData, log in successfully, and can then browse the private content of the site.
My question is: how can I tell whether the user is still logged in? What is the simplest way to do that, using session cookies or something like that?
I'm currently checking the connection by sending an HTTP GET request to a URL that I know is only available to registered users.
So before I try to log in again, I use this isLoggedIn method to check the connection. But it is not perfect; it seems kind of tricky and not the best way to do it.
Currently I'm using Dio, a library for making HTTP requests in Dart, but I think this is a general HTTP matter.
Just for the record...
I solved it after checking the difference between a 'logged in' and a 'not logged in' response. In my specific case, when I did a GET request to the login page, the response had a 'CUSTOMER_AUTH' cookie set to a random string; otherwise, this cookie was not present.
So I just check whether this cookie is present and has a valid value.
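The same check works in any HTTP stack that exposes the cookie jar. A sketch in C# (the login URL and form field names are hypothetical):
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;

var handler = new HttpClientHandler { CookieContainer = new CookieContainer() };
using var client = new HttpClient(handler);
// Log in first; the field names here are made up for illustration
await client.PostAsync("https://example.com/login", new FormUrlEncodedContent(
    new Dictionary<string, string> { ["email"] = "user@example.com", ["password"] = "secret" }));
// Logged in if and only if the session cookie came back with a non-empty value
var auth = handler.CookieContainer.GetCookies(new Uri("https://example.com/"))["CUSTOMER_AUTH"];
bool isLoggedIn = auth != null && auth.Value.Length > 0;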

Invalid HTTP Request - linkedin.com/oauth/v2/authorization

Suddenly LinkedIn OAuth2 stopped working! I was following the instructions found here:
https://developer.linkedin.com/docs/oauth2
When invoking this:
https://www.linkedin.com/oauth/v2/authorization?response_type=code&client_id=75jdo0an3ktnbx&redirect_uri=https://app.myapp.com/account/linkedin_login&state=fregfdgfasd&scope=r_basicprofile%20r_emailaddress
Instead of a valid response I get a 400 error:
LinkedIn
Invalid HTTP Request
Could not process this client request HTTP method request for URL. Please double-check the URL (address) you used, or contact us if you feel you have reached this page in error.
I am experiencing the same problem using Chrome, but not with Edge or Firefox. I contacted LinkedIn; the reply was that they are working on it, with no estimate of when it will be solved. The new profile update seems to be botched in Chrome, OK with Edge, and still not updated to the new look when using Firefox.
LinkedIn has problems far deeper than poor coding; they have forgotten the meaning of being social in networking. The site is becoming a pile of stale resumes, non-existent debates, and bad-quality networking.
I am not fluent enough in OAuth to tell you why, but they have two different systems: OAuth and OAuth legacy.
I personally couldn't find a way to retrieve a valid token from OAuth, but I could from OAuth legacy. The main difference is the URL and the authorization window.
You are currently using https://www.linkedin.com/oauth/v2 for your API calls.
OAuth legacy uses https://www.linkedin.com/uas/oauth2.
The whole process is the same, so you won't have to change your code, just the URL.
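For example, the authorization request above would presumably become:
https://www.linkedin.com/uas/oauth2/authorization?response_type=code&client_id=75jdo0an3ktnbx&redirect_uri=https://app.myapp.com/account/linkedin_login&state=fregfdgfasd&scope=r_basicprofile%20r_emailaddress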
See the OAuth legacy docs: linkedin.com/docs/oauth2-legacy
The bad side is the authorization window: the user has to literally log in (email + password) before clicking on the 'Authorized' button and being redirected to your callback URL.
I agree, this website has something buggy. When visited from France (browser language set to fr-FR and an IP geolocated in France), their whole interface is written in Dutch...
Anyway, I hope it helps.

What will the RightSignature API send to my callback URL when a signer signs a document

When I send a one-off document to RightSignature via their API, I specify a callback location in the XML document, as laid out in RightSignature's schema definition. I then get a signer-link value back from their API for the document, and I display the HTML response from the signer-link URL in an iframe on our website. When our user signs the document in this iframe, which is rendering the responses from their website, I want their website to post to our callback location.
Can I do this with the RightSignature API and does it make sense?
So far, I'm only getting content in the iFrame that indicates that the signing was successful. The callback location does not seem to be getting called.
I just got it solved. Basically, I was doing two things wrong. First, you have to go into your RightSignature account and set the callback URL there:
Account > Settings > Advanced Settings
But the thing RS fails to mention is that this URL cannot be localhost; it has to be an HTTPS URL, i.e. a live URL for your site, like
https://stagingmysite.azurewebsites.net/User/CallBackFunction
Then, in your callback, just write these two lines and you will receive the complete XML, which contains the GUID and the document status as well.
// Read the raw bytes RightSignature POSTs to the callback URL
byte[] data = Request.BinaryRead(Request.TotalBytes);
// Decode them into the XML payload described above
string callBackXML = System.Text.Encoding.UTF8.GetString(data);
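From there, pulling the fields out is straightforward; a sketch with XDocument (the element names are guesses, so inspect the payload you actually receive):
using System.Xml.Linq;

XDocument doc = XDocument.Parse(callBackXML);
// Hypothetical element names; adjust to the real schema
string guid = doc.Root?.Element("guid")?.Value;
string status = doc.Root?.Element("status")?.Value;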
I found the answer with some help from the API team at RightSignature. I was using callback_location, but what I really wanted was redirect_location. Their online documentation was difficult to follow and did not clearly point out the difference.
I got this working after a lot of trial and error.
