How to curl or wget a web page? - http

I would like to make a nightly cron job that fetches my stackoverflow page and diffs it from the previous day's page, so I can see a change summary of my questions, answers, ranking, etc.
Unfortunately, I couldn't get the right set of cookies, etc., to make this work. Any ideas?
Also, when the beta is finished, will my status page be accessible without logging in?

Your status page is available now without logging in (click logout and try it). When the beta-cookie is disabled, there will be nothing between you and your status page.
For wget:
wget --no-cookies --header "Cookie: soba=(LookItUpYourself)" https://stackoverflow.com/users/30/myProfile.html

From Mark Harrison
And here's what works...
curl -s --cookie soba=. https://stackoverflow.com/users
And for wget:
wget --no-cookies --header "Cookie: soba=(LookItUpYourself)" https://stackoverflow.com/users/30/myProfile.html
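A script can attach the same hand-set cookie. Below is a minimal Python sketch that uses only the standard library. It assumes the site still expects the `soba=.` value from the curl answer. The helper name `build_profile_request` is hypothetical, and the request is only built here, not sent.

```python
import urllib.request

def build_profile_request(url: str, cookie: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying a raw Cookie header,
    roughly what `curl --cookie soba=. URL` or wget's --header does."""
    return urllib.request.Request(url, headers={"Cookie": cookie})

req = build_profile_request("https://stackoverflow.com/users", "soba=.")
print(req.get_method(), req.full_url, req.get_header("Cookie"))
# To actually fetch the page: urllib.request.urlopen(req).read()
```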

Nice idea :)
I presume you've used wget's --load-cookies (filename) option? It might help a little, but it might be easier to use something like Mechanize (in Perl or Python) to mimic a browser more fully and get a good spider.
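wget's --load-cookies reads a Netscape-format cookies.txt file. Python's standard `http.cookiejar.MozillaCookieJar` reads the same format, so a script can reuse cookies exported from a browser or saved by wget/curl. Below is a minimal sketch. The sample file, its `EXAMPLE` cookie value, and the helper `load_cookie_header` are all hypothetical.

```python
import os
import tempfile
import urllib.request
from http.cookiejar import MozillaCookieJar
from typing import Optional

# Hypothetical sample in Netscape cookies.txt format (tab-separated):
# domain, include-subdomains flag, path, secure, expiry (epoch), name, value
SAMPLE = (
    "# Netscape HTTP Cookie File\n"
    "stackoverflow.com\tFALSE\t/\tFALSE\t4102444800\tsoba\tEXAMPLE\n"
)

def load_cookie_header(cookie_file: str, url: str) -> Optional[str]:
    """Load cookies.txt (like wget --load-cookies) and return the Cookie
    header that would be sent to `url`, or None if no cookie matches."""
    jar = MozillaCookieJar(cookie_file)
    jar.load()  # raises http.cookiejar.LoadError if the format is wrong
    req = urllib.request.Request(url)
    jar.add_cookie_header(req)  # adds a Cookie header only for matching cookies
    return req.get_header("Cookie")

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(SAMPLE)
    path = f.name
try:
    print(load_cookie_header(path, "https://stackoverflow.com/users/30"))
finally:
    os.remove(path)
```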

I couldn't figure out how to get the cookies to work either, but I was able to get to my status page in my browser while I was logged out, so I assume this will work once stackoverflow goes public.
This is an interesting idea, but won't you also pick up diffs of the underlying html code? Do you have a strategy to avoid ending up with a diff of the html and not the actual content?
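One way to handle that concern is to diff only the visible text, not the raw markup. Below is a minimal sketch using Python's standard `html.parser` and `difflib`. It assumes each night's page has already been saved (for example by one of the curl commands above). The helper names are hypothetical, and `<script>`/`<style>` contents are skipped because they are not visible text.

```python
import difflib
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> contents."""
    def __init__(self) -> None:
        super().__init__()
        self.lines = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace and newlines
        if text and not self._skip:
            self.lines.append(text)

def visible_text(html: str) -> list:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.lines

def content_diff(old_html: str, new_html: str) -> list:
    """Unified diff of the visible text of two page snapshots."""
    return list(difflib.unified_diff(
        visible_text(old_html), visible_text(new_html),
        fromfile="yesterday", tofile="today", lineterm=""))

yesterday = "<div class='a'><span>Reputation</span> <b>120</b></div>"
today = "<div class='b'><span>Reputation</span>\n<b>135</b></div>"
for line in content_diff(yesterday, today):
    print(line)
```

The class-attribute change between the two snapshots produces no output. Only the number that changed shows up.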

And here's what works...
curl -s --cookie soba=. http://stackoverflow.com/users

Related

What is my Gitlab domain for making API calls?

I feel reaaally silly for asking this, but how do I know what my GitLab domain is when all my projects are on GitLab.com? Let's say I want to make an API call to get all my projects, which according to the docs is done like this:
curl --header "PRIVATE-TOKEN: XXXX" "https://gitlab.example.com/api/v4/projects"
I tried replacing example with my username and also tried specifying the target as https://gitlab.com/username/api/v4/projects but this doesn't work.
Any help is greatly appreciated.
It's actually just:
https://gitlab.com/api/v4/projects
Your private token will be used to figure out who you are and what projects you can access.
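The same call can be built from a script. This is a sketch, not an official client. According to GitLab's Projects API documentation, a bare `/projects` on GitLab.com lists every project visible to you, public ones included, so the sketch adds `membership=true` to limit the result to projects you belong to (`owned=true` is the stricter alternative). `XXXX` stands in for your token, and `projects_request` is a hypothetical helper. Nothing is sent.

```python
import urllib.parse
import urllib.request

GITLAB_API = "https://gitlab.com/api/v4"

def projects_request(token: str, membership: bool = True) -> urllib.request.Request:
    """Build (not send) GET /projects on GitLab.com, authenticated with the
    PRIVATE-TOKEN header as in the curl command from the question."""
    url = f"{GITLAB_API}/projects"
    if membership:
        # Limit results to projects the token's user is a member of.
        url += "?" + urllib.parse.urlencode({"membership": "true"})
    return urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})

req = projects_request("XXXX")
print(req.full_url)
# Sending it: json.load(urllib.request.urlopen(req)) should give a JSON list.
```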

Run a cURL command in a browser

How do I run a cURL command like this
curl -X GET http://www.in.com/
I am using Windows 8.
How do I run this inside a browser, say using the Network tab or any other tab?
I know that there is an option in the Network tab to copy a request as a cURL command, but I want to execute it right there in Firefox, not in cmd/terminal.
What do I do to modify that command and execute it in the Network tab itself?
And no, I am not asking for sites like http://onlinecurl.com/ which let you execute curl commands online.
Is it possible?
I have tried Firebug in Firefox, but according to my research it does not have this option.
If yes, please tell me how.
Thanks in advance!
Why not just put the URL in the address field for the browser? The browser is going to do a "GET" request on this URL and return the results. This is what curl is doing. If you are trying to do a "PUT" or a "POST" request, then you would need to do something different, but for "GET", it should just work.
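An HTTP client library picks the request method in a similar way, which shows why only a GET maps onto the address bar. Below is a small Python sketch that builds requests without sending them. The form field `q` is made up.

```python
import urllib.parse
import urllib.request

url = "http://www.in.com/"

# No body: urllib sends a GET, like typing the URL into the address bar
# or running `curl -X GET http://www.in.com/`.
get_req = urllib.request.Request(url)

# With a body: urllib switches to POST, much like `curl --data ...`.
body = urllib.parse.urlencode({"q": "example"}).encode()  # made-up field
post_req = urllib.request.Request(url, data=body)

# Other verbs must be named explicitly, like `curl -X PUT`.
put_req = urllib.request.Request(url, data=body, method="PUT")

for r in (get_req, post_req, put_req):
    print(r.get_method(), r.full_url)
```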

CURL command not working with simple HTTP GET but browser does

I tried to fetch the data from https://m.jetstar.com/Ink.API/api/flightAvailability?LocaleKey=en_AU&ChildPaxCount=0&DepartureDate=2016-03-21T00%3A00%3A00&ModeSaleCode=&Destination=NGO&CurrencyCode=TWD&AdultPaxCount=1&ReturnDate=&InfantPaxCount=0&Origin=TPE
but it couldn't be done with curl -vv https://m.jetstar.com/Ink.API/api/flightAvailability?LocaleKey=en_AU&ChildPaxCount=0&DepartureDate=2016-03-21T00%3A00%3A00&ModeSaleCode=&Destination=NGO&CurrencyCode=TWD&AdultPaxCount=1&ReturnDate=&InfantPaxCount=0&Origin=TPE, which returns nothing.
However, the browser can fetch the whole data.
What's wrong with that?
It seems to me that "m.jetstar.com" is filtering out requests that don't include the headers a browser would send. Your curl command needs to fully emulate a browser to get the data. To see what I mean, open the developer tools in Google Chrome, select the Network tab, and load the URL in the browser. Then go to the row for that call, right-click it, and copy the request as a curl command. Paste it into a text editor and you'll see all the additional headers you need. That copied curl command should also work as-is.
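A further point is worth checking first. In a Unix-like shell (and in Windows cmd, where `&` separates commands), the unquoted `&` characters in the question's command end it early, so curl only receives `...?LocaleKey=en_AU`. Wrapping the URL in single quotes in a Unix shell avoids that. The sketch below builds the full URL programmatically and attaches browser-like headers. The header values are placeholders, not what Jetstar actually requires; copy the real ones from DevTools.

```python
import urllib.parse
import urllib.request

BASE = "https://m.jetstar.com/Ink.API/api/flightAvailability"

# Query parameters from the question, in the same order.
PARAMS = {
    "LocaleKey": "en_AU", "ChildPaxCount": "0",
    "DepartureDate": "2016-03-21T00:00:00", "ModeSaleCode": "",
    "Destination": "NGO", "CurrencyCode": "TWD", "AdultPaxCount": "1",
    "ReturnDate": "", "InfantPaxCount": "0", "Origin": "TPE",
}

# Placeholder browser-like headers (assumption): replace them with the
# ones shown by "Copy as cURL" in the browser's Network tab.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "application/json, text/plain, */*",
    "Accept-Language": "en-US,en;q=0.9",
}

def build_request() -> urllib.request.Request:
    # urlencode escapes ':' as %3A, matching the URL in the question.
    url = BASE + "?" + urllib.parse.urlencode(PARAMS)
    return urllib.request.Request(url, headers=BROWSER_HEADERS)

req = build_request()
print(req.full_url)
```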
Check whether you have set any proxy environment variable (such as http_proxy or HTTPS_PROXY). You can verify this by running curl in verbose mode: curl -v
I had set up such a variable earlier, and when I checked the curl output in verbose mode it told me it was connecting to the proxy address. Once I deleted the variable from Advanced System Settings, it started working. Hope it helps.
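A quick first check is to list every environment variable that follows the `*_proxy` naming convention. Below is a minimal Python sketch. Each tool applies its own lookup rules; curl, for example, reads `http_proxy` only in lowercase. So treat the output as a list of candidates, not a reproduction of curl's logic. The helper name `proxy_settings` is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

def proxy_settings(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return non-empty variables whose names end in _proxy, case-insensitively
    (http_proxy, HTTPS_PROXY, ALL_PROXY, NO_PROXY, ...)."""
    env = os.environ if environ is None else environ
    return {name: value for name, value in env.items()
            if value and name.lower().endswith("_proxy")}

for name, value in proxy_settings().items():
    print(f"{name}={value}")
```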

how to get the link names of a webpage using cURL command

I am using Postman, a Google Chrome add-on that makes cURL commands, and I make a GET request with a website URL. Here is an example of my question: on a website like Google, if I search for "stackoverflow", take the resulting URL, and make my cURL command with it, how can I get the names of each link? Is that possible? For example, for that page there would be "Stack Overflow" ... "Stack Overflow - Wikipédia"...
I found out the answer by myself.
I installed Postman Interceptor, activated it, and opened the website in Google Chrome. The Interceptor sent all the requests made to load that website to Postman, so all I had to do was look through them, and I found the information I was looking for in one of them.
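If you already have the page's HTML (from curl or Postman), getting the link names is a parsing job. Below is a minimal Python sketch using the standard `html.parser`. The sample HTML is made up. Real search-result pages may build their links with JavaScript, change their markup often, or block automated requests, so the parser may find less than the browser shows.

```python
from html.parser import HTMLParser

class LinkTextParser(HTMLParser):
    """Collect (text, href) for every <a href=...> element."""
    def __init__(self) -> None:
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None for <a> without href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # normalize spaces
            self.links.append((text, self._href))
            self._href = None

def link_names(html: str) -> list:
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    return parser.links

sample = ('<a href="https://stackoverflow.com/">Stack <b>Overflow</b></a> '
          '<a href="https://fr.wikipedia.org/wiki/Stack_Overflow">'
          'Stack Overflow - Wikipédia</a>')
for text, href in link_names(sample):
    print(text, "->", href)
```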

Record http form posts via a browser

I'm trying to automate the login to a website and submission of a form.
Is there a browser plugin (for firefox or Chrome) that allows you to record HTTP GET and POST requests in a form that allows them to be played back at a later point? I'm looking for something that will be possible to automate from a script e.g. via curl or wget.
I've tried using the Chrome developer tools to capture POST form data, but I get errors when trying to replicate the request with wget, which suggests I'm missing some cookies or other parameters. Ideally there would be a nice automated way of doing this rather than lots of trial and error.
For a simple interaction, you don't really need a tool like Selenium that will record and playback requests.
You only need the tools you've already mentioned:
Chrome already comes with the Developer Tools that you need: use the Network tab. No plugin to download. I don't know if Safari will work -- I don't see a "Network" tab in its Developer Tools.
Both curl and wget support cookies and POST data, but I've only tried curl for automation.
There are several key steps that need to be done properly (this takes some experience):
The sequence of pages that are requested needs to model real user interaction. This is important because you have no idea exactly how the backend handles forms or authentication. This is where the Network tab of Chrome's Developer Tools comes in. (Note that there is a "Preserve log" option that keeps the log from being cleared when the page navigates.) When you prepare to log a real user interaction for your analysis, don't forget to clear your cookies at the beginning of each session.
You need to use all the proper options of curl and wget that will ensure that cookies and redirects are properly processed.
All POST form fields will likely need to be sent (you'll often see fields with nonce values to prevent CSRF).
Here's a sample of 3 curl calls from an automation script I wrote to download broadband usage from my ISP:
curl \
--silent \
--location \
--user-agent "$USER_AGENT" \
--cookie-jar "$COOKIES_PATH.txt" \
'https://idp.optusnet.com.au/idp/optus/Authn/Service?spEntityID=https%3A%2F%2Fwww.optuszoo.com.au%2Fshibboleth&j_principal_type=ISP' >"$USAGE_PATH-1.html" 2>&1 && sleep 3 &&
# --location because the previous request returns with a series of redirects "302 Moved Temporarily" or "302 Found"
curl \
--silent \
--location \
--user-agent "$USER_AGENT" \
--cookie "$COOKIES_PATH.txt" \
--cookie-jar "$COOKIES_PATH.txt" \
--referer 'https://idp.optusnet.com.au/idp/optus/Authn/Service?spEntityID=https%3A%2F%2Fwww.optuszoo.com.au%2Fshibboleth&j_principal_type=ISP' \
--data "spEntityID=https://www.optuszoo.com.au/shibboleth&j_principal_type=ISP" \
--data-urlencode "j_username=$OPTUS_USERNAME" \
--data-urlencode "j_password=$OPTUS_PASSWORD" \
--data "j_security_check=true" \
'https://idp.optusnet.com.au/idp/optus/Authn/Service' >"$USAGE_PATH-2.html" 2>&1 && sleep 1 &&
curl \
--silent \
--location \
--user-agent "$USER_AGENT" \
--cookie "$COOKIES_PATH.txt" \
--cookie-jar "$COOKIES_PATH.txt" \
--referer 'https://www.optuszoo.com.au/' \
'https://www.optuszoo.com.au//r/ffmu' >"$USAGE_PATH-3.html" 2>/dev/null
Note the careful use of --cookie-jar, --cookie, and --location. The sleeps, --user-agent, and --referer may not be necessary (the backend may not check) but they're simple enough that I include them to minimize the chance of errors.
In this example, I was lucky that there were no dynamic POST fields, e.g. anti-CSRF nonce fields, that I would have had to extract and pass on to a subsequent request. That's because this automation is for authentication. For automating other types of web interactions, after the user's already logged in, you're likely to run into more of these dynamically-generated fields.
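When a form does contain a dynamic anti-CSRF nonce, its value has to be scraped from the previous response and sent back with the POST. The Python sketch below shows that step. It also builds a cookie-keeping opener that plays roughly the role of the `--cookie`/`--cookie-jar`/`--location` combination. This swaps curl for Python's standard library. The form markup and the field names `csrf_token`, `username`, and `password` are made up, and `form_action_url` is a hypothetical placeholder.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser

class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> fields, such as
    anti-CSRF nonces that must be echoed back in the next POST."""
    def __init__(self) -> None:
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "input" and (a.get("type") or "").lower() == "hidden"
                and a.get("name")):
            self.fields[a["name"]] = a.get("value") or ""

def hidden_fields(html: str) -> dict:
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields

def make_opener(jar: http.cookiejar.CookieJar) -> urllib.request.OpenerDirector:
    # Cookies from responses are stored in `jar` and re-sent on later requests
    # (like --cookie-jar plus --cookie); redirects are followed by default
    # (like --location).
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Hypothetical login form; real field names come from the Network tab.
form = ('<form method="post"><input type="hidden" name="csrf_token" value="a1b2c3">'
        '<input type="text" name="username"></form>')
body = urllib.parse.urlencode(
    {**hidden_fields(form), "username": "me", "password": "secret"}).encode()
print(body)
# Against a live site: make_opener(jar).open(form_action_url, data=body)
```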
Not exactly a browser plugin, but Fiddler can capture all the HTTP data passing back and forth; with FiddlerScript or FiddlerCore, it is then simple to export that into a text file - and pass that into cURL as request headers and request body.
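Once a captured request has been exported as raw HTTP text, turning it into curl arguments is mostly string handling. The Python sketch below does this conversion. It is a generic converter, not FiddlerScript or FiddlerCore. It assumes an HTTP/1.x request with a `Host` header and leaves `Host` and `Content-Length` for curl to compute. The helper name `raw_request_to_curl` and the sample request are hypothetical.

```python
import shlex

def raw_request_to_curl(raw: str, scheme: str = "https") -> str:
    """Turn a raw HTTP/1.x request (CRLF or LF line endings) into an
    equivalent, shell-quoted curl command line."""
    raw = raw.replace("\r\n", "\n")
    head, _, body = raw.partition("\n\n")
    request_line, *header_lines = head.split("\n")
    method, target, _version = request_line.split(" ", 2)
    headers = [line for line in header_lines if line.strip()]
    host = next(line.split(":", 1)[1].strip() for line in headers
                if line.lower().startswith("host:"))
    args = ["curl", "-X", method, f"{scheme}://{host}{target}"]
    for line in headers:
        # curl sets Host and Content-Length itself.
        if not line.lower().startswith(("host:", "content-length:")):
            args += ["-H", line]
    if body:
        args += ["--data-binary", body]
    return shlex.join(args)

sample = ("POST /login HTTP/1.1\r\nHost: example.com\r\n"
          "Content-Type: application/x-www-form-urlencoded\r\n"
          "Content-Length: 17\r\n\r\nuser=me&pass=1234")
print(raw_request_to_curl(sample))
```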
In Firefox, turn on the Persist option in Firebug to be sure to capture the POST. Then install and use the "Bookmark POST" add-on to bookmark the POST request for later use.
Firefox Firebug already has a feature which allows you to copy a web request as a curl request, so you see all the various elements of the request on the command line.
Turn on Firebug, right-click a request in the Net panel and pick Copy as cURL. Then paste the command into a terminal and run it with curl.
https://hacks.mozilla.org/2013/08/firebug-1-12-new-features/#copyAsCURL
Have you tried Selenium?
There are many methods to choose from:
1. Use Firefox and Selenium IDE. It can record your browser actions.
2. Use Selenium WebDriver. It can simulate different browser actions from a script you write in Ruby or Java.
3. Use a macro plugin for Firefox to simulate absolute clicks and keypresses.
4. Use an OS-level macro application and do the same as 3.
5. Write a script (in PHP, for example) to simulate the actual form posts and cookie interactions.
No.1 is common and easy to use.
No.4 can be powerful, but you need time to polish the automation.
No.3 sits between No.1 and No.4.
No.2 can also serve as a tool for environment tests and stress tests.
No.5 seems the most flexible and resource-saving.
The Request Maker Chrome plugin does that.
https://chrome.google.com/webstore/detail/request-maker/kajfghlhfkcocafkcjlajldicbikpgnp?hl=en
The Safari developer tools and Firebug are sufficient for your needs.
Recently I came across this beautiful Chrome extension, which does what you ask:
Katalon Recorder
Katalon Recorder will make your test automation work a lot easier.
Record, play, debug with speed control, pause/resume, breakpoints capabilities.
Enjoy fastest execution speed compared to other extensions with Selenium 3 core engine.
Make use of multiple locator types including XPath & CSS.
Use original Selenium IDE commands (Selenese), plus block statement if...elseIf...else...endIf and while...endWhile. Testing file input control is supported.
Import test data from CSV files for data-driven testing.
Report easily with logs, screenshots capturing, with historical data and analytics from Katalon Analytics.
Compose & organize test cases in suites. Never get your work lost with autosave feature.
Import original Selenium IDE (Firefox extension) tests.
Export to Selenium WebDriver scripts in these frameworks: C# (MSTest and NUnit), Java (TestNG and JUnit), Ruby (RSpec), Python (unittest), Groovy (Katalon Studio), Robot Framework, and XML.
