How to access https page via wget or curl? - unix

Let's say I want to save the contents of my Facebook page. Facebook uses HTTPS (and therefore SSL/TLS), so how do I download the contents of a secure page using wget?
I found a lot of sources online and modified my command accordingly, but it doesn't save the page I want:
wget --secure-protocol=auto "https://www.facebook.com/USERNAMEHERE" -O index.html
This is what I actually get in index.html:
"Update Your Browser
You’re using a web browser that isn’t supported by Facebook.
To get a better experience, go to one of these sites and get the latest version of your preferred browser:"

The problem is not SSL/HTTPS. The problem is that Facebook sees "wget" as the user agent and responds with "Update Your Browser".
You have to fool Facebook by using the --user-agent switch to imitate a modern browser.
wget --user-agent="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1" https://facebook.com/USERNAME -O index.html
Then you will see the actual Facebook page when you open index.html in a modern browser.
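The same trick works with curl, whose -A (--user-agent) option sets the User-Agent header. A minimal sketch, wrapped in a hypothetical helper function fetch_as_browser; the Firefox string is simply the one from the wget command above, and following redirects with -L is an assumption about what the page needs:

```shell
#!/bin/sh
# Browser identity copied from the wget example above.
BROWSER_UA="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"

# fetch_as_browser URL OUTFILE  (hypothetical helper name)
# -sS: no progress bar, but still print errors
# -L:  follow redirects
# -A:  send BROWSER_UA instead of curl's default "curl/x.y.z" agent
# -o:  write the response body to OUTFILE
fetch_as_browser() {
    curl -sS -L -A "$BROWSER_UA" -o "$2" "$1"
}
```

Usage would then be: fetch_as_browser "https://www.facebook.com/USERNAME" index.html. Changing the user agent only changes which page the server chooses to send; content that requires a login still needs the session cookies.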

Related

Simulating a crawler on my website

I need to debug my web app, which is written in ASP.NET, to find out how it behaves when rendering content for crawlers such as Googlebot. The first things I found were some online/offline tools, but none of them could get past the Request.Browser.IsCrawler check.
Then I tried sending a hand-made request with the Googlebot User-Agent, but still had no luck.
I used Telerik Fiddler and Chrome, set the User-Agent to Googlebot/2.1 (+http://www.googlebot.com/bot.html), included _escaped_fragment_ in the URI, and successfully saw the page from the crawler's perspective.
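The same request can also be reproduced from the command line with curl. The sketch below assumes the site uses the old AJAX crawling scheme, in which a #! fragment was sent to the server as an _escaped_fragment_ query parameter. The helper names are hypothetical, and escaped_fragment_url does not percent-encode special characters in the fragment (the scheme required that for characters such as & or %), so it only suits simple fragments:

```shell
#!/bin/sh
GOOGLEBOT_UA="Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

# escaped_fragment_url URL  (hypothetical helper)
# Rewrites http://host/page#!state into http://host/page?_escaped_fragment_=state,
# or uses "&" if the URL already has a query string.
# URLs without "#!" are printed unchanged. No percent-encoding is done.
escaped_fragment_url() {
    base=${1%%#!*}
    if [ "$base" = "$1" ]; then
        printf '%s\n' "$1"
        return
    fi
    frag=${1#*#!}
    case $base in
        *\?*) sep='&' ;;
        *)    sep='?' ;;
    esac
    printf '%s%s_escaped_fragment_=%s\n' "$base" "$sep" "$frag"
}

# fetch_as_googlebot URL  (hypothetical helper)
# Prints the page as served to a client claiming to be Googlebot.
fetch_as_googlebot() {
    curl -sS -A "$GOOGLEBOT_UA" "$(escaped_fragment_url "$1")"
}
```

Whether Request.Browser.IsCrawler becomes true for such a request depends on how ASP.NET's browser definitions match the User-Agent, so this only reproduces the request itself.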

How to log in to a website and download a file on UNIX (AIX 6.1)

I need to log in to a website that requires credentials. After that I need to click on a link (the link URL changes every day, but there is only ever one link on the page). Once the link is clicked, the file is downloaded. I achieved this with Selenium on Windows, but I'm not sure how to do it on UNIX. From searching Google, I understood that we can use wget or curl, but we need to specify the URL directly, and I don't know it before logging in.
Here is a "hand-wavy" answer.
Use curl or wget to get the first page. It will be a text file.
Use tools like sed and grep to pull out the URL. Then use curl / wget to get the target file. If it is a simple page with one link, this should be rather easy to do.
You can get curl / wget from perzl.
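A minimal sketch of those steps with curl, assuming a plain HTML login form that sets a session cookie. The URLs, the form field names username and password, and the helper names are hypothetical placeholders; read the real values from the login page's form (forms with hidden fields such as CSRF tokens need those fields posted too). The link extraction uses sed rather than grep -o, since not every grep supports -o, and it assumes the link is written as href="..." with an absolute URL:

```shell
#!/bin/sh
# Hypothetical placeholders: replace with the real site's values.
LOGIN_URL="https://example.com/login"
PAGE_URL="https://example.com/downloads"
COOKIE_JAR="${TMPDIR:-/tmp}/site_cookies.txt"

# login USER PASSWORD: POST the login form and store the session cookie.
# "username" and "password" are assumed form field names.
login() {
    curl -sS -c "$COOKIE_JAR" -b "$COOKIE_JAR" \
        --data-urlencode "username=$1" --data-urlencode "password=$2" \
        "$LOGIN_URL" > /dev/null
}

# extract_first_link: read HTML on stdin, print the first href="..." value.
# If one line holds several links, the greedy .* picks the last of them,
# which is fine for a page with a single link.
extract_first_link() {
    sed -n 's/.*href="\([^"]*\)".*/\1/p' | head -n 1
}

# download_linked_file OUTFILE: fetch the page with the session cookie,
# pull out the link and download its target. Relative links are not resolved.
download_linked_file() {
    link=$(curl -sS -b "$COOKIE_JAR" "$PAGE_URL" | extract_first_link)
    [ -n "$link" ] || { echo "no link found" >&2; return 1; }
    curl -sS -b "$COOKIE_JAR" -o "$1" "$link"
}
```

Usage would then be: login myuser mypassword && download_linked_file report.csv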

Visual Website Optimizer A/B Testing On Local Machine

I have my app running on my local server, and I am trying to A/B test a feature with Visual Website Optimizer (VWO).
I put the server address (not publicly available) in the preview URL, but when I open the preview page it shows the warning "Error: Cookie could not be set." Cookies are enabled.
My question is, should the preview page (or default campaign url) be publicly reachable for VWO to work properly?
To run VWO on localhost, you just need to make a small adjustment to the /etc/hosts file: add "localhost.com" at the end of the line that reads "127.0.0.1 localhost".
Now you can run VWO on any URL that starts with http://localhost.com.
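After the change, the relevant line of /etc/hosts reads 127.0.0.1 localhost localhost.com. A small sketch to confirm the alias is in place; the function name is hypothetical, and it only inspects the hosts file itself, not other name sources such as DNS:

```shell
#!/bin/sh
# has_localhost_alias [HOSTS_FILE]  (hypothetical helper, defaults to /etc/hosts)
# Succeeds if some line whose first field is 127.0.0.1 also lists localhost.com.
# Lines commented out with a leading "#" are ignored, because their
# first field is then "#127.0.0.1" rather than "127.0.0.1".
has_localhost_alias() {
    awk '$1 == "127.0.0.1" {
             for (i = 2; i <= NF; i++)
                 if ($i == "localhost.com") found = 1
         }
         END { exit !found }' "${1:-/etc/hosts}"
}
```

For example: has_localhost_alias && echo "localhost.com points to 127.0.0.1"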
The page URL needs to be publicly reachable only for screenshots of the variations. Live previews should work even if the URL is accessible only locally, though you should use a valid URL, i.e. one with a valid domain name and TLD.
For any other information or debugging help, please email support (at) wingify (dot) com, if you haven't already, so that an exact solution can be shared.
This is probably because you are opening your website as an HTML file from your file system. You can host your static website locally with HarpJS instead. Install it through npm:
npm install -g harp
Go to the root directory of your site and run harp server. You'll then be able to open your site at http://localhost:9000.
I have also written an article on how to develop experiments for Optimizely/VWO more easily: http://bit.ly/1C4vtSA

WordPress: How to load HTTP asset files on a site served over HTTPS?

I'm using the WordPress HTTPS plugin to force the admin area to run under HTTPS. It works fine for the admin panel.
However, once I'm in HTTPS mode, all the front-end pages are broken: some of their asset files are served over plain HTTP (without the 'S'), so the browser blocks them from loading.
As a result, the pages render messily.
To be clearer:
When I open the site in HTTPS/SSL mode, some asset files, such as:
http://www.my-another-site.com/something.js
http://www.my-another-site.com/something.css
http://www.my-another-site.com/something.jpg
... etc
are BROKEN (because I'm in HTTPS mode and the files above are served over HTTP).
So how do I make WordPress FORCE LOAD those files, whatever they are? (I DON'T CARE WHETHER IT IS SECURE OR NOT. I just want the site under https://... to render properly.)
You could try using protocol-relative URLs (dropping both http: and https: from the URLs, e.g. //www.my-another-site.com/something.js); see this answer.
According to this answer, you'll need a recent version of WordPress (I'd assume 3.5) for this to work with wp_enqueue_script.
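If the asset URLs are hard-coded in theme files, one way to apply this is to rewrite them with sed. A minimal sketch, assuming www.my-another-site.com also serves the same files over HTTPS: a protocol-relative URL makes an HTTPS page request the file over HTTPS, so it still fails if that host has no HTTPS. The helper name is hypothetical, and it writes to stdout rather than editing files in place:

```shell
#!/bin/sh
# to_protocol_relative HOST  (hypothetical helper)
# Reads text on stdin and turns every "http://HOST/" into "//HOST/",
# leaving URLs for other hosts untouched.
to_protocol_relative() {
    # Escape the dots so sed matches them literally, not as "any character".
    host_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed "s#http://$host_re/#//$1/#g"
}
```

For example: to_protocol_relative www.my-another-site.com < header.php > header.php.new, then review the result before replacing the original file.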

HTML5 audio with a HTTP 302 redirect in Chrome

I am trying to write an HTML5-based last.fm player using the popular jPlayer jQuery plugin (http://jplayer.org).
The player works fine in Firefox. However, I ran into a problem:
From the last.fm API (http://last.fm/api) I get a playlist with URLs to the files. When one of these is requested, last.fm does an HTTP 302 redirect from play.last.fm to something like "http://s03.last.fm/someurl/128.mp3".
It looks like some same-origin policy applies to HTML5 media tags, because jPlayer is unable to play the file in Chrome and Chromium. If jPlayer uses the Flash solution (using "flash, html" instead of "html, flash"), everything works fine.
I installed the extra codecs on Ubuntu, and MP3 playback works nicely in the jPlayer demos.
HEAD requests are not supported by the streaming servers. I already tried making a normal GET request and then reading the "Location" header from the XMLHttpRequest, but it fails with a security error.
You can find the sources of my (proof of concept) project at https://github.com/tburny/html5-lastfm-player
Is there any hint/solution to this problem?
I had a similar problem, but only in the Android browser. There are lots of gotchas. The key question is whether either the original URL (the one that returns the 302) or the final one is HTTPS; if so, it will fail.
Check out this test suite: http://areweplayingyet.org/
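Whether either end of the redirect is HTTPS can be checked outside the browser. A minimal sketch with curl: it sends a plain GET (since the streaming servers reject HEAD), does not follow the redirect, discards the body, and prints the redirect target through curl's -w '%{redirect_url}' write-out variable. The helper names are hypothetical, and only the first hop of a redirect chain is shown:

```shell
#!/bin/sh
# redirect_target URL  (hypothetical helper)
# Plain GET, body discarded, redirects not followed; prints the URL curl
# would have redirected to (empty if the response was not a redirect).
redirect_target() {
    curl -sS -o /dev/null -w '%{redirect_url}' "$1"
}

# url_scheme URL  (hypothetical helper): print the part before "://", e.g. "https".
url_scheme() {
    printf '%s\n' "${1%%://*}"
}

# report_redirect URL: show the scheme of both ends of the first redirect hop.
report_redirect() {
    target=$(redirect_target "$1")
    printf 'original: %s (%s)\n' "$1" "$(url_scheme "$1")"
    if [ -n "$target" ]; then
        printf 'target:   %s (%s)\n' "$target" "$(url_scheme "$target")"
    else
        printf 'target:   no redirect\n'
    fi
}
```

Run report_redirect on one of the play.last.fm URLs from the playlist to see both schemes.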
