I'm using jsoup and I want to get the content of only this URL. That is, I want only the contents of the Incidents tab, not the rest of the page. At the moment, what I get is the content of the whole page other than this tab. I even tried to follow this, but still had no success. I'd appreciate some better help from anyone who knows jsoup.
Thanks
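This should fetch you the required results:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Request the page with a browser-like user agent, then parse the response.
Document document = Jsoup.connect("http://131940.qld.gov.au/Road-Conditions.aspx?tab=incident#incidents")
        .userAgent("Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36")
        .timeout(0) // 0 means no timeout
        .followRedirects(true)
        .execute()
        .parse();
// Print only the markup inside the element with id "incidents".
System.out.println(document.select("#incidents").html());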
Assume the div contains a name of a product:
<div _ngcontent-serverapp-c225 class="shelfProductTile-content">
In Scrapy, response.css('div.shelfProductTile-content') returns an empty list. How do you overcome this issue?
Edit: It was claimed that JavaScript-rendered content (AngularJS, React and the like) can't be obtained by Scrapy, and that a tool such as Splash or Selenium is recommended. That's true, but it was not the case in my example: I tried both of these tools and they didn't solve the issue. The problem was the user agent, which had to be changed. Please check the accepted answer below.
Thanks to all who helped.
The following code should match your element(s):
response.xpath("//div[@class='shelfProductTile-content']")
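For what it's worth, here is a minimal sketch of how the @class predicate behaves. It uses Python's standard-library xml.etree.ElementTree as a stand-in for Scrapy's selectors (an assumption, so the snippet runs without Scrapy installed), and the markup and product names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, simplified so that it is well-formed XML.
SNIPPET = """<body>
  <div class="shelfProductTile-content">Product A</div>
  <div class="shelfProductTile-content featured">Product B</div>
  <div class="other">Not a product</div>
</body>"""

root = ET.fromstring(SNIPPET)

# [@class='...'] matches only when the whole attribute value is equal.
exact = root.findall(".//div[@class='shelfProductTile-content']")
names = [div.text for div in exact]
print(names)
```

Note that the predicate compares the entire attribute value, so the second div, which carries an extra class, is not matched by it.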
I changed the user agent in the settings file and that solved the issue:
USER_AGENT = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
Through Inspect Element on the web page, I am able to properly see the link for the anchor tag like, but when I try to get it through soup, it gives me the result as . I tried lxml and html5lib but couldn't find a solution.
I had a similar issue, some chunks of the html page that I was scraping were not loaded correctly. I ended up scraping using PhantomJS via Selenium. Here's an example. And another one.
There's also dryscrape, which I've never used, but it might do the trick.
I was able to get the href by specifying a User-Agent in the headers. The site may be designed to give different responses to different browsers, so it is better to use a User-Agent similar to that of the browser you used to inspect the page.
import requests
from bs4 import BeautifulSoup
url='https://co.jim-hogg.tx.us/index.php/bids/278-solid-waste-resedential-collection-disposal-bids'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
r = requests.get(url, headers=headers)
soup=BeautifulSoup(r.text,'html.parser')
print(soup.find("div",{"itemprop":"articleBody"}).a['href'])
Output
http://www.jimhoggcounty.net/files/BIDS/Notice%20for%20bids%20on%20Solid%20Waste%20Residential%20%26%20CommercialCollection.pdf
Note:
My region was blocked by the site, so I had to use a proxy to get a response. I have removed that additional code.
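If requests isn't available, the same header can be set with the standard library alone. This is only a sketch: build_request and fetch_html are hypothetical helper names, https://example.com/ is a placeholder URL, and fetch_html is defined but not called here because it performs a real network request.

```python
import urllib.request

# Same browser-like User-Agent string as in the requests example above.
BROWSER_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36")


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request object that will send the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=BROWSER_UA, timeout=10):
    """Download the page as text (this performs a real network request)."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


req = build_request("https://example.com/")  # placeholder URL
# urllib stores header names capitalized, so the key is "User-agent".
sent_ua = req.get_header("User-agent")
print(sent_ua)
```

The returned text from fetch_html can be passed to BeautifulSoup exactly as r.text is above.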
I solved this. Look at the selected answer below!
I've been fighting to find out why the WebView in my OS X Cocoa application behaves differently from Safari. It turns out that the user agent is different (sort of obvious?) and the website I'm visiting does not know how to handle that.
Surprisingly, it's https://messenger.com (facebook chat).
The problem is that it doesn't display the picture on the screen. It does load, but it doesn't actually display. Take a look at this..
If you look at the area that I numbered as '2' you just see empty space. I didn't censor that out. It's just empty.
So here's my original question link: Simple Swift Cocoa app with WebKit: Upload picture doesn't work
I solved the first issue (thanks to the answers :D), but second issue persists.
Shared picture does not show up - I labeled as 2 in the picture.
again, from other browsers or released apps, it shows the pictures that I shared with participants like below. (of course I censored the pictures)
To debug this, I opened the Inspect Element and I found this out.
<body class=" webkit-legacy webkit mac x1 body_textalign Locale_en_US _z4_" dir="ltr">
When I did load the exact same page from Safari, I'd see this:
<body class="safari webkit mac x1 body_textalign Locale_en_US _z4_" dir="ltr">
So I decided to replace that line in my app's WebView, and voilà, it works! So...
TLDR: How do I make this work every time I load the view?
I tried to find some methods to set up my user agent to Safari, but I can't get this to work. Any suggestion please?
I've created a simple sample application and fixed this issue by providing the same user agent as Safari. After that, Shared Photos works as expected.
- (void)applicationDidFinishLaunching:(NSNotification *)aNotification {
    self.webView.customUserAgent = @"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/601.2.7 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.7";
    self.webView.mainFrameURL = @"https://messenger.com";
}
After that, I compared the user agents:
Original WebView user agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 (KHTML, like Gecko)
Safari user agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/601.2.7 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.7
So it looks like I only need to change the last part of the user agent, like this:
- (void)applicationDidFinishLaunching:(NSNotification *)aNotification {
    self.webView.applicationNameForUserAgent = @"Safari/601.2.7";
    self.webView.mainFrameURL = @"https://messenger.com";
}
With this code, Shared Photos also works. Unfortunately, it doesn't work with other applicationNameForUserAgent values I've tried.
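To see exactly which tokens differ between the two user-agent strings quoted above, a rough comparison can be scripted. This is just a sketch in Python: ua_tokens is a hypothetical helper, and the simple regex split assumes that parenthesised comments contain no nested parentheses.

```python
import re

WEBVIEW_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) "
              "AppleWebKit/600.8.9 (KHTML, like Gecko)")
SAFARI_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) "
             "AppleWebKit/601.2.7 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.7")


def ua_tokens(ua):
    """Split a user-agent string into product tokens and (...) comments."""
    return re.findall(r"\([^)]*\)|\S+", ua)


# Tokens that Safari sends but the plain WebView does not, in order.
only_in_safari = [t for t in ua_tokens(SAFARI_UA) if t not in ua_tokens(WEBVIEW_UA)]
print(only_in_safari)
```

Besides the trailing Version and Safari tokens, the AppleWebKit version also differs between the two strings.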
As promised, here's the answer.
I added this function in the ViewController class.
func webView(sender: WebView!, didFinishLoadForFrame frame: WebFrame!) {
webView.stringByEvaluatingJavaScriptFromString("document.body.className = 'safari webkit mac x1 Locale_en_US _z4_';")
}
Pretty simple, but it's working perfectly fine.
I still have some problems with the app itself (e.g. Download image button doesn't trigger file explorer dialog), but basic functions like
send/receive messages
view shared pictures
work just fine.
I'm planning on working on this for
download image
notification
If you are interested in helping me out, please comment!
Thanks all for your help :)
Does anyone know how selecting the Desktop View option in a mobile browser affects CSS media queries and Javascript?
I'm making a site which is meant to be for mobile users only. When I select "Desktop View" when using the stock Android browser, it breaks the site. I want to effectively ignore the desktop view setting.
Thanks
I believe this option only changes the user agent.
The 'Desktop View' option in the stock Android browser simply changes the user-agent (UA) string sent to the server. Without the option selected, a typical UA string would look similar to the following:
Mozilla/5.0 (Linux; U; Android {OS Version}; en-gb; {Device Model Information}) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
With the 'Desktop View' option enabled this changes to something similar to:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.34 Safari/534.24
As you can see, the first example explicitly includes the word Mobile and also tells us that the operating system is Android. The second example mimics the user-agent string of a 'full' browser, critically by omitting the word Mobile.
More importantly, to answer the original question specifically, in two parts:
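To illustrate why a server-side check can flip when the option is toggled, here is a minimal sketch of a UA test that keys on the word Mobile. It is written in Python for brevity, looks_mobile is a hypothetical helper, and real detection logic is usually more involved than this.

```python
import re

# The two example strings from above; the {...} placeholders are left as-is.
MOBILE_UA = ("Mozilla/5.0 (Linux; U; Android {OS Version}; en-gb; {Device Model Information}) "
             "AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30")
DESKTOP_VIEW_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.24 (KHTML, like Gecko) "
                   "Chrome/11.0.696.34 Safari/534.24")


def looks_mobile(user_agent):
    """Return True if the UA string contains the standalone word 'Mobile'."""
    return re.search(r"\bMobile\b", user_agent) is not None


print(looks_mobile(MOBILE_UA))        # True
print(looks_mobile(DESKTOP_VIEW_UA))  # False
```

The same device is classified differently depending only on the string it sends, which is why a server that branches on the UA serves the desktop site once the option is enabled.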
How does this option affect CSS media queries?
It doesn't. CSS media queries themselves are unaffected by the UA string sent to the server. Typically when we talk about media queries we're making reference to the screen size / resolution / orientation of the device, all of which are unaffected by the 'Desktop View' option. The CSS behavior of the site would only be altered if the server were to provide a different response based on the UA string provided.
How does this option affect JavaScript?
It affects JavaScript if you (or a third-party library you use) rely on the Navigator object to modify behavior based on the browser version.
It doesn't necessarily change the user agent, however it will set a session variable for the website for that user, so if they make a request it will skip the user agent check and simply return the desktop website.
Example Code:
<?php
// See if the browser is a mobile browser.
// If it is, check whether the user prefers the desktop version.
// If the user does not, forward to the mobile website.
$browser = $_SERVER['HTTP_USER_AGENT'] ?? '';
// Always define both flags so they are never read while unset.
$iphone  = (bool) preg_match('/iPhone/i', $browser);
$android = (bool) preg_match('/Android/i', $browser);
if ($iphone || $android) {
    session_start();
    if (($_SESSION['version'] ?? '') !== 'desktop') {
        header('Location: http://tuts.pinehead.tv/jqmobile/mobile.php');
        exit; // stop here so the rest of the page is not rendered
    }
}
I have .css files with relative references to images like this:
BODY
{
BACKGROUND: url(bg.gif);
}
where the bg.gif file is located in the same folder as the .css file. This seems to work fine in my testing but I notice some errors in my logs that indicate that some browsers are trying to find bg.gif in the same folder as the .html page that refers to the .css file, not the folder where the .css file is.
Here's an example of one such HTTP_USER_AGENT: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.63 Safari/535.7
I have tried searching this online and find conflicting information.
As far as my knowledge goes, none of the modern browsers have any such problems. Relative paths are very well supported.
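For reference, a url() inside a stylesheet is resolved against the stylesheet's own URL, not against the page that links to it. The difference between the two resolutions can be sketched with Python's urljoin; the example.com URLs below are hypothetical and stand in for the real site layout.

```python
from urllib.parse import urljoin

# Hypothetical locations of the stylesheet and of the page that links to it.
CSS_URL = "http://example.com/styles/site.css"
PAGE_URL = "http://example.com/pages/index.html"

# What a conforming browser requests for url(bg.gif) found in site.css.
expected = urljoin(CSS_URL, "bg.gif")

# What a client requests if it wrongly resolves against the HTML page.
mistaken = urljoin(PAGE_URL, "bg.gif")

print(expected)  # http://example.com/styles/bg.gif
print(mistaken)  # http://example.com/pages/bg.gif
```

Log entries that look like the second URL would be consistent with a client resolving the image against the page instead of the stylesheet.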
I've never heard of anything having this problem... I'd suggest trying to link them with the full relative path, or perhaps placing them in an images folder and linking through that (sort of like a reset, a second chance).
That could have something to do with the base URL for the page. You might want to specify it so the browser knows how to access the files.
Add a base element to the head section (as below) and let me know:
<head>
<base href="http://mysite.com/images/" target="_blank" />
</head>
Hope it helps.