DOM structure in Scrapy is different from what's in a browser - web-scraping

I'm trying to learn how to scrape web content with Scrapy and came across an issue that I can't explain. I can select a DOM element with any browser's dev tools, but when I try to select the very same element from Scrapy, an empty list is returned. The XPath //*[@class='lSPager lSGallery']/li/a/img[@src] works fine in a browser, but returns nothing when it's called from Scrapy.
Moreover, Scrapy doesn't see the lSPager class at all, even though it can easily be selected in a browser.

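OK, I figured it out right after I wrote the question. The problem was that I had JavaScript enabled in the browser. Scrapy doesn't execute JavaScript, so it only sees the HTML as the server sends it, without any elements that scripts add afterwards. Once I disabled JS, the DOM structure in the browser became the same as in Scrapy.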
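A minimal sketch of the effect, assuming made-up markup: the two HTML snippets below are hypothetical stand-ins for the page as served and the page after a gallery script has run. The standard library's ElementTree is used here in place of Scrapy's selectors so the example runs without extra packages; the XPath is the one from the question, prefixed with a dot as ElementTree requires.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup as the server sends it: the gallery exists,
# but no script has generated the pager yet.
RAW_HTML = """
<html><body>
  <ul id="gallery">
    <li><img src="a.jpg"/></li>
  </ul>
</body></html>
"""

# Hypothetical markup after a gallery script has run in a browser
# and appended the pager list.
RENDERED_HTML = """
<html><body>
  <ul id="gallery">
    <li><img src="a.jpg"/></li>
  </ul>
  <ul class="lSPager lSGallery">
    <li><a href="#"><img src="thumb-a.jpg"/></a></li>
  </ul>
</body></html>
"""

# The XPath from the question, relative to the document root.
PAGER_XPATH = ".//*[@class='lSPager lSGallery']/li/a/img[@src]"


def pager_thumbnails(markup):
    """Return the src of every pager thumbnail found in the markup."""
    root = ET.fromstring(markup.strip())
    return [img.get("src") for img in root.findall(PAGER_XPATH)]


raw_result = pager_thumbnails(RAW_HTML)
rendered_result = pager_thumbnails(RENDERED_HTML)
print(raw_result)       # what a crawler that does not run JS would get
print(rendered_result)  # what the browser's dev tools would show
```

The same query matches nothing in the first document simply because the pager element is not there yet. In Scrapy, checking response.text or opening the page with "View source" instead of the element inspector shows which of the two versions the spider is working with.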

Related

CSS gets messed up after Ajax page load

I have a WordPress website. I recently downloaded a plugin called Advanced Ajax Page Loader. It refreshes the content when you click through to another page without reloading the whole site (header, footer). I tried to get an answer from the plugin's developer and the WordPress support forum, but nobody responded.
I read that when pages are loaded through a jQuery Ajax call, all scripts have to be reloaded, and the plugin has a place where I should put that code. Up to that point everything works correctly, except one thing. When I go from category to category, everything works fine, but when I open a single post it completely screws up all the CSS for that page. When I refresh, everything looks fine, but then if I open one of the big categories with many posts, that page's CSS is messed up.
I thought that I could somehow refresh the whole CSS by putting some code in the "Reload code" box, but I have no idea how to do that using scripts. English isn't my native language, so I'm having difficulty finding an answer on Google; I tried, but my vocabulary is limited. How can I do it?
Are you adding CSS classes to your elements via JavaScript? If so, the styles you add will only affect elements that are part of the DOM at that point in time, so you might be experiencing a race condition that happens to work in Chrome and Safari, but not in Firefox.
Second, try validating your markup and CSS to see whether you have any errors in your CSS syntax.

Where does Firebug report that a CSS specified background-image failed to load?

I've been clicking through Firebug's interface and can clearly see under the "Net" tab when requests return fail responses; however, I'm certain that a url in a CSS file is incorrect and the failure to load isn't reported here. I've got a lot of developers working on CSS files, so I'm not familiar with all of the resources being loaded at any particular time. I wondered if there was a very apparent way to see when a CSS specified image failed to load.
The best approach I have so far is to click on the CSS tab, select each CSS file, and then roll my mouse pointer over the urls of the background-image declarations. On roll-over, Firebug reports in the tooltip that the image failed to load.
Is there a better way?
Try the Firefox "Live HTTP Headers" addon. It shows you the status codes and headers of all requests and responses (or files matching a filter expression if desired).
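A scripted alternative to hovering over each declaration is to list every url(...) reference in a stylesheet and resolve it against the stylesheet's own address, which is how the browser decides what to request. The sketch below is only an illustration: the helper name css_resource_urls, the sample CSS, and the example.com address are all hypothetical, and it stops short of sending any requests.

```python
import re
from urllib.parse import urljoin

# Matches url(...) references, with optional single or double quotes.
CSS_URL_RE = re.compile(r"url\(\s*(['\"]?)(.*?)\1\s*\)", re.IGNORECASE)


def css_resource_urls(css_text, stylesheet_url):
    """Return absolute URLs for every url(...) reference in css_text.

    Relative references are resolved against the stylesheet's URL,
    not against the page that includes the stylesheet.
    """
    urls = []
    for match in CSS_URL_RE.finditer(css_text):
        ref = match.group(2).strip()
        # Skip empty references and inline data: URIs.
        if ref and not ref.startswith("data:"):
            urls.append(urljoin(stylesheet_url, ref))
    return urls


# Hypothetical stylesheet content and location.
SAMPLE_CSS = """
.header { background-image: url("../img/header.png"); }
.footer { background: url(img/footer.gif) no-repeat; }
"""

resolved = css_resource_urls(SAMPLE_CSS, "http://example.com/static/css/site.css")
print(resolved)
```

Each resolved URL could then be requested, for example with urllib.request, and its status code checked, so that a 404 points directly at the declaration that needs fixing.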

Get CSS computed style during web crawl

Is there a way for me to get an element's computed style from a page source? Or, if not from the page source, some other way? I want to be able to go to a web page and then get all the computed styles (via my code; I'm not talking about opening a browser tab and clicking Inspect element). Right now I'm using Python BeautifulSoup to get and traverse the document. This gets me all the elements and their attributes, but not the css styles. Ideally this would be with Python, but I'm open to using other languages.
(Sorry if this has been answered before. I looked at several questions and they all seemed to be about getting the info either from "inspect element" or from your own page using JavaScript.)
I'm using PhantomJS. I inject a JavaScript script into the page that runs getComputedStyle.
You can also look at CSS parsers such as the following:
http://www.modeltext.com/css/index.html
http://www.codeproject.com/KB/recipes/CSSParser.aspx

CSS seems to be not working in a subdomain?

I developed a website using CodeIgniter and styled it with CSS. Locally it works fine, but online it looks like the CSS is not loaded; it picks up the old CSS style. I checked the link and it's correct. What gives?
Without more information (such as seeing the site in question), I can't give you a direct answer, but I can give you some pointers.
My suggestion is to use a tool like Firebug (in Firefox) or Chrome's Developer Tools, etc. These tools allow you to see full details of all requests being made by the browser.
(The exact instructions will differ according to the tool you're using, so I'll assume Firebug for simplicity.)
Open your page in the browser with Firebug open, and look at Firebug's "Net" tab (and make sure that the option below the tab is set to "All"). This will list all requests made by the browser.
The key thing for you is to look for any 404 errors. Since you say your CSS isn't working, it's a pretty good bet that your stylesheets are failing to load. The 404 errors listed in Firebug will show you which requests are failing.
If you hover over a filename, Firebug will expand it to show you the full URL that it attempted to load. This will almost certainly show you that you've got something wrong in your configuration, and that it's trying to load the stylesheets (and possibly other files too) from the wrong location. This should show you what's going wrong and give you enough clues to work out how to fix it.
Hope this helps you solve the problem.

How to change the included CSS files on button click?

I would like to implement an application where the user can include different CSS files by clicking different buttons. Please let me know how this can be achieved. I don't want to use the theme feature.
I am trying to change the CSS, but I have noticed the following ugly behaviour:
When I view the page source in Mozilla, I see the code for the latest CSS.
But it's not getting downloaded; I checked with Tamper Data, and the request to download the CSS is not being sent.
When I inspect the elements, the style is still from the old file.
Any idea what could be causing this? Please let me know how to get this working. I'm desperately looking for a solution.
Can this be done nicely using the ScriptManager control ?
To change styles on the client side, you need to programmatically change the reference to the stylesheet, which would work. However, you wouldn't see this change in view source: view source shows the original markup, not the running document, so it doesn't reflect changes made by JavaScript, and that can be a pain.
Firebug is pretty good, but even with Firefox/Firebug or the IE dev tools, certain things don't get updated, depending on what you are doing.
So did you write some code and you are not seeing the changes directly, or do you see the changes but can't verify them?
HTH.
In this case I would use XMLHttpRequest with the GET verb to obtain the needed CSS file from a dedicated handler. Pass the name of the stylesheet that you need as a query string argument. I suggest that you fire the request dynamically, on click of the button that should download the respective CSS file.
