I'm trying to crawl this page
But the page is stuck on loading.
I'm using
Scrapy==1.5.0
scrapy-splash==0.7.2
I even increased the wait time to 7 seconds, but the problem persists. Is there a solution?
Try using the wait_for_element Lua script to ensure the HTML you're after is loaded. It might take longer than 7 seconds, but the script continues as soon as the element is present instead of relying on a fixed timeout. Splash can take longer than a regular web browser to load a page.
https://github.com/scrapinghub/splash/blob/master/splash/examples/wait-for-element.lua
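A minimal sketch of how that could look from the Scrapy side, assuming Splash 3.x (where `splash:select` is available). The Lua here is a simplified polling loop rather than the linked example verbatim, and `build_splash_args`, `css` and `maxwait` are names made up for illustration.

```python
# Lua script for Splash's "execute" endpoint. It polls until the CSS selector
# matches instead of sleeping for a fixed time. This is a simplified sketch,
# not a copy of the linked wait-for-element.lua.
WAIT_FOR_ELEMENT_LUA = """
function main(splash, args)
  assert(splash:go(args.url))
  local waited = 0
  while not splash:select(args.css) do
    if waited >= args.maxwait then
      error("timed out waiting for " .. args.css)
    end
    splash:wait(0.2)
    waited = waited + 0.2
  end
  return {html = splash:html()}
end
"""


def build_splash_args(css, maxwait=20):
    """Build the args dict for a scrapy-splash request (hypothetical helper)."""
    return {
        "lua_source": WAIT_FOR_ELEMENT_LUA,
        "css": css,          # selector to wait for (assumption: caller knows it)
        "maxwait": maxwait,  # upper bound in seconds for the polling loop
        # Splash's own render timeout; kept above maxwait so the Lua loop
        # gets a chance to report its own timeout first.
        "timeout": maxwait + 10,
    }
```

The resulting dict would be passed along the lines of `SplashRequest(url, callback, endpoint="execute", args=build_splash_args("div.results"))`, where `div.results` is a placeholder selector for whatever element the page loads late.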
community!
I have a small problem with the loading of my website done in Wordpress.
When I enter the site, all the elements are displayed in a disorderly way for about 3 or 4 seconds.
After that short time, all the elements are loaded and sorted correctly.
I have tried using plugins to optimize loading, clear the cache, and minify CSS and JavaScript. However, the problem continues.
What action can I take on the site to fix this faulty and slow load? Thank you very much.
This is my site: https://www.tecnobreak.com/
I have tried various plugins, such as WP-Rocket and WP-Optimize.
I would suggest adding a loading screen. This will be the easiest route.
Alternatively, you will need to prioritize what gets loaded first on page load.
I assume you have optimized your images and made sure everything is up to date.
I am working on this website, which I created using Firebug along with Dreamweaver. The page does not load properly the first time it is opened (the layout is not the one I designed), but when I refresh the page, it looks exactly as I intended. Now I am stuck; I have never run into a situation like this before. How can I make the website load properly on the first visit? Thanks
It could be that the connection was slow the first time you loaded it. So the layout took longer. Next time you loaded it, everything was in the cache and the layout was quick!
So your low bandwidth (HTML) version of the site loads a 7.46 MB animated gif? Wow... I would get rid of that animated gif! Makes your site look choppy and unprofessional! That is surely a big part of your problem.
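To put that file size in perspective, here is a rough back-of-the-envelope estimate. The connection speeds are illustrative assumptions, and the calculation ignores latency and protocol overhead.

```python
def download_seconds(size_mb, bandwidth_mbit_per_s):
    """Ideal transfer time for a file, ignoring latency and protocol overhead."""
    size_mbit = size_mb * 8  # 1 byte = 8 bits
    return size_mbit / bandwidth_mbit_per_s


GIF_SIZE_MB = 7.46  # size quoted in the post above

# Hypothetical connection speeds, in Mbit/s.
for speed in (1, 10, 50):
    print(f"{speed:>3} Mbit/s: {download_seconds(GIF_SIZE_MB, speed):.1f} s")
```

On a 1 Mbit/s connection the GIF alone would take roughly a minute, which is hard to justify for a low-bandwidth version of a site.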
It looks exactly the same to me both before and after refresh. Hard-reset your cache, make sure you're not on a destructive proxy, and try using your browser in private mode.
Can anyone explain why those gaps (marked with ?) are there? They are delaying the page load. I thought it could be page/script parsing time, but ~350 ms seems like too much for a simple page. Okay, there are lots of scripts, but it still seems like too much.
What can it be?
My guess is that it is a JavaScript loading issue. You should defer the loading of JavaScript using the defer attribute. This allows the page to be parsed before the JavaScript code is executed.
This is because when the browser's HTML parser encounters a script tag without a defer or async attribute, it pauses parsing until the script has been downloaded and executed. By including scripts at the end, you allow the browser to download and render all page elements, style sheets and images without any unnecessary delay. Also, if the browser renders the page before executing any script, you know that all page elements are already available to retrieve.
See http://www.hunlock.com/blogs/Deferred_Javascript and http://blog.fedecarg.com/2011/07/12/javascript-asynchronous-script-loading-and-lazy-loading/
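If the page is generated or post-processed on the server, the defer attribute could be added automatically. The following is only a sketch: `add_defer` is a made-up helper, and it uses a regular expression rather than a real HTML parser, so it assumes reasonably tidy markup.

```python
import re

# Matches the opening <script ...> tag and captures its attribute text.
_SCRIPT_TAG = re.compile(r"<script\b([^>]*)>", re.IGNORECASE)


def add_defer(html):
    """Add `defer` to external scripts that have neither defer nor async."""

    def fix(match):
        attrs = match.group(1)
        has_src = re.search(r"\bsrc\s*=", attrs, re.IGNORECASE)
        # Crude check: a file literally named e.g. "async.js" would also match.
        already = re.search(r"\b(defer|async)\b", attrs, re.IGNORECASE)
        if has_src and not already:
            return "<script" + attrs + " defer>"
        # Inline scripts are left alone: defer has no effect without src.
        return match.group(0)

    return _SCRIPT_TAG.sub(fix, html)


page = '<head><script src="app.js"></script><script>var x = 1;</script></head>'
print(add_defer(page))
```

Deferred scripts still execute in document order, but only after parsing has finished, so any script that relies on running mid-parse (for example one that calls document.write) should not be deferred.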
Is your CSS in the header section?
Otherwise, your browser might wait quite a while before it starts loading those resources.
My second guess would be that your JavaScript is blocking the page load for some reason. Is there any DOM manipulation right after load? Also, is your JavaScript located at the bottom of your page, so that it is loaded last? If not, it could block loading.
This seems rather a common problem, however I can't find any reliable sources on this.
Once in a while, Chrome will display a stylesheet-less version of a page for 2-3 seconds, and soon after the page is displayed correctly. It can affect the very same page once in every 20-50 refreshes, and it's not tied to a specific site; it happens all over the place. There are some threads about this here and there, but I have yet to find a full explanation.
Is this a bug? Feature? Is there a way to prevent Chrome from behaving like this on the client or perhaps server side?
In my experience, this happens when the network connection is poor and the page is (necessarily) loading slowly. The page's HTML will render first, and other assets called for within that HTML (like stylesheets or images) are rendered only after their calls are complete and their respective files load.
I have noticed this as well. It's definitely a bug. It seems to be this issue:
http://code.google.com/p/chromium/issues/detail?id=75761
You can "force" the stylesheet to load by opening the inspector (Ctrl+Shift+I).
Shift+F5 should reload the page and the referenced stylesheets.
With a normal reload it will only reload the page itself, and incorrectly assume that the stylesheets in the cache (the ones that never loaded in the first place) are correct.
This is basically a continuation of a question of mine from yesterday,
"Foregoing intialization on a page"
(And btw, kudos to all who give selflessly in this forum to help others - need to do more of that myself.)
So anyway, I was told about HistoryManager, BrowserManager and SharedObject, and quickly ascertained that it's no problem to store a few data items in a SharedObject so that a Flex page restores its previous configuration when the browser navigates back to it.
But my real concern is loading speed. It's a 15 MB page and it only takes 2 seconds to load, but that's still not instantaneous. If it were open in a tabbed browser and I just clicked on the tab containing my page, the page would appear instantaneously. Is there any way to achieve that behavior when my page is navigated back to (via the browser back button, for example)? Would that mean that the entire 15 MB Flex web page would have to be stored in memory?
Thanks.
Here's what I'm thinking: you're going about this the wrong way (unless I missed the boat on what you want to achieve). What you need to do is use JavaScript to interact with the browser's URL. This assumes that you want to be able to go back to a page without reloading its content.
Basically, a JavaScript handler would override the reloading, so when you hit back, the page doesn't reload; instead, the JavaScript notifies the Flash movie of the change that has occurred.
Have a look at the gaya framework for how they do it
or look at http://www.robertpenner.com/experiments/backbutton/backbutton.html