Google Doodles archive scraping - web-scraping

I am trying to download all of the Google Doodles (about 2000, excluding the .gif and interactive ones) from their archive, but I have had no luck so far.
I have tried many scraping commands in the terminal and a few packages from GitHub, such as ImageScraper. With each approach I tried, I found that either:
the process just gets stopped after about 100 images using this chrome extension
it downloads only some sample images (most command-line tools did this)
it downloads just a couple of images and does not keep going
From what I can see, the archive itself is not indexed and each image has its own name, which makes this harder. The page also loads more images only as you scroll further down, which is probably why only a few images get downloaded. I am also worried that the connection is being cut off so that the images cannot all be downloaded at once, probably to avoid overloading the server (I am not sure how to fix this, though).
I would very much appreciate help from anyone who has experience retrieving or scraping such images, given the problems mentioned above.
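The worries about the connection being cut off and about overloading the server could be handled along the following lines. This is only a sketch using the Python standard library: it assumes the list of image URLs has already been collected some other way (for example by scrolling the archive page in an automated browser), and the names `download_all`, `is_static_image` and `fetch_bytes`, as well as the delay and retry values, are illustrative choices rather than anything from the archive itself.

```python
import os
import time
import urllib.request
from urllib.parse import urlparse

# Still-image extensions to keep; .gif is deliberately left out.
STATIC_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def is_static_image(url):
    """Return True for still images, False for .gif and anything else."""
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    return ext in STATIC_EXTENSIONS


def fetch_bytes(url, timeout=30):
    """Download one URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, dest_dir, fetch=fetch_bytes, delay=1.0, retries=3):
    """Download still images one at a time, pausing between requests.

    Files already on disk are skipped, so an interrupted run can be resumed.
    Returns the list of paths written during this run.
    """
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    for url in urls:
        if not is_static_image(url):
            continue
        path = os.path.join(dest_dir, os.path.basename(urlparse(url).path))
        if os.path.exists(path):
            continue  # downloaded in an earlier run
        for attempt in range(1, retries + 1):
            try:
                data = fetch(url)
            except OSError:  # covers urllib's URLError and HTTPError
                if attempt == retries:
                    break  # give up on this URL and carry on with the rest
                time.sleep(delay * attempt)  # wait longer after each failure
                continue
            with open(path, "wb") as handle:
                handle.write(data)
            written.append(path)
            break
        time.sleep(delay)  # throttle so the server is not hammered
    return written
```

Because existing files are skipped, the script can simply be run again after it stops partway, and it will pick up where it left off.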

Related

Why does updating my file in GDrive break iframes and get treated as a new file rather than a revision?

I am using the Google Drive iframe/embed feature to display .pdf menus for a brewery client. They update the tap list often, so it was an easy way to keep it up to date. However, something changed within Drive fairly recently. Whenever I upload an updated .pdf, instead of simply updating and staying visible, the file's share settings change and the embed link changes. It is definitely not the biggest issue, but having to re-embed the file a few times a week is getting tedious.
Does anyone have any clue what to do about this? I am hoping there is a Drive setting or something simple I'm missing.
Thank you!
In short: whenever I upload an updated .pdf to Google Drive, instead of becoming a revision of the old file and staying visible, it gets new share settings and a new embed link, which breaks the iframes.

Wordpress upload images fails – post-processing error

For about a week now, I have been experiencing failures when uploading images using the “upload files” option in “Add media”. When it fails, the upload eventually times out and I get the following error message:
Post-processing of the image likely failed because the server is busy or does not have enough resources. Uploading a smaller image may help. Suggested maximum size is 2500 pixels.
The images are relatively small, 900x500 px and about 80 KB in size, so the message is probably a generic one. Strangely, when I go back later to check the media page, the images have been uploaded, even though I got that error message and the upload was stuck loading.
I don't believe it's a server issue, because my other sites, which use the same theme and are on the same server, don't have this problem.
I tried deactivating one plugin at a time, but that didn’t help. Later I noticed that if I upload the original image (400 KB), it uploads with no issue. If I compress it in Photoshop (80 KB), as I have always done in the past without any problems, it gives me the error.
This is so odd. Compressing in Photoshop with quality set to 50% (or lower) now suddenly fails on upload (but worked only a week ago). However, setting quality anywhere between 51% and 100% uploads fine.
Any suggestions?

WordPress website loading image which is not present in my website

My WordPress website takes a long time to load pages, a problem that many of us face. I used GTmetrix to check my WordPress page and then looked at my website's waterfall.
One thing I saw taking more than half of my page's load time is an image which I never uploaded to my website.
See this image of the GTmetrix waterfall:
I checked, and this is an image which I have not used anywhere on my webpage. I also could not find where this image is used.
The same thing happens on other pages, each with a different image.
I deleted one such image from my media library, but now when I check the GTmetrix waterfall I get a 404 error code, which means the page is still trying to load that image and I cannot find where.
This is a theme which I purchased, and it is not a popular theme like Divi or Ocean, so I could not contact support.
How can I check where a particular image is used on my webpage using the media library (can I do that)?
How can I find and remove this image? Or, at least, is there a way to delete the image from the library so that my webpage stops looking for it, instead of wasting time and getting a 404?
Your problem is quite common indeed. For your specific case, I suggest starting by searching for the image name in both the code and the DB; it MUST be somewhere.
If you cannot find it in your own files, there's only one answer left: some third-party JS script is loading it for you, but in that case I seriously doubt it would be on the same domain as your site.
The media library won't tell you much. If you are VERY lucky, it will show a message like "attached to", but that covers maybe 10% of cases. Most of the time the images ARE used but are not attached to anything like a post, so the media library won't tell you anything.
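The "search the code and the DB" advice can be sketched as a small script. It assumes you have a local copy of the theme and plugin files and a database export saved as a `.sql` file under the same folder; the function name `find_references` and the list of extensions are illustrative choices, not part of WordPress.

```python
import os


def find_references(root, needle,
                    extensions=(".php", ".css", ".js", ".html", ".sql")):
    """Return (path, line_number) pairs for every line under root containing needle."""
    hits = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            # Only look at text-like files: templates, styles, scripts, SQL dumps.
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(folder, name)
            # errors="replace" keeps the scan going on files with odd encodings.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        hits.append((path, number))
    return hits
```

Calling it with the folder path and the image's file name (without the upload directory) lists every file and line that mentions the image, which is usually enough to tell whether the theme, a plugin, or stored content is requesting it.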
I've had this happen before a few times, too. Isn't it frustrating!? If you could provide a URL, I (and others, I'm sure) would be happy to take a look and try to figure out what's going on. :)

How can we decrease the loading time of a WordPress website?

I have developed a WordPress website, but when I visit it, it takes a very long time to load. This is my website link: http://www.dahotreanddahotre.com/.
Can you suggest any plugin or manual setting that would decrease the loading time of my website?
There are a few things you can do:
Cache
Use a cache system: https://wordpress.org/plugins/wp-super-cache/
This will let you serve your fixed pages a lot faster to the user.
Minify
Use a minifier: https://wordpress.org/plugins/fast-velocity-minify/
This will make the included JavaScript and CSS files smaller, so they will take less time to load.
Identify image needs
Looking at the network tab of the dev tools, a lot of the loading time (4+ seconds) comes from huge images:
1st image (1.47MB): http://www.dahotreanddahotre.com/wp-content/uploads/2019/03/We-intend-to-be-your-financial-lifeline.jpg
2nd image (1.64MB): http://www.dahotreanddahotre.com/wp-content/uploads/2019/03/We-are-startup-friendly.jpg
Use an image compressor before uploading them, and don't upload images that are bigger than you need.
For example, the 2nd image is 4,300px × 2,862px; this could be reduced and compressed.
By decreasing its size to 2,150px × 1,431px and compressing it, it becomes only 350 KB.
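The resize arithmetic in that example can be checked with a short sketch. The function name `scaled_size` and the `max_width` parameter are hypothetical, chosen for illustration; the actual resizing and compression would still be done in an image editor or compression tool.

```python
def scaled_size(width, height, max_width):
    """Return (width, height) scaled down to fit max_width, keeping the aspect ratio."""
    if width <= max_width:
        return width, height  # already small enough, never upscale
    ratio = max_width / width
    return max_width, round(height * ratio)


# The 4,300px x 2,862px image from the example, halved in width.
print(scaled_size(4300, 2862, 2150))  # (2150, 1431)
```

Halving both dimensions cuts the pixel count to a quarter, which is where most of the file-size saving comes from before compression is even applied.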
Checking unreachable resources
Still looking at the dev tools, I could see some fonts which were returning a 404 error (almost 2 seconds of loading).
This means the font is unreachable, but the browser still spends time trying to load it.
Make sure all resources are reachable and unused ones are deleted.
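If you would rather not check each resource by hand in the dev tools, a check like this could be scripted. The sketch below uses only the Python standard library and assumes you already have a list of resource URLs (for example copied from the network tab); `http_status` and `find_unreachable` are made-up names for this illustration.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails entirely."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 for a missing font
    except urllib.error.URLError:
        return None  # DNS failure, refused connection, timeout, ...


def find_unreachable(urls, status_of=http_status):
    """Return (url, status) pairs for resources that error out or cannot be reached."""
    broken = []
    for url in urls:
        status = status_of(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

Some servers answer HEAD requests differently from GET requests, so a result from this sketch is a hint to double-check in the browser rather than a final verdict.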
Use good hosting
Avoid installing a bunch of plugins
Get a custom WordPress theme built from scratch
Avoid page builders
Optimize the images used on the website

TimThumb - Broken Images on WordPress Site, but Only on One Dimension

I'm having some problems on a WordPress site I am working on: http://bit.ly/1Hs41z8
TimThumb generates the thumbnails in the bottom section of the site just fine. However, in the slider, the thumbnail images are broken. When I look at the thumbnail URL, it is really long and complex, so I am not sure what is causing it to break. If I go to the image URL itself, the image loads fine.
Any help would be greatly appreciated!
TimThumb is no longer supported or maintained. You can read the reasons why on the blog: TimThumb End of Life.
See what Google says
I would suggest using phpThumb.
From PHPThumb:
phpThumb() uses the GD library to create thumbnails from images (JPEG,
PNG, GIF, BMP, etc) on the fly. The output size is configurable (can
be larger or smaller than the source), and the source may be the
entire image or only a portion of the original image. True color and
resampling is used if GD v2.0+ is available, otherwise paletted-color
and nearest-neighbour resizing is used. ImageMagick is used wherever
possible for speed. Basic functionality is available even if GD
functions are not installed (as long as ImageMagick is installed). One
demo file uses portions of Javascript API by James Austin.
