How to get all content of a website from Google cache? - blogspot

My Gmail account was hacked today, and I can't log in or request a new password anymore. I also lost all the content on my Blogspot blog.
I looked around and found that the content is still stored in Google's cache. But I had more than 200 articles, so I would need to go through more than 200 URLs to copy everything by hand.
Is there any method that can help me retrieve all the content from Google's cache?

A web crawler could help you retrieve many pages of information.
Also, you could use the Internet Archive's Wayback Machine to retrieve some of the lost content.
Another tip: Google Advanced Search could help you too, in particular the site or domain parameter.
Update: Maybe this script can do all the work for you: Retrieving Google’s Cache for a Whole Website
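In case that script is no longer available, here is a minimal sketch of the same idea in Python. The file name urls.txt, the 10-second delay, and the fallback to the Wayback Machine availability API are my own assumptions, and Google may rate-limit or block automated cache requests entirely:

```python
# Minimal sketch, not a robust crawler: fetch Google's cached copy (and, as a
# fallback, the Wayback Machine snapshot) for each article URL in a text file.
import time
import requests

CACHE = "https://webcache.googleusercontent.com/search?q=cache:{}"
WAYBACK = "https://archive.org/wayback/available?url={}"

def fetch_cached(url):
    """Try Google's cache first, then fall back to the Wayback Machine."""
    resp = requests.get(CACHE.format(url), timeout=30)
    if resp.status_code == 200:
        return resp.text
    info = requests.get(WAYBACK.format(url), timeout=30).json()
    snapshot = info.get("archived_snapshots", {}).get("closest")
    if snapshot and snapshot.get("available"):
        return requests.get(snapshot["url"], timeout=30).text
    return None

if __name__ == "__main__":
    with open("urls.txt") as f:                      # one article URL per line (assumed file name)
        for line in f:
            url = line.strip()
            html = fetch_cached(url)
            if html:
                # build a filesystem-safe name for each saved page
                fname = url.replace("://", "_").replace("/", "_") + ".html"
                with open(fname, "w", encoding="utf-8") as out:
                    out.write(html)
            time.sleep(10)                           # be polite between requests
```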

Related

LinkedIn sharing doesn't work as expected

I'm looking for an easy way to share through LinkedIn without all the hassle of OAuth 2.0, which I don't see required on other pages that use this kind of sharing (they didn't require anything from me - I can share straight away).
Straight to the issue:
this one works: https://www.linkedin.com/sharing/share-offsite/?url=https%3A%2F%2Frefair.me
this one doesn't: https://www.linkedin.com/sharing/share-offsite/?url=https%3A%2F%2Frefair.me%2Fjob%2F494
It seems that beyond the main domain I can't get sharing to work. For comparison, here is a link from another site that goes deeper and is still shareable: https://www.linkedin.com/shareArticle?mini=true&url=https://bulldogjob.pl/companies/jobs/2043-programista-java-warszawa-bms-sp-z-o-o&title=Programista+Java&summary=&source=https://bulldogjob.pl
I also tested with and without the source and summary query params. Has anyone had this issue?
LinkedIn uses the Open Graph protocol (http://ogp.me/) to determine how pages are shared on LinkedIn.
You may also use the LinkedIn Post Inspector (https://www.linkedin.com/post-inspector/) tool to debug how various pages would be shared on LinkedIn.
I decoded your URL so I could get a cleaner look...
https://www.linkedin.com/sharing/share-offsite/?url=https://refair.me/job/494
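Purely for illustration, here is a quick sketch of that decode step, and of how the encoded url parameter is produced in the first place (Python's urllib, used here only as an example):

```python
# Decode the shared url parameter, and re-encode a deep link the way the
# share-offsite endpoint expects it (every character percent-encoded).
from urllib.parse import quote, unquote

encoded = "https%3A%2F%2Frefair.me%2Fjob%2F494"
print(unquote(encoded))                              # https://refair.me/job/494
print(quote("https://refair.me/job/494", safe=""))   # back to the encoded form
```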
So, let's try to visit your URL: https://refair.me/job/494. The webpage you are sharing DOES NOT LOAD.
Is your site down for everyone? Yes, your site is down for everyone.
In order to share a URL on LinkedIn, you must fulfill the following minimum requirements:
The URL must load.
If you just want to test out the API, try using wikipedia.org or google.com as test pages.
Surprisingly, the bare refair.me URL still shares fine on LinkedIn, but that could be coming from some internal cache from back when the page did work; the page certainly does not load anymore.
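If you want a rough pre-flight check of those minimum requirements before reaching for the Post Inspector, a small sketch like the following will tell you whether the URL loads and whether any og: tags are present. It uses Python with requests and a regex rather than a real HTML parser, so treat it as an approximation:

```python
# Rough shareability pre-check: does the URL load, and does the HTML contain
# any og: meta tags? Regex-based sketch only, not a full Open Graph validator.
import re
import requests

def check_shareability(url):
    try:
        resp = requests.get(url, timeout=15)
    except requests.RequestException as exc:
        print(f"{url} does not load: {exc}")
        return
    print(f"{url} -> HTTP {resp.status_code}")
    og_tags = re.findall(r'<meta[^>]+property=["\']og:([^"\']+)["\']', resp.text, re.I)
    print("og: tags found:", og_tags or "none")

check_shareability("https://refair.me/job/494")
```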

Making your site shareable on LinkedIn

I'm having a few issues making our site shareable on LinkedIn, and I'm at a loss. The og: meta tags all look fine and the Facebook scraper picks them up fine, but the LinkedIn scraper does not... and the images etc. are not in a protected folder or anything like that.
When inspecting the developer tools, the GET request to the url-preview?url= link shows that the image etc. aren't there.
The image is less than 1 MB and all the og: meta tags are obeyed. The only thing that may not be 100% right is that the image ratio is not 1/4 or 4/1 (it's 2/1)... but that is only a recommendation and not a hard and fast rule.
Does LinkedIn provide something similar to FB (https://developers.facebook.com/tools/debug/) where you can test the scraper and re-run it? Or is there another way to debug this? Any help appreciated.
https://www.hipla.co.uk (this is the page I'm trying to share).
cheers
It transpires LinkedIn doesn't offer a facility similar to FB or Twitter for testing the OG meta tags and re-scraping the page. It caches a page for 7 days and then re-scrapes it. However, you can refresh the LinkedIn crawler's cache simply by appending GET params to the URL, e.g. https://www.hipla.co.uk?123.
I eventually figured out what our issue was. We were using a wildcard cert (multi-domain, so we could have a single SSL cert for multiple subdomains), which meant we had to set the server name in the Apache default-ssl.conf file, but we had a typo in it for the www instance. That meant the LinkedIn crawler got an SSL error, which isn't debuggable (if that's a word) through LinkedIn, but it was spotted because we got an SSL error when testing the Twitter metadata tags with the Twitter Card Validator. Hope this helps anyone else who has a typo in their SSL settings. Note that the SSL error was not visible in any browser; everything looked fine there.
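For anyone who wants to reproduce that kind of failure directly, here is a small sketch of a strict TLS check in Python. The hostnames are just the ones from this question; ssl.create_default_context verifies both the certificate chain and the hostname, so a ServerName typo in the Apache SSL vhost should surface here even when a browser appears happy:

```python
# Strict TLS handshake check, similar in spirit to what the LinkedIn crawler
# does: verify the chain and that the hostname actually matches the cert.
import socket
import ssl

def check_tls(hostname, port=443):
    context = ssl.create_default_context()          # verifies chain + hostname
    try:
        with socket.create_connection((hostname, port), timeout=10) as sock:
            with context.wrap_socket(sock, server_hostname=hostname) as tls:
                cert = tls.getpeercert()
                print(hostname, "OK, cert subject:", cert.get("subject"))
    except ssl.SSLError as exc:
        print(hostname, "TLS error:", exc)
    except OSError as exc:
        print(hostname, "connection error:", exc)

for host in ("hipla.co.uk", "www.hipla.co.uk"):     # check both variants
    check_tls(host)
```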

Website not posting to Facebook: security & app id issues

I'm a new WordPress designer. My site runs Tesseract Theme and is built with Beaver Builder.
PROBLEM: When I posted my website (https://louiseclark.tech) on Facebook, the post was removed after a couple of minutes. Now when I try to post my site it gives me this message: It looks like a link you're sharing might be unsafe. If you can, please remove this link: louiseclark.tech Note: The unsafe link might be on the page you’re linking to.
What I've done to try and resolve:
When I ran my site through the Facebook debugger I got this message:
The 'fb:app_id' property should be explicitly provided, Specify the app ID so that stories shared to Facebook will be properly attributed to the app. Alternatively, app_id can be set in url when open the share dialog.
I created an app id following this instructional video: https://www.youtube.com/watch?v=V97h03H21y0
I pasted my app id into my Yoast SEO plugin under the Facebook category.
Checked my Google Webmaster Tools sitemap... everything is verified and the sitemap is set.
The SSL certificate is set up - checked with my hosting company, SiteGround. When I asked them about this problem, they didn't really feel that the security issues were coming from their side.
I've reported this problem to the black hole that is Facebook support.
Thank you for any insight.
In case anyone sees this thread, I found the solution.
When I moved my WordPress sites to managed WordPress hosting, I also migrated them to https with SSL certificates. While the pages were migrated and displayed over https just fine, the images still kept their old http URLs.
I did two things:
I installed SSL Content Fixer plugin. This worked for some images but not others.
I installed the Better Search Replace plugin. I had found the specific insecure images using Firefox. From my page in Firefox, I went to:
Tools -> Page Info -> Media. This showed me every image/js/css call on the page. Finding these images allowed me to use the plugin to make the changes.
It worked. I'm quite sure knowing how to code my site would be much better in this situation. But I'm a newbie and this is what I could come up with.
What I learned: it's a red flag when a secure site embeds non-secure objects/images.
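For anyone hitting the same thing, a rough way to automate what the Firefox Page Info -> Media inspection found by hand is to scan the page for plain http:// asset references. This regex-based sketch is my own suggestion rather than part of the original fix, and a real audit should use an HTML parser:

```python
# Minimal mixed-content check: list every http:// (non-https) src/href on a page.
import re
import requests

def find_insecure_assets(page_url):
    html = requests.get(page_url, timeout=15).text
    return sorted(set(re.findall(r'(?:src|href)=["\'](http://[^"\']+)["\']', html, re.I)))

for asset in find_insecure_assets("https://louiseclark.tech"):
    print(asset)
```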

Will a Sitemap in localhost create duplicate content issue?

For my WordPress.org site I use the Google Sitemap Generator plugin by Arne B. I activated the plugin while working on localhost and it works.
I usually update my website on localhost and then upload the database to my web host. So now I am wondering whether both of the URLs below will end up in Google's search results. I'm asking because I am afraid Google will consider this duplicate content.
http://127.0.0.1/beef-recipe-1/
http://www.actual-website.com/beef-recipe-1/
Google can't access your localhost (127.0.0.1), so it will most likely ignore those URLs.
If you are worried about the above situation, the best thing you can do is delete all the previous sitemaps and generate a new one while your site is online. Then, in Google Webmaster Tools, resubmit the sitemap if necessary and request a crawl of your website's main domain (e.g. mydomainname.com) so that Google can crawl all the links associated with the homepage.
This way you will not lose any rankings on Google, and it may even help your website.
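One extra sanity check you could run before resubmitting (my own suggestion; the sitemap location is assumed to be /sitemap.xml, so adjust it to wherever the plugin actually writes the file): confirm the live sitemap contains no leftover localhost URLs.

```python
# Quick check that the live sitemap has no localhost/127.0.0.1 URLs left over
# from local development. The sitemap URL below is a placeholder.
import re
import requests

sitemap = requests.get("http://www.actual-website.com/sitemap.xml", timeout=15).text
locs = re.findall(r"<loc>(.*?)</loc>", sitemap)
leftovers = [u for u in locs if "127.0.0.1" in u or "localhost" in u]
print("local URLs found in sitemap:", leftovers or "none")
```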
Cheers!

ASP.Net authentication and Googlebot

I have an ASP.Net 3.5 web site with forms authentication enabled. Is it possible to have Googlebot crawl my web site without getting prompted for a username/password?
Google states that it does not want to index pages and show them to users as available when they actually are not, because those pages ask for a username and password.
The only option it offers is to let the AdSense crawler access the protected pages, so that it knows which adverts to show on them:
https://www.google.com/adsense/support/bin/answer.py?answer=37081
Other solutions, such as checking whether the visitor is a bot or is coming from Googlebot's machines, are not safe because they can easily be spoofed by users, and they may also fail to show a preview or a cached copy of the page.
So you need to think about your site structure: decide what is essential and what is not, so that you can show some parts of the pages and hide others when the user is not registered. That way Google has something to index even though it is not logged in.
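On the spoofing point above: the check Google itself documents for verifying Googlebot is a reverse DNS lookup followed by a forward lookup. Here is a sketch in Python (the same logic can be ported to an ASP.NET handler); the sample IP is just an illustrative Googlebot address, and you would want to cache the result rather than resolve on every request:

```python
# Verify a claimed Googlebot visit: reverse DNS must end in googlebot.com or
# google.com, and the forward lookup of that hostname must include the same IP.
import socket

def is_real_googlebot(ip):
    try:
        host = socket.gethostbyaddr(ip)[0]             # reverse DNS
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]  # forward-confirm
    except OSError:
        return False

print(is_real_googlebot("66.249.66.1"))                # example Googlebot-range address
```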
Here is an article:
http://www.guruofsearch.com/google-access-password-protected-site
It would be interesting to see whether a Google sitemap would result in pages showing up in Google, but I doubt that would work either, as the pages would likely still need to be crawled.
And some other interesting comments here:
http://forums.searchenginewatch.com/showthread.php?t=8221
