I need to make Facebook re-scrape its cache for every page on my website (3000+ pages).
The only way I know to do that is through the Open Graph Debugger.
I can't do that by hand for 3000+ pages...
I read on Facebook's developer support page that this (StackOverflow) is the place to ask questions, but there is little to no information here about refreshing Facebook's URL cache.
Can you please suggest a working solution for re-scraping pages?
My website: Mentallica
One possible answer, given the number of URLs you've got, is to use the batch invalidator. You could work from an access list of your URLs, or do a recursive directory listing and convert the folder paths into URLs (if it's a flat site), or the like. At least you don't have to do them one at a time. Once you have a list, paste it into the invalidator, one URL per line.
The batch invalidator is here:
https://developers.facebook.com/tools/debug/sharing/batch/
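For a flat, file-per-page site, the directory-listing idea can be scripted. This is a minimal sketch, assuming the URL path mirrors the file path under the document root (no URL rewriting); the suffix list, document root and domain are hypothetical placeholders. It produces one URL per line, ready to paste into the batch invalidator.

```python
from pathlib import Path, PurePosixPath

# Hypothetical: file types treated as pages; adjust for your site.
PAGE_SUFFIXES = (".html", ".htm", ".php")


def to_url(base_url: str, relative_path: str) -> str:
    """Join the site's base URL with a path relative to the document root."""
    return f"{base_url.rstrip('/')}/{relative_path.lstrip('/')}"


def is_page(relative_path: str) -> bool:
    """True if the file extension marks it as a page (case-insensitive)."""
    return PurePosixPath(relative_path).suffix.lower() in PAGE_SUFFIXES


def urls_from_directory(root: Path, base_url: str) -> list[str]:
    """Recursively list page files under `root` and map each one to a URL.

    Assumes a flat site where the URL path mirrors the file path.
    """
    urls = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root).as_posix()
        if path.is_file() and is_page(relative):
            urls.append(to_url(base_url, relative))
    return urls
```

Usage, with a hypothetical document root and domain: `print("\n".join(urls_from_directory(Path("/var/www/html"), "https://www.example.com")))`.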
It IS frustrating. I've searched several places, and don't really see a solution. We have a website with all of the proper tags, yet FB refuses to refresh any past posts with the new website data.
3 years later, but this may help someone: paste your URL here and click "Fetch new scrape information": https://developers.facebook.com/tools/debug/og/object/
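To avoid clicking through the debugger 3000 times, Facebook's sharing documentation describes triggering a re-scrape programmatically by POSTing `id=<url>&scrape=true` to the Graph API with an access token. The sketch below assumes that interface; the unversioned endpoint, token type and rate limits are assumptions to check against the current Graph API docs, and `rescrape_all` is a hypothetical helper, not a Facebook SDK call.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

# Assumption: unversioned Graph API root; current docs may expect a versioned path.
GRAPH_ENDPOINT = "https://graph.facebook.com/"


def build_scrape_request(page_url: str, access_token: str) -> urllib.request.Request:
    """Build a POST asking Facebook to re-scrape one URL (id=<url>&scrape=true)."""
    body = urllib.parse.urlencode({
        "id": page_url,
        "scrape": "true",
        "access_token": access_token,
    }).encode("ascii")
    return urllib.request.Request(GRAPH_ENDPOINT, data=body, method="POST")


def rescrape_all(urls, access_token, delay_seconds=1.0):
    """Request a re-scrape for each URL in turn; returns {url: HTTP status}."""
    results = {}
    for url in urls:
        request = build_scrape_request(url, access_token)
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                results[url] = response.status
        except urllib.error.HTTPError as err:
            results[url] = err.code
        time.sleep(delay_seconds)  # pause between calls to stay under rate limits
    return results
```

Sending requests one by one with a pause is slower than a batch call but keeps the sketch simple and makes failures easy to spot per URL.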
Related
I'm using a custom Genesis child theme, and lately I've noticed many false articles showing up in Webmaster Tools. They look something like this:
I haven't written these, and they aren't on topics my site focuses on, so I have no idea why they're showing up. So far I've had to delete about a hundred of them. I read on a forum that this can be caused by my theme generating bad URLs, but I'm not sure what that means or how to fix it. What could be causing this?
I believe this problem is caused either by your website being hacked, or by Google trying to crawl or follow something in your content that is not really a link.
This is what Webmaster Tools says about the problem:
In Crawl Errors, you might occasionally see 404 errors for URLs you don't believe exist on your own site or on the web. These unexpected URLs might be generated by Googlebot trying to follow links found in JavaScript, Flash files, or other embedded content.
To find out whether your website has been hacked, first compute this total: number of WordPress pages + number of posts + number of categories + number of PDFs or other files + number of images. Then do a Google search with the following query (without the quotes): "site:yourdomain.com". If the result count is far greater than the calculated total, your website has very likely been hacked.
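The check is simple arithmetic; a small sketch makes the comparison explicit. The `factor` threshold is a hypothetical choice, and since Google's `site:` counts are estimates, a flagged result is a prompt to investigate rather than proof of a hack.

```python
def expected_indexable_total(pages, posts, categories, files, images):
    """Rough total from the answer: pages + posts + categories + files/PDFs + images."""
    return pages + posts + categories + files + images


def looks_inflated(site_query_count, expected_total, factor=2.0):
    """Flag a `site:` result count far above the expected total.

    `factor` is a hypothetical threshold for "far greater".
    """
    return site_query_count > factor * expected_total
```

For example, a site with 40 pages, 300 posts, 12 categories, 25 files and 500 images expects about 877 indexed URLs; a `site:` count of 5000 would be flagged.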
If you believe your website has not been hacked, try to find where these links are being generated. Here's the trick: go to the Webmaster Tools report, click one of those links, and check the "Linked from" tab. It should list one or more pages where these unexpected links are coming from.
There are two possible outcomes:
1. The page the link was found on is on your own website: go to that page, open its source code, and do a Ctrl+F search for the link. If you find it, check which section or content is generating the problem.
2. The page the link was found on is NOT on your own website: in this case, try to contact the owner of the other site and ask for the link to be removed. If that isn't possible, I highly recommend creating a 404 page within your WordPress installation with some useful links. Search for how to do this; there are plenty of resources.
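When the linking page is on your own site, the "view source + Ctrl+F" step can be automated, which helps when "Linked from" lists many pages. A minimal sketch; `find_link_in_source` and `fetch_source` are hypothetical helpers. Searching for the path rather than the full URL is usually safer, since themes often emit relative links.

```python
import urllib.request
from typing import List, Tuple


def find_link_in_source(html: str, needle: str) -> List[Tuple[int, str]]:
    """Return (line number, trimmed line) for each source line containing needle,
    i.e. what Ctrl+F over the page source would highlight."""
    return [
        (number, line.strip())
        for number, line in enumerate(html.splitlines(), start=1)
        if needle in line
    ]


def fetch_source(page_url: str) -> str:
    """Download a page's HTML source (network call)."""
    with urllib.request.urlopen(page_url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage, with hypothetical URLs: `find_link_in_source(fetch_source("https://yourdomain.com/some-page/"), "/bogus-article/")`. The reported line numbers point you to the template section or widget that emits the link.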
Hope this helps
Hi, I keep running across websites that, when browsed or searched (using their own search function), return a static URL, e.g. ?id=16 or default.aspx, no matter which page of the site you visit after the search has been performed. This becomes a problem when I want to go directly to a post or page within one of these sites, so I'm wondering if anyone knows how I could find out what the absolute URL is, so that I can navigate straight to it.
I'm not really familiar with coding; I tried looking in the page source but wasn't able to glean anything from it.
The basics of ASP.NET URLs: http://www.codeproject.com/Articles/142013/There-is-something-about-Paths-for-Asp-net-beginne
It really depends on what you're trying to find, but finding a back way to an absolute path is highly doubtful. If the site owner (as with most blogs) wants you to have a permalink to a page, they use URL rewriting to put things like the page title in the URI. A lot of MVC sites do this now.
The '?id=16' you're seeing is just a query string: a parameter passed to whatever server-side logic the page runs.
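To see that split concretely, Python's standard `urllib.parse` separates the page path from the query-string parameters. The example URL is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the page path and its query-string parameters."""
    parts = urlsplit(url)
    return {"path": parts.path, "query": parse_qs(parts.query)}
```

`describe_url("http://www.example.com/default.aspx?id=16")` gives `{"path": "/default.aspx", "query": {"id": ["16"]}}`: one page, `default.aspx`, whose output depends on `id`. If you can spot the id of the post you want, you can try that full URL directly, but whether it works depends on the site; some keep the page state in form posts or on the server and ignore a pasted link.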
Often when searching for information I run into the problem that the author of an article, website, or blog post doesn't give a date.
Is there any way (maybe a special meta search engine, web archives, or Google search operators) to find out at least the month and year a website URL was uploaded?
thx
Putting
javascript:alert(document.lastModified)
in the address bar of a browser with the page loaded pops up a date and time. Where this date comes from I have no idea; probably the time the HTML or PHP file was created on the server. On the other hand, I thought JavaScript cannot access the file system, but I'm no expert...
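JavaScript indeed doesn't read the server's file system here: per the HTML specification, `document.lastModified` is derived from the HTTP `Last-Modified` response header (or file metadata for local files), and falls back to the current date and time when that is unknown. Dynamic pages (PHP and the like) often omit the header, so the value is frequently just "now". A minimal sketch that reads the header directly; the helper names are hypothetical.

```python
import urllib.request
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_last_modified(header_value: Optional[str]) -> Optional[datetime]:
    """Parse an HTTP-date such as 'Wed, 21 Oct 2015 07:28:00 GMT'; None if absent."""
    if not header_value:
        return None
    return parsedate_to_datetime(header_value)


def fetch_last_modified(page_url: str) -> Optional[datetime]:
    """Send a HEAD request and return the server's Last-Modified time, if any."""
    request = urllib.request.Request(page_url, method="HEAD")
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_last_modified(response.headers.get("Last-Modified"))
```

A `None` result means the server sent no `Last-Modified`, which is exactly the case where the browser alert shows the current time. Even when present, it is a modification time, not a creation date.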
I'm still curious whether someone knows a reliable method of finding out when a specific .html page was created, as I'd find it useful for my enquiries.
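Since the question mentions web archives: the Internet Archive's Wayback CDX server can list captures of a URL, oldest first, which gives a lower bound on a page's age (it existed at least by the first capture), not its creation date. A sketch assuming the CDX server's `url`, `output=json` and `limit` query parameters and its JSON shape (first row = field names); verify both against the CDX server documentation.

```python
import json
import urllib.request
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query_url(page_url: str) -> str:
    """Query URL asking for the oldest capture of page_url as JSON."""
    return CDX_ENDPOINT + "?" + urlencode({"url": page_url, "output": "json", "limit": "1"})


def first_capture(rows: list) -> Optional[datetime]:
    """Parse CDX JSON rows: row 0 holds field names, row 1 the first capture."""
    if len(rows) < 2:
        return None
    header, first = rows[0], rows[1]
    stamp = first[header.index("timestamp")]  # 14 digits: YYYYMMDDhhmmss
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


def earliest_snapshot(page_url: str) -> Optional[datetime]:
    """Fetch and parse the oldest Wayback capture (network call)."""
    with urllib.request.urlopen(cdx_query_url(page_url), timeout=30) as response:
        body = response.read().decode("utf-8").strip()
    return first_capture(json.loads(body)) if body else None
```

A page that was never archived returns no capture rows, so `None` means "unknown", not "new".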
I've got a Flex 3 project. One of the problems I have is that not very much of its content is indexed by Google. Currently, I pull data from a MySQL database, so Googlebot doesn't see most of the site.
My goal is to increase the amount of content indexed by Google, improve the SEO, and improve SERPs.
I thought that instead of pulling the data from the database, I would change the project's architecture and create separate "pages". So, in my case, I would compile each puzzle separately and upload it to the server in its own directory. This way the info in each puzzle would get indexed.
The downside is that if I add a puzzle, I'd have to add a link to it in all of the puzzles already on the server: add the link, recompile each puzzle, and upload it again. Is there a way around this problem? Also, if I wanted to pass data from one puzzle to another in the future, I wouldn't be able to.
Any suggestions?
Thank you.
-Laxmidi
The usual way to achieve this goal is to develop a hidden parallel site in HTML.
On the first page you have your Flash and, hidden by JavaScript, a list of links to the other pages. These links will be parsed by the robots. Ideally, the href targets are virtual pages (look up "URL rewriting"). On each "fake" page, your server-side code prints content or links from your database AND the Flash movie. The Flash is given a string telling it where it is and what it's supposed to show.
Example: for http://www.mysite.com/category1/content7, URL rewriting sends the request to http://www.mysite.com/index.php?uri=category1/content7. That page should display the Flash with the FlashVar "uri=category1/content7". The Flash then knows which content to display, so a user arriving from Google via this link finds the content they were looking for.
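The mapping in that example can be sketched in a few functions. Python stands in here for the server-side language (the example uses `index.php`); the actual rewriting would be done by the web server (e.g. Apache's mod_rewrite). `/app.swf`, the `seo-content` id and the helper names are hypothetical.

```python
from html import escape
from urllib.parse import urlencode


def rewrite(request_path: str) -> str:
    """Map a virtual path such as /category1/content7 to the front controller."""
    return "/index.php?" + urlencode({"uri": request_path.strip("/")}, safe="/")


def flashvars_for(request_path: str) -> str:
    """FlashVars string telling the Flash movie which content to show."""
    return urlencode({"uri": request_path.strip("/")}, safe="/")


def render_page(request_path: str, title: str, body_html: str) -> str:
    """Crawlable HTML content plus the Flash movie, which receives the same uri."""
    flashvars = escape(flashvars_for(request_path))
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head><body>\n"
        f"<div id=\"seo-content\"><h1>{escape(title)}</h1>{body_html}</div>\n"
        '<object type="application/x-shockwave-flash" data="/app.swf" width="800" height="600">\n'
        '  <param name="movie" value="/app.swf">\n'
        f'  <param name="FlashVars" value="{flashvars}">\n'
        "</object>\n</body></html>"
    )
```

Robots index the HTML inside `seo-content`, while visitors get the Flash opened at the same content, so one compiled movie serves every page and adding a puzzle only means adding a database row and a link.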
All links and content meant for SEO should be in HTML; don't rely on robots' ability to read Flash.
Have a look at Adobe's reference on deep linking.
You can generate the website's sitemap.xml with a daily cron job, so that its URLs encode the application state you need. Each URL encodes whatever content needs to be retrieved from the DB, while you serve just one index.html page.
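A minimal sketch of the sitemap generator such a cron job could run, following the sitemaps.org `urlset`/`url`/`loc` format. The `puzzle` query parameter and the list of states (which would come from the MySQL database) are hypothetical, and the Flex app would need to read the query string on load to restore that state.

```python
from typing import Dict, List
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def state_url(base_url: str, state: Dict[str, object]) -> str:
    """Encode one application state (e.g. which puzzle to open) as a URL."""
    return f"{base_url.rstrip('/')}/index.html?{urlencode(state)}"


def build_sitemap(base_url: str, states: List[Dict[str, object]]) -> str:
    """Build sitemap.xml text with one <url><loc> entry per state."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for state in states:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = state_url(base_url, state)  # & is escaped on output
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")
```

The cron job would write the result to `sitemap.xml` in the document root, e.g. `open("sitemap.xml", "w", encoding="utf-8").write(build_sitemap("https://www.example.com", [{"puzzle": 1}, {"puzzle": 2}]))`.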
Good luck!
I inherited a Drupal 5 site recently and have a series of enhancements to make. Several of them revolve around search results.
Unpublished pages are showing up in search engine results. Some of these are old pages, others were recently unpublished. All are correctly marked as unpublished in the CMS and are still showing up.
Outdated pages are showing up in the search results. The URL path structure changed, and those items are old results in the DB.
From what I can tell, the site uses the Google Search Appliance (GSA) for search rather than the default Drupal search. Is there a way I can be certain it's using GSA, other than seeing the module enabled?
If it is GSA it seems that I could get someone with access to the GSA to rebuild the search results on the site. Is this correct?
If rebuilding the search results is the right way to go about it, it seems whenever a fair amount of content is removed from the site I'll need to get someone to rebuild the search. Is there a better/automatic way?
It sounds like Drupal is handling the search. Google would need DB access to show unpublished nodes. It could be that you're using Views for the search but forgot to restrict it to published nodes.
If Drupal is handling the search, you just need to flush and rebuild the search index. This can be done without too much trouble if you don't have a lot of content.
The GSA could still be showing deleted content depending on what your data source is.
If the content comes from a database feed and is then dropped from the query, it would be dropped from the index. If the content came from a natural crawl or through a custom connector feed, it would not be removed from the index on delete. Instead, it needs to cycle out of the index naturally, which can take a while.
One way to block deleted URLs from being displayed is to do it through the front end. In the GSA Admin interface, go to Serving > Front Ends, choose your front end, and click the Remove URL tab. You can either list your URLs or block a group of URLs with regular expressions.
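Before pasting a regular expression into the Remove URL tab, it can help to check locally which URLs it would block. A sketch with Python's `re`; the pattern and URLs are hypothetical examples of an old path structure, and GSA's own pattern syntax may differ from Python's, so confirm it against the GSA documentation.

```python
import re

# Hypothetical pattern matching the old URL path structure.
OLD_STRUCTURE = re.compile(r"^https?://www\.example\.com/content/node/\d+$")


def split_by_pattern(urls, pattern):
    """Return (urls the pattern would block, urls it would keep)."""
    blocked = [u for u in urls if pattern.search(u)]
    kept = [u for u in urls if not pattern.search(u)]
    return blocked, kept
```

Running it over a list of current, valid URLs is a quick way to confirm the expression doesn't hide live pages.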
I have posted an answer to your more general question concerning node access. The problem with your search results might well be related to that.
In order to keep the Google Appliance more up to date, you might try out XmlSiteMap, a module that publishes a proper XML sitemap for all your content.
For an online website, publishing a sitemap is a good way to keep search engines up to date, since they can use it to learn about new pages and purge old ones. I'm assuming the Google Appliance would use it too.