Site producing bad urls? - wordpress

I'm using a custom Genesis child theme, and lately I've noticed many false articles showing up in Webmaster Tools. They look something like this:
I haven't written these, nor are they topics my site focuses on, so I have no clue why they are showing up. So far, I've had to delete about a hundred of them. I read on a forum that this can be due to my theme generating bad URLs, but I'm not sure what that means, nor do I know how to fix it. What could be causing this?

I believe this problem is caused either by your website having been hacked or by Google trying to crawl or follow something in your content that is not really a link.
This is what Webmaster Tools tells you about the problem:
In Crawl Errors, you might occasionally see 404 errors for URLs you don't believe exist on your own site or on the web. These unexpected URLs might be generated by Googlebot trying to follow links found in JavaScript, Flash files, or other embedded content.
To find out whether your website has been hacked, first calculate this total: number of WordPress pages + number of posts + number of categories + number of PDFs or other files + number of images. Then do a Google search using the following query (without the quotes): "site:yourdomain.com". If the number of results is far greater than the calculated total, your website has very likely been hacked.
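The comparison above is plain arithmetic, so it can be sketched in a few lines. This is only an illustration: the counts are made-up numbers, and the tolerance factor of 2 is an assumption that is not part of the original advice (the result count shown for a site: query is only an estimate, so some slack seems sensible).

```javascript
// Hypothetical content counts, as read from the WordPress admin screens.
const counts = { pages: 12, posts: 140, categories: 8, files: 5, images: 60 };

// Sum every kind of URL the site is expected to expose.
function expectedUrlTotal(c) {
  return c.pages + c.posts + c.categories + c.files + c.images;
}

// Flag the site when the "site:" result count is far above the expected total.
// The default factor of 2 is an arbitrary assumption, not a documented threshold.
function looksInflated(indexedCount, c, factor = 2) {
  return indexedCount > expectedUrlTotal(c) * factor;
}

console.log(expectedUrlTotal(counts));    // 225
console.log(looksInflated(3000, counts)); // true
console.log(looksInflated(240, counts));  // false
```

A result of true is a reason to investigate further, not proof of a hack.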
If you believe that your website has not been hacked, try to find out where these links are being generated. Here is the trick: go to the Webmaster Tools report, click on one of those links, and check the "Linked from" tab. It should list one or more pages that these unexpected links are coming from.
Two possible outcomes:
1. The page where the link is found belongs to your own website: go to that page, open the source code, and do a Ctrl+F search for the link. If you find it, check which section or piece of content is generating the problem.
2. The page where the link is found does NOT belong to your own website: in this case, try to contact the owner of the other site and ask for the link to be removed. If that is not possible, I highly recommend creating a 404 page within your WordPress installation with some useful links. Search for how to do this; there are plenty of resources.
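For the first outcome, the Ctrl+F search can also be done programmatically once you have the page source as a string. The sketch below is a minimal illustration; the markup and the URL in it are hypothetical stand-ins, not taken from the asker's site.

```javascript
// Return the 1-based line numbers of the page source that contain the link.
function findLinkLines(source, link) {
  const hits = [];
  source.split("\n").forEach((line, index) => {
    if (line.includes(link)) hits.push(index + 1);
  });
  return hits;
}

// Hypothetical page source and unexpected URL, used only for illustration.
const source = [
  '<div class="sidebar">',
  '  <a href="/cheap-watches-online">watches</a>',
  "</div>",
].join("\n");

console.log(findLinkLines(source, "/cheap-watches-online")); // [ 2 ]
```

The line numbers point at the template, widget, or post content that needs a closer look.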
Hope this helps

Related

LinkedIn Not utilizing og:image

I've got a site that has multiple share buttons on entries in a WordPress site.
We designed this so there are no individual entries to view, they're Podcasts and videos. The listing page has a minimum of 10 entries, each with share buttons.
Currently the share links and titles are working correctly. But the page is not recognizing the og:image, and instead is picking up the default logo for the site itself.
I read another post on Stack Overflow that said it might be an issue for LinkedIn if the image is utilizing SSL for the link. But I just find that hard to believe.
The other issue I'm struggling with: the docs say that once an image is scraped, it stays cached for approximately 7 days.
I had a similar issue with Facebook, and there's a debugger that allows you to rescrape the page, which lets me verify that my changes worked.
My two questions are these. First, is there something other than the og:image I should be specifying? Since I can't specify it per post, it's in the head of the page itself, and I would think LinkedIn would pick that up. No?
Second, is there a way a developer can re-check after the meta info has been changed to see if the changes worked, without having to wait out the TTL on the cache?
Try this:
url/link?blah=1
url/link?blah=2
url/link?blah=3
to get around the cache.
This should trick it into thinking it's a new page each time.
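A small helper can generate these variants. This is a sketch only: the parameter name "blah" follows the answer above, and "https://example.com/podcasts/" is a placeholder address.

```javascript
// Append a throwaway query parameter so the scraper sees a URL it has not cached.
// Any parameter name the page itself ignores would work equally well.
function withCacheBuster(pageUrl, n) {
  const url = new URL(pageUrl);
  url.searchParams.set("blah", String(n));
  return url.toString();
}

console.log(withCacheBuster("https://example.com/podcasts/", 1));
// https://example.com/podcasts/?blah=1
```

Using the URL API rather than string concatenation means an existing query string is extended with & instead of getting a second ?.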
Can I get a link to test?
Anthony Walz posted the correct answer. Through email he also helped me with another problem, one I didn't realize I had until I looked.
My LinkedIn shares were not picking up the show title; they were picking up the page description instead. (I have several podcasts showing on one page. We don't use individual post pages; they all play from the listing.)
He pointed me to the developer docs on formatting sharing links,
which give a real-world example:
https://www.linkedin.com/shareArticle?mini=true&url=http://developer.linkedin.com&title=LinkedIn%20Developer%20Network&summary=My%20favorite%20developer%20program&source=LinkedIn
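With several entries on one listing page, each share button needs its own link with that entry's title. A minimal sketch of building such a link from the parameters in the example above follows; the function name is made up for illustration, and whether LinkedIn still honors every one of these parameters is not something this thread confirms.

```javascript
// Build a share link in the shape of the example above.
// encodeURIComponent turns spaces into %20 and also escapes the shared URL itself.
function linkedInShareLink({ url, title, summary, source }) {
  const params = { mini: "true", url, title, summary, source };
  const query = Object.entries(params)
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `https://www.linkedin.com/shareArticle?${query}`;
}

console.log(linkedInShareLink({
  url: "http://developer.linkedin.com",
  title: "LinkedIn Developer Network",
  summary: "My favorite developer program",
  source: "LinkedIn",
}));
```

Unlike the documented example, this version percent-encodes the url value as well, which keeps any & or ? inside the shared address from being read as part of the share link's own query string.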
Thanks a ton for the assist, Anthony!

WordPress search not working

I have seen various topics on here and tried to follow them, but my site may be different; I am not sure. Mine is at http://www.doctorwhoworld.net/doctors. For example, if you search for "daleks" or "bbc audio", it takes you to the index page.
I have followed guides on here, but I am still stuck as to why pages are not found by the search system. Note that I am using pages rather than posts, in case that matters for any code suggestions.
My second question: I am getting a lot of white space at the top and left. Can anyone suggest how to minimise it?
I think I can see the problem. Your WordPress URL is not at the root of the domain, so you need to change your settings. Go to this page in your admin system:
Settings > General
Look at your Site Address (URL), which will probably be http://www.doctorwhoworld.net/. You need this to be http://www.doctorwhoworld.net/doctors, so that the search is carried out on the right home page.

How to refresh facebook scrape cache for a whole site

I need to re-scrape Facebook's cache for every page on my website (3000+ pages).
The only way I know to do that is through the Open Graph debugger.
I cannot run this by hand for 3000 pages...
I read on the Facebook developer support page that this (Stack Overflow) is the place to ask questions, but there is little to no knowledge here about refreshing Facebook's URL cache.
Can you please suggest a working solution for re-scraping pages?
My website: Mentallica
One possible answer, given the number of URLs you've got, is to use the batch invalidator. You could work from an access list of your URLs, or maybe do a recursive directory listing and replace the folders with URLs (if it's a flat site), or the like. At least you don't have to do them one at a time. Once you have a list, paste it into the invalidator, one URL per line.
The batch invalidator is here:
https://developers.facebook.com/tools/debug/sharing/batch/
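Turning a list of site-relative paths into the pasteable list is straightforward. The sketch below assumes you already have the paths (from a sitemap, a directory listing, or similar); the domain and paths shown are placeholders.

```javascript
// Turn site-relative paths into absolute URLs, one per line, ready to paste.
function buildUrlList(baseUrl, paths) {
  return paths.map((path) => new URL(path, baseUrl).toString()).join("\n");
}

// Placeholder domain and paths, for illustration only.
const list = buildUrlList("https://example.com/", ["/", "/about.html", "/posts/1.html"]);
console.log(list);
```

If the tool turns out to accept only a limited number of lines per submission, the same list can be pasted in slices.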
It IS frustrating. I've searched several places and don't really see a solution. We have a website with all of the proper tags, yet FB refuses to refresh any past posts with the new website data.
3 years later, but this can help someone: paste your URL here and click "Fetch new scrape information": https://developers.facebook.com/tools/debug/og/object/

Web crawlers and IFrames

Hypothetical Situation: I have a small obscure website called "miniatureBoltsInCarburetors.com" which provides content about the miniature bolts which hold a carburetor together as well as some general related automotive information. My site also has a single page which allows someone to find the missing bolt in their carburetor, and while no one will access this page directly from my website, one billion other popular automotive sites have embedded this single page in their website using an iframe, yet not included a link back to my site.
I recognize that this question is related to SEO which is considered off topic, however, all of the many SEO related forums discuss the marketing steps one could take, and not the programming steps or strategies, and hope others will allow this question to be answered here.
I wish my site "miniatureBoltsInCarburetors.com" to be ranked high for general automotive searches. What could I do to allow the 3rd party sites which include an iframe back to my site to improve my ranking? Could using JavaScript in the iframe to create a link on the parent page provide any value? What about when my server renders the page, use PHP to get the referring URL from $_SERVER, and include it in the content?
I am providing a solution here. I'm not sure if this is what you want, though.
In the page that other websites embed in an iframe, you can put the JavaScript below. It checks whether the page is opened inside an iframe or directly in the browser.
Using this check, when you see that the page is opened in an iframe, you can navigate to your own website when the user clicks on something.
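// Returns true when the page is running inside a frame. This works in all browsers.
function inIframe() {
    try {
        // Inside a frame, this window is not the topmost window.
        return window.self !== window.top;
    } catch (e) {
        // If access to window.top is blocked, assume the page is framed.
        return true;
    }
}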
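The "on click, navigate to your website" step could look roughly like the sketch below. It is an illustration under assumptions: the helper names are made up, the check takes the window as a parameter so it can be run outside a browser, and the stand-in objects at the bottom exist only to exercise the code. In a real page you would pass window and an actual element.

```javascript
// Same check as above, but taking the window as a parameter.
function isFramed(win) {
  try {
    return win.self !== win.top;
  } catch (e) {
    return true;
  }
}

// When framed, open the home site in a new tab on click.
// Returns whether the click handler was attached.
function linkBackWhenFramed(win, element, siteUrl) {
  if (!isFramed(win)) return false;
  element.addEventListener("click", () => win.open(siteUrl, "_blank"));
  return true;
}

// Stand-in objects that mimic a framed window and a clickable element.
const opened = [];
const framedWindow = { top: {}, open: (url) => opened.push(url) };
framedWindow.self = framedWindow;
const handlers = [];
const fakeElement = { addEventListener: (type, fn) => handlers.push(fn) };

console.log(linkBackWhenFramed(framedWindow, fakeElement, "http://miniatureBoltsInCarburetors.com/")); // true
handlers.forEach((fn) => fn()); // simulate a click
console.log(opened); // [ 'http://miniatureBoltsInCarburetors.com/' ]
```

Opening a new tab rather than replacing the parent page keeps the embedding site intact, which seems less likely to get the iframe removed, though that is a judgment call.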
Also for your reference you can check the below URL.
How to prevent my site page to be loaded via 3rd party site frame of iFrame
Hope it helps.
Iframes are seen as separate pages by Google. Your approach may end up being penalized because the content is sourced from an untrusted site. According to Google Webmaster Support:
Frames can cause problems for search engines because they don't
correspond to the conceptual model of the web. Google tries to
associate framed content with the page containing the frames, but we
don't guarantee that we will.
One of the best approaches to ranking higher for a specific keyword is to build multiple related sites. In your case, a 3-4 page site about carburetors, bolts, or other topics your primary site covers would do it. These mini sites will be more focused on the subject because they have fewer pages. Of course, each page should contain unique articles. Then link from the mini sites to the primary site, and you can see a dramatic change.
In fact, what you are trying to do was a tactic for pushing competitors down the rankings that worked occasionally a few years ago. Now, it is still a risk.
I see. You don't want to mess up the page for your own site, but you want to do something with all the uncredited embeddings.
The solution is fairly simple:
Create a copy of the page.
Switch your site to use the copy.
Amend the version that countless other sites are embedding, so that there is a small link back to you. Or, add an iframe blocker script that will load your site.
If the page is active (i.e., the user interacts with it to find the missing bolt), you could include a sales message with the response encouraging the user to visit your site.
I think that your goal is getting your link onto these other sites long enough to get indexed by Google before it is noticed by the people doing the embedding, so it's a bit of a balancing act.
I see conflicting advice about how Google indexes iframes. You should use a PageRank checker to see if the existing iframe page url has PageRank, and compare it to the page that you embed it on.
I don't think you need to worry.
Googlebot does seem to crawl through iframes, but the web page containing the iframe is not credited for that content. In other words, the page ranking of that particular web page does not change because of content from the iframe.
Is IFrame crawled by Google?
Do robots crawl iframes?

Can I track who is linking or manipulating my site's data?

Is it possible to track whether someone links to data on my site? Specifically, what if my data is used in a site dynamically generated by a developer's program? I would like to know if someone is blatantly passing off my site's data as their own. There are obviously ways around directly linking to content, such as content manipulation or even manual manipulation. But if someone were to link to my content from their website (or copy it word for word, or manipulate it), is there a way to track it?
Can I avoid someone being able to scrape my website at all, or is everything just up for grabs?
The best and easiest answer is Google Webmaster Tools!
HERE
Doing this yourself is very hard: you would need to crawl the web to discover the links that point to your pages. Dynamic content is linked as well, so Google would find it too.
This tool lets you see external links that point to your site, so you can check them.
As an extra, you can monitor requests and traffic to your site and look for IPs that request the same page over and over again. That can tell you that an external page is dynamically loading content from your web page.
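The traffic-monitoring idea can be sketched as a simple count over parsed log entries. Everything here is illustrative: the entry shape ({ ip, path }), the function name, and the threshold are assumptions, and the sample addresses come from ranges reserved for documentation.

```javascript
// Count requests per (ip, path) pair and report pairs at or above the threshold.
function findRepeatRequesters(entries, threshold) {
  const seen = new Map();
  for (const { ip, path } of entries) {
    const key = `${ip} ${path}`;
    const record = seen.get(key) || { ip, path, count: 0 };
    record.count += 1;
    seen.set(key, record);
  }
  return [...seen.values()].filter((record) => record.count >= threshold);
}

// Made-up log entries, for illustration only.
const entries = [
  { ip: "203.0.113.7", path: "/data/prices" },
  { ip: "203.0.113.7", path: "/data/prices" },
  { ip: "198.51.100.2", path: "/about" },
  { ip: "203.0.113.7", path: "/data/prices" },
];

console.log(findRepeatRequesters(entries, 3));
// [ { ip: '203.0.113.7', path: '/data/prices', count: 3 } ]
```

A high count is only a hint; search engine crawlers and monitoring services also request the same pages repeatedly.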
EDIT:
Here is a good article on this subject: link (scroll down to see Google Webmaster Tools used alongside some other programs and methods).
Here is a good getting-started guide to Google Webmaster Tools: link
Enjoy!
