I am looking for a way to decode Google ad-click URLs into the actual website they redirect to, via code...
I have a big DB of URLs like the following:
https://www.google.co.in/aclk?sa=L&ai=DChcSEwjY9KL2m4fRAhXTCioKHXEWBN0YABAK&sig=AOD64_3p0RvGkZj0fn81FSXIKtQ9XPVBvg&ctype=5&q=&ved=0ahUKEwialZ72m4fRAhVKwI8KHbGmDB8QvhcIKg&adurl=
https://www.googleadservices.com/pagead/aclk?sa=L&ai=DChcSEwjY9KL2m4fRAhXTCioKHXEWBN0YABAM&ohost=www.google.co.in&cid=CAASIuRoPu3Xxj7yyeUtRHLYBy-5952U-NXdaW3ftj91LB2rPAQ&sig=AOD64_0ksuGT2UtbiAEScV_lASVCVh7eFg&ctype=5&q=&ved=0ahUKEwialZ72m4fRAhVKwI8KHbGmDB8QvhcILw&adurl=
http://www.google.com/aclk?gclid=...
I am searching for methods to determine what the target website is. Any help is appreciated.
The only way to do it is to fetch the URL and follow the redirect, e.g. with PHP's file_get_contents().
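If the adurl parameter is populated, the target can be read straight out of the query string; when it is empty (as in the URLs above), the only option is to request the aclk URL and follow the redirect chain. A minimal Python sketch of the query-string case (the function name is my own):

```python
from urllib.parse import urlparse, parse_qs

def extract_adurl(aclk_url):
    """Return the adurl query parameter if the ad-click URL carries one, else None."""
    params = parse_qs(urlparse(aclk_url).query)  # blank values are dropped by default
    values = params.get("adurl")
    return values[0] if values else None

# When adurl is empty (as in the URLs above), the target can only be
# discovered by requesting the URL and following the redirect chain, e.g.:
#   import requests
#   final_url = requests.get(aclk_url, allow_redirects=True, timeout=10).url
```

parse_qs also percent-decodes the value, so an encoded adurl comes back as a plain URL.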
I have a WordPress website, https://www.x-equo.nl, with a single job vacancy that we want to share on LinkedIn. When we click the share button, we get the error message:
'It's not you it's us, give it another try please.'.
When I have a look at the URL created:
https://www.linkedin.com/cws/share/?url=https%3A%2F%2Fwww.x-equo.nl%2Fvacature%2Ffinancieel-administratief-medewerker-6%2F
I see that the "cws/share" part of the URL is different from another website of ours where sharing does work.
What is going wrong here? How can we change the share link URL so LinkedIn will accept it?
I hope someone can help us.
Thank you in advance.
We have read a lot of other help-desk answers, but none of them seemed to apply.
The following URL format works perfectly for me...
https://www.linkedin.com/sharing/share-offsite/?url=https%3A%2F%2Fwww.x-equo.nl%2Fvacature%2Ffinancieel-administratief-medewerker-6%2F
And the URL you have will simply redirect to the URL above. You have correctly encoded the URL, so, that's not the problem. But, yes, it definitely works:
Source: Official LinkedIn Shareable Documentation.
Not sure if your page is loading correctly? Test it out by inserting your URL into the LinkedIn Post Inspector.
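For reference, the working link format can be generated with nothing more than percent-encoding; a minimal Python sketch (the helper name is my own):

```python
from urllib.parse import quote

def linkedin_share_url(page_url):
    """Build a LinkedIn share-offsite link with the target URL fully percent-encoded."""
    # safe="" forces '/' and ':' to be encoded as %2F and %3A as well
    return ("https://www.linkedin.com/sharing/share-offsite/?url="
            + quote(page_url, safe=""))

print(linkedin_share_url("https://www.x-equo.nl/vacature/financieel-administratief-medewerker-6/"))
# → https://www.linkedin.com/sharing/share-offsite/?url=https%3A%2F%2Fwww.x-equo.nl%2Fvacature%2Ffinancieel-administratief-medewerker-6%2F
```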
I'm looking for an easy way to share through LinkedIn without the hassle of OAuth 2.0, which doesn't seem to be required by other pages that use this kind of sharing (they didn't require anything from me; I could share straight away).
Straight to the issue:
this one works: https://www.linkedin.com/sharing/share-offsite/?url=https%3A%2F%2Frefair.me
this one doesn't: https://www.linkedin.com/sharing/share-offsite/?url=https%3A%2F%2Frefair.me%2Fjob%2F494
It seems that beyond the main domain I can't get sharing working. For instance, here is a link from another site that goes deeper than the main domain and is still shareable: https://www.linkedin.com/shareArticle?mini=true&url=https://bulldogjob.pl/companies/jobs/2043-programista-java-warszawa-bms-sp-z-o-o&title=Programista+Java&summary=&source=https://bulldogjob.pl
I also tested with and without source and summary query params. Anyone had that issue?
LinkedIn uses the Open Graph protocol (http://ogp.me/) to determine how pages are shared in LinkedIn.
You may also use the LinkedIn Post Inspector (https://www.linkedin.com/post-inspector/) tool to debug how various pages would be shared in LinkedIn.
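Since LinkedIn reads Open Graph tags, it can help to verify that a page actually exposes them. A self-contained sketch using Python's standard html.parser (the sample HTML below is made up for illustration):

```python
from html.parser import HTMLParser

class OGTagParser(HTMLParser):
    """Collect Open Graph <meta property="og:..." content="..."> tags."""
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            prop = attr.get("property", "")
            if prop.startswith("og:"):
                self.og[prop] = attr.get("content", "")

# Hypothetical page markup, for illustration only
html = """<html><head>
<meta property="og:title" content="Job 494" />
<meta property="og:url" content="https://refair.me/job/494" />
</head><body></body></html>"""

parser = OGTagParser()
parser.feed(html)
print(parser.og)
```

In practice you would feed it the HTML fetched from the URL you are trying to share, and check that at least og:title and og:url are present.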
I decoded your URL so I could get a cleaner look...
https://www.linkedin.com/sharing/share-offsite/?url=https://refair.me/job/494
So, let's try to visit your URL, https://refair.me/job/494. The webpage you are sharing DOES NOT LOAD.
Is your site down for everyone? Yes, your site is down for everyone.
In order to share a URL on LinkedIn, you must fulfill the following minimum requirements:
The URL must load.
If you just want to test out the API, try using wikipedia.org or google.com as test pages.
Surprisingly, the old refair.me URL by itself works fine in LinkedIn, but that could be from some internal cache, from way back in the day when the page once did work. It certainly does not do so anymore.
I have been trying to use Nutch to crawl Twitter and LinkedIn data.
Nutch-0.9.
However, when I try to crawl Twitter, the regex filter doesn't seem to work. My regex-urlfilter file has:
+^https://([a-z0-9]*.)twitter.com/a
and what I want is to crawl only those URLs that match the above pattern, but I end up with URLs such as https://twitter.com/document.
As for LinkedIn, I always get a timeout whenever I try to crawl it. LinkedIn's robots.txt says you need to email them to get your crawler whitelisted, but they never respond.
I appreciate your help!
If you want to crawl only these specific URLs, you should include the following line too:
-.*
This rule will exclude all other URLs!
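For reference, the whole regex-urlfilter.txt might look like the sketch below. Two assumptions on my part: the unescaped dots in the original pattern match any character, so I have escaped them, and I have dropped the trailing /a on the assumption it was unintended:

```
# Include only twitter.com and its subdomains
+^https://([a-z0-9]*\.)?twitter\.com/
# Exclude everything else (rules are applied top to bottom; first match wins)
-.*
```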
Also, if you want to crawl Twitter or LinkedIn, you can use dedicated libraries like Twitter4J or linkedin-j!
As far as I know, Nutch does not support crawling Twitter and LinkedIn data. For crawling Twitter data you should use the Twitter API; check out Twitter4J: http://twitter4j.org/en/. For crawling LinkedIn data, you could have a look at https://github.com/pondering/scrapy-linkedin.
Hope this helps
I'm having a problem with the Facebook Open Graph implementation on my WordPress blog. Although all the og meta tags are there in the head, the Facebook debug tool tells me: Error Parsing URL: Error parsing input URL, no data was scraped.
Things I've already done:
Checked the links for typos
Tried the WordPress SEO plugin by Yoast (Facebook Open Graph implementation)
Tried the WP Facebook Open Graph Protocol plugin
Removed and manually re-added all the og meta tags in my header file, one by one, to find the error
None of these tricks worked; all of them ended with the Facebook debug tool telling me the same: Error Parsing URL: Error parsing input URL, no data was scraped.
But I found that Google Webmaster Tools can see my tags and information about my blog. If you view the source at sfapress.uphero.com, you'll see that the og meta tags are there where they are expected to be.
So, I'm just wondering: What am I missing? How wrong am I about that? Can anyone of you guys please help me figure this out?
Thanks for your help.
It looks like your hosting provider, or possibly your web server, is denying access to the Facebook scraper. This is why it's saying no data was scraped. Take a look at https://developers.facebook.com/docs/opengraphprotocol/#bestpractices and https://developers.facebook.com/docs/ApplicationSecurity/#facebook_scraper for information on how the scraper appears to web servers.
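One way to check this yourself is to request the page while identifying as the Facebook scraper, whose User-Agent string begins with facebookexternalhit. A Python sketch (the helper name is my own; the actual fetch, shown in the comment, is a network call):

```python
import urllib.request

# facebookexternalhit is the User-Agent Facebook's scraper identifies itself
# with; sending it manually reveals whether the host blocks that agent.
SCRAPER_UA = "facebookexternalhit/1.1"

def scraper_request(url):
    """Build a request that mimics the Facebook scraper's User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": SCRAPER_UA})

# To actually test the server's response, run (network call):
#   resp = urllib.request.urlopen(scraper_request("http://sfapress.uphero.com/"))
#   print(resp.status)
```

If the response with this User-Agent differs from a normal browser request (e.g. a 403 or an empty body), the host is blocking the scraper, which matches the "no data was scraped" error.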
I need help solving the following issue:
I need to validate the URLs cached by the Google search engine for a particular site. When a URL 404s, or the page fails to render some necessary HTML elements (and is therefore considered broken), I need to log those URLs and later 301-redirect them to the correct URLs. I know PHP and a little bit of Python, but I'm not sure what approach to use to scrape all the URLs from the search engine results for a given site.
http://simplehtmldom.sourceforge.net/ - a simple HTML parser. There is an example on that page; I'm not sure whether it still works with Google's instant search, etc.
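Once you have the list of URLs, the "broken" test itself is simple to express. A Python sketch, where the required marker element is a placeholder you would replace with whatever identifies a correctly rendered page on your site:

```python
def classify_page(status_code, html, required_marker='<div id="content"'):
    """Label a fetched page 'broken' when it 404s or lacks a required element.

    required_marker is a hypothetical placeholder; substitute the markup
    that a correctly rendered page on your site always contains.
    """
    if status_code == 404:
        return "broken"
    if required_marker not in html:
        return "broken"
    return "ok"
```

You would feed it the status code and body from each fetched URL (e.g. via requests in Python), log every "broken" result, and build your 301 map from that log.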