Problem scraping booking.com reviews using PHP - web-scraping

I use cURL to fetch this link, but it does not return the full body of the page. I want to scrape the reviews from this URL. How can I fix this? The link is:
https://www.booking.com/reviewlist.nl.html?cc1=hr&pagename=apartment-sun-spalato-views&type=total&offset=0&rows=10
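This does not explain the truncated response by itself. However, if the goal is to collect all the reviews, the URL above already exposes "offset" and "rows" query parameters. Here is a minimal Python sketch that builds the successive page URLs, which you could then fetch with cURL or any HTTP client. It assumes that offset is a zero-based review index and rows is the page size. Neither assumption is documented here; both are inferred from the parameter names. The helper name is hypothetical.

```python
from urllib.parse import urlencode

# Base endpoint taken from the URL in the question.
REVIEWLIST_BASE = "https://www.booking.com/reviewlist.nl.html"


def review_page_urls(pagename, cc1, pages, rows=10):
    """Build one reviewlist URL per page, stepping 'offset' by 'rows'.

    Assumption: offset is a zero-based review index and rows is the page
    size, as the parameter names in the question's URL suggest.
    """
    urls = []
    for page in range(pages):
        params = {
            "cc1": cc1,
            "pagename": pagename,
            "type": "total",
            "offset": page * rows,
            "rows": rows,
        }
        urls.append(REVIEWLIST_BASE + "?" + urlencode(params))
    return urls
```

With pagename "apartment-sun-spalato-views" and cc1 "hr", the first generated URL is exactly the one quoted in the question.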


FB Comment 'Also Post on Facebook' and FB Share - Linkback to different URLs?

This is SO annoying.
The issue is that two scenarios behave differently when I expect (and want) them to behave the same. Both involve the same webpage/article, which features both a Facebook Comments widget and a Share icon.
It's important to mention that the webpage is a WordPress article, and we redirect:
from www.example.com/wordpress/articles/news/thearticle
to www.example.com/news/thearticle
The issue explained:
Scenario 1) I visit the article, type a Facebook comment and also tick 'Also Post on Facebook'. When I view my Facebook wall and see the share with the comment I just made, the link points back to the WordPress URL - I do not want this.
Scenario 2) I visit the article and share it through the Share icon. On my wall this time the linkback URL is the short one - this is good.
Important info regarding debugging
My OG URL tag is correct:
<meta property="og:url" content="http://www.example.com/news/thearticle"/>
The Facebook debugger picks up the correct URL shown above and also lists the desired Fetched URL and Canonical URL. Everything looks the way I want it to.
This is true regardless of whether I scrape the URL before or after trying this procedure.
Even if I go back and post another comment with 'Also Post on Facebook' after scraping with the FB debugger, the share still links to the wrong (long, original) URL.
However, if I at any point share via the share icon, not the comment-share, the desired URL is present as per the OG URL.
If I do a comment-share, then an icon-share, then another comment-share, the comment-shares still have the wrong URL even though the icon-share had the correct one!
Any ideas? Thanks so much in advance to anyone who can help :)
It seems that the Facebook Comments plugin doesn't take the page info from Open Graph. Instead, it expects a data-href attribute and, if none is provided, uses the current URL by default.
From the doc:
data-href
Description: The absolute URL that comments posted in the plugin will be permanently associated with. Stories on Facebook about comments posted in the plugin will link to this URL.
Default: Current URL.
So you need to provide the desired URL as an HTML5 data attribute on the plugin wrapper:
<div class="fb-comments" data-href="http://www.example.com/news/thearticle" data-numposts="5"></div>
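To keep comment stories and the Share icon pointing at the same place, the data-href value can simply be the same URL used in og:url. Below is a small Python sketch that renders the wrapper with the URL HTML-escaped for use inside the attribute. The helper name is hypothetical. In WordPress you would emit the equivalent markup from the theme's PHP template instead.

```python
from html import escape


def fb_comments_markup(canonical_url, numposts=5):
    """Return the Comments plugin wrapper pinned to canonical_url via data-href.

    Without data-href the plugin falls back to the current URL (per the doc
    quoted above), which on a redirected WordPress page may be the long URL.
    """
    # quote=True also escapes '"' so the URL cannot break out of the attribute,
    # and '&' in query strings becomes '&amp;', as HTML attributes expect.
    return '<div class="fb-comments" data-href="{}" data-numposts="{}"></div>'.format(
        escape(canonical_url, quote=True), numposts
    )
```

Calling fb_comments_markup("http://www.example.com/news/thearticle") produces the same wrapper shown above. Passing the og:url value guarantees that both share paths use one URL.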

How do I get Facebook Group Events into WordPress?

The best way I have found to get Facebook Group Events into WordPress is to use Yahoo Pipes to create an RSS Feed. The feed that I have is as follows:
https://pipes.yahoo.com/pipes/pipe.run?Facebook_ID=1608419542769806&_id=1301d12f49b904e56afe3f420366a3c4&_render=rss
This works, but when I render it on a WordPress page with the Embed RSS plugin, it loads fine in the preview; once it is inserted into the page, however, I get the following error:
RSS Error: A feed could not be found at
https://pipes.yahoo.com/pipes/pipe.run?Facebook_ID=1608419542769806&_id=1301d12f49b904e56afe3f420366a3c4&%23038;_render=rss.
A feed with an invalid mime type may fall victim to this error, or
SimplePie was unable to auto-discover it.. Use force_feed() if you are
certain this URL is a real feed.
Now, I can easily add the following code to my functions.php theme file:
add_action('wp_feed_options', 'force_feed', 10, 1);
function force_feed($feed) {
    $feed->force_feed(true);
}
but that just changes the error to:
RSS Error: A feed could not be found at
https://pipes.yahoo.com/pipes/pipe.run?Facebook_ID=1608419542769806&_id=1301d12f49b904e56afe3f420366a3c4&%23038;_render=rss.
This does not appear to be a valid RSS or Atom feed.
Is there something I'm missing? All I want to do is output the Facebook Group Event details as text on a WordPress page.
There are various plugins for Facebook streaming. Try the 'custom-facebook-feed' plugin; I think it is the better option.
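One detail in the error messages above may be relevant. The URL that WordPress reports ends in "&%23038;_render=rss", whereas the configured feed URL ends in "&_render=rss". In that string, "%23" is a percent-encoded "#", and "&#038;" is the numeric HTML entity for "&". This suggests the ampersand was HTML-entity-encoded somewhere between the page content and the feed request. The Python sketch below only reverses those two encodings on the reported string to check that reading. It does not fix the plugin, and where the encoding actually happens is not established here.

```python
import html
from urllib.parse import unquote

# Feed URL as configured (from the question) and as reported in the RSS error.
CONFIGURED = "https://pipes.yahoo.com/pipes/pipe.run?Facebook_ID=1608419542769806&_id=1301d12f49b904e56afe3f420366a3c4&_render=rss"
REPORTED = "https://pipes.yahoo.com/pipes/pipe.run?Facebook_ID=1608419542769806&_id=1301d12f49b904e56afe3f420366a3c4&%23038;_render=rss"


def undo_double_encoding(url):
    """Reverse the two encodings seen in REPORTED: '%23' -> '#', then '&#038;' -> '&'."""
    entity_form = unquote(url)         # '&%23038;' becomes '&#038;'
    return html.unescape(entity_form)  # '&#038;' becomes '&'
```

If undo_double_encoding(REPORTED) gives back CONFIGURED, the feed itself is probably fine. The URL handed to SimplePie is simply not the one that was entered.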

Escape a Shebang /#!/ in URL for Google URL Builder

Does anyone know whether (and how) I can escape the shebang or encode the URI so that a link works properly in the Google Analytics URL Builder? I want to add campaign parameters to product page URLs to track the success of our ads. The URL for each individual product page looks like this:
http://www.oursite.com/classic-movies/#!/Title-of-Movie/p/12345678
When I put the product page URL into the URL Builder, it says the URL is invalid. I think this is because of the #!. I have tried escaping the special characters, replacing the shebang with %23%21 or %21!
The URL then appears valid in the URL Builder, and the builder generates a link with UTM tags, BUT when you paste the tagged link into the browser, it does not take you to our product page. It takes you to our website, but shows a "sorry does not exist" message.
I also tried this:
http://www.oursite.com/classic-movies/?_escaped_fragment_=/Title-of-Movie/p/12345678
It generates a link in the builder and does link to the product page of our website (yay!), but the URL adds this after the campaign name: #!/Title-of-Movie/p/1234567
The shebang is back! Will that be a problem?
For reference, we're using the Ecwid storefront plugin on a WordPress site.
Thanks in advance.
Short answer
You should use the URL without the fragment (the hash part) as the base for building URLs with a query (the part starting with '?'), and then append the hash part to the end of the URL.
Example:
1) Take http://www.example.com/classic-movies/#!/Title-of-Movie/p/12345678
2) Remove hash part: http://www.example.com/classic-movies/
3) Use this hash-free URL as a base and add query parameters yourself or use any automatic builder. Example: http://www.example.com/classic-movies/?utm_source=myblog&utm_campaign=xyz&abc=def
4) Append the hash part to the end of the URL: http://www.example.com/classic-movies/?utm_source=myblog&utm_campaign=xyz&abc=def#!/Title-of-Movie/p/12345678
You're done: the final URL is a valid URL that works for the browser/customer, your site's server, and tracking tools like Google Analytics.
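The four steps above can be sketched with Python's urllib.parse. The helper name is hypothetical. It splits off the fragment, merges the new parameters into any existing query, and reattaches the fragment at the end.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def add_query_keep_fragment(url, params):
    """Add query parameters before the '#' fragment instead of after it."""
    parts = urlsplit(url)  # steps 1-2: separate the fragment from the rest
    extra = urlencode(params)
    # Step 3: merge with any query string the URL already has.
    query = parts.query + "&" + extra if parts.query else extra
    # Step 4: urlunsplit puts '?query' before '#fragment'.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
```

For example, applying it to http://www.example.com/classic-movies/#!/Title-of-Movie/p/12345678 with utm_source=myblog, utm_campaign=xyz and abc=def yields the URL from step 4.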
Long answer
1) URLs can look very different, but their structure is the same, and that structure is defined by web standards.
A URL is built this way:
protocol://site/path?query#fragment
(I simplified it and considered only the parts we're talking about; the actual scheme is a bit more complicated.)
Taking your product page URL, that will be:
protocol: http
site: www.example.com
path: classic-movies/
query: (empty)
fragment: !/Title-of-Movie/p/12345678
Now, if you want to add query parameters, you know where to insert them. As for the fragment, it should always come at the end, regardless of whether it contains '!'.
2) Google Analytics doesn't track the fragment parts of the URLs.
URLs like http://www.example.com/coolpage and http://www.example.com/coolpage#!anyparameter=anyvalue are the same for Google Analytics. That's likely why its URL Builder doesn't accept such URLs.
By the way, Ecwid uses the fragment part of the URL all the time to address product and category pages, but that's not an issue if you want to track your product pages in Google Analytics. Ecwid solved that problem by sending special 'virtual' page views to Google Analytics every time a customer browses your store, so your GA reports will show your store pages.
3) If you use Google AdWords for your ad campaigns, I'd suggest linking your Google Analytics and Google AdWords profiles to get a better picture of customer behavior and campaign performance. Check out this thread on the Ecwid forums for details:
http://www.ecwid.com/forums/showthread.php?t=10835
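Points 1) and 2) of the long answer can be checked with Python's urllib.parse. The sketch below splits the product URL into the parts listed above. Note that urlsplit reports the path with its leading slash, "/classic-movies/". It also compares two URLs after dropping their fragments. The helper names are hypothetical, and the comparison only mirrors the answer's description of how Google Analytics treats fragments; it is not GA's own logic.

```python
from urllib.parse import urldefrag, urlsplit

PRODUCT_URL = "http://www.example.com/classic-movies/#!/Title-of-Movie/p/12345678"


def url_parts(url):
    """Return the protocol, site, path, query and fragment of a URL."""
    p = urlsplit(url)
    return {
        "protocol": p.scheme,
        "site": p.netloc,
        "path": p.path,
        "query": p.query,
        "fragment": p.fragment,
    }


def same_page_without_fragment(a, b):
    """True when two URLs differ only in their fragment (the part after '#')."""
    return urldefrag(a).url == urldefrag(b).url
```

url_parts(PRODUCT_URL) shows an empty query and the fragment "!/Title-of-Movie/p/12345678". The two coolpage URLs from point 2) compare as the same page.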

Content is deleted but URL still crawled

I have deleted some content from my website, but Google Search still shows that content's URL. When I click the URL, I get a "page not found" error. Please point me to any Drupal resource for handling this situation.
Thanks & Regards,
Abbas Mulani
Create a Google Webmasters account and add your site (instructions for adding a site are available in Google Webmasters itself).
Once you have done that, you can submit a list of URLs for Google to remove from its index. It might take a day or two, but this is the fastest way to get them removed.

LinkedIn Share does not handle "#" in the URL

I am running into an issue with LinkedIn's Share feature when the shared URL has "#" in it.
My URL looks like this: http://shoshin.glgqa07.com/#!/content/detail/High-pay-without-advanced-degrees
When I share this URL on LinkedIn, it strips out everything after "#", so the link on LinkedIn points to http://shoshin.glgqa07.com/.
I am using the following URL to share the page to LinkedIn:
http://www.linkedin.com/shareArticle?mini=true&url=http%3A%2F%2Fshoshin.glgqa07.com%2F%23!%2Fcontent%2Fdetail%2FHigh-pay-without-advanced-degrees&title=The+Shoshin+Project+%3A+QA+tesing&summary=&source=
When I view the source of the resulting page (the shared news item on LinkedIn), the hidden input field "contentUrl-shareForm" has the value "http://shoshin.glgqa07.com/" instead of the complete URL.
I am using ShareThis library to implement this feature.
Does anyone know a solution to this problem?
Did you try replacing the hash sign (#) with its percent-encoded form, %23?
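When the share link is built programmatically, encoding the whole target URL as a query value turns "#" into %23 (and "!" into %21). Below is a Python sketch that does this, using the parameter names from the share URL in the question: mini, url, title, summary and source. The helper name is hypothetical. Note that the URL quoted in the question already contains %23, so correct encoding alone may not stop LinkedIn from truncating the link.

```python
from urllib.parse import urlencode


def linkedin_share_url(page_url, title, summary="", source=""):
    """Build a shareArticle link with page_url fully percent-encoded.

    urlencode() applies quote_plus() to every value, so '#', '/', ':' and '!'
    inside page_url cannot be mistaken for parts of the share URL itself.
    """
    params = {
        "mini": "true",
        "url": page_url,
        "title": title,
        "summary": summary,
        "source": source,
    }
    return "http://www.linkedin.com/shareArticle?" + urlencode(params)
```

For the page in the question, the result matches the quoted share URL except that "!" is also encoded, as %21.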
