rvest on extracting link within <a rel= ... href=> - r

I'm trying to use the rvest package to scrape a list of links embedded on a page. Before, I'd use something like this:
library(rvest)
page <- read_html("link")
page %>% html_nodes('{a href}') %>% html_attr('href')
However, this only gives me the link attached to "Here", but not this link: <a rel="external nofollow noopener" href="www.dropbox.com/abcdefg.rar" target="_blank">Part 01</a>
My question is: how do I get the second link while ignoring the first?

Using XPath, perhaps //a[@rel] should help (it selects all a elements that have a rel attribute).
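To make the selection logic concrete, here is a minimal sketch in Python (not rvest) that keeps only the href of anchors carrying a rel attribute, which is what the XPath predicate [@rel] expresses. The class name, function name and the first sample link are hypothetical; only the Dropbox link comes from the question.

```python
from html.parser import HTMLParser


class RelLinkCollector(HTMLParser):
    """Collects href values of <a> tags that also have a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Same idea as the XPath predicate [@rel]: the attribute must exist.
        if "rel" in attr_map and attr_map.get("href"):
            self.links.append(attr_map["href"])


def links_with_rel(html):
    collector = RelLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical page fragment: one plain link, one link with rel set.
sample = (
    '<p><a href="http://example.com/here">Here</a> '
    '<a rel="external nofollow noopener" href="www.dropbox.com/abcdefg.rar" '
    'target="_blank">Part 01</a></p>'
)
print(links_with_rel(sample))  # prints ['www.dropbox.com/abcdefg.rar']
```

The plain "Here" link is skipped because it has no rel attribute, so only the second link is returned.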

Related

Wordpress RSS2 href attribute in <a> tags inside feed

I'm new to WordPress. In my feed (rss2) I've noticed that links are displayed like this:
<a title="myTitle" href="" target="_blank">my text</a>
As you can see, the href attribute is empty, probably because the function get_the_content_feed with the "rss2" filter cuts off the href attribute.
Is it possible to retrieve the href attribute?
Example of what I want
<a title="myTitle" href="http://www.mylink.com" target="_blank">my text</a>
Thank you
I'm posting this in case someone needs it. I solved the problem: a plugin called RSS Image Feed (https://wordpress.org/plugins/rss-image-feed/) was conflicting with anchor tags.
Disable it and the links work again.
Cheers

How to add target=_blank automatically to PDF links using PHP?

I'm wondering is it possible to automatically add target=_blank to any PDF link I have on my website, so that it will convert
link text
link text
to
link text
link text
as this site has a lot of PDFs, and it would be easier to do this automatically rather than have to set them all one-by-one.
I have tried the Javascript solution:
$(".newWindow a[href$='pdf']").attr('target','_blank');
But am wondering is there a way to do this automatically in PHP, perhaps by adding something to Wordpress's functions file?
If your .pdf links are only in your page content, the fastest and easiest way to do it is to replace the following in the theme template file that shows your page content:
the_content();
by
echo str_replace('.pdf"', '.pdf" target="_blank"', get_the_content());
This will be inside the content.php file of your theme.
Note that the function quoted above may take parameters (for example the_content('',FALSE,'');); you need to pass the same parameters to get_the_content() as well.
Using regex, you can perform smarter replacements. The example below will work on all <a> tags with a PDF extension in the URL, even if the URL has parameters.
echo preg_replace('/((<a (?=.*\.pdf)(?!.*target="_blank").*?)>)/', '$2 target="_blank">', get_the_content());
This regex performs two lookaheads: first, to check if the URL contains a PDF extension; and second, to check if the target="_blank" attribute is not already set. If both of those requirements are satisfied, then it will append the target="_blank" attribute to the end of the tag.
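The two-lookahead idea can be sketched in Python as well. This is an illustration under assumptions, not a port of the PHP line: the function name is hypothetical, and the lookaheads use [^>]* instead of .* so that each check stays inside a single tag rather than scanning the rest of the line.

```python
import re

# <a, then: (1) ".pdf" must occur before the tag closes,
#           (2) target="_blank" must not already occur before the tag closes.
PDF_LINK = re.compile(
    r'<a\s(?=[^>]*\.pdf)(?![^>]*target="_blank")([^>]*)>',
    re.IGNORECASE,
)


def add_blank_target(html):
    """Append target="_blank" to <a> tags that point at a PDF."""
    return PDF_LINK.sub(r'<a \1 target="_blank">', html)


print(add_blank_target('<a href="/docs/report.pdf">Report</a>'))
# prints <a href="/docs/report.pdf" target="_blank">Report</a>
```

Like any regex over HTML, this assumes attribute values contain no ">" character; a real HTML parser is safer for untrusted markup.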

Like button not passing URL parameters

I'm making a site which has a list of videos. Next to each video in the list I want a like button. My like button code looks like this:
<div class="fb-like" data-send="true" data-layout="button_count" data-width="100" data-show-faces="false"></div>
And the URL of the page is
http://dev2-rs-www.doubledip.com/video-play/?vid=aj5pbnNlcnRlZC1mcm9tLXRleHQtaW4tYnVpbGQtZGVwbG95LWFwcGxpY2F0aW9uaWQtb24tZGVwbG95bWVudHIkCxINQWNjb3VudFJlY29yZBgBDAsSC1ZpZGVvUmVjb3JkGAkM
However, the link on my wall is http://dev2-rs-www.doubledip.com/video-play/ without the ?vid= parameter. I also tried adding the data-href attribute, but that didn't do anything. How can I fix this?
Your problem lies with the misuse of the <link rel='canonical'> element. On the video page, it links to the page without the ?vid parameter, causing it to use that as the final value. Change it to the actual page and it should work fine.
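A quick way to check for this situation is to compare the query parameters of the page URL with those of its canonical link. The sketch below uses only the Python standard library; the function and class names and the example.com URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens and self.canonical is None:
            self.canonical = attr_map.get("href")


def dropped_query_params(page_url, html):
    """Return the query parameter names present in page_url but missing
    from the page's canonical URL (empty set if there is no canonical link)."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    if finder.canonical is None:
        return set()
    page_params = set(parse_qs(urlsplit(page_url).query))
    canonical_params = set(parse_qs(urlsplit(finder.canonical).query))
    return page_params - canonical_params


# Hypothetical page whose canonical link omits the vid parameter.
page_html = "<head><link rel='canonical' href='http://example.com/video-play/'></head>"
print(dropped_query_params("http://example.com/video-play/?vid=abc123", page_html))
# prints {'vid'}
```

A non-empty result means the canonical URL discards parameters that identify the page, which matches the symptom described in the question.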
Facebook Like buttons don't use parameters in the URL.
Use this: http://developers.facebook.com/tools/lint/
to check for info and errors.

Twitter share button doesn't forward custom text

I'm working on a website with twitter share option for each specific product.
I followed the Twitter API instructions for tweet sharing, and everything works fine except the custom display of text. For example, I want the user to tweet something like this:
"What do you think? Should I buy this? http://url.etc #mywebsite"
but all I get when user tweets is the link:
http://url.etc
This is the code:
<script type="text/javascript" src="//platform.twitter.com/widgets.js"></script>
<a target="_blank" href="https://twitter.com/share" data-url="http://bit.ly/twitter-api-announce" data-via="testtest" data-text="What do you think? Should I buy this? " data-count="none" data-counturl="http://groups.google.com/group/twitter-api-announce" >TWITTER</a>
The problem seems to be with the data-text option.
Any experience on this? Ideas?
Thanks
On Wordpress I just used Tweet
Works like a charm!
Simply use a link like:
tweet
Just change what is between [] (and remove the brackets).
Note that everything has to be URL-encoded (special characters such as spaces replaced by %20, etc.).
Twitter offers a nice page for generating the buttons:
https://about.twitter.com/resources/buttons#tweet
But my solution avoids the JavaScript that forces the design of the button.
You can use this:
<a href='https://twitter.com/share?url=google.com&text=Signup'>Tweet</a> <script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src='//platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document,'script','twitter-wjs');</script>
Confirm that you have included the correct Twitter scripts. Better still, generate your tweet button code from the Twitter developer interface here:
https://about.twitter.com/resources/buttons
If you would like to modify the tweet button content on the fly, e.g. after page load, you will have to create and insert your tweet button into the HTML DOM dynamically.
Some guidance on that can be found here:
http://denvycom.com/blog/twitter-button-with-dynamic-custom-data-text-message/
Hopefully this is helpful.
You have to encode your text before inserting it in the link. The correct procedure is:
Encode the text with an online tool like this one
Put the result inside an HTML link (as suggested by @FenixAoras): Share on Twitter
If you generate your HTML with PHP, you can use the urlencode function directly in your script:
echo "Share on Twitter";
If you are using WordPress, the best way is to create a shortcode, because with it you can also use the native functions of WordPress:
function tweet_this($atts, $content = null)
{
    extract(shortcode_atts(array(
        "text" => ''
    ), $atts));
    return "<a href='https://twitter.com/intent/tweet?text=".urlencode( $text." - ".get_the_title()." - ".get_permalink() )."'>Share on Twitter</a>";
}
add_shortcode( 'tweet_this', 'tweet_this' );
(Note: the code above is just a lead, you can expand it and you have to test)
Usage:
[tweet_this text="my custom text with #hashtag and @Mention"]
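The encoding step this answer describes can be sketched outside PHP too. The Python example below percent-encodes the custom text before placing it in the https://twitter.com/intent/tweet?text= link used in the shortcode; the function name is hypothetical, and the sample text is the one from the question.

```python
from urllib.parse import quote, urlencode


def tweet_intent_url(text):
    """Build a tweet link with the text percent-encoded (spaces become %20)."""
    # quote_via=quote encodes spaces as %20 rather than "+".
    return "https://twitter.com/intent/tweet?" + urlencode({"text": text}, quote_via=quote)


print(tweet_intent_url("What do you think? Should I buy this? #mywebsite"))
# prints https://twitter.com/intent/tweet?text=What%20do%20you%20think%3F%20Should%20I%20buy%20this%3F%20%23mywebsite
```

Without the encoding, the "#" would start a URL fragment and the rest of the text would never reach the server, which is one way custom text can silently disappear.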

How can I find feed or XML of a particular news source

I want to get the XML file of a particular news source, or to know whether there is a project that converts HTML news to XML by parsing the page and extracting its various fields, such as date, author name, title, and content, into a single XML or similar file.
For example see this link:
http://daily.bhaskar.com/article/NAT-TOP-yeddyurappa-breaks-venkaiah-naidus-laptop-slaps-minister-reports-2318460.html
How can I extract the content, author, date, etc. from this webpage? If I could find the webpage's feed I could do that easily, but how can I search for it?
Which technology are you using?
If it's a purely client-side / web solution then you'll find JS options in a previous StackOverflow question. If you're on the server side you can use WebClient/LINQ to hit the Atom feed and parse it.
To find out if a page has a feed, scan the HTML for a specific <link> tag with these rel and type attributes:
<link rel="alternate" type="application/rss+xml" title="Page as RSS"
href="http://example.com/page/feed">
The feed URL is stored in the href attribute. This mechanism is called RSS Autodiscovery.
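As a rough illustration of that scan, the following Python sketch collects the href of every <link rel="alternate"> tag whose type is a feed type. The names are hypothetical, and accepting application/atom+xml in addition to the RSS type shown above is an assumption added here.

```python
from html.parser import HTMLParser

# application/rss+xml comes from the answer; the Atom type is an added assumption.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects feed URLs advertised through <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        feed_type = (attr_map.get("type") or "").lower()
        if "alternate" in rel_tokens and feed_type in FEED_TYPES and attr_map.get("href"):
            self.feeds.append(attr_map["href"])


def discover_feeds(html):
    finder = FeedLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.feeds


page = (
    '<html><head><link rel="alternate" type="application/rss+xml" '
    'title="Page as RSS" href="http://example.com/page/feed"></head></html>'
)
print(discover_feeds(page))  # prints ['http://example.com/page/feed']
```

An empty list means the page does not advertise a feed this way, in which case the article fields would have to be parsed from the HTML itself.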

Resources