I am trying to write a script in R that will download several YouTube videos. This is one of them: https://www.youtube.com/watch?v=nHm8otvMVTs (I come to this YouTube page from a course page where the YouTube links to all of the videos in the course are provided, so my script will follow that set of links, go to the respective YouTube pages, and fetch the mp4s).
Although I have found several discussions on Stack Overflow (and elsewhere) on how to retrieve the actual link to the video, I am still totally lost as to how to find it automatically. (Also, I keep seeing comments that YouTube has recently changed its system, so most of those solutions no longer work anyway.)
All I need is to be able to recognize the actual link on the YouTube HTML page (I am fine with regex, so once I know what to look for, I can automate it for multiple videos). So the main question remains: how can I find the actual link to the mp4 for YouTube pages like this one: https://www.youtube.com/watch?v=nHm8otvMVTs or https://www.youtube.com/watch?v=BO6XJSaFYzk
I would appreciate any help on this, thank you.
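A note of caution before reaching for regex: the mp4 links embedded in the watch-page HTML are signed per session and expire quickly, so scrapers built on them tend to break whenever YouTube changes its player. A more durable sketch, assuming the external yt-dlp command-line downloader (or its predecessor youtube-dl) is installed and on the PATH, is to drive it from R:

    # Sketch: download one YouTube video as mp4 by shelling out to yt-dlp.
    # Assumes yt-dlp is installed and on the PATH (an assumption, not part
    # of the original question).
    download_video <- function(url, dir = ".") {
      system2("yt-dlp", args = c(
        "-f", "mp4",                                         # prefer an mp4 stream
        "-o", shQuote(file.path(dir, "%(title)s.%(ext)s")),  # output filename template
        shQuote(url)                                         # quote: URLs contain & and ?
      ))
    }

    download_video("https://www.youtube.com/watch?v=nHm8otvMVTs")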
Related
I would like to scrape/download YouTube videos directly from YouTube using R. To clarify, I'm not interested in the metadata, video titles, or comments so much as the video itself (whether formatted as a video or an audio file).
Is this possible? And if so, is it legal? I cannot think of a reason why it shouldn't be, given that there exist tools with which one can download single YouTube videos as audio files, and given also that material published on YouTube and openly accessible is literally exactly that: openly accessible and available.
I had a look at {tuber} and {RSelenium}, but those are only good for retrieving metadata, not the actual video or audio content.
Any experiences or suggestions?
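Retrieving the content itself is possible from R, though not with {tuber} or {RSelenium}; the usual pattern is to shell out to a dedicated downloader. A minimal sketch, assuming yt-dlp is on the PATH (audio conversion additionally needs ffmpeg), reusing the URLs from the question above:

    # Sketch: batch-download a set of videos, or just their audio tracks.
    # Assumes yt-dlp is installed; audio extraction also requires ffmpeg.
    urls <- c(
      "https://www.youtube.com/watch?v=nHm8otvMVTs",
      "https://www.youtube.com/watch?v=BO6XJSaFYzk"
    )

    for (u in urls) {
      # full video as mp4:
      system2("yt-dlp", args = c("-f", "mp4", shQuote(u)))
      # or audio only: -x extracts the audio track, --audio-format converts it
      # system2("yt-dlp", args = c("-x", "--audio-format", "mp3", shQuote(u)))
    }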
I've got a WordPress site with multiple share buttons on each entry.
We designed this so there are no individual entries to view; they're podcasts and videos. The listing page has a minimum of 10 entries, each with share buttons.
Currently the share links and titles are working correctly, but LinkedIn is not recognizing the og:image and instead picks up the default logo for the site itself.
I read another post on Stack Overflow that said it might be an issue for LinkedIn if the image link uses SSL, but I find that hard to believe.
The other issue I'm struggling with: the docs say that once an image is scraped, it stays cached for approximately 7 days.
I had a similar issue with Facebook, but there's a debugger that lets you re-scrape the page, which let me verify my changes worked.
My two questions: First, is there something other than og:image I should be specifying? Since I can't specify it per post, it's in the head of the page itself; I would think it would pick that up, no?
Second, is there a way a developer can re-check the page after the meta info has been changed, to see if the changes worked, without having to wait out the TTL on the cache?
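One quick way to see exactly which Open Graph tags a scraper receives is to fetch the listing page yourself and inspect the head. A minimal R sketch using the {rvest} package, with a hypothetical placeholder for the listing URL:

    # Sketch: list the Open Graph meta tags exactly as a scraper would see them.
    library(rvest)

    page <- read_html("https://example.com/podcast-listing/")  # hypothetical URL
    meta <- html_nodes(page, "meta[property^='og:']")          # every og:* tag in <head>

    data.frame(
      property = html_attr(meta, "property"),
      content  = html_attr(meta, "content")
    )

If og:image shows up here but LinkedIn still ignores it, the cache or the SSL issue mentioned above becomes the more likely culprit.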
Try this:
url/link?blah=1
url/link?blah=2
url/link?blah=3
to get around the cache.
This should trick the scraper into thinking it's a new page each time.
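In script form, the trick is just a changing query parameter; a throwaway R helper (hypothetical, any unique value will do):

    # Hypothetical helper: tack a unique query parameter onto the shared URL
    # so LinkedIn's scraper treats it as a page it has never cached.
    bust_cache <- function(url) {
      sep <- if (grepl("\\?", url)) "&" else "?"   # append correctly either way
      paste0(url, sep, "v=", as.integer(Sys.time()))
    }

    bust_cache("https://example.com/podcasts/")
    # e.g. "https://example.com/podcasts/?v=1700000000"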
Can I get a link to test?
Anthony Walz posted the correct answer. Through email he also helped with another problem I had, which corrected an issue I didn't realize I had until I looked.
My LinkedIn shares were not picking up the show title; they were picking up the page description instead (I have several podcasts showing on one page; we don't use individual post pages, they all play from the listing).
He pointed me to the developer docs on formatting sharing links, which give a real-world example, here:
https://www.linkedin.com/shareArticle?mini=true&url=http://developer.linkedin.com&title=LinkedIn%20Developer%20Network&summary=My%20favorite%20developer%20program&source=LinkedIn
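For generating these links per episode, the same format can be assembled programmatically. A hypothetical R helper, reusing the parameters from the example above:

    # Hypothetical helper: build a LinkedIn shareArticle link for one episode.
    linkedin_share <- function(url, title, summary, source) {
      paste0(
        "https://www.linkedin.com/shareArticle?mini=true",
        "&url=",     URLencode(url,     reserved = TRUE),  # percent-encode each part
        "&title=",   URLencode(title,   reserved = TRUE),
        "&summary=", URLencode(summary, reserved = TRUE),
        "&source=",  URLencode(source,  reserved = TRUE)
      )
    }

    linkedin_share("http://developer.linkedin.com", "LinkedIn Developer Network",
                   "My favorite developer program", "LinkedIn")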
Thanks a ton for the assist, Anthony!
I am working on my WordPress website and have come to the point where I want to add panoramic (360°) photos to pages and posts. I've searched for plugins, and it seems none of them can do the job properly. After a little research into the tools used for this purpose, I came across several viewers based on three.js, e.g. Pannellum. I've tried uploading the js/css files to wp-content, then calling wp_enqueue_script() and wp_enqueue_style() within functions.php, and then using an iframe as the demos of these tools showed, but it didn't work. Any help would be appreciated.
Jetpack has a built-in 360/panoramic viewer. It's a pretty little-known feature, but it works amazingly well. The announcement post is here: https://en.blog.wordpress.com/2016/12/15/introducing-vr-and-360-content-for-all-wordpress-com-sites/
I would also suggest you read this for detailed instructions on how to upload and implement these: https://en.support.wordpress.com/embedding-360-photos-and-virtual-reality-vr-content/
I am a huge fan of RSS. I am currently using Feedly as my default RSS reader. I have a question, though, that I have been unable to find the answer to: how can I follow a website that does not provide RSS feeds? I have tried several add-ons on Firefox and extensions on Chrome that automatically detect RSS when I am visiting a website, so that with one click I can add that website to Feedly. In addition, I have searched the internet for ways to create an RSS feed manually when a website does not provide one, but it seems there is no free way to do it; if I try an online 'RSS creator' (like Page2RSS and others), most of the time they do not work (they either can't find the RSS of a website or create an invalid RSS). However, I didn't give up, and I kept seeking a way to find the RSS feed via the source code of a website. Unfortunately, that only works for YouTube channels and not for other websites. Is there a way, via those actions, to 'follow' another website?
I have found a way to 'detect changes' on feed-less websites using the Update Scanner add-on on Firefox and Page Monitor on Chrome. But all I want to do is put those web pages in one app/website (like Feedly) so that I can follow them whether I am using my PC, my iPhone/iPad (iOS), my tablet (Android), or another user's PC/laptop. Any suggestions? Keep in mind that iOS devices don't support extensions. If I've confused you, visit this link and you'll understand exactly what I am looking for.
http://googlereader.blogspot.gr/2010/01/follow-changes-to-any-website.html
The only drawback is that Google Reader does not exist anymore! Do you know another RSS reader that supports this feature (like Feedly, The Old Reader, etc.)?
Thanks!
A simple but basic solution is Page2RSS.com. You put in the URL of the page; once a day, the service crawls the page and generates an item for whatever is new.
Feed43.com does a much better job, even in its free version. You have to write extraction rules against the HTML code.
Feedity is much (much) more interactive, but a bit commercial.
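If you'd rather not depend on a third-party service, the same extraction idea can be scripted. A rough R sketch of what these services do internally; the page URL and CSS selector are hypothetical placeholders, and real feeds would also need XML-escaping of titles:

    # Rough sketch: scrape a feed-less page and write a bare-bones RSS 2.0 file.
    library(rvest)

    page  <- read_html("https://example.com/news/")   # hypothetical feed-less site
    links <- html_nodes(page, "h2.headline a")        # hypothetical selector for post titles

    # One <item> per scraped link (escape titles as XML in real use).
    items <- paste0(
      "<item><title>", html_text(links),
      "</title><link>", html_attr(links, "href"),
      "</link></item>",
      collapse = "\n"
    )

    feed <- paste(
      "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
      "<rss version=\"2.0\"><channel>",
      "<title>Scraped feed for example.com</title>",
      "<link>https://example.com/news/</link>",
      "<description>Items extracted from the page HTML</description>",
      items,
      "</channel></rss>",
      sep = "\n"
    )

    writeLines(feed, "feed.xml")  # host this file and point your reader at it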
I'm using a custom Genesis child theme, and lately I've been noticing that many false articles have been showing up in Webmaster Tools. They look something like this:
I haven't written these, nor are they topics my site focuses on, so I have no clue why they are showing up. So far, I've had to delete about a hundred of them. I read on a forum that this can be due to my theme generating bad URLs, but I'm not sure what that means, nor do I know how to fix it. What could be causing this?
I believe this problem is due either to your website having been hacked, or to Google trying to crawl or follow a link within your content that is not really a link.
This is what Webmaster Tools tells you about the problem:
In Crawl Errors, you might occasionally see 404 errors for URLs you don't believe exist on your own site or on the web. These unexpected URLs might be generated by Googlebot trying to follow links found in JavaScript, Flash files, or other embedded content.
To find out whether your website has been hacked, first get this total: number of WordPress pages + number of posts + number of categories + number of PDFs or other files + number of images. Then do a Google search using the following query (without the quotes): "site:yourdomain.com". If the result count is wildly greater than the calculated total, then your website has almost certainly been hacked.
If you believe your website is not hacked, try to find where these links are being generated from. Here is the trick: go to the Webmaster Tools report and click on one of those links, then check the "Linked from" tab. There should be one or more pages listed showing where these unexpected links are coming from.
Two possible outcomes:
1. The page where the link is found is on your own website: go to that page, open the source code, and do a Ctrl+F search for that link; if found, check what section or content is generating the problem.
2. The page where the link is found is NOT on your own website: in this case, try to contact the owner of the other site and ask for the link to be removed. If that is not possible, I highly recommend you create a 404 page within your WordPress installation with some useful links. Google how to do this; there are plenty of resources.
Hope this helps