Can I build an OPML file from a list of website URLs?

This happens a lot, so I wonder if there's a tool for it. Often, I find a website with a blogroll or a links page with a long list of 20 or more websites. I sure would like to keep up with those sites via the feed reader of my choice, but it sure is tedious to click on each and every link, look for an RSS link, subscribe to that, wash, rinse, repeat.
My favorite feed reader will accept an OPML file to batch-import a list of feeds, so that's a start, but here's my question:
If all I have is a list of the website URLs, is there a way to generate an OPML of the RSS feeds?

I was able to create an OPML file. All I had to do was create a text file with one URL per line. Then I used a PHP script to fetch each URL, hunt for the RSS feed's address in the page, and add each feed address to the OPML file.
Incidentally, I've shared the project that this is part of on GitHub. I wanted to be able to subscribe to lots of litblogs at once.
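For anyone who wants the gist of that approach, here's a minimal sketch. The file names and the regex-based feed discovery are my own simplification, not necessarily how the shared project does it; it assumes each site advertises its feed with a standard <link rel="alternate"> tag in the page's <head>.

```php
<?php
// Sketch: read one URL per line from urls.txt, discover each site's
// advertised feed, and write the results out as feeds.opml.
// The discovery is a naive regex that assumes the type attribute
// comes before href; a real script should parse the HTML properly
// and resolve relative hrefs against the page URL.

$urls = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$outlines = '';
foreach ($urls as $url) {
    $html = @file_get_contents($url);
    if ($html === false) {
        continue; // skip sites that don't respond
    }
    if (preg_match('/<link[^>]+type=["\']application\/(?:rss|atom)\+xml["\'][^>]*href=["\']([^"\']+)["\']/i', $html, $m)) {
        $feed = htmlspecialchars($m[1], ENT_QUOTES);
        $site = htmlspecialchars($url, ENT_QUOTES);
        $outlines .= "    <outline type=\"rss\" text=\"$site\" xmlUrl=\"$feed\" htmlUrl=\"$site\"/>\n";
    }
}

$opml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
      . "<opml version=\"2.0\">\n"
      . "  <head><title>Subscriptions</title></head>\n"
      . "  <body>\n" . $outlines . "  </body>\n"
      . "</opml>\n";

file_put_contents('feeds.opml', $opml);
```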

Related

RSS feed link doesn't open up reader or just dumps out raw XML

I developed an RSS feed following a tutorial and I think the .xml file itself is in order. However, I have two problems:
When people click on the RSS link, it doesn't automatically load into their RSS readers
For those that don't have an RSS reader, clicking the link results in a page full of code which is not very understandable
I was hoping there might be some tips on how to fix both of these.
Try removing the <![CDATA[ and ]]> wrappers in the description tag.
I downloaded your XML, changed those lines, tested it on my server, and it worked in Google's RSS reader.
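In other words, if the item looks like the first form below, try the second (the element is the standard RSS description; the text is made up):

```xml
<!-- Before: description wrapped in CDATA -->
<description><![CDATA[An <b>episode</b> about feeds.]]></description>

<!-- After: plain text, with any markup escaped instead -->
<description>An &lt;b&gt;episode&lt;/b&gt; about feeds.</description>
```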
How the RSS link reacts when clicked depends on the browser and the user's profile settings.
If the user has set up an action to automatically load feeds into their reader of choice, it will do that.
If they haven't, it won't.
For those who just see a raw dump, they're likely using a browser that doesn't support RSS feeds and therefore renders the XML as raw text. Google Chrome (at least as of version 18) without extensions or add-ons is the usual dump truck culprit here.

Flex 3: Project Architecture & SEO

I've got a Flex 3 project. One of the problems I have is that not much of its content is indexed by Google. Currently, I pull data from a MySQL database, so the Googlebot doesn't see most of the site.
My goal is to increase the amount of content indexed by Google, improve the SEO, and improve SERPs.
I thought that instead of pulling the data from the database that I would change the project's architecture and create separate "pages". So, in my case, I would compile each puzzle separately and upload it to the server in its own directory. This way the info in each puzzle would get indexed.
The negative is that if I add a puzzle, I'd have to add a link to it in all of the puzzles that are already on the server. I would have to add the link, re-compile each puzzle and upload it to the server. Is there a way to get around this problem? Also, if I wanted to communicate some data from one puzzle to another in the future, I wouldn't be able to do so.
Any suggestions?
Thank you.
-Laxmidi
The usual way to achieve this goal is to develop a hidden parallel site in HTML.
On the first page you will have your Flash and, hidden by JavaScript, a list of links to the other pages. These links will be parsed by the robots. Ideally, the href targets are virtual pages (look up "URL rewriting"). On each "fake" page, your server-side language prints the content or links from your database AND the Flash. The Flash is passed a string telling it where it is and what it's supposed to show.
Ex: http://www.mysite.com/category1/content7 The URL rewriting sends this request to http://www.mysite.com/index.php?uri=category1/content7. The page should display the Flash with FlashVar "uri=category1/content7". The Flash knows which content it has to display, so when a user comes from Google by following this link, they will find the content they were looking for.
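A rough sketch of how those pieces fit together; the file names, the rewrite rule, and the get_content_for_uri() helper are all illustrative, not from the question:

```php
<?php
// index.php -- serve the HTML content for robots plus the Flash,
// driven by a rewritten URL. A matching Apache rule might be:
//   RewriteRule ^(.+)$ index.php?uri=$1 [L,QSA]

$uri = isset($_GET['uri']) ? $_GET['uri'] : '';

// Hypothetical lookup: fetch the content or links for this URI
// from the database so crawlers have real HTML to index.
$content = get_content_for_uri($uri);
?>
<html>
<body>
  <!-- HTML version of the content, visible to crawlers -->
  <div id="seo-content"><?php echo $content; ?></div>

  <!-- The Flash receives the same location via FlashVars -->
  <object type="application/x-shockwave-flash" data="site.swf"
          width="800" height="600">
    <param name="movie" value="site.swf" />
    <param name="FlashVars" value="uri=<?php echo urlencode($uri); ?>" />
  </object>
</body>
</html>
```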
All linking and content meant for SEO should be in HTML; don't trust the robots' ability to read Flash.
Have a look at Adobe's reference on deep linking.
You can generate the website's sitemap.xml with a daily cron process, such that the URLs encode the application state you need. Each URL encodes whatever content needs to be retrieved from the database, while serving just one index.html page.
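A minimal sketch of such a generator; the table, column, and URL scheme are assumptions:

```php
<?php
// sitemap.php -- regenerate sitemap.xml from the database.
// Run daily from cron, e.g.: 0 3 * * * php /path/to/sitemap.php

$db = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');

$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
     . "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n";

foreach ($db->query('SELECT slug FROM puzzles') as $row) {
    $xml .= '  <url><loc>http://www.mysite.com/puzzles/'
          . htmlspecialchars($row['slug'], ENT_QUOTES)
          . "</loc></url>\n";
}

$xml .= "</urlset>\n";
file_put_contents('sitemap.xml', $xml);
```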
good luck!

How to play an audio podcast file from a Libsyn RSS feed? (Drupal)

Got an established Libsyn RSS feed and a new Drupal website for the podcast. Libsyn provides a player, but it's not the right aesthetic. I can upload and play MP3 files with the Audio module and the mp3player module, and I like the mp3player's output (a simple Flash player), but I don't want to manually move the podcast audio files (MP3s) over every week. I looked at importing automatically with Feeds, but it's not working, and besides, that creates unnecessary extra files on the Drupal site.
Just want to use the mp3player module's Flash player on a Drupal page, feeding the latest MP3 file from a Libsyn RSS feed. I don't really need to store or play multiple episodes, just the latest one.
How would you do it?
Create a content type for my podcasts with a title and a field for the URL of the MP3
Use FeedAPI and map the title to the title of the node
Map the file URL to the URL field
Use Contemplate to set the URL field to display as [swf file="token_for_URL_field"], which will use the SWF Tools module and whatever player I've selected to play the file
So you need these modules: CCK, Contemplate, FeedAPI, and SWF Tools, and that should do the trick (a rough template snippet follows).
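The Contemplate body for that URL field might look roughly like this; field_mp3_url is a placeholder for whatever the CCK field is actually called:

```php
<?php
// Contemplate snippet: wrap the imported MP3 URL in the SWF Tools
// input-filter tag so it renders as the selected Flash player.
print '[swf file="' . $node->field_mp3_url[0]['value'] . '"]';
?>
```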
Why don't you make your own site the master and have Libsyn get it from you? Don't they offer an import feature that would let you keep your existing RSS feed through them? Then you'd have total control over your site and could push the content to all kinds of other great podcasting networks.
I realize I may have no idea how Libsyn works.
When you say Feeds didn't work, how did it fail? Are you using feed mapper? You may need to write a custom plug-in for feed mapper to get it to do the right thing with the audio files. FeedAPI supports expiring imported feed items, so you should be able to get it to automatically delete old ones. I'm not sure whether the audio files will be deleted automatically when the nodes are. If not, you should be able to make this happen by implementing nodeapi's delete op for the content type you're using to store your imported RSS items.
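That delete op might look roughly like the sketch below in a small custom module (Drupal 5/6 style). The module name, content type, and field are placeholders, and it assumes the MP3 was downloaded to the local filesystem:

```php
<?php
// mymodule.module -- clean up the audio file when an imported
// feed-item node is deleted. Names here are assumptions.

function mymodule_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  if ($op == 'delete' && $node->type == 'podcast_item') {
    // Delete the downloaded MP3 that belongs to this node.
    if (!empty($node->field_podcast_mp3[0]['filepath'])) {
      file_delete($node->field_podcast_mp3[0]['filepath']);
    }
  }
}
```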
Alternatively, maybe you could just harvest the audio file's URL on Libsyn and have the player use that. I don't know whether there's a good player that supports using a field's data for the location of the source it should play.
Also, if you haven't already, I'd encourage you to post your question on groups.drupal.org, since that's read by lots of Drupal experts.

Preventing RSS feed scraping?

On a WordPress site, I have both a normal blog that I want Google to index and an RSS feed of outgoing links to other sites. I don't need or want bots to get at this second feed, nor do I want people to be able to grab its link for their own use.
I've successfully disabled RSS for the main blog but am not sure how to encrypt/protect/hide the RSS link for this additional feed.
I'm not sure how Facebook runs a news feed without RSS, but however they do it is probably beyond my means/experience to replicate.
Since these are just outgoing links, I don't think copyright notices in the feed will do much. Maybe there's a way to output the links automatically through a means other than RSS?
Use robots.txt (www.robotstxt.org) to prevent Google from following the link. All self-respecting robots should follow the directives in the robots.txt file. This file needs to go in the root of your site.
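For example, assuming the extra feed lives at a made-up path like /links-feed/, the file would look like:

```
User-agent: *
Disallow: /links-feed/
```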
The basic answer is to expose the feed entries through something other than actual RSS, like outputting JSON, going through an API, etc.
That will help prevent scraping, though not completely.
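Here's a hedged sketch of that idea on WordPress, assuming the outgoing links are kept in the built-in links manager; get_bookmarks() is a real WordPress function, but the endpoint itself (e.g. a custom page template) is up to you:

```php
<?php
// Output the outgoing links as JSON instead of RSS.
// Assumes the links live in WordPress's links manager.
header('Content-Type: application/json; charset=utf-8');

$links = array();
foreach (get_bookmarks(array('orderby' => 'name')) as $bookmark) {
    $links[] = array(
        'name' => $bookmark->link_name,
        'url'  => $bookmark->link_url,
    );
}

echo json_encode($links);
```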

Collecting RSS Feeds Online?

I'd like to be able to collect RSS feeds online as an alternative to collecting them on a desktop machine using a regularly running process.
Ideally, it would either collect all feeds and simply email each item to a single address as soon as it finds a new one (or even without checking whether it's new), or aggregate all the smaller feeds and send them out as one larger bulk feed less frequently.
It would have to run continually on a web server, but it would be nice to be able to collect all feeds, not just the ones I happen to pick up while a feed reader is running on my machine. Is something like this available?
Just use Google Reader. :)
Maybe Yahoo Pipes could help you. It's an interesting way of combining and manipulating feeds.
I'm not sure if you've ever used it, but iGoogle allows you to customise the Google homepage to display information from around the web. You can add tabs to the page to split the information up. It's extremely useful, and since you can log into it from any computer or browser, you can access your feeds anywhere.
If you have a lot of feeds of one type, or feeds that update infrequently, then iGoogle can also be combined with Google Reader.
It's also great for adding other plugins like Gmail, games, Dilbert :) and more.
To create an iGoogle page, go to the Google homepage and click the iGoogle link in the top right corner. iGoogle will then provide you with a starter page and some suggested content, which you can add or ignore. If you click the "Add Stuff" link, then "Add feed or gadget", you can manually add all your RSS feeds. You can also configure Firefox to automatically select Google as your RSS reader whenever you click an RSS feed icon in the navigation bar; you can select/change this under Tools -> Options -> Applications -> Web Feed.
To use your iGoogle page on multiple browsers/computers, you will need a Gmail/Google account; however, it's free and easy to create.
SimplePie is great if you have PHP installed.
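For example, a minimal SimplePie sketch that merges several feeds into one list on the server; the feed URLs are placeholders:

```php
<?php
// Merge several feeds and print the newest items.
// Requires SimplePie (simplepie.org); URLs are placeholders.
require_once 'simplepie.inc';

$feed = new SimplePie();
$feed->set_feed_url(array(
    'http://example.com/feed1.rss',
    'http://example.org/feed2.rss',
));
$feed->init();

foreach ($feed->get_items(0, 20) as $item) {
    echo $item->get_date('Y-m-d') . '  '
       . $item->get_title() . ' - '
       . $item->get_permalink() . "\n";
}
```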
Universal Feed Parser might be of help if you're programming in Python.
