I am trying to embed an RSS feed on a web page I am designing. The feed is for an Austin, TX Craigslist page:
http://austin.craigslist.org/search/fua?query=%22modern+salvage%22&srchType=A&minAsk=&maxAsk=
Depending on which URL I use for the feed, I get one of these results:
**Feed URL**
http://www.gmodules.com/ig/creator?url=http://austin.craigslist.org/search/fua?query=%22modern%20salvage%22&srchType=A&format=rss
Error parsing module spec:
Not a properly formatted file
missing xml header
**<link> in the XML for the above URL**
http://www.gmodules.com/ig/creator?url=http://austin.craigslist.org/search/fua?query="modern%20salvage"&srchType=A
Information is temporarily unavailable
**I have also tried the URL in the head of the HTML doc:**
http://austin.craigslist.org/search/fua?query="modernsalvage"&srchType=A&format=rss" title="RSS feed for craigslist | furniture - all ""modern salvage"" in austin
Information is temporarily unavailable.
Although Craigslist encourages users to embed RSS feeds, I wonder whether the Craigslist server is denying the request. My background is in design, not programming. Any suggestions?
Thank you.
I'm not sure I understand what gadget you're using...
Anyway, I was able to make your page load with the RSS Reader+ gadget on a Google Sites page (that page won't stay up forever).
Attempts to make it work with http://www.gstatic.com/sites-gadgets/rss-sites/rss_sites.xml were unsuccessful. I think that gadget is broken, according to comments on http://www.google.ca/ig/directory?type=gadgets&url=www.google.com/ig/modules/reader.xml
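The gmodules.com URLs above have a nesting problem worth checking. This is my own assumption and is not confirmed anywhere in this thread. When a feed URL is passed as the `url=` parameter of another URL, its own `?`, `&` and `=` characters should normally be percent-encoded. Otherwise the outer service may read `&srchType=A&format=rss` as its own parameters rather than as part of the Craigslist URL. The Python sketch below uses only `urllib.parse` to show the difference. The feed URL is assembled from the question's search URL with `&format=rss` added.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Feed URL built from the question's Craigslist search.
inner = ("http://austin.craigslist.org/search/fua"
         "?query=%22modern+salvage%22&srchType=A&format=rss")

# Naive nesting: the outer service sees srchType and format as *its own* parameters.
naive = "http://www.gmodules.com/ig/creator?url=" + inner

# Encoded nesting: urlencode() percent-encodes ?, &, = and % inside the inner URL.
encoded = "http://www.gmodules.com/ig/creator?" + urlencode({"url": inner})

def outer_params(url):
    """Return the query parameters the outer service would receive."""
    return parse_qs(urlparse(url).query)

print(sorted(outer_params(naive)))               # ['format', 'srchType', 'url']
print(outer_params(encoded)["url"][0] == inner)  # True
```

Whether the gadget service accepts a plain RSS URL at all is a separate question that this sketch does not answer.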
I want to make a tech news app for Android by parsing RSS feeds from various tech websites. I found TechCrunch's RSS feed URL by looking in the page source for the href of the element with type="application/rss+xml".
TechCrunch's RSS feed URL:
http://techcrunch.com/feed/
I'm unable to find similar URLs for other websites like CNET, Gizmodo, etc.
Is there any other way of finding RSS feed URLs?
While on the homepage (http://techcrunch.com/ in your case), press Ctrl+U and the source code appears. Then press Ctrl+F and search for terms like rss, atom, or feed. This works for me in Firefox (I'm not sure whether Ctrl+U is bound differently in other browsers, such as Chrome).
If that's too slow, try installing an add-on like Feedbro: open the site in a tab, click the Feedbro icon, choose "Find feeds on Current Tab", and then open the add-on to see the feed details.
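The manual Ctrl+F search described above can also be scripted. This is a minimal Python sketch and not a feed detector. `find_feed_hints` is a hypothetical helper that reports source lines mentioning rss, atom or feed, and you still have to read the hits yourself.

```python
import re

def find_feed_hints(html, keywords=("rss", "atom", "feed")):
    """Return (line_number, line) pairs for source lines that mention any keyword,
    roughly what searching the page source with Ctrl+F would show you."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [(n, line.strip())
            for n, line in enumerate(html.splitlines(), start=1)
            if pattern.search(line)]

sample = """<html><head>
<title>Example site</title>
<link rel="alternate" type="application/rss+xml" href="/feed/">
</head><body>Nothing to see</body></html>"""

for number, line in find_feed_hints(sample):
    print(number, line)
```

To try it on a live page, you could pass in something like `urllib.request.urlopen("http://techcrunch.com/").read().decode("utf-8", "replace")`. Some sites may refuse requests that lack a browser-like User-Agent.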
Some sites publish Atom feeds rather than RSS. Some websites don't even specify the feed link in the head tag, so you may have to leave it to the user to find the RSS link and add it to your app.
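Some sites publish Atom rather than RSS, so an app that accepts user-supplied feed URLs has to handle both. Below is a minimal Python sketch. It assumes the feed document has already been downloaded and inspects only the root element:

- `<rss>` for RSS 2.0
- `<rdf:RDF>` for RSS 1.0
- `<feed>` in the `http://www.w3.org/2005/Atom` namespace for Atom

`feed_format` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

ATOM_FEED = "{http://www.w3.org/2005/Atom}feed"
RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"

def feed_format(xml_text):
    """Classify a downloaded feed as 'rss' (2.0), 'rss1' (RDF), 'atom' or 'unknown'."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if the document is not XML
    if root.tag == "rss":
        return "rss"
    if root.tag == RDF_ROOT:
        return "rss1"
    if root.tag == ATOM_FEED:
        return "atom"
    return "unknown"

print(feed_format('<rss version="2.0"><channel/></rss>'))                           # rss
print(feed_format('<feed xmlns="http://www.w3.org/2005/Atom"><title>t</title></feed>'))  # atom
```

After classifying, the app can pick the matching parsing code. RSS 2.0 items are un-namespaced `<item>` elements, whereas Atom uses namespaced `<entry>` elements.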
So I have a webview displaying user-defined websites. I want to auto-detect whether the page at that URL offers an RSS feed and show it in a Label/textarea.
The most straightforward way is to parse the HTML into a DOM document, then traverse the document looking for nodes that define RSS links. You could try QXmlSimpleReader, but this can be frustrating: most HTML is not well-formed XML, so you will have to handle parse errors.
In an answer to this question, the following SourceForge project was recommended. This might be worth a look.
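The answer above suggests QXmlSimpleReader. The sketch below swaps in a different technique: Python's standard-library `html.parser`, which tolerates HTML that is not well-formed XML, so unclosed tags don't need special exception handling. It collects `<link rel="alternate">` tags whose `type` is `application/rss+xml` or `application/atom+xml`, the same autodiscovery tags mentioned in the TechCrunch question, and resolves relative `href`s against the page URL. `FeedLinkFinder` and `find_feeds` are hypothetical names. In a Qt app the idea would have to be ported to whatever HTML parsing is available there.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised by <link rel="alternate" type="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names; values may be None.
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        ftype = (a.get("type") or "").strip().lower()
        href = a.get("href")
        if "alternate" in rels and ftype in FEED_TYPES and href:
            self.found.append(urljoin(self.base_url, href))

def find_feeds(html, base_url):
    """Return absolute feed URLs found in an HTML page, in document order."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.found

page = """<html><head>
<link rel="alternate" type="application/rss+xml" title="RSS" href="/feed/">
<link rel="stylesheet" href="style.css">
<LINK REL="alternate" TYPE="application/atom+xml" HREF="http://example.com/atom.xml"/>
</head><body><p>Unclosed paragraph<br></body>"""

print(find_feeds(page, "http://example.com/blog/"))
```

If the list comes back empty, the site may not advertise a feed in its head at all, as noted in the Atom answer above. In that case asking the user is the fallback.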
Dear all, I am using a web tool
http://fiddesktop.cs.northwestern.edu/mmp/scrape?url=
to parse web pages.
For example, to parse a New York Times page, we enter:
http://fiddesktop.cs.northwestern.edu/mmp/scrape?url=http://www.nytimes.com/pages/world/index.html
in the browser's address bar, and it parses the page nicely.
However, it fails for Google pages.
For example, if I want to parse the Google News front page:
http://fiddesktop.cs.northwestern.edu/mmp/scrape?url=http://news.google.com/nwshp?hl=en&tab=wn
I always get a 500 Internal Server Error.
I am sure this has something to do with the Google website. I think we probably need some Google API. Does anyone have any idea how to sort this out for Google pages?
Many thanks.
Per the google.com robots.txt file, you are explicitly requested not to scrape their content. Google does not provide an API for machine-readable search results; they want to control the presentation of their content via widgets and embedding strategies.
On a WordPress site, I have both a normal blog that I want Google to index and an RSS feed of outgoing links to other sites. I don't want bots to get at this other RSS feed, nor do I want people to be able to grab the link for their own use.
I've successfully disabled RSS for the main blog but am not sure how to encrypt, protect, or hide the RSS link for this additional feed.
I'm not sure how Facebook runs a news feed without RSS, but however they do it is probably beyond my means and experience to replicate.
Since these are just outgoing links, I don't think copyright notices in the feed will do much. Is there perhaps a way to output the links automatically through some means other than RSS?
Use a robots.txt file (see www.robotstxt.org) to stop Google from following the link. All well-behaved robots should obey the directives in robots.txt. The file must be placed in the root of your site.
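To see how a compliant crawler would interpret such a rule, here is a small check with Python's `urllib.robotparser`. The robots.txt content and the `/outgoing-links-feed/` path are hypothetical, so substitute your feed's real path. The check only describes well-behaved bots. robots.txt is a request, not access control, so it does nothing to stop a person who has the link.

```python
from urllib import robotparser

# Hypothetical robots.txt; /outgoing-links-feed/ stands in for the extra feed's path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /outgoing-links-feed/
"""

rules = robotparser.RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

def allowed(url, agent="Googlebot"):
    """True if a crawler that honours robots.txt may fetch the URL."""
    return rules.can_fetch(agent, url)

print(allowed("http://www.example.com/2010/05/a-blog-post/"))  # True
print(allowed("http://www.example.com/outgoing-links-feed/"))  # False
```

To test the file you have actually deployed, call `rules.set_url("http://your-site/robots.txt")` and `rules.read()` instead of `parse()`.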
The basic answer is to deliver the feed entries through something other than the actual RSS feed, such as JSON output or an API.
This will help prevent scraping, though not completely.
Is there any website or service that will let me add an RSS subscription to any website?
This is for the company I work for. We have a website that displays company-related news. The news items are supplied by an external agency and are added to our database automatically. Our website picks up random/new items and displays them. We are looking at adding a "Subscribe via RSS" button to our website.
If you have the data in your database, creating a feed yourself is fairly straightforward; there's a simple tutorial here.
Once you've set up a feed, put a tag like this in the <head> of your page:
<link rel="alternate" title="RSS Feed"
href="http://www.example.com/rss-feed/latest/" type="application/rss+xml" />
This allows the feed to be "auto-discovered" by your users' browsers (e.g. the RSS icon appears in the address bar in Firefox).
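This answer suggests generating the feed yourself from database rows. Here is a rough illustration as a minimal Python sketch using `xml.etree.ElementTree`. It assumes each row is a `(title, link, published)` tuple with a timezone-aware `datetime`. `rows_to_rss` and the example.com URLs are hypothetical. RSS 2.0 requires only the channel `title`, `link` and `description` elements. `guid` and `pubDate` are optional but help feed readers spot new items.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

def rows_to_rss(rows, title, link, description):
    """Build an RSS 2.0 document (as a str) from (title, link, published) rows."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, item_link, published in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
        ET.SubElement(item, "guid").text = item_link  # permalink doubles as unique ID
        # RFC 822-style date, which is what RSS 2.0 specifies for pubDate.
        ET.SubElement(item, "pubDate").text = format_datetime(published)
    return ET.tostring(rss, encoding="unicode")

# Rows like these would normally come from a query against the news table.
rows = [("Company wins award", "http://www.example.com/news/1",
         datetime(2010, 5, 3, 9, 0, tzinfo=timezone.utc))]
print(rows_to_rss(rows, "Company News", "http://www.example.com/news/",
                  "Latest company news"))
```

ElementTree escapes characters such as `&` and `<` in titles automatically. That is one reason to prefer it over building the XML with string concatenation.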
Here's an article that discusses various web scrapers that generate feeds: http://www.masternewmedia.org/news/2006/03/09/how_to_create_a_rss.htm
If you don't care to click through, here are the services the author discusses:
http://www.feedyes.com/
http://www.feed43.com/
http://www.feedfire.com/site/index.html
Other web scrapers suggested in the other answers:
http://page2rss.com/
http://www.dapper.net/
However, you're probably better off generating the feeds yourself from the info in the DB.
Your question is a little difficult to understand. Are you trying to generate the RSS for others to consume, or are you trying to consume someone else's RSS?
If you are trying to generate your RSS feed for others to consume you will need to read the spec:
http://cyber.law.harvard.edu/rss/rss.html
If you are trying to consume it, that link will also help. Then you'll need to look into an XML / RSS parser.
If you can provide more details I can update my answer.
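For the consuming side, here is a minimal Python sketch that uses the standard library's `xml.etree.ElementTree` as the XML parser. It assumes an RSS 2.0 document with un-namespaced `<item>`, `<title>` and `<link>` elements. Atom feeds use a different, namespaced structure and would need separate handling. `parse_rss_items` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def parse_rss_items(xml_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title", default=""), item.findtext("link", default=""))
            for item in root.iter("item")]

sample = """<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title><link>http://www.example.com/1</link></item>
<item><title>Second post</title><link>http://www.example.com/2</link></item>
</channel></rss>"""

for title, link in parse_rss_items(sample):
    print(title, "->", link)
```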
If you are not in a position to add an RSS feed to the existing site, see Page2Rss as an intermediate solution.
Might Dapper be of some use? You just set up which bits of your news feed to scour and voilà, instant RSS without having to touch any code...
Actually this is very doable with Yahoo! Pipes. Assuming that 1) your page is under 200k, 2) your robots.txt file does not disallow Pipes, and 3) your news feed has a unique ID, like so:
<ul id="newsfeed">
... you could use the Fetch Page module, trim it to just the items inside the news feed, loop through each list item, and use an Item Builder module to assemble the relevant bits into a proper RSS feed. Then, in the head of your document, you'd put in an RSS link, like so:
<link rel="alternate" type="application/atom+xml" title="News Feed" href="http://pipes.yahoo.com/your_pipe_id" />
This is of course completely ass-backwards, but would work for a quick fix, or in situations where you had no control over the body of the page.
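For readers who prefer code to Pipes, the Fetch Page and loop-over-list-items steps above can be approximated with Python's standard-library `html.parser`. This sketch keeps the answer's assumption that the news items sit in `<ul id="newsfeed">` and adds two of its own: the list contains no nested `<ul>`, and each item's first `<a href>` is its link. `NewsListExtractor` is a hypothetical name. The resulting title/link pairs would still need to be written out as RSS items and served from a URL.

```python
from html.parser import HTMLParser

class NewsListExtractor(HTMLParser):
    """Collect {'title', 'link'} dicts for each <li> inside <ul id="newsfeed">.
    Assumes the list has no nested <ul> elements."""

    def __init__(self, list_id="newsfeed"):
        super().__init__()
        self.list_id = list_id
        self.in_list = False
        self.current = None  # item being built, or None outside an <li>
        self.items = []

    def _finish_item(self):
        if self.current is not None:
            # Collapse runs of whitespace left over from the HTML layout.
            self.current["title"] = " ".join(self.current["title"].split())
            self.items.append(self.current)
            self.current = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "ul" and a.get("id") == self.list_id:
            self.in_list = True
        elif self.in_list and tag == "li":
            self._finish_item()  # </li> is optional in HTML
            self.current = {"title": "", "link": None}
        elif self.current is not None and tag == "a" and self.current["link"] is None:
            self.current["link"] = a.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self._finish_item()
        elif tag == "ul" and self.in_list:
            self._finish_item()
            self.in_list = False

    def handle_data(self, data):
        if self.current is not None:
            self.current["title"] += data

page = """<body><ul id="nav"><li>Home</li></ul>
<ul id="newsfeed">
  <li><a href="/news/1">Q1 results published</a></li>
  <li><a href="/news/2">New office opens</a>
</ul></body>"""

extractor = NewsListExtractor()
extractor.feed(page)
extractor.close()
print(extractor.items)
```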
Write a web handler that exposes the content of the database as an RSS feed.
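Here is a minimal sketch of such a handler in Python. It uses only the standard library: `http.server`, `sqlite3` and `xml.etree.ElementTree`. Everything specific here is assumed for illustration:

- the in-memory `news` table and its columns
- the `/rss-feed/latest/` path
- the channel text

A real site would more likely reuse its existing web framework and database connection.

```python
import sqlite3
import threading
import urllib.request
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical news table standing in for the real database.
# check_same_thread=False because requests are handled on the server's thread.
DB = sqlite3.connect(":memory:", check_same_thread=False)
DB.execute("CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT, url TEXT)")
DB.executemany("INSERT INTO news (title, url) VALUES (?, ?)",
               [("Q1 results published", "http://www.example.com/news/1"),
                ("New office opens", "http://www.example.com/news/2")])

class RssFeedHandler(BaseHTTPRequestHandler):
    """Serve the newest rows of the news table as RSS 2.0 at /rss-feed/latest/."""

    def do_GET(self):
        if self.path != "/rss-feed/latest/":
            self.send_error(404)
            return
        rss = ET.Element("rss", version="2.0")
        channel = ET.SubElement(rss, "channel")
        ET.SubElement(channel, "title").text = "Company News"
        ET.SubElement(channel, "link").text = "http://www.example.com/"
        ET.SubElement(channel, "description").text = "Latest company news"
        for title, url in DB.execute(
                "SELECT title, url FROM news ORDER BY id DESC LIMIT 20"):
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = title
            ET.SubElement(item, "link").text = url
        body = ET.tostring(rss, encoding="utf-8", xml_declaration=True)
        self.send_response(200)
        self.send_header("Content-Type", "application/rss+xml; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    """Run the feed server until interrupted (e.g. with Ctrl+C)."""
    HTTPServer(("127.0.0.1", port), RssFeedHandler).serve_forever()

def fetch_once():
    """Start the server on a free port, request the feed once, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), RssFeedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}/rss-feed/latest/"
        with urllib.request.urlopen(url) as resp:
            return resp.headers["Content-Type"], resp.read()
    finally:
        server.shutdown()
        server.server_close()
```

`fetch_once()` is just a quick local check. `serve()` keeps the feed running on port 8000. An autodiscovery `<link>` tag in the page head would then point at `/rss-feed/latest/`.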
You either need to roll your own or use a screen-scraping service.
After you have created your feed, you can use something like Feedburner to disseminate it.
If you happen to be using ASP.NET, you might want to check out the ASP.NET RSS Toolkit. It's useful for both generating and consuming feeds.