How to make a feed generator for any webpage? - rss

Feedity generates feed addresses for any webpage, and I would like to build a similar application.
How did they implement it?

This looks a little like YQL, which can be used for something similar. Given that HTML can be XML, and RSS feeds are XML as well, this should not be too difficult to implement. If I were to approach a custom implementation of this, I would probably attempt the following:
1. Pull in the HTML from the requested URL.
2. Cleanse the HTML so it can be converted to XML (or use something like the HTML Agility Pack).
3. Use XSLT to translate the XML document into an RSS feed based on a set of rules (that extract links, etc.).
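The three steps above could look roughly like this in Python. This is a minimal sketch under stated assumptions, not how Feedity or YQL actually work: it swaps the HTML-to-XML-plus-XSLT route for the standard library's forgiving HTMLParser and ElementTree, and its single hard-coded "rule" is that every <a href> on the page becomes an RSS item. LinkExtractor, html_to_rss and feed_for_url are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen
import xml.etree.ElementTree as ET


class LinkExtractor(HTMLParser):
    """Collect (absolute_url, anchor_text) for every <a href=...> on a page.
    HTMLParser tolerates unclosed tags, so no separate cleansing step is needed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []     # list of (absolute_url, anchor_text)
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)  # resolve relative links
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace; fall back to the URL if the anchor has no text.
            title = " ".join("".join(self._text).split()) or self._href
            self.links.append((self._href, title))
            self._href = None


def html_to_rss(html, page_url, feed_title):
    """Assumed rule: one RSS 2.0 <item> per link found on the page."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    parser.close()
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = page_url
    ET.SubElement(channel, "description").text = f"Links scraped from {page_url}"
    for href, title in parser.links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = href
    return ET.tostring(rss, encoding="unicode")


def feed_for_url(page_url, feed_title):
    """Pull in the HTML, then build the feed (no caching, retries or charset sniffing)."""
    with urlopen(page_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return html_to_rss(html, page_url, feed_title)
```

A real generator would still need caching, per-site extraction rules and error handling for pages that change or break.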
All of that having been said, if I could use something like YQL instead, I would definitely do that, as there can be a lot of pitfalls in a custom implementation (bad HTML, changing URLs, defining rules, caching, etc.).

Related

Get query string from MediaWiki page

Is there any way (like an undocumented magic word perhaps) to get the current query string (or full URL including query string) from within a MediaWiki template or Scribunto (Lua) module?
If this is an option, consider obtaining the HTML content through the API. This should be simpler than writing an extension. Of course, this won't be a regular page; rather, it will be something composed client-side on a blank article or server-side on a non-wiki site. With the Labeled Section Transclusion extension you mentioned, this should work.
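One way to fetch the rendered HTML is the MediaWiki action API's action=parse module. The sketch below builds such a request and pulls the HTML out of the JSON reply; the wiki endpoint is a placeholder and the helper names are hypothetical. Because this code runs outside the wiki (on the client or on a non-wiki server), it can read its own query string and decide what to do with the fetched HTML.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def parse_api_url(api_endpoint, page_title):
    """Build an action=parse request asking only for the page's rendered HTML."""
    params = {
        "action": "parse",
        "page": page_title,
        "prop": "text",
        "format": "json",
        "formatversion": "2",  # with v2, parse.text is a plain string
    }
    return f"{api_endpoint}?{urlencode(params)}"


def html_from_parse_response(payload):
    """Extract the HTML from a decoded action=parse JSON response."""
    text = payload["parse"]["text"]
    # formatversion=1 wraps the HTML as {"*": "..."}; formatversion=2 does not.
    return text["*"] if isinstance(text, dict) else text


def fetch_page_html(api_endpoint, page_title):
    """Network call; the endpoint is whatever api.php your wiki exposes."""
    with urlopen(parse_api_url(api_endpoint, page_title)) as resp:
        return html_from_parse_response(json.load(resp))
```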
Alternatively, consider some server-side post processing on generated HTML. It should perform quite well as MediaWiki caches a lot.
AFAIK there is no magic word for checking the query string and, IMO, this would be a very bad thing. Article source is like the model in the MVC pattern: you shouldn't put presentation stuff there.

How do I create an HTML link whose link text is the same as the URL?

Is this the easiest way in an HTML document to create a link to a page whose link text is the same as the URL?
So basically it will say:
Please click the following link:
http://test.com.
That is all I want it to say.
The code I wrote for this is as follows:
<a href="http://test.com">http://test.com</a>.
Or is there a more all-inclusive way, where you don't have to write the URL twice?
Obviously my code doesn't include the initial text; this is just for example purposes.
Unless you want to copy the URL from one place to another using JavaScript, you will have to write the URL twice.
I advise against the JavaScript copying, because its performance and SEO costs are much worse than the cost of typing everything twice.
What you have got now is the easiest way.
If that's not an option for some reason, you can use server-side scripting to search the page content for URLs and wrap an <a> tag around them.
This will require some very complicated regex. Daring Fireball has a very good blog post instructing you how to do this, and explaining exactly why it's actually impossible for this to be perfectly reliable (which is probably why HTML doesn't allow it):
http://daringfireball.net/2010/07/improved_regex_for_matching_urls
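As an illustration of that server-side approach, here is a deliberately naive Python autolinker. Its regular expression is a simplification assumed for this sketch, far less thorough than the Daring Fireball pattern, and it will mishandle cases such as URLs containing parentheses, which is exactly the unreliability described above. The name autolink is hypothetical.

```python
import html
import re

# Assumed, deliberately simple pattern: "http(s)://" or "www." followed by
# characters that are not whitespace, angle brackets or double quotes.
URL_RE = re.compile(r"\b(?:https?://|www\.)[^\s<>\"]+")
# Sentence punctuation right after a URL is usually not part of it.
TRAILING_PUNCT = ".,;:!?)'"


def autolink(text):
    """HTML-escape plain text and wrap anything that looks like a URL in <a>."""
    out = []
    pos = 0
    for match in URL_RE.finditer(text):
        url = match.group(0).rstrip(TRAILING_PUNCT)
        end = match.start() + len(url)
        out.append(html.escape(text[pos:match.start()]))
        href = url if "://" in url else "http://" + url  # bare www. links
        out.append(f'<a href="{html.escape(href)}">{html.escape(url)}</a>')
        pos = end  # stripped punctuation is emitted as ordinary text
    out.append(html.escape(text[pos:]))
    return "".join(out)
```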
I've done this sort of thing before (with emails actually) and it's very difficult and took years to get right. If at all possible, you should just do what you're already doing - manually type in the <a> tag yourself.
Alternatively, you could use something like Smarty (for PHP; I don't know what the ASP equivalent would be) to write something along the lines of the following, to programmatically generate the full <a> tag:
{link url='http://example.com'}
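The same idea as that template tag, sketched as a plain Python helper rather than Smarty ({link} is presumably a custom plugin rather than a built-in Smarty function, and the Python name link is hypothetical): the URL is written once and reused, HTML-escaped, as the link text.

```python
from html import escape


def link(url, text=None):
    """Emit a full <a> tag; the URL doubles as the visible text unless
    explicit link text is given, so the URL only has to be typed once."""
    label = url if text is None else text
    return f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'
```

Passing explicit link text also covers the more descriptive style suggested next, e.g. link("http://www.pizzasrawesome.com", "delicious pizza").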
Why don't we just sidestep the issue by making our links more semantically-rich?
Instead of:
For more information on our delicious pizza, visit www.pizzasrawesome.com.
Use this:
Read more about our delicious pizza.

What formatting can RSS readers reliably interpret?

I'm making a normal RSS feed for my website. I need to include simple HTML formatting in the description, e.g. paragraphs, line breaks, lists, etc. To do this I need to wrap the description content in CDATA.
The issue with this is that when I validate my feed, the content of the CDATA is ignored. So although the feed validates, I don't actually know whether everything is OK.
How can I find out what markup will likely be read ok by the various RSS readers?
Can I use whatever markup I would happily put in a website? How about inline styles? Or is it more like designing HTML emails? Thanks
RSS files are XML-formatted plain text; I think that's the only standard you can rely upon.
I think most aggregators only look like they handle HTML in RSS, because they simply download the linked article when you choose the headline.
If you're looking to embed rich content, you may well be better off investigating Atom instead of RSS.
Have a look at this S/O question: Which is better for encoding HTML for RSS?
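A quick way to see why the validator says nothing about the markup: to an XML parser, a CDATA section is just character data, exactly like entity-escaped HTML. The Python snippet below (the sample description is made up) parses both encodings of the same description and gets the same plain string with no child elements, so how well <p> or <li> render can only be checked in the readers themselves.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

HTML = "<p>Line one<br/>Line two</p><ul><li>item</li></ul>"

# The same description, once wrapped in CDATA and once entity-escaped.
cdata_item = f"<item><description><![CDATA[{HTML}]]></description></item>"
escaped_item = f"<item><description>{escape(HTML)}</description></item>"

cdata_text = ET.fromstring(cdata_item).find("description").text
escaped_text = ET.fromstring(escaped_item).find("description").text

# Both parse to the identical plain string, and <description> has no child
# elements: the HTML is only characters as far as XML tooling is concerned.
print(cdata_text == escaped_text == HTML)  # True
print(len(ET.fromstring(cdata_item).find("description")))  # 0
```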

Why do Google News feeds have such strange structure?

I'm trying to incorporate a Google News feed in my website (using the built-in SimplePie functionality of WordPress).
However, the default feed gets rendered in a strange table structure. Sure enough, when I inspect the feed XML, I see that Google News puts a whole bunch of HTML tables in its 'description' element, complete with embedded styles, etc. (see this example), essentially dictating how the feed must be displayed and not allowing for any effective CSS-based customization.
This seems really dumb; can anyone help explain what is going on, or at least agree with me that this is just a terrible feed architecture?
Feeds often include HTML tags, as many (most?) readers will handle and use them, and that way the RSS provider can have some nice-looking output in the reader, as you've guessed. (I prefer flagging it as CDATA unless it's proper XHTML, as it's not valid XML/RSS otherwise.) It's perhaps not in the original spirit of RSS, but the Google feed is just an extreme example of common practice. As for your problem, does strip_htmltags help (simplepie.org/wiki/reference/simplepie/strip_htmltags)?
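strip_htmltags belongs to SimplePie (PHP). The sketch below is not SimplePie; it is a Python illustration of the same idea applied to a Google News-style description: unwrap layout tags such as table while keeping their text, and drop inline style attributes so your own CSS can take over. The tag list and the names LayoutStripper and strip_layout are assumptions for this example.

```python
from html import escape
from html.parser import HTMLParser

# Assumed list of layout tags to unwrap; their text content is kept.
LAYOUT_TAGS = {"table", "tbody", "thead", "tr", "td", "th", "font"}


class LayoutStripper(HTMLParser):
    """Re-emit HTML without the given tags and without style attributes."""

    def __init__(self, drop_tags):
        super().__init__(convert_charrefs=True)
        self.drop_tags = drop_tags
        self.parts = []

    def _emit_tag(self, tag, attrs, self_closing=False):
        if tag in self.drop_tags:
            return
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name != "style"  # inline styles would override your CSS
        )
        self.parts.append(f"<{tag}{kept}{' /' if self_closing else ''}>")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        if tag not in self.drop_tags:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Entities were decoded by the parser, so re-escape the text.
        self.parts.append(escape(data, quote=False))


def strip_layout(description_html, drop_tags=LAYOUT_TAGS):
    stripper = LayoutStripper(drop_tags)
    stripper.feed(description_html)
    stripper.close()
    return "".join(stripper.parts)
```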

Should I be using XML + Stylesheets vs. XHTML and CSS?

I have been developing web apps for a while now, and for the past year I have been exploring as many technologies as possible. I know some people are creating pages using XML with XSLT or CSS style sheets; however, it seems to me that the trends are still not moving in that direction. Plus, it seems less functional and less convenient than XHTML/CSS-based pages.
What are the benefits of using XML/XSLT, and is it a good idea to start developing in that manner? Is there anything else new that is pulling ahead of the pack in front-end web development?
The reason I am bringing this stuff up is because it seems that many people are switching from XML as a datasource to JSON, which makes more sense as a datasource; however, XML is still functional as a markup language...
And on that note, why would I even want to use XSLT rather than CSS for the XML pages if I were to start developing that way? It seems to me that they serve the same purpose, except that XSLT looks like tag soup.
I hope this question makes sense....
XSLT can be useful if you have an XML data source that needs transforming into HTML. Otherwise you should be using HTML, CSS and jQuery for front-end development.
Right now, there is no reason to use XSLT at all. It's virtually incomprehensible compared to XML/XHTML, and offers no real advantage for you or your users.
As for using XML in lieu of (X)HTML, with the growing acceptance of the emerging HTML5 standard, I can't see why you'd give up canvas and the (eventually, they'll be good!) audio capabilities for XML. Even now, XML is nice for marking up documents, but for marking up a webpage, HTML is king; it's essentially a markup language tailor-made for the web.
There is no antagonism between XML/XSLT and XHTML/CSS; these are complementary technologies. In my web apps, for example, XHTML pages are produced by means of XML/XSLT (the transformation occurs on the client side).
You'd use XSLT to transform some XML document into XHTML. Then you'd use CSS to style the XHTML.
XSLT is for transforming one XML format into another. The data stays the same, but the representation changes. There is even XSL-FO, which lets you turn XML into other kinds of output, such as PDF.
Also note that XSLT can be used on the client or on the server. You can do an XSLT transformation in the browser or with a simple handler on the server. Java-based NoSQL data stores like eXist-db use XQuery to transform database entries with XSLT into any other XML format, including XHTML.
Using XSLT to generate XHTML from simple XML documents basically gives you a templating engine.
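Python's standard library has no XSLT processor, so this sketch performs the kind of XML-to-XHTML mapping an XSLT stylesheet would express declaratively (match each book, emit a table row, copy out the values), but does it by hand with ElementTree. The catalog document and function name are made up; with the third-party lxml package, the same mapping could be written as a real stylesheet and applied with lxml.etree.XSLT. CSS would then style the resulting table.catalog.

```python
import xml.etree.ElementTree as ET

# Made-up data document: the "model" a stylesheet would be applied to.
SOURCE = """<catalog>
  <book id="b1"><title>Learning XML</title><price>39.95</price></book>
  <book id="b2"><title>XSLT Basics</title><price>24.50</price></book>
</catalog>"""

XHTML_NS = "http://www.w3.org/1999/xhtml"


def catalog_to_xhtml(xml_text):
    """Map each <book> onto an XHTML table row, roughly what an XSLT
    xsl:for-each / xsl:value-of template over "book" would produce."""
    catalog = ET.fromstring(xml_text)
    # Declaring the XHTML namespace as a plain attribute keeps tags unprefixed.
    html = ET.Element("html", {"xmlns": XHTML_NS})
    body = ET.SubElement(html, "body")
    table = ET.SubElement(body, "table", {"class": "catalog"})
    for book in catalog.iter("book"):
        row = ET.SubElement(table, "tr", {"id": book.get("id")})
        ET.SubElement(row, "td").text = book.findtext("title")
        ET.SubElement(row, "td").text = book.findtext("price")
    return ET.tostring(html, encoding="unicode")
```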
Since browsers still lack XForms support, you can use JavaScript plus XSLT to transform XForms into valid HTML.
JSON is used to serialize, deserialize, and transport objects, and it is thus replacing XML as a transport format, more specifically as an AJAX query response, in rich internet applications.
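To make the serialization point concrete, this Python sketch (with a made-up book object) round-trips the same data through JSON and through XML. JSON maps directly onto the object with one call each way, while the XML version needs a hand-picked element layout and hands every value back as a string that must be converted again.

```python
import json
import xml.etree.ElementTree as ET

# Made-up object, as it might travel in an AJAX response.
book = {"id": "b1", "title": "Learning XML", "price": 39.95, "tags": ["xml", "web"]}

# JSON: one call each way; numbers and lists survive the round trip.
wire_json = json.dumps(book)
assert json.loads(wire_json) == book

# XML: the layout (attribute vs. element, repeated <tag>) is our own choice.
elem = ET.Element("book", id=book["id"])
ET.SubElement(elem, "title").text = book["title"]
ET.SubElement(elem, "price").text = str(book["price"])
for tag in book["tags"]:
    ET.SubElement(elem, "tag").text = tag
wire_xml = ET.tostring(elem, encoding="unicode")

# Deserializing means walking the tree and converting types by hand.
parsed = ET.fromstring(wire_xml)
restored = {
    "id": parsed.get("id"),
    "title": parsed.findtext("title"),
    "price": float(parsed.findtext("price")),
    "tags": [t.text for t in parsed.findall("tag")],
}
assert restored == book
```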

Resources