Could someone please help me understand what the “link” tags are used for within an ATOM feed?
Do they point to a physical resource, or are they just identifiers?
What is the difference between link URLs in the beginning and for each “entry” block?
Is it compulsory to have this link URL?
Any information regarding this would be much appreciated!
I have provided an example snippet of code below.
<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="self" href="http://publisher.example.com/happycats.xml" />
  <updated>2008-08-11T02:15:01Z</updated>
  <!-- Example of a full entry. -->
  <entry>
    <title>Heathcliff</title>
    <link href="http://publisher.example.com/happycat25.xml" />
    <id>http://publisher.example.com/happycat25.xml</id>
    <updated>2008-08-11T02:15:01Z</updated>
    <content>
      What a happy cat. Full content goes here.
    </content>
  </entry>
</feed>
Atom is a syndication format that can be used by applications employing RESTful communication through hypermedia. It's very good for publishing feeds, which are useful not only for blogs but also in distributed applications (for example, for publishing events to other parts of a system), to take advantage of HTTP (caching, scalability, etc.) and the decoupling involved in using REST.
The <link> elements in Atom are called link relations and can indicate a number of things to the consumer of the feed:
rel="self" normally indicates that the current element (in your case, the feed itself) represents an actual resource, and this is the URI for that resource
rel="via" can identify the original source of the information in the feed or the entry within the feed
rel="alternate" specifies a link to an alternative representation of the same resource (feed or entry)
rel="enclosure" can mean that the linked to resource is intended to be downloaded and cached, as it may be large
rel="related" indicates the link is related to the current feed or entry in some way
A provider of Atom feeds can also specify their own reasons for a link to appear, and provide a custom rel value
By providing links to related resources in this way you can decouple systems: the only URI the consumer needs to know about is a single entry point, and from then on other actions are provided to the consumer via these link relations. The links effectively tell the consumer that they can use them either to take actions on, or to retrieve data for, the feed or entry they relate to.
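If it helps to see what consuming these link relations looks like, here is a minimal sketch using the Python feedparser library and the hypothetical feed URL from the question; it simply lists the relations the feed and its entries advertise.

import feedparser

feed = feedparser.parse("http://publisher.example.com/happycats.xml")

# Feed-level link relations, e.g. rel="self"
for link in feed.feed.get("links", []):
    print(link.get("rel"), "->", link.get("href"))

# Entry-level link relations, e.g. rel="alternate" or rel="enclosure"
for entry in feed.entries:
    for link in entry.get("links", []):
        print(entry.get("title"), ":", link.get("rel"), "->", link.get("href"))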
A great book I can recommend for REST which goes into depth about Atom is REST in Practice by Jim Webber, Savas Parastatidis and Ian Robinson.
I know that Google Dictionary was discontinued in 2011, but the dictionary information and definitions are still available through Google search results:
Does anyone know whether this information can be accessed through the Custom Search API or the Translate API?
I found this related question (but sadly without a satisfying answer).
I also needed a Google Dictionary API for my project; since none was available, I decided to create one.
I scraped the web page at the URL https://www.google.com/#q=define+term, where term is any word you want the meaning of, and created the API. You can find it here: Google Dictionary API.
How to use
The basic syntax of a URL request to the API is shown below:
https://api.dictionaryapi.dev/api/v2/entries/<--language_code-->/<--word-->
As an example, to get the definition of the English word hello, you can send a request to:
https://api.dictionaryapi.dev/api/v2/entries/en/hello
The API also provides other meanings of the word, example sentences, and synonyms, if any.
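Here is a quick sketch of calling the endpoint above with Python's requests library; the response field names used below (meanings, partOfSpeech, definitions) reflect my understanding of the current JSON format and may change.

import requests

resp = requests.get("https://api.dictionaryapi.dev/api/v2/entries/en/hello")
resp.raise_for_status()

for entry in resp.json():                       # the API returns a JSON array
    for meaning in entry.get("meanings", []):
        part_of_speech = meaning.get("partOfSpeech")
        for definition in meaning.get("definitions", []):
            print(part_of_speech, "-", definition.get("definition"))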
If you want me to include any other details, please comment and I will happily extend the API to cover your needs.
In case you wish to see the code, it is on GitHub.
Google Dictionary's content is licensed from Oxford Dictionaries' Lexico. Their API can be accessed from here.
Note that their free access platform ("prototype") has a number of limitations:
1000 requests per month
Limited data access
Limited request rate
It doesn't look promising from the API Explorer
https://developers.google.com/apis-explorer/#search/dictionary/
I've been trying to wrap my head around authoring profiles in FHIR. The trouble I'm having is around the use of extensions.
The documentation talks about extensions as if they are simply there to extend existing elements of the resource a profile belongs to; this is kind of confirmed to me when using Forge, because I can add new elements which don't have extensions.
It feels very foreign to me. In our proprietary storage system we have the equivalent of profiles, and they have properties (which I think are similar to elements in FHIR); however, a property is only designed to store one type of thing. For example, you might have a patient profile with the properties DOB, ethnicity, identifier, etc. I don't really understand what profiles are for in the context of FHIR: are they similar to my properties? Can I use them to limit the datatype that a profile instance can have for a particular element?
Is there any better documentation than the spec? I'm finding it really hard to get to grips with.
FHIR extensions are used to capture extra data elements when there's no field for them in the standard definition. Mother's maiden name is an example of that for the Patient resource.
The use of an extension is a standard FHIR mechanism and will always look like this:
<extension>
  <url value="http://hl7.org/fhir/StructureDefinition/patient-mothersMaidenName"/>
  <valueString value="Williams"/>
</extension>
The url is the canonical URL for the definition of the extension, which is a StructureDefinition resource defining the extension and the datatype(s) of its value.
You can have extensions on every level of a resource/datatype.
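For comparison, the same extension in FHIR's JSON representation looks roughly like this; I've written it out as a Python dict, with everything else on the Patient omitted for brevity.

patient = {
    "resourceType": "Patient",
    "extension": [
        {
            "url": "http://hl7.org/fhir/StructureDefinition/patient-mothersMaidenName",
            "valueString": "Williams",
        }
    ],
}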
Since profiling is a very overloaded term, it is hard for me to understand what you're saying about profiles and properties in your proprietary system, or how that relates to your question. But in general, FHIR profiling is needed and used to:
be able to add data when there's no data field for it in the specification (i.e. an extension of the specs)
constrain the specification in places where you need to be more strict, for example to make an optional field mandatory (i.e. a constraint on the specs, also called a profile)
I recommend browsing through some of the profiles and their descriptions on the Simplifier repository to get an idea of why people are creating profiles on FHIR.
I'm developing software which is going to provide in-depth information about URLs.
While the GET parameters are simple, I'm having trouble with the hash.
At first it was used to mark places in the document to navigate to, but we're past that now. I've seen JavaScript frameworks using it to store parameters, similar to GET query strings.
So, here's my question: is everything that comes after a hash fair game, or are there any conventions about what it should look like?
Try these resources, they could help: Fragment Identifier on Wikipedia, or a Google search for Pound Sign.
They have lists of examples you could use.
It all depends on what you need. Hashes are used in modern web applications that make asynchronous calls to the server using Ajax. This allows, for example, the user to copy the link and receive the same content after pasting it (the actions taken are put into the hash, which changes the URL, which would otherwise remain static).
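To illustrate that the fragment is just an opaque string to generic URL tooling, and that any structure inside it (such as key=value pairs) is purely a convention of the application that set it, here is a small sketch using only the Python standard library; the URL is made up.

from urllib.parse import urlsplit, parse_qs

url = "https://example.com/app#/search?q=cats&page=2"
fragment = urlsplit(url).fragment          # "/search?q=cats&page=2"

# If (and only if) the application uses query-string-like fragments,
# the fragment can be parsed the same way as a query string:
params = parse_qs(fragment.partition("?")[2])
print(fragment, params)                    # {'q': ['cats'], 'page': ['2']}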
You want to read http://www.jenitennison.com/blog/node/154
I've been experimenting with writing my own RSS reader. I can handle the "parse XML" bit. The thing I'm getting stuck on is "How do I fetch older posts?"
Most RSS feeds only list the 10-25 most recent items in their XML file. How do I get ALL the items in a feed, and not just the most recent ones?
The only solution I could find was using the "unofficial" Google Reader API, which would be something like
http://www.google.com/reader/atom/feed/http://fskrealityguide.blogspot.com/feeds/posts/default?n=1000
I don't want to make my application dependent on Google Reader.
Is there any better way? I noticed that on Blogger, I can do "?start-index=1&max-results=1000", and on WordPress I can do "?paged=5". Is there any general way to fetch an RSS feed so that it gives me everything, and not just the most recent items?
RSS/Atom feeds do not allow historic information to be retrieved. It is up to the publisher of the feed to provide it if they want to, as in the Blogger and WordPress examples you gave above.
The only reason Google Reader has more information is that it remembered it from when the feed first came up.
There is some information on something like this discussed as an extension to the Atom protocol, but I don't know if it is actually implemented anywhere.
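To make the Blogger/WordPress point from the question concrete, here is a rough sketch (Python with the feedparser library; the feed URL is hypothetical) that pages back through a WordPress feed using the ?paged= parameter for as long as the publisher keeps returning entries.

import feedparser

base = "https://example.wordpress.com/feed/"    # hypothetical feed URL
all_entries = []
for page in range(1, 51):                       # arbitrary safety cap
    parsed = feedparser.parse(f"{base}?paged={page}")
    if not parsed.entries:
        break                                   # the publisher has no more pages
    all_entries.extend(parsed.entries)

print(len(all_entries), "entries collected")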
As the other replies here mention, a feed may not provide archival data, but historical items may be available from another source.
Archive.org’s Wayback Machine has an API to access historical content, including RSS feeds (if their bots have downloaded it). I’ve created the web tool Backfeed that uses this API to regenerate a feed containing concatenated historical items. If you'd like to discuss the implementation in detail please get in touch.
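If you want to query the archive yourself rather than use a tool, a hedged sketch of listing the archived copies of a feed via the Wayback Machine's CDX API looks roughly like this (this is not how Backfeed is implemented, just the general idea; the feed URL is made up).

import requests

feed_url = "http://example.com/feed.xml"
resp = requests.get(
    "https://web.archive.org/cdx/search/cdx",
    params={"url": feed_url, "output": "json", "fl": "timestamp,original"},
)
resp.raise_for_status()
rows = resp.json()              # the first row is the header, e.g. ["timestamp", "original"]

for timestamp, original in rows[1:]:
    snapshot = f"https://web.archive.org/web/{timestamp}/{original}"
    print(snapshot)             # each snapshot can then be fetched and parsed as a feed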
In my experience with RSS, the feed is compiled from the last X items, where X is a variable. Certain feeds may have the full list, but for bandwidth's sake most places likely limit it to just the last few items.
The likely reason Google Reader has the old information is that it stores it on its side for users.
Further to what David Dean said, the RSS/Atom feeds will only contain what the publisher of the feed has up at that moment, and someone would need to be actively collecting this information in order to have any historical information. Basically, Google Reader was doing this for free, and when you interacted with it you could retrieve this stored information from Google's database servers.
Now that they have retired the service, to my knowledge you have two choices. You either have to start collecting this information from your feeds of interest and store the data using XML or some such, or you could pay for this data from one of the companies who sell this type of archived feed information.
I hope this information helps somebody.
Seán
Another potential solution that might not have been available when the question was originally asked, and which shouldn't require any specific service (a rough sketch of these steps follows below):
Find the URL of the RSS feed you want and use waybackpack to get the archived URLs for that feed.
Use FeedReader or a similar library to pull down the archived RSS feed.
Take the URLs from each feed and scrape them as you wish. If you're going way back in time it's possible there might be some dead links.
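Here is a rough sketch of what those steps could look like in Python, assuming the waybackpack CLI and the feedparser library are installed; the feed URL, the output directory, and the glob pattern are all assumptions you would adjust to however waybackpack lays out its files.

import glob
import subprocess

import feedparser

feed_url = "http://example.com/feed.xml"
subprocess.run(["waybackpack", feed_url, "-d", "snapshots"], check=True)

seen = {}
for path in glob.glob("snapshots/**/*.xml", recursive=True):
    for entry in feedparser.parse(path).entries:
        key = entry.get("id") or entry.get("link")
        seen.setdefault(key, entry)             # keep the first copy of each item

print(len(seen), "unique entries recovered")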
All previous answers more or less rely on existing services still having a copy of that feed, or on the feed engine being able to provide older items dynamically.
There's, though, another, admittedly proactive and rather theoretical way to do it: let your feed reader use a caching proxy which semantically understands RSS and/or Atom feeds and caches them on a per-item basis, up to as many items as you configure.
If the feed reader doesn't poll feeds regularly, the proxy could fetch known feeds on its own, time-based schedule so as not to miss an item in highly volatile feeds, like the one from User Friendly which has only one item and changes every day (or at least used to). If the feed reader e.g. crashed or lost its network connection while you were away for a few days, you might lose items in your feed reader's cache. Having the proxy fetch those feeds regularly (e.g. from a data centre instead of from home, or on a server instead of a laptop) lets you run the feed reader only now and then, without losing items which were posted after your feed reader last fetched the feeds but rotated out again before you fetched them the next time.
I call that concept a Semantic Feed Proxy and I've implemented a proof-of-concept implementation called sfp. It's not much more than a proof of concept, though, and I haven't developed it further. (So I'd be happy about pointers to projects with similar ideas or purposes. :-)
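The per-item caching idea itself is small enough to sketch; the following is a minimal illustration in Python with feedparser (nothing like a full proxy, and not how sfp is implemented): poll a feed, merge its items into a store keyed by entry id, and keep items even after they rotate out of the feed.

import json
import feedparser

def update_cache(feed_url, cache_path="item_cache.json"):
    # Load the existing per-item cache, if any.
    try:
        with open(cache_path) as f:
            cache = json.load(f)
    except FileNotFoundError:
        cache = {}

    # Merge the items currently in the feed; items already cached are kept.
    for entry in feedparser.parse(feed_url).entries:
        key = entry.get("id") or entry.get("link")
        cache.setdefault(key, {"title": entry.get("title"),
                               "link": entry.get("link"),
                               "published": entry.get("published")})

    with open(cache_path, "w") as f:
        json.dump(cache, f, indent=2)
    return cache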
Why does this problem exist?
Most RSS readers need to import feeds through a live URL, which makes things harder for sites that are unindexed on Wayback Machine.
The reason Wayback Machine feeds can be imported is that the reader can regularly poll the server for updates according to its defined TTL configuration. The reader compares the current datetime with the RSS feed posts' pubDate or lastBuildDate keys in the XML response. We can't hack the machine's datetime to work around this comparison, because the current datetime is fetched live.
I've outlined an alternative solution without Wayback below. Unfortunately, I have not been able to find a universal solution for all feed sources.
Alternative Solution(s)
In my experience, NOT ALL feeds are partial though. The XML doesn't have to specify the datetime of each post. This means the RSS Reader doesn't have a datetime to filter the feed with. An example of this feed type can be found here.
This kind of reading experience is useful when chronological order is irrelevant and the content doesn't need to be sorted. This approach works for sites where ALL the content is valuable, and the linked Essays of Paul Graham are a good example.
If the site has a generic, non-chronological feed option, subscribe to that RSS instead (the preferred option).
Download the linked timestamped .rss file, strip the datetimes, and host the file on your own server. Note that we can implement this via an AWS Lambda.
Set up a server that fetches the RSS from the live site.
Strip the pubDate tags from the XML file on fetch (a rough sketch of this step follows below).
Host the modified RSS on your own server.
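A hedged sketch of the "strip pubDate on fetch" step, using only the Python standard library; real feeds vary (namespaces, Atom vs. RSS, where lastBuildDate lives), so treat this as the general idea rather than a drop-in implementation.

import urllib.request
import xml.etree.ElementTree as ET

def strip_pub_dates(feed_url):
    # Fetch the live RSS and parse it.
    with urllib.request.urlopen(feed_url) as resp:
        root = ET.fromstring(resp.read())

    # Remove pubDate/lastBuildDate wherever they appear (channel or item).
    for parent in list(root.iter()):
        for tag in ("pubDate", "lastBuildDate"):
            for elem in parent.findall(tag):
                parent.remove(elem)

    # This string is what you would then host on your own server.
    return ET.tostring(root, encoding="unicode")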
Note
These are suboptimal solutions due to the loss of ordering; however, I wanted to provide a potential alternative to the Wayback Machine.
In addition, some existing answers require advanced system-design workarounds and more prework, and in some cases are outdated (Google Reader is shut down). I hope this is helpful for those who really need a solution for a complete feed list. Constructing new RSS feeds from the original RSS file is not too hard.
I am working on a project that requires reliable access to historic feed entries which are not necessarily available in the current feed of the website. I have found several ways to access such data, but none of them give me all the characteristics I need.
Look at this as a brainstorm. I will tell you how much I have found and you can contribute if you have any other ideas.
Google AJAX Feed API - will limit you to 250 items
Unofficial Google Reader API - Perfect but unofficial and therefore unreliable (and perhaps quasi-illegal?). Also, the authentication seems to be tricky.
Spinn3r - Costs a lot of money
Spidering the Internet Archive at the site of the feed - lots of complexity, spotty coverage, only useful as a last resort
Yahoo! Feed API or Yahoo! Search BOSS - The first looks more like an aggregator, meaning I'd need a different registration for each feed and the second should give more access to Yahoo's data but I can find no mention of feeds.
(thanks to Lou Franco) Bloglines Sync API - Besides the problem of needing an account and being designed more as an aggregator, it does not have a way to add feeds to the account. So no retrieval of arbitrary feeds. You need to manually add them through the reader first.
Other search engines/blog search/whatever?
This is a really irritating problem as we are talking about semantic information that was once out there, is still (usually) valid, yet is difficult to access reliably, freely and without limits. Anybody know any alternative sources for feed entry goodness?
Bloglines has an API to sync accounts
http://www.bloglines.com/services/api/sync
You have to make an account and subscribe to the feed you want to download, but then you can download based on date, which can be way in the past. Not sure of the terms.
The best answer I've found so far is this: Google Reader's unofficial API turns out to have a public access point for its feeds, which means no authentication is needed. Use it as follows:
http://www.google.com/reader/public/atom/feed/{your feed uri here}?n=1000
replace the text in the squigglies (including the squigglies themselves) with the feed URI you're interested in. More information about the precise arguments can be found here:
http://blog.martindoms.com/2009/10/16/using-the-google-reader-api-part-2/
but remember to use the /public/ URL if you don't want to mess with authentication.