Any simple code snipped to GET the number of subscribers to any feed URL?
Thanks!
First off, I'll start by saying there is no easy way to do this. You, however, do have several options.
Option 1: Use feedburner. These statistics are not 100% correct, but its by far the least painful method, but you can only use it for future and not backwards to see how many people are already subscribed.
Option 2: Use Google Webmaster to calculate the number of subscribers.
Option 3: I found this perl script on rsslib.com that parses your server logs to figure out the number of subscribers
When you are using services like Feedburner, you can easily see the number of subscribers. If you are simply hosting the RSS feed yourself it will be pretty hard to find out returning visitors - you would need to include some kind of token identifying each user and match it to your server records.
I'd say you should use something like Feedburner and you are good to go.
Related
Is there a way to add a feed or something to a website to show the upcoming football games (who's playing and at what time)?
I was thinking something like this: http://www.bbc.com/sport/football/fixtures
I think they have an RSS feed but I can't find how to utilise it. Is this even the right thing? I've never used any sort of feeds before.
I have found this: market.mashape.com/heisenbug/champions-league-live-scores I'm not sure if it displays the upcoming matches or not, but it's the closest thing I've found. Most of the sports APIs I've found seem to charge quite a lot per month to use. This one has a free version, but I don't fully understand it. It says 50 free per month, but 50 free what? Requests? If so, is it one 'request' per update (which is every 10mins with this plan)? Then it would only last just over 8hrs??? market.mashape.com/heisenbug/champions-league-live-scores/
Thanks
I did found two webpages which supplies the information you need. Since you didn't add source code, I think this answer will suit better for you.
First, enter to ScoresPro and pick any of the available sports rss feeds.
For this example I select Soccer.
Later, enter to Feedwind, paste the feed URL and press ENTER.
This is the result.
It is that we can add a url to digg.com reader to add them to one's reading list. Once we submit the link they(digg) find several links which points to the feed map(.xml or from feedburner etc). How do they do this ?
Feed discovery is step 1.
Step 2 is probably to have a "reverse" discovery mechanism. Each feed (RSS or Atom) as a link to an HTML representation. So, even if the other discovery part is missing you can still find a feed URL from a site if you have that 2nd relation... Combine that with user data (number of subscribers... etc) and you get a pretty good solution!
Step 3 is "dumb" search: each story in a feed has a link to domain. The feed itself has a domain. If they both match, you likely have found the feed you're looking for!
There are probably other techniques too :)
I think the question has been answered here before,but i could not find the desired topic.I am a newbie in web scraping.I have to develop a script that will take all the google search result for a specific name.Then it will grab the related data against that name and if there is found more than one,the data will be grouped according to their names.
All I know is that,google has some kind of restriction on scraping.They provide a custom search api.I still did not use that api,but hoping to get all the resulted links corresponding to a query from that api. But, could not understand what will be the ideal process to do the scraping of the information from that links.Any tutorial link or suggestion is very much appreciated.
You should have provided a bit more what you have been doing, it does not sound like you even tried to solve it yourself.
Anyway, if you are still on it:
You can scrape Google through two ways, one is allowed one is not allowed.
a) Use their API, you can get around 2k results a day.
You can up it to around 3k a day for 2000 USD/year. You can up it more by getting in contact with them directly.
You will not be able to get accurate ranking positions from this method, if you only need a lower number of requests and are mainly interested in getting some websites according to a keyword that's the choice.
Starting point would be here: https://code.google.com/apis/console/
b) You can scrape the real search results
That's the only way to get the true ranking positions, for SEO purposes or to track website positions. Also it allows to get a large amount of results, if done right.
You can Google for code, the most advanced free (PHP) code I know is at http://scraping.compunect.com
However, there are other projects and code snippets.
You can start off at 300-500 requests per day and this can be multiplied by multiple IPs. Look at the linked article if you want to go that route, it explains it in more details and is quite accurate.
That said, if you choose route b) you break Googles terms, so either do not accept them or make sure you are not detected. If Google detects you, your script will be banned by IP/captcha. Not getting detected should be a priority.
I'll be working on a project that will require a live output of a number of tweets users have hash tagged on Twitter as well as their tweets. Something along the lines of MTV's Twitter Tracker: http://vma-twittertracker.mtv.com/live/#buzz.
What intrigued me about this site is how can they constantly make API calls to Twitter without breaching the request limit?
I'd appreciate if anyone could guide me on the most effective way to accomplish this. From the research I've carried out thus far, I presume I will need to use Twitter's Streaming API.
Since there is a chance that the number of tweets output to my page could be in their thousands (AJAX loaded) along with stats on number of retweets/favourites, what would be the most scalable approach within my .NET site? Any examples or guidance would be appreciated.
Check out Linq2Twitter. It is a great wrapper around the Twitter API, and provides two mechanisms that will help you:
There is a search function that allows you to search for hash tags, etc, which will limit the amount of data you are getting back
You have the option to specify getting all the data since a certain tweet ID. You can therefore incrementally search the feed by performing searches and searching, in subsequent calls, from the ID you left off on.
I have used this many times to search the public feed and have not had any issues to date. I think the search function is key not requesting too much. Good luck!
you can look into Storm framework. Below are few links for further reference:-
http://storm-project.net/
https://github.com/nathanmarz/storm
Thanks for all your responses.
It looks like sites such that display a lot of Twitter stats/data use third party approved providers that have direct access to Twitter's Firehose API.
I have managed to get in contact with an approved provider to supply us with the feeds of data required (and it ain't cheap!).
What's the best way to track RSS subscribers reliably without using Feedburner? Some of the obvious approaches like tracking by IP or by the number of hits have some fata flaws. IP addresses can change with each request or multiple users can use the same IP. Also, feed readers can request a feed multiple times per day or even hour. Both problems make it really hard to get reliable stats on unique subscribers.
I've read articles by both Leo Notenboom and Tim Bray on the topic, but none of their suggestions seems to really solve how to track subscribers in an accurate and reliable way. Leo suggests having a unique ID generated programatically to be appended to the RSS feed URL for each time the referring page is loaded. Tim advocates having RSS readers generate a unique hashtag and also has suggestions ranging from tracking the referrers to using cookies. A unique URL would be reliable, but it has two flaws: It's not a user-friendly URL and it creates duplicate content for SEO. Are there any other reliable methods of tracking RSS subscribers? How does Feedburner estimate subscribers?
There isn't really a standard way to do this. Subscriber counting is always unreliable but you can get good estimates with it.
Here's how Google does it (source):
Subscribers counts are calculated by matching IP address and feed reader
combinations, then using our detailed understanding of the multitude of
readers, aggregators, and bots on the market to make additional inferences.
Of course part of this is easy for Google, as they can first calculate how many Google Reader users are subscribed to the feed in question. After that they use IP address matching also, and that's what you should use too.
You could calculate individual IP addresses (i.e. unique) from the web-servers logs, but that would count 10 people as 1 if they all use the same address. That's why you should inspect the HTTP-headers which are sent by the client, more specifically header fields HTTP_X_FORWARDED_FOR and HTTP_VIA. You could use the HTTP_VIA address as the "main" address, and then calculate how many unique HTTP_X_FORWARDED_FOR addresses are subscribed to the feed. If the subscriber doesn't have these proxy-added fields, then it's counted as a unique IP address. These should be handled in the code that generates the feed. You could also add a GeoIP lookup for the IP's and store everything to a database. This would allow you to see which country has the most subscribers to your feed.
This has it's problems too. All proxies don't use these fields and it doesn't fix the problem of calculating subscribers behind NAT gateways. It is however a good estimate. Besides, you are probably more interested in the order of magnitude rather than the exact count of subscribers, aren't you? If the counter says that you have 5989 subscribers you probably have more subscribers as the counter gives you the lower bound.
Standard and Reliable are not exactly word in RSS dictionary :-) Got to remember that the thing doesn't even have standard XSD after how many years ? If by tracking you mean the "count" there are a few things you can do and the tactics depend on the purpose i.e. are demonstrating a big number or small number? It is a marketing thing so you have to define your goals :-)
You may have to classify IP numbers for a start - to have the basic collection of big / corporate / umbrella IP numbers. For them, you can use referrer as a reasonable filtering criteria and count everything else as unique unless proven otherwise. Vast majority of IP numbers remain stable for about 2 days but again it always good to use basic referrer logic as a filter for people who just keep "clicking" so to speak.
Then you need a decent list of aggregators and a classification on how they process URLs and if they obscure end readers completely then you need either published or inferred averages - it's always fair game to use equitable distribution of an average count. Using cookies may help to collect aggregator IPs and differentiate between automated agents and individuals.
One very important thing is to keep in mind that you can't use just one method and expect it to be a silver bullet - you need to use these 3-4 aspects at the same time plus basic statistical reasoning.
You could query your web server logs for traffic to your RSS feed, perhaps filter it by IP to get the number of uniques.
The problem is, that would rely on folks checking the feed daily. The frequency of hits to your RSS feed by one individual could vary day to do and the number could be lower.
If you configure your RSS feed to require some kind of authentication, you can do user-based metrics instead of ip-based metrics. Although this would be a technically-correct solution, getting people to opt into an authenticated blog in anything other than an Intranet scenario is a stretch.