Is it legal to store RSS feed data on my own server? - rss

There are lots of free RSS feed APIs on the net. Most of them give limited data (such as the latest 10 to 15 items). I want to store this data on my own server for later use.
Is it legal to do so?

Related

How do services like Jungle Scout and ASINspector provide Amazon review/rating data if Amazon doesn't provide that info through their API?

I'm trying to write a tool that looks up the average review score and number of ratings for a given Amazon product.
Unfortunately, Amazon seems to intentionally exclude those two things from their API, which has been the subject of many forum threads.
You can technically scrape a product page's HTML and get it, but Amazon will quickly notice that you're running a script and begin serving a CAPTCHA, furthering the idea that they don't want you to collect it.
But with all of that being the case, how do third-party services collect and serve that data? Are they violating Amazon's TOS and collecting it through shady means, or is there some kind of legitimate method that I'm not seeing?

Can the LinkedIn API be used to automate collecting data about people's current job, previous jobs and education?

I'm doing a machine learning project for my thesis at university. For this I require information about where people are currently working, where they have previously worked and what their education is. I require this data on a large scale, so collecting it by hand is not viable. Therefore, I wondered if it's possible to use the LinkedIn API to collect this kind of data for this purpose? In particular, I would need to collect this data from thousands of profiles for a given company, for about 50 different companies (i.e. on the order of 100,000 profiles). For instance, I would need the employment and educational history of some 2000 Microsoft employees, or 2000 Facebook employees, and so on, all exported into a spreadsheet file.
Can the LinkedIn API be used for this, or is there some other way of achieving this?

Can I save the Google Maps API attributes for further search?

I sell goods online and would like to geocode my customers' delivery addresses before shipping, to make sure each address is correct and avoid misdelivery. If I use the Google Maps API, can I save the attributes returned for an address (such as building and street names, lat/lon) in my own storage so that I don't need to re-query every time? Some customers' addresses repeat or are written in an incorrect format. If I could search my own archive before querying the Google Maps API, would that reduce the time and number of queries required?
The Terms of Service allow temporary caching for up to 30 days for the purpose of improving the performance of your application. Permanent storage is prohibited.
For further details, refer to section 10.5 of the Terms of Service:
No caching or storage. You will not pre-fetch, cache, index, or store any Content to be used outside the Service, except that you may store limited amounts of Content solely for the purpose of improving the performance of your Maps API Implementation due to network latency (and not for the purpose of preventing Google from accurately tracking usage), and only if such storage:
is temporary (and in no event more than 30 calendar days);
is secure;
does not manipulate or aggregate any part of the Content or Service; and
does not modify attribution in any way.
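To illustrate the temporary-caching idea, here is a minimal sketch of a cache whose entries expire after 30 days, the ceiling taken from the excerpt quoted above. Everything in it is an assumption made for illustration: `GeocodeCache`, `normalize`, and `geocode_with_cache` are hypothetical names, and the `geocode` argument stands in for whatever function actually calls the API. Results are stored unchanged and simply dropped once they are too old.

```python
import time

MAX_AGE_SECONDS = 30 * 24 * 60 * 60  # the 30-day ceiling quoted above


def normalize(address):
    """Hypothetical normalization: lowercase and collapse whitespace."""
    return " ".join(address.lower().split())


class GeocodeCache:
    """In-memory cache whose entries expire after `max_age` seconds."""

    def __init__(self, max_age=MAX_AGE_SECONDS, clock=time.time):
        self._max_age = max_age
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # normalized address -> (stored_at, result)

    def get(self, address):
        key = normalize(address)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, result = entry
        if self._clock() - stored_at >= self._max_age:
            del self._entries[key]  # expired: drop it and force a fresh query
            return None
        return result

    def put(self, address, result):
        self._entries[normalize(address)] = (self._clock(), result)


def geocode_with_cache(address, cache, geocode):
    """Return a still-fresh cached result, otherwise call `geocode`."""
    result = cache.get(address)
    if result is None:
        result = geocode(address)  # `geocode` stands in for the real API call
        cache.put(address, result)
    return result
```

Repeated or differently formatted copies of the same address hit the cache only as far as `normalize` makes them identical; anything smarter (fixing misspelled street names, for instance) would still need a fresh query.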

Putting records into the Elasticsearch index before the relational database

I have an application which consumes RSS feeds and makes them searchable by performing the following steps:
pulling articles from the feed URL
storing that data in a relational DB
indexing the data in Elasticsearch
I'd like to reverse this process so that I can use the RSS River Elasticsearch plugin to pull data from feeds. However, this plugin integrates directly with Elasticsearch, bypassing my relational DB (which is a problem for other parts of the application which rely on each article having a record in the DB).
How can I have Elasticsearch notify the DB when a new article has been indexed (and de-indexed)?
Edit
Currently I'm using Ruby on Rails 4 with a PostgreSQL DB. RSS feeds are fetched in the background using Sidekiq to manage jobs. They go directly into PG and are then indexed by Elasticsearch. I'm using Chewy to provide an interface to the ES index. It doesn't support callbacks like I'm looking for (no Ruby library does afaik?).
Searching queries ES for matches then loads the records from PG to display results.
It sounds like you are looking for the sort of notification/trigger functionality described in this feature request. In the absence of that feature I think the approach suggested in that thread by the user "cravergara" is your best bet - that is, you can alter the RSS river Elasticsearch plugin to update your DB whenever an article is indexed.
That would handle the indexing requirement. To sync the de-indexing, you should make sure that any code that deletes your Elasticsearch documents also deletes the corresponding DB records.
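One way to read the advice above is that every add and every delete should pass through a single code path that touches both stores. The sketch below shows that idea only. It is written in Python rather than the asker's Ruby stack, `FakeIndex` is a stand-in and not the Elasticsearch client API, and `ArticleSync` and the `articles` table are hypothetical names.

```python
import sqlite3


class FakeIndex:
    """Stand-in for a search index; NOT the Elasticsearch client API."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, body):
        self.docs[doc_id] = body

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)


class ArticleSync:
    """Routes every add/remove through one place so both stores agree."""

    def __init__(self, conn, index):
        self.conn = conn
        self.index = index
        conn.execute(
            "CREATE TABLE IF NOT EXISTS articles (id TEXT PRIMARY KEY, title TEXT)"
        )

    def add(self, article_id, title):
        # Write the DB record first so other parts of the app can rely on it.
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO articles (id, title) VALUES (?, ?)",
                (article_id, title),
            )
        self.index.index(article_id, {"title": title})

    def remove(self, article_id):
        # De-index and delete the DB row in the same call.
        self.index.delete(article_id)
        with self.conn:
            self.conn.execute("DELETE FROM articles WHERE id = ?", (article_id,))
```

This does not make the two writes atomic; if the second step fails, the stores can still drift apart and would need a periodic reconciliation pass.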

Online service for buffering rss feed items

I am desperately looking for an online service for buffering rss feed items.
Basically, I have one RSS feed that publishes approximately 40 items per hour; however, the feed only exposes the latest 20. I would like to have access to a buffered RSS feed that would, for example, expose the latest 250 items.
If such a service exists, and you know about it, let me know!
Ben.
Why not just consume the feed and write it to a DB? Then you have access as far back as you care to go. This is simple in just about every scripting language out there. And, as a bonus, it'll reduce the number of HTTP requests your site makes (if you're displaying to the user from the DB instead of loading from the RSS feed site on each page view).
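As a rough illustration of the "write it to a DB" approach, here is a minimal Python sketch using only the standard library. It assumes an RSS 2.0 document with `channel/item` elements that lists the newest items first, de-duplicates on `guid` (falling back to `link`), and uses a made-up `items` table; fetching the XML on a schedule is left out, and `open_store`, `store_items`, and `latest_items` are hypothetical names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_store(path=":memory:"):
    """Open (or create) the SQLite archive. The schema is a made-up example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS items (
               id    INTEGER PRIMARY KEY AUTOINCREMENT,
               guid  TEXT UNIQUE,
               title TEXT,
               link  TEXT
           )"""
    )
    return conn


def store_items(conn, rss_xml):
    """Insert every <item> not seen before; return how many were new."""
    root = ET.fromstring(rss_xml)
    before = conn.total_changes
    # Assumption: the feed lists newest items first, so insert in reverse
    # to make the row id grow with recency.
    for item in reversed(root.findall("./channel/item")):
        link = item.findtext("link", default="")
        guid = item.findtext("guid", default="") or link  # fall back to link
        title = item.findtext("title", default="")
        # The UNIQUE constraint on guid makes repeated polls harmless.
        conn.execute(
            "INSERT OR IGNORE INTO items (guid, title, link) VALUES (?, ?, ?)",
            (guid, title, link),
        )
    conn.commit()
    return conn.total_changes - before


def latest_items(conn, limit=250):
    """Return up to `limit` (title, link) pairs, most recently stored first."""
    rows = conn.execute(
        "SELECT title, link FROM items ORDER BY id DESC LIMIT ?", (limit,)
    )
    return rows.fetchall()
```

Calling `store_items` on each poll and `latest_items(conn, 250)` when rendering would give the buffered view asked for, provided the feed is polled often enough that no item scrolls out of the latest 20 between polls.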
Check out http://superfeedr.com (I'm the founder). We do not "cache" the feeds, but we poll them often enough that we should not miss any entry.
Also, check this CouchApp, as it lets you store these updates in a CouchDB app very easily. You can be up and running in a few minutes, without even needing a server.
