Twitter REST API vs Streaming API - how does it impacts server side - http

There is an application that should read user tweets for every registered user, process them and store data for future usage.
It can reach Twitter 2 ways: either REST API (poll twitter every x mins), or use its Streaming API to get tweets delivered.
Besides completely different implementations on server side I wonder what are other impacts on server side?
Say application has tousands of users. Is it better to build kind of queue and poll twitter for each user (the simplest scenario), or is it better to use Streaming API and keep HTTP connection open for each user? I'm a bit worried about the latter as it'd require keeping tousands of connections open all the time. Are there any drawbacks of that I'm not aware of? If I'd like to deploy my app on Heroku or on EC2 instance, would it be ok or are there any limits?
How it is done in other apps that constantly need getting data for each user?

Related

Is there a way to prevent content caching or scraping from an API?

Imagine the following situation. I have an API and a developer builds an application that retrieves new content from it on a daily base. She stores this content and provides this data to all the instances of an app she developed. In this way these apps do not have to call the API directly.
Is there a way to prevent this and force the apps (and therefore the end users) to use the API and not only the application on the server.
I found many questions about how to cache API data but not how to prevent that. I am fairly new to this, so maybe I am overlooking something or maybe it is not possible to prevent this.
Thank you in advance!
Assuming you are using Apigee for API-management, you have some options. First, consider the options available to you contractually, if this is that sort of business relationship and you can impose certain API behavior with a business partner through a contract.
Separate from the legal side of things, we remember that you control your API and the credentials you issue for use by your API clients. You cannot though control, practically, what a client developer does with the credentials you issue: she could promise to embed the credentials in the mobile apps' API client, but change her mind and use it centrally, and then design her mobile client to call into her central cache. If though you really insist that only mobile app clients should be calling your API and not a hub/cache server, then you could consider applying constraint policies on your API (within the Apigee proxy, such as Access Control). For instance, you could blacklist your partner's hub/cache server IP address, although that is weak security at best. Or, you could apply a constraint that only clients with certain identifying User-Agent strings (mobile OS, client) are allowed to connect to your API. Or use GeoIP filtering to allow only clients from certain regions, if that applies to your use-case.
Finally, depending on the data model, you might be able to rate-limit such that a bulk cache becomes impractical: if your edge-client use-cases is to fetch a single record, but a cache would have to hold thousands of records, then you could impose a per-client rate limit (Quota policy) which is no bother to individual mobile clients, but makes the work of a hub/cache server untenable.

Best strategy to develop back end of an app with large userbase, taking into account limitations of bandwidth, concurrent connections etc.?

I am developing an Android app which basically does this: On the landing(home) page it shows a couple of words. These words need to be updated on daily basis. Secondly, there is an 'experiences' tab in which a list of user experiences (around 500) shows up with their profile pic, description,etc.
This basic app is expected to get around 1 million users daily who will open the app daily at least once to see those couple of words. Many may occasionally open up the experiences section.
Thirdly, the app needs to have a push notification feature.
I am planning to purchase a managed wordpress hosting, set up a website, and add a post each day with those couple of words, use the JSON-API to extract those words and display them on app's home page. Similarly for the experiences, I will add each as a wordpress post and extract them from the Wordpress database. The reason I am choosing wordpress is that it has ready made interfaces for data entry which will save my time and effort.
But I am stuck on this: will the wordpress DB be able to handle such large amount of queries ? With such a large userbase and spiky traffic, I suspect I might cross the max. concurrent connections limit.
What's the best strategy in my case ? Should I use WP, or use firebase or any other service ? I need to make sure the scheme is cost effective also.
My app is basically very similar to this one:
https://play.google.com/store/apps/details?id=com.ekaum.ekaum
For push notifications, I am planning to use third party services.
Kindly suggest the best strategy I should go with for designing the back end of this app.
Thanks to everyone out there in advance who are willing to help me in this.
I have never used Wordpress, so I don't know if or how it could handle that load.
You can still use WP for data entry, and write a scheduled function that would use WP's JSON API to copy that data into Firebase.
RTDB-vs-Firestore scalability states that RTDB can handle 200 thousand concurrent connections and Firestore 1 million concurrent connections.
However, if I get it right, your app doesn't need connections to be active (i.e. receive real-time updates). You can get your data once, then close the connection.
For RTDB, Enabling Offline Capabilities on Android states that
On Android, Firebase automatically manages connection state to reduce bandwidth and battery usage. When a client has no active listeners, no pending write or onDisconnect operations, and is not explicitly disconnected by the goOffline method, Firebase closes the connection after 60 seconds of inactivity.
So the connection should close by itself after 1 minute, if you remove your listeners, or you can force close it earlier using goOffline.
For Firestore, I don't know if it happens automatically, but you can do it manually.
In Firebase Pricing you can see that 100K Firestore document reads is $0.06. 1M reads (for the two words) should cost $0.6 plus some network traffic. In RTDB, the cost has to do with data bulk, so it requires some calculations, but it shouldn't be much. I am not familiar with the pricing small details, so you should do some more research.
In the app you mentioned, the experiences don't seem to change very often. You might want to try to build your own caching manually, and add the required versioning info in the daily data.
Edit:
It would possibly be more efficient and less costly if you used Firebase Hosting, instead of RTDB/Firestore directly. See Serve dynamic content and host microservices with Cloud Functions and Manage cache behavior.
In short, you create a HTTP function that reads your database and returns the data you need. You configure hosting to call that function, and configure the cache such that subsequent requests are served the cached result via hosting (without extra function invocations).

Firebase connection count with angular bindings?

I've read quite a few posts (including the firebase.com website) on Firebase connections. The website says that one connection is equivalent to approximately 1400 visiting users per month. And this makes sense to me given a scenario where the client makes a quick connection to the Firebase server, pulls down some data, and then closes the connection. However, if I'm using angular bindings (via angularfire), wouldn't each client visit (in the event the user stays on the site for a period of time) be a connection? In this example having 100 users (each of which is making use of firebase angular bindings) connecting to the site at the same time would be 100 connections. If I opted not to use angular bindings, that number could be (in a theoretical sense) 0 if all the clients already made their requests for data and were just idling.
Do I understand this properly?
AngularFire is built on top of Firebase's regular JavaScript/Web SDK. The connection count is fundamentally the same between them: if a 100 users are using your application at the same time and you are synchronizing data for each of them, you will have 100 concurrent connections at that time.
The statement that one concurrent connection is the equivalent of about 1400 visits per month is based on the extensive experience that the Firebase people have with how long the average connection lasts. As Andrew Lee stated in this answer: most developers vastly over-estimate the number of concurrent connections they will have.
As said: AngularFire fundamentally behaves the same as Firebase's JavaScript API (because it is built on top of that). Both libraries keep an open connection for a user, so that they can synchronize any changes that occur between the connected users. You can manually drop such a connection by calling goOffLine and then re-instate it with goOnline. Whether that is a good approach is largely dependent on the type of application you're building.
Two examples:
There recently was someone who was building a word game. He used Firebase to store the final score for each game. In his case explicitly managing the connections makes sense, because the connection is only needed for a relatively short time when compared to the time the application is active.
The "hello world" for Firebase programming is a chat application. In such an application it doesn't make a lot of sense to manage the connections yourself. So briefly connect every 15 seconds and then disconnect again. If you do this, you're essentially reverting to polling for updates. Doing so will lose you one of the bigger benefits of using Firebase: it automatically synchronizes data to connected clients.
So only you can decide whether explicit connection management is best for you application. I'd recommend starting without it (it's simpler) and first testing your application on a smaller scale to see how actual usage holds up to your expectation.

How to Design a Database Monitoring Application

I'm designing a database monitoring application. Basically, the database will be hosted in the cloud and record-level access to it will be provided via custom written clients for Windows, iOS, Android etc. The basic scenario can be implemented via web services (ASP.NET WebAPI). For example, the client will make a GET request to the web service to fetch an entry. However, one of the requirements is that the client should automatically refresh UI, in case another user (using a different instance of the client) updates the same record AND the auto-refresh needs to happen under a second of record being updated - so that info is always up-to-date.
Polling could be an option but the active clients could number in hundreds of thousands, so I'm looking for a more robust and lightweight (on server) solution. I'm versed in .NET and C++/Windows and I could roll-out a complete solution in C++/Windows using IO Completion Ports but feel like that would be an overkill and require too much development time. Looked into ASP.NET WebAPI but not being able to send out notifications is its limitation. Are there any frameworks/technologies in Windows ecosystem that can address this scenario and scale easily as well? Any good options outside windows ecosystem e.g. node.js?
You did not specify a database that can be used so if you are able to use MSSQL Server, you may want to lookup SQL Dependency feature. IF configured and used correctly, you will be notified if there are any changes in the database.
Pair this with SignalR or any real-time front-end framework of your choice and you'll have real-time updates as you described.
One catch though is that SQL Dependency only tells you that something changed. Whatever it was, you are responsible to track which record it is. That adds an extra layer of difficulty but is much better than polling.
You may want to search through the sqldependency tag here at SO to go from here to where you want your app to be.
My first thought was to have webservice call that "stays alive" or the html5 protocol called WebSockets. You can maintain lots of connections but hundreds of thousands seems too large. Therefore the webservice needs to have a way to contact the clients with stateless connections. So build a webservice in the client that the webservices server can communicate with. This may be an issue due to firewall issues.
If firewalls are not an issue then you may not need a webservice in the client. You can instead implement a server socket on the client.
For mobile clients, if implementing a server socket is not a possibility then use push notifications. Perhaps look at https://stackoverflow.com/a/6676586/4350148 for a similar issue.
Finally you may want to consider a content delivery network.
One last point is that hopefully you don't need to contact all 100000 users within 1 second. I am assuming that with so many users you have quite a few servers.
Take a look at Maximum concurrent Socket.IO connections regarding the max number of open websocket connections;
Also consider whether your estimate of on the order of 100000 of simultaneous users is accurate.

how to add simple security and measure performance of web service

So I'm making a app for a bank, but it doesnt manage very important data. I have two problems, it will run over a VERY large LAN network protected by all kinds of security(antivirus and firewalls) and the bandwidth in certain regions is as low as 56kbps.(Its a desktop app with a web server backend connected by web services)
From the security point of view all I want is to prevent someone from executing the web services from some other source or app results in change in the database . I'm thinking of each desktop app installed with a install code, this will be hashed and required as a parameter for every function call and will act as an authentication ticket? Is this good enough? Are they better SIMPLER means?
For performance, how do I measure or know if the web service will send and receive data at a decent rate?
Thanks
Gideon
Assuming you are on a windows domain. You could configure the server to use windows authentication and restrict the users which can access the web service.
For performance measuring - asp.net will show you a sample request and response if you hit the web service from a browser, you can work out the site of a message and use the bandwidth to calculate how long it should take. You could also call the web service and use the stopwatch class to measure the time it takes.
I would prefer assigning usernames and passwords. Either way, the user can disclose their code to someone else. And either the user or a recipient can access the app using other programs (there's no way to prevent someone extracting an install code). But if you assign usernames, they are more likely to take personal responsibility for what happens using the authorization.

Resources