Is there a way to prevent content caching or scraping from an API? - web-scraping

Imagine the following situation. I have an API, and a developer builds an application that retrieves new content from it on a daily basis. She stores this content and serves it to all the instances of an app she developed, so that these apps do not have to call the API directly.
Is there a way to prevent this and force the apps (and therefore the end users) to use the API directly, rather than only her server application?
I found many questions about how to cache API data but not how to prevent that. I am fairly new to this, so maybe I am overlooking something or maybe it is not possible to prevent this.
Thank you in advance!

Assuming you are using Apigee for API management, you have some options. First, consider the options available to you contractually: if this is that sort of business relationship, you can impose certain API client behavior on a business partner through a contract.
Separate from the legal side of things, remember that you control your API and the credentials you issue to your API clients. In practice, though, you cannot control what a client developer does with those credentials: she could promise to embed them in the mobile apps' API client, then change her mind, use them centrally, and design her mobile client to call into her central cache instead. If you really insist that only mobile app clients, and not a hub/cache server, should be calling your API, then you could consider applying constraint policies on your API (within the Apigee proxy, such as Access Control). For instance, you could blacklist your partner's hub/cache server IP address, although that is weak security at best. Or you could apply a constraint that only clients with certain identifying User-Agent strings (mobile OS, client) are allowed to connect to your API. Or use GeoIP filtering to allow only clients from certain regions, if that applies to your use case.
Finally, depending on the data model, you might be able to rate-limit such that a bulk cache becomes impractical: if your edge-client use case is to fetch a single record, but a cache would have to hold thousands of records, then you could impose a per-client rate limit (Quota policy) that is no bother to individual mobile clients but makes the work of a hub/cache server untenable.
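To make the constraint idea concrete outside of Apigee's policy XML, here is a minimal ASP.NET Core middleware sketch of the User-Agent and per-client quota checks described above. The header names, the allowed User-Agent substring, and the quota value are made-up placeholders, and the in-memory counter (with no reset logic) is purely illustrative; a real deployment would rely on Apigee's Access Control and Quota policies or a distributed store.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

// Hypothetical middleware sketching the same constraints an Apigee
// Access Control / Quota policy would enforce: reject unexpected
// User-Agents and apply a small per-client quota.
public class ClientConstraintMiddleware
{
    private readonly RequestDelegate _next;

    // naive in-memory counters keyed by API key; no reset logic, illustration only
    private static readonly ConcurrentDictionary<string, int> RequestCounts = new();
    private const int QuotaPerClient = 100;   // placeholder quota

    public ClientConstraintMiddleware(RequestDelegate next) => _next = next;

    public async Task InvokeAsync(HttpContext context)
    {
        // only the mobile client's User-Agent is allowed (a weak check, easily spoofed)
        var userAgent = context.Request.Headers["User-Agent"].ToString();
        if (!userAgent.Contains("MyMobileApp"))
        {
            context.Response.StatusCode = StatusCodes.Status403Forbidden;
            return;
        }

        // a bulk hub/cache server would blow through this quota almost immediately
        var apiKey = context.Request.Headers["X-Api-Key"].ToString();
        var count = RequestCounts.AddOrUpdate(apiKey, 1, (_, c) => c + 1);
        if (count > QuotaPerClient)
        {
            context.Response.StatusCode = StatusCodes.Status429TooManyRequests;
            return;
        }

        await _next(context);
    }
}
```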

Add users for ASP.NET Core from internal website

Sorry, no code here, because I am looking for a better idea, or confirmation that I am on the right track.
I have two websites; let's call them A and B.
A is a website exposed to the internet, and only users with a valid account can access it.
B is an internal (intranet) website using Windows authentication against Active Directory. I want application B (intranet) to create users for application A.
Application A uses the built-in ASP.NET JWT token authentication.
My idea is to expose an API on the internet-facing website (A) and let B access this API. I can use CORS to make sure only B has access to the endpoint, but I am not sure whether that is good enough protection. A third-party company will run security penetration tests against us, so this approach might fail those tests.
Or
I could use Entity Framework to update the AspNetUsers table directly. No idea whether this is feasible or the right way of doing things.
Any other solution?
In my opinion, don't expose your internal operations through external solutions like public APIs.
Just make the database accessible to B. That way, server administration is the only security concern and nothing about how you work is exposed. In addition, it does not matter how each application implements user authentication (Windows Authentication or JWT), and each keeps its own independent infrastructure.
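If you do share the database, note that the tricky part is writing user rows that A's ASP.NET Identity sign-in will actually accept (normalized fields, a security stamp, and a compatible password hash). A minimal sketch, assuming B references A's Identity/EF Core model; the context class name, connection string, and credentials below are placeholders:

```csharp
using System;
using Microsoft.AspNetCore.Identity;
using Microsoft.AspNetCore.Identity.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore;

// A's Identity context; in practice this type would be shared from A's codebase.
public class ApplicationDbContext : IdentityDbContext<IdentityUser>
{
    public ApplicationDbContext(DbContextOptions<ApplicationDbContext> options) : base(options) { }
}

public static class UserProvisioner
{
    // Called from B (intranet) to add a login that A's Identity sign-in will accept.
    public static void CreateUser(string connectionString, string email, string password)
    {
        var options = new DbContextOptionsBuilder<ApplicationDbContext>()
            .UseSqlServer(connectionString)
            .Options;

        var user = new IdentityUser
        {
            UserName = email,
            Email = email,
            NormalizedUserName = email.ToUpperInvariant(),
            NormalizedEmail = email.ToUpperInvariant(),
            SecurityStamp = Guid.NewGuid().ToString()
        };
        // hash the password the same way A does, so A's sign-in manager can verify it
        user.PasswordHash = new PasswordHasher<IdentityUser>().HashPassword(user, password);

        using var db = new ApplicationDbContext(options);
        db.Users.Add(user);
        db.SaveChanges();
    }
}
```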
There are multiple solutions to this one problem. In the end it really depends on your specific criteria.
You could go with:
The B (intranet) website reaching into the database and creating users as needed.
The A (internet) website exposing an API with the necessary endpoint to create users.
The A (internet) website running a data migration every now and then to insert users.
They all come with their ups and downs; I'll try to break them down for you.
API solution
Ups:
Single responsibility: only one piece of code touches this database, which makes it easier to mitigate side effects.
It is "future proof": you could easily have more services use this API.
Downs:
Increased attack surface: the API is public, so third parties may try to play with it.
The API must be maintained as the database model changes (one more piece to maintain).
Not the fastest solution to implement.
Database direct access
Ups:
Attack surface minimal.
Very quick to develop
Downs:
Database model has to be maintained twice.
Migration and deployment have to be coordinated, which is hard to maintain.
Makes the system more error-prone.
Migration on release
Ups:
Cheapest to develop
Highest performance on inserts
Downs:
Not flexible
Very slow for user
Many deployments
Manual work (will be costly over time)
In my opinion, go for the API and secure access to it with OAuth. If OAuth is too time-consuming to put in place, you can try some simpler auth schemes.
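If you go the API route, here is a minimal sketch of the endpoint on A that B would call, assuming ASP.NET Core Identity; the route, DTO, and shared-key check are illustrative placeholders rather than a production auth scheme (prefer OAuth client credentials or at least IP allow-listing). Also keep in mind that CORS is enforced only by browsers, so on its own it is not an access-control mechanism for server-to-server calls like B's.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Identity;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/users")]
public class UsersController : ControllerBase
{
    private readonly UserManager<IdentityUser> _userManager;

    public UsersController(UserManager<IdentityUser> userManager) => _userManager = userManager;

    public record CreateUserRequest(string Email, string Password);

    // POST api/users - called by site B to provision a login on site A
    [HttpPost]
    public async Task<IActionResult> Create([FromBody] CreateUserRequest request,
                                            [FromHeader(Name = "X-Api-Key")] string apiKey)
    {
        // placeholder check; replace with real authentication (OAuth, client certificates, ...)
        if (apiKey != "shared-secret-known-only-to-B")
            return Unauthorized();

        var user = new IdentityUser { UserName = request.Email, Email = request.Email };
        var result = await _userManager.CreateAsync(user, request.Password);

        return result.Succeeded ? Ok() : BadRequest(result.Errors);
    }
}
```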

How does it work to implement an API for Payments in separate ends of a project?

Alright, a friend and I are developing an app: I'm building the back-end and he is building the front-end. The project is separated into two repositories, the front-end and the back-end, and we need to integrate a payment API.
Since we're following the REST API concept, the two ends communicate through JSON data.
My question is, when we're making the connection to the payment API, who needs to execute that request? The front-end or the back-end?
I know it's a silly question, but first timer here.
The back-end will process the payment. I'm not sure which payment API you're going to use, and the implementation will vary depending on the one you go with, but the actual payment processing happens in the back-end for sure.
It completely depends on the API.
In some cases, a payment can be accomplished via a secure web service call, which would be issued by your back-end REST service. The front end will still need to collect data (e.g. payment amount and card number) and may also need to collect additional information to satisfy the API (e.g. IP address or browser signature, for risk-management purposes).
In other cases, the payment is sent directly to the service from the browser. The role of your application would be to render an iFrame housing a page that is reached via SSO. The back end may need to call a service to retrieve an SSO token, or may have to compute an SSO token using a shared key.
You should probably refer to the payment API's documentation. Providers often have very specific guidance which you must follow carefully in order to achieve payment card (PCI-DSS) compliance. There is nothing special about "payments" in general that would allow Stack Overflow users to guess anything about your particular API.
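As a rough illustration of the "back end issues the payment call" pattern: the front-end collects the card details (ideally tokenized by the provider's own JS library) and posts only a token plus an amount to your back-end, which then calls the provider's server-side API. The IPaymentGateway interface, route, and field names below are hypothetical stand-ins for whatever your chosen provider's SDK actually offers.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// Hypothetical abstraction over the payment provider's server-side SDK.
public interface IPaymentGateway
{
    Task<(bool Succeeded, string TransactionId, string Error)> ChargeAsync(
        string cardToken, decimal amount, string currency);
}

[ApiController]
[Route("api/payments")]
public class PaymentsController : ControllerBase
{
    private readonly IPaymentGateway _gateway;

    public PaymentsController(IPaymentGateway gateway) => _gateway = gateway;

    // what the front-end posts as JSON: a provider-issued token, never raw card data
    public record PaymentRequest(string CardToken, decimal Amount, string Currency);

    [HttpPost]
    public async Task<IActionResult> Charge([FromBody] PaymentRequest request)
    {
        // basic server-side validation before calling the provider
        if (request.Amount <= 0)
            return BadRequest("Amount must be positive.");

        var result = await _gateway.ChargeAsync(request.CardToken, request.Amount, request.Currency);

        return result.Succeeded
            ? Ok(new { result.TransactionId })
            : StatusCode(502, new { result.Error });
    }
}
```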

Sharing particular data between realms

I am about to start working on the back-end for a mobile app (initially iOS/Android, later also website) and I am thinking whether Realm could fulfill all my needs.
The basic idea is that there are two types of users - customers and service-providers. The customers send requests to the server once in a while and are subscribed (real-time) for any event that might occur in relation to this request in the future. Each service-provider is listening for specific requests from all customers and is the one who is going to trigger various events (send data) for each of those requests.
From the Realm docs, it is obvious that the real-time data sync is not going to be a problem. The thing I am concerned about is how to model the scenario (customer/service-provider) in the Realm 'world'. Based on what I read, it is preferred to have one realm per user. Therefore, I suppose the user will register and will be given a realm. Then whenever he makes a request, it is going to be stored in his realm. Now the question is how to model the service-provider. There are going to be various service-providers each responding (triggering various kinds of events up to one hour after request) to different kinds of requests. (Each user can send any request and therefore be served by any service-provider.)
I read a bit about Realm supporting data sharing among different realms, which could be a partial solution to this problem; however, I was not able to find out whether this 'sharing' can be limited to particular requests. (Meaning each service-provider would get only the requests intended for him.)
My question is whether this scenario is doable using Realm.
This sounds like a perfect fit for Realm's server-side event-handling. Put simply, Realm offers the ability through our Node SDK to listen for changes across Realms on the server.
So in your example, where each mobile user would have their own Realm, the URL for this would be /~/myRealm in which the tilde represents the Realm user ID. The Node SDK event handling API allows you to register a JS function that will run in response to changes represented by a Regex pattern for Realm URLs. In this case you could use: ^/([0-9a-f]+)/myRealm so that any time any user's myRealm updated, the server could perform some logic.
In this manner, the server via the Node SDK is really a "super-user" or service-provider as you describe. When an event fires, the JS function that runs is provided the Realm that was updated and a list of indexes pertaining to the objects in the Realm that were inserted, deleted, or modified. You can then perform any logic in JS, such as using the changed data to call out to another API or opening the Realm in question or any other and writing changes which will get pushed back out to the respective clients.
The full server-side event handling is part of Realm Professional Edition, but we recently released another way to interact with this called Realm Functions. This provides the ability, through the server's dashboard, to create the same JS functions that will run in response to changes across Realms. The Developer Edition supports 3 functions, so you can try it out immediately!

How to Design a Database Monitoring Application

I'm designing a database monitoring application. Basically, the database will be hosted in the cloud and record-level access to it will be provided via custom-written clients for Windows, iOS, Android etc. The basic scenario can be implemented via web services (ASP.NET WebAPI). For example, the client will make a GET request to the web service to fetch an entry. However, one of the requirements is that the client should automatically refresh its UI when another user (using a different instance of the client) updates the same record, AND the auto-refresh needs to happen within a second of the record being updated, so that the info is always up to date.
Polling could be an option, but the active clients could number in the hundreds of thousands, so I'm looking for a more robust solution that is lightweight on the server. I'm versed in .NET and C++/Windows and could roll out a complete solution in C++/Windows using I/O Completion Ports, but that feels like overkill and would require too much development time. I looked into ASP.NET WebAPI, but its limitation is that it cannot push out notifications. Are there any frameworks/technologies in the Windows ecosystem that can address this scenario and also scale easily? Any good options outside the Windows ecosystem, e.g. Node.js?
You did not specify which database you can use, so if you are able to use MS SQL Server, you may want to look up the SqlDependency feature. If configured and used correctly, it will notify you of any changes in the database.
Pair this with SignalR or any real-time front-end framework of your choice and you'll have real-time updates as you described.
One catch, though, is that SqlDependency only tells you that something changed. Whatever it was, you are responsible for tracking which record it was. That adds an extra layer of difficulty, but it is much better than polling.
You may want to search through the sqldependency tag here at SO to go from here to where you want your app to be.
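A rough sketch of wiring SqlDependency to a classic (pre-Core) SignalR hub, assuming a table dbo.Records and a hub named RecordsHub, both of which are placeholder names. Note that the monitored query must follow query-notification rules (explicit column list, two-part table names, no SELECT *) and that each subscription fires only once, so it is re-registered inside the change handler.

```csharp
using System.Data.SqlClient;
using Microsoft.AspNet.SignalR;

public class RecordsHub : Hub { }   // clients connect to this hub for refresh notifications

public class RecordChangeNotifier
{
    private readonly string _connectionString;

    public RecordChangeNotifier(string connectionString)
    {
        _connectionString = connectionString;
        SqlDependency.Start(_connectionString);   // starts the notification listener for this connection string
        RegisterDependency();
    }

    private void RegisterDependency()
    {
        using (var connection = new SqlConnection(_connectionString))
        // query-notification rules: explicit columns, two-part table name, no SELECT *
        using (var command = new SqlCommand("SELECT Id, Value FROM dbo.Records", connection))
        {
            var dependency = new SqlDependency(command);
            dependency.OnChange += OnChange;

            connection.Open();
            using (command.ExecuteReader()) { }   // executing the command is what registers the subscription
        }
    }

    private void OnChange(object sender, SqlNotificationEventArgs e)
    {
        // SqlDependency only says "something changed"; figuring out which record is up to you
        var hub = GlobalHost.ConnectionManager.GetHubContext<RecordsHub>();
        hub.Clients.All.recordsChanged();         // dynamic call handled by the JS client

        RegisterDependency();                     // subscriptions are one-shot, so re-register
    }
}
```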
My first thought was to have a web service call that "stays alive", or to use the HTML5 protocol called WebSockets. You can maintain lots of connections, but hundreds of thousands seems too large. The web service therefore needs a way to contact the clients over stateless connections, so you could build a web service into the client that the server can communicate with. This may be a problem because of firewalls, though.
If firewalls are not an issue, then you may not need a web service in the client; you can instead implement a server socket on the client.
For mobile clients, if implementing a server socket is not a possibility then use push notifications. Perhaps look at https://stackoverflow.com/a/6676586/4350148 for a similar issue.
Finally you may want to consider a content delivery network.
One last point is that hopefully you don't need to contact all 100,000 users within one second. I am assuming that with so many users you have quite a few servers.
Take a look at Maximum concurrent Socket.IO connections regarding the maximum number of open WebSocket connections.
Also consider whether your estimate of on the order of 100,000 simultaneous users is accurate.

Architecture For A Real-Time Data Feed And Website

I have been given access to a real time data feed which provides location information, and I would like to build a website around this, but I am a little unsure on what architecture to use to achieve my needs.
Unfortunately the feed I have access to will only allow a single connection per IP address, therefore building a website that talks directly to the feed is out - as each user would generate a new request, which would be rejected. It would also be desirable to perform some pre-processing on the data, so I guess I will need some kind of back end which retrieves the data, processes it, then makes it available to a website.
From a front-end connection perspective, web services sound like they may work, but would this also create a new connection to the feed for each user? I would also like the back-end connection to be persistent, so that data is retrieved and processed even when the site is not being visited; I believe IIS recycles web services and websites when they are idle?
I would like to keep the design fairly flexible - in future I will be adding some mobile clients, so the API needs to support remote connections.
The simple solution would have been to log all the processed data to a database, which could then be picked up by the website, but this loses the real-time aspect of the data. Ideally I would be looking to push the data to the website every time the data changes or new data is received.
What is the best way of achieving this, and what technologies are there out there that may assist here? Comet architecture sounds close to what I need, but that would require building a back end that can handle multiple web based queries at once, which seems like quite a task.
Ideally I would be looking for a C# / ASP.NET based solution with Javascript client side, although I guess this question is more based on architecture and concepts than technological implementations of these.
Thanks in advance for all advice!
Realtime Data Consumer
The simplest solution would seem to be having one component that is dedicated to reading the realtime feed. It could then publish the received data on to a queue (or multiple queues) for consumption by other components within your architecture.
This component (A) would be a standalone process, maybe a service.
Queue consumers
The queue(s) can be read by:
a component (B) dedicated to persisting data for future retrieval or querying. If the amount of data is large you could add more components that read from the persistence queue.
a component (C) that publishes the data directly to any connected subscribers. It could also do some processing, but if you are looking at doing large amounts of processing you may need multiple components that perform this task.
Realtime web technology components (D)
If you are using a .NET stack then it seems like SignalR is getting the most traction. You could also look at XSockets (there are more options in my realtime web tech guide; just search for '.NET').
You'll want to use SignalR to manage subscriptions and then to publish messages to registered clients (PubSub; this SO post seems relevant, and maybe you can ask for a bit more info there).
You could also look at offloading the PubSub component to a hosted service such as Pusher, who I work for. This will handle managing subscriptions and component C would just need to publish data to an appropriate channel. There are other options all listed in the realtime web tech guide.
All these components come with a JavaScript library.
Summary
Components:
A - .NET service - that publishes info to queue(s)
Queues - MSMQ, NServiceBus etc.
B - Could also be a simple .NET service that reads a queue.
C - this really depends on D since some realtime web technologies will be able to directly integrate. But it could also just be a simple .NET service that reads a queue.
D - Realtime web technology that offers a simple way of routing information to subscribers (PubSub).
If you provide any more info I'll update my answer.
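As a small sketch of what components C and D might look like together, assuming MSMQ as the queue and classic SignalR for fan-out; the queue path, hub name, and message shape are placeholders, and NServiceBus or another broker would change the receive loop but not the overall shape.

```csharp
using System.Messaging;              // MSMQ; another broker would change only the receive loop
using Microsoft.AspNet.SignalR;

public class LocationHub : Hub { }   // component D: browsers subscribe via the SignalR JS client

// Component C: reads processed feed data from the queue and pushes it to all subscribers.
public class QueueToSignalRPump
{
    private readonly MessageQueue _queue;

    public QueueToSignalRPump(string queuePath)   // e.g. @".\private$\locationFeed" (placeholder)
    {
        _queue = new MessageQueue(queuePath)
        {
            Formatter = new XmlMessageFormatter(new[] { typeof(string) })
        };
    }

    public void Run()
    {
        var hub = GlobalHost.ConnectionManager.GetHubContext<LocationHub>();
        while (true)
        {
            var message = _queue.Receive();            // blocks until component A publishes something
            var payload = (string)message.Body;        // pre-processed location data, e.g. JSON
            hub.Clients.All.locationUpdate(payload);   // fan out to every connected client
        }
    }
}
```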
A good solution to this would be something like http://rubyeventmachine.com/ or http://nodejs.org/ . It's not ASP.NET, but it can easily solve the issue of distributing real-time data to other users. Since user connections, subscriptions and broadcasting to channels are built into each, coding the rest becomes super simple. Your clients would just connect over standard TCP.
If you needed clients to poll for updates, then you would need a queue system to store info for the next request. That could be a simple array, or a more complicated queue system, depending on your requirements and number of users.
There may be solutions for .NET that I am not aware of that do the same thing, but those are the two I know of.
