Finding suspicious logins using Amazon Neptune graph database

I have the below structure of a graph loaded in Amazon Neptune.
In words, the graph can be described as: a device is accessed from a location, and a device logs into an account.
Problem statement: I want to find suspicious logins to an account.
How? Ideally, one account should only be accessed from one device. If it is accessed from more than one device, then one of the logins is suspicious. But this becomes false when the customer gets a new device. To increase accuracy, we can say that if the device used for the later login is at least 1 km away from the device used for the earlier login, then the login is suspicious.
I am trying to write the query for this in the Gremlin query language but haven't gotten very far, and I am looking for a query to find accounts and devices where the account was accessed from more than one device and the devices are at least 1 km away from each other.
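Since the schema diagram isn't shown, here is a minimal sketch under assumed labels and properties: device vertices connect to accounts via a logs_into edge and to locations via an accessed_from edge, and location vertices carry lat/lon properties. Neptune's Gremlin implementation has no built-in geo-distance step, so this pulls the candidate accounts with Gremlin (via the gremlin JavaScript driver) and applies the 1 km Haversine check on the client side.

```typescript
// Sketch only: the labels ('account', 'device', 'location', 'logs_into',
// 'accessed_from') and the 'lat'/'lon' properties are assumptions, since
// the original graph schema is not shown.
import gremlin from 'gremlin';

const { DriverRemoteConnection } = gremlin.driver;
const { traversal } = gremlin.process.AnonymousTraversalSource;
const __ = gremlin.process.statics;
const P = gremlin.process.P;

const g = traversal().withRemote(
  new DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin')
);

// Haversine distance in kilometres, computed client-side because Neptune's
// Gremlin implementation has no built-in geo-distance step.
function distanceKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 6371 * 2 * Math.asin(Math.sqrt(a));
}

async function findSuspiciousLogins() {
  // Step 1: accounts logged into by more than one device, with each device's location.
  const rows = await g
    .V()
    .hasLabel('account')
    .where(__.in_('logs_into').dedup().count().is(P.gt(1)))
    .project('account', 'devices')
    .by(__.id())
    .by(
      __.in_('logs_into')
        .project('device', 'lat', 'lon')
        .by(__.id())
        .by(__.out('accessed_from').values('lat'))
        .by(__.out('accessed_from').values('lon'))
        .fold()
    )
    .toList();

  // Step 2: flag the account if any two of its devices are at least 1 km apart.
  for (const row of rows as Array<Map<string, any>>) {
    const devices: Array<Map<string, any>> = row.get('devices');
    for (let i = 0; i < devices.length; i++) {
      for (let j = i + 1; j < devices.length; j++) {
        const a = devices[i], b = devices[j];
        if (distanceKm(a.get('lat'), a.get('lon'), b.get('lat'), b.get('lon')) >= 1) {
          console.log('suspicious account', row.get('account'), a.get('device'), b.get('device'));
        }
      }
    }
  }
}
```

To compare the earlier and later login (the "new device" case), you would also need a timestamp property on the login edges, which this sketch omits.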

Related

Why does Firebase Realtime Database say I have more connections than I actually do?

I am utilizing Firebase's Realtime Database to host a chatroom-type thing. Ever since I started, the analytics have only ever shown the maximum number of connections I have had. Here is an example of what the analytics look like:
Even when I click on the information "?" next to connections it says, "The number of simultaneous realtime connections to your database". You can even see in the graph that the number has gone down but the number on the left stays at 100/100.
Does anyone know why this is or how I can see the actual number of realtime connections?

How would you identify if a visitor to one of your sites is the same person who visited another site of yours before (different domain)?

My question is more of a conceptual one, but in my specific case I am using Google Analytics 4. If the question is unclear, here it is in scenario form: some guy visits my site x.com after a Google search. He closes the tab, does another Google search, and arrives at my other site y.com. How do I know it's the same person? I don't think there's anything I can do with User IDs in this situation. How would I solve this?
This isn't without fault, but if you are implementing it via Google Tag Manager you have more control over the data being sent, especially if you are transporting the data via a Google Tag Manager server-side container.
You would use a single server (but possibly different containers), or use BigQuery, and either use the templateDataStorage API call or the BigQuery API call.
Essentially, the first time you see a Google CID, an IP address, or a combination of user agent and IP address, you would store it in the server or in a BigQuery table as a key and create a random associated value next to it.
Each time, across all your sites, you would check whether the IP address, CID, or user agent/IP combination exists in the server or in the BigQuery table; if it does, output the associated random value as a custom dimension, and if not, create one.
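To make the lookup-or-create step concrete, here is a minimal sketch. The KeyValueStore interface and the buildKey helper are stand-ins introduced for illustration; in a real server-side container the store would be backed by templateDataStorage or a BigQuery table, and the key would be built from the incoming event's CID, IP address, and user agent.

```typescript
import { randomUUID } from 'node:crypto';

// Stand-in for templateDataStorage or a BigQuery table in a GTM server-side container.
interface KeyValueStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

// Build the lookup key from whatever identifiers the incoming event carries.
function buildKey(cid?: string, ip?: string, userAgent?: string): string {
  return cid ?? `${userAgent ?? ''}|${ip ?? ''}`;
}

// Return the cross-site identifier for this visitor, creating one on first sight.
async function resolveVisitorId(store: KeyValueStore, key: string): Promise<string> {
  const existing = await store.get(key);
  if (existing !== undefined) {
    return existing; // seen before on some site: reuse the stored random value
  }
  const visitorId = randomUUID(); // the "random associated value"
  await store.set(key, visitorId);
  return visitorId; // send this to GA4 as a custom dimension
}
```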
Actually you probably wouldn't.
Presumably you could try fingerprinting, but depending on your legislation that might not be quite legal, and it tends to work a lot better in a lab than in real life. Also, browsers are starting to implement anti-fingerprinting measures, such as trimming the user agent and denying access to browser properties such as installed plugins.
I have heard of experimental approaches to recognize users via usage patterns, e.g. how they move their mouse. I am not aware of any actual product that uses this, and I am not convinced it is a useful (or even legal) approach.
But in general, when it comes to cross-domain detection for unrelated visits (moving from domain to domain works via link decorators, and even that is affected by browser protections), you have the combined power of the browser vendors against you, who try to make this harder, either out of genuine concern for privacy or to establish themselves as the single gatekeeper for user identity. For example, Google has a huge user base that is almost constantly logged in to Google accounts or Android smartphones, which helps it identify users all over the web.

Firebase Realtime Database - Scaling above 100,000 concurrent connections

The application I'm currently working on needs real-time communication that is scalable. We have been looking into and have tried out the Firebase Realtime Database and Firestore. The Realtime Database seems more mature and better tested, while Firestore is still in beta, which is why we are leaning towards the Realtime Database.
We are, however, worried about its scaling capabilities in our context. Our queries will mainly be geospatial, based on the user's location. According to "Firebase simultaneous realtime connections to my database" and https://firebase.google.com/pricing/#faq-simultaneous, the maximum number of concurrent users is 100,000, which will be too low for our needs.
According to their documentation, it seems like database sharding is the way to scale beyond 100,000 concurrent users: https://firebase.google.com/docs/database/usage/sharding. Since our queries are based on the user's location, we could group the data into regions, e.g. US West, US Central, and US East, and have a database instance for each of those three regions.
While this method may work, it seems very cumbersome to set it up. We would probably need to have a service which the user initially connects to in order to be redirected to the correct database instance that fits the region which the user is in. Additionally, it should handle the case where a user moves into another region, and should therefore be redirected to another database instance containing the data for that specific region.
Another complex task would be to distribute the data into the correct database instances.
Is there a simpler approach to scaling beyond 100,000 users, or is it possible to increase the number of concurrent connections for a single Firebase Realtime Database?
To me it seems almost a waste to use Firebase if it requires you to do so much "load balancing" yourself.
The 100K concurrent connections limit is a hard cap on the Firebase Realtime Database.
The approach you describe with a two-step connect is quite idiomatic. The first step is usually quite simple. In fact, for many apps it is part of their authentication flow, or based on the outcome of it. For example, many apps base the user's shard on a hash of their UID.
In your case, you could inject the user's region into their token as a custom claim when they register. You'd then get that claim when they sign in and can redirect them to their shard. You could also persist the shard info in the client when they first connect, so that you only have to determine it once for each client/device.
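A minimal sketch of that flow, assuming a Node backend with firebase-admin and the modular (v9+) web SDK on the client; the region names and shard URLs are made up for illustration.

```typescript
// --- Server (firebase-admin): set the region as a custom claim when the user registers ---
import { getAuth } from 'firebase-admin/auth';

async function assignRegion(uid: string, region: 'us-west' | 'us-central' | 'us-east') {
  // The claim is included in the ID token from the next sign-in onwards.
  await getAuth().setCustomUserClaims(uid, { region });
}

// --- Client (firebase v9 modular SDK): read the claim and connect to the matching shard ---
import { initializeApp } from 'firebase/app';
import { getAuth as getClientAuth, getIdTokenResult } from 'firebase/auth';
import { getDatabase } from 'firebase/database';

// Hypothetical shard URLs: one Realtime Database instance per region.
const SHARDS: Record<string, string> = {
  'us-west': 'https://myapp-us-west.firebaseio.com',
  'us-central': 'https://myapp-us-central.firebaseio.com',
  'us-east': 'https://myapp-us-east.firebaseio.com',
};

async function connectToShard() {
  const app = initializeApp({ /* your web app config */ });
  const user = getClientAuth(app).currentUser;
  if (!user) throw new Error('Sign in first');

  const { claims } = await getIdTokenResult(user);
  const region = (claims.region as string) ?? 'us-central';

  // Persist the shard choice locally so the lookup happens only once per device.
  localStorage.setItem('rtdb-region', region);

  return getDatabase(app, SHARDS[region]);
}
```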
Is there a simpler approach to scaling beyond 100,000 users, or is it possible to increase the number of concurrent connections for a single Firebase Realtime Database?
Yes: use the Firestore database.
It scales completely automatically. Currently, the scaling limits are:
Around 1 million concurrent connections and 10,000 writes/second (they plan to increase these limits in the future) (source)
Maximum write rate to a single document is 1 per second (source)
Firestore has been officially out of beta and in General Availability since 31 January 2019 (source)

iTunes Search API not returning keyword results in the same order as the iOS App Store

I'm trying to do some SEO, and I want to track where an application ranks for keyword searches. Using the official search API, I've come up with the following query:
https://itunes.apple.com/search?media=software&term=sql+server&limit=&country=us&limit=200
This searches the US App Store for the term sql server. The app I am looking for currently shows up in this list at around position 20. If I search from my phone, the app is closer to position 30 (other search terms perform even worse). I tried to use Wireshark to capture the search from my phone to see whether it uses different endpoints, but was unable to capture the traffic due to SSL.
Does anyone know of a way to scrape iOS App Store search results in the proper order?
Apple is under no obligation to return search results in any particular order, and is likely to change them depending on the client or which search cluster you hit.
Seeing what your phone is sending to the App Store is very difficult, however, as Apple takes extensive measures to ensure that communications aren't being read. The last time I tried, it required Burp Suite, a jailbroken phone, an app to disable SSL pinning, and manually restarting the App Store on the phone, which would occasionally crash it.
Set your user agent to replicate an iPhone device browsing the App Store.
Try this one... This is the most recent App Store version, emulating an iPhone X with iOS 11.3.1:
AppStore/3.0 iOS/11.3.1 model/iPhone10,6 hwp/t8015 build/15E302 (6; dt:162)
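As a rough sketch, you could send that user agent from a Node script when calling the search endpoint; whether Apple then returns device-ordered results for this endpoint is not guaranteed.

```typescript
// Query the iTunes Search API while presenting the App Store user-agent string
// quoted above. Apple may still vary the ordering per client or search cluster.
const APP_STORE_UA =
  'AppStore/3.0 iOS/11.3.1 model/iPhone10,6 hwp/t8015 build/15E302 (6; dt:162)';

async function searchAppStore(term: string) {
  const url = new URL('https://itunes.apple.com/search');
  url.searchParams.set('media', 'software');
  url.searchParams.set('term', term);
  url.searchParams.set('country', 'us');
  url.searchParams.set('limit', '200');

  const res = await fetch(url, { headers: { 'User-Agent': APP_STORE_UA } });
  const data = await res.json();
  return data.results as Array<{ trackId: number; trackName: string }>;
}

// Usage: print the ranked position of each result for the search term.
searchAppStore('sql server').then((apps) =>
  apps.forEach((app, i) => console.log(`${i + 1}. ${app.trackName}`))
);
```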

Understanding How to Store Web Push Endpoints

I'm trying to get started implementing Web Push in one of my apps. In the examples I have found, the client's endpoint URL is generally stored in memory with a comment saying something like:
In production you would store this in your database...
Since only registered users of my app can/will get push notifications, my plan was to store the endpoint URL in the user's meta data in my database. So far, so good.
The problem comes when I want to allow the same user to receive notifications on multiple devices. In theory, I will just add a new endpoint to the database for each device the user subscribes with. However, in testing I have noticed that endpoints change with each subscription/unsubscription on the same device. So, if a user subscribes/unsubscribes several times in a row on the same device, I wind up with several endpoints saved for that user (all but one of which are bad).
From what I have read, there is no reliable way to be notified when a user unsubscribes or an endpoint is otherwise invalidated. So, how can I tell if I should remove an old endpoint before adding a new one?
What's to stop a user from effectively mounting a denial of service attack by filling my db with endpoints through repeated subscription/unsubscription?
That's more meant as a joke (I can obviously limit the total endpoints for a given user), but the problem I see is that when it comes time to send a notification, I will blast notification services with hundreds of notifications to invalid endpoints.
I want the subscribe logic on my server to be:
Check if we already have an endpoint saved for this user/device combo
If not, add it; if yes, update it
The problem is that I can't figure out how to reliably do #1.
I will just add a new endpoint to the database for each device the user subscribes with
The best approach is to have a table like this:
endpoint | user_id
add a unique constraint (or a primary key) on the endpoint: you don't want to associate the same browser with multiple users, because it's a mess (if an endpoint is already present but has a different user_id, just update the user_id associated with it)
user_id is a foreign key that points to your users table
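A sketch of that table and the upsert, assuming PostgreSQL with the node-postgres (pg) client; the push_subscriptions table name is a placeholder, while the endpoint and user_id columns follow the answer above.

```typescript
import { Pool } from 'pg';

const pool = new Pool(); // connection settings come from the usual PG* environment variables

// Table sketch (run once):
//   CREATE TABLE push_subscriptions (
//     endpoint TEXT PRIMARY KEY,
//     user_id  INTEGER NOT NULL REFERENCES users(id)
//   );

// Insert the endpoint, or re-point it to the new user if it already exists.
async function saveSubscription(endpoint: string, userId: number): Promise<void> {
  await pool.query(
    `INSERT INTO push_subscriptions (endpoint, user_id)
     VALUES ($1, $2)
     ON CONFLICT (endpoint) DO UPDATE SET user_id = EXCLUDED.user_id`,
    [endpoint, userId]
  );
}
```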
if a user subscribes/unsubscribes several times in a row on the same device, I wind up with several endpoints saved for that user (all but one of which are bad).
Yes, unfortunately the push API has a wild unsubscription mechanism and you have to deal with it.
The endpoints can expire or can be invalid (or even malicious, like android.chromlum.info). You need to detect failures (using the HTTP status code, timeouts, etc.) when you try to send the push message from your application server. Then, for some kind of failures (permanent failures, like expiration) you need to delete the endpoint.
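For example, with the web-push library in Node, permanent failures typically surface as 404 or 410 status codes and can trigger deletion of the stored endpoint (the deleteSubscription helper below is a hypothetical database call):

```typescript
import * as webpush from 'web-push';

webpush.setVapidDetails(
  'mailto:you@example.com',
  process.env.VAPID_PUBLIC_KEY!,
  process.env.VAPID_PRIVATE_KEY!
);

// Hypothetical helper: removes the endpoint row from your subscriptions table.
declare function deleteSubscription(endpoint: string): Promise<void>;

// subscription is the object saved when the browser subscribed (endpoint plus p256dh/auth keys).
async function sendPush(subscription: webpush.PushSubscription, payload: string): Promise<void> {
  try {
    await webpush.sendNotification(subscription, payload);
  } catch (err: any) {
    if (err.statusCode === 404 || err.statusCode === 410) {
      // Gone or expired: a permanent failure, so drop the endpoint.
      await deleteSubscription(subscription.endpoint);
    } else {
      // Timeouts, 5xx, etc. may be transient; keep the endpoint and retry later.
      console.error('Push failed with status', err.statusCode);
    }
  }
}
```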
What's to stop a user from effectively mounting a denial of service attack by filling my db with endpoints through repeated subscription/unsubscription?
As I described above, you need to properly delete the invalid endpoints, once you realize that they are expired or invalid. Basically they will produce at most one invalid request. Moreover, if you have high throughput, it takes only a few seconds for your server to make requests for thousands of endpoints.
My suggestions are based on a lot of experiments and thinking done when I was developing Pushpad.
Another way is to have a keep-alive field on your server and have your service worker update it whenever it receives a push notification. Then regularly purge endpoints that haven't checked in recently.
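A minimal service-worker sketch of that idea; the /push-keepalive route is hypothetical and would simply bump the keep-alive timestamp for the reporting endpoint on the server.

```typescript
// service-worker.ts: report back every time a push is actually received,
// so the server can purge endpoints that have gone silent.
declare const self: ServiceWorkerGlobalScope;

self.addEventListener('push', (event: PushEvent) => {
  event.waitUntil(
    (async () => {
      // Show the notification as usual.
      await self.registration.showNotification('New message', {
        body: event.data?.text() ?? '',
      });

      // Hypothetical endpoint that updates the keep-alive timestamp for this subscription.
      const subscription = await self.registration.pushManager.getSubscription();
      if (subscription) {
        await fetch('/push-keepalive', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ endpoint: subscription.endpoint }),
        });
      }
    })()
  );
});
```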
