Unreliable 401 errors with GCM services - push-notification

I implemented a GCM push notification service that runs on my computer (at least during development).
To do that, I basically format and send requests to https://android.googleapis.com/gcm/send.
For authentication, I obviously used a server key I generated on the Google Developers console.
I tested it and it works perfectly (the devices receive the push notification, and it's actually damn fast).
I send those push notifications several times a day, but yesterday I faced a situation where sending the request returned a 401 (Unauthorized, meaning authentication is required). My request was exactly the same as usual. I retried 4 times with the same result, then suddenly it worked again and returned a 200.
I'm at a loss trying to understand why this happens. According to this documentation, a 401 is only supposed to happen in these cases:
Authorization header missing or with invalid syntax.
Invalid project number sent as key.
Key valid but with GCM service disabled.
Request originated from a server not whitelisted in the Server Key IPs.
None of these conditions have changed between the different calls.
Is there some kind of threshold or quota that might explain this? Or is there a guideline regarding what to do when this happens?
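For reference, the request described above has roughly the following shape. This is a minimal sketch, assuming the JSON variant of the GCM HTTP API; buildGcmRequest and the placeholder key and token are hypothetical names used only for illustration. It shows the "key=" syntax that the Authorization header is expected to carry, which is the first condition in the list above.

```javascript
// Sketch only: builds (but does not send) the request described in the question.
// buildGcmRequest and its parameters are hypothetical names.
function buildGcmRequest(serverKey, registrationIds, data) {
  if (!serverKey) {
    throw new Error('A server key is required');
  }
  return {
    url: 'https://android.googleapis.com/gcm/send',
    method: 'POST',
    headers: {
      // The server key goes in the Authorization header with a literal "key=" prefix.
      Authorization: 'key=' + serverKey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ registration_ids: registrationIds, data: data }),
  };
}

// Placeholder values, not real credentials.
const request = buildGcmRequest('SERVER_KEY_PLACEHOLDER', ['DEVICE_TOKEN_PLACEHOLDER'], { message: 'hello' });
console.log(request.headers.Authorization);
```

Logging the exact headers of a failing request and of a later successful one is a simple way to confirm that nothing changed on the sending side.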

Related

How to get messages that were not delivered while Telegram Bot web-server was down?

I'm curious whether there is a way to avoid missing updates sent by the Telegram Bot API while the web server that accepts webhooks is down (because of a redeploy, failure, or maintenance).
When you use polling, the Telegram API sends messages starting from the last one retrieved, and no messages are skipped.
But what about webhooks? Should I use polling instead, or is there some special mechanism for this?
Telegram keeps incoming messages for 24 hours. If your webhook is down (e.g. while redeploying), the messages will be delivered once it is online again.
It works on Heroku, for example, when your dyno is down: as soon as it starts, the chatbot registers with Telegram again and receives the messages still available in the queue.
There are two mutually exclusive ways of receiving updates for your bot — the getUpdates method on one hand and Webhooks on the other. Incoming updates are stored on the server until the bot receives them either way, but they will not be kept longer than 24 hours.
See Telegram documentation for more details.
I had the same problem recently and resolved it like this: when the server starts, save the start time to a variable, then compare each Telegrambot.Message.date against it to tell whether the message was sent before the server started.
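The start-time comparison from the last answer can be sketched as follows. This is only an illustration: handleUpdate and wasSentBeforeStart are hypothetical names, and the sketch assumes the update has the usual shape with a message.date field holding a Unix timestamp in seconds.

```javascript
// Recorded once when the server process starts, in Unix seconds,
// because message.date is also a Unix timestamp in seconds.
const serverStartTime = Math.floor(Date.now() / 1000);

// True when the message was sent before the server came up,
// i.e. it was queued by Telegram while the webhook was down.
function wasSentBeforeStart(message, startTime) {
  return message.date < startTime;
}

// Hypothetical handler: classifies an incoming update so the caller can
// decide whether to process, skip, or specially handle backlog messages.
function handleUpdate(update, startTime) {
  const message = update.message;
  if (!message) return 'ignored';
  return wasSentBeforeStart(message, startTime) ? 'backlog' : 'live';
}

console.log(handleUpdate({ message: { date: serverStartTime - 60 } }, serverStartTime));
```

Whether backlog messages should be dropped or processed differently depends on the bot; the sketch only separates the two cases.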

How to handle data replication lag and event notification

We have a simple application which, upon every update of an entity, sends out a notification to SNS (it could just as well have been any other queuing system). Clients listen for these notifications and do a GET of the updated entity based on them.
The problem we are facing is that when clients do a GET, sometimes the data has not been replicated yet and we return a 404 or, even worse, stale data.
How can we mitigate this while sending notifications?
Here are a few strategies to mitigate this, with pros and cons.
Instead of sending the notification from the application, send it using database streams
For example, DynamoDB Streams and AWS Lambda. This pattern can be useful for multi-region deployments as well, where each subscriber and publisher subscribes to its regional database stream. The atomicity of sending the message and writing to the database is also preserved, and we won't lose events in the case of a regional failure.
Send delayed messages to your broker
Some brokers, like ActiveMQ and SQS, support this functionality, but SNS does not. A workaround could be writing to an SQS queue which then writes to SNS. This might be a good option when your database does not support streams.
Send special error code for retry-able gets
Since we know the data is eventually consistent, we can return a special error code to clients so that they can retry based on it. The retry strategy should be exponential backoff. However, this may mean pushing your problems onto your clients. We should also have some sort of versioning in place, so that clients can tell stale data from fresh data.
Fetch from another region
If the entity is not found in the same region, the application can go to another region or to the master database to fetch it. NOTE: Don't do this, as it is an anti-pattern. I am mentioning it here just for the sake of completeness.
Send the full entity in message
If the entities to be fetched by the REST service are small and there are no security constraints around who can access what, we can send the full entity in the message. This ensures that clients don't have to do an explicit fetch every time a new message arrives.
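The retry strategy with a special error code and versioning can be sketched from the client's side as follows. This is a minimal sketch under stated assumptions: RETRYABLE_STATUS, fetchEntity, and the { status, version, body } response shape are all hypothetical, and the notification is assumed to carry the version of the entity it announces.

```javascript
// Hypothetical status code the service returns for "not replicated yet".
const RETRYABLE_STATUS = 503;

// Delay before retry number `attempt` (0-based): base, 2*base, 4*base, ... capped at maxMs.
function backoffDelayMs(attempt, baseMs = 100, maxMs = 5000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// fetchEntity is a caller-supplied async function (hypothetical) that
// resolves to { status, version, body }.
async function getWithRetry(fetchEntity, expectedVersion, maxAttempts = 5, baseMs = 100) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchEntity();
    // Success only when the entity exists and is at least as new as the notification.
    if (res.status === 200 && res.version >= expectedVersion) return res;
    // A 200 with an older version is stale data; treat it like the retry-able code.
    const retryable = res.status === RETRYABLE_STATUS || res.status === 200;
    if (!retryable) throw new Error('Non-retryable status ' + res.status);
    if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt, baseMs));
  }
  throw new Error('Entity not replicated after ' + maxAttempts + ' attempts');
}
```

The version check is what prevents the stale-data case: without it, a client would accept the first 200 response even if the replica has not caught up.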

Firebase cloud function: how to deal with continuous request

When working with Firebase (Firebase Cloud Functions in this case), we have to pay for every byte of bandwidth.
So I wonder how we can deal with the case where someone somehow finds our endpoint and then intentionally sends continuous requests (with a script or tool).
I searched the internet but didn't find anything helpful.
Except for this one, which is not really useful.
Since you didn't specify which type of request, I'm going to assume that you mean http(s)-triggers on firebase cloud functions.
There are multiple limiters you can put in place to reduce the bandwidth consumed by requests. I'll list a few that come to mind.
1) Limit the type of requests
If all you need is GET and, for example, you don't need PUT, you can start by returning a 403 for those requests before going any further in your cloud function.
// Reject the method early; `return` stops the rest of the handler from running.
if (req.method === 'PUT') { return res.status(403).send('Forbidden!'); }
2) Authenticate if you can
Follow Google's example here and allow only authorized users to use your HTTPS endpoints. You can achieve this simply by verifying tokens, as in this Stack Overflow answer.
3) Check for origin
You can try checking for the origin of the request before going any further in your cloud function. If I recall correctly, cloud functions give you full access to the HTTP Request/Response objects so you can set the appropriate CORS headers and respond to pre-flight OPTIONS requests.
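Points 1 and 3 above can be combined into one early guard. The following is a rough sketch, not the answer's own code: ALLOWED_METHODS, ALLOWED_ORIGINS, and rejectStatus are hypothetical names, and the request is assumed to expose method and headers the way an Express-style request does. Keep in mind that the Origin header is only reliable for browsers; a script can send any value it likes.

```javascript
// Hypothetical allow-lists; adjust to the methods and sites you actually serve.
const ALLOWED_METHODS = ['GET'];
const ALLOWED_ORIGINS = ['https://example.com'];

// Returns the HTTP status to reject with, or null when the request may proceed.
function rejectStatus(req) {
  if (!ALLOWED_METHODS.includes(req.method)) return 403;
  const origin = req.headers.origin;
  if (!origin || !ALLOWED_ORIGINS.includes(origin)) return 403;
  return null;
}

// Inside a handler this would be used roughly as:
//   const status = rejectStatus(req);
//   if (status !== null) { return res.status(status).send('Forbidden!'); }
console.log(rejectStatus({ method: 'PUT', headers: {} }));
```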
Experimental Idea 1
You can hypothetically put your functions behind a load balancer / firewall, and relay-trigger them. It would more or less defeat the purpose of cloud functions' scalable nature, but if a form of DoS is a bigger concern for you than scalability, then you could try creating an app engine relay, put it behind a load balancer / firewall and handle the security at that layer.
Experimental Idea 2
You can try a DNS-level attack-prevention solution by putting something like Cloudflare in between. Use a CNAME and Cloudflare Page Rules to map URLs to your cloud functions. This could hypothetically absorb the impact. For example:
*function1.mydomain.com/* -> https://us-central1-etc-etc-etc.cloudfunctions.net/function1/$2
Now if you go to
http://function1.mydomain.com/?something=awesome
you can even pass the URL params to your functions. This is a tactic I read about in this Medium article during the summer, when I needed something similar.
Finally
In an attempt to make the questions on Stack Overflow more linked, and to help everyone find answers, here's another question I found that's similar in nature. Linking it here so that others can find it as well.
Returning a 403 or an empty body for unsupported methods will not do much for you. Yes, less bandwidth will be wasted, but Firebase will still bill you for the request; an attacker could just send millions of requests and you would still lose money.
Authentication is not a solution to this problem either. First of all, any auth process (creating a token, verifying or validating a token) is costly, and Firebase bills you based on the time the function takes to return a response. You cannot afford to use auth to prevent continuous requests.
Plus, a smart attacker would not just go for a request that returns a 403. What stops the attacker from hitting the login endpoint a million times? And if they provide correct credentials (which a smart attacker would), you waste bandwidth by returning a token each time. If you regenerate tokens, you also spend time on each request, which further hurts your bill.
The idea here is to block the attacker completely (before the request reaches your API functions).
What I would do is use Cloudflare to proxy my endpoints, and in my API define a max_req_limit_per_ip and a time_frame. Save each request's IP in the database, and on each request check whether that IP went over the limit for the given time frame; if so, use the Cloudflare API to block that IP at the firewall.
Tip:
max_req_limit_per_ip and time_frame can be set differently for different kinds of requests.
For example:
An IP can hit a 403 10 times in 1 hour
An IP can hit the login successfully 5 times in 20 minutes
An IP can hit the login unsuccessfully 5 times in 1 hour
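The per-IP limit check can be sketched as a sliding-window counter. This is an illustration under assumptions, not the answer's implementation: IpRateLimiter is a hypothetical class, and an in-memory Map stands in for the database mentioned above. Cloud function instances do not share memory, so a real deployment would need a shared store. The call that blocks the IP at the firewall is left out.

```javascript
class IpRateLimiter {
  constructor(maxReqLimitPerIp, timeFrameMs) {
    this.maxReqLimitPerIp = maxReqLimitPerIp;
    this.timeFrameMs = timeFrameMs;
    this.hits = new Map(); // ip -> timestamps (ms) of recent requests
  }

  // Records one request and returns true when the IP is now over its limit.
  isOverLimit(ip, now = Date.now()) {
    const windowStart = now - this.timeFrameMs;
    // Keep only the requests that fall inside the current time frame.
    const recent = (this.hits.get(ip) || []).filter((t) => t > windowStart);
    recent.push(now);
    this.hits.set(ip, recent);
    return recent.length > this.maxReqLimitPerIp;
  }
}

// One limiter per kind of request, e.g. 10 forbidden responses per hour.
const forbiddenLimiter = new IpRateLimiter(10, 60 * 60 * 1000);
console.log(forbiddenLimiter.isOverLimit('203.0.113.7'));
```

When isOverLimit returns true, that is the point at which the IP would be handed to the firewall for blocking.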
One solution to this problem is to restrict the HTTPS endpoint to authenticated users.
Only users who pass a valid Firebase ID token as a Bearer token in the Authorization header of the HTTP request, or in a __session cookie, are authorized to use the function.
Checking the ID token is done with an Express.js middleware that also passes the decoded ID token along in the Express request object.
Check this sample code from firebase.
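The token check described above can be sketched as follows. This is a simplified illustration rather than the Firebase sample itself: extractIdToken and requireIdToken are hypothetical names, req.cookies is assumed to be filled in by a cookie-parsing middleware, and the verifier is injected so the sketch has no dependencies (with the firebase-admin SDK it would typically be a wrapper around admin.auth().verifyIdToken).

```javascript
// Returns the ID token from the Authorization header or the __session cookie,
// or null when the request carries neither.
function extractIdToken(req) {
  const header = (req.headers && req.headers.authorization) || '';
  if (header.startsWith('Bearer ')) {
    return header.slice('Bearer '.length);
  }
  if (req.cookies && req.cookies.__session) {
    return req.cookies.__session;
  }
  return null;
}

// Express-style middleware factory. verifyToken is an async function that
// resolves to the decoded token or rejects when the token is invalid.
function requireIdToken(verifyToken) {
  return async (req, res, next) => {
    const idToken = extractIdToken(req);
    if (!idToken) {
      res.status(403).send('Unauthorized');
      return;
    }
    try {
      // Pass the decoded token along on the request object.
      req.user = await verifyToken(idToken);
      next();
    } catch (err) {
      res.status(403).send('Unauthorized');
    }
  };
}
```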
Putting access-control logic in your function is standard practice for Firebase, BUT the function still has to be invoked to access that logic.
If you don't want your function to fire at all except for authenticated users, you can take advantage of the fact that every Firebase Project is also a Google Cloud Project -- and GCP allows for "private" functions.
You can set project-wide or per-function permissions outside the function(s), so that only authenticated users can cause the function to fire, even if they try to hit the endpoint.
Here's documentation on setting permissions and authenticating users. Note that, as of writing, I believe using this method requires users to use a Google account to authenticate.

Intermittently receiving 999 Request Denied from the LinkedIn API. What does reason code 1,2,1 refer to?

Intermittently, over the past two days, two different LinkedIn "apps" have started to receive 999 Request Denied errors. Along with this, I receive "reason-code=1,2,1" as a header. Specifically, this was captured from the 3rd step of the OAuth process (communicating with https://www.linkedin.com/uas/oauth2/accessToken). POST requests to the sharing endpoint also fail, but I've not yet captured the HTTP response code and data for those failures.
These two apps are the live and test sites for the same platform, running off different servers, different IP addresses, different client ids, and the two apps are tied to two different users (though I'm an admin on both). One thing they do have in common, is that they are both hosted on Linode.
Locally, (with a different client id/app) the integrations work without problems. If I use the test app client id locally (with the callback URLs added) the integration works without issues.
This seems to be intermittent. Over the past two days, there has been a high failure rate of POST requests to the API, along with intermittent complaints that the OAuth connection fails.
From the Developer website, I see the apps are well within their usage allowance.
Does anyone know what this particular error with reason code 1,2,1 refers to? The LinkedIn developer website has a reference to the 999 error, but it seems to be restricted so that only registered members of the partner program can access it. Also, is there any approach for preventing this error? Some other questions indicate that the 999 response code has come about through LinkedIn blocking various cloud providers, but that doesn't seem to explain why this is intermittent.

SignalR - notification order

I know this issue has been raised many times, but I couldn't find a real answer. My clients are registered to groups and the server sends notifications on changes. Do I need to use sequence numbers so the client can re-order the notifications, or can I rely on in-order delivery for each client?
SignalR doesn't guarantee message delivery, but the message order should normally be preserved.
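If you do add sequence numbers, client-side handling could look roughly like this. It is a minimal sketch that uses no SignalR API: SequenceBuffer is a hypothetical class, and it assumes the server attaches an increasing sequence number to every notification it sends to a client. Since delivery is not guaranteed, the buffer is mainly useful for detecting gaps (lost messages), and it also restores order if messages ever arrive out of sequence.

```javascript
class SequenceBuffer {
  constructor(onDeliver, firstSeq = 1) {
    this.onDeliver = onDeliver; // called with each payload, in sequence order
    this.nextSeq = firstSeq;    // the sequence number expected next
    this.pending = new Map();   // seq -> payload, for messages that arrived early
  }

  receive(seq, payload) {
    if (seq < this.nextSeq) return; // duplicate or already delivered
    this.pending.set(seq, payload);
    // Deliver every consecutive message that is now available.
    while (this.pending.has(this.nextSeq)) {
      this.onDeliver(this.pending.get(this.nextSeq));
      this.pending.delete(this.nextSeq);
      this.nextSeq += 1;
    }
  }

  // True while at least one earlier message is still missing.
  hasGap() {
    return this.pending.size > 0;
  }
}

const delivered = [];
const buffer = new SequenceBuffer((payload) => delivered.push(payload));
buffer.receive(2, 'b');
buffer.receive(1, 'a');
buffer.receive(3, 'c');
console.log(delivered);
```

If hasGap() stays true for longer than the application can tolerate, the client could re-fetch the current state from the server instead of waiting for the missing notification.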
