AWS Lambda Intermittent Connection Issues to External HTTP Requests

I am currently building a script in AWS Lambda that needs to send HTTP POST requests to an external API. However, the function occasionally loses connectivity and cannot send requests to anything (the external site, other Lambda function URLs, Google, etc.) for about 20 minutes before requests start working again. Each attempt just hits the timeout and retries a couple of times. I also need to reach internal AWS services such as RDS.
I have tried using AWS Systems Manager to run the AWSSupport-TroubleshootLambdaInternetAccess automation, and it reports success.
I have set up an internet gateway and a NAT gateway, as well as a 'public' and a 'private' subnet. The public subnet routes to the internet gateway and the private subnet routes to the NAT gateway. The Lambda function is attached to both subnets.
This question (Why can't an AWS lambda function inside a public subnet in a VPC connect to the internet?) says to use two private subnets rather than one private and one public, but if I do that I cannot access RDS.
The Lambda function has the following permissions:
"ec2:DescribeNetworkInterfaces",
"ec2:CreateNetworkInterface",
"ec2:DeleteNetworkInterface",
"ec2:DescribeInstances",
"ec2:AttachNetworkInterface"
"logs:CreateLogStream",
"logs:PutLogEvents"
The Python 3.9 code used to test the connection is below; it works properly on a local machine.
import logging
import urllib3
from urllib3.util import Timeout

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def ConnectTest():
    http = urllib3.PoolManager(timeout=Timeout(connect=1.3, read=1.3))
    logger.info("Lodging")
    url = "---"  # A link to one of my servers that simply responds "Hello" to POST HTTP requests.
    headers = {"Accept": "application/json"}
    try:
        response = http.request('POST', url, headers=headers, timeout=Timeout(connect=1.3, read=1.3))
    except Exception as e:
        logger.error("Request error | %s", e)
        return None
    return response.data
I have also tried increasing the timeout, but it really shouldn't take more than a couple of seconds to send a request.
Thanks for your help!

Despite the AWS guide saying to use one public and one private subnet, that layout does not work and turned out to be the cause of the intermittent connection: a Lambda ENI only ever gets a private IP address, so whenever an invocation lands on the ENI in the public subnet (which routes straight to the internet gateway) its outbound traffic has no path to the internet.
I now have the Lambda function and the RDS database on 3 private subnets and that seems to work okay; I haven't had any issues so far.
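If you want to double-check that routing from code, here is a minimal boto3 sketch under the same assumptions (the function name is a placeholder, the function is VPC-attached, and each subnet has an explicit route-table association). It prints where each of the Lambda's subnets sends its default route; every target should be a nat-... ID rather than an igw-... ID.

import boto3

FUNCTION_NAME = "my-function"  # hypothetical name, replace with your own

lambda_client = boto3.client("lambda")
ec2 = boto3.client("ec2")

# Subnets the (VPC-attached) Lambda function uses.
subnet_ids = lambda_client.get_function_configuration(
    FunctionName=FUNCTION_NAME
)["VpcConfig"]["SubnetIds"]

for subnet_id in subnet_ids:
    # Route tables explicitly associated with this subnet. Subnets without an
    # explicit association fall back to the VPC's main route table and won't show up here.
    tables = ec2.describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
    )["RouteTables"]
    for table in tables:
        for route in table.get("Routes", []):
            if route.get("DestinationCidrBlock") == "0.0.0.0/0":
                target = route.get("NatGatewayId") or route.get("GatewayId")
                print(subnet_id, "default route ->", target)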

Related

Mirror requests from cloudrun service to other cloudrun service

I'm currently working on a project where we are using Google Cloud. Within the Cloud we are using CloudRun to provide our services. One of these services is rather complex and has many different configuration options. To validate how these configurations affect the quality of the results and also to evaluate the quality of changes to the service, I would like to proceed as follows:
in addition to the existing service I deploy another instance of the service which contains the changes
I mirror all incoming requests and let both services process them; only the responses from the original service are returned, but the responses from both services are stored
This allows me to create a detailed evaluation of the differences between the two services without having to provide the user with potentially worse responses.
For the implementation I have set up an NGINX instance which mirrors the requests. It is also deployed as a Cloud Run service. It accepts all incoming requests and takes care of the authentication. The original service and the mirrored version have been configured so that they can only be accessed internally and should therefore be reached via a VPC network.
I have tried all possible combinations for the configuration of these parts but I always get 403 or 502 errors.
I have tried pointing NGINX at the service over both HTTP and HTTPS, and I have tried all the VPC connector settings. When I set the service's ingress to ALL, it works perfectly if I configure NGINX to use HTTPS on port 443. As soon as I set the ingress to Internal, I get errors: 403 with HTTPS and 502 with HTTP.
Does anyone have experience in this regard and can give me tips on how to solve this problem? Would be very grateful for any help.
If your Cloud Run services are internally accessible (ingress control set to internal only), you need to perform your requests from within your VPC.
Therefore, as you correctly did, you attached a serverless VPC connector to your NGINX service.
The setup is correct. So why does it only work when you route ALL the egress traffic through the VPC connector, and not only the private traffic?
Cloud Run is a public resource with a public URL, even when you set the ingress to internal. That parameter says "the traffic must come from the VPC"; it does not mean "the service is plugged into the VPC with a private IP".
So, to go through your VPC and reach a public resource (your Cloud Run services), you need to route ALL the traffic through the connector, even the public traffic.

Trouble connecting to gRPC server on AWS Fargate

I have a Python gRPC server running on AWS Fargate (configured very similarly to this AWS guide here), and another AWS Fargate task (call it the "client") that attempts to make a connection to my gRPC server (also using Python gRPC). However, the client is unable to make a call to my server, and fails with the following error:
<_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"#1619057124.216955000","description":"Failed to pick subchannel",
"file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":5397,
"referenced_errors":[{"created":"#1619057124.216950000","description":"failed to connect to all addresses",
"file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc",
"file_line":398,"grpc_status":14}]}"
Based on my reading online, it seems like there are myriad situations in which this error is thrown, and I'm having trouble figuring out which one pertains to my case. Here is some additional information:
When running client and server locally, I am able to successfully connect by having the client connect to localhost:[PORT]
I have configured an application load balancer target group following the guide from AWS here that makes health check requests to the / route of my gRPC server, using the gRPC protocol, and expect gRPC response code 12 (UNIMPLEMENTED); these health check requests are coming back as expected, which I believe implies the load balancer is able to successfully communicate with the server (although I could be misunderstanding)
I configured a service discovery system (following this guide here) that should allow me to reach my gRPC server within my VPC via the name service-name.dev.co.local. I can confirm that the corresponding DNS record exists in Route 53, and when I SSH into my VPC, I am indeed able to ping service-name.dev.co.local successfully.
Anyone have any ideas? Would appreciate any and all advice, and I'm happy to answer any further questions.
Thank you for your help!
On your gRPC server, bind to 0.0.0.0:[port] and expose that port as TCP on your container.
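For reference, a minimal sketch of that binding with grpcio in Python; the port number and the servicer registration line are placeholders for your own service.

import grpc
from concurrent import futures

def serve(port=50051):  # 50051 is an assumed port; use the one your task exposes
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    # add_YourServiceServicer_to_server(YourServicer(), server)  # register your generated servicer here
    # Bind on 0.0.0.0 so connections from outside the container are accepted,
    # not just connections from localhost inside it.
    server.add_insecure_port(f"0.0.0.0:{port}")
    server.start()
    server.wait_for_termination()

if __name__ == "__main__":
    serve()

The container's port mapping in the Fargate task definition then needs to expose the same port over TCP.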

Google Cloud Platform: VPC Connector Not Working with Cloud Function

We're developing a web server with Firebase that will have to talk to various PLCs behind customer firewalls. I have a couple PLCs on my local network that I'm working with. I'm just trying to do a basic ping right now. I installed the ping module and am trying to run this node.js code:
function pingTest() {
    const ping = require('ping');
    const hosts = [
        '10.10.100.11',
        '10.10.100.119',
        '10.10.100.12',
        '10.10.100.118',
    ];
    hosts.forEach(function (host) {
        ping.sys.probe(host, function (isAlive) {
            var msg = isAlive ? 'host ' + host + ' is alive' : 'host ' + host + ' is dead';
            console.log(msg);
        });
    });
}
I've setup a VPN gateway and tunnel on GCP. The tunnel status is "Established". I have a Serverless VPC Access Connector setup as well. Everything is using the "us-central1" region. The connector is assigned to my cloud function for all traffic.
Running locally, I can ping the .11 and .12 PLCs just fine. When running in Firebase though, all four hosts report "dead".
When looking at the GCP Logs Viewer, I'm not seeing any error codes. The logs for my function are only what I'm writing to the console:
10:06:00.180 AM every1MinuteBackgroundPlcRead Function execution started
10:06:02.693 AM every1MinuteBackgroundPlcRead host 10.10.100.11 is dead
10:06:03.793 AM every1MinuteBackgroundPlcRead host 10.10.100.12 is dead
10:06:03.794 AM every1MinuteBackgroundPlcRead host 10.10.100.119 is dead
10:06:04.393 AM every1MinuteBackgroundPlcRead host 10.10.100.118 is dead
10:07:00.206 AM every1MinuteBackgroundPlcRead Function execution took 306 ms, finished with status: 'ok'
I'm not even seeing an option for VPC connector logging. I see the option for the tunnel logs, but they're empty. The logs for the gateway are just these lines repeated over and over:
2020-02-24T15:00:27.885866755Z sending DPD request
2020-02-24T15:00:27.886004404Z generating INFORMATIONAL request 95 [ ]
2020-02-24T15:00:27.886142741Z sending packet: from 34.66.113.10[500] to 50.205.87.130[500] (49 bytes)
2020-02-24T15:00:27.911746950Z received packet: from 50.205.87.130[500] to 34.66.113.10[500] (49 bytes)
2020-02-24T15:00:27.911828853Z parsed INFORMATIONAL response 95 [ ]
It doesn't seem like my code is utilizing the VPC connector that I assigned to the function. I assume I'm missing a critical link in my chain? Any help? Thanks!

Health check to detect redis master from google tcp load balancer

I am trying to set up a Google TCP internal load balancer. The instance group behind this LB consists of redis-server processes listening on port 6379. Of these Redis instances, only one is the master.
Problem: add a TCP health check that detects the Redis master, so the LB diverts all traffic to the master only.
Approach:
Added a TCP health check for port 6379.
In order to send the command role to redis-server process and parse the response, I am using the optional params provided in the health check. Please check the screenshot here.
Result: the health check fails for all instances. If I remove the optional request/response params, it starts passing for all of them.
Debugging:
Connected to the LB using netcat and issued the command role; it sends a response starting with *3 (for master) and *5 (for slave), as expected.
Logged into an instance and stopped the redis-server process. Started listening on port 6379 using nc -l -p 6379 to check what exactly is received on the instance's side during the health check. It does receive role\r\n.
After step 2, restarted redis-server and ran the MONITOR command in redis-cli to watch the log of commands received by the process. There is no log of role here.
This means the instance is receiving the data (role\r\n) over TCP, but it is not reaching the redis-server process (as seen via MONITOR in redis-cli), or something else is happening. Please help.
Unfortunately, GCP's TCP health checks are pretty limited in what they can check in the response. From https://cloud.google.com/sdk/gcloud/reference/compute/health-checks/create/tcp:
--response=RESPONSE
An optional string of up to 1024 characters that the health checker expects to receive from the instance. If the response is not received exactly, the health check probe fails. If --response is configured, but not --request, the health checker will wait for a response anyway. Unless your system automatically sends out a message in response to a successful handshake, only configure --response to match an explicit --request.
Note the word "exactly" in the help message. The response has to match the provided string in full. One can't specify a partial string to search for in the response.
As you can see on https://redis.io/commands/role, redis's ROLE command returns a bunch of text. Though the substring "master" is present in the response, it also has a bunch of other text that would vary from setup to setup (based on the number of slaves, their addresses, etc.).
You should definitely raise a feature request with GCP for regex matching on the response. A possible workaround until then is to run a little web app on each host that executes redis-cli role | grep master locally and returns the result. The health check can then be configured to monitor this web app, as sketched below.
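For example, a minimal sketch of such a web app in Python; port 8080 and the use of the standard-library HTTP server are assumptions, any small framework would do the same job.

import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

class RoleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ask the local redis-server for its role; the reply contains
        # "master" on the master and "slave" on the replicas.
        out = subprocess.run(
            ["redis-cli", "role"], capture_output=True, text=True
        ).stdout
        if "master" in out:
            status, body = 200, b"master"
        else:
            status, body = 503, b"not master"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), RoleHandler).serve_forever()

Point the load balancer's health check at this port instead of 6379; only the master answers 200, so traffic is diverted to the master alone.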

Why do I have a slow initial response from the Active Directory server?

Connecting to a named Active Directory server in the DMZ (i.e. not in the domain) over port 636 using DirectoryEntry, then pulling attributes using DirectorySearcher.
string serverPath = "LDAP://some.domain.com:636/OU=XXXX,DC=xxxx,DC=xxxxxxxxx";
var searchRoot = new DirectoryEntry(
    serverPath,
    User,
    Pass,
    AuthenticationTypes.Secure);
The first query is very slow, around 22-25 seconds. It was explained to me that this may be because IIS 7.5 is doing a lookup of the certificate on the AD server against a CRL but not getting a response. Subsequent queries then accept that answer until the process times out, so the next query after that again takes 22-25 seconds.
Does the type of connection that I've described in my code example actually pull the certificate, or is the traffic simply sent over the port in an encrypted state, without a handshake between the servers relative to the cert?
Is it mandatory that IIS have a certificate as well for this to work? I should say that I am using this pattern: http://forums.asp.net/p/907421/1007517.aspx.
SSL isn't involved here.
What does serverPath look like? You are probably timing out on something - DNS perhaps. I'd start with a network trace.
When you use port 636, LDAP over SSL is used:
http://en.wikipedia.org/wiki/Ldap
If you use Microsoft Network Monitor or Wireshark to capture the packets, you may gain more insight at the packet level.
In this case the CRL check does come into play, as it is enabled by default, but you can turn it off at the machine or application level:
http://www.page-house.com/blog/2009/04/how-to-disable-crl-checking.html
