AWS CodeDeploy vs Windows 2016 in ASG

I use AWS CodeDeploy to deploy builds from GitHub to EC2 instances in an Auto Scaling group.
It works fine for Windows 2012 R2 with all deployment configurations.
But for Windows 2016 it fails completely with a "OneAtATime" deployment;
during an "AllAtOnce" deployment only one or two instances deploy successfully, and all the others fail.
The agent log file contains this suspicious message:
ERROR [codedeploy-agent(1104)]: CodeDeploy Instance Agent Service: CodeDeploy Instance Agent Service: error during start or run: Errno::ETIMEDOUT
- A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. - connect(2)
All policies, roles, software, builds and other settings are the same; I even tested this on a brand-new AWS account.
Has anybody faced such behaviour?

I ran into the same problem. During my investigation I found that the server's route table had a wrong route for the 169.254.169.254 network (it pointed to the gateway of the network where my template was captured), so the instance couldn't read its metadata.
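A quick way to inspect and clear a stale metadata route from an elevated command prompt is sketched below; note that on Windows Server 2016 the EC2Launch scripts normally recreate this route at boot, so treating a delete-and-reboot as the fix is an assumption about your setup:

route print 169.254.169.254
:: if the listed gateway belongs to the old network, remove the stale route and reboot
route delete 169.254.169.254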

From the above error it looks like the agent isn't able to talk to the CodeDeploy endpoint after the instance starts up. Please check whether the routing tables and other proxy-related settings are set up correctly. Also, if you have not already, you can turn on debug logging by setting :verbose to true in the agent config and restarting the agent. This will help debug the issue further.
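As a minimal sketch (the file path and service name below assume a default install of the agent on Windows; adjust them if yours differs), set verbose logging in the agent config and restart the service:

In C:\ProgramData\Amazon\CodeDeploy\conf.yml:
:verbose: true

Then from PowerShell:
Restart-Service codedeployagent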

Related

Corda CENM network map server starts failing to connect to the database after running for a few weeks

We operate CENM (1.2, run with Helm templates on a Kubernetes cluster) to build our own private network. After the CENM network map server has been running for a few weeks, launching new nodes starts failing.
On further investigation, it appears that a request timeout for http://nmap:10000/network-map causes the problem.
In the network map server's log, we found the following output when accessing the above URL with curl:
[NMServer] - Error while handling socket client message com.r3.enm.servicesapi.networkmap.handlers.LatestUnsignedNetworkParametersRetrievalMessage#760c53ea: HikariPool-1 - Connection is not available, request timed out after 30000ms.
netstat shows at least three established connections to the database from the container in which the network map server runs, and I can also connect to the database directly using the CLI.
So I don't think this is either database saturation or a network configuration problem.
Does anyone have an idea why this happens? I think a restart would probably solve the problem, but I want to know the root cause...
Regards,
Please test the following options.
Since it is the HikariCP (connection pool) component that is throwing the error, it would be worth seeing whether increasing the pool size in the network map configuration helps (see below).
Corda uses HikariCP to create the connection pool. To configure it, any custom properties can be set in the dataSourceProperties section:
dataSourceProperties = {
dataSourceClassName = "org.postgresql.ds.PGSimpleDataSource"
...
maximumPoolSize = 10
connectionTimeout = 50000
}
Has a health check been conducted to verify there are sufficient resources on that PostgreSQL database, i.e. basic diagnostic checks?
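As a minimal sketch of such a check (host, user and database names below are placeholders), compare the current connection count against the server limit and look for stuck sessions:

psql -h <db-host> -U <user> -d <db> -c "select count(*) from pg_stat_activity;"
psql -h <db-host> -U <user> -d <db> -c "show max_connections;"

A pile of long-lived "idle in transaction" sessions in pg_stat_activity would also explain an exhausted Hikari pool even though a few connections still look established in netstat.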
Another option for getting more information logged from the network map service is to run it with TRACE logging:
From https://docs.corda.net/docs/cenm/1.2/troubleshooting-common-issues.html
Enabling debug/trace logging
Each service can be configured to run with a deeper log level via command line flags passed at startup:
java -DdefaultLogLevel=TRACE -DconsoleLogLevel=TRACE -jar <enm-service-jar>.jar --config-fi

Weblogic 12c, task in progress forever

I have my domain configured in WebLogic 12c. When I try to start the servers, they come up (their state changes to Running) and the web services are active. However, the Status of Last Action in the WebLogic console always stays "Task in Progress".
What are the possible reasons that this does not change to Completed?
Also, it changes to None after I restart my Admin Server.
I would check your Admin Server and managed servers for any errors. I suspect some communication issues between the Admin Server and the managed servers.
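If it helps to see what the Admin Server thinks the managed server state is, a short WLST session is one way to check (run it from a shell where setWLSEnv/setDomainEnv has been sourced; the credentials, URL and server name below are placeholders):

java weblogic.WLST
connect('weblogic','password','t3://adminhost:7001')
state('ManagedServer1','Server')
disconnect()

If state() reports RUNNING while the console still shows the task as in progress, that points at the console's task tracking rather than the server itself.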

SocketException: No connection could be made because the target machine actively refused it XX.XXX.XX.XXX:443

I have two servers with identical WCF services hosted on them, and one client application server. I can connect to the endpoints and send requests to both services using a test WCF client app (.NET Web Service Studio) from my local machine. But when I try to connect from the client application server using the same test WCF client app, I can connect to only one of the WCF service servers; connecting to the other one gives this error:
System.Net.WebException: There was an error downloading 'https://XXX/XXX?wsdl'. ---> System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: No connection could be made because the target machine actively refused it XX.XXX.XX.XXX:443
I ran netstat -an | find "443" at a command prompt on the client server and on my local machine to find the difference, and here is what I got:
1. On my local machine:
2. On the client app server:
What I have already tried on the client application server:
- turned off the firewall;
- stopped the Windows Firewall service;
- uninstalled the McAfee VirusScan Enterprise application
(I tried to set "prevent mass mailing worms from sending mail" first, but McAfee was in a foreign language that I don't understand, so I just uninstalled it).
After running netstat -aon | findstr "443" on the client application server I got this result:
But I still get the error.
Does anybody know how to solve this issue?
Could the problem be on the WCF service server side?
The solution was a predictably simple one - the firewall was blocking the port.
It's important to note, though, that the issue was caused by the firewall on the WCF service server side, not on the client application server that makes the request to that service.
I asked the technical support for that server, and they made the firewall changes.
After that the error disappeared.
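For anyone debugging the same symptom, a quick way to test the port from the client application server is PowerShell's Test-NetConnection (the address below is the same placeholder as above):

Test-NetConnection XX.XXX.XX.XXX -Port 443

An immediate failure usually means something is actively rejecting the connection (a firewall reject rule or nothing listening on the port), while a long hang before failing points at packets being silently dropped along the path.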
I faced the same issue and tried different ways to fix it; nothing worked. Later I found the problem: the application I was trying to run uses HTTPS, and no HTTPS binding had been created in my IIS. I created an HTTPS binding for the website and it works.
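For reference, the binding can also be added from PowerShell instead of IIS Manager (the site name below is a placeholder; the WebAdministration module ships with IIS):

Import-Module WebAdministration
New-WebBinding -Name "Default Web Site" -Protocol https -Port 443
# the binding still needs an SSL certificate assigned, e.g. in IIS Manager or with netsh http add sslcert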

BizTalk SSO Configuration - There are no more endpoints available from the endpoint mapper

I have a two-node BTS2010 group with a separate SQL Server hosting the BizTalk databases, including SSODB: Biz01, Biz02 and Sql01. This environment was configured by a previous employee and I have no documentation available.
There seems to be something not right with the SSO config but I'm not sure how to resolve it.
When I run ssoconfig -status on Biz02 all looks good - it tells me that the SSO Server is Biz02 and the SQL Server is Sql01 plus a load of other stuff. However, when I run the same command on Biz01 I get the message: "Error 0xC0002A0F: Could not contact the SSO server 'Sql01'. Check that SSO is configured and that the SSO service is running on that server'
I'm not clear on what Biz01 is trying to do here - is it trying to reach the EntSSO Windows service on Biz02 via an RPC call before ultimately attempting to retrieve config info from Sql01?
I have checked that the EntSSO service is running on Biz01 and Biz02, and that the RPC service is running on each of the three servers.
Can anyone help advise what further steps I can take to determine the root cause of this configuration problem?
Many thanks
Rob.
I'm not sure whether you have your servers clustered, but I've run into something similar before within a cluster. Your SSO server name should be your network name and not the individual computer's name. Here's a post about the issue I had. Hope it helps.
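If it's useful, the Enterprise SSO command-line tools can show and change which SSO server a machine points at; a rough sketch (run from the Enterprise Single Sign-On install folder, and the server name below is a placeholder for your own):

ssomanage -showserver
ssomanage -serverall <SSOServerName>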

GlassFish 3.1.2 - validate-dcom fails with "The remote file, C: doesn't exist" (Centralized Administration with Windows DCOM)

OS: Windows Server 2008 R2 x 2 (firewall disabled on both machines)
I wish to take advantage of the GlassFish 3.1.2 Windows DCOM feature to set up communication between the GlassFish DAS and a remote node. I've successfully followed Byron Nevins' instructions on using the GlassFish 3.1.2 DCOM Configuration Utility.
However, I'm having an issue validating DCOM when following the instructions in the GlassFish 3.1.2 guide, "2 Enabling Centralized Administration of GlassFish Server Instances".
When I run the command validate-dcom --passwordfile C:/Sun/AppServer/password.txt -v 192.168.0.80, I get the following output:
asadmin> validate-dcom --passwordfile C:/Sun/AppServer/password.txt -v 192.168.0.80
remote failure:
Successfully verified that the host, 192.168.0.80, is not the local machine as required.
Successfully resolved host name to: /192.168.0.80
Successfully connected to DCOM Port at port 135 on host 192.168.0.80.
Successfully connected to NetBIOS Session Service at port 139 on host 192.168.0.80.
Successfully connected to Windows Shares at port 445 on host 192.168.0.80.
The remote file, C: doesn't exist on 192.168.0.80 : Logon failure: unknown user name or bad password.
The password file, password.txt, contains a single entry:
AS_ADMIN_WINDOWSPASSWORD=my-windows-password
I have double-checked that I can successfully log in with my Windows password on the remote machine 192.168.0.80. I've also tried this test with two Windows XP Professional machines and got the same error.
I also performed this operation by creating a new node in the Admin Console and got the same error.
I can't figure out what is going wrong or what I may be missing.
Thanks in advance
I had similar issues while setting up the new production environment at work last Friday, and could not find any useful information on the interwebs, except people encountering the same issue, some with comments as fresh as the day I was looking it up.
So after a rather excessive amount of painful, in-depth debugging, I was able to figure out a few things:
You must explicitly specify the local Windows user you created for the purpose of running GlassFish in both the add-node dialog and the validate-dcom subcommand (option -w), or it will default to either 'admin' or the user the DAS is running as.
There is a bug in validate-dcom that causes it to ignore whatever you specify as the test directory. No matter what you do it will always use C:\, and result in "access denied".
The documentation omits another registry key that must be given access to in order for WMI to work.
Regarding the first issue, you will most likely encounter it if your nodes are not part of a domain or you are using a local account. Windows NT6+ has a new default security policy that prevents local users from elevating privileges over the network, which necessarily causes that test to fail, seeing how writing to the root of a system drive is not something one can do without elevation.
I previously blogged about it for someone to stumble upon it if needed:
http://www.raptorized.com/2008/08/19/access-administrative-shares-on-server-2008vista/
The gist of it is that you have to navigate to the following registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System
and create a new DWORD named LocalAccountTokenFilterPolicy with a value of 1.
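If you prefer to script it, an equivalent command from an elevated command prompt is (same key and value as described above):

reg add "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System" /v LocalAccountTokenFilterPolicy /t REG_DWORD /d 1 /f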
There is no need to reboot; the first, previously broken test should now pass. However, you will then see an error about being unable to connect to WMI, and it will fail again.
To remedy this, you must also take ownership and grant your local service account user full control over the following registry key, in addition to the other ones described in the HA Administration Guide:
HKEY_CLASSES_ROOT\CLSID\{76A64158-CB41-11D1-8B02-00600806D9B6}
Afterwards, validate-dcom should report success and you will be able to add it as a node, and create instances on it.
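For completeness, a full invocation with the Windows user passed explicitly via the -w option mentioned above might look like this (the user name is a placeholder for whatever local account you created for GlassFish):

asadmin validate-dcom --passwordfile C:/Sun/AppServer/password.txt -w gfuser -v 192.168.0.80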
I hope this helps, because the seeming lack of activity from Oracle on that issue was infuriating.
I am also less than pleased by the hackish, ugly, insecure nature of the DCOM support in GlassFish 3 :(
