Phase duration exceeded, but not all users were launched - tsung

I've been testing a socket server with Tsung.
tsung.xml:
<?xml version="1.0"?>
<!DOCTYPE tsung SYSTEM "/usr/local/share/tsung/tsung-1.0.dtd">
<tsung loglevel="notice" version="1.0">
<!-- Client side setup -->
<clients>
<client host="localhost" use_controller_vm="true" maxusers="10000"/>
</clients>
<!-- Server side setup -->
<servers>
<server host="127.0.0.1" port="5678" type="tcp"/>
</servers>
<!-- to start os monitoring (cpu, network, memory). Use an erlang
agent on the remote machine or SNMP. erlang is the default -->
<!-- <monitoring>
<monitor host="localhost"></monitor>
</monitoring> -->
<!-- <load duration="1" unit="minute" loop="3"> -->
<load loop="3">
<arrivalphase phase="1" duration="1" unit="minute">
<!-- <users interarrival="0.001" unit="second"></users> -->
<users arrivalrate="200" unit="second" />
</arrivalphase>
</load>
<!--
<options>
<option type="ts_http" name="user_agent">
<user_agent probability="80">Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Galeon/1.3.21</user_agent>
<user_agent probability="20">Mozilla/5.0 (Windows; U; Windows NT 5.2; fr-FR; rv:1.7.8) Gecko/20050511 Firefox/1.0.4</user_agent>
</option>
</options>
-->
<!-- start a session for a http user. the probability is the
frequency of this type os session. The sum of all session's
probabilities must be 100 -->
<sessions>
<session probability="100" name="socket-example" type="ts_socket">
<request>
<socket></socket>
</request>
</session>
</sessions>
</tsung>
In almost every phase I get the report "Phase duration exceeded, but not all users were launched", like the following:
=INFO REPORT==== 21-Sep-2012::14:28:44 ===
ts_config_server:(5:<0.50.0>) All remote beams started, sync
=INFO REPORT==== 21-Sep-2012::14:28:44 ===
ts_config_server:(5:<0.50.0>) New arrival phase 1 for client "localhost" (last ? true): will start 6000 users
=INFO REPORT==== 21-Sep-2012::14:28:44 ===
ts_config_server:(5:<0.50.0>) New arrival phase 2 for client "localhost" (last ? true): will start 6000 users
=INFO REPORT==== 21-Sep-2012::14:28:44 ===
ts_config_server:(5:<0.50.0>) New arrival phase 3 for client "localhost" (last ? true): will start 6000 users
=INFO REPORT==== 21-Sep-2012::14:28:44 ===
ts_config_server:(5:<0.50.0>) New arrival phase 4 for client "localhost" (last ? true): will start 6000 users
=INFO REPORT==== 21-Sep-2012::14:28:44 ===
ts_launcher:(5:<0.84.0>) Expected duration of first phase: 60.0 sec (6000 users)
=INFO REPORT==== 21-Sep-2012::14:28:44 ===
ts_launcher:(5:<0.84.0>) Activate launcher (6000 users) in 10019 msec
=INFO REPORT==== 21-Sep-2012::14:29:54 ===
ts_launcher:(5:<0.84.0>) Phase duration exceeded, but not all users were launched (440 users, 7.3% of phase)
Is there any problem?

Are you sure that's the tsung.xml associated with that log? It looks like you've only specified one phase while the log shows four... or does that loop recycle the first phase identically three more times?
Not sure how familiar you are with Tsung; I just started using it a couple of weeks ago and I've found it to be NO JOKE. Extremely powerful. I had to tune it way back because of how wickedly it can scale, distribute and force-multiply a load.
It looks like you've got it set pretty aggressively. With arrivalrate="200" you're asking for 200 new users every second, which over a 60-second phase is on the order of 12,000 users, and the loop repeats that phase again, so you're into tens of thousands of connections at a bare minimum. Is that what you intended?
What I did was bump it all the way down to 1 user arriving once a second for 1 minute, as in the sketch below, to see if it would actually stop on its own and to get a feel for its loading capabilities.
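For reference, a minimal toned-down <load> block along those lines (a sketch, not taken from the original post) would look like this:
<!-- sanity check: one new user per second for one minute, no looping -->
<load>
<arrivalphase phase="1" duration="1" unit="minute">
<users arrivalrate="1" unit="second"/>
</arrivalphase>
</load>
Once a run like that finishes cleanly on its own, the arrival rate can be ramped back up step by step.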

Related

Getting "Remotely closed" error for HTTP POST Request In the MULE 4

I am getting the below error while calling a MuleSoft system API from a MuleSoft process API via DLB in CloudHub.
The frequency of the remote-connection-closed error is not fixed: sometimes it comes after 2 minutes and sometimes after 5 minutes.
It works on the second retry, but I still want to avoid this error since it happens very frequently.
HTTP POST on resource 'https://internal-nonprod-dlb.lb.anypointdns.net:443/api/sys/aws/s3/databricks/object' failed: Remotely closed.
Mule version: 4.4
HTTP connector version: 1.7.3
DLB timeout: 7 minutes
Payload size: ~30 MB
System API listener idle timeout: 5 minutes
Request configuration in the process API:
<http:request method="POST" doc:name="POST GZIP / aws system api" doc:id="0b490747-5069-4546-9446-8b77130ae848" config-ref="Aws_Sys_API_HTTP_Request_configuration" path="${awsSysApi.databricksPath}" responseTimeout="600000">
<reconnect />
<http:headers><![CDATA[#[output application/java
---
{
"client_secret" : p('secure::awsSysApi.client_secret'),
"Content-Type" : "application/gzip",
"client_id" : p('secure::awsSysApi.client_id')
}]]]></http:headers>
<http:query-params><![CDATA[#[output application/java
---
{
"bucketName" : p('aws.bucket.datalakeRawDeBucket'),
"key" : vars.key
}]]]></http:query-params>
<http:response-validator>
<http:success-status-code-validator values="200..499" />
</http:response-validator>
</http:request>
HTTP request global configuration in the process API:
<http:request-config name="Aws_Sys_API_HTTP_Request_configuration" doc:name="HTTP Request configuration" doc:id="5a7eb30f-9850-4de5-8cca-a7d77b0c10d4" basePath="${awsSysApi.basepath}">
<http:request-connection host="${awsSysApi.host}" port="${awsSysApi.port}" protocol="HTTPS" connectionIdleTimeout="${awsSysApi.idletTimeout}">
<reconnection>
<reconnect frequency="${retry.millisecondsBetweenRetries}" count="${retry.maxRetries}" />
</reconnection>
<tls:context>
<tls:trust-store insecure="true" />
<tls:key-store type="jks" path="${tls.keyStore.path}" keyPassword="${secure::tls.keyStore.keyPassword}" password="${secure::tls.keyStore.password}" />
</tls:context>
</http:request-connection>
</http:request-config>
System API listener configuration:
<http:listener-connection host="${http.host}" port="${http.private.port}" readTimeout="300000" connectionIdleTimeout="360000">
<reconnection>
<reconnect frequency="30000" count="2" />
</reconnection>
</http:listener-connection>
Please let me know if I have missed any information.
Probably your system API is failing to process big payloads. It could also be related to concurrency, or both, or simply to insufficient resources.
You should try to reproduce the issue locally, with a similar memory configuration and different payload sizes, to understand what is going on.
The HTTP configurations alone are unlikely to be the cause. FYI, the connection idle timeout is not related to connection timeouts (see the sketch below).
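As a hedged illustration of that last point, these are the two different knobs that appear in the question's own configuration, shown here as standalone fragments for comparison (attribute names as used above; the path and host values are placeholders):
<!-- responseTimeout on the requester: how long to wait for the server's response to an in-flight request -->
<http:request method="POST" path="/upload" config-ref="Aws_Sys_API_HTTP_Request_configuration" responseTimeout="600000"/>
<!-- connectionIdleTimeout on the connection: how long an idle pooled connection may sit unused before it is closed; it does not limit how long a single request/response may take -->
<http:request-connection host="example.internal" port="443" protocol="HTTPS" connectionIdleTimeout="30000"/>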

collectd not able to send data to graphite carbon daemon

We have a full-stack Graphite server which receives metrics from different machines. While other collectd clients are sending data fine, one of the clients is giving the below error:
Jan 29 23:24:44 collectd-client collectd[25489]: write_graphite plugin: send to graphite-server:2003 ((null)) failed with status -1 (Connection refused)
Jan 29 23:24:44 collectd-client collectd[25489]: collectd: Stopping 5 write threads.
collectd.conf is as below:
LoadPlugin syslog
LoadPlugin cpu
LoadPlugin df
LoadPlugin disk
LoadPlugin interface
LoadPlugin load
LoadPlugin memory
LoadPlugin rrdtool
LoadPlugin write_graphite
<Plugin df>
MountPoint "/"
</Plugin>
<Plugin disk>
Disk "/^[hs]d[a-f][0-9]?$/"
</Plugin>
<Plugin interface>
Interface "eth0"
</Plugin>
<Plugin write_graphite>
<Node "carbon">
Host "sde-graphite"
Port "2003"
Prefix "collectd"
Postfix "collectd"
StoreRates true
AlwaysAppendDS false
EscapeCharacter "_"
</Node>
</Plugin>
Verify whether carbon is running on host sde-graphite at port 2003. You can run netstat and see whether anything is listening on 2003 (the carbon plaintext receiver listens on TCP by default, though write_graphite can also be pointed at UDP). My guess is that it is not running.
SOLVED:
I had the same issue: my metrics usually work, but randomly some nodes stop sending metrics, and collectd shows the same error:
Jun 18 15:04:23 node-a collectd[20235]: write_graphite plugin: send to 10.8.0.100:2003 (udp) failed with status -1 (Invalid argument)
Jun 18 15:04:23 node-a collectd[20235]: Filter subsystem: Built-in target `write': Dispatching value to all write plugins failed with status -1.
The daemon is still alive but not sending metrics to graphite.
NOTE: My nodes send data to Graphite through an OpenVPN tunnel.
It seems to be a connection timeout error against the Graphite server. I can reproduce the error by stopping/interrupting the VPN service, and collectd immediately shows the error above.
Hope it helps
Enjoy!

Infinispan Clustered Cache & JGroups - Servers don't see each other

I'm using Infinispan to create a distributed cache between two servers and to leverage its failover feature.
I initially tested my webservice on two local instances of Tomcat, using the pre-configured JGroups configuration file provided by infinispan-core-7.0.0.Final.jar. I was able to get the distributed cache working between the two Tomcat instances, since the pre-configured XML files use the loopback IP address.
I then moved the webservice onto two separate servers and have been unable to get them to join the same group. I created my own custom JGroups TCP configuration XML because using the loopback IP in the pre-configured one was causing some issues.
I don't have much experience in setting up TCP or UDP channels, so I think the problem may lie with my JGroups configuration file (I based it on the pre-configured one).
<config xmlns="urn:org:jgroups"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.4.xsd">
<!-- bind_addr="${jgroups.tcp.address:127.0.0.1}"-->
<TCP
bind_addr="GLOBAL"
bind_port="${jgroups.tcp.port:7800}"
port_range="30"
recv_buf_size="20m"
send_buf_size="640k"
max_bundle_size="31k"
use_send_queues="true"
enable_diagnostics="false"
bundler_type="sender-sends-with-timer"
thread_naming_pattern="pl"
thread_pool.enabled="true"
thread_pool.min_threads="2"
thread_pool.max_threads="30"
thread_pool.keep_alive_time="60000"
thread_pool.queue_enabled="true"
thread_pool.queue_max_size="100"
thread_pool.rejection_policy="Discard"
oob_thread_pool.enabled="true"
oob_thread_pool.min_threads="2"
oob_thread_pool.max_threads="30"
oob_thread_pool.keep_alive_time="60000"
oob_thread_pool.queue_enabled="false"
oob_thread_pool.queue_max_size="100"
oob_thread_pool.rejection_policy="Discard"
internal_thread_pool.enabled="true"
internal_thread_pool.min_threads="2"
internal_thread_pool.max_threads="4"
internal_thread_pool.keep_alive_time="60000"
internal_thread_pool.queue_enabled="true"
internal_thread_pool.queue_max_size="100"
internal_thread_pool.rejection_policy="Discard"
/>
<!-- Ergonomics, new in JGroups 2.11, are disabled by default in TCPPING until JGRP-1253 is resolved -->
<!--
<TCPPING timeout="3000"
initial_hosts="localhost[7800],localhost[7801]"
port_range="5"
num_initial_members="3"
ergonomics="false"
/>
-->
<!-- bind_addr="${jgroups.bind_addr:127.0.0.1}" -->
<!-- ip_ttl="${jgroups.udp.ip_ttl:2}"-->
<MPING bind_addr="GLOBAL" break_on_coord_rsp="true"
mcast_addr="${jgroups.mping.mcast_addr:228.2.4.6}"
mcast_port="${jgroups.mping.mcast_port:43366}"
num_initial_members="3"/>
<MERGE3/>
<FD_SOCK/>
<FD timeout="3000" max_tries="5"/>
<VERIFY_SUSPECT timeout="1500"/>
<pbcast.NAKACK2 use_mcast_xmit="false"
xmit_interval="1000"
xmit_table_num_rows="100"
xmit_table_msgs_per_row="10000"
xmit_table_max_compaction_time="10000"
max_msg_batch_size="100"/>
<UNICAST3 xmit_interval="500"
xmit_table_num_rows="20"
xmit_table_msgs_per_row="10000"
xmit_table_max_compaction_time="10000"
max_msg_batch_size="100"
conn_expiry_timeout="0"/>
<pbcast.STABLE stability_delay="500" desired_avg_gossip="5000" max_bytes="1m"/>
<pbcast.GMS print_local_addr="false" join_timeout="3000" view_bundling="true"/>
<tom.TOA/> <!-- the TOA is only needed for total order transactions-->
<MFC max_credits="2m" min_threshold="0.40"/>
<FRAG2 frag_size="30k"/>
<RSVP timeout="60000" resend_interval="500" ack_on_delivery="false" />
</config>
My initial thought is that the problem may be with the bind_addr in the TCP and MPING elements. The two servers are on the same network and are able to ping each other. Does anyone have any tips/insights on the configuration file above?
If it helps, I've posted below what's in the log file regarding the Infinispan/JGroups startup:
SERVER 1:
INFO JGroupsTransport - ISPN000078: Starting JGroups channel esrs
Nov 20, 2014 3:22:43 AM org.jgroups.logging.JDKLogImpl warn
WARNING: JGRP000014: Discovery.num_initial_members has been deprecated: will be ignored
INFO JGroupsTransport - ISPN000094: Received new cluster view for channel esrs: [udmesrs02-61057|0] (1) [udmesrs02-61057]
INFO JGroupsTransport - ISPN000079: Channel esrs local address is udmesrs02-61057
INFO GlobalComponentRegistry - ISPN000128: Infinispan version: Infinispan 'Guinness' 7.0.0.Final
SERVER 2:
INFO JGroupsTransport - ISPN000078: Starting JGroups channel esrs
Nov 20, 2014 3:20:28 AM org.jgroups.logging.JDKLogImpl warn
WARNING: JGRP000014: Discovery.num_initial_members has been deprecated: will be ignored
INFO JGroupsTransport - ISPN000094: Received new cluster view for channel esrs: [udmesrs01-16389|0] (1) [udmesrs01-16389]
INFO JGroupsTransport - ISPN000079: Channel esrs local address is udmesrs01-16389
INFO GlobalComponentRegistry - ISPN000128: Infinispan version: Infinispan 'Guinness' 7.0.0.Final
There are two possible issues: IPv4/IPv6 issues and UDP routing.
First, try to set -Djava.net.preferIPv4Stack=true on both machines.
If that does not help, check your UDP firewall and routing settings.
If you don't find anything strange there, you'll have to run tcpdump on UDP port 43366 and TCP port 7800 and see if there's any activity; there should be some multicast packets going out from each node at least every 15 s.
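Not part of the answer above, but if multicast turns out to be blocked between the two servers, JGroups can also discover members without multicast by replacing MPING with TCPPING and listing the members explicitly (the question's config already contains a commented-out TCPPING block). A minimal sketch with placeholder hostnames:
<!-- hypothetical hosts; substitute the real addresses/ports of both servers -->
<TCPPING timeout="3000"
initial_hosts="server1[7800],server2[7800]"
port_range="5"/>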

tomcat http request aborted but https fine

I have a web application deployed on Tomcat 7.0.23, and there are two connectors set up, with almost default values.
<Service name="Catalina">
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
compression="on"
compressableMimeType="text/xml"
address="SERVER_HOSTNAME" />
<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
maxThreads="400" scheme="https" secure="true"
address="SERVER_HOSTNAME"
clientAuth="false" SSLProtocol="ALL"
SSLCertificateFile="/PATH/tomcat-server.crt"
SSLCertificateKeyFile="/PATH/tomcat-server.rsa"
SSLCipherSuite="ALL:!ADH:!SSLv2:!EXPORT40:!EXP:!LOW"
compression="on" compressableMimeType="text/xml"/>
Right after Tomcat restarts, both http:8080 and https:8443 work fine. After a few days, though, 8080 stops working while 8443 still works fine. By "8080 not working" I mean that when using Firefox to access http:8080, some resources such as JS/CSS files become unavailable at random.
In Firebug, sometimes A.js is shown as "Aborted", sometimes B.js is. I tried to access a single file, like the http://:8080/js/A.js file; the result is also random: sometimes the full content is shown in the browser, sometimes the HTTP request is aborted.
I also tried increasing the connectionTimeout to "60000"; the only thing that changed is that the aborted request, which used to show as 0 B in Firebug, now shows its actual size. The only way to get 8080 working again is to restart Tomcat.
Can someone tell me what the cause is, or what I should try? Thanks.
Another process might somehow be taking port 8080, and that process does not respond correctly to the requests you address to Tomcat.
So, the next time you see this issue, before restarting Tomcat, check which process port 8080 currently belongs to.
On Linux I use the following command for this:
netstat -nlpt | grep 8080
One of the columns (the last one if I remember correctly) will be the ID of the process that consumes the port.
In case you have a Windows setup, use
netstat -ano | find "LISTENING" | find "8080"
Then find this PID in the Task Manager.
FYI: Windows Task Manager – showing the PID

What's the meaning of "simultaneous" in Tsung?

Here is my tsung.xml:
<?xml version="1.0"?>
<!DOCTYPE tsung SYSTEM "/usr/local/share/tsung/tsung-1.0.dtd">
<tsung loglevel="warning" version="1.0">
<clients>
<client host="localhost" use_controller_vm="true" maxusers="30000"/>
</clients>
<servers>
<server host="127.0.0.1" port="9988" type="tcp"/>
</servers>
<!--
<monitoring>
<monitor host="localhost" type="erlang"></monitor>
</monitoring>
-->
<load duration="90" unit="second">
<arrivalphase phase="1" duration="1" unit="minute">
<users arrivalrate="300" unit="second" />
</arrivalphase>
</load>
<options>
<option name="thinktime" value="0" random="false" override="true"/>
<option name="tcp_snd_buffer" value="4096"/>
<option name="tcp_rcv_buffer" value="4096"/>
<option name="ports_range" min="1025" max="65535"/>
</options>
<sessions>
<session name="mysocket" probability="100" type="ts_raw">
<request>
<raw datasize="1" ack="local"></raw>
</request>
</session>
</sessions>
</tsung>
It tests my socket program, but I cannot fully understand the Tsung report.
Please have a look at this pic: what does "simultaneous" mean?
If anyone can tell me something about the report stats, that would be great.
Stats like the following (taken from Tsung's manual):
users: Number of simultaneous users.
connected: Number of simultaneous connected users. new in 1.2.2.
Is it good or bad if users and connected are low?
Thank you in advance.
The HTTP server will close the TCP socket if there is no activity on it for a while or if it is overloaded. The opening and closing of TCP sockets is handled automatically by Tsung. That said, the 'connected' value is the current number of 'users' that are connected to the server. Hope this helps.
