I have a .NET 6 self-contained application running in a K8s pod; the base image is mcr.microsoft.com/dotnet/runtime-deps:6.0-bullseye-slim.
First I opened a shell in the pod and installed the .NET SDK and the diagnostic tools.
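Roughly what I ran inside the pod, reconstructed from memory, so the exact versions and paths may have differed:

apt-get update && apt-get install -y curl
# Install the .NET 6 SDK with the official install script
curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --channel 6.0 --install-dir /usr/share/dotnet
export PATH="$PATH:/usr/share/dotnet:/root/.dotnet/tools"
# Install the heap-dump tool as a global tool
dotnet tool install --global dotnet-gcdump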
Then, when I execute dotnet-gcdump collect -v -p PID, it ends with the following error:
Writing gcdump to '/app/20221224_072905_8.gcdump'...
0.0s: Creating type table flushing task
0.3s: Flushing the type table
0.5s: Done flushing the type table
0.5s: Requesting a .NET Heap Dump
17.0s: gcdump EventPipe Session started
17.1s: Starting to process events
17.2s: .NET Dump Started...
Found a Gen2 Induced non-background GC Start at 3.562 msec GC Count 138
17.3s: Making GC Heap Progress...
30.1s: Timed out after 30 seconds
30.1s: Shutting down gcdump EventPipe session
30.7s: EventPipe Listener dying
31.1s: still reading...
32.1s: still reading...
33.1s: still reading...
34.1s: still reading...
35.1s: still reading...
36.1s: still reading...
37.1s: still reading...
38.1s: still reading...
39.1s: still reading...
40.1s: still reading...
41.1s: still reading...
42.1s: still reading...
43.1s: still reading...
43.3s: gcdump EventPipe session shut down
43.3s: gcdump EventPipe Session closed
43.4s: [Error] Exception during gcdump: System.ApplicationException: ETL file shows the start of a heap dump but not its completion.
at DotNetHeapDumpGraphReader.ConvertHeapDataToGraph() in /_/src/Tools/dotnet-gcdump/DotNetHeapDump/DotNetHeapDumpGraphReader.cs:line 512
at Microsoft.Diagnostics.Tools.GCDump.EventPipeDotNetHeapDumper.DumpFromEventPipe(CancellationToken ct, Int32 processID, MemoryGraph memoryGraph, TextWriter log, Int32 timeout, DotNetHeapInfo dotNetInfo) in /_/src/Tools/dotnet-gcdump/DotNetHeapDump/EventPipeDotNetHeapDumper.cs:line 205
[ 43.6s: Done Dumping .NET heap success=False]
According to this documentation, this error means that dotnet-gcdump was unable to generate a .gcdump file due to missing information, for example: "[Error] Exception during gcdump: System.ApplicationException: ETL file shows the start of a heap dump but not its completion." Alternatively, the .gcdump file doesn't include the entire heap.
Is it not possible to take a gcdump of a self-contained .NET process?
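One thing I still plan to try is raising the 30-second default timeout that shows up in the log; dotnet-gcdump collect has a -t/--timeout option for that. A rough sketch of what I have in mind (the 120-second value is an arbitrary guess), with a full process dump via dotnet-dump as a fallback:

# Retry with a longer timeout (the default is 30 seconds)
dotnet-gcdump collect -v -p PID -t 120
# Fallback: take a full process dump and analyze it offline (requires installing dotnet-dump first)
dotnet-dump collect -p PID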
ArangoDB is failing to start on CentOS 6 with the following error. I'm using the latest ArangoDB version, arangodb3-3.3.16-1.x86_64.rpm.
[root@vm1 RPM]# service arangodb3 start
Starting /usr/sbin/arangod: : arena 0 background thread creation failed (13)
/etc/init.d/arangodb3: line 43: 3576 Segmentation fault $ARANGO_BIN --uid arangodb --gid arangodb --server.rest-server false --log.foreground-tty false --database.check-version
FATAL ERROR: EXIT_CODE_RESOLVING_FAILED for code 139 - could not resolve exit code 139
[root@vm1 RPM]#
Any help would be really appreciated.
Exit code 139 = 128 + 11 means that the arangod process crashed with a segmentation violation (signal 11, SIGSEGV).
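You can confirm the mapping from a shell, since bash's kill -l translates a signal number back to its name:

# 139 - 128 = 11, i.e. SIGSEGV; prints "SEGV"
kill -l $((139 - 128))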
Can you try the following:
Check your server for a faulty memory bank.
Run a memory test with memtester (a minimal invocation is sketched below).
This should fix the problem.
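A minimal way to run such a test (memtester has to be installed first, e.g. from EPEL or built from source; adjust the amount of memory and the number of passes to your machine):

# Test 1024 MB of RAM for 3 passes; run it while the box is otherwise idle
memtester 1024M 3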
There was a similar issue that has been raised on GitHub: https://github.com/arangodb/arangodb/issues/2329
I'm seeing an IO error on the Riak console. I'm not sure what the cause is, as the owner of the directory is riak. Here's what the error looks like:
2018-01-25 23:18:06.922 [info] <0.2301.0>#riak_kv_vnode:maybe_create_hashtrees:234 riak_kv/730750818665451459101842416358141509827966271488: unable to start index_hashtree: {error,{{badmatch,{error,{db_open,"IO error: lock /var/lib/riak/anti_entropy/v0/730750818665451459101842416358141509827966271488/LOCK: already held by process"}}},[{hashtree,new_segment_store,2,[{file,"src/hashtree.erl"},{line,725}]},{hashtree,new,2,[{file,"src/hashtree.erl"},{line,246}]},{riak_kv_index_hashtree,do_new_tree,3,[{file,"src/riak_kv_index_hashtree.erl"},{line,712}]},{lists,foldl,3,[{file,"lists.erl"},{line,1248}]},{riak_kv_index_hashtree,init_trees,3,[{file,"src/riak_kv_index_hashtree.erl"},{line,565}]},{riak_kv_index_hashtree,init,1,[{file,"src/riak_kv_index_hashtree.erl"},{line,308}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}}
2018-01-25 23:18:06.927 [info] <0.2315.0>#riak_kv_vnode:maybe_create_hashtrees:234 riak_kv/890602560248518965780370444936484965102833893376: unable to start index_hashtree: {error,{{badmatch,{error,{db_open,"IO error: lock /var/lib/riak/anti_entropy/v0/890602560248518965780370444936484965102833893376/LOCK: already held by process"}}},[{hashtree,new_segment_store,2,[{file,"src/hashtree.erl"},{line,725}]},{hashtree,new,2,[{file,"src/hashtree.erl"},{line,246}]},{riak_kv_index_hashtree,do_new_tree,3,[{file,"src/riak_kv_index_hashtree.erl"},{line,712}]},{lists,foldl,3,[{file,"lists.erl"},{line,1248}]},{riak_kv_index_hashtree,init_trees,3,[{file,"src/riak_kv_index_hashtree.erl"},{line,565}]},{riak_kv_index_hashtree,init,1,[{file,"src/riak_kv_index_hashtree.erl"},{line,308}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}}
2018-01-25 23:18:06.928 [error] <0.27284.0> CRASH REPORT Process <0.27284.0> with 0 neighbours exited with reason: no match of right hand value {error,{db_open,"IO error: lock /var/lib/riak/anti_entropy/v0/890602560248518965780370444936484965102833893376/LOCK: already held by process"}} in hashtree:new_segment_store/2 line 725 in gen_server:init_it/6 line 328
Any ideas on what the problem could be?
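In case it helps, the next check I had in mind is whether a leftover Riak (beam.smp) process is still holding the LevelDB LOCK file mentioned in the log, along these lines (this assumes lsof is available; the partition path is copied from the first log line):

# Is more than one Riak VM running?
ps aux | grep '[b]eam.smp'
# Which process, if any, holds the LOCK file for that partition?
lsof /var/lib/riak/anti_entropy/v0/730750818665451459101842416358141509827966271488/LOCK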
I'm running the Developer Edition of Realm Object Server v1.8.3 as a Mac app. I start it with start-object-server.command. It had been running fine for a number of days and everything was working really well, but ROS now crashes within seconds of starting it.
Clearly the issue is with the JavaScript element, but I'm not sure what led to this state, nor how best to recover from it. I have not created any additional functions, so I'm not adding any Node.js issues of my own: it's just ROS with half a dozen realms.
The stack dump I get from the terminal session is below. Any thoughts on recovery steps, and on how to prevent this happening again, would be appreciated.
Last few GCs
607335 ms: Mark-sweep 1352.1 (1404.9) -> 1351.7 (1402.9) MB, 17.4 / 0.0 ms [allocation failure] [GC in old space requested].
607361 ms: Mark-sweep 1351.7 (1402.9) -> 1351.7 (1367.9) MB, 25.3 / 0.0 ms [last resort gc].
607376 ms: Mark-sweep 1351.7 (1367.9) -> 1351.6 (1367.9) MB, 15.3 / 0.0 ms [last resort gc].
JS stacktrace
Security context: 0x3eb4332cfb39
1: DoJoin(aka DoJoin) [native array.js:~129] [pc=0x1160420f24ad] (this=0x3eb433204381 ,w=0x129875f3a8b1 ,x=3,N=0x3eb4332043c1 ,J=0x3828ea25c11 ,I=0x3eb4332b46c9 )
2: Join(aka Join) [native array.js:180] [pc=0x116042067e32] (this=0x3eb433204381 ,w=0x129875f3a8b1
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: node::Abort() [/Applications/realm-mobile-platform/realm-object-server/.prefix/bin/node]
2: node::FatalException(v8::Isolate*, v8::Local<v8::Value>, v8::Local<v8::Message>) [/Applications/realm-mobile-platform/realm-object-server/.prefix/bin/node]
3: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [/Applications/realm-mobile-platform/realm-object-server/.prefix/bin/node]
4: v8::internal::Factory::NewRawTwoByteString(int, v8::internal::PretenureFlag) [/Applications/realm-mobile-platform/realm-object-server/.prefix/bin/node]
5: v8::internal::Runtime_StringBuilderJoin(int, v8::internal::Object**, v8::internal::Isolate*) [/Applications/realm-mobile-platform/realm-object-server/.prefix/bin/node]
6: 0x1160411092a7
/Applications/realm-mobile-platform/start-object-server.command: line 94: 39828 Abort trap: 6 node "$package/node_modules/.bin/realm-object-server" -c configuration.yml (wd: /Applications/realm-mobile-platform/realm-object-server/object-server)
Your ROS instance has run out of memory. To figure out why it is running out of memory, it would be helpful to see the server's log file. Can you turn on debug-level logging?
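If your configuration.yml still follows the sample that ships with the developer edition, this is roughly the change (treat the key name as an assumption and check the file for the logging section first; the path comes from the crash output above):

cd /Applications/realm-mobile-platform/realm-object-server/object-server
# Look for the logging section and its current level
grep -n -A 2 'logging' configuration.yml
# Sketch only: if a "level:" key is present, switch it from info to debug, then restart ROS
sed -i.bak 's/level: info/level: debug/' configuration.yml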
If you want to send a log file to Realm, it is better to open an issue for this at https://github.com/realm/realm-mobile-platform/issues.
I have set up a Spark cluster on EC2 with 20 nodes, listed all the node IPs in conf/slaves on the master, and launched a job with SparkR and 50 slices. My nodes are dual-core with 4 GB of memory, and at the end of my job I collect the results into a CSV file, which should contain about 15,000 lines (and 7 columns of floats). The job runs fine for a while (6000 s) until I get the following error from the master (this is not from the Spark master log, but from the terminal window where I execute the Spark job):
16/03/21 22:39:31 INFO TaskSetManager: Finished task 27.0 in stage 0.0 (TID 27) in 5954810 ms on ip-xxx-yy-xx-zzz.somewhere.compute.internal (8/40)
16/03/21 22:39:38 INFO TaskSetManager: Finished task 12.0 in stage 0.0 (TID 12) in 5962190 ms on ip-xxx-xx-xx-xxx.somewhere.compute.internal (9/40)
Error in if (returnStatus != 0) { : argument is of length zero
Calls: <Anonymous> -> <Anonymous> -> .local -> callJMethod -> invokeJava
Execution halted
16/03/21 22:40:16 INFO SparkContext: Invoking stop() from shutdown hook
16/03/21 22:40:16 INFO SparkUI: Stopped Spark web UI at http://172.31.21.134:4040
16/03/21 22:40:16 INFO DAGScheduler: Job 0 failed: collect at NativeMethodAccessorImpl.java:-2, took 6001.135894 s
16/03/21 22:40:16 INFO DAGScheduler: ShuffleMapStage 0 (RDD at RRDD.scala:36) failed in 6000.500 s
16/03/21 22:40:16 ERROR RBackendHandler: collect on 16 failed
16/03/21 22:40:16 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@6c9d21b2)
16/03/21 22:40:16 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(0,1458600016592,JobFailed(org.apache.spark.SparkException: Job 0 cancelled because SparkContext was shut down))
16/03/21 22:40:16 INFO SparkDeploySchedulerBackend: Shutting down all executors
I checked the worker logs and I see the following two lines at the end of the log file:
16/03/21 22:40:16 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
16/03/21 22:40:16 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
and then the log stops abruptly (no other errors or warnings before that).
I don't see any hint in the logs as to what could cause the crash; my only guess is that it is an out-of-memory error, because the job runs fine on a reduced input dataset. Am I missing something?
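If it does turn out to be memory, my current plan is to resubmit with more driver and executor memory, roughly as below (my_job.R is a placeholder for my actual script, and the flags are the standard spark-submit ones):

# Resubmit the SparkR job with more memory on the driver and on each executor
./bin/spark-submit \
  --master spark://<master-ip>:7077 \
  --driver-memory 3g \
  --executor-memory 3g \
  my_job.R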