Dask job fails in Jupyter notebook cell with KilledWorker - jupyter-notebook

I am running a join task in a Jupyter notebook which is producing many warnings from Dask about a possible memory leak before finally failing with a killed worker error:
2022-07-26 21:38:05,726 - distributed.worker_memory - WARNING - Worker is at 85% memory usage. Pausing worker. Process memory: 1.59 GiB -- Worker memory limit: 1.86 GiB
2022-07-26 21:38:06,319 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 1.59 GiB -- Worker memory limit: 1.86 GiB
2022-07-26 21:38:07,501 - distributed.worker_memory - WARNING - Worker tcp://127.0.0.1:46137 (pid=538697) exceeded 95% memory budget. Restarting...
2022-07-26 21:38:07,641 - distributed.nanny - WARNING - Restarting worker
KilledWorker: ("('assign-6881b18750807133ba976bf463a98c23', 0)", <WorkerState 'tcp://127.0.0.1:46137', name: 0, status: closed, memory: 0, processing: 50>)
This happens when I run my code on a laptop with 32GB RAM (Kubuntu 20). Maybe I have not configured Dask correctly for the environment? I can watch the memory usage go up and down in the system monitor but at no point does it consume all the memory. How can I tell Dask to use all the cores and as much memory as it can manage? It seems to be running in single processor mode, maybe because I'm running on a laptop rather than a proper cluster?
For context: I'm joining two datasets, both are text files with sizes 25GB and 5GB. Both files have been read into Dask DataFrame objects using dd.read_fwf(), then I transform a string field on one of the frames, then join (merge) on that field.

There are certain memory constraints for a particular worker, which you can read from here : https://distributed.dask.org/en/stable/worker-memory.html .
Apart from this, you can try increasing the number of workers, threads while initializing the dask client.

Related

Stress-ng - Overload Memory

I want to test the systems reaction to a process that wants to consume more memory than there is available.
I run stress-ng with the following command (on a 6G RAM machine):
stress-ng --vm-bytes 8G --vm-keep -m 1 --aggressive
but I get this error:
stress-ng: error: [5035] stress-ng-vm: gave up trying to mmap, no available memory
Is it possible to force the program to ignore its own secure mechanism ?
try to add this parameter --vm 4
I was having the same problem and it is gone after that.

Mysterious mariadb 10.4.1 ram ussage

i have upgraded from mariadb 10.1.36 to 10.4.8 and i can see mysterious increasing ram ussage on that new version. I also edited innodb_buffer_pool_size ant seems there is no effect if its set to 15M or 4G, ram is just slowly increasing. After while it eat whole ram and oom killer kills mariadb and this is repeating.
My server has 8GB RAM and its increasing like 60-150MB per day. Its not terrible but i have around 150 database servers so its huge problem.
I can temporary fix problem by restarting mariadb and its start again.
Info about database server:
databases: 200+
tables: 28200(141 per database)
average active connections: 100-200
size of stored data: 100-350GB
cpu: 4
ram: 8GB
there is my config:
server-id=101
datadir=/opt/mysql/
socket=/var/lib/mysql/mysql.sock
tmpdir=/tmp/
gtid-ignore-duplicates=True
log_bin=mysql-bin
expire_logs_days=4
wait_timeout=360
thread_cache_size=16
sql_mode="ALLOW_INVALID_DATES"
long_query_time=0.8
slow_query_log=1
slow_query_log_file=/opt/log/slow.log
log_output=TABLE
userstat = 1
user=mysql
symbolic-links=0
binlog_format=STATEMENT
default_storage_engine=InnoDB
slave_skip_errors=1062,1396,1690innodb_autoinc_lock_mode=2
innodb_buffer_pool_size=4G
innodb_buffer_pool_instances=5
innodb_log_file_size=1G
innodb_log_buffer_size=196M
innodb_flush_log_at_trx_commit=1
innodb_thread_concurrency=24
innodb_file_per_table
innodb_write_io_threads=24
innodb_read_io_threads=24
innodb_adaptive_flushing=1
innodb_purge_threads=5
innodb_adaptive_hash_index=64
innodb_flush_neighbors=0
innodb_flush_method=O_DIRECT
innodb_io_capacity=10000
innodb_io_capacity_max=16000
innodb_lru_scan_depth=1024
innodb_sort_buffer_size=32M
innodb_ft_cache_size=70M
innodb_ft_total_cache_size=1G
innodb_lock_wait_timeout=300
slave_parallel_threads=5
slave_parallel_mode=optimistic
slave_parallel_max_queued=10000000
log_slave_updates=on
performance_schema=on
skip-name-resolve
max_allowed_packet = 512M
query_cache_type=0
query_cache_size = 0
query_cache_limit = 1M
query_cache_min_res_unit=1K
max_connections = 1500
table_open_cache=64K
innodb_open_files=64K
table_definition_cache=64K
open_files_limit=1020000
collation-server = utf8_general_ci
character-set-server = utf8
log-error=/opt/log/error.log
log-error=/opt/log/error.log
pid-file=/var/run/mysqld/mysqld.pid
malloc-lib=/usr/lib64/libjemalloc.so.1
I solved it! The problem is memory allocation library.
If you do this SQL query:
SHOW VARIABLES LIKE 'version_malloc_library';
You must to get value "jemalloc" library. If you get only "system", you may have problems.
To change that, you need edit any .conf file in this directory:
/etc/systemd/system/mariadb.service.d/
There, add this line:
Environment="LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1"
(this library file may be in other folder)
Then you must to restart mysqld
service mysqld stop && systemctl daemon-reload && service mysqld start
You got carried away in increasing values in my.cnf.
Many of the caches grow until hitting their limit, hence the memory growth you experienced.
What is the value from SHOW GLOBAL STATUS LIKE 'Max_used_connections';? Having a large max_connections accentuates several of the other values; lower it.
But perhaps the really bad one(s) involve table caches -- which have units of tables, not bytes. Crank these down a lot:
table_open_cache=64K
innodb_open_files=64K
table_definition_cache=64K
I have exactly the same problem. Is it due to a bad configuration? Or is it a bug of the new version?
mariadb 10.1 was updated to 10.3 just when I upgraded Debian 9 to Debian 10. I tried solve the problem with mariadb 10.4 but nothing changed.
I want to downgrade version but I think it's neccesary dump all database and restore it, and that means being hours without service.
I don't think Debian 10 has to do with the issue
Please read my previous comments about alternative memory allocators...
When jemalloc is used:
When default memory allocator used:
Try with tcmalloc
Environment="LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4.5.3.1"

R error: C stack usage too close to limit on small dataset

I have just done a clean install of R on Ubuntu 18.04, and it does not really work.
I can do stuff like make vectors and dataframes, and get summaries. I can also make a histogram with the hist command just fine. However, the plot command does not work.
The following code, which is very basic and should work just fine:
data(faithful)
plot(faithful$eruptions)
runs for about 30 seconds, before giving the following error:
Error: C stack usage 7970244 is too close to the limit
I have seen lots of posts of other people having the same error, but it seems to be because they are dealing with large datasets/lots of recursion or something like that, but I have this problem even with a dataset of just 3 values. R should definitely be able to handle this without me increasing the limit, and it should not take 30 seconds to run.
Does anybody know what the problem could be?
Edit:
Output of ulimit -a:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 28697
max locked memory (kbytes, -l) 16384
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 28697
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Version info:
R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
This is plain R in the terminal, not through any IDE.
Edit 2:
I have discovered that it works just fine on other user accounts on the computer. I've also started getting other weird issues (after a reboot I have to login to a Gnome session, then log out before I can log into plasma, but other users don't have that problem, sometimes the terminal can't launch, lots of things are crashing, etc) so I think this has nothing to do with R and is much bigger. Unfortunately, it might be a hardware issue (this computer has had wacky issues before on other operating systems).

502 Gitlab is taking too much time to respond

After taking gitlab backup everyday gitlab is throwing 502 error.
I saw nginx logs but did not find that much information.
After gitlab-ctl restart it starts working again.
System Configurations:
OS : Ubuntu 16.04 LTS
4 GB Ram
200 GB Disk Space
can anyone give permanent solution for it.
There is a high possibility that it run out of shared memory. As each time after the backup you got the 502 error.
To check it with gitlab-ctl tail tail detail
It will show something like:
2019-04-12_12:37:17.27154 FATAL: could not map anonymous shared memory: Cannot allocate memory
2019-04-12_12:37:17.27157 HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages. To reduce the request size (currently 4345470976 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
2019-04-12_12:37:17.27171 LOG: database system is shut down
Then check it with free -m, which shows there is no available shared memory.
total used free shared buffers cached
Mem: 16081 13715 2365 0 104 753
-/+ buffers/cache: 12857 3223
Then you need to check if there is some process take too many shared memory, or too many zomibe process, then kill it with command like ps -aef | grep ffmpeg | awk '{print $2}' | xargs kill 9
Check it with free -h, there is about 112M shared memory now.
total used free shared buffers cached
Mem: 15G 4.4G 11G 112M 46M 416M
-/+ buffers/cache: 3.9G 11G
Swap: 0B 0B 0B
At last,restart you gitlab with gitlab-ctl restart, after sometime the gitlab booted, the 502 gone.
After long search i got something about it. After taking backup my gitlab-workhorse is getting ideal and gitlab.socket is refusing the connection. As temporary solution i have installed a new cron job for restarting gitlab service after the complpetion of gitlab backup cronjob.
If the gitlab is installed in Virtual-Box - Ubuntu server either 18.04 or 20.04,
please increase the RAM to 4gb and the provide atleast 3 processors.

Not enough space when running Hudson jobs

My Hudson jobs are crashing on each run with this error:
Caused by: java.io.IOException: error=12, Not enough space
at java.lang.UNIXProcess.forkAndExec(Native Method)
I found documention on StackOverflow and on the Jenkins website regarding this error, which indicate a problem of swap space (https://wiki.jenkins-ci.org/display/JENKINS/IOException+Not+enough+space).
However, maybe my problem is different or not, but if I launch the process manually it works fine.
A weird thing is I see different resuls from top of from prstat:
Specs:
Hudson processes are running in their own Unix user
OS: SunOS dc5c00-d12 5.10 Generic_147440-19 sun4v sparc sun4v
Memory:
from top:
32G phys mem, 6255M free mem, 16G total swap, 16G free swap
from prstat
NPROC USERNAME SWAP RSS MEMORY TIME CPU
50 user1 12G 12G 39% 89:02:31 0.3%
36 user2 11G 6779M 21% 155:17:41 0.0%
26 user3 10G 8509M 26% 4787:37:4 8.0%
6 hudson 572M 556M 1.7% 0:08:25 0.0%
57 root 280M 285M 0.9% 138:46:05 0.0%
Can anywone confirm if I have a swap issue? top shows 16GB free...
EDIT:
results from swap -s (after problem being remporarly resolved)
total: 19940168k bytes allocated + 12578048k reserved = 32518216k used, 4118208k available
.
It is certainly a swap issue.
top is reporting as free swap blocks that do not contain paginated data. However, even while unused, some of these blocks can be reserved (i.e untouched still allocated virtual memory). When you have no more blocks to back memory reservations, you got this "Not enough space" exception.
swap -s shows your applications are reserving more that 12 GB while your swap area is just 16 GB. I would double the size of your swap to prevent virtual memory shortage in your case.

Resources