Can I decide how much memory to allocate in an LSF queue - Unix

Is there any option to decide how much memory I can allocate in LSF?
I tried
bsub -R "rusage[mem=10000]" sleep 1000s
But when I checked the resource usage with "bjobs -l"
I get this:
Job <203180>, User <xxxxx>, Project <default>, Status <RUN>, Queue <medium>, Job Priority <50>, Command <sleep 1000s>
Thu Apr 12 09:49:56: Submitted from host <xxxx>, CWD <xx>, Requested Resources <rusage[mem=10000]>;
Thu Apr 12 09:49:58: Started on <xxxx>, Execution Home <xxxx>, Execution CWD <xxxxx>;
Thu Apr 12 09:49:58: Resource usage collected.
MEM: 3 Mbytes; SWAP: 16 Mbytes; NTHREAD: 1
PGID: 28231; PIDs: 28231
Where am I wrong?

bsub -R "rusage[mem=10000]" will initially reserve 10000 MBytes of memory for the job.
Whereas:
"MEM: 3 Mbytes" is the total resident memory usage of all currently running processes in your job.
"SWAP: 16 Mbytes" is the total virtual memory usage of all currently running processes in your job.
The values "3 Mbytes" and "16 Mbytes" may change over the job's runtime.

On my system we use -M; for example, bsub -M 1 requests a 1 GB memory limit, and the job is killed if it goes above that limit.
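For completeness, a minimal sketch combining the reservation with an enforced limit (this assumes memory units of MB; the unit used by -M depends on the cluster's LSF_UNIT_FOR_LIMITS setting):
# reserve 10000 MB at scheduling time and enforce a hard per-job memory limit
bsub -R "rusage[mem=10000]" -M 10000 sleep 1000s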

Related

Airflow simple tasks failing without logs with small parallelism LocalExecutor (was working with SequentialExecutor)

Running an Airflow (v1.10.5) DAG that ran fine with the SequentialExecutor, I now have many (though not all) simple tasks failing without any log information when using the LocalExecutor with minimal parallelism, e.g.
<airflow.cfg>
# overall task concurrency limit for airflow
parallelism = 8 # same as the number of cores shown by lscpu
# max tasks per dag
dag_concurrency = 2
# max instances of a given dag that can run on airflow
max_active_runs_per_dag = 1
# max threads used per worker / core
max_threads = 2
# 40G of RAM available total
# CPUs: 8 (2 sockets, 4 cores per socket)
see https://www.astronomer.io/guides/airflow-scaling-workers/
Looking at the airflow-webserver.* logs nothing looks out of the ordinary, but looking at airflow-scheduler.out I see...
[airflow@airflowetl airflow]$ tail -n 20 airflow-scheduler.out
....
[2019-12-18 11:29:17,773] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table1 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status failed for try_number 1
[2019-12-18 11:29:17,779] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table2 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status failed for try_number 1
[2019-12-18 11:29:17,782] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table3 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status failed for try_number 1
[2019-12-18 11:29:18,833] {scheduler_job.py:832} WARNING - Set 1 task instances to state=None as their associated DagRun was not in RUNNING state
[2019-12-18 11:29:18,844] {scheduler_job.py:1283} INFO - Executor reports execution of mydag.task_level1_table4 execution_date=2019-12-18 21:21:48.424900+00:00 exited with status success for try_number 1
....
but I'm not really sure what to take away from this.
Anyone know what could be going on here or how to get more helpful debugging info?
Looking again at my lscpu specs, I noticed...
[airflow@airflowetl airflow]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
Notice Thread(s) per core: 1
Looking at my airflow.cfg settings, I see max_threads = 2. Setting max_threads = 1 and restarting the scheduler seems to have fixed the problem.
If anyone knows more about what exactly is going wrong under the hood (e.g. why the task fails rather than just waiting for another thread to become available), I would be interested to hear about it.
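For reference, a minimal sketch of the fix described above (the sed one-liner and the $AIRFLOW_HOME path are assumptions; editing airflow.cfg by hand works just as well):
# drop the scheduler's max_threads to match Thread(s) per core: 1, then restart the scheduler
sed -i 's/^max_threads = 2/max_threads = 1/' "$AIRFLOW_HOME/airflow.cfg"
airflow scheduler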

snakemake always reports "MissingOutputException in line 44, Missing files after 5 seconds"

I always get the same error report from snakemake in my RNA-seq pipeline:
MissingOutputException in line 44 of /root/s/r/snakemake/my_rnaseq_data/Snakefile:
Missing files after 5 seconds:
03_align/wt2.bam
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Here is my Snakefile:
SBT = ["wt1", "wt2", "epcr1", "epcr2"]

rule all:
    input:
        expand("02_clean/{nico}_1.paired.fq", nico=SBT),
        expand("02_clean/{nico}_2.paired.fq", nico=SBT),
        expand("03_align/{nico}.bam", nico=SBT)

rule trim:
    input:
        "01_raw/{nico}_1.fastq",
        "01_raw/{nico}_2.fastq"
    output:
        "02_clean/{nico}_1.paired.fq.gz",
        "02_clean/{nico}_1.unpaired.fq.gz",
        "02_clean/{nico}_2.paired.fq.gz",
        "02_clean/{nico}_2.unpaired.fq.gz",
    shell:
        "java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 {input[0]} {input[1]} {output[0]} {output[1]} {output[2]} {output[3]} ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &"

rule gzip:
    input:
        "02_clean/{nico}_1.paired.fq.gz",
        "02_clean/{nico}_2.paired.fq.gz"
    output:
        "02_clean/{nico}_1.paired.fq",
        "02_clean/{nico}_2.paired.fq"
    run:
        shell("gzip -d {input[0]} > {output[0]}")
        shell("gzip -d {input[1]} > {output[1]}")

rule map:
    input:
        "02_clean/{nico}_1.paired.fq",
        "02_clean/{nico}_2.paired.fq"
    output:
        "03_align/{nico}.sam"
    log:
        "logs/map/{nico}.log"
    threads: 40
    shell:
        "hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 {input[0]} -2 {input[1]} -S {output} >{log} 2>&1 &"

rule sort2bam:
    input:
        "03_align/{nico}.sam"
    output:
        "03_align/{nico}.bam"
    threads: 30
    shell:
        "samtools sort -@ 20 -m 20G -o {output} {input} &"
Everything is fine until I add the "rule sort2bam" part.
When I dry-run, it goes fine. But when I execute it, it reports the error described above. Surprisingly, the task it reports as stuck does run in the background, but only ever that one task, like this:
rule sort2bam:
input: 03_align/epcr1.sam
output: 03_align/epcr1.bam
jobid: 11
wildcards: nico=epcr1
Waiting at most 5 seconds for missing files.
MissingOutputException in line 45 of /root/s/r/snakemake/my_rnaseq_data/Snakefile:
Missing files after 5 seconds:
03_align/epcr1.bam
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
[Sat Apr 27 06:10:22 2019]
rule sort2bam:
input: 03_align/wt1.sam
output: 03_align/wt1.bam
jobid: 9
wildcards: nico=wt1
Waiting at most 5 seconds for missing files.
MissingOutputException in line 45 of /root/s/r/snakemake/my_rnaseq_data/Snakefile:
Missing files after 5 seconds:
03_align/wt1.bam
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
[Sat Apr 27 06:23:13 2019]
rule sort2bam:
input: 03_align/wt2.sam
output: 03_align/wt2.bam
jobid: 6
wildcards: nico=wt2
Waiting at most 5 seconds for missing files.
MissingOutputException in line 44 of /root/s/r/snakemake/my_rnaseq_data/Snakefile:
Missing files after 5 seconds:
03_align/wt2.bam
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
I don't know what's wrong with my code. Any ideas? Thanks in advance!
As you figured out, & is the problem. The control operator & makes your command run in the background in a subshell, and this leads snakemake to think that the job is complete when in fact it is not. In your case, its usage doesn't appear to be required.
From man bash on usage of & (stolen from this answer):
If a command is terminated by the control operator &, the shell executes the command in the background in a subshell. The shell does not wait for the command to finish, and the return status is 0.
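You can see this behavior in isolation with a plain shell one-liner, nothing snakemake-specific:
# the shell backgrounds the command and returns immediately with status 0,
# long before sleep has actually finished
sleep 30 &
echo $?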
I know how to solve it, but I don't know why it works!
Just delete the '&' in
samtools sort -@ 20 -m 20G -o {output} {input} &
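For clarity, this is the corrected shell directive for rule sort2bam, with the trailing & removed so samtools runs in the foreground and snakemake sees its real exit status ({output} and {input} are expanded by snakemake); the same applies to the & at the end of the trim and map shell commands:
samtools sort -@ 20 -m 20G -o {output} {input}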

intel_pt data cannot be imported properly into Intel VTune 2018

I am using Intel VTune 2018 to profile and derive control-flow dependencies using the intel_pt PMU, on the following system:
Kernel: 4.15.0-13-generic, 64bit Ubuntu
CPU: Intel® Core™ i7-7820X @ 3.60GHz × 16
I started with the following commands:
1- amplxe-perf record -o a.perf -T -e intel_pt// -- ps
PID TTY TIME CMD
21471 pts/1 00:00:00 amplxe-perf
21472 pts/1 00:00:00 ps
58693 pts/1 00:00:00 sudo
58694 pts/1 00:00:00 su
58695 pts/1 00:00:00 bash
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 3.154 MB a.perf ]
2- amplxe-cl -import a.perf -r folder
amplxe: Importing a new result 100 % done
amplxe: Using result path `/home/amad/May2/folder'
amplxe: Executing actions 12 % Loading 'a.perf' file
amplxe: Error: Cannot load data file `/home/amad/May2/folder/data.0/a.perf' (Data file is corrupted).
amplxe: Executing actions 50 % done
amplxe: Error: 0x4000001e (Cannot load raw collector data)
Although the intel_pt data was not imported successfully, data for other kernel PMU events like "cpu-cycles" and "instructions" was handled properly:
1- amplxe-perf record -o p.perf -T -e cpu-cycles,instructions -- ps
PID TTY TIME CMD
8410 pts/0 00:00:00 sudo
8458 pts/0 00:00:00 amplxe-perf
8467 pts/0 00:00:00 ps
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.024 MB p.perf (96 samples) ]
2- amplxe-cl -import p.perf -r r2
amplxe: Importing a new result 100 % done
amplxe: Using result path `/home/amad/r2'
amplxe: Executing actions 19 % Resolving information for `libprocps.so.6.0.0'
amplxe: Warning: Cannot locate debugging information for file `/lib/x86_64-linux-gnu/libprocps.so.6.0.0'.
amplxe: Executing actions 21 % Resolving information for `vmlinux'
amplxe: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis will not be possible. Function-level analysis will be limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.
amplxe: Executing actions 75 % Generating a report
Collection and Platform Info
----------------------------
Parameter r2
---------------- ------------------------------------
Operating System 4.15.0-13-generic
Computer Name amad-pc
Result Size 2766877
Collector Type Driverless Perf per-process sampling
CPU
---
Parameter r2
----------------- ----------
Frequency 3600000000
Logical CPU Count 16
Summary
-------
Elapsed Time: 0.011
Paused Time: 0.0
CPU Time: 0.011
Average CPU Utilization: 0.897
Event summary
-------------
Hardware Event Type Hardware Event Count:Self Hardware Event Sample Count:Self Events Per Sample
------------------- ------------------------- -------------------------------- -----------------
cpu-cycles 40521584 45 4000
instructions 36302909 51 4000
amplxe: Executing actions 100 % done
What is wrong with the intel_pt data?
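One hedged way to narrow this down, assuming amplxe-perf behaves like the stock perf binary it wraps, is to check whether the intel_pt trace decodes outside VTune at all:
# try to decode the PT trace with perf's own itrace support;
# if this also fails, the trace itself (not the VTune import) is suspect
amplxe-perf script -i a.perf --itrace=i100us | head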

502 Gitlab is taking too much time to respond

After taking the GitLab backup every day, GitLab throws a 502 error.
I looked at the nginx logs but did not find much information there.
After gitlab-ctl restart it starts working again.
System configuration:
OS: Ubuntu 16.04 LTS
4 GB RAM
200 GB disk space
Can anyone give a permanent solution for this?
There is a high possibility that it ran out of shared memory, since you get the 502 error each time right after the backup.
Check it with gitlab-ctl tail.
It will show something like:
2019-04-12_12:37:17.27154 FATAL: could not map anonymous shared memory: Cannot allocate memory
2019-04-12_12:37:17.27157 HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages. To reduce the request size (currently 4345470976 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
2019-04-12_12:37:17.27171 LOG: database system is shut down
Then check it with free -m, which shows there is no available shared memory.
total used free shared buffers cached
Mem: 16081 13715 2365 0 104 753
-/+ buffers/cache: 12857 3223
Then you need to check whether some process is taking too much shared memory, or whether there are too many zombie processes, and kill them with a command like ps -aef | grep ffmpeg | awk '{print $2}' | xargs kill -9
Check it again with free -h; there is about 112M of shared memory now.
total used free shared buffers cached
Mem: 15G 4.4G 11G 112M 46M 416M
-/+ buffers/cache: 3.9G 11G
Swap: 0B 0B 0B
Finally, restart your GitLab with gitlab-ctl restart; after some time GitLab boots up and the 502 is gone.
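If the error keeps returning, the HINT in the log points at PostgreSQL's shared memory request; on an Omnibus install that can be lowered in /etc/gitlab/gitlab.rb (the 256MB value below is only an assumption, tune it to your RAM):
# set postgresql['shared_buffers'] = "256MB" in /etc/gitlab/gitlab.rb, then apply it
sudo vi /etc/gitlab/gitlab.rb
sudo gitlab-ctl reconfigure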
After a long search I found something about it. After taking the backup, my gitlab-workhorse goes idle and gitlab.socket refuses the connection. As a temporary solution I have installed a new cron job that restarts the GitLab service after the completion of the GitLab backup cron job.
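A minimal sketch of that workaround as crontab entries (the times, paths, and CRON=1 flag are assumptions; schedule the restart after your backup normally finishes):
# nightly backup at 02:00, then restart GitLab once the backup window has passed
0 2 * * * /opt/gitlab/bin/gitlab-rake gitlab:backup:create CRON=1
0 3 * * * /opt/gitlab/bin/gitlab-ctl restart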
If GitLab is installed in VirtualBox on Ubuntu Server 18.04 or 20.04, please increase the RAM to 4 GB and provide at least 3 processors.

Error establishing a database connection EC2 Amazon

I hope you can help me. I cannot stand having to keep restarting my EC2 instance on Amazon.
I have two WordPress sites hosted there. My sites always worked well until two months ago, when one of them started having this problem. I tried everything, and the only solution was to reconfigure it.
Now that all was right with both sites, the second one has started having the same problem. I think Amazon is clowning me.
I am using a free micro instance. If anyone knows what the problem is, please help me!
Your issue will be the limited memory allocated to T1 Micro instances in EC2. I'm assuming you are using AMI Linux (Amazon Linux) in this case; if an alternate version of Linux is used then you may have different locations for your log and config files.
Make sure you are the root user.
Have a look at your MySQL logs in the following location:
/var/log/mysqld.log
If you see repeated instances of the following it's pretty certain that the 0.6GB of memory allocated to the micro instance is not cutting it.
150714 22:13:33 InnoDB: Initializing buffer pool, size = 12.0M
InnoDB: mmap(12877824 bytes) failed; errno 12
150714 22:13:33 InnoDB: Completed initialization of buffer pool
150714 22:13:33 InnoDB: Fatal error: cannot allocate memory for the buffer pool
150714 22:13:33 [ERROR] Plugin 'InnoDB' init function returned error.
150714 22:13:33 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
150714 22:13:33 [ERROR] Unknown/unsupported storage engine: InnoDB
150714 22:13:33 [ERROR] Aborting
You will notice in the log excerpt above that my buffer pool size is set to 12MB. This can be configured by adding the line innodb_buffer_pool_size = 12M to your MySQL config file /etc/my.cnf.
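For example, a minimal sketch of making that change from the shell (the sed one-liner assumes a [mysqld] section header exists in /etc/my.cnf; editing the file by hand works just as well):
# add the buffer pool setting directly under the [mysqld] section header
sed -i '/^\[mysqld\]/a innodb_buffer_pool_size = 12M' /etc/my.cnf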
A pretty good way to deal with InnoDB chewing up your memory is to create a swap file.
Start by checking the status of your memory:
free -m
You will most probably see that your swap is not doing much:
total used free shared buffers cached
Mem: 592 574 17 0 15 235
-/+ buffers/cache: 323 268
Swap: 0 0 0
To start ensure you are logged in as the root user and run the following command:
dd if=/dev/zero of=/swapfile bs=1M count=1024
Wait for a bit as the command is not verbose but you should see the following response after about 15 seconds when the process is complete:
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 31.505 s, 34.1 MB/s
Next set up the swapspace with:
mkswap /swapfile
Now set up the swap event:
swapon /swapfile
If you get a permissions response you can ignore it or address the swap file by changing the permissions to 600 with the chmod command.
chmod 600 /swapfile
Now add the following line to /etc/fstab to create the swap spaces on server start:
/swapfile swap swap defaults 0 0
Restart your MySQL instance:
service mysqld restart
Finally check to see if your swap file is working correctly with the free -m command.
You should see something like:
total used free shared buffers cached
Mem: 592 575 16 0 16 235
-/+ buffers/cache: 323 269
Swap: 1023 0 1023
Hope this helps.
