Wget in parallel: what can I do to improve the download speed? - unix

I'm trying to make a web crawler using wget. The crawler only fetches the homepages of subdomains, and I'm running it like this:
cat urls.txt | xargs -n 1 -P 800 -I {} wget {} --max-redirect 3 --tries=1 --no-check-certificate --read-timeout=95 --no-dns-cache --connect-timeout=60 --dns-timeout=45 -q
When I run that, I only get speeds of ~5 Mbps. The server I'm crawling from has a 100 Mbps connection and can download files from individual sites at 20 Mbps or more.
What can I do to speed up this crawler?
Note:
The nameserver is Google DNS (8.8.8.8)
I have these ulimits
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 254243
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 100024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 254243
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
and have tried these speed tweaks:
echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
echo 30 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 5 > /proc/sys/net/ipv4/tcp_keepalive_probes
echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
echo 1024 65535 > /proc/sys/net/ipv4/ip_local_port_range

Related

Error: C stack usage when compile R Markdown

I get a new error. When I try to compile an R Markdown file, the following message appears:
Error: C stack usage 7971408 is too close to the limit
Execution halted
I did some research and I found some people with the same error:
Error: C stack usage is too close to the limit
C stack usage 7970960 is too close to the limit
GenomicRanges: C stack usage ... is too close to the limit
R mapping (C stack usage 7971616 is too close to the limit)
C stack usage 7972356 is too close to the limit #335
But in those cases the problem was tied to a specific function or something similar.
What I have done so far to try to solve this:
Uninstalled R and RStudio, reinstalled the latest versions of both, rebooted my computer... nothing.
Tried to change ulimit -s. This point is interesting, because this is my ulimit -a in the R terminal:
geomicrobio-mac:~ geomicrobio$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 10240
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 1392
virtual memory (kbytes, -v) unlimited
When I try to change ulimit -s to unlimited or 65532 in the R terminal, it doesn't change.
The ulimit -a of my terminal (macOS Monterey v12.0.1) is:
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 65532
-c: core file size (blocks) 0
-v: address space (kbytes) unlimited
-l: locked-in-memory size (kbytes) unlimited
-u: processes 1392
-n: file descriptors 2560
This only happens with R Markdown. I can build Shiny apps, run scripts, etc., but I can't compile any R Markdown file, even one that contains only text.
This is the output when I run base::Cstack_info() in the console:
      size    current  direction eval_depth
   7969177      14032          1          2
My version of R:
platform x86_64-apple-darwin17.0
arch x86_64
os darwin17.0
system x86_64, darwin17.0
status
major 4
minor 1.2
year 2021
month 11
day 01
svn rev 81115
language R
version.string R version 4.1.2 (2021-11-01)
nickname Bird Hippie
If you know how to solve this, I would really appreciate your help.
Thank you.
I just deleted the .Rprofile file, and that fixed it.
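Deleting the profile is hard to undo, so a more cautious variant of the same fix is to move it aside first and check whether the document then compiles. The sketch below assumes a POSIX shell. disable_rprofile is a hypothetical helper name, not part of R or RStudio, and the default path assumes the profile lives in the home directory (R also looks for a .Rprofile in the current working directory, so a project-level file may need the same treatment).

```shell
# Hypothetical helper: move an R startup profile aside instead of deleting it.
# Usage: disable_rprofile            (defaults to ~/.Rprofile)
#        disable_rprofile /path/to/project/.Rprofile
disable_rprofile() {
  profile="${1:-$HOME/.Rprofile}"
  if [ -f "$profile" ]; then
    # Keep a copy so the change can be reverted by renaming it back.
    mv "$profile" "$profile.disabled"
    echo "moved $profile to $profile.disabled"
  else
    echo "no profile found at $profile"
  fi
}
```

If compiling works afterwards, the profile contents can be restored a few lines at a time to find the one that triggers the error.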

Asterisk EAGI audio while running AMD or other asterisk app via "EXEC"

Is it possible to use "AMD" to detect silence in EAGI script and receive the audio on fd 3 at the same time?
Is this scenario supported, or am I doing something wrong?
Simple demonstration bash script, which is run as EAGI(/home/agi/eagi.sh) from asterisk:
#!/bin/bash
log=/tmp/eagi.log
# Read all variables sent by Asterisk to array
declare -a array
while read -e ARG && [ "$ARG" ] ; do
array=(` echo $ARG | sed -e 's/://'`)
export ${array[0]}=${array[1]}
echo $ARG | sed -e 's/://' >>$log
done
/usr/bin/dd if=/dev/fd/3 of=/tmp/eagi.tmp.out &>>$log &
### or just sleep 10 ###
sleep 1
echo "EXEC AMD"
read line # blocks until silence is detected by AMD
echo $line >>$log
sleep 1
### ###
kill -USR1 %1; sleep 0.1; kill %1
ls -lh /tmp/eagi.tmp.out >>$log
echo "EXEC HANGUP "
read line
echo $line >>$log
exit
The script starts capturing the audio data from fd 3 via dd, which runs as a background process. When I use just sleep 10 instead of echo "EXEC AMD", dd has recorded the full audio file after the 10 seconds.
However with "AMD", dd stops receiving data on fd 3 as soon as the "AMD" is executed (confirmed also via strace) and continues after "AMD" finishes. So while "AMD" is running, no audio is recorded.
Output in the logfile looks like this:
Working (with just sleep):
1522+501 records in
1897+0 records out
971264 bytes (971 kB, 948 KiB) copied, 10.0023 s, 97.1 kB/s
-rw-r--r-- 1 asterisk asterisk 958K Sep 24 10:16 /tmp/eagi.tmp.out
Non-working (with "AMD" which detected silence after 6 seconds, and dd was running the whole time but only 1 second before and 1 second after "AMD" was recorded into the file):
322+101 records in
397+0 records out
203264 bytes (203 kB, 198 KiB) copied, 8.06516 s, 25.2 kB/s
-rw-r--r-- 1 asterisk asterisk 208K Sep 24 10:13 /tmp/eagi.tmp.out
So is this some kind of bug in Asterisk, or just unsupported usage? I didn't find much info about EAGI in the Asterisk documentation, so I'm not sure what is supported and what is not. The Asterisk version is 16.2.1 on Debian 10. The test call was made via a webphone in the Chrome browser, and the audio passed via fd 3 was 48 kHz, 16-bit, mono (maybe with some other audio format/codec, both fd 3 and "AMD" would work at the same time?).
EDIT2: Removed info about my complicated setup and added simple reproducible example.
EDIT3: During further debugging I used "EXEC Background" to output some short audio file to the caller and also during this no audio was recorded. So the issue seems to be not only with "EXEC AMD", but also "EXEC Background" and probably also other asterisk applications invoked by "EXEC".
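As a sanity check on the two dd results above, the byte counts can be converted into seconds of audio. This is only a sketch: it assumes fd 3 carries raw, headerless PCM in the stated 48 kHz, 16-bit, mono format, and audio_ms is a helper name made up for this illustration.

```shell
# Assumed stream format, taken from the question: 48 kHz, 16-bit, mono raw PCM.
rate=48000
bits=16
channels=1
bytes_per_sec=$(( rate * bits / 8 * channels ))

# Hypothetical helper: convert a byte count into milliseconds of audio.
audio_ms() {
  echo $(( $1 * 1000 / bytes_per_sec ))
}

echo "bytes per second: $bytes_per_sec"
echo "sleep run: $(audio_ms 971264) ms of audio"
echo "AMD run:   $(audio_ms 203264) ms of audio"
```

Under that assumption the sleep run holds about 10.1 seconds of audio and the AMD run about 2.1 seconds, which is consistent with only the second before and the second after "AMD" ending up in the file.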

Why isn't rsync faster at copying a modified file locally?

$ dd if=/dev/urandom of=1 bs=1048576 count=3
3+0 records in
3+0 records out
3145728 bytes transferred in 0.263337 secs (11945641 bytes/sec)
$ rsync -avz 1 2
building file list ... done
1
sent 3147373 bytes received 42 bytes 6294830.00 bytes/sec
total size is 3145728 speedup is 1.00
$ dd if=/dev/urandom of=new_prefix bs=1048576 count=3
3+0 records in
3+0 records out
3145728 bytes transferred in 0.276985 secs (11357037 bytes/sec)
$ cat 1 >> new_prefix
$ rsync -avz new_prefix 2
building file list ... done
new_prefix
sent 6294646 bytes received 42 bytes 4196458.67 bytes/sec
total size is 6291456 speedup is 1.00
Why aren't I receiving any speed-up when adding a prefix to the file? AFAIK, rsync shouldn't just yield speedups for in-place modifications.
So what you're doing is:
1. Use rsync to copy a local file 1 to 2.
2. Make a new file new_prefix that is the same as 1 but has some more data inserted at the start.
3. Copy new_prefix on top of 2.
Think about what rsync has to do to execute step 3.
Remember there is no OS interface to say "insert data at the start of a file": the only option is to rewrite the entire file 2. So rsync has to read the entire new_prefix file, and then write the entire 2. The IO is the limiting factor and there's no magic way around it.
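If file 2 were remote, rsync's delta-transfer algorithm could make use of the similarity to send less network traffic, and would probably show a speedup. For purely local copies, rsync defaults to --whole-file and skips that algorithm altogether.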
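To make the "rewrite the entire file" point concrete, here is a minimal sketch of what prepending data looks like with ordinary shell tools. prepend_file is a hypothetical helper written for this illustration, not something rsync provides, and it assumes both arguments are regular files and that the target's directory is writable.

```shell
# Hypothetical helper: put the contents of PREFIX in front of TARGET.
# There is no "insert at offset 0" operation, so the only way is to write
# a complete new file and then swap it into place.
# Usage: prepend_file prefix.bin target.bin
prepend_file() {
  prefix="$1"
  target="$2"
  tmp="$target.tmp.$$"
  # Every byte of the old target is read and written again here.
  cat "$prefix" "$target" > "$tmp" && mv "$tmp" "$target"
}
```

Even though only the beginning of the file is new, the amount of disk I/O is proportional to the full size of the result, which is why no speedup shows up locally.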

Buffered and Cache memory in Solaris

How can I get the buffer memory, cache memory, and block in/out figures on Solaris? For example, on Linux I can get them using vmstat. vmstat on Linux gives:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
Whereas vmstat on Solaris doesn't give buff and cache under ------memory----, and there is no -----io---- section either. How can I get these fields on Solaris?
Kernel memory:
kstat -p > /var/tmp/kstat-p
More detailed kernel memory statistics:
kstat -p -c kmem_cache
kstat -p -m vmem
kstat -p -c vmem
Alternative:
echo "::kmastat" | mdb -k > /var/tmp/kmastat
Do not use iostat that way.
To start with, show busy disks with real-time sampling:
iostat -xmz 2 4 # 2-second sampling interval, 4 samples
Show historical average data:
iostat -xm

Solaris 10 - How to view limits for a given process

In Linux, I can do the following:
ps -ef | grep <some process>
<some user> 4847 1864 0 Oct 13 ? 28:45 <some program>
Then I can find out what limits are applied by viewing the following file:
cat /proc/4847/limits
Is there a way to do the same in Solaris 10?
Use the plimit command:
$ plimit 4350
4350: ksh -o vi
resource current maximum
time(seconds) unlimited unlimited
file(blocks) unlimited unlimited
data(kbytes) unlimited unlimited
stack(kbytes) 8192 unlimited
coredump(blocks) unlimited unlimited
nofiles(descriptors) 256 65536
vmemory(kbytes) unlimited unlimited
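If the same check has to work on both systems, the two approaches can be wrapped in one function. A minimal sketch, assuming a POSIX shell: show_limits is a hypothetical name, and it simply prefers plimit when that command exists and falls back to /proc/PID/limits otherwise.

```shell
# Hypothetical helper: print the resource limits of a running process.
# Usage: show_limits 4350
show_limits() {
  pid="$1"
  if command -v plimit >/dev/null 2>&1; then
    # Solaris: plimit reports current and maximum limits for the PID.
    plimit "$pid"
  elif [ -r "/proc/$pid/limits" ]; then
    # Linux: the kernel exposes the same information as a text file.
    cat "/proc/$pid/limits"
  else
    echo "cannot determine limits for PID $pid" >&2
    return 1
  fi
}
```

Reading another user's process may require elevated privileges on either system.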
