I don't have WordPress on my server, but someone has been requesting "/wp08/wp-includes/dtcla.php" all day (there is no such file on my server). I have banned the requester's IP, but they keep sending requests. What is "dtcla.php", and how should I deal with it?
IP: 127.0.0.1 - USER: - [17/Feb/2023:10:05:53 +0800] "GET / HTTP/1.1" 301 SENT: 169 REFERER: "-" AGENT: "curl/7.29.0" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:05:55 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 301 SENT: 169 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:05:55 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 301 SENT: 169 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:05:55 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 403 SENT: 555 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:05:57 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 301 SENT: 169 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:05:58 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 301 SENT: 169 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:05:58 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 403 SENT: 555 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 127.0.0.1 - USER: - [17/Feb/2023:10:05:58 +0800] "GET / HTTP/1.1" 301 SENT: 169 REFERER: "-" AGENT: "curl/7.29.0" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:06:00 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 301 SENT: 169 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:06:00 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 301 SENT: 169 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:06:02 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 403 SENT: 555 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 127.0.0.1 - USER: - [17/Feb/2023:10:06:03 +0800] "GET / HTTP/1.1" 301 SENT: 169 REFERER: "-" AGENT: "curl/7.29.0" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:06:04 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 301 SENT: 169 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:06:04 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 301 SENT: 169 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:06:04 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 403 SENT: 555 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:06:07 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 301 SENT: 169 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:06:08 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 301 SENT: 169 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:06:08 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 403 SENT: 555 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 127.0.0.1 - USER: - [17/Feb/2023:10:06:08 +0800] "GET / HTTP/1.1" 301 SENT: 169 REFERER: "-" AGENT: "curl/7.29.0" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:06:12 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 301 SENT: 169 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:06:12 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 301 SENT: 169 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:06:12 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 403 SENT: 555 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 127.0.0.1 - USER: - [17/Feb/2023:10:06:13 +0800] "GET / HTTP/1.1" 301 SENT: 169 REFERER: "-" AGENT: "curl/7.29.0" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:06:15 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 301 SENT: 169 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:06:15 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 301 SENT: 169 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:06:15 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 403 SENT: 555 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
IP: 123.51.239.35 - USER: - [17/Feb/2023:10:06:17 +0800] "GET /wp08/wp-includes/dtcla.php HTTP/1.1" 301 SENT: 169 REFERER: "http://www.google.com" AGENT: "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 5.2) Java/1.5.0_08" FORWARDED: "-"
I banned the IP, and it now gets a 403 response.
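To get a clearer picture of what this client is doing, the access log can be tallied per IP and status. Below is a minimal Python sketch, assuming the custom log format shown above; the regular expression and the helper name `tally_probe` are illustrative, not part of any existing tool.

```python
import re
from collections import Counter

# Matches the custom log format shown above, for example:
# IP: 1.2.3.4 - USER: - [17/Feb/2023:10:05:55 +0800] "GET /path HTTP/1.1" 301 SENT: 169 ...
LINE_RE = re.compile(
    r'^IP: (?P<ip>\S+) - USER: \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)

def tally_probe(lines, path):
    """Count requests for one path, keyed by (client IP, HTTP status)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("path") == path:
            counts[(match.group("ip"), match.group("status"))] += 1
    return counts
```

For example, `tally_probe(open("access.log"), "/wp08/wp-includes/dtcla.php")` returns a Counter keyed by `(ip, status)` (the log file name is an assumption). Lines in any other format are simply skipped.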
I am trying to run my Django app (v1.8.6) in a Docker container using docker-compose.
This is my settings.py:
MEDIA_ROOT = "/srv/media/"
MEDIA_URL = "/media/"
STATIC_URL = "/static/"
STATIC_ROOT = os.path.join(BASE_DIR, "static")
This is my docker-compose file:
version: "3"
services:
  #
  # Nettle API server
  #
  nettle:
    image: mcvitty_img
    command: python manage.py runserver 0.0.0.0:8000 --settings=sites.production.settings
    ports:
      - "80:8000"
volumes:
  bluebell: {}
  postgres-data: {}
  rabbitmq-data: {}
I am successfully running collectstatic in my Dockerfile:
RUN python manage.py migrate --settings=sites.production.settings --fake-initial
RUN python manage.py sitetree_resync_apps --settings=sites.production.settings
RUN python manage.py collectstatic --settings=sites.production.settings --noinput
But sadly, the static files are not served.
These are the logs I get:
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/css/all.min.css HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/autocomplete_light/widget.js HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/js/all.min.js HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/autocomplete_light/autocomplete.js HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/autocomplete_light/style.css HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/autocomplete_light/remote.js HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/autocomplete_light/text_widget.js HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/autocomplete_light/autocomplete.js HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/autocomplete_light/addanother.js HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/autocomplete_light/widget.js HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/js/datatableview.js HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/images/logo.png HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/images/frontimage1.jpg HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/autocomplete_light/addanother.js HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/images/frontimage2.jpg HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/autocomplete_light/text_widget.js HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/images/frontimage3.png HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/autocomplete_light/remote.js HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/images/frontimage4.jpg HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/autocomplete_light/style.css HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/images/frontimage6.jpg HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/images/frontimage8.png HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/css/all.min.css HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/images/frontimage7.jpg HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/images/frontimage5.png HTTP/1.1" 404 4463
nettle_1 | [29/Sep/2020 15:08:19] "GET /static/js/datatableview.js HTTP/1.1" 404 4463
If I hop into the container, I can see that the static folder has been created.
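To rule out collectstatic itself, each URL that returns 404 can be mapped back to a file under STATIC_ROOT and checked from inside the container. A minimal sketch, assuming STATIC_URL is "/static/" and STATIC_ROOT is the directory from the settings above; the function name `missing_static_files` is hypothetical.

```python
import os

def missing_static_files(static_root, static_url, requested_urls):
    """Return the requested URLs whose files do not exist under static_root.

    static_root: the directory collectstatic wrote to (STATIC_ROOT).
    static_url: the URL prefix (STATIC_URL), e.g. "/static/".
    """
    missing = []
    for url in requested_urls:
        if not url.startswith(static_url):
            continue  # not a static-file URL, so not checked here
        relative = url[len(static_url):]
        if not os.path.isfile(os.path.join(static_root, relative)):
            missing.append(url)
    return missing
```

For example, paste the function into `python manage.py shell --settings=sites.production.settings` inside the container and call it after `from django.conf import settings` as `missing_static_files(settings.STATIC_ROOT, settings.STATIC_URL, ["/static/css/all.min.css", "/static/js/all.min.js"])`. An empty list means the files are on disk, so the 404s would come from how they are served rather than from collectstatic.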
I recently set up GitLab using Helm in an on-prem Kubernetes cluster. It works fine: I can access all aspects of the web UI, and I can SSH into it via an external ingress controller (deployed separately from GitLab) just fine.
But when I try to run a job, I get the following error.
Running with gitlab-runner 12.4.1 (05161b14)
on gitlab-gitlab-runner-6db97976bb-bsfqj SFKvKAyD
Using Kubernetes namespace: gitlab
Using Kubernetes executor with image node:6 ...
Waiting for pod gitlab/runner-sfkvkayd-project-1-concurrent-09tsz7 to be running, status is Pending
Waiting for pod gitlab/runner-sfkvkayd-project-1-concurrent-09tsz7 to be running, status is Pending
Running on runner-sfkvkayd-project-1-concurrent-09tsz7 via gitlab-gitlab-runner-6db97976bb-bsfqj...
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/redacted/web/.git/
Created fresh repository.
fatal: unable to access 'https://gitlab-ci-token:[MASKED]#gitlab.example.com/redacted/web.git/': The requested URL returned error: 502
ERROR: Job failed: command terminated with exit code 1
Why would I be getting a 502 error?
nginx-ingress values.yml:
controller:
  config:
    resolver-address: 10.0.0.1
    hsts-include-subdomains: "false"
    server-name-hash-bucket-size: "256"
    enable-vts-status: "true"
    use-http2: "false"
    ssl-ciphers: "ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA:ECDHE-RSA-AES128-SHA:AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-SHA256:AES128-SHA256:AES256-SHA:AES128-SHA:!aNULL:!eNULL:!EXPORT:!DES:!MD5:!PSK:!RC4"
    ssl-protocols: "TLSv1.1 TLSv1.2"
    server-tokens: "false"
tcp:
  22: "gitlab/gitlab-gitlab-shell:22"
GitLab values.yml:
global:
  edition: ee
  hosts:
    domain: example.com
    https: true
    gitlab:
      name: gitlab.example.com
      https: true
    minio:
      name: minio.example.com
      https: false
  ingress:
    configureCertmanager: false
    enabled: true
    tls:
      enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
      kubernetes.io/tls-acme: true
  gitaly:
    persistence:
      size: 4Gi
  minio:
    enabled: true
  grafana:
    enabled: false
  appConfig:
    ldap:
      servers:
        main:
          label: 'LDAP'
          host: 'ipa.example.com'
          port: 389
          uid: 'uid'
          base: 'dc=example,dc=com'
          bind_dn: 'uid=system,cn=sysaccounts,cn=etc,dc=example,dc=com'
          password:
            secret: ldap-bind-secret
            key: ldap-password
          encryption: 'plain'
  registry:
    enabled: false
    bucket: registry
gitlab:
  unicorn:
    ingress:
      tls:
        secretName: gitlab-unicorn-tls
upgradeCheck:
  enabled: false
certmanager:
  install: false
nginx-ingress:
  enabled: false
prometheus:
  install: false
redis:
  persistence:
    size: 1Gi
postgresql:
  install: true
  persistence:
    size: 1Gi
registry:
  enabled: false
gitlab-runner:
  install: true
  rbac:
    create: true
  runners:
    locked: false
    cache:
      cacheType: s3
      s3BucketName: runner-cache
      cacheShared: true
      s3BucketLocation: us-east-1
      s3CachePath: gitlab-runner
      s3CacheInsecure: false
minio:
  persistence:
    size: 4Gi
gitaly:
  persistence:
    size: 4Gi
Edit:
Log from the nginx-ingress-controller:
10.0.10.1 - - [02/Nov/2019:06:49:45 +0000] "POST /api/v4/jobs/request HTTP/1.1" 204 0 "-" "gitlab-runner 12.4.1 (12-4-stable; go1.10.8; linux/amd64)" 722 0.004 [gitlab-gitlab-unicorn-8181] [] 10.42.0.179:8181 0 0.004 204 7c34fc2038325a786d949c9fdc82915b
(line is repeated several times)
Edit 2: Full logs from the moment I hit 'retry' on the job.
10.0.10.1 - - [04/Nov/2019:01:05:04 +0000] "POST /redacted/web/-/jobs/7/retry HTTP/1.1" 302 123 "https://gitlab.example.com/redacted/web/-/jobs/7" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36" 1279 0.492 [gitlab-gitlab-unicorn-8181] [] 10.42.0.179:8181 123 0.492 302 580cf6d7036706ea9f9182a5ed2385d6
10.0.10.1 - - [04/Nov/2019:01:05:05 +0000] "GET /redacted/web/-/jobs/8 HTTP/1.1" 200 9053 "https://gitlab.example.com/redacted/web/-/jobs/7" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36" 1020 0.593 [gitlab-gitlab-unicorn-8181] [] 10.42.0.184:8181 35481 0.596 200 5fb58ca0be4512efd80cea567bbd3127
10.0.10.1 - - [04/Nov/2019:01:05:05 +0000] "GET /redacted/web/-/jobs/8/trace.json?state= HTTP/1.1" 200 139 "https://gitlab.example.com/redacted/web/-/jobs/8" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36" 1011 0.171 [gitlab-gitlab-unicorn-8181] [] 10.42.0.184:8181 139 0.168 200 1b5673807d48539dad6765a9b88970e0
10.0.10.1 - - [04/Nov/2019:01:05:06 +0000] "GET /redacted/web/-/jobs/8.json HTTP/1.1" 200 1475 "https://gitlab.example.com/redacted/web/-/jobs/8" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36" 998 1.154 [gitlab-gitlab-unicorn-8181] [] 10.42.0.179:8181 4158 1.156 200 17a2c976b8fc2e93f252be68281aadc6
10.0.10.1 - - [04/Nov/2019:01:05:07 +0000] "GET /redacted/web/pipelines/1/stage.json?stage=build&retried=1 HTTP/1.1" 200 1233 "https://gitlab.example.com/redacted/web/-/jobs/8" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36" 1082 0.591 [gitlab-gitlab-unicorn-8181] [] 10.42.0.184:8181 8059 0.593 200 4ba12adb7acf8635b40aac0d6c761797
10.0.10.1 - - [04/Nov/2019:01:05:10 +0000] "GET /redacted/web/-/jobs/8/trace.json?state= HTTP/1.1" 304 0 "https://gitlab.example.com/redacted/web/-/jobs/8" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36" 1064 0.174 [gitlab-gitlab-unicorn-8181] [] 10.42.0.179:8181 0 0.172 304 d7f47ceeaf4e501c7b97e7c5a65f5ad0
10.0.10.1 - - [04/Nov/2019:01:05:14 +0000] "GET /redacted/web/-/jobs/8/trace.json?state= HTTP/1.1" 304 0 "https://gitlab.example.com/redacted/web/-/jobs/8" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36" 1064 0.531 [gitlab-gitlab-unicorn-8181] [] 10.42.0.184:8181 0 0.528 304 7cccd15f2a3b5d57e22245b20b455962
10.0.10.1 - - [04/Nov/2019:01:05:17 +0000] "GET /redacted/web/-/jobs/8.json HTTP/1.1" 304 0 "https://gitlab.example.com/redacted/web/-/jobs/8" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36" 1051 0.689 [gitlab-gitlab-unicorn-8181] [] 10.42.0.179:8181 0 0.692 304 144e9352bed9f000e78e9336c159761c
10.0.10.1 - - [04/Nov/2019:01:05:18 +0000] "GET /redacted/web/-/jobs/8/trace.json?state= HTTP/1.1" 200 139 "https://gitlab.example.com/redacted/web/-/jobs/8" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36" 1064 0.250 [gitlab-gitlab-unicorn-8181] [] 10.42.0.184:8181 139 0.248 200 ec6b33ca557c52a0547d1ec4c3b6b564
10.0.10.1 - - [04/Nov/2019:01:05:19 +0000] "POST /api/v4/jobs/request HTTP/1.1" 201 6432 "-" "gitlab-runner 12.4.1 (12-4-stable; go1.10.8; linux/amd64)" 722 0.540 [gitlab-gitlab-unicorn-8181] [] 10.42.0.184:8181 6432 0.540 201 08addd7e961ddd2c124c938ed4ec8d01
10.0.10.1 - - [04/Nov/2019:01:05:19 +0000] "POST /api/v4/jobs/request HTTP/1.1" 204 0 "-" "gitlab-runner 12.4.1 (12-4-stable; go1.10.8; linux/amd64)" 722 0.130 [gitlab-gitlab-unicorn-8181] [] 10.42.0.179:8181 0 0.132 204 745513af8de06310ff8efba16b99a795
10.0.10.1 - - [04/Nov/2019:01:05:22 +0000] "PATCH /api/v4/jobs/8/trace HTTP/1.1" 202 7 "-" "gitlab-runner 12.4.1 (12-4-stable; go1.10.8; linux/amd64)" 721 0.098 [gitlab-gitlab-unicorn-8181] [] 10.42.0.184:8181 7 0.100 202 cf049526b5190c8d540b67bcf2d1a4b3
10.0.10.1 - - [04/Nov/2019:01:05:23 +0000] "GET /redacted/web/-/jobs/8/trace.json?state= HTTP/1.1" 200 645 "https://gitlab.example.com/redacted/web/-/jobs/8" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36" 1064 0.219 [gitlab-gitlab-unicorn-8181] [] 10.42.0.179:8181 1501 0.216 200 64a0d629bd31b4041349a4dd389a043b
[04/Nov/2019:01:05:24 +0000]TCP200390120.026
10.0.10.1 - - [04/Nov/2019:01:05:25 +0000] "PATCH /api/v4/jobs/8/trace HTTP/1.1" 202 7 "-" "gitlab-runner 12.4.1 (12-4-stable; go1.10.8; linux/amd64)" 373 0.080 [gitlab-gitlab-unicorn-8181] [] 10.42.0.179:8181 7 0.080 202 6bd14f3c6e7f7be2b638b5c1199beaa0
10.0.10.1 - - [04/Nov/2019:01:05:27 +0000] "PATCH /api/v4/jobs/8/trace HTTP/1.1" 202 8 "-" "gitlab-runner 12.4.1 (12-4-stable; go1.10.8; linux/amd64)" 976 0.176 [gitlab-gitlab-unicorn-8181] [] 10.42.0.184:8181 8 0.176 202 1c6327ae37886df62f7ce1bd824d2e43
10.0.10.1 - - [04/Nov/2019:01:05:27 +0000] "GET /redacted/web/-/jobs/8/trace.json?state=eyJvZmZzZXQiOjQ1MCwibl9vcGVuX3RhZ3MiOjAsImZnX2NvbG9yIjpudWxsLCJiZ19jb2xvciI6bnVsbCwic3R5bGVfbWFzayI6MCwic2VjdGlvbnMiOlsicHJlcGFyZS1zY3JpcHQiXSwibGluZW5vX2luX3NlY3Rpb24iOjF9 HTTP/1.1" 200 868 "https://gitlab.example.com/redacted/web/-/jobs/8" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36" 1183 0.233 [gitlab-gitlab-unicorn-8181] [] 10.42.0.184:8181 2223 0.232 200 98a2b255a8854c49d10769132f6889c8
10.0.10.1 - - [04/Nov/2019:01:05:27 +0000] "PUT /api/v4/jobs/8 HTTP/1.1" 200 4 "-" "gitlab-runner 12.4.1 (12-4-stable; go1.10.8; linux/amd64)" 691 0.267 [gitlab-gitlab-unicorn-8181] [] 10.42.0.179:8181 4 0.268 200 0ed6eb1dce657e0b0c711dede3c4bcd9
10.0.10.1 - - [04/Nov/2019:01:05:28 +0000] "GET /redacted/web/-/jobs/8.json HTTP/1.1" 200 1502 "https://gitlab.example.com/redacted/web/-/jobs/8" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36" 1051 0.695 [gitlab-gitlab-unicorn-8181] [] 10.42.0.179:8181 4444 0.696 200 75058bd730898e877034a10a7e386300
10.0.10.1 - - [04/Nov/2019:01:05:32 +0000] "GET /redacted/web/-/jobs/8/trace.json?state=eyJvZmZzZXQiOjEyNTIsIm5fb3Blbl90YWdzIjowLCJmZ19jb2xvciI6bnVsbCwiYmdfY29sb3IiOm51bGwsInN0eWxlX21hc2siOjAsInNlY3Rpb25zIjpbXSwibGluZW5vX2luX3NlY3Rpb24iOjF9 HTTP/1.1" 200 264 "https://gitlab.example.com/redacted/web/-/jobs/8" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36" 1163 1.050 [gitlab-gitlab-unicorn-8181] [] 10.42.0.184:8181 283 1.048 200 706cf9b018952ffd2d875a4a060edb77
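In the excerpt above, every request through the ingress is answered with a 2xx or 3xx status. To check a longer capture for the 502 and see which upstream answered it, the log can be filtered. This is a sketch that assumes the field layout visible in these lines (inferred from the samples, not taken from the ingress-nginx documentation); `server_errors` is a hypothetical helper.

```python
import re

# Layout inferred from the sample lines above:
# client - - [time] "METHOD path PROTO" status bytes "referer" "agent"
#   req_len req_time [upstream_name] [alt_upstream] upstream_addr ...
LINE_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \d+ '
    r'"[^"]*" "[^"]*" \d+ [\d.]+ \[(?P<upstream>[^\]]*)\] \[[^\]]*\] '
    r'(?P<upstream_addr>\S+)'
)

def server_errors(lines):
    """Yield (time, method, path, status, upstream_addr) for every 5xx response."""
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("status").startswith("5"):
            yield (match.group("time"), match.group("method"), match.group("path"),
                   match.group("status"), match.group("upstream_addr"))
```

For instance, save the controller logs to a file and iterate over `server_errors(open("ingress.log"))` (the file name is an assumption). If no 5xx line ever appears there while the job fails, the 502 is presumably being returned somewhere other than this controller.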
I know that browsers automatically request "/favicon.ico".
But many clients never visit my pages; they only request "/favicon.ico".
They are HEAD requests without a Referer or User-Agent.
The nginx log looks like this:
47.88.135.135 - - [02/Sep/2019:03:31:22 -0400] "HEAD /favicon.ico HTTP/1.1" 404 0 "-" "-"
47.74.138.142 - - [02/Sep/2019:03:31:24 -0400] "HEAD /favicon.ico HTTP/1.1" 404 0 "-" "-"
47.254.113.21 - - [02/Sep/2019:03:31:24 -0400] "HEAD /favicon.ico HTTP/1.1" 404 0 "-" "-"
64.71.142.65 - - [02/Sep/2019:03:31:25 -0400] "HEAD /favicon.ico HTTP/1.1" 404 0 "-" "-"
Then I tried adding a "favicon.ico" file, but I still receive a lot of "/favicon.ico" requests. The nginx log now looks like this:
161.117.143.163 - - [02/Sep/2019:06:25:26 -0400] "HEAD /favicon.ico HTTP/1.1" 200 0 "-" "-"
64.71.142.65 - - [02/Sep/2019:06:25:26 -0400] "HEAD /favicon.ico HTTP/1.1" 200 0 "-" "-"
47.91.195.140 - - [02/Sep/2019:06:25:27 -0400] "HEAD /favicon.ico HTTP/1.1" 200 0 "-" "-"
80.231.126.202 - - [02/Sep/2019:06:25:27 -0400] "HEAD /favicon.ico HTTP/1.1" 200 0 "-" "-"
47.244.73.59 - - [02/Sep/2019:06:25:28 -0400] "HEAD /favicon.ico HTTP/1.1" 200 0 "-" "-"
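To check whether these probes come from a handful of sources or many, the HEAD /favicon.ico requests without Referer and User-Agent can be counted per client IP. A minimal sketch, assuming nginx's default "combined" log format, which matches the lines above; `anonymous_favicon_probes` is a hypothetical helper name.

```python
import re
from collections import Counter

# nginx "combined" format: ip - user [time] "request" status bytes "referer" "agent"
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def anonymous_favicon_probes(lines):
    """Count HEAD /favicon.ico requests with no Referer and no User-Agent, per IP."""
    counts = Counter()
    for line in lines:
        match = COMBINED_RE.match(line)
        if (match
                and match.group("method") == "HEAD"
                and match.group("path") == "/favicon.ico"
                and match.group("referer") == "-"
                and match.group("agent") == "-"):
            counts[match.group("ip")] += 1
    return counts
```

For example, `anonymous_favicon_probes(open("/var/log/nginx/access.log")).most_common(10)` would list the ten busiest sources; the log path is an assumption.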
I'm coding a simple HTTP crawler, but I have an issue running the code below. I'm requesting 50 URLs but only get the content of 20+ of them back. I've generated a few files of 150 kB each to test the crawler. So I think the 20+ responses are limited by the bandwidth? BUT: how do I tell the Erlang snippet not to quit until the last file has been fetched? The test data server is online, so please try the code out; any hints are welcome :)
-module(crawler).
-define(BASE_URL, "http://46.4.117.69/").
-export([start/0, send_reqs/0, do_send_req/1]).
start() ->
    ibrowse:start(),
    proc_lib:spawn(?MODULE, send_reqs, []).
to_url(Id) ->
    ?BASE_URL ++ integer_to_list(Id).
fetch_ids() ->
    lists:seq(1, 50).
send_reqs() ->
    spawn_workers(fetch_ids()).
spawn_workers(Ids) ->
    lists:foreach(fun do_spawn/1, Ids).
do_spawn(Id) ->
    proc_lib:spawn_link(?MODULE, do_send_req, [Id]).
do_send_req(Id) ->
    io:format("Requesting ID ~p ... ~n", [Id]),
    Result = (catch ibrowse:send_req(to_url(Id), [], get, [], [], 10000)),
    case Result of
        {ok, Status, _H, B} ->
            io:format("OK -- ID: ~2..0w -- Status: ~p -- Content length: ~p~n", [Id, Status, length(B)]);
        Err ->
            io:format("ERROR -- ID: ~p -- Error: ~p~n", [Id, Err])
    end.
That's the output:
Requesting ID 1 ...
Requesting ID 2 ...
Requesting ID 3 ...
Requesting ID 4 ...
Requesting ID 5 ...
Requesting ID 6 ...
Requesting ID 7 ...
Requesting ID 8 ...
Requesting ID 9 ...
Requesting ID 10 ...
Requesting ID 11 ...
Requesting ID 12 ...
Requesting ID 13 ...
Requesting ID 14 ...
Requesting ID 15 ...
Requesting ID 16 ...
Requesting ID 17 ...
Requesting ID 18 ...
Requesting ID 19 ...
Requesting ID 20 ...
Requesting ID 21 ...
Requesting ID 22 ...
Requesting ID 23 ...
Requesting ID 24 ...
Requesting ID 25 ...
Requesting ID 26 ...
Requesting ID 27 ...
Requesting ID 28 ...
Requesting ID 29 ...
Requesting ID 30 ...
Requesting ID 31 ...
Requesting ID 32 ...
Requesting ID 33 ...
Requesting ID 34 ...
Requesting ID 35 ...
Requesting ID 36 ...
Requesting ID 37 ...
Requesting ID 38 ...
Requesting ID 39 ...
Requesting ID 40 ...
Requesting ID 41 ...
Requesting ID 42 ...
Requesting ID 43 ...
Requesting ID 44 ...
Requesting ID 45 ...
Requesting ID 46 ...
Requesting ID 47 ...
Requesting ID 48 ...
Requesting ID 49 ...
Requesting ID 50 ...
OK -- ID: 49 -- Status: "200" -- Content length: 150000
OK -- ID: 47 -- Status: "200" -- Content length: 150000
OK -- ID: 50 -- Status: "200" -- Content length: 150000
OK -- ID: 17 -- Status: "200" -- Content length: 150000
OK -- ID: 48 -- Status: "200" -- Content length: 150000
OK -- ID: 45 -- Status: "200" -- Content length: 150000
OK -- ID: 46 -- Status: "200" -- Content length: 150000
OK -- ID: 10 -- Status: "200" -- Content length: 150000
OK -- ID: 09 -- Status: "200" -- Content length: 150000
OK -- ID: 19 -- Status: "200" -- Content length: 150000
OK -- ID: 13 -- Status: "200" -- Content length: 150000
OK -- ID: 21 -- Status: "200" -- Content length: 150000
OK -- ID: 16 -- Status: "200" -- Content length: 150000
OK -- ID: 27 -- Status: "200" -- Content length: 150000
OK -- ID: 03 -- Status: "200" -- Content length: 150000
OK -- ID: 23 -- Status: "200" -- Content length: 150000
OK -- ID: 29 -- Status: "200" -- Content length: 150000
OK -- ID: 14 -- Status: "200" -- Content length: 150000
OK -- ID: 18 -- Status: "200" -- Content length: 150000
OK -- ID: 01 -- Status: "200" -- Content length: 150000
OK -- ID: 30 -- Status: "200" -- Content length: 150000
OK -- ID: 40 -- Status: "200" -- Content length: 150000
OK -- ID: 05 -- Status: "200" -- Content length: 150000
Update:
Thanks stemm for the hint about wait_workers. I've combined your code with mine, but I get the same behaviour :(
-module(crawler).
-define(BASE_URL, "http://46.4.117.69/").
-export([start/0, send_reqs/0, do_send_req/2]).
start() ->
    ibrowse:start(),
    proc_lib:spawn(?MODULE, send_reqs, []).
to_url(Id) ->
    ?BASE_URL ++ integer_to_list(Id).
fetch_ids() ->
    lists:seq(1, 50).
send_reqs() ->
    spawn_workers(fetch_ids()).
spawn_workers(Ids) ->
    %% collect reference to each worker
    Refs = [ do_spawn(Id) || Id <- Ids ],
    %% wait for response from each worker
    wait_workers(Refs).
wait_workers(Refs) ->
    lists:foreach(fun receive_by_ref/1, Refs).
receive_by_ref(Ref) ->
    %% receive message only from worker with specific reference
    receive
        {Ref, done} ->
            done
    end.
do_spawn(Id) ->
    Ref = make_ref(),
    proc_lib:spawn_link(?MODULE, do_send_req, [Id, {self(), Ref}]),
    Ref.
do_send_req(Id, {Pid, Ref}) ->
    io:format("Requesting ID ~p ... ~n", [Id]),
    Result = (catch ibrowse:send_req(to_url(Id), [], get, [], [], 10000)),
    case Result of
        {ok, Status, _H, B} ->
            io:format("OK -- ID: ~2..0w -- Status: ~p -- Content length: ~p~n", [Id, Status, length(B)]),
            %% send message that work is done
            Pid ! {Ref, done};
        Err ->
            io:format("ERROR -- ID: ~p -- Error: ~p~n", [Id, Err]),
            %% repeat request if there was error while fetching a page,
            do_send_req(Id, {Pid, Ref})
            %% or - if you don't want to repeat request, put there:
            %% Pid ! {Ref, done}
    end.
Running the crawler works fine for a handful of files, but after that the code doesn't even fetch the entire files (150000 bytes each): the crawler fetches some files only partially, as the following web server log shows :(
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /10 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /1 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /3 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /8 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /39 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /7 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /6 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /2 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /5 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /50 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /9 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /44 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /38 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /47 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /49 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /43 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /37 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /46 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /48 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:00 +0200] "GET /36 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:01 +0200] "GET /42 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:01 +0200] "GET /41 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:01 +0200] "GET /45 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:01 +0200] "GET /17 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:01 +0200] "GET /35 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:01 +0200] "GET /16 HTTP/1.1" 200 150000 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:01 +0200] "GET /15 HTTP/1.1" 200 17020 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:01 +0200] "GET /21 HTTP/1.1" 200 120360 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:01 +0200] "GET /40 HTTP/1.1" 200 117600 "-" "-"
82.114.62.14 - - [13/Sep/2012:15:17:01 +0200] "GET /34 HTTP/1.1" 200 60660 "-" "-"
Any hints are welcome. I have no clue what's going wrong there :(
So, if I've understood you correctly, you don't want to return control from the function spawn_workers until every worker has stopped (and fetched its page)? If so, you can change your code like this:
spawn_workers(Ids) ->
    %% collect reference to each worker
    Refs = [ do_spawn(Id) || Id <- Ids ],
    %% wait for response from each worker
    wait_workers(Refs).
wait_workers(Refs) ->
    lists:foreach(fun receive_by_ref/1, Refs).
receive_by_ref(Ref) ->
    %% receive message only from worker with specific reference
    receive
        {Ref, done} ->
            done
    end.
do_spawn(Id) ->
    Ref = make_ref(),
    proc_lib:spawn_link(?MODULE, do_send_req, [Id, {self(), Ref}]),
    Ref.
do_send_req(Id, {Pid, Ref}) ->
    io:format("Requesting ID ~p ... ~n", [Id]),
    Result = (catch ibrowse:send_req(to_url(Id), [], get, [], [], 10000)),
    case Result of
        {ok, Status, _H, B} ->
            io:format("OK -- ID: ~2..0w -- Status: ~p -- Content length: ~p~n", [Id, Status, length(B)]),
            %% send message that work is done
            Pid ! {Ref, done};
        Err ->
            io:format("ERROR -- ID: ~p -- Error: ~p~n", [Id, Err]),
            %% repeat request if there was error while fetching a page,
            do_send_req(Id, {Pid, Ref})
            %% or - if you don't want to repeat request, put there:
            %% Pid ! {Ref, done}
    end.
Edit:
I've noticed that your entry point (the function start) returns control without waiting for all workers to finish their tasks, because it calls spawn. If you want to wait there too, just use a similar trick:
%% note: send_reqs/2 must be added to the -export list, because proc_lib:spawn/3 calls it by name
start() ->
    ibrowse:start(),
    Ref = make_ref(),
    proc_lib:spawn(?MODULE, send_reqs, [self(), Ref]),
    receive_by_ref(Ref).
send_reqs(Pid, Ref) ->
    spawn_workers(fetch_ids()),
    Pid ! {Ref, done}.
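For comparison, the same "wait for one reply per reference" idea can be sketched in Python, with threads instead of Erlang processes and a shared queue instead of the mailbox. This is an analogy rather than a port: `fetch` is whatever function performs one request and is an assumption here, and unlike the Erlang version above, a failed fetch is reported instead of retried.

```python
import queue
import threading

def worker(item_id, token, fetch, done):
    """Fetch one item and always report back, like Pid ! {Ref, done}."""
    try:
        outcome = ("ok", fetch(item_id))
    except Exception as exc:  # report failures too, so the parent never waits forever
        outcome = ("error", exc)
    done.put((token, item_id, outcome))

def run_all(ids, fetch):
    """Start one thread per id and return only after every worker has reported."""
    done = queue.Queue()
    pending = set()
    for item_id in ids:
        token = object()  # unique per worker, playing the role of make_ref()
        pending.add(token)
        threading.Thread(target=worker, args=(item_id, token, fetch, done)).start()
    results = {}
    while pending:
        token, item_id, outcome = done.get()  # blocks, like receive
        pending.discard(token)
        results[item_id] = outcome
    return results
```

For example, with a hypothetical `my_fetch`, `run_all(range(1, 51), my_fetch)` returns a dict with one entry per id, and the call does not return until all 50 workers have reported.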
You can use a combination of supervisors and the queue module: spawn N fetching children; each child takes one item from the queue and processes it. When it is done, it notifies the parent process to continue with the next item in the queue. That way you can put a cap on the number of concurrent requests.
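The capped-pool idea can be sketched in Python (threads standing in for supervised children, `queue.Queue` for the queue module): a fixed number of workers each take one item at a time, so at most `max_workers` requests are in flight, and the caller waits until all of them have finished. `fetch` and the default of 5 workers are assumptions for illustration, and error handling is left out for brevity.

```python
import queue
import threading

def crawl(ids, fetch, max_workers=5):
    """Fetch every id with at most max_workers requests in flight; return {id: result}."""
    work = queue.Queue()
    for item_id in ids:
        work.put(item_id)
    results = {}
    lock = threading.Lock()

    def worker():
        # Each worker takes one item at a time until the queue is empty.
        while True:
            try:
                item_id = work.get_nowait()
            except queue.Empty:
                return
            value = fetch(item_id)
            with lock:
                results[item_id] = value

    threads = [threading.Thread(target=worker) for _ in range(max_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # return only after every worker has stopped
    return results
```

Because the number of workers, not the number of items, bounds the concurrency, raising the item count from 50 to 500 does not increase the number of simultaneous connections.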
If you spawn 500 requests at a time, ibrowse might get confused. Do you get any errors in the console?
See ibrowse:get_config_value/1 and ibrowse:set_config_value/2