I replaced mywebsite with the real domain, which resolves fine when I run curl www.mywebsite.com. These are the options I am using:
curl -X 'GET https://www.mywebsite.com/Web2/PDF.aspx?page=1' \
-H 'Host: www.mywebsite.org' \
-H 'Connection: keep-alive' \
-H 'Upgrade-Insecure-Requests: 1' \
-A 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36 OPR/51.0.2830.26' \
-H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8' \
-H 'DNT: 1' \
-e 'the-referer' \
-H 'Accept-Encoding: gzip, deflate, br' \
-H 'Accept-Language: en-US,en;q=0.9' \
-b '_the-cookies'
When I try to run this in OSX terminal, the following happens:
$ curl -X 'GET https://www.mywebsite.com/Web2/PDF.aspx?page=1' \
-H 'Host: www.mywebsite.org' \
-H 'Connection: keep-alive' \
-H 'Upgrade-Insecure-Requests: 1' \
-A 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36 OPR/51.0.2830.26' \
curl: (6) Could not resolve host:
Mac-mini-3:~ myuser$
It says:
curl: (6) Could not resolve host:
Why is this happening? And why does it start running the command early when I used the \ line-continuation character in the terminal? It should not run anything until all the options have been passed.
Because you have not specified a host: the URL ended up inside the -X argument, which only sets the request method, so curl is left with no URL to resolve.
You need (note the placement of the single quotes):
curl -X GET 'https://www.mywebsite.com/Web2/PDF.aspx?page=1' ...
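(Side note, not part of the original answer: since GET is already curl's default method when no data options are given, -X GET can be dropped entirely. A minimal sketch reusing the placeholder URL and cookie string from the question:)
# GET is the default method, so only the quoted URL is required.
curl 'https://www.mywebsite.com/Web2/PDF.aspx?page=1' \
  -H 'Connection: keep-alive' \
  -b '_the-cookies'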
avito.ru has some special scraping protections and I am trying to understand how they work.
When I request the URL https://www.avito.ru/all?q=car in the browser, without cookies, as a fresh user, I receive the correct HTML content.
But once I copy the request over to cURL, it fails.
curl 'https://www.avito.ru/all?q=car' \
-H 'authority: www.avito.ru' \
-H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8' \
-H 'accept-language: de-DE,de;q=0.5' \
-H 'cache-control: no-cache' \
-H 'pragma: no-cache' \
-H 'sec-ch-ua: "Not_A Brand";v="99", "Brave";v="109", "Chromium";v="109"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "macOS"' \
-H 'sec-fetch-dest: document' \
-H 'sec-fetch-mode: navigate' \
-H 'sec-fetch-site: none' \
-H 'sec-fetch-user: ?1' \
-H 'sec-gpc: 1' \
-H 'upgrade-insecure-requests: 1' \
-H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36' \
--compressed
Instead, I receive the VPN / IP blocking page. The request inside the browser always works fine, regardless of what I do.
Why is my cloned cURL request not working? Any ideas?
I'm trying to make a request to the OpenSea.io API. When I go to the network inspector I can see a whole slew of requests that come through to/from the page. When I select one, right-click, and choose Copy as cURL, I can paste it into my terminal and normally the data comes through as output. For a few requests, I got a message about binary output, which I was able to resolve by modifying the request. For example:
curl 'https://api.opensea.io/tokens/?limit=100' \
-X 'GET' \
-H 'Pragma: no-cache' \
-H 'Accept: */*' \
-H 'Accept-Language: en-US,en;q=0.9' \
-H 'Accept-Encoding: gzip, deflate, br' \
-H 'Cache-Control: no-cache' \
-H 'Origin: https://opensea.io' \
-H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Safari/605.1.15' \
-H 'Connection: keep-alive' \
-H 'Referer: https://opensea.io/' \
-H 'Host: api.opensea.io' \
-H 'X-API-KEY: 2f6f419a083c46de9d83ce3dbe7db601' \
-H 'X-BUILD-ID: da14c5fd3811187c88141eb116061b5f6cf87f45'
The above gave me the binary-output warning. I resolved it by adding --compressed at the end so curl decompresses the "binary" data, and by removing the br option from the Accept-Encoding header. The request below now works just fine in my terminal.
curl 'https://api.opensea.io/tokens/?limit=100' \
-X 'GET' \
-H 'Pragma: no-cache' \
-H 'Accept: */*' \
-H 'Accept-Language: en-US,en;q=0.9' \
-H 'Accept-Encoding: gzip, deflate' \
-H 'Cache-Control: no-cache' \
-H 'Origin: https://opensea.io' \
-H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Safari/605.1.15' \
-H 'Connection: keep-alive' \
-H 'Referer: https://opensea.io/' \
-H 'Host: api.opensea.io' \
-H 'X-API-KEY: 2f6f419a083c46de9d83ce3dbe7db601' \
-H 'X-BUILD-ID: da14c5fd3811187c88141eb116061b5f6cf87f45' --compressed
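(Side note, not from the original post: whether you actually need to strip br depends on the local curl build; if curl was compiled with brotli support, --compressed can decode br responses too. A quick way to check:)
# Look for "brotli" in the Features line; if it is there, br does not need to be removed.
curl --version | grep -i features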
So that's all fine and dandy, but that didn't fix my issue for all of the requests. I went through and found the requests that return the data I'm looking for, but they give a new error about not being the website owner. Consider the request below:
curl 'https://api.opensea.io/graphql/' \
-X 'POST' \
-H 'Content-Type: application/json' \
-H 'Pragma: no-cache' \
-H 'Accept: */*' \
-H 'Host: api.opensea.io' \
-H 'Cache-Control: no-cache' \
-H 'Accept-Language: en-US,en;q=0.9' \
-H 'Origin: https://opensea.io' \
-H 'Content-Length: 451' \
-H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Safari/605.1.15' \
-H 'Referer: https://opensea.io/' \
-H 'Accept-Encoding: gzip, deflate' \
-H 'Connection: keep-alive' \
-H 'Cookie: _ga_9VSBF2K4BX=GS1.1.1653330281.9.1.1653332997.0; csrftoken=BVdZtaJOMRxED1ALVr79hZfFHIcUUTeNokvuFbqkb17fPoZiEqpe5Fb26Mq4RQsg; sessionid=eyJzZXNzaW9uSWQiOiI0MzJjMWVlYi0zY2Q5LTQ4Y2QtODljZS1jZWFhNzk0NzI2ZDIifQ:1ntDPZ:iRgNCzJHvxP1nDBSR90Hjx4hcpPy8UmpZl7GG6lV2e8; ajs_anonymous_id=41ec97c3-3ebf-467b-a921-a31f94abeb2f; amp_ddd6ec=yUkvg9MB9AgtD0-EafL8wO...1g3p2k0km.1g3p52466.5c.54.ag; _fbp=fb.1.1652624043939.1609498506; _ga=GA1.2.337370304.1652623932; _gid=GA1.2.1049414718.1653330282; _uetsid=9d339a80dac511ec84300fb0b22c8619; _uetvid=ebc21490d88011ec99749d8ebc9bcd13; __cf_bm=OZmIijoynqXFgy9j69FEOB2a0As_1yLXG3751dUFAO4-1653332831-0-AX1rqerC9b2mttE3Lg4rIp33aWgqCGg2fozR3+cJTaeEEJ6xgpz1/VY5OIrHCONfYfGI26n0qHHCGtxb5YDwVBw=; cf_chl_2=; cf_chl_prog=; cf_clearance=mfMY41rDtGcV.Hkkmp5dZkZUtz10Y7fXRmobKhROBlw-1653331507-0-150; _gcl_au=1.1.13890619.1653330282; __os_session=eyJpZCI6IjQzMmMxZWViLTNjZDktNDhjZC04OWNlLWNlYWE3OTQ3MjZkMiJ9; __os_session.sig=xyK0HcEq8hEtOPpbnB0ra5A18qm3t-xGKx_2YDCmObc' \
-H 'x-signed-query: d73eda68d997705a2785aa8222d5a3c5663c392d0df699f665e44fb31e14642b' \
-H 'X-BUILD-ID: da14c5fd3811187c88141eb116061b5f6cf87f45' \
-H 'X-API-KEY: 2f6f419a083c46de9d83ce3dbe7db601' \
--data-binary '{"id":"TraitsDropdownQuery","query":"query TraitsDropdownQuery(\n $collection: CollectionSlug!\n) {\n collection(collection: $collection) {\n assetCount\n numericTraits {\n key\n value {\n max\n min\n }\n }\n stringTraits {\n key\n counts {\n count\n value\n }\n }\n defaultChain {\n identifier\n }\n id\n }\n}\n","variables":{"collection":"boredapeyachtclub"}}' --compressed
When the webpage makes the request, the server returns a JSON file with all kinds of useful data inside. But for some reason, when I make the request, it gives me back an HTML file that says:
<h1>
<span class="error-description">Access denied</span>
<span class="code-label">Error code <span>1020</span></span>
</h1>
<div class="large-font">
<p>You do not have access to api.opensea.io.</p><p>The site owner may have set restrictions that prevent you from accessing the site. Contact the site owner for access or try loading the page again.</p>
</div>
Can anybody help in resolving this? What changes do I need to make to the curl request so that I actually get the JSON data I'm looking for? I understand the page is saying that I am not the website owner, and that's correct, but then why does it give the JSON data to my browser and not to me through a cURL request? How does the server know the difference between my terminal and a browser making the request when I pass along all of the same headers and cookies that the browser was given? I noticed that the cookies include __cf_bm and similar values that hold some info like a Unix timestamp. I tried passing along the current Unix timestamp, generated on the fly using Node.js and Axios, but I still got the same message, so I believe there's something more going on than a cookie difference. Additionally, I tried finding the cookie values in previous requests to see if maybe the server had given out some info that has to be sent back later, but I couldn't find any matching values from one request to the next.
Any help is much appreciated, both in fixing this specific problem and in explaining how the server tells a browser apart from a terminal.
The reason for the Access Denied / error 1020 page is that the target site is blocking you at the IP or User-Agent level.
Solution: use a proxy and randomize your request headers.
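To illustrate that suggestion, here is a rough bash sketch; the proxy address is a placeholder and the User-Agent pool is arbitrary, neither comes from the question. It sends the request through a proxy with a randomly picked User-Agent:
# Hypothetical proxy endpoint; substitute a real one.
PROXY='http://127.0.0.1:8080'
# Small pool of User-Agent strings to rotate through.
AGENTS=(
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Safari/605.1.15'
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'
)
# Pick one at random and send the request through the proxy.
UA=${AGENTS[RANDOM % ${#AGENTS[@]}]}
curl --proxy "$PROXY" -A "$UA" --compressed 'https://api.opensea.io/tokens/?limit=100'
Keep in mind that error 1020 comes from a Cloudflare firewall rule, so depending on how that rule is configured, rotating headers and IPs may still not be enough.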
I'm fairly new to Google Analytics and I'm starting with the new Google Analytics 4. I've set it up via Google Tag Manager.
I have two custom events:
cta_visible (event visible)
click_meeting_link (outbound click)
When I debug my page with https://tagassistant.google.com/, I can see both events being triggered.
In the debug view of Google Analytics, the cta_visible event is displayed, but click_meeting_link is missing. I thought it might be a bug caused by the fact that, as I click the link, my browser leaves the page.
But while I can see the cta_visible event in my reports, click_meeting_link is missing there as well.
In the network tab I see both events being sent to GA (with a response code of 204).
curl 'https://www.google-analytics.com/g/collect?v=2&tid=G-NKBZG0FK64&gtm=2oead0&_p=1988538019&sr=1792x1120&gcs=G100&gdid=dOThhZD&ul=en-gb&cid=1495603155.1634555573&_s=5&dl=https%3A%2F%2Finnovation.tarent.de%2Fsparring&dt=Innovation%20Sparring%20%7C%20tarent&sid=1634555572&sct=1&seg=0&en=click_meeting_link&_c=1&_et=2&ep.debug_mode=true&ep.click_url=https%3A%2F%2Fmeetings.hubspot.com%2Ffrederik-vosberg%2Finnovation-sparring' \
-X 'POST' \
-H 'authority: www.google-analytics.com' \
-H 'content-length: 0' \
-H 'pragma: no-cache' \
-H 'cache-control: no-cache' \
-H 'sec-ch-ua: "Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36' \
-H 'sec-ch-ua-platform: "macOS"' \
-H 'content-type: text/plain;charset=UTF-8' \
-H 'accept: */*' \
-H 'origin: https://innovation.tarent.de' \
-H 'sec-fetch-site: cross-site' \
-H 'sec-fetch-mode: no-cors' \
-H 'sec-fetch-dest: empty' \
-H 'referer: https://innovation.tarent.de/' \
-H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8,de;q=0.7' \
--compressed
curl 'https://www.google-analytics.com/g/collect?v=2&tid=G-NKBZG0FK64&gtm=2oead0&_p=1800931673&sr=1792x1120&gcs=G100&ul=en-gb&cid=175794657.1634555667&_s=2&dl=https%3A%2F%2Finnovation.tarent.de%2Fsparring&dt=Innovation%20Sparring%20%7C%20tarent&sid=1634555666&sct=1&seg=0&en=cta_visible&_fv=1&_nsi=1&_ss=1&_eu=C&ep.debug_mode=true' \
-X 'POST' \
-H 'authority: www.google-analytics.com' \
-H 'content-length: 0' \
-H 'pragma: no-cache' \
-H 'cache-control: no-cache' \
-H 'sec-ch-ua: "Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36' \
-H 'sec-ch-ua-platform: "macOS"' \
-H 'content-type: text/plain;charset=UTF-8' \
-H 'accept: */*' \
-H 'origin: https://innovation.tarent.de' \
-H 'sec-fetch-site: cross-site' \
-H 'sec-fetch-mode: no-cors' \
-H 'sec-fetch-dest: empty' \
-H 'referer: https://innovation.tarent.de/' \
-H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8,de;q=0.7' \
--compressed
Any suggestions as to what could cause this?
Thanks in advance.
I've found the problem: the Consent Management Platform Usercentrics activated the new Google Consent Mode. We hadn't configured it properly, so analytics tracking was denied. This added the gcs=G100 parameter to the request, which tells Google Analytics to ignore the hit.
This also prevented the DebugView from working properly.
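(Not part of the original answer: the consent state is visible right in the copied request. For example, pulling the parameter out of a shortened version of the first hit above with a quick shell one-liner shows gcs=G100, i.e. the denied state described here.)
# Extract the consent parameter from a copied Measurement Protocol URL (URL shortened here).
echo 'https://www.google-analytics.com/g/collect?v=2&tid=G-NKBZG0FK64&gcs=G100&en=click_meeting_link' | grep -o 'gcs=[^&]*'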
I don't see anything wrong in the network requests. They look perfectly fine, and this data should be accessible in GA4.
But if you don't see the other event in GA4, I can only presume the event has not been created in GA4. GA4 limits the number of unique events you can create (https://support.google.com/analytics/answer/9267744?hl=en). Even though the limit is quite generous, it's still a sensible safeguard not to create them on the fly.
Also, you don't need to enable "Preserve log". It's usually easier to prevent the page from unloading by executing this in the local console:
window.onbeforeunload = function(){return false;}
I'm using Scrapy to replicate a POST request to a site. I'm sure I'm passing the right form arguments, but somehow the site isn't responding with what it should.
Copying the request as cURL from Chrome gives (anonymized):
curl 'https://example.com/somepath' -H 'origin: https://example.com/' -H 'x-requested-with: XMLHttpRequest' -H 'pragma: no-cache' -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36' -H 'content-type: application/json' --data '{"foo":"var"}' --compressed
Here is my Scrapy request:
FormRequest(url="https://example.com/somepath", formdata={'foo': 'var'})
You are missing the Content-Type header, and you won't be able to make that request with FormRequest anyway: FormRequest URL-encodes its formdata as application/x-www-form-urlencoded, whereas this site expects a JSON body. Just use a plain Request with the correct body:
import json
...
Request(
    url="https://example.com/somepath",
    method="POST",  # the copied curl uses --data, i.e. a POST; Request defaults to GET
    body=json.dumps({'foo': 'var'}),
    headers={'Content-Type': 'application/json'},
)
I can make the following request from any remote client/server:
curl 'http://my.drupalserver.com/node/4688?_format=json' -H 'Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=' -H 'Accept-Encoding: gzip, deflate, sdch' -H 'Accept-Language: nl-NL,nl;q=0.8,en-US;q=0.6,en;q=0.4' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36' -H 'Content-Type: application/hal+json' -H 'Accept: */*' -H 'Connection: keep-alive' --compressed
And it works: I get my node as expected.
However, when I make this exact same request from the server the website is hosted on, I get a 403 Forbidden error.
I'm at my wits' end: the Drupal web profiler clearly shows the request headers for both requests are identical, so I have no idea what the problem could be.
I have already cleared the caches, checked the trusted host settings, ...
I'm running Drupal 8.0.5