GA4 Measurement Protocol Tag sends events to an unexpected endpoint - google-analytics

We use the GA4 Measurement Protocol to send events to GTM Server-Side, which then passes them along to GA4. However, I noticed some unexpected behavior.
The GTM Server-Side (GTMS) Client receives an event in the following format:
{
  "client_id": "ND+VxzSadb1BmLKxlzHZiLidLj6X6kyM2mTNewDRSIc=.1675251033",
  "user_id": "8229012",
  "non_personalized_ads": false,
  "events": [
    {
      "name": "registration submission",
      "params": {
        "method": "sendpulse.com",
        "service": "emailservice",
        "x-fb-ck-fbp": "fb.1.1675251033570.550668735",
        "x-fb-ck-fbc": "",
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
        "session_id": "1675336459",
        "user_data": {
          "email_address": "2bb61ab0fa7f4937e5598a552bee3cc5280da453e609c594845f82085c416b31",
          "phone_number": "1a6c0f1e473e472ef49dfa407fd915748a3d6e4c80481814c3dd0d088698bfcb"
        }
      }
    }
  ]
}
But the GA4 tag passes the event along to an unexpected endpoint:
https://www.google-analytics.com/g/collect?cid=ZjceIJMKEZPkhLBLd1%2BzF0ekLIINdDre%2FJO1UF0RvMM%3D.1675344158&tid=G-46NQ594GKJ&v=2&gtm=45091e31v1&uid=8229115&ep.method=social_facebook&ep.service=emailservice&ep.session_id=1675344160&en=registration%20submission
One would expect the endpoint to be www.google-analytics.com/mp/collect (the endpoint for Measurement Protocol events) and the request body to be the same JSON that was sent to GTM Server-Side, but somehow that's not the case. I suspect this behavior causes session_id to be interpreted incorrectly on the GA4 end and probably causes a bunch of other problems. Is this normal behavior?
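For comparison, here is a minimal sketch of what a direct Measurement Protocol hit looks like; the measurement ID is the tid from the request above, while the api_secret value is a placeholder that would come from the GA4 data stream settings:

import requests

# Sketch only: a direct GA4 Measurement Protocol hit goes to /mp/collect
# with the JSON payload in the request body.
MEASUREMENT_ID = "G-46NQ594GKJ"   # tid from the request above
API_SECRET = "<your-api-secret>"  # placeholder, created in the GA4 data stream settings

payload = {
    "client_id": "ND+VxzSadb1BmLKxlzHZiLidLj6X6kyM2mTNewDRSIc=.1675251033",
    "user_id": "8229012",
    "events": [
        {"name": "registration submission", "params": {"session_id": "1675336459"}}
    ],
}

resp = requests.post(
    "https://www.google-analytics.com/mp/collect",
    params={"measurement_id": MEASUREMENT_ID, "api_secret": API_SECRET},
    json=payload,
)
# The endpoint answers 2xx even for malformed payloads;
# /debug/mp/collect can be used to validate the event format.
print(resp.status_code)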

Related

ASP.NET acquire JWT token from ADFS by code or CLI

We are developing an ASP.NET (not .NET Core) API for another team to consume.
I need to get a JWT token from our ADFS to test if the security of the API is working.
I can't use the ADFS login page; I need to do this via code, CLI, or anything else.
How can I do that?
Edit:
I tried to call ADFS using Postman (POST /adfs/oauth2/token) and got this error:
Activity ID: 4acc6a7b-dafe-4a4b-1c00-0080000000dd
Error time: Tue, 03 Aug 2021 08:59:10 GMT
Cookie: enabled
User agent string: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36
You can use the client credentials flow.
It relies on a secret key rather than a login.
You could do this via Postman.
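For instance, a rough sketch of a client credentials token request against the ADFS token endpoint; the host, client ID, secret, and resource identifier below are placeholders for your environment:

import requests

# Sketch of the OAuth2 client credentials flow against ADFS.
# adfs.example.com, client_id/client_secret, and the resource value are
# placeholders -- they come from the application group registered in ADFS.
token_url = "https://adfs.example.com/adfs/oauth2/token"

data = {
    "grant_type": "client_credentials",
    "client_id": "<client-id-registered-in-adfs>",
    "client_secret": "<client-secret>",
    "resource": "<relying-party-identifier-of-the-api>",
}

resp = requests.post(token_url, data=data)
resp.raise_for_status()
access_token = resp.json()["access_token"]
# Send the JWT to the API as:  Authorization: Bearer <access_token>

The same request can be reproduced in Postman by putting these fields in an x-www-form-urlencoded body.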

How to get Request Headers automatically using Scrapy?

Please forgive me if this question is too stupid.
We know that in the browser it is possible to go to Inspect -> Network -> XHR -> Headers and get Request Headers. It is then possible to add these Headers to the Scrapy request.
However, is there a way to get these Request Headers automatically using the Scrapy request, rather than manually?
I tried to use: response.request.headers but this information is not enough:
{b'Accept': [b'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'], b'Accept-Language': [b'en'], b'User-Agent': [b'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36'], b'Accept-Encoding': [b'gzip,deflate']}
We see a lot more request header information in the browser. How can we get this information?
Scrapy uses these headers to scrape the webpage. Sometimes, if a website needs some special keys in the headers (like an API), you'll notice that Scrapy won't be able to scrape the webpage.
However, there is a workaround: in the downloader middlewares, you can implement Selenium. The requested webpage will then be downloaded by the Selenium-automated browser, and you will be able to extract the complete headers, since Selenium starts an actual browser.
## Import webdriver from Selenium Wire instead of Selenium
from seleniumwire import webdriver
from selenium.webdriver.chrome.options import Options

## Configure and start the driver, then get the URL
options = Options()
driver = webdriver.Chrome("my/path/to/driver", options=options)
driver.get("https://my.test.url.com")

## Print request headers
for request in driver.requests:
    print(request.url)               # <--------------- Request url
    print(request.headers)           # <----------- Request headers
    print(request.response.headers)  # <-- Response headers
You can use the above code to get the request headers. It must be placed within a Scrapy DownloaderMiddleware so both can work together.
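A rough sketch of what such a downloader middleware could look like, assuming Selenium Wire is installed; SeleniumWireMiddleware and the settings entry are illustrative names, not a standard Scrapy API:

from scrapy.http import HtmlResponse
from seleniumwire import webdriver

class SeleniumWireMiddleware:
    ## Hypothetical downloader middleware: fetch pages with Selenium Wire
    ## so the full browser request/response headers become available.

    def __init__(self):
        self.driver = webdriver.Chrome()  # add driver path/options as needed

    def process_request(self, request, spider):
        self.driver.get(request.url)
        ## Log the headers the real browser actually sent
        for req in self.driver.requests:
            spider.logger.info("%s -> %s", req.url, req.headers)
        ## Hand the rendered page back to Scrapy as a normal response
        return HtmlResponse(
            url=self.driver.current_url,
            body=self.driver.page_source,
            encoding="utf-8",
            request=request,
        )

It would then be enabled in settings.py, for example:
DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.SeleniumWireMiddleware": 543}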

How to deal with captcha when web scraping using R

I'm trying to scrape data from this website using httr and rvest. After several rounds of scraping (around 90-100 requests), the website automatically redirects me to another URL with a captcha.
this is the normal url: "https://fs.lianjia.com/ershoufang/pg1"
this is the captcha url: "http://captcha.lianjia.com/?redirect=http%3A%2F%2Ffs.lianjia.com%2Fershoufang%2Fpg1"
When my spider comes across the captcha URL, it tells me to stop and solve it in the browser. Then I solve it by hand in the browser. But when I run the spider and send a GET request again, the spider is still redirected to the captcha URL. Meanwhile, everything works normally in the browser; even if I type in the captcha URL, it redirects me back to the normal URL.
Even when I use a proxy, I still get the same problem. In the browser I can browse the website normally, while the spider keeps being redirected to the captcha URL.
I was wondering:
Is my way of using the proxy correct?
Why does the spider keep being redirected while the browser isn't? They are coming from the same IP.
Thanks.
This is my code:
a <- GET(url, use_proxy(proxy, port), timeout(10),
         add_headers('User-Agent' = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
                     'Connection' = 'keep-alive',
                     'Accept-Language' = 'en-GB,en;q=0.8,zh-CN;q=0.6,zh;q=0.4,en-US;q=0.2,fr;q=0.2,zh-TW;q=0.2',
                     'Accept-Encoding' = 'gzip, deflate, br',
                     'Host' = 'ajax.api.lianjia.com',
                     'Accept' = '*/*',
                     'Accept-Charset' = 'GBK,utf-8;q=0.7,*;q=0.3',
                     'Cache-Control' = 'max-age=0'))
b <- a %>% read_html %>%
  html_nodes('div.leftContent') %>%
  html_nodes('div.info.clear') %>%
  html_nodes('div.title') %>%
  html_text()
Finally, I turned to RSelenium; it's slow, but there are no more captchas. Even when one appears, I can solve it directly in the browser.
You are getting CAPTCHAs because that is how the website tries to prevent non-human/automated scripts from scraping its data. When you try to scrape the data, it detects you as a non-human/robotic script. This happens because your script sends very frequent GET requests along with some parameter data. Your program needs to behave like a real user (visiting the website with random timing patterns, different browsers, and different IPs).
You can avoid getting a CAPTCHA by manipulating these parameters as described below, so your program appears like a real user:
Use randomness when sending GET requests. For example, you can use the Sys.sleep function (with a random distribution) to sleep before sending each GET request.
Vary the user agent data (Mozilla, Chrome, IE, etc.), cookie acceptance, and encoding.
Vary your source location (IP address and server info).
Manipulating this information will help you avoid CAPTCHA validation to some extent.
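The same idea in a short sketch (Python here for illustration; in R the equivalent would be Sys.sleep plus varying the User-Agent passed to add_headers). The user-agent pool and the delay range below are arbitrary placeholders:

import random
import time
import requests

# Placeholder pool of user agents; rotate through real, current strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36",
]

def polite_get(url):
    time.sleep(random.uniform(2, 8))  # random pause before each request
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return requests.get(url, headers=headers, timeout=10)

for page in range(1, 101):
    resp = polite_get("https://fs.lianjia.com/ershoufang/pg%d" % page)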

Microsoft Edge 14 options request response not received?

I'm trying to load a JSON file from my CDN, and all browsers work except MS Edge. When I inspect the network tab in Edge, I see that the OPTIONS requests that precede the request to the actual JSON file keep their "pending" status, and the response headers as well as the response body stay empty in that panel.
When I reload the page, the files that I was trying to fetch show up, even before my page request, and are shown to have taken many seconds to load (here, 3887 seconds).
My server logs confirm that a 200 response was sent for the initial OPTIONS request:
method: "OPTIONS"
resource: "/lang/locale-survey-en_US.json?1490092072501"
httpVersion: "HTTP/1.1"
status: 200
responseSize: "175"
userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393"
My best guess is that Edge is not able to read the response to the OPTIONS request, but if so, why would it send OPTIONS requests in the first place? Since it works in all other browsers, I'm out of ideas about how to debug this, so if you have suggestions about which logs or what else I should check, any help will be greatly appreciated.
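For reference, before the browser will issue the real request, the preflight OPTIONS response generally has to carry CORS headers along these lines (the values are only illustrative, not taken from this setup):
Access-Control-Allow-Origin: https://www.example.com
Access-Control-Allow-Methods: GET, OPTIONS
Access-Control-Allow-Headers: Content-Type
Access-Control-Max-Age: 600
If any of these is missing or does not match the requesting origin, the browser blocks the actual fetch even though the server logged a 200 for the OPTIONS request.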

Jetty client http error code 412

I am accessing a website (I am hiding the original website name as it is against policy) using a browser and the Jetty/Apache HTTP client.
The website works fine with a web browser.
Using the API I am able to log in to the website, get the session cookie JSESSIONID, and retrieve the home page HTML content. But after that, when I submit any form or call the links from the HTML, I receive HTTP error code 412 (Precondition Failed).
I understand this error is due to a problem in the client headers. I set all the headers from the browser (checked using Inspect Element in Chrome). Still I get the same error.
I am not able to track down which header is causing the problem.
Here is the Header from browser
Request Headers
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip, deflate
Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:318
Content-Type:application/x-www-form-urlencoded
Cookie:language=en_IN; __gads=ID=14c64d8f9fd7de54:T=1424658276:S=ALNI_Mba1kvJO4mLo7R-T2jUJE9zCYck5A; SLB_Cookie=ffffffff09461c2d45525d5f4f58455e445a4a422971; JSESSIONID=36m4Oo6dCML_Wvx-Wgmm9rtLh9mbURxnZhWIVwg-zHaNzFQeUt9C!-1989013783; _ga=GA1.3.379900459.1428120216
Host:www.irctc.co.in
Origin:https://www.example.com
Referer:https://www.example.com/context/home
Upgrade-Insecure-Requests:1
User-Agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36
I am setting the same headers from the Jetty client.
Request request = httpClient.newRequest(url);
request.method(HttpMethod.POST);
request.agent(USER_AGENT);
request.accept("text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
request.header(HttpHeader.REFERER, "https://www.example.com/context/home");
request.header(HttpHeader.ACCEPT_ENCODING, "gzip, deflate");
request.header(HttpHeader.ACCEPT_LANGUAGE, "en-GB,en-US;q=0.8,en;q=0.6");
request.header(HttpHeader.CACHE_CONTROL, "max-age=0");
request.header(HttpHeader.CONNECTION, "keep-alive");
request.header(HttpHeader.CONTENT_TYPE, "application/x-www-form-urlencoded");
request.header(HttpHeader.HOST, "www.example.com");

String content = "";
for (Map.Entry<String, String> entry : params.entrySet()) {
    request.param(URLEncoder.encode(entry.getKey(), "UTF-8"), URLEncoder.encode(entry.getValue(), "UTF-8"));
    if (!StringUtils.isEmpty(content)) {
        content += "&";
    }
    content += URLEncoder.encode(entry.getKey(), "UTF-8") + "=" + URLEncoder.encode(entry.getValue(), "UTF-8");
}
request.header(HttpHeader.CONTENT_LENGTH, "" + content.length());
I see that JSESSIONID and SLB_Cookie are present in the request. Since the website is out of our control, I really cannot track down what the issue is.
Please help me resolve this issue. Any pointers to resolving the issue on the client side are appreciated. Is there any way to figure out which header is causing this issue?
I solved the problem.
The issue was with the form parameter values: I was sending already URL-encoded values, which were then encoded again by the Jetty client. Passing the raw values to request.param() and letting the client handle the encoding fixes it.
