I've been working on open tracking by attaching a tracking pixel to emails sent from a PHP web application that I've been developing. The first problem I encountered was that an email was marked as opened as soon as it was sent. I found help here: False open trackings using SES and gmail, which links to an article describing how to avoid counting opens triggered by the Google bot by checking the user agent. I created the following function to check whether I'm dealing with the bot:
function isGoogleBot(string $userAgent): bool
{
    $googleBotUserAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246 Mozilla/5.0';

    return $userAgent === $googleBotUserAgent;
}
I also read some articles about Gmail caching images, but unfortunately I'm still running into problems.
I ran the following test in Chrome, Firefox, Edge and Safari:
I sent an email via the application that I've been working on and then opened it in the Gmail web client.
Then I went to the application to see the open tracking statistics.
I repeated the above test twice to make sure I get the same result every time.
With the last email I checked what happens on subsequent opens of the same email.
Here are my observations:
In every browser except Firefox, opening the email for the first time registers an extra open, so I see 2 opens in the statistics. In Firefox only 1 open is counted.
On subsequent opens, only 1 open is added to the total each time I open the email. The exception is again Firefox, where the total opens counter doesn't change: no matter how many times I open the email, the total stays at 1.
Is there a way to make open tracking reliable in every browser when the email is opened in Gmail?
Related
I am exploring the Google PageSpeed insights api and there in the response I see a tag called:
{
...
lighthouse.userAgent:'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/84.0.4147.140 Safari/537.36'
...
}
Docs mention userAgent="The user agent that was used to run this LHR."
https://developers.google.com/speed/docs/insights/rest/v5/pagespeedapi/runpagespeed#LighthouseResultV5 What does that mean? How is this performance aggregated by running on all browsers?
PS: This is for Desktop version.
What does that mean?
This lets you know what browser was used to run the test.
It is useful if you believe that there is an issue with Lighthouse (a bug in your report), so you can test it directly on the same browser that Lighthouse uses.
There is also the "environment" object which contains how Lighthouse presented itself (it sent a header saying "treat me like this browser") to the website that was being tested. (lighthouseResult.environment.networkUserAgent)
This is useful so you can check your server isn't blocking requests for that user agent etc.
It is also useful for checking your server logs to see which requests Lighthouse made.
See the Wikipedia page for user agent for more info on user agents.
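As a rough sketch of where those two values live, the snippet below reads them from an already-parsed runPagespeed response. The function name and the sample dictionary are made up for illustration; the `networkUserAgent` value in particular is a placeholder, not something copied from a real report.

```python
def lighthouse_user_agents(response: dict) -> dict:
    """Pull the two user-agent fields out of a parsed runPagespeed response."""
    result = response.get("lighthouseResult", {})
    environment = result.get("environment", {})
    return {
        # Browser that actually ran the audit.
        "userAgent": result.get("userAgent"),
        # User-Agent header Lighthouse sent to the site under test.
        "networkUserAgent": environment.get("networkUserAgent"),
    }


# Trimmed, hypothetical response: only the fields discussed above are present.
sample = {
    "lighthouseResult": {
        "userAgent": (
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) HeadlessChrome/84.0.4147.140 Safari/537.36"
        ),
        "environment": {
            "networkUserAgent": "placeholder-network-user-agent",
        },
    }
}

for name, value in lighthouse_user_agents(sample).items():
    print(f"{name}: {value}")
```

Grepping your server logs for the `networkUserAgent` value should then show the requests Lighthouse made.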
How is this performance aggregated by running on all browsers?
As for your second question, it doesn't quite make sense to me, but the user agent has no impact on performance unless your server does something different for that user agent string, if that is what you mean.
We have created an application to send out bulk emails using AWS SES. We are able to send out the emails and successfully track metrics like Opens, Clicks etc. using AWS SNS. The only problem we have is that the "Opens" object that SNS sends always contains the same value: "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0 (via ggpht.com GoogleImageProxy)". What we want is to determine where the email was opened (mobile/tablet/desktop) and in which browser. Even when the email is opened in Chrome, it is reported as Mozilla. Any help or suggestion in this regard is highly appreciated.
Additional Info: I figured out that the userAgent is returned correctly in the "clicks" object, but not in the "Open" object. Not sure why. We would like to track the same information when the email is opened as well, since not all recipients click on a link.
There isn't actually a way to determine that a message has been opened.¹ Detecting "opens" relies on detection of the viewer fetching an image embedded in the message when the mail is "opened."
At the bottom of each message, we insert a 1 pixel by 1 pixel transparent GIF image. Each email includes a unique link to this image file; when the image is opened, we can tell exactly which message was opened and by whom.
When the viewer is Gmail, the user's browser doesn't fetch this image.
https://aws.amazon.com/blogs/messaging-and-targeting/open-and-click-tracking-have-arrived/
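To make the "unique link per message" idea concrete, here is a minimal sketch of how such a pixel tag could be built. This is not how SES does it internally; the function name, the query parameter names `m` and `r`, and the `track.example.com` URL are all assumptions for illustration.

```python
from html import escape
from urllib.parse import urlencode


def tracking_pixel_tag(base_url: str, message_id: str, recipient_id: str) -> str:
    """Return an <img> tag whose URL identifies one message and one recipient."""
    # Hypothetical parameter names; any scheme that maps back to the send works.
    query = urlencode({"m": message_id, "r": recipient_id})
    src = f"{base_url}?{query}"
    # Escape the URL so the '&' in the query string is valid inside HTML.
    return f'<img src="{escape(src, quote=True)}" width="1" height="1" alt="">'


print(tracking_pixel_tag("https://track.example.com/open.gif", "msg-123", "user-456"))
```

The server behind that URL records a hit for the message and recipient whenever the image is requested, which is all an "open" really is.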
When a message is opened in Gmail, the user's browser doesn't fetch the image directly. It fetches it from the Google image proxy, and the image proxy fetches it from SES; that fetch is what generates the tracking event. Hence the (via ggpht.com GoogleImageProxy) in the user agent.
This isn't something that you have control over, as the sender.
The proxy can identify itself by saying whatever it likes in the User-Agent field -- there is no reason to believe that the entire user-agent string isn't being created by the proxy. Google searching the topic seems to confirm that this is how the proxy always appears. Mozilla/5.0 is a generic user agent string, that does not mean anything more than "I am some kind of web browser, or want the server to believe that I am."
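Since the user agent can't tell you anything about the recipient's device in this case, about the most you can do with it is flag opens that came through the proxy. A small sketch, assuming the proxy keeps announcing itself with the `GoogleImageProxy` token seen in the question (the function name is made up):

```python
def opened_via_gmail_proxy(user_agent: str) -> bool:
    """True if the open event's user agent carries the Gmail image proxy marker."""
    # Assumption: the marker text stays the same as in the string from the question.
    return "GoogleImageProxy" in user_agent


open_event_ua = (
    "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0 "
    "(via ggpht.com GoogleImageProxy)"
)
print(opened_via_gmail_proxy(open_event_ua))
```

For such events it is probably more honest to report the client as "Gmail (proxied)" than as Firefox on Windows.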
¹ There isn't actually a way... well, technically there is, but thanks to the profusion of spam, this standard is almost never applied to Internet mail. As noted in RFC 8098, "The presence of a Disposition-Notification-To header field in a message is merely a request for an MDN. The recipients' user agents are always free to silently ignore such a request." This is almost always what happens... nothing.
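For completeness, requesting an MDN is just a matter of adding that header. A minimal sketch with Python's standard `email` package; the helper name and the addresses are placeholders, and as the footnote says, most recipients' mail clients will simply ignore the request.

```python
from email.message import EmailMessage


def message_with_mdn_request(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a message that asks the recipient's mail client for a read receipt."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # RFC 8098 header: the address that should receive the notification.
    msg["Disposition-Notification-To"] = sender
    msg.set_content(body)
    return msg


msg = message_with_mdn_request(
    "sender@example.com", "recipient@example.com", "Hello", "Plain text body."
)
print(msg["Disposition-Notification-To"])
```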
I'm new to web scraping. What I'm trying to do is scrape all the Amazon movies from the Amazon website. I went to www.amazon.com.
I chose Amazon Video on the left side of the search box, typed in 'video' and searched. I got a long list of movies. The URL is https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Dinstant-video&field-keywords=video&rh=n%3A2858778011%2Ck%3Avideo
Next, I went to the scrapy shell and typed scrapy shell 'https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Dinstant-video&field-keywords=video&rh=n%3A2858778011%2Ck%3Avideo'
My response status is 400.
I also tried adding a user agent: scrapy shell -s USER_AGENT='Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36' 'https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Dinstant-video&field-keywords=video&rh=n%3A2858778011%2Ck%3Avideo'
I still got response status 400.
Why does that happen?
How can I find the starting URL so that I can start scraping all the movie info?
I have no clue how to deal with it. I would truly appreciate it if anyone could help. Thanks a lot in advance.
First I tried scrapy shell "https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Dinstant-video&field-keywords=video&rh=n%3A2858778011%2Ck%3Avideo" and I got a 503. Then I used the command view(response) to see what happened on the page: Amazon showed me a verification code to check whether I'm a robot.
Then I ran your second scrapy shell command, with the User-Agent set, and I got a 200 response.
Maybe you could try view(response) and see what you get there, or try scrapy shell a few more times?
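If you want to rule out shell quoting as a factor (the URL contains `&`, and single quotes are not quote characters in Windows cmd.exe), one option is to build the command as an argument list so no shell parses it at all. This is only a sketch: the helper name is made up, and whether quoting is actually behind your 400 is an assumption I can't verify.

```python
import shlex
from typing import List, Optional


def scrapy_shell_argv(url: str, user_agent: Optional[str] = None) -> List[str]:
    """Build the argument list for `scrapy shell`, optionally overriding USER_AGENT."""
    argv = ["scrapy", "shell"]
    if user_agent:
        # -s NAME=VALUE overrides a Scrapy setting for this run.
        argv += ["-s", f"USER_AGENT={user_agent}"]
    argv.append(url)
    return argv


url = (
    "https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Dinstant-video"
    "&field-keywords=video&rh=n%3A2858778011%2Ck%3Avideo"
)
ua = (
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
)
argv = scrapy_shell_argv(url, ua)
# Show the POSIX-quoted equivalent; pass argv to subprocess.run(argv) to launch it.
print(shlex.join(argv))
```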
I have a desktop application that uses CEF for displaying a built in web page.
I have customized the User-Agent (Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) DesktopApp MyAppName/1.0 (MyApp release 1.0 stamp 99999) Safari/537.36) but Google Analytics only shows as Safari 537.36.
Are browsers outside the known universe of real browsers supported by GA when it looks up the browser used? I would like this to show up as MyApp instead of Safari or Chrome.
I just looked at my browser reports and unless "aaa", "ddd" and "this is a test ua" are actually existing browsers it would seem that GA also tracks unknown user agents.
More seriously, the measurement protocol (on top of which Google Analytics is built) allows for a user agent override parameter (&ua), which would make very little sense if you could only pass in known browser names (after all, this is meant to support e.g. IoT devices, which might not even have a real user agent name).
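To illustrate the override, here is a sketch that builds a pageview hit carrying an explicit `ua` value. The parameter names (`v`, `tid`, `cid`, `t`, `dp`, `ua`) and the collect endpoint are the Universal Analytics Measurement Protocol ones as I understand them; the tracking ID, client ID and helper name are placeholders. Nothing is sent here, the code only builds the payload.

```python
from urllib.parse import urlencode

# Universal Analytics collect endpoint (assumed; check it against your GA version).
COLLECT_URL = "https://www.google-analytics.com/collect"


def pageview_payload(tracking_id: str, client_id: str, page_path: str, user_agent: str) -> str:
    """URL-encode a pageview hit that overrides the user agent via the ua parameter."""
    return urlencode({
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, e.g. UA-XXXXX-Y
        "cid": client_id,    # anonymous client ID
        "t": "pageview",     # hit type
        "dp": page_path,     # document path
        "ua": user_agent,    # user agent override
    })


payload = pageview_payload("UA-XXXXX-Y", "555", "/home", "MyAppName/1.0")
print(f"POST {COLLECT_URL}")
print(payload)
```

Whether GA then shows "MyAppName" as the browser still depends on how its parser treats the string.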
Can someone please explain what "Safari (in-app)" means in Google Analytics under Audience | Technology | Browser & OS?
For the Google Analytics for our website, we suddenly started to see significant traffic from this source.
It sounds like it just means that visitors are coming to us through browsers embedded within apps (e.g. a web viewing control), except that there doesn't seem to be a reason why we should be getting such traffic, and so suddenly.
We went from zero traffic from this source to almost 40% of our traffic in only two days for no apparent reason! We haven't done anything that could explain this sudden new source of traffic (e.g. we haven't released any apps ourselves that point back at our website). We're hoping that, if we can find out what "in-app" actually means, we'll be able to understand this traffic.
Thank you
I did some tests on a separate page available only to me, using iOS 8.2 and up-to-date versions of the 1Password and Facebook apps:
General Safari browser is reported as Safari 8.0:
Mozilla/5.0 (iPhone; CPU iPhone OS 8_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12D508 Safari/600.1.4
1Password is reported as Safari 6.0:
Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10B329 Safari/8536.25
Facebook App is reported as Safari (in-app):
Mozilla/5.0 (iPhone; CPU iPhone OS 8_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Mobile/12D508 [FBAN/FBIOS;FBAV/27.0.0.10.12;FBBV/8291884;FBDV/iPhone7,2;FBMD/iPhone;FBSN/iPhone OS;FBSV/8.2;FBSS/2; FBCR/Play;FBID/phone;FBLC/en_US;FBOP/5]
Tested on 2015.03.31 (yyyy.mm.dd)
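Going only by the three strings above, the difference seems to be whether the user agent ends with the usual "Version/x Mobile/y Safari/z" tail. The sketch below encodes that observation as a heuristic. It is a guess at the pattern, not Google Analytics' actual parsing logic, and the function name is made up.

```python
import re

# Tail that both strings reported as plain Safari share.
_SAFARI_TAIL = re.compile(
    r"\(KHTML, like Gecko\) Version/[\d.]+ Mobile/\w+ Safari/[\d.]+$"
)


def looks_like_ios_in_app(user_agent: str) -> bool:
    """Heuristic: iOS WebKit user agent that lacks the standard Safari tail."""
    is_ios = any(device in user_agent for device in ("iPhone", "iPad", "iPod"))
    has_safari_tail = _SAFARI_TAIL.search(user_agent) is not None
    return is_ios and "AppleWebKit" in user_agent and not has_safari_tail


safari_ua = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 8_2 like Mac OS X) AppleWebKit/600.1.4 "
    "(KHTML, like Gecko) Version/8.0 Mobile/12D508 Safari/600.1.4"
)
facebook_ua = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 8_2 like Mac OS X) AppleWebKit/600.1.4 "
    "(KHTML, like Gecko) Mobile/12D508 [FBAN/FBIOS;FBAV/27.0.0.10.12;FBBV/8291884;"
    "FBDV/iPhone7,2;FBMD/iPhone;FBSN/iPhone OS;FBSV/8.2;FBSS/2; FBCR/Play;"
    "FBID/phone;FBLC/en_US;FBOP/5]"
)

for label, ua in (("Safari", safari_ua), ("Facebook app", facebook_ua)):
    print(label, looks_like_ios_in_app(ua))
```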
Visitors using the built-in search box within Safari now have their searches sent through Google SSL Search if they use iOS 6. Searches done with Google through Safari's search box will continue to grow as people continue to upgrade to iOS 6 (currently the highest version for iPhone 4/4S is 6.1.3, and for iPhone 5 it is 6.1.4). Google also updated their user-agent parsing code, and a UA like this:
1Password
Mozilla/5.0 (iPhone; CPU iPhone OS 6_1 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) 1Password/4.1.2 (like Version/10B143 Mobile/6.1 Safari/8536.25)
will be categorised as Safari (in-app).
Read more in the articles below and do not trust Google Analytics fully.
Hope this helps!
iOS 6 change Google traffic from Safari
Safari (in-app) as reported by Google Analytics
As you concluded, many people think those represent traffic from iOS apps opening links not in the browser but inside the app (WebView). However, I haven't been able to find any concrete Google documentation on that.
As for the sudden increase, it is not unusual. Just yesterday I saw HTC Hero (#1 device in visits) go from 25.000 visits/day to 0 in two days, and Elocity A7 A7 Internet Tablet go from 0 to 30.000 visits/day at the same time. It seems Google keeps updating its user-agent parsing code.
I noticed it being reported in email tracking. Could it be email opens in Safari, if you have set up email tracking in GA?
It looks like people who access your website via a home screen bookmark (the "Add to Home Screen" function), rather than via the Safari browser, are listed as "Safari (in-app)".
I'm looking at a GA account that only watches iPads displaying a "WebApp". It looks like those devices running 10.3.2 and higher display as "Safari (in-app)" and those below show as "Safari" only.
This would also explain the sudden jump in metrics: as users update iOS, they will suddenly be listed as "Safari (in-app)".