Direct traffic in Google Analytics for my site dropped by 25% (around 100,000 users) in less than 2 weeks.
Please help me list the possible causes of that loss.
Compared to? Year on year? Previous period? Do you see an increase in other channels or are they stable/the same/lower? This is important to know to exclude seasonality or organic/paid search combinations for users (e.g. when there's a large drop in organic or paid traffic, direct traffic will most likely follow).
Considering the scenario that all your other channels except referrals are stable, and there's no seasonality involved, it most likely sounds like a problem with the referrer header not being passed.
http to http – Referral data sent
http to https – Referral data sent
https to https – Referral data sent
https to http – No referral data sent and traffic is attributed to direct
Let's say you updated your website to force https. In the previous situation, you would get traffic from https to http, and no referral data would be sent. In the new situation, you would get traffic from https to https—the referral data is sent and attributed to the right channel in Google Analytics instead of the traffic being attributed to direct. This in turn causes a drop in direct traffic.
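The rule in the list above can be written as a small check. This is only a sketch of the classic browser default (the "no-referrer-when-downgrade" behaviour); the function name is made up for illustration, and a site can change the outcome with its own Referrer-Policy.

```python
def referrer_is_sent(source_scheme: str, destination_scheme: str) -> bool:
    """Return True if a browser using the classic default policy
    would pass referrer data for this navigation.

    The only case where the referrer is dropped is a downgrade
    from an https page to an http page.
    """
    src = source_scheme.lower()
    dst = destination_scheme.lower()
    return not (src == "https" and dst == "http")


# Before forcing https: a visit from an https site to an http site
# carries no referrer, so analytics files it under "direct".
before = referrer_is_sent("https", "http")

# After forcing https: the same visit now carries a referrer and is
# attributed to the referral channel instead.
after = referrer_is_sent("https", "https")
```

If most of the linking sites already use https, this shift alone can move a noticeable share of sessions from direct to referral.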
I've set up NGINX as a warm cache server in front of Wowza > HTTP-Origin application to act as an edge server. The config is working great streaming over HTTPS with nDVR and adaptive streaming support. I've combed the internet looking for examples and help on configuring NGINX and/or other solutions to give me live statistics (# of viewers per stream_name), as well as to parse the logs to give me stream duration per stream_name/session and data_transferred per stream_name/session. The logging in NGINX for HLS streams logs each video chunk. With Wowza, it is a bit easier to get this data by reading the duration or bytes transferred values from the logs when the stream is destroyed... Any help on this subject would be hugely appreciated. Thank you.
Nginx isn't aware of what the chunks are. It's only serving resources to clients over HTTP, and doesn't know or care that they're interrelated. Therefore, you'll have to derive the data you need from the logs.
To associate client requests together as one, you need some way to track state between requests, and then log that state. Cookies are a common way to do this. Alternatively, you could put some sort of session identifier in the request URI, but this hurts your caching ability since each client is effectively requesting a different resource.
Once you have some sort of session ID logged, you can process those logs with tools such as Elastic Stack to piece together the reports you're looking for.
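As a rough illustration of that log processing, here is a minimal Python sketch that groups chunk requests by session ID. The log line shape, the `/live/<stream_name>/` URI layout, and the `sid=` field are all assumptions made for this example (for instance, a custom `log_format` that writes an ISO 8601 timestamp and a session cookie); adapt the regular expression to whatever you actually log.

```python
import re
from datetime import datetime

# Assumed (hypothetical) log line, one per HLS chunk request:
# 2024-01-01T12:00:00+00:00 "GET /live/stream1/chunk_1.ts HTTP/1.1" 200 1000 sid=abc
LINE_RE = re.compile(
    r'^(?P<ts>\S+) "GET /live/(?P<stream>[^/]+)/\S+ HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) sid=(?P<sid>\S+)$'
)


def summarize(lines):
    """Aggregate duration and bytes sent per (stream_name, session_id)."""
    sessions = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m["sid"] == "-":
            continue  # skip unparseable lines and requests without a session
        ts = datetime.fromisoformat(m["ts"])
        key = (m["stream"], m["sid"])
        s = sessions.setdefault(key, {"first": ts, "last": ts, "bytes": 0})
        s["first"] = min(s["first"], ts)
        s["last"] = max(s["last"], ts)
        s["bytes"] += int(m["bytes"])
    return {
        key: {
            # Time between the first and last chunk request of the session.
            "duration_s": (s["last"] - s["first"]).total_seconds(),
            "bytes": s["bytes"],
        }
        for key, s in sessions.items()
    }


sample = [
    '2024-01-01T12:00:00+00:00 "GET /live/stream1/chunk_1.ts HTTP/1.1" 200 1000 sid=abc',
    '2024-01-01T12:00:06+00:00 "GET /live/stream1/chunk_2.ts HTTP/1.1" 200 1000 sid=abc',
    '2024-01-01T12:00:12+00:00 "GET /live/stream1/chunk_3.ts HTTP/1.1" 200 1000 sid=abc',
]
report = summarize(sample)
```

The duration here is only the span between the first and last chunk request, so it understates viewing time by roughly one chunk length. Counting the sessions seen in the last few seconds would give an approximate live viewer number per stream.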
Depending on your goals with this, you might find it better to get your data client-side. There, you have a better idea of what a session actually is, and then you can log client-side items such as buffer levels and latency and what not. The HTTP requests don't really tell you much about the experience the end users are getting. If that's what you want to know, you should use the log from the clients, not from your HTTP servers. Your HTTP server log is much more useful for debugging underlying technical infrastructure issues.
Here are my questions followed by some more information.
Is an IP Address considered PII (Personally Identifiable Information)?
We need to filter our measurement protocol traffic via the user's IP address, is there a way to do this?
We are using the Measurement Protocol to send custom event data to our Google Analytics account. All of the data is being sent via PHP cURL from the server. We have 3 different views setup in our account, (View #1) a view that is completely unfiltered, (View #2) another view that is filtering out internal traffic via IP addresses, and a final third view (View #3) for testing purposes.
View #2's filters have stopped working since we moved to this method of sending the event data to Google. I imagine that is because the requests are now coming from the server's IP address instead of each specific user's. I have been told about a field that you can use to send the user's IP address to Google, labeled "uip"; however, Google anonymizes this data and does not seem to use it for filtering the views (what would the purpose of this field be then?).
I have a custom dimension set up in which I am sending a hashed IP address (as I am not sure if an IP is considered PII), and I am then filtering the reports on those specific hashes. However, this leaves me unable to filter out IP ranges: certain bot traffic can originate from different ranges of IP addresses, and I would be unable to filter it from the reports.
I have been scouring the internet to try to determine whether or not it is a privacy concern for me to simply store the user IP (unhashed) in a custom dimension and setup our filtering rules based on that. This would allow me to create regex that filters out entire ranges of IP's. Most of the articles that say an IP is PII refer to Google's Universal Analytics Guidelines: https://support.google.com/analytics/answer/2795983 - but I have been all over those articles and I cannot see Google specifically stating anywhere whether or not an IP is PII.
Thank you for your time.
For the question of hashed vs. unhashed values - Google has two different policies on the question of hashing (as I only found out when I was researching your question).
For the question of whether IPs are PII - Google has a document on "Best practices to avoid sending PII":
https://support.google.com/analytics/answer/6366371?hl=en&ref_topic=2919631
which does not mention IP addresses. However, Google does take some trouble to protect IP addresses (e.g. automatically anonymizing them, not exposing them in the interface), so I'd suggest (based on gut feeling, not anything binding) that you do the same and at least hash them with a salted hash (and filter by the hash).
Also, apart from the Google TOS, there are national laws to consider (I don't know where you are doing business; I live in Germany, and here IP addresses are definitely PII. I think this is true for the rest of the EU as well).
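A salted hash of the kind suggested above could look like the following sketch. It is written in Python for brevity (the question uses PHP), uses HMAC-SHA256 as one reasonable keyed hash, and the function name and secret value are made up for illustration. Whether a hashed IP is acceptable under your local law is a separate question that this does not answer.

```python
import hashlib
import hmac


def pseudonymize_ip(ip: str, secret: bytes) -> str:
    """Return a keyed hash of an IP address as a hex string.

    The secret acts as the salt: without it, the small IPv4 address
    space makes a plain hash easy to reverse by brute force.
    """
    return hmac.new(secret, ip.encode("ascii"), hashlib.sha256).hexdigest()


# Hypothetical secret; keep the real one out of source control.
SECRET = b"replace-with-a-long-random-value"

# Value to send as the custom dimension instead of the raw address.
hashed = pseudonymize_ip("203.0.113.42", SECRET)
```

The same input and secret always give the same hash, so a view filter on specific hashes keeps working, but range-based filtering is still not possible with this approach.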
If your computer is infected, apparently Google will tell you so - as shown in the image below:
According to this article, Google uses HTTP headers to work this out. But how do they do it, and what sort of headers should we look for?
Thank you!
The Google Security blog post you linked doesn't mention HTTP headers.
A key point in the blog post is this:
This particular malware causes infected computers to send traffic to Google through a small number of intermediary servers called “proxies.”
And this:
...taking steps to notify users whose traffic is coming through these proxies...
Google doesn't say much about the proxies, for instance if they were standards-compliant(ish) HTTP proxies or just servers echoing the user requests.
The "unusual" traffic that originated from Google would have been from a small set of IP addresses. No special HTTP headers would be necessary. Google only had to add the warning message to pages being served to the suspect IP addresses. That's it.
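To show how little is needed for that, here is a purely illustrative Python sketch of an IP-based check. It is not Google's implementation; the network list uses documentation address ranges as placeholders, and the function name is hypothetical.

```python
import ipaddress

# Placeholder addresses standing in for the known proxy servers.
SUSPECT_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def should_show_warning(client_ip: str) -> bool:
    """Return True if the request came from one of the suspect proxies."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in SUSPECT_NETWORKS)


# A server would call this with the connecting address and, when it
# returns True, add the warning banner to the page it serves.
flagged = should_show_warning("203.0.113.99")
clean = should_show_warning("192.0.2.10")
```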
The term "signature" in the follow-up link from your comments is used very informally, probably alluding to the IP addresses of the proxy servers. If you want to imagine something more complicated than that, then I suppose it's possible that these proxies (like many HTTP clients) could be detected by some pattern of HTTP headers unique to them: for example, the User-Agent or Via headers, or even something more subtle like the ordering or capitalization of headers. I doubt it came to that though, and I don't see much value in speculating, especially two years after the fact.
How many HTTP requests can a browser handle in a single HTML page?
There is a popular saying that a browser can handle only a certain number of HTTP requests to a single domain, so it's better to create a static domain (CDN) so that the HTTP requests can be shared between the 2 domains.
q1) How many HTTP requests can a browser handle in a single HTML page, or at least what is the saturation point (say 1000 requests)?
q2) How many HTTP requests to a single domain name can a browser handle (say 100 to the same domain name)?
Also, any suggestions for best practices?
Section 8.1.4 of the HTTP/1.1 RFC says a "single-user client SHOULD NOT maintain more than 2 connections with any server or proxy."
However, the key word is "should"; most browsers use a different number. See this blog for a table of max connections per browser.
In theory there is no limit. But as the number of requests required to construct a page grows, the time taken for the page to be rendered increases. The relationship is not linear at low counts. Typically latency has a far bigger effect than bandwidth on actual throughput and there are mechanisms in HTTP to minimise the effect of this - such as keepalives and parallel requests. As Jon Grant says, there are limits on the number of concurrent requests.
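A back-of-the-envelope model shows why latency and the connection limit matter more than raw request count. This Python sketch assumes every request costs one round trip, ignores bandwidth, caching and HTTP/2 multiplexing, and uses 6 connections per host and a 100 ms round trip as illustrative values only.

```python
import math


def estimate_fetch_time(num_requests: int, max_connections: int, rtt_s: float) -> float:
    """Estimate fetch time when requests run in parallel batches.

    With a cap on concurrent connections, the requests are served in
    ceil(num_requests / max_connections) rounds of one round trip each.
    """
    rounds = math.ceil(num_requests / max_connections)
    return rounds * rtt_s


# 100 resources on one host, 6 parallel connections, 100 ms round trip.
single_host = estimate_fetch_time(100, 6, 0.1)

# Same page split across 2 hosts, so 12 connections in total.
two_hosts = estimate_fetch_time(100, 12, 0.1)
```

Under these assumptions, 100 requests take 17 rounds on one host and 9 rounds across two, which is the reasoning behind the static-domain advice in the question. Reducing the number of requests has the same effect without the extra DNS lookup.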
A full answer to this question would fill a book - here's a good one.
Is it possible to find out the connection speed of the client when it requests a page on my website?
I want to serve video files, but depending on how fast the client's network is, I would like to serve higher or lower quality videos. Google Analytics is showing me the clients' connection types, so how can I find out what kind of network the visitor is connected to?
thx
No, there's no feasible way to detect this server-side short of monitoring the network stream's send buffer while streaming something. If you can switch quality mid-stream, this is a viable approach because if the user's Internet connection suddenly gets burdened by a download you could detect this and switch to a lower-quality stream.
But if you just want to detect the speed initially, you'd be better off doing this detection on the client and sending the results to the server with the video request.
Assign each request a token, e.g. /videos/data.flv?token=uuid123, and calculate the amount of data your web server sends for this token per second (possibly also checking for multiple tokens in use by one username within a time period). You can do this with the Apache sources and APR.
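The bookkeeping for that per-token measurement might look like the sketch below. It is written in Python rather than against the Apache/APR sources, and the class and method names are invented for illustration. Note that bytes written by the server are only a proxy for what the client actually received.

```python
import time
from collections import defaultdict


class ThroughputTracker:
    """Track bytes sent per token and report an average rate."""

    def __init__(self):
        self._bytes = defaultdict(int)
        self._start = {}

    def record(self, token, num_bytes, now=None):
        """Call each time the server writes num_bytes for this token."""
        now = time.monotonic() if now is None else now
        self._start.setdefault(token, now)  # remember the first write
        self._bytes[token] += num_bytes

    def bytes_per_second(self, token, now=None):
        """Average send rate since the first write for this token."""
        now = time.monotonic() if now is None else now
        elapsed = now - self._start[token]
        return self._bytes[token] / elapsed if elapsed > 0 else 0.0


# Explicit timestamps are passed here only to keep the example deterministic.
tracker = ThroughputTracker()
tracker.record("uuid123", 500_000, now=0.0)
tracker.record("uuid123", 500_000, now=1.0)
rate = tracker.bytes_per_second("uuid123", now=2.0)
```

A server could compare this rate with the bitrate of each available quality level and switch the stream down when the rate falls below it.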