I am trying to optimize the overall PageSpeed of this page, but I can't get the CLS under 0.1 on mobile. I really don't know why, as I use critical CSS, page caching, and font preloading, and I can't reproduce the behaviour in tests.
https://developers.google.com/speed/pagespeed/insights/?url=https%3A%2F%2Fwww.birkengold.com%2Frezept%2Fselbstgemachte-zahnpasta
Tested with a simulated Galaxy S5 on 3G Fast.
https://www.webpagetest.org/result/210112_DiK9_256ca61d8f9383a5b927ef5f55644338/
In no scenario do I get anywhere near 0.1 CLS.
Field Data and Origin Summary
Field Data and the Origin Summary are real-world data.
That is the key difference between those metrics and the synthetic test that PageSpeed Insights runs.
For example, CLS is measured until page unload in the real world, as mentioned in this explanation of CLS from Addy Osmani, who works on Google Chrome.
For this reason your CLS can be high if pages perform poorly at certain screen sizes (Lighthouse / PSI only tests one mobile screen size by default), or if things like lazy loading behave badly in the real world and cause layout shifts when content loads too slowly.
It could also come down to certain browsers, connection speeds, and so on.
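If you want to see exactly which elements shift outside of Lighthouse, one option is to watch layout-shift entries directly in the browser with the standard PerformanceObserver API. A minimal sketch (run it in the console or temporarily on the page; browser support for the sources attribution varies):

    // Log each layout shift and the element(s) that moved, so you can spot
    // shifts that only happen on slow connections or unusual screen sizes.
    new PerformanceObserver((list) => {
      for (const entry of list.getEntries()) {
        // Shifts within 500 ms of user input don't count towards CLS.
        if (entry.hadRecentInput) continue;
        console.log('layout shift:', entry.value);
        for (const source of entry.sources || []) {
          // source.node is the element that moved, e.g. a late-loading image or embed.
          console.log('  moved:', source.node, source.previousRect, source.currentRect);
        }
      }
    }).observe({ type: 'layout-shift', buffered: true });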
How can you find the page / root cause that is ruining your Web Vitals?
Let's assume you have a page that does well in the Lighthouse synthetic test but performs poorly in the real world at certain screen sizes. How can you identify it?
For that you need to gather Real User Monitoring (RUM) data.
RUM data is gathered in the real world as real users use your site, and stored on your server for later analysis / problem identification.
There is an easy way to do this yourself, using the Web Vitals Library.
This allows you to gather CLS, FID, LCP, FCP and TTFB data, which is more than enough to identify pages that perform poorly.
You can pipe the data gathered to your own API, or to Google Analytics for analysis.
If you combine the Web Vitals data with the User Agent string (to get the browser and OS) and the browser size (to get the effective screen size), you can narrow down whether the issue comes from a certain browser, a certain screen size, a certain connection speed (slow connections show up as high FCP / LCP figures), and so on.
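As a rough sketch of what that could look like (the /analytics endpoint and the extra context fields are assumptions for illustration; depending on the version of the web-vitals library the exports are named onCLS/onFID/... or getCLS/getFID/...):

    import { onCLS, onFID, onLCP, onFCP, onTTFB } from 'web-vitals';

    // Send each metric, plus context that helps narrow down root causes,
    // to a hypothetical /analytics collection endpoint.
    function sendToAnalytics(metric) {
      const body = JSON.stringify({
        name: metric.name,                         // 'CLS', 'FID', 'LCP', 'FCP' or 'TTFB'
        value: metric.value,
        id: metric.id,
        page: location.pathname,
        userAgent: navigator.userAgent,            // browser + OS
        viewport: innerWidth + 'x' + innerHeight,  // effective screen size
      });
      // sendBeacon survives page unload, which matters for CLS.
      if (navigator.sendBeacon) {
        navigator.sendBeacon('/analytics', body);
      } else {
        fetch('/analytics', { body, method: 'POST', keepalive: true });
      }
    }

    onCLS(sendToAnalytics);
    onFID(sendToAnalytics);
    onLCP(sendToAnalytics);
    onFCP(sendToAnalytics);
    onTTFB(sendToAnalytics);

From there you can aggregate by page, browser, and viewport to find the offending combinations.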
I can't find anything on the web about how to sample Adobe Analytics data. I need to integrate Adobe Analytics into a new website with a ton of traffic, and the stakeholders want to sample the data to avoid exorbitant server calls. I'm using DTM but am not sure whether that will help or be a non-factor. Can anyone either point me to some documentation or give me some direction on how to do this?
Adobe Analytics does not have any built-in method for sampling data, either on their end or in the JS code.
DTM doesn't offer anything like this either. It doesn't have any (exposed) mechanisms in place to evaluate all requests made to a given property (container); any rules that extend state beyond "hit" scope are cookie based.
Adobe Target does offer the ability to output code based on a percentage of traffic, so you could achieve sampling that way, but really you're just trading one server-call cost for another.
Basically, your only solution is to create your own server-side framework that conditionally outputs the Adobe Analytics (or DTM) tag, to achieve sampling with Adobe Analytics.
Update:
#MichaelJohns comment below:
We have a file that we use as a boot strap file to serve the DTM file.
What I think we are going to do is use some JS logic and cookies
around that to determine if a visitor should be served the DTM code.
Okay, well, maybe I'm misunderstanding what your goal is here (but I don't think I am), but that's not going to work.
For example, if you only want to output tracking for 50% of visitors, how would you use JavaScript and cookies alone to achieve this? In order to know that you are only letting through 50%, you need to know the total number of people in play. By themselves, JavaScript and cookies only know about ONE browser, ONE person. They have no way of knowing anything about the other visitors unless you have some sort of shared state between all of them, like keeping a count in a database server-side.
The best you can do solely with JavaScript and cookies is basically flip a coin. In this 50% example, you'd pick a random number between 1 and 100; the lower half gets no tracking and the upper half gets tracking.
The problem with this is that it is possible for the pendulum to swing 100% one way or the other. It is the same principle as flipping a coin 100 times in a row: it is entirely possible that it lands on tails all 100 times.
In theory, the trend over time should average out to 50/50, but this has a major flaw: you may get a ton of traffic one month and very little the next, or a quiet week followed by one very busy day. You have no idea how that will manifest over time; you can't really know which way your pendulum is swinging unless you ARE actually recording 100% of the traffic to begin with. The effect of all this is that it will absolutely destroy your trended data, which is the core of making any kind of meaningful analysis.
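To make that limitation concrete, the client-side coin flip amounts to something like this (a sketch; the cookie name and one-year lifetime are arbitrary):

    // Naive client-side "sampling": each browser flips its own coin once and
    // remembers the result. Nothing here knows what the overall split actually is.
    function shouldTrack(samplePercent) {
      const match = document.cookie.match(/(?:^|; )track=(yes|no)/);
      if (match) return match[1] === 'yes';
      const decision = Math.random() * 100 < samplePercent ? 'yes' : 'no';
      document.cookie = 'track=' + decision + '; path=/; max-age=' + 60 * 60 * 24 * 365;
      return decision === 'yes';
    }

    if (shouldTrack(50)) {
      // load the DTM / Adobe Analytics tag here
    }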
So basically, if you really want to reliably output tracking to a percentage of traffic, you will need a mechanism in place that does in fact record 100% of traffic. If I were going to roll my own homebrewed "sampler", I would do this:
In either a flat file or a database table I would have two columns, one representing "yes" and one representing "no". Each time a request is made, I look for the cookie. If the cookie does NOT exist, I count this as a new visitor, and since it is a new visitor, I increment one of those columns by 1.
Which one? That depends on what percentage of traffic I want to (not) track. In this example we're doing a very simple 50/50 split, so really all I need to do is increment whichever one is lower, and if they are currently equal, I pick one at random. If you want a more uneven split, e.g. 30% tracked / 70% not tracked, the formula becomes a bit more complex, but that's a different topic for discussion (also, there are a lot of papers, documents, and wikis out there, published by people a lot smarter than me, that explain it a lot better than I can!).
Then, if I incremented the "yes" column, I set the "track" cookie to "yes". Otherwise I set the "track" cookie to "no".
Then in my controller (or bootstrap, router, or whatever all requests go through), I look for the cookie called "track" and check whether its value is "yes" or "no". If "yes", I output the tracking script. If "no", I do not.
So in summary, the process would be (a rough sketch follows the note below):
Request is made.
Look for the cookie.
If the cookie is not set, update the database/flat file, incrementing either yes or no.
Set the cookie to yes or no.
If the cookie is set to yes, output tracking.
If the cookie is set to no, don't output tracking.
Note: Depending on your server's language/technology, the cookie won't actually be readable until the next request, so you may need to add logic that first looks at the value returned from the db/flat-file update and only falls back to the cookie value in the last two steps.
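Pulling the steps above together, here is a rough Node/Express sketch (the flat-file layout, routes, cookie lifetime, and DTM URL placeholder are all assumptions for illustration; a real implementation would want atomic increments in a database rather than a JSON file):

    const express = require('express');
    const cookieParser = require('cookie-parser');
    const fs = require('fs');

    const COUNTS_FILE = './sample-counts.json'; // the two "columns" as a JSON flat file
    const TRACK_PERCENT = 50;                   // share of NEW visitors that get tracked

    function loadCounts() {
      try { return JSON.parse(fs.readFileSync(COUNTS_FILE, 'utf8')); }
      catch (e) { return { yes: 0, no: 0 }; }
    }

    // Decide for a brand-new visitor, keeping the running split close to TRACK_PERCENT.
    function decideForNewVisitor() {
      const counts = loadCounts();
      const total = counts.yes + counts.no;
      const trackedPercent = total === 0 ? 0 : (counts.yes / total) * 100;
      const decision = trackedPercent < TRACK_PERCENT ? 'yes' : 'no';
      counts[decision] += 1;
      fs.writeFileSync(COUNTS_FILE, JSON.stringify(counts));
      return decision;
    }

    const app = express();
    app.use(cookieParser());

    app.get('*', (req, res) => {
      // Steps 1-2: look for the cookie; if it's missing, this is a new visitor.
      let track = req.cookies.track;
      if (track !== 'yes' && track !== 'no') {
        // Steps 3-4: update the flat file and set the cookie. Using the returned
        // value directly also covers the "cookie not readable until next request" note.
        track = decideForNewVisitor();
        res.cookie('track', track, { maxAge: 365 * 24 * 60 * 60 * 1000 });
      }
      // Steps 5-6: only the "yes" group gets the tag (placeholder DTM URL).
      const tag = track === 'yes'
        ? '<script src="//assets.adobedtm.com/YOUR-DTM-ID.js"></script>'
        : '';
      res.send('<!doctype html><html><head>' + tag + '</head><body>Page content</body></html>');
    });

    app.listen(3000);

Because the decision logic compares the current tracked percentage against the target, the same sketch also works for uneven splits like 30/70 by changing TRACK_PERCENT.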
Another (more general) note: in general, you should beware of sampling. It is true that some tracking tools (most notably Google Analytics) sample data. But the thing is, they initially record all of the data, and then use complex algorithms to sample from there, including excluding/exempting certain key metrics (like purchases, goals, etc.) from sampling.
Just think about that for a minute. Even if you take the time to set up a proper "sampler" as described above, you are basically throwing out data that proves people are doing key things on your site - the important things that help you decide how to give visitors a better experience. The only way around that is to start recording everything internally anyway and then factor those things into whether or not to send the data to AA.
But all that aside: look, I will agree that hits are something to be concerned about on some level. I've worked with very, very large clients with effectively unlimited budgets, and even they worry about hit costs racking up.
But the bottom line is that you are paying for an enterprise-level tool. If you are concerned about the cost of Adobe Analytics given your site traffic, maybe you should consider moving away from Adobe Analytics towards a different tool like GA, or some other tool that doesn't charge per hit. Adobe Analytics is an enterprise-level tool that offers a lot more than most other tools, and it is priced accordingly. No offense, but IMO that's like leasing a Mercedes and then cheaping out on the quality of gasoline you use.
Can anybody give me an idea of what kind of traffic / sample size I need to get a statistically significant result when running a Google Content Experiment with 2 variations?
Google uses multi-armed bandit testing.
Here is a good article on this: Google's answer.
The best way in practice is to watch the percentage in the Google Analytics Experiments tab and see how quickly it moves toward 95%.
You can't get an exact answer, because it changes as you take measurements and depends on the size of the difference you are trying to measure. If one variation performs 300% better than the other, it will take a much smaller sample size than if one variation performs only 10% better.
To see how the math for straight-up statistical significance works, here is a good explanation: Statistical significance tutorial.
And here is a page with a calculator: Calculator.
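If you want a rough feel for the numbers behind those calculators, the classic two-proportion sample-size formula can be sketched like this (95% confidence and 80% power assumed; this approximates the fixed-sample math, not the multi-armed bandit itself):

    // Approximate visitors needed PER VARIATION to detect the difference between a
    // baseline conversion rate p1 and a variation rate p2, at 95% confidence
    // (z = 1.96) and 80% power (z = 0.84) - the standard two-proportion formula.
    function sampleSizePerVariation(p1, p2) {
      const zAlpha = 1.96, zBeta = 0.84;
      const variance = p1 * (1 - p1) + p2 * (1 - p2);
      return Math.ceil(Math.pow(zAlpha + zBeta, 2) * variance / Math.pow(p1 - p2, 2));
    }

    // A 10% relative lift on a 5% baseline needs a lot of traffic...
    console.log(sampleSizePerVariation(0.05, 0.055)); // ~31,200 per variation
    // ...while a 300% lift on the same baseline needs very little.
    console.log(sampleSizePerVariation(0.05, 0.20));  // ~73 per variation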
As for the math behind the multi-armed bandit, this quote by Peter Whittle sums it up:
[The bandit problem] was formulated during the [second world] war, and efforts to solve it so sapped the energies and minds of Allied analysts that the suggestion was made that the problem be dropped over Germany, as the ultimate instrument of intellectual sabotage.
I have a website which gets around 500-600 visits per week. I would like to increase the sampling rate to 100% in the Google Analytics tracking code.
I would like to know whether this has any effect on the performance of my website.
Contrary to the comment above, Google has an option to set the sample rate:
https://developers.google.com/analytics/devguides/collection/gajs/methods/gaJSApiBasicConfiguration#_gat.GA_Tracker_._setSampleRate
This will not affect your visitors (since the pixel is sent anyway; sampling happens during data processing).
I don't think it will make any difference if you have only a few hundred visitors, though (I'm guessing Google is smart enough not to sample unless necessary, and in any case you would use setSampleRate to make your sample smaller, not bigger).
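For the classic ga.js snippet that the linked page documents, the call looks roughly like this (the property ID is a placeholder; per those docs the rate is specified as a percentage):

    var _gaq = _gaq || [];
    _gaq.push(['_setAccount', 'UA-XXXXX-Y']);   // placeholder property ID
    // Report on 80% of visitors; '100' would disable sampling entirely.
    _gaq.push(['_setSampleRate', '80']);
    _gaq.push(['_trackPageview']);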
2016 answer
According to the official documentation, the default site speed sampling rate is 1%. If your site has little traffic, you can increase this rate with setSiteSpeedSampleRate().
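With analytics.js, the equivalent knob is the siteSpeedSampleRate field set at create time, for example (placeholder property ID; 100 is the maximum):

    // Collect site speed timings from (up to) 100% of visitors instead of the 1% default.
    ga('create', 'UA-XXXXX-Y', 'auto', { siteSpeedSampleRate: 100 });
    ga('send', 'pageview');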
Is there a proper way, equation, or technique in general to say, "My web application needs to support N total users, which via this equation/technique/rockHardExperience tells me that I need to support X concurrent page requests"?
From my research and/or gut feeling it seems like it would be something like:
totalLoadCapabilityRequired = (totalUsersN * 0.10) * 0.5
where 0.10 assumes roughly 10% of users are online at any given time,
and the 0.5 assumes a 50% chance that those online users execute a request at roughly the same time.
Any insights would help me make sure the support I implement in my application is on par with the demand. I expect a lot of users but don't want to over-anticipate too early. I know for starters that the org I am programming for has 45,000 users that they want to use my system, with anticipation of many more if it succeeds.
Here are a couple of things to think about:
What's the time span in which you expect the bulk of your visits? If it's an office application within a single physical company, your capacity planning should be based on an 8-hour period. If most visits will come from the same continent, you can plan for a 12-hour period instead, etc. Base your visitor spread on that.
Which pages do you anticipate will be the most popular, and how heavy are those pages (i.e. how many of them can you serve in one second)? Get an understanding of which parts would benefit from caching to squeeze out more performance.
Don't plan based on peak load; design your app to scale and start small.
Design your app so that you can take runtime snapshots of every 500th request; you can use tools like xhprof to create profile files that you can run through cachegrind-style tools to analyze the performance as it runs.
In short, there's no catch-all formula :) For a ballpark figure your formula will probably be good enough, but take the above points into consideration.
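For what it's worth, here is the asker's ballpark formula plugged in with the 45,000-user figure, plus a crude requests-per-second estimate over an 8-hour window (every percentage and per-visit figure below is an assumption to replace with real measurements):

    // The asker's ballpark: (users online at once) x (chance they hit the server together).
    const totalUsers = 45000;
    const onlineShare = 0.10;   // assume ~10% of users online at any given time
    const simultaneity = 0.50;  // assume 50% of those request at roughly the same moment
    const concurrentRequests = totalUsers * onlineShare * simultaneity;
    console.log(concurrentRequests); // 2250 concurrent page requests

    // Another angle: spread the expected page views over the busy window.
    const dailyActiveUsers = totalUsers * 0.30;  // assume 30% of users show up on a given day
    const pageViewsPerVisit = 10;                // assumption
    const busyWindowSeconds = 8 * 60 * 60;       // office-hours traffic, per the first point above
    const avgRequestsPerSecond = (dailyActiveUsers * pageViewsPerVisit) / busyWindowSeconds;
    const peakRequestsPerSecond = avgRequestsPerSecond * 5; // crude 5x peak-to-average factor
    console.log(avgRequestsPerSecond.toFixed(1));  // ~4.7 requests/second on average
    console.log(peakRequestsPerSecond.toFixed(1)); // ~23.4 requests/second at peak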