Website Layout Statistics - web-statistics

I have a client who has suggested laying out a long list of categories in a custom order. The order is to be decided by them based on product items they sell the most etc.
I tend to disagree and feel that people browsing the internet prefer to search lists of categories that are in alphabetical order or sorted by something they can take reference of such as a date.
I would like to know others thoughts on this and it would be appreciated if anyone could point me in the direction of any open source surveys that have been taken in this area.
Thanks
Ben

What a silly stance to take regarding a simple customer request. Allow for both orderings, and other ones too. There is no survey that will demonstrate that the client is wrong as they are - by definition - correct.
Code that allows for different orderings has greater utility anyway, and real user data will be able to show them which - if either - should be the default.

Related

How to only display traffic from particular sources in a Google Analytics dashboard widget

I can't believe this is as difficult as I'm finding it to be, so I must be missing something obvious!
I want to track data from two particular ads, one on TheSixFifty.com and one in the mountain view voice email newsletter. I've gotten as far as identifying these two sources in a table:
https://imgur.com/a/ljbeonT
I want to only display those two sources, so I thought a filter would be the way to do that, set up like so:
https://imgur.com/p7lBxnk
But that results in this sad, sad empty table:
https://imgur.com/hOzdOdu
Please tell me what I am doing wrong! Does "containing" not mean what I think it does? Help!
You're right - it is something simple!
Your filter contains an AND statement, so it will only show data where the source contains BOTH mv-voice.com and TheSixFifty.com.
Your filter should look like:
Only show Source Matching RegEx:
(TheSixFifty|mv-voice)\.com
Here's a great intro to Regular Expressions from Robbin Steif's guide, they'll be incredibly useful for any analysis.

R: Clustering customers based on similar product interests for an event

I have a dataset with a list of customers and their product preferences. Basically, it is a simple CSV with a column called "CUSTOMER" and 5 other columns called "PRODUCT_WANTED_A", "PRODUCT_WANTED_B" and so on.
I asked these customers if they were interested to know more about a particular product, and answers could be simply YES or NO (1 or 0 in the dataset). The dataset can be downloaded here. Obviously, there will be customers with many different interests, based on the mix of their YES or NO in these 5 columns.
My goal is to understand which customers are similar to others in such interests. This will help me manage an agenda of product presentations and, in each meeting, I would like to understand the best grouping for it. I started with a hierarchical plot like this:
customer_list <- read.csv("customers_products_wanted.csv", sep=",", header = TRUE)
customer.hclust <- hclust(dist(customers_list))
plot(customer.hclust, customer_list$CUSTOMER)
library(rect.hclust)
rect.clust(customer.hplot,5)
This is the plot I got, asking for 5 clusters:
Tried the same, but with 10 clusters:
Question 1: I know it's always hard to tell, but looking at the charts and dataset, what would be your 'cut' to group customers? 5? 10?
I was reviewing the results, and in the same group, I had CUSTOMER112 with 1,0,1,0,1 as their preferences together with CUSTOMER 110 (1,1,1,1,1), CUSTOMER106 (1,1,1,1,0) and so on. The "distance" can be right, but in a given group I have customers with some relevant differences in their preferences.
Question 2: I don't know if it's a case of total ignorance about clustering, the code I used or even the dataset. Based on your experience, what would be your approach for the best clustering in this case?
Any comments will be highly appreciated. As you see, I did some efforts, but still in doubt.
Thanks a lot!
Ricardo
All answers were important, but #Ben video recommendation and #Samuel Tan advice on breaking the customers into grids, I found a good way to handle it.
The video gave me a lot of insights about "noisy" variables in hierarchical clustering, and the grid recommendation helped me think on what the data is really trying to tell me.
That said, a basic data cleaning process eliminated all customers with no interests in any products (this is obvious, but I didn't pay attention to it at first). Then, I ignored customers with a specific interest (single product). It was done because these customers wouldn't need to attend the workshop series I'm planning (they just want to listen about one product).
Evaluating all the others, interested in more than one product, I realized the product mix could point me to a better classification. From there, I grouped customers into 3 clusters: integration opportunities (2 or 3 products), convergence opportunities (4 products) and transformation opportunities (all products).
Now it's clear to me which customers I should focus on for my workshops, and plan my post-workshop sales campaigns leveraging materials that target each customer group (integration, convergence, transformation).
Thanks for all the advices!
Ricardo

Find out how many different 'brands' (custom dim) a user looks at on average?

Our website has many related products from a large number of different brands
I'd like to try and use Google Analytics data to find out, on average, how many different brands - which I have set as a hit level custom dimension - a user will look at over a given time period
I'm not sure if this is possible, but it would be really cool to know!
Helps with understanding brand loyalty/defection
I started by creating a simple custom report to see how many manufacturers get seen by how many users, but I don't know where to go next with this question in order to answer it!

Scrape all google search result for a specific name

I think the question has been answered here before,but i could not find the desired topic.I am a newbie in web scraping.I have to develop a script that will take all the google search result for a specific name.Then it will grab the related data against that name and if there is found more than one,the data will be grouped according to their names.
All I know is that,google has some kind of restriction on scraping.They provide a custom search api.I still did not use that api,but hoping to get all the resulted links corresponding to a query from that api. But, could not understand what will be the ideal process to do the scraping of the information from that links.Any tutorial link or suggestion is very much appreciated.
You should have provided a bit more what you have been doing, it does not sound like you even tried to solve it yourself.
Anyway, if you are still on it:
You can scrape Google through two ways, one is allowed one is not allowed.
a) Use their API, you can get around 2k results a day.
You can up it to around 3k a day for 2000 USD/year. You can up it more by getting in contact with them directly.
You will not be able to get accurate ranking positions from this method, if you only need a lower number of requests and are mainly interested in getting some websites according to a keyword that's the choice.
Starting point would be here: https://code.google.com/apis/console/
b) You can scrape the real search results
That's the only way to get the true ranking positions, for SEO purposes or to track website positions. Also it allows to get a large amount of results, if done right.
You can Google for code, the most advanced free (PHP) code I know is at http://scraping.compunect.com
However, there are other projects and code snippets.
You can start off at 300-500 requests per day and this can be multiplied by multiple IPs. Look at the linked article if you want to go that route, it explains it in more details and is quite accurate.
That said, if you choose route b) you break Googles terms, so either do not accept them or make sure you are not detected. If Google detects you, your script will be banned by IP/captcha. Not getting detected should be a priority.

How to mitigate against bandwagon effect (voting behavior) in my ranking system?

What I mean by bandwagon effect describes itself like so:
Already top-ranked items have a higher tendency to get voted on at all, possibly even to get upvoted.
What I am hoping to get is some concrete recommendations, at best based on your practical experience with a mathematical formula and in which situation it helped.
However, any useful pointers are more than welcome!
My ranking system
Please consider a ranking system at a website that has a reputation system and where users cast only upvotes on items and the ranking table is reset to start fresh every month.
Every user has one upvote per item within each month, and there is a reward for users who, within a certain month, upvoted an item that made it into the top ranks at the end of that month.
Users are told the following about what increases the weight of their upvote:
1)... the more reputation you have at the time of upvoting
2)... the fewer items you upvote within the current month (including the current upvote)
3)... the fewer upvotes that item already has within the current month before your own upvote
The ranking table is recalculated once a day and is visible to all.
Goal
I'd like to implement part 3) in an effort to correct the ranks of items where one cannot tell if some users just upvoted it because of the bandwagon effect (those users might hope to gain a "tactical" advantage simply by voting what they perceive lots of other users already upvoted)
Also, I hope to mitigate this way against the possible use of sock puppets that managed to attain some reputation, but upvote the same item or group of items.
Question
Is there a (maybe even tested?) mathematical formula that I could just apply on the time-ordered list of upvotes for each item to get a coffecient for each of those upvotes so that their weights will be corrected in a sensible fashion?
I'm thinking it's got to be something of a lograthmic function but I can't quite get a grip on it...
Thank you!
Edit
Zack says: "beyond a certain level of popularity, additional upvotes decrease the probability that something will be displayed"
To further clarify: what I am after is which actual mathematical approaches are worth trying out that will, in the form of a mathematical function, translate this descrease in pop (i.e., apply coefficients to the weights, see above) in sensible, balanced manner.
My hope is someone has practical experience with such approaches in a simmilar or general situation to the one above.
Consider applying the "Indie Rock Peter Principle": beyond a certain level of popularity, additional upvotes decrease the probability that something will be displayed.
Term coined by Leonard Richardson in this paper. Indie Rock Peter is of course from Diesel Sweeties.
I have always disliked the bandwagon effect in voting systems, especially "most viewed" rankings in which simply clicking on a highly ranked item increases its rank. My solution to this problem, which I have never tested or seen implemented, would be to keep track of how an item was reached (and then voted for), and ignore (or greatly decrease the weight of) votes that came from any sorted-by-ranking page.

Resources