I am following the paper found here and am trying to do Batch Gradient Descent (BGD) instead of the Stochastic Gradient Descent (SGD) as described in the paper.
For SGD what I gather is you do this (pseudocode):
for each user's actual rating {
    1. calculate the difference between the actual rating and the rating
       predicted by the dot product of the two factor vectors
       (user vector . item vector).
    2. multiply the answer from 1. by the item vector corresponding to that rating.
    3. alter the user vector by the figure calculated in 2., multiplied by lambda, e.g.:
       userVector = userVector + lambda x (answer from 2.)
}
Repeat for every user.
Do the same for every item, except in 2. multiply by the user vector instead of the item vector.
Go back to the start and repeat until some stopping point.
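In code, I understand that per-rating update to be roughly the following (an R sketch with made-up variable names, not taken from the paper):

# One SGD step for a single observed rating (sketch; userVector, itemVector,
# actualRating and lambda are hypothetical names for this illustration)
predicted <- sum(userVector * itemVector)   # dot product of the two factor vectors
error     <- actualRating - predicted       # step 1: prediction error
userStep  <- lambda * error * itemVector    # step 2: scale the item vector by the error
itemStep  <- lambda * error * userVector    # same idea for the item side
userVector <- userVector + userStep         # step 3: move the user vector
itemVector <- itemVector + itemStep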
For BGD what I did was:
for each user {
    1. sum up all their prediction errors, i.e. the sum over their ratings of
       (real rating - (user vector . item vector)) x item vector
    2. alter the user vector by the figure calculated in 1., multiplied by lambda.
}
Then repeat for the items, exchanging the item vector in 1. for the user vector.
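As an R sketch of what I mean for a single user (again with made-up names; userRatings holds only this user's observed ratings and itemVectors the matching item factor vectors):

# Per-user batch update (sketch): accumulate the gradient over this user's
# ratings, then apply one update to the user vector
gradient <- rep(0, length(userVector))
for (i in seq_along(userRatings)) {
  error    <- userRatings[i] - sum(userVector * itemVectors[[i]])  # prediction error
  gradient <- gradient + error * itemVectors[[i]]                  # step 1: sum the errors
}
userVector <- userVector + lambda * gradient                       # step 2: single update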
This seems to make sense, but on further reading I have become confused about BGD. It says that BGD must iterate through the entire dataset just to make one change. Does this mean, as in what I have done, the entire dataset relative to that particular user, or does it literally mean the entire dataset?
I made an implementation that goes through the entire dataset, summing every single prediction error and then using that figure to update every single user vector (so all user vectors are updated by the same amount!). However, it does not approach a minimum and fluctuates rapidly, even with a lambda (learning rate) of 0.002. It can go from an average error of 12,500 to 1.2, then to -539, etc. Eventually the number approaches infinity and my program fails.
Any help on the mathematics behind this would be great.
I connect Tableau to R and execute an R function for recommending products. When R finishes, the return value is a single string containing all product details, like below:
ID|Existing_Prod|Recommended_Prod\nC001|NA|PROD008\nC002|PROD003|NA\nF003|NA|PROD_ABC\nF004|NA|PROD_ABC1\nC005|PROD_ABC2|NA\nC005|PRODABC3|PRODABC4
(Each line separated by \n indicating end of line)
On Tableau, I display the calculated field which is as below:
ID|Existing_Prod|Recommended_Prod
C001|NA|PROD008
C002|PROD003|NA
F003|NA|PROD_ABC
F004|NA|PROD_ABC1
C005|PROD_ABC2|NA
C005|PRODABC3|PRODABC4
The above data reaches Tableau through a calculated field as a single string, which I need to split into three columns on the pipe character ('|').
I used Split function on the calculated field :
SPLIT([R_Calculated_Field],'|',1)
SPLIT([R_Calculated_Field],'|',2)
SPLIT([R_Calculated_Field],'|',3)
But the error says "SPLIT function cannot be applied on Table calculations", which is self-explanatory. Are there any alternatives to solve this? I googled for best practices for handling the integration between R and Tableau, and all I could find were simple k-means clustering examples.
Make sure you understand how partitioning and addressing work for table calcs. Table calcs pass vectors of arguments to the R script and receive a single vector in response. The cardinality of those vectors depends on the partitioning of the table calc. You can view that by editing the table calc and choosing Specific Dimensions: the fields that are not checked determine the partitioning, and thus the cardinality of the arguments you send to and receive from R.
This means it might be tricky to map your problem onto this infrastructure, though not necessarily impossible. It was designed to send a series of vector arguments with one cell per partitioning dimension, say Manufacturer, and get back one vector with one result per Manufacturer (or whatever combination of fields partition your data for the table calc). It sounds like you are expecting an arbitrary-length list of recommendations. It shouldn't be too hard to have your R script turn the string into a vector before returning, but the size of the vector has to make sense.
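For example, here is a minimal sketch in plain R (outside Tableau, using the string format from the question) of turning that single returned string into one element per line, which is the shape this infrastructure expects:

# Split the single returned string into one element per line,
# then each line into its three pipe-separated fields
result <- "ID|Existing_Prod|Recommended_Prod\nC001|NA|PROD008\nC002|PROD003|NA"

rows   <- strsplit(result, "\n", fixed = TRUE)[[1]]          # one element per line
fields <- do.call(rbind, strsplit(rows, "|", fixed = TRUE))  # 3 columns per row
colnames(fields) <- fields[1, ]                              # first line is the header
fields <- fields[-1, , drop = FALSE]
fields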
As an example of an approach that fits this model more easily, say you had a Tableau view that had one row per Product (and you had N products) - and some other aggregated measure fields in the view per Product. (In Tableau speak, the view’s level of detail is at the Product level.)
It would be straightforward to pass those measures as a series of argument vectors to R - each vector having N values, and then have R return a vector of reals of length N where the value returned at each location was a recommender score for the product at that position. (Which is why the ordering aka addressing of the vectors also matters)
Then you could filter out low scoring products from the view and visually distinguish highly recommended products.
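A hypothetical calculated field along those lines might look like the following sketch ([Sales] and [Rating] are made-up measures, and the scoring formula is only a placeholder):

// Sketch of a SCRIPT_REAL table calc: each argument arrives in R as a vector
// with one value per Product in the partition, and the R expression returns
// a vector of the same length (one score per Product)
SCRIPT_REAL("
    scores <- .arg1 / max(.arg1) + .arg2 / 5   # toy recommender score
    scores
", SUM([Sales]), AVG([Rating]))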
So the first step to understanding R integration is to understand how table calcs operate with partitioning and addressing and to think in terms of vectors of fixed lengths passed in both directions.
If this model doesn’t support your use case well, you might be able to do something useful with URL actions or the JavaScript API.
I tried playing around with this example in the Julia documentation. My attempt was to make the cell split into two parts that each have half of the protein, so I set Theta = 0.5. However, the plot looks like this:
Since the cells are all equal, they should hit the target amount of protein at the same time, so the number of cells should double at every division. How could I plot this? I also don't understand why the number of cells stops at 3 in the case below.
Plot the protein amount in each cell and think about the model you've created. After the first division, both cells have the same value, so at exactly the same time you have an event fire. The "maximum" (whichever index is lower, so cell 1) will split, while cell 2 keeps growing above 1. But now that u[2] > 1, the rootfinding condition 1 - maximum(u) will never hit zero again, and thus no more splits will occur. This means you'll have two splits total, i.e. 3 cells.
Remember, programs will do exactly what you tell them to. I assume that what you meant was, as your effect, split any cells that are greater than or equal to 1. If that's the affect! that you wanted, then you'd have to write it:
function affect!(integrator)
    u = integrator.u
    # find every cell at (or numerically at) the splitting threshold
    idxs = findall(x -> x >= 1 - eps(eltype(u)), u)
    # add one new cell for each cell that splits
    resize!(integrator, length(u) + length(idxs))
    # the splitting cells keep half of their protein ...
    u[idxs] ./= 2
    # ... and the newly appended cells receive the other half
    u[end-length(idxs)+1:end] .= 0.5
    nothing
end
would be one way to do it, and of course there are many others.
I have a data frame with 30k rows and 10 features, and I would like to calculate a distance matrix like this:
gower_dist <- daisy(data_frame, metric = "gower")
This function returns the whole dissimilarity matrix, but I only want the first row
(just the distances of the first element in the data frame to every other element). How can I do that?
You probably need to get the source and extend it.
I suggest you extend the API by adding a second parameter y that defaults to x. Then the method should return the pairwise distances of each element in x to each element in y.
Fortunately, R is GPL open source, so this is easy.
This would likely be a welcome extension; you should submit it to the package authors for inclusion.
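In the meantime, a rough workaround is to compute just the first row yourself rather than through daisy(). This is only a sketch of the Gower formula for numeric and factor/character columns (it ignores daisy()'s weighting and NA handling), with data_frame standing in for your data:

# Gower distance of row 1 to every row, computed column by column
gower_to_first <- function(df) {
  x <- df[1, ]
  per_col <- sapply(names(df), function(col) {
    v <- df[[col]]
    if (is.numeric(v)) {
      rng <- diff(range(v, na.rm = TRUE))               # numeric: range-scaled absolute difference
      if (rng == 0) rep(0, nrow(df)) else abs(v - x[[col]]) / rng
    } else {
      as.numeric(v != x[[col]])                         # factor/character: simple matching
    }
  })
  rowMeans(per_col)                                     # average over the features
}
# d1 <- gower_to_first(data_frame)   # length-30k vector of distances from row 1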
I apologize ahead of time for the crude way this question is worded. For the longest time I was under the impression that what I'm trying to do is called "normalizing data", but after googling for the method I seem to be mistaken, so I'm not sure exactly what it is called that I'm trying to do (bear with me, please).
I have a set of data like this:
0.17407
0.05013
0.08520
0.02892
0.02986
0.06286
0.04453
0.00425
0.20470
0.02267
0.01470
0.02460
0.01735
0.01069
0.02168
0.13912
0.02004
0.02018
0.07837
When you add them all you get 1.05392.
I'd like to "adjust" the data set so that the relative values all remain the same but the sum is equal to 1. When I googled normalizing data sets, I found a formula like this:
(x-min(x))/(max(x)-min(x))
However, this simply rescales each data point relative to the range of the data, so that the maximum value in the data set becomes 1 and the minimum becomes 0.
Extra: Could someone enlighten me as to what this is called, if not normalizing data? Obviously I've been carrying around this ignorant belief for far too long.
If you want your data to sum to 1, you normalize your data. You normalize by dividing by the sum of your series (sum_i x_i, where the x_i are the elements of your data series).
The formula you mention is another possible rescaling, but as you observed it has a different effect. Note that in the first case you map x -> c*x (in your case, x -> (1/1.05392)*x), while the second case rescales with x -> c*x + offset. Note also that the latter is not linear (unless min(x) = 0), that is, f(x+y) != f(x) + f(y).
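A quick illustration in R with the numbers from the question (just a sketch of the two rescalings):

x <- c(0.17407, 0.05013, 0.08520, 0.02892, 0.02986, 0.06286, 0.04453,
       0.00425, 0.20470, 0.02267, 0.01470, 0.02460, 0.01735, 0.01069,
       0.02168, 0.13912, 0.02004, 0.02018, 0.07837)
sum(x)                                          # 1.05392

x_norm <- x / sum(x)                            # divide by the sum: relative values unchanged
sum(x_norm)                                     # 1

x_minmax <- (x - min(x)) / (max(x) - min(x))    # min-max rescaling: max -> 1, min -> 0
sum(x_minmax)                                   # generally not 1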
If your whole confusion is about the naming of things, then I would not worry too much. After all, there is only convention and common agreement, but no absolute truth/authority. And the terms are reused in different fields; cf. Normalization on Wikipedia:
Normalization or normalisation refers to a process that makes something more normal or regular
So I am taking a course that requires learning R and I am struggling with one of the questions:
In this question, you will practice calling one function from within another function. We will estimate the probability of rolling two sixes by simulating dice throws. (The correct probability to four decimal places is 0.0278, or 1 in 36).
(1) Create a function roll.dice() that takes a number ndice and returns the result of rolling ndice number of dice. These are six-sided dice that can return numbers between 1 and 6. For example roll.dice(ndice=2) might return 4 6. Use the sample() function, paying attention to the replace option.
(2) Now create a function prob.sixes() with parameter nsamples, that first sets j equal to 0, and then calls roll.dice() multiple times (nsample number of times). Every time that roll.dice() returns two sixes, add one to j. Then return the probability of throwing two sixes, which is j divided by nsamples.
I am fine with part one, or at least I think so, so this is what I have
roll.dice <- function(ndice) {
  roll <- sample(1:6, ndice, replace = TRUE)
  return(roll)
}
roll.dice(ndice = 2)
but I am struggling with part two. This is what I have so far:
prob.sixes<-function(nsamples) {
j<-vector
j<-0
roll.dice(nsamples)
if (roll.dice==6) {
j<-j+1
return(j)
}
}
prob.sixes(nsamples=3)
Sorry for all the text, but can anybody help me?
Your code has a couple of problems that I can see. The first one is the interpretation of the question. The question says:
Now create a function prob.sixes() with parameter nsamples, that first sets j equal to 0, and then calls roll.dice() multiple times (nsample number of times).
Check your code: are you doing this, or are you calling roll.dice() a single time? Look for ways to do the same thing (in your case, roll.dice) several times; you may want to consider a for loop. Also, you need to store the result of this function in a variable, something like
rolled = roll.dice(2)
Second problem:
Every time that roll.dice() returns two sixes, add one to j.
You are checking if roll.dice==6. But this has two problems. First, roll.dice is a function, not a variable, so it will never be equal to 6. Also, you don't want to check whether this variable is equal to six; you should ask whether it is equal to a pair of sixes. How can you write "a pair of sixes"?
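Putting those hints together, one possible sketch of prob.sixes() (certainly not the only way to write it) would be:

prob.sixes <- function(nsamples) {
  j <- 0
  for (i in 1:nsamples) {            # call roll.dice() nsamples times
    rolled <- roll.dice(ndice = 2)   # store the result of each call
    if (all(rolled == 6)) {          # "a pair of sixes": both dice are 6
      j <- j + 1
    }
  }
  j / nsamples                       # estimated probability of two sixes
}

prob.sixes(nsamples = 10000)         # should come out near 0.0278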