I am trying to create a matrix of pairwise values: first I do the calculations in a matrix, then I melt it and join in some extra information. I would also like to show that extra information along the columns and rows, so that converting back to a matrix (or a data frame, or whatever is possible) gives something like the following:
X.col 1 2 3 4
Y.col 1 2 3 4
Z.col 1 2 3 4
Col 1 2 3 4
X.row Y.row Z.row Row
1 1 1 1 1 0 0 1
2 2 2 2 0 1 0 0
3 3 3 3 0 1 1 0
4 4 4 4 0 1 0 1
or perhaps without the names, like this:
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
1 1 1 1 1 0 0 1
2 2 2 2 0 1 0 0
3 3 3 3 0 1 1 0
4 4 4 4 0 1 0 1
Basically, x, y and z contain extra information on some products whose IDs are stored in row and col. I am doing pairwise comparisons, which I then present in a matrix for our managers, who would also like to see that extra information alongside the matrix, as shown above.
So for the data, let a df contain the melted matrix joined with the extra information:
row = c(rep(1,4), rep(2,4), rep(3,4), rep(4,4)) #e.g. product id
col = rep(c(1,2,3,4), 4) #e.g. product id
value = c(1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1) #pairwise index value, calculated from comparing product in row with product in col
x.row = c(rep(1,4), rep(2,4), rep(3,4), rep(4,4)) #some x information on row id
y.row = c(rep(1,4), rep(2,4), rep(3,4), rep(4,4)) #some y information on row id
z.row = c(rep(1,4), rep(2,4), rep(3,4), rep(4,4)) #some z information on row id
x.col = rep(c(1,2,3,4), 4) #some x information on col id
y.col = rep(c(1,2,3,4), 4) #some y information on col id
z.col = rep(c(1,2,3,4), 4) #some z information on col id
df <- data.frame(row, col, value, x.row, y.row, z.row, x.col, y.col, z.col)
The question is then: how to accomplish that matrix visual as shown above, or something like it in R?
This is fairly easy to do in Excel since it is cell-based, but I'm more interested in a solution in R (if possible). So I'm looking for inspiration on how to go about it, or maybe even a specific solution. I have been wondering whether it is possible with the openxlsx package, manipulating an Excel sheet through R. Or maybe by using lists and storing them in the data frame... Or with heatmaply (which has an option for e.g. a dendrogram above a heatmap).
I must admit, however, I'm stuck. I can't get my head around it... So I guess I'm looking for your expertise :)
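One possible starting point, as a minimal sketch in base R rather than a definitive solution: cast the melted values back to wide form, then bind the row and column attributes around them as extra rows and columns. The names m, row_info, col_info, corner and annotated are made up for this illustration, and the data frame is rebuilt in a shorter form with the same contents as above.

```r
# Same data as in the question, built more compactly
df <- data.frame(
  row   = rep(1:4, each = 4),
  col   = rep(1:4, times = 4),
  value = c(1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1)
)
df$x.row <- df$y.row <- df$z.row <- df$row
df$x.col <- df$y.col <- df$z.col <- df$col

# Pairwise values back in wide (matrix) form
m <- with(df, tapply(value, list(row, col), sum))

# One line of attributes per row product and per column product
row_info <- unique(df[order(df$row), c("x.row", "y.row", "z.row", "row")])
col_info <- unique(df[order(df$col), c("x.col", "y.col", "z.col", "col")])

# Empty top-left corner, column attributes on top, row attributes on the left
corner    <- matrix(NA, nrow = ncol(col_info), ncol = ncol(row_info))
top       <- cbind(corner, t(as.matrix(col_info)))
bottom    <- cbind(as.matrix(row_info), m)
annotated <- unname(rbind(top, bottom))
```

The result is a plain 8 x 8 numeric matrix with NA in the corner, which could then be written out with write.table(annotated, na = "") or passed to a spreadsheet writer.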
Related
I have a data frame that contains multiple values in each spot, like this:
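library(plyr)  # provides ddply
ID <- c(1,1,1,2,2,2,2,3,3,4,4,4,5,6,6)
W <- c(29,72,32,33,34,44,42,78,32,42,18,26,10,34,39)
df1 <- data.frame(ID, W)
# collapse all W values of each ID into one comma-separated string
df <- ddply(df1, .(ID), summarize,
            X = paste(unique(W), collapse = ","))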
ID X
1 1 29,72,32
2 2 33,34,44,42
3 3 78,32
4 4 42,18,26
5 5 10
6 6 34,39
I am trying to generate another column using an if-else function so that every ID that has an X value greater than 70 will show a 1, and all others will show a 0, like this:
ID X Y
1 1 29,72,32 1
2 2 33,34,44,42 0
3 3 78,32 1
4 4 42,18,26 0
5 5 10 0
6 6 34,39 0
This is the code that I tried:
df$Y <- ifelse(df$X>=70, 1, 0)
But it doesn't work; it only seems to put the first value of each spot through the function:
ID X Y
1 1 29,72,32 0
2 2 33,34,44,42 0
3 3 78,32 1
4 4 42,18,26 0
5 5 10 0
6 6 34,39 0
It worked fine on my one column that has only one value per spot. Is there a way to get to the if-else function to evaluate every value in each spot and assign a 1 if any of them fit the statement?
Thank you, I'm sorry that I do not know a lot of R vocabulary yet.
As 'X' is a string, we can split it at the "," to create a list of vectors, loop over the list with map, and check whether any of the values, converted to numeric, are greater than 70:
library(dplyr)
library(purrr)
df %>%
mutate(Y = map_int(strsplit(X, ","), ~ +(any(as.numeric(.x) > 70))))
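The same idea can be sketched in base R, without dplyr or purrr. The original ifelse call appears to fail because X is a string, so the comparison with 70 is done on text rather than on each number. The data frame below is typed in by hand to mirror the output shown in the question, and the name parts is only for illustration.

```r
df <- data.frame(
  ID = 1:6,
  X  = c("29,72,32", "33,34,44,42", "78,32", "42,18,26", "10", "34,39"),
  stringsAsFactors = FALSE
)

# split each string into its individual values
parts <- strsplit(df$X, ",", fixed = TRUE)

# 1 if any value in the cell exceeds 70, otherwise 0
df$Y <- as.integer(vapply(parts, function(p) any(as.numeric(p) > 70), logical(1)))
```

vapply is used instead of sapply so that the result is guaranteed to be one logical value per row.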
New to SO, but can't figure out how to get this code to work. I have a dataframe that is very large, and is set up like this:
Number Year Type Amount
1 1 A 5
1 2 A 2
1 3 A 7
1 4 A 1
1 1 B 5
1 2 B 11
1 3 B 0
1 4 B 2
This continues for multiple numbers. I want to take this dataframe and make a new dataframe that puts two of the rows side by side, nested so that each combination of years appears once within each type and number (for example, row 1 and row 2, row 1 and row 3, row 1 and row 4, row 2 and row 3, row 2 and row 4).
Example output:
Number Year Type Amount Number Year Type Amount
1 1 A 5 1 2 A 2
1 1 A 5 1 3 A 7
1 1 A 5 1 4 A 1
1 2 A 2 1 3 A 7
1 2 A 2 1 4 A 1
1 3 A 7 1 4 A 1
I thought that I would do a for loop to loop within number and type, but I do not know how to make the rows paste from there, or how to ensure that I am only getting the combinations of the rows once. For example:
for(i in 1:n_number){
for(j in 1:n_type){
....}}
Any tips would be appreciated! I am relatively new to coding, so I don't know if I should be using a for loop at all. Thank you!
df <- data.frame(Number= rep(1,8),
Year = rep(c(1:4),2),
Type = rep(c('A','B'),each=4),
Amount=c(5,2,7,1,5,11,0,2))
My interpretation is that you want to create a dataframe with all row combinations, where Number and Type are the same and Year is different.
First suggestion - join on Number and Type, then remove rows that have the same Year. I added an index to prevent redundant matches (1 with 2 and 2 with 1).
df$index <- 1:nrow(df)
out <- merge(df,df,by=c("Number","Type"))
# keep each pair once, with the earlier row on the left as in the example output
out <- out[which(out$index.x<out$index.y & out$Year.x!=out$Year.y),]
Second suggestion - if you want to see a version using a loop.
out2 <- NULL
for (i in 1:(nrow(df)-1)) {
  for (j in (i+1):nrow(df)) {
    # same Number and Type, different Year
    if (df[i,"Year"] != df[j,"Year"] && df[i,"Number"] == df[j,"Number"] && df[i,"Type"] == df[j,"Type"]) {
      out2 <- rbind(out2, cbind(df[i,], df[j,]))
    }
  }
}
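A third option, sketched here under the assumption that each Number/Type group has one row per Year: split the data by group and use combn to enumerate each pair of rows exactly once. The names groups, pairs and out3, and the ".2" suffix on the right-hand columns, are choices made for this illustration.

```r
df <- data.frame(Number = rep(1, 8),
                 Year   = rep(1:4, 2),
                 Type   = rep(c("A", "B"), each = 4),
                 Amount = c(5, 2, 7, 1, 5, 11, 0, 2))

# one data frame per Number/Type combination
groups <- split(df, list(df$Number, df$Type), drop = TRUE)

pairs <- lapply(groups, function(g) {
  if (nrow(g) < 2) return(NULL)   # nothing to pair up
  idx   <- combn(nrow(g), 2)      # each unordered pair of rows once
  left  <- g[idx[1, ], ]
  right <- g[idx[2, ], ]
  names(right) <- paste0(names(right), ".2")
  rownames(left) <- rownames(right) <- NULL
  cbind(left, right)
})

out3 <- do.call(rbind, Filter(Negate(is.null), pairs))
rownames(out3) <- NULL
```

Because combn only ever pairs a row with a later row of the same group, no index column or filtering step is needed afterwards.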
I illustrate my question with a small data frame such as:
X1 X2 X3
1 0 1 2
2 0 1 3
3 0 1 4
4 0 2 3
5 0 2 4
6 0 3 4
7 1 2 3
8 1 2 4
9 1 3 4
10 2 3 4
(The real one will have a huge number of rows...)
I have to expand each row of this data frame with 12 additional values, considering that the 3 values already present are the 3 starting terms of a series defined by the recurrence equation:
U(n) = U(n-1) - Min(U(n-2), U(n-3))
Consider for example the 1st row with 0, 1, 2. The next term (4th) has to be :
2 - Min(1, 0) = 2 - 0 = 2
etc. At the end, my first row will be :
0 1 2 2 1 -1 -2 -1 1 3 4 3 0 -3 -3
And I have to repeat this operation on each row of my initial data frame. Of course, I know I can use intricate "for {***}" loops to do this, but that is time-consuming.
Is there any way to build the final data frame column by column? (I mean not listing the rows but constructing at once entire columns based on the recurrence equation)
You do not have to write an 'intricate loop' and work row by row. You can write just one simple loop and calculate column by column such as:
# recreate the sample dataframe
data <- data.frame(X1 = c(rep(0, 6), 1,1,1,2),
                   X2 = c(1,1,1,2,2,3,2,2,3,3),
                   X3 = c(2,3,4,3,4,4,3,4,4,4))
# create placeholder dataframe full of zeros
temp <- data.frame(matrix(data = 0, nrow = nrow(data), ncol = 15))
# write the original dataframe into the placeholder dataframe
temp[, 1:3] <- data
# for columns 4 to 15 in the placeholder dataframe
for(i in 4:15)
{
  # calculate each column based on the given formula
  temp[, i] <- temp[, i-1] - pmin(temp[,i-2], temp[, i-3])
}
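If the real data has a huge number of rows, the same column-by-column loop can be run on a numeric matrix instead of a data frame, which usually avoids some overhead. This is only a sketch of that variant; the names start, n_terms and u are invented here, and only three of the sample rows are used.

```r
# three of the sample rows, as starting terms U(1), U(2), U(3)
start <- rbind(c(0, 1, 2),
               c(0, 1, 3),
               c(1, 2, 3))
n_terms <- 15

u <- matrix(0, nrow = nrow(start), ncol = n_terms)
u[, 1:3] <- start

# U(n) = U(n-1) - min(U(n-2), U(n-3)), computed for all rows at once
for (n in 4:n_terms) {
  u[, n] <- u[, n - 1] - pmin(u[, n - 2], u[, n - 3])
}

u[1, ]
# 0 1 2 2 1 -1 -2 -1 1 3 4 3 0 -3 -3
```

The first row reproduces the series given in the question, which is a quick check that the recurrence is indexed correctly.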
I am new to R. I have a data frame (usr.query) with the structure shown below.
[image: structure of usr.query]
Now I want to take the text of each id and compare it to the text of all the other ids, and if there is a match, I want to append it to a new column, say count of match.
A0008 with A0043,A0065,A0082,B0018,B0026
A0043 with A0008,A0065,A0082,B0018,B0026
Function to apply
count_match = length(intersect(unlist(strsplit(query1," ")),unlist(strsplit(query2," "))))
The query 1 here is text of A0008 and query 2 is text of A0043,A0065,A0082,B0018,B0026
I tried the suggested solution and here is the result.
No loops are necessary; you'll usually find that's the case in R, because it's really good at utilizing vectorized operations. In this case, you can get the necessary combinations with combn, and then make the match_count column by subsetting the original data.frame with the combinations of the new one, and testing for equality. Adding zero changes the values from Boolean to numeric (use as.integer, if you prefer).
# assemble sample data
df <- data.frame(id = 1:5, text = c('apple', 'mango', 'apple', 'apple', 'mango'))
# make combinations
df2 <- as.data.frame(t(combn(df$id, 2)))
# add names
names(df2) <- c('main_id', 'compared_to_id')
# test for match
df2$match_count <- (df[df2$main_id, 'text'] == df[df2$compared_to_id, 'text']) + 0
The result:
> df2
main_id compared_to_id match_count
1 1 2 0
2 1 3 1
3 1 4 1
4 1 5 0
5 2 3 0
6 2 4 0
7 2 5 1
8 3 4 1
9 3 5 0
10 4 5 0
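The example above tests whole strings for equality. To count shared words instead, as the asker's intersect-based function does, the same combn layout can be combined with mapply. This is a sketch only: the ids are taken from the question, but the text values are invented, and the names queries, words, pairs and res are hypothetical.

```r
# hypothetical sample data; the real usr.query is not shown in the question
queries <- data.frame(
  id   = c("A0008", "A0043", "A0065"),
  text = c("red apple pie", "green apple", "mango juice"),
  stringsAsFactors = FALSE
)

# split every text into words once, up front
words <- strsplit(queries$text, " ", fixed = TRUE)

# all unordered pairs of row positions
pairs <- t(combn(nrow(queries), 2))

res <- data.frame(
  main_id        = queries$id[pairs[, 1]],
  compared_to_id = queries$id[pairs[, 2]],
  # number of distinct words the two texts have in common
  count_match    = mapply(function(i, j) length(intersect(words[[i]], words[[j]])),
                          pairs[, 1], pairs[, 2])
)
```

Splitting the text once before pairing avoids repeating strsplit for every comparison.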
I have returned stats on my data using the table command as such:
subject<-c(4,4,2,2,3,3)
correct<-c(0,1,1,1,0,0)
test<-data.frame(subject,correct)
freq_test<-head(table(test$subject,test$correct))
This returns a table which looks like this
0 1
2 0 2
3 2 0
4 1 1
That's great, but the problem is that I would like the first column to be a regular column rather than row.names (so that I can label it properly as "subject").
Is there a way to get this column to act in this way?
Just make a new data frame with the row names of freq_test as the first column:
> df<-data.frame(as.numeric(rownames(freq_test)),freq_test)
> colnames(df)[1]="subject"
> df
subject X0 X1
2 2 0 2
3 3 2 0
4 4 1 1
>
Of course, you can rename X0 and X1 to whatever you want by editing colnames(df) as above.
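Depending on the R version, freq_test may still carry the "table" class, in which case data.frame() would produce the long format instead. A sketch that sidesteps this by converting through as.data.frame.matrix first (the names tab and wide are made up for the example):

```r
subject <- c(4, 4, 2, 2, 3, 3)
correct <- c(0, 1, 1, 1, 0, 0)
test <- data.frame(subject, correct)

tab <- table(test$subject, test$correct)

# keep the wide layout: one row per subject, one column per value of correct
wide <- as.data.frame.matrix(tab)

# move the row names into a proper column
wide <- cbind(subject = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
```

Here the count columns keep the names "0" and "1", so they are accessed as wide[["0"]] and wide[["1"]] unless renamed.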
If you want the data in "long" format (useful for some models and plotting, and especially when your tables are more complicated), the table method for the generic function as.data.frame will take care of this for you:
> as.data.frame(table(test))
subject correct Freq
1 2 0 0
2 3 0 2
3 4 0 1
4 2 1 2
5 3 1 0
6 4 1 1
I think you should have used the standard method of construction of a data.frame, which is with name=values pairs:
test <- data.frame( subject=subject, correct=correct)
The first subject will be interpreted as a name to be quoted and the second subject will be interpreted .... i.e., the enclosing environments will be searched for an object named subject, and its value will be assigned to the "subject" column of "test".