I am trying to merge two columns in data.table 'A' with a column in another data.table 'B' that holds the unique values of some column. I want to merge in such a way that, for every unique combination of the two variables in 'A', all unique values of the column in 'B' are repeated.
I tried merge, but it doesn't give me all the values. I also tried the automatic recycling in data.table, but that doesn't give me the result either.
Input:
data.table A
X Y
1 1
1 2
1 3
2 1
3 1
4 4
4 5
5 6
data.table B
Z
1
2
Expected output
X Y Z
1 1 1
1 1 2
1 2 1
1 2 2
1 3 1
1 3 2
2 1 1
2 1 2
3 1 1
3 1 2
4 4 1
4 4 2
4 5 1
4 5 2
5 6 1
5 6 2
We can make use of crossing from tidyr
library(tidyr)
crossing(A, B)
# X Y Z
#1 1 1 1
#2 1 1 2
#3 1 2 1
#4 1 2 2
#5 1 3 1
#6 1 3 2
#7 2 1 1
#8 2 1 2
#9 3 1 1
#10 3 1 2
#11 4 4 1
#12 4 4 2
#13 4 5 1
#14 4 5 2
#15 5 6 1
#16 5 6 2
Or with merge from base R, which returns the Cartesian product when the two inputs share no column names, although the row order will be slightly different:
merge(A, B)
To get the expected row order, swap the two arguments and then reorder the columns:
merge(B, A)[c(names(A), names(B))]
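For reference, here is a minimal reproducible sketch of the base R approach. It assumes A and B are plain data frames (not data.tables) built from the question's input; the name res is only illustrative.

```r
# Input from the question, built as plain data frames (an assumption)
A <- data.frame(X = c(1, 1, 1, 2, 3, 4, 4, 5),
                Y = c(1, 2, 3, 1, 1, 4, 5, 6))
B <- data.frame(Z = 1:2)

# A and B share no column names, so merge() returns their Cartesian product.
# Passing B first makes Z vary fastest; the column subset restores X, Y, Z order.
res <- merge(B, A)[c(names(A), names(B))]
head(res, 4)
```

Each of the 8 rows of A is paired with both values of Z, giving 16 rows in total.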
I have a table that contains multiple rows of different data for each key, where the key spans multiple columns.
The table looks like this:
A B C
1 1 1 2
2 1 1 3
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
7 2 3 2
8 2 3 2
I have already found out how to remove duplicate rows with the unique command across multiple columns, so duplicated data is not a problem.
For every key (columns A and B in the example), I would like to keep only the row with the minimum value in the third column (column C).
In the end, the table should look like this:
A B C
1 1 1 2
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
Thanks for any help; it is really appreciated.
If anything is unclear, feel free to ask.
con <- textConnection(" A B C
1 1 1 2
2 1 1 3
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
7 2 3 2
8 2 3 2")
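df <- read.table(con, header = TRUE)
close(con)
# Sort by key (A, B) and then by C, so the first row of each key holds the minimum C
df <- df[with(df, order(A, B, C)), ]
# Keep only the first row for each (A, B) key
df[!duplicated(df[1:2]), ]
#   A B C
# 1 1 1 2
# 4 1 2 4
# 3 2 1 4
# 5 2 2 3
# 6 2 3 1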
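As an alternative that does not depend on sorting first, the group-wise minimum can also be sketched with aggregate from base R. This is a swapped-in technique, not the approach shown above, and the names df and res are illustrative.

```r
# Same data as in the question, built directly as a data frame
df <- data.frame(A = c(1, 1, 2, 1, 2, 2, 2, 2),
                 B = c(1, 1, 1, 2, 2, 3, 3, 3),
                 C = c(2, 3, 4, 4, 3, 1, 2, 2))

# For every (A, B) key, keep the minimum of C
res <- aggregate(C ~ A + B, data = df, FUN = min)
res
```

Unlike the duplicated approach, this returns one summary row per key and does not preserve the original row names.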
So I have a data set with multiple variables that I want to use to create a new variable. I have seen similar questions answered with ifelse statements, but that would be extremely inefficient here, since the new variable is based on 32 other variables. The variables are coded 1, 2, 3, or NA, and I want the new variable to be coded 1 if two or more of the 32 variables equal 1, and 2 otherwise. Here is a small example of what I have been trying to do.
df <- data.frame(id = 1:10, v1 = c(1,2,2,2,3,NA,2,2,2,2), v2 = c(2,2,2,2,2,1,2,1,2,2),
v3 = c(1,2,2,2,2,3,2,2,2,2), v4 = c(2,2,2,2,2,1,2,2,2,3))
and the result I am looking for is this:
id v1 v2 v3 v4 new
1 1 1 2 1 2 1
2 2 2 2 2 2 2
3 3 2 2 2 2 2
4 4 2 2 2 2 2
5 5 3 2 2 2 2
6 6 NA 1 3 1 1
7 7 2 2 2 2 2
8 8 2 1 2 2 2
9 9 2 2 2 2 2
10 10 2 2 2 3 2
I have also tried using rowSums inside the ifelse statement, but because of the missing values this doesn't work for all observations unless I recode the NAs to another value, which I want to avoid. Besides, I feel there must be a much more efficient way of doing this.
I suspect this question has been answered before, but I couldn't find anything on it, so help or a pointer to a previous answer would be appreciated.
It looks like you were very close to getting your desired output, but you were probably missing the na.rm = TRUE argument as part of your rowSums() call. This will remove any NAs before rowSums does its calculations.
Anyway, using your data frame from above, I created a new variable that counts the number of times 1 appears across the variables, while ignoring NA values. Note that I've subsetted the data to exclude the id column:
df$count <- rowSums(df[-1] == 1, na.rm = TRUE)
Then I created another variable using an ifelse statement that returns a 1 if the count is 2 or more or a 2 otherwise.
df$var <- ifelse(df$count >= 2, 1, 2)
The returned output:
id v1 v2 v3 v4 count var
1 1 1 2 1 2 2 1
2 2 2 2 2 2 0 2
3 3 2 2 2 2 0 2
4 4 2 2 2 2 0 2
5 5 3 2 2 2 0 2
6 6 NA 1 3 1 2 1
7 7 2 2 2 2 0 2
8 8 2 1 2 2 1 2
9 9 2 2 2 2 0 2
10 10 2 2 2 3 0 2
UPDATE / EDIT: As mentioned by Gregor in the comments, you can also just wrap the rowSums function in the ifelse statement for one line of code.
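A minimal sketch of that one-line version, using the example data from the question. The vars vector is an illustrative helper that selects the v columns by name, so the result does not depend on any other columns that have been added to df.

```r
df <- data.frame(id = 1:10,
                 v1 = c(1, 2, 2, 2, 3, NA, 2, 2, 2, 2),
                 v2 = c(2, 2, 2, 2, 2, 1, 2, 1, 2, 2),
                 v3 = c(1, 2, 2, 2, 2, 3, 2, 2, 2, 2),
                 v4 = c(2, 2, 2, 2, 2, 1, 2, 2, 2, 3))

# Columns to scan; with the real data this would name all 32 variables
vars <- paste0("v", 1:4)

# Count the 1s per row (NA comparisons are dropped), then recode in one step
df$new <- ifelse(rowSums(df[vars] == 1, na.rm = TRUE) >= 2, 1, 2)
df$new
```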
Say I have a list in R like so,
[1] 3 5 4 7
And I want to generate all "drawings" from this list, from 1 up to the value of each number. For example,
1 1 1 1
1 1 1 2
1 1 1 3
...
2 3 3 1
2 3 3 2
2 3 3 3
...
3 5 4 7
I know I have used rep() in the past to do something very similar, which works for lists of 2 or 3 numbers (i.e. something like 1 4 5), but I'm not sure how to generalize this here.
Thoughts?
As suggested in the comments, use the Map function to apply seq to each element of your vector, then use expand.grid to generate a data.frame holding the Cartesian product of the resulting sequences:
head(expand.grid(Map(seq,c(3,5,4,7))))
Var1 Var2 Var3 Var4
1 1 1 1 1
2 2 1 1 1
3 3 1 1 1
4 1 2 1 1
5 2 2 1 1
6 3 2 1 1
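Note that expand.grid varies the first column fastest, whereas the listing in the question varies the last one fastest. A sketch of one way to get that ordering, assuming the input is a plain numeric vector (the names sizes, grid and grid_last_fastest are illustrative):

```r
sizes <- c(3, 5, 4, 7)

# seq_len(n) gives 1..n; expand.grid builds every combination
grid <- expand.grid(lapply(sizes, seq_len))
nrow(grid) == prod(sizes)

# Build the grid from the reversed sizes, then reverse the columns,
# so that the last column changes fastest
grid_last_fastest <- expand.grid(lapply(rev(sizes), seq_len))
grid_last_fastest <- grid_last_fastest[, ncol(grid_last_fastest):1]
names(grid_last_fastest) <- paste0("Var", seq_along(sizes))
head(grid_last_fastest, 3)
```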
I would like to do the following:
A B
1 2
1 3
1 4
2 3
2 4
3 4
using data.table, but I am not sure how to exclude the already used numbers cumulatively.
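One possible starting point, sketched in base R rather than data.table, and assuming the goal is every pair of values from 1 to 4 with A < B: combn enumerates each unordered pair exactly once, which takes care of the "already used" numbers. The name pairs is illustrative.

```r
# combn(4, 2) returns a 2 x 6 matrix with one pair per column;
# transpose it so each pair becomes a row
pairs <- as.data.frame(t(combn(4, 2)))
names(pairs) <- c("A", "B")
pairs
```

The resulting data frame can be converted with as.data.table if a data.table is needed downstream.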