This question already has answers here:
Unique combination of all elements from two (or more) vectors
(6 answers)
create all possible permutations of two vectors in R [duplicate]
(1 answer)
Closed 4 years ago.
I have a column col1 from df1
col1
A
B
C
D
E
I have col2 from df2
col2
1
2
3
I want a new data frame, df3, that pairs every value of col2 with every value of col1:
col2 col1
1 A
1 B
1 C
1 D
1 E
2 A
2 B
2 C
2 D
2 E
3 A
...
3 E
I tried
n = 5; test = do.call("rbind", replicate(n, col2, simplify = FALSE))
n = 3; test = do.call("rbind", replicate(n, col1, simplify = FALSE))
and then merged the results, but this is really inefficient with big data. What's the best way to solve this problem?
For a base R option, use merge() with by = NULL, which returns the cross join (Cartesian product) of the two data frames:
df1 <- data.frame(col1=c("A", "B", "C", "D", "E"))
df2 <- data.frame(col2=c(1:3))
merge(df1, df2, by=NULL)
col1 col2
1 A 1
2 B 1
3 C 1
4 D 1
5 E 1
6 A 2
7 B 2
...
15 E 3
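A base R alternative, not taken from the answer above, is expand.grid(), which builds the cross product straight from the two vectors. This is a sketch that assumes the values live in df1$col1 and df2$col2; it has not been benchmarked against merge() on a million rows. Because expand.grid() varies its first argument fastest, passing col1 first and then reordering the columns reproduces the row order asked for in the question:

```r
# Example inputs from the question; stringsAsFactors = FALSE keeps the letters as character
df1 <- data.frame(col1 = c("A", "B", "C", "D", "E"), stringsAsFactors = FALSE)
df2 <- data.frame(col2 = 1:3)

# expand.grid() builds every combination; its first argument varies fastest,
# so col1 cycles through A..E within each value of col2
df3 <- expand.grid(col1 = df1$col1, col2 = df2$col2, stringsAsFactors = FALSE)

# Put col2 first to match the layout in the question
df3 <- df3[, c("col2", "col1")]
head(df3, 6)
```

The result has 5 × 3 = 15 rows: 1 A, 1 B, ..., 1 E, 2 A, and so on.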
Perhaps this has already been asked and answered, but I can't find it. Also, is there an optimal approach to this problem? This is just an example with a few rows; I'll be applying it to a data frame of about 1 million rows.
I'm kind of new to R.
I have two data frames
DF1:
a b
1 1 0
2 2 0
3 2 0
4 3 0
5 5 0
and
DF2
l
1 A
2 B
3 C
4 D
5 E
What I'm trying to do is match the values in DF1$a to the row indexes of DF2 and assign the corresponding values of DF2$l to DF1$b, so my result would look like this:
DF1:
a b
1 1 A
2 2 B
3 2 B
4 3 C
5 5 E
I've written a for loop to do this, but I seem to be missing something:
for(i in 1:length(df1$a)){
df1$b[i] <- df2$l[df1$a[i]]
}
This produces the following result:
DF1:
a b
1 1 1
2 2 2
3 2 2
4 3 3
5 5 5
Thanks in advance :)
We can use merge() to join the two data frames, matching DF1$a against the row IDs of DF2.
# Create example data frame
DF1 <- data.frame(a = c(1, 2, 2, 3, 5))
DF2 <- data.frame(l = c("A", "B", "C", "D", "E"),
stringsAsFactors = FALSE)
# Create a column called a in DF2 shows the row id
DF2$a <- row.names(DF2)
# Merge DF1 and DF2 by a
DF3 <- merge(DF1, DF2, by = "a", all.x = TRUE)
# Change the name of column l to be b
names(DF3) <- c("a", "b")
DF3
# a b
# 1 1 A
# 2 2 B
# 3 2 B
# 4 3 C
# 5 5 E
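A shorter alternative, sketched under an assumption: the numbers in the loop's output are most likely the factor codes of DF2$l. Before R 4.0.0, data.frame() turned strings into factors by default, and assigning a factor element into the numeric column DF1$b stores its integer code. Indexing the character labels directly avoids both the loop and the merge:

```r
DF1 <- data.frame(a = c(1, 2, 2, 3, 5), b = 0)
# Assumption: l is a factor, as data.frame() produced by default before R 4.0.0
DF2 <- data.frame(l = c("A", "B", "C", "D", "E"), stringsAsFactors = TRUE)

# Index the letters by position; as.character() turns the factor into its labels,
# so DF1$b receives "A", "B", ... instead of the factor's integer codes
DF1$b <- as.character(DF2$l)[DF1$a]
DF1
```

This relies on DF1$a holding valid row positions of DF2 (1 to 5 here); a value outside that range would give NA.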
Suppose I have two data frames:
df1 <- data.frame(A = 1:6, B = 7:12, C = rep(1:2, 3))
df2 <- data.frame(C = 1:2, D = c("A", "B"))
I want to create a new column E in df1 whose values come from the df1 column named in df2$D, matched on column C. For example, C in the first row of df1 is 1. In df2, C = 1 corresponds to D = "A", so E in that row of df1 should be taken from column A, i.e., 1.
Following the answers to "Select values from different columns based on a variable containing column names", I can do this in two steps:
library(data.table)
setDT(df1)
setDT(df2)
df3 <- df1[df2, on = "C"] # step 1: join the two data.tables on C
df3[, E := .SD[[.BY[[1]]]], by = D] # step 2: within each D group, read the column named by D
My question is: can this be done in one step? Also, since my data is relatively large, the first step of this solution takes a lot of time. Is there a faster way?
Any suggestions?
Here's how I would do it:
df1[df2, on=.(C), D := i.D][, E := .SD[[.BY$D]], by=D]
A B C D E
1: 1 7 1 A 1
2: 2 8 2 B 8
3: 3 9 1 A 3
4: 4 10 2 B 10
5: 5 11 1 A 5
6: 6 12 2 B 12
This adds the columns to df1 by reference instead of making a new table, so I'd guess it is more efficient than building df3. Also, since the columns are added to df1, the rows keep their original order.
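You can also try this. It uses each row's C value directly as the position of the df1 column to read from, so it only works because C happens to equal that position here (1 = A, 2 = B); df2 is not used:
library(data.table)
setDT(df1)
df1[, e := eval(parse(text = names(df1)[C])), by = 1:nrow(df1)]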
df1
A B C e
1: 1 7 1 1
2: 2 8 2 8
3: 3 9 1 3
4: 4 10 2 10
5: 5 11 1 5
6: 6 12 2 12
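A join-free base R alternative, offered as a sketch rather than a benchmarked solution: look up each row's target column name with match(), then pull all the values at once with two-column (row, column) matrix indexing. It assumes the candidate columns (here A and B) share one atomic type, because as.matrix() coerces them to a common type; the names target_col, lookup_cols and m are illustrative only.

```r
df1 <- data.frame(A = 1:6, B = 7:12, C = rep(1:2, 3))
df2 <- data.frame(C = 1:2, D = c("A", "B"))

# For each row of df1, find the column name listed in df2$D for that row's C value
target_col <- as.character(df2$D[match(df1$C, df2$C)])

# Matrix of only the candidate columns (assumed to share one numeric type)
lookup_cols <- unique(as.character(df2$D))
m <- as.matrix(df1[lookup_cols])

# A (row, column) index matrix selects one cell per row in a single vectorized call
df1$E <- m[cbind(seq_len(nrow(df1)), match(target_col, colnames(m)))]
df1
```

Unlike the eval(parse()) answer, this goes through df2$D, so it does not depend on C matching the column position.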
This question already has answers here:
Select rows from a data frame based on values in a vector
(3 answers)
Closed 7 years ago.
I have a data.frame and a vector. I want to output only the rows of the data frame whose value in a given column also appears in the vector v.
For example:
v = c(1,2,3,4,5)
df =
A B
1 a 2
2 b 6
3 c 4
4 d 1
5 e 8
In other words, I want to keep a row only if its value in df$B is in v: for i in 1:nrow(df), if df$B[i] isn't in v, remove row i.
The output should be:
A B
1 a 2
2 c 4
3 d 1
since 2, 4 and 1 are in v.
You should make use of the %in% operator.
v <- c(1, 2, 3, 4, 5)
df <- read.table(text =
" A B
1 a 2
2 b 6
3 c 4
4 d 1
5 e 8", header = TRUE)
out <- df[df$B %in% v, ]
This gives:
A B
1 a 2
3 c 4
4 d 1
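To see what the %in% answer is doing, here is a small sketch that builds the same data with data.frame() instead of read.table() and keeps the logical vector in its own variable; the name keep is illustrative. Resetting the row names is optional and only reproduces the 1, 2, 3 numbering from the question's expected output:

```r
v <- c(1, 2, 3, 4, 5)
df <- data.frame(A = c("a", "b", "c", "d", "e"), B = c(2, 6, 4, 1, 8),
                 stringsAsFactors = FALSE)

# %in% returns one TRUE/FALSE per element of df$B: TRUE FALSE TRUE TRUE FALSE
keep <- df$B %in% v
out <- df[keep, ]

# Optional: renumber the rows 1, 2, 3 as in the question's expected output
rownames(out) <- NULL
out
```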
This question already has answers here:
How do I get a contingency table?
(6 answers)
Faster ways to calculate frequencies and cast from long to wide
(4 answers)
Closed 4 years ago.
I have a dataset like this:
Col Value
A 1
A 0
A 1
A 1
A 1
B 0
B 1
B 0
B 1
B 1
How do I transform it so that it looks like this?
Col1 Col2 Col3
A 4 1
B 3 2
Col2 counts all the 1s and Col3 counts all the 0s for each factor value in Col1.
Or we can use dcast from the reshape2 package:
library(reshape2)
dcast(df1, Col~Value, value.var='Value', length)
For this, you can just use table:
table(mydf)
## Value
## Col 0 1
## A 1 4
## B 2 3
Or:
library(data.table)
as.data.table(mydf)[, as.list(table(Value)), by = Col]
## Col 0 1
## 1: A 1 4
## 2: B 2 3
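table() puts the 0 column before the 1 column. To get exactly the Col1/Col2/Col3 layout from the question (1s counted in Col2, 0s in Col3), the count columns can be picked by name. A minimal base R sketch that rebuilds the example data; the names counts and out are illustrative:

```r
df <- data.frame(Col = rep(c("A", "B"), each = 5),
                 Value = c(1, 0, 1, 1, 1, 0, 1, 0, 1, 1),
                 stringsAsFactors = FALSE)

# Cross-tabulate: rows are the Col values, columns are the Value levels "0" and "1"
counts <- table(df$Col, df$Value)

# Pick the columns by name so the 1s come before the 0s, as in the question
out <- data.frame(Col1 = rownames(counts),
                  Col2 = as.vector(counts[, "1"]),
                  Col3 = as.vector(counts[, "0"]),
                  stringsAsFactors = FALSE)
out
```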
Another way to aggregate the values:
df <- data.frame(Col=c("A","A","A","A","A","B","B","B","B","B"), Value=c(1,0,1,1,1,0,1,0,1,1))
new_df <- as.data.frame(with(df, tapply(Value, list(Col, Value), FUN = length)))
# tapply() orders the count columns "0", "1"; put "1" first so Col2 counts the 1s and Col3 the 0s
new_df <- setNames(cbind(rownames(new_df), new_df[, c("1", "0")]), c("Col1","Col2","Col3"))
new_df
Col1 Col2 Col3
A A 4 1
B B 3 2
Set the row names to NULL if you don't want to see them:
rownames(new_df) <- NULL
Result:
Col1 Col2 Col3
1 A 4 1
2 B 3 2
This question already has answers here:
Concatenate / paste a column by a group and add to original data
(2 answers)
Closed 8 years ago.
I need to combine the Class values of each group (ID) into a new column, e.g.
df = read.table(text="ID Class
1 a
1 b
2 a
2 c
3 b
4 a
4 b
4 c", header=TRUE)
and the output should be something like
ID Class Class.aggr
1 a a, b
1 b
2 a a, c
2 c
3 b b
4 a a,b,c
4 b
4 c
I thought about using cat(union()), but the data set is very large and I don't know how to select the Class values for each ID (tapply doesn't seem to work).
Here's a solution using base R functions:
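# Start with an empty string on every row
df$class.arg <- ""
# On the first row of each ID, store that ID's Class values pasted together;
# factor(df$ID, unique(df$ID)) keeps the groups in order of first appearance
df$class.arg[!duplicated(df$ID)] <-
  tapply(df$Class, factor(df$ID, unique(df$ID)), paste, collapse = ",")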
which also produces
ID Class class.arg
1 1 a a,b
2 1 b
3 2 a a,c
4 2 c
5 3 b b
6 4 a a,b,c
7 4 b
8 4 c
Here is a possible approach with dplyr.
Create the data frame:
ID <- c(1,1,2,2,3,4,4,4)
Class <- c("a","b","a", "c", "b", "a", "b", "c")
df <- data.frame(ID,Class)
And then:
library(dplyr)
df <- df %>%
  group_by(ID) %>% #group by ID
  mutate(count = 1:n()) %>% #number the rows within each ID
  mutate(Class.aggr = paste(Class, collapse = ",")) #paste the "Class" values of each ID into a new column
df$Class.aggr[df$count > 1] <- "" #delete the aggregate from all but the first row of each ID
df$count <- NULL #delete the helper column
#>df
# ID Class Class.aggr
#1 1 a a,b
#2 1 b
#3 2 a a,c
#4 2 c
#5 3 b b
#6 4 a a,b,c
#7 4 b
#8 4 c
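One more base R option, not from the answers above, is ave(), which applies a function within each group and returns a vector aligned with the original rows, so no separate index bookkeeping is needed. A sketch using the same example data:

```r
df <- data.frame(ID = c(1, 1, 2, 2, 3, 4, 4, 4),
                 Class = c("a", "b", "a", "c", "b", "a", "b", "c"),
                 stringsAsFactors = FALSE)

# ave() applies FUN within each ID group and recycles the result over that group's rows,
# so every row receives its group's pasted Class values
df$Class.aggr <- ave(df$Class, df$ID, FUN = function(x) paste(x, collapse = ","))

# Blank out repeats so the aggregate appears only on the first row of each ID
df$Class.aggr[duplicated(df$ID)] <- ""
df
```

If the aggregate is wanted on every row of the group, simply skip the last assignment.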