I have a data.frame with two columns
> data.frame(a=c(5,4,3), b =c(1,2,4))
a b
1 5 1
2 4 2
3 3 4
I want to produce a list of data.frames with different combinations of those column values; there should be a total of six possible scenarios for the above example (correct me if I am wrong):
a b
1 5 1
2 4 2
3 3 4
a b
1 5 1
2 4 4
3 3 2
a b
1 5 2
2 4 1
3 3 4
a b
1 5 2
2 4 4
3 3 1
a b
1 5 4
2 4 2
3 3 1
a b
1 5 4
2 4 1
3 3 2
Is there a simple function to do it? I don't think expand.grid worked out for me.
Actually, expand.grid can work here, but it is not recommended because it becomes inefficient when df has many rows: with n rows it generates n^n index combinations, of which only the n! that are true permutations are kept.
Below is an example using expand.grid:
u <- do.call(expand.grid, rep(list(seq(nrow(df))), nrow(df)))
lapply(
  asplit(
    subset(
      u,
      apply(u, 1, FUN = function(x) length(unique(x))) == nrow(df)
    ), 1
  ), function(v) within(df, b <- b[v])
)
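To make the cost of the expand.grid approach concrete, the following small sketch compares the number of rows the grid produces (n^n) with the number that survive the filter (n!). The variable name growth is only illustrative.

```r
# Rows produced by expand.grid (n^n) versus rows actually kept (n!)
n <- 1:8
growth <- data.frame(
  n            = n,
  grid_rows    = n^n,
  permutations = factorial(n)
)
growth
```

The gap widens quickly, which is why generating the permutations directly is preferable for anything beyond a handful of rows.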
A more efficient option is perms() from the pracma package, which generates only the n! permutations:
library(pracma)
lapply(asplit(perms(df$b), 1), function(v) within(df, b <- v))
[[1]]
a b
1 5 4
2 4 2
3 3 1
[[2]]
a b
1 5 4
2 4 1
3 3 2
[[3]]
a b
1 5 2
2 4 4
3 3 1
[[4]]
a b
1 5 2
2 4 1
3 3 4
[[5]]
a b
1 5 1
2 4 2
3 3 4
[[6]]
a b
1 5 1
2 4 4
3 3 2
Use combinat::permn to create all possible permutations of the b values, and bind each permutation to column a:
df <- data.frame(a= c(5,4,3), b = c(1,2,4))
result <- lapply(combinat::permn(df$b), function(x) data.frame(a = df$a, b = x))
result
#[[1]]
# a b
#1 5 1
#2 4 2
#3 3 4
#[[2]]
# a b
#1 5 1
#2 4 4
#3 3 2
#[[3]]
# a b
#1 5 4
#2 4 1
#3 3 2
#[[4]]
# a b
#1 5 4
#2 4 2
#3 3 1
#[[5]]
# a b
#1 5 2
#2 4 4
#3 3 1
#[[6]]
# a b
#1 5 2
#2 4 1
#3 3 4
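If installing a package is not an option, the same idea can be sketched in base R. The recursive helper permutations() below is hypothetical (it is not part of any package mentioned above) and assumes the values of b are distinct; repeated values would yield repeated permutations.

```r
# Hypothetical helper: all permutations of a vector, built recursively.
# Each element takes a turn as the first value, followed by every
# permutation of the remaining elements.
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in permutations(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], p)
    }
  }
  out
}

df <- data.frame(a = c(5, 4, 3), b = c(1, 2, 4))

# One data frame per permutation of b, with column a left untouched
result <- lapply(permutations(df$b), function(p) data.frame(a = df$a, b = p))
length(result)
```

For this example length(result) is 6, which matches factorial(nrow(df)) and confirms the count given in the question.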
I would like to create two columns with the cumulative frequency of "A" and "B" in the assignment column.
df = data.frame(id = 1:10, assignment= c("B","A","B","B","B","A","B","B","A","B"))
id assignment
1 1 B
2 2 A
3 3 B
4 4 B
5 5 B
6 6 A
7 7 B
8 8 B
9 9 A
10 10 B
The resulting table would have this format
id assignment A B
1 1 B 0 1
2 2 A 1 1
3 3 B 1 2
4 4 B 1 3
5 5 B 1 4
6 6 A 2 4
7 7 B 2 5
8 8 B 2 6
9 9 A 3 6
10 10 B 3 7
How can I generalize the code to more than two categories (say "A", "B", "C")?
Thanks
Use lapply over unique values in assignment to create new columns.
vals <- sort(unique(df$assignment))
df[vals] <- lapply(vals, function(x) cumsum(df$assignment == x))
df
# id assignment A B
#1 1 B 0 1
#2 2 A 1 1
#3 3 B 1 2
#4 4 B 1 3
#5 5 B 1 4
#6 6 A 2 4
#7 7 B 2 5
#8 8 B 2 6
#9 9 A 3 6
#10 10 B 3 7
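Because the lapply approach loops over whatever values are present, it should carry over to three or more categories unchanged. A minimal sketch with made-up data (the data frame abc is only an illustration, not from the question):

```r
# Illustrative data with three categories
abc <- data.frame(id = 1:6, assignment = c("B", "A", "C", "B", "C", "A"))

# as.character() guards against assignment being stored as a factor
vals <- sort(unique(as.character(abc$assignment)))

# One cumulative-count column per category
abc[vals] <- lapply(vals, function(x) cumsum(abc$assignment == x))
abc
```

Each new column counts how often its category has appeared up to and including the current row.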
We can use model.matrix with colCumsums
library(matrixStats)
cbind(df, colCumsums(model.matrix(~ assignment - 1, df[-1])))
A base R option
transform(
df,
A = cumsum(assignment == "A"),
B = cumsum(assignment == "B")
)
gives
id assignment A B
1 1 B 0 1
2 2 A 1 1
3 3 B 1 2
4 4 B 1 3
5 5 B 1 4
6 6 A 2 4
7 7 B 2 5
8 8 B 2 6
9 9 A 3 6
10 10 B 3 7
I have two lists. Each of them with many vectors (around 500) of different lengths and I would like to get a tibble data frame with three columns.
My reproducible example is the following:
> a
[[1]]
[1] 1 3 6
[[2]]
[1] 5 4
> b
[[1]]
[1] 3 4
[[2]]
[1] 5 6 7
I would like to get the following tibble data frame:
name index value
a 1 1
a 1 3
a 1 6
a 2 5
a 2 4
b 1 3
b 1 4
b 2 5
b 2 6
b 2 7
I would be grateful if someone could help me with this issue
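Using base R (note that this splits labels such as a1 with substr, so it relies on one-character list names and fewer than ten vectors per list):
transform(stack(c(a=a,b=b)),name=substr(ind,1,1),ind=substr(ind,2,2))
values ind name
1 1 1 a
2 3 1 a
3 6 1 a
4 5 2 a
5 4 2 a
6 3 1 b
7 4 1 b
8 5 2 b
9 6 2 b
10 7 2 b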
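Since the question mentions around 500 vectors per list, a base R variant that does not parse labels may be safer. The following is a sketch only; the helper to_long() is hypothetical and builds the index from the list positions instead of from the stacked names.

```r
a <- list(c(1, 3, 6), c(5, 4))
b <- list(c(3, 4), c(5, 6, 7))

# Hypothetical helper: flatten one list of vectors into long format.
# index repeats each list position once per element of that vector.
to_long <- function(lst, name) {
  data.frame(
    name  = name,
    index = rep(seq_along(lst), lengths(lst)),
    value = unlist(lst)
  )
}

long <- rbind(to_long(a, "a"), to_long(b, "b"))
long
```

If a tibble is required, the result can be passed to tibble::as_tibble().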
Using tidyverse:
library(tidyverse)
list(a = a, b = b) %>% map(~ stack(setNames(.x, seq_along(.x)))) %>% bind_rows(.id = "name")
name values ind
1 a 1 1
2 a 3 1
3 a 6 1
4 a 5 2
5 a 4 2
6 b 3 1
7 b 4 1
8 b 5 2
9 b 6 2
10 b 7 2
Here is one option with tidyverse
library(tidyverse)
list(a = a, b = b) %>%
  map_df(enframe, name = "index", .id = "name") %>%
  unnest(cols = value)
# A tibble: 10 x 3
# name index value
# <chr> <int> <dbl>
# 1 a 1 1
# 2 a 1 3
# 3 a 1 6
# 4 a 2 5
# 5 a 2 4
# 6 b 1 3
# 7 b 1 4
# 8 b 2 5
# 9 b 2 6
#10 b 2 7
data
a <- list(c(1, 3, 6), c(5, 4))
b <- list(c(3, 4), c(5, 6, 7))
I have a factor F. I need to create a vector V of the same length as F that holds, for each element, the frequency of its factor level.
For example:
F <- factor(c("a","b","c","b","a","a","a","b"))
table(F)
F
a b c
4 3 1
V should be:
V
[1] 4 3 1 3 4 4 4 3
We can use ave
ave(seq_along(F), F, FUN = length)
#[1] 4 3 1 3 4 4 4 3
Or use the table itself
as.vector(table(F)[F])
#[1] 4 3 1 3 4 4 4 3
x <- c("a","b","c","b","a","a","a","b")
Then, depending on whether you want the output to be named,
table(x)[x]
# x
# a b c b a a a b
# 4 3 1 3 4 4 4 3
c(table(x)[x])
# a b c b a a a b
# 4 3 1 3 4 4 4 3
as.numeric(table(x)[x])
# [1] 4 3 1 3 4 4 4 3
unname(table(x)[x])
# [1] 4 3 1 3 4 4 4 3
You can try this:
tab <- table(F)
as.numeric(sapply(seq_along(F), function(i) tab[F[i]]))
Output:
[1] 4 3 1 3 4 4 4 3
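As a quick sanity check, the ave and table approaches can be run side by side. This sketch names the factor f instead of F (an assumption made here only because F is also R's shorthand for FALSE) and indexes the table by level name to make the lookup explicit.

```r
f <- factor(c("a", "b", "c", "b", "a", "a", "a", "b"))

# Group-wise length, returned in the original element order
v_ave <- ave(seq_along(f), f, FUN = length)

# Look up each element's count in the frequency table by level name
v_tab <- as.vector(table(f)[as.character(f)])

# Do both approaches agree?
all(v_ave == v_tab)
```

Indexing with as.character(f) selects counts by name, so it does not depend on the order in which the levels are stored.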
I try to count triplets; for this I use three vectors that are packed in a dataframe:
X=c(4,4,4,4,4,4,4,4,1,1,1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3)
Y=c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,3,4,2,2,2,2,3,4,1,1,2,2,3,3,4,4)
Z=c(4,4,5,4,4,4,4,4,6,1,1,1,1,1,1,1,2,2,2,2,7,2,3,3,3,3,3,3,3,3)
Count_Frame=data.frame(matrix(NA, nrow=(length(X)), ncol=3))
Count_Frame[1]=X
Count_Frame[2]=Y
Count_Frame[3]=Z
Counts=data.frame(table(Count_Frame))
There is the following problem: if I increase the value range in the vectors or use even more vectors the "Counts" dataframe quickly approaches its size limit due to the many 0-counts. Is there a way to exclude the 0-counts while generating "Counts"?
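We can use data.table. Convert the data.frame to a data.table with setDT(Count_Frame), group by all the columns (.(X, Y, Z)), and take the number of rows in each group (.N). If the columns are named X1, X2, X3, as in the data frame built in the question, use those names in the grouping instead.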
library(data.table)
setDT(Count_Frame)[,.N ,.(X, Y, Z)]
# X Y Z N
# 1: 4 1 4 7
# 2: 4 1 5 1
# 3: 1 1 6 1
# 4: 1 1 1 3
# 5: 1 2 1 2
# 6: 1 3 1 1
# 7: 1 4 1 1
# 8: 2 2 2 4
# 9: 2 3 7 1
#10: 2 4 2 1
#11: 3 1 3 2
#12: 3 2 3 2
#13: 3 3 3 2
#14: 3 4 3 2
Instead of naming all the columns, we can use names(Count_Frame) as well (if there are many columns)
setDT(Count_Frame)[,.N , names(Count_Frame)]
You can accomplish this with aggregate:
Count_Frame$one <- 1
aggregate(one ~ X1 + X2 + X3, data=Count_Frame, FUN=sum)
This returns only the combinations that actually occur, so zero counts are never listed.
One solution is to create a combination of the column values and count those instead:
library(tidyr)
as.data.frame(table(unite(Count_Frame, tmp, X1, X2, X3))) %>%
separate(Var1, c('X1', 'X2', 'X3'))
Resulting output is:
X1 X2 X3 Freq
1 1 1 1 3
2 1 1 6 1
3 1 2 1 2
4 1 3 1 1
5 1 4 1 1
6 2 2 2 4
7 2 3 7 1
8 2 4 2 1
9 3 1 3 2
10 3 2 3 2
11 3 3 3 2
12 3 4 3 2
13 4 1 4 7
14 4 1 5 1
Or using plyr:
library(plyr)
count(Count_Frame, colnames(Count_Frame))
output
# > count(Count_Frame, colnames(Count_Frame))
# X1 X2 X3 freq
# 1 1 1 1 3
# 2 1 1 6 1
# 3 1 2 1 2
# 4 1 3 1 1
# 5 1 4 1 1
# 6 2 2 2 4
# 7 2 3 7 1
# 8 2 4 2 1
# 9 3 1 3 2
# 10 3 2 3 2
# 11 3 3 3 2
# 12 3 4 3 2
# 13 4 1 4 7
# 14 4 1 5 1
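For completeness, here is a package-free sketch of the same idea: build one key per row, keep the first occurrence of each key, and count the keys. Only combinations that occur are ever materialized. The names key, first and counts are illustrative, and the data frame is built directly with the default-style column names X1, X2, X3.

```r
X <- c(4,4,4,4,4,4,4,4,1,1,1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3)
Y <- c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,3,4,2,2,2,2,3,4,1,1,2,2,3,3,4,4)
Z <- c(4,4,5,4,4,4,4,4,6,1,1,1,1,1,1,1,2,2,2,2,7,2,3,3,3,3,3,3,3,3)
Count_Frame <- data.frame(X1 = X, X2 = Y, X3 = Z)

# One string key per row; the separator is unlikely to occur in the data
key <- do.call(paste, c(Count_Frame, sep = "\r"))

# Rows where each distinct triplet appears for the first time
first <- !duplicated(key)

# Count keys in order of first appearance and attach to the distinct rows
counts <- cbind(
  Count_Frame[first, ],
  Freq = as.vector(table(factor(key, levels = key[first])))
)
rownames(counts) <- NULL
counts
```

The result has one row per observed triplet, in order of first appearance, and the Freq column sums to the number of input rows.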
I have an unordered factor ID, a reference vector giving the rank of each level, and a label for each level. I want to order the levels of ID by the given rank and then override the labels in the factor.
Could you advise whether there is a better way to do this?
ID<-factor(c(1,2,2,3,1,3,3,2,1,1)+10)
Rank<-c("11"=3,"12"=1,"13"=2)
Label<-c("11"="B","12"="A","13"="C")
ID.Rank<-factor(ID, levels=names(Rank),labels=Rank)
ID.Rank<-factor(ID.Rank, levels=sort(Rank),order=T)
ID.Label<-factor(ID, levels=names(Label),labels=Label)
data.frame(ID,ID.Rank,ID.Label)
### Here it is important that ID.Rank has a certain order.
factor(ID.Rank, labels=Label[match(levels(ID.Rank), Rank)])
If I understood your question correctly, here is how you can solve the problem.
set.seed(2)
ID<-as.numeric(ID)
df1<-as.data.frame(ID)
> df1
ID
1 1
2 1
3 3
4 2
5 3
6 2
7 3
8 3
9 2
10 3
df2<-as.data.frame(Rank)
df2$ID<-rownames(df2)
> df2
Rank ID
1 3 1
2 1 2
3 2 3
df3<-merge(df1,df2,by="ID")
ID Rank
1 1 3
2 1 3
3 2 1
4 2 1
5 2 1
6 3 2
7 3 2
8 3 2
9 3 2
10 3 2
df3$Rank is what you are looking for as the final result. You can convert it to a factor.
Updated as per comments: If you want the original order of ID:
df1$IDo<-rownames(df1)
df3
ID IDo Rank
1 1 1 3
2 1 7 3
3 1 4 3
4 2 3 1
5 2 9 1
6 2 10 1
7 3 2 2
8 3 5 2
9 3 6 2
10 3 8 2
myFac <- factor(ID, levels=Rank, labels=names(Rank) )
myFac
[1] 3 3 2 2 3 1 1 2 2 3
Levels: 1 < 2 < 3
match(levels(myFac), names(Label) )
[1] 1 2 3
Label[match(levels(myFac), names(Label) )]
1 2 3
"B" "A" "C"
levels(myFac) <- Label[match(levels(myFac), names(Label) )]
myFac
#-----
[1] C C A A C B B A A C
Levels: B < A < C
Assuming Rank and Label are always in the same order, you just need to order the labels appropriately and then use them to create the ordered factor.
ID <- factor(c(1,2,2,3,1,3,3,2,1,1)+10)
Rank <- c("11"=3,"12"=1,"13"=2)
Label <- c("11"="B","12"="A","13"="C")
Label <- Label[order(Rank)]
factor(ID, levels=names(Label), labels=Label, order=TRUE)
## [1] B A A C B C C A B B
## Levels: A < C < B
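One way to confirm that the resulting factor really follows the given ranks is to compare its integer codes with the ranks of the original IDs. This sketch assumes, as in the example, that the ranks are the consecutive integers 1 to k; the names ordered_labels and result are illustrative.

```r
ID    <- factor(c(1, 2, 2, 3, 1, 3, 3, 2, 1, 1) + 10)
Rank  <- c("11" = 3, "12" = 1, "13" = 2)
Label <- c("11" = "B", "12" = "A", "13" = "C")

# Reorder the labels by rank, then build the ordered factor from them
ordered_labels <- Label[order(Rank)]
result <- factor(ID, levels = names(ordered_labels),
                 labels = ordered_labels, ordered = TRUE)

# The integer code of each element should equal the rank of its original ID
all(as.integer(result) == Rank[as.character(ID)])
```

If the check returns TRUE, comparisons such as result < "B" follow the order defined by Rank.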