If I take the following two-column dataframe, with one column being a factor and the other being a numeric vector:
data <- data.frame(x=c("a","b","c","d","e","f","g","h"), y = c(1,2,3,3,2,1,5,6))
data$x <- as.factor(data$x)
How can I turn it into a new dataframe data2 where the factor levels of data$x are columns and the rows contain the corresponding numeric values from data$y, like so?
structure(list(a = 1, b = 2, c = 3, d = 3, e = 2, f = 1, g = 5,
h = 6), class = "data.frame", row.names = c(NA, -1L))
With base R, use rbind.data.frame:
d <- rbind.data.frame(data$y)
colnames(d) <- data$x
a b c d e f g h
1 1 2 3 3 2 1 5 6
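If it helps, the same one-row result can also be sketched with setNames() and as.list(). This is only an alternative spelling of the base R idea, not a different method; data2 is the name used in the question, and the as.character() call is added here so the factor labels are used as names explicitly.

```r
data <- data.frame(x = c("a", "b", "c", "d", "e", "f", "g", "h"),
                   y = c(1, 2, 3, 3, 2, 1, 5, 6))
data$x <- as.factor(data$x)

# name each y value by its x label, then turn the named list into a single row
data2 <- as.data.frame(as.list(setNames(data$y, as.character(data$x))))
data2
```

Note that as.data.frame() checks column names by default, so labels that are not syntactically valid R names would be altered.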
With pivot_wider:
tidyr::pivot_wider(data, names_from = x, values_from = y)
a b c d e f g h
1 1 2 3 3 2 1 5 6
or with xtabs:
xtabs(y ~ ., data = data) |>
as.data.frame.list()
a b c d e f g h
1 1 2 3 3 2 1 5 6
Another possible solution, using data.table::transpose:
data.table::transpose(data, make.names = 1)
#> a b c d e f g h
#> 1 1 2 3 3 2 1 5 6
Another option using the sjmisc package:
library(sjmisc)
data %>%
  rotate_df(cn = TRUE)
# a b c d e f g h
#y 1 2 3 3 2 1 5 6
How can I turn combinations of the following letters
label=c("A","B","C","D","E")
into a dataframe with 4 groups (G1, G2, G3, G4), as follows?
k2=data.frame(G1=c("AB","AC","AD","AE","BC","BD","BE","CD","CE","DE"),
G2=c("C","B","B","B","A","A","A","A","A","A"),
G3=c("D","D","C","C","D","C","C","B","B","B"),
G4=c("E","E","E","D","E","E","D","E","D","C"))
And if I want to split the letters into 3 groups (G1, G2, G3) instead, with the condition that "B" and "C" can never be separated, as in the dataframe below, how can I do that?
k3=data.frame(G1=c("BCD","BCE","BCA","AE","AD","DE"),
G2=c("A","A","D","BC","BC","BC"),
G3=c("E","D","E","D","E","A"))
Thank you very much for the help
Here is one way to do what you want to do:
a <- t(combn(c("A", "B", "C", "D", "E"), 2))
# find the leftover letters while a is still a two-column matrix
b <- t(apply(a, 1, function(x) setdiff(c("A", "B", "C", "D", "E"), x)))
a <- paste0(a[, 1], a[, 2])
k2 <- data.frame(a, b)
colnames(k2) <- paste0("G", 1:4)
k2
# G1 G2 G3 G4
# 1 AB C D E
# 2 AC B D E
# 3 AD B C E
# 4 AE B C D
# 5 BC A D E
# 6 BD A C E
# 7 BE A C D
# 8 CD A B E
# 9 CE A B D
# 10 DE A B C
The simplest way to do the second version is to exclude "C" and add it at the end:
d <- t(combn(c("A", "B", "D", "E"), 2))
# find the leftover letters while d is still a two-column matrix
e <- t(apply(d, 1, function(x) setdiff(c("A", "B", "D", "E"), x)))
d <- paste0(d[, 1], d[, 2])
k3 <- data.frame(d, e)
colnames(k3) <- paste0("G", 1:3)
# put "C" back next to every "B"
k3 <- data.frame(sapply(k3, function(x) gsub("B", "BC", x)))
k3
# G1 G2 G3
# 1 ABC D E
# 2 AD BC E
# 3 AE BC D
# 4 BCD A E
# 5 BCE A D
# 6 DE A BC
This does not match your k3 exactly, but it is more consistent with k2.
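As a variation on the same idea, "BC" can be treated as a single unit from the start, so no substitution step is needed afterwards. This is a minimal sketch under that assumption; the names units, pairs, g1 and rest are only illustrative.

```r
# "BC" is handled as one inseparable unit
units <- c("A", "BC", "D", "E")

pairs <- combn(units, 2)                         # one column per pair of units
g1    <- apply(pairs, 2, paste0, collapse = "")  # merged pair goes into G1
rest  <- t(apply(pairs, 2, function(p) setdiff(units, p)))

k3 <- data.frame(G1 = g1, G2 = rest[, 1], G3 = rest[, 2],
                 stringsAsFactors = FALSE)
k3
```

This gives the same six rows as the gsub() version above.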
For example, I have a dataset that looks like this:
structure(list(ID = c(1, 2, 3, 4, 5), COL1 = c("A", "B", "C",
"D", "E"), COL2 = c("F", "G", "H", "I", "J"), Paired = c(2, 3,
1, 2, 1)), row.names = c(NA, -5L), class = c("tbl_df", "tbl",
"data.frame"))
ID COL1 COL2 Paired
1 A F 2
2 B G 3
3 C H 1
4 D I 2
5 E J 1
I would like to create a dataset that looks like the one below, where the number of repetitions comes from the Paired column:
Col Col2
A F
A F
F A
F A
B G
B G
B G
G B
G B
G B
C H
H C
D I
D I
I D
I D
E J
J E
Note that A and F are paired two times. I want the long format to show each pairing in both orders, so 2 pairs become AF, AF, FA, FA.
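We can use uncount() from tidyr to repeat each row Paired times, then stack a copy with COL1 and COL2 swapped (df1 is the dataset from the question):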
library(dplyr)
library(tidyr)
df1 %>%
uncount(Paired) -> tmp
tmp %>%
rename(COL1= COL2, COL2 = COL1) %>%
bind_rows(tmp) %>%
select(-ID)
Output:
# A tibble: 18 x 2
COL2 COL1
<chr> <chr>
1 A F
2 A F
3 B G
4 B G
5 B G
6 C H
7 D I
8 D I
9 E J
10 F A
11 F A
12 G B
13 G B
14 G B
15 H C
16 I D
17 I D
18 J E
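The output above lists all forward pairs first and the swapped pairs afterwards. If the order shown in the question matters (each forward block followed directly by its swapped block), a base R sketch along these lines may help. It assumes df1 holds the data from the question; idx, fwd, bwd and out are illustrative names.

```r
df1 <- data.frame(ID = 1:5,
                  COL1 = c("A", "B", "C", "D", "E"),
                  COL2 = c("F", "G", "H", "I", "J"),
                  Paired = c(2, 3, 1, 2, 1),
                  stringsAsFactors = FALSE)

# repeat each row index Paired times
idx <- rep(seq_len(nrow(df1)), times = df1$Paired)

fwd <- df1[idx, c("COL1", "COL2")]
bwd <- setNames(df1[idx, c("COL2", "COL1")], c("COL1", "COL2"))  # swapped copy

# stack both copies, then sort by original row so each forward block
# is followed by its swapped block
out <- rbind(fwd, bwd)
out <- out[order(c(idx, idx), rep(1:2, each = length(idx))), ]
rownames(out) <- NULL
out
```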
I have a vector with around 600 unique elements: A, B, C, D, E, F, G, H, I, etc. Using R, I would like to get a dataframe with 4 columns, where each row has all possible combinations of 4 elements under the following conditions:
"A" goes always in column 1.
Column 2 has B or C.
Columns 3 and 4 have pairs of the remaining elements (pair X, Y is considered equal to pair Y, X). I expect to get something like:
1 2 3 4
A B D E
A B F G
A B H I
A C D E
A C F G
A C H I
A possible solution using combn(), expand.grid() and tidyr::separate(), based on @akrun's comment.
library(magrittr)
library(tidyr)
vec_a <- LETTERS[1]
vec_b <- LETTERS[2:3]
vec_c <- LETTERS[4:26]
vec_d <- combn(vec_c, 2, FUN = paste, collapse = " ")
res <- expand.grid(vec_a, vec_b, vec_d) %>%
tidyr::separate(Var3, c("Var3","Var4"), " ")
head(res, 25)
#> Var1 Var2 Var3 Var4
#> 1 A B D E
#> 2 A C D E
#> 3 A B D F
#> 4 A C D F
#> 5 A B D G
#> 6 A C D G
#> 7 A B D H
#> 8 A C D H
#> 9 A B D I
#> 10 A C D I
#> 11 A B D J
#> 12 A C D J
#> 13 A B D K
#> 14 A C D K
#> 15 A B D L
#> 16 A C D L
#> 17 A B D M
#> 18 A C D M
#> 19 A B D N
#> 20 A C D N
#> 21 A B D O
#> 22 A C D O
#> 23 A B D P
#> 24 A C D P
#> 25 A B D Q
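For comparison, here is a base R sketch of the same construction that skips the paste-and-separate round trip. It assumes the same A / B-or-C / remaining-letters split as above; pairs, n_pairs and res_base are illustrative names. The rows come out in a different order (all B rows first, then all C rows), and with n remaining elements the result has 2 * choose(n, 2) rows.

```r
vec_c <- LETTERS[4:26]

pairs   <- t(combn(vec_c, 2))   # one row per unordered pair
n_pairs <- nrow(pairs)

res_base <- data.frame(
  Var1 = "A",
  Var2 = rep(c("B", "C"), each = n_pairs),
  Var3 = rep(pairs[, 1], times = 2),
  Var4 = rep(pairs[, 2], times = 2),
  stringsAsFactors = FALSE
)

# sanity check on the number of rows
nrow(res_base) == 2 * choose(length(vec_c), 2)
```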
I'm trying to compute a rolling count of the values in two variables (Var1 and Var2) within each group.
Essentially, I want to get back new_var1 and new_var2 as in this example: a running count that increases every time a value (a, b, and so on) appears in either Var1 or Var2 within a given group (f, g, and so on). So the appearances of a in each group are counted overall, regardless of whether a appears in column Var1 or Var2. However, each count must be assigned to the proper column: if a appears in Var1, the count goes into new_var1; if a appears in Var2, it goes into new_var2.
x <- expand.grid(letters[1:5],letters[1:5],KEEP.OUT.ATTRS = FALSE)
x <- x[x[,1]!=x[,2],c(2,1)]
x <- data.frame(x,group=as.character(rep(letters[c(1,2,1,4,1)+5],each=4)))
x<- data.frame(x,new_var1 = c(1,2,3,4,1,2,3,4,2,3,4,5,1,2,3,4,3,4,5,6))
x<- data.frame(x,new_var2 = c(1,1,1,1,1,1,1,1,5,2,2,2,1,1,1,1,6,3,6,3))
Var2 Var1 group new_var2 new_var1
a b f 1 1
a c f 2 1
a d f 3 1
a e f 4 1
b a g 1 1
b c g 2 1
b d g 3 1
b e g 4 1
c a f 2 5
c b f 3 2
c d f 4 2
c e f 5 2
d a i 1 1
d b i 2 1
d c i 3 1
d e i 4 1
e a f 3 6
e b f 4 3
e c f 5 6
e d f 6 3
Any help would greatly be appreciated.
I managed to get this working:
library(data.table)
x <- data.table(x)
x[, new_var1a := seq(.N) , by = c('Var1','group')]
x[, new_var2a := seq(.N) , by = c('Var2','group')]
Var2 Var1 group new_var2 new_var1 new_var1a new_var2a
1: a b f 1 1 1 1
2: a c f 2 1 1 2
3: a d f 3 1 1 3
4: a e f 4 1 1 4
5: b a g 1 1 1 1
6: b c g 2 1 1 2
7: b d g 3 1 1 3
8: b e g 4 1 1 4
9: c a f 2 5 1 1
10: c b f 3 2 2 2
11: c d f 4 2 2 3
12: c e f 5 2 2 4
13: d a i 1 1 1 1
14: d b i 2 1 1 2
15: d c i 3 1 1 3
16: d e i 4 1 1 4
17: e a f 3 6 2 1
18: e b f 4 3 3 2
19: e c f 5 6 2 3
20: e d f 6 3 3 4
But it treats Var1 and Var2 independently, which is not what I want.
Your problem is more of an algorithmic one, so I would use a loop rather than dplyr or data.table. For me, using loops in R often means using Rcpp, so here is my answer:
// [[Rcpp::depends(BH)]]
#include <Rcpp.h>
#include <boost/foreach.hpp>
using namespace Rcpp;
// the C-style upper-case macro name is a bit ugly
#define foreach BOOST_FOREACH
// [[Rcpp::export]]
ListOf<IntegerVector> new_vars(const IntegerVector& Var1,
const IntegerVector& Var2,
int n_Var,
ListOf<IntegerVector> ind_groups) {
int nrow = Var1.size();
IntegerVector new_var1a(nrow, NA_INTEGER);
IntegerVector new_var2a(nrow, NA_INTEGER);
for (int i = 0; i < ind_groups.size(); i++) {
IntegerVector counts(n_Var);
foreach(const int& j, ind_groups[i]) {
new_var1a[j] = ++counts[Var1[j]];
new_var2a[j] = ++counts[Var2[j]];
}
}
return List::create(Named("new_var1a") = new_var1a,
Named("new_var2a") = new_var2a);
}
/*** R
x <- expand.grid(letters[1:5],letters[1:5],
KEEP.OUT.ATTRS = FALSE,
stringsAsFactors = FALSE)
x <- x[x[,1]!=x[,2],c(2,1)]
x <- data.frame(x,group=as.character(rep(letters[c(1,2,1,4,1)+5],each=4)))
x <- data.frame(x,new_var1 = c(1,2,3,4,1,2,3,4,2,3,4,5,1,2,3,4,3,4,5,6))
x <- data.frame(x,new_var2 = c(1,1,1,1,1,1,1,1,5,2,2,2,1,1,1,1,6,3,6,3))
getNewVars <- function(x) {
Vars.levels <- unique(c(x$Var2, x$Var1))
new_vars <- new_vars(
Var1 = match(x$Var1, Vars.levels) - 1,
Var2 = match(x$Var2, Vars.levels) - 1,
n_Var = length(Vars.levels),
ind_groups = split(seq_along(x$group) - 1, x$group)
)
cbind(x, new_vars)
}
getNewVars(x)
*/
Put this in a ".cpp" file and source it with Rcpp::sourceCpp().
PS: Make sure to use stringsAsFactors = FALSE.
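For readers who want to follow the counting logic without compiling anything, the loop can be sketched in plain R. This is only an illustration of the same idea (one shared counter per group, updated for Var1 and then Var2 on every row); it will be slower than the C++ version on large data, and rolling_counts is a hypothetical helper name.

```r
rolling_counts <- function(var1, var2, group) {
  var1 <- as.character(var1)
  var2 <- as.character(var2)
  lev  <- unique(c(var2, var1))
  new_var1 <- integer(length(var1))
  new_var2 <- integer(length(var2))
  for (g in unique(group)) {
    # one shared counter per group, covering both columns
    counts <- setNames(integer(length(lev)), lev)
    for (i in which(group == g)) {
      counts[var1[i]] <- counts[var1[i]] + 1L
      new_var1[i] <- counts[[var1[i]]]
      counts[var2[i]] <- counts[var2[i]] + 1L
      new_var2[i] <- counts[[var2[i]]]
    }
  }
  data.frame(new_var1a = new_var1, new_var2a = new_var2)
}

x <- expand.grid(letters[1:5], letters[1:5],
                 KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
x <- x[x[, 1] != x[, 2], c(2, 1)]
x$group <- rep(letters[c(1, 2, 1, 4, 1) + 5], each = 4)

res <- cbind(x, rolling_counts(x$Var1, x$Var2, x$group))
res
```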
A solution with dplyr: first reshape the data from wide to long format, keeping a row id so that the counts can be joined back to the original rows afterwards.
Sample data
df = read.table(text=" Var2 Var1 group new_var2 new_var1
a b f 1 1
a c f 2 1
a d f 3 1
a e f 4 1
b a g 1 1
b c g 2 1
b d g 3 1
b e g 4 1
c a f 2 5
c b f 3 2
c d f 4 2
c e f 5 2
d a i 1 1
d b i 2 1
d c i 3 1
d e i 4 1
e a f 3 6
e b f 4 3
e c f 5 6
e d f 6 3",header=T)
df = df[,c("Var2","Var1","group")]
Code
library(reshape2)
library(dplyr)
df$id = seq(1,nrow(df))
df2 = melt(df, id.vars=c("id", "group")) %>% arrange(id)
df2 = df2 %>% group_by(group,value) %>% mutate(n= row_number())
df = df %>% left_join(df2[df2$variable=="Var1",c("id","n")], by="id")
df = df %>% left_join(df2[df2$variable=="Var2",c("id","n")], by="id")
colnames(df)[colnames(df)=="n.x"]="new_var1"
colnames(df)[colnames(df)=="n.y"]="new_var2"
Optionally add df2 = df2 %>% group_by(group,value,id) %>% mutate(n=max(n)) if a single row can contain the same value in both Var1 and Var2 (which is not the case in your example).
Output
Var2 Var1 group id new_var1 new_var2
1 a b f 1 1 1
2 a c f 2 1 2
3 a d f 3 1 3
4 a e f 4 1 4
5 b a g 5 1 1
6 b c g 6 1 2
7 b d g 7 1 3
8 b e g 8 1 4
9 c a f 9 5 2
10 c b f 10 2 3
11 c d f 11 2 4
12 c e f 12 2 5
13 d a i 13 1 1
14 d b i 14 1 2
15 d c i 15 1 3
16 d e i 16 1 4
17 e a f 17 6 3
18 e b f 18 3 4
19 e c f 19 6 5
20 e d f 20 3 6
Hope this helps!
The dcast() function from the data.table package allows us to reshape multiple value variables simultaneously. This can be used to avoid the double left join in Florian's answer:
library(data.table)
long <- melt(setDT(x)[, rn := .I], id.vars = c("rn", "group"),
measure.vars = c("Var1", "Var2"), value.name = "Var")[
, variable := rleid(variable)][
order(rn), new_var := rowid(group, Var)][]
dcast(long, rn + group ~ ..., value.var = c("Var", "new_var"))[, rn := NULL][]
group Var_1 Var_2 new_var_1 new_var_2
1: f b a 1 1
2: f c a 1 2
3: f d a 1 3
4: f e a 1 4
5: g a b 1 1
6: g c b 1 2
7: g d b 1 3
8: g e b 1 4
9: f a c 5 2
10: f b c 2 3
11: f d c 2 4
12: f e c 2 5
13: i a d 1 1
14: i b d 1 2
15: i c d 1 3
16: i e d 1 4
17: f a e 6 3
18: f b e 3 4
19: f c e 6 5
20: f d e 3 6
Explanation
setDT(x) coerces x to data.table, then a column with row numbers is added before reshaping from wide to long format. Just to get nicer looking column names from the subsequent dcast(), the variables are renamed (for this [, variable := sub("Var", "", variable)] can be used as alternative to [, variable := rleid(variable)]).
The important step is the numbering of appearances of each Var within each group using rowid() grouped by group and Var.
Now, the result has two value columns. Finally, it is reshaped back from long to wide format again, and the rn column is removed as no longer needed.
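The three steps explained above (wide to long, number the appearances, back to wide) can also be sketched in base R, which may make the mechanics easier to see. This is not the data.table implementation, just an analogue under the assumption that rows are processed in their original order; long, slot and count are illustrative names, and ave() with seq_along plays the role of rowid().

```r
x <- expand.grid(letters[1:5], letters[1:5],
                 KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
x <- x[x[, 1] != x[, 2], c(2, 1)]
x$group <- rep(letters[c(1, 2, 1, 4, 1) + 5], each = 4)
n <- nrow(x)

# wide to long: two consecutive entries per original row (Var1, then Var2)
long <- data.frame(
  rn    = rep(seq_len(n), each = 2),
  slot  = rep(c("Var1", "Var2"), times = n),
  value = as.vector(t(as.matrix(x[, c("Var1", "Var2")]))),
  group = rep(x$group, each = 2),
  stringsAsFactors = FALSE
)

# number the appearances of each value within each group
long$count <- ave(seq_len(nrow(long)), long$group, long$value, FUN = seq_along)

# long to wide: pick the counts back out per original column
x$new_var_1 <- long$count[long$slot == "Var1"]
x$new_var_2 <- long$count[long$slot == "Var2"]
x
```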
Data
x <- expand.grid(letters[1:5], letters[1:5], KEEP.OUT.ATTRS = FALSE)
x <- x[x[, 1] != x[, 2], c(2, 1)]
x <- data.frame(
x,
group = as.character(rep(letters[c(1, 2, 1, 4, 1) + 5], each = 4)),
new_var1 = c(1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 5, 1, 2, 3, 4, 3, 4, 5, 6),
new_var2 = c(1, 1, 1, 1, 1, 1, 1, 1, 5, 2, 2, 2, 1, 1, 1, 1, 6, 3, 6, 3))