This is my sample dataset:
vector1 <-
data.frame(
"name" = "a",
"age" = 10,
"fruit" = c("orange", "cherry", "apple"),
"count" = c(1, 1, 1),
"tag" = c(1, 1, 2)
)
vector2 <-
data.frame(
"name" = "b",
"age" = 33,
"fruit" = c("apple", "mango"),
"count" = c(1, 1),
"tag" = c(2, 2)
)
vector3 <-
data.frame(
"name" = "c",
"age" = 58,
"fruit" = c("cherry", "apple"),
"count" = c(1, 1),
"tag" = c(1, 1)
)
list <- list(vector1, vector2, vector3)
print(list)
This is my test:
default <- c("cherry",
"orange",
"apple",
"mango")
for (num in 1:length(list)) {
#print(list[[num]])
list[[num]] <- rbind(
list[[num]],
data.frame(
"name" = list[[num]]$name,
"age" = list[[num]]$age,
"fruit" = setdiff(default, list[[num]]$fruit),#add missed value
"count" = 0,
"tag" = 1 #not found solutions
)
)
print(paste0("--------------", num, "--------"))
print(list)
}
#print(list)
I'm trying to find which fruits are missing from each data frame, per tag value. For example, the first data frame has tags 1 and 2. If tag 1 lacks some of the default fruits, such as apple and mango, the missing fruits should be added to the data frame with a count of 0. The expected output looks like this:
[[1]]
name age fruit count tag
1 a 10 orange 1 1
2 a 10 cherry 1 1
3 a 10 apple 1 2
4 a 10 mango 0 1
5 a 10 apple 0 1
6 a 10 mango 0 2
7 a 10 orange 0 2
8 a 10 cherry 0 2
When I step through the loop, I also see that the first iteration adds mango 3 times, and I can't work out why it doesn't add each missing value just once. The overall output looks like this:
[[1]]
name age fruit count tag
1 a 10 orange 1 1
2 a 10 cherry 1 1
3 a 10 apple 1 2
4 a 10 mango 0 1
5 a 10 mango 0 1
6 a 10 mango 0 1
[[2]]
name age fruit count tag
1 b 33 apple 1 2
2 b 33 mango 1 2
3 b 33 cherry 0 1
4 b 33 orange 0 1
[[3]]
name age fruit count tag
1 c 58 cherry 1 1
2 c 58 apple 1 1
3 c 58 orange 0 1
4 c 58 mango 0 1
Can anyone help me and suggest a simple method or another approach? Should I use the sqldf function to add the 0 values? Is there a simple way to solve my problem?
Consider base R methods (lapply, expand.grid, transform, rbind, aggregate) that append all possible fruit and tag combinations to each data frame and keep the max count.
new_list <- lapply(list, function(df) {
fruit_tag_df <- transform(expand.grid(fruit=c("apple", "cherry", "mango", "orange"),
tag=c(1,2)),
name = df$name[1],
age = df$age[1],
count = 0)
aggregate(.~name + age + fruit + tag, rbind(df, fruit_tag_df), FUN=max)
})
Output
new_list
# [[1]]
# name age fruit tag count
# 1 a 10 apple 1 0
# 2 a 10 cherry 1 1
# 3 a 10 orange 1 1
# 4 a 10 mango 1 0
# 5 a 10 apple 2 1
# 6 a 10 cherry 2 0
# 7 a 10 orange 2 0
# 8 a 10 mango 2 0
# [[2]]
# name age fruit tag count
# 1 b 33 apple 1 0
# 2 b 33 mango 1 0
# 3 b 33 cherry 1 0
# 4 b 33 orange 1 0
# 5 b 33 apple 2 1
# 6 b 33 mango 2 1
# 7 b 33 cherry 2 0
# 8 b 33 orange 2 0
# [[3]]
# name age fruit tag count
# 1 c 58 apple 1 1
# 2 c 58 cherry 1 1
# 3 c 58 mango 1 0
# 4 c 58 orange 1 0
# 5 c 58 apple 2 0
# 6 c 58 cherry 2 0
# 7 c 58 mango 2 0
# 8 c 58 orange 2 0
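On the side question of why the original loop added mango three times: a likely explanation is vector recycling in data.frame(). The following is a minimal sketch on a cut-down copy of the first data frame (the names missing_fruit, recycled and fixed are only illustrative). Because list[[num]]$name has three elements while setdiff() returns a single fruit, the single fruit is repeated to match the longer column.

```r
# Cut-down copy of the first data frame from the question
df <- data.frame(name = "a", age = 10,
                 fruit = c("orange", "cherry", "apple"),
                 stringsAsFactors = FALSE)
default <- c("cherry", "orange", "apple", "mango")

# Only "mango" is missing from this data frame
missing_fruit <- setdiff(default, df$fruit)

# df$name has 3 elements, so data.frame() recycles "mango" to 3 rows
recycled <- data.frame(name = df$name, fruit = missing_fruit)

# Taking a single name gives one row per missing fruit
fixed <- data.frame(name = df$name[1], fruit = missing_fruit)

nrow(recycled)
nrow(fixed)
```

Using df$name[1] and df$age[1], as the answer above does, avoids the repetition.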
The OP has requested to complete each data.frame in list so that all combinations of default fruit and tags 1:2 appear in the result, with count set to 0 for the additional rows. Each data.frame should then consist of at least 4 x 2 = 8 rows.
I want to propose two different approaches:
1. Using lapply() and the CJ() (cross join) function from data.table to return a list.
2. Combining the separate data.frames in list into one large data.table using rbindlist() and applying the required transformations to the whole data.table.
Using lapply() and CJ()
library(data.table)
lapply(list, function(x) setDT(x)[
CJ(name = name, age = age, fruit = default, tag = 1:2, unique = TRUE),
on = .(name, age, fruit, tag)][
is.na(count), count := 0][order(-count, tag)]
)
[[1]]
name age fruit count tag
1: a 10 cherry 1 1
2: a 10 orange 1 1
3: a 10 apple 1 2
4: a 10 apple 0 1
5: a 10 mango 0 1
6: a 10 cherry 0 2
7: a 10 mango 0 2
8: a 10 orange 0 2
[[2]]
name age fruit count tag
1: b 33 apple 1 2
2: b 33 mango 1 2
3: b 33 apple 0 1
4: b 33 cherry 0 1
5: b 33 mango 0 1
6: b 33 orange 0 1
7: b 33 cherry 0 2
8: b 33 orange 0 2
[[3]]
name age fruit count tag
1: c 58 apple 1 1
2: c 58 cherry 1 1
3: c 58 mango 0 1
4: c 58 orange 0 1
5: c 58 apple 0 2
6: c 58 cherry 0 2
7: c 58 mango 0 2
8: c 58 orange 0 2
Ordering by count and tag is not required but helps to compare the result with OP's expected output.
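For readers without data.table, the same idea (build every fruit/tag combination, then left join the existing rows onto it) can be sketched in base R with expand.grid() and merge(). This is only an illustration on the second data frame, not part of the data.table solution; the names grid and res are arbitrary.

```r
# Second data frame from the question
df <- data.frame(name = "b", age = 33,
                 fruit = c("apple", "mango"),
                 count = c(1, 1), tag = c(2, 2),
                 stringsAsFactors = FALSE)
default <- c("cherry", "orange", "apple", "mango")

# All combinations of default fruit and tag for this name/age
grid <- expand.grid(name = unique(df$name), age = unique(df$age),
                    fruit = default, tag = c(1, 2),
                    stringsAsFactors = FALSE)

# Left join: keep every combination, pull in count where it exists
res <- merge(grid, df, by = c("name", "age", "fruit", "tag"), all.x = TRUE)

# Combinations that were not in df get a count of 0
res$count[is.na(res$count)] <- 0

# Same ordering as above: existing rows first, then by tag
res <- res[order(-res$count, res$tag), ]
res
```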
Creating one large data.table
Instead of a list of data.frames with identical structure we can use one large data.table where the origin of each row can be identified by an id column.
Indeed, the OP has asked other questions ("using lapply function and list in r"
and "how to loop the dataframe using sqldf?") asking for help in handling a list of data.frames. G. Grothendieck had already suggested rbinding the rows together.
The rbindlist() function has the idcol parameter which identifies the origin of each row:
library(data.table)
rbindlist(list, idcol = "df")
df name age fruit count tag
1: 1 a 10 orange 1 1
2: 1 a 10 cherry 1 1
3: 1 a 10 apple 1 2
4: 2 b 33 apple 1 2
5: 2 b 33 mango 1 2
6: 3 c 58 cherry 1 1
7: 3 c 58 apple 1 1
Note that df contains the number of the source data.frame in list (or the names of the list elements if list is named).
Now, we can apply above solution by grouping over df:
rbindlist(list, idcol = "df")[, .SD[
CJ(name = name, age = age, fruit = default, tag = 1:2, unique = TRUE),
on = .(name, age, fruit, tag)], by = df][
is.na(count), count := 0][order(df, -count, tag)]
df name age fruit count tag
1: 1 a 10 cherry 1 1
2: 1 a 10 orange 1 1
3: 1 a 10 apple 1 2
4: 1 a 10 apple 0 1
5: 1 a 10 mango 0 1
6: 1 a 10 cherry 0 2
7: 1 a 10 mango 0 2
8: 1 a 10 orange 0 2
9: 2 b 33 apple 1 2
10: 2 b 33 mango 1 2
11: 2 b 33 apple 0 1
12: 2 b 33 cherry 0 1
13: 2 b 33 mango 0 1
14: 2 b 33 orange 0 1
15: 2 b 33 cherry 0 2
16: 2 b 33 orange 0 2
17: 3 c 58 apple 1 1
18: 3 c 58 cherry 1 1
19: 3 c 58 mango 0 1
20: 3 c 58 orange 0 1
21: 3 c 58 apple 0 2
22: 3 c 58 cherry 0 2
23: 3 c 58 mango 0 2
24: 3 c 58 orange 0 2
df name age fruit count tag
A solution using dplyr and tidyr. We can use complete to expand the data frame and set the fill value for count to 0.
Notice that I changed your list name from list to fruit_list because it is bad practice to name an object after an existing R function such as list(). Also notice that when I created the example data frame I set stringsAsFactors = FALSE because I don't want to create factor columns. Finally, I used lapply instead of a for-loop to loop through the list elements.
library(dplyr)
library(tidyr)
fruit_list2 <- lapply(fruit_list, function(x){
x2 <- x %>%
complete(name, age, fruit = default, tag = c(1, 2), fill = list(count = 0)) %>%
select(name, age, fruit, count, tag) %>%
arrange(tag, fruit) %>%
as.data.frame()
return(x2)
})
fruit_list2
# [[1]]
# name age fruit count tag
# 1 a 10 apple 0 1
# 2 a 10 cherry 1 1
# 3 a 10 mango 0 1
# 4 a 10 orange 1 1
# 5 a 10 apple 1 2
# 6 a 10 cherry 0 2
# 7 a 10 mango 0 2
# 8 a 10 orange 0 2
#
# [[2]]
# name age fruit count tag
# 1 b 33 apple 0 1
# 2 b 33 cherry 0 1
# 3 b 33 mango 0 1
# 4 b 33 orange 0 1
# 5 b 33 apple 1 2
# 6 b 33 cherry 0 2
# 7 b 33 mango 1 2
# 8 b 33 orange 0 2
#
# [[3]]
# name age fruit count tag
# 1 c 58 apple 1 1
# 2 c 58 cherry 1 1
# 3 c 58 mango 0 1
# 4 c 58 orange 0 1
# 5 c 58 apple 0 2
# 6 c 58 cherry 0 2
# 7 c 58 mango 0 2
# 8 c 58 orange 0 2
DATA
vector1 <-
data.frame(
"name" = "a",
"age" = 10,
"fruit" = c("orange", "cherry", "apple"),
"count" = c(1, 1, 1),
"tag" = c(1, 1, 2),
stringsAsFactors = FALSE
)
vector2 <-
data.frame(
"name" = "b",
"age" = 33,
"fruit" = c("apple", "mango"),
"count" = c(1, 1),
"tag" = c(2, 2),
stringsAsFactors = FALSE
)
vector3 <-
data.frame(
"name" = "c",
"age" = 58,
"fruit" = c("cherry", "apple"),
"count" = c(1, 1),
"tag" = c(1, 1),
stringsAsFactors = FALSE
)
fruit_list <- list(vector1, vector2, vector3)
default <- c("cherry", "orange", "apple", "mango")
Related
I have a data like below:
V1 V2
1 orange, apple
2 orange, lemon
3 lemon, apple
4 orange, lemon, apple
5 lemon
6 apple
7 orange
8 lemon, apple
I want to split the V2 variable like this:
I have three categories of the V2 column: "orange", "lemon", "apple"
for each of the categories I want to create a new column (variable) that will inform about whether such a name appeared in V2 (0,1)
I tried this
df %>% separate(V2, into = c("orange", "lemon", "apple"))
.. and I got this result, but it's not what I expect.
V1 orange lemon apple
1 1 orange apple <NA>
2 2 orange lemon <NA>
3 3 lemon apple <NA>
4 4 orange lemon apple
5 5 lemon <NA> <NA>
6 6 apple <NA> <NA>
7 7 orange <NA> <NA>
8 8 lemon apple <NA>
The result I mean is below.
V1 orange lemon apple
1 1 0 1
2 1 1 0
3 0 1 1
4 1 1 1
5 0 1 0
6 0 0 1
7 1 0 0
8 0 1 1
You could try pivoting:
library(dplyr)
library(tidyr)
df |>
separate_rows(V2, sep = ", ") |>
mutate(ind = 1) |>
pivot_wider(names_from = V2,
values_from = ind,
values_fill = 0)
Output is:
# A tibble: 8 × 4
V1 orange apple lemon
<int> <dbl> <dbl> <dbl>
1 1 1 1 0
2 2 1 0 1
3 3 0 1 1
4 4 1 1 1
5 5 0 0 1
6 6 0 1 0
7 7 1 0 0
8 8 0 1 1
data I used:
V1 <- 1:8
V2 <- c("orange, apple", "orange, lemon",
"lemon, apple", "orange, lemon, apple",
"lemon", "apple", "orange",
"lemon, apple")
df <- tibble(V1, V2)
We may use dummy_cols
library(stringr)
library(fastDummies)
library(dplyr)
dummy_cols(df, "V2", split = ",\\s+", remove_selected_columns = TRUE) %>%
rename_with(~ str_remove(.x, '.*_'))
-output
# A tibble: 8 × 4
V1 apple lemon orange
<int> <int> <int> <int>
1 1 1 0 1
2 2 0 1 1
3 3 1 1 0
4 4 1 1 1
5 5 0 1 0
6 6 1 0 0
7 7 0 0 1
8 8 1 1 0
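If adding a package is not an option, the same indicator columns can be sketched in base R: split each string on the comma and test each category for membership. This is only an illustrative alternative; cats, parts, ind and res are names chosen for the sketch.

```r
V2 <- c("orange, apple", "orange, lemon",
        "lemon, apple", "orange, lemon, apple",
        "lemon", "apple", "orange",
        "lemon, apple")
cats <- c("orange", "lemon", "apple")

# Split every entry into its individual fruit names
parts <- strsplit(V2, ",\\s*")

# One row per entry, one 0/1 column per category
ind <- t(sapply(parts, function(p) as.integer(cats %in% p)))
colnames(ind) <- cats

res <- data.frame(V1 = seq_along(V2), ind)
res
```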
I'm trying to rank certain groups by their counts using dense_rank, but it doesn't give distinct ranks to groups that are tied. And none of the ranking functions I tried that have some sort of ties.method gives me the rankings in consecutive 1, 2, 3 order. Example:
library(dplyr)
id <- c(rep(1, 8),
rep(2, 8))
fruit <- c(rep('apple', 4), rep('orange', 1), rep('banana', 2), 'orange',
rep('orange', 4), rep('banana', 1), rep('apple', 2), 'banana')
df <- data.frame(id, fruit, stringsAsFactors = FALSE)
df2 <- df %>%
mutate(counter = 1) %>%
group_by(id, fruit) %>%
mutate(fruitCnt = sum(counter)) %>%
ungroup() %>%
group_by(id) %>%
mutate(fruitCntRank = dense_rank(desc(fruitCnt))) %>%
select(id, fruit, fruitCntRank)
df2
id fruit fruitCntRank
1 1 apple 1
2 1 apple 1
3 1 apple 1
4 1 apple 1
5 1 orange 2
6 1 banana 2
7 1 banana 2
8 1 orange 2
9 2 orange 1
10 2 orange 1
11 2 orange 1
12 2 orange 1
13 2 banana 2
14 2 apple 2
15 2 apple 2
16 2 banana 2
It doesn't matter which of orange or banana are ranked 3, and it doesn't even need to be consistent. I just need the groups to be ranked 1, 2, 3.
Desired result:
id fruit fruitCntRank
1 1 apple 1
2 1 apple 1
3 1 apple 1
4 1 apple 1
5 1 orange 2
6 1 banana 3
7 1 banana 3
8 1 orange 2
9 2 orange 1
10 2 orange 1
11 2 orange 1
12 2 orange 1
13 2 banana 2
14 2 apple 3
15 2 apple 3
16 2 banana 2
We can add count for each id and fruit combination, arrange them in descending order of count and get the rank using match.
library(dplyr)
df %>%
add_count(id, fruit) %>%
arrange(id, desc(n)) %>%
group_by(id) %>%
mutate(n = match(fruit, unique(fruit)))
#Another option with cumsum and duplicated; it needs equal fruits to be adjacent,
#so sort with arrange(id, desc(n), fruit) first
#mutate(n = cumsum(!duplicated(fruit)))
# id fruit n
# <dbl> <chr> <int>
# 1 1 apple 1
# 2 1 apple 1
# 3 1 apple 1
# 4 1 apple 1
# 5 1 orange 2
# 6 1 banana 3
# 7 1 banana 3
# 8 1 orange 2
# 9 2 orange 1
#10 2 orange 1
#11 2 orange 1
#12 2 orange 1
#13 2 banana 2
#14 2 apple 3
#15 2 apple 3
#16 2 banana 2
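A small base R sketch of the ranking step, using the fruit order of id 1 after sorting by descending count (the vector fruit_sorted is typed in by hand for illustration). match() against unique() gives every row of the same fruit the same rank even when tied fruits are interleaved, whereas a running count of first appearances does so only when equal fruits sit next to each other.

```r
# Fruits of id 1 after sorting by descending count; orange and banana are tied
fruit_sorted <- c("apple", "apple", "apple", "apple",
                  "orange", "banana", "banana", "orange")

# Rank = position of each fruit among the distinct fruits, in order of appearance
rank_match <- match(fruit_sorted, unique(fruit_sorted))

# Running count of first appearances; the trailing "orange" inherits banana's rank
rank_cumsum <- cumsum(!duplicated(fruit_sorted))

rank_match
rank_cumsum
```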
I would like to create a new column that sequentially labels groups of rows. Original data:
> dt = data.table(index=(1:10), group = c("apple","apple","orange","orange","orange","orange","apple","apple","orange","apple"))
> dt
index group
1: 1 apple
2: 2 apple
3: 3 orange
4: 4 orange
5: 5 orange
6: 6 orange
7: 7 apple
8: 8 apple
9: 9 orange
10: 10 apple
Desired output:
index group id
1: 1 apple 1
2: 2 apple 1
3: 3 orange 1
4: 4 orange 1
5: 5 orange 1
6: 6 orange 1
7: 7 apple 2
8: 8 apple 2
9: 9 orange 2
10: 10 apple 3
dplyr attempt:
dt %>% group_by(group) %>% mutate( id= row_number())
# A tibble: 10 x 3
# Groups: group [2]
index group id
<int> <chr> <int>
1 1 apple 1
2 2 apple 2
3 3 orange 1
4 4 orange 2
5 5 orange 3
6 6 orange 4
7 7 apple 3
8 8 apple 4
9 9 orange 5
10 10 apple 5
How can I edit this to get the first group of apples as 1, then the first group of oranges as 1, then the second group of apples as 2, etc. (see desired output above)? I'm also open to a data.table solution.
library(data.table)
dt[, id := cumsum(c(TRUE, diff(index) > 1)), by="group"]
dt
# index group id
# 1: 1 apple 1
# 2: 2 apple 1
# 3: 3 orange 1
# 4: 4 orange 1
# 5: 5 orange 1
# 6: 6 orange 1
# 7: 7 apple 2
# 8: 8 apple 2
# 9: 9 orange 2
# 10: 10 apple 3
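To see why this works, here is a base R sketch of the same computation on plain vectors. Within one group, a gap larger than 1 between neighbouring index values marks the start of a new run, and cumsum() turns those markers into run numbers. Note the assumption that index holds consecutive row numbers, as it does in the question; the names apple_idx, new_run and apple_id are only illustrative.

```r
index <- 1:10
group <- c("apple", "apple", "orange", "orange", "orange",
           "orange", "apple", "apple", "orange", "apple")

# Row positions of the apples: 1 2 7 8 10
apple_idx <- index[group == "apple"]

# TRUE wherever a new run of apples starts (the first row always does)
new_run <- c(TRUE, diff(apple_idx) > 1)

# Running total of run starts gives the run number
apple_id <- cumsum(new_run)

# The same computation for every group at once, in the original row order
id <- ave(index, group, FUN = function(i) cumsum(c(TRUE, diff(i) > 1)))
id
```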
Starting from original dt:
library(dplyr)
dt %>%
group_by(group) %>%
mutate(id = cumsum(c(TRUE, diff(index) > 1))) %>%
ungroup()
# # A tibble: 10 x 3
# index group id
# <int> <chr> <int>
# 1 1 apple 1
# 2 2 apple 1
# 3 3 orange 1
# 4 4 orange 1
# 5 5 orange 1
# 6 6 orange 1
# 7 7 apple 2
# 8 8 apple 2
# 9 9 orange 2
# 10 10 apple 3
Base R, perhaps a little clunky:
out <- do.call(rbind, by(dt, dt$group,
function(x) transform(x, id = cumsum(c(TRUE, diff(index) > 1)))))
out[order(out$index),]
# index group id
# apple.1 1 apple 1
# apple.2 2 apple 1
# orange.3 3 orange 1
# orange.4 4 orange 1
# orange.5 5 orange 1
# orange.6 6 orange 1
# apple.7 7 apple 2
# apple.8 8 apple 2
# orange.9 9 orange 2
# apple.10 10 apple 3
The names can be removed easily with rownames(out) <- NULL. The order part isn't necessary, but I wanted to present it in the same order as the other solutions, and do.call/by does not preserve the original order.
Another option using data.table::rleid twice:
dt[, gid := rleid(group)][, id := rleid(gid), .(group)]
We can also use rle from base R
with(dt, with(rle(group), rep(ave(seq_along(values),
values, FUN = seq_along), lengths)))
#[1] 1 1 1 1 1 1 2 2 2 3
I have a data frame, and my goal is to find patterns in the combinations of var1 by ID: if two IDs have at least 3 categories in common, we set "Yes", and then record which IDs share the combination.
ID1: I have 4 unique categories (A,B,C,D)
ID2: I have 4 unique categories (B,C,D,F)
ID3: I have 3 unique categories (A,B,C)
ID4: I have 2 unique categories (A,B)
ID5: I have 3 unique categories (C,D,F)
We can see that ID1 and ID2 have at least 3 categories in common (B,C,D), ID1 and ID3 have (A,B,C), and ID2 and ID5 have (C,D,F). So 4 IDs will have "Yes"; only ID4 gets "No".
ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,5,5,5,5,5)
var1 <- c("A","B","C","A","D","D","C","D","B","F","A","B","C","C",
"A","B","D","D","C","C","F")
df <- data.frame(ID,var1)
ID var1
1 1 A
2 1 B
3 1 C
4 1 A
5 1 D
6 2 D
7 2 C
8 2 D
9 2 B
10 2 F
11 3 A
12 3 B
13 3 C
14 3 C
15 4 A
16 4 B
17 5 D
18 5 D
19 5 C
20 5 C
21 5 F
My output will be
ID var1 var2 var3
1 1 A Yes 1-2
2 1 B Yes 1-2
3 1 C Yes 1-2
4 1 A Yes 1-2
5 1 D Yes 1-2
6 2 D Yes 1-2
7 2 C Yes 1-2
8 2 D Yes 1-2
9 2 B Yes 1-2
10 2 F Yes 1-2
11 3 A Yes 1-3
12 3 B Yes 1-3
13 3 C Yes 1-3
14 3 C Yes 1-3
15 4 A No 4
16 4 B No 4
17 5 D Yes 2-5
18 5 D Yes 2-5
19 5 C Yes 2-5
20 5 C Yes 2-5
21 5 F Yes 2-5
Thanks in advance.
The problem is essentially one of constructing an adjacency table based on common memberships, e.g. Working with Bipartite/Affiliation Network Data in R. To do that, we make a table out of the data (after eliminating duplicates), and then take the cross-product.
dd <- unique(df)
tab <- table(dd)
dd <- crossprod(t(tab))
diag(dd) <- 0
# ID
# ID 1 2 3 4 5
# 1 0 3 3 2 2
# 2 3 0 2 1 3
# 3 3 2 0 2 1
# 4 2 1 2 0 0
# 5 2 3 1 0 0
The table above allows us to see the number of categories that IDs share. Now we just have to go through the rows; for each row, I select the first ID that has a value of at least 3 (matched).
matched <- apply(dd >= 3, MAR = 1, function(x) which(x == TRUE)[1])
# 1 2 3 4 5
# 2 1 1 NA 2
So "1" matched with "2", "2" matched with "1", "3" matched with "1", "4" has no matches, "5" matched with "2". Finish off by manipulating this output to get the desired final product:
out <- apply(cbind(as.numeric(names(matched)), matched), MAR = 1, function(x) {
if (any(is.na(x))) {
data.frame(var2 = "No", var3 = x[1])
} else {
data.frame(var2 = "Yes", var3 = paste(sort(x), collapse = "-"))
}
})
out <- plyr::ldply(out, .id = "ID")
merge(df, out, all.x = TRUE)
# ID var1 var2 var3
# 1 1 A Yes 1-2
# 2 1 B Yes 1-2
# 3 1 C Yes 1-2
# 4 1 A Yes 1-2
# 5 1 D Yes 1-2
# 6 2 D Yes 1-2
# 7 2 C Yes 1-2
# 8 2 D Yes 1-2
# 9 2 B Yes 1-2
# 10 2 F Yes 1-2
# 11 3 A Yes 1-3
# 12 3 B Yes 1-3
# 13 3 C Yes 1-3
# 14 3 C Yes 1-3
# 15 4 A No 4
# 16 4 B No 4
# 17 5 D Yes 2-5
# 18 5 D Yes 2-5
# 19 5 C Yes 2-5
# 20 5 C Yes 2-5
# 21 5 F Yes 2-5
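As a sanity check on the cross-product step, the following sketch rebuilds the shared-category matrix from the question's data and compares one cell with a direct set intersection. The names shared, cats and manual are only illustrative.

```r
ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,5,5,5,5,5)
var1 <- c("A","B","C","A","D","D","C","D","B","F","A","B","C","C",
          "A","B","D","D","C","C","F")
df <- data.frame(ID, var1)

# 0/1 incidence table: rows are IDs, columns are categories
tab <- table(unique(df))

# tab %*% t(tab): cell [i, j] counts the categories shared by IDs i and j
shared <- crossprod(t(tab))

# The same number for IDs 1 and 2, computed directly
cats <- split(df$var1, df$ID)
manual <- length(intersect(cats[["1"]], cats[["2"]]))

shared["1", "2"]
manual
```

The diagonal of shared holds each ID's own number of distinct categories, which is why the answer zeroes it before looking for matches.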
I have a data like the following and I would like to convert it into long format.
id count a1 b1 c1 a2 b2 c2 a3 b3 c3 age
1 1 apple 2 3 orange 3 2 beer 2 1 50
1 2 orange 3 2 apple 2 2 beer 2 1 50
2 1 pear 3 2 apple 2 2 orange 2 2 45
[a1,b1,c1], [a2,b2,c2], [a3,b3,c3] are sets of three attributes that the person with the assigned id is facing. A person may face multiple choice situations, with count indicating the ith choice situation. I want to convert the data to long format while keeping the other variables, like the following:
id count a b c age
1 1 apple 2 3 50
1 1 orange 3 2 50
1 1 beer 2 1 50
1 2 orange 3 2 50
1 2 apple 2 2 50
1 2 beer 2 1 50
2 1 pear 3 2 45
2 1 apple 2 2 45
2 1 orange 2 2 45
I have tried reshape with the following commands, but I get confused in terms of where to deal with timevar and times:
l <- reshape(df,
varying = df[,3:11],
v.names = c("a","b","c"),
timevar = "choice",
times = c("a","b","c"),
direction = "long")
With the above commands, I cannot get the result I want. I would sincerely appreciate any help!
Use the melt function from data.table package:
library(data.table)
setDT(df)
melt(df, id.vars = c('id', 'count', 'age'),
measure = patterns('a\\d', 'b\\d', 'c\\d'),
# this needs to be regular expression to group `a1, a2, a3` etc together and
# the `\\d` is necessary because you have an age variable in the column.
value.name = c('a', 'b', 'c'))[, variable := NULL][order(id, count, -age)]
# id count age a b c
# 1: 1 1 50 apple 2 3
# 2: 1 1 50 orange 3 2
# 3: 1 1 50 beer 2 1
# 4: 1 2 50 orange 3 2
# 5: 1 2 50 apple 2 2
# 6: 1 2 50 beer 2 1
# 7: 2 1 45 pear 3 2
# 8: 2 1 45 apple 2 2
# 9: 2 1 45 orange 2 2
To use the reshape function, you just have to adjust the varying argument. It can be a list and you want to put the variables that will make up the same column together as vectors in a list:
reshape(df,
idvar=c("id", "count", "age"),
varying = list(c(3,6,9), c(4,7,10), c(5,8,11)),
timevar="time",
v.names=c("a", "b", "c"),
direction = "long")
This returns
id count age time a b c
1.1.50.1 1 1 50 1 apple 2 3
1.2.50.1 1 2 50 1 orange 3 2
2.1.45.1 2 1 45 1 pear 3 2
1.1.50.2 1 1 50 2 orange 3 2
1.2.50.2 1 2 50 2 apple 2 2
2.1.45.2 2 1 45 2 apple 2 2
1.1.50.3 1 1 50 3 beer 2 1
1.2.50.3 1 2 50 3 beer 2 1
2.1.45.3 2 1 45 3 orange 2 2
I also added in the idvars as I think this is usually good practice for others or for re-reading your old code.
data
df <- read.table(header=T, text="id count a1 b1 c1 a2 b2 c2 a3 b3 c3 age
1 1 apple 2 3 orange 3 2 beer 2 1 50
1 2 orange 3 2 apple 2 2 beer 2 1 50
2 1 pear 3 2 apple 2 2 orange 2 2 45")
We can use dplyr/tidyr
library(dplyr)
library(tidyr)
gather(df, Var, Val, a1:c3) %>%
extract(Var, into = c("Var1", "Var2"), "(.)(.)") %>%
spread(Var1, Val) %>%
select(-Var2)
# id count age a b c
#1 1 1 50 apple 2 3
#2 1 1 50 orange 3 2
#3 1 1 50 beer 2 1
#4 1 2 50 orange 3 2
#5 1 2 50 apple 2 2
#6 1 2 50 beer 2 1
#7 2 1 45 pear 3 2
#8 2 1 45 apple 2 2
#9 2 1 45 orange 2 2
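What all of these answers do, in effect, is stack each (a, b, c) triple as its own block of rows. A base R sketch that spells this out without reshape() or any package is shown below; wide, sets and long are illustrative names, and the column order follows the desired output in the question.

```r
wide <- read.table(header = TRUE, text = "id count a1 b1 c1 a2 b2 c2 a3 b3 c3 age
1 1 apple 2 3 orange 3 2 beer 2 1 50
1 2 orange 3 2 apple 2 2 beer 2 1 50
2 1 pear 3 2 apple 2 2 orange 2 2 45")

# One data frame per attribute set (a1/b1/c1, a2/b2/c2, a3/b3/c3),
# each renamed to the common column names
sets <- lapply(1:3, function(i) {
  out <- wide[, c("id", "count", paste0(c("a", "b", "c"), i), "age")]
  names(out) <- c("id", "count", "a", "b", "c", "age")
  out
})

# Stack the three sets and restore the id/count grouping;
# order() is stable, so sets stay in 1, 2, 3 order within each group
long <- do.call(rbind, sets)
long <- long[order(long$id, long$count), ]
rownames(long) <- NULL
long
```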