Repeat dataframe with new column in R

I have a dataframe:
my_df <- data.frame(var1 = c(1,2,3,4,5), var2 = c(6,7,8,9,10))
my_df
var1 var2
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
I also have a vector:
my_vec <- c("a", "b", "c")
I want to repeat the dataframe length(my_vec) times, filling in the values of a new variable with the vector values. Is there a simple way to do this? If possible, I'd like to do this in a dplyr chain. Desired output:
var1 var2 var3
1 1 6 a
2 2 7 a
3 3 8 a
4 4 9 a
5 5 10 a
6 1 6 b
7 2 7 b
8 3 8 b
9 4 9 b
10 5 10 b
11 1 6 c
12 2 7 c
13 3 8 c
14 4 9 c
15 5 10 c

We can use crossing or expand_grid from tidyr:
library(tidyr)
crossing(my_df, var3 = my_vec)
#expand_grid(my_df, var3 = my_vec)
If the order is important, use arrange
library(dplyr)
crossing(my_df, var3 = my_vec) %>%
arrange(var3)
Output:
# A tibble: 15 × 3
var1 var2 var3
<dbl> <dbl> <chr>
1 1 6 a
2 2 7 a
3 3 8 a
4 4 9 a
5 5 10 a
6 1 6 b
7 2 7 b
8 3 8 b
9 4 9 b
10 5 10 b
11 1 6 c
12 2 7 c
13 3 8 c
14 4 9 c
15 5 10 c
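If the rows of my_df are not already sorted, arrange(var3) alone may not be enough, because crossing also sorts its result. One way around this is to list the vector first in expand_grid, since the first argument varies slowest, and then move the new column to the end. This is a minimal sketch, assuming tidyr and dplyr (1.0.0 or later, for relocate) are installed; relocate is not part of the answer above.

```r
library(tidyr)
library(dplyr)

my_df <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 = c(6, 7, 8, 9, 10))
my_vec <- c("a", "b", "c")

# var3 is listed first, so it varies slowest:
# all rows of my_df for "a", then for "b", then for "c"
res <- expand_grid(var3 = my_vec, my_df) %>%
  relocate(var3, .after = last_col())  # move var3 back to the last column
res
```

The rows of my_df keep their original order inside each block, so no sorting step is needed.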

Though I don't think this is likely to be the simplest answer in practice, I specifically saw that you wanted a dplyr chain that would solve this, and so I tried to do this without using the pre-existing functions that do this for you.
For your example specifically, you could use this chain with the tibble package functions add_column and add_row
my_df %>%
tibble::add_column(var3 = my_vec[1]) %>%
tibble::add_row(tibble::add_column(my_df, var3 = my_vec[2])) %>%
tibble::add_row(tibble::add_column(my_df, var3 = my_vec[3]))
which directly yields
var1 var2 var3
1 1 6 a
2 2 7 a
3 3 8 a
4 4 9 a
5 5 10 a
6 1 6 b
7 2 7 b
8 3 8 b
9 4 9 b
10 5 10 b
11 1 6 c
12 2 7 c
13 3 8 c
14 4 9 c
15 5 10 c
This works for a vector of three values, but the chain has to be rewritten whenever the vector length changes. So I decided to make a function that handles a vector of any length.
my_fxn <-
  function(frame, yourVector, new.col.name = paste0("var", NCOL(frame) + 1)) {
    # Only tibble is needed; its functions are called with tibble:: below
    origcols <- colnames(frame)
    # Build one copy of the frame per vector element and stack the copies
    for (i in seq_along(yourVector)) {
      intermediateFrame <- tibble::add_column(
        frame,
        temp.name = rep_len(yourVector[[i]], nrow(frame))
      )
      # Rename the temporary column to the requested name
      colnames(intermediateFrame) <- append(origcols, new.col.name)
      if (i == 1) {
        Frame3 <- intermediateFrame
      } else {
        Frame3 <- tibble::add_row(Frame3, intermediateFrame)
      }
    }
    return(Frame3)
  }
Running my_fxn(my_df, my_vec) should get you the same data frame/table that we got above.
I also experimented with using a for loop outside a function on its own to do this, but decided that it was getting to be overkill. That approach is definitely also possible, though.
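The same idea can be written more compactly by building one copy of the frame per vector element and stacking the copies in a single call. This is only a sketch, assuming dplyr is installed; repeat_with and its argument names are hypothetical and do not come from any package.

```r
library(dplyr)

my_df <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 = c(6, 7, 8, 9, 10))
my_vec <- c("a", "b", "c")

# repeat_with is a hypothetical helper name used for illustration
repeat_with <- function(frame, values, new_col = "var3") {
  # One copy of the frame per value; the scalar is recycled down the new column
  copies <- lapply(values, function(v) {
    frame[[new_col]] <- v
    frame
  })
  # Stack the copies in the order of `values`
  bind_rows(copies)
}

out <- repeat_with(my_df, my_vec)
out
```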

Related

get the value of a cell of a dataframe based on the value in one of the columns in R

I have an example of a data frame in which columns "a" and "b" have certain values, and in column "c" the values are 1 or 2. I would like to create column "d" in which the value found in the frame will be located at the index specified in column "c".
x = data.frame(a = c(1:10), b = c(3:12), c = seq(1:2))
x
a b c
1 1 3 1
2 2 4 2
3 3 5 1
4 4 6 2
5 5 7 1
6 6 8 2
7 7 9 1
8 8 10 2
9 9 11 1
10 10 12 2
thus column "d" for the first row will contain the value 1, since the index in column "c" is 1, for the second row d = 4, since the index in column "c" is 2, and so on. I was not helped by the standard indexing in R, it just returns the value of the column c. in what ways can I solve my problem?
You can create a matrix of row and column numbers to subset values from the dataframe.
x$d <- x[cbind(1:nrow(x), x$c)]
x
# a b c d
#1 1 3 1 1
#2 2 4 2 4
#3 3 5 1 3
#4 4 6 2 6
#5 5 7 1 5
#6 6 8 2 8
#7 7 9 1 7
#8 8 10 2 10
#9 9 11 1 9
#10 10 12 2 12
If the input is a tibble, you need to convert it to a data frame (for example with as.data.frame) to use the above answer.
If you don't want to convert it, here is another option using rowwise.
library(dplyr)
x <- as_tibble(x)
x %>% rowwise() %>% mutate(d = c_across()[c])
Since c only takes the values 1 and 2, you can also use dplyr::mutate with ifelse:
x %>% mutate(d = ifelse(c == 1, a, b))
a b c d
1 1 3 1 1
2 2 4 2 4
3 3 5 1 3
4 4 6 2 6
5 5 7 1 5
6 6 8 2 8
7 7 9 1 7
8 8 10 2 10
9 9 11 1 9
10 10 12 2 12
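One caveat with the matrix-indexing answer: indexing a data frame with a two-column matrix first converts the whole frame to a matrix, so a single character column would turn every value into a string. A hedged sketch that restricts the lookup to the value columns is shown below; value_cols and m are names introduced here for illustration.

```r
x <- data.frame(a = 1:10, b = 3:12, c = rep(1:2, times = 5))

# Column c holds the position (1 = a, 2 = b) within this vector of value columns
value_cols <- c("a", "b")

# Convert only the value columns, so the matrix stays numeric
m <- as.matrix(x[value_cols])

# Each row of the index matrix is a (row number, column number) pair
x$d <- m[cbind(seq_len(nrow(x)), x$c)]
x
```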

Transpose and Merge columns in R [duplicate]

Quite new to R and I have a dataset in this format:
A B C
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
But I want it in this format:
A 1
A 2
A 3
A 4
A 5
B 1
B 2
B 3
...etc.
Seems like such a simple issue but I need HELP! Thanks
df <- data.frame(
A = 1:5,
B = 1:5,
C = 1:5
)
stack(df)
values ind
1 1 A
2 2 A
3 3 A
4 4 A
5 5 A
6 1 B
7 2 B
8 3 B
9 4 B
10 5 B
11 1 C
12 2 C
13 3 C
14 4 C
15 5 C
Example using tidyr's gather function (loaded here via tidyverse):
library(tidyverse)
A <- c(1,2,3,4,5)
B <- c(1,2,3,4,5)
C <- c(1,2,3,4,5)
df <- data.frame(A,B,C)
df %>% gather(key = "key", value = "value")
key value
1 A 1
2 A 2
3 A 3
4 A 4
5 A 5
6 B 1
7 B 2
8 B 3
9 B 4
10 B 5
11 C 1
12 C 2
13 C 3
14 C 4
15 C 5
You can use the package tidyr. This lets you choose which columns you want to gather into the column "variable".
# if not installed yet
install.packages("tidyr")
library(tidyr)
data <- data.frame(
A = 1:5,
B = 1:5,
C = 1:5
)
data %>% pivot_longer(c(A, B, C), names_to = "variable", values_to = "value")
# Result
variable value
<chr> <int>
1 A 1
2 B 1
3 C 1
4 A 2
5 B 2
6 C 2
7 A 3
8 B 3
9 C 3
10 A 4
11 B 4
12 C 4
13 A 5
14 B 5
15 C 5
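Note that pivot_longer returns the rows in the order of the original rows (A, B, C for row 1, then row 2, and so on), while the question asks for all A values first. A minimal sketch that adds a sorting step, assuming dplyr is installed alongside tidyr; the name long is introduced here for illustration.

```r
library(tidyr)
library(dplyr)

data <- data.frame(
  A = 1:5,
  B = 1:5,
  C = 1:5
)

long <- data %>%
  # everything() selects all columns, equivalent to c(A, B, C) here
  pivot_longer(everything(), names_to = "variable", values_to = "value") %>%
  # Group the rows by original column name, as in the requested layout
  arrange(variable, value)
long
```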

Expand dataframe by ID to generate a special column

I have the following dataframe
df<-data.frame("ID"=c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
'A_Frequency'=c(1,2,3,4,5,1,2,3,4,5),
'B_Frequency'=c(1,2,NA,4,6,1,2,5,6,7))
The dataframe appears as follows
ID A_Frequency B_Frequency
1 A 1 1
2 A 2 2
3 A 3 NA
4 A 4 4
5 A 5 6
6 B 1 1
7 B 2 2
8 B 3 5
9 B 4 6
10 B 5 7
I wish to create a new dataframe df2 from df that looks as follows
ID CFreq
1 A 1
2 A 2
3 A 3
4 A 4
5 A 5
6 A 6
7 B 1
8 B 2
9 B 3
10 B 4
11 B 5
12 B 6
13 B 7
The new dataframe has a column CFreq that takes unique values from A_Frequency, B_Frequency and groups them by ID. Then it ignores the NA values and generates the CFreq column
I have tried dplyr but am unable to get the required response
df2<-df%>%group_by(ID)%>%select(ID, A_Frequency,B_Frequency)%>%
mutate(Cfreq=unique(A_Frequency, B_Frequency))
This yields the following which is quite different
ID A_Frequency B_Frequency Cfreq
<fct> <dbl> <dbl> <dbl>
1 A 1 1 1
2 A 2 2 2
3 A 3 NA 3
4 A 4 4 4
5 A 5 6 5
6 B 1 1 1
7 B 2 2 2
8 B 3 5 3
9 B 4 6 4
10 B 5 7 5
Request someone to help me here
The gather function from the tidyr package will be helpful here:
library(tidyverse)
df %>%
gather(x, CFreq, -ID) %>%
select(-x) %>%
na.omit() %>%
unique() %>%
arrange(ID, CFreq)
A different tidyverse possibility could be:
df %>%
nest(A_Frequency, B_Frequency, .key = C_Frequency) %>%
mutate(C_Frequency = map(C_Frequency, function(x) unique(x[!is.na(x)]))) %>%
unnest()
ID C_Frequency
1 A 1
2 A 2
3 A 3
4 A 4
5 A 5
9 A 6
10 B 1
11 B 2
12 B 3
13 B 4
14 B 5
18 B 6
19 B 7
A base R approach would be to split the dataframe by ID and, for each piece, count the number of unique entries and create a sequence of that length.
do.call(rbind, lapply(split(df, df$ID), function(x) data.frame(ID = x$ID[1] ,
CFreq = seq_len(length(unique(na.omit(unlist(x[-1]))))))))
# ID CFreq
#A.1 A 1
#A.2 A 2
#A.3 A 3
#A.4 A 4
#A.5 A 5
#A.6 A 6
#B.1 B 1
#B.2 B 2
#B.3 B 3
#B.4 B 4
#B.5 B 5
#B.6 B 6
#B.7 B 7
This will also work when A_Frequency and B_Frequency contain character values or non-sequential numbers.
In tidyverse we can do
library(tidyverse)
df %>%
group_split(ID) %>%
map_dfr(~ data.frame(ID = .$ID[1],
CFreq= seq_len(length(unique(na.omit(flatten_chr(.[-1])))))))
A data.table option
library(data.table)
cols <- c('A_Frequency', 'B_Frequency')
out <- setDT(df)[, .(CFreq = sort(unique(unlist(.SD)))),
.SDcols = cols,
by = ID]
out
# ID CFreq
# 1: A 1
# 2: A 2
# 3: A 3
# 4: A 4
# 5: A 5
# 6: A 6
# 7: B 1
# 8: B 2
# 9: B 3
#10: B 4
#11: B 5
#12: B 6
#13: B 7
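Since gather is superseded in current tidyr, the first answer can also be sketched with pivot_longer. This assumes tidyr 1.0.0 or later and dplyr; the intermediate column name source is made up for illustration and is dropped again by distinct.

```r
library(tidyr)
library(dplyr)

df <- data.frame(
  ID = rep(c("A", "B"), each = 5),
  A_Frequency = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  B_Frequency = c(1, 2, NA, 4, 6, 1, 2, 5, 6, 7)
)

df2 <- df %>%
  # Stack both frequency columns into one; values_drop_na removes the NA row
  pivot_longer(-ID, names_to = "source", values_to = "CFreq",
               values_drop_na = TRUE) %>%
  # Keep each (ID, CFreq) pair once; this also drops the source column
  distinct(ID, CFreq) %>%
  arrange(ID, CFreq)
df2
```

Unlike the seq_len-based answers, this keeps the actual values found in the two columns rather than generating a sequence of the same length.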

Maintaining order in split-apply-combine problems [duplicate]

I have the following data frame
dd1 = data.frame(cond = c("D","A","C","B","A","B","D","C"), val = c(11,7,9,4,3,0,5,2))
dd1
cond val
1 D 11
2 A 7
3 C 9
4 B 4
5 A 3
6 B 0
7 D 5
8 C 2
and now need to compute cumulative sums respecting the factor level in cond. The results should look like that:
> dd2 = data.frame(cond = c("D","A","C","B","A","B","D","C"), val = c(11,7,9,4,3,0,5,2), cumsum=c(11,7,9,4,10,4,16,11))
> dd2
cond val cumsum
1 D 11 11
2 A 7 7
3 C 9 9
4 B 4 4
5 A 3 10
6 B 0 4
7 D 5 16
8 C 2 11
It is important to receive the result data frame in the same order as the input data frame because there are other variables bound to that.
I tried ddply(dd1, .(cond), summarize, cumsum = cumsum(val)) but it didn't produce the result I expected.
Thanks
Use ave instead.
dd1$cumsum <- ave(dd1$val, dd1$cond, FUN=cumsum)
If doing this by hand is an option, then split() and unsplit() with a suitable lapply() in between will do this for you.
dds <- split(dd1, dd1$cond)
dds <- lapply(dds, function(x) transform(x, cumsum = cumsum(x$val)))
unsplit(dds, dd1$cond)
The last line gives
> unsplit(dds, dd1$cond)
cond val cumsum
1 D 11 11
2 A 7 7
3 C 9 9
4 B 4 4
5 A 3 10
6 B 0 4
7 D 5 16
8 C 2 11
I separated the three steps, but these could be strung together or placed in a function if you are doing a lot of this.
A data.table solution:
library(data.table)
# := only works on a data.table, so convert dd1 with data.table(), not data.frame()
dt <- data.table(dd1)
dt[, c.val := cumsum(val),by=cond]
> dt
# cond val c.val
# 1: D 11 11
# 2: A 7 7
# 3: C 9 9
# 4: B 4 4
# 5: A 3 10
# 6: B 0 4
# 7: D 5 16
# 8: C 2 11
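For completeness, a grouped mutate in dplyr also keeps the input row order, because mutate never reorders rows. A minimal sketch, assuming dplyr is installed (it is not used in the answers above):

```r
library(dplyr)

dd1 <- data.frame(
  cond = c("D", "A", "C", "B", "A", "B", "D", "C"),
  val = c(11, 7, 9, 4, 3, 0, 5, 2)
)

dd2 <- dd1 %>%
  group_by(cond) %>%
  # Running total within each level of cond; rows stay where they were
  mutate(cumsum = cumsum(val)) %>%
  ungroup()
dd2
```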

Generate combination of data frame and vector

I know expand.grid is to create all combinations of given vectors. But is there a way to generate all combinations of a data frame and a vector by taking each row in the data frame as unique. For instance,
df <- data.frame(a = 1:3, b = 5:7)
c <- 9:10
how to create a new data frame that is the combination of df and c without expanding df:
df.c:
a b c
1 5 9
2 6 9
3 7 9
1 5 10
2 6 10
3 7 10
Thanks!
For me, the simplest way is merge(df, as.data.frame(c)). With no common columns, merge returns every combination of rows:
a b c
1 1 5 9
2 2 6 9
3 3 7 9
4 1 5 10
5 2 6 10
6 3 7 10
This may not scale when your dataframe has more than two columns per row, but you can just use expand.grid on the first column and then merge the second column in.
df <- data.frame(a = 1:3, b = 5:7)
c <- 9:10
combined <- expand.grid(a=df$a, c=c)
combined <- merge(combined, df)
> combined[order(combined$c), ]
a c b
1 1 9 5
3 2 9 6
5 3 9 7
2 1 10 5
4 2 10 6
6 3 10 7
You could also do something like this
do.call(rbind, lapply(9:10, function(x, d) data.frame(d, c = x), d = df))
# or using rbindlist as a fast alternative to do.call(rbind, list)
library(data.table)
rbindlist(lapply(9:10, function(x, d) data.frame(d, c = x), d = df))
or
rbindlist(Map(data.frame, c = 9:10, MoreArgs = list(a= 1:3,b=5:7)))
This question is really old but I found one more answer.
Use tidyr's expand_grid().
library(tidyr)
expand_grid(df, c)
# A tibble: 6 × 3
a b c
<int> <int> <int>
1 1 5 9
2 1 5 10
3 2 6 9
4 2 6 10
5 3 7 9
6 3 7 10
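The expand_grid result above varies c fastest, while the question lists all rows for 9 before the rows for 10. A base R sketch that produces the requested order by repeating row indices is shown below; c_vals and idx are names introduced here (c_vals replaces c to avoid reusing the name of the base function).

```r
df <- data.frame(a = 1:3, b = 5:7)
c_vals <- 9:10

# Repeat the whole block of row indices once per element of c_vals: 1 2 3 1 2 3
idx <- rep(seq_len(nrow(df)), times = length(c_vals))

df.c <- df[idx, ]
# Each value of c_vals fills one full copy of df: 9 9 9 10 10 10
df.c$c <- rep(c_vals, each = nrow(df))
# Reset the row names left over from the repeated indexing (1, 2, 3, 1.1, ...)
rownames(df.c) <- NULL
df.c
```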
