I have a data set like the one shown below:
data <- tribble(
~book_name, ~clicks, ~type,
"A", 10, "X",
"B", 20, "Y",
"C", 30, "Y",
"A", 10, "Z",
"A", 10, "X",
)
Now, I want to duplicate the rows where type is "X" and append the copies to the data. So, my desired data set is something like this:
desired_data <- tribble(
~book_name, ~clicks, ~type,
"A", 10, "X",
"B", 20, "Y",
"C", 30, "Y",
"A", 10, "Z",
"A", 10, "X",
"A", 10, "X",
"A", 10, "X",
)
How can I do this?
Filter and bind rows:
library(dplyr)
data_x <- data %>% filter(type == 'X')
desired_data <- bind_rows(data, data_x)
A base R solution. The idea is to build the row indices for the desired output. 1:nrow(data) covers all rows, and which(data$type == "X") gives the rows you would like to duplicate. Combining these two parts gives the desired output.
data[c(1:nrow(data), which(data$type == "X")), ]
# # A tibble: 7 x 3
# book_name clicks type
# <chr> <dbl> <chr>
# 1 A 10 X
# 2 B 20 Y
# 3 C 30 Y
# 4 A 10 Z
# 5 A 10 X
# 6 A 10 X
# 7 A 10 X
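As a minimal sketch of the same index idea, it can help to look at the index vector itself before subsetting. The data is rebuilt here as a plain data.frame so the example needs no extra packages; the names idx and result are illustrative and not part of the original answer. seq_len(nrow(data)) is used in place of 1:nrow(data) only because it also behaves sensibly for an empty data frame.

```r
# Same data as in the question, as a base data.frame (an assumption made
# here so the example runs without tidyverse).
data <- data.frame(
  book_name = c("A", "B", "C", "A", "A"),
  clicks    = c(10, 20, 30, 10, 10),
  type      = c("X", "Y", "Y", "Z", "X"),
  stringsAsFactors = FALSE
)

# All row numbers, followed by the row numbers where type is "X".
idx <- c(seq_len(nrow(data)), which(data$type == "X"))
idx
#> [1] 1 2 3 4 5 1 5

# Subsetting with a repeated index repeats the corresponding rows.
result <- data[idx, ]
nrow(result)
#> [1] 7
```

Rows 1 and 5 appear twice in idx, which is why the two "X" rows show up again at the end of the result.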
Related
I would like to duplicate each observation based on the count. For example:
If count == 3, duplicate the observation three times but replacing the count with 1 each time.
If count == 1, no changes are required.
# Sample data
df <- tibble(
x = c("A", "C", "C", "B", "C", "A", "A"),
y = c("Y", "N", "Y", "N", "N", "N", "Y"),
count = c(1, 1, 3, 2, 1, 1, 1)
)
# Target output
df <- tibble(
x = c("A", "C", "C", "C", "C", "B", "B", "C", "A", "A"),
y = c("Y", "N", "Y", "Y", "Y", "N", "N", "N", "N", "Y"),
count = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
)
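Using dplyr and tidyr: uncount() repeats each row count times (dropping the count column), and mutate() then sets count to 1 for every row, as in the target output.
library(dplyr)
library(tidyr)
df %>%
uncount(count) %>%
mutate(count = 1)
The output is
x y count
<chr> <chr> <dbl>
1 A Y 1
2 C N 1
3 C Y 1
4 C Y 1
5 C Y 1
6 B N 1
7 B N 1
8 C N 1
9 A N 1
10 A Y 1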
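For comparison, here is a rough base R sketch of the same row expansion, assuming count always holds non-negative whole numbers. It repeats each row number count times with rep() and then overwrites count with 1. The name expanded is illustrative only.

```r
# Sample data from the question, as a base data.frame.
df <- data.frame(
  x = c("A", "C", "C", "B", "C", "A", "A"),
  y = c("Y", "N", "Y", "N", "N", "N", "Y"),
  count = c(1, 1, 3, 2, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Repeat each row number `count` times, then subset by that index.
expanded <- df[rep(seq_len(nrow(df)), times = df$count), ]

# Every duplicated observation now stands for a single count.
expanded$count <- 1

# Drop the repeated row names (e.g. "3.1", "3.2") left over from subsetting.
rownames(expanded) <- NULL
```

This should give the same ten rows as the target output, in the same order.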
I'm relatively new to R, so apologies if this is way off base. But I have a dataset which looks something like this:
#simplified input - actual data has ~20K observations,
#V1 is a categorical variable with 2 options, V3 is a categorical variable with 23 options
df <- tribble(
~V1, ~V2, ~V3,
"A", "a", "Z",
"A", "a", "Y",
"A", "b", "X",
"A", "b", "Z",
"B", "c", "Z",
"B", "a", "Z",
"B", "a", "Y",
"A", "d", "X",
"A", "e", "X",
"A", "f", "X",
"A", "g", "X",
"B", "g", "X",
"B", "h", "X",
"A", "i", "X",
)
I'm trying to count the distinct values of V2 for each combination of V1 and V3. In this sample data, "a" can be found in both A and B, and can be classified as Z or Y. The output I'm envisioning would look something like this, where the numbers are the distinct counts of V2:
df <- tribble(
~V1, ~Z, ~Y, ~X,
"A_only", 1, 0, 5,
"B_only", 1, 0, 1,
"Both_A_and_B", 1, 1, 1
)
I'm honestly at a complete loss as to how to do this, so any thoughts would be appreciated.
Update: problem solved!
library(dplyr)
library(tidyr)
df %>%
group_by(V1, V2, V3) %>%
add_count() %>%
pivot_wider(names_from = V3, values_from = n) %>%
group_by(V2) %>%
mutate(V1 = ifelse(length(V2) > 1, "Both_A_and_B",
ifelse(length(V2) == 1 & V1 == "A", "A_only",
"B_only"))) %>%
distinct() %>%
group_by(V1) %>%
summarise(across(Z:X, ~ sum(.x, na.rm = TRUE)))
# A tibble: 3 x 4
V1 Z Y X
<chr> <int> <int> <int>
1 A_only 1 0 5
2 B_only 1 0 1
3 Both_A_and_B 1 1 1
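The same classification can be sketched in base R, which may make the individual steps easier to follow: first label each V2 value by the V1 groups it appears in, then count distinct V2 values per label and V3. This is only an illustrative alternative, not the code from the update above; the helper classify and the names membership, u and counts are made up for this example.

```r
# Sample data from the question, as a base data.frame.
df <- data.frame(
  V1 = c("A", "A", "A", "A", "B", "B", "B", "A", "A", "A", "A", "B", "B", "A"),
  V2 = c("a", "a", "b", "b", "c", "a", "a", "d", "e", "f", "g", "g", "h", "i"),
  V3 = c("Z", "Y", "X", "Z", "Z", "Z", "Y", "X", "X", "X", "X", "X", "X", "X"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label a V2 value by the V1 groups it occurs in.
classify <- function(v) {
  groups <- unique(v)
  if (length(groups) > 1) "Both_A_and_B" else paste0(groups, "_only")
}

# One label per distinct V2 value, then mapped back onto every row.
membership <- vapply(split(df$V1, df$V2), classify, character(1))
df$group <- unname(membership[df$V2])

# Keep each (group, V2, V3) combination once so V2 values are counted distinctly.
u <- unique(df[, c("group", "V2", "V3")])
counts <- table(group = u$group, V3 = u$V3)
counts
```

The table columns come out in alphabetical order (X, Y, Z) rather than the Z, Y, X order of the desired output, but the counts per cell should be the same.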
I am trying to extract the rows of a dataframe that share some data with the rows of another dataframe of a different size:
df1:
A B C D
a t 4 9
s p 3 7
w d 1 10
df2:
A B C D
a t 3 7
m r 5 8
p m 1 3
g u 5 2
s p 2 6
I am trying to get the rows of df1 that satisfy these conditions:
1. The A and B variables must be equal in both dataframes.
2. df1$C must lie in the interval (df2$C - 5, df2$C + 5), i.e. the absolute value of the difference between the two values must be less than 5.
new_df<-df1[df1$A == df2$A && df1$B == df2$B && (df1$C > (df2$C - 5) && df1$C < (df2$C + 5)), ]
But I am getting this error, because the two dataframes have different numbers of rows:
longer object length is not a multiple of shorter object length
I have also tried to use which but I am getting the same error. How can I solve this?
My expected output would be:
new_df
A B C D
a t 4 9
s p 3 7
This is one possible way (I deliberately created more intermediate variables than necessary; it can be shortened). Since A and B must match, they can be used to join the data frames (step 1, resulting in the data frame s1), and the numeric conditions can then be applied as a filter (step 2, resulting in the data frame s2):
df1 <- tibble::tribble(
~A, ~B, ~C, ~D,
"a", "t", 4, 9,
"s", "p" , 3, 7,
"w", "d", 1, 10
)
df2 <- tibble::tribble(
~A, ~B, ~C, ~D,
"a", "t", 3 , 7,
"m", "r", 5, 8,
"p", "m", 1 , 3,
"g", "u", 5, 2,
"s", "p", 2 , 6)
library(dplyr)
# Step 1: keep only the rows whose A and B match in both data frames
s1 <- inner_join(df1, df2, by = c("A", "B"), suffix = c(".from1", ".from2"))
# Step 2: keep the rows where C from df1 lies within 5 of C from df2
s2 <- s1 %>%
mutate(condition1 = C.from1 > C.from2 - 5,
condition2 = C.from1 < C.from2 + 5) %>%
filter(condition1, condition2) %>%
select(-starts_with("condition"))
Here is a base R solution:
Merging the two data frames by A and B guarantees that these variables already match; the result is assigned to a new data frame.
The remaining two conditions are then applied to this new data frame, and the last two columns, which came from df2 in the merge, are dropped.
df1 <- tibble::tribble(
~A, ~B, ~C, ~D,
"a", "t", 4, 9,
"s", "p" , 3, 7,
"w", "d", 1, 10
)
df2 <- tibble::tribble(
~A, ~B, ~C, ~D,
"a", "t", 3 , 7,
"m", "r", 5, 8,
"p", "m", 1 , 3,
"g", "u", 5, 2,
"s", "p", 2 , 6)
df3 <- merge(df1, df2, by = c('A', 'B'))
# Use element-wise & (not &&) so that every row is tested
df3[df3$C.x > (df3$C.y - 5) & df3$C.x < (df3$C.y + 5), ][, -c(5, 6)]
#> A B C.x D.x
#> 1 a t 4 9
#> 2 s p 3 7
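A compact variant of the merge-then-filter idea, written as a sketch under the assumption that "within 5" means a strict absolute difference below 5, as stated in the question. The suffixes ".1" and ".2" and the names merged and keep are choices made for this example. Note that the row filter has to use the element-wise operator &, or a single vectorised test as here; && only accepts length-one operands, and recent versions of R raise an error when given longer vectors.

```r
# Data from the question, as base data.frames.
df1 <- data.frame(
  A = c("a", "s", "w"), B = c("t", "p", "d"),
  C = c(4, 3, 1), D = c(9, 7, 10),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  A = c("a", "m", "p", "g", "s"), B = c("t", "r", "m", "u", "p"),
  C = c(3, 5, 1, 5, 2), D = c(7, 8, 3, 2, 6),
  stringsAsFactors = FALSE
)

# Condition 1: A and B must match, which is what the merge keys enforce.
merged <- merge(df1, df2, by = c("A", "B"), suffixes = c(".1", ".2"))

# Condition 2: one vectorised test per merged row.
keep <- abs(merged$C.1 - merged$C.2) < 5

# Keep only the columns that came from df1 and restore their names.
new_df <- merged[keep, c("A", "B", "C.1", "D.1")]
names(new_df) <- c("A", "B", "C", "D")
new_df
```

Selecting columns by name instead of by position (-c(5, 6)) keeps the code working if the column order of the merge result changes.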
I have a data set something like this:
df_A <- tribble(
~product_name, ~position, ~cat_id, ~pr,
"A", 1, 1, "X",
"A", 4, 2, "X",
"A", 3, 3, "X",
"B", 4, 5, NA,
"B", 6, 6, NA,
"C", 3, 1, "Y",
"C", 5, 2, "Y",
"D", 6, 2, "Z",
"D", 4, 8, "Z",
"D", 3, 9, "Z",
)
Now, I want to look up the values 1 and 2 in cat_id and find the corresponding position for each product_name. If a product has no 1 or 2 in cat_id, the corresponding variables should be NA. Please see my desired data set to get a better understanding:
desired <- tribble(
~product_name, ~position_1, ~position_2, ~pr,
"A", 1, 4, "X",
"B", NA, NA, NA,
"C", 3, 5, "Y",
"D", NA, 6, "Z",
)
How can I get it?
We can filter the rows based on cat_id. Because some product_name values may then be missing, use complete() to expand the dataset, and then pivot_wider() to reshape it into wide format:
library(dplyr)
library(tidyr)
library(stringr)
df_A %>%
filter(cat_id %in% 1:2) %>%
mutate(cat_id = str_c('position_', cat_id)) %>%
complete(product_name = unique(df_A$product_name)) %>%
pivot_wider(names_from = cat_id, values_from = position) %>%
select(-`NA`)
# A tibble: 4 x 4
# product_name pr position_1 position_2
# <chr> <chr> <dbl> <dbl>
#1 A X 1 4
#2 B <NA> NA NA
#3 C Y 3 5
#4 D Z NA 6
Or using reshape/subset from base R
reshape(merge(data.frame(product_name = unique(df_A$product_name)),
subset(df_A, cat_id %in% 1:2), all.x = TRUE),
idvar = c('product_name', 'pr'), direction = 'wide', timevar = 'cat_id')[-5]
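Another way to see the lookup is with match(), sketched below under the assumption that each product has at most one row per cat_id and a single pr value. The helper position_for and the names products and desired are hypothetical and only used for this illustration.

```r
# Data from the question, as a base data.frame.
df_A <- data.frame(
  product_name = c("A", "A", "A", "B", "B", "C", "C", "D", "D", "D"),
  position     = c(1, 4, 3, 4, 6, 3, 5, 6, 4, 3),
  cat_id       = c(1, 2, 3, 5, 6, 1, 2, 2, 8, 9),
  pr           = c("X", "X", "X", NA, NA, "Y", "Y", "Z", "Z", "Z"),
  stringsAsFactors = FALSE
)

products <- unique(df_A$product_name)

# Hypothetical helper: position of the given cat_id for every product,
# NA where the product has no row with that cat_id.
position_for <- function(id) {
  rows <- df_A[df_A$cat_id == id, ]
  rows$position[match(products, rows$product_name)]
}

desired <- data.frame(
  product_name = products,
  position_1   = position_for(1),
  position_2   = position_for(2),
  # Take pr from the first row of each product.
  pr           = df_A$pr[match(products, df_A$product_name)],
  stringsAsFactors = FALSE
)
desired
```

match() returns NA for products that are absent from the filtered rows, which is what produces the NA positions for B and for position_1 of D.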
Below is my test data and the code that summarises the tbl table by counting the number of positive values. I then sum the counts over 5 consecutive rows using rollapply with FUN = sum. I am getting NA at rows 1, 2, 5, 6, 7, 8, 10 and 11.
The NA at rows 5, 6 and 10, 11 is due to the missing next rows, which is expected, but I don't understand why I am getting NA at rows 1, 2 and 7, 8. Can someone take a look at the code and point out my mistake?
library(tidyverse)
library(zoo)
tbl<-tribble(
~z, ~x, ~y,
"x", "a", 2,
"x", "b", 1,
"x", "b", 3,
"y", "c", 3,
"x", "c", 1,
"x", "d", -1,
"x", "q", 2,
"x", "q", 2,
"x", "a", 2,
"x", "s", -1,
"y", "q", -1,
"y", "b", 3,
"x", "c", 3,
"y", "c", -1,
"y", "q", 1,
"y", "w", 2,
"y", "w", -2,
"y", "t", 2,
"y", "t", 1
)
tbl %>%
group_by(z, x) %>%
summarise(xy = sum(y>0, na.rm = T))%>%
mutate(zzz = rollapply(xy, width=5, sum, fill=NA))
Output:
# A tibble: 11 x 4
# Groups: z [2]
z x xy zzz
<chr> <chr> <int> <dbl>
1 x a 2 NA
2 x b 2 NA
3 x c 2 8
4 x d 0 6
5 x q 2 NA
6 x s 0 NA
7 y b 1 NA
8 y c 1 NA
9 y q 1 6
10 y t 2 NA
11 y w 1 NA
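A likely explanation, offered as a sketch rather than a definitive diagnosis: rollapply() uses align = "center" by default, so a width-5 window needs two values on each side of the current row. In addition, the summarised table is still grouped by z (see the "# Groups: z [2]" line in the output), so mutate() applies the rolling sum to each group separately. The example below reproduces the two groups' xy columns as plain vectors and compares the default alignment with align = "left"; the vector names are illustrative.

```r
library(zoo)

# Per-group counts copied from the xy column of the output above.
xy_x <- c(2, 2, 2, 0, 2, 0)   # rows 1-6, group z == "x"
xy_y <- c(1, 1, 1, 2, 1)      # rows 7-11, group z == "y"

# Default (centered) window: the first two and last two positions of each
# group have no complete window, so they are filled with NA.
centered_x <- rollapply(xy_x, width = 5, FUN = sum, fill = NA)
centered_y <- rollapply(xy_y, width = 5, FUN = sum, fill = NA)
centered_x
#> [1] NA NA  8  6 NA NA
centered_y
#> [1] NA NA  6 NA NA

# Left-aligned window: the current row plus the next four rows, so only the
# trailing positions of each group are NA.
left_x <- rollapply(xy_x, width = 5, FUN = sum, fill = NA, align = "left")
left_x
#> [1]  8  6 NA NA NA NA
```

If the intent is "this row plus the next four", adding align = "left" to the rollapply() call in the mutate() step should move the sums to the first rows of each group; with a width of 5, the last four rows of each group would then be NA.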