Replace Value in column based on another column With R [duplicate] - r

This question already has answers here:
Update a Value in One Column Based on Criteria in Other Columns
(4 answers)
Closed 2 years ago.
I'm trying to replace the value of a column based on the data in a different column, but it's not working. Here's some example data.
df <- data.frame(Col1 = 1:10,
Col2 = c("a", "a", "a", "b", "b", "c", "c", "d", "d", "d"),
Col3 = c("11%", "12%", "13%", "14%", "15%", "16%", "17%", "18%", "19%", "20%"))
If the value of Col2 is b, I need to change the value of Col3 to NA or 0 (NA is more accurate, but for what I'm doing, a 0 will also work). Col3 holds percentages; I know I used strings here.
I tried doing this a few ways, the most recent of which is the loop listed below. I'm open to any solution, though. Is my loop not working because I'm not defining a pattern?
for(i in df){
if(df$Col2 == "b"){
str_replace(df$Col3, replacement = NA)
}
}
print(df)

Here's a base R solution:
df$Col3[df$Col2 == 'b'] <- NA
Here's a dplyr/tidyverse solution:
library(dplyr)
df %>% mutate(Col3 = ifelse(Col2 == 'b',NA_character_,Col3))
(Original, but less efficient case_when solution)
df %>%
mutate(Col3 = case_when(Col2 == 'b' ~ NA_character_,
TRUE ~ Col3))
This gives us:
   Col1 Col2 Col3
1     1    a  11%
2     2    a  12%
3     3    a  13%
4     4    b <NA>
5     5    b <NA>
6     6    c  16%
7     7    c  17%
8     8    d  18%
9     9    d  19%
10   10    d  20%
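As for why the loop in the question has no effect, here is a minimal sketch, assuming the same df as in the question. for (i in df) iterates over the columns rather than the rows, df$Col2 == "b" yields ten values where if() expects a single TRUE or FALSE, and str_replace() requires a pattern and returns a new vector instead of modifying df. A row-wise loop that assigns explicitly would look roughly like this, although the vectorised base R one-liner above remains the idiomatic choice:

```r
df <- data.frame(Col1 = 1:10,
                 Col2 = c("a", "a", "a", "b", "b", "c", "c", "d", "d", "d"),
                 Col3 = c("11%", "12%", "13%", "14%", "15%",
                          "16%", "17%", "18%", "19%", "20%"),
                 stringsAsFactors = FALSE)

# Loop over row indices, not over the data frame's columns
for (i in seq_len(nrow(df))) {
  if (df$Col2[i] == "b") {  # one TRUE/FALSE per row
    df$Col3[i] <- NA        # assign the result back into df
  }
}
print(df)
```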

A dplyr solution, using ifelse() instead of case_when():
library(dplyr)
df <- data.frame(Col1 = 1:10,
                 Col2 = c("a", "a", "a", "b", "b", "c", "c", "d", "d", "d"),
                 Col3 = seq(.11, .2, by = .01))  # numeric percentages, 0.11 to 0.20
df %>%
  mutate(Col3 = ifelse(Col2 == 'b', NA, Col3))  # set Col3 to NA where Col2 is "b"

pkpto39,
Try this:
library('tidyverse')
df <- data.frame(Col1 = 1:10,
                 Col2 = c("a", "a", "a", "b", "b", "c", "c", "d", "d", "d"),
                 Col3 = c("11%", "12%", "13%", "14%", "15%", "16%", "17%", "18%", "19%", "20%"),
                 stringsAsFactors = FALSE)
# Keep Col3 as-is unless Col2 is "b", in which case replace it with NA
df <- df %>% mutate(Col3 = ifelse(Col2 == "b", NA, Col3))

Related

Conditionally counting distinct number of items in one column based on other columns and rows

I'm relatively new to R, so apologies if this is way off base. But I have a dataset which looks something like this:
#simplified input - actual data has ~20K observations,
#V1 is a categorical variable with 2 options, V3 is a categorical variable with 23 options
df <- tribble(
~V1, ~V2, ~V3,
"A", "a", "Z",
"A", "a", "Y",
"A", "b", "X",
"A", "b", "Z",
"B", "c", "Z",
"B", "a", "Z",
"B", "a", "Y",
"A", "d", "X",
"A", "e", "X",
"A", "f", "X",
"A", "g", "X",
"B", "g", "X",
"B", "h", "X",
"A", "i", "X",
)
And I'm trying to count the distinct values of V2 based on a combination of V1 and V3. In this sample data, "a" can be found in A and B, and can be classified as Z or Y. So the output I'm envisioning would look something like, where the numbers are the distinct count of V2:
The desired output:
df <- tribble(
~V1, ~Z, ~Y, ~X,
"A_only", 1, 0, 5,
"B_only", 1, 0, 1,
"Both_A_and_B", 1, 1, 1
)
I'm honestly at a complete loss on how to do this, so any thoughts would be appreciated.
Update: problem solved!
library(dplyr)
library(tidyr)
df %>%
group_by(V1, V2, V3) %>%
add_count() %>%
pivot_wider(names_from = V3, values_from = n) %>%
group_by(V2) %>%
mutate(V1 = ifelse(length(V2) > 1, "Both_A_and_B",
ifelse(length(V2) == 1 & V1 == "A", "A_only",
"B_only"))) %>%
distinct() %>%
group_by(V1) %>%
summarise(across(Z:X, ~ sum(.x, na.rm = TRUE)))
# A tibble: 3 x 4
  V1               Z     Y     X
  <chr>        <int> <int> <int>
1 A_only           1     0     5
2 B_only           1     0     1
3 Both_A_and_B     1     1     1
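The same counts can also be reached by classifying each V2 value first and only then counting distinct values. The following is a base R sketch under that assumption, not the accepted approach; the column name group and the helper vector both are introduced here for illustration only:

```r
df <- data.frame(
  V1 = c("A", "A", "A", "A", "B", "B", "B", "A", "A", "A", "A", "B", "B", "A"),
  V2 = c("a", "a", "b", "b", "c", "a", "a", "d", "e", "f", "g", "g", "h", "i"),
  V3 = c("Z", "Y", "X", "Z", "Z", "Z", "Y", "X", "X", "X", "X", "X", "X", "X"),
  stringsAsFactors = FALSE
)

# V2 values that occur under more than one V1 group
per_v2 <- tapply(df$V1, df$V2, function(x) length(unique(x)))
both <- names(per_v2)[per_v2 > 1]

# Label every row as A_only, B_only or Both_A_and_B
df$group <- ifelse(df$V2 %in% both, "Both_A_and_B", paste0(df$V1, "_only"))

# Drop repeated (group, V2, V3) combinations, then cross-tabulate
u <- unique(df[c("group", "V2", "V3")])
tab <- table(u$group, u$V3)
print(tab)
```

Note that table() orders the columns alphabetically (X, Y, Z) rather than in order of appearance.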

How to consolidate the less relevant rows of my df in R?

This is my df:
  col1 col2
1    a  20%
2    b  20%
3    c  15%
4    d  10%
5    e   9%
6    f   8%
7    g   7%
8    h   6%
9    h   5%
And I would like to have something like this
   col1 col2
1     a  20%
2     b  20%
3     c  15%
4     d  10%
5 other  35%
I tried using dplyr to solve this, but I got no success.
You can use fct_collapse() from the forcats package.
library(tidyverse)
df <- tribble(
~col1, ~col2,
"a", 20,
"b", 20,
"c", 15,
"d", 10,
"e", 9,
"f", 8,
"g", 7,
"h", 6,
"h", 5
)
df$col1 <- fct_collapse(
df$col1,
a = "a",
b = "b",
c = "c",
d = "d",
other_level = "other")
df %>%
group_by(col1) %>%
summarise(col2 = sum(col2))
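If listing every level to keep is impractical, a threshold-based variant in base R may be enough. This is a sketch that assumes "less relevant" means a col2 value below 10; the cutoff is a made-up parameter for illustration and is not stated in the question:

```r
df <- data.frame(col1 = c("a", "b", "c", "d", "e", "f", "g", "h", "h"),
                 col2 = c(20, 20, 15, 10, 9, 8, 7, 6, 5),
                 stringsAsFactors = FALSE)

cutoff <- 10  # assumed threshold; rows below it are lumped together

# Relabel the small rows, then sum col2 within each label
df$col1 <- ifelse(df$col2 < cutoff, "other", df$col1)
result <- aggregate(col2 ~ col1, data = df, FUN = sum)
print(result)
```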

How to delete all the duplicates row based on two columns?

I have a data frame from which I want to delete duplicate rows, but only when the value in another column is the same for all of those rows. (To be clearer: I want to delete the duplicated "Name" rows that all share the same "Number" value.)
Here is an example of my data frame:
df <- data.frame("Name" = c("a", "a", "b", "b", "b", "c", "c", "c"),
"Number" = c(1, 1, 1, 2, 3, 4, 5, 5), stringsAsFactors = FALSE)
And the result I expect is :
result <- data.frame("Name" = c("b", "b", "b", "c", "c", "c"),
"Number" = c(1, 2, 3, 4, 5, 5), stringsAsFactors = FALSE)
We can group_by Name and remove groups which have more than 1 row and have only one distinct value.
library(dplyr)
df %>%
group_by(Name) %>%
filter(!(n_distinct(Number) == 1 & n() > 1))
#  Name  Number
#  <chr>  <dbl>
#1 b          1
#2 b          2
#3 b          3
#4 c          4
#5 c          5
#6 c          5
and using base R ave, the same logic can be written as
df[with(df, !as.logical(ave(Number, Name, FUN = function(x)
length(unique(x)) == 1 & length(x) > 1))), ]
Here is a solution with data.table
library("data.table")
df <- data.table("Name" = c("a", "a", "b", "b", "b"),
"Number" = c(1, 1, 2, 2, 3))
df[, if (uniqueN(Number)!=1 || .N==1) .SD, Name]
and here is a solution with base R:
df <- data.frame("Name" = c("a", "a", "b", "b", "b"),
"Number" = c(1, 1, 2, 2, 3), stringsAsFactors = FALSE)
df[as.logical(ave(df$Number, df$Name, FUN=function(x) length(unique(x))!=1 || length(x)==1)),]
We can use data.table methods
library(data.table)
setDT(df)[, .SD[uniqueN(Number) > 1] , Name]
# Name Number
#1: b 1
#2: b 2
#3: b 3
#4: c 4
#5: c 5
#6: c 5

Conditional search, match and replace values between data frames

I have two dataframes as shown below. I would like to replace text (cells) in dataframe 1 with corresponding values taken from dataframe 2 when there is a match. I have tried to give a simple example below.
I have some limited experience with R but can't think of an easy solution right away. Any help or suggestions will be much appreciated.
input_1 = data.frame(col1 = c("ex1", "ex2", "ex3", "ex4"),
col2 = c("A", "B", "C", "D"),
col3 = c("B", "E", "F", "D"))
input_2 = data.frame(colx = c("A", "B", "C", "D", "E", "F"),
coly = c(1, 2, 3, 4, 5, 6))
output = data.frame(col1 = c("ex1", "ex2", "ex3", "ex4"),
col2 = c(1, 2, 3, 4),
col3 = c(2, 5, 6, 4))
Here's a tidyverse solution :
library(tidyverse)
mutate_at(input_1, -1, ~deframe(input_2)[as.character(.)])
# col1 col2 col3
# 1 ex1 1 2
# 2 ex2 2 5
# 3 ex3 3 6
# 4 ex4 4 4
deframe() builds a named vector from a two-column data frame, which is more convenient in this case.
as.character() is necessary if you have factor columns (the data.frame() default before R 4.0).
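The named-vector lookup that deframe() provides can be sketched in base R with setNames(). This is an illustrative equivalent, assuming the input_1 and input_2 from the question; the name lookup is introduced here only for the example:

```r
input_1 <- data.frame(col1 = c("ex1", "ex2", "ex3", "ex4"),
                      col2 = c("A", "B", "C", "D"),
                      col3 = c("B", "E", "F", "D"),
                      stringsAsFactors = FALSE)
input_2 <- data.frame(colx = c("A", "B", "C", "D", "E", "F"),
                      coly = c(1, 2, 3, 4, 5, 6),
                      stringsAsFactors = FALSE)

# Named vector: names are the keys (colx), values are the replacements (coly)
lookup <- setNames(input_2$coly, input_2$colx)

# Index the lookup by name for every column except the first
input_1[-1] <- lapply(input_1[-1], function(x) unname(lookup[as.character(x)]))
print(input_1)
```

Values with no match in colx come back as NA, which makes missing keys easy to spot.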
Example using tidyverse. My solution involved merging twice to input_2, but matching different columns. The last pipe cleans the data frame and renames the columns.
library(tidyverse)
input_1 = data.frame(col1 = c("ex1", "ex2", "ex3", "ex4"),
col2 = c("A", "B", "C", "D"),
col3 = c("B", "E", "F", "D"))
input_2 = data.frame(colx = c("A", "B", "C", "D", "E", "F"),
coly = c(1, 2, 3, 4, 5, 6))
output = data.frame(col1 = c("ex1", "ex2", "ex3", "ex4"),
col2 = c(1, 2, 3, 4),
col3 = c(2, 5, 6, 4))
input_1 %>% inner_join(input_2, by = c("col2" = "colx")) %>%
inner_join(input_2, by = c("col3" = "colx")) %>%
select(col1, coly.x, coly.y) %>%
magrittr::set_colnames(c("col1", "col2", "col3"))
One approach using base R would be to loop over columns where we want to change values using lapply, match the values with input_2$colx and get the corresponding coly value.
input_1[-1] <- lapply(input_1[-1], function(x) input_2$coly[match(x, input_2$colx)])
input_1
# col1 col2 col3
#1 ex1 1 2
#2 ex2 2 5
#3 ex3 3 6
#4 ex4 4 4
Actually, you could get away without using lapply: unlist the values and match them directly.
input_1[-1] <- input_2$coly[match(unlist(input_1[-1]), input_2$colx)]

How to remove duplicate pair-wise columns [duplicate]

This question already has an answer here:
Select equivalent rows [A-B & B-A] [duplicate]
(1 answer)
Closed 7 years ago.
Consider the following dataframe:
df <- data.frame(V1 = c("A", "A", "B", "B", "C", "C"),
V2 = c("B", "C", "A", "C", "A", "B"),
n = c(1, 3, 1, 2, 3, 2))
How can I remove duplicate pair-wise columns so that the output looks like:
# V1 V2 n
#1 A B 1
#2 A C 3
#3 B C 2
I tried unique() and duplicated() to no avail.
Not sure if this is the simplest way of doing it (transposing can be computationally expensive) but this would work with your data frame:
df <- data.frame(V1 = c("A", "A", "B", "B", "C", "C"),
V2 = c("B", "C", "A", "C", "A", "B"),
n = c(1, 3, 1, 2, 3, 2))
First, sort the data frame row-wise, so your value-pairs become true duplicates.
df <- data.frame(t(apply(df, 1, sort)))
Then you can just apply the unique function.
df <- unique(df)
If your column names and order are important, you'll have to re-establish those.
names(df) <- c("n", "V1", "V2")
df <- df[, c("V1", "V2", "n")]
Another option would be to reshape (xtabs(n~..)) the dataset ('df') to wide format, set the lower triangular matrix to 0, and remove the rows with "Freq" equal to 0.
m1 <- xtabs(n~V1+V2, df)
m1[lower.tri(m1)] <- 0
subset(as.data.frame(m1), Freq!=0)
# V1 V2 Freq
#4 A B 1
#7 A C 3
#8 B C 2
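A third possibility, sketched here on the assumption that V1 and V2 are character columns rather than factors, is to build an order-independent key with pmin() and pmax() and drop rows whose key has already been seen. The names key and result are illustrative:

```r
df <- data.frame(V1 = c("A", "A", "B", "B", "C", "C"),
                 V2 = c("B", "C", "A", "C", "A", "B"),
                 n = c(1, 3, 1, 2, 3, 2),
                 stringsAsFactors = FALSE)

# "A B" and "B A" map to the same key because the smaller value always comes first
key <- paste(pmin(df$V1, df$V2), pmax(df$V1, df$V2))

# Keep the first row for each unordered pair
result <- df[!duplicated(key), ]
print(result)
```

This keeps the original column names and order, so no renaming step is needed afterwards.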
