R if row value equals colnames assign 1 else 0 [duplicate]

This question already has answers here:
R - How to one hot encoding a single column while keep other columns still?
(5 answers)
Closed 2 years ago.
The original table looks like this:
id food
1  fish
2  egg
2  apple
For each id, there should be a 1 or 0 for each food, so the table should look like this:
id food  fish egg apple
1  fish  1    0   0
2  egg   0    1   0
2  apple 0    0   1

A suggestion using the dcast() function of the reshape2 package:
df1 <- read.table(header = TRUE, text = "
id food
1 fish
2 egg
2 apple
")
###
df2 <- reshape2::dcast(data = df1,
formula = id+food ~ food,
fun.aggregate = length,
value.var = "food")
df2
#> id food apple egg fish
#> 1 1 fish 0 0 1
#> 2 2 apple 1 0 0
#> 3 2 egg 0 1 0
###
df3 <- reshape2::dcast(data = df1,
formula = id+factor(food, levels=unique(food)) ~
factor(food, levels=unique(food)),
fun.aggregate = length,
value.var = "food")
names(df3) <- c("id", "food", "fish", "egg", "apple")
df3
#> id food fish egg apple
#> 1 1 fish 1 0 0
#> 2 2 egg 0 1 0
#> 3 2 apple 0 0 1
# Created on 2021-01-29 by the reprex package (v0.3.0.9001)
Regards,
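If you would rather not depend on reshape2, the same indicator columns can be sketched in base R by comparing every food value against every distinct food. This is only an illustration on the question's data; the names foods, indicators and one_hot are made up for the example.

```r
# Same data as in the question
df1 <- data.frame(id = c(1, 2, 2),
                  food = c("fish", "egg", "apple"),
                  stringsAsFactors = FALSE)

# Distinct foods, in order of first appearance
foods <- unique(df1$food)

# outer() compares each row's food with each distinct food;
# the unary + turns the logical matrix into 0/1 integers
indicators <- +outer(df1$food, foods, "==")
colnames(indicators) <- foods

# Keep the original columns and append one indicator column per food
one_hot <- cbind(df1, indicators)
one_hot
```

Because unique() keeps the order of first appearance, the columns come out as fish, egg, apple without the factor() workaround needed for dcast().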

Related

Filter for if rows where ID is same but Value column has different value to first occurrence

I am looking for advice on the principle of filtering a dataset in R. I currently have the below code which allows for easy filtering of records where a value in column 'Value' is within the required list that I have created:
ValuesNumber <-
read.table(textConnection("CustomerID Value
1 Ball
1 Cat
2 Ball
2 Ball
3 Dog
4 Ball
4 Blitz"), header=TRUE)
#Filter for required values only
Values_List <- "Ball|Twist|Tester"
ValuesNumberFiltered <- ValuesNumber[grep(Values_List, ValuesNumber$Value
),]
I am looking to amend this so that the below criteria are met:
'CustomerID' appears in the dataset at least twice
The entry in 'Value' column for the second entry does not appear within a list of my choosing.
So for example if working with this dataset:
CustomerID Value
1          Ball
1          Cat
2          Ball
2          Ball
3          Dog
4          Ball
4          Blitz
I would then like to create a new column entitled 'Y/N' which has:
'1' if the row is a second or later occurrence of its CustomerID and its Value does not match my list, or
'0' otherwise.
So the output would look like this:
CustomerID Value Y/N
1          Ball  0
1          Cat   1
2          Ball  0
2          Ball  0
3          Dog   0
4          Ball  0
4          Blitz 1

tidyverse solution:
library(dplyr)
Values_List <- c("Ball", "Twist", "Tester")
ValuesNumber %>%
group_by(CustomerID) %>%
mutate(`Y/N` = +(n() >= 2 & !(Value %in% Values_List)))
CustomerID Value `Y/N`
1 1 Ball 0
2 1 Cat 1
3 2 Ball 0
4 2 Ball 0
5 3 Dog 0
6 4 Ball 0
7 4 Blitz 1

library(dplyr)
ValuesNumber %>%
group_by(CustomerID) %>%
mutate(`Y/N` = case_when(
row_number() == 1 ~ 0,
grepl(Values_List, Value) ~ 0,
TRUE ~ 1
)) %>%
ungroup()
# # A tibble: 7 × 3
# CustomerID Value `Y/N`
# <int> <chr> <dbl>
# 1 1 Ball 0
# 2 1 Cat 1
# 3 2 Ball 0
# 4 2 Ball 0
# 5 3 Dog 0
# 6 4 Ball 0
# 7 4 Blitz 1

library(tidyverse)
values_number <- read.table(
textConnection("CustomerID Value
1 Ball
1 Cat
2 Ball
2 Ball
3 Dog
4 Ball
4 Blitz"), header = TRUE)
# Values that should not be flagged
value_list <- c("Ball", "Twist", "Tester")
count_id <- values_number |>
group_by(CustomerID) |>
summarise(count = n()) |> # count the occurrences of each customer ID
right_join(values_number, by = "CustomerID") |> # join the counts back onto the original data
mutate("Y/N" = case_when(
count > 1 & !(Value %in% value_list) ~ 1, # ID occurs more than once and the value is not in the list: mark as 1
TRUE ~ 0) # everything else: mark as 0
)
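For comparison, here is a base R sketch of the same flag. It reads the requirement as "a second or later row for that CustomerID whose Value is not in the list", which is an assumption about what the asker wants; the names allowed_values, later_occurrence and flag are made up for the example.

```r
values_number <- data.frame(
  CustomerID = c(1, 1, 2, 2, 3, 4, 4),
  Value = c("Ball", "Cat", "Ball", "Ball", "Dog", "Ball", "Blitz"),
  stringsAsFactors = FALSE
)
allowed_values <- c("Ball", "Twist", "Tester")

# duplicated() is FALSE for the first row of each CustomerID and TRUE afterwards,
# so it also covers the "appears at least twice" condition
later_occurrence <- duplicated(values_number$CustomerID)

# 1 when the row is a later occurrence and its value is not in the list
values_number$flag <- as.integer(
  later_occurrence & !(values_number$Value %in% allowed_values)
)
values_number
```

One difference worth knowing: the answers that test the group size (n() >= 2 or count > 1) would also flag the first row of a repeated customer if that first value is not in the list, whereas this version and the row_number() one never flag a first row. On the sample data all of them give the same result.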

Create contingency table that displays the frequency distribution of pairs of variables

I want to create a contingency table that displays the frequency distribution of pairs of variables. Here is an example dataset:
mm <- matrix(0, 5, 6)
df <- data.frame(apply(mm, c(1,2), function(x) sample(c(0,1),1)))
colnames(df) <- c("Horror", "Thriller", "Comedy", "Romantic", "Sci.fi", "gender")
All variables are binary, with 1 indicating either the presence of a specific movie type or the male gender. In the end, I would like a table that counts the presence of each movie type for each gender. Something like this:
male female
Horror 1 1
Thriller 1 3
Comedy 2 2
Romantic 0 0
Sci.fi 2 0
I know I can create two tables of different movie types for male and female individually (see TarJae's answer here Create count table under specific condition) and cbind them later but I would like to do it in one chunk of code. How to achieve this in an efficient way?

You could do
sapply(split(df, df$gender), function(x) colSums(x[names(x)!="gender"]))
#> 0 1
#> Horror 1 1
#> Thriller 1 3
#> Comedy 0 0
#> Romantic 0 0
#> Sci.fi 1 3

Here is a solution using dplyr, tidyr and forcats:
library(dplyr)
library(tidyr)
library(forcats)
df %>% pivot_longer(cols = -gender, names_to = "type") %>%
mutate(gender = fct_recode(as.character(gender),Male = "0",Female = "1")) %>%
group_by(gender,type) %>%
summarise(sum = sum(value)) %>%
pivot_wider(names_from = gender,values_from = sum)
Which gives
# A tibble: 5 x 3
type Male Female
<chr> <dbl> <dbl>
1 Comedy 0 1
2 Horror 1 3
3 Romantic 1 1
4 Sci.fi 1 1
5 Thriller 1 1
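The mutate() line is optional; it only relabels the levels of the gender variable. Note that it maps 0 to Male, whereas the question uses 1 for male, so swap the labels if you want them to match the question.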

Please find below a reprex with an alternative solution using data.table and magrittr (for the pipes), also in one chunk.
Reprex
Your data (I set a seed for reproducibility)
set.seed(452)
mm <- matrix(0, 5, 6)
df <- data.frame(apply(mm, c(1,2), function(x) sample(c(0,1),1)))
colnames(df) <- c("Horror", "Thriller", "Comedy", "Romantic", "Sci.fi", "gender")
df
#> Horror Thriller Comedy Romantic Sci.fi gender
#> 1 0 1 1 0 0 0
#> 2 0 0 0 0 1 0
#> 3 1 0 1 1 0 1
#> 4 0 1 0 0 0 1
#> 5 0 1 0 0 0 1
Code in one chunk
library(data.table)
library(magrittr) # for the pipes!
df %>%
transpose(., keep.names = "rn") %>%
setDT(.) %>%
{.[, .(rn = rn,
male = rowSums(.[,.SD, .SDcols = .[, .SD[.N]] == 1]),
female = rowSums(.[,.SD, .SDcols = .[, .SD[.N]] == 0]))][rn !="gender"]}
Output
#> rn male female
#> 1: Horror 1 0
#> 2: Thriller 2 1
#> 3: Comedy 1 1
#> 4: Romantic 1 0
#> 5: Sci.fi 0 1
Created on 2021-11-25 by the reprex package (v2.0.1)
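For what it's worth, the same table can also be sketched in plain base R with two colSums() calls. The data frame below is typed in by hand from the seeded example printed above, so the result can be checked against the data.table output; movies and counts are names made up for the example.

```r
# The seeded example data from above, entered by hand
df <- data.frame(
  Horror   = c(0, 0, 1, 0, 0),
  Thriller = c(1, 0, 0, 1, 1),
  Comedy   = c(1, 0, 1, 0, 0),
  Romantic = c(0, 0, 1, 0, 0),
  Sci.fi   = c(0, 1, 0, 0, 0),
  gender   = c(0, 0, 1, 1, 1)
)

# Movie indicator columns only
movies <- as.matrix(df[names(df) != "gender"])

# Sum each movie column separately for gender == 1 (male) and gender == 0 (female);
# drop = FALSE keeps a matrix even if a gender has a single row
counts <- cbind(
  male   = colSums(movies[df$gender == 1, , drop = FALSE]),
  female = colSums(movies[df$gender == 0, , drop = FALSE])
)
counts
```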

In R, take sum of multiple variables if combination of values in two other columns are unique

I am trying to expand on the answer to this problem that was solved, Take Sum of a Variable if Combination of Values in Two Other Columns are Unique
but because I am new to stack overflow, I can't comment directly on that post so here is my problem:
I have a dataset like the following but with about 100 columns of binary data as shown in "ani1" and "bni2" columns.
Locations <- c("A","A","A","A","B","B","C","C","D", "D","D")
seasons <- c("2", "2", "3", "4","2","3","1","2","2","4","4")
ani1 <- c(1,1,1,1,0,1,1,1,0,1,0)
bni2 <- c(0,0,1,1,1,1,0,1,0,1,1)
df <- data.frame(Locations, seasons, ani1, bni2)
Locations seasons ani1 bni2
1 A 2 1 0
2 A 2 1 0
3 A 3 1 1
4 A 4 1 1
5 B 2 0 1
6 B 3 1 1
7 C 1 1 0
8 C 2 1 1
9 D 2 0 0
10 D 4 1 1
11 D 4 0 1
I am attempting to sum every column from column #3 onwards, so that I get one total per column for each unique combination of location and season.
The problem is not all the columns have a 1 value for every combination of location and season and they all have different names.
I would like something like this:
Locations seasons ani1 bni2
1 A 2 2 0
2 A 3 1 1
3 A 4 1 1
4 B 2 0 1
5 B 3 1 1
6 C 1 1 0
7 C 2 1 1
8 D 2 0 0
9 D 4 1 2
Here is my attempt using a for loop:
df2 <- 0
for(i in 3:length(df)){
testdf <- data.frame(t(apply(df[1:2], 1, sort)), df[i])
df2 <- aggregate(i~., testdf, FUN=sum)
}
I get the following error:
Error in model.frame.default(formula = i ~ ., data = testdf) :
variable lengths differ (found for 'X1')
Thank you!

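You can use dplyr::summarise and across after group_by. Because across() skips the grouping columns, everything() selects all the remaining columns, whatever they are called.
library(dplyr)
df %>%
group_by(Locations, seasons) %>%
summarise(across(everything(), ~sum(.x, na.rm = TRUE))) %>%
ungroup()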
Another option is to reshape the data to long format using functions from the tidyr package. This avoids the issue of having to select columns 3 onwards.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = -c(Locations, seasons)) %>%
group_by(Locations, seasons, name) %>%
summarise(Sum = sum(value, na.rm = TRUE)) %>%
ungroup() %>%
pivot_wider(names_from = "name", values_from = "Sum")
Result:
# A tibble: 9 x 4
Locations seasons ani1 bni2
<chr> <int> <int> <int>
1 A 2 2 0
2 A 3 1 1
3 A 4 1 1
4 B 2 0 1
5 B 3 1 1
6 C 1 1 0
7 C 2 1 1
8 D 2 0 0
9 D 4 1 2
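The for loop in the question is not needed in base R either: aggregate() accepts a dot on the left-hand side of the formula, meaning "every column not named on the right". A minimal sketch on the question's data (totals is a name made up for the example):

```r
Locations <- c("A", "A", "A", "A", "B", "B", "C", "C", "D", "D", "D")
seasons   <- c("2", "2", "3", "4", "2", "3", "1", "2", "2", "4", "4")
ani1 <- c(1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0)
bni2 <- c(0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1)
df <- data.frame(Locations, seasons, ani1, bni2)

# Sum every remaining column for each Locations/seasons combination
totals <- aggregate(. ~ Locations + seasons, data = df, FUN = sum)

# aggregate() orders by the last grouping variable first, so re-sort for readability
totals <- totals[order(totals$Locations, totals$seasons), ]
rownames(totals) <- NULL
totals
```

Keep in mind that the formula interface drops rows containing NA by default, so with missing values the totals may differ from the dplyr version that uses na.rm = TRUE.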

R how to create a dictionary of unique values [duplicate]

This question already has answers here:
Split character column into several binary (0/1) columns
(7 answers)
Closed 2 years ago.
I have a column in a dataframe that contains multiple values like this
fruits
1 apple,banana
2 banana,peaches
3 peaches
4 mango
Is there a way to create a dictionary of the unique values in fruits,
which would then be stored in a new column with the values:
fruits = apple,banana,peaches,mango
UPDATE: I need the value as a column and not a list of just unique values . So that I can create a final dataframe that would have the following :
          fruits fruit_apple fruit_banana fruit_mango fruit_peaches
1   apple,banana           1            1           0             0
2 banana,peaches           0            1           0             1
3        peaches           0            0           0             1
4          mango           0            0           1             0

We can do this easily with cSplit_e from splitstackshape
library(splitstackshape)
cSplit_e(df1, "fruits", ",", type = "character", fill = 0)
# fruits fruits_apple fruits_banana fruits_mango fruits_peaches
#1 apple,banana 1 1 0 0
#2 banana,peaches 0 1 0 1
#3 peaches 0 0 0 1
#4 mango 0 0 1 0
data
df1 <- structure(list(fruits = c("apple,banana", "banana,peaches", "peaches",
"mango")), .Names = "fruits", class = "data.frame", row.names = c("1",
"2", "3", "4"))

Do you want the new column to be that concatenated list repeated? Sorry, it's not particularly clear. Assuming that's the case though, and that your data.frame consists of strings not factors;
df <- read.delim(
text="fruits
apple,banana
banana,peaches
peaches
mango",
sep="\n",
header=TRUE,
stringsAsFactors=FALSE)
df
#> fruits
#> 1 apple,banana
#> 2 banana,peaches
#> 3 peaches
#> 4 mango
df$uniquefruits <- paste0(unique(unlist(strsplit(df$fruits, split=","))), collapse=",")
df
#> fruits uniquefruits
#> 1 apple,banana apple,banana,peaches,mango
#> 2 banana,peaches apple,banana,peaches,mango
#> 3 peaches apple,banana,peaches,mango
#> 4 mango apple,banana,peaches,mango
Or do you mean taking only the values from your first fruits column that are not duplicated elsewhere?
Update: Based on comments, I think this is what you're after:
uniquefruits <- unique(unlist(strsplit(df$fruits, split=",")))
uniquefruits
#> [1] "apple" "banana" "peaches" "mango"
df2 <- cbind(df,
sapply(uniquefruits,
function(y) apply(df, 1,
function(x) as.integer(y %in% unlist(strsplit(x, split=","))))))
df2
#> fruits apple banana peaches mango
#> 1 apple,banana 1 1 0 0
#> 2 banana,peaches 0 1 1 0
#> 3 peaches 0 0 1 0
#> 4 mango 0 0 0 1
In theory, you could do this with dplyr but I can't figure out how to automate the column processing for the rowwise mutate (anyone know how?)
library(dplyr)
df %>% rowwise() %>% mutate(apple = as.integer("apple" %in% unlist(strsplit(fruits, ","))),
banana = as.integer("banana" %in% unlist(strsplit(fruits, ","))),
peaches = as.integer("peaches" %in% unlist(strsplit(fruits, ","))),
mango = as.integer("mango" %in% unlist(strsplit(fruits, ","))))
#> Source: local data frame [4 x 5]
#> Groups: <by row>
#>
#> # A tibble: 4 x 5
#> fruits apple banana peaches mango
#> <chr> <int> <int> <int> <int>
#> 1 apple,banana 1 1 0 0
#> 2 banana,peaches 0 1 1 0
#> 3 peaches 0 0 1 0
#> 4 mango 0 0 0 1
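One possible way to automate the columns, as a sketch rather than the definitive answer: split the strings into one row per fruit with tidyr::separate_rows(), then spread them back out with pivot_wider(). The helper columns row_id, fruit and present are made up for the example, and it assumes no fruit is repeated within a single row.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  fruits = c("apple,banana", "banana,peaches", "peaches", "mango"),
  stringsAsFactors = FALSE
)

indicators <- df %>%
  # keep a row id so identical strings in different rows stay separate
  mutate(row_id = row_number(), fruit = fruits) %>%
  # one row per individual fruit
  separate_rows(fruit, sep = ",") %>%
  mutate(present = 1L) %>%
  # one column per fruit, 0 where the fruit is absent
  pivot_wider(names_from = fruit, values_from = present, values_fill = 0L) %>%
  select(-row_id)

indicators
```

The column names are taken from the data, so nothing has to be typed out per fruit.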

with base R:
fruits <- sort(unique(unlist(strsplit(as.character(df$fruits), split=','))))
cols <- as.data.frame(matrix(rep(0, nrow(df)*length(fruits)), ncol=length(fruits)))
names(cols) <- fruits
df <- cbind.data.frame(df, cols)
df <- as.data.frame(t(apply(df, 1, function(x){fruits <- strsplit(x['fruits'], split=','); x[unlist(fruits)] <- 1;x})))
df
fruits apple banana mango peaches
1 apple,banana 1 1 0 0
2 banana,peaches 0 1 0 1
3 peaches 0 0 0 1
4 mango 0 0 1 0

You can use the steps below:
1) Split each string on commas using the strsplit function.
2) Unlist the resulting list of vectors into a single vector.
3) Take the unique values of the list.fruits character vector.
Here is the solution:
# DataFrame of fruits
f <- c("apple,banana","banana,peaches","peaches","mango")
fruits <- as.data.frame(f)
# fruits dataframe
fruits
#               f
#1   apple,banana
#2 banana,peaches
#3        peaches
#4          mango
list.fruits <- unlist(strsplit(f,split=","))
unique.fruits <- unique(list.fruits)
# Result
unique.fruits
[1] "apple" "banana" "peaches" "mango"
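To get from the unique values to the 0/1 columns requested in the update, the three steps above can be extended as in the sketch below. It assumes the column is character rather than factor; parts, indicators and result are names made up for the example, and the fruit_ prefix follows the question.

```r
fruits <- data.frame(
  f = c("apple,banana", "banana,peaches", "peaches", "mango"),
  stringsAsFactors = FALSE
)

# Steps 1 to 3: split, unlist, unique
parts <- strsplit(fruits$f, split = ",", fixed = TRUE)
unique.fruits <- unique(unlist(parts))

# For each row, mark which of the unique fruits it contains,
# then stack the per-row vectors into a matrix (one row per data row)
indicators <- do.call(
  rbind,
  lapply(parts, function(p) as.integer(unique.fruits %in% p))
)
colnames(indicators) <- paste0("fruit_", unique.fruits)

result <- cbind(fruits, indicators)
result
```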

Reshape from long to wide and create columns with binary value

I am aware of the spread function in the tidyr package but this is something I am unable to achieve.
I have a data.frame with 2 columns as defined below. I need to transpose the column Subject into binary columns with 1 and 0.
Below is the data frame:
studentInfo <- data.frame(StudentID = c(1,1,1,2,3,3),
Subject = c("Maths", "Science", "English", "Maths", "History", "History"))
> studentInfo
StudentID Subject
1 1 Maths
2 1 Science
3 1 English
4 2 Maths
5 3 History
6 3 History
And the output I am expecting is:
StudentID Maths Science English History
1 1 1 1 1 0
2 2 1 0 0 0
3 3 0 0 0 1
How can I do this with the spread() function or any other function?

Using reshape2 we can dcast from long to wide.
As you only want a binary outcome we can unique the data first
library(reshape2)
si <- unique(studentInfo)
dcast(si, formula = StudentID ~ Subject, fun.aggregate = length)
# StudentID English History Maths Science
#1 1 1 0 1 1
#2 2 0 0 1 0
#3 3 0 1 0 0
Another approach using tidyr and dplyr is
library(tidyr)
library(dplyr)
studentInfo %>%
mutate(yesno = 1) %>%
distinct %>%
spread(Subject, yesno, fill = 0)
# StudentID English History Maths Science
#1 1 1 0 1 1
#2 2 0 0 1 0
#3 3 0 1 0 0
Although I'm not a fan (yet) of tidyr syntax...

We can use table from base R
+(table(studentInfo)!=0)
# Subject
#StudentID English History Maths Science
# 1 1 0 1 1
# 2 0 0 1 0
# 3 0 1 0 0
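If you need a regular data frame with StudentID as a column, as in the expected output, rather than a table object, a possible follow-up is sketched here (flags and wide are names made up for the example):

```r
studentInfo <- data.frame(
  StudentID = c(1, 1, 1, 2, 3, 3),
  Subject = c("Maths", "Science", "English", "Maths", "History", "History")
)

# 0/1 matrix: does the student/subject pair occur at all?
flags <- +(table(studentInfo) != 0)

# Turn the two-way table into a wide data frame, one column per subject
wide <- as.data.frame.matrix(flags)

# The student IDs are stored as row names; move them into a proper column
wide <- cbind(StudentID = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
wide
```

The subject columns come out in alphabetical order, because table() treats Subject as a factor with sorted levels.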

Using tidyr :
library(tidyr)
studentInfo <- data.frame(
StudentID = c(1,1,1,2,3,3),
Subject = c("Maths", "Science", "English", "Maths", "History", "History"))
pivot_wider(studentInfo,
names_from = "Subject",
values_from = 'Subject',
values_fill = 0L,
values_fn = function(x) 1L)
#> # A tibble: 3 x 5
#> StudentID Maths Science English History
#> <dbl> <int> <int> <int> <int>
#> 1 1 1 1 1 0
#> 2 2 1 0 0 0
#> 3 3 0 0 0 1
Created on 2019-09-19 by the reprex package (v0.3.0)
