R Long to wide with count and sum [duplicate] - r

This question already has an answer here:
R: Reshaping Multiple Columns from Long to Wide
(1 answer)
Closed 2 years ago.
I have data as below:
#dt
Method ID Source Amt
A 1 X 10
A 1 Y 20
C 1 Z 30
B 2 Y 15
D 2 Z 10
C 3 X 20
D 3 X 20
E 4 Z 10
E 4 Z 10
What I want is:
ID Total_Amt Method_A Method_B Method_C Method_D Method_E Source_X Source_Y Source_Z
1 60 2 0 1 0 0 1 1 1
2 25 0 1 0 1 0 0 1 1
3 40 0 0 1 1 0 2 0 0
4 20 0 0 0 0 2 0 0 2
For the Method and Source columns, I want to count each value by ID and use dcast to reshape the counts to wide format, and also sum the Amt column by ID.
Any help?

Here's one way using the dplyr and tidyr libraries, with the data stored in a data frame named df. We first calculate the sum of Amt for each ID, reshape Method and Source to long format, count the rows for each value, and pivot the counts back to wide format.
library(dplyr)
library(tidyr)
df %>%
group_by(ID) %>%
mutate(Amt = sum(Amt)) %>%
pivot_longer(cols = c(Method, Source)) %>%
count(ID, value, Amt, name) %>%
pivot_wider(names_from = c(name, value), values_from = n, values_fill = 0)
# ID Amt Method_A Method_C Source_X Source_Y Source_Z Method_B Method_D Method_E
# <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#1 1 60 2 1 1 1 1 0 0 0
#2 2 25 0 0 0 1 1 1 1 0
#3 3 40 0 1 2 0 0 0 1 0
#4 4 20 0 0 0 0 2 0 0 2
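As a cross-check on the pipeline above, the same table can be sketched in base R. This is only a minimal sketch, not the answerer's method: it assumes the question's data lives in a data frame named dt (rebuilt by hand here), and the names total, method_counts, source_counts and result are illustrative.

```r
# Data from the question, rebuilt by hand (the name dt is an assumption)
dt <- data.frame(
  Method = c("A", "A", "C", "B", "D", "C", "D", "E", "E"),
  ID     = c(1, 1, 1, 2, 2, 3, 3, 4, 4),
  Source = c("X", "Y", "Z", "Y", "Z", "X", "X", "Z", "Z"),
  Amt    = c(10, 20, 30, 15, 10, 20, 20, 10, 10)
)

# Sum of Amt per ID; tapply returns a vector named by ID
total <- tapply(dt$Amt, dt$ID, sum)

# Cross-tabulate ID against Method and Source; table() puts 0 in absent pairs
method_counts <- as.data.frame.matrix(table(dt$ID, dt$Method))
source_counts <- as.data.frame.matrix(table(dt$ID, dt$Source))
names(method_counts) <- paste0("Method_", names(method_counts))
names(source_counts) <- paste0("Source_", names(source_counts))

# Bind everything together, one row per ID
result <- data.frame(
  ID        = as.numeric(names(total)),
  Total_Amt = as.vector(total),
  method_counts,
  source_counts,
  row.names = NULL
)
print(result)
```

Because table() works on the full set of levels, the columns come out in alphabetical order (Method_A to Method_E, then Source_X to Source_Z), which matches the layout requested in the question.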

Related

extract duplicate row based on condition across column in R

I'm stuck trying to keep rows based on a condition in R. I want to apply the same condition across a large number of columns. In the example below, I want to collapse the duplicated rows for each ID into one row, keeping the value 0 in every column where any of the duplicates has a 0.
here is the data frame:
ID A B C
1 001 1 1 1
2 002 0 1 0
3 002 1 0 0
4 003 0 1 1
5 003 1 0 1
6 003 0 0 1
I want get like this:
ID A B C
1 001 1 1 1
2 002 0 0 0
3 003 0 0 1
Any help would be much appreciated, thanks!
Please check this code
# A tibble: 6 × 4
ID A B C
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 1
2 2 0 1 0
3 2 1 0 0
4 3 0 1 1
5 3 1 0 1
6 3 0 0 1
code
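library(dplyr)
library(tidyr) # for fill()
data2 <- data %>% group_by(ID) %>%
# flag zeros per column: 0 where the value is 0, NA otherwise
mutate(across(c('A','B','C'), ~ ifelse(.x==0, 0, NA), .names = 'x{col}')) %>%
# carry each flagged 0 down to the last row of the group
fill(xA, xB, xC) %>%
# any NA left means no 0 was seen in that column, so the value is 1
mutate(across(c('xA','xB','xC'), ~ ifelse(is.na(.x), 1, .x))) %>%
ungroup() %>% group_by(ID) %>% slice_tail(n=1)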
output
# A tibble: 3 × 7
# Groups: ID [3]
ID A B C xA xB xC
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 1 1 1
2 2 1 0 0 0 0 0
3 3 0 0 1 0 0 1
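If the columns only ever hold 0 and 1 (an assumption taken from the example, not stated in the question), "keep the 0 if any duplicate has one" is the same as taking the minimum per ID. A minimal base R sketch of that shortcut, with the example data rebuilt by hand and collapsed as an illustrative name:

```r
# Example data from the question, rebuilt by hand
data <- data.frame(
  ID = c("001", "002", "002", "003", "003", "003"),
  A  = c(1, 0, 1, 0, 1, 0),
  B  = c(1, 1, 0, 1, 0, 0),
  C  = c(1, 0, 0, 1, 1, 1)
)

# For 0/1 columns, the group minimum is 0 as soon as any row in the group is 0
collapsed <- aggregate(data[c("A", "B", "C")], by = data["ID"], FUN = min)
print(collapsed)
```

This returns one row per ID with the columns A, B and C directly, without the helper columns xA, xB and xC. It would not be appropriate if the columns can hold values other than 0 and 1.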

Make a new column for every variable and tally [duplicate]

This question already has answers here:
R Split delimited strings in a column and insert as new column (in binary) [duplicate]
(3 answers)
Closed 4 months ago.
I have the following dataframe:
sample name
1 a cobra, tiger, reptile
2 b tiger, spynx
3 c reptile, cobra
4 d sphynx, tiger
5 e cat, dog, tiger
6 f dog, spynx
and what I want to make from that is.
sample cobra tiger spynx reptile cat dog
1 a 1 1 0 1 0 0
2 b 0 1 1 0 0 0
3 c 1 0 0 1 0 0
4 d 0 1 1 0 0 0
5 e 0 1 0 0 1 1
6 f 0 0 1 0 0 1
So basically I want to make a new column for every value that appears in the name column, and put a 1 if that value is present in df$name for the row and a 0 if it is not.
all <- unique(unlist(strsplit(as.character(df$name), ", ")))
all <- all[!is.na(all)]
for (i in all) {
df[i] <- 0 }
This gives me a column of 0s for every variable. Now I want to match each one against the name column and turn the 0 into a 1 wherever the value is present.
How would you approach this?
With tidyr and dplyr...
library(tidyr)
library(dplyr, warn = FALSE)
df1 |>
separate_rows(name) |>
group_by(sample, name) |>
summarise(count = n(), .groups = "drop") |>
pivot_wider(names_from = "name", values_from = "count", values_fill = 0)
#> # A tibble: 6 × 8
#> sample cobra reptile tiger spynx sphynx cat dog
#> <chr> <int> <int> <int> <int> <int> <int> <int>
#> 1 a 1 1 1 0 0 0 0
#> 2 b 0 0 1 1 0 0 0
#> 3 c 1 1 0 0 0 0 0
#> 4 d 0 0 1 0 1 0 0
#> 5 e 0 0 1 0 0 1 1
#> 6 f 0 0 0 1 0 0 1
Created on 2022-10-19 with reprex v2.0.2
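The asker's loop idea (create the columns, then fill in the 1s) can also be finished in base R. The following is a minimal sketch under the assumption that values are separated by exactly ", "; the names tokens, all_names, indicators and result are illustrative, and the data is the same df1 used in this answer.

```r
# Same example data as in the answer
df1 <- data.frame(
  sample = letters[1:6],
  name = c("cobra, tiger, reptile", "tiger, spynx", "reptile, cobra",
           "sphynx, tiger", "cat, dog, tiger", "dog, spynx"),
  stringsAsFactors = FALSE
)

# Split each row's string into its individual values
tokens <- strsplit(df1$name, ", ", fixed = TRUE)

# Every distinct value, in order of first appearance
all_names <- unique(unlist(tokens))

# One row per sample, one 0/1 column per distinct value
indicators <- t(sapply(tokens, function(x) as.integer(all_names %in% x)))
colnames(indicators) <- all_names

result <- cbind(df1["sample"], indicators)
print(result)
```

As in the tidyr output, "spynx" and "sphynx" end up in separate columns because the strings differ; they would need to be harmonised first if they are meant to be the same value.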
data
df1 <- data.frame(sample = letters[1:6],
name = c("cobra, tiger, reptile",
"tiger, spynx",
"reptile, cobra",
"sphynx, tiger",
"cat, dog, tiger",
"dog, spynx"))

Add a column that count number of rows until the first 1, by group in R

I have the following dataset:
test_df=data.frame(Group=c(1,1,1,1,2,2),var1=c(1,0,0,1,1,1),var2=c(0,0,1,1,0,0),var3=c(0,1,0,0,0,1))
Group var1 var2 var3
1 1 0 0
1 0 0 1
1 0 1 0
1 1 1 0
2 1 0 0
2 1 0 1
I want to add 3 columns (out1-3), one for each of var1-3, that give the number of rows up to and including the first 1 within each Group (0 if the group has no 1), as shown below:
Group var1 var2 var3 out1 out2 out3
1 1 0 0 1 3 2
1 0 0 1 1 3 2
1 0 1 0 1 3 2
1 1 1 0 1 3 2
2 1 0 0 1 0 2
2 1 0 1 1 0 2
I used the R code below and repeated it for each of my 3 variables (my actual dataset contains many more columns), but it is not working:
test_var1<-select(test_df,Group,var1 )%>%
group_by(Group) %>%
mutate(out1 = row_number()) %>%
filter(var1 != 0) %>%
slice(1)
df <- data.frame(Group=c(1,1,1,1,2,2),
var1=c(1,0,0,1,1,1),
var2=c(0,0,1,1,0,0),
var3=c(0,1,0,0,0,1))
This works for any number of variables as long as the structure is the same as in the example (i.e. Group + many variables that are 0 or 1)
library(dplyr)
library(tidyr)
df %>%
mutate(rownr = row_number()) %>%
pivot_longer(-c(Group, rownr)) %>%
group_by(Group, name) %>%
# count the leading 0s plus one; if that exceeds the group size there is no 1, so use 0
mutate(out = cumsum(value != 1 & (cumsum(value) < 1)) + 1,
out = ifelse(max(out) > n(), 0, max(out))) %>%
pivot_wider(names_from = name, values_from = c(value, out)) %>%
select(-rownr)
Returns:
Group value_var1 value_var2 value_var3 out_var1 out_var2 out_var3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0 0 1 3 2
2 1 0 0 1 1 3 2
3 1 0 1 0 1 3 2
4 1 1 1 0 1 3 2
5 2 1 0 0 1 0 2
6 2 1 0 1 1 0 2
If you only have 3 "out" variables, you can create the three columns as follows (this example has a single group):
#1- Your dataset
df=data.frame(Group=rep(1,4),var1=c(1,0,0,1),var2=c(0,0,1,1),var3=c(0,1,0,0))
#2- Find the first row number with a "1" value
df$out1=min(which(df$var1==1))
df$out2=min(which(df$var2==1))
df$out3=min(which(df$var3==1))
If you have more than 3 columns, it may be better to use a loop, for example:
for(i in 1:3){
df[paste("out",i,sep="")]=min(which(df[,paste("var",i,sep="")]==1))
}
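The loop idea can be extended to respect Group with ave() from base R. This is a hedged sketch rather than either answerer's code: first_one is a hypothetical helper, and the var/out naming pattern is assumed from the question.

```r
# Data from the question
test_df <- data.frame(Group = c(1, 1, 1, 1, 2, 2),
                      var1 = c(1, 0, 0, 1, 1, 1),
                      var2 = c(0, 0, 1, 1, 0, 0),
                      var3 = c(0, 1, 0, 0, 0, 1))

# Position of the first 1 within a group (0 if there is none),
# repeated so that every row of the group gets the same value
first_one <- function(x) {
  pos <- match(1, x, nomatch = 0L)
  rep(pos, length(x))
}

vars <- c("var1", "var2", "var3")
outs <- sub("var", "out", vars)

# ave() applies first_one separately to each Group and keeps the row order
for (i in seq_along(vars)) {
  test_df[[outs[i]]] <- ave(test_df[[vars[i]]], test_df$Group, FUN = first_one)
}
print(test_df)
```

For a wider dataset, vars could be built with something like grep("^var", names(test_df), value = TRUE), assuming all variable columns share that prefix.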

Transpose datatable from columns to rows based on a condition in R

I have a dataset which includes data regarding the activities of different people in rows of 15 minutes. Something like this:
Id Ins Out
1 1 1
1 0 1
1 1 1
. 0 0
. 1 0
. 0 1
2 1 1
2 1 0
. 0 1
etc 1 1
Here each row corresponds to a 15-minute timeslot, but the time slots referring to different people are placed beneath each other. I want to have the data in the columns "ins" and "out" to be in the same row as their respective id. So for id = 1, the whole row contains the data that is now in the columns "ins" and "out". I tried to use transpose but this obviously transposes the whole table making it very short and extremely wide.
The desired output is
id type var1 var2 var3 var4
1 ins 1 0 1 etc
1 out 1 1 1 etc
2 ins 1 1 0 etc
2 out 1 0 1 etc
etc.
You could:
- group by Id to get the needed column indices with row_number
- pivot longer to put Ins and Out together
- pivot wider to get the expected output
data <- read.table(text = '
Id Ins Out
1 1 1
1 0 1
1 1 1
1 0 0
1 1 0
1 0 1
2 1 1
2 1 0
2 0 1',header=T)
library(tidyr)
library(dplyr)
data %>% group_by(Id) %>%
mutate(Var = row_number()) %>%
pivot_longer(cols = c("Ins","Out"), names_to = 'type') %>%
pivot_wider(names_from = Var, names_prefix = 'Var', values_from = value)
#> # A tibble: 4 x 8
#> # Groups: Id [2]
#> Id type Var1 Var2 Var3 Var4 Var5 Var6
#> <int> <chr> <int> <int> <int> <int> <int> <int>
#> 1 1 Ins 1 0 1 0 1 0
#> 2 1 Out 1 1 1 0 0 1
#> 3 2 Ins 1 1 0 NA NA NA
#> 4 2 Out 1 0 1 NA NA NA

Only Use The First Match For Every N Rows

I have a data.frame that looks like this.
Date Number
1 1
2 0
3 1
4 0
5 0
6 1
7 0
8 0
9 1
I would like to create a new column that puts a 1 in the column if it is the first 1 of every 3 rows. Otherwise put a 0. For example, this is how I would like the new data.frame to look
Date Number New
1 1 1
2 0 0
3 1 0
4 0 0
5 0 0
6 1 1
7 0 0
8 0 0
9 1 1
Every three rows we find the first 1 and populate the column otherwise we place a 0. Thank you.
Hmm, at first glance I thought akrun's answer provided the solution. However, it is not exactly what I am looking for. Here is what @akrun's solution provides.
df1 = data.frame(Number = c(1,0,1,0,1,1,1,0,1,0,0,0))
head(df1,9)
Number
1 1
2 0
3 1
4 0
5 1
6 1
7 1
8 0
9 1
Attempt at solution:
df1 %>%
group_by(grp = as.integer(gl(n(), 3, n()))) %>%
mutate(New = +(Number == row_number()))
Number grp New
<dbl> <int> <int>
1 1 1 1
2 0 1 0
3 1 1 0
4 0 2 0
5 1 2 0 #should be a 1
6 1 2 0
7 1 3 1
8 0 3 0
9 1 3 0
As you can see, the code misses the 1 on row 5. I am looking for the first 1 in every chunk of three rows; everything else should be 0.
Sorry if I was unclear, akrun.
Edit: akrun's new answer is exactly what I am looking for. Thank you very much.
Here is an option: create a grouping column with gl, then compare row_number() with the index of the first 1 in each group. match returns only the index of the first match, and nomatch = 0 leaves a group without any 1 as all 0.
library(dplyr)
df1 %>%
group_by(grp = as.integer(gl(n(), 3, n()))) %>%
mutate(New = +(row_number() == match(1, Number, nomatch = 0)))
# A tibble: 12 x 3
# Groups: grp [4]
# Number grp New
# <dbl> <int> <int>
# 1 1 1 1
# 2 0 1 0
# 3 1 1 0
# 4 0 2 0
# 5 1 2 1
# 6 1 2 0
# 7 1 3 1
# 8 0 3 0
# 9 1 3 0
#10 0 4 0
#11 0 4 0
#12 0 4 0
Looking at the logic, perhaps you want to check if Number == 1 and that the prior 2 values were both 0. If that is not correct please let me know.
library(dplyr)
df %>%
mutate(New = ifelse(Number == 1 & lag(Number, n = 1L, default = 0) == 0 & lag(Number, n = 2L, default = 0) == 0, 1, 0))
Output
Date Number New
1 1 1 1
2 2 0 0
3 3 1 0
4 4 0 0
5 5 0 0
6 6 1 1
7 7 0 0
8 8 0 0
9 9 1 1
You can replace Number value to 0 except for the 1st occurrence of 1 in each 3 rows.
library(dplyr)
df %>%
group_by(gr = ceiling(row_number()/3)) %>%
mutate(New = replace(Number, -which.max(Number), 0)) %>%
#Or to be safe and specific use
#mutate(New = replace(Number, -which(Number == 1)[1], 0)) %>%
ungroup() %>% select(-gr)
# A tibble: 9 x 3
# Date Number New
# <int> <int> <int>
#1 1 1 1
#2 2 0 0
#3 3 1 0
#4 4 0 0
#5 5 0 0
#6 6 1 1
#7 7 0 0
#8 8 0 0
#9 9 1 1
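For comparison, the "first 1 in every chunk of three rows" rule can also be sketched in base R with ave(). This is an illustrative sketch, not one of the answers above: it reuses the df1 example from the asker's follow-up, and chunk and first_in_chunk are hypothetical names.

```r
# Example data from the asker's follow-up
df1 <- data.frame(Number = c(1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0))

# Chunk id: rows 1-3 are chunk 1, rows 4-6 are chunk 2, and so on
chunk <- ceiling(seq_len(nrow(df1)) / 3)

# 1 at the position of the first 1 in the chunk, 0 everywhere else;
# nomatch = 0 leaves a chunk without any 1 as all 0
first_in_chunk <- function(x) {
  as.integer(seq_along(x) == match(1, x, nomatch = 0L))
}

df1$New <- ave(df1$Number, chunk, FUN = first_in_chunk)
print(df1)
```

The chunk size of 3 is hard-coded here to match the question; it could be pulled out into a variable if other chunk sizes are needed.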
