Using Reshape to Combine Columns [duplicate] - r

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 4 years ago.
I have this dataset that I'm trying to melt, combining "Debit" and "Credit" into the same column.
random
Address ID Debit Credit
1 tower1 A1 33 NA
2 happy1 A2 NA 24
3 today2 A3 145 NA
4 yesterday3 A4 122 NA
5 random3 A5 NA 14143
random <- melt(random, id = c("Address", "ID"))
Address ID variable value
1 tower1 A1 Debit 33
2 happy1 A2 Debit NA
3 today2 A3 Debit 145
4 yesterday3 A4 Debit 122
5 random3 A5 Debit NA
6 tower1 A1 Credit NA
7 happy1 A2 Credit 24
8 today2 A3 Credit NA
9 yesterday3 A4 Credit NA
10 random3 A5 Credit 14143
random[!(is.na(random$value)| random$value == ""),] #to remove NA and join them together
I'm wondering if it is possible to achieve my final dataset directly via the reshape package.
This is the final dataset I hope to obtain
Address ID variable value
1 tower1 A1 Debit 33
3 today2 A3 Debit 145
4 yesterday3 A4 Debit 122
7 happy1 A2 Credit 24
10 random3 A5 Credit 14143

We can use gather to convert the dataframe into long format and then use na.omit to remove NA rows.
library(tidyverse)
df %>%
gather(key, value, -c(Address, ID)) %>%
na.omit()
# Address ID key value
#1 tower1 A1 Debit 33
#3 today2 A3 Debit 145
#4 yesterday3 A4 Debit 122
#7 happy1 A2 Credit 24
#10 random3 A5 Credit 14143
gather also has an na.rm parameter to remove the NA rows directly:
df %>% gather(key, value, -c(Address, ID), na.rm = TRUE)
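Since gather is now superseded, here is a sketch of the same thing with pivot_longer, where values_drop_na plays the role of na.rm (this assumes df is the data frame shown above):
df %>%
  pivot_longer(c(Debit, Credit), names_to = "variable",
               values_to = "value", values_drop_na = TRUE)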
With reshape2, you can pass na.rm = TRUE to melt to remove the NA rows:
library(reshape2)
melt(df, id = c("Address", "ID"), na.rm = TRUE)
# Address ID variable value
#1 tower1 A1 Debit 33
#3 today2 A3 Debit 145
#4 yesterday3 A4 Debit 122
#7 happy1 A2 Credit 24
#10 random3 A5 Credit 14143
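data.table's melt accepts na.rm as well; a minimal sketch, assuming df is the data frame from the question (as.data.table converts it first):
library(data.table)
melt(as.data.table(df), id.vars = c("Address", "ID"), na.rm = TRUE)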

Related

R create new column based on data range at a certain time point

I have large data frame (>50 columns). A sample of the relevant columns are here:
tb <- data.frame(RowID=c("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "A10", "A11", "A12", "A13", "A14", "A15"),
Patient=c("001", "001", "001", "002", "002", "035", "035", "035", "035", "035", "100", "100", "105", "105", "105"),
Time=c(1,2,3,1,2,1,2,3,4,5,1,2,1,2,3),
Value=c(NA,10,23,100,30,10,15,NA,60,56.7,30,51,3,13,77))
I am trying to create a new column (Value_status) that ranks the initial value for each patient as either low or high (Value <50, Value >=50). The Value_status should be carried through to the other rows for that patient.
Here's what I have:
tb %>%
group_by(Patient) %>%
mutate(Value_status = if_else(Time == 1 & Value < 50, "low", "high"))
I thought I had solved it by adding group_by, but it doesn't give the same value for each individual patient as I hoped. I think I need to nest the if_else with more conditions, something like this?
Note: If a patient is missing Value at a time point other than 1, then they can still be grouped according to high/low.
tb %>%
group_by(Patient) %>%
mutate(Value_status = if_else(Time == 1 & Value < 50, "low",
if_else(Time == 1 & Value >= 50, "high",
if_else(#Apply the value from time point 1#))))
The output I am trying to get should look like this:
It should group patients based on whether or not their baseline values are high
RowID Patient Time Value Value_status
1 A1 001 1 NA <NA>
2 A2 001 2 10.0 <NA>
3 A3 001 3 23.0 <NA>
4 A4 002 1 100.0 high
5 A5 002 2 30.0 high
6 A6 035 1 10.0 low
7 A7 035 2 15.0 low
8 A8 035 3 NA low
9 A9 035 4 60.0 low
10 A10 035 5 56.7 low
11 A11 100 1 30.0 low
12 A12 100 2 51.0 low
13 A13 105 1 3.0 low
14 A14 105 2 13.0 low
15 A15 105 3 77.0 low
Instead of nested if_else, we could use case_when, which handles multiple conditions; then group_by 'Patient' and fill the NA elements of 'Value_status' with the previous non-NA value.
library(dplyr)
library(tidyr)
tb %>%
mutate(Value_status = case_when(Time == 1 & Value < 50 ~ "low",
Time == 1 & Value >= 50 ~ "high"
)) %>%
group_by(Patient) %>%
fill(Value_status) %>%
ungroup
Output:
# A tibble: 15 x 5
RowID Patient Time Value Value_status
<chr> <chr> <dbl> <dbl> <chr>
1 A1 001 1 NA <NA>
2 A2 001 2 10 <NA>
3 A3 001 3 23 <NA>
4 A4 002 1 100 high
5 A5 002 2 30 high
6 A6 035 1 10 low
7 A7 035 2 15 low
8 A8 035 3 NA low
9 A9 035 4 60 low
10 A10 035 5 56.7 low
11 A11 100 1 30 low
12 A12 100 2 51 low
13 A13 105 1 3 low
14 A14 105 2 13 low
15 A15 105 3 77 low
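An alternative sketch that skips the fill step is to index the baseline value directly inside each group; this assumes every patient has exactly one Time == 1 row:
library(dplyr)
tb %>%
  group_by(Patient) %>%
  # compare each patient's Time == 1 value once and recycle it over the group
  mutate(Value_status = case_when(Value[Time == 1] < 50 ~ "low",
                                  Value[Time == 1] >= 50 ~ "high")) %>%
  ungroup()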
Here is a solution with a nested ifelse:
tb %>%
mutate(Value_status = ifelse(Time != 1 & Value ==10, "medium",
ifelse(Time == 1 & Value < 50, "low",
ifelse(Time == 1 & Value >= 50, "high", NA)
)
))
Output:
RowID Patient Time Value Value_status
1 A1 001 1 NA <NA>
2 A2 001 2 10 medium
3 A3 001 3 23 <NA>
4 A4 002 1 100 high
5 A5 002 2 30 <NA>
6 A6 035 1 10 low
7 A7 035 2 15 <NA>
8 A8 035 3 NA <NA>
9 A9 035 4 60 <NA>
10 A10 035 5 57 <NA>
11 A11 100 1 30 low
12 A12 100 2 51 <NA>
13 A13 105 1 3 low
14 A14 105 2 13 <NA>
15 A15 105 3 77 <NA>

R spread dataframe [duplicate]

This question already has answers here:
Reshape multiple value columns to wide format
(5 answers)
Closed 2 years ago.
In R, how do I convert data1 into data2?
data1 = fread("
id year cost pf loss
A 2019-02 155 10 41
B 2019-03 165 14 22
B 2019-01 185 34 56
C 2019-02 350 50 0
A 2019-01 310 40 99")
data2 = fread("
id item 2019-01 2019-02 2019-03
A cost 310 155 NA
A pf 40 10 NA
A loss 99 41 NA
B cost 185 NA 165
B pf 34 NA 14
B loss 56 NA 22
C cost NA 350 NA
C pf NA 50 NA
C loss NA 0 NA")
I tried to use spread, gather, dplyr, apply, ... but without success.
First get the data in long format and then get it back in wide.
library(tidyr)
data1 %>%
pivot_longer(cols = cost:loss) %>%
pivot_wider(names_from = year, values_from = value)
Note that gather and spread have been retired and replaced by pivot_longer and pivot_wider.
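For reference only, a rough equivalent with the superseded verbs on older tidyr versions (assuming data1 as defined above):
data1 %>%
  gather(item, value, cost:loss) %>%   # long format: one row per id/year/item
  spread(year, value)                  # wide format: one column per year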
Using data.table:
library(data.table)
dcast(melt(data1, c('id', 'year')), id+variable~year, value.var = 'value')
# id variable 2019-01 2019-02 2019-03
#1: A cost 310 155 NA
#2: A pf 40 10 NA
#3: A loss 99 41 NA
#4: B cost 185 NA 165
#5: B pf 34 NA 14
#6: B loss 56 NA 22
#7: C cost NA 350 NA
#8: C pf NA 50 NA
#9: C loss NA 0 NA

Combine two dataframes same/different names [duplicate]

This question already has answers here:
Combine two data frames by rows (rbind) when they have different sets of columns
(14 answers)
Closed 3 years ago.
I have 2 dataframes and am trying to combine them: not only the columns with common names but also those with different names, filling in NA where a value is not found.
I tried a normal rbind, but it requires the same column names.
Dataframes:
d1 <- data.frame(a=c('a1','a2','a3'), b = c("a51","a52","a53"), d = c(12,13,14))
d2 <- data.frame(a=c('a4','a5','a6'), g = c("a151","a152","a153"), k = c(122,123,124))
Expected Output:
a b d g k
1 a1 a51 12 <NA> NA
2 a2 a52 13 <NA> NA
3 a3 a53 14 <NA> NA
4 a4 <NA> NA a151 122
5 a5 <NA> NA a152 123
6 a6 <NA> NA a153 124
Here is an option with bind_rows
library(dplyr)
bind_rows(d1, d2)
# a b d g k
#1 a1 a51 12 <NA> NA
#2 a2 a52 13 <NA> NA
#3 a3 a53 14 <NA> NA
#4 a4 <NA> NA a151 122
#5 a5 <NA> NA a152 123
#6 a6 <NA> NA a153 124
Or using rbindlist
library(data.table)
rbindlist(list(d1, d2), fill = TRUE)
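A base R sketch that gives the same result here: d1 and d2 share only the column a, and its values don't overlap, so a full outer merge fills the remaining columns with NA:
# full outer join on the only common column, a
merge(d1, d2, all = TRUE)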

Convert 3 columns dataframe to a matrix - Change the names of similar column

I have a 3-column data frame which I am trying to convert to a matrix.
Name Act value
Alex C1 100
John C2 100
Matt C3 252
Alex C1 456
Alex C5 234
John C5 456
Tina C2 897
Matt C2 652
Jorge C1 344
Alex C1 34
Matt C3 231
I want to have something like:
C1 C1.1 C1.2 C2 C3 C3.1 C4 C5
Alex 100 456 34 234
John 100 456
Matt 652 252 231
Tina 897
Jorge 344
Since the same name can appear with the same act and a different (or the same) value, I don't want to sum the values into a single cell of the matrix; instead, I want a separate column for each repetition of that act (with a number appended to the end of the column name).
Thanks
With tidyverse, you can do:
df %>%
group_by(Name) %>%
mutate(Act = make.unique(Act)) %>%
spread(Act, value)
Name C1 C1.1 C1.2 C2 C3 C3.1 C5
<chr> <int> <int> <int> <int> <int> <int> <int>
1 Alex 100 456 34 NA NA NA 234
2 John NA NA NA 100 NA NA 456
3 Jorge 344 NA NA NA NA NA NA
4 Matt NA NA NA 652 252 231 NA
5 Tina NA NA NA 897 NA NA NA
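Since spread is superseded, here is a sketch of the same idea with pivot_wider (as.character guards against Act being a factor; df is assumed to be the data frame from the question):
library(dplyr)
library(tidyr)
df %>%
  group_by(Name) %>%
  # number repeated acts within each name so they become separate columns
  mutate(Act = make.unique(as.character(Act))) %>%
  ungroup() %>%
  pivot_wider(names_from = Act, values_from = value)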

Match in R while disregarding order [duplicate]

This question already has an answer here:
Match Dataframes Excluding Last Non-NA Value and disregarding order
(1 answer)
Closed 5 years ago.
I am trying to do a match in R regardless of the order of the columns.
Basically, the problem I am trying to solve is: if all of the values in the columns of df2 (from column 2 to the end) are found in a row of df1 (in the columns after Partner), then match it to that row of df1.
Here's the catch: disregard the last non-NA value in each row of df1 when doing this match, but still include it in the final output.
After the match, determine if that last non-NA value exists in any of the columns of its matched row.
df1
Partner Col1 Col2 Col3 Col4
A A1 A2 NA NA
B A2 B9 NA NA
C B7 V9 C1 N9
D Q1 Q3 Q4 NA
df2
lift rule1 rule2 rule3
11 A2 A1 A9
10 A1 A3 NA
11 B9 A2 D7
10 Q4 Q1 NA
11 A2 B9 B1
How do I match df1 with df2 so that the following happens:
1) Disregards the order of the columns found in both dataframes.
2) Then determine if the last non-na value exists in the row currently.
Final output:
df3
Partner Col1 Col2 Col3 Col4 lift rule1 rule2 rule3 EXIST?
A A1 A2 NA NA 11 A2 A1 A9 YES
A A1 A2 NA NA 10 A1 A3 NA NOPE
B A2 B9 NA NA 11 B9 A2 D7 YES
B A2 B9 NA NA 11 A2 B9 B1 YES
D Q1 Q3 Q4 NA 10 Q4 Q1 NA YES
I get one more B match than you, but this solution is very close to what you want. You first have to add an id column to each data frame, as we use it to reconstruct the data. To perform the match, melt both data frames with gather from tidyr and join them with inner_join from dplyr. We then cbind the original data.frames using the ids.
library(tidyr);library(dplyr)
df1 <- read.table(text="Partner Col1 Col2 Col3 Col4
A A1 A2 NA NA
B A2 B9 NA NA
C B7 V9 C1 N9
D Q1 Q3 Q4 NA",header=TRUE, stringsAsFactors=FALSE)
df2 <- read.table(text="lift rule1 rule2 rule3
11 A2 A1 A9
10 A1 A3 NA
11 B9 A2 D7
10 Q4 Q1 NA
11 A2 B9 B1",header=TRUE, stringsAsFactors=FALSE)
df1 <- cbind(df1_id=1:nrow(df1),df1)
df2 <- cbind(df2_id=1:nrow(df2),df2)
#melt with gather
d11 <- df1 %>% gather(Col, Value, starts_with("C")) # long format
d11 <- d11 %>% na.omit() %>% group_by(df1_id) %>% slice(-n()) # drop the last non-NA value of each row
d22 <- df2 %>% gather(Rule, Value, starts_with("r")) # long format
res <- inner_join(d11, d22)
cbind(df1[res$df1_id,],df2[res$df2_id,])
df1_id Partner Col1 Col2 Col3 Col4 df2_id lift rule1 rule2 rule3
1 1 A A1 A2 <NA> <NA> 2 10 A1 A3 <NA>
1.1 1 A A1 A2 <NA> <NA> 1 11 A2 A1 A9
2 2 B A2 B9 <NA> <NA> 1 11 A2 A1 A9
2.1 2 B A2 B9 <NA> <NA> 5 11 A2 B9 B1
2.2 2 B A2 B9 <NA> <NA> 3 11 B9 A2 D7
4 4 D Q1 Q3 Q4 <NA> 4 10 Q4 Q1 <NA>
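The expected output also has an EXIST? column; here is a minimal sketch to add it on top of the combined result (out, cols, rules and last_val are names introduced for illustration; the column names come from the question's data):
out <- cbind(df1[res$df1_id, ], df2[res$df2_id, ])
cols  <- c("Col1", "Col2", "Col3", "Col4")
rules <- c("rule1", "rule2", "rule3")
# last non-NA value among the Col* columns of each matched df1 row
last_val <- apply(out[cols], 1, function(x) tail(x[!is.na(x)], 1))
# flag whether that value appears in the matched rule* columns
out$EXIST <- ifelse(sapply(seq_len(nrow(out)),
                           function(i) last_val[i] %in% unlist(out[i, rules])),
                    "YES", "NOPE")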
