Separate character string variable into several variables [duplicate] - r

This question already has answers here:
Split character column into several binary (0/1) columns
I have data (a column in a dataframe) of type character. I want to separate these characters and, depending on the content, fill separate variables with 0s and 1s.
The column can be recreated with:
df <- data.frame(var = c("1;2", NA, "1;2;3;4;5", "3;5", "1", "1;4", "3", NA, "4", "1;5"))
The values in the string range from 1 to 5. I want to create six variables:
var_1, var_2, var_3, var_4, var_5, and var_NA. var_1 should contain a 1 if that row's string contains a 1, and a 0 if it does not.
Thank you!

Perhaps using cSplit_e would be an option:
library(splitstackshape)
library(dplyr)
cSplit_e(df, 'var', sep = ";", type = 'character', fill = 0, drop = TRUE) %>%
  mutate(var_NA = +(is.na(df$var)))
# var_1 var_2 var_3 var_4 var_5 var_NA
#1 1 1 0 0 0 0
#2 0 0 0 0 0 1
#3 1 1 1 1 1 0
#4 0 0 1 0 1 0
#5 1 0 0 0 0 0
#6 1 0 0 1 0 0
#7 0 0 1 0 0 0
#8 0 0 0 0 0 1
#9 0 0 0 1 0 0
#10 1 0 0 0 1 0
Or using base R
t(sapply(strsplit(df$var, "[:;]"), function(x) +(1:5 %in% x)))
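The one-liner above gives var_1 to var_5 but not var_NA; here is a sketch that adds the column names and the NA indicator in the same base R spirit:
m <- t(sapply(strsplit(df$var, ";"), function(x) +(1:5 %in% x)))
out <- setNames(as.data.frame(m), paste0("var_", 1:5))
out$var_NA <- +(is.na(df$var))  # 1 for the rows where var is NA
out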

With the tidyverse, we can get the data into long format by splitting on ";", build a column name from "var" and the value, set all values to 1, and reshape back to wide format.
library(dplyr)
library(tidyr)
df %>%
  mutate(row = row_number()) %>%
  separate_rows(var, sep = ";") %>%
  mutate(col = paste0('var_', var),
         var = 1) %>%
  pivot_wider(names_from = col, values_from = var, values_fill = 0) %>%
  ungroup %>%
  select(-row)
# A tibble: 10 x 6
# var_1 var_2 var_NA var_3 var_4 var_5
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 1 0 0 0 0
# 2 0 0 1 0 0 0
# 3 1 1 0 1 1 1
# 4 0 0 0 1 0 1
# 5 1 0 0 0 0 0
# 6 1 0 0 0 1 0
# 7 0 0 0 1 0 0
# 8 0 0 1 0 0 0
# 9 0 0 0 0 1 0
#10 1 0 0 0 0 1

Related

Change multiple values in a dataframe based on two other values

Would anyone mind lending some knowledge? What I am trying to do is make a new dataframe based on the data frame values below.
id value
ant 10
cat 4
cat 6
dog 5
dog 3
dog 2
fly 9
What I want to do next is, in sequential order, make a dataframe that looks like the following.
Every time we see a new id, we create a column. The max value is 10, so there should be 10 rows.
Our first id is ant, so every row of the ant column should be 0.
Our next column is cat. It has two values: the first value is 4, so the first 4 rows must be 0, followed by 6 rows of 1.
The same logic applies to dog, with the first five rows as 0, the next three rows as 1, and the last two as 0.
fly has only 9 rows of 0, and the last row should contain NA.
It should look like this
ant cat dog fly
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 1 0 0
0 1 1 0
0 1 1 0
0 1 1 0
0 1 0 0
0 1 0 NA
I know how to do this the long way by
newdf <- data.frame(matrix(2, ncol = length(unique(df[, "id"])), nrow = 10))
newdf$X1[1:10] <- 0
newdf$X2[1:4] <- 0
newdf$X2[5:10] <- 1
...
However, is there any way to do this more efficiently? Note that my actual data will have roughly 50 rows, which is why I am looking for a more efficient way to complete this!
Here's a tidyverse answer -
library(dplyr)
library(tidyr)
df %>%
  group_by(id) %>%
  mutate(val = rep(c(0, 1), length.out = n())) %>%
  uncount(value) %>%
  mutate(row = row_number()) %>%
  complete(row = 1:10) %>%
  pivot_wider(names_from = id, values_from = val) %>%
  select(-row)
# ant cat dog fly
# <dbl> <dbl> <dbl> <dbl>
# 1 0 0 0 0
# 2 0 0 0 0
# 3 0 0 0 0
# 4 0 0 0 0
# 5 0 1 0 0
# 6 0 1 1 0
# 7 0 1 1 0
# 8 0 1 1 0
# 9 0 1 0 0
#10 0 1 0 NA
For each id we assign alternating 0/1 values and use uncount to repeat the rows based on the count, then reshape to wide format so that we have a separate column for each id.
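To make the intermediate step concrete, this is roughly what the data look like after uncount() but before pivoting (a sketch, using the df defined in the data block below):
library(dplyr)
library(tidyr)
df %>%
  group_by(id) %>%
  mutate(val = rep(c(0, 1), length.out = n())) %>%
  uncount(value)
# 'ant' expands to 10 rows of val = 0, 'cat' to 4 rows of 0 followed by 6 rows
# of 1, 'dog' to 5 rows of 0, 3 of 1, 2 of 0, and 'fly' to 9 rows of 0.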
data
df <- structure(list(id = c("ant", "cat", "cat", "dog", "dog", "dog",
"fly"), value = c(10, 4, 6, 5, 3, 2, 9)), row.names = c(NA, -7L
), class = "data.frame")
You can try the following base R code
maxlen <- with(df, max(tapply(value, id, sum)))
list2DF(
  lapply(
    with(df, split(value, id)),
    function(x) {
      `length<-`(
        rep(rep(c(0, 1), length.out = length(x)), x),
        maxlen
      )
    }
  )
)
which gives
ant cat dog fly
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
5 0 1 0 0
6 0 1 1 0
7 0 1 1 0
8 0 1 1 0
9 0 1 0 0
10 0 1 0 NA
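The only non-obvious piece here is the `length<-`() call: assigning a longer length pads a vector with NA, which is how the shorter fly column picks up its trailing NA. In isolation:
# `length<-`(x, n) returns x extended (with NA) or truncated to length n.
`length<-`(rep(0, 9), 10)
#>  [1]  0  0  0  0  0  0  0  0  0 NA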

Create a new dataframe with all possible combinations

Having a dataframe like this:
data.frame(previous = c(1,2,2,1,3,3), `next` = c(1,1,2,3,1,3), id = c(1,2,3,4,5,6))
How is it possible to extract a data frame which checks the previous and next columns and creates 9 new columns which have a 1 only if that combination of previous and next exists? For example, if previous is 2 and next is 1, the combination is 2_1 and that column receives a 1.
Example of expected output:
data.frame(previous = c(1,2,2,1,3,3), `next` = c(1,1,2,3,1,3),
col1_1 = c(1,0,0,0,0,0),
col1_2 = c(0,0,0,0,0,0),
col1_3 = c(0,0,0,1,0,0),
col2_1 = c(0,1,0,0,0,0),
col2_2 = c(0,0,1,0,0,0),
col2_3 = c(0,0,0,0,0,0),
col3_1 = c(0,0,0,0,1,0),
col3_2 = c(0,0,0,0,0,0),
col3_3 = c(0,0,0,0,0,1), id = c(1,2,3,4,5,6))
You could use expand.grid to get all the combinations.
Assuming your data frame is called df and the column is actually called next. (with a trailing dot) to avoid clashing with the keyword next:
as.data.frame(apply(expand.grid(1:3, 1:3), 1, function(x) {
  as.numeric(x[1] == df$previous & x[2] == df$next.)
}))
#> V1 V2 V3 V4 V5 V6 V7 V8 V9
#> 1 1 0 0 0 0 0 0 0 0
#> 2 0 1 0 0 0 0 0 0 0
#> 3 0 0 0 0 1 0 0 0 0
#> 4 0 0 0 0 0 0 1 0 0
#> 5 0 0 1 0 0 0 0 0 0
#> 6 0 0 0 0 0 0 0 0 1
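The V1..V9 columns come out in expand.grid() order (the first factor varies fastest). If the col1_1-style names from the expected output are wanted, the same grid can supply them; a sketch, again assuming the column is called next.:
cmb <- expand.grid(1:3, 1:3)
out <- as.data.frame(apply(cmb, 1, function(x) {
  as.numeric(x[1] == df$previous & x[2] == df$next.)
}))
names(out) <- paste0("col", cmb$Var1, "_", cmb$Var2)  # e.g. col2_1 for previous 2, next 1
cbind(df, out)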
A step-by-step approach might be the following one. I have renamed the next column to next1 to avoid problems:
AllComb <- expand.grid(unique(df$previous), unique(df$next1))  # all possible combinations
myframe <- matrix(0, ncol = nrow(AllComb), nrow = nrow(df))
colnames(myframe) <- paste("col_", AllComb$Var1, "_", AllComb$Var2, sep = "")
for (id_row in 1:nrow(df)) {                                 # one pass per row
  myvec <- df[id_row, ]
  Word <- paste("col_", myvec[1], "_", myvec[2], sep = "")   # build the column name
  Colindex <- which(colnames(myframe) == Word)               # find the column index
  myframe[id_row, Colindex] <- 1                             # set that cell to 1
}
dfRes <- cbind(previous = df$previous, "next" = df$next1, myframe, id = df$id)
# previous next col_1_1 col_2_1 col_3_1 col_1_2 col_2_2 col_3_2 col_1_3 col_2_3 col_3_3 id
# [1,] 1 1 1 0 0 0 0 0 0 0 0 1
# [2,] 2 1 0 1 0 0 0 0 0 0 0 2
# [3,] 2 2 0 0 0 0 1 0 0 0 0 3
# [4,] 1 3 0 0 0 0 0 0 1 0 0 4
# [5,] 3 1 0 0 1 0 0 0 0 0 0 5
# [6,] 3 3 0 0 0 0 0 0 0 0 1 6
Inside a by you could use a switch, because your values are nicely consecutive 1:3. Finally we merge to get the result.
tmp <- by(dat, dat$next., function(x) {
  x1 <- x$previous
  o <- `colnames<-`(t(sapply(x1, function(z)
         switch(z, c(1, 0, 0), c(0, 1, 0), c(0, 0, 1)))),
       paste(el(x1), 1:3, sep = "_"))
  cbind(x, col = o)
})
res <- Reduce(function(...) merge(..., all=TRUE), tmp)
res[is.na(res)] <- 0 ## set NA to zero if wanted
Result
res[order(res$id),] ## order by ID if needed
# previous next. id col.1_1 col.1_2 col.1_3 col.2_1 col.2_2 col.2_3
# 1 1 1 1 1 0 0 0 0 0
# 3 2 1 2 0 1 0 0 0 0
# 4 2 2 3 0 0 0 0 1 0
# 2 1 3 4 1 0 0 0 0 0
# 5 3 1 5 0 0 1 0 0 0
# 6 3 3 6 0 0 1 0 0 0
Data
dat <- structure(list(previous = c(1, 2, 2, 1, 3, 3), next. = c(1, 1,
2, 3, 1, 3), id = c(1, 2, 3, 4, 5, 6)), class = "data.frame", row.names = c(NA,
-6L))
Note: next as column name is not particularly a good idea, since it has a special meaning in R.
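To illustrate that note: next cannot be used as a bare name in a call, and data.frame() will rename a backticked version unless told otherwise, which is why the answers work with next. or next1. A quick sketch:
# `next` is a reserved word, so it must be backtick-quoted in the call;
# data.frame()'s default check.names = TRUE then appends a dot to it.
df <- data.frame(previous = c(1, 2, 2, 1, 3, 3),
                 `next`   = c(1, 1, 2, 3, 1, 3),
                 id       = 1:6)
names(df)
#> [1] "previous" "next."    "id"
# With check.names = FALSE the name stays "next", but it then has to be
# written as df$`next` everywhere.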
Here is a tidyverse approach (with the next column renamed to nxt):
library(tidyr)
library(dplyr)
library(tibble)  # for rowid_to_column()
df %>%
  rowid_to_column() %>%
  complete(previous, nxt) %>%
  unite(col, previous, nxt, sep = "_", remove = FALSE) %>%
  pivot_wider(names_from = col, values_from = rowid,
              values_fn = list(rowid = ~1), values_fill = list(rowid = 0)) %>%
  na.omit() %>%
  arrange(id)
# A tibble: 6 x 12
previous nxt id `1_1` `1_2` `1_3` `2_1` `2_2` `2_3` `3_1` `3_2` `3_3`
<dbl> <dbl> <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 1 1 1 0 0 0 0 0 0 0 0
2 2 1 2 0 0 0 1 0 0 0 0 0
3 2 2 3 0 0 0 0 1 0 0 0 0
4 1 3 4 0 0 1 0 0 0 0 0 0
5 3 1 5 0 0 0 0 0 0 1 0 0
6 3 3 6 0 0 0 0 0 0 0 0 1
This is another tidyverse solution that differs a little (and may be more concise) from #H1's.
library(dplyr)
library(tidyr)
df %>%
  mutate(n = 1) %>%
  complete(id, previous, next., fill = list(n = 0)) %>%
  unite(col, previous, next.) %>%
  pivot_wider(names_from = col, names_prefix = "col", values_from = n) %>%
  right_join(df)
# # A tibble: 6 x 12
# id col1_1 col1_2 col1_3 col2_1 col2_2 col2_3 col3_1 col3_2 col3_3 previous next.
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 1 0 0 0 0 0 0 0 0 1 1
# 2 2 0 0 0 1 0 0 0 0 0 2 1
# 3 3 0 0 0 0 1 0 0 0 0 2 2
# 4 4 0 0 1 0 0 0 0 0 0 1 3
# 5 5 0 0 0 0 0 0 1 0 0 3 1
# 6 6 0 0 0 0 0 0 0 0 1 3 3
You can try the code below (here the columns are capitalized as Previous and Next):
dfout <- within(df,
  col <- `colnames<-`(
    t(sapply((Previous - 1) * 3 + Next, function(v) replace(rep(0, 9), v, 1))),
    do.call(paste, c(expand.grid(1:3, 1:3), sep = "_"))
  )
)
such that
> dfout
Previous Next id col.1_1 col.2_1 col.3_1 col.1_2 col.2_2 col.3_2 col.1_3 col.2_3 col.3_3
1 1 1 1 1 0 0 0 0 0 0 0 0
2 2 1 2 0 0 0 1 0 0 0 0 0
3 2 2 3 0 0 0 0 1 0 0 0 0
4 1 3 4 0 0 1 0 0 0 0 0 0
5 3 1 5 0 0 0 0 0 0 1 0 0
6 3 3 6 0 0 0 0 0 0 0 0 1

How to make a logical statement which finds a year between/in two date columns?

I have a problem executing something in R which maybe isn't hard but I simply can't figure it out.
Let's say I have the following dataframe with only two date columns: date_started and date_ended.
df <- data.frame(date_started=as.Date(c("1990-02-01","1995-03-04","1997-04-01","1999-01-11","1993-04-04")),
date_ended=as.Date(c("1993-08-12","1999-07-06","2000-06-05","1999-12-01","1996-07-08")))
They represent the start and end dates of the treatment of patients.
Now I would like to add new columns which are either 1 (TRUE) or 0 (FALSE), depending on whether the person was treated in a certain year.
The result columns should be:
df$year_1990 <- c(1,0,0,0,0)
df$year_1991 <- c(1,0,0,0,0)
df$year_1992 <- c(1,0,0,0,0)
df$year_1993 <- c(1,0,0,0,1)
df$year_1994 <- c(0,0,0,0,1)
df$year_1995 <- c(0,1,0,0,1)
df$year_1996 <- c(0,1,0,0,1)
df$year_1997 <- c(0,1,1,0,0)
df$year_1998 <- c(0,1,1,0,0)
df$year_1999 <- c(0,1,1,1,0)
df$year_2000 <- c(0,0,1,0,0)
So I can count for each year how many people were treated.
I have tried and looked for a solution but simply can't find it.
I've tried ifelse statements and the between function but I did not manage to solve this.
Any help is much appreciated!
One dplyr and tidyr option could be:
df %>%
  rowwise() %>%
  mutate(var = list(seq(as.numeric(substr(date_started, 1, 4)),
                        as.numeric(substr(date_ended, 1, 4)),
                        1))) %>%
  ungroup() %>%
  unnest(var) %>%
  mutate(var = paste0("year_", var),
         val = 1) %>%
  pivot_wider(names_from = "var", values_from = "val", values_fill = list(val = 0))
date_started date_ended year_1990 year_1991 year_1992 year_1993 year_1995 year_1996 year_1997 year_1998 year_1999
<date> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1990-02-01 1993-08-12 1 1 1 1 0 0 0 0 0
2 1995-03-04 1999-07-06 0 0 0 0 1 1 1 1 1
3 1997-04-01 2000-06-05 0 0 0 0 0 0 1 1 1
4 1999-01-11 1999-12-01 0 0 0 0 0 0 0 0 1
5 1993-04-04 1996-07-08 0 0 0 1 1 1 0 0 0
# … with 2 more variables: year_2000 <dbl>, year_1994 <dbl>
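Note the year_ columns appear in the order they are first encountered (year_1994 ends up last). If the result of the pipe is stored in an object, say res (a placeholder name), the names(.) idiom used in the tidyverse answer further down can reorder them:
# res is assumed to hold the result of the pipeline above.
res %>%
  select(date_started, date_ended, sort(names(.)[-(1:2)]))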
A base R option would be to convert to Date class, extract the 'year' with format, get a sequence, stack the list of vectors to a 2 column data.frame, and get frequency count with table
lst1 <- Map(function(x, y) as.numeric(x):as.numeric(y),
            format(as.Date(df$date_started), "%Y"),
            format(as.Date(df$date_ended), "%Y"))
dfn <- cbind(df, as.data.frame.matrix(table(stack(lst1)[2:1])))
row.names(dfn) <- NULL
colnames(dfn)[-(1:2)] <- paste0("year_", colnames(dfn)[-(1:2)])
dfn
# date_started date_ended year_1990 year_1991 year_1992 year_1993 year_1994 year_1995 year_1996 year_1997 year_1998 year_1999 year_2000
#1 1990-02-01 1993-08-12 1 1 1 1 0 0 0 0 0 0 0
#2 1995-03-04 1999-07-06 0 0 0 0 0 1 1 1 1 1 0
#3 1997-04-01 2000-06-05 0 0 0 0 0 0 0 1 1 1 1
#4 1999-01-11 1999-12-01 0 0 0 0 0 0 0 0 0 1 0
#5 1993-04-04 1996-07-08 0 0 0 1 1 1 1 0 0 0 0
Or using tidyverse
library(purrr)
library(tidyr)
library(dplyr)
library(stringr)    # for str_c()
library(lubridate)
library(gtools)
df %>%
  mutate_all(ymd) %>%
  mutate(new = map2(year(date_started), year(date_ended),
                    ~ seq(.x, .y) %>%
                      set_names(str_c('year_', .)) %>%
                      as.list)) %>%
  unnest_wider(new) %>%
  mutate_at(vars(starts_with('year')), ~ +(!is.na(.))) %>%
  select(date_started, date_ended, mixedsort(names(.)[-(1:2)]))
# A tibble: 5 x 13
# date_started date_ended year_1990 year_1991 year_1992 year_1993 year_1994 year_1995 year_1996 year_1997 year_1998 year_1999 year_2000
# <date> <date> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#1 1990-02-01 1993-08-12 1 1 1 1 0 0 0 0 0 0 0
#2 1995-03-04 1999-07-06 0 0 0 0 0 1 1 1 1 1 0
#3 1997-04-01 2000-06-05 0 0 0 0 0 0 0 1 1 1 1
#4 1999-01-11 1999-12-01 0 0 0 0 0 0 0 0 0 1 0
#5 1993-04-04 1996-07-08 0 0 0 1 1 1 1 0 0 0 0
Combining base R with lubridate::year yields a succinct and simple solution:
library(lubridate)
year_bool <- sapply(1990:2000, function(y) {
  as.integer(y >= year(df$date_started) & y <= year(df$date_ended))
})
colnames(year_bool) <- paste('year', 1990:2000, sep = '_')
cbind(df, year_bool)
## date_started date_ended year_1990 year_1991 year_1992 year_1993
## 1 1990-02-01 1993-08-12 1 1 1 1
## 2 1995-03-04 1999-07-06 0 0 0 0
## 3 1997-04-01 2000-06-05 0 0 0 0
## 4 1999-01-11 1999-12-01 0 0 0 0
## 5 1993-04-04 1996-07-08 0 0 0 1
## year_1994 year_1995 year_1996 year_1997 year_1998 year_1999 year_2000
## 1 0 0 0 0 0 0 0
## 2 0 1 1 1 1 1 0
## 3 0 0 0 1 1 1 1
## 4 0 0 0 0 0 1 0
## 5 1 1 1 0 0 0 0
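The same comparisons can also be written with outer(), which makes the patients-by-years matrix explicit; a sketch, again relying on lubridate::year():
library(lubridate)
yrs <- 1990:2000
# TRUE where the year falls inside [start year, end year]
hit <- outer(year(df$date_started), yrs, `<=`) &
  outer(year(df$date_ended), yrs, `>=`)
colnames(hit) <- paste0("year_", yrs)
cbind(df, +hit)  # unary + converts the logical matrix to 0/1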
Base R solution using #Andy Rominger's logic:
# Create a vector whose values are all the years spanned by the two date columns:
year_range <- eval(parse(text = paste(range(unlist(lapply(df, function(x) {
  as.integer(gsub("[-].*", "", x))
}))), collapse = ":")))
# Using Andy Rominger's logic, but in base R: determine whether each year falls
# between the start and end years:
new_df <- cbind(df, setNames(data.frame(sapply(year_range, function(x) {
  as.integer(x >= as.numeric(gsub("[-].*", "", df$date_started)) &
             x <= as.numeric(gsub("[-].*", "", df$date_ended)))
})), paste0("year_", year_range)))
Data:
df <-
structure(
list(
date_started = structure(c(7336, 9193, 9952, 10602,
8494), class = "Date"),
date_ended = structure(c(8624, 10778,
11113, 10926, 9685), class = "Date")
),
class = "data.frame",
row.names = c(NA,-5L)
)

Adding together multiple sets of columns in R

I'm trying to add several sets of columns together.
Example df:
df <- data.frame(
key = 1:5,
ab0 = c(1,0,0,0,1),
ab1 = c(0,2,1,0,0),
ab5 = c(1,0,0,0,1),
bc0 = c(0,1,0,2,0),
bc1 = c(2,0,0,0,0),
bc5 = c(0,2,1,0,1),
df0 = c(0,0,0,1,0),
df1 = c(1,0,3,0,0),
df5 = c(1,0,0,0,6)
)
Giving me:
key ab0 ab1 ab5 bc0 bc1 bc5 df0 df1 df5
1 1 1 0 1 0 2 0 0 1 1
2 2 0 2 0 1 0 2 0 0 0
3 3 0 1 0 0 0 1 0 3 0
4 4 0 0 0 2 0 0 1 0 0
5 5 1 0 1 0 0 1 0 0 6
I want to add all sets of columns with 0s and 5s in them together and place them in the 0 column.
So the end result would be:
key ab0 ab1 ab5 bc0 bc1 bc5 df0 df1 df5
1 1 2 0 1 0 2 0 1 1 1
2 2 0 2 0 3 0 2 0 0 0
3 3 0 1 0 1 0 1 0 3 0
4 4 0 0 0 2 0 0 1 0 0
5 5 2 0 1 1 0 1 6 0 6
I could add the columns together using 3 lines:
df$ab0 <- df$ab0 + df$ab5
df$bc0 <- df$bc0 + df$bc5
df$df0 <- df$df0 + df$df5
But my real example has over a hundred columns so I'd like to iterate over them and use apply.
The column names of the first set are contained in col0 and the names of the second set are in col5.
col0 <- c("ab0","bc0","df0")
col5 <- c("ab5","bc5","df5")
I created a function to add the columns together using mapply:
fun1 <- function(df, x, y) {
  df[, x] <- df[, x] + df[, y]
}
mapply(fun1, df, col0, col5)
But I get an error: Error in df[, x] : incorrect number of dimensions
Thoughts?
Simply add the two subsets of columns together; assuming they are the same length, no loops are needed. It is all one vectorized operation.
final_df <- df[grep("0", names(df))] + df[grep("5", names(df))]
final_df <- cbind(final_df, df[grep("0", names(df), invert=TRUE)])
final_df <- final_df[order(names(final_df))]
final_df
# ab0 ab1 ab5 bc0 bc1 bc5 df0 df1 df5 key
# 1 2 0 1 0 2 0 1 1 1 1
# 2 0 2 0 3 0 2 0 0 0 2
# 3 0 1 0 1 0 1 0 3 0 3
# 4 0 0 0 2 0 0 1 0 0 4
# 5 2 0 1 1 0 1 6 0 6 5
Rextester demo
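One caveat on the pattern matching: grep("0", ...) matches a 0 anywhere in a name, so a hypothetical column such as ab10, or a key column containing a 0, would be picked up as well. Anchoring the patterns is safer; a sketch:
# Only match names that end in 0 or 5.
final_df <- df[grep("0$", names(df))] + df[grep("5$", names(df))]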
You could use map2 from the purrr package to iterate over the two vectors at once:
df <- data.frame(
key = 1:5,
ab0 = c(1,0,0,0,1),
ab1 = c(0,2,1,0,0),
ab5 = c(1,0,0,0,1),
bc0 = c(0,1,0,2,0),
bc1 = c(2,0,0,0,0),
bc5 = c(0,2,1,0,1),
df0 = c(0,0,0,1,0),
df1 = c(1,0,3,0,0),
df5 = c(1,0,0,0,6)
)
col0 <- c("ab0","bc0","df0")
col5 <- c("ab5","bc5","df5")
purrr::map2(col0, col5, function(x, y) {
  df[[x]] <<- df[[x]] + df[[y]]  # <<- modifies df in the calling environment
})
> df
key ab0 ab1 ab5 bc0 bc1 bc5 df0 df1 df5
1 1 2 0 1 0 2 0 1 1 1
2 2 0 2 0 3 0 2 0 0 0
3 3 0 1 0 1 0 1 0 3 0
4 4 0 0 0 2 0 0 1 0 0
5 5 2 0 1 1 0 1 6 0 6
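As for why the original mapply() call errors: mapply() iterates over all of its vectorised arguments in parallel, so passing df directly makes it loop over df's individual columns (plain vectors), and vector[, x] then fails with "incorrect number of dimensions". One way to keep the original idea is to pass the data frame once via MoreArgs; a sketch, assuming the df, col0, and col5 objects from the question:
fun1 <- function(x, y, df) df[, x] + df[, y]
# mapply() now only iterates over col0 and col5; df is passed whole each time.
df[col0] <- mapply(fun1, col0, col5, MoreArgs = list(df = df))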
Here's an approach using tidyr and dplyr from the tidyverse meta-package.
First, I bring the table into long ("tidy") format, split each column name into two components, and spread by the number part of those components.
Then I do the calculation you describe.
Finally, I bring it back into the original format using the inverse of step 1.
library(tidyverse)
df_tidy <- df %>%
  # Step 1
  gather(col, value, -key) %>%
  separate(col, into = c("grp", "num"), 2) %>%
  spread(num, value) %>%
  # Step 2
  mutate(`0` = `0` + `5`) %>%
  # Step 3, which is just the inverse of Step 1.
  gather(num, value, -key, -grp) %>%
  unite(col, c("grp", "num")) %>%
  spread(col, value)
df_tidy
key ab_0 ab_1 ab_5 bc_0 bc_1 bc_5 df_0 df_1 df_5
1 1 2 0 1 0 2 0 1 1 1
2 2 0 2 0 3 0 2 0 0 0
3 3 0 1 0 1 0 1 0 3 0
4 4 0 0 0 2 0 0 1 0 0
5 5 2 0 1 1 0 1 6 0 6
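Note the reshaped names come back as ab_0, ab_1, ... rather than the original ab0, ab1, because unite() adds its default "_" separator; passing sep = "" in the final unite() keeps the original names. A sketch of the same pipeline with that one change:
library(tidyverse)
df %>%
  gather(col, value, -key) %>%
  separate(col, into = c("grp", "num"), 2) %>%
  spread(num, value) %>%
  mutate(`0` = `0` + `5`) %>%
  gather(num, value, -key, -grp) %>%
  unite(col, c("grp", "num"), sep = "") %>%  # sep = "" gives ab0, ab1, ...
  spread(col, value)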

Converting to wide format from long with multiple id and value columns [duplicate]

This question already has answers here:
From long to wide data with multiple columns
I am stuck trying to convert from long to wide format with multiple ID and value columns. I'd prefer a tidyr solution, as dcast has been defaulting to length.
Here's what I've tried so far:
df_wide <- df %>%
  melt(id.vars = c("Route", "Address", "Week")) %>%
  dcast(Route + Address ~ variable + Week)
Data:
df <- read.table(text = "
Route Week Address V1 V2 V3 V4 V5
A Week1 12345_SE_Court 0 1 0 0 0
A Week2 12345_SE_Court 0 0 1 1 1
B Week1 98765_NW_Drive 1 1 0 0 1
B Week2 98765_NW_Drive 0 1 0 1 0
C Week1 10293_SW_Road 0 0 0 0 1
C Week2 10293_SW_Road 1 0 0 0 1
A Week1 33333_NE_Street 0 1 1 0 0
A Week2 33333_NE_Street 1 0 1 0 0"
, header = TRUE)
Desired output:
Route Address V1.Week1 V2.Week1 V3.Week1 V4.Week1 V5.Week1 V1.Week2 V2.Week2 V3.Week2 V4.Week2 V5.Week2
A 12345_SE_Court 0 1 0 0 0 0 0 1 1 1
A 33333_NE_Street 0 1 1 0 0 1 0 1 0 0
B 98765_NW_Drive 1 1 0 0 1 0 1 0 1 0
C 10293_SW_Road 0 0 0 0 1 1 0 0 0 1
Here's the way to do this using tidyr. The trick is that you need to do a gather first:
library(tidyr)
df_wide <- df %>%
  gather(key, value, V1:V5) %>%
  unite("key", key, Week, sep = ".") %>%
  spread(key, value)
df_wide
#> Route Address V1.Week1 V1.Week2 V2.Week1 V2.Week2 V3.Week1
#> 1 A 12345_SE_Court 0 0 1 0 0
#> 2 A 33333_NE_Street 0 1 1 0 1
#> 3 B 98765_NW_Drive 1 0 1 1 0
#> 4 C 10293_SW_Road 0 1 0 0 0
#> V3.Week2 V4.Week1 V4.Week2 V5.Week1 V5.Week2
#> 1 1 0 1 0 1
#> 2 1 0 0 0 0
#> 3 0 0 1 1 0
#> 4 0 0 0 1 1
Created on 2018-06-27 by the reprex package (v0.2.0).
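Since gather() and spread() have been superseded, here is the same reshape with the newer pivoting verbs, as a sketch; names_glue builds the V1.Week1-style names directly:
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(V1:V5, names_to = "key", values_to = "value") %>%
  pivot_wider(names_from = c(key, Week), values_from = value,
              names_glue = "{key}.{Week}")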
