I want to do something in R that resembles what the reshape package is for, but not quite: I want to move some rows of a data frame into columns, but not all of them. For example, my data frame looks something like this:
v1, v2, v3
info, time, 12:00
info, day, Monday
info, temperature, 70
data, 1, 2
data, 2, 2
data, 3, 1
data, 4, 1
data, 5, 3
I would like to transform it into something like:
v1, v2, v3, info_time, info_day, info_temperature
data, 1, 2, 12:00, Monday, 70
data, 2, 2, 12:00, Monday, 70
data, 3, 1, 12:00, Monday, 70
data, 4, 1, 12:00, Monday, 70
data, 5, 3, 12:00, Monday, 70
Is there an easy way to do this? Does the reshape package help here?
Thank you in advance for all your help!
Vincent
Try
library(reshape2)
indx <- df$v1=='data'
res <- cbind(df[indx,],dcast(df[!indx,],v1~v2, value.var='v3'))[,-4]
row.names(res) <- NULL
colnames(res)[4:6] <- paste('info', colnames(res)[4:6], sep="_")
res
# v1 v2 v3 info_day info_temperature info_time
#1 data 1 2 Monday 70 12:00
#2 data 2 2 Monday 70 12:00
#3 data 3 1 Monday 70 12:00
#4 data 4 1 Monday 70 12:00
#5 data 5 3 Monday 70 12:00
Or use dplyr/tidyr
library(dplyr)
library(tidyr)
cbind(df[indx,],
unite(df[!indx,], Var, v1, v2) %>%
mutate(id=1) %>%
spread(Var, v3)%>%
select(-id))
Or using base R
cbind(df[indx,],
reshape(transform(df[!indx,], v2= paste(v1, v2, sep="_")),
idvar='v1', timevar='v2', direction='wide')[,-1])
data
df <- structure(list(v1 = c("info", "info", "info", "data", "data",
"data", "data", "data"), v2 = c("time", "day", "temperature",
"1", "2", "3", "4", "5"), v3 = c("12:00", "Monday", "70", "2",
"2", "1", "1", "3")), .Names = c("v1", "v2", "v3"), class = "data.frame",
row.names = c(NA, -8L))
A solution without external packages (using the df structure from Akrun's answer):
df1 <- cbind(df[4:8,1:3],apply(df[1:3,3,drop=FALSE],1,function(x) rep(x,nrow(df)-3)))
colnames(df1)[4:6] <- paste("info",df[1:3,2], sep = "_")
df1
> df1
v1 v2 v3 info_time info_day info_temperature
4 data 1 2 12:00 Monday 70
5 data 2 2 12:00 Monday 70
6 data 3 1 12:00 Monday 70
7 data 4 1 12:00 Monday 70
8 data 5 3 12:00 Monday 70
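Another base R variant, as a rough sketch rather than a definitive answer: build a named list from the info rows and let data.frame() recycle each single value across the data rows. The names is_info, info_cols and res are just illustrative, and the data is retyped from the example above.

```r
# Same example data as in the answers above
df <- data.frame(
  v1 = c("info", "info", "info", "data", "data", "data", "data", "data"),
  v2 = c("time", "day", "temperature", "1", "2", "3", "4", "5"),
  v3 = c("12:00", "Monday", "70", "2", "2", "1", "1", "3"),
  stringsAsFactors = FALSE
)

is_info <- df$v1 == "info"

# One named list element per info row, e.g. info_time = "12:00"
info_cols <- setNames(as.list(df$v3[is_info]),
                      paste("info", df$v2[is_info], sep = "_"))

# data.frame() recycles the length-one elements across all data rows
res <- data.frame(df[!is_info, ], info_cols, stringsAsFactors = FALSE)
rownames(res) <- NULL
res
```

This keeps the info columns in their original order (time, day, temperature) and does not hard-code row positions.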
Is there a quick way to replace variable names with the content of the first row of a tibble?
So turning something like this:
Subject Q1 Q2 Q3
Subject age gender cue
429753 24 1 man
b952x8 23 2 mushroom
264062 19 1 night
53082m 35 1 moon
Into this:
Subject age gender cue
429753 24 1 man
b952x8 23 2 mushroom
264062 19 1 night
53082m 35 1 moon
My dataset has over 100 variables so I'm looking for a way that doesn't involve typing out each old and new variable name.
A possible solution:
df <- structure(list(Subject = c("Subject", "429753", "b952x8", "264062",
"53082m"), Q1 = c("age", "24", "23", "19", "35"), Q2 = c("gender",
"1", "2", "1", "1"), Q3 = c("cue", "man", "mushroom", "night",
"moon")), row.names = c(NA, -5L), class = "data.frame")
names(df) <- df[1,]
df <- df[-1,]
df
#> Subject age gender cue
#> 2 429753 24 1 man
#> 3 b952x8 23 2 mushroom
#> 4 264062 19 1 night
#> 5 53082m 35 1 moon
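A slightly more defensive sketch of the same idea, assuming the columns were read in as character: unlist() turns the first row into a plain character vector, and type.convert() then re-guesses the column types, since every column is still text after the header row is removed. The names raw and clean are only illustrative.

```r
raw <- data.frame(
  Subject = c("Subject", "429753", "b952x8", "264062", "53082m"),
  Q1 = c("age", "24", "23", "19", "35"),
  Q2 = c("gender", "1", "2", "1", "1"),
  Q3 = c("cue", "man", "mushroom", "night", "moon"),
  stringsAsFactors = FALSE
)

clean <- raw[-1, ]
# First row as a plain character vector of new column names
names(clean) <- unlist(raw[1, ], use.names = FALSE)
rownames(clean) <- NULL

# Re-guess column types; as.is = TRUE keeps text columns as character
clean <- type.convert(clean, as.is = TRUE)
str(clean)
```

If the columns are factors rather than character, convert them with as.character() first, otherwise the new names may come out as integer codes.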
I have this table:
ID Day Score
23928 Monday 75
394838 Tuesday 83
230902 Wednesday 90
329832 Thursday 40
…
and so on, with each day repeated several times.
I want to transpose the Day column to get this table:
MONDAY Tuesday Wednesday …… Sunday
78 4343 343 433
Is there a way to do this in R?
We can use data.table::transpose
library(data.table)
data.table::transpose(df1[-1], make.names = 'Day')
Or using base R
as.data.frame.list(with(df1, setNames(Score, Day)))
data
df1 <- structure(list(ID = c(23928L, 394838L, 230902L, 329832L),
Day = c("Monday",
"Tuesday", "Wednesday", "Thursday"), Score = c(75L, 83L, 90L,
40L)), class = "data.frame", row.names = c(NA, -4L))
Assuming your data is stored in a data.frame, you could use dplyr and tidyr:
library(dplyr)
library(tidyr)
df %>%
select(-ID) %>%
pivot_wider(names_from=Day, values_from=Score)
which returns
# A tibble: 1 x 4
Monday Tuesday Wednesday Thursday
<dbl> <dbl> <dbl> <dbl>
1 75 83 90 40
Use t and set names:
setNames(as.data.frame(t(df$Score)), df$Day)
Output
# Monday Tuesday Wednesday Thursday
# 75 83 90 40
You could use tidyr:
library(tidyr)
data <- data.frame(day = c("Monday", "Tuesday", "Wednesday", "Thursday"),
val = c(12,75,9,38) )
data %>% spread(day,val)
Result:
Monday Thursday Tuesday Wednesday
12 38 75 9
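The question mentions that each day repeats several times, which the one-row examples above do not cover. A minimal base R sketch, assuming the repeated scores should be summed per day (that choice, the made-up IDs and scores, and the names scores, day_levels, totals and wide are all assumptions for illustration):

```r
# Illustrative data with repeated days (values are invented)
scores <- data.frame(
  ID = c(1L, 2L, 3L, 4L, 5L, 6L),
  Day = c("Monday", "Tuesday", "Wednesday", "Monday", "Tuesday", "Wednesday"),
  Score = c(75L, 83L, 90L, 40L, 10L, 5L),
  stringsAsFactors = FALSE
)

# Fix the column order explicitly; otherwise days are sorted alphabetically
day_levels <- c("Monday", "Tuesday", "Wednesday")

# One total per day, as a named vector
totals <- sapply(split(scores$Score, factor(scores$Day, levels = day_levels)), sum)

# Named vector -> one-row data frame with one column per day
wide <- as.data.frame(as.list(totals))
wide
```

Swap sum for mean or another summary if a different aggregation is wanted.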
I have a data frame in R which looks like below
Model Month Demand Inventory
A Jan 10 20
B Feb 30 40
A Feb 40 60
I want the data frame to look
Jan Feb
A_Demand 10 40
A_Inventory 20 60
A_coverage
B_Demand 30
B_Inventory 40
B_coverage
A_coverage and B_coverage will be calculated in Excel using a formula. The problem I need help with is pivoting the data frame from its original long format into the wide layout shown above.
I tried to implement the solution from the linked duplicate but I am still having difficulty:
HD_dcast <- reshape(data,idvar = c("Model","Inventory","Demand"),
timevar = "Month", direction = "wide")
Here is a dput of my data:
data <- structure(list(Model = c("A", "B", "A"), Month = c("Jan", "Feb",
"Feb"), Demand = c(10L, 30L, 40L), Inventory = c(20L, 40L, 60L
)), class = "data.frame", row.names = c(NA, -3L))
Thanks
Here's an approach with dplyr and tidyr, two popular R packages for data manipulation:
library(dplyr)
library(tidyr)
data %>%
mutate(coverage = NA_real_) %>%
pivot_longer(-c(Model,Month), names_to = "Variable") %>%
pivot_wider(id_cols = c(Model, Variable), names_from = Month ) %>%
unite(Variable, c(Model,Variable), sep = "_")
## A tibble: 6 x 3
# Variable Jan Feb
# <chr> <dbl> <dbl>
#1 A_Demand 10 40
#2 A_Inventory 20 60
#3 A_coverage NA NA
#4 B_Demand NA 30
#5 B_Inventory NA 40
#6 B_coverage NA NA
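For comparison, here is a rough base R sketch using reshape(). As far as I can tell, the attempt in the question lists Demand and Inventory in idvar, which leaves no value column to spread across months; stacking the two measures into a single value column first avoids that. The names stock, measures, long and wide are illustrative, and the empty coverage rows are left out.

```r
stock <- data.frame(
  Model = c("A", "B", "A"),
  Month = c("Jan", "Feb", "Feb"),
  Demand = c(10L, 30L, 40L),
  Inventory = c(20L, 40L, 60L),
  stringsAsFactors = FALSE
)

measures <- c("Demand", "Inventory")

# Stack the measure columns into one long table: Variable, Month, value
long <- data.frame(
  Variable = paste(rep(stock$Model, times = length(measures)),
                   rep(measures, each = nrow(stock)), sep = "_"),
  Month = rep(stock$Month, times = length(measures)),
  value = unlist(stock[measures], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Spread months into columns, one row per Model_measure combination
wide <- reshape(long, idvar = "Variable", timevar = "Month", direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))  # value.Jan -> Jan
wide <- wide[order(wide$Variable), ]
rownames(wide) <- NULL
wide
```

Combinations that never occur in the data, such as model B in January, come out as NA.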
Problem: need to add values from one dataframe to another depending on the time window in which each row occurs.
I have one dataframe with a list of singular events like this:
Ind Date Time Event
1 FAU 15/11/2016 06:40:43 A
2 POR 15/11/2016 12:26:51 V
3 POR 15/11/2016 14:52:53 B
4 MAM 20/11/2016 08:12:19 G
5 SUR 03/12/2016 13:51:18 A
6 SUR 14/12/2016 07:47:06 V
And a second data frame with ongoing, continuous events linked like this:
Date Time Event
1 15/11/2016 06:56:48 1
2 15/11/2016 06:59:40 2
3 15/11/2016 07:27:36 3
4 15/11/2016 07:29:10 4
5 15/11/2016 07:34:51 5
6 15/11/2016 07:35:10 6
7 15/11/2016 07:37:19 7
8 15/11/2016 07:39:55 8
9 15/11/2016 07:51:59 9
10 15/11/2016 08:00:13 10
11 15/11/2016 08:08:01 11
12 15/11/2016 08:13:21 12
13 15/11/2016 08:16:21 13
14 15/11/2016 12:14:48 14
15 15/11/2016 12:16:58 15
16 15/11/2016 12:51:22 16
17 15/11/2016 12:52:09 17
18 15/11/2016 13:26:29 18
19 15/11/2016 13:26:55 19
20 15/11/2016 13:34:14 20
21 15/11/2016 13:50:41 21
22 15/11/2016 13:53:25 22
23 15/11/2016 14:15:17 23
24 15/11/2016 14:54:49 24
Question: how can I combine these so that for the singular events we can see during which continuous events they occurred, for example, something like this:
Ind Date Time Eventx Eventy
1 FAU 15/11/2016 06:40:43 A 1
2 POR 15/11/2016 12:26:51 V 15
3 POR 15/11/2016 14:52:53 B 23
Many thanks
I can provide you with a data.table solution. The only issue is that I had to move the start of the first event in the second dataframe to an earlier date, since it was after the starting time of the first event of the first dataframe.
You'll need the additional packages data.table and lubridate.
library(data.table)
library(lubridate)
dt1 <- data.table(df1)
dt2 <- data.table(df2)
dt1[, Date.Time := as.POSIXct(strptime(paste(Date, Time, sep = " "), "%d/%m/%Y %H:%M:%S"))]
dt2[, Date.Time := as.POSIXct(strptime(paste(Date, Time, sep = " "), "%d/%m/%Y %H:%M:%S"))]
# Create the start and end time columns in the second data.table
dt2[, `:=`(Start.Time = Date.Time
, End.Time = shift(Date.Time, n = 1L, fill = NA, type = "lead"))]
# Change the start date to an earlier one
dt2[Event == 1,`:=`(Start.Time = Start.Time - days(1)) ]
# Merge on multiple conditions and the selection of the relevant columns
dt2[dt1, on=.(Start.Time < Date.Time
, End.Time > Date.Time)
, nomatch = 0L][,.(Ind
, Date
, Time
, Eventx = i.Event
, Eventy = Event)]
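# Output of the last merge (Date and Time here are dt2's columns, i.e. the start of the matched interval; use i.Date and i.Time for the singular event's own timestamp)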
Ind Date Time Eventx Eventy
1: FAU 15/11/2016 06:56:48 A 1
2: POR 15/11/2016 12:16:58 V 15
3: POR 15/11/2016 14:15:17 B 23
This should work (at least does on your example):
df1 <- structure(list(Ind = c("FAU", "POR", "POR", "MAM", "SUR", "SUR"
), Date = c("15/11/2016", "15/11/2016", "15/11/2016", "20/11/2016",
"03/12/2016", "14/12/2016"), Time = c("06:40:43", "12:26:51",
"14:52:53", "08:12:19", "13:51:18", "07:47:06"), Event = c("A",
"V", "B", "G", "A", "V")), .Names = c("Ind", "Date", "Time",
"Event"), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6"))
df2 <- structure(list(Date = c("15/11/2016", "15/11/2016", "15/11/2016",
"15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016",
"15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016",
"15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016",
"15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016", "15/11/2016",
"15/11/2016"), Time = c("06:56:48", "06:59:40", "07:27:36", "07:29:10",
"07:34:51", "07:35:10", "07:37:19", "07:39:55", "07:51:59", "08:00:13",
"08:08:01", "08:13:21", "08:16:21", "12:14:48", "12:16:58", "12:51:22",
"12:52:09", "13:26:29", "13:26:55", "13:34:14", "13:50:41", "13:53:25",
"14:15:17", "14:54:49"), Event = 1:24), .Names = c("Date", "Time",
"Event"), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24"))
Create as.POSIXct variables:
df1$datetime <- as.POSIXct(strptime(paste(df1$Date, df1$Time, sep = " "), "%d/%m/%Y %H:%M:%S"))
df2$datetime <- as.POSIXct(strptime(paste(df2$Date, df2$Time, sep = " "), "%d/%m/%Y %H:%M:%S"))
Initiate new count variable for df1:
df1$count <- NA
Now we loop over the rows of df1 and count the occurrences in df2 that have the same Date and an earlier datetime:
for(i in seq_len(nrow(df1))){
df1$count[i] <- sum(df2$datetime[df2$Date == df1$Date[i]] < df1$datetime[i])
}
Result:
> df1
Ind Date Time Event datetime count
1 FAU 15/11/2016 06:40:43 A 2016-11-15 06:40:43 0
2 POR 15/11/2016 12:26:51 V 2016-11-15 12:26:51 15
3 POR 15/11/2016 14:52:53 B 2016-11-15 14:52:53 23
4 MAM 20/11/2016 08:12:19 G 2016-11-20 08:12:19 0
5 SUR 03/12/2016 13:51:18 A 2016-12-03 13:51:18 0
6 SUR 14/12/2016 07:47:06 V 2016-12-14 07:47:06 0
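The loop can also be written without explicit iteration. A minimal sketch using base R's findInterval(), which for each singular event returns the index of the last continuous event that started at or before it. It assumes the continuous events are sorted by time, and unlike the loop it does not restrict matches to the same Date. Only a handful of rows from the question are retyped here, and the names singles, cont, parse_dt and Eventy are illustrative.

```r
# Parse the day/month/year date and time strings into POSIXct (UTC assumed)
parse_dt <- function(date, time) {
  as.POSIXct(paste(date, time), format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
}

singles <- data.frame(
  Ind = c("FAU", "POR", "POR"),
  Date = "15/11/2016",
  Time = c("06:40:43", "12:26:51", "14:52:53"),
  Event = c("A", "V", "B"),
  stringsAsFactors = FALSE
)

# A subset of the continuous events, already sorted by start time
cont <- data.frame(
  Date = "15/11/2016",
  Time = c("12:14:48", "12:16:58", "12:51:22", "14:15:17", "14:54:49"),
  Event = c(14L, 15L, 16L, 23L, 24L),
  stringsAsFactors = FALSE
)

starts <- as.numeric(parse_dt(cont$Date, cont$Time))
idx <- findInterval(as.numeric(parse_dt(singles$Date, singles$Time)), starts)

# 0 means the event happened before the first continuous event started
idx[idx == 0] <- NA
singles$Eventy <- cont$Event[idx]
singles
```

Events that fall after the last start are assigned to the last continuous event, so add an explicit end time check if that is not wanted.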
I've got a data frame that looks like something along these lines:
Day Salesperson Value
==== ============ =====
Monday John 40
Monday Sarah 50
Tuesday John 60
Tuesday Sarah 30
Wednesday John 50
Wednesday Sarah 40
I want to divide the value for each salesperson by the number of times that each day of the week has occurred. So: there have been 3 Mondays, 3 Tuesdays, and 2 Wednesdays. I don't have this information digitally, but I can create a vector along the lines of
c(3, 3, 2)
How can I conditionally divide the Value column based on the number of times each day occurs?
I've found an inelegant solution, which entails copying the Day column to a temp column, replacing each of the names of the week in the new column with the number of times each day occurs using
df$temp <- sub("Monday", 3, df$temp)
but doing this seems kinda clunky. Is there a neat way to do this?
Suppose your auxiliary data is in another data.frame:
Day N_Day
1 Monday 3
2 Tuesday 3
3 Wednesday 2
The simplest way would be to merge:
DF_new <- merge(DF, DF2, by="Day")
DF_new$newcol <- DF_new$Value / DF_new$N_Day
which gives
Day Salesperson Value N_Day newcol
1 Monday John 40 3 13.33333
2 Monday Sarah 50 3 16.66667
3 Tuesday John 60 3 20.00000
4 Tuesday Sarah 30 3 10.00000
5 Wednesday John 50 2 25.00000
6 Wednesday Sarah 40 2 20.00000
The mergeless shortcut is
DF$newcol <- DF$Value / DF2$N_Day[match(DF$Day, DF2$Day)]
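If the counts only exist as a vector like the c(3, 3, 2) from the question, a named vector can serve as the lookup table directly. A small sketch, with the data retyped as plain character columns and n_day as an illustrative name:

```r
DF <- data.frame(
  Day = rep(c("Monday", "Tuesday", "Wednesday"), each = 2),
  Salesperson = rep(c("John", "Sarah"), times = 3),
  Value = c(40, 50, 60, 30, 50, 40),
  stringsAsFactors = FALSE
)

# Number of times each day has occurred, named by day
n_day <- c(Monday = 3, Tuesday = 3, Wednesday = 2)

# Index the lookup by name; as.character() guards against Day being a factor,
# which would otherwise index by integer code instead of by label
DF$newcol <- DF$Value / unname(n_day[as.character(DF$Day)])
DF
```

A day missing from n_day produces NA in newcol rather than an error, so it is worth checking for NAs afterwards.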
Data:
DF <- structure(list(Day = structure(c(1L, 1L, 2L, 2L, 3L, 3L), .Label =
c("Monday",
"Tuesday", "Wednesday"), class = "factor"), Salesperson = structure(c(1L,
2L, 1L, 2L, 1L, 2L), .Label = c("John", "Sarah"), class = "factor"),
Value = c(40L, 50L, 60L, 30L, 50L, 40L)), .Names = c("Day",
"Salesperson", "Value"), class = "data.frame", row.names = c(NA,
-6L))
DF2 <- structure(list(Day = structure(1:3, .Label = c("Monday", "Tuesday",
"Wednesday"), class = "factor"), N_Day = c(3, 3, 2)), .Names = c("Day",
"N_Day"), row.names = c(NA, -3L), class = "data.frame")
You can use the dplyr package to join your data frame with a table holding the frequency of each day.
library(dplyr)
df <- data.frame(
Day=c("Monday","Monday","Tuesday","Tuesday","Wednesday","Wednesday"),
Salesperson=c("John","Sarah","John","Sarah","John","Sarah"),
Value=c(40,50,60,30,50,40), stringsAsFactors=FALSE)
aux <- data.frame(
Day=c("Monday","Tuesday","Wednesday"),
n=c(3,3,2)
)
output <- df %>% left_join(aux, by="Day") %>% mutate(Value2=Value/n)
To build this auxiliary table from the number of times each day appears in your original data, instead of typing it manually, you could use the following (the output below comes from re-running the join with this version of aux):
aux <- df %>% group_by(Day) %>% summarise(n=n())
> output
Day Salesperson Value n Value2
1 Monday John 40 2 20
2 Monday Sarah 50 2 25
3 Tuesday John 60 2 30
4 Tuesday Sarah 30 2 15
5 Wednesday John 50 2 25
6 Wednesday Sarah 40 2 20
If you want to overwrite the actual Value column, use mutate(Value=Value/n), and to remove the additional column you can add select(-n):
output <- df %>% left_join(aux, by="Day") %>% mutate(Value=Value/n) %>% select(-n)