Dataframe to tidy format in R


I have this data frame:
x <- data.frame("date" = c("03-01-2005","04-01-2005","05-01-2005","06-01-2005"),
"pricemax.0" = c(50,20,25,56),
"pricemax.200" = c(25,67,89,30),
"pricemax.1000" = c(45,60,40,30),
"pricemax.1400" = c(60,57,32,44),
"pricemin.0" = c(22,15,23,43),
"pricemin.200" = c(21,40,59,21),
"pricemin.1000" = c(32,12,20,24),
"pricemin.1400" = c(30,20,14,20))
The numbers after the dot represent the time of day, e.g. pricemax.200 is the maximum price at 02:00. I need to combine the date and time information into one column of class POSIXct, with the other two columns being pricemax and pricemin.
So, what I want is something like this:
And what I've done so far:
tidy_x <- x %>%
pivot_longer(
cols = contains("pricemax"),
names_to = c(NA,"hour"),
names_sep = "\\.",
values_to = "pricemax"
) %>%
pivot_longer(
cols = contains("pricemin"),
names_to = c(NA,"hour_2"),
names_sep = "\\.",
values_to = "pricemin"
)
I'm not sure how I can combine the date and time columns and keep the variables pricemin and pricemax organized.

Using dplyr and tidyr, you can do:
library(dplyr)
library(tidyr)
x %>%
pivot_longer(cols = -date,
names_to = c('.value', 'time'),
names_sep = '\\.') %>%
mutate(time = sprintf('%04d', as.integer(time))) %>% # zero-pad portably; '%04s' pads with spaces on some platforms
unite(datetime, date, time, sep = " ") %>%
mutate(datetime = lubridate::dmy_hm(datetime))
# A tibble: 16 x 3
# datetime pricemax pricemin
# <dttm> <dbl> <dbl>
# 1 2005-01-03 00:00:00 50 22
# 2 2005-01-03 02:00:00 25 21
# 3 2005-01-03 10:00:00 45 32
# 4 2005-01-03 14:00:00 60 30
# 5 2005-01-04 00:00:00 20 15
# 6 2005-01-04 02:00:00 67 40
# 7 2005-01-04 10:00:00 60 12
# 8 2005-01-04 14:00:00 57 20
# 9 2005-01-05 00:00:00 25 23
#10 2005-01-05 02:00:00 89 59
#11 2005-01-05 10:00:00 40 20
#12 2005-01-05 14:00:00 32 14
#13 2005-01-06 00:00:00 56 43
#14 2005-01-06 02:00:00 30 21
#15 2005-01-06 10:00:00 30 24
#16 2005-01-06 14:00:00 44 20
This reshapes the data to long format, with pricemax and pricemin in separate columns and the hour in its own column. The hour is then zero-padded to four digits with sprintf, combined with the date into a single column, and converted to a date-time value.
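The padding and parsing steps can also be checked in isolation with base R. This is only a minimal sketch: the vectors date and time are made up to mirror the column-name suffixes in the question, and tz = "UTC" is an assumption, since the question does not say which time zone the prices belong to.

```r
# Hour suffixes as they appear after the dot in the column names
time <- c("0", "200", "1000", "1400")
date <- rep("03-01-2005", 4)

# Pad to four digits so every value reads as HHMM: "0" -> "0000", "200" -> "0200"
hhmm <- sprintf("%04d", as.integer(time))

# Parse the date and the HHMM part together (tz = "UTC" is an assumption)
datetime <- as.POSIXct(paste(date, hhmm), format = "%d-%m-%Y %H%M", tz = "UTC")
format(datetime, "%Y-%m-%d %H:%M")
```

Going through as.integer() first avoids relying on how sprintf pads character strings, which differs between platforms.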

You can also try base R's reshape, as below, to make a long data frame:
y <- transform(
reshape(x, direction = "long", varying = -1),
date = strptime(paste(date, time / 100), "%d-%m-%Y %H")
)[c("date", "pricemax", "pricemin")]
y <- `row.names<-`(y[order(y$date),],NULL)
which gives
> y
date pricemax pricemin
1 2005-01-03 00:00:00 50 22
2 2005-01-03 02:00:00 25 21
3 2005-01-03 10:00:00 45 32
4 2005-01-03 14:00:00 60 30
5 2005-01-04 00:00:00 20 15
6 2005-01-04 02:00:00 67 40
7 2005-01-04 10:00:00 60 12
8 2005-01-04 14:00:00 57 20
9 2005-01-05 00:00:00 25 23
10 2005-01-05 02:00:00 89 59
11 2005-01-05 10:00:00 40 20
12 2005-01-05 14:00:00 32 14
13 2005-01-06 00:00:00 56 43
14 2005-01-06 02:00:00 30 21
15 2005-01-06 10:00:00 30 24
16 2005-01-06 14:00:00 44 20

Here is a data.table approach:
library(data.table)
setDT(x)
DT <- melt.data.table(x, id.vars = "date")
DT[, c("var", "time") := tstrsplit(variable , ".", fixed=TRUE)
][, datetime := as.POSIXct(paste(date, as.integer(time) / 100), format = "%d-%m-%Y %H")
][, setdiff(names(DT), c("datetime", "var", "value")) := NULL]
DT <- dcast.data.table(DT, datetime ~ var, value.var = "value")
> DT
datetime pricemax pricemin
1: 2005-01-03 00:00:00 50 22
2: 2005-01-03 02:00:00 25 21
3: 2005-01-03 10:00:00 45 32
4: 2005-01-03 14:00:00 60 30
5: 2005-01-04 00:00:00 20 15
6: 2005-01-04 02:00:00 67 40
7: 2005-01-04 10:00:00 60 12
8: 2005-01-04 14:00:00 57 20
9: 2005-01-05 00:00:00 25 23
10: 2005-01-05 02:00:00 89 59
11: 2005-01-05 10:00:00 40 20
12: 2005-01-05 14:00:00 32 14
13: 2005-01-06 00:00:00 56 43
14: 2005-01-06 02:00:00 30 21
15: 2005-01-06 10:00:00 30 24
16: 2005-01-06 14:00:00 44 20

Related

Splitting a dateTime vector if time is greater than x between vector components

I have the following data:
df <- data.frame(index = 1:85,
times = c(seq(as.POSIXct("2020-10-03 21:31:00 UTC"),
as.POSIXct("2020-10-03 22:25:00 UTC"),
"min"),
seq(as.POSIXct("2020-11-03 10:10:00 UTC"),
as.POSIXct("2020-11-03 10:39:00 UTC"),
"min")
))
If we look at rows 55 and 56, there is a clear break in the times:
> df[55:56, ]
index times
55 55 2020-10-03 22:25:00
56 56 2020-11-03 10:10:00
I would like to add a third, categorical column called split, based on these breaks,
e.g. df$split[55] would be A and df$split[56] would be B.
The logic is:
if the time gap between consecutive rows is greater than 5 minutes, start a new category for the subsequent rows, until the next gap greater than 5 minutes.
Thanks

You could use
library(dplyr)
df %>%
mutate(cat = 1 + cumsum(c(0, diff(times)) > 5))
which returns
index times cat
1 1 2020-10-03 21:31:00 1
2 2 2020-10-03 21:32:00 1
3 3 2020-10-03 21:33:00 1
4 4 2020-10-03 21:34:00 1
5 5 2020-10-03 21:35:00 1
6 6 2020-10-03 21:36:00 1
7 7 2020-10-03 21:37:00 1
8 8 2020-10-03 21:38:00 1
...
53 53 2020-10-03 22:23:00 1
54 54 2020-10-03 22:24:00 1
55 55 2020-10-03 22:25:00 1
56 56 2020-11-03 10:10:00 2
57 57 2020-11-03 10:11:00 2
58 58 2020-11-03 10:12:00 2
59 59 2020-11-03 10:13:00 2
If you need letters or something else, you could for example use
df %>%
mutate(cat = LETTERS[1 + cumsum(c(0, diff(times)) > 5)])
to convert the categories 1 and 2 into A and B.
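One thing to watch with diff() on date-times: it returns a difftime whose units (seconds, minutes, hours, ...) are chosen automatically from the data, so the comparison with 5 only means "5 minutes" when the units happen to be minutes. A minimal base R sketch that fixes the units explicitly, using a short made-up times vector rather than the full data:

```r
# A few timestamps around the break (made-up subset; tz = "UTC" is an assumption)
times <- as.POSIXct(c("2020-10-03 22:23:00", "2020-10-03 22:24:00",
                      "2020-10-03 22:25:00", "2020-11-03 10:10:00",
                      "2020-11-03 10:11:00"), tz = "UTC")

# Gaps between consecutive rows, always expressed in minutes
gap_min <- as.numeric(diff(times), units = "mins")

# A new group starts whenever the gap exceeds 5 minutes
grp <- 1 + cumsum(c(0, gap_min) > 5)
LETTERS[grp]
```

Note that LETTERS only has 26 entries, so with more than 26 groups the numeric grp is the safer label.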

Analyzing data in order of column and then row in R

I have a dataset of logged data at 5 minutes intervals that also includes data at 1 minute intervals denoted by _1 - _5 in the header.
Each row represents a 5 minute interval.
datetime temp speed_1 speed_2 speed_3 speed_4 speed_5
20190710 09:00:00 21 13 14 26 29 32
20190710 09:05:00 21 28 28 29 38 12
20190710 09:10:00 20 8 15 29 30 19
20190711 11:12:00 18 6 9 18 51 49
20190711 11:17:00 17 49 48 48 30 10
The actual dataset has an additional 25 columns of data logged at 5 minute intervals and consists of approximately 25000 rows.
I'm looking for an efficient way of analyzing the speed for each day.
For example, if I wanted to plot the speed for each day it would take speed_1 to speed_5 from the earliest entry on a particular day, say 09:00:00, then speed_1 to speed_5 from the next time, 09:05:00, and so on for the whole day.
Currently I have created an additional dataframe for the speed that fills in the times to give:
datetime speed
20190710 09:00:00 13
20190710 09:01:00 14
20190710 09:02:00 26
20190710 09:03:00 29
20190710 09:04:00 32
This results in having a second df of 125000 entries. I was wondering if there was a more memory efficient way of analyzing the original dataset as the datasets may grow considerably in the future.
Edit: Reproducible code added
structure(list(time = structure(1:3, .Label = c("20190710 09-00-00", "20190710 09-05-00", "20190710 09-10-00"), class = "factor"), temp = c(21, 21, 20), speed_1 = c(13, 28, 8), speed_2 = c(14, 28, 15), speed_3 = c(26, 29, 29), speed_4 = c(29, 38, 30), speed_5 = c(32, 12, 19)), .Names = c("time", "temp", "speed_1", "speed_2", "speed_3", "speed_4", "speed_5"), row.names = c(NA, -3L), class = "data.frame")

Here is a dplyr version:
library(tidyverse)
library(lubridate)
df <- read.table(text='datetime temp speed_1 speed_2 speed_3 speed_4 speed_5
"20190710 09:00:00" 21 13 14 26 29 32
"20190710 09:05:00" 21 28 28 29 38 12
"20190710 09:10:00" 20 8 15 29 30 19
"20190711 11:12:00" 18 6 9 18 51 49
"20190711 11:17:00" 17 49 48 48 30 10',header=T)
# we take our dataframe
df %>%
# ...then we put all the speed columns in one column
pivot_longer(starts_with("speed_")
, names_to = "minute"
, values_to = "speed") %>%
# ...then we...
mutate(datetime = ymd_hms(datetime) #...turn the "datetime" column actually into a datetime format
, minute = gsub("speed_", "", minute) %>% as.numeric() # ...remove "speed_" from the former column names (which are now in column "minute")
, datetime = datetime + minutes(minute - 1)) # ...and add the minute to our datetime...
...to get this:
# A tibble: 25 x 4
datetime temp minute speed
<dttm> <int> <dbl> <int>
1 2019-07-10 09:00:00 21 1 13
2 2019-07-10 09:01:00 21 2 14
3 2019-07-10 09:02:00 21 3 26
4 2019-07-10 09:03:00 21 4 29
5 2019-07-10 09:04:00 21 5 32
6 2019-07-10 09:05:00 21 1 28
7 2019-07-10 09:06:00 21 2 28
8 2019-07-10 09:07:00 21 3 29
9 2019-07-10 09:08:00 21 4 38
10 2019-07-10 09:09:00 21 5 12
# ... with 15 more rows

Some example data and expected output would help a lot. I gave it a shot anyways. You can do this if you simply want a list of all the speeds for every date.
dataset <- read.table(text='datetime temp speed_1 speed_2 speed_3 speed_4 speed_5
"20190710 09:00:00" 21 13 14 26 29 32
"20190710 09:05:00" 21 28 28 29 38 12
"20190710 09:10:00" 20 8 15 29 30 19
"20190711 11:12:00" 18 6 9 18 51 49
"20190711 11:17:00" 17 49 48 48 30 10',header=T)
dataset$datetime <- as.POSIXlt(dataset$datetime,format="%Y%m%d %H:%M:%OS")
lapply(split(dataset,as.Date(dataset$datetime)), function(x) c(t(x[,3:ncol(x)])) )
output:
$`2019-07-10`
[1] 13 14 26 29 32 28 28 29 38 12 8 15 29 30 19
$`2019-07-11`
[1] 6 9 18 51 49 49 48 48 30 10
Edit: Updated answer so that the speeds are in the correct order.
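If the goal is a per-day summary rather than a plot of every reading, the long copy may not be needed at all. The following is a rough base R sketch that summarises the wide columns directly. The names day, speed_cols, daily_mean and daily_max are just illustrative, and the mean of row means equals the overall daily mean only because every row carries the same number of readings (five).

```r
dataset <- data.frame(
  datetime = c("20190710 09:00:00", "20190710 09:05:00", "20190710 09:10:00",
               "20190711 11:12:00", "20190711 11:17:00"),
  temp    = c(21, 21, 20, 18, 17),
  speed_1 = c(13, 28, 8, 6, 49),
  speed_2 = c(14, 28, 15, 9, 48),
  speed_3 = c(26, 29, 29, 18, 48),
  speed_4 = c(29, 38, 30, 51, 30),
  speed_5 = c(32, 12, 19, 49, 10)
)

# Day of each row, taken from the first 8 characters (YYYYMMDD)
day <- as.Date(substr(dataset$datetime, 1, 8), format = "%Y%m%d")
speed_cols <- grep("^speed_", names(dataset))

# Per-day mean and maximum speed, computed without reshaping to long format
daily_mean <- tapply(rowMeans(dataset[speed_cols]), day, mean)
daily_max  <- tapply(apply(dataset[speed_cols], 1, max), day, max)
daily_mean
daily_max
```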

Here is something raw using data.table:
library(data.table)
setDT(df)
df[, time := as.POSIXct(time, format="%Y%m%d %H-%M-%OS")]
out <-
df[, !"temp"
][, melt(.SD, id.vars = "time")
][, time := time + (rleid(variable)-1)*60, time
][order(time), !"variable"]
out
# time value
# 1: 2019-07-10 09:00:00 13
# 2: 2019-07-10 09:01:00 14
# 3: 2019-07-10 09:02:00 26
# 4: 2019-07-10 09:03:00 29
# 5: 2019-07-10 09:04:00 32
# 6: 2019-07-10 09:05:00 28
# 7: 2019-07-10 09:06:00 28
# 8: 2019-07-10 09:07:00 29
# 9: 2019-07-10 09:08:00 38
# 10: 2019-07-10 09:09:00 12
# 11: 2019-07-10 09:10:00 8
# 12: 2019-07-10 09:11:00 15
# 13: 2019-07-10 09:12:00 29
# 14: 2019-07-10 09:13:00 30
# 15: 2019-07-10 09:14:00 19
Data:
df <- data.frame(
time = factor(c("20190710 09-00-00", "20190710 09-05-00", "20190710 09-10-00")),
temp = c(21, 21, 20),
speed_1 = c(13, 28, 8),
speed_2 = c(14, 28, 15),
speed_3 = c(26, 29, 29),
speed_4 = c(29, 38, 30),
speed_5 = c(32, 12, 19)
)

Calculate daily parameters from a dataframe with hourly-values in rows and with several columns of interest

The dataframe df1 summarizes water temperature at different depths (T5m,T15m,T25m,T35m) for every hour (Datetime). As an example of dataframe:
df1<- data.frame(Datetime=c("2016-08-12 12:00:00","2016-08-12 13:00:00","2016-08-12 14:00:00","2016-08-12 15:00:00","2016-08-13 12:00:00","2016-08-13 13:00:00","2016-08-13 14:00:00","2016-08-13 15:00:00"),
T5m= c(10,20,20,10,10,20,20,10),
T15m=c(10,20,10,20,10,20,10,20),
T25m=c(20,20,20,30,20,20,20,30),
T35m=c(20,20,10,10,20,20,10,10))
df1$Datetime<- as.POSIXct(df1$Datetime, format="%Y-%m-%d %H")
df1
Datetime T5m T15m T25m T35m
1 2016-08-12 12:00:00 10 10 20 20
2 2016-08-12 13:00:00 20 20 20 20
3 2016-08-12 14:00:00 20 10 20 10
4 2016-08-12 15:00:00 10 20 30 10
5 2016-08-13 12:00:00 10 10 20 20
6 2016-08-13 13:00:00 20 20 20 20
7 2016-08-13 14:00:00 20 10 20 10
8 2016-08-13 15:00:00 10 20 30 10
I would like to create a new dataframe df2 containing the average water temperature per day, both for each depth and for the whole water column, together with the standard error of each mean. I would expect something like this (I did the calculations by hand so there might be some mistakes):
> df2
Date meanT5m meanT15m meanT25m meanT35m meanTotal seT5m seT15m seT25m seT35m seTotal
1 2016-08-12 15 15 22.5 15 16.875 2.88 2.88 2.5 2.88 1.29
2 2016-08-13 15 15 22.5 15 16.875 2.88 2.88 2.5 2.88 1.29
I am especially interested in knowing how to do it with data.table since I will work with huge data.frames and I think data.table is quite efficient.
For calculating the standard error I know the function std.error() from the package plotrix.

Update based on @chinsoon's comment
First transform your data frame into a data table:
library(data.table)
setDT(df1)
Create a total column:
df1[, total := rowSums(.SD), .SDcols = grep("T[0-9]+m", names(df1))][]
# Datetime T5m T15m T25m T35m total
# 1: 2016-08-12 12:00:00 10 10 20 20 60
# 2: 2016-08-12 13:00:00 20 20 20 20 80
# 3: 2016-08-12 14:00:00 20 10 20 10 60
# 4: 2016-08-12 15:00:00 10 20 30 10 70
# 5: 2016-08-13 12:00:00 10 10 20 20 60
# 6: 2016-08-13 13:00:00 20 20 20 20 80
# 7: 2016-08-13 14:00:00 20 10 20 10 60
# 8: 2016-08-13 15:00:00 10 20 30 10 70
Apply the functions per day:
library(lubridate)
(df3 <- df1[, as.list(unlist(lapply(.SD, function (x)
c(mean = mean(x), sem = sd(x) / sqrt(length(x)))))),
day(Datetime)])
# day T5m.mean T5m.sem T15m.mean T15m.sem T25m.mean T25m.sem T35m.mean
# 1: 12 15 2.886751 15 2.886751 22.5 2.5 15
# 2: 13 15 2.886751 15 2.886751 22.5 2.5 15
# T35m.sem total.mean total.sem
# 1: 2.886751 67.5 4.787136
# 2: 2.886751 67.5 4.787136
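Note that total above is a row sum, so total.mean (67.5) is the mean of the hourly sums. The meanTotal in the question (16.875) looks like the mean over all readings of a day across all depths. A minimal base R sketch of that pooled version, assuming this is what "whole water column" means; the helper se and the names depth_cols and pooled are illustrative only:

```r
df1 <- data.frame(
  Datetime = c("2016-08-12 12:00:00", "2016-08-12 13:00:00",
               "2016-08-12 14:00:00", "2016-08-12 15:00:00",
               "2016-08-13 12:00:00", "2016-08-13 13:00:00",
               "2016-08-13 14:00:00", "2016-08-13 15:00:00"),
  T5m  = c(10, 20, 20, 10, 10, 20, 20, 10),
  T15m = c(10, 20, 10, 20, 10, 20, 10, 20),
  T25m = c(20, 20, 20, 30, 20, 20, 20, 30),
  T35m = c(20, 20, 10, 10, 20, 20, 10, 10)
)

depth_cols <- grep("^T[0-9]+m$", names(df1), value = TRUE)
day <- substr(df1$Datetime, 1, 10)          # "YYYY-MM-DD" part of the timestamp

# Standard error of the mean
se <- function(x) sd(x) / sqrt(length(x))

# Pool all depth readings of a day into one vector, then summarise
pooled <- do.call(rbind, lapply(split(df1[depth_cols], day), function(d) {
  v <- unlist(d, use.names = FALSE)
  c(meanTotal = mean(v), seTotal = se(v))
}))
pooled
```

The same pooling could be done in data.table by melting the depth columns first; base R is used here only to keep the sketch dependency-free.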

Here is one way using dplyr and tidyr, calculated in two parts:
library(dplyr)
library(tidyr)
df2 <- df1 %>%
mutate(Datetime = as.Date(Datetime)) %>%
gather(key, value, -Datetime) %>%
group_by(Datetime, key) %>%
summarise(se = plotrix::std.error(value),
mean = mean(value)) %>%
gather(total, value, -key, -Datetime)
bind_rows(df2, df2 %>%
group_by(Datetime, total) %>%
summarise(value = sum(value)) %>%
mutate(key = paste("total", c("mean", "se"), sep = "_"))) %>%
unite(key, key, total) %>%
spread(key, value)
# A tibble: 2 x 11
# Groups: Datetime [2]
# Datetime T15m_mean T15m_se T25m_mean T25m_se T35m_mean
# <date> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 2016-08-12 15 2.89 22.5 2.5 15
#2 2016-08-13 15 2.89 22.5 2.5 15
# … with 5 more variables: T35m_se <dbl>, T5m_mean <dbl>,
# T5m_se <dbl>, total_mean_mean <dbl>, total_se_se <dbl>

Counting the number of people in the system in R

I have the arrival and departure dates and times of different customers in a system. I want to count the number of people in the system every 30 minutes. How can I do this in R?
Here are my data

If I understand your question, here's an example with fake data:
library(tidyverse)
library(lubridate)
# Fake data
set.seed(2)
dat = data.frame(id=1:1000, type=rep(c("A","B"), 500),
arrival=as.POSIXct("2013-08-21 05:00:00") + sample(-10000:10000, 1000, replace=TRUE))
dat$departure = dat$arrival + sample(100:5000, 1000, replace=TRUE)
# Times when we want to check how many people are still present
times = seq(round_date(min(dat$arrival), "hour"), ceiling_date(max(dat$departure), "hour"), "30 min")
# Count number of people present at each time
map_df(times, function(x) {
dat %>%
group_by(type) %>%
summarise(Time = x,
Count=sum(arrival < x & departure > x)) %>%
spread(type, Count) %>%
mutate(Total = A + B)
})
Time A B Total
<dttm> <int> <int> <int>
1 2013-08-21 02:00:00 0 0 0
2 2013-08-21 02:30:00 26 31 57
3 2013-08-21 03:00:00 54 53 107
4 2013-08-21 03:30:00 75 81 156
5 2013-08-21 04:00:00 58 63 121
6 2013-08-21 04:30:00 66 58 124
7 2013-08-21 05:00:00 55 60 115
8 2013-08-21 05:30:00 52 63 115
9 2013-08-21 06:00:00 57 62 119
10 2013-08-21 06:30:00 62 51 113
11 2013-08-21 07:00:00 60 67 127
12 2013-08-21 07:30:00 72 54 126
13 2013-08-21 08:00:00 66 46 112
14 2013-08-21 08:30:00 19 12 31
15 2013-08-21 09:00:00 1 2 3
16 2013-08-21 09:30:00 0 0 0
17 2013-08-21 10:00:00 0 0 0

I'm not sure what you mean by counting the number of people "in the system", but I'm assuming you mean "the number of people who have arrived but not yet departed". To do this, you can apply a simple logical condition to the relevant columns of your dataframe, e.g.
logicVec <- df$arrival_time <= dateTimeObj & dateTimeObj < df$departure_time
logicVec will be a logical vector of TRUEs and FALSEs. Because TRUE counts as 1 and FALSE as 0, sum(logicVec) gives the total number of people/customers/rows that fulfil the condition above.
You can then repeat this line of code for every dateTimeObj (of class e.g. POSIXct) you want. In your case, that would be a sequence of dateTimeObj values 30 minutes apart.
I hope this helps.
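The repetition over 30-minute checkpoints described above could be sketched like this in base R. The three customers and the checkpoint range are made up for illustration, the column names arrival_time and departure_time follow the snippet above, and tz = "UTC" is an assumption.

```r
# Made-up arrivals and departures for three customers
df <- data.frame(
  arrival_time   = as.POSIXct(c("2013-08-21 09:05:00", "2013-08-21 09:20:00",
                                "2013-08-21 09:40:00"), tz = "UTC"),
  departure_time = as.POSIXct(c("2013-08-21 09:45:00", "2013-08-21 09:25:00",
                                "2013-08-21 10:10:00"), tz = "UTC")
)

# Checkpoints every 30 minutes
checkpoints <- seq(as.POSIXct("2013-08-21 09:00:00", tz = "UTC"),
                   as.POSIXct("2013-08-21 10:30:00", tz = "UTC"),
                   by = "30 min")

# Number of customers present (arrived, not yet departed) at each checkpoint
counts <- vapply(seq_along(checkpoints), function(i) {
  t <- checkpoints[i]
  sum(df$arrival_time <= t & t < df$departure_time)
}, integer(1))

data.frame(time = checkpoints, present = counts)
```

A customer who arrives and leaves between two checkpoints (the second one here) is never counted, which is inherent to sampling at fixed times.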

R: Fill in all elements of sequence of datetime with patchy periodic datetime information

I guess I don't even know really what to 'title' this question as.
But I think this is quite a common data manipulation requirement.
I have data that has a periodic exchange between two parties of a quantity of a good. The exchanges are made hourly. Here is an example data frame:
df <- cbind.data.frame(Seller = as.character(c("A","A","A","A","A","A")),
Buyer = c("B","B","B","C","C","C"),
DateTimeFrom = c("1/07/2013 0:00","1/07/2013 9:00","1/07/2013 0:00","1/07/2013 6:00","1/07/2013 8:00","2/07/2013 9:00"),
DateTimeTo = c("1/07/2013 8:00","1/07/2013 15:00","2/07/2013 8:00","1/07/2013 9:00","1/07/2013 12:00","2/07/2013 16:00"),
Qty = c(50,10,20,25,5,5)
)
df$DateTimeFrom <- as.POSIXct(df$DateTimeFrom, format = '%d/%m/%Y %H:%M', tz = 'GMT')
df$DateTimeTo <- as.POSIXct(df$DateTimeTo, format = '%d/%m/%Y %H:%M', tz = 'GMT')
> df
Seller Buyer DateTimeFrom DateTimeTo Qty
1 A B 2013-07-01 00:00:00 2013-07-01 08:00:00 50
2 A B 2013-07-01 09:00:00 2013-07-01 15:00:00 10
3 A B 2013-07-01 00:00:00 2013-07-02 08:00:00 20
4 A C 2013-07-01 06:00:00 2013-07-01 09:00:00 25
5 A C 2013-07-01 08:00:00 2013-07-01 12:00:00 5
6 A C 2013-07-02 09:00:00 2013-07-02 16:00:00 5
So, for example, the first row of this data frame says that the Seller "A" sells 50 units of the good to the buyer "B" every hour from midnight on 1/7/13 until 8am on 1/7/13. You can also notice that some of these exchanges between the same two parties can overlap, but just with a different negotiated quantity.
What I need to do (and need your help with) is to generate a sequence covering all hours over this two-day period that gives, for each hour, the total quantity exchanged between the two parties, summed over all negotiations.
Here would be the resulting dataframe.
DateTimeSeq <- data.frame(seq(ISOdate(2013,7,1,0),by = "hour", length.out = 48))
colnames(DateTimeSeq) <- c("DateTime")
#What the Answer should be
DateTimeSeq$QtyAB <- c(70,70,70,70,70,70,70,70,70,30,30,30,30,30,30,30,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
DateTimeSeq$QtyAC <- c(0,0,0,0,0,0,25,25,30,30,5,5,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,5,5,5,5,5,5,5,0,0,0,0,0,0,0)
> DateTimeSeq
DateTime QtyAB QtyAC
1 2013-07-01 00:00:00 70 0
2 2013-07-01 01:00:00 70 0
3 2013-07-01 02:00:00 70 0
4 2013-07-01 03:00:00 70 0
5 2013-07-01 04:00:00 70 0
6 2013-07-01 05:00:00 70 0
7 2013-07-01 06:00:00 70 25
8 2013-07-01 07:00:00 70 25
9 2013-07-01 08:00:00 70 30
10 2013-07-01 09:00:00 30 30
11 2013-07-01 10:00:00 30 5
12 2013-07-01 11:00:00 30 5
13 2013-07-01 12:00:00 30 5
14 2013-07-01 13:00:00 30 0
15 2013-07-01 14:00:00 30 0
.... etc
Anybody able to lend a hand?
Thanks,
A

Here is my solution, which uses the dplyr and reshape packages.
library(dplyr)
library(reshape)
Firstly, we should expand the dataframe so that everything is in an hourly format. This can be done using the do part of dplyr.
df %>% rowwise() %>%
do(data.frame(Seller=.$Seller,
Buyer=.$Buyer,
Qty=.$Qty,
DateTimeCurr=seq(from=.$DateTimeFrom, to=.$DateTimeTo, by="hour")))
Output:
Source: local data frame [66 x 4]
Groups: <by row>
Seller Buyer Qty DateTimeCurr
1 A B 50 2013-07-01 00:00:00
2 A B 50 2013-07-01 01:00:00
3 A B 50 2013-07-01 02:00:00
...
From there it is straightforward to build the ids and summarise the totals using the group_by function.
df1 <- df %>% rowwise() %>%
do(data.frame(Seller=.$Seller,
Buyer=.$Buyer,
Qty=.$Qty,
DateTimeCurr=seq(from=.$DateTimeFrom, to=.$DateTimeTo, by="hour"))) %>%
group_by(Seller, Buyer, DateTimeCurr) %>%
summarise(TotalQty=sum(Qty)) %>%
mutate(id=paste0("Qty", Seller, Buyer))
Output:
Source: local data frame [48 x 5]
Groups: Seller, Buyer
Seller Buyer DateTimeCurr TotalQty id
1 A B 2013-07-01 00:00:00 70 QtyAB
2 A B 2013-07-01 01:00:00 70 QtyAB
3 A B 2013-07-01 02:00:00 70 QtyAB
From this dataframe, all we have to do is cast it into the format you have above.
> cast(df1, DateTimeCurr~ id, value="TotalQty")
DateTimeCurr QtyAB QtyAC
1 2013-07-01 00:00:00 70 NA
2 2013-07-01 01:00:00 70 NA
3 2013-07-01 02:00:00 70 NA
4 2013-07-01 03:00:00 70 NA
5 2013-07-01 04:00:00 70 NA
6 2013-07-01 05:00:00 70 NA
So the whole piece of code is:
df1 <- df %>% rowwise() %>%
do(data.frame(Seller=.$Seller,
Buyer=.$Buyer,
Qty=.$Qty,
DateTimeCurr=seq(from=.$DateTimeFrom, to=.$DateTimeTo, by="hour"))) %>%
group_by(Seller, Buyer, DateTimeCurr) %>%
summarise(TotalQty=sum(Qty)) %>%
mutate(id=paste0("Qty", Seller, Buyer))
cast(df1, DateTimeCurr~ id, value="TotalQty")
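The cast output has NA where the question expects 0, and it only contains hours in which at least one exchange took place. As a rough base R sketch (not the method above), the same expand-then-sum idea can be combined with xtabs over a full hourly grid, which fills missing combinations with 0. The three-row df and the column names From and To are simplified stand-ins for the question's data.

```r
# Simplified stand-in for the question's data (hourly exchanges, inclusive ends)
df <- data.frame(
  Seller = "A",
  Buyer  = c("B", "B", "C"),
  From   = as.POSIXct(c("2013-07-01 00:00", "2013-07-01 01:00",
                        "2013-07-01 02:00"), tz = "GMT"),
  To     = as.POSIXct(c("2013-07-01 02:00", "2013-07-01 03:00",
                        "2013-07-01 03:00"), tz = "GMT"),
  Qty    = c(50, 20, 25)
)

fmt <- function(x) format(x, "%Y-%m-%d %H:%M", tz = "GMT")

# One row per exchange per hour
hourly <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  data.frame(id = paste0("Qty", df$Seller[i], df$Buyer[i]),
             DateTime = fmt(seq(df$From[i], df$To[i], by = "hour")),
             Qty = df$Qty[i])
}))

# Every hour in the period becomes a factor level, so empty hours are kept
grid <- seq(min(df$From), max(df$To), by = "hour")
hourly$DateTime <- factor(hourly$DateTime, levels = fmt(grid))

# Sum Qty per hour and seller/buyer pair; absent combinations become 0
wide <- xtabs(Qty ~ DateTime + id, data = hourly)
wide
```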
