I have one text file that looks like:
wd <- read.table("C:\\Users\\value.txt", sep ='' , header =TRUE)
head(wd) # hourly values
# Year day hour mint valu1
# 1 2002 1 7 30 0.5
# 2 2002 1 8 0 0.3
# 3 2002 1 8 30 0.4
I want to add another column with the date-time in this format:
"2002-01-01 07:30:00 UTC"
Thanks for your help
Try this. No packages are used:
transform(wd,
Date = as.POSIXct(paste(Year, day, hour, mint), format = "%Y %j %H %M", tz = "UTC")
)
## Year day hour mint valu1 Date
## 1 2002 1 7 30 0.5 2002-01-01 07:30:00
## 2 2002 1 8 0 0.3 2002-01-01 08:00:00
## 3 2002 1 8 30 0.4 2002-01-01 08:30:00
Note: the input is:
wd <- structure(list(Year = c(2002L, 2002L, 2002L), day = c(1L, 1L,
1L), hour = c(7L, 8L, 8L), mint = c(30L, 0L, 30L), valu1 = c(0.5,
0.3, 0.4)), .Names = c("Year", "day", "hour", "mint", "valu1"
), class = "data.frame", row.names = c(NA, -3L))
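As a base R sketch that avoids format strings entirely, you could start from midnight on 1 January of each year with ISOdatetime() and add the day of year, hour and minute as seconds. This assumes the column names from the question and that UTC is wanted; UTC days are always 86400 seconds long, so daylight saving time cannot disturb the arithmetic.

```r
# Sample data with the question's columns: Year, day (day of year), hour, mint
wd <- data.frame(Year = c(2002L, 2002L, 2002L), day = c(1L, 1L, 1L),
                 hour = c(7L, 8L, 8L), mint = c(30L, 0L, 30L),
                 valu1 = c(0.5, 0.3, 0.4))

# Midnight (UTC) on 1 January of each row's year
jan1 <- ISOdatetime(wd$Year, 1, 1, 0, 0, 0, tz = "UTC")

# Add (day - 1) whole days plus the hour and minute offsets, all in seconds
wd$Date <- jan1 + (wd$day - 1) * 86400 + wd$hour * 3600 + wd$mint * 60

# Cross-check against the %j-based parse from the transform() answer
check <- with(wd, as.POSIXct(paste(Year, day, hour, mint),
                             format = "%Y %j %H %M", tz = "UTC"))
print(wd)
```

Both routes give the same instants; the format-string version is shorter, while the arithmetic version makes the meaning of "day of year" explicit.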
You might be able to simplify things with a package like lubridate, but I think this illustrates the solution. Next time, it would save answerers time if you provided code to create the sample data, as I've done here.
d <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
Year day hour mint valu1
2002 1 7 30 0.5
2002 1 8 0 0.3
2002 1 8 30 0.4
")
library(stringr)
# zero-pad the day of year (%j) to 3 digits and hour/minute to 2, then parse as UTC
d$datetime <- as.POSIXct(
paste0(
d$Year, "-",
str_pad(d$day, 3, pad="0"), " ",
str_pad(d$hour, 2, pad="0"),
":",
str_pad(d$mint, 2, pad="0")
),
format="%Y-%j %H:%M", tz="UTC"
)
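If you would rather not load stringr just for padding, base R's sprintf() can zero-pad the integers while building the string. This is a sketch assuming the same sample data as above:

```r
d <- data.frame(Year = c(2002L, 2002L, 2002L), day = c(1L, 1L, 1L),
                hour = c(7L, 8L, 8L), mint = c(30L, 0L, 30L),
                valu1 = c(0.5, 0.3, 0.4))

# %03d and %02d zero-pad the integers, doing the job of str_pad()
stamp <- with(d, sprintf("%d-%03d %02d:%02d", Year, day, hour, mint))

# %j = day of year; tz = "UTC" matches the "UTC" in the desired output
d$datetime <- as.POSIXct(stamp, format = "%Y-%j %H:%M", tz = "UTC")
d
```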
Objective:
I have a dataset, df, that I wish to first tally up the number of occurrences for each date and then multiply the output by a certain number.
Sent Duration Length
1/7/2020 8:11:00 PM 34 216
1/22/2020 7:51:05 AM 432 111
1/7/2020 1:35:08 AM 57 90
1/22/2020 3:43:26 AM 22 212
1/22/2020 4:00:00 AM 55 500
Desired Outcome:
Date Count Aggregation(80)
1/7/2020 2 160
1/22/2020 3 240
I wish to count the number of times each date occurs and then multiply that count by 80. The date 1/7/2020 occurs twice and the date 1/22/2020 occurs three times, so their aggregations are 2 * 80 = 160 and 3 * 80 = 240.
The dput is:
structure(list(Sent = structure(c(5L, 3L, 4L, 1L, 2L), .Label = c("1/22/2020 3:43:26 AM",
"1/22/2020 4:00:00 AM", "1/22/2020 7:51:05 PM", "1/7/2020 1:35:08 AM",
"1/7/2020 8:11:00 PM"), class = "factor"), Duration = c(34L,
432L, 57L, 22L, 55L), length = c(216L, 111L, 90L, 212L, 500L)), class = "data.frame", row.names = c(NA,
-5L))
This is what I have tried:
df1<- aggregate(df$Sent, by=list(Category= df$Sent),
FUN=length)
However, I need to output how often each date occurs, together with the aggregation (that count multiplied by 80).
Any suggestions are welcome.
We can convert Sent to POSIXct format, extract the date, count the number of rows for each date and multiply that count by 80. Using dplyr, we can do it as follows:
library(dplyr)
df %>%
group_by(Date = as.Date(lubridate::mdy_hms(Sent))) %>%
summarise(Count = n(), `Aggregation(80)` = Count * 80)
# Date Count `Aggregation(80)`
# <date> <int> <dbl>
#1 2020-01-07 2 160
#2 2020-01-22 3 240
Using table() in base R:
as.data.frame(cbind(Count=(r <- table(as.Date(df$Sent, format="%m/%d/%Y %H:%M:%S"))),
Agg=r*80))
# Count Agg
# 2020-01-07 2 160
# 2020-01-22 3 240
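or, to get Date as its own column (building the data frame directly keeps Count and Agg numeric, whereas cbind() with names(r) would turn every column into character):
r <- table(as.Date(df$Sent, format="%m/%d/%Y %H:%M:%S"))
data.frame(Date=names(r), Count=as.vector(r), Agg=as.vector(r)*80)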
# Date Count Agg
# 1 2020-01-07 2 160
# 2 2020-01-22 3 240
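The aggregate() attempt from the question can also be made to work by grouping on the calendar date instead of the full timestamp. This is a sketch that rebuilds the question's data by hand; it assumes an English or C locale, because the %p (AM/PM) conversion is locale-dependent.

```r
df <- data.frame(
  Sent = c("1/7/2020 8:11:00 PM", "1/22/2020 7:51:05 PM", "1/7/2020 1:35:08 AM",
           "1/22/2020 3:43:26 AM", "1/22/2020 4:00:00 AM"),
  Duration = c(34L, 432L, 57L, 22L, 55L),
  length = c(216L, 111L, 90L, 212L, 500L)
)

# %I is the 12-hour clock and pairs with %p (AM/PM); UTC avoids DST surprises
sent <- as.POSIXct(df$Sent, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
df$Date <- format(sent, "%Y-%m-%d")

# Group by the calendar date (not the full timestamp) and count rows
out <- aggregate(Duration ~ Date, data = df, FUN = length)
names(out)[names(out) == "Duration"] <- "Count"
out$Aggregation <- out$Count * 80
out
```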
Here is the data.table way of doing it.
code
library( data.table )
#set data as data.table
setDT(mydata)
#set timestamps as POSIXct (%I is the 12-hour clock, which is what %p expects)
mydata[, Sent := as.POSIXct( Sent, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC" ) ]
#summarise
mydata[, .(Count = .N, Aggregation = .N * 80), by = .(Date = as.Date(Sent) )]
output
# Date Count Aggregation
# 1: 2020-01-07 2 160
# 2: 2020-01-22 3 240
I am struggling with date-time formatting in R. I am sure this is an easy fix: can someone write a line of code that combines the year, m, d and time columns into a new column "datetime"?
What data looks like:
x year m d time
A 2019 2 23 11:12 PM
B 2019 1 31 2:04 PM
C 2018 12 31 12:01 AM
D 2017 2 1 10:14 AM
What I want:
x datetime
A 2/23/19 11:12 PM
B 1/31/19 2:04 PM
C 12/31/18 12:01 AM
D 2/1/17 10:14 AM
Since these are date-time values, we can paste the parts together and parse them into a standard POSIXct date-time:
df$datetime <- with(df, as.POSIXct(paste(year, m, d, time),
format = "%Y %m %d %I:%M %p", tz = "UTC"))
df
# x year m d time datetime
#1 A 2019 2 23 11:12 PM 2019-02-23 23:12:00
#2 B 2019 1 31 2:04 PM 2019-01-31 14:04:00
#3 C 2018 12 31 12:01 AM 2018-12-31 00:01:00
#4 D 2017 2 1 10:14 AM 2017-02-01 10:14:00
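If you also need the text layout from the question ("2/23/19 11:12 PM"), format() can turn the POSIXct back into a string. The sketch below assembles the pieces with sprintf() so that month, day and 12-hour values lose their leading zeros. It assumes an English or C locale (where %p yields "AM"/"PM"); the intermediate name dt is illustrative only.

```r
df <- data.frame(x = c("A", "B", "C", "D"),
                 year = c(2019L, 2019L, 2018L, 2017L),
                 m = c(2L, 1L, 12L, 2L), d = c(23L, 31L, 31L, 1L),
                 time = c("11:12 PM", "2:04 PM", "12:01 AM", "10:14 AM"))
df$datetime <- with(df, as.POSIXct(paste(year, m, d, time),
                                   format = "%Y %m %d %I:%M %p", tz = "UTC"))

dt <- df$datetime
# as.integer() drops the zero padding that %m, %d and %I add;
# %y is the 2-digit year and %M keeps its padding (e.g. "04")
shown <- sprintf("%d/%d/%s %d:%s %s",
                 as.integer(format(dt, "%m")), as.integer(format(dt, "%d")),
                 format(dt, "%y"), as.integer(format(dt, "%I")),
                 format(dt, "%M"), format(dt, "%p"))
shown
```

Keeping datetime as POSIXct and producing the string only for display means sorting and arithmetic still work on the real date-time column.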
Or using lubridate
library(dplyr)
library(lubridate)
df %>% mutate(datetime = ymd_hm(paste(year, m, d, time)))
data
df <- structure(list(x = structure(1:4, .Label = c("A", "B", "C", "D"
), class = "factor"), year = c(2019L, 2019L, 2018L, 2017L), m = c(2L,
1L, 12L, 2L), d = c(23L, 31L, 31L, 1L), time = c("11:12 PM",
"2:04 PM", "12:01 AM", "10:14 AM")), row.names = c(NA, -4L), class = "data.frame")
I think the following should work for your goal; it builds a character column in the m/d/yy format shown:
df <- data.frame(x = df$x, datetime = with(df, sprintf("%d/%d/%02d %s", m, d, year %% 100L, time)))
If you want to append the new column to the existing data.frame df instead, use:
df$datetime <- with(df, sprintf("%d/%d/%02d %s", m, d, year %% 100L, time))
I have a dataset containing variables and a quantity of goods sold: for some days, however, there are no values.
I created a dataset with all 0 values in sales and all NA in the rest. How can I add those lines to the initial dataset?
At the moment, I have this:
sales
day month year employees holiday sales
1 1 2018 14 0 1058
2 1 2018 25 1 2174
4 1 2018 11 0 987
sales.NA
day month year employees holiday sales
1 1 2018 NA NA 0
2 1 2018 NA NA 0
3 1 2018 NA NA 0
4 1 2018 NA NA 0
I would like to create a new dataset that inserts the days with no observations, with 0 for sales and NA for all other variables, like this:
new.data
day month year employees holiday sales
1 1 2018 14 0 1058
2 1 2018 25 1 2174
3 1 2018 NA NA 0
4 1 2018 11 0 987
I tried something like this:
merge(sales.NA,sales, all.y=T, by = c("day","month","year"))
But it does not work
Using dplyr, you could use right_join(). For example:
library(dplyr)
sales <- data.frame(day = c(1,2,4),
month = c(1,1,1),
year = c(2018, 2018, 2018),
employees = c(14, 25, 11),
holiday = c(0,1,0),
sales = c(1058, 2174, 987)
)
sales.NA <- data.frame(day = c(1,2,3,4),
month = c(1,1,1,1),
year = c(2018,2018,2018, 2018)
)
right_join(sales, sales.NA)
This leaves you with
day month year employees holiday sales
1 1 1 2018 14 0 1058
2 2 1 2018 25 1 2174
3 3 1 2018 NA NA NA
4 4 1 2018 11 0 987
This leaves NA in sales where you want 0, but that could be fixed by including the sales data in sales.NA, or you could use replace_na() from tidyr:
library(tidyr)
right_join(sales, sales.NA) %>% mutate(sales = replace_na(sales, 0))
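For comparison, the merge() call from the question can be made to work in base R. A likely reason it misbehaves is that sales.NA also carries employees, holiday and sales columns, so merge() keeps both copies with .x/.y suffixes. This sketch keeps only the key columns of the template and flips x and y so all template days survive:

```r
sales <- data.frame(day = c(1, 2, 4), month = c(1, 1, 1), year = c(2018, 2018, 2018),
                    employees = c(14, 25, 11), holiday = c(0, 1, 0),
                    sales = c(1058, 2174, 987))
sales.NA <- data.frame(day = 1:4, month = 1, year = 2018,
                       employees = NA, holiday = NA, sales = 0)

keys <- c("day", "month", "year")
# Only the template's key columns, so no duplicated .x/.y columns appear;
# all.x = TRUE keeps every day from the template, observed or not
new.data <- merge(sales.NA[keys], sales, by = keys, all.x = TRUE)

# Days without observations come back as NA; set their sales to 0
new.data$sales[is.na(new.data$sales)] <- 0
new.data
```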
Here is another data.table solution (the reproducible data are at the end):
library(data.table)
jvars = c("day","month","year")
merge(sales.NA[, ..jvars], sales, by = jvars, all.x = TRUE)[is.na(sales), sales := 0L][]
day month year employees holiday sales
1: 1 1 2018 14 0 1058
2: 2 1 2018 25 1 2174
3: 3 1 2018 NA NA 0
4: 4 1 2018 11 0 987
Or with some neater syntax:
sales[sales.NA[, ..jvars], on = jvars][is.na(sales), sales := 0][]
Reproducible data:
sales <- structure(list(day = c(1L, 2L, 4L), month = c(1L, 1L, 1L), year = c(2018L,
2018L, 2018L), employees = c(14L, 25L, 11L), holiday = c(0L,
1L, 0L), sales = c(1058L, 2174L, 987L)), row.names = c(NA, -3L
), class = c("data.table", "data.frame"))
sales.NA <- structure(list(day = 1:4, month = c(1L, 1L, 1L, 1L), year = c(2018L,
2018L, 2018L, 2018L), employees = c(NA, NA, NA, NA), holiday = c(NA,
NA, NA, NA), sales = c(0L, 0L, 0L, 0L)), row.names = c(NA, -4L
), class = c("data.table", "data.frame"))
Here is an answer using the data.table package, since I am more familiar with its syntax, but regular data.frames should work much the same way. I would also switch to a proper date format, which will make life easier for you down the line.
With this approach you would not need the sales.NA table at all: every day without sales automatically shows up as a row of NAs after the join.
library(data.table)
dt.dates <- data.table(Date = seq.Date(from = as.Date("2018-01-01"), to = as.Date("2018-12-31"),by = "day" ))
dt.sales <- data.table(day = c(1,2,4)
, month = c(1,1,1)
, year = c(2018,2018,2018)
, employees = c(14, 25, 11)
, holiday = c(0,1,0)
, sales = c(1058, 2174, 987)
)
dt.sales[, Date := as.Date(paste(year,month,day, sep = "-")) ]
merge( x = dt.dates
, y = dt.sales
, by.x = "Date"
, by.y = "Date"
, all.x = TRUE
)
   Date day month year employees holiday sales
1: 2018-01-01 1 1 2018 14 0 1058
2: 2018-01-02 2 1 2018 25 1 2174
3: 2018-01-03 NA NA NA NA NA NA
4: 2018-01-04 4 1 2018 11 0 987
...
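The join above leaves day, month, year and sales as NA on the gap days. A base R sketch of the same idea that also fills those in: derive day/month/year from Date after the join and set missing sales to 0. The end date 2018-01-04 is an assumption chosen to keep the output short; use your real range.

```r
sales <- data.frame(day = c(1, 2, 4), month = c(1, 1, 1), year = c(2018, 2018, 2018),
                    employees = c(14, 25, 11), holiday = c(0, 1, 0),
                    sales = c(1058, 2174, 987))
sales$Date <- as.Date(paste(sales$year, sales$month, sales$day, sep = "-"))

# Every calendar day in the (assumed) range of interest
all.days <- data.frame(Date = seq(as.Date("2018-01-01"), as.Date("2018-01-04"), by = "day"))

# Left join on Date: days without sales get NA in the joined columns
full <- merge(all.days, sales[c("Date", "employees", "holiday", "sales")],
              by = "Date", all.x = TRUE)

# Rebuild day/month/year from Date so the gap rows get them too
full$day   <- as.integer(format(full$Date, "%d"))
full$month <- as.integer(format(full$Date, "%m"))
full$year  <- as.integer(format(full$Date, "%Y"))
full$sales[is.na(full$sales)] <- 0
full
```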
Here is my sample dataset
id hour
1 15:10
2 12:10
3 22:10
4 06:30
I need to find the earliest and the latest time. The class of hour is factor, so I need to convert it to an appropriate class before comparing the times. I tried to format hour with the code below, but it did not work as expected:
format(as.Date(date),"%H:%M")
Use times() from the chron package:
#Data
xx
# id hour
#1 1 15:10
#2 2 12:10
#3 3 22:10
#4 4 06:30
library(chron)
xx$hour = times(paste0(as.character(xx$hour), ":00"))
xx
# id hour
#1 1 15:10:00
#2 2 12:10:00
#3 3 22:10:00
#4 4 06:30:00
#Min and Max
range(xx$hour)
#[1] 06:30:00 22:10:00
xx = structure(list(id = 1:4, hour = structure(c(3L, 2L, 4L, 1L), .Label = c("06:30",
"12:10", "15:10", "22:10"), class = "factor")), .Names = c("id",
"hour"), row.names = c(NA, -4L), class = "data.frame")
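If all you need is the earliest (min) and latest (max) time, you can convert the times to character (as.character(xx$hour) for the factor column) and use min() and max(). This works because zero-padded "HH:MM" strings sort alphabetically in the same order as chronologically, e.g.:
hour <- c("15:10", "12:10", "22:10", "06:30")
min(hour)
# [1] "06:30"
max(hour)
# [1] "22:10"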
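If the strings might not be zero-padded (for example "6:30"), comparing them as text breaks down. A base R sketch that does not rely on padding converts each "HH:MM" value to minutes since midnight and picks the extremes numerically:

```r
# The question's column, stored as a factor
hour <- factor(c("15:10", "12:10", "22:10", "06:30"))

# Convert to character first: as.numeric() on a factor returns level codes
txt <- as.character(hour)

# Minutes since midnight, computed from the "HH:MM" pieces
parts <- strsplit(txt, ":", fixed = TRUE)
mins <- vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
               numeric(1))

earliest <- txt[which.min(mins)]
latest <- txt[which.max(mins)]
c(earliest = earliest, latest = latest)
```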
I have a dataframe that looks like this:
id time value
01 2014-02-26 13:00:00 6
02 2014-02-26 15:00:00 6
01 2014-02-26 18:00:00 6
04 2014-02-26 21:00:00 7
02 2014-02-27 09:00:00 6
03 2014-02-27 12:00:00 6
The data frame holds the mood scores of multiple patients, recorded at different timestamps throughout the day.
I want the dataframe to become like this:
id 2014-02-26 2014-02-27
01 6.25 4.32
02 5.39 8.12
03 9.23 3.18
04 5.76 3.95
Each row should be one patient and each column one day, holding that patient's mean score for that day. If a patient has no mood score on a given date, the value should be NA.
What is the easiest way to do so using functions like ddply, or from other packages?
df <- structure(list(id = c(1L, 2L, 1L, 4L, 2L, 3L), time = structure(c(1393437600,
1393444800, 1393455600, 1393466400, 1393509600, 1393520400), class = c("POSIXct",
"POSIXt"), tzone = ""), value = c(6L, 6L, 6L, 7L, 6L, 6L)), .Names = c("id",
"time", "value"), row.names = c(NA, -6L), class = "data.frame")
Based on your description, this seems to be what you need:
library(tidyverse)
df %>%
group_by(id, time1 = format(time, '%Y-%m-%d')) %>%
summarise(new = mean(value)) %>%
spread(time1, new)
#Source: local data frame [4 x 3]
#Groups: id [4]
# id `2014-02-26` `2014-02-27`
#* <int> <dbl> <dbl>
#1 1 6 NA
#2 2 6 6
#3 3 NA 6
#4 4 7 NA
In base R, you could combine aggregate with reshape like this:
# get means by id-date
temp <- setNames(aggregate(value ~ id + format(time, "%y-%m-%d"), data=df, FUN=mean),
c("id", "time", "value"))
# reshape to get dates as columns
reshape(temp, direction="wide", idvar="id", timevar="time")
id value.14-02-26 value.14-02-27
1 1 6 NA
2 2 6 6
3 4 7 NA
5 3 NA 6
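Another base R option is tapply(), which cross-tabulates the means by id and day in one call and fills combinations with no observations with NA. This sketch rebuilds the question's times as UTC strings (an assumption, so the calendar days match the table shown in the question regardless of your local time zone):

```r
df <- data.frame(
  id = c(1L, 2L, 1L, 4L, 2L, 3L),
  time = as.POSIXct(c("2014-02-26 13:00:00", "2014-02-26 15:00:00",
                      "2014-02-26 18:00:00", "2014-02-26 21:00:00",
                      "2014-02-27 09:00:00", "2014-02-27 12:00:00"), tz = "UTC"),
  value = c(6L, 6L, 6L, 7L, 6L, 6L)
)

# Rows = id, columns = calendar day; empty id/day cells become NA
wide <- tapply(df$value,
               list(id = df$id, day = format(df$time, "%Y-%m-%d")),
               mean)
wide
```

The result is a matrix rather than a data frame; as.data.frame(wide) converts it if you need one.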
I'd recommend the data.table package; the approach is then very similar to Sotos' tidyverse solution.
library(data.table)
df <- data.table(df)
df[, time1 := format(time, '%Y-%m-%d')]
aggregated <- df[, list(meanvalue = mean(value)), by=c("id", "time1")]
aggregated <- dcast.data.table(aggregated, id~time1, value.var="meanvalue")
aggregated
# id 2014-02-26 2014-02-27
# 1: 1 6 NA
# 2: 2 6 6
# 3: 3 NA 6
# 4: 4 NA 7
(I think my result differs because my system runs in a different time zone; I imported the datetime objects as UTC.)
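To see why the time zone changes the grouping, take one stored instant from the question's dput and print it in two zones. America/New_York is used here only as an example zone that happens to reproduce the 21:00 shown in the question's table; the asker's actual zone is an assumption.

```r
# One stored instant from the question's data (seconds since 1970-01-01 UTC)
t <- as.POSIXct(1393466400, origin = "1970-01-01", tz = "UTC")

# The same instant falls on different calendar days in different time zones
utc_day <- format(t, "%Y-%m-%d %H:%M", tz = "UTC")
ny_day  <- format(t, "%Y-%m-%d %H:%M", tz = "America/New_York")
c(UTC = utc_day, New_York = ny_day)
```

Because the daily means depend on which calendar day each reading lands on, it is worth fixing the time zone explicitly (for example with tz = "UTC" or your local zone) before grouping by date.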