I can't understand why my code is producing an undesired output, since I've done this in the past with similar datasets and good results.
Below are the two dataframes I would like to left_join():
> head(datagps)
Date & Time [Local] Latitude Longitude DateTime meters
1: 06/11/2018 08:44 -2.434986 34.85387 2018-11-06 08:44:00 1.920190
2: 06/11/2018 08:48 -2.434993 34.85386 2018-11-06 08:48:00 3.543173
3: 06/11/2018 08:52 -2.435014 34.85388 2018-11-06 08:52:00 1.002979
4: 06/11/2018 08:56 -2.435011 34.85389 2018-11-06 08:56:00 3.788024
5: 06/11/2018 09:00 -2.434986 34.85387 2018-11-06 09:00:00 1.262584
6: 06/11/2018 09:04 -2.434994 34.85386 2018-11-06 09:04:00 3.012679
> head(datasensorraw)
# A tibble: 6 x 4
TimeGroup x y z
<dttm> <int> <int> <dbl>
1 2000-01-01 00:04:00 0 0 0
2 2000-01-01 00:08:00 1 0 1
3 2000-01-01 00:12:00 0 0 0
4 2000-01-01 00:20:00 0 0 0
5 2000-01-01 00:24:00 0 0 0
6 2018-06-09 05:04:00 4 14 14.6
And below is my code. There are no errors, but for some reason I get NAs under x, y and z. This should not happen, since there are registered values in the datasensorraw dataframe for those time stamps:
> library(dplyr)
> dataresults<-datagps %>%
+ mutate(`Date & Time [Local]` = as.POSIXct(`Date & Time [Local]`,
+ format = "%d/%m/%Y %H:%M")) %>%
+ left_join(datasensorraw, by = c("Date & Time [Local]" = "TimeGroup"))
> #Left join the data frames
> head(dataresults)
Date & Time [Local] Latitude Longitude DateTime meters x y z
1 2018-11-06 07:44:00 -2.434986 34.85387 2018-11-06 08:44:00 1.920190 NA NA NA
2 2018-11-06 07:48:00 -2.434993 34.85386 2018-11-06 08:48:00 3.543173 NA NA NA
3 2018-11-06 07:52:00 -2.435014 34.85388 2018-11-06 08:52:00 1.002979 NA NA NA
4 2018-11-06 07:56:00 -2.435011 34.85389 2018-11-06 08:56:00 3.788024 NA NA NA
5 2018-11-06 08:00:00 -2.434986 34.85387 2018-11-06 09:00:00 1.262584 NA NA NA
6 2018-11-06 08:04:00 -2.434994 34.85386 2018-11-06 09:04:00 3.012679 NA NA NA
I can also upload a small dput() sample of datagps and datasensorraw.
I am learning R, so I'm wondering if I'm doing something wrong. As the dput() samples show, I shouldn't be getting NAs in those columns. Any input is appreciated!
Looks like a mix-up in your date format. Try switching format = "%d/%m/%Y %H:%M" to format = "%m/%d/%Y %H:%M", or switch the other dataset to d/m/y.
dataresults<- datagps_sample %>%
mutate(`Date & Time [Local]` = as.POSIXct(`Date & Time [Local]`, format = "%m/%d/%Y %H:%M")) %>%
left_join(datasensorraw_sample, by = c("Date & Time [Local]" = "TimeGroup"))
> head(dataresults)
Date & Time [Local] Latitude Longitude DateTime meters x y z
1 2018-06-11 12:44:00 -2.434986 34.85387 2018-11-06 08:44:00 1.920190 17 12 21.59363
2 2018-06-11 12:48:00 -2.434993 34.85386 2018-11-06 08:48:00 3.543173 6 0 6.00000
3 2018-06-11 12:52:00 -2.435014 34.85388 2018-11-06 08:52:00 1.002979 47 25 53.24351
4 2018-06-11 12:56:00 -2.435011 34.85389 2018-11-06 08:56:00 3.788024 0 0 0.00000
5 2018-06-11 13:00:00 -2.434986 34.85387 2018-11-06 09:00:00 1.262584 48 53 72.23108
6 2018-06-11 13:04:00 -2.434994 34.85386 2018-11-06 09:04:00 3.012679 139 113 179.24589
EDIT: basically, left_join was finding no matches, so it returned the rows from your original dataframe with the new columns filled with NA. If you format your column before joining, you can check for common keys with something simple like datagps$`Date & Time [Local]` %in% datasensorraw$TimeGroup.
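A quick way to see exactly which rows fail to match is dplyr's anti_join() (a sketch, using the dataframes from the question):

```r
library(dplyr)

# Rows of datagps whose timestamp has no counterpart in datasensorraw;
# an empty result means every row will join successfully
datagps %>%
  mutate(`Date & Time [Local]` = as.POSIXct(`Date & Time [Local]`,
                                            format = "%m/%d/%Y %H:%M")) %>%
  anti_join(datasensorraw, by = c("Date & Time [Local]" = "TimeGroup"))
```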
I'm borrowing the reproducible example given here:
Aggregate daily level data to weekly level in R
since it's pretty close to what I want to do.
Interval value
1 2012-06-10 552
2 2012-06-11 4850
3 2012-06-12 4642
4 2012-06-13 4132
5 2012-06-14 4190
6 2012-06-15 4186
7 2012-06-16 1139
8 2012-06-17 490
9 2012-06-18 5156
10 2012-06-19 4430
11 2012-06-20 4447
12 2012-06-21 4256
13 2012-06-22 3856
14 2012-06-23 1163
15 2012-06-24 564
16 2012-06-25 4866
17 2012-06-26 4421
18 2012-06-27 4206
19 2012-06-28 4272
20 2012-06-29 3993
21 2012-06-30 1211
22 2012-07-01 698
23 2012-07-02 5770
24 2012-07-03 5103
25 2012-07-04 775
26 2012-07-05 5140
27 2012-07-06 4868
28 2012-07-07 1225
29 2012-07-08 671
30 2012-07-09 5726
31 2012-07-10 5176
That question asks how to aggregate on weekly intervals; what I'd like to do is aggregate on a "day of the week" basis.
So I'd like to have a table similar to that one, adding the values of all the same day of the week:
Day of the week value
1 "Sunday" 60000
2 "Monday" 50000
3 "Tuesday" 60000
4 "Wednesday" 50000
5 "Thursday" 60000
6 "Friday" 50000
7 "Saturday" 60000
You can try:
aggregate(d$value, list(weekdays(as.Date(d$Interval))), sum)
We can group by day of the week using weekdays():
library(dplyr)
df %>%
group_by(Day_Of_The_Week = weekdays(as.Date(Interval))) %>%
summarise(value = sum(value))
# Day_Of_The_Week value
# <chr> <int>
#1 Friday 16903
#2 Monday 26368
#3 Saturday 4738
#4 Sunday 2975
#5 Thursday 17858
#6 Tuesday 23772
#7 Wednesday 13560
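Note the rows come back in alphabetical order. If you want calendar order instead, one option (a sketch) is to turn the weekday names into a factor with explicit levels before grouping:

```r
library(dplyr)

# factor levels fix the ordering of the summarised rows
df %>%
  group_by(Day_Of_The_Week = factor(weekdays(as.Date(Interval)),
                                    levels = c("Sunday", "Monday", "Tuesday",
                                               "Wednesday", "Thursday",
                                               "Friday", "Saturday"))) %>%
  summarise(value = sum(value))
```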
We can do this with data.table
library(data.table)
setDT(df1)[, .(value = sum(value)), .(Dayofweek = weekdays(as.Date(Interval)))]
# Dayofweek value
#1: Sunday 2975
#2: Monday 26368
#3: Tuesday 23772
#4: Wednesday 13560
#5: Thursday 17858
#6: Friday 16903
#7: Saturday 4738
Using lubridate (https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html):
library(lubridate)
library(data.table)
df1$Weekday <- wday(as.Date(df1$Interval), label = TRUE)
df1 <- data.table(df1)
df1[, sum(value), Weekday]
I have a data frame like below, where Error is 1 if there is an error in DOB, and the corrected DOB for the same record follows with no error (NA). I want to extract only the records that have Error 1 and were never corrected. Could anyone out there help me on this?
ID Date1 Date2 DOB Code Error
381 2002-10-01 2015-10-01 1967-01-22 4 1
381 2002-10-01 2015-10-01 1967-01-20 4 NA
381 2011-10-01 2015-10-01 1969-05-13 11 1
381 2011-10-01 2015-10-01 1968-05-13 11 NA
837 2005-12-07 2015-12-07 1987-11-19 8 1
837 2005-12-08 2015-12-08 1989-12-07 8 1
837 2001-04-15 2015-04-15 1984-08-11 18 1
840 2001-04-23 2015-04-23 1999-03-14 18 NA
The output table will have the details below.
ID Date1 Date2 DOB Code Error
837 2005-12-07 2015-12-07 1987-11-19 8 1
837 2005-12-08 2015-12-08 1989-12-07 8 1
837 2001-04-15 2015-04-15 1984-08-11 18 1
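One possible approach (a sketch, assuming a corrected record shares the same ID and Code as the erroneous one and carries Error = NA) is to keep only the ID/Code groups that contain no corrected row:

```r
library(dplyr)

# A group (same ID and Code) containing an NA row was corrected,
# so we drop it and keep only the never-corrected error records
df %>%
  group_by(ID, Code) %>%
  filter(!any(is.na(Error))) %>%
  ungroup()
```

On the sample above this keeps the three ID 837 rows and drops the ID 381 pairs (each has a corrected NA sibling) and the ID 840 row (Error is NA).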
I have an R data.frame containing one value for every quarter of an hour:
Date A B
1 2015-11-02 00:00:00 0 0 //day start
2 2015-11-02 00:15:00 0 0
3 2015-11-02 00:30:00 0 0
4 2015-11-02 00:45:00 0 0
...
96 2015-11-02 23:45:00 0 0 //day end
97 2015-11-03 00:00:00 0 0 //new day
...
6 2016-03-23 01:15:00 0 0 //last record
I use xts to construct a time series
xtsA <- xts(data$A,data$Date)
By using apply.daily I get the result I expect:
apply.daily(xtsA, sum)
Date A
1 2015-11-02 23:45:00 400
2 2015-11-03 23:45:00 400
3 2015-11-04 23:45:00 500
but apply.weekly seems to use Monday as the last day of the week:
Date A
19 2016-03-07 00:45:00 6500 //Monday
20 2016-03-14 00:45:00 5500 //Monday
21 2016-03-21 00:45:00 5000 //Monday
and I do not understand why it uses 00:45:00. Does anyone know?
Data is imported from a CSV file; the Date column looks like this:
data <- read.csv("...", header=TRUE)
Date A
1 151102 0000 0
...
The error was in the date-time parsing. Using
data$Date <- as.POSIXct(strptime(data$Date, "%y%m%d %H%M"), tz = "GMT")
solves it, and apply.weekly now returns
Date A
1 2015-11-08 23:45:00 3500 //Sunday
2 2015-11-15 23:45:00 4000 //Sunday
...
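A quick sanity check on the format string (a minimal sketch, using one of the raw strings from the sample above):

```r
# "%y%m%d %H%M" matches strings like "151102 0000" (2 Nov 2015, midnight)
strptime("151102 0000", "%y%m%d %H%M", tz = "GMT")
```

Without an explicit format, these strings were being misinterpreted, which is what shifted the week boundaries.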
I have a data set that has dates and times for in and out. Each line is an in/out pair, but some are blank. I can remove the blanks with na.omit and a clean read-in (it was a CSV, and na.strings = c("") works with read.csv).
Of course, because the real world is never like the tutorial, some of the entries are dates only, so my as.POSIXlt(Dataset$In, format="%m/%d/%Y %H:%M") returns NA on the date-only values.
na.omit does not remove these lines, so I have two questions:
1. Why doesn't na.omit work, or how can I get it to work?
2. Better: how can I convert one column into both dates and times (in the POSIX format) without two calls, or with some sort of optional parameter in the format string? (Or is this even possible?)
This is a sample of the dates and times. I can't share the real file, 1 it's huge, 2 it's PII.
Id,In,Out
1,8/15/2015 8:00,8/15/2015 17:00
1,8/16/2015 8:04,8/16/2015
1,8/17/2015 8:50,8/17/2015 18:00
1,8/18/2015,8/18/2015 17:00
2,8/15/2015,8/15/2015 13:00
2,8/16/2015 8:00,8/16/2015 17:00
3,8/15/2015 4:00,8/15/2015 11:00
3,8/16/2015 9:00,8/16/2015 19:00
3,8/17/2015,8/17/2015 17:00
3,,
4,,
4,8/16/2015 6:00,8/16/2015 20:00
DF <- read.table(text = "Id,In,Out
1,8/15/2015 8:00,8/15/2015 17:00
1,8/16/2015 8:04,8/16/2015
1,8/17/2015 8:50,8/17/2015 18:00
1,8/18/2015,8/18/2015 17:00
2,8/15/2015,8/15/2015 13:00
2,8/16/2015 8:00,8/16/2015 17:00
3,8/15/2015 4:00,8/15/2015 11:00
3,8/16/2015 9:00,8/16/2015 19:00
3,8/17/2015,8/17/2015 17:00", header = TRUE, sep = ",",
stringsAsFactors = FALSE) #set this option during import
DF$In[nchar(DF$In) < 13] <- paste(DF$In[nchar(DF$In) < 13], "0:00")
DF$Out[nchar(DF$Out) < 13] <- paste(DF$Out[nchar(DF$Out) < 13], "0:00")
DF$In <- as.POSIXct(DF$In, format = "%m/%d/%Y %H:%M", tz = "GMT")
DF$Out <- as.POSIXct(DF$Out, format = "%m/%d/%Y %H:%M", tz = "GMT")
# Id In Out
#1 1 2015-08-15 08:00:00 2015-08-15 17:00:00
#2 1 2015-08-16 08:04:00 2015-08-16 00:00:00
#3 1 2015-08-17 08:50:00 2015-08-17 18:00:00
#4 1 2015-08-18 00:00:00 2015-08-18 17:00:00
#5 2 2015-08-15 00:00:00 2015-08-15 13:00:00
#6 2 2015-08-16 08:00:00 2015-08-16 17:00:00
#7 3 2015-08-15 04:00:00 2015-08-15 11:00:00
#8 3 2015-08-16 09:00:00 2015-08-16 19:00:00
#9 3 2015-08-17 00:00:00 2015-08-17 17:00:00
na.omit doesn't work with POSIXlt objects because it is documented to "handle vectors, matrices and data frames comprising vectors and matrices (only)." (see help("na.omit")). And in the strict sense, POSIXlt objects are not vectors:
unclass(as.POSIXlt(DF$In))
#$sec
#[1] 0 0 0 0 0 0 0 0 0
#
#$min
#[1] 0 4 50 0 0 0 0 0 0
#
#$hour
#[1] 8 8 8 0 0 8 4 9 0
#
#$mday
#[1] 15 16 17 18 15 16 15 16 17
#
#$mon
#[1] 7 7 7 7 7 7 7 7 7
#
#$year
#[1] 115 115 115 115 115 115 115 115 115
#
#$wday
#[1] 6 0 1 2 6 0 6 0 1
#
#$yday
#[1] 226 227 228 229 226 227 226 227 228
#
#$isdst
#[1] 0 0 0 0 0 0 0 0 0
#
#attr(,"tzone")
#[1] "GMT"
There is hardly any reason to prefer POSIXlt over POSIXct (which is stored internally as a plain numeric vector giving the number of seconds since the origin, and thus needs less memory).
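For comparison, a POSIXct vector is an ordinary atomic vector under the hood, so na.omit() handles it directly (a minimal sketch):

```r
x <- as.POSIXct(c("2015-08-15 08:00", NA), tz = "GMT")
class(unclass(x))  # underlying storage is plain numeric
na.omit(x)         # drops the NA entry and keeps the POSIXct class
```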
You've been given a couple of strategies that bring these character values in and process them "in place". I almost never use as.POSIXlt, since there are so many pitfalls in dealing with the list-in-list structure it returns, especially its effective incompatibility with dataframes. Here's a method that does the testing and coercion at the read.* level by defining an as() method:
setOldClass("inTime", prototype="POSIXct")
setAs("character", "inTime",
function(from) structure( ifelse( is.na(as.POSIXct(from, format="%m/%d/%Y %H:%M") ),
as.POSIXct(from, format="%m/%d/%Y") ,
as.POSIXct(from, format="%m/%d/%Y %H:%M") ),
class="POSIXct" ) )
read.csv(text=txt, colClasses=c("numeric", 'inTime','inTime') )
Id In Out
1 1 2015-08-15 08:00:00 2015-08-15 17:00:00
2 1 2015-08-16 08:04:00 2015-08-16 00:00:00
3 1 2015-08-17 08:50:00 2015-08-17 18:00:00
4 1 2015-08-18 00:00:00 2015-08-18 17:00:00
5 2 2015-08-15 00:00:00 2015-08-15 13:00:00
6 2 2015-08-16 08:00:00 2015-08-16 17:00:00
7 3 2015-08-15 04:00:00 2015-08-15 11:00:00
8 3 2015-08-16 09:00:00 2015-08-16 19:00:00
9 3 2015-08-17 00:00:00 2015-08-17 17:00:00
The structure() "envelope" is needed because of the rather strange behavior of ifelse, which would otherwise return a numeric object rather than an object of class 'POSIXct'.
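That behavior is easy to demonstrate (a minimal sketch): ifelse() shapes its result from the test argument, so the class attribute of the yes/no values is dropped:

```r
x <- as.POSIXct("2015-08-15 08:00", tz = "GMT")
ifelse(TRUE, x, x)  # returns the bare number of seconds since the epoch

# wrapping the result restores the datetime class
as.POSIXct(ifelse(TRUE, x, x), origin = "1970-01-01", tz = "GMT")
```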