New R user here - I have many .csv files containing time stamps (date_time) in one column and temperature readings in the other. I am trying to write a function that detects the date_time format, and then changes it to a different format. Because of the way the data was collected, the date/time format is different in some of the .csv files. I want the function to change the date_time for all files to the same format.
Date_time format I want: "%m/%d/%y %H:%M:%S"
Date_time format I want changed to the above: "%y-%m-%d %H:%M:%S"
> head(file1data)
x date_time temperature coupler_d coupler_a host_con stopped EoF
1 1 18-07-10 09:00:00 41.137 Logged
2 2 18-07-10 09:15:00 41.322
3 3 18-07-10 09:30:00 41.554
4 4 18-07-10 09:45:00 41.832
5 5 18-07-10 10:00:00 42.156
6 6 18-07-10 10:15:00 42.755
> head(file2data)
x date_time temperature coupler_d coupler_a host_con stopped EoF
1 1 07/10/18 01:00:00 PM 8.070 Logged
2 2 07/10/18 01:15:00 PM 8.095
3 3 07/10/18 01:30:00 PM 8.120
4 4 07/10/18 01:45:00 PM 8.120
5 5 07/10/18 02:00:00 PM 8.020
6 6 07/10/18 02:15:00 PM 7.795
file2data is in the correct format. file1data is incorrect.
I have tried using logicals to detect and replace the date format e.g.,
file1data %>%
  if (str_match_all(date_time, "([0-9][0-9]{2})[-.])")) {
    format(as.POSIXct(date_time, format = "%y-%m-%d %H:%M:%S"), "%m/%d/%y %H:%M:%S")
  } else {
    format(date_time, "%m/%d/%y %H:%M:%S")
  }
but this has not worked; I get the following errors:
Error in if (.) str_match_all(date_time, "([0-9][0-9]{2})[-.])") else { :
argument is not interpretable as logical
In addition: Warning message:
In if (.) str_match_all(date_time, "([0-9][0-9]{2})[-.])") else { :
the condition has length > 1 and only the first element will be used
Any ideas?
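Answering with a sketch (untested against the real files): the errors arise because if() expects a single TRUE/FALSE, while a data-frame column supplies one value per row — that is exactly what "the condition has length > 1" is warning about. A vectorised test such as grepl() sidesteps this. The helper name standardise_datetime below is made up for illustration:

```r
# Hypothetical helper: convert any "%y-%m-%d %H:%M:%S" entries to
# "%m/%d/%y %H:%M:%S", leaving already-correct entries untouched.
standardise_datetime <- function(x) {
  dashed <- grepl("^\\d{2}-\\d{2}-\\d{2}", x)  # vectorised: one TRUE/FALSE per row
  x[dashed] <- format(as.POSIXct(x[dashed], format = "%y-%m-%d %H:%M:%S"),
                      "%m/%d/%y %H:%M:%S")
  x
}

standardise_datetime(c("18-07-10 09:00:00", "07/10/18 01:00:00 PM"))
```

Applied to a whole file this would be file1data$date_time <- standardise_datetime(file1data$date_time), and the same call can safely be run over every file, since slash-formatted values fall through unchanged.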
I have the below dataframe (df) from ENTSO-E showing German power prices. I created the "Hour" column with the lubridate function hour(df$date). The output was a range (1, 2, ..., 23, 0).
# to replace 0 with 24
df["Hour"][df["Hour"]=="0"]<- "24"
I will need to work on an hourly basis, so I filtered each hour from 1 to 24, but I cannot filter the replaced hour, H24:
H1 <- df %>%
filter(Hour==1)
H24 <- df %>%
filter(Hour==24)
Error in match.fun(FUN) : object 'Hour' not found
The 24 values are still in the Hour column, and its class is numeric, but I cannot do any calculations with it.
class(df$Hour)
[1] "numeric"
mean(german_last_4$Hour)
[1] NA
I am thinking the problem is with the replace step. Is there any other way to produce a result that works with H24?
                 date  price Hour
1 2019-01-01 01:00:00  28.32    1
2 2019-01-01 02:00:00  10.07    2
3 2019-01-01 03:00:00  -4.08    3
4 2019-01-01 04:00:00  -9.91    4
5 2019-01-01 05:00:00  -7.41    5
6 2019-01-01 06:00:00 -12.55    6
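No answer is shown here, so this is only a hedged guess at the fix: assigning the character string "24" into the column is the likely cause of the later breakage, and replacing with the number 24 keeps the column numeric and filterable. A minimal stand-in for the real df:

```r
# Minimal stand-in for the real ENTSO-E data frame
df <- data.frame(Hour = c(1, 2, 23, 0), price = c(28.32, 10.07, -4.08, -9.91))

# Replace hour 0 with the numeric 24, not the character "24"
df$Hour[df$Hour == 0] <- 24

H24 <- subset(df, Hour == 24)  # filtering now works
mean(df$Hour)                  # and so does arithmetic
```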
I am trying to use the RchivalTag package in R and I am having issues with the date and time. Whenever I try to use read_histos() it gives me the following error:
hist_dat_1<- read_histos(Georgia)
Warning: 598 parsing failures.
row col expected actual
1 -- date like %H:%M:%S %d-%b-%Y 2009-02-23 00:00:00
2 -- date like %H:%M:%S %d-%b-%Y 2009-02-23 06:00:00
3 -- date like %H:%M:%S %d-%b-%Y 2009-02-23 12:00:00
4 -- date like %H:%M:%S %d-%b-%Y 2009-02-23 18:00:00
5 -- date like %H:%M:%S %d-%b-%Y 2009-02-24 00:00:00
... ... ........................... ...................
See problems(...) for more details.
Error in .fact2datetime(as.character(add0$Date), date_format = date_format, :
date concersion failed! Please revise current'date_format': %H:%M:%S %d-%b-%Y to 2009-02-23 00:00:00
Current date/time class:
class(Georgia$Date)
#[1] "POSIXct" "POSIXt"
I have changed the date and time format to:
format(Georgia$Date, format = "%H:%M:%S %d-%b-%Y")
But I am still getting the error message. If anyone is familiar with this package, help would be greatly appreciated.
This is a difficult one without reproducible data, but we can actually replicate your error using one of the package's built-in data sets:
library(RchivalTag)
hist_file <- system.file("example_files/67851-12h-Histos.csv",package="RchivalTag")
Georgia <- read.csv(hist_file)
This reads the histogram data in as a data frame, which seems to be the stage you are at. To replicate your situation, we next convert the Date column to the standard date-time representation and then back to character:
Georgia$Date <- strptime(Georgia$Date, "%H:%M:%S %d-%b-%Y")
Georgia$Date <- as.character(Georgia$Date)
Now when we try to read the data we get:
read_histos(Georgia)
#> Warning: 34 parsing failures.
#> row col expected actual
#> 1 -- date like %H:%M:%S %d-%b-%Y 2012-02-01 08:10:00
#> 2 -- date like %H:%M:%S %d-%b-%Y 2012-02-01 12:00:00
#> 3 -- date like %H:%M:%S %d-%b-%Y 2012-02-02 00:00:00
#> 4 -- date like %H:%M:%S %d-%b-%Y 2012-02-02 12:00:00
#> 5 -- date like %H:%M:%S %d-%b-%Y 2012-02-03 00:00:00
#> ... ... ........................... ...................
#> See problems(...) for more details.
#>
#> Error in .fact2datetime(as.character(add0$Date), date_format = date_format, :
#> date concersion failed! Please revise current'date_format': %H:%M:%S %d-%b-%Y to 2012-02-01 08:10:00
To fix this, we can rewrite the Date column using strftime:
Georgia$Date <- strftime(Georgia$Date, "%H:%M:%S %d-%b-%Y")
And now we can read the data without the parse errors.
read_histos(Georgia)
#> DeployID Ptt Type p0_25 p0_50 p0_75 p0_90
#> 1 67851 67851 TAD 2.9 100 100 100
#> 2 67851 67851 TAT 3.0 100 100 100
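As an aside on why this works: strptime parses a character string into a date-time object, while strftime does the reverse, formatting a date-time object back into a character string, which is what read_histos expects here. A minimal illustration (the month-name output depends on the locale):

```r
x <- strptime("2012-02-01 08:10:00", "%Y-%m-%d %H:%M:%S")  # text -> date-time
strftime(x, "%H:%M:%S %d-%b-%Y")                           # date-time -> text
```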
I have a data frame (called homeAnew), the head of which is as follows.
date total
1 2014-01-01 00:00:00 0.756
2 2014-01-01 01:00:00 0.717
3 2014-01-01 02:00:00 0.643
4 2014-01-01 03:00:00 0.598
5 2014-01-01 04:00:00 0.604
6 2014-01-01 05:00:00 0.638
I wanted to extract explicit dates and I originally used:
Hourly <- subset(homeAnew,date >= "2014-04-10 00:00:00" & date <= "2015-04-10 00:00:00")
However the result was a dataframe that started at 2014-04-09 12:00:00 and ended 2015-04-09 12:00:00. Basically it was shifted back 12 hours from where I wanted it.
I was able to use
Date1<-as.Date("2014-04-10 00:00:00")
Date2<-as.Date("2015-04-10 00:00:00")
Hourly<-homeAnew[homeAnew$date>=Date1 & homeAnew$date<=Date2,]
to get what I was after, but I was wondering if someone could explain to me why subset would behave like that?
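A hedged explanation, since this cannot be verified without the original data: when a POSIXct column is compared against a bare character string, one side has to be converted, and that conversion uses a timezone; if the timezone attached to homeAnew$date differs from the one used for the conversion (here apparently by 12 hours), the boundary lands in the wrong place. Building the boundaries as POSIXct with an explicit tz avoids the ambiguity. A sketch with made-up stand-in data:

```r
# Stand-in for homeAnew: 48 hourly readings with an explicit timezone
homeAnew <- data.frame(
  date  = seq(as.POSIXct("2014-04-09 12:00:00", tz = "UTC"),
              by = "hour", length.out = 48),
  total = runif(48)
)

# Compare POSIXct against POSIXct built with the same tz, not against strings
start <- as.POSIXct("2014-04-10 00:00:00", tz = "UTC")
end   <- as.POSIXct("2014-04-11 00:00:00", tz = "UTC")
Hourly <- subset(homeAnew, date >= start & date <= end)
range(Hourly$date)
```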
I have a dataframe df with a certain number of columns. One of them, ts, is timestamps:
1462147403122 1462147412990 1462147388224 1462147415651 1462147397069 1462147392497
...
1463529545634 1463529558639 1463529556798 1463529558788 1463529564627 1463529557370.
I have also at my disposal the corresponding datetime in the datetime column:
"2016-05-02 02:03:23 CEST" "2016-05-02 02:03:32 CEST" "2016-05-02 02:03:08 CEST" "2016-05-02 02:03:35 CEST" "2016-05-02 02:03:17 CEST" "2016-05-02 02:03:12 CEST"
...
"2016-05-18 01:59:05 CEST" "2016-05-18 01:59:18 CEST" "2016-05-18 01:59:16 CEST" "2016-05-18 01:59:18 CEST" "2016-05-18 01:59:24 CEST" "2016-05-18 01:59:17 CEST"
As you can see, my dataframe contains data across several days. Let's say there are 3. I would like to add a column containing the number 1, 2, or 3: 1 if the row belongs to the first day, 2 for the second day, etc.
Thank you very much in advance,
Clement
One way to do this is to keep track of total days elapsed each time the date changes, as demonstrated below.
# Fake data
library(lubridate)  # for tz() and tz<-
dat = data.frame(datetime = c(seq(as.POSIXct("2016-05-02 01:03:11"),
                                  as.POSIXct("2016-05-05 01:03:11"), length.out=6),
                              seq(as.POSIXct("2016-05-09 01:09:11"),
                                  as.POSIXct("2016-05-16 02:03:11"), length.out=4)))
tz(dat$datetime) = "UTC"
Note, if your datetime column is not already in a datetime format, convert it to one using as.POSIXct.
Now, create a new column with the day number, counting the first day in the sequence as day 1.
dat$day = c(1, cumsum(as.numeric(diff(as.Date(dat$datetime, tz="UTC")))) + 1)
dat
datetime day
1 2016-05-02 01:03:11 1
2 2016-05-02 15:27:11 1
3 2016-05-03 05:51:11 2
4 2016-05-03 20:15:11 2
5 2016-05-04 10:39:11 3
6 2016-05-05 01:03:11 4
7 2016-05-09 01:09:11 8
8 2016-05-11 09:27:11 10
9 2016-05-13 17:45:11 12
10 2016-05-16 02:03:11 15
I specified the timezone in the code above to avoid getting tripped up by potential silent shifts between my local timezone and UTC. For example, note the silent shift from my default local time zone ("America/Los_Angeles") to UTC when converting a POSIXct datetime to a date:
# Fake data
datetime = seq(as.POSIXct("2016-05-02 01:03:11"), as.POSIXct("2016-05-05 01:03:11"), length.out=6)
tz(datetime)
[1] ""
date = as.Date(datetime)
tz(date)
[1] "UTC"
data.frame(datetime, date)
datetime date
1 2016-05-02 01:03:11 2016-05-02
2 2016-05-02 15:27:11 2016-05-02
3 2016-05-03 05:51:11 2016-05-03
4 2016-05-03 20:15:11 2016-05-04 # Note day is different due to timezone shift
5 2016-05-04 10:39:11 2016-05-04
6 2016-05-05 01:03:11 2016-05-05
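If instead you want consecutive day labels with no gaps (the elapsed-day approach above jumps from 4 to 8 across the break in the data), one alternative sketch is match() against the unique dates, shown here on the first block of the fake data:

```r
dat <- data.frame(datetime = seq(as.POSIXct("2016-05-02 01:03:11", tz = "UTC"),
                                 as.POSIXct("2016-05-05 01:03:11", tz = "UTC"),
                                 length.out = 6))
# Label each row by which distinct calendar day it falls on: 1, 2, 3, ...
d <- as.Date(dat$datetime, tz = "UTC")
dat$day <- match(d, unique(d))
dat$day
# 1 1 2 2 3 4
```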
I am trying to subtract one hour to date/times within a POSIXct column that are earlier than or equal to a time stated in a different comparison dataframe for that particular ID.
For example:
#create sample data
Time<-as.POSIXct(c("2015-10-02 08:00:00","2015-11-02 11:00:00","2015-10-11 10:00:00","2015-11-11 09:00:00","2015-10-24 08:00:00","2015-10-27 08:00:00"), format = "%Y-%m-%d %H:%M:%S")
ID<-c(01,01,02,02,03,03)
data<-data.frame(Time,ID)
Which produces this:
Time ID
1 2015-10-02 08:00:00 1
2 2015-11-02 11:00:00 1
3 2015-10-11 10:00:00 2
4 2015-11-11 09:00:00 2
5 2015-10-24 08:00:00 3
6 2015-10-27 08:00:00 3
I then have another dataframe with a key date and time for each ID to compare against. The Time in data should be compared against Comparison in ComparisonData for the particular ID it is associated with. If the Time value in data is earlier than or equal to the comparison value one hour should be subtracted from the value in data:
#create sample comparison data
Comparison<-as.POSIXct(c("2015-10-29 08:00:00","2015-11-02 08:00:00","2015-10-26 08:30:00"), format = "%Y-%m-%d %H:%M:%S")
ID<-c(01,02,03)
ComparisonData<-data.frame(Comparison,ID)
This should look like this:
Comparison ID
1 2015-10-29 08:00:00 1
2 2015-11-02 08:00:00 2
3 2015-10-26 08:30:00 3
In summary, the code should check all times of a certain ID to see if any are earlier than or equal to the value specified in ComparisonData and if they are, subtract one hour. This should give this data frame as an output:
Time ID
1 2015-10-02 07:00:00 1
2 2015-11-02 11:00:00 1
3 2015-10-11 09:00:00 2
4 2015-11-11 09:00:00 2
5 2015-10-24 07:00:00 3
6 2015-10-27 08:00:00 3
I have looked at similar solutions such as this, but I cannot work out how to compare each time against the comparison value for its particular ID.
I think ddply seems quite a promising option but I'm not sure how to use it for this particular problem.
Here's a quick and efficient solution using data.table. First we join the two data sets by ID, and then just modify the Time values that are earlier than or equal to Comparison:
library(data.table) # v1.9.6+
setDT(data)[ComparisonData, end := i.Comparison, on = "ID"]
data[Time <= end, Time := Time - 3600L][, end := NULL]
data
# Time ID
# 1: 2015-10-02 07:00:00 1
# 2: 2015-11-02 11:00:00 1
# 3: 2015-10-11 09:00:00 2
# 4: 2015-11-11 09:00:00 2
# 5: 2015-10-24 07:00:00 3
# 6: 2015-10-27 08:00:00 3
Alternatively, we could do this in one step while joining, using ifelse (not sure how efficient this is, though):
setDT(data)[ComparisonData,
Time := ifelse(Time <= i.Comparison,
Time - 3600L, Time),
on = "ID"]
data
# Time ID
# 1: 2015-10-02 07:00:00 1
# 2: 2015-11-02 11:00:00 1
# 3: 2015-10-11 09:00:00 2
# 4: 2015-11-11 09:00:00 2
# 5: 2015-10-24 07:00:00 3
# 6: 2015-10-27 08:00:00 3
I am sure there is a better solution than this; however, I think this works.
for(i in 1:nrow(data)) {
  # ComparisonData[data$ID[i], 1] assumes ComparisonData's rows are ordered by ID
  if(data$Time[i] <= ComparisonData[data$ID[i], 1]){
    data$Time[i] <- data$Time[i] - 3600
  }
}
# Time ID
#1 2015-10-02 07:00:00 1
#2 2015-11-02 11:00:00 1
#3 2015-10-11 09:00:00 2
#4 2015-11-11 09:00:00 2
#5 2015-10-24 07:00:00 3
#6 2015-10-27 08:00:00 3
This iterates through every row in data. ComparisonData[data$ID[i], 1] looks up the comparison time for the corresponding ID. If the Time value is earlier than or equal to that comparison time (the question asks for "earlier than or equal to", hence <= rather than <), it is reduced by one hour.
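The loop can also be vectorised in base R with match(), which looks up each row's comparison time by ID rather than relying on row order. A sketch rebuilt from the question's sample data:

```r
# Sample data from the question
Time <- as.POSIXct(c("2015-10-02 08:00:00","2015-11-02 11:00:00",
                     "2015-10-11 10:00:00","2015-11-11 09:00:00",
                     "2015-10-24 08:00:00","2015-10-27 08:00:00"))
data <- data.frame(Time, ID = c(1, 1, 2, 2, 3, 3))
Comparison <- as.POSIXct(c("2015-10-29 08:00:00","2015-11-02 08:00:00",
                           "2015-10-26 08:30:00"))
ComparisonData <- data.frame(Comparison, ID = c(1, 2, 3))

# Look up each row's comparison time by ID, then subtract an hour
# from every Time that is earlier than or equal to it
cmp   <- ComparisonData$Comparison[match(data$ID, ComparisonData$ID)]
early <- data$Time <= cmp
data$Time[early] <- data$Time[early] - 3600
data
```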