Converting 10 minute data to hourly average using if condition in R - r

Similar questions may have been asked before, but I'm new to R and haven't been able to apply the other methods. I have one month of data at 10-minute intervals; an example is below. The first column is the day and the second is the hour.
01 00 10 2,8
01 00 20 2,4
01 00 30 2,4
01 00 40 2,1
01 00 50 2,3
01 01 00 1,9
01 01 10 2
I tried to write code that calculates the hourly average whenever the first column (day) and the second column (hour) are equal, because some values are missing. I tried the code below, but it does not work.
for(i in 1:4314) {
if(mydata1[i,1] == mydata1[i+1,1] && (mydata1[i,2]= mydata1[i+1,2])){
while(mydata1[i,2] != mydata1[i+1,2]){sum(mydata1[i,4])}}
else {
print(mean(sum(mydata1[i,4])))
}
}
Thanks.

This is very easy with the dplyr package.
Let's give your data some names:
names(mydata) = c("day", "hour", "minute", "value")
library(dplyr)
group_by(mydata, day, hour) %>%
summarize(hourly.mean = mean(value))
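One thing to watch for: the sample values use a comma as the decimal mark (2,8), so they may be read as text unless the import is told about it. Below is a minimal base R sketch of the same grouping idea, assuming the data can be read with read.table and dec = ","; the object names raw and hourly are only illustrative.

```r
# Sample rows from the question; the comma is a decimal mark.
raw <- c("01 00 10 2,8", "01 00 20 2,4", "01 00 30 2,4",
         "01 00 40 2,1", "01 00 50 2,3", "01 01 00 1,9", "01 01 10 2")

# dec = "," parses "2,8" as 2.8; character classes keep the leading zeros.
mydata <- read.table(text = raw, dec = ",",
                     colClasses = c("character", "character", "character", "numeric"),
                     col.names = c("day", "hour", "minute", "value"))

# One mean per day/hour combination. Missing 10-minute rows are simply
# absent, so each mean uses whatever observations exist for that hour.
hourly <- aggregate(value ~ day + hour, data = mydata, FUN = mean)
hourly
```

Because the grouping is by day and hour, no explicit if condition or loop is needed.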

Related

R_how can I use str_sub to split date and time

The following file names were used in a camera trap study. The S number represents the site, P is the plot within a site, C is the camera number within the plot, the first string of numbers is the YearMonthDay and the second string of numbers is the HourMinuteSecond.
file.names <- c( 'S123.P2.C10_20120621_213422.jpg',
'S10.P1.C1_20120622_050148.jpg',
'S187.P2.C2_20120702_023501.jpg')
file.names
Use a combination of str_sub() and str_split() to produce a data frame with columns corresponding to the site, plot, camera, year, month, days, hour, minute, and second for these three file names. So we want to produce code that will create the data frame:
Site  Plot  Camera  Year  Month  Day  Hour  Minute  Second
S123  P2    C10     2012  06     21   21    34      22
S10   P1    C1      2012  06     22   05    01      48
S187  P2    C2      2012  07     02   02    35      01
My code is below:
file.names %>%
str_sub(start = 1, end = -5) %>%
str_replace_all("_", ".") %>%
str_split(pattern = fixed("."), n = 5)
I have no idea how to split the date and time.
nms <- c("Site", "Plot", "Camera", "Year", "Month", "Day", "Hour", "Minute", "Second")
library(tidyverse)
data.frame(file.names) %>%
extract(file.names, nms,
'(\\w+)\\.(\\w+)\\.(\\w+)_(\\d{4})(\\d{2})(\\d{2})_(\\d{2})(\\d{2})(\\d{2})')
Site Plot Camera Year Month Day Hour Minute Second
1 S123 P2 C10 2012 06 21 21 34 22
2 S10 P1 C1 2012 06 22 05 01 48
3 S187 P2 C2 2012 07 02 02 35 01
In base R:
type.convert(strcapture('(\\w+)\\.(\\w+)\\.(\\w+)_(\\d{4})(\\d{2})(\\d{2})_(\\d{2})(\\d{2})(\\d{2})',
file.names, as.list(setNames(character(length(nms)), nms))), as.is = TRUE)
Site Plot Camera Year Month Day Hour Minute Second
1 S123 P2 C10 2012 6 21 21 34 22
2 S10 P1 C1 2012 6 22 5 1 48
3 S187 P2 C2 2012 7 2 2 35 1
This is a specific case where your data is neatly formatted, with fields separated by either _ or ., and where the date and time fields have uniform character length. That means you can skip the elaborate regex and instead just split on those delimiters, drop the substrings into a data frame, then separate the date components and the time components by their positions. As is often the case with a tidyverse solution, you write a bit more code in exchange for it being easy to follow and scale.
library(magrittr)
strsplit(file.names, split = "[._]") %>%
purrr::map_dfr(setNames, c("site", "plot", "camera", "date", "time", "ext")) %>%
tidyr::separate(date, into = c("year", "month", "day"), sep = c(4, 6)) %>%
tidyr::separate(time, into = c("hour", "minute", "second"), sep = c(2, 4)) %>%
dplyr::select(-ext)
#> # A tibble: 3 × 9
#> site plot camera year month day hour minute second
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 S123 P2 C10 2012 06 21 21 34 22
#> 2 S10 P1 C1 2012 06 22 05 01 48
#> 3 S187 P2 C2 2012 07 02 02 35 01
The ext column was leftover from the initial string splitting, so you can drop it.
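The same split-then-cut-by-position idea can be sketched in base R as well, assuming every file name has exactly six delimited pieces and fixed-width date and time fields. The names pieces and parsed are only illustrative.

```r
file.names <- c("S123.P2.C10_20120621_213422.jpg",
                "S10.P1.C1_20120622_050148.jpg",
                "S187.P2.C2_20120702_023501.jpg")

# Split on "." or "_" and stack the pieces into a 3 x 6 character matrix:
# site, plot, camera, date, time, extension.
pieces <- do.call(rbind, strsplit(file.names, "[._]"))
date <- pieces[, 4]
time <- pieces[, 5]

# Cut the date and time strings by character position.
parsed <- data.frame(
  Site = pieces[, 1], Plot = pieces[, 2], Camera = pieces[, 3],
  Year = substr(date, 1, 4), Month = substr(date, 5, 6), Day = substr(date, 7, 8),
  Hour = substr(time, 1, 2), Minute = substr(time, 3, 4), Second = substr(time, 5, 6),
  stringsAsFactors = FALSE
)
parsed
```

All columns stay character, so leading zeros such as 06 are preserved.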
I don't know anything about str_sub or str_split other than the fact that they may be efforts to adapt the sub and strsplit functions to an alternate universe. I just learned base R and have not really seen the need to learn a new syntax. Here's a base solution:
as.Date( sub( "([^_]+[_])(\\d{8})(.+$)", "\\2", file.names) , format="%Y%m%d")
[1] "2012-06-21" "2012-06-22" "2012-07-02"
You can read the sub pattern as
1) beginning with the start of the string collect all the non-underscore characters into the first capture group
2) Then get the next 8 digits (if they exist) in a second capture group
3) and everything that follows will be in a third capture group
The substitution is to just return the contents of the second capture group. The conversion to Date values is straightforward. I'm assuming that should be clear from the code, but if not then see ?as.Date.
Here's the rest:
as.POSIXct( sub( "([^_]+[_])(\\d{8})[_](\\d{6})(.+$)", "\\2 \\3", file.names) ,
format="%Y%m%d %H%M%S")
[1] "2012-06-21 21:34:22 PDT" "2012-06-22 05:01:48 PDT" "2012-07-02 02:35:01 PDT"
If you want the components broken out, convert to POSIXlt and extract the elements of the resulting list.
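A minimal sketch of that last step, assuming a fixed time zone (tz = "UTC" here is my assumption, chosen only so the result does not depend on the local setting). POSIXlt stores the year as years since 1900 and the month as 0-11, hence the offsets. The names stamp, lt and parts are illustrative.

```r
file.names <- c("S123.P2.C10_20120621_213422.jpg",
                "S10.P1.C1_20120622_050148.jpg",
                "S187.P2.C2_20120702_023501.jpg")

# Keep only "YYYYMMDD HHMMSS" from each file name.
stamp <- sub("^[^_]+_(\\d{8})_(\\d{6}).*$", "\\1 \\2", file.names)

# POSIXlt is a list-like structure with one element per date-time component.
lt <- as.POSIXlt(stamp, format = "%Y%m%d %H%M%S", tz = "UTC")

parts <- data.frame(
  Year   = lt$year + 1900,  # stored as years since 1900
  Month  = lt$mon + 1,      # stored as 0-11
  Day    = lt$mday,
  Hour   = lt$hour,
  Minute = lt$min,
  Second = lt$sec
)
parts
```

Note that the components come back as numbers, so leading zeros are not kept.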

R: date format with just year and month

I have a dataframe with monthly data, one column containing the year and one column containing the month. I'd like to combine them into one column with Date format, going from this:
Year Month Data
2020 1 54
2020 2 58
2020 3 78
2020 4 59
To this:
Date Data
2020-01 54
2020-02 58
2020-03 78
2020-04 59
I think you can't represent a Date format in R without showing the day. If you want a character column, like in your example, you can do:
> x <- data.frame(Year = c(2020,2020,2020,2020), Month = c(1,2,3,4), Data = c(54,58,78,59))
> x$Month <- ifelse(nchar(x$Month) == 1, paste0(0, x$Month), x$Month) # pad single-digit months with a leading 0
> x$Date <- paste(x$Year, x$Month, sep = '-')
> x
Year Month Data Date
1 2020 01 54 2020-01
2 2020 02 58 2020-02
3 2020 03 78 2020-03
4 2020 04 59 2020-04
> class(x$Date)
[1] "character"
If you want a Date type column you will have to add:
x$Date <- paste0(x$Date, '-01')
x$Date <- as.Date(x$Date, format = '%Y-%m-%d')
x
class(x$Date)
Maybe the simplest way would be to arbitrarily assign a day (e.g. 01) to all your dates? That way, date intervals are preserved.
data<-data.frame(Year=c(2020,2020,2020,2020), Month=c(1,2,3,4), Data=c(54,58,78,59))
data$Date<-gsub(" ","",paste(data$Year,"-",data$Month,"-","01"))
data$Date<-as.Date(data$Date,format="%Y-%m-%d")
You can use sprintf -
sprintf('%d-%02d', data$Year, data$Month)
#[1] "2020-01" "2020-02" "2020-03" "2020-04"
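As a rough sketch of how the sprintf approach combines with the "add a day" idea from the other answers, the example below uses months 9 to 12 (my own choice, not from the question) to show that padding only touches single-digit months. The column names YearMonth and Date are illustrative.

```r
# Hypothetical months 9-12, chosen to cover both one- and two-digit cases.
x <- data.frame(Year = c(2020, 2020, 2020, 2020),
                Month = c(9, 10, 11, 12),
                Data = c(54, 58, 78, 59))

# %02d pads to width 2 with zeros, so 9 becomes "09" and 10 stays "10".
x$YearMonth <- sprintf("%d-%02d", x$Year, x$Month)

# A real Date needs a day; the first of the month is an arbitrary choice.
x$Date <- as.Date(paste0(x$YearMonth, "-01"))

# The day can be hidden again when printing or labelling.
format(x$Date, "%Y-%m")
```

Keeping a true Date column allows sorting and date arithmetic, while format() gives back the year-month display when needed.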

using multiple sep arguments in a character string

I have a data frame (dates) that looks like this:
year month start end
2000 06 01 10
2000 06 11 20
2000 06 21 30
I want to create a vector of character strings (one for each row in the data frame) so that each date follows this format:
year month start-end (first row would be 2000 06 01-10).
I've tried using a for loop with the paste function:
titles <- character()
for (i in 1:nrow(dates)){
titles[i] <- paste(dates[i, 1], dates[i,2], dates[i,3], dates[i,4])
}
> titles
[1] "2000 06 01 10" "2000 06 11 20" "2000 06 21 30"
but I can't figure out how to replace the last space with a dash. Is there a way to coerce the paste function into doing this or is there another function I can use?
Thanks for the help
Following your solution, if you just replace
paste(dates[i, 1], dates[i,2], dates[i,3], dates[i,4])
with
paste(dates[i, 1], dates[i,2], paste(dates[i,3], dates[i,4], sep = "-"))
that should work already. This just nests the "-" separating paste within the " " separating paste (default of paste is " ").
A more elegant one-liner would be to use apply:
apply(dates, 1, function(row)paste(row[1], row[2], paste(row[3], row[4], sep = "-")))
[1] "2000 06 01-10" "2000 06 11-20" "2000 06 21-30"
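Since paste and sprintf are vectorized, the loop and the apply call can both be avoided. A small sketch, assuming the columns are stored as whole numbers (if they are already character strings with leading zeros, nesting paste as above is enough):

```r
dates <- data.frame(year = c(2000, 2000, 2000),
                    month = c(6, 6, 6),
                    start = c(1, 11, 21),
                    end = c(10, 20, 30))

# One format string per row: spaces between year, month and start,
# a dash before end, and zero padding to two digits.
titles <- with(dates, sprintf("%d %02d %02d-%02d", year, month, start, end))
titles
```

The format string makes the separator between each pair of fields explicit, which is handy when they differ.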
Instead of a loop, you may want to consider:
df$titles <- with(df, paste(year, month, start, end, sep = "-"))
df
# year month start end titles
# 1 2000 06 01 10 2000-06-01-10
# 2 2000 06 11 20 2000-06-11-20
# 3 2000 06 21 30 2000-06-21-30
We can use unite from tidyr:
library(tidyverse)
df %>%
unite("new_date", year:end, sep = " ") %>%
mutate(new_date = sub("\\s(\\d+)$", "-\\1", new_date))
or with two unite's:
df %>%
unite("temp_date", year:start, sep = " ") %>%
unite("new_date", temp_date, end, sep = "-")
Output:
new_date
1 2000 6 1-10
2 2000 6 11-20
3 2000 6 21-30

Time series SparkR missing value

I'm working with SparkR on time series and I have a question.
After some operations I got something like this, where DayHour represents the day and the hour of the ID's Value.
DayHour ID Value
01 00 4704 10
01 01 4705 11
.
.
.
04 23 4705 12
The problem is that I have some gaps, e.g. 01 01 and 01 02 are missing:
DayHour ID Value
01 00 4704 13
01 03 4704 12
I have to fill the gaps in the whole dataset like this:
DayHour ID Value
01 00 4704 13
01 01 4704 0
01 02 4704 0
01 03 4704 12
For each ID, I have to fill each gap with the missing DayHour, the ID, and Value = 0.
A solution in either R or SparkR would be useful.
I represented your data in data frame df_r
>df_r <- data.frame(DayHour=c("01 00","01 01","01 02","01 03","01 06","01 07"),
ID = c(4704,4705,4705,4706,4706,4706),Value=c(10,11,12,13,14,15))
> df_r
DayHour ID Value
1 01 00 4704 10
2 01 01 4705 11
3 01 02 4705 12
4 01 03 4706 13
5 01 06 4706 14
6 01 07 4706 15
where the missing hours are 01 04 and 01 05
#Removing white spaces
>df_r$DayHour <- sub(" ", "", df_r$DayHour)
# create dummy all the 'dayhour' in sequence
x=c(00:23)
y=01:04
all_day_hour <- data.frame(Hour = rep(x,4), Day = rep(y,each=24))
all_day_hour$Hour <- sprintf("%02d", all_day_hour$Hour)
all_day_hour$Day <- sprintf("%02d", all_day_hour$Day)
all_day_hour_1 <- transform(all_day_hour,DayHour=paste0(Day,Hour))
all_day_hour_1 <- all_day_hour_1[c(3)]
# using a for loop to filter by each ID
>library(dplyr)
>df.new <- data.frame()
>factors=unique(df_r$ID)
>for(i in 1:length(factors))
{
df_r1 <- filter(df_r, ID == factors[i])
#Merge with the full sequence so that missing hours appear as NA rows
df_data1<- merge(df_r1, all_day_hour_1, by="DayHour", all=TRUE)
#The filled rows have no ID or Value yet: set the ID and use 0 for the value
df_data1$ID <- factors[i]
df_data1$Value[which(is.na(df_data1$Value))] <- 0
df.new <- rbind(df.new, df_data1)
}
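The same fill can be sketched without a loop in plain R by building every DayHour/ID combination first and merging the observed rows onto it. This assumes days 01 to 04 and hours 00 to 23, as in the answer above, and keeps the "01 00" format from the question; the names all_hours, grid and filled are illustrative. It is a local R sketch, not SparkR code.

```r
# Observed rows, taken from the examples in the question.
df_r <- data.frame(DayHour = c("01 00", "01 03", "01 01"),
                   ID = c(4704, 4704, 4705),
                   Value = c(13, 12, 11),
                   stringsAsFactors = FALSE)

# Every day/hour label for days 01-04 (assumed range), hours 00-23.
all_hours <- sprintf("%02d %02d", rep(1:4, each = 24), rep(0:23, times = 4))

# Full grid: one row per DayHour for each ID.
grid <- expand.grid(DayHour = all_hours, ID = unique(df_r$ID),
                    stringsAsFactors = FALSE)

# Left join the observations; combinations without data get NA, then 0.
filled <- merge(grid, df_r, by = c("DayHour", "ID"), all.x = TRUE)
filled$Value[is.na(filled$Value)] <- 0
filled <- filled[order(filled$ID, filled$DayHour), ]
head(filled, 4)
```

Because the ID is part of the grid, the filled rows carry their ID automatically.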

Seasonality by day of month

I want to check for seasonality in a time series by the day of the month.
The problem is that the months are not of equal length (or frequency) - there are months with 31, 28 & 30 days.
When declaring the ts object I can only specify a fixed frequency, so it won't be correct.
> x <- data.frame(d = as.Date("2013-01-01") + 1:365 , v = runif(365))
> tapply(as.numeric(format(x$d,"%d")) , format(x$d,"%m") , max)
01 02 03 04 05 06 07 08 09 10 11 12
31 28 31 30 31 30 31 31 30 31 30 31
How can I create a time series object in R that I can later decompose and check for seasonality?
Is it possible to create a pivot table and convert it into a ts?
