I have replicated this code for 4 different places and 4 different years.
df1 <- df %>% filter(Place == "Al" & year==2016)
rollingMean(df1, pollutant = "O", hours=8, new.name = "mean", data.thresh=75)
Sample of data:
Place O date_time year
Al 23 2016-01-01 01:00:00 2016
Al 15 2016-01-01 02:00:00 2016
Al 18 2016-01-01 03:00:00 2016
Al 18 2016-01-01 04:00:00 2016
Al 20 2016-01-01 05:00:00 2016
Al 21 2016-01-01 06:00:00 2016
Ar 23 2016-01-01 01:00:00 2016
Ar 15 2016-01-01 02:00:00 2016
Ar 18 2016-01-01 03:00:00 2016
Ar 18 2016-01-01 04:00:00 2016
Ar 20 2016-01-01 05:00:00 2016
Ar 21 2016-01-01 06:00:00 2016
Ma 23 2016-01-01 01:00:00 2016
Ma 15 2016-01-01 02:00:00 2016
Ma 18 2016-01-01 03:00:00 2016
Ma 18 2016-01-01 04:00:00 2016
Ma 20 2016-01-01 05:00:00 2016
Ma 21 2016-01-01 06:00:00 2016
Ss 23 2016-01-01 01:00:00 2016
Ss 15 2016-01-01 02:00:00 2016
Ss 18 2016-01-01 03:00:00 2016
Ss 18 2016-01-01 04:00:00 2016
Ss 20 2016-01-01 05:00:00 2016
Ss 21 2016-01-01 06:00:00 2016
How can I optimize my code? I think that I need to loop or map but it is my first time doing this.
You can split the dataset for every unique combination of Place and year, use map to run the rollingMean function on each group, and combine the results into one data frame. Note that the grouping column is year (lowercase), as in your data.
library(dplyr)
library(purrr)
library(openair) # provides rollingMean()
result <- df %>%
  group_split(Place, year) %>%
  map_df(~ rollingMean(.x, pollutant = "O", hours = 8,
                       new.name = "mean", data.thresh = 75))
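For readers without purrr, the same split-apply-combine pattern can be sketched in base R. This is only an illustration under assumptions: the data below are made up to resemble the question's sample, and add_mean() is a hypothetical stand-in for rollingMean() so that the example runs without openair.

```r
# Made-up data shaped like the question's sample (two places, two years)
df <- data.frame(
  Place = rep(c("Al", "Ar"), each = 6),
  year  = rep(c(2016, 2017), each = 3, times = 2),
  O     = c(23, 15, 18, 18, 20, 21, 23, 15, 18, 18, 20, 21)
)

# Hypothetical stand-in for rollingMean(): adds the group mean of O as a column
add_mean <- function(d) {
  d$mean <- mean(d$O)
  d
}

# Split into one data frame per Place/year combination, dropping empty combinations
pieces <- split(df, list(df$Place, df$year), drop = TRUE)

# Apply the function to every piece and stack the results back together
result <- do.call(rbind, lapply(pieces, add_mean))
rownames(result) <- NULL
result
```

Swapping add_mean for a wrapper around rollingMean() should give the same shape of result as the purrr version, with rows ordered by group rather than by their original position.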
I have the following data frame:
df <- data.frame(
  timecol = as.POSIXct(c(
    "2016-05-31 22:12:27 PDT", "2016-05-31 22:25:03 PDT",
    "2016-05-31 23:08:43 PDT", "2016-05-31 23:24:10 PDT",
    "2016-06-01 02:00:56 PDT", "2016-06-01 03:00:56 PDT",
    "2016-06-01 05:00:56 PDT", "2016-06-01 22:12:27 PDT",
    "2016-06-01 22:25:03 PDT", "2016-06-01 23:08:43 PDT",
    "2016-06-01 23:24:10 PDT", "2016-06-02 02:00:56 PDT",
    "2016-06-02 03:00:56 PDT", "2016-06-02 05:00:56 PDT"
  )),
  value = sample(1:100, 14)
)
> df
timecol value
1 2016-05-31 22:12:27 100
2 2016-05-31 22:25:03 86
3 2016-05-31 23:08:43 39
4 2016-05-31 23:24:10 91
5 2016-06-01 02:00:56 32
6 2016-06-01 03:00:56 93
7 2016-06-01 05:00:56 53
8 2016-06-01 22:12:27 54
9 2016-06-01 22:25:03 76
10 2016-06-01 23:08:43 19
11 2016-06-01 23:24:10 56
12 2016-06-02 02:00:56 20
13 2016-06-02 03:00:56 3
14 2016-06-02 05:00:56 66
I need to aggregate the value column based on a predefined time interval: from 7 pm on one day to 7 am the next day. I was thinking of something like this:
tm <- seq(as.POSIXct("2016-05-31 19:00:00 PDT"),as.POSIXct("2016-06-02 07:00:00 PDT"), by = "12 hours")
aggregate(df$value, list(day = cut(tm, "days")), sum)
but I can't figure out what's wrong.
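One possible way to get a sum per night, sketched under a few assumptions: the window is taken as 19:00 (inclusive) to 07:00 the next day (exclusive), times are treated as UTC to sidestep daylight-saving shifts, and the values are the ones printed above rather than a fresh sample(). Shifting every timestamp back by 19 hours makes each night start at midnight, so the shifted date labels the night and a shifted hour below 12 means the reading falls inside the window.

```r
df <- data.frame(
  timecol = as.POSIXct(c(
    "2016-05-31 22:12:27", "2016-05-31 22:25:03", "2016-05-31 23:08:43",
    "2016-05-31 23:24:10", "2016-06-01 02:00:56", "2016-06-01 03:00:56",
    "2016-06-01 05:00:56", "2016-06-01 22:12:27", "2016-06-01 22:25:03",
    "2016-06-01 23:08:43", "2016-06-01 23:24:10", "2016-06-02 02:00:56",
    "2016-06-02 03:00:56", "2016-06-02 05:00:56"
  ), tz = "UTC"),                      # assumption: UTC instead of local time
  value = c(100, 86, 39, 91, 32, 93, 53, 54, 76, 19, 56, 20, 3, 66)
)

# Move 19:00 to 00:00 so that each night begins at the start of a calendar day
shifted <- df$timecol - 19 * 3600

# A shifted hour of 0..11 corresponds to 19:00..06:59 in the original times
in_window <- as.integer(format(shifted, "%H", tz = "UTC")) < 12

# Label each reading with the date on which its night started
df$night <- as.Date(shifted, tz = "UTC")

# Sum the readings that fall inside the window, one row per night
night_sums <- aggregate(value ~ night, data = df[in_window, ], FUN = sum)
night_sums
```

With these values the nights starting on 2016-05-31 and 2016-06-01 sum to 494 and 294. The original attempt fails because cut() is applied to the sequence of breaks (tm) rather than to df$timecol, so the grouping vector does not have one entry per row of df.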
I have this df
x acc
1 1902-01-01 0.782887804
2 1903-01-01 -0.003144199
3 1904-01-01 0.100006276
4 1905-01-01 0.326173392
5 1906-01-01 1.285114692
6 1907-01-01 2.844399973
7 1920-01-01 -0.300232190
8 1921-01-01 1.464389342
9 1922-01-01 0.142638653
10 1923-01-01 -0.020162385
11 1924-01-01 0.361928571
12 1925-01-01 0.616325588
13 1926-01-01 -0.108206003
14 1927-01-01 -0.318441954
15 1928-01-01 -0.267884586
16 1929-01-01 -0.022473777
17 1930-01-01 -0.294452983
18 1931-01-01 -0.654927109
19 1932-01-01 -0.263508341
20 1933-01-01 0.622530992
21 1934-01-01 1.009666043
22 1935-01-01 0.675484421
23 1936-01-01 1.209162008
24 1937-01-01 1.655280986
25 1948-01-01 2.080021785
26 1949-01-01 0.854572563
27 1950-01-01 0.997540963
28 1951-01-01 1.000244163
29 1952-01-01 0.958322941
30 1953-01-01 0.816259474
31 1954-01-01 0.814488644
32 1955-01-01 1.233694537
33 1958-01-01 0.460120970
34 1959-01-01 0.344201474
35 1960-01-01 1.601430139
36 1961-01-01 0.387850967
37 1962-01-01 -0.385954401
38 1963-01-01 0.699355708
39 1964-01-01 0.084519926
40 1965-01-01 0.708964572
41 1966-01-01 1.456280443
42 1967-01-01 1.479412638
43 1968-01-01 1.199000726
44 1969-01-01 0.282942042
45 1970-01-01 -0.181724504
46 1971-01-01 0.012170186
47 1972-01-01 -0.095891043
48 1973-01-01 -0.075384446
49 1974-01-01 -0.156668145
50 1975-01-01 -0.303023258
51 1976-01-01 -0.516027310
52 1977-01-01 -0.826791524
53 1980-01-01 -0.947112221
54 1981-01-01 -1.634878300
55 1982-01-01 -1.955298323
56 1987-01-01 -1.854447550
57 1988-01-01 -1.458955443
58 1989-01-01 -1.256102245
59 1990-01-01 -0.864108585
60 1991-01-01 -1.293373024
61 1992-01-01 -1.049530431
62 1993-01-01 -1.002526230
63 1994-01-01 -0.868783614
64 1995-01-01 -1.081858981
65 1996-01-01 -1.302103374
66 1997-01-01 -1.288048194
67 1998-01-01 -1.455750340
68 1999-01-01 -1.015467069
69 2000-01-01 -0.682789640
70 2001-01-01 -0.811058004
71 2002-01-01 -0.972374057
72 2003-01-01 -0.536505225
73 2004-01-01 -0.518686263
74 2005-01-01 -0.976298621
75 2006-01-01 -0.946429713
I would like to plot the data like this:
with column x of df on the x axis and column acc on the y axis.
Is it possible to plot this with ggplot?
I tried with this code:
ggplot(df,aes(x=x,y=acc))+
geom_linerange(data =df , aes(colour = ifelse(acc <0, "blue", "red")),ymin=min(df),ymax=max(cdf))
but the result is this:
How can I do it?
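Is this what you want? I'm not sure. The helper columns have to be created before the plot call:
library(ggplot2)
library(lubridate)
df$x <- year(as.Date(df$x))
df$ystart <- 0
df$col <- ifelse(df$acc >= 0, "blue", "red")
ggplot(df, aes(x, acc)) +
  geom_segment(aes(xend = x, y = ystart, yend = acc, color = col)) +
  scale_color_identity() # use the names in col as the actual colours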
I'm trying to drop the year from a multi-year data frame and plot day-month on the x axis, with geom_smooth() calculated separately for each year.
My data initially looks like this:
> str(pmWaw)
'data.frame': 52488 obs. of 5 variables:
$ date : POSIXct, format: "2014-01-01 00:00:00" "2014-01-01 00:00:00" "2014-01-01 00:00:00" "2014-01-01 01:00:00" ...
$ stacja: Factor w/ 273 levels "DsWrocKorzA",..: 26 27 129 26 27 129 26 27 129 26 ...
$ pm25 : num 100 63 NA 69 36 NA 41 31 NA 37 ...
$ pm10 : num 122 68 79 77 38 90 43 32 39 38 ...
$ season: Ord.factor w/ 4 levels "spring (MAM)"<..: 4 4 4 4 4 4 4 4 4 4 ...
Using lubridate I added year and month as separate variables:
library(lubridate)
pmWaw$year<- year(pmWaw$date)
pmWaw$month<- month(pmWaw$date)
Next, using code found here on Stack Overflow, I calculated a month-and-day variable in %m-%d format:
pmWaw$month.day<-format(pmWaw$date, format="%m-%d")
#check new variable type:
> typeof(pmWaw$month.day)
[1] "character"
The data frame I end up working with is this:
> head(pmWaw)
date stacja pm25 pm10 season year month month.day
1 2014-01-01 00:00:00 MzWarNiepodKom 100 122 winter (DJF) 2014 1 01-01
2 2014-01-01 00:00:00 MzWarszUrsynow 63 68 winter (DJF) 2014 1 01-01
3 2014-01-01 00:00:00 MzWarTarKondra NA 79 winter (DJF) 2014 1 01-01
4 2014-01-01 01:00:00 MzWarNiepodKom 69 77 winter (DJF) 2014 1 01-01
5 2014-01-01 01:00:00 MzWarszUrsynow 36 38 winter (DJF) 2014 1 01-01
6 2014-01-01 01:00:00 MzWarTarKondra NA 90 winter (DJF) 2014 1 01-01
> tail(pmWaw)
date stacja pm25 pm10 season year month month.day
52483 2015-12-30 22:00:00 MzWarAlNiepo 36 47 winter (DJF) 2015 12 12-30
52484 2015-12-30 22:00:00 MzWarKondrat 26 29 winter (DJF) 2015 12 12-30
52485 2015-12-30 22:00:00 MzWarWokalna 36 44 winter (DJF) 2015 12 12-30
52486 2015-12-30 23:00:00 MzWarAlNiepo 39 59 winter (DJF) 2015 12 12-30
52487 2015-12-30 23:00:00 MzWarKondrat 36 39 winter (DJF) 2015 12 12-30
52488 2015-12-30 23:00:00 MzWarWokalna 40 49 winter (DJF) 2015 12 12-30
Passing new values to ggplot gives me three issues:
ggplot(pmWaw, aes(x=month.day, y=pm25)) +
geom_jitter(alpha=0.5) +
geom_smooth()
First (minor) problem: month.day is a character variable, so ggplot won't recognize its original time-series nature. I can probably overcome this by manually setting the scale labels to months.
Second (major) problem: geom_smooth() is not calculated at all, and I can't figure out why.
Third (major) problem: I can't work out how to add year as a grouping variable to get two separate smoothed lines (mostly because geom_smooth is not there at all).
My guess is that the source of all these problems lies in how I extracted the month-and-day format and ended up with a character variable.
Could anyone help me fix it? Any hints appreciated.
Looks like I found a solution to work with:
ggplot(pmWaw, aes(x=month.day, y=pm25, group = year)) +
geom_point(alpha=0.5) +
geom_smooth(aes(color=factor(year)))
This solves issues 2 and 3: geom_smooth() is drawn and I can distinguish the years. It is probably not the best solution, but it might be a good place to start.
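For the first issue (the character x axis), one option is to re-stamp every observation onto a single dummy year so that x becomes a real Date and ggplot can use a date scale. The sketch below is an illustration under assumptions: `pm` is a small made-up data frame standing in for `pmWaw`, and the dummy year 2000 is an arbitrary choice (a leap year, so 29 February stays valid).

```r
library(ggplot2)

# Made-up daily data standing in for pmWaw (random pm25 values)
set.seed(1)
dates <- seq(as.POSIXct("2014-01-01", tz = "UTC"),
             as.POSIXct("2015-12-30", tz = "UTC"), by = "day")
pm <- data.frame(date = dates, pm25 = runif(length(dates), 10, 100))

# Keep the real year for grouping
pm$year <- as.integer(format(pm$date, "%Y"))

# Put month and day onto the dummy year 2000 so every year shares one date axis
pm$common_date <- as.Date(paste0("2000-", format(pm$date, "%m-%d")))

p <- ggplot(pm, aes(common_date, pm25, colour = factor(year))) +
  geom_point(alpha = 0.5) +
  geom_smooth() +                                   # one smooth per year
  scale_x_date(date_breaks = "1 month", date_labels = "%b")

# print(p) draws the figure
```

Because common_date is continuous, the colour mapping alone defines the groups, so each year gets its own smoothed line and the axis is labelled by month without a visible year.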
Following up my previous question about aggregating hourly data into daily data, I want to continue with (a) monthly aggregate and (b) merging the monthly aggregate into the original dataframe.
My original dataframe looks like this:
Lines <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"
The daily aggregates have been answered in my previous question, and then I can find my way to produce the monthly aggregates from there, to something like this:
Lines <- "Date,Month,OutdoorAVE
01/01/2000,Jan,31.33
02/01/2000,Feb,31.67
12/01/2000,Dec,31.33"
Where the OutdoorAVE is the monthly average of the daily minimum and maximum outdoor temperature. What I want to have in the end is something like this:
Lines <- "Date,Outdoor,Indoor,Month,OutdoorAVE
01/01/2000 01:00,30,25,Jan,31.33
01/01/2000 02:00,31,26,Jan,31.33
01/01/2000 03:00,33,24,Jan,31.33
02/01/2000 01:00,29,25,Feb,31.67
02/01/2000 02:00,27,26,Feb,31.67
02/01/2000 03:00,39,24,Feb,31.67
12/01/2000 02:00,27,26,Dec,31.33
12/01/2000 03:00,39,24,Dec,31.33
12/31/2000 23:00,28,25,Dec,31.33"
I do not know enough R to do that. Any help is greatly appreciated.
Try ave, and use e.g. POSIXlt to extract the month:
zz <- textConnection(Lines)
Data <- read.table(zz,header=T,sep=",",stringsAsFactors=F)
close(zz)
Data$Month <- strftime(
as.POSIXlt(Data$Date,format="%m/%d/%Y %H:%M"),
format='%b')
Data$outdoor_ave <- ave(Data$Outdoor,Data$Month,FUN=mean)
Gives :
> Data
Date Outdoor Indoor Month outdoor_ave
1 01/01/2000 01:00 30 25 Jan 31.33333
2 01/01/2000 02:00 31 26 Jan 31.33333
3 01/01/2000 03:00 33 24 Jan 31.33333
4 02/01/2000 01:00 29 25 Feb 31.66667
5 02/01/2000 02:00 27 26 Feb 31.66667
6 02/01/2000 03:00 39 24 Feb 31.66667
7 12/01/2000 02:00 27 26 Dec 31.33333
8 12/01/2000 03:00 39 24 Dec 31.33333
9 12/31/2000 23:00 28 25 Dec 31.33333
Edit: Then just calculate Month in Data as shown above and use merge:
zz <- textConnection(Lines2) # Lines2 is the aggregated data
Data2 <- read.table(zz,header=T,sep=",",stringsAsFactors=F)
close(zz)
> merge(Data,Data2[-1],all=T)
Month Date Outdoor Indoor OutdoorAVE
1 Dec 12/01/2000 02:00 27 26 31.33
2 Dec 12/01/2000 03:00 39 24 31.33
3 Dec 12/31/2000 23:00 28 25 31.33
4 Feb 02/01/2000 01:00 29 25 31.67
5 Feb 02/01/2000 02:00 27 26 31.67
6 Feb 02/01/2000 03:00 39 24 31.67
7 Jan 01/01/2000 01:00 30 25 31.33
8 Jan 01/01/2000 02:00 31 26 31.33
9 Jan 01/01/2000 03:00 33 24 31.33
This is tangential to your question, but you may want to use RSQLite and separate tables for the various aggregate values instead, and join the tables with simple SQL commands. If you use many kinds of aggregations, your data frame can easily get large and ugly.
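As a rough sketch of that suggestion, assuming the DBI and RSQLite packages are installed and using an in-memory database: the table name `hourly` and the subquery alias `m` are made up for illustration, Month is taken as the two-digit month from the date string, and the monthly value is the plain mean of Outdoor, as in the ave() answer.

```r
library(DBI)

# The hourly data from the question, typed in directly
hourly <- data.frame(
  Date = c("01/01/2000 01:00", "01/01/2000 02:00", "01/01/2000 03:00",
           "02/01/2000 01:00", "02/01/2000 02:00", "02/01/2000 03:00",
           "12/01/2000 02:00", "12/01/2000 03:00", "12/31/2000 23:00"),
  Outdoor = c(30, 31, 33, 29, 27, 39, 27, 39, 28),
  Indoor  = c(25, 26, 24, 25, 26, 24, 26, 24, 25)
)
hourly$Month <- substr(hourly$Date, 1, 2)  # "01", "02", "12"

# In-memory SQLite database holding the hourly table
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "hourly", hourly)

# Compute the monthly mean in a subquery and join it back onto every hourly row
joined <- dbGetQuery(con, "
  SELECT h.Date, h.Outdoor, h.Indoor, h.Month, m.OutdoorAVE
  FROM hourly AS h
  JOIN (SELECT Month, AVG(Outdoor) AS OutdoorAVE
        FROM hourly
        GROUP BY Month) AS m
    ON h.Month = m.Month
  ORDER BY h.Month, h.Date
")
dbDisconnect(con)
joined
```

Keeping each aggregate in its own query or table means the hourly table itself never grows extra columns; the join is done only when a combined view is needed.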
Here's a zoo/xts solution. Note that Month is numeric here because you can't mix types in zoo/xts objects.
require(xts) # loads zoo too
Lines1 <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"
con <- textConnection(Lines1)
z <- read.zoo(con, header=TRUE, sep=",",
format="%m/%d/%Y %H:%M", FUN=as.POSIXct)
close(con)
zz <- merge(z, Month=.indexmon(z),
OutdoorAVE=ave(z[,1], .indexmon(z), FUN=mean))
zz
# Outdoor Indoor Month OutdoorAVE
# 2000-01-01 01:00:00 30 25 0 31.33333
# 2000-01-01 02:00:00 31 26 0 31.33333
# 2000-01-01 03:00:00 33 24 0 31.33333
# 2000-02-01 01:00:00 29 25 1 31.66667
# 2000-02-01 02:00:00 27 26 1 31.66667
# 2000-02-01 03:00:00 39 24 1 31.66667
# 2000-12-01 02:00:00 27 26 11 31.33333
# 2000-12-01 03:00:00 39 24 11 31.33333
# 2000-12-31 23:00:00 28 25 11 31.33333
Update: How to get the above result using two different data sets.
Lines2 <- "Date,Month,OutdoorAVE
01/01/2000,Jan,31.33
02/01/2000,Feb,31.67
12/01/2000,Dec,31.33"
con <- textConnection(Lines2)
z2 <- read.zoo(con, header=TRUE, sep=",", format="%m/%d/%Y",
FUN=as.POSIXct, colClasses=c("character","NULL","numeric"))
close(con)
zz2 <- na.locf(merge(z, Month=.indexmon(z), OutdoorAVE=z2))[index(z)] # z is the hourly series read from Lines1 above
# same output as zz (above)