Eight-hour averages in R studio - r

I want to compute eight-hour averages (over the Time column) of the maximum values of the O column. The Time column holds each hour of one day. The dataset contains the following columns: Place (several places), date (365 days), Time (24 hours per day), O (ozone values), month and year.
The sample dataset is:
data%>%select("Place","date","Time", "O",
"month","year")
Place Date Time O month year
Al 2016-01-01 1 23 enero 2016
Al 2016-01-01 2 15 enero 2016
Al 2016-01-01 3 18 enero 2016
Al 2016-01-01 4 18 enero 2016
Al 2016-01-01 5 20 enero 2016
Al 2016-01-01 6 21 enero 2016
Al 2016-01-01 7 24 enero 2016
Al 2016-01-01 8 24 enero 2016
Al 2016-01-01 9 22 enero 2016
Al 2016-01-01 10 24 enero 2016
Al 2016-01-01 11 33 enero 2016
Al 2016-01-01 12 53 enero 2016
Al 2016-01-01 13 54 enero 2016
Al 2016-01-01 14 54 enero 2016
Al 2016-01-01 15 58 enero 2016
Al 2016-01-01 16 60 enero 2016
Al 2016-01-01 17 57 enero 2016
Al 2016-01-01 18 55 enero 2016
Al 2016-01-01 19 50 enero 2016
Al 2016-01-01 20 51 enero 2016
Al 2016-01-01 21 51 enero 2016
Al 2016-01-01 22 55 enero 2016
Al 2016-01-01 23 46 enero 2016
Al 2016-01-01 24 57 enero 2016
I hope to get the maximum values of the O column over the period of one day; i.e., for 2016-01-01 I want to get the eight maximum values of O and average them.
But I don't know how to do it.

You can use zoo::rollapply for this.
Default usage:
zoo::rollapply(dat$O, 8, max, fill = NA)
# [1] NA NA NA 24 24 24 33 53 54 54 58 60 60 60 60 60 60 60 60 57 NA NA NA NA
This is "centering" the window, where the first 24 is the max of positions 1-8. I already added fill=NA, since we need the output to be the same size as the input vector.
You can change the alignment, so that the max is of the value and 7 to its left or its right. For instance,
zoo::rollapply(dat$O, 8, max, fill = NA, align = "left")
# [1] 24 24 24 33 53 54 54 58 60 60 60 60 60 60 60 60 57 NA NA NA NA NA NA NA
zoo::rollapply(dat$O, 8, max, fill = NA, align = "right")
# [1] NA NA NA NA NA NA NA 24 24 24 33 53 54 54 58 60 60 60 60 60 60 60 60 57
I'll assume that we need the latter (align="right").
Finally, we can do a partial max, where the second value is the max of indices 1-2; third value is the max of indices 1-3; etc. In that case,
zoo::rollapply(dat$O, 8, max, align = "right", partial = TRUE)
# [1] 23 23 23 23 23 23 24 24 24 24 33 53 54 54 58 60 60 60 60 60 60 60 60 57
(Notice we don't technically need fill=NA anymore.)
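The question asks for averages rather than maxima, and its wording is ambiguous, so here is a hedged sketch of two possible readings using the sample day. The data frame `dat`, the column `O8`, and the result names `daily_max8` and `top8` are assumptions made for illustration; neither reading is confirmed by the asker.

```r
library(zoo)

# Hypothetical data frame mirroring the sample: one place, one day, 24 hourly values
dat <- data.frame(
  Place = "Al",
  date  = as.Date("2016-01-01"),
  Time  = 1:24,
  O     = c(23, 15, 18, 18, 20, 21, 24, 24, 22, 24, 33, 53,
            54, 54, 58, 60, 57, 55, 50, 51, 51, 55, 46, 57)
)

# Reading 1: right-aligned rolling 8-hour mean within each place and day,
# then the largest of those means per day
dat$O8 <- ave(dat$O, dat$Place, dat$date,
              FUN = function(x) rollapply(x, 8, mean, align = "right", fill = NA))
daily_max8 <- aggregate(O8 ~ Place + date, data = dat, FUN = max)

# Reading 2: mean of the eight largest hourly values per place and day
top8 <- aggregate(O ~ Place + date, data = dat,
                  FUN = function(x) mean(head(sort(x, decreasing = TRUE), 8)))

daily_max8
top8
```

With the formula interface, aggregate() drops the rows where O8 is NA, so the first seven hours of each day do not affect the daily maximum.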

R: How plot negative and positive anomaly (for this data) with ggplot? [duplicate]

I have this df
x acc
1 1902-01-01 0.782887804
2 1903-01-01 -0.003144199
3 1904-01-01 0.100006276
4 1905-01-01 0.326173392
5 1906-01-01 1.285114692
6 1907-01-01 2.844399973
7 1920-01-01 -0.300232190
8 1921-01-01 1.464389342
9 1922-01-01 0.142638653
10 1923-01-01 -0.020162385
11 1924-01-01 0.361928571
12 1925-01-01 0.616325588
13 1926-01-01 -0.108206003
14 1927-01-01 -0.318441954
15 1928-01-01 -0.267884586
16 1929-01-01 -0.022473777
17 1930-01-01 -0.294452983
18 1931-01-01 -0.654927109
19 1932-01-01 -0.263508341
20 1933-01-01 0.622530992
21 1934-01-01 1.009666043
22 1935-01-01 0.675484421
23 1936-01-01 1.209162008
24 1937-01-01 1.655280986
25 1948-01-01 2.080021785
26 1949-01-01 0.854572563
27 1950-01-01 0.997540963
28 1951-01-01 1.000244163
29 1952-01-01 0.958322941
30 1953-01-01 0.816259474
31 1954-01-01 0.814488644
32 1955-01-01 1.233694537
33 1958-01-01 0.460120970
34 1959-01-01 0.344201474
35 1960-01-01 1.601430139
36 1961-01-01 0.387850967
37 1962-01-01 -0.385954401
38 1963-01-01 0.699355708
39 1964-01-01 0.084519926
40 1965-01-01 0.708964572
41 1966-01-01 1.456280443
42 1967-01-01 1.479412638
43 1968-01-01 1.199000726
44 1969-01-01 0.282942042
45 1970-01-01 -0.181724504
46 1971-01-01 0.012170186
47 1972-01-01 -0.095891043
48 1973-01-01 -0.075384446
49 1974-01-01 -0.156668145
50 1975-01-01 -0.303023258
51 1976-01-01 -0.516027310
52 1977-01-01 -0.826791524
53 1980-01-01 -0.947112221
54 1981-01-01 -1.634878300
55 1982-01-01 -1.955298323
56 1987-01-01 -1.854447550
57 1988-01-01 -1.458955443
58 1989-01-01 -1.256102245
59 1990-01-01 -0.864108585
60 1991-01-01 -1.293373024
61 1992-01-01 -1.049530431
62 1993-01-01 -1.002526230
63 1994-01-01 -0.868783614
64 1995-01-01 -1.081858981
65 1996-01-01 -1.302103374
66 1997-01-01 -1.288048194
67 1998-01-01 -1.455750340
68 1999-01-01 -1.015467069
69 2000-01-01 -0.682789640
70 2001-01-01 -0.811058004
71 2002-01-01 -0.972374057
72 2003-01-01 -0.536505225
73 2004-01-01 -0.518686263
74 2005-01-01 -0.976298621
75 2006-01-01 -0.946429713
I would like to plot the data in this style (image not shown), with column x of df on the x axis and column acc on the y axis.
Is it possible to plot this with ggplot?
I tried this code:
ggplot(df,aes(x=x,y=acc))+
geom_linerange(data =df , aes(colour = ifelse(acc <0, "blue", "red")),ymin=min(df),ymax=max(cdf))
but the result is not what I expected (image not shown).
Please, how can I do it?
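Is this what you want? I'm not sure.
library(ggplot2)
library(lubridate)
# Build the helper columns first; scale_color_identity() uses the color names stored in col as-is
df$x <- year(as.Date(df$x))
df$ystart <- 0
df$col <- ifelse(df$acc >= 0, "blue", "red")
ggplot(df, aes(x, acc)) +
  geom_segment(aes(x = x, y = ystart, xend = x, yend = acc, color = col)) + scale_color_identity()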
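As an alternative sketch, bars filled by sign give a similar anomaly plot without a helper column for the segment start. This is an illustration under assumptions, not the answerer's method: it uses only the first seven rows of the question's data with the year already extracted, and the column name `positive` is hypothetical.

```r
library(ggplot2)

# First rows of the question's data, used here only as a small illustration
df <- data.frame(
  x   = c(1902, 1903, 1904, 1905, 1906, 1907, 1920),
  acc = c(0.782887804, -0.003144199, 0.100006276, 0.326173392,
          1.285114692, 2.844399973, -0.300232190)
)

# Logical flag for the sign of the anomaly (hypothetical column name)
df$positive <- df$acc >= 0

p <- ggplot(df, aes(x = x, y = acc, fill = positive)) +
  geom_col() +                       # one bar per year, drawn from 0 to acc
  geom_hline(yintercept = 0) +       # reference line at zero
  scale_fill_manual(values = c(`TRUE` = "blue", `FALSE` = "red"), guide = "none")
print(p)
```

The color assignment follows the answer above (non-negative in blue); swap the two values if the opposite convention is wanted.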

creating unique sequence for October 15 to April 30th following year- R

Basically, I'm looking at snowpack data. I want to assign a unique value to each date (column "snowday") over the period October 15 to May 15 of the following year (the winter season, of course), about 215 days. Then I want to add a "snowmonth" column that corresponds to the sequential months of the seasonal data, as well as a "snow year" column that represents the year in which each seasonal record starts.
There are some missing dates, however. Instead of finding those dates and inserting NAs into the rows, I've opted to skip that step and go the sequential route, which can then be plotted with respect to "snowmonth".
Basically, I just need to get the "snowday" sequence of about 1:215 (+1 for leap years) down in a column, and the rest I can do myself. It looks like this:
month day year depth date yearday snowday snowmonth
12 26 1955 27 1955-12-26 360 NA NA
12 27 1955 24 1955-12-27 361 NA NA
12 28 1955 24 1955-12-28 362 NA NA
12 29 1955 24 1955-12-29 363 NA NA
12 30 1955 26 1955-12-30 364 NA NA
12 31 1955 26 1955-12-31 365 NA NA
1 1 1956 25 1956-01-01 1 NA NA
1 2 1956 25 1956-01-02 2 NA NA
1 3 1956 26 1956-01-03 3 NA NA
man<-data.table()
man <-  read.delim('mansfieldstake.txt',header=TRUE, check.names=FALSE)
man[is.na(man)]<-0
man$date<-paste(man$yy, man$mm, man$dd,sep="-", collapse=NULL)
man$yearday<-NA #day of the year 1-365
colnames(man)<- c("month","day","year","depth", "date","yearday")
man$date<-as.Date(man$date)
man$yearday<-yday(man$date)
man$snowday<-NA
man$snowmonth<-NA
man[420:500,]
head(man)
output would look something like this:
month day year depth date yearday snowday snowmonth
12 26 1955 27 1955-12-26 360 73 3
12 27 1955 24 1955-12-27 361 74 3
12 28 1955 24 1955-12-28 362 75 3
12 29 1955 24 1955-12-29 363 76 3
12 30 1955 26 1955-12-30 364 77 3
12 31 1955 26 1955-12-31 365 78 3
1 1 1956 25 1956-01-01 1 79 4
1 2 1956 25 1956-01-02 2 80 4
1 3 1956 26 1956-01-03 3 81 4
I've thought about loops and all that, but it's inefficient... leap years kinda mess things up as well. This has become more challenging than I thought. Good first project though!
Just looking for a simple sequence here, dropping all non-snow months. Thanks to anybody who's got input!
If I understand correctly that snowday should be the number of days since the beginning of the season, all you need to make this column using data.table is:
library(data.table)
setDT(man)  # the := syntax below needs man to be a data.table
day_one <- as.Date("1955-10-01")  # set this to the first day of your season
man[, snowday := as.integer(date - day_one)]
If all you want is a sequence of unique values, then seq() is your best bet.
Then you can create the snowmonth using:
library(lubridate)
man[, snowmonth := floor(-time_length(interval(date, day_one), unit = "month"))]
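The code above measures everything from a single fixed start date, so it covers one season only. A minimal base R sketch for several seasons follows, assuming each season starts on October 15 (as stated in the question) and that snowmonth counts calendar months from October (which is what the expected output suggests). The function name `snow_season` and its arguments are hypothetical.

```r
# Hypothetical helper: derive snow year, snow day and snow month from a Date vector
snow_season <- function(date, start_month = 10L, start_day = 15L) {
  yr  <- as.integer(format(date, "%Y"))
  mon <- as.integer(format(date, "%m"))
  day <- as.integer(format(date, "%d"))

  # Dates before the season start belong to the season that began the previous year
  before_start <- mon < start_month | (mon == start_month & day < start_day)
  snowyear <- yr - before_start

  # First day of the season each date belongs to
  season_start <- as.Date(sprintf("%d-%02d-%02d", snowyear, start_month, start_day))

  data.frame(
    snowyear  = snowyear,
    snowday   = as.integer(date - season_start) + 1L,  # season start is day 1
    snowmonth = (mon - start_month) %% 12L + 1L        # October is month 1
  )
}

res <- snow_season(as.Date(c("1955-12-26", "1955-12-31", "1956-01-03", "1956-10-15")))
res
```

Because the day count is a date difference, leap years are handled without special cases. Dates outside the snow season still receive values here, so they would need to be filtered out separately, for example by month and day.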

insert new rows to the time series data, with date added automatically

I have a time-series data frame that looks like this:
TS.1
2015-09-01 361656.7
2015-09-02 370086.4
2015-09-03 346571.2
2015-09-04 316616.9
2015-09-05 342271.8
2015-09-06 361548.2
2015-09-07 342609.2
2015-09-08 281868.8
2015-09-09 297011.1
2015-09-10 295160.5
2015-09-11 287926.9
2015-09-12 323365.8
Now I want to add some new data points (rows) to the existing data frame, say:
320123.5
323521.7
How can I add the corresponding date to each row? The dates simply continue sequentially from the last row.
Is there a package that can do this automatically, so that all I have to do is insert the new data points?
Here's some play data:
df <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-31"), "days"), x = seq(31))
new.x <- c(32, 33)
This adds the extra observations along with the proper sequence of dates:
new.df <- data.frame(date=seq(max(df$date) + 1, max(df$date) + length(new.x), "days"), x=new.x)
Then just rbind them to get your expanded data frame:
rbind(df, new.df)
date x
1 2015-01-01 1
2 2015-01-02 2
3 2015-01-03 3
4 2015-01-04 4
5 2015-01-05 5
6 2015-01-06 6
7 2015-01-07 7
8 2015-01-08 8
9 2015-01-09 9
10 2015-01-10 10
11 2015-01-11 11
12 2015-01-12 12
13 2015-01-13 13
14 2015-01-14 14
15 2015-01-15 15
16 2015-01-16 16
17 2015-01-17 17
18 2015-01-18 18
19 2015-01-19 19
20 2015-01-20 20
21 2015-01-21 21
22 2015-01-22 22
23 2015-01-23 23
24 2015-01-24 24
25 2015-01-25 25
26 2015-01-26 26
27 2015-01-27 27
28 2015-01-28 28
29 2015-01-29 29
30 2015-01-30 30
31 2015-01-31 31
32 2015-02-01 32
33 2015-02-02 33
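To make this a one-step operation, the two steps above can be wrapped in a small function. This is only a sketch: the name `append_obs` is hypothetical, and it assumes daily data with columns named date and x, as in the play data.

```r
# Hypothetical helper: append new values, continuing the daily date sequence
append_obs <- function(df, new_x) {
  if (length(new_x) == 0) return(df)
  new_rows <- data.frame(
    # Start one day after the last existing date
    date = seq(max(df$date) + 1, by = "day", length.out = length(new_x)),
    x    = new_x
  )
  rbind(df, new_rows)
}

df <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-31"), "days"),
                 x = seq(31))
out <- append_obs(df, c(32, 33))
tail(out, 3)
```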

Creating a vector with multiple sequences based on number of IDs' repetitions

I've got a data frame with panel data: subjects' characteristics through time. I need to create a column with a sequence from 1 to the maximum number of years for every subject. For example, if subject 1 is in the data frame from 2000 to 2005, I need the following sequence: 1,2,3,4,5,6.
Below is a small fraction of my data. The last column (exp) is what I am trying to get. Additionally, if you look at the first subject (13), you'll see that in 2008 the value of qtty is zero. In this case I just need an NA or a code (0, 1, -9999); it doesn't matter which one.
Below the data is what I did to get that vector, but it didn't work.
Any help will be much appreciated.
subject season qtty exp
13 2000 29 1
13 2001 29 2
13 2002 29 3
13 2003 29 4
13 2004 29 5
13 2005 27 6
13 2006 27 7
13 2007 27 8
13 2008 0 NA
28 2000 18 1
28 2001 18 2
28 2002 18 3
28 2003 18 4
28 2004 18 5
28 2005 18 6
28 2006 18 7
28 2007 18 8
28 2008 18 9
28 2009 20 10
28 2010 20 11
28 2011 20 12
28 2012 20 13
35 2000 21 1
35 2001 21 2
35 2002 21 3
35 2003 21 4
35 2004 21 5
35 2005 21 6
35 2006 21 7
35 2007 21 8
35 2008 21 9
35 2009 14 10
35 2010 11 11
35 2011 11 12
35 2012 10 13
My code:
numbY<-aggregate(season ~ subject, data = toCountY,length)
colnames(numbY)<-c("subject","inFish")
toCountY$inFish<-numbY$inFish[match(toCountY$subject,numbY$subject)]
numbYbyFisher<-unique(numbY)
seqY<-aggregate(numbYbyFisher$inFish, by=list(numbYbyFisher$subject), function(x)seq(1,x,1))
I am using ddply (from the plyr package) and I distinguish two cases:
Either you generate a sequence along subject and replace it with NA where qtty is zero:
library(plyr)
ddply(dat, .(subject), transform, new.exp = ifelse(qtty == 0, NA, seq_along(subject)))
Or you generate a sequence over the rows where qtty is nonzero, with a gap (NA) where qtty is zero:
ddply(dat, .(subject), transform, new.exp = {
  hh <- seq_along(which(qtty != 0))
  if (length(which(qtty == 0)) > 0)
    hh <- append(hh, NA, which(qtty == 0) - 1)
  hh
})
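EDITED
# Assumes subject and qtty are available as vectors (e.g. via with() or attach())
ind <- qtty != 0
exp <- numeric(length(subject))
temp <- list()
for (i in seq_along(unique(subject[ind]))) {
  temp[[i]] <- seq(from = 1, to = table(subject[ind])[i])
}
exp[ind] <- unlist(temp)
This will provide what you need.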
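For comparison, the first case can also be sketched in base R with ave(), without extra packages. This assumes the rows are already ordered by season within each subject; the small `dat` below is a hypothetical excerpt of the question's data.

```r
# Hypothetical excerpt: the last three seasons of subject 13 and the first two of subject 28
dat <- data.frame(
  subject = c(13, 13, 13, 28, 28),
  season  = c(2006, 2007, 2008, 2000, 2001),
  qtty    = c(27, 27, 0, 18, 18)
)

# Running counter within each subject
dat$exp <- ave(dat$season, dat$subject, FUN = seq_along)

# Blank out the years where nothing was recorded
dat$exp[dat$qtty == 0] <- NA
dat
```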

Transforming long format data to short format by segmenting dates that include redundant observations

I have a data set that is long format and includes exact date/time measurements of 3 scores on a single test administered between 3 and 5 times per year.
ID Date Fl Er Cmp
1 9/24/2010 11:38 15 2 17
1 1/11/2011 11:53 39 11 25
1 1/15/2011 11:36 39 11 39
1 3/7/2011 11:28 95 58 2
2 10/4/2010 14:35 35 9 6
2 1/7/2011 13:11 32 7 8
2 3/7/2011 13:11 79 42 30
3 10/12/2011 13:22 17 3 18
3 1/19/2012 14:14 45 15 36
3 5/8/2012 11:55 29 6 11
3 6/8/2012 11:55 74 37 7
4 9/14/2012 9:15 62 28 18
4 1/24/2013 9:51 82 45 9
4 5/21/2013 14:04 135 87 17
5 9/12/2011 11:30 98 61 18
5 9/15/2011 13:23 55 22 9
5 11/15/2011 11:34 98 61 17
5 1/9/2012 11:32 55 22 17
5 4/20/2012 11:30 23 4 17
I need to transform this data to short format with time bands based on month (i.e. Fall=August-October; Winter=January-February; Spring=March-May). Some bands will include more than one observation per participant, and as such, will need a "spill over" band. An example transformation for the Fl scores below.
ID Fall1Fl Fall2Fl Winter1Fl Winter2Fl Spring1Fl Spring2Fl
1 15 NA 39 39 95 NA
2 35 NA 32 NA 79 NA
3 17 NA 45 NA 28 74
4 62 NA 82 NA 135 NA
5 98 55 55 NA 23 NA
Notice that dates which are "redundant" (i.e. more than 1 Aug-Oct observation) spill over into Fall2fl column. Dates that occur outside of the desired bands (i.e. November, December, June, July) should be deleted. The final data set should have additional columns that include Fl Er and Cmp.
Any help would be appreciated!
(Link to .csv file with long data http://mentor.coe.uh.edu/Data_Example_Long.csv )
This seems to do what you are looking for, but doesn't exactly match your desired output. I haven't looked at your sample data to see whether the problem lies with your sample desired output or the transformations I've done, but you should be able to follow along with the code to see how the transformations were made.
## month() and year() come from lubridate; dcast() comes from reshape2
library(lubridate)
library(reshape2)
## Convert dates to actual date formats
mydf$Date <- strptime(gsub("/", "-", mydf$Date), format="%m-%d-%Y %H:%M")
## Factor the months so we can get the "seasons" that you want
Months <- factor(month(mydf$Date), levels=1:12)
levels(Months) <- list(Fall = c(8:10),
                       Winter = c(1:2),
                       Spring = c(3:5),
                       Other = c(6, 7, 11, 12))
mydf$Seasons <- Months
## Drop the "Other" seasons
mydf <- mydf[!mydf$Seasons == "Other", ]
## Add a "Year" column
mydf$Year <- year(mydf$Date)
## Add a "Times" column
mydf$Times <- as.numeric(ave(as.character(mydf$Seasons),
                             mydf$ID, mydf$Year, FUN = seq_along))
## Use `dcast` on just one variable.
## Repeat for other variables by changing the "value.var"
dcast(mydf, ID ~ Seasons + Times, value.var="Fluency")
# ID Fall_1 Fall_2 Winter_1 Winter_2 Spring_2 Spring_3
# 1 1 15 NA 39 39 NA 95
# 2 2 35 NA 32 NA 79 NA
# 3 3 17 NA 45 NA 29 NA
# 4 4 62 NA 82 NA 135 NA
# 5 5 98 55 55 NA 23 NA
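The Spring_2 and Spring_3 columns appear because the Times counter above runs over ID and Year only, not over season. A hedged base R sketch that counts within each ID and season instead is shown below, using two participants from the sample. It assumes each ID is observed within a single school year; the names `season_of`, `Label` and `wide` are hypothetical.

```r
# Two participants from the sample data; dates already parsed for brevity
mydf <- data.frame(
  ID   = c(1, 1, 1, 1, 3, 3, 3, 3),
  Date = as.Date(c("2010-09-24", "2011-01-11", "2011-01-15", "2011-03-07",
                   "2011-10-12", "2012-01-19", "2012-05-08", "2012-06-08")),
  Fl   = c(15, 39, 39, 95, 17, 45, 29, 74)
)

# Month-to-season lookup (index = month number); NA marks months outside the bands
season_of <- c("Winter", "Winter", "Spring", "Spring", "Spring", NA,
               NA, "Fall", "Fall", "Fall", NA, NA)
mydf$Season <- season_of[as.integer(format(mydf$Date, "%m"))]
mydf <- mydf[!is.na(mydf$Season), ]

# Count observations within each ID and season
mydf$Times <- ave(seq_len(nrow(mydf)), mydf$ID, mydf$Season, FUN = seq_along)
mydf$Label <- paste0(mydf$Season, mydf$Times, "Fl")

# One row per ID, one column per label; empty cells become NA
wide <- tapply(mydf$Fl, list(mydf$ID, mydf$Label), FUN = function(v) v[1])
wide
```

The columns come out in alphabetical order and only for labels that occur in the data, so reorder or add empty columns afterwards if a fixed layout is needed.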
