I have a data frame that looks similar to this:
I know the starting year of the first obs (1963). The obs are in exact chronological order, so the next instance of "Jan" (obs 13) indicates that the year is 1964. Is there a way to create a column "Year" that increases the current year every time the next occurrence of "Jan" happens?
In the pic, it would be "1964" and then, when "Jan" happens again, 1965, and so on.
There is an answer to a similar problem that was suggested but it doesn't quite do it and here it is:
## Make data easily reproducible
df <- data.frame(day=c(24, 21, 20, 10, 20, 20, 10, 15),
month = c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Dec", "Dec"))
## Convert each month-day combo to its corresponding "julian date"
datestring <- paste("1963", match(df[[2]], month.abb), df[[1]], sep = "-")
date <- strptime(datestring, format = "%Y-%m-%d")
julian <- as.integer(strftime(date, format = "%j"))
## Transitions between years occur wherever julian date increases between
## two observations
df$year <- 1963 - cumsum(diff(c(julian[2], julian))>0)
But this won't do it: because the last two observations have the same month ("Dec" and then another "Dec"), the year counter still changes between them:
The last observation should still read "1960" NOT "1959".
The OP has requested to complete the years in ascending order starting in 1963.
The approach below works without date conversion and dummy dates and can be amended to work with fiscal years (see here).
df$year <- 1963 + cumsum(c(0L, diff(100L*as.integer(
factor(df$month, levels = month.abb)) + df$day) < 0))
df
  day month year
1  24   Jun 1963
2  21   Mar 1964
3  20   Jan 1965
4  10   Dec 1965
5  20   Jun 1966
6  20   Jan 1967
7  10   Dec 1967
8  15   Dec 1967
Note that there is a question which seems to be similar but was asking to complete years in descending order. The solution there needs to be changed in two places to work here.
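As a minimal sketch of the same idea, the one-liner above can be wrapped in a small helper. The function name complete_years and its arguments are hypothetical (they do not appear in the original answer), and the sketch assumes the rows are already in chronological order and that consecutive rows never skip a full year.

```r
# Hypothetical helper: assign ascending years to chronologically ordered rows
complete_years <- function(month, day, start_year) {
  # encode each row as month * 100 + day, so later calendar dates get larger keys
  key <- 100L * match(month, month.abb) + as.integer(day)
  # a drop in the key means the calendar wrapped around into a new year
  start_year + cumsum(c(FALSE, diff(key) < 0))
}

df <- data.frame(day   = c(24, 21, 20, 10, 20, 20, 10, 15),
                 month = c("Jun", "Mar", "Jan", "Dec", "Jun", "Jan", "Dec", "Dec"))
df$year <- complete_years(df$month, df$day, 1963)
df
```

Because the comparison uses month and day together, two consecutive "Dec" rows stay in the same year as long as the day does not decrease.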
Apologies if this is a repeat question, I searched and could not find the specific answer I am looking for.
I have a data frame where one column is a 16-digit code, and there are a number of other columns. Here is a simplified example:
code = c("1109619910224003", "1157919910102001", "1539820070315001", "1563120190907002")
year = c(1991, 1991, 2007, 2019)
month = c(02, 01, 03, 09)
dat = as.data.frame(cbind(code,year,month))
dat
> dat
              code year month
1 1109619910224003 1991     2
2 1157919910102001 1991     1
3 1539820070315001 2007     3
4 1563120190907002 2019     9
As you can see, the code contains year, month, and day information. I already have columns for year and month in my dataframe, but I need to also create a day column, which would be 24, 02, 15, and 07 in this example. The date is always in the format yyyymmdd and begins as the 6th digit in the code. So I essentially need to extract the 12th and 13th digits from each code to create my day column.
I then need to create another column for day of year from the date information, so I end up with the following:
day = c(24, 02, 15, 07)
dayofyear = c(55, 2, 74, 250)
dat2 = as.data.frame(cbind(code,year,month,day,dayofyear))
dat2
> dat2
              code year month day dayofyear
1 1109619910224003 1991     2  24        55
2 1157919910102001 1991     1   2         2
3 1539820070315001 2007     3  15        74
4 1563120190907002 2019     9   7       250
Any suggestions? Thanks!
You can leverage the Date data type in R to accomplish all of these tasks. First we will parse out the date portion of the code (characters 6 to 13), and convert them to Date format using readr::parse_date(). Once the date is converted, we can simply access all of the values you want rather than calculating them ourselves.
library(tidyverse)
out <- dat %>%
mutate(
date=readr::parse_date(substr(code, 6, 13), format="%Y%m%d"),
day=format(date, "%d"),
month=format(date, "%m"),
year=format(date, "%Y"),
day.of.year=format(date, "%j")
)
(I'm using tidyverse syntax here because I find it quicker for these types of problems)
Once we create these columns, we can look at the updated data.frame out:
              code year month       date day day.of.year
1 1109619910224003 1991    02 1991-02-24  24         055
2 1157919910102001 1991    01 1991-01-02  02         002
3 1539820070315001 2007    03 2007-03-15  15         074
4 1563120190907002 2019    09 2019-09-07  07         250
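Edit: note that the output for all the new columns is character. We can tell this without using str() because of the leading zeros in the new columns. To get rid of this, we can do something like out <- out %>% mutate(across(c(day, month, year, day.of.year), as.integer)), or just append that mutate() call to the end of our existing pipeline. (Converting every column with mutate_all(as.integer) would also hit code, whose 16-digit values are too large for an integer, and date.)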
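For comparison, here is a minimal base R sketch of the same extraction that yields integer columns directly. It assumes, as stated in the question, that the date always occupies characters 6 to 13 of the code in yyyymmdd form; the column names day and dayofyear follow the question's desired output.

```r
dat <- data.frame(
  code = c("1109619910224003", "1157919910102001",
           "1539820070315001", "1563120190907002"),
  stringsAsFactors = FALSE
)

# parse characters 6-13 (yyyymmdd) into a Date
date <- as.Date(substr(dat$code, 6, 13), format = "%Y%m%d")

# pull the pieces out of the Date and convert them to integers
dat$year      <- as.integer(format(date, "%Y"))
dat$month     <- as.integer(format(date, "%m"))
dat$day       <- as.integer(format(date, "%d"))
dat$dayofyear <- as.integer(format(date, "%j"))
dat
```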
Hi, how do I apply lapply() to the values in rows?
Let's say my dataset is
Name    value  month
SP         50  January
Mary       67  January
Justin     33  January
SP         45  February
Mary       37  February
Justin     53  February
SP         55  March
Mary       67  March
Justin     23  March
So I need to find the sum of values for each month,
to get my output as:
Month     Value
January     150
February    135
March       145
Currently I only know how to calculate this for the whole data set, without the monthly separation, as in:
library(tidyverse)
x <- tibble(Name = c("SP","Mary","Justin","SP","Mary","Justin","SP","Mary","Justin"),
value = c(50,67,33,45,37,53,55,67,23),
# one month label per row (9 rows), matching the table above
Month = rep(c("January", "February", "March"), each = 3))
y <- data.frame(x,sum(x$value))
I do not know how to get my desired outcome.
Any idea how?
We can use aggregate from base R
aggregate(value ~ Month, x, sum)
Or with dplyr
library(dplyr)
x %>%
group_by(Month) %>%
summarise(value = sum(value))
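One detail worth noting: with a character Month column, both approaches return the groups in alphabetical order (February, January, March). A minimal sketch of one way to keep calendar order, assuming the month labels are full English month names, is to turn Month into a factor whose levels follow the built-in month.name before aggregating. The data frame below rebuilds the question's table with rep(), which is an assumption about the intended input.

```r
x <- data.frame(
  Name  = rep(c("SP", "Mary", "Justin"), times = 3),
  value = c(50, 67, 33, 45, 37, 53, 55, 67, 23),
  Month = rep(c("January", "February", "March"), each = 3)
)

# order the month labels by calendar position rather than alphabetically
x$Month <- factor(x$Month, levels = month.name)

res <- aggregate(value ~ Month, data = x, FUN = sum)
res
```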
I have a data frame with year and day
df <- data.frame(year = rep(1980:2015,each = 365), day = 1:365)
Please note that I only need 365 days a year, i.e. I am assuming each year has
365 days.
I want to generate two variables:
1) which month each day falls in
2) which 15-day period each day falls in. A year will have 24 15-day periods, i.e. each month will be split into two halves, something like this:
Jan: 1st - 15th: 1st Quarter
Jan: 16th- 31st: 2nd Quarter
Feb: 1st - 15th: 3rd Quarter
Feb: 16th - 28th: 4th Quarter
March: 1st - 15th: 5th Quarter
.
.
December: 16th - 31st: 24th Quarter
My final data should look like this
Year  Day  Quarter  Month
1980    1        1      1
1980    2        1      1
.
.
1980  365       24     12
.
.
2015    1        1      1
2015    2        1      1
.
.
2015  365       24     12
I can generate the month using this:
library(dplyr)
months <- list(1:31, 32:59, 60:90, 91:120, 121:151, 152:181, 182:212, 213:243, 244:273, 274:304, 305:334, 335:365)
df1 <- df %>% group_by(year) %>%
mutate(month = sapply(day, function(x) which(sapply(months, function(y) x %in% y))))
But I do not know how to generate the 15-day period.
To make sure Feb 29th in leap years is not included, we can generate a complete sequence of dates and then remove every Feb 29th. Take the month from the date. Compute the half-month period as 2 * the month number, minus 1 if the day of the month (%d) is <= 15.
# complete sequence of dates
# use two years in this example, with 2012 being a leap year
dates <- data.frame(date = seq(as.Date("2011-01-01"), as.Date("2012-12-31"), by = "1 day"))
# remove Feb 29th in leap years
d <- dates[format(dates$date, "%m-%d") != "02-29", , drop = FALSE]
# create month number (base R, so no extra package is needed for month())
d$month <- as.integer(format(d$date, "%m"))
# create two-week number
d$twoweek <- d$month * 2 - (as.numeric(format(d$date, "%d")) <= 15)
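The same arithmetic can also be applied directly to the year/day data frame from the question. This is only a sketch under one assumption: since every year is treated as having 365 days, each day number is mapped onto a non-leap reference year (2001 is chosen here arbitrarily) to find its month and day of month. The column name period is hypothetical.

```r
df <- data.frame(year = rep(1980:2015, each = 365), day = 1:365)

# map day-of-year 1..365 onto a non-leap reference year (2001, an arbitrary choice)
ref <- as.Date(df$day - 1, origin = "2001-01-01")

# month number from the reference date
df$month <- as.integer(format(ref, "%m"))

# half-month period: 2 * month, minus 1 when the day of month is in the first half
df$period <- 2L * df$month - (as.integer(format(ref, "%d")) <= 15)

head(df)
tail(df)
```

With this mapping, Feb 16th to 28th forms period 4 and Dec 16th to 31st forms period 24, matching the layout sketched in the question.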
I have a dataframe df with 3 columns (months, year, value).
> head(df)
    months year    value
   January   01 23875.00
  February   01 15343.25
     March   01  9584.25
     April   01 19026.33
       May   01 26324.00
      June   01 31228.00
Every 12 rows (starting from the first January), the year goes 02, 03, 04, etc.. until 16.
I need to calculate seasonal means i.e.
For Summer mean of (December,January,February); for Autumn mean of (March,April,May), for Winter mean of (June,July,August) and for Spring mean of (September,October,November).
Then make a new dataframe with seasons, year, and the mean value of them to get something like this.
>head(seasdf)
season year value
DJF 01
MAM 01
JJA 01
SON 01
DJF 02
MAM 02
This should cover all the years until 16. I searched for similar questions with this kind of dataframe, but I couldn't find a way to do it.
Sorry for this noob question.
We assume that adjacent months in the same quarter should all have the same quarter name and year and that quarters are named after the year in which the quarter ends. For example, Dec 2001, Jan 2002 and Feb 2002 would all be part of the DJF 2002 quarter.
First convert the year and month to a "yearmon" class variable, ym, and then add 1/12 to push the months forward one. This is based on the fact that yearmon variables are stored as the year + 0 for Jan, 1/12 for Feb, 2/12 for Mar, etc. Then convert that to a "yearqtr" class variable, yq. Now aggregate value by yq noting that yearqtr variables sort correctly so that 2001 Q1 will come before 2001 Q2, etc. Finally reconstitute the aggregated data frame with the columns shown in the question.
library(zoo) # yearmon and yearqtr classes
ym <- as.yearmon(paste(DF$months, DF$year), "%B %y")
yq <- as.yearqtr(ym + 1/12)
Ag <- aggregate(value ~ yq, DF, mean)
season.name <- c("DJF", "MAM", "JJA", "SON")
with(Ag, data.frame(year = as.integer(yq), season = season.name[cycle(yq)], value))
giving:
  year season    value
1 2001    DJF 19609.12
2 2001    MAM 18311.53
3 2001    JJA 31228.00
If the exact layout shown in the question is not important then we could omit the last two lines of code above and just use Ag
> Ag
       yq    value
1 2001 Q1 19609.12
2 2001 Q2 18311.53
3 2001 Q3 31228.00
Note: The input DF in reproducible form was assumed to be:
DF <- structure(list(months = c("January", "February", "March", "April",
"May", "June"), year = c("01", "01", "01", "01", "01", "01"),
value = c(23875, 15343.25, 9584.25, 19026.33, 26324, 31228
)), .Names = c("months", "year", "value"), class = "data.frame", row.names = c(NA, -6L))
It seems your months variable holds standard month names, so you can match it against R's built-in month.name to get the month as a number (January is 1, February is 2, etc.). Taking that number modulo 12 and then integer-dividing by 3 gives a season index from 0 to 3, which serves as a grouping variable alongside year. After that it is trivial to group by year and season and take the average:
library(dplyr)
df %>% group_by(season = match(months, month.name) %% 12 %/% 3, year) %>%
summarise(value = mean(value)) %>% ungroup() %>%
# optional: convert the season from number to meaningful labels which could also be
# summer, autumn, winter and spring
mutate(season = factor(season, levels = c(0,1,2,3),
labels = c("DJF", "MAM", "JJA", "SON")))
# A tibble: 3 × 3
# season year value
# <fctr> <int> <dbl>
#1 DJF 1 19609.12
#2 MAM 1 18311.53
#3 JJA 1 31228.00
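To see how the %% 12 %/% 3 arithmetic groups the months, here is a small base R sketch that applies it to all twelve month names. The labels are the ones used in the answer; the variable names idx and season are only for illustration.

```r
# month numbers 1..12, obtained the same way as in the pipeline above
m <- match(month.name, month.name)

# modulo 12 sends December (12) to 0, so it joins January and February;
# integer division by 3 then yields the season index 0..3
idx <- m %% 12 %/% 3

# +1 because R vectors are 1-indexed
season <- c("DJF", "MAM", "JJA", "SON")[idx + 1]
setNames(season, month.abb)
```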
If December needs to be rolled to the next year Summer, you can add one to the year variable when months == "December":
df %>% group_by(season = match(months, month.name) %% 12 %/% 3, year = ifelse(months == "December", year + 1, year)) %>%
summarise(value = mean(value)) %>% ungroup() %>%
# optional: convert the season from number to meaningful labels which could also be
# summer, autumn, winter and spring
mutate(season = factor(season, levels = c(0,1,2,3),
labels = c("DJF", "MAM", "JJA", "SON")))
I am now learning R and using the seas package to help me with some calculations. My data is in the same format the seas package expects. It is a time series:
require(seas)
data(mscdata)
dat.int <- (mksub(mscdata, id=1108447))
These are the column headings of the data, which covers 20 years:
year yday date t_max t_min t_mean rain snow precip
However, I now need to calculate the number of days in each month on which rainfall is >= 1.0 mm. So at the end I would have two columns (each month in each year, and the total number of days in that month with rainfall >= 1.0 mm).
I'm not certain how to write this code, and any help would be appreciated.
Thank you
Lam
I now need to calculate the number of days in each month rainfall is >= 1.0mm. So at the end of it. I would have two columns ( each month in each year and total # of days in each month rainfall>= 1.0mm )
1) So dat.int$date is a Date object. First step is you need to create a new column dat.int$yearmon extracting the year-month, e.g. using zoo::yearmon
Extract month and year from a zoo::yearmon object
require(zoo)
dat.int$yearmon <- as.yearmon(dat.int$date, "%b %y")
2) Second, you need to do a summarize operation (recommend you use plyr or the newer dplyr) on rain>=1.0 aggregated by yearmon. Let's name our resulting column rainy_days.
If you want to store rainy_days column back into the dat.int dataframe, you use a transform instead of a summarize:
require(plyr)
ddply(dat.int, .(yearmon), transform, rainy_days=sum(rain >= 1.0) )
or else if you really just want a new summary dataframe:
rainydays_by_yearmon <- ddply(dat.int, .(yearmon), summarize, rainy_days=sum(rain >= 1.0) )
print.data.frame(rainydays_by_yearmon)
     yearmon rainy_days
1   Jan 1975         14
2   Feb 1975         12
3   Mar 1975         13
4   Apr 1975          6
5   May 1975          6
6   Jun 1975          5
...
355 Jul 2004          3
356 Aug 2004          7
357 Oct 2004         14
358 Nov 2004         16
359 Dec 2004         19
Note: you can do the above with plain old R, without using zoo or plyr/dplyr packages. But might as well teach you nicer, more scalable, maintainable code idioms.
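For reference, a minimal sketch of the plain R route mentioned in the note might look like the following. It runs on a small made-up data frame (toy dates and rain values, not the mscdata station data) that is only assumed to share the date and rain columns of dat.int. The names is_rainy and rainy_days are illustrative.

```r
# toy stand-in for dat.int: 60 consecutive days with a repeating rain pattern
dat <- data.frame(
  date = as.Date("2004-01-01") + 0:59,
  rain = rep(c(0, 0.5, 1, 2.3), length.out = 60)
)

# year-month key as text, e.g. "2004-01"
dat$yearmon <- format(dat$date, "%Y-%m")

# flag rainy days, then count the flags within each year-month
dat$is_rainy <- dat$rain >= 1.0
rainy_days <- aggregate(is_rainy ~ yearmon, data = dat, FUN = sum)
names(rainy_days)[2] <- "rainy_days"
rainy_days
```

Summing a logical vector counts its TRUE values, which is the same trick the ddply call uses with sum(rain >= 1.0).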