I am attempting to multiply a column of numbers representing daily precipitation amounts by the corresponding monthly precipitation amount of the same year. From the example below, this means multiplying every PPT value in January 1890 by the monthly PPT value for January 1890, i.e. multiplying 31 numbers from D.SIM by the same number from M.SIM, and then doing the same for all the remaining months and years in the record. Is there an easy way?
Many thanks.
Dataset: D.SIM
Day Month Year PPT
1 1 1890 2.4
2 1 1890 0.0
3 1 1890 3.6
Dataset: M.SIM
Year Jan Feb Mar ...
1890 78.5 69.6 62.1 ...
My attempt so far is a loop that repeats each monthly value so that it lines up with the daily values:
for (i in df){
JAN <- data.frame(rep(df$Jan, each=31))
}
and then repeated for the other 11 months.
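One way to avoid a separate loop per month is to look up the matching monthly value for every daily row and multiply once. The following is only a sketch: the sample values are made up, the new column names (Monthly, Scaled) are hypothetical, and it assumes that M.SIM holds Year in its first column followed by the month columns in calendar order.

```r
# Hypothetical sample data mirroring the layout shown in the question
D.SIM <- data.frame(
  Day   = c(1, 2, 3, 1),
  Month = c(1, 1, 1, 2),
  Year  = c(1890, 1890, 1890, 1890),
  PPT   = c(2.4, 0.0, 3.6, 1.0)
)
M.SIM <- data.frame(Year = 1890, Jan = 78.5, Feb = 69.6, Mar = 62.1)

# Monthly values as a plain matrix: one row per year, one column per month
# (assumes the month columns follow Year in calendar order)
monthly <- as.matrix(M.SIM[-1])

# For each daily row, find the row of its year and the column of its month
row_idx <- match(D.SIM$Year, M.SIM$Year)
D.SIM$Monthly <- monthly[cbind(row_idx, D.SIM$Month)]

# Multiply each daily value by the monthly value of the same month and year
D.SIM$Scaled <- D.SIM$PPT * D.SIM$Monthly
```

Indexing a matrix with a two-column (row, column) matrix picks one cell per daily row, so months of different lengths and leap years need no special handling.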
I'm working with daily discharge data over 30 years. Discharge is measured in cfs, and my dataset looks like this:
date ddmm year cfs
1/04/1986 1-Apr 1986 2560
2/04/1986 2-Apr 1986 3100
3/04/1986 3-Apr 1986 2780
4/04/1986 4-Apr 1986 2640
...
17/01/1987 17-Jan 1987 1130
18/01/1987 18-Jan 1987 1190
19/01/1987 19-Jan 1987 1100
20/01/1987 20-Jan 1987 864
21/01/1987 21-Jan 1987 895
22/01/1987 22-Jan 1987 962
23/01/1987 23-Jan 1987 998
24/01/1987 24-Jan 1987 1140
I'm trying to calculate the number of days preceding each date that the discharge exceeds 1000 cfs and put it in a new column ("DaysGreater1000") that will be used in a subsequent analysis.
In this example, DaysGreater1000 would be 0 for all of the dates in April 1986. DaysGreater1000 would be 1 on 20 Jan, 2 on 21 Jan, 3 on 22 Jan, etc.
Do I first need to create a column (event) of binary data for when the threshold is exceeded? I have been reading several old questions and it looks like I need to use ifelse but I can't figure out how to make a new column of data and then how to make the next step to calculate the number of preceding days.
Here are the questions that I have been examining:
Calculate days since last event in R
Calculate elapsed time since last event
... And this is the code that looks promising, but I can't quite put it all together!
df %>%
mutate(event = as.logical(event),
last_event = if_else(event, true = date, false = NA_integer_)) %>%
fill(last_event) %>%
mutate(event_age = date - last_event)
summary(df)
I'm sorry if I'm not being very eloquent! I'm feeling a bit rusty as I haven't used R in a while.
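For what it's worth, here is a minimal base R sketch of one reading of the example above, in which DaysGreater1000 is the number of days since discharge last exceeded 1000 cfs. It assumes the rows are sorted by date with exactly one row per day and no gaps; the event column is a helper I made up, and the data are the January 1987 values from the question.

```r
# The January 1987 rows from the question (one row per consecutive day)
df <- data.frame(
  date = as.Date("1987-01-17") + 0:7,
  cfs  = c(1130, 1190, 1100, 864, 895, 962, 998, 1140)
)

# Logical flag: TRUE on days when discharge exceeds the threshold
df$event <- df$cfs > 1000

# Row index of the most recent exceedance at or before each row
# (0 means no exceedance has been seen yet)
idx <- seq_len(nrow(df))
last_event <- cummax(ifelse(df$event, idx, 0L))

# Rows elapsed since that exceedance; equal to days only if there are no gaps
df$DaysGreater1000 <- idx - last_event

# Undefined before the first exceedance in the record
df$DaysGreater1000[last_event == 0] <- NA
```

This gives 0 on days above the threshold and 1, 2, 3, ... on the days that follow, matching the values described for 20 to 22 January. If the record has missing days, the subtraction would need to be done on the dates themselves rather than on row indices.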
My dataset has 3000 observations (rows) and 24 variables (columns). The first 12 variables (columns 1 to 12) are price values from January to December of one year. The remaining 12 variables (columns 13 to 24) are rent values from January to December of the same year. I would like to calculate the price/rent ratio for each month and add the 12 new ratio columns to the end of my original dataset. How can I do this in R?
Thank you in advance!
You could do:
df[paste0('ratio', 1:12)] <- df[1:12]/df[13:24]
This will create 12 new columns called ratio1, ratio2, ..., ratio12, where ratio1 is column 1 / column 13, ratio2 is column 2 / column 14, and so on.
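As a quick check of that one-liner, here is a small sketch on made-up data. The three-row data frame and its default column names (V1 to V24) are purely illustrative.

```r
# Made-up data: 3 rows, 12 "price" columns followed by 12 "rent" columns
set.seed(1)
df <- as.data.frame(matrix(runif(3 * 24, min = 1, max = 10), nrow = 3))

# Divide columns 1-12 by columns 13-24, position by position,
# and append the results as ratio1 ... ratio12
df[paste0("ratio", 1:12)] <- df[1:12] / df[13:24]

# The data frame now has 36 columns; ratio1 equals V1 / V13
ncol(df)
all.equal(df$ratio1, df$V1 / df$V13)
```

Dividing two data frames of the same shape works element by element and pairs the columns by position, so this relies on the price and rent columns being in the same month order.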
I have daily OHLC data for US stocks. I would like to derive weekly time series from it and compute the SMA and EMA. To do that, I need to create one weekly time series from the maximum high per week and another from the minimum low per week. After that, I would compute their SMA and EMA and assign the results to every day of the following week (one period forward). So, first problem first: how do I get the weekly series from the daily one using R (any package)? Even better would be an algorithm for it in any language, preferably Golang; I can rewrite it in Golang myself if needed.
Date         High  Low  Week(High)  Week(Low)  WkSMAHigh 2DP  WkSMALow 2DP
                                               (one period forward)
Dec 24 Fri   6     3    8           3          5.5            1.5
Dec 23 Thu   7     5                           5.5            1.5
Dec 22 Wed   8     5                           5.5            1.5
Dec 21 Tue   4     4                           5.5            1.5
Assume Holiday (Dec 20)
Dec 17 Fri   4     3    6           2          None
Dec 16 Thu   4     3
Dec 15 Wed   5     2
Dec 14 Tue   6     4
Dec 13 Mon   6     4
Dec 10 Fri   5     1    5           1          None
Dec 9 Thu    4     3
Dec 8 Wed    3     2
Assume Holiday (Dec 6 & 7)
I'd start by generating a column which specifies which week it is.
You could use the lubridate package for this, which would require converting your dates to the Date type. It has a function called week, which returns the number of complete 7-day periods that have passed since January 1st, plus one. However, I don't know whether this data spans several years (week numbers repeat every year), and I think there's a simpler way to do this.
The example I'll give below will simply do it by creating a column which just repeats an integer 7 times up to the length of your data frame.
Pretend your data frame is called ohlcData.
# Repeat each week number 7 times, all the way up to the end of the data frame.
# The result is trimmed to nrow(ohlcData) so that rounding up with ceiling()
# doesn't make the vector longer than the data frame
ohlcData$Week <- rep(seq_len(ceiling(nrow(ohlcData)/7)), each = 7)[1:nrow(ohlcData)]
With that created we can then go ahead and use the plyr package, which has a really useful function called ddply. This function applies a function to columns of data grouped by another column of data. In this case we will apply the max and min functions to your data based on its grouping by our new column Week.
library(plyr)
weekMax <- ddply(ohlcData[,c("Week", "High")], "Week", numcolwise(max))
weekMin <- ddply(ohlcData[,c("Week", "Low")], "Week", numcolwise(min))
That will then give you the min and max of each week. The dataframe returned for both weekMax and weekMin will have 2 columns, Week and the value. Combine these however you see fit. Perhaps weekExtreme <- cbind(weekMax, weekMin[,2]). If you want to be able to marry up date ranges to the week numbers it will just be every 7th date starting with whatever your first date was.
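Note that trading data usually has five or fewer rows per calendar week, so grouping every 7 rows will not line up with Monday-to-Friday weeks. An alternative sketch, using base R only, groups by ISO year and week derived from the date. The data frame below is hypothetical: the year 2021 is my assumption (it makes Dec 24 a Friday, as in the question's table), and the values are copied from the last two weeks of that table.

```r
# Hypothetical daily data; Dec 20 is a holiday and therefore absent
ohlcData <- data.frame(
  Date = as.Date(c("2021-12-13", "2021-12-14", "2021-12-15", "2021-12-16",
                   "2021-12-17", "2021-12-21", "2021-12-22", "2021-12-23",
                   "2021-12-24")),
  High = c(6, 6, 5, 4, 4, 4, 8, 7, 6),
  Low  = c(4, 4, 2, 3, 3, 4, 5, 5, 3)
)

# ISO 8601 year and week number, e.g. "2021-W50"; stays unique across years
ohlcData$Week <- format(ohlcData$Date, "%G-W%V")

# Weekly maximum of High and weekly minimum of Low, joined on Week
weekly <- merge(
  aggregate(High ~ Week, data = ohlcData, FUN = max),
  aggregate(Low ~ Week, data = ohlcData, FUN = min)
)
weekly
```

Because the week label comes from the calendar rather than from row position, holidays and short weeks fall into the right group without extra handling.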
I am trying to sum daily rainfall values into monthly totals for a record over 100 years in length. My data takes the form:
Year Month Day Rain
1890 1 1 0
1890 1 2 3.1
1890 1 3 2.5
1890 1 4 15.2
In the example above I want R to sum all the days of rainfall in January 1890, then February 1890, March 1890.... through to December 2010. I guess what I'm trying to do is create a loop to sum values. My output file should look like:
Year Month Rain
1890 1 80.5
1890 2 72.4
1890 3 66.8
1890 4 77.2
Any easy way to do this?
Many thanks.
You can use dplyr for some pleasing syntax
library(dplyr)
df %>%
group_by(Year, Month) %>%
summarise(Rain = sum(Rain))
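If you would rather not load a package, the same grouping can be sketched in base R with aggregate(). The sample rows below are made up and only mimic the layout in the question.

```r
# Made-up daily rainfall in the question's Year / Month / Day / Rain layout
df <- data.frame(
  Year  = c(1890, 1890, 1890, 1890, 1890),
  Month = c(1, 1, 1, 2, 2),
  Day   = c(1, 2, 3, 1, 2),
  Rain  = c(0, 3.1, 2.5, 15.2, 1.0)
)

# Sum Rain within every Year/Month combination
monthly <- aggregate(Rain ~ Year + Month, data = df, FUN = sum)

# Put the result in chronological order
monthly <- monthly[order(monthly$Year, monthly$Month), ]
monthly
```

With the formula interface, rows where Rain is NA are dropped before summing by default, which may or may not be what you want for a long record with gaps.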
In some cases it can be beneficial to convert it to a time-series class like xts, then you can use functions like apply.monthly().
Data:
df <- data.frame(
Year = rep(1890,5),
Month = c(1,1,1,2,2),
Day = 1:5,
rain = rexp(5)
)
> head(df)
Year Month Day rain
1 1890 1 1 0.1528641
2 1890 1 2 0.1603080
3 1890 1 3 0.5363315
4 1890 2 4 0.6368029
5 1890 2 5 0.5632891
Convert it to xts and use apply.monthly():
library(xts)
dates <- with(df, as.Date(paste(Year, Month, Day), format = "%Y %m %d"))
myXts <- xts(df$rain, dates)
> head(apply.monthly(myXts, sum))
[,1]
1890-01-03 0.8495036
1890-02-05 1.2000919
I am learning R and using the seas package to help with some calculations. My data is a time series in the format the seas package expects:
require(seas)
data(mscdata)
dat.int <- (mksub(mscdata, id=1108447))
The data covers 20 years and has the following column headings:
year yday date t_max t_min t_mean rain snow precip
However, I now need to calculate the number of days in each month on which rainfall is >= 1.0 mm. In the end I would have two columns: each month of each year, and the total number of days in that month with rainfall >= 1.0 mm.
I'm not certain how to write this code and any help would be appreciated
Thank you
Lam
1) dat.int$date is a Date object, so the first step is to create a new column dat.int$yearmon holding the year-month, e.g. using zoo::as.yearmon (see also "Extract month and year from a zoo::yearmon object"):
require(zoo)
dat.int$yearmon <- as.yearmon(dat.int$date, "%b %y")
2) Second, you need to do a summarize operation (I recommend plyr or the newer dplyr) on rain >= 1.0, aggregated by yearmon. Let's name the resulting column rainy_days.
require(plyr)
If you want to store the rainy_days column back into the dat.int dataframe, use transform instead of summarize and assign the result:
dat.int <- ddply(dat.int, .(yearmon), transform, rainy_days=sum(rain >= 1.0) )
or else, if you really just want a new summary dataframe:
rainydays_by_yearmon <- ddply(dat.int, .(yearmon), summarize, rainy_days=sum(rain >= 1.0) )
print.data.frame(rainydays_by_yearmon)
yearmon rainy_days
1 Jan 1975 14
2 Feb 1975 12
3 Mar 1975 13
4 Apr 1975 6
5 May 1975 6
6 Jun 1975 5
...
355 Jul 2004 3
356 Aug 2004 7
357 Oct 2004 14
358 Nov 2004 16
359 Dec 2004 19
Note: you can do the above with plain old R, without using zoo or plyr/dplyr packages. But might as well teach you nicer, more scalable, maintainable code idioms.
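For reference, the plain R route mentioned in the note might look roughly like the sketch below. It uses a tiny made-up data frame (dat) rather than the seas data, and builds the year-month label with format() instead of zoo.

```r
# Tiny made-up data set with the two columns that matter here
dat <- data.frame(
  date = as.Date(c("1975-01-01", "1975-01-02", "1975-01-03",
                   "1975-02-01", "1975-02-02")),
  rain = c(0.0, 1.0, 5.2, 0.4, 2.3)
)

# Year-month label such as "1975-01"; sorts chronologically as text
dat$yearmon <- format(dat$date, "%Y-%m")

# Count the days in each year-month with rain of at least 1.0 mm
rainy <- aggregate(rain ~ yearmon, data = dat,
                   FUN = function(x) sum(x >= 1.0))
names(rainy)[2] <- "rainy_days"
rainy
```

The formula interface of aggregate() drops rows with missing rain by default, so days with NA simply do not count towards the monthly total.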