Subtracting samples in R - r

In this code, I have selected a random sample of 30 rows and, from that sample, taken a further sample of 15.
I am stuck on how to subtract the sample of 15 from the sample of 30, i.e. subtract s1985 from b1985.
Can someone help me, please?
My code is below:
sample.df <- function(df, n) df[sample(nrow(df), n), , drop = FALSE]
sample.df(subset(df, YEAR == "1985"), 30)
b1985 <-sample.df(subset(df, YEAR == "1985"), 30)
s1985 <-sample.df(subset(b1985), 15)
sample.df(subset(df, YEAR == "1986"), 30)
b1986 <-sample.df(subset(df, YEAR == "1986"), 30)
s1986 <-sample.df(subset(b1986), 15)

Hi all, I have figured out how to do the subtraction.
Rename the rows so they no longer carry the row numbers of the randomly drawn rows:
row.names(b1985) <- 1:nrow(b1985)
row.names(s1985) <- 1:nrow(s1985)
row.names(b1986) <- 1:nrow(b1986)
row.names(s1986) <- 1:nrow(s1986)
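Subtract by column number (in this case I want to use column 5). Note that R recycles the 15 values from the smaller sample across the 30 values from the larger one: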
b1985[1:30,5]-s1985[1:15,5]
b1986[1:30,5]-s1986[1:15,5]
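If "subtract" is instead read as "remove the 15 sampled rows from the 30", one possible approach is to match on row names. This is only a sketch under that assumption: the data frame below is a hypothetical stand-in for the original data, rest1985 is an invented name, and the matching has to happen before the rows are renamed as above, because sample.df keeps the original row names.

```r
set.seed(42)
# Hypothetical stand-in for the original data frame
df <- data.frame(YEAR = rep(c("1985", "1986"), each = 50), value = rnorm(100))

sample.df <- function(df, n) df[sample(nrow(df), n), , drop = FALSE]

b1985 <- sample.df(subset(df, YEAR == "1985"), 30)
s1985 <- sample.df(b1985, 15)

# Rows of b1985 that were not drawn into s1985 (matched by row name)
rest1985 <- b1985[!(rownames(b1985) %in% rownames(s1985)), , drop = FALSE]
nrow(rest1985)
```

The result holds the 15 rows of b1985 that are not in s1985.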

Related

Identify a row that follow two conditions

I need to identify which row satisfies the following two conditions:
- The row before has a monthly rainfall lower than 20
- The row after has a monthly rainfall higher than 20
I'm trying to identify when the planting season starts. For example: the planting season starts in the month for which the previous month's precipitation was lower than 20 and the following month's is higher. In the example below, that month is October, because the rainfall in September is 2 and in November it is 100. I need to write a function that gives me the index corresponding to that month.
df <- data.frame(month=c(1:12),monthly_rainfall=c(60,67,164,65,5,3,0,1,2,24,100,102))
Thank you
You can use the lead() and lag() functions along with filter()
library(dplyr)
df %>%
filter(lag(monthly_rainfall) < 20,
lead(monthly_rainfall) > 20)
month monthly_rainfall
1 9 2
2 10 24
You can get the index with the following (the %$% operator comes from magrittr, so that package must be loaded too):
library(magrittr)
df %>% mutate(planting_season = lag(monthly_rainfall) < 20 & lead(monthly_rainfall) > 20) %$%
  planting_season %>%
  which()
[1] 9 10
Or you could get the month with:
df %>% filter(lag(monthly_rainfall) < 20,
              lead(monthly_rainfall) > 20) %$%
  month
[1] 9 10
Base R solution:
get_months <- function(x, high = 20, low = 20) {
  n <- length(x)
  which(c(NA, x[1:(n-1)]) < high & c(x[2:n], NA) > low)
}
Then you can call it like this:
get_months(df$monthly_rainfall)
# 9 10
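Both answers return months 9 and 10, while the question expects October only. One possible refinement, which is an assumption about the intent and not part of the answers above, is to also require that the month itself exceeds the threshold. The helper name planting_start and its threshold argument are hypothetical.

```r
df <- data.frame(month = 1:12,
                 monthly_rainfall = c(60, 67, 164, 65, 5, 3, 0, 1, 2, 24, 100, 102))

# Hypothetical helper: index of months whose previous month is below the
# threshold while the month itself and the following month are above it.
planting_start <- function(x, threshold = 20) {
  n <- length(x)
  prev <- c(NA, x[-n])   # previous month's value (NA for the first month)
  nxt  <- c(x[-1], NA)   # next month's value (NA for the last month)
  which(prev < threshold & x > threshold & nxt > threshold)
}

planting_start(df$monthly_rainfall)
# [1] 10
```

September is dropped because its own rainfall (2) is below the threshold.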

calculation of week number in R

I have found many answers regarding the week number of a particular date. What I want is a week number that runs across 2 years, i.e. the first year gives weeks 1 to 53 and the second year continues the count after 53 instead of starting again at 1. Is this possible in R? Example data is shown below:
We can use rep to add 53 to the vector ('vN2') after finding the number of observations for each year.
vN2 + rep(c(0, 53), tapply(vN2, cumsum(c(TRUE, diff(vN2) < 0)), FUN = length))
data
set.seed(24)
vN <- rep(1:53, sample(1:5, 53, replace=TRUE))
vN1 <- rep(1:53, sample(1:6, 53, replace=TRUE))
vN2 <- c(vN, vN1)
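The same idea can be extended to more than two years by multiplying the year index by 53 instead of using a fixed c(0, 53). This is a minimal sketch with hypothetical data (wk, year_idx and wk_cont are invented names), and it assumes, as the answer above does, that every year contributes 53 weeks.

```r
# Hypothetical data: week numbers for three consecutive years
wk <- c(rep(1:53, each = 2), rep(1:53, each = 3), rep(1:53, each = 1))

# A drop in the week number marks the start of a new year
year_idx <- cumsum(c(TRUE, diff(wk) < 0))   # 1, 1, ..., 2, 2, ..., 3, ...

# Offset each year by 53 weeks times the number of preceding years
wk_cont <- wk + 53 * (year_idx - 1)
range(wk_cont)
# [1]   1 159
```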

R Programming 30 day Months

I'm currently writing a script in the R Programming Language and I've hit a snag.
I have time series data organized in a way where there are 30 days in each month for 12 months in 1 year. However, I need the data organized in a proper 365 days in a year calendar, as in 30 days in a month, 31 days in a month, etc.
Is there a simple way for R to recognize there are 30 days in a month and to operate within that parameter? At the moment I have my script converting the number of days from the source in UNIX time and it counts up.
For example:
startingdate <- "20060101"
endingdate <- "20121230"
date <- seq(from = as.Date(startingdate, "%Y%m%d"), to = as.Date(endingdate, "%Y%m%d"), by = "days")
This would generate an array of dates with each month having 29 days/30 days/31 days etc. However, my data is currently organized as 30 days per month, regardless of 29 days or 31 days present.
Thanks.
The first 4 solutions are basically variations on the same theme using expand.grid. (3) uses magrittr and the others use no packages. The last two work by creating a long sequence of numbers and then picking out the ones that have month and day in range.
1) apply This gives a series of yyyymmdd numbers such that there are 30 days in each month. Note that the line defining yrs in this case is the same as yrs <- 2006:2012 so if the years are handy we could shorten that line. Omit as.numeric in the line defining s if you want character string output instead. Also, s and d are the same because we have whole years so we could omit the line defining d and use s as the answer in this case and also in general if we are always dealing with whole years.
startingdate <- "20060101"
endingdate <- "20121230"
yrs <- seq(as.numeric(substr(startingdate, 1, 4)), as.numeric(substr(endingdate, 1, 4)))
g <- expand.grid(yrs, sprintf("%02d", 1:12), sprintf("%02d", 1:30))
s <- sort(as.numeric(apply(g, 1, paste, collapse = "")))
d <- s[ s >= startingdate & s <= endingdate ] # optional if whole years
Run some checks.
head(d)
## [1] 20060101 20060102 20060103 20060104 20060105 20060106
tail(d)
## [1] 20121225 20121226 20121227 20121228 20121229 20121230
length(d) == length(2006:2012) * 12 * 30
## [1] TRUE
2) no apply An alternative variation would be this. In this and the following solutions we are using yrs as calculated in (1) so we omit it to avoid redundancy. Also, in this and the following solutions, the corresponding line to the one setting d is omitted, again, to avoid redundancy -- if you don't have whole years then add the line defining d in (1) replacing s in that line with s2.
g2 <- expand.grid(yr = yrs, mon = sprintf("%02d", 1:12), day = sprintf("%02d", 1:30))
s2 <- with(g2, sort(as.numeric(paste0(yr, mon, day))))
3) magrittr This could also be written using magrittr like this:
library(magrittr)
expand.grid(yr = yrs, mon = sprintf("%02d", 1:12), day = sprintf("%02d", 1:30)) %>%
with(paste0(yr, mon, day)) %>%
as.numeric %>%
sort -> s3
4) do.call Another variation.
g4 <- expand.grid(yrs, 1:12, 1:30)
s4 <- sort(as.numeric(do.call("sprintf", c("%d%02d%02d", g4))))
5) subset sequence Create a sequence of numbers from the starting date to the ending date and if each number is of the form yyyymmdd pick out those for which mm and dd are in range.
seq5 <- seq(as.numeric(startingdate), as.numeric(endingdate))
d5 <- seq5[ seq5 %/% 100 %% 100 %in% 1:12 & seq5 %% 100 %in% 1:30]
6) grep Using seq5 from (5)
d6 <- as.numeric(grep("(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|30)$", seq5, value = TRUE))
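Here's an alternative (it appears to assume that startingdate and endingdate are Date objects, so that unclass() returns day counts):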
date <- unclass(startingdate):unclass(endingdate) %% 30L
month <- rep(1:12, each = 30, length.out = NN <- length(date))
year <- rep(1:(NN %/% 360 + 1), each = 360, length.out = NN)
(of course, we can easily adjust by adding constants to taste if you want a specific day to be 0, or a specific month, etc.)
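The arithmetic idea can also be sketched directly on a 0-based day index, producing the same yyyymmdd numbers as the expand.grid solutions. The names first_year, n_years and d360 are hypothetical, and the sketch assumes whole 360-day years.

```r
# Hypothetical inputs: first year and number of whole 360-day years
first_year <- 2006
n_years <- 7

i <- seq_len(n_years * 360) - 1      # 0-based day index over the whole period
year  <- first_year + i %/% 360      # 360 days per year
month <- (i %% 360) %/% 30 + 1       # 30 days per month
day   <- i %% 30 + 1                 # day within the month, 1..30

d360 <- year * 10000 + month * 100 + day
head(d360, 3)
# [1] 20060101 20060102 20060103
tail(d360, 3)
# [1] 20121228 20121229 20121230
```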

jumping average column at every n-th rows

Please help me with this.
I have daily observations (a data frame) for a 32-year period (about 11659 rows in total; some rows are missing).
I want to calculate the average of each column at every 365th row (i.e. every Jan-01 over the 32-year period, every Jan-02 over the 32-year period, etc.),
so the output would have 365 rows in total, each row being the average of 32 rows taken at 365-row intervals.
Any suggestions? I found a similar case, tried its solution and modified it a bit, but the output is not correct. In particular, I don't understand the sapply part below.
df <-data.frame(x=c(1:10000),y=c(1:10000))
byapply <- function(x, by, fun, ...)
{
  # Create index list
  if (length(by) == 1)
  {
    nr <- nrow(x)
    split.index <- rep(1:ceiling(nr / by), each = by, length.out = nr)
  } else
  {
    nr <- length(by)
    split.index <- by
  }
  index.list <- split(seq(from = 1, to = nr), split.index)
  # Pass index list to fun using sapply() and return object #this is where I am lost
  sapply(index.list, function(i)
  {
    do.call(fun, list(x[, i], ...))
  })
}
thank you for your time..
How about using the plyr package:
require(plyr) # for aggregating data
series <- data.frame(date = as.Date("1964-01-01") + (1:10000),
                     obs  = runif(10000),
                     obs2 = runif(10000),
                     obs3 = runif(10000))
ddply(series,                        # run on series df
      .(DOY = format(date, "%j")),   # group by day of year (call col DOY)
      summarise,                     # tell the function to summarise by group (day of year)
      daymean  = mean(obs),          # calculate the mean
      daymean2 = mean(obs2),         # calculate the mean
      daymean3 = mean(obs3)          # calculate the mean
)
# DOY daymean daymean2 daymean3
#1 001 0.4957763 0.4882559 0.4944281
#2 002 0.5184197 0.4970996 0.4720893
#3 003 0.5192313 0.5185357 0.4878891
#4 004 0.4787227 0.5150596 0.5317068
#5 005 0.4972933 0.5065012 0.4956527
#6 006 0.5112484 0.5276013 0.4785681
#...
Although there's possibly a special function, which does exactly what you need, here is a solution using ave:
set.seed(1)
dates = seq(from=as.Date("1970-01-01"), as.Date("2000-01-01"), by="day")
df <- data.frame(val1=runif(length(dates)),
val2=rchisq(length(dates), 10))
day <- format(dates, "%j") # day of year (1:366)
df <- cbind(df, setNames(as.data.frame(sapply(df, function(x) {
ave(x, day) # calculate mean by day for df$val1 and df$val2
})), paste0(names(df), "_mean")))
head(df[1:365, 3:4], 3)
# val1_mean val2_mean
# 1 0.5317151 10.485001
# 2 0.5555664 10.490968
# 3 0.6428217 10.763027
That is, if I understood your task correctly.
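For comparison, a base R sketch that returns one row per day of year can be written with aggregate(). The data here are simulated and the names doy and day_means are hypothetical, so treat it as an illustration, not as the method of either answer above.

```r
set.seed(1)
# Simulated daily data for 32 years (1970 to 2001)
dates <- seq(as.Date("1970-01-01"), as.Date("2001-12-31"), by = "day")
df <- data.frame(date = dates,
                 val1 = runif(length(dates)),
                 val2 = runif(length(dates)))

df$doy <- format(df$date, "%j")   # day of year as "001" ... "366"

# Mean of each value column within each day of year
day_means <- aggregate(cbind(val1, val2) ~ doy, data = df, FUN = mean)
nrow(day_means)
# [1] 366
```

There are 366 rows because leap years contribute a day "366". Day-of-year numbers also shift by one after Feb 29 in leap years, so grouping by format(date, "%m-%d") may be preferable if calendar dates should line up across years.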

Count the number of Fridays or Mondays in Month in R

I would like a function that counts the number of specific weekdays in a month,
e.g. Nov '13 -> 5 Fridays, while Dec '13 would return 4 Fridays.
Is there an elegant function that would return this?
library(lubridate)
num_days <- function(date){
  x <- as.Date(date)
  start = floor_date(x, "month")
  count = days_in_month(x)
  d = wday(start)
  # rough estimate: if the first day of the month falls late in the week (wday > 4), assume the month has 5 Fridays
  sol = ifelse(d > 4, 5, 4)
  sol
}
num_days("2013-08-01")
num_days(today())
What would be a better way to do this?
1) Here d is the input, a Date class object, e.g. d <- Sys.Date(). The result gives the number of Fridays in the year/month that contains d. Replace 5 with 1 to get the number of Mondays:
first <- as.Date(cut(d, "month"))
last <- as.Date(cut(first + 31, "month")) - 1
sum(format(seq(first, last, "day"), "%w") == 5)
2) Alternately replace the last line with the following line. Here, the first term is the number of Fridays from the Epoch to the next Friday on or after the first of the next month and the second term is the number of Fridays from the Epoch to the next Friday on or after the first of d's month. Again, we replace all 5's with 1's to get the count of Mondays.
ceiling(as.numeric(last + 1 - 5 + 4) / 7) - ceiling(as.numeric(first - 5 + 4) / 7)
The second solution is slightly longer (although it has the same number of lines) but it has the advantage of being vectorized, i.e. d could be a vector of dates.
UPDATE: Added second solution.
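The vectorized formula in (2) could be wrapped in a small function, sketched below. The name count_weekday and its weekday argument are hypothetical; weekday follows the "%w" convention (0 = Sunday, ..., 5 = Friday), and the month boundaries are computed with format() instead of cut() purely for illustration.

```r
# Hypothetical wrapper around the vectorized formula from solution (2)
count_weekday <- function(d, weekday = 5) {
  d <- as.Date(d)
  first <- as.Date(format(d, "%Y-%m-01"))                 # first day of d's month
  next_first <- as.Date(format(first + 31, "%Y-%m-01"))   # first day of the following month
  ceiling(as.numeric(next_first - weekday + 4) / 7) -
    ceiling(as.numeric(first - weekday + 4) / 7)
}

count_weekday(c("2013-11-15", "2013-12-15"))      # Fridays
# [1] 5 4
count_weekday("2013-12-01", weekday = 1)          # Mondays
# [1] 5
```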
There are a number of ways to do it. Here is one:
countFridays <- function(y, m) {
fr <- as.Date(paste(y, m, "01", sep="-"))
to <- fr + 31
dt <- seq(fr, to, by="1 day")
df <- data.frame(date=dt, mon=as.POSIXlt(dt)$mon, wday=as.POSIXlt(dt)$wday)
df <- subset(df, df$wday==5 & df$mon==df[1,"mon"])
return(nrow(df))
}
It creates the first day of the month and a day in the following month.
It then creates a data frame of dates with the month index (on a 0 to 11 range, but we only use this for comparison) and the weekday.
We then subset to rows that are a) in the same month and b) on a Friday. That is your result set, and we return the number of rows as your answer.
Note that this only uses base R code.
Without using lubridate -
#arguments to pass to function:
whichweekday <- 5
whichmonth <- 11
whichyear <- 2013
#function code:
firstday <- as.Date(paste('01', whichmonth, whichyear, sep = "-"), '%d-%m-%Y')
# last day of the month: first day of the next month minus one day (also works for December)
lastday <- seq(firstday, length.out = 2, by = "1 month")[2] - 1
sum(
  strftime(
    seq.Date(
      from = firstday,
      to = lastday,
      by = "day"),
    '%w'
  ) == whichweekday)
