How to represent two columns of year and day as a date in R?

I have a text file consisting of 3 columns, as shown below. The measurements are taken every day for several years (2001-2013). I want to plot a time series of valu1, but because the year and day are in separate columns I have a problem.
To read the file:
LR <- read.table("C:\\Users\\dat.txt", sep = "", header = TRUE)
The first rows:
head(LR)
Year day valu1
1 2001 1 0
2 2001 2 1
3 2001 3 2
4 2001 4 0
5 2001 5 0.30
6 2001 6 0
I tried this:
LR$Year=as.Date(as.character(LR$Year))
Error in `$<-.data.frame`(`*tmp*`, "Year", value = numeric(0)) :
replacement has 0 rows, data has .
I do not know whether all days are present, so I wonder whether R can be told to fill in any missing date with NA, so that the date still appears on the plot's time axis but with no value plotted.

You can try:
LR$date <- as.Date(paste(LR$Year, LR$day, sep = "-"), format = "%Y-%j")
I assumed here that day is the day of the year, i.e. a number from 1 to 366; that is what %j means in the format string.
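The question also asks how to keep missing days on the time axis. A minimal sketch, assuming day really is the day of the year: build the complete daily sequence with seq() and left-join the observations onto it with merge(..., all.x = TRUE), so any absent day gets NA. The small LR data frame here is made up for illustration (days 1 to 6 of 2001, with day 4 missing).

```r
# Hypothetical example data: days 1-6 of 2001, with day 4 missing
LR <- data.frame(Year  = rep(2001, 5),
                 day   = c(1, 2, 3, 5, 6),
                 valu1 = c(0, 1, 2, 0.3, 0))

# Year + day-of-year -> Date (%j = day of year)
LR$date <- as.Date(paste(LR$Year, LR$day, sep = "-"), format = "%Y-%j")

# Every calendar day between the first and last observation
all_days <- data.frame(date = seq(min(LR$date), max(LR$date), by = "day"))

# Left join: dates with no row in LR get valu1 = NA
full <- merge(all_days, LR[, c("date", "valu1")], by = "date", all.x = TRUE)

# With type = "l", the NA leaves a visible gap in the line
plot(full$date, full$valu1, type = "l", xlab = "Date", ylab = "valu1")
```

Base plot() breaks a type = "l" line at NA values, so a missing day shows up as a gap instead of being interpolated over.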

Related

How to sample from smaller data frame with multiple conditions to a larger data frame?

I have a main dataset, df.main, with 3 sites, each with 3 sub-sites, sampled over three months. I have a separate dataset, df.sample, with some abiotic variables for a single month ONLY; it contains three values for each sub-site. I need to add the abiotic column to my original dataset. However, for every month and every sub-site, I want to SAMPLE (with replacement) one of the three values recorded for that sub-site.
set.seed(111)
##Main Data Set
month <- rep(c("Jan","Feb","Mar"), each =9 )
site <- rep(c("1","2","3","1","2","3","1","2","3"), each = 3)
sub.site <- rep(c(1,2,3,1,2,3,1,2,3), times = 3)
df.main <- data.frame(month, site, sub.site)
month site sub.site
Jan 1 1
Jan 1 2
Jan 1 3
Jan 2 1
Jan 2 2
... .. ..
Mar 3 3
##Sampler Data Set
site <- rep(c(1,2,3), times = 9)
sub.site <- rep(c(1,1,1,2,2,2,3,3,3), each = 3)
abiotic <- rnorm(27,7,1)
df.sample <- data.frame(site, sub.site, abiotic)
site sub.site abiotic
1 1 7.175096
1 1 8.805868
1 1 6.783571
1 2 7.910917
1 2 7.202307
1 2 8.404883
...
2 1 7.122915
2 1 6.152732
...
3 1 7.978232
3 1 6.870228
##Desired Output in df.main
month site sub.site abiotic
Jan 1 1 8.805868
Jan 1 2 7.910917
You can do a full join of the two tables using site and sub.site, and then just sample one row from each month, site and sub.site combination.
If you are unsure about table joins (full join, left join, etc.), you may want to look them up; they are simple and standard, for example in database queries.
In your case, each of the 9 site/sub.site combinations appears in 3 months of df.main and has 3 values in df.sample, so the full join has 9 × 3 × 3 = 81 rows:
library(dplyr)
joining.output <- df.main %>%
  mutate(site = as.numeric(site)) %>%  # df.main$site is character; match df.sample
  full_join(df.sample, by = c("site", "sub.site"))
> joining.output
month site sub.site abiotic
1 Jan 1 1 7.235221
2 Jan 1 1 4.697654
3 Jan 1 1 5.502573
...
28 Feb 1 1 7.235221
29 Feb 1 1 4.697654
30 Feb 1 1 5.502573
...
55 Mar 1 1 7.235221
56 Mar 1 1 4.697654
57 Mar 1 1 5.502573
Then to sample 1 row for each site and sub.site combination for each month, just group by the 3 variables and sample.
Here is the code that puts everything together:
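library(dplyr)
output <- df.main %>%
  mutate(site = as.numeric(site)) %>%  # make the key types match before joining
  full_join(df.sample, by = c("site", "sub.site")) %>%
  group_by(month, site, sub.site) %>%
  slice_sample(n = 1) %>%              # one random abiotic value per group
  ungroup()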
P.S. In your example, df.main$site is a character vector but df.sample$site is numeric. dplyr will not join keys of incompatible types, so convert the character column to numeric with as.numeric() (or as.double()) before joining.
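For comparison, the same sampling can be done in base R without a join. This is a different technique from the full join above: a sketch that, for each row of df.main, draws one value from the matching site/sub.site pool in df.sample. It rebuilds both data frames with the question's layout but with site stored as numeric in both, and pick() is a hypothetical helper name.

```r
set.seed(111)

# Same layout as the question's data, with site numeric in both tables
df.main <- data.frame(month    = rep(c("Jan", "Feb", "Mar"), each = 9),
                      site     = rep(rep(1:3, each = 3), times = 3),
                      sub.site = rep(1:3, times = 9))
df.sample <- data.frame(site     = rep(1:3, times = 9),
                        sub.site = rep(1:3, each = 9),
                        abiotic  = rnorm(27, 7, 1))

# Hypothetical helper: draw one abiotic value for a given site and sub-site.
# sample.int() avoids sample()'s surprising behaviour on length-1 inputs.
pick <- function(s, ss) {
  pool <- df.sample$abiotic[df.sample$site == s & df.sample$sub.site == ss]
  pool[sample.int(length(pool), 1)]
}

# One independent draw per row, so the same value can recur across months
df.main$abiotic <- mapply(pick, df.main$site, df.main$sub.site)
head(df.main)
```

Because every row draws independently, this is sampling with replacement; the dplyr version gives the same kind of result with different random values.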

Count number of days since a specific date [duplicate]

This question already has answers here:
Get the difference between dates in terms of weeks, months, quarters, and years
(9 answers)
Closed 6 years ago.
I have a data frame with a column Date whose observations range from 1974-10-01 to 2014-09-30. I would like to create a new column ("Day") in the data frame that gives the number of days since the first day of the period (i.e. 1974-10-01).
I already have code that worked perfectly for a very similar data frame, but I do not know why it does not work with this second one.
1) The code is the following:
library(lubridate)
ref_date <- dmy("01-10-1974")
df$Day <- as.numeric(difftime(df$Date, ref_date))
2) The first rows of my dataframe are:
Code Area Date Height
1 2001 551.4 1975-04-01 120.209
2 2001 551.4 1976-01-06 158.699
3 2001 551.4 1977-01-21 128.289
4 2001 551.4 1978-02-23 198.254
5 2001 551.4 1979-07-31 131.811
[....]
3) What I obtain with my code (1) is the following:
Code Area Date Day Height
1 2001 551.4 1975-04-01 15724800 120.209
2 2001 551.4 1976-01-06 39916800 158.699
3 2001 551.4 1977-01-21 72835200 128.289
4 2001 551.4 1978-02-23 107222400 198.254
5 2001 551.4 1979-07-31 152409600 131.811
[....]
I spent more than 2 hours on this without finding any clue.
Any suggestions?
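Another option is to set the units explicitly (note the argument order: later date first):
df$Day <- as.numeric(difftime(df$Date, ref_date, units = "days"))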
Are you looking for something like the example below?
df <- data.frame(Date = c("1975-04-01"))
> df
Date
1 1975-04-01
df$new_col <- as.Date(as.character(df$Date), format="%Y-%m-%d") - as.Date(as.character("1974-10-01"), format="%Y-%m-%d")
> df
Date new_col
1 1975-04-01 182 days
>
Your code seems to work as long as the Date is a character column.
library(lubridate)
ref_date <- dmy("01-10-1974")
df<- data.frame(Code=2001, Area=551.4, Date=c("1975-04-01","1976-01-06","1977-01-21","1978-02-23","1979-07-31"), Height=c(120.209, 158.699, 128.289, 198.254, 131.811))
df$Day <- as.numeric(difftime(df$Date, ref_date))
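A likely cause, offered as an explanation rather than a confirmed diagnosis since the full data are not shown: with the default units = "auto", difftime() picks the unit from the smallest absolute difference and falls back to seconds when that difference is under a minute. The question says the Date column starts at 1974-10-01, the same day as ref_date, so that row's difference is 0 and the whole column comes back in seconds (15724800 / 86400 = 182 days). The sketch uses base as.Date() in place of lubridate's dmy() (both return a Date) and made-up rows.

```r
ref_date <- as.Date("1974-10-01")   # same value as dmy("01-10-1974")

# Hypothetical rows; the first one falls on ref_date itself
df <- data.frame(Date = as.Date(c("1974-10-01", "1975-04-01", "1976-01-06")))

# units = "auto": the smallest gap is 0 seconds, so R reports seconds
auto_diff <- difftime(df$Date, ref_date)
units(auto_diff)   # "secs"

# Fixing the unit makes the result independent of the data
df$Day <- as.numeric(difftime(df$Date, ref_date, units = "days"))
df$Day             # 0 182 462
```

For two Date objects, plain subtraction (df$Date - ref_date) also returns a difftime in days.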

New column from non-standard date factor in R

I have a dataframe with an oddly formatted dates column. I'd like to create a column just showing the year from the original date column and I am having trouble coming up with a way to do this because the current date column is being treated as a factor. Any advice on how to do this efficiently would be appreciated.
Example
starting with:
org <- c("a","b","c","d")
country <- c("1","2","3","4")
date <- c("01-09-14","01-10-07","11-31-99","10-31-12")
toy <- data.frame(cbind(org,country,date))
toy
org country date
1 a 1 01-09-14
2 b 2 01-10-07
3 c 3 11-31-99
4 d 4 10-31-12
str(toy$date)
Factor w/ 4 levels "01-09-14","01-10-07",..: 1 2 4 3
Desired result:
org country Year
1 a 1 2014
2 b 2 2007
3 c 3 1999
4 d 4 2012
This should work:
transform(toy,Year=format(strptime(date,"%m-%d-%y"),"%Y"))
This produces
## org country date Year
## 1 a 1 01-09-14 2014
## 2 b 2 01-10-07 2007
## 3 c 3 11-31-99 <NA>
## 4 d 4 10-31-12 2012
I initially thought that the NA value was because the %y format indicator wasn't smart enough to handle previous-century dates, but ?strptime says:
‘%y’ Year without century (00-99). On input, values 00 to 68 are
prefixed by 20 and 69 to 99 by 19 - that is the behaviour
specified by the 2004 and 2008 POSIX standards, but they do
also say ‘it is expected that in a future version the default
century inferred from a 2-digit year will change’.
implying that it should be able to handle it.
The problem is actually that 31 November doesn't exist (November has 30 days), so strptime() returns NA for "11-31-99".
(You can drop the date column at your leisure ...)
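If only the year is needed, one workaround (a sketch, not the answer's approach) is to skip full date parsing altogether, so an impossible day such as 31 November cannot turn the year into NA. It assumes every value is in mm-dd-yy form with the year last, and applies the same two-digit-year rule quoted from ?strptime.

```r
# Rebuild the question's example; date is made a factor explicitly,
# since R >= 4.0 no longer converts strings to factors by default
toy <- data.frame(org     = c("a", "b", "c", "d"),
                  country = c("1", "2", "3", "4"),
                  date    = factor(c("01-09-14", "01-10-07", "11-31-99", "10-31-12")))

# Two-digit year = text after the last "-"; as.character() undoes the factor coding
yy <- as.integer(sub(".*-", "", as.character(toy$date)))

# Same century rule ?strptime quotes for %y: 00-68 -> 20xx, 69-99 -> 19xx
toy$Year <- ifelse(yy <= 68, 2000L + yy, 1900L + yy)
toy$Year   # 2014 2007 1999 2012
```

The trade-off is that invalid dates pass through silently; the strptime() route is the one to keep if you want such rows flagged.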

as.Date is throwing a row number mismatch, but all vectors are same length

The following (CSV) dataset has 3133 rows of expenses by day between 7/1/2000 and 12/31/2014:
head(d_exp_0014)
2000 7 6 792078.595 9
2000 7 7 140065.5 9
2000 7 11 190553.2 9
2000 7 12 119208.65 9
2000 7 16 1068156.293 9
2000 7 17 0 9
2000 7 21 457828.8033 9
2000 7 26 661445.0775 9
2000 7 28 211122.82 9
2000 8 2 273575.1733 8
The columns here are Year, Month, Day, Expense, and Count (how many days of the month had an expense).
I am trying to do a forecast out to the end of 2015, and need to deal with these messy date columns so I can slice and dice xts (?) objects with dplyr. ISOdate and as.Date functions are throwing this error:
> exp <- data.frame(data = d_exp_0014, Date = as.Date(paste(Year, Month, Day), format = "m%/d%/Y%"), Amount = Amount, Count = Count, t = c(1:3133))
Error in data.frame(data = d_exp_0014, Date = as.Date(paste(Year, Month, :
arguments imply differing number of rows: 3133, 3134
> length(d_exp_0014$Year)
[1] 3133
> length(d_exp_0014$Month)
[1] 3133
> length(d_exp_0014$Day)
[1] 3133
What am I doing wrong? And should I instead build a vector of 5296 continuous dates between 7/1/2000 and 12/31/2014 and merge my 3133 rows of observations to this table (thus effectively inserting '0' in the Amount column for days on which there were no payments)?
There are several errors here (though the row mismatch does not come from paste()). I'm guessing you were taught to use attach(); that is probably the source of this particular error. Start by:
detach(d_exp_0014)
d_exp_0014 <- cbind(d_exp_0014,
                    myDate = with(d_exp_0014,
                                  as.Date(paste(Year, Month, Day, sep = "/"),
                                          format = "%Y/%m/%d")))  # note: % first, then the letter
Then you can add further columns as needed.
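On the second part of the question: merging the observations onto a complete daily calendar is a reasonable way to get explicit zero-expense days. A minimal sketch with base merge(), using three rows from the head() output (column names taken from the question's description, Count omitted) and a short date range so the result is easy to check.

```r
# Three rows from the head() output, with the question's column names (Count omitted)
d_exp_0014 <- data.frame(Year    = c(2000, 2000, 2000),
                         Month   = c(7, 7, 7),
                         Day     = c(6, 7, 11),
                         Expense = c(792078.595, 140065.5, 190553.2))
d_exp_0014$myDate <- with(d_exp_0014,
                          as.Date(paste(Year, Month, Day, sep = "/"),
                                  format = "%Y/%m/%d"))

# Complete calendar; for the real data use
# seq(as.Date("2000-07-01"), as.Date("2014-12-31"), by = "day")
calendar <- data.frame(myDate = seq(as.Date("2000-07-06"),
                                    as.Date("2000-07-11"), by = "day"))

# Left join keeps every calendar day; days with no payment get NA ...
daily <- merge(calendar, d_exp_0014[, c("myDate", "Expense")],
               by = "myDate", all.x = TRUE)
# ... which are then replaced by 0, as the question proposes
daily$Expense[is.na(daily$Expense)] <- 0
daily
```

Whether 0 or NA is the right fill depends on the forecasting model: 0 says "no payment happened", NA says "unknown".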

How to compute the daily average from hourly values?

I have a text file consisting of 6 columns, as shown below. The measurements are taken every 30 minutes for several years (2001-2013). I want to compute daily averages: for example, for 2001, take all values for day 1 and compute their average, then do the same for every day of that year and for every year in the file.
To read the file:
LR <- read.table("C:\\Users\\dat.txt", sep = "", header = TRUE)
The first rows:
head(LR)
Year day hour mint valu1 valu2
1 2001 1 5 30 0 0
2 2001 1 6 0 1 0
3 2001 1 6 30 2 0
4 2001 1 7 0 0 7
5 2001 1 7 30 5 8
6 2001 1 8 0 0 0
Try:
library(plyr)
ddply(LR, .(Year, day), summarize, val = mean(valu1))
And another less elegant option:
LR$n <- paste(LR$Year, LR$day, sep="-")
tapply(LR$valu1, LR$n, FUN=mean)
If you want to select a certain range of years, use subset():
dat <- ddply(LR, .(Year, day), summarize, val = mean(valu1))
subset(dat, Year > 2003 & Year < 2005)
You can try aggregate:
res <- aggregate(LR, by = list(paste0(LR$Year, LR$day)), FUN = mean)
## You can remove the extra columns (grouping key, hour, mint) if you want
res[, -c(1,4,5)]
Or as Michael Lawrence suggests, using the formula interface:
aggregate(cbind(valu1, valu2) ~ Year + day, LR, mean)
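One detail worth knowing about the formula interface: aggregate()'s formula method defaults to na.action = na.omit, so a row with an NA in either valu1 or valu2 is dropped before either mean is computed. A sketch using the six rows shown in the question plus two hypothetical rows for day 2 (one with a missing valu2):

```r
# The six rows shown in the question, plus two hypothetical rows for day 2
LR <- data.frame(Year  = rep(2001, 8),
                 day   = c(1, 1, 1, 1, 1, 1, 2, 2),
                 hour  = c(5, 6, 6, 7, 7, 8, 0, 0),
                 mint  = c(30, 0, 30, 0, 30, 0, 0, 30),
                 valu1 = c(0, 1, 2, 0, 5, 0, 4, 6),
                 valu2 = c(0, 0, 0, 7, 8, 0, NA, 2))

# Default na.action = na.omit: the day-2 row with valu2 = NA is dropped
# entirely, so its valu1 value (4) is lost too
by_default <- aggregate(cbind(valu1, valu2) ~ Year + day, data = LR, FUN = mean)

# Keep incomplete rows and let mean() skip the NAs column by column
daily <- aggregate(cbind(valu1, valu2) ~ Year + day, data = LR,
                   FUN = mean, na.rm = TRUE, na.action = na.pass)
by_default
daily
```

Here the day-2 mean of valu1 is 6 with the default and 5 with na.pass; which one you want depends on whether a missing valu2 should invalidate the whole half-hour record.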
