I have several dates of measurement for the same specimen. I'm trying to figure out the first day I have and the 2nd, 3rd, 4th...up to 6th day.
here is the data
First I took data$start and split it
#split timestamp into separate date and time vars
temp<-strsplit(as.character(data$start), " ")
mat<-matrix(unlist(temp), ncol=2, byrow=TRUE)
df<-as.data.frame(mat)
colnames(df)<-c("date", "time")
data<-cbind(df, data)
then
data$date<-as.Date(data$date, "%Y-%m-%d")
data$dob <- ave(as.numeric(data$date), data$mcode, FUN = min)
data$dob <- data$dob - 1
data$pnday <- as.numeric(data$date) - data$dob
Both the pnday and dob columns contain NA values. Sorry if this is silly, but any ideas why?
I'm new to working with dates/times in R
If you format your data as an xts, it will automatically be ordered by date from first to last. To do this, you'll need to make your timestamp readings into POSIXct instances and pass them to the order.by parameter of its constructor. I hope this helps you. If you need further help, please leave a comment, and I'll dig into your data.
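If you would rather stay in base R, here is a minimal sketch of the day-numbering step. The data frame is made up for illustration (only the column names mcode and start come from the question), and the NA explanation is an assumption, since the real data are not shown: min() returns NA for a whole group as soon as one date in that group is missing or failed to parse.

```r
# Hypothetical data: two specimens, one unparseable/missing timestamp
data <- data.frame(
  mcode = c("A", "A", "A", "B", "B"),
  start = c("2014-03-02 10:00:00", "2014-03-01 09:30:00", NA,
            "2014-03-05 08:00:00", "2014-03-07 08:15:00"),
  stringsAsFactors = FALSE
)

# as.Date() reads the date part and ignores the trailing time
data$date <- as.Date(data$start, format = "%Y-%m-%d")

# Without na.rm, a single NA makes the minimum NA for every row of that group
dob_plain <- ave(as.numeric(data$date), data$mcode, FUN = min)

# With na.rm = TRUE, only the row with the missing date stays NA
first_day <- ave(as.numeric(data$date), data$mcode,
                 FUN = function(d) min(d, na.rm = TRUE))

# Day 1 is the first measurement day of each specimen
data$pnday <- as.numeric(data$date) - first_day + 1
print(data)
```

If every date in a group is NA, min(..., na.rm = TRUE) returns Inf with a warning, so it is still worth checking which timestamps failed to parse.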
I know this question has probably been answered in different ways, but I am still struggling with it. I am working with a dataset where the first date column, date1, holds values like '2/1/2000', '5/12/2000', '6/30/2015' and its class() is character. The second date column, date2, holds values like '2015-07-06', '2015-08-01', '2017-10-09' and its class() is "POSIXct" "POSIXt".
I am attempting to standardize both columns so I can compute the difference in days between them using something like this
abs(difftime(date1 ,date2 , units = c("days")))
I have tried numerous ways of converting date1 into the same class, using strptime, lubridate etc. What's the best way to standardize both columns and compute the difference in days?
sample data
x <- c('2/1/2000', '5/12/2000', '6/30/2015')
y <- as.POSIXct(c('2015-07-06', '2015-08-01', '2017-10-09'))
code
#make both posixct
x2 <- as.POSIXct(x, format = "%m/%d/%Y")
abs(x2 - y)
# Time differences in days
# [1] 5633.958 5559.000 832.000
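The fractional first value (5633.958) most likely comes from a daylight saving time shift between the two timestamps, since POSIXct differences are computed from seconds. If you only care about whole calendar days, a sketch that converts both sides to Date first avoids that. The names x_date, y_date and day_diff are just illustrative.

```r
x <- c("2/1/2000", "5/12/2000", "6/30/2015")
y <- as.POSIXct(c("2015-07-06", "2015-08-01", "2017-10-09"))

# Parse the character column directly as Date
x_date <- as.Date(x, format = "%m/%d/%Y")

# Format the POSIXct values in their own time zone first, so the calendar
# date is kept as written, then convert to Date
y_date <- as.Date(format(y, "%Y-%m-%d"))

# Date differences are whole days, with no DST effect
day_diff <- abs(as.numeric(difftime(y_date, x_date, units = "days")))
print(day_diff)
```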
I have a data.frame in R which includes two variables, a Start-Date and an End-Date. I would like to add a new column with the number of days between the two dates, reduced by the number of Sundays in each interval. I tried it as shown below, but it doesn't work:
Data$Start <- as.Date(Data$Start, "%d.%m.%y")
Data$End <- as.Date(Data$End,"%d.%m.%y")
interval <- difftime(Data$Start, Data$End, units = "days")
sundays <- seq(from = Data$Start, to = Data$End, by = "days")
number.sundays <- length(which(wday(sundays)==1))
Data$DaysAhead <- interval - number.sundays
I get an error message from the seq() function saying that the arguments must have length 1, but I don't understand how to handle this. Can someone help me out with that?
Here's an example that works:
Data <- data.frame(
Start = c("01.01.2020", "01.06.2020"),
End = c("01.03.2020", "01.09.2020")
)
Data$Start <- as.Date(Data$Start, "%d.%m.%Y")
Data$End <- as.Date(Data$End,"%d.%m.%Y")
interval <- difftime(Data$End, Data$Start, units = "days")
sundays <- lapply(1:nrow(Data), function(i)seq(from = Data$Start[i], to = Data$End[i], by = "days"))
number.sundays <- sapply(sundays, function(x)length(which(lubridate::wday(x)==1)))
Data$DaysAhead <- interval - number.sundays
The problem is that seq() isn't vectorized: it assumes a single start and a single end point. If you put it inside a loop (like lapply()), it will work and generate the relevant sequence for each start and end date. Then you can use sapply() to count the Sundays in each sequence. Since each count is a scalar, sapply() returns a vector of the same length as interval.
I realized with an updated data set that there's a problem with the solution above when Start-Date and End-Date aren't in the same year. I still want to count the days except Sundays, for example from 20.12.2020 until 10.01.2021. The error message in that case says that the sign of the "by" argument is wrong, and I can't get it running. If I swap the dates, the output makes no sense and the number of days is too high. What do I have to do to get this running across the year-end?
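One likely cause, assuming the dates are still parsed with "%d.%m.%y" as in the original question: "%y" reads only two year digits, so "2021" is read as "20" and becomes 2020. The End date then lies before the Start date, which is what seq() complains about. The sketch below uses "%Y" instead and base R only; count_days_without_sundays is a made-up helper name.

```r
# "%y" stops after two digits; "%Y" reads the full four-digit year
as.Date("10.01.2021", format = "%d.%m.%y")  # parsed as 2020-01-10
as.Date("10.01.2021", format = "%d.%m.%Y")  # parsed as 2021-01-10

# Days between start and end, minus the Sundays in [start, end]
count_days_without_sundays <- function(start, end) {
  days <- seq(from = start, to = end, by = "day")
  is_sunday <- as.POSIXlt(days)$wday == 0  # wday: 0 = Sunday
  as.numeric(end - start) - sum(is_sunday)
}

Data <- data.frame(Start = "20.12.2020", End = "10.01.2021")
Data$Start <- as.Date(Data$Start, format = "%d.%m.%Y")
Data$End   <- as.Date(Data$End,   format = "%d.%m.%Y")

# seq() takes one start and one end, so loop over the rows
Data$DaysAhead <- sapply(seq_len(nrow(Data)), function(i) {
  count_days_without_sundays(Data$Start[i], Data$End[i])
})
print(Data)
```

For this interval there are 21 days between the dates and 4 Sundays (20.12., 27.12., 03.01., 10.01.), so DaysAhead is 17.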
I want to automate code that calculates transport times. From a large readout covering a year, the code should take four months that the user can choose, split the last of those months into its four weeks, and describe the resulting data subsets (describing them is not the problem).
Generating subsets from a dataset for chosen months is not the problem because I can define the months.
But where I struggle is the 3/4 weeks of the last month. I need to identify them automatically and after that generate the subsets. (I hope that generating subsets should be easier after identifying.)
I can give you a little mock-up of my data.
dates <- as.Date(c("2019-01-07", "2019-01-08", "2019-01-09",
"2019-01-15", "2019-01-21"))
number <- c(12,13,14,15,20)
df <- data.frame(number, dates)
The original df consists of 60 variables, but I believe this simple mock-up provides enough info for the task.
I am pretty new to R and have no idea how to solve the problem. I will show you how I solved it for the months, but as I said, in that case they are defined.
function(data = df, m1 = "01" , m2 = "02") {
Monat1 <- subset(data, format.Date(dates , "%m") == m1)
Thank you for helping me out a bit.
You can use the function strftime:
strftime(df$dates, format = "%W")
In RStudio, use
?strftime
to see all the different values you can extract from a Date or POSIXct object.
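As a rough sketch of how the week number could then drive the subsetting, using the mock-up data from the question: "%W" counts weeks starting on Monday, and the month "01" and the names last_month and by_week are only illustrative.

```r
dates <- as.Date(c("2019-01-07", "2019-01-08", "2019-01-09",
                   "2019-01-15", "2019-01-21"))
number <- c(12, 13, 14, 15, 20)
df <- data.frame(number, dates)

# Week of the year as a two-digit string, Monday as first day of the week
df$week <- strftime(df$dates, format = "%W")

# Keep the month of interest, then split it into one data frame per week
last_month <- subset(df, format(dates, "%m") == "01")
by_week <- split(last_month, last_month$week)

# Describe each weekly subset
lapply(by_week, function(w) summary(w$number))
```

Because the weeks are found from the data, this works whether the month spans three, four or five calendar weeks.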
You can do it using base R and lubridate
Data
dates <- as.Date(c("2019-01-07", "2019-01-08", "2019-01-09",
"2019-01-15", "2019-01-21"))
number <- c(12,13,14,15,20)
df <- data.frame(number, dates)
str(df)
Answer
library(lubridate)
df$condition <- ifelse(month(df$dates) == month(Sys.Date())-1,week(df$dates),"-")
condition compares the month of each date with the calendar month before the current one. Where they match, it gives you the week number for that date; otherwise it gives "-".
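One thing to watch with this approach: month(Sys.Date()) - 1 is 0 in January, and the comparison ignores the year. A base R sketch that sidesteps both is below. The function name weeks_of_previous_month and its today argument are made up for illustration, and it uses the "%W" week number rather than lubridate's week(), so the numbers can differ from the answer above.

```r
weeks_of_previous_month <- function(dates, today = Sys.Date()) {
  # First day of the current month, then step back one month
  first_of_this_month <- as.Date(format(today, "%Y-%m-01"))
  first_of_prev_month <- seq(first_of_this_month, by = "-1 month",
                             length.out = 2)[2]
  # Compare year and month together, so January and year changes are handled
  in_prev <- format(dates, "%Y-%m") == format(first_of_prev_month, "%Y-%m")
  ifelse(in_prev, format(dates, "%W"), NA_character_)
}

dates <- as.Date(c("2019-01-07", "2019-01-08", "2019-01-09",
                   "2019-01-15", "2019-01-21"))

# Pretend "today" is in February 2019, so January 2019 is the previous month
res_feb <- weeks_of_previous_month(dates, today = as.Date("2019-02-10"))
# In January 2019 the previous month is December 2018, so nothing matches
res_jan <- weeks_of_previous_month(dates, today = as.Date("2019-01-10"))
print(res_feb)
```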
For annual data (such as Annual Income Statements), I would like to keep xts format but I need to convert the index of the table to "only year". There are yearmon and yearqtr classes but I did not find "year" only class to work with xts.
# IS is annual reports of incomes. time(IS) is POSIXct.
library(quantmod)
IS <- viewFin(get(getFin("IBM")), "IS", "A") # Download data
IS <- as.xts(t(IS)) # Convert to xts
time(IS) <- as.yearqtr(time(IS)) ## works to have quarterly index
time(IS) <- as.yearmon(time(IS)) ## works to have monthly index
time(IS) <- ????(time(IS)) ## To have yearly index with xts class
What is the best solution? Thank you.
It would be helpful if you explained why you need to have the index as "year" only. xts has an indexFormat command that allows you to control how dates are displayed. I've never used it, but I assume it will let you display only the year of any given index entry.
A more extreme solution would be to convert every date to the first of the year in that year. Here's some code to help do this:
first.of.year <- function(x) {
  # Given a date, return January 1 of the same year (base R only)
  as.Date(paste0(format(as.Date(x), "%Y"), "-01-01"))
}
# x is your xts object
index(x) <- first.of.year(index(x))
I am trying to understand why R behaves differently with the "aggregate" function. I wanted to average 15m-data to hourly data. For this, I passed the 15m-data together with a pre-designed "hour" array (4 times the same date per hour, taking the original POSIXct array) to the aggregate function.
After some time, I realized that the function was behaving oddly (well, probably the data was odd, but why?) when I passed the date array as
strftime(data.15min$posix, format="%Y-%m-%d %H")
However, if I handed over the data with
cut(data.15min$posix, "1 hour")
the data was averaged correctly.
Below, a minimal example is embedded, including a sample of the data.
I would be happy to understand what I did wrong.
Thanks in advance!
d <- 3
bla <- read.table("test_daten.dat",header=TRUE,sep=",")
data.15min <- NULL
data.15min$posix <- as.POSIXct(bla$dates,tz="UTC")
data.15min$o3 <- bla$o3
hourtimes <- unique(as.POSIXct(paste(strftime(data.15min$posix, format="%Y-%m-%d %H"),":00:00",sep=""),tz="Universal"))
agg.mean <- function (xx, yy, rm.na = T)
# xx: parameter that determines the aggregation: list(xx), e.g. hour etc.
# yy: parameter that will be aggregated
{
aa <- yy
out.mean <- aggregate(aa, list(xx), FUN = mean, na.rm=rm.na)
out.mean <- out.mean[,2]
}
#############
data.o3.hour.mean <- round(agg.mean(strftime(data.15min$posix, format="%m/%d/%y %H"), data.15min$o3), d); data.o3.hour.mean[1:100]
win.graph(10,5)
par(mar=c(5,15,4,2), new =T)
plot(data.15min$posix,data.15min$o3,col=3,type="l",ylim=c(10,60)) # original data
par(mar=c(5,15,4,2), new =T)
plot(data.date.hour_mean,data.o3.hour.mean,col=5,type="l",ylim=c(10,60)) # Wrong
##############
data.o3.hour.mean <- round(agg.mean(cut(data.15min$posix, "1 hour"), data.15min$o3), d); data.o3.hour.mean[1:100]
win.graph(10,5)
par(mar=c(5,15,4,2), new =T)
plot(data.15min$posix,data.15min$o3,col=3,type="l",ylim=c(10,60)) # original data
par(mar=c(5,15,4,2), new =T)
plot(data.date.hour_mean,data.o3.hour.mean,col=5,type="l",ylim=c(10,60)) # Correct
Data:
Download data
Too long for a comment.
The reason your results look different is that aggregate(...) sorts the results by your grouping variable(s). In the first case,
strftime(data.15min$posix, format="%m/%d/%y %H")
is a character vector with poorly formatted dates (they do not sort properly). So the first row corresponds to the "date" "01/01/96 00".
In your second case,
cut(data.15min$posix, "1 hour")
returns a factor whose levels are in chronological order, so the groups sort properly. The first row therefore corresponds to the date 1995-11-04 13:00:00.
If you had used
strftime(data.15min$posix, format="%Y-%m-%d %H")
in your first case you would have gotten the same result as using cut(...)
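A small made-up example (three timestamps and three values, not your data) shows the effect. The month-first strings sort alphabetically, so the 1996 hour ends up first, while the year-first strings keep chronological order.

```r
posix <- as.POSIXct(c("1995-11-04 13:00:00", "1995-12-31 23:00:00",
                      "1996-01-01 00:00:00"), tz = "UTC")
o3 <- c(10, 20, 30)

bad  <- strftime(posix, format = "%m/%d/%y %H", tz = "UTC")
good <- strftime(posix, format = "%Y-%m-%d %H", tz = "UTC")

# Alphabetical order of the labels versus the chronological order 1, 2, 3
order(bad)   # 3 1 2
order(good)  # 1 2 3

# aggregate() returns the groups in sorted label order
agg_bad  <- aggregate(o3, list(hour = bad),  FUN = mean)
agg_good <- aggregate(o3, list(hour = good), FUN = mean)
print(agg_bad)
print(agg_good)
```

If you then plot the aggregated values against a separately built, chronologically ordered time vector, the values from the month-first grouping are paired with the wrong hours.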