I have a data frame with 3262 rows and 10 columns. One of the columns holds dates in YYYY-MM-DD format. I want to store all the rows that match 10 specific dates in a different data frame, so I tried:
newdata= df[df$Date %in% as.Date(c('2011-08-05','2012-1-13','2012-03-2','2014-04-01')),]
but it returned nothing. I thought I might need to specify the format again, so I tried:
df$Date <- as.Date( as.character(df$Date), "%d-%m-%y")
newdata= df[df$Date %in% as.Date(c('2011-08-05','2012-1-13','2012-03-2','2014-04-01')),]
All I get is an empty data frame with the message "no data available in table". Where did I go wrong (something stupid, I guess)?
I created an example:
set.seed(1)
df=data.frame(col1=seq(1,10),
col2=seq(1,0),
Date=as.Date(floor(runif(min=15550,max=17000,n=10)),origin="1970-01-01"))
> df
col1 col2 Date
1 1 1 2013-08-17
2 2 0 2014-01-19
3 3 1 2014-11-06
4 4 0 2016-03-06
5 5 1 2013-05-17
6 6 0 2016-02-21
7 7 1 2016-04-28
8 8 0 2015-03-14
9 9 1 2015-01-27
10 10 0 2012-10-26
Using the same code you provided:
newdata= df[df$Date %in% as.Date(c('2013-08-17','2015-01-27')),]
Gives me
> newdata
col1 col2 Date
1 1 1 2013-08-17
9 9 1 2015-01-27
Are you sure that df$Date really has class Date? Check with str(df$Date):
> str(df$Date)
Date[1:10], format: "2013-08-17" "2014-01-19" "2014-11-06" "2016-03-06" "2013-05-17" "2016-02-21" "2016-04-28" "2015-03-14" "2015-01-27" "2012-10-26"
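A minimal sketch of one likely cause, assuming the original column was stored as text in year-month-day order (the vector raw below is made up for illustration). A format string whose fields are in the wrong order, such as "%d-%m-%y", does not describe those strings, so as.Date() returns NA and %in% finds no matches:

```r
# Hypothetical raw values, assumed to be character strings in YYYY-MM-DD order
raw <- c("2011-08-05", "2012-01-13", "2012-03-02", "2014-04-01", "2015-06-30")

# Day-month-two-digit-year does not describe these strings, so parsing fails
wrong <- as.Date(raw, format = "%d-%m-%y")
# Four-digit year first matches the data
right <- as.Date(raw, format = "%Y-%m-%d")

wanted <- as.Date(c("2011-08-05", "2014-04-01"))
n_wrong <- sum(wrong %in% wanted)  # no rows would be selected
n_right <- sum(right %in% wanted)  # the two matching rows would be selected
```

If the column is already of class Date, no re-parsing is needed at all.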
I am trying to use the prepData function in the R package moveHMM. I am getting "Error in prepData(x, coordNames = c("lon", "lat")) : Each animal's obervations must be contiguous."
x is a data.frame with column names "ID", "lon", "lat". The ID column holds the name of each animal as a character, and lon/lat are numeric. There are no NA values and no missing rows.
I do not know what this error means or how to fix it. Help please.
x <- data.frame(dat$ID, dat$lon, dat$lat)
hmmgps <- prepData(x, coordNames=c("lon", "lat"))
The function prepData assumes that the rows for each track (or each animal) are grouped together in the data frame. The error message indicates that it is not the case, and that at least one track is split. For example, the following (artificial) data set would cause this error:
> data
ID lon lat
1 1 54.08658 12.190313
2 1 54.20608 12.101203
3 1 54.18977 12.270896
4 2 55.79217 9.943341
5 2 55.88145 9.986028
6 2 55.91742 9.887342
7 1 54.25305 12.374541
8 1 54.28061 12.190078
This is because the track with ID "1" is split into two parts, separated by the track with ID "2".
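As a rough way to spot this before calling prepData, here is a base R sketch (not part of moveHMM; the names id, runs and split_ids are illustrative) that lists every ID whose rows appear in more than one run:

```r
# Artificial IDs in the same order as the example above
id <- c(1, 1, 1, 2, 2, 2, 1, 1)

# rle() collapses consecutive repeats; an ID that starts more than one run is split
runs <- rle(as.character(id))
split_ids <- unique(runs$values[duplicated(runs$values)])
split_ids
```

An empty result would mean that every ID forms a single contiguous block.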
The tracks need to be contiguous, i.e. all observations with ID "1" should come first, followed by all observations with ID "2". One possible solution would be to order the data by ID and by date.
Consider the same data set, with a "date" column:
> data
ID lon lat date
1 1 54.08658 12.190313 2019-09-06 14:20:00
2 1 54.20608 12.101203 2019-09-06 15:20:00
3 1 54.18977 12.270896 2019-09-06 16:20:00
4 2 55.79217 9.943341 2019-09-04 07:55:00
5 2 55.88145 9.986028 2019-09-04 08:55:00
6 2 55.91742 9.887342 2019-09-04 09:55:00
7 1 54.25305 12.374541 2019-09-06 17:20:00
8 1 54.28061 12.190078 2019-09-06 18:20:00
Using order(), you can define the ordered data set with:
> data_ordered <- data[with(data, order(ID, date)),]
> data_ordered
ID lon lat date
1 1 54.08658 12.190313 2019-09-06 14:20:00
2 1 54.20608 12.101203 2019-09-06 15:20:00
3 1 54.18977 12.270896 2019-09-06 16:20:00
7 1 54.25305 12.374541 2019-09-06 17:20:00
8 1 54.28061 12.190078 2019-09-06 18:20:00
4 2 55.79217 9.943341 2019-09-04 07:55:00
5 2 55.88145 9.986028 2019-09-04 08:55:00
6 2 55.91742 9.887342 2019-09-04 09:55:00
Then, the ordered data (excluding the date column) can be passed to prepData:
> hmmgps <- prepData(data_ordered[,1:3], coordNames = c("lon", "lat"))
> hmmgps
ID step angle x y
1 1 16.32042 NA 54.08658 12.190313
2 1 18.85560 2.3133191 54.20608 12.101203
3 1 13.37296 -0.6347523 54.18977 12.270896
4 1 20.62507 -2.4551318 54.25305 12.374541
5 1 NA NA 54.28061 12.190078
6 2 10.86906 NA 55.79217 9.943341
7 2 11.60618 -1.6734604 55.88145 9.986028
8 2 NA NA 55.91742 9.887342
I hope that this helps.
I'm working on time-series analyses and I'm hoping to develop multiple datasets with different units of analysis. Namely: the units in data set 1 will be districts in country X for 2-week periods within a span of 4 years (districtYearPeriodCode), the units in data set 2 will be districts in country X for 4-week periods within a span of 4 years, and so forth.
I have created a number of data frames containing start and end dates for each interval, as well as an interval ID. The one below is for the 2-week intervals.
begin <- seq(ymd('2004-01-01'),ymd('2004-06-30'), by = as.difftime(weeks(2)))
end <- seq(ymd('2004-01-14'),ymd('2004-06-30'), by = as.difftime(weeks(2)))
interval <- seq(1,13,1)
df2 <- data.frame(begin, end, interval)
begin end interval
1 2004-01-01 2004-01-14 1
2 2004-01-15 2004-01-28 2
3 2004-01-29 2004-02-11 3
4 2004-02-12 2004-02-25 4
5 2004-02-26 2004-03-10 5
6 2004-03-11 2004-03-24 6
7 2004-03-25 2004-04-07 7
8 2004-04-08 2004-04-21 8
9 2004-04-22 2004-05-05 9
10 2004-05-06 2004-05-19 10
11 2004-05-20 2004-06-02 11
12 2004-06-03 2004-06-16 12
13 2004-06-17 2004-06-30 13
In addition to this I have a data frame that contains observations for events, dates included. It looks something like this:
new.df3 <- data.frame(dates5, districts5)
new.df3
dates5 districts5
1 2004-01-01 d1
2 2004-01-02 d2
3 2004-01-03 d3
4 2004-01-04 d4
5 2004-01-05 d5
Is there a function I can write or a command I can use to end up with something like this?
dates5 districts5 interval5
1 2004-01-01 d1 1
2 2004-01-02 d2 1
3 2004-01-03 d3 1
4 2004-01-04 d4 1
5 2004-01-05 d5 1
I have been trying to find an answer in the lubridate package and in other threads, but all the answers seem to be tailored to checking whether a date falls within one specific time interval, rather than identifying which interval, out of a group of intervals, a date falls into.
Much appreciated!
I used the purrr approach outlined by @alistair in a related answer. I reproduce it below:
elements %>%
map(~intervals$phase[.x >= intervals$start & .x <= intervals$end]) %>%
# Clean up a bit. Shorter, but less readable: map_chr(~.x[1] %||% NA)
map_chr(~ifelse(length(.x) == 0, NA, .x))
## [1] "a" "a" "a" NA "b" "b" "c"
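For non-overlapping, sorted intervals like the 2-week table in the question, a base R alternative is findInterval(). The sketch below is a different technique from the purrr code above, and the helper assign_interval() and the data frame names are made up for illustration. It assumes the begin column is sorted in increasing order.

```r
# Rebuild the 2-week intervals from the question with base R only
begin <- seq(as.Date("2004-01-01"), as.Date("2004-06-30"), by = "2 weeks")
intervals <- data.frame(begin = begin,
                        end = begin + 13,
                        interval = seq_along(begin))

# Hypothetical helper: returns the interval ID for each date, or NA if none matches
assign_interval <- function(dates, intervals) {
  # Index of the last interval whose begin date is <= the date (0 if none)
  idx <- findInterval(as.numeric(dates), as.numeric(intervals$begin))
  idx[idx == 0] <- NA
  # Guard against dates that fall after the end of the matched interval
  inside <- !is.na(idx) & dates <= intervals$end[idx]
  ifelse(inside, intervals$interval[idx], NA_integer_)
}

events <- data.frame(dates5 = as.Date("2004-01-01") + 0:4,
                     districts5 = paste0("d", 1:5))
events$interval5 <- assign_interval(events$dates5, intervals)
events
```

Because findInterval() uses a binary search, this should scale better than comparing every date against every interval, at the cost of requiring sorted, non-overlapping intervals.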
I have a dataframe in R, which has two variables that are dates and I need to calculate the difference in days between them. However, they are formatted as YYYYMMDD. How do I change it to a date format readable in R?
This should work
lubridate::ymd(given_date_format)
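If you prefer to stay in base R, a minimal sketch looks like this. The data frame d and its column names are invented for illustration, and the values are assumed to be integers (or strings) in YYYYMMDD form:

```r
# Hypothetical data: two date columns stored as YYYYMMDD integers
d <- data.frame(start = c(20170623L, 20170701L),
                end   = c(20170705L, 20170714L))

# Convert via character so that "%Y%m%d" can be applied
d$start_date <- as.Date(as.character(d$start), format = "%Y%m%d")
d$end_date   <- as.Date(as.character(d$end),   format = "%Y%m%d")

# Difference in days as a plain number
d$days <- as.numeric(difftime(d$end_date, d$start_date, units = "days"))
d
```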
I like anydate() from the anytime package. Quick demo, with actual data:
R> set.seed(123) # be reproducible
R> data <- data.frame(inp=Sys.Date() + cumsum(runif(10)*10))
R> data$ymd <- format(data$inp, "%Y%m%d") ## as yyyymmdd
R> data$int <- as.integer(data$ymd) ## same as integer
R> library(anytime)
R> data$diff1 <- c(NA, diff(anydate(data$ymd))) # reads YMD
R> data$diff2 <- c(NA, diff(anydate(data$int))) # also reads int
R> data
inp ymd int diff1 diff2
1 2017-06-23 20170623 20170623 NA NA
2 2017-07-01 20170701 20170701 8 8
3 2017-07-05 20170705 20170705 4 4
4 2017-07-14 20170714 20170714 9 9
5 2017-07-24 20170724 20170724 10 10
6 2017-07-24 20170724 20170724 0 0
7 2017-07-29 20170729 20170729 5 5
8 2017-08-07 20170807 20170807 9 9
9 2017-08-13 20170813 20170813 6 6
10 2017-08-17 20170817 20170817 4 4
R>
Here the first column holds the actual dates we work from. Columns two and three are then generated to match OP's requirement: YMD, either as character or as integer.
We then compute the differences, pad with NA for the first row (which has no predecessor), and show that either input format works.
I have a table including a time series of daily values and a variable "x":
# x<-100
# date user_id index re
# 1 2013-11-07 ff268cef0c29 1
# 2 2013-11-02 12bb7af7a842 1
# 3 2013-11-30 e45abb10ae0b 1
# 4 2013-11-06 e45abb10ae0b 2
# 5 2013-11-25 f266f8c9580e 1
The date column has been converted to class Date with as.Date().
Now I want to add "x" to the value in column "re" on a specific day, e.g. on 01.04. of every year in the time series.
What is the best way to do that?
Thanks for help!
df <- data.frame(date=as.Date(c("2013-04-01",
"2013-04-02",
"2014-04-01")),
re=1:3)
x <- -100
i <- which(format(df$date, "%d.%m.") == "01.04.")
df$re[i] <- df$re[i] + x
# date re
# 1 2013-04-01 -99
# 2 2013-04-02 2
# 3 2014-04-01 -97
I have a question that might be too basic, but here it is...
I want to extract monthly data from a dataset like this:
Date Obs
1 2001-01-01 120
2 2001-01-02 100
3 2001-01-03 150
4 2001-01-04 175
5 2001-01-05 121
6 2001-01-06 100
I just want to get the rows from the data for a certain month (e.g. January). This works perfectly:
output=which(strftime(dataset[,1],"%m")=="01",dataset[,1])
However, when I try to create a loop to go through all the months using a variable that is declared as character, it doesn't work and I only get "FALSE".
value=as.character(k)
output=which(strftime(dataset[,1],"%m")==value,dataset[,1])
Do not parse dates as strings. That is too error prone. Parse dates as dates, and do logical comparisons on them.
Here is one approach, creating January to March data and sub-setting February based on a comparison:
R> output <- data.frame(date=seq(as.Date("2011-01-01"), by=7, length=10),
+ value=cumsum(runif(10)*100))
R> output
date value
1 2011-01-01 8.29916
2 2011-01-08 44.82950
3 2011-01-15 72.08662
4 2011-01-22 134.19277
5 2011-01-29 221.67744
6 2011-02-05 245.77195
7 2011-02-12 314.82081
8 2011-02-19 396.34661
9 2011-02-26 437.14286
10 2011-03-05 442.41321
R> output[ output[,"date"] >= as.Date("2011-02-01") &
+ output[,"date"] <= as.Date("2011-02-28"), ]
date value
6 2011-02-05 245.772
7 2011-02-12 314.821
8 2011-02-19 396.347
9 2011-02-26 437.143
R>
Another approach uses the xts package:
R> oo <- xts(output[,"value"], order.by=output[,"date"])
R> oo
[,1]
2011-01-01 8.29916
2011-01-08 44.82950
2011-01-15 72.08662
2011-01-22 134.19277
2011-01-29 221.67744
2011-02-05 245.77195
2011-02-12 314.82081
2011-02-19 396.34661
2011-02-26 437.14286
2011-03-05 442.41321
R> oo["2011-02-01::2011-02-28"]
[,1]
2011-02-05 245.772
2011-02-12 314.821
2011-02-19 396.347
2011-02-26 437.143
R>
as xts has convenient date parsing for the index; see the package documentation for details.
I'm assuming k is an integer in 1:12. I suspect you may be better off using abbreviated month names:
value <- month.abb[k]
output <- which(strftime(dataset[,1], "%b") == value)
The reason your way isn't working is that the month number from "%m" is zero-padded, and "1" != "01".
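If you would rather keep comparing month numbers, one option is to zero-pad the loop variable with sprintf() so that it matches the "%m" output. This is only a sketch; the dataset below is made up (90 daily rows starting on 2001-01-01) and the name rows_by_month is illustrative:

```r
# Hypothetical daily data covering January to March 2001
dataset <- data.frame(Date = seq(as.Date("2001-01-01"), by = "day", length.out = 90),
                      Obs = seq_len(90))

# Month of each row as a zero-padded string: "01", "02", ...
month_chr <- format(dataset$Date, "%m")

# For each k in 1:12, pad k to two digits before comparing
rows_by_month <- lapply(1:12, function(k) {
  which(month_chr == sprintf("%02d", k))
})

# Number of matching rows for January to April
lengths(rows_by_month)[1:4]
```

Unlike "%b", this comparison does not depend on the locale's month names.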
You can also keep working with dates as dates, using as.POSIXlt()$mon:
as.POSIXlt(output$date)$mon # Note that Jan = 0 and Feb=1
[1] 0 0 0 0 0 1 1 1 1 2
There are several other packages such as chron, lubridate and gdata that provide date handling functions. I found the functions in lubridate particularly intuitive and less prone to errors in my clumsy hands.