How to assign a value depending on two conditions including column names. (add environmental variable to tracking data) - r

I have a data frame (track) with the position (longitude, latitude) and date (day of the year) of tracking points for different animals, and another data frame (Var) which gives the mean temperature for every day of the year at different locations.
I would like to add a new column Temp to track, where the value is taken from Var and corresponds to the date and GPS location of each tracking point in track.
Here is a really simple subset of my data and what I would like to obtain.
track = data.frame(
animals=c(1,1,1,2,2),
Longitude=c(117,116,117,117,116),
Latitude=c(18,20,20,18,20),
Day=c(1,3,4,1,5))
Var = data.frame(
Longitude=c(117,117,116,116),
Latitude=c(18,20,18,20),
Day1=c(22,23,24,21),
Day2=c(21,28,27,29),
Day3=c(12,13,14,11),
Day4=c(17,19,20,23),
Day5=c(32,33,34,31)
)
TrackPlusVar = data.frame(
animals=c(1,1,1,2,2),
Longitude=c(117,116,117,117,116),
Latitude=c(18,20,20,18,20),
Day=c(1,3,4,1,5),
Temp= c(22,11,19,22,31)
)
I have no idea how to look up the value for the matching date and GPS location, because the date is encoded in the column names of Var. Any idea would be very useful!

This is a dplyr and tidyr approach.
library(dplyr)
library(tidyr)
# reshape table Var
Var %>%
gather(Day,Temp,-Longitude, -Latitude) %>%
mutate(Day = as.numeric(gsub("Day","",Day))) -> Var2
# join tables
track %>% left_join(Var2, by=c("Longitude", "Latitude", "Day"))
# animals Longitude Latitude Day Temp
# 1 1 117 18 1 22
# 2 1 116 20 3 11
# 3 1 117 20 4 19
# 4 2 117 18 1 22
# 5 2 116 20 5 31
If the process that creates your tables guarantees that every row of track has a match in Var2, you can use inner_join instead of left_join; it returns the same rows in that case and may be somewhat faster.
If you're still not happy with the speed, you can try a data.table keyed join and check whether it is faster, for example:
library(data.table)
Var2 = setDT(Var2, key = c("Longitude", "Latitude", "Day"))
track = setDT(track, key = c("Longitude", "Latitude", "Day"))
Var2[track][order(animals,Day)]
# Longitude Latitude Day Temp animals
# 1: 117 18 1 22 1
# 2: 116 20 3 11 1
# 3: 117 20 4 19 1
# 4: 117 18 1 22 2
# 5: 116 20 5 31 2
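If you would rather stay in base R, the same lookup can be sketched with matrix indexing: find the row of Var that matches each location and the column whose name matches the day, then pull out both at once. This is only a sketch and assumes the column names of Var follow the DayN pattern shown in the question; row_idx and col_idx are names made up for the illustration.

```r
# Sample data copied from the question
track <- data.frame(
  animals   = c(1, 1, 1, 2, 2),
  Longitude = c(117, 116, 117, 117, 116),
  Latitude  = c(18, 20, 20, 18, 20),
  Day       = c(1, 3, 4, 1, 5))
Var <- data.frame(
  Longitude = c(117, 117, 116, 116),
  Latitude  = c(18, 20, 18, 20),
  Day1 = c(22, 23, 24, 21),
  Day2 = c(21, 28, 27, 29),
  Day3 = c(12, 13, 14, 11),
  Day4 = c(17, 19, 20, 23),
  Day5 = c(32, 33, 34, 31))

# Row of Var whose location matches each tracking point (NA if no match)
row_idx <- match(paste(track$Longitude, track$Latitude),
                 paste(Var$Longitude, Var$Latitude))
# Column of Var whose name matches the day (assumes names like "Day1")
col_idx <- match(paste0("Day", track$Day), names(Var))

# A two-column index matrix picks one (row, column) cell per tracking point
track$Temp <- as.matrix(Var)[cbind(row_idx, col_idx)]
track
```

Tracking points with no matching location or day end up with NA in Temp, which mirrors what left_join does.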

Related

Aggregate week and date in R by some specific rules

I'm not used to using R. I already asked a question on stack overflow and got a great answer.
I'm sorry to post a similar question, but I tried many times and got the output that I didn't expect.
This time, I want to do slightly different from my previous question.
Merge two data with respect to date and week using R
I have two data frames. One has a year_month_week column and the other has a date column.
df1<-data.frame(id=c(1,1,1,2,2,2,2),
year_month_week=c(2022051,2022052,2022053,2022041,2022042,2022043,2022044),
points=c(65,58,47,21,25,27,43))
df2<-data.frame(id=c(1,1,1,2,2,2),
date=c(20220503,20220506,20220512,20220401,20220408,20220409),
temperature=c(36.1,36.3,36.6,34.3,34.9,35.3))
For df1, 2022051 means the 1st week of May 2022; likewise, 2022052 means the 2nd week of May 2022. For df2, 20220503 means May 3rd, 2022. What I want to do now is merge df1 and df2 with respect to year_month_week. In this case, 20220503 and 20220506 both fall in the 1st week of May 2022. If more than one date falls in a year_month_week, I will just keep the first of them. Now, here's the different part: even if there is no date inside a year_month_week, the row should stay, with NA as its value. So my expected output has the same number of rows as df1, which contains the column year_month_week. My expected output is as follows:
df<-data.frame(id=c(1,1,1,2,2,2,2),
year_month_week=c(2022051,2022052,2022053,2022041,2022042,2022043,2022044),
points=c(65,58,47,21,25,27,43),
temperature=c(36.1,36.6,NA,34.3,34.9,NA,NA))
First we can convert the dates in df2 into year-month-week format, then join the two tables:
library(dplyr);library(lubridate)
df2$dt = ymd(df2$date)
# (day - 1) %/% 7 + 1 puts days 1-7 in week 1, days 8-14 in week 2, and so on
df2$wk = (day(df2$dt) - 1) %/% 7 + 1
df2$year_month_week = as.numeric(paste0(format(df2$dt, "%Y%m"), df2$wk))
df1 %>%
left_join(df2 %>% group_by(year_month_week) %>% slice(1) %>%
select(year_month_week, temperature))
Result
Joining, by = "year_month_week"
id year_month_week points temperature
1 1 2022051 65 36.1
2 1 2022052 58 36.6
3 1 2022053 47 NA
4 2 2022041 21 34.3
5 2 2022042 25 34.9
6 2 2022043 27 NA
7 2 2022044 43 NA
You can build on a previous answer by taking its function for counting the week of the month and using it to generate a join key in df2.
df1 <- data.frame(
id=c(1,1,1,2,2,2,2),
year_month_week=c(2022051,2022052,2022053,2022041,2022042,2022043,2022044),
points=c(65,58,47,21,25,27,43))
df2 <- data.frame(
id=c(1,1,1,2,2,2),
date=c(20220503,20220506,20220512,20220401,20220408,20220409),
temperature=c(36.1,36.3,36.6,34.3,34.9,35.3))
# Take the function from the previous StackOverflow question
monthweeks.Date <- function(x) {
ceiling(as.numeric(format(x, "%d")) / 7)
}
# Create a year_month_week variable to join on
df2 <-
df2 %>%
mutate(
date = lubridate::parse_date_time(
x = date,
orders = "%Y%m%d"),
# format() zero-pads the month, so months 10-12 are handled correctly too
year_month_week = paste0(
format(date, "%Y%m"),
monthweeks.Date(date)),
year_month_week = as.double(year_month_week))
# Remove duplicate year_month_weeks
df2 <-
df2 %>%
arrange(year_month_week) %>%
distinct(year_month_week, .keep_all = TRUE)
# Join dataframes
df1 <-
left_join(
df1,
df2,
by = "year_month_week")
Produces this result
id.x year_month_week points id.y date temperature
1 1 2022051 65 1 2022-05-03 36.1
2 1 2022052 58 1 2022-05-12 36.6
3 1 2022053 47 NA <NA> NA
4 2 2022041 21 2 2022-04-01 34.3
5 2 2022042 25 2 2022-04-08 34.9
6 2 2022043 27 NA <NA> NA
7 2 2022044 43 NA <NA> NA
Edit: forgot to mention that you need tidyverse loaded
library(tidyverse)
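Both answers above join on year_month_week alone, which works for the sample data because each id covers a different month. As a hedged base R sketch, the key can also include id so that two ids sharing a week cannot pick up each other's temperature. The week rule used here (days 1-7 are week 1, days 8-14 are week 2, and so on) is an assumption about what "week of the month" means, and first_per_week and out are illustrative names.

```r
# Sample data copied from the question
df1 <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 2),
  year_month_week = c(2022051, 2022052, 2022053, 2022041, 2022042, 2022043, 2022044),
  points = c(65, 58, 47, 21, 25, 27, 43))
df2 <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  date = c(20220503, 20220506, 20220512, 20220401, 20220408, 20220409),
  temperature = c(36.1, 36.3, 36.6, 34.3, 34.9, 35.3))

# Parse the numeric yyyymmdd values into Date objects
d <- as.Date(as.character(df2$date), format = "%Y%m%d")
# Assumed rule: week k of a month covers days 7*(k-1)+1 to 7*k
wk <- (as.integer(format(d, "%d")) - 1) %/% 7 + 1
df2$year_month_week <- as.numeric(paste0(format(d, "%Y%m"), wk))

# Keep only the earliest date per id and week
df2 <- df2[order(df2$id, df2$date), ]
first_per_week <- df2[!duplicated(df2[c("id", "year_month_week")]),
                      c("id", "year_month_week", "temperature")]

# all.x = TRUE keeps every row of df1; weeks without a date get NA
out <- merge(df1, first_per_week, by = c("id", "year_month_week"), all.x = TRUE)
out <- out[order(out$id, out$year_month_week), ]
out
```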

How to store datasets in a list in a loop - R

Suppose I have dataset A including information on the month ("Date"):
station ID  precipitation (mm)  LONG  LAT  Date
1           70                  5     50   2010-01
1           60                  5     50   2010-02
1           61                  5     50   2010-03
2           75                  10    47   2010-01
2           65                  10    47   2010-02
2           70                  10    47   2010-03
I have a while loop that creates separate datasets from dataset A based on the month. My aim is to create a list list_months that stores all information of each dataset (i.e. month).
#used as criterium for selecting data per loop
months = c("2010-01", "2010-02","2010-03")
#used in different variable names, based on month
abr = c("jan", "feb", "mar")
#set item from list to 1 (=January)
i = 1
j = 1
#create empty list to store individual datasets that are generated for each loop
list_months = list()
#while loop. Aim: create 3 separate datasets, based on month, and store each one per loop
while(i <= length(months) && j <= length(abr))
{
#store each dataset to a different variable name, corresponding to the month
assign(paste("dataset", abr[j], sep="_"),subset(A, Date == months[i]))
#assign variable name to variable that is appended to list
ap <- paste("dataset", abr[j], sep="_")
#append variable name to list for further data processing
list_months <- append(list_months, ap)
#next loop (i.e. following month)
i = i+1
j = j+1
}
However, when I try to view the first item via view(list_months[1]) the output is as follows:
  X.dataset_jan.
1 dataset_jan
How can I store the variable name in a list in such a way that it creates the output:
station ID  precipitation (mm)  LONG  LAT  Date
1           70                  5     50   2010-01
2           75                  10    47   2010-01
Creating a new list after the loop list_months = list(dataset_jan, dataset_feb, dataset_mar) does the trick. However, I would like to store the datasets during each loop.
Thanks
This is quite straightforward using data.table's split.
library(data.table)
# make sure your data is in data.table format (mydata stands for your dataset A)
setDT(mydata)
# use split.data.table to split on a certain column
split(mydata, by = "Date")
# $`2010-01`
# station_ID precipitation LONG LAT Date
# 1: 1 70 5 50 2010-01
# 2: 2 75 10 47 2010-01
#
# $`2010-02`
# station_ID precipitation LONG LAT Date
# 1: 1 60 5 50 2010-02
# 2: 2 65 10 47 2010-02
#
# $`2010-03`
# station_ID precipitation LONG LAT Date
# 1: 1 61 5 50 2010-03
# 2: 2 70 10 47 2010-03
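To keep the loop from the question, the likely culprit is that append(list_months, ap) adds the character string held in ap (the variable's name) rather than the data frame it refers to. A minimal base R sketch of one way around this is to assign the subset straight into a named list element; the data frame A below is rebuilt from the table in the question, with column names simplified as an assumption.

```r
# Dataset A rebuilt from the question (column names simplified for the sketch)
A <- data.frame(
  station_ID    = c(1, 1, 1, 2, 2, 2),
  precipitation = c(70, 60, 61, 75, 65, 70),
  LONG          = c(5, 5, 5, 10, 10, 10),
  LAT           = c(50, 50, 50, 47, 47, 47),
  Date          = c("2010-01", "2010-02", "2010-03", "2010-01", "2010-02", "2010-03"))

months <- c("2010-01", "2010-02", "2010-03")
abr    <- c("jan", "feb", "mar")

list_months <- list()
for (i in seq_along(months)) {
  # Store the data frame itself under a name such as "dataset_jan"
  list_months[[paste("dataset", abr[i], sep = "_")]] <- subset(A, Date == months[i])
}

# Double brackets (or $) return the stored data frame
list_months[["dataset_jan"]]
```

Base R's split(A, A$Date) gives a similar list without a loop, named by the Date values instead of the abbreviations.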

Creating a new Data.Frame from variable values

I am currently working on a task that requires me to query a list of stocks from an sql db.
The problem is that it is a list where there are 1:n stocks traded per date. I want to calculate the share of each stock in the portfolio on a given day (see example) and pass it to a new data frame. In other words, date x occurs 2 times (once for stock A and once for stock B), and I want to pull these together so that date x occurs only once, with the new values.
'data.frame': 1010 obs. of 5 variables:
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ Date : Date, format: "2019-11-22" "2019-11-21" "2019-11-20" "2019-11-19" ...
$ Close: num 52 51 50.1 50.2 50.2 ...
$ Volume : num 5415 6196 3800 4784 6189 ...
$ Stock_ID : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
RawInput<-data.frame(Date=c("2017-22-11","2017-22-12","2017-22-13","2017-22-11","2017-22-12","2017-22-13","2017-22-11"), Close=c(50,55,56,10,11,12,200),Volume=c(100,110,150,60,70,80,30),Stock_ID=c(1,1,1,2,2,2,3))
RawInput$Stock_ID<-as.factor(RawInput$Stock_ID)
*I cannot convert the date to a Date variable in this example.
I would like to have a new dataframe that generates the Value traded per day, the weight of each stock, and the daily returns per day, while keeping the number of stocks variable.
I hope I translated the issue properly so that I can receive help.
Thank you!
I think the easiest way to do this would be to use the dplyr package. You may need to read some documentation, but the mutate and group_by functions may be able to do what you want. mutate lets you modify the current data frame by either adding a new column or changing existing data.
Let's start with a reproducible dataset:
RawInput<-data.frame(Date=c("2017-22-11","2017-22-12","2017-22-13","2017-22-11","2017-22-12","2017-22-13","2017-22-11"),
Close=c(50,55,56,10,11,12,200),
Volume=c(100,110,150,60,70,80,30),
Stock_ID=c(1,1,1,2,2,2,3))
RawInput$Stock_ID<-as.factor(RawInput$Stock_ID)
library(magrittr)
library(dplyr)
dat2 <- RawInput %>%
group_by(Date, Stock_ID) %>% #this example only has one stock type but i imagine you want to group by stock
mutate(CloseMean=mean(Close),
CloseSum=sum(Close),
VolumeMean=mean(Volume),
VolumeSum=sum(Volume)) #what ever computation you need to do with
#multiple stock values for a given date goes here
dat2 %>% select(Stock_ID, Date, CloseMean, CloseSum, VolumeMean,VolumeSum) %>% distinct() #dat2 will still be the same size as dat, thus use the distinct() function to reduce it to unique values
# A tibble: 7 x 6
# Groups: Date, Stock_ID [7]
Stock_ID Date CloseMean CloseSum VolumeMean VolumeSum
<fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 1 2017-22-11 50 50 100 100
2 1 2017-22-12 55 55 110 110
3 1 2017-22-13 56 56 150 150
4 2 2017-22-11 10 10 60 60
5 2 2017-22-12 11 11 70 70
6 2 2017-22-13 12 12 80 80
7 3 2017-22-11 200 200 30 30
The data set you provided has only one row per Stock_ID and Date combination, so nothing was actually aggregated. However, if you remove Stock_ID from the grouping, you can see how this would work:
dat2 <- RawInput %>%
group_by(Date) %>%
mutate(CloseMean=mean(Close),
CloseSum=sum(Close),
VolumeMean=mean(Volume),
VolumeSum=sum(Volume))
dat2 %>% select(Date, CloseMean, CloseSum, VolumeMean,VolumeSum) %>% distinct()
# A tibble: 3 x 5
# Groups: Date [3]
Date CloseMean CloseSum VolumeMean VolumeSum
<fct> <dbl> <dbl> <dbl> <dbl>
1 2017-22-11 86.7 260 63.3 190
2 2017-22-12 33 66 90 180
3 2017-22-13 34 68 115 230
After reading your first reply: you will have to be specific about how you want to calculate the weight, and also define your end result.
I'm going to assume the weight is just each stock's share of the total closing price on that date, and that the end result shows, for each date, the weight per stock. In other words, a matrix of dates and stock IDs.
library(tidyr)
RawInput %>%
group_by(Date) %>%
mutate(weight=Close/sum(Close)) %>%
select(Date, weight, Stock_ID) %>%
spread(key = "Stock_ID", value = "weight", fill = 0)
# A tibble: 3 x 4
# Groups: Date [3]
Date `1` `2` `3`
<fct> <dbl> <dbl> <dbl>
1 2017-22-11 0.192 0.0385 0.769
2 2017-22-12 0.833 0.167 0
3 2017-22-13 0.824 0.176 0
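The question also asks for the value traded per day and the daily returns, which the answer above leaves open. The base R sketch below shows one possible reading, under loudly stated assumptions: value traded is taken as Close * Volume, the weight is each stock's share of the day's traded value (not of the closing prices, as above), and the return is the relative change in Close from one listed date to the next for the same stock. The columns Value, Weight and Return are names introduced for the illustration, and the date strings are kept exactly as in the question.

```r
# Sample data copied from the question (dates left as strings)
RawInput <- data.frame(
  Date = c("2017-22-11", "2017-22-12", "2017-22-13", "2017-22-11",
           "2017-22-12", "2017-22-13", "2017-22-11"),
  Close = c(50, 55, 56, 10, 11, 12, 200),
  Volume = c(100, 110, 150, 60, 70, 80, 30),
  Stock_ID = c(1, 1, 1, 2, 2, 2, 3))

# Assumption: value traded per stock and day = closing price * volume
RawInput$Value <- RawInput$Close * RawInput$Volume

# Weight = share of each stock in the total value traded on that date
RawInput$Weight <- RawInput$Value / ave(RawInput$Value, RawInput$Date, FUN = sum)

# Sort by stock and date so consecutive rows are consecutive observations
RawInput <- RawInput[order(RawInput$Stock_ID, RawInput$Date), ]

# Return = relative change in Close within each stock (NA for the first date)
RawInput$Return <- ave(RawInput$Close, RawInput$Stock_ID,
                       FUN = function(x) c(NA, diff(x) / head(x, -1)))
RawInput
```

Because ave() works group by group, the number of stocks per date can vary freely, which was one of the requirements in the question.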

How to diagonally subtract different columns in R

I have a dataset of a hypothetical exam.
id <- c(1,1,3,4,5,6,7,7,8,9,9)
test_date <- c("2012-06-27","2012-07-10","2013-07-04","2012-03-24","2012-07-22", "2013-09-16","2012-06-21","2013-10-18", "2013-04-21", "2012-02-16", "2012-03-15")
result_date <- c("2012-07-29","2012-09-02","2013-08-01","2012-04-25","2012-09-01","2013-10-20","2012-07-01","2013-10-31", "2013-05-17", "2012-03-17", "2012-04-20")
data1 <- as_data_frame(id)
data1$test_date <- test_date
data1$result_date <- result_date
colnames(data1)[1] <- "id"
"id" indicates the ID of the students who have taken a particular exam. "test_date" is the date the students took the test and "result_date" is the date when the students' results are posted. I'm interested in finding out which students retook the exam BEFORE the result of that exam session was released, e.g. students who knew that they have underperformed and retook the exam without bothering to find out their scores. For example, student with "id" 1 took the exam for the second time on "2012-07-10" which was before the result date for his first exam - "2012-07-29".
I tried to:
data1%>%
group_by(id) %>%
arrange(id, test_date) %>%
filter(n() >= 2) %>% #To only get info on students who have taken the exam more than once and then merge it back in with the original data set using a join function
So essentially, I want to create a new column called "re_test" where it would equal 1 if a student retook the exam BEFORE receiving the result of a previous exam and 0 otherwise (those who retook after seeing their marks or those who did not retake).
I have tried to mutate in order to find cases where dates are either positive or negative by subtracting the 2nd test_date from the 1st result_date:
mutate(data1, re_test = result_date - lead(test_date, default = first(test_date)))
However, this leads to mixing up students with different id's. I tried to split but mutate won't work on a list of dataframes so now I'm stuck:
split(data1, data1$id)
Just to add on, this is a part of the desired result:
data2 <- as_data_frame(id <- c(1,1,3,4))
data2$test_date_result <- c("2012-06-27","2012-07-10", "2013-07-04","2012-03-24")
data2$result_date_result <- c("2012-07-29","2012-09-02","2013-08-01","2012-04-25")
data2$re_test <- c(1, 0, 0, 0)
Apologies for the verbosity and hope I was clear enough.
Thanks a lot in advance!
library(reshape2)
library(dplyr)
# first melt so that we can sequence by date
data1m <- data1 %>%
melt(id.vars = "id", measure.vars = c("test_date", "result_date"), value.name = "event_date")
# any two tests in a row are a flag - use dplyr::lag to compare with the previous event
data1mc <- data1m %>%
arrange(id, event_date) %>%
group_by(id) %>%
mutate (multi_test = (variable == "test_date" & lag(variable == "test_date"))) %>%
filter(multi_test)
# id variable event_date multi_test
# 1 1 test_date 2012-07-10 TRUE
# 2 9 test_date 2012-03-15 TRUE
## join back to the original
data1 %>%
left_join (data1mc %>% select(id, event_date, multi_test),
by=c("id" = "id", "test_date" = "event_date"))
I have a piecewise answer that may work for you. I first create a data.frame called student that contains the re-test information, and then join it with the data1 object. If students re-took the test multiple times, it will compare the last test to the first, which is a flaw, but I'm unsure if students have the ability to re-test multiple times?
student <- data1 %>%
group_by(id) %>%
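# n() > 1 keeps students who sat the test only once from being flagged
summarise(retest = n() > 1 & test_date[n()] < result_date[1])
Students who took the test only once come out as FALSE because of the n() > 1 condition; without it, their single test date would be compared with its own result date and always give TRUE. The line below is only a safeguard that turns any remaining NA (for example from a missing date) into FALSE; you can drop it and retain the NA, as they do contain information.
student$retest[is.na(student$retest)] <- FALSE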
Join the two data.frames to a single object called data2.
data2 <- left_join(data1, student, by='id')
I am sure there are more elegant ways to approach this. I did this by taking advantage of the structure of your data (sorted by id) and the lag function that can refer to the previous records while dealing with a current record.
### Ensure Data are sorted by ID ###
data1 <- arrange(data1,id)
### Create Flag for those that repeated ###
data1$repeater <- ifelse(lag(data1$id) == data1$id,1,0)
### I chose to do this on all data, you could filter on repeater flag first ###
data1$timegap <- as.Date(data1$result_date) - as.Date(data1$test_date)
data1$lagdate <- as.Date(data1$test_date) - lag(as.Date(data1$result_date))
### Display results where your repeater flag is 1 and there is negative time lag ###
data1[data1$repeater==1 & !is.na(data1$repeater) & as.numeric(data1$lagdate) < 0,]
# A tibble: 2 × 6
id test_date result_date repeater timegap lagdate
<dbl> <chr> <chr> <dbl> <time> <time>
1 1 2012-07-10 2012-09-02 1 54 days -19 days
2 9 2012-03-15 2012-04-20 1 36 days -2 days
I went with a simple shift comparison. 1 line of code.
data1 <- data.frame(id = c(1,1,3,4,5,6,7,7,8,9,9), test_date = c("2012-06-27","2012-07-10","2013-07-04","2012-03-24","2012-07-22", "2013-09-16","2012-06-21","2013-10-18", "2013-04-21", "2012-02-16", "2012-03-15"), result_date = c("2012-07-29","2012-09-02","2013-08-01","2012-04-25","2012-09-01","2013-10-20","2012-07-01","2013-10-31", "2013-05-17", "2012-03-17", "2012-04-20"))
data1$re_test <- unlist(lapply(split(data1,data1$id), function(x)
ifelse(as.Date(x$test_date) > c(NA, as.Date(x$result_date[-nrow(x)])), 0, 1)))
data1
id test_date result_date re_test
1 1 2012-06-27 2012-07-29 NA
2 1 2012-07-10 2012-09-02 1
3 3 2013-07-04 2013-08-01 NA
4 4 2012-03-24 2012-04-25 NA
5 5 2012-07-22 2012-09-01 NA
6 6 2013-09-16 2013-10-20 NA
7 7 2012-06-21 2012-07-01 NA
8 7 2013-10-18 2013-10-31 0
9 8 2013-04-21 2013-05-17 NA
10 9 2012-02-16 2012-03-17 NA
11 9 2012-03-15 2012-04-20 1
I think there is benefit in leaving NAs but if you really want all others as zero, simply:
data1$re_test <- ifelse(is.na(data1$re_test), 0, data1$re_test)
data1
id test_date result_date re_test
1 1 2012-06-27 2012-07-29 0
2 1 2012-07-10 2012-09-02 1
3 3 2013-07-04 2013-08-01 0
4 4 2012-03-24 2012-04-25 0
5 5 2012-07-22 2012-09-01 0
6 6 2013-09-16 2013-10-20 0
7 7 2012-06-21 2012-07-01 0
8 7 2013-10-18 2013-10-31 0
9 8 2013-04-21 2013-05-17 0
10 9 2012-02-16 2012-03-17 0
11 9 2012-03-15 2012-04-20 1
Let me know if you have any questions, cheers.
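The answers above put the flag on the retake itself (the second row for a student), while the desired output in the question puts re_test = 1 on the exam whose result had not yet been released. A minimal base R sketch of that second reading compares each result date with the next test date of the same student; next_test and same_id are illustrative names, and the sketch assumes the dates parse cleanly with as.Date.

```r
# Sample data copied from the question, with dates parsed as Date
data1 <- data.frame(
  id = c(1, 1, 3, 4, 5, 6, 7, 7, 8, 9, 9),
  test_date = as.Date(c("2012-06-27", "2012-07-10", "2013-07-04", "2012-03-24",
                        "2012-07-22", "2013-09-16", "2012-06-21", "2013-10-18",
                        "2013-04-21", "2012-02-16", "2012-03-15")),
  result_date = as.Date(c("2012-07-29", "2012-09-02", "2013-08-01", "2012-04-25",
                          "2012-09-01", "2013-10-20", "2012-07-01", "2013-10-31",
                          "2013-05-17", "2012-03-17", "2012-04-20")))

# Sort so that each student's exams are consecutive and in time order
data1 <- data1[order(data1$id, data1$test_date), ]
n <- nrow(data1)

# Test date of the following row, and whether that row is the same student
next_test <- c(data1$test_date[-1], NA)
same_id   <- c(data1$id[-1] == data1$id[-n], FALSE)

# 1 if the same student sat the next exam before this exam's result came out
data1$re_test <- as.integer(same_id & next_test < data1$result_date)
data1
```

Because each row is compared only with the row directly after it, students with three or more attempts are handled attempt by attempt.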

Assign rows to a group based on spatial neighborhood and temporal criteria in R

I have an issue that I just cannot seem to sort out. I have a dataset that was derived from a raster in arcgis. The dataset represents every fire occurrence during a 10-year period. Some raster cells had multiple fires within that time period (and, thus, will have multiple rows in my dataset) and some raster cells will not have had any fire (and, thus, will not be represented in my dataset). So, each row in the dataset has a column number (sequential integer) and a row number assigned to it that corresponds with the row and column ID from the raster. It also has the date of the fire.
I would like to assign a unique ID (fire_ID) to all of the fires that are within 4 days of each other and in adjacent pixels from one another (within the 8-cell neighborhood) and put this into a new column.
To clarify, if there were an observation from row 3, col 3, Jan 1, 2000 and another from row 2, col 4, Jan 4, 2000, those observations would be assigned the same fire_ID.
Below is a sample dataset with "rows", which are the row IDs of the raster, "cols", which are the column IDs of the raster, and "dates" which are the dates the fire was detected.
rows<-sample(seq(1,50,1),600, replace=TRUE)
cols<-sample(seq(1,50,1),600, replace=TRUE)
dates<-sample(seq(from=as.Date("2000/01/01"), to=as.Date("2000/02/01"), by="day"),600, replace=TRUE)
fire_df<-data.frame(rows, cols, dates)
I've tried sorting the data by "row", then "column", then "date" and looping through, to create a new fire_ID if the row and column ID were within one value and the date was within 4 days, but this obviously doesn't work, as fires which should be assigned the same fire_ID are assigned different fire_IDs if there are observations in between them in the list that belong to a different fire_ID.
fire_df2<-fire_df[order(fire_df$rows, fire_df$cols, fire_df$date),]
fire_ID=numeric(length=nrow(fire_df2))
fire_ID[1]=1
for (i in 2:nrow(fire_df2)){
fire_ID[i]=ifelse(
fire_df2$rows[i]-fire_df2$rows[i-1]<=abs(1) & fire_df2$cols[i]-fire_df2$cols[i-1]<=abs(1) & fire_df2$date[i]-fire_df2$date[i-1]<=abs(4),
fire_ID[i-1],
i)
}
length(unique(fire_ID))
fire_df2$fire_ID<-fire_ID
Please let me know if you have any suggestions.
I think this task requires something along the lines of hierarchical clustering.
Note, however, that there will be necessarily some degree of arbitrariness in the ids. This is because it is entirely possible that the cluster of fires itself is longer than 4 days yet every fire is less than 4 days away from some other fire in that cluster (and thus should have the same id).
library(dplyr)
# Create the distances
fire_dist <- fire_df %>%
# Normalize dates
mutate( norm_dates = as.numeric(dates)/4) %>%
# Only keep the three variables of interest
select( rows, cols, norm_dates ) %>%
# Compute distance using the L-infinity norm (maximum)
dist( method="maximum" )
# Do hierarchical clustering with the "single" agglomeration method
fire_clust <- hclust(fire_dist, method="single")
# Cut the tree at height 1 and obtain groups
group_id <- cutree(fire_clust, h=1)
# First attach the group ids back to the data frame
fire_df2 <- cbind( fire_df, group_id ) %>%
# Then sort the data
arrange( group_id, dates, rows, cols )
# Print the first 10 records
fire_df2[1:10,]
(Make sure you have dplyr library installed. You can run install.packages("dplyr",dep=TRUE) if not installed. It is a really good and very popular library for data manipulations)
A couple of simple tests:
Test #1. The same forest fire moving.
rows<-1:6
cols<-1:6
dates<-seq(from=as.Date("2000/01/01"), to=as.Date("2000/01/06"), by="day")
fire_df<-data.frame(rows, cols, dates)
gives me this:
rows cols dates group_id
1 1 1 2000-01-01 1
2 2 2 2000-01-02 1
3 3 3 2000-01-03 1
4 4 4 2000-01-04 1
5 5 5 2000-01-05 1
6 6 6 2000-01-06 1
Test #2. 6 different random forest fires.
set.seed(1234)
rows<-sample(seq(1,50,1),6, replace=TRUE)
cols<-sample(seq(1,50,1),6, replace=TRUE)
dates<-sample(seq(from=as.Date("2000/01/01"), to=as.Date("2000/02/01"), by="day"),6, replace=TRUE)
fire_df<-data.frame(rows, cols, dates)
output:
rows cols dates group_id
1 6 1 2000-01-10 1
2 32 12 2000-01-30 2
3 31 34 2000-01-10 3
4 32 26 2000-01-27 4
5 44 35 2000-01-10 5
6 33 28 2000-01-09 6
Test #3: one expanding forest fire
dates <- seq(from=as.Date("2000/01/01"), to=as.Date("2000/01/06"), by="day")
rows_start <- 50
cols_start <- 50
fire_df <- data.frame(dates = dates) %>%
rowwise() %>%
do({
diff = as.numeric(.$dates - as.Date("2000/01/01"))
expand.grid(rows=seq(rows_start-diff,rows_start+diff),
cols=seq(cols_start-diff,cols_start+diff),
dates=.$dates)
})
gives me:
rows cols dates group_id
1 50 50 2000-01-01 1
2 49 49 2000-01-02 1
3 49 50 2000-01-02 1
4 49 51 2000-01-02 1
5 50 49 2000-01-02 1
6 50 50 2000-01-02 1
7 50 51 2000-01-02 1
8 51 49 2000-01-02 1
9 51 50 2000-01-02 1
10 51 51 2000-01-02 1
and so on. (All records identified correctly to belong to the same forest fire.)
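The chaining effect mentioned at the start of this answer can be seen in a tiny example. The sketch below uses three made-up detections in the same raster cell, 4 days apart each: the first and last are 8 days apart, yet single linkage puts all three in one group because each one is within 4 days of its neighbour. The fires data frame is hypothetical and only base R (stats) functions are used.

```r
# Three hypothetical detections in the same cell, 4 days apart each
fires <- data.frame(
  rows  = c(10, 10, 10),
  cols  = c(10, 10, 10),
  dates = as.Date(c("2000-01-01", "2000-01-05", "2000-01-09")))

# Same scaling as above: 4 days count as one unit of distance
m <- cbind(fires$rows, fires$cols, as.numeric(fires$dates) / 4)

# Maximum (L-infinity) distance, single linkage, cut at height 1
d <- dist(m, method = "maximum")
group_id <- cutree(hclust(d, method = "single"), h = 1)

group_id
# Total time span of the group in days
as.numeric(max(fires$dates) - min(fires$dates))
```

Whether such chains should count as one fire is a modelling decision; if not, a different linkage (for example "complete") would cap the span within a group, at the cost of splitting slowly spreading fires.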
