How to find frequency of dates in R

I have a fishing dataset containing the date each catch was made, the date the ship landed the catch in port, and the ID of the boat. As an added problem, I have several data points for the same day, because each boat has delivered several classes of fish.
I am trying to find the length of each trip by taking the day of the catch minus the day the boat last put into port. I have to do this separately for each boat and make sure that only one landing per day is counted.
This is my data. LandingD is the date of landing, FangstD is the date of catch and SkipID is the ship ID:
LandingD FangstD SkipID
1 2000-02-19 2000-02-19 0004
2 2000-02-16 2000-02-16 0004
3 2000-04-29 2000-04-29 0004
4 2000-04-29 2000-04-29 0004
5 2000-11-30 2000-11-30 0020B
6 2000-02-16 2000-02-16 0075H
7 2000-02-16 2000-02-16 0075H
8 2000-01-22 2000-01-22 0075H
9 2000-01-15 2000-01-15 0075H
10 2000-01-29 2000-01-29 0075H
11 2000-02-11 2000-02-11 0075H
12 2000-02-04 2000-02-04 0075H
13 2000-06-02 2000-06-02 0076
14 2000-06-02 2000-06-02 0076
15 2000-05-20 2000-05-20 0076
16 2000-03-21 2000-03-21 0087
17 2000-03-21 2000-03-21 0087
18 2000-02-24 2000-02-24 0087
19 2000-02-24 2000-02-24 0087
20 2000-11-27 2000-11-27 0087
Any idea how this could be solved?
Thanks in advance!
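One possible approach, sketched in base R under the assumption that the data sits in a data frame (here called fish, a hypothetical name) with LandingD and FangstD stored as Date: keep a single row per boat and landing day, sort by boat and date, and subtract the previous landing date of the same boat from the catch date. The subset below is copied from the question; the column names PrevLanding and TripDays are made up for illustration.

```r
# Subset of the question's data, rebuilt as a data frame (hypothetical name: fish)
dates <- as.Date(c("2000-02-19", "2000-02-16", "2000-04-29", "2000-04-29",
                   "2000-02-16", "2000-02-16", "2000-01-22", "2000-01-15"))
fish <- data.frame(
  LandingD = dates,
  FangstD  = dates,
  SkipID   = c("0004", "0004", "0004", "0004", "0075H", "0075H", "0075H", "0075H"),
  stringsAsFactors = FALSE
)

# Keep only one landing per boat and day
trips <- fish[!duplicated(fish[, c("SkipID", "LandingD")]), ]

# Sort by boat, then chronologically
trips <- trips[order(trips$SkipID, trips$LandingD), ]

# Previous landing date within each boat (NA for a boat's first landing)
prev <- ave(as.numeric(trips$LandingD), trips$SkipID,
            FUN = function(x) c(NA, head(x, -1)))
trips$PrevLanding <- as.Date(prev, origin = "1970-01-01")

# Trip length in days: day of catch minus day of the previous landing
trips$TripDays <- as.numeric(trips$FangstD - trips$PrevLanding)
print(trips)
```

The first landing of each boat has no earlier landing to compare against, so its TripDays is NA. If you only need the number of landings per date, table(trips$LandingD) gives the frequencies directly.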

Related

Filter POSIXct date times for certain months

I have a data set including POSIXct date time stamps ($acquisition_time). I need to filter all rows of this data set that have a date time stamp in June, July, August or September.
This is a sample of my data:
> data
animals_id acquisition_time longitude latitude projection collar_type
1 1 2010-01-05 19:59:00 7.611712 47.94893 EPSG:4326-WGS48 gps
2 1 2010-02-06 02:59:00 7.611367 47.95333 EPSG:4326-WGS48 gps
3 1 2010-03-06 23:59:00 7.612298 47.95245 EPSG:4326-WGS48 gps
4 1 2010-03-07 20:59:00 7.621620 47.95849 EPSG:4326-WGS48 gps
5 1 2010-04-08 17:59:00 7.611142 47.95456 EPSG:4326-WGS48 gps
6 1 2010-04-09 00:59:00 7.619372 47.95881 EPSG:4326-WGS48 gps
7 1 2010-05-09 07:59:00 7.612473 47.95379 EPSG:4326-WGS48 gps
8 1 2010-06-10 04:59:00 7.613174 47.95429 EPSG:4326-WGS48 gps
9 1 2010-06-11 22:59:00 7.612589 47.95584 EPSG:4326-WGS48 gps
10 1 2010-07-12 19:59:00 7.613384 47.95734 EPSG:4326-WGS48 gps
11 1 2010-08-13 16:59:00 7.612884 47.95448 EPSG:4326-WGS48 gps
12 1 2010-08-13 23:59:00 7.614389 47.95932 EPSG:4326-WGS48 gps
13 1 2010-08-14 20:59:00 7.617362 47.96213 EPSG:4326-WGS48 gps
14 1 2010-09-15 03:59:00 7.612436 47.95579 EPSG:4326-WGS48 gps
15 1 2010-09-15 17:59:00 7.616448 47.95875 EPSG:4326-WGS48 gps
16 1 2010-09-16 01:00:00 7.611193 47.95464 EPSG:4326-WGS48 gps
17 1 2010-10-16 21:59:00 7.619343 47.96087 EPSG:4326-WGS48 gps
18 1 2010-10-18 01:59:00 7.619420 47.95877 EPSG:4326-WGS48 gps
19 1 2010-11-18 22:59:00 7.624575 47.95586 EPSG:4326-WGS48 gps
20 1 2010-12-19 12:59:00 7.615908 47.95812 EPSG:4326-WGS48 gps
21 1 2010-01-20 23:59:00 7.605586 47.93908 EPSG:4326-WGS48 gps
22 1 2010-02-21 20:59:00 7.627373 47.96214 EPSG:4326-WGS48 gps
23 1 2010-02-22 03:59:00 7.625065 47.95793 EPSG:4326-WGS48 gps
24 1 2010-02-22 17:59:00 7.614603 47.95174 EPSG:4326-WGS48 gps
25 1 2010-02-23 07:59:00 7.613502 47.95427 EPSG:4326-WGS48 gps
study_area_id animals_age_class animals_sex
1 13 s f
2 13 s f
3 13 s f
4 13 s f
5 13 s f
6 13 s f
7 13 s f
8 13 s f
9 13 s f
10 13 s f
11 13 s f
12 13 s f
13 13 s f
14 13 s f
15 13 s f
16 13 s f
17 13 s f
18 13 s f
19 13 s f
20 13 s f
21 13 s f
22 13 s f
23 13 s f
24 13 s f
25 13 s f
I tried the following code but I get an error:
data <- data$acquisition_time %>% filter(month(Year) %in% c(6,7,8,9))
Error in UseMethod("filter") :
no applicable method for 'filter' applied to an object of class "c('POSIXct', 'POSIXt')"
How can I do this?
Your code is mostly correct. However, you are attempting to apply filter to a vector rather than to a data frame. The correct code would be (assuming dplyr and lubridate have been loaded):
data %>%
filter(month(acquisition_time) %in% c(6, 7, 8, 9))
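If you would rather not load extra packages, the same filter can be sketched in base R by extracting the month with format(). The sample below is a hypothetical four-row stand-in for the question's data, and the names month_num and summer are only illustrative.

```r
# Hypothetical four-row sample with a POSIXct column, mirroring the question's layout
data <- data.frame(
  animals_id = 1,
  acquisition_time = as.POSIXct(
    c("2010-05-09 07:59:00", "2010-06-10 04:59:00",
      "2010-09-16 01:00:00", "2010-10-16 21:59:00"),
    tz = "UTC"
  )
)

# Month of each time stamp as an integer (1 to 12)
month_num <- as.integer(format(data$acquisition_time, "%m"))

# Keep the rows that fall in June, July, August or September
summer <- data[month_num %in% 6:9, ]
print(summer)
```

Note that format() uses the time zone stored on the POSIXct column, so time stamps close to a month boundary are classified in that time zone.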

Aggregate a data frame on variance

Say I have this data frame, df,
Day value
1 2012-06-10 552
2 2012-06-10 4850
3 2012-06-11 4642
4 2012-06-11 4132
5 2012-06-11 4190
6 2012-06-12 4186
7 2012-06-13 1139
8 2012-06-13 490
9 2012-06-13 5156
10 2012-06-13 4430
11 2012-06-13 4447
12 2012-06-14 4256
13 2012-06-14 3856
14 2012-06-14 1163
15 2012-06-17 564
16 2012-06-17 4866
17 2012-06-17 4421
18 2012-06-19 4206
19 2012-06-20 4272
20 2012-06-20 3993
21 2012-06-20 1211
22 2012-07-21 698
23 2012-07-21 5770
24 2012-07-21 5103
25 2012-07-21 775
26 2012-07-21 5140
27 2012-07-22 4868
I would like to create a data.frame, dfvar, that would contain the daily variance, something like:
Day Variance
1 2012-06-10 9236402
2 2012-06-11 X
3 2012-06-12 4186
4 2012-06-13 1139
5 2012-06-14 4256
6 2012-06-17 564
7 2012-06-19 4206
8 2012-06-20 4272
9 2012-07-21 698
10 2012-07-22 4868
So, for example, I computed the first entry by hand:
dfvar$Variance[1] = var(c(552, 4850))
I tried to do
dfvar <- aggregate(df, by = list(Day), FUN = var)
but this isn't the output I expected. I really want the variance of the values within each day, without the other days.
Any ideas about that?
Is this what you want?
library(dplyr)
# var() returns NA for a group that contains only one value
df %>% group_by(Day) %>% dplyr::summarise(Variance = var(value))
Day Variance
<fctr> <dbl>
1 2012-06-10 9236402.00
2 2012-06-11 77961.33
3 2012-06-12 NA
4 2012-06-13 4615704.30
5 2012-06-14 2829816.33
6 2012-06-17 5596946.33
7 2012-06-19 NA
8 2012-06-20 2864514.33
9 2012-07-21 6422224.70
10 2012-07-22 NA
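For comparison, the aggregate() attempt from the question can probably be repaired with the formula interface, which tells aggregate() which column to summarise and which one to group by. This is a minimal base R sketch on a three-row subset of the question's data:

```r
# Three rows from the question: two values on one day, a single value on another
df <- data.frame(
  Day   = c("2012-06-10", "2012-06-10", "2012-06-12"),
  value = c(552, 4850, 4186)
)

# Variance of value within each Day; a day with a single value yields NA
dfvar <- aggregate(value ~ Day, data = df, FUN = var)
names(dfvar)[2] <- "Variance"
print(dfvar)
```

The result for 2012-06-10 equals var(c(552, 4850)), as in the question's hand-computed entry.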

Aggregate on a daily basis in R

I'm borrowing the reproducible example given here:
Aggregate daily level data to weekly level in R
since it's pretty much close to what I want to do.
Interval value
1 2012-06-10 552
2 2012-06-11 4850
3 2012-06-12 4642
4 2012-06-13 4132
5 2012-06-14 4190
6 2012-06-15 4186
7 2012-06-16 1139
8 2012-06-17 490
9 2012-06-18 5156
10 2012-06-19 4430
11 2012-06-20 4447
12 2012-06-21 4256
13 2012-06-22 3856
14 2012-06-23 1163
15 2012-06-24 564
16 2012-06-25 4866
17 2012-06-26 4421
18 2012-06-27 4206
19 2012-06-28 4272
20 2012-06-29 3993
21 2012-06-30 1211
22 2012-07-01 698
23 2012-07-02 5770
24 2012-07-03 5103
25 2012-07-04 775
26 2012-07-05 5140
27 2012-07-06 4868
28 2012-07-07 1225
29 2012-07-08 671
30 2012-07-09 5726
31 2012-07-10 5176
In that question, the author asks how to aggregate on weekly intervals. What I'd like to do is aggregate on a "day of the week" basis.
So I'd like to have a table similar to that one, adding the values of all the same day of the week:
Day of the week value
1 "Sunday" 60000
2 "Monday" 50000
3 "Tuesday" 60000
4 "Wednesday" 50000
5 "Thursday" 60000
6 "Friday" 50000
7 "Saturday" 60000
You can try:
aggregate(d$value, list(weekdays(as.Date(d$Interval))), sum)
We can group the rows by day of the week using weekdays:
library(dplyr)
df %>%
group_by(Day_Of_The_Week = weekdays(as.Date(Interval))) %>%
summarise(value = sum(value))
# Day_Of_The_Week value
# <chr> <int>
#1 Friday 16903
#2 Monday 26368
#3 Saturday 4738
#4 Sunday 2975
#5 Thursday 17858
#6 Tuesday 23772
#7 Wednesday 13560
We can do this with data.table
library(data.table)
setDT(df1)[, .(value = sum(value)), .(Dayofweek = weekdays(as.Date(Interval)))]
# Dayofweek value
#1: Sunday 2975
#2: Monday 26368
#3: Tuesday 23772
#4: Wednesday 13560
#5: Thursday 17858
#6: Friday 16903
#7: Saturday 4738
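Using lubridate (https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html):
library(lubridate)
library(data.table)
df1 <- data.table(df1)
# Call lubridate::wday explicitly, because data.table also exports a wday() without a label argument
df1$Weekday <- lubridate::wday(as.Date(df1$Interval), label = TRUE)
# Sum of value for each weekday
df1[, sum(value), Weekday]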
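One caveat with weekdays() is that the names it returns depend on the locale, and a plain character grouping sorts alphabetically rather than Sunday first. A hedged base R sketch that avoids both issues groups on the numeric weekday from format(x, "%w") and attaches English labels afterwards. The column names wnum and Day are made up for illustration, and only the first 14 rows of the example data are used.

```r
# First 14 rows of the example data
d <- data.frame(
  Interval = as.Date("2012-06-10") + 0:13,
  value = c(552, 4850, 4642, 4132, 4190, 4186, 1139,
            490, 5156, 4430, 4447, 4256, 3856, 1163)
)

# Numeric day of the week: 0 = Sunday, ..., 6 = Saturday (independent of locale)
d$wnum <- as.integer(format(d$Interval, "%w"))

# Sum per weekday; the result is sorted by wnum, so Sunday comes first
res <- aggregate(value ~ wnum, data = d, FUN = sum)

# Attach readable labels in a fixed order
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
res$Day <- day_names[res$wnum + 1]
print(res[, c("Day", "value")])
```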

Matching Bloomberg Data in R

Working with the Rblpapi package, I receive a list of multiple data frames when requesting securities (one data frame per security requested).
My problem is the following. Let's say:
I request daily data for A and B from 01.10.2016 to 31.10.2016.
Some data for A is missing on days where B has data,
and some data for B is missing on days where A has data.
So basically:
list$A
date PX_LAST
1 2016-10-03 216.704
2 2016-10-04 217.245
3 2016-10-05 216.887
4 2016-10-06 217.164
5 2016-10-10 217.504
6 2016-10-11 217.022
7 2016-10-12 217.326
8 2016-10-13 216.219
9 2016-10-14 217.275
10 2016-10-17 216.751
11 2016-10-18 218.812
12 2016-10-19 219.682
13 2016-10-20 220.189
14 2016-10-21 220.930
15 2016-10-25 221.179
16 2016-10-26 219.840
17 2016-10-27 219.158
18 2016-10-31 217.820
list$B
date PX_LAST
1 2016-10-03 1722.82
2 2016-10-04 1717.82
3 2016-10-05 1721.14
4 2016-10-06 1718.40
5 2016-10-07 1712.40
6 2016-10-11 1700.33
7 2016-10-12 1695.54
8 2016-10-13 1689.62
9 2016-10-14 1693.71
10 2016-10-17 1687.84
11 2016-10-18 1701.10
12 2016-10-19 1706.74
13 2016-10-21 1701.16
14 2016-10-24 1706.24
15 2016-10-25 1701.20
16 2016-10-26 1699.92
17 2016-10-27 1694.66
18 2016-10-28 1690.96
19 2016-10-31 1690.92
As you can see, they have a different number of observations and the dates do not all match. For example, the 5th observation for A is on 2016-10-10, while for B it is on 2016-10-07.
So what I need is a means to combine both data frames. My idea was a full date range (every day) to which I add the PX_LAST values for the corresponding dates of A and B. After that I could delete the empty rows.
Sorry for bad formatting, this is my first post here.
Thanks in advance.
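One way this could be approached is a full outer join on the date column with base R's merge(), which avoids building the complete calendar first. In the sketch below the list is called px (a hypothetical name, to avoid masking the built-in list function), and only a few rows from the question are used.

```r
# A few rows from the question, stored in a named list of data frames
px <- list(
  A = data.frame(date = as.Date(c("2016-10-06", "2016-10-10", "2016-10-11")),
                 PX_LAST = c(217.164, 217.504, 217.022)),
  B = data.frame(date = as.Date(c("2016-10-06", "2016-10-07", "2016-10-11")),
                 PX_LAST = c(1718.40, 1712.40, 1700.33))
)

# Full outer join: every date that occurs in A or B, NA where a series has no value
both <- merge(px$A, px$B, by = "date", all = TRUE, suffixes = c("_A", "_B"))
print(both)

# Inner join: only the dates on which both series have a value
common <- merge(px$A, px$B, by = "date", suffixes = c("_A", "_B"))
print(common)
```

With more than two securities, the same merge can be folded over the list with Reduce(), although the PX_LAST columns would then need unique names before merging.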

insert new rows to the time series data, with date added automatically

I have a time-series data frame that looks like this:
TS.1
2015-09-01 361656.7
2015-09-02 370086.4
2015-09-03 346571.2
2015-09-04 316616.9
2015-09-05 342271.8
2015-09-06 361548.2
2015-09-07 342609.2
2015-09-08 281868.8
2015-09-09 297011.1
2015-09-10 295160.5
2015-09-11 287926.9
2015-09-12 323365.8
Now, what I want to do is add some new data points (rows) to the existing data frame, say,
320123.5
323521.7
How can I add the corresponding date to each new row? The dates simply continue in sequence from the last row.
Is there a package that can do this automatically, so that all I have to do is insert the new data points?
Here's some play data:
df <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-31"), "days"), x = seq(31))
new.x <- c(32, 33)
This adds the extra observations along with the proper sequence of dates:
new.df <- data.frame(date=seq(max(df$date) + 1, max(df$date) + length(new.x), "days"), x=new.x)
Then just rbind them to get your expanded data frame:
rbind(df, new.df)
date x
1 2015-01-01 1
2 2015-01-02 2
3 2015-01-03 3
4 2015-01-04 4
5 2015-01-05 5
6 2015-01-06 6
7 2015-01-07 7
8 2015-01-08 8
9 2015-01-09 9
10 2015-01-10 10
11 2015-01-11 11
12 2015-01-12 12
13 2015-01-13 13
14 2015-01-14 14
15 2015-01-15 15
16 2015-01-16 16
17 2015-01-17 17
18 2015-01-18 18
19 2015-01-19 19
20 2015-01-20 20
21 2015-01-21 21
22 2015-01-22 22
23 2015-01-23 23
24 2015-01-24 24
25 2015-01-25 25
26 2015-01-26 26
27 2015-01-27 27
28 2015-01-28 28
29 2015-01-29 29
30 2015-01-30 30
31 2015-01-31 31
32 2015-02-01 32
33 2015-02-02 33
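If this has to be done repeatedly, the two steps can be wrapped in a small helper. This is only a sketch: append_days is a hypothetical name, and it assumes daily data in columns called date and x, as in the play data above.

```r
# Hypothetical helper: append new values, dating them on consecutive days
# after the last date already present in df
append_days <- function(df, new.x) {
  start <- max(df$date) + 1
  new.df <- data.frame(
    date = seq(start, by = "day", length.out = length(new.x)),
    x = new.x
  )
  rbind(df, new.df)
}

# Usage with the play data
df <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-31"), "days"),
                 x = seq(31))
expanded <- append_days(df, c(32, 33))
print(tail(expanded, 3))
```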
