Working with the Rblpapi package, I receive a list of data frames when requesting securities (one data frame per security requested).
My problem is the following. Let's say:
I request daily data for A and B from 01.10.2016 to 31.10.2016.
Some dates for A are missing while B has data for them, and some dates for B are missing while A has data.
So basically:
list$A
date PX_LAST
1 2016-10-03 216.704
2 2016-10-04 217.245
3 2016-10-05 216.887
4 2016-10-06 217.164
5 2016-10-10 217.504
6 2016-10-11 217.022
7 2016-10-12 217.326
8 2016-10-13 216.219
9 2016-10-14 217.275
10 2016-10-17 216.751
11 2016-10-18 218.812
12 2016-10-19 219.682
13 2016-10-20 220.189
14 2016-10-21 220.930
15 2016-10-25 221.179
16 2016-10-26 219.840
17 2016-10-27 219.158
18 2016-10-31 217.820
list$B
date PX_LAST
1 2016-10-03 1722.82
2 2016-10-04 1717.82
3 2016-10-05 1721.14
4 2016-10-06 1718.40
5 2016-10-07 1712.40
6 2016-10-11 1700.33
7 2016-10-12 1695.54
8 2016-10-13 1689.62
9 2016-10-14 1693.71
10 2016-10-17 1687.84
11 2016-10-18 1701.10
12 2016-10-19 1706.74
13 2016-10-21 1701.16
14 2016-10-24 1706.24
15 2016-10-25 1701.20
16 2016-10-26 1699.92
17 2016-10-27 1694.66
18 2016-10-28 1690.96
19 2016-10-31 1690.92
As you can see, they have a different number of observations and the dates do not line up either. For example, the 5th observation for A is on 2016-10-10, while for B it is on 2016-10-07.
So what I need is a way to combine both data frames. My idea was a full date range (every calendar day) to which I add the PX_LAST values of A and B for the corresponding dates; afterwards I could delete the empty rows.
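A minimal sketch of that idea with base R's merge(), using abbreviated versions of A and B from above; all = TRUE keeps every date that appears in either frame and fills the missing side with NA:
A <- data.frame(date = as.Date(c("2016-10-03", "2016-10-04", "2016-10-10")),
                PX_LAST = c(216.704, 217.245, 217.504))
B <- data.frame(date = as.Date(c("2016-10-03", "2016-10-04", "2016-10-07")),
                PX_LAST = c(1722.82, 1717.82, 1712.40))
combined <- merge(A, B, by = "date", all = TRUE, suffixes = c(".A", ".B"))
combined
#         date PX_LAST.A PX_LAST.B
# 1 2016-10-03   216.704   1722.82
# 2 2016-10-04   217.245   1717.82
# 3 2016-10-07        NA   1712.40
# 4 2016-10-10   217.504        NA
For more than two securities, the same merge could in principle be folded over the whole list with Reduce(), though the PX_LAST columns would then need distinct names.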
Sorry for the bad formatting; this is my first post here.
Thanks in advance.
Related
I have a fishing dataset containing the dates when a catch was made, the dates when the ship put the catch into port, and the IDs of the boats. As an added complication, I have several data points for the same day because each boat delivers several classes of fish.
I am trying to find the length of each trip by taking the day of catch minus the day the boat last put into port. I have to separate this by boat and make sure that only one landing per day is counted.
This is my data; LandingD is the date of landing, FangstD is the date of catch and SkipID is the ship ID:
LandingD FangstD SkipID
1 2000-02-19 2000-02-19 0004
2 2000-02-16 2000-02-16 0004
3 2000-04-29 2000-04-29 0004
4 2000-04-29 2000-04-29 0004
5 2000-11-30 2000-11-30 0020B
6 2000-02-16 2000-02-16 0075H
7 2000-02-16 2000-02-16 0075H
8 2000-01-22 2000-01-22 0075H
9 2000-01-15 2000-01-15 0075H
10 2000-01-29 2000-01-29 0075H
11 2000-02-11 2000-02-11 0075H
12 2000-02-04 2000-02-04 0075H
13 2000-06-02 2000-06-02 0076
14 2000-06-02 2000-06-02 0076
15 2000-05-20 2000-05-20 0076
16 2000-03-21 2000-03-21 0087
17 2000-03-21 2000-03-21 0087
18 2000-02-24 2000-02-24 0087
19 2000-02-24 2000-02-24 0087
20 2000-11-27 2000-11-27 0087
Any idea how this could be solved?
Thanks in advance!
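A sketch of one possible dplyr approach, using a subset of the rows above; the object and column names catches and TripDays are only illustrative, "one landing per day" is interpreted as deduplicating on SkipID and LandingD, and the dates are assumed to be of class Date:
library(dplyr)
catches <- data.frame(
  LandingD = as.Date(c("2000-01-15", "2000-01-22", "2000-01-29", "2000-02-04",
                       "2000-02-11", "2000-02-16", "2000-02-16")),
  FangstD  = as.Date(c("2000-01-15", "2000-01-22", "2000-01-29", "2000-02-04",
                       "2000-02-11", "2000-02-16", "2000-02-16")),
  SkipID   = rep("0075H", 7),
  stringsAsFactors = FALSE
)
trips <- catches %>%
  distinct(SkipID, LandingD, .keep_all = TRUE) %>%              # count each landing day once per boat
  arrange(SkipID, LandingD) %>%
  group_by(SkipID) %>%
  mutate(TripDays = as.numeric(FangstD - lag(LandingD))) %>%    # catch date minus the previous landing
  ungroup()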
I have a time-series data frame that looks like:
TS.1
2015-09-01 361656.7
2015-09-02 370086.4
2015-09-03 346571.2
2015-09-04 316616.9
2015-09-05 342271.8
2015-09-06 361548.2
2015-09-07 342609.2
2015-09-08 281868.8
2015-09-09 297011.1
2015-09-10 295160.5
2015-09-11 287926.9
2015-09-12 323365.8
Now, what I want to do is add some new data points (rows) to the existing data frame, say,
320123.5
323521.7
How can I add the corresponding date to each row? The dates should simply continue in sequence from the last row.
Is there a package that can do this automatically, so that the only thing I have to do is insert the new data points?
Here's some play data:
df <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-31"), "days"), x = seq(31))
new.x <- c(32, 33)
This adds the extra observations along with the proper sequence of dates:
new.df <- data.frame(date=seq(max(df$date) + 1, max(df$date) + length(new.x), "days"), x=new.x)
Then just rbind them to get your expanded data frame:
rbind(df, new.df)
date x
1 2015-01-01 1
2 2015-01-02 2
3 2015-01-03 3
4 2015-01-04 4
5 2015-01-05 5
6 2015-01-06 6
7 2015-01-07 7
8 2015-01-08 8
9 2015-01-09 9
10 2015-01-10 10
11 2015-01-11 11
12 2015-01-12 12
13 2015-01-13 13
14 2015-01-14 14
15 2015-01-15 15
16 2015-01-16 16
17 2015-01-17 17
18 2015-01-18 18
19 2015-01-19 19
20 2015-01-20 20
21 2015-01-21 21
22 2015-01-22 22
23 2015-01-23 23
24 2015-01-24 24
25 2015-01-25 25
26 2015-01-26 26
27 2015-01-27 27
28 2015-01-28 28
29 2015-01-29 29
30 2015-01-30 30
31 2015-01-31 31
32 2015-02-01 32
33 2015-02-02 33
I am looking for a way to turn irregular time data that includes location information into regular, discrete time intervals in R (for example, 10-second intervals, keeping only the first location per interval).
The input data looks like this:
ID Time Location Duration
1 Mark 2015-04-15 23:55:41 1 145448
2 Mark 2015-04-15 23:58:07 9 1559
3 Mark 2015-04-15 23:58:08 9 2279
4 Mark 2015-04-15 23:58:11 9 557
5 Mark 2015-04-15 23:58:11 3 10540
6 Mark 2015-04-15 23:58:22 9 1783
7 Mark 2015-04-15 23:58:24 9 8706
8 Mark 2015-04-15 23:58:32 9 555
9 Mark 2015-04-15 23:58:33 2 124137
10 Mark 2015-04-16 00:00:37 2 7411
11 Mark 2015-04-16 00:00:37 20 7411
and the desired output would be:
ID Time Location
1 Mark 2015-04-15 23:55:40 1
2 Mark 2015-04-15 23:55:50 1
3 Mark 2015-04-15 23:56:00 1
...
16 Mark 2015-04-15 23:58:00 9
17 Mark 2015-04-15 23:58:10 9
Any ideas?
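A possible sketch with dplyr and tidyr, assuming Time is parsed as POSIXct; snapping to a 10-second grid, keeping the first Location per bin, and carrying the last known Location forward into empty bins are assumptions about the desired output:
library(dplyr)
library(tidyr)
obs <- data.frame(
  ID   = "Mark",
  Time = as.POSIXct(c("2015-04-15 23:55:41", "2015-04-15 23:58:07",
                      "2015-04-15 23:58:08", "2015-04-15 23:58:11"), tz = "UTC"),
  Location = c(1, 9, 9, 9)
)
out <- obs %>%
  mutate(Time = as.POSIXct(floor(as.numeric(Time) / 10) * 10,
                           origin = "1970-01-01", tz = "UTC")) %>%  # snap to 10-second bins
  group_by(ID, Time) %>%
  summarise(Location = first(Location), .groups = "drop") %>%       # first location per bin
  complete(ID, Time = seq(min(Time), max(Time), by = 10)) %>%       # add the empty bins
  fill(Location)                                                    # carry the last location forward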
So, here is my problem. I have a dataset of locations of radiotagged hummingbirds I’ve been following as part of my thesis. As you might imagine, they fly fast so there were intervals when I lost track of where they were until I eventually found them again.
Now I am trying to identify the segments where the bird was followed continuously (i.e., the intervals between “Lost” periods).
ID Type TimeStart TimeEnd Limiter Starter Ender
1 Observed 6:45:00 6:45:00 NO Start End
2 Lost 6:45:00 5:31:00 YES NO NO
3 Observed 5:31:00 5:31:00 NO Start NO
4 Observed 9:48:00 9:48:00 NO NO NO
5 Observed 10:02:00 10:02:00 NO NO NO
6 Observed 10:18:00 10:18:00 NO NO NO
7 Observed 11:00:00 11:00:00 NO NO NO
8 Observed 13:15:00 13:15:00 NO NO NO
9 Observed 13:34:00 13:34:00 NO NO NO
10 Observed 13:43:00 13:43:00 NO NO NO
11 Observed 13:52:00 13:52:00 NO NO NO
12 Observed 14:25:00 14:25:00 NO NO NO
13 Observed 14:46:00 14:46:00 NO NO End
14 Lost 14:46:00 10:47:00 YES NO NO
15 Observed 10:47:00 10:47:00 NO Start NO
16 Observed 10:57:00 11:00:00 NO NO NO
17 Observed 11:10:00 11:10:00 NO NO NO
18 Observed 11:19:00 11:27:55 NO NO NO
19 Observed 11:28:05 11:32:00 NO NO NO
20 Observed 11:45:00 12:09:00 NO NO NO
21 Observed 11:51:00 11:51:00 NO NO NO
22 Observed 12:11:00 12:11:00 NO NO NO
23 Observed 13:15:00 13:15:00 NO NO End
24 Lost 13:15:00 7:53:00 YES NO NO
25 Observed 7:53:00 7:53:00 NO Start NO
26 Observed 8:48:00 8:48:00 NO NO NO
27 Observed 9:25:00 9:25:00 NO NO NO
28 Observed 9:26:00 9:26:00 NO NO NO
29 Observed 9:32:00 9:33:25 NO NO NO
30 Observed 9:33:35 9:33:35 NO NO NO
31 Observed 9:42:00 9:42:00 NO NO NO
32 Observed 9:44:00 9:44:00 NO NO NO
33 Observed 9:48:00 9:48:00 NO NO NO
34 Observed 9:48:30 9:48:30 NO NO NO
35 Observed 9:51:00 9:51:00 NO NO NO
36 Observed 9:54:00 9:54:00 NO NO NO
37 Observed 9:55:00 9:55:00 NO NO NO
38 Observed 9:57:00 10:01:00 NO NO NO
39 Observed 10:02:00 10:02:00 NO NO NO
40 Observed 10:04:00 10:04:00 NO NO NO
41 Observed 10:06:00 10:06:00 NO NO NO
42 Observed 10:20:00 10:33:00 NO NO NO
43 Observed 10:34:00 10:34:00 NO NO NO
44 Observed 10:39:00 10:39:00 NO NO End
Note: When there is a “Start” and an “End” in the same row it’s because the non-lost period consists only of that record.
I was able to identify the records that start or end these "non-lost" periods (under the columns "Starter" and "Ender"), but now I want to give those periods unique identifiers (period A, B, C or 1, 2, 3, etc.).
Ideally, the identifier would be the ID of the start point for that period (i.e., ID[Starter == "Start"]).
I'm looking for something like this:
ID Type TimeStart TimeEnd Limiter Starter Ender Period
1 Observed 6:45:00 6:45:00 NO Start End 1
2 Lost 6:45:00 5:31:00 YES NO NO Lost
3 Observed 5:31:00 5:31:00 NO Start NO 3
4 Observed 9:48:00 9:48:00 NO NO NO 3
5 Observed 10:02:00 10:02:00 NO NO NO 3
6 Observed 10:18:00 10:18:00 NO NO NO 3
7 Observed 11:00:00 11:00:00 NO NO NO 3
8 Observed 13:15:00 13:15:00 NO NO NO 3
9 Observed 13:34:00 13:34:00 NO NO NO 3
10 Observed 13:43:00 13:43:00 NO NO NO 3
11 Observed 13:52:00 13:52:00 NO NO NO 3
12 Observed 14:25:00 14:25:00 NO NO NO 3
13 Observed 14:46:00 14:46:00 NO NO End 3
14 Lost 14:46:00 10:47:00 YES NO NO Lost
15 Observed 10:47:00 10:47:00 NO Start NO 15
16 Observed 10:57:00 11:00:00 NO NO NO 15
17 Observed 11:10:00 11:10:00 NO NO NO 15
18 Observed 11:19:00 11:27:55 NO NO NO 15
19 Observed 11:28:05 11:32:00 NO NO NO 15
20 Observed 11:45:00 12:09:00 NO NO NO 15
21 Observed 11:51:00 11:51:00 NO NO NO 15
22 Observed 12:11:00 12:11:00 NO NO NO 15
23 Observed 13:15:00 13:15:00 NO NO End 15
24 Lost 13:15:00 7:53:00 YES NO NO Lost
Would this be too hard to do in R?
Thanks!
> d <- data.frame(Limiter = rep("NO", 44), Starter = rep("NO", 44), Ender = rep("NO", 44), stringsAsFactors = FALSE)
> d$Starter[c(1, 3, 15, 25)] <- "Start"
> d$Ender[c(1, 13, 23, 44)] <- "End"
> d$Limiter[c(2, 14, 24)] <- "Yes"
> d$Period <- ifelse(d$Limiter == "Yes", "Lost", which(d$Starter == "Start")[cumsum(d$Starter == "Start")])  # non-lost rows get the row number of the most recent "Start"; lost rows get "Lost"
> d
Limiter Starter Ender Period
1 NO Start End 1
2 Yes NO NO Lost
3 NO Start NO 3
4 NO NO NO 3
5 NO NO NO 3
6 NO NO NO 3
7 NO NO NO 3
8 NO NO NO 3
9 NO NO NO 3
10 NO NO NO 3
11 NO NO NO 3
12 NO NO NO 3
13 NO NO End 3
14 Yes NO NO Lost
15 NO Start NO 15
16 NO NO NO 15
17 NO NO NO 15
18 NO NO NO 15
19 NO NO NO 15
20 NO NO NO 15
21 NO NO NO 15
22 NO NO NO 15
23 NO NO End 15
24 Yes NO NO Lost
25 NO Start NO 25
26 NO NO NO 25
27 NO NO NO 25
28 NO NO NO 25
29 NO NO NO 25
30 NO NO NO 25
31 NO NO NO 25
32 NO NO NO 25
33 NO NO NO 25
34 NO NO NO 25
35 NO NO NO 25
36 NO NO NO 25
37 NO NO NO 25
38 NO NO NO 25
39 NO NO NO 25
40 NO NO NO 25
41 NO NO NO 25
42 NO NO NO 25
43 NO NO NO 25
44 NO NO End 25
I have one file (location) that has x,y coordinates and a date/time identifier. I want to get information from a second table (weather) that has a "similar" date/time variable plus co-variables (temperature and wind speed). The trick is that the date/times are not exactly the same in the two tables. I want to select the weather record that is closest in time to each location record. I know I need to do some loops, and that's about it.
Example location:
x y date/time
1 3 01/02/2003 18:00
2 3 01/02/2003 19:00
3 4 01/03/2003 23:00
2 5 01/04/2003 02:00
Example weather:
date/time        temp wind
01/01/2003 13:00   12   15
01/02/2003 16:34   10   16
01/02/2003 20:55   14   22
01/02/2003 21:33   14   22
01/03/2003 00:22   13   19
01/03/2003 14:55   12   12
01/03/2003 18:00   10   12
01/03/2003 23:44    2   33
01/04/2003 01:55    6   22
So the final output would be a table with the weather data best matched to each location record:
x y datetime datetime temp wind
1 3 01/02/2003 18:00 ---- 01/02/2003 16:34 10 16
2 3 01/02/2003 19:00 ---- 01/02/2003 20:55 14 22
3 4 01/03/2003 23:00 ---- 01/03/2003 00:22 13 19
2 5 01/04/2003 02:00 ---- 01/04/2003 01:55 6 22
Any suggestions on where to start? I am trying to do this in R.
I needed to bring the data in with date and time as separate columns and then paste and format them:
location$dt.time <- as.POSIXct(paste(location$date, location$time),
format="%m/%d/%Y %H:%M")
And the same for weather.
Then, for each value of dt.time in location, find the entry in weather with the smallest absolute time difference:
sapply(location$dt.time, function(x) which.min(abs(difftime(x, weather$dt.time))))
# [1] 2 3 8 9
cbind(location, weather[ sapply(location$dt.time,
function(x) which.min(abs(difftime(x, weather$dt.time)))), ])
x y date time dt.time date time temp wind dt.time
2 1 3 01/02/2003 18:00 2003-01-02 18:00:00 01/02/2003 16:34 10 16 2003-01-02 16:34:00
3 2 3 01/02/2003 19:00 2003-01-02 19:00:00 01/02/2003 20:55 14 22 2003-01-02 20:55:00
8 3 4 01/03/2003 23:00 2003-01-03 23:00:00 01/03/2003 23:44 2 33 2003-01-03 23:44:00
9 2 5 01/04/2003 02:00 2003-01-04 02:00:00 01/04/2003 01:55 6 22 2003-01-04 01:55:00
cbind(location, weather[
sapply(location$dt.time,
function(x) which.min(abs(difftime(x, weather$dt.time)))), ])[ #pick columns
c(1,2,5,8,9,10)]
x y dt.time temp wind dt.time.1
2 1 3 2003-01-02 18:00:00 10 16 2003-01-02 16:34:00
3 2 3 2003-01-02 19:00:00 14 22 2003-01-02 20:55:00
8 3 4 2003-01-03 23:00:00 2 33 2003-01-03 23:44:00
9 2 5 2003-01-04 02:00:00 6 22 2003-01-04 01:55:00
My answers are a bit different from yours, but another reader has already questioned whether your hand-matched expected output is correct.
One fast and short way may be to use data.table.
If you create two data.tables, X and Y, both with keys, then the syntax is:
X[Y, roll = TRUE]
We call that a rolling join because we roll the prevailing observation in X forward to match the row in Y. See the examples in ?data.table and the introduction vignette.
Another way to do this is the zoo package, which has na.locf() (last observation carried forward); other packages may offer something similar too.
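A minimal illustration of that function on a made-up vector; na.locf() fills each NA with the last non-NA value, which is the "prevailing observation" notion used above:
library(zoo)
na.locf(c(1, NA, NA, 4, NA))  # fills each NA with the last non-NA value
# [1] 1 1 1 4 4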
I'm not sure if you mean closest in terms of location or in terms of time. If location, and that location is given by x,y coordinates, then you will need some distance measure in 2D space, I guess. data.table only does univariate 'closest', e.g. by time. Reading your question a second time, though, it does seem you mean closest in the prevailing (time) sense.
EDIT: I have seen the example data now. data.table won't do this in one step, because although it can roll forwards or backwards, it won't roll to the nearest. You could do it with an extra step using which = TRUE and then test whether the observation after the prevailing one was actually closer.
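For completeness, a minimal sketch of the rolling join on data shaped like the example above; the date/times are parsed to POSIXct and used as keys, and note that current data.table versions also accept roll = "nearest", which matches the closest time in either direction rather than only rolling forward:
library(data.table)
location <- data.table(x = c(1, 2, 3, 2), y = c(3, 3, 4, 5),
                       dt = as.POSIXct(c("2003-01-02 18:00", "2003-01-02 19:00",
                                         "2003-01-03 23:00", "2003-01-04 02:00")))
weather <- data.table(dt = as.POSIXct(c("2003-01-01 13:00", "2003-01-02 16:34",
                                        "2003-01-02 20:55", "2003-01-03 23:44",
                                        "2003-01-04 01:55")),
                      temp = c(12, 10, 14, 2, 6),
                      wind = c(15, 16, 22, 33, 22))
setkey(location, dt)
setkey(weather, dt)
weather[location, roll = TRUE]       # prevailing (last earlier) weather row per location time
weather[location, roll = "nearest"]  # closest weather row in either direction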