Removed rows containing missing values (ggmap, R)

So I'm trying to make a 2D latitude/longitude map with the ggmap package and I'm encountering a problem.
dataset:
slddataset
# A tibble: 382 x 17
station year jd sl_pa sst sss ssf depth sbt sbs sbf gravel sand silt clay lat long
<int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 4101 2014 142 0 11.7 25.0 0.419 39.9 4.95 31.9 0.320 2.36 97.5 0.110 0.0300 42.2 70.3
2 4102 2014 142 0 11.3 37.8 0.509 27.6 5.03 31.9 0.372 0.390 99.5 0.0700 0.0200 42.2 70.3
3 4104 2014 142 0 11.3 41.2 0.803 24.9 5.50 31.7 0.556 0.700 99.2 0.0800 0.0700 42.2 70.3
4 4105 2014 142 0 10.6 30.8 0.808 28.3 5.14 31.9 0.596 6.83 93.1 0.0700 0.0300 42.2 70.2
5 4106 2014 142 0 10.5 30.7 0.693 35.6 4.93 32.1 0.887 10.8 89.1 0.0500 0.0700 42.2 70.2
6 4107 2014 142 0 11.0 30.7 0.724 41.3 4.44 32.3 0.684 11.3 88.5 0.110 0.120 42.2 70.2
7 4108 2014 142 0 10.3 30.8 0.741 44.4 4.28 32.5 0.340 4.77 95.0 0.110 0.100 42.2 70.1
8 4109 2014 142 0 9.97 30.9 0.980 44.3 4.32 32.4 0.398 7.80 92.0 0.110 0.110 42.2 70.1
9 4110 2014 142 0 10.9 30.7 0.794 41.2 4.60 32.3 0.592 10.3 89.5 0.100 0.0900 42.2 70.2
10 4113 2014 143 0 12.0 30.5 0.684 32.2 4.98 31.9 0.336 0.320 99.6 0.0600 0.0300 42.2 70.3
# ... with 372 more rows
Code and warning:
library(ggmap)
stellwagen <- ggmap(get_googlemap(center = "stellwagen bank", zoom = 7, maptype = "satellite"))
stellwagen + geom_point(aes(x = long, y = lat, color = sl_pa), data = slddataset)
Warning message:
Removed 382 rows containing missing values (geom_point).
Anyone have any ideas?

I think your longitudes are wrong in slddataset: they should all be negative. Stellwagen Bank sits near 70.3°W, so with positive longitudes every point falls outside the map's bounding box and ggplot drops it, which is what the "Removed 382 rows" warning is telling you. After correcting the signs, I can plot the points on the map.
library(dplyr)
library(ggmap)
slddataset <- slddataset %>% mutate(long = long * -1)
stellwagen <- ggmap(get_googlemap(center = "stellwagen bank", zoom = 7, maptype = "satellite"))
stellwagen +
  geom_point(aes(x = long, y = lat), data = slddataset)
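If you want the colour mapping from the question back, it drops in unchanged once the longitudes are fixed:
stellwagen +
  geom_point(aes(x = long, y = lat, color = sl_pa), data = slddataset)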
DATA
slddataset <- read.table(text = "station year jd sl_pa sst sss ssf depth sbt sbs sbf gravel sand silt clay lat long
1 4101 2014 142 0 11.7 25.0 0.419 39.9 4.95 31.9 0.320 2.36 97.5 0.110 0.0300 42.2 70.3
2 4102 2014 142 0 11.3 37.8 0.509 27.6 5.03 31.9 0.372 0.390 99.5 0.0700 0.0200 42.2 70.3
3 4104 2014 142 0 11.3 41.2 0.803 24.9 5.50 31.7 0.556 0.700 99.2 0.0800 0.0700 42.2 70.3
4 4105 2014 142 0 10.6 30.8 0.808 28.3 5.14 31.9 0.596 6.83 93.1 0.0700 0.0300 42.2 70.2
5 4106 2014 142 0 10.5 30.7 0.693 35.6 4.93 32.1 0.887 10.8 89.1 0.0500 0.0700 42.2 70.2
6 4107 2014 142 0 11.0 30.7 0.724 41.3 4.44 32.3 0.684 11.3 88.5 0.110 0.120 42.2 70.2
7 4108 2014 142 0 10.3 30.8 0.741 44.4 4.28 32.5 0.340 4.77 95.0 0.110 0.100 42.2 70.1
8 4109 2014 142 0 9.97 30.9 0.980 44.3 4.32 32.4 0.398 7.80 92.0 0.110 0.110 42.2 70.1
9 4110 2014 142 0 10.9 30.7 0.794 41.2 4.60 32.3 0.592 10.3 89.5 0.100 0.0900 42.2 70.2
10 4113 2014 143 0 12.0 30.5 0.684 32.2 4.98 31.9 0.336 0.320 99.6 0.0600 0.0300 42.2 70.3",
header = TRUE, stringsAsFactors = FALSE)

Related

Obtaining hourly average data from 1 minute dataframe

I have a data set at 1-minute intervals, and I am looking for a way to convert it to hourly averages. I am new to R programming for data analysis. Below is an example of how my data looks.
If there are other easy ways besides R to solve this issue, kindly specify. I hope to hear from someone soon.
TimeStamp TSP PM10 PM2.5 PM1 T RH
1 01/12/2022 14:08 44.3 14.2 6.97 3.34 32.9 53.2
2 01/12/2022 14:09 40.3 16.9 7.10 3.52 33.1 53.1
3 01/12/2022 14:10 36.5 15.6 7.43 3.64 33.2 53.1
4 01/12/2022 14:11 33.0 16.5 7.29 3.40 33.2 52.6
5 01/12/2022 14:12 41.3 18.2 7.73 3.41 33.3 52.9
6 01/12/2022 14:13 38.5 16.3 7.54 3.44 33.3 53.3
7 01/12/2022 14:14 38.5 18.5 6.80 3.14 33.2 53.6
8 01/12/2022 14:15 30.7 17.1 6.86 3.33 33.2 53.7
9 01/12/2022 14:16 32.5 18.3 8.56 4.42 33.3 53.5
10 01/12/2022 14:17 26.4 15.6 9.34 4.70 33.4 53.0
11 01/12/2022 14:18 23.8 14.6 7.56 3.97 33.4 52.5
12 01/12/2022 14:19 18.1 11.4 6.15 3.08 33.4 51.7
13 01/12/2022 14:20 22.4 12.2 6.43 3.49 33.5 50.9
14 01/12/2022 14:21 17.9 12.9 6.03 3.15 33.6 50.9
15 01/12/2022 14:22 18.6 12.8 5.87 3.19 33.7 50.7
16 01/12/2022 14:23 22.3 10.7 5.49 2.74 33.7 50.6
17 01/12/2022 14:24 18.1 9.2 4.87 2.52 33.7 49.9
18 01/12/2022 14:25 19.2 13.0 5.12 2.65 33.7 50.2
19 01/12/2022 14:26 19.0 10.3 5.01 2.78 33.9 50.0
20 01/12/2022 14:27 20.0 10.3 4.78 2.57 34.0 49.4
21 01/12/2022 14:28 14.1 9.6 4.71 2.45 34.1 49.0
22 01/12/2022 14:29 16.1 10.3 4.83 2.68 34.1 48.9
23 01/12/2022 14:30 13.9 10.0 5.21 2.99 34.2 49.5
24 01/12/2022 14:31 27.3 11.5 5.90 2.94 34.2 49.7
25 01/12/2022 14:32 23.8 12.8 5.77 2.97 34.2 49.6
26 01/12/2022 14:33 19.3 12.4 5.92 3.29 34.3 49.6
27 01/12/2022 14:34 30.9 14.4 6.10 3.22 34.3 49.3
28 01/12/2022 14:35 30.5 15.0 5.73 2.98 34.3 49.9
29 01/12/2022 14:36 24.7 13.9 6.17 3.17 34.3 50.0
30 01/12/2022 14:37 27.0 12.3 6.16 3.14 34.2 50.2
31 01/12/2022 14:38 27.0 12.4 5.65 3.28 34.2 50.3
32 01/12/2022 14:39 22.2 12.5 5.51 3.10 34.2 50.2
33 01/12/2022 14:40 19.0 11.6 5.46 3.06 34.1 50.3
34 01/12/2022 14:41 24.3 14.3 5.45 3.01 34.1 50.2
35 01/12/2022 14:42 17.6 10.9 5.64 3.30 34.1 50.5
36 01/12/2022 14:43 20.9 10.1 5.80 3.26 34.0 51.0
37 01/12/2022 14:44 19.0 11.7 5.93 3.27 33.9 50.9
38 01/12/2022 14:45 25.7 15.6 6.20 3.40 33.9 51.1
39 01/12/2022 14:46 20.1 14.4 6.08 3.39 34.0 51.3
40 01/12/2022 14:47 14.8 11.1 5.91 3.44 34.1 50.9
I have tried several methods I found via my research, but none seems to work for me. Below is the code I have tried:
ref.data.hourly <- ref.data %>%
  group_by(hour = format(as.POSIXct(cut(TimeStamp, break = "hour")), "%H")) %>%
  summarise(meanval = mean(val, na.rm = TRUE))
I have also tried this
ref.data$TimeStamp <- as.POSIXct(ref.data$TimeStamp, format = "%d/%m/%Y %H:%M")
ref.data.xts$TimeStamp <- NULL
ref.data$TimeStamp <- strptime(ref.data$TimeStamp, "%d/%m/%Y %H:%M")
ref.data$group <- cut(ref.data$TimeStamp, breaks = "hour")
Your first attempt seems sensible to me. Lacking further info about your data or a specific error message, I assume the problem is in the date-time handling (or, more specifically, in using cut() on date-time values).
A workaround is to convert the dates to character (if they aren't already) and then simply cut off the minutes. Given that as.character(ref.data$TimeStamp) is consistently formatted like 01/12/2022 14:08, you can do the following:
ref.data.hourly <- ref.data %>%
  mutate(hour_grps = substr(as.character(TimeStamp), 1, 13)) %>%  # "01/12/2022 14" keeps the date plus the hour
  group_by(hour_grps) %>%
  summarise(meanval = mean(val, na.rm = TRUE))  # replace val with the column you want to average
I don't think this is good practice, because it will break if you run the same code on slightly differently formatted data. For instance, on a computer with a different locale, the date-time formatting used by as.character() may change. So please consider this a quick fix, not a permanent solution.
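A more locale-proof sketch (assuming the lubridate package is available and the column names are as shown in the question) parses the timestamps and truncates them to the hour with floor_date():
library(dplyr)
library(lubridate)

ref.data.hourly <- ref.data %>%
  mutate(TimeStamp = dmy_hm(TimeStamp)) %>%             # parse "01/12/2022 14:08"
  group_by(hour = floor_date(TimeStamp, "hour")) %>%    # truncate to the hour
  summarise(across(TSP:RH, ~ mean(.x, na.rm = TRUE)))   # hourly mean of every measurement column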

dplyr - programming dynamic variable & function name - ascending & descending

I am trying to find a way to shorten my code by dynamically naming variables & functions related to ascending & descending order. I can use desc for descending, but couldn't find anything for ascending. Below is a reproducible example to demonstrate my problem.
Here is the sample dataset
library(dplyr)
set.seed(100)
data <- tibble(a = runif(20, min = 0, max = 100),
               b = runif(20, min = 0, max = 100),
               c = runif(20, min = 0, max = 100))
Dynamically passing variable with percent rank in ascending order
current_var <- "a" # dynamic variable name
data %>%
  mutate("percent_rank_{current_var}" := percent_rank(!!sym(current_var)))
#> # A tibble: 20 × 4
#> a b c percent_rank_a
#> <dbl> <dbl> <dbl> <dbl>
#> 1 30.8 53.6 33.1 0.263
#> 2 25.8 71.1 86.5 0.158
#> 3 55.2 53.8 77.8 0.684
#> 4 5.64 74.9 82.7 0
#> 5 46.9 42.0 60.3 0.526
#> 6 48.4 17.1 49.1 0.579
#> 7 81.2 77.0 78.0 0.947
#> 8 37.0 88.2 88.4 0.421
#> 9 54.7 54.9 20.8 0.632
#> 10 17.0 27.8 30.7 0.0526
#> 11 62.5 48.8 33.1 0.737
#> 12 88.2 92.9 19.9 1
#> 13 28.0 34.9 23.6 0.211
#> 14 39.8 95.4 27.5 0.474
#> 15 76.3 69.5 59.1 0.895
#> 16 66.9 88.9 25.3 0.789
#> 17 20.5 18.0 12.3 0.105
#> 18 35.8 62.9 23.0 0.316
#> 19 35.9 99.0 59.8 0.368
#> 20 69.0 13.0 21.1 0.842
Dynamically passing variable with percent rank in descending order
data %>%
  mutate("percent_rank_{current_var}" := percent_rank(desc(!!sym(current_var))))
#> # A tibble: 20 × 4
#> a b c percent_rank_a
#> <dbl> <dbl> <dbl> <dbl>
#> 1 30.8 53.6 33.1 0.737
#> 2 25.8 71.1 86.5 0.842
#> 3 55.2 53.8 77.8 0.316
#> 4 5.64 74.9 82.7 1
#> 5 46.9 42.0 60.3 0.474
#> 6 48.4 17.1 49.1 0.421
#> 7 81.2 77.0 78.0 0.0526
#> 8 37.0 88.2 88.4 0.579
#> 9 54.7 54.9 20.8 0.368
#> 10 17.0 27.8 30.7 0.947
#> 11 62.5 48.8 33.1 0.263
#> 12 88.2 92.9 19.9 0
#> 13 28.0 34.9 23.6 0.789
#> 14 39.8 95.4 27.5 0.526
#> 15 76.3 69.5 59.1 0.105
#> 16 66.9 88.9 25.3 0.211
#> 17 20.5 18.0 12.3 0.895
#> 18 35.8 62.9 23.0 0.684
#> 19 35.9 99.0 59.8 0.632
#> 20 69.0 13.0 21.1 0.158
How can I combine both into one statement? I can do it for desc, but couldn't find an explicit equivalent for ascending order.
rank_function <- desc # dynamic function for ranking
data %>%
  mutate("percent_rank_{current_var}" := percent_rank(rank_function(!!sym(current_var))))
#> # A tibble: 20 × 4
#> a b c percent_rank_a
#> <dbl> <dbl> <dbl> <dbl>
#> 1 30.8 53.6 33.1 0.737
#> 2 25.8 71.1 86.5 0.842
#> 3 55.2 53.8 77.8 0.316
#> 4 5.64 74.9 82.7 1
#> 5 46.9 42.0 60.3 0.474
#> 6 48.4 17.1 49.1 0.421
#> 7 81.2 77.0 78.0 0.0526
#> 8 37.0 88.2 88.4 0.579
#> 9 54.7 54.9 20.8 0.368
#> 10 17.0 27.8 30.7 0.947
#> 11 62.5 48.8 33.1 0.263
#> 12 88.2 92.9 19.9 0
#> 13 28.0 34.9 23.6 0.789
#> 14 39.8 95.4 27.5 0.526
#> 15 76.3 69.5 59.1 0.105
#> 16 66.9 88.9 25.3 0.211
#> 17 20.5 18.0 12.3 0.895
#> 18 35.8 62.9 23.0 0.684
#> 19 35.9 99.0 59.8 0.632
#> 20 69.0 13.0 21.1 0.158
Created on 2022-08-17 by the reprex package (v2.0.1)
You could define a function that simply returns its input:
rank_function <- function(x) x
In fact, this function already exists in base R, namely identity:
rank_function <- identity
Also, you can explore the source code of desc:
desc
function (x) -xtfrm(x)
So desc is just the negation of xtfrm, which means you can use xtfrm itself for ascending order.
rank_function <- xtfrm
From the help page of xtfrm(x):
A generic auxiliary function that produces a numeric vector which will sort in the same order as x.
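Putting this together, the same mutate() call from the question then handles both directions; only the function bound to rank_function changes:
rank_function <- xtfrm # ascending; swap in desc for descending
data %>%
  mutate("percent_rank_{current_var}" := percent_rank(rank_function(!!sym(current_var))))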

R output giving wrong values for difference between columns

I have this tibble called data1:
structure(list(subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12), treatment = c("099526-01", "099526-01", "099526-01", "099526-01",
"099526-01", "099526-01", "099526-01", "099526-01", "099526-01",
"099526-01", "099526-01", "099526-01"), T0 = c(34.35, 26.5, 29.65,
11.575, 34.4, 25.775, 33, 31.6, 18.35, 36.275, 36.075, 34.225
), T15min = c(34.85, 28.95, 30.2, 11.05, 34.1, 22.025, 25.325,
31.775, 17.8, 31.7, 35.35, 34.25), T2h = c(33.425, 26.125, 27.65,
11.475, 36.95, 22.975, 30.025, 31.775, 18.025, 33.025, 34.125,
34.55), T4h = c(35.7, 26.075, 29.3, 13.275, 36.45, 28.475, 30.925,
32.15, 17.425, 34.95, 34.55, 34.775), T6h = c(36.225, 28.15,
29.1, 12.25, 34.275, 26.05, 28.1, 34.025, 17.775, 35.3, 35.125,
36.725), T8h = c(34.9, 25.75, 30.425, 10.75, 34.425, 28.725,
28.475, 34.35, 19.325, 33.925, 36.95, 38.225)), row.names = c(NA,
-12L), class = c("tbl_df", "tbl", "data.frame"))
subject treatment T0 T15min T2h T4h T6h T8h
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 099526-01 34.4 34.8 33.4 35.7 36.2 34.9
2 2 099526-01 26.5 29.0 26.1 26.1 28.2 25.8
3 3 099526-01 29.6 30.2 27.6 29.3 29.1 30.4
4 4 099526-01 11.6 11.0 11.5 13.3 12.2 10.8
5 5 099526-01 34.4 34.1 37.0 36.4 34.3 34.4
6 6 099526-01 25.8 22.0 23.0 28.5 26.0 28.7
7 7 099526-01 33 25.3 30.0 30.9 28.1 28.5
8 8 099526-01 31.6 31.8 31.8 32.2 34.0 34.4
9 9 099526-01 18.4 17.8 18.0 17.4 17.8 19.3
10 10 099526-01 36.3 31.7 33.0 35.0 35.3 33.9
11 11 099526-01 36.1 35.4 34.1 34.6 35.1 37.0
12 12 099526-01 34.2 34.2 34.6 34.8 36.7 38.2
I'm creating a new tibble with new columns for the difference of each time point relative to T0 (e.g., T15min - T0, T2h - T0), as follows:
data2 <- data1 %>%
  mutate(delta_1 = .[[4]] - .[[3]],
         delta_2 = .[[5]] - .[[3]],
         delta_3 = .[[6]] - .[[3]],
         delta_4 = .[[7]] - .[[3]],
         delta_5 = .[[8]] - .[[3]])
subject treatment T0 T15min T2h T4h T6h T8h delta_1 delta_2 delta_3 delta_4 delta_5
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 099526-01 34.4 34.8 33.4 35.7 36.2 34.9 0.5 -0.925 1.35 1.88 0.550
2 2 099526-01 26.5 29.0 26.1 26.1 28.2 25.8 2.45 -0.375 -0.425 1.65 -0.75
3 3 099526-01 29.6 30.2 27.6 29.3 29.1 30.4 0.550 -2 -0.350 -0.550 0.775
4 4 099526-01 11.6 11.0 11.5 13.3 12.2 10.8 -0.525 -0.100 1.70 0.675 -0.825
5 5 099526-01 34.4 34.1 37.0 36.4 34.3 34.4 -0.300 2.55 2.05 -0.125 0.0250
6 6 099526-01 25.8 22.0 23.0 28.5 26.0 28.7 -3.75 -2.80 2.70 0.275 2.95
7 7 099526-01 33 25.3 30.0 30.9 28.1 28.5 -7.68 -2.98 -2.07 -4.9 -4.52
8 8 099526-01 31.6 31.8 31.8 32.2 34.0 34.4 0.175 0.175 0.550 2.43 2.75
9 9 099526-01 18.4 17.8 18.0 17.4 17.8 19.3 -0.550 -0.325 -0.925 -0.575 0.975
10 10 099526-01 36.3 31.7 33.0 35.0 35.3 33.9 -4.57 -3.25 -1.32 -0.975 -2.35
11 11 099526-01 36.1 35.4 34.1 34.6 35.1 37.0 -0.725 -1.95 -1.53 -0.950 0.875
12 12 099526-01 34.2 34.2 34.6 34.8 36.7 38.2 0.0250 0.325 0.550 2.50 4
However, the differences look wrong. For example, for the first subject, T2h - T0 (33.4 - 34.4) should give -1, not -0.925.
What could be wrong with the code?
The code is correct; it's the printed output that misleads you. Tibbles display only three significant digits by default, so the values you see (34.4, 33.4) are rounded versions of the stored ones (34.35, 33.425). For subject 1, T2h - T0 is really 33.425 - 34.35 = -0.925, which is exactly what you got.
To print more digits, try running
options(pillar.sigfig = 5)
at the top of your script (tibble printing is governed by the pillar package; options(digits = ...) only affects base R output).
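As an aside, the positional indexing can be avoided entirely. A sketch with dplyr's across() (column names taken from the printed tibble) builds all the delta columns in one step:
library(dplyr)

data2 <- data1 %>%
  mutate(across(T15min:T8h, ~ .x - T0, .names = "delta_{.col}"))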

Taylor Diagrams by Group in R (openair)

I'm trying to create a Taylor diagram to show agreement between observations and model output. The openair package lets you differentiate by a group, which I would like to do for each site.
This is the code that I'm using:
TaylorDiagram(month_join, obs = "temp", mod = "temp_surf", group = "dataset_id", normalise = TRUE, cex = 1)
The observation variable is temp, the model variable is temp_surf, and the site variable I want to differentiate by is dataset_id.
When I do this, even though there are 17 different datasets, they are binned into four groups. I can't find any help online about this. The function documentation says that for the group argument, "The total number of models compared will be equal to the number of unique values of group", yet my 17 unique values are automatically binned into 4.
[Figure: Taylor diagram showing 4 groups instead of 17]
[Edit: first 20 rows of data from month_join]
# A tibble: 20 × 14
# Groups: dataset_id [2]
dataset_id month temp_surf temp_mid temp_bot ph_surf ph_mid ph_bot do_surf do_mid do_bot temp ph do
<dbl> <ord> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 3 1 13.4 13.3 13.2 8.01 7.99 7.97 244. 232. 220. 13.3 8.00 NaN
2 3 2 13.3 13.2 13.0 8.01 7.98 7.96 245. 232. 218. 12.5 7.99 NaN
3 3 3 12.9 12.7 12.5 7.97 7.94 7.91 233. 216. 199. 12.7 8.04 NaN
4 3 4 12.6 12.4 12.2 7.93 7.91 7.89 223. 207. 190. NaN NaN NaN
5 3 5 12.9 12.7 12.4 7.93 7.91 7.89 223. 208. 193. NaN NaN NaN
6 3 6 13.5 13.2 12.9 7.94 7.92 7.90 226. 212. 198. 15.1 8.04 NaN
7 3 7 14.3 13.9 13.5 7.97 7.95 7.94 236. 224. 212. 16.0 8.09 NaN
8 3 8 14.4 14.1 13.8 7.98 7.97 7.95 238. 228. 217. 16.6 8.06 NaN
9 3 9 14.8 14.5 14.1 8.00 7.99 7.97 244. 235. 227. 16.7 8.05 NaN
10 3 10 14.8 14.4 14.1 8.00 7.98 7.96 243. 233. 222. 16.2 8.05 NaN
11 3 11 14.3 14.0 13.7 7.99 7.96 7.94 237. 224. 211. 15.5 8.05 NaN
12 3 12 13.6 13.4 13.3 7.99 7.97 7.94 237. 225. 213. 14.4 8.05 NaN
13 6 1 14.3 9.48 4.70 8.07 7.84 7.62 261. 143. 24.7 13.6 NaN NaN
14 6 2 14.2 9.42 4.68 8.07 7.84 7.62 264. 144. 24.4 13.5 NaN NaN
15 6 3 14.5 9.61 4.67 8.07 7.84 7.61 266. 145. 24.2 14.0 NaN NaN
16 6 4 15.0 9.86 4.68 8.06 7.84 7.61 264. 144. 24.0 14.3 NaN NaN
17 6 5 16.0 10.4 4.68 8.05 7.83 7.61 262. 143. 24.0 16.4 NaN NaN
18 6 6 17.3 11.0 4.68 8.04 7.83 7.61 257. 141. 23.9 17.6 NaN NaN
19 6 7 18.8 11.7 4.71 8.03 7.82 7.61 251. 138. 24.2 19.3 NaN NaN
20 6 8 19.2 12.0 4.76 8.03 7.82 7.61 248. 136. 24.7 NA NA NA
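One thing worth checking (an assumption based on the printed column types, not a confirmed fix): dataset_id is stored as a number (<dbl>), and openair partitions numeric conditioning variables into four quantile intervals via its cutData() helper, which would explain exactly four bins. Converting the column to a factor should keep all 17 sites as separate groups:
library(openair)

# a factor is treated as categorical, so each of the 17 ids stays its own group
month_join$dataset_id <- factor(month_join$dataset_id)

TaylorDiagram(month_join, obs = "temp", mod = "temp_surf",
              group = "dataset_id", normalise = TRUE, cex = 1)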

Shift time series

I have 2 weekly time series, which show a small correlation (~0.33).
How can I 'shift' one of these series in time, so that I can check whether there's a greater correlation in the data at some lag?
Example data:
x = textConnection('1530.2 1980.9 1811 1617 1585.4 1951.8 2146.6 1605 1395.2 1742.6 2206.5 1839.4 1699.1 1665.9 2144.7 2189.1 1718.4 1615.5 2003.3 2267.6 1772.1 1635.2 1836 2261.8 1799.1 1634.9 1638.6 2056.5 2201.4 1726.8 1586.4 1747.9 1982 1695.2 1624.9 1652.4 2011.9 1788.8 1568.4 1540.7 1866.1 2097.3 1601.3 1458.6 1424.4 1786.9 1628.4 1467.4 1476.2 1823 1736.7 1482.7 1334.2 1871.9 1752.9 1471.6 1583.2 1601.4 1987.7 1649.6 1530.9 1547.1 2165.2 1852 1656.9 1605.2 2184.6 1972 1617.6 1491.1 1709.5 2042.2 1667.1 1542.6 1497.6 2090.5 1816.8 1487.5 1468.2 2228.5 1889.9 1690.8 1395.7 1532.8 1934.4 1557.1 1570.6 1453.2 1669.6 1782 1526.1 1411 1608.1 1740.5 1492.3 1477.8 1102.6 1366.1 1701.1 1500.6 1403.2 1787.2 1776.6 1465.3 1429.5')
x = scan(x)
y = textConnection('29.8 22.6 26 24.8 28.9 27.3 26 29.2 28.2 23.9 24.5 23.6 21.1 22 20.7 19.9 22.8 25 21.6 19.1 27.2 23.7 24.2 22.4 25.5 25.4 23.4 24.7 27.4 23.4 25.8 28.8 27.7 23.7 22.9 29.4 22.6 28.6 22.2 27.6 26.2 26.2 29.8 31.5 24.5 28.7 25.9 26.9 25.9 30.5 30.5 29.4 29.3 31.4 30 27.9 28.5 26.4 29.5 28.4 25.1 24.6 21.1 23.6 20.5 23.7 25.3 20.2 23.4 21.1 23.1 24.6 20.7 20.7 26.9 24.1 24.7 25.8 26.7 26 28.9 29.5 27.4 22.1 31.6 25 27.4 30.4 28.9 27.4 22.5 28.4 28.7 31.1 29.3 28.3 30.6 28.6 26 26.2 26.2 26.7 25.6 31.5 30.9')
y = scan(y)
I'm using R with the dtw package, but I'm not familiar with these kinds of algorithms.
Thanks for any help!
You could try the ccf() function in base R, which estimates the cross-correlation function of the two time series.
For example, using your data (see below if you're interested in how I got the data you pasted into your question into the R objects x and y):
xyccf <- ccf(x, y)
yielding
> xyccf
Autocorrelations of series ‘X’, by lag
-17 -16 -15 -14 -13 -12 -11 -10 -9 -8 -7
0.106 0.092 0.014 0.018 0.011 0.029 -0.141 -0.153 -0.107 -0.141 -0.221
-6 -5 -4 -3 -2 -1 0 1 2 3 4
-0.274 -0.175 -0.277 -0.176 -0.217 -0.253 -0.339 -0.274 -0.267 -0.330 -0.278
5 6 7 8 9 10 11 12 13 14 15
-0.184 -0.120 -0.200 -0.156 -0.184 -0.062 -0.076 -0.117 -0.048 0.015 -0.016
16 17
-0.038 -0.029
and the corresponding plot:
[Figure: cross-correlation function of x and y]
To interpret this: when the lag is positive, y is leading x, whereas when the lag is negative, x is leading y.
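To actually shift one series by a chosen lag and re-check the correlation, here is a minimal base-R sketch (assuming x and y are equal-length numeric vectors; shift_cor is a helper name invented here, not part of any package):
xyccf <- ccf(x, y, plot = FALSE)
best_lag <- xyccf$lag[which.max(abs(xyccf$acf))]  # lag with the strongest cross-correlation

shift_cor <- function(x, y, k) {
  # pair x[t + k] with y[t], dropping the ends that no longer overlap
  if (k >= 0) cor(x[(1 + k):length(x)], y[seq_len(length(y) - k)])
  else cor(x[seq_len(length(x) + k)], y[(1 - k):length(y)])
}
shift_cor(x, y, best_lag)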
Reading your data into R...
x <- scan(text = "1530.2 1980.9 1811 1617 1585.4 1951.8 2146.6 1605 1395.2 1742.6
2206.5 1839.4 1699.1 1665.9 2144.7 2189.1 1718.4 1615.5 2003.3
2267.6 1772.1 1635.2 1836 2261.8 1799.1 1634.9 1638.6 2056.5
2201.4 1726.8 1586.4 1747.9 1982 1695.2 1624.9 1652.4 2011.9
1788.8 1568.4 1540.7 1866.1 2097.3 1601.3 1458.6 1424.4 1786.9
1628.4 1467.4 1476.2 1823 1736.7 1482.7 1334.2 1871.9 1752.9
1471.6 1583.2 1601.4 1987.7 1649.6 1530.9 1547.1 2165.2 1852
1656.9 1605.2 2184.6 1972 1617.6 1491.1 1709.5 2042.2 1667.1
1542.6 1497.6 2090.5 1816.8 1487.5 1468.2 2228.5 1889.9 1690.8
1395.7 1532.8 1934.4 1557.1 1570.6 1453.2 1669.6 1782 1526.1
1411 1608.1 1740.5 1492.3 1477.8 1102.6 1366.1 1701.1 1500.6
1403.2 1787.2 1776.6 1465.3 1429.5")
y <- scan(text = "29.8 22.6 26 24.8 28.9 27.3 26 29.2 28.2 23.9 24.5 23.6 21.1 22
20.7 19.9 22.8 25 21.6 19.1 27.2 23.7 24.2 22.4 25.5 25.4 23.4
24.7 27.4 23.4 25.8 28.8 27.7 23.7 22.9 29.4 22.6 28.6 22.2 27.6
26.2 26.2 29.8 31.5 24.5 28.7 25.9 26.9 25.9 30.5 30.5 29.4 29.3
31.4 30 27.9 28.5 26.4 29.5 28.4 25.1 24.6 21.1 23.6 20.5 23.7
25.3 20.2 23.4 21.1 23.1 24.6 20.7 20.7 26.9 24.1 24.7 25.8 26.7
26 28.9 29.5 27.4 22.1 31.6 25 27.4 30.4 28.9 27.4 22.5 28.4 28.7
31.1 29.3 28.3 30.6 28.6 26 26.2 26.2 26.7 25.6 31.5 30.9")
