Suppose I have a dataframe with many columns which can be matched into pairs.
E.g.
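library(tidyverse)
df = tibble(x = rnorm(1000), y = rnorm(1000))
create_many_columns <- function(df, n) {
  varname1 <- paste("x", n, sep = ".")
  varname2 <- paste("y", n, sep = ".")
  df %>%
    mutate(!!varname1 := x * n) %>%
    mutate(!!varname2 := y * n)
}
# Apply the helper for n = 2, ..., 11 to add x.2 ... x.11 and y.2 ... y.11
df <- reduce(2:11, create_many_columns, .init = df)
df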
It's clear that the columns can be matched into pairs (x.n and y.n):
# A tibble: 1,000 x 22
x y x.2 y.2 x.3 y.3 x.4 y.4 x.5 y.5 x.6 y.6 x.7 y.7 x.8 y.8
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 -1.57 0.597 -3.14 1.19 -4.71 1.79 -6.28 2.39 -7.85 2.99 -9.42 3.58 -11.0 4.18 -12.6 4.78
2 -1.20 1.02 -2.40 2.03 -3.60 3.05 -4.80 4.06 -6.00 5.08 -7.20 6.10 -8.40 7.11 -9.60 8.13
3 1.16 -0.304 2.32 -0.609 3.47 -0.913 4.63 -1.22 5.79 -1.52 6.95 -1.83 8.10 -2.13 9.26 -2.44
4 0.870 -1.73 1.74 -3.45 2.61 -5.18 3.48 -6.90 4.35 -8.63 5.22 -10.4 6.09 -12.1 6.96 -13.8
5 0.621 1.89 1.24 3.78 1.86 5.68 2.48 7.57 3.11 9.46 3.73 11.4 4.35 13.2 4.97 15.1
6 -0.970 0.347 -1.94 0.694 -2.91 1.04 -3.88 1.39 -4.85 1.74 -5.82 2.08 -6.79 2.43 -7.76 2.78
7 0.453 0.0866 0.906 0.173 1.36 0.260 1.81 0.346 2.26 0.433 2.72 0.520 3.17 0.606 3.62 0.693
8 -0.840 -0.956 -1.68 -1.91 -2.52 -2.87 -3.36 -3.82 -4.20 -4.78 -5.04 -5.73 -5.88 -6.69 -6.72 -7.64
9 -0.938 -0.967 -1.88 -1.93 -2.81 -2.90 -3.75 -3.87 -4.69 -4.83 -5.63 -5.80 -6.57 -6.77 -7.51 -7.73
10 -0.551 0.0267 -1.10 0.0535 -1.65 0.0802 -2.21 0.107 -2.76 0.134 -3.31 0.160 -3.86 0.187 -4.41 0.214
# … with 990 more rows, and 6 more variables: x.9 <dbl>, y.9 <dbl>, x.10 <dbl>, y.10 <dbl>, x.11 <dbl>, y.11 <dbl>
I want to add a sequence of columns, each being the product of one matched pair. E.g.
for(i in 2:11){
df[[paste0("z.", i)]] = df[[paste0("x.", i)]] * df[[paste0("y.", i)]]
}
df %>% select(contains("z"))
# A tibble: 1,000 x 10
z.2 z.3 z.4 z.5 z.6 z.7 z.8 z.9 z.10 z.11
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 -1.44 -3.25 -5.78 -9.02 -13.0 -17.7 -23.1 -29.2 -36.1 -43.7
2 0.865 1.95 3.46 5.41 7.79 10.6 13.8 17.5 21.6 26.2
3 0.972 2.19 3.89 6.07 8.75 11.9 15.6 19.7 24.3 29.4
4 3.54 7.96 14.2 22.1 31.9 43.4 56.6 71.7 88.5 107.
5 -0.298 -0.671 -1.19 -1.86 -2.68 -3.65 -4.77 -6.04 -7.45 -9.02
6 4.10 9.22 16.4 25.6 36.9 50.2 65.5 82.9 102. 124.
7 3.61 8.12 14.4 22.6 32.5 44.2 57.8 73.1 90.2 109.
8 -1.17 -2.64 -4.69 -7.33 -10.5 -14.4 -18.8 -23.7 -29.3 -35.5
9 1.52 3.42 6.08 9.50 13.7 18.6 24.3 30.8 38.0 46.0
10 -0.0328 -0.0738 -0.131 -0.205 -0.295 -0.402 -0.525 -0.665 -0.820 -0.993
# … with 990 more rows
This solution is fine if I don't mind cluttering my code with loops. But I do, since I have to apply this type of transformation regularly. Is there a way to write it more parsimoniously?
For instance, if I wanted to exponentiate all elements of the "x" columns, I could do
df %>%
mutate_at(vars(contains("x")), exp )
rather than write a loop like
for(i in 2:11){
df[[paste0("x.", i)]] = exp(df[[paste0("x.", i)]] )
}
For the initial example, I would expect there to be a way to write something like
df %>% mutate(z.n = x.n * y.n, n = 2:11)
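One loop-free way to get close to this is to build the paired column names from the name pattern and multiply the columns pairwise. The following is only a minimal sketch, assuming the columns are named exactly x.n and y.n as above; the helper vectors x_cols, y_cols and z_cols are names introduced here for illustration, and the small df stands in for the real data.

```r
library(tibble)

set.seed(42)
# Small stand-in for the wide data frame in the question
df <- tibble(
  x.2 = rnorm(5), y.2 = rnorm(5),
  x.3 = rnorm(5), y.3 = rnorm(5)
)

# Derive the paired column names from the x columns
x_cols <- grep("^x\\.[0-9]+$", names(df), value = TRUE)
y_cols <- sub("^x", "y", x_cols)
z_cols <- sub("^x", "z", x_cols)

# Multiply each x column by its y partner and store the products as z columns
df[z_cols] <- unname(Map(`*`, df[x_cols], df[y_cols]))

names(df)
```

Because the pairing is derived from the names, the same three lines keep working when more x.n/y.n pairs are added.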
I am looking for a nice tidy/dplyr approach to compute the difference between all possible pairs of columns (including both orders, e.g. A-B and B-A) in a dataframe.
I start with df and would like to end with end_df:
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.2.1
#> Warning: package 'tibble' was built under R version 4.2.1
df <- tibble(A = rnorm(1:10),
B = rnorm(1:10),
C = rnorm(1:10))
print(df)
#> # A tibble: 10 × 3
#> A B C
#> <dbl> <dbl> <dbl>
#> 1 -0.292 1.27 0.783
#> 2 -1.11 0.254 -0.410
#> 3 2.05 1.67 1.35
#> 4 1.31 0.0329 -1.29
#> 5 -1.67 -0.379 -0.696
#> 6 -1.02 -0.686 1.43
#> 7 -0.291 -0.0728 0.336
#> 8 -0.507 0.350 1.70
#> 9 -0.707 0.961 -0.493
#> 10 0.0459 -0.299 -0.0113
end_df <- df %>%
mutate( "A-B" = A-B,
"A-C" = A-C,
"B-A" = B-A,
"B-C" = B-C,
"C-A" = C-A,
"C-B" = C-B)
print(end_df)
#> # A tibble: 10 × 9
#> A B C `A-B` `A-C` `B-A` `B-C` `C-A` `C-B`
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -0.292 1.27 0.783 -1.56 -1.08 1.56 0.482 1.08 -0.482
#> 2 -1.11 0.254 -0.410 -1.37 -0.703 1.37 0.664 0.703 -0.664
#> 3 2.05 1.67 1.35 0.380 0.702 -0.380 0.321 -0.702 -0.321
#> 4 1.31 0.0329 -1.29 1.28 2.60 -1.28 1.33 -2.60 -1.33
#> 5 -1.67 -0.379 -0.696 -1.29 -0.975 1.29 0.317 0.975 -0.317
#> 6 -1.02 -0.686 1.43 -0.334 -2.44 0.334 -2.11 2.44 2.11
#> 7 -0.291 -0.0728 0.336 -0.218 -0.627 0.218 -0.409 0.627 0.409
#> 8 -0.507 0.350 1.70 -0.857 -2.20 0.857 -1.35 2.20 1.35
#> 9 -0.707 0.961 -0.493 -1.67 -0.215 1.67 1.45 0.215 -1.45
#> 10 0.0459 -0.299 -0.0113 0.345 0.0572 -0.345 -0.288 -0.0572 0.288
Created on 2022-09-05 by the reprex package (v2.0.1)
You can get a list of all the pairs of names, create a list of new columns computed from the original dataframe, and then bind them:
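# stringsAsFactors = FALSE keeps the column names as character vectors, not factors
pairs <- expand.grid(names(df), names(df), stringsAsFactors = FALSE) %>%
  filter(Var1 != Var2)
map2(pairs$Var1, pairs$Var2, function(x, y) as_tibble_col(df[[x]] - df[[y]], str_c(x, "-", y))) %>%
  bind_cols(df, .)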
# # A tibble: 10 × 9
# A B C `B-A` `C-A` `A-B` `C-B` `A-C` `B-C`
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 0.199 0.110 0.0148 -0.0895 -0.184 0.0895 -0.0948 0.184 0.0948
# 2 -0.851 -0.413 0.338 0.438 1.19 -0.438 0.751 -1.19 -0.751
# 3 -1.13 0.112 -1.97 1.24 -0.835 -1.24 -2.08 0.835 2.08
# 4 0.597 -2.89 -2.32 -3.49 -2.92 3.49 0.572 2.92 -0.572
# 5 -1.10 0.0953 0.996 1.19 2.09 -1.19 0.900 -2.09 -0.900
# 6 0.0191 0.500 1.17 0.481 1.15 -0.481 0.667 -1.15 -0.667
# 7 0.416 0.949 -0.865 0.533 -1.28 -0.533 -1.81 1.28 1.81
# 8 1.84 -1.66 -1.39 -3.50 -3.23 3.50 0.267 3.23 -0.267
# 9 0.406 -1.48 -1.33 -1.89 -1.74 1.89 0.149 1.74 -0.149
# 10 0.393 -0.491 -0.139 -0.884 -0.532 0.884 0.352 0.532 -0.352
I've got a tibble that I'm struggling to turn into a tsibble.
# A tibble: 13 x 8
year `Administration, E~ `All Staff` `Ambulance staff` `Healthcare Assi~ `Medical and De~ `Nursing, Midwife~
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2009 3.97 5.08 7.16 6.94 1.36 6.19
2 2010 4.12 5.07 6.89 7.02 1.41 6.02
3 2011 4.06 5.03 6.69 7.06 1.36 6.02
4 2012 4.40 5.40 7.79 7.48 1.52 6.44
5 2013 4.28 5.35 8.19 7.46 1.48 6.44
6 2014 4.45 5.56 8.87 7.82 1.53 6.67
7 2015 4.30 5.29 6.86 7.54 1.44 6.30
8 2016 4.21 5.15 7.56 7.15 1.66 6.17
9 2017 4.33 5.13 7.32 7.20 1.69 6.04
10 2018 4.58 5.30 7.96 7.00 1.73 6.38
11 2019 4.71 5.52 7.66 7.96 1.94 6.65
12 2020 4.69 5.98 7.49 8.37 2.11 7.56
13 2021 4.19 5.72 9.62 8.47 1.71 7.29
# ... with 1 more variable: Scientific, Therapeutic and Technical staff <dbl>
How would I turn this into a tsibble so that I can plot graphs with ggplot2?
When trying as_tsibble()
absence_ts <- as_tsibble(absence, key = absence$`All Staff`, index = absence$year)
it comes up with the following error:
Error: Must subset columns with a valid subscript vector. x Can't convert from <double> to <integer> due to loss of precision.
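The error is consistent with key and index receiving vectors of values (absence$...) where as_tsibble() expects bare column names. A possible fix, sketched here under assumptions, is to reshape the staff groups into long format so that the group name can serve as the key and the year as the index. The column names staff_group and absence_rate are made up for this sketch, and only three rows and two staff groups from the table above are reproduced.

```r
library(dplyr)
library(tidyr)
library(tsibble)

# A few rows of the tibble from the question
absence <- tibble(
  year = c("2009", "2010", "2011"),
  `All Staff` = c(5.08, 5.07, 5.03),
  `Ambulance staff` = c(7.16, 6.89, 6.69)
)

absence_ts <- absence %>%
  mutate(year = as.integer(year)) %>%            # the index should not be character
  pivot_longer(-year,
               names_to = "staff_group",         # hypothetical column name
               values_to = "absence_rate") %>%   # hypothetical column name
  as_tsibble(key = staff_group, index = year)    # bare column names, not absence$...

absence_ts
```

With the data in this shape, a ggplot2 call can map year to x, absence_rate to y and staff_group to colour.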
Someone here already kindly provided part of the following code:
library(dplyr)
set.seed(12345)
df1 = data.frame(a=c(rep("a",8), rep("b",5), rep("c",7), rep("d",10)),
b=rnorm(30, 6, 2),
c=rnorm(30, 12, 3.5),
d=rnorm(30, 8, 3)
)
df2 = data.frame(b= 1.5,
c= 13,
d= 0.34
)
df1_z <- df1 %>%
group_by(a) %>%
mutate(across(b:d, list(zscore = ~as.numeric(scale(.))))) %>%
ungroup %>%
mutate(total = rowSums(select(., ends_with('zscore'))))
This was exactly what I wanted at the time, but now I would like something slightly different. In df1_z, instead of the current values in the last column, "total", I would like the sum of the products of each _zscore column and the corresponding value in df2, that is: b_zscore x 1.5 + c_zscore x 13 + d_zscore x 0.34.
For example, the first value would be 0.6971403 x 1.5 + 0.100595417 x 13 + 0.01790090 x 0.34 = 2.359537177. Expected outcome for the new total column:
total
2.359537177
16.04147765
13.64141872
9.146152274
-3.380574542
-5.55439223
etc...
How can I modify the code above to get this result in the new "total" column of df1_z?
You could use the crossprod function:
df1 %>%
group_by(a) %>%
mutate(across(b:d, list(zscore = ~as.numeric(scale(.))))) %>%
ungroup %>%
mutate(total = c(crossprod(t(select(., ends_with('zscore'))),t(df2))))
# A tibble: 30 x 8
a b c d b_zscore c_zscore d_zscore total
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a 7.17 14.8 8.45 0.697 0.101 0.0179 2.36
2 a 7.42 19.7 3.97 0.841 1.17 -1.14 16.0
3 a 5.78 19.2 9.66 -0.108 1.05 0.332 13.6
4 a 5.09 17.7 12.8 -0.508 0.732 1.14 9.15
5 a 7.21 12.9 6.24 0.721 -0.329 -0.555 -3.38
6 a 2.36 13.7 2.50 -2.09 -0.146 -1.52 -5.55
7 a 7.26 10.9 10.7 0.749 -0.774 0.593 -8.74
8 a 5.45 6.18 12.8 -0.302 -1.80 1.14 -23.5
9 b 5.43 18.2 9.55 -0.445 1.12 1.34 14.4
10 b 4.16 12.1 4.11 -1.06 0.0776 -1.02 -0.933
# ... with 20 more rows
Another option:
library(tidyverse)
df1 %>%
group_by(a) %>%
mutate(across(b:d, list(zscore = ~as.numeric(scale(.))))) %>%
ungroup %>%
mutate(total = rowSums(map2_dfc(select(., contains('zscore')), df2, `*`)))
Output:
# A tibble: 30 x 8
a b c d b_zscore c_zscore d_zscore total
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a 7.17 14.8 8.45 0.697 0.101 0.0179 2.36
2 a 7.42 19.7 3.97 0.841 1.17 -1.14 16.0
3 a 5.78 19.2 9.66 -0.108 1.05 0.332 13.6
4 a 5.09 17.7 12.8 -0.508 0.732 1.14 9.15
5 a 7.21 12.9 6.24 0.721 -0.329 -0.555 -3.38
6 a 2.36 13.7 2.50 -2.09 -0.146 -1.52 -5.55
7 a 7.26 10.9 10.7 0.749 -0.774 0.593 -8.74
8 a 5.45 6.18 12.8 -0.302 -1.80 1.14 -23.5
9 b 5.43 18.2 9.55 -0.445 1.12 1.34 14.4
10 b 4.16 12.1 4.11 -1.06 0.0776 -1.02 -0.933
# ... with 20 more rows
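Both answers compute a row-wise weighted sum, which is a matrix-vector product. The sketch below shows this on two rows in base R: the first row uses the z-scores quoted in the question, while the second row is made up purely for illustration.

```r
# z-scores: first row taken from the question, second row is made up
z <- data.frame(
  b_zscore = c(0.6971403, 1),
  c_zscore = c(0.100595417, 0),
  d_zscore = c(0.01790090, -1)
)
df2 <- data.frame(b = 1.5, c = 13, d = 0.34)

# Each row of z multiplied by the weight vector and summed
total <- as.vector(as.matrix(z) %*% unlist(df2))

# The crossprod() form from the first answer gives the same numbers
total_crossprod <- c(crossprod(t(z), t(df2)))

total  # first element is about 2.359537, as in the question
```

Note that both forms rely on the z-score columns being in the same order as the columns of df2.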
A data frame named have contains three variables:
from - character - the name of a measure
to - character - the name of another measure
covariance - numeric - the covariance between the two measures
Here's a link to the data. Below is the result of head(have):
from to covariance
a_airportscreener a_airportscreener 4.419285714
a_airportscreener e_airportscreener -1.328928571
a_airportscreener g_airportscreener -3.038928571
a_airportscreener p_airportscreener 0.3292857143
a_airportscreener pres_airportscreener 0.6452857143
a_automechanic a_automechanic 2.635535714
a_automechanic e_automechanic -0.3439285714
I want to create a data frame called need that records the covariances between prefixed versions of the same job title in separate columns. For example, the first row would look like:
job a_a a_e a_g a_p a_pres e_a e_e e_g e_p e_pres g_a g_e g_g g_p g_pres p_a p_e p_g p_p p_pres pres_a pres_e pres_g pres_p pres_pres
airportscreener 4.419 -1.329 -3.039 0.329 0.645 -1.329 2.333 2.441 -1.015 0.659 -3.039 2.441 14.253 3.070 0.977 0.329 -1.015 3.070 6.505 0.366 0.645 0.659 0.977 0.366 0.697
(I rounded the values in have to keep the example of need on the page, but this is not part of the question.)
Try this approach on your complete data:
library(tidyverse)
cov_mat %>%
rownames_to_column() %>%
pivot_longer(cols =-rowname) %>%
mutate(key = paste0(sub("_.*", "\\1", name), "_", sub("_.*", "\\1", rowname)),
rowname = sub(".*_(.*)_.*", "\\1", rowname),
name = sub(".*_(.*)_.*", "\\1", name)) %>%
filter(rowname == name) %>%
select(-rowname) %>%
pivot_wider(names_from = key, values_from = value)
# A tibble: 58 x 26
# name a_a e_a g_a p_a pres_a a_e e_e g_e .....
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 airp… 4.42 -1.33 -3.04 0.329 0.645 -1.33 2.33 2.44
# 2 auto… 2.64 -0.344 6.26 -0.712 -0.595 -0.344 0.499 0.113
# 3 auto… 2.67 -0.466 2.36 -0.106 -0.878 -0.466 0.72 -5.95
# 4 blkj… 2.50 0.529 -6.79 0.0129 -0.0666 0.529 1.56 -8.58
# 5 blkt… 1.04 -0.00143 4.86 0.993 -0.194 -0.00143 0.229 -1.69
# 6 brid… 4.15 2.05 -11.5 -1.21 0.453 2.05 2.05 -9.09
# 7 cart… 1.79 0.458 -4.22 0.451 -0.410 0.458 1.23 3.54
# 8 chem… 2.29 0.479 12.4 -0.0384 -0.164 0.479 0.811 2.15
# 9 clth… 4.10 1.15 -18.9 1.77 0.728 1.15 1.7 -4.00
#10 coag… 2.23 -0.382 -7.79 -0.0190 0.460 -0.382 0.342 4.11
This is not as elegant as @Ronak Shah's answer, but I had been working on something similar and thought it might be worth sharing for someone out there. It also uses pivot_longer and pivot_wider from the latest tidyr.
library(readxl)
library(tidyr)
library(dplyr)
df <- read_excel("cov_data.xlsx")
need <- df %>%
separate(from, into = c('from1', 'job'), sep = '_') %>%
separate(to, into = 'to1', extra = 'drop', sep = '_') %>%
unite(comb1, from1, to1, remove = FALSE) %>%
unite(comb2, to1, from1, remove = TRUE) %>%
pivot_longer(c(comb1, comb2)) %>%
dplyr::select(-name) %>%
distinct() %>%
pivot_wider(names_from = value, values_from = covariance) %>%
dplyr::select(job, order(colnames(.)))
# A tibble: 58 x 26
job a_a a_e a_g a_p a_pres e_a e_e e_g e_p e_pres g_a g_e g_g g_p g_pres p_a p_e p_g
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 airp… 4.42 -1.33 -3.04 0.329 0.645 -1.33 2.33 2.44 -1.02 0.659 -3.04 2.44 14.3 3.07 0.977 0.329 -1.02 3.07
2 auto… 2.64 -0.344 6.26 -0.712 -0.595 -0.344 0.499 0.113 0.891 0.321 6.26 0.113 203. 5.16 0.645 -0.712 0.891 5.16
3 auto… 2.67 -0.466 2.36 -0.106 -0.878 -0.466 0.72 -5.95 0.431 0.194 2.36 -5.95 252. 4.65 -4.64 -0.106 0.431 4.65
4 blkj… 2.50 0.529 -6.79 0.0129 -0.0666 0.529 1.56 -8.58 -0.703 0.384 -6.79 -8.58 247. 2.11 1.68 0.0129 -0.703 2.11
5 blkt… 1.04 -0.00143 4.86 0.993 -0.194 -0.00143 0.229 -1.69 0.276 -0.0351 4.86 -1.69 260. 14.3 2.44 0.993 0.276 14.3
6 brid… 4.15 2.05 -11.5 -1.21 0.453 2.05 2.05 -9.09 -0.342 0.576 -11.5 -9.09 326. -2.07 0.992 -1.21 -0.342 -2.07
7 cart… 1.79 0.458 -4.22 0.451 -0.410 0.458 1.23 3.54 0.43 -0.0674 -4.22 3.54 478. 10.5 -1.21 0.451 0.43 10.5
8 chem… 2.29 0.479 12.4 -0.0384 -0.164 0.479 0.811 2.15 0.784 0.0469 12.4 2.15 238. 2.58 -2.05 -0.0384 0.784 2.58
9 clth… 4.10 1.15 -18.9 1.77 0.728 1.15 1.7 -4.00 1.65 0.133 -18.9 -4.00 193. -17.1 -6.81 1.77 1.65 -17.1
10 coag… 2.23 -0.382 -7.79 -0.0190 0.460 -0.382 0.342 4.11 0.161 0.0398 -7.79 4.11 444. 1.96 -7.55 -0.0190 0.161 1.96
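For readers without the spreadsheet, here is a minimal sketch of the core reshaping step on four of the rows shown in head(have). It only splits the prefixes and widens the data (one row per job, one column per prefix pair); it does not add the mirrored pairs (e.g. e_a from a_e) that the answer above creates via comb2. The column names from_prefix, to_prefix and pair are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Four rows copied from head(have) in the question
have <- tibble::tribble(
  ~from,               ~to,                 ~covariance,
  "a_airportscreener", "a_airportscreener",  4.419285714,
  "a_airportscreener", "e_airportscreener", -1.328928571,
  "a_automechanic",    "a_automechanic",     2.635535714,
  "a_automechanic",    "e_automechanic",    -0.3439285714
)

need <- have %>%
  # "a_airportscreener" -> from_prefix = "a", job = "airportscreener"
  separate(from, into = c("from_prefix", "job"), sep = "_") %>%
  # keep only the prefix of the second measure
  mutate(to_prefix = sub("_.*", "", to)) %>%
  transmute(job, pair = paste(from_prefix, to_prefix, sep = "_"), covariance) %>%
  # one row per job, one column per prefix pair
  pivot_wider(names_from = pair, values_from = covariance)

need
```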
I am wondering whether dplyr provides any useful utilities for quick data aggregation on a land surface temperature time series. I have already extracted gridded data for Germany from the E-OBS dataset (E-OBS grid data) and exported the extracted raster grid as tabular data in Excel format. In the exported data, each row is a geo-coordinate pair with 15 years of daily temperature observations (1012 rows, 15x365/366 columns). Please take a look at the data: time series data.
Here is what I want to do with the time series data: aggregate by year, because the original observations are daily. For each geo-coordinate pair, I intend to calculate the average temperature for each of the 15 years. After the aggregation, I want to put the result in a new data.frame that keeps the original geo-coordinate pairs and adds new columns such as 1980_avg_temp, 1981_avg_temp, 1982_avg_temp and so on. In other words, I want to reduce the number of columns by replacing the daily columns with one yearly-average column per year.
How can I do this with dplyr or data.table on the Excel data? Is there an easier way to do this aggregation on the attached time series data? Any thoughts?
I tried this:
library(tidyverse)
library(readxl)
df <- read_excel("YOUR_XLSX_FILE")
df %>%
gather(date, temp, -x, -y) %>%
separate(date, c("year", "month", "day")) %>%
separate(year, c("trash", "year"), sep = "X") %>%
select(-trash) %>%
group_by(year, x, y) %>%
summarise(avg_temp=mean(temp)) %>%
spread(year, avg_temp)
output is:
# A tibble: 19 x 17
# Groups: x [11]
x y `1980` `1981` `1982` `1983` `1984` `1985` `1986` `1987` `1988` `1989` `1990` `1991`
* <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 8.88 54.4 7.79 8.02 8.76 9.20 8.32 7.51 7.88 7.43 9.20 9.63 9.76 8.55
2 8.88 54.9 7.54 7.61 8.41 8.84 8.15 7.15 7.53 7.15 8.97 9.51 9.55 8.42
3 9.12 54.4 7.65 7.86 8.62 9.05 8.17 7.34 7.70 7.28 9.01 9.46 9.60 8.37
4 9.12 54.6 7.44 7.59 8.38 8.81 8.02 7.11 7.50 7.13 8.88 9.36 9.47 8.31
5 9.12 54.9 7.33 7.36 8.25 8.67 8.02 7.05 7.49 7.10 8.91 9.48 9.55 8.41
6 9.38 54.4 7.69 7.91 8.61 9.02 8.15 7.31 7.69 7.24 8.98 9.49 9.64 8.35
7 9.38 54.6 7.45 7.62 8.46 8.85 8.05 7.16 7.59 7.18 8.92 9.48 9.61 8.41
8 9.38 54.9 7.24 7.29 8.21 8.62 7.95 7.04 7.56 7.15 8.94 9.57 9.66 8.53
9 9.62 54.4 7.65 7.90 8.60 9.01 8.14 7.24 7.64 7.16 8.93 9.52 9.65 8.33
10 9.62 54.6 7.39 7.60 8.45 8.82 8.01 7.10 7.56 7.12 8.86 9.46 9.55 8.34
11 9.62 54.9 7.28 7.38 8.28 8.69 7.98 7.07 7.61 7.18 8.96 9.60 9.68 8.54
12 9.88 54.4 7.70 8.00 8.69 9.14 8.23 7.36 7.76 7.23 9.03 9.63 9.73 8.41
13 9.88 54.6 7.40 7.65 8.46 8.87 8.05 7.11 7.58 7.12 8.87 9.47 9.50 8.30
14 10.1 54.4 7.76 8.12 8.78 9.21 8.30 7.49 7.90 7.34 9.08 9.69 9.79 8.52
15 10.4 54.4 7.66 8.09 8.70 9.17 8.23 7.41 7.87 7.29 9.03 9.70 9.82 8.60
16 11.1 54.9 7.61 8.14 8.74 9.14 8.33 7.32 7.92 7.22 9.17 9.93 10.1 8.86
17 11.4 54.9 7.59 8.17 8.74 9.14 8.32 7.29 7.92 7.20 9.17 9.95 10.1 8.87
18 11.9 54.9 7.54 8.15 8.71 9.10 8.28 7.19 7.85 7.15 9.10 9.92 10.1 8.84
19 12.1 54.9 7.52 8.12 8.69 9.08 8.27 7.12 7.80 7.11 9.05 9.91 10.0 8.82
# ... with 3 more variables: `1992` <dbl>, `1993` <dbl>, `1994` <dbl>
To show that the geo-coordinates are unchanged in the tibble (they are only rounded when printed), add as.data.frame() at the end of the pipe and look at your data. An example:
df %>%
gather(date, temp, -x, -y) %>%
separate(date, c("year", "month", "day")) %>%
separate(year, c("trash", "year"), sep = "X") %>%
select(-trash) %>%
group_by(year, x, y) %>%
summarise(avg_temp=mean(temp)) %>%
spread(year, avg_temp) %>%
as.data.frame() %>% # add this
head()
output is:
# x y 1980 1981 1982 1983 1984 1985 1986 1987 1988
# 1 8.875 54.375 7.792978 8.021342 8.762274 9.203424 8.317131 7.505370 7.879068 7.427260 9.197431
# 2 8.875 54.875 7.536229 7.607507 8.414877 8.841260 8.154945 7.151890 7.532164 7.147945 8.969781
# 3 9.125 54.375 7.651393 7.862466 8.620904 9.052630 8.169262 7.337589 7.701205 7.282657 9.014590
# 4 9.125 54.625 7.435983 7.590548 8.381753 8.808904 8.019399 7.109096 7.499589 7.127370 8.875656
# 5 9.125 54.875 7.332978 7.363370 8.247205 8.669370 8.024645 7.045425 7.487424 7.098849 8.911776
# 6 9.375 54.375 7.693907 7.914630 8.612438 9.022055 8.150164 7.305068 7.688164 7.242274 8.984207
# 1989 1990 1991 1992 1993 1994
# 1 9.625781 9.760931 8.550356 9.678907 8.208109 9.390904
# 2 9.513863 9.552767 8.420109 9.425328 8.010082 9.134466
# 3 9.462959 9.602876 8.374575 9.465164 8.052794 9.207041
# 4 9.358986 9.473178 8.305863 9.353743 7.935507 9.050109
# 5 9.478192 9.545781 8.412329 9.403005 7.998877 9.074740
# 6 9.493205 9.635561 8.352740 9.385819 8.017260 9.184959
This works on the data that you provided.
library(tidyverse)
library(lubridate)
demo_data %>%
gather(date, temp, -x, -y) %>%
mutate(date = ymd(str_remove(date, "X"))) %>%
mutate(year = year(date)) %>%
group_by(x, y, year) %>%
summarise_at(vars(temp), mean, na.rm = TRUE) %>%
spread(year, temp)
# # A tibble: 19 x 17
# # Groups: x, y [19]
# x y `1980` `1981` `1982` `1983` `1984` `1985` `1986` `1987` `1988`
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 8.88 54.4 7.79 8.02 8.76 9.20 8.32 7.51 7.88 7.43 9.20
# 2 8.88 54.9 7.54 7.61 8.41 8.84 8.15 7.15 7.53 7.15 8.97
# 3 9.12 54.4 7.65 7.86 8.62 9.05 8.17 7.34 7.70 7.28 9.01
# 4 9.12 54.6 7.44 7.59 8.38 8.81 8.02 7.11 7.50 7.13 8.88
# 5 9.12 54.9 7.33 7.36 8.25 8.67 8.02 7.05 7.49 7.10 8.91
# 6 9.38 54.4 7.69 7.91 8.61 9.02 8.15 7.31 7.69 7.24 8.98
# 7 9.38 54.6 7.45 7.62 8.46 8.85 8.05 7.16 7.59 7.18 8.92
# 8 9.38 54.9 7.24 7.29 8.21 8.62 7.95 7.04 7.56 7.15 8.94
# 9 9.62 54.4 7.65 7.90 8.60 9.01 8.14 7.24 7.64 7.16 8.93
# 10 9.62 54.6 7.39 7.60 8.45 8.82 8.01 7.10 7.56 7.12 8.86
# 11 9.62 54.9 7.28 7.38 8.28 8.69 7.98 7.07 7.61 7.18 8.96
# 12 9.88 54.4 7.70 8.00 8.69 9.14 8.23 7.36 7.76 7.23 9.03
# 13 9.88 54.6 7.40 7.65 8.46 8.87 8.05 7.11 7.58 7.12 8.87
# 14 10.1 54.4 7.76 8.12 8.78 9.21 8.30 7.49 7.90 7.34 9.08
# 15 10.4 54.4 7.66 8.09 8.70 9.17 8.23 7.41 7.87 7.29 9.03
# 16 11.1 54.9 7.61 8.14 8.74 9.14 8.33 7.32 7.92 7.22 9.17
# 17 11.4 54.9 7.59 8.17 8.74 9.14 8.32 7.29 7.92 7.20 9.17
# 18 11.9 54.9 7.54 8.15 8.71 9.10 8.28 7.19 7.85 7.15 9.10
# 19 12.1 54.9 7.52 8.12 8.69 9.08 8.27 7.12 7.80 7.11 9.05
# # ... with 6 more variables: `1989` <dbl>, `1990` <dbl>, `1991` <dbl>,
# # `1992` <dbl>, `1993` <dbl>, `1994` <dbl>
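The same aggregation can also be written with pivot_longer() and pivot_wider(), the newer tidyr replacements for gather() and spread(), which makes it easy to produce the 1980_avg_temp style column names asked for in the question. This is a minimal sketch on made-up toy values; it assumes the daily columns are named like X1980.01.01, which is inferred from the code above rather than confirmed by the original data.

```r
library(dplyr)
library(tidyr)

# Toy data: two grid cells, three daily columns (values are made up)
demo <- tibble(
  x = c(8.875, 9.125),
  y = c(54.375, 54.375),
  X1980.01.01 = c(1, 3),
  X1980.01.02 = c(3, 5),
  X1981.01.01 = c(10, 20)
)

yearly <- demo %>%
  pivot_longer(-c(x, y), names_to = "date", values_to = "temp") %>%
  mutate(year = substr(date, 2, 5)) %>%   # "X1980.01.01" -> "1980"
  group_by(x, y, year) %>%
  summarise(avg_temp = mean(temp, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = year, values_from = avg_temp,
              names_glue = "{year}_avg_temp")

yearly
```

The result keeps one row per coordinate pair and one column per year, named 1980_avg_temp, 1981_avg_temp and so on.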