Applying setNames to a list of data frames - r

I'm running into an issue applying new names to a list of data frames. I'm using quantmod to pull stock data, and then calculating the 7-Day moving average in this example. I can create the new columns within the list of data frames, but when I go to rename them using lapply and setNames it is only returning the newly renamed column and not any of the old data in each data frame.
require(quantmod)
require(zoo)
# Select Symbols
symbols <- c('AAPL','GOOG')
# Set start Date
start_date <- '2017-01-01'
# Get data and put data xts' into a list. Create empty list and then loop through to add all symbol data
stocks <- list()
for (i in 1:length(symbols)) {
stocks[[i]] <- getSymbols(symbols[i], src = 'google', from = start_date, auto.assign = FALSE)
}
##### Create the 7 day moving average for each stock in the stocks list #####
stocks <- lapply(stocks, function(x) cbind(x, rollmean(x[,4], 7, align = "right")))
Sample Output:
[[1]]
AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Close.1
2017-01-03 115.80 116.33 114.76 116.15 28781865 NA
2017-01-04 115.85 116.51 115.75 116.02 21118116 NA
2017-01-05 115.92 116.86 115.81 116.61 22193587 NA
2017-01-06 116.78 118.16 116.47 117.91 31751900 NA
2017-01-09 117.95 119.43 117.94 118.99 33561948 NA
2017-01-10 118.77 119.38 118.30 119.11 24462051 NA
2017-01-11 118.74 119.93 118.60 119.75 27588593 117.7914
2017-01-12 118.90 119.30 118.21 119.25 27086220 118.2343
2017-01-13 119.11 119.62 118.81 119.04 26111948 118.6657
[[2]]
GOOG.Open GOOG.High GOOG.Low GOOG.Close GOOG.Volume GOOG.Close.1
2017-01-03 778.81 789.63 775.80 786.14 1657268 NA
2017-01-04 788.36 791.34 783.16 786.90 1072958 NA
2017-01-05 786.08 794.48 785.02 794.02 1335167 NA
2017-01-06 795.26 807.90 792.20 806.15 1640170 NA
2017-01-09 806.40 809.97 802.83 806.65 1274645 NA
2017-01-10 807.86 809.13 803.51 804.79 1176780 NA
2017-01-11 805.00 808.15 801.37 807.91 1065936 798.9371
2017-01-12 807.14 807.39 799.17 806.36 1353057 801.8257
2017-01-13 807.48 811.22 806.69 807.88 1099215 804.8229
I would like to change the "AAPL.Close.1" and "GOOG.Close.1" to say "AAPL.Close.7.Day.MA" and "GOOG.Close.7.Day.MA" respectively (for however many symbols that I choose at the top).
The closest that I've gotten is:
stocks <- lapply(stocks[], function(x) setNames(x[,6], paste0(names(x[,4]), ".7.Day.MA")))
This is correctly naming the new columns, but now my stocks list only contains that single column for each ticker:
[[1]]
AAPL.Close.7.Day.MA
2017-01-03 NA
2017-01-04 NA
2017-01-05 NA
2017-01-06 NA
2017-01-09 NA
2017-01-10 NA
2017-01-11 117.7914
2017-01-12 118.2343
2017-01-13 118.6657
[[2]]
GOOG.Close.7.Day.MA
2017-01-03 NA
2017-01-04 NA
2017-01-05 NA
2017-01-06 NA
2017-01-09 NA
2017-01-10 NA
2017-01-11 798.9371
2017-01-12 801.8257
2017-01-13 804.8229
Why is the setNames function removing the original columns?

Almost there:
N = 10 #number of pseudorandom numbers
df1 <- data.frame(a=runif(N),b=sample(N))#1st data frame
df2 <- data.frame(c=rnorm(N),google=df1$b^2,e=df1$a^3)#2nd data frame
stocks<-list(df1,df2)# create the list
lapply(stocks,names) # get the names of each list element (data.frame)
[[1]]
[1] "a" "b"
[[2]]
[1] "c" "google" "e"
Because the assignment happens inside a function, we need <<- to overwrite the original stocks object in the calling environment.
lapply(seq_along(stocks), function(x) names(stocks[[x]]) <<- gsub(pattern = "google", replacement = "google2", x = names(stocks[[x]]))) # replace the string "google"
[[1]]
[1] "a" "b"
[[2]]
[1] "c" "google2" "e"
To verify, stocks now contains the new names:
> stocks
[[1]]
a b
1 0.73826897 3
2 0.35627664 8
3 0.89060134 7
4 0.72629312 10
5 0.97069742 4
6 0.12530931 2
7 0.65744257 9
8 0.06218019 1
9 0.67322891 6
10 0.66128204 5
[[2]]
c google2 e
1 -0.5272267 9 0.402386917
2 0.6993945 64 0.045223278
3 0.3707304 49 0.706398932
4 -0.2371541 100 0.383120861
5 1.5073834 16 0.914643019
6 0.4098821 4 0.001967660
7 -0.3014211 81 0.284166886
8 0.3248919 1 0.000240412
9 1.2757740 36 0.305132358
10 1.5938208 25 0.289174620
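To address the original question directly: setNames(x[,6], ...) returns only what it was given, namely the single column x[,6], so the other columns are dropped. A minimal sketch of renaming one column while returning the whole object is shown here. It uses small made-up data frames instead of downloaded quotes, and the helper name rename_ma and its arguments are hypothetical. colnames<- is used because it works for data frames and matrices alike; it should behave the same way on xts objects, but that is an assumption not checked here.

```r
# Hypothetical stand-in for the list of price tables (no download needed)
stocks <- list(
  data.frame(AAPL.Close = c(116.15, 116.02, 116.61),
             AAPL.Close.1 = c(NA, NA, 116.26)),
  data.frame(GOOG.Close = c(786.14, 786.90, 794.02),
             GOOG.Close.1 = c(NA, NA, 789.02))
)

# Rename only the moving-average column, then return the full object
rename_ma <- function(x, close_col = 1, ma_col = 2, suffix = ".7.Day.MA") {
  colnames(x)[ma_col] <- paste0(colnames(x)[close_col], suffix)
  x  # returning x (not x[, ma_col]) keeps every original column
}

stocks <- lapply(stocks, rename_ma)
lapply(stocks, names)
```

With the data from the question, the equivalent call would use close_col = 4 and ma_col = 6. No <<- is needed because lapply returns the modified copies, which are assigned back to stocks.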

Related

Efficient way for insertion of multiple rows at given indices & with repetitions

I have a data frame (DATA) with > 2 million rows (observations at different time points) and another data frame (INSERTION) which gives info about missing observations. The latter object contains 2 columns: 1st column with row indices after which empty (NA) rows should be inserted into DATA, and 2nd column with the number of empty rows that should be inserted at that position.
Below is a minimum working example:
DATA <- data.frame(datetime=strptime(as.character(c(201301011700, 201301011701, 201301011703, 201301011704, 201301011705, 201301011708, 201301011710, 201301011711, 201301011715, 201301011716, 201301011718, 201301011719, 201301011721, 201301011722, 201301011723, 201301011724, 201301011725, 201301011726, 201301011727, 201301011729, 201301011730, 201301011731, 201301011732, 201301011733, 201301011734, 201301011735, 201301011736, 201301011737, 201301011738, 201301011739)), format="%Y%m%d%H%M"), var1=rnorm(30), var2=rnorm(30), var3=rnorm(30))
INSERTION <- data.frame(index=c(2, 5, 6, 8, 10, 12, 19), repetition=c(1, 2, 1, 3, 1, 1, 1))
Now I'm looking for an efficient (and thus fast) way to insert the n empty rows at given row indices of the original file. How can I additionally complement the correct datetimes for these empty rows (add 1 minute for every new row; however, every weekend and bank holidays there are some regular gaps which are not contained in INSERTION!)?
Any help is appreciated!
Looking at the pattern in INSERTION and matching it with DATA, you are most probably trying to fill in the missing minutes in the datetime column of DATA. You can create a data frame with every minute from the minimum to the maximum datetime in DATA and then merge it with DATA:
merge(data.frame(datetime = seq(min(DATA$datetime), max(DATA$datetime),
by = "1 min")),DATA, all.x = TRUE)
# datetime var1 var2 var3
#1 2013-01-01 17:00:00 -1.063326 0.11925 -0.788622
#2 2013-01-01 17:01:00 1.263185 0.24369 -0.502199
#3 2013-01-01 17:02:00 NA NA NA
#4 2013-01-01 17:03:00 -0.349650 1.23248 1.496061
#5 2013-01-01 17:04:00 -0.865513 -0.51606 -1.137304
#6 2013-01-01 17:05:00 -0.236280 -0.99251 -0.179052
#7 2013-01-01 17:06:00 NA NA NA
#8 2013-01-01 17:07:00 NA NA NA
#9 2013-01-01 17:08:00 -0.197176 1.67570 1.902362
#10 2013-01-01 17:09:00 NA NA NA
#...
#...
Or using similar logic with tidyr::complete
tidyr::complete(DATA, datetime = seq(min(datetime), max(datetime), by = "1 min"))
If performance is a factor on a large data frame, this approach avoids joins:
# Generate new data.frame containing missing datetimes
tmp <- data.frame(datetime = DATA$datetime[with(INSERTION, rep(index, repetition))] + sequence(INSERTION$repetition)*60)
# Create variables filled with NA to match main data.frame
tmp[setdiff(names(DATA), names(tmp))] <- NA
# Bind and sort
new_df <- rbind(DATA, tmp)
new_df <- new_df[order(new_df$datetime),]
head(new_df, 15)
datetime var1 var2 var3
1 2013-01-01 17:00:00 0.98789253 0.68364933 0.70526985
2 2013-01-01 17:01:00 -0.68307496 0.02947599 0.90731512
31 2013-01-01 17:02:00 NA NA NA
3 2013-01-01 17:03:00 -0.60189915 -1.00153188 0.06165694
4 2013-01-01 17:04:00 -0.87329313 -1.81532302 -2.04930719
5 2013-01-01 17:05:00 -0.58713154 -0.42313098 0.37402224
32 2013-01-01 17:06:00 NA NA NA
33 2013-01-01 17:07:00 NA NA NA
6 2013-01-01 17:08:00 2.41350911 -0.13691754 1.57618578
34 2013-01-01 17:09:00 NA NA NA
7 2013-01-01 17:10:00 -0.38961552 0.83838954 1.18283382
8 2013-01-01 17:11:00 0.02290672 -2.10825367 0.87441448
35 2013-01-01 17:12:00 NA NA NA
36 2013-01-01 17:13:00 NA NA NA
37 2013-01-01 17:14:00 NA NA NA
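The key step in the no-join approach is the combination of rep() and sequence(). A small sketch on a shortened, made-up version of the data (the names base_times, ins, anchor and offset are illustrative only) shows what each piece produces:

```r
# Six observed timestamps with gaps after positions 2 and 5
base_times <- as.POSIXct("2013-01-01 17:00:00", tz = "UTC") +
  60 * c(0, 1, 3, 4, 5, 8)

# Insert 1 row after position 2 and 2 rows after position 5
ins <- data.frame(index = c(2, 5), repetition = c(1, 2))

anchor <- rep(ins$index, ins$repetition)  # 2 5 5: row each new row follows
offset <- sequence(ins$repetition)        # 1 1 2: minutes past that row

# Timestamp of each inserted row = anchor time + offset minutes
missing_times <- base_times[anchor] + offset * 60
format(missing_times, "%H:%M")
```

Note that this fills only the gaps listed in INSERTION, so regular weekend or holiday gaps stay untouched, which is what the question asks for.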

Compute the variance of a moving window in a dataframe

Hi, I want to compute the variance of a column. My data frame is sorted by date (as.Date() format). Here is a snippet of it:
Date USA ARG BRA CHL COL MEX PER
2012-04-01 1 0.2271531 0.4970299 0.001956865 0.0005341452 0.07341428 NA
2012-05-01 1 0.2218906 0.4675895 0.001911405 0.0005273186 0.07026524 NA
2012-06-01 1 0.2054076 0.4531661 0.001891352 0.0005292575 0.06897811 NA
2012-07-01 1 0.2033470 0.4596730 0.001950686 0.0005312600 0.07269619 NA
2012-08-01 1 0.1993882 0.4596039 0.001980537 0.0005271514 0.07268987 NA
2012-09-01 1 0.1967152 0.4593390 0.002011212 0.0005305549 0.07418838 NA
2012-10-01 1 0.1972730 0.4597584 0.002002203 0.0005284380 0.07428555 NA
2012-11-01 1 0.1937618 0.4519187 0.001979805 0.0005238670 0.07329656 NA
2012-12-01 1 0.1854037 0.4500448 0.001993309 0.0005323795 0.07453949 NA
2013-01-01 1 0.1866007 0.4607501 0.002013112 0.0005412329 0.07551040 NA
2013-02-01 1 0.1855950 0.4712956 0.002011067 0.0005359562 0.07554661 NA
The data frame ranges from January 2004 to December 2018, but I do not want to compute the variance of the whole columns.
I want to compute the variance of one year (or 12 values), moving month by month.
I do not really know how to start. I can imagine using the zoo package and rollapply, but the problem (I think) is that R uses the values around each point rather than the values before it?
I also found this question: R: create a data frame out of a rolling window, so my idea was to get rid of the date column. It is easy to build the matrix, but now I do not understand how to apply the variance function to my data...
Is there a smart way to compute it all in one and also using the information of the date? If not, I also appreciate any other solution from you!
We can use rollapplyr to perform the rolling computations. The trailing r means the window is right-aligned, so each value is computed from the current row and the rows before it, not from the rows around it. Since there are only 11 rows in the data in the question, we can't compute 12-month variances, so we illustrate with a 3-month window instead. Remove fill = NA if you want to omit the NA rows, or replace it with partial = TRUE if you want variances computed from fewer than 12 values near the beginning. If you want a data frame result, use fortify.zoo(zv).
library(zoo)
z <- read.zoo(DF)
zv <- rollapplyr(z, 3, var, fill = NA)
zv
giving this zoo object:
USA ARG BRA CHL COL MEX PER
2012-04-01 NA NA NA NA NA NA NA
2012-05-01 NA NA NA NA NA NA NA
2012-06-01 0 1.287083e-04 4.998008e-04 1.126781e-09 1.237524e-11 5.208793e-06 NA
2012-07-01 0 1.033001e-04 5.217420e-05 9.109406e-10 3.883996e-12 3.565057e-06 NA
2012-08-01 0 9.358558e-06 1.396497e-05 2.060928e-09 4.221043e-12 4.600220e-06 NA
2012-09-01 0 1.113297e-05 3.108380e-08 9.159058e-10 4.826929e-12 7.453672e-07 NA
2012-10-01 0 1.988357e-06 4.498977e-08 2.485889e-10 2.953403e-12 8.001948e-07 NA
2012-11-01 0 3.560373e-06 1.944961e-05 2.615387e-10 1.168389e-11 2.971477e-07 NA
2012-12-01 0 3.717777e-05 2.655440e-05 1.271886e-10 1.814869e-11 4.312436e-07 NA
2013-01-01 0 2.042867e-05 3.268476e-05 2.806455e-10 7.540331e-11 1.231438e-06 NA
2013-02-01 0 4.134729e-07 1.129013e-04 1.186146e-10 1.983651e-11 3.263780e-07 NA
We can plot the log of the variances like this:
library(ggplot2)
autoplot(log(zv), facet = NULL) + geom_point() + ylab("log(var(.))")
Note
We assume that the starting point is the data frame generated reproducibly below:
Lines <- "Date USA ARG BRA CHL COL MEX PER
2012-04-01 1 0.2271531 0.4970299 0.001956865 0.0005341452 0.07341428 NA
2012-05-01 1 0.2218906 0.4675895 0.001911405 0.0005273186 0.07026524 NA
2012-06-01 1 0.2054076 0.4531661 0.001891352 0.0005292575 0.06897811 NA
2012-07-01 1 0.2033470 0.4596730 0.001950686 0.0005312600 0.07269619 NA
2012-08-01 1 0.1993882 0.4596039 0.001980537 0.0005271514 0.07268987 NA
2012-09-01 1 0.1967152 0.4593390 0.002011212 0.0005305549 0.07418838 NA
2012-10-01 1 0.1972730 0.4597584 0.002002203 0.0005284380 0.07428555 NA
2012-11-01 1 0.1937618 0.4519187 0.001979805 0.0005238670 0.07329656 NA
2012-12-01 1 0.1854037 0.4500448 0.001993309 0.0005323795 0.07453949 NA
2013-01-01 1 0.1866007 0.4607501 0.002013112 0.0005412329 0.07551040 NA
2013-02-01 1 0.1855950 0.4712956 0.002011067 0.0005359562 0.07554661 NA"
DF <- read.table(text = Lines, header = TRUE)
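To make the right-aligned window concrete, here is a minimal base-R sketch of the same computation for a single column. The function name roll_var_right is hypothetical, and this is only meant to illustrate what rollapplyr(z, 3, var, fill = NA) does per column, not to replace it.

```r
# Right-aligned rolling variance: the window for row i covers
# rows (i - width + 1) to i, i.e. the current value and the ones before it
roll_var_right <- function(x, width) {
  n <- length(x)
  out <- rep(NA_real_, n)          # leading positions stay NA (like fill = NA)
  if (n >= width) {
    for (i in width:n) {
      out[i] <- var(x[(i - width + 1):i])
    }
  }
  out
}

# First four ARG values from the data above
arg <- c(0.2271531, 0.2218906, 0.2054076, 0.2033470)
roll_var_right(arg, 3)
```

The third element reproduces the first non-NA ARG value in the zoo output (about 1.287083e-04), which confirms that only past and current values enter each window.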

Create multiple lagged variables using a zoo object

I need to create 'n' number of variables with lags of the original variable from 1 to 'n' on the fly. Something like so :-
OrigVar
DatePeriod, value
2/01/2018,6
3/01/2018,4
4/01/2018,0
5/01/2018,2
6/01/2018,4
7/01/2018,1
8/01/2018,6
9/01/2018,2
10/01/2018,7
Lagged 1 variable
2/01/2018,NA
3/01/2018,6
4/01/2018,4
5/01/2018,0
6/01/2018,2
7/01/2018,4
8/01/2018,1
9/01/2018,6
10/01/2018,2
11/01/2018,7
Lagged 2 variable
2/01/2018,NA
3/01/2018,NA
4/01/2018,6
5/01/2018,4
6/01/2018,0
7/01/2018,2
8/01/2018,4
9/01/2018,1
10/01/2018,6
11/01/2018,2
12/01/2018,7
Lagged 3 variable
2/01/2018,NA
3/01/2018,NA
4/01/2018,NA
5/01/2018,6
6/01/2018,4
7/01/2018,0
8/01/2018,2
9/01/2018,4
10/01/2018,1
11/01/2018,6
12/01/2018,2
13/01/2018,7
and so on
I tried using the shift function and various other functions. With most of those that worked for me, the lagged variables finished at the last date of the original variable. In other words, the length of the lagged variable is the same as that of the original variable.
What I am looking for is for the new lagged variable to be shifted down by the 'kth' lag and for the data series to be extended by 'k' elements, including the index.
The reason I need this is to be able to compute the value of the dependent variable beyond the in-sample period, using the regression coefficients and the corresponding lagged variable values.
y1 <- Lag(ciresL1_usage_1601_1612, shift = 1)
head(y1)
2016-01-02 2016-01-03 2016-01-04 2016-01-05 2016-01-06 2016-01-07
NA -5171.051 -6079.887 -3687.227 -3229.453 -2110.368
y2 <- Lag(ciresL1_usage_1601_1612, shift = 2)
head(y2)
2016-01-02 2016-01-03 2016-01-04 2016-01-05 2016-01-06 2016-01-07
NA NA -5171.051 -6079.887 -3687.227 -3229.453
tail(y2)
2016-12-26 2016-12-27 2016-12-28 2016-12-29 2016-12-30 2016-12-31
-2316.039 -2671.185 -4100.793 -2043.020 -1147.798 1111.674
tail(ciresL1_usage_1601_1612)
2016-12-26 2016-12-27 2016-12-28 2016-12-29 2016-12-30 2016-12-31
-4100.793 -2043.020 -1147.798 1111.674 3498.729 2438.739
Is there a way to do this relatively easily? I know I can do it by looping: adding 'k' rows to a new vector and reloading the data into it, shifting the values appropriately. But I don't want to use that method unless I have to. I am quietly confident that there has to be a better way to do it than this!
By the way, the object is a zoo object with daily dates as the index.
Best regards
Deepak
Convert the input zoo object to zooreg and then use lag.zooreg like this:
library(zoo)
# test input
z <- zoo(1:10, as.Date("2008-01-01") + 0:9)
zr <- as.zooreg(z)
lag(zr, -(0:3))
giving:
lag0 lag-1 lag-2 lag-3
2008-01-01 1 NA NA NA
2008-01-02 2 1 NA NA
2008-01-03 3 2 1 NA
2008-01-04 4 3 2 1
2008-01-05 5 4 3 2
2008-01-06 6 5 4 3
2008-01-07 7 6 5 4
2008-01-08 8 7 6 5
2008-01-09 9 8 7 6
2008-01-10 10 9 8 7
2008-01-11 NA 10 9 8
2008-01-12 NA NA 10 9
2008-01-13 NA NA NA 10
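For comparison, the same extended layout can be sketched in base R without zoo. This assumes a regular daily index with no gaps (the same regularity that converting to zooreg relies on); the names x, dates, k_max and out are illustrative only.

```r
# Test input matching the zoo example: values 1..10 on 10 consecutive days
dates <- as.Date("2008-01-01") + 0:9
x <- 1:10
k_max <- 3  # largest lag wanted

# Extend the index by k_max days so the shifted series have room
ext_dates <- seq(min(dates), by = "day", length.out = length(x) + k_max)

# Lag k: pad k NAs in front and (k_max - k) NAs behind
lags <- sapply(0:k_max, function(k) c(rep(NA, k), x, rep(NA, k_max - k)))
colnames(lags) <- paste0("lag", 0:k_max)

out <- data.frame(Date = ext_dates, lags)
tail(out, 4)
```

The result has 13 rows, with the lag-3 column ending in the value 10 on 2008-01-13, as in the zoo output above.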

Replace values in dataframe by matching dates of different lengths

I have 52 time series files with differing lengths for date. All have the same end date - 31-01-2017, but all 52 dataframes have different start dates.
'data': nRows
Date FLOW Modelled
01-01-1992 1.856 NA
02-01-1992 1.523 NA
03-01-1992 2.623 NA
04-01-1992 3.679 NA
...
31-12-2017
I also have a file with simulated FLOW values for each of the datasets in columns.
'Simulated': 20819 rows, 53 columns (including Date).
Date 1 2 3 ..52
01-01-1961 1.856 2.889 2.365
02-01-1961 1.523 3.536 4.624
03-01-1961 2.536 2.452 6.352
04-01-1961 3.486 4.267 3.685
...
31-12-2017
My question is I want to select each column from Simulated data (e.g column 1 corresponds to 'data' file 1) and fill the Modelled column of 'data' with the simulated values. Ideally this would loop through the 52 files based on a list of their names
The problem I am facing is when using left_join the error I get is
e.g. replacement has 20819 rows, data has 9657
when 'data' is shorter than 'Simulated', and
e.g. replacement has 20819 rows, data has 22821
when 'data' is longer than 'Simulated'.
I have tried to use left_join of the dplyr package with no luck as dates are not matching up across 'data' and 'Simulated' dataframes.
library(dplyr)
df <-left_join(data, Simulated, by = c("Date"),all.x=TRUE)
I have formatted the dates in both 'data' and 'Simulated' using something similar to Simulated$Date <- as.Date(with(Simulated, paste(Year, Month, Day, sep="-")), "%Y-%m-%d"). But I still get the error below when using left_join:
cannot join a Date object with an object that is not a Date object
A solution can be achieved using tidyverse and read.table. First read all data frames from all files in a list and then use dplyr::bind_rows to merge them in one dataframe.
#Get the file list
filelist = list.files(path = ".", pattern = ".*.txt", full.names = TRUE)
# Read all files in a list
ll <- lapply(filelist, FUN=read.table, header=TRUE, stringsAsFactors = FALSE)
# Read data from the file containing the simulated data
simulated <- read.table(file = "simulated.txt", header=TRUE, stringsAsFactors = FALSE)
library(tidyverse)
#Convert simulated data to long format and then join with other dataframes
simulated %>% mutate(Date = as.Date(Date, format = "%d-%m-%Y")) %>%
gather(df_num, SIM_FLOW, -Date) %>%
mutate(df_num = gsub("X(\\d+)", "\\1", df_num)) %>%
right_join(bind_rows(ll, .id="df_num") %>% mutate(Date = as.Date(Date, format = "%d-%m-%Y")),
by=c("df_num", "Date"))
# Date df_num SIM_FLOW FLOW Modelled
# 1 1992-01-01 1 1.86 1.86 NA
# 2 1992-01-02 1 NA 1.52 NA
# 3 1992-01-03 1 NA 2.62 NA
# 4 1992-01-04 1 NA 3.68 NA
# 5 1993-01-01 2 NA 11.86 NA
# 6 1993-01-02 2 3.54 11.52 NA
# 7 1993-01-03 2 NA 12.62 NA
# 8 1993-01-04 2 NA 13.68 NA
# 9 1994-01-01 3 NA 111.86 NA
# 10 1994-01-02 3 NA 111.52 NA
# 11 1994-01-03 3 6.35 112.62 NA
# 12 1994-01-04 3 NA 113.68 NA
Data:
simulated.txt
Date 1 2 3
01-01-1992 1.856 2.889 2.365
02-01-1993 1.523 3.536 4.624
03-01-1994 2.536 2.452 6.352
04-01-1902 3.486 4.267 3.685
File1.txt
Date FLOW Modelled
01-01-1992 1.856 NA
02-01-1992 1.523 NA
03-01-1992 2.623 NA
04-01-1992 3.679 NA
File2.txt
Date FLOW Modelled
01-01-1993 11.856 NA
02-01-1993 11.523 NA
03-01-1993 12.623 NA
04-01-1993 13.679 NA
File3.txt
Date FLOW Modelled
01-01-1994 111.856 NA
02-01-1994 111.523 NA
03-01-1994 112.623 NA
04-01-1994 113.679 NA
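If the goal is only to fill the Modelled column of one file from the matching column of Simulated, a base-R lookup with match() is a possible alternative to a join. The sketch below uses small made-up tables (data1 and the values in Simulated are illustrative) and assumes both Date columns are converted to Date with the same format first, which also avoids the "cannot join a Date object" error.

```r
# Made-up 'data' file 1 and a 'Simulated' table that starts earlier
data1 <- data.frame(Date = c("01-01-1992", "02-01-1992", "03-01-1992"),
                    FLOW = c(1.856, 1.523, 2.623),
                    Modelled = NA_real_)
Simulated <- data.frame(Date = c("31-12-1991", "01-01-1992", "02-01-1992"),
                        `1` = c(9.900, 1.856, 1.523),
                        check.names = FALSE)

# Convert both date columns to the same class before matching
data1$Date <- as.Date(data1$Date, format = "%d-%m-%Y")
Simulated$Date <- as.Date(Simulated$Date, format = "%d-%m-%Y")

# For each date in data1, look up the row in Simulated (NA if absent)
idx <- match(data1$Date, Simulated$Date)
data1$Modelled <- Simulated[["1"]][idx]
data1
```

Because match() returns one position per row of data1, the replacement always has the right length regardless of which table is longer. Looping over the 52 files would amount to repeating the last two lines with the column name changed.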

Condition for function and loop

I have a data frame simplified as follow:
head(dendro)
X DateTime ID diameter dendro ring DOY month mday year Rain_mm_Tot Through_Tot temp
1 1 2012-06-21 13:45:00 r1_1 5482 1 1 173 6 22 113 NA NA NA
2 2 2012-06-21 13:45:00 r2_3 NA 3 2 173 6 22 113 NA NA NA
3 3 2012-06-21 13:45:00 r1_2 5534 2 1 173 6 22 113 NA NA NA
4 4 2012-06-21 13:45:00 r2_4 NA 4 2 173 6 22 113 NA NA NA
5 5 2012-06-21 13:45:00 r1_3 5606 3 1 173 6 22 113 NA NA NA
6 6 2012-06-21 13:45:00 r2_5 NA 5 2 173 6 22 113 NA NA NA
The data frame is first split by "ID", so it becomes a list with one element per ID.
After that I apply a function that includes a loop; the result is a new column "diameter2" holding the value I want, and that works OK:
dendro_sp <- split(dendro, dendro$ID)
library(changepoint)
dendro_sp <- lapply(dendro_sp, function(x){
x <- subset(x, !is.na(diameter))
cpfit <- cpt.mean(x$diameter, method="BinSeg")
x$diameter2 <- x$diameter
cpts <- cpfit@cpts
means <- param.est(cpfit)$mean
meanZero <- means[1]
for(i in 1:(length(cpts)-1)){
x$diameter2[(cpts[i]+1):cpts[i+1]] <- x$diameter2[(cpts[i]+1):cpts[i+1]] + (meanZero - means[i+1])
}
return(x)
})
dendro2 <- do.call(rbind, dendro_sp)
rownames(dendro2) <- NULL
My problem is that I want to apply it conditionally, for example only to r1_1 and r1_3, and for the rest of the IDs simply copy the "diameter" value into the new column "diameter2" instead of applying the function:
ifelse(diameter$ID==c("r1_1","r1_3"), apply_the_function_to_r11_and_r13_to_calculate_diameter2, otherwise_write_diameter_value_in_diameter2_column)
Remember that the data frame "dendro" is split by ID; I don't know if that matters when defining the condition for several IDs.
Thanks
I am not sure I understand the problem correctly, but I will try to answer.
I assume you want to apply a function to the "diameter" field of the "diameter" data.frame, conditional on the "ID" field, and return the result in the corresponding diameter2 field. I don't know how the function works, so forgive me if this does not work.
Selected fields
diameter$diameter2[diameter$ID=="r1_1"|diameter$ID=="r1_3"] <- yourfun(diameter$diameter[diameter$ID=="r1_1"|diameter$ID=="r1_3"])
Unselected fields
diameter$diameter2[diameter$ID!="r1_1" & diameter$ID!="r1_3"] <- diameter$diameter[diameter$ID!="r1_1" & diameter$ID!="r1_3"]
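Since the data in the question is already split by ID, the condition can also be placed inside the lapply call. The following is a minimal sketch on made-up data: adjust is a hypothetical placeholder standing in for the change-point correction from the question, and target_ids is an illustrative name for the IDs that should be processed.

```r
# Made-up data: three IDs with three diameter readings each
dendro <- data.frame(
  ID = rep(c("r1_1", "r1_2", "r1_3"), each = 3),
  diameter = c(10, 11, 15, 20, 21, 22, 30, 29, 35)
)

target_ids <- c("r1_1", "r1_3")   # IDs that should be processed
adjust <- function(d) d - d[1]    # hypothetical placeholder correction

dendro_sp <- split(dendro, dendro$ID)
dendro_sp <- lapply(dendro_sp, function(x) {
  # Every row in x shares one ID, so testing the first row is enough
  x$diameter2 <- if (x$ID[1] %in% target_ids) adjust(x$diameter) else x$diameter
  x
})

dendro2 <- do.call(rbind, dendro_sp)
rownames(dendro2) <- NULL
dendro2
```

Using %in% rather than == avoids the recycling problem of comparing the ID column against a vector of several IDs, which is what the ifelse attempt in the question runs into.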
