Collapsing columns and removing NAs [duplicate] - r

This question already has answers here:
How to implement coalesce efficiently in R
I have a data.frame, in this format:
A w x y z
0.23 1 NA NA NA
0.12 NA 2 NA NA
0.45 NA 2 NA NA
0.89 NA NA 3 NA
0.12 NA NA NA 4
And I want to collapse w:x:y:z into a single column, while removing NA's. Desired result:
A Comb
0.23 1
0.12 2
0.45 2
0.89 3
0.12 4
My approach so far is:
df %>% unite("Comb", w:x:y:z, na.rm=TRUE, remove=TRUE)
However, "Comb" is being populated with strings such as 1_NA_NA_NA and NA_NA_NA_4 i.e. it is not removing the NA's. I've tried switching to character NA's, but that leads to bizarre and unpredictable results. What am I doing wrong?
I'd also like to be able to do this when the original data.frame is populated with strings (in place of the numbers). Is there a method for this?

Using dplyr::coalesce we can do the following:
df %>%
mutate(Comb = coalesce(w,x,y,z)) %>%
select(A, Comb)
which gives the following output:
A Comb
<dbl> <dbl>
1 0.23 1
2 0.12 2
3 0.45 2
4 0.89 3
5 0.12 4
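For the follow-up about string columns: coalescing is not specific to numbers. As a rough sketch in base R (the helper name coalesce_base and the toy data are made up for illustration, they are not part of dplyr or the question), the same "first non-NA wins" logic can be written as:

```r
# Hypothetical helper: take values from x, fill its NAs from y, and so on.
coalesce_base <- function(...) {
  Reduce(function(x, y) {
    x[is.na(x)] <- y[is.na(x)]  # only positions that are still NA get filled
    x
  }, list(...))
}

# Toy data with character columns (assumed for illustration)
df_chr <- data.frame(
  A = c(0.23, 0.12, 0.45),
  w = c("a", NA, NA),
  x = c(NA, "b", NA),
  y = c(NA, NA, "c"),
  stringsAsFactors = FALSE
)

df_chr$Comb <- coalesce_base(df_chr$w, df_chr$x, df_chr$y)
df_chr[c("A", "Comb")]
```

All inputs should have the same type and length; mixing factors with characters would need an as.character step first.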

In unite, na.rm = TRUE does not drop the NAs from integer/factor columns.
Convert the columns to character first, then use unite.
library(dplyr)
df %>%
mutate_at(vars(w:z), as.character) %>%
tidyr::unite('comb', w:z, na.rm = TRUE)
# A comb
#1 0.23 1
#2 0.12 2
#3 0.45 2
#4 0.89 3
#5 0.12 4
data
df <- structure(list(A = c(0.23, 0.12, 0.45, 0.89, 0.12), w = c(1L,
NA, NA, NA, NA), x = c(NA, 2L, 2L, NA, NA), y = c(NA, NA, NA,
3L, NA), z = c(NA, NA, NA, NA, 4L)), class = "data.frame",
row.names = c(NA, -5L))
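To see why the character conversion matters, here is a base R sketch of what "paste the non-NA values of each row" amounts to. It is only an illustration on made-up toy data, not how unite is implemented:

```r
# Toy data (assumed): the third row has two non-NA values
toy <- data.frame(
  w = c(1L, NA, 1L),
  x = c(NA, 2L, 2L),
  y = c(NA_integer_, NA, NA)
)

# Convert every column to character; NA stays a real NA, not the string "NA"
m <- sapply(toy, as.character)

# Per row: drop the NAs, then paste what is left with "_"
comb <- apply(m, 1, function(r) paste(r[!is.na(r)], collapse = "_"))
comb
```

Unlike coalesce, this keeps every non-NA value in a row, so a row holding both 1 and 2 becomes "1_2", and the result is always character.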

Another option is fcoalesce from data.table
library(data.table)
setDT(df)[, .(A, Comb = fcoalesce(w, x, y, z))]
data
df <- structure(list(A = c(0.23, 0.12, 0.45, 0.89, 0.12), w = c(1L,
NA, NA, NA, NA), x = c(NA, 2L, 2L, NA, NA), y = c(NA, NA, NA,
3L, NA), z = c(NA, NA, NA, NA, 4L)), class = "data.frame",
row.names = c(NA, -5L))

Using na.omit.
dat <- transform(dat[1], Comb=apply(dat[-1], 1, na.omit))
# A Comb
# 1 0.23 1
# 2 0.12 2
# 3 0.45 2
# 4 0.89 3
# 5 0.12 4
Data
dat <- structure(list(A = c(0.23, 0.12, 0.45, 0.89, 0.12), w = c(1L,
NA, NA, NA, NA), x = c(NA, 2L, 2L, NA, NA), y = c(NA, NA, NA,
3L, NA), z = c(NA, NA, NA, NA, 4L)), row.names = c(NA, -5L), class = "data.frame")
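One caveat with apply plus na.omit: it only returns a plain vector when every row has exactly one non-NA value. A possible alternative sketch, assuming you want the first non-NA value per row, uses max.col with matrix indexing:

```r
dat <- data.frame(
  A = c(0.23, 0.12, 0.45, 0.89, 0.12),
  w = c(1L, NA, NA, NA, NA),
  x = c(NA, 2L, 2L, NA, NA),
  y = c(NA, NA, NA, 3L, NA),
  z = c(NA, NA, NA, NA, 4L)
)

vals <- as.matrix(dat[-1])

# For each row, the index of the first column that is not NA
first_col <- max.col(!is.na(vals), ties.method = "first")

# Pick one cell per row via a (row, column) index matrix
Comb <- vals[cbind(seq_len(nrow(vals)), first_col)]
res <- data.frame(A = dat$A, Comb = Comb)
res
```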

Related

R: Copy values from a column in one dataframe to several selected columns in another dataframe for groups

I would like to copy the value from a dataframe to another dataframe. The difficulty for me is that I have four groups in both dataframes and in the dataframe where I would like to copy the values, I have several columns where I want to insert the value. More specifically, the dataframe (df1) from which I would like to copy the values looks like this:
structure(list(Name = c("A", "B", "C", "D"), Value = c(2L, 5L,
3L, 2L)), class = "data.frame", row.names = c(NA, -4L))
The dataframe (df2) where I want to insert the values looks like this:
structure(list(Name = c("A", "B", "C", "D"), `Rating 2017-06` = c(NA,
NA, NA, NA), `Rating 2017-07` = c(NA, NA, NA, NA), `Ratin g 2017-08` = c(NA,
NA, NA, NA), `Rating 2017-09` = c(NA, NA, NA, NA), `Rating 2017-10` = c(NA,
NA, NA, NA), `Rating 2017-11` = c(NA, NA, NA, NA), `Rating 2017-12` = c(NA,
NA, NA, NA), `Rating 2018-01` = c(4L, 4L, 3L, 3L), `Rating 2018-02` = c(3L,
4L, 3L, 2L)), class = "data.frame", row.names = c(NA, -4L))
I would like to copy the data in df1 in column "Value" to df2 to columns 2 to 8 ("Rating 2017-06" to "Rating 2017-12") for each Name.
What I have tried so far:
Merged_Data <- df1 %>%
left_join(df1, df2, by="Name")
The problem with this code is that I cannot copy the values from df1 into the specific columns of df2 mentioned above. I don't know what I have to write in the code to be able to do that.
Could someone help me here?
Thank you already.
Try this approach:
library(tidyverse)
df1 <- structure(list(
Name = c("A", "B", "C", "D"),
Value = c(2L, 5L,
3L, 2L)
),
class = "data.frame",
row.names = c(NA,-4L))
df2 <- structure(
list(
Name = c("A", "B", "C", "D"),
`Rating 2016-06` = c(NA,
NA, NA, NA),
`Rating 2017-07` = c(NA, NA, NA, NA),
`Ratin g 2017-08` = c(NA,
NA, NA, NA),
`Rating 2017-09` = c(NA, NA, NA, NA),
`Rating 2017-10` = c(NA,
NA, NA, NA),
`Rating 2017-11` = c(NA, NA, NA, NA),
`Rating 2017-12` = c(NA,
NA, NA, NA),
`Rating 2018-01` = c(4L, 4L, 3L, 3L),
`Rating 2018-02` = c(3L,
4L, 3L, 2L)), class = "data.frame", row.names = c(NA, -4L))
df2 |>
left_join(df1) |>
mutate(across(contains("2016") | contains("2017"), ~ Value)) |>
select(- Value)
#> Joining, by = "Name"
#> Name Rating 2016-06 Rating 2017-07 Ratin g 2017-08 Rating 2017-09
#> 1 A 2 2 2 2
#> 2 B 5 5 5 5
#> 3 C 3 3 3 3
#> 4 D 2 2 2 2
#> Rating 2017-10 Rating 2017-11 Rating 2017-12 Rating 2018-01 Rating 2018-02
#> 1 2 2 2 4 3
#> 2 5 5 5 4 4
#> 3 3 3 3 3 3
#> 4 2 2 2 3 2
df2 |>
left_join(df1) |>
mutate(across(`Rating 2016-06`:`Rating 2017-12`, ~ Value)) |>
select(- Value)
#> Joining, by = "Name"
#> Name Rating 2016-06 Rating 2017-07 Ratin g 2017-08 Rating 2017-09
#> 1 A 2 2 2 2
#> 2 B 5 5 5 5
#> 3 C 3 3 3 3
#> 4 D 2 2 2 2
#> Rating 2017-10 Rating 2017-11 Rating 2017-12 Rating 2018-01 Rating 2018-02
#> 1 2 2 2 4 3
#> 2 5 5 5 4 4
#> 3 3 3 3 3 3
#> 4 2 2 2 3 2
Created on 2022-04-30 by the reprex package (v2.0.1)
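If you would rather avoid the join, a base R sketch of the same idea is to look up each Name with match and assign the result to the target columns. The data below is a shortened, made-up version of df2 (fewer columns), so treat it as an illustration only:

```r
df1 <- data.frame(Name = c("A", "B", "C", "D"), Value = c(2L, 5L, 3L, 2L))

# Shortened stand-in for df2; check.names = FALSE keeps the spaces in the names
df2 <- data.frame(
  Name = c("A", "B", "C", "D"),
  `Rating 2017-06` = NA,
  `Rating 2017-07` = NA,
  `Rating 2018-01` = c(4L, 4L, 3L, 3L),
  check.names = FALSE
)

# Value for each row of df2, aligned by Name rather than by position
vals <- df1$Value[match(df2$Name, df1$Name)]

# Columns to overwrite: here, everything whose name contains "2017"
target <- grep("2017", names(df2), value = TRUE)
for (cn in target) df2[[cn]] <- vals
df2
```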

How to remove NA in character data in R

I would like to copy the last two columns from each month to the beginning of the next month. I did it as follows (below), but the data contains NA and when I change it to character, the program breaks down. How do I copy columns to keep their type?
My code:
library(readxl)
library(tibble)
df<- read_excel("C:/Users/Rezerwa/Documents/Database.xlsx")
df=add_column(df, Feb1 = as.character(do.call(paste0, df["January...4"])), .after = "January...5")
df=add_column(df, Feb2 = as.numeric(do.call(paste0, df["January...5"])), .after = "Feb1")
My data:
df
# A tibble: 10 x 13
Product January...2 January...3 January...4 January...5 February...6 February...7 February...8 February...9 March...10 March...11 March...12 March...13
<chr> <lgl> <lgl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 a NA NA 754.00 4 754.00 4 754.00 4 754.00 4 754.00 4
2 b NA NA 706.00 3 706.00 3 706.00 3 706.00 3 706.00 3
3 c NA NA 517.00 3 517.00 3 517.00 3 517.00 3 517.00 3
4 d NA NA 1466.00 9 1466.00 9 1466.00 9 1466.00 9 1466.00 9
5 e NA NA 543.00 8 543.00 8 543.00 8 543.00 8 543.00 8
6 f NA NA NA NA NA NA NA NA NA NA NA NA
7 g NA NA NA NA NA NA NA NA NA NA NA NA
8 h NA NA NA NA NA NA NA NA NA NA NA NA
9 i NA NA 1466.00 8 NA NA NA NA NA NA NA NA
10 j NA NA NA NA 543.00 3 NA NA NA NA NA NA
My error:
> df=add_column(df, Feb1 = as.character(do.call(paste0, df["January...4"])), .after = "January...5")
> df=add_column(df, Feb2 = as.numeric(do.call(paste0, df["January...5"])), .after = "Feb1")
Warning message:
In eval_tidy(xs[[i]], unique_output) : NAs introduced by coercion
Using base R we can split the columns based on the prefix of their names, select last two columns from each group and cbind to original df.
df1 <- cbind(df, do.call(cbind, lapply(split.default(df[-1],
sub("\\..*", "", names(df)[-1])), function(x) {n <- ncol(x);x[, c(n-1, n)]})))
To get data in order, we can do
cbind(df1[1], df1[-1][order(match(sub("\\..*", "", names(df1)[-1]), month.name))])
data
df <- structure(list(Product = structure(1:10, .Label = c("a", "b",
"c", "d", "e", "f", "g", "h", "i", "j"), class = "factor"), January...2 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), January...3 = c(NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA), January...4 = c(754, 706, 517,
1466, 543, NA, NA, NA, 1466, NA), January...5 = c(4L, 3L, 3L,
9L, 8L, NA, NA, NA, 8L, NA), February...6 = c(754, 706, 517,
1466, 543, NA, NA, NA, NA, 543), February...7 = c(4L, 3L, 3L,
9L, 8L, NA, NA, NA, NA, 3L), February...8 = c(754, 706, 517,
1466, 543, NA, NA, NA, NA, NA), February...9 = c(4L, 3L, 3L,
9L, 8L, NA, NA, NA, NA, NA), March...10 = c(754, 706, 517, 1466,
543, NA, NA, NA, NA, NA), March...11 = c(4L, 3L, 3L, 9L, 8L,
NA, NA, NA, NA, NA), March...12 = c(754, 706, 517, 1466, 543,
NA, NA, NA, NA, NA), March...13 = c(4L, 3L, 3L, 9L, 8L, NA, NA,
NA, NA, NA)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10"))
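To make the split.default step easier to follow, here is a small sketch on made-up toy data (far fewer columns than the real sheet). It shows how the columns are grouped by month prefix and how the last two of each group are picked, with every column keeping its own type:

```r
# Toy data (assumed), mimicking the "Month...N" names produced by read_excel
toy <- data.frame(
  Product = c("a", "b"),
  January...2 = 1:2,
  January...3 = c("754.00", "706.00"),
  January...4 = c(4, 3),
  February...5 = c("754.00", "706.00"),
  February...6 = c(4, 3)
)

# Strip everything from the first dot onwards to get the month prefix
prefix <- sub("\\..*", "", names(toy)[-1])

# One data frame per month; groups come back in alphabetical order
groups <- split.default(toy[-1], prefix)

# Keep the last two columns of each group; drop = FALSE keeps a data frame
last_two <- lapply(groups, function(x) {
  n <- ncol(x)
  x[, c(n - 1, n), drop = FALSE]
})
sapply(last_two$January, class)
```

Because whole columns are copied rather than pasted and re-parsed, no as.character or as.numeric coercion is involved.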

Recode into new variable conditional on values in two other variables

I would like to be able to create a new variable based on specific values in two existing variables. My dataframe looks like:
structure(list(id = structure(c(1L, 2L, 3L, NA, NA, NA), .Label = c("blue",
"red", "yellow"), class = "factor"), value = c(-4.3, -2.5, -3.6,
NA, NA, NA)), .Names = c("id", "value"), row.names = c(NA, -6L
), class = "data.frame")
I would like to create a new column that contains only those values that pertain to blue (e.g., -4.3). All other values would result in NA, like so:
structure(list(id = structure(c(1L, 2L, 3L, NA, NA, NA), .Label = c("blue",
"red", "yellow"), class = "factor"), value = c(-4.3, -2.5, -3.6,
NA, NA, NA), newvalue = c(-4.3, NA, NA, NA, NA, NA)), .Names = c("id",
"value", "newvalue"), row.names = c(NA, -6L), class = "data.frame")
I tried the following:
b1 <- dat$id=="blue"
dat$newvalue <- dat$value[b1]
But that filled every cell in the new column with the same value (-4.3).
Because of the NAs, assigning values directly by indexing becomes tricky. We can use replace instead, replacing every value whose id is not "blue" with NA.
dat$newvalue <- replace(dat$value, dat$id != "blue", NA)
dat
# id value newvalue
#1 blue -4.3 -4.3
#2 red -2.5 NA
#3 yellow -3.6 NA
#4 <NA> NA NA
#5 <NA> NA NA
#6 <NA> NA NA
The equivalent ifelse statement would be :
dat$newvalue <- ifelse(dat$id != "blue", NA, dat$value)
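Another way to sidestep the NA problem, sketched here under the assumption that rows with a missing id should simply count as "not blue", is to build the condition with %in%, which returns only TRUE or FALSE:

```r
dat <- data.frame(
  id = factor(c("blue", "red", "yellow", NA, NA, NA)),
  value = c(-4.3, -2.5, -3.6, NA, NA, NA)
)

# %in% never yields NA, so the logical index is safe for assignment
is_blue <- dat$id %in% "blue"

dat$newvalue <- NA_real_                      # start with an all-NA column
dat$newvalue[is_blue] <- dat$value[is_blue]   # fill in only the blue rows
dat
```

The original attempt failed because dat$value[b1] is shorter than the column, so its values were recycled across every row.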

R: Replacing a factor with an integer value in numerous cells across numerous columns

So, my challenge has been to convert a raw scale csv to a scored csv. Within numerous columns, the file has cells filled with "Strongly Agree" to "Strongly Disagree", 6 levels. These factors need to be converted to integers 5 to 0 respectively.
I have tried unsuccessfully to use sapply and convert the table to a string. Sapply works on the vector, but it destroys the table structure.
Method 1:
dat$Col<-sapply(dat$Col,switch,'Strongly Disagree'=0,'Disagree'=1,'Slightly Disagree'=2,'Slightly Agree'=3,'Agree'=4, 'Strongly Agree'=5)
My second approach is to convert the csv into a string. When I examined the dput output, I saw the area I wanted to target that started with a .Label="","Strongly Agree"... Mistake. My changes did not result in a useful outcome.
My third approach came from the internet gods of destruction who seemed to express that gsub() might handle the string approach as well. Nope, again the underlying table structure was destroyed.
Method #3: Convert into a string and pattern match
dat <- textConnection("control/Surveys/StudyDat_1.csv")
#Score Scales
##"Strongly Agree"= 5
##"Agree"= 4
##"Strongly Disagree" = 0
#levels(dat$Col) <- gsub("Strongly Agree", "5", levels(dat$Col))
df<- gsub("Strongly Agree", "5",dat)
dat<-read.csv(textConnection(df),header=TRUE)
In the end, I am wanting to replace ALL "Strongly Agree" to 5 across numerous columns without the consequence of destroying the retrievability of the data.
Maybe I used the wrong search string and you know the resource I need to address this problem. I would rather avoid ALL character vector approaches, as these would require labeling each column if you provide a code response. It will need to go across ALL COLUMNS.
Thanks
Data Sample Problem
structure(list(last_updated = structure(c(3L, 1L, 7L, 2L, 10L, 6L, 8L, 9L, 7L, 5L, 4L), .Label = c("2016-05-13T12:53:56.704184Z",
"2016-05-13T12:54:09.273359Z", "2016-05-13T12:54:22.757251Z",
"2016-05-14T12:44:13.474992Z", "2016-05-14T12:44:31.736469Z",
"2016-05-16T16:45:10.623410Z", "2016-05-16T16:46:17.881402Z",
"2016-05-16T16:46:55.122257Z", "2016-05-16T16:47:14.160793Z",
"2016-05-24T02:26:04.770799Z"), class = "factor"), feedback = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), A = structure(c(NA,
NA, 2L, NA, 1L, NA, NA, NA, 2L, NA, NA), .Label = c("", "Slightly Disagree"
), class = "factor"), B = structure(c(NA, NA, 2L, NA, 1L, NA,
NA, NA, 3L, NA, NA), .Label = c("", "Disagree", "Strongly Agree"
), class = "factor"), C = structure(c(NA, NA, 2L, NA, 1L, NA,
NA, NA, 3L, NA, NA), .Label = c("", "Agree", "Disagree"), class = "factor"),
D = structure(c(NA, NA, 2L, NA, 1L, NA, NA, NA, 2L, NA, NA
), .Label = c("", "Agree"), class = "factor"), E = structure(c(NA,
NA, 2L, NA, 1L, NA, NA, NA, 3L, NA, NA), .Label = c("", "Agree",
"Strongly Disagree"), class = "factor")), .Names = c("last_updated",
"feedback", "A", "B", "C", "D", "E"), class = "data.frame", row.names = c(NA,
-11L))
Data Sample Solution
df<-dget(structure(list(last_updated = structure(c(3L, 1L, 7L, 2L, 10L, 6L,8L, 9L, 7L, 5L, 4L), .Label = c("2016-05-13T12:53:56.704184Z",
"2016-05-13T12:54:09.273359Z", "2016-05-13T12:54:22.757251Z",
"2016-05-14T12:44:13.474992Z", "2016-05-14T12:44:31.736469Z",
"2016-05-16T16:45:10.623410Z", "2016-05-16T16:46:17.881402Z",
"2016-05-16T16:46:55.122257Z", "2016-05-16T16:47:14.160793Z",
"2016-05-24T02:26:04.770799Z"), class = "factor"), feedback = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), A = c(NA, NA, 2L, NA,
NA, NA, NA, NA, 2L, NA, NA), B = c(NA, NA, 1L, NA, NA, NA, NA,
NA, 5L, NA, NA), C = c(NA, NA, 4L, NA, NA, NA, NA, NA, 1L, NA,
NA), D = c(NA, NA, 4L, NA, NA, NA, NA, NA, 4L, NA, NA), E = c(NA,
NA, 4L, NA, NA, NA, NA, NA, 0L, NA, NA)), .Names = c("last_updated",
"feedback", "A", "B", "C", "D", "E"), class = "data.frame", row.names = c(NA,-11L)))
We can use factor with the levels specified:
nm1 <- c('Strongly Disagree', 'Disagree',
'Slightly Disagree','Slightly Agree','Agree', 'Strongly Agree')
factor(dat$col, levels = nm1,
labels = 0:5)
If there are multiple factor columns with the same levels, identify the factor columns ('i1'), loop through it with lapply and specify the levels and labels.
i1 <- sapply(dat, is.factor)
dat[i1] <- lapply(dat[i1], factor, levels = nm1, labels= 0:5)
Update
Using the OP's dput output
dat[-(1:2)] <- lapply(dat[-(1:2)], factor, levels = nm1, labels = 0:5)
dat
# last_updated feedback A B C D E
#1 2016-05-13T12:54:22.757251Z NA <NA> <NA> <NA> <NA> <NA>
#2 2016-05-13T12:53:56.704184Z NA <NA> <NA> <NA> <NA> <NA>
#3 2016-05-16T16:46:17.881402Z NA 2 1 4 4 4
#4 2016-05-13T12:54:09.273359Z NA <NA> <NA> <NA> <NA> <NA>
#5 2016-05-24T02:26:04.770799Z NA <NA> <NA> <NA> <NA> <NA>
#6 2016-05-16T16:45:10.623410Z NA <NA> <NA> <NA> <NA> <NA>
#7 2016-05-16T16:46:55.122257Z NA <NA> <NA> <NA> <NA> <NA>
#8 2016-05-16T16:47:14.160793Z NA <NA> <NA> <NA> <NA> <NA>
#9 2016-05-16T16:46:17.881402Z NA 2 5 1 4 0
#10 2016-05-14T12:44:31.736469Z NA <NA> <NA> <NA> <NA> <NA>
#11 2016-05-14T12:44:13.474992Z NA <NA> <NA> <NA> <NA> <NA>
Another option is set from data.table
library(data.table)
for(j in names(dat)[-(1:2)]){
set(dat, i = NULL, j= j, value = factor(dat[[j]], levels = nm1, labels = 0:5))
}
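Note that relabelling gives a factor whose labels happen to look like numbers. If real integers are needed, one option is to go through as.character first. A small sketch on a made-up vector:

```r
nm1 <- c('Strongly Disagree', 'Disagree', 'Slightly Disagree',
         'Slightly Agree', 'Agree', 'Strongly Agree')

# Toy input (assumed): includes an NA and an empty string, as in the OP's data
x <- factor(c("Agree", "Strongly Disagree", NA, ""))

# Relabel: anything not listed in nm1 (such as "") becomes NA
f <- factor(x, levels = nm1, labels = 0:5)

# as.integer(f) alone would return the internal codes 1..6, not the labels,
# so convert to character first
scores <- as.integer(as.character(f))
scores
```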
I would just match each target column vector into a precomputed character vector to get an integer index. You can subtract 1 afterward to change the range from 1:6 to 0:5.
## define desired value order, ascending
o <- c(
'Strongly Disagree',
'Disagree',
'Slightly Disagree',
'Slightly Agree',
'Agree',
'Strongly Agree'
);
## convert target columns
for (cn in names(df)[-(1:2)]) df[[cn]] <- match(as.character(df[[cn]]),o)-1L;
df;
## last_updated feedback A B C D E
## 1 2016-05-13T12:54:22.757251Z NA NA NA NA NA NA
## 2 2016-05-13T12:53:56.704184Z NA NA NA NA NA NA
## 3 2016-05-16T16:46:17.881402Z NA 2 1 4 4 4
## 4 2016-05-13T12:54:09.273359Z NA NA NA NA NA NA
## 5 2016-05-24T02:26:04.770799Z NA NA NA NA NA NA
## 6 2016-05-16T16:45:10.623410Z NA NA NA NA NA NA
## 7 2016-05-16T16:46:55.122257Z NA NA NA NA NA NA
## 8 2016-05-16T16:47:14.160793Z NA NA NA NA NA NA
## 9 2016-05-16T16:46:17.881402Z NA 2 5 1 4 0
## 10 2016-05-14T12:44:31.736469Z NA NA NA NA NA NA
## 11 2016-05-14T12:44:13.474992Z NA NA NA NA NA NA
Previous answers might meet your needs, but note that changing the labels of a factor isn't the same as changing a factor to an integer variable. One possibility would be to use ifelse (I've made a new data frame as the one you posted didn't actually have variables with these levels in it):
lev <- c('Strongly disagree', 'Disagree', 'Slightly disagree', 'Slightly agree', 'Agree', 'Strongly agree')
dta <- sample(lev, 55, replace = TRUE)
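# stringsAsFactors = TRUE is needed in R >= 4.0.0, where strings are no longer
# converted to factors by default (otherwise is.factor() below is never TRUE)
dta <- data.frame(matrix(dta, nrow = 11), stringsAsFactors = TRUE)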
names(dta) <- LETTERS[1:5]
f_to_int <- function(f) {
if (is.factor(f)){
ifelse(f == 'Strongly disagree', 0,
ifelse(f == 'Disagree', 1,
ifelse(f == 'Slightly disagree', 2,
ifelse(f == 'Slightly agree', 3,
ifelse(f == 'Agree', 4,
ifelse(f == 'Strongly agree', 5, f))))))
} else f
}
dta2 <- sapply(dta, f_to_int)
Note that this returns a matrix, but it is easily converted to a data frame if necessary.
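If you want to stay in a data frame and skip the nested ifelse calls, a named lookup vector is one possible alternative. This is only a sketch on made-up data; the name scores is arbitrary:

```r
# Named lookup: level text -> integer score
scores <- c('Strongly disagree' = 0L, 'Disagree' = 1L, 'Slightly disagree' = 2L,
            'Slightly agree' = 3L, 'Agree' = 4L, 'Strongly agree' = 5L)

# Toy data (assumed) with factor columns and one missing answer
dta <- data.frame(
  A = c("Agree", "Disagree"),
  B = c("Strongly agree", NA),
  stringsAsFactors = TRUE
)

# dta[] <- ... replaces the columns but keeps the data frame structure
dta[] <- lapply(dta, function(col) unname(scores[as.character(col)]))
dta
```

Any value that is not a name in the lookup vector, including NA, comes back as NA.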

Replace value in one data frame from another

I've got two data.frames:
First:
> dput(head(tbl_mz))
structure(list(m.z = c(258.1686969, 258.168752, 587.8313625,
587.8425292, 523.2863282, 523.2859396), Measured.mass = c(514.3228408,
514.3229511, 1173.648172, 802.4706732, 1272.645144, 1044.557326
)), .Names = c("m.z", "Measured.mass"), row.names = c(NA, 6L), class = "data.frame")
Second:
> dput(head(tbl_exl))
structure(list(V1 = c(802.4706732, 1272.649209, 1272.646875,
1272.646599, 1272.646521, 1272.645144), V2 = c(NA, NA, NA, NA,
NA, NA), V3 = c(NA, NA, NA, NA, NA, NA), V4 = c(NA, NA, NA, NA,
NA, NA), V5 = c(NA, NA, NA, NA, NA, NA), V6 = structure(c(2L,
2L, 2L, 2L, 2L, 2L), .Label = c("", "Positive"), class = "factor"),
V7 = c(28.7, 29.4, 29.4, 23.8, 28.6, 23.3), V8 = c(30.7,
31.4, 31.4, 25.8, 30.6, 25.3), X = c(NA, NA, NA, NA, NA,
NA), X.1 = c(NA, NA, NA, NA, NA, NA), X.2 = c(NA, NA, NA,
NA, NA, NA)), .Names = c("V1", "V2", "V3", "V4", "V5", "V6",
"V7", "V8", "X", "X.1", "X.2"), row.names = c(NA, 6L), class = "data.frame")
I would like to replace some values in column V1 of tbl_exl with values from the other table, tbl_mz. The values from column V1 (tbl_exl) can be found in the column Measured.mass (tbl_mz), and they should be replaced by the values from the adjacent column m.z in the tbl_mz data frame.
In other words, the values in V1 should be replaced by the m.z values.
The problem is that not all values from V1 can be found in the other data frame. Those which cannot be found can be deleted or just left as they are.
The output, which I want to get:
> dput(head(tbl_exl_modified))
structure(list(V1 = c(587.8425292, 1272.649209, 1272.646875,
1272.646599, 1272.646521, 523.2863282), V2 = c(NA, NA, NA, NA,
NA, NA), V3 = c(NA, NA, NA, NA, NA, NA), V4 = c(NA, NA, NA, NA,
NA, NA), V5 = c(NA, NA, NA, NA, NA, NA), V6 = structure(c(2L,
2L, 2L, 2L, 2L, 2L), .Label = c("", "Positive"), class = "factor"),
V7 = c(28.7, 29.4, 29.4, 23.8, 28.6, 23.3), V8 = c(30.7,
31.4, 31.4, 25.8, 30.6, 25.3), X = c(NA, NA, NA, NA, NA,
NA), X.1 = c(NA, NA, NA, NA, NA, NA), X.2 = c(NA, NA, NA,
NA, NA, NA)), .Names = c("V1", "V2", "V3", "V4", "V5", "V6",
"V7", "V8", "X", "X.1", "X.2"), row.names = c(NA, 6L), class = "data.frame")
You could try match. Create numeric indexes based on the match between the columns ("Measured.mass", "V1") of the two datasets. Remove the NA values (giving "indx1" and "indxN1") and replace the "V1" values with the "m.z" values based on these indexes.
indx <- match(tbl_mz$Measured.mass, tbl_exl$V1)
indx1 <- indx[!is.na(indx)]
indxN <- match(tbl_exl$V1, tbl_mz$Measured.mass)
indxN1 <- indxN[!is.na(indxN)]
tbl_exl$V1[indx1] <- tbl_mz$m.z[indxN1]
identical(tbl_exl, tbl_exl_modified)
#[1] TRUE
Or use left_join from dplyr
library(dplyr)
tbl_exl1 <- left_join(tbl_exl, tbl_mz, by=c('V1'='Measured.mass')) %>%
mutate(V1= pmax((NA^!is.na(m.z))*V1, m.z,
na.rm=TRUE)) %>%
select(-m.z)
tbl_exl1
# V1 V2 V3 V4 V5 V6 V7 V8 X X.1 X.2
#1 587.8425 NA NA NA NA Positive 28.7 30.7 NA NA NA
#2 1272.6492 NA NA NA NA Positive 29.4 31.4 NA NA NA
#3 1272.6469 NA NA NA NA Positive 29.4 31.4 NA NA NA
#4 1272.6466 NA NA NA NA Positive 23.8 25.8 NA NA NA
#5 1272.6465 NA NA NA NA Positive 28.6 30.6 NA NA NA
#6 523.2863 NA NA NA NA Positive 23.3 25.3 NA NA NA
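The match idea can also be written with a single lookup, which does not depend on the two tables listing their matches in the same order. A sketch using only the relevant columns from the question's dput output (the other columns are left out for brevity):

```r
tbl_mz <- data.frame(
  m.z = c(258.1686969, 258.168752, 587.8313625,
          587.8425292, 523.2863282, 523.2859396),
  Measured.mass = c(514.3228408, 514.3229511, 1173.648172,
                    802.4706732, 1272.645144, 1044.557326)
)
tbl_exl <- data.frame(
  V1 = c(802.4706732, 1272.649209, 1272.646875,
         1272.646599, 1272.646521, 1272.645144)
)

# For every V1, the row of tbl_mz holding the same Measured.mass (NA if none)
idx <- match(tbl_exl$V1, tbl_mz$Measured.mass)
hit <- !is.na(idx)

# Overwrite only the rows that found a partner; the rest stay as they are
tbl_exl$V1[hit] <- tbl_mz$m.z[idx[hit]]
tbl_exl
```

Keep in mind that match compares the numbers exactly, so values that differ only by rounding will not be treated as equal.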
Here's a solution using data.table's binary join
library(data.table)
setnames(setDT(tbl_exl), 1, "Measured.mass") # Changing the first column name for the join to work
setkey(tbl_exl, Measured.mass) # Keying tbl_exl by `Measured.mass`
setkey(setDT(tbl_mz), Measured.mass) # Keying tbl_mz by `Measured.mass`
tbl_exl[tbl_mz, Measured.mass := i.m.z][] # Joining and retrieving only matched values from `i.m.z`
# Measured.mass V2 V3 V4 V5 V6 V7 V8 X X.1 X.2
# 1: 587.8425 NA NA NA NA Positive 28.7 30.7 NA NA NA
# 2: 523.2863 NA NA NA NA Positive 23.3 25.3 NA NA NA
# 3: 1272.6465 NA NA NA NA Positive 28.6 30.6 NA NA NA
# 4: 1272.6466 NA NA NA NA Positive 23.8 25.8 NA NA NA
# 5: 1272.6469 NA NA NA NA Positive 29.4 31.4 NA NA NA
# 6: 1272.6492 NA NA NA NA Positive 29.4 31.4 NA NA NA
