Manipulate Vector and Data Frame in R

list1=c(1,6,3,4,4,5)
data=data.frame("colA" = c(1:6),
"colB"=c(4,3,1,8,9,8))
I have 'list1' and 'data'.
I wish to match the values in 'colB' to the ones in 'list1', using 'colA' as a key.

Perhaps we need match:
data.frame(list1, colB = data$colB[match(list1, data$colA)])
# list1 colB
#1 1 4
#2 6 8
#3 3 1
#4 4 8
#5 4 8
#6 5 9
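To see why this works, it may help to look at what match returns on its own: for each element of its first argument, the position of the first equal element in the second argument, or NA if there is none. A minimal sketch, where the extra key 7 is an assumption added only to show the unmatched case:

```r
list1 <- c(1, 6, 3, 4, 4, 5)
data <- data.frame(colA = 1:6, colB = c(4, 3, 1, 8, 9, 8))

# Positions in data$colA of each element of list1
idx <- match(list1, data$colA)
idx
# [1] 1 6 3 4 4 5

# Indexing colB by those positions performs the lookup
data$colB[idx]
# [1] 4 8 1 8 8 9

# A key absent from colA (7 is hypothetical) yields NA rather than an error
data$colB[match(c(1, 7), data$colA)]
# [1]  4 NA
```

Because the result follows the order of list1, duplicates such as the two 4s are looked up independently.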

You can also use merge, which was one of your tags. Note that merge sorts the result by the key, so the rows no longer follow the order of list1.
merge(data.frame(list1=list1), data, by.x=c("list1"), by.y="colA")
list1 colB
1 1 4
2 3 1
3 4 8
4 4 8
5 5 9
6 6 8
Or if you don't care about the column name:
merge(data.frame(colA=list1), data)
colA colB
1 1 4
2 3 1
3 4 8
4 4 8
5 5 9
6 6 8
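If the original order of list1 matters, one option is to carry a position column through the merge and sort on it afterwards. This is only a sketch; the helper column name pos is an assumption, not something from the question:

```r
list1 <- c(1, 6, 3, 4, 4, 5)
data <- data.frame(colA = 1:6, colB = c(4, 3, 1, 8, 9, 8))

# Record the original position of each element before merging
# ("pos" is a hypothetical helper column)
keys <- data.frame(colA = list1, pos = seq_along(list1))
merged <- merge(keys, data, by = "colA")

# Restore the original order and drop the helper column
merged <- merged[order(merged$pos), c("colA", "colB")]
rownames(merged) <- NULL
merged
```

The colB column then reads 4, 8, 1, 8, 8, 9, the same as the match approach.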

Related

Merging duplicated columns by keeping the greater value in each row

I have a list of data frames, and the data frames have some duplicated columns. I want to merge the duplicated columns by keeping, for each row, the value that is greater than the others (some data frames have many more duplicates).
Example data:
temp <- data.frame(seq_len(15), 5, 3)
colnames(temp) <- c("A", "A", "B")
temp$A[5]=NA
temp$A[3]=NA
temp$A[2]=NA
temp[7,2]=NA
A A B
<int> <dbl> <dbl>
1 5 3
NA 5 3
NA 5 3
4 5 3
NA 5 3
6 5 3
7 NA 3
8 5 3
9 5 3
10 5 3
final output
A B
<int> <dbl>
1 3
5 3
5 3
5 3
5 3
6 3
7 3
8 3
9 3
10 3
Thanks, everyone.
A base R approach would be to split the data frame into groups of columns that share a name and take the row-wise maximum within each group using do.call + pmax.
data.frame(sapply(split.default(temp, names(temp)), function(x)
do.call(pmax, c(x, na.rm = TRUE))))
# A B
#1 5 3
#2 5 3
#3 5 3
#4 5 3
#5 5 3
#6 6 3
#7 7 3
#8 8 3
#9 9 3
#10 10 3
#11 11 3
#12 12 3
#13 13 3
#14 14 3
#15 15 3
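The two building blocks can be looked at separately. The sketch below rebuilds the example data and then shows what split.default and pmax each do; the short vectors passed to pmax are made up for illustration:

```r
temp <- data.frame(seq_len(15), 5, 3)
colnames(temp) <- c("A", "A", "B")
temp[c(2, 3, 5), 1] <- NA
temp[7, 2] <- NA

# split.default treats the data frame as a list of columns and groups
# them by name, so both "A" columns end up in the same piece
pieces <- split.default(temp, names(temp))
sapply(pieces, ncol)
# A B 
# 2 1 

# pmax compares vectors element by element; with na.rm = TRUE an NA is
# ignored when the other value is present (illustrative vectors)
pmax(c(1, NA, 7), c(5, 5, NA), na.rm = TRUE)
# [1] 5 5 7
```

do.call(pmax, ...) simply passes every column of a piece to pmax as a separate argument, which is why the approach extends to more than two duplicates.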

Merge 2 rows with duplicated pair of values into a single row

I have the data frame below, in which there are pairs of rows with the same values for columns A and B: the 3rd and 4th rows (2 3) and the 7th and 8th rows (4 6).
master <- data.frame(A=c(1,1,2,2,3,3,4,4,5,5), B=c(1,2,3,3,4,5,6,6,7,8),C=c(5,2,5,7,7,5,7,9,7,8),D=c(1,2,5,3,7,5,9,6,7,0))
A B C D
1 1 1 5 1
2 1 2 2 2
3 2 3 5 5
4 2 3 7 3
5 3 4 7 7
6 3 5 5 5
7 4 6 7 9
8 4 6 9 6
9 5 7 7 7
10 5 8 8 0
I would like to merge these rows into one by placing a pipe | between the values of C and D. The 2nd and 3rd lines, for example, would look like:
A B C D
2 3 2|5 2|5
I think your combined pairs are off by a row in your example. Assuming that's the case, this is what you're looking for: we group by the columns whose duplicates we want to collapse, and then use summarize_all with paste0 to combine the remaining values with a separator.
library(tidyverse)
# funs() is deprecated in current dplyr; a formula lambda does the same job
master %>% group_by(A, B) %>% summarize_all(~ paste0(., collapse = "|"))
A B C D
<dbl> <dbl> <chr> <chr>
1 1 1 5 1
2 1 2 2 2
3 2 3 5|7 5|3
4 3 4 7 7
5 3 5 5 5
6 4 6 7|9 9|6
7 5 7 7 7
8 5 8 8 0
We can do this in base R with aggregate
aggregate(.~ A + B, master, FUN = paste, collapse= '|')
# A B C D
#1 1 1 5 1
#2 1 2 2 2
#3 2 3 5|7 5|3
#4 3 4 7 7
#5 3 5 5 5
#6 4 6 7|9 9|6
#7 5 7 7 7
#8 5 8 8 0
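One thing worth keeping in mind with either approach is that the collapsed columns become character. A small sketch, assuming you later want the individual numbers back for one of the merged rows (the names res and parts are just for illustration):

```r
master <- data.frame(A = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
                     B = c(1, 2, 3, 3, 4, 5, 6, 6, 7, 8),
                     C = c(5, 2, 5, 7, 7, 5, 7, 9, 7, 8),
                     D = c(1, 2, 5, 3, 7, 5, 9, 6, 7, 0))

res <- aggregate(. ~ A + B, master, FUN = paste, collapse = "|")

# C and D are now character, even for groups with a single row
class(res$C)
# [1] "character"

# Split one collapsed cell back into numbers; fixed = TRUE is needed
# because "|" is a regular-expression metacharacter
parts <- as.numeric(strsplit(res$C[res$A == 2 & res$B == 3], "|", fixed = TRUE)[[1]])
parts
# [1] 5 7
```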

Assign value to group based on condition in column

I have a data frame that looks like the following:
> df = data.frame(group = c(1,1,1,2,2,2,3,3,3),
date = c(1,2,3,4,5,6,7,8,9),
value = c(3,4,3,4,5,6,6,4,9))
> df
group date value
1 1 1 3
2 1 2 4
3 1 3 3
4 2 4 4
5 2 5 5
6 2 6 6
7 3 7 6
8 3 8 4
9 3 9 9
I want to create a new column that contains the date value per group that is associated with the value "4" from the value column.
The following data frame shows what I hope to accomplish.
group date value newValue
1 1 1 3 2
2 1 2 4 2
3 1 3 3 2
4 2 4 4 4
5 2 5 5 4
6 2 6 6 4
7 3 7 6 8
8 3 8 4 8
9 3 9 9 8
As we can see, group 1 has the newValue "2" because that is the date associated with the value "4". Similarly, group two has newValue 4 and group three has newValue 8.
I assume there is an easy way to do this using ave() or a range of dplyr/data.table functions, but I have been unsuccessful with my many attempts.
Here's a quick data.table one
library(data.table)
setDT(df)[, newValue := date[value == 4L], by = group]
df
# group date value newValue
# 1: 1 1 3 2
# 2: 1 2 4 2
# 3: 1 3 3 2
# 4: 2 4 4 4
# 5: 2 5 5 4
# 6: 2 6 6 4
# 7: 3 7 6 8
# 8: 3 8 4 8
# 9: 3 9 9 8
Here's a similar dplyr version
library(dplyr)
df %>%
group_by(group) %>%
mutate(newValue = date[value == 4L])
Or a possible base R solution using merge after filtering the data (will need some renaming afterwards)
merge(df, df[df$value == 4, c("group", "date")], by = "group")
Here is a base R option
df$newValue = rep(df$date[which(df$value == 4)], table(df$group))
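The one-liner above relies on the rows being sorted by group and on every group containing exactly one value of 4. A sketch that drops both assumptions, using ave over row numbers and match; the modified data, in which group 2 has no 4, is made up to show the NA case:

```r
# Group 2 deliberately has no value of 4 (an assumption for illustration)
df <- data.frame(group = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 date  = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                 value = c(3, 4, 3, 5, 5, 6, 6, 4, 9))

# ave() passes the row numbers of each group to FUN; match(4, ...) gives
# the position of the first 4 within the group, or NA if there is none
df$newValue <- ave(seq_len(nrow(df)), df$group, FUN = function(i) {
  df$date[i][match(4, df$value[i])]
})
df$newValue
# [1]  2  2  2 NA NA NA  8  8  8
```

If a group contains several 4s, this picks the date of the first one; whether that is the right choice depends on the data.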
Another alternative using lapply
do.call(rbind, lapply(split(df, df$group),
function(x){x$newValue = rep(x$date[which(x$value == 4)],
each = length(x$group)); x}))
# group date value newValue
#1.1 1 1 3 2
#1.2 1 2 4 2
#1.3 1 3 3 2
#2.4 2 4 4 4
#2.5 2 5 5 4
#2.6 2 6 6 4
#3.7 3 7 6 8
#3.8 3 8 4 8
#3.9 3 9 9 8
One more base R path:
df$newValue <- ave(`names<-`(df$value==4,df$date), df$group, FUN=function(x) as.numeric(names(x)[x]))
df
group date value newValue
1 1 1 3 2
2 1 2 4 2
3 1 3 3 2
4 2 4 4 4
5 2 5 5 4
6 2 6 6 4
7 3 7 6 8
8 3 8 4 8
9 3 9 9 8
10 3 11 7 8
I tested this on groups of varying length. I assigned the date column as the names of the logical index that marks where value equals 4, and then, within each group, picked out the name at the TRUE position.
Data
df = data.frame(group = c(1,1,1,2,2,2,3,3,3,3),
date = c(1,2,3,4,5,6,7,8,9,11),
value = c(3,4,3,4,5,6,6,4,9,7))

How to only keep the columns with same names between two data frames?

I have two data frames like the following:
a<-c(1,3,4,5,6,8)
b<-c(2,3,4,2,6,7)
c<-c(2,5,6,3,5,6)
df1<-data.frame(a,b,c)
d<-c(3,4,5,6,7,8)
e<-c(1,2,3,2,1,1)
c<-c(1,3,4,5,6,2)
df2<-data.frame(d,e,c)
> df1
a b c
1 1 2 2
2 3 3 5
3 4 4 6
4 5 2 3
5 6 6 5
6 8 7 6
> df2
d e c
1 3 1 1
2 4 2 3
3 5 3 4
4 6 2 5
5 7 1 6
6 8 1 2
I want to combine the two data frames and keep only the columns with the same names. The final data frame should look like this:
> df3
c1 c2
1 2 1
2 5 3
3 6 4
4 3 5
5 5 6
6 6 2
My real data frames have hundreds of columns, so I need code to do this job. Can anyone help me?
Find out which names belong to both dataframes and then bind them:
eqnames <- names(df1)[names(df1) %in% names(df2)]
df3 <- cbind(df1[eqnames], df2[eqnames])
You can then rename the columns:
names(df3) <- paste0(names(df3), 1:ncol(df3))
Resulting in:
> df3
c1 c2
1 2 1
2 5 3
3 6 4
4 3 5
5 5 6
6 6 2
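With more than one shared column, numbering the names by position would label them b1, c2, b3, c4, which is probably not what you want for hundreds of columns. A sketch that suffixes by source data frame instead; the two small data frames here are made up so that two columns are shared:

```r
# Hypothetical data with two shared columns, b and c
df1 <- data.frame(a = c(1, 3), b = c(2, 3), c = c(2, 5))
df2 <- data.frame(d = c(3, 4), b = c(1, 2), c = c(1, 3))

# Names present in both data frames, in the order they appear in df1
common <- intersect(names(df1), names(df2))

# Suffix each side with its source before binding
left <- df1[common]
right <- df2[common]
names(left) <- paste0(common, "1")
names(right) <- paste0(common, "2")

df3 <- cbind(left, right)
names(df3)
# [1] "b1" "c1" "b2" "c2"
```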

Add new column to data frame, taking existing values within range

I was wondering if anyone knows a simple way to create a new column in a data frame, taking data from an existing column, within a certain range.
For example, I have this data frame
range col1
1 5
2 4
3 9
4 5
5 2
6 8
7 9
I would like to create col2 from the data in col1, so that col2 takes the col1 value where range is above 3 and 0 otherwise:
range col1 col2
1 5 0
2 4 0
3 9 0
4 5 5
5 2 2
6 8 8
7 9 9
I have tried
data$col2 <- data$col1[which(data$range > 3)]
data$col2 <- subset ( data$col1 , data$range >3 )
However, both of these produce the error:
replacement has 4 rows, data has 7
Any help greatly appreciated
You can do it even without ifelse here:
data$new <- with(data, (range > 3) * col1)
data
# range col1 new
#1 1 5 0
#2 2 4 0
#3 3 9 0
#4 4 5 5
#5 5 2 2
#6 6 8 8
#7 7 9 9
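The multiplication works because a logical vector is coerced to 0/1 in arithmetic. A short sketch of that step, rebuilding the example data, plus one caveat with missing values (the NA input is made up for illustration):

```r
data <- data.frame(range = 1:7, col1 = c(5, 4, 9, 5, 2, 8, 9))

# TRUE/FALSE become 1/0 when used in arithmetic
keep <- data$range > 3
as.numeric(keep)
# [1] 0 0 0 1 1 1 1

# Multiplying zeroes out the rows where the condition is FALSE
keep * data$col1
# [1] 0 0 0 5 2 8 9

# Caveat: an NA in col1 stays NA even where the condition is FALSE,
# whereas ifelse(cond, col1, 0) would give 0 there
(c(1, 5) > 3) * c(NA, 2)
# [1] NA  2
```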
Try ifelse
transform(data, col2=ifelse(range >3, col1, 0))
# range col1 col2
#1 1 5 0
#2 2 4 0
#3 3 9 0
#4 4 5 5
#5 5 2 2
#6 6 8 8
#7 7 9 9

Resources