Moving rows from one dataframe to another based on a matching column - r

I'm sorry for asking this question, because I saw something similar in the past but couldn't find it again (so flagging it as a duplicate would be understandable).
I have 2 data frames, and I want to move all the (matching) customers who appear in both data frames into one of them. Please note that I want to move the entire row.
Here is an example:
# df1
customer_ip V1 V2
1 15 20
2 12 18
# df2
customer_ip V1 V2
2 45 50
3 12 18
And I want my new data frames to look like:
# df1
customer_ip V1 V2
1 15 20
2 12 18
2 45 50
# df2
customer_ip V1 V2
3 12 18
Thank you in advance!

This does it:
# Append the matching rows of df2 to df1
df1 <- rbind(df1, df2[df2$customer_ip %in% df1$customer_ip, ])
# Then drop those rows from df2 (df1 now contains their customer_ips)
df2 <- df2[!(df2$customer_ip %in% df1$customer_ip), ]
EDIT: Gaurav and Sotos got here before me whilst I was writing essentially the same answer, but I'll leave this here as it shows the code without the redundant which().

This should do the trick:
# Add the matching rows to df1
df1 <- rbind(df1, df2[which(df2$customer_ip %in% df1$customer_ip), ])
# Remove those rows from df2
# (caveat: -which(...) drops every row when there are no matches,
#  so indexing with !(... %in% ...) is safer in general)
df2 <- df2[-which(df2$customer_ip %in% df1$customer_ip), ]
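If you prefer dplyr, semi_join() and anti_join() express the same move; a minimal sketch, assuming the dplyr package is installed:

```r
library(dplyr)

df1 <- data.frame(customer_ip = c(1, 2), V1 = c(15, 12), V2 = c(20, 18))
df2 <- data.frame(customer_ip = c(2, 3), V1 = c(45, 12), V2 = c(50, 18))

# Rows of df2 whose customer_ip also appears in df1
moved <- semi_join(df2, df1, by = "customer_ip")
df1 <- bind_rows(df1, moved)
# Keep only the rows of df2 that were not moved
df2 <- anti_join(df2, df1, by = "customer_ip")
```

The joins make the matching logic explicit and avoid the empty-match pitfall of -which().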

Related

In R, how can I filter only those rows in which the value for column V6 appears exactly 2 times?

How can I filter in R only those rows in which the value for column V6 appears exactly 2 times?
My dataset is called date. I tried:
library(dplyr)
df <- as.data.frame(date)
df1 <- subset(df, duplicated(V6))
but it does not work.
You can use a contingency table to get the value counts. Here's some example code.
# Make some dummy data (only 8 and 2 appear exactly twice in this example)
df <- data.frame(V1 = 1:10,
                 V2 = rep(11:10, 5),
                 V6 = c(1, 2, 8, 3, 4, 3, 2, 3, 8, 7))
# Get table of counts for column "V6"
tab <- table(df$V6)
# Get values that appear exactly twice
twice <- as.numeric(names(tab)[tab == 2])
# Filter the data frame based on these values
df <- df[df$V6 %in% twice,]
Output:
V1 V2 V6
2 2 10 2
3 3 11 8
7 7 11 2
9 9 11 8
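Since the question already loads dplyr, here is a sketch of the same "exactly twice" filter in dplyr (assuming the package is installed):

```r
library(dplyr)

df <- data.frame(V1 = 1:10,
                 V2 = rep(11:10, 5),
                 V6 = c(1, 2, 8, 3, 4, 3, 2, 3, 8, 7))

# Keep only rows whose V6 value occurs exactly twice in the column
result <- df %>%
  group_by(V6) %>%
  filter(n() == 2) %>%
  ungroup()
```

group_by()/filter(n() == 2) counts within each V6 group, which is what duplicated() alone cannot do.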

Creating a new variable in R from two existing ones

My apologies if this is a basic question. I'm new to R.
I have a dataset, DAT, which has 3 variables: ID, V1 and V2. Unfortunately, V2 data are missing for many cases. I want to create a new variable, V3. I want V3 to have the same values as V2, but for any case that has a missing value for V2, I want V3 to take the value of V1 instead. What is the most efficient way to do this in R?
One approach using the dplyr package.
# Step 1: Load verb-like data wrangling package.
library(dplyr)
# Step 2: Create some data.
df <- data.frame(ID=1:5, V1 = 11:15, V2 = c(31:33, NA, NA))
ID V1 V2
1 11 31
2 12 32
3 13 33
4 14 NA
5 15 NA
# Step 3: Create a variable V3 using your criteria
df <- mutate(df, V3 = if_else(is.na(V2), V1, V2))
ID V1 V2 V3
1 11 31 31
2 12 32 32
3 13 33 33
4 14 NA 14
5 15 NA 15
Using the data.table package would probably be more efficient if you have a big data frame.
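A minimal sketch of that data.table approach (assuming the data.table package; fcoalesce() takes the first non-NA value, element-wise):

```r
library(data.table)

DAT <- data.table(ID = 1:5, V1 = 11:15, V2 = c(31:33, NA, NA))
# Add V3 by reference: V2 where present, otherwise V1
DAT[, V3 := fcoalesce(V2, V1)]
```

The := assignment modifies the table in place, which avoids copying a large data frame.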
You can also use ifelse():
DAT$V3 <- ifelse(is.na(DAT$V2), DAT$V1, DAT$V2)
This reads as: if V2 is NA, use V1; otherwise use V2.

Make rows out of one column based on another column's value

I have a data frame like this
V1 V2
10 5
20 4
30 8
40 6
10 10
20 7
30 4
40 9
And I would like to have all the values relating to the same V1 in one row, like so...
V1 V2 V3
10 5 10
20 4 7
30 8 4
40 6 9
Here is a solution in base R. Feed the unique values of V1 into lapply() and, for each one, extract all matching values of V2. Pass the resulting list to do.call() with rbind() (because lapply() returns a list), then attach the unique values again with cbind().
# Create df1 for demonstration
df1 <- data.frame(a = rep(1:4, 10), b = sample(1:40))
output <- cbind(unique(df1$a),
                do.call(rbind, lapply(unique(df1$a), function(x) df1$b[df1$a == x])))
This solution depends on all values in the source data frame being of the same type. If they are not, you may have to cast the data to a common type first.
You can do what you want with the apply family of functions.
DF <- data.frame(A = c(1:5, 1:5), B = 11:20)
lst <- lapply(unique(DF$A), function(AA) DF[DF$A == AA, 'B'])
Result <- do.call(rbind, lst)
If you wish to have the A column back, you can use Result <- cbind(A = unique(DF$A), Result).
Be careful: this will give you a matrix, not a data.frame. If your values are not numeric (unlike this example), that may cause some issues.
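The lapply() step above is essentially what split() already does; a base R sketch of the same idea (assuming every A value occurs the same number of times, so rbind() does not recycle):

```r
DF <- data.frame(A = c(1:5, 1:5), B = 11:20)
# split() groups B by A; rbind() stacks the pieces into one matrix
Result <- do.call(rbind, split(DF$B, DF$A))
```

Each row of the matrix is named after its A value, so the grouping key is kept without a separate cbind().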
There are some alternate ways to do this using Data Tables or dplyr.
We can do this with dcast from data.table
library(data.table)
dcast(setDT(df1), V1~paste0("V", rowid(V1)+1))
# V1 V2 V3
#1: 10 5 10
#2: 20 4 7
#3: 30 8 4
#4: 40 6 9

Same function over multiple data frames in R - not over a list of data frames

This question is almost what I wanted to do, except that the output is given as a list of data frames. Let's reproduce the example from the SE question mentioned above.
Let's say I have 2 data frames:
df1
ID col1 col2
x 0 10
y 10 20
z 20 30
df2
ID col1 col2
a 0 10
b 10 20
c 20 30
What I want is a 4th column with an ifelse() result. My rationale is:
if col1 >= 20 in any data frame I have named with the pattern "df", then the new column res = 1, else res = 0.
But I want to create the new column in each data frame with the same name pattern, not put all of those data frames in a list and apply the function, unless I could "extract" each element of that list back into individual data frames.
Thanks
Per @Frank... if my understanding of what you are looking for is correct, consider using data.table. MWE:
library(data.table)
addcol <- function(x) x[, res := ifelse(col1 >= 20, 1, 0)]
df1 <- data.table(ID=c("x","y","z"),col1=c(0,10,20),col2=c(10,20,30))
df2 <- data.table(ID=c("x","y","z"),col1=c(20,10,20),col2=c(10,20,30))
#modified df2 so you can see different effects
lapply(list(df1,df2),addcol)
> df1
ID col1 col2 res
1: x 0 10 0
2: y 10 20 0
3: z 20 30 1
> df2
ID col1 col2 res
1: x 20 10 1
2: y 10 20 0
3: z 20 30 1
This works because data.table operates by reference on tables, so inside the function you're actually updating the underlying table, not only the scoped reference to the table.
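If you want to stay in base R and still avoid lists, a sketch using get()/assign() over the "df" name pattern (the pattern ^df[0-9]+$ is an assumption; it presumes the data frames live in the environment where the loop runs):

```r
df1 <- data.frame(ID = c("x", "y", "z"), col1 = c(0, 10, 20), col2 = c(10, 20, 30))
df2 <- data.frame(ID = c("a", "b", "c"), col1 = c(20, 10, 20), col2 = c(10, 20, 30))

# Look up each matching data frame by name, add res, and write it back
for (nm in ls(pattern = "^df[0-9]+$")) {
  d <- get(nm)
  d$res <- ifelse(d$col1 >= 20, 1, 0)
  assign(nm, d)
}
```

This keeps each data frame as a standalone object, at the cost of the usual caveats around programmatic assignment.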

Delete rows with value frequencies lower than x in R

I got a data frame in R like the following:
V1 V2 V3
1 2 3
1 43 54
2 34 53
3 34 51
3 43 42
...
And I want to delete all rows where the value of V1 has a frequency lower than 2. So in my example the row with V1 = 2 should be deleted, because the value 2 only appears once in the column ("1" and "3" appear twice each).
I tried to add an extra column with the frequency of V1 in it, so I could then keep only the rows where the frequency is > 1, but with the following I only get NAs in the extra column:
data$Frequency <- table(data$V1)[data$V1]
Thanks
You can try this:
library(dplyr)
df %>% group_by(V1) %>% filter(n() > 1)
You can also consider using data.table. We first count the occurrence of each value in V1, then filter on those occurrences being more than 1. Finally, we remove the count column as we no longer need it.
library(data.table)
setDT(dat)
dat2 <- dat[, n := .N, V1][n > 1][, n := NULL]
Or, more concisely, thanks to RichardScriven:
dat2 <- dat[, .SD[.N >= 2], by = V1]
> dat2
V1 V2 V3
1: 1 2 3
2: 1 43 54
3: 3 34 51
4: 3 43 42
With this you do not need to load a library:
res <- data.frame(V1 = c(1, 1, 2, 3, 3, 3), V2 = rnorm(6), V3 = rnorm(6))
res[res$V1 %in% names(table(res$V1))[table(res$V1) >= 2], ]
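Another base R option is ave(), which computes each row's group size directly and avoids calling table() twice; a sketch using a deterministic version of the example data:

```r
# ave() returns, for every row, the size of that row's V1 group
res <- data.frame(V1 = c(1, 1, 2, 3, 3, 3), V2 = 1:6, V3 = 6:1)
kept <- res[ave(res$V1, res$V1, FUN = length) >= 2, ]
```

Because ave() returns a value per row, the logical index lines up with the data frame without any name matching.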
