Merging dataframes without changing values [duplicate] - r

This question already has an answer here:
Column binding in R
(1 answer)
Closed 3 years ago.
I have two dataframes
df1 <- data.frame(c(1:10))
df2 <- data.frame(c(1,0,1,1,0,1,0,0,1,0))
I tried to merge them using this code:
merge(df1,df2,all = TRUE, sort = FALSE)
But my dataframe comes out really weird: there are 100 rows.
I want the dataframe to look like this:
col1 col2
1 1
2 0
3 1
4 1
5 0
6 1
7 0
8 0
9 1
10 0
How can I do this?

You can define a new data frame and use [,1] to extract the column from each of your existing data frames. This also lets you name the columns.
data.frame(col1=df1[,1], col2 = df2[,1])
# col1 col2
#1 1 1
#2 2 0
#3 3 1
#4 4 1
#5 5 0
#6 6 1
#7 7 0
#8 8 0
#9 9 1
#10 10 0

This will get you the formatting you want, with named columns:
library(dplyr)
df1 <- data.frame(col1 = c(1:10))
df2 <- data.frame(col2 = c(1,0,1,1,0,1,0,0,1,0))
df <- bind_cols(df1, df2)

You can do that with cbind(), which stands for column bind:
cbind(df1, df2)
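As a minimal sketch of why merge() produced 100 rows here: the two data frames from the question share no column names, and in that case merge() returns the Cartesian product of their rows. The names shared, crossed and combined are illustrative, not from the original post.

```r
df1 <- data.frame(c(1:10))
df2 <- data.frame(c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0))

# The auto-generated column names differ, so merge() has no key to join on
shared <- intersect(names(df1), names(df2))

# With no shared columns, merge() pairs every row of df1 with every row of df2
crossed <- merge(df1, df2)
nrow(crossed)  # 10 * 10 rows, the 100 rows seen in the question

# cbind() pairs rows by position instead; setNames() supplies the wanted names
combined <- setNames(cbind(df1, df2), c("col1", "col2"))
combined
```

Binding by position only makes sense when both data frames have the same number of rows in the same order, which is the case in the question.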

Related

In R, how do I add rows containing counts of the number of values in that column that ==x?

I have a dataframe like this
Q1 <- c(1,0,1,4,3)
Q2 <- c(0,1,2,1,4)
df <- data.frame(Q1,Q2)
df
Q1 Q2
1 1 0
2 0 1
3 1 2
4 4 1
5 3 4
There are many more columns like this, and what I want to do is add 5 rows at the bottom of the dataframe with the count of how many items in each column==0, how many ==1, how many ==2, how many==3 and how many==4. Thank you.
You can apply table to each column of df and rbind the result to the original dataset.
rbind(df, sapply(df, function(x) table(factor(x, levels = 0:4))))
# Q1 Q2
#1 1 0
#2 0 1
#3 1 2
#4 4 1
#5 3 4
#0 1 1
#11 2 2
#21 0 1
#31 1 0
#41 1 1
You can use this (if the column is numeric, convert it to character or factor in order to get the count of each level):
New_df1 <- as.data.frame(table(df$Q1))
New_df2 <- as.data.frame(table(df$Q2))
Then you can transform them and, if you want, add them to your original data frame.
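A possible way to put the two answers together, as a sketch: count each value 0 to 4 per column, then append the counts with readable row names. The names counts and result and the n_eq_ row-name prefix are made up for this illustration.

```r
Q1 <- c(1, 0, 1, 4, 3)
Q2 <- c(0, 1, 2, 1, 4)
df <- data.frame(Q1, Q2)

# factor(levels = 0:4) makes table() report a count for every value,
# including values that never occur in a column
counts <- sapply(df, function(x) table(factor(x, levels = 0:4)))

# Label the count rows so they are not confused with the data rows
rownames(counts) <- paste0("n_eq_", 0:4)

# Append the five count rows below the original five data rows
result <- rbind(df, as.data.frame(counts))
result
```

Mixing data rows and summary rows in one data frame is convenient for display, but keeping counts in a separate object is usually easier for further computation.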

Making a vector on condition in R [duplicate]

This question already has answers here:
Logical comparison of two vectors with binary (0/1) result
(2 answers)
Closed 3 years ago.
I'm completely new to R, have a little background in Python only.
Say I have 2 columns in my dataframe df that are
col1 = c(1,3,4,5,2,6,7)
col2 = c(2,5,1,5,6,5,3)
and I want to add a new column to df consisting of 0s and 1s only: it takes 1 if the element in col1 is less than the element in col2, and 0 otherwise. So it should be like
col3 = c(1,1,0,0,1,0,0)
I think there's a way to do it in one line,
df$col3 <- c(...)
but I don't know how to fill in (...) part. Any help would be greatly appreciated.
You may simply compare the vectors themselves:
df <- data.frame(c1 = col1, c2 = col2)
df$c3 <- as.integer(df$c1 < df$c2)
df
c1 c2 c3
1 1 2 1
2 3 5 1
3 4 1 0
4 5 5 0
5 2 6 1
6 6 5 0
7 7 3 0
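To spell out the steps in that one-liner, here is a small sketch using the column names from the question: the comparison is vectorised and yields a logical vector, which as.integer() maps to 1 and 0. The names is_less and col3_alt are illustrative only.

```r
col1 <- c(1, 3, 4, 5, 2, 6, 7)
col2 <- c(2, 5, 1, 5, 6, 5, 3)
df <- data.frame(col1, col2)

# Element-wise comparison gives TRUE/FALSE for each row
is_less <- df$col1 < df$col2

# as.integer() turns TRUE into 1 and FALSE into 0
df$col3 <- as.integer(is_less)

# An equivalent, more explicit spelling with ifelse()
col3_alt <- ifelse(df$col1 < df$col2, 1L, 0L)
```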

How to give each instance its own row in a data frame? [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 3 years ago.
How is it possible to transform this data frame so that the count is divided into separate observations?
df = data.frame(object = c("A","B", "A", "C"), count=c(1,2,3,2))
object count
1 A 1
2 B 2
3 A 3
4 C 2
So that the resulting data frame looks like this?
object observation
1 A 1
2 B 1
3 B 1
4 A 1
5 A 1
6 A 1
7 C 1
8 C 1
rep(df$object, df$count)
If you want the 2 columns:
df2 = data.frame(object = rep(df$object, df$count))
df2$count = 1
If you're already working with the tidyverse (otherwise that's overkill), you could also do:
library(tidyverse)
uncount(df, count) %>% mutate(observation = 1)
Using data.table:
library(data.table)
setDT(df)[rep(seq_along(count), count), .(object, count = 1L)]
object count
1: A 1
2: B 1
3: B 1
4: A 1
5: A 1
6: A 1
7: C 1
8: C 1
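The idea shared by these answers can be sketched in base R as repeating each row index count times and then indexing with the result. The names idx and expanded are hypothetical, and the output column is called observation to match the layout asked for in the question.

```r
df <- data.frame(object = c("A", "B", "A", "C"), count = c(1, 2, 3, 2))

# Repeat row number i exactly df$count[i] times
idx <- rep(seq_len(nrow(df)), times = df$count)

# Pick the objects by the repeated indices; each row is one observation
expanded <- data.frame(object = df$object[idx], observation = 1)
expanded
```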

Double left join in dplyr to recover values

I've checked this issue but couldn't find a matching entry.
Say you have 2 DFs:
df1:mode df2:sex
1 1
2 2
3
And a DF3 where most of the combinations are not present, e.g.
mode | sex | cases
1 1 9
1 1 2
2 2 7
3 1 2
1 2 5
and you want to summarise it with dplyr, obtaining all combinations (with nonexistent ones set to 0):
mode | sex | cases
1 1 11
1 2 5
2 1 0
2 2 7
3 1 2
3 2 0
If you do a single left join, left_join(df1, df3), you recover the modes not in df3, but sex appears as NA, and the same happens if you do left_join(df2, df3).
So how can you do both left joins to recover all absent combinations, with cases=0? dplyr preferred, but sqldf is an option.
Thanks in advance, p.
The development version of tidyr, tidyr_0.2.0.9000, has a new function called complete that I saw the other day that seems like it was made for just this sort of situation.
The help page says:
This is a wrapper around expand(), left_join() and replace_na that's
useful for completing missing combinations of data. It turns
implicitly missing values into explicitly missing values.
To add the missing combinations of df3 and fill with 0 values instead, you would do:
library(tidyr)
library(dplyr)
df3 %>% complete(mode, sex, fill = list(cases = 0))
mode sex cases
1 1 1 9
2 1 1 2
3 1 2 5
4 2 1 0
5 2 2 7
6 3 1 2
7 3 2 0
You would still need to group_by and summarise to get the final output you want.
df3 %>% complete(mode, sex, fill = list(cases = 0)) %>%
group_by(mode, sex) %>%
summarise(cases = sum(cases))
Source: local data frame [6 x 3]
Groups: mode
mode sex cases
1 1 1 11
2 1 2 5
3 2 1 0
4 2 2 7
5 3 1 2
6 3 2 0
First, here's your data in a more friendly, reproducible format:
df1 <- data.frame(mode=1:3)
df2 <- data.frame(sex=1:2)
df3 <- data.frame(mode=c(1,1,2,3,1), sex=c(1,1,2,1,2), cases=c(9,2,7,2,5))
I don't see an option for a full outer join in dplyr, so I'm going to use base R here to merge df1 and df2 to get all mode/sex combinations. Then I left join that to the data and replace NA values with zero.
library(dplyr)
mm <- merge(df1,df2) %>% left_join(df3)
mm$cases[is.na(mm$cases)] <- 0
mm %>% group_by(mode,sex) %>% summarize(cases=sum(cases))
which gives
mode sex cases
1 1 1 11
2 1 2 5
3 2 1 0
4 2 2 7
5 3 1 2
6 3 2 0
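For comparison, the same expand, join and fill steps can be sketched without any packages. This is an illustration only, not either answerer's method; the names totals, grid and out are assumptions made for the example.

```r
df3 <- data.frame(mode  = c(1, 1, 2, 3, 1),
                  sex   = c(1, 1, 2, 1, 2),
                  cases = c(9, 2, 7, 2, 5))

# Sum cases for the mode/sex combinations that actually occur
totals <- aggregate(cases ~ mode + sex, data = df3, FUN = sum)

# Build every mode/sex combination, present in df3 or not
grid <- expand.grid(mode = c(1, 2, 3), sex = c(1, 2))

# Keep all rows of the grid; combinations missing from totals get NA
out <- merge(grid, totals, all.x = TRUE)

# Turn the implicit gaps into explicit zeros
out$cases[is.na(out$cases)] <- 0

# Order like the desired output in the question
out <- out[order(out$mode, out$sex), ]
out
```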

How to combine two data frames using dplyr or other packages?

I have two data frames:
df1 = data.frame(index=c(0,3,4),n1=c(1,2,3))
df1
# index n1
# 1 0 1
# 2 3 2
# 3 4 3
df2 = data.frame(index=c(1,2,3),n2=c(4,5,6))
df2
# index n2
# 1 1 4
# 2 2 5
# 3 3 6
I want to join these to:
index n
1 0 1
2 1 4
3 2 5
4 3 8 (index 3 in two df, so add 2 and 6 in each df)
5 4 3
6 5 0 (index 5 not exists in either df, so set 0)
7 6 0 (index 6 not exists in either df, so set 0)
The given data frames are just part of large dataset. Can I do it using dplyr or other packages in R?
Using data.table (which would be efficient for bigger datasets). I am not changing the column names, as rbindlist uses the names of the first dataset, i.e. in this case n for the second column (I don't know whether that is a feature or a bug). Once you have joined the datasets with rbindlist, group by the index column (by=index) and sum the n column (list(n=sum(n))).
library(data.table)
rbindlist(list(data.frame(index=0:6,n=0), df1,df2))[,list(n=sum(n)), by=index]
index n
#1: 0 1
#2: 1 4
#3: 2 5
#4: 3 8
#5: 4 3
#6: 5 0
#7: 6 0
Or using dplyr. Here, the column names of all the datasets should be the same, so I am changing them before binding the datasets with bind_rows (which replaced the older rbind_list). If the names are different, there will be a separate column for each name. After joining the datasets, group by index and then use summarise to sum column n.
library(dplyr)
nm1 <- c("index", "n")
colnames(df1) <- colnames(df2) <- nm1
bind_rows(df1, df2, data.frame(index=0:6, n=0)) %>%
group_by(index) %>%
summarise(n=sum(n))
This is something you could do with the base functions aggregate and rbind:
df1 = data.frame(index=c(0,3,4),n=c(1,2,3))
df2 = data.frame(index=c(1,2,3),n=c(4,5,6))
aggregate(n~index, rbind(df1, df2, data.frame(index=0:6, n=0)), sum)
which returns
index n
1 0 1
2 1 4
3 2 5
4 3 8
5 4 3
6 5 0
7 6 0
How about
names(df1) <- c("index", "n") # set colnames of df1 to target
df3 <- rbind(df1,setNames(df2, names(df1))) # set colnames of df2 and stack the rows
df <- dplyr::arrange(df3, index) # sort by index (duplicate index values are not summed)
Cheers.
