split character combine and reshape dataframe in R [duplicate] - r

This question already has answers here:
R: Split unbalanced list in data.frame column
(2 answers)
Closed 5 years ago.
I am trying to build a patent network. I have a sample dataframe (aa) that contains an ID variable (Origin) and a character string (Target). I want to split each string into its separate elements and put them back into a dataframe in long format, giving a new dataframe (ab). I've tried combining the strsplit, do.call and reshape functions, but to no avail. I'd appreciate any help.
From
aa<-data.frame(Origin=c(1,2,3),Target=c('a b c','d e','f g a b'))
aa
to
ab<-data.frame(Origin=c(rep(1,3),rep(2,2),rep(3,4)), Target=c('a','b','c','d','e','f','g','a','b'))
ab

You can achieve this using a combination of the strsplit, mutate and unnest functions:
library(dplyr)
library(tidyr)
aa %>% mutate(Target = strsplit(as.character(Target), " ")) %>% unnest(Target)
# Origin Target
# 1 1 a
# 2 1 b
# 3 1 c
# 4 2 d
# 5 2 e
# 6 3 f
# 7 3 g
# 8 3 a
# 9 3 b
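For comparison, the same reshaping can be sketched in base R without extra packages. This is only a sketch: it assumes the Target values are separated by single spaces, as in the sample data, and the name parts is purely illustrative.

```r
aa <- data.frame(Origin = c(1, 2, 3),
                 Target = c("a b c", "d e", "f g a b"),
                 stringsAsFactors = FALSE)

# Split each Target string on spaces: a list with one character vector per row
parts <- strsplit(as.character(aa$Target), " ", fixed = TRUE)

# Repeat each Origin once per piece, then flatten the pieces into one column
ab <- data.frame(Origin = rep(aa$Origin, lengths(parts)),
                 Target = unlist(parts))
ab
```

The key step is rep(aa$Origin, lengths(parts)), which keeps every split element aligned with the row it came from.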

Related

Reshaping dataframe to list values over unique id - back and forth [duplicate]

This question already has answers here:
Collapse text by group in data frame [duplicate]
(2 answers)
Collapse / concatenate / aggregate a column to a single comma separated string within each group
(6 answers)
Closed 3 years ago.
I want to condense information in a dataframe to reduce the number of rows.
Consider the dataframe:
df <- data.frame(id=c("A","A","A","B","B","C","C","C"),b=c(4,5,6,1,2,7,8,9))
df
id b
1 A 4
2 A 5
3 A 6
4 B 1
5 B 2
6 C 7
7 C 8
8 C 9
I want to collapse the dataframe to all unique values of "id" and list the values in variable b. The result should look like
df.results <- data.frame(id=c("A","B","C"),b=c("4,5,6","1,2","7,8,9"))
df.results
id b
1 A 4,5,6
2 B 1,2
3 C 7,8,9
A solution for the first step is:
library(dplyr)
df.results <- df %>%
group_by(id) %>%
summarise(b = toString(b)) %>%
ungroup()
How would you turn df.results back into df?
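One possible way to go back, sketched in base R. It assumes the collapsed column was built with toString(), which joins values with a comma and a space, and that the original values were numeric; parts and df.back are illustrative names.

```r
df.results <- data.frame(id = c("A", "B", "C"),
                         b = c("4, 5, 6", "1, 2", "7, 8, 9"),
                         stringsAsFactors = FALSE)

# Split on a comma followed by any number of spaces, so "4,5,6" works too
parts <- strsplit(df.results$b, ", *")

# Repeat each id once per value and convert the pieces back to numbers
df.back <- data.frame(id = rep(df.results$id, lengths(parts)),
                      b = as.numeric(unlist(parts)))
df.back
```

This is the same split-and-repeat idea as in the accepted answer at the top of the page, with an extra as.numeric() step to restore the column type.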

Condensing data frame with same names and different values [duplicate]

This question already has answers here:
How to use Aggregate function in R
(3 answers)
How to sum a variable by group
(18 answers)
Closed 5 years ago.
I have a data frame that I am trying to condense. There are multiple values of X with the same name but with different Y values associated with them:
X Y
1 a 1
2 b 3
3 a 2
4 c 4
5 b 7
I want to condense the data frame so there are no duplicate names in X, like below:
X Y
1 a 3
2 b 10
3 c 4
Using tidyverse (column names are case-sensitive, so use X and Y as in the data):
library(tidyverse)
df <- df %>%
group_by(X) %>%
summarise(Y = sum(Y))
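The linked duplicates point to aggregate, so here is a minimal base R sketch of the same grouping. The data frame is rebuilt from the table in the question, and condensed is an illustrative name.

```r
df <- data.frame(X = c("a", "b", "a", "c", "b"),
                 Y = c(1, 3, 2, 4, 7))

# Sum Y within each distinct value of X
condensed <- aggregate(Y ~ X, data = df, FUN = sum)
condensed
```

The formula Y ~ X reads as "Y grouped by X"; swapping FUN for mean or max would give other per-group summaries.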

Combining multiple columns in one R [duplicate]

This question already has answers here:
Flatting a dataframe with all values of a column into one
(3 answers)
Closed 5 years ago.
How can I combine all of a dataframe's columns into a single column in an efficient way? By that I mean without spelling out the column names, using dplyr or tidyr in R, because I have too many columns (10,000+).
For example, converting this data frame
> Multiple_dataframe
a b c
1 4 7
2 5 8
3 6 9
to
> Uni_dataframe
d
1
2
3
4
5
6
7
8
9
I looked around Stack Overflow but without success.
We can use unlist
Uni_dataframe <- data.frame(d = unlist( Multiple_dataframe, use.names = FALSE))
Or using dplyr/tidyr (as the question is specific about it)
library(tidyverse)
Uni_dataframe <- gather(Multiple_dataframe, key, d) %>%
select(-key)

In R: get multiple rows by splitting a column using tidyr and reshape2 [duplicate]

This question already has answers here:
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 6 years ago.
What is the simplest way, using tidyr or reshape2, to turn this data:
data <- data.frame(
A=c(1,2,3),
B=c("b,g","g","b,g,q"))
into this (i.e. make a row for each comma-separated value in variable B):
A B
1 1 b
2 1 g
3 2 g
4 3 b
5 3 g
6 3 q
Try
library(splitstackshape)
cSplit(data, 'B', ',', 'long')
Or using base R
lst <- setNames(strsplit(as.character(data$B), ','), data$A)
stack(lst)
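Or using tidyr (older tidyr versions accepted unnest(lst, A) on the list above; current versions require a data frame, so separate_rows is used here instead)
library(tidyr)
separate_rows(data, B, sep = ",")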

Return df with a columns values that occur more than once [duplicate]

This question already has answers here:
Subset data frame based on number of rows per group
(4 answers)
Closed 5 years ago.
I have a data frame df, and I am trying to keep all rows whose value in column B occurs more than once in the dataset.
I tried using table to do it, but am having trouble subsetting from the table:
t<-table(df$B)
Then I try subsetting it using:
subset(df, table(df$B)>1)
And I get the error
"Error in x[subset & !is.na(subset)] :
object of type 'closure' is not subsettable"
How can I subset my data frame using table counts?
Here is a dplyr solution (using MrFlick's data.frame dd)
library(dplyr)
newd <- dd %>% group_by(b) %>% filter(n() > 1)
newd
# a b
# 1 1 1
# 2 2 1
# 3 5 4
# 4 6 4
# 5 7 4
# 6 9 6
# 7 10 6
Or, using data.table
library(data.table)
setDT(dd)[, if (.N > 1) .SD, by = b]
Or using base R
dd[dd$b %in% unique(dd$b[duplicated(dd$b)]),]
May I suggest an alternative, faster way to do this with data.table?
require(data.table) ## 1.9.2
setDT(df)[, .N, by=B][N > 1L]$B  # the values of B that occur more than once
(or) you can couple .I (another special variable - see ?data.table) which gives the corresponding row number in df, along with .N as follows:
setDT(df)[df[, .I[.N > 1L], by=B]$V1]
(or) have a look at @mnel's answer for another variation (using yet another special variable, .SD).
Using table() isn't the best because then you have to rejoin it to the original rows of the data.frame. The ave function makes it easier to calculate row-level values for different groups. For example
dd<-data.frame(
a=1:10,
b=c(1,1,2,3,4,4,4,5,6, 6)
)
dd[with(dd, ave(b,b,FUN=length))>1, ]
#subset(dd, ave(b,b,FUN=length)>1) #same thing
a b
1 1 1
2 2 1
5 5 4
6 6 4
7 7 4
9 9 6
10 10 6
Here, for each level of b, it counts the length of b, which is really just the number of b's and returns that back to the appropriate row for each value. Then we use that to subset.
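For completeness, the table-based approach the question asks about can be sketched as follows. Two hedged observations first: the "closure" error most likely means no object named df existed in the session, so R found the built-in function stats::df instead; and subset() needs one logical value per row, whereas table(df$B) > 1 has one value per distinct B. The names counts, repeated and result are illustrative.

```r
dd <- data.frame(a = 1:10,
                 b = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6))

# Count how often each value of b occurs
counts <- table(dd$b)

# names(counts) holds the b values as character strings; keep those seen more than once
repeated <- names(counts)[counts > 1]

# Match rows back to the repeated values, comparing as character to match the names
result <- dd[as.character(dd$b) %in% repeated, ]
result
```

This is the "rejoin" step mentioned above: the counts live in a separate table, so they have to be matched back to the rows with %in%.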
