This question already has answers here:
Flatting a dataframe with all values of a column into one
(3 answers)
Closed 5 years ago.
How can I combine all of a data frame's columns into a single column efficiently? By efficiently I mean without referring to the column names, using dplyr or tidyr in R, because I have too many columns (10,000+).
For example, converting this data frame
> Multiple_dataframe
a b c
1 4 7
2 5 8
3 6 9
to
> Uni_dataframe
d
1
2
3
4
5
6
7
8
9
I looked around Stack Overflow but without success.
We can use unlist:
Uni_dataframe <- data.frame(d = unlist(Multiple_dataframe, use.names = FALSE))
Or using dplyr/tidyr (as the question is specific about it)
library(tidyverse)
Uni_dataframe <- gather(Multiple_dataframe, key, d) %>%
  select(-key)
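As a quick check of the unlist approach, here is a minimal base R sketch on the example data from the question. It assumes all columns share one type: unlist walks the columns in order (a, then b, then c), and mixed column types would be coerced to a common type.

```r
# Example data from the question
Multiple_dataframe <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# unlist() concatenates the columns one after another,
# so no column names have to be spelled out
Uni_dataframe <- data.frame(d = unlist(Multiple_dataframe, use.names = FALSE))

print(Uni_dataframe$d)
```

Since nothing here depends on the number of columns, the same call should work unchanged on a frame with 10,000+ columns.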
This question already has answers here:
Unlist data frame column preserving information from other column
(3 answers)
Closed 2 years ago.
I have a data frame with a column of phonetic transcriptions of words, called trans, and a column pos_num which records the position of the phoneme t in the transcription strings.
df <- data.frame(
  trans = c("ðət", "əˈpærəntli", "ˈkɒntrækt", "təˈwɔːdz", "pəˈteɪtəʊz"),
  stringsAsFactors = FALSE
)
df$pos_num <- sapply(strsplit(df$trans, ""), function(x) which(grepl("t", x)))
df
trans pos_num
1 ðət 3
2 əˈpærəntli 8
3 ˈkɒntrækt 5, 9
4 təˈwɔːdz 1
5 pəˈteɪtəʊz 4, 7
In some transcriptions, t occurs more than once, resulting in multiple values in pos_num. Where this is the case I would like to duplicate the entire row, with the original row containing one value and the duplicated row containing the other value. The desired output would be:
df
trans pos_num
1 ðət 3
2 əˈpærəntli 8
3 ˈkɒntrækt 5
4 ˈkɒntrækt 9
5 təˈwɔːdz 1
6 pəˈteɪtəʊz 4
7 pəˈteɪtəʊz 7
How can this be achieved? (There seem to be a few posts on that question for other programming languages but not R.)
With data.table, unlist pos_num within each trans group:
library(data.table)
setDT(df)
df[, .(pos_num = unlist(pos_num)), by = .(trans)]
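For comparison, the same row duplication can be sketched in base R without data.table. This is only an illustration: the pos_num values are typed in by hand from the question's printed output and are assumed to be stored as a list column, and the names n_hits and long are made up for the example.

```r
df <- data.frame(
  trans = c("ðət", "əˈpærəntli", "ˈkɒntrækt", "təˈwɔːdz", "pəˈteɪtəʊz"),
  stringsAsFactors = FALSE
)
# Positions of "t", copied from the question's output (assumption:
# a list column holding one integer vector per word)
df$pos_num <- list(3L, 8L, c(5L, 9L), 1L, c(4L, 7L))

# Repeat each word once per recorded position, then flatten the positions
n_hits <- lengths(df$pos_num)
long <- data.frame(
  trans = rep(df$trans, times = n_hits),
  pos_num = unlist(df$pos_num),
  stringsAsFactors = FALSE
)

print(long)
```

Because this version does not group by trans, rows keep their original order even if the same transcription occurs more than once in the data.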
This question already has answers here:
How to use Aggregate function in R
(3 answers)
How to sum a variable by group
(18 answers)
Closed 5 years ago.
I have a data frame that I am trying to condense. There are multiple values of X with the same name but with different Y values associated with them:
X Y
1 a 1
2 b 3
3 a 2
4 c 4
5 b 7
I want to condense the data frame so there are no duplicate names in X, like below:
X Y
1 a 3
2 b 10
3 c 4
Using tidyverse:
library(tidyverse)
# Column names are case-sensitive: the data frame uses X and Y
df <- df %>%
  group_by(X) %>%
  summarise(Y = sum(Y))
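The linked duplicates point to aggregate, so here is a minimal base R sketch of the same grouped sum. The data frame is rebuilt from the question's printed table, and condensed is just an illustrative name.

```r
# Data from the question
df <- data.frame(
  X = c("a", "b", "a", "c", "b"),
  Y = c(1, 3, 2, 4, 7),
  stringsAsFactors = FALSE
)

# Sum Y within each distinct value of X
condensed <- aggregate(Y ~ X, data = df, FUN = sum)

print(condensed)
```

The formula Y ~ X reads as "Y grouped by X"; more grouping columns could be added on the right-hand side with +.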
This question already has answers here:
R: Split unbalanced list in data.frame column
(2 answers)
Closed 5 years ago.
I am trying to build a patent network. I have a sample data frame (aa) that contains an ID variable (Origin) and a character string (Target). I want to split each string into its separate elements and put the result into a new data frame (ab) in long format. I have tried combining the strsplit, do.call and reshape functions, but to no avail. I would appreciate any help.
From
aa<-data.frame(Origin=c(1,2,3),Target=c('a b c','d e','f g a b'))
aa
to
ab<-data.frame(Origin=c(rep(1,3),rep(2,2),rep(3,4)), Target=c('a','b','c','d','e','f','g','a','b'))
ab
You can achieve this using a combination of the strsplit, mutate and unnest functions:
library(dplyr)
library(tidyr)
aa %>% mutate(Target = strsplit(as.character(Target), " ")) %>% unnest(Target)
# Origin Target
# 1 1 a
# 2 1 b
# 3 1 c
# 4 2 d
# 5 2 e
# 6 3 f
# 7 3 g
# 8 3 a
# 9 3 b
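If dplyr and tidyr are not available, the same split can be sketched in base R. This assumes Target is a character column (hence stringsAsFactors = FALSE) and that elements are separated by single spaces; parts is an illustrative name.

```r
aa <- data.frame(
  Origin = c(1, 2, 3),
  Target = c("a b c", "d e", "f g a b"),
  stringsAsFactors = FALSE
)

# One character vector of pieces per row
parts <- strsplit(aa$Target, " ", fixed = TRUE)

# Repeat each Origin once per piece, then flatten the pieces
ab <- data.frame(
  Origin = rep(aa$Origin, times = lengths(parts)),
  Target = unlist(parts),
  stringsAsFactors = FALSE
)

print(ab)
```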
This question already has answers here:
Combine two data frames by rows (rbind) when they have different sets of columns
(14 answers)
Closed 6 years ago.
Suppose I have 8 tables. Six columns are the same in all of them, but 5 of the 8 tables have one extra column (with the same name in all 5, so those 5 tables have 7 columns in total).
My question is how to bind all 8 tables so that the other 3 tables also get the extra column that the other 5 have.
I hope the question is quite clear.
You can use rbind.fill from the plyr package for this:
library(plyr)
# df_list contains a list of all the csv files you read, e.g. using lapply(list_paths, read.csv)
df_list <- list(data.frame(a = c(1, 2), b = c(3, 4)),
                data.frame(a = c(4, 5), b = c(6, 3), c = c(20, 21)))
do.call('rbind.fill', df_list)
a b c
1 1 3 NA
2 2 4 NA
3 4 6 20
4 5 3 21
Alternatively, use rbindlist from data.table (with fill = TRUE), as @akrun suggested. This is probably a lot faster for larger datasets.
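To make clear what the filling step does, here is a rough base R sketch of the same idea: add each missing column as NA, put the columns in a common order, then rbind. It is not how rbind.fill or rbindlist are implemented, and all_cols, padded and combined are names invented for the example.

```r
df_list <- list(data.frame(a = c(1, 2), b = c(3, 4)),
                data.frame(a = c(4, 5), b = c(6, 3), c = c(20, 21)))

# Union of all column names across the tables
all_cols <- unique(unlist(lapply(df_list, names)))

# Add every missing column as NA and reorder so all tables line up
padded <- lapply(df_list, function(d) {
  for (col in setdiff(all_cols, names(d))) {
    d[[col]] <- NA
  }
  d[all_cols]
})

combined <- do.call(rbind, padded)
print(combined)
```

For 8 tables read from files, df_list would simply be the list returned by lapply over the file paths.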
This question already has answers here:
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 6 years ago.
What is the simplest way, using tidyr or reshape2, to turn this data:
data <- data.frame(
A=c(1,2,3),
B=c("b,g","g","b,g,q"))
into this (i.e. one row for each comma-separated value in variable B):
A B
1 1 b
2 1 g
3 2 g
4 3 b
5 3 g
6 3 q
Try
library(splitstackshape)
cSplit(data, 'B', ',', 'long')
Or using base R
lst <- setNames(strsplit(as.character(data$B), ','), data$A)
stack(lst)
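Or with tidyr (unnest(lst, A) relied on an older tidyr release that accepted a plain list; separate_rows works on the data frame directly)
library(tidyr)
separate_rows(data, B, sep = ",")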
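One detail of the base R route: stack() names its output columns values and ind, with ind as a factor, so a little cleanup is needed to get back to columns A and B. A minimal sketch, assuming B is a character column and using long as an illustrative name:

```r
data <- data.frame(
  A = c(1, 2, 3),
  B = c("b,g", "g", "b,g,q"),
  stringsAsFactors = FALSE
)

# Split B on commas and label each piece with its A value
lst <- setNames(strsplit(data$B, ",", fixed = TRUE), data$A)

# stack() returns columns "values" and "ind" (a factor)
long <- stack(lst)

# Rename, reorder, and turn A back into a number
names(long) <- c("B", "A")
long <- long[c("A", "B")]
long$A <- as.numeric(as.character(long$A))

print(long)
```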