How to expand a data.frame according to one of its columns? [duplicate] - r

This question already has answers here:
MATCH function in r [duplicate]
(1 answer)
New column in dataframe based on match between two columns [duplicate]
(1 answer)
Closed 4 years ago.
What's an elegant way (without additional packages) to "expand" a given data.frame according to one of its columns?
Given:
df <- data.frame(values = 1:5, strings = c("e", "g", "h", "b", "c"))
more.strings <- letters[c(3, 5, 7, 1, 4, 8, 6)]
Desired outcome: A data.frame containing:
5 c
1 e
2 g
NA a
NA d
3 h
NA f
So those values of df$strings appearing in more.strings should be used to fill the new data.frame (otherwise NA).

You can do a join. In base R:
merge(df, more.strings, by.y="y",by.x="strings", all.y=TRUE)
strings values
1 c 5
2 e 1
3 g 2
4 h 3
5 a NA
6 d NA
7 f NA
or even, as suggested by @thelatemail in the comments:
merge(df, list(strings=more.strings),by="strings", all.y=TRUE)
Using library:
library(tidyverse)
right_join(df,data.frame(strings=more.strings),by="strings")
values strings
1 5 c
2 1 e
3 2 g
4 NA a
5 NA d
6 3 h
7 NA f

We can do this without any additional packages, i.e. using only base R. match() returns row positions in df, so we use them to index df$values:
data.frame(values = df$values[match(more.strings, df$strings)],
strings = more.strings)
# values strings
#1 5 c
#2 1 e
#3 2 g
#4 NA a
#5 NA d
#6 3 h
#7 NA f
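Note that match() returns positions within df$strings rather than the values themselves. In the question's data the two coincide because values is 1:5. A minimal sketch with a hypothetical df2, whose values deliberately differ from the row numbers, makes the distinction visible:

```r
# Hypothetical data: values deliberately differ from the row numbers
df2 <- data.frame(values = c(10, 20, 30, 40, 50),
                  strings = c("e", "g", "h", "b", "c"),
                  stringsAsFactors = FALSE)
more.strings <- letters[c(3, 5, 7, 1, 4, 8, 6)]

# Row positions of more.strings within df2$strings (NA where absent)
idx <- match(more.strings, df2$strings)

# Index the values by those positions; NA positions yield NA values
expanded <- data.frame(values = df2$values[idx],
                       strings = more.strings,
                       stringsAsFactors = FALSE)
expanded
```

Indexing by idx keeps the order of more.strings, which is what the desired outcome asks for.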
Or we can use complete from tidyr:
library(tidyverse)
complete(df, strings = more.strings) %>%
arrange(match(strings, more.strings)) %>%
select(names(df))
# A tibble: 7 x 2
# values strings
# <int> <chr>
#1 5 c
#2 1 e
#3 2 g
#4 NA a
#5 NA d
#6 3 h
#7 NA f

Related

Simplest way to replace a list of values in a data frame with a list of new values

Say we have a data frame with a factor (Group) that is a grouping variable for a list of IDs:
set.seed(123)
data <- data.frame(Group = factor(sample(5,10, replace = T)),
ID = c(1:10))
In this example, the ID's belong to one of 5 Groups, labeled 1:5. We simply want to replace 1:5 with A:E. In other words, if Group == 1, we want to change it to A, if Group == 2, we want to change it to B, and so on. What is the simplest way to achieve this?
You may assign new labels= as a named list by calling factor once again.
data$Group1 <- factor(data$Group, labels=list("1"="A", "2"="B", "3"="C", "4"="D", "5"="E"))
## more succinct:
data$Group2 <- factor(data$Group, labels=setNames(list("A", "B", "C", "D", "E"), 1:5))
data
# Group ID Group1 Group2
# 1 3 1 C C
# 2 3 2 C C
# 3 2 3 B B
# 4 2 4 B B
# 5 3 5 C C
# 6 5 6 E E
# 7 4 7 D D
# 8 1 8 A A
# 9 2 9 B B
# 10 3 10 C C
This is the general approach; if capital letters are indeed wanted, see @RonakShah's solution.
You can use the built-in constant in R LETTERS :
data$new_group <- LETTERS[data$Group]
data
# Group ID new_group
#1 3 1 C
#2 3 2 C
#3 2 3 B
#4 2 4 B
#5 3 5 C
#6 5 6 E
#7 4 7 D
#8 1 8 A
#9 2 9 B
#10 3 10 C
Created a new column (new_group) here for comparison purposes. You can overwrite the same column if you wish to.
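One caveat worth illustrating: indexing LETTERS with a factor uses the factor's integer codes, not its labels, so the result matches the labels only when every level 1:5 is present. A minimal sketch, assuming hypothetical data (grp) in which level "2" never occurs, compares that with a lookup by label through a named vector (lookup is also a name introduced here):

```r
# Hypothetical data in which the value 2 never occurs,
# so the factor has only the levels "1", "3", "4", "5"
grp <- factor(c(3, 1, 5, 4, 3))

# Named vector mapping each label to its replacement
lookup <- c("1" = "A", "2" = "B", "3" = "C", "4" = "D", "5" = "E")

by_code  <- LETTERS[grp]                        # indexes by integer codes 1..4
by_label <- unname(lookup[as.character(grp)])   # indexes by the labels themselves

by_code
by_label
```

With all five levels present, as in the question's data, both approaches give the same result.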

merge columns that have the same name r

I am working in R with a dataset that is created from mongodb with the use of mongolite.
I am getting a list that looks like so:
_id A B A B A B NA NA
1 a 1 b 2 e 5 NA NA
2 k 4 l 3 c 3 d 4
I would like to merge the dataset to look like this:
_id A B
1 a 1
2 k 4
1 b 2
2 l 3
1 e 5
2 c 3
1 NA NA
2 d 4
The NAs in the last columns are there because the columns are named from the first entry and if a later entry has more columns than that they don't get names assigned to them, (if I get help for this as well it would be awesome but it's not the reason I am here).
Also the number of columns might differ for different subsets of the dataset.
I have tried melt(), but since it is a list and not a dataframe it doesn't work as expected. I have also tried stack(), but it didn't work because the columns have the same name and some of them don't even have a name.
I know this is a very weird situation and appreciate any help.
Thank you.
Using library(magrittr) for the pipe (fread and setDF below come from library(data.table)):
data:
df <- fread("
_id A B A B A B NA NA
1 a 1 b 2 e 5 NA NA
2 k 4 l 3 c 3 d 4 ",header=T)
setDF(df)
Code:
df2 <- df[,-1]
odds<- df2 %>% ncol %>% {(1:.)%%2} %>% as.logical
even<- df2 %>% ncol %>% {!(1:.)%%2}
cbind(df[,1,drop=F],
A=unlist(df2[,odds]),
B=unlist(df2[,even]),
row.names=NULL)
result:
# _id A B
# 1 1 a 1
# 2 2 k 4
# 3 1 b 2
# 4 2 l 3
# 5 1 e 5
# 6 2 c 3
# 7 1 <NA> NA
# 8 2 d 4
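The same odd/even column idea can be sketched in base R without the pipe. This assumes, as above, that A and B columns strictly alternate after the id column; the data frame wide and its unique column names (A1, B1, ...) are hypothetical stand-ins for the parsed data:

```r
# Hypothetical stand-in for the parsed data, with unique column names
wide <- data.frame(id = 1:2,
                   A1 = c("a", "k"), B1 = c(1, 4),
                   A2 = c("b", "l"), B2 = c(2, 3),
                   stringsAsFactors = FALSE)

vals <- wide[-1]                       # drop the id column
odd  <- seq(1, ncol(vals), by = 2)     # positions of the A columns
even <- seq(2, ncol(vals), by = 2)     # positions of the B columns

# Stack the A columns and the B columns, repeating id once per pair
long <- data.frame(id = rep(wide$id, times = length(odd)),
                   A  = unlist(vals[odd],  use.names = FALSE),
                   B  = unlist(vals[even], use.names = FALSE),
                   stringsAsFactors = FALSE)
long
```

Selecting by position rather than by name sidesteps the duplicated and missing column names described in the question.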
We can use data.table, assuming A and B always follow each other. I created an example with 2 sets of NA's in the header. With grep we can find the columns that fread has named V8 etc. Thanks to R's recycling of vectors, multiple headers can be renamed in one go. If in your case these are named differently, change the pattern in the grep command. Then we reshape the data with melt:
library(data.table)
df <- fread("
_id A B A B A B NA NA NA NA
1 a 1 b 2 e 5 NA NA NA NA
2 k 4 l 3 c 3 d 4 e 5",
header = TRUE)
df
_id A B A B A B A B A B
1: 1 a 1 b 2 e 5 <NA> NA <NA> NA
2: 2 k 4 l 3 c 3 d 4 e 5
# assuming A B are always following each other. Can be done in 1 statement.
cols <- names(df)
cols[grep(pattern = "^V", x = cols)] <- c("A", "B")
names(df) <- cols
# melt data (if df is a data.frame, replace df with setDT(df))
df_melted <- melt(df, id.vars = 1,
measure.vars = patterns(c('A', 'B')),
value.name=c('A', 'B'))
df_melted
_id variable A B
1: 1 1 a 1
2: 2 1 k 4
3: 1 2 b 2
4: 2 2 l 3
5: 1 3 e 5
6: 2 3 c 3
7: 1 4 <NA> NA
8: 2 4 d 4
9: 1 5 <NA> NA
10: 2 5 e 5
Thank you for your help, they were great inspirations.
Even though @Andre Elrico gave a solution that worked better on the reproducible example, @phiver gave a solution that worked better on my overall problem.
By using both those I came up with the following.
library(data.table)
#The data were in a list of lists called list for this example
temp <- as.data.table(matrix(t(sapply(list, '[', seq(max(sapply(list, length))))),
nrow = m))
# m here is the number of lists in list
cols <- names(temp)
cols[grep(pattern = "^V", x = cols)] <- c("B", "A")
#They need to be the opposite way because the first column is going to be substituted with id, and this way they fall on the correct column after that
cols[1] <- "id"
names(temp) <- cols
l <- melt.data.table(temp, id.vars = 1,
measure.vars = patterns(c("A", "B")),
value.name = c("A", "B"))
That way I can use this also if I have more than 2 columns that I need to manipulate like that.

Need help in data manipulation in R [duplicate]

This question already has answers here:
Split data frame string column into multiple columns
(16 answers)
Closed 6 years ago.
I have a dataframe with 2 columns, id and cat_list:
id cat_list
1 A
2 A|B
3 E|F|G
4 I
5 P|R|T|Z
I want to achieve the following using R code:
id cat_list1 cat_list2 cat_list3 cat_list4
1 A
2 A B
3 E F G
4 I
5 P R T Z
tidyr::separate is handy:
library(tidyr)
df %>% separate(cat_list, into = paste0('cat_list', 1:4), fill = 'right')
## id cat_list1 cat_list2 cat_list3 cat_list4
## 1 1 A <NA> <NA> <NA>
## 2 2 A B <NA> <NA>
## 3 3 E F G <NA>
## 4 4 I <NA> <NA> <NA>
## 5 5 P R T Z
We can use cSplit. Here, we don't need to worry about the number of splits, as it detects it automatically.
library(splitstackshape)
cSplit(df, "cat_list", "|")
# id cat_list_1 cat_list_2 cat_list_3 cat_list_4
#1: 1 A NA NA NA
#2: 2 A B NA NA
#3: 3 E F G NA
#4: 4 I NA NA NA
#5: 5 P R T Z
NOTE: It may be better to fill with NA rather than ''.
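For comparison, a base R sketch of the same split without extra packages. It assumes the data frame from the question is called df and that cat_list is a character column; the names parts, n, padded and out are introduced here for illustration:

```r
# Data from the question (assumed to be stored in df)
df <- data.frame(id = 1:5,
                 cat_list = c("A", "A|B", "E|F|G", "I", "P|R|T|Z"),
                 stringsAsFactors = FALSE)

# Split on the literal "|" (fixed = TRUE avoids regex interpretation)
parts <- strsplit(df$cat_list, "|", fixed = TRUE)
n <- max(lengths(parts))

# Pad every piece to length n with NA, then bind the pieces as rows
padded <- do.call(rbind, lapply(parts, `length<-`, n))
colnames(padded) <- paste0("cat_list", seq_len(n))

out <- cbind(df["id"], as.data.frame(padded, stringsAsFactors = FALSE))
out
```

As with cSplit, the number of new columns is taken from the data, and missing pieces are filled with NA.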

how to merge characters of a certain group into a new column in r [duplicate]

This question already has answers here:
Concatenate / paste a column by a group and add to original data
(2 answers)
Closed 8 years ago.
I need to merge the characters belonging to each group into a new column, e.g.
df = read.table(text="ID Class
1 a
1 b
2 a
2 c
3 b
4 a
4 b
4 c", header=T)
and the output should be something like
ID Class Class.aggr
1 a a, b
1 b
2 a a, c
2 c
3 b b
4 a a,b,c
4 b
4 c
I thought about using cat(union), but the data sample size is very high and I don't know how to call the Class characters dependent on the ID (tapply doesn't seem to work).
Here's a solution using base functions
df$class.arg<-""
df$class.arg[!duplicated(df$ID)]<-
tapply(df$Class, factor(df$ID,unique(df$ID)), paste, collapse=",")
which also produces
ID Class class.arg
1 1 a a,b
2 1 b
3 2 a a,c
4 2 c
5 3 b b
6 4 a a,b,c
7 4 b
8 4 c
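A related base R sketch uses ave(), which returns a vector as long as its input, so the pasted string does not need to be aligned with the first row of each group by hand. The column name Class.aggr follows the question; stringsAsFactors = FALSE is an assumption so that Class is a character column:

```r
# Data from the question, with Class kept as character
df <- data.frame(ID = c(1, 1, 2, 2, 3, 4, 4, 4),
                 Class = c("a", "b", "a", "c", "b", "a", "b", "c"),
                 stringsAsFactors = FALSE)

# Paste all Class values per ID; ave() recycles the result within each group
df$Class.aggr <- ave(df$Class, df$ID,
                     FUN = function(x) paste(x, collapse = ","))

# Keep the aggregate only on the first row of each ID
df$Class.aggr[duplicated(df$ID)] <- ""
df
```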
This is a possible approach with dplyr
Create data.frame:
ID <- c(1,1,2,2,3,4,4,4)
Class <- c("a","b","a", "c", "b", "a", "b", "c")
df <- data.frame(ID,Class)
And then:
library(dplyr)
# %>% replaces the %.% operator, which has been removed from dplyr
df <- df %>%
group_by(ID) %>% #group by ID
mutate(count = 1:n()) %>% #count occurrence per ID
mutate(Class.aggr = paste(Class,collapse=",")) #paste the "Class" objects in a new column
df$Class.aggr[df$count>1] <- "" #delete information from other rows
df$count <- NULL #delete column with counts
#> df
# ID Class Class.aggr
#1 1 a a,b
#2 1 b
#3 2 a a,c
#4 2 c
#5 3 b b
#6 4 a a,b,c
#7 4 b
#8 4 c

R assign values in a column of a dataframe based on an identification variable [duplicate]

This question already has answers here:
How to fill NAs with LOCF by factors in data frame, split by country
(8 answers)
Closed 5 years ago.
EDITED
I have a dataframe where the identification variable contains duplicates. How can I create a new variable (VAR2) in which the NAs are filled in based on this identification variable?
df <- data.frame(
ID = c(1,2,3,4,4,4,7,8,9,10),
VAR1 = c("a","b","c","d",NA,NA,"g","h","i","j")
)
The dataframe looks like this :
ID VAR1
1 a
2 b
3 c
4 d
4 NA
4 NA
7 g
8 h
9 i
10 j
The expected output
ID VAR1
1 a
2 b
3 c
4 d
4 d
4 d
7 g
8 h
9 i
10 j
require(data.table)
df <- fread('ID VAR1 VAR2
1 a a
2 b b
3 c c
4 d d
4 NA d
4 NA d
7 g g
8 h h
9 i i
10 j j')[,-'VAR1']
df
df[, VAR1 := replace(VAR2, seq_len(.N) > 1, NA), by = ID]
df
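The expected output in the question can also be reached in base R. This is a minimal sketch that fills each NA with the first non-missing VAR1 value of the same ID, which is a simplification of a true last-observation-carried-forward fill; the helper name fill_first is hypothetical:

```r
# Data from the question, with VAR1 kept as character
df <- data.frame(ID = c(1, 2, 3, 4, 4, 4, 7, 8, 9, 10),
                 VAR1 = c("a", "b", "c", "d", NA, NA, "g", "h", "i", "j"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: replace NAs with the first non-NA value of the group
fill_first <- function(x) {
  x[is.na(x)] <- x[!is.na(x)][1]
  x
}

# Apply the helper within each ID
df$VAR2 <- ave(df$VAR1, df$ID, FUN = fill_first)
df
```

If a group can contain several distinct non-missing values, a proper LOCF function, such as those in the linked duplicate, would be the safer choice.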
