melt data frame and split values - r

I have the following data frame with measurements concatenated into a single column, separated by some delimiter:
df <- data.frame(v1=c(1,2), v2=c("a;b;c", "d;e;f"))
df
v1 v2
1 1 a;b;c
2 2 d;e;f
I would like to melt/transform it into the following format:
v1 v2
1 1 a
2 1 b
3 1 c
4 2 d
5 2 e
6 2 f
Is there an elegant solution?
Thx!

You can split the strings with strsplit.
Split the strings in the second column:
splitted <- strsplit(as.character(df$v2), ";")
Create a new data frame:
data.frame(v1 = rep.int(df$v1, sapply(splitted, length)), v2 = unlist(splitted))
The result:
v1 v2
1 1 a
2 1 b
3 1 c
4 2 d
5 2 e
6 2 f
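The two steps above can be wrapped into a small reusable function. This is only a sketch in base R: the name split_rows and its arguments are made up for illustration, and it assumes the delimiter is a literal string rather than a regular expression.

```r
# Hypothetical helper (not from any package): expand a delimited column
# into one row per value, repeating the id column as needed.
split_rows <- function(data, id_col, value_col, sep = ";") {
  # fixed = TRUE treats sep as a literal string, not a regex
  parts <- strsplit(as.character(data[[value_col]]), sep, fixed = TRUE)
  out <- data.frame(
    rep(data[[id_col]], lengths(parts)),  # repeat each id once per piece
    unlist(parts),
    stringsAsFactors = FALSE
  )
  names(out) <- c(id_col, value_col)
  out
}

df <- data.frame(v1 = c(1, 2), v2 = c("a;b;c", "d;e;f"))
long <- split_rows(df, "v1", "v2")
long
```

lengths() does the same job as sapply(splitted, length) but is a little shorter.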

Related

R function to replace tricky merge in Excel (vlookup + hlookup)

I have a tricky merge that I usually do in Excel via various formulas and I want to automate with R.
I have 2 dataframes, one called inputs looks like this:
id v1 v2 v3
1 A A C
2 B D F
3 T T A
4 A F C
5 F F F
And another called df
id v
1 1
1 2
1 3
2 2
3 1
I would like to combine them based on the id and v values such that I get
id v key
1 1 A
1 2 A
1 3 C
2 2 D
3 1 T
So I'm matching on id and then on one of the columns v1 through v3: in the first example, I match id = 1 and column v1, since the value of v equals 1. In Excel I do this by creatively combining VLOOKUP and HLOOKUP, but I want to make this simpler in R. The example data frames are simplified versions, as I have more records and the columns go from v1 up to v50.
Thanks!
You could use pivot_longer:
library(tidyr)
library(dplyr)
key %>% pivot_longer(!id,names_prefix='v',names_to = 'v') %>%
mutate(v=as.numeric(v)) %>%
inner_join(df)
Joining, by = c("id", "v")
# A tibble: 5 × 3
id v value
<int> <dbl> <chr>
1 1 1 A
2 1 2 A
3 1 3 C
4 2 2 D
5 3 1 T
Data:
key <- read.table(text="
id v1 v2 v3
1 A A C
2 B D F
3 T T A
4 A F C
5 F F F",header=T)
df <- read.table(text="
id v
1 1
1 2
1 3
2 2
3 1 ",header=T)
You can use two-column matrices as index arguments to "[", so this is a one-liner. (Note that the names of the data objects are d1 and d2. I'm opposed to using df as a data object name.)
d1[-1][ data.matrix(d2)] # returns [1] "A" "A" "C" "D" "T"
So full solution is:
cbind( d2, key= d1[-1][ data.matrix(d2)] )
id v key
1 1 1 A
2 1 2 A
3 1 3 C
4 2 2 D
5 3 1 T
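The one-liner above relies on id being equal to the row number of d1. Here is a sketch of the same matrix-indexing idea that looks the row up with match() instead. It assumes the ids in d1 are unique and that v is always a valid column position; the names lookup and idx are just for illustration.

```r
d1 <- data.frame(id = 1:5,
                 v1 = c("A", "B", "T", "A", "F"),
                 v2 = c("A", "D", "T", "F", "F"),
                 v3 = c("C", "F", "A", "C", "F"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(id = c(1, 1, 1, 2, 3), v = c(1, 2, 3, 2, 1))

# Character matrix holding only the lookup columns v1..v3
lookup <- as.matrix(d1[-1])

# Two-column index matrix: first column = row position, second = column position.
# match() finds the row of each id, so ids need not equal row numbers
# (assumption: ids are unique in d1).
idx <- cbind(match(d2$id, d1$id), d2$v)

# Indexing a matrix with a two-column matrix picks one element per index row
d2$key <- lookup[idx]
d2
```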
Try this:
x <- "
id v1 v2 v3
1 A A C
2 B D F
3 T T A
4 A F C
5 F F F
"
y <- "
id v
1 1
1 2
1 3
2 2
3 1
"
df <- read.table(textConnection(x) , header = TRUE)
df2 <- read.table(textConnection(y) , header = TRUE)
key <- c()
for (i in 1:nrow(df2)) {
key <- append(df[df2$id[i],(df2$v[i] + 1L)] , key)
}
df2$key <- rev(key)
df2
#> id v key
#> 1 1 1 A
#> 2 1 2 A
#> 3 1 3 C
#> 4 2 2 D
#> 5 3 1 T
Created on 2022-06-06 by the reprex package (v2.0.1)

R: How to replace a set of values in a dataframe [duplicate]

Complicated title but here is a simple example of what I am trying to achieve:
d <- data.frame(v1 = c(1,2,3,4,5,6,7,8),
v2 = c("A","E","C","B","B","C","A","E"))
m <- data.frame(v3 = c("D","E","A","C","D","B"),
v4 = c("d","e","a","c","d","b"))
Values in d$v2 should be replaced by values in m$v4 by matching the values from d$v2 in m$v3
The resulting data frame d should look like:
v1 v4
1 a
2 e
3 c
4 b
5 b
6 c
7 a
8 e
I tried different stuff and the closest I came was: d$v2 <- m$v4[which(m$v3 %in% d$v2)]
I try to avoid any for-loops again! Must be possible :-) somehow... ;)
You could try:
merge(d,m, by.x="v2", by.y="v3")
v2 v1 v4
1 A 1 a
2 A 7 a
3 B 4 b
4 B 5 b
5 C 3 c
6 C 6 c
7 E 2 e
8 E 8 e
Edit
Here is another approach, to preserve the order:
data.frame(v1=d$v1, v4=m[match(d$v2, m$v3), 2])
v1 v4
1 1 a
2 2 e
3 3 c
4 4 b
5 5 b
6 6 c
7 7 a
8 8 e
You could use a standard left join.
Loading the data:
d <- data.frame(v1 = c(1,2,3,4,5,6,7,8), v2 = c("A","E","C","B","B","C","A","E"), stringsAsFactors=FALSE)
m <- data.frame(v3 = c("D","E","A","C","D","B"), v4 = c("d","e","a","c","d","b"), stringsAsFactors=FALSE)
Changing the column names so that I can join by column "v2":
colnames(m) <- c("v2", "v4")
Left joining and maintaining the order of data.frame d
library(dplyr)
left_join(d, m)
Output:
v1 v2 v4
1 1 A a
2 2 E e
3 3 C c
4 4 B b
5 5 B b
6 6 C c
7 7 A a
8 8 E e
This will give you the desired output:
d$v2 <- m$v4[match(d$v2, m$v3)]
For each value in d$v2, match() returns the position of its first occurrence in m$v3. Subsetting m$v4 with those indices then gives the replacement values, which are assigned back to column v2 of data frame d.
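To make the intermediate step visible, here is the same lookup broken apart, using the data from the question. The extra ifelse() is my own addition, not part of the answer above: it keeps the original value wherever match() finds nothing and returns NA.

```r
d <- data.frame(v1 = 1:8,
                v2 = c("A", "E", "C", "B", "B", "C", "A", "E"),
                stringsAsFactors = FALSE)
m <- data.frame(v3 = c("D", "E", "A", "C", "D", "B"),
                v4 = c("d", "e", "a", "c", "d", "b"),
                stringsAsFactors = FALSE)

# Position in m$v3 of each value in d$v2 (first match only, NA if absent)
idx <- match(d$v2, m$v3)
idx
# [1] 3 2 4 6 6 4 3 2

# Look up the replacements; fall back to the original value on no match
# (this fallback is an assumption about the desired behaviour)
d$v2 <- ifelse(is.na(idx), d$v2, m$v4[idx])
d
```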

R counting strings variables in each row of a dataframe

I have a dataframe that looks something like this, where each row represents a sample and contains repeats of the same strings
> df
V1 V2 V3 V4 V5
1 a a d d b
2 c a b d a
3 d b a a b
4 d d a b c
5 c a d c c
I want to create a new dataframe where, ideally, the headers would be the string values from the previous dataframe (a, b, c, d) and each row would contain the number of occurrences of the respective value in the corresponding row of the original dataframe. Using the example from above, this would look like
> df2
a b c d
1 2 1 0 2
2 2 1 1 1
3 2 2 0 1
4 1 1 1 2
5 1 0 3 1
In my actual dataset, there are hundreds of variables, and thousands of samples, so it'd be ideal if I could automatically pull out the names from the original dataframe, and alphabetize them into the headers for the new dataframe.
You may try
library(qdapTools)
mtabulate(as.data.frame(t(df)))
Or
mtabulate(split(as.matrix(df), row(df)))
Or using base R
Un1 <- sort(unique(unlist(df)))
t(apply(df ,1, function(x) table(factor(x, levels=Un1))))
You can stack the columns and then use table:
table(cbind(id = 1:nrow(mydf),
stack(lapply(mydf, as.character)))[c("id", "values")])
# values
# id a b c d
# 1 2 1 0 2
# 2 2 1 1 1
# 3 2 2 0 1
# 4 1 1 1 2
# 5 1 0 3 1
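For comparison, here is a self-contained base R sketch of the same row-wise count, built on the example data from the question. It compares the whole matrix against each distinct string and sums per row; the names mat, lev and counts are only illustrative, and I have not checked how it performs on very large data.

```r
df <- data.frame(V1 = c("a", "c", "d", "d", "c"),
                 V2 = c("a", "a", "b", "d", "a"),
                 V3 = c("d", "b", "a", "a", "d"),
                 V4 = c("d", "d", "a", "b", "c"),
                 V5 = c("b", "a", "b", "c", "c"),
                 stringsAsFactors = FALSE)

mat <- as.matrix(df)

# All distinct strings, alphabetized; these become the column headers
lev <- sort(unique(as.vector(mat)))

# For each string, count how often it appears in every row
counts <- sapply(lev, function(s) rowSums(mat == s))

df2 <- as.data.frame(counts)
df2
```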

Summarize data frame based on condition

I have this kind of dataset (ID, V1, V2 are the 3 variables of my data frame):
ID V1 V2
1 A 10
1 B 5
1 D 1
2 C 9
2 E 8
I would like a new data frame with, for each ID, the row that has the maximum value of V2. For the example, the result would be:
ID V1 V2
1 A 10
2 C 9
Use ddply from the plyr package (assuming the data frame is named sample):
library(plyr)
ddply(sample,.(ID),summarize,V1=V1[which.max(V2)],V2=max(V2))
ID V1 V2
1 1 A 10
2 2 C 9
This is sort of clumsy code, but it works....
> mydf[with(mydf, ave(V2, ID, FUN = function(x) x == max(x))) == 1, ]
ID V1 V2
1 1 A 10
4 2 C 9
Less clumsy:
do.call(rbind,
by(mydf, mydf$ID,
FUN = function(x) x[which.max(x$V2), ]))
# ID V1 V2
# 1 1 A 10
# 2 2 C 9
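Another base R option, sketched here with the question's data, is to sort by ID and descending V2 and then keep the first row of each ID. Note this is a different technique from the answers above, and if two rows tie for the maximum it keeps only the one that comes first after sorting.

```r
mydf <- data.frame(ID = c(1, 1, 1, 2, 2),
                   V1 = c("A", "B", "D", "C", "E"),
                   V2 = c(10, 5, 1, 9, 8),
                   stringsAsFactors = FALSE)

# Sort so that the largest V2 comes first within each ID
sorted <- mydf[order(mydf$ID, -mydf$V2), ]

# duplicated() flags every repeat of an ID, so negating it keeps the first row per ID
top <- sorted[!duplicated(sorted$ID), ]
top
```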
