How to append columns in R?

Consider the following data named mydata. My intention is to put v1 and v2 in the same column by adding an identifier variable v4.
id v1 v2
1 2 3
2 4 5
3 7 8
OUTPUT required:
id v3 v4
1 2 1
2 4 1
3 7 1
1 3 2
2 5 2
3 8 2
Any help is much appreciated!

I think you are looking for something like dplyr::mutate() for adding columns and rbind() for stacking two data frames on top of each other.
library(dplyr)
mydata <- data.frame(id = c(1, 2, 3),
                     v1 = c(2, 4, 7),
                     v2 = c(3, 5, 8))
# keep id, rename v1 to v3, and tag these rows with v4 = 1
a <- mydata %>% select(id, v3 = v1) %>% mutate(v4 = 1)
# same for v2, tagged with v4 = 2
b <- mydata %>% select(id, v3 = v2) %>% mutate(v4 = 2)
> rbind(a,b)
id v3 v4
1 1 2 1
2 2 4 1
3 3 7 1
4 1 3 2
5 2 5 2
6 3 8 2

What about this:
mydata <- data.frame(c(1,2,3),c(2,4,7),c(3,5,8))
colnames(mydata) <- c("id","v1","v2")
mydata_2 <- rbind(mydata[,c(1,2)], setNames(mydata[,c(1,3)], names(mydata[,c(1,2)])))
mydata_2$v4 <- c(rep(1,length(mydata$v1)),rep(2,length(mydata$v2)))
colnames(mydata_2) <- c("id","v3","v4")

A data.table option:
library(data.table)
# df holds the question's data (mydata)
setcolorder(
  setnames(melt(setDT(df), id.vars = "id", variable.name = "v4"), "value", "v3")[
    , v4 := as.numeric(factor(v4))
  ], c("id", "v3", "v4")
)[]
id v3 v4
1: 1 2 1
2: 2 4 1
3: 3 7 1
4: 1 3 2
5: 2 5 2
6: 3 8 2
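
For comparison, the same stacking can be sketched in base R without any packages. This is only an illustration: the names value_cols and long are made up here, and it assumes the value columns are v1 and v2 as in the question.

```r
mydata <- data.frame(id = c(1, 2, 3),
                     v1 = c(2, 4, 7),
                     v2 = c(3, 5, 8))

# columns to stack on top of each other (assumed names from the question)
value_cols <- c("v1", "v2")

long <- data.frame(
  # repeat the ids once per stacked column
  id = rep(mydata$id, times = length(value_cols)),
  # unlist() concatenates the columns in order: all of v1, then all of v2
  v3 = unlist(mydata[value_cols], use.names = FALSE),
  # identifier: 1 for rows that came from v1, 2 for rows from v2
  v4 = rep(seq_along(value_cols), each = nrow(mydata))
)
```

Adding a third column name to value_cols would extend the same pattern, with v4 running from 1 to 3.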


How to find duplicated values in two columns between two dataframes and remove non-duplicates in R?

So let's say I have two dataframes that look like this
df1 <- data.frame(ID = c("A","B","F","G","B","B","A","G","G","F","A","A","A","B","F"),
code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1))
df2 <- data.frame(ID = c("G","F","C","F","B","A","F","C","A","B","A","B","C","A","G"),
code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1))
I want to check the duplicates in df1$ID and df2$ID and remove all the rows from df2 if the IDs are not present in df1 so the new dataframe would look like this:
df3 <- data.frame(ID = c("G","F","F","B","A","F","A","B","A","B","A","G"),
code = c(1,2,3,3,1,2,1,1,3,2,1,1),
class = c(2,4,5,2,3,2,1,2,4,5,2,1))
With %in%:
df2[df2$ID %in% df1$ID, ]
ID code class
1 G 1 2
2 F 2 4
4 F 3 5
5 B 3 2
6 A 1 3
7 F 2 2
9 A 1 1
10 B 1 2
11 A 3 4
12 B 2 5
14 A 1 2
15 G 1 1
You can use the 'intersect' function to tackle the issue.
common_ids <- intersect(df1$ID, df2$ID)
df3 <- df2[df2$ID %in% common_ids, ]
ID code class
1 G 1 2
2 F 2 4
4 F 3 5
5 B 3 2
6 A 1 3
7 F 2 2
9 A 1 1
10 B 1 2
11 A 3 4
12 B 2 5
14 A 1 2
15 G 1 1
I want to throw dplyr's semi_join() in.
library(dplyr)
df_test <- df2 |> semi_join(df1, by = "ID")
all.equal(df3, df_test)
#> [1] TRUE
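
One detail that is easy to miss: subsetting with %in% keeps the original row names, which is why the printed results above are numbered 1, 2, 4, 5, ... rather than 1 to 12. A small sketch on made-up data (reference_ids and target are hypothetical stand-ins for df1$ID and df2) shows which IDs are dropped and how the row names could be reset if a comparison against a freshly built data frame is planned:

```r
# stand-in for the IDs present in the reference data frame (df1$ID)
reference_ids <- c("A", "B", "F", "G")
# stand-in for the data frame to be filtered (df2)
target <- data.frame(ID = c("G", "F", "C", "F", "B"),
                     code = c(1, 2, 2, 3, 3),
                     stringsAsFactors = FALSE)

# IDs that occur in target but not in the reference
dropped_ids <- setdiff(target$ID, reference_ids)

# keep only rows whose ID occurs in the reference
kept <- target[target$ID %in% reference_ids, ]
original_rows <- rownames(kept)  # row names still refer to positions in target

# renumber the rows from 1
rownames(kept) <- NULL
```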

Reformatting input data

I have data with two columns (an edge file) representing vertex ids and their connections as
I need to reformat it, i.e. vertex ids need to be consecutive and start at one, like this
Can anyone suggest how to do it automatically?
I would also need a conversion table with both the original and the new ids.
Your support is appreciated.
Here is another approach which uses factor() for renumbering:
# reshape from wide to long format using row numbers
tmp <- melt(setDT(DT)[, rn := .I], "rn", value.name = "old")[
# create new ids from factor levels
, new := as.integer(factor(old))][]
# reshape back to wide format again
dcast(tmp, rn ~ variable, value.var = "new")[, -"rn"]
v1 v2
1: 1 2
2: 1 4
3: 1 6
4: 2 3
5: 2 4
6: 2 6
7: 4 5
8: 4 6
9: 5 6
10: 6 7
The translation table can be created by
tmp[, unique(.SD), .SDcols = c("old", "new")]
old new
1: 23732 1
2: 23778 2
3: 23871 4
4: 58009 5
5: 58098 6
6: 23824 3
7: 58256 7
In order to reproduce exactly OP's new id numbering we need to rearrange factor levels using the fct_inorder() function from the forcats package:
tmp <- melt(DT[, rn := .I], "rn", value.name = "old")[
order(rn, variable), new := as.integer(forcats::fct_inorder(factor(old)))][]
dcast(tmp, rn ~ variable, value.var = "new")[, -"rn"]
v1 v2
1: 1 2
2: 1 3
3: 1 4
4: 2 5
5: 2 3
6: 2 4
7: 3 6
8: 3 4
9: 6 4
10: 4 7
Then, the translation becomes
old new
1: 23732 1
2: 23778 2
3: 23871 3
4: 58009 6
5: 58098 4
6: 23824 5
7: 58256 7
DT <- fread(
This isn't quite what you asked for, as I sorted the node names before assigning IDs.
What I chose to do is get all of the unique node IDs, sort them, and assign them each to an integer.
df <- structure(list(v1 = c(23732L, 23732L, 23732L, 23778L, 23778L,
23778L, 23871L, 23871L, 58009L, 58098L), v2 = c(23778L, 23871L,
58098L, 23824L, 23871L, 58098L, 58009L, 58098L, 58098L, 58256L
)), .Names = c("v1", "v2"), class = "data.frame", row.names = c(NA, -10L))
# Put nodes in ascending order
df <- df[order(df$v1, df$v2), ]
# create a mapping of node number to node ID (as a vector)
# All unique nodes between the two columns, sorted
node_names <- sort(unique(c(df$v1, df$v2)))
# a vector of integers from 1 to length(node_names)
node_id <- seq_along(node_names)
# assign (map) the node names to the integer values
names(node_id) <- node_names
# Add the node IDs to df
df$v1_id <- node_id[as.character(df$v1)]
df$v2_id <- node_id[as.character(df$v2)]
v1 v2 v1_id v2_id
1 23732 23778 1 2
2 23732 23871 1 4
3 23732 58098 1 6
4 23778 23824 2 3
5 23778 23871 2 4
6 23778 58098 2 6
7 23871 58009 4 5
8 23871 58098 4 6
9 58009 58098 5 6
10 58098 58256 6 7
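
If the ids should be numbered in order of first appearance (reading the edge list row by row) rather than in sorted order, a base R sketch with unique() and match() might look like the following. The names edges, seen, lookup and renumbered are made up for this illustration; the data are the ten edges used above.

```r
edges <- data.frame(
  v1 = c(23732L, 23732L, 23732L, 23778L, 23778L,
         23778L, 23871L, 23871L, 58009L, 58098L),
  v2 = c(23778L, 23871L, 58098L, 23824L, 23871L,
         58098L, 58009L, 58098L, 58098L, 58256L)
)

# transpose so that as.vector() reads row 1 (v1, v2), then row 2, and so on
seen <- unique(as.vector(t(as.matrix(edges))))

# conversion table: original id and its new consecutive id
lookup <- data.frame(old = seen, new = seq_along(seen))

# replace each original id by its position in the first-appearance order
renumbered <- data.frame(v1 = match(edges$v1, seen),
                         v2 = match(edges$v2, seen))
```

Replacing the definition of seen with sort(unique(c(edges$v1, edges$v2))) would give the sorted numbering instead.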

Removing character from dataframe

I have this simple code, which generates a data frame. I want to remove the V character from the middle column. Is there any simple way to do that?
Here is some test code (the actual code is very long) that is very similar to the real thing.
mat1=matrix(c(1,2,3,4,5,"V1","V2","V3","V4","V5",1,2,3,4,5), ncol=3)
This is the data frame:
x row y
1 1 V1 1
2 2 V2 2
3 3 V3 3
4 4 V4 4
5 5 V5 5
I just want to remove the V's like this:
x row y
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
We can use str_replace() from stringr:
library(stringr)
mat$row <- str_replace(mat$row, "V", "")
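
A base R sketch of the same idea, in case stringr is not available. The question does not show how the data frame is built, so the construction of mat below is an assumption made only so the example runs on its own.

```r
# assumed reconstruction of the data frame shown in the question
mat <- data.frame(x = 1:5,
                  row = paste0("V", 1:5),
                  y = 1:5,
                  stringsAsFactors = FALSE)

# drop a leading "V" and convert the remainder to integer
mat$row <- as.integer(sub("^V", "", mat$row))
```

Anchoring the pattern with ^ removes only a leading V; gsub("V", "", ...) would remove every V in the string.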

The which command is returning an error, what is an alternative?

I have 2 data frames
D1 = V1 V2 V3 V4
1 2 3 4
2 3 4 5
3 5 4 2
D2 = V1 V2 V3
1 2 3
3 5 4
I am trying to match the two data frames and extract the index of the row of D2 that matches a row of D1 using which(), but I get the error
Error in : ‘==’ only defined for equally-sized data frames
(If I write the comparison out column by column, as in
which(D2[,1]==D1[3,1] & D2[,2]==D1[3,2] & D2[,3]==D1[3,3])
there is no problem, but I want to generalise it.)
Please suggest some alternative.
This does the trick:
which(apply(D2, 1, function(x) all(D1[3,1:3] == x)))
[1] 2
D1 <- read.table(text="V1 V2 V3 V4
1 2 3 4
2 3 4 5
3 5 4 2", header = TRUE)
D2 <- read.table(text="V1 V2 V3
1 2 3
3 5 4", header = TRUE)
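
To look up every row of D1 at once instead of only row 3, one possible sketch is to paste each row into a single key and use match(). The helper row_key is made up for this illustration, and the approach assumes the values print identically in both data frames (which holds for these small integers).

```r
D1 <- data.frame(V1 = c(1, 2, 3), V2 = c(2, 3, 5),
                 V3 = c(3, 4, 4), V4 = c(4, 5, 2))
D2 <- data.frame(V1 = c(1, 3), V2 = c(2, 5), V3 = c(3, 4))

# hypothetical helper: collapse each row into one string key
row_key <- function(d) do.call(paste, c(d, sep = "\r"))

# for each row of D1 (first three columns), the matching row index in D2, or NA
idx <- match(row_key(D1[, 1:3]), row_key(D2))
```

Here idx[3] is the index of the D2 row that matches the third row of D1, and NA marks D1 rows with no match.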

melt data frame and split values

I have the following data frame with measurements concatenated into a single column, separated by some delimiter:
df <- data.frame(v1=c(1,2), v2=c("a;b;c", "d;e;f"))
v1 v2
1 1 a;b;c
2 2 d;e;f
I would like to melt/transform it into the following format:
v1 v2
1 1 a
2 1 b
3 1 c
4 2 d
5 2 e
6 2 f
Is there an elegant solution?
You can split the strings with strsplit.
Split the strings in the second column:
splitted <- strsplit(as.character(df$v2), ";")
Create a new data frame:
data.frame(v1 = rep(df$v1, sapply(splitted, length)), v2 = unlist(splitted))
The result:
v1 v2
1 1 a
2 1 b
3 1 c
4 2 d
5 2 e
6 2 f
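
The same approach also copes with rows that hold different numbers of values, because each v1 is repeated by the length of its own split. A minimal sketch, using a made-up second row with four values and lengths() as a shorthand for sapply(x, length):

```r
# example data where the rows contain 3 and 4 values respectively
df <- data.frame(v1 = c(1, 2),
                 v2 = c("a;b;c", "d;e;f;g"),
                 stringsAsFactors = FALSE)

# split on the literal delimiter (fixed = TRUE avoids regex interpretation)
parts <- strsplit(df$v2, ";", fixed = TRUE)

# repeat each v1 once per value found in its row
long <- data.frame(v1 = rep(df$v1, lengths(parts)),
                   v2 = unlist(parts),
                   stringsAsFactors = FALSE)
```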
