How to append columns in R?

Consider the following data named mydata. My intention is to put v1 and v2 in the same column by adding an identifier variable v4.
id v1 v2
1 2 3
2 4 5
3 7 8
OUTPUT required:
id v3 v4
1 2 1
2 4 1
3 7 1
1 3 2
2 5 2
3 8 2
Any help is much appreciated!

I think you are looking for something like dplyr::mutate() for adding columns, and rbind() for stacking two data frames on top of each other.
library(dplyr)
mydata <- data.frame(id = c(1,2,3),
v1 = c(2,4,7),
v2 = c(3,5,8))
a<- data.frame(mydata$id, mydata$v1)%>%
mutate(v4=1)%>%
rename(v3=mydata.v1, id=mydata.id )
b<- data.frame(mydata$id, mydata$v2)%>%
mutate(v4=2)%>%
rename(v3=mydata.v2, id=mydata.id )
rbind(a,b)
id v3 v4
1 1 2 1
2 2 4 1
3 3 7 1
4 1 3 2
5 2 5 2
6 3 8 2

What about this:
mydata <- data.frame(c(1,2,3),c(2,4,7),c(3,5,8))
colnames(mydata) <- c("id","v1","v2")
mydata_2 <- rbind(mydata[,c(1,2)], setNames(mydata[,c(1,3)], names(mydata[,c(1,2)])))
mydata_2$v4 <- c(rep(1,length(mydata$v1)),rep(2,length(mydata$v2)))
colnames(mydata_2) <- c("id","v3","v4")
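The same stacking can be written once for any number of value columns. The following is a minimal base R sketch, not taken from the answers above; the names value_cols and long are made up for illustration.

```r
mydata <- data.frame(id = c(1, 2, 3), v1 = c(2, 4, 7), v2 = c(3, 5, 8))

# Columns to stack on top of each other (assumed here to be v1 and v2)
value_cols <- c("v1", "v2")

long <- data.frame(
  # repeat the ids once per stacked column
  id = rep(mydata$id, times = length(value_cols)),
  # concatenate the value columns in the order given in value_cols
  v3 = unlist(mydata[value_cols], use.names = FALSE),
  # 1 for rows that came from v1, 2 for rows that came from v2
  v4 = rep(seq_along(value_cols), each = nrow(mydata))
)
long
```

Adding a third column name to value_cols would stack it as well, with v4 equal to 3 for those rows.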

A data.table option (df is the data frame from the question, called mydata there)
library(data.table)
setcolorder(
transform(
setnames(melt(setDT(df), id.vars = "id", variable.name = "v4"), "value", "v3"),
v4 = as.numeric(factor(v4))
), c("id", "v3", "v4")
)[]
gives
id v3 v4
1: 1 2 1
2: 2 4 1
3: 3 7 1
4: 1 3 2
5: 2 5 2
6: 3 8 2

Related

How to find duplicated values in two columns between two dataframes and remove non-duplicates in R?

So let's say I have two dataframes that look like this
df1 <- data.frame(ID = c("A","B","F","G","B","B","A","G","G","F","A","A","A","B","F"),
code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1))
df2 <- data.frame(ID = c("G","F","C","F","B","A","F","C","A","B","A","B","C","A","G"),
code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1))
I want to check the duplicates in df1$ID and df2$ID and remove all the rows from df2 if the IDs are not present in df1 so the new dataframe would look like this:
df3 <- data.frame(ID = c("G","F","F","B","A","F","A","B","A","B","A","G"),
code = c(1,2,3,3,1,2,1,1,3,2,1,1),
class = c(2,4,5,2,3,2,1,2,4,5,2,1))
With %in%:
df2[df2$ID %in% df1$ID, ]
ID code class
1 G 1 2
2 F 2 4
4 F 3 5
5 B 3 2
6 A 1 3
7 F 2 2
9 A 1 1
10 B 1 2
11 A 3 4
12 B 2 5
14 A 1 2
15 G 1 1
You can use the 'intersect' function to tackle the issue.
common_ids <- intersect(df1$ID, df2$ID)
df3 <- df2[df2$ID %in% common_ids, ]
ID code class
1 G 1 2
2 F 2 4
4 F 3 5
5 B 3 2
6 A 1 3
7 F 2 2
9 A 1 1
10 B 1 2
11 A 3 4
12 B 2 5
14 A 1 2
15 G 1 1
Another option is semi_join() from dplyr, which keeps only the rows of df2 that have a matching ID in df1:
library(tidyverse)
df_test <- df2 |> semi_join(df1, by = "ID")
all.equal(df3, df_test)
#> [1] TRUE
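To double-check what gets filtered out, it can help to look at the complement as well. This is a small base R sketch using df1 and df2 from the question; dropped_ids, dropped_rows and kept_rows are illustrative names, not something the answers above define.

```r
df1 <- data.frame(ID = c("A","B","F","G","B","B","A","G","G","F","A","A","A","B","F"),
                  code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                  class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1))
df2 <- data.frame(ID = c("G","F","C","F","B","A","F","C","A","B","A","B","C","A","G"),
                  code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                  class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1))

# IDs that occur in df2 but never in df1
dropped_ids <- setdiff(df2$ID, df1$ID)

# Rows that the %in% filter keeps, and rows that it removes
kept_rows <- df2[df2$ID %in% df1$ID, ]
dropped_rows <- df2[!df2$ID %in% df1$ID, ]

# Every row of df2 ends up in exactly one of the two groups
nrow(kept_rows) + nrow(dropped_rows) == nrow(df2)
```

With the example data, only ID C is missing from df1, so three rows are dropped and twelve remain.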

Reformatting input data

I have data with two columns (an edge file) representing vertex ids and their connections:
v1,v2
23732,23778
23732,23871
23732,58098
23778,23824
23778,23871
23778,58098
23871,58009
23871,58098
58009,58098
58098,58256
I need to reformat it so that the vertex ids are consecutive and start at one, like this:
v1,v2
1,2
1,3
1,4
2,5
2,3
2,4
3,5
3,4
5,4
4,6
Can anyone suggest how to do it automatically?
Also, I would need a conversion table with both the original and the new ids.
Your support is appreciated.
Here is another approach which uses factor() for renumbering:
library(data.table)
# reshape from wide to long format using row numbers
tmp <- melt(setDT(DT)[, rn := .I], "rn", value.name = "old")[
# create new ids from factor levels
, new := as.integer(factor(old))][]
# reshape back to wide format again
dcast(tmp, rn ~ variable, value.var = "new")[, -"rn"]
v1 v2
1: 1 2
2: 1 4
3: 1 6
4: 2 3
5: 2 4
6: 2 6
7: 4 5
8: 4 6
9: 5 6
10: 6 7
The translation table can be created by
tmp[, unique(.SD), .SDcols = c("old", "new")]
old new
1: 23732 1
2: 23778 2
3: 23871 4
4: 58009 5
5: 58098 6
6: 23824 3
7: 58256 7
To number the ids in order of first appearance, as in OP's expected output, we need to rearrange the factor levels using the fct_inorder() function from the forcats package:
tmp <- melt(DT[, rn := .I], "rn", value.name = "old")[
order(rn, variable), new := as.integer(forcats::fct_inorder(factor(old)))][]
dcast(tmp, rn ~ variable, value.var = "new")[, -"rn"]
v1 v2
1: 1 2
2: 1 3
3: 1 4
4: 2 5
5: 2 3
6: 2 4
7: 3 6
8: 3 4
9: 6 4
10: 4 7
Then, the translation becomes
old new
1: 23732 1
2: 23778 2
3: 23871 3
4: 58009 6
5: 58098 4
6: 23824 5
7: 58256 7
Data
library(data.table)
DT <- fread(
"v1,v2
23732,23778
23732,23871
23732,58098
23778,23824
23778,23871
23778,58098
23871,58009
23871,58098
58009,58098
58098,58256"
)
This isn't quite what you asked for, as I sorted the node names before assigning IDs.
What I chose to do is get all of the unique node IDs, sort them, and assign them each to an integer.
df <- structure(list(v1 = c(23732L, 23732L, 23732L, 23778L, 23778L,
23778L, 23871L, 23871L, 58009L, 58098L), v2 = c(23778L, 23871L,
58098L, 23824L, 23871L, 58098L, 58009L, 58098L, 58098L, 58256L
)), .Names = c("v1", "v2"), class = "data.frame", row.names = c(NA,
-10L))
# Put nodes in ascending order
df <- df[order(df$v1, df$v2), ]
# create a mapping of node number to node ID (as a vector)
# All unique nodes between the two columns, sorted
node_names <- sort(unique(c(df$v1, df$v2)))
# a vector of integers from 1 to length(node_names)
node_id <- seq_along(node_names)
# assign (map) the node names to the integer values
names(node_id) <- node_names
# Add the node IDs to df
df$v1_id <- node_id[as.character(df$v1)]
df$v2_id <- node_id[as.character(df$v2)]
df
v1 v2 v1_id v2_id
1 23732 23778 1 2
2 23732 23871 1 4
3 23732 58098 1 6
4 23778 23824 2 3
5 23778 23871 2 4
6 23778 58098 2 6
7 23871 58009 4 5
8 23871 58098 4 6
9 58009 58098 5 6
10 58098 58256 6 7
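Both numbering schemes discussed above can also be sketched in base R with match(). This is only an illustration under the assumption that the edge list fits in memory as a data frame; edges, old_ids, lookup and renumbered are names chosen for the example.

```r
edges <- data.frame(
  v1 = c(23732, 23732, 23732, 23778, 23778, 23778, 23871, 23871, 58009, 58098),
  v2 = c(23778, 23871, 58098, 23824, 23871, 58098, 58009, 58098, 58098, 58256)
)

# Read the ids row by row (v1, v2, v1, v2, ...) and keep first occurrences,
# so that new ids follow the order of first appearance
old_ids <- unique(as.vector(t(as.matrix(edges))))

# Conversion table: each original id next to its new consecutive id
lookup <- data.frame(old = old_ids, new = seq_along(old_ids))

# The position of an original id in old_ids is its new id
renumbered <- data.frame(
  v1 = match(edges$v1, old_ids),
  v2 = match(edges$v2, old_ids)
)
renumbered
lookup
```

Wrapping the unique() call in sort() would instead give ids in ascending order of the original values, as in the sorted approach.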

Removing character from dataframe

I have this simple code, which generates a data frame. I want to remove the V character from the middle column. Is there any simple way to do that?
Here is some test code (the actual code is very long), very similar to the actual code.
mat1=matrix(c(1,2,3,4,5,"V1","V2","V3","V4","V5",1,2,3,4,5), ncol=3)
mat=as.data.frame(mat1)
colnames(mat)=c("x","row","y")
mat
This is the data frame:
x row y
1 1 V1 1
2 2 V2 2
3 3 V3 3
4 4 V4 4
5 5 V5 5
I just want to remove the V's like this:
x row y
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
We can use str_replace from stringr
library(stringr)
mat$row <- str_replace(mat$row, "V", "")
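If an extra package is not wanted, base R's sub() should do the same job. A minimal sketch using the data frame from the question; the conversion to integer at the end is an optional extra step, not something the question asks for.

```r
mat1 <- matrix(c(1,2,3,4,5,"V1","V2","V3","V4","V5",1,2,3,4,5), ncol = 3)
mat <- as.data.frame(mat1)
colnames(mat) <- c("x", "row", "y")

# fixed = TRUE treats "V" as a literal string rather than a regular expression
mat$row <- sub("V", "", mat$row, fixed = TRUE)

# Optional: the column is still text at this point; convert it if numbers are needed
mat$row <- as.integer(mat$row)
mat
```

Like str_replace(), sub() only replaces the first match in each string; gsub() would replace every occurrence.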

The which command is returning an error, what is an alternative?

I have 2 data frames
D1 = V1 V2 V3 V4
1 2 3 4
2 3 4 5
3 5 4 2
D2 = V1 V2 V3
1 2 3
3 5 4
I am trying to find the index of the row of D2 that matches a given row of D1 using which, but I get this error:
which(D2[,1:3]==D1[3,1:3])
Error in Ops.data.frame : ‘==’ only defined for equally-sized data frames
(If I write the comparison out column by column, as in
which(D2[,1]==D1[3,1] & D2[,2]==D1[3,2] & D2[,3]==D1[3,3])
there is no problem, but I want to generalise it.)
Please suggest some alternative.
This does the trick:
which(apply(D2, 1, function(x) all(D1[3,1:3] == x)))
[1] 2
Data:
D1 <- read.table(text="V1 V2 V3 V4
1 2 3 4
2 3 4 5
3 5 4 2", header=TRUE)
D2 <- read.table(text="V1 V2 V3
1 2 3
3 5 4", header=TRUE)
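One way to generalise this further is to collapse each row into a single string key and compare the keys. The sketch below is an assumption-laden illustration rather than the answer's method: row_key is a hypothetical helper, and it relies on "|" never occurring inside the values.

```r
D1 <- read.table(text = "V1 V2 V3 V4
1 2 3 4
2 3 4 5
3 5 4 2", header = TRUE)
D2 <- read.table(text = "V1 V2 V3
1 2 3
3 5 4", header = TRUE)

# Hypothetical helper: paste the columns of each row into one string key
row_key <- function(d) do.call(paste, c(d, sep = "|"))

# Columns present in both data frames
shared <- names(D2)

# Index of the D2 row that equals row 3 of D1 on the shared columns
idx_in_D2 <- which(row_key(D2) == row_key(D1[3, shared]))

# For every row of D2, the index of the first matching row of D1 (NA if none)
idx_in_D1 <- match(row_key(D2), row_key(D1[shared]))

idx_in_D2
idx_in_D1
```

The match() form handles all rows at once, which avoids looping over the rows of D1 one at a time.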

melt data frame and split values

I have the following data frame with measurements concatenated into a single column, separated by some delimiter:
df <- data.frame(v1=c(1,2), v2=c("a;b;c", "d;e;f;g"))
df
v1 v2
1 1 a;b;c
2 2 d;e;f;g
I would like to melt/transform it into the following format:
v1 v2
1 1 a
2 1 b
3 1 c
4 2 d
5 2 e
6 2 f
7 2 g
Is there an elegant solution?
Thx!
You can split the strings with strsplit.
Split the strings in the second column:
splitted <- strsplit(as.character(df$v2), ";")
Create a new data frame:
data.frame(v1 = rep.int(df$v1, sapply(splitted, length)), v2 = unlist(splitted))
The result:
v1 v2
1 1 a
2 1 b
3 1 c
4 2 d
5 2 e
6 2 f
7 2 g
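When the data frame has more columns than just v1, repeating whole rows may be more convenient than rebuilding each column by hand. The following base R sketch wraps the strsplit idea in a helper; split_rows is a made-up name, and the example uses the four-element second row shown in the printed data.

```r
# Hypothetical helper: one output row per delimited element of column `col`
split_rows <- function(data, col, sep = ";") {
  # split every cell of the chosen column into its pieces
  parts <- strsplit(as.character(data[[col]]), sep, fixed = TRUE)
  # repeat each original row once per piece it contains
  out <- data[rep(seq_len(nrow(data)), lengths(parts)), , drop = FALSE]
  # overwrite the chosen column with the individual pieces
  out[[col]] <- unlist(parts)
  rownames(out) <- NULL
  out
}

df <- data.frame(v1 = c(1, 2), v2 = c("a;b;c", "d;e;f;g"))
result <- split_rows(df, "v2")
result
```

All other columns are carried along unchanged, so the helper does not need to know their names.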
