I have an R data frame that looks like this:
z = as.data.frame(list(Col1=c("a","c","e","g"),Col2=c("b","d","f","h"),Col3=c("1,2,5","3,5,7","9,8","1")))
> z
Col1 Col2 Col3
1 a b 1,2,5
2 c d 3,5,7
3 e f 9,8
4 g h 1
(The third column is a text column with comma-separated values.) I would like to convert it to a data frame like this:
a b 1
a b 2
a b 5
c d 3
c d 5
c d 7
e f 9
e f 8
g h 1
Can anyone suggest a way to accomplish this using apply? I'm close using the command below but it's not quite right. Any suggestions on more efficient ways to do this would be appreciated as well...
> apply(z,1,function(a){ids=strsplit(as.character(a[3]),",")[[1]];out<-c();for(id in ids){out<-rbind(out,c(a[1:2],id))};return(out)})
[[1]]
Col1 Col2
[1,] "a" "b" "1"
[2,] "a" "b" "2"
[3,] "a" "b" "5"
[[2]]
Col1 Col2
[1,] "c" "d" "3"
[2,] "c" "d" "5"
[3,] "c" "d" "7"
[[3]]
Col1 Col2
[1,] "e" "f" "9"
[2,] "e" "f" "8"
[[4]]
Col1 Col2
[1,] "g" "h" "1"
You can use ddply.
library(plyr)
ddply(z, c("Col1", "Col2"), summarize,
Col3=strsplit(as.character(Col3),",")[[1]]
)
With reshape or reshape2:
require(reshape2)
merge(cbind(z[,-3], L1=rownames(z)), melt(strsplit(as.character(z$Col3),",")))
gives
L1 Col1 Col2 value
1 1 a b 1
2 1 a b 2
3 1 a b 5
4 2 c d 3
5 2 c d 5
6 2 c d 7
7 3 e f 9
8 3 e f 8
9 4 g h 1
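For completeness, the same reshaping can probably be done in base R without any packages. This is only a sketch: it assumes Col3 never contains empty strings or NA, and the names parts, n_parts and long are made up for illustration.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps the columns as plain character
z <- data.frame(Col1 = c("a", "c", "e", "g"),
                Col2 = c("b", "d", "f", "h"),
                Col3 = c("1,2,5", "3,5,7", "9,8", "1"),
                stringsAsFactors = FALSE)

# Split each Col3 entry on commas: one character vector per row
parts <- strsplit(as.character(z$Col3), ",", fixed = TRUE)

# Number of pieces in each row
n_parts <- lengths(parts)

# Repeat each row of Col1/Col2 once per piece, then attach the pieces
long <- z[rep(seq_len(nrow(z)), times = n_parts), c("Col1", "Col2")]
long$Col3 <- unlist(parts)
rownames(long) <- NULL

print(long)
```

Because everything is vectorised (strsplit, rep, unlist), this avoids growing an object with rbind inside a loop, which is usually the slow part of the apply attempt.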
I have the stdin input below and I am trying to convert it to a list.
Input
input <- suppressWarnings(readLines(stdin(), n=31))
8 30
a s 3
b s 5
s a 3
b a 1
c a 10
d a 11
s b 5
a b 3
c b 2
d b 3
a c 10
b c 2
d c 3
e c 7
f c 12
a d 15
b d 7
c d 2
e d 11
f d 2
c e 7
d e 11
f e 3
z e 2
c f 12
d f 2
e f 3
z f 2
e z 2
f z 2
On line 1, the first value is the total number of letters (node names) and the second value is the total number of rows.
On every line from line 2 onward, the first value is the starting node, the second is the ending node and the third is the cost.
I want to group the letters and the costs into lists in the manner shown below.
Expected output
> alphabets
$s
[1] "a" "b"
$a
[1] "s" "b" "c" "d"
$b
[1] "s" "a" "c" "d"
$c
[1] "a" "b" "d" "e" "f"
$d
[1] "a" "b" "c" "e" "f"
$e
[1] "c" "d" "f" "z"
$f
[1] "c" "d" "e" "z"
$z
[1] "e" "f"
> cost
$s
[1] 3 5
$a
[1] 3 1 10 11
$b
[1] 5 3 2 3
$c
[1] 10 2 3 7 12
$d
[1] 15 7 2 11 2
$e
[1] 7 11 3 2
$f
[1] 12 2 3 2
$z
[1] 2 2
Any suggestions on where to start?
Does this give you what you want? I convert your input to a data.frame and then split it on the second column. The output differs slightly from yours because split sorts the groups by name. If you do not want that, you can reorder the output to match the input.
df <- read.table(textConnection(input[-1]))
alphabets <- split(df$V1, df$V2)
cost <- split(df$V3, df$V2)
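# you can do this to put the groups back in the order of first appearance
# (as.character guards against V2 being a factor; "ord" avoids masking base::order)
ord <- unique(as.character(df$V2))
alphabets[ord]
cost[ord]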
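Here is a small self-contained version of the same idea that can be run without stdin. The five input lines are a shortened, made-up subset of the data in the question, and the column names from, to and cost are only labels chosen for this sketch.

```r
# A shortened stand-in for readLines(stdin()): header line, then "start end cost"
input <- c("3 4",
           "a s 3",
           "b s 5",
           "s a 3",
           "b a 1")

# Drop the header line and parse the rest into a data frame
df <- read.table(text = input[-1],
                 col.names = c("from", "to", "cost"),
                 stringsAsFactors = FALSE)

# Order in which the end nodes first appear in the input
first_seen <- unique(df$to)

# split() sorts the groups by name, so index by first_seen to restore input order
alphabets <- split(df$from, df$to)[first_seen]
cost <- split(df$cost, df$to)[first_seen]

print(alphabets)
print(cost)
```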
Not sure if this is possible, but it should be. I want a matrix whose elements have names, just as you can do with a vector:
v = 1:10
names(v) = LETTERS[1:10]
result:
A B C D E F G H I J
1 2 3 4 5 6 7 8 9 10
I've tried to create a matrix and use the same syntax:
m = matrix(v, nrow=2, ncol=5)
names(m) = letters[1:10]
but the result is not what I hoped for.
result:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
attr(,"names")
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
I don't want the values and the names to be two separate entities. Is there a way to do this without any libraries? Or at all?
Thank you
I have a table
rawData <- as.data.frame(matrix(c(1,2,3,4,5,6,"a,b,c","d,e","f"),nrow=3,ncol=3))
1 4 a,b,c
2 5 d,e
3 6 f
I would like to convert to
1 2 3
4 5 6
a d f
b e
c
So far I can transpose and split the third column, but I'm lost as to how to reconstruct a new table in the format outlined above.
new = t(rawData)
for (e in 1:ncol(new)){
s <- strsplit(new[3, e], split=",")
print(s)
}
I tried creating a new vector on each iteration, but I'm not sure how to efficiently put them back into a data frame. I would be grateful for any help. Thanks!
You can use stri_list2matrix from the stringi package:
library(stringi)
rawData <- as.data.frame(matrix(c(1,2,3,4,5,6,"a,b,c","d,e","f"),nrow=3,ncol=3),stringsAsFactors = FALSE)
d1 <- t(rawData[,1:2])
rownames(d1) <- NULL
d2 <- stri_list2matrix(strsplit(rawData$V3,split=','))
rbind(d1,d2)
# [,1] [,2] [,3]
# [1,] "1" "2" "3"
# [2,] "4" "5" "6"
# [3,] "a" "d" "f"
# [4,] "b" "e" NA
# [5,] "c" NA NA
You can also use cSplit from my "splitstackshape" package.
By default, it just creates additional columns after splitting the input:
library(splitstackshape)
cSplit(rawData, "V3")
# V1 V2 V3_1 V3_2 V3_3
# 1: 1 4 a b c
# 2: 2 5 d e NA
# 3: 3 6 f NA NA
You can just transpose that to get your desired output.
t(cSplit(rawData, "V3"))
# [,1] [,2] [,3]
# V1 "1" "2" "3"
# V2 "4" "5" "6"
# V3_1 "a" "d" "f"
# V3_2 "b" "e" NA
# V3_3 "c" NA NA
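If you would rather not load a package, something along these lines should work in base R. It is a sketch only: rawData is rebuilt here with explicit column names, and parts, width, padded and result are illustrative names.

```r
# Rebuild the example table with the third column as plain character
rawData <- data.frame(V1 = c(1, 2, 3),
                      V2 = c(4, 5, 6),
                      V3 = c("a,b,c", "d,e", "f"),
                      stringsAsFactors = FALSE)

# Split the third column: one character vector per row
parts <- strsplit(rawData$V3, ",", fixed = TRUE)

# Pad every vector with NA up to the longest one; sapply then
# binds them as the columns of a character matrix
width <- max(lengths(parts))
padded <- sapply(parts, function(p) {
  length(p) <- width  # extending a vector fills the new slots with NA
  p
})

# Transpose the first two columns and stack the padded pieces underneath;
# rbind coerces the numbers to character
result <- unname(rbind(t(as.matrix(rawData[, c("V1", "V2")])), padded))

print(result)
```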
How can I get all the combinations of a list with duplicates? By duplicates I mean an element paired with itself. I am building a symmetric matrix.
names.list<-c("A","B","C")
as.data.frame(t(combn(names.list,2)))
Result is:
V1 V2
1 A B
2 A C
3 B C
When I want:
V1 V2
1 A A
2 A B
3 A C
4 B B
5 B C
6 C C
Or even:
V1 V2
1 A A
2 A B
3 A C
4 B A
5 B B
6 B C
7 C A
8 C B
9 C C
But my matrices are large, so I would like to keep the number of combinations to a minimum (so preferably the six-row result), since more combinations mean more computation and longer run times.
Thanks.
It sounds like you're looking for expand.grid instead of combn:
expand.grid(names.list, names.list)
# Var1 Var2
# 1 A A
# 2 B A
# 3 C A
# 4 A B
# 5 B B
# 6 C B
# 7 A C
# 8 B C
# 9 C C
Update
There's also combinations from "gtools" which would give you your preferred output.
library(gtools)
combinations(3, 2, names.list, repeats.allowed = TRUE)
# [,1] [,2]
# [1,] "A" "A"
# [2,] "A" "B"
# [3,] "A" "C"
# [4,] "B" "B"
# [5,] "B" "C"
# [6,] "C" "C"
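If installing gtools is not an option, the six-row result can probably be built from expand.grid by keeping only the index pairs where the first position does not come after the second. This is a rough sketch; n, grid and pairs are names invented for it.

```r
names.list <- c("A", "B", "C")
n <- length(names.list)

# All index pairs; the first argument of expand.grid varies fastest,
# so naming it j keeps the rows grouped by i
grid <- expand.grid(j = seq_len(n), i = seq_len(n))

# Keep each unordered pair once, including an element paired with itself
grid <- grid[grid$i <= grid$j, ]

pairs <- data.frame(V1 = names.list[grid$i],
                    V2 = names.list[grid$j],
                    stringsAsFactors = FALSE)

print(pairs)
```

This gives n * (n + 1) / 2 rows rather than n^2. Note that expand.grid still builds all n^2 index pairs before filtering, so for very large inputs a different construction may be needed.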
The relationship is expressed as a matrix x like this:
A B C D
A 0 2 1 1
B 2 0 1 0
C 1 1 0 1
D 1 0 1 0
Each entry is the number of connections between that pair.
Could anyone show me how to write it as an edge list?
I would prefer an edge list like this:
A B
A B
A C
A D
B C
But would this edge list allow me to create a network plot?
Using the igraph package:
x <- matrix(c(0,2,1,1,2,0,1,0,1,1,0,1,1,0,1,0), 4, 4)
rownames(x) <- colnames(x) <- LETTERS[1:4]
library(igraph)
# graph.adjacency() and get.edgelist() are the older names for these two functions
g <- graph_from_adjacency_matrix(x)
as_edgelist(g)
# [,1] [,2]
# [1,] "A" "B"
# [2,] "A" "B"
# [3,] "A" "C"
# [4,] "A" "D"
# [5,] "B" "A"
# [6,] "B" "A"
# [7,] "B" "C"
# [8,] "C" "A"
# [9,] "C" "B"
# [10,] "C" "D"
# [11,] "D" "A"
# [12,] "D" "C"
I would also recommend you spend some time reading the igraph documentation at http://igraph.sourceforge.net/index.html since a lot of your recent questions are all simple case usages.
(As a bonus, plot(g) will answer your other question How to plot relationships in R?)
Use melt from reshape2, then delete the rows where the weight (value) is 0. If you don't need the weight, just drop that column.
x
sample1 sample2 sample3 sample4
feature1 0 2 1 1
feature2 2 0 1 0
feature3 1 1 0 1
feature4 1 0 1 0
melt(x)
Var1 Var2 value
1 feature1 sample1 0
2 feature2 sample1 2
3 feature3 sample1 1
4 feature4 sample1 1
5 feature1 sample2 2
Try this
M <- matrix( c(0,2,1,1,2,0,1,0,1,1,0,1,1,0,1,0), 4, 4, dimnames=list(c("A","B","C","D"), c("A","B","C","D")))
eList <- NULL
for ( i in 1:nrow(M) ){
for ( j in 1:ncol(M)) {
eList <- c(eList, rep(paste(dimnames(M)[[1]][i], dimnames(M)[[2]][j] ), M[i,j]))
}
}
Output
> eList
[1] "A B" "A B" "A C" "A D" "B A" "B A" "B C" "C A" "C B" "C D" "D A" "D C"
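If each connection should appear only once (A B but not also B A), a vectorised variant of the loop could look only at the upper triangle of the matrix. This is a sketch that assumes M is symmetric, as in the example; idx, times and edges are illustrative names.

```r
M <- matrix(c(0, 2, 1, 1,
              2, 0, 1, 0,
              1, 1, 0, 1,
              1, 0, 1, 0), 4, 4,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Positions above the diagonal that hold at least one connection
idx <- which(upper.tri(M) & M > 0, arr.ind = TRUE)

# which() walks the matrix column by column; sort by row, then column
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

# Number of connections at each kept position
times <- M[idx]

# One row per pair, repeated once per connection
edges <- cbind(from = rownames(M)[idx[, "row"]],
               to   = colnames(M)[idx[, "col"]])
edges <- edges[rep(seq_len(nrow(edges)), times), , drop = FALSE]

print(edges)
```

The result is a two-column character matrix, which is the same shape as the edge list igraph returns.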