Wrong labels on a directed graph in R

Hi,
I have yet another question about R, and I do not know what I am doing wrong. In a previous thread I asked how to read a directed graph, which worked well with user1317221_G's answer.
Now I have deleted the edge 6->7 from the directed graph and read it like this:
library(igraph)
graph2 <- read.table("Graph_2.txt")
graph2 <- graph.data.frame(graph2)
This is what Graph_2.txt looks like:
1 2
1 3
2 5
3 4
3 5
4 5
5 6
5 10
7 8
7 9
7 12
8 9
9 10
9 11
9 12
10 7
10 11
11 7
11 12
But the plot (again, as in the other thread) shows a different directed graph.
As you can see in the file, there is no edge 5->9 or 10->12, for example. So my question, again, is: how can I read the directed graph correctly? What am I doing wrong?
Thank you!

You can set the vertex labels when you create the graph with graph.data.frame, via its vertices argument:
graph2 <- graph.data.frame(graph2, vertices = data.frame(symbols = 1:12,
label = 1:12))
plot(graph2, layout = layout.fruchterman.reingold)
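A minimal sketch of the same idea, assuming igraph 1.x or later, where graph_from_data_frame and layout_with_fr are the current names for graph.data.frame and layout.fruchterman.reingold. The edge list from Graph_2.txt is inlined with read.table(text = ...) so the example runs without the file. Inspecting as_edgelist shows that the edges are stored under the original vertex names, and the label column is what plot draws on each vertex.

```r
library(igraph)

# Edge list from Graph_2.txt, inlined so the example is self-contained
graph2 <- read.table(text = "
1 2
1 3
2 5
3 4
3 5
4 5
5 6
5 10
7 8
7 9
7 12
8 9
9 10
9 11
9 12
10 7
10 11
11 7
11 12")

# The first column of `vertices` holds the vertex names (matched against the
# edge list); the extra `label` column becomes the V(graph2)$label attribute
graph2 <- graph_from_data_frame(graph2, directed = TRUE,
                                vertices = data.frame(name = 1:12, label = 1:12))

# Edges as (from, to) pairs of vertex names: should mirror the file exactly
el <- as_edgelist(graph2)
print(el)

plot(graph2, layout = layout_with_fr(graph2))
```

If the printed edge list matches the file, any remaining mismatch in the picture comes from the labels drawn on the vertices, not from how the file was read.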

Related

Color nodes in directed network in R

I have a directed network with only two types of nodes, A and B. The direction is always from a given A to a given B; no other direction is possible.
The edge list looks like this:
edges <- read.table(text = "
from to weight
1 6 1.2
3 7 1.4
4 6 1.2
1 7 1.2
2 8 1.2
1 9 1.2
5 10 1.2 ", header=T )
The nodes list looks like this:
nodes
id
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
The graph is created using the igraph package.
g <- graph_from_data_frame(d = edges, vertices=nodes, directed = TRUE)
Is it possible to color nodes based on whether they appear in the from or the to column of the edge list, without adding other variables/labels to the nodes list?
(I tried coloring the nodes like this, but realized it does not make much sense:)
plot(g, vertex.color=V(g$edges=='from'))
I am not 100% sure, but I don't think what you are looking for exists directly. vertex.color needs a vector of colors, one color per vertex.
As a workaround, you can use the output of degree to color vertices by whether their out-degree (or in-degree) is greater than 0:
plot(g,
vertex.color=ifelse(degree(g, mode = "out")>0, "red", "black"),
vertex.size=15)
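Another sketch skips degree altogether: a vertex counts as a "from" node if its name appears in edges$from. This assumes the nodes list contains the ids 1 to 10 (so that every vertex in the edge list is present) and igraph 1.x function names; the two colors are arbitrary choices.

```r
library(igraph)

edges <- read.table(text = "
from to weight
1 6 1.2
3 7 1.4
4 6 1.2
1 7 1.2
2 8 1.2
1 9 1.2
5 10 1.2", header = TRUE)
# Assumed nodes list: ids 1 to 10, matching the vertices used in `edges`
nodes <- data.frame(id = 1:10)

g <- graph_from_data_frame(d = edges, vertices = nodes, directed = TRUE)

# TRUE for vertices whose name occurs in the "from" column (the A nodes)
is_from <- V(g)$name %in% as.character(edges$from)

# Storing the colors as a vertex attribute means plot() picks them up
V(g)$color <- ifelse(is_from, "red", "lightblue")
plot(g, vertex.size = 15)
```

Because edges only ever run from A to B, this gives the same split as testing out-degree > 0.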

Frequency distribution using binCounts

I have a dataset of customer ages and want to make a frequency distribution using 9-year age bins.
Ages=c(83,51,66,61,82,65,54,56,92,60,65,87,68,64,51,
70,75,66,74,68,44,55,78,69,98,67,82,77,79,62,38,88,76,99,
84,47,60,42,66,74,91,71,83,80,68,65,51,56,73,55)
My desired outcome is similar to the table below (the variable names can differ, as you wish).
Could I use binCounts for this? If so, could you help me with the code? I am not sure what bx and idxs mean here:
binCounts(x, idxs = NULL, bx, right = FALSE)
Age Count
38-46 3
47-55 7
56-64 7
65-73 14
74-82 10
83-91 6
92-100 3
Much Appreciated!
I don't know binCounts or even which package it is in, but here is a base R solution:
data.frame(table(cut(Ages,0:7*9+37)))
Var1 Freq
1 (37,46] 3
2 (46,55] 7
3 (55,64] 7
4 (64,73] 14
5 (73,82] 10
6 (82,91] 6
7 (91,100] 3
To exactly duplicate your results:
lowerlimit=c(37,46,55,64,73,82,91,100)
Labels=paste(head(lowerlimit,-1)+1,lowerlimit[-1],sep="-")#Add one to each lower limit to get 38, 47, etc.
group=cut(Ages,lowerlimit,Labels)#Determine which group the ages belong to
tab=table(group)#Form a frequency table
as.data.frame(tab)# transform the table into a dataframe
group Freq
1 38-46 3
2 47-55 7
3 56-64 7
4 65-73 14
5 74-82 10
6 83-91 6
7 92-100 3
All this can be combined as:
data.frame(table(cut(Ages,s<-0:7*9+37,paste(head(s+1,-1),s[-1],sep="-"))))
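Since the ages are whole numbers, the same table can also be built with left-closed bins [38, 47), [47, 56), ..., [92, 101). That is how matrixStats::binCounts interprets bx when right = FALSE: bx holds the bin boundaries (one more value than the number of bins), and idxs optionally selects a subset of x (NULL means use all of it). A sketch, assuming the matrixStats package only for the final comparison, which is skipped if the package is not installed:

```r
Ages <- c(83,51,66,61,82,65,54,56,92,60,65,87,68,64,51,
          70,75,66,74,68,44,55,78,69,98,67,82,77,79,62,38,88,76,99,
          84,47,60,42,66,74,91,71,83,80,68,65,51,56,73,55)

# Bin boundaries 38, 47, ..., 101: seven bins, each 9 years wide
breaks <- seq(38, 101, by = 9)
labels <- paste(head(breaks, -1), tail(breaks, -1) - 1, sep = "-")

# right = FALSE makes each bin [lower, upper), e.g. 38 to 46 for whole ages
counts <- table(cut(Ages, breaks = breaks, labels = labels, right = FALSE))
freq <- data.frame(Age = names(counts), Count = as.vector(counts))
print(freq)

# Same bins with binCounts: bx = breaks, idxs left at its NULL default
if (requireNamespace("matrixStats", quietly = TRUE)) {
  print(matrixStats::binCounts(Ages, bx = breaks, right = FALSE))
}
```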

Making a new data frame by looking for keywords in a specific variable

I have a big dataset of about 35000 cases by 32 variables.
One of those variables is Description, which gives a description of the status, for example: "patient suffered ischemic stroke".
Now I would like to make a data frame containing all cases in which the word "stroke", "STROKE" or "Stroke" is found in the variable Description.
Could anyone suggest an efficient way to do this? Right now I just add them all by hand, in a very inefficient way:
df1<-rbind(df[1,],df[2,],df[3,])
It works, but it's unbelievably inelegant and prone to mistakes.
Here I create some example data to work with.
a <- c(1:10)
b <- c(11:20)
description <- c("Stroke","ALS","Parkinsons","STROKE","STROKE","stroke","Alzheimers","Stroke","ALS","Parkinsons")
df<-data.frame(a,b,description)
df
a b description
1 1 11 Stroke
2 2 12 ALS
3 3 13 Parkinsons
4 4 14 STROKE
5 5 15 STROKE
6 6 16 stroke
7 7 17 Alzheimers
8 8 18 Stroke
9 9 19 ALS
10 10 20 Parkinsons
With this code you can remove every case (row) that is not associated with "Stroke", "STROKE" or "stroke":
df1<-df[df$description %in% c("STROKE","Stroke","stroke"),]
df1
a b description
1 1 11 Stroke
4 4 14 STROKE
5 5 15 STROKE
6 6 16 stroke
8 8 18 Stroke
Hope this was what you were looking for.
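The exact-match approach only finds rows whose whole description is one of the three spellings. For free text such as "patient suffered ischemic stroke", a pattern search is needed. A minimal sketch with base R grepl, where the last example description is replaced by the sentence quoted in the question (an illustrative change to the example data):

```r
a <- 1:10
b <- 11:20
# Same example data as above, except the last entry is the free-text
# description quoted in the question (illustrative change)
description <- c("Stroke", "ALS", "Parkinsons", "STROKE", "STROKE",
                 "stroke", "Alzheimers", "Stroke", "ALS",
                 "patient suffered ischemic stroke")
df <- data.frame(a, b, description)

# grepl() is TRUE for every row whose description contains "stroke"
# anywhere, ignoring case, so spelling variants need not be listed
df1 <- df[grepl("stroke", df$description, ignore.case = TRUE), ]
df1
```

Note that a plain substring search also matches words such as "strokes" or "heatstroke"; whether that is wanted depends on the data.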

How to show the cluster assignment in each cluster

Is there a way to show the members of each cluster after the cutree step in R?
For example:
tree <- hclust(dist, method='single')
plot(tree, hang=-1, cex=0.8)
cutree(tree, h=18)
I obtain something like:
X10100 X3755 X13068 X264 X13216
1 1 2 2 3
X8379 X13727 X9925 X13849 X467
3 4 4 5 5
X14265 X388 X14426 X8246 X14961
6 6 7 7 8
X17037 X1200 X844 X13024 X155
8 9 9 10 11
I want to see/print it in a more straightforward way, such as:
cluster 1: 10100,03755
cluster 2: ..........
How can I do it? Thanks!
You can group the results using split or by:
hh <- cutree(tree, h=18)
split(names(hh),hh)
Or
by(names(hh),hh,paste,collapse=',')
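To print exactly the "cluster 1: ..." format from the question, the list returned by split can be pasted together. A sketch on a small made-up data set (six points whose names imitate the X-prefixed labels; the values are hypothetical), also stripping the leading "X":

```r
# Hypothetical data: three well-separated pairs of points on a line
x <- c(X10100 = 1, X3755 = 2, X13068 = 10, X264 = 11, X13216 = 30, X8379 = 31)
tree <- hclust(dist(x), method = "single")
hh <- cutree(tree, h = 5)

# Group the labels by cluster number, dropping the "X" prefix
members <- split(sub("^X", "", names(hh)), hh)

# One line per cluster: "cluster <k>: <comma-separated members>"
lines <- sprintf("cluster %s: %s", names(members),
                 vapply(members, paste, character(1), collapse = ","))
writeLines(lines)
```

With the real tree, only the hclust/cutree lines would be replaced by the question's own code.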

Frequency distribution with custom format data

I need help with an R plot, using a data format I have not worked with before. Please help if you know how.
NUMBER FREQUENCY
10 1
11 1
12 3
10 45
11 2
12 3
I need a bar plot with the numbers on the X axis (the actual values, not histogram bins) and the frequency on the Y axis, with the frequencies of repeated numbers combined, like:
10 46
11 3
12 6
It seems simple enough, but I have 10,000 rows and large numbers in the real data, so I am looking for a good solution in R without doing it manually.
What about:
## tapply splits dd$FREQUENCY by dd$NUMBER and sums each group
barplot(tapply(dd$FREQUENCY, dd$NUMBER, sum))
to get one bar per number, with height equal to its summed frequency.
Read in your data:
dd = read.table(textConnection("NUMBER FREQUENCY
10 1
11 1
12 3
10 45
11 2
12 3"), header=TRUE)
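If the combined table itself is also needed, or the X axis should keep the numeric spacing between large, sparse values (barplot places its bars side by side regardless of the gaps), a sketch using base R aggregate and a type = "h" plot:

```r
dd <- read.table(text = "NUMBER FREQUENCY
10 1
11 1
12 3
10 45
11 2
12 3", header = TRUE)

# One row per distinct NUMBER with its frequencies summed
combined <- aggregate(FREQUENCY ~ NUMBER, data = dd, FUN = sum)
print(combined)

# Vertical lines at the actual numeric positions, so gaps between values show
plot(combined$NUMBER, combined$FREQUENCY, type = "h", lwd = 5,
     xlab = "NUMBER", ylab = "FREQUENCY")
```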
