Vertex Labels in igraph with R

I am having trouble adding labels to a weighted igraph graph in R.
The data frame of the graph is:
df <- read.table(text=
"From, To, Weight
A,B,1
B,C,2
B,F,3
C,D,5
B,F,4
C,D,6
D,E,7
E,B,8
E,B,9
E,C,10
E,F,11", sep=',',header=TRUE)
# From To Weight
# 1 A B 1
# 2 B C 2
# 3 B F 3
# 4 C D 5
# 5 B F 4
# 6 C D 6
# 7 D E 7
# 8 E B 8
# 9 E B 9
# 10 E C 10
# 11 E F 11
and I use:
library(igraph)
g <- graph.data.frame(df, directed = TRUE)
plot(g)
to plot the following graph:
One can see that the labels of parallel edges (for example, the two edges from E to B) are superimposed.
(The same problem appears for the edges C-D and B-F.)
I'd like to know how to separate these labels so that each edge shows its own weight.

Try the qgraph package. qgraph builds on igraph and does a lot of the work for you in the background.
install.packages('qgraph')
library(qgraph)
qgraph(df, edge.labels = TRUE)
Hope this helps.
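If the parallel edges do not need to be drawn separately, another way around the overlap is to merge them before plotting, so that each pair of vertices gets a single edge whose label lists all of its weights. The following is a minimal sketch of that idea using base R's aggregate(); it is not what qgraph does internally. The name `edges` is introduced here for illustration, and the plotting step only runs if igraph is installed.

```r
# Rebuild the question's edge list.
df <- read.table(text = "From,To,Weight
A,B,1
B,C,2
B,F,3
C,D,5
B,F,4
C,D,6
D,E,7
E,B,8
E,B,9
E,C,10
E,F,11", sep = ",", header = TRUE)

# Collapse parallel edges: one row per (From, To) pair, weights pasted together.
edges <- aggregate(Weight ~ From + To, data = df,
                   FUN = function(w) paste(w, collapse = ", "))

# Plot with one combined label per edge (assumes the igraph package).
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges, directed = TRUE)
  plot(g, edge.label = igraph::E(g)$Weight)
}
```

The trade-off is that the multigraph structure is lost in the picture: two edges from E to B become one edge labelled with both weights.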

Related

Creating an identifier using pairs of row indices [duplicate]

I would like to generate indices to group observations based on two columns, such that every observation in a group shares at least one value with another observation in the same group.
In the data below, I want to check if values in 'G1' and 'G2' are connected directly (appear on the same row), or indirectly via other intermediate values. The desired grouping variable is shown in 'g'.
For example, A is directly linked to Z (row 1) and X (row 2). A is indirectly linked to 'B' via X (A -> X -> B), and further linked to Y via X and B (A -> X -> B -> Y).
dt <- data.frame(id = 1:10,
G1 = c("A","A","B","B","C","C","C","D","E","F"),
G2 = c("Z","X","X","Y","W","V","U","s","T","T"),
g = c(1,1,1,1,2,2,2,3,4,4))
dt
# id G1 G2 g
# 1 1 A Z 1
# 2 2 A X 1
# 3 3 B X 1
# 4 4 B Y 1
# 5 5 C W 2
# 6 6 C V 2
# 7 7 C U 2
# 8 8 D s 3
# 9 9 E T 4
# 10 10 F T 4
I tried with group_indices from dplyr, but haven't managed it.
Use igraph to get the component membership, then map it onto the names (df1 below is the question's data frame dt without the g column):
library(igraph)
# convert to graph, and get clusters membership ids
g <- graph_from_data_frame(df1[, c(2, 3, 1)])
myGroups <- components(g)$membership
myGroups
# A B C D E F Z X Y W V U s T
# 1 1 2 3 4 4 1 1 1 2 2 2 3 4
# then map on names
df1$group <- myGroups[df1$G1]
df1
# id G1 G2 group
# 1 1 A Z 1
# 2 2 A X 1
# 3 3 B X 1
# 4 4 B Y 1
# 5 5 C W 2
# 6 6 C V 2
# 7 7 C U 2
# 8 8 D s 3
# 9 9 E T 4
# 10 10 F T 4

quanteda::dfm_lookup(): capture found term

I would like to perform the amazing quanteda's dfm_lookup() on a dictionary but also retrieve the matches.
Consider the following example:
dict_ex <- dictionary(list(christmas = c("Christmas", "Santa", "holiday"),
opposition = c("Opposition", "reject", "notincorpus"),
taxglob = "tax*",
taxregex = "tax.+$",
country = c("United_States", "Sweden")))
dfmat_ex <- dfm(tokens(c("My Christmas was ruined by your opposition tax plan.",
"Does the United_States or Sweden have more progressive taxation?")),
remove = stopwords("english"))
dfmat_ex
dfm_lookup(dfmat_ex, dict_ex)
This gives me:
Document-feature matrix of: 2 documents, 5 features (50.00% sparse) and 0 docvars.
features
docs christmas opposition taxglob taxregex country
text1 1 1 1 0 0
text2 0 0 1 0 2
However, since every dictionary key has multiple entries, I would like to know which token produced the match. (My real dictionary is rather long, so the example might seem trivial, but for the real use case it is not.)
I would like to achieve a result like this:
Document-feature matrix of: 2 documents, 5 features (50.00% sparse) and 0 docvars.
features
docs christmas christmas.match opposition opposition.match taxglob taxglob.match taxregex taxregex.match country country.match
text1 1 Christmas 1 Opposition 1 tax 0 NA 0 NA
text2 0 NA 0 NA 1 taxation 0 NA 2 United_States, Sweden
Can someone help me with this? Many thanks in advance! :)
That's not really possible for two reasons.
First, a matrix(-like) object (dfm or otherwise) cannot mix element modes, here a mixture of counts and character values. This would be possible with a data.frame, but then you lose the advantages of sparsity, and you would end up with a data.frame of dimensions n x 2V (where V = number of features).
Second, "christmas.match" could have more than one feature/token matching it, so the character value would require a list, straining the object class even further.
A better way would be to use kwic() to match the tokens to the patterns formed by the dictionary. You can do this for the keys by supplying the dictionary as the pattern argument, or for each value by unlisting the dictionary first.
library("quanteda")
## Package version: 3.1
## Unicode version: 13.0
## ICU version: 69.1
## Parallel computing: 12 of 12 threads used.
## See https://quanteda.io for tutorials and examples.
dict <- dictionary(list(one = c("a*", "b"), two = c("e", "f")))
toks <- tokens(c(d1 = "a b c d e f g and another"))
# where the dictionary keys are the patterns matched
kwic(toks, dict) %>%
as.data.frame()
##   docname from to         pre keyword            post pattern
## 1      d1    1  1                   a       b c d e f     one
## 2      d1    2  2           a       b       c d e f g     one
## 3      d1    5  5     a b c d       e f g and another     two
## 4      d1    6  6   a b c d e       f   g and another     two
## 5      d1    8  8   c d e f g     and         another     one
## 6      d1    9  9 d e f g and another                     one
# where the dictionary values are the patterns matched
kwic(toks, unlist(dict)) %>%
as.data.frame()
##   docname from to         pre keyword            post pattern
## 1      d1    1  1                   a       b c d e f      a*
## 2      d1    2  2           a       b       c d e f g       b
## 3      d1    5  5     a b c d       e f g and another       e
## 4      d1    6  6   a b c d e       f   g and another       f
## 5      d1    8  8   c d e f g     and         another      a*
## 6      d1    9  9 d e f g and another                      a*
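To get closer to the layout asked for in the question, the kwic() data frame can be collapsed to one row per document and dictionary key, with the matched tokens pasted together. This is only a sketch: `kw` is typed in by hand from the first output above (just the columns that are needed) so the step can be shown without quanteda, and `matches` is a name made up for the example.

```r
# Hand-copied subset of as.data.frame(kwic(toks, dict)) from the output above.
kw <- data.frame(
  docname = "d1",
  keyword = c("a", "b", "e", "f", "and", "another"),
  pattern = c("one", "one", "two", "two", "one", "one"),
  stringsAsFactors = FALSE
)

# One row per document and dictionary key, listing the tokens that matched.
matches <- aggregate(keyword ~ docname + pattern, data = kw,
                     FUN = function(k) paste(unique(k), collapse = ", "))
matches
```

The result is an ordinary data.frame, so it can be merged with a converted dfm afterwards if the counts and the matches are needed side by side.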

Creating a table for people in a column who reach a minimum assigned value?

I'm new to R and I'm struggling with a particular problem.
I have a data frame of people hailing from different locations, and I'm trying to create a table from that data of the locations that have at least 3 people hailing from them.
I've created a table and sorted it in order of increasing numerical value
sort(table(GCBers$Location), decreasing = FALSE)
but now I'm stuck on how to return only the locations that have at least 3 people hailing from them. Can anyone help?
Does this do what you want? We need to create a simple data set that resembles yours:
set.seed(42)
x <- sample(LETTERS[1:10], 50, replace=TRUE)
xtable <- sort(table(x), decreasing=FALSE)
xtable
# x
# F G C I A E H B J D
# 2 2 3 4 5 6 6 7 7 8
xtable[xtable > 2]
# x
# C I A E H B J D
# 3 4 5 6 6 7 7 8
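The same filter on a small hand-made vector, so that the result does not depend on the random sample. The place names are invented for the example; with the question's data, `locations` would be GCBers$Location.

```r
locations <- c("Leeds", "York", "Leeds", "Hull", "Leeds",
               "York", "York", "Hull", "Bath")

counts <- sort(table(locations), decreasing = FALSE)

# Keep only the locations that occur at least 3 times.
frequent <- counts[counts >= 3]
names(frequent)
```

Writing the condition as counts >= 3 rather than counts > 2 mirrors the wording "at least 3" directly; for integer counts the two are equivalent.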

Creating a new variable column based on data from another column

I'm pretty new to R, and programming in general, and I'm wondering the best way to loop through a column so I can add a column to the data frame further describing the observations I looped through.
I currently have a list of amino acids and their positions on a protein that looks like this:
Residue Position
H 1
R 2
K 3
D 4
E 5
H 6
R 7
K 8
D 9
E 10
I'd like something that looks like this (where H, R, and K are basic amino acids, and D and E are acidic amino acids):
Residue Position Properties
H 1 Basic
R 2 Basic
K 3 Basic
D 4 Acidic
E 5 Acidic
H 6 Basic
R 7 Basic
K 8 Basic
D 9 Acidic
E 10 Acidic
I'm really not sure where to start, and I'm having difficulty finding a good resource for this kind of situation in R.
I started by trying to subset the data, but then I realized that wouldn't do the trick:
# Basic
h.dat <- subset(all, all$Residue == "H")
r.dat <- subset(all, all$Residue == "R")
k.dat <- subset(all, all$Residue == "K")
# Acidic
d.dat <- subset(all, all$Residue == "D")
e.dat <- subset(all, all$Residue == "E")
Thanks!
Note:
H = Histidine (Basic amino acid)
R = Arginine (Basic)
K = Lysine (Basic)
E = Glutamic Acid (Acidic)
D = Aspartic Acid (Acidic)
You can use ifelse. If df is the name of your original data,
df$Property <- ifelse(df$Residue %in% c("H", "R", "K"), "Basic", "Acidic")
df
# Residue Position Property
# 1 H 1 Basic
# 2 R 2 Basic
# 3 K 3 Basic
# 4 D 4 Acidic
# 5 E 5 Acidic
# 6 H 6 Basic
# 7 R 7 Basic
# 8 K 8 Basic
# 9 D 9 Acidic
# 10 E 10 Acidic
Try:
> df1
Residue Position
1 H 1
2 R 2
3 K 3
4 D 4
5 E 5
6 H 6
7 R 7
8 K 8
9 D 9
10 E 10
Create a reference table:
> df2
Residue Property
1 H Basic
2 R Basic
3 K Basic
4 D Acidic
5 E Acidic
Then merge:
> merge(df1, df2)
Residue Position Property
1 D 9 Acidic
2 D 4 Acidic
3 E 5 Acidic
4 E 10 Acidic
5 H 1 Basic
6 H 6 Basic
7 K 8 Basic
8 K 3 Basic
9 R 7 Basic
10 R 2 Basic
I think you might want to allow for non-polar amino acids as well:
c(rep("Basic",3),rep("Acidic",2),"Non-Polar")[ # those are the choices
match(dat$Residue, c("H","R","K","E","D"), nomatch=6) ] #select indices
So I added an 11th residue named "Z" and tested:
> dat$Property <- c(rep("Basic",3),rep("Acidic",2),"Non-Polar")[
match(dat$Residue, c("H","R","K","E","D"), nomatch=6) ]
> dat
Residue Position Property
1 H 1 Basic
2 R 2 Basic
3 K 3 Basic
4 D 4 Acidic
5 E 5 Acidic
6 H 6 Basic
7 R 7 Basic
8 K 8 Basic
9 D 9 Acidic
10 E 10 Acidic
11 Z 11 Non-Polar
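Another common idiom for this kind of recoding is a named lookup vector indexed by the residue codes. A minimal sketch, assuming Residue is (or is converted to) character; `props` is a name chosen for the example, and codes missing from it come back as NA instead of a default label.

```r
dat <- data.frame(
  Residue  = c("H", "R", "K", "D", "E", "H", "R", "K", "D", "E", "Z"),
  Position = 1:11,
  stringsAsFactors = FALSE
)

# Names are the residue codes, values are the property labels.
props <- c(H = "Basic", R = "Basic", K = "Basic", D = "Acidic", E = "Acidic")

# Index by name; as.character() guards against factor columns,
# which would otherwise be used as integer positions.
dat$Property <- unname(props[as.character(dat$Residue)])
dat
```

Getting NA for an unknown code such as "Z" can be preferable to silently labelling it, since it makes gaps in the lookup table visible.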

Return vertexes of separate subgraph

I have this graph:
library(igraph)
df <- data.frame(x = c('a', 'b', 'c'), y = c('d', 'c', 'f'))
g <- graph.data.frame(df, directed = FALSE)
Is there a way to return two lists of vertices according to which subgraph they belong to?
I'd like to get to this output:
vertex id
1 a 1
2 d 1
3 b 2
4 c 2
5 f 2
Thank you
See clusters(). By the way, what you are looking for are the components of the graph, and newer igraph versions provide the same function under the name components(). (The igraph terminology is confusing, too.)
data.frame(vertex=V(g)$name, id=clusters(g)$membership)
# vertex id
# 1 a 1
# 2 b 2
# 3 c 2
# 4 d 1
# 5 f 2
