Issues with plotting network in igraph - r

I am having some issues drawing a bipartite network in R with the igraph library. Here is my script:
library(igraph)
library(reshape2)
setwd("....")
getwd()
library(readxl)
network=read_excel("network1.xlsx")
print(network)
subjects=as.character(unlist(network[,1]))
agents=colnames(network[-1])
print(network)
network = network[,-1]
g=graph.incidence(network, weighted = T)
V(g)$type
V(g)$name=c(subjects,agents)
V(g)$color = V(g)$type
V(g)$color=gsub("FALSE","red",V(g)$color)
V(g)$color=gsub("TRUE","lightblue",V(g)$color)
plot(g, edge.arrow.width = 0.3,
vertex.size = 5,
edge.arrow.size = 0.5,
vertex.size2 = 5,
vertex.label.cex = 1,
vertex.label.color="black",
asp = 0.35,
margin = 0,
edge.color="grey",
edge.width=(E(g)$weight),
layout=layout_as_bipartite)
The network is plotted properly, but I have two issues:
(1) I don't understand the order in which the vertices are shown in the plot. They are not in the same order as in the Excel file, nor in alphabetical or numerical order; they seem to be in random order. How can I choose the order in which the vertices are placed?
(2) I don't understand why some vertices are closer together and some are farther apart. I would like all vertices to be equally spaced. How can I do that?
Thank you a lot for your invaluable help.

Since you do not provide your data, I will illustrate with a made-up example.
Sample graph data
library(igraph)
set.seed(123)
EL = matrix(c(sample(8,18, replace=T),
sample(LETTERS[1:6], 18, replace=T)), ncol=2)
g = simplify(graph_from_edgelist(EL))
V(g)$type = bipartite_mapping(g)$type
VCol = c("#FF000066", "#0000FF66")[as.numeric(V(g)$type)+1]
plot(g, layout=layout_as_bipartite(g), vertex.color=VCol)
As with your graph, this has two problems. The nodes are ordered arbitrarily
and the lower row is oddly spaced. Let's address those problems one at a time.
To do so, we will need to take control of the layout instead of using any of
the automated layout functions. A layout is simply a vcount(g) * 2 matrix
giving the x-y coordinates of the vertices for plotting. Here, I will put one
type of nodes in the top row by specifying the y coordinate as 1 and the other
nodes in a lower row by specifying y=0. We want to specify the order horizontally
by rank (alphabetically) within each group. So
LO = matrix(0, nrow=vcount(g), ncol=2)
LO[!V(g)$type, 2] = 1
LO[V(g)$type, 1] = rank(V(g)$name[V(g)$type])
LO[!V(g)$type, 1] = rank(V(g)$name[!V(g)$type])
plot(g, layout=LO, vertex.color=VCol)
Now both rows are ordered and evenly spaced, but because there are fewer
vertices in the bottom row, there is an unattractive, unbalanced look. We
can fix that by rescaling the x coordinates so that both rows span the same
width. I find it easier to get the scale factor right if the coordinates go
from 0 to (number of nodes) - 1 rather than 1 to (number of nodes) as above.
Doing this, we get
LO[V(g)$type, 1] = rank(V(g)$name[V(g)$type]) - 1
LO[!V(g)$type, 1] = (rank(V(g)$name[!V(g)$type]) - 1) *
(sum(V(g)$type) - 1) / (sum(!V(g)$type) - 1)
plot(g, layout=LO, vertex.color=VCol)
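The two steps above can be wrapped into one small function. This is only a sketch: `bipartite_layout` is a made-up helper name, not part of igraph, and it uses base R only. It orders each row by name and spreads both rows over the same range [0, 1].

```r
# Hypothetical helper (not an igraph function): build a two-row layout
# matrix from vertex names and the logical bipartite type vector.
bipartite_layout <- function(names, type) {
  lo <- matrix(0, nrow = length(names), ncol = 2)
  for (grp in c(FALSE, TRUE)) {
    idx <- which(type == grp)
    n <- length(idx)
    # x: evenly spaced in [0, 1], ordered by name; a lone vertex is centred
    lo[idx, 1] <- if (n > 1) (rank(names[idx]) - 1) / (n - 1) else 0.5
    # y: FALSE vertices in the top row, TRUE vertices in the bottom row
    lo[idx, 2] <- if (grp) 0 else 1
  }
  lo
}

# Small made-up example: three "subject" vertices and two "agent" vertices
lo <- bipartite_layout(c("2", "B", "1", "A", "3"),
                       c(FALSE, TRUE, FALSE, TRUE, FALSE))
print(lo)

# With a real graph this would be used as:
# plot(g, layout = bipartite_layout(V(g)$name, V(g)$type), vertex.color = VCol)
```

Note that `rank` on character names sorts them as text, so "10" comes before "2"; for numeric labels, rank `as.numeric(names[idx])` instead. Also, since plot rescales the layout to fit the plotting region by default, the vertical distance between the rows is probably governed more by the `asp` argument (0.35 in the question's plot call) than by the y values chosen here.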

Thank you very much. I ran your very helpful example, and with step one I got it to work properly with my data, keeping the different edge thicknesses and everything else as in my plot, but with the proper order. This is very important to me, thank you. However, I am having trouble understanding how to rescale the top and bottom rows properly with my data, because they always seem to be too close together. I probably did not completely understand the coordinates I have to work with. Here are my data.
> network=read_excel("network1.xlsx",2)
> dput(network)
structure(list(`NA` = c(2333, 2439, 2450, 2451, 2452, 2453, 2454,
2455, 2456, 2457, 2458, 2459, 2460, 2461, 2480, 2490, 2491, 2492,
2493, 2494, 2495), A = c(12, 2, 2, 5, 2, 0, 5, 3, 0, 0, 7, 0,
0, 0, 6, 2, 10, 7, 1, 2, 5), B = c(0, 1, 0, 1, 0, 0, 2, 0, 0,
0, 0, 0, 1, 0, 5, 0, 2, 0, 0, 0, 0), C = c(0, 0, 0, 0, 1, 0,
4, 0, 0, 0, 0, 1, 0, 0, 2, 0, 4, 4, 2, 1, 0), D = c(2, 0, 0,
0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 7, 0, 4, 0, 1, 4, 0), E = c(11,
2, 3, 3, 3, 8, 3, 6, 4, 1, 1, 0, 12, 0, 5, 0, 4, 6, 4, 8, 9),
F = c(2, 0, 0, 3, 1, 0, 10, 1, 0, 0, 0, 1, 0, 0, 9, 0, 0,
1, 1, 3, 3), G = c(0, 3, 1, 1, 0, 0, 0, 0, 0, 3, 2, 0, 0,
0, 1, 0, 0, 2, 0, 1, 0), H = c(0, 0, 2, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 1), I = c(0, 0, 0, 0, 0,
0, 3, 0, 6, 3, 0, 0, 1, 0, 7, 0, 0, 4, 1, 2, 0), J = c(0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-21L), .Names = c(NA, "A", "B", "C", "D", "E", "F", "G", "H",
"I", "J"))
> print(network)
NA A B C D E F G H I J
1 2333 12 0 0 2 11 2 0 0 0 0
2 2439 2 1 0 0 2 0 3 0 0 0
3 2450 2 0 0 0 3 0 1 2 0 0
4 2451 5 1 0 0 3 3 1 0 0 0
5 2452 2 0 1 0 3 1 0 0 0 0
6 2453 0 0 0 0 8 0 0 0 0 1
7 2454 5 2 4 2 3 10 0 1 3 0
8 2455 3 0 0 0 6 1 0 0 0 0
9 2456 0 0 0 0 4 0 0 0 6 0
10 2457 0 0 0 0 1 0 3 0 3 0
11 2458 7 0 0 0 1 0 2 0 0 0
12 2459 0 0 1 0 0 1 0 0 0 0
13 2460 0 1 0 0 12 0 0 0 1 0
14 2461 0 0 0 0 0 0 0 0 0 0
15 2480 6 5 2 7 5 9 1 2 7 1
16 2490 2 0 0 0 0 0 0 0 0 0
17 2491 10 2 4 4 4 0 0 0 0 0
18 2492 7 0 4 0 6 1 2 0 4 0
19 2493 1 0 2 1 4 1 0 0 1 0
20 2494 2 0 1 4 8 3 1 0 2 0
21 2495 5 0 0 0 9 3 0 1 0 0

Related

Create a new variable based on other columns values

I have a paneldata dataframe structure, something like this:
df <- data.frame("id" = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
"Status_2014" = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
"Status_2015" = c(0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0),
"Status_2016" = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
I want to generate a new dummy variable, that takes the value 1, if the rows contains 1 in any of the three columns or otherwise 0 if not. It should end up like this:
df <- data.frame("id" = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
"Status_2014" = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
"Status_2015" = c(0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0),
"Status_2016" = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
"Final_status" = c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0))
Can anyone help me achieve this?
We can use if_any on the columns selected by starts_with('Status') to check whether any value in a row equals 1. It returns TRUE if there is one and FALSE otherwise, which is coerced to binary with as.integer or a unary +.
library(dplyr)
df %>%
mutate(Final_status = +(if_any(starts_with('Status'), ~ . ==1)))
-output
id Status_2014 Status_2015 Status_2016 Final_status
1 1 1 0 0 1
2 1 1 0 0 1
3 1 1 0 0 1
4 1 1 0 0 1
5 2 0 1 0 1
6 2 0 1 0 1
7 2 0 1 0 1
8 2 0 1 0 1
9 3 0 0 0 0
10 3 0 0 0 0
11 3 0 0 0 0
12 3 0 0 0 0
Or using rowSums from base R
df$Final_status <- +(rowSums(df[-1] > 0) > 0)
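The base R one-liner above relies on every column except the first being a status column. A slightly more defensive sketch, assuming the status columns can be recognised by the `Status_` prefix used in the question, selects them by name first:

```r
df <- data.frame(id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
                 Status_2014 = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
                 Status_2015 = c(0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0),
                 Status_2016 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

# Pick the status columns by name rather than by position
status_cols <- grep("^Status_", names(df), value = TRUE)

# 1 if any status column in the row equals 1, otherwise 0
df$Final_status <- as.integer(rowSums(df[status_cols] == 1) > 0)
print(df)
```

This keeps working if other non-status columns are added to the data frame later.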
You can write an if condition that sets the variable to 1 or 0, and the most straightforward way to apply that condition would be inside a dplyr pipe.
I don't have the dplyr syntax in my head, since I have not used it for too long, but dplyr is what you want.
https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
best greetings

Dummy variables to factor [duplicate]

This question already has answers here:
For each row return the column name of the largest value
(10 answers)
Closed 2 years ago.
Hello, I am trying to create a new variable in my data set that combines the "education" dummies into a single variable holding their respective character strings, so I can use edu as a factor in a regression model.
I am not certain how to create a new variable "edu" with "edu4" in the first and second rows, and so on.
Help is much appreciated!
Since you did not provide the data set with dput, I built a small example myself.
dput(df)
structure(list(id = 1:10, edu1 = c(1, 0, 0, 0, 0, 0, 0, 0, 1,
0), edu2 = c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0), edu3 = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0), edu4 = c(0, 1, 1, 0, 1, 0, 0, 0, 0, 0),
edu5 = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 1)), class = "data.frame", row.names = c(NA,
-10L))
Solution
df$edu = factor(apply(df[,paste0("edu", 1:5)], 1, which.max))
Result
> df
id edu1 edu2 edu3 edu4 edu5 edu
1 1 1 0 0 0 0 1
2 2 0 0 0 1 0 4
3 3 0 0 0 1 0 4
4 4 0 0 0 0 1 5
5 5 0 0 0 1 0 4
6 6 0 1 0 0 0 2
7 7 0 0 0 0 1 5
8 8 0 1 0 0 0 2
9 9 1 0 0 0 0 1
10 10 0 0 0 0 1 5
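The result above stores the position (1 to 5) of the dummy that is set. If the column names themselves are wanted as factor levels, as the question seems to ask, one possible variant (a sketch using base R's max.col on the same example data) is:

```r
df <- data.frame(id = 1:10,
                 edu1 = c(1, 0, 0, 0, 0, 0, 0, 0, 1, 0),
                 edu2 = c(0, 0, 0, 0, 0, 1, 0, 1, 0, 0),
                 edu3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                 edu4 = c(0, 1, 1, 0, 1, 0, 0, 0, 0, 0),
                 edu5 = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 1))

edu_cols <- paste0("edu", 1:5)

# Index of the largest value in each row (first one wins on ties),
# translated back into the corresponding column name
which_edu <- max.col(as.matrix(df[edu_cols]), ties.method = "first")
df$edu <- factor(edu_cols[which_edu], levels = edu_cols)
print(df$edu)
```

Passing `levels = edu_cols` keeps unused categories such as edu3 as empty levels, which may or may not be desirable in a regression; `droplevels` removes them. As with which.max, a row in which all dummies are 0 ends up labelled edu1.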
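Try this: df is your data frame, and your edu variables are in columns 7 to 12, but we start from column 8. If all of the variables in columns 8 to 12 are 0, the row is labelled edu1.
edu_cols <- as.matrix(df[ ,8:12]) # %*% needs a numeric matrix, not a data frame
factor_variable <- factor(drop(edu_cols %*% seq_len(ncol(edu_cols))) + 1,
levels = seq_len(ncol(edu_cols) + 1), # keep every level so the labels always line up
labels = c("edu1", colnames(edu_cols)))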
Let me know if this worked.

grouped and stacked bar plots using plotly

I am new to plotly and not very good with R. I am trying to make stacked bar plots and ended up with very cumbersome code that I am sure could be simplified using RColorBrewer, and perhaps ggplot2 to group my stacked bar plots, but I am unsure how to do it.
Below is the data I used, which is in a data.frame called data2
Nation glider radar AUV ROV USV corer towed_eq Seismic_eq Drill_rig Manned_sub Other clean
1 Belgium 0 0 1 1 1 3 0 0 0 0 0 6
2 Bulgaria 0 0 0 0 0 0 1 0 0 1 0 2
3 Croatia 0 2 1 2 0 0 0 0 0 0 0 5
4 Cyprus 3 0 0 0 0 0 0 0 0 0 0 3
5 Estonia 0 0 0 1 0 0 0 0 0 0 0 1
6 Finland 1 0 0 0 0 0 0 0 0 0 0 1
7 France 11 2 3 1 0 1 1 3 0 1 0 23
8 Germany 18 3 3 4 0 0 1 4 2 1 0 36
9 Greece 1 0 0 3 0 0 0 0 0 0 0 4
10 Ireland 0 0 0 2 0 0 0 0 0 0 0 2
11 Italy 10 8 3 2 4 0 0 1 0 0 0 28
12 Malta 0 2 0 0 0 0 0 0 0 0 0 2
13 Netherlands 0 2 0 0 0 0 0 0 0 0 0 2
14 Norway 17 3 1 3 0 1 3 1 0 0 1 30
15 Poland 0 0 0 1 0 0 0 0 0 0 0 1
16 Portugal 0 3 6 6 4 2 1 0 0 2 1 25
17 Romania 0 0 0 1 0 0 0 0 0 0 0 1
18 Slovenia 0 1 0 0 0 0 0 0 0 0 0 1
19 Spain 12 17 2 1 0 0 0 2 0 0 0 34
20 Sweden 0 2 1 3 0 0 0 0 0 0 0 6
21 Turkey 0 0 0 0 0 0 0 0 0 2 0 2
22 United Kingdom 0 0 13 4 1 11 4 2 1 0 4 40
23 Unknown 5 0 0 0 0 0 0 0 0 0 0 5
And this is the code I used
fig <- plot_ly(data2, x = ~Nation, y = ~glider, type = 'bar', name = 'Glider')
fig <- fig %>% add_trace(y = ~radar, name = 'Radar', marker=list(color='rgb(26, 118, 255)'))
fig <- fig %>% add_trace(y = ~AUV, name = 'AUV',marker=list(color='rgb(255, 128, 0)'))
fig <- fig %>% add_trace(y = ~ROV, name = 'ROV',marker=list(color='rgb(204, 0, 0)'))
fig <- fig %>% add_trace(y = ~USV, name = 'USV',marker=list(color='rgb(51, 255, 153)'))
fig <- fig %>% add_trace(y = ~corer, name = 'Corer',marker=list(color='rgb(204, 0, 204)'))
fig <- fig %>% add_trace(y = ~towed_eq, name = 'Towed equipment',marker=list(color='rgb(255, 255, 51)'))
fig <- fig %>% add_trace(y = ~Seismic_eq, name = 'Seismic equipment',marker=list(color='rgb(255, 204, 229)'))
fig <- fig %>% add_trace(y = ~Drill_rig, name = 'Drill rig',marker=list(color='rgb(102, 255, 255)'))
fig <- fig %>% add_trace(y = ~Manned_sub, name = 'Manned submersible',marker=list(color='rgb(128, 255, 0)'))
fig <- fig %>% add_trace(y = ~Other, name = 'Other equipment',marker=list(color='rgb(153, 153, 0)'))
fig <- fig %>% layout(xaxis = list(title = "",tickfont = list(size = 14)), yaxis = list(title = 'Number of assets',tickfont = list(size = 14)), barmode = 'stack')
fig
Is there an easier way to code this using RColorBrewer instead of coding each color? And is it possible to group my stacked bar plots into Group 1 (glider, AUV, ROV, USV), Group 2 (corer, towed_eq, Seismic_eq, Drill_rig) and Group 3 (radar, Manned_sub, Other)?
You can try this approach by melting the data:
library(dplyr)
library(plotly)
library(tidyr)
library(RColorBrewer)
#Data
data <- structure(list(Nation = c("Belgium", "Bulgaria", "Croatia", "Cyprus",
"Estonia", "Finland", "France", "Germany", "Greece", "Ireland",
"Italy", "Malta", "Netherlands", "Norway", "Poland", "Portugal",
"Romania", "Slovenia", "Spain", "Sweden", "Turkey", "United Kingdom",
"Unknown"), glider = c(0, 0, 0, 3, 0, 1, 11, 18, 1, 0, 10, 0,
0, 17, 0, 0, 0, 0, 12, 0, 0, 0, 5), radar = c(0, 0, 2, 0, 0,
0, 2, 3, 0, 0, 8, 2, 2, 3, 0, 3, 0, 1, 17, 2, 0, 0, 0), AUV = c(1,
0, 1, 0, 0, 0, 3, 3, 0, 0, 3, 0, 0, 1, 0, 6, 0, 0, 2, 1, 0, 13,
0), ROV = c(1, 0, 2, 0, 1, 0, 1, 4, 3, 2, 2, 0, 0, 3, 1, 6, 1,
0, 1, 3, 0, 4, 0), USV = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0,
0, 0, 0, 4, 0, 0, 0, 0, 0, 1, 0), corer = c(3, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 11, 0), towed_eq = c(0,
1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 3, 0, 1, 0, 0, 0, 0, 0, 4,
0), Seismic_eq = c(0, 0, 0, 0, 0, 0, 3, 4, 0, 0, 1, 0, 0, 1,
0, 0, 0, 0, 2, 0, 0, 2, 0), Drill_rig = c(0, 0, 0, 0, 0, 0, 0,
2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0), Manned_sub = c(0,
1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0,
0), Other = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1,
0, 0, 0, 0, 0, 4, 0), clean = c(6, 2, 5, 3, 1, 1, 23, 36, 4,
2, 28, 2, 2, 30, 1, 25, 1, 1, 34, 6, 2, 40, 5)), row.names = c(NA,
-23L), class = "data.frame")
Now the code:
#First reshape ('clean' holds the row totals, so it is kept out of the stack)
df2 <- pivot_longer(data,cols = -c(Nation, clean))
#Plot
p <- plot_ly(df2, x = df2$Nation,
y = df2$value,
type = 'bar',
name = df2$name,
text = df2$value,
color = df2$name,
colors = brewer.pal(length(unique(df2$name)),
"Paired"))%>%
layout(barmode = 'stack',hoverlabel = list(bgcolor= 'white') ,bargap = 0.5) %>%
layout(xaxis = list(categoryorder = 'array',
categoryarray = df2$Nation), showlegend = T)
The output:
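The grouping part of the question is not covered by the code above. As a first step only, and as a sketch rather than a full plotly solution, each equipment type in the long data can be tagged with its group using a lookup vector. The group names and the tiny `long` data frame below are made up for illustration; the group membership is taken from the question.

```r
# Group membership as listed in the question
groups <- list(
  Group1 = c("glider", "AUV", "ROV", "USV"),
  Group2 = c("corer", "towed_eq", "Seismic_eq", "Drill_rig"),
  Group3 = c("radar", "Manned_sub", "Other")
)

# Named lookup vector: equipment name -> group name
lookup <- setNames(rep(names(groups), lengths(groups)),
                   unlist(groups, use.names = FALSE))

# Hypothetical slice of the long-format data (Nation / name / value)
long <- data.frame(Nation = "Belgium",
                   name = c("glider", "corer", "radar"),
                   value = c(0, 3, 0))
long$group <- unname(lookup[long$name])
print(long)
```

The resulting `group` column could then be used to split the data and draw one stacked bar chart per group; that step is left out here.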

cumsum according to certain restricts in r

I have a large data set of car accidents, and a sample of it is provided below.
accident is a binary variable indicating whether an accident happened.
shift_number is the number of the shift; 0 means the driver is taking a rest and is not on a shift.
time_diff is the amount of time at each observation.
df <- data.frame(
accident = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1),
shift_number = c(1, 1, 0, 0, 0, 2, 2, 2, 0, 0, 3, 3, 3, 3, 3),
time_diff = 3:17
)
My goal is to measure, for each accident, the total amount of working time since the driver started the current shift.
wanted <- data.frame(
accident = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1),
shift_number = c(1, 1, 0, 0, 0, 2, 2, 2, 0, 0, 3, 3, 3, 3, 3),
time_diff = 3:17,
cum_time = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 0, 0, 75)
)
Does anyone have ideas on solving this problem with R? A data.table or vectorised solution would be best, because I have a huge data set to deal with.
df$cum_time = 0
accident = which(df$accident == 1)
df$cum_time[accident] <- sapply(accident, function(x) {
sum(df$time_diff[(which.max(cumsum(df$shift_number[1:x] == 0)) + 1): x])
})
df
# accident shift_number time_diff cum_time
#1 0 1 3 0
#2 0 1 4 0
#3 0 0 5 0
#4 0 0 6 0
#5 0 0 7 0
#6 0 2 8 0
#7 0 2 9 0
#8 0 2 10 0
#9 0 0 11 0
#10 0 0 12 0
#11 0 3 13 0
#12 1 3 14 27
#13 0 3 15 0
#14 0 3 16 0
#15 1 3 17 75
We first set all values of cum_time to 0. We then find the indices where an accident occurred. For each of those indices x, we find the latest 0 in shift_number before x, sum the values of time_diff from the row after that 0 up to x, and assign the result to the corresponding index.
Use the ave function to compute the cumulative sum of time_diff by shift_number:
cumsum_by_shift <- ave(df$time_diff, df$shift_number, FUN=cumsum)
#[1] 3 7 5 11 18 8 17 27 29 41 13 27 42 58 75
Pick out elements of cumsum_by_shift where accidents occur:
cum_time <- ifelse(df$accident == 1, cumsum_by_shift, 0)
#[1] 0 0 0 0 0 0 0 0 0 0 0 27 0 0 75
Note the use of the vectorized ifelse function.
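Since the question asks for data.table, the same idea can be sketched there as well, assuming the data.table package is installed. Grouping by `rleid(shift_number)` instead of `shift_number` itself is a precaution added here: it treats each consecutive run as its own group, in case shift numbers are ever reused.

```r
library(data.table)

dt <- data.table(
  accident     = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1),
  shift_number = c(1, 1, 0, 0, 0, 2, 2, 2, 0, 0, 3, 3, 3, 3, 3),
  time_diff    = 3:17
)

# Running total of time_diff within each consecutive run of shift_number,
# kept only on rows where an accident occurred (accident is 0 or 1)
dt[, cum_time := cumsum(time_diff) * accident, by = rleid(shift_number)]
print(dt)
```

The update happens by reference, so no copy of the table is made, which should help with large data.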

Get the average every 10 steps in a vector in R [duplicate]

This question already has answers here:
Stats on every n rows for each column
(2 answers)
Closed 6 years ago.
I have a vector of values:
[1] 0 0 4 1 0 0 -1 1 1 0 -1 0 0 -2 0 0
[17] 1 2 0 2 0 1 1 1 0 1 -1 0 0 0 0 0
[33] 0 2 0 4 -2 0 0 -1 1 0 0 0 -1 -2 2 0
[49] -1 0 -1 0 3 0 0 -1 1 0 0 0 1 -3 0 -1
[65] 0 -1 0 1 1 0 1 -2 1 1 0 0 -1 -2 0 0
[81] 0 2 0 0 1 1 0 0 0 -1 -2 0 -1 -1 -1 -1
[97] 1 1 0 1
I would like to get the average every 10 steps (the average of the previous 10 numbers at that point), and thus produce a new vector with these averages. Since there are 100 values in the original vector this would give a new vector of length 10 (the 10 averages).
I know I can get access to the number at each 10th point using:
result <- my_vector[seq(1, length(my_vector), 10)]
But I need the average of the 10 previous points at that step, not just the number itself.
colMeans(matrix(x, 10))
[1] 0.4 0.7 0.8 0.2 0.0 0.4 -0.4 -0.4 -0.7 0.1
We turn the vector into a matrix with 10 rows, so that each column holds one block of 10 consecutive values, and use colMeans to find the mean of each block. We could also have used rowMeans, but since the matrix is populated column-wise by default we would have to add another argument, byrow=TRUE, and potentially hurt ourselves with all of the extra typing.
We can test our answer by explicitly finding the mean of a few of the subsetted vectors.
#Test
mean(x[1:10])
[1] 0.4
mean(x[11:20])
[1] 0.7
Data
x <- c(0, 1, 0, -1, 0, 0, 0, 2, 2, 0, -1, 2, 4, 0, 0, -1, 0, 0, 1,
2, 4, 0, 1, 0, 0, 0, -2, 3, 1, 1, 0, 1, 0, 0, 0, 1, -1, 1, 0,
0, 1, 0, 1, 1, -1, -1, -2, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1,
-1, -1, -1, 0, 0, 0, -2, 0, 0, 0, 0, 0, 0, 0, 0, -1, 1, -1, -1,
-2, 0, -2, -3, -2, -1, 0, 0, 2, 0, 0, -1, 0, 0, 0, -1, 0, -1,
1, 1, 0, 1)
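The matrix approach needs the vector length to be a multiple of 10. If that cannot be guaranteed, a possible alternative (a sketch; `block_means` is a made-up helper name) is to build a block index and average within it, so that the last block may simply be shorter:

```r
# Hypothetical helper: mean of each consecutive block of `size` values;
# the final block may contain fewer than `size` values
block_means <- function(x, size = 10) {
  block <- ceiling(seq_along(x) / size)  # 1,1,...,1,2,2,...
  as.vector(tapply(x, block, mean))
}

res <- block_means(1:25, size = 10)
print(res)
```

For a vector whose length is a multiple of the block size, this gives the same values as colMeans(matrix(x, 10)).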

Resources