Plot conditional colors based on rows - r

I have two data.frames called outlier and data.
outlier just holds the row numbers that need to be coloured.
data has 1000 rows and two columns called x and y.
If a row number exists in outlier, I want the corresponding dot in the plot to be red; otherwise it should be black:
plot(data$x, data$y, col=ifelse(??,"red","black"))
What should go in place of the ?? above?

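Hi, this works for me using ifelse(); let me know what you think:
set.seed(1)  # make the random example reproducible
outlier <- sample(1:100, 50)  # row numbers to highlight
data <- data.frame(x = 1:100, y = rnorm(n = 100))
plot(
  data$x, data$y,
  # seq_len(nrow(data)) gives the row positions; for a freshly created
  # data frame these match row.names(data)
  col = ifelse(seq_len(nrow(data)) %in% outlier, "red", "black"),
  pch = 19
)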
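One caveat: the question describes outlier as a data.frame, not a plain vector. %in% compares against the elements of its right-hand side, and the elements of a data frame are its columns, so the comparison does not work as intended unless the row numbers are pulled out first, e.g. with outlier[[1]]. A minimal sketch, assuming the row numbers sit in the first column of outlier (the column name row and the data below are made up for illustration):

```r
# Hypothetical example data: 1000 rows, as in the question
set.seed(42)
data <- data.frame(x = rnorm(1000), y = rnorm(1000))

# Hypothetical outlier table: a single column holding row numbers of `data`
outlier <- data.frame(row = c(3, 17, 250, 999))

# TRUE for every row whose position appears in the outlier table
is_outlier <- seq_len(nrow(data)) %in% outlier[[1]]

point_cols <- ifelse(is_outlier, "red", "black")
plot(data$x, data$y, col = point_cols, pch = 19)
```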

I think this can be accomplished by creating a new color column in your data frame:
data$color <- "black"
Then set the outliers to a different value:
data[outlier,"color"] <- "red"
I don't have your exact data, but I think I got something similar to what you want using the following:
outlier <- c(1, 2, 7, 9)
data <- data.frame(x=c(1,2,3,4,5,6,7,8,9,10),
y=c(1,2,3,4,5,6,7,8,9,10))
data$color <- "black"
data[outlier,"color"] <- "red"
data
x y color
1 1 1 red
2 2 2 red
3 3 3 black
4 4 4 black
5 5 5 black
6 6 6 black
7 7 7 red
8 8 8 black
9 9 9 red
10 10 10 black
Finally plot using the new value in data:
plot(data$x, data$y, col=data$color)
This plots the points in rows 1, 2, 7 and 9 in red and all other points in black.

Related

Changing edge color based on attribute

I'm working on a visual representation of a network in R, using the igraph package.
I have a data set with the links between all the nodes and, for each link/edge, the district it is assigned to.
I would like to set the color of each edge based on the district it is assigned to. The table below shows the structure of the data.
nodei  nodej  depot1  depot2
4      5      1       0
In this case the link (4-5) is assigned to depot1, so its edge should be colored, for example, green.
Here is an example in which edges incident to vertex 1 are colored "red" and all other edges "green":
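library(igraph)  # igraph also re-exports the magrittr pipe %>%
g <- make_ring(5)
g %>%
  set_edge_attr(
    name = "color",
    # ends() returns the two endpoints of each edge; an edge gets "red"
    # if either endpoint is vertex 1, "green" otherwise
    value = c("green", "red")[1 + (rowSums(ends(., E(.)) == "1") > 0)]
  ) %>%
  plot()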
In your case, you could replace "1" with "depot1" and try it.
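Since the question's table already has one row per edge, another option is to let graph_from_data_frame() turn the extra columns into edge attributes and derive the color from depot1 directly. A minimal sketch, assuming depot1 is a 0/1 indicator; only the 4-5 row comes from the question, the other edges are made up:

```r
library(igraph)

# Hypothetical edge table in the question's layout (only the 4-5 row is real)
links <- data.frame(
  nodei  = c(4, 1, 2, 3),
  nodej  = c(5, 2, 3, 4),
  depot1 = c(1, 0, 1, 0),
  depot2 = c(0, 1, 0, 1)
)

# The first two columns become the edge endpoints; the remaining
# columns (depot1, depot2) become edge attributes
g <- graph_from_data_frame(links, directed = FALSE)

# Green for edges assigned to depot1, red otherwise
E(g)$color <- ifelse(E(g)$depot1 == 1, "green", "red")
plot(g)
```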
I tried to understand your question; I created a node type to color both the edges and the nodes.
library(igraph)
data <- read.table(text = "
N D type
1 6 A
3 7 B
7 8 A
4 5 B
7 10 A
4 6 B
1 7 A
6 8 B
7 9 B
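6 10 A ", header = TRUE)
# One row per distinct vertex (9 vertices here)
nodes <- data.frame(id = unique(c(data$N, data$D)))
# Alternate vertex types A/B; rep() makes the vector exactly nrow(nodes) long
nodes$type <- rep(c("A", "B"), length.out = nrow(nodes))
# Optional: nodes$x and nodes$y (one value per vertex) would fix the layout
nodes
# Edge table first (N -> D; the extra column `type` becomes an edge attribute)
G <- graph_from_data_frame(data, vertices = nodes)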
V(G)$color <- ifelse( V(G)$type == "A", "red", "green")
E(G)$color <- ifelse( E(G)$type == "A", "red","green")
edge_attr(G)
vertex_attr(G)
plot(G)

Partitioning Data creates unexpected results

I am trying to partition my data into a 60% training and 40% test set using the following code.
split <- sample.split(divdat, SplitRatio = 0.6)
split
train.div <- subset(divdat, split == "TRUE")
test.div <- subset(divdat, split == "FALSE")
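However, this code splits my data as if it were 50/50: I have two hundred observations and I get 100 in each set. Any ideas what I am doing wrong here?
sample.split() does not split by row; it expects a vector, typically the column that stores the labels, and returns one TRUE/FALSE per element of that vector while keeping the label proportions similar in both sets. When you pass the whole data frame, it treats each column as one element, so split ends up with one value per column, which subset() then recycles over the rows. Pass the label column as the first argument instead and you will get a 60/40 ratio of training to test rows. For example: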
library(caTools)
divdat <- data.frame(id = 1:10, chars = letters[1:10], labels = c("X", "Y"))
split <- sample.split(divdat$labels, SplitRatio = 0.6)
train.div <- subset(divdat, split)   # rows where split is TRUE
test.div  <- subset(divdat, !split)  # the remaining rows
train.div
test.div
Output:
> train.div
id chars labels
2 2 b Y
3 3 c X
5 5 e X
6 6 f Y
9 9 i X
10 10 j Y
> test.div
id chars labels
1 1 a X
4 4 d Y
7 7 g X
8 8 h Y
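If you simply want 60% of the rows at random, without stratifying on a label column, base R's sample() on the row indices is enough. A minimal sketch, assuming 200 observations as in the question (the data itself is made up):

```r
set.seed(123)
# Made-up data with 200 observations, as in the question
divdat <- data.frame(id = 1:200, value = rnorm(200))

# Draw 60% of the row indices for training; the rest form the test set
train_idx <- sample(nrow(divdat), size = floor(0.6 * nrow(divdat)))
train.div <- divdat[train_idx, ]
test.div  <- divdat[-train_idx, ]

nrow(train.div)  # 120
nrow(test.div)   # 80
```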

Match vertex and edge color in igraph

I have a large data set that I want to represent as a network graph using igraph, but I don't understand how to get the colors right. Is it possible to get an igraph plot in which the edges have the same color as the vertices? In my example below, I would like to color vertices and edges according to the status 'sampled' or 'unsampled'. Another problem is that not all the edges appear in the plot, and I don't understand why.
My code so far is:
library(igraph)
d <- data.frame(individual=c(1:10), mother_id = c(0,0,0,0,0,1,3,7,6,7), father_id = c(0,0,0,0,0,4,1,6,7,6) , generation = c(0,0,0,0,0,1,1,2,2,2), status=c("sampled","unsampled","unsampled","sampled",'sampled',"sampled","unsampled","unsampled","sampled",'sampled'))
#Just some settings for layout plot
g <- d$generation
n <- nrow(d)
pos <- matrix(data = NA, nrow = n, ncol = 2)
pos[, 2] <- max(g) - g
pos[, 1] <- order(g, partial = order(d$individual, decreasing = TRUE)) - cumsum(c(0, table(g)))[g + 1]
#Plotting the igraph
G <- graph_from_data_frame(d)
plot(G, rescale = T, vertex.label = d$individual, layout = pos,
edge.arrow.mode = "-",
vertex.color = d$status,
edge.color = d$status,
asp = 0.35)
My question is somewhat similar to "Ggraph node color to match edge color", but I would like to do it with the igraph package.
Thanks for your help.
If you plot(G), you will see that the graph built from the data frame is most likely not what you expect. That is why you don't see all the edges (i.e. the column father_id is not used at all).
By default, graph_from_data_frame() takes the first column as "from" and the second one as "to", and treats the remaining columns as edge attributes. That is why you see the edges 1 -> 0, 2 -> 0 and so on.
You can fix this by passing in two data frames: one with the edges and their attributes, and one with the vertices and their attributes.
It is not entirely clear to me where the edges should be, but your code should look something like this:
library(igraph)
dd <- read.table(text = "
from to type
1 6 A
3 7 B
7 8 A
6 9 B
7 10 A
4 6 B
1 7 A
6 8 B
7 9 B
6 10 A ", header = TRUE)
nodes <- data.frame(id = unique(c(dd$from, dd$to)))
nodes$type <- sample(LETTERS[1:2], 8, replace = TRUE)  # random vertex type, A or B
nodes$x <- c(8, 3, 5, 7, 1, 2, 4, 10)  # x and y are picked up as the layout
nodes$y <- c(1, 2, 4, 5, 6, 8, 5, 7)
nodes
id type x y
1 1 B 8 1
2 3 A 3 2
3 7 B 5 4
4 6 A 7 5
5 4 A 1 6
6 8 B 2 8
7 9 A 4 5
8 10 A 10 7
G <- graph_from_data_frame(dd, vertices = nodes ) # directed T or F?
V(G)$color <- ifelse( V(G)$type == "A", "pink", "skyblue")
E(G)$color <- ifelse( E(G)$type == "A", "pink", "skyblue")
edge_attr(G)
vertex_attr(G)
plot(G)
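Applied to the question's pedigree data, the same two-table approach might look like the sketch below. It assumes that edges run from each known parent (mother_id or father_id, where 0 means unknown) to the child, and that each edge should take the color of its parent vertex; both are assumptions, since the question does not say which vertex an edge should match.

```r
library(igraph)

# The question's pedigree data
d <- data.frame(
  individual = 1:10,
  mother_id  = c(0, 0, 0, 0, 0, 1, 3, 7, 6, 7),
  father_id  = c(0, 0, 0, 0, 0, 4, 1, 6, 7, 6),
  generation = c(0, 0, 0, 0, 0, 1, 1, 2, 2, 2),
  status     = c("sampled", "unsampled", "unsampled", "sampled", "sampled",
                 "sampled", "unsampled", "unsampled", "sampled", "sampled"),
  stringsAsFactors = FALSE
)

# One edge per known parent -> child link; 0 marks an unknown parent
edges <- rbind(
  data.frame(from = d$mother_id, to = d$individual),
  data.frame(from = d$father_id, to = d$individual)
)
edges <- edges[edges$from != 0, ]

# Vertex table: first column = vertex name, other columns = vertex attributes
G <- graph_from_data_frame(edges, vertices = d)

pal <- c(sampled = "pink", unsampled = "skyblue")
V(G)$color <- unname(pal[V(G)$status])

# Assumption: each edge takes the color of its parent (the "from" vertex)
parent_id <- ends(G, E(G), names = FALSE)[, 1]
E(G)$color <- V(G)$color[parent_id]

plot(G, edge.arrow.mode = "-")
```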

Identify and plot datapoints surrounded by NAs

I am using ggplot2 and geom_line() to make a line plot of a large number of time series. The dataset has a high number of missing values, and I am generally happy that lines are not drawn across missing segments, as this would look awkward.
My problem is that single non-NA data points surrounded by NAs (or points at the beginning/end of a series with an NA on the other side) are not plotted. A potential solution would be adding geom_point() for all observations, but this increases my file size tenfold and makes the plot harder to read.
Thus, I want to identify only those datapoints that do not get shown with geom_line() and add points only for those. Is there a straightforward way to identify these points?
My data is currently in long format, and the following MWE can serve as an illustration. I want to identify rows 1 and 7 so that I can plot them:
library(ggplot2)
set.seed(1)
dat <- data.frame(time=rep(1:5,2),country=rep(1:2,each=5),value=rnorm(10))
dat[c(2,6,8),3] <- NA
ggplot(dat) + geom_line(aes(time,value,group=country))
> dat
time country value
1 1 1 -0.6264538
2 2 1 NA
3 3 1 -0.8356286
4 4 1 1.5952808
5 5 1 0.3295078
6 1 2 NA
7 2 2 0.4874291
8 3 2 NA
9 4 2 0.5757814
10 5 2 -0.3053884
You can use zoo::rollapply() to create a new column that keeps only the values surrounded by NAs. Then you can simply plot those points. For example:
library(zoo)
library(ggplot2)
foo <- data.frame(time =c(1:11), value = c(1 ,NA, 3, 4, 5, NA, 2, NA, 4, 5, NA))
# Perform sliding window processing
val <- c(NA, NA, foo$value, NA, NA) # Add NA at the ends of vector
val <- rollapply(val, width = 3, FUN = function(x){
if (all(is.na(x) == c(TRUE, FALSE, TRUE))){
return(x[2])
} else {
return(NA)
}
})
foo$val_clean <- val[c(-1, -length(val))] # Remove first and last values
foo$val_clean
ggplot(foo) + geom_line(aes(time, value)) + geom_point(aes(time, val_clean))
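The question's own data holds several series, so the check has to be done per country. An alternative sketch that uses dplyr's lag() and lead() instead of rollapply(), assuming the rows can be ordered by time within each country:

```r
library(dplyr)
library(ggplot2)

# The question's example data
set.seed(1)
dat <- data.frame(time = rep(1:5, 2), country = rep(1:2, each = 5), value = rnorm(10))
dat[c(2, 6, 8), 3] <- NA

# A point is isolated when it is non-missing and both neighbours within its
# series are missing; lag()/lead() return NA at the ends of each series
dat <- dat %>%
  group_by(country) %>%
  arrange(time, .by_group = TRUE) %>%
  mutate(isolated = !is.na(value) &
           is.na(dplyr::lag(value)) &
           is.na(dplyr::lead(value))) %>%
  ungroup()

which(dat$isolated)  # 1 7

# Lines as before, plus points only where geom_line() draws nothing
p <- ggplot(dat, aes(time, value, group = country)) +
  geom_line() +
  geom_point(data = subset(dat, isolated))
print(p)
```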
Do you mean something like this?
library(tidyverse)
dat %>%
na.omit() %>%
ggplot() +
geom_line(aes(time, value, group = country))

Plot every 10 data points of a vector in a different color in R

I have a one-dimensional vector in R which I would like to plot so that every 10 data points have a different color.
How do I do this in R with the normal plot function, with ggplot and with plotly?
In base R you can try this.
I changed the data a little bit compared to the other answer.
# The data
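set.seed(2017)
df <- data.frame(x = 1:100, y = 0.001 * 1:100 + runif(100))
nCol <- 10
# One color per block of 10 points; rainbow(nCol) gives 10 distinct colors,
# whereas integer codes above 8 would wrap around the default palette
df$col <- rep(rainbow(nCol), each = 10)
# base R plot
plot(df[1:2])  # add `type = "n"` to remove the points
# Draw each segment i -> i + 1 in the color of point i
invisible(sapply(seq_len(nrow(df) - 1), function(i)
  lines(df[i + 0:1, 1:2], col = df$col[i], lwd = 2)))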
Because lines() draws a whole polyline in a single color (only the first value of col is used), you have to loop over the rows (here with sapply()) and draw each segment separately.
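Unlike lines(), segments() is vectorized over its col argument, so the loop can be avoided by drawing all segments in one call. A minimal sketch with the same kind of data; the rainbow() palette is a choice made here, not something the question prescribes:

```r
set.seed(2017)
df <- data.frame(x = 1:100, y = 0.001 * 1:100 + runif(100))

# One color per block of 10 points
nCol <- 10
df$col <- rep(rainbow(nCol), each = 10)

n <- nrow(df)
plot(df$x, df$y, type = "n", xlab = "x", ylab = "y")  # empty plot with the right axes
# Segment i joins point i to point i + 1 and takes point i's color
segments(x0 = df$x[-n], y0 = df$y[-n],
         x1 = df$x[-1], y1 = df$y[-1],
         col = df$col[-n], lwd = 2)
```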
Here is a ggplot solution; unfortunately you don't provide sample data, so I'm generating some random data.
# Sample data
set.seed(2017)
df <- data.frame(x = 1:100, y = 0.001 * 1:100 + runif(100))
# The number of different colours
nCol <- 5
# Blocks of 10 points, cycling through the nCol colours over all 100 rows
df$col <- rep(rep(1:nCol, each = 10), length.out = nrow(df))
# ggplot
library(tidyverse)
ggplot(df, aes(x = x, y = y, col = as.factor(col), group = 1)) +
  geom_line()
For plotly, just wrap the ggplot call in plotly::ggplotly().
This answer doesn't show you how to do it in a specific plotting package, but instead shows how to assign random colors to your data according to your specifications. The benefit of this approach is that it gives you control over which colors you use if you choose.
library(dplyr) # assumed okay given ggplot2 mention
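df = tibble(v1 = rnorm(100))  # tibble() replaces the deprecated data_frame()
n = nrow(df)
df$group = (1:n - (1:n %% -10)) / 10  # same result as ceiling(1:n / 10)
colors = sample(colors(), max(df$group), replace = FALSE)  # one random color per group
df$color = colors[df$group]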
df %>% group_by(group) %>% filter(row_number() <= 2) %>% ungroup()
# A tibble: 20 x 3
v1 group color
<dbl> <dbl> <chr>
1 -0.6941434087 1 lightsteelblue2
2 -0.4559695973 1 lightsteelblue2
3 0.7567737300 2 darkgoldenrod2
4 0.9478937275 2 darkgoldenrod2
5 -1.2358486079 3 slategray3
6 -0.7068140340 3 slategray3
7 1.3625895045 4 cornsilk
8 -2.0416315923 4 cornsilk
9 -0.6273386846 5 darkgoldenrod4
10 -0.5884521130 5 darkgoldenrod4
11 0.0645078975 6 antiquewhite1
12 1.3176727205 6 antiquewhite1
13 -1.9082708004 7 khaki
14 0.2898018693 7 khaki
15 0.7276799336 8 greenyellow
16 0.2601492048 8 greenyellow
17 -0.0514811315 9 seagreen1
18 0.8122600269 9 seagreen1
19 0.0004641533 10 darkseagreen4
20 -0.9032770589 10 darkseagreen4
The code above first creates a fake dataset with 100 rows and sets n to 100. df$group is computed from the row numbers (1:n) with a rather convoluted expression that yields c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, ..., 10). It then samples from the colors available in base R, returning as many colors as there are groups (max(df$group)), and uses the group variable to index that color vector. The final output shows just the first two rows of each group, to show that the color is the same within a group but different between groups. The color column can then be passed as a variable to your plotting functions.
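Because the color column holds actual color names, it can be passed to a plotting function unchanged. A minimal sketch for base graphics and ggplot2 with a freshly generated df; here the groups are built with ceiling(), which gives the same grouping as the expression above, and scale_colour_identity() tells ggplot2 to use the stored names as colors instead of mapping them to a palette:

```r
library(ggplot2)

set.seed(1)
n <- 100
df <- data.frame(v1 = rnorm(n))
df$group <- ceiling(seq_len(n) / 10)             # 1,...,1, 2,...,2, ..., 10
group_colors <- sample(colors(), max(df$group))  # one random color per group
df$color <- group_colors[df$group]

# Base R: col accepts a vector of color names directly
plot(df$v1, col = df$color, pch = 19)

# ggplot2: use the stored color names verbatim
p <- ggplot(df, aes(x = seq_along(v1), y = v1, colour = color)) +
  geom_point() +
  scale_colour_identity()
print(p)
```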
