geom_polygon with multiple holes - r

I am following the answer to this question and have an additional question.
I have modified the code as below:
library(ggplot2)
ids <- letters[1:2]
# IDs and values to use for fill colour
values <- data.frame(
id = ids,
value = c(4,5)
)
# Polygon position
positions <- data.frame(
id = c(rep(ids, each = 10),rep("b",5)),
# shape hole shape hole
x = c(1,4,4,1,1, 2,2,3,3,2, 5,10,10,5,5, 6,6,7,7,6, 8,8,9,9,8),
y = c(1,1,4,4,1, 2,3,3,2,2, 5,5,10,10,5, 6,7,7,6,6, 8,9,9,8,8)
)
# Merge positions and values
datapoly <- merge(values, positions, by=c("id"))
chart <- ggplot(datapoly, aes(x=x, y=y)) +
geom_polygon(aes(group=id, fill=factor(value)),colour="grey") +
scale_fill_discrete("Key")
And gives the following output:
There is a line passing through the two colored boxes, which I don't like. How can I remove it? Thanks.

The solution I came up with years ago for drawing holes is to make sure that after each hole your x,y coordinates return to the same place. This stops the line buzzing all around and crossing other polygons and leaving open areas that the winding number algorithm doesn't fill (or does fill when it shouldn't).
So, if you have a data set where the first 27 points are your outer, and then you've got three holes of 5, 6, and 7 points, construct a new dataset which is:
newdata = data[c(1:27,28:32,27,33:38,27,39:45,27),] # untested
Note how it jumps back to point 27 after each hole. Make sure your holes go in the clockwise direction (I think).
Then draw using newdata but only filling, not drawing outlines. If you want outlines, add them later (using the original data grouped by ring id)
You can sometimes get very very thin artifacts where the outgoing line to the hole isn't quite drawn the same as the incoming line, but they are hardly noticeable. Blame Bresenham.
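The stitching described above can be sketched in base R as follows. The helper name stitch_rings is hypothetical, and the ring data is an assumption taken from polygon "b" in the question rather than from this answer. The plotting step only runs if ggplot2 is installed.

```r
# Hypothetical helper: append each hole to the outer ring and jump back to
# the outer ring's last vertex after every hole.
stitch_rings <- function(outer, holes) {
  anchor <- outer[nrow(outer), , drop = FALSE]
  parts <- list(outer)
  for (hole in holes) {
    parts <- c(parts, list(hole, anchor))
  }
  stitched <- do.call(rbind, parts)
  rownames(stitched) <- NULL
  stitched
}

# Outer ring and the two holes of polygon "b" from the question
outer <- data.frame(x = c(5, 10, 10, 5, 5), y = c(5, 5, 10, 10, 5))
holes <- list(
  data.frame(x = c(6, 6, 7, 7, 6), y = c(6, 7, 7, 6, 6)),
  data.frame(x = c(8, 8, 9, 9, 8), y = c(8, 9, 9, 8, 8))
)
stitched <- stitch_rings(outer, holes)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Keep the original rings, tagged by ring id, for drawing outlines
  rings <- c(list(outer), holes)
  outlines <- do.call(rbind, lapply(seq_along(rings), function(i) {
    cbind(rings[[i]], ring = i)
  }))
  # Fill the stitched path without an outline, then outline each ring
  p <- ggplot(stitched, aes(x, y)) +
    geom_polygon(fill = "steelblue") +
    geom_path(data = outlines, aes(group = ring), colour = "grey")
}
```

Here the outer ring runs counter-clockwise and the holes clockwise, which matches the direction advice above. Whether that is strictly required may depend on the fill rule of the graphics device.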

Try this one. It drops colour="grey", so only the fill is drawn and the connecting line is no longer stroked:
ggplot(datapoly, aes(x=x, y=y)) +
geom_polygon(aes(group=id, fill=factor(value))) +
scale_fill_discrete("Key")

Related

Scatter plot with lat long points with respect to a given point at the center of the plot

I am trying to plot lat long points on a plot using ggplot2 in R. The axes are lats on y and longs on x. I want a given location point to be the center of my plot and rest of the points on the scatter plot with respect to how far they are from this point (these points' lat long values are coming from a data frame).
How can that particular point be at the center of it all? I tried making two separate geom_point layers, adding the one point I want in the center first and then adding a second layer with the rest of the data, but it doesn't work. I also tried coord_fixed, using the lat long limits from the plot that contained only the center point, but after adding the second layer the point does not remain in the center. I also wonder why there is no function or attribute to set the center of the plot around a particular point, so that the rest of the points can fall wherever on the plot while the focus point stays where I want it, but maybe that is too specific a thing.
Also, could the units on the axes be converted to meters?
The easiest way to do this is to figure out what you want the range of your x axis and the range of your y axis to be. Measure the distance along the x axis to the furthest point from your target point, and make sure the x axis range extends this far on both sides of the target. Do the same for the y axis.
To demonstrate, I'll make a random sample of points with x and y co-ordinates, each made small and black:
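library(ggplot2)
set.seed(1234) # Makes this example reproducible
# stringsAsFactors = FALSE keeps colour as character in R < 4.0, so "red" can be assigned later
df <- data.frame(x = rnorm(200), y = rnorm(200), colour = "black", size = 1, stringsAsFactors = FALSE)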
Now I'll choose one at random as my target point, making it big and red:
point_of_interest <- sample(200, 1)
df$colour[point_of_interest] <- "red"
df$size[point_of_interest] <- 5
So let's work out the furthest points North-South and East-West from our target and calculate a range which would include all points but have the target in the centre:
max_x_diff <- max(abs(df$x[point_of_interest] - df$x))
max_y_diff <- max(abs(df$y[point_of_interest] - df$y))
x_range <- df$x[point_of_interest] + c(-max_x_diff, max_x_diff)
y_range <- df$y[point_of_interest] + c(-max_y_diff, max_y_diff)
And now we just need to plot:
ggplot(df, aes(x, y, colour = colour, size = size)) +
geom_point() +
scale_colour_identity() +
lims(x = x_range, y = y_range) +
scale_size_identity() +
coord_equal()
We can see that even though our target is well off the center of the cluster, the target remains in the center of the plot.
With regards to changing latitude and longitude to meters, this requires a co-ordinate transformation. This has been answered many times on Stack Overflow and I won't duplicate those answers here. You could check out packages like rgdal or perhaps SpatialEpi which has the latlong2grid function.
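For a small area, a rough sketch of such a conversion is the equirectangular approximation around the center point, shown below in base R. This is not a substitute for a proper projection from a spatial package. The function name latlong_to_local_m, the spherical-Earth radius, and the sample coordinates are all assumptions made for illustration.

```r
# Hypothetical helper: equirectangular approximation around a centre point.
# Assumes a spherical Earth (mean radius 6,371 km) and a small study area.
latlong_to_local_m <- function(lat, lon, lat0, lon0, radius_m = 6371000) {
  to_rad <- pi / 180
  data.frame(
    east_m = radius_m * (lon - lon0) * to_rad * cos(lat0 * to_rad),
    north_m = radius_m * (lat - lat0) * to_rad
  )
}

# Made-up coordinates: the first row is the centre point itself
pts <- data.frame(lat = c(51.50, 51.51, 51.49), lon = c(-0.12, -0.10, -0.15))
local_m <- latlong_to_local_m(pts$lat, pts$lon, lat0 = pts$lat[1], lon0 = pts$lon[1])
```

The resulting east_m and north_m columns have the center point at (0, 0), so the symmetric-range approach above can be applied to them directly.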

Plotting lines between two points in ggplot2

I'm looking for a way to represent a vector coming off of a point given angle and magnitude in ggplot. I've calculated what the endpoint of these vectors should be, but can't figure out a way to plot this properly in ggplot2. In short, given an observation with (X,Y,vec.x,vec.y), how can I plot a line from (X,Y) to (vec.x,vec.y) that does not show (vec.x,vec.y)?
My first instinct was to use geom_line, but this seems to rely on connecting different observations, so I would need to separate each observation into two observations, one with the original point and one with the vector endpoint. However, this seems fairly messy and like there should be a cleaner way to achieve this. Furthermore, this would make it complicated to show the original points but hide the vector points, as they would be plotted within the same geom_point call.
Here's a sample dataset in the form I'm talking about:
library(ggplot2)
library(dplyr) # provides tibble() and %>%
test <- tibble(
x = c(1,2,3,4,5),
y = c(5,4,3,2,1),
vec.x = c(1.5,2.5,3.5,4.5,5.5),
vec.y = c(4,3,2,1,0)
)
test %>%
ggplot() +
geom_point(aes(x=x,y=y),color='red') +
geom_point(aes(x=vec.x,y=vec.y),color='blue')
What I'm hoping to achieve is this, but without the blue dots:
Any thoughts? Apologies if this is a duplicated issue. I did some Googling and was unable to find a similar question for ggplot.
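You can use geom_segment, which draws a line from (x, y) to (xend, yend) within a single observation. Leave out the second geom_point layer so that the endpoints are not shown:
test %>%
ggplot() +
geom_point(aes(x=x,y=y),color='red') +
geom_segment(
aes(x = x,y = y, xend = vec.x,yend = vec.y),
arrow = arrow(length = unit(0.03,units = "npc")),
size = 1 # in ggplot2 >= 3.4.0, use linewidth = 1 instead
)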
Reference: https://ggplot2.tidyverse.org/reference/geom_segment.html
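The question says the endpoints were already calculated from an angle and a magnitude. For completeness, here is a minimal base R sketch of one way to derive them. The column names angle and magnitude, the choice of radians measured counter-clockwise from the x axis, and the sample values are all assumptions.

```r
# Hypothetical columns: angle in radians (counter-clockwise from the x axis)
# and magnitude in the same units as x and y.
vectors <- data.frame(
  x = c(1, 2, 3),
  y = c(5, 4, 3),
  angle = c(0, pi / 2, pi),
  magnitude = c(1, 2, 0.5)
)

# Endpoint = start point + magnitude along the direction given by the angle
vectors$vec.x <- vectors$x + vectors$magnitude * cos(vectors$angle)
vectors$vec.y <- vectors$y + vectors$magnitude * sin(vectors$angle)
```

The vec.x and vec.y columns can then be passed as xend and yend to geom_segment.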

Dot Priority in ggplot2 jittered scatterplot [duplicate]

I'm plotting a dense scatter plot in ggplot2 where each point might be labeled by a different color:
df <- data.frame(x=rnorm(500))
df$y = rnorm(500)*0.1 + df$x
df$label <- c("a")
df$label[50] <- "point"
df$size <- 2
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size))
When I do this, the scatter point labeled "point" (green) is plotted on top of the red points which have the label "a". What controls this z ordering in ggplot, i.e. what controls which point is on top of which?
For example, what if I wanted all the "a" points to be on top of all the points labeled "point" (meaning they would sometimes partially or fully hide that point)? Does this depend on alphanumerical ordering of labels?
I'd like to find a solution that can be translated easily to rpy2.
2016 Update:
The order aesthetic has been deprecated, so at this point the easiest approach is to sort the data.frame so that the green point is at the bottom, and is plotted last. If you don't want to alter the original data.frame, you can sort it during the ggplot call - here's an example that uses %>% and arrange from the dplyr package to do the on-the-fly sorting:
library(dplyr)
ggplot(df %>%
arrange(label),
aes(x = x, y = y, color = label, size = size)) +
geom_point()
Original 2015 answer for ggplot2 versions < 2.0.0
In ggplot2, you can use the order aesthetic to specify the order in which points are plotted. The last ones plotted will appear on top. To apply this, you can create a variable holding the order in which you'd like points to be drawn.
To put the green dot on top by plotting it after the others:
df$order <- ifelse(df$label=="a", 1, 2)
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size, order=order))
Or to plot the green dot first and bury it, plot the points in the opposite order:
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size, order=-order))
For this simple example, you can skip creating a new sorting variable and just coerce the label variable to a factor and then a numeric:
ggplot(df) +
geom_point(aes(x=x, y=y, color=label, size=size, order=as.numeric(factor(label))))
ggplot2 will create plots layer-by-layer and within each layer, the plotting order is defined by the geom type. The default is to plot in the order that they appear in the data.
Where this is different, it is noted. For example
geom_line
Connect observations, ordered by x value.
and
geom_path
Connect observations in data order
There are also known issues regarding the ordering of factors, and it is interesting to note the response of the package author Hadley
The display of a plot should be invariant to the order of the data frame - anything else is a bug.
With this quote in mind, note that a layer is still drawn in the specified order, so overplotting can be an issue, especially in dense scatter plots. If you want a consistent plot (and not one that relies on the order of rows in the data frame), you need to think a bit more.
Create a second layer
If you want certain values to appear above other values, you can use the subset argument to create a second layer to definitely be drawn afterwards. You will need to explicitly load the plyr package so .() will work.
set.seed(1234)
df <- data.frame(x=rnorm(500))
df$y = rnorm(500)*0.1 + df$x
df$label <- c("a")
df$label[50] <- "point"
df$size <- 2
library(plyr)
ggplot(df) + geom_point(aes(x = x, y = y, color = label, size = size)) +
geom_point(aes(x = x, y = y, color = label, size = size),
subset = .(label == 'point'))
Update
In ggplot2_2.0.0, the subset argument is deprecated. Use e.g. base::subset to select relevant data specified in the data argument. And no need to load plyr:
ggplot(df) +
geom_point(aes(x = x, y = y, color = label, size = size)) +
geom_point(data = subset(df, label == 'point'),
aes(x = x, y = y, color = label, size = size))
Or use alpha
Another approach to avoid the problem of overplotting would be to set the alpha (transparency) of the points. This will not be as effective as the explicit second layer approach above. However, with judicious use of scale_alpha_manual you should be able to get something to work.
For example:
# set alpha = 1 (no transparency) for your point(s) of interest
# and a low value otherwise
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size,alpha = label)) +
scale_alpha_manual(guide='none', values = c(a = 0.2, point = 1))
The fundamental question here can be rephrased like this:
How do I control the layers of my plot?
In the 'ggplot2' package, you can do this quickly by splitting each different layer into a different command. Thinking in terms of layers takes a little bit of practice, but it essentially comes down to what you want plotted on top of other things. You build from the background upwards.
Prep: Prepare the sample data. This step is only necessary for this example, because we don't have real data to work with.
# Establish random seed to make data reproducible.
set.seed(1)
# Generate sample data.
df <- data.frame(x=rnorm(500))
df$y = rnorm(500)*0.1 + df$x
# Initialize 'label' and 'size' default values.
df$label <- "a"
df$size <- 2
# Label and size our "special" point.
df$label[50] <- "point"
df$size[50] <- 4
You may notice that I've added a different size to the example just to make the layer difference clearer.
Step 1: Separate your data into layers. Always do this BEFORE you use the 'ggplot' function. Too many people get stuck trying to do data manipulation from within the 'ggplot' functions. Here, we want to create two layers: one with the "a" labels and one with the "point" labels.
df_layer_1 <- df[df$label=="a",]
df_layer_2 <- df[df$label=="point",]
You could do this with other functions, but I'm just quickly using the data frame matching logic to pull the data.
Step 2: Plot the data as layers. We want to plot all of the "a" data first and then plot all the "point" data.
ggplot() +
geom_point(
data=df_layer_1,
aes(x=x, y=y),
colour="orange",
size=df_layer_1$size) +
geom_point(
data=df_layer_2,
aes(x=x, y=y),
colour="blue",
size=df_layer_2$size)
Notice that the base plot layer ggplot() has no data assigned. This is important, because we are going to override the data for each layer. Then, we have two separate point geometry layers geom_point(...) that use their own specifications. The x and y axis will be shared, but we will use different data, colors, and sizes.
It is important to move the colour and size specifications outside of the aes(...) function, so we can specify these values literally. Otherwise, the 'ggplot' function will usually assign colors and sizes according to the levels found in the data. For instance, if you have size values of 2 and 5 in the data, it will assign a default size to any occurrences of the value 2 and some larger size to any occurrences of the value 5. An 'aes' function specification will not use the values 2 and 5 for the sizes. The same goes for colors. I have exact sizes and colors that I want to use, so I move those arguments into the 'geom_point' function itself. Also, any specifications in the 'aes' function will be put into the legend, which is often not useful.
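The difference between mapping and setting can be checked with a small sketch like the one below. The data frame pts and the object names are made up for illustration. layer_data() returns the values ggplot2 computes for a layer, so the two size columns can be compared directly.

```r
library(ggplot2)

# Made-up data with the size values 2 and 5 mentioned above
pts <- data.frame(x = c(1, 2), y = c(1, 2), size = c(2, 5))

# Mapped inside aes(): the size scale rescales the data values
mapped <- ggplot(pts, aes(x, y)) + geom_point(aes(size = size))

# Set outside aes(): the values are used literally and no legend is drawn
literal <- ggplot(pts, aes(x, y)) + geom_point(size = pts$size)

# Sizes ggplot2 actually uses for drawing the first layer of each plot
mapped_sizes <- layer_data(mapped)$size
literal_sizes <- layer_data(literal)$size
```

Comparing mapped_sizes with literal_sizes should show that only the second plot keeps the raw values 2 and 5.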
Final note: In this example, you could achieve the wanted result in many ways, but it is important to understand how 'ggplot2' layers work in order to get the most out of your 'ggplot' charts. As long as you separate your data into different layers before you call the 'ggplot' functions, you have a lot of control over how things will be graphed on the screen.
It's plotted in order of the rows in the data.frame. Try this:
df2 <- rbind(df[-50,],df[50,])
ggplot(df2) + geom_point(aes(x=x, y=y, color=label, size=size))
As you see the green point is drawn last, since it represents the last row of the data.frame.
Here is a way to order the data.frame to have the green point drawn first:
df2 <- df[order(-as.numeric(factor(df$label))),]

Controlling alpha in ggparcoord (from GGally package)

I am trying to build from a question similar to mine (and from which I borrowed the self-contained example and title inspiration). I am trying to apply transparency individually to each line of a ggparcoord or somehow add two layers of ggparcoord on top of the other. The detailed description of the problem and format of data I have for the solution to work is provided below.
I have a dataset with thousands of lines; let's call it x.
library(GGally)
x = data.frame(a=runif(100,0,1),b=runif(100,0,1),c=runif(100,0,1),d=runif(100,0,1))
After clustering this data I also get a set of 5 lines, let's call this dataset y.
y = data.frame(a=runif(5,0,1),b=runif(5,0,1),c=runif(5,0,1),d=runif(5,0,1))
In order to see the centroids y overlaying x I use the following code. First I add y to x such that the 5 rows are on the bottom of the final dataframe. This ensures ggparcoord will put them last and therefore stay on top of all the data:
df <- rbind(x,y)
Next I create a new column for df, following the advice in the question I referred to, so that I can color the centroids differently and tell them apart from the data:
df$cluster = "data"
df$cluster[(nrow(df)-4):(nrow(df))] <- "centroids"
Finally I plot it:
p <- ggparcoord(df, columns=1:4, groupColumn=5, scale="globalminmax", alphaLines = 0.99) + xlab("Sample") + ylab("log(Count)")
p + scale_colour_manual(values = c("data" = "grey","centroids" = "#94003C"))
The problem I am stuck with starts at this stage. On my original data, plotting solely x doesn't lead to much insight since it is a heavy load of lines (on this data, this is equivalent to using ggparcoord above on x instead of df):
By reducing alphaLines considerably (0.05), I can naturally see some clusters due to the overlapping of the lines (this is again running ggparcoord on x reducing alphaLines):
It makes more sense to observe the centroids added to df on top of the second plot, not the first.
However, since everything is in a single data frame, applying such a low alphaLines value makes the centroid lines all but disappear. My only option is then to use ggparcoord (as provided above) on df without decreasing alphaLines:
My goal is to have the red lines (centroid lines) on top of the second figure with very low alpha. There are two approaches I have thought of so far, but I couldn't get either working:
(1) Is there any way to create a column on the dataframe, similar to what is done for the color, such that I can specify the alpha value for each line?
(2) I originally attempted to create two different ggparcoords and "sum them up" hoping to overlay but an error was raised.
The question may contain too much detail, but I thought this could motivate better the applicability of the answer to serve the interest of other readers.
The answer I am looking for would use the provided data variables in their current format and generate the plot described above. Better ways to restructure the data are also welcome, but using the current structure is preferred.
In this case I think it is easier to just use ggplot and build the graph yourself. We make slight adjustments to how the data is represented (we put it in long format), and then we make the parallel coordinates plot. You can now map any aesthetic you like to cluster.
library(ggplot2)
library(dplyr)
library(tidyr)
# I start the same as you
x <- data.frame(a=runif(100,0,1),b=runif(100,0,1),c=runif(100,0,1),d=runif(100,0,1))
y <- data.frame(a=runif(5,0,1),b=runif(5,0,1),c=runif(5,0,1),d=runif(5,0,1))
# I find this an easier way to combine the two data.frames, and have an id column
df <- bind_rows(data = x, centroids = y, .id = 'cluster')
# We need to add id's, so we know which points to connect with a line
df$id <- 1:nrow(df)
# Put the data into long format
df2 <- gather(df, 'column', 'value', a:d)
# And plot:
ggplot(df2, aes(column, value, alpha = cluster, color = cluster, group = id)) +
geom_line() +
scale_colour_manual(values = c("data" = "grey", "centroids" = "#94003C")) +
scale_alpha_manual(values = c("data" = 0.2, "centroids" = 1)) +
theme_minimal()

Add multiple barchart or piechart at coordinate location in ggplot2

I cannot figure out how to add multiple barcharts (or, even better, piecharts) to one plot.
The simplest case would be to add two barcharts at different x,y locations onto a plane.
An application example would be to illustrate both the number of people living in a certain area, and the number of migrants (for lack of better example) living there as well.
By packaging this population information with spatial information, I hope to convey the corresponding information efficiently.
Solutions involving ggmaps are fine, however, I do not require them (displaying the data without a map layer in the background is acceptable).
To be more precise, here is some code, that is not working as I would like it to. In particular, the bar-charts are replaced by rectangles, which are not stacked, but overlap each other, leading to wrongly displayed information.
Furthermore, at each location, the total height of each bar in the bar chart (or size of the pie, for that matter) should correspond to the sum of both parts.
require(ggplot2)
x <- c(1,2,3)
y <- c(3,2,4)
pop <- c(1,7,8)
mig <- c(1,5,2)
df <- rbind(x,y,pop,mig)
df <- t(df)
df <- data.frame(df)
# bring data in long format
require(reshape2)
tmp <- melt(df, id.vars = c("x","y"))
p <- ggplot(tmp, aes(x=x, y=y, fill = variable))
p <- p + geom_rect(aes(xmin = x, xmax = x + 0.1,
ymin = y, ymax = y + value
))
print(p)
Eventually, this should serve as an input into a larger animation, that visualizes temporal development of the variables.
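One possible way to get stacked rather than overlapping rectangles is to compute each segment's lower and upper edge before plotting, as in the sketch below. It is only an illustration under stated assumptions: the names long, ymin and ymax are made up, base R stands in for reshape2::melt, and the plotting step only runs if ggplot2 is installed.

```r
# Data from the question, built directly as a data frame
df <- data.frame(x = c(1, 2, 3), y = c(3, 2, 4), pop = c(1, 7, 8), mig = c(1, 5, 2))

# One row per (location, variable)
long <- rbind(
  data.frame(x = df$x, y = df$y, variable = "pop", value = df$pop),
  data.frame(x = df$x, y = df$y, variable = "mig", value = df$mig)
)

# Stack within each location: each segment starts where the previous one ends
long$ymax <- long$y + ave(long$value, long$x, long$y, FUN = cumsum)
long$ymin <- long$ymax - long$value

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long) +
    geom_rect(aes(xmin = x, xmax = x + 0.1, ymin = ymin, ymax = ymax,
                  fill = variable))
}
```

With this layout the top of each bar sits at y + pop + mig, so the total height at each location corresponds to the sum of both parts.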

Resources