I am visualizing a 4 dimensional data set.
Let's denote the variables as x, y1, y2 and y3, where x holds dates, y1 is a continuous variable, and y2, y3 are the components of 2-dimensional vectors (y2, y3). I want to draw a line plot of y1 against x, and additionally attach arrows for (y2, y3) at the points (x, y1).
I have tried
ggplot(data=data,aes(x=x,y=y1)) + geom_line() +
geom_segment(aes(xend=x+y2,yend=y1+y3), arrow = arrow())
but it doesn't work well so I think I may need to do some rescaling. How can I do this with ggplot?
UPDATE: I've attached a sample data set (together with its column definition). The data set contains oceanographic and surface meteorological readings taken from a series of buoys positioned throughout the equatorial Pacific. The data is expected to aid in the understanding and prediction of El Nino/Southern Oscillation (ENSO) cycles (from the description of the repository). Now, for example, I want to visualize x=day, y1=humidity, y2=zon.winds, y3=mer.winds with the symbol described above.
UPDATE2: for example, I want to plot this for a particular buoy
I am having trouble figuring out what you want to display.
As far as I can see, your dataset has 50 buoys that each deliver a measurement each day.
library(ggplot2)
elnino <- read.table('elnino.txt', col.names=c('buoy','day','latitude','longitude','zon.winds','mer.winds','humidity','air.temp','ss.temp'), as.is=TRUE, na='.')
elnino <- elnino[elnino$humidity > 40,] # removing a single point that seems to be an outlier.
ggplot(elnino, aes(x=day,y=humidity, group=buoy)) + geom_point()
ggplot(elnino, aes(x=day,y=humidity, group=buoy)) + geom_line()
Which gives these two results.
What I cannot see is how you want to display the zon.winds and mer.winds variables. I figure that in combination they give a vector, but where do you want these placed? You would get ~700 arrows filling your plot.
Update
In that case you got it right: you have to use geom_segment and calculate x, xend, y and yend yourself, see geom_segment.
# We select a single buoy
el <- subset(elnino, buoy==1)
library(grid)
ggplot(el, aes(x=day,y=humidity, group=buoy)) +
  geom_line() +
  geom_segment(aes(yend=humidity+zon.winds, xend=day+mer.winds),
               arrow = arrow(length = unit(0.1,"cm")))
This, however, does not look very nice, because the values in zon.winds and mer.winds are added to the coordinates as they are, in the units of the axes. To make use of them, we need to rescale them manually. The scaling factor below is entirely arbitrary.
el <- transform(el, zon.winds = zon.winds * -0.3, mer.winds=mer.winds * -0.3)
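Instead of a hand-picked factor, the scaling can be derived from the axis ranges. The following is only a sketch under my own assumptions: el_demo is made-up stand-in data (not the real buoy file), frac is a hypothetical tuning knob, and the arrows follow the mapping from the question (zon.winds along x, mer.winds along y). The longest arrow is limited to a fixed fraction of each axis range.

```r
library(ggplot2)
library(grid)  # unit(); only needed explicitly with older ggplot2 versions

set.seed(42)
# Made-up stand-in for the readings of a single buoy (assumption).
el_demo <- data.frame(
  day       = 1:14,
  humidity  = 80 + cumsum(rnorm(14)),
  zon.winds = runif(14, -6, 2),
  mer.winds = runif(14, -4, 4)
)

# Hypothetical knob: the longest arrow spans at most this share of an axis.
frac <- 0.1
max_speed <- max(sqrt(el_demo$zon.winds^2 + el_demo$mer.winds^2))
x_scale <- frac * diff(range(el_demo$day)) / max_speed
y_scale <- frac * diff(range(el_demo$humidity)) / max_speed

# Arrow end points in the units of each axis.
el_demo$xend <- el_demo$day + el_demo$zon.winds * x_scale
el_demo$yend <- el_demo$humidity + el_demo$mer.winds * y_scale

p_arrows <- ggplot(el_demo, aes(x = day, y = humidity)) +
  geom_line() +
  geom_segment(aes(xend = xend, yend = yend),
               arrow = arrow(length = unit(0.1, "cm")))
p_arrows
```

Because the two axes have different units, the arrow directions are only comparable with each other, not true compass bearings.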
Related
I am trying to plot lat long points on a plot using ggplot2 in R. The axes are lats on y and longs on x. I want a given location point to be the center of my plot and rest of the points on the scatter plot with respect to how far they are from this point (these points' lat long values are coming from a data frame).
How can that particular point be at the center of it all? I tried making two separate geom_point layers and added the one point I want in the center first, and then added the second geom layer with the rest of the data. But it doesn't work. I also tried coord_fixed by using the lat long limits from the first geom layer when I only plotted the main center point on plot, but after adding second layer, it does not remain in the center. I also wonder why is there no function or attribute to set the center of the plot around a particular point, so that rest of the points can fall wherever on the plot, but the focus point is there where I want, but maybe it is too specific of a thing.
Also, could the units on the axes be converted to meters?
The easiest way to do this is to figure out what you want the range of your x axis and the range of your y axis to be. Measure the distance along the x axis to the furthest point from your target point, and make sure the x axis extends this far on both sides of the target. Do the same for the y axis.
To demonstrate, I'll make a random sample of points with x and y co-ordinates, each made small and black:
set.seed(1234) # Makes this example reproducible
df <- data.frame(x = rnorm(200), y = rnorm(200), colour = "black", size = 1)
Now I'll choose one at random as my target point, making it big and red:
point_of_interest <- sample(200, 1)
df$colour[point_of_interest] <- "red"
df$size[point_of_interest] <- 5
So let's work out the furthest points North-South and East-West from our target and calculate a range which would include all points but have the target in the centre:
max_x_diff <- max(abs(df$x[point_of_interest] - df$x))
max_y_diff <- max(abs(df$y[point_of_interest] - df$y))
x_range <- df$x[point_of_interest] + c(-max_x_diff, max_x_diff)
y_range <- df$y[point_of_interest] + c(-max_y_diff, max_y_diff)
And now we just need to plot:
ggplot(df, aes(x, y, colour = colour, size = size)) +
  geom_point() +
  scale_colour_identity() +
  lims(x = x_range, y = y_range) +
  scale_size_identity() +
  coord_equal()
We can see that even though our target is well off the center of the cluster, the target remains in the center of the plot.
With regard to changing latitude and longitude to meters, this requires a co-ordinate transformation. This has been answered many times on Stack Overflow and I won't duplicate those answers here. You could check out packages like rgdal (retired from CRAN in 2023; sf covers the same ground) or perhaps SpatialEpi, which has the latlong2grid function.
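For a rough idea of what such a transformation does, here is a base R sketch of a local equirectangular approximation around the center point. This is my own illustration, not what the packages above do internally: latlong_to_metres is a hypothetical helper, the sample coordinates are made up, and the approximation is only reasonable for points within a small region around the center.

```r
# Hypothetical helper: offsets in metres from a reference point (lat0, lon0),
# using a spherical Earth and a local equirectangular approximation.
latlong_to_metres <- function(lat, lon, lat0, lon0, radius = 6371000) {
  to_rad <- pi / 180
  data.frame(
    x_m = radius * (lon - lon0) * to_rad * cos(lat0 * to_rad),  # east-west
    y_m = radius * (lat - lat0) * to_rad                        # north-south
  )
}

# Made-up sample points; the first one is the point of interest.
pts <- data.frame(lat = c(51.5, 52.5, 51.5), lon = c(-0.1, -0.1, 0.9))
centre <- pts[1, ]
metres <- latlong_to_metres(pts$lat, pts$lon, centre$lat, centre$lon)
metres
```

The resulting x_m and y_m columns have the point of interest at (0, 0), so the symmetric-range trick above applies to them directly.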
I am attempting to place individual points on a plot using ggplot2, however as there are many points, it is difficult to gauge how densely packed the points are. Here, there are two factors being compared against a continuous variable, and I want to change the color of the points to reflect how closely packed they are with their neighbors. I am using the geom_point function in ggplot2 to plot the points, but I don't know how to feed it the right information on color.
Here is the code I am using:
library(ggplot2)
s1 = rnorm(1000, 1, 10)
s2 = rnorm(1000, 1, 10)
# One task label per observation: 1000 for s1, then 1000 for s2
data = data.frame(task_number = as.factor(c(replicate(1000, 1),
                                            replicate(1000, 2))),
                  S = c(s1, s2))
ggplot(data, aes(x = task_number, y = S)) + geom_point()
Which generates this plot:
However, I want it to look more like this image, but with one dimension rather than two (which I borrowed from this website: https://slowkow.com/notes/ggplot2-color-by-density/):
How do I change the colors of the first plot so it resembles that of the second plot?
I think the tricky thing about this is you want to show the original values, and evaluate the density at those values. I borrowed ideas from here to achieve that.
library(dplyr)
data = data %>%
  group_by(task_number) %>%
  # Use approxfun to interpolate the density back to
  # the original points
  mutate(dens = approxfun(density(S))(S))
ggplot(data, aes(x = task_number, y = S, colour = dens)) +
  geom_point() +
  scale_colour_viridis_c()
Result:
One could, of course, come up with a measure of proximity to neighbouring values for each value... However, wouldn't adjusting the transparency basically achieve the same goal (gauging how densely packed the points are)?
geom_point(alpha=0.03)
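Spelled out as a complete call, that could look like the sketch below. The data frame demo is a made-up stand-in shaped like the data in the question, and the alpha value of 0.1 is just a starting guess that needs tuning to the number of points.

```r
library(ggplot2)

set.seed(1)
# Made-up stand-in for the question's data: two tasks, 1000 values each.
demo <- data.frame(
  task_number = factor(rep(1:2, each = 1000)),
  S           = rnorm(2000, 1, 10)
)

# Constant alpha goes outside aes(): overlapping points add up to darker
# regions, so dense areas stand out without computing a density.
p_alpha <- ggplot(demo, aes(x = task_number, y = S)) +
  geom_point(alpha = 0.1)
p_alpha
```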
I have run into a problem plotting two different data sets with a secondary axis, as described in this previous post: how-to-use-facets-with-a-dual-y-axis-ggplot.
I am trying to use geom_point and geom_bar together, but since the geom_bar data has a different range, the bars do not show up on the graph.
Here is what I have tried:
point_data=data.frame(gr=seq(1,10),point_y=rnorm(10,0.25,0.1))
bar_data=data.frame(gr=seq(1,10),bar_y=rnorm(10,5,1))
library(ggplot2)
sec_axis_plot <- ggplot(point_data, aes(y=point_y, x=gr,col="red")) + #Enc vs Wafer
geom_point(size=5.5,alpha=1,stat='identity')+
geom_bar(data=bar_data,aes(x = gr, y = bar_y, fill = gr),stat = "identity") +
scale_y_continuous(sec.axis = sec_axis(trans=~ .*15,
name = 'bar_y',breaks=seq(0,10,0.5)),breaks=seq(0.10,0.5,0.05),limits = c(0.1,0.5),expand=c(0,0))+
facet_wrap(~gr, strip.position = 'bottom',nrow=1)+
theme_bw()
As can be seen, bar_data is removed. Is it possible to plot them together in this context?
thx
You're running into problems here because the transformation of the second axis is only used to create the second axis -- it has no impact on the data. Your bar_data is still being plotted on the original axis, which only goes up to 0.5 because of your limits. This prevents the bars from appearing.
In order to make the data show up in the same range, you have to normalize the bar data so that it falls in the same range as the point data. Then, the axis transformation has to undo this normalization so that you get the appropriate tick labels. Like so:
# Normalizer to bring bar data into point data range. This makes
# highest bar equal to highest point. You can use a different
# normalization if you want (e.g., this could be the constant 15
# like you had in your example, though that's fragile if the data
# changes).
normalizer <- max(bar_data$bar_y) / max(point_data$point_y)
sec_axis_plot <- ggplot(point_data,
                        aes(y=point_y, x=gr)) +
  # Plot the bars first so they're on the bottom. Use geom_col,
  # which creates bars with specified height as y.
  geom_col(data=bar_data,
           aes(x = gr,
               y = bar_y / normalizer)) + # NORMALIZE Y !!!
  # stat="identity" and alpha=1 are defaults for geom_point
  geom_point(size=5.5) +
  # Create second axis. Notice that the transformation undoes
  # the normalization we did for bar_y in geom_col.
  scale_y_continuous(sec.axis = sec_axis(trans= ~.*normalizer,
                                         name = 'bar_y')) +
  theme_bw()
This gives you the following plot:
I removed some of your bells and whistles to make the axis-specific stuff more clear, but you should be able to add it back in no problem. A couple of notes though:
Remember that the second axis is created by a 1-1 transformation of the primary axis, so make sure they cover the same limits under the transformation. If you have bars that should go to zero, the primary axis should include the untransformed analogue of zero.
Make sure that the data normalization and the axis transformation undo each other so that your axis lines up with the values you're plotting.
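Both notes can be checked numerically before plotting. The sketch below uses made-up vectors in place of the two data frames, and to_primary and to_secondary are hypothetical helper names for the two directions of the transformation.

```r
set.seed(1)
# Made-up stand-ins for point_data$point_y and bar_data$bar_y.
point_y <- rnorm(10, 0.25, 0.1)
bar_y   <- rnorm(10, 5, 1)

normalizer <- max(bar_y) / max(point_y)

# Hypothetical helpers: data normalization and the axis transformation.
to_primary   <- function(v) v / normalizer  # what goes into geom_col
to_secondary <- function(v) v * normalizer  # what sec_axis applies

# The two must undo each other, otherwise the tick labels are wrong.
round_trip <- to_secondary(to_primary(bar_y))
all.equal(round_trip, bar_y)

# Zero on the secondary axis corresponds to zero on the primary axis,
# so the primary limits must include 0 for the bars to be drawn in full.
to_primary(0)
```

With a purely multiplicative normalizer, zero always maps to zero. A normalization that also shifts the data would need the matching shift in the axis formula.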
I'm plotting a dense scatter plot in ggplot2 where each point might be labeled by a different color:
df <- data.frame(x=rnorm(500))
df$y = rnorm(500)*0.1 + df$x
df$label <- c("a")
df$label[50] <- "point"
df$size <- 2
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size))
When I do this, the scatter point labeled "point" (green) is plotted on top of the red points which have the label "a". What controls this z ordering in ggplot, i.e. what controls which point is on top of which?
For example, what if I wanted all the "a" points to be on top of all the points labeled "point" (meaning they would sometimes partially or fully hide that point)? Does this depend on alphanumerical ordering of labels?
I'd like to find a solution that can be translated easily to rpy2.
2016 Update:
The order aesthetic has been deprecated, so at this point the easiest approach is to sort the data.frame so that the green point is at the bottom, and is plotted last. If you don't want to alter the original data.frame, you can sort it during the ggplot call - here's an example that uses %>% and arrange from the dplyr package to do the on-the-fly sorting:
library(dplyr)
ggplot(df %>%
         arrange(label),
       aes(x = x, y = y, color = label, size = size)) +
  geom_point()
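The same idea works without dplyr, which may be easier to carry over to rpy2. This is a sketch on data rebuilt from the question under the name df_demo; on_top is a hypothetical variable naming the label that should be drawn last. Ordering on a logical puts FALSE rows first and TRUE rows last, so it does not depend on the alphabetical order of the labels.

```r
library(ggplot2)

set.seed(1234)
# Rebuild the data from the question under a separate name.
df_demo <- data.frame(x = rnorm(500))
df_demo$y <- rnorm(500) * 0.1 + df_demo$x
df_demo$label <- "a"
df_demo$label[50] <- "point"
df_demo$size <- 2

# Rows with this label are moved to the end, so they are drawn last (on top).
on_top <- "point"
df_sorted <- df_demo[order(df_demo$label == on_top), ]

p_sorted <- ggplot(df_sorted) +
  geom_point(aes(x = x, y = y, color = label, size = size))
p_sorted
```

Setting on_top to "a" would instead draw the "a" points last and let them cover the special point.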
Original 2015 answer for ggplot2 versions < 2.0.0
In ggplot2, you can use the order aesthetic to specify the order in which points are plotted. The last ones plotted will appear on top. To apply this, you can create a variable holding the order in which you'd like points to be drawn.
To put the green dot on top by plotting it after the others:
df$order <- ifelse(df$label=="a", 1, 2)
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size, order=order))
Or to plot the green dot first and bury it, plot the points in the opposite order:
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size, order=-order))
For this simple example, you can skip creating a new sorting variable and just coerce the label variable to a factor and then a numeric:
ggplot(df) +
geom_point(aes(x=x, y=y, color=label, size=size, order=as.numeric(factor(df$label))))
ggplot2 will create plots layer-by-layer and within each layer, the plotting order is defined by the geom type. The default is to plot in the order that they appear in the data.
Where this is different, it is noted. For example
geom_line
Connect observations, ordered by x value.
and
geom_path
Connect observations in data order
There are also known issues regarding the ordering of factors, and it is interesting to note the response of the package author Hadley
The display of a plot should be invariant to the order of the data frame - anything else is a bug.
This quote in mind, a layer is drawn in the specified order, so overplotting can be an issue, especially when creating dense scatter plots. So if you want a consistent plot (and not one that relies on the order in the data frame) you need to think a bit more.
Create a second layer
If you want certain values to appear above other values, you can use the subset argument to create a second layer to definitely be drawn afterwards. You will need to explicitly load the plyr package so .() will work.
set.seed(1234)
df <- data.frame(x=rnorm(500))
df$y = rnorm(500)*0.1 + df$x
df$label <- c("a")
df$label[50] <- "point"
df$size <- 2
library(plyr)
ggplot(df) + geom_point(aes(x = x, y = y, color = label, size = size)) +
geom_point(aes(x = x, y = y, color = label, size = size),
subset = .(label == 'point'))
Update
In ggplot2_2.0.0, the subset argument is deprecated. Use e.g. base::subset to select relevant data specified in the data argument. And no need to load plyr:
ggplot(df) +
geom_point(aes(x = x, y = y, color = label, size = size)) +
geom_point(data = subset(df, label == 'point'),
aes(x = x, y = y, color = label, size = size))
Or use alpha
Another approach to avoid the problem of overplotting would be to set the alpha (transparency) of the points. This will not be as effective as the explicit second layer approach above; however, with judicious use of scale_alpha_manual you should be able to get something to work.
e.g.
# set alpha = 1 (no transparency) for your point(s) of interest
# and a low value otherwise
ggplot(df) + geom_point(aes(x=x, y=y, color=label, size=size,alpha = label)) +
  scale_alpha_manual(guide='none', values = c(a = 0.2, point = 1))
The fundamental question here can be rephrased like this:
How do I control the layers of my plot?
In the 'ggplot2' package, you can do this quickly by splitting each different layer into a different command. Thinking in terms of layers takes a little bit of practice, but it essentially comes down to what you want plotted on top of other things. You build from the background upwards.
Prep: Prepare the sample data. This step is only necessary for this example, because we don't have real data to work with.
# Establish random seed to make data reproducible.
set.seed(1)
# Generate sample data.
df <- data.frame(x=rnorm(500))
df$y = rnorm(500)*0.1 + df$x
# Initialize 'label' and 'size' default values.
df$label <- "a"
df$size <- 2
# Label and size our "special" point.
df$label[50] <- "point"
df$size[50] <- 4
You may notice that I've added a different size to the example just to make the layer difference clearer.
Step 1: Separate your data into layers. Always do this BEFORE you use the 'ggplot' function. Too many people get stuck by trying to do data manipulation from within the 'ggplot' functions. Here, we want to create two layers: one with the "a" labels and one with the "point" labels.
df_layer_1 <- df[df$label=="a",]
df_layer_2 <- df[df$label=="point",]
You could do this with other functions, but I'm just quickly using the data frame matching logic to pull the data.
Step 2: Plot the data as layers. We want to plot all of the "a" data first and then plot all the "point" data.
ggplot() +
  geom_point(
    data=df_layer_1,
    aes(x=x, y=y),
    colour="orange",
    size=df_layer_1$size) +
  geom_point(
    data=df_layer_2,
    aes(x=x, y=y),
    colour="blue",
    size=df_layer_2$size)
Notice that the base plot layer ggplot() has no data assigned. This is important, because we are going to override the data for each layer. Then, we have two separate point geometry layers geom_point(...) that use their own specifications. The x and y axis will be shared, but we will use different data, colors, and sizes.
It is important to move the colour and size specifications outside of the aes(...) function, so we can specify these values literally. Otherwise, 'ggplot' maps them through a scale and assigns colors and sizes according to the values found in the data. For instance, if you have size values of 2 and 5 in the data, it will assign some smaller size to every occurrence of the value 2 and some larger size to every occurrence of the value 5; it will not use 2 and 5 as the literal sizes. The same goes for colors. I have exact sizes and colors that I want to use, so I move those arguments into the 'geom_point' function itself. Also, any specifications in the 'aes' function will be put into the legend, which is often of no use here.
Final note: In this example, you could achieve the wanted result in many ways, but it is important to understand how 'ggplot2' layers work in order to get the most out of your 'ggplot' charts. As long as you separate your data into different layers before you call the 'ggplot' functions, you have a lot of control over how things will be graphed on the screen.
It's plotted in order of the rows in the data.frame. Try this:
df2 <- rbind(df[-50,],df[50,])
ggplot(df2) + geom_point(aes(x=x, y=y, color=label, size=size))
As you see the green point is drawn last, since it represents the last row of the data.frame.
Here is a way to order the data.frame to have the green point drawn first:
df2 <- df[order(-as.numeric(factor(df$label))),]
I'm an undergrad researcher and I've been teaching myself R over the past few months. I just started trying ggplot, and have run into some trouble. I've made a series of boxplots looking at the depth of fish at different acoustic receiver stations. I'd like to add a scatterplot that shows the depths of the receiver stations. This is what I have so far:
data <- read.csv(".....MPS.csv", header=TRUE)
df <- data.frame(f1=factor(data$Tagging.location),
f2=factor(data$Station),data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), data$depth)
df$f1f2 <- interaction(df$f1, df$f2)
plot1 <- ggplot(aes(y = data$Detection.depth, x = f2, fill = f1), data = df) +
geom_boxplot() + stat_summary(fun.data = give.n, geom = "text",
position = position_dodge(height = 0, width = 0.75), size = 3)
plot1+xlab("MPS Station") + ylab("Depth(m)") +
theme(legend.title=element_blank()) + scale_y_reverse() +
coord_cartesian(ylim=c(150, -10))
plot2 <- ggplot(aes(y=data$depth, x=f2), data=df2) + geom_point()
plot2+scale_y_reverse() + coord_cartesian(ylim=c(150,-10)) +
xlab("MPS Station") + ylab("Depth (m)")
Unfortunately, since I'm a new user in this forum, I'm not allowed to upload images of these two plots. My x-axis is "Stations" (which has 12 options) and my y-axis is "Depth" (0-150 m). The boxplots are colour-coded by tagging site (which has 2 options). The depths are coming from two different columns in my spreadsheet, and they cannot be combined into one.
My goal is to combine those two plots by adding "plot2" (the station depth scatterplot) to the "plot1" boxplots (detection depths). They both look at the same variables (depth and station) and must share the same y-axis scale.
I think I could figure out a messy workaround if I were using the R base program, but I would like to learn ggplot properly, if possible. Any help is greatly appreciated!
Update: I was confused by the language used in the original post, and wrote a slightly more complicated answer than necessary. Here is the cleaned up version.
Step 1: Setting up. Here, we make sure the depth values in both data frames have the same variable name (for readability).
df <- data.frame(f1=factor(data$Tagging.location), f2=factor(data$Station), depth=data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), depth=data$depth)
Step 2: Now you can plot this with the 'ggplot' function and split the data by using the col=f1 argument. We'll plot the detection data separately, since that requires a boxplot, and then we'll plot the depths of the stations with colored points (assuming each station only has one depth). We specify the two different plots by referencing the data from within the 'geom' functions, instead of specifying the data inside the main 'ggplot' function. It should look something like this:
ggplot()+geom_boxplot(data=df, aes(x=f2, y=depth, col=f1)) + geom_point(data=df2, aes(x=f2, y=depth), colour="blue") + scale_y_reverse()
In this plot example, we use boxplots to represent the detection data and color those boxplots by the site label. The stations, however, we plot separately using a specific color of points, so we will be able to see them clearly in relation to the boxplots.
You should be able to adjust the plot from here to suit your needs.
I've created some dummy data and loaded into the chart to show you what it would look like. Keep in mind that this is purely random data and doesn't really make sense.
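Dummy data along those lines can be generated as in the sketch below. Everything in it is an assumption for illustration: the data frames det and station_depth stand in for df and df2, the site names are invented, and detections are simply drawn uniformly between the surface and the depth of their station.

```r
library(ggplot2)

set.seed(123)
# One made-up depth per station (12 stations, as in the question).
station_depth <- data.frame(
  f2    = factor(1:12),
  depth = round(runif(12, 60, 150))
)

# 600 made-up detections, each tagged with a site and a station.
det <- data.frame(
  f1 = factor(sample(c("Site A", "Site B"), 600, replace = TRUE)),
  f2 = factor(sample(1:12, 600, replace = TRUE), levels = 1:12)
)
# Detection depth lies between the surface and the station's own depth.
det$depth <- runif(600, 0, station_depth$depth[as.integer(det$f2)])

p_depth <- ggplot() +
  geom_boxplot(data = det, aes(x = f2, y = depth, col = f1)) +
  geom_point(data = station_depth, aes(x = f2, y = depth), colour = "blue") +
  scale_y_reverse()
p_depth
```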