I have a dataset where each row corresponds to an x, y and z value.
I would like to create a 2D scatterplot of x and y and overlay the 2D contours of z over that plane.
I tried the following:
library(ggplot2)
load(url("https://www.dropbox.com/s/ya5g2n47al2cn1j/df.Rdata?dl=0"))
df <- as.data.frame(df)
ggplot(data = df, aes(x = x, y = y, colour = z)) +
geom_point() +
geom_contour(aes(z = z))
but I get the warning message:
Warning message:
Not possible to generate contour data
Is there a way to do this? Most examples I could find online use data in a similar x, y, z form.
Here's how the data looks:
> head(df)
x y z
1 0.15395671 0.1548728 -9.622222e-02
2 0.18148413 0.1554308 -1.091111e-01
3 0.07870902 0.1538021 -2.911111e-02
4 0.13514970 0.1134729 -1.133333e-01
5 0.03504008 0.1053258 4.222222e-03
6 0.02161680 0.1140364 -1.110223e-16
I think your data may not have a value of z for every combination of x and y values: you could not build a matrix with "x" rows and "y" columns that has a value of z at each index, because there would be gaps. geom_contour() needs z on such a regular grid. You may still be able to get what you want with geom_density_2d(), although note that it contours the density of the (x, y) points and ignores z entirely. Given your example data above:
library(ggplot2)
x <- c(0.15395671, 0.18148413, 0.07870902, 0.1351497, 0.03504008, 0.0216168)
y <- c(0.1548728, 0.1554308, 0.1538021, 0.1134729, 0.1053258, 0.1140364)
z <- c(-0.09622222, -0.1091111, -0.02911111, -0.1133333, 0.004222222, 0)
xyz <- data.frame(x, y, z)
# geom_density_2d() has no z aesthetic, so only x and y are mapped
ggplot(xyz, aes(x, y)) + geom_density_2d()
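If you want contours of z itself rather than of point density, one option is to first interpolate the scattered points onto a regular grid and then pass that grid to geom_contour(). The sketch below is an assumption-laden illustration, not the answer's method: it uses simulated data (`pts`) in place of the asker's df, and it swaps in a loess surface fit as the smoother (akima::interp(), used in another thread below, would work similarly). The names `pts`, `fit` and `grd` are hypothetical.

```r
library(ggplot2)

# Hypothetical scattered data standing in for the asker's df
set.seed(1)
pts <- data.frame(x = runif(200), y = runif(200))
pts$z <- sin(3 * pts$x) * cos(3 * pts$y)

# Fit a smooth surface z ~ f(x, y); surface = "direct" avoids NA predictions
fit <- loess(z ~ x * y, data = pts, span = 0.3,
             control = loess.control(surface = "direct"))

# Predict z on a regular 50 x 50 grid covering the data
grd <- expand.grid(x = seq(min(pts$x), max(pts$x), length.out = 50),
                   y = seq(min(pts$y), max(pts$y), length.out = 50))
grd$z <- as.vector(predict(fit, newdata = grd))

# Scatterplot of the raw points with contours of the gridded surface on top
p <- ggplot(pts, aes(x, y)) +
  geom_point(aes(colour = z)) +
  geom_contour(data = grd, aes(z = z), colour = "black")
print(p)
```

The contours are only as good as the fitted surface, so the `span` value (an arbitrary choice here) is worth tuning for real data.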
Related
I have two-dimensional data (from the lower triangle of a matrix):
m <- data.frame(x=c(1,1,1,2,2,3),y=c(1,2,3,1,2,1))
# x y
#1 1 1
#2 1 2
#3 1 3
#4 2 1
#5 2 2
#6 3 1
If I plot this, it gives something like this:
x
x x
x x x
So I have the x and y axes. However, I'd like to plot this data in a ternary plot, like this:
    x
  x   x
x   x   x
I need the z axis. It's the same data, but with another axis.
I don't think what you want is a ternary plot, and I am not at all sure why you want to move the data in this manner if you don't have a z value. Take a look at this description of ternary plots and note that, by definition, they have three variables (your data have only two) that must always sum to a constant (unless you mean that yours are missing the last variable, which would make each row sum to some constant that you have not defined in your question?).
If you are just looking to shift the x values, you can center them within each value of y, though only if the y values are discrete, as is the case here. This uses dplyr to do the modifications and centers each set of x values on 0 (it subtracts the group mean but does not divide by the standard deviation).
library(dplyr)
library(ggplot2)
m %>%
group_by(y) %>%
mutate(newX = as.numeric(scale(x, scale = FALSE)) ) %>%
ggplot(aes(x = newX, y = y)) +
geom_point()
I'm not sure what information you are getting from doing this, as you lose all ability to compare back to the original x scale, unless you add colour = factor(x) to the mapping.
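A minimal sketch of the centered plot with the original x values kept visible through colour, assuming the same `m` as in the question. The intermediate name `centred` is hypothetical; the `ungroup()` call just drops the grouping before plotting.

```r
library(dplyr)
library(ggplot2)

m <- data.frame(x = c(1, 1, 1, 2, 2, 3), y = c(1, 2, 3, 1, 2, 1))

# Center x within each level of y (subtract the group mean, no rescaling)
centred <- m %>%
  group_by(y) %>%
  mutate(newX = as.numeric(scale(x, scale = FALSE))) %>%
  ungroup()
# centred$newX is -1 -0.5 0 0 0.5 1

# Colour by the original x so the shifted points can still be traced back
p <- ggplot(centred, aes(x = newX, y = y, colour = factor(x))) +
  geom_point()
print(p)
```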
If this is not what you are trying to do (and I rather hope that it is not), please update your question to clarify the output that you are expecting.
On the off chance that what you meant was that there was a z column missing which caused each row to sum to a particular constant, here is an example using ggtern to plot that, with the assumption that each row sums to 10 units:
library(dplyr)
library(ggtern)
m %>%
mutate(z = 10 - (x + y)) %>%
ggtern(aes(x = x, y = y, z = z)) +
geom_point()
I have 36 different data frames that contain dX and dY variables. I have stored them in a list and want to display them all on the same graph with x = dX and y = dY.
The 36 data frames do not share the same dX values. They roughly cover the same range but don't have exactly the same values, so using merge creates a ton of NA values. The number of rows is, however, identical.
I tried something ugly that almost works:
g <- ggplot()
for (i in 1:36) {
g <- g + geom_line(data = df.list[[i]], aes(dX, dY, colour = i))
}
print(g)
This displays the curves correctly, but the colours are not applied (and I don't have an appropriate legend). OK, 36 lines in the legend might not be practical. In that case I would reduce the number of lines to draw.
Second approach: I tried melting the data frames as follows.
df <- melt(df.list, id.vars = "dX")
ggplot(df, aes(x = dX, y = value, colour = L1)) + geom_line()
But this creates a 4-variable data frame with columns dX, variable (always equal to dY), value (the dY values) and L1, which contains the index of the data frame in the list.
Here are the first lines of the melted data frame:
dX variable value L1
1 4.952296 dY 6.211485e-05 1
2 6.766889 dY 7.661041e-05 1
3 8.581481 dY 9.550221e-05 1
4 10.396074 dY 1.192053e-04 1
5 12.210666 dY 1.498834e-04 1
6 14.025259 dY 1.883612e-04 1
7 15.839851 dY 2.365646e-04 1
8 17.654444 dY 2.956796e-04 1
9 19.469036 dY 3.662252e-04 1
10 21.283629 dY 4.470143e-04 1
There are several problems here:
- "variable" is always equal to dY. What I was expecting was the index of the data frame in the list (which is stored in L1), or even better, the result of a function name(i).
- The curve uses a continuous scale, ranging from 1 to 36, while I wanted a discrete scale.
- Finally, geom_line() does not seem to draw the curves of the data frames individually, but links the points of different data sets together.
Any idea how to solve my problem?
I would combine the data frames into one large data frame, add an id column, and then plot with ggplot. There are lots of ways to do this; here is one:
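library(ggplot2)
# Stack all data frames into one
newDF <- do.call(rbind, df.list)
# Label each row with the index of the data frame it came from
newDF$id <- factor(rep(seq_along(df.list), times = sapply(df.list, nrow)))
g <- ggplot(newDF, aes(x = dX, y = dY, colour = id))
g <- g + geom_line()
print(g)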
It seems like the most straightforward option would be to create a single data frame (as suggested by one of the commenters) and use the index of the source data frame for the colour aesthetic:
library(dplyr) # For bind_rows() function
library(ggplot2)
ggplot(bind_rows(df.list, .id="id"), aes(dX, dY, colour=id)) +
geom_line()
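In the code above, .id="id" causes bind_rows to add a column called id containing the name of the list element each row came from (or its position, as a character string, if the list is unnamed). Because id is a character column, ggplot2 uses a discrete colour scale and draws one line per data frame.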
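To get descriptive legend labels (the question's name(i)), one option is to name the list before binding. The sketch below assumes three simulated data frames in place of the asker's 36, each with its own irregular dX values; the labels "run 1", "run 2", ... and the name `combined` are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for df.list: 3 data frames with different dX grids
set.seed(1)
df.list <- lapply(1:3, function(k) {
  dX <- sort(runif(10, 0, 20))
  data.frame(dX = dX, dY = dX^k / 1e4)
})

# Hypothetical labels playing the role of name(i)
names(df.list) <- paste("run", seq_along(df.list))

# .id puts the list names into a new first column called "id"
combined <- bind_rows(df.list, .id = "id")

p <- ggplot(combined, aes(dX, dY, colour = id)) +
  geom_line()
print(p)
```

Since colour is mapped to a discrete variable, it also defines the grouping, so geom_line() no longer joins points from different data frames.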
I have a three-column data frame with latitude, longitude, and underground measurements as the columns. I am trying to figure out how to interpolate data points between the points I have (which are irregularly spaced) and then create a smooth surface plot of the entire area. I have tried to use the 'surface3d' function in the 'rgl' package, but my result looks like a single giant spike. I have been able to plot the data with 'plot3d', but I need to take it a step further and fill in the blank spaces with interpolation. Any ideas or suggestions? I'm also open to using other packages; rgl just seemed like the best fit at the time.
EDIT: here's an excerpt from my data (measurements of aquifer depth):
lat_dd_NAD83 long_dd_NAD83 lev_va_ft
1 37.01030 -101.5006 288.49
2 37.03977 -101.6633 191.68
3 37.05201 -100.4994 159.34
4 37.06567 -101.3292 174.07
5 37.06947 -101.4561 285.08
6 37.10098 -102.0134 128.94
Just to add a small but (maybe) important note about interpolation.
Using the very nice "akima" package, you can easily interpolate your data:
library(akima)
library(rgl)
# library(deldir)
# Create some fake data
x <- rnorm(100)
y <- rnorm(100)
z <- x^2 + y^2
# # Triangulate it in x and y
# del <- deldir(x, y, z = z)
# triangs <- do.call(rbind, triang.list(del))
#
# # Plot the resulting surface
# plot3d(x, y, z, type = "n")
# triangles3d(triangs[, c("x", "y", "z")], col = "gray")
n_interpolation <- 200
spline_interpolated <- interp(x, y, z,
xo=seq(min(x), max(x), length = n_interpolation),
yo=seq(min(y), max(y), length = n_interpolation),
linear = FALSE, extrap = TRUE)
x.si <- spline_interpolated$x
y.si <- spline_interpolated$y
z.si <- spline_interpolated$z
persp3d(x.si, y.si, z.si, col = "gray")
[Figure: spline-interpolated surface (200 steps)]
With this package you can easily change the number of interpolation steps, etc. You will need at least 10 points (the more the better) to get a reasonable spline interpolation with this package. The linear version works well regardless of the number of points.
P.S. Thanks to user2554330; I didn't know about deldir, it's a really useful thing in some cases.
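Applied to the six rows of the question's excerpt, the linear version is the safer choice, since six points is below the roughly 10 suggested above for splines. This is a sketch under that assumption; it uses longitude as x and latitude as y, and it swaps base R's persp() in for rgl's persp3d() so it runs without an interactive 3D device. The names `wells`, `n_grid` and `surf` are hypothetical.

```r
library(akima)

# The six rows shown in the question's excerpt
wells <- data.frame(
  lat_dd_NAD83  = c(37.01030, 37.03977, 37.05201, 37.06567, 37.06947, 37.10098),
  long_dd_NAD83 = c(-101.5006, -101.6633, -100.4994, -101.3292, -101.4561, -102.0134),
  lev_va_ft     = c(288.49, 191.68, 159.34, 174.07, 285.08, 128.94)
)

# Linear interpolation onto a regular 50 x 50 longitude/latitude grid
n_grid <- 50
surf <- with(wells, interp(
  x = long_dd_NAD83, y = lat_dd_NAD83, z = lev_va_ft,
  xo = seq(min(long_dd_NAD83), max(long_dd_NAD83), length.out = n_grid),
  yo = seq(min(lat_dd_NAD83), max(lat_dd_NAD83), length.out = n_grid),
  linear = TRUE
))

# Grid cells outside the convex hull of the wells are NA and are not drawn
persp(surf$x, surf$y, surf$z, theta = 30, phi = 30,
      xlab = "Longitude", ylab = "Latitude", zlab = "lev_va_ft")
```

With linear interpolation every filled cell is a weighted average of three wells, so the surface can never overshoot the measured range, unlike a spline.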
You could use the deldir package to get a Delaunay triangulation of your points, then convert it to the form of data required by triangles3d for plotting. I don't know how effective this would be on a really large dataset, but it seems to work on 100 points:
library(deldir)
library(rgl)
# Create some fake data
x <- rnorm(100)
y <- rnorm(100)
z <- x^2 + y^2
# Triangulate it in x and y
del <- deldir(x, y, z = z)
triangs <- do.call(rbind, triang.list(del))
# Plot the resulting surface
plot3d(x, y, z, type = "n")
triangles3d(triangs[, c("x", "y", "z")], col = "gray")
EDITED to add:
The version of rgl on R-forge now has a function to make this easy. You can now produce a plot similar to the one above using
library(deldir)
library(rgl)
plot3d(deldir(x, y, z = z))
There is also a function to construct mesh3d objects from the deldir() output.
Take the following as a simple example:
A <- c(1,1,1,2,2,3,3,4,4,4)
B <- c(1,0,0,1,0,1,0,1,0,0)
C <- c(6,3,2,4,1,2,6,8,4,3)
data <- data.frame(A,B,C)
data
I want to create a scatterplot that looks like so:
(without the blue and red borders; they are only there as an explanatory guide)
So I want to plot the following:
Each time B = 1, I want to use its C value on the horizontal axis and plot the corresponding C values where B = 0 along the vertical axis.
So, for example:
- where x = 6, we have points at y = 3 and 2
- where x = 4, we have a point at y = 1
- where x = 2, we have a point at y = 6
- where x = 8, we have points at y = 4 and 3
Do I need to manipulate/melt/reshape my data somehow?
Using na.locf from the zoo package there is no need for reshaping.
library(zoo)
#extract the part of C that we need for mapping x
data$D = ifelse(data$B==1,data$C,NA)
#fill in the blanks
data$D = na.locf(data$D)
#Extract from C what we need for y
data$E = ifelse(data$B==1,NA,data$C)
#Done!
plot(data$D,data$E)
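The na.locf approach relies on each B = 1 row coming before its B = 0 rows. If the rows might not be in that order, a base R alternative (a different technique from the answer, not a replacement for it) is to pair the values through column A with merge(). This assumes A identifies the groups and that each group has exactly one B = 1 row; the names `xs`, `ys` and `pairs` are hypothetical.

```r
data <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4),
                   B = c(1, 0, 0, 1, 0, 1, 0, 1, 0, 0),
                   C = c(6, 3, 2, 4, 1, 2, 6, 8, 4, 3))

# C values where B == 1 become the horizontal coordinate
xs <- setNames(data[data$B == 1, c("A", "C")], c("A", "x"))
# C values where B == 0 become the vertical coordinate
ys <- setNames(data[data$B == 0, c("A", "C")], c("A", "y"))

# One row per (x, y) pair within the same A group
pairs <- merge(xs, ys, by = "A")
plot(pairs$x, pairs$y, xlab = "C where B = 1", ylab = "C where B = 0")
```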
I have a data frame of two variables, x and y, in R. What I want to do is bin each entry by its value of x, but then display the density of the value of y for all entries in each bin. More specifically, for each interval of x, I want to plot (sum of all y values of entries whose x values fall in that interval) / (sum of all y values over all entries). I know how to do this manually via vector manipulation, but I have to make a lot of these plots and wanted to know if there was a quicker way to do this, maybe via some more advanced use of hist().
You could generate the groupings using cut and then use a facet_grid to display the multiple histograms:
# Sample data with y depending on x
set.seed(144)
dat <- data.frame(x=rnorm(1000))
dat$y <- dat$x + rnorm(1000)
# Generate bins of x values
dat$grp <- cut(dat$x, breaks=2)
# Plot
library(ggplot2)
ggplot(dat, aes(x=y)) + geom_histogram() + facet_grid(grp~.)
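The faceted histograms above show the distribution of y within each x bin. The quantity described in the question, each bin's share of the total sum of y, can also be computed directly with cut() and tapply(). This is a sketch assuming y is non-negative (the simulated y is wrapped in abs() for that reason) and assuming fixed bins of width 0.5; `breaks`, `bins` and `share` are hypothetical names.

```r
# Sample data as in the answer, with y made non-negative
set.seed(144)
dat <- data.frame(x = rnorm(1000))
dat$y <- abs(dat$x + rnorm(1000))

# Bins of width 0.5 that cover the whole range of x
breaks <- seq(floor(min(dat$x)), ceiling(max(dat$x)), by = 0.5)
bins <- cut(dat$x, breaks = breaks, include.lowest = TRUE)

# Sum of y in each bin divided by the total sum of y
share <- tapply(dat$y, bins, sum) / sum(dat$y)
share[is.na(share)] <- 0  # empty bins contribute nothing

barplot(share, las = 2, ylab = "Share of total y")
```

In ggplot2, a similar plot should be obtainable with geom_histogram(aes(x = x, weight = y / sum(y)), breaks = breaks), since the bin statistic sums the weight aesthetic instead of counting rows.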