How can I use stat_bin2d with pre-binned data? - r

I want to generate a stat_bin2d() plot, but for pre-binned data.
That is, rather than raw points such as
x y
5 3
13 4
13 14
16 12
15 13
I instead have the data pre-binned, with each bin identified by its corner point, in this case:
x y freq
0 0 1
0 10 0
10 0 1
10 10 3
I believe it might have something to do with the data parameter of stat_bin2d(), but I can't find any documentation on this.

You can use geom_bin2d() (with an "identity" stat), or just directly draw rectangles.
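library(ggplot2)
dat <- data.frame(x=c(0,0,10,10), y=c(0,10,0,10), freq=c(1,0,1,3))
# Option 1: geom_bin2d() with the identity stat. Note: recent ggplot2 versions
# draw this geom as tiles and may require x and y aesthetics; if so, use option 2.
ggplot(dat) +
  geom_bin2d(aes(xmin=x, ymin=y, xmax=x+10, ymax=y+10, fill=freq), stat="identity")
# Option 2: draw the rectangles directly.
ggplot(dat) +
  geom_rect(aes(xmin=x, ymin=y, xmax=x+10, ymax=y+10, fill=freq))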
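As a sanity check on the corner-point layout, the pre-binned table in the question can be rebuilt from the raw points with base R. This is only a sketch: it assumes each bin is labelled by its lower-left corner and is 10 units wide (as the x+10 / y+10 in the answer implies), and the names raw, bin_width and counts are made up for illustration.

```r
# Raw points from the question
raw <- data.frame(x = c(5, 13, 13, 16, 15), y = c(3, 4, 14, 12, 13))

bin_width <- 10  # assumed bin size

# Snap each point to the lower-left corner of the bin it falls in
raw$xbin <- floor(raw$x / bin_width) * bin_width
raw$ybin <- floor(raw$y / bin_width) * bin_width

# Count points per bin; table() also keeps empty corner combinations
counts <- as.data.frame(table(x = raw$xbin, y = raw$ybin), responseName = "freq")

# table() returns the corners as factors, so convert them back to numbers
counts$x <- as.numeric(as.character(counts$x))
counts$y <- as.numeric(as.character(counts$y))
counts <- counts[order(counts$x, counts$y), ]
print(counts)
```

The resulting counts data frame has the same x, y and freq columns as the pre-binned data, so it can be passed to the rectangle-drawing code unchanged.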

Related

geom_hline with multiple points and facet_wrap

I am trying to plot horizontal lines at specific points of my data. For each of my axes (SA, VLA, HLA), I would like a horizontal line whose y-intercept is the first value, i.e. the value at 0 equivalent iterations. My question will become clearer with data.
iterations subsets equivalent_iterations axis ratio1 ratio2
0 0 0 SA 0.023569024 0.019690577
0 0 0 SA 0.023255814 0.019830028
0 0 0 VLA 0.025362319 0.020348837
0 0 0 HLA 0.022116904 0.021472393
2 2 4 SA 0.029411765 0.024911032
2 2 4 SA 0.024604569 0.022838499
2 2 4 VLA 0.026070764 0.022727273
2 2 4 HLA 0.027833002 0.027888446
4 15 60 SA 0.019746121 0.014403292
4 15 60 SA 0.018691589 0.015538291
4 15 60 VLA 0.021538462 0.01686747
4 15 60 HLA 0.017052375 0.017326733
16 5 80 SA 0.019021739 0.015021459
16 5 80 SA 0.020527859 0.015384615
16 5 80 VLA 0.023217247 0.017283951
16 5 80 HLA 0.017391304 0.016298021
And this is my plot using ggplot:
ggplot(df) +
  aes(x = equivalent_iterations, y = ratio1, color = equivalent_iterations) +
  geom_point() +
  facet_wrap(~axis) +
  expand_limits(x = 0, y = 0)
What I want, for each axis SA, VLA, HLA (i.e. each facet_wrap panel), is a horizontal line through the first point (the one at 0 equivalent iterations), with the y-intercept given by ratio1 (column 5) in the first 4 rows. Any help will be greatly appreciated. Thank you in advance.
You can treat it like any other geom_*. Just create a new column with the value of ratio1 at which you want to plot the horizontal line. I do this by subsetting the data to the rows where iterations == 0 (note that SA has 2 of these, so it gets two lines) and joining the ratio1 column onto the original data frame. This column can then be passed to the aesthetics call in geom_hline().
library(tidyverse)
df %>%
  left_join(df %>%
              filter(iterations == 0) %>%
              select(axis, intercept = ratio1),
            by = "axis") %>%
  ggplot(aes(x = equivalent_iterations, y = ratio1,
             color = equivalent_iterations)) +
  geom_point() +
  geom_hline(aes(yintercept = intercept)) +
  facet_wrap(~axis) +
  expand_limits(x = 0, y = 0)
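If only one line per panel is wanted, one possible variation is to build a small per-axis table of intercepts and hand it to geom_hline() through its data argument. The sketch below uses a shortened copy of the question's data and keeps the first row per axis at 0 equivalent iterations; the name baseline and the keep-the-first-row rule are assumptions for illustration (averaging the two SA values would be just as defensible).

```r
# Shortened copy of the question's data (first eight rows, three columns)
df <- data.frame(
  equivalent_iterations = c(0, 0, 0, 0, 4, 4, 4, 4),
  axis = c("SA", "SA", "VLA", "HLA", "SA", "SA", "VLA", "HLA"),
  ratio1 = c(0.023569024, 0.023255814, 0.025362319, 0.022116904,
             0.029411765, 0.024604569, 0.026070764, 0.027833002)
)

# Rows at 0 equivalent iterations, reduced to the first row per axis
baseline <- df[df$equivalent_iterations == 0, c("axis", "ratio1")]
baseline <- baseline[!duplicated(baseline$axis), ]
names(baseline)[2] <- "intercept"
print(baseline)

# Plot only if ggplot2 is installed; the axis column in baseline lets
# facet_wrap() place each line in the matching panel
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = equivalent_iterations, y = ratio1)) +
    geom_point() +
    geom_hline(data = baseline, aes(yintercept = intercept)) +
    facet_wrap(~axis) +
    expand_limits(x = 0, y = 0)
}
```

Keeping the intercepts in their own data frame avoids duplicating rows of the plotted data, which is what a join does when an axis has more than one baseline row.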

In R, using ggplot + geom_bar, how do you vary the width of a bar?

I am able to produce a barplot with variable binwidths using
byA<-barplot(table(A),width=B$length, space = 0,col="black")
How might I do the same using ggplot?
I have tried this:
ggplot(data.frame(table(A)), aes(x=B$Start, y=Freq, width=B$length)) +
geom_bar(aes(fill=B$length), stat="identity", position="identity")
How can I get rid of the spaces between bars and shift their position to start at 0 rather than be centered on the x markers? (I'm guessing this might get me the same barplot.) NB: I prefer the x-axis values as in the second plot, so that's fine.
These are the first 20 rows of my data:
bin Freq
1 [0,0.78) 9
2 [0.78,0.99) 1
3 [0.99,1.07) 1
4 [1.07,1.201) 1
5 [1.201,1.211) 0
6 [1.211,1.77) 3
7 [1.77,1.95) 0
8 [1.95,2.14) 2
9 [2.14,2.15) 0
10 [2.15,2.581) 0
11 [2.581,3.04) 4
12 [3.04,3.11) 0
13 [3.11,3.22) 0
14 [3.22,3.33) 1
15 [3.33,3.58) 3
16 [3.58,4.18) 8
17 [4.18,4.29) 2
18 [4.29,4.48) 4
19 [4.48,4.62) 4
20 [4.62,4.8) 3
By using the breaks argument within stat_bin you can build the plot you are looking for.
# We'll use the diamonds data set that ships with ggplot2 as an example
library(ggplot2)
# Set breaks for the bars. With left-closed intervals the bins are:
# 1st bin: [0, 100), centered at 50
# 2nd bin: [100, 500), centered at 300
# 3rd bin: [500, 1000),
# 4th bin: [1000, 2000),
# 5th bin: [2000, 5000),
# etc.
#
# To have (a, b] style intervals, older ggplot2 versions used the argument
# right = TRUE in the stat_bin call; recent versions use closed = "right"
# (their default) and closed = "left" for [a, b).
brks <- c(0, 100, 500, 1000, 2000, 5000, 7500, 10000, max(diamonds$price))
# stat_bin() draws bars by default. Do not add geom_bar() as well: in recent
# ggplot2 versions it counts unique x values instead of binning.
ggplot(diamonds, aes(x = price)) +
  stat_bin(breaks = brks, closed = "left")
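Since the data in the question is already binned, another route is to parse the bin labels into numeric edges and draw one rectangle per bin, which gives touching bars of variable width that start at each bin's left edge. This is a rough sketch using the first four rows of the question's table; the column names start and end are invented here, and the parsing assumes every label has the form [a,b).

```r
# First four rows of the pre-binned table from the question
binned <- data.frame(
  bin = c("[0,0.78)", "[0.78,0.99)", "[0.99,1.07)", "[1.07,1.201)"),
  Freq = c(9, 1, 1, 1)
)

# Strip the brackets and split each label at the comma
edges <- strsplit(gsub("\\[|\\)", "", as.character(binned$bin)), ",")
binned$start <- as.numeric(sapply(edges, `[`, 1))
binned$end <- as.numeric(sapply(edges, `[`, 2))
print(binned)

# Draw each bin as a rectangle from its start to its end (if ggplot2 is installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(binned) +
    geom_rect(aes(xmin = start, xmax = end, ymin = 0, ymax = Freq),
              fill = "black")
}
```

Because the rectangles span exactly from start to end, there are no gaps between bars and nothing is centered on the x markers.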

R: Plot Density Graph for data in tables with respect to Labels in tables

I have data in table form that looks like this in R:
V1 V2
1 19 -1539
2 7 -1507
3 3 -1446
4 7 -1427
5 8 -1401
6 2 -422
7 22 4178
8 5 4277
9 10 4303
10 18 4431
....200 million more lines to go
I would like to plot a density plot of the values in the second column, grouped by the label in the first column (i.e. each label has one density curve, all on the same graph), but I don't know how. Any suggestions?
If I understood the question correctly, this would end up looking somewhat like a density heatmap (considering there are 200 million observations in total and V1 has a fairly considerable range of variation).
For that I would try ggplot and stat_binhex:
df <- read.table(text="V1 V2
1 19 -1539
2 7 -1507
3 3 -1446
4 7 -1427
5 8 -1401
6 2 -422
7 22 4178
8 5 4277
9 10 4303
10 18 4431")
library(ggplot2)
ggplot(data=df,aes(V1,V2)) +
stat_binhex() +
scale_fill_gradient(low="red", high="steelblue") +
scale_y_continuous() +
theme_bw()
stat_binhex should work well with large data and has several parameters that help with presentation (such as bins and binwidth; see ?stat_binhex).
OK, I figured it out myself:
ggplot(data, aes(x=V2, color=V1)) + geom_density(aes(group=V1))
should do it.
However, there are two things I need to make sure of first for it to run:
V1 is a factor
V2 is a numeric value
read.table did not set the columns up that way directly, so I had to do the following before using ggplot:
data$V1 = as.factor(data$V1)
data$V2 = as.numeric(as.character(data$V2))
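The as.character() step matters when a numeric column has been read in as a factor. A small illustration, with a made-up vector v standing in for such a column:

```r
# A numeric column that was accidentally read as a factor (illustrative values)
v <- factor(c("-1539", "4178", "-422"))

# Converting the factor directly returns its internal level codes
codes <- as.numeric(v)

# Going through character first recovers the original numbers
values <- as.numeric(as.character(v))

print(codes)
print(values)
```

Only the second conversion gives values that are meaningful to plot as a density.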

R/ ggplot2/ how to move from connected points in a scatter plot to filled and transparent triangles?

Here are the x and y coordinates from a multidimensional scaling
experiment: three cases with different distance metrics and
scaling/ no scaling. "Set" is the metrics-scaling combination
(1 to 6). Each case has a class label (0 or 4).
X1 X2 method scale class set
1 18.881729 -2.931111 euclidean no 0 1
2 -13.141592 -9.750710 euclidean no 4 1
3 -5.740138 12.681822 euclidean no 4 1
4 -21.886160 -15.467637 manhattan scaled 0 2
5 -16.755615 16.900148 manhattan scaled 4 2
6 38.641776 -1.432512 manhattan scaled 4 2
7 32.927820 -7.900971 minkowski no 0 3
8 -28.957697 -11.666982 minkowski no 4 3
9 -3.970123 19.567953 minkowski no 4 3
10 5.944225 25.819482 euclidean scaled 0 4
11 44.574669 -15.330675 euclidean scaled 4 4
12 -50.518894 -10.488807 euclidean scaled 4 4
13 14.287762 1.142065 manhattan no 0 5
14 -5.843410 -9.981600 manhattan no 4 5
15 -8.444351 8.839535 manhattan no 4 5
16 -24.838956 -8.194378 minkowski scaled 0 6
17 -11.435517 10.496471 minkowski scaled 4 6
18 36.274473 -2.302093 minkowski scaled 4 6
and ggplotting:
p <- ggplot(df, aes(X1, X2))
p <- p + geom_point(aes(colour = factor(scale), shape = factor(method)), size=10)
p <- p + geom_text(aes(label=class), size=5)
p <- p + geom_line(aes(X1,X2, group=factor(set)))
p <- p + theme_bw()
p
I would like to make 6 filled and transparent triangles one for each group ("set").
The top triangle being Manhattan-No scaling. My experiments with geom_segment have
not been successful and I am not sure whether geom_polygon is the right direction.
Any advice? Thanks!
You can use geom_polygon for closed paths:
library(ggplot2)
p <- ggplot(df, aes(X1, X2))
p <- p + geom_polygon(aes(fill = factor(set)), alpha = .4)
p <- p + theme_bw()
p
Here, alpha is used to specify the degree of transparency.

Plot Issues - Start always in (0,0)

I am working with a huge data set where all columns look something like this:
0
10
12
30
10
0
20
30
0
40
50
10
0
The idea is to make a simple plot in R where, every time it reads a 0, the plot starts again at (0,0).
Do you have any idea of how I can do this?
Thanks in advance,
J
UPDATE:
I am a new user so I can't post any images!
Here's an example of the column I want to plot:
0
10
20
12
5
6
9
0
20
24
40
14
0
20
59
50
12
0
20
23
49
45
23
12
(...)
Imagine a line plot.
Instead of plotting a long line with all the values I want to plot several shorter lines with the first line plotting (0,10,20,12,5,6,9), the second line plotting (0,20,24,40,14) etc...
I would add an additional column specifying which sub-dataset each value belongs to:
Value Group
0 1
1 1
5 1
0 2
Etc.
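You can then plot the subgroups using e.g. ggplot2 (xcoor stands for whatever column holds your x coordinates):
ggplot(yourdata, aes(x = xcoor, y = Value, color = factor(Group))) +
  geom_line()
This will draw the lines in different colors. Alternatively, with base plot, something like:
split_dat = with(yourdata, split(Value, Group))
# Draw the first group as a line, sized so that every group fits
plot(split_dat[[1]], type = "l",
     xlim = c(1, max(lengths(split_dat))), ylim = range(yourdata$Value))
# Add the remaining groups to the same plot
for(i in 2:length(split_dat)) {
  lines(split_dat[[i]])
}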
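One way to derive the Group column, sketched here under the assumption that every 0 marks the start of a new line, is a running count of the zeros. The within-group x coordinate can be built the same way so that each line starts at x = 0. The names Value, Group and xcoor mirror the ones used above; yourdata is a hypothetical data frame holding a shortened copy of the question's column.

```r
# Shortened copy of the column from the question
yourdata <- data.frame(
  Value = c(0, 10, 20, 12, 5, 6, 9, 0, 20, 24, 40, 14, 0, 20, 59, 50, 12)
)

# Each 0 starts a new group, so a running count of zeros labels the groups
yourdata$Group <- cumsum(yourdata$Value == 0)

# Position within each group, shifted so that every line starts at x = 0
yourdata$xcoor <- ave(yourdata$Value, yourdata$Group, FUN = seq_along) - 1

print(yourdata)
```

With these two columns in place, every sub-line begins at (0, 0) when plotted by group.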
