Unable to understand graph data in R language - r

I'm working on a task where I need to find out whether price increases as the number of rooms increases. I've used ggplot2 with geom_point, but I can't tell from the plot whether there is any increase.
Could anyone help me understand this graph, or suggest another way to draw it so that the trend is easier to see?
This is my code:
ggplot(df, aes(x = rooms, y = price)) + geom_point()

Try this - it adds a regression line with confidence interval:
ggplot(df, aes(x = rooms, y = price)) +
geom_point() +
geom_smooth(method = "lm")
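To check the trend numerically as well as visually, a minimal sketch along these lines may help. The data frame here is simulated, since the asker's df is not shown; only the column names rooms and price come from the question, and the simulated values are assumptions.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data: the values are simulated
set.seed(1)
df <- data.frame(rooms = sample(1:6, 200, replace = TRUE))
df$price <- 50000 + 30000 * df$rooms + rnorm(200, sd = 40000)

# Slope of the fitted line: a positive value means price tends to rise with rooms
fit <- lm(price ~ rooms, data = df)
coef(fit)["rooms"]

# One box per room count makes the per-level medians easy to compare
p <- ggplot(df, aes(x = factor(rooms), y = price)) +
  geom_boxplot() +
  labs(x = "rooms")
p
```

Treating rooms as a factor is a design choice: it gives one box per room count instead of a cloud of overplotted points.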

To improve the presentation of your data, you could use geom_jitter so that the points overlap less. Perhaps you could tweak the transparency, too. If you add geom_violin, you could also show the distribution of the points. Finally, you can add the mean for every level (number of rooms). Something along the lines of
library(ggplot2)
ggplot(mtcars, mapping = aes(x = cyl, y = hp)) +
theme_bw() +
stat_summary(geom = "point", fun = mean, aes(group = 1), size = 2, color = "red") +
geom_jitter(width = 0.25)
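The suggestions above (jitter, transparency, violins and per-level means) can be combined in one plot. This is only a sketch on mtcars, with cyl standing in for the number of rooms and hp for the price; the widths, alpha value and colours are arbitrary choices, and the fun argument assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = hp)) +
  theme_bw() +
  # Violin in the background shows the distribution within each level
  geom_violin(fill = "grey90", colour = NA) +
  # Horizontal jitter only, with transparency, so y values stay exact
  geom_jitter(width = 0.15, height = 0, alpha = 0.5) +
  # Red point marks the mean of each level
  stat_summary(fun = mean, geom = "point", size = 3, colour = "red")
p
```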

Related

R, ggplot; Control size of symbols and line thickness independently in legend in ggplot

I have a plot looking principally like this when based on the mpg-dataset:
library(ggplot2)  # provides ggplot() and the mpg dataset
plot2 <- ggplot(mapping = aes(
x = cty,
y = hwy,
group = as.factor(cyl),
shape = as.factor(cyl),
linetype = as.factor(cyl)),
data = mpg) +
geom_point() +
geom_smooth(method = lm, se = FALSE, color = "black") +
theme(legend.key.width = unit(4,"cm"))
plot2
I would like to be able to control the size of the symbols in the legend without affecting the thickness of the lines. Trying with the idea of using override.aes from similar threads gives bigger symbols but at the same time thicker lines.
plot2 + guides(shape = guide_legend(override.aes = list(size=5)))
This question is partly covered in Modifying legends in ggplot2 with interactions and guides; however, the question in the link addresses changes to Geom_Path and not Geom_Smooth, and it does not explain how to find the exact definition of the ggproto() object to change. It would be helpful if someone could supply this information; then I could probably modify the code myself in a similar fashion.
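One possible approach, sketched under the assumption that ggplot2 3.4.0 or later is installed: from that version on, line thickness is controlled by the linewidth aesthetic rather than size, so overriding size in the legend should enlarge only the point symbols. The name plot3 and the linewidth value are illustrative.

```r
library(ggplot2)  # assumes ggplot2 >= 3.4.0, where lines use `linewidth`

plot2 <- ggplot(mpg, aes(x = cty, y = hwy,
                         group = factor(cyl),
                         shape = factor(cyl),
                         linetype = factor(cyl))) +
  geom_point() +
  # Line thickness is set through linewidth, independently of point size
  geom_smooth(method = "lm", se = FALSE, colour = "black", linewidth = 0.5) +
  theme(legend.key.width = unit(4, "cm"))

# Enlarge only the legend symbols; the legend lines keep their linewidth
plot3 <- plot2 +
  guides(shape = guide_legend(override.aes = list(size = 5)))
plot3
```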

How to correctly represent both hline and abline in a legend in ggplot2?

I am trying to create a legend in ggplot2 for hlines and ablines using clues from other similar questions. I am close to getting what I need with the following code (and example image) but I can't seem to get rid of the extra lines crossing the legend icons.
p <- ggplot(mtcars, aes(x = wt, y=mpg, col = factor(cyl))) + geom_point()
p + geom_hline(aes(lty="foo",yintercept=20)) +
geom_hline(aes(lty="bar",yintercept=25)) +
geom_hline(aes(lty="bar",yintercept=30)) +
geom_abline(aes(lty = "regression", intercept = 10 , slope = 1)) +
scale_linetype_manual(name="",values=c(2,3,1))
This behavior in the legend only appears when I include the abline. Without it, both hline appear as intended in the legend.
What am I missing here?
As a secondary point: both hlines (labelled "bar" here) here use the exact same configuration, but have different values for yintercept. I wasn't able to draw both of them with the same command, receiving an error (Error: Aesthetics must be either length 1 or the same as the data (32): linetype, yintercept).
Whenever I copy&paste a command like this, it feels like I'm not doing it right. Is it possible to set two yintercepts, while manually defining the linetype to create a legend?
You can use the show.legend argument in geom_abline:
ggplot() +
geom_point(aes(x = mtcars$wt, y=mtcars$mpg, col = factor(mtcars$cyl))) +
geom_hline(aes(lty=c("foo", "bar","bar"),yintercept=c(20,25,30))) +
geom_abline(aes(lty = "regression", intercept = 10 , slope = 1), show.legend = F) +
scale_linetype_manual(name="",values=c(2,3,1) )
If you do not define the data in the ggplot() call, as in the example above, you can define all the hlines in a single command. With the data defined in ggplot(), each hline needs its own call:
ggplot(mtcars, aes(x = wt, y=mpg, col = factor(cyl))) + geom_point() +
geom_hline(aes(lty="foo",yintercept=20)) +
geom_hline(aes(lty="bar",yintercept=25)) +
geom_hline(aes(lty="bar",yintercept=30)) +
geom_abline(aes(lty = "regression", intercept = 10 , slope = 1), show.legend = F) +
scale_linetype_manual(name="",values=c(2,3,1) )
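Another way to avoid repeating geom_hline, sketched here as an assumption rather than as part of the answer above, is to put the intercepts in their own small data frame and pass it to the layer. The name hlines and its column names are hypothetical.

```r
library(ggplot2)

# Hypothetical helper data frame: one row per horizontal line
hlines <- data.frame(
  yintercept = c(20, 25, 30),
  type = c("foo", "bar", "bar")
)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # Colour is mapped only in the point layer, so the lines stay out of the colour legend
  geom_point(aes(colour = factor(cyl))) +
  # Both "bar" lines come from a single geom_hline call
  geom_hline(data = hlines, aes(yintercept = yintercept, linetype = type)) +
  geom_abline(aes(linetype = "regression", intercept = 10, slope = 1),
              show.legend = FALSE) +
  # Named values keep each label tied to its line type
  scale_linetype_manual(name = "", values = c(bar = 3, foo = 2, regression = 1))
p
```

Giving the layer its own data sidesteps the length error from the question, because the aesthetics no longer have to match the 32 rows of mtcars.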

Change color and add shape to lines on ggplot2 geom_freqpoly

I'm trying to add shapes on the lines plotted using geom_freqpoly to give more visibility to them if the plot is printed b/w on paper.
data <- data.frame(time=runif(1000,0,20000),
class=c("a","b","c","d"))
ggplot(data, aes(time, colour = class)) + geom_freqpoly(binwidth = 1000) + geom_point(aes(shape=class))
but this generates this error:
'Error: geom_point requires the following missing aesthetics: y'
How can I solve this error?
Another thing: I want to use a single colour (e.g. blue) to draw the lines,
but with scale_colour_brewer() I can't change the colour scale. I want to change it because the lightest colour is nearly white and you can barely see it.
How can I set a custom min and max for the colours?
How about this? The error comes from geom_point, which requires both x and y aesthetics, so I removed it.
ggplot(data, aes(x = time, color = class)) +
geom_freqpoly(binwidth = 1000) +
scale_color_brewer(palette = "Blues") +
theme_dark()
If you don't want the dark background, pass manual values from RColorBrewer. The following example uses every second color to increase the contrast.
p1 <- ggplot(data, aes(x = time, color = class)) +
geom_freqpoly(binwidth = 1000) +
scale_color_manual(values = RColorBrewer::brewer.pal(9, name = "Blues")[c(3, 5, 7, 9)])
EDIT
You can extract the summarised data from a ggplot object using the layer_data function.
xy <- layer_data(p1)
ggplot(xy, aes(x = x, y = count, color = colour)) +
theme_bw() +
geom_line() +
geom_point() +
scale_color_manual(values = RColorBrewer::brewer.pal(9, name = "Blues")[c(3, 5, 7, 9)])
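If a different shape per class is wanted, as in the question, one option is to let geom_point use the same binning statistic as geom_freqpoly, so that it receives a computed y. This is a sketch, not a tested drop-in: it assumes both layers share the same binwidth, and the seed is arbitrary.

```r
library(ggplot2)

set.seed(42)
data <- data.frame(time = runif(1000, 0, 20000),
                   class = c("a", "b", "c", "d"))

p <- ggplot(data, aes(x = time, colour = class)) +
  geom_freqpoly(binwidth = 1000) +
  # stat = "bin" supplies the per-bin count as y, which plain geom_point lacks
  geom_point(aes(shape = class), stat = "bin", binwidth = 1000) +
  scale_colour_manual(
    values = RColorBrewer::brewer.pal(9, name = "Blues")[c(3, 5, 7, 9)]
  )
p
```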

ggplot2: geom_bar with group, position_dodge and fill

I am trying to generate a barplot such that the x-axis is by patient, with each patient having multiple samples. So for instance (using the mtcars data as a template of what the data would look like):
library("ggplot2")
ggplot(mtcars, aes(x = factor(cyl), group = factor(gear))) +
geom_bar(position = position_dodge(width = 0.8)) +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample")
This would produce a plot in which each bar represents a sample in each patient.
I want to add additional information about each patient sample by using colors to fill the barplots (e.g. different types of mutations in each patient sample). I was thinking I could specify the fill parameter like this:
ggplot(mtcars, aes(x = factor(cyl), group = factor(gear), fill = factor(vs))) +
geom_bar(position = position_dodge(width = 0.8)) +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample")
But this doesn't produce "stacked barplots" for each patient sample barplot. I am assuming this is because position_dodge() is set. Is there any way to get around this? Basically, what I want is:
ggplot(mtcars, aes(x = factor(cyl), fill = factor(vs))) +
geom_bar() +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample")
But with these colors available in the first plot I listed. Is this possible with ggplot2?
I think facets are the closest approximation to what you seem to be looking for:
ggplot(mtcars, aes(x = factor(gear), fill = factor(vs))) +
geom_bar(position = position_dodge(width = 0.8)) +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample") +
facet_wrap(~cyl)
I haven't found anything related in the issue tracker of ggplot2.
If I understand your question correctly, you want to pass aes() to your geom_bar layer. This will allow you to pass a fill aesthetic. You can then position your bars as "dodge" or "fill", depending on how you want to display the data.
A short example is listed here:
ggplot(mtcars, aes(x = factor(cyl), fill = factor(vs))) +
geom_bar(aes(fill = factor(vs)), position = "dodge") +
xlab("Patient") +
ylab("Number of Mutations per Patient Sample")
With the resulting plot: http://imgur.com/ApUJ4p2 (sorry S/O won't let me post images yet)
Hope that helps!
I have hacked around this a few times by layering multiple geom_cols on top of each other in the order I prefer. For example, the code
ggplot(data, aes(x=cert, y=pct, fill=Party, group=Treatment, shape=Treatment)) +
geom_col(aes(x=cert, y=1), position=position_dodge(width=.9), fill="gray90") +
geom_col(position=position_dodge(width=.9)) +
scale_fill_manual(values=c("gray90", "gray60"))
allowed me to produce the feature you're looking for without faceting. Note that I set the background layer's y value to 1. To add more layers, you can cumulatively sum your variables.
I think my answer in this post will help you build the chart with multiple stacked vertical bars for each patient: Layered axes in ggplot?
One approach I don't see suggested above is to use facet_wrap to group samples by patient and then stack mutations by sample, which removes the need for dodging. I also changed which mtcars attributes are used, to match the question and to get more variety in the mutations attribute.
patients <-c('Tom','Harry','Sally')
samples <- c('S1','S2','S3')
mutations <- c('M1','M2','M3','M4','M5','M6','M7','M8')
ds <- data.frame(
patients=patients[mtcars$cyl/2 - 1],
samples=samples[mtcars$gear - 2],
mutations=mutations[mtcars$carb]
)
ggplot(
ds,
aes(
x = factor(samples),
group = factor(mutations),
fill = factor(mutations)
)
) +
geom_bar() +
facet_wrap(~patients,nrow=1) +
ggtitle('Patient') +
xlab('Sample') +
ylab('Number of Mutations per Patient Sample') +
labs(fill = 'Mutation')
The output now has labels that match the specific language of the request, which makes it easier to see what is going on.

What is the simplest method to fill the area under a geom_freqpoly line?

The x-axis is time broken up into time intervals. There is an interval column in the data frame that specifies the time for each row. The column is a factor, where each interval is a different factor level.
Plotting a histogram or line using geom_histogram and geom_freqpoly works great, but I'd like to have a line, like that provided by geom_freqpoly, with the area filled.
Currently I'm using geom_freqpoly like this:
ggplot(quake.data, aes(interval, fill=tweet.type)) + geom_freqpoly(aes(group = tweet.type, colour = tweet.type)) + opts(axis.text.x=theme_text(angle=-60, hjust=0, size = 6))
I would prefer to have a filled area, such as that provided by geom_density, but without smoothing the line.
geom_area has been suggested. Is there any way to use a ggplot2-generated statistic, such as ..count.., for the geom_area's y-values? Or does the count aggregation need to occur prior to using ggplot2?
As stated in the answer, geom_area(..., stat = "bin") is the solution:
ggplot(quake.data, aes(interval)) + geom_area(aes(y = ..count.., fill = tweet.type, group = tweet.type), stat = "bin") + opts(axis.text.x=theme_text(angle=-60, hjust=0, size = 6))
Perhaps you want:
geom_area(aes(y = ..count..), stat = "bin")
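The snippets above use the ggplot2 API of their time. In current releases, opts() and theme_text() have been replaced by theme() and element_text(), and after_stat(count) is the preferred spelling of ..count... As far as I know, stat = "bin" now requires a continuous x, so for a factor such as interval the sketch below uses stat = "count" instead. The data frame is a simulated stand-in for quake.data, and its values are assumptions.

```r
library(ggplot2)

# Hypothetical stand-in for quake.data: interval is a factor, as in the question
set.seed(7)
quake.data <- data.frame(
  interval = factor(sample(sprintf("%02d:00", 0:11), 500, replace = TRUE)),
  tweet.type = sample(c("original", "retweet"), 500, replace = TRUE)
)

p <- ggplot(quake.data, aes(x = interval)) +
  # stat = "count" tallies rows per factor level; group is needed on a discrete x
  geom_area(aes(y = after_stat(count), fill = tweet.type, group = tweet.type),
            stat = "count") +
  theme(axis.text.x = element_text(angle = -60, hjust = 0, size = 6))
p
```

Letting the stat do the counting means no aggregation is needed before calling ggplot().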
geom_ribbon can be used to produce a filled area between two lines without needing to explicitly construct a polygon. There is good documentation here.
ggplot(quake.data, aes(interval, fill=tweet.type, group = 1)) + geom_density()
But I don't think this is a meaningful graphic.
I'm not entirely sure what you're aiming for. Do you want a line or bars? You should check out geom_bar for filled bars. Something like:
p <- ggplot(data, aes(x = time, y = count))
p + geom_bar(stat = "identity")
If you want a line filled in underneath then you should look at geom_area which I haven't personally used but it appears the construct will be almost the same.
p <- ggplot(data, aes(x = time, y = count))
p + geom_area()
Hope that helps. Give some more info and we can probably be more helpful.
Actually, I would add an index (just the row number of the data), use that as x, and then use
p <- ggplot(data, aes(x = index, y = count))
p + geom_bar(stat = "identity") + scale_x_continuous("Intervals",
breaks = index, labels = intervals)
