What is happening with my geom_line() in ggplot2?

I am no expert in R, but I have used ggplot2 many times and never had any problems. Still, this time I am not able to plot lines in my graph and I have no idea why (it should be something really simple though).
For instance, with this data frame (plot.tab):
  def.percent    period
1   5.0657339 1984-1985
2   3.9164528 1985-1986
3   -1.756613 1986-1987
4   2.8184863 1987-1988
5   -2.606311 1988-1989
I use this code:
ggplot(plot.tab, aes(x=period, y=def.percent)) + geom_line() + geom_point() + ggtitle("Deforestation rates within Properties")
But when I run it, it only plots the points, without a line. It also gives me this message:
geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
It's not really an error, but I cannot figure out how to plot the lines. Any ideas?

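Your x axis (period) is a factor (discrete) rather than numeric, so ggplot2 puts each period into its own group, and geom_line() has nothing to connect within a group of one point. You can fix this by setting group = 1 in the aesthetics, which tells ggplot2 to put all the points into a single group and draw them as one line: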
ggplot(plot.tab, aes(x = period, y = def.percent, group = 1)) +
geom_line() +
geom_point() +
ggtitle("Deforestation rates within Properties")
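A quick way to see what group = 1 changes is to compare the group column that ggplot2 computes for the layer. This sketch rebuilds plot.tab from the printout in the question (with period stored as character, which ggplot2 treats as discrete) and uses layer_data() to inspect the built data; the names p_default, p_grouped, n_default and n_grouped are only illustrative.

```r
library(ggplot2)

# Rebuilt from the printout in the question; period is character, so discrete
plot.tab <- data.frame(
  def.percent = c(5.0657339, 3.9164528, -1.756613, 2.8184863, -2.606311),
  period = c("1984-1985", "1985-1986", "1986-1987", "1987-1988", "1988-1989")
)

# Without an explicit group, each level of the discrete x becomes its own group
p_default <- ggplot(plot.tab, aes(x = period, y = def.percent)) + geom_line()

# group = 1 puts every row into a single group, so geom_line() joins them
p_grouped <- ggplot(plot.tab, aes(x = period, y = def.percent, group = 1)) +
  geom_line()

# layer_data() returns the data ggplot2 builds for a layer, including 'group'
n_default <- length(unique(layer_data(p_default)$group))
n_grouped <- length(unique(layer_data(p_grouped)$group))
print(c(default = n_default, grouped = n_grouped))
```

With five periods, the default build has one group per period, which is exactly the situation the geom_path message warns about.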

Related

Struggling with scaling a secondary axis on a plot that is not a percentage

I'm going crazy here, please help me!
I'm new to R, which is probably why. I have a graph in which I'm trying to plot steps taken against the time needed to fall asleep (in minutes). I decided to plot the user ID on the x axis and give each of the other two variables its own vertical axis.
The result is as follows: [plot image not included]
I'm not happy with many things: the scaling of the line plot and of the secondary axis, the width of the columns in geom_col, and the axis labels for the user IDs, which have 10 digits each and show up in scientific notation.
Can you please help me with everything I mentioned, especially the scaling of the secondary axis?
I've searched and searched and can't do it.
Here is the code:
ggplot(data = sleep_steps) +
  geom_col(mapping = aes(x = Id, y = AVGSteps), fill = 'cyan') +
  geom_line(mapping = aes(x = Id, y = AVGMinToFallAsleep)) +
  labs(title = "Relationship between Steps and Time to Fall Asleep") +
  scale_y_continuous(sec.axis = sec_axis(~ . - 8*60*60, name = "Minutes to Fall Asleep"))
And the table is like this:
head(sleep_steps)
          Id  AVGSteps  AVGKcal AVGMinToFallAsleep AVGTotalMinAsleep
1 1503960366 12116.742 1816.419           22.92000          360.2800
2 1644430081  7282.967 2811.300           52.00000          294.0000
3 1844505072  2580.065 1573.484          309.00000          652.0000
4 1927972279   916.129 2172.806           20.80000          417.0000
5 2026352035  5566.871 1540.645           31.46429          506.1786
6 2347167796  9519.667 2043.444           44.53333          446.8000
I'm clueless. Since it is neither a percentage nor a datetime variable, I'm not sure what to do. I've tried changing the trans argument of the sec_axis() function, with no success. All columns of the data frame are numeric (num).
Thank you!
To start, you need Id as a factor, because the IDs identify individuals; they are not actual quantities.
Insert this before plotting:
sleep_steps$Id <- as.factor(sleep_steps$Id)
Without your data to check against, I would also say that you need another fill colour for your second scale. However, you are using geom_line(), which is not normally how you would plot individuals, because they are not connected to one another. You may need to reconsider that. Normally you would plot all your data with boxplots, which show the median and the quartiles.
If you are looking for an actual relationship, then you need to look into an lm plot: LINK HERE
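The answer above does not cover the secondary-axis scaling itself. sec_axis() can only apply a fixed transformation to the primary axis, so the usual pattern is to rescale the second variable into the primary axis's range and give sec_axis() the inverse transformation. A minimal sketch, assuming the six rows printed by head(sleep_steps) are representative; the scale factor k (chosen so the largest minute value lines up with the tallest bar) is an illustrative choice, not the only valid one. It applies the as.factor() step from the answer, narrows the columns with the width argument of geom_col(), and draws minutes as points rather than a connected line, in keeping with the remark that individuals are not connected.

```r
library(ggplot2)

# Rows copied from the head(sleep_steps) output in the question
sleep_steps <- data.frame(
  Id = c(1503960366, 1644430081, 1844505072, 1927972279, 2026352035, 2347167796),
  AVGSteps = c(12116.742, 7282.967, 2580.065, 916.129, 5566.871, 9519.667),
  AVGMinToFallAsleep = c(22.92, 52.00, 309.00, 20.80, 31.46429, 44.53333)
)

# IDs label individuals, so treat them as categories, not numbers
sleep_steps$Id <- as.factor(sleep_steps$Id)

# Illustrative scale factor: maps the largest minute value onto the tallest bar
k <- max(sleep_steps$AVGSteps) / max(sleep_steps$AVGMinToFallAsleep)

p <- ggplot(sleep_steps, aes(x = Id)) +
  geom_col(aes(y = AVGSteps), fill = "cyan", width = 0.6) +
  # Minutes are multiplied by k so they sit on the steps axis
  geom_point(aes(y = AVGMinToFallAsleep * k)) +
  scale_y_continuous(
    name = "Average steps",
    # The secondary axis divides by k to show the original minutes again
    sec.axis = sec_axis(~ . / k, name = "Minutes to Fall Asleep")
  ) +
  labs(title = "Relationship between Steps and Time to Fall Asleep") +
  # 10-digit IDs are long; rotating the labels keeps them readable
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

The important part is that the multiplication in the layer and the division in sec_axis() use the same k; subtracting a constant, as in the question, only shifts the labels and cannot reconcile two ranges this different.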

How to create a bar chart in R with both X and Y axes defined? [duplicate]

I have a dataframe:
>picard
count reads
1 20681318
2 3206677
3 674351
4 319173
5 139411
6 117706
How do I plot log10(count) vs log10(reads) on a ggplot (barplot)?
I tried:
ggplot(picard) + geom_bar(aes(x=log10(count),y=log10(reads)))
But it is not accepting y=log10(reads). How do I plot my y values?
You can do something like this, although putting the x axis, which is not continuous, on a log10 scale doesn't make sense to me:
ggplot(picard) +
geom_bar(aes(x=count,y=reads),stat="identity") +
scale_y_log10() +
scale_x_log10()
If you only want the y axis on a log10 scale, just do:
ggplot(picard) +
geom_bar(aes(x=count,y=reads),stat="identity") +
scale_y_log10()
Use stat="identity":
ggplot(picard) + geom_bar(aes(x=log10(count),y=log10(reads)), stat="identity")
You will actually get a warning with your approach:
Mapping a variable to y and also using stat="bin".
With stat="bin", it will attempt to set the y value to the count of cases in each group.
This can result in unexpected behavior and will not be allowed in a future version of ggplot2.
If you want y to represent counts of cases, use stat="bin" and don't map a variable to y.
If you want y to represent values in the data, use stat="identity".
See ?geom_bar for examples. (Deprecated; last used in version 0.9.2)
There's a direct way to do this: use the geom_col() function. Just make a tiny adjustment to your code:
ggplot(picard) + geom_col(aes(x=log10(count), y=log10(reads)))
This gives the same output as setting stat = "identity" in geom_bar(). The thing is, geom_bar() uses stat = "count" by default, so it does not take a variable for the y axis; it simply uses the count, i.e. the number of occurrences of each x value, as its y axis. I hope this answers your question.
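To check that geom_col() and geom_bar(stat = "identity") draw the same bars, you can compare the bar heights ggplot2 computes for each layer with layer_data(). This sketch assumes count holds the values 1 to 6; that is an assumption, since the printout in the question does not make clear whether count is a column or the row index.

```r
library(ggplot2)

# Assumption: count holds the values 1 to 6
picard <- data.frame(
  count = 1:6,
  reads = c(20681318, 3206677, 674351, 319173, 139411, 117706)
)

# geom_col(): bar heights are taken directly from the y aesthetic
p_col <- ggplot(picard) + geom_col(aes(x = log10(count), y = log10(reads)))

# geom_bar() with stat = "identity": same heights, longer spelling
p_bar <- ggplot(picard) +
  geom_bar(aes(x = log10(count), y = log10(reads)), stat = "identity")

# Bar heights as built by each layer
h_col <- layer_data(p_col)$y
h_bar <- layer_data(p_bar)$y
```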

How to make a stacked bar chart with count values on the y axis?

I'm trying to create a stacked barchart with gene sequencing data, where for each gene there is a tRF.type and Amino.Acid value. An example data set looks like this:
tRF <- c('tRF-26-OB1690PQR3E', 'tRF-27-OB1690PQR3P', 'tRF-30-MIF91SS2P46I')
tRF.type <- c('5-tRF', 'i-tRF', '3-tRF')
Amino.Acid <- c('Ser', 'Lys', 'Ser')
tRF.data <- data.frame(tRF, tRF.type, Amino.Acid)
I would like the x-axis to represent the amino acid type, the y-axis the count of each tRF type, and the fill of the bars to represent each tRF type.
My code is:
ggplot(chart_data, aes(x = Amino.Acid, y = tRF.type, fill = tRF.type)) +
geom_bar(stat="identity") +
ggtitle("LAN5 - 4 days post CNTF treatment") +
xlab("Amino Acid") +
ylab("tRF type")
However, it generates this graph, where the y-axis is labelled with the categories of tRF type. How can I change my code so that the y-axis scale is numerical and represents the counts of each tRF type?
[Image: bar chart]
Welcome to SO, OP. In future questions, please be sure to provide a minimal reproducible example, meaning code, an image (if possible), and at least a representative dataset that demonstrates your question or problem clearly.
TL;DR: don't use stat="identity"; just use geom_bar() without providing a stat, since the default is to use counts. This should work:
ggplot(chart_data, aes(x = Amino.Acid, fill = tRF.type)) + geom_bar()
The dataset provided doesn't adequately demonstrate your issue, so here's one that does. It consists of 100 observations in two columns: Capitals, holding randomly selected uppercase letters, and Lowercase, holding randomly selected lowercase letters.
library(ggplot2)
set.seed(1234)
df <- data.frame(
Capitals=sample(LETTERS, 100, replace=TRUE),
Lowercase=sample(letters, 100, replace=TRUE)
)
If I plot similar to your code, you can see the result:
ggplot(df, aes(x=Capitals, y=Lowercase, fill=Lowercase)) +
geom_bar(stat="identity")
You can see that the bars are stacked, but the y axis is all smooshed down. The reason has to do with the difference between geom_bar() and geom_col(). The documentation for these functions shows that the main difference is that geom_col() plots bars with heights equal to the y aesthetic, whereas geom_bar() by default uses stat="count". In fact, geom_bar(stat="identity") is really just a complicated way of saying geom_col().
Since your y aesthetic is not numeric, ggplot still tries to treat the discrete levels numerically. That doesn't work out well, and it's why your axis gets smooshed down like that. What you want is geom_bar(stat="count"), which is the same as using geom_bar() without providing a stat=.
The one catch is that geom_bar() with the count stat accepts only an x or a y aesthetic, not both, so give it just one of them. That fixes the issue, and you get the proper chart:
ggplot(df, aes(x=Capitals, fill=Lowercase)) + geom_bar()
You want your y-axis to be a count, not tRF.type. This code should give you the correct plot: I've removed y = tRF.type from ggplot(), and stat = "identity" from geom_bar(), so it uses the default stat = "count" instead.
ggplot(tRF.data, aes(x = Amino.Acid, fill = tRF.type)) +
geom_bar() +
ggtitle("LAN5 - 4 days post CNTF treatment") +
xlab("Amino Acid") +
ylab("Count")
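The counting that geom_bar() does can also be done up front, after which geom_col() draws the heights as given. This sketch uses the question's tRF.data, counts it with base R's table(), and compares the two routes through layer_data(); the names p_count, p_col and counts are only illustrative.

```r
library(ggplot2)

tRF <- c('tRF-26-OB1690PQR3E', 'tRF-27-OB1690PQR3P', 'tRF-30-MIF91SS2P46I')
tRF.type <- c('5-tRF', 'i-tRF', '3-tRF')
Amino.Acid <- c('Ser', 'Lys', 'Ser')
tRF.data <- data.frame(tRF, tRF.type, Amino.Acid)

# Route 1: geom_bar() with its default stat = "count" does the counting
p_count <- ggplot(tRF.data, aes(x = Amino.Acid, fill = tRF.type)) + geom_bar()

# Route 2: count first (one row per amino acid / tRF type pair) ...
counts <- as.data.frame(table(Amino.Acid = tRF.data$Amino.Acid,
                              tRF.type = tRF.data$tRF.type))
counts <- counts[counts$Freq > 0, ]  # drop combinations that never occur

# ... then let geom_col() draw the precomputed heights
p_col <- ggplot(counts, aes(x = Amino.Acid, y = Freq, fill = tRF.type)) +
  geom_col()

# Built data: stacked bars for Ser reach 2, the single Lys bar reaches 1
bars_counted <- layer_data(p_count)
bars_given <- layer_data(p_col)
```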

ggplot line plot by group and sub-group without a facet

I am wondering whether I can graph separate lines for 2 variables without using facet_grid(). I would prefer 4 lines on one graph to 2 lines in each of 2 panels. It's OK if I can't, but I thought I would ask.
My data is as follows:
nd<-data.frame(Machine = c(2,2,3,3,2,2,3,3),
Source = c("tube", "machine","tube", "machine","tube", "machine","tube", "machine"),
Time=c(0,0,0,0,2,2,2,2),
Count=c(224000, 107000, 850000, 940000, 610000,116000, 1160000, 1100000))
and this code gives me what I want with a facet...
ggplot(data=nd, aes(x=Time, y=Count, group=Machine, color=Machine)) +
geom_line(aes(group=Machine))+ geom_point()+facet_grid(~Source)
Is there an alternative to this?
P.S. Even though Machine is a factor variable, why is my legend showing it as continuous?
One quick way is to use the interaction() function, which combines your two variables into a single factor, joining their levels with a ".":
ggplot(data=nd, aes(x=Time, y=Count, color=interaction(Machine,Source))) +
geom_line() + geom_point() +
scale_color_manual("groups",
values=c("#61d4b3","#fdd365","#fb8d62","#fd2eb3"))
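An alternative that keeps the two variables separate in the legend is to map them to different aesthetics, colour for Machine and linetype for Source; ggplot2 then draws one line per combination. This also bears on the P.S.: as nd is created in the question, Machine is numeric (c(2, 2, 3, ...)), not a factor, which is why the legend showed a continuous colour bar. Wrapping it in factor() makes the colour scale discrete. A sketch:

```r
library(ggplot2)

nd <- data.frame(Machine = c(2,2,3,3,2,2,3,3),
                 Source = c("tube", "machine","tube", "machine","tube", "machine","tube", "machine"),
                 Time = c(0,0,0,0,2,2,2,2),
                 Count = c(224000, 107000, 850000, 940000, 610000,116000, 1160000, 1100000))

# Machine is stored as a number, hence the continuous legend in the question
machine_is_numeric <- is.numeric(nd$Machine)

# factor(Machine) gives a discrete colour scale; Source sets the line type.
# The two discrete aesthetics together define 4 groups, i.e. 4 lines.
p <- ggplot(nd, aes(x = Time, y = Count,
                    colour = factor(Machine), linetype = Source)) +
  geom_line() +
  geom_point() +
  labs(colour = "Machine", linetype = "Source")

# Built data for the geom_line() layer
ld <- layer_data(p)
```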

Changing the xlim of a numeric value causes an error in ggplot (R)

I have a grouped barplot produced using ggplot in R with the following code
ggplot(mTogether, aes(x = USuniquNegR, y = value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_discrete(name = "Area",
labels = c("Everywhere", "New York")) +
xlab("Reasons") +
ylab("Proportion of total complaints") +
coord_flip() +
ggtitle("Comparison between NY and all areas")
mTogether is created using the following code
mTogether <- melt(together, id.vars = 'USuniquNegR')
The data frame together looks like this:
          USperReasons      USperReasonsNY USuniquNegR
1    0.198343304187759   0.191304347826087 Late Flight
2     0.35987114588127   0.321739130434783 Customer Service Issue
3   0.0667280257708237    0.11304347826087 Lost Luggage
4   0.0547630004601933 0.00869565217391304 Flight Booking Problems
5    0.109065807639208   0.121739130434783 Can't Tell
6  0.00460193281178095                   0 Damaged Luggage
7   0.0846755637367694  0.0782608695652174 Cancelled Flight
8   0.0455591348366314  0.0521739130434783 Bad Flight
9   0.0225494707777266  0.0347826086956522 longlines
10  0.0538426138978371  0.0782608695652174 Flight Attendant Complaints
together can be generated with the following:
together<-data.frame(cbind(USperReasons,USperReasonsNY,USuniquNegR))
where
USperReasons <- c(0.19834,0.35987,.06672,0.05476,0.10906,.00460,.08467,0.04555,0.02254,0.05384)
USperReasonsNY <- c(0.191304348,0.321739130,0.113043478,0.008695652,0.121739130,0.000000000,0.078260870,0.05217391,0.034782609,0.078260870)
USuniquNegR <- c("Late Flight","Customer Service Issue","Lost Luggage","Flight Booking Problems","Can't Tell","Damaged Luggage","Cancelled Flight","Bad Flight","longlines","Flight Attendant Complaints")
The problem is that when I try to change the xlim of the plot using
+ xlim(0, 1)
I just seem to get an error:
Discrete value supplied to continuous scale
I can't understand why this happens, but I need to resolve it because the x axis currently starts below 0 and is very crowded:
[image of ggplot output]
The problem is that you are cbind()-ing your column vectors together. cbind() builds a matrix, which can hold only one type, so the numbers are converted to characters. Fix that and the rest should fix itself:
together<-data.frame(USperReasons,USperReasonsNY,USuniquNegR)
You need to remove the cbind() from
together<-data.frame(cbind(USperReasons,USperReasonsNY,USuniquNegR))
because str(together) shows that all three columns are factors (in R 4.0 and later, where stringsAsFactors defaults to FALSE, they are character instead).
With
together <- data.frame(USperReasons, USperReasonsNY, USuniquNegR)
the plot looks reasonable to me (without having to use ylim or xlim).
So, the error was not within ggplot2 but in data preparation.
Next time you ask a question, please provide a full working example that can be copied, pasted and run. Thank you.
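One more detail worth checking: with coord_flip(), the horizontal axis still displays the y aesthetic (value), while x is the discrete USuniquNegR. So xlim(0, 1) asks for a continuous scale on the Reasons axis, which raises the same "Discrete value supplied to continuous scale" error even after the cbind() fix; limiting the value axis needs ylim() (or the limits argument of scale_y_continuous()). A sketch, using base R in place of reshape2's melt(), an assumption made only to keep the example free of extra packages:

```r
library(ggplot2)

USperReasons <- c(0.19834,0.35987,.06672,0.05476,0.10906,.00460,.08467,0.04555,0.02254,0.05384)
USperReasonsNY <- c(0.191304348,0.321739130,0.113043478,0.008695652,0.121739130,
                    0.000000000,0.078260870,0.05217391,0.034782609,0.078260870)
USuniquNegR <- c("Late Flight","Customer Service Issue","Lost Luggage",
                 "Flight Booking Problems","Can't Tell","Damaged Luggage",
                 "Cancelled Flight","Bad Flight","longlines",
                 "Flight Attendant Complaints")

# cbind() first builds a character matrix, so the numbers stop being numbers
bad <- data.frame(cbind(USperReasons, USperReasonsNY, USuniquNegR))
# Without cbind(), each column keeps its own type
together <- data.frame(USperReasons, USperReasonsNY, USuniquNegR)

# Long format: base-R stand-in for melt(together, id.vars = 'USuniquNegR')
mTogether <- data.frame(
  USuniquNegR = rep(together$USuniquNegR, 2),
  variable = rep(c("USperReasons", "USperReasonsNY"), each = nrow(together)),
  value = c(together$USperReasons, together$USperReasonsNY)
)

p <- ggplot(mTogether, aes(x = USuniquNegR, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip()

# The horizontal (value) axis is y, so limit it with ylim(), not xlim()
p_ok <- p + ylim(0, 1)

# xlim() targets the discrete USuniquNegR axis and fails when the plot is built
xlim_fails <- tryCatch({ ggplot_build(p + xlim(0, 1)); FALSE },
                       error = function(e) TRUE)
```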
