New to R, new to stackoverflow, so forgive me....
I'm trying to do a timeseries plot in R using ggplot2. I want to show two line graphs which are filled below their value for a given date. I've been trying to do this with the geom_area(position="identity") function.
However, only one color shows up on my graph (though both show in the legend). I started by melting my data using melt() and am now working with three columns (X=time, variable=groundwater well, value=groundwater elevation). Below is a simplified version of my code, and a screenshot of what I get.
Bank01MWtest<-data.frame(X=(c(1,2,2,1)),variable=(c("MW-01A","MW-01A","MW-01B","MW-01B")),value=(c(576,571,584,580)))
ggplot(data=Bank01MWtest, aes(x=X, y=value,group=variable))+geom_area(position="identity", aes(fill=variable))+geom_line(aes(color=variable))+coord_cartesian(ylim=c(570,590))
I want to show two colors. One color below MW.01A line and one below MW.01B line.
Any help?
Try this with geom_area, adding an alpha value so that the overlapping fills both stay visible. I've used a synthetically generated Bank01MWtest dataset:
head(Bank01MWtest)
Time variable value
1 2016-07-01 MW-01A 582.5482
2 2016-07-02 MW-01A 580.5652
3 2016-07-03 MW-01A 582.3305
4 2016-07-04 MW-01A 583.3122
5 2016-07-05 MW-01A 576.3432
6 2016-07-06 MW-01A 584.4086
tail(Bank01MWtest)
Time variable value
195 2016-10-03 MW-01B 573.8355
196 2016-10-04 MW-01B 575.3218
197 2016-10-05 MW-01B 570.8007
198 2016-10-06 MW-01B 572.3415
199 2016-10-07 MW-01B 575.3291
200 2016-10-08 MW-01B 578.0055
ggplot(data=Bank01MWtest, aes(x=Time, y=value,group=variable))+
geom_area(position='identity', aes(fill=variable), alpha=0.2)+
scale_x_date(date_breaks= "1 month", date_minor_breaks = "15 days", date_labels = "%b",
limits = c(min(Bank01MWtest$Time),max(Bank01MWtest$Time))) +
geom_line(aes(color=variable))+coord_cartesian(ylim=c(570,590))
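The code that generated the synthetic dataset isn't shown above. A minimal sketch of one way to build something similar, assuming 100 daily readings per well starting on 2016-07-01 and uniformly random elevations (the seed and the value ranges are my assumptions, not the original numbers):

```r
# Sketch only: a stand-in for the synthetic data, not the exact values printed above.
set.seed(1)  # arbitrary seed, chosen just to make the example repeatable

# 100 consecutive days: 2016-07-01 through 2016-10-08
dates <- seq(as.Date("2016-07-01"), by = "day", length.out = 100)

Bank01MWtest <- data.frame(
  Time     = rep(dates, times = 2),
  variable = rep(c("MW-01A", "MW-01B"), each = 100),
  # assumed elevation ranges, picked to sit inside ylim = c(570, 590)
  value    = c(runif(100, 575, 585), runif(100, 570, 580))
)

head(Bank01MWtest)
tail(Bank01MWtest)
```

Time is a proper Date column here, which is what scale_x_date() in the plot code expects.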
geom_area is a special case of geom_ribbon (with ymin fixed at 0), so I'll use the more general geom_ribbon in my solution. You'll also need to restructure your data from long to wide for this solution, giving each of the legend categories its own column. I'll do this with the dcast function from the reshape2 package.
The idea here is to add layers with different ymax variables, assign legend labels with the fill option, and then add a legend with colors using the scale_fill_manual function.
library(ggplot2)
library(reshape2)
Bank01MWtest<-data.frame(X=c(1,2,2,1),
variable=c("MW01A","MW01A","MW01B","MW01B"),
value=c(576,571,584,580))
### Note above I modified your category labels by getting rid of the "-" sign
### so that they can be used as variable names below.
dat = dcast(Bank01MWtest, X~variable, value.var="value")
### Layers are drawn in order, so add the taller series (MW01B) first;
### otherwise it would completely cover MW01A.
ggplot(data=dat, aes(x=X)) +
geom_ribbon(aes(ymin=0, ymax=MW01B, fill="MW01B")) +
geom_ribbon(aes(ymin=0, ymax=MW01A, fill="MW01A")) +
scale_fill_manual("", values=c(MW01A="green", MW01B="blue")) +
coord_cartesian(ylim=c(570,590))
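If you'd rather not reshape to wide, a rough alternative is to keep the long data and map ymax and fill inside a single geom_ribbon. This is only a sketch: the baseline of 570 and the alpha of 0.4 are my own choices, not something from the question.

```r
library(ggplot2)

# The question's data, kept in long format. The hyphenated labels are fine
# here because they are values, not column names.
Bank01MWtest <- data.frame(
  X        = c(1, 2, 2, 1),
  variable = c("MW-01A", "MW-01A", "MW-01B", "MW-01B"),
  value    = c(576, 571, 584, 580)
)

p <- ggplot(Bank01MWtest, aes(x = X, group = variable)) +
  # ymin = 570 is an assumed baseline matching the lower y limit;
  # alpha lets the lower series show through the taller one
  geom_ribbon(aes(ymin = 570, ymax = value, fill = variable), alpha = 0.4) +
  geom_line(aes(y = value, colour = variable)) +
  coord_cartesian(ylim = c(570, 590))

print(p)
```

With semi-transparent fills the drawing order matters less, since neither ribbon fully hides the other.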
I have tried to use ggplot2 to create a professional-looking graph, but I am having trouble with a lot of things. I would like to add color to the data points, add dates on the x-axis, and create a line of best fit or something similar if possible. I have been searching on Stack Exchange and Google in general to try to solve this problem, but to no avail. I am using the "Civilian Labor Force Participation Rate: 20 years and over, Black or African American Men" from the Federal Reserve Bank of St. Louis (FRED).
I am using RStudio. I downloaded the LNS11300031 series and used the read.csv() function to read it into RStudio. I initially used the plot() function to plot the data, but I want to use ggplot() to create a better-looking graph. When I create the graph, however, the data points look very faint, blurry, and cloudy, and there are no labels on the x-axis. I would like to add color and a line of best fit, but I do not know how to do that.
This is the code I used to create the graph with no x-axis labels:
ggplot(data = labor, mapping = aes(x = labor$DATE, y = labor$LNS11300031)) + geom_point(alpha = 0.1)
This is the graph that my code produced:
Here is some sample data (labor is the variable I used to store the data from the FRED site):
head(labor)
DATE LNS11300031
1 1972-01-01 77.6
2 1972-02-01 78.3
3 1972-03-01 78.7
4 1972-04-01 78.6
5 1972-05-01 78.7
6 1972-06-01 79.4
I would like to change the variable name LNS11300031 to "Labor Force Participation Rate".
Additional information about the data:
str(labor)
'data.frame': 566 obs. of 2 variables:
$ DATE : Factor w/ 566 levels "1972-01-01","1972-02-01",..: 1 2 3 4 5 6 7 8 9 10 ...
$ LNS11300031: num 77.6 78.3 78.7 78.6 78.7 79.4 78.8 78.7 78.6 78.1 ...
I would like the code to create much clearer data points with color and a trend line, and be able to have an x-axis with the corresponding dates.
Here's a basic attempt to cover all four of your desired improvements:
Clearer points: don't set the alpha too low! A bit of alpha is good for overlapping points, but alpha = 0.1 makes them too blurry.
Colour: R understands simple colour names like "red", but also hex colour codes. Pick any colours you want.
Trend line: easy to add with stat_smooth(). I've used method='lm' which gives a straight linear regression line but there are more flexible alternatives.
Date labels on the x-axis: Make sure your DATE column is correctly set as a Date type, and use scale_x_date() to tweak the labels.
library(dplyr) # %>% and mutate()
library(tibble) # rownames_to_column()
library(ggplot2)
# Your data is available from FRED via the quantmod package
quantmod::getSymbols("LNS11300031", src="FRED")
labor = LNS11300031 %>%
as.data.frame() %>%
rownames_to_column(var = "DATE") %>%
# Make sure DATE is a Date column
mutate(DATE = as.Date(DATE))
# Generally, you don't use data$column syntax within ggplot,
# just give the column name
ggplot(data = labor, mapping = aes(x = DATE, y = LNS11300031)) +
geom_point(alpha = 0.7, colour = "#B07AA1") +
stat_smooth(method = "lm", colour = "#E15759", se = FALSE) +
scale_x_date(date_breaks = "5 years", date_labels = "%Y") +
theme_minimal()
Output:
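If you are starting from the read.csv() data frame in the question rather than quantmod, DATE arrives as a factor. Here is a minimal sketch of the conversion, using only the six sample rows from the question. It also shows one way to get the "Labor Force Participation Rate" name onto the plot through labs() instead of renaming the column, which is my suggestion rather than something the question requires.

```r
library(ggplot2)

# First six rows from the question; DATE is a factor, as str(labor) reports
labor <- data.frame(
  DATE = factor(c("1972-01-01", "1972-02-01", "1972-03-01",
                  "1972-04-01", "1972-05-01", "1972-06-01")),
  LNS11300031 = c(77.6, 78.3, 78.7, 78.6, 78.7, 79.4)
)

# Go through character so the date strings (not the factor codes) are parsed
labor$DATE <- as.Date(as.character(labor$DATE))

p <- ggplot(labor, aes(x = DATE, y = LNS11300031)) +
  geom_point(colour = "#B07AA1") +
  scale_x_date(date_labels = "%b %Y") +
  # Readable axis title without touching the column name
  labs(x = NULL, y = "Labor Force Participation Rate")

print(p)
```

Once DATE is a real Date, scale_x_date() can place and format the axis labels, which is why they were missing in the original plot.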
There's a little problem that I'm not able to solve. In my dataset I have three columns (pluginUserID, type, timestamp) and I want to create a ggplot with facet wrap for every pluginUserID. My dataset looks like this, just with more users.
pluginUserID type timestamp
3 follow 2015-03-23
3 follow 2015-03-27
43 follow 2015-04-28
So in the next step I wanted to create a ggplot with a facet wrap, so my code looks like this.
timeline.plot <- ggplot(
timeline.follow.data,
aes(x=timeline.follow.data$timestamp, y=timeline.follow.data$type)
) + geom_bar(stat = "identity") +
facet_wrap(~timeline.follow.data$pluginUserID) +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()
)
If I'm going to view my plot, it looks like this.
As you can see, on the y axis there's no unit to read and that's what I want to do. I want to visualise the number of follows per day and per pluginUser. And on the y axis should be a unit.
Looking at your dataset, I would do one thing before visualizing it: count the rows per user, type and day (count() comes from dplyr).
library(dplyr)
timeline.follow.data<- timeline.follow.data %>%
count(pluginUserID, type, timestamp)
If your data looks like this:
pluginUserID type timestamp
3 follow 2015-03-23
3 follow 2015-03-27
3 follow 2015-03-27
43 follow 2015-04-28
43 follow 2015-04-28
then after the count function it becomes:
pluginUserID type timestamp n
3 follow 2015-03-23 1
3 follow 2015-03-27 2
43 follow 2015-04-28 2
and so on.
Then use ggplot function:
# Use bare column names inside aes() and facet_wrap();
# data$column references can misalign with the facets
timeline.plot <- ggplot(
timeline.follow.data,
aes(x=timestamp, y=n)
) + geom_bar(stat = "identity") +
facet_wrap(~pluginUserID) +
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()
)
n is what you wanted: the number of follows for a given user on a given day. Hope it helps :)
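Putting the two steps together on the five example rows above, a self-contained sketch might look like this. The follow.counts name and the axis title are just placeholders I picked, and geom_col() is used as the shorthand for geom_bar(stat = "identity").

```r
library(dplyr)
library(ggplot2)

# The five example rows from above
timeline.follow.data <- data.frame(
  pluginUserID = c(3, 3, 3, 43, 43),
  type         = "follow",
  timestamp    = as.Date(c("2015-03-23", "2015-03-27", "2015-03-27",
                           "2015-04-28", "2015-04-28"))
)

# One row per user, type and day, with the number of follows in column n
follow.counts <- timeline.follow.data %>%
  count(pluginUserID, type, timestamp)

p <- ggplot(follow.counts, aes(x = timestamp, y = n)) +
  geom_col() +
  facet_wrap(~pluginUserID) +
  labs(x = NULL, y = "Follows per day")

print(p)
```

Because y is now the numeric count n rather than the type label, the y-axis gets a proper numeric scale.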
I'm running into trouble plotting some data onto two separate y-scales. Here are two visualizations of some air quality data I've been working with. The first figure depicts each pollutant on a parts per billion y-scale. In this figure, co dominates the y-axis, and none of the other pollutants' variation is being properly represented. Within air quality science, the pollutant co is conventionally represented in parts per million instead of parts per billion. The second figure illustrates the same no, no2, and o3 data, but I've converted the co concentration from ppb to ppm (divide by 1000). However, while no, no2, and o3 look better, the variation in co is no longer properly represented...
Is there an easy way using ggplot() to normalize the scale of the y-axis and best represent each type of pollutant? I'm also trying to work through some other examples that make use of gridExtra to stitch together two separate plots, each retaining its original y-scale.
The data required to generate these figures is huge (26,295 observations), so I'm still working on a reproducible example. Hopefully a solution can be found within the ggplot() code described below:
plt <- ggplot(df, aes(x=date, y = value, color = pollutant)) +
geom_point() +
facet_grid(id~pollutant, labeller = label_both, switch = "y")
plt
Here's what the head(df) looks like (before converting the co to ppm):
date id pollutant value
1 2017-06-16 10:00:00 Pohl co 236.00
2 2017-06-16 10:00:00 Pohl no 23.06
3 2017-06-16 10:00:00 Pohl no2 12.05
4 2017-06-16 10:00:00 Pohl o3 8.52
5 2017-06-16 11:00:00 Pohl co 207.00
6 2017-06-16 11:00:00 Pohl no 20.82
Marius pointed out that including scales = "free_y" in the facet_grid() function would provide the desired output. Thanks!
Solution:
plt <- ggplot(df, aes(x=date, y = value, color = pollutant)) +
geom_point() +
facet_grid(pollutant~id, scales = "free_y", labeller = label_both, switch = "y")
plt
Output:
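Since the real data is too large to post, here is a rough sketch with made-up values that shows the same effect. The second site name "SiteB", the seed, and the value ranges are all invented for illustration. Note that facet_grid() only frees the y scale per row, which is why the pollutants need to be on the rows (pollutant ~ id) rather than the columns.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed so the fake data is repeatable

# Hypothetical stand-in data: 24 hourly readings, 2 sites, 4 pollutants
df <- expand.grid(
  hour      = 0:23,
  id        = c("Pohl", "SiteB"),   # "SiteB" is an invented site name
  pollutant = c("co", "no", "no2", "o3"),
  stringsAsFactors = FALSE
)
df$date <- as.POSIXct("2017-06-16 10:00:00", tz = "UTC") + 3600 * df$hour

# co in the hundreds of ppb, everything else in the tens (assumed ranges)
df$value <- ifelse(df$pollutant == "co",
                   runif(nrow(df), 150, 300),
                   runif(nrow(df), 5, 30))

p <- ggplot(df, aes(x = date, y = value, color = pollutant)) +
  geom_point() +
  # each pollutant row gets its own y range
  facet_grid(pollutant ~ id, scales = "free_y", switch = "y")

print(p)
```

With the pollutants on the rows, co no longer flattens the other panels, and no unit conversion is needed just to make the variation visible.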
I am having a problem generating line plots in R. I have the following table:
        7*7 6*6 5*5 4*4 3*3
Biodiff 728 436   0   0   0
EdgeR   728 638 421 132  34
DESeq   728 367 158  33  13
Cuff    728 596 493 256 138
I want to draw a line plot from this table, like the one Excel can produce. I am attaching the Excel image here; making a line chart with markers is pretty straightforward in Excel. Now I want to create a similar one in R, with the values from the table hard-coded.
Can you please guide me how to do this?
If you really want to start from a single table, construct a data.frame (since syntactic variable names cannot start with a digit, I have written the numbers as words):
dat<-data.frame(method=c("Biodiff","EdgeR","DESeq","Cuff"),
sevenXseven=c(728,728,728,728),
sixXsix=c(436,638,367,596),
fiveXfive=c(0,421,158,493),
fourXfour=c(0,132,33,256),
threeXthree=c(0,34,13,138))
I would suggest using the ggplot2 package. ggplot2 likes long table format instead of wide tables. With the reshape2 package you can melt the table into long format:
library(reshape2)
data<-melt(dat,id.vars="method",variable.name="cat")
You can plot the data with ggplot2, mapping the method to the colour aesthetic:
library(ggplot2)
ggplot(data,aes(x=cat,y=value)) +
geom_point(size=10,aes(colour=method)) +
geom_line(aes(colour = method, group = method),size=2) +
theme_bw()
And to make it look even more like your example you could try:
ggplot(data,aes(x=cat,y=value)) +
geom_point(size=10,aes(colour=method,shape=method)) +
geom_line(aes(colour = method, group = method),size=2) +
scale_colour_discrete(name="") +
scale_shape_discrete(guide="none") +
theme_bw() +
labs(y="Number of positive DE genes",x="")
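If you would like the original "7*7"-style labels on the x-axis, one option (a sketch, not the only way) is to build the data.frame with check.names = FALSE and backticked names, so the labels survive melt() as factor levels:

```r
library(reshape2)
library(ggplot2)

# check.names = FALSE keeps the non-syntactic column names as written
dat <- data.frame(
  method = c("Biodiff", "EdgeR", "DESeq", "Cuff"),
  `7*7`  = c(728, 728, 728, 728),
  `6*6`  = c(436, 638, 367, 596),
  `5*5`  = c(0, 421, 158, 493),
  `4*4`  = c(0, 132, 33, 256),
  `3*3`  = c(0, 34, 13, 138),
  check.names = FALSE
)

# Long format; the column names become the levels of cat, in column order
long <- melt(dat, id.vars = "method", variable.name = "cat")

p <- ggplot(long, aes(x = cat, y = value, colour = method, group = method)) +
  geom_line() +
  geom_point() +
  theme_bw() +
  labs(x = "", y = "Number of positive DE genes")

print(p)
```

The trade-off is that such columns must be referred to with backticks (for example dat$`7*7`) anywhere else in your code.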
I am trying to plot two vectors with different values, but equal length on the same graph as follows:
a<-23.33:52.33
b<-33.33:62.33
days<-1:30
df<-data.frame(a,b,days)
a b days
1 23.33 33.33 1
2 24.33 34.33 2
3 25.33 35.33 3
4 26.33 36.33 4
5 27.33 37.33 5
etc..
I am trying to use ggplot2 to plot a and b on the x-axis and the days on the y-axis. However, I can't figure out how to do it. I am able to plot them individually and combine the graphs, but I want just one graph with both the a and b vectors (in different colors) on the x-axis and the number of days on the y-axis.
What I have so far:
X<-ggplot(df, aes(x=a,y=days)) + geom_line(color="red")
Y<-ggplot(df, aes(x=b,y=days)) + geom_line(color="blue")
Is there any way to define the x-axis for both a and b vectors? I have also tried using the melt long function, but got stuck afterwards.
Any help is much appreciated. Thank you
I think the best way to do it is by melting the data (as you have mentioned), especially if you are going to add more vectors. This is the code:
library(reshape2)
library(ggplot2)
a<-23:52
b<-33:62
days<-1:30
df<-data.frame(x=a,y=b,days)
df_molten=melt(df,id.vars="days")
ggplot(df_molten) + geom_line(aes(x=value,y=days,color=variable))
You can also change the colors manually via scale_color_manual.
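For example, a minimal sketch of the manual colour mapping, reusing the melted data from above (the choice of red and blue is arbitrary):

```r
library(reshape2)
library(ggplot2)

df <- data.frame(x = 23:52, y = 33:62, days = 1:30)
df_molten <- melt(df, id.vars = "days")

p <- ggplot(df_molten) +
  geom_line(aes(x = value, y = days, color = variable)) +
  # A named vector ties each colour to a specific level of `variable`
  scale_color_manual(values = c(x = "red", y = "blue"))

print(p)
```

Naming the values keeps the colours attached to the right series even if the factor levels are reordered later.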
A simpler solution is to use only ggplot. The following code will work in your case
library(ggplot2)
a<-23.33:52.33
b<-33.33:62.33
days<-1:30
df<-data.frame(a,b,days)
# Use bare column names inside aes(); a and b go on the x-axis, days on the y-axis
ggplot(data = df)+
geom_line(aes(x = a,y = days), color = "blue")+
geom_line(aes(x = b,y = days), color = "red")
I added the colors, you might want to use them to differentiate between your variables.