I am using ggplot in R on a Mac, doing a line graph using the group option. I want to add the values that correspond to the end points for each of the lines. This is part of the data I am using:
  Year Foundation Type No. of Houses Percent Shares
1 2000     Crawl Space        209529       16.84583
2 2001     Crawl Space        206431       16.58441
3 2002     Crawl Space        204327       15.58577
4 2003     Crawl Space        213328       15.39025
5 2004     Crawl Space        224195       14.63272
6 2005     Crawl Space        258254       15.91873
I run the following code:
ggplot(USbyFoundType, aes(x=Year, y=`Percent Shares`,
group=`Foundation Type`, color=`Foundation Type`)) +
geom_line()
I get this chart. I want to place the value at the end of each of the lines.
Thanks for any help
It would be nice to have a reproducible example, but something like:
library(dplyr)
endpts <- USbyFoundType %>%
  group_by(`Foundation Type`) %>%
  filter(Year == max(Year))   # keep only the last year of each line
Then add
+ geom_text(data = endpts, aes(x = Year, y = `Percent Shares`,
                               colour = `Foundation Type`,
                               label = `Percent Shares`))
You'll probably have to play with horizontal justification (hjust), spacing (nudge_x), and margins (e.g. + expand_limits(x = 2030), since Year is on the x-axis).
This question is about plotting labels (not values) at the end of the lines, but contains lots of useful information about adjusting positioning, margins, clipping etc.
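Putting the pieces together, here is a minimal sketch. The data frame houses is made up to stand in for USbyFoundType (only the column names come from the question), and the hjust, nudge_x and x-limit values are guesses you would need to tune for the real chart.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for USbyFoundType; the numbers are random, not real data
set.seed(1)
houses <- data.frame(
  Year = rep(2000:2005, times = 3),
  `Foundation Type` = rep(c("Basement", "Crawl Space", "Slab"), each = 6),
  `Percent Shares` = round(runif(18, 10, 60), 2),
  check.names = FALSE   # keep the column names with spaces as they are
)

# One row per line: the observation for the last year in each group
endpts <- houses %>%
  group_by(`Foundation Type`) %>%
  filter(Year == max(Year)) %>%
  ungroup()

p <- ggplot(houses, aes(x = Year, y = `Percent Shares`,
                        colour = `Foundation Type`)) +
  geom_line() +
  # Label only the end points; hjust = 0 left-aligns the text at the point
  geom_text(data = endpts,
            aes(label = round(`Percent Shares`, 1)),
            hjust = 0, nudge_x = 0.1, show.legend = FALSE) +
  # Widen the x-axis a little so the labels are not cut off
  expand_limits(x = 2006)

print(p)
```

If the labels still get clipped, increase the value passed to expand_limits().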
I'm trying to put ActivityDate on the X Axis, and Calories on the Y Axis, relating to how 33 different users ranged in their calorie burnings daily. I'm new to ggplot and visualizations as you can tell, so I'd appreciate the most basic solution that I can understand. Thank you so much.
I tried several iterations of this code, and none of them produced quite the right visualization. Here are a couple of my attempts:
##first and foremost:
install.packages("tidyverse")
install.packages("here")
library(tidyverse)
library(here)
Attempt 1 Bar Graph
ggplot(data=trimmed_dactivity) + geom_bar(mapping=aes(x=Id, color=ActivityDate))
##Probably not the best for stakeholders, but having the bars a little closer together might help, so I tried to identify the unique IDs. Perhaps the bars are so small because the IDs are long numbers that are not sequential, so the gaps between them take up the extra space.
Attempt 2 Bar Graph
UId <- unique("Id")
ggplot(data=trimmed_dactivity) + geom_bar(mapping=aes(x=UId, color=ActivityDate))
##Facepalm, definitely not what I was looking for at all, but that was my effort to solve the above problem.
Attempt 3 Bar Graph
ggplot(data=trimmed_dactivity) + geom_bar(mapping=aes(x=ActivityDate, fill=Id)) + theme(axis.text.x = element_text(angle=45))
##The fill function does not work, and on the y-axis if you will, I don't know what "count" is referring to in this case, so could be useful except for those two issues.
##Finally, I switch to a line graph
Attempt 4 Line Graph
ggplot(data=trimmed_dactivity) + geom_line(mapping=aes(x=ActivityDate, y=Calories)) + theme(axis.text.x = element_text(angle=45))
##Now what I get is separate lines going up and down, and what I want is 33 separate lines representing unique Id numbers to travel along the x axis for time, and rise in the y axis for calories. Of course I'm not sure how to do that...
Any help with what I'm missing on this journey here?
what I want is 33 separate lines representing unique Id numbers…
It sounds like you want a spaghetti plot. To make one, map Id to color (or to group if you don’t want each id to be colored differently).
library(ggplot2)
ggplot(fakedata, aes(ActivityDate, Calories)) +
geom_line(aes(color = factor(Id)), show.legend = FALSE)
Example data:
set.seed(13)
fakedata <- expand.grid(
Id = 1:33,
ActivityDate = seq(as.Date("2016-04-13"), length.out = 10, by = "day")
)
fakedata$Calories <- round(rnorm(330, 2500, 500))
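For the uncolored variant mentioned above, a minimal sketch that maps Id to group instead of color. It reuses the same made-up fakedata; the alpha value is just an assumption to make overlapping lines easier to see.

```r
library(ggplot2)

# Same made-up example data as above: 33 ids, 10 days each
set.seed(13)
fakedata <- expand.grid(
  Id = 1:33,
  ActivityDate = seq(as.Date("2016-04-13"), length.out = 10, by = "day")
)
fakedata$Calories <- round(rnorm(330, 2500, 500))

# group = Id draws one line per id, all in the same color
p_group <- ggplot(fakedata, aes(ActivityDate, Calories, group = Id)) +
  geom_line(alpha = 0.4)

print(p_group)
```

Without group (or color), ggplot2 treats all rows as a single series, which is why the original attempt produced one line jumping up and down.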
I have a dataset (df) that looks like this:
  EIN Year Cat      Fund
1  16 2005   A  9784.490
2  16 2006   A 10020.720
3  16 2007   A  9232.796
4  15 2008   B  8567.893
5  15 2009   B 10292.670
6  17 2010   C  9274.589
The data has relatively large dimensions (around 300k observations), which makes plotting a potentially slow process. I would like to plot the variable Fund for each year, by the identifier EIN. Based on this post I have tried the following code:
library(ggplot2)
ggplot(df, mapping = aes(x = Year, y = Fund)) +
geom_line(aes(linetype = as.factor(EIN)))
Here are my questions:
This code becomes pretty slow given the high amount of observations that I have. Do you suggest any alternatives that could speed up the process?
Since I have a huge number of EINs, the legend ends up taking all the space available for the graph, so I would like to get rid of it, so far unsuccessfully. I tried adding + guides(fill=FALSE) at the end, but it did not work. Any advice?
If I wanted to either subset or color code my plot by Cat, what would be the best way to do it?
Thanks a lot for your help!
You can get rid of the legend using:
+ theme(legend.position = 'none')
To subset (facet) your plot, especially if there aren't too many categories, use facet_wrap:
+ facet_wrap(~Cat)
To colour instead, put colour = Cat inside your aes() call.
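A minimal sketch combining the three suggestions. The data frame funds is invented for illustration (only the column names match the question), and mapping EIN to group rather than linetype is my assumption, since there are far more EINs than available line types.

```r
library(ggplot2)

# Hypothetical data: 6 EINs, 5 years each, 2 EINs per category
set.seed(42)
funds <- data.frame(
  EIN  = rep(1:6, each = 5),
  Year = rep(2005:2009, times = 6),
  Cat  = rep(c("A", "B", "C"), each = 10)
)
funds$Fund <- round(rnorm(30, 9500, 600), 1)

p <- ggplot(funds, aes(x = Year, y = Fund, group = EIN, colour = Cat)) +
  geom_line() +                       # one line per EIN, coloured by Cat
  facet_wrap(~Cat) +                  # one panel per category
  theme(legend.position = "none")     # drop the legend entirely

print(p)
```

With facets you may not need colour at all, since each panel already shows a single category.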
So I have a data set that lists DJs by Rank, the year they received that rank, and the name of the DJ that received it, laid out along a horizontal axis in Excel.
When I plot the data I'm currently working with, it ends up displaying a line chart with a vertical line from 1 to 5 for each year, and I'm not sure what to do from here.
library(ggplot2)
library(plyr)
DJMAG <- DJMAG_MOdified
Top <-data.frame(DJMAG$Year, DJMAG$Rank , DJMAG$DJ)
names(Top) <- c("Year","Rank","DJ")
ggplot(Top, aes(Top$Year)) +
geom_line(aes(y = as.numeric(Top$Rank), color = "Hardwell")) + xlab("2004 to 2018") + ylab("Rank")
There are no error messages, but what I'm trying to show with this data is how each DJ, with their own line, moved up or down in the rankings from 2004 to 2017 (X = Year), with the top 5 ranks (1-5) on an inverted y-axis.
So I took the liberty of coming up with some example data.
DJMAG_MOdified <- data.frame(Year=rep(2004:2018,3),
Rank=runif(45,0,1),
DJ=rep(c("A","B","C"),each=15),
Other=runif(45,0,1))
I purposefully added the Other column, so we can still subset the data as you did.
Instead of your method which was:
Top <-data.frame(DJMAG$Year, DJMAG$Rank , DJMAG$DJ)
names(Top) <- c("Year","Rank","DJ")
It would be preferable to do it in one line, where you don't need to change the column names, as follows:
Top <- DJMAG_MOdified[,c("Year","Rank","DJ")]
As for the plot, I am thinking maybe this is what you are looking for, where each DJ is represented by a different coloured line?
ggplot(Top, aes(x=Year,y=as.numeric(Rank))) +
geom_line(aes(col = DJ)) +
xlab("2004 to 2018") +
ylab("Rank")
I didn't understand where the color = "Hardwell" part of your code came from...
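The question also asks for an inverted y-axis with ranks 1 to 5. A minimal sketch of that, assuming Rank is a whole number between 1 and 5; the data frame top_ranks and its values are made up for illustration.

```r
library(ggplot2)

# Hypothetical rankings for three DJs over five years
top_ranks <- data.frame(
  Year = rep(2014:2018, times = 3),
  DJ   = rep(c("A", "B", "C"), each = 5),
  Rank = c(1, 2, 2, 3, 1,
           2, 1, 3, 1, 2,
           3, 3, 1, 2, 3)
)

p <- ggplot(top_ranks, aes(x = Year, y = Rank, colour = DJ)) +
  geom_line() +
  geom_point() +
  scale_y_reverse(breaks = 1:5) +   # rank 1 ends up at the top
  labs(x = "Year", y = "Rank")

print(p)
```

scale_y_reverse() flips the axis, so a DJ climbing the rankings is drawn as a rising line.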
I have a Count for each Site (which corresponds with a country), and each Site belongs to a Region. The data looks like this:
> summary_data
     Site Count   Region
1    Chad     5   Africa
2  Angola     1   Africa
3  France    10   Europe
4     USA     6 Americas
5 Bolivia     3 Americas
6   Chile     4 Americas
I would like to generate a bar graph that:
Has a bar per country
The bars for a region are all next to each other in the bar graph
Per region, the bars appear in descending order
The bars are all the same width, but the heights are all on the same scale
Can be generalized (in particular: arbitrary regions, arbitrary countries per region)
I do not want to use fill color to represent the region (I want to use color to represent another characteristic eventually)
I want to have some visual representation to group the columns. For instance, having a gray background behind all the columns for the Americas region, a blue background behind all the columns for the Africa region, etc). I actually would be open to other approaches (perhaps a line at the top spanning all of Africa with "Africa" as a label or something).
Obviously each region can have a different number of country sites, and no country site spans two regions (I tried using facets but quickly realized that was not the right route). I also tried looping through all the regions to generate separate graphs per region and then put them together but that didn't quite seem the right approach either.
I have generated a graph like this (Closest I have gotten):
Using this code:
library("dplyr")
library(ggplot2)
sorted <- arrange(summary_data,Region,-Count)
sorted$Site <- factor(sorted$Site, levels = sorted$Site)
bar = ggplot(sorted,
aes(
x = Site,
y = Count,
fill = Region
)) +
geom_col()
print(bar)
But this does not meet the last two requirements I set above (I specifically do not want to use fill to represent region). I started down the path of geom_rect() but did not understand the coordinate system for discrete x values rather than continuous (I did find Stackoverflow questions / answers on continuous but didn't see how to translate to this). I think having shaded rectangles behind the columns is probably the best approach, but I would appreciate any input in general approach as well as how to pull it off.
You could consider defining a new panel for each region to separate them using facet_grid. If you want the colors to be the same, just remove the aes(fill = Site) argument inside geom_bar.
The argument space = "free_x" ensures that all bars have the same width, and with scales = "free" only the axis values belonging to each region are shown.
ggplot(sorted, aes(x = Site, y = Count)) +
  geom_bar(position = "dodge", stat = "identity", aes(fill = Site)) +
  facet_grid(. ~ Region, scales = "free", space = "free_x")
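For completeness, a self-contained sketch of the same idea without any fill mapping, using the six rows shown in the question. The sorting step is taken from the question's own code, rewritten in base R. Freeing only the x scale is my assumption here, so that all panels share one y scale as the question requires.

```r
library(ggplot2)

# The sample rows from the question
summary_data <- data.frame(
  Site   = c("Chad", "Angola", "France", "USA", "Bolivia", "Chile"),
  Count  = c(5, 1, 10, 6, 3, 4),
  Region = c("Africa", "Africa", "Europe", "Americas", "Americas", "Americas")
)

# Sort by region, then by descending count, and freeze that order in the factor
sorted <- summary_data[order(summary_data$Region, -summary_data$Count), ]
sorted$Site <- factor(sorted$Site, levels = sorted$Site)

p <- ggplot(sorted, aes(x = Site, y = Count)) +
  geom_col() +                                   # no fill mapping, so fill stays free
  facet_grid(. ~ Region, scales = "free_x", space = "free_x")

print(p)
```

The facet strip above each panel serves as the region label, which leaves fill available for another variable later.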
I am looking for a way to connect data points from top to bottom in order to visualize a ranking, where the y-axis represents the rank and the x-axis the attributes. With the default settings the line connects the points from left to right, so the points are connected in the wrong order.
With the data below the line should run from (6,1) to (4,2), then to (5,3), and so on. Ideally the ranking scale should also be inverted so that rank one is at the top.
data <- read.table(header=TRUE, text='
attribute rank
1 6
2 5
3 4
4 2
5 3
6 1
7 7
8 11
9 10
10 8
11 9
')
plot(data$attribute,data$rank,type="l")
Is there a way to change the line drawing direction? My second idea would be to rotate the graph or maybe you have better ideas.
The graph I am trying to achieve is somewhat similar to this one:
example vertical line chart
You can do this with ggplot:
library(ggplot2)
ggplot(data, aes(y = attribute, x = rank)) +
geom_line() +
coord_flip() +
scale_x_reverse()
It solves the problem exactly the way you suggested. The first part of the command (ggplot(...) + geom_line()) creates an "ordinary" line plot. Note that I have already switched x- and y-coordinates. The next command (coord_flip()) flips x- and y-axis, and the last one (scale_x_reverse) changes the ordering of the x-axis (which is plotted as the y-axis) such that 1 is in the top left corner.
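A different route, shown here only as an alternative sketch and not as part of the approach above: sort the rows by rank and use geom_path(), which connects points in the order they appear in the data, so no coord_flip() is needed. The data frame ranks repeats the values from the question under a new name.

```r
library(ggplot2)

# Same values as in the question
ranks <- data.frame(
  attribute = 1:11,
  rank      = c(6, 5, 4, 2, 3, 1, 7, 11, 10, 8, 9)
)

# geom_path() follows row order, so sort by rank first
by_rank <- ranks[order(ranks$rank), ]

p <- ggplot(by_rank, aes(x = attribute, y = rank)) +
  geom_path() +
  geom_point() +
  scale_y_reverse(breaks = 1:11)   # rank 1 at the top

print(p)
```

After sorting, the first three rows are attributes 6, 4 and 5, which matches the (6,1), (4,2), (5,3) order asked for in the question.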
Just to show you that something like the example you linked in your question can be done with ggplot2, I add the following example:
library(tidyr)
data$attribute2 <- sample(data$attribute)
data$attribute3 <- sample(data$attribute)
plot_data <- pivot_longer(data, cols = -"rank")
ggplot(plot_data, aes(y = value, x = rank, colour = name)) +
geom_line() +
geom_point() +
coord_flip() +
scale_x_reverse()
If you intend to do your plots with R, learning ggplot2 is really worthwhile. You can find many examples on Cookbook for R.