I have the attached dataframe. I want to create a line graph with ggplot that plots Total against Year, with a separate line for each offence category. I have used the following code, but I suspect it is wrong: the output has no connected lines and looks more like a set of vertical lines. Any help is much appreciated :)
Dataframe
The code I have tried is:
ggplot(data = annual, aes(x = as.numeric(Year), y = Total, group = `Offence Category`)) +
  geom_line()
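A minimal sketch of the intended plot, assuming a long-format data frame with one row per year and offence category. The values below are invented for illustration; only the column names Year, Total and Offence Category come from the question. Two things are worth checking: a column name containing a space has to be wrapped in backticks, and if Year is stored as a factor, as.numeric() returns the level codes rather than the years, so it is safer to go through as.character() first.

```r
library(ggplot2)

# Hypothetical stand-in for `annual`; the numbers are made up
annual <- data.frame(
  Year = rep(c("2015", "2016", "2017"), times = 2),
  Total = c(120, 135, 128, 80, 95, 90),
  `Offence Category` = rep(c("Burglary", "Theft"), each = 3),
  check.names = FALSE  # keep the space in the column name
)

# Convert via as.character() so factor levels are not turned into codes
annual$Year <- as.numeric(as.character(annual$Year))

# One line per category: group and colour both map to the category column
p <- ggplot(annual, aes(x = Year, y = Total,
                        group = `Offence Category`,
                        colour = `Offence Category`)) +
  geom_line()
p
```

If the real data has several rows per year within a category, the line will still zigzag vertically; in that case the rows would need to be summed per year and category before plotting.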
Background
I have a dataframe, df, of athlete injuries:
df <- data.frame(number_of_injuries = c(1,2,3,4,5,6),
number_of_people = c(73,52,43,12,7,2),
stringsAsFactors=FALSE)
The Problem
I'd like to use ggplot2 to make a bar chart or histogram of this simple data using geom_bar or geom_histogram. Important point: I'm pretty novice with ggplot2.
I'd like something where the x-axis shows bins of the number of injuries (number_of_injuries), and the y-axis shows the counts in number_of_people. Like this (from Excel):
What I've tried
I know this is the most trivial dang ggplot issue, but I keep getting errors or weird results, like so:
ggplot(df, aes(number_of_injuries)) +
geom_bar(stat = "count")
Which yields:
I've been on the tidyverse reference website for an hour and I can't crack the code.
This is a common source of confusion. Your data already contains the counts, so do not count it again with geom_bar(stat = "count"): that counts the rows per x value, and since each value appears once you simply get 1 in every category. You want to plot the values as they are with geom_col:
ggplot(df, aes(x = number_of_injuries, y = number_of_people)) + geom_col()
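To see the difference without looking at the plots, the computed bar heights can be inspected with layer_data(). This is a small sketch using the df from the question; the object names p_count and p_col are just for illustration.

```r
library(ggplot2)

df <- data.frame(number_of_injuries = c(1, 2, 3, 4, 5, 6),
                 number_of_people = c(73, 52, 43, 12, 7, 2))

# stat = "count" tallies rows per x value; each value occurs once here
p_count <- ggplot(df, aes(number_of_injuries)) +
  geom_bar(stat = "count")
layer_data(p_count)$count

# geom_col() takes the y values as the bar heights, unchanged
p_col <- ggplot(df, aes(x = number_of_injuries, y = number_of_people)) +
  geom_col()
layer_data(p_col)$y
```

The first call shows a count of 1 for every bar, while the second returns the number_of_people column as it is.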
I am analyzing search volume data for the US election from Google Trends. I ran the command below in RStudio.
The poliData dataframe contains the SearchVolume for all months for three Politicians.
ggplot(data = poliData, aes(x=Date, group=Politician, colour=Politician)) +
geom_density()
But with the above command I only get a density line (blue) for one politician. See the attached picture. Can you please help?
I guess you got three lines on top of each other because the Date values are the same for all three politicians. My understanding is that your analysis should look something like this:
ggplot(data = poliData,
aes(x=Date, colour=Politician,
weight = SearchVolume/sum(SearchVolume))) +
geom_density()
Adding weight should produce distinct lines for different politicians. If this is not what you wanted, please dput your data for others to work out a solution for you. Also, as I do not have the data, I have not tested the above code yet. Please let me know if it does not work.
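Here is a small sketch of the same idea on invented data, since the original poliData was not posted. The dates, politician labels and volumes below are all made up; only the column names follow the question. Depending on the ggplot2 version, the weighted plot may emit a warning about weights not summing to one within each group.

```r
library(ggplot2)

# Invented monthly data: same dates for every politician, different volumes
dates <- seq(as.Date("2016-01-01"), by = "month", length.out = 12)
poliData <- data.frame(
  Date = rep(dates, times = 3),
  Politician = rep(c("A", "B", "C"), each = 12),
  SearchVolume = c(1:12, 12:1, rep(6, 12))
)

# Unweighted: every group has identical Date values, so the curves coincide
p_plain <- ggplot(poliData, aes(x = Date, colour = Politician)) +
  geom_density()

# Weighted: each date contributes in proportion to its search volume
p_weighted <- ggplot(poliData,
                     aes(x = Date, colour = Politician,
                         weight = SearchVolume / sum(SearchVolume))) +
  geom_density()
```

If the goal is simply to show search volume over time rather than a density estimate, geom_line() with y = SearchVolume may be the more direct choice.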
I have a huge dataset for which I need to plot all the temperature data over time on the same plot. The data frame consists of many repeats of the following structure:
Name, Date, Temp, Name__1, Date__1, Temp__1,...
So for every line, I want to use 3 columns and then plot the next 3 columns. I don't know if it is possible to use some kind of loop for this. What I've been doing so far is the following:
ggplot(data = name_of_my_dataset) +
  geom_line(mapping = aes(Date, Temp, col = Name)) +
  geom_line(mapping = aes(Date__1, Temp__1, col = Name__1))
If I have to repeat this for every single temperature logger, the code would be endless. I know there are more elegant ways to do this, but I just can't figure out a simpler one. Can someone please help? Thank you so much!!!
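One possible approach, sketched here in base R, is to stack every block of three columns into a single long data frame and then draw one geom_line() layer coloured by Name. The example data is invented, and the sketch assumes the columns always come in consecutive Name, Date, Temp triplets as described above; the object names wide, triplets and long are hypothetical.

```r
library(ggplot2)

# Invented wide data: two loggers, three columns each
wide <- data.frame(
  Name = "A", Date = as.Date("2020-01-01") + 0:2, Temp = c(1, 2, 3),
  Name__1 = "B", Date__1 = as.Date("2020-01-01") + 0:2, Temp__1 = c(4, 5, 6)
)

# Split the column positions into consecutive groups of three
triplets <- split(seq_along(wide), ceiling(seq_along(wide) / 3))

# Rename each triplet to Name/Date/Temp and stack the pieces row-wise
long <- do.call(rbind, lapply(triplets, function(idx) {
  setNames(wide[idx], c("Name", "Date", "Temp"))
}))

# A single layer now draws one line per logger
p <- ggplot(long, aes(Date, Temp, colour = Name)) +
  geom_line()
```

The same reshaping can also be done with tidyr's pivoting functions; the base R version is used here only to keep the dependencies to ggplot2.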
I'm trying to make a stacked bar graph but I can't seem to get the protobacteria to group together. This is the code I used
ggplot(data = Bacteria, aes(x = bacteria$Location, y = bacteria$reads, fill = bacteria$Phylum.Division)) +
geom_bar(stat="identity")
Is there something I can add to my code? I've attached a picture of my graph currently.
There are probably duplicate entries of protobacteria in your dataframe, but I cannot reproduce this in a simple example.
I noticed that your code uses both Bacteria and bacteria. R is case sensitive, so you may be using two different dataframes for the plot. You can also remove the bacteria$ part in the aes statement, because column names inside aes() are looked up in the dataframe passed as data:
ggplot(data = bacteria, aes(x = Location, y = reads, fill = Phylum.Division)) + geom_bar(stat="identity")
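As a rough, self-contained check of the call above, here is a sketch on invented data (the values and site names are made up; the column names follow the question). Every aesthetic is resolved from the single bacteria dataframe, and repeated rows of the same phylum within a location are stacked into the same bar.

```r
library(ggplot2)

# Invented data with a repeated phylum entry at Site1
bacteria <- data.frame(
  Location = c("Site1", "Site1", "Site1", "Site2"),
  Phylum.Division = c("Proteobacteria", "Firmicutes",
                      "Proteobacteria", "Firmicutes"),
  reads = c(10, 5, 20, 8)
)

# Bare column names in aes() are looked up in `bacteria`
p <- ggplot(data = bacteria,
            aes(x = Location, y = reads, fill = Phylum.Division)) +
  geom_bar(stat = "identity")

# Top of the tallest stacked bar: the total reads at Site1
max(layer_data(p)$ymax)
```

If the same phylum still shows up as separate legend entries in the real data, it may be worth comparing unique(bacteria$Phylum.Division) for spelling or whitespace variants.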
If you want better help, please give a reproducible example of your problem.
I have a dataframe of Wikipedia edits, with the edit number for each user (1st edit, 2nd edit and so on), the timestamp when the edit was made, and how many words were added.
In the actual dataset, I have up to 20,000 edits per user, and some edits add up to 30,000 words.
However, here is a downloadable small example dataset to exemplify my problem. The header looks like this:
I am trying to plot the distribution of added words across the edit progression and across time. If I use the regular R barplot, it works just as expected:
barplot(UserFrame3$NoOfAdds,UserFrame3$EditNo)
But I want to do it in ggplot for nicer graphics and more customizing options.
If I plot this as a scatterplot, I get the same result:
ggplot(data = UserFrame3, aes(x = UserFrame3$EditNo, y = UserFrame3$NoOfAdds)) + geom_point(size = 0.1)
Same for a linegraph:
ggplot(data = UserFrame3, aes(x = UserFrame3$EditNo, y = UserFrame3$NoOfAdds)) +geom_line(size = 0.1)
But when I try to plot it as a bargraph in ggplot, I get this result:
ggplot(data = UserFrame3, aes(x = UserFrame3$EditNo, y = UserFrame3$NoOfAdds)) + geom_bar(stat = "identity", position = "dodge")
There appear to be a lot more holes on the X-axis and the maximum is nowhere close to where it should be (y = 317).
I suspect that ggplot somehow groups the bars and uses means instead of the actual values, despite the "dodge" parameter. How can I avoid this? And how would I go about plotting the time progression as a bar graph as well, without ggplot averaging over multiple edits?
You should expect more x-axis "holes" using bars as compared with lines. Lines connect the zero values together, bars do not.
I used geom_col with the data from your download, and it looks as expected:
UserFrame3 %>%
ggplot(aes(EditNo, NoOfAdds)) + geom_col()