choosing specific values on the X axis when using ggplot2 - r

I am trying to plot the number of events at the Olympics as a function of the year in which each Games took place.
My data frame is called supertable and has 2 columns: the first is the year and the second is the number of events in the Games held that year.
My problem is that on the x axis I only get the years 1920 and 1980, and I would like to have 1920, 1950, 1980 and 2010.
This is my code:
ggplot(data = supertable,aes(x=year,y=no.of.events))+geom_point(colour='red')+
scale_x_discrete(breaks=c(1920,1950,1980,2010))
This is the picture I get
I tried doing this:
scale_x_discrete(breaks=c(1920,1950,1980,2010),limits=c(1920,1950,1980,2010))
but it didn't help.
I assume it is something small that I am missing. I searched for an answer but didn't find one.

Your x-axis is a continuous variable, so you need to use scale_x_continuous.
You used breaks correctly to indicate where the ticks on the x axis go, but limits should be c(min, max), i.e. the range of the plot you want to show.
Try this: scale_x_continuous(breaks=c(1920,1950,1980,2010), limits = c(1920, 2019))
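As a minimal sketch of the whole plot, assuming a data frame shaped like the one in the question (the values below are made up for illustration and are not real event counts):

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data; the counts are invented
supertable <- data.frame(
  year = c(1920, 1936, 1952, 1968, 1984, 2000, 2016),
  no.of.events = c(150, 130, 150, 170, 220, 300, 310)
)

p <- ggplot(data = supertable, aes(x = year, y = no.of.events)) +
  geom_point(colour = "red") +
  # breaks: where the ticks go; limits: c(min, max) of the displayed range
  scale_x_continuous(breaks = c(1920, 1950, 1980, 2010),
                     limits = c(1920, 2019))

print(p)
```

Points that fall outside limits are dropped with a warning, so the range should cover all the years you want to see.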

Related

How to create a heat map with the number of repetition inside a certain range value

I have a dataset with the month (mese) in one column and the corresponding value in the other. I'm trying to create a heatmap with the months on the x axis and value intervals on the y axis (e.g. 0 to 10, 10 to 20, 20 to 30, etc.), where each cell shows how many values from that month fall within that interval.
I tried using the cut function on both x and y to create ranges of values, then put everything into a table and plotted it with this code:
x_c <- cut(x, 12)
y_c <- cut(y, 50)
z <- table(x_c, y_c)
image2D(z=z, border="black")
but it doesn't seem to work: the scale always runs from 0 to 1 (and I need the actual values). Is there an easier solution?
Essentially, I need the end result to look something like this (sorry for my very poor Paint skills): the level of sulphate is higher during the winter than the summer, and the majority of the data follow a "curve" that reflects this tendency.
You can use geom_bin2d from ggplot2. You can define the number of bins:
ggplot(data, aes(mese, nnso4)) +
geom_bin2d(bins=c(12,50)) +
scale_fill_gradient(low="yellow", high="red")
You can also change the fill scale; for instance, the viridis package has some options.
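A self-contained sketch of the same idea on simulated data. The column names mese and nnso4 follow the code above; the values are random and exist only to make the example run. It uses scale_fill_viridis_c(), which ships with ggplot2, as one possible alternative fill scale:

```r
library(ggplot2)

set.seed(1)
# Simulated data: month (1-12) and a sulphate-like value that peaks in winter
data <- data.frame(mese = sample(1:12, 500, replace = TRUE))
data$nnso4 <- 20 + 10 * cos(2 * pi * data$mese / 12) + rnorm(500, sd = 3)

p <- ggplot(data, aes(mese, nnso4)) +
  geom_bin2d(bins = c(12, 50)) +        # 12 bins along x, 50 along y
  scale_fill_viridis_c(name = "count")  # viridis fill scale bundled with ggplot2

# The per-cell counts behind the heatmap can be inspected directly
binned <- ggplot_build(p)$data[[1]]
summary(binned$count)
```

The fill shows the actual number of observations in each cell rather than a 0 to 1 scale.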

I am using ggplot2 to make a bar chart and can't get the years correct along the x-axis

I am using ggplot2 to make a bar chart of the number of participants per year by gender. With 14 years included, I would like 2 bars for each year, corresponding to the number of males and females for that year. I am not getting each year along the x axis. I think the data is being binned. I have tried changing the bin width and using scale_x_date, and am still stuck.
Can you help me figure out how to have the data for EACH year in my graph?
As an example, here is my data for years 2004-2017:
year=c(2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017)
gender=c("male" , "female")
Participants is by gender, male then female respectively per year:
Participants=c(1307,443,1847,630,2109,765, 1824,691,2250,952,3123,1421,4097,1904,6415,3284,8788,4678,11581,6694,13141,8478,16389,10575,20990,13811,26951,19729)
data=data.frame(year,gender,Participants)
Here is how I am trying to generate my plot:
MyPlot <- ggplot(data, aes(fill=gender, y=Participants, x=year)) +
geom_bar(position="dodge", stat="identity",width = .8)
print(MyPlot + ggtitle("Annual Number of Participants by Gender"))
On the x-axis, the years 2006, 2010, 2014 and 2018 are marked and the bars correspond to data from two years. I want data for each year, both in terms of the bars and in terms of the ticks on the x-axis.
Any help would be appreciated!
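The three vectors have different lengths (14 years, 2 genders, 28 participant counts), so data.frame() recycles year and gender and the rows no longer pair each count with the right year. That is not a clear data frame design to use as input to ggplot.
Start by reading this: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
The key points are: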
Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.
Then, once you have a tidy tibble/data frame, your ggplot2 code should work fine. I'd drop the width= option until you have it working.
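One way to build such a data frame from the vectors in the question is sketched below. Two choices here are my own assumptions rather than part of the answer: geom_col() is swapped in for geom_bar(stat="identity"), and year is wrapped in factor() so that every year gets its own tick.

```r
library(ggplot2)

year <- 2004:2017
gender <- c("male", "female")
Participants <- c(1307, 443, 1847, 630, 2109, 765, 1824, 691, 2250, 952,
                  3123, 1421, 4097, 1904, 6415, 3284, 8788, 4678, 11581, 6694,
                  13141, 8478, 16389, 10575, 20990, 13811, 26951, 19729)

# One row per (year, gender) pair: repeat each year twice, alternate the genders
data <- data.frame(
  year = rep(year, each = 2),
  gender = rep(gender, times = length(year)),
  Participants = Participants
)

# factor(year) makes the x axis discrete, so each year is labelled
MyPlot <- ggplot(data, aes(x = factor(year), y = Participants, fill = gender)) +
  geom_col(position = "dodge") +
  labs(x = "year") +
  ggtitle("Annual Number of Participants by Gender")
print(MyPlot)
```

Each observation (one count for one gender in one year) now sits in its own row, which is the layout the tidy-data vignette describes.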

How to plot two y axis? or combine(merge) two plots? Should handle faceted column as well

I have a combination of two requirements that I find difficult (I'm a beginner).
Consider weather data as an example. Let's say I have a dataset with the following columns:
"Datetime", "Word", "Frequency", "Temperature"
Visualization: I want to see the change in the frequency of a word over time and against temperature.
The x axis shows the time series (date).
The y axis has the frequency scale (0 to max frequency).
Requirements:
I need to draw the frequencies of several words (column "Word") over time.
Correlate the frequency with temperature.
I started with ggplot2:
ggplot(TemperatureData, aes(x=timeId, y=termFrequency)) + geom_line() + facet_wrap(~Keyword) +
geom_line(data = TemperatureData, aes(y = temperature)) +
labs(x="Time Series over X days", y = "Term Frequency")
The above approach results in overlapping y axes (frequency, temperature) and a separate panel for each "Word" (a facet in ggplot), i.e. the plot has 3 panels, one per keyword. Each panel shows temperature over time and the frequency of a word over time.
Problems:
I want separate y axes for temperature and frequency. I also do not want to normalize these axes, because it becomes hard to tell what the high and low values of each axis are over the days, and the plot loses readability. I learnt that two y axes are not possible using ggplot2.
A separate panel for each keyword is not required. One horizontal line per keyword is what I'm looking for.
The plot should show temperature only once (as a line graph) to reflect its change over time.
I tried using par(), but could not succeed.
Example solution using plotrix package
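Recent versions of ggplot2 also offer sec_axis(), which draws a second y axis as a transformation of the first. The sketch below is one possible approach, not the plotrix solution referenced above. The data are invented, the column names follow the question's code, and the scaling factor k is my own helper. Temperature is rescaled internally so it fits the frequency axis, but the right-hand axis is labelled in the original temperature units.

```r
library(ggplot2)

# Toy data (made up for illustration): two words tracked over five days
word_freq <- data.frame(
  timeId = rep(1:5, times = 2),
  Keyword = rep(c("rain", "sun"), each = 5),
  termFrequency = c(10, 14, 9, 20, 16, 30, 22, 27, 18, 25)
)
# One temperature reading per day, kept in its own data frame so it is drawn once
temperature <- data.frame(timeId = 1:5, temperature = c(12, 15, 11, 18, 16))

# Factor that maps the temperature range onto the frequency axis
k <- max(word_freq$termFrequency) / max(temperature$temperature)

p <- ggplot(word_freq, aes(x = timeId, y = termFrequency)) +
  geom_line(aes(colour = Keyword)) +                      # one line per keyword, no facets
  geom_line(data = temperature, aes(y = temperature * k),
            linetype = "dashed") +                        # single temperature line
  scale_y_continuous(name = "Term Frequency",
                     sec.axis = sec_axis(~ . / k, name = "Temperature")) +
  labs(x = "Time Series over X days")
print(p)
```

Mapping colour to Keyword instead of faceting gives one line per word in a single panel.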

ggplot multi-factor-level grouping for boxplot with continuous scale

I'm trying to create a boxplot of the following data
Temp<-rnorm(90,mean=100,sd=10)
Yr<-sample(c("1999","2000","2005","2009","2010"),size=90,replace=TRUE)
Month<-sample(c("June","July","August"),size=90,replace=TRUE)
Month
df<-data.frame(Temp,Month,Yr)
The visual I want and its corresponding code are below:
ggplot(df,aes(x=interaction(Month,Yr),y=Temp,fill=Month))+
geom_boxplot()+
xlab("Year")+
ylab("Daily Maximum Temperature")
You'll notice, though, that there are a few years missing from the data, and I'm trying to make the plot reflect that with gaps in the x-scale. The other problem is the text and tick marks on the axis. I'd like the ticks to just be the Year of observation rather than Month.Year since the month is already coded in the fill. I've tried scale_x_discrete, but trying to supply discrete values for a continuous axis spits out a blank graph and an error. I've met my swearing at the computer quota for the day, and it would be really awesome to get a little help on this.
This creates huge gaps, as every missing year gets its own gap, but you can adapt this by passing only specific years as the levels argument to the factor() call.
df$Yr <- factor(df$Yr, levels=1999:2010)
ggplot(df,aes(x=Yr,y=Temp,fill=Month))+
geom_boxplot(position=position_dodge(1))+
ylab("Daily Maximum Temperature") +
scale_x_discrete("Year", drop=FALSE)
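To see why the gaps appear, it can help to look at the factor itself. The short sketch below uses only base R. The seed and the particular subset of years in Yr_some are arbitrary choices for illustration.

```r
set.seed(42)
Yr <- sample(c("1999", "2000", "2005", "2009", "2010"), size = 90, replace = TRUE)

# Listing every year as a level keeps the unobserved years as empty categories
Yr_all <- factor(Yr, levels = 1999:2010)
table(Yr_all)   # unobserved years show a count of 0

# Listing only some unobserved years gives fewer gaps on the axis
# (2002 and 2007 are an arbitrary choice)
Yr_some <- factor(Yr, levels = c(1999, 2000, 2002, 2005, 2007, 2009, 2010))
nlevels(Yr_some)
```

With drop=FALSE in scale_x_discrete, each empty level still takes up a slot on the axis, which is what produces the gap.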

Plotting multiple time-series in ggplot

I have a time-series dataset consisting of 10 variables.
I would like to create a time-series plot where each of the 10 variables is plotted in a different color, over time, on the same graph. The values should be on the y axis and the dates on the x axis.
Click Here for dataset csv
This is the (probably wrong) code I have been using:
c.o<-read.csv(file="co.csv",head=TRUE)
ggplot(c.o, aes(Year, a, b, c, d, e,f))+geom_line()
and here's what the output from the code looks like:
Can anyone point me in the right direction? I wasn't able to find anything in previous threads.
PROBLEM SOLVED, SEE BELOW.
One additional thing I would like to know:
Is it possible to add an extra line to the plot which represents the average of all variables across time, and have some smoothing below and above that line to represent individual variations?
If your data is called df, something like this should work:
library(ggplot2)
library(reshape2)
meltdf <- melt(df,id="Year")
ggplot(meltdf,aes(x=Year,y=value,colour=variable,group=variable)) + geom_line()
In aes() I'm telling ggplot that the x axis is Year, the y axis is value, and the colour and grouping are by variable.
The melt() function gets your data into the format ggplot2 likes: one column for the year, one for the variable name and one for the value, which you then effectively split when you tell it to plot separate lines for each variable.
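A small sketch of what melt() produces, using an invented three-row data frame. The two stat_summary() layers are one possible reading of the follow-up question about an average line with a band around it. They are my own addition, not part of the answer above, and the fun argument assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)
library(reshape2)

# Toy wide-format data (invented values): one column per variable
df <- data.frame(Year = 2001:2003,
                 a = c(1, 2, 3),
                 b = c(4, 6, 8))

meltdf <- melt(df, id.vars = "Year")
meltdf  # columns Year, variable, value: one row per (Year, variable) pair

p <- ggplot(meltdf, aes(x = Year, y = value)) +
  geom_line(aes(colour = variable, group = variable)) +
  # band: mean +/- one standard error across variables, per year
  stat_summary(fun.data = mean_se, geom = "ribbon", alpha = 0.2) +
  # black line: mean across variables, per year
  stat_summary(fun = mean, geom = "line", colour = "black")
print(p)
```

Because colour and group are set only in the first layer, the summary layers pool all variables within each year.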
