Plotting years with decimals in R ggplot

I am plotting years across two decades using ggplot. Because of how the data were collected, the data points really fall halfway through each year, so to be accurate I labeled the years with a .5 at the end. I also have one single data point that was taken in early 2005, which is labeled 2005.22, so the years look like: 2005.22, 2005.5, 2006.5, 2007.5, 2008.5, 2009.5, 2010.5, 2011.5, 2012.5. Since I am technically missing data for 2005 to 2005.21, I want the plot to start at 2005 with no line showing until 2005.22, and then to have breaks every 2 years starting at 2005.5, 2007.5, and so on.
I've been using the following to plot the years with geom_line, but I don't know how to get the result above. I was able to get the limits to start at 2005, but because the first data point is at 2005.22, the breaks come out as 2005.22, 2007.22, and so on. Below is what I am using to set the limits and breaks for the years:
scale_x_continuous(
name = "year",
breaks = seq(2005, 2012.5, by = 2),   # seq() needs a single 'from' value, not a vector
expand = c(0,0))+
coord_cartesian(xlim = c(2005, 2012.5))

It's a little hard for me to understand what exactly you want the plot to look like (especially in terms of the labels), but does this do what you're looking for? You can add 2005 to the front of the breaks sequence, which places it in front without disrupting the rest of the sequence.
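library(ggplot2)
# example data: mid-year points plus the single early-2005 observation
d <- data.frame(x=c(2005.22, 2005.5,2006.5,2007.5,2008.5,2009.5,2010.5,2011.5,2012.5),
y=runif(9,-1,1))
ggplot(d, aes(x,y)) +
geom_line() +
# 2005 in front of an every-two-years sequence that starts at 2005.5
scale_x_continuous(breaks=c(2005, seq(2005.5, 2012.5,2)))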
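The answer's breaks can be combined with the `limits` and `expand` settings from the question so that the axis starts exactly at 2005 while the line only begins at 2005.22. A minimal sketch, assuming the same kind of made-up data as above; the `set.seed()` call and the name `year_breaks` are illustrative additions, not part of the original answer:

```r
library(ggplot2)

# Same example data as above; y values are random, seed only for reproducibility
set.seed(1)
d <- data.frame(
  x = c(2005.22, 2005.5, 2006.5, 2007.5, 2008.5, 2009.5, 2010.5, 2011.5, 2012.5),
  y = runif(9, -1, 1)
)

# 2005 first, then every two years starting at mid-2005
year_breaks <- c(2005, seq(2005.5, 2012.5, by = 2))

p <- ggplot(d, aes(x, y)) +
  geom_line() +                       # the line starts at the first point, 2005.22
  scale_x_continuous(
    name   = "year",
    breaks = year_breaks,
    limits = c(2005, 2012.5),         # axis starts at 2005 even though data start at 2005.22
    expand = c(0, 0)                  # no padding, so 2005 sits at the left edge
  )
p
```

Because `geom_line()` only connects existing points, nothing is drawn between 2005 and 2005.22; the empty stretch comes purely from the axis limits.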

Related

plotting multiple lines in ggplot R

I have neuroscience data where we count synapses/cells in the cochlea and quantify them per frequency, for animals of different ages. Ideally I want the frequencies (5, 10, 20, 30, 40) on the x-axis and the number of synapses/cells (usually a value from 10 to 20) on the y-axis. The graph should then contain 5 lines, one for each age (6 weeks, 17 weeks, 43 weeks, 69 weeks and 96 weeks).
I'm trying this with ggplot and, to start, just want to plot one age. When I use the following command:
ggplot(mydata, aes(x=Frequency, y=puncta6)) + geom_line()
I get a graph, but no line and the following error: 'geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?'
So I found that I have to adjust the code to:
ggplot(mydata, aes(x=Frequency, y=puncta6, group = 1)) + geom_line()
This works, except that my first data point (5 kHz) is now placed after my last data point (40 kHz). (This also happens without the group = 1 addition.) How do I solve this, or is there an easier way to plot this kind of data?
I couldn't attach a file, so I added a photo of my code and graph (with the 5 kHz data point oddly placed) and a photo of my data in Excel.
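A likely cause, assuming `Frequency` was read in from Excel as text: ggplot2 then treats the x axis as discrete (which also explains the "Each group consists of only one observation" message), and text sorts character by character, so "5" lands after "40". A sketch of converting it to numbers and plotting several ages at once; the data values and the `puncta17` column are made up for illustration:

```r
library(ggplot2)

# Made-up example values; only Frequency and puncta6 appear in the question
mydata <- data.frame(
  Frequency = c("5", "10", "20", "30", "40"),   # stored as text, as can happen on import
  puncta6   = c(14, 16, 18, 17, 15),
  puncta17  = c(13, 15, 17, 16, 14)             # hypothetical second age
)

# As text, "5" sorts after "40", which is why the 5 kHz point ends up last
print(sort(mydata$Frequency))  # "10" "20" "30" "40" "5"

# Convert to numbers so the x axis is continuous and correctly ordered
mydata$Frequency <- as.numeric(mydata$Frequency)

# Reshape to long format with base R: one row per (frequency, age) pair
age_cols <- c("puncta6", "puncta17")
long <- stack(mydata[age_cols])                 # columns: values, ind
long$Frequency <- rep(mydata$Frequency, times = length(age_cols))
names(long)[names(long) == "values"] <- "count"
names(long)[names(long) == "ind"] <- "age"

# One line per age; no group = 1 needed because x is now numeric
p <- ggplot(long, aes(x = Frequency, y = count, colour = age)) +
  geom_line() +
  geom_point()
p
```

With real data, the other age columns would simply be added to `age_cols`; `stack()` is base R, so the reshape needs no extra packages.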

Adding a custom gap in ggplot2

I have data for the first 5 months of 2 separate years presented in a stacked bar plot. Currently it looks like this
I've used this code to get to where I am
ggplot(data=total_arr2, aes(fill=type, y=flights, x=period, group=year)) +
geom_bar(position="fill", stat="identity") +
ggtitle("Arrivals by period, type stacked proportions") +
scale_y_continuous(labels = scales::percent)
I want to add a custom gap to separate the 2019 cases from the 2020 cases so that the trends in the first 5 months of each year can be compared side by side, but I want to keep them on the same graph.
Is there a relatively easy way to do this? (I've been using R for about 3 weeks in total, so it's a steep learning curve!)
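One way to get the gap, swapping in faceting rather than a literal gap on a single axis, is to draw each year as its own panel with `facet_wrap()` and widen the spacing between panels. A minimal sketch; only the column names come from the question, while the `type` values and flight counts are invented:

```r
library(ggplot2)

# Made-up stand-in for total_arr2 (column names from the question, values invented)
total_arr2 <- expand.grid(
  period = month.abb[1:5],
  type   = c("domestic", "international"),   # hypothetical categories
  year   = c(2019, 2020)
)
total_arr2$flights <- seq_len(nrow(total_arr2)) * 10

p <- ggplot(total_arr2, aes(fill = type, y = flights, x = period)) +
  geom_bar(position = "fill", stat = "identity") +
  facet_wrap(~ year, nrow = 1) +                     # one panel per year, side by side
  theme(panel.spacing = grid::unit(2, "lines")) +   # size of the gap between the years
  ggtitle("Arrivals by period, type stacked proportions") +
  scale_y_continuous(labels = scales::percent)
p
```

`panel.spacing` controls the width of the gap, and because `facet_wrap()` uses fixed scales by default, both years share the same y axis for a side-by-side comparison.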

I am using ggplot2 to make a bar chart and can't get the years correct along the x-axis

I am using ggplot2 to make a bar chart of the number of participants per year by gender. With 14 years included, I would like 2 bars for each year, corresponding to the number of males and females for that year. I am not getting each year along the x-axis; I think the data are being binned. I have tried changing the bin width and using scale_x_date, and am still stuck.
Can you help me figure out how to have the data for EACH year in my graph?
As an example, here is my data for years 2004-2017:
year=c(2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017)
gender=c("male" , "female")
Participants are listed by gender, male then female, for each year:
Participants=c(1307,443,1847,630,2109,765, 1824,691,2250,952,3123,1421,4097,1904,6415,3284,8788,4678,11581,6694,13141,8478,16389,10575,20990,13811,26951,19729)
data=data.frame(year,gender,Participants)
Here is how I am trying to generate my plot:
MyPlot <- ggplot(data, aes(fill=gender, y=Participants, x=year)) +
geom_bar(position="dodge", stat="identity",width = .8)
print(MyPlot + ggtitle("Annual Number of Participants by Gender"))
On the x-axis, the years 2006, 2010, 2014 and 2018 are marked and the bars correspond to data from two years. I want data for each year, both in terms of the bars and in terms of the ticks on the x-axis.
Any help would be appreciated!
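You have more participant counts than years: Participants has 28 values but year has only 14, so data.frame() recycles year (2004 to 2017, then 2004 to 2017 again) instead of pairing each year with its male and female counts. The rows no longer line up, so you don't have a clear data frame design to serve as input to ggplot.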
Start by reading the tidy data vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
Its key principles are:
Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.
Once your tibble/data frame is in that shape, your ggplot2 code should work fine. I'd drop the width= argument until you have it working.
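A sketch of the reshaped data and plot, assuming the counts are in the order given in the question (male then female for each year). `rep(..., each = 2)` pairs every year with both of its counts, and mapping `factor(year)` to x gives one tick and one pair of bars per year:

```r
library(ggplot2)

year <- 2004:2017
# Counts listed male then female for each year, as in the question
Participants <- c(1307, 443, 1847, 630, 2109, 765, 1824, 691, 2250, 952,
                  3123, 1421, 4097, 1904, 6415, 3284, 8788, 4678, 11581, 6694,
                  13141, 8478, 16389, 10575, 20990, 13811, 26951, 19729)

data <- data.frame(
  year         = rep(year, each = 2),                     # 2004, 2004, 2005, 2005, ...
  gender       = rep(c("male", "female"), times = length(year)),
  Participants = Participants
)

# A discrete x axis gives one tick and one dodged pair of bars per year
MyPlot <- ggplot(data, aes(x = factor(year), y = Participants, fill = gender)) +
  geom_col(position = "dodge") +
  labs(x = "year", title = "Annual Number of Participants by Gender")
print(MyPlot)
```

Alternatively, `year` can stay numeric with `scale_x_continuous(breaks = 2004:2017)` added to force a tick for every year.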

ggplot multi-factor-level grouping for boxplot with continuous scale

I'm trying to create a boxplot of the following data
Temp<-rnorm(90,mean=100,sd=10)
Yr<-sample(c("1999","2000","2005","2009","2010"),size=90,replace=TRUE)
Month<-sample(c("June","July","August"),size=90,replace=TRUE)
Month
df<-data.frame(Temp,Month,Yr)
The visual I want and its corresponding code are below:
ggplot(df,aes(x=interaction(Month,Yr),y=Temp,fill=Month))+
geom_boxplot()+
xlab("Year")+
ylab("Daily Maximum Temperature")
You'll notice, though, that there are a few years missing from the data, and I'm trying to make the plot reflect that with gaps in the x-scale. The other problem is the text and tick marks on the axis. I'd like the ticks to just be the Year of observation rather than Month.Year since the month is already coded in the fill. I've tried scale_x_discrete, but trying to supply discrete values for a continuous axis spits out a blank graph and an error. I've met my swearing at the computer quota for the day, and it would be really awesome to get a little help on this.
This creates large gaps, because every year from 1999 to 2010 gets its own slot on the axis, including years with no data. You can adapt it by passing only the years you want as the levels argument to the factor() call.
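df$Yr <- factor(df$Yr, levels=1999:2010)   # every year 1999-2010 becomes a level, even without data
ggplot(df,aes(x=Yr,y=Temp,fill=Month))+
geom_boxplot(position=position_dodge(1))+
ylab("Daily Maximum Temperature") +
scale_x_discrete("Year", drop=FALSE)   # keep empty levels so missing years show as gaps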
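If the gaps should be proportional to the time between observed years rather than one slot per calendar year, another option (a different technique from the factor approach above) is to keep year numeric and draw one box per month-year combination. A minimal sketch, assuming simulated data like the question's; the `Year` column, the seed, and the explicit `orientation = "x"` are illustrative choices:

```r
library(ggplot2)

# Simulated data like the question's; seed only for reproducibility
set.seed(42)
Temp  <- rnorm(90, mean = 100, sd = 10)
Yr    <- sample(c("1999", "2000", "2005", "2009", "2010"), size = 90, replace = TRUE)
Month <- sample(c("June", "July", "August"), size = 90, replace = TRUE)
df <- data.frame(Temp, Month, Yr)

# Numeric year gives a continuous axis, so missing years become proportional gaps
df$Year <- as.numeric(as.character(df$Yr))

p <- ggplot(df, aes(x = Year, y = Temp, fill = Month,
                    group = interaction(Month, Year))) +   # one box per month-year
  geom_boxplot(position = position_dodge(width = 0.9), orientation = "x") +
  scale_x_continuous("Year", breaks = sort(unique(df$Year))) +  # ticks at observed years only
  ylab("Daily Maximum Temperature")
p
```

Ticks appear only at years that have data, and the distance between 2000 and 2005 on the axis is five times the distance between 2009 and 2010.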

How to label percentage values inside stacked bar plot using R-base [duplicate]

This question already has an answer here: How to label percentage values inside stacked bar plot using R-base (1 answer). Closed 10 years ago.
I am new to R. I would like someone to explain how to add absolute values inside the individual stacked bars in a consistent way, using the base R plotting functions. I tried to plot a stacked bar graph in base R, but the values appear in an inconsistent, illogical way: each village is supposed to total 100%, but the labels don't add up to 100%.
Here is the data that I am working on:
Village 100 200 300 400 500
Male 68.33333 53.33333 70 70 61.66667
Female 31.66667 46.66667 30 30 38.33333
In summary, there are five villages, and the data show the sex of the interviewed head of household.
I have used the following commands to plot the graph:
barplot(mydata,col=c("yellow","green"))
x<-barplot(mydata,col=c("yellow","green"))
text(x,mydata,labels=mydata,pos=3,offset=.5)
Please help me place the correct values in each bar.
Thanks
This started as a comment, but it seemed unfair not to turn it into an answer. To answer your question properly (even on Stack Overflow) we need to know how "mydata" is structured. I first assumed it was a data frame with 5 rows and 2 or 3 columns, but in that case your code makes no sense. However, if it were structured that way, here is one way to do what I think you want:
mydata <- data.frame(
row.names =c(100, 200, 300, 400, 500),
Male =c(68.33333, 53.33333, 70, 70, 61.66667),
Female =c(31.66667, 46.66667, 30, 30, 38.33333))
x <- barplot(t(as.matrix(mydata)), col=c("yellow", "green"),
legend=TRUE, border=NA, xlim=c(0,8), args.legend=
list(bty="n", border=NA),
ylab="Cumulative percentage", xlab="Village number")
text(x, mydata$Male-10, labels=round(mydata$Male), col="black")
text(x, mydata$Male+10, labels=100-round(mydata$Male))
which produces the following:
An alternative would be to set the y value to 40 for all the male text labels and 80 for all the female ones. This has the advantage of less confusing jitter in the labels, and the disadvantage that the text's vertical position is no longer notionally tied to the data.
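Another approach keeps the labels tied to the data while avoiding uneven offsets: place each label at the vertical middle of its segment, which is the cumulative height minus half the segment's own height. A minimal base R sketch using the same `mydata`; the helper names `m` and `mids` are illustrative:

```r
# Same data as in the answer above
mydata <- data.frame(
  row.names = c(100, 200, 300, 400, 500),
  Male   = c(68.33333, 53.33333, 70, 70, 61.66667),
  Female = c(31.66667, 46.66667, 30, 30, 38.33333)
)
m <- t(as.matrix(mydata))   # 2 x 5: rows Male, Female; one column per village

# Middle of each stacked segment: cumulative height minus half the segment
mids <- apply(m, 2, function(col) cumsum(col) - col / 2)

x <- barplot(m, col = c("yellow", "green"),
             ylab = "Cumulative percentage", xlab = "Village number")
# barplot() stacks the first row (Male) at the bottom, matching the cumsum order;
# x is repeated so each bar position pairs with both of its labels
text(rep(x, each = nrow(m)), mids, labels = round(m))
```

This generalises to any number of stacked categories, because `apply()` computes the midpoints column by column in the same bottom-to-top order in which `barplot()` stacks the rows.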
Personally, I don't much like this barplot at all, although there are many far worse crimes against data visualisation than a straightforward bar plot. Numbers on plots add clutter and detract from the visual impact of the actual mapping of data to colours, shapes and sizes. I'd prefer a simple dot plot like:
library(ggplot2)
ggplot(mydata, aes(x=row.names(mydata), y=Male)) +
geom_point(size=4) +
coord_flip() +
labs(x="Village number\n", y="Percentage male") +
ylim(0,100) +
geom_hline(yintercept=50, linetype=2)
which gives
There is less redundant clutter in the plot, a higher data to ink ratio, etc. However in the end you need to produce the plot that will mean something for your audience.
