How can I use column labels as Y axis in ggplot? - r

Hello,
I have a dataset structured as shown in the link above. I am extremely new to R, and this is probably super easy to do, but I cannot figure out how to plot this dataset using ggplot...
Could anyone guide me and give me some hints?
I basically want to color the lines according to socioeconomic level and show each year's value...

You need to reshape your data into a long format before you can plot it with ggplot.
library(reshape)
library(dplyr)
library(ggplot2)
df_long <- melt(df) # reshape the dataframe to a long format
df_long %>%
  ggplot(aes(x = variable, y = value, group = group, color = group)) +
  geom_line()
Note: You will get better answers if you post your code with a reproducible dataset.
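For readers without the original dataset, here is a small reproducible sketch of the same reshape-then-plot idea. The data frame `df`, its `group` column, and the year columns are made up for illustration; they are assumptions, not the asker's data. The sketch uses reshape2 rather than the older reshape package.

```r
library(reshape2)
library(ggplot2)

# Hypothetical wide data: one row per socioeconomic level, one column per year
df <- data.frame(
  group  = c("low", "middle", "high"),
  `2018` = c(10, 20, 30),
  `2019` = c(12, 22, 33),
  `2020` = c(15, 21, 36),
  check.names = FALSE  # keep the year labels as column names
)

# Long format: the former column labels end up in the `variable` column
df_long <- melt(df, id.vars = "group")

p <- ggplot(df_long, aes(x = variable, y = value, group = group, color = group)) +
  geom_line()
print(p)
```

Naming the id column explicitly in melt() avoids relying on its guess about which columns are identifiers.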

Related

ggplot (geom_bar) not sorting y-axis according to numeric values

I am trying to sort the y-axis numerically according to population values. I have tried other Stack Overflow answers that suggested reorder() or converting columns to a numeric data type (as.numeric), but those solutions do not seem to work for me.
Without using reorder, the plot is sorted alphabetically:
Using reorder, the plot is sorted as such:
The code I am using:
library(ggplot2)
library(ggpubr)
library(readr)
library(tidyverse)
library(lemon)
library(dplyr)
pop_data <- read_csv("respopagesextod2011to2020.csv")
temp2 <- pop_data %>% filter(`Time` == '2019')
ggplot(data=temp2,aes(x=reorder(PA, Pop),y=Pop)) + geom_bar(stat='identity') + coord_flip()
How should I go about sorting my y-axis? Any help will be much appreciated. Thanks!
I am using data filtered from: https://www.singstat.gov.sg/-/media/files/find_data/population/statistical_tables/singapore-residents-by-planning-areasubzone-age-group-sex-and-type-of-dwelling-june-20112020.zip
The functions are all working as intended. You don't see the expected result because reorder() orders PA by a summary of Pop computed over the individual observations (the mean, by default), whereas each bar you are plotting is the sum of Pop over all rows for that PA.
The easiest solution is probably to summarize first, then reorder and plot the summarized dataset. This way, the ordering reflects the summarized data, which is what you want.
temp3 <- pop_data %>%
  filter(`Time` == '2019') %>%
  group_by(PA) %>%
  summarize(Pop = sum(Pop))
ggplot(data = temp3, aes(x = reorder(PA, Pop), y = Pop)) +
  geom_bar(stat = 'identity') +
  coord_flip()
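An alternative sketch is to keep the unsummarized rows and tell reorder() to order by the per-area sum, since its default summary function is mean. The data frame `toy` below is hypothetical and only stands in for the filtered population data.

```r
library(ggplot2)

# Hypothetical stand-in for the filtered data: several rows per planning area
toy <- data.frame(
  PA  = c("A", "A", "A", "B", "C", "C"),
  Pop = c(10, 10, 10, 25, 12, 12)
)

# Mean per PA: A = 10, B = 25, C = 12; sum per PA: A = 30, B = 25, C = 24.
# FUN = sum makes the ordering match the stacked bar heights.
p <- ggplot(toy, aes(x = reorder(PA, Pop, FUN = sum), y = Pop)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(x = "PA")
print(p)
```

With the default FUN, the same toy data would be ordered A, C, B, which does not match the bar heights.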

Boxplots in ggplot2 R

My goal is to visualize some data frames with ggplot2.
I have several data.frames looking like this
And my goal is a boxplot looking like this, just nicer.
I managed to get single boxplots using
plt <- ggplot(data, aes(RF, data$RF)) +
geom_boxplot()
plt
But that's not what I want.
library(ggplot2)
library(reshape)
airquality_m = melt(airquality)
ggplot(airquality_m, aes(variable, value )) + geom_boxplot()
I did not beautify the plot but I guess you get the idea here.
The boxplot you showed was created with base R graphics. A single command will do it:
boxplot(data)
If you want to use ggplot, you have to first melt the dataframe and then plot.
library(reshape2)
datPlot <- melt(data)
ggplot(datPlot,aes(variable,value)) + geom_boxplot()
I guess this is what you want:
library(ggplot2)
library(reshape)
myddt_m = melt(mydata)
names(myddt_m)=c("Models","CI")
ggplot(myddt_m, aes(Models, CI, fill = Models)) +
  geom_boxplot() +
  guides(fill = "none") +  # guides(fill = FALSE) is deprecated in recent ggplot2
  labs(x = "", y = "C-Index")

ggplot bar chart for time series

I'm reading Hadley Wickham's book about ggplot, but I am having trouble plotting certain weights over time in a bar chart. Here is some sample data:
dates <- c("20040101","20050101","20060101")
dates.f <- strptime(dates,format="%Y%m%d")
m <- rbind(c(0.2,0.5,0.15,0.1,0.05),c(0.5,0.1,0.1,0.2,0.1),c(0.2,0.2,0.2,0.2,0.2))
m <- cbind(dates.f,as.data.frame(m))
This data.frame has the dates in the first column, and each row holds the corresponding weights. I would like to plot the weights for each year in a bar chart using the "fill" argument.
I'm able to plot the weights as bars using:
p <- ggplot(m,aes(dates.f))
p+geom_bar()
However, this is not exactly what I want. I would like to see in each bar the contribution of each weight. Moreover, I don't understand why I have the strange format on the x-axis, i.e. why there is "2004-07" and "2005-07" displayed.
Thanks for the help
Hope this is what you are looking for:
ggplot2 requires data in a long format.
require(reshape2)
m_molten <- melt(m, "dates.f")
Plotting itself is done by
ggplot(m_molten, aes(x = dates.f, y = value, fill = variable)) +
  geom_bar(stat = "identity")
You can add position="dodge" to geom_bar if you want them side by side.
EDIT
If you want yearly breaks only, convert m_molten$dates.f to the Date class:
require(scales)
m_molten$dates.f <- as.Date(m_molten$dates.f)
ggplot(m_molten, aes(x = dates.f, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  scale_x_date(labels = date_format("%y"), breaks = date_breaks("year"))
P.S.: See http://vita.had.co.nz/papers/tidy-data.pdf for Hadley's philosophy of tidy data.
To create the plot you need, you have to reshape your data from "wide" to "tall". There are many ways of doing this, including the reshape() function in base R (not recommended), reshape2 and tidyr.
In the tidyr package you have two functions to reshape data, gather() and spread().
The function gather() transforms from wide to tall. In this case, you have to gather your columns V1:V5.
Try this:
library("tidyr")
tidy_m <- gather(m, var, value, V1:V5)
ggplot(tidy_m, aes(x = dates.f, y = value, fill = var)) +
  geom_bar(stat = "identity")
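In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). Here is a minimal sketch of the same reshape with the newer function, assuming the dates are stored as Date rather than the POSIXlt values that strptime() returns:

```r
library(tidyr)
library(ggplot2)

# Same weights as in the question; as.data.frame() names the columns V1..V5
weights <- as.data.frame(rbind(
  c(0.2, 0.5, 0.15, 0.1, 0.05),
  c(0.5, 0.1, 0.1, 0.2, 0.1),
  c(0.2, 0.2, 0.2, 0.2, 0.2)
))

# Assumption: dates are parsed straight to Date instead of using strptime()
m <- data.frame(
  dates.f = as.Date(c("20040101", "20050101", "20060101"), format = "%Y%m%d"),
  weights
)

# Wide to long: V1..V5 become rows, labelled in `var`
tidy_m <- pivot_longer(m, cols = V1:V5, names_to = "var", values_to = "value")

p <- ggplot(tidy_m, aes(x = dates.f, y = value, fill = var)) +
  geom_bar(stat = "identity")
print(p)
```

The resulting `tidy_m` has the same three columns that gather() would produce, so the plotting call is unchanged.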

Plot results from dist_tab() function from qdap library

I am interested in plotting the results from the following code which produces a frequency distribution table. I would like to graph the Freq column as a bar with the cum.Freq as a line both sharing the interval column as the x-axis.
library("qdap")
x <- c(1,2,3,2,4,2,5,4,6,7,8,9)
dist_tab(x)
I have been able to build the bar chart using ggplot, but I want to take it further by adding cum.Freq on a secondary axis. I also want to add the percent and cum.percent values as data labels. Any help is appreciated.
library("ggplot2")
ggplot(dist_tab(x), aes(x=interval)) + geom_bar(aes(y=Freq))
Not sure if I understand your question. Is this what you are looking for?
library(reshape2)  # provides melt()
df <- dist_tab(x)
df.melt <- melt(df, id.vars = "interval", measure.vars = c("Freq", "cum.Freq"))
ggplot(df.melt, aes(x = interval, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge")
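If the goal is bars for Freq with cum.Freq drawn as a line over them, one possible sketch is to layer geom_line() on top of the bars instead of dodging. The table below is built with base R as a stand-in for the dist_tab() output, so its column names are assumptions chosen to mirror the question, and both series share one y axis rather than using a secondary axis.

```r
library(ggplot2)

x <- c(1, 2, 3, 2, 4, 2, 5, 4, 6, 7, 8, 9)

# Stand-in frequency table built with base R (not the qdap result itself)
counts <- table(x)
freq_tab <- data.frame(
  interval = factor(names(counts), levels = names(counts)),
  Freq     = as.vector(counts)
)
freq_tab$cum.Freq <- cumsum(freq_tab$Freq)

p <- ggplot(freq_tab, aes(x = interval)) +
  geom_bar(aes(y = Freq), stat = "identity") +
  # group = 1 joins the points into a single line across the discrete x axis
  geom_line(aes(y = cum.Freq, group = 1)) +
  geom_point(aes(y = cum.Freq))
print(p)
```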

How can I plot multiple variables side-by-side in a dotplot in R?

I'm still pretty new to R, and have come up against a plotting problem I can't find an answer to.
I've got a data frame that looks like this (though a lot bigger):
df <- data.frame(Treatment = rep(c("A", "B", "C"), each = 6),
                 LocA = sample(1:100, 18),
                 LocB = sample(1:100, 18),
                 LocC = sample(1:100, 18))
And I want dot plots that look like this one produced in Excel. It's exactly the formatting I want: a dotplot for each of the treatments side-by-side for each location, with data for multiple locations together on one graph. (Profuse apologies for not being able to post the image here; posting images requires a 10 reputation.)
It's no problem to make a plot for each location, with the dots color-coded, and so on:
ggplot(data = df, aes(x=Treatment, y=LocA, color = Treatment)) + geom_point()
but I can't figure out how to add locations B and C to the same graph.
Any advice would be much appreciated!
As a couple of people have mentioned, you need to "melt" the data, getting it into a "long" form.
library(reshape2)
df_melted <- melt(df, id.vars=c("Treatment"))
colnames(df_melted)[2] <- "Location"
In ggplot jargon, having different groups like treatment side-by-side is achieved through "dodging". Usually for things like barplots you can just say position="dodge" but geom_point seems to require a bit more manual specification:
ggplot(data = df_melted, aes(x = Location, y = value, color = Treatment)) +
  geom_point(position = position_dodge(width = 0.3))
You need to reshape the data. Here is an example using reshape2:
library(reshape2)
# dat is the data frame from the question (called df there)
dat.m <- melt(dat, id.vars = 'Treatment')
library(ggplot2)
ggplot(data = dat.m,
       aes(x = Treatment, y = value, shape = Treatment, color = Treatment)) +
  geom_point() +
  facet_grid(~variable)
Since you want a dotplot, I also propose a lattice solution. I think it is more suitable in this case.
library(lattice)  # provides dotplot()
dotplot(value ~ Treatment | variable,
        groups = Treatment, data = dat.m,
        pch = c(25, 19),
        par.strip.text = list(cex = 3),
        cex = 2)
