My data is:
   PC_Name         Electors_2009 Electors_2014 Electors_2019 Voters_2009 Voters_2014 Voters_2019
1  Amritsar              1241099       1477262       1507875      814503     1007196      859513
2  Anandpur Sahib        1338596       1564721       1698876      904606     1086563     1081727
3  Bhatinda              1336790       1525289       1621671     1048144     1176767     1200810
4  Faridkot              1288090       1455075       1541971      930521     1032107      974947
5  Fatehgarh Sahib       1207556       1396957       1502861      838150     1030954      985948
6  Ferozpur              1342488       1522111       1618419      956952     1105412     1172033
7  Gurdaspur             1318967       1500337       1595284      933323     1042699     1103887
8  Hoshiarpur            1299234       1485286       1597500      843123      961297      990791
9  Jalandhar             1339842       1551497       1617018      899607     1040762     1018998
10 Khadoor Sahib         1340145       1563256       1638842      946690     1040518     1046032
11 Ludhiana              1309308       1561201       1683325      846277     1100457     1046955
12 Patiala               1344864       1580273       1739600      935959     1120933     1177903
13 Sangrur               1251401       1424743       1529432      931247     1099467     1105888
I have written this code:
data <- read.csv(file = "Punjab data 3.csv")
data
library(ggplot2)
library(reshape2)
long <- reshape2::melt(data, id.vars = "PC_Name")
ggplot(long, aes(PC_Name, value, fill = variable)) + geom_freqpoly(stat="identity",binwidth = 500)
I am trying to plot something like this.
I tried a line chart with geom_line, but I am not sure where the problem lies. I am now trying geom_polygon, but it is not plotting anything. I want to compare voters or electors (not both of them) across the years 2009, 2014 and 2019. Sorry for my bad English.
I want to plot PC_Name on the x-axis and compare Electors_2009 with Voters_2009, Electors_2014 with Voters_2014, and so on, all on the same graph. So on the y-axis I will have 'value' after melting.
It sounds like you want PC_Name on the horizontal axis and value (after melting) on the vertical axis. Perhaps a bar plot that compares electors and voters side by side would work?
As suggested by @camille, you could split the variable column of the melted data frame into two columns (one holding either Electors or Voters, the other holding the year). This gives you more flexibility in plot options.
Here are a couple of possibilities to start with:
You could order your variable factor how you would like (e.g., Electors_2009, Voters_2009, Electors_2014, etc. for comparison) and use geom_bar.
You could use facet_wrap to make comparisons between Electors and Voters by year.
library(ggplot2)
library(reshape2)
long <- reshape2::melt(data, id.vars = "PC_Name")
# Split electors/voters from year into 2 columns
long <- cbind(long, colsplit(long$variable, "_", c("type", "year")))
# Change order of variable factor for comparisons
long$variable <- factor(long$variable, levels =
c("Electors_2009", "Voters_2009",
"Electors_2014", "Voters_2014",
"Electors_2019", "Voters_2019"))
# Plot value vs. PC_Name using barplot (all years together)
ggplot(long, aes(PC_Name, value, fill = variable)) +
geom_bar(position = "dodge", stat = "identity")
# Show example plot faceted by year
ggplot(long, aes(PC_Name, value, fill = type)) +
geom_bar(position = "dodge", stat = "identity") +
facet_wrap(~year, ncol = 1)
Please let me know if this is what you had in mind. Other options are available.
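One such option, as a rough sketch: assuming tidyr is available, pivot_longer() can split names such as Electors_2009 into a type and a year in a single step, and a line chart (what the question first attempted) then works once group is set explicitly, since PC_Name is discrete. The three-row data frame below is copied from the question only to keep the example self-contained.

```r
library(ggplot2)
library(tidyr)

# Three rows copied from the question, for illustration only
data <- data.frame(
  PC_Name       = c("Amritsar", "Bhatinda", "Patiala"),
  Electors_2009 = c(1241099, 1336790, 1344864),
  Electors_2014 = c(1477262, 1525289, 1580273),
  Electors_2019 = c(1507875, 1621671, 1739600),
  Voters_2009   = c(814503, 1048144, 935959),
  Voters_2014   = c(1007196, 1176767, 1120933),
  Voters_2019   = c(859513, 1200810, 1177903)
)

# Reshape to long format and split "Electors_2009" into type and year
long <- pivot_longer(
  data,
  cols      = -PC_Name,
  names_to  = c("type", "year"),
  names_sep = "_",
  values_to = "value"
)

# PC_Name on the x-axis, one line per type, one panel per year.
# group = type is needed because the x-axis is discrete.
p <- ggplot(long, aes(PC_Name, value, colour = type, group = type)) +
  geom_line() +
  geom_point() +
  facet_wrap(~year, ncol = 1)
```

Printing p draws the chart. With all 13 constituencies the x-axis labels may overlap, so rotating them with theme(axis.text.x = element_text(angle = 45, hjust = 1)) is probably worthwhile.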
I have two separate data frames, each representing a feature (activity and sleep) and the number of days on which that feature was recorded for each id. The number of days should appear on the y-axis and the feature itself on the x-axis.
I managed to draw the boxplots separately, showing the outliers clearly (especially for one of the sets). However, when I place the two boxplots next to each other, the outliers do not show up clearly. Also, how do I get the names of the two features (activity and sleep) on my x-axis?
The data frame for the "sleep" feature:
head(idday)
# A tibble: 6 x 2
id days
<dbl> <int>
1 1503960366 25
2 1644430081 4
3 1844505072 3
4 1927972279 5
5 2026352035 28
6 2320127002 1
The data frame for the "activity" feature:
head(iddaya)
# A tibble: 6 x 2
id days
<dbl> <int>
1 1503960366 31
2 1624580081 31
3 1644430081 30
4 1844505072 31
5 1927972279 31
6 2022484408 31
My attempt for sleep:
ggplot(idday, aes(y = days), boxwex = 0.05) +
stat_boxplot(geom = "errorbar",
width = 0.2) +
geom_boxplot(alpha=0.9, outlier.color="red")
and for activity:
ggplot(iddaya, aes(y = days), boxwex = 0.05) +
stat_boxplot(geom = "errorbar",
width = 0.2) +
geom_boxplot(alpha=0.9, outlier.color="red")
I then combined them:
boxplot(summary(idday$days), summary(iddaya$days))
In this final image the outliers do not show clearly, and I also want to label my x-axis and y-axis.
There are several ways to achieve this. Here is one:
If your data frames are called df_sleep and df_activity, you can combine them in a named list, add a new column feature, and then plot:
df_sleep
df_activity
library(tidyverse)
bind_rows(list(sleep = df_sleep, activity = df_activity), .id = 'feature') %>%
ggplot(aes(x = feature, y=days, fill=feature))+
geom_boxplot()
If you want to compare these two boxplots with each other, I recommend using the same range for the y-axis. To achieve this, you first have to combine both data frames. You can do this with inner_join() from the dplyr package.
data_combined <- inner_join(idday, iddaya,
by = "id",
suffix = c("_sleep", "_activity"))
Then you need to transform your data frame into long-format with pivot_longer() from the tidyr package:
data_combined_long <- data_combined %>%
pivot_longer(days_sleep:days_activity,
names_to = "features",
names_prefix = "days_",
values_to = "days")
After that you can again use ggplot() to create your boxplot. But now you have to define that you want your x-axis to represent your features:
ggplot(data_combined_long, aes(y = days, x = features)) +  # boxwex is a base-graphics argument and has no effect in ggplot()
stat_boxplot(geom = "errorbar",
width = 0.5) +
geom_boxplot(alpha=0.9, outlier.color="red")
Your plot should then look like this:
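One caveat with the join-based approach, sketched here on the six-row head() previews from the question rather than on the full data: inner_join() keeps only the ids present in both data frames, whereas stacking with bind_rows() keeps every row. Which behaviour is appropriate depends on whether ids recorded for only one feature should still count towards its boxplot.

```r
library(dplyr)

# head() previews copied from the question
idday <- data.frame(
  id   = c(1503960366, 1644430081, 1844505072, 1927972279, 2026352035, 2320127002),
  days = c(25L, 4L, 3L, 5L, 28L, 1L)
)
iddaya <- data.frame(
  id   = c(1503960366, 1624580081, 1644430081, 1844505072, 1927972279, 2022484408),
  days = c(31L, 31L, 30L, 31L, 31L, 31L)
)

# Join: one row per id found in BOTH data frames
joined <- inner_join(idday, iddaya, by = "id",
                     suffix = c("_sleep", "_activity"))

# Stack: every row from both data frames, labelled by feature
stacked <- bind_rows(list(sleep = idday, activity = iddaya), .id = "feature")

nrow(joined)   # number of ids common to both previews
nrow(stacked)  # all rows from both previews
```

Comparing the two row counts on the real data is a quick way to see how many observations the join would drop.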
I have a dataframe called "employee_attrition". There are two variables of my interest, the first one is called "MonthlyIncome" (with continuous data of salary) and the second one is "PerformanceRating" which takes discrete values (1,2,3 or 4). My intention is to create a histogram for the MonthlyIncome, and show the PerformanceRating in the same plot. I have this:
ggplot(data = employee_attrition, aes(x=MonthlyIncome, fill=PerformanceRating))+
geom_histogram(aes(y=..count..))+
xlab("Salario mensual (MonthlyIncome)")+
ylab("Frecuencia")+
ggtitle("Histograma: MonthlyIncome y Attrition")+
theme_minimal()
The problem is that the plot does not show the "PerformanceRating" associated with each bar of the histogram.
My data frame is something like this:
MonthlyIncome PerformanceRating
1 5993 1
2 5130 1
3 2090 4
4 2909 3
5 3468 4
6 3068 3
I want a histogram that shows the frequency of MonthlyIncome, with each bar split into 4 colours, one per PerformanceRating value.
Something like this, but with 4 colours (the PerformanceRating values).
For fill to work, you should first convert the grouping variable to a factor.
library(tibble)
library(tidyverse)
##---------------------------------------------------
##Creating a sample dataset simulating your dataset
##---------------------------------------------------
employee_attrition <- tibble(
MonthlyIncome = sample(3000:5993, 1000, replace = FALSE),
PerformanceRating = sample(1:4, 1000, replace = TRUE)
)
##------------------------------------
## Plot - also changing the format of
## PerformanceRating to "factor"
##-----------------------------------
employee_attrition %>%
mutate(PerformanceRating = as.factor(PerformanceRating)) %>%
ggplot(aes(x=MonthlyIncome, fill=PerformanceRating))+
geom_histogram(aes(y = after_stat(count)), bins = 20) +  # after_stat() replaces the deprecated ..count.. notation
xlab("Salario mensual (MonthlyIncome)")+
ylab("Frecuencia")+
ggtitle("Histograma: MonthlyIncome y Attrition")+
theme_minimal()
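To see why the conversion matters, here is a small sketch on made-up, deterministic data (the values are not from the question): with a factor fill, ggplot2 forms one group per rating and stacks the groups within each bin. Counting the groups in the computed layer data is one way to check that the split happened.

```r
library(ggplot2)

# Deterministic stand-in for employee_attrition (values are made up)
employee_attrition <- data.frame(
  MonthlyIncome     = seq(2000, 5960, by = 40),
  PerformanceRating = rep(1:4, times = 25)
)

# Discrete fill: one group per rating, stacked inside each bin
p_factor <- ggplot(employee_attrition,
                   aes(x = MonthlyIncome, fill = factor(PerformanceRating))) +
  geom_histogram(bins = 10) +
  labs(fill = "PerformanceRating")

# Number of groups ggplot2 computed for the histogram layer
n_groups <- length(unique(layer_data(p_factor, 1)$group))
n_groups
```

If the bars should sit side by side rather than stacked, position = "dodge" in geom_histogram() is one option to try.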
I have some experience with base R but am trying to learn tidyverse and ggplot. I have a dataframe with 4 columns of data. I want a simple x-y plot, where the first column of data is on the x-axis, and the data in the other 3 columns is plotted on the y-axis, resulting in 3 lines on one plot. The first 15 lines of my data look like this (sorry about the image - I don't know how to insert a sample of my data):
screen shot - first 15 rows of data
I tried to plot the second and third columns of data as follows:
ggplot(data=SWRC_SL, aes(x=SWRC_SL$pressure_head, y=SWRC_SL$UNSODA_theta)) +
geom_line(colour="red") + scale_x_log10() +
ggplot(data=SWRC_SL, aes(x=SWRC_SL$pressure_head, y=SWRC_SL$Vrugt_theta)) +
geom_line(colour="blue") + scale_x_log10()
I get this error:
Error: Don't know how to add ggplot(data = SWRC_SL, aes(x = SWRC_SL$pressure_head, y = SWRC_SL$Vrugt_theta)) to a plot
I believe I should be using something like "group=" to indicate which columns should be plotted, but I haven't been able to find an example that shows how to use ggplot to plot data across multiple columns. What am I missing?
ggplot() is only ever called once when you create a chart. Try with the following:
ggplot() +
geom_line(data=SWRC_SL, aes(x=pressure_head, y=UNSODA_theta), colour="red") +
geom_line(data=SWRC_SL, aes(x=pressure_head, y=Vrugt_theta), colour="blue") +
scale_x_log10()
A better method would be to turn your data to long, where the UNSODA_theta and Vrugt_theta data are in the same column (say thetas), and have another column (say type_theta) indicating whether the data is for UNSODA_theta or Vrugt_theta. Then you could do the following:
ggplot(data=SWRC_SL, aes(x=pressure_head, y=thetas, colour=type_theta)) +
geom_line() +
scale_x_log10()
This is more desirable because ggplot2 will include a legend indicating what type of theta the colours are applied to.
As suggested by @Marius, the most efficient way to plot your data is to convert it into long format.
With the tidyverse, you can use the pivot_longer function (from the tidyr package) and write the following code:
library(tidyverse)
SWRC_SL %>% pivot_longer(.,-pressure_head, names_to = "variable", values_to = "value") %>%
ggplot(aes(x = pressure_head, y = value, color = variable))+
geom_line()+
scale_x_log10()
EDIT: Illustrating example
Using this dummy dataset:
pressure UNSODA_theta Vrugt_theta Cassel_theta
1 0 -1.4672500 1.4119747 -2.0553118
2 1 0.5210227 0.6189239 1.4817574
3 2 -0.1587546 1.4094018 2.2796175
4 3 1.4645873 2.6888733 -0.4631109
5 4 -0.7660820 2.5865884 -1.8799346
6 5 -0.4302118 0.6690922 0.9633620
First, you pivot your data into a long format:
df %>% pivot_longer(.,-pressure, names_to = "variable", values_to = "value")
# A tibble: 45 x 3
pressure variable value
<int> <chr> <dbl>
1 0 UNSODA_theta -1.47
2 0 Vrugt_theta 1.41
3 0 Cassel_theta -2.06
4 1 UNSODA_theta 0.521
5 1 Vrugt_theta 0.619
6 1 Cassel_theta 1.48
7 2 UNSODA_theta -0.159
8 2 Vrugt_theta 1.41
9 2 Cassel_theta 2.28
10 3 UNSODA_theta 1.46
# … with 35 more rows
Now your data is in a shape suitable for ggplot2. You can chain the ggplot call directly onto the previous command with a pipe (%>%):
library(tidyverse)
df %>% pivot_longer(.,-pressure, names_to = "variable", values_to = "value") %>%
ggplot(aes(x = pressure, y = value, color = variable))+
geom_line()+
scale_x_log10()
And you get the following plot with legend included:
Data example
structure(list(pressure = 0:14, UNSODA_theta = c(-1.46725002909224,
0.521022742648139, -0.158754604716016, 1.4645873119698, -0.766081999604665,
-0.430211753928547, -0.926109497377437, -0.17710396143654, 0.402011779486338,
-0.731748173119606, 0.830373167981674, -1.20808278630446, -1.04798441280774,
1.44115770684428, -1.01584746530465), Vrugt_theta = c(1.41197471231751,
0.61892394889108, 1.40940183965093, 2.68887328620405, 2.58658843344197,
0.669092199317234, -1.28523553529247, 3.49766158983416, 1.66706616676549,
1.5413273359637, 0.986600476854091, 1.51010842295293, 0.835624168230333,
1.42069464325451, 0.599753256022356), Cassel_theta = c(-2.05531181632119,
1.48175740118232, 2.27961753824932, -0.46311085383842, -1.87993463341154,
0.963361958516736, -0.0670637053409687, -2.59982761023726, 0.00319778952040447,
-0.945450500892219, -0.511452869790608, -1.73485854395378, 2.7047128618762,
-0.496698054586832, -2.40827011837962)), class = "data.frame", row.names = c(NA,
-15L))
I was attempting (unsuccessfully) to show a legend in my R ggplot2 graph which involves multiple plots. My data frame df and code is as follows:
Individuals Mod.2 Mod.1 Mod.3
1 2 -0.013473145 0.010859793 -0.08914021
2 3 -0.011109863 0.009503278 -0.09049672
3 4 -0.006465788 0.011304668 -0.08869533
4 5 0.010536718 0.009110458 -0.09088954
5 6 0.015501212 0.005929766 -0.09407023
6 7 0.014565584 0.005530390 -0.09446961
7 8 -0.009712516 0.012234843 -0.08776516
8 9 -0.011282278 0.006569570 -0.09343043
9 10 -0.011330579 0.003505439 -0.09649456
str(df)
'data.frame': 9 obs. of 4 variables:
$ Individuals : num 2 3 4 5 6 7 8 9 10
$ Mod.2 : num -0.01347 -0.01111 -0.00647 0.01054 0.0155 ...
$ Mod.1 : num 0.01086 0.0095 0.0113 0.00911 0.00593 ...
$ Mod.3 : num -0.0891 -0.0905 -0.0887 -0.0909 -0.0941 ...
ggplot(df, aes(df$Individuals)) +
geom_point(aes(y=df[,2]), colour="red") + geom_line(aes(y=df[,2]), colour="red") +
geom_point(aes(y=df[,3]), colour="lightgreen") + geom_line(aes(y=df[,3]), colour="lightgreen") +
geom_point(aes(y=df[,4]), colour="darkgreen") + geom_line(aes(y=df[,4]), colour="darkgreen") +
labs(title = "Modules", x = "Number of individuals", y = "Mode")
I looked up the following Stack Overflow threads, as well as Google searches:
Merging ggplot2 legend
ggplot2 legend not showing
`ggplot2` legend not showing label for added series
ggplot2 legend for geom_area/geom_ribbon not showing
ggplot and R: Two variables over time
ggplot legend not showing up in lift chart
Why ggplot2 legend not show in the graph
ggplot legend not showing up in lift chart.
This one was created 4 days ago
This made me realize that making legends appear is a recurring issue, despite the fact that legends usually appear automatically.
My first question is: what causes a legend not to appear when using ggplot? The second is how to fix each of these causes. One cause appears to be related to multiple plots and the use of aes(), but I suspect there are other reasons.
colour = XYZ should be inside aes(), not outside:
geom_point(aes(data, colour=XYZ)) #------>legend
geom_point(aes(data),colour=XYZ) #------>no legend
Hope it helps; it took me a long time to figure this out.
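A minimal runnable illustration of that difference, using the built-in mtcars data (the dataset and the column choices are only for demonstration):

```r
library(ggplot2)

# Colour mapped to a variable inside aes(): ggplot2 builds a colour
# scale for it and draws a legend
p_legend <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(cyl)))

# Colour set to a constant outside aes(): every point is red and
# there is nothing to put in a legend
p_no_legend <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "red")
```

Printing the two plots side by side shows the legend only on the first one.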
You are going about the setting of colour in completely the wrong way. You have set colour to a constant character value in multiple layers, rather than mapping it to the value of a variable in a single layer.
This is largely because your data is not "tidy" (see the following)
head(df)
x a b c
1 1 -0.71149883 2.0886033 0.3468103
2 2 -0.71122304 -2.0777620 -1.0694651
3 3 -0.27155800 0.7772972 0.6080115
4 4 -0.82038851 -1.9212633 -0.8742432
5 5 -0.71397683 1.5796136 -0.1019847
6 6 -0.02283531 -1.2957267 -0.7817367
Instead, you should reshape your data first:
df <- data.frame(x=1:10, a=rnorm(10), b=rnorm(10), c=rnorm(10))
mdf <- reshape2::melt(df, id.var = "x")
This produces a more suitable format:
head(mdf)
x variable value
1 1 a -0.71149883
2 2 a -0.71122304
3 3 a -0.27155800
4 4 a -0.82038851
5 5 a -0.71397683
6 6 a -0.02283531
This will make it much easier to use with ggplot2 in the intended way, where colour is mapped to the value of a variable:
ggplot(mdf, aes(x = x, y = value, colour = variable)) +
geom_point() +
geom_line()
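Applied to the data frame from the question, this might look like the sketch below. It uses tidyr::pivot_longer() as a stand-in for reshape2::melt(), and the column name module is chosen here for illustration. Because colour is now mapped to a variable, the specific colours from the original code are assigned through scale_colour_manual() instead of per layer.

```r
library(ggplot2)
library(tidyr)

# First four rows of the question's data frame
df <- data.frame(
  Individuals = 2:5,
  Mod.2 = c(-0.013473145, -0.011109863, -0.006465788, 0.010536718),
  Mod.1 = c(0.010859793, 0.009503278, 0.011304668, 0.009110458),
  Mod.3 = c(-0.08914021, -0.09049672, -0.08869533, -0.09088954)
)

# Long format: one row per (Individuals, module) pair
mdf <- pivot_longer(df, cols = -Individuals,
                    names_to = "module", values_to = "value")

# Colour is mapped to module, so the legend appears automatically;
# scale_colour_manual() assigns the colours used in the question
p <- ggplot(mdf, aes(x = Individuals, y = value, colour = module)) +
  geom_point() +
  geom_line() +
  scale_colour_manual(values = c(Mod.2 = "red",
                                 Mod.1 = "lightgreen",
                                 Mod.3 = "darkgreen")) +
  labs(title = "Modules", x = "Number of individuals", y = "Mode")
```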
ind = 1:10
my.df <- data.frame(ind, sample(-5:5, 10, replace = TRUE),
sample(-5:5, 10, replace = TRUE), sample(-5:5, 10, replace = TRUE))
df <- data.frame(rep(ind,3) ,c(my.df[,2],my.df[,3],my.df[,4]),
c(rep("mod.1",10),rep("mod.2",10),rep("mod.3",10)))
colnames(df) <- c("ind","value","mod")
Your data frame should look something like this:
ind value mod
1 5 mod.1
2 -5 mod.1
3 3 mod.1
4 2 mod.1
5 -2 mod.1
6 5 mod.1
Then all you have to do is :
ggplot(df, aes(x = ind, y = value, shape = mod, color = mod)) +
geom_line() + geom_point()
I had a similar problem with the title. I found a way to show it: you can add a layer using
ggtitle("Name of the title that you want to show")
Example:
ggplot(data=mtcars,
mapping = aes(x=hp,
fill = factor(vs)))+
geom_histogram(bins = 9,
position = 'identity',
alpha = 0.8, show.legend = TRUE)+
labs(title = 'Horse power',
fill = 'Vs Motor',
x = 'HP',
y = 'conteo',
subtitle = 'A',
caption = 'B')+
ggtitle("Horse power")
I have the following data frame:
Quarter x y p q
1 2001 8.714392 8.714621 3.3648435 3.3140090
2 2002 8.671171 8.671064 0.9282508 0.9034387
3 2003 8.688478 8.697413 6.2295996 8.4379698
4 2004 8.685339 8.686349 3.7520135 3.5278024
My goal is to generate a facet plot with the x and y columns together in one facet and p and q together in another, instead of 4 separate facets.
If I do the following:
x.df.melt <- melt(x.df[,c('Quarter','x','y','p','q')],id.vars=1)
ggplot(x.df.melt, aes(Quarter, value, col=variable, group=1)) + geom_line()+
facet_grid(variable~., scale='free_y') +
scale_color_discrete(breaks=c('x','y','p','q'))
I get all four series in 4 different facets, but how do I combine x and y into one facet and p and q into another? Preferably with no legend.
One idea would be to create a new grouping variable:
x.df.melt$var <- ifelse(x.df.melt$variable == "x" | x.df.melt$variable == "y", "A", "B")
You can use it for facetting while using variable for grouping:
ggplot(x.df.melt, aes(Quarter, value, col=variable, group=variable)) + geom_line()+
facet_grid(var~., scales='free_y') +
scale_color_discrete(breaks=c('x','y','p','q'), guide = "none")  # guide = FALSE is deprecated in recent ggplot2
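When there are more than a couple of series, a named lookup vector may be easier to maintain than a chain of ifelse() conditions. The sketch below rebuilds the four rows from the question, reshapes them with tidyr::pivot_longer() (a stand-in for melt()), and uses a hypothetical lookup called facet_of to assign each series to a facet; the facet labels are also chosen here for illustration.

```r
library(ggplot2)
library(tidyr)

# The four rows shown in the question
x.df <- data.frame(
  Quarter = 2001:2004,
  x = c(8.714392, 8.671171, 8.688478, 8.685339),
  y = c(8.714621, 8.671064, 8.697413, 8.686349),
  p = c(3.3648435, 0.9282508, 6.2295996, 3.7520135),
  q = c(3.3140090, 0.9034387, 8.4379698, 3.5278024)
)

x.df.melt <- pivot_longer(x.df, cols = -Quarter,
                          names_to = "variable", values_to = "value")

# Hypothetical lookup: which facet each series belongs to
facet_of <- c(x = "x, y", y = "x, y", p = "p, q", q = "p, q")
x.df.melt$var <- unname(facet_of[x.df.melt$variable])

# Facet by the lookup result, keep one line per original series,
# and hide the legend as requested
p <- ggplot(x.df.melt,
            aes(Quarter, value, colour = variable, group = variable)) +
  geom_line() +
  facet_grid(var ~ ., scales = "free_y") +
  theme(legend.position = "none")
```

Adding a new series then only requires one more entry in facet_of.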
I think beetroot's answer above is more elegant but I was working on the same problem and arrived at the same place a different way. I think it is interesting because I used a "double melt" (yum!) to line up the x,y/p,q pairs. Also, it demonstrates tidyr::gather instead of melt.
library(tidyr)
library(dplyr)  # needed for group_by(), filter(), select(), rename() and the joins below
x.df<- data.frame(Year=2001:2004,
x=runif(4,8,9),y=runif(4,8,9),
p=runif(4,3,9),q=runif(4,3,9))
x.df.melt<-gather(x.df,"item","item_val",-Year,-p,-q) %>%
group_by(item,Year) %>%
gather("comparison","comp_val",-Year,-item,-item_val) %>%
filter((item=="x" & comparison=="p")|(item=="y" & comparison=="q"))
> x.df.melt
# A tibble: 8 x 5
# Groups: item, Year [8]
Year item item_val comparison comp_val
<int> <chr> <dbl> <chr> <dbl>
1 2001 x 8.400538 p 5.540549
2 2002 x 8.169680 p 5.750010
3 2003 x 8.065042 p 8.821890
4 2004 x 8.311194 p 7.714197
5 2001 y 8.449290 q 5.471225
6 2002 y 8.266304 q 7.014389
7 2003 y 8.146879 q 7.298253
8 2004 y 8.960238 q 5.342702
See below for the plotting statement.
One weakness of this approach (and of beetroot's use of ifelse) is that the filter statement quickly becomes unwieldy if you have a lot of pairs to compare. In my use case I was comparing mutual fund performance to a number of benchmark indices, and each fund has a different benchmark. I solved this with a table of metadata that pairs the fund tickers with their respective benchmarks, and then used left_join and right_join. In this case:
#create meta data
pair_data<-data.frame(item=c("x","y"),comparison=c("p","q"))
#create comparison name for each item name
x.df.melt2<-x.df %>% gather("item","item_val",-Year) %>%
left_join(pair_data)
#join comparison data alongside item data
x.df.melt2<-x.df.melt2 %>%
select(Year,item,item_val) %>%
rename(comparison=item,comp_val=item_val) %>%
right_join(x.df.melt2,by=c("Year","comparison")) %>%
na.omit() %>%
group_by(item,Year)
ggplot(x.df.melt2,aes(Year,item_val,color="item"))+geom_line()+
geom_line(aes(y=comp_val,color="comp"))+
guides(col = guide_legend(title = NULL))+
ylab("Value")+
facet_grid(~item)
Since there is no need for a new grouping variable, we preserve the names of the reference items as labels for the facet plot.