I am starting to learn interactive data viz and basic data analysis with R (mainly plotly).
I am having an issue when using the dplyr function filter() before plotting with plotly in R.
Here is an example using the gapminder dataset:
library(gapminder)
library(dplyr)   # provides filter() and %>%
library(plotly)
# filter by year and continent
gapminder_2002_asia <- gapminder %>%
  filter(year == 2002 & continent == "Asia")
# plot GDP per capita as a bar chart using plotly
gapminder_2002_asia %>%
  plot_ly() %>%
  add_bars(x = ~country, y = ~gdpPercap, color = ~country)
This is the result: all the countries present in the initial dataset appear on the x axis:
plotly graph as image
On the other hand, if I just make a static graph with ggplot, only the Asian countries appear on the x axis:
gapminder_2002_asia %>%
ggplot(aes(country, gdpPercap, fill = country)) +
geom_col()
ggplot graph
I really do not understand how this is happening, as both plots come from the same data frame.
Very odd.
As an alternative while you debug that code, why not try using ggplotly()?
E.g.:
p <- gapminder_2002_asia %>%
ggplot(aes(country, gdpPercap, fill = country)) +
geom_col()
plotly::ggplotly(p)
I'd be curious which version of the plot came out the far end!
The reason is that plotly uses all the levels of the country factor, while ggplot2 only uses the values actually present in your dataset. So, to get the same results in both, you can use this:
library(plotly)
library(ggplot2)
#Plotly
gapminder_2002_asia %>%
plot_ly() %>%
add_bars(x= ~country, y = ~gdpPercap, color = ~country)
Output:
And with ggplot2:
#ggplot2
gapminder_2002_asia %>%
ggplot(aes(country, gdpPercap, fill = country)) +
geom_col()+
scale_x_discrete(limits=levels(gapminder_2002_asia$country))+
theme(axis.text.x = element_text(angle=90))
Output:
Update: In order to get the same output in plotly you could use something like this, which will be similar to your ggplot2 initial code for plotting:
#Plotly 2
gapminder_2002_asia %>%
mutate(country=as.character(country)) %>%
plot_ly() %>%
add_bars(x= ~country, y = ~gdpPercap, color = ~country)
Output:
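The key thing to tackle is the factors in your dataset: filter() keeps every level of country, including the ones that no longer occur in the filtered rows.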
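To see this in isolation, here is a minimal sketch using a small made-up data frame (not gapminder; the names df, asia and asia_dropped are only for illustration). Subsetting keeps all of the original factor levels, and base R's droplevels() is one way to discard the unused ones:

```r
# Hypothetical toy data standing in for gapminder
df <- data.frame(
  country   = factor(c("Japan", "France", "Brazil", "China")),
  continent = c("Asia", "Europe", "Americas", "Asia")
)

# Keep only the Asian rows; the factor still remembers all four countries
asia <- df[df$continent == "Asia", ]
nlevels(asia$country)

# droplevels() removes levels that no longer occur in the data
asia_dropped <- droplevels(asia)
levels(asia_dropped$country)
```

Passing a data frame like asia_dropped to plot_ly() should then put only the remaining countries on the x axis, much like the as.character() approach above.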
Another option is fct_drop() from forcats (many thanks and credit to @user2554330):
library(forcats)
#Plotly 2
gapminder_2002_asia %>%
mutate(country=fct_drop(country)) %>%
plot_ly() %>%
add_bars(x= ~country, y = ~gdpPercap, color = ~country)
Output:
Related
I have a data set where I want to plot many lines in a single plot. The lines represent events that are ordered, and I would like to use the color scale to represent that order. If I do this, I get
library(purrr)
library(dplyr)   # mutate(), row_number(), union_all()
library(tibble)  # tibble()
library(ggplot2)
set.seed(100)
c(1:10) %>% set_names(seq_along(.)) %>%
map(~rnorm(50, 0, 1)) %>% map(cumsum) %>%
imap(~tibble(y=.x, color=as.integer(.y))) %>%
map(mutate, x=row_number()) %>%
reduce(union_all) %>%
ggplot(aes(x=x, y=y, color=color))+ geom_line()
I can solve the issue of the incorrect line by making color a factor
set.seed(100)
c(1:10) %>% set_names(seq_along(.)) %>%
map(~rnorm(50, 0, 1)) %>% map(cumsum) %>%
imap(~tibble(y=.x, color=as.factor(.y))) %>% #this is the only changed line
map(mutate, x=row_number()) %>%
reduce(union_all) %>%
ggplot(aes(x=x, y=y, color=color))+ geom_line()
to get the correct line plot, but now my color scale is discrete and so is the legend. I would like the legend to look like the first example and the plot like the second example. Furthermore, in the actual data the events are not uniformly spaced, so the behavior of the continuous color scale is important because the color conveys distance. I tried group=color but that doesn't work. What aesthetic am I missing here that would help me achieve the desired outcome?
set.seed(100)
c(1:10) %>% set_names(seq_along(.)) %>%
map(~rnorm(50, 0, 1)) %>% map(cumsum) %>%
imap(~tibble(y=.x, color=as.integer(.y))) %>%
map(mutate, x=row_number()) %>%
reduce(union_all) %>%
ggplot(aes(x=x, y=y, group=color, color=color))+ geom_line()
I am working with plotly in R. I am trying to plot charts with the same colors. Below you can see the data and charts.
library(plotly)
library(reshape2)
library(dplyr)
df<-data.frame(city=c("NYC","Seattle","Boston","LA","Seattle"),
value=c(100,200,300,400,500))
df <-melt(df)
Now I am plotting pie chart with colors shown below:
fig<-df %>%
plot_ly(labels = ~city, values = ~value)
fig <- fig %>% add_pie(hole = 0.6)
fig
Finally, I want to plot a bar chart with the same colors as the pie chart shown above. To do this, I tried these commands:
df <-melt(df)
fig <- plot_ly(df, x = ~city, y = ~value, type = 'bar')
fig
So can anybody help me with how to plot a bar chart with the same colors as the pie chart?
Here's a somewhat hacky but effective solution:
fig <- ggplot(df, aes(city, value, fill = city)) +
geom_col() +
scale_fill_manual(values = c("#2ca02c", "#ff7f0e",
"#d62728", "#1f77b4")) +
theme_minimal() +
theme(panel.grid.major.x = element_blank())
ggplotly(fig)
You may use the ggplot2 package for both charts, and it should give each city the same colour in both:
#install.packages('ggplot2')
library(ggplot2)
library(dplyr)
library(reshape2)  # for melt()
#your data frame
df<-data.frame(city=c("NYC","Seattle","Boston","LA","Seattle"),
value=c(100,200,300,400,500))
df <-melt(df)
PieChart <- df %>% ggplot(aes(x="", y=value, fill=city)) +
geom_bar(stat = "identity", width=1) +
coord_polar("y",start=0) +
theme_void()
PieChart
The resulting plot:
BarChart <- df %>% ggplot(aes(x=city, y=value, fill=city)) +
geom_bar(stat = "identity") +
theme_void() +
xlab("City") + ylab("Value")
BarChart
The resulting plot:
You may find this helpful.
In addition, here's what you can do if you want to completely use plotly.
#install.packages('RColorBrewer')
library(RColorBrewer)
library(reshape2)  # for melt()
library(dplyr)
library(plotly)
df<-data.frame(city=c("NYC","Seattle","Boston","LA","Seattle"),
value=c(100,200,300,400,500))
df <-melt(df)
df
#define a vector with colour palette of four colours
cols <- brewer.pal(12, "Set3")[1:4]
cols
Bar Chart:
BarChart <- df %>% plot_ly(x = ~city, y = ~value, showlegend = TRUE,
type = 'bar', color = ~city, colors = cols)
BarChart
The resulting plot:
Pie Chart:
PieChart <- df %>% plot_ly(labels = ~city, values = ~value,
marker = list(colors = cols), showlegend = TRUE) %>%
add_pie(hole = 0.6)
PieChart
The resulting plot:
Note that this can still give the cities different colours in the two charts. Besides the linked answer above, another suggestion is to build a new data frame with one row per city (no repeats), the summed values for any repeated city, and an extra column holding a unique colour for each city.
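A rough sketch of that last suggestion, under a few assumptions: the names city_totals and colour are made up for illustration, the four hex codes are simply the ones used in the ggplot answer above, and the plotly calls are only run if plotly is installed. The idea is to collapse the repeated Seattle rows first and then hand the same per-city colour vector to both traces:

```r
df <- data.frame(city  = c("NYC", "Seattle", "Boston", "LA", "Seattle"),
                 value = c(100, 200, 300, 400, 500))

# One row per city, with the values of repeated cities summed
city_totals <- aggregate(value ~ city, data = df, FUN = sum)

# One fixed colour per city (any four distinct colours would do)
city_totals$colour <- c("#2ca02c", "#ff7f0e", "#d62728", "#1f77b4")

if (requireNamespace("plotly", quietly = TRUE)) {
  # Pie/donut chart: colours are given per slice, in row order
  PieChart <- plotly::plot_ly(city_totals, labels = ~city, values = ~value,
                              type = "pie", hole = 0.6,
                              marker = list(colors = city_totals$colour))
  # Bar chart: the same colour vector, given per bar
  BarChart <- plotly::plot_ly(city_totals, x = ~city, y = ~value,
                              type = "bar",
                              marker = list(color = city_totals$colour))
}
```

Because both charts read their colours from the same column, a city should keep its colour even if the rows are reordered later.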
I have a boxplot generated using the following code, and after checking the dataset all the values are correct here.
myplot <- inDATA %>% filter(PARAMCD=="param1") %>%
ggplot(aes(x=ACTARMCD,y=AVAL,fill=ACTARMCD))+
geom_boxplot()+
stat_summary(fun=mean,na.rm=TRUE,shape=25,col='black',geom='point') # fun.y was renamed to fun in ggplot2 3.3.0
I want to generate a second boxplot where I split the x variable into different groups by applying a different variable as a fill. I use the following code, but the values present in the graph are incorrect.
myplot <- inDATA %>% filter(PARAMCD=="param1") %>%
group_by(ACTARMCD, RESPFL) %>%
ggplot(aes(x=ACTARMCD,y=AVAL))+
geom_boxplot(aes(fill=RESPFL))
However when I generate a bargraph using this code, the numbers are correct.
myplot <- inDATA %>%
filter(PARAMCD=="param1") %>%
group_by(ACTARMCD,RESPFL) %>%
dplyr::mutate(AVAL = mean(AVAL, na.rm=TRUE)) %>%
ggplot(aes(x=ACTARMCD,y=AVAL,fill=RESPFL))+
geom_bar(stat="identity",position="dodge")
Can anyone please help me understand what I am doing incorrectly with the second boxplot?
I ended up solving the issue by using plotly instead of ggplot. The code that worked is:
myplot <- inDATA %>% filter(PARAMCD=="param1") %>%
plot_ly(x = ~ACTARMCD, y = ~AVAL, color = ~RESPFL, type = "box",boxmean=TRUE) %>% layout(boxmode = "group")
I am new to R.
I would like to plot the following using ggplot2's geom_bar():
top_r_cuisine <- r_cuisine %>%
group_by(Rcuisine) %>%
summarise(count = n()) %>%
arrange(desc(count)) %>%
top_n(10)
But when I try to plot this result by:
ggplot(top_r_cuisine, aes(x = Rcuisine)) +
geom_bar()
I get this:
which doesn't represent the values in top_r_cuisine. Why?
EDIT:
I have tried:
c_count=c(23,45,67,43,54)
country=c("america","india","germany","france","italy")
# sample Data frame #
finaldata = data.frame(country,c_count)
ggplot(finaldata, aes(x=country)) +
geom_bar(aes(weight = c_count))
You need to pass the counts as weights inside geom_bar(), as aes(weight = c_count) does above.
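The underlying reason is that geom_bar() counts rows by default, and a summarised table has exactly one row per category, so every bar ends up with height 1. Here is a small sketch with made-up numbers (the cuisine names and counts are hypothetical stand-ins for top_r_cuisine). It shows the weight approach and, as an alternative that is not from the question, geom_col(), which takes the bar height directly from y:

```r
library(ggplot2)

# Hypothetical stand-in for the already-summarised top_r_cuisine table
top_r_cuisine <- data.frame(
  Rcuisine = c("cuisine_a", "cuisine_b", "cuisine_c"),
  count    = c(239, 140, 102)
)

# Default geom_bar(): counts rows, so each bar has height 1
p_default <- ggplot(top_r_cuisine, aes(x = Rcuisine)) +
  geom_bar()

# Weighted geom_bar(): each row contributes its count to the bar
p_weight <- ggplot(top_r_cuisine, aes(x = Rcuisine)) +
  geom_bar(aes(weight = count))

# geom_col(): bar height is read straight from the y aesthetic
p_col <- ggplot(top_r_cuisine, aes(x = Rcuisine, y = count)) +
  geom_col()

p_col
```

Both p_weight and p_col should draw the same bars; geom_col() is arguably the more direct choice when the counts are already computed.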
The ggplot analysis below is intended to show the number of survey responses by date. I'd like to color the bars by the three survey administrations (the Admini variable). While no errors are thrown, the bars are not colored.
Can anyone point out how/why my bars are not color-coded? THANKS!
library(ggplot2)
library(dplyr)
library(RCurl)
OSTadminDates2<-getURL("https://raw.githubusercontent.com/bac3917/Cauldron/master/OSTadminDates.csv")
OSTadminDates<-read.csv(text=OSTadminDates2)
ndate1<-as.Date(OSTadminDates$Date,"%m/%d/%y");ndate1
SurvAdmin<-as.factor(OSTadminDates$Admini)
R<-ggplot(data=OSTadminDates,aes(x=ndate1),fill=Admini,group=1) +
geom_bar(stat = "count",width = .5 )
R
Here's a work-around you could use:
library(ggplot2)
library(dplyr)
library(RCurl)
OSTadminDates2<-getURL("https://raw.githubusercontent.com/bac3917/Cauldron/master/OSTadminDates.csv")
OSTadminDates<-read.csv(text=OSTadminDates2)
OSTadminDates$Date<-as.Date(OSTadminDates$Date,"%m/%d/%y")
OSTadminDates$Admini <- factor(OSTadminDates$Admini)
df <- OSTadminDates %>%
group_by(Date, Admini) %>%
summarise(n = n())
ggplot(data = df) +
geom_bar(aes(x = Date, y = n, fill = Admini), stat = "identity")
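As for why the original call produced uncolored bars: there, fill=Admini sits outside aes(), as a plain argument to ggplot(), so it is never mapped to the data and is silently ignored. A minimal sketch with a made-up data frame (toy and its values are hypothetical, not the survey data) that lets geom_bar() do the counting while fill is mapped inside aes():

```r
library(ggplot2)

# Hypothetical toy data: one row per survey response
toy <- data.frame(
  Date   = as.Date(c("2016-01-04", "2016-01-04", "2016-01-05", "2016-02-01")),
  Admini = factor(c(1, 1, 1, 2))
)

# fill is inside aes(), so it is mapped to the Admini column
p <- ggplot(toy, aes(x = Date, fill = Admini)) +
  geom_bar(width = 0.5)

p
```

With the mapping in place, no pre-summarising step should be needed for this kind of plot, although the group_by()/summarise() version above works as well.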