Changing the xlim of a numeric value causes an error in ggplot (R)

I have a grouped barplot produced using ggplot in R with the following code:
ggplot(mTogether, aes(x = USuniquNegR, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_discrete(name = "Area",
                      labels = c("Everywhere", "New York")) +
  xlab("Reasons") +
  ylab("Proportion of total complaints") +
  coord_flip() +
  ggtitle("Comparison between NY and all areas")
mTogether is created using the following code:
mTogether <- melt(together, id.vars = 'USuniquNegR')
The data frame together is made up of:
USperReasons USperReasonsNY USuniquNegR
1 0.198343304187759 0.191304347826087 Late Flight
2 0.35987114588127 0.321739130434783 Customer Service Issue
3 0.0667280257708237 0.11304347826087 Lost Luggage
4 0.0547630004601933 0.00869565217391304 Flight Booking Problems
5 0.109065807639208 0.121739130434783 Can't Tell
6 0.00460193281178095 0 Damaged Luggage
7 0.0846755637367694 0.0782608695652174 Cancelled Flight
8 0.0455591348366314 0.0521739130434783 Bad Flight
9 0.0225494707777266 0.0347826086956522 longlines
10 0.0538426138978371 0.0782608695652174 Flight Attendant Complaints
The data frame together can be generated with the following:
together<-data.frame(cbind(USperReasons,USperReasonsNY,USuniquNegR))
where
USperReasons <- c(0.19834,0.35987,.06672,0.05476,0.10906,.00460,.08467,0.04555,0.02254,0.05384)
USperReasonsNY <- c(0.191304348,0.321739130,0.113043478,0.008695652,0.121739130,0.000000000,0.078260870,0.05217391,0.034782609,0.078260870)
USuniquNegR <- c("Late Flight","Customer Service Issue","Lost Luggage","Flight Booking Problems","Can't Tell","Damaged Luggage","Cancelled Flight","Bad Flight","longlines","Flight Attendant Complaints")
The problem is that when I try to change the xlim of the plot using
+ xlim(0, 1)
I just seem to get an error:
Discrete value supplied to continuous scale
I can't understand why this happens, but I need to resolve it because currently the x axis starts below 0 and the labels are very tightly packed:
image of ggplot output

The problem is that you are cbind()ing your column vectors together. cbind() builds a matrix, and a matrix can hold only one type, so the numbers are converted to characters. Drop the cbind() and the rest should fix itself.
together<-data.frame(USperReasons,USperReasonsNY,USuniquNegR)
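The coercion is easy to check in base R. A minimal sketch using the first two rows of the question's data (the names m, bad and good are only for illustration):

```r
# Two of the question's vectors, shortened to two elements each
USperReasons <- c(0.19834, 0.35987)
USuniquNegR  <- c("Late Flight", "Customer Service Issue")

# cbind() returns a matrix; mixing numeric and character input
# forces every cell to character
m <- cbind(USperReasons, USuniquNegR)
typeof(m)

# Built from the matrix, the proportion column is no longer numeric
bad <- data.frame(m)
is.numeric(bad$USperReasons)

# Built directly from the vectors, each column keeps its own type
good <- data.frame(USperReasons, USuniquNegR)
is.numeric(good$USperReasons)
```

Calling str() on both data frames shows the same difference column by column.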

You need to remove the cbind() from
together<-data.frame(cbind(USperReasons,USperReasonsNY,USuniquNegR))
because str(together) shows that all three columns are factors (or character vectors in R 4.0.0 and later, where stringsAsFactors defaults to FALSE), so none of them is numeric.
With
together <- data.frame(USperReasons, USperReasonsNY, USuniquNegR)
the plot looks reasonable to me (without having to use ylim or xlim).
So the error was not in ggplot2 but in the data preparation.
Next time you ask a question, please provide a full working example that can be copied, pasted and run. Thank you.
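For completeness, here is a sketch of the whole pipeline with the corrected data frame. It assumes melt() comes from reshape2, which the question does not state. Note that coord_flip() only swaps how the axes are drawn: value still lives on the y scale, so if limits are wanted they go through ylim(0, 1). xlim(0, 1) would target the discrete USuniquNegR scale, which is another way to get "Discrete value supplied to continuous scale".

```r
library(ggplot2)
library(reshape2)  # assumption: the question's melt() is reshape2::melt

USperReasons   <- c(0.19834, 0.35987, 0.06672, 0.05476, 0.10906,
                    0.00460, 0.08467, 0.04555, 0.02254, 0.05384)
USperReasonsNY <- c(0.191304348, 0.321739130, 0.113043478, 0.008695652,
                    0.121739130, 0.000000000, 0.078260870, 0.05217391,
                    0.034782609, 0.078260870)
USuniquNegR    <- c("Late Flight", "Customer Service Issue", "Lost Luggage",
                    "Flight Booking Problems", "Can't Tell", "Damaged Luggage",
                    "Cancelled Flight", "Bad Flight", "longlines",
                    "Flight Attendant Complaints")

# No cbind(): the two proportion columns stay numeric
together  <- data.frame(USperReasons, USperReasonsNY, USuniquNegR)
mTogether <- melt(together, id.vars = "USuniquNegR")

p <- ggplot(mTogether, aes(x = USuniquNegR, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_discrete(name = "Area",
                      labels = c("Everywhere", "New York")) +
  xlab("Reasons") +
  ylab("Proportion of total complaints") +
  ylim(0, 1) +      # limits for the value axis, which is still "y"
  coord_flip() +
  ggtitle("Comparison between NY and all areas")
p
```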

Related

struggling with scaling a secondary axis on a plot that is not a percentage

I'm getting crazy here, please help me!
I'm new to R and this is why. I have a graph here in which I'm trying to plot steps given against time needed to fall asleep (in minutes) and I decided to plot user ID on the x axis and the other two variables in a vertical axis of its own.
The result is as follows:
I'm not happy with many things: the scaling of the line plot and of the secondary axis, the width of the columns in geom_col, and the axis labels. The user IDs have 10 digits each and show up as powers of ten (scientific notation).
Can you please help me out with everything I mentioned, especially the scaling of the secondary axis?
I've searched and searched and can't do it.
The code is this one:
ggplot(data = sleep_steps) +
  geom_col(mapping = aes(x = Id, y = AVGSteps), fill = 'cyan') +
  geom_line(mapping = aes(x = Id, y = AVGMinToFallAsleep)) +
  labs(title = "Relationship between Steps and Time to Fall Asleep") +
  scale_y_continuous(sec.axis = sec_axis(~ . - 8*60*60, name = "Minutes to Fall Asleep"))
And the table is like this:
head(sleep_steps)
Id AVGSteps AVGKcal AVGMinToFallAsleep AVGTotalMinAsleep
1 1503960366 12116.742 1816.419 22.92000 360.2800
2 1644430081 7282.967 2811.300 52.00000 294.0000
3 1844505072 2580.065 1573.484 309.00000 652.0000
4 1927972279 916.129 2172.806 20.80000 417.0000
5 2026352035 5566.871 1540.645 31.46429 506.1786
6 2347167796 9519.667 2043.444 44.53333 446.8000
I'm clueless. Since it is not a percentage nor is a datetime variable, I'm not sure what to do. I've tried to change the trans argument in sec_axis function but no success. The structure of the data frame is all num.
Thank you!
To start, you need Id as a factor, because the IDs identify individuals and are not actual numbers.
Insert this before the plot:
sleep_steps$Id <- as.factor(sleep_steps$Id)
Without the code for your data to check, I would also say that you need another fill colour for your second scale. You are also using geom_line, which is not normally how you would plot individuals, because they are not connected. You may need to reconsider that. Normally you would plot all your data with boxplots, which show the medians, the quartiles and so on.
If you are looking for an actual RELATIONSHIP, then you need to look into an lm (linear model) plot.
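As for the secondary-axis scaling itself, a common approach can be sketched under a few assumptions. ggplot2 draws every layer on the primary y scale, so the minutes are multiplied by a factor that brings them into the range of the steps, and sec_axis() divides by the same factor to label the right-hand axis in minutes. The data frame below re-types the rows from head(sleep_steps). The scale_factor and the 0.6 column width are arbitrary choices, and points replace the line because the IDs are unordered individuals.

```r
library(ggplot2)

# Rows copied from head(sleep_steps) in the question
sleep_steps <- data.frame(
  Id = c(1503960366, 1644430081, 1844505072,
         1927972279, 2026352035, 2347167796),
  AVGSteps = c(12116.742, 7282.967, 2580.065, 916.129, 5566.871, 9519.667),
  AVGMinToFallAsleep = c(22.92, 52, 309, 20.8, 31.46429, 44.53333)
)

# Factor IDs are printed in full instead of in scientific notation
sleep_steps$Id <- as.factor(sleep_steps$Id)

# Assumed scaling: make the largest minutes value as tall as the largest bar
scale_factor <- max(sleep_steps$AVGSteps) / max(sleep_steps$AVGMinToFallAsleep)

p <- ggplot(sleep_steps, aes(x = Id)) +
  geom_col(aes(y = AVGSteps), fill = "cyan", width = 0.6) +
  # minutes are drawn in "steps units" on the primary scale
  geom_point(aes(y = AVGMinToFallAsleep * scale_factor), size = 3) +
  scale_y_continuous(
    name = "Average steps",
    # the secondary axis undoes the scaling so its labels read in minutes
    sec.axis = sec_axis(~ . / scale_factor, name = "Minutes to Fall Asleep")
  ) +
  labs(title = "Relationship between Steps and Time to Fall Asleep") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p
```

The transformation passed to sec_axis() has to be the exact inverse of the one applied to the data, otherwise the right-hand labels no longer describe the points.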

How do I go about putting these lines separately on this graph?

I created a ggplot using this code:
ggplot(dataset1, aes(x = y, y = x)) + geom_smooth(span=0.2) + ylim(0,5) + xlim(0,23) + ylab("Count") +
  labs(x="Hours") +
  theme_classic()
I then wanted to add an additional 3 lines to this graph, so I tried this code:
ggplot(rbind(dataset1,dataset2,dataset3,dataset4), aes(x = y, y = x)) + geom_smooth(span=0.2) + ylim(0,5) + xlim(0,23) + ylab("count") +
labs(x="Hours") +
theme_classic()
However, the graph I got was nowhere near what I'm trying to do.
I also got a warning after running this code:
Warning message:
Removed 1 rows containing non-finite values (stat_smooth).
I know I'm going badly wrong with the second snippet and am probably missing part of it, but this isn't code I've used before, so I'm just trying my hand at it.
Thanks for any help!
If you need one line per dataset you should add some sort of category/grouping variable. By combining your data you just create one big dataset so ggplot has no way of knowing it should plot them separately.
dataset1$category <- 1
dataset2$category <- 2
...
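Now you can create your combined data and then add, for example, color = factor(category) to your aesthetics. Wrap it in factor() because a numeric category is treated as continuous and does not split the data into separate lines.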
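Put together, the idea looks roughly like this. The four data frames are hypothetical stand-ins generated by a helper (make_dataset is not from the question) and use the question's column names, with hours in y and counts in x. The span of 0.5 and the explicit method = "loess" are choices made for this small sample.

```r
library(ggplot2)

# Hypothetical stand-in data: one row per hour (column y) with a count (column x)
set.seed(42)
make_dataset <- function(offset) {
  hours <- 0:23
  data.frame(y = hours,
             x = 2.5 + offset + sin(hours / 4) + rnorm(24, sd = 0.1))
}
dataset1 <- make_dataset(-0.75)
dataset2 <- make_dataset(-0.25)
dataset3 <- make_dataset(0.25)
dataset4 <- make_dataset(0.75)

# Tag each data frame before combining so the origin of every row survives rbind()
dataset1$category <- 1
dataset2$category <- 2
dataset3$category <- 3
dataset4$category <- 4
combined <- rbind(dataset1, dataset2, dataset3, dataset4)

# factor() makes the category discrete, which gives one smoothed line per dataset
p <- ggplot(combined, aes(x = y, y = x, color = factor(category))) +
  geom_smooth(method = "loess", span = 0.5) +
  ylim(0, 5) + xlim(0, 23) +
  labs(x = "Hours", y = "Count", color = "Dataset") +
  theme_classic()
p
```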

compare boxplots with a single value

I want to compare the distribution of several variables (here X1 and X2) with a single value (here bm). The issue is that these variables are too many (about a dozen) to use a single boxplot.
Additionally, the levels are too different to use one plot, so I need facets to keep things organised:
However with this plot my benchmark category (bm), which is a single value in X1 and X2, does not appear in X1 and seems to have several values in X2. I want it to be only this green line, which it is in the first plot. Any ideas why it changes? Is there any good workaround? I tried the options of facet_wrap/facet_grid, but nothing there delivered the right result.
I also tried combining a bar plot with bm and three empty categories with the boxplot. But firstly it looked terrible and secondly it got similarly screwed up in the facetting. Basically any work around would help.
Below is the code to create the minimal example displayed here:
# Creating some sample data & loading libraries
library(ggplot2)
library(RColorBrewer)
set.seed(10111)
x=matrix(rnorm(40),20,2)
y=rep(c(-1,1),c(10,10))
x[y==1,]=x[y==1,]+1
x[,2]=x[,2]+20
df=data.frame(x,y)
# creating a benchmark point
benchmark=data.frame(y=rep("bm",2),key=c("X1","X2"),value=c(-0.216936,20.526312))
# melting the data frame, rbinding it with the benchmark
test_dat=rbind(tidyr::gather(df,key,value,-y),benchmark)
# Creating a plot
p_box <- ggplot(data = test_dat, aes(x=key, y=value,color=as.factor(test_dat$y))) +
geom_boxplot() + scale_color_manual(name="Cluster",values=brewer.pal(8,"Set1"))
# The first line delivers the first plot, the second line the second plot
p_box
p_box + facet_wrap(~key,scales = "free",drop = FALSE) + theme(legend.position = "bottom")
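The problem lies only in the use of test_dat$y inside the color aes. Never use $ in aes(): it bypasses the data that ggplot is working with, so with facets the values can end up matched to the wrong rows. Use the bare column name (color = as.factor(y)) instead.
Anyway, I think your plot would improve if you used a geom_hline for the benchmark instead of hacking in a single-value boxplot: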
library(ggplot2)
library(RColorBrewer)
ggplot(tidyr::gather(df,key,value,-y)) +
  geom_boxplot(aes(x=key, y=value, color=as.factor(y))) +
  # linewidth replaces size for lines in ggplot2 >= 3.4.0; use size = 1 on older versions
  geom_hline(data = benchmark, aes(yintercept = value), color = '#4DAF4A', linewidth = 1) +
  scale_color_manual(name="Cluster",values=brewer.pal(8,"Set1")) +
  facet_wrap(~key,scales = "free",drop = FALSE) +
  theme(legend.position = "bottom")

How to make an interactive graph in R-studio

The data has 4 columns and roughly 600 rows. The data is twitter data collected using the twitteR package, and then summarized into a data frame. The summary is based on how many words from these libraries each tweet has, the tweets are given a score and then the summary is the number of tweets which get specific scores. So the columns are the two types of scores, the dates, and then the number of tweets with those scores.
Score1 Score2 Date Number
0 0 01/10/2015 50
0 1 01/10/2015 34
1 0 01/10/2015 10
...and so on
With dates and data that extend over a month, and the scores either way can go +/- 10 or so.
I'm trying to plot that kind of data using a bubble plot, with score1 on the x axis and score2 on the y axis, and the size of each bubble dependent on the number (how many tweets with those scores there were per day).
My problem is that I only know how to use ggplot.
g <- ggplot(
twitterdata,
aes(x=score1, y=score2, size=number, label=""), guide=FALSE) +
geom_point(colour="black", fill="red", shape=21) +
scale_size_area(max_size = 30) +
scale_x_continuous(name="score1", limits=c(0, 10)) +
scale_y_continuous(name="score2", limits=c(-10, 10)) +
geom_text(size=4) +
theme_bw()
and that just gives me the plot for all dates, and what I need is a good way to see how that data changes over time. I've looked into using sliders and selectors but I really have no idea what would be the best tool to use. I've tried subsetting the data based on date, which works nicely but ideally I could make some kind of interactive graph.
I really need some way to select certain days out of that data to plot so it doesn't all pile up on itself, and to do it interactively so it can be presented.
Any help would be greatly appreciated, thank you.
It sounds like this won't completely satisfy your use case, but an extremely low-overhead way to add some interactivity to your plot would be to install.packages('plotly') and add the following lines to your code:
# your original code
g <- ggplot(
  twitterdata,
  aes(x=score1, y=score2, size=number, label=""),
  guide=FALSE)+
  geom_point(colour="black", fill="red", shape=21) +
  scale_size_area(max_size = 30) +
  scale_x_continuous(name="score1", limits=c(0,10)) +
  scale_y_continuous(name="score2", limits=c(-10,10)) +
  geom_text(size=4) +
  theme_bw()
# add these lines
library(plotly)
gg <- ggplotly(g)
Details and demos: https://plot.ly/ggplot2/
As Eric suggested, if you want sliders and such you should check out shiny. Here's a demo combining shiny with plotly: https://plot.ly/r/shiny-tutorial/
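If a full shiny app is more than you need, one option that is not part of the suggestions above is plotly's frame aesthetic: mapping the date to frame makes ggplotly() add a slider and play button that step through the days. A minimal sketch, assuming lower-case column names as in the question's code. The first three rows come from the question's table, while the second day is invented for illustration.

```r
library(ggplot2)
library(plotly)

# First day taken from the question's table; the second day is made-up sample data
twitterdata <- data.frame(
  score1 = c(0, 0, 1, 0, 1, 2),
  score2 = c(0, 1, 0, 0, 1, -1),
  date   = rep(c("2015-10-01", "2015-10-02"), each = 3),
  number = c(50, 34, 10, 42, 27, 15)
)

g <- ggplot(twitterdata, aes(x = score1, y = score2, size = number)) +
  # ggplot2 warns that it ignores the unknown aesthetic "frame";
  # ggplotly() picks it up and builds one animation frame per date
  geom_point(aes(frame = date), colour = "black", fill = "red", shape = 21) +
  scale_size_area(max_size = 30) +
  scale_x_continuous(name = "score1", limits = c(0, 10)) +
  scale_y_continuous(name = "score2", limits = c(-10, 10)) +
  theme_bw()

gg <- ggplotly(g)
gg
```

The slider only steps through dates one at a time. Selecting arbitrary date ranges would still need shiny inputs.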

What is happening with my geom_line() in ggplot2?

I am no expert in R, but I have used ggplot2 many times and never had any problems. Still, this time I am not able to plot lines in my graph and I have no idea why (it should be something really simple though).
For instance for:
def.percent period
1 5.0657339 1984-1985
2 3.9164528 1985-1986
3 -1.756613 1986-1987
4 2.8184863 1987-1988
5 -2.606311 1988-1989
I have the code:
ggplot(plot.tab, aes(x=period, y=def.percent)) + geom_line() + geom_point() + ggtitle("Deforestation rates within Properties")
But when I run it, it just plots the points without a line. It also gives me this message:
geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
It's not really an error, but I cannot figure out how to plot the lines. Any ideas?
Your x axis (period) is a factor rather than numeric, so ggplot2 treats each level as its own group and doesn't connect the points. You can fix this by setting group = 1 in the aesthetics, which tells ggplot2 to group them all together into a single line:
ggplot(plot.tab, aes(x = period, y = def.percent, group = 1)) +
geom_line() +
geom_point() +
ggtitle("Deforestation rates within Properties")
