ggplot not reading dataframe the way I expected - r

Recently I've been trying to plot some data using ggplot that's defined like so. (Essentially assigning a different x value to two different data sets and using the y axis to display the points)
xcol = c(rep(2, length(allTTRs(teamset))))
ycol = c(allTTRs(teamset))
xcol2 = c(rep(1, length(allTTRs(oldteamset))))
ycol2 = c(allTTRs(oldteamset))
masterY = append(ycol, ycol2)
masterX = append(xcol, xcol2)
mat = cbind(masterX, masterY)
df = as.data.frame(mat)
show(df)
The show() call outputs this
masterX masterY
1 2 10.998817
2 2 10.999933
3 2 37.001567
4 2 15.016150
5 1 2.000817
6 1 5.000150
7 1 13.995800
8 1 11.001933
9 1 24.987017
10 1 0.999850
11 1 2.998750
Next I plot this data like so
p <- ggplot(data = df, mapping = aes(x = masterX, y = masterY)) +
geom_dotplot(inherit.aes = TRUE, binwidth = 0.005, data = df, y = masterY, show.legend=TRUE) +
stat_summary(fun.data = mean_sdl, color = "red")
When I run this, something strange happens. It seems the stat_summary() plots perfectly, but for some reason the geom_dotplot() call transposes the x values, such that the graph looks like this
It occurred to me this may be because I specify a 'y' argument in geom_dotplot but no 'x' argument, so I tried including 'x=masterX' in its arguments, but when I do that I get this error.
Error: stat_bindot requires the following missing aesthetics: x
Strangely, when I delete the 'y' argument from the function, I get a similar error for 'y' for the opposite reason. I.e.
Error: geom_dotplot requires the following missing aesthetics: y
Ultimately, I've already fixed this problem by changing masterY/X definitions like so
masterY = append(ycol2, ycol)
masterX = append(xcol2, xcol)
But this is rather unsatisfying to me, since I know it's still not using the x values as tuples, and is instead simply plotting based on the order of the dataframe, and I'd like to learn how to deal with intermixed data for the future. Ultimately, I get the feeling I'm misusing a function or doing something very non-idiomatically, but I'm not sure what.
Could anyone explain why this is happening and/or how I could use ggplot to graph data that might look more like so?
masterX masterY
1 2 10.998817
2 2 10.999933
3 2 37.001567
4 1 2.000817
5 2 15.016150
6 1 5.000150
7 1 13.995800
8 1 11.001933
9 1 24.987017
10 1 0.999850
11 1 2.998750

I think this will get you what you want:
ggplot(df, aes(x = masterX, y = masterY)) +
geom_point() +
stat_summary(fun.data = mean_sdl, color = "red")
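If a dot plot is still wanted rather than plain points, the following is a minimal sketch and not the original answerer's code. It assumes the cause is that geom_dotplot() bins along the x axis by default, and that passing y = masterY outside aes() likely picks up the free-standing vector rather than the data frame column, which would explain why row order mattered. Here the grouping variable is turned into a factor and binaxis = "y" is set; the binwidth of 0.5 and the use of the mean as the summary are illustrative choices only.

```r
library(ggplot2)

# Intermixed rows, copied from the second table in the question
df <- data.frame(
  masterX = c(2, 2, 2, 1, 2, 1, 1, 1, 1, 1, 1),
  masterY = c(10.998817, 10.999933, 37.001567, 2.000817, 15.016150,
              5.000150, 13.995800, 11.001933, 24.987017, 0.999850, 2.998750)
)

# Treat masterX as a category and bin the dots along y instead of x
p <- ggplot(df, aes(x = factor(masterX), y = masterY)) +
  geom_dotplot(binaxis = "y", stackdir = "center", binwidth = 0.5) +
  # fun = mean is an illustrative summary; the question used mean_sdl
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3) +
  labs(x = "masterX")

p
```

Because both aesthetics are mapped inside aes(), each dot is placed from its own row, so the order of rows in df should no longer matter.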


ggplot2 alternatives to fill in barplots, occurence of factor in multiple rows

I'm pretty new to R and I have a problem with plotting a barplot out of my data which looks like this:
condition answer
2 H
1 H
8 H
5 W
4 M
7 H
9 H
10 H
6 H
3 W
The data consists of 100 rows with the conditions 1 to 10, each randomly generated 10 times (10 times condition 1, 10 times condition 8,...). Each of the conditions also has a answer which could be H for Hit, M for Miss or W for wrong.
I want to plot the number of Hits for each condition in a barplot (for example 8 Hits out of 10 for condition 1,...) for that I tried to do the following in ggplot2
ggplot(data=test, aes(x=test$condition, fill=answer=="H"))+
geom_bar()+labs(x="Conditions", y="Hitrate")+
coord_cartesian(xlim = c(1:10), ylim = c(0:10))+
scale_x_continuous(breaks=seq(1,10,1))
And it looked like this:
This is actually exactly what I need, except for the red color which covers everything. You can see that conditions 3 to 5 have no blue bar, because there are no hits for these conditions.
Is there any way to get rid of the red color, and maybe to count the number of hits for the different conditions? I tried the count function of dplyr, but it only showed me the number of H when there were some for that particular condition. Conditions 3-5 were just "ignored" by count; there wasn't even a 0 in the output, but I'd still need those numbers for the plot.
I'm sorry for this particularly long post, but I'm really at the end of my knowledge on this. I'd be open to suggestions or alternatives! Thanks in advance!
This is a situation where a little preprocessing goes a long way. I made sample data that would recreate the issue, i.e. has cases where there won't be any "H"s.
Instead of relying on ggplot to aggregate data in the way you want it, use proper tools. Since you mention dplyr::count, I use dplyr functions.
The preprocessing task is to count observations with answer "H", including cases where the count is 0. To make sure all combinations are retained, convert condition to a factor and set .drop = F in count, which is in turn passed to group_by.
library(dplyr)
library(ggplot2)
set.seed(529)
test <- data.frame(condition = rep(1:10, times = 10),
answer = c(sample(c("H", "M", "W"), 50, replace = T),
sample(c("M", "W"), 50, replace = T)))
hit_counts <- test %>%
mutate(condition = as.factor(condition)) %>%
filter(answer == "H") %>%
count(condition, .drop = F)
hit_counts
#> # A tibble: 10 x 2
#> condition n
#> <fct> <int>
#> 1 1 0
#> 2 2 1
#> 3 3 4
#> 4 4 2
#> 5 5 3
#> 6 6 0
#> 7 7 3
#> 8 8 2
#> 9 9 1
#> 10 10 1
Then just plot that. geom_col is the version of geom_bar for where you have your y-values already, instead of having ggplot tally them up for you.
ggplot(hit_counts, aes(x = condition, y = n)) +
geom_col()
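The same zero-preserving count can be sketched without dplyr. This is only an illustration, assuming the ten sample rows shown in the question and that the conditions run from 1 to 10: subsetting a factor keeps all of its levels, so table() reports a 0 for conditions that have no hits.

```r
# The ten sample rows shown in the question
test <- data.frame(
  condition = c(2, 1, 8, 5, 4, 7, 9, 10, 6, 3),
  answer    = c("H", "H", "H", "W", "M", "H", "H", "H", "H", "W")
)

# A factor with explicit levels remembers conditions that have no hits
cond <- factor(test$condition, levels = 1:10)
hits <- table(cond[test$answer == "H"])

hit_counts <- data.frame(
  condition = factor(names(hits), levels = levels(cond)),
  n         = as.integer(hits)
)
hit_counts
```

The resulting hit_counts has one row per condition and can be passed to geom_col() in the same way as the dplyr version.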
One option is to just filter out anything but where answer == "H" from your dataset, and then plot.
An alternative is to use a grouped bar plot, made by setting position = "dodge":
test <- data.frame(condition = rep(1:10, each = 10),
answer = sample(c('H', 'M', 'W'), 100, replace = T))
ggplot(data=test) +
geom_bar(aes(x = condition, fill = answer), position = "dodge") +
labs(x="Conditions", y="Hitrate") +
coord_cartesian(xlim = c(0.5, 10.5), ylim = c(0, 10)) + # each limit is a length-2 range
scale_x_continuous(breaks=seq(1,10,1))
Also note that if the condition is actually a categorical variable, it may be better to make it a factor:
test$condition <- as.factor(test$condition)
This means that you don't need the scale_x_continuous call, and that the grid lines will be cleaner.
Another option is to pick your fill colors explicitly and make FALSE transparent by using scale_fill_manual. Since FALSE comes alphabetically first, the first value to specify is FALSE, the second TRUE.
ggplot(data=test, aes(x=condition, fill=answer=="H"))+
geom_bar()+labs(x="Conditions", y="Hitrate")+
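coord_cartesian(xlim = c(0.5, 10.5), ylim = c(0, 10))+ # each limit is a length-2 range
scale_x_continuous(breaks=seq(1,10,1)) +
scale_fill_manual(values = c(alpha("red", 0), "cadetblue")) +
guides(fill = "none") # hides the legend; guides(fill = F) is deprecated in recent ggplot2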

Trying to make a bar chart with each categorical column as a different color

I found a cool Wes Anderson palette package but I am failing here in actually using it. The variable I am looking at (Q1) has options 1 and 2. There is an NA in the set which is getting plotted however I would like to remove it as well.
library(readxl)
library(tidyverse)
library(wesanderson)
RA_Survey <- read_excel("file extension")
ggplot(data = RA_Survey, mapping = aes(x = Q1)) +
geom_bar() + scale_fill_manual(values=wes_palette(n=2, name="GrandBudapest"))
The plot I'm getting is working but without the color. Any ideas?
There are several issues which need to be addressed.
Using the Wes Anderson palette
As already mentioned by Mako, the fill aesthetic was missing from the call to aes().
Furthermore, the OP reports an error message saying Palette not found. The wesanderson package contains a list of available palettes:
names(wesanderson::wes_palettes)
[1] "BottleRocket1" "BottleRocket2" "Rushmore1" "Rushmore" "Royal1" "Royal2" "Zissou1"
[8] "Darjeeling1" "Darjeeling2" "Chevalier1" "FantasticFox1" "Moonrise1" "Moonrise2" "Moonrise3"
[15] "Cavalcanti1" "GrandBudapest1" "GrandBudapest2" "IsleofDogs1" "IsleofDogs2"
There is no palette called "GrandBudapest" as requested in OP's code. Instead, we have to choose between "GrandBudapest1" and "GrandBudapest2".
Also, the help file help("wes_palette") lists the available palettes.
Here is a working example which uses the dummy data created in the Data section below:
library(ggplot2)
library(wesanderson)
ggplot(RA_Survey, aes(x = Q1, fill = Q1)) +
geom_bar() +
scale_fill_manual(values=wes_palette(n=2, name="GrandBudapest1"))
Removing NA
The OP has asked to remove the NAs from the set. There are two options:
Tell ggplot() to remove the NAs.
Remove the NAs from the data by filtering.
We can tell ggplot() to remove NAs when plotting the x axis:
library(ggplot2)
library(wesanderson)
ggplot(RA_Survey, aes(x = Q1, fill = Q1)) +
geom_bar() +
scale_fill_manual(values=wes_palette(n=2, name="GrandBudapest1")) +
scale_x_discrete(na.translate = FALSE)
Note, this produces a warning message Removed 3 rows containing non-finite values (stat_count). To get rid of the message, we can use geom_bar(na.rm = TRUE).
The other option removes the NAs from the data by filtering
library(dplyr)
library(ggplot2)
library(wesanderson)
ggplot(RA_Survey %>% filter(!is.na(Q1)), aes(x = Q1, fill = Q1)) +
geom_bar() +
scale_fill_manual(values=wes_palette(n=2, name="GrandBudapest1"))
which creates exactly the same chart.
Data
As the OP has not provided a sample dataset, we need to create our own:
library(dplyr)
set.seed(123L)
RA_Survey <- tibble(Q1 = sample(c("1", "2", NA), 20, TRUE, c(3, 6, 1))) # tibble() replaces the deprecated data_frame()
RA_Survey
# A tibble: 20 x 1
Q1
<chr>
1 2
2 1
3 2
4 1
5 NA
6 2
7 2
8 1
9 2
10 2
11 NA
12 2
13 1
14 2
15 2
16 1
17 2
18 2
19 2
20 NA

building a area plot in R

I'm trying to build a graph that plots relative abundance against depth variation.
I have the following table
test X1m X2m X3m X4m X5m X6m X7m
1 Example1 1 10 10 1 1 5 1
2 Example2 2 5 5 5 2 2 5
and I have tried the following using ggplot2()
Example.class.melt<-melt(Example.df)
colnames(Example.class.melt)[1] = "Class"
colnames(Example.class.melt)[2] = "Depth"
colnames(Example.class.melt)[3] = "Relative_abundance"
Example.class.melt<-as.data.frame(Example.class.melt)
ggplot(Example.class.melt, aes(x=Depth, y=Relative_abundance, fill=as.factor(Class))) + geom_area()
For some reason that I don't understand, it isn't working. Any suggestions to correct this, or any alternatives?
Thanks
Is this what you are looking for? This was my interpretation based on the way you asked the question. The code is as follows:
install.packages("ggplot2")
install.packages("reshape")
library(ggplot2)
library(reshape)
Example1<-c(1,10,10,1,1,5,1)
Example2<-c(2,5,5,5,2,2,5)
data<-rbind(Example1,Example2)
Example.class.melt<-melt(data)
colnames(Example.class.melt)[1] = "Class"
colnames(Example.class.melt)[2] = "Depth"
colnames(Example.class.melt)[3] = "Relative_abundance"
Example.class.melt<-as.data.frame(Example.class.melt)
ggplot(data = Example.class.melt, aes(x = Depth, y = Relative_abundance, fill=Class)) + geom_area()
You don't need to wrap Class in as.factor() when mapping it to fill.
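One possible reason the original attempt drew nothing is that melting the data frame leaves Depth as a factor (X1m, X2m, ...), and geom_area() needs either a numeric x or an explicit group. The following is a hedged sketch, not the answerer's code: it rebuilds the table from the question, reshapes it with base R, and converts the column names to numeric depths. Stripping the non-digits from "X1m" to obtain 1 is an assumption about what the column names mean.

```r
library(ggplot2)

# The table from the question
Example.df <- data.frame(
  test = c("Example1", "Example2"),
  X1m = c(1, 2), X2m = c(10, 5), X3m = c(10, 5), X4m = c(1, 5),
  X5m = c(1, 2), X6m = c(5, 2), X7m = c(1, 5)
)

depth_cols <- setdiff(names(Example.df), "test")

# Wide to long, column by column; "X3m" becomes the number 3
long <- data.frame(
  Class = rep(Example.df$test, times = length(depth_cols)),
  Depth = rep(as.numeric(gsub("[^0-9]", "", depth_cols)),
              each = nrow(Example.df)),
  Relative_abundance = unlist(Example.df[depth_cols], use.names = FALSE)
)

p <- ggplot(long, aes(x = Depth, y = Relative_abundance, fill = Class)) +
  geom_area()

p
```

With a numeric Depth the areas are stacked per Class along a continuous axis, and no group aesthetic is needed.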

How to Plot Multiple Lines for Each column of a Data Matrix against one Column? [duplicate]

This question already has answers here:
Plot multiple columns on the same graph in R [duplicate]
(4 answers)
Closed 4 years ago.
For the following matrix of order 11*8 stored in an object named Results:
Delta UE RE LS PT SP JS JS+
SRE0 0.000000 1 3.8730275 2.2721219 1.0062884 1.0047529 1.0317746 1.0318688
SRE1 0.100065 1 2.2478516 2.0595205 1.0502708 1.0453288 1.0436898 1.0764224
SRE2 0.200385 1 1.5838920 1.8793306 1.0359049 1.0437888 1.0529307 1.0753217
SRE3 0.300075 1 0.9129295 1.5360455 0.9946433 1.0320438 1.0063378 1.0654772
SRE4 0.400175 1 0.6434000 1.3150935 0.9530553 1.0172104 1.0107737 1.0564151
SRE5 0.500138 1 0.6063778 1.2876456 0.9455131 1.0165491 0.9994965 1.0553198
SRE6 0.600200 1 0.3710599 0.9537165 0.8730835 0.9945211 0.9346991 1.0369921
SRE7 0.699500 1 0.3312944 0.8793348 0.8535376 0.9914288 0.9046180 1.0314705
SRE8 0.800285 1 0.2338423 0.6966505 0.7831482 0.9657499 0.8445466 1.0169138
SRE9 0.900020 1 0.1665775 0.5328803 0.7024265 0.9296520 0.7989161 0.9850603
SRE10 1.000074 1 0.1550065 0.5047066 0.6849924 0.9231919 0.7765414 0.9821768
I want to plot (as a line) last 7 columns of this matrix against first column in a single graph such that each column has either a different color or different line segment. The first column named Delta should be placed on X-axis while rest of columns will be on Y-axis.
The basic idea I'd take is to change your Results object from wide to long format, to pass to ggplot. I like to use Hadley Wickham's reshape2 library. It has a function, melt, which will stack your data appropriately, then you can choose to group the lines by the different variables.
library(ggplot2)
library(reshape2) # install.packages("reshape2")
R = data.frame(Delta = c(1,2), UE = c(1,1), RE = c(3.8, 2.4))
meltR = melt(R, id = "Delta")
ggplot(meltR, aes(x = Delta, y = value, group = variable, colour = variable)) +
geom_line()
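The same wide-to-long idea can be sketched directly on a matrix shaped like Results, using only base R for the reshaping. This is an illustration under assumptions: only the first three rows and four columns of the question's matrix are reproduced to keep it short, and stack() stands in for melt().

```r
library(ggplot2)

# A small subset of the Results matrix from the question
Results <- rbind(
  SRE0 = c(0.000000, 1, 3.8730275, 2.2721219),
  SRE1 = c(0.100065, 1, 2.2478516, 2.0595205),
  SRE2 = c(0.200385, 1, 1.5838920, 1.8793306)
)
colnames(Results) <- c("Delta", "UE", "RE", "LS")

wide <- as.data.frame(Results)

# stack() returns the columns 'values' and 'ind' (the source column name)
long <- data.frame(
  Delta = rep(wide$Delta, times = ncol(wide) - 1),
  stack(wide[-1])
)

p <- ggplot(long, aes(x = Delta, y = values, colour = ind, linetype = ind)) +
  geom_line() +
  labs(y = NULL, colour = "Series", linetype = "Series")

p
```

Mapping both colour and linetype to the same column gives each series a distinct colour and line pattern with a single combined legend.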
Try:
matplot(m[,1],m[,-1],type='l')
where m is your matrix.
The ggplot2 package can accomplish this easily.
You just need to have a separate command for every column.
From the start
Results
Delta UE RE LS PT SP JS JS2
SRE0 0.000000 1 3.8730275 2.2721219 1.006288 1.004753 1.031775 1.031869
SRE1 0.100065 1 2.2478516 2.0595205 1.050271 1.045329 1.043690 1.076422
SRE2 0.200385 1 1.5838920 1.8793306 1.035905 1.043789 1.052931 1.075322
SRE3 0.300075 1 0.9129295 1.5360455 1.994643 1.032044 1.006338 1.065477
SRE4 0.400175 1 0.6434000 1.3150935 1.953055 1.017210 1.010774 1.056415
SRE5 0.500138 1 0.6063778 1.2876456 1.945513 1.016549 1.999497 1.055320
SRE6 0.600200 1 0.3710599 0.9537165 1.873083 1.994521 1.934699 1.036992
SRE7 0.699500 1 0.3312944 0.8793348 1.853538 1.991429 1.904618 1.031470
SRE8 0.800285 1 0.2338423 0.6966505 1.783148 1.965750 1.844547 1.016914
SRE9 0.900020 1 0.1665775 0.5328803 1.702427 1.929652 1.798916 1.985060
SRE10 1.000074 1 0.1550065 0.5047066 1.684992 1.923192 1.776541 1.982177
class(Results)
[1] "Matrix"
Note that I converted the "JS+" column name to "JS2" to avoid errors on R.
Convert to data.frame
Assign Results to a new object, specifically a data.frame.
newResults <- as.data.frame(Results)
newResults
Delta UE RE LS PT SP JS JS2
SRE0 0.000000 1 3.8730275 2.2721219 1.006288 1.004753 1.031775 1.031869
SRE1 0.100065 1 2.2478516 2.0595205 1.050271 1.045329 1.043690 1.076422
SRE2 0.200385 1 1.5838920 1.8793306 1.035905 1.043789 1.052931 1.075322
SRE3 0.300075 1 0.9129295 1.5360455 1.994643 1.032044 1.006338 1.065477
SRE4 0.400175 1 0.6434000 1.3150935 1.953055 1.017210 1.010774 1.056415
SRE5 0.500138 1 0.6063778 1.2876456 1.945513 1.016549 1.999497 1.055320
SRE6 0.600200 1 0.3710599 0.9537165 1.873083 1.994521 1.934699 1.036992
SRE7 0.699500 1 0.3312944 0.8793348 1.853538 1.991429 1.904618 1.031470
SRE8 0.800285 1 0.2338423 0.6966505 1.783148 1.965750 1.844547 1.016914
SRE9 0.900020 1 0.1665775 0.5328803 1.702427 1.929652 1.798916 1.985060
SRE10 1.000074 1 0.1550065 0.5047066 1.684992 1.923192 1.776541 1.982177
class(newResults)
[1] "data.frame"
Now it's formatted as a data.frame so it will be easier to work with.
Create Lines
library(ggplot2)
ggplot(data = newResults, aes(x = Delta)) +
geom_line(aes(y = UE)) +
geom_line(aes(y = RE)) +
geom_line(aes(y = LS)) +
geom_line(aes(y = PT)) +
geom_line(aes(y = SP)) +
geom_line(aes(y = JS)) +
geom_line(aes(y = JS2)) +
labs(y = "") # Delete or change y axis title if desired.
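You can also give each line its own color. Mapping a label inside aes(), e.g. geom_line(aes(y = UE, color = "UE")), also produces a legend; setting a fixed color outside aes(), e.g. geom_line(aes(y = RE), color = "red"), does not.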

ggmap and scale_colour_brewer

I am trying to plot a frequency over a map obtained with ggmap in R. The idea is that I would have a plot of the frequency at each set of coordinates. The frequency ("freq") would be mapped to size and a color scale. The data looks like this:
V7 V6 freq
1 42.1752 -71.2893 1
2 42.1754 -71.2893 1
3 42.1755 -71.2901 2
4 42.1755 -71.2893 1
5 42.1756 -71.2910 1
6 42.1756 -71.2907 1
7 42.1756 -71.2906 1
8 42.1756 -71.2905 1
9 42.1756 -71.2901 1
10 42.1756 -71.2899 2
11 42.1756 -71.2897 2
12 42.1756 -71.2894 2
13 42.1757 -71.2915 1
14 42.1757 -71.2910 1
Here is the code I am using:
ggmap(newmap2) +
geom_point(aes(x = coordfreq$V7, y = coordfreq$V6),
data = coordfreq, alpha = 1/sqrt(coordfreq$freq),
colour = coordfreq$freq, size = sqrt(coordfreq$freq)) +
scale_colour_brewer(palette = "Set1")
I only get the color mapped to "freq", but I cannot get scale_colour_brewer to work. I have tried several arguments to scale_colour_brewer, to no avail.
Your code created a map without data points, most likely because longitude (V6) belongs on x and latitude (V7) on y. Here is possibly what you are after. A few things. One is that you do not have to type x = coordfreq$V7. You can just type x = V7. The same applies to other similar cases in your code. Another is that colour should be in aes() in your case. Another thing is that freq is numeric. You need it as either factor or character when you assign colours with scale_colour_brewer, which is a discrete scale. The other is that freq is a function. You want to avoid such a name. Hope this will help you.
library(ggmap)
library(ggplot2)
# This get_map code was suggested by an SO user. Sadly, the edit was rejected.
# Credit to him/her (MichaelVE).
newmap2 <- get_map(location = c(lon = -71.2893, lat = 42.1752),
zoom = 17, maptype = 'terrain')
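# Assumption: mydf2 is coordfreq with freq renamed to frequency (not shown in the original answer).
mydf2 <- setNames(coordfreq, c("V7", "V6", "frequency"))
ggmap(newmap2) +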
geom_point(data = mydf2, aes(x = V6, y = V7, colour = factor(frequency), size = sqrt(frequency))) +
scale_colour_brewer(palette ="Set1")
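The discrete-colour point can be checked without downloading a map. The sketch below is only an illustration on a plain ggplot: it uses the first four rows of the question's data, keeps the original column name freq, and wraps it in factor() so that scale_colour_brewer() receives a discrete variable.

```r
library(ggplot2)

# First four rows of the data shown in the question
coordfreq <- data.frame(
  V7   = c(42.1752, 42.1754, 42.1755, 42.1755),      # latitude
  V6   = c(-71.2893, -71.2893, -71.2901, -71.2893),  # longitude
  freq = c(1, 1, 2, 1)
)

# colour and size are mapped inside aes(); factor() makes colour discrete
p <- ggplot(coordfreq,
            aes(x = V6, y = V7, colour = factor(freq), size = sqrt(freq))) +
  geom_point() +
  scale_colour_brewer(palette = "Set1", name = "freq")

p
```

If colour is instead set outside aes(), as in the question, it is a fixed parameter rather than a mapping, so no colour scale is ever consulted.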
