Using ggplot2 to create a plot with more than 2 variables (R)

I am working on a dataset with multiple variables, as in the sample below (the actual dataset contains 84 observations of 24 variables). I want to create a single plot that includes all the variables, rather than a separate plot for each variable.
Fruit   Vitamin A(mg)  Vitamin C(mg)  Calcium(mg)
Pear    61             8              11
Apple   10             2              3
Cherry  35             10             11
Fig     5              2              67
I have tried the code below, an altered version of one suggested on a forum:
library(ggplot2)
g<- ggplot(FR, aes(Fruit)
g + geom_point() + facet_grid(. ~ FR[2:26,])
I get the error:
Error: unexpected symbol in: "g<- ggplot(FR, aes(Fruit) g"
I am also open to suggestions for better ways to represent the dataset.

You need to reshape your dataset into long format, for example with gather() from tidyr. Here is a reproducible example:
# load libraries
library(ggplot2)
library(tidyr)
library(googleVis)  # only used for its example dataset "Fruits"
# get data for a reproducible example: rename columns 4 to 6
# so they stand in for the nutrient columns
data("Fruits")
colnames(Fruits)[4:6] <- c("Vitamin A(mg)", "Vitamin C(mg)", "Calcium(mg)")
Fruits <- Fruits[, c("Fruit", "Vitamin A(mg)", "Vitamin C(mg)", "Calcium(mg)")]
# reshape to long format: one row per fruit/vitamin pair, Fruit kept as an id column
df <- gather(Fruits, key = "vitamin", value = "value", -Fruit)
# Plot !
ggplot(data = df) +
  geom_point(aes(x = vitamin, y = value, color = vitamin)) +
  facet_grid(Fruit ~ ., scales = "free_x") +
  theme_minimal()
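Since tidyr 1.0.0, gather() is superseded by pivot_longer(). A minimal sketch of the same reshape applied to the four-row sample from the question (typed in by hand, with check.names = FALSE so the column names keep their spaces and parentheses), drawing all nutrients in one plot colored by nutrient instead of one facet per fruit:

```r
library(ggplot2)
library(tidyr)

# Sample data from the question; check.names = FALSE keeps "Vitamin A(mg)" as-is
FR <- data.frame(
  Fruit = c("Pear", "Apple", "Cherry", "Fig"),
  `Vitamin A(mg)` = c(61, 10, 35, 5),
  `Vitamin C(mg)` = c(8, 2, 10, 2),
  `Calcium(mg)` = c(11, 3, 11, 67),
  check.names = FALSE
)

# Long format: one row per fruit/nutrient pair (4 fruits x 3 nutrients)
FR_long <- pivot_longer(FR, cols = -Fruit,
                        names_to = "nutrient", values_to = "mg")

# A single plot for all variables: fruit on x, amount on y, nutrient as color
p <- ggplot(FR_long, aes(x = Fruit, y = mg, color = nutrient)) +
  geom_point(size = 3) +
  theme_minimal()
print(p)
```

With 24 variables, the same call works unchanged, since cols = -Fruit gathers every column except Fruit.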

I believe you're missing a closing parenthesis. Change:
g<- ggplot(FR, aes(Fruit)
to
g<- ggplot(FR, aes(Fruit))
In my experience, "unexpected symbol" errors usually mean you forgot to close parentheses or braces.
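The error comes from R's parser, before ggplot2 or FR are ever touched, so it can be reproduced with base R's parse() on the first two lines of the question (slightly shortened). While the ggplot( call is still open, the line break is ignored, so the g on the next line directly follows aes(Fruit) and is an unexpected symbol:

```r
# Parse the code as text; nothing is evaluated, so FR does not need to exist
bad  <- "g <- ggplot(FR, aes(Fruit)\ng + geom_point()"
good <- "g <- ggplot(FR, aes(Fruit))\ng + geom_point()"

msg <- tryCatch({
  parse(text = bad)
  "parsed OK"
}, error = function(e) conditionMessage(e))
cat(msg, "\n")   # reports an "unexpected symbol" on line 2

exprs <- parse(text = good)
length(exprs)    # two complete expressions once the parenthesis is closed
```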

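You aren't specifying the axes fully: geom_point() needs a y aesthetic as well as x. Because the column names contain spaces and parentheses, they also have to be wrapped in backticks:
ggplot(FR, aes(x = Fruit, y = `Vitamin A(mg)`,
               shape = as.factor(Fruit),
               color = as.factor(Fruit))) +
  geom_point() +
  geom_point(aes(x = Fruit, y = `Vitamin C(mg)`)) +
  geom_point(aes(x = Fruit, y = `Calcium(mg)`))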
Is that what you wanted?
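Names like Vitamin A(mg) are not syntactic R names, which is why they need backticks when used inside aes(). Note that read.table() and data.frame() rename such columns by default (check.names = TRUE), so depending on how FR was created, its columns may actually be called Vitamin.A.mg. and so on. A base R sketch for checking this:

```r
# What read.table()/data.frame() turn the header into when check.names = TRUE
make.names(c("Vitamin A(mg)", "Vitamin C(mg)", "Calcium(mg)"))

# Keeping the original names requires check.names = FALSE ...
FR <- data.frame(Fruit = c("Pear", "Apple"),
                 `Vitamin A(mg)` = c(61, 10),
                 check.names = FALSE)
names(FR)

# ... and such names must then be quoted with backticks when used as symbols
FR$`Vitamin A(mg)`
```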

Related

Trying to make a bar chart with each categorical column as a different color

I found a nice Wes Anderson palette package, but I am failing to actually use it. The variable I am looking at (Q1) has the options 1 and 2. There is also an NA in the set, which is getting plotted; I would like to remove it as well.
library(readxl)
library(tidyverse)
library(wesanderson)
RA_Survey <- read_excel("file extension")
ggplot(data = RA_Survey, mapping = aes(x = Q1)) +
geom_bar() + scale_fill_manual(values=wes_palette(n=2, name="GrandBudapest"))
The plot I'm getting works, but without the colors. Any ideas?
There are several issues which need to be addressed.
Using the Wes Anderson palette
As already mentioned by Mako, the fill aesthetic was missing from the call to aes().
Furthermore, the OP reports an error message saying Palette not found. The wesanderson package contains a list of available palettes:
names(wesanderson::wes_palettes)
[1] "BottleRocket1" "BottleRocket2" "Rushmore1" "Rushmore" "Royal1" "Royal2" "Zissou1"
[8] "Darjeeling1" "Darjeeling2" "Chevalier1" "FantasticFox1" "Moonrise1" "Moonrise2" "Moonrise3"
[15] "Cavalcanti1" "GrandBudapest1" "GrandBudapest2" "IsleofDogs1" "IsleofDogs2"
There is no palette called "GrandBudapest" as requested in OP's code. Instead, we have to choose between "GrandBudapest1" and "GrandBudapest2".
Also, the help file help("wes_palette") lists the available palettes.
Here is a working example which uses the dummy data created in the Data section below:
library(ggplot2)
library(wesanderson)
ggplot(RA_Survey, aes(x = Q1, fill = Q1)) +
geom_bar() +
scale_fill_manual(values=wes_palette(n=2, name="GrandBudapest1"))
Removing NA
The OP has asked to remove the NAs from the set. There are two options:
1. Tell ggplot() to remove the NAs.
2. Remove the NAs from the data by filtering.
We can tell ggplot() to remove NAs when plotting the x axis:
library(ggplot2)
library(wesanderson)
ggplot(RA_Survey, aes(x = Q1, fill = Q1)) +
geom_bar() +
scale_fill_manual(values=wes_palette(n=2, name="GrandBudapest1")) +
scale_x_discrete(na.translate = FALSE)
Note that this produces the warning message "Removed 3 rows containing non-finite values (stat_count)". To get rid of the message, we can use geom_bar(na.rm = TRUE).
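Putting both settings together removes the NA bar and avoids the warning. A minimal sketch using a small hand-made Q1 column; to keep it free of extra packages, two hard-coded placeholder colors stand in for the wes_palette(n = 2, name = "GrandBudapest1") values:

```r
library(ggplot2)

# Small stand-in for RA_Survey, including missing answers
RA_Survey <- data.frame(Q1 = c("1", "2", "2", NA, "1", "2", NA))

p <- ggplot(RA_Survey, aes(x = Q1, fill = Q1)) +
  geom_bar(na.rm = TRUE) +                              # drop NA rows silently
  scale_fill_manual(values = c("#F1BB7B", "#FD6467")) + # placeholder colors
  scale_x_discrete(na.translate = FALSE)                # no NA category on x

# The computed layer has one bar per non-missing answer
counts <- layer_data(p)[, c("x", "count")]
print(counts)
```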
The other option removes the NAs from the data by filtering:
library(dplyr)
library(ggplot2)
library(wesanderson)
ggplot(RA_Survey %>% filter(!is.na(Q1)), aes(x = Q1, fill = Q1)) +
geom_bar() +
scale_fill_manual(values=wes_palette(n=2, name="GrandBudapest1"))
which creates exactly the same chart.
Data
As the OP has not provided a sample dataset, we need to create our own:
library(dplyr)
set.seed(123L)
RA_Survey <- tibble(Q1 = sample(c("1", "2", NA), 20, TRUE, c(3, 6, 1)))  # tibble() replaces the deprecated data_frame()
RA_Survey
# A tibble: 20 x 1
Q1
<chr>
1 2
2 1
3 2
4 1
5 NA
6 2
7 2
8 1
9 2
10 2
11 NA
12 2
13 1
14 2
15 2
16 1
17 2
18 2
19 2
20 NA

Tail of my df is being plotted in the beginning of my plot

I have a dataframe that contains time (H:M:S), thetaX (degrees), thetaY (degrees), and thetaZ (degrees). I want to plot the angles against time using ggplot, as mentioned here.
This is the original state of my dataframe:
> head(df)
time thetaX thetaY thetaZ
1 08:27:27 0.01539380 -0.001609785 -0.03271715
2 08:27:27 0.03079389 -0.003863202 -0.06512209
3 08:27:27 0.04588598 -0.006668402 -0.09720450
4 08:27:28 0.06008822 -0.008774166 -0.12872514
5 08:27:28 0.07400642 -0.008951306 -0.15985775
6 08:27:28 0.08823425 -0.012280650 -0.19023676
I run these lines to plot each column of df over time:
library(reshape2)  # provides melt()
df = data.frame(time, thetaX, thetaY, thetaZ)
df.m = melt(df, id = "time")
ggplot(data = df.m, aes(x = time, y = value)) + geom_point() + facet_grid(variable ~ .)
But this is what comes out:
Question: Why does my plot start with what looks like the tail end of my df (around 1 pm), then jump to the beginning (around 8 am) and continue through the rest?
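One common reason for this kind of jump, hedged because the full data is not shown, is that time is not a real time class, so the axis follows the order of character strings or factor levels rather than the clock; for example, with a 12-hour clock, "01:00:00" (1 pm) sorts before "08:27:27". A minimal sketch, assuming the strings are on a 24-hour clock, that converts time with as.POSIXct() so ggplot2 uses a continuous date-time axis. It uses the six rows shown above and tidyr's pivot_longer() in place of melt():

```r
library(ggplot2)
library(tidyr)

# The six rows shown in the question, with time as H:M:S strings
df <- data.frame(
  time   = c("08:27:27", "08:27:27", "08:27:27", "08:27:28", "08:27:28", "08:27:28"),
  thetaX = c(0.01539380, 0.03079389, 0.04588598, 0.06008822, 0.07400642, 0.08823425),
  thetaY = c(-0.001609785, -0.003863202, -0.006668402, -0.008774166, -0.008951306, -0.012280650),
  thetaZ = c(-0.03271715, -0.06512209, -0.09720450, -0.12872514, -0.15985775, -0.19023676)
)

# Parse the strings into date-times (today's date is attached, which is harmless here)
df$time <- as.POSIXct(df$time, format = "%H:%M:%S")

# Long format: one row per time/angle pair
df_long <- pivot_longer(df, cols = -time, names_to = "variable", values_to = "value")

# A POSIXct column gets a continuous datetime x scale, ordered chronologically
p <- ggplot(df_long, aes(x = time, y = value)) +
  geom_point() +
  facet_grid(variable ~ .)
print(p)
```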

Color points if ID in vector in ggplot2

I have imported data in this form:
Sample1 Sample2 Identity
1 2 chr11-50-T
3 4 chr11-200-A
v <- read.table("myfile", header = TRUE)
I have a vector that looks like this:
x <- c(50,100)
Leaving out some other aesthetic settings, I am plotting column 1 against column 2, labeled with column 3:
library(ggplot2)
library(ggrepel)
p <- ggplot(v, aes(x = Sample1, y = Sample2, alpha = 0.5, label = Identity)) +
  geom_point() +
  geom_text_repel(aes(label = ifelse(Sample2 > 0.007 | Sample1 > 0.007, as.character(Identity), '')))
I would like to somehow indicate those points that contain a number in their ID, found within the vector x. I was thinking this could be done with color, but it doesn't really matter to me as long as there is a difference between the two types of points.
So for instance if the points containing a number in x were to be colored red, the first point would be red because it has 50 in the ID and the second point would not be, because 200 is not a value in x.
You could add a TRUE/FALSE column and use it as the color. I removed label = ... from the main aes() call, since the label is already supplied inside geom_text_repel(). Also, everything was semi-transparent because of aes(alpha = 0.5); setting alpha as a fixed parameter of geom_point() instead avoids an unnecessary alpha legend:
library(ggrepel)
library(ggplot2)
# TRUE if the ID contains any of the numbers in x
vafs$col <- grepl(paste0(x, collapse = "|"), vafs$Identity)
p <- ggplot(vafs, aes(x = Sample1, y = Sample2, color = col)) +
  geom_point(alpha = 0.5) +
  geom_text_repel(aes(label = ifelse(Sample2 > 0.007 | Sample1 > 0.007, as.character(Identity), '')))
I came up with the following solution:
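library(ggplot2)
vafs <- read.table(text = "Sample1 Sample2 Identity
1 2 chr11-50-T
3 4 chr11-200-A", header = TRUE)
vec <- c(50, 100)
# 1 if the ID contains any of the numbers in vec, otherwise 0
vafs$vec <- sapply(vafs$Identity, FUN = function(x)
  ifelse(length(grep(pattern = paste(vec, collapse = "|"), x)) > 0, 1, 0))
vafs$vec <- as.factor(vafs$vec)
# alpha is a fixed setting, so it belongs in geom_point(), not in ggplot()
ggplot(vafs, aes(x = Sample1, y = Sample2, label = Identity, col = vec)) + geom_point(alpha = 0.5)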
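Both answers build the regular expression "50|100", which matches those digits anywhere in the ID, so an ID such as chr11-150-T or chr11-1000-A would also be flagged. If the number of interest is always the middle field of a chromosome-position-allele ID (an assumption based on the two sample rows), a stricter sketch extracts that field and tests it with %in%. The third row below is hypothetical, added only to show the difference:

```r
library(ggplot2)

vafs <- data.frame(
  Sample1  = c(1, 3, 5),
  Sample2  = c(2, 4, 6),
  Identity = c("chr11-50-T", "chr11-200-A", "chr11-150-G"),  # third row is hypothetical
  stringsAsFactors = FALSE
)
x <- c(50, 100)

# Take the digits between the first and second "-" and compare them as numbers
pos <- as.numeric(sub("^[^-]+-([0-9]+)-.*$", "\\1", vafs$Identity))
vafs$in_x <- factor(pos %in% x, levels = c(TRUE, FALSE),
                    labels = c("in x", "not in x"))

p <- ggplot(vafs, aes(x = Sample1, y = Sample2, color = in_x)) +
  geom_point(alpha = 0.5) +
  scale_color_manual(values = c("in x" = "red", "not in x" = "grey40"))
print(p)
```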

ggplot2 adding custom legend when plotting two lines from subset of columns

I've looked all over Stack Overflow and other sites to fix my code but can't see what's wrong. I am trying to plot 2 lines on the same ggplot graph, using portions of 2 different columns. For example, I have a column of length 8 whose first four rows are M (male) and last four rows are F (female). I have two columns of data and one column for condition (factor).
ModelMF <- data.frame(ProbGender, ProbCond, ProbMF, Act_pct)
where:
ProbGender ProbCond ProbMF Act_pct
M 0 .75 .71
M 10 .67 .69
M 20 .61 .54
M 30 .81 .77
F 0 .88 .82
F 10 .73 .71
F 20 .67 .71
F 30 .60 .63
I have tried the following but I keep getting errors (see below):
ggplot(data = ModelMF, aes(x = ProbCond)) +
  geom_line(data = ModelMF[ModelMF$ProbGender == "M", ], aes(y = ProbMF), color = 'col1') +
  geom_point(data = ModelMF[ModelMF$ProbGender == "M", ], aes(y = ProbMF)) +
  geom_line(data = ModelMF[ModelMF$ProbGender == "M", ], aes(y = Act_pct), color = 'col2') +
  geom_point(data = ModelMF[ModelMF$ProbGender == "M", ], aes(y = Act_pct)) +
  scale_color_manual(values = c('col1' = 'darkblue', 'col2' = 'lightblue'))
Preferably I would like to be able to create a custom legend that lets me map the colors as I've attempted to do using scale_color_manual, but I get the following error:
Error in grDevices::col2rgb(colour, TRUE) : invalid color name 'col1'
I'm not sure if it is due to the fact that I'm subsetting data within the df or something else I'm just missing? Also if I add the female lines I assume I can simply follow the same procedure?
Thanks in advance.
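The error arises because color = 'col1' is written outside aes(), so ggplot2 tries to use the string "col1" itself as a color name. Placed inside aes(), the string becomes a mapped value that scale_color_manual() translates into a real color, and a legend is drawn. A minimal sketch for the male rows, with the data typed in from the table above; using the column names as legend labels is just an illustrative choice. The female lines can be added the same way, with data = ModelMF[ModelMF$ProbGender == "F", ] and two more entries in values:

```r
library(ggplot2)

# Data from the question
ModelMF <- data.frame(
  ProbGender = rep(c("M", "F"), each = 4),
  ProbCond   = rep(c(0, 10, 20, 30), times = 2),
  ProbMF     = c(.75, .67, .61, .81, .88, .73, .67, .60),
  Act_pct    = c(.71, .69, .54, .77, .82, .71, .71, .63)
)
male <- ModelMF[ModelMF$ProbGender == "M", ]

# color = "..." inside aes() is a mapping, so scale_color_manual() looks it up
p <- ggplot(male, aes(x = ProbCond)) +
  geom_line(aes(y = ProbMF, color = "ProbMF")) +
  geom_point(aes(y = ProbMF, color = "ProbMF")) +
  geom_line(aes(y = Act_pct, color = "Act_pct")) +
  geom_point(aes(y = Act_pct, color = "Act_pct")) +
  scale_color_manual(name = NULL,
                     values = c(ProbMF = "darkblue", Act_pct = "lightblue"))
print(p)
```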

Can't produce a barplot after "casting"

After successfully performing a cast (using the reshape package) on a small data set, I obtain the following frame (e_disp), which is what I am looking for:
Date Code 200g
1 2010/06/01 cg4j 0.519880141
2 2010/09/19 7gv2 0.158999682
3 2011/04/14 zl94 0.294174203
4 2011/05/27 a13t 0.140232549
I wish to create a barplot in which the values in the 200g column are plotted as bars, with the date on the x-axis and each bar labeled with its associated code (the code could also go on the x-axis, above or below the date).
However, I get the following error:
"Error in barplot.default(e_disp) : 'height' must be a vector or a matrix"
So my questions are:
1) Can what I am trying to do be done after using 'cast'?
2) If so, any suggestions as to how to accomplish this?
Any help would be appreciated.
This is quite easily done with ggplot2. Here is an example
# generate dummy data
mydf = data.frame(date = 1:5, code = letters[1:5], value = rpois(5, 40))
# plot it using ggplot2
library(ggplot2)
pl = ggplot(mydf, aes(x = date, y = value)) +
geom_bar(stat = 'identity') +
geom_text(aes(label = code), vjust = -1)
print(pl)
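Applied to the frame from the question, the same idea needs two adjustments: the column name 200g starts with a digit, so it must be wrapped in backticks, and the dates should be converted with as.Date() so the x axis is a real date scale. A sketch, assuming e_disp has the three columns shown (rebuilt here by hand with check.names = FALSE so that 200g keeps its name):

```r
library(ggplot2)

# The cast result from the question, typed in by hand
e_disp <- data.frame(
  Date   = as.Date(c("2010/06/01", "2010/09/19", "2011/04/14", "2011/05/27")),
  Code   = c("cg4j", "7gv2", "zl94", "a13t"),
  `200g` = c(0.519880141, 0.158999682, 0.294174203, 0.140232549),
  check.names = FALSE
)

# geom_col() is shorthand for geom_bar(stat = "identity");
# each bar is labeled with its Code just above the top of the bar
p <- ggplot(e_disp, aes(x = Date, y = `200g`)) +
  geom_col() +
  geom_text(aes(label = Code), vjust = -0.5)
print(p)
```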
Is this what you are after:
dat <- read.table(textConnection("Date Code x200g
1 2010/06/01 cg4j 0.519880141
2 2010/09/19 7gv2 0.158999682
3 2011/04/14 zl94 0.294174203
4 2011/05/27 a13t 0.140232549"), header=TRUE, as.is=TRUE)
dat$Date <- as.Date(dat$Date)
Paste the Date and Code columns, separated by a linefeed ("\n"), to make the labels:
barplot(dat$x200g, names.arg=paste(dat$Date,"\n", dat$Code, sep=""), ylab=" ")
