Simple df plotting in R - r

I am new to R and have a data frame with three columns: car, var and val. It has about 90 rows, and I want to plot the columns var and val. My data frame looks like this:
car var val
a kl -14
b km -1
c kn -3
d ko -20
I tried plot(data$var, data$val), but I want something like the example plot I posted, with var on the X axis and val on the Y axis. How can I do this with ggplot?

You can make a plot similar to the one you posted using geom_line. You need to use the aesthetic group = 1 because the x-axis data are discrete and each group has only a single observation.
library(ggplot2)
df <- read.table(header = TRUE, text = "
car var val
a kl -14
b km -1
c kn -3
d ko -20
")
ggplot(df, aes(x = var, y = val, group = 1)) +
  geom_line(colour = "green")
Given that the x-axis data are discrete, it probably makes more sense to use geom_bar to get a bar plot.
ggplot(df, aes(x = var, y = val, group = 1)) +
  geom_bar(stat = "identity")
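With about 90 rows, it may matter that ggplot2 orders a character x variable alphabetically rather than in data order. The sketch below is one possible way to keep the order of appearance, assuming ggplot2 2.2.0 or later is installed (for geom_col, which is shorthand for geom_bar(stat = "identity")). The object name p is only illustrative.

```r
library(ggplot2)

df <- read.table(header = TRUE, text = "
car var val
a kl -14
b km -1
c kn -3
d ko -20
")

# Fix the level order to the order of appearance in the data,
# so the x axis is not re-sorted alphabetically
df$var <- factor(df$var, levels = unique(df$var))

# geom_col() draws bars whose heights are the values in val
p <- ggplot(df, aes(x = var, y = val)) +
  geom_col()
print(p)
```

For this small example the alphabetical order and the data order happen to coincide, so the effect only becomes visible with labels that are not already sorted.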

Related

How to customise the colors in stacked bar charts

Maybe a question someone already asked.
I have a data frame (dat) that looks like this:
Sample perc cl
a 30 0
b 22 0
s 2 0
z 19 0
a 12 1
b 45 1
s 70 1
z 1 1
a 60 2
b 67 2
s 50 2
z 18 2
I would like to generate a stacked barplot. To do this I used the following:
g = ggplot(dat, aes(x = cl, y = perc, fill = Sample))
g + geom_bar(stat = "identity", position = "fill", show.legend = FALSE) +
  scale_fill_manual(name = "Samples", values = c("a" = "blue", "b" = "blue", "s" = "gray", "z" = "red"))
Fortunately the colors are assigned correctly. However, the samples are stacked from a to z from the top to the bottom of each bar, and I would like gray to be on top without breaking the continuity from blue to red in the rest of the bar. Maybe there's another way to color the bars and set the desired order.
The groups are stacked within each bar in the order of the factor levels. You can change the stacking order by changing the order of the levels in your call to aes with factor(var, levels(var)[order]). If Sample is stored as character rather than as a factor (the default when reading data in R 4.0 and later), convert it with factor() first, as done here:
library(ggplot2)
ggplot(dat, aes(x = cl, y = perc,
                fill = factor(Sample, levels(factor(Sample))[c(3, 1, 2, 4)]))) +
  geom_bar(stat = "identity", position = "fill", show.legend = FALSE) +
  scale_fill_manual(name = "Samples",
                    values = c("a" = "blue", "b" = "blue", "s" = "gray", "z" = "red"))
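To see what the index vector c(3, 1, 2, 4) does to the levels, here is a small base R sketch. The vector sample_ids is a stand-in for the Sample column of dat, not the original data frame.

```r
# Stand-in for dat$Sample (three repetitions of the four samples)
sample_ids <- rep(c("a", "b", "s", "z"), times = 3)

# Default levels are sorted alphabetically
default_levels <- levels(factor(sample_ids))
print(default_levels)
# "a" "b" "s" "z"

# Move the third level ("s") to the front; the values themselves are unchanged
reordered <- factor(sample_ids, levels = default_levels[c(3, 1, 2, 4)])
print(levels(reordered))
# "s" "a" "b" "z"
```

Only the level order changes, so the named colors in scale_fill_manual still match the same samples.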

ggplot facet_wrap with equally spaced axes

Say I have the following dummy data frame:
df <- data.frame(let = LETTERS[1:13], value = sample(13),
group = rep(c("foo", "bar"), times = c(5,8)))
df
let value group
1 A 2 foo
2 B 1 foo
3 C 12 foo
4 D 8 foo
5 E 4 foo
6 F 13 bar
7 G 11 bar
8 H 3 bar
9 I 7 bar
10 J 5 bar
11 K 10 bar
12 L 9 bar
13 M 6 bar
Using ggplot with facet_wrap allows me to make a panel for each of the groups...
library(ggplot2)
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free")
...but the vertical axes are not equally spaced, i.e. one panel contains more vertical ticks than the other. I would like to fill up the vertical axis of the smaller group with (unlabeled) ticks (with no plotted values). In this case that would add 3 empty ticks, but it should be scalable to any df size.
What is the best way to accomplish this? Should I change the data frame, or is there a way to do this using ggplot?
I'm not sure why you want to arrange the categorical variable this way, other than for aesthetics (it does look better). In any case, a simple workaround that seems to handle the general case relies on the fact that ggplot places the levels of a categorical variable at integer positions on a numerical scale. The workaround is to plot, for each x value, a transparent point at a y position just above the largest number of categories in any group (max_rows + 0.5 in the code below), so that every panel spans the same number of positions. Plotting these points for all x values is a simple way to handle groups whose x ranges do not overlap. I've added another group to your data frame to make the example a bit more general.
library(ggplot2)
set.seed(123)
df <- data.frame(let = LETTERS[1:19], value = c(sample(13),20+sample(6)),
group = rep(c("foo", "bar", "bar2"), times = c(5,8,6)))
num_rows <- xtabs(~ group, df)
max_rows <- max(num_rows)
sp <- ggplot(df, aes(y= let, x = value)) +
geom_point() +
geom_point(aes(y = max_rows +.5), alpha=0 ) +
facet_wrap(~group, scales = "free", nrow=1 )
plot(sp)
This gives the following chart:
A kludgy solution that requires magrittr (for the compound assignment pipe %<>%):
library(magrittr)
df %<>%
  rbind(data.frame(let = c(" ", "  ", "   "),
                   value = NA,
                   group = "foo"))
I just add three more entries for foo whose labels are blank strings (i.e., just spaces) of different lengths. There must be a more elegant solution, though.
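The padding idea above can be made to scale to any number of groups and rows. The following is only a sketch in base R: pad_groups is a hypothetical helper name, and it assumes the columns are called let, value and group as in the question.

```r
# Dummy data as in the question
df <- data.frame(let = LETTERS[1:13], value = sample(13),
                 group = rep(c("foo", "bar"), times = c(5, 8)))

# Hypothetical helper: pad every group with blank-labelled rows
# until it has as many rows as the largest group
pad_groups <- function(df) {
  counts <- table(df$group)
  target <- max(counts)
  pads <- lapply(names(counts), function(g) {
    n_missing <- target - counts[[g]]
    if (n_missing == 0) return(NULL)
    # Blank labels of length 1, 2, ..., so that each is a distinct category
    data.frame(let = strrep(" ", seq_len(n_missing)),
               value = NA,
               group = g)
  })
  rbind(df, do.call(rbind, pads))
}

padded <- pad_groups(df)
print(table(padded$group))
```

Plotting padded instead of df will make ggplot warn about removed rows with missing values, because the padding rows have value = NA. That is expected here.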
Use free_x instead of free, like this:
ggplot(df, aes(x= let, y = value)) +
geom_point() +
coord_flip() +
facet_wrap(~group, scales = "free_x")+
theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())

Plotting variable means for each level of the independent variable. R

Given the following code and data frame:
require(data.table)
require(ggplot2)
dat1 <- fread('J S1 S2 S3 S4 Z
1 4 5 3 2 0
1 6 5 6 5 1
2 3 5 8 9 0
2 12 11 34 44 1
3 11 23 23 22 0
3 12 15 22 21 1')
temp <- melt(dat1, id.vars = c("J", "Z"))
ggplot(temp, aes(x = J, y = value, color = variable, shape = as.factor(Z))) +
geom_point()
I'd like to plot in the same graph the mean of the values (S1, S2, S3, S4) for each level of J. That is, for S1 I want 3 points in my graph: 5, 7.5, 11.5. For S2, another 3 points, and so on...
I'm trying this:
ggplot(temp, aes(x = J, y = mean(value), color = variable, shape = as.factor(Z))) +
geom_point()
I get only one point for each full set of data. But I'd like to get in the same graph the mean of S1 for each level of J (1,2,3), the mean of S2 for each level of J, the mean of S3 for each level of J, and the mean of S4 for each level of J.
You need to compute the mean for each group in your data first.
Please let me know if this makes sense or if you want something different.
You can do:
library(data.table)
temp1 <- setDT(temp)[,.(value = mean(value)),by=.(J,variable)]
ggplot(temp1, aes(x = J, y = value, color=factor(variable))) +
geom_point()
Or you can do:
ggplot(temp1, aes(x = variable, y = value, color=factor(J))) +
geom_point()
EDIT, after OP's request:
To take the Z variable into account, summarize the data by Z as well, as shown below, and then plot:
temp1 <- setDT(temp)[,.(value = mean(value)),by=.(J,variable,Z)]
ggplot(temp1, aes(x = variable, y = value, color=factor(J),shape=factor(Z))) +
geom_point()
The plot now contains three categorical variables: variable, J and Z. You can swap their roles across the aesthetics to see what fits your needs; remember to wrap them in factor() if you want to map them to shape or color. If you want separate panels for the 0s and 1s, use facet_wrap, like below:
ggplot(temp1, aes(x = variable, y = value, color=factor(J),shape=factor(Z))) +
geom_point() + facet_wrap(~Z)
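If data.table is not available, the same per-group means can be sketched in base R with aggregate. The data frame below is rebuilt by hand from the question's table, and the names temp and means are only illustrative.

```r
# Data from the question, in wide format
dat1 <- data.frame(J  = c(1, 1, 2, 2, 3, 3),
                   S1 = c(4, 6, 3, 12, 11, 12),
                   S2 = c(5, 5, 5, 11, 23, 15),
                   S3 = c(3, 6, 8, 34, 23, 22),
                   S4 = c(2, 5, 9, 44, 22, 21),
                   Z  = c(0, 1, 0, 1, 0, 1))

# Long format: one row per (J, Z, variable) combination
measure_cols <- c("S1", "S2", "S3", "S4")
temp <- data.frame(J = rep(dat1$J, times = length(measure_cols)),
                   Z = rep(dat1$Z, times = length(measure_cols)),
                   variable = rep(measure_cols, each = nrow(dat1)),
                   value = unlist(dat1[measure_cols], use.names = FALSE))

# Mean of value for every combination of J and variable
means <- aggregate(value ~ J + variable, data = temp, FUN = mean)
print(means)
```

The resulting means data frame has the same columns as temp1 in the first data.table call above, so it can be passed to the same ggplot code.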

Stacked bar chart

I would like to create a stacked chart using ggplot2 and geom_bar.
Here is my source data:
Rank F1 F2 F3
1 500 250 50
2 400 100 30
3 300 155 100
4 200 90 10
I want a stacked chart where x is the rank and y is the values in F1, F2, F3.
# Getting Source Data
sample.data <- read.csv('sample.data.csv')
# Plot Chart
c <- ggplot(sample.data, aes(x = sample.data$Rank, y = sample.data$F1))
c + geom_bar(stat = "identity")
This is as far as I can get. I'm not sure how I can stack the rest of the field values.
Maybe my data.frame is not in a good format?
You asked:
Maybe my data.frame is not in a good format?
Yes, that is the issue. Your data is in wide format, and you need to put it in long format. Generally speaking, long format is better for comparing variables.
Using reshape2, for example, you can do this with melt:
library(reshape2)
dat.m <- melt(dat, id.vars = "Rank") ## dat is your data frame; id.vars keeps Rank as the identifier column
Then you get your barplot:
ggplot(dat.m, aes(x = Rank, y = value,fill=variable)) +
geom_bar(stat='identity')
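With lattice and the formula notation of barchart, you don't need to reshape your data. Convert Rank to a factor so it is treated as the categorical axis, and set stack = TRUE to stack the bars:
library(lattice)
barchart(F1 + F2 + F3 ~ factor(Rank), data = dat, stack = TRUE)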
You need to transform your data to long format and shouldn't use $ inside aes:
DF <- read.table(text="Rank F1 F2 F3
1 500 250 50
2 400 100 30
3 300 155 100
4 200 90 10", header=TRUE)
library(reshape2)
DF1 <- melt(DF, id.vars = "Rank")
library(ggplot2)
ggplot(DF1, aes(x = Rank, y = value, fill = variable)) +
geom_bar(stat = "identity")
Building on Roland's answer, using tidyr to reshape the data from wide to long:
library(tidyr)
library(ggplot2)
df <- read.table(text="Rank F1 F2 F3
1 500 250 50
2 400 100 30
3 300 155 100
4 200 90 10", header=TRUE)
df %>%
gather(variable, value, F1:F3) %>%
ggplot(aes(x = Rank, y = value, fill = variable)) +
geom_bar(stat = "identity")
You will need to melt your data frame to get it into the so-called long format, keeping Rank as the identifier column:
require(reshape2)
sample.data.M <- melt(sample.data, id.vars = "Rank")
Now your field values are represented by their own rows and identified through the variable column. This can now be leveraged within the ggplot aesthetics:
require(ggplot2)
c <- ggplot(sample.data.M, aes(x = Rank, y = value, fill = variable))
c + geom_bar(stat = "identity")
Instead of stacking you may also be interested in showing multiple plots using facets:
c <- ggplot(sample.data.M, aes(x = Rank, y = value))
c + facet_wrap(~ variable) + geom_bar(stat = "identity")
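To make the wide-to-long step used in the answers above concrete, here is a base R sketch of roughly what the reshaping produces. It uses no reshape2 or tidyr; the names DF, measure_cols and long are illustrative.

```r
# Wide format: one column per field
DF <- data.frame(Rank = 1:4,
                 F1 = c(500, 400, 300, 200),
                 F2 = c(250, 100, 155, 90),
                 F3 = c(50, 30, 100, 10))

measure_cols <- c("F1", "F2", "F3")

# Long format: one row per (Rank, variable) pair
long <- data.frame(
  Rank = rep(DF$Rank, times = length(measure_cols)),
  variable = factor(rep(measure_cols, each = nrow(DF)), levels = measure_cols),
  value = unlist(DF[measure_cols], use.names = FALSE)
)
print(long)

# Total bar height per Rank once the three fields are stacked
print(tapply(long$value, long$Rank, sum))
```

Each Rank now appears three times, once per field, which is what lets fill = variable split every bar into stacked segments.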

How to plot stacked proportional graph?

I have a data frame:
x <- data.frame(id=letters[1:3],val0=1:3,val1=4:6,val2=7:9)
id val0 val1 val2
1 a 1 4 7
2 b 2 5 8
3 c 3 6 9
I want to plot a stacked bar plot that shows the percentage of each column. Each bar represents one row, all bars have the same length, and each bar has three colors representing the percentages of val0, val1 and val2.
I searched for this, but I only found ways to plot a stacked graph, not a stacked proportional graph.
Thanks.
Using ggplot2
For ggplot2 and geom_bar:
- Work in long format
- Pre-calculate the percentages
For example:
library(reshape2)
library(plyr)
library(ggplot2)
# long format with column of proportions within each id
xlong <- ddply(melt(x, id.vars = 'id'), .(id), mutate, prop = value / sum(value))
ggplot(xlong, aes(x = id, y = prop, fill = variable)) + geom_bar(stat = 'identity')
# note position = 'fill' would work with the value column
ggplot(xlong, aes(x = id, y = value, fill = variable)) +
geom_bar(stat = 'identity', position = 'fill', aes(fill = variable))
# will return the same plot as above
base R
A table object can be plotted as a mosaic plot using plot. Your x is (almost) a table object.
# get the numeric columns as a matrix
xt <- as.matrix(x[,2:4])
# set the rownames to be the first column of x
rownames(xt) <- x[[1]]
# set the class to be a table so plot will call plot.table
class(xt) <- 'table'
plot(xt)
You could also use mosaicplot directly:
mosaicplot(x[,2:4], main = 'Proportions')
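Another base R option, sketched here as an alternative to the mosaic plot rather than as part of the original answer, is to compute the row proportions with prop.table and pass them to barplot. The names counts and props are illustrative.

```r
x <- data.frame(id = letters[1:3], val0 = 1:3, val1 = 4:6, val2 = 7:9)

# Numeric columns as a matrix, with the ids as row names
counts <- as.matrix(x[, 2:4])
rownames(counts) <- x$id

# margin = 1 divides every row by its row total
props <- prop.table(counts, margin = 1)
print(rowSums(props))  # every row sums to 1

# barplot stacks the rows of its input within each column,
# so transpose to get one bar per id
barplot(t(props), legend.text = TRUE, ylab = "Proportion")
```

Because every row of props sums to 1, all bars have the same height, as requested in the question.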
