Hello stackoverflow community, I have a question regarding coding for ggplot. Here is my code, data format and output at the moment and below is my question.
Data format:
ID time var1 var2 var3
a 1 2 3 4
a 5 6 7 8
b 9 11 12 13
b 14 15 16 17
c . . . .
c . . . .
and so forth
Code:
gg1 <- ggplot() + geom_line(aes(x=TIME, y=Var1, col="red"), FILE) +
geom_line(aes(x=TIME, y=Var2, col="blue"), FILE) +
geom_point(aes(x=TIME, y=Var3), Model_20160806) + facet_wrap( ~ ID)+
xlab("Time (Hr)") + ylab("Concentration (ng/ml)") + ggtitle("x")
I have been struggling in making the plots in the right format and any help would be very much appreciated.
As you can see, the col="red/blue" is displayed as the legend rather than the color? Is there a way to fix it?
How do I add legends for Var1, Var2, Var3 on the bottom of the output?
I have tried adding , facet_wrap( ~ ID, ncol=3) into the code but it doesn't work and provided a null. Is there a way to fix this?
Since there are a lot of cell samples, is there a way to make the graphs onto multiple pages so the graphs are visible and interpretable
Lastly, for better visualization of the transfection data, I tried using gg1+theme_bw(), but this does not work.
Without a reproducible example it is difficult to help you with these questions.
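aes(..., col="blue") doesn't work the way you expect. Inside aes(), values are mapped to an aesthetic, so "blue" is treated as a category label (which is why it shows up in the legend) rather than as a colour. If you have a grouping variable in the dataframe, use that to define color. If you want everything to be just blue, define color outside of aes().
Something like scale_colour_manual(values=c("red","green","blue")), together with theme(legend.position = "bottom") to move the legend to the bottom. Possible duplicate of the question "Add legend to ggplot2 line plot".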
Could you explain what you want to do with facet_wrap( ~ ID, ncol=3)?
Yes, that is possible. The easiest way to make multiple graphs is by splitting your x into groups of 10.
Again a reason why you need a reproducible example. The short answer is, theme_bw() works for me and I have no clue why it wouldn't work for you.
For example:
library(ggplot2)
data("diamonds") # the diamonds dataset ships with ggplot2
ggplot(diamonds, aes(x = carat, y = cut, color = color)) +
geom_point() +
theme_bw()
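For points 1 and 2 (colour inside aes() and a legend at the bottom), here is a minimal sketch. The data frame dat is a hypothetical stand-in for the asker's data, with column names taken from the table in the question; the colour labels "Var1", "Var2" and "Var3" are just legend keys that scale_colour_manual() then maps to real colours.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data (column names follow the question)
set.seed(1)
dat <- data.frame(
  ID   = rep(c("a", "b", "c"), each = 5),
  time = rep(1:5, times = 3),
  var1 = rnorm(15, mean = 10),
  var2 = rnorm(15, mean = 12),
  var3 = rnorm(15, mean = 11)
)

p <- ggplot(dat, aes(x = time)) +
  # Strings inside aes() become legend keys, not colours
  geom_line(aes(y = var1, colour = "Var1")) +
  geom_line(aes(y = var2, colour = "Var2")) +
  geom_point(aes(y = var3, colour = "Var3")) +
  facet_wrap(~ ID, ncol = 3) +
  # Map each legend key to the colour it should be drawn in
  scale_colour_manual(name = NULL,
                      values = c(Var1 = "red", Var2 = "blue", Var3 = "black")) +
  labs(x = "Time (Hr)", y = "Concentration (ng/ml)") +
  theme_bw() +
  theme(legend.position = "bottom")
p
```

Reshaping the data to long format and mapping colour to the variable name would give the same legend with less repetition.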
Edit: to give an example of splitting the dataframe into groups of 10:
# Example data
df = data.frame(x = factor(rep(1:30, each = 10)), y1 = rnorm(300), y2 = rnorm(300))
# Assume that df$x is the grouping variable consisting of too many groups
# Groups 1-10 become 0, groups 11-20 become 1, etc.
df$x2 = (as.numeric(df$x) - 1) %/% 10
# Split the dataframe based on this new grouping variable df$x2
dfSplit = split(df, df$x2)
# Loop over dfSplit
for (i in seq_along(dfSplit)) {
  dfForPlotting = dfSplit[[i]]
  # do plotting stuff; inside a loop the plot must be print()ed explicitly
  print(ggplot(data = dfForPlotting, aes(x = y1, y = y2, color = x)) + geom_line())
}
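To get the groups onto separate pages, one option is to open a pdf() device and print one plot per group of facets. This is only a sketch: the example data, the column t, and the file name facets_by_page.pdf are made up for illustration.

```r
library(ggplot2)

# Hypothetical example data: 30 groups, 10 time points each
set.seed(42)
df <- data.frame(
  x = factor(rep(1:30, each = 10)),
  t = rep(1:10, times = 30),
  y = rnorm(300)
)

# Assign every 10 consecutive groups to one page (pages 0, 1, 2)
df$page <- (as.integer(df$x) - 1) %/% 10
pages <- split(df, df$page)

# Hypothetical output file; each print() call starts a new PDF page
out_file <- file.path(tempdir(), "facets_by_page.pdf")
pdf(out_file, width = 10, height = 7)
for (d in pages) {
  print(
    ggplot(d, aes(x = t, y = y)) +
      geom_line() +
      facet_wrap(~ x, ncol = 5)
  )
}
dev.off()
```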
Regarding question 2, the easiest way to do this is using the grid package and grid.text().
library(grid)
par(mar=c(6.5, 2, 2, 2))
plot(1:10,1:10)
grid.text(x=0.2, y = 0.05, "Var1 = Birds, Var2 = Bees")
I'm currently trying to develop a surface plot that examines the results of the below data frame. I want to plot the increasing values of noise on the x-axis and the increasing values of mu on the y-axis, with the point estimate values on the z-axis. After looking at ggplot2 and ggplotly, it's not clear how I would plot each of these columns in surface or 3D plot.
df <- "mu noise0 noise1 noise2 noise3 noise4 noise5
1 1 0.000000 0.9549526 0.8908646 0.919630 1.034607
2 2 1.952901 1.9622004 2.0317115 1.919011 1.645479
3 3 2.997467 0.5292921 2.8592976 3.034377 3.014647
4 4 3.998339 4.0042379 3.9938346 4.013196 3.977212
5 5 5.001337 4.9939060 4.9917115 4.997186 5.009082
6 6 6.001987 5.9929932 5.9882173 6.015318 6.007156
7 7 6.997924 6.9962483 7.0118066 6.182577 7.009172
8 8 8.000022 7.9981131 8.0010066 8.005220 8.024569
9 9 9.004437 9.0066182 8.9667536 8.978415 8.988935
10 10 10.006595 9.9987245 9.9949733 9.993018 10.000646"
Thanks in advance.
Here's one way using geom_tile(). First, you will want to get your data frame into more of a Tidy format, where the goal is to have columns:
mu: nothing changes here
noise: need to combine your "noise0", "noise1", ... columns together, and
z: serves as the value of the noise and we will apply the fill= aesthetic using this column.
To do that, I'm using dplyr together with gather() from tidyr, but there are other ways (melt(), or pivot_longer() gets you that too). I'm also adding some code to pull out just the number portion of the "noise" columns and then reformatting that as an integer to ensure that you have x and y axes as numeric/integers:
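# assumes that df is your data as data.frame
library(dplyr)
library(tidyr)
# stack the noise0, noise1, ... columns into key/value pairs
df <- df %>% gather(key="noise", value="z", -mu)
# split "noise3" after the 5th character into "noise" and "3", keeping only the number
df <- df %>% separate(col = "noise", into=c('x', "noise"), sep=5) %>% select(-x)
df$noise <- as.integer(df$noise)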
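For comparison, the same reshaping can be sketched with pivot_longer(). The small data frame wide and its values are hypothetical; only the column naming (mu, noise0, noise1, ...) follows the question.

```r
library(tidyr)

# Hypothetical wide data with the same column layout as in the question
wide <- data.frame(
  mu     = 1:3,
  noise0 = c(1.0, 2.0, 3.0),
  noise1 = c(0.9, 2.1, 2.9),
  noise2 = c(1.2, 1.8, 3.3)
)

# Stack every column except mu; names_prefix strips the leading "noise"
long <- pivot_longer(
  wide,
  cols         = -mu,
  names_to     = "noise",
  names_prefix = "noise",
  values_to    = "z"
)

# The remaining part of the name is still a character, so convert it
long$noise <- as.integer(long$noise)
long
```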
Here's an example of how you could plot it, but aesthetics are up to you. I decided to also include geom_text() to show the actual values of df$z so that we can see better what's going on. Also, I'm using rainbow because "it's pretty" - you may want to choose a more appropriate quantitative comparison scale from the RColorBrewer package.
ggplot(df, aes(x=noise, y=mu, fill=z)) + theme_bw() +
geom_tile() +
geom_text(aes(label=round(z, 2))) +
scale_fill_gradientn(colors = rainbow(5))
EDIT: To answer OP's follow up, yes, you can also showcase this via plotly. Here's a direct transition:
library(plotly)
p <- plot_ly(
df, x= ~noise, y= ~mu, z= ~z,
type='mesh3d', intensity = ~z,
colors= colorRamp(rainbow(5))
)
p
Static image here:
A much more informative way to show this particular set of information is to see the variation of df$z as it relates to df$mu by creating df$delta_z and then using that to plot. (you can also plot via ggplot() + geom_tile() as above):
df$delta_z <- df$z - df$mu
p1 <- plot_ly(
df, x= ~noise, y= ~mu, z= ~delta_z,
type='mesh3d', intensity = ~delta_z,
colors= colorRamp(rainbow(5))
)
Giving you this (static image here):
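If a continuous surface is preferred over a mesh, plotly's "surface" trace type takes z as a matrix instead of long-format columns. A minimal sketch with made-up values (the matrix z below is hypothetical, not the OP's data):

```r
library(plotly)

# Hypothetical grid: rows of z index mu (y axis), columns index noise (x axis)
mu    <- 1:4
noise <- 0:2
z     <- outer(mu, noise, function(m, n) m + 0.1 * n)

p_surface <- plot_ly(x = noise, y = mu, z = z, type = "surface")
p_surface
```

With the long data frame from above, the matrix could be rebuilt by spreading z back out to one column per noise level.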
ggplot accepts data in the long format, which means that you need to melt your dataset using, for example, a function from the reshape2 package:
dfLong = melt(df,
id.vars = "mu",
variable.name = "noise",
value.name = "meas")
The resulting column noise contains entries such as noise0, noise1, etc. You can extract the numbers and convert to a numeric column:
dfLong$noise = with(dfLong, as.numeric(gsub("noise", "", noise)))
This converts your data to:
mu noise meas
1 1 0 1.0000000
2 2 0 2.0000000
3 3 0 3.0000000
...
As per ggplot documentation:
ggplot2 can not draw true 3D surfaces, but you can use geom_contour(), geom_contour_filled(), and geom_tile() to visualise 3D surfaces in 2D.
So, for example:
ggplot(dfLong,
aes(x = noise,
y = mu,
fill = meas)) +
geom_tile() +
scale_fill_gradientn(colours = terrain.colors(10))
Produces:
I am in a bit of a struggle and I can't find a solution (it should be very simple)
my Code is this
df
Ch1 V1 V2 ID
A a1 a2 1
B b1 b2 2
C a1 b2 1
D d1 d2 3
...
in total we have values ranging from 1 to 9.
I simply want to plot how often 1(,2,3,...,9) occurs in this data frame. My code is this
ggplot(df,aes(ID))+ # because I read that leaving out the y value makes ggplot count the occurrences
geom_bar()
This works but unfortunately I get this as a result
I want all values to be displayed though.
I tried to modify this with scale_x_continuous
but it didn't work (made the whole x-axis go away and display only 1)
I know I can also create a table = table(df)
But I want to find a universal solution. Because later I want to be able to apply this while making several bars per x-axis value with dependency on V1 or V2 ...
Thank you very much for your help!
According to the OP, the intention is to create
several bars per x-axis value with dependency on V1 or V2
This can be solved either by using fill = V1 and position = "dodge", as already suggested by H 1, or by faceting. Both approaches have their merits depending on the aspect the OP wants to focus on.
Note that in all variants ID is turned into a discrete variable (using factor()), and the default axis title is overridden, to solve the issue with labeling the x-axis.
Dodged position
library(ggplot2)
ggplot(df) +
aes(x = factor(ID), fill = V1) +
geom_bar(position = "dodge") +
xlab("ID")
This is good if the focus is on comparing the differences between V1 within each ID value.
Facets
library(ggplot2)
ggplot(df) +
aes(x = factor(ID), fill = V1) +
geom_bar() +
xlab("ID") +
facet_wrap(~ V1, nrow = 1L)
Here, the focus is on comparing the distribution of ID counts within each V1.
Colouring the bars in addition to faceting is redundant (but I find it aesthetically more pleasing as compared to all-black bars).
Data
As there were no reproducible data supplied in the question, I have tried to simulate the data by
nr <- 1000L
set.seed(123L) # required to reproduce the data
df <- data.frame(Ch1 = sample(LETTERS[1:4], nr, TRUE),
V1 = paste0(sample(letters[1:4], nr, TRUE), "1"),
V2 = paste0(sample(letters[1:4], nr, TRUE), "2"),
ID = pmin(1L + rgeom(nr, 0.3), 9L)
)
"Raw" plot for comparison with OP's chart
library(ggplot2)
ggplot(df) +
aes(x = ID) +
geom_bar()
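If ID should stay numeric rather than being converted with factor(), an alternative sketch is to set the axis breaks explicitly. The data here are simulated the same way as above, only smaller.

```r
library(ggplot2)

# Simulated IDs between 1 and 9, as in the data section above
set.seed(123L)
df_num <- data.frame(ID = pmin(1L + rgeom(200L, 0.3), 9L))

# Keep ID continuous but ask for a tick at every integer from 1 to 9
p_num <- ggplot(df_num) +
  aes(x = ID) +
  geom_bar() +
  scale_x_continuous(breaks = 1:9)
p_num
```

This labels every value on the axis, but fill = V1 with position = "dodge" still reads more naturally with a discrete x.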
I have currently encountered a phenomenon in ggplot2, and I would be grateful if someone could provide me with an explanation.
I needed to plot a continuous variable on a histogram, and I needed to represent two categorical variables on the plot. The following dataframe is a good example.
library(ggplot2)
species <- rep(c('cat', 'dog'), 30)
numb <- rep(c(1,2,3,7,8,10), 10)
groups <- rep(c('A', 'A', 'B', 'B'), 15)
data <- data.frame(species=species, numb=numb, groups=groups)
Let the following code represent the categorisation of a continuous variable.
data$factnumb <- as.factor(data$numb)
If I would like to plot this dataset, the following two code snippets are completely interchangeable:
Note the difference after the fill= statement.
p <- ggplot(data, aes(x=factnumb, fill=species)) +
facet_grid(groups ~ .) +
geom_bar(aes(y=(..count..)/sum(..count..))) +
scale_y_continuous(labels = scales::percent)
plot(p):
q <- ggplot(data, aes(x=factnumb, fill=data$species)) +
facet_grid(groups ~ .) +
geom_bar(aes(y=(..count..)/sum(..count..))) +
scale_y_continuous(labels = scales::percent)
plot(q):
However, when working with real-life continuous variables, not all categories will contain observations, and I still need to represent the empty categories on the x-axis in order to get an approximation of the sample distribution. To demonstrate this, I used the following code:
data_miss <- data[which(data$numb!= 3),]
This results in a disparity between the levels of the categorical variable and the observations in the dataset:
> unique(data_miss$factnumb)
[1] 1 2 7 8 10
Levels: 1 2 3 7 8 10
And plotted the data_miss dataset, still including all of the levels of the factnumb variable.
pm <- ggplot(data_miss, aes(x=factnumb, fill=species)) +
facet_grid(groups ~ .) +
geom_bar(aes(y=(..count..)/sum(..count..))) +
scale_fill_discrete(drop=FALSE) +
scale_x_discrete(drop=FALSE)+
scale_y_continuous(labels = scales::percent)
plot(pm):
qm <- ggplot(data_miss, aes(x=factnumb, fill=data_miss$species)) +
facet_grid(groups ~ .) +
geom_bar(aes(y=(..count..)/sum(..count..))) +
scale_x_discrete(drop=FALSE)+
scale_fill_discrete(drop=FALSE) +
scale_y_continuous(labels = scales::percent)
plot(qm):
In this case, when using fill=data_miss$species the filling of the plot changes (and for the worse).
I would be really happy if someone could clear this one up for me.
Is it just "luck", that in case of plot 1 and 2 the filling is identical, or I have stumbled upon some delicate mistake in the fine machinery of ggplot2?
Thanks in advance!
Kind regards,
Bernadette
Using data$variable inside aes() is never a good idea and is not recommended. Sometimes it still works, but aes(variable) always works, so you should always use aes(variable).
More explanation:
ggplot uses nonstandard evaluation. A standard-evaluating R function looks names up in the calling environment (here, the global environment), not inside a data frame. If I have data named mydata with a column named col1, and I do mean(col1), I get an error:
mydata = data.frame(col1 = 1:3)
mean(col1)
# Error in mean(col1) : object 'col1' not found
This error happens because col1 isn't in the global environment. It's just a column name of the mydata data frame.
The aes function does extra work behind the scenes, and knows to look at the columns of the layer's data, in addition to checking the global environment.
ggplot(mydata, aes(x = col1)) + geom_bar()
# no error
You don't have to use just a column inside aes though. To give flexibility, you can do a function of a column, or even some other vector that you happen to define on the spot (if it has the right length):
# these work fine too
ggplot(mydata, aes(x = log(col1))) + geom_bar()
ggplot(mydata, aes(x = c(1, 8, 11))) + geom_bar()
So what's the difference between col1 and mydata$col1? Well, col1 is the name of a column, and mydata$col1 is the actual values. ggplot will look for a column in your data named col1 and use that. mydata$col1 is just a vector: the full column, detached from the data. The difference matters because ggplot often does data manipulation. Whenever there are facets or aggregate functions, ggplot splits your data into pieces and works on each piece. To do this effectively, it needs to be able to identify the data and the column names. When you give it mydata$col1, you're not giving it a column name, you're just giving it a vector of values (whatever happens to be in that column), so ggplot can't split it along with the rest of the data, and things can go wrong.
So, just use unquoted column names in aes() without data$ and everything will work as expected.
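A common reason for reaching for data$column is that the column name is stored in a string. For that case ggplot2 provides the .data pronoun, which still refers to the layer's data and therefore keeps working with facets. A minimal sketch using the data from the question (fill_col is a hypothetical variable name):

```r
library(ggplot2)

# Data as constructed in the question
species <- rep(c("cat", "dog"), 30)
numb    <- rep(c(1, 2, 3, 7, 8, 10), 10)
groups  <- rep(c("A", "A", "B", "B"), 15)
data    <- data.frame(species = species, numb = numb, groups = groups)
data$factnumb <- as.factor(data$numb)

# Column name held in a string (hypothetical variable)
fill_col <- "species"

# .data[[fill_col]] is looked up in each facet's subset of the data,
# unlike data$species, which is always the full, unsplit vector
p_fill <- ggplot(data, aes(x = factnumb, fill = .data[[fill_col]])) +
  geom_bar() +
  facet_grid(groups ~ .) +
  labs(fill = fill_col)
p_fill
```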
Before I get to my question, I should point out that I am new in R, and this question might be simplicity itself for an experienced user.
I want to use ggplot2 to take full advantage of all the functionalities therein. However, I have encountered a problem that I have not been able to solve.
If I have a data frame as follows:
df = as.data.frame(cbind(rnorm(100,35:65),rnorm(100,25:35),rnorm(100,15:20),rnorm(100,5:10),rnorm(100,0:5)))
header = c("A","B","C","D","E")
names(df) = make.names(header)
Plotting the data, where rows are Y and X is columns can readily be done in base R like e.g. this:
par(mfrow=c(2,1))
stripchart(df, vertical = TRUE, method = 'jitter')
boxplot(df)
The picture shows the stripchart & boxplot of the data
However, the same cannot readily be done in ggplot2, as x and y inputs are required. All the examples I have found plot one column against another, or first process the data into that column format. Yet I want to set y as the rows in my df and x as the columns. How can this be accomplished?
You'll need to reshape your data in order to get those graphs. I think this is what you're looking for:
> library(ggplot2)
> library(reshape2)
> df = as.data.frame(cbind(rnorm(100,35:65),rnorm(100,25:35),rnorm(100,15:20),rnorm(100,5:10),rnorm(100,0:5)))
> header = c("A","B","C","D","E")
> names(df) = make.names(header)
> df = melt(df)
No id variables; using all as measure variables
> head(df)
variable value
1 A 36.75505
2 A 35.68714
3 A 36.44952
4 A 38.77236
5 A 39.79136
6 A 39.39672
> ggplot(df, aes(x = variable, y = value))
> ggplot(df, aes(x = variable, y = value)) + geom_boxplot()
> ggplot(df, aes(x = variable, y = value)) + geom_point(shape = 0, size = 20)
Here is the box plot:
Here is the strip chart:
You can change the settings in aes() options. See here for more info.
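For a strip chart closer to base R's stripchart(method = 'jitter'), geom_jitter() can be layered over the box plot. This sketch uses base R's stack() instead of melt() for the reshaping; the data frame wide and its means are hypothetical.

```r
library(ggplot2)

# Hypothetical wide data: one column per group
set.seed(1)
wide <- data.frame(
  A = rnorm(100, mean = 50),
  B = rnorm(100, mean = 30),
  C = rnorm(100, mean = 17)
)

# stack() returns two columns: 'values' (the numbers) and 'ind' (the column name)
long <- stack(wide)

p_strip <- ggplot(long, aes(x = ind, y = values)) +
  geom_boxplot(outlier.shape = NA) +       # hide outliers; the jittered points show them
  geom_jitter(width = 0.15, alpha = 0.5) + # horizontal jitter only
  labs(x = "variable", y = "value")
p_strip
```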
So I have a data frame which I will call DF. It looks something like this:
zep SEX AGE BMI
1 O F 3.416667 16.00000
2 O F 3.833333 14.87937
3 O G 3.416667 14.80223
4 O F 4.000000 15.09656
5 N G 3.666667 16.50000
6 O G 4.000000 16.49102
7 N G 3.916667 16.02413
With this data frame I want to plot multiple box plots comparing different aspects, like how gender affects BMI. Like so:
par(mfrow=c(1,3))
boxplot(DF$BMI ~ DF$zep)
boxplot(DF$BMI ~ DF$SEX)
boxplot(DF$BMI ~ DF$AGE)
But for some reason the columns are stored as characters instead of factors.
So my question is: is there a way to plot these if they are characters? If not, what can I do?
Also is there a way maybe to change zep and sex into a vector of logical factors? Maybe like in zep if O then true (1) if not then false (0), and the same thing for SEX. If G then true (1) if not then false (0).
I have to plot categorical variables for my advanced data analysis class, so I can help you out. beed stands for border entry and employment data; don't steal my research plz.
The code I use to create factors is shown below, for example. I have a column called portname that holds dummy variables, and I use it to create a column with factor variables (the names). This is how I would make the logical you describe. I've added that code to the larger code chunk below.
beed$portdisc <- as.numeric(beed$portname)
beed$portdisc[beed$portdisc==0] <- "Columbus Port of Entry"
beed$portdisc[beed$portdisc==1] <- "Santa Teresa Port of Entry"
beed$portdisc[beed$portdisc==2] <- "New Mexico All Ports Aggregate"
So what I've done here is take my dataframe beed and use the specific column containing my portname variables. I add a new column to my dataframe called beed$portdisc; then, using [ ], I define what I want to label as what.
In your case I think this should work (I think; I've tested it using the data you provided).
I have a hard time making the labels come out right with discrete variables. My apologies but this gets you very close.
library(ggplot2)
# Recode SEX as a factor with levels "0" (G) and "1" (F)
DF$SEX.factor <- as.character(DF$SEX)
DF$SEX.factor[DF$SEX.factor== "G"] <- "0"
DF$SEX.factor[DF$SEX.factor== "F"] <- "1"
DF$SEX.factor <- as.factor(DF$SEX.factor)
# Bar chart of counts per level; data must be the data frame, not a single column
bar <- ggplot()
bar <- bar + geom_bar(data = DF, aes(x = SEX.factor), width = 0.5) + xlab("Sex")
# Label the levels with the original codes; substitute descriptive names as needed
bar <- bar + scale_x_discrete(limits = c("0", "1"), labels = c("G", "F"))
bar
# DF$BMI5 = cut(DF$BMI,pretty(DF$BMI,5)) # Creates close to 5 integer ranges as factors, automatically chooses pretty scales.
# This would be good to compare, say, age and BMI; best with one discrete and one continuous variable
p <- ggplot(DF, aes(x = SEX.factor, y = BMI))
p <- p + geom_boxplot(width = 0.25, alpha = 0.4)
p <- p + geom_jitter(position = position_jitter(width = 0.1), alpha = .35, color = "blue")
# diamond at mean for each group (fun replaces the deprecated fun.y argument)
p <- p + stat_summary(fun = mean, geom = "point", shape = 18, size = 6,
colour = "red", alpha = 0.8)
p <- p + scale_x_discrete(limits = c("0", "1"), labels = c("G", "F")) + xlab("Sex")
p
Here is what I got when I ran this code on my own data. I think this is what you're looking to create; I've included the code above. It'll work with anything where x is a discrete variable: just use as.factor() on x and keep y continuous.
If you need any more help just let me know. I like to help out people on here because it helps me hone my R skills. I'm more of a Visual Studio kind of guy; VBA is my friend.
Hope this helps!
If you ever need to change a character to a factor, you can always use as.factor('A'), for instance.
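For the logical columns asked about in the question, a comparison is enough; no recoding through "0"/"1" strings is needed. A minimal sketch using the seven rows shown in the question (the new column names zep_is_O and SEX_is_G are made up for illustration):

```r
# The seven rows shown in the question, with character columns
DF <- data.frame(
  zep = c("O", "O", "O", "O", "N", "O", "N"),
  SEX = c("F", "F", "G", "F", "G", "G", "G"),
  BMI = c(16.00000, 14.87937, 14.80223, 15.09656, 16.50000, 16.49102, 16.02413),
  stringsAsFactors = FALSE
)

# Logical columns: TRUE where the condition holds, FALSE otherwise
DF$zep_is_O <- DF$zep == "O"
DF$SEX_is_G <- DF$SEX == "G"

# Convert the character columns to factors for plotting
DF$zep <- factor(DF$zep)
DF$SEX <- factor(DF$SEX)

# The formula interface with data = avoids repeating DF$
boxplot(BMI ~ SEX, data = DF)
```

as.integer(DF$SEX_is_G) gives the 1/0 coding if numbers are needed instead of TRUE/FALSE.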