I have two functions, a and b, that each take a value of x from 1-3 and produce an estimate and an error.
x variable estimate error
1 a 8 4
1 b 10 2
2 a 9 3
2 b 10 1
3 a 8 5
3 b 11 3
I'd like to use geom_path() in ggplot to plot the estimates and errors for each function as x increases.
So if this is the data:
d = data.frame(x=c(1,1,2,2,3,3),variable=rep(c('a','b'),3),estimate=c(8,10,9,10,8,11),error=c(4,2,3,1,5,3))
Then the output that I'd like is something like the output of:
ggplot(d,aes(x,estimate,color=variable)) + geom_path()
but with the thickness of the line at each point equal to the size of the error. I might need to use something like geom_polygon(), but I haven't been able to find a good way to do this without calculating a series of coordinates manually.
If there's a better way to visualize this data (y value with confidence intervals at discrete x values), that would be great. I don't want to use a bar graph because I actually have more than two functions and it's hard to track the changing estimate/error of any specific function with a large group of bars at each x value.
The short answer is that you need to map size to error, so that the size of each geometric object varies with the value of error. As you suggested, there are many ways to do this; here is one:
df = data.frame(x = c(1,1,2,2,3,3),
variable = rep(c('a','b'), 3),
estimate = c(8,10,9,10,8,11),
error = c(4,2,3,1,5,3))
library(ggplot2)
ggplot(df, aes(x, estimate, colour = variable, group = variable, size = error)) +
geom_point() + theme(legend.position = 'none') + geom_line(size = .5)
I found geom_ribbon(). The answer is something like this:
ggplot(d,aes(x,estimate,ymin=estimate-error,ymax=estimate+error,fill=variable)) + geom_ribbon()
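Here is a sketch that builds on geom_ribbon(). It draws a semi-transparent band from estimate - error to estimate + error for each function, with the estimate itself drawn as a line and points. The alpha value and the extra line and point layers are illustrative choices, not part of the original answer.

```r
library(ggplot2)

d <- data.frame(x = c(1, 1, 2, 2, 3, 3),
                variable = rep(c("a", "b"), 3),
                estimate = c(8, 10, 9, 10, 8, 11),
                error = c(4, 2, 3, 1, 5, 3))

# The band spans estimate - error to estimate + error for each function;
# a low alpha keeps overlapping bands readable when there are many functions
p <- ggplot(d, aes(x, estimate, colour = variable, fill = variable)) +
  geom_ribbon(aes(ymin = estimate - error, ymax = estimate + error),
              alpha = 0.2, colour = NA) +
  geom_line() +
  geom_point()
print(p)
```

With many functions, geom_pointrange() combined with position = position_dodge(width = 0.3) is another way to show the same estimates and errors without overlapping bands.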
As an R beginner, there's one hurdle that I just can't find the answer to. I have a table that shows the number of responses to a question by gender.
Response Gender  n
1        1      84
1        2      79
2        1      42
2        2      74
3        1      84
3        2      79
etc.
I want to plot these in a column chart: on the y-axis I want n (or its proportions), and on the x-axis I want two separate bars: one for gender 1 and one for gender 2. It should look like the following example that I was given:
The example that I want to emulate
However, when I try to filter the columns according to gender inside aes(), it returns an error! Could anyone tell me why my approach is not working? And is there another practical way to filter the columns of the table that I have?
ggplot(table) +
geom_col(aes(x = select(filter(table, gender == 1), Q),
y = select(filter(table, gender == 1), n),
fill = select(filter(table, gender == 2), n), position = "dodge")
Maybe something like this:
library(dplyr)  # for the %>% pipe
library(RColorBrewer)
library(ggplot2)
df %>%
  ggplot(aes(x = factor(Response), y = n, fill = factor(Gender))) +
  geom_col(position = position_dodge()) +
  scale_fill_brewer(palette = "Set1") +
  theme_light()
Your approach does not work because you are assigning the x and y variables as if they came from two different datasets (one for x and one for y). In line with TarJae's solution, think of the aesthetics as the axes of the chart: map the categorical variable you are comparing to the x-axis, and the numeric variable that determines the height of the bars to the y-axis. Finally, to compare the groups by colour, map your grouping variable to fill (as I do here), so that each group gets its own colour.
library(dplyr)   ## For piping
library(ggplot2) ## For plotting
df %>%
  ggplot(aes(x = Response, y = n, fill = as.character(Gender))) +
  geom_bar(stat = "identity", position = "dodge")
I am using stat = "identity" because by default geom_bar() counts the occurrences in your data (i.e., it assumes your data are not yet aggregated); geom_col() is a shortcut for geom_bar(stat = "identity"). I am using position = "dodge" to keep the bars from being stacked. I recommend this resource for more information: https://r4ds.had.co.nz/index.html
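Since the question also mentions plotting proportions instead of n, here is a minimal sketch that computes each response's share within its gender before plotting. It uses only the six rows shown in the question, and the prop column name is made up for illustration.

```r
library(ggplot2)

# The six rows shown in the question
df <- data.frame(Response = c(1, 1, 2, 2, 3, 3),
                 Gender   = c(1, 2, 1, 2, 1, 2),
                 n        = c(84, 79, 42, 74, 84, 79))

# Share of each response within its gender: ave() returns the per-gender
# total on every row, so the proportions sum to 1 within each gender
df$prop <- df$n / ave(df$n, df$Gender, FUN = sum)

p <- ggplot(df, aes(x = factor(Response), y = prop, fill = factor(Gender))) +
  geom_col(position = position_dodge()) +
  labs(x = "Response", y = "Proportion within gender", fill = "Gender")
print(p)
```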
I'm currently trying to build a surface plot of the results in the data frame below. I want to plot increasing values of noise on the x-axis and increasing values of mu on the y-axis, with the point estimates on the z-axis. After looking at ggplot2 and ggplotly, it's not clear to me how to plot these columns as a surface or 3D plot.
df <- "mu noise0 noise1 noise2 noise3 noise4 noise5
1 1 0.000000 0.9549526 0.8908646 0.919630 1.034607
2 2 1.952901 1.9622004 2.0317115 1.919011 1.645479
3 3 2.997467 0.5292921 2.8592976 3.034377 3.014647
4 4 3.998339 4.0042379 3.9938346 4.013196 3.977212
5 5 5.001337 4.9939060 4.9917115 4.997186 5.009082
6 6 6.001987 5.9929932 5.9882173 6.015318 6.007156
7 7 6.997924 6.9962483 7.0118066 6.182577 7.009172
8 8 8.000022 7.9981131 8.0010066 8.005220 8.024569
9 9 9.004437 9.0066182 8.9667536 8.978415 8.988935
10 10 10.006595 9.9987245 9.9949733 9.993018 10.000646"
Thanks in advance.
Here's one way using geom_tile(). First, you will want to get your data frame into a tidy (long) format, where the goal is to have the columns:
mu: nothing changes here
noise: need to combine your "noise0", "noise1", ... columns together, and
z: holds the values from those columns; we will apply the fill= aesthetic using this column.
To do that, I'm using dplyr together with tidyr's gather() and separate(), but there are other ways (reshape2's melt() or tidyr's pivot_longer() get you there too). I'm also adding some code to pull out just the number portion of the "noise" column names and convert it to an integer, so that both the x and y axes are numeric:
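library(dplyr)  # for %>% and select()
library(tidyr)  # for gather() and separate()
# assumes that df is your data as a data.frame, e.g. df <- read.table(text = df, header = TRUE)
df <- df %>% gather(key = "noise", value = "z", -mu)
# split "noise3" after its 5th character into "noise" and "3", then drop the text part
df <- df %>% separate(col = "noise", into = c("x", "noise"), sep = 5) %>% select(-x)
df$noise <- as.integer(df$noise)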
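Since gather() is superseded in current tidyr, here is a sketch of the same reshaping with pivot_longer(), using only the first three rows of the question's data to stay short. names_prefix strips the "noise" text and names_transform converts what remains to an integer, which replaces the separate() and as.integer() steps.

```r
library(tidyr)

# First three rows of the question's data
df <- read.table(text = "mu noise0 noise1 noise2 noise3 noise4 noise5
1 1 0.000000 0.9549526 0.8908646 0.919630 1.034607
2 2 1.952901 1.9622004 2.0317115 1.919011 1.645479
3 3 2.997467 0.5292921 2.8592976 3.034377 3.014647", header = TRUE)

long <- pivot_longer(df, cols = -mu,
                     names_to = "noise",
                     names_prefix = "noise",                    # "noise3" -> "3"
                     names_transform = list(noise = as.integer),
                     values_to = "z")
print(long)
```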
Here's an example of how you could plot it, but aesthetics are up to you. I decided to also include geom_text() to show the actual values of df$z so that we can see better what's going on. Also, I'm using rainbow because "it's pretty" - you may want to choose a more appropriate quantitative comparison scale from the RColorBrewer package.
library(ggplot2)
ggplot(df, aes(x = noise, y = mu, fill = z)) + theme_bw() +
geom_tile() +
geom_text(aes(label=round(z, 2))) +
scale_fill_gradientn(colors = rainbow(5))
EDIT: To answer OP's follow up, yes, you can also showcase this via plotly. Here's a direct transition:
library(plotly)
p <- plot_ly(
df, x= ~noise, y= ~mu, z= ~z,
type='mesh3d', intensity = ~z,
colors= colorRamp(rainbow(5))
)
p
A much more informative way to show this particular set of information is to look at how df$z deviates from df$mu: create df$delta_z = z - mu and plot that instead (you can also plot it with ggplot() + geom_tile() as above):
df$delta_z <- df$z - df$mu
p1 <- plot_ly(
df, x= ~noise, y= ~mu, z= ~delta_z,
type='mesh3d', intensity = ~delta_z,
colors= colorRamp(rainbow(5))
)
p1
ggplot expects data in long format, which means that you need to melt your dataset, for example with melt() from the reshape2 package:
library(reshape2)
dfLong = melt(df,
id.vars = "mu",
variable.name = "noise",
value.name = "meas")
The resulting column noise contains entries such as noise0, noise1, etc. You can extract the numbers and convert to a numeric column:
dfLong$noise = with(dfLong, as.numeric(gsub("noise", "", noise)))
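If you would rather not depend on reshape2, base R's stack() can do the same reshaping. This is a swapped-in technique, not the answer's method; the sketch uses the first three rows of the question's data and should reproduce the first rows of the output below.

```r
# First three rows of the question's data
df <- read.table(text = "mu noise0 noise1 noise2 noise3 noise4 noise5
1 1 0.000000 0.9549526 0.8908646 0.919630 1.034607
2 2 1.952901 1.9622004 2.0317115 1.919011 1.645479
3 3 2.997467 0.5292921 2.8592976 3.034377 3.014647", header = TRUE)

# stack() puts all noise columns one after another in 'values' and records
# the source column name in the factor 'ind'
s <- stack(df[, -1])
dfLong <- data.frame(mu    = rep(df$mu, times = ncol(df) - 1),
                     noise = as.numeric(sub("noise", "", as.character(s$ind))),
                     meas  = s$values)
print(head(dfLong, 4))
```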
This converts your data to:
mu noise meas
1 1 0 1.0000000
2 2 0 2.0000000
3 3 0 3.0000000
...
As per ggplot documentation:
ggplot2 can not draw true 3D surfaces, but you can use geom_contour(), geom_contour_filled(), and geom_tile() to visualise 3D surfaces in 2D.
So, for example:
library(ggplot2)
ggplot(dfLong,
       aes(x = noise,
           y = mu,
           fill = meas)) +
  geom_tile() +
  scale_fill_gradientn(colours = terrain.colors(10))
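Outside ggplot2, base R's persp() can draw a true 3D surface, provided the estimates are arranged as a matrix with one row per x value and one column per y value. Here is a sketch using the question's wide data, so no reshaping is needed. The viewing angles theta and phi and the colour are arbitrary choices.

```r
# The question's data in wide form: one row per mu, one column per noise level
df <- read.table(text = "mu noise0 noise1 noise2 noise3 noise4 noise5
1 1 0.000000 0.9549526 0.8908646 0.919630 1.034607
2 2 1.952901 1.9622004 2.0317115 1.919011 1.645479
3 3 2.997467 0.5292921 2.8592976 3.034377 3.014647
4 4 3.998339 4.0042379 3.9938346 4.013196 3.977212
5 5 5.001337 4.9939060 4.9917115 4.997186 5.009082
6 6 6.001987 5.9929932 5.9882173 6.015318 6.007156
7 7 6.997924 6.9962483 7.0118066 6.182577 7.009172
8 8 8.000022 7.9981131 8.0010066 8.005220 8.024569
9 9 9.004437 9.0066182 8.9667536 8.978415 8.988935
10 10 10.006595 9.9987245 9.9949733 9.993018 10.000646", header = TRUE)

noise <- 0:5                     # numeric part of the noise0..noise5 column names
zmat <- t(as.matrix(df[, -1]))   # rows = noise levels, columns = mu values

# persp() needs increasing x and y, and z with dim = c(length(x), length(y))
persp(x = noise, y = df$mu, z = zmat,
      theta = 30, phi = 25, expand = 0.6, col = "lightblue",
      xlab = "noise", ylab = "mu", zlab = "estimate", ticktype = "detailed")
```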
I have two-dimensional data (from the lower triangle of a matrix):
m <- data.frame(x=c(1,1,1,2,2,3),y=c(1,2,3,1,2,1))
# x y
#1 1 1
#2 1 2
#3 1 3
#4 2 1
#5 2 2
#6 3 1
If I plot this, it gives something like this:
x
x x
x x x
So, I have the x and y axis. However, I'd like to plot this data in a ternary plot, like this:
  x
 x x
x x x
I need the z axis. It's the same data, but with another axis.
I don't think what you want is a ternary plot, though I am not at all sure why you are looking to move the data in this manner if you don't have a z-value. Take a look at this description of ternary plots and note that they, by definition, have three variables (and your data only have 2) and they must always sum to a constant (unless you mean yours are missing the last variable, which makes each row sum to some constant that you have not defined in your question?).
If you are just looking to shift the x-values, you can center them within each value of y, though only if the y values are discrete, as is the case here. This uses dplyr to do the modifications and centers each set of x values around 0 (it does not rescale by the standard deviation).
library(dplyr)
library(ggplot2)
m %>%
  group_by(y) %>%
  mutate(newX = as.numeric(scale(x, scale = FALSE))) %>%
  ggplot(aes(x = newX, y = y)) +
  geom_point()
I'm not sure what information you are getting from doing this, as you lose all ability to compare back to the original x scale this way, unless you add colour = factor(x) to the mapping.
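For reference, here is a base R sketch of the same per-group centring using ave(), which makes the shifted values easy to inspect. Colouring the points by the original x follows the colour = factor(x) suggestion in the previous paragraph.

```r
m <- data.frame(x = c(1, 1, 1, 2, 2, 3), y = c(1, 2, 3, 1, 2, 1))

# Subtract the mean of x within each level of y
# (the same result as scale(x, scale = FALSE) applied per group)
m$newX <- m$x - ave(m$x, m$y, FUN = mean)
print(m)

# Colour by the original x so the centred points can be traced back
plot(y ~ newX, data = m, col = m$x, pch = 19)
```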
If this is not what you are trying to do (and I rather hope that it is not), please update your question to clarify the output that you are expecting.
On the off chance that what you meant was that there was a z column missing which caused each row to sum to a particular constant, here is an example using ggtern to plot that, with the assumption that each row sums to 10 units:
library(dplyr)
library(ggtern)
m %>%
  mutate(z = 10 - (x + y)) %>%
  ggtern(aes(x = x, y = y, z = z)) +
  geom_point()
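To see how a ternary plot positions each row, here is a base R sketch of the usual barycentric projection, under the same assumption that z = 10 - (x + y). The corner placement (x at bottom left, y at bottom right, z at the top) is an arbitrary choice for illustration and may not match ggtern's layout.

```r
m <- data.frame(x = c(1, 1, 1, 2, 2, 3), y = c(1, 2, 3, 1, 2, 1))
m$z <- 10 - (m$x + m$y)   # assumption: each row sums to 10

# Convert to proportions, then weight the triangle's corners by them:
# x -> (0, 0), y -> (1, 0), z -> (0.5, sqrt(3)/2)
tot <- m$x + m$y + m$z
py <- m$y / tot
pz <- m$z / tot
m$cx <- py + pz / 2
m$cy <- pz * sqrt(3) / 2

# Triangle outline, corner labels, then the points
plot(c(0, 1, 0.5, 0), c(0, 0, sqrt(3) / 2, 0), type = "l", asp = 1,
     axes = FALSE, xlab = "", ylab = "")
text(c(0, 1, 0.5), c(0, 0, sqrt(3) / 2), labels = c("x", "y", "z"),
     pos = c(1, 1, 3), xpd = TRUE)
points(m$cx, m$cy, pch = 19)
```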
So I have a data frame which I will call R. Looks something like this:
zep SEX AGE BMI
1 O F 3.416667 16.00000
2 O F 3.833333 14.87937
3 O G 3.416667 14.80223
4 O F 4.000000 15.09656
5 N G 3.666667 16.50000
6 O G 4.000000 16.49102
7 N G 3.916667 16.02413
With this data frame I want to make multiple plots comparing different aspects, like how gender affects BMI. Like so:
par(mfrow = c(1, 3))
boxplot(DF$BMI ~ DF$zep)
boxplot(DF$BMI ~ DF$SEX)
boxplot(DF$BMI ~ DF$AGE)
But for some reason the columns are stored as characters instead of factors.
So my question is: is there a way to plot these if they are characters? If not, what can I do?
Also, is there a way to turn zep and SEX into logical (0/1) vectors? For example, for zep: if O then TRUE (1), otherwise FALSE (0); and the same for SEX: if G then TRUE (1), otherwise FALSE (0).
I have to plot categorical variables for my advanced data analysis class, so I can help you out. beed stands for border entry and employment data; don't steal my research plz.
For example, this is the code I use to create labels: I have a column called portname that holds numeric codes, and I create a new column holding the corresponding names. This is also how I would make the logical column you describe; I've added that code to the larger code chunk below.
beed$portdisc <- as.numeric(beed$portname)
beed$portdisc[beed$portdisc==0] <- "Columbus Port of Entry"
beed$portdisc[beed$portdisc==1] <- "Santa Teresa Port of Entry"
beed$portdisc[beed$portdisc==2] <- "New Mexico All Ports Aggregate"
So what I've done here is take my data frame beed and use the column containing my portname codes. I add a new column called beed$portdisc, and then, using [ ], I define which code gets which label.
In your case I think this should work (I've tested it using the data you provided).
I have a hard time making the labels come out right with discrete variables. My apologies but this gets you very close.
library(ggplot2)
# Recode SEX into a labelled factor; this assumes G = male (garçon) and F = female (fille)
DF$SEX.factor <- factor(DF$SEX, levels = c("G", "F"), labels = c("Male", "Female"))
bar <- ggplot(DF, aes(x = SEX.factor)) +
  geom_bar(width = 0.5) +
  xlab("Sex")
bar
# DF$BMI5 <- cut(DF$BMI, pretty(DF$BMI, 5), include.lowest = TRUE) # Creates about 5 ranges as factor levels; pretty() chooses round break points automatically.
# This would be good to compare, say, age and BMI; it works best with one discrete and one continuous variable
p <- ggplot(DF, aes(x = SEX.factor, y = BMI))
p <- p + geom_boxplot(width = 0.25, alpha = 0.4)
p <- p + geom_jitter(position = position_jitter(width = 0.1), alpha = .35, color = "blue")
# diamond at mean for each group
p <- p + stat_summary(fun = mean, geom = "point", shape = 18, size = 6,
                      colour = "red", alpha = 0.8)
p <- p + xlab("Sex")
p
Here is what I got when I ran this code on my own data. I think this is what you're looking to create; the code is above. It'll work with anything where x is a discrete variable: just convert x with as.factor() and make y a continuous variable.
If you need any more help just let me know; I like to help out people on here because it helps me hone my R skills. I'm more of a Visual Studio kind of guy; VBA is my friend.
Hope this helps!
If you ever need to change a character to a factor, you can always use as.factor('A'), for instance.
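Here is a minimal base R sketch, assuming DF holds the seven rows shown in the question. boxplot()'s formula interface groups by a character column without complaint, so converting with factor() is mainly useful for controlling the level order. The zep_O and SEX_G column names are made up for illustration.

```r
# The seven rows shown in the question
DF <- data.frame(zep = c("O", "O", "O", "O", "N", "O", "N"),
                 SEX = c("F", "F", "G", "F", "G", "G", "G"),
                 AGE = c(3.416667, 3.833333, 3.416667, 4.000000, 3.666667, 4.000000, 3.916667),
                 BMI = c(16.00000, 14.87937, 14.80223, 15.09656, 16.50000, 16.49102, 16.02413))

# Explicit conversion from character to factor
DF$zep <- factor(DF$zep)
DF$SEX <- factor(DF$SEX)

# Indicators as asked: TRUE/1 when zep is "O" and when SEX is "G"
DF$zep_O <- DF$zep == "O"                 # logical
DF$SEX_G <- as.integer(DF$SEX == "G")     # 0/1 integer

par(mfrow = c(1, 2))
boxplot(BMI ~ zep, data = DF)
boxplot(BMI ~ SEX, data = DF)
```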
I have the following data
dati <- read.table(text="
class num
1 0.0 63530
2 2.5 27061
3 3.5 29938
4 4.5 33076
5 5.6 45759
6 6.5 72794
7 8.0 153177
8 10.8 362124
9 13.5 551051
10 15.5 198634
")
And I want to produce a histogram with variable-width bins, so that the area of each bar reflects the total count (num) in each bin. I tried
bins <- c(0,4,8,11,16)
p <- ggplot(dati) +
geom_histogram(aes(x=class,weight=num),breaks = bins)
However, this produces a histogram where the height of each bar equals the total count in each bin. Because the bin widths vary, the areas are not proportional to the counts.
I could not solve this apparently easy problem within ggplot2. Can anyone help me?
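I think you are looking for a density histogram; this closely related question has most of the answer. You set y = after_stat(density) in geom_histogram() (older code writes this as y = ..density..).
This works because stat_bin() (recall that geom_histogram() is geom_bar() + stat_bin()) constructs a data frame with columns count and density, among others. Mapping y to density pulls the density column, whereas the default is equivalent to y = after_stat(count). With weight = num, count is the sum of num in each bin and density is count / (bin width * sum of all counts), so each bar's area is proportional to that bin's share of num.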
##OP's code
ggplot(dati) + geom_histogram(aes(x=class, weight=num),
breaks = bins)
##new code (density plot)
ggplot(dati) + geom_histogram(aes(x = class, y = after_stat(density), weight = num),
                              breaks = bins, position = "identity")
You can find some further examples in the online ggplot2 help page for geom_histogram().
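To see why the density version gives areas proportional to num, here is the same arithmetic in base R. It assumes that stat_bin() uses right-closed bins for these breaks (its default) and computes density as count / (width * total count). cut() with include.lowest = TRUE is used here to mimic the bin assignment.

```r
dati <- data.frame(class = c(0.0, 2.5, 3.5, 4.5, 5.6, 6.5, 8.0, 10.8, 13.5, 15.5),
                   num   = c(63530, 27061, 29938, 33076, 45759, 72794,
                             153177, 362124, 551051, 198634))
bins <- c(0, 4, 8, 11, 16)

# Right-closed bins, with the lowest break included
bin <- cut(dati$class, breaks = bins, right = TRUE, include.lowest = TRUE)
count <- as.vector(tapply(dati$num, bin, sum))   # summed weights per bin
width <- diff(bins)
density <- count / (sum(count) * width)          # bar heights
area <- density * width                          # equals count / sum(count)

print(data.frame(bin = levels(bin), count, width, density, area))
```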
It sounds to me like you are asking how to produce variable-width bars. If so, you just need to map the width aesthetic in your ggplot call, like this:
ggplot(data, aes(x = x, y = y, width = num)) + geom_col()
This method is discussed further in the following question:
Variable width bars in ggplot2 barplot in R
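Here is a sketch combining the two answers. It summarises num per bin, places one bar at each bin's midpoint, maps width to the bin width, and uses the density as the height, so each bar covers exactly its bin and its area is proportional to the bin's total. The totals data frame and its column names are made up for illustration.

```r
library(ggplot2)

dati <- data.frame(class = c(0.0, 2.5, 3.5, 4.5, 5.6, 6.5, 8.0, 10.8, 13.5, 15.5),
                   num   = c(63530, 27061, 29938, 33076, 45759, 72794,
                             153177, 362124, 551051, 198634))
bins <- c(0, 4, 8, 11, 16)

# Sum num within each bin, then describe one bar per bin
bin <- cut(dati$class, breaks = bins, include.lowest = TRUE)
totals <- data.frame(mid   = head(bins, -1) + diff(bins) / 2,
                     width = diff(bins),
                     count = as.vector(tapply(dati$num, bin, sum)))
totals$density <- totals$count / (sum(totals$count) * totals$width)

# width in aes() makes each bar span its own bin
p <- ggplot(totals, aes(x = mid, y = density, width = width)) +
  geom_col(colour = "black")
print(p)
```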