I'm struggling with ggplot2 and have been looking for a solution online for several hours. Maybe one of you can help me? I have a data set that looks like this (several hundred observations):
Y-AXIS      X-AXIS   SUBJECT
2.2796598   F1       1
0.9118639   F1       2
2.7111228   F3       3
2.7111228   F2       4
2.2796598   F4       5
2.3876401   F10      6
....        ...      ...
The Y-AXIS is a continuous value larger than 0 (the upper limit can vary from data set to data set, but is typically < 100). The X-AXIS is a categorical variable with 10 levels. SUBJECT refers to an individual and, across the entire data set, each individual has exactly 10 observations, exactly 1 for each level of the categorical variable.
To generate a box plot, I used ggplot like this:
plot1 <- ggplot(longdata,
aes(x = X_axis, y = Y_axis)) +
geom_boxplot() +
ylim(0, 12.5) +
stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")
That results in the boxplot I have in mind. You can check out the result here if you like: boxplot
So far so good. What I want to do next, and hopefully someone can help me, is this: for one specific SUBJECT, I want to plot a line for their 10 scores in the same figure. So on top of the boxplot. An example of what I have in mind can be found here: boxplot with data of one subject as a line. In this case, I simply assumed that the outliers belong to the same case. This is just an assumption. The data of an individual case can also look like this: boxplot with data of a second subject as a line
Additional tips on how to customize that line (colour, thickness, etc.) would also be appreciated. Many thanks!
library(ggplot2)
It is always a good idea to add a reproducible example of your data;
you can always simulate what you need:
set.seed(123)
simulated_data <- data.frame(
subject = rep(1:10, each = 10),
xaxis = rep(paste0('F', 1:10), times = 10),
yaxis = runif(100, 0, 100)
)
In ggplot2, each geom can take its own data argument; for your line, just use
a subset of your original data, limited to the desired subject.
Colors and other visual elements for the line are simple to set; take a look here:
ggplot() +
geom_boxplot(data = simulated_data, aes(xaxis, yaxis)) +
geom_line(
data = simulated_data[simulated_data$subject == 1,],
aes(xaxis, yaxis),
color = 'red',
linetype = 2,
size = 1,
group = 1
)
Created on 2022-10-14 with reprex v2.0.2
library(ggplot2)
library(dplyr)
# Simulate some data absent a reproducible example
testData <- data.frame(
y = runif(300,0,100),
x = as.factor(paste0("F",rep(1:10,times=30))),
SUBJECT = as.factor(rep(1:30, each = 10))
)
# Copy your plot with my own data + ylimits
plot1 <- ggplot(testData,
aes(x = x, y = y)) +
geom_boxplot() +
ylim(0, 100) +
stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")
# add the geom_line for subject 1
plot1 +
geom_line(data = filter(testData, SUBJECT == 1),
mapping = aes(x=x, y=y, group = SUBJECT))
My answer is very similar to Johan Rosa's but his doesn't use additional packages and makes the aesthetic options for the geom_line much more apparent - I'd follow his example if I were you!
I am plotting various plots in the Shiny app that I have developed. The raw datasets that I have store all data points in meters; for example, one of my raw datasets looks like this:
df <- data.frame(X = c(0.000000000000,4.99961330240E-005,9.99922660480E-005,0.000149988399072,0.00019998453209, 0.000249980665120,0.000299976798144,0.000349972931168,0.000399969064192,0.000449965197216,0.000499961330240,0.000549957463264,0.000599953596288,0.000649949729312,0.000699945862336,0.000749941995360,0.000799938128384,0.000849934261408,0.000899930394432,0.000949926527456,0.000999922660480,0.00104991879350,0.00109991492653,0.00114991105955),
Y = c(0.00120303964354,0.00119632557146,0.00119907223731,0.00120059816279,0.00119785149693,0.00119876705222,0.00119327372051,0.00118900112918,0.00118930631428,0.00119174779504,0.00119113742485,0.00119541001617,0.00119815668203,0.00119052705466,0.00119205298013,0.00118930631428,0.00119174779504,0.00119388409070,0.00118778038881,0.00122287667470,0.00122684408094,0.00122623371075,0.00122867519150,0.00122379222999))
My attempt to plot:
g <- ggplot(data = df) + theme_bw() +
geom_point(aes_string(x= df[,1], y= df[,2]), colour= "red", size = 0.1)
ggplotly(g)
And the plot looks like this:
What I want:
The data in the data file is in meters, but on the plot I need the Y-axis values shown in micrometers and the X-axis values shown in millimeters. The data frame illustrated above is just a small part of my actual data frame, which is very large.
Is there any way to do this automatically, without requiring the user to change the units manually?
In the end, I want the 'Y' values multiplied by 10^6 and the 'X' values multiplied by 10^3 to convert them into micrometers and millimeters, respectively.
I got two possible answers to my question:
1st is:
g <- ggplot(data = df) + theme_bw() +
geom_point(aes_string(x= df[,1]*10^3, y= df[,2]*10^6), colour= "red", size = 0.1)
ggplotly(g)
2nd is:
M <- data.frame(x= df[,1]* 10^3, y= df[,2]* 10^6)
g <- ggplot(data = M) + theme_bw() +
geom_point(aes_string(x= M[,1], y= M[,2]), colour= "red", size = 0.1)
ggplotly(g)
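A variant of the two approaches above, as a minimal sketch: it refers to the columns by name inside aes() instead of passing vectors to aes_string(), and it labels the axes with their units. The axis titles and the shortened four-row df are assumptions made for illustration; ggplotly(g) can be applied to the result as before.

```r
library(ggplot2)

# Shortened stand-in for the data above (values in meters)
df <- data.frame(
  X = c(0, 4.99961330240e-05, 9.99922660480e-05, 0.000149988399072),
  Y = c(0.00120303964354, 0.00119632557146, 0.00119907223731, 0.00120059816279)
)

# Convert units inside aes(): X to millimeters, Y to micrometers
g <- ggplot(df, aes(x = X * 1e3, y = Y * 1e6)) +
  theme_bw() +
  geom_point(colour = "red", size = 0.1) +
  labs(x = "X (mm)", y = "Y (micrometers)")  # assumed axis titles

# Coordinates ggplot2 will actually draw, for a quick sanity check
plotted <- ggplot_build(g)$data[[1]][, c("x", "y")]
```

Because the conversion happens in the mapping, df itself stays in meters and no second data frame is needed.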
Hi stack overflow community,
I hope the two interrelated questions I am asking are not too nooby. I tried several google searches but could not find a solution.
I use R to plot the findings of a linguistic "experiment", in which I checked to what extent two grammatical constructions yield acceptable descriptions of an event, depending on how far it has unfolded. My data look similar to this:
event,PFV.alone,PFV.and.PART
0.01,0,1
0.01,0,1
0.05,0,1
0.05,0,1
0.05,0,1
0.1,0,1
0.1,0,1
0.25,0,1
0.25,0,1
0.25,0,1
0.3,0,1
0.3,0,1
0.33,0,1
0.33,0,1
0.33,0,1
0.33,0,1
....
0.67,1,0.5
0.75,1,0.5
0.75,1,0
0.75,1,0
0.75,1,0
0.8,1,1
0.8,1,0
0.8,1,0
0.8,1,0
0.85,1,1
0.85,1,0
0.9,1,0
0.9,1,0
0.9,1,0
0.95,1,0
As you can see, for each of the two constructions there are "plateaus" where acceptability is 0 or 1 and then there's a "transitional" area. In order to illustrate the "plateaus" I use geom_segment and to create a smooth "transition" for the scattered data in between, I use geom_smooth. Here's my code:
#after loading datafile into "Daten":
p <- ggplot(data = Daten,
aes(x=event, y=PFV.and.PART, xmin=0, ymin=0, xmax=1, ymax=1))
p + geom_blank() +
coord_fixed()+
xlab("Progress of the event") +
ylab("Acceptability") +
geom_segment(x=0, xend=1, y=0.5,yend=0.5, linetype="dotted") +
geom_smooth(data=(subset(Daten, event==0.33 | event ==0.9)),
aes(color="chocolate"),
method="loess", fullrange=FALSE, level=0.95, se=FALSE) +
geom_segment(x=0,xend=0.33,y=1,yend=1, color="chocolate", size=1) +
geom_segment(x=0.9,xend=1,y=0,yend=0, color="chocolate", size=1) +
geom_smooth(data=(subset(Daten, event==0.33 | event==0.67)),
aes(x = event, y = PFV.alone, color="cyan4"),
method="lm",fullrange=FALSE, level=0.95, se=FALSE) +
geom_segment(color="cyan4",x=0,xend=0.33,y=0,yend=0,size=1) +
geom_segment(color="cyan4", x=0.67,xend=1,y=1,yend=1, size=1) +
scale_x_continuous(labels = scales::percent) +
scale_y_continuous(breaks = c (0,0.5,1), labels = scales::percent)+
labs(color='Construction')+
scale_color_manual(labels = c("PFV + PART", "PFV alone"),
values = c("chocolate", "cyan4")) +
theme(legend.position=c(0.05, 0.8),
legend.justification = c("left", "top"),
legend.background = element_rect(fill = "darkgray"))
This code produces a nice graph, but there's one calculation and one plot-related issue that I need help with.
First, and most importantly, I'd like to find out at what point exactly the geom_smooth (loess) curve for "PFV.and.PART" drops down to 0.5, i.e. hits 50% acceptability. I fear that this might involve some quite complex code?
Related to the preceding point, I'd like to mark the area/line where both curves are above 0.5 (50% acceptability), or, in terms of what I am trying to show: the percentages of the event at which both constructions yield a description that is at least 50% acceptable. This, of course, would be based on point 1, as it is necessary to determine the right limit, whereas the left limit does not constitute a problem, as it seems to lie at x=0.5,y=0.5.
I'd really appreciate any help and I hope that I have provided all the necessary information. Please excuse me if this question has been addressed elsewhere.
Here's one approach, which involves fitting a loess model outside of ggplot:
library(tidyverse) # provides tibble(), add_column(), %>% and ggplot()
# Generate some data
set.seed(2019)
my_dat <- c(sample(c(1,0.5, 0),33, prob = c(0.85,0.15,0), replace = TRUE),
sample(c(1,0.5, 0),33, prob = c(0.1,0.7,0.1), replace = TRUE),
sample(c(1,0.5,0),34, prob = c(0,0.15,0.85), replace = TRUE))
df <- tibble(x = 1:100, y = my_dat)
# fit a loess model
m1 <- loess(y~x, data = df)
df <- df %>%
add_column(pred = predict(m1)) # predict using the loess model
# plot
df %>%
ggplot(aes(x,y))+
geom_point() +
geom_line(aes(y = pred))
# search for a value of x that gives a prediction of 0.5
f <- function(x) { 0.5 - predict(m1)[x]}
uniroot(f, interval = c(1, 100))
# $root
# [1] 53.99997
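Note that predict(m1)[x] looks predictions up by position, so it is only defined at the integer values of x present in the data. A possible refinement, sketched here in base R under the assumption that a continuous crossing point is wanted, evaluates the fitted loess curve at arbitrary x through newdata. The deterministic toy data and the helper name f_cont are made up for illustration.

```r
# Toy data: a plateau at 1, a mixed transition zone, then a plateau at 0
toy <- data.frame(
  x = 1:100,
  y = c(rep(1, 33), rep(c(1, 0.5, 0), times = 11), rep(0, 34))
)

# Fit the loess model outside of ggplot, as in the answer above
m <- loess(y ~ x, data = toy)

# Distance of the fitted curve from 0.5 at any (also non-integer) x
f_cont <- function(x) 0.5 - predict(m, newdata = data.frame(x = x))

# uniroot() needs f_cont to change sign between the two interval ends
crossing <- uniroot(f_cont, interval = c(1, 100), tol = 1e-8)$root
crossing
```

The same pattern should carry over to the real data by fitting loess() on the subset that is passed to geom_smooth() and searching only within the x range of that subset, since loess does not extrapolate by default.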
Sample data:
mydata="theta,rho,value
0,0.8400000,0.0000000
40,0.8400000,0.4938922
80,0.8400000,0.7581434
120,0.8400000,0.6675656
160,0.8400000,0.2616592
200,0.8400000,-0.2616592
240,0.8400000,-0.6675656
280,0.8400000,-0.7581434
320,0.8400000,-0.4938922
360,0.8400000,0.0000000
0,0.8577778,0.0000000
40,0.8577778,0.5152213
80,0.8577778,0.7908852
120,0.8577778,0.6963957
160,0.8577778,0.2729566
200,0.8577778,-0.2729566
240,0.8577778,-0.6963957
280,0.8577778,-0.7908852
320,0.8577778,-0.5152213
360,0.8577778,0.0000000
0,0.8755556,0.0000000
40,0.8755556,0.5367990
80,0.8755556,0.8240077
120,0.8755556,0.7255612
160,0.8755556,0.2843886
200,0.8755556,-0.2843886
240,0.8755556,-0.7255612
280,0.8755556,-0.8240077
320,0.8755556,-0.5367990
360,0.8755556,0.0000000
0,0.8933333,0.0000000
40,0.8933333,0.5588192
80,0.8933333,0.8578097
120,0.8933333,0.7553246
160,0.8933333,0.2960542
200,0.8933333,-0.2960542
240,0.8933333,-0.7553246
280,0.8933333,-0.8578097
320,0.8933333,-0.5588192
360,0.8933333,0.0000000
0,0.9111111,0.0000000
40,0.9111111,0.5812822
80,0.9111111,0.8922910
120,0.9111111,0.7856862
160,0.9111111,0.3079544
200,0.9111111,-0.3079544
240,0.9111111,-0.7856862
280,0.9111111,-0.8922910
320,0.9111111,-0.5812822
360,0.9111111,0.0000000
0,0.9288889,0.0000000
40,0.9288889,0.6041876
80,0.9288889,0.9274519
120,0.9288889,0.8166465
160,0.9288889,0.3200901
200,0.9288889,-0.3200901
240,0.9288889,-0.8166465
280,0.9288889,-0.9274519
320,0.9288889,-0.6041876
360,0.9288889,0.0000000
0,0.9466667,0.0000000
40,0.9466667,0.6275358
80,0.9466667,0.9632921
120,0.9466667,0.8482046
160,0.9466667,0.3324593
200,0.9466667,-0.3324593
240,0.9466667,-0.8482046
280,0.9466667,-0.9632921
320,0.9466667,-0.6275358
360,0.9466667,0.0000000
0,0.9644444,0.0000000
40,0.9644444,0.6512897
80,0.9644444,0.9997554
120,0.9644444,0.8803115
160,0.9644444,0.3450427
200,0.9644444,-0.3450427
240,0.9644444,-0.8803115
280,0.9644444,-0.9997554
320,0.9644444,-0.6512897
360,0.9644444,0.0000000
0,0.9822222,0.0000000
40,0.9822222,0.6751215
80,0.9822222,1.0363380
120,0.9822222,0.9125230
160,0.9822222,0.3576658
200,0.9822222,-0.3576658
240,0.9822222,-0.9125230
280,0.9822222,-1.0363380
320,0.9822222,-0.6751215
360,0.9822222,0.0000000
0,1.0000000,0.0000000
40,1.0000000,0.6989533
80,1.0000000,1.0729200
120,1.0000000,0.9447346
160,1.0000000,0.3702890
200,1.0000000,-0.3702890
240,1.0000000,-0.9447346
280,1.0000000,-1.0729200
320,1.0000000,-0.6989533
360,1.0000000,0.0000000"
read in a data frame:
foobar <- read.csv(text = mydata)
You can check (if you really want to!) that the data are periodic in the theta direction, i.e., for each given rho, the points at theta=0 and theta=360 are precisely the same. I would like to plot a nice polar surface plot, in other words an annulus colored according to value. I tried the following:
library(viridis) # just because I very much like viridis: if you don't want to install it, just comment this line and uncomment the scale_fill_distiller line
library(ggplot2)
p <- ggplot(data = foobar, aes(x = theta, y = rho, fill = value)) +
geom_tile() +
coord_polar(theta = "x") +
scale_x_continuous(breaks = seq(0, 360, by = 45), limits=c(0,360)) +
scale_y_continuous(limits = c(0, 1)) +
# scale_fill_distiller(palette = "Oranges")
scale_fill_viridis(option = "plasma")
I'm getting:
Yuck! Why the nasty hole in the annulus? If I generate a foobar data frame with more rows (more theta and rho values), the hole gets smaller. This isn't a viable solution, both because computing data at more rho/theta values is costly and time-consuming, and because even with 100x100=10^4 rows I still get a hole. Also, with a bigger data frame, ggplot takes forever to render the plot: the combination of geom_tile and coord_polar is incredibly inefficient. Isn't there a way to get a nice-looking polar plot without unnecessarily wasting memory & CPU time?
Edit: all rows with theta=360 were removed, since they repeat the values at theta=0.
ggplot(data = foobar, aes(x = theta, y = rho, fill = value)) +
geom_tile() +
coord_polar(theta = "x",start=-pi/9) +
scale_y_continuous(limits = c(0, 1))+
scale_x_continuous(breaks = seq(0, 360, by = 45))
I just removed limits from scale_x_continuous
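The row removal described in the edit is not shown above. A minimal sketch, assuming the duplicated rows are exactly those with theta == 360; the small grid built with expand.grid() and its value formula are stand-ins for foobar, not the original data. With the duplicates gone and a tile width of 40 degrees, the tiles span -20 to 340 on the theta axis, which is presumably why start = -pi/9 (a rotation of -20 degrees) puts theta = 0 back at the top.

```r
library(ggplot2)

# Stand-in grid with the same layout as foobar: theta 0..360 in steps of 40
grid <- expand.grid(theta = seq(0, 360, by = 40),
                    rho = seq(0.84, 1, length.out = 10))
grid$value <- grid$rho * sin(grid$theta * pi / 180)  # hypothetical values

# Drop the theta = 360 rows, which repeat the theta = 0 rows
grid_open <- grid[grid$theta != 360, ]

# Same plotting calls as above, applied to the de-duplicated data
p <- ggplot(grid_open, aes(x = theta, y = rho, fill = value)) +
  geom_tile() +
  coord_polar(theta = "x", start = -pi / 9) +
  scale_y_continuous(limits = c(0, 1)) +
  scale_x_continuous(breaks = seq(0, 360, by = 45))
```

For the question's data, the equivalent filter would be foobar[foobar$theta != 360, ].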
That gives me:
In the following, selecting free_y makes the maximum values of each scale adjust as expected, but how can I get the minimum values to adjust as well? At the moment they both start at 0, when I really want the upper facet to start at about 99 and go to 100, and the lower facet to start at around 900 and go to 1000.
library(ggplot2)
n = 100
df = rbind(data.frame(x = 1:n,y = runif(n,min=99,max=100),variable="First"),
data.frame(x = 1:n,y = runif(n,min=900,max=1000),variable="Second"))
ggplot(data=df,aes(x,y,fill=variable)) +
geom_bar(stat='identity') +
facet_grid(variable~.,scales='free')
You could use geom_linerange rather than geom_bar. A general way to do this is to first find the min of y for each value of variable and then merge the minimums with the original data. Code would look like:
library(ggplot2)
min_y <- aggregate(y ~ variable, data=df, min)
sp <- ggplot(data=merge(df, min_y, by="variable", suffixes = c("","min")),
aes(x, colour=variable)) +
geom_linerange(aes(ymin=ymin, ymax=y), size=1.3) +
facet_grid(variable ~ .,scales='free')
plot(sp)
Plot looks like: