My dataset has 4 columns, as shown below:
The two columns on the left are the XY coordinates of a geographical structure, and the two on the right are the size of each geographical unit (its North-South and East-West diameters).
I would like to draw a scatterplot of all the coordinates and, over each point, an ellipse whose axes match the diameters of that geographical unit.
Drawn manually with only two points, the image should look like this one:
How can I do it using ggplot2?
You can download the data here
Use geom_ellipse() from ggforce:
library(ggplot2)
library(ggforce)
d <- data.frame(
x = c(10, 20),
y = c(10, 20),
ns = c(5, 8),
ew = c(4, 4)
)
ggplot(d, aes(x0 = x, y0 = y, a = ew/2, b = ns/2, angle = 0)) +
geom_ellipse() +
coord_fixed()
Created on 2019-06-01 by the reprex package (v0.2.1)
I'm not adding any new code to what Claus Wilke already posted above; all credit should go to Claus. I'm simply testing it with the actual data and showing the OP how to post data.
Loading packages needed
# install.packages(c("tidyverse"), dependencies = TRUE)
library(tidyverse)
Reading data,
tbl <- read.table(
text = "
X Y Diameter_N_S Diameter_E_W
-4275 1145 77 96
-4855 1330 30 25
-4850 1612 45 90
-4990 1410 15 15
-5055 1230 60 50
-5065 1503 43 45
-5135 1305 40 50
-5505 1190 55 70
-5705 1430 90 40
-5645 1535 52 60
", header = TRUE, stringsAsFactors = FALSE) %>% as_tibble()
showing data,
tbl
#> # A tibble: 10 x 4
#> X Y Diameter_N_S Diameter_E_W
#> <int> <int> <int> <int>
#> 1 -4275 1145 77 96
#> 2 -4855 1330 30 25
#> 3 -4850 1612 45 90
#> 4 -4990 1410 15 15
#> 5 -5055 1230 60 50
#> 6 -5065 1503 43 45
#> 7 -5135 1305 40 50
#> 8 -5505 1190 55 70
#> 9 -5705 1430 90 40
#> 10 -5645 1535 52 60
loading more packages needed
library(ggforce) # devtools::install_github("thomasp85/ggforce")
executing
# geom_ellipse() takes the semi-axes a and b, so the diameters are halved
ggplot(tbl, aes(x0 = X, y0 = Y, a = Diameter_E_W / 2, b = Diameter_N_S / 2, angle = 0)) +
geom_ellipse() + geom_point(aes(X, Y), size = .5) + coord_fixed() + theme_bw()
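If ggforce is not available, the same picture can be approximated by computing the ellipse outlines by hand and drawing them with geom_polygon(). This is only a sketch: ellipse_outline() is a hypothetical helper written for this illustration, and only the first two rows of the data are used. It also makes explicit that each semi-axis is half of the corresponding diameter.

```r
library(ggplot2)

# Hypothetical helper: outline of an axis-aligned ellipse, given its centre
# and its East-West and North-South diameters.
ellipse_outline <- function(x0, y0, d_ew, d_ns, n = 101) {
  t <- seq(0, 2 * pi, length.out = n)
  data.frame(
    x = x0 + (d_ew / 2) * cos(t),  # semi-axis = half the E-W diameter
    y = y0 + (d_ns / 2) * sin(t)   # semi-axis = half the N-S diameter
  )
}

# First two rows of the posted data
units <- data.frame(X = c(-4275, -4855), Y = c(1145, 1330),
                    Diameter_N_S = c(77, 30), Diameter_E_W = c(96, 25))

# One outline per row, tagged with the row number so ggplot can group them
outlines <- do.call(rbind, lapply(seq_len(nrow(units)), function(i) {
  data.frame(id = i,
             ellipse_outline(units$X[i], units$Y[i],
                             units$Diameter_E_W[i], units$Diameter_N_S[i]))
}))

p <- ggplot(outlines, aes(x, y, group = id)) +
  geom_polygon(fill = NA, colour = "black") +
  geom_point(data = units, aes(X, Y), inherit.aes = FALSE, size = 0.5) +
  coord_fixed() +
  theme_bw()
p
```

coord_fixed() matters in both versions: without it the axes are scaled independently and the ellipses no longer reflect the real proportions.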
I am not an expert in R and am trying my best. I would appreciate some assistance.
I have data as follows:
POPs: num[1:3000] 3,4,5,6,7,....
PM1: num[1:3000] 3,4,5,6,7,....
PM2: num[1:3000] 3,4,5,6,7,....
PM3: num[1:3000] 3,4,5,6,7,....
PM4: num[1:3000] 3,4,5,6,7,....
.. etc
I want to run a regression for each PM column (PM1, PM2, PM3, ...) and put the results into one figure (as in the picture), adding the R2, RMSE and MAE, the regression line, and the 1:1 line to each panel.
The x variable is POPs and the y variables are PM1, PM2, PM3, etc.
I can do this for each PM individually with aes(x=POPs, y=PM1). However, that produces a lot of figures, and it would be better to combine them into one. How can I map all the PMs to a single y in the code? I think it needs some more advanced looping, which is unfortunately beyond my current level.
library(ggplot2)
library(ggpubr) # provides stat_cor() and stat_regline_equation()
ggplot(data = Plot, aes(x = POPs, y = PM1)) +
stat_smooth(method = "lm", se=FALSE, color="black", formula = y ~ x) +
geom_point(size=0.3) +
stat_cor(aes(label = paste(..rr.label..)), # adds R^2 value
r.accuracy = 0.01,
label.x = 0, label.y = 375, size = 4) +
stat_regline_equation(aes(label = ..eq.label..), # adds equation to linear regression
label.x = 0, label.y = 400, size = 4)
Based on Behnam Hedayat's answer below, with some modifications from my side and from Allan Cameron, I can say that it now works perfectly:
# change format of df to longer
Plot %>% pivot_longer(cols=starts_with("PEM"), names_to = "PEMs", values_to = "PEMs_value") -> df2
df2 %>% ggplot(aes(POPs, PEMs_value)) +
geom_point(color = "#fe4300", size=0.3) +
geom_abline()+
geom_smooth(method='lm', se=FALSE, formula = y ~ x, color = "#1b14fd")+
labs(y = expression(bold(PLF~PM["2.5"]~("u"*g/m^"3"))), x = expression(bold(POPS~PM["2.5"]~("u"*g/m^"3")))) +
stat_cor(aes(label = paste(..rr.label..)), # adds R^2 value
r.accuracy = 0.01,
label.x = 0, label.y = 110, size = 3) +
stat_regline_equation(aes(label = ..eq.label..), # adds equation to linear regression
label.x = 0, label.y = 100, size = 3) +
facet_wrap(~PEMs, ncol=5)
You can use the facet_wrap() function of ggplot2, but first you have to reshape your dataset to long format with the pivot_longer() function from tidyr (part of the tidyverse).
To add regression metrics to the plots, you can create a separate data frame containing the metrics for each PM variable, then pass this data frame to geom_text() with x and y columns that give the position of each label.
Here I also used caret package functions (R2, RMSE, MAE) to calculate regression metrics.
# caret for calculating R2, MAE and RMSE
# tidyverse to reshape data to longer format
libs <- c("ggplot2", "tidyverse","caret")
suppressMessages(invisible(sapply(libs, library, character.only=T)))
# sample dataset
df <- data.frame(POPs = sample(1:100, 100),
PM1 = sample(1:100, 100),
PM2 = sample(1:100, 100),
PM3 = sample(1:100, 100),
PM4 = sample(1:100,100),
PM5 = sample(1:100,100),
PM6 = sample(1:100,100),
PM7 = sample(1:100,100),
PM8 = sample(1:100,100))
# change format of df to longer
df %>% pivot_longer(cols=starts_with("PM"),
names_to = "PMs", values_to = "PMs_value") -> df2
head(df2, 10)
#> # A tibble: 10 × 3
#> POPs PMs PMs_value
#> <int> <chr> <int>
#> 1 5 PM1 88
#> 2 5 PM2 21
#> 3 5 PM3 51
#> 4 5 PM4 40
#> 5 5 PM5 40
#> 6 5 PM6 2
#> 7 5 PM7 30
#> 8 5 PM8 70
#> 9 52 PM1 13
#> 10 52 PM2 90
# create a dataframe of summary of regression metrics
summary_df <- df2 %>%
group_by(PMs) %>%
summarise(R2 = R2(PMs_value, POPs),
RMSE=RMSE(PMs_value, POPs),
MAE=MAE(PMs_value, POPs)) %>%
mutate_if(is.numeric, round,digits=2) %>%
pivot_longer(cols = -PMs, names_to = "Metric", values_to = "Metric_value") %>%
# add x column for x position of text and y column for y position
mutate(x = rep(30, times =nrow(.)),
y = rep(c(90,80,70), times=nrow(.)/3)) %>%
unite("Metric", Metric:Metric_value, sep = " = ")
summary_df
#> # A tibble: 24 × 4
#> PMs Metric x y
#> <chr> <chr> <dbl> <dbl>
#> 1 PM1 R2 = 0.03 30 90
#> 2 PM1 RMSE = 43.95 30 80
#> 3 PM1 MAE = 36.72 30 70
#> 4 PM2 R2 = 0.02 30 90
#> 5 PM2 RMSE = 37.83 30 80
#> 6 PM2 MAE = 29.76 30 70
#> 7 PM3 R2 = 0.02 30 90
#> 8 PM3 RMSE = 43.69 30 80
#> 9 PM3 MAE = 36.88 30 70
#> 10 PM4 R2 = 0.01 30 90
#> # … with 14 more rows
df2 %>% ggplot(aes(POPs, PMs_value)) +
geom_point(size=0.3) +geom_abline()+
geom_smooth(method='lm', se=FALSE)+
facet_wrap(~PMs, ncol=4)+
geom_text(data = summary_df,
mapping = aes(x = x, y = y, label = Metric))
#> `geom_smooth()` using formula = 'y ~ x'
Created on 2023-02-12 with reprex v2.0.2
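One thing to be aware of: as called above, caret's RMSE() and MAE() measure the distance between PMs_value and POPs themselves, that is, from the 1:1 line. If the metrics are instead meant to describe the fitted regression line, they can be computed from the lm() residuals. The following is a minimal sketch in base R; fit_metrics() and the demo data are hypothetical and not part of the original answer.

```r
# Hypothetical helper: metrics of the least-squares line y ~ x
fit_metrics <- function(x, y) {
  fit <- lm(y ~ x)
  res <- residuals(fit)
  c(R2   = summary(fit)$r.squared,  # coefficient of determination
    RMSE = sqrt(mean(res^2)),       # root mean squared residual
    MAE  = mean(abs(res)))          # mean absolute residual
}

# Made-up demo data with the same column naming as the question
set.seed(42)
demo <- data.frame(POPs = 1:50)
demo$PM1 <- 2 * demo$POPs + rnorm(50, sd = 5)
demo$PM2 <- 0.5 * demo$POPs + rnorm(50, sd = 10)

# One row of metrics per PM column
pm_cols <- grep("^PM", names(demo), value = TRUE)
metrics <- t(sapply(demo[pm_cols], function(y) fit_metrics(demo$POPs, y)))
round(metrics, 2)
```

The resulting matrix can be converted to a data frame and reshaped in the same way as summary_df before being passed to geom_text().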
You first need to get your data into the correct format - that is, to pivot it into long format, such that the PM column names are in a single column, and the values are in their own column too. Then you can use the names column as a faceting variable in ggplot:
library(tidyverse)
Plot %>%
pivot_longer(-POPs) %>%
ggplot(aes(POPs, value)) +
geom_abline() +
geom_point(color = "#fe4300", alpha = 0.5) +
geom_smooth(method = "lm", se = FALSE, formula = y ~ x, color = "#fd1b14") +
coord_cartesian(xlim = c(0, 100), ylim = c(0, 100)) +
facet_wrap(.~name, nrow = 5, scales = "free") +
theme_classic() +
theme(strip.background = element_blank(),
panel.border = element_rect(fill = NA))
Data used
Obviously we don't have your data (unless we were to transcribe the picture of your data, or you include the output of dput(Plot) in your question), so I have constructed a dummy data set with the same names and structure as your own:
set.seed(1)
Plot <- setNames(as.data.frame(cbind(1:115,
replicate(17, sample(100, 115, TRUE)))),
c("POPs", paste0("PM", 1:17)))
str(Plot)
#> 'data.frame': 115 obs. of 18 variables:
#> $ POPs: int 1 2 3 4 5 6 7 8 9 10 ...
#> $ PM1 : int 68 39 1 34 87 43 14 82 59 51 ...
#> $ PM2 : int 1 29 78 22 70 28 37 61 46 67 ...
#> $ PM3 : int 99 77 57 71 25 31 37 92 28 62 ...
#> $ PM4 : int 60 65 64 53 5 44 35 23 29 35 ...
#> $ PM5 : int 48 7 27 43 9 8 86 45 6 27 ...
#> $ PM6 : int 65 2 9 49 69 91 93 66 31 78 ...
#> $ PM7 : int 50 89 8 54 31 69 12 30 9 66 ...
#> $ PM8 : int 21 7 99 42 33 94 5 5 4 11 ...
#> $ PM9 : int 22 56 58 55 99 96 5 52 47 55 ...
#> $ PM10: int 84 84 55 98 73 47 13 5 63 3 ...
#> $ PM11: int 41 83 91 7 78 32 49 14 92 84 ...
#> $ PM12: int 16 39 37 15 24 97 56 62 69 100 ...
#> $ PM13: int 94 69 53 37 70 57 50 51 18 29 ...
#> $ PM14: int 79 40 11 67 25 54 21 34 59 46 ...
#> $ PM15: int 5 89 74 34 47 85 29 24 46 98 ...
#> $ PM16: int 44 22 57 63 7 95 46 66 4 92 ...
#> $ PM17: int 38 57 48 75 8 28 21 2 84 95 ...
Created on 2023-02-11 with reprex v2.0.2
I have four dataframes that look like below:
X score.i score.ii score.iii mm
1: 1 -0.3958555 -0.3750726 -0.3378881 10
2: 2 -0.3954955 -0.3799290 -0.3400876 15
3: 3 -0.3962514 -0.3776692 -0.3401180 20
4: 4 -0.4033265 -0.3764099 -0.3436115 25
5: 5 -0.4035860 -0.3753792 -0.3426287 30
---
186: 186 -0.4041035 -0.3767158 -0.3419871 80
187: 187 -0.4040643 -0.3767881 -0.3417620 85
188: 188 -0.4052228 -0.3766468 -0.3436883 90
189: 189 -0.4047009 -0.3767359 -0.3431591 95
190: 190 -0.4061497 -0.3766785 -0.3433624 100
How can I plot a circular line graph with aes(x=mm, y=score.i) for these four such that there is a gap between the lines for each dataframe?
library(ggplot2)
library(dplyr)
library(tidyr)
df1 %>%
pivot_longer(-c(X, mm), names_to = "Variable", values_to = "Score") %>%
ggplot(., aes(x = mm, y = Score, color = Variable)) +
geom_line() +
coord_polar()
Data:
read.table(text =
"X score.i score.ii score.iii mm
1 -0.3958555 -0.3750726 -0.3378881 10
2 -0.3954955 -0.3799290 -0.3400876 15
3 -0.3962514 -0.3776692 -0.3401180 20
4 -0.4033265 -0.3764099 -0.3436115 25
5 -0.4035860 -0.3753792 -0.3426287 30
186 -0.4041035 -0.3767158 -0.3419871 80
187 -0.4040643 -0.3767881 -0.3417620 85
188 -0.4052228 -0.3766468 -0.3436883 90
189 -0.4047009 -0.3767359 -0.3431591 95
190 -0.4061497 -0.3766785 -0.3433624 100",
header = T, stringsAsFactors = F) -> df1
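The code above plots the three score columns of a single data frame. If the goal is one score.i line per data frame, with a visible gap between them, one possible approach is to stack the four data frames with an identifier column and shift each one by a fixed amount. The sketch below rests on assumptions: the four data frames are simulated stand-ins, and the 0.05 offset is an arbitrary value chosen for illustration. Note that after shifting, the radial axis no longer shows the raw scores.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-ins for the four data frames (not the real data)
make_df <- function(seed) {
  set.seed(seed)
  data.frame(X = 1:19,
             mm = seq(10, 100, by = 5),
             score.i = -0.40 + rnorm(19, sd = 0.003))
}
dfs <- list(df1 = make_df(1), df2 = make_df(2),
            df3 = make_df(3), df4 = make_df(4))

# Stack the data frames, record where each row came from,
# and shift each data frame outwards by a fixed step
combined <- bind_rows(dfs, .id = "source") %>%
  mutate(offset = 0.05 * (match(source, names(dfs)) - 1),
         score_shifted = score.i + offset)

p <- ggplot(combined, aes(x = mm, y = score_shifted,
                          colour = source, group = source)) +
  geom_line() +
  coord_polar()
p
```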
I am having trouble extracting pixel intensity values from images of microtiter plates in R. I have used EBImage to threshold and segment an image, but when I do this I lose the actual intensity values from the original image.
Starting with a .png image like this:
I need to identify each individual well and calculate the average intensity within each (they are leaf discs in the well plate). Thus I would want to have 81 values from this image.
Next, I need to extract those values in a matrix that I can use to perform operations from separate images of the same plate. So the segmentation needs to be re-usable so I can just read in the other images of this same plate and extract the respective well values. The images are all the exact same size, and the location of wells does not change. There are hundreds of images of this plate taken over several hours.
So far I've segmented and thresholded, but this causes loss of the original image intensities.
Here are the attributes of the original image posted above:
print(fo)
Image
colorMode : Color
storage.mode : double
dim : 696 520 3
frames.total : 3
frames.render: 1
imageData(object)[1:5,1:6,1]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 0 0
[2,] 0 0 0 0 0 0
[3,] 0 0 0 0 0 0
[4,] 0 0 0 0 0 0
[5,] 0 0 0 0 0 0
###progress so far
library(tidyverse)
library("EBImage")
#read in image
fo <- readImage("image.png")
#crop excess
fo <- fo[99:589,79:437,1:3]
#adaptive thresholding
threshold <- thresh(fo,w=25,h=25,offset=0.01)
#use bwlabel to segment thresholded image
fo_lab <- bwlabel(threshold[,,2])
nmask = watershed(distmap(threshold), 10 )
display(colorLabels(nmask), all=TRUE)
Which would leave me with this image:
The values I would get in fo_lab are based on the thresholded intensity for each region, so they don't effectively capture the true difference in intensity between wells. More importantly, I need to carry those regions over so I can perform mathematical operations on the exact same extracted areas in subsequent images.
Any thoughts on how to do this?
Thank you.
This is tricky. Let's start by reproducing your data by reading the image straight from this Stack Overflow page:
library(tidyverse)
library("EBImage")
fo <- readImage("https://i.stack.imgur.com/MFkmD.png")
#crop excess
fo <- fo[99:589,79:437,1:3]
#adaptive thresholding
threshold <- thresh(fo, w = 25, h = 25, offset = 0.01)
#use bwlabel to segment thresholded image
fo_lab <- bwlabel(threshold[,,2])
Now, the key to this is realising that fo_lab contains an array of pixels which are labelled according to the group (i.e. the well) they are in. There are also a few stray pixels which have been assigned to their own groups, so we remove anything with fewer than a hundred pixels by writing 0s into fo_lab at these locations:
fo_table <- table(fo_lab)
fo_lab[fo_lab %in% as.numeric(names(fo_table)[fo_table < 100])] <- 0
Now we have only the pixels that are on a well labelled with anything other than a zero, and we can ensure we have the correct number of wells:
fo_wells <- as.numeric(names(table(fo_lab)))[-1]
length(fo_wells)
#> [1] 81
So now we can create a data frame that records the position (the centroid) of each well:
df <- as.data.frame(computeFeatures.moment(fo_lab))
And we can add the average intensity of the pixels within each well on the original image to that data frame:
df$intensity <- sapply(fo_wells, function(x) mean(fo[fo_lab == x]))
So we have a data frame with the required results:
head(df)
#> m.cx m.cy m.majoraxis m.eccentricity m.theta intensity
#> 1 462.2866 17.76579 29.69468 0.3301601 -0.2989824 0.1229826
#> 2 372.9313 20.51608 29.70871 0.1563481 -1.0673974 0.2202901
#> 3 417.3410 19.64526 29.43567 0.2725219 0.4858422 0.1767944
#> 4 328.2435 21.87536 29.73790 0.1112710 -0.9316834 0.3010003
#> 5 283.9245 22.69954 29.17318 0.2366731 -1.4561670 0.5471162
#> 6 239.0390 24.15465 29.39590 0.1881874 0.6315008 0.3799093
So we have each well recorded according to its x, y position, together with its average intensity. To check this, let's plot the original image in ggplot and overlay the intensity values on each well:
img_df <- reshape2::melt(as.matrix(as.raster(as.array(fo))))
ggplot(img_df, aes(Var1, Var2, fill = value)) +
geom_raster() +
scale_fill_identity() +
scale_y_reverse() +
geom_text(inherit.aes = FALSE, data = df, color = "red",
aes(x = m.cx, y = m.cy, label = round(intensity, 3))) +
coord_equal()
We can see that the whitest wells have the highest intensities and the darker wells have lower intensities.
In terms of making sure successive plates are comparable, note that the output of computeFeatures.moment(fo_lab) will always produce the labelling in the same order:
ggplot(img_df, aes(Var1, Var2, fill = value)) +
geom_raster() +
scale_fill_identity() +
scale_y_reverse() +
geom_text(inherit.aes = FALSE, data = df, color = "red",
aes(x = m.cx, y = m.cy, label = seq_along(m.cx))) +
coord_equal()
So you can use this to identify wells in subsequent plates.
Putting this all together, you can have a function that takes the image and spits out the intensities of each well, like this:
well_intensities <- function(img) {
fo <- readImage(img)[99:589,79:437,1:3]
fo_lab <- bwlabel(thresh(fo, w = 25, h = 25, offset = 0.01)[,,2])
fo_table <- table(fo_lab)
fo_lab[fo_lab %in% as.numeric(names(fo_table)[fo_table < 100])] <- 0
fo_wells <- as.numeric(names(table(fo_lab)))[-1]
data.frame(well = seq_along(fo_wells),
intensity = sapply(fo_wells, function(x) mean(fo[fo_lab == x])))
}
Which allows you to do:
well_intensities("https://i.stack.imgur.com/MFkmD.png")
#> well intensity
#> 1 1 0.1229826
#> 2 2 0.2202901
#> 3 3 0.1767944
#> 4 4 0.3010003
#> 5 5 0.5471162
#> 6 6 0.3799093
#> 7 7 0.2266809
#> 8 8 0.2691313
#> 9 9 0.1973300
#> 10 10 0.1219945
#> 11 11 0.1041047
#> 12 12 0.1858798
#> 13 13 0.1853668
#> 14 14 0.3065456
#> 15 15 0.4998599
#> 16 16 0.4173711
#> 17 17 0.3521405
#> 18 18 0.4614704
#> 19 19 0.2955793
#> 20 20 0.2511733
#> 21 21 0.1841083
#> 22 22 0.2669468
#> 23 23 0.3062121
#> 24 24 0.5471972
#> 25 25 0.7279144
#> 26 26 0.4425966
#> 27 27 0.4174344
#> 28 28 0.5155241
#> 29 29 0.5298436
#> 30 30 0.2440677
#> 31 31 0.2971507
#> 32 32 0.1490848
#> 33 33 0.2785301
#> 34 34 0.4392502
#> 35 35 0.4466012
#> 36 36 0.4020305
#> 37 37 0.4516624
#> 38 38 0.3949014
#> 39 39 0.4749804
#> 40 40 0.3820500
#> 41 41 0.2409199
#> 42 42 0.1769995
#> 43 43 0.4764645
#> 44 44 0.3035113
#> 45 45 0.3331184
#> 46 46 0.4859249
#> 47 47 0.8278420
#> 48 48 0.5102533
#> 49 49 0.5754179
#> 50 50 0.4044553
#> 51 51 0.2949486
#> 52 52 0.2020463
#> 53 53 0.3663714
#> 54 54 0.5853405
#> 55 55 0.4011272
#> 56 56 0.8564808
#> 57 57 0.5154415
#> 58 58 0.5178042
#> 59 59 0.5585773
#> 60 60 0.5070020
#> 61 61 0.2637470
#> 62 62 0.2379200
#> 63 63 0.2463080
#> 64 64 0.3840690
#> 65 65 0.3139230
#> 66 66 0.5157990
#> 67 67 0.3606038
#> 68 68 0.3066231
#> 69 69 0.4538155
#> 70 70 0.2935641
#> 71 71 0.1639805
#> 72 72 0.1892272
#> 73 73 0.2618652
#> 74 74 0.3513564
#> 75 75 0.4484937
#> 76 76 0.5032775
#> 77 77 0.3014721
#> 78 78 0.3475152
#> 79 79 0.2001712
#> 80 80 0.2873561
#> 81 81 0.1462936
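Since the wells do not move between images, another option is to compute the label matrix once (for example fo_lab from the first image) and reuse it for every later image, rather than re-segmenting each time. The sketch below shows the idea in base R on tiny made-up matrices; mean_by_label() is a hypothetical helper, and with real EBImage data you would pass one channel of each cropped image together with the stored label matrix.

```r
# Hypothetical helper: mean intensity per labelled region.
# `img` is a numeric matrix; `lab` is a matrix of the same size in which
# 0 marks background and 1, 2, ... mark the wells.
mean_by_label <- function(img, lab) {
  stopifnot(identical(dim(img), dim(lab)))
  keep <- lab > 0
  tapply(img[keep], lab[keep], mean)
}

# Toy label matrix with two "wells"
lab <- matrix(c(1, 1, 0, 2,
                1, 1, 0, 2,
                0, 0, 0, 2), nrow = 3, byrow = TRUE)

# Two toy "images" of the same plate at different times
img_t0 <- matrix(0.2, nrow = 3, ncol = 4)
img_t1 <- img_t0
img_t1[lab == 2] <- 0.8   # well 2 gets brighter

# Rows are wells, columns are images
intensity_matrix <- sapply(list(t0 = img_t0, t1 = img_t1),
                           mean_by_label, lab = lab)
intensity_matrix
```

Because every column comes from the same label matrix, row i always refers to the same pixels, which makes arithmetic between images straightforward.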
I have a dataframe df where I need to see the comparison of the trend between weeks
df
Col Mon Tue Wed
1 47 164 163
2 110 168 5
3 31 146 109
4 72 140 170
5 129 185 37
6 41 77 96
7 85 26 41
8 123 15 188
9 14 23 163
10 152 116 82
11 118 101 5
Right now I can only plot two variables, as below, but I need to see Tuesday and Wednesday as well:
ggplot(data=df,aes(x=Col,y=Mon))+geom_line()
You can either add a
geom_line(aes(x = Col, y = Mon), col = 1)
for each day, or you would need to restructure your data frame using a function like gather so that your new columns are Col, Day and Val. Without reformatting the data, your result would be
ggplot(data=df)+geom_line(aes(x=Col,y=Mon), col = 1) + geom_line(aes(x=Col,y=Tue), col = 2) + geom_line(aes(x=Col,y=Wed), col = 3)
with a restructure it would be
ggplot(data=df)+geom_line(aes(x=Col,y=Val, col = Day))
The standard way would be to get the data in long format and then plot
library(tidyverse)
df %>%
gather(key, value, -Col) %>%
ggplot() + aes(factor(Col), value, col = key, group = key) + geom_line()
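gather() still works but has been superseded in tidyr by pivot_longer(). A sketch of the same reshaping with pivot_longer() follows, using the data from the question; the column names Day and Val are just the ones chosen here.

```r
library(tidyr)
library(ggplot2)

# Data from the question
df <- data.frame(
  Col = 1:11,
  Mon = c(47, 110, 31, 72, 129, 41, 85, 123, 14, 152, 118),
  Tue = c(164, 168, 146, 140, 185, 77, 26, 15, 23, 116, 101),
  Wed = c(163, 5, 109, 170, 37, 96, 41, 188, 163, 82, 5)
)

# One row per (Col, Day) pair
long <- pivot_longer(df, cols = -Col, names_to = "Day", values_to = "Val")

# One line per day, distinguished by colour
p <- ggplot(long, aes(x = Col, y = Val, colour = Day)) +
  geom_line()
p
```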
I have a data.frame similar to this example
SqMt <- "Sex Sq..Meters PDXTotalFreqStpy
1 M 129 22
2 M 129 0
3 M 129 1
4 F 129 35
5 F 129 42
6 F 129 5
7 M 557 20
8 M 557 0
9 M 557 15
10 F 557 39
11 F 557 0
12 F 557 0
13 M 1208 33
14 M 1208 26
15 M 1208 3
16 F 1208 7
17 F 1208 0
18 F 1208 8
19 M 604 68
20 M 604 0
21 M 604 0
22 F 604 0
23 F 604 0
24 F 604 0"
Data <- read.table(text=SqMt, header = TRUE)
I want to show the average PDXTotalFreqStpy for each Sq..Meters organized by Sex. This is what I use:
library(ggplot2)
ggplot(Data, aes(x=Sq..Meters, y=PDXTotalFreqStpy)) + stat_summary(fun="mean", geom="line", aes(group=Sex,color=Sex)) # fun.y was renamed to fun in ggplot2 3.3.0
How do I get these lines smoothed out so that they are not jagged but instead nice and curvy, while still going through all the data points? I have seen things on splines, but I have not gotten those to work.
See if this works for you:
library(dplyr)
# increase n if the result is not smooth enough
# (for this example, n = 50 looks sufficient to me)
n = 50
# manipulate data to calculate the mean for each sex at each x-value
# before passing the result to ggplot()
Data %>%
group_by(Sex, x = Sq..Meters) %>%
summarise(y = mean(PDXTotalFreqStpy)) %>%
ungroup() %>%
ggplot(aes(x, y, color = Sex)) +
# optional: show point locations for reference
geom_point() +
# optional: show original lines for reference
geom_line(linetype = "dashed", alpha = 0.5) +
# further data manipulation to calculate values for smoothed spline
geom_line(data = . %>%
group_by(Sex) %>%
summarise(x1 = list(spline(x, y, n)[["x"]]),
y1 = list(spline(x, y, n)[["y"]])) %>%
tidyr::unnest(cols = c(x1, y1)),
aes(x = x1, y = y1))
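To see what spline() contributes on its own, here is a small base R sketch using the four mean values for Sex == "M" from the example data. It evaluates the interpolating spline on a fine grid and also checks that the curve passes through the original points. Be aware that a cubic spline can overshoot between points, so the smoothed curve may dip below or rise above the range of the data.

```r
# Mean PDXTotalFreqStpy for Sex == "M" at each Sq..Meters value
x <- c(129, 557, 604, 1208)
y <- c(23, 35, 68, 62) / 3

# 50 equally spaced points along the interpolating spline
smooth <- spline(x, y, n = 50)

# The spline evaluated at the original x values reproduces the original y
at_data <- spline(x, y, xout = x)
all.equal(at_data$y, y)

plot(x, y, pch = 19)   # original means
lines(smooth)          # smoothed curve through them
```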