Dotplot with two categorical variables and two quantitative variables - r

I have a problem making a dotplot. I have a data frame "distribution_tab" with 4 columns and 6 rows. The first two columns are quantitative variables and the other two are categorical:
read.length percentage.GC strand organism
1 203 63.0 forward bacteria
2 250 33.0 forward plant
3 205 72.0 reverse bacteria
4 240 36.0 reverse plant
5 210 33.5 forward plant
6 230 63.5 reverse bacteria
I want to make a single dotplot from this data frame, with read.length on the x axis and percentage.GC on the y axis. The "forward" strand should be drawn as a dot and the "reverse" strand as a triangle (or any two other distinct symbols). The organism "bacteria" should be pink and the organism "plant" green.
So, for instance, an observation that is "forward and bacteria" should appear as a pink dot, and one that is "reverse and plant" as a green triangle.
I really don't know how to do this (or whether it is possible at all). For the moment I have made a dotplot of the two quantitative variables:
plot(distribution_tab$percentage.GC ~ distribution_tab$read.length)
I have no idea how to distinguish the points by their organism and strand values.

distribution_tab <- read.table(header = TRUE, text = "read.length percentage.GC strand organism
1 203 63.0 forward bacteria
2 250 33.0 forward plant
3 205 72.0 reverse bacteria
4 240 36.0 reverse plant
5 210 33.5 forward plant
6 230 63.5 reverse bacteria ")
plot(percentage.GC ~ read.length, data = distribution_tab,
pch = c(17,19)[(strand %in% 'forward') + 1L],
col = c('pink', 'green')[(organism %in% 'plant') + 1L])
Or use ifelse, although the indexing approach above is more flexible:
plot(percentage.GC ~ read.length, data = distribution_tab,
pch = ifelse(strand %in% 'forward', 19, 17),
col = ifelse(organism %in% 'plant', 'green', 'pink'))
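Here is a minimal sketch of the same indexing idea using named lookup vectors. Each category maps to a symbol or colour by name rather than by position. Base plot() draws no legend, so one is added by hand. The names shape_map, colour_map, pch_vec and col_vec are illustrative. as.character() guards against strand/organism having been read as factors, which was the default before R 4.0.

```r
# Data from the question
distribution_tab <- read.table(header = TRUE, text = "read.length percentage.GC strand organism
1 203 63.0 forward bacteria
2 250 33.0 forward plant
3 205 72.0 reverse bacteria
4 240 36.0 reverse plant
5 210 33.5 forward plant
6 230 63.5 reverse bacteria")

# Hypothetical lookup vectors: one entry per category level
shape_map  <- c(forward = 19, reverse = 17)          # 19 = solid dot, 17 = solid triangle
colour_map <- c(bacteria = "pink", plant = "green")

# Look up one symbol and one colour per row, by name
pch_vec <- unname(shape_map[as.character(distribution_tab$strand)])
col_vec <- unname(colour_map[as.character(distribution_tab$organism)])

plot(percentage.GC ~ read.length, data = distribution_tab,
     pch = pch_vec, col = col_vec)

# Base graphics has no automatic legend, so build one from the lookup vectors;
# pch 15 (filled square) serves as the colour swatch for the organisms
legend("topright",
       legend = c(names(shape_map), names(colour_map)),
       pch    = c(shape_map, 15, 15),
       col    = c("black", "black", colour_map))
```

Adding a third organism or strand then only means adding one entry to the corresponding lookup vector.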

Using ggplot:
library(ggplot2)
df$col <- ifelse(df$organism == "bacteria", "pink", "green")
ggplot(df, aes(read.length, percentage.GC, shape = strand, col = col)) +
geom_point(size = 4) +
scale_color_identity()
Data:
#dummy data
df <- read.table(text=" read.length percentage.GC strand organism
1 203 63.0 forward bacteria
2 250 33.0 forward plant
3 205 72.0 reverse bacteria
4 240 36.0 reverse plant
5 210 33.5 forward plant
6 230 63.5 reverse bacteria ", header = TRUE)
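scale_color_identity() uses the colour strings as they are and, by default, draws no colour legend. An alternative is to map organism directly and fix its colours with scale_colour_manual(), which keeps a legend for both variables. This is a sketch assuming ggplot2 is installed; the shape codes 16/17 are an arbitrary choice.

```r
library(ggplot2)

df <- read.table(header = TRUE, text = "read.length percentage.GC strand organism
1 203 63.0 forward bacteria
2 250 33.0 forward plant
3 205 72.0 reverse bacteria
4 240 36.0 reverse plant
5 210 33.5 forward plant
6 230 63.5 reverse bacteria")

p <- ggplot(df, aes(read.length, percentage.GC, shape = strand, colour = organism)) +
  geom_point(size = 4) +
  # Named values pin each level to a colour/shape regardless of level order
  scale_colour_manual(values = c(bacteria = "pink", plant = "green")) +
  scale_shape_manual(values = c(forward = 16, reverse = 17))
print(p)
```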

Related

Plotting each value of columns for a specific row

I am struggling to plot a specific row from a data frame. Below is the graph I am trying to plot. I have tried using ggplot and base plot(), but I cannot figure it out.
Wt2 Wt3 Wt4 Wt5 Lngth2 Lngth3 Lngth4 Lngth5
1 48 59 95 82 141 157 168 183
2 59 68 102 102 140 168 174 170
3 61 77 93 107 145 162 172 177
4 54 43 104 104 146 159 176 171
5 100 145 185 247 150 158 168 175
6 68 82 95 118 142 140 178 189
7 68 95 109 111 139 171 176 175
Above is the data frame I am trying to plot. Each row holds one bear's measurements, so row 1 is bear 1. How would I plot only the Wt columns for bear 1 against an x axis running from year 2 to year 5?
You can pivot your data frame into a longer format:
First add a column with the row number (bear number):
df = cbind("Bear"=as.factor(1:nrow(df)), df)
It needs to be a factor so we can pass it to ggplot as a grouping variable. Now pivot:
df2 = tidyr::pivot_longer(df[,1:5], cols=2:5,
names_to="Year", values_to="Weight", names_prefix="Wt")
df2$Year = as.numeric(df2$Year)
Here is what each part does:
- df[,1:5] drops the Length columns.
- cols=2:5 says that only the weight columns are pivoted.
- names_to and values_to name the new columns.
- names_prefix="Wt" strips "Wt" from the column names, leaving only the year number.

The year is still a character string at this point, so we convert it with as.numeric().
Then plot:
ggplot(df2, aes(x=Year, y=Weight, linetype=Bear)) + geom_line()
Output (P.S. I created my own data, so the actual numbers are off):
As an addition, if you don't want to specify the columns of your dataset explicitly, you can do:
df2 = df[, grep("Wt|Bear", colnames(df))]
df2 = tidyr::pivot_longer(df2, cols=grep("Wt", colnames(df2)),
names_to="Year", values_to="Weight", names_prefix="Wt")
Edit: one plot for each group
You can use facet_wrap:
ggplot(df2, aes(x=Year, y=Weight, linetype=Bear)) +
facet_wrap(~Bear, nrow=2, ncol=4) +
geom_line()
Output:
You can change nrow and ncol as you wish. You can also remove linetype from aes(), since the facets already distinguish the bears, but it's not mandatory.
You can also change the factor levels to get better labels on each panel, for example with levels(df2$Bear) = paste("Bear", 1:7) (or set them when creating the factor).
Try
ggplot(mapping = aes(x = seq.int(2, 5), y = c(48, 59, 95, 82))) +
geom_point(color = "blue") +
geom_line(color = "blue") +
xlab("Year") +
ylab("Weight")
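The hard-coded vectors above can also be pulled from the data frame, so any bear can be plotted by changing one number. This is a minimal base R sketch. It assumes the year is always the numeric suffix of the Wt column names; bears and bear_id are hypothetical names.

```r
# Bear data from the question: rows are bears, Wt*/Lngth* columns are years 2-5
bears <- read.table(header = TRUE, text = "Wt2 Wt3 Wt4 Wt5 Lngth2 Lngth3 Lngth4 Lngth5
1 48 59 95 82 141 157 168 183
2 59 68 102 102 140 168 174 170
3 61 77 93 107 145 162 172 177
4 54 43 104 104 146 159 176 171
5 100 145 185 247 150 158 168 175
6 68 82 95 118 142 140 178 189
7 68 95 109 111 139 171 176 175")

bear_id <- 1                                          # hypothetical: which row (bear) to plot
wt_cols <- grep("^Wt", names(bears), value = TRUE)    # "Wt2" "Wt3" "Wt4" "Wt5"
years   <- as.numeric(sub("^Wt", "", wt_cols))       # 2 3 4 5
weights <- unlist(bears[bear_id, wt_cols])           # that bear's weights, one per year

plot(years, weights, type = "b", xlab = "Year", ylab = "Weight",
     main = paste("Bear", bear_id))
```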

Change Bar Colours in a Grouped Bar Plot

My data consist of numerical values between 100 and 2000, grouped into 3 drug treatment groups. Each treatment group is subdivided into 3 groups based on anatomical location in the organism ("Inner", "Middle", "Outer"). The final plot should show 3 groups of 3 bars, each bar representing the mean cell survival at one of the 3 locations. So far I have managed to make individual barplots, but I want to combine them. Below is a small excerpt from the data set, followed by the code I have so far.
Treatment Inner Middle Outer
RAD 317 373 354
RAD 323 217 174
RAD 236 255 261
HUTS 1411 1844 1978
HUTS 1922 1756 1856
HUTS 1478 1711 1433
RGD 1433 1489 1633
RGD 1400 1500 1544
RGD 1222 1333 1444
With some help, I've been able to create a grouped bar plot using this code:
library(tidyr)    # gather() and the %>% pipe
library(ggplot2)
df %>%
gather(key = group, value = value, -Treatment) %>%
ggplot(aes(x = Treatment, y = value, fill = group)) +
stat_summary(fun = mean, geom = "col", position = position_dodge())  # fun.y is deprecated since ggplot2 3.3.0
Now, however, I want to be able to choose the colours of the bars.
Any help would be really appreciated!
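One common way to choose the colours is scale_fill_manual() with a named vector, so each location keeps its colour regardless of level order. This is a sketch assuming ggplot2 3.3.0 or later (for the fun argument). It reshapes with base R stack() instead of gather() so only ggplot2 is needed. The hex colours are an arbitrary example.

```r
library(ggplot2)

df <- read.table(header = TRUE, text = "Treatment Inner Middle Outer
RAD 317 373 354
RAD 323 217 174
RAD 236 255 261
HUTS 1411 1844 1978
HUTS 1922 1756 1856
HUTS 1478 1711 1433
RGD 1433 1489 1633
RGD 1400 1500 1544
RGD 1222 1333 1444")

# Long format without tidyr: stack() stacks Inner, Middle, Outer column by column
long <- data.frame(Treatment = rep(df$Treatment, 3),
                   stack(df[c("Inner", "Middle", "Outer")]))
names(long)[names(long) == "values"] <- "value"
names(long)[names(long) == "ind"] <- "group"

p <- ggplot(long, aes(x = Treatment, y = value, fill = group)) +
  stat_summary(fun = mean, geom = "col", position = position_dodge()) +
  # Any colours work; naming them ties each colour to a location
  scale_fill_manual(values = c(Inner = "#1b9e77", Middle = "#d95f02", Outer = "#7570b3"))
print(p)
```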

ggplot doesn't show the second geom_line() in my plot

My df:
p1 p2 p3 x y
0 3000 14 0.0 0.026500
20 3000 14 11.0 0.054000
30 3000 14 17.9 0.057000
60 3000 14 49.3 0.064000
80 3000 14 77.4 0.063000
60 3500 14 45.3 0.061000
60 4000 14 41.4 0.058300
60 4400 14 43.7 0.073600
60 3500 9 41.7 0.060556
60 3500 18 46.7 0.060700
60 3500 21 49.2 0.059900
This is the result of a "one parameter at a time" experimental design, i.e., one where the parameters p1, p2 and p3 were changed one at a time (definitely not the best kind of DOE, but that's what I've got). Two variables, x and y, are measured for each observation. I would like to plot three lines:
- one connecting all points of the p1 study (the first 5 rows);
- one connecting all points of the p2 study (rows 4 and 6:8);
- one connecting the points of the p3 study (rows 6 and 9:11).

I tried with:
library(ggplot2)
library(dplyr)  # for filter()
ggplot(df, aes(x = x, y = y, color = p2)) +
geom_point( aes(shape = p3)) +
geom_line() +
geom_line(data = filter(df, p1 == "60" & p3 == "14"), aes(x = x, y = y))
The red and green lines correspond to the p1 and p3 studies, but ggplot doesn't draw the line for the p2 study. How can I plot it? In practice, I need either a geom_path or a geom_line connecting the triangle symbols in the centre of the plot (x coordinate between 40 and 50).
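A likely explanation, assuming p2 is a factor (as the discrete colours suggest): colour = p2 also defines the groups, and the second geom_line() inherits it. In the p2 study every point has a different p2, so each group holds a single point and there is nothing to connect. The sketch below shows one possible fix. It gives every row an explicit study label, duplicating the shared rows 4 and 6, and groups and colours by that label. The names studies and long are illustrative.

```r
library(ggplot2)

df <- read.table(header = TRUE, text = "p1 p2 p3 x y
0 3000 14 0.0 0.026500
20 3000 14 11.0 0.054000
30 3000 14 17.9 0.057000
60 3000 14 49.3 0.064000
80 3000 14 77.4 0.063000
60 3500 14 45.3 0.061000
60 4000 14 41.4 0.058300
60 4400 14 43.7 0.073600
60 3500 9 41.7 0.060556
60 3500 18 46.7 0.060700
60 3500 21 49.2 0.059900")

# Row indices of each one-factor-at-a-time study (from the question);
# rows 4 and 6 are shared, so they are copied into two studies
studies <- list(p1 = 1:5, p2 = c(4, 6:8), p3 = c(6, 9:11))

long <- do.call(rbind, lapply(names(studies), function(s) {
  d <- df[studies[[s]], ]
  d$study <- s
  d
}))

p <- ggplot(long, aes(x = x, y = y, colour = study, group = study)) +
  geom_point() +
  geom_line()   # sorts by x within each group; geom_path() would keep row order
print(p)
```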

ggplot create map with arrows

I have a data frame like this
id lon lat
1 A -69.5 -58.5
2 A -69.5 -58.5
3 A -69.5 -57.5
4 A -68.5 -57.5
5 A -68.5 -57.5
6 A -68.5 -57.5
7 A -66.5 -57.5
8 A -68.5 -56.5
9 A -68.5 -56.5
10 A -67.5 -56.5
11 A -65.5 -56.5
12 A -65.5 -56.5
13 A -65.5 -55.5
14 A -62.5 -54.5
15 B -177 -52.5
16 B -178 -50.5
17 B -179 -48.5
18 B 179 -47.5
19 B 178 -46.5
20 B 177 -46.5
and I want to produce a map of the positions of A and B, linked by directed lines. However, when a track crosses the Pacific (lon = -180 -> lon = +180), I get an arrow crossing the whole figure, as shown below.
This is the code I am using:
library(ggplot2)  # map_data() also needs the maps package installed
worldmap = map_data("world")
ggplot(test, aes(x = lon, y = lat, colour = factor(id))) +
geom_polygon(data = worldmap, aes(x = long, y = lat, group = group), fill = "black", colour = "black") +
xlab("") + ylab("") +
theme(axis.text = element_blank(), axis.ticks = element_blank(),
panel.background = element_rect(fill = 'white', colour = 'black'),
panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
geom_path(size = 2, arrow = arrow(angle = 30, length = unit(0.6, "inches")))
How can I fix it?
Thanks
I guess that depends on what you think the "right" thing to do is. I decided to break each path that crosses the globe into two segments by adding points at the edge of the map. I then create a "sequence" indicator so ggplot knows which points to connect. Here's the transformation for your sample data:
test2 <- do.call(rbind, lapply(split(test, test$id), function(x) {
# start a new segment wherever consecutive longitudes jump by more than 250 degrees (either direction)
cp <- cumsum(c(FALSE, abs(diff(x$lon))>250))
xx<-split(x, cp)
xx<-Map(cbind, xx, seq=seq_along(xx))
Reduce(function(a,b) {
lasta<-a[nrow(a),]
firstb<-b[1,]
# move the two boundary points onto the map edge (+/-180)
lasta$lon <- 180*sign(lasta$lon)
firstb$lon <- 180*sign(firstb$lon)
# let both pieces meet at the midpoint latitude
lasta$lat <- mean(c(lasta$lat, firstb$lat))
firstb$lat <- lasta$lat
rbind(a,lasta, firstb,b)
}, xx)
}))
tail(test2)
# id lon lat seq
# B.17 B -179 -48.5 1
# B.171 B -180 -48.0 1
# B.18 B 180 -48.0 2
# B.181 B 179 -47.5 2
# B.19 B 178 -46.5 2
# B.20 B 177 -46.5 2
Here you can see that we've broken the B line into two sequences. Then, if we use a group aesthetic
geom_path(aes(group=interaction(id, seq)), ...)
R will only connect points that are in the same id/seq group. This prevents the line from running across the whole map. However, because we are now drawing two lines for that id rather than one, there's no way to turn off the arrowhead for just one of the segments. You might want to find another way to indicate start and end.
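One way to follow that last suggestion uses start and end markers in place of arrowheads. The split paths are drawn without arrows, and each track's first and last observation gets its own point shape. The splitting step is repeated so the block runs on its own, using abs() so jumps in either direction are caught. ends and pos are illustrative names. The world map layer is left out so only ggplot2 is needed.

```r
library(ggplot2)

test <- read.table(header = TRUE, text = "id lon lat
1 A -69.5 -58.5
2 A -69.5 -58.5
3 A -69.5 -57.5
4 A -68.5 -57.5
5 A -68.5 -57.5
6 A -68.5 -57.5
7 A -66.5 -57.5
8 A -68.5 -56.5
9 A -68.5 -56.5
10 A -67.5 -56.5
11 A -65.5 -56.5
12 A -65.5 -56.5
13 A -65.5 -55.5
14 A -62.5 -54.5
15 B -177 -52.5
16 B -178 -50.5
17 B -179 -48.5
18 B 179 -47.5
19 B 178 -46.5
20 B 177 -46.5")

# Split each track where it crosses the antimeridian and add edge points
test2 <- do.call(rbind, lapply(split(test, test$id), function(x) {
  cp <- cumsum(c(FALSE, abs(diff(x$lon)) > 250))
  xx <- split(x, cp)
  xx <- Map(cbind, xx, seq = seq_along(xx))
  Reduce(function(a, b) {
    lasta <- a[nrow(a), ]
    firstb <- b[1, ]
    lasta$lon <- 180 * sign(lasta$lon)
    firstb$lon <- 180 * sign(firstb$lon)
    lasta$lat <- mean(c(lasta$lat, firstb$lat))
    firstb$lat <- lasta$lat
    rbind(a, lasta, firstb, b)
  }, xx)
}))

# Hypothetical helper table: first and last observed point of each track
ends <- do.call(rbind, lapply(split(test, test$id), function(d) {
  i <- c(1, nrow(d))
  data.frame(id = d$id[i], lon = d$lon[i], lat = d$lat[i], pos = c("start", "end"))
}))

p <- ggplot(test2, aes(lon, lat, colour = id)) +
  geom_path(aes(group = interaction(id, seq))) +       # one arrow-free path per id/seq piece
  geom_point(data = ends, aes(shape = pos), size = 4)  # shape legend shows start vs end
print(p)
```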

Add points ggplot

Hi, I have many data frames like this
id oldid yr mo dy lon lat
1 01206295 Aberfeldy 1885 3 22 -127.1 -31.78
2 05670001 05670005 1885 3 22 -4.38 49.15
3 06279 06279 1885 3 22 -123.5 37.5
4 106251 06323 1885 3 22 178.5 19.5
5 58FFF3618 58FFF3618 1885 3 22 -0.73 69.73
6 Achille.F Achille.F 1885 3 22 -35.62 -2.98
stored in different files (myfiles). I am trying to plot the (lon, lat) points for each of them, with the colour chosen according to the id value. So far I am doing this:
for (i in seq_along(myfiles)){
colnames(myfilesContent[[i]]) <- c("id","oldid","yr","mo","dy","lon","lat")
p <- ggplot() + geom_polygon(data=world_map,aes(x=long, y=lat,group=group))
# as.character() first, in case lon/lat were read as factors
myfilesContent[[i]]$lon <- as.numeric(as.character(myfilesContent[[i]]$lon))
myfilesContent[[i]]$lat <- as.numeric(as.character(myfilesContent[[i]]$lat))
# assign the result back to p, otherwise print(p) shows only the map
p <- p + geom_point(data=myfilesContent[[i]], aes(x=lon, y=lat, fill=as.factor(id)), size = 4, shape = 21, show.legend=FALSE)
print(p)
}
However, I am not sure that an id appearing in different files will be assigned the same colour in each plot.
Many thanks
You can make sure the levels of all your id columns are the same. First, get a master list of all the IDs from all the data.frames:
allids <- unique(unlist(lapply(myfilesContent, function(x) as.character(x[,1]))))
Then make sure all the ID columns share these levels:
myfilesContent <- lapply(myfilesContent, function(x) {
x[,1] <- factor(x[,1], levels=allids)
x
})
If they have the same levels, they should get the same colours, provided the scale keeps unused levels. By default ggplot drops levels that don't appear in the plotted data, so add scale_fill_discrete(drop = FALSE) to each plot.
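An alternative to relying on shared factor levels is a named colour vector. Build one colour per id across all files and pass it to scale_fill_manual() in every plot, so each id is tied to its colour by name. The sketch below uses a hypothetical split of the sample rows into two "files" that share ids 06279 and 106251. all_rows, allids, id_colours and plots are illustrative names. hcl.colors() is in base R's grDevices (R 3.6.0 or later).

```r
library(ggplot2)

# Sample rows from the question (id, lon, lat only)
all_rows <- data.frame(
  id  = c("01206295", "05670001", "06279", "106251", "58FFF3618", "Achille.F"),
  lon = c(-127.1, -4.38, -123.5, 178.5, -0.73, -35.62),
  lat = c(-31.78, 49.15, 37.5, 19.5, 69.73, -2.98),
  stringsAsFactors = FALSE)
# Hypothetical "files": overlapping subsets that share two ids
myfilesContent <- list(all_rows[1:4, ], all_rows[3:6, ])

# One fixed colour per id, collected over all files
allids <- sort(unique(unlist(lapply(myfilesContent, function(x) as.character(x$id)))))
id_colours <- setNames(hcl.colors(length(allids)), allids)

plots <- lapply(myfilesContent, function(x) {
  ggplot(x, aes(lon, lat, fill = id)) +
    geom_point(size = 4, shape = 21) +
    scale_fill_manual(values = id_colours)   # matched by name, not by level position
})
print(plots[[1]])
print(plots[[2]])
```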
