Creating a continuous density heatmap of 2D scatter data in R with each column of dataframe coloured differently - r

I am curious whether there is a way to improve upon the answers mentioned in 1.
For example:
1) Can the x and y columns of the data frame be colored differently, rather than all in red or with a color gradient? As specified in the ggplot2 documentation, I don't want to color the columns according to a factor.
2) Furthermore, can the shape of the points be set separately for each column of the data frame (e.g. triangles for x values and circles for y values)?
To achieve this, as far as I know, I tried to plot each column separately by tweaking the code mentioned in 1.
All I got was the same plot with every point in red, and the shape did not change when I used the aes() function for each column separately.
Thanks and Regards,
Yogesh

Related

Plot a curve with different color for each point in R

I have a curve, for instance
y_curve=c(1,2,5,6,9,1).
and the colors for each curve point
colors=c("#0000FF","#606060","#606060","#FF0000","#FF0000","#FF0000").
In theory I want to plot a curve where the first half has one color (except for the first point which is blue) and the second half has another color. In my example the dataset has more than 3000 observations so it makes sense.
For some reason, if I plot the data just using the command
plot(y_curve,col=colors), the colors of the points are plotted correctly.
Nevertheless, if I add the option type="l", the plotted curve has only one color: blue, which is the first color in the vector colors ("#0000FF").
Does anyone know what I am doing wrong?
So the code is
y_curve=c(1,2,5,6,9,1)
colors=c("#0000FF","#606060","#606060","#FF0000","#FF0000","#FF0000")
plot(y_curve,col=colors,type="l")
Thank you all in advance.
I avoid using ggplot since this part of code is inside an already complicated function and I prefer using the base R commands.
The line option for the plot function does not accept multiple colors.
There is the segments() function that we can use to manually draw in each separate segment individually with a unique color.
y_curve=c(1,2,5,6,9,1)
colors=c("#0000FF","#606060","#606060","#FF0000","#FF0000","#FF0000")
#create a mostly blank plot
plot(y_curve,col=NA)
# Use this to show the points:
#plot(y_curve,col=colors)
#index variable
x = seq_along(y_curve)
#draw the segments: segment i joins point i to point i+1 and uses colors[i]
segments(head(x,-1), head(y_curve,-1), x[-1], y_curve[-1], col=head(colors,-1))
This answer is based on the solution to this question:
How do I plot a graph in R, with the first values in one colour and the next values in another colour?

How to increase the interval of labels in geom_text?

I am trying to put labels beside some points which are very close to each other in geographic coordinates. Of course, the problem is overlapping labels. I have used the following posts for reference:
geom_text() with overlapping labels
avoid overlapping labels in ggplot2 charts
Relative positioning of geom_text in ggplot2?
The problem is that I do not want to relocate the labels but rather to increase the interval between labeled points (for example, label only every 10th point).
I tried adding an alpha column to my dataframe to make the unwanted labels transparent:
combined_df_c$alpha=rep(c(1,rep(0,times=11)),
times=length(combined_df_c$time)/
length(rep(c(1,rep(0,times=11)))))
I do not know why it does not affect the plot; all labels are still plotted.
The expected output is fewer labels on my plot.
You can do this by subsetting your dataframe for the labels passed to geom_text.
I used the built-in dataset mtcars for this, since you did not provide any data. With df[seq(1,nrow(df),6),] I slice the data in steps of 6. These rows are the labels that are shown in your graph afterwards. You can use any step size you want. The sliced dataframe is given to geom_text, so it no longer uses the original dataset, just the sliced one. This way the number of labeled points and the number of labels are equal.
library(ggplot2)
df <- mtcars
# keep the original row index so that labels stay on their points after slicing
df$idx <- seq_len(nrow(df))
labdf <- df[seq(1,nrow(df),6),]
ggplot()+
  geom_point(data=df, aes(x=drat, y=idx))+
  geom_text(data=labdf, aes(x=drat, y=idx, label=drat))
The output is as expected: from 32 rows, just 6 get labeled.
You can easily adjust the code for your case.
Also: you can put the aes in ggplot(), which may be more useful if you use more than just geom_point. I wrote it this way to make clear that a different dataset is used in geom_text().
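As a rough sketch of that last remark, the shared aesthetics can live in ggplot() while only geom_text() gets the sliced data. The column name idx and the nudge_x/hjust values are assumptions made for this illustration, not part of the original answer.

```r
library(ggplot2)

df <- mtcars
df$idx <- seq_len(nrow(df))            # row index, used as the y position (assumed name)
labdf <- df[seq(1, nrow(df), 6), ]     # every 6th row gets a label

# x and y are declared once; geom_text() only overrides the data and adds the label
p <- ggplot(df, aes(x = drat, y = idx)) +
  geom_point() +
  geom_text(data = labdf, aes(label = drat), nudge_x = 0.05, hjust = 0)
p
```

Because labdf is a subset of df, its rows keep their idx values, so each label sits next to its own point.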

Coloring and Labeling points in geom_point

I am trying to create a polar chart with two levels as similar to:
However, I am having a bit of difficulty with coloring the points, then coloring the labels without losing the color on the original points. I do not know if I should ask my questions in multiple questions or all together. I figured all together since they relate to the same graph, but if that is not allowed, please let me know and I can edit before it gets down voted. That is simply unfair. I have posted a reproducible example down below with comments to make it easier.
I have two dataframes which are basically the same. One of them has an extra column, df2$plotter, that I use to create a subset of the data to then plot the second level. The color vector, cdf, is a vector in which I store HEX values as colors.
Coloring Points
If it were one level, I would use scale_color_manual and fill/color the points that way. However, since I have two dataframes, I thought I could pass a color vector whose values would be used directly as the colors. Yet ggplot does not use the colors I assigned. Instead, it colors points D to O a murky green rather than the grey indicated by the HEX code #A9A9A9, and it shows the color values as part of the legend. I would prefer the mapping listed below. I do not know how to create a color vector such that the value in each cell is used as the actual color; this vector also needs to work for coloring the labels. Secondly, when I try to pass the same vector for the second level, the aesthetics in geom_point fail with Error: Aesthetics must be either length 1 or the same as the data. This also happens when I add plotter to the color palette, but I am guessing it is most likely due to the size of the vector itself. I would also prefer not to create another color vector, but simply refer to the first one.
• Alice (both Alice and Alice2) is #b79f00
• Bob is #00ba38
• Charlie is #00bfc4
• Peter is #619cff
• Quin is #F8766D
• Roger is #f564e3
• Then D to O is #A9A9A9
Labeling and Coloring said labels
I can add labels with geom_text. Then I call the same data and aesthetics. My issue is partially the coloring as mentioned above, but now when I color them, I lose my color but keep the fill of my points. Observe below. I do not know why my color gets lost down the way or how to fix them. I tried to plot the text first then the points, but that didn't change anything nor would I have guessed it to. Each label should be the same color as its point in short.
Reproducible Data:
k<-18
ct<-12
x_vector<-seq(1,k,1)
radius<-rep(5,k,1)
name<-c('Alice','Bob','Charlie','D','E','F','G','H','I','J','K','L','M','N','O','Peter','Quin','Roger')
df<-data.frame(x_vector,radius,name)
name2<-c('Alice2','Bob2','Charlie2','D','E','F','G','H','I','J','K','L','M','N','O','Peter2','Quin2','Roger2')
plotter<-c(1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1)
radius2<-rep(7,k,1)
df2<-data.frame(x_vector,radius2,name2,plotter)
color1<-c('#F8766D'
,'#F564E3'
,'#B79F00'
)
other_color<-c(rep('#A9A9A9',ct))
color2<-c('#00BFC4'
,'#619CFF'
,'#00BA38'
)
cdf<-c(color1,other_color,color2) #color palette
df$label_radius<-df$radius+0.5 ##used to adjust the labels by a radius of 0.5
p<-ggplot()+
## Level1
geom_point(data=df,aes(x=x_vector,y=radius,color=cdf,fill=cdf),size=3,shape=21)+
geom_text(data=df,aes(x=x_vector,y=label_radius,label=name,color=name))+
## Level2
geom_point(data=df2[(df2$plotter>0),], aes(x=x_vector,y=radius2,color=name2,fill=name2),size=3,shape=21)+
geom_text(data=df2[(df2$plotter>0),], aes(x=x_vector,y=radius2,label=name2,color=name2))+
## transform into polar coordinates
coord_polar(theta='x',start=0,direction=-1,clip='on')+
## sets up the scale to display from 0 to 7
scale_y_continuous(limits=c(0,7))+
## Used to 'push' the points so all 'k' show up.
expand_limits(x=0)
p
In general, I'd personally prefer to have all data in one dataframe and add another variable to it. So instead of Bob1 and Bob2 as df$name, have df$name=Bob and create, say, df$nr.
It is then easier to assign the same color to all Bob occurrences.
But sticking to your example: you can set color values manually with scale_color_manual and scale_fill_manual.
library(ggplot2)
k<-18
ct<-12
x_vector<-seq(1,k,1)
radius<-rep(5,k,1)
name<-c('Alice','Bob','Charlie','D','E','F','G','H','I','J','K','L','M','N','O','Peter','Quin','Roger')
df<-data.frame(x_vector,radius,name)
name2<-c('Alice2','Bob2','Charlie','D','E','F','G','H','I','J','K','L','M','N','O','Peter2','Quin2','Roger2')
plotter<-c(1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1)
radius2<-rep(7,k,1)
df2<-data.frame(x_vector,radius2,name2,plotter)
color1<-c(rep('#F8766D',2), # Alice and Alice2
rep('#F564E3',2), # Bob and Bob2
rep('#B79F00',1) # Charlie
)
other_color<-c(rep('#A9A9A9',12))
color2<-c(rep('#00BFC4',2),
rep('#619CFF',2),
rep('#00BA38',2)
)
cdf<-c(color1,other_color,color2) #color palette
df$label_radius<-df$radius+0.5 ##used to adjust the labels by a radius of 0.5
p<-ggplot()+
## Level1
geom_point(data=df,aes(x=x_vector,y=radius,color=name,fill=name),size=3,shape=21)+
scale_color_manual(values=cdf)+
scale_fill_manual(values=cdf)+
geom_text(data=df,aes(x=x_vector,y=label_radius,label=name,color=name))+
## Level2
geom_point(data=df2[(df2$plotter>0),], aes(x=x_vector,y=radius2,color=name2,fill=name2),size=3,shape=21)+
geom_text(data=df2[(df2$plotter>0),], aes(x=x_vector,y=radius2,label=name2,color=name2))+
## transform into polar coordinates
coord_polar(theta='x',start=0,direction=-1)+ #,clip='on')+ # <-- the clip property does not work for me, probably due to my ggplot version
## sets up the scale to display from 0 to 7
scale_y_continuous(limits=c(0,7))+
## Used to 'push' the points so all 'k' show up.
expand_limits(x=0)
p
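If the goal is for each HEX value to be used directly as the drawn color (as the question asks), one option is to store the HEX code in a column and use scale_color_identity() and scale_fill_identity(). The following is only a sketch under assumptions: the helper name_to_hex(), the hex column, the simplified data frames and the y limit of 8 are made up for this illustration, and the mapping is the one listed in the question.

```r
library(ggplot2)

# Mapping requested in the question, keyed by base name
pal <- c(Alice = "#B79F00", Bob = "#00BA38", Charlie = "#00BFC4",
         Peter = "#619CFF", Quin = "#F8766D", Roger = "#F564E3")

# Hypothetical helper: strip a trailing "2", look up the color, fall back to grey
name_to_hex <- function(nm) {
  hex <- pal[sub("2$", "", nm)]
  hex[is.na(hex)] <- "#A9A9A9"
  unname(hex)
}

k <- 18
df  <- data.frame(x_vector = seq_len(k), radius = 5,
                  name = c('Alice','Bob','Charlie', LETTERS[4:15], 'Peter','Quin','Roger'))
df2 <- data.frame(x_vector = c(1:3, 16:18), radius = 7,
                  name = c('Alice2','Bob2','Charlie2','Peter2','Quin2','Roger2'))
df$hex  <- name_to_hex(df$name)
df2$hex <- name_to_hex(df2$name)

p <- ggplot(mapping = aes(x = x_vector, y = radius, color = hex, fill = hex)) +
  geom_point(data = df,  size = 3, shape = 21) +
  geom_point(data = df2, size = 3, shape = 21) +
  geom_text(data = df,  aes(y = radius + 0.5, label = name)) +
  geom_text(data = df2, aes(y = radius + 0.5, label = name)) +
  scale_color_identity() +   # use the HEX strings as colors, no legend
  scale_fill_identity() +
  coord_polar(theta = 'x', start = 0, direction = -1) +
  scale_y_continuous(limits = c(0, 8)) +  # leaves room for the outer labels
  expand_limits(x = 0)
p
```

With identity scales the same column colors points and labels in both levels, so no second palette is needed; the trade-off is that no legend is drawn by default.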

How to plot heatmap with multiple categories in a single cell with ggplot2?

A heatmap of categorical variables can be plotted with this code:
#data
datf <- data.frame(indv = factor(paste("ID", 1:20), levels = rev(paste("ID", 1:20))),
                   matrix(sample(LETTERS[1:7], 400, TRUE), ncol = 20))
library(ggplot2)
library(reshape2)
# converting data to long form for ggplot2 use
datf1 <- melt(datf, id.vars = 'indv')
ggplot(datf1, aes(variable, indv)) +
  geom_tile(aes(fill = value), colour = "white") +
  scale_fill_manual(values = rainbow(7))
The code comes from here:
http://rgraphgallery.blogspot.com/2013/04/rg54-heatmap-plot-of-categorical.html
But what about multiple categories in a single cell, like this? Is it possible to use a triangle or another shape as a cell?
http://postimg.org/image/4dudrv0nz/
Copied from Biostar, as Alex Reynolds suggested.
For those interested, this appears to be Figure 2 from Exome sequencing identifies mutation in CNOT3 and ribosomal genes RPL5 and RPL10 in T-cell acute lymphoblastic leukemia.
I wanted to create a similar plot with ggplot and geom_tile for a bigger collection of genes (a few hundred) but finally decided to use geom_point instead to provide additional information per cell (tile). Also, it looks to me a lot like this plot was generated in Excel or some other spreadsheet software (maybe along those lines: https://www.youtube.com/watch?v=0s5OiRMMzuY). The colors in the cells (tiles) do not match those in the legend (suggesting that they were added separately and not automatically), and there appears to be an erroneous cell (its color-separating diagonal runs from upper left to lower right, unlike the black diagonal, which runs from lower left to upper right).
Hence, my concluding two cents: doing this automatically is probably very time-consuming and, in my opinion, only makes sense if you want to do it repeatedly, e.g., on data that is subject to change or on multiple datasets, and/or if you have a larger collection of genes.
Otherwise, following the instructions in the YouTube video for a rather small number of cells is likely to be more efficient. Or use geom_point (similar to Adding points to a geom_tile layer in ggplot2 or Marking specific tiles in geom_tile() / geom_raster()) to represent information about an additional category (variable).
In any case, should anyone have other suggestions on how to automatically create such a figure, I am more than happy to hear about that.
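A minimal sketch of the geom_point idea, assuming one row per cell with two categorical variables: the first is shown as the tile fill and the second as a point shape drawn on top. The data frame cells, its column names and the chosen shapes are invented for this illustration.

```r
library(ggplot2)

set.seed(1)  # only so the random example data is repeatable
# hypothetical data: one row per (gene, individual) cell
cells <- expand.grid(gene = paste0("G", 1:5), indv = paste("ID", 1:6))
cells$cat1 <- sample(LETTERS[1:3], nrow(cells), TRUE)     # first category: tile fill
cells$cat2 <- sample(c("x", "y", NA), nrow(cells), TRUE)  # second category: point shape

p <- ggplot(cells, aes(gene, indv)) +
  geom_tile(aes(fill = cat1), colour = "white") +
  # draw a point only where the second category is present
  geom_point(data = subset(cells, !is.na(cat2)), aes(shape = cat2), size = 3) +
  scale_shape_manual(values = c(x = 17, y = 16))  # 17 = filled triangle, 16 = filled circle
p
```

This does not split a tile along its diagonal as in the figure; it only shows a second variable per cell.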

creating comparable heatmaps in R

I am trying to create 2 heatmaps with variable values in R. I would like the colors and values to be scaled so that the values of the two heatmaps will be comparable. Right now I am using heatmap.2 from the gplots package.
MyHeatMap <- heatmap.2(MyData, trace="none", col=greenred)
My data is in the form of a numeric matrix. I have two of these matrices where the numeric ranges of the values are slightly different, and I would like to create quality heatmaps for both (it does not necessarily have to be with the same package).
I've encountered this issue a number of times in my own analyses and here is how I would suggest handling it.
Firstly, set your greenred color variable to have 256 colors with greenred(256).
Then, create a break variable that contains the range of numbers that you would like to split these 256 colors on for both heatmaps (the length will be one more than the length of the color vector). So, for instance, if you wanted the spread to be from -1 to 1 from green to red, respectively, you would do
pairs.breaks = seq(from=-1,to=1,length.out=257)
Then, when calling your heatmaps, use
MyHeatMap1 <- heatmap.2(MyData1, trace="none", col=greenred(256), breaks=pairs.breaks)
MyHeatMap2 <- heatmap.2(MyData2, trace="none", col=greenred(256), breaks=pairs.breaks)
This should produce two heat maps with different data sets that use identical color scales.
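If the limits are not known in advance, one possibility is to derive the breaks from the combined range of both matrices. This is a sketch under assumptions: the random matrices stand in for MyData1 and MyData2, and the names n_col, lim and shared.breaks are made up here.

```r
library(gplots)  # provides heatmap.2() and greenred()

set.seed(42)  # hypothetical stand-ins for MyData1 and MyData2
MyData1 <- matrix(rnorm(100, sd = 1.0), nrow = 10)
MyData2 <- matrix(rnorm(100, sd = 1.5), nrow = 10)

n_col <- 256
# symmetric limits around 0 that cover the values of both matrices
lim <- max(abs(range(MyData1, MyData2)))
# one more break than colors, as heatmap.2 requires
shared.breaks <- seq(from = -lim, to = lim, length.out = n_col + 1)

heatmap.2(MyData1, trace = "none", col = greenred(n_col), breaks = shared.breaks)
heatmap.2(MyData2, trace = "none", col = greenred(n_col), breaks = shared.breaks)
```

Symmetric limits keep 0 at the middle of the green-to-red scale; if that is not wanted, the plain range of both matrices can be used for from and to instead.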
Hope this helps!
Ron
