Parallel Coordinates Plot in R

I'd like to draw a parallel coordinates plot of the mtcars dataset, with one variable mapped to color. I used this code:
library(GGally)
ggparcoord(data=mtcars, columns=1:10, groupColumn=11)
It generated the graph, but all the lines are in shades of blue, and with such similar colors I have trouble reading the graph and making observations. How can I use a distinct set of colors (e.g. blue, green, red, etc.) for the same variable?

When the grouping column is numeric, ggparcoord() colors the lines along a continuous (blue) gradient; when it is a factor, each level gets a distinct hue. So turn the grouping column into a factor:
mtcars[,11] <- as.factor(mtcars[,11])
ggparcoord(data=mtcars, columns=1:10 , groupColumn=11)
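If you want to pick the colors yourself (say blue, green and red), one option that avoids GGally entirely is base R's MASS::parcoord(), which takes one color per line. This is a swapped-in alternative, not part of ggparcoord(); the palette below is an arbitrary choice. Column 11 of mtcars is carb, which has six distinct values.

```r
library(MASS)  # recommended package shipped with R; provides parcoord()

# Column 11 of mtcars is carb; as a factor it has 6 levels: 1 2 3 4 6 8
carb_f <- factor(mtcars$carb)

# Arbitrary palette, one entry per level (an assumption; use any colors you like)
pal <- c("blue", "green3", "red", "orange", "purple", "black")

# Look up each car's line color from its carb level
line_cols <- pal[as.integer(carb_f)]

# One line per car across the first 10 columns, colored by carb
parcoord(mtcars[, 1:10], col = line_cols, var.label = TRUE)
legend("topright", legend = levels(carb_f), col = pal, lty = 1,
       title = "carb", cex = 0.7, bg = "white")
```

With ggparcoord() itself, the result is a ggplot object, so after converting column 11 to a factor, adding a manual colour scale such as `+ scale_colour_manual(values = pal)` should give the same kind of control.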

Related

Plot a curve with different color for each point in R

I have a curve, for instance
y_curve=c(1,2,5,6,9,1)
and a color for each point of the curve:
colors=c("#0000FF","#606060","#606060","#FF0000","#FF0000","#FF0000")
What I want is a curve whose first half is one color (except for the first point, which is blue) and whose second half is another color. My real dataset has more than 3000 observations, so the split makes more sense there than in this small example.
If I plot the data using just
plot(y_curve,col=colors)
the points are drawn in the correct colors.
However, if I add the option type="l", the plotted curve has only one color: blue, the first color in the vector colors ("#0000FF").
Does anyone know what I am doing wrong?
So the code is:
y_curve=c(1,2,5,6,9,1)
colors=c("#0000FF","#606060","#606060","#FF0000","#FF0000","#FF0000")
plot(y_curve,col=colors,type="l")
Thank you all in advance.
I'd rather avoid ggplot, since this code sits inside an already complicated function and I prefer base R commands.
With type="l", plot() draws the curve as one single line, so only one color (the first element of col) is used.
Instead, you can use the segments() function to draw each segment individually, each with its own color.
y_curve=c(1,2,5,6,9,1)
colors=c("#0000FF","#606060","#606060","#FF0000","#FF0000","#FF0000")
#create a mostly blank plot
plot(y_curve,col=NA)
# Use this to show the points:
#plot(y_curve,col=colors)
#index variable
x = seq_along(y_curve)
#draw the segments: segment i runs from point i to point i+1 and takes colors[i]
segments(head(x,-1), head(y_curve,-1), x[-1], y_curve[-1], col=head(colors,-1))
This answer is based on the solution to this question:
How do I plot a graph in R, with the first values in one colour and the next values in another colour?
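With 3000+ points a single segments() call works fine, but if you would rather have the curve drawn as continuous polylines (cleaner joins with a thick lwd, fewer drawing calls), here is a minimal sketch of a hypothetical helper, colored_line(), that groups consecutive points of the same color with rle() and calls lines() once per group. It keeps the same convention as the segments() answer: each segment takes the color of the point where it starts.

```r
# Hypothetical helper (not part of base R): draws y as polylines, one lines()
# call per run of consecutive identical colors.
colored_line <- function(y, cols, x = seq_along(y), ...) {
  stopifnot(length(cols) == length(y), length(x) == length(y))
  r <- rle(cols)                       # runs of consecutive identical colors
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  for (k in seq_along(starts)) {
    # Extend each run by one point so it joins the next run; segment i
    # therefore gets the color of its starting point, as with segments().
    idx <- starts[k]:min(ends[k] + 1, length(y))
    lines(x[idx], y[idx], col = r$values[k], ...)
  }
  invisible(data.frame(start = starts, end = ends, col = r$values,
                       stringsAsFactors = FALSE))
}

y_curve <- c(1, 2, 5, 6, 9, 1)
colors <- c("#0000FF", "#606060", "#606060", "#FF0000", "#FF0000", "#FF0000")

plot(y_curve, type = "n")              # set up the axes without drawing anything
runs <- colored_line(y_curve, colors, lwd = 2)
```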

Heatmap distortion in R

I managed to generate a heatmap in R using the heatmap function:
heatmap(heatmap_16m, col=redgreen(75))
and got the following:
As you can see, it has a balanced distribution of red, black and green.
Since heatmap() cannot draw a color legend, I switched to heatmap.2() (heatmap.2(heatmap_16m, col=redgreen(75), trace="none")) and got the following:
Here the colors are skewed heavily toward red.
So my question is: how can I get the appearance of the second heatmap (legend, row and column dendrogram order) with the distribution of greens and reds of the first?
I found the answer accidentally while searching for something else :)
Here it goes:
heatmap.2(heatmap_16m, col= greenred(75), trace="none",
scale="row")
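This works because heatmap() scales by row by default (scale = "row"), whereas heatmap.2() defaults to scale = "none". Note that greenred(75) also reverses the direction of the palette compared with redgreen(75). You can also scale by column (scale = "column"), depending on the data.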
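To see what row scaling changes, here is a small sketch on a made-up matrix m (an assumption standing in for heatmap_16m, where one row is on a much larger scale than the rest). Row scaling turns each row into z-scores (mean 0, standard deviation 1) before colors are assigned, so a few large rows no longer squeeze everything else into one end of the palette.

```r
set.seed(1)
# Toy matrix (assumption, standing in for heatmap_16m): 5 rows x 8 columns
m <- matrix(rexp(40, rate = 0.2), nrow = 5)
m[1, ] <- m[1, ] * 50          # one row on a much larger scale than the others

# scale() standardizes columns, so transpose, scale, and transpose back to get
# per-row z-scores: the same centering and scaling that scale = "row" applies
row_z <- t(scale(t(m)))

range(m)                  # dominated by row 1: most cells sit in a narrow slice of the colors
range(row_z)              # every row now spans a comparable range
rowMeans(row_z)           # ~0 for every row
apply(row_z, 1, sd)       # 1 for every row (up to rounding)
```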

Creating a continuous density heatmap of 2D scatter data in R with each column of dataframe coloured differently

I'm curious whether there's a way to improve on the answers in [1].
For example:
1) Can the x and y columns of the data frame be given different colors, rather than all red or a color gradient? Unlike the examples in the ggplot2 documentation, I don't want to color them according to a factor.
2) Can each column also get its own point shape (e.g. triangles for x values and circles for y values)?
To achieve this, I tried plotting each column separately by tweaking the code from [1].
All I got was the same plot with every point in red, and the shape did not change when I used aes() for each column separately.
Thanks and Regards,
Yogesh

R - Subtracting two smoothScatter plots

I have two smoothScatter plots and would like to subtract one from the other. See below:
par(mfrow=c(1,2))
set.seed(3)
x1 = rnorm(1000)
y1 = rnorm(1000)
smoothScatter(x1,y1,nrpoints=length(x1),cex=3)
x2 = rnorm(200)
y2 = rnorm(200)
smoothScatter(x2,y2,nrpoints=length(x2),cex=3,colramp=colorRampPalette(c("white","red")))
I'd like to produce a third plot that is a color-coded subtraction of the first plot from the second. That is, some areas would be blue, some red, and, if possible, the overlapping areas gray. The colors should reflect the new densities: for instance, the center of the new plot would be almost entirely gray, whereas the outer regions might have some gray but also patches of blue and red. Note that the two plots have different numbers of points. How can I do this?
The only way I can think of is to go pixel by pixel and subtract the colors of one plot from the other. The problem is that I don't know how to get the color intensity at each pixel. And even if I managed that, white minus white would probably give black, which I don't want.
Thanks in advance!
You might consider using slightly transparent colors:
#helper function to make transparent ramps; appends a two-digit hex alpha (00-FF) to each ramp color
alpharamp <- function(c1, c2, alpha=128) {
  stopifnot(alpha >= 0 & alpha <= 255)  # alpha must fit in two hex digits
  function(n) paste0(colorRampPalette(c(c1, c2))(n), format(as.hexmode(alpha), width=2, upper.case=TRUE))
}
And then we can overplot the two graphs with
smoothScatter(x1,y1,nrpoints=length(x1),cex=3, colramp=alpharamp("white",blues9))
par(new=T)
smoothScatter(x2,y2,nrpoints=length(x2),cex=3,colramp= alpharamp("white","red"), axes=F, ann=F)
Here's what this code produces.
If you still want to get at the actual color values in the plot, that's a bit tricky. You'd have to call grDevices:::.smoothScatterCalcDensity directly on your data. Then transform the returned fhat values by taking the 4th root and rescaling to 0-1; call the result z. Those z values are converted to color indexes with the formula floor((256 - 1e-05) * z + 1e-07) + 1, and the indexes pick a color from the 256 colors generated by the ramp you supply. It's all a bit convoluted, but you can read the source of smoothScatter and image.default to see exactly how it happens.
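Another route, rather than subtracting pixel colors, is to subtract the density estimates themselves and color the difference. The sketch below swaps in MASS::kde2d() for the KernSmooth-based estimator that smoothScatter() uses internally, so the shading will not match smoothScatter exactly; the grid size, palette and the 1% masking threshold are arbitrary choices. Because kde2d() returns densities, the different numbers of points (1000 vs 200) are normalized away.

```r
library(MASS)  # recommended package shipped with R; provides kde2d()

set.seed(3)
x1 <- rnorm(1000); y1 <- rnorm(1000)
x2 <- rnorm(200);  y2 <- rnorm(200)

# Estimate both densities on the SAME 100 x 100 grid so the cells line up
lims <- c(range(x1, x2), range(y1, y2))
d1 <- kde2d(x1, y1, n = 100, lims = lims)
d2 <- kde2d(x2, y2, n = 100, lims = lims)

# Each surface integrates to ~1, so the difference is not biased by sample size
diff_z <- d2$z - d1$z

# Set cells where both densities are negligible to NA; image() leaves NA cells
# unpainted, so empty regions stay white instead of gray (threshold is arbitrary)
tot <- d1$z + d2$z
diff_z[tot < 0.01 * max(tot)] <- NA

# Diverging palette: blue where set 1 is denser, red where set 2 is, gray where they agree
pal  <- colorRampPalette(c("blue", "gray80", "red"))(101)
zmax <- max(abs(diff_z), na.rm = TRUE)
image(d1$x, d1$y, diff_z, col = pal, zlim = c(-zmax, zmax),
      xlab = "x", ylab = "y", main = "density 2 minus density 1")
```

The symmetric zlim keeps zero in the middle of the palette, so "no difference" always maps to gray. If you care about raw counts rather than shape, multiply each z surface by its number of points before subtracting.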

Clustering and heatmap in R

I am new to R and I am trying to cluster a data table in which rows represent individual objects and columns represent the features measured for those objects. I've worked through some clustering tutorials and I do get output; however, the heatmap I get after clustering doesn't resemble at all the heatmap that another programme produces from the same data table. That programme's heatmap shows clear differences in marker expression between the objects, whereas mine shows little difference and no recognizable clustering (i.e., colour) pattern: it looks like a random jumble of similar colours with no strong contrast. Here is an example of the code I am using; maybe someone has an idea of what I might be doing wrong.
mydata <- read.table("mydata.csv")
datamat <- as.matrix(mydata)
datalog <- log(datamat)
I am using log values for the clustering because I know that the other programme does so, too.
library(gplots)
hr <- hclust(as.dist(1-cor(t(datalog), method="pearson")), method="complete")
mycl <- cutree(hr, k=7)
mycol <- sample(rainbow(256)); mycol <- mycol[as.vector(mycl)]
heatmap(datamat, Rowv=as.dendrogram(hr), Colv=NA,
col=colorpanel(40, "black","yellow","green"),
scale="column", RowSideColors=mycol)
Again, I plot the original (non-log) data but use the clusters computed from the log data, because I know that this is what the other programme does.
I've tried varying the methods, but I can't get anything that looks even remotely like a clustered heatmap. When I remove the scaling, the heatmap becomes extremely dark (and I'm fairly sure I need to scale or normalize the data by column somehow). I also tried k-means clustering, but again, it didn't help. My idea was that the colour scale might not be fully used because of two outliers, but although removing them slightly widened the range of colours on the heatmap, it still did not reveal proper clusters.
Is there anything else I could try?
Also, is it possible to change the colour scale in heatmap so that outliers fall into a last bin covering "everything greater than a particular value"? I tried this with heatmap.2 (argument "breaks"), but I didn't quite succeed, and I also couldn't get the row side colours that I use with the heatmap function to work.
If you are okay with using heatmap.2 from the gplots package, its breaks argument lets you assign colors to specific ranges of values in your heatmap.
For example, if you had 3 colors (blue, white and red) for values going from low to high, you could do something like this:
my.breaks <- c(seq(-5, -.6, length.out=6),seq(-.5999999, .1, length.out=4),seq(.100009,5, length.out=7))
result <- heatmap.2(mtscaled, Rowv=T, scale='none', dendrogram="row", symm = T, col=bluered(16), breaks=my.breaks)
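Here mtscaled stands for your (already scaled) data matrix. The three groups of breaks correspond roughly to the blue, white and red parts of the 16-color bluered() gradient; note that breaks needs exactly one more element than col (17 breaks for 16 colors). The actual values will of course depend on your data.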
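For the outlier question specifically, one approach is to space the breaks evenly up to a cutoff and make the final break the maximum, so everything above the cutoff shares the last color. As far as I know, stats::heatmap() passes extra arguments on to image(), which accepts breaks, so this can also be done without switching to heatmap.2 and keeps RowSideColors. A sketch on a made-up matrix x; the 99th-percentile cutoff and the three clusters are assumptions.

```r
set.seed(42)
x <- matrix(rnorm(200), nrow = 20)   # toy stand-in for the (scaled) data
x[1, 1] <- 15; x[2, 3] <- 12         # two outliers that would stretch the color scale

cols <- colorRampPalette(c("black", "yellow", "green"))(40)
cap  <- quantile(x, 0.99)            # assumption: values above this are "outliers"

# 40 colors need 41 breaks: 40 evenly spaced values up to the cap, then max(x),
# so the last bin covers everything from the cap to the maximum
brks <- c(seq(min(x), cap, length.out = length(cols)), max(x))

# Bin index of each cell, using the same right-closed intervals image() uses
bin <- cut(x, brks, labels = FALSE, include.lowest = TRUE)

row_groups <- cutree(hclust(dist(x)), k = 3)
heatmap(x, scale = "none", col = cols, breaks = brks,
        RowSideColors = rainbow(3)[row_groups])
```

Only the two outliers land in the last bin, so the other 39 colors are spread over the bulk of the data instead of being wasted on the empty range between about 3 and 15.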
One thing you are doing is calling hclust on your data and then passing the result to heatmap; however, the heatmap manual page says of its hclustfun argument:
Defaults to hclust.
So I don't think you need to do that. You might want to look at some similar questions I asked, which might point you in the right direction:
Heatmap Question 1
Heatmap Question 2
If you post an image of the heatmap you get and an image of the heatmap that the other program is making it will be easier for us to help you out more.
