How to draw box plots for each cluster in same diagram? - r

I have done the clustering using Rattle and at the end I have the following format of data:
I need to draw a boxplot for each cluster in the same diagram (i.e. there are 5 clusters, so I need cluster numbers 1 to 5 on the X axis and age on the Y axis).
I tried the following, but I did not get the result I expected.
Can anybody suggest correct settings to get 5 box plots in parallel for each cluster?

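You can do this in R. First read the file into a data frame, then call boxplot with a formula so that age is grouped by cluster:
reviews <- read.csv("abc.csv", stringsAsFactors = FALSE)
boxplot(age ~ Cluster, data = reviews, xlab = "Cluster", ylab = "Age")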
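Since abc.csv is not available here, the same call can be tried on made-up data. This is only a sketch: the values in reviews below are randomly generated stand-ins, and only the column names age and Cluster are taken from the answer above.

```r
# Hypothetical stand-in for the clustered data: 5 clusters, 20 rows each.
set.seed(1)
reviews <- data.frame(
  age     = round(runif(100, min = 18, max = 70)),
  Cluster = rep(1:5, each = 20)
)

# The formula age ~ Cluster draws one box per distinct Cluster value,
# side by side on the same axes.
bp <- boxplot(age ~ Cluster, data = reviews,
              xlab = "Cluster", ylab = "Age")

# boxplot() invisibly returns the statistics it drew; one column per cluster.
bp$names
```

If Cluster is stored as text or as a number, the formula interface still treats it as a grouping variable, so no explicit conversion to a factor should be needed.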

Related

How to label 5 specific points on PCA plot

I have used the tidyverse package and just want to add labels to 5 specific points on this plot. I have tried the code below, but it does not give me any points. The data set has about 2000 observations of 21 variables.
BOTTOM=which(interest2$ID%in%project.pca$ID);
text(which(interest2$ID%in%project.pca$ID)[BOTTOM,1], text(which(interest2$ID%in%project.pca$ID))[BOTTOM,2],text(which(interest2$ID%in%project.pca$ID)[BOTTOM,3],rownames(input)[BOTTOM],pos=1)
With ggplot2, make the PCA plot first, and then add an additional layer that uses a data frame containing only those 5 points. See this post for an example:
https://datavizpyr.com/how-to-add-labels-to-select-points-with-ggrepel/
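A minimal sketch of that layering idea, assuming ggplot2 is installed. The asker's data is not available, so the built-in USArrests data stands in for it, and the five IDs in label_ids are chosen arbitrarily for illustration. Plain geom_text is used here; the linked post uses ggrepel, which would be a drop-in replacement for the label layer.

```r
library(ggplot2)

# PCA on a built-in data set (stand-in for the asker's ~2000 x 21 data).
pca <- prcomp(USArrests, scale. = TRUE)
scores <- data.frame(pca$x[, 1:2], ID = rownames(USArrests))

# Hypothetical choice of the 5 points to label.
label_ids <- c("Alaska", "California", "Florida", "Texas", "Vermont")
labelled <- scores[scores$ID %in% label_ids, ]

# All points in grey, then extra layers built from the 5-row subset only.
p <- ggplot(scores, aes(x = PC1, y = PC2)) +
  geom_point(colour = "grey50") +
  geom_point(data = labelled, colour = "red") +
  geom_text(data = labelled, aes(label = ID), vjust = -0.8)
print(p)
```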

Plot group in lattice, using different data sources

Using the lattice package in R, I would like to plot one row of 7 diagrams, all using the same Y-axis. The diagrams should be (vertical) line diagrams. The problem is that my data are in 7 separate data frames (each containing X and Y data), with slightly different limits on the Y-axis data.
Despite reading the tutorials, I cannot get it right. What should my code look like? Is there even a clean solution for this in lattice?
You could combine all your data frames into one and then do something like
xyplot(Y~X|odf,data=combinedDF,layout=c(7,1))
where odf is an indicator column of the original data frame. This by default should use a common y scale.
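A sketch of the combining step, using randomly generated data frames in place of the real ones. The names frames, DF1 to DF7 and the generated values are assumptions made for illustration; combinedDF and odf follow the names used above.

```r
library(lattice)

# Seven hypothetical data frames with X and Y columns and differing Y ranges.
set.seed(42)
frames <- lapply(1:7, function(i) {
  data.frame(X = 1:20, Y = cumsum(rnorm(20)) + i)
})
names(frames) <- paste0("DF", 1:7)

# Stack them, adding an indicator column that records the source data frame.
combinedDF <- do.call(rbind, Map(function(d, nm) cbind(d, odf = nm),
                                 frames, names(frames)))
combinedDF$odf <- factor(combinedDF$odf, levels = names(frames))

# One panel per source, 7 columns by 1 row, drawn as lines.
p <- xyplot(Y ~ X | odf, data = combinedDF, type = "l", layout = c(7, 1))
print(p)
```

Setting the factor levels explicitly keeps the panels in the order DF1 to DF7.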
Apart from combining the data, you could create 7 separate plots, then print them.
p1 <- xyplot(Y~X,data=DF1,ylim=c(Y1,Y2))
p2 <- xyplot(Y~X,data=DF2,ylim=c(Y1,Y2))
etc.
To print:
print(p1,split=c(1,1,7,1),more=TRUE)
print(p2,split=c(2,1,7,1),more=TRUE)
...
print(p7,split=c(7,1,7,1),more=FALSE)
see ?print.trellis.
Of course, arranging single plots like this doesn't really use the features of lattice. You could just as easily do this with base graphics using layout or par(mfrow=c(1,7)) for example, and a common ylim.
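The base graphics variant mentioned above might look like the following sketch. The data frames are again randomly generated stand-ins, and common_ylim is a name introduced here for the shared Y range.

```r
# Seven hypothetical data frames with slightly different Y ranges.
set.seed(7)
frames <- lapply(1:7, function(i) {
  data.frame(X = 1:20, Y = rnorm(20, mean = i))
})

# One Y range that covers every data frame.
common_ylim <- range(unlist(lapply(frames, function(d) d$Y)))

# One row of 7 panels; small margins so the panels fit side by side.
old_par <- par(mfrow = c(1, 7), mar = c(4, 2, 2, 0.5))
for (i in seq_along(frames)) {
  plot(Y ~ X, data = frames[[i]], type = "l",
       ylim = common_ylim, main = paste0("DF", i))
}
par(old_par)  # restore the previous graphics settings
```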

PCA biplot one variables shown R

I ran a PCA on a set of 45000 genes across 5 different samples. When I draw a biplot, all I see is a mass of text (corresponding to the observation names), and I cannot see the location of my samples. Is there a way to plot only the location of the samples, and not the observations, in a biplot?
Using built in data from R
usa <- USArrests
pca1 <- prcomp(usa)
biplot(pca1)
This generates a biplot where all the states (observation names) overlap the variables (my different samples) rape, etc. Is it possible to plot only the variables (samples), and not the states (observation names)?
biplot.default uses text to write the categorical variable name of the observation. As it doesn't use points you need to modify the source if you only want the points (and not the labels) to be plotted.
However, you could "hack" it by doing something like:
biplot(pca1, xlabs = rep(".", nrow(usa)))
I hope this is what you're looking for!
Edit: if this is not satisfactory, you can modify the source shown by running stats:::biplot.default so that it uses points.
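Another option, not part of the answer above, is to skip biplot entirely and draw only the variable loadings yourself. The sketch below does that with base graphics; note that it plots the raw rotation matrix, so the scaling differs from what biplot shows.

```r
usa <- USArrests
pca1 <- prcomp(usa)

# Loadings of each variable on the first two principal components.
loadings <- pca1$rotation[, 1:2]

# Empty plot sized to include the origin and every arrow tip,
# then one arrow and one label per variable; no observations are drawn.
plot(loadings, type = "n", xlab = "PC1", ylab = "PC2",
     xlim = range(c(0, loadings[, 1])) * 1.2,
     ylim = range(c(0, loadings[, 2])) * 1.2)
arrows(0, 0, loadings[, 1], loadings[, 2], length = 0.1)
text(loadings, labels = rownames(loadings), pos = 3)
```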

combine dendrogram plots and heatmap plots in R [duplicate]

I am trying to take my dataset which is made up of protein dna interaction, cluster the data and generate a heatmap that displays the resulting data such that the data looks clustered with the clusters lining up on the diagonal. I am able to cluster the data and generate a dendrogram of that data however when I generate the heatmap of the data using the heatmap function in R, the clusters are not visible. If you look at the first 2 images one is of the dendrogram I am able to generate, the second is of the heatmap that I am able to generate, and the third is just an example of a clustered heatmap that shows how I expect the result to look roughly. As you can see from comparing the second and third images, it is clear that there are clusters in the third but not in the second image.
Here is a link to my dataset:
http://pastebin.com/wQ9tYmjy
I am able to cluster the data and generate a dendrogram just fine in R:
args <- commandArgs(TRUE);
matrix_a <- read.table(args[1], sep='\t', header=T, row.names=1);
location <- args[2];
matrix_d <- dist(matrix_a);
hc <- hclust(matrix_d,"average");
mypng <- function(filename = "mydefault.png") {
png(filename)
}
options(device = "mypng")
plot(hc);
I am also able to generate a heatmap okay as well:
matrix_a <- read.table("Arda_list.txt.binary.matrix.txt", sep='\t', header=T, row.names=1);
mtscaled <- as.matrix(scale(matrix_a))
heatmap(mtscaled, Colv=F, scale='none')
I tried to follow the post:
http://digitheadslabnotebook.blogspot.com/2011/06/drawing-heatmaps-in-r.html
by Christopher Bare, but I am missing something. Any ideas would be appreciated. I have attached an image of the heatmap that I am getting, as well as the dendrogram. Image 3 was taken from Christopher Bare's post. Thanks
It turns out I should have first generated a distance matrix using some kind of correlation on my data. I calculated similarity values on the matrix using Pearson correlation and then called the heatmap function, which made it easier to cluster the data. Once I was able to generate clusters, I arranged them so that they line up on the diagonal. Above is what the result looks like now. I had to alter how I called heatmap on my data set so that the clusters line up on the axis:
heatmap(mtscaled, Colv=T,Rowv=T, scale='none',symm = T)
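The answer does not show how the similarity values were computed, so the following is only one plausible reading of it, on made-up data: compute a Pearson correlation matrix between rows, cluster on 1 minus the correlation, and draw the symmetric matrix with that dendrogram. The names m, sim and hc and the two artificial row profiles are assumptions for illustration.

```r
# Hypothetical data: 16 rows in two groups with opposite profiles.
set.seed(3)
profile_a <- c(1, 2, 3, 4, 5, 6)
profile_b <- c(6, 5, 4, 3, 2, 1)
m <- rbind(
  t(replicate(8, profile_a + rnorm(6, sd = 0.5))),
  t(replicate(8, profile_b + rnorm(6, sd = 0.5)))
)
rownames(m) <- paste0("p", 1:16)

# Pearson similarity between rows, turned into a distance for clustering.
sim <- cor(t(m), method = "pearson")
hc <- hclust(as.dist(1 - sim), method = "average")

# symm = TRUE reuses the row dendrogram for the columns, so clusters
# appear as blocks along the diagonal.
heatmap(sim, Rowv = as.dendrogram(hc), symm = TRUE, scale = "none")
```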

Plotting different contour plots with similar scales in R or gnuplot

I am new to plotting in R and want to draw contour plots for several files. Here is what I have so far. My file has 3 columns (X, Y, Z) and contains some NaN values. Since lattice does not allow Inf/NaN values, I had to remove them first and then interpolate.
data <- read.table("file", sep=",", header=T)
mydata <- na.omit(data)
library(akima)
library(lattice)
s = interp(mydata$X, mydata$Y, mydata$Z)
filled.contour(s, xlim= c(5,25), ylim=c(40,180))
This gives some results, but there are two things I have not managed to do:
Draw contour lines on the graph.
Use the same colour scale across files. There are 3 files with different z ranges, say (0-18), (0-20) and (0-25), and I want to rescale them so that, for instance, the value 15 has the same colour in all three.
I am more familiar with gnuplot, but there the problem is also with the ranges: the colour range is always autoscaled, and it seems difficult to control. Any help with that is also deeply appreciated.
I may be doing something wrong, so in case anybody could help me out, and provide to right direction, or right software, I will be grateful.
There are demos here for how to make contours in gnuplot. Are you having trouble in the sense that you have code to make a contour plot but it does not work?
To answer your second question, in gnuplot the command you probably want is
set cbrange [CB_MIN:CB_MAX]
This sets the range of values which will be colored according to the current palette. You would just have to issue the same set cbrange command for all three plots you are making. If you want to automatically set the cbrange to the min/max on all files, you can use the stats command (in version 4.6 or newer, otherwise it is more tricky):
stats 'datafile1' using 3 name 'd1'
stats 'datafile2' using 3 name 'd2'
stats 'datafile3' using 3 name 'd3'
datamin_z = (d1_min<d2_min&&d1_min<d3_min?d1_min:d2_min<d3_min?d2_min:d3_min)
datamax_z = (d1_max>d2_max&&d1_max>d3_max?d1_max:d2_max>d3_max?d2_max:d3_max)
set cbrange [datamin_z:datamax_z]
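For the R side of the question, the same idea (one shared colour range for all plots) can be sketched with base graphics. This is not part of the gnuplot answer above. The three surfaces below are synthetic stand-ins for the interpolated files, with peaks of roughly 18, 20 and 25; make_z, surfaces, zlim_all and levels_all are names introduced here. Passing the same levels to every filled.contour call fixes the colour mapping, and the plot.axes argument is used to overlay contour lines.

```r
# Synthetic grids standing in for the output of interp() on three files.
x <- seq(5, 25, length.out = 30)
y <- seq(40, 180, length.out = 30)
make_z <- function(top) {
  outer(x, y, function(a, b) {
    top * exp(-((a - 15)^2 / 40 + (b - 110)^2 / 2000))
  })
}
surfaces <- list(make_z(18), make_z(20), make_z(25))

# One z range and one set of levels shared by all three plots.
zlim_all <- range(unlist(surfaces))
levels_all <- pretty(zlim_all, 20)

for (z in surfaces) {
  filled.contour(x, y, z, zlim = zlim_all, levels = levels_all,
                 plot.axes = {
                   axis(1)
                   axis(2)
                   # Contour lines drawn on top of the filled regions.
                   contour(x, y, z, levels = levels_all, add = TRUE)
                 })
}
```

Because every call uses the same levels, a given z value such as 15 falls in the same colour band in each plot.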

Resources