I want show cluster wise boxplot distribution from complexheatmap. I was able to do row-wise distribution but how do I implement the cluster-wise distribution attached as example.
In the dummy example it creates a subgroup which it shows in the distribution. Similar manner I have already in my datafile made cluster which is represented in the first column.
How do I implement this in my dataframe using this example code
I'm not sure how do I make subgroup in case of my dataframe.
Any suggestion or help would be really appreciated.
This is the output i would like to see:
This is the output I have:
The dataset is this one: small_data
And my code:
df <- read.csv("small_data.txt",header = TRUE)
heat <- t(scale(t(df[,3:ncol(df)])))
myBreaks <- seq(-1.5, 1.5, length.out=100)
hmap <- Heatmap(heat)
hmap
How do i implement the cluster specific distribution ? as it is shown in the first pic. The second figure is what I'm getting now
Related
I am working with California Housing Dataset. The dataset has 20640 observations and 10 attributes. I am using R to make biplot but the figure I obtained is not very readable. The output is as followenter image description here
I am using a simple code to make this output.
biplot(housingpr,scale = 0)
Is there anyway to make this biplot look readble.
There is not much you can do if you want to plot 20,640 observations except to make the points smaller. Here is an example with the iris data:
data(iris)
iris.pca <- prcomp(iris[, -5], scale.=TRUE)
biplot(iris.pca, xlabs=rep("*", nrow(iris)), cex=.75)
The xlabs= argument sets the text for each point with the default value being the row number. This replaces the default with an asterisk for each value. If there are still too many points you can replace the asterisk with a period. The cex= argument controls the size of the labels with the default value of 1 being full-size.
I am new to TramineR and using seqdplot() command I created a plot to visualize cluster patterns. Is there a way to plot the proportions of each cluster (as seen on the plot below)?
To display the proportion, you can use the group.p function of TraMineRextras. The function adds the proportion to each group label.
Here is the example given in the help page:
library('TraMineRextras')
data(actcal)
actcal <- actcal[1:100,]
actcal.seq <- seqdef(actcal[,13:24])
seqdplot(actcal.seq, group=group.p(actcal$sex))
So... newbie R user here. I have some observations that I'd like to record using R and be able to add to later.
The items are sorted by weights, and the number at each weight recorded. So far what I have looks like this:
weights <- c(rep(171.5, times=1), rep(171.6, times=2), rep(171.7, times=4), rep(171.8, times=18), rep(171.9, times=39), rep(172.0, times=36), rep(172.1, times=34), rep(172.2, times=25))
There will be a total of 500 items being observed.
I'm going to be taking additional observations over time to (hopefully) see how the distribution of weights changes with use/wear. I'd like to be able plots showing either stacked histograms or boxplots.
What would be the best way to format / store this data to facilitate this kind of use case? A matrix, dataframe, something else?
As other comments have suggest, the most versatile (and perhaps useful) container (structure) for your data would be a data frame - for use with the library(ggplot2) for your future plotting and graphing needs(such as BoxPlot with ggplot and various histograms
Toy example
All the code below does is use your weights vector above, to create a data frame with some dummy IDs and plot a box and whisker plot, and results in the below plot.
library(ggplot2)
IDs<-sample(LETTERS[1:5],length(weights),TRUE) #dummy ID values
df<-data.frame(ID=IDs,Weights=weights) #make data frame with your
#original `weights` vector
ggplot(data=df,aes(factor(ID),Weights))+geom_boxplot() #box-plot
I'm trying to plot some data with 2d density contours using ggplot2 in R.
I'm getting one slightly odd result.
First I set up my ggplot object:
p <- ggplot(data, aes(x=Distance,y=Rate, colour = Company))
I then plot this with geom_points and geom_density2d. I want geom_density2d to be weighted based on the organisation's size (OrgSize variable). However when I add OrgSize as a weighting variable nothing changes in the plot:
This:
p+geom_point()+geom_density2d()
Gives an identical plot to this:
p+geom_point()+geom_density2d(aes(weight = OrgSize))
However, if I do the same with a loess line using geom_smooth, the weighting does make a clear difference.
This:
p+geom_point()+geom_smooth()
Gives a different plot to this:
p+geom_point()+geom_smooth(aes(weight=OrgSize))
I was wondering if I'm using density2d inappropriately, should I instead be using contour and supplying OrgSize as the 'height'? If so then why does geom_density2d accept a weighting factor?
Code below:
require(ggplot2)
Company <- c("One","One","One","One","One","Two","Two","Two","Two","Two")
Store <- c(1,2,3,4,5,6,7,8,9,10)
Distance <- c(1.5,1.6,1.8,5.8,4.2,4.3,6.5,4.9,7.4,7.2)
Rate <- c(0.1,0.3,0.2,0.4,0.4,0.5,0.6,0.7,0.8,0.9)
OrgSize <- c(500,1000,200,300,1500,800,50,1000,75,800)
data <- data.frame(Company,Store,Distance,Rate,OrgSize)
p <- ggplot(data, aes(x=Distance,y=Rate))
# Difference is apparent between these two
p+geom_point()+geom_smooth()
p+geom_point()+geom_smooth(aes(weight = OrgSize))
# Difference is not apparent between these two
p+geom_point()+geom_density2d()
p+geom_point()+geom_density2d(aes(weight = OrgSize))
geom_density2d is "accepting" the weight parameter, but then not passing to MASS::kde2d, since that function has no weights. As a consequence, you will need to use a different 2d-density method.
(I realize my answer is not addressing why the help page says that geom_density2d "understands" the weight argument, but when I have tried to calculate weighted 2D-KDEs, I have needed to use other packages besides MASS. Maybe this is a TODO that #hadley put in the help page that then got overlooked?)
I am trying to take my dataset which is made up of protein dna interaction, cluster the data and generate a heatmap that displays the resulting data such that the data looks clustered with the clusters lining up on the diagonal. I am able to cluster the data and generate a dendrogram of that data however when I generate the heatmap of the data using the heatmap function in R, the clusters are not visible. If you look at the first 2 images one is of the dendrogram I am able to generate, the second is of the heatmap that I am able to generate, and the third is just an example of a clustered heatmap that shows how I expect the result to look roughly. As you can see from comparing the second and third images, it is clear that there are clusters in the third but not in the second image.
Here is a link to my dataset:
http://pastebin.com/wQ9tYmjy
I am able to cluster the data and generate a just fine in R:
args <- commandArgs(TRUE);
matrix_a <- read.table(args[1], sep='\t', header=T, row.names=1);
location <- args[2];
matrix_d <- dist(matrix_a);
hc <- hclust(matrix_d,"average");
mypng <- function(filename = "mydefault.png") {
png(filename)
}
options(device = "mypng")
plot(hc);
I am also able to generate a heatmap okay as well:
matrix_a <- read.table("Arda_list.txt.binary.matrix.txt", sep='\t', header=T, row.names=1);
mtscaled <- as.matrix(scale(matrix_a))
heatmap(mtscaled, Colv=F, scale='none')
I tried to follow the post:
http://digitheadslabnotebook.blogspot.com/2011/06/drawing-heatmaps-in-r.html
by by Christopher Bare but I am missing something. Any ideas would be appreciated. I have attached an image of the heatmap that I am getting, as well as the dendrogram. Image 3 was taken from Christopher Bare's post. Thanks
It turns out I should have generated a distance matrix using some kind of correlation on my data first. I calculated similarity values on the matrix using pearson, then called the heapmap function which made it easier to cluster the data. Once I was able to generate clusters I made it so that they would line up on the diagonal. Above is what the result looks like now. I had to alter how I called heatmap on my data set so that the clusters line up on the axis:
heatmap(mtscaled, Colv=T,Rowv=T, scale='none',symm = T)