My table has 40 rows and 4 columns. Of those 4 columns, the first belongs to one group and the remaining three constitute the other group.
I am using the following commands to calculate the Jaccard index:
library(vegan)  # provides vegdist()
x <- read.csv(file_name, header = TRUE)  # file_name is a placeholder for the path to the input file
jac <- vegdist(x, method = "jaccard")
From this output (jac), how can I find the p-value for the two groups?
And how can I draw a notched box plot of these two groups?
When I use boxplot(as.matrix(jac) ~ x$first column, notch=TRUE)
it shows 40 box plots. Why is that?
I have an expression matrix containing three groups. I need to split the heatmap at specific column ranges.
Total number of columns: 151, where the 1st column holds the gene IDs
Group1: 2:40
Group2: 41:80
Group3: 81:151
I searched for ways to split a heatmap and got some hits like this.
But they are based on specific clusters.
I need to give my own ranges (2:40, 41:80, 81:151) to split the heatmap or draw boundaries in it.
Something like this
library(pheatmap)
mat = cbind(genes=1:100,
matrix(rnorm(150*100,mean = rep(1:3,c(39*100,40*100,71*100))),ncol=150))
colnames(mat)[2:ncol(mat)] = paste0("col",1:150)
You need to know how many columns are in each group. From what you provided, I counted this:
Group1: 39 Group2: 40 Group3: 71
So you need to make a data.frame whose row names match the column names of your matrix, and which says whether each column belongs to Group1, Group2 or Group3.
DF = data.frame(Groups=rep(c("Group1","Group2","Group3"),c(39,40,71)))
rownames(DF) = colnames(mat)[2:ncol(mat)]
Then we plot. mat[,-1] means excluding the first column. You need to specify where to insert the gaps, and for your example they go after columns 39 and 79, because we excluded the first column:
pheatmap(mat[,-1],cluster_cols=FALSE,
annotation_col=DF,gaps_col = cumsum(c(39,40)))
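The gap positions follow directly from the group sizes. Here is a minimal sketch in base R, assuming the three column ranges from the question; the names group_sizes, ends and gaps are only illustrative:

```r
# Group sizes taken from the column ranges 2:40, 41:80 and 81:151
group_sizes <- c(Group1 = 39, Group2 = 40, Group3 = 71)

# Running total of columns; the last value is the total column count
ends <- cumsum(group_sizes)

# A gap goes after every group except the last one
gaps <- unname(ends[-length(ends)])
gaps
```

The resulting vector is what would be passed as gaps_col. Dropping the last cumulative value avoids asking for a gap after the final column.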
I am trying to plot a dataframe as follows:
A 1
C 5
B 4
Z 10
M 7
and would like it to show the data in that order (i.e. the first column in the bar chart is A, the second is C, the third is B).
I have:
ggplot(pc,aes(x=Let,y=Count))+geom_bar(stat="identity")
But it plots the bars in the alphabetical order of the Let column.
df<-data.frame(Let = c('A','C','B','Z','M'),Count = c(1,5,4,10,7))
One way is to convert the Let column to a factor whose levels are in the order you want to see, and then call ggplot.
library(tidyverse)
df$Let <- factor(df$Let, levels = df$Let)
ggplot(df,aes(x=Let,y=Count))+geom_bar(stat="identity")
data
df<-data.frame(Let = c('A','C','B','Z','M'),Count = c(1,5,4,10,7))
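To see why this works, it can help to inspect the factor levels directly. The sketch below uses base R only; the vector let and its repeated "C" are hypothetical, added to show that wrapping the values in unique() keeps the order of first appearance even when a label occurs more than once:

```r
# Hypothetical labels in the desired display order, with "C" repeated
let <- c("A", "C", "B", "Z", "M", "C")

# Levels follow the order of first appearance instead of alphabetical order
f <- factor(let, levels = unique(let))
levels(f)

# The integer codes show which level each value maps to
as.integer(f)
```

ggplot2 draws a discrete axis in level order, so setting the levels this way controls the order of the bars.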
I have a data frame that has 14 columns. Two of those 14 columns are "Region" and "Population Density." Let's say that I want to find all instances where Region is 4 and print the value of the population density for each of those instances.
# here I am going to add a new column in the data frame called "PopDens"
# this new column will take the total population and divide it by
# the land area
cdi.df$PopDens= cdi.df$TotalPop/cdi.df$LandArea
dim(cdi.df) #here I verify the columns are now 14 and not 13
head(cdi.df) #here I verify that the name and calculations are correct
head(cdi.df$PopDens, 3) #here I return only the first three values
Is there a way to return only the values of PopDens where Region is 4?
Just like this:
cdi.df[cdi.df$Region==4,] #to see the data.frame with only region 4
cdi.df$PopDens[cdi.df$Region==4] #to see only the population densities
Next time, please provide a reproducible example, as explained here.
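For illustration, here is a small self-contained sketch of the same subsetting. The data frame below is a made-up stand-in for cdi.df (the real data has 14 columns), and dens_region4 is just an illustrative name:

```r
# Hypothetical stand-in for cdi.df; all values are invented
cdi.df <- data.frame(
  Region   = c(1, 4, 2, 4),
  TotalPop = c(1000, 2000, 3000, 500),
  LandArea = c(10, 40, 30, 5)
)
cdi.df$PopDens <- cdi.df$TotalPop / cdi.df$LandArea

# Logical indexing: keep the densities of the rows where Region equals 4
dens_region4 <- cdi.df$PopDens[cdi.df$Region == 4]
dens_region4

# subset() selects the same rows but returns a data frame
subset(cdi.df, Region == 4, select = c(Region, PopDens))
```

If Region can contain missing values, wrapping the condition in which() drops the NA entries that logical indexing would otherwise return.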
Seems like quite an easy problem to solve, but I can't seem to get my head around it in R.
I have a dataset with the following columns:
'Biomass' where each row is a value of biomass for a particular species
'Count' where each row is the number of individual animals of that species counted
I need to create a histogram of biomasses, but if I use hist(DF$Biomass) I will get a histogram of the biomasses of the animals where each value is one animal.
I need to include the count, so that I have (for example) the weight frequencies of elephant x 2, giraffe x 56 etc..
you're not making my life easy :)
Is this what you want ?
DF <- data.frame(Biomass=c(200,200,1500),Count = c(36,20,2))
DF2 <- aggregate(Count ~ Biomass,DF,sum) # sum different occurrences for each Biomass value
barplot(DF2$Count,names.arg =DF2$Biomass) # presents them with a barplot, which is more appropriate than a histogram in the R sense here.
If I understood you right, this is what you need :)
biomass<-c(1,5,7,6,3)
count<-c(1,2,1,3,4)
new<-NULL
for (i in seq_along(biomass))
{
new<-c(new, rep(biomass[i], count[i]))
}
new
hist(new)
So finally just type:
new<-NULL
for (i in seq_along(DF$Biomass))
{
new<-c(new, rep(DF$Biomass[i], DF$Count[i]))
}
hist(new)
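The loop can also be written as a single vectorised call, since rep() accepts one repeat count per element through its times argument. A minimal sketch, reusing the example DF from the earlier answer; the name expanded is only illustrative:

```r
# Example data from the earlier answer: one row per species
DF <- data.frame(Biomass = c(200, 200, 1500), Count = c(36, 20, 2))

# Repeat each biomass value as many times as animals were counted
expanded <- rep(DF$Biomass, times = DF$Count)
length(expanded)

# Histogram in which every individual animal contributes one observation
hist(expanded)
```

This avoids growing a vector inside a loop, which gets slow when the counts are large.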
I have a dataframe "df" that has 120 rows and 2 columns containing numbers as shown...
V1 V2
10001 177417
227418 267719
317720 471368
I want to be able to lay these along the X-axis of a plot, with a line connecting the values from V1 to V2 in each row.
One option would be to use seq(V1,V2) for each row and then concatenate the results into a full series. However, with the amount of data involved, the object size runs to >10GB, so that is not a viable option. The Y-axis position here is not important.
Any ideas?
First create the plot from the first row, then add the rest of the rows using the segments function:
plot(x=c(1,1), y=df[1,], xlim = c(1,nrow(df)), ylim=range(df), type='l')
segments(x0=2:nrow(df), x1=2:nrow(df), y0=df[-1,1], y1=df[-1,2])
Here is how it looks on a random cumulative set:
df <- apply(as.data.frame(cbind(rnorm(1000),rnorm(1000))),2,cumsum)
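If the intervals should lie along the X-axis, as the question describes, the same segments() idea can be turned sideways. This is a sketch under assumptions: it uses only the three rows shown in the question, and the fixed height y = 1 is an arbitrary choice, since the Y position was said not to matter:

```r
# First three rows from the question
df <- data.frame(V1 = c(10001, 227418, 317720),
                 V2 = c(177417, 267719, 471368))

# Axis limits that cover every start and end value
x_range <- range(df$V1, df$V2)

# Empty plot; the y-axis is hidden because it carries no meaning here
plot(NA, xlim = x_range, ylim = c(0, 2),
     xlab = "Position", ylab = "", yaxt = "n")

# One horizontal segment per row, running from V1 to V2 at y = 1
segments(x0 = df$V1, x1 = df$V2, y0 = 1, y1 = 1, lwd = 3)
```

Only the two endpoints of each row are stored, so memory use stays proportional to the number of rows and not to the length of the intervals.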