I am trying to create a Venn diagram for common differentially expressed genes across 3 data sets. I created a list that contains the differentially expressed genes, then I used the venn.diagram() function with the following arguments: x (which is my list of gene names in the three data sets) , filename,category.names and output. However, the Venn diagram is turning out completely blank, no category names nor numbers inside intersections.
My code looks like this:
venn.diagram(up, filename = 'venn_up.png', category.names = c('up_PC3', 'up_LAPC4', 'up_22Rv1'), output = TRUE)
Has anyone faced a similar problem? Thanks all!
Without reproducible dataset it is hard, so I created one:
genes <- paste("gene",1:1000,sep="")
x <- list(
up_PC3 = sample(genes,300),
up_LAPC4 = sample(genes,525),
up_22Rv1 = sample(genes,440)
)
You can use the following code to run a Venn diagram:
library(VennDiagram)
venn.diagram(x, filename = "venn_up.png", category.names = c('up_PC3', 'up_LAPC4', 'up_22Rv1'))
Than check at the right folder of your working directory for the output:
Related
As in title, when I try to draw a venn diagram with VennDiagram package, the resulting plot looks like this: Resulting graph.
My input is two tables read from txt files with read.delim (although I also tried read.table) put into a list for venn.diagram purpose. The datasets are 1325 and 675 rows long with short peptide sequences as character values (eg. REVDPDGRRTL), so I don't understand the resulting graph.
Here's what in theory should work:
library("VennDiagram")
#reading files
hid <- read.table("data/file1.txt", sep = "\t")
lid <- read.table("data/file2.txt", sep = "\t")
#creating a list
vid <- list(High = hid, Low = lid)
#graph
venn.diagram(vid, fill = c("#EFC000FF", "#0073C2FF"), filename = "#venn.png")
I also tried transforming the sets to vectors/lists and plotting like that but problem stays the same.
It surely lays on datasets/list side, because the graph is correct when I put example values like this
venn.diagram(list(Low = c("REVDPDGRRTL", "IYEDEDVKEA", "GVYDGREHTV"), High = c("IYEDEDVKEA", "GVYDGREHTV")),
fill = c("#EFC000FF", "#0073C2FF"), filename = "#venn.png")
I'm sure it's some rookie mistake but I can't think of a solution.
Any help is highly appreciated,
Thank you
I'm trying to create a Venn diagram of two data frames, but am only able receive incorrect results. An example of the data sets of the same structure:
Chemical
ChemID
Oxidopamine
D016627
Melatonin
D016627
I've only received incorrect results from the following:
VennDiagram::venn.diagram(
x = list(Lewy, Park),
category.names = c("ChemID, ChemID"),
filename ="venndiagramm.png",
output=TRUE)
Ideally, I would like to export an image of number of overlapping chemicals between the two sets.
Welcome to SO! As far as I guess your data structure (two dataframes Lewy and Park, each with the column ChemID), try the following:
VennDiagram::venn.diagram(
x = list(Lewy$ChemID, Park$ChemID), # expects vectors, not dataframes
# category.names = c("ChemID, ChemID"), # see if these are rather to construct nice labels
filename ="venndiagramm.png",
output=TRUE)
You may increase the chance of a useful answer by providing minimal working data samples by dput(). Of course you can use simulated data. Try to explain what exactly did not work.
See also ? venn.diagram
I have many amino acids sequences like in a fasta format that I did multiple sequence alignment. I was trying to plot something like binary code as heatmap. If it had a change it would be red, if it did not change would be yellow, for example.
I came across to msaplot from ggtreepackage. I also checked ggmsa package for that. But so far, I did not get what I wanted.
So basically I wanted to:
change the multiple sequence alignment to a binary matrix (if the amino differs from the reference sequence, plot x, if not y)
plot as a heatmap
A multiple sequence alignment is something like this for those who don't know.
I know that I should provide some type of data example but I am not sure how to create an example of multiple sequence alignment
if you install ggmsa you can have an example of the data and plot in r using:
protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")
ggmsa(protein_sequences, start = 265, end = 300)
We read in the alignment:
library(Biostrings)
library(ggmsa)
protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")
aln = readAAMultipleAlignment(protein_sequences)
ggmsa(protein_sequences, start = 265, end = 300)
Set the reference as the 1st sequence, some Rattus, you can also use the consensus with consensusString() :
aln = unmasked(aln)
names(aln)[1]
[1] "PH4H_Rattus_norvegicus"
ref = aln[1]
Here we iterate through the sequence and make the binary for where the sequences are the same as the reference:
bm = sapply(1:length(aln),function(i){
as.numeric(as.matrix(aln[i])==as.matrix(ref))
})
bm = t(bm)
rownames(bm) = names(aln)
The plot you see above has sequences reversed, so to visualize the same thing we reverse it, and also subset on 265 - 300:
library(pheatmap)
pheatmap(bm[nrow(bm):1,265:300],cluster_rows=FALSE,cluster_cols=FALSE)
The last row, is Rattus, the reference, and anything similar to that is read, as you can see in the alignment above, last 4 sequences are identical.
The simplest description of what I am trying to do is that I have a column in a data.frame like 1,2,3,..., n, 1,2,3,...n,.... and I want group the first 1...n as 1 the second 1...n as 2 and so on.
The full context is; I am using the R spcosa package to do equal area stratification composite sampling on parcels of land. I start with a shape file from a GIS that contains a number of polygons (land parcels). The end result I want is a GIS file with each of the strata and sample locations in a GIS file format with each stratum and sample location labeled by land parcel, stratum and sample id. So far I can do all this except one bit which is identifying the stratum that the samples belongs too and including it in the sample label. The sample label needs to look like "parcel#-strata#-composite# (where # is the number). In practice I don't need this actual label but as separate attributes in GIS file.
The basic work flow is a follows
For each individual polygon using spcosa::stratify I divide it into a number of equal area strata like
strata.CSEA <- stratify(poly[i,], nStrata = n, nTry = 1, equalArea = TRUE, nGridCells = x)
Note spcosa::stratify generates a CompactStratificationEqualArea object. I cocerce this to a SpatialPixelData then use rasterToPolygon to be able to output it as a GIS file.
I then generate the sample locations as follows:
samples.SPRC <- spsample(strata.CSEA, n = n, type = "composite")
spcosa::spsample creates a SamplingPatternRandomComposite object. I coerce this to a SpatialPointsDataFrame
samples.SPDF <- as(samples.SPRC, "SpatialPointsDataFrame")
and add two columns to the #data slot
samples.SPDF#data$Strata <- "this is the bit I can't do yet"
samples.SPDF#data$CEA <- poly[i,]$name
I can then write samples.SPDF as a GIS file ( ie writeOGE) with all the wanted attributes.
As above the part I can't sort out is how the sample ids relate to the strata ids. The sample points are a vector like 1,2,3...n, 1,2,3...n,.... How do I extract which sample goes with which strata? As actual strata number are arbitrary, I can just group ( as per my simple question above) but ideally I would like to use the numbering of the actual strata so everything lines up.
To give any contributors access to a hands on example I copy below the code from the spcosa documentation slightly modified to generate the correct objects.
# Note: the example below requires the 'rgdal'-package You may consider the 'maptools'-package as an alternative
if (require(rgdal)) {
# read a vector representation of the `Farmsum' field
shpFarmsum <- readOGR(
dsn = system.file("maps", package = "spcosa"),
layer = "farmsum"
)
# stratify `Farmsum' into 50 strata
# NB: increase argument 'nTry' to get better results
set.seed(314)
myStratification <- stratify(shpFarmsum, nStrata = 50, nTry = 1, equalArea = TRUE)
# sample two sampling units per stratum
mySamplingPattern <- spsample(myStratification, n = 2 type = "composite")
# plot the resulting sampling pattern on
# top of the stratification
plot(myStratification, mySamplingPattern)
}
Maybe order() function can help you
n <- 10
dat <- data.frame(col1 = rep(1:n, 2), col2 = rnorm(2*n))
head(dat)
dat[order(dat$col1), ]
I did not get where the "ID" (1,2,3...n) is to be found; so let's assume you have your SpatialPolygonsDataFrame called shpFarmsum with a attribute data column "ID". You can access this column via shpFarmsum$ID. Therefore, if you want to create individual subsets for each ID this is one way to go:
for (i in unique(shpFarmsum$ID)) {
tempSubset shpFarmsum[shpFarmsum$ID == i,]
writeOGR(tempSubset, ".", paste0("subset_", i), driver = "ESRI Shapefile")
}
I added the line writeOGR(... so all subsets are written to your working direktory. However, you can change this line or add further analysis into the for-loop.
How it works
unique(shpFarmsum$ID) extracts all occuring IDs (compareable to your 1,2,3...n).
In each repetition of the for loop, another value of this IDs will be used to create a subset of the whole SpatialPolygonsDataFrame, which you can use for further analysis.
I am looking for the most convenient way of creating boxplots for different values and groups read from a CSV file in R.
First, I read my Sheet into memory:
Sheet <- read.csv("D:/mydata/Table.csv", sep = ";")
Which just works fine.
names(Sheet)
gives me correctly the Headlines of the different columns.
I can also access and filter different groups into separate lists, like
myData1 <- Sheet[Sheet$Group == 'Group1',]$MyValue
myData2 <- Sheet[Sheet$Group == 'Group2',]$MyValue
...
and draw a boxplot using
boxplot(myData1, myData2, ..., main = "Distribution")
where the ... stand for more lists I have filled using the selection method above.
However, I have seen that using some formular could do these steps of selection and boxplotting in one go. But when I use something like
boxplot(Sheet~Group, Sheet)
it won't work because I get the following error:
invalid type (list) for variable 'Sheet'
The data in the CSV looks like this:
No;Gender;Type;Volume;Survival
1;m;HCM;150;45
2;m;UCM;202;103
3;f;HCM;192;5
4;m;T4;204;101
...
So i have multiple possible groups and different values which I'd like to represent as a box plot for each group. For example, I could group by gender or group by type.
How can I easily draw multiple boxes from my CSV data without having to grab them all manually out of the data?
Thanks for your help.
Try it like this:
Sheet <- data.frame(Group = gl(2, 50, labels=c("Group1", "Group2")),
MyValue = runif(100))
boxplot(MyValue ~ Group, data=Sheet)
Using ggplot2:
ggplot(Sheet, aes(x = Group, y = MyValue)) +
geom_boxplot()
The advantage of using ggplot2 is that you have lots of possibilities for customizing the appearance of your boxplot.