I am making heat maps from correlations. I have two columns that represent IDs and a third column that gives the correlation between those two data points. I am struggling to get qplot to keep the order of the data in my file. Link to data:
https://www.dropbox.com/s/3l9p1od5vjt0p4d/SNPS.txt?n=7399684
Here is the code I am using to make the plot:
test <- qplot(x=x, y=y, data=PCIT, fill = col1, geom = "tile")
I have tried several ordering options, but none of them seem to do the trick. Any ideas?
Thanks and Happy Holidays
You need to set the levels of the factors x and y to be in the order you want them (as they come in from the file). Try
PCIT$x <- factor(PCIT$x, levels=unique(as.character(PCIT$x)))
and similarly with y.
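The snippet below is a minimal sketch of why this works. It uses a small made-up data frame in place of the linked file and reuses the question's column names `x`, `y` and `col1`. By default `factor()` sorts the levels, whereas `levels = unique(...)` keeps them in order of first appearance, and ggplot2 lays out discrete axes by level order.

```r
library(ggplot2)

# Hypothetical stand-in for the PCIT data read from the file
PCIT <- data.frame(
  x    = c("snpB", "snpB", "snpA", "snpA"),
  y    = c("snpB", "snpA", "snpB", "snpA"),
  col1 = c(1, 0.4, 0.4, 1),
  stringsAsFactors = FALSE
)

# Default factor(): levels are sorted, so "snpA" comes first
levels(factor(PCIT$x))  # "snpA" "snpB"

# Keep the levels in the order they first appear in the file
PCIT$x <- factor(PCIT$x, levels = unique(as.character(PCIT$x)))
PCIT$y <- factor(PCIT$y, levels = unique(as.character(PCIT$y)))

# geom_tile() now places the tiles along both axes in file order
p <- ggplot(PCIT, aes(x = x, y = y, fill = col1)) + geom_tile()
```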
Related
My data frame has 621 rows, and each column describes something about each row. I'm trying to do an exploratory data analysis in which I plot all the data as a bar plot.
I have a factor column called Phenotype with 86 levels, which describe the main condition of each person in my cohort. I want to plot this as 86 separate bars in ggplot, each showing the total number of people who have that condition.
I've attached a screenshot of my data below. I want the x axis to show the condition names, such as 'Bardet-Biedl Syndrome', 'Classic Ehlers Danlos Syndrome', etc., and the y axis to show the number of people who have each condition, such as 3, 4, 5, as displayed below. I got the counts below by running
table(tiering$Phenotype)
I'm using the below code to generate my ggplot
ggplot(tiering, aes(x = Phenotype, y = count(tiering$Phenotype))) +
  theme_bw() +
  geom_bar(stat = "identity")
I'm sure the answer is out there, but I've looked on the R help websites and can't seem to figure this out, so I would be very grateful for any help.
EDIT: I got to a bar plot with the help of the code below. Now I'm just trying to reorder the bars in decreasing order. I tried this method, but it hasn't worked. Does anyone have any suggestions?
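One possible approach is sketched below. It assumes `tiering` has one row per person with a factor column `Phenotype`, and the toy data is made up for illustration. `geom_bar()` with its default `stat = "count"` tallies the rows per level itself, so no `y` aesthetic is needed. The bars follow the factor level order, which is reset here to decreasing frequency using base R.

```r
library(ggplot2)

# Hypothetical stand-in for 'tiering': one row per person
tiering <- data.frame(
  Phenotype = factor(c("Bardet-Biedl Syndrome", "Bardet-Biedl Syndrome",
                       "Classic Ehlers Danlos Syndrome",
                       "Classic Ehlers Danlos Syndrome",
                       "Classic Ehlers Danlos Syndrome",
                       "Other condition"))
)

# Count people per condition, then use the counts to order the levels
counts <- table(tiering$Phenotype)
tiering$Phenotype <- factor(tiering$Phenotype,
                            levels = names(sort(counts, decreasing = TRUE)))

# Default stat = "count": one bar per level, height = number of rows
p <- ggplot(tiering, aes(x = Phenotype)) +
  theme_bw() +
  geom_bar() +
  # Rotated labels help when there are many (e.g. 86) conditions
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
```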
I have a stacked bar chart that looks like this.
If I have a second data frame with the same layout as the one that created the plot, and I want to group both datasets by position while still keeping the stacked percentages, how would I go about this? I'm not sure how to do it in ggplot2.
Hard to say without seeing the data and without more information about what you actually want to achieve, but the general approach I would use is to combine your data frames, especially if the variables are the same. Just make sure you keep track of which dataset each row came from; that will be your identifying column.
So, if your data is in myData1 and myData2:
# add identifying columns
myData1$id <- 'dataset1'
myData2$id <- 'dataset2'
# put them together
newData <- rbind(myData1, myData2)
It isn't clear what you're looking for in the combined plot, so there are many ways to go about it, depending on what you want to do. Maybe the simplest example is to use facet_grid() or facet_wrap() from ggplot2 to show the datasets in side-by-side plots:
ggplot(newData, aes(x=name, y=value)) +
geom_col(aes(fill=gene)) +
facet_wrap(~id)
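Since the question asks to group both datasets by position while keeping the stacked percentages, here is a sketch of one possible layout. It assumes the columns `name` (position), `value` and `gene`, and the toy values are made up. `position = "fill"` rescales each stacked bar to a total of 1, so the stacks show proportions. Faceting by position and putting the dataset on the x axis places both datasets next to each other within each position.

```r
library(ggplot2)

# Two hypothetical data frames with the same layout
myData1 <- data.frame(name  = c("pos1", "pos1", "pos2", "pos2"),
                      value = c(30, 70, 50, 50),
                      gene  = c("A", "B", "A", "B"))
myData2 <- data.frame(name  = c("pos1", "pos1", "pos2", "pos2"),
                      value = c(10, 90, 60, 40),
                      gene  = c("A", "B", "A", "B"))

# Record where each row came from, then stack the rows
myData1$id <- "dataset1"
myData2$id <- "dataset2"
newData <- rbind(myData1, myData2)

# One panel per position; within it, one 100% stacked bar per dataset
p <- ggplot(newData, aes(x = id, y = value, fill = gene)) +
  geom_col(position = "fill") +
  facet_wrap(~name)
```

Swapping `x = id` for `x = name` and faceting with `~id` gives the layout from the answer above, still with proportional stacks.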
I have a dataframe that I want to reorder to make a ggplot so I can easily see which items have the highest and lowest values in them. In my case, I've grouped the data into two groups, and it'd be nice to have a visual representation of which group tends to score higher. Based on this question I came up with:
library(ggplot2)
cor.data<- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0",stringsAsFactors = F)
cor.data.sorted = cor.data[with(cor.data,order(r.val,pic)),] #<-- line that doesn't seem to be working
ggplot(cor.data.sorted,aes(x=pic,y=r.val,size=df.val,color=exp)) + geom_point()
which produces this:
I've tried quite a few variants to reorder the data, and I feel like this should be pretty simple to achieve. To clarify, if I had successfully reorganised the data, the y-values would increase as the plot moves along the x axis. So maybe I'm focusing on the wrong part of the code to achieve this in a ggplot figure?
You could do something like this:
library(tidyverse)
cor.data %>%
mutate(pic = factor(pic, levels = as.character(pic)[order(r.val)])) %>%
ggplot(aes(x = pic, y = r.val, size = df.val, color = exp)) + geom_point()
This obviously still needs some polishing to deal with the x axis label clutter etc.
Rather than ordering the data before creating the plot, I can reorder it inside the plot call with reorder():
library(ggplot2)
cor.data <- read.csv("https://dl.dropbox.com/s/p4uy6uf1vhe8yzs/cor.data.csv?dl=0", stringsAsFactors = FALSE)
cor.data.sorted <- cor.data[with(cor.data, order(r.val, pic)), ] # <-- only controls the order in which points are drawn, for a (slightly) more readable plot
ggplot(cor.data.sorted, aes(x = reorder(pic, r.val), y = r.val, size = df.val, color = exp)) + geom_point()
to create
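To see what `reorder()` does to the x axis, here is a base R sketch on a tiny made-up data frame that reuses the question's column names `pic` and `r.val`. `reorder()` (from the stats package) turns `pic` into a factor whose levels are sorted by the mean of `r.val` within each level, and ggplot2 uses that level order for the axis.

```r
# Hypothetical toy data using the question's column names
cor.data <- data.frame(pic   = c("p1", "p2", "p3", "p4"),
                       r.val = c(0.5, -0.2, 0.9, 0.1))

# Levels sorted by increasing r.val (mean per level; one value each here)
pic_ordered <- reorder(cor.data$pic, cor.data$r.val)
levels(pic_ordered)  # "p2" "p4" "p1" "p3"

# Negating the score gives decreasing order instead
levels(reorder(cor.data$pic, -cor.data$r.val))  # "p3" "p1" "p4" "p2"
```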
I have this simple code for plotting a figure. I want the x axis ordered the way I defined it, i.e. the same as order_num: 1 to 10 and then 10+. However, ggplot changed the order. How can I keep the original order from the data frame?
data_order=data.frame(order_num=as.factor(c(rep(1:10),"10+")),
ratio=c(0.18223,0.1561,0.14177,0.1163,0.09646,
0.07518,0.05699,0.04,0.0345,0.02668,0.006725))
ggplot(data_order,aes(x=order_num,y=ratio))+geom_bar(stat = 'identity')
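A short base R illustration of why the order changes: `as.factor()` sorts the unique values as character strings, not as numbers, so "10" and "10+" land before "2". The exact position of "10+" relative to "10" can depend on the locale's collation, so only the "10"-before-"2" comparison is shown as a result.

```r
order_num <- c(1:10, "10+")  # character vector in the intended order

# as.factor() sorts the levels as strings
f <- as.factor(order_num)
levels(f)

# String sorting puts "10" before "2", so the intended order is lost
match("10", levels(f)) < match("2", levels(f))  # TRUE
```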
Read the data (note that as.factor has been removed, since we convert to a factor in the next step; this is not mandatory):
data_order=data.frame(order_num=c(rep(1:10),"10+"),
ratio=c(0.18223,0.1561,0.14177,0.1163,0.09646,
0.07518,0.05699,0.04,0.0345,0.02668,0.006725))
The fix belongs in the data frame, not in the ggplot call: set the factor levels to the order in which the values appear.
data_order$order_num <- factor(data_order$order_num, levels = data_order$order_num)
Once the levels are set this way, the bars appear in the expected order:
ggplot(data_order,aes(x=order_num,y=ratio))+geom_bar(stat = 'identity')
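If the forcats package (part of the tidyverse) is available, `fct_inorder()` is a shorthand for the same idea: it sets the factor levels in order of first appearance. This is a sketch of an alternative, not the answer's own method, applied to the same values as `order_num`.

```r
library(forcats)

order_num <- c(1:10, "10+")

# Levels follow the order of first appearance, not sorted order
f <- fct_inorder(order_num)
levels(f)
```

Assigning `data_order$order_num <- fct_inorder(data_order$order_num)` before plotting should then give the same bar order as setting `levels =` explicitly.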
This is my first post, so go easy. Up until now (the past ~5 years?) I've been able to either tweak my R code the right way or find an answer on this or various other sites. Trust me when I say that I've looked for an answer!
I have a working script that creates the attached boxplot in base R.
http://i.stack.imgur.com/NaATo.jpg
This is fine, but I really just want to "jazz" it up in ggplot, for vain reasons.
I've looked at the following questions and they are close, but not complete:
Why does a boxplot in ggplot requires axis x and y?
How do you draw a boxplot without specifying x axis?
My data is basically like "mtcars" if all the numerical variables were on the same scale.
All I want to do is plot each variable on the same boxplot, like the base R boxplot I made above. My y axis is the same continuous scale (0 to 1) for each box, and the x axis simply labels each month plus a yearly average (think of all the mtcars values sharing the same y axis, with each vehicle model on the x axis). Each box of my data represents 75 observations (as if mtcars had 75 different vehicle models); again, all the boxes are on the same scale.
What am I missing?
Though I don't think mtcars makes a great example for this, here it is:
First, we make the data (hopefully) more similar to yours by using a column instead of rownames.
mt = mtcars
mt$car = row.names(mtcars)
Then we reshape to long format:
mt_long = reshape2::melt(mt, id.vars = "car")
Then the plot is easy:
library(ggplot2)
ggplot(mt_long, aes(x = variable, y = value)) +
geom_boxplot()
Using ggplot all but requires data in "long" format rather than "wide" format. If you want something to be mapped to a graphical dimension (x-axis, y-axis, color, shape, etc.), then it should be a column in your data. Luckily, it's usually quite easy to get data in the right format with reshape2::melt or tidyr::gather. I'd recommend reading the Tidy Data paper for more on this topic.
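In current versions of tidyr, `gather()` is superseded by `pivot_longer()`. The sketch below reshapes the same `mt` data with it. One difference from `reshape2::melt()`: the new `variable` column is character, so the boxes would come out in alphabetical order. Converting it to a factor with the original column order keeps them as they appear in `mtcars`.

```r
library(tidyr)
library(ggplot2)

mt <- mtcars
mt$car <- row.names(mtcars)

# Every column except 'car' becomes a (variable, value) pair
mt_long <- pivot_longer(mt, cols = -car,
                        names_to = "variable", values_to = "value")

# Keep the boxes in mtcars column order rather than alphabetical order
mt_long$variable <- factor(mt_long$variable, levels = names(mtcars))

p <- ggplot(mt_long, aes(x = variable, y = value)) +
  geom_boxplot()
```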