I have a comma separated file named foo.csv containing the following data:
scale, serial, spawn, for, worker
5, 0.000178, 0.000288, 0.000292, 0.000300
10, 0.156986, 0.297926, 0.064509, 0.066297
12, 2.658998, 6.059502, 0.912733, 0.923606
15, 188.023411, 719.463264, 164.111459, 161.687982
I essentially have two questions:
1) How do I plot the first column (x-axis) versus the second column (y-axis)? I'm trying this (from reading this site):
data <- read.table("foo.csv", header=T,sep=",")
attach(data)
scale <- data[1]
serial <- data[2]
plot(scale,serial)
But I get this error back:
Error in stripchart.default(x1, ...) : invalid plotting method
Any idea what I'm doing wrong? A quick Google search reveals someone else with the same problem but no relevant answer. UPDATE: It turns out it works fine if I skip the two assignment statements in the middle. Any idea why this is?
The second question follows pretty easily after the first:
2) How do I plot the first column (x-axis) versus all the other columns on the y-axis? I presume it's pretty easy once I get around the first problem I'm running into, but am just a bit new to R so I'm still wrapping my head around it.
You don't need the two lines:
scale <- data[1]
serial <- data[2]
because attach(data) already makes scale and serial available as vectors, named after the headers that read.table picked up.
Also, scale <- data[1] creates a one-column data.frame rather than a vector:
data[1]
1 5
2 10
3 12
4 15
whereas the scale that attach(data) provides is a plain vector
5 10 12 15
and plot(scale, serial) expects vectors rather than data.frames, so once the two assignments are removed you just need to do
plot(scale, serial)
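To see the difference concretely, here is a small sketch. The contents of foo.csv are inlined through the text= argument (without the spaces after the commas) purely so the example runs on its own; that part is my assumption, not something from the question.

```r
# Inline copy of foo.csv so the example runs without the file
csv <- "scale,serial,spawn,for,worker
5,0.000178,0.000288,0.000292,0.000300
10,0.156986,0.297926,0.064509,0.066297
12,2.658998,6.059502,0.912733,0.923606
15,188.023411,719.463264,164.111459,161.687982"
data <- read.table(text = csv, header = TRUE, sep = ",")

class(data[1])     # single brackets keep the data.frame wrapper
class(data[[1]])   # double brackets extract the column itself
class(data$scale)  # $ extracts the same column by name

# Either vector form can be handed to plot()
plot(data$scale, data$serial)
```

Note that the header "for" shows up as for. in the data frame, because for is a reserved word in R and read.table adjusts the name.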
One approach to plotting the other columns of data on the y-axis:
plot(scale,serial, ylab="")
par(new=TRUE)
plot(scale,spawn,axes=F, ylab="", type="b")
par(new=TRUE)
plot(scale,for., axes=F, ylab="", type="b")
par(new=TRUE)
plot(scale,worker,axes=F, ylab="", type="b")
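Note that every plot() call above picks its own y-axis range, so the overlaid series are not drawn on a common scale and only the axis of the first plot is shown. There are probably better ways of doing this, but that is beyond my current R knowledge...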
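One base-R alternative to the par(new=TRUE) overlay is matplot(), which draws all columns of a matrix against one x vector on a shared axis. The following is only a sketch: the data is inlined for self-containment, and the log-scaled y-axis and the axis labels are my own choices, made because the values span several orders of magnitude.

```r
# Inline copy of foo.csv so the example runs without the file
csv <- "scale,serial,spawn,for,worker
5,0.000178,0.000288,0.000292,0.000300
10,0.156986,0.297926,0.064509,0.066297
12,2.658998,6.059502,0.912733,0.923606
15,188.023411,719.463264,164.111459,161.687982"
data <- read.table(text = csv, header = TRUE, sep = ",")

# Every column except scale, as a matrix (one series per column)
y <- as.matrix(data[-1])

# All series on one shared (log) y-axis
matplot(data$scale, y, type = "b", pch = 1:4, lty = 1, col = 1:4,
        log = "y", xlab = "scale", ylab = "value")
legend("topleft", legend = colnames(y), pch = 1:4, lty = 1, col = 1:4)
```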
In your example,
plot(scale, serial)
won't work because scale and serial are both data frames, e.g.
class(scale)
[1] "data.frame"
You could try the following and use points(), once the plot has been generated, to plot the remaining columns. Note, I used the ylim parameter in plot to accommodate the range in the third column.
data <- read.csv('foo.csv', header=T)
plot(data$scale, data$serial, ylim=c(0,750))
points(data$scale, data$spawn, col='red')
points(data$scale, data$for., col='green')
points(data$scale, data$worker, col='blue')
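Instead of hard-coding ylim=c(0,750), the limits can be derived from the data. A minimal sketch, again with the file contents inlined as an assumption so that it runs stand-alone:

```r
# Inline copy of foo.csv so the example runs without the file
csv <- "scale,serial,spawn,for,worker
5,0.000178,0.000288,0.000292,0.000300
10,0.156986,0.297926,0.064509,0.066297
12,2.658998,6.059502,0.912733,0.923606
15,188.023411,719.463264,164.111459,161.687982"
data <- read.table(text = csv, header = TRUE, sep = ",")

# Smallest and largest value over all y columns
ylim <- range(data[-1])

plot(data$scale, data$serial, ylim = ylim, xlab = "scale", ylab = "value")

# Remaining columns, one colour each
cols <- c(spawn = "red", "for." = "green", worker = "blue")
for (nm in names(cols)) {
  points(data$scale, data[[nm]], col = cols[[nm]])
}
```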
I'm new to R, but if you want to draw scale vs. all other columns in one plot, easily and with some elegance :) for printing or presentation, you can use Hadley Wickham's ggplot2 and reshape packages.
Installation:
install.packages("ggplot2",dep=T)
install.packages("reshape",dep=T)
Drawing your example:
library(ggplot2)
library(reshape)
#read data
data = read.table("foo.csv", header=T,sep=",")
#melt data "scale vs. all"
data2=melt(data,id=c("scale"))
data2
scale variable value
1 5 serial 0.000178
2 10 serial 0.156986
3 12 serial 2.658998
4 15 serial 188.023411
5 5 spawn 0.000288
6 10 spawn 0.297926
7 12 spawn 6.059502
8 15 spawn 719.463264
9 5 for. 0.000292
10 10 for. 0.064509
11 12 for. 0.912733
12 15 for. 164.111459
13 5 worker 0.000300
14 10 worker 0.066297
15 12 worker 0.923606
16 15 worker 161.687982
#draw all variables at once as line with different linetypes
qplot(scale,value,data=data2,geom="line",linetype=variable)
You could also use points (geom="point"), choose different colours or shapes for the different variables (colour=variable or shape=variable), adjust the axes, set individual options for every line, etc.
Link to online documentation.
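As far as I know, qplot() has been deprecated in recent ggplot2 releases, so the same plot may be better written with ggplot() directly. The sketch below also swaps melt() for base R's stack(), which is my substitution to avoid the extra reshape dependency; the inlined data is likewise just for self-containment.

```r
library(ggplot2)

# Inline copy of foo.csv so the example runs without the file
csv <- "scale,serial,spawn,for,worker
5,0.000178,0.000288,0.000292,0.000300
10,0.156986,0.297926,0.064509,0.066297
12,2.658998,6.059502,0.912733,0.923606
15,188.023411,719.463264,164.111459,161.687982"
data <- read.table(text = csv, header = TRUE, sep = ",")

# Long format: stack() puts all y columns into 'values' and their names into 'ind'
long <- data.frame(scale = rep(data$scale, times = ncol(data) - 1),
                   stack(data[-1]))
names(long) <- c("scale", "value", "variable")

# One line per variable, distinguished by linetype
p <- ggplot(long, aes(x = scale, y = value, linetype = variable)) +
  geom_line()
print(p)
```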
I am far from being an R expert, but I think you need a data.frame:
plot(data.frame(data[1],data[2]))
It does at least plot something on my R setup!
Following advice in luapyad's answer, I came up with this. I renamed the header "scale":
scaling, serial, spawn, for, worker
5, 0.000178, 0.000288, 0.000292, 0.000300
10, 0.156986, 0.297926, 0.064509, 0.066297
12, 2.658998, 6.059502, 0.912733, 0.923606
15, 188.023411, 719.463264, 164.111459, 161.687982
then:
foo <- read.table("foo.csv", header=T,sep=",")
attach(foo)
plot( scaling, serial );
Try this:
data <- read.csv('foo.csv')
plot(serial ~ scale, data)
dev.new()
plot(spawn ~ scale, data)
dev.new()
plot(for. ~ scale, data)
dev.new()
plot(worker ~ scale, data)
There is a simple-r way of plotting it:
https://code.google.com/p/simple-r/
Using that script, you just have to type:
r -cdps, -k1:2 foo.csv
This produces the plot you want. Run it in verbose mode (-v) to see the corresponding R script.
data <- read.table(...)
plot(data$scale,data$serial)
Related
I was experimenting with the waffle package in R and was trying to use a for loop to make multiple plots at once, but was not able to get my code to work. I have a dataset with values for each year of renewables, and since it covers over 40 years of data, I was looking for a simple way to plot these with a for loop rather than manually, year by year. What am I doing wrong?
I have it from 1:16 as an experiment to see if it would work, although in reality I would do it for all the years in my dataset.
for(i in 1:16){
renperc<-islren$Value[i]
parts <- c(`Renewable`=(renperc), `Non-Renewable`=100-renperc)
waffle(parts, rows=10, size=1, colors=c("#00CC00", "#A9A9A9"),
title="Iceland Primary Energy Supply",
xlab=islren$TIME)
}
If I understand your question correctly, you want to plot all 16 iterations in the same panel? You can set up your plot window to be divided into 16 smaller plots using par(mfrow = c(4,4)) (creating a 4 by 4 grid and plotting into each cell in turn).
## Setting the graphical parameters
par(mfrow = c(4,4))
## Running the loop normally
for(i in 1:16){
renperc<-islren$Value[i]
parts <- c(`Renewable`=(renperc), `Non-Renewable`=100-renperc)
waffle(parts, rows=10, size=1, colors=c("#00CC00", "#A9A9A9"),
title="Iceland Primary Energy Supply",
xlab=islren$TIME)
}
If you need more plots (e.g. 40) you can increase the numbers in the graphical parameters (e.g. par(mfrow = c(6,7))) but that will create really tiny plots. One solution is to do it in multiple loops (for(i in 1:16); for(i in 17:32); etc.)
UPDATE: The code simply wasn't plotting anything when I tried putting in anything above one value (e.g. 1:16) or a letter, whether as separate plots or as many plots in one plot window (which I think waffle perhaps does not support in the same way as regular plots). In the end, I managed by making it into a function, although I'm still not sure why my original method wouldn't work if this did. See the code that worked below. I also tweaked it a bit, adding ggsave for example.
#function
waffling <- function(x){
renperc<-islren$Value[x]
parts <- c(`Renewable`=(renperc), `Non-Renewable`=100-renperc)
waffle(parts, rows=10, size=1, colors=c("#00CC00", "#A9A9A9"), title="",
xlab=islren$TIME[x])
ggsave(file=paste0("plot_", x,".png"))}
for(i in 1:57){
waffling(i)
}
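A likely explanation, offered as a hedged guess rather than a verified diagnosis: waffle() appears to return a ggplot2 object, and such objects are only drawn when they are printed. At the top level R prints them automatically, but inside a for loop it does not, so an explicit print() is needed. ggsave(), as far as I can tell, saves the most recently created plot by default, which would explain why the function version produced files. Since ggplot2 uses grid graphics, par(mfrow = ...) would also have no effect on it. The sketch below uses plain ggplot2 bar charts and a made-up three-row islren as stand-ins for waffle() and the real data.

```r
library(ggplot2)

# Hypothetical stand-in for islren: one renewable share (in percent) per year
islren <- data.frame(TIME = 2001:2003, Value = c(70, 72, 75))

plots <- vector("list", nrow(islren))
for (i in seq_len(nrow(islren))) {
  parts <- data.frame(source = c("Renewable", "Non-Renewable"),
                      share  = c(islren$Value[i], 100 - islren$Value[i]))
  p <- ggplot(parts, aes(x = source, y = share, fill = source)) +
    geom_col() +
    labs(title = "Iceland Primary Energy Supply",
         x = as.character(islren$TIME[i]))
  print(p)          # without print(), nothing is drawn inside the loop
  plots[[i]] <- p   # keep the object, e.g. to pass as ggsave(plot = p, ...)
}
```

With the real data, wrapping the waffle() call in print() inside the original loop, and indexing islren$TIME[i], would be the analogous change.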
I have this kind of dataset
Defect.found Treatment Program
1 Testing Counter
1 Testing Correlation
0 Inspection Counter
3 Testing Correlation
2 Inspection Counter
I would like to create two boxplots, one of detected defects per program and one of detected defects per technique, but in one graph.
Meaning having:
boxplot(exp$Defect.found ~ exp$Treatment)
boxplot(exp$Defect.found ~ exp$Program)
In a joined graph.
Searching on Stack Overflow, I was able to create it, but only with the lattice library, by typing:
bwplot(exp$Treatment + exp$Program ~ exp$Defect.found)
but I would like to know whether it's possible to create the graph without additional libraries such as ggplot and lattice.
Prepare the plot window to receive two plots in one row and two columns (default is obviously one row and one column):
par(mfrow = c(1, 2))
My suggestion is to avoid using the word exp, because it is already used for the exponential function. Use for instance mydata.
Defects found against treatment (frame = F suppresses the external box):
with(mydata, plot(Defect.found ~ Treatment, frame = F))
Defects found against program (ylab = NA suppresses the y label because it is already shown in the previous plot):
with(mydata, plot(Defect.found ~ Program, frame = F, ylab = NA))
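Put together as a self-contained sketch, using the five rows from the question: since R 4.0 text columns are read as character rather than factor, and plot() with a formula may then fail, so this version calls boxplot() directly, which groups by character columns as well. Using frame.plot = FALSE (the full argument name) and restoring the old par() settings are my additions.

```r
# The five example rows from the question
mydata <- read.table(header = TRUE, text = "
Defect.found Treatment  Program
1            Testing    Counter
1            Testing    Correlation
0            Inspection Counter
3            Testing    Correlation
2            Inspection Counter")

op <- par(mfrow = c(1, 2))   # one row, two columns
boxplot(Defect.found ~ Treatment, data = mydata, frame.plot = FALSE)
boxplot(Defect.found ~ Program, data = mydata, frame.plot = FALSE, ylab = "")
par(op)                      # restore the previous layout
```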
I want to create a simple scatterplot using a table with 2 variables.
The Table looks like this:
> freqs
Var1 Freq
1 1 200
2 2 50
3 3 20
I got it using freqs <- as.data.frame(table(data$V2)) to compute the frequency of numbers in another table.
What I do right now is:
plot(freqs, log="xy", main="Frequency of frequencies",
xlab="frequencies", ylab="frequency of frequencies")
The problem is that I get a plot with lines, not dots, and I don't know why.
For another list plot() behaved differently and used dots.
It looks like this:
I know that plot depends on the datatype that it gets.
So is the problem in the way I generate freqs?
Edit:
Here is the data as requested: link
Steps were:
data <- read.csv(file="out-kant.txt",head=FALSE,sep="\t")
freqs <- as.data.frame(table(data$V2))
plot(freqs,log="xy",main="Frequency of frequencies", xlab="frequencies", ylab="frequency of frequencies")
It seems like one of your variables is not numeric. You get a scatterplot when both x and y are numeric. For example, when you run this code you will get a scatterplot, because read.table automatically reads both variables as integers:
freqs <- read.table(header=TRUE, text='Var1 freq
1 200
2 50
3 20')
plot(freqs, log="xy", main="Frequency of frequencies", xlab="frequencies", ylab="frequency of frequencies")
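Check what class your variables are with (class() is more telling than typeof() here, because typeof() also reports "integer" for a factor):
class(freqs$freq)
class(freqs$Var1)
Then, if one of them is not numeric, fix it by converting via character, which also gives the right values for a factor:
freqs$freq <- as.integer(as.character(freqs$freq))
freqs$Var1 <- as.integer(as.character(freqs$Var1))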
EDIT: So I managed to reproduce your issue when I ran:
freqs$Var1 <- as.factor(freqs$Var1)
plot(freqs, log="xy", main="Frequency of frequencies", xlab="frequencies", ylab="frequency of frequencies")
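Perhaps your Var1 variable is specified as a factor. Try running the following (going through as.character() so that you get the original values rather than the factor's internal level codes):
freqs$Var1 <- as.numeric(as.character(freqs$Var1))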
EDIT2: Used the above code to make freqs$Var1 numeric on the data provided on the main question's edit, which fixed the issue.
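The factor pitfall can be reproduced with a few made-up numbers. This sketch assumes nothing about the real out-kant.txt data; the input vector is invented and deliberately has a gap so that level codes and values differ.

```r
# Made-up input with a gap (no 3 or 4) so codes and values differ
freqs <- as.data.frame(table(c(1, 1, 2, 5, 5, 5)))

class(freqs$Var1)                      # table() turns the values into a factor
as.numeric(freqs$Var1)                 # internal level codes, not the values
as.numeric(as.character(freqs$Var1))   # the original values

# Convert for plotting; with two numeric columns plot() draws points
freqs$Var1 <- as.numeric(as.character(freqs$Var1))
plot(freqs, log = "xy", main = "Frequency of frequencies",
     xlab = "frequencies", ylab = "frequency of frequencies")
```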
I really need your R skills here. I've been working on this plot for several days now. I'm an R newbie, so that might explain it.
I have sequence coverage data for chromosomes (basically a value for each position along the length of every chromosome, making the length of the vectors many millions). I want to make a nice coverage plot of my reads. This is what I got so far:
Looks alright, but I'm missing y-labels so I can tell which chromosome it is, and I've also been having trouble modifying the x-axis so that it ends where the coverage ends. Additionally, my own data is much, much bigger, which makes this plot in particular take an extremely long time. That is why I tried plotLongVector from HilbertVis. It works, but I can't figure out how to modify it: the x-axis, the labels, how to make the y-axis logged. Also, the vectors all get the same length on the plot even though they are not equally long.
source("http://bioconductor.org/biocLite.R")
biocLite("HilbertVis")
library(HilbertVis)
chr1 <- abs(makeRandomTestData(len=1.3e+07))
chr2 <- abs(makeRandomTestData(len=1e+07))
par(mfcol=c(8, 1), mar=c(1, 1, 1, 1), ylog=T)
# 1st way of trying with some code I found on stackoverflow
# Chr1
plotCoverage <- function(chr1, start, end) { # Defines coverage plotting function.
plot.new()
plot.window(c(start, length(chr1)), c(0, 10))
axis(1, labels=F)
axis(4)
lines(start:end, log(chr1[start:end]), type="l")
}
plotCoverage(chr1, start=1, end=length(chr1)) # Plots coverage result.
# Chr2
plotCoverage <- function(chr2, start, end) { # Defines coverage plotting function.
plot.new()
plot.window(c(start, length(chr1)), c(0, 10))
axis(1, labels=F)
axis(4)
lines(start:end, log(chr2[start:end]), type="l")
}
plotCoverage(chr2, start=1, end=length(chr2)) # Plots coverage result.
# 2nd way of trying with plotLongVector
plotLongVector(chr1, bty="n", ylab="Chr1") # ylab doesn't work
plotLongVector(chr2, bty="n")
Then I have another vector called genes that are of special interest. They are about the same length as the chromosome-vectors but in my data they contain more zeroes than values.
genes_chr1 <- abs(makeRandomTestData(len=1.3e+07))
genes_chr2 <- abs(makeRandomTestData(len=1e+07))
I would like these gene vectors plotted as red dots under the chromosomes! Basically, if the vector has a value there (>0), it is shown as a dot (or line) under the long vector plot. I have no idea how to add this, but it seems fairly straightforward.
Please help me! Thank you so much.
DISCLAIMER: Please do not simply copy and paste this code and run it on all the positions of your chromosome. Please sample positions (for example, as @Gx1sptDTDa shows) and plot those. Otherwise you'd probably get a huge black filled rectangle after many, many hours, if your computer survives the drain.
Using ggplot2, this is really easily achieved using geom_area. Here, I've generated some random data for three chromosomes with 100 positions each (300 rows in total), just to show an example. You can build on this, I hope.
# construct a test data with 3 chromosomes and 100 positions
# and random coverage between 0 and 500
set.seed(45)
chr <- rep(paste0("chr", 1:3), each=100)
pos <- rep(1:100, 3)
cov <- sample(0:500, 300)
df <- data.frame(chr, pos, cov)
require(ggplot2)
p <- ggplot(data = df, aes(x=pos, y=cov)) + geom_area(aes(fill=chr))
p + facet_wrap(~ chr, ncol=1)
You could use the ggplot2 package.
I'm not sure what exactly you want, but here's what I did:
This has 7000 random data points (about double the amount of genes on Chromosome 1 in reality). I used alpha to show dense areas (not many here, as it's random data).
library(ggplot2)
Chr1_cov <- sample(1.3e+07,7000)
Chr1 <- data.frame(Cov=Chr1_cov,fil=1)
pl <- qplot(Cov,fil,data=Chr1,geom="pointrange",ymin=0,ymax=1.1,xlab="Chromosome 1",ylab="-",alpha=I(1/50))
print(pl)
And that's it. This ran in less than a second. ggplot2 has a humongous amount of settings, so just try some out. Use facets to create multiple graphs.
The code below is for a sort of moving average, followed by plotting its output. It is not a real moving average, as a real moving average would have (almost) the same number of data points as the original and would only make the data smoother. This code, however, takes one average for every n points. It will of course run quite a bit faster, but you will lose a lot of detailed information.
VeryLongVector <- sample(500,1e+07,replace=TRUE)
movAv <- function(vector,n){
chops <- as.integer(length(vector)/n)
count <- 0
pos <- 0
Cov <-0
pos[1:chops] <- 0
Cov[1:chops] <- 0
for(c in 1:chops){
# window covers elements count+1 .. count+n, so consecutive windows do not overlap
tmpcount <- count + n
tmppos <- median((count + 1):tmpcount)
tmpCov <- mean(vector[(count + 1):tmpcount])
pos[c] <- tmppos
Cov[c] <- tmpCov
count <- count + n
}
result <- data.frame(pos=pos,cov=Cov)
return(result)
}
Chr1 <- movAv(VeryLongVector,10000)
qplot(pos,cov,data=Chr1,geom="line")
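The same per-block averaging can also be written without an explicit loop, and in base graphics the gene positions from the question can be marked in red along the bottom axis. This is only a sketch under my own assumptions: binMeans is a hypothetical helper name, the coverage and gene vectors are small random stand-ins, and rug() draws ticks rather than dots.

```r
set.seed(1)
# Made-up stand-ins for one chromosome's coverage and gene vectors
coverage <- sample(500, 1e5, replace = TRUE)
genes <- integer(1e5)
genes[sample(1e5, 20)] <- 1L

# Mean of each consecutive block of n values (a trailing partial block is dropped)
binMeans <- function(v, n) {
  chops <- length(v) %/% n
  m <- matrix(v[seq_len(chops * n)], nrow = n)   # one column per block
  data.frame(pos = (seq_len(chops) - 0.5) * n + 0.5,  # centre of each block
             cov = colMeans(m))
}

binned <- binMeans(coverage, 1000)

# Coverage on a log y-axis, x-axis ending where the data ends
plot(binned$pos, binned$cov, type = "l", log = "y", bty = "n",
     xlim = c(1, length(coverage)), xlab = "position", ylab = "Chr1")

# Red ticks along the bottom axis wherever the gene vector is non-zero
rug(which(genes > 0), col = "red")
```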
I want to create intervals (discretize/bin) of continuous variables to plot a choropleth map using ggplot. After reading various threads, I decided to use cut and quantile to eliminate the problems of: a) manually creating bins, and b) taking care of dominant states (otherwise, I had to manually create bins, look at the map, and readjust the bins).
However, I am facing another problem now. Intervals coming out of cut are hardly pretty. So, I am trying to follow this example and this example to come up with my pretty labels.
Here is my list:
x <- seq(1,50)
Rounded quantiles:
qs_x <- round(quantile(x, probs=c(seq(0,0.8,by=0.2),0.9)))
which results in:
0% 20% 40% 60% 80% 90%
1 11 21 30 40 45
Using these cuts, I want to come up with these labels:
1-11, 12-21, 22-30, 31-40, 41-45, 45+
I am sure there is an easy solution to convert a list using some apply function, but I am not well-versed with those functions.
Help appreciated.
A 3-liner produces the output you want, without using apply.
labels <- paste(qs_x+1, qs_x[-1], sep="-")
labels[1] <- paste(qs_x[1], qs_x[2], sep="-")
labels[length(labels)] <- paste(tail(qs_x, 1), "+", sep = "")
The first line constructs labels of the form (x1 + 1) - x2, the second line fixes the first label, and the third line fixes the last label. Here is the output
> labels
[1] "1-11" "12-21" "22-30" "31-40" "41-45" "45+"
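To show how such labels can then be attached to the data, here is a sketch that builds them in a small helper and passes them to cut(). The helper name make_labels and the choice of Inf as the open upper break are my assumptions; the question only specifies the desired label text.

```r
x <- seq(1, 50)
qs_x <- round(quantile(x, probs = c(seq(0, 0.8, by = 0.2), 0.9)))

# Hypothetical helper: "lo-hi" labels between breaks, plus an open-ended last one
make_labels <- function(b) {
  b <- unname(b)
  n <- length(b)
  lower <- c(b[1], b[-c(1, n)] + 1)   # first break, then each inner break + 1
  upper <- b[-1]                      # every break after the first
  c(paste(lower, upper, sep = "-"), paste0(b[n], "+"))
}

labels <- make_labels(qs_x)

# Bin x: one interval per label, the last one open-ended
binned <- cut(x, breaks = c(qs_x, Inf), labels = labels, include.lowest = TRUE)
table(binned)
```

The resulting factor can be used directly as the fill variable of a choropleth map, so the legend shows the pretty labels.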