Plotting different contour plots with similar scales in R or gnuplot - r

I am new to plotting in R, and I wish to make contour plots for several files. Here is what I have so far. My file has 3 columns, X, Y and Z, with some NaN values. Since lattice does not allow Inf/NaN values, I had to remove them first and do some interpolation.
data <- read.table("file", sep=",", header=T)
mydata <- na.omit(data)
library(akima)
library(lattice)
s = interp(mydata$X, mydata$Y, mydata$Z)
filled.contour(s, xlim= c(5,25), ylim=c(40,180))
This does give some results, but there are two things I am not able to do:
Get contour lines on the graph.
Use a common colour scale. There are 3 files with different z ranges, say (0-18), (0-20) and (0-25). I wish to rescale them so that the colour scale is the same on every graph; for instance, the value 15 should have the same colour on all three.
I am more familiar with gnuplot, but there the problem is also with the ranges: the colour range always autoscales, and it seems difficult to control. Any help with that is also deeply appreciated.
I may be doing something wrong, so if anybody could help me out and point me in the right direction, or to the right software, I will be grateful.

There are demos here for how to make contours in gnuplot. Are you having trouble in the sense that you have code to make a contour plot but it does not work?
To answer your second question, in gnuplot the command you probably want is
set cbrange [CB_MIN:CB_MAX]
This sets the range of values which will be colored according to the current palette. You would just have to issue the same set cbrange command for all three plots you are making. If you want to automatically set the cbrange to the min/max on all files, you can use the stats command (in version 4.6 or newer, otherwise it is more tricky):
stats 'datafile1' using 3 name 'd1'
stats 'datafile2' using 3 name 'd2'
stats 'datafile3' using 3 name 'd3'
datamin_z = (d1_min<d2_min&&d1_min<d3_min?d1_min:d2_min<d3_min?d2_min:d3_min)
datamax_z = (d1_max>d2_max&&d1_max>d3_max?d1_max:d2_max>d3_max?d2_max:d3_max)
set cbrange [datamin_z:datamax_z]
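On the R side of the question, the same idea can be sketched with base graphics. This is only a minimal sketch under assumptions: the three surfaces are synthetic stand-ins for the interpolated files, and the names surfaces and shared_levels are made up for illustration. It passes one common levels vector to every filled.contour() call, so a given z value gets the same colour in each plot, and it overlays contour lines through the plot.axes argument.

```r
# Synthetic stand-ins for the three interpolated files (assumed data);
# their z ranges are roughly 0-18, 0-20 and 0-25.
x <- seq(5, 25, length.out = 40)
y <- seq(40, 180, length.out = 40)
bump <- outer(x, y, function(x, y) exp(-((x - 15)^2 / 50 + (y - 110)^2 / 2500)))
surfaces <- list(a = 18 * bump, b = 20 * bump, c = 25 * bump)

# One set of levels covering all surfaces, so the colour key is identical everywhere.
z_all <- range(unlist(surfaces), finite = TRUE)
shared_levels <- pretty(z_all, n = 10)

pdf(NULL)  # null device for a non-interactive run; drop this line to plot on screen
for (z in surfaces) {
  filled.contour(x, y, z, levels = shared_levels,
                 plot.axes = {
                   axis(1); axis(2)
                   # contour lines drawn on top of the filled regions
                   contour(x, y, z, levels = shared_levels, add = TRUE)
                 })
}
invisible(dev.off())
```

With real data, the list would hold the z matrices returned by interp() for each file, provided they were interpolated onto the same x/y grid.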

Related

Why are all relevant tick-marks not plotted on the X-axis?

I often notice that when plotting in R not all relevant tick-marks are drawn. Relevant here means that there is data present.
See this example
> set.seed(NULL)
> d <- data.frame(a=sample(1:10, replace=TRUE), b=sample(11:30))
> plot(d)
In the resulting plot you can see values on the X-axis at 3, 5, 7 and 9, but the tick-marks for them are missing.
The focus of my question is to understand why R acts like that. What is the algorithm and logic behind it?
btw: I know how to solve it. I can draw the X-axis myself. But that is not part of the question.
You can find a brief description of the algorithm for placing the tick marks using ?axis.
plot() is a generic function for plotting many kinds of data. In your example the data are discrete, but plot() treats any numeric vector as continuous, and for continuous data it does not make much sense to draw a tick mark at every single value, which would make the axes unreadable.
However, you can easily adjust the ticks in your plot using axis()
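A minimal sketch of that adjustment, assuming data shaped like the example in the question (the fixed seed and the names default_ticks and data_ticks are only for illustration). axTicks(1) reports where the default axis placed its ticks, which are round values spanning the range rather than one per data value; the second plot suppresses the default x-axis and draws a tick at every value that actually occurs.

```r
set.seed(1)  # fixed seed so the sketch is repeatable; the question used set.seed(NULL)
d <- data.frame(a = sample(1:10, replace = TRUE), b = sample(11:30))

pdf(NULL)  # null device for a non-interactive run; drop this line to plot on screen
plot(d)
default_ticks <- axTicks(1)        # tick locations chosen by the default x-axis

data_ticks <- sort(unique(d$a))    # one tick per distinct x value in the data
plot(d, xaxt = "n")                # suppress the default x-axis
axis(1, at = data_ticks)           # draw the axis with explicit tick positions
invisible(dev.off())
```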

Creating new color palette for plotting raster

I have a raster image that I am trying to plot with the rasterVis package using BuRdTheme, but I find the colors dull.
What I need is something like the following. How can I create this desired palette in raster or rasterVis, or in any other package that would be appropriate?
I tried creating new palette with rasterTheme
rainbTheme5 <- rasterTheme(region = rev(rainbow(n = 5)))
levelplot(r,par.settings=rainbTheme5)
What I get is
The histogram of the image looks like
You've got some pixels down at about 305 (which I guess is blue), but most of your pixels are over 320 and so all fall in the brown/red end of your palette.
What you might want is a palette that changes over the peak in your data. You can do this by creating a palette that starts with a bunch of colours for the low values. For example:
> rep(rev(rainbow(n=5)),c(5,1,1,1,1))
[1] "#CC00FFFF" "#CC00FFFF" "#CC00FFFF" "#CC00FFFF" "#CC00FFFF" "#0066FFFF"
[7] "#00FF66FF" "#CCFF00FF" "#FF0000FF"
That has 9 colours, and the first five are identical. In this case, the lowest 5/9 of the range of the data would all be that one colour, and the highest 4/9 of the range would be spread across the remaining four colours. Since your peak spans about that amount, you'll get the effect you are after. Adjust the palette to suit.
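To see how such a weighted palette distributes colours, here is a rough base-R sketch. It assumes the palette is applied to equal-width bins over the data range, and the range 305 to 330 is only a guess read off the description above; colour_for is a hypothetical helper, not part of raster or rasterVis.

```r
pal <- rep(rev(rainbow(n = 5)), c(5, 1, 1, 1, 1))  # 9 entries, first five identical

rng <- c(305, 330)  # assumed data range, for illustration only
breaks <- seq(rng[1], rng[2], length.out = length(pal) + 1)  # 9 equal-width bins

# Hypothetical helper: look up the palette entry for each value.
colour_for <- function(v) pal[findInterval(v, breaks, all.inside = TRUE)]

low  <- colour_for(c(306, 318))  # both fall in the lowest 5/9 of the range
high <- colour_for(c(321, 329))  # these fall in the upper 4/9
```

Under these assumptions the two low values share a single colour, while the two high values land on different colours.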
Note that you should always ask what question any graphic is trying to answer, and adjust colours/scales etc to help answer that question, rather than just trying to make it not look "dull", unless this is for the wall of a gallery.
Finally I could find what I was looking for:
myTheme <- rasterTheme(region = colorRamps::matlab.like(n = 12))
Using levelplot and using levelplot with customised theme

R: how to make multiple plots from one CSV, grouping by a column

I'd like to put multiple plots onto a single visual output in R, based on data that I have in a CSV that looks something like this:
user,size,time
fred,123,0.915022
fred,321,0.938769
fred,1285,1.185608
wilma,5146,2.196687
fred,7506,1.181990
barney,5146,1.860287
wilma,1172,1.158015
barney,5146,1.219313
wilma,13185,1.455904
wilma,8754,1.381372
wilma,878,1.216908
barney,2974,1.223852
I can read this just fine, using, e.g.:
data = read.csv('data.csv')
For the moment, a fairly simple plot is fine, so I'm just trying plot() without much to it (setting type='o' to get lines and points), and from solving a past problem, I know that I can do, e.g., the following to get data for just fred:
plot(data$time[which(data$user == 'fred')], data$size[which(data$user == 'fred')], type='o')
What I'd like, though, is to have the data for each user all showing up on one set of axes, with color coding (and a legend to match users to colors) to identify different user data.
And if another user shows up, I'd like another line to show up, with another color (perhaps recycling if I have too many users at once).
However, just this doesn't do it:
plot(data$size, data$time, type='o',col=c("red", "blue", "green"))
Because it doesn't seem to group by the user.
And just this:
plot(data, type='o')
gives me an error:
Error in plot.default(...) :
formal argument "type" matched by multiple actual arguments
This:
plot(data)
does do something, but not what I want.
I've poked around, but I'm new enough to R that I'm not quite sure how best to search for this, nor where to look for examples that would hit a use-case like this.
I even got somewhat closer with this:
plot(data$size[which(data$user == 'wilma')], data$time[which(data$user == 'wilma')], type='o', col=c('red'))
lines(data$size[which(data$user == 'fred')], data$time[which(data$user == 'fred')], type='o', col=c('green'))
lines(data$size[which(data$user == 'barney')], data$time[which(data$user == 'barney')], type='o', col=c('blue'))
This gives me a plot (which I'd post inline, but as a new user, I'm not allowed to yet):
not-quite-right plot
which is kind of close to what I want, except that it:
doesn't have a legend
has ugly axis labels, instead of just time and size
is scaled to the first plot, and thus is missing data from some of the others
isn't sorted by x-axis, which I could do externally, though I'm guessing I could do it fairly easily in R.
So, the question, ultimately, is this:
What's an easy way to plot data like this which:
has multiple lines based on the labels in the first column of the CSV
uses the same set of axes for the data in columns 2 and 3, regardless of the label
has a legend and color-coding for which label is being used for a particular line (or set of points)
will adapt to adding new labels to the data file, hopefully without change to the R code.
Thanks in advance for any help or pointers on this.
P.S. I looked around for similar questions, and found one that's sort of close, but it's not quite the same, and I failed to figure out how to adapt it to what I'm trying to do.
Good question. This is doable in base plot, but it's even easier and more intuitive using ggplot2. Below is an example of how to do this with random data in ggplot2
First download and install the package
install.packages("ggplot2",repos='http://cran.us.r-project.org')
require(ggplot2)
Next generate the data
a <- c(rep('a',3),rep('b',3),rep('c',3))
b <- rnorm(9,50,30)
c <- rep(seq(1,3),3)
dat <- data.frame(a,b,c)
Finally, make the plot
ggplot(data=dat, aes(x=c, y=b , group=a, colour=a)) + geom_line() + geom_point()
Basically, you are telling ggplot that your x axis corresponds to the c column (dat$c), your y axis corresponds to the b column (dat$b), and that it should group (draw separate lines) by the a column (dat$a). Colour specifies that you want to colour by the a column as well.
The resulting graph looks like this:
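For comparison, a base-graphics version is also possible. The following is a minimal sketch, not the answer's method: it reads the sample CSV from the question as inline text, sets up axes that span all users, then adds one line per user in a loop and a matching legend. The names csv_text, users and cols are introduced here for illustration.

```r
# Sample data from the question, read from inline text instead of data.csv.
csv_text <- "user,size,time
fred,123,0.915022
fred,321,0.938769
fred,1285,1.185608
wilma,5146,2.196687
fred,7506,1.181990
barney,5146,1.860287
wilma,1172,1.158015
barney,5146,1.219313
wilma,13185,1.455904
wilma,8754,1.381372
wilma,878,1.216908
barney,2974,1.223852"
data <- read.csv(text = csv_text)
data <- data[order(data$size), ]                 # sort by the x variable

users <- sort(unique(as.character(data$user)))   # adapts to new users in the file
cols <- rainbow(length(users))

pdf(NULL)  # null device for a non-interactive run; drop this line to plot on screen
# Empty plot whose axes cover every user's data.
plot(range(data$size), range(data$time), type = "n", xlab = "size", ylab = "time")
for (i in seq_along(users)) {
  one <- data[data$user == users[i], ]
  lines(one$size, one$time, type = "o", col = cols[i])
}
legend("topleft", legend = users, col = cols, lty = 1, pch = 1)
invisible(dev.off())
```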

Uniform plot points in R -- Research / HW

This is for research I am doing for my Masters Program in Public Health
I am graphing data against each other, a standard x,y type deal, over top of that I am plotting a predicted line. I get what I think to be the most funky looking point/boxplot looking thing ever with an x axis that is half filled out and I don't understand why as I do not call a boxplot function. When I call the plot function it is my understanding that only the points will plot.
The data I am plotting looks like this
TOTAL.LACE | DAYS.TO.FAILURE
9 | 15
16 | 7
... | ...
The range of the TOTAL.LACE is from 0 to 19 and DAYS.TO.FAILURE is 0 - 30
My code is as follows, maybe it is something before the plot but I don't think it is:
library(survival)  # provides Surv() and survreg()
# To control the type of symbol we use we will use psymbol, it takes
# value 1 and 2
psymbol <- unique(FAILURE + 1)
# Build a test frame that will predict values of the lace score due to
# a patient being in a state of failure
test <- survreg(Surv(time = DAYS.TO.FAILURE, event = FAILURE) ~ TOTAL.LACE,
                dist = "logistic")
pred <- predict(test, type="response")  # produces numbers from about 14 to 23
summary(pred)
ord <- order(TOTAL.LACE)
tl_ord <- TOTAL.LACE[ord]
pred_ord <- pred[ord]
plot(TOTAL.LACE, DAYS.TO.FAILURE, pch=unique(psymbol))  # produces goofy graph
lines(tl_ord, pred_ord)  # this produces the line, not boxplots
Here is the resulting picture
I am not too sure how to proceed from here. This is an offshoot of another problem I had with the same data set at this link here. I do not understand why boxplots are being drawn: I did not call the boxplot() command, so I don't know why they appeared along with the points. When I issue the command plot(DAYS.TO.FAILURE, TOTAL.LACE), I get only points on the resulting plot, as I expected; but when I swap what is plotted on x and y, the boxplots show up, which is unexpected to me.
Here is a link to sample data that will hopefully help in reproducing the problem, as pointed out by @Dwin et al.: Some Sample Data
Thank you,
Since you don't have a reproducible example, it is a little hard to provide an answer that deals with your situation. Here I generate some vaguely similar-looking data:
set.seed(4)
TOTAL.LACE <- rep(1:19, each=1000)
zero.prob <- rbinom(19000, size=1, prob=.01)
DAYS.TO.FAILURE <- rpois(19000, lambda=15)
DAYS.TO.FAILURE <- ifelse(zero.prob==1, DAYS.TO.FAILURE, 0)
And here is the plot:
First, the problem with some of the categories not being printed on the x-axis is because they don't fit. When you have so many categories, to make them all fit you have to display them in a smaller font. The code to do this is to use cex.axis and set the value <1 (you can read more about this here):
boxplot(DAYS.TO.FAILURE~TOTAL.LACE, cex.axis=.8)
As to the question of why your plot is "goofy" or "funky-looking", it is a bit hard to say, because those terms are rather nebulous. My guess is that you need to more clearly understand how boxplots work, and then understand what these plots are telling you about the distribution of your data. In a boxplot, the midline of the box is the 50th percentile of your data, while the bottom and top of the box are the 25th and 75th percentiles. Typically, the 'whiskers' will extend out to the furthest datapoint that is at most 1.5 times the inter-quartile range beyond the ends of the box. In your case, for the first 9 TOTAL.LACEs, more than 75% of your data are 0's, so there is no box and thus no whiskers are possible. Everything beyond the whisker limits is plotted as an individual point. I don't think your plots are "funky" (although I'll admit I have no idea what you mean by that), I think your data may be "funky" and your boxplots are representing the distributions of your data accurately according to the rules by which boxplots are constructed.
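The collapsed-box case can be checked numerically with boxplot.stats(), which returns the five numbers a boxplot draws plus the points plotted individually. This is a small sketch on made-up data (80 zeros and 20 non-zero values), not the data from the question.

```r
# Made-up data: 80% zeros, so the 25th, 50th and 75th percentiles are all 0.
x <- c(rep(0, 80), 1:20)

s <- boxplot.stats(x)   # default coef = 1.5, the usual whisker rule
s$stats                 # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out                   # values beyond the whiskers, drawn as individual points
```

Because the inter-quartile range is 0 here, the box and whiskers collapse onto a single line at 0 and every non-zero value is reported as an outlying point.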
In the future (and I mean this politely), it will help you get more useful and faster answers if you can write questions that are more clearly specified, and contain a reproducible example.
Update: Thanks for providing more information. I gather by "funky" you mean that it is a boxplot, rather than a typical scatterplot. The thing to realize is that plot() is a generic function that will call different methods depending on what you pass to it. If you pass simple continuous data, it will produce a scatterplot, but if you pass continuous data and a factor, then it will produce a boxplot, even if you don't call boxplot explicitly. Consider:
plot(TOTAL.LACE, DAYS.TO.FAILURE)
plot(as.factor(TOTAL.LACE), DAYS.TO.FAILURE)
Evidently, you have converted TOTAL.LACE (the first argument, which determines the kind of plot) to a factor without meaning to. Presumably this was done in the pch=unique(psymbol) argument via the code psymbol <- unique(FAILURE + 1) above. Although I haven't had time to try this, I suspect eliminating that line of code and using pch=(FAILURE + 1) will accomplish your goals.
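One way to check for an accidental factor and undo it is sketched below. The toy vectors lace_factor and days are assumptions for illustration, not the asker's data; the point is that the class of the first argument decides which plot method is used.

```r
# Assumed toy data: a score column that ended up as a factor.
lace_factor <- factor(c("9", "16", "3", "16", "0"))
days <- c(15, 7, 22, 3, 30)

is.factor(lace_factor)   # TRUE, so plot(lace_factor, days) would draw boxplots

# Convert via character; as.numeric() on the factor alone would return level codes.
lace_numeric <- as.numeric(as.character(lace_factor))

pdf(NULL)  # null device for a non-interactive run; drop this line to plot on screen
plot(lace_numeric, days)  # numeric first argument: an ordinary scatterplot
invisible(dev.off())
```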

Clustering and heatmap in R

I am a newbie to R and I am trying to do some clustering on a data table where rows represent individual objects and columns represent the features that have been measured for these objects. I've worked through some clustering tutorials and I do get some output, however, the heatmap that I get after clustering does not correspond at all to the heatmap produced from the same data table with another programme. While the heatmap of that programme does indicate clear differences in marker expression between the objects, my heatmap doesn't show much differences and I cannot recognize any clustering (i.e., colour) pattern on the heatmap, it just seems to be a randomly jumbled set of colours that are close to each other (no big contrast). Here is an example of the code I am using, maybe someone has an idea on what I might be doing wrong.
mydata <- read.table("mydata.csv")
datamat <- as.matrix(mydata)
datalog <- log(datamat)
I am using log values for the clustering because I know that the other programme does so, too
library(gplots)
hr <- hclust(as.dist(1-cor(t(datalog), method="pearson")), method="complete")
mycl <- cutree(hr, k=7)
mycol <- sample(rainbow(256)); mycol <- mycol[as.vector(mycl)]
heatmap(datamat, Rowv=as.dendrogram(hr), Colv=NA,
col=colorpanel(40, "black","yellow","green"),
scale="column", RowSideColors=mycol)
Again, I plot the original colours but use the log-clusters because I know that this is what the other programme does.
I tried to play around with the methods, but I don't get anything that would at least somehow look like a clustered heatmap. When I take out the scaling, the heatmap becomes extremely dark (and I am actually quite sure that I have somehow to scale or normalize the data by column). I also tried to cluster with k-means, but again, this didn't help. My idea was that the colour scale might not be used completely because of two outliers, but although removing them slightly increased the range of colours plotted on the heatmap, this still did not reveal proper clusters.
Is there anything else I could play around with?
And is it possible to change the colour scale with heatmap so that outliers are found in the last bin that has a range of "everything greater than a particular value"? I tried to do this with heatmap.2 (argument "breaks"), but I didn't quite succeed and also I didn't manage to put the row side colours that I use with the heatmap function.
If you are okay with using heatmap.2 from the gplots package, it lets you supply breaks that assign colors to ranges of values in your heatmap.
For example if you had 3 colors blue, white, and red with the values going from low to high you could do something like this:
my.breaks <- c(seq(-5, -.6, length.out=6),seq(-.5999999, .1, length.out=4),seq(.100009,5, length.out=7))
result <- heatmap.2(mtscaled, Rowv=T, scale='none', dendrogram="row", symm = T, col=bluered(16), breaks=my.breaks)
In this case you have 3 sets of values that correspond to the 3 colors, the values will differ of course depending on what values you have with your data.
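For the outlier part of the question, a breaks vector can be built so that the first and last bins are catch-alls. The sketch below uses only base R and made-up data; cap, n_col and my_breaks are hypothetical names, and the cut-off of 3 is an arbitrary assumption. The resulting vector has one more entry than the number of colours, which is the shape heatmap.2 expects when breaks and col are supplied together.

```r
set.seed(42)
x <- matrix(rnorm(200), nrow = 20)   # stand-in for scaled data (assumed)
x[1, 1] <- 15                        # two artificial outliers
x[2, 3] <- -12

n_col <- 16   # number of colours, e.g. bluered(16)
cap <- 3      # assumed cut-off; values beyond +/- cap go into the end bins

# 15 evenly spaced inner break points give 14 inner bins,
# plus one open-ended bin at each end: 16 bins, 17 breaks.
inner <- seq(-cap, cap, length.out = n_col - 1)
my_breaks <- c(min(x), inner, max(x))

# Check which bin each value would fall into.
bins <- cut(as.vector(x), breaks = my_breaks, include.lowest = TRUE, labels = FALSE)
```

Here the value 15 falls into bin 16 and the value -12 into bin 1, so the outliers no longer stretch the colour scale used for the bulk of the data.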
One thing you are doing in your program is to call hclust on your data then to call heatmap on it, however if you look in the heatmap manual page it states:
Defaults to hclust.
So I don't think you need to do that. You might want to take a look at some similar questions that I had asked that might help to point you in the right direction:
Heatmap Question 1
Heatmap Question 2
If you post an image of the heatmap you get and an image of the heatmap that the other program is making it will be easier for us to help you out more.
