I am not sure I can accomplish what I am trying to do with gnuplot.
I am not sure whether it would be better to plot my data as a colored surface plot or in the following way:
Firstly, here is an explanation of my data format.
The first column contains time values, typically something like 0, 1, 2, 3, 4, 5, .... For each time, the rest of the row next to the time value contains the data I would like to plot. (This is the y-data on a 2D plot; the x-data is the position of each value within the row, minus 1 to account for the first column, which holds the time value.)
I have many (~ 1000) time values.
So it looks something like this:
Sorry for awful color choice.
Essentially I would like to plot a line graph for each time value, and put these 2d plots side by side in a 3d plot.
Can I even do this with gnuplot?
I hope the question is clear, if not I can try and add more information.
plot 'data.txt' matrix using ($1-1):3:2 every 1::1 with linespoints palette
This assumes your Excel data has been copied to data.txt.
matrix mode plots one line series per row, which is what you want.
The first and second parameters of using, when plotting a matrix, are mode specifiers: 1:3 means "plot rows of the matrix".
Citing #Cristoph's comment,
"With the matrix option, the column number is available as first
column, the row number as second column and the actual matrix value as
third column."
($1) would be equal to 1 here, but ($1 - 1) means "subtract one from the column number", so that x starts from 0 for each row.
every 1::1 excludes the first column, which holds the time values, from the plot.
Finally, using ..:..:2 sets a different color for each row (equal to the row number) via the palette option.
The answer is based on a similar question and answer.
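To make the column/row/value mapping concrete, here is a minimal Python sketch (not gnuplot itself) of which points the command above is meant to draw. It assumes the file layout described in the question: one row per time value, with the time in the first column. The function name matrix_to_series is hypothetical, and the sketch follows this answer's description of matrix mode rather than gnuplot's implementation.

```python
def matrix_to_series(rows):
    """Turn rows of [time, y0, y1, ...] into one list of (x, y, colour) per row."""
    series = []
    for row_index, row in enumerate(rows):
        values = row[1:]  # drop the time column, as `every 1::1` is meant to do
        # x runs 0, 1, 2, ... along the row; the row index selects the palette colour
        series.append([(x, y, row_index) for x, y in enumerate(values)])
    return series


# Two hypothetical time steps with three data values each
rows = [
    [0, 5.0, 6.0, 7.0],  # time 0
    [1, 5.5, 6.5, 7.5],  # time 1
]
for line in matrix_to_series(rows):
    print(line)
```

Each inner list corresponds to one line in the plot, and all points of a line share the same colour value.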
Related
I have a data frame (pLog) containing the number of reads per nucleotide for a ChIP-seq experiment done on an E. coli genome (4.6 Mb). I want to plot the chromosomal position on the X axis and the number of reads on the Y axis. To make it easier, I binned the data in windows of 100 bp. That gives a data frame of 46,259 rows and 2 columns. One column is named "position" and holds a number representing a chromosomal position (1, 101, 201, ...), and the other column is named "values" and contains the number of reads found in that bin, e.g. (210, 511, 315, ...). I have been using ggplot for all my analysis and I would like to use it for this plot, if possible.
I am trying for the graph to look something like this:
but I haven't been able to plot it.
This is what my data looks like
I tried
ggplot(pLog,aes(position))+
geom_histogram(binwidth=50)
ggsave("file.jpg")
And this is what it looks like :(
Many thanks!
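You cannot use geom_histogram() here, because the counts are already computed; try geom_line() instead:
library(ggplot2)
# Simulated stand-in for the real data: 1000 bins of 100 bp each
pLog=data.frame(position=seq(1,100000,by=100),
value=rnbinom(1000,mu=100,size=20))
ggplot(pLog,aes(x=position,y=value))+geom_line(alpha=0.7,col="steelblue")
You will most likely need to play around with it to get the visualization you need.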
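For reference, the binning step described in the question (per-nucleotide counts collapsed into 100 bp windows) can be sketched in Python as below. This is only an illustration under assumptions: the function name bin_reads is hypothetical, and it assumes a bin's value is the sum of the per-base counts in that window, which the question does not state explicitly.

```python
def bin_reads(per_base_counts, window=100):
    """Collapse per-base read counts into (position, value) bins.

    position is the 1-based coordinate of the first base in each window,
    matching the 1, 101, 201, ... scheme from the question.
    """
    bins = []
    for start in range(0, len(per_base_counts), window):
        total = sum(per_base_counts[start:start + window])
        bins.append((start + 1, total))
    return bins


# A toy "genome" of 250 bases with one read per base
print(bin_reads([1] * 250))
```

The last window is allowed to be shorter than the others, so no bases are dropped at the end of the chromosome.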
I have some items with different eligibility criteria; in this example, two variables, each with a min and a max that its values are allowed to take. I would like to see the coverage of the products by plotting a rectangle for each product on a chart, showing the area between the mins and maxes.
How would you go about
converting the records most elegantly to the format required by geom_polygon(), and
ensuring the shapes produced appear as rectangles?
Example
library(data.table)
library(ggplot2)
df<-data.table(Product=letters[1:10], minX=1:10, maxX=5:14, minY= 10:1, maxY=14:5)
df.t<-data.table(rbind( df[,list(Product,X=minX,Y=minY)],
df[,list(Product,X=minX,Y=maxY)],
df[,list(Product,X=maxX,Y=minY)],
df[,list(Product,X=maxX,Y=maxY)]))[
order(Product,X,Y)]
ggplot(df.t,aes(x=X,y=Y,group=Product,fill=Product))+geom_polygon()
NB In this reduced example there are only two criteria, however I have a range of criteria columns and would not want to repeat the exercise above for different combinations.
Use your original data frame df and then geom_rect() as you already have minimal and maximal values for the x and y.
ggplot(df,aes(xmin=minX,xmax=maxX,ymin=minY,ymax=maxY,fill=Product))+geom_rect()
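If polygons are still needed for some reason, the shapes in the question's attempt do not come out as rectangles because sorting the four corners by X and then Y yields an order that crosses over itself. The Python sketch below illustrates this with a shoelace-area check; the function names are hypothetical and it is not part of either ggplot2 or data.table.

```python
def rectangle_corners(min_x, max_x, min_y, max_y):
    """Corners in perimeter order, so consecutive points share an edge."""
    return [(min_x, min_y), (min_x, max_y), (max_x, max_y), (max_x, min_y)]


def shoelace_area(points):
    """Absolute polygon area from the shoelace formula."""
    total = 0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Product "a" from the example: minX=1, maxX=5, minY=10, maxY=14
perimeter = rectangle_corners(1, 5, 10, 14)
sorted_xy = sorted(perimeter)  # the order produced by order(Product, X, Y)

print(shoelace_area(perimeter))  # 4 x 4 rectangle
print(shoelace_area(sorted_xy))  # self-crossing "bow-tie": the signed areas cancel
```

Walking the corners around the perimeter gives the expected rectangle, while the sorted order traces a bow-tie whose two halves cancel in the area calculation.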
I have about 500,000 points in R of occurrence data of a migratory bird species throughout the US.
I am attempting to overlay a grid on these points, and then count the number of occurrences in each grid cell. Once the counts have been tallied, I then want to reference them to a grid cell ID.
In R, I've used the over() function to just get the points within the range map, which is a shapefile.
#Read in occurrence data
data=read.csv("data.csv", header=TRUE)
coordinates(data)=c("LONGITUDE","LATITUDE")
#Get shapefile of the species' range map
range=readOGR(".",layer="data")
proj4string(data)=proj4string(range)
#Get points within the range map
inside.range=!is.na(over(data,as(range,"SpatialPolygons")))
The above worked exactly as I hoped, but does not address my current problem: how to deal with points that are the type SpatialPointsDataFrame, and a grid that is a raster. Would you recommend polygonizing the raster grid, and using the same method I indicated above? Or would another process be more efficient?
First of all, your R code doesn't work as written. I would suggest copy-pasting it into a clean session, and if it errors out for you as well, correcting syntax errors or including add-on libraries until it runs.
That said, I assume that you are supposed to end up with a data.frame of two-dimensional numeric coordinates. So, for the purposes of binning and counting them, any such data will do, so I took the liberty of simulating such a dataset. Please correct me if this doesn't capture a relevant aspect of your data.
## Skip this line if you are the OP, and substitute the real data instead.
data<-data.frame(LATITUDE=runif(100,1,100),LONGITUDE=runif(100,1,100));
## Add the latitudes and longitudes between which each observation is located
## You can substitute any number of breaks you want. Or, a vector of fixed cutpoints
## LATgrid and LONgrid are going to be factors. With ugly level names.
data$LATgrid<-cut(data$LATITUDE,breaks=10,include.lowest=T);
data$LONgrid<-cut(data$LONGITUDE,breaks=10,include.lowest=T);
## Create a single factor that gives the lat,long of each observation.
data$IDgrid<-with(data,interaction(LATgrid,LONgrid));
## Now, create another factor based on the above one, with shorter IDs and no empty levels
data$IDNgrid<-factor(data$IDgrid);
levels(data$IDNgrid)<-seq_along(levels(data$IDNgrid));
## If you want total grid-cell count repeated for each observation falling into that grid cell, do this:
data$count<- ave(data$LATITUDE,data$IDNgrid,FUN=length);
## You could have also used data$LONGITUDE, doesn't matter in this case
## If you want just a table of counts at each grid-cell, do this:
aggregate(data$LATITUDE,data[,c('LATgrid','LONgrid','IDNgrid')],FUN=length);
## I included the LATgrid and LONgrid vectors so there would be some
## sort of descriptive reference accompanying the anonymous numbers in IDNgrid,
## but only IDNgrid is actually necessary
## If you want a really minimalist table, you could do this:
table(data$IDNgrid);
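The same bin-and-count idea can be sketched outside R. The Python version below is a minimal illustration under assumptions: it uses a fixed cell size and grid origin (rather than splitting the observed range into 10 intervals as cut() does above), and the name grid_counts and the (row, col) cell ID are hypothetical choices.

```python
from collections import Counter


def grid_counts(points, x_min, y_min, cell_size):
    """Count (longitude, latitude) points per grid cell.

    Cells are identified by (row, col), counted from the grid origin
    (x_min, y_min); only non-empty cells appear in the result.
    """
    counts = Counter()
    for lon, lat in points:
        col = int((lon - x_min) // cell_size)
        row = int((lat - y_min) // cell_size)
        counts[(row, col)] += 1
    return counts


points = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (2.5, 2.5)]
for cell, n in sorted(grid_counts(points, 0, 0, 1).items()):
    print(cell, n)
```

Because each point is assigned to a cell with simple arithmetic, this approach makes a single pass over the data and does not require polygonizing the grid.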
This is a snippet of the data, of which there is a ton, with an explanation of what I want to do:
File
Basically, I have a number of subsets of data (marked 1, 2, ... in a separate column) which contain intervals. I need to know whether the intervals in the same two subsets overlap and, if so, I need the value (column C) associated with the set in columns E-G to be pasted next to the interval in columns J-K that overlaps with the interval in F-G. The problem is that an interval in columns F-G can overlap with multiple intervals in columns J-K.
I've been trying to solve this with
=if(or(and(x>=a,x<=b),and(a>=x,a<=y)),"Overlap","Do not overlap")
But the problem is that I can't find a way to do this for multiple overlaps. If you think this can't be done in Excel and know how else to do it (e.g. in R), please let me know.
Thank you
In Excel, try this formula in L4, copied down:
=IFERROR(INDEX(C$4:C$100,MATCH(1,INDEX((J4<=G$4:G$100)*(K4>=F$4:F$100)*(I4=E$4:E$100),0),0)),"No overlap")
This will find the first row within each subset (if any) where the F/G interval overlaps with the current row's J/K interval. If no such row exists, you get "No overlap".
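The formula returns only the first match. If every overlapping row is wanted, the same test (same subset, J <= G and K >= F) can be applied to all rows. The Python sketch below shows the idea; the function names and the tuple layout are hypothetical stand-ins for the spreadsheet columns, not an existing API.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Closed intervals overlap when each one starts before the other ends."""
    return a_start <= b_end and a_end >= b_start


def matching_values(query, reference):
    """Return the values of all reference intervals overlapping the query.

    query:     (subset, start, end)         like columns I, J, K
    reference: [(value, subset, start, end)] like columns C, E, F, G
    """
    subset, start, end = query
    return [
        value
        for value, ref_subset, ref_start, ref_end in reference
        if ref_subset == subset and overlaps(start, end, ref_start, ref_end)
    ]


reference = [("A", 1, 0, 10), ("B", 1, 8, 20), ("C", 2, 0, 10)]
print(matching_values((1, 9, 12), reference))   # overlaps two intervals in subset 1
print(matching_values((1, 30, 40), reference))  # no overlap in subset 1
```

An empty list plays the role of "No overlap" here, and a list with several entries covers the multiple-overlap case from the question.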
I have data in a zoo object which has multiple columns.
Now I want to plot four of those columns: two in one graph, and the other two in a second graph below it.
To be more precise, I have been able to plot the four of them one below the other.
But I want the first two in the same plot and the last two in the next plot.
It should work by adding nc=2 to your plot command (i.e. number of columns = 2).