Related
I have a large dataset which I need to plot in loglog scale in Gnuplot, like this:
set log xy
plot 'A_1D_l0.25_L1024_r0.dat' u 1:($2-512)
LogLogPlot of my datapoints
Text file with the datapoints
Datapoints on the x axis are equally spaced, but because of the logscale they get very dense on the right part of the graph, and as a result the output file (I finally export it in .tex) gets very large.
In linear scale, I would simply use the option every to reduce the number of points which get plotted. Is there a similar option for loglogscale, such that the plotted points appear equally spaced?
I am aware of a similar question which was raised a few years ago, but in my opinion the solution is unsatisfactory: plotted points are not equally spaced along the x-axis. I think this is a really unsophisticated problem which deserves a clearer solution.
As I understand it, you don't want to plot the actual data points; you just want to plot a line through them. But you want to keep the appearance of points rather than a line. Is that right?
set log xy
plot 'A_1D_l0.25_L1024_r0.dat' u 1:($2-512) with lines dashtype '.' lw 2
Amended answer
If it is important to present outliers/errors in the data set then you must not use every or any other technique that simply discards or skips most of the data points. In that case I would prefer the plot with points that you show in the original question, perhaps modified to represent each point as a dot rather than a cross. I will simulate this by modifying a single point in your 500000 point data set (first figure below). But I would also suggest that the presence of outliers is even more apparent if you plot with lines (second figure below).
Showing error bounds is another alternative for noisy data, but the options depend on what you have to work with in your data set. If you want to pursue that, please ask a separate question.
If you really want to reduce the number of data to be plotted, you might consider the following script.
s = 0.1 ### sampling interval in log scale
### (try 0.05 for more detail)
c = log10(0.01) ### a parameter used in sampler(x)
### which should be initialized by
### smaller value than any x in log scale
sampler(x) = (x>0 && log10(x)>=c) ? (c=ceil(log10(x)/s+0.5)*s, x) : NaN
set log xy
set grid xtics
plot 'A_1D_l0.25_L1024_r0.dat' using (sampler($1)):($2-512) with points pt 7 lt 1 notitle , \
'A_1D_l0.25_L1024_r0.dat' using 1:($2-512) with lines lt 1 notitle
This script samples the data in increments of roughly 0.1 on x-axis in log scale. It makes use of the property that points whose x value is evaluated as NaN in using are not drawn.
I am trying to convey the concentration of lines in 2D space by showing the number of crossings through each pixel in a grid. I am picturing something similar to a density plot, but with more intuitive units. I was drawn to the spatstat package and its line segment class (psp) as it allows you to define line segments by their end points and incorporate the entire line in calculations. However, I'm struggling to find the right combination of functions to tally these counts and would appreciate any suggestions.
As shown in the example below with 50 lines, the density function produces values in (0,140), the pixellate function tallies the total length through each pixel and takes values in (0, 0.04), and as.mask produces a binary indictor of whether a line went through each pixel. I'm hoping to see something where the scale takes integer values, say 0..10.
require(spatstat)
set.seed(1234)
numLines = 50
# define line segments
L = psp(runif(numLines),runif(numLines),runif(numLines),runif(numLines), window=owin())
# image with 2-dimensional kernel density estimate
D = density.psp(L, sigma=0.03)
# image with total length of lines through each pixel
P = pixellate.psp(L)
# binary mask giving whether a line went through a pixel
B = as.mask.psp(L)
par(mfrow=c(2,2), mar=c(2,2,2,2))
plot(L, main="L")
plot(D, main="density.psp(L)")
plot(P, main="pixellate.psp(L)")
plot(B, main="as.mask.psp(L)")
The pixellate.psp function allows you to optionally specify weights to use in the calculation. I considered trying to manipulate this to normalize the pixels to take a count of one for each crossing, but the weight is applied uniquely to each line (and not specific to the line/pixel pair). I also considered calculating a binary mask for each line and adding the results, but it seems like there should be an easier way. I know that you can sample points along a line, and then do a count of the points by pixel. However, I am concerned about getting the sampling right so that there is one and only one point per line crossing of a pixel.
Is there is a straight-forward way to do this in R? Otherwise would this be an appropriate suggestion for a future package enhancement? Is this more easily accomplished in another language such as python or matlab?
The example above and my testing has been with spatstat 1.40-0, R 3.1.2, on x86_64-w64-mingw32.
You are absolutely right that this is something to put in as a future enhancement. It will be done in one of the next versions of spatstat. It will probably be an option in pixellate.psp to count the number of crossing lines rather than measure the total length.
For now you have to do something a bit convoluted as e.g:
require(spatstat)
set.seed(1234)
numLines = 50
# define line segments
L <- psp(runif(numLines),runif(numLines),runif(numLines),runif(numLines), window=owin())
# split into individual lines and use as.mask.psp on each
masklist <- lapply(1:nsegments(L), function(i) as.mask.psp(L[i]))
# convert to 0-1 image for easy addition
imlist <- lapply(masklist, as.im.owin, na.replace = 0)
rslt <- Reduce("+", imlist)
# plot
plot(rslt, main = "")
I am stuck in simple problem. I have a scatter plot.
I am plotted confidence lines around it using my a custom formula. Now, i just want only the names outside the cutoff lines to be displayed nothing inside. But, I can't figure out how to subset my data on the based of the line co-ordinates.
The line is plotted using the lines function which is a vector of 128 x and y values. Now, how do I subset my data (x,y points) based on these 2 values. I can apply a static limit of a single number of sub-setting data like 1,2 or 3 but how to use a vector to subset data, got me stuck.
For an reproducible example, consider :
df=data.frame(x=seq(2,16,by=2),y=seq(2,16,by=2),lab=paste("label",seq(2,16,by=2),sep=''))
plot(df[,1],df[,2])
# adding lines
lines(seq(1,15),seq(15,1),lwd=1, lty=2)
# adding labels
text(df[,1],df[,2],labels=df[,3],pos=3,col="red",cex=0.75)
Now, I need just the labels, which are outside or intersecting the line.
What I was trying to subset my dataframe with the values used for the lines, but I cant make it right.
Now, static sub-setting can be done for single values like
df[which(df[,1]>8 & df[,2]>8),] but how to do it for whole list.
I also tried sapply, to cycle over all the values of x and y used for lines on the df iteratively, but most values become +ve for a limit but false for other values.
Thanks
I will speak about your initial volcano-type-graph problem and not the made up one because they are totally different.
So I really thought this a lot and I believe I reached a solid conclusion. There are two options:
1. You know the equations of the lines, which would be really easy to work with.
2. You do not know the equation of the lines which means we need to work with an approximation.
Some geometry:
The function shows the equation of a line. For a given pair of coordinates (x, y), if y > the right hand side of the equation when you pass x in, then the point is above the line else below the line. The same concept stands if you have a curve (as in your case).
If you have the equations then it is easy to do the above in my code below and you are set. If not you need to make an approximation to the curve. To do that you will need the following code:
df=data.frame(x=seq(2,16,by=2),y=seq(2,16,by=2),lab=paste("label",seq(2,16,by=2),sep=''))
make_vector <- function(df) {
lab <- vector()
for (i in 1:nrow(df)) {
this_row <- df[i,] #this will contain the three elements per row
if ( (this_row[1] < max(line1x) & this_row[2] > max(line1y) & this_row[2] < a + b*this_row[1])
|
(this_row[1] > min(line2x) & this_row[2] > max(line2y) & this_row[2] > a + b*this_row[1]) ) {
lab[i] <- this_row[3]
} else {
lab[i] <- <NA>
}
}
return(lab)
}
#this_row[1] = your x
#this_row[2] = your y
#this_row[3] = your label
df$labels <- make_vector(df)
plot(df[,1],df[,2])
# adding lines
lines(seq(1,15),seq(15,1),lwd=1, lty=2)
# adding labels
text(df[,1],df[,2],labels=df[,4],pos=3,col="red",cex=0.75)
The important bit is the function. Imagine that you have df as you created it with x,y and labs. You also will have a vector with the x,y coordinates for line1 and x,y coordinates for line2.
Let's see the condition of line1 only (the same exists for line 2 which is implemented on the code above):
this_row[1] < max(line1x) & this_row[2] > max(line1y) & this_row[2] < a + b*this_row[1]
#translates to:
#this_row[1] < max(line1x) = your x needs to be less than the max x (vertical line in graph below
#this_row[2] > max(line1y) = your y needs to be greater than the max y (horizontal line in graph below
#this_row[2] < a + b*this_row[1] = your y needs to be less than the right hand side of the equation (to have a point above i.e. left of the line)
#check below what the line is
This will make something like the below graph (this is a bit horrible and also magnified but it is just a reference. Visualize it approximating your lines):
The above code would pick all the points in the area above the triangle and within the y=1 and x=1 lines.
Finally the equation:
Having 2 points' coordinates you can figure out a line's equation solving a system of two equations and 2 parameters a and b. (y = a +bx by replacing y,x for each point)
The 2 points to pick are the two points closest to the tangent of the first line (line1). Chose those arbitrarily according to your data. The closest to the tangent the better. Just plot the spots and eyeball.
Having done all the above you have your points with your labels (approximately at least).
And that is the only thing you can do!
Long talk but hope it helps.
P.S. I haven't tested the code because I have no data.
I have a set of 3D coordinates (below - just for a single point, in 3D space):
x <- c(-521.531433, -521.511658, -521.515259, -521.518127, -521.563416, -521.558044, -521.571228, -521.607178, -521.631165, -521.659973)
y <- c(154.499557, 154.479568, 154.438705, 154.398682, 154.580688, 154.365189, 154.3564, 154.559189, 154.341309, 154.344223)
z <- c(864.379272, 864.354675, 864.365479, 864.363831, 864.495667, 864.35498, 864.358582, 864.50415, 864.35553, 864.359863)
xyz <- data.frame(x,y,z)
I need to make a time-series plot of this point with a 3D rendering (so I can rotate the plot, etc.). The plot will visualize a trajectory of the point above in time (for example in the form of solid line). I used 'rgl' package with plot3d method, but I can't make it to plot time-series (below, just plot a single point from first frame in time-series):
require(rgl)
plot3d(xyz[1,1],xyz[1,2],xyz[1,3],axes=F,xlab="",ylab="",zlab="")
I found this post, but it doesn't really deal with a real-time rendered 3D plots. I would appreciate any suggestions. Thank you.
If you read help(plot3d) you can see how to draw lines:
require(rgl)
plot3d(xyz$x,xyz$y,xyz$z,type="l")
Is that what you want?
How about this? It uses rgl.pop() to remove a point and a line and draw them as a trail - change the sleep argument to control the speed:
ts <- function(xyz,sleep=0.3){
plot3d(xyz,type="n")
n = nrow(xyz)
p = points3d(xyz[1,])
l = lines3d(xyz[1,])
for(i in 2:n){
Sys.sleep(sleep)
rgl.pop("shapes",p)
rgl.pop("shapes",l)
p=points3d(xyz[i,])
l=lines3d(xyz[1:i,])
}
}
The solution was simpler than I thought and the problem was that I didn't use as.matrix on my data. I was getting error (list) object cannot be coerced to type 'double' when I was simply trying to plot my entire dataset using plot3d (found a solution for this here). So, if you need to plot time-series of set of coordinates (in my case motion capture data of two actors) here is my complete solution (only works with the data set below!):
download example data set
read the above data into a table:
data <- read.table("Bob12.txt",sep="\t")
extract XYZ coordinates into a separate matrixes:
x <- as.matrix(subset(data,select=seq(1,88,3)))
y <- as.matrix(subset(data,select=seq(2,89,3)))
z <- as.matrix(subset(data,select=seq(3,90,3)))
plot the coordinates on a nice, 3D rendered plot using 'rgl' package:
require(rgl)
plot3d(x[1:nrow(x),],y[1:nrow(y),],z[1:nrow(z),],axes=F,xlab="",ylab="",zlab="")
You should get something like on the image below (but you can rotate it etc.) - hope you can recognise there are joint centers for people there. I still need to tweak it to make it visually better - to have first frame as a points (to clearly see actor's joints), then a visible break, and then the rest of frames as a lines.
I have
probability values: 0.06,0.06,0.1,0.08,0.12,0.16,0.14,0.14,0.08,0.02,0.04 ,summing up to 1
the corresponding intervals where a stochastic variable may take its value with the corresponding probability from the above list:
126,162,233,304,375,446,517,588,659,730,801,839
How can I plot the probability distribution?
On the x axis, the interval values, between the intervals histogram with the probability value?
Thanks.
How about
x <- c(126,162,233,304,375,446,517,588,659,730,801,839)
p <- c(0.06,0.06,0.1,0.08,0.12,0.16,0.14,0.14,0.08,0.02,0.04)
plot(x,c(p,0),type="s")
lines(x,c(0,p),type="S")
rect(x[-1],0,x[-length(x)],p,col="lightblue")
for a quick answer? (With the rect included you might not need the lines call and might be able to change it to plot(x,p,type="n"). As usual I would recommend par(bty="l",lty=1) for my preferred graphical defaults ...)
(Explanation: "s" and "S" are two different stair-step types (see Details in ?plot): I used them both to get both the left and right boundaries of the distribution.)
edit: In your comments you say "(it) doesn't look like a histogram". It's not quite clear what you want. I added rectangles in the example above -- maybe that does it? Or you could do
b <- barplot(p,width=diff(x),space=0)
but getting the x-axis labels right is a pain.