I have a set of 3D coordinates. The data has 52170 rows and 4 columns, and each row represents one point. The first column is the point index, increasing from 1 to 52170. The second to fourth columns are the x, y and z coordinates, respectively. The first 10 lines are as follows:
seq x y z
1 7.126616 -102.927567 19.692112
2 -10.546907 -143.824966 50.77417
3 7.189214 -107.792068 18.758278
4 7.148852 -101.784027 19.905006
5 -14.65788 -146.294952 49.899158
6 -37.315742 -116.941185 12.316169
7 8.023512 -103.477882 19.081482
8 -14.641933 -145.100098 50.182739
9 -14.571636 -141.386322 50.547684
10 -15.691803 -145.66481 49.946281
I want to create a 3D scatter plot in which each point is added sequentially to this plot using R or MATLAB. The point represented by the first line is added first, then the point represented by the second line, ..., all the way to the last point.
In addition, I wish to control the speed at which points are added.
For a 2D scatter plot, I could use the following code:
library(gganimate)
x <- rnorm(50, 5, 1)
y <- 7*x +rnorm(50, 4, 4)
ind <- 1:50
data <- data.frame(x, y, ind)
ggplot(data, aes(x, y)) + geom_point(aes(group = seq_along(x))) + transition_reveal(ind)
But I cannot find information on how to do this for a 3D scatter plot. Can anyone show me how this could be done? Thank you.
Here is an answer in MATLAB.
In general, animating a plot (or a 3D plot, scatter plot, surface, or other graphic object) can be done with the same approach:
1. Create the first plot/plot3/scatter/surf and retrieve its handle. The first plot can contain an initial set of points or even be empty (use NaN values to create a plot with invisible data points).
2. Set the axis limits and all other visualisation options that will stay fixed (viewpoint, camera angle, lighting...). There is no need to set the options that will evolve during the animation.
3. In a loop, update the minimum set of plot object properties: XData, YData (ZData for a 3D plot, CData if the plot object has it and you want to animate the color).
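The three steps do not depend on the language. Below is a minimal Python sketch with no plotting. The generator name reveal_frames and its delay parameter are hypothetical and are not part of MATLAB or of any plotting library. The sketch shows that each frame only needs two things: the points shown so far and the newest point.

```python
import time


def reveal_frames(points, delay=0.0):
    """Yield (trail, newest) pairs that reveal points one at a time.

    trail is the list of points shown so far (indices 0 .. i-1) and newest
    is point i, like the hp_trail and hp_new objects in the MATLAB code.
    delay (seconds) is slept after each frame is consumed, like pause().
    """
    for i, newest in enumerate(points):
        yield points[:i], newest
        if delay > 0:
            time.sleep(delay)


# First three rows of the sample data, as (x, y, z)
pts = [
    (7.126616, -102.927567, 19.692112),
    (-10.546907, -143.824966, 50.77417),
    (7.189214, -107.792068, 18.758278),
]
for trail, newest in reveal_frames(pts):
    # A plotting loop would set the trail object's data to `trail` and
    # the highlighted marker's data to `newest` here, then redraw.
    print(len(trail), newest)
```

The animation stays cheap because each frame updates the existing plot objects' data instead of re-creating the plot.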
The code below is an implementation of the approach above adapted to your case:
%% Read data and place coordinates in named variables
csvfile = '3D scatter plot.csv' ;
data = csvread(csvfile,1,0) ;   % skip the single header line (row offset is zero-based)
% [optional], just to simplify notations further down
x = data(:,2) ;
y = data(:,3) ;
z = data(:,4) ;
%% Generate empty [plot3] objects
figure
% create an "axes" object, and retrieve the handle "hax"
hax = axes ;
% create 2 empty 3D point plots:
% [hp_new] will contain only one point (the new point added to the graph)
% [hp_trail] will contain all the points displayed so far
hp_trail = plot3(NaN,NaN,NaN,'.b','Parent',hax,'MarkerSize',2) ;
hold on
hp_new = plot3(NaN,NaN,NaN,'or','Parent',hax,'MarkerSize',6,'MarkerEdgeColor','r','MarkerFaceColor','g','LineWidth',2) ;
hold off
%% Set axes limits (to limit "wobbling" during animation)
xl = [min(x) max(x)] ;
yl = [min(y) max(y)] ;
zl = [min(z) max(z)] ;
set(hax, 'XLim',xl,'YLim',yl,'ZLim',zl)
view(145,72) % set a view perspective (optional)
%% Animate
np = size(data,1) ;
for ip=1:np
% update the "new point" graphic object
set( hp_new , 'XData',x(ip), 'YData',y(ip), 'ZData',z(ip) )
% update the "point history" graphic object
% we will display points from index 1 up to the current index ip
% (minus one) because the current index point is already displayed in
% the other plot object
indices2display = 1:ip-1 ;
set(hp_trail ,...
'XData',x(indices2display), ...
'YData',y(indices2display), ...
'ZData',z(indices2display) )
% force graphic refresh
drawnow
% Set the "speed"
% the maximum speed is set by your hardware, so we just add a
% short pause in case you want to slow it down
pause(0.01) % <= comment this line if you want max speed
end
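This produces an animation in which the newest point is drawn as a larger red-edged, green-filled circle, while all earlier points remain as small blue dots.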
I know that for loops in R are often unnecessary because R supports vectorization. I want to program as efficiently as possible, hence my question about the following example code.
I have a hexagonal grid, and I am calculating cell numbers. In my example the cells are numbered from 1 to 225, starting in the lower-left corner and going to the right, so cell 16 sits directly above cell 1, offset slightly to the right.
Therefore, depending on the Y coordinate, the X coordinate has to be either rounded or ceilinged. In my application the user points out cells. I save the clicks and loop over them in a for loop to determine which cells were chosen, as follows. Xcells and Ycells hold toy input values the user might have chosen:
gridsize <- 15
Xcells <-c(0.8066765, 1.8209879, 3.0526517, 0.5893240)
Ycells <-c(0.4577802, 0.4577802, 0.5302311, 1.5445425)
clicks <- length(Xcells)
cells <-vector('list', clicks)
These correspond to cells 1, 2, 3 and 16 (4 clicks). Now, to determine the cell numbers:
Y <- ceiling(Ycells)
X <- numeric(clicks)  # X must exist before X[i] can be assigned in the loop
for(i in 1:clicks){
if(Y[i]%%2==1){
X[i] <- round(Xcells[i])
}
else{
X[i]<- ceiling(Xcells[i])
}
#determine the cell numbers and store in predefined list
cells[[i]] <- (Y[i]-1)*gridsize + X[i]
}
So if Y is odd, X has to be rounded, and if Y is even, X has to be the ceiling value (this is what Y[i]%%2==1 tests in the loop).
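The rule can be checked against the four toy clicks with a small Python transcription of the loop. The name cell_numbers is hypothetical. Like R's round(), Python's round() sends exact halves to the even integer.

```python
import math


def cell_numbers(xcells, ycells, gridsize=15):
    """Cell numbers on the hex grid: the row is ceiling(y); odd rows use
    round(x) and even rows use ceiling(x) for the column."""
    cells = []
    for x, y in zip(xcells, ycells):
        row = math.ceil(y)
        col = round(x) if row % 2 == 1 else math.ceil(x)
        cells.append((row - 1) * gridsize + col)
    return cells


print(cell_numbers([0.8066765, 1.8209879, 3.0526517, 0.5893240],
                   [0.4577802, 0.4577802, 0.5302311, 1.5445425]))  # [1, 2, 3, 16]
```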
Is there a way to do this without the for loop, by using the vectorization?
You can vectorize this as follows:
(Y - 1) * gridsize + ifelse(Y %% 2 == 1, round(Xcells), ceiling(Xcells))
# [1] 1 2 3 16
(I'm not sure whether pre-calculating round(Xcells) and ceiling(Xcells) would improve this further; you could try.)
Another option (if you want to avoid ifelse) could be
(Y - 1) * gridsize + cbind(ceiling(Xcells), round(Xcells))[cbind(1:length(Xcells), Y %% 2 + 1)]
# [1] 1 2 3 16
I am stuck on a simple problem. I have a scatter plot.
I have plotted confidence lines around it using a custom formula. Now I want only the names outside the cut-off lines to be displayed, and nothing inside them. But I can't figure out how to subset my data based on the line coordinates.
The lines are drawn with the lines function from vectors of 128 x and y values. How do I subset my data (x, y points) based on these two vectors? I can subset the data with a static limit, a single number such as 1, 2 or 3, but I am stuck on how to use a vector to subset the data.
For a reproducible example, consider:
df=data.frame(x=seq(2,16,by=2),y=seq(2,16,by=2),lab=paste("label",seq(2,16,by=2),sep=''))
plot(df[,1],df[,2])
# adding lines
lines(seq(1,15),seq(15,1),lwd=1, lty=2)
# adding labels
text(df[,1],df[,2],labels=df[,3],pos=3,col="red",cex=0.75)
Now I need just the labels of the points that are outside or on the line.
I tried to subset my data frame with the values used for the lines, but I can't get it right.
Static subsetting can be done for single values, like
df[which(df[,1]>8 & df[,2]>8),] but how do I do it for the whole vector?
I also tried sapply to cycle over all the x and y values used for the lines, but most values come out TRUE for one limit and FALSE for the others.
Thanks
I will speak about your initial volcano-type-graph problem and not the made-up one, because they are totally different.
I thought about this a lot and I believe I reached a solid conclusion. There are two options:
1. You know the equations of the lines, which would be really easy to work with.
2. You do not know the equation of the lines which means we need to work with an approximation.
Some geometry:
The equation of a line is y = a + b*x. For a given pair of coordinates (x, y), if y is greater than the right-hand side of the equation evaluated at x, the point is above the line; otherwise it is below the line. The same concept holds if you have a curve (as in your case).
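Here is the above/below test as a minimal Python sketch, applied to the toy data from the question. The function name labels_on_or_above is hypothetical. The sketch assumes the dashed line is exactly y = 16 - x, since it passes through (1, 15) and (15, 1). It treats that segment as an infinite line and keeps points on or above it.

```python
def labels_on_or_above(points, a, b):
    """Return labels of points with y >= a + b*x (on or above the line)."""
    return [lab for x, y, lab in points if y >= a + b * x]


# Toy data from the question: (2, 2, "label2"), ..., (16, 16, "label16")
points = [(v, v, "label" + str(v)) for v in range(2, 17, 2)]
# The dashed line through (1, 15) and (15, 1): a = 16, b = -1
print(labels_on_or_above(points, a=16, b=-1))
# ['label8', 'label10', 'label12', 'label14', 'label16']
```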
If you have the equations, it is easy to plug them into the conditions in my code below and you are set. If not, you need to approximate the curve; to do that, you will need the following code:
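df=data.frame(x=seq(2,16,by=2),y=seq(2,16,by=2),lab=paste("label",seq(2,16,by=2),sep=''))
# line1x, line1y, line2x, line2y (the line coordinates) and a, b (intercept and
# slope of the approximating line; use a separate pair for line 2 if it differs)
# must already be defined before calling make_vector()
make_vector <- function(df) {
  lab <- character(nrow(df))
  for (i in 1:nrow(df)) {
    this_row <- df[i,] #this will contain the three elements per row
    if ( (this_row[1] < max(line1x) & this_row[2] > max(line1y) & this_row[2] < a + b*this_row[1])
         |
         (this_row[1] > min(line2x) & this_row[2] > max(line2y) & this_row[2] > a + b*this_row[1]) ) {
      lab[i] <- as.character(this_row[[3]])
    } else {
      lab[i] <- NA
    }
  }
  return(lab)
}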
#this_row[1] = your x
#this_row[2] = your y
#this_row[3] = your label
df$labels <- make_vector(df)
plot(df[,1],df[,2])
# adding lines
lines(seq(1,15),seq(15,1),lwd=1, lty=2)
# adding labels
text(df[,1],df[,2],labels=df[,4],pos=3,col="red",cex=0.75)
The important bit is the function. Imagine that you have df as you created it, with x, y and lab. You will also have vectors with the x, y coordinates for line1 and the x, y coordinates for line2.
Let's see the condition of line1 only (the same exists for line 2 which is implemented on the code above):
this_row[1] < max(line1x) & this_row[2] > max(line1y) & this_row[2] < a + b*this_row[1]
#translates to:
#this_row[1] < max(line1x) = your x needs to be less than the max x (vertical line in the graph below)
#this_row[2] > max(line1y) = your y needs to be greater than the max y (horizontal line in the graph below)
#this_row[2] < a + b*this_row[1] = your y needs to be less than the right-hand side of the equation (to have a point above, i.e. left of, the line)
#check below what the line is
This will make something like the graph below (it is a bit horrible and also magnified, but it is just a reference; visualize it approximating your lines):
The above code would pick all the points in the area above the triangle and within the y=1 and x=1 lines.
Finally the equation:
Given the coordinates of 2 points, you can find a line's equation by solving a system of two equations in the two parameters a and b (substitute each point's x and y into y = a + b*x).
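Solving the two-equation system gives closed forms: the slope is b = (y2 - y1)/(x2 - x1), and back-substitution gives a = y1 - b*x1. Below is a minimal Python sketch of that. The name line_through is hypothetical.

```python
def line_through(p1, p2):
    """Intercept a and slope b of the line y = a + b*x through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("points share an x value; y = a + b*x cannot be vertical")
    b = (y2 - y1) / (x2 - x1)  # subtract the two equations to eliminate a
    a = y1 - b * x1            # back-substitute into y1 = a + b*x1
    return a, b


print(line_through((1, 15), (15, 1)))  # (16.0, -1.0)
```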
The 2 points to pick are the two points closest to the tangent of the first line (line1). Choose them by hand according to your data: the closer to the tangent, the better. Just plot the points and eyeball it.
Having done all the above, you have your points with your labels (at least approximately).
And that is the only thing you can do!
Long talk but hope it helps.
P.S. I haven't tested the code because I have no data.
What is the best (fastest) way to compute two vectors that are perpendicular to a third vector (X) and also perpendicular to each other?
This is how I am computing these vectors right now:
// HELPER - unit vector that is NOT parallel to X
x_axis = normalize(X);
y_axis = normalize(crossProduct(x_axis, HELPER)); // |x_axis x HELPER| = sin(angle), so normalize
z_axis = crossProduct(x_axis, y_axis);
I know there are infinitely many solutions to this, and I don't care which one I get.
The reason behind this question: I need to construct a transformation matrix where I know which direction the X axis (first column of the matrix) should point. I need to calculate the Y and Z axes (second and third columns). As we know, all axes must be perpendicular to each other.
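Below is a minimal Python sketch of the cross-product construction from the question. The helper names cross, normalize and frame_from_x are hypothetical. It normalizes y_axis as well: the cross product of two unit vectors has length sin(angle), so it is unit length only when they are orthogonal. z_axis then comes out unit length automatically.

```python
import math


def cross(u, v):
    """Cross product of two 3D vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def frame_from_x(X, helper):
    """Orthonormal (x, y, z) axes with x along X; helper must not be parallel to X."""
    x_axis = normalize(X)
    y_axis = normalize(cross(x_axis, helper))
    z_axis = cross(x_axis, y_axis)  # unit length because x and y are orthonormal
    return x_axis, y_axis, z_axis


x_axis, y_axis, z_axis = frame_from_x((1.0, 2.0, 3.0), helper=(1.0, 0.0, 0.0))
```

The three tuples are the columns of the transformation matrix.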
What I have done, provided that X != 0 or Y != 0, is:
A = [-Y, X, 0]
B = [-X*Z, -Y*Z, X*X+Y*Y]
and then normalize the vectors.
[ X,Y,Z]·[-Y,X,0] = -X*Y+Y*X = 0
[ X,Y,Z]·[-X*Z,-Y*Z,X*X+Y*Y] = -X*X*Z-Y*Y*Z+Z*(X*X+Y*Y) = 0
[-Y,X,0]·[-X*Z,-Y*Z,X*X+Y*Y] = Y*X*Z-X*Y*Z = 0
A and B span the null space (orthogonal complement) of your vector.
If X=0 and Y=0 then A=[1,0,0], B=[0,1,0].
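Below is a minimal Python sketch of these formulas, including the X = Y = 0 special case. The name perpendicular_pair is hypothetical. B is exactly the cross product (X, Y, Z) x A, which is why it is perpendicular to both the input and A.

```python
import math


def perpendicular_pair(X, Y, Z):
    """Unit vectors A and B, perpendicular to each other and to (X, Y, Z)."""
    if X == 0 and Y == 0:
        return (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
    A = (-Y, X, 0.0)
    B = (-X * Z, -Y * Z, X * X + Y * Y)  # equals (X, Y, Z) x A
    na = math.sqrt(sum(c * c for c in A))
    nb = math.sqrt(sum(c * c for c in B))
    return tuple(c / na for c in A), tuple(c / nb for c in B)


A, B = perpendicular_pair(1.0, 2.0, 3.0)
```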
This is the way to do it.
It's also probably the only way to do it. Any other way would be mathematically equivalent.
It may be possible to save a few cycles by inlining the crossProduct computation and making sure you're not doing the same multiplications more than once, but that's really far into micro-optimization territory.
One thing you should be careful about is, of course, the HELPER vector. Not only does it have to be non-parallel to X, it is also a good idea for it to be VERY non-parallel to X. If X and HELPER are even somewhat parallel, your floating-point calculation will be unstable and inaccurate. You can test what happens when the dot product of X and HELPER is something like 0.9999.
There is a method to find a good HELPER (in fact, it is ready to be your y_axis).
Let X = (ax, ay, az). Choose the 2 elements with the larger magnitudes, swap them, and negate one of them. Set the third element (the one with the smallest magnitude) to zero. The resulting vector is perpendicular to X.
Example:
if (|ax| <= |ay|) and (|ax| <= |az|) then HELPER = (0, -az, ay) (or (0, az, -ay))
X·HELPER = ax*0 - ay*az + az*ay = 0
if (|ay| <= |ax|) and (|ay| <= |az|) then HELPER = (az, 0, -ax)
otherwise (|az| is the smallest) HELPER = (-ay, ax, 0)
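Below is a short Python sketch of this rule covering all three cases, comparing absolute values. The name perpendicular_by_swap is hypothetical. The result is not normalized. It is nonzero for any nonzero X, because the largest component is always kept.

```python
def perpendicular_by_swap(v):
    """A vector perpendicular to v: zero the smallest-magnitude component,
    swap the other two and negate one of them."""
    ax, ay, az = v
    if abs(ax) <= abs(ay) and abs(ax) <= abs(az):
        return (0.0, -az, ay)
    if abs(ay) <= abs(ax) and abs(ay) <= abs(az):
        return (az, 0.0, -ax)
    return (-ay, ax, 0.0)


print(perpendicular_by_swap((1.0, 2.0, 3.0)))  # (0.0, -3.0, 2.0)
```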
For a good HELPER vector: find the coordinate of X with the smallest absolute value, and use that coordinate axis:
absX = abs(X.x); absY = abs(X.y); absZ = abs(X.z);
if(absX < absY) {
if(absZ < absX)
HELPER = vector(0,0,1);
else // absX <= absZ
HELPER = vector(1,0,0);
} else { // absY <= absX
if(absZ < absY)
HELPER = vector(0,0,1);
else // absY <= absZ
HELPER = vector(0,1,0);
}
Note: this is effectively very similar to #MBo's answer: taking the cross-product with the smallest coordinate axis is equivalent to setting the smallest coordinate to zero, exchanging the larger two, and negating one.
The largest absolute value among the components of a unit vector is always at least 1/sqrt(3) ≈ 0.577, so you may be able to get away with this:
-> Reduce the problem of finding a vector perpendicular to a 3D vector to the 2D case. Find any element whose magnitude is greater than, say, 0.5. Then ignore a different element (use 0 in its place) and apply the 2D perpendicular formula to the remaining elements (in 2D, x-axis = (ax, ay) -> y-axis = (-ay, ax)).
let x-axis be represented by (ax,ay,az)
if (abs(ay) > 0.5) {
y-axis = normalize((-ay,ax,0))
} else if (abs(az) > 0.5) {
y-axis = normalize((0,-az,ay))
} else if (abs(ax) > 0.5) {
y-axis = normalize((az,0,-ax))
} else {
error("Impossible unit vector")
}
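Below is a Python transcription of the pseudocode above. The name perpendicular_unit is hypothetical. It assumes the input is already a unit vector. The 0.5 threshold is safe for the following reason: if every component had magnitude below 1/sqrt(3), the squared components would sum to less than 1.

```python
import math


def perpendicular_unit(v):
    """Unit vector perpendicular to the unit vector v = (ax, ay, az)."""
    ax, ay, az = v
    if abs(ay) > 0.5:
        w = (-ay, ax, 0.0)   # ignore z, apply (p, q) -> (-q, p) to (x, y)
    elif abs(az) > 0.5:
        w = (0.0, -az, ay)   # ignore x
    elif abs(ax) > 0.5:
        w = (az, 0.0, -ax)   # ignore y
    else:
        raise ValueError("not a unit vector")
    n = math.sqrt(sum(c * c for c in w))
    return tuple(c / n for c in w)


print(perpendicular_unit((1.0, 0.0, 0.0)))  # (0.0, 0.0, -1.0)
```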
I'm trying to calculate all the possible grid sizes (x by y) that lead to the same number of cells; for example, a 2x2 grid has 4 cells. I want y to be half of x, and the total to be, for example, 4000. So I guess I want R to calculate all the possible positive integer values of x and y where
function (total) {
x*y=total
x/y=2
x!=total
y!= total.
}
I suppose one way to get positive integers, and to allow several solutions, would be to let the total be up to 10% larger than its original value. It should not be smaller, because I need the grid to be at least as big as the total I give. In that case the function could have two arguments, tot (e.g. 4000) and tolerance (e.g. 10%). Total (as used in the sketch function above) then has to be between tot and tot + tolerance*tot.
I have several cell counts, so 4000 is only one example. I'm trying to build a quick function which returns positive integers only, as a matrix of Xs and Ys.
Any ideas?
Many thanks
What about this:
possible.sizes <- function(total, tolerance) {
min.total <- total
max.total <- total * (1 + tolerance)
min.y <- ceiling(sqrt(min.total/2))
max.y <- floor(sqrt(max.total/2))
if (max.y < min.y)
return(data.frame(x=numeric(0), y=numeric(0)))
y <- seq(min.y, max.y)
x <- 2*y
return(data.frame(x=x, y=y))
}
possible.sizes(4000, 0.1)
# x y
# 1 90 45
# 2 92 46
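For comparison, here is the same arithmetic as a Python sketch. The name possible_sizes mirrors the R function but is otherwise hypothetical. With x = 2y the cell count is 2y^2, so y only has to range over the integers between sqrt(total/2) and sqrt(total*(1 + tolerance)/2).

```python
import math


def possible_sizes(total, tolerance):
    """(x, y) pairs with x == 2*y and total <= x*y <= total*(1 + tolerance)."""
    max_total = total * (1 + tolerance)
    min_y = math.ceil(math.sqrt(total / 2))      # smallest y with 2*y**2 >= total
    max_y = math.floor(math.sqrt(max_total / 2))  # largest y with 2*y**2 <= max_total
    return [(2 * y, y) for y in range(min_y, max_y + 1)]  # empty if none fit


print(possible_sizes(4000, 0.1))  # [(90, 45), (92, 46)]
```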