I have a set of 3D coordinates here. The data has 52170 rows and 4 columns. Each row represent one point. The first column is point index number, increasing from 1 to 52170. The second to fourth columns are coordinates for x, y, and z axis, respectively. The first 10 lines are as follow:
seq x y z
1 7.126616 -102.927567 19.692112
2 -10.546907 -143.824966 50.77417
3 7.189214 -107.792068 18.758278
4 7.148852 -101.784027 19.905006
5 -14.65788 -146.294952 49.899158
6 -37.315742 -116.941185 12.316169
7 8.023512 -103.477882 19.081482
8 -14.641933 -145.100098 50.182739
9 -14.571636 -141.386322 50.547684
10 -15.691803 -145.66481 49.946281
I want to create a 3D scatter plot in which each point is added sequentially to this plot using R or MATLAB. The point represented by the first line is added first, then the point represented by the second line, ..., all the way to the last point.
In addition, I wish to control the speed at which points are added.
For 2D scatter plot, I could use the following code:
library(gganimate)
x <- rnorm(50, 5, 1)
y <- 7*x +rnorm(50, 4, 4)
ind <- 1:50
data <- data.frame(x, y, ind)
ggplot(data, aes(x, y)) + geom_point(aes(group = seq_along(x))) + transition_reveal(ind)
But I cannnot find information on how to do this for 3D scatter plot. Can anyone show me how this could be done? Thank you.
This is an answer for MATLAB
In a general fashion, animating a plot (or 3d plot, or scatter plot, or surface, or other graphic objects) can be done following the same approach:
Do the first plot/plot3/scatter/surf, and retrieve its handle. The first plot can incorporate the first "initial" sets of points or even be empty (use NaN value to create a plot with invisible data point).
Set axis limits and all other visualisation options which are going to be fixed (view point, camera angle, lightning...). No need to set the options which are going to evolove during the animation.
In a loop, update the minimum set of plot object properties: XData, YData ( ZData if 3D plot, CData if the plot object has some and you want to animate the color).
The code below is an implementation of the approach above adapted to your case:
%% Read data and place coordinates in named variables
csvfile = '3D scatter plot.csv' ;
data = csvread(csvfile,2) ;
% [optional], just to simplify notations further down
x = data(:,2) ;
y = data(:,3) ;
z = data(:,4) ;
%% Generate empty [plot3] objects
figure
% create an "axes" object, and retrieve the handle "hax"
hax = axes ;
% create 2 empty 3D point plots:
% [hp_new] will contains only one point (the new point added to the graph)
% [hp_trail] will contains all the points displayed so far
hp_trail = plot3(NaN,NaN,NaN,'.b','Parent',hax,'MarkerSize',2) ;
hold on
hp_new = plot3(NaN,NaN,NaN,'or','Parent',hax,'MarkerSize',6,'MarkerEdgeColor','r','MarkerFaceColor','g','LineWidth',2) ;
hold off
%% Set axes limits (to limit "wobbling" during animation)
xl = [min(x) max(x)] ;
yl = [min(y) max(y)] ;
zl = [min(z) max(z)] ;
set(hax, 'XLim',xl,'YLim',yl,'ZLim',zl)
view(145,72) % set a view perspective (optional)
%% Animate
np = size(data,1) ;
for ip=1:np
% update the "new point" graphic object
set( hp_new , 'XData',x(ip), 'YData',y(ip), 'ZData',z(ip) )
% update the "point history" graphic object
% we will display points from index 1 up to the current index ip
% (minus one) because the current index point is already displayed in
% the other plot object
indices2display = 1:ip-1 ;
set(hp_trail ,...
'XData',x(indices2display), ...
'YData',y(indices2display), ...
'ZData',z(indices2display) )
% force graphic refresh
drawnow
% Set the "speed"
% actually the max speed is given by your harware, so we'll just set a
% short pause in case you want to slow it down
pause(0.01) % <= comment this line if you want max speed
end
This will produce:
I would like to draw a group bar graph with error bars and split y axis to show both smaller and larger values in same plot? (as shown in my data sample number 1 has small values compare to other samples, therefore, I want to make a gap on y axis in-between 10-200)
Here is my data,
sample mean part sd
1 4.3161 G 1.2209
1 2.3157 F 1.7011
1 1.7446 R 1.1618
2 1949.13 G 873.42
2 195.07 F 47.82
2 450.88 R 140.31
3 2002.98 G 367.92
3 293.45 F 59.01
3 681.99 R 168.03
4 2717.85 G 1106.07
4 432.83 F 118.02
4 790.97 R 232.62
You can do anything you want with primitive graphic elements. For this reason, I always prefer to design my own plots with just the base R plotting functions, particularly points(), segments(), lines(), abline(), rect(), polygon(), text(), and mtext(). You can easily create curves (e.g. for circles) and more complex shapes using segments() and lines() across granular coordinate vectors that you define yourself. For example, see Plot angle between vectors. This provides much more control over the plot elements you create, however, it often takes more work and careful coding than more prepackaged solutions, so it's a tradeoff.
Data
First, here's your data in runnable form:
df <- data.frame(
sample=c(1,1,1,2,2,2,3,3,3,4,4,4),
mean=c(4.3161,2.3157,1.7446,1949.13,195.07,450.88,2002.98,293.45,681.99,2717.85,432.83,790.97),
part=c('G','F','R','G','F','R','G','F','R','G','F','R'),
sd=c(1.2209,1.7011,1.1618,873.42,47.82,140.31,367.92,59.01,168.03,1106.07,118.02,232.62),
stringsAsFactors=F
);
df;
## sample mean part sd
## 1 1 4.3161 G 1.2209
## 2 1 2.3157 F 1.7011
## 3 1 1.7446 R 1.1618
## 4 2 1949.1300 G 873.4200
## 5 2 195.0700 F 47.8200
## 6 2 450.8800 R 140.3100
## 7 3 2002.9800 G 367.9200
## 8 3 293.4500 F 59.0100
## 9 3 681.9900 R 168.0300
## 10 4 2717.8500 G 1106.0700
## 11 4 432.8300 F 118.0200
## 12 4 790.9700 R 232.6200
OP ggplot
Now, for reference, here's a screenshot of the plot that results from the ggplot code you pasted into your comment:
library(ggplot2);
ggplot(df,aes(x=as.factor(sample),y=mean,fill=part)) +
geom_bar(position=position_dodge(),stat='identity',colour='black') +
geom_errorbar(aes(ymin=mean-sd,ymax=mean+sd),width=.2,position=position_dodge(.9));
Linear Single
Also for reference, here's how you can produce a similar grouped bar plot using base R barplot() and legend(). I've added the error bars with custom calls to segments() and points():
## reshape to wide matrices
dfw <- reshape(df,dir='w',idvar='part',timevar='sample');
dfw.mean <- as.matrix(dfw[grep(perl=T,'^mean\\.',names(dfw))]);
dfw.sd <- as.matrix(dfw[grep(perl=T,'^sd\\.',names(dfw))]);
rownames(dfw.mean) <- rownames(dfw.sd) <- dfw$part;
colnames(dfw.mean) <- colnames(dfw.sd) <- unique(df$sample);
## plot precomputations
ylim <- c(0,4000);
yticks <- seq(ylim[1L],ylim[2L],100);
xcenters <- (col(dfw.sd)-1L)*(nrow(dfw.sd)+1L)+row(dfw.sd)+0.5;
partColors <- c(G='green3',F='indianred1',R='dodgerblue');
errColors <- c(G='darkgreen',F='darkred',R='darkblue');
## plot
par(xaxs='i',yaxs='i');
barplot(dfw.mean,beside=T,col=partColors,ylim=ylim,xlab='sample',ylab='mean',axes=F);
segments(xcenters,dfw.mean-dfw.sd,y1=dfw.mean+dfw.sd,lwd=2,col=errColors);
points(rep(xcenters,2L),c(dfw.mean-dfw.sd,dfw.mean+dfw.sd),pch=19,col=errColors);
axis(1L,par('usr')[1:2],F,pos=0,tck=0);
axis(2L,yticks,las=1L,cex.axis=0.7);
legend(2,3800,dfw$part,partColors,title=expression(bold('part')),cex=0.7,title.adj=0.5[2:1]);
The issue is plain to see. There's nuance to some of the data (the sample 1 means and variability) that is not well represented in the plot.
Logarithmic
There are two standard options for dealing with this problem. One is to use a logarithmic scale. You can do this with the log='y' argument to the barplot() function. It's also good to override the default y-axis tick selection, since the default base R ticks tend to be a little light on density and short on range. (That's actually true in general, for most base R plot types; I make custom calls to axis() for all the plots I produce in this answer.)
## plot precomputations
ylim <- c(0.1,4100); ## lower limit must be > 0 for log plot
yticks <- rep(10^seq(floor(log10(ylim[1L])),ceiling(log10(ylim[2L])),1),each=9L)*1:9;
xcenters <- (col(dfw.sd)-1L)*(nrow(dfw.sd)+1L)+row(dfw.sd)+0.5;
partColors <- c(G='green3',F='indianred1',R='dodgerblue');
errColors <- c(G='darkgreen',F='darkred',R='darkblue');
## plot
par(xaxs='i',yaxs='i');
barplot(log='y',dfw.mean,beside=T,col=partColors,ylim=ylim,xlab='sample',ylab='mean',axes=F);
segments(xcenters,dfw.mean-dfw.sd,y1=dfw.mean+dfw.sd,lwd=2,col=errColors);
points(rep(xcenters,2L),c(dfw.mean-dfw.sd,dfw.mean+dfw.sd),pch=19,col=errColors);
axis(1L,par('usr')[1:2],F,pos=0,tck=0);
axis(2L,yticks,yticks,las=1L,cex.axis=0.6);
legend(2,3000,dfw$part,partColors,title=expression(bold('part')),cex=0.7,title.adj=0.5[2:1]);
Right away we see the issue with sample 1 is fixed. But we've introduced a new issue: we've lost precision in the rest of the data. In other words, the nuance that exists in the rest of the data is less visually pronounced. This is an unavoidable result of the "zoom-out" effect of changing from linear to logarithmic axes. You would incur the same loss of precision if you used a linear plot but with too large a y-axis, which is why it's always expected that axes are fitted as closely as possible to the data. This also serves as an indication that a logarithmic y-axis may not be the correct solution for your data. Logarithmic axes are generally advised when the underlying data reflects logarithmic phenomena; that it ranges over several orders of magnitude. In your data, only sample 1 sits in a different order of magnitude from the remaining data; the rest are concentrated in the same order of magnitude, and are thus not best represented with a logarithmic y-axis.
Linear Multiple
The second option is to create separate plots with completely different y-axis scaling. It should be noted that ggplot faceting is essentially the creation of separate plots. Also, you could create multifigure plots with base R, but I've usually found that that's more trouble than it's worth. It's usually easier to just generate each plot individually, and then lay them out next to each other with publishing or word processing software.
There are different ways of customizing this approach, such as whether you combine the axis labels, where you place the legend, how you size and arrange the different plots relative to each other, etc. Here's one way of doing it:
##--------------------------------------
## plot 1 -- high values
##--------------------------------------
dfw.mean1 <- dfw.mean[,-1L];
dfw.sd1 <- dfw.sd[,-1L];
## plot precomputations
ylim <- c(0,4000);
yticks <- seq(ylim[1L],ylim[2L],100);
xcenters <- (col(dfw.sd1)-1L)*(nrow(dfw.sd1)+1L)+row(dfw.sd1)+0.5;
partColors <- c(G='green3',F='indianred1',R='dodgerblue');
errColors <- c(G='darkgreen',F='darkred',R='darkblue');
par(xaxs='i',yaxs='i');
barplot(dfw.mean1,beside=T,col=partColors,ylim=ylim,xlab='sample',ylab='mean',axes=F);
segments(xcenters,dfw.mean1-dfw.sd1,y1=dfw.mean1+dfw.sd1,lwd=2,col=errColors);
points(rep(xcenters,2L),c(dfw.mean1-dfw.sd1,dfw.mean1+dfw.sd1),pch=19,col=errColors);
axis(1L,par('usr')[1:2],F,pos=0,tck=0);
axis(2L,yticks,las=1L,cex.axis=0.7);
legend(2,3800,dfw$part,partColors,title=expression(bold('part')),cex=0.7,title.adj=0.5[2:1]);
##--------------------------------------
## plot 2 -- low values
##--------------------------------------
dfw.mean2 <- dfw.mean[,1L,drop=F];
dfw.sd2 <- dfw.sd[,1L,drop=F];
## plot precomputations
ylim <- c(0,6);
yticks <- seq(ylim[1L],ylim[2L],0.5);
xcenters <- (col(dfw.sd2)-1L)*(nrow(dfw.sd2)+1L)+row(dfw.sd2)+0.5;
partColors <- c(G='green3',F='indianred1',R='dodgerblue');
errColors <- c(G='darkgreen',F='darkred',R='darkblue');
par(xaxs='i',yaxs='i');
barplot(dfw.mean2,beside=T,col=partColors,ylim=ylim,xlab='sample',ylab='mean',axes=F);
segments(xcenters,dfw.mean2-dfw.sd2,y1=dfw.mean2+dfw.sd2,lwd=2,col=errColors);
points(rep(xcenters,2L),c(dfw.mean2-dfw.sd2,dfw.mean2+dfw.sd2),pch=19,col=errColors);
axis(1L,par('usr')[1:2],F,pos=0,tck=0);
axis(2L,yticks,las=1L,cex.axis=0.7);
This solves both problems (small-value visibility and large-value precision). But it also distorts the relative magnitude of samples 2-4 vs. sample 1. In other words, the sample 1 data has been "scaled up" relative to samples 2-4, and the reader must make a conscious effort to read the axes and digest the differing scales in order to properly understand the plots.
The lesson here is that there's no perfect solution. Every approach has its own pros and cons, its own tradeoffs.
Gapped
In your question, you indicate you want to add a gap across the y range 10:200. On the surface, this sounds like a reasonable solution for raising the visibility of the sample 1 data. However, the magnitude of that 190 unit range is dwarfed by the range of the remainder of the plot, so it ends up having a negligible effect on sample 1 visibility.
In order to demonstrate this I'm going to use some code I've written which can be used to transform input coordinates to a new data domain which allows for inconsistent scaling of different segments of the axis. Theoretically you could use it for both x and y axes, but I've only ever used it for the y-axis.
A few warnings: This introduces some significant complexity, and decouples the graphics engine's idea of the y-axis scale from the real data. More specifically, it maps all coordinates to the range [0,1] based on their cumulative position within the sequence of segments.
At this point, I'm also going to abandon barplot() in favor of drawing the bars manually, using calls to rect(). Technically, it would be possible to use barplot() with my segmentation code, but as I said earlier, I prefer to design my own plots from scratch with primitive graphic elements. This also allows for more precise control over all aspects of the plot.
Here's the code and plot, I'll attempt to give a better explanation of it afterward:
dataCoordToPlot <- function(data,seg) {
## data -- double vector of data-world coordinates.
## seg -- list of two components: (1) mark, giving the boundaries between all segments, and (2) scale, giving the relative scale of each segment. Thus, scale must be one element shorter than mark.
data <- as.double(data);
seg <- as.list(seg);
seg$mark <- as.double(seg$mark);
seg$scale <- as.double(seg$scale);
if (length(seg$scale) != length(seg$mark)-1L) stop('seg$scale must be one element shorter than seg$mark.');
scaleNorm <- seg$scale/sum(seg$scale);
cumScale <- c(0,cumsum(scaleNorm));
int <- findInterval(data,seg$mark,rightmost.closed=T);
int[int%in%c(0L,length(seg$mark))] <- NA; ## handle values outside outer segments; will propagate NA to returned vector
(data - seg$mark[int])/(seg$mark[int+1L] - seg$mark[int])*scaleNorm[int] + cumScale[int];
}; ## end dataCoordToPlot()
## y dimension segmentation
ymax <- 4000;
yseg <- list();
yseg$mark <- c(0,10,140,ymax);
yseg$scale <- diff(yseg$mark);
yseg$scale[2L] <- 30;
yseg$jump <- c(F,T,F);
## plot precomputations
xcenters <- seq(0.5,len=length(unique(df$sample)));
xlim <- range(xcenters)+c(-0.5,0.5);
ylim <- range(yseg$mark);
yinc <- 100;
yticks.inc <- seq(ylim[1L],ylim[2L],yinc);
yticks.inc <- yticks.inc[!yseg$jump[findInterval(yticks.inc,yseg$mark,rightmost.closed=T)]];
yticks.jump <- setdiff(yseg$mark,yticks.inc);
yticks.all <- sort(c(yticks.inc,yticks.jump));
## plot
## define as reusable function for subsequent examples
custom.barplot <- function() {
par(xaxs='i',yaxs='i');
plot(NA,xlim=xlim,ylim=dataCoordToPlot(ylim,yseg),axes=F,ann=F);
abline(h=dataCoordToPlot(yticks.all,yseg),col='lightgrey');
axis(1L,seq(xlim[1L],xlim[2L]),NA,tck=0);
axis(1L,xcenters,unique(df$sample));
axis(2L,dataCoordToPlot(yticks.inc,yseg),yticks.inc,las=1,cex.axis=0.7);
axis(2L,dataCoordToPlot(yticks.jump,yseg),yticks.jump,las=1,tck=-0.008,hadj=0.1,cex.axis=0.5);
mtext('sample',1L,2L);
mtext('mean',2L,3L);
xgroupRatio <- 0.8;
xbarRatio <- 0.9;
partColors <- c(G='green3',F='indianred1',R='dodgerblue');
partsCanon <- unique(df$part);
errColors <- c(G='darkgreen',F='darkred',R='darkblue');
for (sampleIndex in seq_along(unique(df$sample))) {
xc <- xcenters[sampleIndex];
sample <- unique(df$sample)[sampleIndex];
dfs <- df[df$sample==sample,];
parts <- unique(dfs$part);
parts <- parts[order(match(parts,partsCanon))];
barWidth <- xgroupRatio*xbarRatio/length(parts);
gapWidth <- xgroupRatio*(1-xbarRatio)/(length(parts)-1L);
xstarts <- xc - xgroupRatio/2 + (match(dfs$part,parts)-1L)*(barWidth+gapWidth);
rect(xstarts,0,xstarts+barWidth,dataCoordToPlot(dfs$mean,yseg),col=partColors[dfs$part]);
barCenters <- xstarts+barWidth/2;
segments(barCenters,dataCoordToPlot(dfs$mean + dfs$sd,yseg),y1=dataCoordToPlot(dfs$mean - dfs$sd,yseg),lwd=2,col=errColors);
points(rep(barCenters,2L),dataCoordToPlot(c(dfs$mean-dfs$sd,dfs$mean+dfs$sd),yseg),pch=19,col=errColors);
}; ## end for
## draw zig-zag cutaway graphic in jump segments
zigCount <- 30L;
jumpIndexes <- which(yseg$jump);
for (jumpIndex in jumpIndexes) {
if (yseg$scale[jumpIndex] == 0) next;
jumpStart <- yseg$mark[jumpIndex];
jumpEnd <- yseg$mark[jumpIndex+1L];
lines(seq(xlim[1L],xlim[2L],len=zigCount*2L+1L),dataCoordToPlot(c(rep(c(jumpStart,jumpEnd),zigCount),jumpStart),yseg));
}; ## end for
legend(0.2,dataCoordToPlot(3800,yseg),partsCanon,partColors,title=expression(bold('part')),cex=0.7,title.adj=c(NA,0.5));
}; ## end custom.barplot()
custom.barplot();
The key function is dataCoordToPlot(). That stands for "data coordinates to plot coordinates", where "plot coordinates" refers to the [0,1] normalized domain.
The seg argument defines the segmentation of the axis and the scaling of each segment. Its mark component specifies the boundaries of each segment, and its scale component gives the scale factor for each segment. n segments must have n+1 boundaries to fully define where each segment begins and ends, thus mark must be one element longer than scale.
Before being used, the scale vector is normalized within the function to sum to 1, so the absolute magnitudes of the scale values don't matter; it's their relative values that matter.
The algorithm is to find each coordinate's containing segment, find the accumulative distance within the segment reached by the coordinate accounting for the segment's relative scale, and then add to that the cumulative distance reached by all prior segments.
Using this design, it is possible to take any range of coordinates along the axis dimension and scale them up or down relative to the other segments. An instantaneous gap across a range could be achieved with a scale of zero. Alternatively, you can simply scale down the range so that it has some thickness, but contributes little to the progression of the dimension. In the above plot, I use the latter for the gap, mainly so that I can use the small thickness to add a zigzag aesthetic which visually indicates the presence of the gap.
Also, I should note that I used 10:140 instead of 10:200 for the gap. This is because the sample 2 F part error bar extends down to 147.25 (195.07 - 47.82). The difference is negligible.
As you can see, the result looks basically identical to the Linear Single plot. The gap is not significant enough to raise the visibility of the sample 1 data.
Distorted with Gap
Just to throw some more possibilities into mix, now venturing into very non-standard and probably questionable waters, we can use the segmentation transformation to scale up the sample 1 order of magnitude, thereby making it much more visible while still remaining within the single plot, directly alongside samples 2-4.
For this example, I preserve the gap from 10:140 so you can see how it looks when not lying prostrate near the baseline.
## y dimension segmentation
ymax <- 4000;
yseg <- list();
yseg$mark <- c(0,10,140,ymax);
yseg$scale <- c(24,1,75);
yseg$jump <- c(F,T,F);
## plot precomputations
xcenters <- seq(0.5,len=length(unique(df$sample)));
xlim <- range(xcenters)+c(-0.5,0.5);
ylim <- range(yseg$mark);
yinc1 <- 1;
yinc2 <- 100;
yticks.inc1 <- seq(ceiling(yseg$mark[1L]/yinc1)*yinc1,yseg$mark[2L],yinc1);
yticks.inc2 <- seq(ceiling(yseg$mark[3L]/yinc2)*yinc2,yseg$mark[4L],yinc2);
yticks.inc <- c(yticks.inc1,yticks.inc2);
yticks.jump <- setdiff(yseg$mark,yticks.inc);
yticks.all <- sort(c(yticks.inc,yticks.jump));
## plot
custom.barplot();
Distorted without Gap
Finally, just to clarify that gaps are not necessary for inconsistent scaling between segments, here's the same plot but without the gap:
## y dimension segmentation
ymax <- 4000;
yseg <- list();
yseg$mark <- c(0,10,ymax);
yseg$scale <- c(25,75);
yseg$jump <- c(F,F);
## plot precomputations
xcenters <- seq(0.5,len=length(unique(df$sample)));
xlim <- range(xcenters)+c(-0.5,0.5);
ylim <- range(yseg$mark);
yinc1 <- 1;
yinc2 <- 100;
yticks.inc1 <- seq(ceiling(yseg$mark[1L]/yinc1)*yinc1,yseg$mark[2L],yinc1);
yticks.inc2 <- seq(ceiling(yseg$mark[2L]/yinc2)*yinc2,yseg$mark[3L],yinc2);
yticks.inc <- c(yticks.inc1,yticks.inc2);
yticks.jump <- setdiff(yseg$mark,yticks.inc);
yticks.all <- sort(c(yticks.inc,yticks.jump));
## plot
custom.barplot();
In principle, there's really no difference between the Linear Multiple solution and the Distorted solutions. Both involve visual distortion of competing orders of magnitude. Linear Multiple simply separates the different orders of magnitude into separate plots, while the Distorted solutions combine them into the same plot.
Probably the best argument in favor of using Linear Multiple is that if you use Distorted you'll probably be crucified by a large mob of data scientists, since that is a very non-standard way of plotting data. On the other hand, one could argue that the Distorted approach is more concise and helps to represent the relative positions of each data point along the number line. The choice is yours.
What you want to plot is a discontinuous y axis.
This issue was covered before in this post and seems not to be possible in ggplot2.
The answers to the mentioned post suggest faceting, log scaled y axis and separate plots to solve your problem.
Please find the reasons detailed by Hadley Wickham here, who thinks that a broken y axis could be "visually distorting".
I've been looking for a solution to convert cartesian coordinates (lat, long) that I have to polar coordinates in order to facilitate a simulation that I want to run, but I haven't found any questions or answers here for doing this in R. There are a number of options, including the built in function cart2pol in Matlab, but all of my data are in R and I'd like to continue getting comfortable working in this framework.
Question:
I have lat/long coordinates from tagging data, and I want to convert these to polar coordinates (meaning jump size and angle: http://en.wikipedia.org/wiki/Polar_coordinate_system) so that I can then shuffle or bootstrap them (haven't decided which) about 1,000 times, and calculate the straight-line distance of each simulated track from the starting point. I have a true track, and I'm interested in determining if this animal is exhibiting site affinity by simulating 1,000 random tracks with the same jump sizes and turning angles, but in completely different orders and combinations. So I need 1,000 straight-line distances from the origin to create a distribution of distances and then compare this to my true data set's straight-line distance.
I'm comfortable doing the bootstrapping, but I'm stuck at the very first step, which is converting my cartesian lat/long coordinates to polar coordinates (jump size and turning angle). I know there are built in functions to do this in other programs such as Matlab, but I can't find any way to do it in R. I could do it manually by hand in a for-loop, but if there's a package out there or any easier way to do it, I'd much prefer that.
Ideally I'd like to convert the data to polar coordinates, run the simulation, and then for each random track output an end point as cartesian coordinates, lat/long, so I can then calculate the straight-line distance traveled.
I didn't post any sample data, as it would just be a two-column data frame of lat and long coordinates.
Thanks for any help you can provide! If there's an easy explanation somewhere on this site or others that I missed, please point me in that direction! I couldn't find anything.
Cheers
For x-y coordinates that are in the same units (e.g. meters rather than degrees of latitude and degrees of longitude), you can use this function to get a data.frame of jump sizes and turning angles (in degrees).
getSteps <- function(x,y) {
d <- diff(complex(real = x, imaginary = y))
data.frame(size = Mod(d),
angle = c(NA, diff(Arg(d)) %% (2*pi)) * 360/(2*pi))
}
## Try it out
set.seed(1)
x <- rnorm(10)
y <- rnorm(10)
getSteps(x, y)
# size angle
# 1 1.3838360 NA
# 2 1.4356900 278.93771
# 3 2.9066189 101.98625
# 4 3.5714584 144.00231
# 5 1.6404354 114.73369
# 6 1.3082132 135.76778
# 7 0.9922699 74.09479
# 8 0.2036045 141.67541
# 9 0.9100189 337.43632
## A plot helps check that this works
plot(x, y, type = "n", asp = 1)
text(x, y, labels = 1:10)
You can do a transformation bewteen cartesian and polar this way:
polar2cart <- function(r, theta) {
data.frame(x = r * cos(theta), y = r * sin(theta))
}
cart2polar <- function(x, y) {
data.frame(r = sqrt(x^2 + y^2), theta = atan2(y, x))
}
Since it is fairly straight forward, you can write your own function. Matlab-like cart2pol function in R:
cart2pol <- function(x, y)
{
r <- sqrt(x^2 + y^2)
t <- atan(y/x)
c(r,t)
}
I used Josh O'Brien's code and got what appear to be reasonable jumps and angles—they match up pretty well to eyeballing the rough distance and heading between points. I then used a formula from his suggestions to create a function to turn the polar coordinates back to cartesian coordinates, and a for loop to apply the function to the data frame of all of the polar coordinates. The loops appear to work, and the outputs are in the correct units, but I don't believe the values that it's outputting are corresponding to my data. So either I did a miscalculation with my formula, or there's something else going on. More details below:
Here's the head of my lat long data:
> head(Tag1SSM[,3:4])
lon lat
1 130.7940 -2.647957
2 130.7873 -2.602994
3 130.7697 -2.565903
4 130.7579 -2.520757
5 130.6911 -2.704841
6 130.7301 -2.752182
When I plot the full dataset just as values, I get this plot:
which looks exactly the same as if I were to plot this using any spatial or mapping package in R.
I then used Josh's function to convert my data to polar coordinates:
x<-Tag1SSM$lon
y<-Tag1SSM$lat
getSteps <- function(x,y) {
d <- diff(complex(real = x, imaginary = y))
data.frame(size = Mod(d),
angle = c(NA, diff(Arg(d)) %% (2*pi)) * 360/(2*pi))
}
which produced the following polar coordinates appropriately:
> polcoords<-getSteps(x,y)
> head(polcoords)
size angle
1 0.04545627 NA
2 0.04103718 16.88852
3 0.04667590 349.38153
4 0.19581350 145.35439
5 0.06130271 59.37629
6 0.01619242 31.86359
Again, these look right to me, and correspond well to the actual angles and relative distances between points. So far so good.
Now I want to convert these back to cartesian coordinates and calculate a euclidian distance from the origin. These don't have to be in true lat/long, as I'm just comparing them amongst themselves. So I'm happy for the origin to be set as (0,0) and for distances to be calculated in reference x,y values instead of kilometers or something like that.
So, I used this function with Josh's help and a bit of web searching:
polar2cart<-function(x,y,size,angle){
#convert degrees to radians (dividing by 360/2*pi, or multiplying by pi/180)
angle=angle*pi/180
if(is.na(x)) {x=0} #this is for the purpose of the for loop below
if(is.na(y)) {y=0}
newx<-x+size*sin(angle) ##X #this is how you convert back to cartesian coordinates
newy<-y+size*cos(angle) ##Y
return(c("x"=newx,"y"=newy)) #output the new x and y coordinates
}
And then plugged it into this for loop:
u<-polcoords$size
v<-polcoords$angle
n<-162 #I want 162 new coordinates, starting from 0
N<-cbind(rep(NA,163),rep(NA,163)) #need to make 163 rows, though, for i+1 command below— first row will be NA
for(i in 1:n){
jump<-polar2cart(N[i,1],N[i,2],u[i+1],v[i+1]) #use polar2cart function above, jump from previous coordinate in N vector
N[i+1,1]<-jump[1] #N[1,] will be NA's which sets the starting point to 0,0—new coords are then calculated from each previous N entry
N[i+1,2]<-jump[2]
Dist<-sqrt((N[163,1]^2)+(N[163,2]^2))
}
And then I can take a look at N, with my new coordinates based on those jumps:
> N
[,1] [,2]
[1,] NA NA
[2,] 0.011921732 0.03926732
[3,] 0.003320851 0.08514394
[4,] 0.114640605 -0.07594871
[5,] 0.167393509 -0.04472125
[6,] 0.175941466 -0.03096891
This is where the problem is... the x,y coordinates from N get progressively larger—there's a bit of variation in there, but if you scroll down the list, y goes from 0.39 to 11.133, with very few backward steps to lower values. This isn't what my lat/long data do, and if I calculated the cart->pol and pol->cart properly, these new values from N should match my lat/long data, just in a different coordinate system. This is what the N values look like plotted:
Not the same at all... The last point in N is the farthest point from the origin, while in my lat/long data, the last point is actually quite close to the first point, and definitely not the farthest point away. I think the issue must be in my conversion from polar coordinates back to cartesian coordinates, but I'm not sure how to fix it...
Any help in solving this would be much appreciated!
Cheers
I think this code I wrote converts to polar coordinates:
# example data
x<-runif(30)
y<-runif(30)
# center example around 0
x<-x-mean(x)
y<-y-mean(y)
# function to convert to polar coordinates
topolar<-function(x,y){
# calculate angles
alphas<-atan(y/x)
# correct angles per quadrant
quad2<-which(x<0&y>0)
quad3<-which(x<0&y<0)
quad4<-which(x>0&y<0)
alphas[quad2]<-alphas[quad2]+pi
alphas[quad3]<-alphas[quad3]+pi
alphas[quad4]<-alphas[quad4]+2*pi
# calculate distances to 0,0
r<-sqrt(x^2+y^2)
# create output
polar<-data.frame(alphas=alphas,r=r)
}
# call function
polar_out<-topolar(x,y)
# get out angles
the_angles<-polar_out$alphas
Another option only in degree
pol2car = function(angle, dist){
co = dist*sin(angle)
ca = dist*cos(angle)
return(list(x=ca, y=co))
}
pol2car(angle = 45, dist = sqrt(2))
cart2sph {pracma} Transforms between cartesian, spherical, polar, and cylindrical coordinate systems in two and three dimensions.