I have a large dataset which I need to plot in loglog scale in Gnuplot, like this:
set log xy
plot 'A_1D_l0.25_L1024_r0.dat' u 1:($2-512)
[Figure: log-log plot of my data points]
[Link: text file with the data points]
The data points are equally spaced on the x axis, but because of the log scale they become very dense in the right part of the graph, and as a result the output file (which I ultimately export to .tex) gets very large.
In linear scale, I would simply use the every option to reduce the number of points that get plotted. Is there a similar option for log-log scale, such that the plotted points appear equally spaced?
I am aware of a similar question that was raised a few years ago, but in my opinion the solution is unsatisfactory: the plotted points are not equally spaced along the x-axis. I think this is a fairly basic problem that deserves a cleaner solution.
As I understand it, you don't want to plot the actual data points; you just want to plot a line through them. But you want to keep the appearance of points rather than a line. Is that right?
set log xy
plot 'A_1D_l0.25_L1024_r0.dat' u 1:($2-512) with lines dashtype '.' lw 2
Amended answer
If it is important to present outliers/errors in the data set then you must not use every or any other technique that simply discards or skips most of the data points. In that case I would prefer the plot with points that you show in the original question, perhaps modified to represent each point as a dot rather than a cross. I will simulate this by modifying a single point in your 500000 point data set (first figure below). But I would also suggest that the presence of outliers is even more apparent if you plot with lines (second figure below).
Showing error bounds is another alternative for noisy data, but the options depend on what you have to work with in your data set. If you want to pursue that, please ask a separate question.
If you really want to reduce the number of data points to be plotted, you might consider the following script.
s = 0.1            # sampling interval in log scale (try 0.05 for more detail)
c = log10(0.01)    # threshold used by sampler(x); initialize it to a value
                   # smaller than log10 of any x in the data
sampler(x) = (x>0 && log10(x)>=c) ? (c=ceil(log10(x)/s+0.5)*s, x) : NaN
set log xy
set grid xtics
plot 'A_1D_l0.25_L1024_r0.dat' using (sampler($1)):($2-512) with points pt 7 lt 1 notitle , \
'A_1D_l0.25_L1024_r0.dat' using 1:($2-512) with lines lt 1 notitle
This script samples the data in increments of roughly 0.1 along the x-axis in log scale. It relies on the fact that points whose x value evaluates to NaN in the using expression are not drawn.
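To make the thinning logic easier to follow outside gnuplot, here is a minimal Python sketch of the same idea. The function name log_thin and the synthetic x values are assumptions made for illustration; only the threshold update is taken from sampler(x) above.

```python
import math

def log_thin(xs, s=0.1, c0=math.log10(0.01)):
    """Return the indices of the points kept when sampling about every `s` decades in x.

    c0 is the initial threshold and must be below log10 of every x (as in the script above).
    """
    c = c0
    kept = []
    for i, x in enumerate(xs):
        if x > 0 and math.log10(x) >= c:
            kept.append(i)
            # Same update as sampler(x): move the threshold to the next grid step.
            c = math.ceil(math.log10(x) / s + 0.5) * s
    return kept

# Hypothetical, equally spaced x values standing in for the data file.
xs = list(range(1, 100001))
kept = log_thin(xs)
print(len(kept), "of", len(xs), "points kept")
```

At small x, where neighbouring points are already more than s apart in log scale, every point is kept; the thinning only takes effect where the points become dense.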
I'm using splot to visualize the fitness histogram for an optimization problem. In this scenario positive Z values (say in the +2 to +15 range) representing good solutions are of particular interest, whereas negative values don't provide much insight, i.e. it doesn't matter if a bad solution has a Z value of -50, -500 or -5000.
Using the autorange option all the interesting bits around +/- 0 are 'scaled away' (i.e. mostly flat to include neg. peaks in the surface) so I'm now using an explicit zrange of [-bestValue:bestValue] to focus the plot on the interesting Z values.
Now the development of best solutions close to 0 can be traced much better, however the surface is rendered with 'holes' for neg. Z values exceeding the range:
This is very confusing to look at/interpret.
(FWIW the hidden3d option is enabled)
Can we (gnuplot) 'fill' the holes in some way, e.g. by clamping neg. values in the surface plot instead of just dropping the points from the surface?
I have a data set containing many small values (~10^0), but some bigger ones (~10^6) as well. The few big points (which are not that important, but shouldn't be left out) stretch the whole axis.
For positive values I'd use log="y"...
But now I'd love to get an axis like "..., -10^2, -10^1, 0, 10^1, 10^2, ...".
Is there an (easy) way to do this?
Do I have to loop through my data set, mapping all negative numbers x to -log(|x|)? And what to do with the y-axis afterwards?
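One common variant of the mapping asked about here is a signed log transform, sign(x) * log10(1 + |x|). It is swapped in for plain -log(|x|) because that expression changes sign for |x| < 1 and is undefined at 0. The Python sketch below is only an illustration; the function names and sample values are assumptions and not part of the question.

```python
import math

def signed_log10(x):
    """Map x to sign(x) * log10(1 + |x|): zero stays zero and the order is preserved."""
    return math.copysign(math.log10(1.0 + abs(x)), x)

def inverse_signed_log10(t):
    """Undo signed_log10, e.g. to compute the label for a tick position."""
    return math.copysign(10.0 ** abs(t) - 1.0, t)

# Hypothetical data spanning several orders of magnitude on both sides of zero.
values = [-1e6, -100.0, -1.0, 0.0, 1.0, 100.0, 1e6]
transformed = [signed_log10(v) for v in values]

# For the axis: place ticks at the transformed positions, label them with the original values.
tick_values = [-1e6, -1e3, 0.0, 1e3, 1e6]
tick_positions = [signed_log10(v) for v in tick_values]
for pos, val in zip(tick_positions, tick_values):
    print(f"tick at {pos:+.3f} labelled {val:g}")
```

The transformed values are plotted on an ordinary linear axis, and the custom tick labels answer the "what to do with the y-axis" part: each tick sits at the transformed position but shows the untransformed value.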
I am wondering if there is a way to account for outliers in a histogram plot. I want to plot the frequencies of a random variable whose values are very small and distributed around zero. However, in most of the cases I am considering there is also an outlier that complicates things. Is there a way to adjust the scale of the x axis in R/Matlab so that I can capture the distribution of the random variable and also show the outlier? The usual ways of producing the plot result in a scale on which all values appear to be zero, and I want to show how they are distributed around zero. Ideally, I would like the scale around zero to resolve the very small numbers and then, after a gap (which does not have to be proportional to the actual distance from zero), a bin indicating the value of the outlier. I do not want to remove the outlier from the sample.
Is such a thing possible in R/Matlab? Any other suggestions would be welcome.
Edit: The problem is not in identifying the outliers and using a different color for them. The problem is in adjusting the scales on the x-axis so I can observe the distribution of the variable as well as have the outlier included in the plot.
The following code will do the job, but you need to change the XTickLabel property of the axes so that the labels show the real value of the outliers.
A=rand(1000,1)*0.1;
A(1:10)=10;
% modify the data for plotting purposes: move the outliers closer
expected_maximum_value=1; % You could compute this using 3*sigma, maybe?
distance_to_outliers=0.5;
outlier_mean=mean(A(A>expected_maximum_value));
A(A>expected_maximum_value)=A(A>expected_maximum_value)-outlier_mean+distance_to_outliers;
% plot
h=histogram(A,'BinWidth',0.01)
%% trick the X axis
ax=gca;
ax.XTickLabel{end-1}=[ax.XTickLabel{end-1} '//'];
ax.XTickLabel{end}=['//' num2str(outlier_mean)];
I am looking to present a variable as a bar plot with the caveat that the groups I am trying to plot (the size of an object) vary over several orders of magnitude. The other complication of the data is that the variable y also varies over several orders of magnitude when positive as well as having negative values. I usually think in pictures so I have sketched something along the lines that I am looking for below (the colour would simply be a function of the distance from zero, i.e. white zero, dark blue very negative, dark red very positive etc):
Here is a real case of the data if required:
x <- c(1.100e-08, 1.200e-08, 1.300e-08, 1.400e-08, 1.600e-08, 1.700e-08, 1.900e-08, 2.100e-08, 2.300e-08, 2.600e-08, 3.100e-08, 3.500e-08, 4.200e-08, 4.700e-08, 5.200e-08, 5.800e-08, 6.400e-08, 7.100e-08, 7.900e-08, 8.800e-08, 9.800e-08, 1.100e-07, 1.230e-07, 1.380e-07, 1.550e-07, 1.760e-07, 3.250e-07, 3.750e-07, 4.250e-07, 4.750e-07, 5.400e-07, 6.150e-07, 6.750e-07, 7.500e-07, 9.000e-07, 1.150e-06, 1.450e-06, 1.800e-06, 2.250e-06, 2.750e-06, 3.250e-06, 3.750e-06, 4.500e-06, 5.750e-06, 7.000e-06, 8.000e-06, 9.250e-06, 1.125e-05, 1.375e-05, 1.625e-05, 1.875e-05, 2.250e-05, 2.750e-05, 3.100e-05)
y <-c(1.592140e+01, -1.493541e+01, -6.255603e+00, -2.191637e+00, -1.274086e+00, -1.343391e+00, -8.869018e-01, -7.717447e-01, -6.140710e-01, -5.637220e-01, -5.404424e-01, -3.473077e-01, -2.279666e-01, -1.945254e-01, -2.485636e-01, -2.363181e-01, -2.197054e-01, -2.119314e-01, -1.897220e-01, -1.656779e-01, -1.478176e-01, -1.364191e-01, -1.297830e-01, -1.408082e-01, -1.514742e-01, -1.311300e-01, -1.358422e-01, -2.718636e+00, -2.231532e+00, -3.479395e+00, -3.572720e+00, -2.297957e+00, -3.265428e+00, -5.449620e+00, -7.741435e+00, -1.172256e+01, 9.368365e+00, 1.078983e+02, 9.542029e+01, 1.484089e+02, 2.293383e+02, 3.678836e+02, 7.965286e+02, 1.349151e+03, 1.577808e+04, 4.554271e+05, 1.821730e+06, 8.092310e+04, 1.015619e+06, 2.113788e+06, 5.208331e+06, 4.534863e+06, 8.086026e+06, 1.577413e+07)
I could also plot this as a scatterplot with a broken axis, but I am still looking for a good way to display such data. What matters to me is highlighting the approximate value of x at which y changes sign, as well as the variability and magnitude of both the positive and negative values. Any tips or advice on plotting such data would be great.
Edit based upon comments
I realise that on my graph x and y are the wrong way around, apologies for that. Parameter x should indeed be on the x-axis and parameter y on the y-axis.
Taking your suggestions on board, it would be better to plot this data as a scatterplot. Even so, I still need to break the axis at a relevant value of y (not x, as shown in the figure), with a log scale above this value and a linear scale below it. Somewhere below the smallest "positive" value of y seems a sensible place for this break. Can this be done in base R?
I guess something like this, but with the split on the y-axis rather than the x-axis, and in R of course.