Plot two large raster datasets in a scatter plot in R

I have a problem with plotting two raster datasets in R.
I use two different IRS LISS III scenes (with the same extent), and what I want is to plot the pixel values of both scenes in one scatter plot (x = Layer 1 and y = Layer 2).
My problem is the handling of this large amount of data. Each scene has about 80,000,000 pixels; through reclassification and other processing I was able to scale the values down to about 12,000,000 in each raster. But when I try to import these values, e.g. into a data.frame, or load them from an ASCII file, I always run into memory problems.
Is it possible to plot such an amount of data? If yes, it would be great if someone could help me; I have been trying for two days now and at this point I am desperate.
Many thanks,
Stefan

Use the raster package; there's a good chance it will work out of the box, since it has good "out-of-memory" handling. If it doesn't work with the ASCII grids, convert them to something more efficient (like an LZW-compressed and tiled GeoTIFF) with GDAL. And if they are still too big, resize them; that's all the graphics rendering process will do anyway. (You don't say how you resized originally, or give any details on how you are trying to read them.)
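For illustration, a minimal sketch of that approach; the file names are placeholders, and sampleRegular() reads a regular subsample of the cells without pulling the full rasters into memory:

library(raster)

r1 <- raster("scene1.asc")
r2 <- raster("scene2.asc")

# Read a regular subsample of both layers; 100,000 points is more than
# enough to see the joint distribution in a scatter plot.
s <- sampleRegular(stack(r1, r2), size = 100000)

plot(s[, 1], s[, 2], pch = ".", xlab = "Layer 1", ylab = "Layer 2")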

Related

Cannot use plot() function in RStudio for large objects

I am trying to use the default plot() function in R to plot a shapefile that is about 100 MB, using RStudio. When I try to plot the shapefile, the command doesn't finish executing for around 5 minutes, and when it finally does, the plotting window remains blank. When I execute exactly the same process in VS Code, the plot appears almost instantly, as expected.
I have tried uninstalling and reinstalling RStudio with no success.
I can't speak for what VS Code does, but I can guarantee that plotting 100 MB worth of data points is useless (unless the final plot is going to be maybe 6 by 10 meters in size).
First thing: can you load the source file into R at all? One would hope so, since that's not a grossly huge data blob. Then use your choice of reduction algorithms to get a reasonable number of points to plot, e.g. 800 by 1600, which is all a monitor can display anyway.
Next, try plotting a small subset to verify the data are in a valid form, etc.
Then consider reducing the data by collapsing, say, each 10x10 region to a single average value, or by using ggplot2::geom_hex, as sketched below.
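A minimal sketch of the hex-binning idea, using simulated points in place of the shapefile coordinates (geom_hex() needs the hexbin package installed):

library(ggplot2)

pts <- data.frame(x = rnorm(1e6), y = rnorm(1e6))

# geom_hex() collapses the million points into a few thousand hexagonal
# bins, so rendering stays fast regardless of the input size.
ggplot(pts, aes(x, y)) +
  geom_hex(bins = 100)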

Working with Dask, Xarray, Holoviews, Bokeh Datasets

I currently have a very large xarray dataset with 7 data variables, each with a shape of about 60-80k (x, y) values, as well as roughly 4 years of time data. I'm working on creating a UI that will display 3 maps (1 main map and 2 maps of a selected AOI shown below it), as well as a scatter plot, a time-series curve, and a scatter matrix. I want this all to be fully interactive, meaning all plots update simultaneously, with dropdowns for selecting data variables, time sliders, etc. The issue I'm currently having is with memory and render time.
I'm wondering if anyone has done anything similar to this, and what the best way would be to go about it while staying efficient with memory.
xrviz does interactive 2d visualisation and extraction for xarray data (based on panel/holoviz). Please try it out and help us to improve it - it is still a work in progress.

How to make plots from distributed data from R

I'm working with Spark using the R API, and have a grasp of how data is processed by Spark, either when only Spark-native functions are used, in which case it is transparent for the user, or when spark_apply() is used, where it is necessary to have a better understanding of how the partitions are handled.
My question concerns plots where no aggregation is done. For example, it is my understanding that if a group-by is used before a plot, not all the data will be used. But if I need to make, say, a scatter plot with 100 million dots, where is that data stored at that point? Is it still distributed between all nodes, or is it on one node only? If the latter, will the cluster freeze because of this?
I know you write that no aggregation is (or should be?) done, but I'd wager that is precisely what you need and want to do. The point of distributed computing is largely that partial results are computed, well, distributed at each node. For very big data sets, each node (often) sees only a subset of the data.
In regard to the plotting: a scatter plot of more than even a few thousand points (not to mention 100 million) will contain a significant amount of overplotting. Either you 'fix' that by making the points transparent, you do a density estimate, or you do some binning of the data (e.g. a hexbin plot or a heatmap). The binning can be done distributed by the nodes; the binned results returned from each node can then be aggregated into a final result by the master node and plotted. A sketch of this follows below.
Even if you somehow had a node making a scatter plot of 100 million points, what is your output format? Vector graphics (e.g. pdf/svg) would create a huge file. Raster graphics (e.g. jpg, png) will effectively aggregate on your behalf when the plot is rasterized -- so you might as well control that yourself with bins the size of pixels.
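A minimal sketch of distributed binning with sparklyr; the connection settings, table name (pts), and column names (x, y) are hypothetical:

library(sparklyr)
library(dplyr)
library(ggplot2)

sc  <- spark_connect(master = "yarn")
pts <- tbl(sc, "pts")   # 100M rows with columns x and y

binwidth <- 0.01

# The grouping and counting run distributed on the cluster; only the
# aggregated bin counts (a few thousand rows) are collected into R.
bins <- pts %>%
  mutate(xb = floor(x / binwidth) * binwidth,
         yb = floor(y / binwidth) * binwidth) %>%
  count(xb, yb) %>%
  collect()

ggplot(bins, aes(xb, yb, fill = n)) + geom_tile()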

Need a fast dataset 2D-viewer/plotter for large datasets

I'm searching a data viewer/plotter for some data I've generated.
Facts
First some facts about the data I've generated:
There are several datasets with about 3 million data points each.
Each dataset is currently stored in ASCII format.
Every line represents a point and consists of multiple columns.
The first two columns determine the position of the point (i.e. its x and y values), where the first column is a timestamp and the second is a normalized float between 0 and 1.
The other columns contain additional data which may be used to colorize the plot or filter the data.
An example data point:
2012-08-08T01:02:03.040 0.0165719281 foobar SUCCESS XX:1
Current Approach
Currently I am generating, for each data set, multiple PNG files (with gnuplot) with different selection criteria, like the following ones (an equivalent R sketch follows the list):
Display all points in grey.
Display all points in grey, but SUCCESS in red.
Display all points in grey, but SUCCESS in red and XX:-1 in green; if both SUCCESS and XX:-1 match, use blue as the coloring.
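For concreteness, a hypothetical R equivalent of this coloring scheme, assuming whitespace-separated columns as in the example line above (the file and column names are made up):

library(data.table)
library(ggplot2)

d <- fread("dataset.txt",
           col.names = c("time", "value", "label", "status", "flag"))
d[, time := as.POSIXct(time, format = "%Y-%m-%dT%H:%M:%OS")]

# Grey by default, red for SUCCESS, green for XX:-1, blue when both match.
d[, col := "grey"]
d[status == "SUCCESS", col := "red"]
d[flag == "XX:-1", col := "green"]
d[status == "SUCCESS" & flag == "XX:-1", col := "blue"]

ggplot(d, aes(time, value, colour = col)) +
  geom_point(size = 0.1) +
  scale_colour_identity()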
Drawbacks
With the current approach there are some drawbacks I'd like to have addressed:
I can't easily switch filters or colorings on and off, because I have to generate a new PNG file every time.
I have to use a limited resolution in my image file, because the higher the resolution, the slower the viewer becomes. So I can only zoom in to a limited level of detail.
I don't have the raw data available in the PNG viewer for each point. Ideally I'd like to have the data visible on selection of a point.
Already tested
I've already tested some other approaches:
Gnuplot itself has a viewer but it can't handle that amount of points efficiently - it is too slow and consumes too much memory.
I've had a quick look at KST, but I couldn't find a way to display 2D data and I don't think it will meet my wishes.
Wishes
I'd like to have a viewer which operates on the raw data, displays the points quickly when zoomed out, also zooms in quickly, and resolves the aforementioned drawbacks.
Question
So finally, does anybody know of such a viewer, or have another suggestion?
If there isn't such a viewer, recommendations for programming it myself are welcome, too.
Thanks in advance
Stefan

Raster map vs alternative

I recently found the web page Crime in Downtown Houston, which I'm interested in reproducing. This is my first learning experience with mapping in R, and I thus lack the vocabulary and understanding necessary to make appropriate decisions.
At the end of the page David Kahle states:
One last point might be helpful. In making these kinds of plots, one might be tempted to use the map raster file itself as a background. This method can be used to make map plots much more quickly than the methods described above. However, the method has one very significant disadvantage which, if not handled properly, can destroy the entire purpose of using the map.
In very plain English, what is the difference between the raster file approach and his approach?
Does the RgoogleMaps package, which calls a Google map into R, have the ability to produce these types of high-quality maps, as seen on the page I referenced above?
I ask not because I lack information but the opposite: there's too much, and I want to make good decisions about the approach to pursue so I'm not wasting my time on outdated or inefficient techniques.
Feel free to pass along any readings you think would benefit me.
Thank you in advance for your direction.
Basically, you had two options at the time this plot was made:
draw the map as a layer using geom_tile, where each pixel of the image is mapped onto the x, y axes (slow but accurate);
add a background image to the plot as a purely "cosmetic" annotation. This method is faster, because you can use grid.raster, which draws images more efficiently, but the image is not constrained by the axes of the plotting region. In other words, you have to manually adjust the x and y axis limits to make sure that the image corresponds to the actual positions on the plot.
Now, I would suggest you look at the new annotation_raster in ggplot2 v. 0.9.0. It should have the advantage of speed and leaner output files while still conforming to the data space of the plot. I believe that this function, as well as geom_raster and annotation_map, did not exist when David made those plots.
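A minimal sketch of the annotation_raster() approach; the image file, coordinate bounds, and point data are all hypothetical stand-ins (ggmap ships a Houston crime dataset you could substitute):

library(ggplot2)
library(png)

img <- readPNG("houston_map.png")   # hypothetical background image

crime_points <- data.frame(lon = c(-95.39, -95.36, -95.34),
                           lat = c(29.74, 29.76, 29.78))

# The image is pinned to data (lon/lat) coordinates, so it stays aligned
# with the points instead of floating behind them.
ggplot(crime_points, aes(lon, lat)) +
  annotation_raster(img,
                    xmin = -95.40, xmax = -95.33,
                    ymin = 29.73,  ymax = 29.79) +
  geom_point(colour = "red", size = 0.5)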
