Octave crashes when printing a plot

Solution: As suggested by user Andy in the comments, updating to the newest version of Octave (at the time, octave-4.0.1-rc4) fixed the problem, and the plot could be saved as a PNG.
I have a large-ish amount of data that I plot in Octave. But when I try to save the image, the program crashes without any explanation or real error message. My Octave is version 4.0, running on Windows 8.1, and the graphics_toolkit is qt.
Saving smaller amounts of data has worked so far, but somehow I seem to have reached a size where the plot can be drawn but not saved.
First, I load the data from several files listed in the vector inputs:
data = [];
for i = 1:length(inputs)
  data = [data; load(inputs{i})];  % append the contents of each input file
endfor
The result is a 955,524 x 7 matrix containing numbers. Loading alone takes a while on my system (several minutes), but eventually succeeds. I then proceed to plot the data:
hold on;
for j = 1:length(data(1,:))
  currentColumn = normalize(data(:,j));                      % make sure all data is in the same range
  plot(1:length(currentColumn), currentColumn, colours{j});  % plot column with a distinct colour
endfor
hold off;
This results in a plot being drawn as Figure 1 that shows all 955,524 entries of each of the seven columns correctly in a distinct colour. If the program ends here, it exits properly. However, if I add
print("data.png");
Octave will keep running after opening the plot window and eventually crash with a simple "program does not work anymore" error message. The same happens if I try to save manually from the File->Save menu (which offers saving as PDF). Even just touching and moving the plot window takes a few seconds.
I tried using gnuplot and fltk as the graphics_toolkit, but the latter does not even open a plot window, and the former seems to be broken (it crashes even when plotting simple data like plot(1:10,1:10);).
Now, I could screenshot the plot and try to work with that, but I'd really rather have it be saved automatically. Also, I find it weird that displaying the curves is possible, but not saving said display. As it works for smaller amounts of data, maybe I just need to somehow allocate more resources to Octave?

Octave (version 4.2.2) crashes on my Linux Mint as well. Just a simple graph, and it crashed two times in a row. I am going back to R. I had my hopes up, as I wanted to review the Numerical Analysis Using Matlab text.
Wait, come to think of it, the Studio version of R (RStudio) crashes when I try to use it, but not when I run the same program from the command line, so I will go back (one more time) and try to run a plot entirely from the command line. Linux Mint requires a 64-bit CPU with 2 cores, and I just have a single-core 64-bit machine.

Related

Some failures using RStudio + sparklyr in Watson Studio for data manipulation on a large data set

I got `Error in curl::curl_fetch_memory(url, handle = handle) : Empty reply from server` for some operations in RStudio (Watson Studio) when I tried to do data manipulation on Spark data frames.
Background:
The data is stored on IBM Cloud Object Storage (COS). It will eventually be several 10 GB files, but currently I'm testing on only the first 10 GB subset.
The intended workflow is: in RStudio (Watson Studio), connect to Spark (free plan) using sparklyr, read the file as a Spark data frame through sparklyr::spark_read_csv(), then apply feature transformations to it (e.g., split one column into two, compute the difference between two columns, remove unwanted columns, filter out unwanted rows, etc.). After the preprocessing, write the cleaned data back to COS through sparklyr::spark_write_csv().
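A rough sketch of that workflow in sparklyr (hedged: the connection call, the COS paths, and the column names below are placeholders, since the actual Watson Studio connection details and schema are not shown in the question):
library(sparklyr)
library(dplyr)

# Placeholder connection; in Watson Studio the config of the provisioned
# Spark service would be used instead of a local master.
sc <- spark_connect(master = "local")

# Read the CSV from object storage into a Spark data frame
# (path and column names are made up for illustration).
raw <- spark_read_csv(sc, name = "raw_data", path = "cos://my-bucket/first_subset.csv")

cleaned <- raw %>%
  mutate(diff_ab = col_a - col_b) %>%   # e.g. compute the difference between two columns
  select(-unwanted_col) %>%             # remove unwanted columns
  filter(diff_ab > 0)                   # filter out unwanted rows

# Write the cleaned data back to COS.
spark_write_csv(cleaned, path = "cos://my-bucket/first_subset_clean.csv")

spark_disconnect(sc)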
To work with Spark I added two Spark services to the project (it seems any Spark service under the account can be used by RStudio; RStudio is not limited to the project?). I may need to use R notebooks for data exploration (to show the plots in a nice way), so I created the project for that purpose. In previous testing I found that R notebooks and RStudio cannot use the same Spark service at the same time, so I created two Spark services: the first for R notebooks (let's call it spark-1) and the second for RStudio (call it spark-2).
As I personally prefer sparklyr (pre-installed in RStudio only) over SparkR (pre-installed in R notebooks only), for almost the whole week I have been developing and testing code in RStudio using spark-2.
I'm not very familiar with Spark, and currently it behaves in a way that I don't really understand. It would be very helpful if anyone could give suggestions on any of these issues:
1) failure to load data (occasionally)
It worked quite stably until yesterday, when I started to encounter issues loading data using exactly the same code. The error doesn't say anything useful; R simply fails to fetch the data (`Error in curl::curl_fetch_memory(url, handle = handle) : Empty reply from server`). What I have observed several times is that, after getting this error, if I run the data-import code again (just one line of code), the data loads successfully.
(screenshot of the error for issue 1)
2) failure to apply a (possibly) large number of transformations (always, regardless of data size)
To check whether the data is transformed correctly, I print out the first several rows of the variables of interest after each transformation step (most of the steps are independent of each other, i.e., their order doesn't matter). I read a little bit about how sparklyr translates operations: basically, sparklyr doesn't really apply the transformations to the data until you preview or print some of the transformed data. After a certain number of transformations, if I run a few more and then print out the first several rows, I get an error (the same useless error as in issue 1). I'm sure the code is right, because if I run those additional steps right after loading the data, I'm able to print and preview the first several rows.
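For what it's worth, a hedged sketch of how that laziness plays out with sparklyr/dplyr, and of materializing intermediate results with compute() so that a long chain of transformations is not replayed from scratch on every preview (table and column names are made up):
# Nothing runs on Spark yet; these lines only build up a SQL plan.
step1 <- raw %>% mutate(ratio = col_a / col_b)
step2 <- step1 %>% filter(!is.na(ratio))

head(step2)     # only now is the accumulated plan actually executed on Spark

# Possible workaround for very long chains: materialize an intermediate result
# as a Spark temporary table, so later steps start from computed data.
step2_tbl <- compute(step2, name = "step2_tbl")

# collect() pulls the (hopefully small) result down into local R memory.
local_df <- collect(step2_tbl)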
3) failure to collect data (always for the first subset)
By collecting data I mean pulling the data frame down to the local machine, here to RStudio in Watson Studio. After applying the same set of transformations, I'm able to collect the cleaned version of a sample data set (originally 1,000 rows x 158 cols, about 1,000 rows x 90 cols after preprocessing), but I fail on the first 10 GB subset file (originally 25,000,000 rows x 158 cols, at most 50,000 rows x 90 cols after preprocessing). The space it takes up should not exceed 200 MB in my opinion, which means it should fit into either Spark's RAM (1210 MB) or RStudio's RAM. But it just fails (again with that useless error).
4) failure to save out data (always, regardless of data size)
The same error happens every time I try to write the data back to COS. I suppose this has something to do with the transformations; maybe something goes wrong when Spark receives too many transformation requests?
5) failure to initialize Spark (some kind of pattern found)
Starting from this afternoon, I cannot initialize spark-2, which I have been using for about a week; I get the same useless error message. However, I'm able to connect to spark-1.
I checked the spark instance information on IBM Cloud:
(screenshots of the spark-2 and spark-1 instance pages)
It's weird that spark-2 shows 67 active tasks, since my previous operations all returned error messages. Also, I'm not sure why "input" in both Spark instances is so large.
Does anyone know what happened and why?
Thank you!

R console output too long, how can I view outputs in `less`?

I am inspecting research data from NIR spectroscopy. Unfortunately, the output is too big (2048 rows with 15 columns).
Very often, when I try to check a variable like mymodel$loadings, my results get truncated.
I understand that I can increase the max output of my terminal, but it's really a hassle to scroll back up with the mouse in the terminal window. Is there a way I can tell R to pipe the output from my last statement to less or more, so I can just scroll using the keyboard?
Are you using a version of RStudio? I would generally look at tables like this in the Data Viewer pane; it makes it a lot easier to see all the data in tables like yours.
Access it by clicking on the data frame name in the top right, or by running the following in the console:
View(dataframe_name)
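If you are working in a plain terminal rather than RStudio, another option (a hedged suggestion, not part of the answer above) is utils::page(), which writes the printed object to a temporary file and opens it in the pager R is configured to use (typically less on Linux/macOS):
# Open the printed output of a large object in the system pager.
page(mymodel$loadings, method = "print")

# The same thing done by hand: capture the print output and page the file.
out <- capture.output(print(mymodel$loadings))
tmp <- tempfile(fileext = ".txt")
writeLines(out, tmp)
file.show(tmp)    # uses the pager given by getOption("pager")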

simple R Time Series function plotting

Thank you kindly for your time.
I'm merely trying to plot a simple time series data set, but I am running into a number of basic issues (one of which I'll ask about here). For example, I have a plain-text file that starts with:
"x"
"1",2.731
"2",2.562
"3",2.632
"4",2.495
"5",1.978
...and so on...
So R reads it just fine, e.g. myfile=read.table("F:/Documents/myfile.txt",sep=""). However, the values seem to change under a conversion using R's ts function, i.e.
myfile = ts(myfile,start=1,end=120,frequency=1)
plot(myfile, type="o",pch=22,lty=1,pty=2,xlab="Month",ylab="Values",main="My File")
So when plotted, the first value starts at 20+ for some reason, as opposed to 2+. Furthermore, R assumes that the y-axis goes from 1 to 120 (mirroring the x-axis), which is not the right scale (it should be roughly 0 through 10). In another data set that I tried (using integers), everything was shifted upward by 1. In any event, I believe the issue is probably about how to properly identify the y-axis values.
Any ideas on how to tackle this? Thanks!
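A hedged guess, based only on the file excerpt shown above: with sep="" the comma-separated lines like "1",2.731 are read as a single character field, which older read.table() defaults turn into a factor, and ts()/plot() then work on the integer factor codes (which would explain both the 1-120 range and the off-by-one shift in the integer data set) rather than the actual values. Reading the file as CSV keeps the numbers numeric:
# Read the comma-separated file; the lone "x" header plus quoted row labels
# make read.csv() treat the first column as row names and "x" as numeric data.
myfile <- read.csv("F:/Documents/myfile.txt")
str(myfile)                                   # check: x should be numeric

x <- ts(myfile$x, start = 1, frequency = 1)   # index 1..120
plot(x, type = "o", pch = 22, lty = 1,
     xlab = "Month", ylab = "Values", main = "My File")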

R using waaay more memory than expected

I have an R script being called from a Java program. The purpose of the script is to automatically generate a bunch of graphs in ggplot and then splat them onto a PDF. It has grown somewhat large, with maybe 30 graphs, each of which is generated from its own script.
The input is a tab-delimited file of 5-20 MB, but the R session sometimes goes up to 12 GB of RAM usage (on Mac OS X 10.6.8, by the way, but this will be run on all platforms).
I have read about how to look at the memory size of objects, and nothing is ever over 25 MB; even if R deep-copied everything for every function and every filter step, it shouldn't get close to this level.
I have also tried gc(), to no avail. If I do gcinfo(TRUE) and then gc(), it tells me that R is using something like 38 MB of RAM. But Activity Monitor shows it going up to 12 GB, and things slow down, presumably due to paging to disk.
I tried calling it via a bash script in which I ran ulimit -v 800000, but no good.
What else can I do?
In the process of making assignments, R will always make temporary copies, sometimes more than one or even two. Each temporary copy requires contiguous memory for the full size of the allocated object. So the usual advice is to plan on having at least three times that amount of contiguous memory available. This means you also need to be concerned about how many other non-R programs are competing for system resources, as well as being aware of how your memory is being used by R. You should try restarting your computer, running only R, and seeing whether you have more success.
An input file of 20 MB might expand quite a bit in memory (8 bytes per double, and perhaps more per character element in your vectors), depending on the structure of the file. The PDF graphics object will also take quite a bit of space if you are plotting each point from a large file.
My experience is not the same as the others who have commented: I do issue gc() before doing memory-intensive operations. You should post code and describe what you mean by "no good". Are you getting errors, or observing the use of virtual memory... or what?
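For concreteness, a small sketch (not from the answers above) of the tools usually used to see where the memory actually goes: object.size() for individual objects, tracemem() to watch R duplicate an object (it needs an R build with memory profiling, which the standard binaries have), and gc() to report the heap R itself holds. The data frame here is made up:
# A made-up data frame of roughly 8 MB of doubles.
df <- data.frame(matrix(rnorm(1e6), ncol = 10))
print(object.size(df), units = "MB")   # size of this one object in R's heap

tracemem(df)                           # print a message whenever df gets copied
df$new_col <- df$X1 - df$X2            # modifying the data frame triggers a copy

gc()                                   # "used" columns are R's own heap; the OS
                                       # can still report far more (e.g. the 12 GB above)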
I apologize for not posting a more comprehensive description with code; it was fairly long, as was the input. But the responses I got here were still quite helpful. Here is how I mostly fixed my problem.
I had a variable number of columns which, with some outliers, got very numerous. But I didn't need the extreme outliers, so I just excluded them and cut off those extra columns. This alone decreased the memory usage greatly. I hadn't looked at the virtual memory usage before, but sometimes it was as high as 200 GB, lol. This brought it down to at most 2 GB.
Each graph was created in its own function, so I rearranged the code such that every graph is first generated, then printed to the PDF, then removed with rm(graphname).
Further, I had many loops in which I was creating new columns in data frames. Instead of doing this, I just created vectors not attached to data frames for these calculations. This actually had the benefit of greatly simplifying some of the code.
After I stopped adding columns to the existing data frames and switched to standalone vectors, memory usage dropped to about 400 MB. While this is still more than I would expect it to use, it is well within my restrictions. My users are all in my company, so I have some control over which computers it gets run on.
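A minimal sketch of the generate / print / remove pattern described above (the per-graph function and the data are stand-ins; the original script was not posted):
library(ggplot2)

# Hypothetical stand-in for one of the ~30 per-graph functions.
make_graph <- function(x, y) {
  ggplot(data.frame(x = x, y = y), aes(x, y)) + geom_line()
}

pdf("report.pdf")              # all graphs go into one PDF
for (i in 1:30) {
  x <- 1:1000
  y <- cumsum(rnorm(1000))     # standalone vectors rather than new data-frame columns
  p <- make_graph(x, y)
  print(p)                     # render this graph as a page of the PDF
  rm(p)                        # drop the plot object before building the next one
  gc()                         # optional: nudge R to release the memory now
}
dev.off()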

Plotting hundreds of hours of data with gnuplot

I am trying to plot data from a simulation that tracks simulation time in (hours):(minutes):(seconds) format, but does not turn (hours) into days - so (hours) can be in the hundreds. When gnuplot plots data by time, however ("set xdata time"), it only plots up to 99 hours in one continuous plot; after that, it loops back around and starts overplotting hour 100+ near the beginning (and even then, does weird stuff). Does anyone know why this happens and/or how to get around it?
I also looked into reading the components of the time column (which is the 3rd field of data on each line, but not at a fixed character position within the line) as three plain integers, then converting them to a single real number that is a decimal version of the time (e.g., 107:45:00 -> 107.75), which would be fine for the plot; but I haven't been able to figure out how to get gnuplot to do that, either.
Any other ideas are welcome. (I would rather not alter the original file, due to the additional complexity of maintaining multiple versions of each file, having to teach others how to convert the file, having to figure out that a plot didn't work because the file wasn't converted, etc.)
Version 2 of MathGL (a GPL plotting library) has time ticks which can be set however you want (using the standard strftime() format). However, it is in beta now; the stable version should appear in October 2011.
