I am inspecting research data from NIR spectroscopy. Unfortunately, the output is too big (2048 rows with 15 columns).
Very often, when I try to check a variable like mymodel$loadings, my results get truncated.
I understand that I can increase the maximum output of my terminal, but it's a real hassle to scroll back up through the terminal window with the mouse. Is there a way to tell R to pipe the output of my last statement to less or more, so I can scroll using the keyboard?
Are you using a version of RStudio? I would generally look at tables like this in the Data Viewer pane; it makes it much easier to see all the data in tables like yours.
Open it by clicking the data frame's name in the Environment pane (top right), or by running the following in the console:
View(dataframe_name)
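Outside RStudio, base R can also route printed output through a pager, which is close to piping it to less. A minimal sketch follows; the matrix is a hypothetical stand-in for mymodel$loadings, and which program actually does the paging depends on getOption("pager") (often less on Unix-like systems, an internal viewer in RStudio and the Windows GUI).

```r
# Hypothetical stand-in for mymodel$loadings: a 2048 x 15 numeric matrix
set.seed(1)
loadings <- matrix(rnorm(2048 * 15), nrow = 2048, ncol = 15)

# Option 1: utils::page() prints the object into a temporary file and shows it
# with file.show(), i.e. with the pager configured in getOption("pager")
if (interactive()) page(loadings, method = "print")

# Option 2: the same idea by hand, which also leaves a text file you can reopen
out <- capture.output(print(loadings))   # printed lines as a character vector
tmp <- tempfile(fileext = ".txt")
writeLines(out, tmp)
if (interactive()) file.show(tmp)        # scroll with the keyboard in the pager
```

Option 2 is handy when you want to keep the printed output around, e.g. to search it later with an external viewer.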
Background
This may be my lack of skill showing, but as I'm working on data manipulation in R, using RStudio, I'm fond of clicking into dataframes in the "Environment" pane of the GUI (for me it's in the top right of the screen) to see how my joins, mutates, etc. are changing the table(s) as I move through my workflow. It acts as a visual sanity check for me; when it comes to tables and dataframes I'm a very visual thinker, and I like to see my results as I code. As an example, I click on this:
And see something like this:
The Problem
Lately, because of a very large dataset (~200 million rows), I've needed to do some of my dplyr work inside sparklyr, using a local instance of Apache Spark for the data manipulation. It's working mostly fine, but I lose my little previews of the data, because Spark dataframe objects look like lists in the Environment pane:
Besides clicking, is there a way I can "preview" my Spark dataframes inside RStudio as I work on them?
What I've tried
So your first thought might be "just use head()" -- and you'd be right! Except that running head(d1, 5) on a local Spark df with 200 million rows takes ... a long time.
Anything I may be missing?
Generally, I believe you need to call collect() on the Spark dataframe to bring data into R. So I would first sample the Spark dataframe, say 0.001% of the rows (about 2,000 rows out of 200 million), with the sparklyr::sdf_sample function, and then collect that sample into a regular data frame to look at.
samp <- analysis_test %>% sdf_sample(fraction = .00001) %>% collect()
I frequently deal with large datasets. Unfortunately, the RStudio IDE's Environment pane shows the row count of a data.frame as a plain number like 156212811; it would be really useful if it used a readable notation (like 156,212,811).
Is there any way to force it to use commas in the row count?
Attached screenshot for reference
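This does not change how the Environment pane itself displays the count, but in the console base R's format() can add a thousands separator via its big.mark argument. A small sketch; nrow_pretty() is a hypothetical helper, not an RStudio or base R function:

```r
# Row count taken from the question; nrow() returns an integer like this one
n <- 156212811L

# format() inserts a thousands separator when big.mark is given
format(n, big.mark = ",")
# [1] "156,212,811"

# Hypothetical convenience helper for checking a data frame's size in the console
nrow_pretty <- function(df) format(nrow(df), big.mark = ",")
nrow_pretty(mtcars)
# [1] "32"
```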
I use R and MATLAB from the command line and edit files with external editors. In MATLAB, the command workspace opens a graphical window with a list of all variables and their current values. If I double-click on a complex object, like a matrix, MATLAB automatically opens another window with a table containing the values.
Is there a similar way to do this in R?
In this link, at the end of the first "box", the command browse.workspace is listed, which seems to be what I am looking for. Unfortunately, I cannot invoke it.
I tried commands that print output in the terminal (like str(as.list(.GlobalEnv))), but I do not like the result: when I have a lot of variables, it is a big mess.
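Base R's utils package has two functions that may come close. browseEnv() lists the objects of an environment as an HTML page in a web browser, and ls.str() prints one compact str()-style line per object, which is usually much tidier than str(as.list(.GlobalEnv)). A sketch with hypothetical example objects; the viewer calls are guarded because they need an interactive session with a GUI (View() outside RStudio uses X11 on Unix-like systems):

```r
# A few example objects (hypothetical) in the workspace
x  <- 1:10
m  <- matrix(1:20, nrow = 4)
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

# One compact summary line per object
ls.str()

# Restrict the listing to names matching a regular expression
ls.str(pattern = "^m$")

if (interactive()) {
  browseEnv()  # HTML listing of workspace objects, opened in a web browser
  View(df)     # spreadsheet-like viewer for a single object
}
```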
Solution: As user Andy suggested in the comments, updating to the newest version of Octave (at the time, octave-4.0.1-rc4) fixed the problem, and the plot could then be saved as PNG.
I have a large-ish amount of data that I plot in Octave. But when I try to save the image, the program crashes without any explanation or real error message. My Octave is version 4.0 running on Windows 8.1, and the graphics_toolkit is qt.
Saving smaller amounts of data has worked so far, but somehow I seem to have reached a size where the plot can be drawn but not saved.
First, I load the data from several files listed in the cell array inputs:
data = [];
for i = 1:length(inputs)
data = [data; load(inputs{i})];
endfor
The result is a 955,524 x 7 matrix of numbers. Loading alone takes several minutes on my system, but eventually succeeds. I then plot the data:
hold on;
for j = 1:size(data, 2)
  currentColumn = normalize(data(:,j)); % make sure all data is in the same range
  plot(1:length(currentColumn), currentColumn, colours{j}); % plot column with distinct colour
endfor
hold off;
This draws Figure 1, which correctly shows all 955,524 entries of each of the seven columns, each in a distinct colour. If the program ends here, it exits properly. However, if I add
print("data.png");
Octave keeps running after opening the plot window and eventually crashes with a simple "program does not work anymore" error message. The same happens if I try to save manually via File->Save (which offers saving as PDF). Even just clicking on and moving the plot window takes a few seconds.
I tried using gnuplot and fltk as graphics_toolkit, but the latter does not even open a plot window, and the former seems to be broken (it crashes when plotting even simple data such as plot(1:10,1:10);).
Now, I could take a screenshot of the plot and work with that, but I'd much rather have it saved automatically. I also find it odd that displaying the curves works but saving that display does not. Since it works for smaller amounts of data, maybe I just need to allocate more resources to Octave somehow?
Version 4.2.2 crashes for me on Linux Mint: just a simple graph, and it crashed twice in a row. I am going back to R. I had my hopes up because I wanted to work through the Numerical Analysis Using MATLAB text.
Wait, come to think of it, RStudio crashes when I try to use it, but R does not when I run the same program from the command line, so I will go back (one more time) and try to run a plot entirely from the command line. Linux Mint apparently requires a 64-bit, 2-CPU machine, and I only have a single 64-bit CPU.
I'd like to know how to bypass the default View() limits. On my computer it only shows up to 100 columns, and I'd like it to show about 400. Is that possible?
Meanwhile, you can use utils::View() to view more columns of the data. It isn't as convenient or pretty as the RStudio viewer, but it does a decent job on tables with more than 100 columns.
Another option I occasionally use is View(df[, 101:200]) and so on, to view different blocks of columns. Sometimes I combine this with a few columns from the beginning so that I can still see the key data, e.g. View(df[, c(1:2, 101:200)]).
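The column-block approach can be automated. A minimal sketch, assuming you want to keep the first `keep` columns (e.g. an ID column) in every window; column_chunks() and view_in_chunks() are hypothetical helpers, and the 100-column block size simply mirrors the viewer limit described in the question:

```r
# Split column indices into blocks of at most `chunk` columns, repeating the
# first `keep` key columns at the front of every block (hypothetical helper)
column_chunks <- function(n_col, chunk = 100, keep = 1) {
  stopifnot(chunk > keep, keep >= 0)
  key  <- seq_len(keep)
  rest <- setdiff(seq_len(n_col), key)
  groups <- split(rest, ceiling(seq_along(rest) / (chunk - keep)))
  unname(lapply(groups, function(idx) c(key, idx)))
}

# Open one viewer per block of columns (hypothetical helper)
view_in_chunks <- function(df, chunk = 100, keep = 1) {
  chunks <- column_chunks(ncol(df), chunk, keep)
  for (i in seq_along(chunks)) {
    View(df[, chunks[[i]], drop = FALSE], title = paste0("columns, block ", i))
  }
  invisible(chunks)
}

# Example: a 5 x 400 data frame
wide <- as.data.frame(matrix(seq_len(5 * 400), nrow = 5))
chunks <- column_chunks(ncol(wide), chunk = 100, keep = 1)
length(chunks)
# [1] 5
if (interactive()) view_in_chunks(wide)
```

With one key column, each block holds the key plus up to 99 other columns, so 400 columns split into four full blocks of 100 and a final block of 4.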