How can you visualize data frames in a good way? - r

Given a data frame with a lot of columns and rows, how can you visualize it in a good way?
I imported my data from Excel, where I can browse it clearly. However, when I import it as a data frame into R, things get more complicated, and I quickly get lost in the output in my terminal window. Can I output my data frame in a more accessible form for viewing, something closer to how it looks in Excel?

RStudio does a pretty good job with its built-in (read-only) data viewer. Other solutions have been suggested on Cross Validated: "Is there a good browser/viewer to see an R dataset (.rda file)?"
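As a minimal sketch of how a large data frame can be inspected without flooding the terminal, the base R functions below summarise it, and View() opens the spreadsheet-style viewer mentioned above. The built-in mtcars data set stands in for your own data; that substitution is an assumption made purely for illustration.

```r
# Built-in example data; substitute your own data frame.
df <- mtcars

dim(df)        # number of rows and columns
str(df)        # one line per column: type and first few values
head(df, 10)   # first 10 rows only
summary(df)    # per-column summary statistics

# Spreadsheet-style, read-only viewer; only useful in an interactive session.
if (interactive()) {
  View(df)
}
```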

You can use edit(my.data.frame).
It will open your data.frame in the default editor specified by getOption("editor"). You can use options(editor = ".......") to change that default, or just use the editor= argument when calling edit. This is only relevant for Unix users.
Finally, I'll bring your attention to this important portion of the ?edit documentation:
It is important to realize that edit does not change the object called name. Instead, a copy of name is made and it is that copy which is changed. Should you want the changes to apply to the object name you must assign the result of edit to name. (Try fix if you want to make permanent changes to an object.)
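The point in the documentation excerpt can be sketched as follows. The small data frame is a made-up example, and the calls are wrapped in interactive() because edit() needs an interactive session.

```r
# Hypothetical example data.
my.data.frame <- data.frame(id = 1:3, score = c(10, 20, 30))

if (interactive()) {
  # The edited copy is returned; my.data.frame itself is left unchanged.
  edited <- edit(my.data.frame)

  # Assign the result back to keep the changes under the original name.
  my.data.frame <- edit(my.data.frame)
}
```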

Related

R Shiny - Putting a Reactive Function into a Data Frame

In my app, I am bringing in data and getting it to kick out to datatables for display. However, I then need that data to be available for use with tidygraph which doesn't seem to play well with reactive data. As such, I'd like to make the "reactive" static. Is there a way?
For the record, I 100% am not looking to render this as a data table for display. I already have all that functionality. Whenever I search for a response to this issue, everyone's advice seems to end with renderDT but not put the OP's data back into a workable dataframe.
In short, how do I see reactive data in a frame?
I've tried about every reactive command option there is. Nothing seems to be working.
I figured it out! I simply had to place the problematic action into the reactive, which seems logical but I didn't think it would allow me to do what I wanted. Nevertheless, I have achieved the full functionality that I wanted for this stage.
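The poster did not share any code, so the following is only a guess at what "placing the action into the reactive" could look like. The names edges_data and graph and the example edge list are hypothetical; as_tbl_graph() is the tidygraph function that builds a graph from an edge-list data frame.

```r
library(shiny)
library(tidygraph)

server <- function(input, output, session) {
  # Hypothetical reactive standing in for the imported data (an edge list).
  edges_data <- reactive({
    data.frame(from = c("a", "b", "c"), to = c("b", "c", "a"))
  })

  # Calling edges_data() inside another reactive yields an ordinary data
  # frame, so the tidygraph step runs here and re-runs when the data changes.
  graph <- reactive({
    as_tbl_graph(edges_data())
  })

  output$summary <- renderPrint({
    graph()
  })
}
```

Inside any reactive context, edges_data() returns a plain data frame, so no separate "static" copy is needed.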
Thanks for the guidance to those who responded.

How can I extract all the data from the Cancer Types Summary Page on CBioPortal at once?

I would like to extract, all at once, the data that is normally shown when hovering over each column. The graph is interactive, so it's hard to extract everything at once.
I suggest that you pick a programming language that you know fairly well.
Then load the web pages, use a selector to select the desired elements, and output the data in the format you like.
Please begin writing the code, and update your question when you have something working at least partially, so you can ask precisely where you need help.

Export data to Excel Part By Part

I have a huge dataset in a JRDataSource object and cannot export it to Excel because it fails with an out-of-memory error. So I am planning to split the JRDataSource object and export the data part by part. Any ideas or suggestions on how to implement this? Any other approach is also fine with me. Thanks in advance.
I don't know much about JRDataSource, but I'll offer another solution.
Take a look at the Apache POI library, which enables you to create Excel files on the fly.
That way you can read from the data source element by element and write each one to an Excel file.

R workflow: How to handle hand-cleaning data

Let me first say that I assiduously avoid hand-cleaning data in favor of regular expressions and the like. However, occasionally it is inevitable.
I use something like the Load-Clean-Func-Do workflow normally, so this obviously fits into the cleaning phase. However, any hand-editing breaks the ability to run the stuff before the hand-cleaning if it needs updating.
I can think of at least three ways to handle this:
1. Put the by-hand changes as early in the workflow as possible, so that everything after that remains runnable.
2. Write out regexes or assignment operations for every single change.
3. Use a tool that generates (2) for you after you close the spreadsheet where you've made the changes.
The problem with 2 is that it can be extremely unwieldy. The problem with 3 is that I'm unaware of any such tool existing for R. Stata has an extremely good implementation of this.
So the questions are:
Which results in the most replicable code with the least-frustrating code writing?
Does a tool as in (3) exist?
I agree that hand-cleaning is generally a rather bad idea. However, sometimes it is unavoidable. I'd suggest one of the two, or both:
Keep a separate "data fixing" file containing three variables: "case_id", "variable_name", and "value". Use it to store information about which values in the original data need to be replaced. You may add further variables to record extra information about the cleaning (e.g. why the value of variable "variable_name" needs to be replaced with "value" for case "case_id"). Then have a short piece of R code that loads your original data and cleans it using the information in the "fixing" file.
Perhaps you should start using a version control system like git or subversion (there are other programs too). Every hand-made change to the data could be recorded in the system as a separate commit. By the end of the day, you will be able to easily check the log for what changes you made to the data and when. Moreover, you will be able to generate patch files that transform the original data files into the cleaned ones. It is also beneficial to have your R code files version-controlled.
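The "fixing file" suggestion could be sketched roughly like this. The data values, the extra reason column, and the helper name apply_fixes are all hypothetical; in practice the fixes table would be read from a file (for example with read.csv) rather than typed inline.

```r
# Original data as loaded from the raw file (made-up example values).
raw <- data.frame(
  case_id = c(1, 2, 3),
  age     = c(34, 290, 41),                 # 290 is a typo for 29
  city    = c("Oslo", "Bergn", "Tromso"),   # "Bergn" is misspelled
  stringsAsFactors = FALSE
)

# The "fixing" table: one row per value to replace, plus an optional reason.
fixes <- data.frame(
  case_id       = c(2, 2),
  variable_name = c("age", "city"),
  value         = c("29", "Bergen"),
  reason        = c("data entry typo", "misspelled city"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply each fix to a copy of the data.
apply_fixes <- function(data, fixes) {
  for (i in seq_len(nrow(fixes))) {
    row <- which(data$case_id == fixes$case_id[i])
    col <- fixes$variable_name[i]
    stopifnot(length(row) == 1, col %in% names(data))

    # Fix values are stored as text; convert when the target column is numeric.
    new_value <- fixes$value[i]
    if (is.numeric(data[[col]])) {
      new_value <- as.numeric(new_value)
    }
    data[row, col] <- new_value
  }
  data
}

clean <- apply_fixes(raw, fixes)
```

Because the raw data is never edited, the whole pipeline stays re-runnable, and the fixes table doubles as a log of every manual change.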

Export Grouped AdvancedDataGrid as CSV text

I'm trying to export an AdvancedDataGrid to CSV. This is easy enough for non-hierarchical data, but when using a HierarchicalCollectionView to show treed data it gets trickier.
Any suggestions on how to access each of the cells just as they appear on screen when all of the nodes are expanded?
If you've expanded all the nodes like you mentioned (you can use the AdvancedDataGrid's expandAll() function for this), you can then run the AdvancedDataGrid through the following CSV export utility class to access each of the cells as they appear on the screen:
https://onyxmueller.net/2011/08/20/advanceddatagrid-csv-export-utility-class/
However, I've found when dealing with a HierarchicalCollectionView as the data provider, that it is better to write some custom logic to "flatten" the data for CSV export.
Hierarchical data doesn't map well to CSV, which is essentially flat. You are effectively trying to write nested objects into a spreadsheet.
Accessing the data isn't that hard; you can just recursively work through getChildren() in the collection.
The hard bit is writing it into the CSV file in a way that can be retrieved later. The only really good way of doing this is to ignore the fact that you are writing to CSV. As soon as you get to the children field of the root object you are going to end up writing some horrible array parsing mechanism.
My solution? Write it out to JSON, and stick it in a single cell of the CSV. You'll save yourself a ridiculous amount of pain in the long run.
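The recursive "flatten" idea from the answers above can be illustrated as below. This is written in R rather than ActionScript, so it shows the general technique only and not the AdvancedDataGrid API; the tree structure and the function name flatten_tree are hypothetical.

```r
# Hypothetical tree: each node is a list with a name, a value, and children.
tree <- list(
  name = "root", value = 1,
  children = list(
    list(name = "a", value = 2, children = list(
      list(name = "a1", value = 3, children = list())
    )),
    list(name = "b", value = 4, children = list())
  )
)

# Walk the tree depth-first and emit one flat row per node, recording the
# path from the root so the hierarchy can be reconstructed later.
flatten_tree <- function(node, parent_path = "") {
  path <- if (nzchar(parent_path)) {
    paste(parent_path, node$name, sep = "/")
  } else {
    node$name
  }
  this_row <- data.frame(path = path, value = node$value,
                         stringsAsFactors = FALSE)
  child_rows <- lapply(node$children, flatten_tree, parent_path = path)
  do.call(rbind, c(list(this_row), child_rows))
}

flat <- flatten_tree(tree)

# Write the flat rows as CSV (to the console here).
write.csv(flat, stdout(), row.names = FALSE)
```

Storing the full path in one column keeps the CSV flat while preserving enough information to rebuild the nesting.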
