Good day.
I have been using R and RStudio for three months and am getting the hang of things. I am implementing a SOM solution with 38k records/observations using Kohonen SuperSOM, following Self-Organising Maps for Customer Segmentation using R.
My data have no missing values but almost 60 columns, many of which are dummyVars (I received the data in this format).
I have removed the ONE char Column (URL)
My Y column (as I understand it) is "shares" (How many times it was shared)
My data only consist of numerical data (dummyVars are of course 1 or 0)
I have Centered and Scaled my data (entire dataFrame)
As per the example I followed, I did convert the entire DF to a matrix
My problem is that my SOM takes ages to train, even with multi-core processing, and my progress graph does not reach a nice flat-ish plateau. It does come down nicely but is still very erratic. All my other graphs show extremely high populations per node and there is no nice clustering. I have even tried 500 iterations with a 100x100 grid ;-(
I think/guess it is because of the huge number of columns, most of them dummyVars, e.g. dayOfWeek.Monday, dayOfWeek.Tuesday, category.LifeStile, category.Computers, etc.
What am I to do?
Should I convert the dummyVars back into another format, How and Why?
Please do not just give me a section of code, as I would like to understand why I need to do what.
Thanks
In Enterprise Guide, I draw scatter plots of the creation and closing dates of issues to detect when backlogs occur and when they are resolved:
(The straight lines in the graph are batch interventions, like closing a set of issues that were handled outside of the system.)
proc sgplot data=alert;
scatter x=create_Date y=CloseDate / group=CloseReason;
run;
When I try to do the same in SAS Visual Analytics, I can only put measures on the x-axis and y-axis, and I can't make the date or datetime variable a measure.
Am I doing something wrong? Should I use another graph type?
My take is that the inability of SAS VA Explorer to allow dates to be measures is a real weakness. Old-school trickery would perhaps be to create a duplicate data item that computes the SAS date value (giving you a numeric result and thus a measure) and then to format that with a custom format to render it back as a human-readable date.
However, according to http://support.sas.com/kb/47/100.html#explorer
How SAS Visual Analytics Designer supports formats
In SAS Visual Analytics Designer, the Format property of the data item displays the name of the format for both numeric and character data items. However, there are some differences between numeric and character data items.
Numeric data items
You can change the format. If you change the format, you can restore the user-defined format by selecting Reset to Default in the Format type box.
You can specify to sort by formatted or unformatted values (release 6.2 and later).
(My bolds) Numeric data items with a user-defined format are classified as categories. You cannot change these data items to measures while the user-defined format is applied.
According to support.sas.com/documentation/cdl/en/vaug/68648/PDF/default/vaug.pdf , page 166, you could work on defining data roles for a scatter plot.
I am not sure that this could solve your situation but it says that:
"In addition to measures, you can assign a Group variable. The Group variable groups the data based on the values of the category data item that you assign. A separate set of scatter points is created for each value of the group variable.
You can add data items to the Data tips role. The values for the data items in the Data tips role are displayed in the data tips for the scatter plot".
Hope it helps.
edit: to clarify, the function that makes my choropleth graph requires "region" (state) and "value" columns, and simply color-codes the region based on where value falls on the scale.
I want to make the "value" column dynamic and have the graph reference the dynamic column
I am not sure if I can read from an output object
I have a slider range from 2000-2014, and columns x2000-x2014.
I want the slider to change the data being graphed, so if I choose 2002-2010 it shows that data, etc.
It's a choropleth graph showing % change between the two years, so if I choose 2004 and 2007 on the slider I want it to pull (x2007-x2004)/x2004. I can build the column name (low <- paste0("X", input$range[1])), but I can't then use df$low to get that column.
If I read the question correctly, you are able to create a character string holding the name of the column you want to access (low holds the name of a column in data frame df), but your attempts to access that column using df$low are not working. Is that correct?
If so, then there is actually a fortune about this:
> library(fortunes)
> fortune('toad')
The problem here is that the $ notation is a magical shortcut and
like any other magic if used incorrectly is likely to do the
programmatic equivalent of turning yourself into a toad.
-- Greg Snow (in response to a user that wanted to access a
column whose name is stored in y via x$y rather than x[[y]])
R-help (February 2012)
The answer to your question is in the bottom part of that quote and is detailed on the help page help('$') and in section 6.1 of An Introduction to R.
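A minimal sketch of the x[[y]] approach applied to the slider case. The data frame contents and the `range` vector standing in for `input$range` are made up for illustration; the "X" column prefix follows the paste0() call in the question.

```r
# Hypothetical data: one row per region, one column per year.
df <- data.frame(
  region = c("alabama", "alaska"),
  X2004  = c(100, 200),
  X2007  = c(110, 150)
)

# Stand-in for input$range from the Shiny slider (assumed values).
range <- c(2004, 2007)

# Build the column names as character strings.
low  <- paste0("X", range[1])
high <- paste0("X", range[2])

# df$low looks for a column literally named "low" and finds nothing.
# df[[low]] evaluates `low` first and fetches the column it names.
df$value <- (df[[high]] - df[[low]]) / df[[low]]

print(df[, c("region", "value")])
```

Inside a Shiny app the same expression would sit in a reactive or render function, with `input$range` in place of `range`, so the "value" column is recomputed whenever the slider moves.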
I'm trying to develop a ChartCustomizer that takes the data from a chart and converts it into a histogram (because JR does not directly support histograms). It's a fairly simple implementation with hard-coded intervals, etc. mostly as a proof-of-concept at this point.
The data I'm analyzing is HTTP response-time data of the form [date, response-time] and I have a CSV file with 18512 records in it. In my summary band, I have 3 items:
A text field dumping $V{REPORT_COUNT} (it reports 18512 in iReport's report preview)
A time series showing all the data points [date, response-time]
A category plot containing all the data points in a single series [category=$F{DATE}, value=$F{RESPONSE_TIME}]
I decided that the most straightforward way to build a histogram would be to use the Category plot because it had the right structure for the final histogram chart.
When the ChartCustomizer runs, it dumps out all kinds of good information about the data set, including the size. Strangely, the size is 10252: it's missing something like 8000 data points. I can't understand why the category plot would have fewer data points than the whole data set.
Any ideas?
Answering my own question in case others run across this foolish user error.
The problem was that CategoryDataset only allows one data point per "category", and in my case, "category" was a java.util.Date captured from the web server log. Apparently, nearly half of my dates were duplicates and so part of the data set overwrote the other half, leaving a subset of the data.
That should have been totally obvious to me at the outset, because that is exactly how a category dataset works.
Anyhow, simply changing the category plot series's "category expression" from $F{DATE} to $V{REPORT_COUNT} gave each datum a unique category which makes everything work.
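The overwrite effect is easy to reproduce outside the report. The following is only an illustration in R with made-up rows, not JasperReports or JFreeChart code: a store keyed by category keeps one value per key, so rows that share a timestamp collapse into one.

```r
# Hypothetical log rows: the first two requests share a timestamp.
hits <- data.frame(
  date = c("2013-01-01 10:00:00", "2013-01-01 10:00:00", "2013-01-01 10:00:01"),
  response_time = c(120, 95, 300),
  stringsAsFactors = FALSE
)

# One value per key: a later row with the same date replaces the earlier one.
by_category <- list()
for (i in seq_len(nrow(hits))) {
  by_category[[hits$date[i]]] <- hits$response_time[i]
}

print(length(by_category))          # categories that survive
print(sum(duplicated(hits$date)))   # rows lost to overwriting
```

Counting duplicated keys in the raw data this way gives a quick check on how many points a category dataset would drop.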
I have csv file with following data set:
gv,ca,level1,2
gv,bg,level1,1
zea,li,level1,1
zea,li,level3,1
zea,de,level1,26
zea,de,level3,5
zea,el,level1,1
zea,eo,level1,3
zea,en,level1,5
zea,en,level2,34
zea,en,level3,38
zea,en,level4,12
zea,es,level1,7
zea,la,level1,7
zea,zea,level1,5
zea,zea,level3,4
zea,stq,level1,1
zea,sk,level2,1
zea,nl,level4,4
zea,fr,level2,9
zea,fy,level2,1
cdo,cdo,level3,1
cdo,de,level1,23
cdo,de,level2,4
cdo,de,level3,4
cdo,eo,level1,1
cdo,eo,level2,1
cdo,eo,level3,3
cdo,en,level1,6
cdo,en,level2,31
cdo,en,level3,38
cdo,en,level4,17
cdo,es,level1,8
cdo,es,level2,6
cdo,es,level3,3
cdo,fr,level1,14
I want to build a histogram, but somehow the second column needs to be incorporated into it. The way to read the data is, for example: in gv we have two users with ca experience at level1; similarly, in gv we have 1 user with bg experience at level1.
I know how to build histograms in R, but I am trying to wrap my head around this and figure out how to get it into a graphical representation.
Like @Ben said, it is a little difficult to see what you're getting at here. You may need to reformat your data so that you have only one type of data (class) per table.
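One possible reading of that advice, sketched in base R: keep a single value of the first column and cross-tabulate the rest. The rows are copied from the question; the column names (site, language, level, count) are assumptions, since the file has no header.

```r
# A few rows copied from the question; column names are assumed.
csv_text <- "gv,ca,level1,2
gv,bg,level1,1
zea,de,level1,26
zea,de,level3,5
zea,en,level1,5
zea,en,level2,34
zea,en,level3,38
zea,en,level4,12"

users <- read.csv(text = csv_text, header = FALSE,
                  col.names = c("site", "language", "level", "count"),
                  stringsAsFactors = FALSE)

# Keep one site so the table holds a single kind of data.
zea <- subset(users, site == "zea")

# Rows are levels, columns are languages, cells are user counts.
tab <- xtabs(count ~ level + language, data = zea)
print(tab)
```

Passing the result to barplot(tab, beside = TRUE, legend.text = TRUE) would then draw one group of bars per language with one bar per level, which is a grouped bar chart rather than a true histogram because the counts are already aggregated.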