It is hard to explain this without just showing what I have, where I am, and what I need in terms of data structure:
What structure I had:
Where I have got to with my transformation efforts:
What I need to end up with:
Notes:
I've not given actual names for anything as the data is classed as sensitive, but:
Metrics are things that can be measured- for example, the number of permanent or full-time jobs. The number of metrics is larger than presented in the test data (and the example structure above).
Each metric has many years of data (whilst trying to do the code I have restricted myself to just 3 years. The illustration of the structure is based on this test). The number of years captured will change overtime- generally it will increase.
The number of policies will fluctuate, I've just labelled them policy 1, 2 etc for sensitivity reasons and limited the number whilst testing the code. Again, I have limited the number to make it easier to check the outputs.
The source data comes from a workbook of surveys with a tab for each policy. The initial import creates a list of tibbles consisting of a row for each metric, and 4 columns (the metric names, the values for 2024, the values for 2030, and the values for 2035). I converted this to a dataframe, created a vector to be a column header and used cbind() to put this on top to get the "What structure I had" data.
To get to the "Where I have got to with my transformation efforts" version of the table, I removed all the metric columns, created another vector of metrics and used rbind() to put this as the first column.
The idea in my head was to group the data by policy to get a vector for each metric, then transpose this so that the metric became the column, and the grouped data would become the row. Then expand the data to get the metrics repeated for each year. A friend of mine who does coding (but has never used R) has suggested using loops might be a better way forward. Again, I am not sure of the best approach so welcome advice. On Reddit someone suggested using pivot_wider/pivot_longer but this appears to be a summarise tool and I am not trying to summarise the data rather transform its structure.
Any suggestions on approaches or possible tools/functions to use would be gratefully received. I am learning R whilst trying to pull this data together to create a database that can be used for analysis, so, if my approach sounds weird, feel free to suggest alternatives. Thanks
Good day.
I am 3 month old in R and R-Studio but am getting the hang of things. I am implementing a SOM solution with 38k records/observations using Kohonen SuperSOM following Self-Organising Maps for Customer Segmentation using R.
My data have no missing values but almost 60 columns many of them are dummyVars (I received this data in this format)
I have removed the ONE char Column (URL)
My Y column (as I understand it) is "shares" (How many times it was shared)
My data only consist of numerical data (dummyVars are of course 1 or 0)
I have Centered and Scaled my data (entire dataFrame)
As per the example I followed I dod convert the entire DF to a matrix
My problem is that my SOM takes ages to train even with multi core processing and my progress graph does not reach a nice flat"ish" plateau, it does come nicely down but still is very erratic, all my other graphs are extremely high in population and there are no nice clustering. I have even tried a 500 iteration with a 100x100 grid ;-(
I think /guess it is because of the huge amount of columns including mostly dummyVars e.g. dayOfWeek.Monday, dayOfWeek.Tuesday, category.LifeStile, category.Computers, etc.
What am I to do?
Should I convert the dummyVars back into another format, How and Why?
Please do not just give me a section of code as I would like to understand why I need to do What.
Thanx
In Enterprise Guide, I draw scatter plots with creation and closing date of issues to detect when backloggs occur and when they are resolved:
(The straight lines in the graph are batch interventions, like closing a set of issues that were handled outside ot the system.)
proc sgplot data=alert;
scatter x=create_Date y=CloseDate / group=CloseReason;
run;
When I try to do the same in SAS Visual Analytics, I can only put measures on the x-ax and y-ax and I cant make te date or datetime variable a measure.
Do I do something wrong? Should I use another graph type?
My take is that the inability of SAS VA Explorer to allow dates to be measures is a real weakness. Old school trickery would be perhaps to create a duplicate data item that computes the SAS data value (giving you a number result and thus a measure) and then formatting that with a custom format to render it back as a human readable date.
However, according to http://support.sas.com/kb/47/100.html#explorer
How SAS Visual Analytics Designer supports formats
In SAS Visual Analytics Designer, the Format property of the data item displays the name of the format for both numeric and character data items. However, there are some differences between numeric and character data items.
Numeric data items
You can change the format. If you change the format, you can restore the user-defined format by selecting Reset to Default in the Format type box.
You can specify to sort by formatted or unformatted values (release 6.2 and later).
(My bolds) Numeric data items with a user-defined format are classified as categories. You cannot change these data items to measures while the user-defined format is applied.
According to support.sas.com/documentation/cdl/en/vaug/68648/PDF/default/vaug.pdf , page 166, you could work on defining data roles for a scatter plot.
I am not sure that this could solve your situation but it says that:
"In addition to measures, you can assign a Group variable. The Group variable groups the data based on the values of the category data item that you assign. A separate set of scatter points is created for each value of the group variable.
You can add data items to the Data tips role. The values for the data items in the Data tips role are displayed in the data tips for the scatter plot".
Hope it helps.
I have csv file with following data set:
gv,ca,level1,2
gv,bg,level1,1
zea,li,level1,1
zea,li,level3,1
zea,de,level1,26
zea,de,level3,5
zea,el,level1,1
zea,eo,level1,3
zea,en,level1,5
zea,en,level2,34
zea,en,level3,38
zea,en,level4,12
zea,es,level1,7
zea,la,level1,7
zea,zea,level1,5
zea,zea,level3,4
zea,stq,level1,1
zea,sk,level2,1
zea,nl,level4,4
zea,fr,level2,9
zea,fy,level2,1
cdo,cdo,level3,1
cdo,de,level1,23
cdo,de,level2,4
cdo,de,level3,4
cdo,eo,level1,1
cdo,eo,level2,1
cdo,eo,level3,3
cdo,en,level1,6
cdo,en,level2,31
cdo,en,level3,38
cdo,en,level4,17
cdo,es,level1,8
cdo,es,level2,6
cdo,es,level3,3
cdo,fr,level1,14
I want to build a histogram but some how the second column need to be incorporated in the histogram, the way you read the data is example: In gv we have two users with with ca experience level1, similarly in gv we have 1 user with bg experience level 1.
I know how to build histograms in R but I am trying rap around this thought in my head and trying to figure how to get this in to a graphical representation.
Like #Ben said, it is a little difficult to see what you're getting at here. You may need to reformat your data so that you have only have only one type of data (class) per table.
I'm currently developing a website that shows multiple charts that I build using data from SQL tables. I've used and followed Scott Mitchell's tutorial (https://web.archive.org/web/20210927195532/http://www.4guysfromrolla.com/articles/093009-1.aspx) and K. Scott Allen's ChartBuilder class (http://code.msdn.microsoft.com/mag200903XASP/Release/ProjectReleases.aspx?ReleaseId=2245) and all works well.
However when have two series that I want to show on the same Chart, if one set of data does not have all of the X values the other series does, the chart blindly puts all the data on, ignoring trying to match the X values of the other series, therefore mismatching the X values when the chart is shown.
I know that I can fiddle the data so that both sets of data have the same X values, however I'm trying to make the class handle anomalies in the data so that I don't have to worry too much about the data.
Any help is greatly appreciated.
chart.DataBindCrossTab match it for you.
See more here: http://blogs.msdn.com/b/alexgor/archive/2009/02/21/data-binding-ms-chart-control.aspx