JasperReports CategoryDataset has less data than expected? - graph

I'm trying to develop a ChartCustomizer that takes the data from a chart and converts it into a histogram (because JR does not directly support histograms). It's a fairly simple implementation with hard-coded intervals, etc., mostly as a proof of concept at this point.
The data I'm analyzing is HTTP response-time data of the form [date, response-time] and I have a CSV file with 18512 records in it. In my summary band, I have 3 items:
A text field dumping $V{REPORT_COUNT} (it reports 18512 in iReport's report preview)
A time series showing all the data points [date, response-time]
A category plot containing all the data points in a single series [category=$F{DATE}, value=$F{RESPONSE_TIME}]
I decided that the most straightforward way to build a histogram would be to use the Category plot because it had the right structure for the final histogram chart.
When the ChartCustomizer runs, it dumps out all kinds of good information about the data set, including the size. Strangely, the size is 10252: it's missing something like 8000 data points. I can't understand why the category plot would have fewer data points than the whole data set.
Any ideas?

Answering my own question in case others run across this foolish user error.
The problem was that CategoryDataset only allows one data point per "category", and in my case, "category" was a java.util.Date captured from the web server log. Apparently, nearly half of my dates were duplicates and so part of the data set overwrote the other half, leaving a subset of the data.
That should have been totally obvious to me at the outset, because that is exactly how a category dataset works.
Anyhow, simply changing the category plot series's "category expression" from $F{DATE} to $V{REPORT_COUNT} gave each datum a unique category which makes everything work.
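The overwrite described above can be reproduced with any keyed container. The following is only an analogy in Python, not JFreeChart code: the sample rows are made up, and a plain dict stands in for the category dataset.

```python
from datetime import datetime

# Hypothetical sample rows: (timestamp, response time in ms).
rows = [
    (datetime(2012, 1, 1, 10, 0, 0), 120),
    (datetime(2012, 1, 1, 10, 0, 0), 340),  # same second as the row above
    (datetime(2012, 1, 1, 10, 0, 1), 95),
]

# Keyed by timestamp: a later row replaces an earlier one with the same key.
by_date = {}
for stamp, millis in rows:
    by_date[stamp] = millis

# Keyed by a running record number (the role $V{REPORT_COUNT} plays):
# every key is unique, so every row survives.
by_count = {}
for count, (stamp, millis) in enumerate(rows, start=1):
    by_count[count] = millis

print(len(rows), len(by_date), len(by_count))
```

Three rows go in, but the timestamp-keyed container ends up with two entries, which is the same effect as the 18512 records shrinking to 10252.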

Related

Grouping and transposing data in R

It is hard to explain this without just showing what I have, where I am, and what I need in terms of data structure:
What structure I had:
Where I have got to with my transformation efforts:
What I need to end up with:
Notes:
I've not given actual names for anything as the data is classed as sensitive, but:
Metrics are things that can be measured, for example the number of permanent or full-time jobs. The number of metrics is larger than presented in the test data (and the example structure above).
Each metric has many years of data. While writing the code I have restricted myself to just 3 years, and the illustration of the structure is based on this test. The number of years captured will change over time; generally it will increase.
The number of policies will fluctuate. I've just labelled them policy 1, 2, etc. for sensitivity reasons and limited the number while testing the code to make it easier to check the outputs.
The source data comes from a workbook of surveys with a tab for each policy. The initial import creates a list of tibbles consisting of a row for each metric, and 4 columns (the metric names, the values for 2024, the values for 2030, and the values for 2035). I converted this to a dataframe, created a vector to be a column header and used cbind() to put this on top to get the "What structure I had" data.
To get to the "Where I have got to with my transformation efforts" version of the table, I removed all the metric columns, created another vector of metrics and used rbind() to put this as the first column.
The idea in my head was to group the data by policy to get a vector for each metric, then transpose this so that the metric became the column and the grouped data became the row, then expand the data to get the metrics repeated for each year. A friend of mine who does coding (but has never used R) has suggested that using loops might be a better way forward. Again, I am not sure of the best approach, so I welcome advice. On Reddit someone suggested using pivot_wider/pivot_longer, but these appear to be summarising tools, and I am not trying to summarise the data but rather to transform its structure.
Any suggestions on approaches or possible tools/functions to use would be gratefully received. I am learning R whilst trying to pull this data together to create a database that can be used for analysis, so, if my approach sounds weird, feel free to suggest alternatives. Thanks
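As a rough illustration of the group-then-transpose idea, here is a lengthen-then-widen reshape written in plain Python rather than R. Everything in it is an assumption: the policy and metric names, the values, and the target layout (one row per policy and year, one column per metric), since the real structure is not shown. No value is aggregated at any step; rows are only rearranged.

```python
# Hypothetical stand-in for the imported workbook: one table per policy,
# each row = (metric name, value for 2024, value for 2030, value for 2035).
YEARS = (2024, 2030, 2035)
sheets = {
    "policy 1": [("metric A", 10, 12, 15), ("metric B", 1.0, 1.5, 2.0)],
    "policy 2": [("metric A", 20, 22, 25), ("metric B", 3.0, 3.5, 4.0)],
}

# Step 1 (lengthen): one record per policy, metric and year.
long_rows = [
    (policy, metric, year, value)
    for policy, table in sheets.items()
    for metric, *values in table
    for year, value in zip(YEARS, values)
]

# Step 2 (widen): one record per policy and year, one column per metric.
wide = {}
for policy, metric, year, value in long_rows:
    wide.setdefault((policy, year), {})[metric] = value

for (policy, year), metrics in sorted(wide.items()):
    print(policy, year, metrics)
```

Adding a year or a policy only adds rows to the long form, so nothing in the two steps needs to change when the survey grows.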

SOM Data preparation

Good day.
I am three months into R and RStudio but am getting the hang of things. I am implementing a SOM solution with 38k records/observations using Kohonen SuperSOM, following Self-Organising Maps for Customer Segmentation using R.
My data have no missing values but almost 60 columns, many of which are dummyVars (I received the data in this format)
I have removed the ONE char Column (URL)
My Y column (as I understand it) is "shares" (How many times it was shared)
My data only consist of numerical data (dummyVars are of course 1 or 0)
I have Centered and Scaled my data (entire dataFrame)
As per the example I followed, I did convert the entire DF to a matrix
My problem is that my SOM takes ages to train even with multi-core processing, and my progress graph does not reach a nice flat-ish plateau. It does come down nicely but is still very erratic, all my other graphs are extremely high in population, and there is no nice clustering. I have even tried 500 iterations with a 100x100 grid ;-(
I think/guess it is because of the huge number of columns, most of them dummyVars, e.g. dayOfWeek.Monday, dayOfWeek.Tuesday, category.LifeStile, category.Computers, etc.
What am I to do?
Should I convert the dummyVars back into another format? If so, how and why?
Please do not just give me a section of code, as I would like to understand why I need to do what.
Thanx

Can't I use dates as axes in a scatter plot in SAS VA?

In Enterprise Guide, I draw scatter plots with the creation and closing dates of issues to detect when backlogs occur and when they are resolved:
(The straight lines in the graph are batch interventions, like closing a set of issues that were handled outside of the system.)
proc sgplot data=alert;
scatter x=create_Date y=CloseDate / group=CloseReason;
run;
When I try to do the same in SAS Visual Analytics, I can only put measures on the x-axis and y-axis, and I can't make the date or datetime variable a measure.
Am I doing something wrong? Should I use another graph type?
My take is that the inability of SAS VA Explorer to allow dates to be measures is a real weakness. Old-school trickery would perhaps be to create a duplicate data item that computes the SAS date value (giving you a numeric result and thus a measure) and then format it with a custom format to render it back as a human-readable date.
However, according to http://support.sas.com/kb/47/100.html#explorer
How SAS Visual Analytics Designer supports formats
In SAS Visual Analytics Designer, the Format property of the data item displays the name of the format for both numeric and character data items. However, there are some differences between numeric and character data items.
Numeric data items
You can change the format. If you change the format, you can restore the user-defined format by selecting Reset to Default in the Format type box.
You can specify to sort by formatted or unformatted values (release 6.2 and later).
(My bolds) Numeric data items with a user-defined format are classified as categories. You cannot change these data items to measures while the user-defined format is applied.
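The numeric half of the trick suggested above relies on a SAS date value being the number of days since 1 January 1960. Below is a small sketch of that conversion in Python, only to show what the duplicate data item would hold; the function names and the sample date are hypothetical, and it says nothing about how VA itself treats the formatted item.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date values count days from this day


def to_sas_date_value(day):
    """Number of days between the SAS epoch and `day` (a plain integer)."""
    return (day - SAS_EPOCH).days


def from_sas_date_value(value):
    """Turn the integer back into a calendar date for display."""
    return SAS_EPOCH + timedelta(days=value)


created = date(2015, 3, 1)  # hypothetical creation date of an issue
as_measure = to_sas_date_value(created)
print(as_measure, from_sas_date_value(as_measure).isoformat())
```

The integer is what a scatter plot could treat as a measure; the reverse conversion is what a date format would do for display, which is where the restriction on user-defined formats quoted above comes in.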
According to support.sas.com/documentation/cdl/en/vaug/68648/PDF/default/vaug.pdf , page 166, you could work on defining data roles for a scatter plot.
I am not sure that this could solve your situation but it says that:
"In addition to measures, you can assign a Group variable. The Group variable groups the data based on the values of the category data item that you assign. A separate set of scatter points is created for each value of the group variable.
You can add data items to the Data tips role. The values for the data items in the Data tips role are displayed in the data tips for the scatter plot".
Hope it helps.

Best way to plot histogram or any other graphical interpretation

I have csv file with following data set:
gv,ca,level1,2
gv,bg,level1,1
zea,li,level1,1
zea,li,level3,1
zea,de,level1,26
zea,de,level3,5
zea,el,level1,1
zea,eo,level1,3
zea,en,level1,5
zea,en,level2,34
zea,en,level3,38
zea,en,level4,12
zea,es,level1,7
zea,la,level1,7
zea,zea,level1,5
zea,zea,level3,4
zea,stq,level1,1
zea,sk,level2,1
zea,nl,level4,4
zea,fr,level2,9
zea,fy,level2,1
cdo,cdo,level3,1
cdo,de,level1,23
cdo,de,level2,4
cdo,de,level3,4
cdo,eo,level1,1
cdo,eo,level2,1
cdo,eo,level3,3
cdo,en,level1,6
cdo,en,level2,31
cdo,en,level3,38
cdo,en,level4,17
cdo,es,level1,8
cdo,es,level2,6
cdo,es,level3,3
cdo,fr,level1,14
I want to build a histogram, but somehow the second column needs to be incorporated in it. The way to read the data is, for example: in gv we have two users with ca experience level1; similarly, in gv we have 1 user with bg experience level1.
I know how to build histograms in R, but I am trying to wrap my head around how to get this into a graphical representation.
Like @Ben said, it is a little difficult to see what you're getting at here. You may need to reformat your data so that you have only one type of data (class) per table.
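One possible reading of that advice, sketched in Python: collapse the second column so that each table holds a single kind of count. The column meanings (group, language, level, number of users) are guesses from the description in the question, and only a few of the rows are used.

```python
import csv
import io
from collections import defaultdict

# A few rows from the question; the columns are assumed to be
# (group, language, level, number of users).
sample = """gv,ca,level1,2
gv,bg,level1,1
zea,de,level1,26
zea,de,level3,5
zea,en,level2,34
"""

# counts[group][level] = users at that level, summed over languages.
counts = defaultdict(lambda: defaultdict(int))
for group, language, level, users in csv.reader(io.StringIO(sample)):
    counts[group][level] += int(users)

for group, levels in counts.items():
    print(group, dict(levels))
```

Each group then has one count per level, which maps directly onto one bar per level in a bar chart. Swapping `level` and `language` in the key gives the per-language view instead.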

MS Chart Multiple Series X Value Mismatch (ASP.NET)

I'm currently developing a website that shows multiple charts that I build using data from SQL tables. I've used and followed Scott Mitchell's tutorial (https://web.archive.org/web/20210927195532/http://www.4guysfromrolla.com/articles/093009-1.aspx) and K. Scott Allen's ChartBuilder class (http://code.msdn.microsoft.com/mag200903XASP/Release/ProjectReleases.aspx?ReleaseId=2245) and all works well.
However, when I have two series that I want to show on the same chart and one set of data does not have all of the X values that the other series has, the chart blindly plots all the data without trying to match the X values of the other series, so the X values are mismatched when the chart is shown.
I know that I can fiddle the data so that both sets of data have the same X values, however I'm trying to make the class handle anomalies in the data so that I don't have to worry too much about the data.
Any help is greatly appreciated.
chart.DataBindCrossTable will match it for you.
See more here: http://blogs.msdn.com/b/alexgor/archive/2009/02/21/data-binding-ms-chart-control.aspx
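Independent of the chart control, the matching that the question asks for amounts to building one shared list of X values and padding each series where it has no point. This is a language-neutral sketch in Python with made-up series, not the MS Chart API:

```python
# Hypothetical series as {x value: y value}; series_b has no point at "Feb".
series_a = {"Jan": 5, "Feb": 7, "Mar": 6}
series_b = {"Jan": 2, "Mar": 4}

# Shared X axis: every x value seen in any series, in first-seen order.
x_values = list(dict.fromkeys([*series_a, *series_b]))


def align(series, xs, missing=None):
    """Return one y per shared x, using `missing` where the series has no point."""
    return [series.get(x, missing) for x in xs]


aligned_a = align(series_a, x_values)
aligned_b = align(series_b, x_values)
print(x_values, aligned_a, aligned_b)
```

After alignment both series have the same length and the same X position for every point, so a gap in one series can no longer shift its later points onto the wrong labels.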
