Best way to plot histogram or any other graphical interpretation

Best way to plot histogram or any other graphical interpretation - r

I have csv file with following data set:
gv,ca,level1,2
gv,bg,level1,1
zea,li,level1,1
zea,li,level3,1
zea,de,level1,26
zea,de,level3,5
zea,el,level1,1
zea,eo,level1,3
zea,en,level1,5
zea,en,level2,34
zea,en,level3,38
zea,en,level4,12
zea,es,level1,7
zea,la,level1,7
zea,zea,level1,5
zea,zea,level3,4
zea,stq,level1,1
zea,sk,level2,1
zea,nl,level4,4
zea,fr,level2,9
zea,fy,level2,1
cdo,cdo,level3,1
cdo,de,level1,23
cdo,de,level2,4
cdo,de,level3,4
cdo,eo,level1,1
cdo,eo,level2,1
cdo,eo,level3,3
cdo,en,level1,6
cdo,en,level2,31
cdo,en,level3,38
cdo,en,level4,17
cdo,es,level1,8
cdo,es,level2,6
cdo,es,level3,3
cdo,fr,level1,14
I want to build a histogram but some how the second column need to be incorporated in the histogram, the way you read the data is example: In gv we have two users with with ca experience level1, similarly in gv we have 1 user with bg experience level 1.
I know how to build histograms in R but I am trying rap around this thought in my head and trying to figure how to get this in to a graphical representation.

Like #Ben said, it is a little difficult to see what you're getting at here. You may need to reformat your data so that you have only have only one type of data (class) per table.

Related

Cant I use dates as axes in a scatter plot in SAS VA?

In Enterprise Guide, I draw scatter plots with creation and closing date of issues to detect when backloggs occur and when they are resolved:
(The straight lines in the graph are batch interventions, like closing a set of issues that were handled outside ot the system.)
proc sgplot data=alert;
scatter x=create_Date y=CloseDate / group=CloseReason;
run;
When I try to do the same in SAS Visual Analytics, I can only put measures on the x-ax and y-ax and I cant make te date or datetime variable a measure.
Do I do something wrong? Should I use another graph type?

My take is that the inability of SAS VA Explorer to allow dates to be measures is a real weakness. Old school trickery would be perhaps to create a duplicate data item that computes the SAS data value (giving you a number result and thus a measure) and then formatting that with a custom format to render it back as a human readable date.
However, according to http://support.sas.com/kb/47/100.html#explorer
How SAS Visual Analytics Designer supports formats
In SAS Visual Analytics Designer, the Format property of the data item displays the name of the format for both numeric and character data items. However, there are some differences between numeric and character data items.
Numeric data items
You can change the format. If you change the format, you can restore the user-defined format by selecting Reset to Default in the Format type box.
You can specify to sort by formatted or unformatted values (release 6.2 and later).
(My bolds) Numeric data items with a user-defined format are classified as categories. You cannot change these data items to measures while the user-defined format is applied.

According to support.sas.com/documentation/cdl/en/vaug/68648/PDF/default/vaug.pdf , page 166, you could work on defining data roles for a scatter plot.
I am not sure that this could solve your situation but it says that:
"In addition to measures, you can assign a Group variable. The Group variable groups the data based on the values of the category data item that you assign. A separate set of scatter points is created for each value of the group variable.
You can add data items to the Data tips role. The values for the data items in the Data tips role are displayed in the data tips for the scatter plot".
Hope it helps.

Is there a way to read a raw netcdf file and tell what layer a value belongs to?

I'm in the process of evaluating how successful a script I wrote is and kind of a quick and dirty method I've employed is looking at the first few values and last few values of a single variable and doing a few calculations with them based on the same values in another netcdf file.
I know that there are better ways to approach this but again, this is a really quick and dirty method that has worked for me so far. My question though is that by looking at the raw data through ncdump, is there a way to tell which vertical layer that data belongs to? In my example, the file has 14 layers. I"m assuming that the first few values are a part of the surface layer and the last few values are a part of the top layer, but I suspect that this assumption is wrong, at least in part.
As a follow-up question, what would then be the easiest 'proper' way to tell what layer data belongs to? Thank you in advance!

ncview and NCO are both very powerful and quick command line operators to view data inside a netcdf file.
ncview: http://meteora.ucsd.edu/~pierce/ncview_home_page.html
NCO: http://nco.sourceforge.net/
You can easily show variables over all layers for example with
ncks -d layer,0,13 some_infile.nc

ncdump dumps the data with the last dimension varying fastest (http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/CDL-Syntax.html) so if 'layer' is the slowest/first dimension, the earlier values are all in the first layer, while the last few values are in the last layer.
As to whether the first layer is the top or bottom layer, you'd have to look to the 'layer' dimension and its data.

CSV file to Histogram in R

I'm a total newbie with R, and I'm trying to create a histogram (with value and frequency as the axises) from a csv file (just one row of values). Any idea how I can do this?

I'm also an R newbie, and I ran into the same thing. I made two separate mistakes, actually, so I'll describe them both here.
Mistake 1: Passing a frequency table to hist(). Originally I was trying to pass a frequency table to hist() instead of passing in the raw data. One way to fix this is to use the rep() ("replicate") function to explode your frequency table back into a raw dataset, as described here:
Creating a histogram using aggregated data
Simple R (histogram) from counted csv file
Instead of that, though, I just decided to read in my original dataset instead of the frequency table.
Mistake 2: Wrong data type. My raw data CSV file contains two columns: hostname and bookings (idea is to count the number of bookings each host generated during some given time period). I read it into a table.
> tbl <- read.csv('bookingsdata.csv')
Then when I tried to generate a histogram off the second column, I did this:
> hist(tbl[2])
This gave me the "'x' must be numeric" error you mention in a comment. (It was trying to read the "bookings" column header in as a data value.)
This fixed it:
> hist(tbl$bookings)

You should really start to read some basic R manual...
CRAN offers a lot of them (look into the Manuals and Contributed sections)
In any case:
setwd("path/to/csv/file")
myvalues <- read.csv("filename.csv")
hist(myvalues, 100) # Example: 100 breaks, but you can specify them at will
See the manual pages for those functions for more help (accessible through ?read.table, ?read.csv and ?hist).

To plot the histogram, the values must be of numeric class i.e the data must be of numeric value. Here the value of x seems to be of some other class.
Run the following command and see:
sapply(myvalues[1,],class)

JasperReports CategoryDataset has less data than expected?

I'm trying to develop a ChartCustomizer that takes the data from a chart and converts it into a histogram (because JR does not directly support histograms). It's a fairly simple implementation with hard-coded intervals, etc. mostly as a proof-of-concept at this point.
The data I'm analyzing is HTTP response-time data of the form [date, response-time] and I have a CSV file with 18512 records in it. In my summary band, I have 3 items:
A text field dumping $V{REPORT_COUNT} (it reports 18512 in iReport's report preview)
A time series showing all the data points [date, response-time]
A category plot containing all the data points in a single series [category=$F{DATE}, value=$F{RESPONSE_TIME}]
I decided that the most straightforward way to build a histogram would be to use the Category plot because it had the right structure for the final histogram chart.
When the ChartCustomizer runs, it dumps out all kinds of good information about the data set, including the size. Strangely, the size is 10252: it's missing something like 8000 data points. I can't understand why the category plot would have fewer data points than the whole data set.
Any ideas?

Answering my own question in case others run across this foolish user error.
The problem was that CategoryDataset only allows one data point per "category", and in my case, "category" was a java.util.Date captured from the web server log. Apparently, nearly half of my dates were duplicates and so part of the data set overwrote the other half, leaving a subset of the data.
That should have been totally obvious to me at the outset, because that is exactly how a category dataset works.
Anyhow, simply changing the category plot series's "category expression" from $F{DATE} to $V{REPORT_COUNT} gave each datum a unique category which makes everything work.

ms Chart Multiple Series X Value Mismatch (ASP.NET)

I'm currently developing a website that shows multiple charts that I build using data from SQL tables. I've used and followed Scott Mitchell's tutorial (https://web.archive.org/web/20210927195532/http://www.4guysfromrolla.com/articles/093009-1.aspx) and K. Scott Allen's ChartBuilder class (http://code.msdn.microsoft.com/mag200903XASP/Release/ProjectReleases.aspx?ReleaseId=2245) and all works well.
However when have two series that I want to show on the same Chart, if one set of data does not have all of the X values the other series does, the chart blindly puts all the data on, ignoring trying to match the X values of the other series, therefore mismatching the X values when the chart is shown.
I know that I can fiddle the data so that both sets of data have the same X values, however I'm trying to make the class handle anomalies in the data so that I don't have to worry too much about the data.
Any help is greatly appreciated.

chart.DataBindCrossTab match it for you.
See more here: http://blogs.msdn.com/b/alexgor/archive/2009/02/21/data-binding-ms-chart-control.aspx

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Best way to plot histogram or any other graphical interpretation - r

Like #Ben said, it is a little difficult to see what you're getting at here. You may need to reformat your data so that you have only have only one type of data (class) per table.

Related

Cant I use dates as axes in a scatter plot in SAS VA?

Is there a way to read a raw netcdf file and tell what layer a value belongs to?

CSV file to Histogram in R

JasperReports CategoryDataset has less data than expected?

ms Chart Multiple Series X Value Mismatch (ASP.NET)

Categories

Resources