I have a realtime graph with a lot of data (more than a million points).
Which class should I use: QCPGraph or QCPFinancial?
What are the advantages and disadvantages of each?
They are two different types of plottables, and I don't think there is an inherent advantage of one over the other. It depends on what you want to represent in your graph.
QCPFinancial:
A plottable representing a financial stock chart. This plottable represents time series data binned to certain intervals, mainly used for stock charts. The two common representations OHLC (Open-High-Low-Close) bars and Candlesticks can be set via setChartStyle.
QCPGraph:
A plottable representing a graph in a plot. Graphs are used to display single-valued data. Single-valued means that there should only be one data point per unique key coordinate. In other words, the graph can't have loops. If you do want to plot non-single-valued curves, rather use the QCPCurve plottable.
See also this example for a simple QCPGraph, or this example for QCPFinancial.
Related
I have data in Excel and I want to represent one person's data graphically in bar charts beside each other. All the values are numerical and I want it to look something like this:
where w, x, y represent different variables, like games played and turnovers, and the three colours represent three different people. I have data on 20 people.
I don't know how to single out an individual's data or how to represent multiple data points on the bar chart.
If you have any questions, I'll try to describe things as best as I can. Thanks in advance.
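If R is an option (as in the other questions here), a minimal sketch with ggplot2, assuming the Excel sheet has been exported to a long-format table; the column names and values below are hypothetical:

    library(ggplot2)

    # Hypothetical long-format data: one row per person per variable
    df <- data.frame(
      person   = rep(c("Person A", "Person B", "Person C"), each = 3),
      variable = rep(c("games played", "turnovers", "assists"), times = 3),
      value    = c(20, 5, 7, 18, 8, 4, 22, 3, 9)
    )

    # One group of bars per variable, one coloured bar per person within each group
    ggplot(df, aes(x = variable, y = value, fill = person)) +
      geom_col(position = position_dodge()) +
      labs(x = NULL, y = "Value", fill = NULL)

To single out individuals from the full 20-person table, filter it down to the people of interest (e.g. with subset()) before plotting.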
I'm working with Spark using the R API, and I have a grasp on how data is processed by Spark, either when only Spark native functions are used, in which case it is transparent to the user, or when spark_apply() is used, where a better understanding of how the partitions are handled is required.
My doubt is about plots where no aggregation is done. For example, my understanding is that if a group by is used before a plot, not all the data will be used. But if I need to make, say, a scatter plot with 100 million dots, where is that data stored at that point? Is it still distributed between all nodes, or is it on one node only? And if the latter, will the cluster freeze because of this?
I know you write that no aggregation is (should be?) done, but I'd wager that is precisely what you need and want to do. The point of distributed computing is largely that partial results are computed, well, distributed at each node. For very big data sets, each node (often) sees only a subset of the data.
In regards to the plotting: a scatter plot of more than even a few thousand (not to mention 100 million) points will contain a significant amount of overplotting. Either you 'fix' that by making the points transparent, you do a density estimate, or you do some binning of the data (e.g. a hexbin plot or a heatmap). The latter can be done in a distributed fashion by the nodes; the binned results returned from each node can then be aggregated into a final result by the master node and plotted.
Even if you somehow had a node making a scatter plot of 100 million points, what is your output format? Vector graphics (e.g. pdf/svg) would create a huge file. Raster graphics (e.g. jpg, png) will effectively aggregate on your behalf when the plot is rasterized, so you might as well control that yourself with bins the size of pixels.
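As a rough sketch of that binning approach with sparklyr and dplyr (the table name, column names, and bin width below are all hypothetical): the aggregation runs on the cluster, and only the small binned summary is collected to the driver for plotting.

    library(sparklyr)
    library(dplyr)
    library(ggplot2)

    sc <- spark_connect(master = "local")   # replace with your cluster's master URL

    # Hypothetical Spark table with numeric columns x and y (the 100M points)
    points_tbl <- tbl(sc, "points")

    binned <- points_tbl %>%
      mutate(xbin = floor(x / 0.1) * 0.1,   # 0.1 is an arbitrary bin width; tune it
             ybin = floor(y / 0.1) * 0.1) %>%
      group_by(xbin, ybin) %>%
      summarise(n = n()) %>%                # executed distributed, as Spark SQL
      collect()                             # only the aggregated bins reach the driver

    ggplot(binned, aes(x = xbin, y = ybin, fill = n)) +
      geom_tile()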
I have a very large data set (~250,000 records) that I have used to create a linear model. I have plotted predicted vs. actual values.
I tried to use identify() to select the two clusters of values near the center of the graph and coord() to identify them. There are a few problems here: 1) there are many, many more points in those clusters than I can click on and identify individually, and 2) I need to know ALL of them, select all of them somehow without selecting any others, and subset my data to just those points.
This model was created using a satellite image paired with ancillary spatial data. Each entry in the table corresponds to a particular point on the map. I need to identify where these two clusters are located on the map. My data frame includes the FID (which I can use to link back to the map), the original predictor, the response, and my predicted values.
I appreciate any help!
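One way around clicking individual points is to define each cluster by coordinate ranges and subset programmatically. A minimal sketch, assuming a data frame df with (hypothetical) columns FID, predicted and actual, and predicted on the x axis of the existing plot:

    # Click the two opposite corners of a box that encloses one cluster
    corners <- locator(2)
    x_rng <- range(corners$x)
    y_rng <- range(corners$y)

    # Every record whose point falls inside that box, not just the ones you can click
    cluster <- subset(df,
                      predicted >= x_rng[1] & predicted <= x_rng[2] &
                      actual    >= y_rng[1] & actual    <= y_rng[2])

    # cluster$FID links the selected records back to the map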
I have different time-series corresponding to different individuals and their location within a building (a categorical variable -- more like a room name).
I would like to study the similarity in movement of different individuals by something like cross-recurrence plots, where the two time-series correspond to the two axes and the actual points correspond to the presence/absence of individuals in the same room.
Has anyone tried doing such plots in R or while using ggplot? Any help would be great!
I haven't used this routine myself; I have only used the D2 dimension and the Lyapunov exponent for EEG, but the TISEAN package (RTisean in your case) has a routine, recurr, that returns this specific plot.
This link has a nice wrap-up of tutorials and links.
Edited:
In this link you can find a nice example of an application of a recurrence plot.
The values returned by recurr (and by similar functions from other packages) can be accessed by putting $ after the returned object (as with a data frame),
and you can then use them inside the ggplot function with the appropriate aes mapping.
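If you prefer to stay entirely within base R and ggplot2 rather than RTisean, a cross-recurrence plot for categorical room sequences reduces to marking the (i, j) time pairs where the two individuals were in the same room. A minimal sketch with made-up data:

    library(ggplot2)

    # Hypothetical room sequences for two individuals, sampled at the same times
    rooms_a <- c("lobby", "office", "office", "kitchen", "lobby")
    rooms_b <- c("office", "office", "kitchen", "kitchen", "lobby")

    # TRUE where individual A's room at time i equals individual B's room at time j
    crm <- outer(rooms_a, rooms_b, FUN = "==")

    # Long format for ggplot: one row per (i, j) cell
    df <- expand.grid(i = seq_along(rooms_a), j = seq_along(rooms_b))
    df$recurrent <- as.vector(crm)

    ggplot(subset(df, recurrent), aes(x = i, y = j)) +
      geom_point(shape = 15) +    # filled squares mark the recurrent points
      labs(x = "Time (individual A)", y = "Time (individual B)")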
I'm trying to generate a map that looks like this in R:
The boxes represent individual observations, while the colors represent data pertaining to those individual observations. Anyone have any idea how this might be accomplished?
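If the boxes sit on a regular grid, ggplot2's geom_tile is one way to get that look. A minimal sketch with hypothetical grid positions and values:

    library(ggplot2)

    # Hypothetical data: each observation sits at a grid position (col, row)
    # and carries a value that determines the colour of its box
    df <- data.frame(
      col   = rep(1:10, times = 6),
      row   = rep(1:6, each = 10),
      value = runif(60)
    )

    ggplot(df, aes(x = col, y = row, fill = value)) +
      geom_tile(colour = "white") +   # white borders outline the individual boxes
      coord_equal() +
      scale_fill_viridis_c()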