Select multiple points on scatterplot, save selection to new table - r

I have a very large data set (~250,000 records) that I have used to create a linear model, and I have plotted predicted vs. actual values.
I tried to use identify() to select the two clusters of values near the center of the graph and coord() to identify them. There are a few problems here: 1) there are many, many more points in those clusters than I can click on and identify individually, and 2) I need to know ALL of them: select all of them somehow without selecting any others, and subset my data to just those points.
This model was created using a satellite image paired with ancillary spatial data. Each entry in the table corresponds to a particular point on the map. I need to identify where these two clusters are located on the map. My data frame includes the FID (which I can use to link back to the map), the original predictor, the response, and my predicted values.
I appreciate any help!
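One way to capture every point in a cluster without clicking each one is to click only the corners of a polygon drawn around the cluster and then test every point against that polygon. In R, locator() returns the coordinates you click and sp::point.in.polygon() performs the test. The Python sketch below shows the same idea with a hand-written ray-casting test. It assumes each row is a record with hypothetical keys FID, actual and predicted, and that the polygon vertices have already been collected.

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    poly is a list of (x, y) vertices in drawing order; the last vertex is
    implicitly joined back to the first. Points exactly on an edge may be
    classified either way.
    """
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips state
    return inside


def select_in_polygon(rows, poly, xkey="actual", ykey="predicted"):
    """Return the rows whose (xkey, ykey) point falls inside poly."""
    return [r for r in rows if point_in_polygon(r[xkey], r[ykey], poly)]


# Hypothetical example rows; FID links each row back to the map.
rows = [
    {"FID": 1, "actual": 0.2, "predicted": 0.3},
    {"FID": 2, "actual": 0.5, "predicted": 0.5},
    {"FID": 3, "actual": 0.55, "predicted": 0.45},
    {"FID": 4, "actual": 0.9, "predicted": 0.1},
]
# Hypothetical vertices clicked around one cluster (e.g. via locator() in R).
cluster = [(0.4, 0.4), (0.6, 0.4), (0.6, 0.6), (0.4, 0.6)]
selected = select_in_polygon(rows, cluster)
fids = [r["FID"] for r in selected]
print(fids)  # prints [2, 3]
```

Running the selection once per cluster polygon gives the FIDs for both clusters; those IDs can then be written out and joined back to the map layer.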

Related

Is there an R function for creating a category for grid cells in spatial analysis?

I have a dataset with geodata (coordinates) assigned to events that I want to use later in a categorical regression. I want to create 50 km × 50 km grid cells covering Africa, assign each event to its grid cell to test a causal relationship, and plot the result later on. Do you have any ideas on how to proceed most efficiently, and which code to use to create the grid cells and categorize them, so that I can run the regression on this new category instead of on the coordinates? I hope I have described the problem clearly.
I started creating the grid cells in QGIS, but now I want to conduct the whole analysis in RStudio. I thought the packages for geospatial analysis could help in my case, but I am not sure how to proceed.
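A common approach is to project the coordinates into a metric, ideally equal-area, coordinate reference system, and then derive each event's cell by dividing its x and y by the cell size and rounding down; the resulting cell ID serves as the categorical variable. In R, sf::st_make_grid() with cellsize = 50000 combined with sf::st_join() would be one route. The Python sketch below shows only the cell-assignment arithmetic and assumes the coordinates are already projected to metres; the projection step itself is not shown, and the function names are hypothetical.

```python
import math
from collections import Counter

CELL_SIZE_M = 50_000  # 50 km x 50 km cells


def cell_id(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return (column, row) of the grid cell containing a projected point.

    Assumes x_m and y_m are in metres in a projected CRS; the grid is
    anchored at that CRS's origin (0, 0).
    """
    # math.floor (not int()) so negative coordinates land in the right cell
    return (math.floor(x_m / cell_size), math.floor(y_m / cell_size))


def cell_label(x_m, y_m, cell_size=CELL_SIZE_M):
    """String label usable as a categorical variable in a regression."""
    col, row = cell_id(x_m, y_m, cell_size)
    return f"c{col}_r{row}"


# Hypothetical projected event coordinates (metres)
events = [(10_000, 20_000), (49_999, 1), (50_000, 0), (-1, -1)]
labels = [cell_label(x, y) for x, y in events]
print(labels)  # prints ['c0_r0', 'c0_r0', 'c1_r0', 'c-1_r-1']
print(Counter(labels))  # number of events per cell
```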

Determine the region containing a given proportion of the data in R

Suppose I have a plot like the following:
I want to isolate the part of the plot where the majority (say 90%) of the data lie, for example something like:
where the points inside the black frame make up 90% of the data.
How can I do this in R?
Edited in response to a comment:
What if I have the following plot? Here the bulk of the data probably starts from 0.
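There is no single definition of "the box holding 90% of the data", so the following is only one hedged option: measure each point's distance from the median centre (with each axis scaled by its standard deviation), keep the 90% of points closest to the centre, and report the bounding rectangle of those points. That rectangle contains at least 90% of the data, possibly more. The Python sketch below is not a standard R function; the same steps could be written in R with median(), sd(), order() and range(). The function name core_box is hypothetical.

```python
import math
import statistics


def core_box(xs, ys, frac=0.9):
    """Bounding box (xmin, xmax, ymin, ymax) of the `frac` share of points
    lying closest to the median centre.

    Each axis is divided by its standard deviation so x and y contribute
    comparably to the distance. The box contains at least ceil(frac * n)
    points, possibly more.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and equally long")
    if not 0 < frac <= 1:
        raise ValueError("frac must be in (0, 1]")
    n = len(xs)
    cx, cy = statistics.median(xs), statistics.median(ys)
    # Fall back to 1.0 if an axis is constant (standard deviation of 0)
    sx = statistics.pstdev(xs) or 1.0
    sy = statistics.pstdev(ys) or 1.0
    dist = [((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2 for x, y in zip(xs, ys)]
    keep = math.ceil(frac * n)
    nearest = sorted(range(n), key=lambda i: dist[i])[:keep]
    kx = [xs[i] for i in nearest]
    ky = [ys[i] for i in nearest]
    return min(kx), max(kx), min(ky), max(ky)


# Hypothetical data: nine points along a line plus one far outlier
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 50]
ys = [0, 1, 2, 3, 4, 5, 6, 7, 8, 50]
print(core_box(xs, ys, frac=0.9))  # prints (0, 8, 0, 8)
```

The result depends on the choice of centre; for data that pile up at 0, as in the edited plot, fixing the lower edges of the box at 0 and trimming only from above may describe the data better.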

What is the difference between QCPFinancial and QCPGraph?

I have a real-time, large graph (more than a million points).
Which class should i use: QCPGraph or QCPFinancial?
What are the advantages and disadvantages of each?
They are two different types of graphs. I don't think there is an advantage of one over the other. It depends on what you want to represent in your graph.
QCPFinancial:
A plottable representing a financial stock chart.
This plottable represents time series data binned to certain intervals, mainly used for stock charts. The two common representations OHLC (Open-High-Low-Close) bars and Candlesticks can be set via setChartStyle.
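To make "binned to certain intervals" concrete: a financial chart does not draw every raw sample; each bar corresponds to one time bin summarised by its open, high, low and close values. The Python sketch below computes those summaries for a fixed bin width. It only illustrates the data reduction and is not QCustomPlot code; ohlc_bins is a hypothetical name, and the samples are assumed to be sorted by time.

```python
def ohlc_bins(samples, bin_width):
    """Group (time, value) samples into fixed-width time bins.

    Returns a list of (bin_start, open, high, low, close) ordered by
    bin_start. Samples must be sorted by time, so the first sample in a
    bin is its open and the last one is its close.
    """
    bins = []
    for t, v in samples:
        start = (t // bin_width) * bin_width
        if bins and bins[-1][0] == start:
            b_start, op, hi, lo, _ = bins[-1]
            # Same bin: widen high/low, latest value becomes the close
            bins[-1] = (b_start, op, max(hi, v), min(lo, v), v)
        else:
            # New bin: the first sample is open, high, low and close
            bins.append((start, v, v, v, v))
    return bins


# Hypothetical ticks: (seconds, price)
ticks = [(0, 10.0), (1, 12.0), (2, 9.0), (3, 11.0), (5, 11.5), (7, 13.0)]
print(ohlc_bins(ticks, bin_width=5))
# prints [(0, 10.0, 12.0, 9.0, 11.0), (5, 11.5, 13.0, 11.5, 13.0)]
```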
QCPGraph:
A plottable representing a graph in a plot.
Graphs are used to display single-valued data. Single-valued means that there should only be one data point per unique key coordinate. In other words, the graph can't have loops. If you do want to plot non-single-valued curves, rather use the QCPCurve plottable.
See also this example for a simple QCPGraph.
Or this example for QCPFinancial.

Cross-Recurrence plots in R (with or without ggplot)

I have different time-series corresponding to different individuals and their location within a building (a categorical variable -- more like a room name).
I would like to study the similarity in movement of different individuals by something like cross-recurrence plots, where the two time-series correspond to the two axes and the actual points correspond to the presence/absence of individuals in the same room.
Has anyone tried doing such plots in R or while using ggplot? Any help would be great!
I haven't used this routine myself; I have only used the d2 dimension and the Lyapunov exponent for EEG. However, the Tisean package (RTisean in your case) has a routine, 'recurr', that returns this kind of plot.
This link has a nice wrap-up of tutorials and links.
Edited:
In this link you can find a nice example of application of recurrence plot.
You can access the variables returned by recurr (and by similar functions in other packages) by putting $ after the returned object, as with a data frame,
and you can then use them inside the ggplot() function by applying the appropriate aes() mapping.
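For categorical data such as room names, the cross-recurrence matrix itself is simple to compute: cell (i, j) is 1 when individual A at time i is in the same room as individual B at time j, and 0 otherwise. The Python sketch below uses hypothetical room sequences sampled at the same time steps. In R, the same matrix could be built with outer(a, b, "==") and, after reshaping to long format, drawn with ggplot2's geom_tile().

```python
def cross_recurrence(a, b):
    """Cross-recurrence matrix for two categorical sequences.

    Returns a list of rows with cr[i][j] == 1 if a[i] == b[j], else 0.
    Row index i is time in series a; column index j is time in series b.
    """
    return [[1 if ai == bj else 0 for bj in b] for ai in a]


def recurrence_rate(cr):
    """Fraction of (i, j) pairs that are recurrent (value 1)."""
    total = sum(len(row) for row in cr)
    return sum(map(sum, cr)) / total if total else 0.0


# Hypothetical room sequences, sampled at the same time steps
person_a = ["lab", "lab", "kitchen", "office"]
person_b = ["lab", "kitchen", "kitchen", "office"]
cr = cross_recurrence(person_a, person_b)
for row in cr:
    print(row)
# The main diagonal (i == j) marks times when both were in the same room.
print([cr[i][i] for i in range(len(person_a))])  # prints [1, 0, 1, 1]
print(recurrence_rate(cr))  # prints 0.3125
```

Off-diagonal 1s show one individual visiting a room the other occupied at a different time, which is what makes the plot useful for comparing movement patterns with a lag.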

D3.js - Multiple Series (columns) of Data on ScatterPlot at Y Axis

The title of this question might not reflect the actual scenario, so please read everything below. Thanks.
I am developing a scatter plot based on the following data (JSON, in a file simple.json):
{
"docs":
[
{"timestamp":"01","id":"100","quantity":"5","pay":"50","bp":"25","city":"Multan"},
{"timestamp":"02","id":"200","quantity":"10","pay":"100","bp":"50","city":"Lahore"},
{"timestamp":"03","id":"300","quantity":"3","pay":"30","bp":"15","city":"Multan"},
{"timestamp":"04","id":"400","quantity":"5","pay":"50","bp":"25","city":"Multan"},
{"timestamp":"05","id":"500","quantity":"6","pay":"60","bp":"30","city":"Lahore"},
{"timestamp":"06","id":"600","quantity":"15","pay":"150","bp":"75","city":"Islamabad"},
{"timestamp":"07","id":"700","quantity":"14","pay":"140","bp":"70","city":"Islamabad"},
{"timestamp":"08","id":"800","quantity":"18","pay":"180","bp":"90","city":"Islamabad"},
{"timestamp":"09","id":"900","quantity":"7","pay":"70","bp":"35","city":"Lahore"},
{"timestamp":"10","id":"1000","quantity":"20","pay":"200","bp":"100","city":"Islamabad"}
]
}
I am trying to build a reusable graph where I can present the user with the available data columns (from the data above). The user can select one column (say "id") for the X axis and another column (say "quantity") for the Y axis; up to this point everything works as expected. Later, the user can select another column and click a button to plot that column on the graph as well, alongside the previously added columns.
Here comes the problem:
When I proceed with another column (say "pay") for the Y axis while keeping the previous ones on the graph, the new points are plotted correctly (I also rescale the axis based on the new data). But the old points are NOT rearranged to match the new scale. This is the actual problem. I am thinking of keeping track of each column added (by storing column references in a separate array), so that every time a new column is added I redraw the old ones (should I?). But this doesn't seem to make good use of D3's power, and it doesn't look good for performance either.
For this I also applied a class "update" to every circle drawn, so that I can select all "update" circles. But then another issue arises: how would I know the new positions of these circles? Do I need to traverse the data again for that particular series and redraw it? Keeping track of the old series and redrawing them for every new series will increase the processing overhead with each addition. Is there a handy solution, or a built-in D3 mechanism, to readjust previous drawings according to a new scale?
Please suggest something. I am sure I am missing some key points.
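The usual D3 answer is that the old circles do not need their data re-read from the file: each circle already carries the datum bound to it with selection.data(), so after updating the y scale's domain you can reselect every circle and reset its cy attribute from that datum through the updated scale, optionally inside a transition. The Python block below models only that idea with a hand-written linear scale; LinearScale, Circle and add_series are hypothetical names, not D3's API, and the sketch assumes each datum stores its own y value and series name.

```python
class LinearScale:
    """Maps a data domain (d0, d1) linearly onto a pixel range (r0, r1)."""

    def __init__(self, domain, rng):
        self.domain = domain
        self.range = rng

    def __call__(self, v):
        (d0, d1), (r0, r1) = self.domain, self.range
        if d1 == d0:
            return (r0 + r1) / 2  # degenerate domain: put everything mid-range
        return r0 + (v - d0) * (r1 - r0) / (d1 - d0)


class Circle:
    """Stand-in for a drawn circle that keeps its bound datum (hypothetical)."""

    def __init__(self, datum):
        self.datum = datum
        self.cy = None


def add_series(circles, rows, column, y_scale):
    """Plot one more column: widen the y domain, then reposition every circle."""
    circles.extend(
        Circle({"series": column, "y": float(row[column])}) for row in rows
    )
    ys = [c.datum["y"] for c in circles]
    # Domain anchored at 0 here (an assumption) and widened to the largest value
    y_scale.domain = (min(0.0, min(ys)), max(ys))
    # Single pass: each position comes from the bound datum and the NEW scale,
    # so old series move without re-reading simple.json.
    for c in circles:
        c.cy = y_scale(c.datum["y"])
    return circles


# quantity and pay from three rows of simple.json (strings, as in the file)
rows = [
    {"quantity": "5", "pay": "50"},
    {"quantity": "10", "pay": "100"},
    {"quantity": "20", "pay": "200"},
]
y = LinearScale((0.0, 1.0), (400.0, 0.0))  # SVG y grows downward
circles = []
add_series(circles, rows, "quantity", y)
add_series(circles, rows, "pay", y)
print([c.cy for c in circles if c.datum["series"] == "quantity"])
# prints [390.0, 380.0, 360.0]
```

The repositioning is still one pass over all existing marks, but it works from data already held by each mark rather than re-traversing and redrawing each series from scratch.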
