Equation for nonlinear data - math

I have a set of nonlinear data: the X and Y coordinates of different objects/points in a video (that is, the x and y pixel coordinates of the same objects across all frames of the video). Upon plotting the values from one frame, I get a nonlinear graph, as shown in the picture.
I want to form an equation for this graph so that, given a known X coordinate in this frame, the corresponding Y coordinate can be obtained from the equation (a kind of prediction of the new position; I am not sure whether this idea is correct).
OR
If this idea is illogical, can you suggest something that will work, so that I can predict the location of a new object using these data?
Any help or new ideas are highly appreciated.
A sample of my data is given below:
X Y
----------
214 182
830 185
1451 173
219 554
1453 548
214 941
830 934
1455 942
213 190
829 193
1450 181
218 561
1452 555
214 945
830 938
1455 946
213 190
828 193
1451 182
219 560
1452 554
214 945
830 938
1455 946
213 190
829 193
1450 181
219 556
1453 550
215 936
830 929
1455 937
I have selected 8 objects in each frame, so the first 8 rows belong to one frame, and so on.

Your XY data shows clusters located at the corners and mid-edges of a rectangle, and when the lines that connect successive points are added, the repeating pattern becomes visible.
The points come in groups of 8, in a fixed sequence around the rectangle. You can predict the location of a point using its index:
# predict the location (x, y) of a point from its index i
predict_point <- function(i) {
  p <- (i - 1) %% 8 + 1   # get the number 1-8 of the point within its frame
  # x: points 1, 4, 6 form the left column; 2, 7 the middle; 3, 5, 8 the right
  x <- c(215, 829, 1452, 215, 1452, 215, 829, 1452)[p]
  # y: points 1, 2, 3 form the top row; 4, 5 the middle; 6, 7, 8 the bottom
  y <- c(186, 186, 186, 555, 555, 940, 940, 940)[p]
  c(x = x, y = y)
}
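For example, predict_point(10) gives point number 2 of the second frame, roughly (829, 186); compare the sample row 829 193. The cluster centres in the function are rough averages of your sample data, so recompute them from the full data set for better accuracy.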

You can also cut this curve into many short linear segments; then, given a value of X, you find the segment it falls on, and it is easy to compute the equation of a line from two known points on it.
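In R, this piecewise linear approach is exactly what approx() from base stats does; a minimal sketch (the x and y vectors below are made up for illustration, not taken from the question's data):
# piecewise linear interpolation: predict y for a new x
x <- c(100, 400, 800, 1200, 1500)
y <- c(180, 350, 560, 760, 940)
approx(x, y, xout = 650)$y   # interpolated on the segment between (400, 350) and (800, 560)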

Related

Remove row with specific value

I have the following data:
library(data.table)
sales <- data.table(Customer = c(192,964,929,345,898,477,705,804,188,231,780,611,420,816,171,212,504,526,471,979,524,410,557,152,417,359,435,820,305,268,763,194,757,475,351,933,805,687,813,880,798,327,602,710,785,840,446,891,165,662),
Producttype = c(1,2,3,2,3,3,2,1,3,3,1,1,2,2,1,3,1,3,3,1,1,1,1,3,3,3,3,2,1,1,3,3,3,3,1,1,3,3,3,2,3,2,3,3,3,2,1,2,3,1),
Price = c(469,721,856,956,554,188,429,502,507,669,427,582,574,992,418,835,652,983,149,917,370,617,876,337,663,252,599,949,915,556,313,842,892,724,415,307,900,114,439,456,541,261,881,757,199,308,958,374,409,738),
Quarter = c(2,3,3,4,4,1,4,4,3,3,1,1,1,1,1,1,4,1,2,1,3,1,2,3,3,4,4,1,1,4,1,1,3,2,1,3,3,2,2,2,1,4,3,3,1,1,1,3,1,1))
How can I remove (let's say) the row in which Customer = 891?
And then I have another question:
If I want to manipulate the data I use data[row, column]. But when I want to use only the rows in which Quarter equals (for example) 4, I use data[Quarter == 4, ]. Why is it not data[, Quarter == 4], since Quarter is a column and not a row?
I did not find an answer on the internet that really explains why.
Thank you.
You have used the data.table function to create your data, so you can write:
sales[Customer != 891,]
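Note that this returns a new data.table rather than deleting the row in place, so assign the result back if you want to keep the change:
sales <- sales[Customer != 891]   # keep every row except Customer 891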
data[Quarter == 4, ] returns all columns for the rows where Quarter equals 4. The comma separates row selection from column selection: the expression before it selects rows, so Quarter == 4 here is a row filter, not a column.
When you use indexing, i.e., data[row, column], you are telling R to look for a specific row or column index.
Rows: sales[sales$Customer %in% c(192, 964), ] translates to "search the column Customer for any rows whose values are 192 or 964 and isolate them". Note that data.table allows the shorthand sales[Customer %in% c(192, 964), ], but data frames don't (use sales[sales$Customer %in% c(192, 964), ]).
Customer Producttype Price Quarter
1: 192 1 469 2
2: 964 2 721 3
Columns: sales[, "Customer"] translates to "search the data frame (or table) for the column named Customer and isolate all its rows":
Customer
1: 192
2: 964
3: 929
4: 345
5: 898
...
Note that this returns a data.table with one column. If you use sales[, Customer] (data.table) or sales$Customer (data frame), it will return a vector:
# [1] 192 964 929 345 898 477 705 804 188 231 780 611 420 816 171 212 504 526 471 979 524
# [22] 410 557 152 417 359 435 820 305 268 763 194 757 475 351 933 805 687 813 880 798 327
# [43] 602 710 785 840 446 891 165 662
You can of course combine the two: sales[sales$Quarter %in% 1:2, c("Customer", "Producttype")] isolates all values of Customer and Producttype from quarters 1 and 2:
Customer Producttype
1: 192 1
2: 477 3
3: 780 1
4: 611 1
5: 420 2
...

Report the mean number of characters in Corpus document

So I have a corpus set up reading a bunch of text files with paragraphs in them.
library('tm')
my.text.location <- "C:/Users//.../*/"
apapers <- VCorpus(DirSource(my.text.location))
Now I need to find the mean number of characters in each text. Running
mean(nchar(apapers), na.rm = TRUE) results in very weird output: numbers larger than the actual character counts.
Any other way to get the mean?
You didn't supply a reproducible example, but rowMeans(sapply(apapers, nchar)) will return the mean number of characters over all documents; content is the value you need. (nchar applied to the corpus object itself does not measure the documents' text, which is why your output looks wrong.)
A longer version is to run sapply over the corpus, counting the number of characters per document, then transpose the result and turn it into a data.frame. The data.frame will contain two columns, content and meta; content is the one you need. Taking the mean of the content column gives you the average number of characters in a document. The advantage of this is that you keep the table in case you need to report the numbers.
# your code
my_count <- data.frame(t(sapply(apapers, nchar)))
mean(my_count$content)
Reproducible example using the crude dataset:
library(tm)
data("crude")
crude <- as.VCorpus(crude)
# in one statement
rowMeans(sapply(crude, nchar))
content meta
1220.30 453.15
# longer version keeping intermediate results.
my_count <- data.frame(t(sapply(crude, nchar)))
mean(my_count$content)
[1] 1220.3
my_count
content meta
127 527 440
144 2634 458
191 330 444
194 394 441
211 552 441
236 2774 455
237 2747 477
242 930 453
246 2115 440
248 2066 466
273 2241 458
349 593 492
352 621 468
353 591 445
368 629 440
489 876 445
502 1166 446
543 463 447
704 1797 456
708 360 451

R is not taking the parameter hgap in layout_with_sugiyama

I'm working in R on a graph, and I'd like a hierarchical plot based on the values in the vector S (one value for each node).
lay2 <- layout_with_sugiyama(grafo, attributes="all", layers = S, hgap=10, vgap=10)
plot(lay2$extd_graph, vertex.label.cex=0.5)
However, the parameters hgap and vgap are not taken into account, and the graph is really confusing (especially since I've got 162 nodes).
Am I doing something wrong, or is there another way to produce a hierarchical graph?
I believe that layout_with_sugiyama is working just fine,
but you may be misinterpreting the output. Since you do
not provide any data, I will illustrate with some randomly
generated data.
library(igraph)
set.seed(1234)
grafo = erdos.renyi.game(162, 0.03)
lay2 <- layout_with_sugiyama(grafo, attributes="all",
hgap=10, vgap=10)
plot(lay2$extd_graph, vertex.label.cex=0.5, vertex.size=9)
I think the source of your question is the fact that the nodes
are a bit crowded together in the horizontal direction. But
that should be expected. Let's analyze the layout, starting
with the easy part, the vertical direction.
table(lay2$layout[,2])
1 11 21 31 41
24 82 42 13 1
You can see that vgap worked. The spacing is 10 units apart.
The second line up (y=11) has 82 nodes. Unless the nodes are
tiny, 82 nodes on a single, horizontal line will overlap.
But aren't they supposed to have spacing of at least 10?
They do! Let's look at that second line.
sort(lay2$layout[lay2$layout[,2]==11,1])
[1] -25 -15 -5 5 15 25 35 45 55 65 75 85 95 105 115 125 135 230
[19] 240 260 270 280 290 300 310 320 330 340 350 360 370 380 390 400 410 420
[37] 430 440 450 460 470 480 490 500 510 520 530 540 550 560 570 580 590 600
[55] 610 620 630 640 655 665 675 685 695 720 730 740 750 760 770 780 790 800
[73] 810 820 830 840 850 860 870 880 890 910
Looking at the whole graph, there is a slightly broader range.
range(lay2$layout[,1])
[1] -65 910
None of the numbers are less than 10 apart, as requested: hgap worked too!
However, what happens when you try to plot that? If you read the part of the
?igraph.plotting help page that refers to the parameter rescale,
you will see:
rescale:
Logical constant, whether to rescale the coordinates to the [-1,1] x [-1,1] interval. Defaults to TRUE, the layout will be rescaled.
So the layout will be rescaled to the range [-1, 1] and then plotted.
Scaled or not, you need to fit 82 nodes in a single, horizontal row,
so it is very difficult to avoid overlapping nodes.
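If you want the plot to honor the computed spacing, one option is to turn rescaling off and pass the layout's own ranges as axis limits. A minimal sketch reusing lay2 from above (vertex.size will likely still need tuning, since vertex sizes are not expressed in layout units):
# plot in the raw Sugiyama coordinates instead of rescaling to [-1, 1]
plot(grafo, layout = lay2$layout, rescale = FALSE, asp = 0,
     xlim = range(lay2$layout[, 1]), ylim = range(lay2$layout[, 2]),
     vertex.label.cex = 0.5, vertex.size = 9)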

Natural Neighbor Interpolation in R

I need to conduct Natural Neighbor Interpolation (NNI) in R in order to smooth my numeric data. For example, say I have very noisy data; my goal is to use NNI to model the data neatly.
I have several hundred rows of data (one observation for each postcode), alongside latitudes and longitudes. I've made up some data below:
Postcode lat lon Value
200 -35.277272 149.117136 7
221 -35.201372 149.095065 38
800 -12.801028 130.955789 27
801 -12.801028 130.955789 3
804 -12.432181 130.84331 29
810 -12.378451 130.877014 20
811 -12.376597 130.850489 3
812 -12.400091 130.913672 42
814 -12.382572 130.853877 32
820 -12.410444 130.856124 39
821 -12.426641 130.882367 39
822 -12.799278 131.131697 49
828 -12.474896 130.907378 38
829 -14.460879 132.280002 34
830 -12.487233 130.972637 8
831 -12.480066 130.984006 49
832 -12.492269 130.990891 29
835 -12.48138 131.029173 33
836 -12.525546 131.103025 40
837 -12.460094 130.842663 39
838 -12.709507 130.995407 28
840 -12.717562 130.351316 22
841 -12.801028 130.955789 8
845 -13.038663 131.072091 19
846 -13.226806 131.098416 50
847 -13.824123 131.835799 11
850 -14.464497 132.262021 2
851 -14.464497 132.262021 23
852 -14.92267 133.064654 36
854 -16.81839 137.14707 17
860 -19.648306 134.186642 3
861 -18.94406 134.318373 8
862 -20.231104 137.762232 28
870 -12.436101 130.84059 24
871 -12.436101 130.84059 16
Is there any kind of package that will do this? I should mention that the only predictors I am using in this model are latitude and longitude. If there isn't a package that can do this, how can I implement it manually? I've searched extensively and I can't figure out how to implement this in R. I have seen one or two other SO posts, but they haven't helped me figure it out.
Please let me know if there's anything I must add to the question. Thanks.
I suggest the following:
Reproject the data to the corresponding UTM zone.
Use the whitebox R package (a frontend to WhiteboxTools) to process the data using natural neighbour interpolation.
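A minimal sketch of that pipeline, assuming the sf and whitebox packages and an initialized WhiteboxTools install; pc stands for the postcode table from the question, and the file names, UTM zone, and cell size are placeholders (check the arguments of wbt_natural_neighbour_interpolation() against the current whitebox documentation):
library(sf)
library(whitebox)
# pc: data frame with columns Postcode, lat, lon, Value (as in the question)
pts <- st_as_sf(pc, coords = c("lon", "lat"), crs = 4326)  # WGS84 lat/lon
pts_utm <- st_transform(pts, 32752)  # e.g. UTM zone 52S; pick the zone covering your data
st_write(pts_utm, "points.shp")
# natural neighbour interpolation of Value onto a raster grid
wbt_natural_neighbour_interpolation(input = "points.shp", field = "Value",
                                    output = "nni.tif", cell_size = 1000)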

How do I make sure numbers are numeric from a .txt?

I'm setting up a script to extract the thickness and breakdown voltages from a single-column text file and fit a Weibull distribution to them. When I try to use fitdistr() I get an error stating "'x' must be a non-empty numeric vector". R is supposed to interpret numbers in text files as numeric, but that doesn't seem to be happening here. Any thoughts?
filename <- "SampleBreakdownSet.txt"
d <- read.table(filename, header = FALSE, sep = "")
#Extract thickness from the dataset; set to variable t
t = d[1,1]
#Extract the breakdown voltages and toss into dataset, BDV
BDV = tail(d,(nrow(d)-1))
#Calculates the breakdown field from the thickness and BDV
BDF = (BDV*10000) / t
#Calculates the Weibull parameters from the input breakdown voltages.
fitdistr(BDF, densfun ="weibull", lower = 0)
fitdistr(BDF, densfun ="weibull", lower = 0)
Error in fitdistr(BDF, densfun = "weibull", lower = 0) :
'x' must be a non-empty numeric vector
Sample data I'm using:
2
200
250
450
320
100
400
200
403
502
203
420
120
342
304
253
423
534
534
243
253
423
123
433
534
234
633
432
342
543
532
123
453
231
532
342
213
243
You are passing a data.frame to fitdistr, but you should be passing the vector itself.
Try this:
d <- read.table(text='200
250
450
320
100
400
200
403
502
203
420
120
342
304
253
423
534
534
243
253
423
123
433
534
234
633
432
342
543
532
123
453
231
532
342
213
243', header=FALSE)
t <- d[1,1]
#Extract the breakdown voltages and toss into dataset, BDV
BDV <- d[-1, 1]
BDF <- (BDV*10000) / t
library(MASS)
fitdistr(BDF, densfun ="weibull", lower = 0)
Alternatively, if you keep BDF as a one-column data frame (as in your original code, where BDV is created with tail()), you can refer to the relevant column when calling fitdistr, e.g.:
fitdistr(BDF$V1, densfun ="weibull", lower = 0)
# shape scale
# 2.745485e+00 1.997509e+04
# (3.716797e-01) (1.283667e+03)
