Suppose we have two objects created using the density() function. Is there a way to add these two objects to get another density (or similar) object?
For example:
A = rnorm(100)
B = rnorm(1000)
dA = density(A)
dB = density(B)
dC = density(c(A, B))
Is there a way to get the dC object from the dA and dB objects? Some kind of sum operation?
A return from density is a list with these parts:
> str(dA)
List of 7
$ x : num [1:512] -3.67 -3.66 -3.65 -3.64 -3.63 ...
$ y : num [1:512] 0.00209 0.00222 0.00237 0.00252 0.00268 ...
$ bw : num 0.536
$ n : int 100
$ call : language density.default(x = A)
$ data.name: chr "A"
$ has.na : logi FALSE
- attr(*, "class")= chr "density"
Note that the original data isn't in there, so we can't recover it and simply do something like dAB = density(c(dA$data, dB$data)).
The x and y components form the density curve, which you can plot with plot(dA$x, dA$y). You might think all you need to do is add the y values from the two density objects, but there's no guarantee they'll be evaluated at the same x points.
So maybe you think you can interpolate one density onto the other's x points and then add the y values. But the sum won't integrate to 1 the way a proper density must. What you should do instead is scale dA$y and dB$y by the fraction of the pooled points in each component density, which you can get from the n component (e.g. dA$n).
If you don't understand that last point, consider the following two densities, one from 1000 points and one from 500:
dA = density(runif(1000))
dB = density(runif(500)+10)
The first is a uniform between 0 and 1, the second a uniform between 10 and 11. Both have height 1, and their ranges don't overlap, so if you simply added them you'd get two steps of equal height. But the density of their union:
dAB = density(c(runif(1000), runif(500)+10))
is a density with twice as much mass between 0 and 1 as between 10 and 11. When adding densities estimated from samples, you need to weight by the sample sizes.
So if you interpolate the two densities onto the same x values and then sum the y values, weighted according to the n values, you get something that approximates density(c(A, B)).
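The same recipe can be sketched outside R as well. Here is an illustrative Python version (an assumption of this answer, not code from the question), using scipy.stats.gaussian_kde in place of R's density(): evaluate both estimates on one common grid, then average them weighted by sample size.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
A = rng.normal(size=100)
B = rng.normal(size=1000)

# one kernel density estimate per sample, playing the role of dA and dB
kA, kB = gaussian_kde(A), gaussian_kde(B)

# evaluate both on a single common grid: the "interpolate to the
# same x points" step
x = np.linspace(-4.0, 4.0, 512)

# weight each curve by its share of the pooled sample
nA, nB = len(A), len(B)
y = (nA * kA(x) + nB * kB(x)) / (nA + nB)

# the weighted mixture still integrates to roughly 1
mass = float(y.sum() * (x[1] - x[0]))
print(mass)
```

The unweighted sum kA(x) + kB(x) would integrate to about 2, which is exactly why the n-weighting matters.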
I am trying to compute rP+r'Q on Sage where r,r' are positive integers and
P=(38*a + 31 : 69*a + 77 : 1),
Q=(106*a + 3 : a + 103 : 1)
two points on the elliptic curve E:y^2=x^3-x over GF(107^2).
Now I tried to define P and Q in Sage exactly as written above, but that gives a syntax error. So how do I define points in Sage?
So the answer was ridiculously simple. Although Sage prints points as (a : b : c), you have to construct them with commas, e.g. E(38*a + 31, 69*a + 77, 1) or equivalently E([38*a + 31, 69*a + 77, 1]). How did I just spend over an hour figuring that out?
The Gauss function has infinitely many jump discontinuities, at x = 1/n for positive integers n.
I want to draw a diagram of the Gauss function.
Using the Maxima CAS I can draw it with a simple command:
f(x):= 1/x - floor(1/x); plot2d(f(x),[x,0,1]);
but the result is not good (near x = 0 it should look like it does here).
Maxima also complains:
plot2d: expression evaluates to non-numeric value somewhere in plotting range.
I can define a piecewise function (with jump discontinuities at x = 1/n for positive integers n), so I tried:
define( g(x), for i:2 thru 20 step 1 do if (x=i) then x else (1/x) - floor(1/x));
but it doesn't work.
I could also use Chebyshev polynomials to approximate the function (as in A Graduate Introduction to Numerical Methods: From the Viewpoint of Backward Error Analysis by Robert M. Corless and Nicolas Fillion).
How do I do this properly?
For plot2d you can set the adapt_depth and nticks options. The default values are 5 and 29, respectively. Calling set_plot_option() with no arguments returns the current list of option values. If you increase adapt_depth and/or nticks, plot2d will use more points for plotting; perhaps that makes the figure look good enough.
Another way is to use the draw2d function (in the draw package) and explicitly tell it to plot each segment. We know that there are discontinuities at 1/k, for k = 1, 2, 3, .... We have to decide how many segments to plot. Let's say 20.
(%i6) load (draw) $
(%i7) f(x):= 1/x - floor(1/x) $
(%i8) makelist (explicit (f, x, 1/(k + 1), 1/k), k, 1, 20);
(%o8) [explicit(f,x,1/2,1),explicit(f,x,1/3,1/2),
explicit(f,x,1/4,1/3),explicit(f,x,1/5,1/4),
explicit(f,x,1/6,1/5),explicit(f,x,1/7,1/6),
explicit(f,x,1/8,1/7),explicit(f,x,1/9,1/8),
explicit(f,x,1/10,1/9),explicit(f,x,1/11,1/10),
explicit(f,x,1/12,1/11),explicit(f,x,1/13,1/12),
explicit(f,x,1/14,1/13),explicit(f,x,1/15,1/14),
explicit(f,x,1/16,1/15),explicit(f,x,1/17,1/16),
explicit(f,x,1/18,1/17),explicit(f,x,1/19,1/18),
explicit(f,x,1/20,1/19),explicit(f,x,1/21,1/20)]
(%i9) apply (draw2d, %);
I have made a list of segments with their endpoints. The result is:
and full code is here
Edit: to get a smaller file size, use shorter lists where the curve is almost a straight line, e.g.
if (n > 20) then iMax : 10 else iMax : 250,
in the GivePart function.
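For comparison, the same branch-by-branch idea can be sketched in Python with numpy (this sketch is my own assumption, not part of the Maxima answer): on the open interval (1/(k+1), 1/k) we have floor(1/x) = k, so each continuous branch of f is simply 1/x - k, and plotting each branch separately avoids the spurious vertical lines at the jumps.

```python
import numpy as np

def branch(k, n=100):
    """Sample the k-th continuous branch of f(x) = 1/x - floor(1/x),
    staying strictly inside the open interval (1/(k+1), 1/k)."""
    x = np.linspace(1.0 / (k + 1), 1.0 / k, n + 2)[1:-1]
    return x, 1.0 / x - k  # on this interval floor(1/x) == k

# the first 20 branches; hand each (x, y) pair to any plotting routine
segments = [branch(k) for k in range(1, 21)]

# sanity check: the branch formula agrees with the generic definition
for x, y in segments:
    assert np.allclose(y, 1.0 / x - np.floor(1.0 / x))
```

Each (x, y) pair corresponds to one explicit(f, x, 1/(k+1), 1/k) segment in the draw2d solution above.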
I'm using the function dbscan::dbscan in order to cluster my data by location and density.
My data looks like this:
str(data)
'data.frame': 4872 obs. of 3 variables:
$ price : num ...
$ lat : num ...
$ lng : num ...
Now I'm using following code:
EPS = 7
cluster.dbscan <- dbscan(data, eps = EPS, minPts = 30, borderPoints = T,
search = "kdtree")
plot(lat ~ lng, data = data, col = cluster.dbscan$cluster + 1L, pch = 20)
but the result isn't satisfying at all; the points aren't really clustered.
I would like to have the clusters nicely defined, something like this:
I also tried a decision tree classifier (tree::tree), which works better, but I can't tell whether it is really a good classification.
File:
http://www.file-upload.net/download-11246655/file.csv.html
Question:
Is it possible to achieve what I want?
Am I using the right method?
Should I play more with the parameters? If yes, with which?
This is the output of a careful density-based clustering using the quite new HDBSCAN* algorithm.
Using Haversine distance, instead of Euclidean!
It identified some 50-something regions that are substantially more dense than their surroundings. In this figure, some clusters look as if they had only 3 elements, but they do have many more.
The outermost area contains the noise points, which do not belong to any cluster at all!
(Parameters used: -verbose -dbc.in file.csv -parser.labelIndices 0,1 -algorithm clustering.hierarchical.extraction.HDBSCANHierarchyExtraction -algorithm SLINKHDBSCANLinearMemory -algorithm.distancefunction geo.LatLngDistanceFunction -hdbscan.minPts 20 -hdbscan.minclsize 20)
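If you want to stay in a scripting environment rather than ELKI, the same idea, density-based clustering with a geodesic distance, can be sketched with scikit-learn (this snippet and its synthetic data are illustrative assumptions, not the run shown above). sklearn's haversine metric expects [lat, lng] in radians, and eps then becomes an angle, so divide a metre threshold by the Earth's radius:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
# two synthetic dense spots plus scattered noise, in degrees
spot1 = rng.normal(loc=(48.85, 2.35), scale=0.01, size=(100, 2))
spot2 = rng.normal(loc=(52.52, 13.40), scale=0.01, size=(100, 2))
noise = np.column_stack([rng.uniform(40, 55, 20), rng.uniform(-5, 20, 20)])
X = np.radians(np.vstack([spot1, spot2, noise]))  # [lat, lng] in radians

earth_radius_m = 6_371_000
db = DBSCAN(eps=5_000 / earth_radius_m,   # 5 km expressed as an angle
            min_samples=20,
            metric="haversine", algorithm="ball_tree").fit(X)
print(sorted(set(db.labels_)))  # -1 is the noise label
```

With your file you would replace the synthetic coordinates by the lat/lng columns of the CSV (and leave price out of the distance computation).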
OPTICS is another density-based algorithm, here is a result:
Again, we have a "noise" area whose red dots are not dense at all.
Parameters used: -verbose -dbc.in file.csv -parser.labelIndices 0,1 -algorithm clustering.optics.OPTICSXi -opticsxi.xi 0.1 -algorithm.distancefunction geo.LatLngDistanceFunction -optics.minpts 25
The OPTICS plot for this data set looks like this:
You can see there are many small valleys that correspond to clusters. But there is no "large" structure here.
You probably were looking for a result like this:
But in fact, this is a meaningless and rather random way of breaking the data into large chunks. Sure, it minimizes variance; but it does not at all care about the structure of the data. Points within one cluster will frequently have less in common than points in different clusters. Just look at the points at the border between the red, orange, and violet clusters.
Last but not least, the old-timers: hierarchical clustering with complete linkage:
and the dendrogram:
(Parameters used: -verbose -dbc.in file.csv -parser.labelIndices 0,1 -algorithm clustering.hierarchical.extraction.SimplifiedHierarchyExtraction -algorithm AnderbergHierarchicalClustering -algorithm.distancefunction geo.LatLngDistanceFunction -hierarchical.linkage CompleteLinkageMethod -hdbscan.minclsize 50)
Not too bad. Complete linkage works on such data rather well, too. But you could merge or split any of these clusters.
You can use the hullplot function from the dbscan package.
In your case:
hullplot(select(data, lng, lat), cluster.dbscan$cluster)
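If you are working outside R, there is no hullplot, but the picture is easy to approximate by outlining each cluster with its convex hull. A hypothetical Python sketch with scipy (the cluster data here is made up for illustration):

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
# stand-in for (lng, lat) points with a cluster label per row
pts = np.vstack([rng.normal((0.0, 0.0), 0.1, (50, 2)),
                 rng.normal((2.0, 2.0), 0.1, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)

for lab in np.unique(labels):
    cluster = pts[labels == lab]
    hull = ConvexHull(cluster)
    # hull.vertices indexes the boundary points in counter-clockwise
    # order; repeat the first one to close the loop before drawing
    outline = cluster[np.append(hull.vertices, hull.vertices[0])]
    print(lab, len(outline))
```

Plotting each outline (with any 2-D plotting library) over the scatter of points reproduces the hullplot look.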
I have a generated elliptic curve of a modulus. I want to list just a few points on it (doesn't matter what they are, I just need one or two) and I was hoping to do:
E.points()
However, due to the size of the curve this generates the error:
OverflowError: range() result has too many items
I attempted to list the first four by calling it like this:
E.points()[:4]
However, that generated the same error.
Is there any way I can make it list just a few points? Maybe some Sage function?
Since you did not include code to reproduce your situation, I take an example curve from the Sage documentation:
sage: E = EllipticCurve(GF(101),[23,34])
Generating random points
You can repeatedly use random_element or random_point to choose points at random:
sage: E.random_point()
(99 : 92 : 1)
sage: E.random_point()
(27 : 80 : 1)
This is probably the simplest way to obtain a few arbitrary points on the curve. random_element works in many places in Sage.
Intersecting with lines
The curve has the defining polynomial
sage: p = E.defining_polynomial(); p
-x^3 + y^2*z - 23*x*z^2 - 34*z^3
which is homogeneous in x, y, z. One way to find some points on the curve is to intersect it with straight lines. For example, you could intersect it with the line y = 0 and set z = 1 to choose representatives (thus omitting representatives with z == 0) using
sage: p(y=0,z=1).univariate_polynomial().roots(multiplicities=False)
[77]
So at that point you know that (77 : 0 : 1) is a point on your curve. You can automate things, intersecting with different lines until you have reached the desired number of points:
sage: res = []
sage: y = 0
sage: while len(res) < 4:
....: for x in p(y=y,z=1).univariate_polynomial().roots(multiplicities=False):
....: res.append(E((x, y, 1)))
....: y += 1
....:
sage: res[:4]
[(77 : 0 : 1), (68 : 1 : 1), (23 : 2 : 1), (91 : 4 : 1)]
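If you only need a couple of points and would rather avoid Sage internals entirely, note that for this example curve the intersection idea amounts to solving y^2 = x^3 + 23x + 34 over GF(101) directly. A plain-Python brute-force sketch (fine for a field this small, hopeless for cryptographic sizes):

```python
# brute-force search for affine points on y^2 = x^3 + 23*x + 34 (mod 101),
# the example curve E above
p = 101
points = []
for x in range(p):
    rhs = (x ** 3 + 23 * x + 34) % p
    # try every y; for each quadratic residue rhs there are two roots
    for y in range(p):
        if (y * y) % p == rhs:
            points.append((x, y))
    if len(points) >= 4:
        break
print(points[:4])
```

Each (x, y) found this way corresponds to the projective point (x : y : 1) on E.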
Adapting points()
You can have a look at how the points() method is implemented. Type E.points?? and you will see that it uses an internal method called _points_via_group_structure. Looking at the source of that (using E._points_via_group_structure?? or the link to the repo), you can see how that is implemented, and probably adapt it to only yield a smaller result. In particular you can see what role that range plays here, and use a smaller range instead.