I would like to use an equivalent of scipy griddata because the following error appears:
QH7074 qhull warning: more than 16777215 ridges. ID field overflows and two ridges
may have the same identifier. Otherwise output ok.
which sometimes kills my calculation or just makes it very slow.
Currently I have:
test_interp= griddata((xx_points.flatten(),yy_points.flatten()),values.flatten(), (xi+X,yi+V), method='linear',fill_value=nan)
I had exactly the same error message and it looks like a limitation of griddata for big data, according to this thread. The limit is 2^24=16777216 'ridges' (I don't know exactly what it refers to). I did exactly the same interpolation with less data points and this problem didn't occur. My guess is that you should find a way to split your dataset to be below the 2^24 threshold.
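One way to split the dataset is sketched below, assuming the data can be cut into overlapping strips along x and each strip triangulated on its own. The helper names (strip_bounds, griddata_in_strips) and the overlap parameter are hypothetical, not part of scipy. Near strip edges the result can differ from a single global triangulation unless the overlap is generous, and a strip whose points are collinear would still make Qhull fail.

```python
def strip_bounds(lo, hi, n_strips, overlap):
    """Split [lo, hi] into n_strips query intervals, each with a wider data interval.

    Returns a list of (query_lo, query_hi, data_lo, data_hi) tuples.
    """
    width = (hi - lo) / n_strips
    bounds = []
    for i in range(n_strips):
        q_lo = lo + i * width
        # Use hi exactly for the last strip to avoid rounding gaps.
        q_hi = lo + (i + 1) * width if i < n_strips - 1 else hi
        bounds.append((q_lo, q_hi, q_lo - overlap, q_hi + overlap))
    return bounds


def griddata_in_strips(px, py, pv, qx, qy, n_strips=4, overlap=0.0):
    """Linear interpolation done strip by strip (hypothetical helper).

    Returns a flat array with one value per query point; points that no
    strip can interpolate stay NaN, like fill_value=nan in the original call.
    """
    # Imported here so the strip logic above can be used without scipy.
    import numpy as np
    from scipy.interpolate import griddata

    px, py, pv, qx, qy = (
        np.asarray(a, dtype=float).ravel() for a in (px, py, pv, qx, qy)
    )
    out = np.full(qx.shape, np.nan)
    bounds = strip_bounds(px.min(), px.max(), n_strips, overlap)
    for i, (q_lo, q_hi, d_lo, d_hi) in enumerate(bounds):
        last = i == len(bounds) - 1
        # Half-open query intervals, closed on the right for the last strip.
        q_mask = (qx >= q_lo) & ((qx <= q_hi) if last else (qx < q_hi))
        d_mask = (px >= d_lo) & (px <= d_hi)
        # A 2-D triangulation needs at least 3 data points.
        if q_mask.any() and d_mask.sum() >= 3:
            out[q_mask] = griddata(
                (px[d_mask], py[d_mask]), pv[d_mask],
                (qx[q_mask], qy[q_mask]),
                method="linear", fill_value=np.nan,
            )
    return out
```

With the variables from the question, the call would look like griddata_in_strips(xx_points, yy_points, values, xi+X, yi+V, n_strips=8, overlap=...), followed by a reshape to the shape of xi if a grid is needed. The number of strips and the overlap have to be chosen for the data at hand.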
I am working with code that describes a Poisson cluster process in spatstat, and I am breaking it down one line at a time to understand it. The beginning is easy.
library(spatstat)
lambda<-100
win<-owin(c(0,1),c(0,1))
n.seeds<-lambda*win$xrange[2]*win$yrange[2]
Once the window is defined I then generate my points using a random generation function
x=runif(min=win$xrange[1],max=win$xrange[2],n=pmax(1,n.seeds))
y=runif(min=win$yrange[1],max=win$yrange[2],n=pmax(1,n.seeds))
I know this can be plotted straight away using the ppp function:
seeds<-ppp(x=x,
y=y,
window=win)
plot(seeds)
In the next line I add marks to the ppp object. They apparently describe the angle of rotation of the points. I don't understand how this works yet, but that is okay; I will figure it out later.
marks<-data.frame(angles=runif(n=pmax(1,n.seeds),min=0,max=2*pi))
seeds1<-ppp(x=x,
y=y,
window=win,
marks=marks)
The first problem I encounter is that an object called pops, describing the populations of the window, is added to the ppp object. I understand how the values are derived: they come from a Poisson distribution with the given input mean (which can be any value), and the number of observations equals the number of points in the window.
seeds2<-ppp(x=x,
y=y,
window=win,
marks=marks,
pops=rpois(lambda=5,n=pmax(1,n.seeds)))
My first question is, how is it possible to add a variable that has no classification in the ppp object? I checked the ppp documentation and there is no mention of pops.
The second question I have is about using double variables, the next line requires an sapply function to define dimensions.
dim1<-pmax(1,sapply(seeds1$marks$pops, FUN=function(x)rpois(n=1,sqrt(x))))
I have never seen the $ operator being used twice, and seeds2$marks$pop returns the error "$ operator is invalid for atomic vectors". Could you explain what is going on here?
Many thanks.
That's several questions - please ask one question at a time.
From your post it is not clear whether you are trying to understand someone else's code, or developing code yourself. This makes a difference to the answer.
Just to clarify, this code does not come from inside the spatstat package; it is someone's code using the spatstat package to generate data. There is code in the spatstat package to generate simulated realisations of a Poisson cluster process (which is I think what you want to do), and you could look at the spatstat code for rPoissonCluster to see how it can be done correctly and efficiently.
The code you have shown here has numerous errors. But I will start by answering the two questions in your title.
The rules for creating ppp objects are set out in the help file for ppp. The help says that if the argument window is given, then unmatched arguments ... are ignored. This means that in the line seeds2<-ppp(x=x,y=y,window=win,marks=marks,pops=rpois(lambda=5,n=pmax(1,n.seeds)))
the argument pops will be ignored.
The idiom sapply(seeds1$marks$pops, FUN=f) is perfectly valid syntax in R. If the object seeds1 is a structure or list which has a component named marks, which in turn is a structure or list which has a component named pops, then the idiom seeds1$marks$pops would extract it. This has nothing particularly to do with sapply.
Now turning to errors in the code,
The line n.seeds<-lambda*win$xrange[2]*win$yrange[2] is presumably meant to calculate the expected number of cluster parents (cluster seeds) in the window. This would only work if the window is a rectangle with bottom left corner at the origin (0,0). It would be safer to write n.seeds <- lambda * area(win).
However, the variable n.seeds is used later as if it were the number of cluster parents (cluster seeds). The author has forgotten that the number of seeds is random with a Poisson distribution. So, the more correct calculation would be n.seeds <- rpois(1, lambda * area(win))
However this is still not correct because cluster parents (seed points) outside the window can also generate offspring points inside the window. So, seed points must actually be generated in a larger window obtained by expanding win. The appropriate command used inside spatstat to generate the cluster parents is bigwin <- grow.rectangle(Frame(win), cluster_diameter) ; Parents <- rpoispp(lambda, bigwin)
The author apparently wants to assign two mark values to each parent point: a random angle and a random number pops. The correct way to do this is to make the marks a data frame with two columns, for example marks(seeds1) <- data.frame(angles=runif(n.seeds, max=2*pi), pops=rpois(n.seeds, 5))
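The points above (a Poisson number of parents, generated in an enlarged window, each carrying two marks) can be sketched outside spatstat as follows. This is plain Python for illustration only, not spatstat's implementation; the function names, the margin argument (standing in for the cluster diameter) and the default mean of 5 for pops are assumptions.

```python
import math
import random


def rpois(mean, rng):
    """Draw one Poisson variate with Knuth's method (fine for modest means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_parents(lam, xrange, yrange, margin, rng, pops_mean=5):
    """Generate cluster parents in the window expanded by `margin` on each side.

    The number of parents is Poisson with mean lam * area(expanded window),
    and each parent carries two marks: a random angle and a random count.
    """
    (x0, x1), (y0, y1) = xrange, yrange
    bx0, bx1 = x0 - margin, x1 + margin
    by0, by1 = y0 - margin, y1 + margin
    area = (bx1 - bx0) * (by1 - by0)
    n_parents = rpois(lam * area, rng)  # random count, not lam * area itself
    parents = []
    for _ in range(n_parents):
        parents.append({
            "x": rng.uniform(bx0, bx1),
            "y": rng.uniform(by0, by1),
            "angle": rng.uniform(0.0, 2.0 * math.pi),
            "pops": rpois(pops_mean, rng),
        })
    return parents
```

For example, simulate_parents(100, (0, 1), (0, 1), 0.1, random.Random(1)) gives a list whose length changes from seed to seed, which is the behaviour the fixed n.seeds in the original code was missing.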
I'm a total R beginner and try to cluster user data using the function skmeans.
I always get the error message:
"Error in if (!all(row_norms(x) > 0)) stop("Zero rows are not allowed.") :
missing value where TRUE/FALSE needed".
There already is a topic about this error message explaining that zeros are not allowed in rows.
However, my blueprint for what I'm trying to do is an example based on a data set which is also full of zeros. Working with this example, the error message does not appear and the function works fine. The error message only occurs when I apply the same procedure to my data set which doesn't seem different from the blueprint's data set.
Here's the function used for the kmeans:
weindaten.clusters <- skmeans(wendaten.tr, 5, method="genetic")
And here's the data set:
For my own data set, I used this function
kunden.cluster<- skmeans(test4, 5, method="genetic")
for this data set:
Could somebody please help me understand what the difference between the two data sets is (vector vs. something else maybe) and how I can change my data to be able to use skmeans?
You cannot use spherical k-means on this data.
Spherical k-means uses angles for similarity. But the all-zero row cannot be used in angular computations.
Choose a different algorithm, unless you can treat the all-zero row specially (for example on text, this would be an empty document).
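To see why an all-zero row is a problem, here is a minimal sketch in plain Python (not the skmeans package; the function names are made up for illustration). Cosine similarity divides by the row norms, so a row with norm zero has no defined angle and has to be set aside before clustering.

```python
import math


def row_norm(row):
    """Euclidean length of one row."""
    return math.sqrt(sum(v * v for v in row))


def split_zero_rows(rows):
    """Separate usable rows from all-zero rows; return (kept_rows, zero_row_indices)."""
    kept, zero_idx = [], []
    for i, row in enumerate(rows):
        if row_norm(row) > 0:
            kept.append(row)
        else:
            zero_idx.append(i)
    return kept, zero_idx


def cosine_similarity(a, b):
    """Cosine of the angle between two rows; undefined if either row is all zero."""
    na, nb = row_norm(a), row_norm(b)
    if na == 0 or nb == 0:
        raise ValueError("cosine similarity is undefined for an all-zero row")
    return sum(x * y for x, y in zip(a, b)) / (na * nb)
```

Individual zeros inside a row are fine, which is why the example data set works; only a row that is zero in every column has to be handled specially.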
I am using BsplinesComp for a sample problem.
The objective is to maximize the area under the line.
My problem arises when I want to set a constraint for one of the values in the output array that bspline gives. So a value such that the spline goes through that no matter what configuration it is in.
I tried this in two ways and I have uploaded the codes. They are both very badly coded, so I think there is a neater way to do so. Links to codes:
https://gist.github.com/stackoverflow38/5eae1e86c5802a4df91becdf580d28c5
1- Using an extra explicit component in which the middle array value is imposed to be a selected value
2- Tried to use an execcomp but I get an error. Target shapes do not match.
I vaguely remember reading such a question but could not find it.
Overall I am trying to set a constraint for either the first, middle or last value of the bspline and some range that it should be in.
Similar to the plots here
So, I think you want to know the best way to do this, and the best way is to not use any extra components at all. You can directly constrain a single point in the output of the BsplinesComp by using the "indices" argument in the add_constraint call. Here, I constrain the first point in the spline to lie on the interval [-1, 1].
model.add_constraint('interp.h', lower=-1, upper=1, indices=[0])
Running the model gives me a shape that looks more like one of the ones you included.
Just for reference, for the errors you got with 1 and 2:
1- Not sure what is wrong here, but maybe the version you uploaded isn't the latest. You never used the AeraComp in a constraint, so it didn't do anything.
2- The exception was due to a size mismatch in connecting the vector output of the Bsplines comp to a scalar expression. You can make this connection by specifying "src_indices", a list of which indices in the array to connect to the target: model.connect('interp.h', 'execcomp.x', src_indices=[0])
I was trying to run some entropy() calculations on Force Platform data and I get a warning message:
> library(entropy)
> d2 <- read.csv("c:/users/SLA9DI/Documents/data2.csv")
> entropy(d2$CoPy, method="MM")
[1] 10.98084
> entropy(d2$CoPx, method="MM")
[1] 391.2395
Warning message:
In log(freqs) : NaNs produced
I am sure it is because entropy() is trying to take the log of a negative number. I also know R can do complex numbers using complex(); however, I have not been successful in getting it to work with my data. I did not get this warning on my CoPy data, only on the CoPx data (a force platform records Center of Pressure data in 2 dimensions). Does anyone have any suggestions on getting complex() to work on my data set, or is there another function that would work better to get a proper entropy calculation? Entropy shouldn't be that much greater in CoPx compared to CoPy. I also tried it with some more data sets from other subjects and the same thing happened: CoPx entropy measures gave me warning messages and CoPy measurements did not. I am attaching a data set link so anyone can try it out for themselves and see if they can figure it out, as the data is a little long to post here.
Data
Edit: Correct Answer
As suggested, I tried the table(...) function and received no warning/error message, and the entropy output was in the expected range as well. However, I had apparently overlooked the function discretize() in the package, and that is what you are supposed to use to correctly set up the data for the entropy calculation.
I think there's no point in applying the entropy function on your data. According to ?entropy, it
estimates the Shannon entropy H of the random variable Y from the corresponding observed counts y
(emphasis mine). This means that you need to convert your data (which seems to be continuous) to count data first, for instance by binning it.
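The binning step can be sketched in plain Python as below. This is only an illustration of the idea, not the entropy package: the function names are hypothetical, the bins are equal-width, and the estimator is the simple plug-in one rather than the "MM" method used in the question.

```python
import math


def bin_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # all values identical: everything lands in the first bin
    counts = [0] * n_bins
    for v in values:
        # The maximum value would index one past the end, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def shannon_entropy(counts):
    """Plug-in Shannon entropy (in nats) from non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)
```

Counts are never negative, so the logarithm is always defined, whereas raw CoPx measurements can be negative and then produce NaNs when treated as counts. For example, bin_counts([-2.0, -1.0, 1.0, 2.0], 2) gives [2, 2], and the entropy of those counts is log(2).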
I'm running an optimisation routine using optim in R and I'm telling the programme what I want returned. For example, if I put return(op1$par), it will return all 4 of my variable values. That's fine, and if I run return(op1), I obviously get all the information from the optimisation routine (par, value, convergence etc). However, in this format, the par values aren't accessible in the output; it simply details that there are 4 values.
Now what I need is to get the parameter values and the convergence information at the same time. R won't let me call return(op1$par, op1$convergence), so I'm looking for the best way to get these two entities in one run.
I should specify that I'm writing this to a file for 1000s of iterations and not just looking to call it up once on screen.
Cheers
Try something like this:
return(c(Parameters=op1$par, Convergence=op1$convergence))
The names Parameters and Convergence are only for identifying what are the parameters and what is the convergence, since this result will be a vector.
By design, a function can return only one object (or else assignments like a <- fn(b) would get confusing; which thing do you assign?). But that object can be a vector, or a list (which is what optim does). So wrap your arguments in something like
return(c(par=op1$par, convergence=op1$convergence))
or more generally (for objects of different types),
return(list(par=op1$par, convergence=op1$convergence))
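For comparison, the same "return one container, then write one row per run" pattern is sketched in Python below. This is not R and not optim: run_once is a hypothetical stand-in for one optimisation run, and the one-row-per-iteration file layout is an assumption about what the output file should look like.

```python
import csv
import io


def run_once(seed):
    """Hypothetical stand-in for one optimisation run.

    Returns a single object (a dict) that holds both pieces of information.
    """
    par = [seed * 0.5, seed + 1.0]  # made-up parameter values
    convergence = 0                 # 0 meaning success, as in optim
    return {"par": par, "convergence": convergence}


def write_runs(results, handle):
    """Write one row per run: the parameters followed by the convergence code."""
    writer = csv.writer(handle)
    for result in results:
        writer.writerow(list(result["par"]) + [result["convergence"]])


# Usage: collect a few runs and write them to an in-memory file.
buffer = io.StringIO()
write_runs([run_once(s) for s in range(3)], buffer)
```

Flattening the parameters and the convergence code into a single row corresponds to the c(...) version above, while the dict corresponds to the list(...) version.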