FancyRpartPlot - What does the number inside the node mean?

Does anyone know what this means? I'm confused by the information inside the decision tree below. I tried to trace the numbers back to the variables, but I could not find anything.

Normally you see the probability of each class and the percentage of cases falling into that node when you plot with fancyRpartPlot, but in your case your target attribute is probably numeric, and then the number in the box is the mean of the responses for that split.
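As an illustration, here is a minimal sketch using the built-in mtcars data (only a stand-in for your dataset) that contrasts the two cases:
library(rpart)
library(rattle)   # provides fancyRpartPlot
# Numeric target: each node shows the mean response for that split
fit_num <- rpart(mpg ~ wt + hp, data=mtcars)
fancyRpartPlot(fit_num)
# Categorical target: each node shows class probabilities and the % of cases
fit_cls <- rpart(factor(am) ~ wt + hp, data=mtcars, method="class")
fancyRpartPlot(fit_cls)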

Related

What are the rules for ppp objects? Is selecting two variables possible for an sapply function?

I am working with code that describes a Poisson cluster process in spatstat, breaking down each line of code one at a time to understand it. The beginning is easy.
library(spatstat)
lambda<-100
win<-owin(c(0,1),c(0,1))
n.seeds<-lambda*win$xrange[2]*win$yrange[2]
Once the window is defined, I then generate my points using a random number generation function:
x=runif(min=win$xrange[1],max=win$xrange[2],n=pmax(1,n.seeds))
y=runif(min=win$yrange[1],max=win$yrange[2],n=pmax(1,n.seeds))
I know this can be plotted straight away using the ppp function:
seeds<-ppp(x=x,
y=y,
window=win)
plot(seeds)
In the next line I add marks to the ppp object. The marks apparently describe the angle of rotation of the points; I don't understand how this works right now, but that is okay, I will figure it out later.
marks<-data.frame(angles=runif(n=pmax(1,n.seeds),min=0,max=2*pi))
seeds1<-ppp(x=x,
y=y,
window=win,
marks=marks)
The first problem I encounter is that an object called pops, describing the populations of the window, is added to the ppp object. I understand how the values are derived: it is a Poisson distribution with a given input value mu, which can be any value, and a total number of observations equal to the number of points in the window.
seeds2<-ppp(x=x,
y=y,
window=win,
marks=marks,
pops=rpois(lambda=5,n=pmax(1,n.seeds)))
My first question is: how is it possible to add a variable that ppp does not recognise? I checked the ppp documentation and there is no mention of pops.
The second question I have is about using the $ operator twice; the next line requires an sapply function to define dimensions.
dim1<-pmax(1,sapply(seeds1$marks$pops, FUN=function(x)rpois(n=1,sqrt(x))))
I have never seen the $ operator used twice before, and seeds2$marks$pop returns "$ operator is invalid for atomic vectors". Could you explain what is going on here?
Many thanks.
That's several questions - please ask one question at a time.
From your post it is not clear whether you are trying to understand someone else's code, or developing code yourself. This makes a difference to the answer.
Just to clarify, this code does not come from inside the spatstat package; it is someone's code using the spatstat package to generate data. There is code in the spatstat package to generate simulated realisations of a Poisson cluster process (which is I think what you want to do), and you could look at the spatstat code for rPoissonCluster to see how it can be done correctly and efficiently.
The code you have shown here has numerous errors. But I will start by answering the two questions in your title.
The rules for creating ppp objects are set out in the help file for ppp. The help says that if the argument window is given, then unmatched arguments ... are ignored. This means that in the line
seeds2<-ppp(x=x,y=y,window=win,marks=marks,pops=rpois(lambda=5,n=pmax(1,n.seeds)))
the argument pops will be ignored.
The idiom sapply(seeds1$marks$pops, FUN=f) is perfectly valid syntax in R. If the object seeds1 is a structure or list which has a component named marks, which in turn is a structure or list which has a component named pops, then the idiom seeds1$marks$pops would extract it. This has nothing particularly to do with sapply.
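For example, here is a tiny standalone sketch (the names mirror the question, but the data are made up):
s <- list(marks=data.frame(angles=c(0.1, 0.2, 0.3), pops=c(4, 7, 2)))
s$marks        # extracts the component named "marks" (a data frame)
s$marks$pops   # extracts the column "pops" from that data frame
sapply(s$marks$pops, function(p) rpois(n=1, lambda=sqrt(p)))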
Now turning to errors in the code,
The line n.seeds<-lambda*win$xrange[2]*win$yrange[2] is presumably meant to calculate the expected number of cluster parents (cluster seeds) in the window. This would only work if the window is a rectangle with bottom left corner at the origin (0,0). It would be safer to write n.seeds <- lambda * area(win).
However, the variable n.seeds is used later as if it were the number of cluster parents (cluster seeds). The author has forgotten that the number of seeds is random, with a Poisson distribution. So the more correct calculation would be n.seeds <- rpois(1, lambda * area(win)).
However this is still not correct, because cluster parents (seed points) outside the window can also generate offspring points inside the window. So seed points must actually be generated in a larger window obtained by expanding win. The appropriate commands used inside spatstat to generate the cluster parents are
bigwin <- grow.rectangle(Frame(win), cluster_diameter)
Parents <- rpoispp(lambda, win=bigwin)
(Note that win must be passed by name here, because the second positional argument of rpoispp is lmax, not the window.)
The author apparently wants to assign two mark values to each parent point: a random angle and a random number pops. The correct way to do this is to make the marks a data frame with two columns, for example
marks(seeds1) <- data.frame(angles=runif(n.seeds, max=2*pi), pops=rpois(n.seeds, 5))
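Putting these corrections together, a minimal sketch of the corrected set-up could look like this (cluster_diameter is a placeholder for the maximum parent-to-offspring distance, which you must choose for your own process):
library(spatstat)
lambda <- 100
win <- owin(c(0,1), c(0,1))
cluster_diameter <- 0.1   # placeholder value, an assumption for this sketch
# Parents live in an expanded window and their number is Poisson
bigwin <- grow.rectangle(Frame(win), cluster_diameter)
n.seeds <- rpois(1, lambda * area(bigwin))
x <- runif(n.seeds, min=bigwin$xrange[1], max=bigwin$xrange[2])
y <- runif(n.seeds, min=bigwin$yrange[1], max=bigwin$yrange[2])
seeds1 <- ppp(x=x, y=y, window=bigwin)
# Two mark values per parent: a data frame with two columns
marks(seeds1) <- data.frame(angles=runif(n.seeds, max=2*pi),
                            pops=rpois(n.seeds, 5))
Equivalently, the two parent-generation steps can be collapsed into seeds1 <- rpoispp(lambda, win=bigwin).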

What does "positive class" mean in a classification tree in R?

So I'm a newbie in data science, and have a question regarding tree models.
Below is the result of my classification modeling, but I'm having trouble interpreting it.
As you can see at the very bottom line in the bottom-left part of the screen, it says 'positive class : 1'. Our target attribute has value of either 1 or 0. What does that 'positive class : 1' mean in this case?
I very much appreciate your help. Thanks. :)
Positive Class: 1 indicates that the positive class, i.e. the class you are most interested in, is labeled 1 in your dataset; it is not the numeric code R uses internally for the factor levels, but the value with which the class is written in the dataset.
For more information, see the documentation of the confusionMatrix function from caret, which I believe is the one you used, and look for the optional argument named positive.
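For instance, a small sketch (the predictions and labels are made up, and the factor levels "0"/"1" are assumed to match your data):
library(caret)
pred <- factor(c(1, 0, 1, 1, 0), levels=c(0, 1))
obs <- factor(c(1, 0, 0, 1, 0), levels=c(0, 1))
# Declare explicitly which level counts as the positive class
confusionMatrix(data=pred, reference=obs, positive="1")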
The positive class is the class tied to your objective. For example, if you want to classify whether an object is present in a given scene, then all samples where the object is present belong to the positive class. In your problem you want to identify the cases where the target variable is 1, so 1 is treated as the positive class.

Finding values for a range of variables against the same constant

I am currently attempting to find the bias of a range of previously calculated means against the same constant. I have been using the code
b_all<-bias(1,c(x2:x6))
But it's only returning the bias of the first variable, x2. I'm sure there is a simple fix that I'm just not seeing. Thanks for the help.
Hard to say without any data to verify, but note that c(x2:x6) does not collect the variables x2 through x6: x2:x6 builds a numeric sequence running from the value of x2 to the value of x6. Looping over the variable names instead could work:
b_all <- sapply(2:6, function(i){bias(1,get(paste0("x", i)))})
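An alternative along the same lines (still assuming bias() takes the constant first and a vector second, and that the objects really are named x2 through x6) fetches them all at once with mget:
b_all <- sapply(mget(paste0("x", 2:6)), function(v) bias(1, v))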

Constraints on BsplinesComp

I am using BsplinesComp for a sample problem.
The objective is to maximize the area under the line.
My problem arises when I want to set a constraint on one of the values in the output array that the bspline gives, i.e. a value that the spline must pass through no matter what configuration it is in.
I tried this in two ways and have uploaded the code for both. They are both very badly coded, so I think there is a neater way to do this. Links to the code:
https://gist.github.com/stackoverflow38/5eae1e86c5802a4df91becdf580d28c5
1- Using an extra explicit component in which the middle array value is imposed to be a selected value
2- Tried to use an ExecComp, but I get an error: "Target shapes do not match."
I vaguely remember reading such a question but could not find it.
Overall, I am trying to constrain either the first, middle, or last value of the bspline to stay within some range.
Similar to the plots here
So, I think you want to know the best way to do this, and the best way is to not use any extra components at all. You can directly constrain a single point in the output of the BsplinesComp by using the "indices" argument in the add_constraint call. Here, I constrain the first point in the spline to lie on the interval [-1, 1]:
model.add_constraint('interp.h', lower=-1, upper=1, indices=[0])
Running the model gives me a shape that looks more like one of the ones you included.
Just for reference, for the errors you got with 1 and 2:
Not sure what is wrong here, but maybe the version you uploaded isn't the latest: you never used the AeraComp in a constraint, so it didn't do anything.
The exception was due to a size mismatch when connecting the vector output of the Bsplines comp to a scalar expression. You can do this by specifying "src_indices", giving it a list of which indices in the array to connect to the target:
model.connect('interp.h', 'execcomp.x', src_indices=[0])

Prediction with cpdist using "probabilities" as evidence

I have a very quick question with an easy reproducible example, related to my work on prediction with bnlearn:
library(bnlearn)
# Toy data: a binary Cause and a numeric Cons
Learning.set4=cbind(c("Yes","Yes","Yes","No","No","No"),c(9,10,8,3,2,1))
Learning.set4=as.data.frame(Learning.set4)
Learning.set4[,c(2)]=as.numeric(as.character(Learning.set4[,c(2)]))
colnames(Learning.set4)=c("Cause","Cons")
# Build the structure Cause -> Cons through the adjacency matrix
b.network=empty.graph(colnames(Learning.set4))
struct.mat=matrix(0,2,2)
colnames(struct.mat)=colnames(Learning.set4)
rownames(struct.mat)=colnames(struct.mat)
struct.mat[1,2]=1
bnlearn::amat(b.network)=struct.mat
# Fit the parameters of the network
haha=bn.fit(b.network,Learning.set4)
#Some predictions with "lw" method
#Here is the approach I know with a SET particular modality.
#(So it's happening with certainty, here for example I know Cause is "Yes")
classic_prediction=cpdist(haha,nodes="Cons",evidence=list("Cause"="Yes"),method="lw")
print(mean(classic_prediction[,c(1)]))
#What if I wanted to predict the value of Cons when Cause has a 60% chance of being Yes and 40% of being No?
#I decided to do this, according to the help.
#I could also make a function that generates "Yes" or "No" with the proper probabilities.
prediction_idea=cpdist(haha,nodes="Cons",evidence=list("Cause"=c("Yes","Yes","Yes","No","No")),method="lw")
print(mean(prediction_idea[,c(1)]))
Here is what the help says:
"In the case of a discrete or ordinal node, two or more values can also be provided. In that case, the value for that node will be sampled with uniform probability from the set of specified values"
When predicting a variable from categorical variables, I have so far just used a specific modality of the variable, as in the first prediction in the example. (Setting the evidence to "Yes" makes Cons take a high value.)
But if I wanted to predict Cons without knowing the exact modality of Cause with certainty, could I use what I did in the second prediction (knowing just the probabilities)?
Is this an elegant way, or are there better implemented ones I don't know of?
I got in touch with the creator of the package, and I will paste his answer to this question here:
The call to cpdist() is wrong:
prediction_idea=cpdist(haha,nodes="Cons",evidence=list("Cause"=c("Yes","Yes","Yes","No","No")),method="lw")
print(mean(prediction_idea[,c(1)]))
A query with the 40%-60% soft evidence requires you to place these new probabilities in the network first:
haha$Cause = c(0.40, 0.60)
and then run the query without an evidence argument. (Because you do not have any hard evidence, really, just a different probability distribution for Cause.)
Here is the code that lets me do what I wanted, starting from the fitted network in the example:
# Replace the prior distribution of Cause with the 40%/60% one
change=haha$Cause$prob
change[1]=0.4   # P(Cause = "No")
change[2]=0.6   # P(Cause = "Yes")
haha$Cause=change
# No hard evidence, so evidence=TRUE samples from the updated network
new_prediction=cpdist(haha,nodes="Cons",evidence=TRUE,method="lw")
print(mean(new_prediction[,c(1)]))
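As a quick sanity check (a sketch reusing the objects above), the soft-evidence mean should be close to the 0.4/0.6 mixture of the two hard-evidence means:
p_no <- mean(cpdist(haha, nodes="Cons", evidence=list(Cause="No"), method="lw")[,1])
p_yes <- mean(cpdist(haha, nodes="Cons", evidence=list(Cause="Yes"), method="lw")[,1])
print(0.4*p_no + 0.6*p_yes)   # should roughly match mean(new_prediction[,1])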
