How to use Splot for data in gnuplot - grid

I am from Brazil, so my English is not great. I am a beginner with gnuplot and I've been trying to plot a surface. I have x, y, z data, like points in 3D space: Points_3D
I understood that I have to grid the data, and I tried to use splot this way:
set dgrid3d 11, 7
splot 'abs.txt' u 1:2:3 with lines title 'abs'
As you can see, I don't have many points (only 8x12), so the plotted surface looks like this: Surface. This is not the kind of surface I want, because it does not pass through all the points, and I would like a smoother surface without these peaks.
I tried other values for "set dgrid3d", but it didn't work. Does anyone know what I should do?
In another data set I have many values in y and only a few in x, like 8x100. What should I do in that case?
My data is something like:
2 0.250000000 0.33333334326744080
2 0.500000000 0.33333334326744080
2 1.00000000 0.33333334326744080
2 2.00000000 0.33333334326744080
2 4.00000000 0.33333331346511841
2 8.00000000 0.33333328366279602
2 16.0000000 0.33333316445350647
2 32.0000000 0.33333286643028259
2 64.0000000 0.33333197236061096
2 128.000000 0.33332949876785278
2 256.000000 0.33332267403602600
2 512.000000 0.33330380916595459
3 0.250000000 8.3333335816860199E-002
3 0.500000000 8.3333335816860199E-002
3 1.00000000 8.3333335816860199E-002
3 2.00000000 8.3333328366279602E-002
3 4.00000000 8.3333313465118408E-002
3 8.00000000 8.3333276212215424E-002
3 16.0000000 8.3333164453506470E-002
3 32.0000000 8.3332858979701996E-002
3 64.0000000 8.3331987261772156E-002
3 128.000000 8.3329580724239349E-002
3 256.000000 8.3322964608669281E-002
3 512.000000 8.3304964005947113E-002
4 0.250000000 3.3333335071802139E-002
and continues...

Add a blank line after each block of rows that share the same x value (the first column), like this:
2 0.250000000 0.33333334326744080
2 0.500000000 0.33333334326744080
2 1.00000000 0.33333334326744080
2 2.00000000 0.33333334326744080
2 4.00000000 0.33333331346511841
2 8.00000000 0.33333328366279602
2 16.0000000 0.33333316445350647
2 32.0000000 0.33333286643028259
2 64.0000000 0.33333197236061096
2 128.000000 0.33332949876785278
2 256.000000 0.33332267403602600
2 512.000000 0.33330380916595459

3 0.250000000 8.3333335816860199E-002
3 0.500000000 8.3333335816860199E-002
3 1.00000000 8.3333335816860199E-002
3 2.00000000 8.3333328366279602E-002
3 4.00000000 8.3333313465118408E-002
3 8.00000000 8.3333276212215424E-002
3 16.0000000 8.3333164453506470E-002
3 32.0000000 8.3332858979701996E-002
3 64.0000000 8.3331987261772156E-002
3 128.000000 8.3329580724239349E-002
3 256.000000 8.3322964608669281E-002
3 512.000000 8.3304964005947113E-002

4 0.250000000 3.3333335071802139E-002
.........
Then plot it. With the data in this grid form, set dgrid3d is no longer needed:
splot "abs.txt" u 1:2:3 w pm3d
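If the file is too long to edit by hand, the blank lines can be inserted with a short script. The following is only a sketch in R (any scripting tool would do), assuming the first whitespace-separated field of each row is the x value and that rows are already sorted by x. The sample rows are copied from the question, and the output file name abs_grid.txt is made up for illustration.

```r
# A few rows in the same three-column layout as abs.txt (x, y, z).
# For a real file, replace this vector with: lines_in <- readLines("abs.txt")
lines_in <- c(
  "2 0.250000000 0.33333334326744080",
  "2 0.500000000 0.33333334326744080",
  "3 0.250000000 8.3333335816860199E-002",
  "3 0.500000000 8.3333335816860199E-002",
  "4 0.250000000 3.3333335071802139E-002"
)

# Extract the first field (the x value) of every row.
x <- sub("^\\s*(\\S+).*$", "\\1", lines_in, perl = TRUE)

# TRUE for every row whose x value differs from the previous row.
new_block <- c(FALSE, x[-1] != x[-length(x)])

# Emit an empty line before each row that starts a new block.
lines_out <- unlist(
  Map(function(line, brk) if (brk) c("", line) else line, lines_in, new_block),
  use.names = FALSE
)

# Print the result; to save it instead, use writeLines(lines_out, "abs_grid.txt")
# (hypothetical file name).
writeLines(lines_out)
```

The resulting file can then be passed to splot in place of the original one.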

Related

Percentages of a single column of a table in R

I have a table of this style:
a b c d
1 225.4 45 1920 1
2 812.3 101 1930 1
3 623.7 23 1965 2
4 551.7 32 1975 3
5 1374.1 91 1975 3
6 931.0 64 1912 3
How can I get a proportion table of column d that gives me something like this:
1 33.3
2 16.7
3 50.0
With table(df$d) I get
1 2 3
2 1 3
But with the prop.table command on its own I don't get these proportions.
You need both table and prop.table.
prop.table(table(df$d))
# 1 2 3
#0.3333333 0.1666667 0.5000000
We may also use proportions from base R (available since R 4.0.1):
proportions(table(df$d))
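To get percentages in the form shown in the question, the proportions can be multiplied by 100 and rounded. A small sketch, with the data frame rebuilt from the table in the question (the name pct is only for illustration):

```r
# Rebuild the example table from the question.
df <- data.frame(
  a = c(225.4, 812.3, 623.7, 551.7, 1374.1, 931.0),
  b = c(45, 101, 23, 32, 91, 64),
  c = c(1920, 1930, 1965, 1975, 1975, 1912),
  d = c(1, 1, 2, 3, 3, 3)
)

# Count each value of d, convert the counts to proportions,
# then scale to percentages with one decimal place.
pct <- round(100 * prop.table(table(df$d)), 1)
print(pct)
```

Calling prop.table(df$d) directly does not work here because it divides each value of d by the sum of the column instead of counting occurrences first.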

How to turn an rpart object into a dendrogram? (as.dendrogram.rpart?)

I would like a way to turn an rpart tree object into a nested list of lists (a dendrogram). Ideally, the attributes in each node will include the information in the rpart object (impurity, variable and rule that is used for splitting, the number of observations funneled to that node, etc.).
Looking at the rpart$frame object, it is not clear to me how to read it. Any suggestions?
Tiny example:
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
fit$frame
var n wt dev yval complexity ncompete nsurrogate yval2.V1 yval2.V2 yval2.V3 yval2.V4 yval2.V5 yval2.nodeprob
1 Start 81 81 17 1 0.17647059 2 1 1.00000000 64.00000000 17.00000000 0.79012346 0.20987654 1.00000000
2 Start 62 62 6 1 0.01960784 2 2 1.00000000 56.00000000 6.00000000 0.90322581 0.09677419 0.76543210
4 <leaf> 29 29 0 1 0.01000000 0 0 1.00000000 29.00000000 0.00000000 1.00000000 0.00000000 0.35802469
5 Age 33 33 6 1 0.01960784 2 2 1.00000000 27.00000000 6.00000000 0.81818182 0.18181818 0.40740741
10 <leaf> 12 12 0 1 0.01000000 0 0 1.00000000 12.00000000 0.00000000 1.00000000 0.00000000 0.14814815
11 Age 21 21 6 1 0.01960784 2 0 1.00000000 15.00000000 6.00000000 0.71428571 0.28571429 0.25925926
22 <leaf> 14 14 2 1 0.01000000 0 0 1.00000000 12.00000000 2.00000000 0.85714286 0.14285714 0.17283951
23 <leaf> 7 7 3 2 0.01000000 0 0 2.00000000 3.00000000 4.00000000 0.42857143 0.57142857 0.08641975
3 <leaf> 19 19 8 2 0.01000000 0 0 2.00000000 8.00000000 11.00000000 0.42105263 0.57894737 0.23456790
(the function ggdendro:::dendro_data.rpart might be helpful somehow, but I couldn't get it to really solve the problem)
Here is a GitHub gist with the function rpart2dendro for converting an object of class "rpart" to a dendrogram. Note that branches are not weighted in the output object, but it should be fairly straightforward to recursively modify the "height" attributes of the dendrogram to get proportional branch lengths. The Kyphosis example is provided at the bottom.
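As for reading fit$frame: the row names are node numbers, and the children of node k are nodes 2k and 2k+1. The sketch below uses that rule to build a plain nested list. It is not the rpart2dendro function from the gist and it does not produce an object of class "dendrogram"; the helper name build_node is made up for illustration.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
frame <- fit$frame

# Row names of the frame are the node numbers (root = 1).
node_ids <- as.integer(rownames(frame))

# Hypothetical helper: recursively build a nested list for node `id`.
build_node <- function(id) {
  i <- match(id, node_ids)          # row of the frame holding this node
  node <- list(
    id  = id,
    var = as.character(frame$var[i]),  # split variable, or "<leaf>"
    n   = frame$n[i],                  # observations reaching the node
    dev = frame$dev[i]                 # deviance at the node
  )
  if (node$var != "<leaf>") {
    # Children of node k are numbered 2k (left) and 2k + 1 (right).
    node$children <- list(build_node(2L * id), build_node(2L * id + 1L))
  }
  node
}

tree <- build_node(1L)
str(tree, max.level = 2)
```

Other columns of the frame (yval, complexity, and so on) could be copied into each list element the same way.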

Ada in R giving me single classification

I am using the function ada in R, and I'm having a little difficulty. I have training data that looks like this
V13 V15 V17 V19
1 0.017241379 0.471264368 0.01449275 0.24637681
2 0.255813953 0.011627907 0.06849315 0.05479452
3 0.040000000 0.400000000 0.06000000 0.10000000
4 0.500000000 0.000000000 0.05128205 0.00000000
5 0.102040816 0.367346939 0.05769231 0.19230769
6 0.561403509 0.105263158 0.11111111 0.00000000
7 0.300813008 0.048780488 0.12222222 0.03333333
8 0.000000000 0.714285714 0.14285714 0.07142857
9 0.328947368 0.013157895 0.01492537 0.00000000
10 0.536585366 0.060975610 0.16071429 0.03571429
11 0.338461538 0.030769231 0.11764706 0.03921569
12 0.033898305 0.322033898 0.11764706 0.21568627
This is what I have stored in the variable
matrix.x
Then I have the response variables y
y
1 1
2 -1
3 1
4 -1
5 1
6 -1
7 -1
8 1
9 -1
10 -1
11 -1
12 1
I simply run the following:
ada.obj = ada(matrix.x, matrix.y)
And then
ada.pred = predict(ada.obj, matrix.x)
And for some reason, I get a matrix with all 1s or all -1s. What am I doing wrong? Ideally, I want the ada.pred to spit out the exact classifications of the training data.
Thanks.
Also, how would I go about using the AdaBoost.M1 method in the caret package for R?

Combining DF and rpart$where?

If I do DF$where <- tree$where after fitting an rpart object using DF as my data, will each row be mapped to its corresponding leaf through the column where?
Thanks!
As an example of how to demonstrate that this is possibly true (modulo my understanding of your question being correct), we work with the first example in ?rpart:
require(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
kyphosis$where <- fit$where
> str(kyphosis)
'data.frame': 81 obs. of 5 variables:
$ Kyphosis: Factor w/ 2 levels "absent","present": 1 1 2 1 1 1 1 1 1 2 ...
$ Age : int 71 158 128 2 1 1 61 37 113 59 ...
$ Number : int 3 3 4 5 4 2 2 3 2 6 ...
$ Start : int 5 14 5 1 15 16 17 16 16 12 ...
$ where : int 9 7 9 9 3 3 3 3 3 8 ...
> plot(fit)
> text(fit, use.n = TRUE)
And now look at some tables based on the 'where' vector and some logical tests:
First node:
> with(kyphosis, table(where, Start >= 8.5))
where FALSE TRUE
3 0 29
5 0 12
7 0 14
8 0 7
9 19 0 # so this is the row that describes that split
> fit$frame[9,]
var n wt dev yval complexity ncompete nsurrogate yval2.V1
3 <leaf> 19 19 8 2 0.01 0 0 2.0000000
yval2.V2 yval2.V3 yval2.V4 yval2.V5 yval2.nodeprob
3 8.0000000 11.0000000 0.4210526 0.5789474 0.2345679
Second node:
> with(kyphosis, table(where, Start >= 8.5, Start>=14.5))
, , = FALSE
where FALSE TRUE
3 0 0
5 0 12
7 0 14
8 0 7
9 19 0
, , = TRUE
where FALSE TRUE
3 0 29
5 0 0
7 0 0
8 0 0
9 0 0
And this is the row of fit$frame that describes the second split:
> fit$frame[3,]
var n wt dev yval complexity ncompete nsurrogate yval2.V1
4 <leaf> 29 29 0 1 0.01 0 0 1.0000000
yval2.V2 yval2.V3 yval2.V4 yval2.V5 yval2.nodeprob
4 29.0000000 0.0000000 1.0000000 0.0000000 0.3580247
So I would characterize the value of fit$where as describing the "terminal nodes", which are labeled '<leaf>' in fit$frame. These may or may not be what you were calling the "nodes".
> fit$frame
var n wt dev yval complexity ncompete nsurrogate yval2.V1
1 Start 81 81 17 1 0.17647059 2 1 1.00000000
2 Start 62 62 6 1 0.01960784 2 2 1.00000000
4 <leaf> 29 29 0 1 0.01000000 0 0 1.00000000
5 Age 33 33 6 1 0.01960784 2 2 1.00000000
10 <leaf> 12 12 0 1 0.01000000 0 0 1.00000000
11 Age 21 21 6 1 0.01960784 2 0 1.00000000
22 <leaf> 14 14 2 1 0.01000000 0 0 1.00000000
23 <leaf> 7 7 3 2 0.01000000 0 0 2.00000000
3 <leaf> 19 19 8 2 0.01000000 0 0 2.00000000
yval2.V2 yval2.V3 yval2.V4 yval2.V5 yval2.nodeprob
1 64.00000000 17.00000000 0.79012346 0.20987654 1.00000000
2 56.00000000 6.00000000 0.90322581 0.09677419 0.76543210
4 29.00000000 0.00000000 1.00000000 0.00000000 0.35802469
5 27.00000000 6.00000000 0.81818182 0.18181818 0.40740741
10 12.00000000 0.00000000 1.00000000 0.00000000 0.14814815
11 15.00000000 6.00000000 0.71428571 0.28571429 0.25925926
22 12.00000000 2.00000000 0.85714286 0.14285714 0.17283951
23 3.00000000 4.00000000 0.42857143 0.57142857 0.08641975
3 8.00000000 11.00000000 0.42105263 0.57894737 0.23456790
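The tables above suggest that fit$where holds row indices into fit$frame rather than node numbers. A short sketch that checks this and translates the indices into node numbers (the column name leaf_node and the variable leaf_rows are made up for illustration):

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# fit$where gives, for each observation, the ROW of fit$frame it ends up in.
kyphosis$where <- fit$where

# Translate the row index into the node number stored in the row names.
kyphosis$leaf_node <- as.integer(rownames(fit$frame)[fit$where])

# Every referenced row should be a terminal node.
stopifnot(all(as.character(fit$frame$var[fit$where]) == "<leaf>"))

# The number of observations per leaf should match the n column of the frame.
leaf_rows <- sort(unique(fit$where))
print(cbind(
  frame_row = leaf_rows,
  node      = as.integer(rownames(fit$frame)[leaf_rows]),
  counted   = as.vector(table(fit$where)),
  frame_n   = fit$frame$n[leaf_rows]
))
```

If the two count columns agree, each row of the data frame is mapped to its leaf through the where column.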

Changing the scale of x-axis using scale_x_continuous

I am creating a scatter plot using ggplot2. The default gives me an x axis that shows every value from 0 to 30. I'd prefer to have it go by 5s or something like that. I have been trying to use scale_x_continuous(), but I get this:
Error: Discrete value supplied to continuous scale
Here is the code that I am trying to work with:
Daily.Average.plot <- ggplot(data = Daily.average, aes(factor(Day), Mass))+
geom_point(aes(color = factor(Temp))) +
scale_x_continuous(breaks = seq(0,30,5))
Daily.Average.plot
When I run this without the scale_x_continuous I get a graph that looks fine with no errors, just the incorrect x axis. All of the columns in the data set are numeric when I check str(), if that matters. Do I have an error in my code, or should I be using something different to change the scale?
Here is a sample of my data set:
N Day Mass Temp
1 1 0.00000000 5
2 2 0.00000000 5
3 3 0.07692308 5
4 4 0.07692308 5
5 5 0.07692308 5
6 6 0.15384615 5
7 7 0.15384615 5
8 8 0.23076923 5
9 9 0.38461538 5
10 10 0.46153846 5
11 1 0.00000000 10
12 2 0.00000000 10
13 3 0.00000000 10
14 4 0.09090909 10
15 5 0.09090909 10
16 6 0.54545455 10
17 7 0.54545455 10
18 8 0.63636364 10
19 9 0.90909091 10
20 10 1.36363636 10
21 1 0.00000000 15
22 2 0.07692308 15
23 3 0.61538462 15
24 4 0.76923077 15
25 5 0.76923077 15
26 6 1.23076923 15
27 7 1.69230769 15
28 8 2.07692308 15
29 9 2.46153846 15
30 10 3.07692308 15
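A likely cause of the error is the factor(Day) in aes(): it turns the x aesthetic into a discrete variable, which scale_x_continuous() then rejects. A sketch of one possible fix, assuming Day should stay numeric, with the data frame rebuilt from the sample above:

```r
library(ggplot2)

# Rebuild the sample data shown in the question.
Daily.average <- data.frame(
  Day  = rep(1:10, times = 3),
  Mass = c(
    0, 0, 0.07692308, 0.07692308, 0.07692308,
    0.15384615, 0.15384615, 0.23076923, 0.38461538, 0.46153846,
    0, 0, 0, 0.09090909, 0.09090909,
    0.54545455, 0.54545455, 0.63636364, 0.90909091, 1.36363636,
    0, 0.07692308, 0.61538462, 0.76923077, 0.76923077,
    1.23076923, 1.69230769, 2.07692308, 2.46153846, 3.07692308
  ),
  Temp = rep(c(5, 10, 15), each = 10)
)

# Map Day directly (no factor()) so the x axis stays continuous;
# Temp is still converted to a factor so each temperature gets its own colour.
Daily.Average.plot <- ggplot(data = Daily.average, aes(Day, Mass)) +
  geom_point(aes(color = factor(Temp))) +
  scale_x_continuous(breaks = seq(0, 30, 5))

Daily.Average.plot
```

If Day really should be treated as categorical, scale_x_discrete() with explicit breaks would be the alternative to look at.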

Resources