I have a data set where the 3rd, 5th and 7th columns are the confidence intervals of the columns just before them (2nd, 4th and 6th), respectively. For example:
0.1 0.53 0.51 0.29 0.28 0.13 0.12
0.2 0.54 0.53 0.31 0.30 0.14 0.13
0.3 0.57 0.56 0.32 0.31 0.14 0.14
0.4 0.60 0.59 0.34 0.33 0.15 0.15
0.5 0.64 0.63 0.36 0.35 0.16 0.16
0.6 0.69 0.68 0.38 0.37 0.18 0.17
0.7 0.73 0.72 0.41 0.40 0.19 0.18
0.8 0.82 0.80 0.45 0.44 0.22 0.21
0.9 0.88 0.86 0.48 0.47 0.24 0.23
1.0 0.98 0.96 0.53 0.51 0.27 0.27
When I plot the graph, the error bars come out very large, which is clearly wrong, as shown in the figure:
My script is simple, but it does not work as I expected. Could someone point me to the error?
My script:
reset
set termopt enhanced
set encoding iso_8859_1
set datafile missing '-'
set ylabel 'NDT normalized by symmetrical case'
set xlabel 'Delivery probability'
unset log
unset label
set ytic auto
set xtic auto
set yrange [0:*]
set terminal png size 800,600 enhanced font "Arial,16"
set output 'prob_normal.png' # set the output file name
set key center top inside
f(x) = x
plot "prob-normal.dat" using ($1):($2) title "DC 10.98%" with linespoints ls 1, \
"prob-normal.dat" using ($1):($2):($3) notitle with yerrorbars,\
"prob-normal.dat" using ($1):($4) title "DC 19.35%" with linespoints ls 2, \
"prob-normal.dat" using ($1):($4):($5) notitle with yerrorbars,\
"prob-normal.dat" using ($1):($6) title "DC 42.85%" with linespoints ls 3, \
"prob-normal.dat" using ($1):($6):($7) notitle with yerrorbars
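For reference, gnuplot's `yerrorbars` style reads three columns as x:y:ydelta and draws each bar from y - ydelta to y + ydelta; with four columns it reads x:y:ylow:yhigh. The Python sketch below is only an illustration (it is not part of the script): it applies the three-column reading to the first rows of columns 2 and 3 above, which shows how long the bars become when column 3 holds values close to column 2. The helper `bar_extent` is hypothetical.

```python
# First rows of the data above: (x, column 2, column 3).
rows = [
    (0.1, 0.53, 0.51),
    (0.2, 0.54, 0.53),
    (0.3, 0.57, 0.56),
]

def bar_extent(y, ydelta):
    """Return (low, high) for a bar drawn from y - ydelta to y + ydelta,
    i.e. the x:y:ydelta reading of a three-column yerrorbars plot."""
    return (y - ydelta, y + ydelta)

for x, y, third in rows:
    low, high = bar_extent(y, third)
    # x=0.1 gives a bar from 0.02 to 1.04, about twice the plotted value
    print(f"x={x}: bar from {low:.2f} to {high:.2f} (length {high - low:.2f})")
```

If the third column actually stores a bound rather than a half-width, the four-column x:y:ylow:yhigh form is the one that matches; which reading applies depends on what the data file really contains.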
First of all, I'd like to say that I'm completely new to R, and I'm just trying to accomplish this one task.
So, what I'm trying to do is create a network diagram from a weighted matrix. I made an example:
The CSV is a simple correlation matrix that looks like this:
,A,B,C,D,E,F,G
A,1,0.9,0.64,0.43,0.38,0.33,0.33
B,0.9,1,0.64,0.33,0.43,0.38,0.38
C,0.64,0.64,1,0.59,0.69,0.64,0.64
D,0.43,0.33,0.59,1,0.28,0.23,0.28
E,0.38,0.43,0.69,0.28,1,0.95,0.9
F,0.33,0.38,0.64,0.23,0.95,1,0.9
G,0.33,0.38,0.64,0.28,0.9,0.9,1
I tried to draw the desired result by hand and came up with this:
To be more precise, I drew the diagram first, then measured the distances with a ruler, derived an equation to turn them into weights, and made the CSV table.
The higher the value is, the closer the two points are to each other.
However, whatever I do, the best result I get is this:
And this is how I'm trying to accomplish it, using this tutorial:
First of all, I import my matrix:
> matrix <- read.csv(file = 'test_dataset.csv')
But when I print the matrix with head(), the last line of the matrix already seems to be cut off:
> head(matrix)
ï.. A B C D E F G
1 A 1.00 0.90 0.64 0.43 0.38 0.33 0.33
2 B 0.90 1.00 0.64 0.33 0.43 0.38 0.38
3 C 0.64 0.64 1.00 0.59 0.69 0.64 0.64
4 D 0.43 0.33 0.59 1.00 0.28 0.23 0.28
5 E 0.38 0.43 0.69 0.28 1.00 0.95 0.90
6 F 0.33 0.38 0.64 0.23 0.95 1.00 0.90
> dim(matrix)
[1] 7 8
I then proceed with removing the first column so the matrix is square again...
> matrix <- data.matrix(matrix)[,-1]
> head(matrix)
A B C D E F G
[1,] 1.00 0.90 0.64 0.43 0.38 0.33 0.33
[2,] 0.90 1.00 0.64 0.33 0.43 0.38 0.38
[3,] 0.64 0.64 1.00 0.59 0.69 0.64 0.64
[4,] 0.43 0.33 0.59 1.00 0.28 0.23 0.28
[5,] 0.38 0.43 0.69 0.28 1.00 0.95 0.90
[6,] 0.33 0.38 0.64 0.23 0.95 1.00 0.90
> dim(matrix)
[1] 7 7
Then I create the graph and try to plot it:
> network <- graph_from_adjacency_matrix(matrix, weighted=T, mode="undirected", diag=F)
> plot(network)
And the result above appears...
So, after spending the last few hours googling and trying many, many more things, this is the closest I've been able to get.
So I'm asking for your help, thank you very much!
This is all fine.
head() only prints the first 6 rows of a matrix or data frame. If you want to see all of it, use print() or just type the name of the matrix variable.
graph_from_adjacency_matrix produces a link between two nodes if the value is non-zero. That's why you are getting every node linked to every other node.
To get what that tutorial is doing you need to add a line like
matrix[matrix<0.5] <- 0
to remove the edges for correlations below a cut off before you create the graph.
It's still not going to produce a chart like your hand-drawn one (where closeness roughly reflects the correlation); it will just clump together the nodes whose correlation is above 0.5.
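The effect of the cut-off can be illustrated outside R. This Python sketch (plain lists, no igraph; the helper `edges` is hypothetical) builds the undirected edge list from the correlation matrix in the question twice: once keeping every non-zero off-diagonal entry, as described above for `graph_from_adjacency_matrix`, and once after treating values below 0.5 as 0, as the `matrix[matrix<0.5] <- 0` line does.

```python
labels = ["A", "B", "C", "D", "E", "F", "G"]
corr = [
    [1, 0.9, 0.64, 0.43, 0.38, 0.33, 0.33],
    [0.9, 1, 0.64, 0.33, 0.43, 0.38, 0.38],
    [0.64, 0.64, 1, 0.59, 0.69, 0.64, 0.64],
    [0.43, 0.33, 0.59, 1, 0.28, 0.23, 0.28],
    [0.38, 0.43, 0.69, 0.28, 1, 0.95, 0.9],
    [0.33, 0.38, 0.64, 0.23, 0.95, 1, 0.9],
    [0.33, 0.38, 0.64, 0.28, 0.9, 0.9, 1],
]

def edges(matrix, cutoff=None):
    """Undirected weighted edges from the upper triangle (diagonal skipped).
    Entries below `cutoff` are treated as 0, i.e. no edge."""
    result = []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            w = matrix[i][j]
            if cutoff is not None and w < cutoff:
                w = 0
            if w != 0:
                result.append((labels[i], labels[j], w))
    return result

all_edges = edges(corr)
strong_edges = edges(corr, cutoff=0.5)
print(len(all_edges), "edges without a cut-off")   # 21: every pair is linked
print(len(strong_edges), "edges with cut-off 0.5")  # 10
for a, b, w in strong_edges:
    print(a, "--", b, w)
```

Without a cut-off all 21 possible pairs become edges; with it, only the strong links around C and within the E/F/G group remain.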
I am trying to learn R and I am having trouble with the way it works. I tried to write an entropy function of the probabilities p and 1-p from scratch, and I run into problems when I add some ifs to avoid the NaN at p = 0 and p = 1 (where 0*log2(0) evaluates to NaN).
Without the ifs, the plot works, but the printed results contain NaN. When I add the ifs, I get:
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' and 'y' lengths differ
entropy <- function(p){
cat("p = " , p)
if (p==0 || p==1) {
result = 0
}else{
result = - p*log2(p)-(1-p)*log2((1-p))
}
cat("\nresult=",result)
return(result)
}
p <- seq(0,1,0.01)
plot(p, entropy(p), type='l', main='Entropy function with two possible values')
I don't understand why: I pass a vector as x and the function applied to that same vector as y, so the lengths should be the same with or without the ifs.
Console without the ifs:
p = 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4 0.41 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.7 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78 0.79 0.8 0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9 0.91 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1
result= NaN 0.08079314 0.1414405 0.1943919 0.2422922 0.286397 0.3274449 0.3659237 0.4021792 0.4364698 0.4689956 0.499916 0.5293609 0.5574382 0.5842388 0.6098403 0.6343096 0.6577048 0.680077 0.7014715 0.7219281 0.7414827 0.7601675 0.7780113 0.7950403 0.8112781 0.8267464 0.8414646 0.8554508 0.8687212 0.8812909 0.8931735 0.9043815 0.9149264 0.9248187 0.9340681 0.9426832 0.9506721 0.958042 0.9647995 0.9709506 0.9765005 0.9814539 0.985815 0.9895875 0.9927745 0.9953784 0.9974016 0.9988455 0.9997114 1 0.9997114 0.9988455 0.9974016 0.9953784 0.9927745 0.9895875 0.985815 0.9814539 0.9765005 0.9709506 0.9647995 0.958042 0.9506721 0.9426832 0.9340681 0.9248187 0.9149264 0.9043815 0.8931735 0.8812909 0.8687212 0.8554508 0.8414646 0.8267464 0.8112781 0.7950403 0.7780113 0.7601675 0.7414827 0.7219281 0.7014715 0.680077 0.6577048 0.6343096 0.6098403 0.5842388 0.5574382 0.5293609 0.499916 0.4689956 0.4364698 0.4021792 0.3659237 0.3274449 0.286397 0.2422922 0.1943919 0.1414405 0.08079314 NaN
Console with the ifs:
p = 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4 0.41 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.7 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78 0.79 0.8 0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9 0.91 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1
result= 0Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' and 'y' lengths differ
Your function returns a scalar rather than a vector, because the if/else clause is not vectorized: `||` and `if` only look at a single logical value, so the whole function returns just one number.
This should work:
entropy <- function(p){
# initialize a vector of the desired length with zeros
result <- numeric(length(p))
# subset the vector for which you want to apply your formula on
x <- p[!(p %in% c(0,1))]
# overwrite only those positions for which you want to calculate values based
# on your formula
result[!(p %in% c(0,1))] <- - x*log2(x)-(1-x)*log2((1-x))
#cat("\nresult=",result)
return(result)
}
p <- seq(0,1,0.01)
plot(p, entropy(p), type='l', main='Entropy function with two possible values')
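The same "zeros first, then fill only the valid positions" pattern is sketched below in Python with plain lists, as an illustration rather than R code. Positions where p is 0 or 1 keep the value 0 (the usual convention 0*log2(0) = 0), and every other position gets the formula. Because the output is created with the input's length, x and y cannot end up with different lengths.

```python
import math

def entropy(ps):
    """Binary entropy for every probability in `ps`.
    The endpoints p == 0 and p == 1 stay at 0 instead of producing NaN."""
    result = [0.0] * len(ps)          # same length as the input
    for i, p in enumerate(ps):
        if 0 < p < 1:                 # only where both log2 terms are finite
            result[i] = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return result

p = [i / 100 for i in range(101)]    # like seq(0, 1, 0.01)
h = entropy(p)
print(len(p), len(h))                # 101 101
print(h[0], h[50], h[100])           # 0.0 1.0 0.0
```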
EDIT:
Even though it was suggested that I vectorize it, I wanted to write it in a way closer to the other languages I know, since I am just starting. I got it working, although I ended up using a for loop and printing and plotting two vectors instead of the function itself.
entropy <- function(p){
if (p==0 || p==1) {
result = 0
}else{
result = - p*log2(p)-(1-p)*log2((1-p))
}
return(result)
}
x <- seq(0,1,0.01)
y <- numeric(length(x))
i = 1
for (p in x) {
y[i] = entropy(p)
cat(x[i],"=",y[i],"\n")
i=i+1
}
plot(x, y, type='l', main='Entropy function with two possible values')
I applied your entropy function to every element of the p vector with sapply before plotting it.
entropy <- function(p){
cat("p = " , p)
if (p==0 || p==1) {
result = 0
}else{
result = - p*log2(p)-(1-p)*log2((1-p))
}
cat("\nresult=",result)
return(result)
}
p <- seq(0,1,0.01)
# Apply the function over all the values of 'p'
entropy_p <- sapply(p,FUN = entropy)
plot(p, entropy_p, type='l', main='Entropy function with two possible values')
I have the following code:
S = [100 200 500 1000 10000];
H = [0.14 0.15 0.17 0.19 0.28;0.14 0.16 0.18 0.20 0.29;0.15 0.17 0.19 0.21 0.31;0.16 0.17 0.20 0.22 0.32;0.23 0.22 0.28 0.30 0.44;0.23 0.23 0.29 0.3 0.5;0.33 0.32 0.4 0.42 0.63;0.32 0.31 0.39 0.40 0.61;0.23 0.23 0.30 0.30 0.50];
for i = 1:9
hold on
plot(S, H(i,:));
legend('GHM01','GHM02','GHM03','GHM04','GHM05','GHM06','GHM07','GHM08','GHM09'); % the legend is not shown correctly
axis([100 10000 0.1 1])
end
set(gca,'xscale','log')
The x-axis looks like this:
Because the S values are very far apart, I used a logarithmic x-axis (and a linear y-axis).
I have 5 values on the axis (see S), and I only want those 5 values visible on the x-axis, with equal spacing between them. How do I do this? Or is there a better alternative to a logarithmic scale for displaying my x-axis?
If you want the x-axis ticks to be equally spaced although the values are not (neither on a linear nor on a log scale), then you are basically treating this axis as categorical, so each category should get a temporary ordinal value (say 1:5) that determines the distance between them.
Here is a quick implementation of your comment above:
S = {'100' '200' '500' '1000' '10000'};
H = [0.14 0.15 0.17 0.19 0.28;...
0.14 0.16 0.18 0.20 0.29;
0.15 0.17 0.19 0.21 0.31;
0.16 0.17 0.20 0.22 0.32;
0.23 0.22 0.28 0.30 0.44;
0.23 0.23 0.29 0.3 0.5;
0.33 0.32 0.4 0.42 0.63;
0.32 0.31 0.39 0.40 0.61;
0.23 0.23 0.30 0.30 0.50];
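f = figure;
plot(1:length(S), H);      % plot against the ordinal positions 1..5
ax = gca;                  % current axes (f.Children can also hold a legend)
ax.XTick = 1:length(S);    % one equally spaced tick per S value
ax.XTickLabel = S;         % label the ticks with the original S values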
IMHO this is the most straightforward way to solve this problem ;)
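The claim that the S values are evenly spaced neither on a linear nor on a log scale can be checked numerically. This Python sketch (numbers only, no plotting; the helper `gaps` is hypothetical) compares the gaps between consecutive S values on each scale with the gaps between the ordinal positions used by `plot(1:length(S), H)`:

```python
import math

S = [100, 200, 500, 1000, 10000]

def gaps(values):
    """Differences between consecutive positions on an axis."""
    return [round(b - a, 3) for a, b in zip(values, values[1:])]

linear = S
log10 = [math.log10(s) for s in S]
ordinal = list(range(1, len(S) + 1))  # the positions plotted by 1:length(S)

print("linear :", gaps(linear))   # [100, 300, 500, 9000]
print("log10  :", gaps(log10))    # [0.301, 0.398, 0.301, 1.0]
print("ordinal:", gaps(ordinal))  # [1, 1, 1, 1]
```

Only the ordinal positions are evenly spaced, which is why the answer plots against 1:length(S) and then relabels the ticks with the original S values.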
Sorry, possibly a very silly question, but I couldn't find the answer. How do I load this kind of .dat file in R and stack all the values in one column? I have been trying
NerveData<-as.vector(read.table("D:/Dropbox/nerve.dat", sep=" ")$value)
The data set looks like
0.21 0.03 0.05 0.11 0.59 0.06
0.18 0.55 0.37 0.09 0.14 0.19
0.02 0.14 0.09 0.05 0.15 0.23
0.15 0.08 0.24 0.16 0.06 0.11
0.15 0.09 0.03 0.21 0.02 0.14
0.24 0.29 0.16 0.07 0.07 0.04
0.02 0.15 0.12 0.26 0.15 0.33
If you want to read all the data in as a single vector, use
src <- "http://www.stat.cmu.edu/~larry/all-of-nonpar/=data/nerve.dat"
NerveData <- scan(src, numeric())
Actually, I found an easier solution, thanks for the initial help:
Nervedata<-read.table("nerve.dat",sep ="\t")
Nervedata2<-c(t(Nervedata))
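`c(t(Nervedata))` flattens the table row by row, which gives the same order as reading every value in the file left to right and top to bottom, as `scan` does. R stores a matrix column by column, so flattening without the transpose would follow the column order instead. A Python sketch of both orders, using the first three rows shown above as an inline string instead of the file (an assumption made to keep it self-contained):

```python
text = """0.21 0.03 0.05 0.11 0.59 0.06
0.18 0.55 0.37 0.09 0.14 0.19
0.02 0.14 0.09 0.05 0.15 0.23"""

# Table form: one list per line, like read.table.
rows = [[float(v) for v in line.split()] for line in text.splitlines()]

# Row-by-row flattening, the analogue of c(t(table)).
row_major = [v for row in rows for v in row]

# Every whitespace-separated token in file order, the analogue of scan().
tokens = [float(v) for v in text.split()]

# Column-by-column flattening, the order a matrix has without the transpose.
column_major = [row[j] for j in range(len(rows[0])) for row in rows]

print(row_major == tokens)  # True
print(len(row_major))       # 18
print(column_major[:3])     # [0.21, 0.18, 0.02]
```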
Simply use read.table with the correct separator, which in your case is probably \t, a tab character.
So try:
NerveData = read.table("D:/Dropbox/nerve.dat", sep="\t")
I am generating a lot of ftable() crosstabulations for a descriptive report. Example:
AUS BEL BUL EST FRA GEO GER HUN ITA NET NOR ROM RUS
30- primary 0.06 0.03 0.07 0.03 0.02 0.03 0.03 0.02 0.05 0.03 0.05 0.04 0.02
secondary 0.30 0.09 0.16 0.10 0.10 0.14 0.10 0.16 0.11 0.08 0.08 0.09 0.11
tertiary 0.05 0.07 0.04 0.05 0.07 0.06 0.02 0.04 0.02 0.05 0.06 0.02 0.09
30+ primary 0.07 0.16 0.12 0.07 0.16 0.03 0.05 0.11 0.35 0.21 0.09 0.17 0.03
secondary 0.40 0.20 0.30 0.29 0.25 0.35 0.35 0.34 0.27 0.20 0.27 0.34 0.26
tertiary 0.13 0.23 0.13 0.18 0.17 0.17 0.18 0.09 0.09 0.23 0.23 0.06 0.24
60+ primary 0.00 0.12 0.10 0.13 0.14 0.07 0.05 0.12 0.09 0.11 0.06 0.19 0.12
secondary 0.00 0.05 0.05 0.08 0.06 0.10 0.14 0.09 0.02 0.04 0.11 0.07 0.06
tertiary 0.00 0.05 0.03 0.06 0.03 0.04 0.07 0.03 0.01 0.05 0.06 0.02 0.07
I am looking for a function that could take the ftable() or table() output and highlight values that deviate from the row mean, or apply an overall colour gradient to the text of the values, e.g. colouring values from red (0%) to green (100%).
The output is now processed through knitr, but I'm not sure at which point in the toolchain I could intervene and add colour based on the relative size of the values.
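One simple way to turn a proportion into a red-to-green colour is linear interpolation between the two RGB endpoints. The Python sketch below returns a cell prefixed with `\cellcolor[RGB]{r,g,b}`; it assumes the LaTeX document loads `xcolor` with the `table` option, as the Hmisc-based answer does. The function name `gradient_cell` is hypothetical and not part of any R package.

```python
def gradient_cell(value, low=0.0, high=1.0):
    """Prefix a cell with \\cellcolor[RGB]{...}, interpolating from
    red (value == low) to green (value == high). Values are clamped."""
    t = (value - low) / (high - low)
    t = max(0.0, min(1.0, t))          # clamp into [0, 1]
    red = round(255 * (1 - t))
    green = round(255 * t)
    return f"\\cellcolor[RGB]{{{red},{green},0}} {value:.2f}"

row = [0.06, 0.03, 0.07, 0.03]         # first values of the "30- primary" row
print(" & ".join(gradient_cell(v) for v in row))
```

Since the proportions in the table are all small, passing the table's own minimum and maximum as `low` and `high` would spread the colours over the observed range instead of leaving every cell reddish.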
You can use the latex function in the Hmisc package.
# Example shamelessly copied from http://www.karlin.mff.cuni.cz/~kulich/vyuka/Rdoc/harrell-R-latex.pdf
cat('
\\documentclass{article}
\\usepackage[table]{xcolor}
\\begin{document}
<<results=tex>>=
library(Hmisc)
d <- head(iris)
cellTex <- matrix(rep("", nrow(d) * ncol(d)), nrow=nrow(d))
cellTex[2,2] <- "cellcolor{red}"
cellTex[2,3] <- "color{red}"
cellTex[5,1] <- "rowcolor{yellow}"
latex(d, file = "", cellTexCmds = cellTex, rowname=NULL)
@
\\end{document}',
file="tmp.Rnw" )
Sweave("tmp.Rnw")
library(tools) # texi2pdf() is in the tools package, not utils
texi2pdf("tmp.tex")
To generate LaTeX tables from R objects, you can use the xtable package. It is available on CRAN; take a look at its documentation. To get colour in the table, use the color LaTeX package. Some example code:
library(xtable)
n = 100
cat_country = c("NL","BE","HU")
cat_prim = c("primary","secondary","tertiary")
dat = data.frame(country = sample(cat_country, n, replace = TRUE),
prim = sample(cat_prim, n, replace = TRUE))
ftable_dat = ftable(dat)
## Make latex table:
latex_table = xtable(as.table(ftable_dat))
To get what you want, I came up with the following (ugly) hack. The trick is to print the xtable object and then edit the printed output:
latex_table = within(latex_table, {
# browser()
primary = ifelse(primary > 12, sprintf("\\textbf{%s}", primary), primary)
#primary = sub("\\{", "{", primary)
})
printed_table = print(latex_table)
printed_table = sub("backslash", "\\", printed_table)
printed_table = sub("\\\\}", "}", printed_table)
printed_table = sub("\\\\\\{", "{", printed_table)
printed_table = sub("\\$", "\\", printed_table)
printed_table = sub("\\$", "\\", printed_table)
cat(printed_table)
Which leads to:
% latex table generated in R 2.14.1 by xtable 1.6-0 package
% Thu Feb 16 13:10:55 2012
\begin{table}[ht]
\begin{center}
\begin{tabular}{rrrr}
\hline
& primary & secondary & tertiary \\
\hline
BE & 10 & 5 & 11 \\
HU & \textbf{13} & 13 & 8 \\
NL & 11 & 17 & 12 \\
\hline
\end{tabular}
\end{center}
\end{table}
This example makes a number in the primary category bold, but it can work for colorization just as easily. Maybe someone else has a more elegant solution?
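Instead of post-editing the printed LaTeX, the markup can also be decided while the rows are built. A Python sketch of the row-mean rule from the question: any cell that differs from its row mean by more than a threshold is wrapped in `\textcolor{red}{...}` (from xcolor). The counts are the ones in the output above; the threshold of 3 and the helper name `highlight_row` are arbitrary choices for illustration.

```python
def highlight_row(label, values, threshold):
    """Return one LaTeX tabular row; cells farther than `threshold`
    from the row mean are wrapped in \\textcolor{red}{...}."""
    mean = sum(values) / len(values)
    cells = []
    for v in values:
        text = str(v)
        if abs(v - mean) > threshold:
            text = f"\\textcolor{{red}}{{{text}}}"
        cells.append(text)
    return " & ".join([label] + cells) + r" \\"

table = {"BE": [10, 5, 11], "HU": [13, 13, 8], "NL": [11, 17, 12]}
for label, values in table.items():
    print(highlight_row(label, values, threshold=3))
```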