An urn contains 10 balls, of which 3 are white, 4 are blue and 3 are black. Three balls are drawn at random from the urn. I set up the sample space with the following code:
require(prob)
L<-rep(c("White","Blue","Black"),times=c(3,4,3))
M<-urnsamples(L,size=3,replace=FALSE, ordered=FALSE)
N<-probspace(M)
When I calculate the probability that the sample contains at least one white and one black ball, I get the right answer:
> Prob(N,isin(N,c("White","Black")))
[1] 0.45
But when I try to calculate the probability of drawing one ball of each colour, or of drawing two white balls and one black ball, I get 0:
> Prob(N,isrep(N,"White","Blue","Black",1,1,1))
[1] 0
> Prob(N,isrep(N,"White","Black",2,1))
[1] 0
Is there something wrong with the code? Logically the answers should be 0.3 (one ball of each colour) and 0.075 (two white, one black). And if it works in the first case, why not in the second and third, since all three should use similar code?
You want to be able to specify the number of times that a certain color will appear in your results.
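Bear in mind that we are somewhat limited by the sample size that you set, which was 3. We can see the list of possible combinations of 3 colors and their probabilities in an easy-to-read format using noorder. Note that the output below uses the labels "Ash Gray" and "Ghost White" for the two colours that have three balls each: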
noorder(N)
   X1          X2          X3          probs
1  Ash Gray    Ash Gray    Ash Gray    0.008333333
2  Ash Gray    Ash Gray    Blue        0.100000000
3  Ash Gray    Blue        Blue        0.150000000
4  Blue        Blue        Blue        0.033333333
5  Ash Gray    Ash Gray    Ghost White 0.075000000
6  Ash Gray    Blue        Ghost White 0.300000000
7  Blue        Blue        Ghost White 0.150000000
8  Ash Gray    Ghost White Ghost White 0.075000000
9  Blue        Ghost White Ghost White 0.100000000
10 Ghost White Ghost White Ghost White 0.008333333
So from that table you can see that the probability of having 3 "Ash Gray" balls for instance is 0.008333333.
If we want to find the probability of having at least 2 "Ghost White" balls in the sample:
Q <- noorder(N)
Prob(Q,isin(Q,c("Ghost White", "Ghost White")))
[1] 0.1833333
We can verify this answer using the table above:
> 0.100000000+0.008333333+0.075000000
[1] 0.1833333
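As a cross-check that does not depend on the prob package, the table can also be reproduced by brute-force counting. The following is only a minimal Python sketch (standard library only); the urn labels follow the table above, and the helper name composition_probs is made up for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Urn as in the table above: 3 "Ash Gray", 4 "Blue", 3 "Ghost White".
urn = ["Ash Gray"] * 3 + ["Blue"] * 4 + ["Ghost White"] * 3


def composition_probs(urn, size):
    """Probability of each unordered colour composition (hypothetical helper)."""
    # Enumerate ball positions, not labels, so every physical ball is distinct.
    draws = list(combinations(range(len(urn)), size))
    tally = Counter(tuple(sorted(urn[i] for i in draw)) for draw in draws)
    return {comp: Fraction(n, len(draws)) for comp, n in tally.items()}


probs = composition_probs(urn, 3)

# Exactly one ball of each colour (row 6 of the table).
one_of_each = probs[("Ash Gray", "Blue", "Ghost White")]
# Exactly one "Ash Gray" and two "Ghost White" (row 8 of the table).
two_and_one = probs[("Ash Gray", "Ghost White", "Ghost White")]
# At least two "Ghost White" (rows 8, 9 and 10 of the table).
at_least_two = sum(p for comp, p in probs.items() if comp.count("Ghost White") >= 2)

print(float(one_of_each), float(two_and_one), float(at_least_two))
```

Working with Fraction keeps the results exact (3/10, 3/40 and 11/60), so rounding in the printed table does not matter for the comparison.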
Let's make the sample size bigger and experiment some more.
M<-urnsamples(L,size=7,replace=FALSE, ordered=FALSE)
N<-probspace(M)
Q <- noorder(N)
With a sample size of 7 the probability of at least 2 "Ash Gray" and at least 1 "Ghost White" is:
Prob(Q,isin(Q,c("Ash Gray", rep(c("Ghost White", "Ash Gray"),1))))
[1] 0.8083333
and the probability of at least 3 "Ash Gray" and at least 2 "Ghost White" is:
> Prob(Q,isin(Q,c("Ash Gray", rep(c("Ghost White", "Ash Gray"),2))))
[1] 0.1833333
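The isin calls above appear to test whether a draw contains the requested balls at least that many times, not exactly that many times. A rough Python sketch of that "contains at least" check, counted by hand over all draws, is given here; prob_contains is a hypothetical helper name and the urn labels are taken from the output above.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

urn = ["Ash Gray"] * 3 + ["Blue"] * 4 + ["Ghost White"] * 3


def prob_contains(urn, size, required):
    """P(an unordered draw of `size` balls contains at least `required`)."""
    need = Counter(required)
    draws = list(combinations(range(len(urn)), size))
    hits = 0
    for draw in draws:
        have = Counter(urn[i] for i in draw)
        # Multiset containment: each required colour occurs often enough.
        if all(have[colour] >= n for colour, n in need.items()):
            hits += 1
    return Fraction(hits, len(draws))


# At least 2 "Ash Gray" and 1 "Ghost White" in a draw of 7.
p_2_1 = prob_contains(urn, 7, ["Ash Gray", "Ash Gray", "Ghost White"])
# At least 3 "Ash Gray" and 2 "Ghost White" in a draw of 7.
p_3_2 = prob_contains(urn, 7, ["Ash Gray"] * 3 + ["Ghost White"] * 2)

print(float(p_2_1), float(p_3_2))
```

The two fractions come out as 97/120 and 11/60, which agree with the 0.8083333 and 0.1833333 reported above.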
I want to plot my data using boxxyerrorbars in gnuplot.
The data looks like this:
#x y fill-color border-color
2 2 0.50 1.00
8 2 0.25 0.50
8 8 0.40 0.40
2 8 0.50 0.50
Column 1 gives the x coordinates and column 2 gives the y coordinates.
Column 3 gives the fill colour and column 4 gives the border colour.
I will effectively be placing squares of side 2 at all these points.
The colour is chosen from a palette with the range [0.0:1.0].
The plot will look something like this (this is just a sample):
If I were taking only the fill colour from the data, I would plot it with boxxyerrorbar as
plot "data.txt" u 1:2:(1.0):(1.0):3 with boxxyerrorbar fs solid lc palette
If I were taking only the border colour from the data, I would plot it with boxxyerrorbar as
plot "data.txt" u 1:2:(1.0):(1.0):4 with boxxyerrorbar fs border lc palette
Using the palette for both is also fine:
plot "data.txt" u 1:2:(1.0):(1.0):4 with boxxyerrorbar fs border lc palette fc palette
But this uses column 4 for both colours, whereas I need them to be read from columns 3 and 4 separately. Or at least I would like one colour to be a function of the same column that is used for the palette, like:
plot "data.txt" u 1:2:(1.0):(1.0):4 with boxxyerrorbar fs border lc (column(4)*0.5) fc palette
More generally, I want the numbers in some columns to determine the border or fill colour in my plot, so that I could do something like:
plot "data.txt" u 1:2:(1.0):(1.0):4 with boxxyerrorbar fs border lc (2*(column 5)-0.60) fc (palette * 0.24)
I need some kind of solution to this problem; please help me.
Finally, this is the plot I want to make (the basic requirement is covered by the question above). It is a phase diagram: grey borders are data I have, red borders mean the point is extrapolated.
It was done by plotting two files that store the fill colour and the border colour separately (inspired by @theozh's answer).
The first "quick and dirty" solution which comes to my mind is to plot the boxes twice:
once with a solid fill in a variable color, and a second time empty, for the border, with another variable color.
Plotting twice should work the same with a palette.
Hopefully there will be a better solution for this.
Script:
### different colors for fill and border from datafile
reset session
$Data <<EOD
#x y fill-color border-color
2 2 0x0000ff 0x000000
8 2 0x00ff00 0xffaa00
8 8 0xffffff 0xff0000
2 8 0xffff00 0x0000ff
EOD
set size square
set offsets 1,1,1,1
set style fill solid 1.0 border lc rgb var
set key noautotitle
plot $Data u 1:2:(1.0):(1.0):3 w boxxy fs solid lc rgb var, \
'' u 1:2:(1.0):(1.0):4 w boxxy fs empty lc rgb var lw 2
### end of script
Result:
Addition:
Based on the comments and modified question, here is another suggestion.
My understanding is that you want to have only one number to define the colors for fill and border of a square. In your example you only have 2 colors for fill and 2 colors for border, which makes 4 possible combinations in total.
The example below expands this to 16 colors each for border and fill, which makes it 256 combinations.
Check the table $myColors and select a color for the fill and a color for the border. Add the corresponding numbers and put the result into your datafile as third column.
For example:
black fill=0 + red border=16 --> 16
green fill=2 + grey border=128 --> 130
yellow fill=4 + blue border=48 --> 52
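To make the arithmetic behind these examples explicit: the fill index sits in the low 4 bits of the number and the border index in the next 4 bits. The following Python sketch only illustrates that encoding; NAMES mirrors the order of the $myColors table in the script below, and encode and decode are hypothetical helper names, not part of gnuplot.

```python
# Colour names in the same order as the $myColors table (index 0..15).
NAMES = [
    "black", "red", "green", "blue", "yellow", "orange", "magenta", "cyan",
    "grey", "light-grey", "dark-grey", "purple", "web-green", "web-blue",
    "light-pink", "white",
]


def encode(fill, border):
    """Combine a fill and a border colour name into one number (0..255)."""
    return NAMES.index(fill) + 16 * NAMES.index(border)


def decode(n):
    """Split a combined number back into (fill name, border name)."""
    fill_idx = int(n) % 16            # low part: fill colour
    border_idx = (int(n) // 16) % 16  # integer division, as in borderColor(n)
    return NAMES[fill_idx], NAMES[border_idx]


print(encode("green", "grey"))  # the second example above
print(decode(241))              # a value used in the 5x5 example data
```

The % 16 and integer division by 16 correspond to what fillColor(n) and borderColor(n) do in the gnuplot script.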
You can easily change the colors or expand the table to many more combinations. By the way, you can list the predefined gnuplot colors by typing show colors.
Furthermore, I noticed that plotting borders of adjacent squares will "overwrite" each other in the graph, i.e. visibility of a border depends on the plotting sequence. In order to be independent of this, I would suggest to plot larger and smaller filled squares instead of bordered squares.
Another suggestion: instead of going for logscale (making the size for your boxes difficult) you can stay in linear scale and simply adjust the tic-labels via xtic() and ytic(), check help xticlabels.
All these are just suggestions, since I might not know all the background and details for your specific plotting case.
Script: (only 5x5 table for illustration)
### define fill and border by a single number
reset session
$Data <<EOD
# x y colors
1 1 128
1 2 128
1 3 128
1 4 16
1 5 52
2 1 128
2 2 128
2 3 128
2 4 16
2 5 16
3 1 130
3 2 128
3 3 128
3 4 16
3 5 241
4 1 130
4 2 130
4 3 128
4 4 16
4 5 16
5 1 18
5 2 18
5 3 130
5 4 16
5 5 16
EOD
# color fill border name
$myColors <<EOD
0x000000 0 0 black
0xff0000 1 16 red
0x00ff00 2 32 green
0x0000ff 3 48 blue
0xffff00 4 64 yellow
0xffa500 5 80 orange
0xff00ff 6 96 magenta
0x00ffff 7 112 cyan
0xc0c0c0 8 128 grey
0xd3d3d3 9 144 light-grey
0xa0a0a0 10 160 dark-grey
0xc080ff 11 176 purple
0x00c000 12 192 web-green
0x0080ff 13 208 web-blue
0xffb6c1 14 224 light-pink
0xffffff 15 240 white
EOD
fillColor(n) = int(word($myColors[int(n)%16+1],1))
borderColor(n) = int(word($myColors[(int(n)/16)%16+1],1)) # integer division!
set size square
set xrange [:] noextend
set yrange [:] noextend
set tics out
set style fill solid 1.0 noborder
set key noautotitle
plot $Data u 1:2:(0.50):(0.50):(borderColor($3)): \
xtic(sprintf("%g",2**($1+1))):ytic(sprintf("%g",2**($2+1))) \
w boxxy lc rgb var, \
'' u 1:2:(0.46):(0.46):(fillColor($3)) w boxxy lc rgb var
### end of script
Result:
I got a temporary solution to the actual problem I had.
I defined my palette as
set palette model RGB defined ( 0 'black', 0.3 'red', 0.6 'grey', 1 'green')
I stored my border colours in 'border.dat' and my fill colours in 'fill.dat',
and plotted them like this:
plot for [i=1:13] "fill.dat" u 1:(2**(i+1)):($1/2.0**0.5):($1*2.0**0.5):(2**(i+1)/2.0**0.5):(2**(i+1)*2.0**0.5):i+1 with boxxyerror fc palette,\
for [i=1:13] "border.dat" u 1:(2**(i+1)):($1/2.0**0.5):($1*2.0**0.5):(2**(i+1)/2.0**0.5):(2**(i+1)*2.0**0.5):i+1 with boxxyerror fs empty lc palette lw 2
(Sorry, the coordinates and box dimensions look messy because I am plotting on a log scale.)
And finally I got this! (Red borders mean the point is extrapolated, i.e. I don't have data for it.)
I am trying to find a way to use an apply function together with subset (or a custom function based on subset). I know similar questions have already been asked, but mine is a little more specific. I need to subset a certain part of multiple data sets based on more than one variable. I have a couple of "types" of data frame structures; one of them looks similar to this:
colour shade value
RED LIGHT -1.05
RED LIGHT -1.37
RED LIGHT -0.32
RED LIGHT 0.87
RED LIGHT -0.2
RED DARK 0.52
RED DARK -0.2
RED DARK 0.64
RED DARK 1.12
RED DARK 4
BLUE LIGHT 0.93
BLUE LIGHT 0.78
BLUE LIGHT -1.84
BLUE LIGHT -0.5
BLUE LIGHT -1.11
BLUE DARK -4.86
BLUE DARK 1.11
BLUE DARK 0.14
BLUE DARK 0.12
BLUE DARK -1.65
GREEN LIGHT 3.13
GREEN LIGHT 2.65
GREEN LIGHT -2.36
GREEN LIGHT -3.11
GREEN LIGHT 3.49
GREEN DARK 1.91
GREEN DARK -1.1
GREEN DARK -1.93
GREEN DARK 1
GREEN DARK -0.23
I have a lot of those. Their names are stored in
list.dfs.names=df1,df2,df3
Based on this I need to use subset, or a custom function based on it:
customSubset=function(df,col,shade){subset(df,df$colour %in% col & df$shade %in% shade)}
I use custom functions like this because, as I said, I have a couple of types of df structures and it speeds up my work a little. It works like this:
example=customSubset(df1,"BLUE","DARK")
and output is:
colour shade value
11 BLUE LIGHT 0.93
12 BLUE LIGHT 0.78
13 BLUE LIGHT -1.84
14 BLUE LIGHT -0.50
15 BLUE LIGHT -1.11
16 BLUE DARK -4.86
17 BLUE DARK 1.11
18 BLUE DARK 0.14
19 BLUE DARK 0.12
20 BLUE DARK -1.65
Until now I was using for loops, but I want to switch to apply, which seems more convenient, especially where nested loops are required. So I tried:
lapply(customSubset(list.dfs.names, "BLUE","DARK") )
and
lapply(list.dfs.names, customSubset("BLUE","DARK") )
with no success. Could anyone give me a hand with this issue? I don't think I clearly understand how apply loops work. However, I am quite familiar with the for method, so any additional explanation of the differences would be appreciated.
If it is not possible with customSubset, it's OK for me to use regular subset or any other method that produces the same result as the example presented above.
Thank you in advance.
EDIT: here is code to produce a df similar to the example I posted:
data.frame(colour=c(rep("RED",10),rep("BLUE",10),rep("GREEN",10))
,shade=c(rep(c(rep("LIGHT",5),rep("DARK",5)),3))
,value=runif(30,min=0,max=1))
EDIT2: As requested, I am editing my post to expand on my year problem. My dfs come from different years (several from each), for example: df.1.2012, df.2.2012, df.1.2011 and so on. The main issue is that I never need to refer to the same year in all of the dfs (that would be very easy); instead I need to subset the data based on a certain horizon (for example year+2 or year-1). I used to create a list of the desired years (for example, with year+2 it would be list.year=c(2014,2014,2013)), which was paired with the list of my dfs (that is how it worked with the for loop).
I need to find a similar method for the apply approach. Here is an example:
set.seed(200)
df_2014=data.frame(colour=(c(rep("RED",10),rep("BLUE",10),rep("GREEN",10)))
,shade=c(rep(c(rep("LIGHT",5),rep("DARK",5)),3))
,year=c(rep(2011:2015,6))
,value=runif(30,min=0,max=1))
df_2013=data.frame(colour=(c(rep("RED",10),rep("BLUE",10),rep("GREEN",10)))
,shade=c(rep(c(rep("LIGHT",5),rep("DARK",5)),3))
,year=c(rep(2011:2015,6))
,value=runif(30,min=0,max=1))
horizon=+1
subset(df_2014, df_2014$colour %in% "BLUE" & df_2014$shade %in% "DARK" & df_2014$year %in% c(2014+horizon))
subset(df_2013, df_2013$colour %in% "BLUE" & df_2013$shade %in% "DARK" & df_2013$year %in% c(2013+horizon))
So I added a column with years, called it year, and named the dfs after their year (so year+1 would here be 2014+1). Horizon is self-explanatory. The result is:
#df_2014
colour shade year value
20 BLUE DARK 2015 0.6463296
#df_2013
colour shade year value
20 BLUE DARK 2015 0.6532767
I need to apply a function to a list of data frames (in this edit, list.df=list(df_2014,df_2013)) as in the previous example, but this time add the subset condition year+horizon (and possibly put all the results in one df, but that is not the main issue here).
In conclusion: in both of my subset calls, in the part year+horizon, the year has to change depending on which df from the list the loop refers to, while horizon is constant.
If you have trouble understanding what I mean, please let me know; I tried to be very specific.
The problem seems to be the construct
subset(df,df$colour %in% col & df$shade %in% shade)
subset evaluates the logical expression with the columns of its first argument, df, in scope. So in df$shade %in% shade the name shade refers to the column df$shade rather than to the function argument, and the condition is equivalent to shade %in% shade, which is always TRUE. Rewriting the function so that the argument names differ from the column names will do the trick.
customSubset <- function(DF, COL, SHADE){
subset(DF, colour %in% COL & shade %in% SHADE)
}
Now everything works as expected.
set.seed(5601) # make the results reproducible
df1 <- data.frame(colour = sample(c("RED", "GREEN", "BLUE"), 30, TRUE),
shade = sample(c("LIGHT", "DARK"), 30, TRUE),
value = rnorm(30, sd = 9))
df2 <- data.frame(colour = c(rep("RED",10), rep("BLUE",10), rep("GREEN",10))
,shade=c(rep(c(rep("LIGHT",5),rep("DARK",5)), 3))
, value = runif(30,min=0,max=1))
list.dfs <- list(df1, df2)
customSubset(df1,"BLUE","DARK")
# colour shade value
#5 BLUE DARK 4.288107
#6 BLUE DARK 2.860724
#8 BLUE DARK -10.720379
#10 BLUE DARK -15.407090
#14 BLUE DARK -2.259848
#30 BLUE DARK -18.364494
# apply the function to all df's in the list
# both forms are equivalent
lapply(list.dfs, function(x) customSubset(x, "BLUE", "DARK"))
lapply(list.dfs, customSubset, "BLUE", "DARK")
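The code above does not cover the EDIT2 case, where the year condition differs for each data frame. The general idea there is to walk over the data frames and their base years in parallel while keeping horizon fixed (in R, Map or mapply iterate over several lists at once). The sketch below only illustrates that idea in Python, with plain lists of dicts standing in for data frames; custom_subset, make_frame, the row layout and the toy values are all assumptions made for this illustration, not the asker's data.

```python
def custom_subset(rows, colour, shade, year):
    """Keep rows matching colour, shade and year (hypothetical helper)."""
    return [r for r in rows
            if r["colour"] == colour and r["shade"] == shade and r["year"] == year]


def make_frame(offset):
    """Toy stand-in for a data frame: one row per colour/shade/year."""
    return [{"colour": c, "shade": s, "year": y, "value": offset + y}
            for c in ("RED", "BLUE", "GREEN")
            for s in ("LIGHT", "DARK")
            for y in range(2011, 2016)]


frames = [make_frame(0.1), make_frame(0.2)]  # stand-ins for df_2014, df_2013
base_years = [2014, 2013]                     # one base year per frame
horizon = 1                                   # constant across frames

# Pair each frame with its own base year; horizon stays the same.
results = [custom_subset(rows, "BLUE", "DARK", year + horizon)
           for rows, year in zip(frames, base_years)]

# Stack the per-frame results into a single list of rows.
combined = [row for part in results for row in part]
print(combined)
```

With these toy frames the first result holds the 2015 row and the second the 2014 row, because each frame is filtered against its own base year plus the horizon.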
I have a 2 x 2 x 2 design, where data(dat) looks like this.
bucket size level marbles distance
Blue Large Low 80 30
Blue Large High 9 33
Blue Small Low 91 1
Blue Small High 2 11
White Large Low 80 21
White Large High 9 13
White Small Low 91 52
White Small High 2 17
I want to plot the relationship between marbles and distance, and use the following code:
ggplot(dat,aes(x = marbles, y = distance, colour=size, shape=level)) +
geom_point() + geom_smooth(method="lm", fill=NA) + facet_grid(~bucket)
In addition to having different colours for size, I want to have different colours and shapes for the levels. However, I'm not sure how this can be achieved using this code. Also, is there a way to add 95% confidence intervals to these for the corresponding treatments?
I am plotting two undesirable statistics:
Columns: AGG(%SEP11)
Row: AGG(%Outdated_Defs)
This is how my graph looks. Only the points that have 50% or more installations of SEP 11 are red, even if they have high % of outdated defs.
I wish to make it such that sites with a high % of outdated defs are also red.
In other words, only the bottom left of the scatterplot should have green dots; the rest should have shades of red, with the top right quadrant having the deepest red dots.
Please help!
One option is to create a calculated field called bad_color:
IF AGG(%SEP11) >= 0.5 OR AGG(%Outdated_Defs) >= 0.5 THEN 1 ELSE 0 END
Then drag bad_color to the color field. Double-click on the color field and select red for 1 and green for 0.
I have a need that I imagine could be satisfied by aggregate or reshape, but I can't quite figure it out.
I have a list of names with the colour of the car that each person owns. The data is in long form, so a name can have multiple colours. I'd like to merge by name and get the max colour.
For example,
Name car_colour
Euler blue
Gauss red
Hilbert white
Hilbert green
Knuth yellow
Knuth orange
Knuth cyan
Knuth violet
Knuth darkblue
Would become...
Name car_color
Euler blue
Gauss red
Hilbert green
Knuth cyan
How would I accomplish this?
Sorry guys but the answer was very simple:
> Name=c('Euler','Gauss','Hilbert','Hilbert','Knuth','Knuth','Knuth','Knuth','Knuth')
> car_colour=c('blue','red','white','green','yellow','orange','cyan','violet','darkblue')
> nc=as.data.frame(cbind(Name,car_colour))
> nc
Name car_colour
1 Euler blue
2 Gauss red
3 Hilbert white
4 Hilbert green
5 Knuth yellow
6 Knuth orange
7 Knuth cyan
8 Knuth violet
9 Knuth darkblue
> nc.agg <- aggregate( as.character(car_colour) ~ Name, nc, FUN = "min")
> nc.agg
Name as.character(car_colour)
1 Euler blue
2 Gauss red
3 Hilbert green
4 Knuth cyan
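The aggregate call above keeps, for each name, the alphabetically smallest colour. For comparison, the same grouping logic can be sketched in plain Python; the pairs list simply restates the example data and first_colour is a name made up for this illustration.

```python
pairs = [
    ("Euler", "blue"), ("Gauss", "red"),
    ("Hilbert", "white"), ("Hilbert", "green"),
    ("Knuth", "yellow"), ("Knuth", "orange"), ("Knuth", "cyan"),
    ("Knuth", "violet"), ("Knuth", "darkblue"),
]

# Keep the alphabetically smallest colour seen so far for each name.
first_colour = {}
for name, colour in pairs:
    first_colour[name] = min(first_colour.get(name, colour), colour)

for name, colour in first_colour.items():
    print(name, colour)
```

Replacing min with max would give the alphabetically largest colour instead.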
Quick, super awful method!
I made an example called test.
test
letter color
[1,] "a" "blue"
[2,] "a" "red"
[3,] "a" "red"
[4,] "b" "orange"
[5,] "c" "green"
testTable=table(test[,1], test[,2])
testNames=colnames(testTable)[apply(testTable,1,which.max)]
testOut=data.frame(letter=unique(test[,1]), color=testNames)
I'm thinking someone else might have a less silly way to get to the same answer. I'll happily upvote someone who one-liners it!
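The table/which.max approach above picks the most frequent colour per letter. A short Python sketch of the same idea follows, using the test data from above; the variable names are made up for this illustration, and ties may be broken differently than by which.max.

```python
from collections import Counter, defaultdict

test = [("a", "blue"), ("a", "red"), ("a", "red"), ("b", "orange"), ("c", "green")]

# One Counter per letter, counting how often each colour occurs.
counts = defaultdict(Counter)
for letter, color in test:
    counts[letter][color] += 1

# Most frequent colour for each letter.
most_common = {letter: c.most_common(1)[0][0] for letter, c in counts.items()}
print(most_common)  # {'a': 'red', 'b': 'orange', 'c': 'green'}
```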