Plotting a barplot differently in R - r

Suppose I have the following data frame:
> example
col1 col2 col3
1 -1 1 -1
2 0 -1 3
3 1 10 -1
and I want to draw a barplot. Using row 3 as an example, I do barplot(example[3,]). This works perfectly. However, I want to flip some values and add color. Specifically:
if the value is negative (e.g., -1 in row 3), I want to flip it to +1 and color it red in the barplot (note that the row already contains a +1 value, which should not be colored red);
if the value is >= +10, color that bar green in the barplot.
How can I do the above?
> dput(example)
structure(list(col1 = c(-1, 0, 1), col2 = c(1, -1, 10), col3 = c(-1,
3, -1)), row.names = c(NA, 3L), class = "data.frame")

Here is a complete solution using only base R:
df <- structure(list(col1 = c(-1, 0, 1),
col2 = c(1, -1, 10),
col3 = c(-1, 3, -1)),
row.names = c(NA, 3L),
class = "data.frame")
# Build a color index for every cell: 1 = default, 2 = negative, 3 = value >= 10.
# Multiplying the (df >= 10) condition by 2 keeps the three cases on distinct levels.
color <- (df >= 10) * 2 + (df < 0) + 1
# Flip negative values to positive (e.g. -1 becomes +1)
df[df < 0] <- -df[df < 0]
# Map the index to color names, keeping the shape of df
color <- matrix(c("black", "red", "green")[color], nrow = nrow(color), ncol = ncol(color), byrow = FALSE)
# Plot column 3 with its colors
barplot(df[,3], col = color[,3])
EDIT 1
To plot row 3 instead, you can transpose with t():
barplot(t(df)[, 3], col = t(color)[, 3])
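The same idea can be sketched for a single row without building the full color matrix. The helper name flip_and_color and its threshold argument are made up for illustration and are not part of the answer above; it assumes that "flip" means taking the absolute value.

```r
# Hypothetical helper: returns bar heights and colors for one row or vector
flip_and_color <- function(x, threshold = 10) {
  x <- unlist(x)                      # accept a data-frame row or a plain vector
  cols <- rep("black", length(x))     # default color
  cols[x < 0] <- "red"                # negative values are flagged before flipping
  cols[x >= threshold] <- "green"     # large values
  list(height = abs(x), col = cols)   # abs() flips negatives to positive
}

example <- data.frame(col1 = c(-1, 0, 1),
                      col2 = c(1, -1, 10),
                      col3 = c(-1, 3, -1))
res <- flip_and_color(example[3, ])
```

Calling barplot(res$height, col = res$col) should then draw row 3 with the original +1 in black, the 10 in green, and the flipped -1 in red.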

Related

Problems with setting contrasts for ANOVA in R

For testing a specific hypothesis, I am trying to contrast a factor in R.
set.seed(24)
data <- data.frame(var = sample(1:100, 70, replace = TRUE),
version = rep(c("v3", "v4", "v1", "v3", "v4","v2","v2"),times=10))
c1 <- c(1/3, -1, 1/3, 1/3)
c2 <- c(0, -1, 1, 0)
c3 <- c(0, -1, 0, 1)
c4 <- c(1, -1, 0, 0)
mat <- cbind(c1, c2, c3, c4)
contrasts(data$version, how.many = 4) <- mat
model <- aov(var ~ version, data = data)
summary.aov(model, split=list(version=list("comparison1"=1,"comparison2"= 2,
"comparison3"=3,"comparison4"= 4)))
Why is there no result for comparison 4? How can I fix that? Thanks.
Specify the how.many parameter and it should work. According to ?contrasts:
how.many: How many contrasts should be made. Defaults to one less than the number of levels of x. This need not be the same as the number of columns of value.
So what we observe when assigning without any how.many parameter is the default behavior: only one fewer contrast than the number of levels is kept. With how.many = 4:
contrasts(data$var, how.many = 4) <- mat
contrasts(data$var)
# c1 c2 c3 c4
#var1 0.3333333 0 0 1
#var2 -1.0000000 -1 -1 -1
#var3 0.3333333 1 0 0
#var4 0.3333333 0 1 0
Data:
set.seed(24)
# stringsAsFactors = TRUE because contrasts() applies only to factors
data <- data.frame(var = sample(paste0("var", 1:4), 20, replace = TRUE), stringsAsFactors = TRUE)
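A minimal sketch of the difference between the default and how.many = 4, assuming a hypothetical four-level factor f and the contrast columns from the question:

```r
# Hypothetical four-level factor; the contrast columns are those from the question
f <- factor(c("var1", "var2", "var3", "var4"))
mat <- cbind(c1 = c(1/3, -1, 1/3, 1/3),
             c2 = c(0, -1, 1, 0),
             c3 = c(0, -1, 0, 1),
             c4 = c(1, -1, 0, 0))

# Default: one less than the number of levels, so only 3 columns are kept
contrasts(f) <- mat
n_default <- ncol(contrasts(f))

# Explicit how.many: all 4 supplied columns are kept
contrasts(f, how.many = 4) <- mat
n_all <- ncol(contrasts(f))
```

This only checks the stored contrast matrix. A four-level factor contributes only three degrees of freedom to a model, so a fourth column may still be reported as aliased when the model is fitted.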

Replace Value based on Value Before

I have a tibble with variables that use '99' or '999' as a catch-all level for all values above the level before it. How would I change that to the value before + 2?
Example below.
level <- c(1,2,3,4,99)
variable <- c('age', 'age','age','age','age')
value <- c(.5, .75, 1, 1.25, 1.89)
d <- data.frame(Variable = variable, Level = level, value = value)
I would like to end up with
Variable Level Value
age 1 .5
age 2 .75
age 3 1
age 4 1.25
age 6 1.89
I'm not even sure where to start in picking the value before the 99 based on the condition that the starting value is 99.
Maybe
d$Level <- if(d$Level = 99, nrow(-1) + 2, d$Level)
I would use data.table::shift():
with(d, ifelse(Level %in% c(99, 999), data.table::shift(Level) + 2, Level))
[1] 1 2 3 4 6
But to do this in base R you could define a helper function:
baseShiftBy1 <- function(x) c(NA, x[-length(x)])
with(d, ifelse(Level %in% c(99, 999), baseShiftBy1(Level) + 2, Level))
Create a new vector with lagged values:
temp <- c(0, d$Level[1:(length(d$Level) - 1)]) + 2
d$Level <- ifelse(d$Level == 99, temp, d$Level)
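For comparison, here is a base R sketch that replaces the sentinel rows in place. The names sentinels, is_top and prev are chosen here for illustration; the sketch assumes the rows are already ordered by Level within a single variable.

```r
d <- data.frame(Variable = rep("age", 5),
                Level = c(1, 2, 3, 4, 99),
                value = c(0.5, 0.75, 1, 1.25, 1.89))

sentinels <- c(99, 999)                 # codes meaning "everything above"
is_top <- d$Level %in% sentinels        # rows that need replacing
prev <- c(NA, head(d$Level, -1))        # previous row's Level (lag by one)
d$Level[is_top] <- prev[is_top] + 2     # previous level + 2
```

If a sentinel appears in the first row there is no previous value, so that row becomes NA.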

Achieving t random variables with each different df and ncp in R?

I'm trying to generate 5 random t variates using rt(), with each of the 5 having a particular df (respectively, from 1 to 5) and a particular ncp (respectively, seq(0, 1, l = 5)). So, 5 random t-variables each having a different df and a different ncp.
To achieve the above, I tried the below with no success. What could be the efficient R code to achieve what I described above?
vec.rt = Vectorize(function(n, df, ncp) rt(n, df, ncp), c("n", "df", "ncp"))
vec.rt(n = 5, df = 1:5, ncp = seq(0, 1, l = 5))
Or
mapply(FUN = rt, n = 5 , df = 1:5, ncp = seq(0, 1, l = 5))
Notice for:
rt(n = 5, df = 1:5, ncp = seq(0, 1, l = 5))
R gives the following warning:
Warning message:
In if (is.na(ncp)) { :
the condition has length > 1 and only the first element will be used
Rephrasing your question helps to find an answer: you want a sample of length 1 (n = 1) from each of 5 random variables, each having different parameters.
mapply(FUN = rt, n = 1 , df = 1:5, ncp = seq(0, 1, l = 5))
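To make the difference between n = 1 and n = 5 concrete, here is a small sketch that checks the shape of each result. The seed and the variable names are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
dfs  <- 1:5
ncps <- seq(0, 1, length.out = 5)

# One draw per (df, ncp) pair: a numeric vector of length 5
one_each <- mapply(FUN = rt, n = 1, df = dfs, ncp = ncps)

# Five draws per (df, ncp) pair: a 5 x 5 matrix, one column per pair
five_each <- mapply(FUN = rt, n = 5, df = dfs, ncp = ncps)
```

mapply() passes the i-th df together with the i-th ncp to rt(), which is why each call sees scalar parameters.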

plot r two categorical variables

I am using the command below to plot two categorical variables in R.
Gender has 2 levels and Income has 9 levels.
spineplot(main$Gender,main$Income, xlab="Gender", ylab="Income levels: 1 is lowest",xaxlabels=c("Male","Female"))
It produces a chart like the one below.
How can I plot this chart in color?
How can I show the % of each income level within each box? For example, female income level 1 has 21% of the data. How can I show 21% within the dark colored area?
################update 1
Adding reproducible example
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1,
1, 1, 1, 2, 1, 1, 1, 1, 1,2,2,2,2),
levels = c(1, 2), labels = c("male", "female"))
gender <- factor(rep(c(1:9),3))
spineplot(fail,gender)
I think it may be easier to do this with a barplot since spineplot doesn't return anything useful.
The default would be the following, but you can adjust the widths of the bars to some other variable (you can see the x-axis coordinates are returned):
par(mfrow = 1:2)
(barplot(table(gender, fail)))
# [1] 0.7 1.9
(barplot(table(gender, fail), width = table(fail)))
# [1] 10.7 26.9
With some final touches we get
tbl <- table(gender, fail)
prp <- prop.table(tbl, 2L)
yat <- prp / 2 + apply(rbind(0, prp[-nrow(prp), ]), 2L, cumsum)
bp <- barplot(prp, width = table(fail), axes = FALSE, col = rainbow(nrow(prp)))
axis(2L, at = yat[, 1L], labels = levels(gender), lwd = 0)
axis(4L)
text(rep(bp, each = nrow(prp)), yat, sprintf('%0.f%%', prp * 100), col = 0)
Compare to
spineplot(fail, gender, col = rainbow(nlevels(gender)))
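The yat line above computes the vertical midpoint of every stacked segment: the lower edge of each segment (cumulative sum of the segments beneath it) plus half its own height. A small sketch on a made-up 3 x 2 table of proportions shows that this matches the perhaps more familiar "upper edge minus half height" form:

```r
# Made-up counts, converted to column proportions
prp <- prop.table(matrix(c(2, 1, 1, 3, 3, 2), nrow = 3), 2L)

# Form used in the answer: lower edge + half height
mid_a <- prp / 2 + apply(rbind(0, prp[-nrow(prp), ]), 2L, cumsum)

# Equivalent form: upper edge - half height
mid_b <- apply(prp, 2L, cumsum) - prp / 2
```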
An alternative to the interesting solution of @rawr is:
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1,
1, 1, 1, 2, 1, 1, 1, 1, 1,2,2,2,2),
levels = c(1, 2), labels = c("male", "female"))
gender <- factor(rep(c(1:9),3))
mypalette <- colorRampPalette(c("lightblue","darkblue"))
tbl <- spineplot(fail, gender, xlab="Gender", ylab="Income levels: 1 is lowest",
xaxlabels=c("Male","Female"), col=mypalette(nlevels(gender)) )
print(tbl)
# Income levels: 1 is lowest
# Gender 1 2 3 4 5 6 7 8 9
# male 2 1 2 1 3 2 2 2 1
# female 1 2 1 2 0 1 1 1 2
print.perc <- function(k, tbl, ndigits=2, str.pct="%") {
# These lines are the same ones spineplot uses
# to calculate the x-position of the stacked bars
nx <- nrow(tbl)
off <- 0.02
xat <- c(0, cumsum(prop.table(margin.table(tbl, 1)) + off))
posx <- (xat[1L:nx] + xat[2L:(nx + 1L)] - off)/2
# Proportions by row (gender)
ptbl <- prop.table(tbl,1)
# Define labels as strings with a given format
lbl <- paste(format(round(100*ptbl[k,], ndigits), nsmall=ndigits), str.pct, sep="")
# Print labels
# cumsum(ptbl[k,])-ptbl[k,]/2 is the vector of y-positions
# for the centers of each stacked bar
text(posx[k], cumsum(ptbl[k,])-ptbl[k,]/2, lbl)
}
# Print income levels for males and females
strsPct <- c("%","%")
for (k in 1:nrow(tbl)) print.perc(k, tbl, ndigits=2, str.pct=strsPct[k])
Hope it can help you.

Moving outward a range of numbers plotted on a curve line

I was wondering how to move the numbers currently plotted on the curve a bit outward, such that however a and b in my R code are changed, the distance between the numbers and the curve remains the same (i.e., constant)?
Please see my R code below:
a = 0 ; b = 1
curve( dnorm(x, mean = a, sd = b ), -4, 4, axes = F, ann = F)
xx <- -4:4
yy <- dnorm(xx, mean = a, sd = b)
text(xx, yy, paste(round(yy, 2) ), font = 2 )
As other colleagues also mentioned, the calculation of this distance can be taken care of by text() itself. One of the most suitable arguments in text() for this purpose is pos. Per the R documentation, pos takes one of 4 integer values (1 to 4), each of which moves the text in one of the 4 main directions (below, left, above, right): see ?text. In this case, 3 (above) produces the desired effect.
Thus, the following might resolve the problem:
a = 0 ; b = 1
curve( dnorm(x, mean = a, sd = b ), -4, 4, axes = F, ann = F)
xx <- -4:4
yy <- dnorm(xx, mean = a, sd = b)
text(xx, yy, paste(round(yy, 2) ), font = 2, pos = 3 )
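If the default gap is too small, text() also has an offset argument that sets the distance from the point in fractions of a character width when pos is given. A sketch, where the value offset = 1 is just an illustrative choice:

```r
pdf(NULL)  # null graphics device so the sketch runs non-interactively; omit in an interactive session
a <- 0; b <- 1
curve(dnorm(x, mean = a, sd = b), -4, 4, axes = FALSE, ann = FALSE)
xx <- -4:4
yy <- dnorm(xx, mean = a, sd = b)
# pos = 3 places labels above the points; offset widens the gap (default is 0.5)
text(xx, yy, round(yy, 2), font = 2, pos = 3, offset = 1)
invisible(dev.off())
```

Because the offset is measured in character widths rather than data units, the gap should stay the same when a and b change.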
a = 0
b = 1
#Draw curve
curve(dnorm(x, mean = a, sd = b ), -4, 4, axes = F, ann = F)
#Assign curve to 'cc' and determine the length of points on the curve
cc = curve(dnorm(x, mean = a, sd = b ), -4, 4, axes = F, ann = F)
l_cc = length(cc$x)
xx <- -4:4
yy <- dnorm(xx, mean = a, sd = b)
#Find indices of values in cc$x closest to xx
slope_inds = findInterval(xx, cc$x)
#Calculate approximate slope of cc for each xx
slope = numeric(0)
for (i in 1:length(slope_inds)){
if (slope_inds[i] == 1){
n = 1
}else if (slope_inds[i] == l_cc){
n = l_cc - 1
}else{
n = slope_inds[i]
}
slope[i] = round(diff(cc$y[n:(n+1)])/diff(cc$x[n:(n+1)]), 1)
}
#Assign pos value based on slope of cc. For ~zero slope, put text on top
# For other slopes assign values accordingly
positions = integer(0)
positions[slope == 0] = 3
positions[slope > 0] = 2
positions[slope < 0] = 4
#Write text
points(xx,yy)
text(xx, yy, paste(round(yy, 2) ), font = 2, pos = positions)
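The mapping from slope to pos can also be written as one vectorized helper. The function name slope_to_pos is made up here; the mapping itself is the one used above (flat: above, rising: left, falling: right).

```r
# Hypothetical helper: choose a text() pos value from the local slope of the curve
slope_to_pos <- function(slope) {
  ifelse(slope == 0, 3L,            # flat: label above
         ifelse(slope > 0, 2L,      # rising: label to the left
                4L))                # falling: label to the right
}

positions <- slope_to_pos(c(0.2, 0, -0.2))
```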
