I have a tibble with variables that use '99' or '999' as a catch-all code for every value above the previous level. How would I change that code to the previous level + 2?
Example below.
level <- c(1,2,3,4,99)
variable <- c('age', 'age','age','age','age')
value <- c(.5, .75, 1, 1.25, 1.89)
d <- data.frame(Variable = variable, Level = level, value = value)
I would like to end up with:
Variable Level Value
age 1 .5
age 2 .75
age 3 1
age 4 1.25
age 6 1.89
I'm not even sure where to start: I need to pick the value before the 99, but only in rows where the current value is 99.
My best guess, which doesn't run:
d$Level <- if(d$Level = 99, nrow(-1) + 2, d$Level)
I would use data.table::shift():
library(data.table)
with(d, ifelse(Level %in% c(99, 999), shift(Level) + 2, Level))
[1] 1 2 3 4 6
But to do this in base R you could define a helper function:
baseShiftBy1 <- function(x) c(NA, x[-length(x)])
with(d, ifelse(Level %in% c(99, 999), baseShiftBy1(Level) + 2, Level))
Create a new vector with lagged values, then swap it in wherever a top code appears:
temp <- c(0, d$Level[-length(d$Level)]) + 2
d$Level <- ifelse(d$Level %in% c(99, 999), temp, d$Level)
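The answers above take the level from the row directly before the top code, so they assume the rows are sorted by Level. A base R variant that does not depend on row order replaces each top code with the largest ordinary level of the same variable + 2, using ave() to work within each Variable. This is a sketch under that assumption; fix_topcode is a hypothetical helper name, and each variable is assumed to have at least one ordinary level:

```r
# Example data from the question
level <- c(1, 2, 3, 4, 99)
variable <- c('age', 'age', 'age', 'age', 'age')
value <- c(.5, .75, 1, 1.25, 1.89)
d <- data.frame(Variable = variable, Level = level, value = value)

# Hypothetical helper: replace top codes with (largest ordinary level + 2).
# Assumes at least one non-top-coded level per variable.
fix_topcode <- function(level, codes = c(99, 999)) {
  is_top <- level %in% codes
  level[is_top] <- max(level[!is_top]) + 2
  level
}

# Apply the helper separately within each Variable
d$Level <- ave(d$Level, d$Variable, FUN = fix_topcode)
d$Level
```

Grouping by Variable matters here because "the largest ordinary level" must be computed per variable when several variables are stacked in one long table.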
Suppose I have the following data frame:
> example
col1 col2 col3
1 -1 1 -1
2 0 -1 3
3 1 10 -1
I want to plot a barplot. Using row 3 as an example, I do barplot(example[3,]), and this works. However, I want to flip negative values and add color. Specifically:
if the value is negative (i.e., -1 in row 3), I want to flip it to +1 and color it red in the barplot (note that the row already contains a +1, which should not be colored red);
if the value is >= 10, color that bar green in the barplot.
How can I do the above?
> dput(example)
structure(list(col1 = c(-1, 0, 1), col2 = c(1, -1, 10), col3 = c(-1,
3, -1)), row.names = c(NA, 3L), class = "data.frame")
Here is a complete solution using only base R:
df <- structure(list(col1 = c(-1, 0, 1),
col2 = c(1, -1, 10),
col3 = c(-1, 3, -1)),
row.names = c(NA, 3L),
class = "data.frame")
# Build a matrix of color codes from the conditions:
# 1 = default, 2 = negative, 3 = value >= 10.
# Multiplying the >= 10 condition by 2 gives it a distinct code.
color <- (df >= 10) * 2 + (df < 0) + 1
# Flip negative values to positive (the colors above were computed first)
df[df < 0] <- -df[df < 0]
# Map the codes to color names, keeping the same shape as df
color <- matrix(c("black", "red", "green")[color], nrow = nrow(color), ncol = ncol(color), byrow = FALSE)
# Plot column 3
barplot(df[,3], col = color[,3])
EDIT 1
To plot row 3 instead, transpose with t() so that row 3 becomes column 3:
barplot(t(df)[, 3], col = t(color)[, 3])
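If only one row is ever plotted, a shorter base R sketch works on that row as a plain vector. The colors are chosen from the original values before flipping, so the +1 already in the row keeps the default color; abs() is used instead of adding 2 so any negative value flips. The default color "grey" is an arbitrary choice:

```r
example <- data.frame(col1 = c(-1, 0, 1),
                      col2 = c(1, -1, 10),
                      col3 = c(-1, 3, -1))

# Pull out row 3 as a plain named numeric vector
row3 <- unlist(example[3, ])

# Color by the ORIGINAL values: negative -> red, >= 10 -> green, else grey
bar_cols <- ifelse(row3 < 0, "red", ifelse(row3 >= 10, "green", "grey"))

# Flip negatives to positive for the bar heights
heights <- abs(row3)

barplot(heights, col = bar_cols)
```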
I have the following vector:
vec1 = c(0.001, 0.05, 0.003, 0.1)
and a data frame:
library(tibble)
df = tibble(x = seq(0.001, 0.1, length.out = 10),
            y = seq(0.03, 0.07, length.out = 10),
            z = seq(0, 0.005, length.out = 10),
            w = seq(0.05, 0.25, length.out = 10))
I would like to filter df so that the output contains only the rows in which every column lies between the corresponding value of vec1 - 0.05 (minimum) and vec1 + 0.05 (maximum).
So in this example, only the first 4 rows satisfy this condition (in x I allow -0.049 to 0.051 based on the first entry of vec1, in y I allow 0 to 0.1 based on the second entry, and so on).
I am sure this can be done with filter_all and ., something along the lines of:
filter_all(df, all_vars(. >= (vec1(.) - 0.05) & . <= (vec1(.) + 0.05))))
But this doesn't work.
What am I doing wrong?
We can use mapply to walk over the columns of the data frame together with vec1, check which values satisfy the criterion, and keep only the rows where every column is TRUE:
df[rowSums(mapply(function(x, y) x > (y-0.05) & x < (y+0.05),
df, vec1)) == ncol(df), ]
# x y z w
# <dbl> <dbl> <dbl> <dbl>
#1 0.0120 0.0344 0.000556 0.0722
#2 0.0230 0.0389 0.00111 0.0944
#3 0.0340 0.0433 0.00167 0.117
#4 0.0450 0.0478 0.00222 0.139
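Note that the mapply answer uses strict inequalities. A base R sketch with inclusive bounds (matching the "minimum" and "maximum" wording in the question) builds one logical vector per column with Map() and combines them with Reduce(): with >= and <=, a value lying exactly on a bound is kept (w in the first row is 0.05, which equals 0.1 - 0.05). A plain data.frame stands in for the tibble so the sketch needs no packages:

```r
vec1 <- c(0.001, 0.05, 0.003, 0.1)
df <- data.frame(x = seq(0.001, 0.1, length.out = 10),
                 y = seq(0.03, 0.07, length.out = 10),
                 z = seq(0, 0.005, length.out = 10),
                 w = seq(0.05, 0.25, length.out = 10))

# One logical vector per column: is each value within +/- 0.05 of its vec1 entry?
in_range <- Map(function(col, centre) col >= centre - 0.05 & col <= centre + 0.05,
                df, vec1)

# Keep the rows where every column passes
keep <- Reduce(`&`, in_range)
df[keep, ]
```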
I'm trying to utilize the uniroot function inside a piping scheme. I have root data by depth, and I fit a model for each crop-year set and put the fitted parameter (A in this example) into a tibble. A simplified dataset is below:
mydat <- tribble(
~crop, ~year, ~A,
"corn", 2011, 4,
"corn", 2012, 8.5,
"soy", 2011, 4.2
)
I want to add a column that tells me the x value of my function at y = 0.5. The following code works on its own:
myfunc <- function(x, y, A) {2 + A * x - y}
uniroot(myfunc, y = 0.5, A = 4, lower = 0, upper = 10, extendInt = "yes")
If I try to put it into a piping scheme using dplyr's mutate or do, it doesn't work.
mydat %>%
mutate(x50 = uniroot(myfunc, y = 0.5, A = .$A, lower = 0, upper = 10,
extendInt = "yes"))
mydat %>%
do(x50 = uniroot(myfunc, y = 0.5, A = .$A, lower = 0, upper = 10,
extendInt = "yes"))
The uniroot function is not vectorised over its arguments. Functions like sqrt are:
> sqrt(c(1,2,3))
[1] 1.000000 1.414214 1.732051
but uniroot isn't:
> uniroot(myfunc, y = 0.5, A = c(1,2,3), lower = 0, upper = 10, extendInt = "yes")
Error in uniroot(myfunc, y = 0.5, A = c(1, 2, 3), lower = 0, upper = 10, :
did not succeed extending the interval endpoints for f(lower) * f(upper) <= 0
In addition: Warning messages:
1: In if (is.na(f.lower)) stop("f.lower = f(lower) is NA") :
the condition has length > 1 and only the first element will be used
2: In if (is.na(f.upper)) stop("f.upper = f(upper) is NA") :
the condition has length > 1 and only the first element will be used
And mutate relies on vectorised computation.
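One way to give mutate a vectorised function is to wrap the scalar call in Vectorize(), which loops over the argument with mapply() behind the scenes. A sketch, where root_for_A and root_for_A_vec are hypothetical helper names:

```r
myfunc <- function(x, y, A) {2 + A * x - y}

# Scalar helper: root of myfunc for a single value of A
root_for_A <- function(A) {
  uniroot(myfunc, y = 0.5, A = A, lower = 0, upper = 10, extendInt = "yes")$root
}

# Vectorize() returns a version that accepts a whole vector of A values
root_for_A_vec <- Vectorize(root_for_A)

res <- root_for_A_vec(c(4, 8.5, 4.2))
res
# Inside a pipe it could then be used as, e.g.,
# mydat %>% mutate(x50 = root_for_A_vec(A))
```

Vectorize() does not make uniroot any faster; it only hides the per-element loop so the function fits where a vectorised one is expected.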
Use lapply to iterate over any vector and call a function like this:
> lapply(mydat$A, function(a){uniroot(myfunc, y = 0.5, A = a, lower = 0, upper = 10, extendInt = "yes")$root})
[[1]]
[1] -0.375
[[2]]
[1] -0.1764706
[[3]]
[1] -0.3571429
Then use standard R functions to put that data back in your data frame if that's where you want it.
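For example, a base R sketch using vapply() (like lapply(), but returning a plain numeric vector) to store the roots in a new column. A plain data.frame stands in for the tribble so no packages are needed; the column name x50 follows the question:

```r
mydat <- data.frame(crop = c("corn", "corn", "soy"),
                    year = c(2011, 2012, 2011),
                    A = c(4, 8.5, 4.2))
myfunc <- function(x, y, A) {2 + A * x - y}

# One root per row; numeric(1) tells vapply each result is a single number
mydat$x50 <- vapply(mydat$A, function(a) {
  uniroot(myfunc, y = 0.5, A = a, lower = 0, upper = 10, extendInt = "yes")$root
}, numeric(1))

mydat
```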
You could use purrr::map to build a list column with the results (coercing it to a data.frame), then tidyr::unnest to spread it out into columns...
library(tibble)
library(dplyr)
library(purrr)
library(tidyr)
mydat <- tribble(
~crop, ~year, ~A,
"corn", 2011, 4,
"corn", 2012, 8.5,
"soy", 2011, 4.2
)
myfunc <- function(x, y, A) {2 + A * x - y}
mydat %>%
mutate(x50 = map(A, function(x) {
as.data.frame(uniroot(myfunc, y = 0.5, A = x, lower = 0, upper = 10,
extendInt = "yes"))
})) %>%
unnest(cols = x50)
# # A tibble: 3 x 8
# crop year A root f.root iter init.it estim.prec
# <chr> <dbl> <dbl> <dbl> <dbl> <int> <int> <dbl>
# 1 corn 2011. 4.00 -0.375 0. 20 19 52439.
# 2 corn 2012. 8.50 -0.176 2.22e-16 20 18 0.0000610
# 3 soy 2011. 4.20 -0.357 2.22e-16 21 19 0.0000610
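A dplyr solution is to work row by row with rowwise(), so that uniroot() receives a single A at a time (the native pipe |> needs R 4.1 or later):
library(dplyr)
mydat |>
  rowwise() |>
  mutate(x50 = uniroot(myfunc, c(0, 10), y = 0.5, A = A, extendInt = "yes")$root) |>
  ungroup()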
I am using the command below to plot two categorical variables in R.
Gender has 2 levels and Income has 9 levels.
spineplot(main$Gender,main$Income, xlab="Gender", ylab="Income levels: 1 is lowest",xaxlabels=c("Male","Female"))
It produces a grey-scale spineplot.
How can I plot this chart in color?
How can I show the % of each income level within each box? For example, female income level 1 has 21% of the data. How can I show 21% within the dark colored area?
Update 1
Adding a reproducible example:
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1,
1, 1, 1, 2, 1, 1, 1, 1, 1,2,2,2,2),
levels = c(1, 2), labels = c("male", "female"))
gender <- factor(rep(c(1:9),3))
spineplot(fail,gender)
I think it may be easier to do this with a barplot, since spineplot only returns the contingency table and not the coordinates of the boxes.
The default is shown first, but you can also set the bar widths from some other variable (note that barplot returns the x-axis coordinates of the bar midpoints):
par(mfrow = 1:2)
(barplot(table(gender, fail)))
# [1] 0.7 1.9
(barplot(table(gender, fail), width = table(fail)))
# [1] 10.7 26.9
With some final touches we get:
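tbl <- table(gender, fail)
# Proportions within each column (each level of fail)
prp <- prop.table(tbl, 2L)
# y-position of each label: the middle of its segment in the stacked bar
yat <- prp / 2 + apply(rbind(0, prp[-nrow(prp), ]), 2L, cumsum)
# Bar widths proportional to group size, as in a spineplot
bp <- barplot(prp, width = table(fail), axes = FALSE, col = rainbow(nrow(prp)))
axis(2L, at = yat[, 1L], labels = levels(gender), lwd = 0)
axis(4L)
# Percentage labels; col = 0 draws them in the background color
text(rep(bp, each = nrow(prp)), yat, sprintf('%0.f%%', prp * 100), col = 0)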
Compare with:
spineplot(fail, gender, col = rainbow(nlevels(gender)))
An alternative to the interesting solution from @rawr is:
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1,
1, 1, 1, 2, 1, 1, 1, 1, 1,2,2,2,2),
levels = c(1, 2), labels = c("male", "female"))
gender <- factor(rep(c(1:9),3))
mypalette <- colorRampPalette(c("lightblue","darkblue"))
tbl <- spineplot(fail, gender, xlab="Gender", ylab="Income levels: 1 is lowest",
xaxlabels=c("Male","Female"), col=mypalette(nlevels(gender)) )
print(tbl)
# Income levels: 1 is lowest
# Gender 1 2 3 4 5 6 7 8 9
# male 2 1 2 1 3 2 2 2 1
# female 1 2 1 2 0 1 1 1 2
print.perc <- function(k, tbl, ndigits=2, str.pct="%") {
# These lines are the same ones spineplot uses
# to calculate the x-positions of the stacked bars
nx <- nrow(tbl)
off <- 0.02
xat <- c(0, cumsum(prop.table(margin.table(tbl, 1)) + off))
posx <- (xat[1L:nx] + xat[2L:(nx + 1L)] - off)/2
# Proportions by row (gender)
ptbl <- prop.table(tbl,1)
# Define labels as strings with a given format
lbl <- paste(format(round(100*ptbl[k,], ndigits), nsmall=ndigits), str.pct, sep="")
# Print labels
# cumsum(ptbl[k,])-ptbl[k,]/2 is the vector of y-positions
# for the centers of each stacked bar
text(posx[k], cumsum(ptbl[k,])-ptbl[k,]/2, lbl)
}
# Print income levels for males and females
strsPct <- c("%","%")
for (k in 1:nrow(tbl)) print.perc(k, tbl, ndigits=2, str.pct=strsPct[k])
Hope it can help you.
How can I apply the following function rt to each and every value l in df?
x and y have the following values:
x<-9
y<-1
rt<-function(x,y,l) min(x,max(0,l-y))
df
a b c
5 6 7
1 4 1
2 4 3
If you'd like to stick with data frames, probably the simplest approach is apply() with the MARGIN parameter set to c(1, 2), which applies the function over both rows and columns (i.e., to every cell).
x <- 9
y <- 1
rt <- function(x, y, l) min(x, max(0, l-y))
df <- data.frame(a = c(5, 1, 2),
b = c(6, 4, 4),
c = c(7, 1, 3))
rt_df <- as.data.frame(apply(df, c(1,2), rt, x = x, y = y))
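Because rt uses min() and max(), it can only handle one value at a time. A vectorised sketch swaps in pmin() and pmax(), which work element-wise, and applies the function column by column; rt_vec is a hypothetical name:

```r
x <- 9
y <- 1
df <- data.frame(a = c(5, 1, 2),
                 b = c(6, 4, 4),
                 c = c(7, 1, 3))

# Vectorised version of rt: pmin/pmax compare element by element
rt_vec <- function(x, y, l) pmin(x, pmax(0, l - y))

# x and y are matched by name, so each column is passed as l;
# df[] <- keeps the data.frame structure and column names
rt_df <- df
rt_df[] <- lapply(df, rt_vec, x = x, y = y)
rt_df
```

This avoids apply's round trip through a matrix, which matters if the data frame also contains non-numeric columns that should be left alone (those would need to be excluded from the lapply call).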