Conditionally round numbers in R

How can I conditionally round the values of a column in a data frame in R? Values from 0 to 89 should be rounded down to the nearest multiple of 10, while values from 90 to 100 should stay unchanged. For example:
ID value
A 15
B 47
C 91
D 92
has to be changed to
ID value
A 10
B 40
C 91
D 92
So C and D are unchanged, while A and B are rounded down.
Any ideas?
Thanks

You can do it like this:
df$value[df$value < 90] <- floor(df$value[df$value < 90] / 10) * 10
# ID value
# 1 A 10
# 2 B 40
# 3 C 91
# 4 D 92
As a reminder, here is your data:
df <- structure(list(ID = c("A", "B", "C", "D"), value = c(15L, 47L,
91L, 92L)), .Names = c("ID", "value"), class = "data.frame", row.names = c(NA,
-4L))
Another solution, using data.table:
library(data.table)
setDT(df)[, value:= as.numeric(value)][value<90, value:= floor(value/10) * 10]
# ID value
# 1: A 10
# 2: B 40
# 3: C 91
# 4: D 92

You could do:
df$value <- with(df, ifelse(value %in% c(0:89), value-(value%%10), value))
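A minimal sketch of the same idea wrapped in a reusable function, assuming the cutoff (90) and the step (10) should be adjustable. The name round_down_below and its arguments are made up for illustration and do not come from any package.

```r
# Hypothetical helper: round values below `limit` down to a multiple of `step`
# and leave values at or above `limit` untouched.
round_down_below <- function(x, limit = 90, step = 10) {
  ifelse(x < limit, floor(x / step) * step, x)
}

# Sample data mirroring the question
df <- data.frame(ID = c("A", "B", "C", "D"), value = c(15, 47, 91, 92))
df$value <- round_down_below(df$value)
df
```

Unlike value %in% 0:89, the x < limit test also covers non-integer values such as 47.5.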

Sort a data frame based on another sorted column value in R

I have a data frame that is sorted on a numeric column (var2) in order to assign a rank. For the rows where that numeric column is zero, the data frame should instead be arranged by a character column (var1).
The rank is based on var2, which is why I sorted on var2. If rows have identical var2 values, var3 decides the rank (see rows 2 and 3 of the data frame below, where var2 is identical). If var2 is zero, the rows have to be sorted alphabetically by var1 and ranked in that order. If var2 is NA, no rank is given.
Below, the data frame is sorted by var2 in descending order. The rows where var2 is zero, followed by the rows where var2 is NA, still need to be sorted alphabetically by var1.
example:
# var1 var2 var3 rank
# 1 c 556 45 1
# 2 a 345 35 3
# 3 f 345 64 2
# 4 b 134 87 4
# 5 z 0 34 5
# 6 d 0 32 6
# 7 c 0 12 7
# 8 a 0 23 8
# 9 e NA
# 10 b NA
below is my code
df <- data.frame(var1=c("c","a","f","b","z","d", "c","a", "e", "b", "ad", "gf", "kg", "ts", "mp"), var2=c(134, NA,345, 200, 556,NA, 345, 200, 150, 0, 25,10,0,150,0), var3=c(65,'',45,34,68,'',73,12,35,23,34,56,56,78,123))
# To break the tie between var3 and var2
orderdf <- df[order(df$var2, df$var1, decreasing = TRUE), ]
#assigning rank
rankdf <- orderdf %>% mutate(rank = ifelse(is.na(var2),'', seq(1:nrow(orderdf))))
Expected output (var1 sorted alphabetically for the rows where var2 is zero):
# var1 var2 var3 rank
# 1 c 556 45 1
# 2 a 345 35 3
# 3 f 345 64 2
# 4 b 134 87 4
# 5 a 0 34 5
# 6 c 0 32 6
# 7 d 0 12 7
# 8 z 0 23 8
# 9 b NA
# 10 e NA
With dplyr you can use
library(dplyr)
df %>%
  arrange(desc(var2), var1)
and afterwards create the rank column.
EDIT
The following code is a bit cumbersome but it gets the job done. Basically it orders the rows in which var2 is equal or different from zero separately, then combines the two ordered dataframes together and finally creates the rank column.
Data
df <- data.frame(
var1 = c("c","a","f","b","z","d", "c","a", "e", "z", "ad", "gf", "kg", "ts", "mp"),
var2 = c(134, NA,345, 200, 556,NA, 345, 200, 150, 0, 25,10,0,150,0),
var3 = as.numeric(c(65,'',45,34,68,'',73,12,35,23,34,56,56,78,123))
)
df
# var1 var2 var3
# 1 c 134 65
# 2 a NA NA
# 3 f 345 45
# 4 b 200 34
# 5 z 556 68
# 6 d NA NA
# 7 c 345 73
# 8 a 200 12
# 9 e 150 35
# 10 z 0 23
# 11 ad 25 34
# 12 gf 10 56
# 13 kg 0 56
# 14 ts 150 78
# 15 mp 0 123
Code
library(dplyr)
df %>%
  # work on rows with var2 different from 0 or NA
  filter(var2 != 0) %>%
  arrange(desc(var2), desc(var3)) %>%
  # merge with rows with var2 equal to 0 or NA
  bind_rows(df %>% filter(var2 == 0 | is.na(var2)) %>% arrange(var1)) %>%
  arrange(desc(var2)) %>%
  # create the rank column only for the rows with var2 different from NA
  mutate(
    rank = seq_len(nrow(df)),
    rank = ifelse(is.na(var2), NA, rank)
  )
Output
# var1 var2 var3 rank
# 1 z 556 68 1
# 2 c 345 73 2
# 3 f 345 45 3
# 4 b 200 34 4
# 5 a 200 12 5
# 6 ts 150 78 6
# 7 e 150 35 7
# 8 c 134 65 8
# 9 ad 25 34 9
# 10 gf 10 56 10
# 11 kg 0 56 11
# 12 mp 0 123 12
# 13 z 0 23 13
# 14 a NA NA NA
# 15 d NA NA NA
Using only base R's order() function, sort first by var2 in descending order and then by var1 in ascending order. order() returns an integer vector of row positions, which is passed to square brackets to reorder the data:
df[order(-df$var2, df$var1), ]
Adding a rank column too is then just
df[order(-df$var2, df$var1), "rank"] <- 1:length(df$var1)
Using data.table
library(data.table)
setDT(df)[order(-var2, var1)][, rank := seq_len(.N)][]
data
df <- structure(list(var1 = structure(c(3L, 1L, 6L, 2L, 7L, 4L, 3L,
1L, 5L, 2L), .Label = c("a", "b", "c", "d", "e", "f", "z"), class = "factor"),
var2 = c(1456L, 456L, 345L, 134L, 0L, 0L, 0L, 0L, NA, NA)),
class = "data.frame", row.names = c(NA, -10L))
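If the var3 tie-break and the missing rank for NA rows from the question are also needed, the order() call above could be extended along these lines. This is only a sketch: the sample data mirrors the question's example table, and tie_key is a hypothetical helper vector.

```r
# Sample data mirroring the question's example
df <- data.frame(
  var1 = c("c", "a", "f", "b", "z", "d", "c", "a", "e", "b"),
  var2 = c(556, 345, 345, 134, 0, 0, 0, 0, NA, NA),
  var3 = c(45, 35, 64, 87, 34, 32, 12, 23, 10, 11),
  stringsAsFactors = FALSE
)

# Tie-break key: descending var3 where var2 is non-zero; a constant for
# zero/NA rows, so that var1 (alphabetical) decides their order instead.
tie_key <- ifelse(!is.na(df$var2) & df$var2 != 0, -df$var3, 0)

# order() puts NA last by default, so the NA rows end up at the bottom
out <- df[order(-df$var2, tie_key, df$var1), ]

# Rank follows the sorted row order; rows with NA in var2 get no rank
out$rank <- ifelse(is.na(out$var2), NA_integer_, seq_len(nrow(out)))
out
```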
You can do it in base R, using order():
cols <- c('var1', 'var2')
remaining_cols <- setdiff(names(df), cols)
df1 <- df[cols]
cbind(transform(df1[with(df1, order(-var2, var1)), ],
rank = seq_len(nrow(df1))), df[remaining_cols])
# var1 var2 rank var3
#1 c 556 1 45
#2 a 345 2 35
#3 f 345 3 64
#4 b 134 4 87
#8 a 0 5 34
#7 c 0 6 32
#6 d 0 7 12
#5 z 0 8 23
#10 b NA 9 10
#9 e NA 10 11
data
df <- structure(list(var1 = structure(c(3L, 1L, 6L, 2L, 7L, 4L, 3L,
1L, 5L, 2L), .Label = c("a", "b", "c", "d", "e", "f", "z"), class = "factor"),
var2 = c(556L, 345L, 345L, 134L, 0L, 0L, 0L, 0L, NA, NA),
var3 = c(45L, 35L, 64L, 87L, 34L, 32L, 12L, 23L, 10L, 11L
)), class = "data.frame", row.names = c(NA, -10L))

Add the index of list to bind_rows?

I have this data:
dat=list(structure(list(Group.1 = structure(3:4, .Label = c("A","B", "C", "D", "E", "F"), class = "factor"), Pr1 = c(65, 75)), row.names = c(NA, -2L), class = "data.frame"),NULL, structure(list( Group.1 = structure(3:4, .Label = c("A","B", "C", "D", "E", "F"), class = "factor"), Pr1 = c(81,4)), row.names = c(NA,-2L), class = "data.frame"))
I want to combine these with bind_rows(dat) while keeping the list index as a variable.
The output should include a type column holding the list index ([[1]] and [[3]]):
type Group.1 Pr1
1 1 C 65
2 1 D 75
3 3 C 81
4 3 D 4
data.table solution
Use rbindlist() from the data.table package, which has built-in id support that respects NULL elements.
library(data.table)
rbindlist( dat, idcol = TRUE )
.id Group.1 Pr1
1: 1 C 65
2: 1 D 75
3: 3 C 81
4: 3 D 4
dplyr - partial solution
bind_rows also has ID-support, but it 'skips' empty elements...
bind_rows( dat, .id = "id" )
id Group.1 Pr1
1 1 C 65
2 1 D 75
3 2 C 81
4 2 D 4
Note that the ID of the third element from dat becomes 2, and not 3.
According to the documentation of bind_rows(), you can supply a column name through the .id argument. When bind_rows() is applied to a list of data frames, the names of the list elements are used as the values of the identifier column. [EDIT] But there is a problem, as mentioned by @Wimpel:
names(dat)
NULL
However, giving the list names does the trick:
names(dat) <- 1:length(dat)
names(dat)
[1] "1" "2" "3"
bind_rows(dat, .id = "type")
type Group.1 Pr1
1 1 C 65
2 1 D 75
3 3 C 81
4 3 D 4
Or in one line, if you prefer:
bind_rows(setNames(dat, seq_along(dat)), .id = "type")
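For comparison, here is a base R sketch of the same idea without dplyr or data.table. It swaps bind_rows() for do.call(rbind, ...) and attaches the original list position by hand. The simplified sample data and the names keep and out are assumptions for illustration.

```r
# Simplified version of the question's list (character instead of factor)
dat <- list(
  data.frame(Group.1 = c("C", "D"), Pr1 = c(65, 75)),
  NULL,
  data.frame(Group.1 = c("C", "D"), Pr1 = c(81, 4))
)

# Positions of the non-NULL elements (here 1 and 3)
keep <- which(!vapply(dat, is.null, logical(1)))

# Prepend the original list index to each data frame, then stack them
out <- do.call(
  rbind,
  Map(function(d, i) cbind(type = i, d), dat[keep], keep)
)
out
```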

R data.table conditional sum by row

> tempDT <- data.table(colA = c("E","E","A","C","E","C","E","C","E"), colB = c(20,30,40,30,30,40,30,20,10), group = c(1,1,1,1,2,2,2,2,2), want = c(NA, 30, 40, 70,NA,40,70,20,30))
> tempDT
colA colB group want
1: E 20 1 NA
2: E 30 1 30
3: A 40 1 40
4: C 30 1 70
5: E 30 2 NA
6: C 40 2 40
7: E 30 2 70
8: C 20 2 20
9: E 10 2 30
I have columns 'colA' 'colB' 'group': within each 'group', I would like to sum up 'colB' from bottom up until 'colA' is 'E'.
Based on the expected 'want', we create a run-length-id column 'grp' by checking whether the value in 'colA' is 'E', then create 'want1' as the cumulative sum of 'colB' after grouping by 'grp' and 'group'. Next, we get the row index ('i1') of the rows where 'colA' is 'E' and is a duplicate within 'group', and assign the 'colB' values to 'want1' for those rows.
tempDT[, grp:= rleid(colA=="E") * (colA != "E")
][grp!= 0, want1 := cumsum(colB), .(grp, group)]
i1 <- tempDT[, .I[colA=="E" & duplicated(colA)], group]$V1
tempDT[i1, want1 := colB][, grp := NULL][]
# colA colB group want want1
#1: E 20 1 NA NA
#2: E 30 1 30 30
#3: A 40 1 40 40
#4: C 30 1 70 70
#5: E 30 2 NA NA
#6: C 30 2 30 30
Here's one approach: row reference + sums
# input data
library(data.table)
tempDT <- data.table(colA = c("E","E","A","C","E","C","E","C","E"), colB = c(20,30,40,30,30,40,30,20,10), group = c(1,1,1,1,2,2,2,2,2), want = c(NA, 30, 40, 70,NA,40,70,20,30))
tempDT
# find row reference previous row where colA is "E"
lastEpos <- function(i) tail(which(tempDT$colA[1:(i-1)] == "E"), 1)
tempDT[, rowRef := sapply(.I, lastEpos), by = "group"]
# sum up
sumEpos <- function(i) {
valTEMP <- tempDT$rowRef[i]
outputTEMP <- sum(tempDT$colB[(valTEMP+1):i]) # sum
return(outputTEMP)
}
tempDT[, want1 := sapply(.I, sumEpos), by = "group"]
# deal with first row in every group
tempDT[, want1 := c(NA, want1[-1]), by = "group"]
# clean output
tempDT[, rowRef := NULL]
tempDT
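The row-reference idea above can also be sketched without per-row sapply() calls, using base R's ave(). The assumption here is that 'want' is the sum of 'colB' from the row after the previous "E" (within the group) up to the current row, with NA for the first row of each group. The names tempDF, isE and block are made up for this sketch.

```r
# Same input as in the question, as a plain data frame
tempDF <- data.frame(
  colA  = c("E", "E", "A", "C", "E", "C", "E", "C", "E"),
  colB  = c(20, 30, 40, 30, 30, 40, 30, 20, 10),
  group = c(1, 1, 1, 1, 2, 2, 2, 2, 2)
)

# For each row, count the "E" rows strictly before it within its group;
# rows sharing that count belong to the same summation block.
isE <- as.integer(tempDF$colA == "E")
block <- ave(isE, tempDF$group,
             FUN = function(e) c(0L, head(cumsum(e), -1)))

# Running sum of colB within each (group, block) pair
want1 <- ave(tempDF$colB, tempDF$group, block, FUN = cumsum)

# The first row of each group has no preceding "E", so it gets NA
want1[!duplicated(tempDF$group)] <- NA
cbind(tempDF, want1)
```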
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(row_num = n():1) %>%
  group_by(group) %>%
  mutate(sum_colB = sum(colB[row_num < row_num[which(colA=='E')]]),
         flag = ifelse(row_num >= row_num[which(colA=='E')], 0, 1)) %>%
  mutate(sum_colB = ifelse(flag==1 & row_num==1, sum_colB, ifelse(flag==0, NA, colB))) %>%
  select(-flag, -row_num) %>%
  data.frame()
Output is:
colA colB group want sum_colB
1 E 20 1 NA NA
2 E 30 1 30 NA
3 A 40 1 40 40
4 C 30 1 70 70
5 E 30 2 NA NA
6 C 30 2 30 30
Sample data:
df <- structure(list(colA = structure(c(3L, 3L, 1L, 2L, 3L, 2L), .Label = c("A",
"C", "E"), class = "factor"), colB = c(20, 30, 40, 30, 30, 30
), group = c(1, 1, 1, 1, 2, 2), want = c(NA, 30, 40, 70, NA,
30)), .Names = c("colA", "colB", "group", "want"), row.names = c(NA,
-6L), class = "data.frame")

sumif in ifelse condition R

I have a DT with multiple columns, and I need to apply a condition in ifelse and do the calculations accordingly. I want it to compute count/sum(count) grouped by segment. Here is the DT:
Segment Count Flag
A 23 Y
B 45 N
A 56 N
B 212 Y
I want a fourth column with the count as a share of the total count of the segment, based on the flag, so the output should look something like this. For flag N, it is the share of the count within the segment. For flag Y, it is the revenue percentage calculation for the case where the No (N) becomes Yes (Y), that is, the revenue that could be earned in that case. Sorry that this is clumsy; please ask in the comments if anything is unclear.
Segment Count Flag Rev Value
A 23 Y 34 ((56/23)*34)/(34+69)
B 45 N 48 45/(45+212)
A 56 N 23 56/(56+23)
B 212 Y 67 ((45/212)*67)/(67+12)
A 65 Y 69 ...
B 10 Y 12 ...
Any help is appreciated. Thanks!
We can do this with data.table. Convert the 'data.frame' to 'data.table' (setDT(DT)) and, grouped by 'Segment', create the 'Value' column by dividing 'Count' by the sum of 'Count'. Then update 'Value' for the rows where 'Flag' is 'N'.
library(data.table)
setDT(DT)[, Value := Count/sum(Count), Segment
][Flag == "N", Value := Count/sum(Count), Segment]
DT
# Segment Count Flag Value
#1: A 23 Y 0.18852459
#2: B 45 N 1.00000000
#3: A 56 N 1.00000000
#4: B 212 Y 0.78810409
#5: A 43 Y 0.35245902
#6: B 12 Y 0.04460967
Just checking with the OPs expected output 'Value'
> 23/122
#[1] 0.1885246
> 212/269
#[1] 0.7881041
> 43/122
#[1] 0.352459
> 12/269
#[1] 0.04460967
Update3
Based on the update No:3 in Op's post
s1 <- setDT(DT1)[, .(rn = .I[Flag == "Y"], Value = (Rev[Flag=="Y"] *
(Count[Flag == "N"]/Count[Flag=="Y"]))/sum(Rev[Flag == "Y"])), Segment]
s2 <- DT1[, .(rn = .I[Flag == "N"], Value = Count[Flag == "N"]/(Count[Flag == "N"] +
Count[Flag=="Y"][1])), Segment]
DT1[, Value := rbind(s1, s2)[order(rn)]$Value]
DT1
# Segment Count Flag Rev Value
#1: A 23 Y 34 0.8037146
#2: B 45 N 48 0.1750973
#3: A 56 N 23 0.7088608
#4: B 212 Y 67 0.1800215
#5: A 65 Y 69 0.5771471
#6: B 10 Y 12 0.6835443
>((56/23)*34)/(34+69)
#[1] 0.8037146
> 45/(45+212)
#[1] 0.1750973
> 56/(56+23)
#[1] 0.7088608
> ((45/212)*67)/(67+12)
#[1] 0.1800215
data
DT <- structure(list(Segment = c("A", "B", "A", "B", "A", "B"), Count = c(23L,
45L, 56L, 212L, 43L, 12L), Flag = c("Y", "N", "N", "Y", "Y",
"Y")), .Names = c("Segment", "Count", "Flag"), row.names = c(NA,
-6L), class = "data.frame")
DT1 <- structure(list(Segment = c("A", "B", "A", "B", "A", "B"), Count = c(23L,
45L, 56L, 212L, 65L, 10L), Flag = c("Y", "N", "N", "Y", "Y",
"Y"), Rev = c(34L, 48L, 23L, 67L, 69L, 12L)), .Names = c("Segment",
"Count", "Flag", "Rev"), class = "data.frame", row.names = c(NA,
-6L))
Alternatively, we could also have used the dplyr package for that.
Updated based on the suggestions provided by @Aramis7d - thanks!
library(data.table)
df <- fread("Segment Count Flag
A 23 Y
B 45 N
A 56 N
B 212 Y
A 43 Y
B 12 Y")
library(dplyr)
df %>%
group_by(Segment) %>%
mutate(Value = Count/sum(Count)) %>%
group_by(Segment, Flag) %>%
mutate(Value = if_else( Flag == "N", Count/sum(Count), Value))
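The share-of-segment calculation used in the answers above can also be sketched in base R with ave(), which returns the group sum aligned to each row. This mirrors the first data set (without 'Rev'). The variable isN is a made-up helper name.

```r
DT <- data.frame(
  Segment = c("A", "B", "A", "B", "A", "B"),
  Count   = c(23, 45, 56, 212, 43, 12),
  Flag    = c("Y", "N", "N", "Y", "Y", "Y")
)

# Share of each row's Count within its Segment
DT$Value <- DT$Count / ave(DT$Count, DT$Segment, FUN = sum)

# For the "N" rows, recompute the share among the "N" rows of the same Segment
isN <- DT$Flag == "N"
DT$Value[isN] <- DT$Count[isN] / ave(DT$Count[isN], DT$Segment[isN], FUN = sum)
DT
```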

Replace value in data.frame with value in next column

I have a data frame with two columns:
names duration
1 J 97
2 G NA
3 H 53
4 A 23
5 E NA
6 D NA
7 C 73
8 F NA
9 B 37
10 I 67
What I want to do is replace all NA values in the duration column with the value from the names column in the same row. How can I achieve that?
Data
zz <- "names duration
1 J 97
2 G NA
3 H 53
4 A 23
5 E NA
6 D NA
7 C 73
8 F NA
9 B 37
10 I 67"
df <- read.table(text = zz, header = TRUE)
Solution with dplyr
library(dplyr)
df_new <- df %>%
mutate(duration = ifelse(is.na(duration), as.character(names), duration))
Output
df_new
# names duration
# 1 J 97
# 2 G G
# 3 H 53
# 4 A 23
# 5 E E
# 6 D D
# 7 C 73
# 8 F F
# 9 B 37
# 10 I 67
We can use is.na to create a logical index ('i1'), then use it to subset both columns, replacing the NA entries of 'duration' with the 'names' values from the same rows.
i1 <- is.na(df$duration)
df$duration[i1] <- df$names[i1]
df
# names duration
#1 J 97
#2 G G
#3 H 53
#4 A 23
#5 E E
#6 D D
#7 C 73
#8 F F
#9 B 37
#10 I 67
NOTE: This changes the class of 'duration' from numeric to character.
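One caveat, sketched below under the assumption that 'names' is stored as a factor (the default for data.frame() and read.table() before R 4.0): assigning a factor into 'duration' inserts the integer level codes rather than the labels, so it is safer to convert with as.character() first. The three-row data frame is a made-up example.

```r
# Made-up example where 'names' is a factor
df <- data.frame(
  names    = factor(c("J", "G", "H")),
  duration = c(97L, NA, 53L)
)

i1 <- is.na(df$duration)

# Convert to character before assigning, so the labels are inserted
# and not the factor's internal integer codes.
df$duration[i1] <- as.character(df$names[i1])
df
```

With a character 'names' column, as in the data used in this answer, the extra as.character() call is harmless.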
Alternatively, this can be done with data.table. Convert the 'data.frame' to 'data.table' (setDT(df)) and change the class of 'duration' to character. Then, by specifying the condition in 'i' (is.na(duration)), we assign (:=) the values in 'names' that correspond to the 'i' condition to 'duration'. As the assignment happens in place, it is very efficient.
library(data.table)
setDT(df)[, duration:= as.character(duration)][is.na(duration), duration:= names]
data
df <- structure(list(names = c("J", "G", "H", "A", "E", "D", "C", "F",
"B", "I"), duration = c(97L, NA, 53L, 23L, NA, NA, 73L, NA, 37L,
67L)), .Names = c("names", "duration"), row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")
