Creating an index by group in R

Simple question, but I can't seem to find the answer.
I am trying to divide every value in a column by the first value of its group.
V1=c(4,5,6,3,2,7)
V2= c(2,4,5,8,7,9)
group=c(1,1,1,2,2,2)
D= data.frame(V1=V1, V2=V2, group=group)
D
V1 V2 group
1 4 2 1
2 5 4 1
3 6 5 1
4 3 8 2
5 2 7 2
6 7 9 2
This is what I would like to get:
V1 V2 group
1 1.0 1.0 1
2 1.3 2.0 1
3 1.5 2.5 1
4 1.0 1.0 2
5 0.7 0.9 2
6 2.3 1.1 2

A dplyr option:
D %>%
  group_by(group) %>%
  mutate_at(c("V1", "V2"), ~ . / first(.))
# A tibble: 6 x 3
# Groups: group [2]
V1 V2 group
<dbl> <dbl> <dbl>
1 1 1 1
2 1.25 2 1
3 1.5 2.5 1
4 1 1 2
5 0.667 0.875 2
6 2.33 1.12 2
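In dplyr 1.0 and later, mutate_at() is superseded by across(); an equivalent of the answer above, as a sketch:

```r
library(dplyr)

D <- data.frame(V1 = c(4, 5, 6, 3, 2, 7),
                V2 = c(2, 4, 5, 8, 7, 9),
                group = c(1, 1, 1, 2, 2, 2))

# divide each column by its first value within each group
res <- D %>%
  group_by(group) %>%
  mutate(across(c(V1, V2), ~ .x / first(.x))) %>%
  ungroup()
res
```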

Here is a one-liner base R solution:
D[-3] <- sapply(D[-3], function(x) ave(x, D$group, FUN = function(v) v / v[1]))
D
# V1 V2 group
#1 1.0000000 1.000 1
#2 1.2500000 2.000 1
#3 1.5000000 2.500 1
#4 1.0000000 1.000 2
#5 0.6666667 0.875 2
#6 2.3333333 1.125 2

A dplyr way:
library(dplyr)
D %>%
  group_by(group) %>%
  mutate_all(~ round(. / first(.), 1))
A data.table approach:
library(data.table)
setDT(D)[, lapply(.SD, function(x) round(x / x[1], 1)), by = group]

A base R solution:
split(D, D$group) <- lapply(split(D, D$group), function(d) {
  d[, 1:2] <- as.data.frame(t(t(d[, 1:2]) / unlist(d[1, 1:2])))
  d
})
D
# V1 V2 group
# 1 1.0000000 1.000 1
# 2 1.2500000 2.000 1
# 3 1.5000000 2.500 1
# 4 1.0000000 1.000 2
# 5 0.6666667 0.875 2
# 6 2.3333333 1.125 2

An option with by() from base R:
by(D[-3], D[3], FUN = function(x) x/unlist(x[1,])[col(x)])

Compute mean excluding current value

I have the following table
a b avg
1: 1 7 3
2: 1 0 3
3: 1 2 3
4: 2 1 2
5: 2 3 2
where 'a' and 'b' are data and 'avg' is the average of 'b' grouped by 'a'.
Now I want to calculate the average ('avg2') of 'b' grouped by 'a', excluding the current row's value:
a b avg avg2
1: 1 7 3 1.00
2: 1 0 3 4.50
3: 1 2 3 3.50
4: 2 1 2 3.00
5: 2 3 2 1.00
I have tried a manual calculation,
dt[ , (sum(b) - ?? )/(.N -1), by = a]
but I don't know how to fill the gap in the numerator. I guess a related question I have is if there is a way to refer to the current row while performing a summary calculation.
You can subtract the current value from the group sum and divide by .N - 1:
library(data.table)
setDT(df)[, avg2 := (sum(b) - b)/(.N -1), a]
df
# a b avg avg2
#1: 1 7 3 1.0
#2: 1 0 3 4.5
#3: 1 2 3 3.5
#4: 2 1 2 3.0
#5: 2 3 2 1.0
Using dplyr:
library(dplyr)
df %>%
  group_by(a) %>%
  mutate(avg = (sum(b) - b)/(n() - 1))
# A tibble: 5 × 3
# Groups: a [2]
a b avg
<int> <int> <dbl>
1 1 7 1
2 1 0 4.5
3 1 2 3.5
4 2 1 3
5 2 3 1
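The same leave-one-out mean is also possible in base R with ave(), as a sketch on the question's data:

```r
df <- data.frame(a = c(1, 1, 1, 2, 2),
                 b = c(7, 0, 2, 1, 3))

# for each row: (group sum - current value) / (group size - 1)
df$avg2 <- ave(df$b, df$a,
               FUN = function(x) (sum(x) - x) / (length(x) - 1))
df$avg2
# [1] 1.0 4.5 3.5 3.0 1.0
```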

Normalizing data where one column contains discrete subsets of values (in R) [duplicate]

I'm trying to normalize the StrengthCode by Item
E.g.
ID Item StrengthCode
7 A 1
7 A 5
7 A 7
8 B 1
8 B 3
9 A 5
9 A 3
What I need to achieve is something like this:
ID Item StrengthCode Nor
7 A 1 0.14
7 A 5 0.71
7 A 7 1
8 B 1 0.34
8 B 3 1
9 A 5 0.71
9 A 3 0.42
I tried this code but I'm stuck; any help would be appreciated!
normalit <- function(m){(m - min(m))/(max(m)-min(m))}
Tbl.Test <- Tbl.3.1 %>%
group_by(ID, Item) %>%
mutate(Nor = normalit(StregthCode))
I get this warning:
Warning message: NAs introduced by coercion
Your desired output looks like you want each StrengthCode divided by the maximum within its Item:
df <- read.table(header=TRUE, text=
'ID Item StrengthCode
7 A 1
7 A 5
7 A 7
8 B 1
8 B 3
9 A 5
9 A 3')
df$Nor <- ave(df$StrengthCode, df$Item, FUN=function(x) x/max(x))
df
# > df
# ID Item StrengthCode Nor
# 1 7 A 1 0.1428571
# 2 7 A 5 0.7142857
# 3 7 A 7 1.0000000
# 4 8 B 1 0.3333333
# 5 8 B 3 1.0000000
# 6 9 A 5 0.7142857
# 7 9 A 3 0.4285714
With dplyr you can do (thx to Sotos for the comment+code):
library("dplyr")
(df %>% group_by(Item) %>% mutate(Nor = StrengthCode/max(StrengthCode)))
# > (df %>% group_by(Item) %>% mutate(Nor = StrengthCode/max(StrengthCode)))
# Source: local data frame [7 x 4]
# Groups: Item [2]
#
# ID Item StrengthCode Nor
# <int> <fctr> <int> <dbl>
# 1 7 A 1 0.1428571
# 2 7 A 5 0.7142857
# 3 7 A 7 1.0000000
# 4 8 B 1 0.3333333
# 5 8 B 3 1.0000000
# 6 9 A 5 0.7142857
# 7 9 A 3 0.4285714
Also easy to do in data.table.
library(data.table)
setDT(df)[, Nor := StrengthCode / max(StrengthCode), by = Item]
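Note that the normalit function in the question is min-max scaling, which is a different transformation from the x / max(x) used in the answers; comparing the two on Item A's values:

```r
x <- c(1, 5, 7)   # StrengthCode values of ID 7, Item A

# min-max scaling: smallest value maps to 0, largest to 1
(x - min(x)) / (max(x) - min(x))
# [1] 0.0000000 0.6666667 1.0000000

# scale by maximum: largest value maps to 1, matching the desired output
x / max(x)
# [1] 0.1428571 0.7142857 1.0000000
```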

Windowed time-domain statistics in R: how to window one column and apply statistical methods to others

I have the following data frame in R :
Time A
1 1
2 1
3 1
4 1
5 2
6 2
7 3
8 3
9 2
10 1
11 1
12 1
13 3
14 3
15 3
Suppose the numbers in the Time column are seconds. I need to define a window of 3 seconds and apply two or three different functions to the A column, with the results of each function in a separate column. Say the first function is average and the second is max:
Time-window average max
1 1 1
2 2.5 2
3 4 3
4 1 1
5 3 3
How can i do it in R, using any of available libraries.
A data.table solution (this assumes the number of rows is a multiple of 3):
library(data.table)
dat <- setDT(dat)
dat2 <- dat[, `Time-window` := rep(1:(.N/3), each = 3)][
, .(average = mean(A), max = max(A)), by = `Time-window`
]
dat2
# Time-window average max
# 1: 1 1.000000 1
# 2: 2 1.666667 2
# 3: 3 2.666667 3
# 4: 4 1.000000 1
# 5: 5 3.000000 3
DATA
dat <- read.table(text = "Time A
1 1
2 1
3 1
4 1
5 2
6 2
7 3
8 3
9 2
10 1
11 1
12 1
13 3
14 3
15 3",
header = TRUE, stringsAsFactors = FALSE)
If you prefer dplyr, you can do:
df %>%
  group_by(time_window = ceiling(Time/3)) %>%
  summarise_at(2, list(mean = mean, max = max))
time_window mean max
<dbl> <dbl> <int>
1 1 1 1
2 2 1.67 2
3 3 2.67 3
4 4 1 1
5 5 3 3
Or using gl(), as already posted by @Ronak Shah for a base R solution:
df %>%
  group_by(time_window = gl(n()/3, 3)) %>%
  summarise_at(2, list(mean = mean, max = max))
Create a function which applies all the functions you need:
apply_fun <- function(x) {
  c(mean = mean(x), max = max(x))
}
Create a grouping column and apply the function by group:
n <- 3
df$group <- gl(nrow(df)/n, n)
aggregate(A~group, df, apply_fun)
# group A.mean A.max
#1 1 1.000000 1.000000
#2 2 1.666667 2.000000
#3 3 2.666667 3.000000
#4 4 1.000000 1.000000
#5 5 3.000000 3.000000
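A plain base R alternative, sketched with ceiling() and tapply(); unlike the rep()/gl() approaches, it does not require the row count to be a multiple of 3:

```r
dat <- data.frame(Time = 1:15,
                  A = c(1, 1, 1, 1, 2, 2, 3, 3, 2, 1, 1, 1, 3, 3, 3))

# assign each row to a 3-second window, then summarise A per window
w <- ceiling(dat$Time / 3)
res <- data.frame(window  = sort(unique(w)),
                  average = as.numeric(tapply(dat$A, w, mean)),
                  max     = as.numeric(tapply(dat$A, w, max)))
res
```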

Cumulative percentages in R

I have the following data frame
d2
# A tibble: 10 x 2
ID Count
<int> <dbl>
1 1
2 1
3 1
4 1
5 1
6 2
7 2
8 2
9 3
10 3
which states how many counts each person (ID) had.
I would like to calculate the cumulative percentage of each count: up to 1: 50%, up to 2: 80%, up to 3: 100%.
I tried
> d2 %>% mutate(cum = cumsum(Count)/sum(Count))
# A tibble: 10 x 3
ID Count cum
<int> <dbl> <dbl>
1 1 0.05882353
2 1 0.11764706
3 1 0.17647059
4 1 0.23529412
5 1 0.29411765
6 2 0.41176471
7 2 0.52941176
8 2 0.64705882
9 3 0.82352941
10 3 1.00000000
but this result is obviously incorrect because I would expect that the count of 1 would correspond to 50% rather than 29.4%.
What is wrong here? How do I get the correct answer?
We get the frequency of each 'Count' value, create 'Cum' by dividing the cumulative sum of 'n' by the total, and then right_join with the original data:
d2 %>%
  count(Count) %>%
  mutate(Cum = cumsum(n)/sum(n)) %>%
  select(-n) %>%
  right_join(d2) %>%
  select(names(d2), everything())
# A tibble: 10 x 3
# ID Count Cum
# <int> <int> <dbl>
# 1 1 1 0.500
# 2 2 1 0.500
# 3 3 1 0.500
# 4 4 1 0.500
# 5 5 1 0.500
# 6 6 2 0.800
# 7 7 2 0.800
# 8 8 2 0.800
# 9 9 3 1.00
#10 10 3 1.00
If we need the output as @LAP mentioned:
d2 %>%
  mutate(Cum = row_number()/n())
# ID Count Cum
#1 1 1 0.1
#2 2 1 0.2
#3 3 1 0.3
#4 4 1 0.4
#5 5 1 0.5
#6 6 2 0.6
#7 7 2 0.7
#8 8 2 0.8
#9 9 3 0.9
#10 10 3 1.0
This works:
d2 %>%
  mutate(cum = cumsum(rep(1/n(), n())))
ID Count cum
1 1 1 0.1
2 2 1 0.2
3 3 1 0.3
4 4 1 0.4
5 5 1 0.5
6 6 2 0.6
7 7 2 0.7
8 8 2 0.8
9 9 3 0.9
10 10 3 1.0
One option could be:
library(dplyr)
d2 %>%
  group_by(Count) %>%
  summarise(proportion = n()) %>%
  mutate(Perc = cumsum(100 * proportion/sum(proportion))) %>%
  select(-proportion)
# # A tibble: 3 x 2
# Count Perc
# <int> <dbl>
# 1 1 50.0
# 2 2 80.0
# 3 3 100.0
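If only the three cumulative percentages are needed (rather than one value per row), base R's table() and cumsum() give them directly, as a sketch:

```r
d2 <- data.frame(ID = 1:10,
                 Count = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3))

# frequency of each Count value, then the cumulative share
tab <- table(d2$Count)
cumsum(tab) / sum(tab)
#   1   2   3
# 0.5 0.8 1.0
```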

How to manipulate a data.frame by factor with dplyr

df <- data.frame(a=factor(c(1,1,2,2,3,3) ), b=c(1,1, 10,10, 20,20) )
a b
1 1 1
2 1 1
3 2 10
4 2 10
5 3 20
6 3 20
I want to split the data frame by column a, calculate b/sum(b) in each group, and put the result in column c. With plyr I can do:
library(plyr)
fun <- function(x){
  x$c <- x$b/sum(x$b)
  x
}
ddply(df, .(a), fun)
and have
a b c
1 1 1 0.5
2 1 1 0.5
3 2 10 0.5
4 2 10 0.5
5 3 20 0.5
6 3 20 0.5
but how can I do it with dplyr?
df %.% group_by(a) %.% do(fun)
returns a list instead of a data.frame.
df %>%
  group_by(a) %>%
  mutate(c = b/sum(b))
a b c
1 1 1 0.5
2 1 1 0.5
3 2 10 0.5
4 2 10 0.5
5 3 20 0.5
6 3 20 0.5
Just to mention a base R solution: you can use transform (the base R equivalent of mutate) and the ave function, which splits a vector by group and applies a function to each piece.
> transform(df, c=ave(b,a, FUN= function(b) b/sum(b)))
a b c
1 1 1 0.5
2 1 1 0.5
3 2 10 0.5
4 2 10 0.5
5 3 20 0.5
6 3 20 0.5
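As a small variant of the transform()/ave() answer: base R's prop.table() computes x / sum(x) directly, so the anonymous function can be dropped. A sketch:

```r
df <- data.frame(a = factor(c(1, 1, 2, 2, 3, 3)),
                 b = c(1, 1, 10, 10, 20, 20))

# prop.table(x) is x / sum(x), applied here within each level of a
res <- transform(df, c = ave(b, a, FUN = prop.table))
res$c
# [1] 0.5 0.5 0.5 0.5 0.5 0.5
```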
