Given a table with defined groups, where each group has exactly one reference row (query == 1), I'd like to change all values of a column based on the value of the reference.
The values are only 1 or -1.
The idea is:
- if the reference is 1, keep all values as they are
- if the reference is -1, multiply all values by -1, so that the reference becomes 1 and the items with value 1 become -1
- also, the rows of modified groups should be in reverse order
I'm trying to do it this way:
library(tidyverse)
item <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l")
grou <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)
quer <- c(0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0)
dir <- c(1, 1, 1, -1, 1, 1, 1, -1, 1, -1, -1, -1)
ds <- tibble(item = item,
group = grou,
query = quer,
direction = dir)
ds %>%
group_by(group) %>%
mutate(
direction = ifelse(
direction[query == 1] == 1, direction, (-1 * direction)
)
)
So this
# A tibble: 12 x 5
# Groups: group [4]
item group query direction
<chr> <dbl> <dbl> <dbl>
1 a 1 0 1
2 b 1 1 1
3 c 1 0 1
4 d 2 0 -1
5 e 2 1 1
6 f 2 0 1
7 g 3 0 1
8 h 3 1 -1
9 i 3 0 1
10 j 4 0 -1
11 k 4 1 -1
12 l 4 0 -1
should become this:
# A tibble: 12 x 5
# Groups: group [4]
item group query direction
<chr> <dbl> <dbl> <dbl>
1 a 1 0 1
2 b 1 1 1
3 c 1 0 1
4 d 2 0 -1
5 e 2 1 1
6 f 2 0 1
7 i 3 0 -1
8 h 3 1 1
9 g 3 0 -1
10 l 4 0 1
11 k 4 1 1
12 j 4 0 1
But it is not working.
Thanks in advance
Here is a way to do it:
ds %>%
rowid_to_column("id") %>%
group_by(group) %>%
mutate(tmp = max(query * direction) - 0.5,
direction = tmp * 2 * direction) %>%
arrange(id * tmp, .by_group = TRUE) %>%
select(-c(id, tmp))
The result:
# A tibble: 12 x 4
# Groups: group [4]
item group query direction
<chr> <dbl> <dbl> <dbl>
1 a 1 0 1
2 b 1 1 1
3 c 1 0 1
4 d 2 0 -1
5 e 2 1 1
6 f 2 0 1
7 i 3 0 -1
8 h 3 1 1
9 g 3 0 -1
10 l 4 0 1
11 k 4 1 1
12 j 4 0 1
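A more explicit variant of the same idea, as a minimal sketch: it assumes exactly one row with query == 1 per group, and the helper columns id and ref are names introduced here for illustration only. It also avoids the problem in the original attempt, where ifelse() returns a result as long as its test (length 1 here), so a single value gets recycled over the whole group.

```r
library(dplyr)

# Same data as in the question, built more compactly
ds <- tibble(
  item      = letters[1:12],
  group     = rep(1:4, each = 3),
  query     = rep(c(0, 1, 0), times = 4),
  direction = c(1, 1, 1, -1, 1, 1, 1, -1, 1, -1, -1, -1)
)

result <- ds %>%
  mutate(id = row_number()) %>%                # remember the original row order
  group_by(group) %>%
  mutate(ref = direction[query == 1],          # assumption: one reference per group
         direction = direction * ref) %>%      # ref is 1 (keep) or -1 (flip)
  arrange(id * ref, .by_group = TRUE) %>%      # a negative key reverses the group
  ungroup() %>%
  select(-id, -ref)

result
```

Multiplying by the reference value itself works here only because the values are restricted to 1 and -1.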
An example dataframe with 2 columns:
groupID <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
index_ad <- c( 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0)
df <- data.frame(groupID, index_ad)
I want to add another column with a sequence for each group starting at the row where index_ad = 1 and then adding sequential positive/negative numbers depending on whether the row comes before or after the row where index_ad = 1.
ep_id <- c(0, 1, 2, 3, -2, -1, 0, 1, 2, -1, 0, 1, 2)
df1 <- data.frame(groupID, index_ad, ep_id)
I've tried using row_number, but that always starts from the first row in each group.
df <- df %>% group_by(groupID) %>% mutate(ep_num = row_number()) %>% ungroup()
The real dataset has >10,000 rows and multiple other variables including date/times. The groups are arranged/sorted by date/time and the 'index_ad' variable refers to whether the case/row should be considered the index case for that group. All cases/rows before the index case have date/times that occurred before it and all cases/rows after it have date/times that occurred after it.
Please help me figure out how to add the 'ep_id' numeric sequence using R! Thank you!
You can try
library(dplyr)
df |> group_by(groupID) |> mutate(ep_id = 1:n() - which(index_ad == 1))
output
# A tibble: 13 × 3
# Groups: groupID [3]
groupID index_ad ep_id
<dbl> <dbl> <int>
1 1 1 0
2 1 0 1
3 1 0 2
4 1 0 3
5 2 0 -2
6 2 0 -1
7 2 1 0
8 2 0 1
9 2 0 2
10 3 0 -1
11 3 1 0
12 3 0 1
13 3 0 2
df %>%
group_by(groupID) %>%
mutate(row = row_number(),
ep_num = row - row[index_ad == 1]) %>%
ungroup()
# A tibble: 13 × 4
groupID index_ad row ep_num
<dbl> <dbl> <int> <int>
1 1 1 1 0
2 1 0 2 1
3 1 0 3 2
4 1 0 4 3
5 2 0 1 -2
6 2 0 2 -1
7 2 1 3 0
8 2 0 4 1
9 2 0 5 2
10 3 0 1 -1
11 3 1 2 0
12 3 0 3 1
13 3 0 4 2
Here is a way. Within each group, subtract the position of the row where index_ad equals 1 from the row number to get the result.
groupID <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
index_ad <- c( 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0)
df <- data.frame(groupID, index_ad)
suppressPackageStartupMessages(library(dplyr))
df %>%
group_by(groupID) %>%
mutate(ep_num = row_number(),
ep_num = ep_num - which(index_ad == 1)) %>%
ungroup()
#> # A tibble: 13 × 3
#> groupID index_ad ep_num
#> <dbl> <dbl> <int>
#> 1 1 1 0
#> 2 1 0 1
#> 3 1 0 2
#> 4 1 0 3
#> 5 2 0 -2
#> 6 2 0 -1
#> 7 2 1 0
#> 8 2 0 1
#> 9 2 0 2
#> 10 3 0 -1
#> 11 3 1 0
#> 12 3 0 1
#> 13 3 0 2
Created on 2022-08-12 by the reprex package (v2.0.1)
I have coded the mutate above in two lines to make it clearer but it can be simplified to
df %>%
group_by(groupID) %>%
mutate(ep_num = row_number() - which(index_ad == 1)) %>%
ungroup()
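The answers above rely on each group having exactly one row with index_ad == 1. As a hedged sketch for less tidy data, match() can stand in for which(): it returns the position of the first index row, and NA when a group has none, so such a group would get NA instead of raising an error. Whether NA is the right outcome for your data is an assumption on my part.

```r
library(dplyr)

df <- data.frame(
  groupID  = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3),
  index_ad = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0)
)

res <- df %>%
  group_by(groupID) %>%
  # match(1, index_ad) is the position of the first index row in the group
  mutate(ep_id = row_number() - match(1, index_ad)) %>%
  ungroup()

res
```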
I have a dataset in R with columns ID, A and B (the data are given at the end of this thread), and I want to keep the same dataset while adding a column that gives, per ID, the number of rows where A = B = 1.
I tried the following R code, but it doesn't give the result I want:
library(dplyr)
data1<-data%>% group_by(ID) %>%
mutate(result=case_when(A==1 & B==1 ~ sum(A),TRUE ~ 0)) %>% ungroup()
Not as neat and clean, but still:
data %>%
mutate(row_sum = apply(across(A:B), 1, sum)) %>%
group_by(ID) %>%
mutate(result = sum(row_sum == 2)) %>%
ungroup() %>%
select(-row_sum)
which gives:
# A tibble: 10 x 4
ID A B result
<dbl> <dbl> <dbl> <int>
1 1 1 0 3
2 1 1 1 3
3 1 0 1 3
4 1 0 0 3
5 1 1 1 3
6 1 1 1 3
7 2 1 0 2
8 2 1 1 2
9 2 1 1 2
10 2 0 0 2
After grouping by 'ID', multiply 'A' by 'B' (the product is 1 only where both are 1, and 0 otherwise) and then take the sum:
library(dplyr)
data %>%
group_by(ID) %>%
mutate(result = sum(A*B)) %>%
ungroup()
Output:
# A tibble: 10 × 4
ID A B result
<dbl> <dbl> <dbl> <dbl>
1 1 1 0 3
2 1 1 1 3
3 1 0 1 3
4 1 0 0 3
5 1 1 1 3
6 1 1 1 3
7 2 1 0 2
8 2 1 1 2
9 2 1 1 2
10 2 0 0 2
data
data <- structure(list(ID = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2), A = c(1,
1, 0, 0, 1, 1, 1, 1, 1, 0), B = c(0, 1, 1, 0, 1, 1, 0, 1, 1,
0)), class = "data.frame", row.names = c(NA, -10L))
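For completeness, a sketch that states the condition literally instead of relying on the product. The attempt in the question fails because case_when() puts sum(A) (the group total of A, e.g. 4 for ID 1) only on the rows where both columns are 1 and 0 elsewhere; counting the rows that satisfy the condition gives the intended value on every row.

```r
library(dplyr)

data <- data.frame(
  ID = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  A  = c(1, 1, 0, 0, 1, 1, 1, 1, 1, 0),
  B  = c(0, 1, 1, 0, 1, 1, 0, 1, 1, 0)
)

res <- data %>%
  group_by(ID) %>%
  # summing a logical vector counts its TRUE values
  mutate(result = sum(A == 1 & B == 1)) %>%
  ungroup()

res
```

Unlike sum(A * B), this version would still count correctly if A or B held values other than 0 and 1.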
Dear all, I have a data frame that looks like this:
df <- data.frame(time=c(1,2,3,4,1,2,3,4,5), type=c("A","A","A","A","B","B","B","B","B"), count=c(10,0,0,1,8,0,1,0,1))
df
time type count
1 1 A 10
2 2 A 0
3 3 A 0
4 4 A 1
5 1 B 8
6 2 B 0
7 3 B 1
8 4 B 0
9 5 B 1
I want to examine each type group and, once a count of 0 appears, replace every later count (forward in time) with 0. I do not want the count to recover once it has hit zero.
I want my data to look like this:
time type count
1 1 A 10
2 2 A 0
3 3 A 0
4 4 A 0
5 1 B 8
6 2 B 0
7 3 B 0
8 4 B 0
9 5 B 0
If I understood correctly
library(tidyverse)
df <-
data.frame(
time = c(1, 2, 3, 4, 1, 2, 3, 4, 5),
type = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
count = c(10, 0, 0, 1, 8, 0, 1, 0, 1)
)
df %>%
group_by(type) %>%
mutate(count = if_else(lag(count, default = first(count)) == 0, 0, count))
#> # A tibble: 9 x 3
#> # Groups: type [2]
#> time type count
#> <dbl> <chr> <dbl>
#> 1 1 A 10
#> 2 2 A 0
#> 3 3 A 0
#> 4 4 A 0
#> 5 1 B 8
#> 6 2 B 0
#> 7 3 B 0
#> 8 4 B 0
#> 9 5 B 0
Created on 2021-09-10 by the reprex package (v2.0.1)
You may use cummin function.
library(dplyr)
df %>% group_by(type) %>% mutate(count = cummin(count))
# time type count
# <dbl> <chr> <dbl>
#1 1 A 10
#2 2 A 0
#3 3 A 0
#4 4 A 0
#5 1 B 8
#6 2 B 0
#7 3 B 0
#8 4 B 0
#9 5 B 0
Since cummin is a base R function, you can also do this in base R:
transform(df, count = ave(count, type, FUN = cummin))
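One caveat, with a hedged sketch: the lag() answer only looks one row back, and cummin() also lowers non-zero counts whenever a count rises before the first zero. If counts can go up before hitting zero, dplyr's cumany() expresses "zero from the first zero onwards" directly. Group "C" below is hypothetical data added here to show the difference.

```r
library(dplyr)

df <- data.frame(
  time  = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3, 4),
  type  = c(rep("A", 4), rep("B", 5), rep("C", 4)),  # "C" is a made-up extra group
  count = c(10, 0, 0, 1, 8, 0, 1, 0, 1, 5, 8, 0, 3)
)

res <- df %>%
  group_by(type) %>%
  # cumany() is TRUE from the first zero onwards within each group
  mutate(count = if_else(cumany(count == 0), 0, count)) %>%
  ungroup()

res
```

For group "C" this keeps 5, 8 and then zeroes the rest, whereas cummin() would turn the 8 into 5.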
The title of the question may be unclear, but I hope this code demonstrates my problem.
I have a data frame with three columns: sensor (A and B), hour_day (hour of the day, 0-4) and value (the temperature reading, 1-5).
new.df <- data.frame(
sensor = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
hour_day = c(0:4, 0:4),
value = c(1, 1, 3, 1, 2, 1, 3, 4, 5, 2)
)
new.df
sensor hour_day value
1 A 0 1
2 A 1 1
3 A 2 3
4 A 3 1
5 A 4 2
6 B 0 1
7 B 1 3
8 B 2 4
9 B 3 5
10 B 4 2
I want to make a new column that gives, for each sensor, the difference in hours from the hour at which that sensor's value is at its maximum.
Desired result
sensor value hour_day hour_from_max_hour
1 A 1 0 -2
2 A 1 1 -1
3 A 3 2 0
4 A 1 3 1
5 A 2 4 2
6 B 1 0 -3
7 B 3 1 -2
8 B 4 2 -1
9 B 5 3 0
10 B 2 4 1
Note that the maximum is at hour 2 for sensor A and at hour 3 for sensor B. I just want a new column that tells me how many hours each row is from the hour of that sensor's maximum value.
Thank you in advance and please let me know if I can provide more information.
EDIT
The previous answers were very helpful, but I forgot that there is one more variable (day) in this problem. Also, sometimes there is more than one maximum within a group. When this is the case, I would like to base the difference on the first maximum.
df_add <- data.frame(
sensor = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B",
"A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
hour_day = c(0:4, 0:4, 0:4, 0:4),
value = c(1, 1, 3, 3, 2,
3, 2, 4, 4, 1,
1, 5, 6, 6, 2,
2, 1, 3, 3, 1),
day = c(1, 1, 1, 1, 1,
1, 1, 1, 1, 1,
2, 2, 2, 2, 2,
2, 2, 2, 2, 2)
)
df_add
sensor hour_day value day
1 A 0 1 1
2 A 1 1 1
3 A 2 3 1
4 A 3 3 1
5 A 4 2 1
6 B 0 3 1
7 B 1 2 1
8 B 2 4 1
9 B 3 4 1
10 B 4 1 1
11 A 0 1 2
12 A 1 5 2
13 A 2 6 2
14 A 3 6 2
15 A 4 2 2
16 B 0 2 2
17 B 1 1 2
18 B 2 3 2
19 B 3 3 2
20 B 4 1 2
A simple pipe can do it. All you have to do is find the hour at which value equals max(value) inside mutate(); the [1] picks the first maximum if there are ties.
new.df %>%
group_by(sensor) %>%
mutate(hour_from_max_hour = hour_day - hour_day[which(value == max(value))[1]])
## A tibble: 10 x 4
## Groups: sensor [2]
# sensor hour_day value hour_from_max_hour
# <fct> <int> <dbl> <int>
# 1 A 0 1. -2
# 2 A 1 1. -1
# 3 A 2 3. 0
# 4 A 3 1. 1
# 5 A 4 2. 2
# 6 B 0 1. -3
# 7 B 1 3. -2
# 8 B 2 4. -1
# 9 B 3 5. 0
#10 B 4 2. 1
library(dplyr)
new.df.2 <-
# First get the hours with the max values
new.df %>%
group_by(sensor) %>%
filter(value == max(value)) %>%
ungroup() %>%
select(sensor, max_hour = hour_day) %>% # This renames hour_day as max_hour
# Now join that to the original table and make the calculation
right_join(new.df) %>%
mutate(hour_from_max_hour = hour_day - max_hour)
Result:
new.df.2
# A tibble: 10 x 5
sensor max_hour hour_day value hour_from_max_hour
<fct> <int> <int> <dbl> <int>
1 A 2 0 1 -2
2 A 2 1 1 -1
3 A 2 2 3 0
4 A 2 3 1 1
5 A 2 4 2 2
6 B 3 0 1 -3
7 B 3 1 3 -2
8 B 3 2 4 -1
9 B 3 3 5 0
10 B 3 4 2 1
This is probably how I would do it:
library(plyr)
dd = ddply(new.df, .(sensor), summarize,
max.value = max(value),
hour.of.max = hour_day[which.max(value)])
new.df = merge(new.df, dd, all.x = TRUE, by = 'sensor')
new.df$hour_from_max_hour = new.df$hour_day - new.df$hour.of.max
This gives you a couple of extra columns, but you can delete them:
sensor hour_day value max.value hour.of.max hour_from_max_hour
1 A 0 1 3 2 -2
2 A 1 1 3 2 -1
3 A 2 3 3 2 0
4 A 3 1 3 2 1
5 A 4 2 3 2 2
6 B 0 1 5 3 -3
7 B 1 3 5 3 -2
8 B 2 4 5 3 -1
9 B 3 5 5 3 0
10 B 4 2 5 3 1
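For the data in the EDIT (df_add), a minimal sketch that groups by both sensor and day. which.max() returns the position of the first maximum, which matches the tie-breaking rule asked for. The data frame is rebuilt with rep() here only for brevity.

```r
library(dplyr)

df_add <- data.frame(
  sensor   = rep(rep(c("A", "B"), each = 5), times = 2),
  hour_day = rep(0:4, times = 4),
  value    = c(1, 1, 3, 3, 2,
               3, 2, 4, 4, 1,
               1, 5, 6, 6, 2,
               2, 1, 3, 3, 1),
  day      = rep(1:2, each = 10)
)

res <- df_add %>%
  group_by(sensor, day) %>%
  # which.max() picks the first maximum when there are ties
  mutate(hour_from_max_hour = hour_day - hour_day[which.max(value)]) %>%
  ungroup()

res
```

In this particular data set the first maximum falls on hour 2 in every sensor/day group, so each group runs from -2 to 2.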
Let's say we have the following data:
library(tidyverse)
data <- tibble(
V1 = c(1, 1, 1, 1, 2, 2, 1, 3),
V2 = c(1, 1, 1, 2, 2, 2, 1, 3),
V3 = c(1, 1, 1, 2, 2, 2, 3, 3),
V4 = c(1, 1, 1, 2, 2, 2, 3, 3)
)
> data
# A tibble: 8 x 4
V1 V2 V3 V4
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 ## 1st occurrence
2 1 1 1 1 ## 2nd occurrence
3 1 1 1 1 ## 3rd occurrence
4 1 2 2 2 ## This row does not count because it occurs only once in the data
5 2 2 2 2 ## 1st occurrence
6 2 2 2 2 ## 2nd occurrence
7 1 1 3 3 ## This row does not count because it occurs only once in the data
8 3 3 3 3 ## This row does not count because it occurs only once in the data
We want to keep only the rows that occur at least as often as a threshold; let's say the threshold is 2 in our example. The values of rows that don't reach the threshold are set to 0. The result table should therefore be:
> data_filtered
# A tibble: 8 x 4
V1 V2 V3 V4
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 1
2 1 1 1 1
3 1 1 1 1
4 0 0 0 0
5 2 2 2 2
6 2 2 2 2
7 0 0 0 0
8 0 0 0 0
Any suggestion is greatly appreciated.
An idea using dplyr,
library(dplyr)
data %>%
group_by_all() %>%
mutate(new = n()) %>%
rowwise() %>%
mutate_at(vars(-new), funs(replace(., new < 2 , 0))) %>%
select(-new) %>%
ungroup()
which gives,
# A tibble: 8 x 4
V1 V2 V3 V4
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 1
2 1 1 1 1
3 1 1 1 1
4 0 0 0 0
5 2 2 2 2
6 2 2 2 2
7 0 0 0 0
8 0 0 0 0
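Since funs() has been deprecated in newer dplyr releases, here is a hedged sketch of the same idea with add_count() and across(). The names threshold and n_rows are introduced here for illustration, and the column names V1 to V4 are spelled out on the assumption that they are known in advance.

```r
library(dplyr)

data <- tibble(
  V1 = c(1, 1, 1, 1, 2, 2, 1, 3),
  V2 = c(1, 1, 1, 2, 2, 2, 1, 3),
  V3 = c(1, 1, 1, 2, 2, 2, 3, 3),
  V4 = c(1, 1, 1, 2, 2, 2, 3, 3)
)

threshold <- 2  # minimum number of occurrences for a row to be kept

data_filtered <- data %>%
  # count how often each full row occurs, without collapsing the data
  add_count(V1, V2, V3, V4, name = "n_rows") %>%
  # zero out every column of rows that occur fewer than `threshold` times
  mutate(across(V1:V4, ~ if_else(n_rows >= threshold, .x, 0))) %>%
  select(-n_rows)

data_filtered
```

The original row order is preserved because add_count() only appends a column.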
I would go with data.table:
library(data.table)
data <- data.table(
V1 = c(1, 1, 1, 1, 2, 2, 1, 3),
V2 = c(1, 1, 1, 2, 2, 2, 1, 3),
V3 = c(1, 1, 1, 2, 2, 2, 3, 3),
V4 = c(1, 1, 1, 2, 2, 2, 3, 3)
)
data[,key:=apply(data,1,function(x) paste0(x,collapse = ""))]#create a unique key per row
setkey(data,key) #set the "key" (to be used later on)
data<-merge(data,data[,.N,by=key])#create the frequency N and propagate the values to the initial table via merge
So for the moment:
>data
key V1 V2 V3 V4 N
1: 1111 1 1 1 1 3
2: 1111 1 1 1 1 3
3: 1111 1 1 1 1 3
4: 1133 1 1 3 3 1
5: 1222 1 2 2 2 1
6: 2222 2 2 2 2 2
7: 2222 2 2 2 2 2
8: 3333 3 3 3 3 1
data[,key:=NULL]#drop the key
You can now zero out entire rows based on N, via:
data[N < 2, c("V1","V2","V3","V4") := 0] # set all columns to 0 if N is below the threshold of 2
resulting in:
V1 V2 V3 V4 N
1: 1 1 1 1 3
2: 1 1 1 1 3
3: 1 1 1 1 3
4: 0 0 0 0 1
5: 0 0 0 0 1
6: 2 2 2 2 2
7: 2 2 2 2 2
8: 0 0 0 0 1
Of course, you can now drop N via data[, N := NULL]
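A shorter data.table sketch of the same approach, for comparison: grouping directly on the columns avoids the pasted string key and the merge, and keeps the original row order. The names dt and cols are introduced here for illustration.

```r
library(data.table)

dt <- data.table(
  V1 = c(1, 1, 1, 1, 2, 2, 1, 3),
  V2 = c(1, 1, 1, 2, 2, 2, 1, 3),
  V3 = c(1, 1, 1, 2, 2, 2, 3, 3),
  V4 = c(1, 1, 1, 2, 2, 2, 3, 3)
)

cols <- c("V1", "V2", "V3", "V4")

dt[, N := .N, by = cols]   # frequency of each full row, added by reference
dt[N < 2, (cols) := 0]     # zero out rows that occur fewer than 2 times
dt[, N := NULL]            # drop the helper column

dt
```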