How to create a column with information from other columns - r

I can't create the column I want. For each new value of the event column, it should hold the flow value from three rows before the row where that event starts.
I tried to approach this problem with for loops but can't quite replicate what I want. I'm close but not there.
To recreate the example, I generated the following data frame:
flow<- c(40, 39, 38, 37, 50, 49, 46, 44, 60, 55, 40, 70, 80, 75, 90, 88, 86, 100, 120, 118)
event<- c(1,1,1,1,2,2,2,2,3,3,3,4,5,5,6,6,6,7,8,8)
a<- data.frame(flow, event)
for (j in seq(1, length(a$event))) {
  if (a$event[j] <= 1){
    a$BF[a$event==j]<- NA}
  else{
    if (a$event[j] == a$event[j-1]){
      a$BF[a$event==j]<- a$flow[j-3]
    } else{
      a$BF[j]<- a$flow[j-3] }
  }
}
I expected to generate a column called "BF" to be like this:
flow event BF
1 40 1 NA
2 39 1 NA
3 38 1 NA
4 37 1 NA
5 50 2 39
6 49 2 39
7 46 2 39
8 44 2 39
9 60 3 49
10 55 3 49
11 40 3 49
12 70 4 60
13 80 5 55
14 75 5 55
15 90 6 70
16 88 6 70
17 86 6 70
18 100 7 90
19 120 8 88
20 118 8 88
The problem with the code above is that it does not repeat the values correctly across the rows of each event in the "event" column (the result should look like the table above).

A tidier solution would be:
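library(dplyr)
a %>%
  mutate(BF = ifelse(event<=1,NA,row_number()-3)) %>% # row index three rows back
  group_by(event) %>%
  mutate(BF = BF[1]) %>% # use the index from the first row of each event
  ungroup() %>%
  mutate(BF = a[BF,]$flow) # look up the flow value at that index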
# A tibble: 20 x 3
flow event BF
<dbl> <dbl> <dbl>
1 40 1 NA
2 39 1 NA
3 38 1 NA
4 37 1 NA
5 50 2 39
6 49 2 39
7 46 2 39
8 44 2 39
9 60 3 49
10 55 3 49
11 40 3 49
12 70 4 60
13 80 5 55
14 75 5 55
15 90 6 70
16 88 6 70
17 86 6 70
18 100 7 90
19 120 8 88
20 118 8 88

An alternative way to get the output with tidyverse. This breaks your problem up into two pieces. There is likely something more succinct out there:
library(tidyverse)
critical_info <- a %>%
  mutate(previous = lag(flow, 3)) %>% # flow value three rows earlier, for each row
  group_by(event) %>%
  mutate(subevent = row_number()) %>% # number the rows within each event
  filter(subevent == 1) %>% # keep only the first row of each event
  rename(BF = previous) %>% # rename the column
  select(event, BF) # keep only the columns needed for the join
a %>%
  left_join(critical_info, by = "event")
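For comparison, the same column can be sketched in base R without a loop. This is only a sketch: it assumes each event value occupies one contiguous block of rows, so that match() returns the row where the event starts. The names first_row and idx are introduced here for illustration.

```r
flow <- c(40, 39, 38, 37, 50, 49, 46, 44, 60, 55, 40, 70, 80, 75, 90, 88, 86, 100, 120, 118)
event <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 5, 6, 6, 6, 7, 8, 8)
a <- data.frame(flow, event)

# For every row, the index of the first row that has the same event value
first_row <- match(a$event, a$event)

# Three rows before the start of the event; positions before row 1 do not exist
idx <- first_row - 3
idx[idx < 1] <- NA

# Indexing with NA yields NA, so the first event gets NA automatically
a$BF <- a$flow[idx]
a
```

Because the lookup is done on row positions, the result depends on the rows being in their original order.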

Related

Use mutate and dynamically named variables in dplyr

I would like to apply a function that selects the best transformation of certain variables in a data frame, and then adds new columns to the data frame with the transformed data. I can currently get the transformation to run as follows. However, this rewrites the existing data, instead of adding new, transformed variables. I have seen the other stackoverflow posts about dynamically-added variables but can't quite seem to get it to work. Here is what I have:
library(dplyr)
df <- data.frame(study_id = c(1:10),
                 v1 = (sample(1:100, 10)),
                 v2 = (sample(1:100, 10)),
                 v3 = (sample(1:100, 10)),
                 v4 = (sample(1:100, 10)))
library(bestNormalize)
transformed <- function(x) {
  bn <- bestNormalize(x)
  return(bn$x.t)
}
df <- df %>%
  mutate(across(c(2,4:5), transformed))
Current output:
study_id v1 v2 v3 v4
1 1 -0.001846842 43 0.6559159 0.37893888
2 2 -2.416625847 81 -1.2998111 -0.64356058
3 3 1.012132345 95 -1.5086228 -0.48845289
4 4 0.798561562 2 0.8301299 0.30168982
5 5 -0.257460026 35 0.1322051 0.78737617
6 6 -0.179681789 42 -1.1352463 -2.42438347
7 7 0.378206706 22 -0.3635088 0.79583687
8 8 0.909304988 70 1.0748401 0.63712357
9 9 0.325879668 32 0.9041796 -0.09711216
10 10 -0.568470765 7 0.7099185 0.75254380
Desired output:
study_id v1 v2 v3 v4 v1_transformed v3_transformed v4_transformed
1 1 72 7 87 100 4 3 2
2 2 57 78 64 69 10 8 6
3 3 35 65 83 96 3 5 4
4 4 24 58 94 53 6 10 10
5 5 100 62 82 63 -1 7 3
6 6 47 55 4 50 8 4 1
7 7 83 97 35 41 7 2 -1
8 8 78 86 22 73 1 -1 9
9 9 11 39 93 68 2 0 7
10 10 36 49 8 72 0 1 0
Many thanks in advance.
Use the .names= argument of across:
df %>%
mutate(across(c(2,4:5), transformed, .names = "{.col}_transformed"))
giving:
study_id v1 v2 v3 v4 v1_transformed v3_transformed v4_transformed
1 1 50 72 12 7 0.3850197 -0.7916019 -1.9775107
2 2 53 82 61 42 0.4425318 0.6132865 0.6790496
3 3 3 12 90 20 -2.3661268 0.9496526 -0.4232995
4 4 20 84 37 21 -0.5190229 0.1809655 -0.3508475
5 5 55 54 4 23 0.4790925 -1.7301008 -0.2157362
6 6 61 96 85 74 0.5812924 0.9002185 1.5209888
7 7 52 94 22 38 0.4237308 -0.2683955 0.5302984
8 8 72 41 57 35 0.7449435 0.5546340 0.4080778
9 9 13 67 6 45 -0.9434502 -1.3866702 0.7815968
10 10 74 48 93 14 0.7719892 0.9780114 -0.9526174
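If dplyr is not available, a similar result can be sketched in base R by assigning to new column names built with paste0(). The transformed() function below is a stand-in that only standardises with scale(); it is not what bestNormalize does and is used purely to keep the sketch self-contained. The seed is arbitrary.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
df <- data.frame(study_id = 1:10,
                 v1 = sample(1:100, 10),
                 v2 = sample(1:100, 10),
                 v3 = sample(1:100, 10),
                 v4 = sample(1:100, 10))

# Stand-in transformation (assumption): centre and scale to unit variance
transformed <- function(x) as.numeric(scale(x))

cols <- names(df)[c(2, 4:5)]               # "v1" "v3" "v4"
new_cols <- paste0(cols, "_transformed")   # names for the added columns

# Assigning to names that do not exist yet appends new columns;
# the original v1, v3 and v4 are left untouched
df[new_cols] <- lapply(df[cols], transformed)
names(df)
```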

how to subtract the next column by the previous column and create a new column after?

There are questions here on Stack Overflow about how to diff a column by the previous column, like this one, but my question is a little different: I want to create a new column after that diff without modifying the existing columns.
Sample data:
dfData <- data.frame(ID = c(1, 2, 3, 4, 5),
DistA = c(10, 8, 15, 22, 15),
DistB = c(15, 35, 40, 33, 20),
DistC = c(20,40,50,45,30),
DistD = c(60,55,55,48,50))
ID DistA DistB DistC DistD
1 1 10 15 20 60
2 2 8 35 40 55
3 3 15 40 50 55
4 4 22 33 45 48
5 5 15 20 30 50
Expected output:
ID DistA DistB DiffB-A DistC DistD Diff D-C
1 1 10 15 05 20 60 40
2 2 8 35 27 40 55 15
3 3 15 40 25 50 55 05
4 4 22 33 11 45 48 03
5 5 15 20 5 30 50 20
If you want to subtract every pair of columns, we can use split.default to split the data into groups of two columns each and subtract the first column of each pair from the second.
cols <- ceiling(seq_along(dfData[-1])/2)
new_cols <- tapply(names(dfData[-1]), cols, function(x)
sprintf('diff_%s', paste0(x, collapse = '')))
dfData[new_cols] <- sapply(split.default(dfData[-1], cols), function(x)
x[[2]] - x[[1]])
dfData
# ID DistA DistB DistC DistD diff_DistADistB diff_DistCDistD
#1 1 10 15 20 60 5 40
#2 2 8 35 40 55 27 15
#3 3 15 40 50 55 25 5
#4 4 22 33 45 48 11 3
#5 5 15 20 30 50 5 20
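The code above appends the difference columns at the end of the data frame. If each difference should sit directly after its pair, as in the expected output, one possible sketch is to rebuild the data frame pair by pair. The object out and the column names DiffB_A and DiffD_C are choices made here for illustration.

```r
dfData <- data.frame(ID = c(1, 2, 3, 4, 5),
                     DistA = c(10, 8, 15, 22, 15),
                     DistB = c(15, 35, 40, 33, 20),
                     DistC = c(20, 40, 50, 45, 30),
                     DistD = c(60, 55, 55, 48, 50))

# Group the non-ID column names into consecutive pairs
value_cols <- names(dfData)[-1]
pairs <- split(value_cols, ceiling(seq_along(value_cols) / 2))

out <- dfData["ID"]
for (p in pairs) {
  out[p] <- dfData[p]  # copy the pair unchanged
  # e.g. "DiffB_A" for the pair DistA, DistB
  diff_name <- paste0("Diff", sub("Dist", "", p[2]), "_", sub("Dist", "", p[1]))
  out[[diff_name]] <- dfData[[p[2]]] - dfData[[p[1]]]  # second minus first
}
out
```

This assumes an even number of value columns, each named with a "Dist" prefix.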

Using rescale based on categories in R

I have a dataframe that contains tiers and scores. I want to rescale the scores based on the tier with tier 5 from 100-91, tier 4 from 90-81, tier 3 from 80-71 etc. A sample of the data is as follows...
Tier Score
1 95
2 85
3 90
3 87
1 90
4 88
5 90
2 90
5 75
3 80
4 72
1 86
5 70
What I have so far is
library(scales)
df$scale = ifelse(df$tier == "5", rescale(df[df$tier == "5",]$score, to = c(91, 100)), df$scale)
and the output is NA
First, create a list containing the limits for rescale. The first list element is for Tier 1, the second list element is for Tier 2 etc.
limits <- list(c(51, 60), c(61, 70), c(71, 80), c(81, 90), c(91, 100))
You can use this list in the following dplyr approach:
library(dplyr)
library(scales)
df %>%
  group_by(Tier) %>%
  mutate(scale = rescale(Score, to = limits[[first(Tier)]]))
The result:
# A tibble: 13 x 3
# Groups: Tier [5]
Tier Score scale
<int> <int> <dbl>
1 1 95 60
2 2 85 61
3 3 90 80
4 3 87 77.3
5 1 90 55
6 4 88 90
7 5 90 100
8 2 90 70
9 5 75 93.2
10 3 80 71
11 4 72 81
12 1 86 51
13 5 70 91
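To make the arithmetic explicit, here is a base R sketch of the linear mapping that rescaling within each tier performs. The helper rescale_to() is hypothetical and written only for this illustration. The sketch assumes tier k maps to the range from 41 + 10k to 50 + 10k, following the ranges given in the question, and that each tier contains at least two distinct scores.

```r
df <- data.frame(Tier  = c(1, 2, 3, 3, 1, 4, 5, 2, 5, 3, 4, 1, 5),
                 Score = c(95, 85, 90, 87, 90, 88, 90, 90, 75, 80, 72, 86, 70))

# Hypothetical helper: map the minimum of x to to[1] and the maximum to to[2].
# Divides by zero if all values of x are equal.
rescale_to <- function(x, to) {
  rng <- range(x)
  to[1] + (x - rng[1]) / (rng[2] - rng[1]) * (to[2] - to[1])
}

df$scale <- NA_real_
for (tier in unique(df$Tier)) {
  rows <- df$Tier == tier
  # Assumed target range per tier: tier 1 is 51-60, ..., tier 5 is 91-100
  df$scale[rows] <- rescale_to(df$Score[rows], c(41 + 10 * tier, 50 + 10 * tier))
}
df
```

For example, the Tier 3 score of 87 lies 7/10 of the way from 80 to 90, so it maps to 71 + 0.7 * 9 = 77.3.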

R: Creating a vector with certain values from another vector

I have a CSV file with the column headers ID, Score, and Age.
In R I have:
data <- read.csv(file.choose(), header=T)
attach(data)
I would like to create two new vectors with the scores of people whose age is below 70 and above 70. I thought there was a nice, quick way to do this, but I can't find it anywhere. Thanks for any help.
Example of what data looks like
ID, Score, Age
1, 20, 77
2, 32, 65
.... etc
I am trying to make two vectors: one with the scores of everyone younger than 70 and one with the scores of everyone older than 70.
Assuming data looks like this:
Score Age
1 1 29
2 5 39
3 8 40
4 3 89
5 5 31
6 6 23
7 7 75
8 3 3
9 2 23
10 6 54
.. . ..
you can use
df_old <- data[data$Age >= 70,]
df_young <- data[data$Age < 70,]
which gives you
> df_old
Score Age
4 3 89
7 7 75
11 7 97
13 3 101
16 5 89
18 5 89
19 4 96
20 3 97
21 8 75
and
> df_young
Score Age
1 1 29
2 5 39
3 8 40
5 5 31
6 6 23
8 3 3
9 2 23
10 6 54
12 4 23
14 2 23
15 4 45
17 7 53
PS: if you only want the scores and not the age, you could use this:
df_old <- data[data$Age >= 70, "Score"]
df_young <- data[data$Age < 70, "Score"]
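Both vectors can also be produced in one step with split(), which groups a vector by a label. This is a sketch on made-up data; the labels "old" and "young" and the name scores_by_age are chosen here for illustration, and an age of exactly 70 is counted as "old", matching the >= 70 condition above.

```r
# Made-up example data with the same columns as the CSV
data <- data.frame(ID    = 1:5,
                   Score = c(20, 32, 15, 40, 27),
                   Age   = c(77, 65, 70, 34, 81))

# Label each row, then split the scores by that label
age_group <- ifelse(data$Age >= 70, "old", "young")
scores_by_age <- split(data$Score, age_group)

scores_by_age$old    # scores of people aged 70 or above
scores_by_age$young  # scores of people under 70
```

Referring to data$Score and data$Age explicitly also makes attach(data) unnecessary.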

transforming & adding new column in r

I currently have a data frame taken from a data feed of events that happened in chronological order. I would like to add a new column to each row of my data that corresponds to the previous event's endx if the prior event type is 1, and to the previous event's x if the prior event type is not 1.
e.g.
player_id <- c(12, 17, 26, 3)
event_type <- c(1, 3, 1, 10)
x <- c(65, 34, 43, 72)
endx <- c(68, NA, 47, NA)
df <- data.frame(player_id, event_type, x, endx)
df
player_id event_type x endx
1 12 1 65 68
2 17 3 34 NA
3 26 1 43 47
4 3 10 72 NA
so end result
player_id event_type x endx previous
1 12 1 65 68 NA
2 17 3 34 NA 68
3 26 1 43 47 34
4 3 10 72 NA 47
We can use if_else
library(dplyr)
df %>%
mutate(previous = if_else(lag(event_type)==1, lag(endx), lag(x)))
# player_id event_type x endx previous
#1 12 1 65 68 NA
#2 17 3 34 NA 68
#3 26 1 43 47 34
#4 3 10 72 NA 47
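I am sure this isn't the most succinct way, but you can use a loop and indexing.
df$previous <- NA
for (i in 2:nrow(df)) {
  # take endx from the previous row if its event_type is 1, otherwise x
  prev_col <- if (df[i - 1, "event_type"] == 1) "endx" else "x"
  df[i, "previous"] <- df[i - 1, prev_col]
}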
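The same rule can also be sketched in base R without a loop, by shifting each column down one row and choosing between the shifted values with ifelse(). The prev_* names are introduced here for illustration.

```r
player_id <- c(12, 17, 26, 3)
event_type <- c(1, 3, 1, 10)
x <- c(65, 34, 43, 72)
endx <- c(68, NA, 47, NA)
df <- data.frame(player_id, event_type, x, endx)

# Shift each column down by one row; the first row has no previous event
prev_type <- c(NA, head(df$event_type, -1))
prev_endx <- c(NA, head(df$endx, -1))
prev_x    <- c(NA, head(df$x, -1))

# endx of the previous event if its type was 1, otherwise its x
df$previous <- ifelse(prev_type == 1, prev_endx, prev_x)
df
```

The first row stays NA because the comparison with a missing previous type is itself NA.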
