Address several variables with one command

I have a table with approximately 3000 rows with data in the form of :
Number Type
10001 0
10005 7
10006 0
10007 14
10012 16
10022 14
10023 0
10024 0
10029 7
10035 17
10045 14
I want to add a third column so that the table looks like :
Number Type SCHEach
10001 0 0
10005 7 0
10006 0 0
10007 14 0
10012 16 1
10022 14 0
10023 0 0
10024 0 0
10029 7 0
10035 17 1
10045 14 0
where values in the SCHEach column are based on values in the Type column. If the value in the Type column is 16, 17, 21, or 22, the value in the SCHEach column should be 1. For any other value in the Type column, SCHEach should be 0.
Right now I'm using the following:
library(dplyr)
schtable$SCHEach = ifelse(
schtable$Type == 16 |
schtable$Type == 17 |
schtable$Type == 21 |
schtable$Type == 22, 1, 0)
I am new to R and would like to know whether there is a way to do this without typing the following comparison separately for 16, 17, 21, and 22:
schtable$Type == 'number'

> mydf$SCHEach <- ifelse(mydf$Type %in% c(16,17,21,22),1,0)
> mydf
Number Type SCHEach
1 10001 0 0
2 10005 7 0
3 10006 0 0
4 10007 14 0
5 10012 16 1
6 10022 14 0
7 10023 0 0
8 10024 0 0
9 10029 7 0
10 10035 17 1
11 10045 14 0
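As a minimal sketch of the same idea without ifelse(): %in% already returns a logical vector, so as.integer() can turn it into 1/0 directly. The toy data frame and the name flag_types below are assumptions made for illustration, not part of the original question.

```r
# Toy data mirroring a few rows of the question (illustrative only)
mydf <- data.frame(
  Number = c(10001, 10005, 10012, 10035, 10045),
  Type   = c(0, 7, 16, 17, 14)
)

# Hypothetical name for the Type values that should map to 1
flag_types <- c(16, 17, 21, 22)

# %in% gives TRUE/FALSE per row; as.integer() converts that to 1/0
mydf$SCHEach <- as.integer(mydf$Type %in% flag_types)
mydf
```

One difference from a chain of == comparisons is worth knowing: if Type is NA, the == version yields NA, whereas %in% returns FALSE, so such rows get 0.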

Related

Creating a new column in R

I have a data.frame like the following:
regions admit men_age group
1 1234 34 2
2 3416 51 1
3 2463 26 3
4 1762 29 2
5 2784 31 4
6 999 42 1
7 2111 23 2
8 1665 36 3
9 2341 21 4
10 1723 33 1
I would like to create new columns using admit and group as follows:
regions admit men_age group admit1 admit2 admit3 admit4
1 1234 34 2 0 1234 0 0
2 3416 51 1 3416 0 0 0
3 2463 26 3 0 0 2463 0
4 1762 29 2 0 1762 0 0
5 2784 31 4 0 0 0 2784
6 999 42 1 999 0 0 0
7 2111 23 2 0 2111 0 0
8 1665 36 3 0 0 1665 0
9 2341 21 4 0 0 0 2341
10 1723 33 1 1723 0 0 0
In other words, I want to create four new admit columns based on the group column: in the admit1 column, rows where group is 1 get the corresponding admit value and all other rows get zero. In the admit2 column, rows where group is 2 get the corresponding admit value and all other rows get zero, and the same applies to the other two columns.
I tried a couple of ways to solve it, but failed.
Could someone please help me solve this?
A solution using tidyverse. We can create the columns and then spread them with fill = 0.
library(tidyverse)
dat2 <- dat %>%
mutate(group2 = str_c("admit", group), admit2 = admit) %>%
spread(group2, admit2, fill = 0)
dat2
# regions admit men_age group admit1 admit2 admit3 admit4
# 1 1 1234 34 2 0 1234 0 0
# 2 2 3416 51 1 3416 0 0 0
# 3 3 2463 26 3 0 0 2463 0
# 4 4 1762 29 2 0 1762 0 0
# 5 5 2784 31 4 0 0 0 2784
# 6 6 999 42 1 999 0 0 0
# 7 7 2111 23 2 0 2111 0 0
# 8 8 1665 36 3 0 0 1665 0
# 9 9 2341 21 4 0 0 0 2341
# 10 10 1723 33 1 1723 0 0 0
DATA
dat <- read.table(text = "regions admit men_age group
1 1234 34 2
2 3416 51 1
3 2463 26 3
4 1762 29 2
5 2784 31 4
6 999 42 1
7 2111 23 2
8 1665 36 3
9 2341 21 4
10 1723 33 1",
header = TRUE)
A base R solution would be to use ifelse(). Supposing your data.frame is x, you could do this:
# create the columns with the selected values
for( i in 1:4 ) x[ i + 4 ] <- ifelse( x$group == i, x$admit, 0 )
# rename the columns to your liking
colnames( x )[ 5:8 ] <- c( "admit1", "admit2", "admit3", "admit4" )
This gives you
> x
regions admit men_age group admit1 admit2 admit3 admit4
1 1 1234 34 2 0 1234 0 0
2 2 3416 51 1 3416 0 0 0
3 3 2463 26 3 0 0 2463 0
4 4 1762 29 2 0 1762 0 0
5 5 2784 31 4 0 0 0 2784
6 6 999 42 1 999 0 0 0
7 7 2111 23 2 0 2111 0 0
8 8 1665 36 3 0 0 1665 0
9 9 2341 21 4 0 0 0 2341
10 10 1723 33 1 1723 0 0 0
If you don't like the explicit renaming, you can build the column names inside the for() loop instead:
for( i in 1:4 )
{
adm <- paste ( "admit", i, sep = "" )
x[ adm ] <- ifelse( x$group == i, x$admit, 0 )
}
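The loop above can also be written without hard-coding 1:4 or the column positions. The following is only a sketch under the assumption that one new column is wanted per distinct value of group; the helper names groups and new_cols are made up for this example.

```r
# Small stand-in for the question's data (first rows only)
x <- data.frame(
  regions = 1:4,
  admit   = c(1234, 3416, 2463, 2784),
  men_age = c(34, 51, 26, 31),
  group   = c(2, 1, 3, 4)
)

# Derive the groups and the matching column names from the data itself
groups   <- sort(unique(x$group))
new_cols <- paste0("admit", groups)

# lapply() builds one vector per group; assigning the list to
# x[new_cols] creates all the new columns in one step
x[new_cols] <- lapply(groups, function(g) ifelse(x$group == g, x$admit, 0))
x
```

Because the names come from the values present in group, the same lines keep working if a fifth group appears later.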

Use value in column as argument in function

I have two data frames: a small one listing combinations of three index variables (User, Log and Pass), and a big one that holds many values for each of these combinations.
I'm trying to pull the matching values from the big data frame into a list column in the small one, so that I can compute summary statistics later.
Small.DF
User,Log,Pass,Valid.Event.Pass
1 11 76 Yes
1 11 46 Yes
1 15 38 Yes
1 15 47 Yes
1 15 386 Yes
1 15 388 Yes
1 8 119 Yes
1 8 120 Yes
1 8 121 Yes
1 8 122 Yes
1 8 123 Yes
1 16 35 Yes
1 16 37 Yes
1 17 22 Yes
1 17 102 Yes
1 12 203 Yes
1 12 205 Yes
1 12 207 Yes
1 12 209 Yes
1 12 24 Yes
2 13 29 Yes
2 1 31 Yes
Big.DF
User,Log,Pass,Passing.Distance
1 11 0 739.5
1 11 0 411.5
1 11 0 0
1 11 0 739.5
1 11 0 0
1 11 0 739.5
1 11 0 0
1 0 0 739.5
1 0 0 0
1 0 0 739.5
1 0 0 0
1 0 0 739.5
1 0 0 0
1 0 0 739.5
1 15 76 371.5
1 15 76 371.5
1 15 76 370.5
1 15 767 368.5
1 15 76 367.5
1 15 76 366.5
1 15 76 365.5
1 15 76 364.5
1 15 76 364.5
1 15 76 363.5
1 15 76 364.5
1 15 76 0
1 15 76 739.5
1 15 76 369.5
1 15 76 0
1 15 76 739.5
1 15 0 0
1 15 0 739.5
1 15 0 0
1 15 0 739.5
1 15 0 0
1 15 0 739.5
1 15 0 0
1 15 0 739.5
1 15 0 0
1 15 0 739.5
1 15 0 0
1 15 0 739.5
1 15 0 0
I'm interested in subsetting the values that match for these three variables in Big.DF but also the 100 values before and 100 values after.
To achieve this I've written a function that will create such a list:
newfn <- function(User, Log, Pass) {
  # rows of Big.DF that match all three index variables
  hits <- which(Big.DF$User == User & Big.DF$Log == Log & Big.DF$Pass == Pass)
  # the matching rows plus the 100 rows before and the 100 rows after
  subset(Big.DF[(min(hits) - 100):(max(hits) + 100), ], select = Passing.Distance)
}
But I can't figure out how to apply this function to each row of Small.DF.
The simplest thing I can think of would be
Small.DF$listofvalues <- newfn(Small.DF$User, Small.DF$Log, Small.DF$Pass)
but that won't work, for several reasons I can see.
With apply it would be something like
Small.DF$listofvalues <- apply(Small.DF, 1, newfn)
but this doesn't quite work either, and sweep doesn't seem right. Is there a function I'm missing?
Figured it out....
rowfinder<- function(User,Log,Pass){
subset(Sensor.Data[(min(which(Sensor.Data$User==User&Sensor.Data$Log==Log & Sensor.Data$Pass==Pass))-100):(max(which(Sensor.Data$User==User&Sensor.Data$Log==Log & Sensor.Data$Pass==Pass))+100),],select=LH.passing.distance)
}
SmallDF$LHvalues<-apply(SmallDF[,c('User','Log','Pass')], 1, function(y) rowfinder(y['User'],y['Log'],y['Pass']))
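apply() converts the data frame to a matrix before calling the function, which is why the answer above indexes each row by name. A possible alternative, sketched here under assumptions, is Map(), which walks the three columns in parallel and keeps their types. The toy data, the function name window_values, and the pad argument are all hypothetical; pad is set to 1 instead of 100 so the result is easy to inspect, and the window is clamped so it cannot run past the first or last row.

```r
# Hypothetical toy data standing in for the big and small data frames
big <- data.frame(
  User = 1,
  Log  = c(11, 11, 11, 15, 15, 15, 15, 8, 8, 8),
  Pass = c(0, 0, 76, 76, 76, 38, 38, 0, 119, 119),
  Passing.Distance = c(739.5, 411.5, 0, 371.5, 370.5, 368.5, 367.5, 0, 364.5, 363.5)
)
small <- data.frame(User = c(1, 1), Log = c(15, 8), Pass = c(76, 119))

# Distances for the matching rows plus `pad` rows on either side,
# clamped to the valid row range of `data`
window_values <- function(user, log, pass, data = big, pad = 1) {
  hits <- which(data$User == user & data$Log == log & data$Pass == pass)
  if (length(hits) == 0) return(numeric(0))  # no match: empty result
  rows <- max(1, min(hits) - pad):min(nrow(data), max(hits) + pad)
  data$Passing.Distance[rows]
}

# One list element per row of `small`, stored as a list column
small$values <- Map(window_values, small$User, small$Log, small$Pass)
small$values
```

Each element of small$values is a plain numeric vector, so summary statistics can follow with, for example, sapply(small$values, mean).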

Excel Formula Implementation in R

I need to implement some logic in my R script for the sample data frame df shown below.
ID A B
1 2.471264262 0
2 2.53024575 0
3 2.559114933 1
4 2.502350493 1
5 2.529496526 0
6 2.480199137 0
7 2.521066835 0
8 2.481272625 0
9 2.505953959 0
10 2.481272625 0
11 2.499424723 0
12 2.492515087 0
13 2.502385996 0
14 2.487579633 0
15 2.479438021 -1
16 2.044195946 1
17 2.054051421 0
18 2.108811073 1
19 2.249767599 0
20 2.627294516 -1
21 2.624337386 0
22 2.157110862 0
23 2.142325212 -1
24 2.124582433 -1
25 2.114725333 0
26 2.113739623 0
27 1.92054047 0
28 2.00037188 0
29 2.183995509 0
30 2.629451192 0
31 2.772756046 0
32 2.603141474 0
33 2.502385996 0
Column B marks the data points where the state changes. I need to add or subtract a "Correction Factor" to the values in column A for the next 15 data points, starting from the point where B == 1 or B == -1.
The value with the correction factor applied is computed as follows:
If B == 1, the result is A - 0.19*(15/15)*A, and the fraction (15/15) is decremented for each of the next 15 values: (14/15), (13/15), ..., (0/15).
Similarly, if B == -1, the result is A + 0.53*(15/15)*A, with the fraction decremented in the same way: (14/15), (13/15), ..., (0/15).
One more condition: once a state change has been detected in B, any further state change within the next 15 values must be ignored. For example, the first state change is detected at B3, so the state changes at B4, B15, and B16 are not considered.
For clarity, here is my expected output, with the formulas executed manually in Excel.
Expected Output
A B A With Correction Factor Formula Executed
2.471264262 0 2.471264262 Same Value of A retained since no transition
2.53024575 0 2.53024575 Same Value of A retained since no transition
2.559114933 1 2.072883096 A4-0.19* (15/15)*A4
2.502350493 1 2.058600339 A5-0.19* (14/15)*A5
2.529496526 0 2.112972765 A6-0.19* (13/15)*A6
2.480199137 0 2.103208868 A7-0.19* (12/15)*A7
2.521066835 0 2.169798189 A8-0.19* (11/15)*A8
2.481272625 0 2.166978093 A9-0.19* (10/15)*A9
2.505953959 0 2.220275208 A10-0.19* (9/15)*A10
2.481272625 0 2.229836999 A11-0.19* (8/15)*A11
2.499424723 0 2.277809064 A12-0.19* (7/15)*A12
2.492515087 0 2.30308394 A13-0.19* (6/15)*A13
2.502385996 0 2.34390155 A14-0.19* (5/15)*A14
2.487579633 0 2.361542265 A15-0.19* (4/15)*A15
2.479438021 -1 2.385219376 A16-0.19* (3/15)*A16
2.044195946 1 1.992409649 A17-0.19* (2/15)*A17
2.054051421 0 2.028033436 A18-0.19* (1/15)*A18
2.108811073 1 2.108811073 A19-0.19* (0/15)*A19
2.249767599 0 2.249767599 Same Value of A retained since no transition
2.627294516 -1 4.019760609 A21+0.53*(15/15)*A21
2.624337386 0 3.922509613 A22+0.53*(14/15)*A22
2.157110862 0 3.147943785 A23+0.53*(13/15)*A23
2.142325212 -1 3.050671102 A24+0.53*(12/15)*A24
2.124582433 -1 2.950336805 A25+0.53*(11/15)*A25
2.114725333 0 2.861928284 A26+0.53*(10/15)*A26
2.113739623 0 2.785908823 A27+0.53*(9/15)*A27
1.92054047 0 2.463413243 A28+0.53*(8/15)*A28
2.00037188 0 2.495130525 A29+0.53*(7/15)*A29
2.183995509 0 2.647002557 A30+0.53*(6/15)*A30
2.629451192 0 3.093987569 A31+0.53*(5/15)*A31
2.772756046 0 3.164638901 A32+0.53*(4/15)*A32
2.603141474 0 2.87907447 A33+0.53*(3/15)*A33
2.502385996 0 2.679221273 A34+0.53*(2/15)*A34
Edit
The code suggested below works exactly as required for the data frame above, i.e. the one with 33 rows, but it doesn't work for the data frame below, which has 32 rows. Any suggestions?
ID A B
1 2.471264262 0
2 2.53024575 0
3 2.559114933 1
4 2.502350493 1
5 2.529496526 0
6 2.480199137 0
7 2.521066835 0
8 2.481272625 0
9 2.505953959 0
10 2.481272625 0
11 2.499424723 0
12 2.492515087 0
13 2.502385996 0
14 2.487579633 0
15 2.479438021 -1
16 2.044195946 1
17 2.054051421 0
18 2.108811073 1
19 2.249767599 0
20 2.627294516 -1
21 2.624337386 0
22 2.157110862 0
23 2.142325212 -1
24 2.124582433 -1
25 2.114725333 0
26 2.113739623 0
27 1.92054047 0
28 2.00037188 0
29 2.183995509 0
30 2.629451192 0
31 2.772756046 0
32 2.603141474 0
I was not able to post another question referencing this post, so I have updated this one instead.
Thanks.
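This should work. Counting down from 15 is a little tricky, so we use a for loop to track the counter and the state; the formula itself is then relatively simple. The state and counter columns are created before the loop so that the element-wise assignments work for any number of rows. Without them, the first df$state[i] <- ... builds a length-3 vector, which R can recycle into 33 rows but not into 32, and that appears to be why the 32-row data frame fails:
# create the helper columns up front so they always have nrow(df) elements
df$state <- NA
df$counter <- NA
counter <- 0
current_state <- NA
for (i in seq_along(df$B)) {
  if (counter == 0) {
    # no active window: skip until the next state change
    if (df$B[i] == 0) next
    counter <- 15
    current_state <- df$B[i]
    df$state[i] <- df$B[i]
    df$counter[i] <- counter
  } else {
    # inside a window: count down and ignore any new state change
    counter <- counter - 1
    df$state[i] <- current_state
    df$counter[i] <- counter
  }
}
df$A_corr <- ifelse(df$state == 1,
                    df$A - 0.19 * (df$counter / 15) * df$A,
                    df$A + 0.53 * (df$counter / 15) * df$A)
# rows outside any window have NA state and keep their original A
df$A_corr <- ifelse(is.na(df$A_corr), df$A, df$A_corr)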
Gives:
> df
ID A B state counter A_corr
1 1 2.471264 0 NA NA 2.471264
2 2 2.530246 0 NA NA 2.530246
3 3 2.559115 1 1 15 2.072883
4 4 2.502350 1 1 14 2.058600
5 5 2.529497 0 1 13 2.112973
6 6 2.480199 0 1 12 2.103209
7 7 2.521067 0 1 11 2.169798
8 8 2.481273 0 1 10 2.166978
9 9 2.505954 0 1 9 2.220275
10 10 2.481273 0 1 8 2.229837
11 11 2.499425 0 1 7 2.277809
12 12 2.492515 0 1 6 2.303084
13 13 2.502386 0 1 5 2.343902
14 14 2.487580 0 1 4 2.361542
15 15 2.479438 -1 1 3 2.385219
16 16 2.044196 1 1 2 1.992410
17 17 2.054051 0 1 1 2.028033
18 18 2.108811 1 1 0 2.108811
19 19 2.249768 0 NA NA 2.249768
20 20 2.627295 -1 -1 15 4.019761
21 21 2.624337 0 -1 14 3.922510
22 22 2.157111 0 -1 13 3.147944
23 23 2.142325 -1 -1 12 3.050671
24 24 2.124582 -1 -1 11 2.950337
25 25 2.114725 0 -1 10 2.861928
26 26 2.113740 0 -1 9 2.785909
27 27 1.920540 0 -1 8 2.463413
28 28 2.000372 0 -1 7 2.495131
29 29 2.183996 0 -1 6 2.647003
30 30 2.629451 0 -1 5 3.093988
31 31 2.772756 0 -1 4 3.164639
32 32 2.603141 0 -1 3 2.879074
33 33 2.502386 0 -1 2 2.679221
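The same counter logic can be packaged as a function that works on plain vectors of any length. This is a sketch rather than the answerer's code: the function name apply_correction and its parameters n, down and up are assumptions, with defaults taken from the 15, 0.19 and 0.53 in the question. The helper vectors are allocated at full length up front, so the result does not depend on how many rows the input has.

```r
apply_correction <- function(A, B, n = 15, down = 0.19, up = 0.53) {
  state   <- rep(NA_real_, length(A))  # pre-allocated to full length
  counter <- rep(NA_real_, length(A))
  remaining <- 0
  current   <- NA_real_
  for (i in seq_along(B)) {
    if (remaining == 0) {
      if (B[i] == 0) next            # no active window, no transition
      remaining <- n                 # a transition opens a new window
      current   <- B[i]
    } else {
      remaining <- remaining - 1     # transitions inside a window are ignored
    }
    state[i]   <- current
    counter[i] <- remaining
  }
  # signed, decaying fraction of A: negative for state 1, positive for state -1
  frac <- ifelse(state == 1, -down, up) * counter / n
  # rows outside any window (NA state) keep their original value
  ifelse(is.na(state), A, A + frac * A)
}

# Small check with a window of 2 so the countdown is easy to follow
apply_correction(A = c(2, 2, 2, 2), B = c(0, 1, 0, -1), n = 2)
```

For the question's data this would be called as df$A_corr <- apply_correction(df$A, df$B), for either the 33-row or the 32-row data frame.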

How to consider following and previous rows of each observation in R

I need to create 2 columns: PRETARGET and TARGET based on several conditions.
To create PRETARGET, for each row of my data (for each participant PPT and trial TRIAL) I need to check that CURRENT_ID is associated with a value of 0 in the column CanBePretarget and that the CURRENT_ID of the following row equals CURRENT_ID + 1. If these conditions are fulfilled, I would like a value of 0; if not, a value of 1.
To create TARGET, for each row of my data (for each participant PPT and trial TRIAL) I need to check that CURRENT_ID is associated with a value of 0 in the column CanBeTarget and that the CURRENT_ID of the previous row equals CURRENT_ID - 1. If these conditions are fulfilled, I would like a value of 0; if not, a value of 1.
In addition, if the result in PRETARGET is 1, then the value of TARGET in the next row should also be 1.
I have added the desired output to the following example.
I was thinking of using for loops and ifelse statements, but I am not sure how to refer to the following or previous row of each observation.
PPT TRIAL PREVIOUS_ID CURRENT_ID NEXT_ID CURRENT_INDEX CanBePretarget CanBeTarget PRETARGET TARGET
ppt01 11 2 3 4 3 0 0 0 1
ppt01 11 3 4 3 4 1 0 1 0
ppt01 11 4 5 6 8 0 0 1 1
ppt01 11 6 7 8 10 0 0 1 1
ppt01 11 7 10 11 18 0 1 0 1
ppt01 11 10 11 12 19 0 0 0 0
ppt01 11 11 12 14 20 1 0 1 0
ppt01 12 1 2 1 2 1 0 1 1
ppt01 12 2 3 4 5 0 0 1 1
ppt01 12 5 6 6 8 0 0 0 1
ppt01 12 6 7 7 10 0 0 0 0
ppt01 12 7 8 9 12 0 0 0 0
ppt01 12 8 9 9 13 0 0 0 0
ppt01 12 9 10 11 16 0 0 0 0
ppt01 12 10 11 11 17 0 0 0 0
ppt01 13 1 2 2 2 1 0 1 1
ppt01 13 3 3 3 10 0 0 1 1
ppt01 13 4 5 6 13 0 0 0 1
ppt01 13 5 6 7 14 0 0 1 0
ppt01 13 9 9 10 19 0 0 0 1
ppt01 13 9 10 10 20 0 0 0 0
ppt01 13 10 11 12 22 0 0 0 0
ppt01 13 11 12 12 23 0 0 1 0
ppt01 14 10 11 11 15 0 0 0 1
ppt01 14 11 12 12 17 0 0 1 0
This can be achieved with dplyr, using lead() and lag() to look at the next and previous rows:
library(dplyr)
df.new <- df %>%
  mutate(PRETARGET1 = abs(as.numeric(CanBePretarget == 0 & lead(CURRENT_ID, default = 0) == (CURRENT_ID + 1)) - 1)) %>%
  group_by(PPT, TRIAL) %>%
  mutate(TARGET1 = abs(as.numeric((CanBeTarget == 0 & lag(CURRENT_ID, default = 0) == (CURRENT_ID - 1))) - 1),
         TARGET1 = ifelse(lag(PRETARGET1, default = 0) == 1, 1, TARGET1))
To compare to your results, I created PRETARGET1 and TARGET1.
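For readers who want to see what "next row" and "previous row" mean without dplyr, here is a base R sketch of the same shifting idea. The helpers next_value and prev_value, the toy data, and the new column names are hypothetical; ave() applies each helper within every PPT/TRIAL group, so a shift never crosses a trial boundary.

```r
# Toy data with two trials (illustrative only)
df <- data.frame(
  PPT        = "ppt01",
  TRIAL      = c(11, 11, 11, 12, 12),
  CURRENT_ID = c(3, 4, 5, 2, 3)
)

# Shift a vector by one position, padding the open end with NA
next_value <- function(v) c(v[-1], NA)          # analogous to a lead
prev_value <- function(v) c(NA, v[-length(v)])  # analogous to a lag

# One grouping factor per participant/trial combination
grp <- interaction(df$PPT, df$TRIAL, drop = TRUE)

df$NEXT_IN_TRIAL <- ave(df$CURRENT_ID, grp, FUN = next_value)
df$PREV_IN_TRIAL <- ave(df$CURRENT_ID, grp, FUN = prev_value)

# TRUE where the following row in the same trial is CURRENT_ID + 1
df$followed_by_successor <- !is.na(df$NEXT_IN_TRIAL) &
  df$NEXT_IN_TRIAL == df$CURRENT_ID + 1
df
```

The last row of each trial has no successor, so its NEXT_IN_TRIAL is NA and the comparison is treated as FALSE.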

