R: Set a value for certain data meeting two conditions - r

I don't know how to set this the easiest way. I have a dataframe called Test, with a column containing some NA values. Now I want to set a value of 1 to all fields meeting the following conditions:
row number > 60
if there is an NA in the specific field
So far I have:
Test$MyColumn[is.na(Test$MyColumn)] <- 1
This works, but I don't know how to set the second condition :-/

If both conditions must apply before you change an element to 1 in bb here is an alternative:
aa <- 1:10
bb <- c(1,NA,6,4,NA,9,1,NA,2,5)
cc <- c(100,102,104,NA,78,54,99,NA,22,0)
dd <- data.frame(aa,bb,cc)
dd
dd$bb[4:nrow(dd)][is.na(dd$bb[4:nrow(dd)])] <- 1
dd
Here is the original data set:
aa bb cc
1 1 1 100
2 2 NA 102
3 3 6 104
4 4 4 NA
5 5 NA 78
6 6 9 54
7 7 1 99
8 8 NA NA
9 9 2 22
10 10 5 0
Here is the modified data set:
aa bb cc
1 1 1 100
2 2 NA 102
3 3 6 104
4 4 4 NA
5 5 1 78
6 6 9 54
7 7 1 99
8 8 1 NA
9 9 2 22
10 10 5 0
This changes NA in rows 4-10 of all columns if there is an NA in rows 4-10 of bb:
aa <- 1:10
bb <- c(1,NA,6,4,NA,9,1,NA,2,5)
cc <- c(100,102,104,NA,78,54,99,NA,22,0)
dd <- data.frame(aa,bb,cc)
dd
dd[4:nrow(dd),1:3][is.na(dd$bb[4:nrow(dd)]),] <- 1
dd
aa bb cc
1 1 1 100
2 2 NA 102
3 3 6 104
4 4 4 NA
5 1 1 1
6 6 9 54
7 7 1 99
8 1 1 1
9 9 2 22
10 10 5 0
This changes NA in rows 4-10 of all columns if there is an NA in rows 4-10 of bb then it changes all remaining NA in bb:
aa <- 1:10
bb <- c(1,NA,6,4,NA,9,1,NA,2,5)
cc <- c(100,102,104,NA,78,54,99,NA,22,0)
dd <- data.frame(aa,bb,cc)
dd
dd[4:nrow(dd),1:3][is.na(dd$bb[4:nrow(dd)]),] <- 1
dd$bb[is.na(dd$bb)] <- 1
dd
aa bb cc
1 1 1 100
2 2 1 102
3 3 6 104
4 4 4 NA
5 1 1 1
6 6 9 54
7 7 1 99
8 1 1 1
9 9 2 22
10 10 5 0

You can set rownumber like this:
Test$RowNumber <- 1:nrow(Test)
And then the condition would be:
Test$MyColumn[is.na(Test$MyColumn) & Test$RowNumber>60] <- 1

You can get the desired result as
Test[60:nrow(Test),][is.na(Test[60:nrow(Test),])]<-1

Related

Merging two dataframes by keeping certain column values in r

I have two dataframes I need to merge with. The second one has certain columns missing and it also has some more ids. Here is how the sample datasets look like.
df1 <- data.frame(id = c(1,2,3,4,5,6),
item = c(11,22,33,44,55,66),
score = c(1,0,1,1,1,0),
cat.a = c("A","B","C","D","E","F"),
cat.b = c("a","a","b","b","c","f"))
> df1
id item score cat.a cat.b
1 1 11 1 A a
2 2 22 0 B a
3 3 33 1 C b
4 4 44 1 D b
5 5 55 1 E c
6 6 66 0 F f
df2 <- data.frame(id = c(1,2,3,4,5,6,7,8),
item = c(11,22,33,44,55,66,77,88),
score = c(1,0,1,1,1,0,1,1),
cat.a = c(NA,NA,NA,NA,NA,NA,NA,NA),
cat.b = c(NA,NA,NA,NA,NA,NA,NA,NA))
> df2
id item score cat.a cat.b
1 1 11 1 NA NA
2 2 22 0 NA NA
3 3 33 1 NA NA
4 4 44 1 NA NA
5 5 55 1 NA NA
6 6 66 0 NA NA
7 7 77 1 NA NA
8 8 88 1 NA NA
The two datasets share first 6 rows and dataset 2 has two more rows. When I merge I need to keep cat.a and cat.b information from the first dataframe. Then I also want to keep id=7 and id=8 with cat.a and cat.b columns missing.
Here is my desired output.
> df3
id item score cat.a cat.b
1 1 11 1 A a
2 2 22 0 B a
3 3 33 1 C b
4 4 44 1 D b
5 5 55 1 E c
6 6 66 0 F f
7 7 77 1 <NA> <NA>
8 8 88 1 <NA> <NA>
Any ideas?
Thanks!
We may use rows_update
library(dplyr)
rows_update(df2, df1, by = c("id", "item", "score"))
-output
id item score cat.a cat.b
1 1 11 1 A a
2 2 22 0 B a
3 3 33 1 C b
4 4 44 1 D b
5 5 55 1 E c
6 6 66 0 F f
7 7 77 1 <NA> <NA>
8 8 88 1 <NA> <NA>

count row number first and then insert new row by condition [duplicate]

This question already has answers here:
How to create missing value for repeated measurement data?
(2 answers)
Closed 4 years ago.
I need to count the number of rows first after a group_by function and add up new row(s) to 6 row if the row number < 6.
My df has three variables (v1,v2,v3): v1 = group name, v2 = row number (i.e., 1,2,3,4,5,6). In the new row(s), I want to repeat the v1 value, v2 continue the couting of row number, v3 = NA
sample df
v1 v2 v3
1 1 79
1 2 32
1 3 53
1 4 33
1 5 76
1 6 11
2 1 32
2 2 42
2 3 44
2 4 12
3 1 22
3 2 12
3 3 12
3 4 67
3 5 32
expected output
v1 v2 v3
1 1 79
1 2 32
1 3 53
1 4 33
1 5 76
1 6 11
2 1 32
2 2 42
2 3 44
2 4 12
2 5 NA #insert
2 6 NA #insert
3 1 22
3 2 12
3 3 12
3 4 67
3 5 32
3 6 NA #insert
I tried to count the row number first by dplyr, but I don't know if I can or how can I add this if else condition by using the pip. Or is there other easier function?
My code
df %>%
group_by(v1) %>%
dplyr::summarise(N=n()) %>%
if (N < 6) {
# sth like that?
}
Thanks!
We can use complete
library(tidyverse)
complete(df1, v1, v2)
# A tibble: 18 x 3
# v1 v2 v3
# <int> <int> <int>
# 1 1 1 79
# 2 1 2 32
# 3 1 3 53
# 4 1 4 33
# 5 1 5 76
# 6 1 6 11
# 7 2 1 32
# 8 2 2 42
# 9 2 3 44
#10 2 4 12
#11 2 5 NA
#12 2 6 NA
#13 3 1 22
#14 3 2 12
#15 3 3 12
#16 3 4 67
#17 3 5 32
#18 3 6 NA
Here is a way to do it using merge.
df <- read.table(text =
"v1 v2 v3
1 1 79
1 2 32
1 3 53
1 4 33
1 5 76
1 6 11
2 1 32
2 2 42
2 3 44
2 4 12
3 1 22
3 2 12
3 3 12
3 4 67
3 5 32", header = T)
toMerge <- data.frame(v1 = rep(1:3, each = 6), v2 = rep(1:6, times = 3))
m <- merge(toMerge, df, by = c("v1", "v2"), all.x = T)
m
v1 v2 v3
1 1 1 79
2 1 2 32
3 1 3 53
4 1 4 33
5 1 5 76
6 1 6 11
7 2 1 32
8 2 2 42
9 2 3 44
10 2 4 12
11 2 5 NA
12 2 6 NA
13 3 1 22
14 3 2 12
15 3 3 12
16 3 4 67
17 3 5 32
18 3 6 NA

How do I select rows in a data frame before and after a condition is met?

I'm searching the web for a few a days now and I can't find a solution to my (probably easy to solve) problem.
I have huge data frames with 4 variables and over a million observations each. Now I want to select 100 rows before, all rows while and 1000 rows after a specific condition is met and fill the rest with NA's. I tried it with a for loop and if/ifelse but it doesn't work so far. I think it shouldn't be a big thing, but in the moment I just don't get the hang of it.
I create the data using:
foo<-data.frame(t = 1:15, a = sample(1:15), b = c(1,1,1,1,1,4,4,4,4,1,1,1,1,1,1), c = sample(1:15))
My Data looks like this:
ID t a b c
1 1 4 1 7
2 2 7 1 10
3 3 10 1 6
4 4 2 1 4
5 5 13 1 9
6 6 15 4 3
7 7 8 4 15
8 8 3 4 1
9 9 9 4 2
10 10 14 1 8
11 11 5 1 11
12 12 11 1 13
13 13 12 1 5
14 14 6 1 14
15 15 1 1 12
What I want is to pick the value of a (in this example) 2 rows before, all rows while and 3 rows after the value of b is >1 and fill the rest with NA's. [Because this is just an example I guess you can imagine that after these 15 rows there are more rows with the value for b changing from 1 to 4 several times (I did not post it, so I won't spam the question with unnecessary data).]
So I want to get something like:
ID t a b c d
1 1 4 1 7 NA
2 2 7 1 10 NA
3 3 10 1 6 NA
4 4 2 1 4 2
5 5 13 1 9 13
6 6 15 4 3 15
7 7 8 4 15 8
8 8 3 4 1 3
9 9 9 4 2 9
10 10 14 1 8 14
11 11 5 1 11 5
12 12 11 1 13 11
13 13 12 1 5 NA
14 14 6 1 14 NA
15 15 1 1 12 NA
I'm thankful for any help.
Thank you.
Best regards,
Chris
here is the same attempt as missuse, but with data.table:
library(data.table)
foo<-data.frame(t = 1:11, a = sample(1:11), b = c(1,1,1,4,4,4,4,1,1,1,1), c = sample(1:11))
DT <- setDT(foo)
DT[ unique(c(DT[,.I[b>1] ],DT[,.I[b>1]+3 ],DT[,.I[b>1]-2 ])), d := a]
t a b c d
1: 1 10 1 2 NA
2: 2 6 1 10 6
3: 3 5 1 7 5
4: 4 11 4 4 11
5: 5 4 4 9 4
6: 6 8 4 5 8
7: 7 2 4 8 2
8: 8 3 1 3 3
9: 9 7 1 6 7
10: 10 9 1 1 9
11: 11 1 1 11 NA
Here
unique(c(DT[,.I[b>1] ],DT[,.I[b>1]+3 ],DT[,.I[b>1]-2 ]))
gives you your desired indixes : the unique indices of the line for your condition, the same indices+3 and -2.
Here is an attempt.
Get indexes that satisfy the condition b > 1
z <- which(foo$b > 1)
get indexes for (z - 2) : (z + 3)
ind <- unique(unlist(lapply(z, function(x){
g <- pmax(x - 2, 1) #if x - 2 is negative
g : (x + 3)
})))
create d column filled with NA
foo$d <- NA
replace elements with appropriate indexes with foo$a
foo$d[ind] <- foo$a[ind]
library(dplyr)
library(purrr)
# example dataset
foo<-data.frame(t = 1:15,
a = sample(1:15),
b = c(1,1,1,1,1,4,4,4,4,1,1,1,1,1,1),
c = sample(1:15))
# function to get indices of interest
# for a given index x go 2 positions back and 3 forward
# keep only positive indices
GetIDsBeforeAfter = function(x) {
v = (x-2) : (x+3)
v[v > 0]
}
foo %>% # from your dataset
filter(b > 1) %>% # keep rows where b > 1
pull(t) %>% # get the positions
map(GetIDsBeforeAfter) %>% # for each position apply the function
unlist() %>% # unlist all sets indices
unique() -> ids_to_remain # keep unique ones and save them in a vector
foo$d = foo$c # copy column c as d
foo$d[-ids_to_remain] = NA # put NA to all positions not in our vector
foo
# t a b c d
# 1 1 5 1 8 NA
# 2 2 6 1 14 NA
# 3 3 4 1 10 NA
# 4 4 1 1 7 7
# 5 5 10 1 5 5
# 6 6 8 4 9 9
# 7 7 9 4 15 15
# 8 8 3 4 6 6
# 9 9 7 4 2 2
# 10 10 12 1 3 3
# 11 11 11 1 1 1
# 12 12 15 1 4 4
# 13 13 14 1 11 NA
# 14 14 13 1 13 NA
# 15 15 2 1 12 NA

How to generate new variables based on the name of the variables in the data frame

For example, I have a toy dataset as the one I created below,
a1<-1:10
a2<-11:20
v<-c(1,2,1,NA,2,1,2,1,2,1)
data<-data.frame(a1,a2,v,stringsAsFactors = F)
Then I want to create a new variable y which will be assigned the value a1 or a2 or NA based on the value of variable v. Therefore, the 'y'
should equals to 1 12 3 NA 15 6 17 8 19 10.
I want to generate it with the command similar to the ones I list below, It doesn't work, I guess it's because of the vectorization issue, then how can I fix it?
In reality, I have several as, say 10 and the actual values are characters instead of numeric ones.
data$y[!is.na(data$v)]<-data[,paste0('a',data$v)]
or
data%>%
mutate(y=ifelse(!is.na(v),get(paste0('a',v)),NA))
You could use standard indexing with cbind for that:
dat$y <- dat[cbind(1:nrow(dat), dat$v)]
The result:
> dat
a1 a2 v y
1 1 11 1 1
2 2 12 2 12
3 3 13 1 3
4 4 14 NA NA
5 5 15 2 15
6 6 16 1 6
7 7 17 2 17
8 8 18 1 8
9 9 19 2 19
10 10 20 1 10
(I used dat instead of data, because it is not wise to call a dataframe the same as a function; see ?data)
Only idea that comes to my mind:
data%>%
mutate(y=ifelse(!is.na(v),paste0('a',v),NA)) %>%
mutate(z=ifelse(!is.na(y),(ifelse(y=="a1",get("a1"),get("a2"))),NA))
a1 a2 v y z
1 1 11 1 a1 1
2 2 12 2 a2 12
3 3 13 1 a1 3
4 4 14 NA <NA> NA
5 5 15 2 a2 15
6 6 16 1 a1 6
7 7 17 2 a2 17
8 8 18 1 a1 8
9 9 19 2 a2 19
10 10 20 1 a1 10
or more directly:
data%>%
mutate(y=ifelse(!is.na(v),(ifelse(v==1, get("a1"),get("a2"))),NA))
a1 a2 v y
1 1 11 1 1
2 2 12 2 12
3 3 13 1 3
4 4 14 NA NA
5 5 15 2 15
6 6 16 1 6
7 7 17 2 17
8 8 18 1 8
9 9 19 2 19
10 10 20 1 10
still based on ifelse :(
You need to use a matrix accessor:
# Get the indices of missing values
ind <- which(!is.na(data$v))
# Transform colnames to indices
tab <- structure(match(c("a1", "a2"), names(data)), .Names = c("a1", "a2"))
# Access data with a matrix accessor
data$y[ind] <- data[cbind(ind, tab[paste0('a', data$v[ind])])]

R Partial Reshape Data from Long to Wide

I like to reshape a dataset from long to wide. Specifically, the new wide dataset should consist of rows corresponding to the unique number of IDs in the long dataset, and the number of columns is a multiple of unique values of another variable.
Let's say this is the original dataset:
ID a b C d e f g
1 1 1 1 1 2 3 4
1 1 1 2 5 6 7 8
2 2 2 1 1 2 3 4
2 2 2 3 9 0 1 2
2 2 2 2 5 6 7 8
3 3 3 3 9 0 1 2
3 3 3 2 5 6 7 8
3 3 3 1 1 2 3 4
In the new dataset, the number of rows is the number of IDs, the number of columns is 3 plus the multiple of unique elements found in variable C and the values from variables d to g are populated after sorting variable C in ascending order. It should look something like this:
ID a b d1 e1 f1 g1 d2 e2 f2 g2 d3 e3 f3 g3
1 1 1 1 2 3 4 5 6 7 8 NA NA NA NA
2 2 2 1 2 3 4 5 6 7 8 9 0 1 2
3 3 3 1 2 3 4 5 6 7 8 9 0 1 2
You can use dcast from data.table:
data.table::setDT(df)
data.table::dcast(df, ID + a + b ~ C, sep = "", value.var = c("d", "e", "f", "g"), fill=NA)
ID a b d1 d2 d3 e1 e2 e3 f1 f2 f3 g1 g2 g3
1: 1 1 1 1 5 NA 2 6 NA 3 7 NA 4 8 NA
2: 2 2 2 1 5 9 2 6 0 3 7 1 4 8 2
3: 3 3 3 1 5 9 2 6 0 3 7 1 4 8 2
Base reshape version - just have to use C as your time variable and away you go.
reshape(dat, idvar=c("ID","a","b"), direction="wide", timevar="C", sep="")
# ID a b d1 e1 f1 g1 d2 e2 f2 g2 d3 e3 f3 g3
#1 1 1 1 1 2 3 4 5 6 7 8 NA NA NA NA
#3 2 2 2 1 2 3 4 5 6 7 8 9 0 1 2
#6 3 3 3 1 2 3 4 5 6 7 8 9 0 1 2

Resources