add multiple columns to matrix based on value in existing column - r

I am looking for a way to add 3 values in 3 different columns to a matrix based on the value in an existing column.
experiment = rbind(1,1,1,2,2,2,3,3,3)
newColumns = matrix(NA,dim(experiment)[1],3) # make 3 columns of length experiment filled with NA
experiment = cbind(experiment,newColumns) # add new columns to the experimental data
experiment = data.frame(experiment)
experiment[experiment[,1]==1,2:4] = cbind(0,1,2) # add 3 columns at once
experiment$new[experiment[,1]==2] = 5 # add a single column
print(experiment)
X1 X2 X3 X4 new
1 1 0 0 0 NA
2 1 1 1 1 NA
3 1 2 2 2 NA
4 2 NA NA NA 5
5 2 NA NA NA 5
6 2 NA NA NA 5
7 3 NA NA NA NA
8 3 NA NA NA NA
9 3 NA NA NA NA
This, however, fills the new columns the wrong way. I want column 2 to be all 0's, column 3 to be all 1's and column 4 to be all 2's.
I know I can do it one column at a time, but my real dataset is quite large, so that isn't my preferred solution. I would like to be able to add more columns easily, just by widening the range of columns and adding values to the 3 values in the example.

Instead of this:
experiment[experiment[,1]==1,2:4] = cbind(0,1,2) # add 3 columns at once
Try this:
experiment[experiment[,1] == 1, 2:4] <- rep(c(0:2), each=3)
The problem is that you've provided 3 values (0, 1, 2) to fill 9 entries. Values are filled column-wise by default, so the first column is filled with 0, 1, 2 and then the values are recycled: 0, 1, 2 again for the second column and 0, 1, 2 for the third. Since you want 0,0,0,1,1,1,2,2,2, you should generate that explicitly with rep(0:2, each=3); each=3 repeats every value three times in a row, once per matching row.
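A minimal sketch of how the same idea could be generalised so that it does not depend on exactly three rows matching. The names rows and values are illustrative assumptions, not part of the original answer; each is set to the number of matching rows instead of a hard-coded 3.

```r
# Rebuild the example: one existing column plus three empty columns
experiment <- data.frame(X1 = rep(1:3, each = 3), X2 = NA, X3 = NA, X4 = NA)

rows   <- experiment$X1 == 1   # rows to fill (hypothetical helper name)
values <- c(0, 1, 2)           # one value per target column

# Repeat each value once per matching row, so the column-wise fill
# puts a constant value in every target column
experiment[rows, 2:4] <- rep(values, each = sum(rows))
print(experiment)
```

To add more columns, widen the column range and append one value per extra column to values; the two must have the same length.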

Related

create new column (with outcome min or NA) from multiple selected columns

My data has many columns and subjects, but to keep the illustration simple, let's say I have 7 subjects with 3 variables/columns called x1, x2 and x3 (values range from 1 to 3, plus NAs). In my analysis it is important that I name the columns I want to use, since I cannot use the whole data frame (it contains other variables/columns as well).
data <- data.frame(id=c(1,2,3,4,5,6,7), x1=c(1,2,2,NA,3,3,1), x2=c(NA,3,1,NA,2,3,2), x3=c(NA,2,NA,NA,3,NA,1))
id x1 x2 x3
1 1 NA NA
2 2 3 2
3 2 1 NA
4 NA NA NA
5 3 2 3
6 3 3 NA
7 1 2 1
x1, x2 and x3 are all of class numeric.
From these, I want to create a variable/column called x4 that:
- gives me the lowest value across x1, x2 and x3 in each row.
- ignores any NA in x1, x2 or x3 for that row.
- is NA if x1, x2 and x3 are ALL NA (NOT Inf, which is what my current code returns).
- shows either one of the values if two of them tie for the lowest.
So like this:
data <- data.frame(id=c(1,2,3,4,5,6,7), x1=c(1,2,2,NA,3,3,1), x2=c(NA,3,1,NA,2,3,2), x3=c(NA,2,NA,NA,3,NA,1), x4=c(1,2,1,NA,2,3,1))
id x1 x2 x3 x4
1 1 NA NA 1
2 2 3 2 2
3 2 1 NA 1
4 NA NA NA NA
5 3 2 3 2
6 3 3 NA 3
7 1 2 1 1
I managed to find a very similar question, and I can mostly make it work: min for each row with dataframe in R
data$x4 <- apply(data[, c("x1","x2","x3")],1, FUN=min, na.rm = TRUE)
The problem I have now is that when all values are NA (id number 4), my outcome is not NA but Inf.
Question 1: How can I make it NA instead of Inf? I can of course fix that afterwards like this:
is.na(data$x4) <- sapply(data$x4, is.infinite)
But I wonder if there is a nice way to do that already with/inside the previous code?
Also, rather than using apply with min as the FUN argument, I would like to make it work with code like the line below. Question 2: Is the following code possible?
data$x4 <- min(data[, c("x1","x2","x3")],1 , na.rm = TRUE)
With this, x4 gets the value '1' every time. I guess it just shows the lowest number (1) of the whole data frame? I don't understand why. I am already using ', 1', but it doesn't help.
I hope somebody can help me (an R and Stack Overflow newbie) out, thanks!
You are looking for the pmin function, which returns the parallel (element-wise) minima of its input vectors. Below are two approaches using pmin (df is the data frame from the question):
df$minIget <- do.call(pmin, c(df[,-1], na.rm = TRUE)) # Approach 1: using do.call
library(dplyr)
df %>% rowwise() %>% mutate(minIget = pmin(x1, x2, x3, na.rm = TRUE)) # Approach 2: using the tidyverse (dplyr)
output:
# A tibble: 7 x 5
# Rowwise:
id x1 x2 x3 minIget
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 NA NA 1
2 2 2 3 2 2
3 3 2 1 NA 1
4 4 NA NA NA NA
5 5 3 2 3 2
6 6 3 3 NA 3
7 7 1 2 1 1
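Since the question asks for the columns to be selected by name, here is a short sketch of the do.call/pmin approach applied to the question's data with an explicit column list. The variable name cols is an illustrative assumption.

```r
# Data from the question
data <- data.frame(id = 1:7,
                   x1 = c(1, 2, 2, NA, 3, 3, 1),
                   x2 = c(NA, 3, 1, NA, 2, 3, 2),
                   x3 = c(NA, 2, NA, NA, 3, NA, 1))

cols <- c("x1", "x2", "x3")   # columns to compare, chosen by name

# pmin compares the columns element by element; with na.rm = TRUE it
# ignores NAs but still returns NA when every value in a row is NA
data$x4 <- do.call(pmin, c(data[cols], na.rm = TRUE))
print(data)
```

Row 4, where all three values are missing, comes out as NA rather than Inf, so no clean-up step is needed afterwards.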
You can test if all are NA before you call min like:
apply(data[, c("x1","x2","x3")], 1, function(x)
if(all(is.na(x))) NA else min(x, na.rm=TRUE))
#[1] 1 2 1 NA 2 3 1
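min(data[, c("x1","x2","x3")], 1, na.rm = TRUE) gives you the minimum of 1 and all the values in data[, c("x1","x2","x3")] pooled together. min has no margin argument, so the 1 is treated as just another value, and the single result is recycled down the whole column.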
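A small sketch that illustrates both points on the question's data: extra arguments to min are pooled as values, and min over nothing but NAs returns Inf. The names vals, pooled and all_na are illustrative assumptions.

```r
data <- data.frame(id = 1:7,
                   x1 = c(1, 2, 2, NA, 3, 3, 1),
                   x2 = c(NA, 3, 1, NA, 2, 3, 2),
                   x3 = c(NA, 2, NA, NA, 3, NA, 1))
vals <- data[, c("x1", "x2", "x3")]

# The second argument is just another candidate value, not a margin:
# passing -5 makes -5 the overall minimum
pooled <- min(vals, -5, na.rm = TRUE)

# With every value removed by na.rm, min has nothing left and returns Inf
# (R also emits a warning, suppressed here)
all_na <- suppressWarnings(min(c(NA_real_, NA_real_), na.rm = TRUE))

print(pooled)
print(all_na)
```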

copy values from different columns based on conditions (r code)

I have data like the one in the picture, where there are two columns (Cday, Dday) with some missing values.
There can't be a row where there are values for both columns; there's a value on either one column or the other or in neither.
I want to create the column "new" that has copied values from whichever column there was a number.
Really appreciate any help!
Since no row has a value in both columns, you can just sum the two existing columns. Assume your data frame is called df.
df$new <- rowSums(df[, 2:3], na.rm = TRUE)
This sums across each row, ignoring NAs, and should give you what you want, with one caveat: a row where both columns are NA gets 0 rather than NA. (Note: you may need to adjust the column numbers if you have more columns than you've shown.)
The dplyr package has the coalesce function.
library(dplyr)
df <- data.frame(id=1:8, Cday=c(1,2,NA,NA,3,NA,2,NA), Dday=c(NA,NA,NA,3,NA,2,NA,1))
new <- df %>% mutate(new = coalesce(Dday, Cday))
new
# id Cday Dday new
#1 1 1 NA 1
#2 2 2 NA 2
#3 3 NA NA NA
#4 4 NA 3 3
#5 5 3 NA 3
#6 6 NA 2 2
#7 7 2 NA 2
#8 8 NA 1 1
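For comparison, a base R sketch that needs no packages and keeps NA where both columns are missing (a row sum with na.rm = TRUE would give 0 there). It reuses the example data above and assumes, as the question states, that at most one of the two columns has a value in any row.

```r
df <- data.frame(id = 1:8,
                 Cday = c(1, 2, NA, NA, 3, NA, 2, NA),
                 Dday = c(NA, NA, NA, 3, NA, 2, NA, 1))

# Take Dday where Cday is missing, otherwise Cday;
# if both are missing the result stays NA
df$new <- ifelse(is.na(df$Cday), df$Dday, df$Cday)
print(df)
```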

Data.frame copy and paste values based on a condition

I have a data frame with the following structure/values and would like to go through the data frame (by row) and paste the values from the first column ("One") into the cells of the other columns only if they are not NA:
My data:
One Two Three Four
1 Bar_2_Foo NA NA 1
2 Mur_4_Doo 1 NA 2
3 Bur_3_Hoo NA 1 NA
What I would like to achieve:
One Two Three Four
1 Bar_2_Foo NA NA Bar_2_Foo_1
2 Mur_4_Doo Mur_4_Doo_1 NA Mur_4_Doo_2
3 Bur_3_Hoo NA Bur_3_Hoo_1 NA
Any ideas how to achieve this would be great. Thanks.
Is this what you're looking for?
library(dplyr)
# For each of the columns Two to Four, prefix non-NA values with column One
mutate_at(data, vars(Two:Four),
~ ifelse(!is.na(.), paste0(One, "_", .), .))
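A base R sketch of the same transformation, in case dplyr is not available. The data frame is rebuilt from the question's table; the name cols is an illustrative assumption.

```r
data <- data.frame(One   = c("Bar_2_Foo", "Mur_4_Doo", "Bur_3_Hoo"),
                   Two   = c(NA, 1, NA),
                   Three = c(NA, NA, 1),
                   Four  = c(1, 2, NA),
                   stringsAsFactors = FALSE)

cols <- c("Two", "Three", "Four")   # columns to rewrite

# For every target column, keep NA as NA and otherwise
# paste the value from One in front of the existing value
data[cols] <- lapply(data[cols], function(x) {
  ifelse(is.na(x), NA, paste0(data$One, "_", x))
})
print(data)
```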

Shifting rows up in a particular column of data

I have a question about shifting of rows in the particular column of a data.
data <- data.frame(B=c(NA,NA,0,NA,NA,0),C=c(1,NA,NA,1,NA,NA))
B C
1 NA 1
2 NA NA
3 0 NA
4 NA 1
5 NA NA
6 0 NA
I tried from this post Shifting a column down by one
na.omit(transform(data, B = c(NA, B[-nrow(data)])))
but only get
B C
4 0 1
Expected output:
B C
1 0 1
2 0 1
How can we achieve that?
Thanks.
If you want to remove all NAs from each column, and do not mind that the values will no longer line up with their original rows, you can do:
data <- data.frame(B=c(NA,NA,0,NA,NA,0),C=c(1,NA,NA,1,NA,NA))
res<-lapply(data,function(x){x[complete.cases(x)]})
res<-data.frame(res)
The second line says: for every column in data, keep only the values that are not NA.
Thanks to @thelatemail for the correction. The earlier solution below also worked, but would have kept the columns as factors:
data <- data.frame(B=c(NA,NA,0,NA,NA,0),C=c(1,NA,NA,1,NA,NA))
res<-apply(data,2,function(x){x[complete.cases(x)]})
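One thing to watch with this approach: data.frame() can only reassemble the columns if each one keeps the same number of non-NA values. A hedged sketch with that check made explicit (the name kept is an illustrative assumption):

```r
data <- data.frame(B = c(NA, NA, 0, NA, NA, 0),
                   C = c(1, NA, NA, 1, NA, NA))

# Drop the NAs from every column independently
kept <- lapply(data, function(x) x[!is.na(x)])

# Stop early if the columns end up with different lengths,
# because they could not be combined into one data frame
stopifnot(length(unique(lengths(kept))) == 1)

res <- data.frame(kept)
print(res)
```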

Conditionals calculations across rows R

First, I'm brand new to R and am making the switch from SAS. I have a dataset that is 1000 rows by 24 columns, where the columns are different treatments. I want to count, across the columns of each row, the number of times an observation meets a criterion. A sample of my dataset is listed below.
Gene A B C D
1 AARS_3 NA NA 4.168365 NA
2 AASDHPPT_21936 NA NA NA -3.221287
3 AATF_26432 NA NA NA NA
4 ABCC2_22 4.501518 3.17992 NA NA
5 ABCC2_26620 NA NA NA NA
I was trying to create column vectors that counted
1) Number of NAs
2) Number of columns <0
3) Number of columns >0
I would then use cbind to add these to my large dataset
I solved the first one with :
NA.Count <- (apply(b01,MARGIN=1,FUN=function(x) length(x[is.na(x)])))
I tried to modify this to evaluate !is.na and then count the number of times the value was less than zero, with this:
lt0 <- (apply(b01,MARGIN=1,FUN=function(x) ifelse(x[!is.na(x)],count(x[x<0]))))
which didn't work at all.
I tried a dozen ways to get dplyr mutate to work with this and did not succeed.
What I want are the last two columns below; and if you had a cleaner version of the NA.Count I did, that would also be greatly appreciated.
Gene A B C D NA.Count lt0 gt0
1 AARS_3 NA NA 4.168365 NA 3 0 1
2 AASDHPPT_21936 NA NA NA -3.221287 3 1 0
3 AATF_26432 NA NA NA NA 4 0 0
4 ABCC2_22 4.501518 3.17992 NA NA 2 0 2
5 ABCC2_26620 NA NA NA NA 4 0 0
Here is one way to do it, taking advantage of the fact that TRUE counts as 1 (and FALSE as 0) when logical values are summed in R.
# test data frame
lil_df <- data.frame(Gene = c("AAR3", "ABCDE"),
A = c(NA, 3),
B = c(2, NA),
C = c(-1, -2),
D = c(NA, NA))
# is.na
NA.count <- rowSums(is.na(lil_df[,-1]))
# less than zero
lt0 <- rowSums(lil_df[,-1]<0, na.rm = TRUE)
# more than zero
mt0 <- rowSums(lil_df[,-1]>0, na.rm = TRUE)
# cbind to data frame
larger_df <- cbind(lil_df, NA.count, lt0, mt0 )
larger_df
Gene A B C D NA.count lt0 mt0
1 AAR3 NA 2 -1 NA 2 1 1
2 ABCDE 3 NA -2 NA 2 1 1
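The same rowSums idea applied to the five sample rows from the question, as a sketch. The data frame name b01 and the column names NA.Count, lt0 and gt0 come from the question; vals is an illustrative helper name.

```r
b01 <- data.frame(
  Gene = c("AARS_3", "AASDHPPT_21936", "AATF_26432", "ABCC2_22", "ABCC2_26620"),
  A = c(NA, NA, NA, 4.501518, NA),
  B = c(NA, NA, NA, 3.17992, NA),
  C = c(4.168365, NA, NA, NA, NA),
  D = c(NA, -3.221287, NA, NA, NA)
)

vals <- b01[, -1]   # treatment columns only, without Gene

b01$NA.Count <- rowSums(is.na(vals))             # missing values per row
b01$lt0      <- rowSums(vals < 0, na.rm = TRUE)  # values below zero per row
b01$gt0      <- rowSums(vals > 0, na.rm = TRUE)  # values above zero per row
print(b01)
```

The counts are computed from vals before any new column is added, so the added columns never feed back into later counts.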
