Data.frame copy and paste values based on a condition - r

I have a data frame with the following structure/values and would like to go through the data frame (by row) and paste the values from the first column ("One") into the cells of the other columns only if they are not NA:
My data:
One Two Three Four
1 Bar_2_Foo NA NA 1
2 Mur_4_Doo 1 NA 2
3 Bur_3_Hoo NA 1 NA
What I would like to achieve:
One Two Three Four
1 Bar_2_Foo NA NA Bar_2_Foo_1
2 Mur_4_Doo Mur_4_Doo_1 NA Mur_4_Doo_2
3 Bur_3_Hoo NA Bur_3_Hoo_1 NA
Any ideas how to achieve this would be great. Thanks.

Is this what you're looking for?
library(dplyr)
# Prefix each non-NA value in Two:Four with the value of One from the same row
mutate(data, across(Two:Four, ~ ifelse(is.na(.x), NA, paste0(One, "_", .x))))
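For comparison, the same idea can be sketched in base R without dplyr. This is only an illustration: the data frame is rebuilt from the question's printed example, and `target_cols` is a hypothetical helper name, not something from the original post.

```r
# Rebuild the example data from the question (values assumed from the printout)
data <- data.frame(
  One   = c("Bar_2_Foo", "Mur_4_Doo", "Bur_3_Hoo"),
  Two   = c(NA, 1, NA),
  Three = c(NA, NA, 1),
  Four  = c(1, 2, NA),
  stringsAsFactors = FALSE
)

# Columns to rewrite (hypothetical helper name)
target_cols <- c("Two", "Three", "Four")

# For every target column, keep NA as NA and otherwise paste One, "_" and the value
data[target_cols] <- lapply(data[target_cols], function(col) {
  ifelse(is.na(col), NA_character_, paste0(data$One, "_", col))
})

print(data)
```

Because `paste0()` is vectorised, each row picks up the value of One from that same row, so no explicit row loop is needed.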

Related

Move data from small data frame to columns in large dataframe with R

I have two data frames in R. There is not an ID of any sort in DF1 to use to map the rows to - I just need the entire column copied over for a data migration.
DF1 has 1349 named columns, and empty rows.
DF2 has 10 named columns and 2990 rows of sample data.
I made a small scale example:
DF1 <- data.frame(matrix(ncol = 10, nrow = 0))
colnames(DF1) <- c('one','two','three','four','five','six','seven','eight','nine','ten')
one <- c(1,54,7,3,6,3)
seven <- c('MLS','Marshall','AAE','JC','AAA','EXE')
DF2 <- data.frame(one,seven)
The column names are the same, but they are not blocked together in DF1 - they are randomly dispersed.
I want to find an efficient way of mapping the 10 columns and all of the rows from DF2 to DF1 without needing to type in each column name, as I will also need to do this with a much larger data frame later.
I expect the rest of the columns in DF1 to be blank/null once the 'imported' columns from DF2 have been added -- this is okay. Is there an easy way to do this?
Thanks!
dplyr has a nice utility for this:
dplyr::bind_rows(DF1, DF2)
# one two three four five six seven eight nine ten
# 1 1 NA NA NA NA NA MLS NA NA NA
# 2 54 NA NA NA NA NA Marshall NA NA NA
# 3 7 NA NA NA NA NA AAE NA NA NA
# 4 3 NA NA NA NA NA JC NA NA NA
# 5 6 NA NA NA NA NA AAA NA NA NA
# 6 3 NA NA NA NA NA EXE NA NA NA
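If dplyr is not available, or if combining the empty columns of DF1 with differently typed columns of DF2 causes trouble, a base R sketch along the following lines may serve as an alternative. The names `out` and `shared` are hypothetical, and the approach assumes that matching columns are identified purely by name.

```r
# Small-scale stand-ins for the two data frames in the question
DF1 <- data.frame(matrix(ncol = 10, nrow = 0))
colnames(DF1) <- c("one", "two", "three", "four", "five",
                   "six", "seven", "eight", "nine", "ten")
DF2 <- data.frame(
  one   = c(1, 54, 7, 3, 6, 3),
  seven = c("MLS", "Marshall", "AAE", "JC", "AAA", "EXE"),
  stringsAsFactors = FALSE
)

# Build an all-NA frame with DF1's column names and DF2's number of rows
out <- as.data.frame(
  matrix(NA, nrow = nrow(DF2), ncol = ncol(DF1),
         dimnames = list(NULL, names(DF1)))
)

# Copy over every column whose name appears in both frames
shared <- intersect(names(DF2), names(DF1))
out[shared] <- DF2[shared]

print(out)
```

Nothing here depends on where the shared columns sit in DF1, so the same code should carry over to the 1349-column case without listing column names by hand.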

Find the row with the highest number of NA values in R

I have a data frame:
df
1 a c NA NA
2 a a a NA
3 c NA NA NA
First, I want to find which row has the highest number of NA values. I am also interested in finding rows that have more than 2 NA values.
How can I do it in R?
na_rows = rowSums(is.na(df)) gives the count of NA by row. You can then look at which.max(na_rows) and which(na_rows > 2).
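A small worked version of this, assuming the data frame from the question has four columns (the column names V1 to V4 are made up for the illustration):

```r
# Reconstruction of the example; column names are assumptions
df <- data.frame(
  V1 = c("a", "a", "c"),
  V2 = c("c", "a", NA),
  V3 = c(NA, "a", NA),
  V4 = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# Number of NA values in each row
na_rows <- rowSums(is.na(df))
print(na_rows)             # counts per row: 2, 1, 3

# Index of the row with the most NA values (first one in case of ties)
print(which.max(na_rows))  # row 3

# Indices of all rows with more than 2 NA values
print(which(na_rows > 2))  # row 3
```

Note that `which.max()` reports only the first maximum; `which(na_rows == max(na_rows))` would list every row that ties for the highest count.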

Shifting rows up in a particular column of data

I have a question about shifting rows up within particular columns of a data frame.
data <- data.frame(B=c(NA,NA,0,NA,NA,0),C=c(1,NA,NA,1,NA,NA))
B C
1 NA 1
2 NA NA
3 0 NA
4 NA 1
5 NA NA
6 0 NA
I tried the approach from the post "Shifting a column down by one":
na.omit(transform(data, B = c(NA, B[-nrow(data)])))
but only get
B C
4 0 1
expected output;
B C
1 0 1
2 0 1
How can we achieve that?
Thanks.
If you want to remove all NA from each column and do not care that the rows will not match between columns you can do:
data <- data.frame(B=c(NA,NA,0,NA,NA,0),C=c(1,NA,NA,1,NA,NA))
res<-lapply(data,function(x){x[complete.cases(x)]})
res<-data.frame(res)
The second line says: for every column in data, keep only the values that are not NA.
Thanks to @thelatemail for the correction. The earlier solution below also worked, but would have kept the columns as factors:
data <- data.frame(B=c(NA,NA,0,NA,NA,0),C=c(1,NA,NA,1,NA,NA))
res<-apply(data,2,function(x){x[complete.cases(x)]})
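Both versions above rely on every column having the same number of non-NA values, since otherwise the pieces cannot be combined into a rectangular result. A hedged sketch that pads shorter columns with NA at the bottom, using the hypothetical names `cols` and `n`, could look like this:

```r
data <- data.frame(B = c(NA, NA, 0, NA, NA, 0), C = c(1, NA, NA, 1, NA, NA))

# Drop the NA values from each column separately
cols <- lapply(data, function(x) x[!is.na(x)])

# Length of the longest remaining column
n <- max(lengths(cols))

# Pad shorter columns with trailing NA so they all have length n
res <- data.frame(lapply(cols, function(x) c(x, rep(NA, n - length(x)))))

print(res)
```

For the example data both columns keep two values, so no padding is actually added and the result matches the expected output in the question.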

How can I find out the names of columns that satisfy a condition in a data frame

I wish to know (by name) which columns in my data frame satisfy a particular condition. For example, if I was looking for the names of any columns that contained more than 3 NA, how could I proceed?
>frame
m n o p
1 0 NA NA NA
2 0 2 2 2
3 0 NA NA NA
4 0 NA NA 1
5 0 NA NA NA
6 0 1 2 3
> for (i in frame) {
    na <- is.na(i)
    as.numeric(na)
    total <- sum(na)
    if (total > 3) {
      print(i)
    }
  }
[1] NA 2 NA NA NA 1
[2] NA 2 NA NA NA 2
So this actually succeeds in evaluating which columns satisfy the condition, however, it does not display the column name. Perhaps subsetting the columns which interest me would be another way to do it, but I'm not sure how to solve it that way either. Plus I'd prefer to know if there's a way to just get the names directly.
I'll appreciate any input.
We can use colSums on a logical matrix (is.na(frame)), check whether it is greater than 3 to get a logical vector and then subset the names of 'frame' based on that.
names(frame)[colSums(is.na(frame))>3]
#[1] "n" "o"
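If we are using dplyr, one way is the following (across() is used here because summarise_each() and funs() are deprecated in current dplyr):
library(dplyr)
frame %>%
  summarise(across(everything(), ~ sum(is.na(.x)) > 3)) %>%
  unlist() %>%
  names(.)[.]
#[1] "n" "o"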
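To make the base R approach easy to try, here is a self-contained sketch with the question's data frame rebuilt from its printout. The variable `na_counts` and the `Filter()` variant are additions for illustration, not part of the original answer.

```r
# Reconstruction of 'frame' from the question's printout
frame <- data.frame(
  m = rep(0, 6),
  n = c(NA, 2, NA, NA, NA, 1),
  o = c(NA, 2, NA, NA, NA, 2),
  p = c(NA, 2, NA, 1, NA, 3)
)

# NA count per column: m = 0, n = 4, o = 4, p = 3
na_counts <- colSums(is.na(frame))

# Names of the columns with more than 3 NA values
many_na <- names(frame)[na_counts > 3]
print(many_na)      # "n" "o"

# Same result with an arbitrary per-column predicate
many_na_alt <- names(Filter(function(col) sum(is.na(col)) > 3, frame))
print(many_na_alt)  # "n" "o"
```

The `Filter()` form is handy when the condition is not a simple count, because the predicate can be any function that returns TRUE or FALSE for a column.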

add multiple columns to matrix based on value in existing column

I am looking for a way to add 3 values in 3 different columns to a matrix based on the value in an existing column.
experiment = rbind(1,1,1,2,2,2,3,3,3)
newColumns = matrix(NA,dim(experiment)[1],3) # make 3 columns of length experiment filled with NA
experiment = cbind(experiment,newColumns) # add new columns to the experimental data
experiment = data.frame(experiment)
experiment[experiment[,1]==1,2:4] = cbind(0,1,2) # add 3 columns at once
experiment$new[experiment[,1]==2] = 5 # add a single column
print(experiment)
X1 X2 X3 X4 new
1 1 0 0 0 NA
2 1 1 1 1 NA
3 1 2 2 2 NA
4 2 NA NA NA 5
5 2 NA NA NA 5
6 2 NA NA NA 5
7 3 NA NA NA NA
8 3 NA NA NA NA
9 3 NA NA NA NA
This, however, fills the new columns the wrong way. I want column 2 to be all 0's, column 3 to be all 1's and column 4 to be all 2's.
I know I can do it one column at a time, but my real dataset is quite large, so that isn't my preferred solution. I would like to be able to easily add more columns just by making the range of columns larger and adding values to the 3 values in the example.
Instead of this:
experiment[experiment[,1]==1,2:4] = cbind(0,1,2) # add 3 columns at once
Try this:
experiment[experiment[,1] == 1, 2:4] <- rep(c(0:2), each=3)
The problem is that you've provided 3 values (0,1,2) to fill 9 entries. The values are filled column-wise by default, so the first column is filled with 0, 1, 2 and then the values are recycled as 0, 1, 2 and 0, 1, 2 for the remaining columns. Since you want 0,0,0,1,1,1,2,2,2, you should generate that sequence explicitly with rep(0:2, each=3), where each=3 repeats every value three times before moving on to the next.
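To avoid hard-coding the 3 in each=3, the repeat count can be taken from the number of matching rows. The following is a sketch under the assumption that the data look like the question's example; `rows` and `values` are hypothetical names introduced here.

```r
# Example data shaped like the question's 'experiment' after adding NA columns
experiment <- data.frame(
  X1 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  X2 = NA, X3 = NA, X4 = NA
)

# Rows to fill and the value each new column should receive
rows   <- experiment$X1 == 1
values <- c(0, 1, 2)

# Repeat every value once per matching row, so that column-wise filling
# puts one constant value in each column
experiment[rows, 2:4] <- rep(values, each = sum(rows))

print(experiment)
```

Adding more columns then only requires widening the column range and extending `values` by the same number of entries.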
