R: Combining columns under precondition

I want a column to take another column's value in every row where its own value is NA. So I want to change something like this
A B
3 NA
NA NA
NA NA
5 NA
NA NA
NA NA
7 5
to something like this
A B
3 3
NA NA
NA NA
5 5
NA NA
NA NA
7 5
I am fairly new to R and any other kind of programming.

As per the OP's description:
"equal another column's values if the first column's value is NA in this row"
Could you please try the following and let me know if it helps.
# Fill B from A only in the rows where B is NA
df21223$B[is.na(df21223$B)] <- df21223$A[is.na(df21223$B)]
Output will be as follows for data frame's B part:
> df21223$B
[1] 3 NA NA 5 NA NA 7
Where Sample data is:
> df21223$A
[1] 3 NA NA 5 NA NA 7
> df21223$B
[1] NA NA NA NA NA NA NA

Try:
df$B[is.na(df$B)] <- df$A[is.na(df$B)]
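As a minimal sketch on the question's own data (the data frame name df is an assumption), a vectorised ifelse() takes A wherever B is NA and leaves existing B values, such as the 5 in the last row, untouched:

```r
# Example data copied from the question; the name `df` is assumed
df <- data.frame(A = c(3, NA, NA, 5, NA, NA, 7),
                 B = c(NA, NA, NA, NA, NA, NA, 5))

# Row by row: use A where B is NA, otherwise keep B
df$B <- ifelse(is.na(df$B), df$A, df$B)
df$B
# [1] 3 NA NA 5 NA NA 5
```

Because the test, the "yes" vector and the "no" vector all have the same length, every row is matched with its own value of A.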

Related

Stack non-NA values to the top of each column

I have a dataframe that contains values and NAs in columns. The dataframe looks like:
A B C D
1 NA NA NA
NA 2 3 NA
NA 4 NA NA
5 NA NA 6
I'm trying to transform this into a form that looks like:
A B C D
1 2 3 6
5 4 NA NA
NA NA NA NA
NA NA NA NA
by stacking the non-NA values to the top in each column. Is there a simple way to do this?
You can use lapply to reorder each column so that the non-NA values come first. Keep in mind that order(is.na(x)) keeps the non-NA values in their original order within each column, whereas x[order(x)] would sort the values themselves:
df1[] <- lapply(df1, function(x) x[order(is.na(x))])
df1
A B C D
1 1 2 3 6
2 5 4 NA NA
3 NA NA NA NA
4 NA NA NA NA
Data:
df1 <- read.table(header = TRUE, text = "A B C D
1 NA NA NA
NA 2 3 NA
NA 4 NA NA
5 NA NA 6")
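To illustrate the difference in a small sketch (the vector x is made up for this example): order(is.na(x)) only moves the NAs to the end and keeps the remaining values in their original order, while order(x) sorts the values themselves.

```r
x <- c(NA, 4, NA, 2)            # made-up example vector

stacked <- x[order(is.na(x))]   # non-NA values first, original order kept
sorted  <- x[order(x)]          # values sorted ascending, NA last

stacked
# [1] 4 2 NA NA
sorted
# [1] 2 4 NA NA
```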
This should do the trick (apply returns a matrix, so wrap the result in as.data.frame() if you need a data frame):
data<- data.frame(A=c(1, NA,NA,5),
B=c(NA,2,4,NA),
C=c(NA,3,NA,NA),
D=c(NA,NA,NA, 6))
apply(data, 2, function(x) c(x[!is.na(x)], rep(NA, sum(is.na(x)))))
A B C D
[1,] 1 2 3 6
[2,] 5 4 NA NA
[3,] NA NA NA NA
[4,] NA NA NA NA
You can use the data.table package for more flexibility:
> library(data.table)
> setDT(df1)
> df1[, (names(df1)) := lapply(.SD, function(x) x[order(is.na(x))]), .SDcols = names(df1)]
Note: To reorder only some of the columns, list them in .SDcols:
> df1[, (c("A","B")) := lapply(.SD, function(x) x[order(is.na(x))]), .SDcols = c("A","B")]

Create new variables based on list, then populate based on whether row contains variable name

I have some data:
df = data.frame(matrix(rnorm(20), nrow=10))
X1 X2
1 1.17596402 0.06138821
2 -1.76439330 1.03674803
3 -0.39069424 0.61616793
4 0.68375346 0.27435354
5 0.27426476 -1.71226109
6 -0.06153577 1.14514453
7 -0.37067621 -0.61243104
8 1.11107852 0.47788971
9 -1.73036658 0.31545148
10 -1.83155718 -0.14433432
I want to add new variables to it for every element in a list, which changes:
list = c("a","b","c")
The result should be:
X1 X2 a b c
1 1.17596402 0.06138821 NA NA NA
2 -1.76439330 1.03674803 NA NA NA
3 -0.39069424 0.61616793 NA NA NA
4 0.68375346 0.27435354 NA NA NA
5 0.27426476 -1.71226109 NA NA NA
6 -0.06153577 1.14514453 NA NA NA
7 -0.37067621 -0.61243104 NA NA NA
8 1.11107852 0.47788971 NA NA NA
9 -1.73036658 0.31545148 NA NA NA
10 -1.83155718 -0.14433432 NA NA NA
I can do this using suggestions below:
df[list] <- NA
But now, I want to search every row for the variable name as a value and flag if it contains that value. For example:
X1 X2 a b c
1 a b 1 1 0
2 a c 1 0 1
So the code would search for "a" in all columns and flag if any column contains "a". How do I do this?
You can use
df[list] <- NA
The result:
X1 X2 a b c
1 -2.07205164 -0.93585363 NA NA NA
2 1.11014587 0.23468072 NA NA NA
3 -1.17909665 0.04741478 NA NA NA
4 0.23955056 1.02029880 NA NA NA
5 -0.79212220 -1.13485661 NA NA NA
6 -0.57571547 0.33069641 NA NA NA
7 -0.70063920 -0.17251563 NA NA NA
8 1.90625189 0.30277177 NA NA NA
9 0.09029121 -0.72104778 NA NA NA
10 -1.36324313 -1.48041873 NA NA NA
If you want to add only the variables that are not present in df, you can use:
df[list[!list %in% names(df)]] <- NA
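The flagging step from the question is not covered by the code above. A minimal sketch, assuming the original columns hold character values as in the question's second table (the example data and the name vars are made up here; vars is used instead of list to avoid masking base::list):

```r
# Hypothetical data shaped like the question's second example
df <- data.frame(X1 = c("a", "a"), X2 = c("b", "c"), stringsAsFactors = FALSE)
vars <- c("a", "b", "c")

# Keep a copy of the original columns so the new flag columns are not searched
src <- df

# For each name, flag (1/0) the rows in which any original column equals it
df[vars] <- lapply(vars, function(v) as.integer(rowSums(src == v, na.rm = TRUE) > 0))
df
```

Comparing the whole data frame with a single value gives a logical matrix, so rowSums() counts the matches per row.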

Set consecutive non-NA values to NA

Set every non-NA value that has a non-NA value to its left to NA.
Data
a <- c(3,2,3,NA,NA,1,NA,NA,2,1,4,NA)
[1] 3 2 3 NA NA 1 NA NA 2 1 4 NA
Desired Output
[1] 3 NA NA NA NA 1 NA NA 2 NA NA NA
My working but ugly solution:
library(magrittr)  # provides %>%
IND <- !(is.na(a)) & data.table::rleidv(!(is.na(a))) %>% duplicated
a[IND]<- NA
a
There's gotta be a better solution ...
Alternatively,
a[-1][diff(!is.na(a)) == 0] <- NA; a
# [1] 3 NA NA NA NA 1 NA NA 2 NA NA NA
OK for brevity...
a[!is.na(dplyr::lag(a))]<-NA
a
[1] 3 NA NA NA NA 1 NA NA 2 NA NA NA
You can use a simple ifelse: add the vector to its lagged copy. The sum is NA whenever the current or the previous value is NA, and in that case the current value is kept. Otherwise both values are non-NA, so the current value is set to NA, i.e.
ifelse(is.na(a + dplyr::lag(a)), a, NA)
#[1] 3 NA NA NA NA 1 NA NA 2 NA NA NA
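If dplyr is not available, the same lag idea can be sketched in base R. The helper vector prev below is an assumption of this sketch: it shifts a one position to the right, so prev[i] is the value to the left of a[i].

```r
a <- c(3, 2, 3, NA, NA, 1, NA, NA, 2, 1, 4, NA)

# Shift by one position: the first element has no left neighbour, so it gets NA
prev <- c(NA, head(a, -1))

# Blank out every element whose left neighbour is non-NA
a[!is.na(prev)] <- NA
a
# [1] 3 NA NA NA NA 1 NA NA 2 NA NA NA
```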

Create a new data frame

I have a data frame with only one column, which contains some names. I need to change this data frame.
I created a list with some places:
voos_inter <- c("PUJ","SCL","EZE","MVD","ASU","VVI")
How can I add columns to this data frame, one for each name in the list?
Is your one-column data frame really a vector? You can convert a vector to a data.frame and add columns. I usually add columns filled with NA and fill in the values later. Check this example:
vtr <-c(1:6)
df <- as.data.frame(vtr)
voos_inter <- c("PUJ","SCL","EZE","MVD","ASU","VVI")
df[,2:(length(voos_inter)+1)] <- NA
names(df)[2:(length(voos_inter)+1)] <- voos_inter
df
vtr PUJ SCL EZE MVD ASU VVI
1 1 NA NA NA NA NA NA
2 2 NA NA NA NA NA NA
3 3 NA NA NA NA NA NA
4 4 NA NA NA NA NA NA
5 5 NA NA NA NA NA NA
6 6 NA NA NA NA NA NA
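A shorter variant of the same idea, as a sketch: indexing the data frame by the character vector creates the new columns and names them in one step, without computing column positions.

```r
df <- data.frame(vtr = 1:6)
voos_inter <- c("PUJ", "SCL", "EZE", "MVD", "ASU", "VVI")

# Each name in voos_inter becomes a new column filled with NA
df[voos_inter] <- NA
names(df)
```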

Converting each item of a character vector into column headers without looping

INPUT
specimens: character vector of 60 items: specimen1A, specimen1B, specimen2A ... specimen 30B.
DESIRED OUTPUT
A matrix or a dataframe in which each item in specimens is the name of a column in the matrix/dataframe.
The number of rows must be set to a fixed value (any).
The data for the cells will be filled with subsequent code so can be left as NA.
For example:
specimen1A specimen1B specimen2A ... specimen 30B
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
.. .. .. .. ..
100 NA NA NA NA
Thanks
A data.frame is just a list with some added attributes. Just coerce it:
> specimens <- list(A=runif(10),B=runif(10))
>
> as.data.frame(specimens)
A B
1 0.6746436 0.7599987
2 0.2198677 0.5004017
3 0.4927745 0.9455003
4 0.8028011 0.8718274
5 0.6190707 0.7415874
6 0.5273992 0.8118802
7 0.6602548 0.4432799
8 0.5820781 0.8117375
9 0.8196531 0.5172833
10 0.0683938 0.0205693
Edit: Re-reading your problem, I suspect specimens is a character vector, not really a list. If so:
N.rows <- 10
specimens <- c("A","B")
spec.dat <- as.data.frame(matrix(NA,nrow=N.rows,ncol=length(specimens)))
colnames(spec.dat) <- specimens
> spec.dat
A B
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 NA NA
7 NA NA
8 NA NA
9 NA NA
10 NA NA
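Scaled up to the sizes in the question, a sketch could look as follows. The way the 60 names are generated with paste0() is an assumption made for illustration (the real specimens vector would be used instead), and 100 rows is taken from the example table.

```r
# Assumed reconstruction of the names: specimen1A, specimen1B, ..., specimen30B
specimens <- paste0("specimen", rep(1:30, each = 2), c("A", "B"))
N.rows <- 100

# Build an all-NA matrix whose column names come straight from the vector
spec.dat <- as.data.frame(
  matrix(NA, nrow = N.rows, ncol = length(specimens),
         dimnames = list(NULL, specimens))
)
dim(spec.dat)
```

Passing the names through dimnames avoids the separate colnames() assignment and needs no loop.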
