How to select consecutive columns with the across function in dplyr [duplicate]

I wanted to use the new across function from dplyr to select consecutive columns and replace the NA values with zeros. However, it does not work. It seems like a very simple thing, so I may be missing something.
A working example:
> m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
> d <- as.data.frame(m)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 4 3 NA 3 7 6 6 10 6 5
2 9 8 9 5 10 NA 2 1 7 2
3 1 1 6 3 6 NA 1 4 1 6
4 NA 4 NA 7 10 2 NA 4 1 8
5 1 2 4 NA 2 6 2 6 7 4
6 NA 3 NA NA 10 2 1 10 8 4
7 4 4 9 10 9 8 9 4 10 NA
8 5 8 3 2 1 4 5 9 4 7
9 3 9 10 1 9 9 10 5 3 3
10 4 2 2 5 NA 9 7 2 5 5
This works fine:
d %>% mutate_at(vars(V1:V4), ~replace(., is.na(.), 0))
But if I try these options, I get an error:
d %>% mutate(across(vars(V1:V4)), ~replace(., is.na(.), 0))
d %>% mutate(across(V1:V4)), ~replace(., is.na(.), 0))
d %>% mutate(across("V1":"V4")), ~replace(., is.na(.), 0))
I am not sure why this doesn't work.

across() takes two basic arguments. The first argument is the set of columns to be modified, and the second is the function to apply to those columns. In the attempts above, the closing parenthesis right after the column selection ends across() too early, so the function is never passed to across(). In addition, vars() is no longer needed to select the variables. Thus, the correct form is:
d %>%
mutate(across(V1:V4, ~ replace(., is.na(.), 0)))
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 2 6 0 6 5 6 10 5 3 1
2 2 9 2 4 10 6 9 4 NA NA
3 5 5 3 0 3 7 1 5 9 5
4 7 1 1 6 2 1 8 NA 8 4
5 3 5 3 0 2 3 4 2 3 NA
6 0 10 0 2 5 10 1 10 4 3
7 4 3 10 6 NA 5 9 3 3 9
8 9 9 8 5 8 1 3 1 NA 10
9 6 3 0 1 1 9 3 5 8 4
10 3 2 9 1 5 2 4 NA 6 1
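As a self-contained sketch of the same idea, the snippet below rebuilds the example data with a fixed seed (the seed is my assumption, added only for reproducibility) and uses dplyr::coalesce() in place of replace(). Only V1 to V4 are touched; the remaining columns keep their NA values.

```r
library(dplyr)

# Assumption: a fixed seed so the random example data is reproducible
set.seed(1)
d <- as.data.frame(matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10))

# Columns first, function second, both inside across()
res <- d %>%
  mutate(across(V1:V4, ~ coalesce(.x, 0L)))

# V1..V4 no longer contain NA; V5..V10 are left unchanged
colSums(is.na(res))
```

The 0L keeps the columns as integers here, because sample(c(NA, 1:10), ...) produces an integer matrix.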

Related

Use the value of variable "A" in place of NA in variable "B" in R [duplicate]

I need to put the value of variable "A" in place of the NA of variable "B".
Example of my dataframe:
> df <- data.frame(A = seq(1, 10), B = c(1, NA, 3, 4, NA, NA, 7, 8, NA, NA))
> df
A B
1 1 1
2 2 NA
3 3 3
4 4 4
5 5 NA
6 6 NA
7 7 7
8 8 8
9 9 NA
10 10 NA
I want the above dataframe converted into this:
> df
A B
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
Using base R indexing:
> df$B[is.na(df$B)] <- df$A[is.na(df$B)]
> df
A B
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
Use coalesce
library(dplyr)
df <- df %>%
mutate(B = coalesce(B, A))
Output:
df
A B
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
I prefer coalesce. Here is one with an ifelse:
library(dplyr)
df %>%
mutate(B = ifelse(is.na(B), A, B))
A B
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
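For what it is worth, here is a small check that the three approaches above agree on the example data. This is only a sketch; the names base_res, coalesce_res and ifelse_res are mine and do not come from the answers.

```r
library(dplyr)

df <- data.frame(A = seq(1, 10), B = c(1, NA, 3, 4, NA, NA, 7, 8, NA, NA))

# Base R indexing: overwrite the NA positions of B with the values of A
base_res <- df
base_res$B[is.na(base_res$B)] <- base_res$A[is.na(base_res$B)]

# coalesce(): take B where it is not NA, otherwise A
coalesce_res <- df %>% mutate(B = coalesce(B, A))

# ifelse(): the same logic spelled out as a condition
ifelse_res <- df %>% mutate(B = ifelse(is.na(B), A, B))

all.equal(base_res$B, coalesce_res$B)
all.equal(base_res$B, ifelse_res$B)
```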

Finding equal rows between dataframes in R

I have the following data set as example:
df1 <- data.frame(V1 = 1:10, V2 = 1:10, V3 = 1:10)
df2 <- data.frame(V1 = 5:1, V2 = 5:1, v3 = c(1, 4, 5, 2, 3))
If a row in df1 is present in df2, I would like to create a column in df1 that gives the index of the corresponding row in df2, and shows FALSE, NULL, NA, 0 or similar for the other rows.
output expected:
V1 V2 V3 rows_matched
1 1 1 1 FALSE
2 2 2 2 4
3 3 3 3 FALSE
4 4 4 4 2
5 5 5 5 FALSE
6 6 6 6 FALSE
7 7 7 7 FALSE
8 8 8 8 FALSE
9 9 9 9 FALSE
10 10 10 10 FALSE
In base R:
cbind(df1, matched = match(interaction(df1), interaction(df2)))
V1 V2 V3 matched
1 1 1 1 NA
2 2 2 2 4
3 3 3 3 NA
4 4 4 4 2
5 5 5 5 NA
6 6 6 6 NA
7 7 7 7 NA
8 8 8 8 NA
9 9 9 9 NA
10 10 10 10 NA
You can do a simple left join with dplyr. Note: I renamed the column v3 in df2 to V3 to match the names in df1.
left_join(
df1,
df2 %>% mutate(rows_matched=row_number())
)
Output:
V1 V2 V3 rows_matched
1 1 1 1 NA
2 2 2 2 4
3 3 3 3 NA
4 4 4 4 2
5 5 5 5 NA
6 6 6 6 NA
7 7 7 7 NA
8 8 8 8 NA
9 9 9 9 NA
10 10 10 10 NA
Here is another way of solving your problem using data.table
library(data.table)
setDT(df1)
setDT(df2)
df1[, rows_matched := df2[df1, on=.(V1,V2,V3), which=TRUE]]
#
# V1 V2 V3 rows_matched
# 1: 1 1 1 NA
# 2: 2 2 2 4
# 3: 3 3 3 NA
# 4: 4 4 4 2
# 5: 5 5 5 NA
# 6: 6 6 6 NA
# 7: 7 7 7 NA
# 8: 8 8 8 NA
# 9: 9 9 9 NA
# 10: 10 10 10 NA
Another possible solution, based on dplyr::left_join (we have to previously capitalize V3 in df2):
library(dplyr)
df1 %>%
left_join(df2 %>% mutate(new = row_number()))
#> Joining, by = c("V1", "V2", "V3")
#> V1 V2 V3 new
#> 1 1 1 1 NA
#> 2 2 2 2 4
#> 3 3 3 3 NA
#> 4 4 4 4 2
#> 5 5 5 5 NA
#> 6 6 6 6 NA
#> 7 7 7 7 NA
#> 8 8 8 8 NA
#> 9 9 9 9 NA
#> 10 10 10 10 NA
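Both left_join answers depend on renaming v3 first and on dplyr guessing the join columns. A sketch that does the rename inside the pipeline and spells out the keys with by = (the variable name res is mine) could look like this:

```r
library(dplyr)

df1 <- data.frame(V1 = 1:10, V2 = 1:10, V3 = 1:10)
df2 <- data.frame(V1 = 5:1, V2 = 5:1, v3 = c(1, 4, 5, 2, 3))

res <- df1 %>%
  left_join(
    df2 %>%
      rename(V3 = v3) %>%                     # align the column name with df1
      mutate(rows_matched = row_number()),    # remember the row position in df2
    by = c("V1", "V2", "V3")                  # explicit keys, no "Joining, by" message
  )

res
```

Rows of df1 without a match in df2 get NA in rows_matched, as in the outputs above.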

Is there a way to use a column to label my variables in R? [duplicate]

I am trying to assign variable labels to the columns of my data frame in R. I was able to create the list of labels using the following code:
var.labels = dataframe$`var name`
Now I want to use this list as variable labels for the columns in the data frame. I tried this code, but it did not work:
label(dataframe) = as.list(var.labels[match(names(dataframe), names(var.labels))])
Thank you for your help.
Maybe this?
var.labels <- dataframe$`var name`
colnames(dataframe) <- var.labels
Example:
df <- as.data.frame(replicate(n = 13, expr = sample(c(1:8, NA), 13, replace = TRUE)))
df$names <- LETTERS[1:13]
df
#> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 names
#>1 1 NA 5 3 2 5 7 NA 3 NA NA NA 1 A
#>2 7 5 1 6 4 3 1 6 4 3 NA 3 NA B
#>3 7 3 1 2 7 2 6 4 6 1 7 3 2 C
#>4 3 1 5 1 5 6 1 2 2 NA 8 5 1 D
#>5 3 6 3 7 4 2 6 7 7 NA 1 2 8 E
#>6 NA 4 4 1 8 3 8 NA 6 3 8 4 NA F
#>7 1 8 3 8 1 3 2 4 7 4 2 1 2 G
#>8 NA NA 2 3 5 4 5 1 4 7 8 5 3 H
#>9 7 NA 3 2 7 NA 2 8 7 NA 6 8 6 I
#>10 6 8 5 3 6 5 5 3 4 8 NA 5 1 J
#>11 1 7 8 5 1 2 3 NA NA 3 2 6 7 K
#>12 8 4 1 8 7 NA 6 6 5 6 7 NA 2 L
#>13 5 8 5 1 2 1 6 3 NA 1 7 3 5 M
colnames(df) <- df$names
#> A B C D E F G H I J K L M NA
#>1 1 NA 5 3 2 5 7 NA 3 NA NA NA 1 A
#>2 7 5 1 6 4 3 1 6 4 3 NA 3 NA B
#>3 7 3 1 2 7 2 6 4 6 1 7 3 2 C
#>4 3 1 5 1 5 6 1 2 2 NA 8 5 1 D
#>5 3 6 3 7 4 2 6 7 7 NA 1 2 8 E
#>6 NA 4 4 1 8 3 8 NA 6 3 8 4 NA F
#>7 1 8 3 8 1 3 2 4 7 4 2 1 2 G
#>8 NA NA 2 3 5 4 5 1 4 7 8 5 3 H
#>9 7 NA 3 2 7 NA 2 8 7 NA 6 8 6 I
#>10 6 8 5 3 6 5 5 3 4 8 NA 5 1 J
#>11 1 7 8 5 1 2 3 NA NA 3 2 6 7 K
#>12 8 4 1 8 7 NA 6 6 5 6 7 NA 2 L
#>13 5 8 5 1 2 1 6 3 NA 1 7 3 5 M
# Finally, remove the names column
df[14] <- NULL
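Note that the approach above renames the columns. If the goal is to keep the column names and attach a separate label to each column, one option in base R is to store the label as an attribute. This is only a sketch: the data frame, the label texts and the attribute name "label" are assumptions for illustration (packages such as Hmisc read an attribute with that name, but nothing below depends on them).

```r
# Hypothetical data and labels, for illustration only
df <- data.frame(V1 = 1:3, V2 = 4:6)
var.labels <- c(V1 = "First variable", V2 = "Second variable")

# Attach each label to its column as a "label" attribute
for (nm in names(df)) {
  attr(df[[nm]], "label") <- var.labels[[nm]]
}

# Read the labels back; the column names stay V1 and V2
sapply(df, attr, "label")
```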

extracting the nth column of each row of a DT in R where n is a vector of the number of rows in the DT

Seems silly but a simple extraction from a DT is giving me problems.
Consider a toy example:
Create a test data.table with 5 columns:
library(data.table)
dt <- fread("
V1 V2 V3 V4 V5
1 10 7 4 3
2 11 8 5 2
3 12 9 6 1
4 1 10 7 4
5 2 11 8 4
6 3 12 9 3
7 4 1 10 3
8 5 2 11 1
9 6 3 12 2")
Now I want to add a 6th column V6 that contains the value of the column with column number in V5, for each row. So the final output I need is a data.table that transforms dt to below:
V1 V2 V3 V4 V5 V6
1: 1 10 7 4 3 7
2: 2 11 8 5 2 11
3: 3 12 9 6 1 3
4: 4 1 10 7 4 7
5: 5 2 11 8 4 8
6: 6 3 12 9 3 12
7: 7 4 1 10 3 1
8: 8 5 2 11 1 8
9: 9 6 3 12 2 6
With data.table, we can loop through the rows, subset .SD based on the column index in 'V5', and assign the result with := to create 'V6':
dt[, V6 := .SD[[V5]], by = 1:nrow(dt)]
dt
# V1 V2 V3 V4 V5 V6
#1: 1 10 7 4 3 7
#2: 2 11 8 5 2 11
#3: 3 12 9 6 1 3
#4: 4 1 10 7 4 7
#5: 5 2 11 8 4 8
#6: 6 3 12 9 3 12
#7: 7 4 1 10 3 1
#8: 8 5 2 11 1 8
#9: 9 6 3 12 2 6
In base R, we can use row/column matrix indexing:
setDF(dt)
dt$V6 <- dt[cbind(seq_len(nrow(dt)), dt$V5)]
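The same row/column indexing can also be done without leaving data.table, by indexing a matrix copy of the columns. The sketch below uses only the first three rows of the example data; the helper name m is mine.

```r
library(data.table)

# First three rows of the example data
dt <- data.table(V1 = 1:3, V2 = 10:12, V3 = 7:9, V4 = 4:6, V5 = c(3L, 2L, 1L))

# Matrix copy of the columns, so one (row, column) pair per row can be picked
m <- as.matrix(dt)

# For row i, take the value in column number V5[i]
dt[, V6 := m[cbind(seq_len(.N), V5)]]
dt
```

This avoids the per-row grouping, which may matter on larger tables, at the cost of the temporary matrix copy.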

R, Using reshape to pull pre post data

I have a simple data frame as follows
x = data.frame(id = seq(1,10),val = seq(1,10))
x
id val
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
I want to add 4 more columns. The first 2 are the previous two rows and the next two are the next two rows. For the first two rows and last two rows it needs to write out as NA.
How do I accomplish this using cast in the reshape package?
The final output would look like
1 1 NA NA 2 3
2 2 NA 1 3 4
3 3 1 2 4 5
4 4 2 3 5 6
... and so on...
Thanks much in advance
After you gave the example, I changed the solution (x is the data frame from the question):
mat <- cbind(x,
c(c(NA,NA),head(x$id,-2)),
c(c(NA),head(x$val,-1)),
c(tail(x$id,-1),c(NA)),
c(tail(x$val,-2),c(NA,NA)))
colnames(mat) <- c('id','val','idp','valp','idn','valn')
id val idp valp idn valn
1 1 1 NA NA 2 3
2 2 2 NA 1 3 4
3 3 3 1 2 4 5
4 4 4 2 3 5 6
5 5 5 3 4 6 7
6 6 6 4 5 7 8
7 7 7 5 6 8 9
8 8 8 6 7 9 10
9 9 9 7 8 10 NA
10 10 10 8 9 NA NA
Here is a solution with sapply. First, choose the relative offsets for the new columns:
lags <- c(-2, -1, 1, 2)
Create the new columns:
newcols <- sapply(lags,
function(l) {
tmp <- seq.int(nrow(x)) + l;
x[replace(tmp, tmp < 1 | tmp > nrow(x), NA), "val"]})
Bind together:
cbind(x, newcols)
The result:
id val 1 2 3 4
1 1 1 NA NA 2 3
2 2 2 NA 1 3 4
3 3 3 1 2 4 5
4 4 4 2 3 5 6
5 5 5 3 4 6 7
6 6 6 4 5 7 8
7 7 7 5 6 8 9
8 8 8 6 7 9 10
9 9 9 7 8 10 NA
10 10 10 8 9 NA NA
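If dplyr is an option, lag() and lead() express the same offsets directly. This is a swapped-in technique, not the reshape/cast approach the question asks about, and the new column names are my own.

```r
library(dplyr)

x <- data.frame(id = seq(1, 10), val = seq(1, 10))

res <- x %>%
  mutate(
    prev2 = lag(val, 2),   # value two rows earlier, NA for the first two rows
    prev1 = lag(val, 1),   # value one row earlier
    next1 = lead(val, 1),  # value one row later
    next2 = lead(val, 2)   # value two rows later, NA for the last two rows
  )

res
```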
