I have a dataset that looks like this:
Col1 Col2 Col3 Col4 Col5
A B 4 5 7
G H 5 6 NA
H I NA 9 8
K F 9 NA NA
E L NA 8 9
H I 1 0 10
How do I apply the na.fill() function to all the columns after Col2?
If I were to do it individually, it would be something like this:
df$Col3<-na.fill(df$Col3, c(NA, "extend", NA))
df$Col4<-na.fill(df$Col4, c(NA, "extend", NA))
df$Col5<-na.fill(df$Col5, c(NA, "extend", NA))
The problem is that my actual dataframe has over 100 columns. Is there a quick way to apply this function to all the columns after the first 2?
na.fill does handle multiple columns, so there is really no need for lapply, mutate, etc. Just replace the relevant columns with the result of running na.fill on those same columns. If you already know which columns to fill, you can set ix directly instead of computing it in the first line; in this example, ix <- 3:5 or ix <- -(1:2) would work equally well.
ix <- sapply(DF, is.numeric)
replace(DF, ix, na.fill(DF[ix], c(NA, "extend", NA)))
giving:
Col1 Col2 Col3 Col4 Col5
1 A B 4 5.0 7.0
2 G H 5 6.0 7.5
3 H I 7 9.0 8.0
4 K F 9 8.5 8.5
5 E L 5 8.0 9.0
6 H I 1 0.0 10.0
Note that you could alternately use na.approx:
replace(DF, ix, na.approx(DF[ix], na.rm = FALSE))
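For intuition about what the interior fill does, here is a rough base R sketch that linearly interpolates one column with approx(). It is only an illustration on the Col3 values, not the zoo implementation, and the variable names are made up for the example.

```r
# Linear interpolation of interior NAs with base R's approx().
x   <- c(4, 5, NA, 9, NA, 1)   # Col3 from the example data
idx <- seq_along(x)            # positions 1..6
ok  <- !is.na(x)               # which positions hold observed values

# Interpolate at every position from the observed (position, value) pairs.
# With the default rule = 1, positions outside the observed range stay NA.
filled <- approx(idx[ok], x[ok], xout = idx)$y
filled
# [1] 4 5 7 9 5 1
```

The result matches the Col3 column in the output above.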
Note
Lines <- "Col1 Col2 Col3 Col4 Col5
A B 4 5 7
G H 5 6 NA
H I NA 9 8
K F 9 NA NA
E L NA 8 9
H I 1 0 10"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE, strip.white = TRUE)
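As a quick check that the three ways of choosing ix mentioned above pick the same columns, here is a small base R sketch. The names ix_logical, ix_position and ix_negative are only illustrative.

```r
# Rebuild the example data so the snippet runs on its own.
DF <- read.table(text = "Col1 Col2 Col3 Col4 Col5
A B 4 5 7
G H 5 6 NA
H I NA 9 8
K F 9 NA NA
E L NA 8 9
H I 1 0 10", header = TRUE, as.is = TRUE, strip.white = TRUE)

ix_logical  <- sapply(DF, is.numeric)  # TRUE for the numeric columns
ix_position <- 3:5                     # explicit column positions
ix_negative <- -(1:2)                  # everything except the first two

names(DF)[ix_logical]
# [1] "Col3" "Col4" "Col5"
names(DF)[ix_position]
# [1] "Col3" "Col4" "Col5"
names(DF)[ix_negative]
# [1] "Col3" "Col4" "Col5"
```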
The mutate_-family of functions in the dplyr package would do the trick.
There are a few ways to do this. Some may work better than others depending on what your other columns look like. Here are three versions that would work better in different circumstances.
library(dplyr)    # %>%, mutate_at(), mutate_if(), mutate_all()
library(stringr)  # str_subset()
# Make dummy data.
df <- data.frame(
Col1 = LETTERS[1:6],
Col2 = LETTERS[7:12],
Col3 = c(4, 5, NA, 9, NA, 1),
Col4 = c(5,6,9,NA,8,0),
Col5 = c(7,NA,8,NA,9,10)
)
You can apply the na.fill function to columns specified by a vector of names. This is useful if you want to use a regular expression to select columns whose names contain a certain pattern.
cn <- names(df) %>%
str_subset("[345]") # Column names with 3, 4 or 5 in them.
result_1 <- df %>%
mutate_at(vars(cn),
zoo::na.fill, c(NA, 'extend', NA)
)
You can apply the na.fill function to any numeric column.
result_2 <- df %>%
mutate_if(is.numeric, # First argument is function that returns a logical vector.
zoo::na.fill, c(NA, 'extend', NA)
)
You can apply the function to columns specified in a numeric index vector.
result_3 <- df
result_3[ , 3:5] <- result_3[ , 3:5] %>% # Just replace columns 3 through 5
mutate_all(
zoo::na.fill, c(NA, 'extend', NA)
)
In this case, all three versions should have done the same thing.
all.equal(result_1, result_2) # TRUE
all.equal(result_1, result_3) # TRUE
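In current dplyr the scoped verbs (mutate_at, mutate_if, mutate_all) are superseded by across(). A minimal sketch of the "any numeric column" version in that style follows; result_4 is just an illustrative name, and the snippet assumes the dplyr and zoo packages are installed.

```r
library(dplyr)
library(zoo)

# Same dummy data as above.
df <- data.frame(
  Col1 = LETTERS[1:6],
  Col2 = LETTERS[7:12],
  Col3 = c(4, 5, NA, 9, NA, 1),
  Col4 = c(5, 6, 9, NA, 8, 0),
  Col5 = c(7, NA, 8, NA, 9, 10)
)

# Apply na.fill to every numeric column; other columns are left alone.
result_4 <- df %>%
  mutate(across(where(is.numeric), ~ na.fill(.x, c(NA, "extend", NA))))
result_4
```

This should reproduce the filled table shown in the first answer.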
I want to paste together multiple columns but ignore NAs.
Here's a basic working example of what the df looks like and what I'd like it to look like. Does anyone have any tips?
df <- data.frame("col1" = c("A", NA, "B", "C"),
"col2" = c(NA, NA, NA, "E"),
"col3" = c(NA, "D", NA, NA),
"col4" = c(NA, NA, NA, NA))
df_fixed <- data.frame("col" = c("A", "D", "B", "C,E"))
Using paste.
data.frame(col1=sapply(apply(df, 1, \(x) x[!is.na(x)]), paste, collapse=','))
# col1
# 1 A
# 2 D
# 3 B
# 4 C,E
Or without apply:
data.frame(col1=unname(as.list(as.data.frame(t(df))) |>
(\(x) sapply(x, \(x) paste(x[!is.na(x)], collapse=',')))()))
# col1
# 1 A
# 2 D
# 3 B
# 4 C,E
To add as a column use transform.
transform(df, colX=sapply(apply(df, 1, \(x) x[!is.na(x)]), paste, collapse=','))
# col1 col2 col3 col4 colX
# 1 A <NA> <NA> NA A
# 2 <NA> <NA> D NA D
# 3 B <NA> <NA> NA B
# 4 C E <NA> NA C,E
Note: Actually, you could also replace \(x) x[!is.na(x)] with na.omit, since its attributes vanish when the result is pasted; see e.g. @G. Grothendieck's answer.
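A small sketch of why the extra attribute from na.omit does no harm here: the attribute sits on the filtered vector, but paste returns a plain string. The variable names are illustrative.

```r
x    <- c("C", "E", NA, NA)
kept <- na.omit(x)

# na.omit records which positions were dropped in an attribute.
!is.null(attr(kept, "na.action"))
# [1] TRUE

# paste() builds a new character string, so the attribute is not carried over.
pasted <- paste(kept, collapse = ",")
pasted
# [1] "C,E"
attributes(pasted)
# NULL
```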
A possible base R solution:
df2 <- data.frame(col=apply(df,1, function(x) paste0(na.omit(x), collapse = ",")))
df2
#> col
#> 1 A
#> 2 D
#> 3 B
#> 4 C,E
Use na.omit and toString. No packages are used.
data.frame(col = apply(df, 1, function(x) toString(na.omit(x))))
## col
## 1 A
## 2 D
## 3 B
## 4 C, E
Use one of these instead of the anonymous function shown if spaces in the output are a problem:
function(x) paste(na.omit(x), collapse = ",")
function(x) gsub(", ", ",", toString(na.omit(x)))
We may use unite, which takes na.rm as an argument
library(tidyr)
library(dplyr)
df %>%
unite(col, everything(), na.rm = TRUE, sep=",")
-output
col
1 A
2 D
3 B
4 C,E
Or using base R with do.call and trimws
data.frame(col = trimws(do.call(paste, c(df, sep = ",")),
whitespace = "(?:,?NA,?)+"))
-output
col
1 A
2 D
3 B
4 C,E
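One caveat worth checking with the trimws approach: it only strips runs of NA at the start and end of the pasted string, so an NA sitting between two real values stays in the result. The sketch below uses a made-up data frame, df_mid, to show the difference from the na.omit approach.

```r
# Hypothetical data with an NA between two values in the first row.
df_mid <- data.frame(col1 = c("A", NA),
                     col2 = c(NA, "D"),
                     col3 = c("B", NA),
                     stringsAsFactors = FALSE)

pasted <- do.call(paste, c(df_mid, sep = ","))
pasted
# [1] "A,NA,B"  "NA,D,NA"

# Leading and trailing NA runs are removed, the interior one is not.
trimmed <- trimws(pasted, whitespace = "(?:,?NA,?)+")
trimmed
# [1] "A,NA,B" "D"

# Dropping NAs before pasting handles the interior case as well.
omitted <- unname(apply(df_mid, 1, function(x) paste(na.omit(x), collapse = ",")))
omitted
# [1] "A,B" "D"
```

The pattern would also strip a literal "NA" at the start or end of a real value, so the na.omit or unite versions are probably safer when the data can look like that.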
I have a df where one variable is an integer. I'd like to split this column into its individual digits. See my example below.
Group Number
A 456
B 3
C 18
To
Group Number Digit1 Digit2 Digit3
A 456 4 5 6
B 3 3 NA NA
C 18 1 8 NA
We can use read.fwf from base R. Find the maximum number of characters (nchar) in the 'Number' column (mx). Then read the 'Number' column, converted to character (as.character), as fixed-width fields, specifying the 'widths' as 1 repeated mx times, and assign the output to new 'Digit' columns in the data
mx <- max(nchar(df1$Number))
df1[paste0("Digit", seq_len(mx))] <- read.fwf(textConnection(
as.character(df1$Number)), widths = rep(1, mx))
-output
df1
# Group Number Digit1 Digit2 Digit3
#1 A 456 4 5 6
#2 B 3 3 NA NA
#3 C 18 1 8 NA
data
df1 <- structure(list(Group = c("A", "B", "C"), Number = c(456L, 3L,
18L)), class = "data.frame", row.names = c(NA, -3L))
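For comparison, the same idea can be sketched with substr() instead of read.fwf: take the i-th character of each number and convert it to an integer. This is an alternative illustration, not the approach above, and the variable chars is made up for the example.

```r
df1 <- data.frame(Group = c("A", "B", "C"), Number = c(456L, 3L, 18L))

chars <- as.character(df1$Number)  # "456" "3" "18"
mx    <- max(nchar(chars))         # widest number, here 3

# The i-th character of each number becomes column Digit<i>.
# substr() returns "" past the end of a string, which converts to NA.
for (i in seq_len(mx)) {
  df1[[paste0("Digit", i)]] <- as.integer(substr(chars, i, i))
}
df1
#   Group Number Digit1 Digit2 Digit3
# 1     A    456      4      5      6
# 2     B      3      3     NA     NA
# 3     C     18      1      8     NA
```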
Another base R option (I think @akrun's approach using read.fwf is much simpler)
cbind(
df,
with(
df,
type.convert(
`colnames<-`(do.call(
rbind,
lapply(
strsplit(as.character(Number), ""),
`length<-`, max(nchar(Number))
)
), paste0("Digit", seq(max(nchar(Number))))),
as.is = TRUE
)
)
)
which gives
Group Number Digit1 Digit2 Digit3
1 A 456 4 5 6
2 B 3 3 NA NA
3 C 18 1 8 NA
Using splitstackshape::cSplit
splitstackshape::cSplit(df, 'Number', sep = '', stripWhite = FALSE, drop = FALSE)
# Group Number Number_1 Number_2 Number_3
#1: A 456 4 5 6
#2: B 3 3 NA NA
#3: C 18 1 8 NA
Updated
I realized I could use the max function to get the character count for each row and pass it to map2_dfc, which saves some lines of code. Thanks to @ThomasIsCoding, whose accident inspired this.
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
df %>%
rowwise() %>%
mutate(map2_dfc(Number, 1:max(nchar(Number)), ~ str_sub(.x, .y, .y))) %>%
unnest(cols = !c(Group, Number)) %>%
rename_with(~ str_replace(., "\\.\\.\\.", "Digit"), .cols = !c(Group, Number)) %>%
mutate(across(!c(Group, Number), as.numeric)) # as.numeric has no na.rm argument
# A tibble: 3 x 5
Group Number Digit1 Digit2 Digit3
<chr> <dbl> <dbl> <dbl> <dbl>
1 A 456 4 5 6
2 B 3 3 NA NA
3 C 18 1 8 NA
Data
df <- tribble(
~Group, ~Number,
"A", 456,
"B", 3,
"C", 18
)
Two base R methods:
no_cols <- max(nchar(as.character(df1$Number)))
# Using `strsplit()`:
cbind(df1, setNames(data.frame(do.call(rbind,
lapply(strsplit(as.character(df1$Number), ""),
function(x) {
length(x) <- no_cols
x
}
)
)
), paste0("Digit", seq_len(no_cols))))
# Using `regmatches()` and `gregexpr()`:
cbind(df1, setNames(data.frame(do.call(rbind,
lapply(regmatches(df1$Number, gregexpr("\\d", df1$Number)),
function(x) {
length(x) <- no_cols
x
}
)
)
), paste0("Digit", seq_len(no_cols))))
I have a list of dataframes and I want to apply a custom function to it using lapply.
Here is my function:
rename_cols_pattern <- function (df, pattern, replacement = "") {
names(df) <- gsub(names(df), pattern = pattern, replacement = replacement)
}
How do I use this function with lapply? The call below does not work because the df argument is missing. How do I pass in df, which should be each of the dataframes in the di_data list?
di_data <- lapply(di_data, rename_cols_pattern(pattern = "X"))
I can get this to work like so:
di_data <- lapply(di_data, function(x) {
names(x) <- gsub(names(x), pattern = "X", replacement = "")
x
})
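However, I want the function to be separate, and I want to understand how to achieve this.
Your function is missing a return value. Its last expression is an assignment, so it returns the new names rather than the modified data frame. Return df explicitly: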
rename_cols_pattern <- function(df, pattern, replacement="") {
names(df) <- gsub(names(df), pattern=pattern, replacement=replacement)
return(df)
}
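To see what the original version hands back, here is a small sketch. The name rename_no_return is made up, and its body is the function from the question.

```r
# The question's function: the last expression is an assignment.
rename_no_return <- function(df, pattern, replacement = "") {
  names(df) <- gsub(names(df), pattern = pattern, replacement = replacement)
}

dat <- data.frame(X1 = 1:3, X2 = 4:6)
out <- rename_no_return(dat, pattern = "X")

# The value of an assignment is its right-hand side, i.e. the new names,
# so the renamed data frame is lost when the function exits.
out
# [1] "1" "2"
is.data.frame(out)
# [1] FALSE
```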
Normal usage:
rename_cols_pattern(dat, pattern="X", replacement="COL")
# COL1 COL2 COL3 COL4
# 1 1 4 7 10
# 2 2 5 8 11
# 3 3 6 9 12
Using lapply:
lapply(list(dat, dat), rename_cols_pattern, pattern="X", replacement="COL")
# [[1]]
# COL1 COL2 COL3 COL4
# 1 1 4 7 10
# 2 2 5 8 11
# 3 3 6 9 12
#
# [[2]]
# COL1 COL2 COL3 COL4
# 1 1 4 7 10
# 2 2 5 8 11
# 3 3 6 9 12
Data:
dat <- structure(list(X1 = 1:3, X2 = 4:6, X3 = 7:9, X4 = 10:12), class = "data.frame", row.names = c(NA,
-3L))
rename_with was created to solve these kinds of problems
library(tidyverse)
mtcars %>%
rename_with(.fn = ~ str_remove_all(.x,"X"))
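Applied to a list of data frames, as in the question, rename_with can be passed straight to lapply. The sketch below uses a made-up two-element list in place of the real di_data and assumes dplyr is installed.

```r
library(dplyr)

dat <- data.frame(X1 = 1:3, X2 = 4:6)
di_data <- list(dat, dat)  # stand-in for the real list of data frames

# lapply() forwards .fn to rename_with() for each data frame in the list.
di_data <- lapply(di_data, rename_with, .fn = ~ gsub("X", "COL", .x))
names(di_data[[1]])
# [1] "COL1" "COL2"
```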
I'm trying to mutate a column by dividing the value of a row with the value above. For example, let's say I have this dataframe:
V1
A 4
B 2
C 8
Using something like:
df <- mutate(df, V2 = V1[row+1] / V1[row])
I want to get:
V1 v2
A 4 NA
B 2 2
C 8 0.25
I can't find any way to do this...does anyone have any info?
Try with:
library(dplyr)
df <- mutate(df, v2 = lag(V1) / V1)
Output:
V1 v2
A 4 NA
B 2 2.00
C 8 0.25
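What lag() contributes can be sketched in base R: it shifts the vector down by one position and pads the front with NA. The name previous is illustrative.

```r
V1 <- c(4, 2, 8)

# Shift down by one: each element now holds the value from the row above.
previous <- c(NA, head(V1, -1))
previous
# [1] NA  4  2

# Value above divided by the current value.
previous / V1
# [1]   NA 2.00 0.25
```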
In base R, we can drop the last element for the numerator and the first element for the denominator, divide, and prepend NA
df$V2 <- with(df, c(NA, V1[-length(V1)]/V1[-1]))
data
df <- structure(list(V1 = c(4, 2, 8)), class = "data.frame",
row.names = c("A",
"B", "C"))
I have a dataframe with several numeric variables along with factors. I wish to run over the numeric variables and replace the negative values with missing values. I couldn't do that.
My alternative idea was to write a function that takes a dataframe and a variable and does it. It didn't work either.
My code is:
NegativeToMissing = function(df,var)
{
df$var[df$var < 0] = NA
}
Error in $<-.data.frame(`*tmp*`, "var", value = logical(0)) : replacement has 0 rows, data has 40
What am I doing wrong?
Thank you.
Here is an example with some dummy data.
df1 <- data.frame(col1 = c(-1, 1, 2, 0, -3),
col2 = 1:5,
col3 = LETTERS[1:5])
df1
# col1 col2 col3
#1 -1 1 A
#2 1 2 B
#3 2 3 C
#4 0 4 D
#5 -3 5 E
Now find columns that are numeric
numeric_cols <- sapply(df1, is.numeric)
And replace negative values
df1[numeric_cols] <- lapply(df1[numeric_cols], function(x) replace(x, x < 0 , NA))
df1
# col1 col2 col3
#1 NA 1 A
#2 1 2 B
#3 2 3 C
#4 0 4 D
#5 NA 5 E
You could also do
df1[df1 < 0] <- NA
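As for the function in the question, the likely cause of the error is that df$var looks for a column literally named "var" instead of using the value of the argument, and the function never returns the data frame. A minimal sketch of a corrected version, assuming the column name is passed as a string (negative_to_missing is an illustrative name):

```r
negative_to_missing <- function(df, var) {
  col <- df[[var]]             # [[ ]] uses the value of var, unlike $
  col[which(col < 0)] <- NA    # which() skips values that are already NA
  df[[var]] <- col
  df                           # return the modified data frame
}

df1 <- data.frame(col1 = c(-1, 1, 2, 0, -3),
                  col2 = 1:5,
                  col3 = LETTERS[1:5])
df1 <- negative_to_missing(df1, "col1")
df1$col1
# [1] NA  1  2  0 NA
```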
With tidyverse, we can make use of mutate_if
library(tidyverse)
df1 %>%
mutate_if(is.numeric, ~ replace(., . < 0, NA)) # funs() is deprecated; a formula does the same job
If you still want to change only one selected variable, a solution with dplyr would be to use non-standard evaluation:
library(dplyr)
NegativeToMissing <- function(df, var) {
quo_var = quo_name(var)
df %>%
mutate(!!quo_var := ifelse(!!var < 0, NA, !!var))
}
NegativeToMissing(data, var=quo(val1)) # use quo() function without ""
# val1 val2
# 1 1 1
# 2 NA 2
# 3 2 3
Data used:
data <- data.frame(val1 = c(1, -1, 2),
val2 = 1:3)
data
# val1 val2
# 1 1 1
# 2 -1 2
# 3 2 3
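With recent versions of dplyr the same function can be sketched with the embrace operator {{ }}, so the caller passes a bare column name and does not need quo(). This is an alternative to the version above, not a replacement the original answer proposed.

```r
library(dplyr)

negative_to_missing <- function(df, var) {
  # {{ var }} forwards the bare column name into mutate().
  df %>%
    mutate({{ var }} := ifelse({{ var }} < 0, NA, {{ var }}))
}

data <- data.frame(val1 = c(1, -1, 2),
                   val2 = 1:3)
result <- negative_to_missing(data, val1)
result$val1
# [1]  1 NA  2
```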