Converting columns in dataframe within list? - r

What is the best way to convert a specific column in each list object to a specific format?
For instance, I have a list with four objects (each of which is a data frame) and I want to change column 3 in each data.frame from double to integer?
I'm guessing something along the lines of lapply, but I don't know what specific syntax to use. I was trying:
lapply(df,function(x){as.numeric(var1(x))})
but it wasn't working.
Thanks!

Yes, lapply works well here:
lapply(listofdfs, function(df) { # loop through each data.frame in list
df[ , 3] <- as.integer(df[ , 3]) # make the 3rd column of type integer
df # return the new data.frame
})
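To check that the conversion took effect, a minimal sketch along these lines may help. The toy list below is hypothetical (not the asker's data), and `[[3]]` is used instead of `[ , 3]` only because it behaves the same for data frames and tibbles.

```r
# Hypothetical toy data: four data frames whose third column is double
listofdfs <- replicate(
  4,
  data.frame(a = 1:3, b = letters[1:3], c = c(1, 2, 3)),
  simplify = FALSE
)

# lapply returns a new list; the original list is left untouched
converted <- lapply(listofdfs, function(df) {
  df[[3]] <- as.integer(df[[3]])  # convert the 3rd column
  df                              # return the modified data frame
})

# Compare the storage type of column 3 before and after
before <- sapply(listofdfs, function(df) typeof(df[[3]]))
after  <- sapply(converted, function(df) typeof(df[[3]]))
before
after
```

Note that the result has to be assigned (here to `converted`); calling lapply without assignment does not change the original list.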

This is just an alternative to C. Braun's answer.
You can also use the map() function from the purrr package.
Input:
library(tidyverse)
df <- tibble(a = c(1, 2, 3), b =c(4, 5, 6), d = c(7, 8, 9))
myList <- list(df, df, df)
myList
Method:
map(myList, ~(.x %>% mutate_at(vars(3), funs(as.integer(.)))))
Output:
[[1]]
# A tibble: 3 x 3
a b d
<dbl> <dbl> <int>
1 1. 4. 7
2 2. 5. 8
3 3. 6. 9
[[2]]
# A tibble: 3 x 3
a b d
<dbl> <dbl> <int>
1 1. 4. 7
2 2. 5. 8
3 3. 6. 9
[[3]]
# A tibble: 3 x 3
a b d
<dbl> <dbl> <int>
1 1. 4. 7
2 2. 5. 8
3 3. 6. 9
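In recent dplyr releases funs() is deprecated and mutate_at() is superseded by across(). A sketch of the same idea with across(), assuming dplyr 1.0.0 or later and purrr are installed:

```r
library(dplyr)
library(purrr)

df <- tibble(a = c(1, 2, 3), b = c(4, 5, 6), d = c(7, 8, 9))
myList <- list(df, df, df)

# across(3, as.integer) selects the third column by position
converted <- map(myList, ~ mutate(.x, across(3, as.integer)))

# Storage type of the third column in each tibble
col_types <- map_chr(converted, ~ typeof(.x[[3]]))
col_types
```

Selecting by name, for example across(d, as.integer), is usually less fragile than selecting by position if the column order might change.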

You can use this:
dlist2 <- lapply(dlist,function(x){
y <- x
y[,coltochange] <- as.numeric(x[,coltochange])
return(y)
} )
Simple example:
data <- data.frame(cbind(c("1","2","3","4",NA),c(1:5)),stringsAsFactors = F)
typeof(data[,1]) #character
dlist <- list(data,data,data)
coltochange <- 1
dlist2 <- lapply(dlist,function(x){
y <- x
y[,coltochange] <- as.numeric(x[,coltochange])
return(y)
} )
typeof(dlist[[1]][,1]) #character
typeof(dlist2[[1]][,1]) #double

Related

Initialise a dataframe where a column references another column

I wonder if there is a way to do:
df <- data.frame(x = 1:3)
df$y = df$x + 5
yielding:
x y
1 1 6
2 2 7
3 3 8
in one line of code where the y column refers to the x column? For example:
data.frame(x = 1:3, y = self$x + 5) # doesn't work
(I won't accept answers that ignore the x column, for example data.frame(x = 1:3, y = 6:8) :-))
This is possible using tibble() from the tibble package. Credit to @DaveArmstrong from the comments.
library(tibble)
tibble(x = 1:3, y = x + 5)
# A tibble: 3 × 2
x y
<int> <dbl>
1 1 6
2 2 7
3 3 8
Here's a base R method that does not need an external package (e.g. tibble).
We can use outer to add 5 to each element in df$x, then cbind the result with df.
setNames(data.frame(cbind(1:3, outer(1:3, 5, `+`))), c("x", "y"))
# or to expand your code
setNames(cbind(data.frame(x = 1:3), outer(1:3, 5, `+`)), c("x", "y"))
x y
1 1 6
2 2 7
3 3 8
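Another base R possibility, not mentioned in the answers above, is transform(), which evaluates its arguments inside the data frame so that y can refer to x. This is only a sketch of that alternative:

```r
# transform() evaluates `x + 5` using the columns of the data frame
df <- transform(data.frame(x = 1:3), y = x + 5)
df
```

One caveat: transform() evaluates every new column against the original data frame, so a column created in the same call cannot be referenced by another one in that call.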

How do I add a column to a data frame consisting of minimum values from other columns?

How do I add a column to a data frame consisting of the minimum values from other columns? So in this case, to create a third column that will have the values 1, 2 and 2?
df = data.frame(A = 1:3, B = 4:2)
You can use apply() function to do this. See below.
df$C <- apply(df, 1, min)
The second argument (MARGIN) selects the dimension over which min is applied: 1 means rows, so min is computed across all columns of each row separately.
You can choose specific columns from the dataframe, as follows:
df$newCol <- apply(df[c('A','B')], 1, min)
You can call the parallel minimum function with do.call to apply it on all your columns:
df$C <- do.call(pmin, df)
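The difference between the two margins, and one pitfall of apply(), can be sketched as follows. The extra `label` column is hypothetical and is added only to show what happens with mixed column types.

```r
df <- data.frame(A = 1:3, B = 4:2)

row_min <- apply(df, 1, min)    # MARGIN = 1: one minimum per row
col_min <- apply(df, 2, min)    # MARGIN = 2: one minimum per column
par_min <- do.call(pmin, df)    # same as pmin(df$A, df$B)

# Hypothetical non-numeric column: apply() converts the data frame to a
# matrix first, so every value becomes character before min() is called
df2 <- cbind(df, label = c("x", "y", "z"))
row_min_all <- apply(df2, 1, min)                # character result
row_min_num <- do.call(pmin, df2[c("A", "B")])   # numeric columns only
```

Restricting the call to the numeric columns, as in the last line, avoids the silent coercion.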
With dplyr, rowwise() makes min() work one row at a time:
library(dplyr)
df %>%
rowwise() %>%
mutate(C = min(A, B))
# A tibble: 3 × 3
# Rowwise:
A B C
<int> <int> <int>
1 1 4 1
2 2 3 2
3 3 2 2
Using input with equal values across rows:
df = data.frame(A = 1:10, B = 11:2)
df %>%
rowwise() %>%
mutate(C = min(A, B))
# A tibble: 10 × 3
# Rowwise:
A B C
<int> <int> <int>
1 1 11 1
2 2 10 2
3 3 9 3
4 4 8 4
5 5 7 5
6 6 6 6
7 7 5 5
8 8 4 4
9 9 3 3
10 10 2 2
You can simply do:
df$C <- apply(FUN=min,MARGIN=1,X=df)
Or:
df[, "C"] <- apply(FUN=min,MARGIN=1,X=df)
or:
df["C"] <- apply(FUN=min,MARGIN=1,X=df)
Instead of apply, you could also use sapply on data.frame(t(df)). sapply traverses a data frame column-wise, applying the given function to each column, so the rows must first be made into columns with t, which transposes df. Since t always returns a matrix, you need to convert the result back with data.frame().
df$C <- sapply(data.frame(t(df)), min)
Or one could use the fact that ifelse is vectorized:
df$C <- with(df, ifelse(A<B,A,B))
Or:
df$C <- ifelse(df$A < df$B, df$A, df$B)
matrixStats
# install.packages("matrixStats")
matrixStats::rowMins(as.matrix(df))
According to this SO answer, this is the fastest option.
apply() on a data frame first converts it to a matrix and then calls the function once per row, so it tends to be slow on large inputs.
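Speed claims like this are easy to check on your own machine. The following is a rough sketch using only base R; the data size and seed are arbitrary assumptions, and the timings will differ between systems, so no expected numbers are shown.

```r
set.seed(1)  # arbitrary seed, hypothetical test data
big <- data.frame(A = sample(1e5), B = sample(1e5))

# Time the row-wise apply() against the vectorised pmin()
t_apply <- system.time(c_apply <- apply(big, 1, min))
t_pmin  <- system.time(c_pmin <- do.call(pmin, big))

# Both approaches should agree on the values
same_result <- all(unname(c_apply) == c_pmin)

# Elapsed seconds for each approach
c(apply = t_apply[["elapsed"]], pmin = t_pmin[["elapsed"]])
```

For more reliable measurements, a dedicated benchmarking package that repeats each expression many times would be a better choice than a single system.time() call.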
You can use transform() to add the min column as the output of pmin(A, B) and access the columns of df without indexing:
df <- transform(df, min = pmin(A, B))
Or, in data.table:
library(data.table)
DT = data.table(a = 1:3, b = 4:2)
DT[, min := pmin(a, b)]

R: How to insert a row in Dataframe starting at a certain column?

I have the following data frame:
df <- tibble(x = 1:3, y = 3:1, z = 4:6, a = 6:4, b = 7:9)
I now need to extract the values from the second row, third to fifth column with this command:
newrow <- df[2,3:5]
I now want to insert a new row after the second row. The problem is that I need the new row to start at column 2. If I use the following code, the row will be added at the same column positions as I extracted it from:
df%>% add_row(newrow, .before = 3)
Hope anybody can help with this, any help is much appreciated.
Your newrow data frame has the column names of columns 3:5 (z, a, b). Therefore add_row() matches newrow to these columns.
You need to rename the columns of newrow with the first three column names.
df%>% add_row(setNames(newrow, names(df)[1:ncol(newrow)]),
.before = 3)
I'm not sure exactly what your desired outcome is, but does this achieve what you want?
library(tibble)
library(dplyr)
df <- tibble::tibble(x = 1:3, y = 3:1, z = 4:6, a = 6:4, b = 7:9)
whatrow <- 2
whatcolumns <- 3:5
beforerow <- 3
newdf <-
slice(df, whatrow) %>%
select(all_of(whatcolumns)) %>%
setNames(., names(df)[whatcolumns - 1]) %>%
add_row(df, ., .before = beforerow)
newdf
#> # A tibble: 4 x 5
#> x y z a b
#> <int> <int> <int> <int> <int>
#> 1 1 3 4 6 7
#> 2 2 2 5 5 8
#> 3 NA 5 5 8 NA
#> 4 3 1 6 4 9

In R subtract a vector from each row of a dataframe

I'm searching for a better, more efficient solution to subtract a vector from each row of a data frame (df1). My current solution repeats the vector (Vec) to create a data frame (Vec_df1) with the same number of rows as df1 and then subtracts the two data frames. Now I wonder if there is a more "direct" way to do this without having to create the new Vec_df1 data frame (preferably in tidyverse). See example data below.
#Example data
V1 <- c(1, 2, 3)
V2 <- c(4, 5, 6)
V3 <- c(7, 8, 9)
df1 <- tibble(V1, V2, V3)
Vec <- c(1, 1, 2)
# Current solution, creates a dataframe with the same nrows by repeating the vector.
Vec_df1 <- tibble::as_tibble(t(Vec)) %>%
dplyr::slice(rep(dplyr::row_number(), nrow(df1)))
# Subtraction.
df2 <- df1-Vec_df1
df2
Thanks in advance
We can use sweep :
sweep(df1, 2, Vec, `-`)
# `-` is default FUN in sweep so you can also use
#sweep(df1, 2, Vec)
# V1 V2 V3
#1 0 3 5
#2 1 4 6
#3 2 5 7
Or an attempt similar to yours
df1 - rep(Vec, each = nrow(df1))
A similar approach using map2_df():
library(purrr)
map2_df(df1, Vec, `-`)
# A tibble: 3 x 3
V1 V2 V3
<dbl> <dbl> <dbl>
1 0 3 5
2 1 4 6
3 2 5 7
A fast way to do this, by transposing so that Vec recycles down the columns:
as_tibble(t(t(df1) - Vec))
# A tibble: 3 x 3
V1 V2 V3
<dbl> <dbl> <dbl>
1 0 3 5
2 1 4 6
3 2 5 7
We can also do
df1 - Vec[col(df1)]
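All of these rely on the same fact: R stores and recycles values column by column, so the vector has to be lined up with the columns before subtracting. A base R sketch comparing the approaches (a plain data.frame is used here instead of a tibble so that no packages are needed):

```r
df1 <- data.frame(V1 = c(1, 2, 3), V2 = c(4, 5, 6), V3 = c(7, 8, 9))
Vec <- c(1, 1, 2)

by_sweep     <- sweep(df1, 2, Vec, `-`)            # MARGIN = 2: per column
by_rep       <- df1 - rep(Vec, each = nrow(df1))   # 1,1,1, 1,1,1, 2,2,2
by_transpose <- as.data.frame(t(t(df1) - Vec))     # recycle down transposed columns
by_col       <- df1 - Vec[col(df1)]                # index Vec by column number

# Flatten each result column by column to compare the values only
flat <- lapply(
  list(by_sweep, by_rep, by_transpose, by_col),
  unlist, use.names = FALSE
)
flat[[1]]
```

A plain `df1 - Vec` would not do the same thing, because Vec would then be recycled down each column rather than across each row.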

Get description of groups from within a grouped data frame

I need to write a function that will take in a grouped data frame (from dplyr) and make a plot for each group, with the title describing what group it is for. The kicker is I don't know what the grouping variable is, or even how many there will be.
I've hacked together something using groups to get the grouping variables and then accessing the value with .[1,g], where g is a character version of the grouping variable names, as below.
Although I'm new to dplyr, this feels like the wrong way to go about this, that is, it's not really a dplyr native way of doing it. It works in the little testing I've done but I'm worried it will fail in some odd circumstance I haven't foreseen. How would you all do it? Is there a more dplyr-ish way of doing it?
On the odd chance that what I've done is actually a good idea, I've posted it as answer for you all to vote on as appropriate.
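A data.table option: inside a grouped call, the special symbol .BY is a named list of the current group's values (d is a data frame with columns A, B and y):
library(data.table)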
setDT(d) # or create directly as data.table
par(mfrow = c(2, 3))
d[, plot(y, main = paste(names(.BY), .BY, sep = "=", collapse = ", ")), by = .(A, B)]
This is what I've hacked together. As described in the question, it uses groups to get the grouping variables and then accesses the value with .[1,g], where g is a character vector of the grouping variable names, as below.
Instead of making a plot, it just makes a data frame with the title as a variable.
library(dplyr)
d <- as_tibble(expand.grid(A=1:3,B=1:2,y=1:2)) # as.tbl() is deprecated in current dplyr
d1 <- d %>% group_by(A)
g <- unlist(lapply(groups(d1), paste))
d1 %>% do(data.frame(title=paste(paste(g, "=", .[1,g]), collapse=", "), stringsAsFactors=FALSE))
## Source: local data frame [3 x 2]
## Groups: A [3]
##
## A title
## <int> <chr>
## 1 1 A = 1
## 2 2 A = 2
## 3 3 A = 3
d1 <- d %>% group_by(A, B)
g <- unlist(lapply(groups(d1), paste))
d1 %>% do(data.frame(title=paste(paste(g, "=", .[1,g]), collapse=", "), stringsAsFactors=FALSE))
## Source: local data frame [6 x 3]
## Groups: A, B [6]
##
## A B title
## <int> <int> <chr>
## 1 1 1 A = 1, B = 1
## 2 1 2 A = 1, B = 2
## 3 2 1 A = 2, B = 1
## 4 2 2 A = 2, B = 2
## 5 3 1 A = 3, B = 1
## 6 3 2 A = 3, B = 2
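Newer dplyr versions (0.8.0 and later) provide group_map(), which passes each group's rows and a one-row tibble of its key values to a function, so the grouping variables never need to be named. The sketch below assumes such a dplyr version; `make_title` is a hypothetical helper introduced for illustration, and it builds titles rather than plots.

```r
library(dplyr)

d <- as_tibble(expand.grid(A = 1:3, B = 1:2, y = 1:2))
d1 <- group_by(d, A, B)

# Hypothetical helper: turn a one-row tibble of keys into "A = 1, B = 1"
make_title <- function(keys) {
  paste(names(keys), "=", unlist(keys), collapse = ", ")
}

# group_map() calls the function once per group with (rows, keys)
titles <- group_map(d1, function(rows, keys) make_title(keys))
unlist(titles)
```

Inside the function, a call such as plot(rows$y, main = make_title(keys)) could replace the title-only body. group_vars(d1) and group_keys(d1) give the grouping variable names and the table of key values directly.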
