as.numeric on subset of dataframe [duplicate] - r

This question already has answers here:
Change the class from factor to numeric of many columns in a data frame
(16 answers)
Closed 6 years ago.
How can I convert columns 1 and 2 to numeric? I tried the following, which seemed obvious to me, but it did not work.
Thanks
df <- data.frame(v1 = c(1,2,'x',4,5,6), v2 = c(1,2,'x',4,5,6), v3 = c(1,2,3,4,5,6), stringsAsFactors = FALSE)
as.numeric(df[,1:2])

We can use lapply to loop over the columns of interest and convert to numeric
df[1:2] <- lapply(df[1:2], as.numeric)
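A minimal sketch of why the original attempt fails and the lapply version works, using the data frame from the question. The variable name `failed` is only for illustration.

```r
df <- data.frame(v1 = c(1, 2, "x", 4, 5, 6),
                 v2 = c(1, 2, "x", 4, 5, 6),
                 v3 = c(1, 2, 3, 4, 5, 6),
                 stringsAsFactors = FALSE)

# df[, 1:2] is a data frame (a list of columns), so as.numeric() cannot
# coerce it directly and raises an error
failed <- inherits(try(as.numeric(df[, 1:2]), silent = TRUE), "try-error")

# Converting column by column works; "x" becomes NA with a coercion warning,
# which is suppressed here to keep the output quiet
df[1:2] <- suppressWarnings(lapply(df[1:2], as.numeric))

sapply(df, class)
```

Assigning into `df[1:2]` keeps the result a data frame; assigning the list to `df` itself would replace the data frame with a plain list.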

I suggest this approach, using unlist. I particularly like a one-line approach when possible. For more options and references, I strongly suggest this beautiful post.
df <- data.frame(v1 = c(1,2,'x',4,5,6), v2 = c(1,2,'x',4,5,6),
v3 = c(1,2,3,4,5,6), stringsAsFactors = FALSE)
df[, c("v1","v2")] <- as.numeric(as.character(unlist(df[, c("v1","v2")])))
Warning message:
NAs introduced by coercion
str(df)
'data.frame': 6 obs. of 3 variables:
$ v1: num 1 2 NA 4 5 6
$ v2: num 1 2 NA 4 5 6
$ v3: num 1 2 3 4 5 6
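The as.character() step in this answer matters when the columns are factors, as in the linked duplicate: as.numeric() applied to a factor returns the internal level codes rather than the printed values. A small illustration with a made-up factor `f`:

```r
f <- factor(c("10", "20", "30"))

# Level codes: the position of each value among the sorted levels
codes <- as.numeric(f)                  # 1 2 3

# Going through character first recovers the printed values
values <- as.numeric(as.character(f))   # 10 20 30
```

For columns that are already character, as in this question, the extra as.character() is harmless but not required.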

You can use matrix indexing without any hidden loops:
df[, 1:2] <- as.numeric(as.matrix(df[, 1:2]))

Related

Reverse the values of a list of variables [duplicate]

This question already has answers here:
Change row order in a matrix/dataframe
(7 answers)
Closed 2 years ago.
I have a df in which around 50 variables have character values ranging from 1 to 4:
var
1
2
3
4
How can I "bulk" change the values reversing them such that I get:
var
4
3
2
1
So 4 becomes 1, 3 becomes 2, and so on. It is like applying the formula var = 5 - value to each variable, but for character values.
As mentioned, this has to work for a long list of variables (~50).
You can try:
library(dplyr)
df %>% mutate(across(var1:var50, ~5 - as.numeric(.)))
OR in base R :
cols <- paste0('var', 1:50)
df[cols] <- lapply(df[cols], function(x) 5 - as.numeric(x))
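As a small sketch of the base R version on made-up data (two columns instead of 50). The result is wrapped in as.character() in case the columns should stay character, which is an assumption about what the asker wants; drop it to get numeric columns.

```r
df <- data.frame(var1 = c("1", "2", "3", "4"),
                 var2 = c("4", "4", "1", "2"),
                 stringsAsFactors = FALSE)

cols <- paste0("var", 1:2)

# Convert to numeric, reverse the 1..4 scale, then convert back to character
df[cols] <- lapply(df[cols], function(x) as.character(5 - as.numeric(x)))
df
```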
If you're just subtracting the data.frame from a value, as you indicate in your example, you should be able to just do this:
df[] <- 5 - data.matrix(df)
Here's an example:
df <- data.frame(var1 = as.character(c(1, 2, 3, 4)),
var2 = as.character(c(10, 20, 30, 40)),
stringsAsFactors = FALSE)
df[] <- 5 - data.matrix(df)
str(df)
# 'data.frame': 4 obs. of 2 variables:
# $ var1: num 4 3 2 1
# $ var2: num -5 -15 -25 -35
If you're just reversing the row order, then something like this should work:
df[nrow(df):1, ]
# var1 var2
# 4 4 40
# 3 3 30
# 2 2 20
# 1 1 10
You can also use dplyr's mutate_at() or mutate_all(); in current dplyr these are superseded by across(), as used in the first answer.

Create a character variable with data.frame function [duplicate]

This question already has an answer here:
Data Frame Initialization - Character Initialization read as Factors?
(1 answer)
Closed 5 years ago.
Using the data.frame function in R, I am creating an example dataset. However, the vectors with strings are converted to a factor column.
How can I make vectors with strings (e.g. var1) become character column in my data set?
Current Code
df = data.frame(var1 = c("1","2","3","4"),
var2 = c(1,2,3,4))
Resulting Output
As shown below, var1 is a factor. I need var1 to have the chr class.
> str(df)
'data.frame': 4 obs. of 2 variables:
$ var1 : Factor w/ 4 levels "1","2","3","4": 1 2 3 4
$ var2 : num 1 2 3 4
Trouble-shooting
Based on this post, I tried adding as.character, but var1 remains a factor.
df = data.frame(var1 = as.character(c("1","2","3","4")),
var2 = c(1,2,3,4))
stringsAsFactors is your friend. Namely:
df = data.frame(var1 = c("1","2","3","4"), var2 = c(1,2,3,4), stringsAsFactors = FALSE)
yielding:
> str(df)
'data.frame': 4 obs. of 2 variables:
$ var1: chr "1" "2" "3" "4"
$ var2: num 1 2 3 4
Based on the comments, adding the argument stringsAsFactors=FALSE will create character variables instead of factor variables.
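A short sketch contrasting the two settings explicitly. Since R 4.0.0 the default is stringsAsFactors = FALSE, so the factor conversion described in the question only happens on older versions or when the argument is set to TRUE. The names `df_chr` and `df_fct` are just for illustration.

```r
# Character vectors are kept as character
df_chr <- data.frame(var1 = c("1", "2", "3", "4"),
                     var2 = c(1, 2, 3, 4),
                     stringsAsFactors = FALSE)

# Character vectors are converted to factors inside data.frame()
df_fct <- data.frame(var1 = c("1", "2", "3", "4"),
                     var2 = c(1, 2, 3, 4),
                     stringsAsFactors = TRUE)

sapply(df_chr, class)
sapply(df_fct, class)
```

This also shows why wrapping the vector in as.character() did not help: the input was already character, and the conversion to factor happens inside data.frame().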

lapply(x, as.factor) returning just one level [duplicate]

This question already has answers here:
Coerce multiple columns to factors at once
(11 answers)
Closed 6 years ago.
let's say I have the following dataframe
a <- as.integer(runif(20, 1, 30))
b <- as.integer(runif(20, 10, 30))
df <- data.frame(Sender = a, Receiver = b)
df
I want to transform both columns into factor:
var <- c("Sender", "Receiver")
df[var] <- lapply(var, factor)
str(df)
But it turns out that there is just one level in each column instead of as many as unique numbers in my example
'data.frame': 20 obs. of 2 variables:
$ Sender : Factor w/ 1 level "Sender": 1 1 1 1 1 1 1 1 1 1 ...
$ Receiver: Factor w/ 1 level "Receiver": 1 1 1 1 1 1 1 1 1 1 ...
Of course it works if I do it separately:
df$Sender <- as.factor(df$Sender)
df$Receiver <- as.factor(df$Receiver)
Can someone explain why?
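You're not actually using your original data here, only the labels: lapply(var, factor) iterates over the column names stored in var, not over the columns themselves.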
Try it like this:
df <- as.data.frame(lapply(df, factor))
You want
df[] <- lapply(df, factor)
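To see the difference concretely, here is a sketch with small fixed values in place of the random ones. `var` follows the question; `wrong` and the data values are made up for illustration.

```r
df <- data.frame(Sender = c(3, 7, 3, 12), Receiver = c(15, 15, 22, 28))
var <- c("Sender", "Receiver")

# lapply() over the names: each factor is built from the string itself
wrong <- lapply(var, factor)
levels(wrong[[1]])

# lapply() over the columns selected by those names
df[var] <- lapply(df[var], factor)
str(df)
```

Using `df[var]` rather than `df[]` on both sides converts only the listed columns, which is useful when the data frame has other columns that should keep their type.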

R assign variable types to large data.frame from vector

I have a wide data.frame that is all character vectors (df1). I have a separate vector (vec1) that contains the column classes I'd like to assign to each of the columns in df1.
If I was using read.csv(), I'd use the colClasses argument and set it equal to vec1, but there doesn't appear to be a similar option for an existing data.frame.
Any suggestions for a fast way to do this besides a loop?
I don't know whether this will help, but I have run into the same need many times and wrote a function for it:
reclass <- function(df, vec){
df[] <- Map(function(x, f){
#switch below shows the accepted values in the vector
#you can modify it and/or add more
f <- switch(f,
as.is = 'force',
factor = 'as.factor',
num = 'as.numeric',
char = 'as.character')
#takes the name of the function and fetches the function
f <- get(f)
#apply the function
f(x)
},
df,
vec)
df
}
It uses Map to pair each column of the data.frame with the corresponding element of the class vector. The vector must therefore have exactly one element per column of the data frame.
I am using switch as well to make the corresponding classes shorter to type. Use as.is to keep the class the same, the rest are self explanatory I think.
Small example:
df1 <- data.frame(1:10, letters[1:10], runif(50))
> str(df1)
'data.frame': 50 obs. of 3 variables:
$ X1.10 : int 1 2 3 4 5 6 7 8 9 10 ...
$ letters.1.10.: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
$ runif.50. : num 0.0969 0.1957 0.8283 0.1768 0.9821 ...
And after the function:
df1 <- reclass(df1, c('num','as.is','char'))
> str(df1)
'data.frame': 50 obs. of 3 variables:
$ X1.10 : num [1:50] 1 2 3 4 5 6 7 8 9 10 ...
$ letters.1.10.: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
$ runif.50. : chr [1:50] "0.0968757788650692" "0.19566105119884" "0.828283685725182" "0.176784737734124" ...
I guess Map internally is a loop but it is written in C so it should be fast enough.
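A possible variation on the same idea, sketched under the assumption that an explicit length check is wanted. It looks the converter up in a named list instead of using switch() and get(); the name `reclass_checked` and the example data are hypothetical.

```r
reclass_checked <- function(df, vec) {
  # Accepted short names and the function each one maps to
  converters <- list(as.is  = identity,
                     factor = as.factor,
                     num    = as.numeric,
                     char   = as.character)

  # Fail early if the vector does not match the columns or uses unknown names
  stopifnot(length(vec) == ncol(df), all(vec %in% names(converters)))

  # Apply the matching converter to each column, keeping the data frame shape
  df[] <- Map(function(x, f) converters[[f]](x), df, vec)
  df
}

df1 <- data.frame(a = c("1", "2", "3"),
                  b = c("x", "y", "x"),
                  c = c("0.5", "1.5", "2.5"),
                  stringsAsFactors = FALSE)

df1 <- reclass_checked(df1, c("num", "factor", "as.is"))
str(df1)
```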
Maybe you could try this function, which does the same job.
reclass <- function (df, vec_types) {
  # seq_len() also behaves correctly when df has zero columns
  for (i in seq_len(ncol(df))) {
    type <- vec_types[i]
    class(df[ , i]) <- type
  }
  return(df)
}
and this is an example of vec_types (vector of types):
vec_types <- c('character', rep('integer', 3), rep('character', 2))
You can test the function (reclass) with this table:
table <- data.frame(matrix(sample(1:10, 30, replace = TRUE), nrow = 5, ncol = 6))
str(table) # original column types
# apply the function
table <- reclass(table, vec_types)
str(table) # new column types

assign data.frame as a component in a data.frame in R

This does not work
> dfi=data.frame(v1=c(1,1),v2=c(2,2))
> dfi
v1 v2
1 1 2
2 1 2
> df$df=dfi
Error in `$<-.data.frame`(`*tmp*`, "df", value = list(v1 = c(1, 1), v2 = c(2, :
replacement has 2 rows, data has 0
df$df=I(dfi) has the same error. Please help.
Thank you.
Moved this from comments for formatting reasons:
What exactly are you trying to achieve? If you want the contents of dfi passed to df you can use this code:
df <- data.frame(matrix(vector(), 0, 2, dimnames=list(c(), c("V1", "V2"))), stringsAsFactors=FALSE)
df=dfi
As @joran says, it is unclear why you would ever want to do this. Nevertheless, it is possible.
One of the requirements of a data frame is that all the columns have the same number of rows. This is why you are getting the error. Something like this will work:
dfi <- data.frame(v1=c(1,1),v2=c(2,2)) # 2 rows
df <- data.frame(x=1:2) # also 2 rows
df$df <- dfi # works now
Printing would lead you to believe that df has three columns...
df
# x df.v1 df.v2
# 1 1 1 2
# 2 2 1 2
but it does not!
str(df)
# 'data.frame': 2 obs. of 2 variables:
# $ x : int 1 2
# $ df:'data.frame': 2 obs. of 2 variables:
# ..$ v1: num 1 1
# ..$ v2: num 2 2
Since df$df is a data frame
class(df$df)
# [1] "data.frame"
you can use the standard data frame accessors...
df$df$v1
# [1] 1 1
df$df[1,]
# v1 v2
# 1 1 2
Incidentally, RStudio has trouble displaying this type of data structure; View(df) gives an inaccurate display of the structure.
Finally, you are probably better off creating a list of data frames, rather than a data frame containing data frames:
df <- data.frame(grp=rep(LETTERS[1:3],each=5),x=rnorm(15),y=rpois(15,5))
df.lst <- split(df,df$grp) # creates a list of data frames
df.lst$A
# grp x y
# 1 A -1.3606420 10
# 2 A -0.4511408 5
# 3 A -1.1951950 4
# 4 A -0.8017765 5
# 5 A -0.2816298 9
df.lst$A$x
# [1] -1.3606420 -0.4511408 -1.1951950 -0.8017765 -0.2816298
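Once the data is a list of data frames, per-group work is a matter of applying a function over the list. A brief sketch, assuming the same column layout as above; the seed and the names `mean_y` and `df.back` are made up for illustration.

```r
set.seed(42)  # arbitrary seed so the example is reproducible
df <- data.frame(grp = rep(LETTERS[1:3], each = 5),
                 x = rnorm(15),
                 y = rpois(15, 5))

df.lst <- split(df, df$grp)   # list of three data frames: A, B, C

# One summary value per group
mean_y <- sapply(df.lst, function(d) mean(d$y))

# Recombine into a single data frame when needed
df.back <- do.call(rbind, df.lst)
```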
