I am trying to add the data from one data frame into a much larger data frame. In my case, "df" is a data frame whose rows have 376 columns, and I want to put these 376 items into "data", whose rows will have 10000 columns. The reason is that although the data in "df" is capped at 376 items per row, each row is not complete, and I am going to concatenate rows together once they are in the larger "data" data frame. The issue with the code below is that when I transfer a row of "df" into a row of "data", I get numbers instead of the text names that should be there (I'm assuming the numbers are the location of the item in memory). How do I fix my code so that the actual item names are transferred to "data"?
df<-read.table("msigdb.v5.2.symbols.txt", fill = TRUE)
data<-data.frame(matrix(NA,ncol = 10000, nrow = 19337))
for (w in 1:20){
data[w,]<-df[w,]
}
data[1,1] should be "AAANWWTGC_UNKNOWN", but instead I am just getting 5.
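This happens because you are copying a row from a data.frame whose columns are factors into a data.frame whose columns are logical (all NA). A factor is stored as integer codes plus a set of level labels, and when it is assigned into a logical column only the integer codes survive, so you get 5 instead of the label. The numbers are the factor's level codes, not memory locations.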
Look at the two examples below:
1) matrix to data.frame
data<-data.frame(matrix(NA,ncol = 3, nrow = 3))
df<-(matrix(data=rep(c("pos1","pos2",3),3),nrow=3,byrow=FALSE))
str(df)
chr [1:3, 1:3] "pos1" "pos2" "3" "pos1" "pos2" "3" "pos1" "pos2" "3"
str(data)
'data.frame': 3 obs. of 3 variables:
$ X1: logi NA NA NA
$ X2: logi NA NA NA
$ X3: logi NA NA NA
data[1,]<-df[1,]
str(data)
'data.frame': 3 obs. of 3 variables:
$ X1: chr "pos1" NA NA
$ X2: chr "pos1" NA NA
$ X3: chr "pos1" NA NA
2) data.frame to data.frame
data<-data.frame(matrix(NA,ncol = 3, nrow = 3))
df<-data.frame(matrix(data=rep(c("pos1","pos2",3),3),nrow=3,byrow=FALSE), stringsAsFactors = TRUE)  # factor columns, as read.table() created by default before R 4.0.0
str(df)
'data.frame': 3 obs. of 3 variables:
$ X1: Factor w/ 3 levels "3","pos1","pos2": 2 3 1
$ X2: Factor w/ 3 levels "3","pos1","pos2": 2 3 1
$ X3: Factor w/ 3 levels "3","pos1","pos2": 2 3 1
str(data)
'data.frame': 3 obs. of 3 variables:
$ X1: logi NA NA NA
$ X2: logi NA NA NA
$ X3: logi NA NA NA
data[1,]<-df[1,]
str(data)
'data.frame': 3 obs. of 3 variables:
$ X1: int 2 NA NA
$ X2: int 2 NA NA
$ X3: int 2 NA NA
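The examples above explain the cause. A minimal sketch of one possible fix is to turn the factor columns into character before copying (reading the file with read.table(..., stringsAsFactors = FALSE) would avoid the factors in the first place). The small df below is a made-up stand-in for the real file, and writing into only the first ncol(df) columns of the wider frame is an assumption about where the 376 items should go.

```r
# Made-up stand-in for the real file: factor columns, as read.table() made by default before R 4.0.0
df <- data.frame(matrix(rep(c("pos1", "pos2", "3"), 3), nrow = 3),
                 stringsAsFactors = TRUE)
# Wider target frame filled with logical NA (like the 19337 x 10000 frame)
data <- data.frame(matrix(NA, ncol = 5, nrow = 3))

# Replace every factor column with its labels
df[] <- lapply(df, as.character)

# Copy each row into the first ncol(df) columns of the wider frame
for (w in seq_len(nrow(df))) {
  data[w, seq_len(ncol(df))] <- df[w, ]
}
str(data)
```

Indexing the target columns explicitly also avoids any mismatch between a 376-column source row and a 10000-column target row.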
I would like to loop through columns in a data set and use the name of the column to aggregate the data set. However, I am getting an error when I try to feed through the column name into the aggregate function:
"Error in model.frame.default(formula = cbind(SurveyID) ~ Panel + Category + :
variable lengths differ (found for 'i')"
Once I can store this in a temp file, I will add the temp file to a permanent dataset; however, I can't get past this part. Any help would be much appreciated!
#example of my data:
df <- data.frame("SurveyID" = c('A','B','C','D'), "Panel" = c('E','E','S','S'), "Category" = c(1,1,2,3),
                 "ENG" = c(3,3,1,2), "PAR" = c(3,1,1,2), "REL" = c(3,1,1,2), "CLC" = c(3,1,1,2))
#for loop to get column name to include as part of the aggregate function
for (i in colnames(df[4:7])) {
print (i)
temp <- data.frame(setNames(aggregate(cbind(SurveyID) ~ Panel + Category + i, data = df, FUN = length), c("Panel","GENDER", "Favlev", "Cnt")))
}
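You are making one newbie mistake and one more sophisticated mistake:
Newbie mistake: not indexing successive results when assigning, so each pass through the loop overwrites temp with the new values.
Not-so-newbie mistake: improper construction of the formula object. Inside a formula, i is read literally as a variable named i, not as the column name stored in i. Since df has no column called i, R picks up the loop variable (a character string of length 1), hence "variable lengths differ (found for 'i')". Build the formula from a string with as.formula instead: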
temp=list() # need empty list with a name
for (i in colnames(df[4:7])) {
print (i); form <- as.formula( paste( "SurveyID ~ Panel + Category +", i) )
temp[[i]] <- data.frame(setNames(aggregate(form, data = df, FUN = length), c("Panel","GENDER", "Favlev", "Cnt")))
}
#Output
[1] "ENG"
[1] "PAR"
[1] "REL"
[1] "CLC"
str(temp)
#----------------
List of 4
$ ENG:'data.frame': 3 obs. of 4 variables:
..$ Panel : Factor w/ 2 levels "E","S": 2 2 1
..$ GENDER: num [1:3] 2 3 1
..$ Favlev: num [1:3] 1 2 3
..$ Cnt : int [1:3] 1 1 2
$ PAR:'data.frame': 4 obs. of 4 variables:
..$ Panel : Factor w/ 2 levels "E","S": 1 2 2 1
..$ GENDER: num [1:4] 1 2 3 1
..$ Favlev: num [1:4] 1 1 2 3
..$ Cnt : int [1:4] 1 1 1 1
$ REL:'data.frame': 4 obs. of 4 variables:
..$ Panel : Factor w/ 2 levels "E","S": 1 2 2 1
..$ GENDER: num [1:4] 1 2 3 1
..$ Favlev: num [1:4] 1 1 2 3
..$ Cnt : int [1:4] 1 1 1 1
$ CLC:'data.frame': 4 obs. of 4 variables:
..$ Panel : Factor w/ 2 levels "E","S": 1 2 2 1
..$ GENDER: num [1:4] 1 2 3 1
..$ Favlev: num [1:4] 1 1 2 3
..$ Cnt : int [1:4] 1 1 1 1
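Building on the answer above, here is a sketch that uses base R's reformulate() to build each formula instead of pasting strings, then stacks the per-column results into one data frame, which may be closer to the "add to a permanent dataset" step in the question. The Variable column and the combined object are illustrative names, not part of the original code.

```r
df <- data.frame(SurveyID = c("A", "B", "C", "D"),
                 Panel    = c("E", "E", "S", "S"),
                 Category = c(1, 1, 2, 3),
                 ENG = c(3, 3, 1, 2), PAR = c(3, 1, 1, 2),
                 REL = c(3, 1, 1, 2), CLC = c(3, 1, 1, 2))

vars <- colnames(df)[4:7]

# One aggregate() per column name; reformulate() builds SurveyID ~ Panel + Category + <v>
temp <- lapply(vars, function(v) {
  form <- reformulate(c("Panel", "Category", v), response = "SurveyID")
  res <- aggregate(form, data = df, FUN = length)
  names(res) <- c("Panel", "GENDER", "Favlev", "Cnt")
  res$Variable <- v  # illustrative: record which column produced these rows
  res
})
names(temp) <- vars

# Stack the per-column results into a single long data frame
combined <- do.call(rbind, temp)
str(combined)
```

Adding the Variable column before stacking keeps track of which column each count came from once the results are combined.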
I think this is a simple question but I haven't found a suitable solution. To begin with a set of simplified data :
df <- as.data.frame(matrix(1:20, 5, 4))
str(df)
# 'data.frame': 5 obs. of 4 variables:
# $ V1: int 1 2 3 4 5
# $ V2: int 6 7 8 9 10
# $ V3: int 11 12 13 14 15
# $ V4: int 16 17 18 19 20
We can see that all the columns are integers. What I want to achieve is converting the 4 columns to integer, numeric, character, and factor respectively. Of course, I can use
df$V1 <- as.XXX(df$V1)
for each column, but I think it's inefficient.
Expected Output
# 'data.frame': 5 obs. of 4 variables:
# $ V1: int 1 2 3 4 5
# $ V2: num 6 7 8 9 10
# $ V3: chr "11" "12" "13" "14" ...
# $ V4: Factor w/ 5 levels "16","17","18",..: 1 2 3 4 5
Question 2
I referenced @joran's answer in "R Assign (or copy) column classes from a data frame to another" and ran the following code:
myclass <- c("integer", "numeric", "character", "factor")
df.2 <- df
df.2[] <- mapply(FUN = as, df.2, myclass, SIMPLIFY = F)
When I print df.2, an error appears:
Error in as.character.factor(x) : malformed factor
However, str(df.2) works, and it shows that only V1 and V3 were converted as requested.
str(df.2)
# 'data.frame': 5 obs. of 4 variables:
# $ V1: int 1 2 3 4 5
# $ V2: int 6 7 8 9 10
# $ V3: chr "11" "12" "13" "14" ...
# $ V4:Formal class 'factor' [package "methods"] with 3 slots
# .. ..@ .Data   : int 16 17 18 19 20
# .. ..@ levels  : chr
# .. ..@ .S3Class: chr "factor"
Why can't the as function deal with the classes numeric and factor?
We can use mapply and provide the functions as a list to convert the columns.
df <- as.data.frame(matrix(1:20, 5, 4))
df[] <- mapply(function(x, FUN) FUN(x),
df,
list(as.integer, as.numeric, as.character, as.factor),
SIMPLIFY = FALSE)
str(df)
# 'data.frame': 5 obs. of 4 variables:
# $ V1: int 1 2 3 4 5
# $ V2: num 6 7 8 9 10
# $ V3: chr "11" "12" "13" "14" ...
# $ V4: Factor w/ 5 levels "16","17","18",..: 1 2 3 4 5
If you don't rule out a for loop, try this:
df <- as.data.frame(matrix(1:20, 5, 4))
type <- c("integer", "numeric", "character", "factor")
for(i in 1:ncol(df)){
call <- paste("as", type[i], sep = ".")
df[[i]] <- do.call(call, list(df[[i]]))
}
str(df)
# 'data.frame': 5 obs. of 4 variables:
# $ V1: int 1 2 3 4 5
# $ V2: num 6 7 8 9 10
# $ V3: chr "11" "12" "13" "14" ...
# $ V4: Factor w/ 5 levels "16","17","18",..: 1 2 3 4 5
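A variant, sketched under the assumption that you would rather key each conversion by column name than by position: a named list maps columns to functions, and Map() applies each function to its column. converters is a hypothetical name, and only the listed columns are touched.

```r
df <- as.data.frame(matrix(1:20, 5, 4))

# Hypothetical mapping: column name -> conversion function
converters <- list(V1 = as.integer, V2 = as.numeric,
                   V3 = as.character, V4 = as.factor)

# Map() pairs each function with the column of the same position in df[names(converters)]
df[names(converters)] <- Map(function(f, x) f(x), converters, df[names(converters)])
str(df)
```

Because the mapping is by name, reordering the columns of df does not change which function is applied to which column.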
I have a list of data frames with the following structure:
str(mylist)
List of 2
$ L1 :'data.frame': 12471 obs. of 3 variables:
..$ colA: Date[1:12471], format: "2006-10-10" "2010-06-21" ...
..$ colB: int [1:12471] 62 42 55 12 78 ...
..$ colC: Factor w/ 3 levels "type1","type2","type3",..: 1 2 3 2 2 ...
I would like to replace type1 or type2 with a new factor level, type4.
I have tried:
mylist <- lapply(mylist, transform, colC =
replace(colC, colC == 'type1','type4'))
Warning messages:
1: In `[<-.factor`(`*tmp*`, list, value = "type4") :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, list, value = "type4") :
invalid factor level, NA generated
I do not want to read in my initial data with stringsAsFactors = FALSE, but I have tried adding type4 as a level in my initial dataset (before splitting it into a list of data frames) using:
levels(mydf$colC) <- c(levels(mydf$colC), "type4")
but I still get the same warning when trying to replace.
How do I tell replace that type4 is to be treated as a factor level?
You can re-create the factor with the levels argument so that it includes the new level, for example:
status <- factor(status, ordered = TRUE, levels = c("1", "3", "2",...))
Here, the vector c("1", "3", "2",...) stands for your full set of levels, which must include type4.
As you state, the crucial thing is to add the new factor level.
## Test data:
mydf <- data.frame(colC = factor(c("type1", "type2", "type3", "type2", "type2")))
mylist <- list(mydf, mydf)
Your data has three factor levels:
> str(mylist)
List of 2
$ :'data.frame': 5 obs. of 1 variable:
..$ colC: Factor w/ 3 levels "type1","type2",..: 1 2 3 2 2
$ :'data.frame': 5 obs. of 1 variable:
..$ colC: Factor w/ 3 levels "type1","type2",..: 1 2 3 2 2
Now add the fourth factor level; after that, your replace command works:
## Change levels:
for (ii in seq(along = mylist)) levels(mylist[[ii]]$colC) <-
c(levels(mylist[[ii]]$colC), "type4")
## Replace level:
mylist <- lapply(mylist, transform, colC = replace(colC,
colC == 'type1','type4'))
The new data has four factor levels:
> str(mylist)
List of 2
$ :'data.frame': 5 obs. of 1 variable:
..$ colC: Factor w/ 4 levels "type1","type2",..: 4 2 3 2 2
$ :'data.frame': 5 obs. of 1 variable:
..$ colC: Factor w/ 4 levels "type1","type2",..: 4 2 3 2 2
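The question also asks about replacing type2, while the answer above handles only type1. One possible sketch relabels both levels directly: when levels<- is given duplicate labels, R merges those levels into one, so no replace() call is needed. merge_levels is a hypothetical helper name, and the column name colC is taken from the test data above.

```r
mydf <- data.frame(colC = factor(c("type1", "type2", "type3", "type2", "type2")))
mylist <- list(mydf, mydf)

# Hypothetical helper: rename type1 and type2 to type4; duplicate labels are merged
merge_levels <- function(d) {
  lv <- levels(d$colC)
  lv[lv %in% c("type1", "type2")] <- "type4"
  levels(d$colC) <- lv
  d
}
mylist <- lapply(mylist, merge_levels)
str(mylist)
```

This relabels existing levels rather than adding a new one, so there is no "invalid factor level" warning and no unused type1/type2 levels are left behind.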
Here is a list of lists, x, generated as follows:
list1 <- list(NULL, as.integer(0))
list2 <- list(NULL, as.integer(1))
list3 <- list(1:5, 0:4)
x <- list(a=list1, b=list2, c=list3)
x has the following structure:
str(x)
List of 3
$ a:List of 2
..$ : NULL
..$ : int 0
$ b:List of 2
..$ : NULL
..$ : int 1
$ c:List of 2
..$ : int [1:5] 1 2 3 4 5
..$ : int [1:5] 0 1 2 3 4
I'm trying to coerce it to a data frame. I first used
xc <- data.frame(lapply(x, as.numeric))
and got the following error:
Error in lapply(x, as.numeric) :
(list) object cannot be coerced to type 'double'
It only works with as.character as the argument.
My goal is to reach the dataframe with the following structure:
str(xc)
'data.frame': 2 obs. of 3 variables:
$ a: int NA 0 ...
$ b: int NA 1 ...
$ c: int [1:5] 1 2 3 4 5 int [1:5] 0 1 2 3 4
I think the columns of the resulting data frame must be lists (this is the type that can handle multiple vectors and NULL values).
Using the data.table or dplyr package is probably the easiest way.
You can then convert the result back to a base data.frame with as.data.frame:
library(data.table)
xc <- as.data.table(x)
or
library(dplyr)
xc <- as_data_frame(x)  # newer tibble/dplyr versions deprecate this in favour of as_tibble(x)
After converting to a base data.frame, the result is the same either way:
as.data.frame(xc)
#> a b c
#> 1 NULL NULL 1, 2, 3, 4, 5
#> 2 0 1 0, 1, 2, 3, 4
The columns are lists:
str(as.data.frame(xc))
#> 'data.frame': 2 obs. of 3 variables:
#> $ a:List of 2
#> ..$ : NULL
#> ..$ : int 0
#> $ b:List of 2
#> ..$ : NULL
#> ..$ : int 1
#> $ c:List of 2
#> ..$ : int 1 2 3 4 5
#> ..$ : int 0 1 2 3 4
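If you would rather avoid extra packages, a base R sketch builds the same list-columns by starting from an empty data frame with the right number of rows and assigning each list with [[<-. It assumes every element of x has the same length (2 here).

```r
list1 <- list(NULL, as.integer(0))
list2 <- list(NULL, as.integer(1))
list3 <- list(1:5, 0:4)
x <- list(a = list1, b = list2, c = list3)

# Empty data frame with one row per element of each inner list (2 rows here)
xc <- data.frame(row.names = seq_along(x[[1]]))

# Assigning a length-2 list with [[<- stores it as a list-column
for (nm in names(x)) xc[[nm]] <- x[[nm]]
str(xc)
```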
I want to convert variables into factors using apply():
a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a","b"), 100, replace = TRUE),
                x3 = factor(c(rep("a",50), rep("b",50))),
                stringsAsFactors = TRUE)  # keeps x2 a factor, as was the default before R 4.0.0
a2 <- apply(a, 2,as.factor)
apply(a2, 2,class)
results in:
x1 x2 x3
"character" "character" "character"
I don't understand why this results in character vectors instead of factor vectors.
apply converts your data.frame to a character matrix. Use lapply:
lapply(a, class)
# $x1
# [1] "numeric"
# $x2
# [1] "factor"
# $x3
# [1] "factor"
In the second command, apply again turns the result into a character matrix. With lapply, the columns stay factors:
a2 <- lapply(a, as.factor)
lapply(a2, class)
# $x1
# [1] "factor"
# $x2
# [1] "factor"
# $x3
# [1] "factor"
For a quick look, you could use str instead:
str(a)
# 'data.frame': 100 obs. of 3 variables:
# $ x1: num -1.79 -1.091 1.307 1.142 -0.972 ...
# $ x2: Factor w/ 2 levels "a","b": 2 1 1 1 2 1 1 1 1 2 ...
# $ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...
Additional explanation, following up on the comments:
Why does the lapply work while apply doesn't?
The first thing apply does is convert its argument to a matrix, so apply(a, ...) is equivalent to apply(as.matrix(a), ...). As you can see, str(as.matrix(a)) gives you:
chr [1:100, 1:3] " 0.075124364" "-1.608618269" "-1.487629526" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:3] "x1" "x2" "x3"
There are no factors any more, so class returns "character" for all columns.
lapply works on the columns themselves, so it gives you what you want (it does something like class(a$column_name) for each column).
The help page for apply explains why apply and as.factor don't work together:
In all cases the result is coerced by as.vector to one of the basic vector types before the dimensions are set, so that (for example) factor results will be coerced to a character array.
The help page for sapply explains why sapply and as.factor don't work either:
Value (...) An atomic vector or matrix or list of the same length as X (...) If simplification occurs, the output type is determined from the highest type of the return values in the hierarchy NULL < raw < logical < integer < real < complex < character < list < expression, after coercion of pairlists to lists.
So you never get a matrix of factors or a data.frame.
How do you convert the output to a data.frame?
Simply use as.data.frame, as you wrote in a comment:
a2 <- as.data.frame(lapply(a, as.factor))
str(a2)
'data.frame': 100 obs. of 3 variables:
$ x1: Factor w/ 100 levels "-2.49629293159922",..: 60 6 7 63 45 93 56 98 40 61 ...
$ x2: Factor w/ 2 levels "a","b": 1 1 2 2 2 2 2 1 2 2 ...
$ x3: Factor w/ 2 levels "a","b": 1 1 1 1 1 1 1 1 1 1 ...
But if you want to convert only selected character columns to factors, there is a trick:
a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
str(a3)
'data.frame': 26 obs. of 3 variables:
$ x1: chr "a" "b" "c" "d" ...
$ x2: chr "A" "B" "C" "D" ...
$ x3: chr "A" "B" "C" "D" ...
columns_to_change <- c("x1","x2")
a3[, columns_to_change] <- lapply(a3[, columns_to_change], as.factor)
str(a3)
'data.frame': 26 obs. of 3 variables:
$ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
$ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
$ x3: chr "A" "B" "C" "D" ...
You could use it to replace all columns using:
a3 <- data.frame(x1=letters, x2=LETTERS, x3=LETTERS, stringsAsFactors=FALSE)
a3[, ] <- lapply(a3, as.factor)
str(a3)
'data.frame': 26 obs. of 3 variables:
$ x1: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
$ x2: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
$ x3: Factor w/ 26 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
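A sketch of a related variation, assuming you want to convert every character column but leave other types alone: pick the columns by type with vapply() instead of naming them. The x3 = 1:26 column is made up to show that non-character columns are skipped.

```r
a3 <- data.frame(x1 = letters, x2 = LETTERS, x3 = 1:26, stringsAsFactors = FALSE)

# TRUE for each column that currently holds character data
is_chr <- vapply(a3, is.character, logical(1))

# Convert only those columns; the integer column x3 is left unchanged
a3[is_chr] <- lapply(a3[is_chr], as.factor)
str(a3)
```

Using vapply() with logical(1) guarantees a plain logical vector, so the selection cannot silently turn into a list if a column behaves unexpectedly.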