assign data.frame as a component in a data.frame in R - r

This does not work
> dfi=data.frame(v1=c(1,1),v2=c(2,2))
> dfi
v1 v2
1 1 2
2 1 2
> df$df=dfi
Error in `$<-.data.frame`(`*tmp*`, "df", value = list(v1 = c(1, 1), v2 = c(2, :
replacement has 2 rows, data has 0
df$df=I(dfi) has the same error. Please help.
Thank you.

Moved this from comments for formatting reasons:
What exactly are you trying to achieve? If you want the contents of dfi passed to df you can use this code:
df <- data.frame(matrix(vector(), 0, 2, dimnames=list(c(), c("V1", "V2"))), stringsAsFactors=FALSE)
df <- dfi

As @joran says, it is unclear why you would ever want to do this. Nevertheless, it is possible.
One of the requirements of a data frame is that all the columns have the same number of rows. This is why you are getting the error. Something like this will work:
dfi <- data.frame(v1=c(1,1),v2=c(2,2)) # 2 rows
df <- data.frame(x=1:2) # also 2 rows
df$df <- dfi # works now
Printing would lead you to believe that df has three columns...
df
# x df.v1 df.v2
# 1 1 1 2
# 2 2 1 2
but it does not!
str(df)
# 'data.frame': 2 obs. of 2 variables:
# $ x : int 1 2
# $ df:'data.frame': 2 obs. of 2 variables:
# ..$ v1: num 1 1
# ..$ v2: num 2 2
Since df$df is a data frame
class(df$df)
# [1] "data.frame"
you can use the standard data frame accessors...
df$df$v1
# [1] 1 1
df$df[1,]
# v1 v2
# 1 1 2
Incidentally, RStudio has trouble displaying this type of data structure; View(df) gives an inaccurate display of the structure.
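As a minimal sketch, assuming you really do want the nested structure: the error in the question came from assigning into a zero-row df, so one option is to start from an empty data frame that already has the right number of rows. `data.frame(row.names = ...)` creates such a data frame with no columns. The `do.call(data.frame, df)` call at the end is one way (not the only one) to flatten the nested column back into ordinary prefixed columns.

```r
dfi <- data.frame(v1 = c(1, 1), v2 = c(2, 2))

# A data frame with as many rows as dfi but no columns yet
df <- data.frame(row.names = seq_len(nrow(dfi)))
df$df <- dfi            # works: both have 2 rows

ncol(df)                # 1: the nested data frame is a single column
class(df$df)            # "data.frame"

# data.frame() expands a data-frame argument into columns named "<arg>.<col>",
# so this turns the nested column into ordinary columns
flat <- do.call(data.frame, df)
names(flat)             # "df.v1" "df.v2"
```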
Finally, you are probably better off creating a list of data frames, rather than a data frame containing data frames:
df <- data.frame(grp=rep(LETTERS[1:3],each=5),x=rnorm(15),y=rpois(15,5))
df.lst <- split(df,df$grp) # creates a list of data frames
df.lst$A
# grp x y
# 1 A -1.3606420 10
# 2 A -0.4511408 5
# 3 A -1.1951950 4
# 4 A -0.8017765 5
# 5 A -0.2816298 9
df.lst$A$x
# [1] -1.3606420 -0.4511408 -1.1951950 -0.8017765 -0.2816298
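A sketch of a few things that are easy with a list of data frames: `sapply()` applies a function to every group, and `do.call(rbind, ...)` stacks the pieces back into one data frame. The `set.seed()` call is an addition, used only so the random example is reproducible.

```r
set.seed(1)  # only so the random example is reproducible
df <- data.frame(grp = rep(LETTERS[1:3], each = 5),
                 x = rnorm(15), y = rpois(15, 5),
                 stringsAsFactors = FALSE)
df.lst <- split(df, df$grp)                # a named list: A, B, C

sapply(df.lst, nrow)                       # rows per group
sapply(df.lst, function(d) mean(d$x))      # one summary value per group

# Stack the list back into a single data frame
combined <- do.call(rbind, df.lst)
```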

Related

R: replacing <NA> within factor variables as 0

I am working with the R programming language. I have a dataset with both character and numeric variables - I am trying to replace all NA's and empty values in this data with "0". For a continuous variable, the NA/empty value should be replaced with a "numeric 0". For factor variables, the NA/empty value should be replaced with a "factor 0".
In the past, I used to use a standard command for replacing all NA's with 0 (in the below code, "df" represents the data frame containing the data):
df[df == NA] <- 0
I tried the above code on my data, but within the factor variables it did not replace the <NA> values with 0; the <NA>s are still present.
I tried several approaches:
First approach:
df[is.na(df)] <- 0
But this did not work:
Warning message:
In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
invalid factor level, NA generated
Second approach: I tried this on one of the factor variables:
library(car)
df$some_factor_var <- recode(df$some_factor_var, "NA = 0")
But this replaced every value in "some_factor_var" with 0.
Third approach: I tried again on one of the factor variables:
library(forcats)
fct_explicit_na(df$some_factor_var,0)
Error: Can't convert a double vector to a character vector
Can someone please show me how to fix this problem? Is there a way to replace ALL empty/missing/NA values for all variables at once?
Thanks
For factor variables you need to first include the new level (0) in the data if it is not already present.
See this example:
df <- data.frame(a = factor(c(1, NA, 2, 5)), b = 1:4,
c = c('a', 'b', 'c', NA), d = c(1, 2, NA, 1))
#Include 0 in the levels for "a" variable
levels(df$a) <- c(levels(df$a), 0)
#Replace NA to 0
df[is.na(df)] <- 0
df
# a b c d
#1 1 1 a 1
#2 0 2 b 2
#3 2 3 c 0
#4 5 4 0 1
str(df)
#'data.frame': 4 obs. of 4 variables:
# $ a: Factor w/ 4 levels "1","2","5","0": 1 4 2 3
# $ b: int 1 2 3 4
# $ c: chr "a" "b" "c" "0"
# $ d: num 1 2 0 1
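The question also asks how to handle every column at once. One possible sketch, assuming you want "0" in factor and character columns, numeric 0 in numeric columns, and that empty strings ("") should be treated like NA. `replace_na_with_zero()` is a hypothetical helper name, not a function from any package. It adds the "0" level to a factor only when that level is missing, so it also works when some factors already contain it.

```r
# Hypothetical helper: replace NA (and "") in every column with a zero of the
# matching type
replace_na_with_zero <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.factor(x)) {
      # A factor can only hold values that are among its levels
      if (!"0" %in% levels(x)) levels(x) <- c(levels(x), "0")
      x[is.na(x) | x == ""] <- "0"
    } else if (is.character(x)) {
      x[is.na(x) | x == ""] <- "0"
    } else if (is.numeric(x) && anyNA(x)) {
      x[is.na(x)] <- 0
    }
    df[[col]] <- x
  }
  df
}

df <- data.frame(a = factor(c(1, NA, 2, 5)), b = 1:4,
                 c = c("a", "b", "c", NA), d = c(1, 2, NA, 1),
                 stringsAsFactors = FALSE)
df <- replace_na_with_zero(df)
```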
With tidyverse, try:
library(tidyverse)
df <-
tibble(var_numeric = c(1,2,3,NA),
var_factor = as.factor(c(4,5,6,NA)))
df %>%
replace_na(list(var_numeric = 0)) %>%
mutate(var_factor = fct_explicit_na(var_factor, "0"))
# A tibble: 4 x 2
var_numeric var_factor
<dbl> <fct>
1 1 4
2 2 5
3 3 6
4 0 0
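Note that forcats 1.0.0 deprecated `fct_explicit_na()` in favor of `fct_na_value_to_level()`, so newer versions print a deprecation warning for the code above. A sketch of the same idea applied to all columns at once, assuming dplyr >= 1.0.0 (for `across()` and `where()`) and forcats >= 1.0.0:

```r
library(dplyr)
library(tidyr)
library(forcats)

df <- tibble(var_numeric = c(1, 2, 3, NA),
             var_factor = as.factor(c(4, 5, 6, NA)))

out <- df %>%
  mutate(
    # numeric 0 in every numeric column
    across(where(is.numeric), ~ replace_na(.x, 0)),
    # an explicit "0" level in every factor column
    across(where(is.factor), ~ fct_na_value_to_level(.x, level = "0"))
  )
out
```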

Reverse the values of a list of variables [duplicate]

I have a data frame in which around 50 variables have character values ranging from 1 to 4:
var
1
2
3
4
How can I "bulk" change the values reversing them such that I get:
var
4
3
2
1
So 4 becomes 1, 3 becomes 2, etc. It's like applying the formula var = 5 - value to each variable, except that the values are stored as characters.
As mentioned, I need to do this for a long list of variables (~50).
You can try :
library(dplyr)
df %>% mutate(across(var1:var50, ~5 - as.numeric(.)))
OR in base R :
cols <- paste0('var', 1:50)
df[cols] <- lapply(df[cols], function(x) 5 - as.numeric(x))
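If the ~50 columns share a naming pattern, you can also select them with `grep()` instead of writing out `var1:var50`. The sketch below generalizes 5 - x to any lo..hi scale; `reverse_scale()` and the `^var` pattern are assumptions for illustration. Wrap the result in `as.character()` if you need the columns to stay character.

```r
# Hypothetical helper: mirror values on a lo..hi scale, so lo <-> hi
reverse_scale <- function(x, lo = 1, hi = 4) {
  (lo + hi) - as.numeric(x)
}

df <- data.frame(var1 = c("1", "2", "3", "4"),
                 var2 = c("4", "3", "2", "1"),
                 other = c("a", "b", "c", "d"),
                 stringsAsFactors = FALSE)

# Pick the columns to reverse by name pattern; "other" is left alone
cols <- grep("^var", names(df), value = TRUE)
df[cols] <- lapply(df[cols], reverse_scale)
```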
If you're just subtracting every column from a value, as you indicate in your example, you can convert all the columns to numbers and subtract in one step:
df[] <- 5 - sapply(df, as.numeric)
Here's an example:
df <- data.frame(var1 = as.character(c(1, 2, 3, 4)),
var2 = as.character(c(10, 20, 30, 40)),
stringsAsFactors = FALSE)
df[] <- 5 - sapply(df, as.numeric)
str(df)
# 'data.frame': 4 obs. of 2 variables:
# $ var1: num 4 3 2 1
# $ var2: num -5 -15 -25 -35
If you're just reversing the row order, then something like this should work:
df[nrow(df):1, ]
# var1 var2
# 4 4 40
# 3 3 30
# 2 2 20
# 1 1 10
You can also use dplyr's mutate_at() or mutate_all(), although across() (as in the first answer) has superseded them since dplyr 1.0.0.

r - subsetting dataframe creates factors

I have a huge data frame (call it huge) that I would like to split in two by row number. However, the way I'm doing it turns the resulting subsets into large factors instead of data frames.
list1 <- huge[c(1:8175),]
list2 <- huge[c(8176:nrow(huge)),]
class(list1)
[1] "factor"
Can someone explain to me why it is like that, and how do I prevent that?
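It is likely that you subset a one-column data frame. Consider the following example.
# Create an example data frame (stringsAsFactors = TRUE keeps b a factor, as in
# the output below; the default has been FALSE since R 4.0.0)
dt <- data.frame(a = 1:5, b = letters[1:5], stringsAsFactors = TRUE)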
dt
# a b
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d
# 5 5 e
str(dt)
# 'data.frame': 5 obs. of 2 variables:
# $ a: int 1 2 3 4 5
# $ b: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
# Subset the data frame
list1 <- dt[1:2, ]
list2 <- dt[3:nrow(dt), ]
class(list1)
# [1] "data.frame"
The code to subset dt works well. However, if you create a one-column data frame from dt and subset it the same way, the output is automatically simplified to a vector (here, a factor).
# Create a one-column data frame
dt2 <- dt[, 2, drop = FALSE]
# Subset the data frame
list3 <- dt2[1:2, ]
list4 <- dt2[3:nrow(dt2), ]
class(list3)
# [1] "factor"
list3
# [1] a b
# Levels: a b c d e
The solution is to add drop = FALSE when subsetting the data frame, which keeps the output as a data frame.
# Subset the data frame
list5 <- dt2[1:2, , drop = FALSE]
class(list5)
# [1] "data.frame"
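For the original task of splitting a data frame in two by row number, `head()` and `tail()` are an alternative that return data frames even for a single column. A sketch using `dt2` from above (redefined so the block runs on its own); with your data it would presumably be `head(huge, 8175)` and `tail(huge, -8175)`:

```r
dt2 <- data.frame(b = letters[1:5], stringsAsFactors = TRUE)  # one column

n_first <- 2
first <- head(dt2, n_first)   # rows 1 to n_first
rest  <- tail(dt2, -n_first)  # a negative n drops the first n_first rows

class(first)                  # "data.frame"
class(rest)                   # "data.frame"
```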

R: Adding row to a dataframe with multiple classes

I have a seemingly simple task of adding a row to a data frame in R but I just can't do it!
I have a data frame with 50 rows and 100 columns. The data frame, which I would like to keep in the same format, has the first column as a factor and all other columns as characters -- this is what lapply produced. I would simply like to append a 51st row, but I get warnings every time.
My added data is of the form Cat <- c("Cat", 1,NA,3,NA,5). (I have no clue where " or ' need to go - quite new to R!)
rbind shows "invalid factor levels" every time.
e.g.
df <- rbind(df,Cat)
I believe this is because of the factor/character divide.
The factor levels in your data.frame should also contain the values in your "Cat" object for the relevant factor column.
Here's a simple example:
df <- data.frame(v1 = c("a", "b"), v2 = 1:2, stringsAsFactors = TRUE)  # v1 is a factor
toAdd <- list("c", 3)
## Warnings...
rbind(df, toAdd)
# v1 v2
# 1 a 1
# 2 b 2
# 3 <NA> 3
# Warning message:
# In `[<-.factor`(`*tmp*`, ri, value = "c") :
# invalid factor level, NA generated
## Possible fix
df$v1 <- factor(df$v1, unique(c(levels(df$v1), toAdd[[1]])))
rbind(df, toAdd)
# v1 v2
# 1 a 1
# 2 b 2
# 3 c 3
Alternatively, consider rbindlist from "data.table", which should save you from having to convert the factor levels:
> library(data.table)
> df <- data.frame(v1 = c("a", "b"), v2 = 1:2, stringsAsFactors = TRUE)
> rbindlist(list(df, toAdd))
v1 v2
1: a 1
2: b 2
3: c 3
> str(.Last.value)
Classes ‘data.table’ and 'data.frame': 3 obs. of 2 variables:
$ v1: Factor w/ 3 levels "a","b","c": 1 2 3
$ v2: num 1 2 3
- attr(*, ".internal.selfref")=<externalptr>
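To apply the first fix to every factor column at once (for example, in your 100-column data frame), one option is to extend the levels of each factor with the new row's values before calling `rbind()`. `add_row_safely()` is a hypothetical helper, and it assumes the new row is a list with one element per column, in column order:

```r
# Hypothetical helper: add the new row's values to each factor's levels, then rbind
add_row_safely <- function(df, new_row) {
  for (i in seq_along(df)) {
    if (is.factor(df[[i]])) {
      new_levels <- unique(c(levels(df[[i]]), as.character(new_row[[i]])))
      df[[i]] <- factor(df[[i]], levels = new_levels)
    }
  }
  rbind(df, new_row)
}

# First column a factor, the rest character, as in the question
df <- data.frame(v1 = factor(c("a", "b")), v2 = c("1", "2"), v3 = c("x", "y"),
                 stringsAsFactors = FALSE)
Cat <- list("Cat", "3", NA)
df <- add_row_safely(df, Cat)
```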

Store list into a file (R)

I have the following data frame:
Group.1 V2
1 27562761 GO:0003676
2 27562765 c("GO:0004345", "GO:0050661", "GO:0006006", "GO:0055114")
3 27562775 GO:0016020
4 27562776 c("GO:0005525", "GO:0007264", "GO:0005622")
where the second column is a list. I tried to write the data frame into a text file using write.table, but it did not work. My desired output is the following one (file.txt):
27562761 GO:0003676
27562765 GO:0004345, GO:0050661, GO:0006006, GO:0055114
27562775 GO:0016020
27562776 GO:0005525, GO:0007264, GO:0005622
How could I obtain that?
You could look into sink, or you could use write.csv after flattening "V2" to a character string.
Try the following examples:
## recreating some data that is similar to your example
df <- data.frame(a = c(1, 1, 2, 2, 3), b = letters[1:5])
x <- aggregate(list(V2 = df$b), list(df$a), c)
x
# Group.1 V2
# 1 1 1, 2
# 2 2 3, 4
# 3 3 5
## V2 is a list, as you describe in your question
str(x)
# 'data.frame': 3 obs. of 2 variables:
# $ Group.1: num 1 2 3
# $ V2 :List of 3
# ..$ 1: int 1 2
# ..$ 3: int 3 4
# ..$ 5: int 5
sink(file = "somefile.txt")
x
sink()
## now open up "somefile.txt" from your working directory
x$V2 <- sapply(x$V2, paste, collapse = ", ")
write.csv(x, file = "somefile.csv")
## now open up "somefile.csv" from your working directory
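To get the layout from the question (no header, no row names, no quotes), one possibility is `write.table()` on the flattened column. A sketch with two rows from the question; the tab separator is a choice, and `sep = " "` would give spaces instead:

```r
x <- data.frame(Group.1 = c(27562761L, 27562765L))
x$V2 <- list("GO:0003676", c("GO:0004345", "GO:0050661"))   # a list column

# Flatten each list element to one comma-separated string
x$V2 <- vapply(x$V2, paste, character(1), collapse = ", ")

# Writes file.txt in the working directory
write.table(x, file = "file.txt", sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```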
