See the attributes of a dataframe but exclude rownames - r

I'd like to see the attributes of a dataframe but exclude the $row.names part.
I'm using attributes().
attributes( df )
Is there a way to achieve this using this function or do I need to seek a different function?
Thanks in advance.

We could use
atr1 <- attributes(df)
atr1[setdiff(names(atr1), "row.names")]
If we want to do it in a single step, use modifyList:
modifyList(attributes(df), list("row.names" = NULL))
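As a quick check, here is a minimal sketch of both approaches side by side. The data frame `df` below is a made-up example (it is not from the question), and the names `atr_subset` and `atr_modify` are only illustrative.

```r
# Hypothetical demo data frame, assumed only for illustration
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

atr1 <- attributes(df)

# Approach 1: keep every attribute whose name is not "row.names"
atr_subset <- atr1[setdiff(names(atr1), "row.names")]

# Approach 2: modifyList() drops elements that are set to NULL
atr_modify <- modifyList(atr1, list(row.names = NULL))

names(atr_subset)
names(atr_modify)
```

Both results should contain only the remaining attributes (for a plain data frame, `names` and `class`).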

Related

How to name the elements of an unnamed list

I have a list that looks like this:
setlist2 <- list(wsb_b6, wsb_id8)
[[1]]
[1] "Gm10116" "Tpm3-rs7" "Wdfy1" "Rps3a2" "AK157302" "Gm6563"
"Gm9825" "Gm10259" "Gm6768"
[[2]]
[1] "Gm6401" "Ecel1" "Hpca" "Tmem176a" "Lepr"
"Baiap3" "Fam183b" "Vsx2" "Vtn"
I need it to look like this:
$wsb_b6
[1] "Gm10116" "Tpm3-rs7" "Wdfy1" "Rps3a2" "AK157302" "Gm6563"
"Gm9825" "Gm10259" "Gm6768"
$wsb_id8
[1] "Gm6401" "Ecel1" "Hpca" "Tmem176a" "Lepr"
"Baiap3" "Fam183b" "Vsx2" "Vtn"
I know that I can achieve it by doing it manually, but it is more than 100 each; there's got to be a better way.
#I found that I had to unlist my two previous lists
wsb_b6 <-wsb_b6[,1]
wsb_b6 <-unlist(wsb_b6)
wsb_id8 <-wsb_id8[,1]
wsb_id8 <- unlist(wsb_id8)
#And then list them again, but like this
setlist2 <-list(wsb_b6=wsb_b6, wsb_id8= wsb_id8)
Use dplyr::lst
setlist2 <- dplyr::lst(wsb_b6, wsb_id8)
It sounds like you want to create a named list. Specifically, you want to create a named list where the names are taken from the names of the variables in the environment.
This is similar to this question: Can lists be created that name themselves based on input object names?
I don't believe there is a simple function to do this in base R, but you can use the function llist from the package Hmisc:
library(Hmisc)
setlist2 <- llist(wsb_b6, wsb_id8)
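For what it's worth, base R can get close without extra packages, at the cost of typing the names as strings. The sketch below uses shortened versions of the two vectors from the question; `by_setnames` and `by_mget` are illustrative names, not anything from the original posts.

```r
# Shortened stand-ins for the vectors in the question
wsb_b6  <- c("Gm10116", "Tpm3-rs7", "Wdfy1")
wsb_id8 <- c("Gm6401", "Ecel1", "Hpca")

# Option 1: build the list, then attach the names explicitly
by_setnames <- setNames(list(wsb_b6, wsb_id8), c("wsb_b6", "wsb_id8"))

# Option 2: mget() looks the variables up by name and returns a named list
by_mget <- mget(c("wsb_b6", "wsb_id8"), envir = environment())

names(by_mget)
```

With more than 100 objects, the vector of names could be built programmatically (for example with `ls(pattern = ...)`) and passed to `mget()`, assuming the objects share a recognisable naming pattern.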

How do I get a unique named list?

This is a simple question and I think I can probably re-invent the wheel and write something custom but I'm sure there must be an easy way to do this that I can't think of at the moment. Suppose I have a list:
l <- list("NY"=10001, "CT"=10002, "CT"=10002)
I would like a list:
list("NY"=10001, "CT"=10002)
I tried to use unique(l) but it just returns:
list(10001, 10002)
How do I get a unique list but preserve the names assigned to the values?
Using duplicated:
l[ !duplicated(l) ]
Given that "each string is mapped to 1 number", we can do:
l[unique(names(l))]
Edit: another alternative:
tapply(l, names(l), `[`, 1)
Try the duplicated function:
l=list("NY"=10001, "CT"=10002, "CT"=10002)
l[!duplicated(l)]
Results in:
$NY
[1] 10001
$CT
[1] 10002
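One thing worth keeping in mind: `duplicated(l)` compares the values only, while `duplicated(names(l))` compares the names only. For the list in the question the two agree, but they can differ. The second list below (`l2`, with an `NJ` entry) is a hypothetical example added to show the difference.

```r
l <- list("NY" = 10001, "CT" = 10002, "CT" = 10002)

by_value <- l[!duplicated(l)]          # drops repeated values
by_name  <- l[!duplicated(names(l))]   # drops repeated names

# Hypothetical list where two different names share one value
l2 <- list("NY" = 10001, "CT" = 10002, "NJ" = 10002)

names(l2[!duplicated(l2)])          # NJ is dropped because its value repeats
names(l2[!duplicated(names(l2))])   # NJ is kept because its name is new
```

Which one is appropriate depends on whether a "duplicate" means the same value or the same name in your data.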

Conditional Lookup in R

I am trying to replace the blank (missing) zipcodes in the df table with the zipcodes in another table called zipless, based on names.
What would be the best approach? A for loop is probably very slow.
I was trying with something like this, but it does not work.
df$zip_new <- ifelse(df, is.na(zip_new),
left_join(df,zipless, by = c("contbr_nm" = "contbr_nm")),
zip_new)
I was able to make it work using this approach, but I am sure it is not the best one.
I first added a new column from the lookup table and in the next step selectively used it, where necessary.
library(dplyr)
#temporarily renaming the lookup column in the lookup table
zipless <- plyr::rename(zipless, c("zip_new"="zip_new_temp"))
#adding the lookup column to the main table
df <- left_join(df, zipless, by = c("contbr_nm" = "contbr_nm"))
#taking over the value from the lookup column zip_new_temp if the condition is met, else, do nothing.
df$zip_new <- ifelse((df$zip_new == "") &
(df$contbr_nm %in% zipless$contbr_nm),
df$zip_new_temp,
df$zip_new)
What would be a proper way to do this?
Thank you very much!
I'd suggest using match to just grab the zips you need. Something like:
miss_zips = is.na(df$zip_new)
df$zip_new[miss_zips] = zipless$zip_new[match(
df$contbr_nm[miss_zips],
zipless$contbr_nm
)]
Without sample data I'm not wholly sure of your column names, but something like that should work.
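Here is a small worked sketch of the match approach. The sample data is invented for illustration (the question gives none), and it assumes that a missing zip may be either NA or an empty string, since the question's own code tests for "". Rows with no entry in the lookup table are left untouched rather than overwritten with NA.

```r
# Invented sample data; column names follow the question
df <- data.frame(
  contbr_nm = c("A", "B", "C", "D"),
  zip_new   = c("10001", "", NA, ""),
  stringsAsFactors = FALSE
)
zipless <- data.frame(
  contbr_nm = c("B", "C"),
  zip_new   = c("20002", "30003"),
  stringsAsFactors = FALSE
)

# Treat both NA and "" as missing
miss_zips <- is.na(df$zip_new) | df$zip_new == ""

# Position of each missing row's name in the lookup table (NA if absent)
idx   <- match(df$contbr_nm[miss_zips], zipless$contbr_nm)
found <- !is.na(idx)

# Fill in only the rows that actually have a match
df$zip_new[which(miss_zips)[found]] <- zipless$zip_new[idx[found]]

df
```

In this example, "B" and "C" get their zips from zipless, while "D" keeps its blank value because it has no match.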
I can only recommend the data.table-package for things like these. But your general approach is correct. The data.table-package has a much nicer syntax and is designed to handle large data sets.
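In data.table it would probably look like this (assuming the lookup column in zipless has already been renamed to zip_new_temp, as in the question):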
zipcodes <- data.table(left_join(df, zipless, by = "contbr_nm"))
zipcodes[, zip_new := ifelse(is.na(zip_new), zip_new_temp, zip_new)]

How to constrain duplicate removal for specific data.frame in the list more elegantly?

I have data.frame objects in a list as the output of a custom function, and I want to remove duplicates only from the first data.frame, while the others should not be affected. I tried to control this constraint inside lapply, but I get a subscript error instead. I know it is easier to do this separately, but that is not what I want. Can anyone point out how to do this more easily in a functional programming style? Does anyone know a useful trick for applying a constraint to specific objects in a list?
mini example:
myList <- list(
bar = data.frame(v1=c(12,21,37,21,37), v2=c(14,29,45,29,45)),
cat = data.frame(v1=c(18,42,18,42,81), v2=c(27,46,27,46,114)),
foo = data.frame(v1=c(3,3,33,3,33,91), v2=c(26,26,42,26,42,107))
)
It is easy to do it like this:
.first <- unique(myList[[1L]])
res <- c(list(.first), myList[- 1L])
but I need duplicate removal to affect only the first data.frame, while the others are left as they are, and I would like to implement this inside a function in a more elegant way.
desired output:
myOutput <- list(
bar = data.frame(v1=c(12,21,37),v2=c(14,29,45)),
cat = data.frame(v1=c(18,42,18,42,81), v2=c(27,46,27,46,114)),
foo = data.frame(v1=c(3,3,33,3,33,91), v2=c(26,26,42,26,42,107))
)
If we need to use lapply, we can loop over the sequence of list indices and use if/else to modify only the first element:
lapply(seq_along(myList), function(i) if(i==1) unique(myList[[i]]) else myList[[i]])
Or else we can assign the modified element back into the list:
myList[[1]] <- unique(myList[[1]])
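Note that looping over `seq_along(myList)` returns an unnamed list, whereas assigning back into the list keeps the names. If a reusable function is wanted, a minimal sketch could look like the following; `dedup_at` and its `idx` argument are hypothetical names introduced here, not part of the original answer.

```r
myList <- list(
  bar = data.frame(v1 = c(12, 21, 37, 21, 37), v2 = c(14, 29, 45, 29, 45)),
  cat = data.frame(v1 = c(18, 42, 18, 42, 81), v2 = c(27, 46, 27, 46, 114)),
  foo = data.frame(v1 = c(3, 3, 33, 3, 33, 91), v2 = c(26, 26, 42, 26, 42, 107))
)

# Hypothetical helper: remove duplicate rows only from the elements in idx
dedup_at <- function(lst, idx = 1L) {
  lst[idx] <- lapply(lst[idx], unique)  # single-bracket assignment keeps names
  lst
}

res <- dedup_at(myList)
sapply(res, nrow)
```

Because `idx` can be a vector of positions or names, the same helper would also cover cases such as `dedup_at(myList, c("bar", "foo"))`.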

Replace a common string across two columns with a string from a different column in a dataframe in R

Suppose we have a data frame as below :
test<-data.frame(v1=c(1:10),v2=c(rep("a x",10)),v3=c(rep(c("a b","a c","a d","a e","a f"),2)),v4=c((rep("p",5)),rep("q",5)))
Basically, we need to replace "a" in the strings of columns 2 and 3 with the string mentioned in column 4. The result data frame should ideally look like this:
result<-data.frame(v1=c(1:10),v2=c(rep("p x",5),rep("q x",5)),v3=c("p b","p c","p d","p e","p f","q b","q c","q d","q e","q f"),v4=c((rep("p",5)),rep("q",5)))
Have tried the following to get the same:
for (i in 1:nrow(test))
{
test[i,2]<-gsub("a",test[i,4],test[,2])
test[i,3]<-gsub("a",test[i,4],test[,3])
}
Have also tried using apply functions, but wasn't able to achieve the desired result.
Any help in this regard would be highly appreciated. Thanks in advance!
We can loop over the columns, remove the 'a' and paste with the 'v4'
test[2:3] <- lapply(test[2:3], function(x) paste0(test$v4, sub("a", "", x)))
Perhaps this? (not sure how generic you need the code to be)
test2 <- test
test2$v2 <- mapply(gsub , "a" , test$v4 , test$v2)
test2$v3 <- mapply(gsub , "a" , test$v4 , test$v3)
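To check an approach against the expected result, the sketch below rebuilds the data from the question and applies the paste-based idea. Two things here are assumptions rather than part of the original answers: `stringsAsFactors = FALSE` is set so the columns are character vectors regardless of R version, and the pattern is anchored as `"^a"` on the assumption that the "a" to replace is always the first character.

```r
test <- data.frame(
  v1 = 1:10,
  v2 = rep("a x", 10),
  v3 = rep(c("a b", "a c", "a d", "a e", "a f"), 2),
  v4 = rep(c("p", "q"), each = 5),
  stringsAsFactors = FALSE
)

fixed <- test
cols <- c("v2", "v3")

# Drop the leading "a" and prepend the value from v4, row by row
fixed[cols] <- lapply(test[cols], function(x) paste0(test$v4, sub("^a", "", x)))

fixed$v2
fixed$v3
```

If the "a" can appear elsewhere in the string, the mapply version with gsub is the more general choice, though it would then replace every "a" it finds.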
