Let's say one of the columns in my dataframe refers to the name of a city. The city names are expressed as "longformA", "longformB", and I'd like to replace them all with "shrtfrmA", "shrtfrmB". Each "longform" name has an associated "shrtfrm" name with which it should be replaced.
I've got a solution involving a named list and purrr bouncing around in my head, but I can't quite conceptualize it. The named list would have this structure:
city_names_short <- list("ANA" = "Anaheim", "BOS" = "Boston")
And so on, and so forth.
example_df$city[example_df$city == "Anaheim"] <- "ANA"
example_df$city[example_df$city == "Boston"] <- "BOS"
I could of course replace them one by one, as per the above, but I'd like to be a little more elegant.
Any and all advice is greatly appreciated!
I suggest unlisting your list to a named vector and then using match to create the shortform names:
city_names_short <- unlist(city_names_short)
df$shortname <- names(city_names_short)[match(df$city, city_names_short)]
Method 1
You can loop over your city column using sapply:
df$city <- sapply(df$city, function(city) {
names(city_names_short)[city_names_short == city]
})
The function in sapply finds the name (i.e. the shortened city name) of the list item that matches each city name.
Method 2
You can create a map by inverting the city_names_short list:
city_map <- names(city_names_short)
names(city_map) <- city_names_short
df$city <- city_map[df$city]
There is a function setNames in base R:
map = setNames(c("ANA","BOS"),c("Anaheim","Boston"))
df$city_short = map[df$city_long]
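As a minimal sketch of the named-vector lookup described above, the example below also keeps the original value when a city has no short form. The data frame contents, the extra city "Chicago", and the column name city_short are assumptions made for illustration; they are not from the question.

```r
# Hypothetical example data; only "Anaheim" and "Boston" come from the question.
example_df <- data.frame(
  city = c("Anaheim", "Boston", "Chicago", "Boston"),
  stringsAsFactors = FALSE  # character columns index a named vector by name
)

# Lookup vector: names are the long forms, values are the short forms.
city_map <- c(Anaheim = "ANA", Boston = "BOS")

# Indexing by name returns NA for cities that are not in the map.
short <- unname(city_map[example_df$city])

# Fall back to the original name where no short form exists.
example_df$city_short <- ifelse(is.na(short), example_df$city, short)
```

Note that if the city column were a factor, indexing would use the integer codes rather than the labels, so converting with as.character() first is the safer choice.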
I have a set of 270 RNA-seq samples, and I have already subsetted out their expected counts using the following code:
for (i in 1:length(sample_ID_list)) {
assign(sample_ID_list[i], subset(get(sample_file_list[i]), select = expected_count))
}
Where sample_ID_list is a character list of each sample ID (e.g., 313312664) and sample_file_list is a character list of the file names for each sample already in my environment to be subsetted (e.g., s313312664).
Now, the head of one of those subsetted samples looks like this:
> head(`308087571`)
# A tibble: 6 x 1
  expected_count
           <dbl>
1           129
2             8
3           137
4          6230.
5          1165.
6             0
The problem is I want to paste all of these lists together to make a counts dataframe, but I will not be able to differentiate between columns without their sample ID as the column name instead of expected_count.
Does anyone know of a good way to go about this? Please let me know if you need any more details!
You can use:
dplyr::bind_rows(mget(sample_ID_list), .id = "name")  # .id must be a string
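If we want to name the list, we can loop over it, extract the first element of 'expected_count' from each data frame ('nm1'), and use that to assign the names of the list (this assumes 'sample_file_list' is a list of data frames rather than a character vector of object names)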
nm1 <- sapply(sample_file_list, function(x) x$expected_count[1])
names(sample_file_list) <- nm1
Or from sample_ID_list
do.call(rbind, Map(cbind, mget(sample_ID_list), name = sample_ID_list))
Update
Based on the comments, we can loop over 'sample_file_list' and 'sample_ID_list' with Map and rename the 'expected_count' column with the corresponding value from 'sample_ID_list'
sample_file_list2 <- Map(function(dat, nm) {
names(dat)[match('expected_count', names(dat))] <- nm
dat
}, sample_file_list, sample_ID_list)
Or if we need a package solution,
library(data.table)
rbindlist(mget(sample_ID_list), idcol = "name")  # idcol must be a string
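Since the question asks for one column per sample, here is a small sketch that combines the Map-based renaming with a column-wise bind. The two toy data frames and the object names sample_dfs, renamed, and counts are assumptions for illustration; in the real workflow the list of data frames would come from the environment.

```r
# Toy stand-ins for two subsetted samples (values are made up for illustration).
s1 <- data.frame(expected_count = c(129, 8, 137))
s2 <- data.frame(expected_count = c(0, 12, 40))

sample_ID_list <- c("313312664", "308087571")
sample_dfs <- list(s1, s2)  # assumed: a list of data frames, in the same order as the IDs

# Rename the 'expected_count' column of each data frame to its sample ID.
renamed <- Map(function(dat, nm) {
  names(dat)[names(dat) == "expected_count"] <- nm
  dat
}, sample_dfs, sample_ID_list)

# Bind column-wise; unname() keeps list names from leaking into the column names.
counts <- do.call(cbind, unname(renamed))
```

This assumes every sample has the same number of rows in the same gene order; if that is not guaranteed, joining on a gene identifier column would be safer than cbind.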
Update:
Thank you all so much for your help. I had to update my for loop as follows:
for (i in 1:length(sample_ID_list)) {
assign(sample_ID_list[i], subset(get(sample_file_list[i]), select = expected_count))
data<- get(sample_ID_list[i])
colnames(data)<- sample_ID_list[i]
assign(sample_ID_list[i],data)
}
and was able to successfully reassign the names!
I have a data frame, say acs10, and I need to relabel its columns. To do so, I created another data frame, labelName, with two columns: the first contains the old column names, and the second contains the names I want to use, as in the table below:
column_1     column_2
oldLabel1    newLabel1
oldLabel2    newLabel2
Then, I wrote a for loop to change the column names:
for (i in seq_len(nrow(labelName))){
names(acs10)[names(acs10) == labelName[i,1]] <- labelName[i,2]}
This works.
However, when I tried to put the for loop into a function, because I need to rename column names for other data frames as well, the function failed. The function I wrote looks like below:
renameDF <- function(dataF,varName){
for (i in seq_len(nrow(varName))){
names(dataF)[names(dataF) == varName[i,1]] <- varName[i,2]
print(varName[i,1])
print(varName[i,2])
print(names(dataF))
}
}
renameDF(acs10, labelName)
where dataF is the data frame whose names I need to change, and varName is another data frame in which old and new variable names are paired. I used print(names(dataF)) to debug, and the printout suggests that the renaming works inside the function. However, calling the function does not actually change the column names of acs10. I suspect it has something to do with scope, but I want to know how to make it work.
In your function you need to return the changed dataframe.
renameDF <- function(dataF,varName){
for (i in seq_len(nrow(varName))){
names(dataF)[names(dataF) == varName[i,1]] <- varName[i,2]
}
return(dataF)
}
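You can also simplify this and avoid the for loop by using match (columns without an entry in varName keep their names):
renameDF <- function(dataF,varName){
idx <- match(names(dataF), varName[[1]])  # NA where a column has no new name
names(dataF)[!is.na(idx)] <- varName[[2]][idx[!is.na(idx)]]
return(dataF)
}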
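The reason the original function appeared to do nothing is that R modifies a copy of dataF inside the function, so the caller has to assign the returned value back. A minimal sketch, using made-up contents for acs10 and labelName (the extra column other and the variable names_before are assumptions for illustration):

```r
renameDF <- function(dataF, varName) {
  for (i in seq_len(nrow(varName))) {
    names(dataF)[names(dataF) == varName[i, 1]] <- varName[i, 2]
  }
  dataF  # the modified copy is the function's return value
}

# Hypothetical data matching the structure described in the question.
acs10 <- data.frame(oldLabel1 = 1:2, oldLabel2 = 3:4, other = 5:6)
labelName <- data.frame(
  column_1 = c("oldLabel1", "oldLabel2"),
  column_2 = c("newLabel1", "newLabel2"),
  stringsAsFactors = FALSE
)

renameDF(acs10, labelName)            # returns a renamed copy; acs10 is untouched
names_before <- names(acs10)          # still the old names

acs10 <- renameDF(acs10, labelName)   # assign the result back to keep the change
```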
This should do the whole thing in one line.
colnames(acs10)[colnames(acs10) %in% labelName$column_1] <- labelName$column_2[match(colnames(acs10)[colnames(acs10) %in% labelName$column_1], labelName$column_1)]
This also works when a column name isn't in the data dictionary (such columns keep their names), but it's a bit more convoluted:
library(tibble)
df <- tribble(~column_1,~column_2,
"oldLabel1", "newLabel1",
"oldLabel2", "newLabel2")
d <- tibble(oldLabel1 = NA, oldLabel2 = NA, oldLabel3 = NA)
fun <- function(dat, dict) {
names(dat) <- sapply(names(dat), function(x) ifelse(x %in% dict$column_1, dict[dict$column_1 == x,]$column_2, x))
dat
}
fun(d, df)
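You can create a function containing just one line of code. Note that pmatch does partial matching and returns NA for names it cannot match, so this assumes every column name has an unambiguous entry in varName.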
renameDF <- function(df, varName){
setNames(df,varName[[2]][pmatch(names(df),varName[[1]])])
}
I have created a basic list called lista (not very imaginative, I know), and inside it there are 10 small dataframes.
These dataframes are called "numberone", "numbertwo", ..., "numberten".
When I access this list, I can't see their names, although the dataframes appear in the workspace (RStudio).
This below is the code and my tries:
#creating multiple dataframes and a list and then give a title to this dataframes inside the list.
lista = list()
names = c("numberone","numbertwo","numberthree","numberfour","numberfive","numbersix","numberseven","numbereight","numbernine","numberten")
for (i in 1:10) {
x = rnorm(10)
df = data.frame(x)
assign(names[i],df)
lista[[i]] = df
}
#trying to change manually the names of the dataframes inside the "lista" list
names(lista[1]) = "number one"
print(names(lista[1])) #this gives no results
#trying using dput
output = dput(lista[1])
##trying put manually the name in front of the dput output to rename the first dataframe inside lista..
list('numberone'= structure(list(x = c(0.750704535096297, 1.16925878942967,
0.806475114411396, 1.00973486249489, -0.301553383694518, 0.546485320708262,
1.03645444095639, 0.247820396853631, -1.64294545886444, -0.216784798035195
)), class = "data.frame", row.names = c(NA, -10L)))
#this seems to have renamed the first dataframe but, it's not working anyway
lista$numberone
print(names(lista[1])) #still no results
I've tried almost everything I could, but I can't give these dataframes their names inside the list.
How can I name these dataframes?
Thank you
Try assigning to names() of the list.
Here is an example using empty lists:
list_test = vector("list",4)
names(list_test) = c("A","B","C","D")
list_test
$A
NULL
$B
NULL
$C
NULL
$D
NULL
With your example, I did:
names(lista) <- names
and I get:
names(lista)
[1] "numberone" "numbertwo" "numberthree" "numberfour" "numberfive" "numbersix" "numberseven"
[8] "numbereight" "numbernine" "numberten"
I think you might be looking to use double brackets (e.g.[[1]]) to reference elements in your list. Using your example code, this will work:
names(lista[[1]]) = "number one"
print(names(lista[[1]])) # the column of the first data frame is now named "number one"
You can also use a setNames() function within a Map() function to rename each column for your list of dataframes.
lista <-Map(setNames, lista , names)
lista # each column is now assigned a name from your vector called names
To keep your code as clean as possible, it is best to avoid giving objects the same names as functions. (Your example code uses a vector called "names" but also uses the names() function.)
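To tie the suggestions above together, here is a small sketch that names the list elements as they are created, so no renaming is needed afterwards. The vector name df_names and the second list lista2 are assumptions for illustration, and only three data frames are built to keep it short.

```r
# Avoid calling the vector `names`, since that shadows the names() function.
df_names <- c("numberone", "numbertwo", "numberthree")

# Option 1: assign by name inside the loop; the list element gets that name.
lista <- list()
for (nm in df_names) {
  lista[[nm]] <- data.frame(x = rnorm(10))
}

# Option 2: build the list with lapply() and attach the names with setNames().
lista2 <- setNames(
  lapply(df_names, function(nm) data.frame(x = rnorm(10))),
  df_names
)

names(lista)      # the element names
lista$numbertwo   # elements can now be accessed by name
```

With single brackets, lista[1] returns a new one-element list, which is why assigning to names(lista[1]) in the question had no lasting effect.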
I want to create a dataframe with 3 columns.
#First column
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
"ABC_E1", "ABC_E2", "ABC_E3",
"ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
The names in column 1 are the names of objects holding results of the cor.test function. The second column should consist of the correlation coefficients I get by writing ABC_D1$estimate, ABC_D2$estimate, and so on.
My problem is that I don't want to add the $estimate manually to every single name in the first column. I tried this:
df1$C2 = paste0(df1$C1, '$estimate')
But this doesn't work; it only gives me back character strings:
"ABC_D1$estimate", "ABC_D2$estimate", "ABC_D3$estimate",
"ABC_E1$estimate", "ABC_E2$estimate", "ABC_E3$estimate",
"ABC_F1$estimate", "ABC_F2$estimate", "ABC_F3$estimate"
class(df1$C2)
[1] "character"
How can I get the numeric result for ABC_D1$estimate into my dataframe? How can I convert these characters into Named num? The 3rd column should consist of the results of $p.value.
As pointed out by @DSGym, there are several problems, including that it is not very convenient to have a list of character names; it would be better to have a list of objects instead.
Anyway, I think you can get where you want using:
estimates <- lapply(name_list, function(dat) {
dat_l <- get(dat)
dat_l[["estimate"]]
}
)
cbind(name_list, estimates)
This is not really advisable but given those premises...
OK, I think I now know what you need.
eval(parse(text = paste0("ABC_D1", '$estimate')))
You concatenate the two strings and use the functions parse and eval to get your results.
This is how to do it for your whole data.frame:
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
"ABC_E1", "ABC_E2", "ABC_E3",
"ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
library(purrr)  # provides map_dbl()
df1$C2 <- map_dbl(paste0(df1$C1, '$estimate'), function(x) eval(parse(text = x)))
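As an alternative sketch that avoids eval(parse()), the objects can be fetched by name with mget() and the fields extracted with vapply(). The two cor.test results below are built from the built-in mtcars data purely for illustration; the column name C3 for the p-values is an assumption based on the question's description of a third column.

```r
# Stand-in cor.test results (the real ABC_* objects come from the asker's analysis).
ABC_D1 <- cor.test(mtcars$mpg, mtcars$wt)
ABC_D2 <- cor.test(mtcars$mpg, mtcars$hp)

name_list <- c("ABC_D1", "ABC_D2")

# Fetch the objects by name into a named list.
tests <- mget(name_list)

df1 <- data.frame(
  C1 = name_list,
  # unname() drops the "cor" label that cor.test attaches to the estimate.
  C2 = vapply(tests, function(ct) unname(ct$estimate), numeric(1)),
  C3 = vapply(tests, function(ct) ct$p.value, numeric(1)),
  row.names = NULL
)
```

Keeping the test results in a list from the start would make the mget() step unnecessary.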
I feed inputList to my custom function and, after several steps (a few simple filters), I end up with a data.frame, resultDF, which needs to be relisted. I used relist to give resultDF the same structure as inputList, but I got an error. Is there a simple way of relisting resultDF? Can anyone point out how to make this happen?
Here is input data.frame within the list:
inputList <- list(
bar=data.frame(from=c(8,18,33,53),
to=c(14,21,39,61), val=c(48,7,10,8)),
cat=data.frame(from=c(6,15,20,44),
to=c(10,17,34,51), val=c(54,21,14,12)),
foo=data.frame(from=c(11,43), to=c(36,49), val=c(49,13)))
After several workflows, I end up with this data.frame:
resultDF <- data.frame(
from=c(53,8,6,15,11,44,43,44,43),
to=c(61,14,10,17,36,51,49,51,49),
val=c(8,48,54,21,49,12,13,12,13)
)
I need to relist resultDF so that it has the same structure as inputList. I used the relist function, but I got an error.
This is my desired list:
desiredList <- list(
bar=data.frame(from=c(8,53), to=c(14,61), val=c(48,8)),
cat=data.frame(from=c(6,15,44,44), to=c(10,17,51,51), val=c(54,21,12,12)),
foo=data.frame(from=c(11,43,43), to=c(36,49,49), val=c(49,13,13))
)
How can I achieve desiredList ? Thanks in advance :)
We can loop through the 'inputList' and check whether the pasted row elements in 'resultDF' are %in% list elements and use that index to subset the 'resultDF'
lapply(inputList, function(x) resultDF[do.call(paste, resultDF) %in% do.call(paste, x),])
Another option is a join and then split. We rbind the 'inputList' to a data.table with an additional column 'grp' specifying the list names, join with the 'resultDF' on the column names of 'resultDF', and finally split the dataset using the 'grp' column
library(data.table)
dt <- rbindlist(inputList, idcol = "grp")[resultDF, on = names(resultDF)]
split(dt[,-1, with = FALSE], dt$grp)
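The same join-then-split idea can also be sketched in base R with merge(). This is only one possible translation, using the inputList and resultDF from the question; the intermediate names tagged, merged, and out are assumptions for illustration.

```r
inputList <- list(
  bar = data.frame(from = c(8, 18, 33, 53), to = c(14, 21, 39, 61), val = c(48, 7, 10, 8)),
  cat = data.frame(from = c(6, 15, 20, 44), to = c(10, 17, 34, 51), val = c(54, 21, 14, 12)),
  foo = data.frame(from = c(11, 43), to = c(36, 49), val = c(49, 13))
)
resultDF <- data.frame(
  from = c(53, 8, 6, 15, 11, 44, 43, 44, 43),
  to   = c(61, 14, 10, 17, 36, 51, 49, 51, 49),
  val  = c(8, 48, 54, 21, 49, 12, 13, 12, 13)
)

# Stack the list into one data frame, tagging each row with its list name.
tagged <- do.call(rbind, Map(cbind, grp = names(inputList), inputList))

# Join on all columns of resultDF; duplicated rows in resultDF are kept.
merged <- merge(resultDF, tagged, by = names(resultDF))

# Split back into a list by the group tag, dropping the tag column.
out <- split(merged[names(resultDF)], merged$grp)
```

Because merge() sorts on the join columns by default, the row order within each element follows from, to, and val rather than the order in resultDF.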