I need to convert a multi-row two-column data.frame to a named character vector.
My data.frame would be something like:
dd = data.frame(crit = c("a", "b", "c", "d"),
                name = c("Alpha", "Beta", "Caesar", "Doris"))
and what I actually need would be:
whatiwant = c("a" = "Alpha",
"b" = "Beta",
"c" = "Caesar",
"d" = "Doris")
Use the names function:
whatyouwant <- as.character(dd$name)
names(whatyouwant) <- dd$crit
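as.character is necessary because, before R 4.0.0, data.frame() and read.table() turned character columns into factors by default (stringsAsFactors = TRUE). From R 4.0.0 on the default is stringsAsFactors = FALSE, so the conversion is no longer strictly needed, but it is harmless.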
If you want a one-liner:
whatyouwant <- setNames(as.character(dd$name), dd$crit)
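A quick check of why as.character() matters, assuming dd was created with stringsAsFactors = TRUE (the pre-4.0.0 default). Without the conversion the named result is still a factor, not a character vector:

```r
# Rebuild dd with stringsAsFactors = TRUE to mimic the default before R 4.0.0
dd <- data.frame(crit = c("a", "b", "c", "d"),
                 name = c("Alpha", "Beta", "Caesar", "Doris"),
                 stringsAsFactors = TRUE)

whatiwant <- c("a" = "Alpha", "b" = "Beta", "c" = "Caesar", "d" = "Doris")

# Without as.character() the values keep the factor class
still_factor <- setNames(dd$name, dd$crit)
print(is.factor(still_factor))

# With as.character() the result is a plain named character vector
whatyouwant <- setNames(as.character(dd$name), dd$crit)
print(identical(whatyouwant, whatiwant))
```

Note that the names can come straight from the factor dd$crit: assigning names coerces them with as.character().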
You can also use deframe() from the tibble package for this:
tibble::deframe(dd)
It converts the first column to names and the second column to values.
You can make a vector from dd$name, and add names using names(), but you can do it all in one step with structure():
whatiwant <- structure(as.character(dd$name), names = as.character(dd$crit))
Here is a very general, easy, tidy way:
library(dplyr)
iris %>%
pull(Sepal.Length, Species)
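Here the first column given to pull() (Sepal.Length) supplies the values and the second (Species) supplies the names. The name argument of pull() requires dplyr 1.0.0 or later.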
For variety, try split and unlist:
unlist(split(as.character(dd$name), dd$crit))
# a b c d
# "Alpha" "Beta" "Caesar" "Doris"
There's also a magrittr solution to this via the exposition pipe (%$%):
library(magrittr)
dd %$% set_names(as.character(name), crit)
A minor advantage over tibble::deframe() is that the argument doesn't have to be exactly a two-column frame/tibble (i.e., you avoid a preceding select(name_col, value_col) %>% step).
Note that magrittr::set_names and stats::setNames are interchangeable here. I simply prefer the former since its name matches the set_(col|row)?names pattern.
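In base R, with() plays a similar role to %$%: it evaluates an expression with the data frame's columns visible as variables. A sketch, where the third column extra is hypothetical and included only to show that it does not need to be dropped first:

```r
dd <- data.frame(crit = c("a", "b", "c", "d"),
                 name = c("Alpha", "Beta", "Caesar", "Doris"),
                 extra = 1:4)  # hypothetical extra column, ignored below

# with() makes crit and name available as plain variables
whatiwant <- with(dd, setNames(as.character(name), crit))
print(whatiwant)
```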
Related
I'm inside a big function I have to write. In the last part I have to calculate the mean of a column in a data frame. The name of the column I am operating on is given as an argument to the function.
I think you're asking how to compute the mean of a variable in a data frame, given the name of the column. There are two typical approaches to doing this, one indexing with [[ and the other indexing with [:
data(iris)
mean(iris[["Petal.Length"]])
# [1] 3.758
mean(iris[,"Petal.Length"])
# [1] 3.758
mean(iris[["Sepal.Width"]])
# [1] 3.057333
mean(iris[,"Sepal.Width"])
# [1] 3.057333
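To tie this back to the question, a minimal sketch of a function that receives the column name as a string. The name mean_of_column and its error check are illustrative, not from the original answers:

```r
# Hypothetical helper: the column name arrives as a character string
mean_of_column <- function(data, column, na.rm = FALSE) {
  # Fail early with a readable message if the name is not a column
  if (!column %in% names(data)) {
    stop("no column named '", column, "' in data")
  }
  # [[ evaluates `column`, so the string stored in it is used as the name
  mean(data[[column]], na.rm = na.rm)
}

data(iris)
mean_of_column(iris, "Petal.Length")
mean_of_column(iris, "Sepal.Width")
```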
If your column contains values you want to ignore, such as NA, use na.rm = TRUE:
## da is the data frame and Ozone is the column name
## for a single column
mean(da$Ozone, na.rm = TRUE)
## for all columns (every column must be numeric)
colMeans(x = da, na.rm = TRUE)
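A runnable sketch of both cases using the built-in datasets, assuming airquality (whose Ozone column contains NA values) stands in for da. Since colMeans() requires all-numeric input, the factor column of iris is dropped first:

```r
data(iris)
data(airquality)

# iris has a factor column (Species), so keep only the numeric columns
numeric_cols <- sapply(iris, is.numeric)
colMeans(iris[, numeric_cols])

# Ozone contains NA values; na.rm = TRUE drops them before averaging
mean(airquality$Ozone, na.rm = TRUE)

# All airquality columns are numeric, so colMeans() can take it directly
colMeans(airquality, na.rm = TRUE)
```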
Any of the following should work:
df <- data.frame(x=1:3,y=4:6)
mean(df$x)
mean(df[,1])
mean(df[["x"]])
Use summarise() from the dplyr package:
library(dplyr)
summarise(df, Average = mean(col_name, na.rm = TRUE))  # col_name stands for your column
Note: dplyr supports both spellings, summarise() and summarize().
I think what you are being asked to do (or perhaps asking yourself?) is to take a character value that matches the name of a column in a particular data frame (whose name may also be given as a character string). There are two tricks here. Most people learn to extract columns with the "$" operator, and that won't work inside a function if the function is passed the column name as a character vector. If the function is also supposed to accept the data frame's name as a character argument, then you will need the get() function as well:
df1 <- data.frame(a=1:10, b=11:20)
mean_col <- function( dfrm, col ) mean( get(dfrm)[[ col ]] )
mean_col("df1", "b")
# [1] 15.5
There is a sort of semantic boundary between ordinary objects, like character vectors, and language objects, like the names of objects. The get() function is one of the functions that let you "promote" character values to language-level evaluation. The "$" function does NOT evaluate its argument, so inside a function where the column name is held in a variable you need to use "[[" instead. "$" is best kept for console use with literal column names and avoided in such functions.
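A short sketch of the difference, reusing df1 from above. mean_col2 is a hypothetical variant that takes the data frame object itself, so get() is no longer needed:

```r
df1 <- data.frame(a = 1:10, b = 11:20)
col <- "b"

# $ does not evaluate what follows it: this looks for a column literally
# called "col", finds none, and returns NULL
df1$col

# [[ evaluates col, so it looks up the column "b"
df1[[col]]

# Hypothetical variant of mean_col that receives the data frame directly
mean_col2 <- function(dfrm, col) mean(dfrm[[col]])
mean_col2(df1, "b")
```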
Suppose you have a data frame (say df) with columns "x" and "y". You can find the mean of a column (x or y) by:
1. Using the mean() function:
z <- mean(df$x)
2. Using attach(), so that the column name (say x) can be used as a variable:
attach(df)
mean(x)
When done, call detach() to remove x from the search path:
detach(df)
3. Using with(), which lets you use the data frame's columns as separate variables:
z <- with(df, mean(x))
I have a data frame called df and I want to remove the rows that contain NA in a specific column.
As commented before, you should provide a reproducible R example. If I understand correctly, you can simply use the subset() function.
# Generating some fake data:
set.seed(101)
df <- data.frame("StudyID" = paste("Study", 1:100, sep = "_"),
                 "Column" = sample(c(1:30, NA), 100, replace = TRUE))
Use subset() with !is.na() if your NA is a genuine missing value (NA):
newdf <- subset(df, !is.na(Column))
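The same filter can be written with plain logical indexing. The subset() help page itself notes that subset() is a convenience function intended for interactive use and recommends standard subsetting with [ for programming. A sketch using the same fake data:

```r
# Same fake data as above
set.seed(101)
df <- data.frame(StudyID = paste("Study", 1:100, sep = "_"),
                 Column = sample(c(1:30, NA), 100, replace = TRUE))

# Plain logical indexing: keep the rows where Column is not NA
newdf <- df[!is.na(df$Column), ]

# subset() keeps the same rows
newdf_subset <- subset(df, !is.na(Column))
identical(newdf$StudyID, newdf_subset$StudyID)
```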
If your NA is the character string "NA":
# Numeric to character conversion
df$Column <- as.character(df$Column)
# Replace missing values with the string "NA"
df$Column[is.na(df$Column)] <- "NA"
Then subset on that string:
newdf <- subset(df, Column != "NA")
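Here is a solution using grepl() from base R, again treating NA as a character string. Note that grepl() matches substrings, so any value that merely contains "NA" would be dropped as well:
pattern <- "NA"
df <- df[!grepl(pattern, df$Column), ]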
If possible, share sample data so the question is clearer.
I have a list, list1, of data frames that I want to combine later with do.call("rbind", list1), but first I want to add an identifying factor to each data frame. This factor should be the name of the data frame:
list1 <- lapply(vector("list", 6), function(x)
  data.frame(replicate(10, sample(0:1, 1000, replace = TRUE))))
names(list1) <- LETTERS[1:6]
e.g. assign "A" to each row of the first data frame, "A", and so on:
list1[[1]]$Cat <- "A"
list1[[2]]$Cat <- "B" # etc.
I tried something like
list1 <- lapply(list1, function(x)
{list1[[x]]$Cat<- names(list1[[x]]); x})
but it failed with:
Error in list1[[x]] : invalid subscript type 'list'
How can I achieve what I want?
Thank you.
This can be done easily using tidyverse packages:
library(tidyverse)
imap(list1, ~ mutate(.x, Cat = .y)) %>% bind_rows()
To break this down:
imap() from the purrr package passes every element of its first argument (list1 in this case), along with that element's name, to the function given as its second argument. By imap's convention, the function refers to the element as .x and to its name as .y.
The function in the second argument uses mutate() from the dplyr package, which creates a new column named Cat.
Lastly, bind_rows() is the tidyverse equivalent of the do.call("rbind", list1) from your question.
EDIT: As joran pointed out in the comments, if your end goal is to concatenate all the data.frames together, bind_rows provides a convenient way to automatically prepend a column identifying the original data.frame:
bind_rows( list1, .id = "Cat" )
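For comparison, a base-R sketch (not from the original answers) that does the same tagging with Map() and then the do.call("rbind", ...) from the question. The small list below is hypothetical stand-in data:

```r
# Hypothetical small stand-in for list1
list1 <- list(A = data.frame(v = 1:2), B = data.frame(v = 3:5))

# Map() walks the data frames and their names in parallel
tagged <- Map(function(d, nm) {
  d$Cat <- nm  # add the element's name as a new column
  d
}, list1, names(list1))

# Combine as in the question
combined <- do.call("rbind", tagged)
combined
```

Unlike imap(), this needs no extra packages; the cost is passing names(list1) explicitly.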
I want to use adist to calculate edit distance between the values of two columns in each row.
I am using it in more-or-less this way:
A <- c("mad","car")
B <- c("mug","cat")
my_df <- data.frame(A,B)
my_df$dist <- adist(my_df$A, my_df$B, ignore.case = TRUE)
my_df <- my_df[order(dist),]
The last two lines are the same as in my case, but the actual data frame looks a bit different: the columns of my original data frame are of character type, not factor. Also, the dist column seems to be returned as a 2-column matrix, and I have no idea why that happens.
Update:
I have read a bit and found that I need to apply it over the rows, so my new code is the following:
apply(my_df, 1, function(d) adist(d[1], d[2]))
It works fine, but for my original dataset, calling it by column numbers is impractical. How can I refer to column names in this function?
Using a tidyverse approach, you could use the following code:
library(tidyverse)
A <- c("mad","car")
B <- c("mug","cat")
my_df <- data.frame(A,B)
my_df %>%
rowwise() %>%
  mutate(Lev_dist = as.numeric(adist(x = A, y = B, ignore.case = TRUE)))  # as.numeric() drops the 1x1 matrix adist() returns
You can overcome that problem by using mapply(), which calls adist() once for each pair of values:
mapply(adist, my_df$A, my_df$B, USE.NAMES = FALSE)
#[1] 2 1
As per the adist() documentation, x and y should be character vectors, and adist() compares every element of x with every element of y. In your example the function therefore returns a 2x2 matrix, because it also compares the cross pairs "mad" with "cat" and "car" with "mug".
The row-wise distances you want are on the matrix's main diagonal.
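Taking that diagonal in code, with the columns referred to by name, might look like the sketch below. Keep in mind that for n rows adist() builds the full n x n matrix, so for large data the mapply() approach is likely more economical:

```r
# Example data from the question; stringsAsFactors = FALSE keeps A and B
# as character columns, as in the asker's real data
my_df <- data.frame(A = c("mad", "car"), B = c("mug", "cat"),
                    stringsAsFactors = FALSE)

# adist() returns the full length(A) x length(B) matrix of distances;
# diag() keeps only the row-wise pairs (A[i] vs B[i])
my_df$dist <- diag(adist(my_df[["A"]], my_df[["B"]], ignore.case = TRUE))

# Sort by the new column, referring to it by name
my_df <- my_df[order(my_df$dist), ]
my_df
```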