I have a column in a data frame that contains string values. I want to convert these values to lists of characters. When I try to execute the following code:
library(tidyverse)
col <- c("a,b,c,d","e,f,h")
df <- data_frame(col)
for (i in 1:length(df$col)) {
df$col[[i]] <- as.vector(unlist(strsplit(df$col[[i]],",")),mode ="list")
}
I get this error message:
Error in df$col[[i]] <- as.vector(unlist(strsplit(df$col[[i]], ",")), : more elements supplied than there are to replace
Is there a way to convert all the values in the column to lists?
Thanks
If I understand your question correctly, then this will do the trick:
rapply(df, list)
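The loop in the question fails because col is a character vector, so each slot df$col[[i]] can hold only one string. As an alternative sketch (not the approach given above), strsplit() can be applied to the whole column at once and the resulting list assigned back as a list column. The example uses a base data.frame rather than the tibble from the question, which is an assumption made to keep it self-contained:

```r
# Same example values as in the question, in a base data.frame.
df <- data.frame(col = c("a,b,c,d", "e,f,h"), stringsAsFactors = FALSE)

# strsplit() is vectorised: it returns one character vector per row,
# collected in a list whose length equals nrow(df).
df$col <- strsplit(df$col, ",")

# Each cell now holds a character vector of the split pieces.
df$col[[1]]
```

If each cell really has to be a list rather than a character vector, lapply(df$col, as.list) converts the pieces afterwards.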
I want to sort a data frame by a column whose name is specified by an object.
What I want to do is
data <- dplyr::arrange(data, desc(`column_name`))
but with column_name replaced by an expression such as str_c("column_", "name"), because the column to sort by depends on a condition.
These attempts do not work:
data <- dplyr::arrange(data, desc(str_c("column_", "name")))
data <- dplyr::arrange(data, desc(colnames(data[str_c("column_", "name")])))
My code returns
"Error: incorrect size (1) at position 1, expecting : columnlength"
One option is to convert the string to a symbol with rlang::sym() and then unquote it with !!:
library(stringr)
dplyr::arrange(data, desc(!! rlang::sym(str_c("column_", "name"))))
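To make the idea concrete, here is a small sketch on made-up data (the question does not show its data frame, so the columns and values below are assumptions). It shows the symbol-plus-unquote approach and, as an alternative, indexing dplyr's .data pronoun with the string:

```r
library(dplyr)

# Hypothetical example data; the real `data` is not shown in the question.
data <- data.frame(column_name = c(2, 3, 1), other = c("x", "y", "z"))

# Build the column name as a string at run time.
sort_col <- paste0("column_", "name")

# Option 1: turn the string into a symbol and unquote it with !!.
sorted_sym <- arrange(data, desc(!!rlang::sym(sort_col)))

# Option 2: look the column up by name through the .data pronoun.
sorted_pronoun <- arrange(data, desc(.data[[sort_col]]))
```

The attempts in the question fail because desc() receives the string itself (a length-one character vector) rather than the column it names.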
I'd like to extract the first and second values from a list of lists. I was able to extract the first value with no issue. However, I get an error when I try to extract the second value, because not every list in the suggestion column has more than one value. How can I extract the second value from the suggestion column in mydf_1 and generate NA for entries with no second value?
The code I wrote to get the first suggestion is shown further down, but when I run
mydf_1$second_suggestion <- lapply(mydf_1$suggestion, `[[`, 2)
it gives this error:
Error in FUN(X[[i]], ...) : subscript out of bounds
Thanks.
# create a data frame contains words
mydf <- data.frame("words"=c("banna", "pocorn and drnk", "trael", "rabbitt",
"emptey", "ebay", "templete", "interne", "bing",
"methog", "tullius"), stringsAsFactors=FALSE)
# add a custom word to the dictionary
library(hunspell)
mydict_hunspell <- dictionary(lang="en_US", affix=NULL, add_words="bing",
cache=TRUE)
# use hunspell to identify misspelled words and create a row number column
# for later use
mydf$words_checking <- hunspell(mydf$words, dict=mydict_hunspell)
mydf$row_num <- rownames(mydf)
# unlist the words_checking column and get suggestions for those misspelled
# words in another data frame
library(tidyr)
mydf_1 <- unnest(mydf, words_checking)
mydf_1$suggestion <- hunspell_suggest(mydf_1$words_checking)
# extract first suggestion from suggestion column
mydf_1$first_suggestion <- lapply(mydf_1$suggestion, `[[`, 1)
You can check the length of each list first before trying to extract the element of interest. Also, I recommend using sapply so that you have a character vector returned, as opposed to another list.
For the first suggestion:
index <- 1
sapply(mydf_1$suggestion, function(x) {if(length(x) < index) {NA} else {x[[index]]}})
And for the second suggestion and so on:
index <- 2
sapply(mydf_1$suggestion, function(x) {if(length(x) < index) {NA} else {x[[index]]}})
This could be wrapped into a larger function with a bit more code if you need to automate...
In theory, you could test with is.null (see "How to test if list element exists?"), but I still got the same error with that approach.
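The length check above can be wrapped into a small helper. This is only a sketch: nth_or_na is a hypothetical name, and the suggestions list below is made-up stand-in data so the example runs without hunspell installed. It uses vapply instead of sapply so that the result is always a character vector:

```r
# Hypothetical helper: return the n-th element of each list entry,
# or NA when that entry has fewer than n elements.
nth_or_na <- function(lst, n) {
  vapply(
    lst,
    function(x) if (length(x) < n) NA_character_ else x[[n]],
    character(1)
  )
}

# Stand-in for mydf_1$suggestion: entries of length 2, 1 and 0.
suggestions <- list(c("banana", "bandana"), "popcorn", character(0))

first_suggestion  <- nth_or_na(suggestions, 1)
second_suggestion <- nth_or_na(suggestions, 2)
```

With the real data this would be called as nth_or_na(mydf_1$suggestion, 2).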
I have a question.
For example, I would like to remove a dataframe, df_to_remove, in R.
I can remove it in this way: rm("df_to_remove").
Suppose I store its name in a string variable called dataframe_name:
dataframe_name = "df_to_remove"
Why does the following command not work?
rm(eval(dataframe_name))
How can I remove a data frame in R using a string variable?
Use the list = argument in rm():
x <- 5
y <- 'x'
rm(list = y)
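Applied to the names from the question, a minimal sketch looks like this (the contents of df_to_remove are made up). rm(eval(dataframe_name)) fails because rm() does not evaluate its unnamed arguments: it expects each one to be a bare name or a character string, and eval(dataframe_name) is neither. The list argument, by contrast, is evaluated normally:

```r
# Made-up data frame standing in for the one in the question.
df_to_remove <- data.frame(a = 1:3)

# The name of the object, held in a string variable.
dataframe_name <- "df_to_remove"

# `list` takes a character vector of object names to remove.
rm(list = dataframe_name)

# Check whether the object is still there.
still_there <- exists("df_to_remove")
```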
I am trying to make use of the content of a data frame in a function. Here is a simplified example of my problem:
df <- data.frame(v1=1:10,v2=23:32)
df2 <- data.frame(v1=1:3,v2=3:5)
fxm <- function(x,y,q)
{
return(cbind(q[q[,2]==x,],y))
}
mapply(fxm,df[,1],df[,2],q=df2)
Error in q[, 2] : incorrect number of dimensions
If I add a print statement:
df <- data.frame(v1=1:10,v2=23:32)
df2 <- data.frame(v1=1:3,v2=3:5)
fxm <- function(x,y,q)
{
print(q)
return(cbind(q[q[,2]==x,],y))
}
mapply(fxm,df[,1],df[,2],q=df2)
I get:
[1] 1 2 3
Error in q[, 2] : incorrect number of dimensions
The data frame is converted to a vector of its first column for some reason. How can I stop this from happening, and have the whole dataframe accessible to my function?
I am trying to select a subset of the dataframe and returning it based on the other two parameters of the function, which is why I need the whole dataframe to be passed to the function.
If I understand you correctly, you want the whole of q = df2 passed to the fxm function you define, am I right?
The problem is that mapply iterates over every argument supplied through ..., including q. A data frame is a list of columns, so mapply steps through the columns of df2 in the same way it steps through the elements of df[,1] and df[,2], and the first call receives only the first column of df2 as q. To pass the whole data frame to every call, supply it through the MoreArgs parameter of mapply, like this:
df <- data.frame(v1=1:10,v2=23:32)
df2 <- data.frame(v1=1:3,v2=3:5)
fxm <- function(x,y,q)
{
print(q)
return(cbind(q[q[,2]==x,],y))
}
mapply(fxm,df[,1],df[,2], MoreArgs = list(q=df2))
This still throws an error for me, but the cause is now elsewhere in fxm. The printed output shows that the whole data.frame reaches the function, which solves your original problem.
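The remaining error appears to come from cbind() being asked to combine a zero-row subset with a single value of y whenever no row of q matches x. The sketch below is one possible way around that, not the asker's intended logic: returning NULL for non-matching values, using drop = FALSE, and setting SIMPLIFY = FALSE are all assumptions added for illustration.

```r
df  <- data.frame(v1 = 1:10, v2 = 23:32)
df2 <- data.frame(v1 = 1:3, v2 = 3:5)

fxm <- function(x, y, q) {
  # Keep the rows of q whose second column equals x (still a data frame).
  rows <- q[q[, 2] == x, , drop = FALSE]
  # Assumption: when nothing matches, return NULL instead of failing in cbind().
  if (nrow(rows) == 0) return(NULL)
  cbind(rows, y = y)
}

# MoreArgs passes df2 whole to every call; SIMPLIFY = FALSE keeps the
# per-call results as a plain list.
res <- mapply(fxm, df[, 1], df[, 2],
              MoreArgs = list(q = df2), SIMPLIFY = FALSE)
```

Here res has one entry per row of df; only the entries for x equal to 3, 4 and 5 are non-NULL, because those are the only values present in the second column of df2.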
I have a list, and when I apply sort() the type changes to 'integer', which I don't understand. Any help is appreciated.
myfile.csv is a single column with values {"a","a","c","b","c","a"}
The code is as follows:
temp <- read.csv("myfile.csv",header=TRUE)
typeof(temp) ## prints: "list"
temp2 <- sort(temp[,1])
typeof(temp2) ## prints: "integer"
Now I can't refer to elements of temp2 using temp2[1,] or temp2[2,]; I get the error
Error in `[.default`(temp3, 1, ) : incorrect number of dimensions
Use this command and temp2 will be a data frame with sorted values:
temp2 <- temp[order(temp[ , 1]), , drop = FALSE]
temp2 <- sort(temp[,1]) takes the first column of the data.frame temp, sorts it, and assigns it to temp2. The result is an atomic vector (possibly with additional attributes) because data.frame columns are atomic vectors (possibly with additional attributes). A vector has no dimensions, which is why temp2[1,] fails. If you want the first element of temp2, you can use temp2[1]. You should study help("[").
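A short sketch of the difference, using an in-memory data frame in place of myfile.csv (the column name x is an assumption). stringsAsFactors = TRUE is set explicitly to reproduce the "integer" result from the question: the column is then a factor, which is stored as integer codes. In R 4.0.0 and later, read.csv() and data.frame() keep strings as character by default.

```r
# Stand-in for read.csv("myfile.csv"); the column name `x` is made up.
temp <- data.frame(x = c("a", "a", "c", "b", "c", "a"),
                   stringsAsFactors = TRUE)

# A data frame is a list of columns.
type_of_frame <- typeof(temp)

# Sorting one column gives a plain factor, stored as integer codes.
sorted_vector <- sort(temp[, 1])
type_of_sorted <- typeof(sorted_vector)

# Reordering the rows keeps the data frame; drop = FALSE stops the
# single-column result from collapsing to a vector.
temp2 <- temp[order(temp[, 1]), , drop = FALSE]
```

Because temp2 is still a data frame, row indexing such as temp2[1, ] works on it, while sorted_vector needs single-index access such as sorted_vector[1].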