How can I make sure each single value contained in a dataframe is a string?
Moreover, is there a way I can add a prefix to each value contained in a dataframe? (for example, turning a 0.02 to "X0.02")
We can loop through the columns of the data.frame with lapply, convert each one to character, and assign the output back to the dataset. The [] on the left-hand side keeps the original data.frame structure and attributes; without it, dat would be overwritten with a plain list.
dat[] <- lapply(dat, as.character)
Or, if there is at least one character column, converting to a matrix and then back to a data.frame will also make every element character, because a matrix can hold only one type
as.data.frame(as.matrix(dat), stringsAsFactors = FALSE)
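To make the role of [] concrete, here is a minimal sketch. The data frame dat and its columns are made up for illustration and are not the asker's data.

```r
# Hypothetical example data standing in for the asker's data frame
dat <- data.frame(value = c(0.02, 1.5),
                  count = c(1L, 2L),
                  label = c("u", "v"),
                  stringsAsFactors = FALSE)

# Without []: lapply returns a plain list, not a data frame
as_list <- lapply(dat, as.character)
class(as_list)

# With []: the columns are replaced in place and dat stays a data.frame
dat[] <- lapply(dat, as.character)
class(dat)
sapply(dat, class)
```

After the last assignment every column of dat should report class "character".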
For the second question (adding a prefix to each value)
dat[] <- lapply(dat,function(x) paste0("X", x))
Or in tidyverse
library(dplyr)
library(stringr)
dat %>%
  mutate_all(list(~ str_c("X", .)))
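In dplyr 1.0.0 and later, mutate_all is superseded by across. The sketch below shows the same prefixing step in that style; the data frame is hypothetical, and paste0 is used here in place of str_c only to keep the example free of extra packages.

```r
library(dplyr)

# Hypothetical numeric data
dat <- data.frame(a = c(0.02, 1.5), b = c(3, 4))

# across(everything(), ...) applies the function to every column;
# paste0 coerces each value to character before adding the prefix
out <- dat %>%
  mutate(across(everything(), ~ paste0("X", .x)))

out$a
```

Because paste0 converts its arguments to character, this also covers the first question: every column of out is a character vector.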
Related
I have been trying to extract multiple patterns from a sequence in each row of a data frame and return those patterns in a new column. The problem is that str_extract_all gives me a list, and I don't know how to unlist it.
I have been trying to use the code at the bottom. Neither unnest nor unlist works inside mutate.
dc <- z %>%
mutate(sequence_match = str_extract_all(z$Sequence,
c("R..S", "R..T", "R..Y")))
You can collapse the matches for each row into one comma-separated string with toString.
library(dplyr)
library(stringr)
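# The three patterns are merged into one regex, "R..[STY]", so that
# every pattern is tried on every row; toString goes to sapply, not str_extract_all
dc <- z %>%
  mutate(sequence_match = sapply(str_extract_all(Sequence, "R..[STY]"),
                                 toString))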
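As a small worked example of the list-to-string step, the sketch below uses made-up sequences (not the asker's data) and assumes a single combined regex, "R..[STY]", is an acceptable stand-in for the three separate patterns.

```r
library(stringr)

# Hypothetical sequences standing in for z$Sequence
z <- data.frame(Sequence = c("ARAASGGRTTT", "RKKYAA", "GGGG"),
                stringsAsFactors = FALSE)

# str_extract_all returns a list with one character vector per row
matches <- str_extract_all(z$Sequence, "R..[STY]")

# toString collapses each vector into a single comma-separated string;
# rows without a match become an empty string
z$sequence_match <- sapply(matches, toString)
z
```

Note that regex matching is non-overlapping, so two motifs that share characters would be reported only once.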
I'm creating an empty list of dataframes that I will append later using lapply.
library(tidyverse)
library(dplyr)
library(purrr)
my.list <- lapply(1:192, function(x, nr = 468, nc = 1) { data.frame(symbol = matrix(nrow=nr, ncol=nc)) })
str(my.list)
If you obtain the structure of my.list you will notice that the structure of the columns within each dataframe is "logical". I would like the structure of the column in each dataframe to be character rather than logical.
Can I change anything within my lapply function above so that the columns in the resulting list of dataframes are character? Or how best would I go about this task? I'm creating this empty list of dataframes because I understand that R works faster if it doesn't have to constantly append files. Thus my next step is to perform a map function to populate each dataframe in this list of dataframes with character data.
The issue is that a bare NA is logical by default (NA_logical_), so a matrix or column filled with NA is logical. To create a character column, use NA_character_ instead. The existing list can be fixed with
my.list <- lapply(my.list, function(x) {x[] <- lapply(x, as.character); x})
Or while creating the data.frame column, use
my.list <- lapply(1:192, function(x) data.frame(symbol = rep(NA_character_, 468)))
The matrix route to a single-column data.frame is not ideal and is sometimes incorrect (a matrix can hold only a single type, whereas data.frame columns can each have a different type). The easiest option is to repeat NA_character_ n times to create a single-column data.frame with n rows
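The difference between the two kinds of NA can be checked directly. This is a minimal sketch; n_frames and n_rows are illustrative names for the 192 and 468 used in the question.

```r
# A bare NA is logical; NA_character_ is a character missing value
class(NA)
class(NA_character_)

n_frames <- 192   # number of data frames in the list (from the question)
n_rows   <- 468   # rows per data frame (from the question)

# Preallocate a list of single-column data frames with a character column
my.list <- lapply(seq_len(n_frames), function(i) {
  data.frame(symbol = rep(NA_character_, n_rows),
             stringsAsFactors = FALSE)
})

str(my.list[[1]])
```

Passing stringsAsFactors = FALSE explicitly keeps the column character on R versions before 4.0 as well.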
Please consider the following three examples:
library(tidyverse)
x_vector <- c("Device=iPhone", "Device=Samsung Galaxy")
x_df <- as.data.frame(c("Device=iPhone", "Device=Samsung Galaxy"))
x_tibble <- as_tibble(c("Device=iPhone", "Device=Samsung Galaxy"))
I now want to remove part of each string, i.e. the "Device=" substring. It works for a vector, and it also works for a data frame (if I specify the respective column), but I get a weird result for a tibble:
(the desired output would be the ones shown below for the vector and df example)
output_vector <- str_remove(x_vector, "Device=")
output_df <- str_remove(x_df[,1], "Device=")
output_tibble <- str_remove(x_tibble[,1], "Device=")
Can anyone please explain why this doesn't work with tibbles and how I can get it working with tibbles?
Thanks!
The issue is that a tibble does not drop its dimensions when we subset with [,1]. The result is still a tibble with a single column, whereas the same call on a data.frame returns a plain vector.
library(stringr)
class(x_tibble[,1])
#[1] "tbl_df" "tbl" "data.frame"
class(x_df[,1])
#[1] "factor"   (in R < 4.0; "character" in R >= 4.0, where stringsAsFactors defaults to FALSE)
Instead, we can use [[ to extract the column as a vector because str_remove expects a vector as input (?str_remove - string - Input vector. Either a character vector, or something coercible to one.)
str_remove(x_tibble[[1]], "Device=")
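The contrast between [ and [[ on a tibble can be sketched as follows. The tibble is rebuilt here with tibble() so the block is self-contained; the column name value is an assumption chosen for illustration.

```r
library(tibble)
library(stringr)

x_tibble <- tibble(value = c("Device=iPhone", "Device=Samsung Galaxy"))

# [ , 1] keeps the tibble structure: still a one-column data frame
is.data.frame(x_tibble[, 1])

# [[1]] (or $value) extracts the column as a plain character vector
is.character(x_tibble[[1]])

output_tibble <- str_remove(x_tibble[[1]], "Device=")
output_tibble

# To keep the cleaned values inside the tibble, assign back to the column
x_tibble$value <- str_remove(x_tibble$value, "Device=")
```

Here output_tibble is the character vector "iPhone", "Samsung Galaxy", matching the vector and data frame cases.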
Is there something in R to call like df$col1:df$col5?
I would like to convert the character elements to numeric with as.numeric, so I would like to do something like as.numeric(df$col1:df$col5) to convert all elements in these columns to numeric.
df = mtcars
If you want to access multiple columns by column number
lapply(df[,c(1:3,5)], as.numeric) #Or as.character if you want
If you want to access by colnames
lapply(df[,c('mpg','cyl')], as.numeric)
You can use a numeric index to get a range of columns, as suggested in the comments.
But if the columns are not in order, you can construct a vector of names and use that (rather than writing the names out explicitly, as in the other answer)
my_cols <- paste0('col', 1:5)
my_df[, my_cols] <- lapply(my_df[, my_cols], as.numeric)
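A short sketch of the name-vector approach on made-up data follows; the columns id, col1 and col2 are hypothetical.

```r
# Hypothetical data: numbers stored as character strings
my_df <- data.frame(id   = c("a", "b"),
                    col1 = c("1", "2"),
                    col2 = c("3.5", "4"),
                    stringsAsFactors = FALSE)

# Build the column names programmatically instead of typing them out
my_cols <- paste0("col", 1:2)

# List-style indexing (no comma) always returns a data frame,
# even when my_cols has length one
my_df[my_cols] <- lapply(my_df[my_cols], as.numeric)

sapply(my_df, class)
```

The list-style form my_df[my_cols] is used here because my_df[, my_cols] drops to a plain vector when only one column is selected, in which case lapply would iterate over the elements rather than the columns.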
I have a data frame (my_df) with columns named after individual county numbers. I melted/cast the data from a much larger set to get to this point. The first column name is year and it is a list of years from 1970-2011. The next 3010 columns are counties. However, I'd like to rename the county columns to be "column_"+county number.
This code runs in R without error, but for whatever reason it doesn't update the column names; they remain just the numbers. Any help?
new_col_names = paste0("county_",colnames(my_df[,2:ncol(my_df)]))
colnames(my_df[,2:ncol(my_df)]) = new_col_names
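The problem is the subsetting within the colnames call: the new names are applied to the extracted subset, and when that subset is written back into my_df only its values are kept, so the names of my_df itself stay as they were.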
Try names(my_df) <- c(names(my_df)[1], new_col_names) instead.
Note: names and colnames are interchangeable for data.frame objects.
EDIT: alternate approach suggested by flodel, subsetting outside the function call:
names(my_df)[-1] <- new_col_names
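The renaming can be tried on a small made-up data frame. The county numbers 1001 and 1003 below are placeholders, and check.names = FALSE is needed only so that the example can have purely numeric column names.

```r
# Hypothetical data frame with a year column and numeric county columns
my_df <- data.frame(year = 1970:1972,
                    `1001` = 1:3,
                    `1003` = 4:6,
                    check.names = FALSE)

# Build the new names from every column except the first
new_col_names <- paste0("county_", names(my_df)[-1])

# Subset the names vector, not the data frame, on the left-hand side
names(my_df)[-1] <- new_col_names

names(my_df)
```

The year column keeps its name and the remaining columns become county_1001 and county_1003.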
colnames() is mainly meant for a matrix (or matrix-like object), although it also works on a data.frame; for a data.frame, simply try names()
Example:
# Create the example data first, then derive the new names from it
my_df <- data.frame(a=c(1,2,3,4,5), b=rnorm(5), c=rnorm(5), d=rnorm(5))
new_col_names <- paste0("county_", names(my_df)[-1])
names(my_df) <- c(names(my_df)[1], new_col_names)