How to use the method separate with a character? - r

cerveceria_dataset$CLIENTE <- separate(cerveceria_dataset$CLIENTE, col = CLIENTE, into = c("Nombre","Apellido"), sep = ";")
This code gives me the error:
Error in UseMethod("separate") :
no applicable method for 'separate' applied to an object of class "character"

I think you are applying the function incorrectly: separate() expects the whole data frame as its first argument, not a single column. Try:
cerveceria_dataset <- tidyr::separate(cerveceria_dataset,
col = CLIENTE, into = c("Nombre","Apellido"), sep = ";")

tidyr::separate splits a string column of a data frame into multiple columns. If you are just working with a single character vector, you probably want something like stringr::str_split, which splits a character vector into a list of vectors (or a matrix if you use str_split_fixed).
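To make the difference concrete, here is a minimal sketch. The two-row data frame is made up for illustration (the real cerveceria_dataset is not shown in the question), and it assumes tidyr and stringr are installed.

```r
library(tidyr)
library(stringr)

# Hypothetical stand-in for the real data, which is not shown in the question.
cerveceria_dataset <- data.frame(CLIENTE = c("Ana;Lopez", "Luis;Perez"),
                                 stringsAsFactors = FALSE)

# separate() takes the whole data frame and returns a data frame
# in which CLIENTE has been replaced by the two new columns.
by_column <- separate(cerveceria_dataset, col = CLIENTE,
                      into = c("Nombre", "Apellido"), sep = ";")

# str_split_fixed() works on a bare character vector and returns
# a character matrix with one row per input string.
parts <- str_split_fixed(cerveceria_dataset$CLIENTE, ";", n = 2)
```

Passing cerveceria_dataset$CLIENTE to separate() fails because that expression is a character vector, and separate() has no method for that class.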

Related

convert character column and then split it into multiple new boolean columns using r mutate

I am attempting to split out a flags column into multiple new columns in r using mutate_at and then separate functions. I have simplified and cleaned my solution as seen below, however I am getting an error that indicates that the entire column of data is being passed into my function rather than each row individually. Is this normal behaviour that just requires me to loop over each element of x inside my function, or am I calling the mutate_at function incorrectly?
example data:
dataVariable <- data.frame(c_flags = c(".q.q.q","y..i.o","0x5a",".lll.."))
functions:
dataVariable <- read_csv("...",
col_types = cols(
c_date = col_datetime(format = ""),
c_dbl = col_double(),
c_flags = col_character(),
c_class = col_factor(c("a", "b", "c")),
c_skip = col_skip()
))
funTranslateXForNewColumn <- function(x){
binary = ""
if(startsWith(x, "0x")){
binary=hex2bin(x)
} else {
binary = c(0,0,0,0,0,0)
splitFlag = strsplit(x, "")[[1]]
for(i in splitFlag){
flagVal = 1
if(i=="."){
flagVal = 0
}
binary=append(binary, flagVal)
}
}
return(paste(binary[4:12], collapse='' ))
}
mutate_at(dataVariable, vars(c_flags), funs(funTranslateXForNewColumn(.)))
separate(dataVariable, c_flags, c(NA, "flag_1","flag_2","flag_3","flag_4","flag_5","flag_6","flag_7","flag_8","flag_9"), sep="")
The error I am receiving is:
Warning messages:
1: Problem with `mutate()` input `c_flags`.
i the condition has length > 1 and only the first element will be used
After translating the string into an appropriate binary representation of the flags, I will then use the separate function to split it into new columns.
Similar to OP's logic but maybe shorter :
dataVariable$binFlags <- sapply(strsplit(dataVariable$c_flags, ''), function(x)
paste(as.integer(x != '.'), collapse = ''))
If you want to do this using dplyr we can implement the same logic as :
library(dplyr)
dataVariable %>%
mutate(binFlags = purrr::map_chr(strsplit(c_flags, ''),
~paste(as.integer(. != '.'), collapse = '')))
# c_flags binFlags
#1 .q.q.q 010101
#2 y..i.o 100101
#3 .lll.. 011100
mutate_at/across is used when you want to apply a function to multiple columns. Moreover, as far as I can see, you are creating only one new binary column here and not multiple new columns as mentioned in your post.
I was able to get the outcome I desired by replacing the mutate_at function with:
dataVariable$binFlags <- mapply(funTranslateXForNewColumn, dataVariable$c_flags)
However I want to know how to use the mutate_at function correctly.
credit to: https://datascience.stackexchange.com/questions/41964/mutate-with-custom-function-in-r-does-not-work
The above link also includes the solution to get this function to work which is to vectorize the function:
v_funTranslateXForNewColumn <- Vectorize(funTranslateXForNewColumn)
mutate_at(dataVariable, vars(c_flags), funs(v_funTranslateXForNewColumn(.)))
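For anyone wondering why vectorizing helps: mutate hands the whole column to the function at once, and if() can only test a single value. Below is a rough base R sketch of the idea. The helper flag_to_bits is a simplified, hypothetical stand-in for the function in the question (it skips the hex case instead of calling hex2bin).

```r
# Hypothetical scalar helper: correct for one string at a time only,
# because if() needs a single TRUE/FALSE condition.
flag_to_bits <- function(x) {
  if (startsWith(x, "0x")) {
    return(NA_character_)  # hex input is out of scope for this sketch
  }
  chars <- strsplit(x, "")[[1]]
  paste(as.integer(chars != "."), collapse = "")
}

flags <- c(".q.q.q", "y..i.o", ".lll..")

# Vectorize() wraps the helper so that it is called once per element.
v_flag_to_bits <- Vectorize(flag_to_bits, USE.NAMES = FALSE)
bits <- v_flag_to_bits(flags)

# vapply() does the same job and states the expected return type.
bits_checked <- vapply(flags, flag_to_bits, character(1), USE.NAMES = FALSE)
```

Either wrapped version can be used inside mutate(), since it now accepts a whole column and returns one value per row.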

Converting list of Characters to Named num in R

I want to create a dataframe with 3 columns.
#First column
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
"ABC_E1", "ABC_E2", "ABC_E3",
"ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
These names in column 1 are a bunch of named results of the cor.test function. The second column should consist of the correlation coefficients I get by writing ABC_D1$estimate, ABC_D2$estimate.
My problem is now that I don't want to add the $estimate manually to every single name of the first column. I tried this:
df1$C2 = paste0(df1$C1, '$estimate')
But this doesn't work, it only gives me this back:
"ABC_D1$estimate", "ABC_D2$estimate", "ABC_D3$estimate",
"ABC_E1$estimate", "ABC_E2$estimate", "ABC_E3$estimate",
"ABC_F1$estimate", "ABC_F2$estimate", "ABC_F3$estimate")
class(df1$C2)
[1] "character"
How can I get the numeric result for ABC_D1$estimate in my dataframe? How can I convert these characters into Named num? The 3rd column should consist of the results of $p.value.
As pointed out by @DSGym there are several problems, including that it is not very convenient to have a list of character names; it would be better to have a list of objects instead.
Anyway, I think you can get where you want using:
estimates <- lapply(name_list, function(dat) {
dat_l <- get(dat)
dat_l[["estimate"]]
}
)
cbind(name_list, estimates)
This is not really advisable but given those premises...
OK, I think I now know what you need.
eval(parse(text = paste0("ABC_D1", '$estimate')))
You connect the two strings and use the functions parse and eval to get your results.
This is how to do it for your whole data.frame:
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
"ABC_E1", "ABC_E2", "ABC_E3",
"ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
library(purrr) # provides map_dbl
df1$C2 <- map_dbl(paste0(df1$C1, '$estimate'), function(x) eval(parse(text = x)))
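If you would rather avoid eval(parse()), a possible alternative is to fetch the objects by name with mget() and pull out both values with vapply(). This is only a sketch: the two cor.test results below are generated from random numbers purely so the example runs, and are not the asker's data.

```r
# Made-up cor.test results standing in for the asker's objects.
set.seed(1)
x <- rnorm(20)
ABC_D1 <- cor.test(x, x + rnorm(20))
ABC_D2 <- cor.test(x, rnorm(20))

name_list <- c("ABC_D1", "ABC_D2")

# mget() looks the objects up by name and returns them as a named list.
results <- mget(name_list)

df1 <- data.frame(
  C1 = name_list,
  # unname() drops the "cor" label so a plain numeric column is stored.
  C2 = vapply(results, function(r) unname(r$estimate), numeric(1)),
  C3 = vapply(results, function(r) r$p.value, numeric(1)),
  row.names = NULL
)
```

Once the results are in a list, the third column ($p.value) comes from the same pattern as the second.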

Resolving a formatter string

Suppose I have the following:
format.string <- "#AB#-#BC#/#DF#" #wanted to use $ but it is problematic
value.list <- c(AB="a", BC="bcd", DF="def")
I would like to apply the value.list to the format.string so that the named value is substituted. So in this example I should end up with a string: a-bcd/def
I tried to do it like the following:
resolved.string <- lapply(names(value.list),
function(x) {
sub(x = save.data.path.pattern,
pattern = paste0(c("#",x,"#"), collapse=""),
replacement = value.list[x]) })
But it doesn't seem to be working correctly. Where am I going wrong?
The glue package is designed for this. You can change the opening and closing delimiters using .open and .close, but they have to be different. Also note that value.list has to be either a list or a dataframe:
library(glue)
format.string <- "{AB}-{BC}/{DF}"
value.list <- list(AB="a", BC="bcd", DF="def")
glue_data(value.list, format.string)
# a-bcd/def
To answer your actual question, by using lapply over names(value.list) you, as your output shows, take each of the elements of value.list and perform the replacement. However, all this happens independently, i.e., the replacements aren't ultimately combined to a single result.
To make something very similar to your approach work, we can use Reduce, which does exactly this combining:
Reduce(function(x, y) sub(paste0(c("#", y, "#"), collapse = ""), value.list[y], x),
init = format.string, names(value.list))
# [1] "a-bcd/def"
If we call the anonymous function f, then the result is
f(f(f(format.string, "AB"), "BC"), "DF")
exactly as you intended, I believe.
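Spelled out step by step, the chaining looks roughly like this. The function name f and the intermediate variables are just labels for illustration; fixed = TRUE is added here so the pattern is treated as a literal string.

```r
format.string <- "#AB#-#BC#/#DF#"
value.list <- c(AB = "a", BC = "bcd", DF = "def")

# One substitution step: replace "#key#" in `text` with the matching value.
f <- function(text, key) {
  sub(paste0("#", key, "#"), value.list[[key]], text, fixed = TRUE)
}

# Each step feeds its result into the next one.
step1 <- f(format.string, "AB")
step2 <- f(step1, "BC")
step3 <- f(step2, "DF")

# Reduce() performs the same chaining for any number of keys.
resolved <- Reduce(f, names(value.list), format.string)
```

With lapply, each call starts again from the original format.string, which is why three partially resolved strings come back instead of one fully resolved string.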
We can use gsubfn that can take a key/value pair as replacement to change the pattern with the 'value'
library(gsubfn)
gsub("#", "", gsubfn("[^#]+", as.list(value.list), format.string))
#[1] "a-bcd/def"
NOTE: 'value.list' is a vector and not a list

How can I write this loop?

I have six dataframes (a-f), each of which has a column 'nq'. I want to find the max, min and average of the nq columns.
classes <- c("a","b","c","d","e","f")
for (i in classes){
format(max(i$nq), scientific = TRUE)
format(min(i$nq), scientific = TRUE)
format(mean(i$nq), scientific = TRUE)
}
But the code is not working. Can you please help?
You can't use a character value as a data.frame name. The value "a" is not the same as the data.frame a.
You probably shouldn't have a bunch of data.frames lying around. You probably want to have them all in a list. Then you can lapply over them to get results.
mydata <- list(
a = data.frame(nq=runif(10)),
b = data.frame(nq=runif(10)),
c = data.frame(nq=runif(10)),
d = data.frame(nq=runif(10))
)
then you can do
lapply(mydata, function(x)
format(c(max(x$nq), min(x$nq), mean(x$nq)), scientific = TRUE)
)
to get all the values at once.
The reason it is not working is because 'i' is a character/string. As already mentioned by Mr.Flick you have to make it into a list.
Alternatively, instead of writing i$nq in your loop you can write get(i)$nq. The get() function will search the workspace for an object by name and return the object itself. However, this is not as clean as putting the data frames into a list and using lapply.
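A small sketch of both routes, using two tiny made-up data frames instead of the asker's six. Note that values computed inside a for loop are not printed automatically, so the loop needs an explicit print().

```r
# Made-up data frames standing in for the asker's a-f.
a <- data.frame(nq = c(1, 2, 3))
b <- data.frame(nq = c(10, 20, 60))
classes <- c("a", "b")

# Route 1: look each data frame up by name inside the loop.
for (i in classes) {
  nq <- get(i)$nq
  print(format(c(max = max(nq), min = min(nq), mean = mean(nq)),
               scientific = TRUE))
}

# Route 2: collect the data frames into a named list first, then lapply.
summaries <- lapply(mget(classes), function(d) {
  c(max = max(d$nq), min = min(d$nq), mean = mean(d$nq))
})
```

Route 2 keeps the numbers as numbers, so they can be formatted or combined into a table afterwards.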

Avoid that space in column name is replaced with period (".") when using read.csv()

I am using R to do some data pre-processing, and here is the problem that I am faced with: I input the data using read.csv(filename,header=TRUE), and then the space in variable names became ".", for example, a variable named Full Code became Full.Code in the generated dataframe. After the processing, I use write.xlsx(filename) to export the results, while the variable names are changed. How to address this problem?
Besides, in the output .xlsx file, the first column becomes indices (i.e., 1 to N), which is not what I am expecting.
If you set check.names=FALSE in read.csv when you read the data in, then the names will not be changed and you will not need to edit them before writing the data back out. This of course means that you would need to quote the column names (back quotes in some cases) or refer to the columns by location rather than name while editing.
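A quick way to see the effect is to read a small CSV from a string. The column names and values below are invented for illustration.

```r
# Invented CSV content with a space in the first column name.
csv_text <- "Full Code,Value\nA1,10\nB2,20"

# Default behaviour: names are made syntactic, so the space becomes a dot.
default_names <- names(read.csv(text = csv_text, header = TRUE))

# With check.names = FALSE the original names are kept as written.
kept <- read.csv(text = csv_text, header = TRUE, check.names = FALSE)
kept_names <- names(kept)

# Columns with spaces need back quotes or [[ ]] when referenced by name.
codes_backtick <- kept$`Full Code`
codes_bracket <- kept[["Full Code"]]
```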
To get spaces back in the names, do this (right before you export - R does let you have spaces in variable names, but it's a pain):
# A simple regular expression to replace dots with spaces
# This might have unintended consequences, so be sure to check the results
names(yourdata) <- gsub(x = names(yourdata),
pattern = "\\.",
replacement = " ")
To drop the first-column index, just add row.names = FALSE to your write.xlsx(). That's a common argument for functions that write out data in tabular format (write.csv() has it, too).
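Here is a short sketch of the row.names = FALSE behaviour using write.csv(), since that needs no extra package. The data frame and the temporary file are made up for the example.

```r
# Made-up data with a space in a column name.
yourdata <- data.frame(`Full Code` = c("A1", "B2"), Value = c(10, 20),
                       check.names = FALSE)

out_file <- tempfile(fileext = ".csv")

# row.names = FALSE leaves out the leading 1..N index column.
write.csv(yourdata, out_file, row.names = FALSE)

# Read the raw lines back to inspect what was written.
written <- readLines(out_file)
```

The header line of the file then starts directly with "Full Code" rather than with an empty index column.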
Here's a function (sorry, I know it could be refactored) that makes nice column names even if there are multiple consecutive dots and trailing dots:
makeColNamesUserFriendly <- function(ds) {
# FIXME: Repetitive.
# Convert any number of consecutive dots to a single space.
names(ds) <- gsub(x = names(ds),
pattern = "(\\.)+",
replacement = " ")
# Drop the trailing spaces.
names(ds) <- gsub(x = names(ds),
pattern = "( )+$",
replacement = "")
ds
}
Example usage:
ds <- makeColNamesUserFriendly(ds)
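A more compact variant of the same idea, offered as a sketch rather than a drop-in replacement: collapse runs of dots into one space, then let trimws() remove any leading or trailing space. The function name tidy_names and the sample column names are made up.

```r
# Hypothetical shorter version of the helper above.
tidy_names <- function(ds) {
  # Runs of dots become one space; trimws() strips spaces at both ends.
  names(ds) <- trimws(gsub("\\.+", " ", names(ds)))
  ds
}

# Sample data with consecutive and trailing dots in the names.
ds <- data.frame(Full..Code. = 1:2, Total.Amount = 3:4)
ds <- tidy_names(ds)
cleaned <- names(ds)
```

Unlike the original, this also strips a leading space, which matters only if a name starts with a dot.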
Just to add to the answers already provided, here is another way of replacing the "." or any other kind of punctuation in column names, using a regex with the stringr package:
require("stringr")
colnames(data) <- str_replace_all(colnames(data), "[:punct:]", " ")
For example try:
data <- data.frame(variable.x = 1:10, variable.y = 21:30, variable.z = "const")
colnames(data) <- str_replace_all(colnames(data), "[:punct:]", " ")
and
colnames(data)
will give you
[1] "variable x" "variable y" "variable z"
