imported tibble from textfile. Many numeric columns are imported as "chr". I guess it's because they contain a "," instead of a ".".
My goal is to write a loop which runs through the names of desired columns, replaces "," with "." and converts columns into "num".
Little example:
data <- data.frame("A1" =c("2,1","2,1","2,1"), "A2" =c("1,3","1,3","1,3"),
stringsAsFactors = F) %>% as.tibble() #example data
colname <- c("A1", "A2") #creating variable for loop
for(i in colname) {
nam <- paste0("data$", i)
assign(nam, as.numeric(gsub(",",".", eval(parse(text = paste0("data$",i))))) )
}
Instead of overwriting the existing column, R creates a new variable:
data$A1 # that's the existing column as part of the tibble
[1] "2,1" "2,1" "2,1"
`data$A1` # thats just a new variable. mind the little``
[1] 2.1 2.1 2.1
I also tried to assign (<-) the new numeric values via eval, but that does not work either.
eval(parse(text = paste0("data$", i))) <- as.numeric(
gsub(",",".", eval(parse(text = paste0("data$",i)))))
Error: target of assignment expands to non-language object
Any suggestions on how to transform? I have the same issue with other columns that I want to aggregate to a new variable. This variable should also be part of the existing tibble. I could do it by hand. This would take lots of time and probably produce many mistakes.
Thanks a lot!
Sam
As you are already working with the tidyverse, you can use dplyr::mutate_at and the colname variable you have already defined.
data %>%
mutate_at(.vars = colname,
.funs = function(x) { as.numeric(gsub(",", ".", x)) })
Related
I have a dataset with columns that contain information of a code + name, which I would like to separate into 2 columns. So, just an example:
Column E5000_A contain values like `0080002. ALB - Democratic Party' in one cell, I would like two columns one containing the code 0080002, and the other containing the other info.
I have 8 more columns with values very similar (E5000_A until E5000_H). This is the code that I am writing.
cols2 <- c("E5000_A" , "E5000_B" , "E5000_C" , "E5000_D" ,
"E5000_E" , "E5000_F" , "E5000_G" , "E5000_H" )
for(i in cols2){
cses_imd_m <- cses_imd_m %>% mutate(substr(i, 1L, 7L))
}
But for some reason it is only generating a new column for the E5000_A and the loop does not go to the other variables. What am I doing wrong? Let me know if you need more details about the code or data frame.
data.frame approach
# to extract codes
df %>%
mutate_at(.vars = vars(c("E5000_A", "E5000_B", "E5000_C", "E5000_D", "E5000_E",
"E5000_F", "E5000_G", "E5000_H")),
.funs = function(x) str_extract("^\\d+", x))
You can also use across() inside of mutate().
If you want to use for loop
col_names <- c("E5000_A", "E5000_B", "E5000_C", "E5000_D", "E5000_E", "E5000_F", "E5000_G", "E5000_H")
for (i in col_names) {
df[,sprintf("code_%s", i)] <- str_extract("^\\d+", df[,i])
df[,sprintf("party_%s", i)] <- gsub(".*\\.", "", df[,i]) %>% str_trim() # remove all before dot (.)
}
I want to create a dataframe with 3 columns.
#First column
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
"ABC_E1", "ABC_E2", "ABC_E3",
"ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
These names in column 1 are a bunch of named results of the cor.test function. The second column should consist of the correlation coefficents I get by writing ABC_D1$estimate, ABC_D2$estimate.
My problem is now that I dont want to add the $estimate manually to every single name of the first column. I tried this:
df1$C2 = paste0(df1$C1, '$estimate')
But this doesnt work, it only gives me this back:
"ABC_D1$estimate", "ABC_D2$estimate", "ABC_D3$estimate",
"ABC_E1$estimate", "ABC_E2$estimate", "ABC_E3$estimate",
"ABC_F1$estimate", "ABC_F2$estimate", "ABC_F3$estimate")
class(df1$C2)
[1] "character
How can I get the numeric result for ABC_D1$estimate in my dataframe? How can I convert these characters into Named num? The 3rd column should constist of the results of $p.value.
As pointed out by #DSGym there are several problems, including the it is not very convenient to have a list of character names, and it would be better to have a list of object instead.
Anyway, I think you can get where you want using:
estimates <- lapply(name_list, function(dat) {
dat_l <- get(dat)
dat_l[["estimate"]]
}
)
cbind(name_list, estimates)
This is not really advisable but given those premises...
Ok I think now i know what you need.
eval(parse(text = paste0("ABC_D1", '$estimate')))
You connect the two strings and use the functions parse and eval the get your results.
This it how to do it for your whole data.frame:
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
"ABC_E1", "ABC_E2", "ABC_E3",
"ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
df1$C2 <- map_dbl(paste0(df1$C1, '$estimate'), function(x) eval(parse(text = x)))
I am a beginner in R and while trying to make some exercises I got stuck in one of them. My data.frame is as follow:
LanguageWorkedNow LanguageNextYear
Java; PHP Java; C++; SQL
C;C++;JavaScript; JavaScript; C; SQL
And I need to know the variables which are in LanguageNextYear and are not in LanguageWorkedNow, to set a list with the different ones.
Sorry if the question is duplicated, I'm quite new here and tried to find it, but with no success.
base R
Idea: mapply setdiff on strsplitted NextYear and WorkedNow, and then paste it using collapse=";":
df$New <- with(df, {
a <- mapply(setdiff, strsplit(NextYear, ";"), strsplit(WorkedNow, ";"), SIMPLIFY = FALSE)
sapply(a, paste, collapse=";")
})
# SIMPLIFY = FALSE is needed in a general case, it doesn't
# affect the output in the example case
# Or if you use Map instead of mapply, that is the default, so
# it could also be...
df$New <- with(df,
sapply(Map(setdiff, strsplit(NextYear, ";"), strsplit(WorkedNow, ";")),
paste, collapse=";"))
data
df <- read.table(text = "WorkedNow NextYear
Java;PHP Java;C++;SQL
C;C++;JavaScript JavaScript;C;SQL
", header=TRUE, stringsAsFactors=FALSE)
Here's a solution using purrr package:
df = read.table(text = "
LanguageWorkedNow LanguageNextYear
Java;PHP Java;C++;SQL
C;C++;JavaScript JavaScript;C;SQL
", header=T, stringsAsFactors=F)
library(purrr)
df$New = map2_chr(df$LanguageWorkedNow,
df$LanguageNextYear,
~{x1 = unlist(strsplit(.x, split=";"))
x2 = unlist(strsplit(.y, split=";"))
paste0(x2[!x2%in%x1], collapse = ";")})
df
# LanguageWorkedNow LanguageNextYear New
# 1 Java;PHP Java;C++;SQL C++;SQL
# 2 C;C++;JavaScript JavaScript;C;SQL SQL
For each row you get your columns and you create vectors of values (separated by ;). Then you check which values of NextYear vector don't exist in WorkedNow vector and you create a string based on / combining those values.
The map function family will help you apply your logic / function to each row. In our case we use map2_chr as we have two inputs (your two columns) and we excpet a string / character output.
I have a vector of column names called tbl_colnames.
I would like to create a tibble with 0 rows and length(tbl_colnames) columns.
The best way I've found of doing this is...
tbl <- as_tibble(data.frame(matrix(nrow=0,ncol=length(tbl_colnames)))
and then I want to name the columns so...
colnames(tbl) <- tbl_colnames.
My question: Is there a more elegant way of doing this?
something like tbl <- tibble(colnames=tbl_colnames)
my_tibble <- tibble(
var_name_1 = numeric(),
var_name_2 = numeric(),
var_name_3 = numeric(),
var_name_4 = numeric(),
var_name_5 = numeric()
)
Haven't tried, but I guess it works too if instead of initiating numeric vectors of length 0 you do it with other classes (for example, character()).
This SO question explains how to do it with other R libraries.
According to this tidyverse issue, this won't be a feature for tribbles.
Since you want to combine a list of tibbles. You can just assign NULL to the variable and then bind_rows with other tibbles.
res = NULL
for(i in tibbleList)
res = bind_rows(res,i)
However, a much efficient way to do this is
bind_rows(tibbleList) # combine all tibbles in the list
For anyone still interested in an elegant way to create a 0-row tibble with column names given by a character vector tbl_colnames:
tbl_colnames %>% purrr::map_dfc(setNames, object = list(logical()))
or:
tbl_colnames %>% purrr::map_dfc(~tibble::tibble(!!.x := logical()))
or:
tbl_colnames %>% rlang::rep_named(list(logical())) %>% tibble::as_tibble()
This, of course, results in each column being of type logical.
The following command will create a tibble with 0 row and variables (columns) named with the contents of tbl_colnames
tbl <- tibble::tibble(!!!tbl_colnames, .rows = 0)
You could abuse readr::read_csv, which allow to read from string. You can control names and types, e.g.:
tbl_colnames <- c("one", "two", "three", "c4", "c5", "last")
read_csv("\n", col_names = tbl_colnames) # all character type
read_csv("\n", col_names = tbl_colnames, col_types = "lcniDT") # various types
I'm a bit late to the party, but for future readers:
as_tibble(matrix(nrow = 0, ncol = length(tbl_colnames)), .name_repair = ~ tbl_colnames)
.name_repair allows you to name you columns within the same function.
I'm new to R and have a problem
I am trying to reformat some data, and in the process I would like to rename the columns of the new data set.
here is how I have tried to do this:
first the .csv file is read in, lets say case1_case2.csv
then the name of the .csv file is broken up into two parts
each part is assigned to a vector
so it ends up being like this:
xName=case1
yName=case2
After I have put my data into new columns I would like to rename each column to be case1 and case2
to do this I tried using the rename function in R but instead of renaming to case1 and case2 the columns get renamed to xName and yName.
here is my code:
for ( n in 1:length(dirNames) ){
inFile <- read.csv(dirNames[n], header=TRUE, fileEncoding="UTF-8-BOM")
xName <- sub("_.*","",dirNames[n])
yName <- sub(".*[_]([^.]+)[.].*", "\\1", dirNames[n])
xValues <- inFile %>% select(which(str_detect(names(inFile), xName))) %>% stack() %>% rename( xName = values ) %>% subset( select = xName)
yValues <- inFile %>% select(which(!str_detect(names(inFile), xName))) %>% stack() %>% rename(yName = values, Organisms=ind)
finalForm <- cbind(xValues, yValues) %>% filter(complete.cases(.))
}
how can I make sure that the variables xName and yName are expanded inside of the rename() function
thanks.
You didn't provide a reproducible example, so I'll just demonstrate the idea in general. The rename function is part of the dplyr package.
You need to "unquote" the variable that contains the string you want to use as the new column name. The unquote operator is !! and you'll need to use the special := assignment operator to make unquoting on the left hand side allowed.
library(tidyverse)
df <- data_frame(x = 1:3)
y <- "Foo"
df %>% rename(y=x) # Not what you want - need to unquote y
df %>% rename(!!y = x) # Gives error - need to use :=
df %>% rename(!!y := x) # Correct