I have a vector of column names called tbl_colnames.
I would like to create a tibble with 0 rows and length(tbl_colnames) columns.
The best way I've found of doing this is...
tbl <- as_tibble(data.frame(matrix(nrow=0,ncol=length(tbl_colnames))))
and then I want to name the columns so...
colnames(tbl) <- tbl_colnames
My question: Is there a more elegant way of doing this?
something like tbl <- tibble(colnames=tbl_colnames)
my_tibble <- tibble(
var_name_1 = numeric(),
var_name_2 = numeric(),
var_name_3 = numeric(),
var_name_4 = numeric(),
var_name_5 = numeric()
)
I haven't tried it, but I guess it also works if, instead of initializing numeric vectors of length 0, you use other classes (for example, character()).
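As a quick check of that guess, here is a minimal sketch with mixed column classes. The column names are hypothetical and chosen only for illustration.

```r
library(tibble)

# Zero-length vectors of different classes give a 0-row tibble
# whose columns keep those classes. Column names are hypothetical.
typed_tbl <- tibble(
  id    = integer(),
  label = character(),
  score = numeric(),
  seen  = logical()
)

# Inspect the class of each column
sapply(typed_tbl, class)
```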
This SO question explains how to do it with other R libraries.
According to this tidyverse issue, this won't be a feature for tribbles.
Since you want to combine a list of tibbles, you can just assign NULL to the variable and then bind_rows it with the other tibbles.
res = NULL
for(i in tibbleList)
res = bind_rows(res,i)
However, a much more efficient way to do this is
bind_rows(tibbleList) # combine all tibbles in the list
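The two approaches above can be compared side by side. This is a sketch only: the contents of tibbleList are made up here to stand in for the real list.

```r
library(dplyr)

# Hypothetical list of small tibbles standing in for tibbleList
tibbleList <- list(
  tibble(id = 1:2, value = c("a", "b")),
  tibble(id = 3L, value = "c")
)

# Loop version: bind_rows() ignores NULL, so NULL works as a starting value
res <- NULL
for (tb in tibbleList) {
  res <- bind_rows(res, tb)
}

# Single-call version: pass the whole list at once
res_direct <- bind_rows(tibbleList)

# Both versions should hold the same rows
all.equal(res, res_direct)
```

The single call avoids copying the growing result on every iteration, which is why it scales better for long lists.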
For anyone still interested in an elegant way to create a 0-row tibble with column names given by a character vector tbl_colnames:
tbl_colnames %>% purrr::map_dfc(setNames, object = list(logical()))
or:
tbl_colnames %>% purrr::map_dfc(~tibble::tibble(!!.x := logical()))
or:
tbl_colnames %>% rlang::rep_named(list(logical())) %>% tibble::as_tibble()
This, of course, results in each column being of type logical.
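If the columns should not all be logical, one possible variation is to build a named list of zero-length vectors and convert it. This is a sketch under the assumption that the desired classes are available in a second character vector; col_types below is a hypothetical name, not something from the question.

```r
library(tibble)

tbl_colnames <- c("one", "two", "three")
# Hypothetical: one class name per column, in the same order as tbl_colnames
col_types <- c("character", "integer", "numeric")

# Build one zero-length vector per requested class, then name the list
protos <- lapply(col_types, function(cls) vector(cls, 0))
names(protos) <- tbl_colnames

# A named list of zero-length vectors converts to a 0-row tibble
tbl <- as_tibble(protos)
sapply(tbl, class)
```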
The following command will create a tibble with 0 rows and variables (columns) named with the contents of tbl_colnames:
tbl <- tibble::tibble(!!!tbl_colnames, .rows = 0)
You could abuse readr::read_csv, which allows reading from a string. You can control names and types, e.g.:
tbl_colnames <- c("one", "two", "three", "c4", "c5", "last")
read_csv("\n", col_names = tbl_colnames) # all character type
read_csv("\n", col_names = tbl_colnames, col_types = "lcniDT") # various types
I'm a bit late to the party, but for future readers:
as_tibble(matrix(nrow = 0, ncol = length(tbl_colnames)), .name_repair = ~ tbl_colnames)
.name_repair allows you to name your columns within the same function call.
I have imported a stata file that is giving me some encoding problems in the value labels. On import, using labelled::lookfor for any keyword returns this error:
Error in structure(as.character(x), names = names(x)) :
invalid multibyte string at '<e9>bec Solidaire'
Knowing the data set, that is almost certainly a value label in there.
How do I loop through the data set, fix the encoding problem in the names of the value labels, and then reset them? I have found a solution, I think, to fix the problematic characters, but I don't know how to replace the original names.
v <- labelled(c(1,2,2,2,3,9,1,3,2,NA), c(yes = 1, "Bloc Qu\xe9b\xe9cois" = 3, "don't know" = 9))
x<- labelled(c(1,2,2,2,3,9,1,3,2,NA), c("Bloc Qu\xe9b\xe9cois" = 1, no = 3, "don't know" = 9))
mydat<-data.frame(v=v, x=x)
glimpse(mydat)
mydat %>%
map(., val_labels)
#This works individually
iconv(names(val_labels(x)), from="latin1", to="UTF-8")
#And this seems to work looping over each variable, but how do I store it?
mydat %>%
map(., function(x) iconv(names(val_labels(x)), from="latin1", to="UTF-8"))
This seems to be a bit tough to do in one simple step, so here I used some helper functions
conv_names <- function(x) {
setNames(x, iconv(names(x), from="latin1", to="UTF-8"))
}
conv_val_labels <- function(x) {
val_labels(x) <- conv_names(val_labels(x))
x
}
mydat <- map_dfc(mydat, conv_val_labels)
Here we map the helper function over each column and then reassign the converted columns to the data frame. Note that we use map_dfc to combine the columns back into a data frame.
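To see what the renaming step does in isolation, here is a minimal base R sketch. It uses a plain named numeric vector (labs, a hypothetical stand-in for the output of val_labels()) so it runs without the labelled package.

```r
# Hypothetical stand-in for val_labels(x): a named vector whose second
# name contains Latin-1 bytes that are not valid UTF-8
labs <- c(yes = 1, "Bloc Qu\xe9b\xe9cois" = 3)

# Same helper as above: re-encode the names, keep the values
conv_names <- function(x) {
  setNames(x, iconv(names(x), from = "latin1", to = "UTF-8"))
}

fixed <- conv_names(labs)

# validUTF8() reports, per name, whether the bytes form valid UTF-8
validUTF8(names(labs))
validUTF8(names(fixed))
```

Only the names change; the underlying values are carried over untouched.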
my_mtcars_1 <- mtcars
my_mtcars_2 <- mtcars
my_mtcars_3 <- mtcars
for(i in 1:3) {get(paste0('my_mtcars_', i))$blah <- 1}
Error in get(paste0("my_mtcars_", i))$blah <- 1 :
target of assignment expands to non-language object
I would like each of my 3 data frames to have a new field called blah that has a value of 1.
How can I iterate over a range of numbers in a loop and refer to DFs by name by pasting the variable name into a string and then edit the df in this way?
These three options all assume you want to modify them and keep them in the environment.
So, if they must be data frames (in your environment and in a loop), you could do something like this:
for(i in 1:3) {
obj_name = paste0('my_mtcars_', i)
obj = get(obj_name)
obj$blah = 1
assign(obj_name, obj, envir = .GlobalEnv) # Send back to global environment
}
I agree with @Duck that a list is a better format (and preferable to the loop above). So, if you use a list and need the results in your environment, use what Duck suggested with list2env() and send everything back to .GlobalEnv. That is, in one ugly line:
list2env(lapply(mget(ls(pattern = "my_mtcars_")), function(x) {x[["blah"]] = 1; x}), .GlobalEnv)
Or, if you are amenable to working with data.table, you could use the set() function to add columns:
library(data.table)
# assuming my_mtcars_* is already a data.table
for(i in 1:3) {
set(get(paste0('my_mtcars_', i)), NULL, "blah", 1)
}
As a suggestion, it is better to manage the data inside a list and use lapply() instead of a loop:
#List
List <- list(my_mtcars_1 = mtcars,
my_mtcars_2 = mtcars,
my_mtcars_3 = mtcars)
#Variable
List2 <- lapply(List,function(x) {x$blah <- 1;return(x)})
And it is easy to store your data using a code like this:
#List
List <- mget(ls(pattern = 'my_mt'))
So no need of defining each dataset individually.
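Putting those pieces together, a step-by-step sketch of the list workflow might look like this. It uses only base R; the env variable is an assumption added so the example works in whichever environment it is run from (at the top level this is the global environment).

```r
# Three copies of mtcars, as in the question
my_mtcars_1 <- mtcars
my_mtcars_2 <- mtcars
my_mtcars_3 <- mtcars

# Work in the current environment (the global environment at top level)
env <- environment()

# Collect the data frames into a named list by matching their names
dfs <- mget(ls(envir = env, pattern = "^my_mtcars_\\d+$"), envir = env)

# Add the new column to every data frame in the list
dfs <- lapply(dfs, function(df) {
  df$blah <- 1
  df
})

# Optional: write the modified data frames back under their original names
invisible(list2env(dfs, envir = env))
```

Keeping the data frames in dfs and skipping the last step is usually simpler if later code can work on the list directly.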
We can use tidyverse
library(dplyr)
library(purrr)
map(mget(ls(pattern = '^my_mtcars_\\d+$')), ~ .x %>%
mutate(blah = 1)) %>%
list2env(.GlobalEnv)
I have the same problem as this guy: returning from list to data.frame after lapply
Whilst they solved his specific problem, no one actually answered his original question about how to get dataframes out of a list.
I have a list of data frames:
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
And I want to filter/replace etc on them all.
So my function is:
DoThis = function(x){
filter(x, year >=2015 & year <=2018) %>%
replace(is.na(.), 0) %>%
adorn_totals("row")
}
And I use lapply to run the function on them all like this:
a = lapply(dfPreList, DoThis)
As the other post stated, these data frames are now stuck in this list (a), and I need a for loop to get them out, which just cannot be the correct way of doing it.
This is my current working way of applying the function to the dataframes and then getting them out:
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
dfPreListstr= list('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
DoThis = function(x){
filter(x, year >=2015 & year <=2018) %>%
replace(is.na(.), 0) %>%
adorn_totals("row")
}
a = lapply(dfPreList, DoThis)
for( i in seq_along(dfPreList)){
assign(dfPreListstr[[i]], as.data.frame(a[i]))
}
Is there a way of doing this without having to rely on for loops and string names of the dataframes? I.e. a one-liner with the lapply?
Many thanks for your help
You can assign names to the list and then use list2env.
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
a = lapply(dfPreList, DoThis)
names(a) <- c('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
list2env(a, .GlobalEnv)
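A variation on this is to name the list when it is created, since lapply() keeps the names and no separate names(a) step is needed. The sketch below uses made-up data and a simplified base R stand-in (clean, hypothetical) for DoThis, so it runs without dplyr or janitor.

```r
# Hypothetical example data standing in for the real data frames
yearlyFunding <- data.frame(year = 2013:2019, amount = c(1, NA, 3, 4, NA, 6, 7))
yearlyPubs    <- data.frame(year = 2013:2019, pubs   = c(NA, 2, 3, NA, 5, 6, 7))

# Simplified stand-in for DoThis: filter the years and replace NA with 0
clean <- function(df) {
  df <- df[df$year >= 2015 & df$year <= 2018, , drop = FALSE]
  df[is.na(df)] <- 0
  df
}

# Naming the list up front means lapply() returns a named list
dfPreList <- list(yearlyFunding = yearlyFunding, yearlyPubs = yearlyPubs)
a <- lapply(dfPreList, clean)

# Overwrite the original objects with the processed versions
invisible(list2env(a, envir = environment()))
```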
Another way would be to unlist each list element and then convert its content into a data frame.
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
a = lapply(dfPreList, DoThis)
names(a) <- c('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
yearlyFunding <- data.frame(matrix(unlist(a$yearlyFunding), nrow= nrow(yearlyFunding), ncol= ncol(yearlyFunding)))
yearlyPubs <- data.frame(matrix(unlist(a$yearlyPubs), nrow= nrow(yearlyPubs), ncol= ncol(yearlyPubs)))
yearlyAuthors <- data.frame(matrix(unlist(a$yearlyAuthors), nrow= nrow(yearlyAuthors), ncol= ncol(yearlyAuthors)))
Since the unlist function returns a vector, we first generate a matrix and then convert it to a data frame.
I want to create a dataframe with 3 columns.
#First column
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
"ABC_E1", "ABC_E2", "ABC_E3",
"ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
The names in column 1 are the names of objects holding results of the cor.test function. The second column should consist of the correlation coefficients I get by writing ABC_D1$estimate, ABC_D2$estimate, and so on.
My problem is that I don't want to add $estimate manually to every single name in the first column. I tried this:
df1$C2 = paste0(df1$C1, '$estimate')
But this doesn't work; it only gives me this back:
"ABC_D1$estimate", "ABC_D2$estimate", "ABC_D3$estimate",
"ABC_E1$estimate", "ABC_E2$estimate", "ABC_E3$estimate",
"ABC_F1$estimate", "ABC_F2$estimate", "ABC_F3$estimate")
class(df1$C2)
[1] "character"
How can I get the numeric result for ABC_D1$estimate in my dataframe? How can I convert these characters into Named num? The 3rd column should consist of the results of $p.value.
As pointed out by @DSGym, there are several problems, including that it is not very convenient to have a list of character names; it would be better to have a list of objects instead.
Anyway, I think you can get where you want using:
estimates <- lapply(name_list, function(dat) {
dat_l <- get(dat)
dat_l[["estimate"]]
}
)
cbind(name_list, estimates)
This is not really advisable but given those premises...
OK, I think I now know what you need.
eval(parse(text = paste0("ABC_D1", '$estimate')))
You concatenate the two strings and use the functions parse and eval to get your results.
This is how to do it for your whole data.frame:
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
"ABC_E1", "ABC_E2", "ABC_E3",
"ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
df1$C2 <- map_dbl(paste0(df1$C1, '$estimate'), function(x) eval(parse(text = x)))
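An alternative that avoids eval(parse()) is to fetch the objects by name with mget() and extract the fields with vapply(). This is a sketch under assumptions: the two cor.test results below are computed on mtcars purely as stand-ins for the real ABC_* objects.

```r
# Stand-in test results; the real ABC_* objects come from the asker's data
ABC_D1 <- cor.test(mtcars$mpg, mtcars$wt)
ABC_D2 <- cor.test(mtcars$mpg, mtcars$hp)

name_list <- c("ABC_D1", "ABC_D2")

# Look up the objects by name in the current environment
tests <- mget(name_list, envir = environment())

# Pull one number per test; vapply() checks that each result is a single numeric
df1 <- data.frame(
  C1 = name_list,
  C2 = vapply(tests, function(tst) unname(tst$estimate), numeric(1)),
  C3 = vapply(tests, function(tst) tst$p.value, numeric(1)),
  row.names = NULL
)

str(df1)
```

Storing the cor.test results in a named list from the start would remove the need for mget() altogether.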
I imported a tibble from a text file. Many numeric columns are imported as "chr". I guess it's because they contain a "," instead of a "." as the decimal separator.
My goal is to write a loop which runs through the names of the desired columns, replaces "," with "." and converts the columns to "num".
Little example:
data <- data.frame("A1" =c("2,1","2,1","2,1"), "A2" =c("1,3","1,3","1,3"),
stringsAsFactors = FALSE) %>% as_tibble() #example data
colname <- c("A1", "A2") #creating variable for loop
for(i in colname) {
nam <- paste0("data$", i)
assign(nam, as.numeric(gsub(",",".", eval(parse(text = paste0("data$",i))))) )
}
Instead of overwriting the existing column, R creates a new variable:
data$A1 # that's the existing column as part of the tibble
[1] "2,1" "2,1" "2,1"
`data$A1` # that's just a new variable; mind the backticks
[1] 2.1 2.1 2.1
I also tried to assign (<-) the new numeric values via eval, but that does not work either.
eval(parse(text = paste0("data$", i))) <- as.numeric(
gsub(",",".", eval(parse(text = paste0("data$",i)))))
Error: target of assignment expands to non-language object
Any suggestions on how to transform? I have the same issue with other columns that I want to aggregate to a new variable. This variable should also be part of the existing tibble. I could do it by hand. This would take lots of time and probably produce many mistakes.
Thanks a lot!
Sam
As you are already working with the tidyverse, you can use dplyr::mutate_at and the colname variable you have already defined. Assign the result back to data if you want to keep the converted columns.
data %>%
mutate_at(.vars = colname,
.funs = function(x) { as.numeric(gsub(",", ".", x)) })
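For completeness, the loop from the question can also be made to work by indexing the column with [[ ]] instead of building a "data$..." string. This is a base R sketch on the example data (a plain data.frame here, though the same indexing works on a tibble).

```r
# Example data from the question, as a plain data.frame
data <- data.frame(A1 = c("2,1", "2,1", "2,1"),
                   A2 = c("1,3", "1,3", "1,3"),
                   stringsAsFactors = FALSE)
colname <- c("A1", "A2")

for (i in colname) {
  # data[[i]] selects the column by name and can be assigned to directly
  data[[i]] <- as.numeric(gsub(",", ".", data[[i]], fixed = TRUE))
}

str(data)

# In dplyr 1.0 and later, mutate_at() is superseded by across(), e.g.
# data %>% mutate(across(all_of(colname), ~ as.numeric(gsub(",", ".", .x))))
```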