Paste data frame without changing into factor levels - r

I have vectors, let's say a, b, c and d, as below:
a <- c(1,2,3,4)
b <- c("L","L","F","L")
c <- c(11,22,33,44)
d <- c("Y", "N", "Y","Y")
And I try to use paste to get this output (1):
paste(a,b,c,d, sep = "$", collapse = "%")
[1] "1$L$11$Y%2$L$22$N%3$F$33$Y%4$L$44$Y"
Then I combine them into a data frame, let's say df:
df <- data.frame(a,b,c,d)
and get this output (2):
paste(df, sep = "$", collapse = "%")
[1] "c(1, 2, 3, 4)%c(2, 2, 1, 2)%c(11, 22, 33, 44)%c(2, 1, 2, 2)"
My question is:
(1) Can somebody explain why paste turns the elements of df into numbers?
(2) Is there another way to use df to get output (1)?

paste runs as.character (or something similar internally) on its ... arguments, effectively deparsing the list. Have a look at
as.character(df)
# [1] "c(1, 2, 3, 4)" "c(2, 2, 1, 2)" "c(11, 22, 33, 44)" "c(2, 1, 2, 2)"
deparse(df$a)
# [1] "c(1, 2, 3, 4)"
Your code is pasting these values together. To get around this, you can use do.call.
do.call(paste, c(df, sep = "$", collapse = "%"))
# [1] "1$L$11$Y%2$L$22$N%3$F$33$Y%4$L$44$Y"
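To make the do.call step above more concrete, here is a minimal sketch, assuming the question's data is rebuilt with stringsAsFactors = TRUE (the default before R 4.0.0, which is what produces the factor codes shown in the question). It only illustrates that do.call hands each column to paste as a separate argument; the names via_do_call and by_hand are made up for this example.

```r
# Rebuild the question's data. stringsAsFactors = TRUE is an assumption
# made here to mimic the pre-4.0.0 default that turns b and d into factors.
df <- data.frame(a = c(1, 2, 3, 4),
                 b = c("L", "L", "F", "L"),
                 c = c(11, 22, 33, 44),
                 d = c("Y", "N", "Y", "Y"),
                 stringsAsFactors = TRUE)

# do.call() spreads the list elements out as individual arguments,
# so the two calls below should be equivalent.
via_do_call <- do.call(paste, c(df, sep = "$", collapse = "%"))
by_hand     <- paste(df$a, df$b, df$c, df$d, sep = "$", collapse = "%")

print(via_do_call)
print(identical(via_do_call, by_hand))
```

By contrast, paste(df, ...) receives the whole data frame as one list-like argument, which is why each column ends up deparsed into a single string.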

Here is an alternative to the approach you used:
df_call <- c(df, sep="$")
paste(do.call(paste, df_call), collapse="%")
[1] "1$L$11$Y%2$L$22$N%3$F$33$Y%4$L$44$Y"

You cannot apply paste directly to a data frame in this case. To get the desired output, you need to apply paste at two levels.
paste(apply(df, 1, function(x) paste(x, collapse = "$")), collapse = "%")
#[1] "1$L$11$Y%2$L$22$N%3$F$33$Y%4$L$44$Y"
Where the apply command creates a row-wise vector
apply(df, 1, function(x) paste(x, collapse = "$"))
#[1] "1$L$11$Y" "2$L$22$N" "3$F$33$Y" "4$L$44$Y"
and the outer paste call then merges these together, with "%" as the collapse argument.
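One caveat with the row-wise apply approach, sketched here on hypothetical data (df2 is not from the question): apply first converts the data frame with as.matrix, and for a data frame with mixed column types the numeric columns are formatted to a common width. In the question's data every number in a column has the same width, so the effect does not show up there.

```r
# Hypothetical data: a numeric column whose values have different widths,
# next to a character column.
df2 <- data.frame(n = c(1, 10), s = c("a", "b"))

# apply() works on as.matrix(df2); the numeric column goes through
# format(), which pads "1" to the width of "10".
via_apply <- unname(apply(df2, 1, paste, collapse = "$"))

# do.call(paste, ...) works on the original columns, so nothing is padded.
via_do_call <- do.call(paste, c(df2, sep = "$"))

print(via_apply)    # the first element carries a leading space
print(via_do_call)
```

If padding like this matters for your data, the do.call form is probably the safer of the two.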

Here's a dplyr/tidyr approach (unite() comes from tidyr):
library(dplyr)
library(tidyr)
pull(summarise(unite(df, tmp, 1:ncol(df), sep = "$"), paste(tmp, collapse = "%")))
Or:
df %>%
  unite(tmp, 1:ncol(df), sep = "$") %>%
  summarise(output = paste(tmp, collapse = "%")) %>%
  pull()

Related

Data frame output as a single line

I have a dataframe with multiple columns and rows. I want to export it as a .txt file with all values on the same line (i.e. one row), with individual values separated by "," and the rows of the df separated by ":".
w<- c(1,5)
x<- c(2,6)
y<- c(3,7)
z<- c(4,8)
df<-data.frame(w,x,y,z)
the output would look like this
1,2,3,4:5,6,7,8:
We can combine data row-wise using apply and paste data together with collapse = ":".
paste0(apply(df, 1, toString), collapse = ":")
#[1] "1, 2, 3, 4:5, 6, 7, 8"
If you want to write it to a file, use:
write.table(df, "df.csv", col.names = FALSE, row.names = FALSE, sep = ",", eol = ":")
If you want the output in R you can use do.call() and paste():
do.call(paste, c(df, sep = ",", collapse = ":"))
[1] "1,2,3,4:5,6,7,8"
We can also use str_c from stringr:
library(stringr)
library(dplyr)
library(purrr)
df %>%
  reduce(str_c, sep = ",") %>%
  str_c(collapse = ":")
#[1] "1,2,3,4:5,6,7,8"
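The output requested in the question ends with a trailing ":" and is meant to go into a .txt file. A minimal base R sketch of that, assuming a temporary file path (out_file is a placeholder, not a path from the question), could look like this:

```r
df <- data.frame(w = c(1, 5), x = c(2, 6), y = c(3, 7), z = c(4, 8))

# One string per row, values separated by ","
rows <- do.call(paste, c(df, sep = ","))

# Append ":" to every row (including the last) and glue the rows together
one_line <- paste0(rows, ":", collapse = "")

# out_file is a hypothetical location; replace it with your own .txt path
out_file <- tempfile(fileext = ".txt")
writeLines(one_line, out_file)

print(readLines(out_file))
```

Note that writeLines ends the line with a newline; if the file must not contain one, cat(one_line, file = out_file) is an alternative.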

Using an element from a table in selecting columns/rows in R

I've been working on a process to create all possible combinations of unique integers for lengths 1:n. I found the nCr function (combn in the combinat package) to be useful here.
Once all unique occurrences are iterated, they are appended to a consolidation table that contains any possible length+combination of the digits 1:n. A subset of the final table's relevant column (one record) looks like this (column is named String and the subset table f1):
c(1,3,4,5,9,10)
I need to select these columns from a secondary data source (df) one at a time (I am going to loop through this table), so my logic was to use this code:
df[,f1$String]
However, I get a message that says that undefined columns are selected, but if I copy and paste the contents of the cell such as:
df[,c(1, 3, 4, 5, 9, 10)]
it works fine ... I've tried all I can think of at this point; if anyone has some insight it would be greatly appreciated.
Code to reproduce is:
library(combinat)
library(data.table)
library(plyr)
rm(list=ls())
NCols=10
NRows=10
myMat<-matrix(runif(NCols*NRows), ncol=NCols)
XVars <- as.data.frame(myMat)
colnames(XVars) <- c("a","b","c","d","e","f","g","h","i","j")
x1 <- as.data.frame(colnames(XVars[1:ncol(XVars)]))
colnames(x1) <- "Independent.Variable"
setDT(x1)[, Index := .GRP, by = "Independent.Variable"]
colClasses = c("character", "numeric", "numeric")
col.names = c("String", "r!", "n!")
Combination <- read.table(text = "", colClasses = colClasses, col.names = col.names)
for (i in 1:nrow(x1)) {
  # all combinations of i column indices out of nrow(x1)
  x2 <- as.data.frame(combn(nrow(x1), i))
  for (j in 1:ncol(x2)) {
    x3 <- paste("c(", paste(x2[1:nrow(x2), j], collapse = ", "), ")", sep = "")
    x3 <- as.data.frame(x3)
    colnames(x3) <- "String"
    x3 <- mutate(x3, "r!" = nrow(x2))
    x3 <- mutate(x3, "n!" = nrow(x1))
    Combination <- rbind(Combination, x3)
  }
}
setDT(Combination)[, Index := .GRP, by = c("String", "r!", "n!")]
f1 <- Combination[717,]
f1$String <- as.character(f1$String)
## reference to data frame
myMat[,(f1$String)]
## pasted element
myMat[, c(1, 3, 4, 5, 9, 10)]
f1$String is the string "c(1, 3, 4, 5, 9, 10)". When you use myMat[,(f1$String)], R will look for the column with name "c(1, 3, 4, 5, 9, 10)". To get column numbers 1,3,4,5,9,10, you have to parse the string to an R expression and evaluate it first:
myMat[,eval(parse(text=f1$String))]
As @user3794498 noticed, you converted f1$String with as.character(), so you cannot use it directly to select the columns you want.
You can either change the way you define f1 or extract the column numbers from f1$String. Something like this should also work (load stringr first):
myMat[, f1$String %>% str_match_all("[0-9]+") %>% unlist %>% as.numeric]
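The same digit extraction can be sketched in base R without stringr or eval(parse()). This is only an illustration under the assumption that the stored string always looks like "c(1, 3, ...)"; the variable names s and idx are made up here. It also shows one way to avoid the string round trip altogether by keeping each combination as an integer vector.

```r
set.seed(1)
myMat <- matrix(runif(100), ncol = 10)

# The stored value is a string, not a vector of indices
s <- "c(1, 3, 4, 5, 9, 10)"

# Pull out every run of digits and convert to integer column indices
idx <- as.integer(regmatches(s, gregexpr("[0-9]+", s))[[1]])
sub1 <- myMat[, idx]

# Alternative: keep combinations as integer vectors in a list,
# so no parsing is needed later (utils::combn is part of base R)
combos <- utils::combn(10, 6, simplify = FALSE)
sub2 <- myMat[, combos[[1]]]   # first combination: columns 1 to 6

print(idx)
print(dim(sub1))
```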

Compare vector to subset of row in a data.frame

I have a data.frame "dat" and a numeric vector "test":
code <- c("A22", "B15", "C03")
v.1 <- 1:3
v.2 <- 3:1
v.3 <- c(2, NA, 2)
bob <- c("yes", "no", "no")
dat <- data.frame(code, v.1, v.2, v.3, bob, stringsAsFactors = FALSE)
test <- c(3, 1, 2)
I want to find the row in the data.frame where the second to fourth columns ("v.1", "v.2", "v.3") contain the same values as the vector, in the same order, and return the value from the "code"-column (in this case "C03").
I tried
dat[dat[, 2:4] == test]$code
and
which(apply(dat, 1, function(x) all.equal(dat[, 2:4], test)) == FALSE)
both of which do not work.
I would prefer a solution with base R.
Your second option (with which) does not work for several reasons: calling apply on the whole of dat converts it to a character matrix; the function never uses its argument x; and you should use all instead of all.equal, and probably TRUE instead of FALSE (the comparison is actually not needed).
You can modify it a bit to make it work:
which(apply(dat[, 2:4], 1, function(x) all(x==test)))
[1] 3
Or
dat[apply(dat[, 2:4], 1, function(x) all(x==test)), "code"]
[1] "C03"
With apply, we can paste the values of columns 2 to 4 together row by row, compare each result with test pasted the same way, and select the code column of the matching row.
dat[apply(dat[2:4], 1, paste0, collapse = "|") ==
paste0(test, collapse = "|"), "code"]
#[1] "C03"
We just need to replicate 'test' so that it lines up with the columns before doing the comparison:
dat[2:4] == test[col(dat[2:4])]
If we need the 'code':
dat$code[rowSums(dat[2:4] == test[col(dat[2:4])], na.rm = TRUE) == 3]
#[1] "C03"
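Because v.3 contains an NA, a row comparison with all(x == test) can return NA rather than FALSE when every non-missing value matches, and indexing with an NA yields an NA entry. A minimal base R sketch of a guard against that, assuming the helper name row_matches (hypothetical, not from the thread):

```r
dat <- data.frame(code = c("A22", "B15", "C03"),
                  v.1 = 1:3, v.2 = 3:1, v.3 = c(2, NA, 2),
                  stringsAsFactors = FALSE)
test <- c(3, 1, 2)

# isTRUE() turns an NA result into FALSE, so the index is never NA
row_matches <- function(x, target) isTRUE(all(x == target))

hit <- apply(dat[, 2:4], 1, row_matches, target = test)
print(dat$code[hit])

# With a vector that matches row 2 except for the NA, the unguarded
# comparison gives NA for that row, while the guarded one gives FALSE
naive   <- apply(dat[, 2:4], 1, function(x) all(x == c(2, 2, 2)))
guarded <- apply(dat[, 2:4], 1, row_matches, target = c(2, 2, 2))
print(unname(naive))
print(unname(guarded))
```

Wrapping the logical vector in which(), as in the answer above, has a similar effect, since which() drops NA entries.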

Using list's elements in loops in r (example: setDT)

I have multiple data frames and I want to perform the same action on all of them, for example transforming them all into data.tables (this is just an example; I want to apply other functions too).
A simple example can be (df1=df2=df3, without loss of generality here)
df1 <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 =c(1, 2, 2, 1, 2), var3 = c(10, 8, 15, 7, 9))
df2 <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 =c(1, 2, 2, 1, 2), var3 = c(10, 8, 15, 7, 9))
df3 <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 =c(1, 2, 2, 1, 2), var3 = c(10, 8, 15, 7, 9))
My approach was: (i) to create a list of the data frames (list.df), (ii) to create a list of how they should be called afterwards (list.dt) and (iii) to loop into those two lists:
list.df:
list.df<-vector('list',3)
for(j in 1:3){
name <- paste('df',j,sep='')
list.df[j] <- name
}
list.dt
list.dt<-vector('list',3)
for(j in 1:3){
name <- paste('dt',j,sep='')
list.dt[j] <- name
}
Loop (to make all data frames into data tables):
for(i in 1:3){
name<-list.dt[i]
assign(unlist(name), setDT(list.df[i]))
}
I am definitely doing something wrong as the result of this are three data tables with 1 variable, 1 observation (exactly the name list.df[i]).
I've tried to unlist list.df[i], thinking R would recognize it as an entire data frame and not only as a string:
for(i in 1:3){
name<-list.dt[i]
assign(unlist(name), setDT(unlist(list.df[i])))
}
But I get the error message:
Error in setDT(unlist(list.df[i])) :
Argument 'x' to 'setDT' should be a 'list', 'data.frame' or 'data.table'
Any suggestions?
You can just put all the data into one data frame. Then, if you want to iterate over the original data frames, use dplyr::do or, preferably, other dplyr functions:
library(dplyr)
data <-
  list(df1 = df1, df2 = df2, df3 = df3) %>%
  bind_rows(.id = "source") %>%
  group_by(source)
Change your last snippet to this:
for(i in 1:3){
name <- list.dt[i]
assign(unlist(name), setDT(get(list.df[[i]])))
}
# Alternative to using lists
list.df <- paste0("df", 1:3)
# For loop that works with the length of the input 'list'/vector
# Creates the 'dt' objects on the fly
for(i in seq_along(list.df)){
assign(paste0("dt", i), setDT(get(list.df[i])))
}
Using data.table (which deserves far more advertising):
a) If you need all your data.frames converted to data.tables, then, as already suggested in the comments by @A5C1D2H2I1M1N2O1R2T1, iterate over your data.frames with setDT:
library(data.table)
lapply(mget(paste0("df", 1:3)), setDT)
# or, if you wish to type them one by one:
lapply(list(df1, df2, df3), setDT)
class(df1) # check if coercion took place
# [1] "data.table" "data.frame"
b) If you need to bind your data.frames by rows, then use data.table::rbindlist
data <- rbindlist(mget(paste0("df", 1:3)), idcol = TRUE)
# or, if you wish to type them one by one:
data <- rbindlist(list(df1 = df1, df2 = df2, df3 = df3), idcol = TRUE)
Side note: If you like chaining/piping with the magrittr package (which you see almost always in combination with dplyr syntax), then it goes like:
library(data.table)
library(magrittr)
# for a)
mget(paste0("df", 1:3)) %>% lapply(setDT)
# for b)
data <- mget(paste0("df", 1:3)) %>% rbindlist(idcol = TRUE)
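Since the question mentions wanting to apply other functions too, here is a base R sketch of the general pattern: keep the data frames in a named list and use lapply, instead of building names with paste and going through assign/get. The function add_total is a hypothetical stand-in for "the same action"; it is not from the thread.

```r
df1 <- data.frame(var1 = c(1, 2, 3, 4, 5),
                  var2 = c(1, 2, 2, 1, 2),
                  var3 = c(10, 8, 15, 7, 9))
df2 <- df1
df3 <- df1

# Keep the data frames themselves in a named list, not just their names
dfs <- list(df1 = df1, df2 = df2, df3 = df3)

# Hypothetical example action: add a column that sums var1 and var3
add_total <- function(d) {
  d$total <- d$var1 + d$var3
  d
}

# Apply the action to every element; the result is again a named list
results <- lapply(dfs, add_total)

# Rename df1..df3 to dt1..dt3 if the new names are wanted
names(results) <- sub("^df", "dt", names(results))

print(results$dt2$total)
```

Individual results stay reachable as results$dt1 and so on, so separate top-level objects are usually not needed.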

Passing arguments to function inside sapply

Let's say that we have a list of numeric vectors and we apply a function,
for example the mean function, to each element of the list:
l <- list(a = 1:10, b = 11:20)
l.mean <- sapply(l, mean)
l.mean # it works
But what if the list consists of strings and we want to paste them:
ll <- list(a=c("1", "2"), b=c("3", "4"))
ll.paste <- sapply(ll, as.call(list(paste, ll, sep = ", ")))
ll.paste # it does not work
The output I expect should be something like that:
# 1, 2
# 3, 4
We need the collapse argument of paste.
unname(sapply(ll, paste, collapse=', '))
A wrapper function for paste(., collapse=', ') is toString
unname(sapply(ll, toString))
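The difference between sep and collapse is easy to mix up, so here is a short sketch of both on the question's list. It also uses vapply, which is a swapped-in alternative to sapply that additionally checks that each result is a single string.

```r
ll <- list(a = c("1", "2"), b = c("3", "4"))

# sep joins corresponding elements of several vectors:
# the result has one string per position
by_position <- paste(ll$a, ll$b, sep = ", ")

# collapse joins the elements of one vector into a single string;
# extra arguments after FUN.VALUE are passed on to paste()
joined <- vapply(ll, paste, character(1), collapse = ", ")

print(by_position)
print(joined)
```

Here joined keeps the list names a and b; wrap it in unname() as in the answer above if they are not wanted.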
