Is there a better way to write this R if statement?

I have a piece of code that creates an SQL query from the columns of a dataframe row:
a <- c("a1")
b <- c("b1")
c <- c("c1")
df <- data.frame(a, b, c)
query = "INSERT INTO table (a, b, c) VALUES ("
for (j in 1:ncol(df)) {
  if (j < ncol(df)) {
    query <- paste0(query, df[1, j], ", ")
  } else {
    query <- paste0(query, df[1, j], ");")
  }
}
The point is that I have to insert a comma between the elements, but no comma after the last element, so that the query works.
Here is the string I want to get:
query = 'INSERT INTO table (a, b, c) VALUES ("a1", "b1", "c1");'
Do you have an idea of a simpler way to write it?

We assume:
- a comma is wanted between elements, even though the question refers to a colon;
- df_layer and i are used in the question but not defined. We assume that the output shown is what is wanted, that df has one row as in the question, and that i and df_layer can be disregarded.
1) Use sprintf, shQuote and toString. On Windows, shQuote(df, "cmd") can optionally be shortened to just shQuote(df).
s <- sprintf('INSERT INTO table (a, b, c) VALUES (%s);',
toString(shQuote(df, "cmd")))
cat(s, "\n")
## INSERT INTO table (a, b, c) VALUES ("a1", "b1", "c1");
2) Or possibly this variation, which also inserts the column names:
s2 <- sprintf('INSERT INTO table (%s) VALUES (%s);',
toString(names(df)), toString(shQuote(df, "cmd")))
cat(s2, "\n")
## INSERT INTO table (a, b, c) VALUES ("a1", "b1", "c1");
Note
The input is:
df <- data.frame(a = "a1", b = "b1", c = "c1")
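The same sprintf/toString idea can be wrapped in a small function that works for any number of columns and any row. This is a minimal sketch: `build_insert` and its `row`/`table` arguments are hypothetical names, and values are wrapped naively in double quotes to match the desired output. Embedded quotes are not escaped, and many SQL dialects expect single quotes for string literals, so it is not safe for untrusted input.

```r
# Hypothetical helper: builds an INSERT statement for one row of a data frame.
# Values are wrapped in double quotes without escaping (unsafe for user input).
build_insert <- function(df, row = 1, table = "table") {
  # one character value per column of the chosen row
  vals <- vapply(df[row, , drop = FALSE], as.character, character(1))
  sprintf("INSERT INTO %s (%s) VALUES (%s);",
          table,
          toString(names(df)),
          toString(paste0('"', vals, '"')))
}

df <- data.frame(a = "a1", b = "b1", c = "c1")
cat(build_insert(df), "\n")
## INSERT INTO table (a, b, c) VALUES ("a1", "b1", "c1");
```

Because the column list comes from names(df), adding or removing columns needs no change to the function.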

paste0(
'INSERT INTO TABLE (a, b, c) VALUES ("',
paste(df$a, df$b, df$c, sep = '", "'),
'")'
)

I think just using the paste function can work well in your case. I used the collapse parameter to specify the delimiter and added the end of the string with another paste0 call. I also changed the string delimiters to single quotes so that " can be used inside the string.
a <- c("a1")
b <- c("b1")
c <- c("c1")
df <- data.frame(a, b, c)
i <- 1  # row to insert
query = 'INSERT INTO table (a, b, c) VALUES ("'
paste0(query, paste(df[i, ], collapse = '", "'), '");')
If you want, you can do the same for the column names:
a <- c("a1")
b <- c("b1")
c <- c("c1")
df <- data.frame(a, b, c)
i <- 1  # row to insert
query = 'INSERT INTO table ('
query2 = ') VALUES ("'
paste0(query, paste(colnames(df), collapse = ", "), query2, paste(df[i, ], collapse = '", "'), '");')

You can just join all values like this:
paste(df[1,], collapse = '", "')
In your example it would look like this (note the extra " after the opening parenthesis and before the closing one):
a <- c("a1")
b <- c("b1")
c <- c("c1")
df <- data.frame(a, b, c)
query = 'INSERT INTO table (a, b, c) VALUES ("'
query <- paste0(query, paste(df[1,], collapse = '", "'), '");')
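Because paste() is vectorised, the same idea extends to a data frame with several rows: do.call() hands each column to paste() as a separate argument, which gives one value string per row. A sketch assuming every row should become its own INSERT statement; the two-row df is made-up example data.

```r
df <- data.frame(a = c("a1", "a2"), b = c("b1", "b2"), c = c("c1", "c2"))

# paste() combines the columns element-wise, giving one 'x", "y", "z' string per row
rows <- do.call(paste, c(df, sep = '", "'))

# wrap each row string in the fixed parts of the statement
queries <- paste0("INSERT INTO table (", toString(names(df)), ') VALUES ("', rows, '");')
cat(queries, sep = "\n")
## INSERT INTO table (a, b, c) VALUES ("a1", "b1", "c1");
## INSERT INTO table (a, b, c) VALUES ("a2", "b2", "c2");
```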

I like to use the glue package, though I highly, highly recommend using the glue_data_safe() function for anything outward-facing that takes user input.
df = data.frame(a = "a1", b = "b1", c = "c1")
table = "table (a, b, c)"
glue::glue("insert into {table} values ({paste(df[1,], collapse = ',')})")
#> insert into table (a, b, c) values (a1,b1,c1)
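For SQL specifically, glue also provides glue_sql(), which quotes values and identifiers through a DBI connection instead of pasting them in raw. A sketch assuming the glue and DBI packages are installed; DBI::ANSI() is used only as a stand-in connection, and with a real database you would pass the connection from DBI::dbConnect() so its own quoting rules apply. `tbl`, `cols` and `vals` are illustrative names.

```r
library(glue)

df <- data.frame(a = "a1", b = "b1", c = "c1")

q <- glue_sql(
  # {`...`} quotes as an identifier, {...*} collapses a vector with ", "
  "INSERT INTO {`tbl`} ({`cols`*}) VALUES ({vals*});",
  tbl  = "table",
  cols = names(df),
  vals = unlist(df[1, ], use.names = FALSE),
  .con = DBI::ANSI()  # stand-in connection applying ANSI SQL quoting
)
q
```

With ANSI quoting the identifiers should come out in double quotes and the values as single-quoted string literals.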

Related

R cbind with get paste

The cbind() function works as x <- cbind(a, b),
where the column name 'b' can be given to an expression, as in b = get(paste0('var',i)),
that is, x <- cbind(a, b = get(paste0('var',i)))
I am trying to do the following:
x <- cbind(a, get(paste0('var',i))) = j), where "j" can be a vector or a function.
However, I got the following error: Error: unexpected '=' in "x <- cbind(a, get(paste0('var',i))) = j)"
If I just specify "x <- cbind(a, get(paste0('var',i)))", then the 2nd column name is "get(paste0('var',i))", which is not convenient.
How can I define column names with a function get(paste()) within cbind() or rbind() or bind_cols()? Or what would be the alternative solution?
An example would have helped to understand the problem, but maybe this?
x <- cbind(a, j)
colnames(x)[2] <- get(paste0('var',i))
Or, if you want to do it in a single line (this returns a data frame):
x <- cbind(a, setNames(data.frame(j), get(paste0('var',i))))
We can use
x <- data.frame(a, j)
colnames(x)[2] <- get(paste('var', i, sep=""))
Or use tibble, with b holding the new column name:
library(tibble)
b <- get(paste0('var', i))
tibble(a, !!b := j)
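A self-contained sketch of the base R options above. The setup is hypothetical: var1 is assumed to hold the desired column name as a string, and a and j are made-up data.

```r
# Hypothetical setup: var1 holds the name the new column should get
var1 <- "score"
i <- 1
a <- 1:3
j <- c(10, 20, 30)

nm <- get(paste0("var", i))  # "score"

# Matrix result: bind first, then rename the second column
m <- cbind(a, j)
colnames(m)[2] <- nm

# Data frame result in one line: a named list passed to data.frame()
# becomes a column carrying that name
df_out <- data.frame(a, setNames(list(j), nm))
```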

Affecting variable through eval or similar

I know that if I have the name of a variable stored, as in a = "var.name", I can access var.name with eval(as.symbol(a)) or get(a), but I want not only to access the variable but also to make changes to it. Example:
names = c("X1","X2")
for(i in names){
  assign(i, cbind(replicate(2,rnorm(3)))) # just creating a 3x2 matrix with dummy data
  ###
}
At ### I'd like to change the variable, specifically to set its column names to "a" and "b".
I tried colnames(get(i)) = c("a","b") and colnames(eval(as.symbol(i))) = c("a","b"), but they return errors such as could not find function "eval<-".
One option is to create the matrix in a first step, then give it dimnames and assign it to the new name in a second step.
names = c("X1","X2")
for(i in names){
x <- cbind(replicate(2,rnorm(3)))
assign(i, provideDimnames(x))
}
#--------------
> X1
A B
A -0.59174062 1.8527780
B -0.53088643 -3.2713544
C -0.09330006 -0.5977568
Another option would be to assign the dimnames at the time of creation of the matrix.
for (i in names) {
x <- matrix(replicate(2, rnorm(3)),
ncol = 2,
dimnames = list(a = c(LETTERS[1:3]), b = c(LETTERS[1:2])))
assign(i, x)
}
#-------------------
> X1
b
a A B
A -0.2313692 -0.93161762
B -0.9666849 0.06164904
C 1.5614446 -0.09391062
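For the literal question (changing the column names of an object you only know by name), a common pattern is to copy the object out with get(), modify the copy and assign() it back. Replacement syntax such as colnames(x) <- value needs a variable name on the left-hand side, not a call like get(i). A sketch reusing the question's dummy data; `nms` and `m` are illustrative names.

```r
nms <- c("X1", "X2")
for (nm in nms) {
  assign(nm, replicate(2, rnorm(3)))  # 3x2 matrix of dummy data
  m <- get(nm)                        # copy the object out by name
  colnames(m) <- c("a", "b")          # modify the copy
  assign(nm, m)                       # write it back under the same name
}
colnames(X1)
## [1] "a" "b"
```

If the objects do not need to live as separate variables, keeping them in a named list and modifying them with lapply() avoids get()/assign() altogether.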

rev() in R and how to apply it to a list using loops

I have a list, say {a, b, c, d, ...}, where each element is a data.table whose column order I need to reverse. However, I only want to rev() all columns except the first one, as it is an ID. I tried using a loop to do it, but it returned:
Error in `[<-.data.table`(`*tmp*`, , -1, value = list(code_a = c("a", :
Item 1 of column numbers in j is -1 which is outside range [1,ncol=4]. Use column names instead in j to add new columns.
Example:
library(data.table)
a <- c("a","b","c","d","e","f")
b <- 1:6
c <- c("F","E","D","C","B","A")
d <- 10:15
dt1 <- data.table("ID" = b, "code_a" = a)
dt2 <- data.table("ID" = b, "code_c" = c)
dt3 <- data.table("ID" = b, "code_d" = d)
dt <- list(dt1,dt2,dt3)
rev_dt <- rev(dt)
merged_list <- list()
rev_merged_list <- list()
rev_merged_list <- Reduce(merge, rev_dt, accumulate = TRUE)
merged_list <- rev_merged_list
merged_list <- rev(merged_list)
for(z in 1:length(dt)){
merged_list[[z]][,-1] = rev(merged_list[[z]][,-1])
}
More Information:
The for loop here is supposed to be:
- for z from 1 to the length of dt
- the merged_list element z (accessed with double square brackets) should be a data.table
- where the data does not include the first column
- should be assigned to the rev of the same element z, where the first column is also excluded
Does this logic hold for the above loop? I am unsure what is wrong!
Expected Output:
output_ <- list()
a_ <- data.table("ID" = b, "code_a" = a, "code_c" = c, "code_d" = d)
b_ <- data.table("ID" = b, "code_c" = c, "code_d" = d)
c_ <- data.table("ID" = b, "code_d" = d)
output_[[1]] <- a_
output_[[2]] <- b_
output_[[3]] <- c_
output_
I was told yesterday that in the merge above I can specify a right-hand merge; however, to do so I need to specify by = "ID" in the merge, and I am unsure what the x and y values are when merging multiple data sets.
I am also under the impression that lapply() can do the same thing as the loop, but I am unsure how I might achieve that in this case. Thanks~
We can use setcolorder, which reorders the columns of each data.table by reference:
for(i in seq_along(merged_list)){
setcolorder(merged_list[[i]],
c(names(merged_list[[i]])[1], rev(names(merged_list[[i]])[-1])))
}
all.equal(merged_list, output_, check.attributes = FALSE)
#[1] TRUE
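The follow-up questions about by = "ID" and lapply() can be sketched together. Inside Reduce(), the function is called as f(accumulated, next), so x is the result merged so far and y is the next table. This sketch uses a plain merge on ID (an inner join, which is enough here because all IDs match) rather than a right-hand merge, and rebuilds the question's data with illustrative names.

```r
library(data.table)

ids <- 1:6
dt <- list(data.table(ID = ids, code_a = c("a", "b", "c", "d", "e", "f")),
           data.table(ID = ids, code_c = c("F", "E", "D", "C", "B", "A")),
           data.table(ID = ids, code_d = 10:15))

# x: tables merged so far, y: next table in the (reversed) list
merged_list <- rev(Reduce(function(x, y) merge(x, y, by = "ID"),
                          rev(dt), accumulate = TRUE))

# lapply() version of the loop: keep ID first, reverse the other columns.
# Subsetting with with = FALSE returns new data.tables instead of
# modifying merged_list by reference.
result <- lapply(merged_list, function(d) {
  nms <- names(d)
  d[, c(nms[1], rev(nms[-1])), with = FALSE]
})
```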

How to write a function() with arguments x,y that only returns values from column x that are == y

I have a data frame:
df <- data.frame( a = 1:5, b = 1:5, c = 1:5, d = as.factor(1:5))
I want to write a function that takes as its argument one of the columns a,b or c, and one of the factors of column d, and returns only the values of column a, b, or c, that have said factor value for column d.
I tried the following code:
fun1 <- function(x,y) {
u <- x[data$d == "y"]
return(u)
}
and I keep getting back numeric(0) as the output of the function. When I try similar code outside of the function() environment, it appears to work fine. Any help would be appreciated.
Probably a duplicate, but I don't know how I would find it in the haystack of items tagged data.frame, indexing, columns and values. Best practice is to pass the data as well as the search terms. (I'm calling the object df1 rather than df.)
fun1 <- function(dfrm, col,val) {
u <- dfrm[dfrm$d == val , col]
return(u)
}
fun1(df1, 'b', 3)
#[1] 3
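Part of the problem in the original attempt is that "y" in quotes is the literal string "y", which matches no level of d, rather than the value of the argument y. The following sketch generalizes the answer's approach a little. `values_where` and its `by` argument (the column to filter on, defaulting to "d") are hypothetical names, and [[ ]] looks the columns up by name.

```r
df1 <- data.frame(a = 1:5, b = 1:5, c = 1:5, d = as.factor(1:5))

# Hypothetical helper: values of column `col` in rows where column `by` == val
values_where <- function(dfrm, col, val, by = "d") {
  dfrm[[col]][dfrm[[by]] == val]
}

values_where(df1, "b", 3)
#[1] 3
```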

Priority/Decision Based Choice of Row

I have a data.frame that has a number of duplicate rows, akin to something like this:
con <- textConnection(Lines <- "
First, Last, Address, Address 2, Email, Custom1, Custom2, Custom3
A, B, C, D, F#G.com,1,2,3
A, B, C, D, F#G.com,1,2,2
A, B, C, D, F#G.com,1,2,1
")
x <- read.csv(con)
close(con)
Now, when I de-duplicate, in the following manner:
x <- x[!duplicated(x[,c("Email")]),]
Could you recommend a method for prioritizing those rows that contain Custom3=1? Or is there a better mechanism for de-duplication?
Try sorting before finding duplicates:
x <- x[order(x[,c("Custom3")]),]
x <- x[!duplicated(x[,c("Email")]),]
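This works because duplicated() flags every occurrence after the first, so whichever row comes first for each e-mail address is kept. If you only care whether Custom3 equals 1, rather than its full ordering, you can sort on that condition directly. The small data frame here is made-up example data.

```r
x <- data.frame(Email = c("F#G.com", "F#G.com", "F#G.com", "H#I.com"),
                Custom3 = c(3, 2, 1, 2),
                stringsAsFactors = FALSE)

# order() sorts FALSE before TRUE and leaves ties in their original order,
# so rows with Custom3 == 1 move to the top and the rest keep their order
x <- x[order(x$Custom3 != 1), ]

# keep the first (highest-priority) row per e-mail address
x <- x[!duplicated(x$Email), ]
```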

Resources