Priority/Decision Based Choice of Row - r

I have a data.frame that has a number of duplicate rows, akin to something like this:
con <- textConnection(Lines <- "
First, Last, Address, Address 2, Email, Custom1, Custom2, Custom3
A, B, C, D, F#G.com,1,2,3
A, B, C, D, F#G.com,1,2,2
A, B, C, D, F#G.com,1,2,1
")
x <- read.csv(con)
close(con)
Now, I de-duplicate in the following manner:
x <- x[!duplicated(x[, "Email"]), ]
Could you recommend a method for prioritizing the rows that contain Custom3 = 1? Or is there a better mechanism for de-duplication?

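Try sorting before finding duplicates. duplicated() keeps the first occurrence of each value, so sorting by Custom3 in ascending order puts the rows with the smallest Custom3 (here 1) first:
x <- x[order(x[, "Custom3"]), ]
x <- x[!duplicated(x[, "Email"]), ]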
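If Custom3 can take values below 1, a plain ascending sort no longer favours Custom3 = 1. A minimal sketch of one way around that, using made-up sample data (the second email address and the Custom3 values are assumptions for illustration): sort on the logical condition Custom3 != 1, so that matching rows come first.

```r
# Made-up sample data: two email addresses, Custom3 values chosen for illustration
x <- data.frame(
  Email   = c("F#G.com", "F#G.com", "F#G.com", "H#I.com", "H#I.com"),
  Custom3 = c(3, 0, 1, 2, 5)
)

# FALSE sorts before TRUE, so rows with Custom3 == 1 move to the top;
# all other rows keep their original relative order
x_sorted <- x[order(x$Custom3 != 1), ]

# duplicated() flags every occurrence after the first, so the preferred row survives
dedup <- x_sorted[!duplicated(x_sorted$Email), ]
dedup
```

For F#G.com the row with Custom3 = 1 is kept even though a row with Custom3 = 0 exists; for H#I.com, which has no Custom3 = 1 row, the first row in the original order is kept.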

Related

Is there a better way to write this R if statement

I have a piece of code that creates an SQL query from the columns of a dataframe row:
a <- c("a1")
b <- c("b1")
c <- c("c1")
df <- data.frame(a, b, c)
query = "INSERT INTO table (a, b, c) VALUES ("
for (j in 1:ncol(df)) {
if (j < ncol(df)) {
query <- paste0(query, df[1, j], ", ")
} else {
query <- paste0(query, df[1, j], ");")
}
}
The point is I have to insert a comma between the elements, but no comma after the last element so that the query works.
Here is what I want to get:
query = 'INSERT INTO table (a, b, c) VALUES ("a1", "b1", "c1");'
Do you have an idea of a simpler way to write it?
We assume:
a comma is wanted between elements even though the question refers to a colon
df_layer and i are used in the question but not defined. We assume that the output shown is what is wanted, that df has one row as in the question and that i and df_layer can be disregarded.
1) Use sprintf, shQuote and toString. shQuote(df, "cmd") can be optionally shortened to just shQuote(df) on Windows.
s <- sprintf('INSERT INTO table (a, b, c) VALUES (%s);',
toString(shQuote(df, "cmd")))
cat(s, "\n")
## INSERT INTO table (a, b, c) VALUES ("a1", "b1", "c1");
2) or possibly this variation to also insert the column names
s2 <- sprintf('INSERT INTO table (%s) VALUES (%s);',
toString(names(df)), toString(shQuote(df, "cmd")))
cat(s2, "\n")
## INSERT INTO table (a, b, c) VALUES ("a1", "b1", "c1");
Note
Input is
df <- data.frame(a = "a1", b = "b1", c = "c1")
paste0(
'INSERT INTO table (a, b, c) VALUES ("',
paste(df$a, df$b, df$c, sep = '", "'),
'");'
)
I think the paste function works well in your case. I used the collapse parameter to specify the delimiter, added the end of the string with an outer paste0, and switched the string delimiters to single quotes so that " can be used inside the string.
a <- c("a1")
b <- c("b1")
c <- c("c1")
df <- data.frame(a, b, c)
i <- 1
query = 'INSERT INTO table (a, b, c) VALUES ("'
df
paste0(query,paste(df[i,], collapse = '", "'),'");')
If you want, you could do the same for the column names:
a <- c("a1")
b <- c("b1")
c <- c("c1")
df <- data.frame(a, b, c)
i <- 1
query = 'INSERT INTO table ('
query2 = ') VALUES ("'
colnames(df)
paste0(query,paste(colnames(df),collapse=", "),query2,paste(df[i,], collapse = '", "'),'");')
You can just join all values like this:
paste(df[1,], collapse = '", "')
In your example it would look like this (note that the opening and closing double quotes have to be part of the surrounding strings):
a <- c("a1")
b <- c("b1")
c <- c("c1")
df <- data.frame(a, b, c)
query = 'INSERT INTO table (a, b, c) VALUES ("'
query <- paste0(query, paste(df[1,], collapse = '", "'), '");')
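The answers above all handle a single row. As a rough sketch of how the same idea could extend to a data frame with several rows, the hypothetical helper build_insert() below (not part of any package, and the second row of data is made up) returns one statement per row. It only wraps values in double quotes and does no escaping, so it is not safe for untrusted input.

```r
# Hypothetical helper: build one INSERT statement per row of a data frame
build_insert <- function(df, table) {
  cols <- toString(names(df))  # "a, b, c"
  vapply(seq_len(nrow(df)), function(i) {
    # Convert row i to a character vector (also works for factor columns)
    vals <- vapply(df[i, , drop = FALSE], as.character, character(1))
    # Wrap each value in double quotes and join with ", "
    sprintf("INSERT INTO %s (%s) VALUES (%s);",
            table, cols, toString(sprintf('"%s"', vals)))
  }, character(1))
}

df <- data.frame(a = c("a1", "a2"), b = c("b1", "b2"), c = c("c1", "c2"))
queries <- build_insert(df, "table")
cat(queries, sep = "\n")
## INSERT INTO table (a, b, c) VALUES ("a1", "b1", "c1");
## INSERT INTO table (a, b, c) VALUES ("a2", "b2", "c2");
```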
I like to use the glue package, though I highly, highly recommend using the glue_data_safe() function for anything outward-facing that takes user input.
df = data.frame(a = "a1", b = "b1", c = "c1")
table = "(a, b, c)"
glue::glue("insert into {table} values ({paste(df[1,], collapse = ',')})")
## insert into (a, b, c) values (a1,b1,c1)

How to rename an object created within a function each run?

I made a function that generates a data frame. Since I want to store my data frame, I saved it in the global environment. I want to run the function again, but with new parameters and avoid overwriting my previous data frames. Basically, I want to rename my data frame each time I run my function.
fun <- function(x, y) {
a <- x*1000
b <- a + pi
c <- a + b
return(data_frame <- data.frame(a, b, c))
}
Thanks!
Here's one solution
fun <- function(x, y, name) {
a <- x*1000
b <- a + pi
c <- a + b
assign(deparse(substitute(name)), data.frame(a, b, c), envir = .GlobalEnv)
}
fun(1,2,df.name)
df.name
This returns:
a b c
1 1000 1003.1 2003.1
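An alternative that avoids writing into the global environment from inside the function is to let the function return the data frame and to collect each run in a named list. This is only a sketch; the list name results and the run names are made up for illustration.

```r
fun <- function(x, y) {
  a <- x * 1000
  b <- a + pi
  c <- a + b
  data.frame(a, b, c)  # the last expression is the return value
}

# One list holds every run under its own name, so nothing is overwritten
results <- list()
results[["run_1"]] <- fun(1, 2)
results[["run_2"]] <- fun(5, 2)

names(results)
## [1] "run_1" "run_2"
results$run_2$a
## [1] 5000
```

Keeping the runs in one list also makes it easy to loop over them later, for example with lapply(results, nrow).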

How to write a function() with arguments x,y that only returns values from column x that are == y

I have a data frame:
df <- data.frame( a = 1:5, b = 1:5, c = 1:5, d = as.factor(1:5))
I want to write a function that takes as its argument one of the columns a,b or c, and one of the factors of column d, and returns only the values of column a, b, or c, that have said factor value for column d.
I tried the following code:
fun1 <- function(x,y) {
u <- x[data$d == "y"]
return(u)
}
and I keep getting back numeric(0) as the output of the function. When I try similar code outside of the function() environment, it appears to work fine. Any help would be appreciated.
Probably a duplicate but I don't know how I would find it in the haystack of items with tags: data.frame, indexing, columns, values. Best practice is to pass the "data" as well as the search terms. (Calling the object df1 rather than df.)
fun1 <- function(dfrm, col,val) {
u <- dfrm[dfrm$d == val , col]
return(u)
}
fun1(df1, 'b', 3)
#[1] 3
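To see why the original attempt came back empty, here is a small sketch. The column values differ from the question's so that the result is easier to read, and df1 is the name used in the answer. Inside the function, "y" in quotes is the literal string "y" rather than the argument y, so the comparison is FALSE for every row.

```r
df1 <- data.frame(a = 1:5, b = 11:15, c = 21:25, d = as.factor(1:5))

# Comparing d with the literal string "y" matches no level, so every element is FALSE
any(df1$d == "y")
## [1] FALSE

# Indexing with an all-FALSE vector returns a zero-length result
empty <- df1$a[df1$d == "y"]
length(empty)
## [1] 0

# Passing the data, the column name and the value as arguments works as intended
fun1 <- function(dfrm, col, val) {
  dfrm[dfrm$d == val, col]
}
res <- fun1(df1, "b", 3)
res
## [1] 13
```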

melt dataframe with multiple IDs

My data.frame
a<-sample(12)
b<-sample(-100:100, 12)
d<-c(-11:0)
O<-rep(c("N","H"), each=6)
H<-rep(c("In+", "In-"), each=3, times=2)
ID<-rep(c("bo","co", "do", "fo"), each=3)
mydata_1<-data.frame(ID, a, b, d, O, H)
I want to melt the dataframe variables a, b, d; while O and H should be ordered like the ID. My solution below:
mydata_2<-data.frame(ID, a, b, d)
# melt() is assumed to come from the reshape2 package
gg.df <- melt(mydata_2, id="ID", variable.name="int")
O<-rep(c("N","H"), each=6, times=3)
H<-rep(rep(c("In+", "In-"), each=3, times=2), times=3)
gg.df[, "OX"] <- O
gg.df["HI"] <- H
I am wondering how this can be done inside the melt function by using the full dataframe (mydata_1)
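One possible way to do this in a single call, assuming melt() here is reshape2::melt, is to pass all three identifier columns in id.vars. Every column not listed there (a, b and d) is then treated as a measured variable, and O and H are repeated alongside ID automatically. set.seed() is added only to make the sketch reproducible.

```r
library(reshape2)

set.seed(1)  # only so that the sampled values are reproducible
a  <- sample(12)
b  <- sample(-100:100, 12)
d  <- c(-11:0)
O  <- rep(c("N", "H"), each = 6)
H  <- rep(c("In+", "In-"), each = 3, times = 2)
ID <- rep(c("bo", "co", "do", "fo"), each = 3)
mydata_1 <- data.frame(ID, a, b, d, O, H)

# ID, O and H are carried along as identifiers; a, b and d are stacked into "value"
gg.df <- melt(mydata_1, id.vars = c("ID", "O", "H"), variable.name = "int")
head(gg.df)
```

The result has 36 rows (12 rows times 3 measured variables) and the columns ID, O, H, int and value.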

do.call(), multiple parameters

I have a function with many arguments:
fun(A,B,C,D,E)
Now I want to assign the fixed values a, b, c, d to A, B, C, D and pass each of the values 1:7 to E.
I want to use do.call() as below, but it doesn't work.
a <- do.call(function(x) fun(A = a, B = b, C = c, D = d, E = x), list(1:7))
I turn to lapply, and it works,
a <- lapply(c(1:7), function(x) fun(A = a, B = b, C = c, D = d, E = x))
Following Joshua Ulrich's answer, when I try
a <- do.call(fun, list(A = a, B = b, C = c, D = d, E = list(1:7)))
it says
(list) object cannot be coerced to type 'double'
So I guess fun needs a double value for E, but do.call() doesn't give the values one by one, but a list.
I don't want to use lapply because it returns a list of lists. To pick out a particular element I have to use [[ ]], which accepts only a single value, so I cannot index with a vector, e.g. [[a]] with a <- c(1:7).
How to make do.call() work?
That should be:
a <- do.call(fun, list(A = a, B = b, C = c, D = d, E = list(1:7)))
And I have a feeling you want E to be a vector, not a list, in which case it should be:
a <- do.call(fun, list(A = a, B = b, C = c, D = d, E = 1:7))
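The difference between the two approaches can be sketched with a stand-in for fun(). The real function is not shown in the question, so the simple sum below and the fixed values 1 to 4 are assumptions. do.call() calls the function once with E as a whole vector, while lapply() calls it once per value of E.

```r
# Hypothetical stand-in for fun(); the real function is not shown in the question
fun <- function(A, B, C, D, E) A + B + C + D + E

# do.call() makes a single call: fun(A = 1, B = 2, C = 3, D = 4, E = 1:7)
once <- do.call(fun, list(A = 1, B = 2, C = 3, D = 4, E = 1:7))

# lapply() makes seven calls, one for each value of E
each <- lapply(1:7, function(x) fun(A = 1, B = 2, C = 3, D = 4, E = x))

# Single brackets accept a vector of positions and return a sub-list
subset_each <- each[c(1, 3, 5)]
length(subset_each)
## [1] 3
```

The single do.call() only helps if fun() itself can handle a vector for E. If it expects one value at a time, lapply() (or sapply() for a simplified result) remains the right tool, and [ ] rather than [[ ]] selects several results at once.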

Resources