R: create new column with name coming from variable - r

I have a dataframe with several columns and would like to add a new column and name it according to a previous variable. For example:
df <- data.frame("A" = c(1, 2, 3, 4), "B" = c("a", "c", "d", "b"))
Variable <- "C"
This is part of a function where the variable will be changing and rather than each time specifying:
df$C <- NA
I would like a one-liner that uses the value of "Variable" to name the additional column.

Try [ instead of $:
> df[, Variable] <- NA
> df
A B C
1 1 a NA
2 2 c NA
3 3 d NA
4 4 b NA
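Since the question says this runs inside a function, here is a minimal sketch of how the same idea might look there. The helper name add_na_column is hypothetical (it is not from the question), and it uses [[ rather than [, which behaves the same way for a single column name.

```r
# Hypothetical helper: adds an NA column whose name comes from `col_name`.
add_na_column <- function(data, col_name) {
  # `[[` evaluates its argument, so the string stored in col_name
  # becomes the column name (with `$` the name would be taken literally).
  data[[col_name]] <- NA
  data
}

df <- data.frame("A" = c(1, 2, 3, 4), "B" = c("a", "c", "d", "b"))
Variable <- "C"
df <- add_na_column(df, Variable)
names(df)
```

Returning the modified data frame and reassigning it keeps the helper free of side effects on the caller's environment.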

If the name of the data.frame is also stored in a variable, this might be helpful:
df <- data.frame("A" = c(1, 2, 3, 4), "B" = c("a", "c", "d", "b") )
Variable<-"C"
dfname<-"df"
df<-within ( assign(dfname , get(dfname) ),
assign(Variable, NA )
)

Related

Select for every row between two columns based on condition in another column in R

Could someone point me to an existing answer or suggest a method? I cannot find a solution.
What I want to do:
For every row if the value in column "x" is "A" then select the value in column "y" from the same row and if the value in column "x" is "B" then select the value in column "z" from the same row.
Ideally collected in a vector to include as a new column in the df afterwards.
df <- data.frame(x = c("A", "B", "B", "A"), y = c(1,2,3,4), z = c(4,3,2,1), fix.empty.names = FALSE)
df
x y z
1 A 1 4
2 B 2 3
3 B 3 2
4 A 4 1
result
[1] 1 3 2 4
Thank you very much in advance
If we can assume x is always "A" or "B":
ifelse(df$x == "A", df$y, df$z)
More generally:
ifelse(df$x == "A", df$y, ifelse(df$x == "B", df$z, NA))
You can, of course, assign this directly as a new column: df$result <- ifelse...
If you like dplyr:
library(dplyr)
df %>%
mutate(
result = case_when(
x == "A" ~ y,
x == "B" ~ z,
TRUE ~ NA_real_
)
)
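As a quick check of the base R version, the nested ifelse can be run end to end on the data from the question. This is only a sketch; the column name result is the one suggested above.

```r
df <- data.frame(x = c("A", "B", "B", "A"),
                 y = c(1, 2, 3, 4),
                 z = c(4, 3, 2, 1))

# Take y where x is "A", z where x is "B", and NA for anything else.
df$result <- ifelse(df$x == "A", df$y,
                    ifelse(df$x == "B", df$z, NA))
df$result
```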

Using grep in list in order to fill a new df in R

Hello, I have a list:
list=c("OK_67J","GGT_je","Ojj_OK_778","JUu3","JJE")
and I would like to transform it into a df:
COL1 COL2
OK_67J A
GGT_je B
Ojj_OK_778 A
JUu3 B
JJE B
where I add an A if the value contains the pattern OK_ and a B if it does not.
I tried :
COL2<-rep("Virus",length(list))
list[grep("OK_",tips)]<-"A"
df <- data.frame(COL1=list,COL2=COL2)
Use grepl :
ifelse(grepl('OK_', list), "A", "B")
#[1] "A" "B" "A" "B" "B"
You can also do it without ifelse :
c("B", "A")[grepl('OK_', list) + 1]
It is better not to use list as a variable name, since list is a built-in function in R.
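The indexing trick without ifelse may be easier to follow when split into steps. This is a sketch only; the vector is renamed to tips here (an assumption, following the advice above), and has_ok and idx are illustrative names.

```r
tips <- c("OK_67J", "GGT_je", "Ojj_OK_778", "JUu3", "JJE")

has_ok <- grepl("OK_", tips)  # logical vector: TRUE where the pattern occurs
idx <- has_ok + 1             # arithmetic coerces FALSE to 0 and TRUE to 1, giving 1 or 2
COL2 <- c("B", "A")[idx]      # position 1 is "B" (no match), position 2 is "A" (match)

df <- data.frame(COL1 = tips, COL2 = COL2)
df
```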
If you replace list[grep("OK_",tips)]<-"A" with COL2[grep("OK_",list)] <- "A" and initialize COL2 with "B", your solution will work.
list <- c("OK_67J", "GGT_je", "Ojj_OK_778", "JUu3", "JJE")
COL2 <- rep("B", length(list))
COL2[grep("OK_", list)] <- "A"
df <- data.frame(COL1 = list, COL2 = COL2)
df
# COL1 COL2
#1 OK_67J A
#2 GGT_je B
#3 Ojj_OK_778 A
#4 JUu3 B
#5 JJE B
First off, list is not a list but a character vector:
list=c("OK_67J","GGT_je","Ojj_OK_778","JUu3","JJE")
class(list)
[1] "character"
To transform it to a dataframe:
df <- data.frame(v1 = list)
To add the new column use grepl:
df$v2 <- ifelse(grepl("OK_", df$v1), "A", "B")
or use str_detect:
library(stringr)
df$v2 <- ifelse(str_detect(df$v1, "OK_"), "A", "B")
Result:
df
v1 v2
1 OK_67J A
2 GGT_je B
3 Ojj_OK_778 A
4 JUu3 B
5 JJE B

How to paste vector elements comma-separated and in quotation marks?

I want to select columns of data frame dfr by their names in a certain order, which I initially obtain using column numbers.
> (x <- names(dfr)[c(3, 4, 2, 1, 5)])
[1] "c" "d" "b" "a" "e"
The final code should only include the version with names, because it's safer.
dfr[, c("c", "d", "b", "a", "e")]
I want to paste the elements separated with commas and quotation marks into a string, in order to include it into the final code. I've tried a few options, but they don't give me what I want:
> paste(x, collapse='", "')
[1] "c\", \"d\", \"b\", \"a\", \"e"
> paste(x, collapse="', '")
[1] "c', 'd', 'b', 'a', 'e"
I need something like "'c', 'd', 'b', 'a', 'e'". Of course, "c", "d", "b", "a", "e" would be much nicer.
Data
dfr <- setNames(data.frame(matrix(1:15, 3, 5)), letters[1:5])
So dput(x) is the correct answer but just in case you were wondering how to achieve this by modifying your existing code you could do something like the following:
cat(paste0('c("', paste(x, collapse='", "'), '")'))
c("c", "d", "b", "a", "e")
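For completeness, the dput(x) approach mentioned above can be sketched as follows, using the data from the question. Capturing the text with capture.output is an extra step added here for illustration, and the variable name code is hypothetical.

```r
dfr <- setNames(data.frame(matrix(1:15, 3, 5)), letters[1:5])
x <- names(dfr)[c(3, 4, 2, 1, 5)]

dput(x)                          # prints the vector as R code, ready to copy

# Optionally keep the same text as a string instead of printing it.
code <- capture.output(dput(x))
code
```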
Can also be done with packages (as Tung has showed), here is an example using glue:
library(glue)
glue('c("{v}")', v = glue_collapse(x, '", "'))
c("c", "d", "b", "a", "e")
Try vector_paste() function from the datapasta package
library(datapasta)
vector_paste(input_vector = letters[1:3])
#> c("a", "b", "c")
vector_paste_vertical(input_vector = letters[1:3])
#> c("a",
#> "b",
#> "c")
Or, using base R, this gives you what you want:
(x <- letters[1:3])
q <- "\""
( y <- paste0("c(", paste(paste0(q, x, q), collapse = ", ") , ")" ))
[1] "c(\"a\", \"b\", \"c\")"
Though I'm not really sure why you want it. Surely you can simply subset like this:
df <- data.frame(a=1:3, b = 1:3, c = 1:3)
df[ , x]
a b c
1 1 1 1
2 2 2 2
3 3 3 3
df[ , rev(x)]
c b a
1 1 1 1
2 2 2 2
3 3 3 3
Suppose you want to add a quotation mark in front of and at the end of a text and save the result as an R object. You can use the capture.output function from the utils package.
Example: I want ABCDEFG to be saved as an R object as "ABCDEFG".
> cat("ABCDEFG")
ABCDEFG
> cat("\"ABCDEFG\"")
"ABCDEFG"
# To save the output of cat as an R object, including the quotation marks at the start and end of the word, use capture.output:
> add_quote <- capture.output(cat("\"ABCDEFG\""))
> add_quote
[1] "\"ABCDEFG\""

How to convert a factor to numeric in a predefined order in R

I have a factor column, with three values: "b", "c" and "free".
I did
df$new_col = as.numeric (df$factor_col)
But it will convert "b" to 1, "c" to 2 and "free" to 3.
But I want to convert "free" to 0, "b" to 2 and "c" to 5. How can I do it in R?
Thanks a lot
f <- factor(c("b", "c", "c", "free", "b", "free"))
You can try renaming the factor levels. Note that the result is still a factor; use as.numeric(as.character(f)) afterwards to get numbers:
levels(f)[levels(f)=="b"] <- 2
levels(f)[levels(f)=="c"] <- 5
levels(f)[levels(f)=="free"] <- 0
> f
#[1] 2 5 5 0 2 0
#Levels: 2 5 0
One option is to call factor again, specifying the levels and labels arguments in the custom order, and then convert to numeric via character:
df$new_col <- as.numeric(as.character(factor(df$factor_col,
levels=c('b', 'c', 'free'), labels=c(2, 5, 0))))
Another option is recode from the car package. The output will be of class factor. If we need numeric, we can convert it as in the earlier solution (as.numeric(as.character(...))).
library(car)
df$new_col <- with(df, recode(factor_col, "'b'=2; 'c'=5; 'free'=0"))
data
df <- data.frame(factor_col= c('b', 'c', 'b', 'free', 'c', 'free'))
You can use the following approach to create the new column:
# an example data frame
f <- data.frame(factor_col = c("b", "c", "free"))
# create new_col
f <- transform(f, new_col = (factor_col == "b") * 2 + (factor_col == "c") * 5)
The result (f):
factor_col new_col
1 b 2
2 c 5
3 free 0
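Another way to express the same mapping, shown here only as a sketch, is a named lookup vector indexed by the level names. The names codes and new_col are illustrative; the values are the ones requested in the question.

```r
f <- factor(c("b", "c", "c", "free", "b", "free"))

# Desired numeric value for each level, keyed by level name.
codes <- c(free = 0, b = 2, c = 5)

# Index by the level *name* (character), not by the internal integer code.
new_col <- unname(codes[as.character(f)])
new_col
```

Converting with as.character first matters: indexing with the factor itself would use its integer codes and pick values by position instead of by name.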

Merge and paste duplicate columns in R

Suppose I have two data frames with some common variable x:
df1 <- data.frame(
x=c(1, 2, 3, 4),
y=c("a", "b", "c", "d")
)
df2 <- data.frame(
x=c(1, 1, 2, 2, 3, 4, 5),
z=c("A", "B", "C", "D", "E", "F", "G")
)
We can assume that each entry of the variable we're merging over, x, appears exactly once in df1; however, it may appear an arbitrary number of times in df2.
I want to merge df2 'into' df1, while preserving df1. Is there a fast way of merging these two data frames such that the merged output would be of the form (for example):
df_merged <- data.frame(
x=c(1, 2, 3, 4),
y=c("a", "b", "c", "d"),
z=c("A B", "C D", "E", "F")
)
Essentially, I want df_merged to be a composition of the original df1, in addition to any variables in df2 coerced to match the format of df1. The various incantations of merge will append new rows to the merged output, which I want to avoid.
Speed is also a priority since I'll be merging fairly large data frames.
merge( df1,
aggregate(df2$z , df2[1], FUN=paste, collapse=" ", sep=""),
by.x="x", by.y=1)
x y x
1 1 a A B
2 2 b C D
3 3 c E
4 4 d F
Warning message:
In merge.data.frame(df1, aggregate(df2$z, df2[1], FUN = paste, collapse = " ", :
column name ‘x’ is duplicated in the result
> M1 <- .Last.value
> names(M1)[3] <- "z"
> M1
x y z
1 1 a A B
2 2 b C D
3 3 c E
4 4 d F
Another option:
df2.z <- with(df2, tapply(z, x, paste, collapse=' '))
transform(df1, z=df2.z[match(x, names(df2.z))])
# x y z
# 1 1 a A B
# 2 2 b C D
# 3 3 c E
# 4 4 d F
If df1$x is in order, then use df2.z[names(df2.z) %in% x] in the transform statement.
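The tapply and match approach can be written out step by step on the question's data. This is a sketch under the question's assumption that each x appears once in df1; z_by_x and df_merged are illustrative names.

```r
df1 <- data.frame(x = c(1, 2, 3, 4), y = c("a", "b", "c", "d"))
df2 <- data.frame(x = c(1, 1, 2, 2, 3, 4, 5),
                  z = c("A", "B", "C", "D", "E", "F", "G"))

# Collapse all z values that share the same x into one string per x.
z_by_x <- tapply(df2$z, df2$x, paste, collapse = " ")

# Look up each df1$x in the collapsed result; the rows of df1 are preserved.
df_merged <- df1
df_merged$z <- as.vector(z_by_x[match(as.character(df1$x), names(z_by_x))])
df_merged
```

Values of x that occur only in df2 (here 5) are simply never looked up, so no rows are appended to df1.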
I'm submitting this question with my own potential answer, but it is fairly slow and I'm curious what other methods might be available.
by <- "x"
df2_processed <- as.data.frame(
sapply( names(df2), function(x) {
tapply( df2[[x]], df2[[by]], function(xx) {
if( x == by ) {
return(xx[1])
} else {
paste(xx, collapse=" ")
}
})
}), optional=TRUE, stringsAsFactors=FALSE )
merge( df1, df2_processed, all.x=TRUE )
