I'm trying to modify the data in a data set based on a vector of columns to change. That way I could factorize the treatment based on a config file which would have the list of columns to change as a variable.
Ideally, I'd like to be able to use ddply like that :
column <- "var2"
df <- ddply(df, .(), transform, column = func(column))
The output would be the same dataframe but in the column "B", each letter would have an "A" added behind it
Which would change each element of the column var2 by the element through func (func here is used to trim a chr in a particular way). I've tried several solutions, like :
df[do.call(func, df[,column]), ]
which doesn't accept the df[,column] as argument (not a list), or
param = c("var1", "var2")
for(p in param){
df <- df[func(df[,p]),]
}
which destroys the other data, or
df[, column] <- lapply(df[, column], func)
Which doesn't work because it takes the whole column as argument instead of changing each element 1 by 1. I'm kinda out of ideas on how to make this treatment more automatic.
Example :
df <- data.frame(A=1:10, B=letters[2:11])
colname <- "B"
addA <- function(text) { paste0(text, "A") }
And I would like to do something like this :
df <- ddply(df, .(), transform, colname = addA(colname))
Though if the solution does not use ddply, it's not an issue, it's just what I'm the most used to
You could use mutate_at from package dplyr for this.
library(dplyr)
mutate_at(df, colname, addA)
A B
1 1 bA
2 2 cA
3 3 dA
4 4 eA
5 5 fA
6 6 gA
7 7 hA
8 8 iA
9 9 jA
10 10 kA
Related
I am trying to train a data that's converted from a document term matrix to a dataframe. There are separate fields for the positive and negative comments, so I wanted to add a string to the column names to serve as a "tag", to differentiate the same word coming from the different fields - for example, the word hello can appear both in the positive and negative comment fields (and thus, represented as a column in my dataframe), so in my model, I want to differentiate these by making the column names positive_hello and negative_hello.
I am looking for a way to rename columns in such a way that a specific string will be appended to all columns in the dataframe. Say, for mtcars, I want to rename all of the columns to have "_sample" at the end, so that the column names would become mpg_sample, cyl_sample, disp_sample and so on, which were originally mpg, cyl, and disp.
I'm considering using sapplyor lapply, but I haven't had any progress on it. Any help would be greatly appreciated.
Use colnames and paste0 functions:
df = data.frame(x = 1:2, y = 2:1)
colnames(df)
[1] "x" "y"
colnames(df) <- paste0('tag_', colnames(df))
colnames(df)
[1] "tag_x" "tag_y"
If you want to prefix each item in a column with a string, you can use paste():
# Generate sample data
df <- data.frame(good=letters, bad=LETTERS)
# Use the paste() function to append the same word to each item in a column
df$good2 <- paste('positive', df$good, sep='_')
df$bad2 <- paste('negative', df$bad, sep='_')
# Look at the results
head(df)
good bad good2 bad2
1 a A positive_a negative_A
2 b B positive_b negative_B
3 c C positive_c negative_C
4 d D positive_d negative_D
5 e E positive_e negative_E
6 f F positive_f negative_F
Edit:
Looks like I misunderstood the question. But you can rename columns in a similar way:
colnames(df) <- paste(colnames(df), 'sample', sep='_')
colnames(df)
[1] "good_sample" "bad_sample" "good2_sample" "bad2_sample"
Or to rename one specific column (column one, in this case):
colnames(df)[1] <- paste('prefix', colnames(df)[1], sep='_')
colnames(df)
[1] "prefix_good_sample" "bad_sample" "good2_sample" "bad2_sample"
You can use setnames from the data.table package, it doesn't create any copy of your data.
library(data.table)
df <- data.frame(a=c(1,2),b=c(3,4))
# a b
# 1 1 3
# 2 2 4
setnames(df,paste0(names(df),"_tag"))
print(df)
# a_tag b_tag
# 1 1 3
# 2 2 4
I am having a problem with get() in R.
I have a set of data.frames with a common structure in my environment. I want to loop through these data frames and change the name of the 2nd column so that the name of the 2nd column contains a prefix from the 1st column.
For example, if column 1 = A_cat and column 2 is dog, I want column 2 to be changed to A_dog.
Below is an example of the R code I am using:
df <- data.frame('A_cat'= 1:10 , 'dog' = 11:20)
for( element in grep('^df$', names(environment()), value=TRUE) ) {
colnames(get(element))[2] <- paste(strsplit(colnames(get(element)) [1], '`_`')[[1]][1],
colnames(get(element))[2], sep='`_`')
}
The arguments within the for loop, on either side of the assignment operator, both give the expected result if I run them separately but when run together produce the following error.
Error in colnames(get(element))[2] <- paste(strsplit(colnames(get(element))[1], :
could not find function "get<-"
Any help with this problem would be greatly appreciated.
This does the same thing as the code in the question without using get:
df <- data.frame('A_cat'= 1:10 , 'dog' = 11:20)
e <- environment() ##
df.names <- grep("^df$", names(e), value = TRUE)
# nm is the current data frame name and nms are its column names
for(nm in df.names) {
nms <- names(e[[nm]])
names(e[[nm]])[2] <- paste0(sub("_.*", "_", nms[1]), nms[2])
}
giving:
> df
A_cat A_dog
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
Keeping the data.frames in a named list as suggested in a comment to the question might be even better. For example, if instead of keeping the data.frames in an environment they were in a list called e
e <- list(df = df)
then omit the line marked ## and the rest works as is.
Here would be one way to accomplish this goal if the data.frames have systematic names (here, df1 df2 df3, etc) and the prefix ends with "_" as in the example:
# suggested by #roland roll them up in a list:
myDfList <- mget(ls(pattern="^df"))
# change names
for(dfName in names(myDfList)) {
names(myDfList[[dfName]])[2] <- paste0(gsub("^(.*_)", "\\1",
names(myDfList[[dfName]])[1]),
names(myDfList[[dfName]])[2])
}
I have two dataframes which look like follows:
df1 <- data.frame(V1 = 1:4, V2 = rep(2, 4), V3 = 7:4)
df2 <- data.frame(V2 = rep(NA, 4), V1 = rep(NA, 4), V3 = rep(NA, 4))
I need to write a function which assigns the values of df1 to df2, if the columnnames of both dataframes are the same. The structure of the function should look like this:
fun <- function(x){
if(# If the name of x is the same like the name of a column in df1)
out <- df1$? # Here I need to assign df1$"x" somehow
out
}
fun(df2$V1)
The output should look like this:
[1] 1 2 3 4
Unfortunately I couldnt find a solution by myself. Is there a way how I could do this? Thank you very much in advance!
I need to write a function which assigns the values of df1 to df2, if
the columnnames of both dataframes are the same.
Are you sure you need a function?
names_in_common <- intersect(names(df1),names(df2))
df2[,names_in_common] <- df1[,names_in_common]
Using Joachim Schork's code:
names_in_common <- intersect(names(df1),names(df2))
df2[,names_in_common] <- df1[,names_in_common]
and if you want to change a single column of df2:
names_in_common <- intersect(names(df1), names(df2[, "V1", drop=FALSE]))
df2[,names_in_common] <- df1[,names_in_common]
This is impossible, because when you access a column of a data.frame using the dollar syntax you lose the column name. There's no way for fun() to determine the column name of the vector that was passed in as an argument.
Instead, you can simply call fun() using the column name itself as the argument, rather than the vector of NAs, which are not useful and not used at all inside the function. In other words, the call becomes
fun('V1');
Then you can write the function as follows:
fun <- function(name) df1[[name]];
Demo:
fun('V1');
## [1] 1 2 3 4
Although now that I think about it, you might as well just index df1 directly, since that's all the function does now:
df1$V1;
## [1] 1 2 3 4
Rereading your question, you said you want to assign the column from df1 to df2, although your example code doesn't do that. Assuming you did want to carry out this assignment inside the function, you could do this:
fun <- function(name) df2[[name]] <<- df1[[name]];
Demo:
fun('V1');
df2;
## V2 V1 V3
## 1 NA 1 NA
## 2 NA 2 NA
## 3 NA 3 NA
## 4 NA 4 NA
This makes use of the superassignment operator <<-.
Is there a way to assign a value to a specific column within a data frame? e.g.,
dat2 = data.frame(c1 = 101:149, VAR1 = 151:200)
j = "dat2[,"VAR1"]" ## or, j = "dat2[,2]"
assign(j,1:50)
The approach above doesn't work. Neither does this:
j = "dat2"
assign(get(j)[,"VAR1"],1:50)
lets assume that we have a valid data.frame with 50 rows in each
dat2 <- data.frame(c1 = 1:50, VAR1 = 51:100)
1 . Don't use assign and get if you can avoid it.
"dat2[,"VAR1"]" is not valid in R.
You can also note this from the help page for assign
assign does not dispatch assignment methods, so it cannot be used to
set elements of vectors, names, attributes, etc.
Note that assignment to an attached list or data frame changes the
attached copy and not the original object: see attach and with.
A column of a data.frame is an element of a list
What you are looking for is [[<-
# assign the values from column (named element of the list) `VAR1`
j <- dat2[['VAR1']]
If you want to assign new values to VAR1 within dat2,
dat2[['VAR1']] <- 1:50
The answer to your question....
To manipulate entirely using character strings using get and assign
assign('dat2', `[[<-`(get('dat2'), 'VAR1', value = 2:51))
Other approaches
data.table::set
if you want to assign by reference within a data.frame or data.table (replacing an existing column only) then set from the data.table package works (even with data.frames)
library(data.table)
set(dat2, j = 'VAR1', value = 5:54)
eval and bquote
dat1 <- data.frame(x=1:5)
dat2 <- data.frame(x=2:6)
for(x in sapply(c('dat1','dat2'),as.name)) {
eval(bquote(.(x)[['VAR1']] <- 2:6))
}
eapply
Or if you use a separate environment
ee <- new.env()
ee$dat1 <- dat1
ee$dat2 <- dat2
# eapply returns a list, so use list2env to assign back to ee
list2env(eapply(ee, `[[<-`, 'y', value =1:5), envir = ee)
set2 <- function(x, val) {
eval.parent(substitute(x <- val))
}
> dat2 = data.frame(c1 = 101:150, VAR1 = 151:200)
> set2(dat2[["VAR1"]], 1:50)
> str(dat2)
'data.frame': 50 obs. of 2 variables:
$ c1 : int 101 102 103 104 105 106 107 108 109 110 ...
$ VAR1: int 1 2 3 4 5 6 7 8 9 10 ...
In my code, I am filling the columns of a dataframe with vectors, as so:
df1[columnNum] <- barWidth
This works fine, except for one thing: I want the name of the vector variable (barWidth above) to be retained as the column header, one column at a time. Furthermore, I do not wish to use cbind. This slows the execution of my code down considerably. Consequently, I am using a pre-allocated dataframe.
Can this be done in the vector-to-column assignment? If not, then how do I change it after the fact? I can't find the right syntax to do this with colNames().
TIA
It's being done by the [<-.data.frame function. It could conceivably be replaced by one that looked at the name of the argument but it's such a fundamental function I would be hesitant. Furthermore there appears to be an aversion to that practice signaled by this code at the top of the function definition:
> `[<-.data.frame`
function (x, i, j, value)
{
if (!all(names(sys.call()) %in% c("", "value")))
warning("named arguments are discouraged")
nA <- nargs()
if (nA == 4L) {
<snipped rest of rather long definition>
I don't know why that is there, but it is. Maybe you should either be thinking about using names<- after the column assignment, or using this method:
> dfrm["barWidth"] <- barWidth
> dfrm
a V2 barWidth
1 a 1 1
2 b 2 2
3 c 3 3
4 d 4 4
This can be generalized to a list of new columns:
dfrm <- data.frame(a=letters[1:4])
barWidth <- 1:4
newcols <- list(barWidth=barWidth, bw2 =barWidth)
dfrm[names(newcol)] <- newcol
dfrm
#
a barWidth bw2
1 a 1 1
2 b 2 2
3 c 3 3
4 d 4 4
If you have the list of names of vectors you want to apply you could do:
namevec <- c(...,"barWidth"...,)
columnNums <- c(...,10,...)
df1[columnNums[i]] <- get(namevec[i])
names(df1)[columnNums[i]] <- namevec[i]
or even
columnNums <- c(barWidth=4,...)
for (i in seq_along(columnNums)) {
df1[columnNums[i]] <- get(names(columnNums)[i])
}
names(df1)[columnNums] <- names(columnNums)
but the deeper question would be where this set of vectors is coming from in the first place: could you have them in a list all along?
I'd simply use cbind():
df1 <- cbind( df1, barWidth )
which retains the name. It will, however, end up as the last column in df1