My goal is to assign column names to a data frame I create, based on a variable that is passed in. For instance:
i='column1'
data.frame(i=1)
i
1 1
Above, the column name is 'i', but I want it to be 'column1'. I know the following works, but it isn't as efficient as I'd like:
i='column1'
df<-data.frame(x=1)
setNames(df,i)
column1
1 1
It helps to know how base R works here: `names<-` is an ordinary function, so you can call it directly:
i <- 'column1'
df <- `names<-`(data.frame(1), i)
df
# column1
#1 1
Aside from the answers posted by other users, I think you may be stuck with the solution you've already presented. If you already have a data frame with the intended number of rows, you can add a new column with [[ indexing, which uses the value of the variable as the column name:
df <- data.frame('column1'=1)
i <- 'column2'
df[[i]] <- 2
df
  column1 column2
1       1       2
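The same [[ assignment also works when several column names are held in a character vector. A minimal sketch, where the names in new_cols and the value 2 are illustrative assumptions:

```r
df <- data.frame(column1 = 1)

# Hypothetical column names held in a variable
new_cols <- c("column2", "column3")

# df[[nm]] uses the *value* of nm as the column name, not the literal "nm"
for (nm in new_cols) {
  df[[nm]] <- 2
}

print(df)
```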
If the idea is to avoid setNames altogether, you would probably never do this in practice, but it works:
i <- 'column1'
data.frame(`attr<-`(list(1), "names", i))
# column1
# 1 1
If you look at the source of data.frame, it contains
x <- list(...)
vnames <- names(x)
so you can set the names attribute of the list yourself.
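Since data.frame() takes its column names from names(list(...)), one way to build a frame whose names come from variables is to name a list first and hand it over with do.call(). This is a sketch under the assumption that the names and values are already in nms and vals (both hypothetical):

```r
# Hypothetical names and values held in variables
nms  <- c("column1", "column2")
vals <- list(1, 2)

# Name the list first; data.frame() reads the names of its arguments
names(vals) <- nms

# do.call() passes each named list element as a named argument to data.frame()
df <- do.call(data.frame, vals)
print(df)
```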
I'm not sure in what sense you want it to be more efficient, but you could assign all the column names at once with colnames() after your data frame has been assembled. Here's an example based on yours:
Td <- data.frame(a = c(1, 1), b = c(4, 5))  # assumed definition, matching the output below
data.frame(Td)
a b
1 1 4
2 1 5
nam <- c("Test1", "Test2")
colnames(Td) <- nam
data.frame(Td)
Test1 Test2
1 1 4
2 1 5
You could simply pass the column name and its values as arguments to data.frame(), without adding more lines:
df <- data.frame(column1=1)
df
# column1
#1 1
I'm trying to do something I thought would be pretty simple, but it has me stumped.
Say I have the following data frame:
id <- c("bob_geldof", "billy_bragg", "melvin_smith")
code <- c("blah", "di", "blink")
df <- as.data.frame(cbind(id,code))
> df
id code
1 bob_geldof blah
2 billy_bragg di
3 melvin_smith blink
And another like this:
ID1 <- c("bob_geldof", "melvin_smith")
ID2 <- c("the_builder", "kelvin")
alternates <- as.data.frame(cbind(ID1, ID2))
> alternates
ID1 ID2
1 bob_geldof the_builder
2 melvin_smith kelvin
If the character string in df$id matches alternates$ID1, I'd like to replace it with alternates$ID2. If it doesn't match I'd like to just leave it as it is.
The final df should look like
> df
id code
1 bob_the_builder blah
2 billy_bragg di
3 melvin_kelvin blink
This is obviously a silly example and my real dataset requires lots of replacements.
I've included the 'code' column to demonstrate that I'm working with a data frame and not just a character vector.
I’ve been using gsub to replace them individually but it's time consuming and the list keeps changing.
I looked into str_replace but it seems you can only specify one replacement value.
Any help would be much appreciated.
Cheers!
EDIT: Not all ids contain underscores, and I need to retain the part of the id that stays the same. E.g. bob_geldof becomes bob_the_builder.
EDIT 2(!): Thanks for your suggestions everyone. I've got round the problem by merging the data frames (so that there are NAs where there's no change to be made), and creating new IDs using an ifelse statement. It's a bit clunky but it works!
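The merge-plus-ifelse approach described in EDIT 2 might look roughly like this. It is a sketch, not the asker's actual code, and it assumes that the part to keep is everything before the first underscore (sub("_.*$", "", id)):

```r
df <- data.frame(id   = c("bob_geldof", "billy_bragg", "melvin_smith"),
                 code = c("blah", "di", "blink"),
                 stringsAsFactors = FALSE)
alternates <- data.frame(ID1 = c("bob_geldof", "melvin_smith"),
                         ID2 = c("the_builder", "kelvin"),
                         stringsAsFactors = FALSE)

# Left join: ID2 is NA wherever there is no replacement to make
m <- merge(df, alternates, by.x = "id", by.y = "ID1", all.x = TRUE)

# Where ID2 exists, keep the part before the first underscore and append ID2
m$id <- ifelse(is.na(m$ID2),
               m$id,
               paste(sub("_.*$", "", m$id), m$ID2, sep = "_"))

m <- m[, c("id", "code")]
print(m)
```

merge() sorts the result by the key column, so the rows come back in alphabetical order of the original id rather than in the order of df.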
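When creating the data frames, use stringsAsFactors = FALSE so you don't have to deal with factors. Then, if the rows to be replaced line up with alternates (here rows 1 and 3, which the recycled logical index c(TRUE,FALSE) selects), just apply: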
df <- as.data.frame(cbind(id,code),stringsAsFactors = FALSE)
alternates <- as.data.frame(cbind(ID1, ID2),stringsAsFactors = FALSE)
df$id[c(TRUE,FALSE)]=paste(gsub("(.*)(_.*)","\\1",df$id[c(TRUE,FALSE)]),
alternates$ID2,sep="_")
> df
id code
1 bob_the_builder blah
2 billy_bragg di
3 melvin_kelvin blink
If they are unordered, we can use dplyr:
library(dplyr)
df%>%rowwise()%>%mutate(id=if_else(length(which(alternates$ID1==id))>0,
paste(gsub("(.*)(_.*)","\\1",id),
alternates$ID2[which(alternates$ID1==id)],sep="_"),
id))
# A tibble: 3 x 2
id code
<chr> <chr>
1 bob_the_builder blah
2 billy_bragg di
3 melvin_kelvin blink
We use the same logic as before, but here we check df row by row. If a row's id matches any entry of alternates$ID1 (checked with length()), we update it.
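A vectorised base-R alternative to the rowwise() version uses match() to look up each id once; it does not depend on row order and keeps df's rows where they are. This is a sketch that, like the answers above, assumes the part to keep is the text before the first underscore:

```r
df <- data.frame(id   = c("bob_geldof", "billy_bragg", "melvin_smith"),
                 code = c("blah", "di", "blink"),
                 stringsAsFactors = FALSE)
alternates <- data.frame(ID1 = c("bob_geldof", "melvin_smith"),
                         ID2 = c("the_builder", "kelvin"),
                         stringsAsFactors = FALSE)

# Position of each df$id in alternates$ID1 (NA when there is no match)
idx <- match(df$id, alternates$ID1)
hit <- !is.na(idx)

# Replace only the matched rows; unmatched ids are left untouched
df$id[hit] <- paste(sub("_.*$", "", df$id[hit]),
                    alternates$ID2[idx[hit]],
                    sep = "_")
print(df)
```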
The following solution uses base R and is a bit more streamlined. Step 1: merge the main "df" and the "alternates" data frame with a left join. Step 2: find where the ID2 value is not missing (NA) and assign those values to "id". This keeps your original id where there is no match and replaces it with ID2 where a matching ID is available.
The solution:
combined <- merge(x=df,y=alternates,by.x="id",by.y="ID1",all.x=T)
combined$id[!is.na(combined$ID2)] <- combined$ID2[!is.na(combined$ID2)]
With full original data frame definitions (using stringsAsFactors=F):
id <- c("bob_geldof", "billy_bragg", "melvin_smith")
code <- c("blah", "di", "blink")
df <- as.data.frame(cbind(id,code),stringsAsFactors = F)
ID1 <- c("bob_geldof", "melvin_smith")
ID2 <- c("the_builder", "kelvin")
alternates <- as.data.frame(cbind(ID1, ID2),stringsAsFactors = F)
combined <- merge(x=df,y=alternates,by.x="id",by.y="ID1",all.x=T)
combined$id[!is.na(combined$ID2)] <- combined$ID2[!is.na(combined$ID2)]
Results: the full merge is shown below (use combined[,c("id","code")] for just the id and code columns). The non-matching "billy_bragg" is kept, and the others are replaced with the matched ID2 value.
> combined
id code ID2
1 billy_bragg di <NA>
2 the_builder blah the_builder
3 kelvin blink kelvin
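merge() sorts its result by the key column (sort = TRUE by default), which is why billy_bragg comes first above. If the original row order matters, one option is to carry a helper column through the merge and sort on it afterwards. A sketch, where row_order is a hypothetical column name:

```r
df <- data.frame(id   = c("bob_geldof", "billy_bragg", "melvin_smith"),
                 code = c("blah", "di", "blink"),
                 stringsAsFactors = FALSE)
alternates <- data.frame(ID1 = c("bob_geldof", "melvin_smith"),
                         ID2 = c("the_builder", "kelvin"),
                         stringsAsFactors = FALSE)

# Hypothetical helper column recording the original position of each row
df$row_order <- seq_len(nrow(df))

combined <- merge(x = df, y = alternates, by.x = "id", by.y = "ID1", all.x = TRUE)
hit <- !is.na(combined$ID2)
combined$id[hit] <- combined$ID2[hit]

# Restore the original order and drop the helper columns
combined <- combined[order(combined$row_order), c("id", "code")]
rownames(combined) <- NULL
print(combined)
```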
I want to map the FactorName values in the data frame FName onto the column names of Stack, i.e. Factor1 in Stack should be named Value, Factor2 should be Leverage, etc. I have a large dataset, so renaming manually is not an option.
Stack <- data.frame(rowid=1:3, Factor1=2:4, Factor2=3:5, Factor3=4:6)
FName <- data.frame(FactorID=c("Factor1","Factor2","Factor3"), FactorName=c("Value","Leverage","Growth"))
Thanks.
How about this using match:
Stack <- data.frame(rowid=1:3, Factor1=2:4, Factor2=3:5, Factor3=4:6)
FName <- data.frame(
FactorID=c("Factor1","Factor2","Factor3"),
FactorName=c("Value","Leverage","Growth"))
# Matching entries from FName
colnames(Stack) <- ifelse(
!is.na(FName$FactorName[match(colnames(Stack), FName$FactorID)]),
as.character(FName$FactorName[match(colnames(Stack), FName$FactorID)]),
colnames(Stack))
Stack
# rowid Value Leverage Growth
#1 1 2 3 4
#2 2 3 4 5
#3 3 4 5 6
Explanation: We match column names of Stack and entries from FName$FactorID. If there is a match, replace with FName$FactorName, else keep the original column name.
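The same idea can also be expressed with a named character vector used as a lookup table, which avoids calling match() twice. A sketch, not the answerer's code; lookup and new are illustrative names:

```r
Stack <- data.frame(rowid = 1:3, Factor1 = 2:4, Factor2 = 3:5, Factor3 = 4:6)
FName <- data.frame(FactorID   = c("Factor1", "Factor2", "Factor3"),
                    FactorName = c("Value", "Leverage", "Growth"))

# Named character vector: names are the old IDs, values are the new names
lookup <- setNames(as.character(FName$FactorName), as.character(FName$FactorID))

# Look up every column name; columns without an entry (e.g. rowid) give NA
new <- lookup[names(Stack)]

# Keep the old name where the lookup failed; unname() drops the lookup's names
names(Stack) <- unname(ifelse(is.na(new), names(Stack), new))
print(names(Stack))
```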
If we have the new names handy (in the same order as the columns), we can assign them all at once:
colnames(Stack) <- c("rowid", as.character(FName$FactorName))  # assumes FName rows follow Stack's column order
Another approach using match, but using indexing instead of ifelse
# Get indices of matches
m <- match(names(Stack), FName$FactorID)
# replace names where a match is found.
names(Stack)[!is.na(m)] <- as.character(FName$FactorName[m[!is.na(m)]])
I am trying to train a model on data that has been converted from a document-term matrix to a data frame. There are separate fields for positive and negative comments, so I want to add a string to the column names as a "tag" to distinguish the same word coming from different fields. For example, the word hello can appear in both the positive and negative comment fields (and is therefore represented as a column in my data frame), so in my model I want to distinguish these by naming the columns positive_hello and negative_hello.
I am looking for a way to rename columns so that a specific string is appended to every column name in the data frame. Say, for mtcars, I want to add "_sample" to the end of each column name, so that mpg, cyl, disp and so on become mpg_sample, cyl_sample, disp_sample, etc.
I'm considering using sapply or lapply, but I haven't made any progress with them. Any help would be greatly appreciated.
Use colnames and paste0 functions:
df = data.frame(x = 1:2, y = 2:1)
colnames(df)
[1] "x" "y"
colnames(df) <- paste0('tag_', colnames(df))
colnames(df)
[1] "tag_x" "tag_y"
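Applied to the question's setting, the same paste0() call can tag two frames before they are combined, and the suffix variant covers the mtcars example. A sketch in which pos and neg are hypothetical stand-ins for the frames derived from the two document-term matrices:

```r
# Hypothetical frames standing in for the positive and negative term matrices
pos <- data.frame(hello = 1:2, great = 3:4)
neg <- data.frame(hello = 5:6, awful = 7:8)

# Prefix the column names so the same word stays distinguishable after cbind()
colnames(pos) <- paste0("positive_", colnames(pos))
colnames(neg) <- paste0("negative_", colnames(neg))
combined <- cbind(pos, neg)
print(colnames(combined))

# Suffix variant for mtcars, as asked in the question
cars <- mtcars
colnames(cars) <- paste0(colnames(cars), "_sample")
print(head(colnames(cars), 3))
```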
If you want to prefix each item in a column with a string, you can use paste():
# Generate sample data
df <- data.frame(good=letters, bad=LETTERS)
# Use the paste() function to append the same word to each item in a column
df$good2 <- paste('positive', df$good, sep='_')
df$bad2 <- paste('negative', df$bad, sep='_')
# Look at the results
head(df)
good bad good2 bad2
1 a A positive_a negative_A
2 b B positive_b negative_B
3 c C positive_c negative_C
4 d D positive_d negative_D
5 e E positive_e negative_E
6 f F positive_f negative_F
Edit:
Looks like I misunderstood the question. But you can rename columns in a similar way:
colnames(df) <- paste(colnames(df), 'sample', sep='_')
colnames(df)
[1] "good_sample" "bad_sample" "good2_sample" "bad2_sample"
Or to rename one specific column (column one, in this case):
colnames(df)[1] <- paste('prefix', colnames(df)[1], sep='_')
colnames(df)
[1] "prefix_good_sample" "bad_sample" "good2_sample" "bad2_sample"
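On the sapply/lapply idea from the question: a per-name function does work, but paste() and paste0() are already vectorised over the names, so one call gives the same result. A small comparison sketch, using vapply() as the type-safe relative of sapply():

```r
df <- data.frame(mpg = 1, cyl = 2, disp = 3)

# Element-wise version: one paste0() call per column name
via_vapply <- vapply(names(df), function(n) paste0(n, "_sample"),
                     character(1), USE.NAMES = FALSE)

# Vectorised version: a single call covers all names
via_paste0 <- paste0(names(df), "_sample")

print(identical(via_vapply, via_paste0))
```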
You can use setnames() from the data.table package; it renames the columns by reference, so it doesn't create a copy of your data.
library(data.table)
df <- data.frame(a=c(1,2),b=c(3,4))
# a b
# 1 1 3
# 2 2 4
setnames(df,paste0(names(df),"_tag"))
print(df)
# a_tag b_tag
# 1 1 3
# 2 2 4
I am new to R. I am trying to use the "write.csv" command to write a csv file in R. Unfortunately, when I read the file back in, the resulting data frame has column names prefixed with X, even though the file already has column names.
It produces X_name1, X_name2.
Please kindly tell me your suggestions.
I have added example code similar to my data.
a<- c("1","2")
b <- c("3","4")
df <- rbind(a,b)
df <- as.data.frame(df)
names(df) <- c("11_a","12_b")
write.csv(df,"mydf.csv")
a <- read.csv("mydf.csv")
a
#Result
X X11_a X12_b
1 a 1 2
2 b 3 4
All I need is to have only "11_a" and "12_b" as column names, but it also includes the prefix X.
Use check.names=FALSE when reading your data back in. Names starting with a number are not syntactically valid in R, so by default (check.names=TRUE) read.csv() prefixes them with X:
read.csv(text="11_a,12_b
a,1,2
b,3,4", check.names=FALSE)
# 11_a 12_b
#a 1 2
#b 3 4
read.csv(text="11_a,12_b
a,1,2
b,3,4", check.names=TRUE)
# X11_a X12_b
#a 1 2
#b 3 4
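In the question's round trip there is a second source of X: write.csv() also writes the row names ("a" and "b") as an unnamed first column, which read.csv() then calls X. A sketch of the full round trip that handles both, writing to a temporary file instead of mydf.csv:

```r
a <- c("1", "2")
b <- c("3", "4")
df <- as.data.frame(rbind(a, b))
names(df) <- c("11_a", "12_b")

f <- tempfile(fileext = ".csv")
write.csv(df, f)  # also writes the row names "a" and "b" as an unnamed first column

# row.names = 1 turns that first column back into row names;
# check.names = FALSE keeps "11_a" and "12_b" unchanged
a2 <- read.csv(f, row.names = 1, check.names = FALSE)
print(a2)
```

If the row names aren't needed at all, write.csv(df, f, row.names = FALSE) avoids writing that column in the first place.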
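All you have to do is add header=TRUE to your code when you read in the .csv file. It would look like:
a <- read.csv("mydf.csv", header=TRUE)
(Note that header=TRUE is already the default for read.csv(), so this alone does not remove the X prefix; the prefix is controlled by check.names.)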
How do I add a column in the middle of an R data frame? I want to see if I have a column named "LastName" and then add it as the third column if it does not already exist.
One approach is to just add the column to the end of the data frame, and then use subsetting to move it into the desired position:
d <- data.frame(x = 1, y = 1:3, fac = c("C", "A", "A"))  # hypothetical 3-row example data
d$LastName <- c("Flim", "Flom", "Flam")
bar <- d[c("x", "y", "LastName", "fac")]
1) Testing for existence: Use %in% on the colnames, e.g.
> example(data.frame) # to get 'd'
> "fac" %in% colnames(d)
[1] TRUE
> "bar" %in% colnames(d)
[1] FALSE
2) You essentially have to create a new data.frame from the first half of the old, your new column, and the second half:
> bar <- data.frame(d[1:3,1:2], LastName=c("Flim", "Flom", "Flam"), fac=d[1:3,3])
> bar
x y LastName fac
1 1 1 Flim C
2 1 2 Flom A
3 1 3 Flam A
>
Of the many silly little helper functions I've written, this gets used every time I load R. It just makes a list of the column names and indices but I use it constantly.
## creates a data.frame listing the column names and their positions
namesind=function(df){
temp1=names(df)
temp2=seq_along(temp1)
temp3=data.frame(temp1,temp2)
names(temp3)=c("VAR","COL")
temp3 # temp1-temp3 are local, so no rm() is needed (code after return() never ran anyway)
}
ni <- namesind
Use ni to see your column numbers. (ni is just an alias for namesind; I never use namesind, but originally thought it was the better name.) Then, if you want to insert your column at, say, position 12, and your data.frame is named bob with 20 columns, it would be
bob2 <- data.frame(bob[,1:11], newcolumn, bob[,12:20])
Though I liked Hadley's add-it-at-the-end-and-rearrange answer as well.
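The index-splitting idea above can be wrapped in a small helper so the split points don't have to be hard-coded. This is a sketch; add_column_at is a hypothetical name, not a base R function, and it assumes pos is between 1 and ncol(df) + 1 and that the column doesn't exist yet:

```r
# Hypothetical helper: append a column, then move it to position `pos`
add_column_at <- function(df, name, value, pos) {
  df[[name]] <- value                        # new column goes last
  n <- ncol(df)
  before <- seq_len(pos - 1)                 # columns that stay in front
  after  <- setdiff(seq_len(n - 1), before)  # remaining original columns
  df[, c(before, n, after), drop = FALSE]
}

# Hypothetical 3-row example data
d <- data.frame(x = 1, y = 1:3, fac = c("C", "A", "A"))

# Only add LastName as the third column if it is not already there
if (!"LastName" %in% names(d)) {
  d <- add_column_at(d, "LastName", c("Flim", "Flom", "Flam"), pos = 3)
}
print(d)
```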
Dirk Eddelbuettel's answer works, but you don't need to indicate row numbers or specify entries in the LastName column. This code should do it for a data frame named df:
if(!("LastName" %in% names(df))){
df <- cbind(df[1:2],LastName=NA,df[3:length(df)])
}
(this defaults LastName to NA, but you could just as easily use "LastName='Smith'")
or using cbind:
> example(data.frame) # to get 'd'
> bar <- cbind(d[1:3,1:2],LastName=c("Flim", "Flom", "Flam"),fac=d[1:3,3])
> bar
x y LastName fac
1 1 1 Flim A
2 1 2 Flom B
3 1 3 Flam B
I always thought something like append() (unfortunate as the name is) should be a generic function:
## redefine append() as generic function
append.default <- append
append <- `body<-`(args(append),value=quote(UseMethod("append")))
append.data.frame <- function(x,values,after=length(x))
`row.names<-`(data.frame(append.default(x,values,after)),
row.names(x))
## apply the function
d <- (if( !"LastName" %in% names(d) )
append(d,values=list(LastName=c("Flim","Flom","Flam")),after=2) else d)