I have a data frame with the default column names V1, V2, V3, V230, etc.
I have another data frame which has 2 columns, one containing V1, V2, V3 etc and the second column containing a character string.
I would like to rename the columns in the first data frame using the second data frame as a lookup table.
Note that the first data frame has fewer columns than are listed in the second "lookup" data frame.
Any ideas?
We can use match:
colnames(firstdat) <- seconddat[,2][match(colnames(firstdat),
                                          seconddat[,1], nomatch=0)]
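A self-contained sketch of the match approach, using made-up sample data (the lookup column names `old` and `new` and all values are assumptions, not taken from the question). It leaves `nomatch` at its default, so a column missing from the lookup shows up as NA and can be caught before renaming:

```r
# Hypothetical stand-ins for the two data frames described in the question.
firstdat <- data.frame(V1 = 1:2, V3 = 3:4, V230 = 5:6)
seconddat <- data.frame(
  old = c("V1", "V2", "V3", "V230"),
  new = c("height", "width", "depth", "mass"),
  stringsAsFactors = FALSE
)

# Position of each current column name within the lookup's first column.
idx <- match(colnames(firstdat), seconddat[, 1])

# Rename only if every column was found; otherwise NA names would be assigned.
stopifnot(!anyNA(idx))
colnames(firstdat) <- seconddat[idx, 2]

print(colnames(firstdat))
```

Extra rows in the lookup (here `V2`) are simply never selected, which covers the case where the first data frame has fewer columns than the lookup lists.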
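Say the first data frame is x and the second one is y:
colnames(x) <- merge(data.frame(colnames(x)), y, by.x = "colnames.x.", by.y = "Col1")[,2]
### Col1 is the name of column 1 of *y* (containing V1, V2 etc)
### Caveat: merge sorts its result by the key column by default, so the names line up only if colnames(x) is already in that sorted order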
Consider using data.table
library(data.table)
DT <- as.data.table(df)
temp.lookup <- lookup[lookup$oldnames %in% names(DT), ]
setnames(DT, old = temp.lookup$oldnames, new = temp.lookup$newnames)
I have two data tables as below:
library(data.table)
x <- data.table(id = c(1,1,1,2,2,2,3,3,3,4,4,4), date = as.Date(c("2015-5-26","2015-6-15","2015-4-03","2015-5-26","2015-6-15","2015-4-03","2015-5-26","2015-6-15","2015-4-03","2015-5-26","2015-6-15","2015-4-03")))
y <- data.table(id=c(1,2,3,4),new_id=c(10,20,30,40))
I want to append the new_id column to the data table x and then drop the id column.
I can do this with
merge(x,y,by="id")
But I wanted to try lapply.
So I tried
x[,new_id:=0]
nm <- c("new_id")
x[nm] <- lapply(nm, function(z) y[[z]][match(y$id, x$id)])
It does not seem to match the column.
Also, which method will be efficient if I have many columns and many rows?
Any help is appreciated.
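One likely reason the attempt does not line up is that the arguments to match are swapped: match(y$id, x$id) returns one position per row of y, whereas one position per row of x is needed. A minimal base R sketch, using plain data frames instead of data.tables (a simplifying assumption to avoid package dependencies):

```r
# Same ids as in the question, held in plain data frames.
x <- data.frame(id = rep(1:4, each = 3))
y <- data.frame(id = c(1, 2, 3, 4), new_id = c(10, 20, 30, 40))

nm <- c("new_id")

# match(x$id, y$id) gives, for every row of x, the matching row of y.
x[nm] <- lapply(nm, function(z) y[[z]][match(x$id, y$id)])

print(x)
```

Because `nm` is a character vector, the same call extends to several lookup columns at once by adding more names to it.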
There are 3 data frames. The ID variable is in the 12th column of each data frame. I created a vector list_cc_q1 that contains all the unique IDs across all data frames (hence each entry in this vector appears in the 12th column of at least one data frame).
I wish to create a vector v1 that adds, for each ID, the values in the 7th column from each data frame which contains that ID (hence v1 would be of the same length as list_cc_q1). Here's the code I'm using:
f1 <- function(x,y){
ifelse(length(get(y)[which(get(y)[x,12]),7])>0, get(y)[which(get(y)[x,12]),7], 0)}
g1 <- function(x){sum(sapply(ls()[1:3], function(y){ f1(x,y)}))}
v1 <- sapply(list_cc_q1, function(z){ g1(z) })
This returns the following error:
Error in get(y)[x, 12] : incorrect number of dimensions
Called from: which(get(y)[x, 12])
I think I've overcomplicated the code; a simpler method would be immensely helpful.
But why doesn't this work?
Not sure I understand correctly, but how about:
library(data.table)
dt <- data.table(value = c(df1[[7]],df2[[7]],df3[[7]]), id = c(df1[[12]],df2[[12]],df3[[12]]))
dt[, .(sum = sum(value)), by = id]
This concatenates the 7th column of each of the three data.frames (df1, df2, df3) to a value column and the 12th column of each of the data.frames (df1, df2, df3) to an id column to form a data.table with two columns (value and id). It then sums the value column by the id column.
EDIT: Your code might not work because of the
ls()[1:3]
If I see this correctly, ls() is evaluated in the function's environment, which does not contain your three data.frames. You can see this by comparing the following:
ls()[1:3]
# [1] "df1" "df2" "df3"
function_ls <- function(){cat(ls()[1:3])}
function_ls()
# NA NA NA
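One way to sidestep the environment issue is to pass the data frames explicitly as a list instead of looking them up by name. A minimal base R sketch with made-up frames (here the value sits in column 2 and the ID in column 3, an assumption to keep the example small; the question uses columns 7 and 12):

```r
# Hypothetical small data frames; value in column 2, ID in column 3.
df1 <- data.frame(a = 0, value = c(1, 2), id = c("A", "B"))
df2 <- data.frame(a = 0, value = c(10, 20), id = c("B", "C"))
df3 <- data.frame(a = 0, value = 100, id = "A")

dfs <- list(df1, df2, df3)

# Stack the value and ID columns of every data frame.
values <- unlist(lapply(dfs, function(d) d[[2]]))
ids <- unlist(lapply(dfs, function(d) as.character(d[[3]])))

# Sum the values within each ID; the result is named by ID.
v1 <- tapply(values, ids, sum)
print(v1)
```

An ID that is missing from a data frame simply contributes nothing to its sum, so no explicit zero-filling is needed.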
I have a data set that looks similar to the image shown below. In total, it has over 1,000 observations. I want to create a new data frame that separates the single variable into 3 variables. The values are separated by a "+" in each observation, so the split needs to use that as the delimiter.
Here is a solution using data.table:
library(data.table)
# Data frame
df <- data.frame(MovieId.Title.Genres = c("yyyy+xxxx+wwww", "zzzz+aaaa+aaaa"))
# Data frame to data table.
df <- data.table(df)
# Split column into parts.
df[, c("MovieId", "Title", "Genres") := tstrsplit(MovieId.Title.Genres, "\\+")]
# Print data table
df
I'll assume that your movieData object is a single column data.frame object.
If you want to split a single element from your data set, use strsplit using the character + (which R wants to see written as "\\+"):
# split the first element of movieData into a vector of strings:
strsplit(as.character(movieData[1,1]), "\\+")
Apply strsplit to the entire column (it is vectorized), then massage the resulting list into a nice, usable data.frame:
# convert to a list of character vectors, one per row:
step1 = strsplit(as.character(movieData[,1]), "\\+")
# step1 is a list, so bind its elements row-wise and make it into a data.frame:
step2 = as.data.frame(do.call(rbind, step1))
# step2 is a nice data.frame, but its names are garbage. Fix it:
movieDataWithColumns = setNames(step2, c("MovieId", "Title", "Genres"))
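For trying the column split on something concrete, here is a small runnable sketch with made-up sample values. It uses `fixed = TRUE` as an alternative to escaping the plus sign, and assumes every row contains exactly three "+"-separated parts:

```r
# Sample data in the shape the question describes (values are made up).
movieData <- data.frame(MovieId.Title.Genres = c("yyyy+xxxx+wwww", "zzzz+aaaa+aaaa"))

# fixed = TRUE treats "+" literally, so no regex escaping is needed.
parts <- strsplit(as.character(movieData[, 1]), "+", fixed = TRUE)

# Each list element becomes one row of a character matrix.
mat <- do.call(rbind, parts)

movieDataWithColumns <- setNames(as.data.frame(mat, stringsAsFactors = FALSE),
                                 c("MovieId", "Title", "Genres"))
print(movieDataWithColumns)
```

If some rows had a different number of parts, the row-wise bind would recycle values with a warning, so that case would need separate handling.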
I have 4 data frames all with the same number of columns and identical column names.
The order of the columns is different.
I want to combine all 4 data frames together and match them with the column name.
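Working in Azure ML, this was the best option I found to automate this merge.
library(data.table)
# := is data.table syntax, so convert the inputs to data.tables first
df <- as.data.table(maml.mapInputPort(1))
df2 <- as.data.table(maml.mapInputPort(2))
# add any columns missing from one table to the other, filled with NA
if (length(df2.toAdd <- setdiff(names(df), names(df2))))
  df2[, c(df2.toAdd) := NA]
if (length(df.toAdd <- setdiff(names(df2), names(df))))
  df[, c(df.toAdd) := NA]
# bind rows, matching columns by name
df3 <- rbind(df, df2, use.names=TRUE)
maml.mapOutputPort("df3");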
Suppose your 4 data frames are named df1, df2, df3 and df4. Since the number of columns and the column names are identical, why not this:
cl <- sort(colnames(df1))
mrg <- rbind(df1[,cl], df2[,cl], df3[,cl], df4[,cl])
If you want to have them in a specific order of columns, for example the order of columns in df2, then you can do this:
mrg <- mrg[,colnames(df2)]
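It may also be worth knowing that base R's rbind, when given data frames, aligns columns by name rather than by position, so the explicit reordering is a safeguard rather than a requirement. A small sketch with made-up data frames (names and values are assumptions for illustration):

```r
# Hypothetical data frames with identical column names in different orders.
df1 <- data.frame(a = 1, b = "x", c = TRUE)
df2 <- data.frame(c = FALSE, a = 2, b = "y")
df3 <- data.frame(b = "z", c = TRUE, a = 3)
df4 <- data.frame(a = 4, c = FALSE, b = "w")

# rbind on data frames aligns columns by name and keeps df1's column order.
mrg <- rbind(df1, df2, df3, df4)
print(mrg)
```

Sorting the columns first, as in the answer above, makes the intended order explicit, which can be easier to read.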
I have the following data frame df2 and a vector n. How can I create a new data frame containing only the df2 columns whose names are in vector n?
df2 <- data.frame(x1=c(1,6,3),x2=c(4,3,1),x3=c(5,4,6),x4=c(7,6,7))
n<-c("x1","x4")
Any of these would work:
df2[n]
df2[, n] # see note below for caveat
subset(df2, select = n)
Note that in the second form, if n has length one (i.e. a single column), the result is a vector rather than a data frame. If you want it to always return a data frame, you would need instead:
df2[, n, drop = FALSE]
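A quick check of that caveat, reusing df2 from the question with a single column name in n:

```r
df2 <- data.frame(x1 = c(1, 6, 3), x2 = c(4, 3, 1), x3 = c(5, 4, 6), x4 = c(7, 6, 7))
n <- "x1"  # a single column name

is.data.frame(df2[n])                   # TRUE
is.data.frame(df2[, n])                 # FALSE: a plain numeric vector
is.data.frame(df2[, n, drop = FALSE])   # TRUE
```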
df3 <- subset(df2, select=c("x1", "x4"))
df3
hope it helps