I am trying to re-order columns in R using a for loop since the column range needs to be dynamic. Does anyone know what is missing from my code?
Group <- c("A","B","C","D")
Attrib1 <- c("x","y","x","z")
Attrib2 <- c("q","w","u","i")
Day1A <- c(5,4,6,3)
Day2A <- c(6,5,7,4)
Day3A <- c(9,8,10,7)
Day1B <- c(4,3,5,2)
Day2B <- c(3,2,4,1)
Day3B <- c(2,1,3,0)
df <- data.frame(Group, Attrib1,Attrib2,Day1A,Day2A,Day3A,Day1B,Day2B,Day3B)
day_count <- 3
for(i in 4:ncol(df)) {
if (i == day_count+3) break
df[c(i,day_count+i)]
}
Here is my desired result:
df <- data.frame(Group, Attrib1,Attrib2,Day1A,Day1B,Day2A,Day2B,Day3A,Day3B)
In theory, sort(colnames(df)[4:ncol(df)]) gives you that order, but it breaks once the day numbers reach two digits (say Day1A..Day10A..Day20A), because the names are sorted as text rather than by number.
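To see the problem with a plain sort(), here is a small sketch with made-up column names (the vector x is not from the question). method = "radix" is only there so that the result does not depend on the locale:

```r
# Hypothetical column names, chosen to show the pitfall
x <- c("Day1A", "Day2A", "Day10A", "Day1B", "Day2B", "Day10B")

# Text sorting compares character by character, so "Day10A" lands before "Day1A"
sorted <- sort(x, method = "radix")
print(sorted)
```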
Below is a quick workaround that extracts the day numbers and the letter suffixes and orders by both:
COLS = colnames(df)[4:ncol(df)]
day_no = as.numeric(gsub("[^0-9]","",COLS))
day_letter = gsub("Day[0-9]*","",COLS)
o = order(day_no,day_letter)
To get your final dataframe:
df[,c(colnames(df)[1:3],COLS[o])]
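If you need this for more than one data frame, the same steps can be wrapped in a function. This is only a sketch: reorder_day_cols() and its n_fixed argument are names I made up, and it assumes every column after the first n_fixed is called "Day<number><suffix>".

```r
# reorder_day_cols() is a hypothetical helper, not part of any package.
# Assumption: all columns after the first n_fixed look like "Day10A".
reorder_day_cols <- function(data, n_fixed = 3) {
  fixed <- names(data)[seq_len(n_fixed)]
  cols <- setdiff(names(data), fixed)
  day_no <- as.numeric(gsub("[^0-9]", "", cols))  # numeric part, e.g. 10
  day_letter <- gsub("Day[0-9]*", "", cols)       # suffix, e.g. "A"
  data[, c(fixed, cols[order(day_no, day_letter)]), drop = FALSE]
}

# Made-up example that includes a two-digit day
toy <- data.frame(Group = c("A", "B"), Attrib1 = c("x", "y"), Attrib2 = c("q", "w"),
                  Day1A = 1:2, Day2A = 3:4, Day10A = 5:6,
                  Day1B = 7:8, Day2B = 9:10, Day10B = 11:12)
result <- reorder_day_cols(toy)
names(result)
```

Because the day number is compared as a number, Day10A and Day10B end up after Day2B instead of before Day1A.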
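An option with select:
library(dplyr)
library(stringr)
# as.numeric() makes Day10 sort after Day2; ties keep their original order (A before B)
df %>%
  select(Group, starts_with('Attrib'),
         names(.)[-(1:3)][order(as.numeric(str_remove_all(names(.)[-(1:3)], '\\D+')))])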
I want to make names based on the number of iterations in an R loop. For example, I want my columns in the data frame to be called "column_1", "column_2" and so on. So far, I have tried the following code, but it does not work:
df = data.frame(rep(0, 5))
for (i in 1:5) {
df = cbind(df, paste0("column_", i) = rnorm(5))
}
Also, please note that if it did work, I would need to remove the first column using:
df = df[,-1]
What is the best way to avoid creating such an initial column? I created it because an empty data frame df = data.frame() does not take a new column when using df = cbind(df, rnorm(5)) because of the mismatch between the number of rows.
Try like this
df = list()
for (i in 1:5) {
df[[paste0("column_", i)]] = rnorm(5)
}
do.call('data.frame', df)
column_1 column_2 column_3 column_4 column_5
1 -0.47624689 0.1192452 1.6756969 -0.5739735 0.05974994
2 -0.78860284 0.2436874 -0.4411632 0.6179858 -0.70459646
3 -0.59461727 1.2324759 -0.7230660 1.1098481 -0.71721816
4 1.65090747 -0.5160638 -1.2362731 0.7075884 0.88465050
5 -0.05402813 -0.9925072 -1.2847157 -0.3636573 -1.01559258
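On why the original attempt fails: in cbind(df, paste0("column_", i) = rnorm(5)) the left-hand side of = has to be a literal name, so R rejects the call before it even runs. If you do not need the loop at all, one base R sketch builds the named list in one go with setNames() and lapply() (n_cols and n_rows are names I made up):

```r
n_cols <- 5
n_rows <- 5

# One random vector per column, named column_1 ... column_5
cols <- setNames(lapply(seq_len(n_cols), function(i) rnorm(n_rows)),
                 paste0("column_", seq_len(n_cols)))
df <- as.data.frame(cols)
str(df)
```

This also avoids the dummy first column, since the data frame is only created once all columns exist.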
Alternatively, to pre-allocate df, you could try this:
df = vector('list', 5)
names(df) = paste0("column_", 1:5)
for(i in 1:5) df[[i]] = rnorm(5)
do.call('data.frame', df)
As an alternative to the existing solution, you can allocate the appropriate number of rows beforehand:
len <- 5
df <- data.frame(numeric(len))
for (i in 1:10){
df[paste0("column_",i)] <- rnorm(len)
}
df[[1]] <- NULL
df
I have two data frames:
DF <- data.frame(A=letters[1:5],B=1:5)
DF_2 <- data.frame(match_col = c("a","a","c"))
Here we have to get only matching columns of DF_2$match_col
final_df <- data.frame(A=c("a","a","c","d","e"),B=1:5)
Your question is not very clear. I am not sure whether your DF_2 has a column B. I assume you forgot to include it, since such a column is needed to perform the matching.
Please see below:
DF <- data.frame(A=letters[1:5],B=1:5)
DF_2 <- data.frame(match_col = c("a","a","c"))
DF_2$B=c(1:3)
DF$A= as.character(DF$A)
DF_2$match_col= as.character(DF_2$match_col)
for(id in 1:nrow(DF_2)){
DF$A[DF$B %in% DF_2$B[id]] <- DF_2$match_col[id]
}
DF
Here my DF matches with your final_df, therefore I presume my assumption is right.
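Under the same assumption (DF_2 carries a key column B, which is not in the original question), the loop can also be replaced by a single match() call. A minimal sketch:

```r
DF <- data.frame(A = letters[1:5], B = 1:5, stringsAsFactors = FALSE)
# B is the assumed key column; it is not part of the DF_2 in the question
DF_2 <- data.frame(match_col = c("a", "a", "c"), B = 1:3, stringsAsFactors = FALSE)

idx <- match(DF_2$B, DF$B)   # row of DF that each row of DF_2 points to
ok <- !is.na(idx)            # skip keys that do not occur in DF
DF$A[idx[ok]] <- DF_2$match_col[ok]
DF
```

Rows of DF whose B does not appear in DF_2 keep their original value of A.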
I have this loop to compute the mean per column, which works.
for (i in 1:length(DF1)) {
tempA <- DF1[i] # save column of DF1 onto temp variable
names(tempA) <- 'word' # label temp variable for inner_join function
DF2 <- inner_join(tempA, DF0, by='word') # match words with numeric value from look-up DF0
tempB <- as.data.frame(t(colMeans(DF2[-1]))) # compute mean of column
DF3 <- rbind(tempB, DF3) # save results together
}
The script uses the dplyr package for inner_join.
DF0 is the look-up database with a word column and 3 value columns (word, value1, value2, value3).
DF1 is the text data with one word per cell.
DF3 is the output.
Now I want to compute the median instead of the mean. It seemed easy enough with the colMedians function from 'robustbase', but I can't get the below to work.
library(robustbase)
for (i in 1:length(DF1)) {
tempA <- DF1[i]
names(tempA) <- 'word'
DF2 <- inner_join(tempA, DF0, by='word')
tempB <- as.data.frame(t(colMedians(DF2[-1])))
DF3<- rbind(tempB, DF3)
}
The error message reads:
Error in colMedians(tog[-1]) : Argument 'x' must be a matrix.
I've tried to format DF2 as a matrix prior to the colMedians function, but still get the error message:
Error in colMedians(tog[-1]) : Argument 'x' must be a matrix.
I don't understand what is going on here. Thanks for the help!
Happy to provide sample data and error traceback, but trying to keep it as crisp and simple as possible.
According to the comment by the OP, the following solved the problem.
I have added a call to library(dplyr).
My contribution was colMedians(data.matrix(DF2[-1]), na.rm = TRUE).
library(robustbase)
library(dplyr)
for (i in 1:length(DF1)) {
tempA <- DF1[i]
names(tempA) <- 'word'
DF2 <- inner_join(tempA, DF0, by='word')
tempB <- colMedians(data.matrix(DF2[-1]), na.rm = TRUE)
DF3 <- rbind(tempB, DF3)
}
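The error comes from DF2[-1] still being a data frame, which colMedians() does not accept; data.matrix() converts it first. If you would rather not depend on robustbase, the same idea can be sketched in base R. Note the swaps: merge() stands in for inner_join(), vapply() with median() stands in for colMedians(), and the toy DF0 and DF1 below are made up to match the description in the question.

```r
# Made-up look-up table and text data, shaped like the DF0 / DF1 described above
DF0 <- data.frame(word = c("good", "bad", "fine", "poor"),
                  value1 = c(1, 2, 3, 4),
                  value2 = c(10, 20, 30, 40),
                  value3 = c(5, 5, 6, 8),
                  stringsAsFactors = FALSE)
DF1 <- data.frame(text1 = c("good", "bad", "fine"),
                  text2 = c("bad", "poor", "unknown"),
                  stringsAsFactors = FALSE)

# One vector of medians per column of DF1
rows <- lapply(seq_along(DF1), function(i) {
  tempA <- data.frame(word = DF1[[i]], stringsAsFactors = FALSE)
  DF2 <- merge(tempA, DF0, by = "word")              # keeps matching words only
  vapply(DF2[-1], median, numeric(1), na.rm = TRUE)  # median of each value column
})
DF3 <- as.data.frame(do.call(rbind, rows))
DF3
```

Unlike the loop above, which prepends each result, the rows of DF3 here follow the column order of DF1.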
Stumbled on this answer, which helped me fix the loop as follows:
DF3Mean <- data.frame() # instantiate dataframe
DF4Median <- data.frame() # instantiate dataframe
for (i in 1:length(DF1)) {
  tempA <- DF1[i] # save column of DF1 onto temp variable
  names(tempA) <- 'word' # label temp variable for inner_join function
  DF2 <- inner_join(tempA, DF0, by='word') # match words with numeric value from look-up DF0
  tempMean <- as.data.frame(t(colMeans(DF2[-1]))) # compute mean of column
  DF3Mean <- rbind(tempMean, DF3Mean) # save results together
  tempMedian <- apply(DF2[ ,2:4], 2, median) # compute median for columns 2, 3, and 4
  DF4Median <- rbind(tempMedian, DF4Median) # save results together
}
I guess I was too fixated on the colMedians function.
I have the following sample data:
set.seed(8760)
ID <- c(rep(1:4, each = 6))
i <- paste(rep(LETTERS[1:6], times=4))
value <- sample(1:10000, 24)
input <- data.frame(k, i, value)
wf <- data.frame("ID"=unique(k), "WF"=sample(1:365, 4))
I just can't find an efficient way to extend my input data frame with a column that holds the wf value corresponding to each row's ID. Could someone help with that?
Thanks in advance,
BenR
You could use merge, assuming that k is ID
merge(input, wf, by='ID')
or use match
input$WF <- wf$WF[match(input$ID, wf$ID)]
data
input <- data.frame(ID, i, value)
set.seed(24)
wf <- data.frame(ID=unique(ID), WF=sample(1:365, 4))
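Putting the data and both lookups together, here is a self-contained sketch that runs the two approaches side by side (by_match and by_merge are names I made up; k from the question is assumed to be ID):

```r
set.seed(8760)
ID <- rep(1:4, each = 6)
i <- rep(LETTERS[1:6], times = 4)
value <- sample(1:10000, 24)
input <- data.frame(ID, i, value)
wf <- data.frame(ID = unique(ID), WF = sample(1:365, 4))

# match() keeps the row order of input and only adds a column
by_match <- input
by_match$WF <- wf$WF[match(by_match$ID, wf$ID)]

# merge() builds a new data frame, sorted by the key by default
by_merge <- merge(input, wf, by = "ID")

# Both approaches attach the same WF value to every ID
same <- all(by_match$WF[order(by_match$ID)] == by_merge$WF)
same
```

match() is the lighter option when you only need one column from wf; merge() is more convenient when several columns should come along.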
I have two data frames. First one looks like
dat <- data.frame(matrix(nrow=2,ncol=3))
names(dat) <- c("Locus", "Pos", "NVAR")
dat[1,] <- c("ACTC1-001_1", "chr15:35087734..35087734", "1" )
dat[2,] <- c("ACTC1-001_2 ", "chr15:35086890..35086919", "2")
where chr15:35086890..35086919 indicates all the numbers within this range.
The second looks like:
dat2 <- data.frame(matrix(nrow=2,ncol=3))
names(dat2) <- c("VAR","REF.ALT"," FUNC")
dat2[1,] <- c("chr1:116242719", "T/A", "intergenic" )
dat2[2,] <- c("chr1:116242855", "A/G", "intergenic")
I want to merge these by the values in dat$Pos and dat2$VAR. If the single number in a cell in dat2$VAR is contained within the range of a cell in dat$Pos, I want to merge those rows. If this occurs more than once (a dat2$VAR value falls in more than one range in dat$Pos), I want it merged each time. What's the easiest way to do this?
Here is a solution, quite short but not particularly efficient so I would not recommend it for large data. However, you seemed to indicate your data was not that large so give it a try and let me know:
library(plyr)
exploded.dat <- adply(dat, 1, function(x){
parts <- strsplit(x$Pos, ":")[[1]]
chr <- parts[1]
range <- strsplit(parts[2], "..", fixed = TRUE)[[1]]
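start <- as.integer(range[1])  # convert explicitly; integers also keep paste() from producing labels like "1e+05"
end <- as.integer(range[2])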
data.frame(VAR = paste(chr, seq(from = start, to = end), sep = ":"), x)
})
merge(dat2, exploded.dat, by = "VAR")
If it is too slow or uses too much memory for your needs, you'll have to implement something a bit more complex and this other question looks like a good starting point: Merge by Range in R - Applying Loops.
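As a rough idea of what such an approach could look like without expanding every range, here is a base R sketch. It parses the chromosome and the bounds, pairs variants with ranges on the same chromosome, and keeps the pairs where the position falls inside the range. The data frames ranges and vars are made up (same layout as dat and dat2, but with one position that actually matches), and this is not the method from the linked question.

```r
# Made-up data in the same layout as dat and dat2
ranges <- data.frame(Locus = c("ACTC1-001_1", "ACTC1-001_2"),
                     Pos = c("chr15:35087734..35087734", "chr15:35086890..35086919"),
                     stringsAsFactors = FALSE)
vars <- data.frame(VAR = c("chr15:35086900", "chr1:116242719"),
                   FUNC = c("exonic", "intergenic"),
                   stringsAsFactors = FALSE)

# Split "chr:start..end" into chromosome, start and end
ranges$chr <- sub(":.*", "", ranges$Pos)
ranges$start <- as.numeric(sub(".*:([0-9]+)\\.\\..*", "\\1", ranges$Pos))
ranges$end <- as.numeric(sub(".*\\.\\.", "", ranges$Pos))

# Split "chr:position" into chromosome and position
vars$chr <- sub(":.*", "", vars$VAR)
vars$pos <- as.numeric(sub(".*:", "", vars$VAR))

# Pair every variant with every range on the same chromosome, then keep the hits
pairs <- merge(vars, ranges, by = "chr")
hits <- pairs[pairs$pos >= pairs$start & pairs$pos <= pairs$end, ]
hits[, c("VAR", "FUNC", "Locus", "Pos")]
```

The intermediate pairs table still grows with the number of variants times the number of ranges per chromosome, so for very large inputs a dedicated interval join would be the better choice.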
Please try this out and let us know how it works. Without a larger data set it is a bit hard to troubleshoot. If for whatever reason it does not work, please share a few more rows from your data tables (specifically ones that would match).
SPLICE THE DATA
range.strings <- do.call(rbind, strsplit(dat$Pos, ":"))[, 2]
range.strings <- do.call(rbind, strsplit(range.strings, "\\.\\."))
mins <- as.numeric(range.strings[,1])
maxs <- as.numeric(range.strings[,2])
d2.vars <- as.numeric(do.call(rbind, strsplit(dat2$VAR, ":"))[,2])  # base strsplit, so stringr is not needed
names(d2.vars) <- seq(d2.vars)
FIND THE MATCHES
# row number is the row in dat
# col number is the row in dat2
# both bounds are inclusive, so a one-position range such as 35087734..35087734 can match
matches <- sapply(d2.vars, function(v) mins <= v & v <= maxs)
MERGE
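# create a column in dat to merge by
dat$VAR <- NA_character_
# use the VAR in dat2 as the merge id: copy it into every row of dat whose range contains it
# (assigning inside sapply() would only change a local copy of dat, hence the for loop;
#  a row of dat that contains several VARs keeps only the last one)
for (i in seq_len(ncol(matches))) {
  dat$VAR[matches[, i]] <- dat2[i, "VAR"]
}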
merge(dat, dat2)