I have a data frame with a sequence of numeric columns, surrounded on both sides by (irrelevant) character columns. I want to obtain a new data frame that keeps the irrelevant columns in place and sums the numeric columns according to a grouping vector (or applies some other row-wise function to the data frame, by group). Example:
sample = data.frame(cha1 = c("A","B"),num1=1:2,num2=3:4,num3=11:12,num4=13:14,cha2=c("C","D"))
> sample
  cha1 num1 num2 num3 num4 cha2
1    A    1    3   11   13    C
2    B    2    4   12   14    D
with the goal to obtain
> goal
  cha1 X1 X2 cha2
1    A  4 24    C
2    B  6 26    D
i.e. I've summed the 4 numeric columns according to the grouping vector gl(2,2,4) = (1,1,2,2) [levels: 1,2]
For a purely numeric data frame I've found the following method:
sample_num = sample[,2:5] #select numeric columns
data.frame(t(apply(sample_num,1,function(row) tapply(row, INDEX=gl(2,2,4),sum))))
I could combine this with re-inserting the character columns to give the intended result, but I'm really looking for a more elegant way. I'm particularly interested in a plyr method if there is one, as I'm trying to migrate to plyr for all my data frame manipulations. I imagine the first step would be to cast the data frame into long format, but I have no idea how to proceed from there.
One 'absolute' requirement is that I cannot do without the gl(n,k,l) method of grouping, as I need this to be applicable to a wide range of data frames and grouping factors.
EDIT: for simplicity assume that I know which columns are the relevant numeric columns. I'm not concerned with how to select them, I'm concerned with how to do my grouped sum without messing up the original data frame structure.
Thanks!
Grpindex<-gl(2,2,4)
goal<-cbind.data.frame(sample["cha1"],(t(rowsum(t(sample[,2:5]), paste0("X",Grpindex)))),sample["cha2"])
Output:
  cha1 X1 X2 cha2
1    A  4 24    C
2    B  6 26    D
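If this has to work for a wide range of data frames and grouping factors, the same rowsum() idea can be wrapped in a small function. This is only a sketch: grouped_rowsum() and its arguments are hypothetical names, and it assumes the numeric columns form one contiguous block.

```r
# Hypothetical helper (not from the original answer): sums the columns `cols`
# of `df` by the grouping factor `grp` and leaves all other columns in place.
# Assumes `cols` is a contiguous run of numeric columns.
grouped_rowsum <- function(df, cols, grp, prefix = "X") {
  stopifnot(length(cols) == length(grp))
  # rowsum() sums matrix rows by group, so transpose, sum, and transpose back.
  # reorder = FALSE keeps the groups in order of first appearance.
  sums <- t(rowsum(t(as.matrix(df[, cols, drop = FALSE])),
                   paste0(prefix, grp), reorder = FALSE))
  before <- df[, seq_len(ncol(df)) < min(cols), drop = FALSE]
  after  <- df[, seq_len(ncol(df)) > max(cols), drop = FALSE]
  cbind(before, as.data.frame(sums), after)
}

sample <- data.frame(cha1 = c("A", "B"), num1 = 1:2, num2 = 3:4,
                     num3 = 11:12, num4 = 13:14, cha2 = c("C", "D"))
goal <- grouped_rowsum(sample, cols = 2:5, grp = gl(2, 2, 4))
goal
```

Any gl(n,k,l) factor can be passed as grp, as long as its length equals the number of selected columns.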
I have a data frame from which I want to take the values of a single row, across all columns, as a numeric vector. One way of doing that would be df_trasposed = t(df), after which I can take the wanted column with df_trasposed$column
I feel there is a better way of doing it, without creating a new object and using more memory. I tried something like t(df)$column but, unsurprisingly, this does not work.
How can this be done?
Try this
as.numeric(df['rowname',])
Data frames are a special type of list, consisting of vectors of equal length. So we can treat a data frame as a list and extract the nth element of each vector, where n is the row number in your data frame. Example:
df
#   X1 X2 X3
# 1  1  5  9
# 2  2  6 10
# 3  3  7 11
# 4  4  8 12
sapply(df, `[`, 3)
# X1 X2 X3
#  3  7 11
You can wrap unname(.) around it to drop the element names, but this probably creates another copy in memory and is purely cosmetic.
Data:
df <- data.frame(matrix(1:12, 4, 3))
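For comparison, a short sketch that puts the approaches side by side on the same data. It assumes every column is numeric; with mixed column types, unlist() and sapply() would coerce everything to a common type. The names row_a, row_b and row_c are only for illustration.

```r
df <- data.frame(matrix(1:12, 4, 3))

row_a <- as.numeric(df[3, ])   # unnamed double vector
row_b <- unlist(df[3, ])       # named vector, keeps the column type
row_c <- sapply(df, `[`, 3)    # named vector, taken column by column

row_a
row_b
row_c
```

None of these needs t(df), so no transposed copy of the whole data frame is created.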
I have a data frame composed of several paired columns. For example, the first column is a list of names and the second column contains numeric values quantifying the variables in the first column. The third column is again a list of names, the fourth column is numeric and quantifies the variables in the third column, and so on.
I now want to automatically split off the first two columns into a separate data frame and the third and fourth columns into a second data frame. The final aim is to align the rows by name.
For example, from dataframe a
names_a<-c("a","b","c","d")
values_a<-c(1,2,3,4)
names_b<-c("a","b","e","f")
values_b<-c(5,6,7,8)
a<-as.data.frame(cbind(names_a,values_a,names_b,values_b))
I would obtain a dataframe containing names_a and values_a and another dataframe containing names_b and values_b, then aligning them to have dataframe a1:
names_a1<-c("a","b","c","d","e","f")
values_a1<-c(1,2,3,4,0,0)
values_b1<-c(5,6,0,0,7,8)
a1<-as.data.frame(cbind(names_a1,values_a1,values_b1))
Any suggestion?
Thanks in advance for any help
I can help with the first part of your request. Here is how to create the separate data frames.
names_a<-c("a","b","c","d")
values_a<-c(1,2,3,4)
names_b<-c("a","b","e","f")
values_b<-c(5,6,7,8)
a<-as.data.frame(cbind(names_a,values_a,names_b,values_b))
# Select each pair of columns by name to create 2 new data frames out of the existing one.
# a34 contains the 3rd and 4th variables
a34 <- a[, c("names_b", "values_b")]
# then "subset" a (in fact you create a new data frame and replace the old one)
a <- a[, c("names_a", "values_a")]
This results in:
> a
  names_a values_a
1       a        1
2       b        2
3       c        3
4       d        4
> a34
  names_b values_b
1       a        5
2       b        6
3       e        7
4       f        8
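For the second part (aligning the rows by name), one possible sketch is a full outer join with merge(), followed by replacing the missing values with 0. Note two assumptions that differ from the code above: the data frame is built with data.frame() directly so the value columns stay numeric (cbind() would turn them into character), and the names df_a and df_b are only for illustration.

```r
a <- data.frame(names_a = c("a", "b", "c", "d"), values_a = c(1, 2, 3, 4),
                names_b = c("a", "b", "e", "f"), values_b = c(5, 6, 7, 8),
                stringsAsFactors = FALSE)

# Split the paired columns into two data frames
df_a <- a[, c("names_a", "values_a")]
df_b <- a[, c("names_b", "values_b")]

# all = TRUE keeps names that occur in only one of the two data frames
a1 <- merge(df_a, df_b, by.x = "names_a", by.y = "names_b", all = TRUE)
names(a1)[1] <- "names_a1"

# Names missing on one side come back as NA; the question wants 0 instead
a1[is.na(a1)] <- 0
a1
```

With more than two pairs, the same merge() step could be repeated pair by pair, for example with Reduce().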
I am trying to merge a data.frame and a column from another data.frame, but have so far been unsuccessful.
My first data.frame [Frequencies] consists of 2 columns, containing 47 upper/ lower case alpha characters and their frequency in a bigger data set. For example purposes:
Character<-c("A","a","B","b")
Frequency<-c(100,230,500,420)
The second data.frame [Sequences] is 93,000 rows in length and contains 2 columns, with the 47 same upper/ lower case alpha characters and a corresponding qualitative description. For example:
Character<-c("a","a","b","A")
Descriptor<-c("Fast","Fast","Slow","Stop")
I wish to add the descriptor column to the [Frequencies] data.frame, but not the 93,000 rows! Rather, what each "Character" represents. For example:
Character<-c("a")
Frequency<-c("230")
Descriptor<-c("Fast")
The following can also be done:
> merge(adf, bdf[!duplicated(bdf$Character),])
  Character Frequency Descriptor
1         a       230       Fast
2         A       100       Fast
3         b       420       Stop
4         B       500       Slow
Why not:
df1$Descriptor <- df2$Descriptor[ match(df1$Character, df2$Character) ]
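A runnable sketch of the match() approach, using the small example vectors from the question in place of the real data (df1 stands for [Frequencies], df2 for [Sequences]):

```r
df1 <- data.frame(Character = c("A", "a", "B", "b"),
                  Frequency = c(100, 230, 500, 420),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Character  = c("a", "a", "b", "A"),
                  Descriptor = c("Fast", "Fast", "Slow", "Stop"),
                  stringsAsFactors = FALSE)

# match() returns the position of the FIRST occurrence of each character in
# df2, so duplicated rows in df2 are ignored; comparison is case-sensitive
df1$Descriptor <- df2$Descriptor[match(df1$Character, df2$Character)]
df1
```

Characters that never occur in df2 (here "B") get NA as their descriptor, and df1 keeps its original number of rows.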
I have a data object signal in R with 40,000+ rows (named variables) of numeric values and 200+ columns (samples). For every row of each column, I want to subtract the value for the row named background for that column.
The code below can be used to create an example signal object in R. With the example, for column A, the background value of 4 is to be subtracted from the values of channelNo1 to 3. Similarly, for column B, the value of 6 is to be subtracted. And so on. What is the simplest way to achieve this in R?
text <- textConnection('
A B C
channelNo1 12 22 32
channelNo2 13 21 33
channelNo3 12 21 30
background 4 6 8
')
signal <- read.table(text, header = TRUE)
close(text)
typeof(signal)
# returns 'list'
class(signal)
# returns 'data.frame'
Elements in an R matrix are stored by column (check out matrix(1:12, nrow=3)), so signal - signal[4,] is not doing what you think: check column B, where the second and third values should be the same (and equal to 15). You could write
as.data.frame(Map("-", signal, as.vector(signal[4,])))
(I think this would be relatively efficient) but since the data really seem to be a matrix (i.e., a rectangle of homogeneous type) it makes a lot more sense to manipulate it as a matrix
m = as.matrix(signal)
sweep(m, 2, m[4,], "-")
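A self-contained sketch of the sweep() approach on the example data. Two small departures from the code above, both my own choices: the data are read with read.table(text = ...) instead of a textConnection, and the background row is selected by name rather than by position.

```r
signal <- read.table(text = "A B C
channelNo1 12 22 32
channelNo2 13 21 33
channelNo3 12 21 30
background 4 6 8", header = TRUE)

m <- as.matrix(signal)

# MARGIN = 2 subtracts one value per column: the background of that column
corrected <- sweep(m, 2, m["background", ], "-")
corrected
```

In the result, column B holds 16, 15, 15 for the three channels, and the background row itself becomes 0 in every column.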
I have done a lot of googling, but I didn't find a satisfactory solution to my problem.
Say we have a data file like this:
Tag v1 v2 v3
A 1 2 3
B 1 2 2
C 5 6 1
A 9 2 7
C 1 0 1
The first line is the header. The first column is the group id (the data have 3 groups: A, B, C), while the other columns are values.
I want to read this file in R so that I can apply different functions to the data.
For example, I tried to read the file and compute the column means:
dt<-read.table(file_name,head=T) #gives warnings
apply(dt,2,mean) #gives NA NA NA
I want to read this file and get the column means. Then I want to separate the data into 3 groups (according to Tag A, B, C) and calculate the column-wise mean for each group. Any help?
apply(dt,2,mean) doesn't work because apply coerces the first argument to an array via as.matrix (as is stated in the first paragraph of the Details section of ?apply). Since the first column is character, all elements in the coerced matrix object will be character.
Try this instead:
sapply(dt[-1],mean) # works because data.frames are lists; dt[-1] drops the non-numeric Tag column
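A small sketch of the coercion described above. It assumes the example file from the question, built inline with read.table(text = ...) instead of being read from file_name:

```r
dt <- read.table(text = "Tag v1 v2 v3
A 1 2 3
B 1 2 2
C 5 6 1
A 9 2 7
C 1 0 1", header = TRUE)

# apply() first converts dt with as.matrix(); the non-numeric Tag column
# turns the whole matrix into character, so mean() has nothing numeric left
m <- as.matrix(dt)
is.character(m)

# Column means of the numeric columns only
num_means <- colMeans(dt[, c("v1", "v2", "v3")])
num_means
```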
To calculate column means by groups:
# using base functions
grpMeans1 <- t(sapply(split(dt[,c("v1","v2","v3")], dt[,"Tag"]), colMeans))
# using plyr
library(plyr)
grpMeans2 <- ddply(dt, "Tag", function(x) colMeans(x[,c("v1","v2","v3")]))
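For reference, a self-contained sketch of the grouped means on the example data. It runs the base version from above and adds aggregate() as a third option, which is my own swapped-in alternative and not part of the answer; dt is built inline rather than read from file_name.

```r
dt <- read.table(text = "Tag v1 v2 v3
A 1 2 3
B 1 2 2
C 5 6 1
A 9 2 7
C 1 0 1", header = TRUE)

# Base version from above: one row per group, one column per variable (a matrix)
grpMeans1 <- t(sapply(split(dt[, c("v1", "v2", "v3")], dt[, "Tag"]), colMeans))

# Alternative with aggregate(): returns a data frame with a Tag column
grpMeans3 <- aggregate(. ~ Tag, data = dt, FUN = mean)

grpMeans1
grpMeans3
```

Group A, for example, has means 5, 2 and 5 for v1, v2 and v3.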