How do I compute the sum for each class when the column names are comma-separated sets of classes, as in this table:
a a,b a,b,c c
5 2 1 2
E.g., for the example above the expected result is:
a b c
8 3 3
I'm asking because I couldn't find even a close solution anywhere on Stack Overflow. The closest one I found used the dcast function, but that only checks for equality, not presence.
One way using base R:
sapply(unique(unlist(strsplit(names(df), '\\.'))), function(i)
sum(df[grepl(i, names(df))]))
#a b c
#8 3 3
Note: I split on \\. rather than , in strsplit because the commas in the column names had been converted to dots when the data were read in (e.g. a,b became a.b).
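As a rough sketch of the same idea with exact matching instead of grepl (which would also match a class name that is a substring of another, e.g. a inside ab), the following rebuilds the table with its original comma-separated names. The use of check.names = FALSE and the helper names sets, classes and totals are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical reconstruction of the table from the question.
# check.names = FALSE keeps the commas in the column names.
df <- data.frame(5, 2, 1, 2, check.names = FALSE)
names(df) <- c("a", "a,b", "a,b,c", "c")

# Split each column name into the set of classes it contains.
sets <- strsplit(names(df), ",", fixed = TRUE)
classes <- unique(unlist(sets))

# For each class, add up the columns whose set contains that class.
totals <- sapply(classes, function(cl) {
  has_class <- vapply(sets, function(s) cl %in% s, logical(1))
  sum(unlist(df[has_class]))
})
totals
```

Matching with %in% on the split names compares whole class labels, so it stays correct if one label happens to be a prefix of another.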
I need to update all the values of a column, using another df as a reference.
The two data frames have the same structure:
cod name dom_by
1 A 3
2 B 4
3 C 1
4 D 2
I tried to use the following line, but apparently it did not work:
df2$name[df2$dom_by==df1$cod] <- df1$name[df2$dom_by==df1$cod]
It keeps saying that the replacement has 92 rows and the data has 2 (df1 has 92 rows and df2 has 2).
Although it seems like a simple problem, I still cannot solve it, even after some searching.
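One possible approach, offered only as a sketch: == compares the two columns element by element (recycling the shorter one) rather than looking values up, which is why the lengths clash. match() does a lookup instead. The contents of df2 below are made up for illustration, and the reading that df2$dom_by should be looked up in df1$cod is an assumption based on the attempted line above.

```r
# df1 follows the table in the question; df2 is a hypothetical two-row example.
df1 <- data.frame(cod = 1:4, name = c("A", "B", "C", "D"),
                  dom_by = c(3, 4, 1, 2), stringsAsFactors = FALSE)
df2 <- data.frame(cod = c(1, 2), name = c("x", "y"),
                  dom_by = c(3, 1), stringsAsFactors = FALSE)

# For each row of df2, find the row of df1 whose cod equals df2$dom_by.
idx <- match(df2$dom_by, df1$cod)

# Only overwrite rows that actually have a match in df1.
found <- !is.na(idx)
df2$name[found] <- df1$name[idx[found]]
df2
```

Rows of df2 whose dom_by has no counterpart in df1$cod keep their old name, because match() returns NA for them and they are skipped.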
I have created a numeric vector using tapply(characters,numbers,sum) which looks like this (just a sample below):
a c d or f e ar fu bar
1 5 9 1 1 1 1 1 1
Now I need to extract the character labels into another vector. Any ideas?
The original character vector contains multiple instances of each character, so I'm not sure how much use it will be.
Desired output: a vector with the characters listed:
a c d or f e ar fu bar
I thought that labels like these could be accessed with some simple command, since they are attached to the numeric vector, but I haven't been able to find the function. as.character() just gives me the numbers in character format.
I think you want 'names':
names(tapply(characters,numbers,sum))
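A small self-contained illustration, with made-up vectors (labels and values are hypothetical names, not the ones from the question): tapply takes the group labels from its second argument and stores them as the names of the result, which names() then returns as a character vector.

```r
# Hypothetical data: values to sum, grouped by character label.
labels <- c("a", "c", "a", "d", "c")
values <- c(1, 2, 3, 4, 5)

# The result is a named numeric vector, one element per distinct label.
totals <- tapply(values, labels, sum)

# names() returns the labels themselves, one per group.
group_labels <- names(totals)
group_labels
```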
I have this kind of data frame:
df <- data.frame(cluster=c('1','1','2','3','3','3'), class=c('A','B','C','B','B','C'))
For each cluster (1, 2, 3), I would like to get the class that appears most often. In case of a tie, it would also be great to be told about it, for example by returning the combination of the tied classes (or, if that is not possible, just NA).
So for my example, I would like to have something like this as result:
cluster class.max
1 'A B' (or NA)
2 'C'
3 'B'
Maybe I should use aggregate(), but I don't know how.
rank has ways of dealing with ties:
aggregate(class~cluster,df,function(x) paste(names(table(x)[rank(-1*table(x),ties.method="min")==1]),collapse=" "))
cluster class
1 1 A B
2 2 C
3 3 B
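For comparison, here is a sketch of the same tie handling written with table() and max() instead of rank(). The helper name top_class is hypothetical, and this is an alternative formulation rather than the method of the answer above.

```r
df <- data.frame(cluster = c("1", "1", "2", "3", "3", "3"),
                 class = c("A", "B", "C", "B", "B", "C"))

# Return every class that reaches the maximum count, joined by spaces.
top_class <- function(x) {
  counts <- table(x)
  paste(names(counts)[counts == max(counts)], collapse = " ")
}

# Apply the helper to the classes of each cluster.
res <- tapply(df$class, df$cluster, top_class)
res
```

Tied classes come back together (cluster 1 gives both A and B), so a tie can be detected by checking for a space in the result.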
I've got a lovely dataframe, my very first, and I'm starting to get the hang of R. One thing I haven't been able to find is a test for duplicate values. I have one column that I'm pretty sure is all unique values, but I don't know that.
Is there a way I can ask? For simplicity, let's pretend this is my data:
var1 var2 var3
1 1 A 1
2 2 B 3
3 3 C NA
4 4 D NA
5 5 E 4
and I want to know whether var1 ever repeats.
Check out the duplicated function:
duplicated(dat$var1) # logical vector: TRUE where a value of var1 repeats an earlier one
See ?duplicated for the documentation.
You should also look at the unique function.
To remove duplicates based on a column:
my_data[!duplicated(my_data$Col_id), ] # ! is logical negation, so this keeps the first row for each Col_id
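To answer the yes/no question directly, a minimal sketch using the example data from the question (rebuilt here by hand as dat): anyDuplicated() returns 0 when nothing repeats, and any(duplicated(...)) gives a plain TRUE or FALSE. The variable names var1_has_dups and var3_has_dups are only for illustration.

```r
# Example data from the question, rebuilt by hand.
dat <- data.frame(var1 = 1:5,
                  var2 = c("A", "B", "C", "D", "E"),
                  var3 = c(1, 3, NA, NA, 4))

# anyDuplicated() gives the index of the first duplicate, or 0 if there is none.
var1_has_dups <- anyDuplicated(dat$var1) > 0

# duplicated() treats repeated NA values as duplicates of each other.
var3_has_dups <- any(duplicated(dat$var3))

var1_has_dups
var3_has_dups
```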
I have done a lot of googling, but I haven't found a satisfactory solution to my problem.
Say we have a data file like this:
Tag v1 v2 v3
A 1 2 3
B 1 2 2
C 5 6 1
A 9 2 7
C 1 0 1
The first line is the header. The first column is the group id (the data have 3 groups: A, B, C), while the other columns are values.
I want to read this file into R so that I can apply different functions to the data.
For example, I tried to read the file and compute the column means:
dt <- read.table(file_name, header=TRUE) # gives warnings
apply(dt,2,mean) # gives NA NA NA
I want to read this file and get the column means. Then I want to split the data into 3 groups (according to Tag: A, B, C) and calculate the column-wise means for each group. Any help?
apply(dt,2,mean) doesn't work because apply coerces the first argument to an array via as.matrix (as is stated in the first paragraph of the Details section of ?apply). Since the first column is character, all elements in the coerced matrix object will be character.
Try this instead:
sapply(dt,mean) # works column by column because data.frames are lists; the non-numeric Tag column gives NA with a warning
To calculate column means by groups:
# using base functions
grpMeans1 <- t(sapply(split(dt[,c("v1","v2","v3")], dt[,"Tag"]), colMeans))
# using plyr
library(plyr)
grpMeans2 <- ddply(dt, "Tag", function(x) colMeans(x[,c("v1","v2","v3")]))
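As a further base R option, here is a self-contained sketch that reads the sample data from an inline string (via the text argument of read.table, standing in for the real file) and uses colMeans() and aggregate(). The names overall and by_tag are chosen here for illustration.

```r
# Read the sample data from the question; text = replaces the file name here.
dt <- read.table(text = "Tag v1 v2 v3
A 1 2 3
B 1 2 2
C 5 6 1
A 9 2 7
C 1 0 1", header = TRUE)

# Overall column means, leaving out the non-numeric Tag column.
overall <- colMeans(dt[, c("v1", "v2", "v3")])

# Column means within each Tag group; the result is a data frame.
by_tag <- aggregate(. ~ Tag, data = dt, FUN = mean)

overall
by_tag
```

aggregate() returns a data frame with one row per Tag, which can be easier to work with afterwards than the matrix produced by the split/sapply version.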