I have created a numeric vector using tapply(characters,numbers,sum) which looks like this (just a sample below):
a c d or f e ar fu bar
1 5 9 1 1 1 1 1 1
Now I need to retrieve the character labels as a separate vector. Any ideas?
The original character vector contains multiple instances of the characters, so I'm not sure how much use it will be.
Desired output: a vector with the characters listed:
a c d or f e ar fu bar
I thought these labels could be accessed with some simple command, since they are, so to speak, embedded in the numeric vector, but alas I haven't been able to find that function. as.character() just gives me the numbers in character format.
I think you want 'names':
names(tapply(characters,numbers,sum))
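A minimal sketch of how this works, assuming small hypothetical vectors named characters and numbers. tapply(X, INDEX, FUN) applies FUN to X within each group of INDEX, so the labels end up as names when the character vector is passed as INDEX (the second argument). names() then returns them, and as.vector() strips the names if you only want the numbers.

```r
# Hypothetical sample data: one label and one count per observation
characters <- c("a", "c", "a", "d", "c")
numbers    <- c(1, 2, 3, 4, 5)

# Sum the counts within each label; the result is a 1-d array named by label
sums <- tapply(numbers, characters, sum)

labels <- names(sums)      # character vector of group labels: "a" "c" "d"
values <- as.vector(sums)  # plain numeric vector without names: 4 7 4
```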
Related
I think the best way to explain my question is by an example:
We have a vector:
vector1 <- c(1, 2, 3, 3, 5, 6, 3, 7, 7)
and a data frame:
ID VAL
1 a
2 b
3 c
4 d
5 e
6 f
7 g
8 h
I want to create a vector that will look like this:
vector2 <- c("a", "b", "c", "c", "e", "f", "c", "g", "g")
Sounds very simple and probably is very simple with some trick that I don't know about.
I tried %in%, but it produced a vector of values from the rows of the data frame whose IDs are present in the vector, as opposed to my goal, which is a vector of values from the data frame corresponding to each value in the vector (including repeats).
Thank you.
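David's answer is not included here, so the following is only a sketch of one common base R approach, assuming the data frame is called df and the IDs are the integers 1 to 8. match() finds, for every element of vector1, the row whose ID equals it, so repeated values are looked up repeatedly; %in% is only a yes/no membership test and cannot do that.

```r
# Data frame from the question; the name `df` is an assumption
df <- data.frame(ID = 1:8, VAL = letters[1:8], stringsAsFactors = FALSE)
vector1 <- c(1, 2, 3, 3, 5, 6, 3, 7, 7)

# For each element of vector1, the row number of df whose ID matches it
rows <- match(vector1, df$ID)

# Pull VAL from those rows, keeping repeats and order
vector2 <- df$VAL[rows]
```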
Thank you, David; following your suggestion I was able to solve my problem.
Though I needed some preparation (because I had oversimplified the example).
Actually (continuing with the naming convention from my example), the "ID" column contained strings, so the data frame looked like this:
ID VAL
one a
two b
three c
four d
five e
six f
seven g
eight h
And vector1 looked like this: c("one", "two", "three", "three", "five", "six", "three", "seven", "seven")
Then I figured I should set the row names of the data frame to the values in "ID" and then run the command you suggested.
My preparation looked like this:
rownames(dataframe) <- dataframe$ID
vector2 <- dataframe[vector1, "VAL"]
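The preparation above can be checked with a self-contained version that rebuilds the sample data (stringsAsFactors = FALSE is an assumption, used to keep ID as character). It also shows an equivalent match() lookup that leaves the row names untouched. One caveat: if vector1 were a factor, indexing by it would use its integer codes rather than its labels, so wrap it in as.character() first.

```r
dataframe <- data.frame(
  ID  = c("one", "two", "three", "four", "five", "six", "seven", "eight"),
  VAL = letters[1:8],
  stringsAsFactors = FALSE
)
vector1 <- c("one", "two", "three", "three", "five", "six",
             "three", "seven", "seven")

# Row-name approach from the post: character indices select rows by name
rownames(dataframe) <- dataframe$ID
vector2 <- dataframe[vector1, "VAL"]

# Alternative that does not depend on row names
vector2_match <- dataframe$VAL[match(vector1, dataframe$ID)]
```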
How do I compute the sum for each class when the values are represented like this table:
a a,b a,b,c c
5 2 1 2
For example, for the table above the expected result is:
a b c
8 3 3
I'm asking because I couldn't find anything close to a solution on Stack Overflow; the closest was the dcast function, but that only checks for equality, not presence.
One way, using base R:
sapply(unique(unlist(strsplit(names(df), '\\.'))), function(i)
sum(df[grepl(i, names(df))]))
#a b c
#8 3 3
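Note: I split on \\. rather than , because the names were read like that: by default read.table() and data.frame() use check.names = TRUE, which turns a header such as "a,b" into "a.b".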
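The grepl() call above matches substrings, which is fine for single-letter classes but would, for example, count a column named "bar" towards class "a". A sketch that compares whole labels instead, assuming the one-row table is rebuilt with check.names = FALSE so the commas survive in the names:

```r
# One-row table from the question; check.names = FALSE keeps "a,b" as-is
df <- data.frame(a = 5, `a,b` = 2, `a,b,c` = 1, c = 2, check.names = FALSE)

# Split each column name into its class labels
parts   <- strsplit(names(df), ",", fixed = TRUE)
classes <- unique(unlist(parts))

# For each class, sum the columns whose label set contains it exactly
totals <- sapply(classes, function(cl) {
  has_cl <- vapply(parts, function(p) cl %in% p, logical(1))
  sum(unlist(df[has_cl]))
})
totals
```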
I have a vector v <- c(6,8,5,5,8) of which I can obtain the unique values using
> u <- unique(v)
> u
[1] 6 8 5
Now I need an index i = c(2, 3, 1, 1, 3) that returns the original vector v when used to index u.
> u[i]
[1] 6 8 5 5 8
I know Matlab's unique can return such an index automatically (the ic output of [C, ia, ic] = unique(A)), but it does not seem to be part of the standard repertoire in R. Is anyone aware of a function that can do this?
The background is that I have several vectors with anonymized IDs that are long character strings:
ids
"PTefkd43fmkl28en==3rnl4"
"cmdREW3rFDS32fDSdd;32FF"
"PTefkd43fmkl28en==3rnl4"
"PTefkd43fmkl28en==3rnl4"
"cmdREW3rFDS32fDSdd;32FF"
To reduce the file size and simplify the code, I want to transform them into integers like this:
ids
1
2
1
1
2
and found that the index into the unique vector does exactly this. Since there are many rows, I am hesitant to write a function that loops over each element of the unique vector, and I wonder whether there is a more efficient way, or a completely different way to transform the character strings into matching integers.
Try match():
df1$ids <- with(df1, match(ids, unique(ids)) )
df1$ids
#[1] 1 2 1 1 2
Or convert to a factor (with levels in order of first appearance) and coerce to integer:
with(df1,as.integer(factor(ids, levels=unique(ids))))
#[1] 1 2 1 1 2
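A self-contained sketch of the same idea on the ids from the question, treated here as a plain character vector because df1 is not shown in the post. Keeping the unique() result as a lookup table makes the encoding reversible, and the factor route gives the same codes.

```r
ids <- c("PTefkd43fmkl28en==3rnl4", "cmdREW3rFDS32fDSdd;32FF",
         "PTefkd43fmkl28en==3rnl4", "PTefkd43fmkl28en==3rnl4",
         "cmdREW3rFDS32fDSdd;32FF")

lookup <- unique(ids)          # each distinct string once, in order of first appearance
codes  <- match(ids, lookup)   # integer position of each id in the lookup table

# Same codes via a factor whose levels follow the lookup order
codes_factor <- as.integer(factor(ids, levels = lookup))

# Indexing the lookup table by the codes restores the original strings
restored <- lookup[codes]
```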
Using u and v: judging from the index i in the OP's post, u must have been sorted:
u <- sort(unique(v))
match(v, u)
#[1] 2 3 1 1 3
Or use findInterval(). Make sure that 'u' is sorted in increasing order; this also relies on every value of v being present in u.
findInterval(v,u)
#[1] 2 3 1 1 3
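A quick check of the round trip using v from the question. It also shows that match() yields a valid index whether or not u is sorted (only findInterval() needs sorting); the unsorted variant u2 is added here for comparison.

```r
v <- c(6, 8, 5, 5, 8)

# Sorted unique values, as in the answer (and as Matlab's unique returns them)
u <- sort(unique(v))   # 5 6 8
i <- match(v, u)       # 2 3 1 1 3

# Unique values in order of first appearance, as printed in the question
u2 <- unique(v)        # 6 8 5
i2 <- match(v, u2)     # 1 2 3 3 2
```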
I am using R to analyze a survey. Several of the columns include numbers 1-10, depending on how survey respondents answered the respective questions. I'd like to change the 1-10 scale to a 1-3 scale. Is there a simple way to do this? I was writing a complicated set of for loops and if statements, but I feel like there must be a better way in R.
I'd like to change numbers 1-3 to 1; numbers 4 and 8 to 2; numbers 5-7 to 3, and numbers 9 and 10 to NA.
So in the snippet below, OriginalColumn would become NewColumn.
OriginalColumn=c(4,9,1,10,8,3,2,7,5,6)
NewColumn=c(2,NA,1,NA,2,1,1,3,3,3)
Is there an easy way to do this without a bunch of crazy for loops? Thanks!
You can do this using positional indexing:
> c(1,1,1,2,3,3,3,2,NA,NA)[OriginalColumn]
[1] 2 NA 1 NA 2 1 1 3 3 3
It is better than repeated or nested ifelse() calls because it is a single vectorized lookup (thus easier to read, write, and understand, and probably faster). In essence, you're creating a lookup vector that holds the new value for every value you want to replace: for values 1:3 you want 1, so the first three elements of the vector are 1, and so forth. You then use your original vector to extract the new values based on the positions of the original values. This works because the original values are the integers 1 to 10, so they can serve directly as positions.
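The lookup vector can also be built from the recoding rules themselves, which is less error-prone than typing it out by position, and then applied to several survey columns at once. The data frame survey and its column names q1 and q2 below are hypothetical.

```r
OriginalColumn <- c(4, 9, 1, 10, 8, 3, 2, 7, 5, 6)

# Position k holds the new code for old value k; 9 and 10 stay NA
lookup <- rep(NA_real_, 10)
lookup[1:3]     <- 1
lookup[c(4, 8)] <- 2
lookup[5:7]     <- 3

NewColumn <- lookup[OriginalColumn]

# Hypothetical survey with two 1-10 columns; recode every column in place
survey <- data.frame(q1 = c(4, 9, 1), q2 = c(10, 8, 3))
survey[] <- lapply(survey, function(col) lookup[col])
```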
You could also try recode() from the car package:
library(car)
recode(OriginalColumn, '1:3=1; c(4,8)=2; 5:7=3; else=NA')
#[1] 2 NA 1 NA 2 1 1 3 3 3
I have done a lot of googling but didn't find a satisfactory solution to my problem.
Say we have a data file like this:
Tag v1 v2 v3
A 1 2 3
B 1 2 2
C 5 6 1
A 9 2 7
C 1 0 1
The first line is the header. The first column is the group ID (the data have three groups: A, B, C), while the other columns are values.
I want to read this file in R so that I can apply different functions on the data.
For example, I tried to read the file and get the column means:
dt <- read.table(file_name, header = TRUE) #gives warnings
apply(dt,2,mean) #gives NA NA NA
I want to read this file and get the column means. Then I want to split the data into three groups (according to Tag A, B, C) and calculate the column-wise mean for each group. Any help?
apply(dt,2,mean) doesn't work because apply coerces its first argument to an array via as.matrix (as stated in the first paragraph of the Details section of ?apply). Since the first column is character, all elements of the coerced matrix will be character, so mean() returns NA.
Try this instead:
sapply(dt[-1], mean) # works because data.frames are lists; dt[-1] drops the non-numeric Tag column
To calculate column means by groups:
# using base functions
grpMeans1 <- t(sapply(split(dt[,c("v1","v2","v3")], dt[,"Tag"]), colMeans))
# using plyr
library(plyr)
grpMeans2 <- ddply(dt, "Tag", function(x) colMeans(x[,c("v1","v2","v3")]))
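For completeness, a base R sketch with aggregate(), which returns the group means as a data frame with a Tag column. The sample file is inlined via read.table(text = ...) so the example runs without file_name.

```r
dt <- read.table(text = "
Tag v1 v2 v3
A 1 2 3
B 1 2 2
C 5 6 1
A 9 2 7
C 1 0 1
", header = TRUE)

# Overall column means of the numeric columns
overall <- colMeans(dt[, c("v1", "v2", "v3")])

# Column means within each Tag group; one row per group
byTag <- aggregate(cbind(v1, v2, v3) ~ Tag, data = dt, FUN = mean)
byTag
```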