I have an R data frame like this one:
a<-c(1,2,3,4,5)
b<-c(6,7,8,9,10)
df<-data.frame(a,b)
colnames(df)<-c("a","b")
df
a b
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
I would like to get the 1st, 2nd, 3rd AND 5th row of the column a, so 1 2 3 5, by selecting rows by their number.
I have tried df$a[1:3,5] but I get Error in df$a[1:3, 5] : incorrect number of dimensions.
What DOES work is c(df$a[1:3],df$a[5]) but I was wondering if there was an easier way to achieve this with R?
Your data frame has two dimensions (rows and columns). When you use square brackets to extract values from it, R expects everything before the comma to indicate the desired rows, and everything after the comma to indicate the desired columns (see ?`[`). Hence, df[1:3,5] means rows 1 through 3 of column 5. Your attempt, df$a[1:3,5], fails for a related reason: df$a is already a one-dimensional vector, so it takes a single index with no comma. To select several rows at once, you need to concatenate the row numbers into a single vector, i.e., c(1:3,5). That all goes before the comma; the column indicator, 1 or "a", goes after the comma. Thus, df[c(1:3,5), 1] is what you need, and df$a[c(1:3,5)] gives the same result.
As an alternative (which might be more appropriate for a data frame with many more columns), df[c(1:3, 5), "a"], as suggested by @Mamoun Benghezal, would also get it done!
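As a quick check of the answer above, here is a minimal sketch that rebuilds the df from the question and compares the three indexing forms. It uses only base R; the variable names rows, by_position, by_name and from_vector are made up for illustration.

```r
# Rebuild the data frame from the question
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(6, 7, 8, 9, 10))

# Row numbers to keep, combined into a single vector
rows <- c(1:3, 5)

by_position <- df[rows, 1]     # rows before the comma, column position after
by_name     <- df[rows, "a"]   # same rows, column selected by name
from_vector <- df$a[rows]      # df$a is a plain vector, so no comma is used

print(by_name)
```

All three forms should return the same four values; selecting the column by name tends to be more robust if the column order changes later.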
Related
I'm trying to extract a set of genes (row names) from my large data set based on another data matrix that contains a list of my genes of interest. I've read that I should use filter and the %in% operator, but I am unsure how to write it properly.
example:
my large database:
Gene Week1 Week2 Week3
A    20    14    5
B    5     10    15
C    2     4     6
D    20    18    19
my small data base:
Gene
A
C
D
And I want my result to be:
Gene Week1 Week2 Week3
A    20    14    5
C    2     4     6
D    20    18    19
Could anybody please help out? I'd really appreciate it and my apologies for the rather simple question :)
Using logical row indexes:
large_database[large_database$Gene %in% unique(small_data_base$Gene), ]
Explanation:
large_database$Gene %in% unique(small_data_base$Gene)
Checks, for each entry (i.e. row) in large_database$Gene, whether it appears in unique(small_data_base$Gene), i.e. the list of unique values in the column Gene of small_data_base, and returns a logical vector (a vector of TRUE and FALSE).
We can then use this vector as a row 'index' to select only the rows where the vector is TRUE (i.e. where the value of large_database$Gene was in unique(small_data_base$Gene)).
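To make the explanation concrete, here is a small sketch that rebuilds the example tables from the question. The column names Week1, Week2 and Week3 are an assumption based on the question's layout, and keep and result are illustrative names.

```r
# Example data from the question (column names are assumed)
large_database <- data.frame(
  Gene  = c("A", "B", "C", "D"),
  Week1 = c(20, 5, 2, 20),
  Week2 = c(14, 10, 4, 18),
  Week3 = c(5, 15, 6, 19)
)
small_data_base <- data.frame(Gene = c("A", "C", "D"))

# Logical vector: TRUE where the gene is among the genes of interest
keep <- large_database$Gene %in% unique(small_data_base$Gene)

# Use it as a row index; the empty slot after the comma keeps all columns
result <- large_database[keep, ]
print(result)
```

The call to unique() is not strictly required, since %in% gives the same answer with duplicated values on the right-hand side.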
I have a matrix called a=
lab col1 col2 col3
one 1 4 7
two 2 5 8
three 3 6 9
and I want to select only the lines where "lab" is "one" or "two".
In fact, my matrix is much bigger and I want to select many different values from the column "lab".
I tried using a vector:
selected.lines=c("one","two")
a=a[a$lab==selected.lines,]
but it doesn't work. I guess that is because R tries to select the lines from the column "lab" whose value equals "one" AND "two"?
Any help would be appreciated.
We need to use %in% when there is more than one element to compare against, because == recycles the elements of 'selected.lines': the first two elements of 'lab' are compared with the two elements of 'selected.lines', then the third element of 'lab' is compared with the first element of 'selected.lines' again, and so on until the end of the 'lab' column. Also, with a matrix, we use [ for subsetting instead of $.
a[a[,"lab"] %in% selected.lines,]
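The recycling pitfall can be seen in a small sketch. The fourth row ("one") is a hypothetical addition to the question's data, included only so that the difference between == and %in% shows up; the numeric columns are stored as text because a matrix holds a single type.

```r
# 'lab' values; the fourth entry is a hypothetical extra row
lab <- c("one", "two", "three", "one")
selected.lines <- c("one", "two")

# == recycles selected.lines as one, two, one, two and compares pairwise,
# so the last "one" is compared with "two" and missed
by_equals <- lab == selected.lines

# %in% tests each element of lab for membership in selected.lines
by_in <- lab %in% selected.lines

# A character matrix with a "lab" column, subset with [ rather than $
a <- cbind(lab = lab, col1 = c("1", "2", "3", "10"))
selected <- a[a[, "lab"] %in% selected.lines, ]
print(selected)
```

Here by_equals flags only the first two rows, while by_in also flags the fourth one.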
here is my question.
I have a dataframe with 30 rows (corresponding to 30 questions in a questionnaire) with values from 1 to 5 as answers.
I would like to sum all values equal to 1 that appear in the 30 rows.
I tried the aggregate command, but it didn't work.
The question could use more clarity, and code would help, but I will give you a hypothetical example of what I believe you are asking for.
If you have a data frame df such that:
questions ob1 ob2 ob3
q1        5   3   1
q2        2   1   1
q3        4   1   5
and you want to add up all the values that are equal to an answer of 1, you have a number of options. The most obvious is simply to subset with a logical vector, i.e. sum(df$ob1[df$ob1==1]),
or you could use which():
sumob1<- sum(df$ob1[which(df$ob1==1)])
Note that there is no comma inside the [] here: df$ob1 is a vector, so it takes a single index. A comma is only needed when indexing the data frame itself, where rows go on the left side of the comma and columns on the right.
This basically says: make sumob1 equal to the sum of the cells of column ob1 that have a value of 1.
You can do that for each column.
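A minimal sketch of this, using the hypothetical data frame above. Since every matching value is 1, the sum is the same as a count, so comparing the answer columns with 1 and summing the resulting logical values is one way to cover all columns at once. The names answers, ones_per_column and total_ones are illustrative.

```r
# Hypothetical questionnaire data from the answer
df <- data.frame(
  questions = c("q1", "q2", "q3"),
  ob1 = c(5, 2, 4),
  ob2 = c(3, 1, 1),
  ob3 = c(1, 1, 5)
)

# Single column: logical subsetting of a vector, no comma needed
sumob1 <- sum(df$ob1[df$ob1 == 1])

# All answer columns: answers == 1 is a logical matrix, and TRUE counts as 1
answers <- df[, c("ob1", "ob2", "ob3")]
ones_per_column <- colSums(answers == 1)
total_ones <- sum(answers == 1)

print(ones_per_column)
print(total_ones)
```

If the real data contains missing answers, adding na.rm = TRUE to sum() and colSums() would likely be needed.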
I am trying to merge a data.frame and a column from another data.frame, but have so far been unsuccessful.
My first data.frame [Frequencies] consists of 2 columns, containing 47 upper/ lower case alpha characters and their frequency in a bigger data set. For example purposes:
Character<-c("A","a","B","b")
Frequency<-c(100,230,500,420)
The second data.frame [Sequences] is 93,000 rows in length and contains 2 columns, with the 47 same upper/ lower case alpha characters and a corresponding qualitative description. For example:
Character<-c("a","a","b","A")
Descriptor<-c("Fast","Fast","Slow","Stop")
I wish to add the descriptor column to the [Frequencies] data.frame, but not the 93,000 rows! Rather, what each "Character" represents. For example:
Character<-c("a")
Frequency<-c("230")
Descriptor<-c("Fast")
The following can also be done (adf and bdf presumably being the Frequencies and Sequences data frames):
> merge(adf, bdf[!duplicated(bdf$Character),])
Character Frequency Descriptor
1 a 230 Fast
2 A 100 Fast
3 b 420 Stop
4 B 500 Slow
Why not:
df1$Descriptor <- df2$Descriptor[ match(df1$Character, df2$Character) ]
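Here is a small sketch of the match() approach on the example vectors from the question. df1 and df2 stand in for the Frequencies and Sequences data frames, and idx is an illustrative name. match() returns the position of the first occurrence of each character in df2, so duplicated rows in the long table are ignored automatically.

```r
# Example data from the question
df1 <- data.frame(Character = c("A", "a", "B", "b"),
                  Frequency = c(100, 230, 500, 420),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Character  = c("a", "a", "b", "A"),
                  Descriptor = c("Fast", "Fast", "Slow", "Stop"),
                  stringsAsFactors = FALSE)

# Position of the first match of each df1 character in df2 (NA if absent)
idx <- match(df1$Character, df2$Character)

# Look up the descriptor at those positions; df1 keeps its 4 rows
df1$Descriptor <- df2$Descriptor[idx]
print(df1)
```

In this toy data "B" never appears in df2, so its Descriptor is NA. Unlike merge() with default arguments, the row is kept rather than dropped.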
I was wondering about the following thing:
I have a 16x2 matrix with numerical values in the first column and numerical values in the second column as well, although the latter are actually position numbers, so they need to be treated as a factor.
I want to order the values in the first column from low to high, but I need the numbers in the second column to stay with their original partner value from the first column.
So let's say you've got:
4 1
6 2
2 3
And now I want to sort the first column from low to high.
Then I want to get
2 3
4 1
6 2
Does anybody know how I can do this?
R doesn't seem to provide a variable type for paired data...
You can do:
dat[order(dat[, 1]), ]
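A minimal sketch with the three rows from the question: order() returns the row positions that would sort the first column, and indexing the whole matrix with those positions moves each row as a unit, so the pairs stay together. The names ord and sorted are illustrative.

```r
# The 3x2 example from the question (matrix() fills column by column)
dat <- matrix(c(4, 6, 2,
                1, 2, 3), ncol = 2)

# Row positions that put column 1 in ascending order
ord <- order(dat[, 1])

# Reorder whole rows; the empty slot after the comma keeps both columns
sorted <- dat[ord, ]
print(sorted)
```

For descending order, order(dat[, 1], decreasing = TRUE) can be used instead.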