This question already has answers here:
Order data frame by two columns in R
(2 answers)
Closed 8 years ago.
I would like to order all rows by the values of two columns in R. This is my input:
chr start no
4 85 non1
4 23 non2
6 10 non2
8 25 non2
22 56 non4
2 15 non1
This is my expected output:
chr start no
2 15 non1
4 23 non2
4 85 non1
6 10 non2
8 25 non2
22 56 non4
Thank you. Cheers.
The order() function accepts a variable number of input vectors: it orders by the first, breaks ties with the second, and so on:
BED <- read.table(text=
"chr start no
4 85 non1
4 23 non2
6 10 non2
8 25 non2
22 56 non4
2 15 non1", header=TRUE)  # header=TRUE rather than T, since T can be reassigned
BED[order(BED$chr, BED$start),]
chr start no
6 2 15 non1
2 4 23 non2
1 4 85 non1
3 6 10 non2
4 8 25 non2
5 22 56 non4
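As a small extension of the order() call above, here is a base-R sketch, using only the question's data, that also shows how to sort one key descending while the other stays ascending. Negating a numeric key is a common idiom for this; it only works for numeric columns.

```r
BED <- read.table(text = "chr start no
4 85 non1
4 23 non2
6 10 non2
8 25 non2
22 56 non4
2 15 non1", header = TRUE)

# Ascending by chr; ties on chr are broken by ascending start
asc <- BED[order(BED$chr, BED$start), ]

# Negating a numeric key reverses its direction:
# chr ascending, start descending within each chr
mixed <- BED[order(BED$chr, -BED$start), ]

# Optionally renumber the rows 1..n instead of keeping the original row names
rownames(asc) <- NULL
print(asc)
print(mixed)
```

For character keys, order() also takes a decreasing argument, which can be a vector with one entry per key when method = "radix" is used.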
While you can certainly use order from the base package, for working with data frames I'd highly recommend using the plyr package.
chr <- c(4,4,6,8,22,2)
start <- c(85, 23, 10, 25, 56, 15)
no <- c("non1", "non2", "non2", "non2", "non4", "non1")
myframe <- data.frame(chr, start, no)
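This creates your data frame. If the chr column was read in as character or factor (e.g. from a file), convert it to numeric so that it sorts numerically rather than alphabetically:
myframe$chr <- as.numeric(as.character(myframe$chr))  # as.character() first avoids getting factor level codes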
Getting the arranged version is then very easy:
library(plyr)
arrangedFrame <- arrange(myframe, chr, start)
print(arrangedFrame)
chr start no
1 2 15 non1
2 4 23 non2
3 4 85 non1
4 6 10 non2
5 8 25 non2
6 22 56 non4
There are also many options in arrange that are easy to modify (for example desc() for descending order), which makes other reorderings easier than with order. And while I haven't used it much yet, I know Hadley released dplyr not too long ago; it offers even more functionality and I'd encourage you to check it out.
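Converting chr to numeric before sorting matters because character values sort alphabetically. The sketch below (base R, using the chr values from the question) shows that effect, plus a related pitfall if the column ended up as a factor.

```r
chr_chr <- c("4", "4", "6", "8", "22", "2")

# As character, values sort alphabetically: "22" comes before "4"
print(sort(chr_chr))
# As numbers, they sort as intended
print(sort(as.numeric(chr_chr)))

# If the column is a factor, as.numeric() returns the internal level codes,
# not the numbers shown; converting through character first gives the values
f <- factor(chr_chr)
print(as.numeric(f))
print(as.numeric(as.character(f)))
```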
This question already has answers here:
How to deal with nonstandard column names (white space, punctuation, starts with numbers)
(3 answers)
Remove rows in R matrix where all data is NA [duplicate]
(2 answers)
Closed 1 year ago.
The data look like this:
example<-matrix(NA,40,7)
colnames(example)=c("1month","2month","3month","4month","5month","6month","7month")
example[,1]<-rep(c(1,3,6,2,4,98,5,3,NA),len=40)
example[,2]<-rep(c(2,7,NA,8,2,NA,3,NA),len=40)
example[,3]<-rep(c(5,3,2,NA),len=40)
example[,4]<-rep(c(NA,91,98,52,35,NA),len=40)
example[,5]<-rep(c(3,NA),len=40)
example[,6]<-rep(c(98,NA,NA,123),len=40)
example[,7]<-rep(c(3,51,NA,NA,4,NA,5,NA),len=40)
example<-as.data.frame(example)
I want to remove the NA values from each column.
I can do it with the drop_na() function, but !is.na() doesn't work:
example %>% select('1month') %>% drop_na('1month')          # this works
example %>% select('1month') %>% filter(!is.na('1month'))   # this doesn't work; the result is shown below
Why doesn't this work, and is there any way I can use != or the !is.na() function?
Thank you for your help. Sincerely.
1month
1 1
2 3
3 6
4 2
5 4
6 98
7 5
8 3
9 NA
10 1
11 3
12 6
13 2
14 4
15 98
16 5
17 3
18 NA
19 1
20 3
21 6
22 2
23 4
24 98
25 5
26 3
27 NA
28 1
29 3
30 6
31 2
32 4
33 98
34 5
35 3
36 NA
37 1
38 3
39 6
40 2
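No answer to this question appears here, but the behaviour has a clear cause: inside filter(), '1month' in quotes is a one-element character string, not the column. The base-R sketch below rebuilds only the first column of the question's data to show this; it avoids dplyr so it runs without extra packages.

```r
# Rebuild the first column of the question's data, recycled to 40 rows;
# check.names = FALSE keeps the non-syntactic name "1month" as-is
example <- data.frame(`1month` = rep(c(1, 3, 6, 2, 4, 98, 5, 3, NA), length.out = 40),
                      check.names = FALSE)

# In quotes, '1month' is just a character string, so is.na() tests the
# string itself, which is never NA
print(is.na('1month'))
# Hence !is.na('1month') is a single TRUE, which filter() recycles to
# every row: nothing gets removed

# Referring to the column itself (here via [[ ]]) tests the actual values
kept <- example[!is.na(example[["1month"]]), , drop = FALSE]
print(nrow(kept))
```

With dplyr, the equivalent fix is to name the column with backticks, which is how R quotes non-syntactic names such as 1month: example %>% select(`1month`) %>% filter(!is.na(`1month`)). The .data pronoun, filter(!is.na(.data[["1month"]])), also works, and the same backtick form applies to comparisons such as filter(`1month` != 98). drop_na() accepted the quoted name because its column arguments use tidyselect, which treats strings as column names.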
This question already has an answer here:
The rules of subsetting
(1 answer)
Closed 8 years ago.
Good day
I have a data set I got from a txt file
> MyData
Xdat Ydat
1 1 12
2 2 23
3 3 34
4 4 45
5 5 56
6 6 67
7 7 78
I need to use this set to extract the rows where the second column (Ydat) is greater than 40,
resulting in:
MyData2
Xdat Ydat
4 4 45
5 5 56
6 6 67
7 7 78
Simple subsetting will do it:
MyData[which(MyData[,2]>40),]
As @DavidArenburg points out, the which() call isn't needed; plain logical indexing works fine:
MyData[(MyData[,2]>40),]
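The two forms above only differ when the condition contains NA. The sketch below rebuilds MyData from the listing in the question and adds subset() for comparison; the withNA copy is a hypothetical variant with one missing Ydat value, made up to show the difference.

```r
MyData <- data.frame(Xdat = 1:7, Ydat = c(12, 23, 34, 45, 56, 67, 78))

# Logical indexing and subset() give the same four rows here
MyData2 <- MyData[MyData$Ydat > 40, ]
MyData3 <- subset(MyData, Ydat > 40)
print(MyData2)

# With a missing value, the NA comparison becomes a row of NAs under
# plain logical indexing, while which() and subset() drop it
withNA <- MyData
withNA$Ydat[2] <- NA
n_logical <- nrow(withNA[withNA$Ydat > 40, ])
n_which   <- nrow(withNA[which(withNA$Ydat > 40), ])
n_subset  <- nrow(subset(withNA, Ydat > 40))
```

Since Ydat in the question has no missing values, either form gives MyData2 as shown.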
I have the following table in R:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
162 148 108 93 67 83 44 53 37 47 25 34 17 22 11 11 5
I want to collapse it into 7 categories labelled 1, 2, 3, 4, 5, 6 and "7&greater", where the last category combines the counts for 7 and every value above it.
I have looked at aggregate and tapply, but neither seems to be the function I need.
x <- c(x[1:6], "7 and above"=sum(x[-(1:6)]))
1 2 3 4 5 6 7 and above
162 148 108 93 67 83 306
data
x <- table(rep(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17), c(162,148,108,93,67,83,44,53,37,47,25,34,17,22,11,11,5)))
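The one-liner above hard-codes the cut-off at 7. A small sketch that wraps the same c()/sum() idea in a function, so the cut-off can change, might look like this; collapse_tail is a hypothetical name, not an existing R function.

```r
# Hypothetical helper: keep cells 1..(k-1) of a one-way count table and
# merge all remaining cells into a single "k and above" cell
collapse_tail <- function(x, k) {
  stopifnot(k >= 2, k <= length(x))
  keep <- seq_len(k - 1)
  out <- c(x[keep], sum(x[-keep]))   # c() keeps the names of x[keep]
  names(out)[k] <- paste(k, "and above")
  out
}

x <- table(rep(1:17, c(162, 148, 108, 93, 67, 83, 44, 53, 37, 47, 25, 34, 17, 22, 11, 11, 5)))
res <- collapse_tail(x, 7)
print(res)
```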
If you are using table() to generate the output above, you can use pmin() to cap each value in your data at 7 and then call table() to count the frequencies.
Assuming your data frame is called df and the column is called col_name, you can do:
tab <- table(pmin(df$col_name, 7))
The count under 7 then includes all values of 7 and above. You can rename that entry to make this clearer:
names(tab)[7] <- '7&above'
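To check the pmin() approach end to end, the sketch below rebuilds hypothetical raw observations from the counts in the question's table, using the df and col_name names the answer assumes.

```r
# Hypothetical raw data behind the question's table: one row per
# observation, in a column named col_name as the answer assumes
counts <- c(162, 148, 108, 93, 67, 83, 44, 53, 37, 47, 25, 34, 17, 22, 11, 11, 5)
df <- data.frame(col_name = rep(1:17, counts))

# pmin() caps every value at 7, so 8..17 are counted together with 7
tab <- table(pmin(df$col_name, 7))
names(tab)[7] <- "7&above"
print(tab)
```

The design difference from the first answer is that pmin() works on the raw values before counting, while the first answer merges cells of an already computed table; both give a last cell of 306 here.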
This question already has answers here:
Extracting specific columns from a data frame
(10 answers)
Closed 6 years ago.
I have a data frame in R with around 400 variables (as columns), but I only need 25 of them. I know how to delete specific columns, but deleting 375 variables one by one is impractical. Is there a way to drop all of them except the 25 I specify, using the variables' names as strings?
Thanks.
Example:
df <- data.frame(a=1:5,b=6:10,c=11:15,d=16:20,e=21:25,f=26:30) # Six columns
df
a b c d e f
1 1 6 11 16 21 26
2 2 7 12 17 22 27
3 3 8 13 18 23 28
4 4 9 14 19 24 29
5 5 10 15 20 25 30
reqd <- c("a","c","d","e") # Storing the columns I want to extract as a vector (c() already returns one)
reqd
[1] "a" "c" "d" "e"
Result <- df[,reqd] # Extracting only four columns
Result
a c d e
1 1 11 16 21
2 2 12 17 22
3 3 13 18 23
4 4 14 19 24
5 5 15 20 25
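Two details are worth guarding against when selecting 25 of 400 columns this way. The sketch below reuses the example df and reqd; the missing_cols check is an illustrative addition, not part of the original answer.

```r
df <- data.frame(a = 1:5, b = 6:10, c = 11:15, d = 16:20, e = 21:25, f = 26:30)
reqd <- c("a", "c", "d", "e")

# drop = FALSE keeps a data frame even if reqd holds a single name
# (df[, "a"] alone would return a plain vector)
Result <- df[, reqd, drop = FALSE]

# List-style indexing with one argument selects columns and always
# returns a data frame
Result2 <- df[reqd]

# With ~400 columns a typo is easy to make; df[, reqd] stops with
# "undefined columns selected", so list any missing names first
missing_cols <- setdiff(reqd, names(df))
print(missing_cols)
```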