How to rank data from multiple rows and columns? - r

Example data:
df <- data.frame(A = c(20, 40, 53), B = c(40, 11, 60))
What's the easiest way in R to get from this
A B
1 20 40
2 40 11
3 53 60
to this?
A B
1 2.0 3.5
2 3.5 1.0
3 5.0 6.0
I couldn't find a way to make rank() or frank() work across multiple rows and columns at once. Googling things like "r rank dataframe" or "r rank multiple rows" only turned up questions on how to rank rows or columns individually, which is odd, as I suspect this question has been answered before.

Try rank() as shown below:
df[] <- rank(df)
or
df <- list2DF(relist(rank(df),skeleton = unclass(df)))
and you will get
> df
A B
1 2.0 3.5
2 3.5 1.0
3 5.0 6.0
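For readers who prefer to spell out the flattening step, here is a minimal sketch that ranks all cells jointly by passing unlist(df) to rank(). The name ranked is only illustrative, and ties receive rank()'s default average rank.

```r
# Example data from the question
df <- data.frame(A = c(20, 40, 53), B = c(40, 11, 60))

# Flatten all columns into one vector, rank it once, and write the
# ranks back into a copy with the same shape (ranked[] keeps the columns)
ranked <- df
ranked[] <- rank(unlist(df))

print(ranked)
```

Because the ranks are computed over the flattened vector, the two 40s (one in each column) share the rank 3.5.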

Related

Rank with tied numbers

I have to rank high values in a column of a data frame like this:
example <- data.frame(
  country = c("Arg","Uru","Arg","Uru","Arg","Uru","Arg","Uru"),
  value = c(1,1,2,3,4,5,6,10))
If I rank the values with
example$rank<- rank(-example$value)
I would have something like this:
print(example$rank)
#[1] 7.5 7.5 6.0 5.0 4.0 3.0 2.0 1.0
when I am actually looking for something like this:
#[1] 7 8 6 5 4 3 2 1
If they are tied, I don't mind which one has a higher rank.
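One possible approach, sketched here under the assumption that breaking ties by order of appearance is acceptable, is the ties.method argument of rank(). With ties.method = "first", tied values get distinct consecutive ranks instead of their average.

```r
example <- data.frame(
  country = c("Arg", "Uru", "Arg", "Uru", "Arg", "Uru", "Arg", "Uru"),
  value   = c(1, 1, 2, 3, 4, 5, 6, 10)
)

# ties.method = "first" ranks tied values in order of appearance,
# so the two 1s get ranks 7 and 8 rather than 7.5 each
example$rank <- rank(-example$value, ties.method = "first")

print(example$rank)
#[1] 7 8 6 5 4 3 2 1
```

If the tie order should be arbitrary rather than positional, ties.method = "random" is another documented option.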

R: Creating an index vector

I need some help with R coding here.
The data set Glass consists of 214 rows of data, in which each row corresponds to a glass sample. Each row consists of 10 columns. When viewed as a classification problem, column 10 (Type) specifies the class of each observation/instance. The remaining columns are attributes that might be used to infer column 10. Here is an example of the first row:
RI Na Mg Al Si K Ca Ba Fe Type
1 1.52101 13.64 4.49 1.10 71.78 0.06 8.75 0.0 0.0 1
First, I cast column 10 so that R interprets it as a factor instead of an integer value.
Now I need to create a vector with indices for all observations (it must have the values 1-214). This is needed to create training data for Naive Bayes. I know how to create a vector with 214 values, but not one that holds the specific indices of the observations in a data frame.
Thanks.
I'm not entirely sure what you're trying to do, so please forgive me if my solution isn't helpful. If your data frame is named 'df', you can add the index column and then use the dplyr package to move it to the front:
library(dplyr)
df['index'] <- 1:214
df <- df %>% select(index,everything())
Here's an example. So that I can post full dataframes, my dataframes will only have 10 rows...
Let's say my dataframe is:
df <- data.frame(col1 = c(2.3,6.3,9.2,1.7,5.0,8.5,7.9,3.5,2.2,11.5),
col2 = c(1.5,2.8,1.7,3.5,6.0,9.0,12.0,18.0,20.0,25.0))
So it looks like
col1 col2
1 2.3 1.5
2 6.3 2.8
3 9.2 1.7
4 1.7 3.5
5 5.0 6.0
6 8.5 9.0
7 7.9 12.0
8 3.5 18.0
9 2.2 20.0
10 11.5 25.0
If I want to add another column that just is 1,2,3,4,5,6,7,8,9,10... and I'll call it 'index' ...I could do this:
library(dplyr)
df['index'] <- 1:10
df <- df %>% select(index, everything())
That will give me
index col1 col2
1 1 2.3 1.5
2 2 6.3 2.8
3 3 9.2 1.7
4 4 1.7 3.5
5 5 5.0 6.0
6 6 8.5 9.0
7 7 7.9 12.0
8 8 3.5 18.0
9 9 2.2 20.0
10 10 11.5 25.0
Hope this will help
df$ind <- seq.int(nrow(df))
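If the index vector itself (rather than an extra column) is what the Naive Bayes setup needs, a base R sketch could look like the following. The data frame glass_like is a made-up stand-in for the Glass data, and the 70% training share is an arbitrary assumption, not something stated in the question.

```r
# Hypothetical stand-in with the same number of rows as Glass
set.seed(42)
glass_like <- data.frame(
  RI   = rnorm(214, mean = 1.52, sd = 0.003),
  Type = factor(sample(1:7, 214, replace = TRUE))
)

# One index per observation: 1, 2, ..., nrow(glass_like)
idx <- seq_len(nrow(glass_like))

# Pick a random subset of those indices as training rows
# (the 70% share is an assumption for illustration)
train_idx <- sample(idx, size = floor(0.7 * length(idx)))

train   <- glass_like[train_idx, ]
holdout <- glass_like[-train_idx, ]
```

seq_len(nrow(...)) is used instead of 1:nrow(...) because it also behaves sensibly for a data frame with zero rows.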

R: subset data.frame by another vector

I have a dataframe with 241 rows. It is called master and it looks like this:
Patient Sample PDMax FileName
1 1.1 6 GSM1
1 1.2 6 GSM2
2 2.1 8 GSM3
3 3.1 5 GSM4
3 3.2 7 GSM5
Now I have a vector called Biopsy with the important samples. I would like to subset the master dataframe so that only the important information is left.
This is the vector Biopsy:
1.2 2.1 3.2
The result should be like this:
Patient Sample PDMax FileName
1 1.2 6 GSM2
2 2.1 8 GSM3
3 3.2 7 GSM5
How can I do that? I tried different things like merge() or subset(), but everything failed.
Thanks!
Have a look at the data wrangling verbs inside dplyr. Hadley Wickham's book is a great place to start (http://r4ds.had.co.nz/transform.html#filter-rows-with-filter)
library(dplyr)
master %>% filter(Sample %in% Biopsy)
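The same filter can also be written in base R without any packages. The sketch below rebuilds the five example rows from the question and keeps those whose Sample value appears in Biopsy; the name result is only illustrative.

```r
master <- data.frame(
  Patient  = c(1, 1, 2, 3, 3),
  Sample   = c(1.1, 1.2, 2.1, 3.1, 3.2),
  PDMax    = c(6, 6, 8, 5, 7),
  FileName = c("GSM1", "GSM2", "GSM3", "GSM4", "GSM5")
)
Biopsy <- c(1.2, 2.1, 3.2)

# Logical row index: TRUE where the Sample value is one of the Biopsy values
result <- master[master$Sample %in% Biopsy, ]
print(result)
```

Note that %in% matches numeric values exactly, so if the Sample IDs come from calculations rather than being read in as-is, storing them as character strings may be the safer choice.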

how to sort a column in a table in r

I tried to merge two tables, but the result is like this,
subj gamble_gamble n_gambles expected_value
1 19 32 1.7
10 3 4 1.5
100 3 4 1.5
101 6 32 1.4
102 3 4 1.5
103 19 32 1.7
The subj column isn't ordered in the usual way (e.g. 1,2,3,4,5,6). I tried to order the subj column with this command:
newdata <- table3[order(subj),]
but it doesn't work. Can somebody help me?
Use this:
newdata <- table3[order(as.numeric(as.character(table3$subj))),]
This works even if subj is a factor (not just character).
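The likely reason the merged table looks unsorted is that subj is stored as text (or a factor), so it is ordered character by character rather than by value. Here is a small sketch with made-up values; this table3 is a hypothetical stand-in for the one in the question.

```r
table3 <- data.frame(
  subj      = c("1", "10", "100", "2", "19"),
  n_gambles = c(32, 4, 4, 32, 4),
  stringsAsFactors = TRUE   # make subj a factor, as can happen on import
)

# Ordering the factor follows its text-sorted levels,
# so "10" and "100" typically come before "2"
by_text <- table3[order(table3$subj), ]

# Converting factor -> character -> numeric orders by value: 1, 2, 10, 19, 100
by_value <- table3[order(as.numeric(as.character(table3$subj))), ]
print(by_value)
```

The as.character() step matters for factors: calling as.numeric() directly on a factor returns the internal level codes, not the numbers shown in the labels.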

drawing a stratified sample in R

Designing my stratified sample
library(survey)
design <- svydesign(id=~1,strata=~Category, data=billa, fpc=~fpc)
So far so good, but how can I now draw a sample in the same way I could for simple random sampling?
set.seed(67359)
samplerows <- sort(sample(x=1:N, size=n.pre$n))
If you have a stratified design, then I believe you can sample randomly within each stratum. Here is a short algorithm to do proportional sampling in each stratum, using ddply:
library(plyr)
set.seed(1)
dat <- data.frame(
  id = 1:100,
  Category = sample(LETTERS[1:3], 100, replace=TRUE, prob=c(0.2, 0.3, 0.5))
)
sampleOne <- function(id, fraction=0.1){
  sort(sample(id, round(length(id)*fraction)))
}
ddply(dat, .(Category), summarize, sampleID=sampleOne(id, fraction=0.2))
Category sampleID
1 A 21
2 A 29
3 A 72
4 B 13
5 B 20
6 B 42
7 B 58
8 B 82
9 B 100
10 C 1
11 C 11
12 C 14
13 C 33
14 C 38
15 C 40
16 C 63
17 C 64
18 C 71
19 C 92
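For comparison, the same proportional sampling idea can be sketched in base R with split() and lapply(). The names fraction, rows_by_stratum, picked and stratified are illustrative, and the 0.2 fraction mirrors the call above.

```r
set.seed(1)
dat <- data.frame(
  id = 1:100,
  Category = sample(LETTERS[1:3], 100, replace = TRUE, prob = c(0.2, 0.3, 0.5))
)

fraction <- 0.2  # sampling fraction applied within every stratum

# Split the row numbers by stratum, then sample within each stratum
rows_by_stratum <- split(seq_len(nrow(dat)), dat$Category)
picked <- lapply(rows_by_stratum, function(rows) {
  n_take <- round(length(rows) * fraction)
  # Index into rows via sample.int() so a stratum with a single row
  # is not misread by sample() as the range 1:n
  rows[sample.int(length(rows), n_take)]
})

stratified <- dat[sort(unlist(picked)), ]
table(stratified$Category)
```

The exact rows drawn depend on the R version's random number generator, so the selected ids may differ between installations even with the same seed.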
Take a look at the sampling package on CRAN, and the strata function in particular.
This is a good package to know if you're doing surveys; there are several vignettes available from its page on CRAN.
The task view on "Official Statistics" includes several topics that are closely related to these issues of survey design and sampling - browsing through it and the packages recommended may also introduce other tools that you can use in your work.
You can draw a stratified sample using dplyr. First, group by the column or columns of interest, then sample within each group. In this example, we draw 3 records of each Species.
library(dplyr)
set.seed(1)
iris %>%
  group_by(Species) %>%
  sample_n(3)
Output:
Source: local data frame [9 x 5]
Groups: Species
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 4.3 3.0 1.1 0.1 setosa
2 5.7 3.8 1.7 0.3 setosa
3 5.2 3.5 1.5 0.2 setosa
4 5.7 3.0 4.2 1.2 versicolor
5 5.2 2.7 3.9 1.4 versicolor
6 5.0 2.3 3.3 1.0 versicolor
7 6.5 3.0 5.2 2.0 virginica
8 6.4 2.8 5.6 2.2 virginica
9 7.4 2.8 6.1 1.9 virginica
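Here's a quick way to sample up to three records per distinct 'carb' value from the mtcars data frame without replacement:
# choose how many records to sample per unique 'carb' value
records.per.carb.value <- 3
# draw the sample; we index into the row numbers instead of passing them
# straight to sample(), because sample() treats a single number n as 1:n,
# which would pick wrong rows for 'carb' values that occur only once
your.sample <-
  mtcars[
    unlist(
      tapply(
        seq_len( nrow( mtcars ) ) ,
        mtcars$carb ,
        function( rows ) rows[ sample.int( length( rows ) , min( length( rows ) , records.per.carb.value ) ) ]
      )
    ) , ]
# print the results to the screen
your.sample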
Note that the survey package is mostly used for analyzing complex sample survey data, not creating it. @Iterator is right that you should check out the sampling package for more advanced ways to create complex sample survey data. :)
