Histogram from character string in R (from chr to num)

I'm a newbie in R and have been stuck with this problem for a long time now. Given how simple it seems, I'm puzzled to be stuck with this for so long. So here we go:
Basically, I have a vector, let's call it "test", which contains a series of numbers.
[1] "9 29 7 22 5 5 5 8 14 5 5 8 7 9 15 15 7 5 5 6 6 5 9 5 6 7 6 7 11 5 6 10 5 5 7 8 23 11 15 24 5 5 11 5 7 19 6 6 30 6 7 7 24 9 8 15 5 5 29 10 17 6 6 11 26 9 19 32 7 8 14 5 8 8 18 6 5 9 6 11 5 7 6 8 5 6 54 6 7 8 22 7 5 8 6 31 6 5 8 26 12 9 7 5 11 6 27 9 6 15 17 5 8 5 6 5 5 5 9 6 5 7 7 9 10 11 33 19 13 6 18 6 9 7 5 6 8 5 5 5 6 5 6 5 18 6 6 7 8 9 5 8 5 8 16 5 8 6 8 7 12 8 13 11 5 17 15 5 12 7 7 11 6 6 5 10 9 5 5 14 7 12 6 5 5 7 5 30 7 5 8 5 9 10 21 6 14 9 7 14 26 23 7 24 7 13 7 5 5 9 12 11 6 5 5 6 5 6 7 76 5 10 6 16 5 12 11 15 6 28 7 14 8 5 6 5 8 5 12 6 5 10 5 14 7 8 6 5 5 8 19 15 10 7 5 14 5 15 7 8 6 6 5 35 5 6 5 11 5 13 5 7 12 11 5 6 10 5 15 6 12 9 11 5 7 9 8 17 8 8 11 6 7 5 15 10 8 8 9 26,6 25 6 13 11 6 15 5 7 7 38 9 5 10 10 11 6 8 6 13 10 7 5 18 9 12 6 16 13 8 8 6 5 5 8 8 8 5 6 5 5 5 5 7 13 6 12 6 6 10 8 8 18 6 5 12 5 8 17 5 18 5 5 17 8 7 6 7 16 10 7 6 10 6 6 10 17 5 10 7 10 6 11 9 5 25 12 13 6 11 5"
R interprets this as a character string:
str(test)
chr "9 29 7 22 5 5 5 8 14 5 5 8 7 9 15 15 7 5 5 6 6 5 9 5 6 7 6 7 11 5 6 10 5 5 7 8 23 11 15 24 5 5 11 5 7 19 6 6 30..."
What I wish to do is no more complex than this: I would like to create a histogram, plotting the frequency of each number in the character string above (in fact, this is the degree distribution for a network).
The problem is that I'm dealing with a character string.
> hist(test)
Error in hist.default(test) : 'x' must be numeric
However, if I try to convert "test" into numeric, it also fails.
> as.numeric(test)
[1] NA
Warning message:
NAs introduced by coercion
I'm sure the solution is something very simple here, but I've tried to search for a solution for a long time without success.
Thank you in advance for your help!

The output of str(test) shows that it is a single string, so we can extract the elements with scan and then use hist:
hist(scan(text = test, what = numeric(), quiet = TRUE))
Upon looking at the OP's data, the values are separated by spaces and, in one place, by a comma. So we change the comma to a space to get a single delimiter and then use scan:
hist(scan(text = gsub(",", " ", test), what = numeric(), quiet = TRUE))
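As a minimal sketch of the same idea on a shortened, made-up string (the name `x` and its values are illustrative, not the OP's data), the comma is first turned into a space and the result is parsed with scan. table() then gives the frequency of each value, which is the information the histogram displays:

```r
# Shortened stand-in for the OP's string; note the stray comma in "26,6"
x <- "9 29 7 5 5 26,6 25 5"

# Replace commas with spaces so scan() sees a single delimiter
nums <- scan(text = gsub(",", " ", x), what = numeric(), quiet = TRUE)

# Frequency of each distinct value
freq <- table(nums)
print(freq)
```

Calling hist(nums) afterwards works as in the answer above, because nums is now a numeric vector.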

I suggest using the stringr package to split the character string into a list, then unlisting it and storing the result as a numeric vector:
a <- "9 29 7 22 5 5 5 8 14 5 5 8 7 9 15 15 7 5 5 6 6 5 9 5 6 7 6 7 11 5 6 10 5 5 7 8 23 11 15 24 5 5 11 5 7 19 6 6 30 6 7 7 24 9 8 15 5 5 29 10 17 6 6 11 26 9 19 32 7 8 14 5 8 8 18 6 5 9 6 11 5 7 6 8 5 6 54 6 7 8 22 7 5 8 6 31 6 5 8 26 12 9 7 5 11 6 27 9 6 15 17 5 8 5 6 5 5 5 9 6 5 7 7 9 10 11 33 19 13 6 18 6 9 7 5 6 8 5 5 5 6 5 6 5 18 6 6 7 8 9 5 8 5 8 16 5 8 6 8 7 12 8 13 11 5 17 15 5 12 7 7 11 6 6 5 10 9 5 5 14 7 12 6 5 5 7 5 30 7 5 8 5 9 10 21 6 14 9 7 14 26 23 7 24 7 13 7 5 5 9 12 11 6 5 5 6 5 6 7 76 5 10 6 16 5 12 11 15 6 28 7 14 8 5 6 5 8 5 12 6 5 10 5 14 7 8 6 5 5 8 19 15 10 7 5 14 5 15 7 8 6 6 5 35 5 6 5 11 5 13 5 7 12 11 5 6 10 5 15 6 12 9 11 5 7 9 8 17 8 8 11 6 7 5 15 10 8 8 9 26,6 25 6 13 11 6 15 5 7 7 38 9 5 10 10 11 6 8 6 13 10 7 5 18 9 12 6 16 13 8 8 6 5 5 8 8 8 5 6 5 5 5 5 7 13 6 12 6 6 10 8 8 18 6 5 12 5 8 17 5 18 5 5 17 8 7 6 7 16 10 7 6 10 6 6 10 17 5 10 7 10 6 11 9 5 25 12 13 6 11 5"
library(stringr)
b <- as.numeric(unlist(str_split(a, "[ ,]+")))  # split on spaces and commas, otherwise "26,6" becomes NA
hist(b)
The histogram I am getting: [image not preserved]

It looks like your test "vector" is just one long string.
A numeric vector is as follows:
nums <- c(1,2,3,4,5,6)
You could also make a character vector and convert it, like you tried:
chars <- c("1","2","3","4","5","6")
nums <- as.numeric(chars)
Your values are more like:
char <- "1 2 3 4 5 6"
which as.numeric() cannot convert, because it is one long string rather than a vector of numbers or of single-number strings.
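One way to turn such a string into a numeric vector in base R, sketched here under the assumption that the values are separated by single spaces, is to split the string first and convert the pieces afterwards:

```r
char <- "1 2 3 4 5 6"

# strsplit() returns a list with one element per input string
pieces <- strsplit(char, " ", fixed = TRUE)[[1]]

# Each piece is now a string holding a single number, so conversion works
nums <- as.numeric(pieces)
print(nums)
```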

Related

Convert dataframe from vertical to horizontal

I have already checked many questions, but I can't seem to find a suitable answer.
I have this df
df = data.frame(x = 1:10,y=11:20)
the output
    x  y
1   1 11
2   2 12
3   3 13
4   4 14
5   5 15
6   6 16
7   7 17
8   8 18
9   9 19
10 10 20
I just wish the output to be:
   1  2  3  4  5  6  7  8  9 10
x  1  2  3  4  5  6  7  8  9 10
y 11 12 13 14 15 16 17 18 19 20
thanks
Try t() like below
> data.frame(t(df), check.names = FALSE)
   1  2  3  4  5  6  7  8  9 10
x  1  2  3  4  5  6  7  8  9 10
y 11 12 13 14 15 16 17 18 19 20
A transpose should do it
setNames(data.frame(t(df)), df[,"x"])
   1  2  3  4  5  6  7  8  9 10
x  1  2  3  4  5  6  7  8  9 10
y 11 12 13 14 15 16 17 18 19 20
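A small sketch of what the transpose does, using a 3-row version of df and a second, made-up data frame (`mixed`) that is not part of the question. It is meant to illustrate one caveat: t() goes through a matrix, so a data frame with mixed column types comes back as character.

```r
# All-numeric case: the transpose keeps the numbers as numbers
df <- data.frame(x = 1:3, y = 11:13)
wide <- as.data.frame(t(df))
names(wide) <- df$x          # use the x values as column names
print(wide)

# Mixed-type case (hypothetical data): everything is coerced to character
mixed <- data.frame(id = 1:2, label = c("u", "v"))
m <- t(mixed)
print(is.character(m))
```

If the real data has non-numeric columns, it may be worth transposing only the numeric ones.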

Fixing the First and Last Numbers in a Random List

I used this code (from the question "Generating Random Graphs According to Some Conditions") to generate these random numbers, which correspond to an edge list for a graph, such that:
The first and last "nodes" are the same (e.g. starts at "1" and ends at "1")
Each node is visited exactly once
See below:
library(dplyr)  # provides tibble() and lead()
d = 15
relations = data.frame(tibble(
from = sample(d),  # random permutation of 1..d (`data` was never defined)
to = lead(from, default=from[1]),
))
> relations
from to
1 1 11
2 11 7
3 7 5
4 5 10
5 10 13
6 13 9
7 9 15
8 15 2
9 2 3
10 3 4
11 4 8
12 8 6
13 6 12
14 12 14
15 14 1
If I re-run this above code, it will (naturally) produce a different list:
relations
from to
1 6 9
2 9 2
3 2 5
4 5 8
5 8 13
6 13 1
7 1 14
8 14 3
9 3 11
10 11 12
11 12 7
12 7 15
13 15 4
14 4 10
15 10 6
Can I do something so that each time I generate a new random set of numbers, I can fix the first and last number to a specific number?
For instance, could I make it so that the first number and the last number are always "7"?
#example 1
from to
1 7 11
2 11 1
3 1 5
4 5 10
5 10 13
6 13 9
7 9 15
8 15 2
9 2 3
10 3 4
11 4 8
12 8 6
13 6 12
14 12 14
15 14 7
#example 2
from to
1 7 9
2 9 2
3 2 5
4 5 8
5 8 13
6 13 1
7 1 14
8 14 3
9 3 11
10 11 12
11 12 6
12 6 15
13 15 4
14 4 10
15 10 7
In the above examples (example 1, example 2), I took the first two random lists I made, manually replaced the first and last numbers with 7, and then put the displaced number wherever 7 had originally appeared.
But is there a way to "automatically" do this instead of making a manual correction?
For example, I think I figured out how to do this:
#run twice to make sure the output is correct
relations = data.frame(tibble(
from = sample(d),
to = lead(from, default=from[1]),
))
orig_first = relations[1,1]
relations[1,1] = 7
relations[15,2] = 7
relation = relations[-c(1,15),]
r1 = relations[1,]
r2 = relations[15,]
final_relation = rbind(r1, relation, r2)
#output 1 : seems correct (starts with 7, ends with 7, all nodes visited exactly once)
from to
1 7 8
2 8 4
3 4 7
4 7 13
5 13 1
6 1 14
7 14 6
8 6 9
9 9 11
10 11 10
11 10 12
12 12 2
13 2 5
14 5 15
15 15 7
#output 2: looks correct
from to
1 7 9
2 9 2
3 2 1
4 1 6
5 6 3
6 3 10
7 10 11
8 11 14
9 14 12
10 12 7
11 7 13
12 13 4
13 4 15
14 15 8
15 8 7
Am I doing this correctly? Is there an easier way to do this?
Thank you!
Here is a way to do this -
library(dplyr)
set.seed(2021)
d = 15
fix_num <- 7
relations = tibble(
from = c(fix_num, sample(setdiff(1:d, fix_num))),
to = lead(from, default=from[1]),
)
relations
# A tibble: 15 x 2
# from to
# <dbl> <dbl>
# 1 7 8
# 2 8 6
# 3 6 11
# 4 11 15
# 5 15 4
# 6 4 14
# 7 14 9
# 8 9 10
# 9 10 3
#10 3 5
#11 5 12
#12 12 13
#13 13 1
#14 1 2
#15 2 7
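The same idea can be wrapped in a small base R function. This is only a sketch: `make_cycle` is a hypothetical helper name, not something from dplyr or from the answer. It indexes with sample.int() rather than calling sample() on the remaining nodes, because sample(x) on a single number x draws from 1:x instead of returning x.

```r
# Hypothetical helper: random cycle over nodes 1..d that starts and ends at fix_num
make_cycle <- function(d, fix_num) {
  stopifnot(fix_num %in% seq_len(d))
  rest <- setdiff(seq_len(d), fix_num)         # every node except the fixed one
  from <- c(fix_num, rest[sample.int(length(rest))])
  to <- c(from[-1], from[1])                   # each edge ends where the next begins
  data.frame(from = from, to = to)
}

set.seed(2021)
relations <- make_cycle(15, 7)
print(relations)
```

Each call with a different seed gives a different ordering of the other nodes, while the first `from` and the last `to` stay at 7.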

Change units of time dimension in NetCDF file from months to months since

I currently have multiple NetCDF files with 4 dimensions, (latitude, longitude, time, and depth). Each represents a single year of monthly data. The unit of time is "month", 1-12, and therefore quite useless if I want to merge these files across years to give me a single NetCDF file with a time dimension of size months*years.
The time dimension attributes for a single file:
time Size:12 *** is unlimited ***
long_name: time
units: month
I used ncrcat from NCO to merge the files:
ncrcat soda3.3.1*sst.nc -O soda3.3.1_1980_2015_sst.nc
This works except that when merged, time values read
#in R
soda.info$var$temp$dim[[3]]$vals
[1] 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1
[26] 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2
[51] 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3
[76] 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4
[101] 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5
[126] 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6
[151] 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7
[176] 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8
[201] 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9
[226] 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10
[251] 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11
[276] 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12
[301] 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1
[326] 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2
[351] 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3
[376] 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4
[401] 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5
[426] 6 7 8 9 10 11 12
...which obviously isn't much help if I want to keep track of time.
In the past I've only used NetCDF files with a "months since..." unit. Is there a way to change these rather groundless 'month' units to 'months since...'?
Would it suffice to enumerate the months sequentially?
ncap2 -s 'time=array(0,1,$time)' soda3.3.1_1980_2015_sst.nc out.nc
You can also add a "months since ..." unit to time as described in the comment by Chelmy and/or in the NCO manual. I leave that as an exercise for you, gentle reader.
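To check what the sequential enumeration means on the R side, here is a base R sketch. It assumes the merged file covers January 1980 to December 2015 (inferred from the output file name, not confirmed by the question) and rebuilds the calendar year and month from a zero-based "months since" counter:

```r
n_years <- 36                              # assumed: 1980 through 2015
vals <- rep(1:12, times = n_years)         # what the merged file currently reports

# Zero-based counter, the same values array(0,1,$time) would write
months_since <- seq_along(vals) - 1

# Recover calendar year and month from the counter
year  <- 1980 + months_since %/% 12
month <- months_since %% 12 + 1

print(head(data.frame(months_since, year, month), 14))
```

With such a counter every time step is unique, and the original 1-12 month values can still be recovered from it.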

Extracting numbers from very long string into vector

I have the fairly long string shown below (~50k characters)
https://gist.github.com/anonymous/9de31de2e6fc9888f3debeda4698b739
I want to extract numbers (always 1 or 2 digit), that are always between "'>" and "<" and add them to a vector (must be in the correct order).
for example:
><td class='td-val ball-8'>13</td><td class='td-val ball-8'>9</td>
Would output a vector, [13,9]
I couldn't even get R to accept my string when I tried to enter it in the form
mystring <- "text here"
When I pressed Enter, R just showed a + at the prompt, so I think some of the symbols in the text were messing it up.
Since it's HTML that you're trying to parse, it's best to use an HTML parsing package like rvest:
library(rvest)
url <- 'https://gist.githubusercontent.com/anonymous/9de31de2e6fc9888f3debeda4698b739/raw/c07c2d6c6f00060806b15ec57ed06d4a4e0d9d74/gistfile1.txt'
url %>% read_html() %>% html_nodes('td.td-val') %>% html_text() %>% as.integer()
which returns
[1] 13 9 8 8 1 2 0 8 11 2 13 5 13 4 4 5 4 7 3 8 10 13 1 7 14 13 10 2 0 8
[31] 13 0 10 5 11 9 3 1 4 3 5 12 4 14 1 9 13 5 9 7 12 10 2 10 14 4 11 11 13 8
[61] 8 10 10 12 12 6 8 13 7 2 2 9 10 9 13 3 14 14 0 14 4 11 14 6 10 2 0 0 10 14
[91] 2 8 3 6 14 6 1 9 11 12 1 12 4 0 7 9 2 10 1 12 0 8 0 9 3 11 11 0 8 5
[121] 0 6 1 9 8 10 7 4 7 0 3 12 10 11 11 8 4 11 1 5 12 2 14 9 12 8 1 9 14 13
[151] 8 2 1 5 7 9 14 14 12 3 6 3 9 0 6 9 3 3 10 3 8 6 9 2 4 12 2 2 14 7
[181] 12 8 0 8 12 2 12 9 6 8 9 9 3 7 9 0 6 13 0 12 3 14 12 4 8 9 14 4 5 9
[211] 6 3 2 5 1 2 0 5 0 5 9 0 12 14 11 11 7 4 12 1 14 2 13 3 13 2 0 12 13 6
[241] 5 3 13 9 12 2 11 6 8 12 9 6 13 9 0 0 4 2 1 0 0 3 0 3 7 9 11 1 8 10
[271] 11 13 12 9 10 8 10 3 7 12 4 9 0 4 14 1 7 0 7 1 2 6 0 6 6 1 0 9 4 8
[301] 0 7 13 8 11 4 1 12 1 14 11 13 9 12 8 2 8 7 12 13 12 5 8 5 10 2 7 5 9 12
[331] 12 13 8 7 6 4 12 13 4 9 12 2 0 11 8 9 1 10 5 10 9 11 10 1 8 1 12 10 9 5
[361] 7 10 5 2 7 12 4 10 6 9 0 6 0 4 13 7 0 8 3 3 11 8 4 12 10 5 7 1 11 3
[391] 1 11 7 14 13 13 14 4 2 11 2 12 3 6 14 10 6 13 9 12 4 13 10 3 9 11 8 4 8 10
[421] 9 6 3 6 7 5 11 0 2 7 6 11 11 13 13 12 7 9 6 9 5 12 14 3 13 10 1 2 7 1
[451] 14 1 0 7 8 13 6 3 9 12 2 2 2 7 11 1 2 14 6 13 11 3 6 11 5 9 0 9 13 10
[481] 11 13 3 12 12 3 7 6 5 14 3 9 10 6 13 5 7 4 5 12 8 14 5 6 8 7 0 0 2 1
[511] 1 9 13 13 5 6 10 8 0 2 3 4 4 5 14 13 5 2 2 4 6 5 9 6 14 8 4 12 4 6
[541] 9 1 4 2 4 9 1 7 1 10 0 1 1 8 6 5 8 4 9 11 14 2 3 8 2 11 3 7 11 2
[571] 4 9 5 3 4 1 4 8 13 4 8 8 1 7 2 7 3 11 13 1 13 7 9 3 7 7 4 12 9 14
[601] 11 9 2 12 12 14 10 4 12 11 12 10 14 3 11 6 12 3 6 3 11 8 10 2 6 3 1 11 2 6
[631] 0 8 12 5 5 3 6 2 14 11 7 14 14 8 11 2 7 0 10 2 0 4 8 9 8 3 2 13 4 10
[661] 2 5 13 2 2 12 12 0 10 4 1 5 13 3 10 3 11 2 5 3 9 6 11 0 8 12 0 11 2 11
[691] 7 8 1 3 4 14 4 4 9 5 12 7 6 9 12 13 2 11 1 11 12 0 4 6 10 8 5 14 7 6
[721] 4 7 2 5 2 14 3 8 10 6 14 7 14 3 2 6 5 0 3 0 12 0 12 3 5 5 8 5 14 6
[751] 10 14 5 2 3 11 3 4 3 11 4 2 0 11 11 13 4 0 6 14 2 6 9 10 4 9 5 7 1 13
[781] 8 3 13 3 10 4 8 1 3 11 2 8 5 10 7 6 10 14 14 2 2 12 8 4 13 7 11 13 4 5
[811] 7 2 3 8 14 3 9 12 6 2 6 0 3 5 8 8 0 14 13 13 7 10 9 6 1 0 4 8 6 8
[841] 14 1 9 0 9 2 7 10 8 5 10 7 1 8 2 13 3 1 8 12 12 2 5 6 3 9 4 5 4 13
[871] 6 3 10 7 9 2 1 12 1 11 0 10 0 11 8 8 0 7 0 11 10 3 14 6 9 11 11 0 12 1
[901] 10 13 1 7 7 2 0 3 13 9 2 4 12 3 0 11 1 8 8 13 12 6 8 13 8 1 13 11 2 9
[931] 11 8 10 8 3 14 6 14 7 6 7 10 3 11 3 13 11 3 9 13 8 10 8 7 12 4 11 12 12 9
[961] 6 10 2 8 13 7 11 5 7 12 10 14 1 6 7 6 7 2 3 5 13 6 10 9 5 2 0 1 11 8
[991] 9 5 1 3 3 1 12 1 13 2 14 5 7 1 10 9 0 9 11 10 6 2 7 12 10 6 2 10 13 4
[1021] 9 9 14 4 4 5 7 13 13 13 6 7 12 1 6 11 12 14 4 11 6 4 10 0 9 12 10 10 13 8
[1051] 3 3 0 8 5 14 10 3 7 5 0 14 5 6 10 14 7 4 8 9 1 6 14 1 14 5 5 14 4 11
[1081] 12 14 9 13 14 13 2 13 11 9 14 2 1 9 8 11 13 11 14 13 3 4 9 6 9 6 10 13 1 12
[1111] 10 14 11 5 8 9 3 5 6 14 1 11 10 12 7 7 2 13 13 12 12 4 3 14 6 4 2 5 9 4
[1141] 14 11 6 4 11 6 4 4 8 2 2 5 14 1 7 11 8 9 11 11 10 6 14 3 0 3 8 8 14 13
[1171] 10 6 10 4 9 12 0 9 2 9 13 12 1 12 3 5 5 3 12 2 1 5 1 0 10 7 3 10 14 13
[1201] 11 8 0 10 12 9 4 5 4 8 5 6 2 11 7 5 5 8 4 9 9 10 14 3 7 9 1 9 9 8
[1231] 1 8 11 5 2 4 9 14 14 6 10 7 4 14 6 5 1 4 3 8 13 10 5 1 8 8 6 8 7 1
[1261] 14 4 4 7 2 12 10 8 10 5 6 7 2 3 5 13 1 2 9 8 5 14 1 11 9 5 8 12 13 0
[1291] 4 2 0 8 8 2 5 3 13 11 5 11 14 14 9 12 4 5 9 3 13 14 1 5 10 4 9 6 5 8
[1321] 7 5 7 3 14 8 4 8 4 6 5 8 11 0 14 13 2 13 12 13 3 4 7 8 11 4 14 12 3 6
[1351] 11 8 8 9 6 7 4 3 10 9 2 9 12 12 0 1 10 9 8 0 12 9 3 14 13 7 8 12 10 9
[1381] 10 10 2 11
You can use readLines to import the string from the raw URL, which you can get by clicking the Raw button on the gist page.
mystring <- readLines("https://gist.githubusercontent.com/anonymous/9de31de2e6fc9888f3debeda4698b739/raw/c07c2d6c6f00060806b15ec57ed06d4a4e0d9d74/gistfile1.txt")
A regular expression like the following should then give you all the numbers you want:
library(stringr)
num <- gsub(">|<", "", str_extract_all(mystring, ">\\d+<", simplify = TRUE))
head(as.vector(num))
[1] "13" "9" "8" "8" "1" "2"
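For comparison, a base R sketch on the two-cell fragment from the question. It assumes, as the question states, that each number has one or two digits and sits between `'>` and `<`. Lookarounds keep the delimiters out of the match, and as.integer() turns the result into numbers rather than strings:

```r
# Fragment quoted in the question
html <- "><td class='td-val ball-8'>13</td><td class='td-val ball-8'>9</td>"

# (?<='>) requires "'>" before the digits, (?=<) requires "<" after them
matches <- regmatches(html, gregexpr("(?<='>)\\d{1,2}(?=<)", html, perl = TRUE))[[1]]

nums <- as.integer(matches)
print(nums)
```

regmatches() returns the matches in the order they appear in the string, so the vector keeps the original order.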

divide dataframe into subgroups based on several columns successively in R

I have to sort a data pool with the following structure into subgroups based on the values of 3 columns in R, but I cannot figure it out.
What I want to do is:
First, sort the data pool by column V1 in descending order and divide it into three subgroups according to the value of V1.
Then divide each of the 3 subgroups into another 3 subgroups according to the value of V2, which gives 9 subgroups.
Similarly, subdivide each of the 9 groups into 3 groups according to the value of V3, resulting in 27 subgroups altogether.
The following data is only a simple example; the real data has 1545 firms.
Firm value V1 V2 V3
1 7 7 11 8
2 9 9 11 7
3 8 14 8 10
4 9 9 7 14
5 8 11 15 14
6 9 10 9 7
7 8 8 6 14
8 4 8 11 14
9 8 10 13 10
10 2 11 6 13
11 3 5 12 14
12 5 12 15 12
13 1 9 13 7
14 4 5 14 7
15 5 10 5 9
16 5 8 13 14
17 2 10 10 7
18 5 12 12 9
19 7 6 11 7
20 6 9 14 14
21 6 14 9 14
22 8 6 6 7
23 9 11 9 5
24 7 7 6 9
25 10 5 15 11
26 4 6 10 9
27 4 13 14 8
And the result should be:
Firm value V1 V2 V3
5 8 11 15 14
12 5 12 15 12
27 4 13 14 8
21 6 14 9 14
18 5 12 12 9
23 9 11 9 5
10 2 11 6 13
3 8 14 8 10
6 9 10 9 7
20 6 9 14 14
9 8 10 13 10
13 1 9 13 7
8 4 8 11 14
2 9 9 11 7
17 2 10 10 7
4 9 9 7 14
7 8 8 6 14
15 5 10 5 9
16 5 8 13 14
25 10 5 15 11
14 4 5 14 7
11 3 5 12 14
1 7 7 11 8
19 7 6 11 7
26 4 6 10 9
24 7 7 6 9
22 8 6 6 7
I have tried for a long time, also searched Google without success. :(
As @Codoremifa said, data.table can be used here:
require(data.table)
DT <- data.table(dat)
DT[order(V1),G1:=rep(1:3,each=9)]
DT[order(V2),G2:=rep(1:3,each=3),by=G1]
DT[order(V3),G3:=1:3,by='G1,G2']
Now your groups are labeled using the additional columns G1, G2 and G3. To sort, so that it's easier to see the groups, use
setkey(DT,G1,G2,G3)
A couple of the OP's columns are just noise unrelated to the question; to verify that this works by eye, try DT[,list(V1,V2,V3,G1,G2,G3)]
EDIT: The OP did not specify a means of dealing with ties. I guess it makes sense to use the value in the later columns to break ties, so...
DT <- data.table(dat)
DT[order(rank(V1)+rank(V2)/100+rank(V3)/100^2),
G1:=rep(1:3,each=9)]
DT[order(rank(V2)+rank(V3)/100),
G2:=rep(1:3,each=3),by=G1]
DT[order(V3),
G3:=1:3,by='G1,G2']
setkey(DT,G1,G2,G3)
DT[27:1] (the result backwards) is
Firm value V1 V2 V3 G1 G2 G3
1: 5 8 11 15 14 3 3 3
2: 12 5 12 15 12 3 3 2
3: 27 4 13 14 8 3 3 1
4: 21 6 14 9 14 3 2 3
5: 9 8 10 13 10 3 2 2
6: 18 5 12 12 9 3 2 1
7: 10 2 11 6 13 3 1 3
8: 3 8 14 8 10 3 1 2
9: 23 9 11 9 5 3 1 1
10: 20 6 9 14 14 2 3 3
11: 16 5 8 13 14 2 3 2
12: 13 1 9 13 7 2 3 1
13: 8 4 8 11 14 2 2 3
14: 17 2 10 10 7 2 2 2
15: 2 9 9 11 7 2 2 1
16: 4 9 9 7 14 2 1 3
17: 15 5 10 5 9 2 1 2
18: 6 9 10 9 7 2 1 1
19: 11 3 5 12 14 1 3 3
20: 25 10 5 15 11 1 3 2
21: 14 4 5 14 7 1 3 1
22: 26 4 6 10 9 1 2 3
23: 1 7 7 11 8 1 2 2
24: 19 7 6 11 7 1 2 1
25: 7 8 8 6 14 1 1 3
26: 24 7 7 6 9 1 1 2
27: 22 8 6 6 7 1 1 1
Firm value V1 V2 V3 G1 G2 G3
Here is an answer using transform and then ddply from plyr. I don't address the ties, which really means that in case of a tie the value from the lowest row number is used first. This is what the OP shows in the example output.
First, order the dataset in descending order of V1 and create three groups of 9 by creating a new variable, fv1.
dat1 = transform(dat1[order(-dat1$V1),], fv1 = factor(rep(1:3, each = 9)))
Then order the dataset in descending order of V2 and create three groups of 3 within each level of fv1.
require(plyr)
dat1 = ddply(dat1[order(-dat1$V2),], .(fv1), transform, fv2 = factor(rep(1:3, each = 3)))
Finally, order the dataset by the two factors and V3. I use arrange from plyr because it takes less typing than order.
(finaldat = arrange(dat1, fv1, fv2, -V3) )
This isn't a particularly generalizable answer, as the group sizes must be known in advance to build the factors. If the V3 groups held more than one row each, V3 would need the same treatment as V2.
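A base R sketch that avoids hard-coding the group sizes, so it should also cope with row counts that do not divide evenly by 27 (such as the 1545 firms mentioned in the question). The data here is randomly generated for illustration, and `tercile` is a hypothetical helper; ties are broken by row order, which is an assumption rather than something the OP specified.

```r
set.seed(42)
# Made-up stand-in for the OP's data
dat <- data.frame(Firm = 1:27, V1 = rnorm(27), V2 = rnorm(27), V3 = rnorm(27))

# Hypothetical helper: 1 = lowest third, 3 = highest third of x
tercile <- function(x) ceiling(3 * rank(x, ties.method = "first") / length(x))

dat$G1 <- tercile(dat$V1)                             # thirds of V1
dat$G2 <- ave(dat$V2, dat$G1, FUN = tercile)          # thirds of V2 within G1
dat$G3 <- ave(dat$V3, dat$G1, dat$G2, FUN = tercile)  # thirds of V3 within G1 x G2

# Highest groups first, as in the OP's expected output
sorted <- dat[order(-dat$G1, -dat$G2, -dat$G3), ]
counts <- table(sorted$G1, sorted$G2, sorted$G3)
print(head(sorted))
```

With 27 rows each of the 27 cells in counts holds exactly one firm; with other row counts the cell sizes differ slightly.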
