I am trying to implement the simple variant of quicksort described on Wikipedia. However, it seems that local variables from an earlier recursive call are leaking into a later call (that is my current interpretation). Is this the case? Are there any workarounds?
Here's the sample code:
# Quick sort - Simple variant. Requires O(n) extra store
# Divide and conquer. Pick a pivot, compare elements to
# pivot generating two sublists; one greater than pivot
# and the other less than pivot. Sort these two recursively.
# See http://en.wikipedia.org/wiki/Quicksort#Simple_version
function dump_array(arr_arg){
arr_arg_len = length(arr_arg)
RSEP = ORS;
ORS=" ";
dctr = 1;
# Do not use length(arr_arg) in place of arr_arg_len
# It fails. The following doesn't work
#while (dctr <= length(arr_arg)){
while (dctr <= arr_arg_len){
print arr_arg[dctr];
dctr++;
};
ORS = RSEP;
print "\n";
}
function simple_quicksort(unsorted_str)
{
# Unpack from the str - space separated
print "******************************"
print "Called with "unsorted_str
# Split the space separated string into an array
split(unsorted_str, unsorted_array, " ");
array_len = length(unsorted_array);
# No more sorting to be done. Break recursion
if (array_len <= 1){
print "Ending recursion with "unsorted_str
return unsorted_str
}
# Pick a random value as pivot
# index must not be 0
idx = 0
while (idx == 0){
srand()
idx = int(rand() * array_len)
}
pivot = unsorted_array[idx]
if (debug >= 1){
print "idx:"idx" pivot is: "pivot
}
num = 1;
# we don't use the zero'th element,
# this helps us declare an empty array
# dunno any other method
# we'll remove it anyway
less_arr[0] = 0
less_ctr = 1
more_arr[0] = 0
more_ctr = 1
while (num <= array_len){
# Skip pivot
if (idx != num){
if (unsorted_array[num] <= pivot){
if (debug >= 1){
print "Element less than pivot: "unsorted_array[num]
}
less_arr[less_ctr] = unsorted_array[num]
less_ctr++;
}else{
if (debug >= 1){
print "Element more than pivot: "unsorted_array[num]
}
more_arr[more_ctr] = unsorted_array[num]
more_ctr++;
}
}
num++
};
# strip out the holder in idx 0
delete less_arr[0]
delete more_arr[0]
if (debug >= 1){
print "Less than pivot:"
print dump_array(less_arr)
print "More than pivot:"
print dump_array(more_arr)
}
# Marshal array back to a string
less_str=""
less_length = length(less_arr)
num = 1
print "Less length: "less_length
while (num <= less_length){
less_str = less_str" "less_arr[num]
num++;
}
# same thing for more
more_str=""
more_length = length(more_arr)
num = 1
while (num <= more_length){
more_str = more_str" "more_arr[num]
num++;
}
if (debug >= 1){
print "Going for a recursive call with elements < pivot: "less_str
print "Going for a recursive call with elements > pivot: "more_str
print "pivot was: "pivot
}
# Tried to delete the local variables
# Coz it seems like local vars are visible to recursed functions
# Is this why it fails?
delete less_arr
delete more_arr
delete unsorted_array
print "^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^"
print ""
return simple_quicksort(less_str) " "pivot" "simple_quicksort(more_str)
}
BEGIN{
print "-- quick sort --"
}
{
# print the unsorted objects
print "Unsorted "NF" objects:\n"$0;
# We'll use a slightly different method,
# Pass the $0 string to the sorter, let it split
# it into an array, qsort that array, generate sub-
# strings and recursively qsort them
#Simple version
sorted = simple_quicksort($0)
}
END{
print "Sorted "NF" objects";
print "Sorted >>"sorted
}
A sample run:
echo 5 12 7 2 13719 28019 21444 30578 30647 | awk -f devel/andorian-blog/awk/sorting/quick_sort.awk -v debug=1
-- quick sort --
Unsorted 9 objects:
5 12 7 2 13719 28019 21444 30578 30647
******************************
Called with 5 12 7 2 13719 28019 21444 30578 30647
idx:1 pivot is: 5
Element more than pivot: 12
Element more than pivot: 7
Element less than pivot: 2
Element more than pivot: 13719
Element more than pivot: 28019
Element more than pivot: 21444
Element more than pivot: 30578
Element more than pivot: 30647
Less than pivot:
2
More than pivot:
12 7 13719 28019 21444 30578 30647
Less length: 1
Going for a recursive call with elements < pivot: 2
Going for a recursive call with elements > pivot: 12 7 13719 28019 21444 30578 30647
pivot was: 5
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
******************************
Called with 2
Ending recursion with 2
******************************
Called with 12 7 13719 28019 21444 30578 30647
idx:1 pivot is: 12
Element less than pivot: 7
Element more than pivot: 13719
Element more than pivot: 28019
Element more than pivot: 21444
Element more than pivot: 30578
Element more than pivot: 30647
Less than pivot:
7
More than pivot:
13719 28019 21444 30578 30647
Less length: 1
Going for a recursive call with elements < pivot: 7
Going for a recursive call with elements > pivot: 13719 28019 21444 30578 30647
pivot was: 12
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
******************************
Called with 7
Ending recursion with 7
******************************
Called with 13719 28019 21444 30578 30647
idx:3 pivot is: 21444
Element less than pivot: 13719
Element more than pivot: 28019
Element more than pivot: 30578
Element more than pivot: 30647
Less than pivot:
13719
More than pivot:
28019 30578 30647
Less length: 1
Going for a recursive call with elements < pivot: 13719
Going for a recursive call with elements > pivot: 28019 30578 30647
pivot was: 21444
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
******************************
Called with 13719
Ending recursion with 13719
******************************
Called with 28019 30578 30647
idx:1 pivot is: 28019
Element more than pivot: 30578
Element more than pivot: 30647
Less than pivot:
More than pivot:
30578 30647
Less length: 0
Going for a recursive call with elements < pivot:
Going for a recursive call with elements > pivot: 30578 30647
pivot was: 28019
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
******************************
Called with
Ending recursion with
******************************
Called with 30578 30647
idx:1 pivot is: 30578
Element more than pivot: 30647
Less than pivot:
More than pivot:
30647
Less length: 0
Going for a recursive call with elements < pivot:
Going for a recursive call with elements > pivot: 30647
pivot was: 30578
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
******************************
Called with
Ending recursion with
******************************
Called with 30647
Ending recursion with 30647
Sorted 9 objects
Sorted >> 2 5 7 12 13719 21444 28019 30578 30647
That run looks good. Compare it with the next one:
$ echo 5 12 7 2 13719 28019 21444 30578 30647 | awk -f devel/andorian-blog/awk/sorting/quick_sort.awk -v debug=1
-- quick sort --
Unsorted 9 objects:
5 12 7 2 13719 28019 21444 30578 30647
******************************
Called with 5 12 7 2 13719 28019 21444 30578 30647
idx:6 pivot is: 28019
Element less than pivot: 5
Element less than pivot: 12
Element less than pivot: 7
Element less than pivot: 2
Element less than pivot: 13719
Element less than pivot: 21444
Element more than pivot: 30578
Element more than pivot: 30647
Less than pivot:
5 12 7 2 13719 21444
More than pivot:
30578 30647
Less length: 6
Going for a recursive call with elements < pivot: 5 12 7 2 13719 21444
Going for a recursive call with elements > pivot: 30578 30647
pivot was: 28019
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
******************************
Called with 5 12 7 2 13719 21444
idx:4 pivot is: 2
Element more than pivot: 5
Element more than pivot: 12
Element more than pivot: 7
Element more than pivot: 13719
Element more than pivot: 21444
Less than pivot:
More than pivot:
5 12 7 13719 21444
Less length: 0
Going for a recursive call with elements < pivot:
Going for a recursive call with elements > pivot: 5 12 7 13719 21444
pivot was: 2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
******************************
Called with
Ending recursion with
******************************
Called with 5 12 7 13719 21444
idx:3 pivot is: 7
Element less than pivot: 5
Element more than pivot: 12
Element more than pivot: 13719
Element more than pivot: 21444
Less than pivot:
5
More than pivot:
12 13719 21444
Less length: 1
Going for a recursive call with elements < pivot: 5
Going for a recursive call with elements > pivot: 12 13719 21444
pivot was: 7
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
******************************
Called with 5
Ending recursion with 5
******************************
Called with 12 13719 21444
idx:2 pivot is: 13719
Element less than pivot: 12
Element more than pivot: 21444
Less than pivot:
12
More than pivot:
21444
Less length: 1
Going for a recursive call with elements < pivot: 12
Going for a recursive call with elements > pivot: 21444
pivot was: 13719
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
******************************
Called with 12
Ending recursion with 12
******************************
Called with 21444
Ending recursion with 21444
******************************
Called with 21444
Ending recursion with 21444
Sorted 9 objects
Sorted >> 2 5 7 12 13719 21444 13719 21444
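In AWK, every variable you reference inside a function is global unless it is one of the function's parameters. There is no separate declaration for local variables.
However, function parameters behave like local variables: each call, including each recursive call, gets its own fresh set, and the caller's values are hidden while the nested call runs. In your failing run, the first recursive call appears to overwrite pivot and more_str before the return expression reads them.
So you simply need to take all the "local variables" in your function and declare them as extra parameters.
When you call the function, you can omit these extra arguments; they start out empty.
It is conventional to put extra space between the real arguments and the "local variables" in the parameter list, to document how the function is meant to be called, for example: function simple_quicksort(unsorted_str,    unsorted_array, array_len, idx, pivot), and likewise for the remaining working variables.
This behavior and convention are described in the GAWK documentation under Function Definition Syntax.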
I need to generate output like the following examples:
month=12
n=1
output=[[1,2,3,4,5,6,7,8,9,10,11,12]]
month=12
n=2
output=[[1,3,5,7,9,11],[2,4,6,8,10,12]]
month=12
n=3
output=[[1,4,7,10],[2,5,8,11],[3,6,9,12]]
I tried the following, but it failed:
v=function(a){
b=12/a
c=12%%a
d=seq(1,12,a)
e=seq(0,a)
f=list()
for (i in 0:a)
{
#f.append(e[i]+d)
}
return(f)
}
v(3)
Here is the code I have, but I don't know how to store the vectors in a two-dimensional list. Can you give me some suggestions?
split should give you what you want:
v <- function(n, month = 12) {
month <- seq_len(month)
res <- split(month, factor((month - 1) %% n))
unname(res)
}
str(v(1))
#List of 1
# $ : int [1:12] 1 2 3 4 5 6 7 8 9 10 ...
str(v(2))
#List of 2
# $ : int [1:6] 1 3 5 7 9 11
# $ : int [1:6] 2 4 6 8 10 12
str(v(3))
#List of 3
# $ : int [1:4] 1 4 7 10
# $ : int [1:4] 2 5 8 11
# $ : int [1:4] 3 6 9 12
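To see why the split call above produces these groups, it can help to look at the grouping key on its own. The following is a minimal sketch, not part of the original answer; the variable name key is introduced here purely for illustration.

```r
month <- seq_len(12)
n <- 3

# (month - 1) %% n cycles through 0, 1, ..., n - 1, so months that are
# n apart share the same key and end up in the same group.
key <- (month - 1) %% n
print(key)

# split() groups month by key; unname() drops the "0", "1", "2" names.
res <- unname(split(month, key))
str(res)
```

Because the key restarts at 0 every n months, each group collects every n-th month starting from a different offset.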
I want to get the memory usage of all lists in the current environment. data.table has a tables function that summarizes all tables in memory, including their size. Here's what I am using right now, but I'm wondering if there's a better way to do it:
sapply(ls()[grepl("list",sapply(ls(), function(z) class(get(z))))],
function(z) format(object.size(get(z)), units = "Mb") )
I've seen Determining memory usage of objects? and Tricks to manage the available memory in an R session but they seem more about knowing usage of a specific item or managing memory, respectively. What I want is to get memory usage for all lists (this example) or all items that follow a certain naming convention.
thanks!
One method would be to use eapply to search through all objects in the desired environment, check if each is a list and return the object.size if TRUE, else return NA.
eapply(as.environment(-1),
FUN=function(x) if(is.list(x)) format(object.size(x), units = "Mb") else NA)
$a
[1] "7.2 Mb"
$b
[1] "72.5 Mb"
$f
[1] NA
The as.environment(-1) tells eapply to run over the environment that it was called from, which is the global environment here.
Also, ls.str might be useful here to return the str of list objects:
ls.str(mode = "list")
a : List of 2
$ : int [1:1000000] 1 2 3 4 5 6 7 8 9 10 ...
$ : int [1:899999] 2 3 4 5 6 7 8 9 10 11 ...
b : List of 2
$ : int [1:10000000] 1 2 3 4 5 6 7 8 9 10 ...
$ : int [1:8999999] 2 3 4 5 6 7 8 9 10 11 ...
data
#rm(list=ls())
f <- function() return(1)
a <- list(1:1e6, 2:9e5)
b <- list(1:1e7, 2:9e6)
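For the other case mentioned in the question, objects that follow a naming convention, one possible approach is to select names with the pattern argument of ls and fetch the objects with mget. This is only a sketch: the environment env, the object names, and the prefix "lst_" are all hypothetical and chosen for illustration.

```r
# Hypothetical objects in a scratch environment, so the sketch does not
# touch the global environment.
env <- new.env()
env$lst_small <- list(numeric(10))
env$lst_big   <- list(numeric(1e5))
env$other     <- numeric(10)

# Select names by pattern, then measure each matching object in bytes.
nms <- ls(envir = env, pattern = "^lst_")
sizes <- vapply(mget(nms, envir = env),
                function(x) as.numeric(object.size(x)),
                numeric(1))

# Largest objects first.
print(sort(sizes, decreasing = TRUE))
```

Returning plain numbers rather than formatted strings keeps the result sortable; format(object.size(x), units = "Mb") can still be applied afterwards for display.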
The length of the elements that the rle function returns is shown in its printed output, but is it accessible from code? I know I can run the length function on rle$lengths to get the value I want. But when you look at the object returned by rle, you see that number displayed right in front of your eyes. The question is: is it retrievable?
v1 <- rep(seq(5),seq(3,7))
rle(v1)
gives us:
# Run Length Encoding
# lengths: int [1:5] 3 4 5 6 7
# values : int [1:5] 1 2 3 4 5
The 5 in int [1:5] is the length of each of the returned elements. It's already there, so is there a way to retrieve it, or do we have to recalculate it with length?
str() returns the structure of the object passed to it, so you can see which components are actually stored. In this case, using str() on rle(v1):
v <- rle(v1)
str(v)
#List of 2
#$ lengths: int [1:5] 3 4 5 6 7
#$ values : int [1:5] 1 2 3 4 5
#- attr(*, "class")= chr "rle"
It looks like the output is a list of 2 with no separately stored count, so you'd still have to use something like length to retrieve it.
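As a small sketch of what "something like length" could look like in practice (the variable name n_runs is introduced here for illustration only):

```r
v1 <- rep(seq(5), seq(3, 7))
v <- rle(v1)

# The only stored components are lengths and values.
print(names(unclass(v)))

# Number of runs, computed from one of the components.
n_runs <- length(v$lengths)
print(n_runs)

# lengths() reports the length of every component in one call.
print(lengths(v))
```

The number shown in the printed output is computed when the object is displayed, which is why it has to be recomputed here as well.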
If I want to list all rows of a column in a dataset in R, I am able to do it in these two ways:
> dataset[,'column']
> dataset$column
It appears that both give me the same result. What is the difference?
In practice, not much, as long as dataset is a data frame. The main difference is that the dataset[, "column"] formulation accepts variable arguments, like j <- "column"; dataset[, j] while dataset$j would instead return the column named j, which is not what you want.
dataset$column is list syntax and dataset[ , "column"] is matrix syntax. Data frames are really lists, where each list element is a column and every element has the same length. This is why length(dataset) returns the number of columns. Because they are "rectangular," we are able to treat them like matrices, and R kindly allows us to use matrix syntax on data frames.
Note that, for lists, list$item and list[["item"]] are almost synonymous. Again, the biggest difference is that the latter form evaluates its argument, whereas the former does not. This is true even in the form `$`(list, item), which is exactly equivalent to list$item. In Hadley Wickham's terminology, $ uses "non-standard evaluation."
Also, as mentioned in the comments, $ always uses partial name matching, [[ does not by default (but has the option to use partial matching), and [ does not allow it at all.
I recently answered a similar question with some additional details that might interest you.
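The differences described above can be checked with a short sketch. The data frame below is hypothetical, and the partial-matching behaviour shown is that of base R data frames as I understand it.

```r
dataset <- data.frame(column = 1:3, other = c("a", "b", "c"))
j <- "column"

# [ and [[ evaluate their argument, so the variable j works.
print(dataset[, j])
print(dataset[[j]])

# $ does not evaluate: it looks for a column literally named "j".
print(dataset$j)

# $ uses partial matching; [[ needs exact = FALSE to do the same.
print(dataset$col)
print(dataset[["col"]])
print(dataset[["col", exact = FALSE]])

# Matrix-style [ with an incomplete name is an error.
failed <- inherits(try(dataset[, "col"], silent = TRUE), "try-error")
print(failed)
```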
Use the str command to see the difference:
> mydf
  user_id Gender Age
1       1      F  13
2       2      M  17
3       3      F  13
4       4      F  12
5       5      F  14
6       6      M  16
>
> str(mydf)
'data.frame': 6 obs. of 3 variables:
$ user_id: int 1 2 3 4 5 6
$ Gender : Factor w/ 2 levels "F","M": 1 2 1 1 1 2
$ Age : int 13 17 13 12 14 16
>
> str(mydf[1])
'data.frame': 6 obs. of 1 variable:
$ user_id: int 1 2 3 4 5 6
>
> str(mydf[,1])
int [1:6] 1 2 3 4 5 6
>
> str(mydf[,'user_id'])
int [1:6] 1 2 3 4 5 6
> str(mydf$user_id)
int [1:6] 1 2 3 4 5 6
>
> str(mydf[[1]])
int [1:6] 1 2 3 4 5 6
>
> str(mydf[['user_id']])
int [1:6] 1 2 3 4 5 6
mydf[1] is a data frame while mydf[,1] , mydf[,'user_id'], mydf$user_id, mydf[[1]], mydf[['user_id']] are vectors.
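If the matrix-style form should keep returning a data frame, the drop argument of [ can be set to FALSE. A minimal sketch, rebuilding a data frame like the mydf shown above:

```r
mydf <- data.frame(
  user_id = 1:6,
  Gender  = factor(c("F", "M", "F", "F", "F", "M")),
  Age     = c(13L, 17L, 13L, 12L, 14L, 16L)
)

# List-style single bracket keeps the data frame.
print(is.data.frame(mydf[1]))

# Matrix-style indexing of one column drops to a vector by default...
print(is.data.frame(mydf[, 1]))

# ...unless drop = FALSE is given.
print(is.data.frame(mydf[, 1, drop = FALSE]))
```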
I have a list with several elements, say 10.
testList <- split(1:10,1:10)
How can I insert a new element in the middle of the list, say at position 3?
The brute-force way of looping through all the elements will work, but I'm wondering if there is a more elegant way of doing this.
I think the append function is what you are looking for:
append(testList, list(x=42), 3)
$`1`
[1] 1
$`2`
[1] 2
$`3`
[1] 3
$x
[1] 42
$`4`
[1] 4
#snipped....
For more complex lists you might find the modifyList function in the utils package useful. It allows targeted modifications. What it does not support is inserting rows into a data frame.
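Note that the third argument of append is after, so the call above places the new element after the third one, at position 4. A small sketch of the difference (the names res_after3 and res_pos3 are introduced here for illustration):

```r
testList <- split(1:10, 1:10)

# after = 3: the new element becomes the 4th element.
res_after3 <- append(testList, list(x = 42), after = 3)
print(names(res_after3))

# after = 2: the new element becomes the 3rd element.
res_pos3 <- append(testList, list(x = 42), after = 2)
print(names(res_pos3))
```

Wrapping the new value in list() matters: it keeps the inserted item as a single list element with its own name.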
Using extraction indices:
> testList[5:11] <- c('something', testList[5:10])
> str(testList)
List of 11
$ 1 : int 1
$ 2 : int 2
$ 3 : int 3
$ 4 : int 4
$ 5 : chr "something"
$ 6 : int 5
$ 7 : int 6
$ 8 : int 7
$ 9 : int 8
$ 10: int 9
$ : int 10