Apache Pig: OutOfMemory exception with simple GROUP BY in local mode

I'm getting an OutOfMemory exception from Pig when trying to execute a very simple GROUP BY on a tiny (3KB), randomly-generated, example data set.
The Pig script:
$ cat example.pig
raw =
LOAD 'example-data'
USING PigStorage()
AS (thing1_id:int,
thing2_id:int,
name:chararray,
timestamp:long);
grouped =
GROUP raw BY thing1_id;
DUMP grouped;
The data:
$ cat example-data
281906 13636091 hide 1334350350
174952 20148444 save 1334427826
1082780 16033108 hide 1334500374
2932953 14682185 save 1334501648
1908385 28928536 hide 1334367665
[snip]
$ wc example-data
100 400 3239 example-data
Here we go:
$ pig -x local example.pig
[snip]
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
[snip]
And some extra info:
$ apt-cache show hadoop | grep Version
Version: 1.0.2
$ pig --version
Apache Pig version 0.9.2 (r1232772)
compiled Jan 17 2012, 23:49:20
$ echo $PIG_HEAPSIZE
4096
At this point, I feel like I must be doing something drastically wrong because I can't see any reason why 3 kB of text would ever cause the heap to fill up.

Check this: http://sumedha.blogspot.in/2012/01/solving-apache-pig-javalangoutofmemorye.html
Neil, you are right. Let me explain: in the bin/pig script, the relevant code is:
JAVA_HEAP_MAX=-Xmx1000m
# check envvars which might override default args
if [ "$PIG_HEAPSIZE" != "" ]; then
JAVA_HEAP_MAX="-Xmx""$PIG_HEAPSIZE""m"
fi
It sets the maximum Java heap size using the -Xmx switch only, but I don't know why this override isn't working. That is why I asked you to specify the Java heap size directly, using the parameters described in the link. I haven't had time to check why this problem occurs; if anyone has an idea, please post it here.

Your Pig job is failing around the following code in MapTask.java:
final float recper = job.getFloat("io.sort.record.percent",(float)0.05);
final int sortmb = job.getInt("io.sort.mb", 100);
...
// buffers and accounting
int maxMemUsage = sortmb << 20;
int recordCapacity = (int)(maxMemUsage * recper);
recordCapacity -= recordCapacity % RECSIZE;
kvbuffer = new byte[maxMemUsage - recordCapacity];
So I suggest checking the configured values of io.sort.mb and io.sort.record.percent and, following the logic above, whether maxMemUsage - recordCapacity (the size of the kvbuffer allocation that fails at MapTask.java:949 in your stack trace) is close to, or larger than, your configured JVM heap size (4096 MB).
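To make that check concrete, here is a minimal C sketch that reproduces the quoted arithmetic outside Hadoop. The variable names follow the Java code; kvbuffer_bytes is a hypothetical helper, and RECSIZE = 16 is an assumption based on Hadoop 1.x (verify it against the source of the Hadoop version you run).

```c
#include <assert.h>

/* Assumption: RECSIZE is 16 bytes, as in Hadoop 1.x MapTask, where
 * RECSIZE = (ACCTSIZE + 1) * 4 with ACCTSIZE = 3. */
#define RECSIZE 16

/* Mirrors the arithmetic quoted from MapTask.java and returns the size,
 * in bytes, of the kvbuffer array that the map task allocates up front. */
long kvbuffer_bytes(int sortmb, float recper)
{
    assert(sortmb > 0 && sortmb < 2048);               /* sortmb << 20 must fit in an int */
    int maxMemUsage = sortmb << 20;                    /* io.sort.mb converted to bytes */
    int recordCapacity = (int)(maxMemUsage * recper);  /* share reserved for record accounting */
    recordCapacity -= recordCapacity % RECSIZE;        /* round down to whole records */
    return (long)maxMemUsage - recordCapacity;         /* bytes requested for kvbuffer */
}
```

With the defaults from the quoted code (io.sort.mb = 100, io.sort.record.percent = 0.05), kvbuffer_bytes(100, 0.05f) returns 99614720 bytes, i.e. 95 MB. If those defaults are in effect, that is nowhere near a 4096 MB heap, which would suggest that the PIG_HEAPSIZE setting never reached the JVM running the local job.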

I toyed with it for a while and ended up switching from the Debian packages for Hadoop/Pig to the raw tarballs, and the problem went away. Not sure what to make of that :)

Related

How to save fbset setting?

I am working on an embedded Linux project using Qt. When the Qt program runs, it is not centered on the 7" LCD, so I used "fbset -move -step" to move it, and then it looks fine.
But when the board is switched off and on again, the setting is lost and the Qt program is no longer centered on the LCD. I checked /etc/fb.modes and also modified it, but the problem remains. Can anyone help me?
I was lucky this time and solved it myself. After running "fbset -move -step", I printed the current settings with the "fbset" command and wrote them into /etc/fb.modes. To use this setting at every boot, add one line to /etc/rc.local: fbset mymode (where mymode is the name you gave the mode in fb.modes).
You can output the current settings by running fbset with no arguments other than -s/--show or -fb:
# fbset
mode "1024x768-60"
# D: 65.003 MHz, H: 48.365 kHz, V: 60.006 Hz
geometry 1024 768 1024 768 16
timings 15384 160 24 29 3 136 6
hsync high
vsync high
rgba 5/11,6/5,5/0,0/0
endmode
And you can write that into a file:
fbset >>/etc/local.fb.modes
Edit the file to rename the mode and add any comments you want; you can then use the new file with the -db argument:
fbset -db /etc/local.fb.modes --all "1024x768-60"
You can put that command into your /etc/rc.local to take effect every boot.
Tip: if setting the mode in /etc/rc.local fails, check the service status:
systemctl status rc-local.service -l
If it reports:
"open /dev/fb0: No such file or directory"
then simply run "fbset" first, before setting the mode, in /etc/rc.local:
fbset
fbset -g 800 600 800 600 32
I had this problem in VMware.

Why can't I use hPutStr after printing the result of hGetContents?

I'm new to Stack Overflow, so forgive me if I do something wrong. I'm trying to understand how a simple server works in Haskell. I think I'm missing something very simple or fundamental about how hGetContents works.
import Network
import System.IO
main = withSocketsDo $ do
    socket <- listenOn $ PortNumber 5002
    (h, _, _) <- accept socket
    c <- hGetContents h
    -- putStrLn c -- doesn't work
    -- putStrLn $ head $ lines c -- works!
    -- putStrLn $ unlines $ take 2 $ lines c -- works!
    -- putStrLn $ unlines $ take 3 $ lines c -- works!
    -- putStrLn $ unlines $ take 6 $ lines c -- works!
    putStrLn $ unlines $ take 10 $ lines c -- doesn't work
    hPutStr h $ "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nHello!\r\n"
    hClose h
After running the program, I open http://localhost:5002 in a web browser. The problem seems to be that, depending on how much of the handle contents I've parsed, I eventually become unable to send a response. I'd like to be able to parse the request before I send a response. I've marked in the code comments which cases work and which don't. Hoogle says that for hGetContents (lazy) the handle is "semi-closed" as it is being read. Am I misunderstanding the laziness, or should I consider the handle closed once I begin parsing its contents?
The error I get is "hPutChar: resource vanished (Broken pipe)." Thanks for any help.
I tried to reproduce your problem. To do so, I ran your code and sent it a request using nc:
printf "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11" | nc localhost 5002
As expected, the server (the code from your question) printed the first 10 lines and exited without any error. The client (nc) printed:
HTTP/1.0 200 OK
Content-Length: 5
Hello!
and also exited without an error.
So at first I couldn't see what your problem was, but then I tried sending a smaller request:
printf "1\n2\n3\n4\n5\n6\n" | nc localhost 5002
The server printed the first 6 lines and didn't exit. The client didn't exit either, so I interrupted it with Ctrl-C, after which the server exited with the "resource vanished" error.
After some thought it started to make sense to me. I don't understand lazy IO very well, so if my explanation is unclear or incorrect, it would help if someone with a better understanding improved it.
Let's follow your code. First:
(h, _, _) <- accept socket
c <- hGetContents h
You open a handle and read its content. Note that hGetContents reads lazily, so the content you get back is lazy too. When we say that something is lazy, we mean that it can be passed around without being evaluated (non-strict evaluation, as in 'call by need' rather than 'call by value').
Now:
putStrLn $ unlines $ take 10 $ lines c
Here you pass your lazy, unevaluated content (split by lines) to take 10. take 10 tries to evaluate the first 10 elements of a list and return them; if the list has fewer than 10 elements, it simply returns all of them. After take 10 come unlines and putStrLn, which are both perfectly compatible with laziness.
Now suppose the client sends input that is only 6 lines long and then waits for the response. Our server lazily receives the content and tries to print the first 10 lines. take 10 happily consumes the first 6 lines and passes them on to putStrLn . unlines. What happens then? take 10 can't finish its output, because there is absolutely no indication that the input has ended: the handle is still open and more bytes may still arrive from the client, so it just waits for more input.
This behaviour can be observed by running:
nc localhost 5002
and typing 10 lines manually. The input appears on the server line by line as you type. After you type the 10th line, the server responds with the "Hello!" message.
P.S.: I guess the behaviour you described happens because your web browser sends 6 to 9 lines with the request.
To test, debug and analyze this kind of low-level server, use simple tools like nc and curl instead of a web browser :)
When you initiate a lazy read on a handle, you give up the right to do anything much else with the handle until the contents string is fully forced, or you close the handle manually (at which point attempting to force any more of the contents string will lead to bad behavior or an error).
TL;DR
This is not a situation where lazy I/O is appropriate. The situations where a lazy read on a socket is appropriate can probably be counted on zero fingers. You can use regular strict I/O if you like, or conduit, or pipes, or some Haskell web framework like Yesod or Scotty or various other competitors.
Calling hGetContents puts the handle into a "semi-closed" state. You should not perform any operations on the handle after that point. You should only use the string returned from hGetContents.
Put simply, don't use lazy I/O here. You need to manually read and write individual strings one at a time, since the timing matters.
In general, lazy I/O is kind of neat, but it doesn't work well for anything much beyond toy examples.
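The point about timing is not specific to Haskell. As a language-neutral illustration (written in C rather than Haskell, and not taken from any of the answers), here is a minimal sketch of reading a request one line at a time and stopping at the blank line that ends the HTTP headers, so the response can be written while the client is still waiting. The function name read_headers is hypothetical; on a real socket you would wrap the accepted descriptor with fdopen(fd, "r"). It assumes header lines shorter than the buffer and ignores any request body.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads request lines from `in` until the blank line that terminates the
 * HTTP headers. Returns the number of non-blank lines read, or -1 if the
 * stream ends first. It stops reading at the blank line, so the caller can
 * write its response right away instead of waiting for more input.
 * Assumes every header line fits in `line` (under 1023 bytes). */
int read_headers(FILE *in)
{
    char line[1024];
    int count = 0;

    assert(in != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        /* Strip the line terminator: HTTP uses "\r\n", nc sends "\n". */
        line[strcspn(line, "\r\n")] = '\0';
        if (line[0] == '\0')
            return count;   /* blank line: the request headers are complete */
        count++;
    }
    return -1;              /* EOF or read error before the blank line */
}
```

Note that for the six-line nc request above, which has no trailing blank line, this function would also keep waiting, just as take 10 does; a browser, however, ends its request headers with an empty line, so a server that stops reading there can respond immediately.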

File date metadata not displaying properly

I have been trying to write a PowerShell script, based on some code I found online, that reads the metadata from picture, video and other files and then sorts them by one of the dates (date taken currently seems to be the best bet if it's available, but date modified works on files that have not yet been altered).
However, when I run the script and pull the info, I can't convert the string to a date. Here's roughly how I get the info through a COM object:
PS C:/> $objShell = New-Object -ComObject Shell.Application
PS C:/> $objFolder = $objShell.namespace("C:\MyFolder")
PS C:/> $date = $objFolder.GetDetailsOf($objFolder.Items().Item(0), 12)
PS C:/> $date
7/‎10/‎2014 ‏‎7:09 PM
The problem is that I should be able to convert this to a DateTime object. For instance, if I type it in manually, it works:
PS C:/> [datetime]::ParseExact("7/10/2014 7:09 PM","g",$null)
Thursday, July 10, 2014 7:09:00 PM
But if I substitute the variable it doesn't work:
PS C:/> [datetime]::ParseExact($date,"g",$null)
Exception calling "ParseExact" with "3" argument(s): "String was not recognized as a valid DateTime."
At line:1 char:1
+ [datetime]::ParseExact($date,"g",$null)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : FormatException
This is most likely because the variable isn't actually what I'm seeing: it's longer. And if you iterate through all the characters, you can see where the extra length comes from:
PS C:\> $date.Length #should be about 16, you'd think
22
PS C:\> $datearray = @()
PS C:\> for ($i = 0; $i -lt $date.Length; $i++) {$datearray += $date[$i]}
PS C:\> $datearray #i'm printing on one line and in quotes for ease of viewing
" 7/ 10/ 2014 7:09 PM"
If you try printing the array with a join or something similar, the results are (to me, without knowing what's going on) unpredictable. It treats it like it has 22 characters, but prints ignoring the spaces.
I'm sure I could spend a bit of time and do some string formatting, but I'd rather just be able to parse the given date. What's going on?
Edit: I'm able to access the file info easily, though I prefer not to. I'm mainly focusing on why the results I'm seeing are inconsistent and show a length that doesn't match how they print, and how I can deal with them. If nothing else, I'm curious what is going on.
I wanted to know what the extra characters were (you don't mention having looked at this already). I updated your array code from $datearray += $date[$i] to $datearray += [int][char]$date[$i]. The truncated output showed two oddities, 8206 and 8207, which are the left-to-right mark (U+200E) and the right-to-left mark (U+200F). These are invisible Unicode bidirectional formatting characters, most often seen in HTML (as &lrm; and &rlm;). Unfortunately I cannot explain why they are present. The good news is that they are easy to remove.
$date = ($date -replace [char]8206) -replace [char]8207
[datetime]::ParseExact($date,"g",$null)
For one of my own files, this yields:
Thursday, March 26, 2009 1:43:00 PM
Hopefully this is a bit closer to what you wanted. I tried searching for the reason those characters are present but didn't find anything useful.
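The same clean-up can be done outside PowerShell. Here is a minimal C sketch that assumes the text is UTF-8 encoded (PowerShell strings are UTF-16, which is why the answer can use [char]8206 and [char]8207 directly); strip_bidi_marks is a hypothetical helper name. In UTF-8, U+200E and U+200F are the three-byte sequences E2 80 8E and E2 80 8F.

```c
#include <assert.h>
#include <string.h>

/* Removes every U+200E (LEFT-TO-RIGHT MARK) and U+200F (RIGHT-TO-LEFT MARK)
 * from a NUL-terminated UTF-8 string, in place, and returns the new length.
 * In UTF-8 these characters are the byte sequences E2 80 8E and E2 80 8F. */
size_t strip_bidi_marks(char *s)
{
    assert(s != NULL);
    unsigned char *src = (unsigned char *)s;
    unsigned char *dst = src;

    while (*src != '\0') {
        /* Short-circuiting stops before reading past a truncated sequence. */
        if (src[0] == 0xE2 && src[1] == 0x80 && (src[2] == 0x8E || src[2] == 0x8F)) {
            src += 3;           /* skip the bidi mark */
        } else {
            *dst++ = *src++;    /* copy every other byte unchanged */
        }
    }
    *dst = '\0';
    return strlen(s);
}
```

Applied to a string like the one in the question, with five such marks, this leaves the plain 17-character "7/10/2014 7:09 PM" that the manual ParseExact call accepted.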
Extra Information for other readers
I did this because I didn't know what index 12 was, so I made an array containing the friendly names of all possible file metadata properties (288 entries!). For brevity, the here-string is located here.
With it, I tested the following code against one of my pictures.
$objShell = New-Object -ComObject Shell.Application
$objFolder = $objShell.namespace("C:\Temp\")
0..$Meta.GetUpperBound(0)| %{
$metaValue = $objFolder.GetDetailsOf($objFolder.Items().Item(2), $_)
If ($metaValue) {Write-Host "$_ - $($meta[$_]) - $metaValue"}
}
$Meta is the array I mentioned earlier.
The code cycles through all the details of my file, indicated by Item(2), and writes every detail that has a value to the screen. At the end there is a line converting the string to a date value. Script output below:
0 - Name - IMG_0571.JPG
1 - Size - 3.12 MB
2 - Item type - JPEG image
3 - Date modified - 3/26/2009 3:34 PM
4 - Date created - 8/24/2014 5:19 PM
5 - Date accessed - 8/24/2014 5:19 PM
6 - Attributes - A
9 - Perceived type - Image
10 - Owner - TE_ST\Cameron
11 - Kind - Picture
12 - Date taken - ‎3/‎26/‎2009 ‏‎1:43 PM
19 - Rating - Unrated
30 - Camera model - Canon EOS DIGITAL REBEL XS
31 - Dimensions - ‪3888 x 2592‬
32 - Camera maker - Canon
53 - Computer - TE_ST (this computer)
155 - Filename - IMG_0571.JPG
160 - Bit depth - 24
161 - Horizontal resolution - ‎72 dpi
162 - Width - ‎3888 pixels
....output truncated....
247 - Program mode - Normal program
250 - White balance - Auto
269 - Sharing status - Not shared
When I run your code, $objFolder.GetDetailsOf($objFolder.Items().Item(0), 12), I get an empty string. I even changed the item number and the folder I was looking in to make sure I was getting a file object.
However if I do this:
$objShell = New-Object -ComObject Shell.Application
$objFolder = $objShell.namespace("C:\Temp")
$date = $objFolder.Items().Item(3).ModifyDate
I get a value that is already a DateTime object.
(the code above uses my folder and item index)

libxively C API frequently does nothing

I'm trying to use libxively to update my feed, but it frequently seems to do nothing. I've got a basic call:
{
xi_datastream_t& ds = mXIFeed.datastreams[2];
::xi_str_copy_untiln(ds.datastream_id, sizeof (ds.datastream_id), "cc-output-power", '\0');
xi_datapoint_t& dp = ds.datapoints[0];
ds.datapoint_count = 1;
::xi_set_value_f32(&dp, mChargeController->outputPower());
}
const xi_context_t* ctx = ::xi_nob_feed_update(mXIContext, &mXIFeed);
it logs the following:
[io/posix/posix_io_layer.c:182 (posix_io_layer_init)] [posix_io_layer_init]
[io/posix/posix_io_layer.c:191 (posix_io_layer_init)] Creating socket...
[io/posix/posix_io_layer.c:202 (posix_io_layer_init)] Socket creation [ok]
Once or twice I saw my Xively developer page show a GET feed, but otherwise, nothing seems to get written. Any suggestions on what I should look at?
I tried to rebuild the library using blocking calls (would be nice if nob didn't mean no blocking calls), but I couldn't figure out how to build it.
Thanks!
EDIT:
I was able to build a synchronous version of the library, and that seems to work. Can anyone verify that the async version works? Is there more to it than simply calling xi_nob_feed_update()?
EDIT 2:
I tried running the async example, but I'm doing something wrong, as it always complains of no data received:
$ bin/asynch_feed_update <my key> <my feed ID> example 1 example 4 example 20 example 58 example 11 example 17
example: 1 7
example: 4 7
example: 20 7
example: 58 7
example: 11 7
example: 17 7
[io/posix_asynch/posix_asynch_io_layer.c:165 (posix_asynch_io_layer_init)] [posix_io_layer_init]
[io/posix_asynch/posix_asynch_io_layer.c:174 (posix_asynch_io_layer_init)] Creating socket...
[io/posix_asynch/posix_asynch_io_layer.c:185 (posix_asynch_io_layer_init)] Setting socket non blocking behaviour...
[io/posix_asynch/posix_asynch_io_layer.c:203 (posix_asynch_io_layer_init)] Socket creation [ok]
No data within five seconds.
The asynchronous version should work. The xi_nob_feed_update() is the right function to make a feed update request.
You have to call process_xively_nob_step() in a loop just after select().
In general, you should follow the asynchronous example.
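The pattern the answer describes (wait in select(), then advance the request by one step) looks roughly like the following generic C sketch. It does not use libxively itself: step_fn is a hypothetical stand-in for process_xively_nob_step(), whose real signature is not shown here, and count_bytes_step is a made-up demonstration step. It only waits for readability; a real non-blocking client usually also watches for writability while it is still sending the request.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical stand-in for the library's per-step processing function.
 * Returns nonzero once the request is finished. */
typedef int (*step_fn)(int fd, void *ctx);

/* Demonstration step: reads whatever bytes are available, adds the count
 * to *(size_t *)ctx, and reports completion when the peer closes. */
static int count_bytes_step(int fd, void *ctx)
{
    char buf[256];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n <= 0)
        return 1;                        /* EOF or error: stop */
    *(size_t *)ctx += (size_t)n;
    return 0;                            /* more data may follow */
}

/* Waits for `fd` to become readable, then calls `step` once per wake-up
 * until it reports completion. Returns 0 on completion, -1 on a select()
 * error, and 1 if nothing arrives within `timeout_sec` seconds. */
int run_nonblocking_loop(int fd, int timeout_sec, step_fn step, void *ctx)
{
    assert(fd >= 0 && fd < FD_SETSIZE && step != NULL);
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);

        /* Rebuilt every iteration: select() may modify the timeout. */
        struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
        int ready = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (ready < 0)
            return -1;                   /* select() failed (see errno) */
        if (ready == 0)
            return 1;                    /* timed out with no data */

        if (step(fd, ctx))               /* advance the request one step */
            return 0;
    }
}
```

The timeout branch corresponds to the kind of "No data within five seconds." message printed by the example program.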

Prefix Sum with global memory and an error with local memory

I have a Mali GPU which does not support local memory at all.
Every time I run code that uses local memory, the device reports errors.
So I want to port my code to a version that uses only global memory.
I was wondering whether it is possible to run a prefix sum/parallel reduction algorithm on the GPU using only global memory.
EDIT: While debugging the error, I found something strange: one particular line causes the error.
I have code like this:
`#define LOG_LSIZE 8`
`#define LSIZE_SHIFT_VALUE 4`
`#define LOG_NUM_BANKS 2`
`#define GET_CONFLICT_OFFSET(lid) ((lid) >> LOG_NUM_BANKS)`
`#define LSIZE 32`
`__local int lm_sum[2][LSIZE + LOG_LSIZE];`
`lm_sum[lid >> LSIZE_SHIFT_VALUE][bi] += lm_sum[lid >> LSIZE_SHIFT_VALUE][ai]; // highlighted line`
lid is the local ID, and I use a work-group size of 32. I found that the highlighted line causes the error. I tried fixed indices and found that I cannot use lm_sum on the right-hand side of a statement at all; doing so gives me an error. For example, this line also gives me an error:
int temp = lm_sum[0][0];
Any idea what is going on?
Error:
`In initial.cpp***[14100.684249] Mali<ERROR, BASE_MMU>: In file: /home/jbmaster/work/01.LPD_OpenCL_RFS/01.arm_work_3_0_31/SEC_All_EVT0_TX013-BU-00001-r2p0-00rel0/TX013-BU-00001-r2p0-00rel0/driver/product/kernel/drivers/gpu/arm/t6xx/kbase/src/common/mali_kbase_mmu.c line: 1240 function:kbase_mmu_report_fault_and_kill
[14100.709724] Unhandled Page fault in AS0 at VA 0x00000002000EC1A0
[14100.709728] raw fault status 0x500003C3
[14100.709730] decoded fault status: SLAVE FAULT
[14100.709733] exception type 0xC3: TRANSLATION_FAULT
[14100.709736] access type 0x3: WRITE
[14100.709738] source id 0x5000
[14100.734958]
[14100.736432] Mali<ERROR, BASE_JD>: In file: /home/jbmaster/work/01.LPD_OpenCL_RFS/01.arm_work_3_0_31/SEC_All_EVT0_TX013-BU-00001-r2p0-00rel0/TX013-BU-00001-r2p0-00rel0/driver/product/kernel/drivers/gpu/arm/t6xx/kbase/src/common/mali_kbase_jm.c line: 899 function:kbase_job_slot_hardstop
[14100.761458] Issueing GPU soft-reset instead of hard stopping job due to a hardware issue
[14100.769517] `
Since even lm_sum[0][0] doesn't work, the memory for the array is apparently not allocated. You said your GPU doesn't support local memory, yet you are trying to use lm_sum, which is declared in local memory (__local int lm_sum[2][LSIZE + LOG_LSIZE]).
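As for the original question, a prefix sum can be computed with global memory only, for example by running a Hillis-Steele scan as a sequence of passes that ping-pong between two global buffers, one kernel launch per pass. The following is a host-side C sketch of that structure, not OpenCL code, and global_scan is a hypothetical name: each iteration of the inner loop corresponds to what one work-item would do, and the buffer swap between passes stands in for a new kernel launch, which provides the global synchronisation that local-memory barriers would otherwise give.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Inclusive prefix sum (Hillis-Steele) using two "global" buffers.
 * Each pass of the outer loop models one kernel launch; each iteration of
 * the inner loop models one work-item. Every pass reads only from `src`
 * and writes only to `dst`, so no work-item sees a partially updated
 * buffer and no local memory or in-kernel barrier is needed.
 * The result is written back to `data`. Returns 0, or -1 if allocation fails. */
int global_scan(int *data, size_t n)
{
    assert(data != NULL || n == 0);
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL && n > 0)
        return -1;

    int *src = data, *dst = tmp;
    for (size_t offset = 1; offset < n; offset *= 2) {
        for (size_t i = 0; i < n; i++)           /* one work-item per element */
            dst[i] = src[i] + (i >= offset ? src[i - offset] : 0);
        int *swap = src; src = dst; dst = swap;  /* next "launch" reads this output */
    }
    if (src != data)                             /* odd number of passes */
        memcpy(data, src, n * sizeof *data);
    free(tmp);
    return 0;
}
```

On the GPU, this would mean enqueueing the same kernel ceil(log2(n)) times and swapping the two buffer arguments between launches. The scheme is simple but does O(n log n) additions, more than a work-efficient scan.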
