Write system call and blocking the process - unix

In UNIX, the read system call blocks the process until it is done.
How does the write system call behave? Does it block the process while it is writing to the disk?
By write system call I mean the write(fd, bf, nbyte) procedure call.

No, it only blocks the process until the content of the buffer has been copied to kernel space. This usually takes very little time, but there are some cases where it may have to wait for disk operations:
If there are no free pages, some must be freed. If there are clean pages, their content can simply be discarded (it is just a copy of what is on disk), but if there are none, some dirty pages must be laundered, which involves a write to disk. Since dirty pages are laundered automatically after a few seconds, this almost never happens if you have enough memory.
If the write is to the middle of a file, the surrounding content may need to be read first, because the page cache has page granularity (aligned 4 KiB blocks on most platforms). This happens rarely, because it is rare to update a file without reading it, and if you read it first, the content is already cached.
If you want to wait until the data actually hits the platters, you need to follow up with fsync(2).
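As a minimal sketch of the write-then-fsync pattern described above, here is the same idea through Python's thin wrappers over write(2) and fsync(2). The function name durable_write is made up for illustration; os.write and os.fsync are the real calls.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path and return only after fsync reports it flushed.

    durable_write is a hypothetical helper name, not part of any library.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            # os.write returns once the bytes are copied into kernel
            # buffers; it may accept fewer bytes than requested.
            written += os.write(fd, data[written:])
        # Block until the kernel has pushed the file's data to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
    print(durable_write(demo_path, b"hello"))
```

Without the os.fsync call, the function would return as soon as the data sits in the page cache, which is the behaviour the answer describes for a plain write.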

Related

Why does Json.Net cause so many allocations on the first deserialization?

Deserializing a 16 KB file allocates about 3.6 MB of memory the first time,
but only about 50 KB the second time. I know that it caches the reflection info, but how can I release that memory manually?
I want to know how to control the GC used in Unity3d. Any help is appreciated.
Unity uses Automatic Memory Management. In most cases, you don't need to manually collect garbage.
You should call GC.Collect only when you are absolutely sure it's the "right" time. You definitely don't want this process to freeze your game character.
To quote Unity on this topic:
If we know that heap memory has been allocated but is no longer used
(for example, if our code has generated garbage when loading assets)
and we know that a garbage collection freeze won’t affect the player
(for example, while the loading screen is still showing), we can
request garbage collection
You can read more on this Unity Page.

how does msync() work?

I use mmap to map file F to block B, and then I write only one byte of B.
If I call msync() for B with MS_SYNC, does the OS write the whole block to F? Or does it write only the one modified byte to F?
This is OS- and architecture-specific, but most likely only the dirty page will be written to disk.
What does the man page on your particular system say? If it's not open source, that's about the best you have to go on, unless you can find more detailed documentation for your UNIX platform.
On at least one system, man msync says:
The msync() system call writes modified whole pages back to the
filesystem and updates the file modification time. Only those pages
containing addr and len-1 succeeding locations will be examined.
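To illustrate the "whole pages" wording, here is a rough sketch that works out which page-aligned span covers a modified byte and flushes only that span. Python's mmap.flush(offset, size) is used as a stand-in for msync(); page_span and write_byte_and_sync are illustrative names, and what the kernel really writes back remains OS-specific, as noted above.

```python
import mmap
import os
import tempfile


def page_span(offset, length, page_size=mmap.PAGESIZE):
    """Return (start, size) of the whole pages covering [offset, offset+length)."""
    start = (offset // page_size) * page_size
    end = -(-(offset + length) // page_size) * page_size  # round up
    return start, end - start


def write_byte_and_sync(path, offset, value):
    """Modify one byte through a mapping and flush only the page holding it."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            m[offset] = value
            start, size = page_span(offset, 1)
            # Clamp so the flushed span never runs past the mapping.
            size = min(size, len(m) - start)
            m.flush(start, size)


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "mapped.bin")
    with open(demo, "wb") as fh:
        fh.write(b"\0" * (3 * mmap.PAGESIZE))
    write_byte_and_sync(demo, mmap.PAGESIZE + 5, 0x41)
    print(page_span(mmap.PAGESIZE + 5, 1))
```

Even though a single byte changed, the unit handed to the flush call is one full page, which matches the man page excerpt.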

If I perform a write on an SSD that only changes 0s to 1s, can I rely on the drive not to erase the entire block before writing?

It is my understanding that it is the erases that wear out SSDs, not the writes themselves. Therefore, optimizing away the need for erases would be hugely beneficial from the point of view of drive manufacturers. Can I take it as a given that they do this?
I'd like you to assume that I'm writing directly to the disk and that there isn't a filesystem to mess things up.
If there are empty pages in a block and the SSD wants to write to those pages, it will not erase the block first. An SSD will only erase a block before a write if it cannot find any empty pages to write to, because doing a full read-erase-write is very slow.
Besides, the wear-out from writing and erasing is about the same. Both involve pulling electrons through the oxide layer, just in different direction.
Also, the erased state for NAND is all 1. You then write 1 to 0. You need to erase the 0 to get it back to a 1.
Unless I'm reading your question wrong I think you misunderstand how SSDs work.
SSDs are made up of large blocks (usually 512k), which are much larger than we are used to in a filesystem (usually 4k).
The Erase pass is necessary before anything can be written to the block unless the block is already empty.
So the problem with erases wearing out the disk is that if 4k of your 512k block is used, you must erase the whole 512k block and write the original 4k + anything else you are adding. This creates excessive wear and slows things down as instead of one "write" you need a "read-wipe-write" (known as "write amplification").
This is simplifying it a bit as the drive firmware does a lot of clever things to try and make sure the blocks are optimally filled e.g. it tries to keep lots of empty blocks to avoid slow writes.
Hope that helps/didn't confuse things further!
If the SSD uses read-erase-write, it first reads the content of the block, then erases it and writes the new values. So it does erase the entire block before writing, but it has saved the content so that it can be written back. On SSDs that do not use read-erase-write, a write goes to a new page (possibly in the same block) and the previous page is then marked invalid. In this case, the block is erased only once every page in it is either invalid or has been copied elsewhere.
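To put numbers on the simplified read-erase-write model from the answers above (a block is read, erased, and rewritten with its old valid data plus the new data), here is a small back-of-the-envelope calculation. The function name and the model itself are illustrative assumptions; real firmware behaves in more complicated ways.

```python
def write_amplification(valid_bytes, new_bytes):
    """Physical bytes written per host byte under a naive read-erase-write.

    Assumes the drive rewrites the block's existing valid data together
    with the new data; this is a simplification, not a firmware model.
    """
    if new_bytes <= 0:
        raise ValueError("new_bytes must be positive")
    return (valid_bytes + new_bytes) / new_bytes


KIB = 1024
# Block currently holding 4k of valid data, host adds another 4k.
print(write_amplification(4 * KIB, 4 * KIB))    # 2.0
# Nearly full 512k block (508k valid), host adds the last 4k.
print(write_amplification(508 * KIB, 4 * KIB))  # 128.0
```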

How to automatically update time in cics

I have two questions; the first is the main one.
1. I was able to display the date in a CICS map, but I want it to tick, i.e. the display should be updated every second.
2. I have a COBOL-DB2 program which automatically writes data from the database (DB2) to a file. I want this program to be called on a schedule, i.e. every hour, every two hours, or every day.
Thank you
You can do this, but you will need to modify the traditional pseudo-conversational approach. Instead of returning and waiting for a user event, you can START your transaction after some number of seconds with your current commarea and then quit. If a user event occurs in that time, you cancel the START request; if it doesn't, you refresh the timestamp on the screen and repeat.
It is kind of a pain just to get a timestamp refreshed. It doesn't make much sense to bother with unless you have a really good reason.
The DB2 part is easy. Start your transaction using interval control, the same START AFTER() described above, and you can have it run hourly, every two hours, or whatever.
I don't think that you need to modify your pseudo-conversational approach to achieve what you need. Just issue an EXEC CICS START command with a one second delay (just do this once) for a small program that just issues a Send Map (or TC Write) to the terminal facility. Ideally, reserve a common area on the screen so all transactions can use a common program. At some point, when the updates are no longer required, CANCEL the START request. The way I see it, the timer update transaction will mix in nicely with your user-initiated transaction flow. If a user transaction is active when the start timer pops, the timer update program will just be delayed a little.
While this should work, you need to bear in mind that you might be driving 3,600 transactions per hour for each user. Is this feature really worth all that?
This is not possible in standard CICS using maps. The 3270 protocol does not lend itself to continually updating screens. The majority of automatic updating screens such as consoles and monitoring displays use native VTAM methods, building their own data streams.
It might be possible to do this using unformatted data, but I would not recommend it in CICS. Pseudo-conversational CICS does not have a program in control during screen display, and conversational programming is highly discouraged.
You can't really do this in CICS, which was designed for pseudo-interactive responses at best. It was designed for mainframes where the terminal is sent a whole page or screen at a time. The program reads the screen as received (it has some fields the user can update; if a field was not changed, the terminal does not send its data back). The CICS transaction then processes the changed part of the screen, sends the response back and quits.
This makes for very efficient data entry and inquiry programs. But realize that when the program has finished processing the screen, it has quit: it's gone, it's not even in memory any more, and all its resources have been reclaimed. This allows a company to run a mainframe with 300 terminals and maybe 10 megabytes of real memory, because while the program is waiting for you to respond, it's not using any resources at all. If 200 people are running a data entry program, they are running a re-entrant program: all 200 of them share the same copy of the same program, and the only thing each of them uses is maybe 1K of writable storage for the part that has to read a screen or a file record and do some calculations. So 200 people running the same program simultaneously use one module that takes 20K of memory for the application (the same 20K for every one of them) plus 1K each of actual read/write data.
Think about that for a moment, the first user to start that data entry program uses 20K of memory for the application, plus 1K for the writable data. Each user after that who is being processed on that program uses an additional 1K of memory, that's all. When they're sitting there looking at the terminal, all they might be using is 4 bytes in a table to tell the system there's a terminal connected. No resources are used at all.
To be able to have a screen updated on a regular basis means that something has to keep running, which is not something CICS does very well. CICS is not intended for interactive processing the way a PC is, where your program is actually running live on the machine.
EXEC CICS ASKTIME END-EXEC to update the timestamp.
EXEC CICS SEND MAP DATAONLY END-EXEC to update the screen.
However, using the suggested
EXEC CICS START TRANSID ('name' | namefld)
AFTER SECONDS (time)
END-EXEC.
is actually the better way.

implementing a download manager that supports resuming

I intend on writing a small download manager in C++ that supports resuming (and multiple connections per download).
From the info I gathered so far, when sending the http request I need to add a header field with a key of "Range" and the value "bytes=startoff-endoff". Then the server returns a http response with the data between those offsets.
So roughly what I have in mind is to split the file by the number of allowed connections per file and send one http request per part with the appropriate "Range". So if I have a 4mb file and 4 allowed connections, I'd split the file into 4 and have 4 http requests going, each with the appropriate "Range" field. Implementing the resume feature would involve remembering which offsets are already downloaded and simply not requesting those.
Is this the right way to do this?
What if the web server doesn't support resuming? (my guess is it will ignore the "Range" and just send the entire file)
When sending the http requests, should I specify the entire size of each part in the range? Or should I ask for smaller pieces, say 1024k per request?
When reading the data, should I write it immediately to the file or do some kind of buffering? I guess it could be wasteful to write small chunks.
Should I use a memory mapped file? If I remember correctly, it's recommended for frequent reads rather than writes (I could be wrong). Is it memory-efficient? What if I have several downloads running simultaneously?
If I'm not using a memory mapped file, should I open the file per allowed connection? Or when needing to write to the file simply seek? (if I did use a memory mapped file this would be really easy, since I could simply have several pointers).
Note: I'll probably be using Qt, but this is a general question so I left code out of it.
Regarding the request/response:
For a request with a Range header, you could get three different responses:
206 Partial Content - resuming supported and possible; check Content-Range header for size/range of response
200 OK - byte ranges ("resuming") not supported, whole resource ("file") follows
416 Requested Range Not Satisfiable - incorrect range (past EOF etc.)
Content-Range usually looks like this: Content-Range: bytes 21010-47000/47022, that is bytes start-end/total.
Check the HTTP/1.1 spec (RFC 2616) for details, especially sections 14.5 (Accept-Ranges), 14.16 (Content-Range) and 14.35 (Range).
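As a rough sketch of how a client might act on the three responses and the Content-Range format listed above, here is a small parser. The function names and the returned labels are made up for illustration; only the header format and the status codes come from the answer.

```python
import re

# Matches "bytes start-end/total"; total may be "*" when unknown.
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value):
    """Return (start, end, total) from a Content-Range value.

    total is None when the server reports it as "*".
    """
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError("unrecognised Content-Range: %r" % value)
    start, end = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return start, end, total


def classify_range_response(status):
    """Map the status of a ranged request to what the downloader should do."""
    if status == 206:
        return "partial"        # ranges honoured, append at the offset
    if status == 200:
        return "full"           # ranges ignored, whole file follows
    if status == 416:
        return "unsatisfiable"  # requested range is past EOF or malformed
    return "other"


print(parse_content_range("bytes 21010-47000/47022"))
print(classify_range_response(206))
```

A downloader that receives "full" on a resume attempt has to restart that file from offset zero rather than append.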
I am not an expert on C++; however, I once built a .NET application which needed similar functionality (download scheduling, resume support, prioritizing downloads).
I used the Microsoft BITS (Background Intelligent Transfer Service) component, which was developed in C. Windows Update uses BITS too. I went for this solution because I don't think I am a good enough programmer to write something of this level myself ;-)
Although I am not sure whether you can get the code of BITS, I do think you should have a look at its documentation, which might help you understand how they implemented it, the architecture, interfaces, etc.
Here it is - http://msdn.microsoft.com/en-us/library/aa362708(VS.85).aspx
I can't answer all your questions, but here is my take on two of them.
Chunk size
There are two things you should consider about chunk size:
The smaller they are, the more overhead you get from sending the HTTP requests.
With larger chunks you run the risk of having to re-download more data if one download fails.
I'd recommend you go with smaller chunks of data. You'll have to do some tests to see what size is best for your purpose, though.
In memory vs. files
You should write the data chunks to an in-memory buffer and, when it is full, write it to disk. Buffering everything in memory is risky: if you are going to download large files, it can be troublesome for your users if they run out of RAM. If I remember correctly, IIS stores requests smaller than 256 KB in memory and writes anything larger to disk; you may want to consider a similar approach.
Besides keeping track of the offsets that mark the beginning of your segments and of each segment's length (unless you want to compute the lengths upon resume, which would involve sorting the offset list and calculating the distance between consecutive offsets), you will want to check the Accept-Ranges header of the HTTP response sent by the server to make sure it supports the Range header. The best way to specify the range is "Range: bytes=START_BYTE-END_BYTE". The range you request includes both START_BYTE and END_BYTE, and thus consists of (END_BYTE-START_BYTE)+1 bytes.
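Because both ends of the range are inclusive, off-by-one mistakes are easy to make when splitting a file. Here is a minimal sketch of the split, assuming the total size is already known (for example from a previous response); split_ranges and range_header are illustrative names.

```python
def split_ranges(total_size, parts):
    """Split [0, total_size) into up to `parts` inclusive (start, end) pairs."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)  # never create empty segments
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for index in range(parts):
        # The first `extra` segments carry one additional byte.
        length = base + (1 if index < extra else 0)
        end = start + length - 1    # inclusive end, as in the Range header
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start, end):
    """Format the value of a Range request header for an inclusive span."""
    return "bytes=%d-%d" % (start, end)


for first, last in split_ranges(4 * 1024 * 1024, 4):
    print(range_header(first, last), "->", last - first + 1, "bytes")
```

Each segment's length is (end - start) + 1, matching the formula above, and the lengths always add up to the total size.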
Requesting micro chunks is something I'd advise against, as you might be blacklisted by a firewall rule meant to block HTTP floods. In general, I'd suggest you don't make chunks smaller than 1MB and don't make more than 10 chunks.
Depending on what control you plan to have on your download, if you've got socket-level control you can consider writing only once every 32K at least, or writing data asynchronously.
I couldn't comment on the MMF idea, but if the downloaded file is large that's not going to be a good idea as you'll eat up a lot of RAM and eventually even cause the system to swap, which is not efficient.
About handling the chunks, you could just create several files, one per segment, and optionally preallocate the disk space by filling each file with as many \x00 bytes as the size of the chunk (preallocating might save you some time while you write during the download, but will make starting the download slower). Then, once all segments are complete, write all of the chunks sequentially into the final file.
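The final step of the one-file-per-segment approach could look roughly like this. The function name and the file names in the demo are illustrative, and the sketch assumes the segment paths are passed in file order and are all complete.

```python
import os
import shutil
import tempfile


def merge_segments(segment_paths, final_path):
    """Concatenate segment files, in the given order, into final_path.

    Returns the size of the merged file in bytes.
    """
    with open(final_path, "wb") as out:
        for path in segment_paths:
            with open(path, "rb") as segment:
                # Streams in fixed-size blocks, so large segments are
                # never loaded into memory as a whole.
                shutil.copyfileobj(segment, out)
    return os.path.getsize(final_path)


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    names = []
    for index, payload in enumerate([b"first-", b"second-", b"third"]):
        name = os.path.join(folder, "segment%d.part" % index)
        with open(name, "wb") as fh:
            fh.write(payload)
        names.append(name)
    print(merge_segments(names, os.path.join(folder, "download.bin")))
```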
One thing you should beware of is that several servers have a max. concurrent connections limit, and you don't get to know it in advance, so you should be prepared to handle http errors/timeouts and to change the size of the chunks or to create a queue of the chunks in case you created more chunks than max. connections.
Not really an answer to the original questions, but another thing worth mentioning is that a resumable downloader should also check the last modified date on a resource before trying to grab the next chunk of something that may have changed.
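One way to act on this, assuming the server sent a Last-Modified header with the first response, is the standard If-Range request header: the server replies 206 with the requested range if the resource is unchanged, and 200 with the whole resource otherwise. The helper name below is made up for illustration.

```python
def build_resume_headers(offset, last_modified=None):
    """Build request headers for resuming a download at `offset`.

    last_modified is the Last-Modified value remembered from the earlier
    response; when it is missing, only the Range header is sent.
    """
    if offset < 0:
        raise ValueError("offset must not be negative")
    # An open-ended range asks for everything from offset to the end.
    headers = {"Range": "bytes=%d-" % offset}
    if last_modified:
        headers["If-Range"] = last_modified
    return headers


print(build_resume_headers(1000, "Wed, 21 Oct 2015 07:28:00 GMT"))
```

If the reply to such a request is 200 rather than 206, the already downloaded part is stale and should be discarded.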
It seems to me you would want to limit the size per download chunk. Large chunks could force you to repeat the download of data if the connection is aborted close to the end of the data part. This is especially an issue with slower connections.
For pause/resume support, look at this simple example:
Simple download manager in Qt with pause/resume support

Resources