Will high-frequency calling file write function damage the hard disk? [closed] - qt

I want to write data received from the serial port to the hard disk in real time.
So is it appropriate to call the file.write() function at a high frequency (e.g. 100 Hz or higher)?
Will it damage the hard disk, reduce my software's performance, or cause any other problem?
Can you recommend a good method to save data in real time if my idea is bad?
Below is my function to write data to a file:
int32_t MainWindow::appendDataToFile(const QString &path, const QByteArray &buff)
{
    if (path.isEmpty())
        return -1;

    QFile file(path);
    if (file.open(QFile::WriteOnly | QFile::Append)) {
        // Write the raw bytes directly. Routing a QByteArray through a
        // QTextStream risks codec conversion, and the stream's buffered
        // data can be lost if the file is closed before the stream flushes.
        file.write(buff);
        file.close();
        return 0;
    }
    return -1;
}

So is it appropriate to call the file.write() function at a high frequency (e.g. 100 Hz or higher)?
Yes, you can call the function frequently, because the actual write to disk does not happen on every call.
But calling file.open() and file.close() frequently is definitely a bad idea. Among other things, file.close() will in most cases force a physical write to disk every time.
Will it damage the hard disk or reduce my software performance or any other problem?
Calling file.write() frequently will not itself damage the hard disk, because the data is buffered automatically. But note that each hard drive is rated for a certain number of write cycles, and different drive models have different hardware cache sizes. For regularly recording large amounts of data there are special server-grade models of hard drives.
As for other problems: the writing may freeze your GUI if you call it in the main thread, so you will probably want to move the writing code to a separate thread. Read Qt Threading Basics.
Can you recommend a good method to save data in real time if my idea is bad?
We don't know how much data and how long you actually want to write.
But in general:
Use a database, for example the Qt SQL module. Databases expose controls for their disk-write policy, for example PRAGMA schema.synchronous in SQLite.
You can use file.write(), but call file.flush() periodically; the flush is what actually pushes the data to the file. See https://doc.qt.io/qt-5/qfiledevice.html#flush .
Generally speaking, you can write:
for (int i = 0; i < 10000; i++)
{
    // write on every call
    file.write(data);
    if (i % 100 == 0)
    {
        // flush every 100th call only
        file.flush();
    }
}
If you are worried about hard-drive wear when recording large amounts of data, you should use tools specially designed for this, such as databases.
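For the SQLite suggestion above, the durability/throughput trade-off is configured with pragmas (values from the SQLite documentation; pick according to how much data loss a power failure may cost you):

```sql
-- Relax the fsync policy: OFF is fastest but unsafe on power loss,
-- NORMAL is a common compromise, FULL is the durable default.
PRAGMA synchronous = NORMAL;
-- Write-ahead logging also helps with frequent small writes.
PRAGMA journal_mode = WAL;
```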

Related

Parallel I/O read files from disk in R

I happen to have a large number of files to read and process in R (~20000 files, ~40 GB total).
I was thinking about parallelizing the reads, yet one philosophical question comes to mind regarding parallelization. Perhaps the question is just wrong and my wording is not correct, since I am no expert on the subject, so please correct me where I am wrong: even with parallelization, the disk's read head still needs to access the files sequentially (there is only one read head traversing the disk). We are parallelizing the CPU work, but would the mechanical reading at some point become a hindrance to the CPU parallelism? Would separating the files into clusters help the reads, since we would then be trying to parallelize the physical read as well?

is this good to have pointers in programming languages such as golang,C or C++? [closed]

Most modern programming languages, such as Java, do not expose pointers.
But in Go, Google introduced pointers again.
So I just want to understand how pointers affect a programming language.
Is there any kind of security threat because of pointers?
If it is because of security, then why do we have the world's most secure systems on Linux and UNIX (both built in C)?
Technically, all languages use pointers. When you create an instance of an object in Java or C# or Javascript and pass it to a method you are actually passing a pointer to the piece of memory that contains that object by which you will manipulate the data on that piece of memory. Imagine a language where you could not pass by reference. You wouldn't be able to do much, now would you? Whenever you pass by reference in any language, you're - in the most basic of terms - using glorified pointers.
What you probably mean, however, is "why expose pointers to people; they are so much more complicated", or something of that sort... And the answer is probably speed and tradition. Some of the fastest languages we have are C and C++... They've been around for a relatively long time. They are tried and true, and people know how to use them. Most software is based off of them in one way or another. Most operating systems are written in C or some variation thereof. Even other programming languages.
As for your Go example, we have already had a question on that.
In C/C++, pointers support arithmetic. Example:
void f(int* a) {
    a++;   // move the pointer itself to the next int
}
Such direct manipulation is dangerous.
Go pointers, by contrast, do not support arithmetic.
So they share the same name and the same meaning, "pointer", but they differ in how they can be used.
The 'modern' comparison of Java and C# to C++ is the worst thing a programmer can make. Java and C# are managed languages, which means the memory is not managed by the programmer at all (that being the main use of pointers). C++ is an unmanaged language, and that is why it is so much faster than any managed language. Almost every modern PC game you will ever see is made using C++ because it runs faster than any managed language.
Pointers make call-by-reference easier, but are more vulnerable to abuse, because through pointers we can directly access memory locations, which can be a security concern.
Those problems can be coded around defensively, but that requires users to be knowledgeable and diligent.

How easy is it to fake asynchronicity? [closed]

Clearly I don't understand the big deal about "asynchronous" environments (such as NodeJS) versus "synchronous" ones.
Let's say you're trapped in a synchronous environment. Can't your main loop just say:
while(1) {
    events << check_for_stuff_from_the_outside_world();
    for e in events {e.process()}
}
What's wrong with doing that, how is that not an asynchronous environment, how are asynchronous environments different?
Yes, this is more or less what Node.js does, except that instead of check_for_stuff_from_the_outside_world(), it should really be check_for_stuff_from_the_outside_world_plus_follow_on_stuff_from_previous_events(); and all of your events must also be written in such a way that, instead of completing their processing, they simply do a chunk of their work and then call register_stuff_for_follow_up(follow_on_event). In other words, you actually have to write all of your code to interact with this event framework; it can't be done "transparently", with only the main loop having to worry about it.
That's a big part of why Node.js is JavaScript; most languages have pre-existing standard libraries (for I/O and so on) that aren't built on top of asynchronous frameworks. JavaScript is relatively unusual in expecting each hosting environment to supply a library that's appropriate for its own purposes (e.g., the "standard library" of browser JS might have almost nothing in common with the "standard library" of a command-line JS environment such as SpiderMonkey), which gave Node.js the flexibility to design libraries that worked together with its event loop.
Take a look at the example on the Wikipedia page:
https://en.wikipedia.org/wiki/Nodejs#Examples
Notice how the code is really focused on the functionality of the server - what it should do. Node.js basically says, "give me a function for what you want to do when stuff arrives from the network, and we'll call it when stuff arrives from the network", so you're relieved of having to write all the code to deal with managing network connections, etc.
If you've ever written network code by hand, you know that you end up writing the same stuff over and over again, but it's also non-trivial code (in both size and complexity) if you're trying to make it professional quality, robust, highly performant, and scalable... (This is the hidden complexity of check_for_stuff_from_the_outside_world() that everyone keeps referring to.) So Node.js takes the responsibility for doing all of that for you (including handling the HTTP protocol, if you're using HTTP) and you only need to write your server logic.
So it's not that asynchronous is better, per se. It just happens to be the natural model to fit the functionality they're providing.
You'll see the asynchronous model come up in a lot of other places too: event-based programming (which is used in a lot of GUI stuff), RPC servers (e.g., Thrift), REST servers, just to name a few... and of course, asynchronous I/O. ;)

Double paging definition

This is not a programming question but more of an operating-system question.
Right now I'm trying to learn what exactly double paging means.
I see two different terms, double paging on disk and double paging in memory.
Apparently this problem arises when we introduce a buffer cache to store disk blocks when doing file I/O.
But I'm not really sure what exactly this term means. If anybody could clarify, it would be very helpful.
It's a problem that occurs when you have a system that is running in a very high-memory utilization state where much of the physical memory is owned by a critical OS resource (like the kernel) and therefore, can't be swapped out by the usual means. It's a fairly common problem to have to dodge in virtualizing OS instances. There's a brief blurb on it here:
http://www.usenix.org/events/osdi02/tech/waldspurger/waldspurger_html/node5.html
What is the specific context of your question?

What are some ways to optimize your use of ASP.NET caching? [closed]

I have been doing some reading on this subject, but I'm curious to see what the best ways are to optimize your use of the ASP.NET cache and what some of the tips are in regards to how to determine what should and should not go in the cache. Also, are there any rules of thumb for determining how long something should say in the cache?
Some rules of thumb
Think in terms of the cache-miss-to-request ratio each time you contemplate using the cache. If cache requests for the item will miss most of the time, then the benefits may not outweigh the cost of maintaining that cache item.
Contemplate the query expense vs cache retrieval expense (e.g. for simple reads, SQL Server is often faster than distributed cache due to serialization costs)
Some tricks
gzip strings before sticking them in cache. Effectively expands the cache and reduces network traffic in a distributed cache situation
If you're worried about how long to cache aggregates (e.g. counts), consider having non-expiring (or long-lived) cached aggregates and proactively updating those when changing the underlying data. This is a controversial technique and you should really consider your request/invalidation ratio before proceeding, but in some cases the benefits can be worth it (e.g. SO rep for each user might be a good candidate depending on implementation details; the number of unanswered SO questions would probably be a poor candidate).
Don't implement caching yet.
Put it off until you've exhausted all the Indexing, query tuning, page simplification, and other more pedestrian means of boosting performance. If you flip caching on before it's the last resort, you're going to have a much harder time figuring out where the performance bottlenecks really live.
And, of course, if you have the backend tuned right when you finally do turn on caching, it will work a lot better for a lot longer than it would if you did it today.
The best quote I've heard about performance tuning and caching is that it's an art, not a science. Sorry, I can't remember who said it, but the point is that there are so many factors affecting your app's performance that you need to evaluate each situation case by case and make considered tweaks until you reach the desired outcome.
I realise I'm not giving any specifics here, but I don't really think you can.
I will give one previous example though. I worked on an app that made a lot of calls to web services to build up a client profile, e.g.:
GET client
GET client quotes
GET client quote
Each object returned by the web service contributed to a higher-level object that was then used to build the resulting page. At first we gathered all the objects into the master object and cached that. However, when things were not as quick as we would like, we realised it would make more sense to cache each returned object individually; that way it could be reused on the next page the client sees, e.g.:
[Cache] client
[Cache] client quotes
[Cache] client quote
GET client quote upgrades
Unfortunately there are no pre-established rules... but to give you some common-sense guidance, I would say that you can easily cache:
Application Parameters (list of countries, phone codes, etc...)
Any other application non-volatile data (list of roles even if configurable)
Business data that is often read and does not change much (or not a big deal if it is not 100% accurate)
What you should not cache:
Volatile data that change frequently (usually the business data)
As for the cache duration, I tend to use different durations depending on the type of data and its size. Application Parameters can be cached for several hours or even days.
For some business data, you may want to have smaller cache duration (minutes to 1h)
One last thing is always to challenge the amount of data you manipulate. Remember that the end-user won't read thousands of records at the same time.
Hope this will give you some guidance.
It's very hard to generalize this sort of thing. The only hard-and-fast rule to follow is not to waste time optimizing something unless you know it needs to be done. Then the proper course of action is going to be very much dependent on the nitty gritty details of your application.
That said... I'll almost always cache global application parameters in some easy-to-use object. This is certainly more of a programming convenience than an optimization.
The one time I've written specific data caching code was for an app that interfaced with a very slow accounting database, and then it was read-only for data that didn't change very often. All writes went to the DB. With SQL Server, I've never run into a situation where the built-in ASP.NET-to-SQL Server interface was the slow part of the equation.
