I'm seeing that the individual slice time information from the Private_0019_1029 field of the DICOM header sometimes has negative values and sometimes only positive values.
I assumed that these times are with respect to the Volume Acquisition time recorded in the header.
Going by that assumption, it would mean that the Acquisition time varies. But upon checking the difference between successive volume acquisition times, I see that it's equal to TR.
So I'm at a loss about what's happening.
I'm trying to look at the raw fMRI data without slice time correction; hence it's necessary to have the individual slice times.
Does the moco series do time shifting in addition to motion correction? (I don't believe it used to, but your experience may show otherwise).
The answer to that indicates how their slice timing is measured. Try the computations with both the raw and the moco series and see if the times line up. That may give you your answer.
When dealing with private tags, you should really include the private vendor ID, in your case the value of tag (0019,0010).
You may also want to have a look at the output of:
gdcmdump --csa input.dcm
This will dump the SIEMENS CSA header directly from the DICOM attribute.
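If you'd rather inspect those attributes programmatically, here is a minimal sketch using pydicom (the file name is a placeholder, and the comments describe what Siemens typically stores in these tags, so verify against your own data):

import pydicom

ds = pydicom.dcmread("input.dcm")

# Private creator for group 0019: identifies which vendor dictionary applies
private_creator = ds[0x0019, 0x0010].value

# On Siemens mosaic series this tag usually holds the per-slice acquisition
# times in milliseconds (what they are measured relative to is exactly the
# question here, so compare the raw and moco series)
slice_times = ds[0x0019, 0x1029].value

print(private_creator)
print(list(slice_times))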
I got a power consumption sensor (kWh) sending data to my TSI Gen2 environment, and it is malfunctioning in a way that it loses its accumulated measurement value when it is shut down. I need to create a new aggregate/variable that would "stack" the measurements, never letting the value drop to zero, but always adding to the last greatest value.
I thought about creating a dataset with the values of the differences from right to left over a fixed timespan, if positive, and then I could create a SUM aggregation over the bucket period on top of it. I am clueless about how to do such a thing based on the poor official documentation provided by Microsoft. Any ideas?
Here are a couple of pictures illustrating my problem and what I am trying to accomplish:
You probably need to add something in the middle (before the IoT Hub/Event Hub) to save the last state of the sensor, and do the appropriate sum if it detects the device was rebooted.
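I don't know TSI well enough to say whether this can be expressed purely in its aggregate language, but the logic that middle component needs is tiny. A sketch in Python, assuming readings arrive in order (all names illustrative):

def accumulate(readings):
    """Turn a resettable kWh counter into a monotonically increasing total.

    readings: iterable of raw counter values, in arrival order.
    Yields the accumulated value, carrying the last peak across resets.
    """
    offset = 0.0  # total accumulated before the most recent reset
    last = None   # previous raw reading
    for value in readings:
        if last is not None and value < last:
            # counter dropped -> device rebooted; bank what we had so far
            offset += last
        last = value
        yield offset + value

# Example: the counter resets to 0 after reaching 7.0
print(list(accumulate([2.0, 5.0, 7.0, 0.5, 1.5])))
# -> [2.0, 5.0, 7.0, 7.5, 8.5]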
I am working on my bachelor's final project, which is a comparison between Apache Spark Streaming and Apache Flink (streaming only), and I have just arrived at "Physical partitioning" in Flink's documentation. The problem is that the documentation doesn't explain well how these two transformations work. Directly from the documentation:
shuffle(): Partitions elements randomly according to a uniform distribution.
rebalance(): Partitions elements round-robin, creating equal load per partition. Useful for performance optimisation in the presence of data skew.
Source: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html#physical-partitioning
Both are done automatically, so what I understand is that they both redistribute the data equally (shuffle() > uniform distribution & rebalance() > round-robin) and randomly. I then deduce that rebalance() distributes the data in a better way ("equal load per partition"), so the tasks have to process the same amount of data, while shuffle() may create bigger and smaller partitions. In which cases, then, might you prefer to use shuffle() rather than rebalance()?
The only thing that comes to my mind is that rebalance() probably requires some processing time, so in some cases the time spent rebalancing might exceed the time it saves in the subsequent transformations.
I have been looking into this and nobody has discussed it, except in one Flink mailing list thread, and even there they don't explain how shuffle() works.
Thanks to Sneftel, who helped me improve my question by asking things that made me rethink what I wanted to ask; and to Till, who answered my question quite well. :D
As the documentation states, shuffle will distribute the data randomly, whereas rebalance will distribute the data in a round-robin fashion. The latter is more efficient, since you don't have to compute a random number. Moreover, depending on the randomness, you might end up with a somewhat non-uniform distribution.
On the other hand, rebalance will always start sending the first element to the first channel. Thus, if you have only a few elements (fewer elements than subtasks), then only some of the subtasks will receive elements, because you always send the first element to the first subtask. In the streaming case this should not matter in practice, because you usually have an unbounded input stream.
The actual reason why both methods exist is historical: shuffle was introduced first, and rebalance was introduced later to make the batch and streaming APIs more similar.
This statement by Flink is misleading:
Useful for performance optimisation in the presence of data skew.
Since it's used to describe rebalance but not shuffle, it suggests that this is the distinguishing factor. My understanding was that if some items are slow to process and some are fast, the partitioner would send the next item to whichever channel is free. But this is not the case; compare the code for rebalance and shuffle below. rebalance just advances to the next channel regardless of how busy it is.
// rebalance
nextChannelToSendTo = (nextChannelToSendTo + 1) % numberOfChannels;
// shuffle
nextChannelToSendTo = random.nextInt(numberOfChannels);
The statement can also be understood differently: the "load" doesn't mean actual processing time, just the number of items. If your original partitioning has skew (vastly different numbers of items per partition), the operation will assign items to partitions uniformly. However, in this case that applies to both operations.
My conclusion: shuffle and rebalance do the same thing, but rebalance does it slightly more efficiently. However, the difference is so small that you are unlikely to notice it: java.util.Random can generate about 70m random numbers per second in a single thread on my machine.
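To make the "exactly equal vs. approximately equal" point concrete, here is a quick simulation of the two strategies in Python (illustrative only, not Flink code):

import random
from collections import Counter

ITEMS, CHANNELS = 1_000_000, 8

# rebalance-style: strict round-robin gives perfectly equal counts
round_robin = Counter(i % CHANNELS for i in range(ITEMS))

# shuffle-style: uniform random choice gives only approximately equal counts
rng = random.Random(42)
uniform_random = Counter(rng.randrange(CHANNELS) for _ in range(ITEMS))

print(sorted(round_robin.values()))     # all exactly 125000
print(sorted(uniform_random.values()))  # spread of a few hundred around 125000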
I'm doing coursework for a distributed systems module, and within it I need to apply a variable clock incrementor; my tutor has gone over both Lamport and vector clocks, but said "I can't hint at that" when I asked him about applying a variable length/size per clock.
I wish I knew what to do,
Andy
I suppose you mean vector clocks of variable size?
This is technically not possible, due to the way vector clocks are defined and used: you need to know, right at the beginning, about all the nodes that will communicate together and use the vector clock. This means you wouldn't be allowed to expand your service; also, if you tear down a node, never to start it again, its entry would still be sent around and waste resources.
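To see why the membership must be fixed, consider a minimal sketch (Python, illustrative): every operation iterates over the full, pre-agreed node list, so a node you didn't know about at creation time simply has no slot.

NODES = ["a", "b", "c"]  # must be agreed on by everyone, up front

def new_clock():
    return {n: 0 for n in NODES}

def tick(clock, node):
    # local event at `node`
    clock[node] += 1

def receive(local, received, node):
    # on message receipt: element-wise max, then count the receive event
    for n in NODES:
        local[n] = max(local[n], received[n])
    tick(local, node)

ca, cb = new_clock(), new_clock()
tick(ca, "a")          # a's clock: {'a': 1, 'b': 0, 'c': 0}
receive(cb, ca, "b")   # b's clock: {'a': 1, 'b': 1, 'c': 0}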
One of my professors in distributed systems mentioned that Amazon is (or was) using "dynamic" vector clocks for some services, with an algorithm that automatically removed "old" entries from the vector clocks. They supposedly concluded that this had worked fine so far. However, I never saw a paper about this.
If I ask for this data:
https://graphite.it.daliaresearch.com/render?from=-2hours&until=now&target=my.key&format=json
I get, among other datapoints, this one:
[
2867588,
1398790800
]
If I ask for this data:
https://graphite.it.daliaresearch.com/render?from=-10hours&until=now&target=my.key&format=json
The datapoint looks like this:
[
null,
1398790800
]
Why is this datapoint being nullified when I choose a wider time range?
Update
I'm seeing that for a chosen date range smaller than 7 hours, the resolution of the datapoints is one every 10 seconds, and when the chosen date range is 7 hours or bigger, the resolution goes to one datapoint every 1 minute, and it continues in this direction as the chosen date range gets bigger: one datapoint every 10 minutes, and so on.
So when the resolution of the datapoints is one every 10 seconds, the data is there; when the resolution is one every 1 minute or more, the datapoint has no value :/
I'm sending a datapoint every 1 hour, so maybe there is a conflict between the resolution configuration and me sending only one datapoint per hour.
There are several things happening here, but basically the problem is that you have misconfigured graphite (or at least, configured it in a way that makes it do things that you aren't expecting!)
Specifically, you should set xFilesFactor = 0.0 in your storage-aggregation.conf file. Since you are new at this, you probably just want this (mine is in /opt/graphite/conf/storage-aggregation.conf):
[default]
pattern = .*
xFilesFactor = 0.0
aggregationMethod = average
The graphite docs describe xFilesFactor like this:
xFilesFactor should be a floating point number between 0 and 1, and specifies what fraction of the previous retention level’s slots must have non-null values in order to aggregate to a non-null value. The default is 0.5.
But wait! This won't change existing statistics! These aggregation settings are set once per metric, at the time the metric is created. Since you are new at this, the easy way out is to just go to your whisper directory, delete the prior data, and start over:
cd /opt/graphite/storage/whisper/my/
rm key.wsp
Your root whisper directory may be different depending on platform, etc. After removing the data files, graphite should recreate them automatically upon the next metric write, and they should get your updated settings (don't forget to restart carbon-cache after changing your storage-aggregation settings).
Alternatively, if you need to keep your old data, you will need to run whisper-resize.py against your whisper (.wsp) data files with --xFilesFactor=0.0 and, most likely, all of your retention settings from storage-schemas.conf (also viewable with whisper-info.py).
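For example (the retentions below are only a guess based on the resolutions you described; substitute the real ones reported by whisper-info.py):

whisper-resize.py /opt/graphite/storage/whisper/my/key.wsp 10s:7h 1m:2d 10m:1y --xFilesFactor=0.0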
Finally, I should add that the reason you get non-null data in your first query, but null data in your second, is that graphite will try to pick the best available retention period from which to serve your request, based on the time window you requested. For the smaller window, graphite decides that it can serve your request using the highest-precision data (i.e., non-aggregated), and so you are seeing your raw metrics. For the longer time window, graphite finds that the high-precision, non-aggregated data is not available for the entire window -- these periods are configured in storage-schemas.conf -- so it skips to the next highest-precision data set available (i.e. the first aggregation tier) and returns only aggregated data. Because your aggregation config is writing null data, you are therefore seeing null metrics!

So fix the aggregation, and you should fix the null data problem. But remember that graphite never combines aggregation tiers in a single request/response, so anytime you see differences between results from the same query when all you are changing is the from / to params, the problem is pretty much always due to aggregation configs.
I'm not quite sure about your specific situation, but I think I can give you some general pointers.
First off, you are right about the changing resolution depending on the time range. This is configured in storage-schemas.conf and is done to save space when storing data over large periods of time. An example could be: 15s:7d,1m:21d,15m:5y, meaning 15 seconds resolution for 7 days, then 1 minute resolution for 21 days, then 15min for 5 years.
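In storage-schemas.conf terms, that example would look something like this (the stanza name and pattern are illustrative):

[default]
pattern = .*
retentions = 15s:7d,1m:21d,15m:5y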
Then there is the way Graphite does the actual aggregation from one resolution to the other. This is configured in storage-aggregation.conf. The default settings are xFilesFactor = 0.5 and aggregationMethod = average. The xFilesFactor setting says that a minimum of 50% of the slots in the previous retention level must have values for the next retention level to contain an aggregate. The aggregationMethod says that all the values of the slots in the previous retention level will be combined by averaging. My guess is that your stat doesn't have enough datapoints to fulfill the 50% requirement, resulting in a null value: if you send one datapoint per hour, then in any given minute at most 1 of the 6 ten-second slots is filled (about 17%), well below the 50% threshold.
For more information, check out the docs, they are pretty complete: http://graphite.readthedocs.org/en/latest/config-carbon.html
I have an international application that handles lengths and weights of people, and stores these in a database. I was wondering how to deal with this in case users can switch between using centimeters/inches in the application.
I was thinking of always using centimeters in the database, and converting to inches if the user chose to use inches. But of course, if the user enters a length in inches and it is converted to and stored as centimeters, the value may change slightly because of rounding errors.
How would you handle this scenario?
There is much to consider in your question beyond the information that is available. Before deciding how to store and convert the information, you must know what your acceptable error is. For instance if you are calculating trajectory to intercept an incoming missile with another missile, extremely minute precision is necessary to be successful. If this is a medical application and being used to precisely control medication formulation it could be more important to be precise than if you are simply calculating BMI.
In short, pick a standard, whether metric or other, and stick with that for your storage type. Depending on the precision required, choose the smallest unit of the measurement system that will give you the accuracy you need. All display in units of a different measurement system would then be converted from this base measurement.
And try not to over-engineer the solution. If it will not conceivably be important to measure out to 52 decimal places you are wasting effort and injecting unnecessary complication accounting for that scenario.
Personally, I would use one of two methods:
1. Always store the value in the same unit of measure.
2. Store the unit of measure in a separate field, so that you know whether the unit is cm or inches.
I prefer the first method since it makes it easier to process.
So you would have rounding errors if you convert from inches to centimeters, and also if you convert from centimeters to inches. The problem would be the same no matter what you store in the DB.
You could store the values in the database in millimeters rather than centimeters. The smaller the unit is, the more exact the stored value will be, even in the case of conversion.
If the same user should be able to switch units in the front end, you should definitely store one field representing the value in the one unit you decided on, because the rounding errors will happen anyway.
If you have a group of users dealing only with inches and another dealing only with cm, and each of these groups has their own database or at least "own values", then opt for two fields, value/unit (e.g. same software, different customer installations in different countries).
I'd store non-float values representing, for example, micrometers (with an unsigned 32-bit integer you can represent everything from about 4.2 km down to 0.001 mm).
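A small sketch of that idea (Python; the unit and helper names are illustrative): round exactly once, at input time, and treat the stored integer as the source of truth.

# Store lengths as integer micrometers, convert only for display.
UM_PER_CM = 10_000
UM_PER_INCH = 25_400  # exact by definition (1 in = 2.54 cm)

def to_storage(value, unit):
    # round once, at input time; the stored integer is the source of truth
    if unit == "cm":
        return round(value * UM_PER_CM)
    if unit == "in":
        return round(value * UM_PER_INCH)
    raise ValueError(unit)

def for_display(micrometers, unit):
    factor = UM_PER_CM if unit == "cm" else UM_PER_INCH
    return micrometers / factor

stored = to_storage(5.9, "in")    # 149860 micrometers
print(for_display(stored, "in"))  # 5.9
print(for_display(stored, "cm"))  # 14.986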
Not sure why you would need a database unless you were storing your conversion rates.
There would be no way to detect metric or imperial, because they are just numbers.
Your rounding errors will happen in accordance with the degree of accuracy you wish to display...
Depending on what you're going to be doing with those values (whether you need to do much aggregation at the DB layer, etc.), the best way to ensure there is no cumulative rounding error is to store the value in its original unit of measure, with that unit of measure (id) in a separate column, and have a separate conversion table that you use for on-the-fly calculations when comparing, aggregating, etc.
This will not be super-efficient or convenient, however: you will always have to join to a conversion table before doing any work with the values stored.
This task can be done very simply. You only need the coefficient for converting inches to centimeters, and you don't need to save the calculations and results in the database: just multiply or divide the number by the ratio to get the result. So if you have centimeters, you apply the ratio and get the result. You can see how it works in the example I found in 2 minutes: http://inchpro.com/metric-system/convert-inches-to-centimeters
I think you know that storing all the values in a database occupies a lot of space.