Protecting Protected Health Information from DICOM images

I am new to PACS and I would like to get some clarification about web-based PACS systems. Almost all the articles on the internet talk about the Protected Health Information (PHI) associated with a DICOM image and recommend stripping this information off before sharing the image with someone else. I would like to understand how this can be done.
I am aware that if we convert the DICOM image into a PNG or JPEG image, the DICOM header information will be removed. But I wonder: what if we need the original DICOM image at some point? How do we re-attach the PHI to a PNG image and get it back as a DICOM file?
I have an Apache web server and a MySQL database, each installed on a separate Ubuntu server. I want to know how I can securely share patient scan/X-ray images via the internet.
I would really appreciate it if someone could explain this to me in detail. Thank you for your time and consideration.

PHI stands for Protected Health Information (the HIPAA term), though you will also see it expanded as Personal Health Information.
A scan stored in the DICOM format contains many tags, some of which could identify the subject. There are some anonymizer programs you could incorporate in your setup.
Two programs I have tested are:
the CTP program by the RSNA project. (free and open source java)
Neologica's DICOM Anonymizer (free-to-use trial)
Both have an understandable interface and easy configuration, wherein you can decide which tag content to remove, or what to replace it with.
You should really read up on the tags and possibilities, but to give you an idea:
CTP anonymizer
Neologica's anonymizer
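To make the tag-stripping concrete, here is a minimal sketch using the pydicom library (my choice of tool, not something the programs above require; both do the same thing through their GUIs). The keyword list is illustrative only, not a complete PHI profile; see DICOM PS3.15 E.1 for the full list.

```python
# Hypothetical sketch: blank a few identifying tags with pydicom.
# The keyword list is illustrative, NOT a complete PHI profile.
import pydicom

def anonymize(in_path, out_path):
    ds = pydicom.dcmread(in_path)
    for keyword in ("PatientName", "PatientID", "PatientBirthDate",
                    "PatientAddress", "ReferringPhysicianName"):
        if keyword in ds:
            ds.data_element(keyword).value = ""
    ds.remove_private_tags()  # private tags can also carry PHI
    ds.save_as(out_path)

anonymize("scan.dcm", "scan_anon.dcm")  # file names are placeholders
```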

You need to make a distinction between:
An anonymization process
A de-identification process
In the case of anonymization everything is lost for good, as you mentioned. In the case of de-identification everything is hidden. This is described specifically within the DICOM Standard, E.1 Application Level Confidentiality Profiles.
While there are plenty of non-standard DICOM anonymizers out there (use dd or hexedit in the worst case), there are very few de-identifiers. gdcmanon implements a previous DICOM release (before Supp 142 came out) as a command-line tool.
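To illustrate the difference in code, here is a simplified sketch, again assuming pydicom. Note this is only the idea: the standard's actual mechanism (Supp 142) encrypts the original attributes into the dataset itself rather than storing them out-of-band.

```python
# Simplified de-identification sketch: original values are kept in a
# separate "vault" file, keyed by SOPInstanceUID, so they can be restored.
# Illustrative only -- the DICOM standard instead encrypts the originals
# into the dataset (Encrypted Attributes Sequence).
import json
import pydicom

def deidentify(in_path, out_path, vault_path):
    ds = pydicom.dcmread(in_path)
    replacements = {"PatientName": "DEIDENTIFIED",
                    "PatientID": "DEIDENTIFIED",
                    "PatientBirthDate": ""}  # a date must stay valid or blank
    vault = {}
    for keyword, new_value in replacements.items():
        if keyword in ds:
            elem = ds.data_element(keyword)
            vault[keyword] = str(elem.value)  # remember the original
            elem.value = new_value
    with open(vault_path, "w") as f:
        json.dump({str(ds.SOPInstanceUID): vault}, f)
    ds.save_as(out_path)
```

With an anonymizer, the vault simply never exists.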
You may also want to read: An Open Source Toolkit for Medical Imaging De-Identification.
And if this is still not enough reading, I suggest you also dive into the world of 'Private Attributes' (!= Public Attributes), with the particular issue regarding PHI explained here.


How to reuse reviewed documents in further translations?

Our instruction manual is in Markdown format. We improve the manual daily or weekly, and then we would like to translate it into other languages. Until now this process has been manual, and we use TRADOS (an assisted-translation tool).
We are looking at how to improve the process. What we want to do is translate the manual using Microsoft Translator and then have a human reviewer do the fine-tuning. The problem is how we can reuse the corrected document in Microsoft Translator for the next translation.
Many times we only change 2 or 3 words in a topic, and then the translator creates a completely new translation, which makes the reviewer's work very tedious if the translation is not accurate.
I know that we can train the model, but I don't think there is a 100% probability that the translator will use the reviewed text. Also, it seems very time-consuming to maintain the dictionary after reviewing the document.
I was wondering if somebody has solved this kind of problem.
We can automate the translation of the document using Microsoft Flow. In Microsoft Flow we can use the SharePoint service to store the file which we need to translate, and then we can create a flow of operations:
Create a folder in SharePoint and specify the site address.
Specify the folder name and the language you need to translate into when a file becomes available in the SharePoint folder.
Give the destination folder. Again, it can be a SharePoint folder.
Note: I can't guarantee that the meaning after translation will be the same as that of the original statement.
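If you would rather script the call than use Flow, here is a rough sketch against the Microsoft Translator Text API v3. The key and region values are placeholders for your own Azure Cognitive Services credentials, and "es" is just an example target language.

```python
# Sketch: translate a string with the Microsoft Translator Text API v3.
# YOUR_KEY / YOUR_REGION are placeholders for Azure credentials.
import requests

def translate(text, to_lang="es"):
    resp = requests.post(
        "https://api.cognitive.microsofttranslator.com/translate",
        params={"api-version": "3.0", "to": to_lang},
        headers={"Ocp-Apim-Subscription-Key": "YOUR_KEY",
                 "Ocp-Apim-Subscription-Region": "YOUR_REGION",
                 "Content-Type": "application/json"},
        json=[{"Text": text}],
    )
    resp.raise_for_status()
    return resp.json()[0]["translations"][0]["text"]

print(translate("The manual is updated weekly."))
```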
You are looking for a translation memory system (TMS). A TMS will store your human edits to documents and reapply them to future translations of documents where a paragraph is repeated. A TMS will also help your human translator find close matches of a new segment with a previously translated one, highlighting the change the human will have to perform.
Most TMSes integrate machine translation systems like Microsoft Translator and others as a suggestion to the human editor, who can then approve or edit the suggestion.
https://en.wikipedia.org/wiki/Translation_memory
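To show the core idea in miniature (a toy sketch using only the standard library; real TM systems do proper segment alignment and weighted scoring):

```python
# Toy translation-memory lookup: store human-approved (source, target)
# pairs and find the closest stored source for a new segment.
import difflib

memory = {
    "Press the power button to start the device.":
        "Pulse el botón de encendido para iniciar el dispositivo.",
}

def tm_lookup(segment, cutoff=0.8):
    best = difflib.get_close_matches(segment, list(memory), n=1, cutoff=cutoff)
    if best:
        score = difflib.SequenceMatcher(None, segment, best[0]).ratio()
        return memory[best[0]], score   # reuse or lightly repair this match
    return None, 0.0                    # no match: fall back to MT

# A two-word edit still finds the stored translation for the human to fix.
print(tm_lookup("Press the power button to restart the device."))
```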
When selecting a TMS, here are some features to consider:
Integration with the content management system (CMS) your business uses for storing, maintaining and publishing documents
Collaboration: Multiple people working on a shared set of documents
Workflow management: Initiate a translation job, track the progress of it, and execute payments to the collaborating humans
Ease of use and translator acceptance of the human-facing CAT component of the TMS
Pretranslate with your favorite machine translation system from inside the TMS
Once you have collected a significant amount of domain-specific human translations, you can train a custom translation system directly from the content of your TMS, tuning future machine translations in the direction of the terminology and style that are exemplified by your human translations.
You said you already use Trados. So why don't you create a translation memory in Trados, and save your corrected work to that translation memory? Then the next time you create your project in Trados, presumably also using the MS Translator plugin that is freely available from the RWS AppStore, you will pre-translate using your translation memory AND MS Translator. Any work you translated before that is in your translation memory will be used in preference to the machine-translation result.
If you have the Professional version of Trados you can also use PerfectMatch. If you do this, then when you receive your updated document with only a few words to change, you match it against the bilingual file from your previous translation. Everything that remains the same receives a Perfect Match status and is optionally locked, making it simple to identify what needs to be changed by the translator.

How to manage the article content in an asp.net web site

I'm planning to create a site for learning about technologies, similar to CodeProject or CodePlex. Can you please suggest different ways to manage a huge number of articles?
Look at a content management system, such as SiteFinity: http://www.sitefinity.com/. There are others, some free. You can find some on codeplex.com.
Check out DotNetNuke CMS too >> http://www.dotnetnuke.com/
And here's a very hot list available of ASP.NET CMS systems:
http://en.wikipedia.org/wiki/List_of_content_management_systems#Microsoft_ASP.NET_2
Different ways to manage articles while building the entire system yourself. Hmm, ok, let me give it a try... here's the short version.
There are several ways you can "store" your articles (content, data, whatever), and the best way to do so is to use a database (SQL Server, MySQL, SQLCE, SQLite, Oracle; the list goes on).
If you're not sold on the idea of a database, you can use any other type of persistent storage that you like, e.g. XML or even flat TXT files.
Since you're using ASP.NET, you now need to write your code-behind, or some other compiled code, to access your stored data. You pull it out of storage and display it on the page/view.
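As a language-agnostic sketch of that storage idea (shown here with Python and SQLite for brevity; in ASP.NET you would do the same from code-behind via ADO.NET or an ORM; table and column names are made up for illustration):

```python
# Sketch: store articles in a database and pull them back out for display.
import sqlite3

conn = sqlite3.connect("articles.db")
conn.execute("""CREATE TABLE IF NOT EXISTS articles (
                    id INTEGER PRIMARY KEY,
                    title TEXT NOT NULL,
                    body TEXT NOT NULL,
                    category TEXT)""")
conn.execute("INSERT INTO articles (title, body, category) VALUES (?, ?, ?)",
             ("Hello ASP.NET", "Article body goes here.", "tutorials"))
conn.commit()

# What a page handler would do before rendering the view.
for title, body in conn.execute(
        "SELECT title, body FROM articles WHERE category = ?", ("tutorials",)):
    print(title, body)
```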
Last but not least, I'd like to give you a suggestion (even though it's not part of your original question). As the other answerers have stated, you should look at a pre-built CMS, if nothing else to see how it's done (not necessarily to use it as-is). My philosophy is quite simple: if you want to be productive in your development, don't bother reinventing the wheel just for the sake of it. If someone else has already built and given away exactly what you need, you should at the very least give it a look and use what you can. It will save you piles of time and heartache.
Your question is not vague enough to be closed, but is vague enough that answering all of the nuances could take several thousand lines.

How to scrape websites such as Hype Machine?

I'm curious about website scraping (i.e. how it's done, etc.); specifically, I'd like to write a script to perform the task for the site Hype Machine.
I'm actually a software engineering undergraduate (4th year); however, we don't really cover any web programming, so my understanding of JavaScript/RESTful APIs/all things web is pretty limited, as we're mainly focused on theory and client-side applications.
Any help or direction is greatly appreciated.
The first thing to look for is whether the site already offers some sort of structured data, or if you need to parse through the HTML yourself. Looks like there is an RSS feed of latest songs. If that's what you're looking for, it would be good to start there.
You can use a scripting language to download the feed and parse it. I use Python, but you could pick a different scripting language if you like. Here are some docs on how you might download a URL in Python and parse XML in Python.
Another thing to be conscious of when you write a program that downloads a site or RSS feed is how often your scraping script runs. If you have it run constantly so that you'll get the new data the second it becomes available, you'll put a lot of load on the site, and there's a good chance they'll block you. Try not to run your script more often than you need to.
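A minimal sketch of that approach with only the standard library (the feed URL is a placeholder, not Hype Machine's real feed address; check the site's terms before scraping):

```python
# Sketch: download an RSS feed and print item titles, politely rate-limited.
import time
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.com/feeds/latest.rss"  # placeholder

def fetch_titles():
    with urllib.request.urlopen(FEED_URL) as resp:
        tree = ET.parse(resp)
    # In RSS 2.0, each entry is a channel/item element with a <title> child.
    return [item.findtext("title") for item in tree.iter("item")]

while True:
    print(fetch_titles())
    time.sleep(15 * 60)  # poll every 15 minutes, not constantly
```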
You may want to check the following books:
"Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL"
http://www.amazon.com/Webbots-Spiders-Screen-Scrapers-Developing/dp/1593271204
"HTTP Programming Recipes for C# Bots"
http://www.amazon.com/HTTP-Programming-Recipes-C-Bots/dp/0977320677
"HTTP Programming Recipes for Java Bots"
http://www.amazon.com/HTTP-Programming-Recipes-Java-Bots/dp/0977320669
I believe that the most important thing you must analyze is which kind of information you want to extract. If you want to crawl entire websites, like Google does, your best option is probably to look at tools like Nutch from apache.org, or the Flaptor solution at http://ww.hounder.org. If you need to extract particular areas of unstructured documents (websites, docs, PDFs), you can probably extend Nutch plugins to fit your particular needs: nutch.apache.org
On the other hand, if you need to extract particular text or clipping areas of a website, where you set rules using the DOM of the page, what you need to check is probably more related to tools like mozenda.com. With those tools you will be able to set up extraction rules in order to scrape particular information from a website. You must take into consideration that any change to a webpage will cause your robot to error out.
Finally, if you are planning to develop a website using information sources, you could purchase information from companies such as spinn3r.com, which sell particular niches of information ready to be consumed. You will be able to save lots of money on infrastructure.
Python has the feedparser module (see feedparser.org) that handles RSS in its various flavours, as well as Atom in its various flavours. No reason to reinvent the wheel.
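For example (a quick sketch; install with pip install feedparser, and the URL is again a placeholder):

```python
# Sketch: feedparser handles RSS and Atom variants behind one interface.
import feedparser

feed = feedparser.parse("https://example.com/feeds/latest.rss")  # placeholder
for entry in feed.entries:
    print(entry.title, entry.link)
```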

Categorized Document Management System

At the company I work for, we have an intranet that provides employees with access to a wide variety of documents. These documents fall into several categories and subcategories, and each of these categories have their own web page. Below is one such page (each of the links shown will link to a similar view for that category):
http://img16.imageshack.us/img16/9800/dmss.jpg
We currently store each document as a file on the web server and hand-code links to these documents whenever we need to add a new document. This is tedious and error-prone, and it also means we lack any sort of security for accessing these documents. I began looking into document management systems (like KnowledgeTree and OpenKM); however, none of these systems seem to provide a categorized view like the one in the preview above.
My question is ... does anyone know of any document management system that allows for the type of flexibility we currently have with hand-coding links to our documents into various webpages (major and minor categories), while also providing security, ease of use, and (less importantly) version control? Or do you think I'd be better off developing such a system from scratch?
If you are trying to categorize the files or folders in the document management system, that's not a difficult task. You only need access to the admin panel to maintain or categorize the folders.
In Laserfiche, you can easily categorize your folders by department, and they can also be subcategorized.
You should look into Alfresco. It's extremely extensible and provides a lot of ways of accessing the repository.
Note: click the "Developers" tab for the community edition.
Well, there are companies that make a living selling document-management software. Anything you can get off the shelf is going to be a huge time saver, and it's going to be better than anything you could reasonably develop by hand.
I've used a few systems:
SharePoint: although I hear some people don't like it, I didn't either ;)
HyperOffice worked really well for my company of around 150 employees and has all the features you describe.
My current company uses Confluence, and I like it :) But it's probably one of those tools whose price tag isn't worth it, especially if you're only using a subset of its features, like doc management.
I haven't used it, but one guy I know raves about Alfresco, a free and open source doc management system. I looked at its website, seems simple enough to use.
We also faced a similar problem; however, version control was higher among our priorities, and we looked into many solutions. We found Globodox extremely easy to install and use, and more importantly, the support team was absolutely fantastic.
Try Mayan EDMS; it's Django-based and open source. Use it as a base and build the custom features you wish on top of it.
Code location: https://gitlab.com/mayan-edms/mayan-edms
Homepage at: http://www.mayan-edms.com
The project is also available via PyPI at: https://pypi.python.org/pypi/mayan-edms/

Is Wiki Content Portable?

I'm thinking of starting a wiki, probably on a low cost LAMP hosting account. I'd like the option of exporting my content later in case I want to run it on IIS/ASP.NET down the line. I know in the weblog world, there's an open standard called BlogML which will let you export your blog content to an XML based format on one site and import it into another. Is there something similar with wikis?
The correct answer is ... "it depends".
It depends on which wiki you're using or planning to use. I've used various ones over the years: MoinMoin was OK, and it uses files rather than a database; Ubuntu seem to like it. MediaWiki everyone knows about, and JAMWiki is a Java clone(ish) of MediaWiki with the aim of being markup-compatible with MediaWiki. Both of the latter use databases, and you can generally connect whichever database you want; JAMWiki is pre-configured to use an internal HSQLDB instance.
I recently converted about 80 pages from a MoinMoin wiki into JAMWiki pages, and this was probably 90% handled by a tiny Perl script I found somewhere (I'll provide a link if I can find it again). The other 10% was unfortunately a by-hand experience (they were of the utmost importance, being recipes for the missus) ;-)
I also recently set up a MediaWiki instance for work, and that took all of about 8 minutes to do. So that'd be my choice.
To answer your question, I don't believe there's such a standard as WikiML, as Till called it.
As strange as it sounds, I've investigated screen-scraping a wiki for a co-worker to help him port it to another wiki engine. It turned out that screen scraping was the easier, quicker and more efficient way to move that particular file-based wiki to another one, or to a CMS.
Given the context in which you wrote the question, I would bite the bullet now, pay the little extra for a Windows-hosted account, and put ScrewTurn Wiki on it. You've got the option of using a file-based or a SQL Server-based back end for it, but because one of your requirements is low cost, I'm guessing you would use the file-based back end now on a cheaper hosted account; you can always upscale to SQL Server later.
I haven't heard of WikiML.
I think your biggest obstacle is going to be converting one wiki markup to another. For example, some wikis use Markdown (which is what Stack Overflow uses), others use another markup syntax (e.g. BBCode), etc. The bottom line is, assuming the contents are stored in a database, it's not impossible to export and parse them to make them "fit" into another system. It might just be a pain in the ass.
And if the contents are not databased, it's gonna be a royal pain in the ass. :D
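To give a feel for that conversion pain, here is a toy sketch mapping a few Markdown constructs to MediaWiki syntax with regexes. It is illustrative only; a real migration wants a proper parser such as pandoc.

```python
# Toy Markdown -> MediaWiki converter. The rules cover only headings,
# bold text and links; real wiki markup has far more edge cases.
import re

RULES = [
    (re.compile(r"^### (.+)$", re.M), r"=== \1 ==="),    # h3
    (re.compile(r"^## (.+)$", re.M),  r"== \1 =="),      # h2
    (re.compile(r"^# (.+)$", re.M),   r"= \1 ="),        # h1
    (re.compile(r"\*\*(.+?)\*\*"),    r"'''\1'''"),      # bold
    (re.compile(r"\[(.+?)\]\((.+?)\)"), r"[\2 \1]"),     # [label](url)
]

def markdown_to_mediawiki(text):
    for pattern, repl in RULES:
        text = pattern.sub(repl, text)
    return text

print(markdown_to_mediawiki("# Title\nSee **docs** at [home](http://example.com)"))
```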
Another solution would be to stay with the same system. I am not sure what the reason is for changing the technology later on. It's not like a growing project suddenly requires IIS/ASP.NET. (It might just be the other way around.) But, for example, if you stick with PHP for a while, you can also run that on IIS.
