Friday 31 August 2018

MODEL IMPROVES PREDICTION OF MORTALITY RISK IN ICU PATIENTS





In intensive care units, where patients arrive with a wide range of health conditions, triaging relies heavily on clinical judgment. ICU staff run numerous physiological tests, such as bloodwork and vital-sign checks, to determine whether patients are at immediate risk of dying if not treated aggressively.
Enter: machine learning. Numerous models have been developed in recent years to help predict patient mortality in the ICU, based on various health factors during a patient's stay. These models, however, have performance drawbacks. One common type, the "global" model, is trained on a single large patient population. Such models may work well on average but perform poorly on certain patient subpopulations. Another type of model analyzes distinct subpopulations, for instance those grouped by similar conditions, patient ages, or hospital departments, but these models often have limited data for training and testing.
In a paper recently presented at the Knowledge Discovery and Data Mining (KDD) conference, MIT researchers describe a machine-learning model that offers the best of both worlds: it trains specifically on patient subpopulations, but also shares data across all subpopulations to improve predictions. In doing so, the model can better predict a patient's risk of mortality during their first two days in the ICU, compared with strictly global and other models.
The model first crunches physiological data from the electronic health records of previously admitted ICU patients, some of whom died during their stay. In doing so, it learns strong predictors of mortality over the first few days, such as low heart rate, high blood pressure, and various lab test results, including high glucose levels and white blood cell counts, and it divides the patients into subpopulations based on their health status. Given a new patient, the model can examine that patient's physiological data from the first 24 hours and, using what it has learned from those patient subpopulations, better estimate the likelihood that the patient will die in the following 48 hours.
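As a rough illustration of the kind of input such a model consumes, the hypothetical Python sketch below builds per-patient feature vectors from hourly vitals and labs; the column names, value ranges, and summary statistics are illustrative assumptions, not the paper's actual feature set or the MIMIC schema.

```python
# Sketch only: a hypothetical layout for first-24-hour physiological features.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# One row per (patient, hour): vitals and lab values over the first 24 hours.
records = pd.DataFrame({
    "patient_id": np.repeat(np.arange(100), 24),
    "hour": np.tile(np.arange(24), 100),
    "heart_rate": rng.normal(85, 15, 2400),
    "systolic_bp": rng.normal(120, 20, 2400),
    "glucose": rng.normal(140, 40, 2400),
    "wbc_count": rng.normal(9, 3, 2400),
})

# Collapse each patient's 24 hourly measurements into summary features
# (mean and last value per signal), the kind of per-patient vector a
# mortality model could consume.
features = records.groupby("patient_id").agg(["mean", "last"])
features.columns = ["_".join(c) for c in features.columns]
features = features.drop(columns=["hour_mean", "hour_last"])
print(features.head())
```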
Moreover, the researchers found that evaluating (testing and validating) the model by specific subpopulations also highlights performance disparities of global models in predicting mortality across patient subpopulations. This is important information for developing models that can more accurately work with specific patients.
“ICUs are very high-bandwidth, with a lot of patients,” says first author Harini Suresh, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL). “It’s important to figure out well ahead of time which patients are actually at risk and in more need of immediate attention.”
Co-authors on the paper are CSAIL graduate student Jen Gong, and John Guttag, the Dugald C. Jackson Professor in Electrical Engineering.
Multitasking and patient subpopulations
A key innovation of the work is that, during training, the model separates patients into distinct subpopulations, which captures aspects of a patient's overall state of health and mortality risk. It does so by computing a combination of physiological data, broken down by the hour. Physiological data include, for example, levels of glucose, potassium, and nitrogen, as well as heart rate, blood pH, oxygen saturation, and respiratory rate. Increases in blood pressure and potassium levels, a possible sign of heart failure, may signal worsening health in some subpopulations but not others.
Next, the model employs a multitask learning approach to build predictive models. Once the patients are divided into subpopulations, a separately tuned model is assigned to each one. Each variant model can then make more accurate predictions for its own group of patients. This approach also allows the model to share data across all subpopulations when making predictions. Given a new patient, it matches the patient's physiological data to all subpopulations, finds the best fit, and then estimates the mortality risk from there.
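The Python sketch below conveys the general flavor of this idea, though it is a simplified stand-in and not the authors' actual architecture: patients are clustered into subpopulations, and a single shared model is trained with cluster-specific interaction terms, so each subpopulation gets its own adjustments while all of the data is pooled. The cluster count, features, and labels are synthetic assumptions.

```python
# A minimal multitask-flavored sketch (not the paper's exact method).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))               # hypothetical per-patient physiological features
y = (rng.random(1000) < 0.07).astype(int)    # synthetic mortality labels (~7% positive)

# Step 1: discover subpopulations from the physiological features.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
clusters = km.labels_

# Step 2: one shared model with cluster-specific interaction terms, so every
# subpopulation contributes data while keeping per-cluster adjustments.
def multitask_features(X, clusters, n_clusters=3):
    onehot = np.eye(n_clusters)[clusters]                        # cluster membership
    interactions = np.hstack([X * onehot[:, [k]] for k in range(n_clusters)])
    return np.hstack([X, onehot, interactions])                  # shared + per-task terms

model = LogisticRegression(max_iter=1000).fit(multitask_features(X, clusters), y)

# Step 3: a new patient is matched to the closest subpopulation before scoring.
x_new = rng.normal(size=(1, 8))
c_new = km.predict(x_new)
risk = model.predict_proba(multitask_features(x_new, c_new))[0, 1]
print(f"estimated 48-hour mortality risk: {risk:.3f}")
```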
“We’re using all the patient data and sharing information across populations where it’s relevant,” Suresh says. “In this way, we’re able to … not suffer from data scarcity problems, while taking into account the differences between the different patient subpopulations.”
“Patients admitted to the ICU often differ in why they’re there and what their health status is like. Because of this, they’ll be treated very differently,” Gong adds. Clinical decision-making aids “should account for the heterogeneity of these patient populations … and make sure there is enough data for accurate predictions.”
A key insight from this method, Gong says, came from using the multitask approach to also evaluate a model's performance on specific subpopulations. Global models are often evaluated on overall performance across entire patient populations. But the researchers' experiments showed that these models actually underperform on subpopulations: the global model tested in the paper predicted mortality fairly accurately overall, but dropped several percentage points in accuracy when tested on individual subpopulations.
Such performance disparities are difficult to measure without evaluating by subpopulations, Gong says: “We want to evaluate how well our model does, not just on a whole cohort of patients, but also when we break it down for each cohort with different medical characteristics. That can help researchers in better predictive model training and evaluation.”
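As a self-contained toy illustration of that kind of breakdown (synthetic data, not the paper's cohort or metrics), the Python sketch below trains one "global" logistic model on a pooled population in which two hypothetical subgroups follow different risk patterns, then reports AUC overall and within each subgroup; the per-group numbers surface the disparity a single overall score can hide.

```python
# Toy per-subpopulation evaluation of a single global model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 2000
group = (rng.random(n) < 0.2).astype(int)      # group 1 is a 20% minority
X = rng.normal(size=(n, 3))
X[:, 2] = group + 0.1 * rng.normal(size=n)     # a feature that tracks the group
# Mortality depends on feature 0 in group 0 but on feature 1 in group 1,
# and the minority group has a higher base rate.
logits = np.where(group == 0, 1.5 * X[:, 0] - 3.0, 1.5 * X[:, 1] - 1.0)
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)   # global model, blind to group
scores = model.predict_proba(X)[:, 1]

print("overall AUC:", round(roc_auc_score(y, scores), 3))
for g in (0, 1):
    m = group == g
    print(f"group {g} AUC:", round(roc_auc_score(y[m], scores[m]), 3))
```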
Getting results
The researchers tested their model using data from the MIMIC Critical Care Database, which contains scores of data on heterogeneous patient populations. Of around 32,000 patients in the dataset, more than 2,200 died in the hospital. They used 80 percent of the dataset to train, and 20 percent to test the model.
Using data from the first 24 hours, the model clustered the patients into subpopulations with important clinical differences. Two subpopulations, for instance, contained patients with elevated blood pressure over the first several hours: in one, blood pressure decreased over time, while in the other it remained elevated throughout the day. The latter subpopulation had the highest mortality rate.
Using those subpopulations, the model predicted the mortality of the patients over the following 48 hours with high sensitivity and specificity, among other metrics. The multitask model significantly outperformed a global model, by several percentage points.
Next, the researchers aim to use more data from electronic health records, such as treatments the patients are receiving. They also hope, in the future, to train the model to extract keywords from digitized clinical notes and other information.
The work was supported by the National Institutes of Health.
Ms. P. Revathi, M.Sc., M.Phil.,
Assistant Professor,
Department of Computer Science,
Marudhar Kesari Jain College,
Vaniyambadi, Vellore District, Tamil Nadu.
Revathipriya50@gmail.com

DIGITAL SIGNAL PROCESSING







Ms. P. Revathi, M.Sc., M.Phil.,
Assistant Professor,
Department of Computer Science,
Marudhar Kesari Jain College,
Vaniyambadi, Vellore District, Tamil Nadu
Revathipriya50@gmail.com
                     _______________________________________________________________
I. Introduction (DSP)

Before the digital revolution, image and signal processing was performed using analog circuitry. Today, digital signal processing (DSP) defines our lives. Although some mixed-signal designs remain of current interest, DSP dominates nearly everything we own or use every day. DSP chips exist in many devices, such as our cell phones, iPods, wireless routers, and HDTVs.

The purpose of this paper is to consider possibilities for DSP outside the semiconductor or electronic domain. Organic elements (such as DNA and polymers) that conduct electricity can be used to build organic semiconductors at the molecular level [1]. However, more fundamental questions can be asked. Can DSP be performed in exotic materials, such as chemical substrates, cells, organisms, or even DNA, without the use of electrical currents? Will we be able to build full-blown DSP systems out of these materials? Or will only some DSP functions (such as storage and data archiving) be implemented with such materials? We do not attempt to provide a thorough scientific review of such approaches here; instead, we highlight a few representative examples.

II. Chemical-Based DSP


There has been an extensive amount of research on reaction-diffusion media, which are implementations of chemical oscillators. Chemical oscillators are systems of chemical reactions that exhibit oscillatory behavior when not in the equilibrium state: the concentrations of reagents and products vary over time. A certain category of these are light-sensitive and can store input information over long periods of time. When stimulated by light and controlled by the acidity of the medium (a mixture of chemical compounds), basic or complex image transformations, such as contour enhancement, image segmentation, image halftoning, and others, can be achieved (see [2, ch. 3, 4]). The mathematics behind these processes are systems of nonlinear differential equations. Although nonlinear image processing with computers is not new, it is extremely fascinating to see chemical media perform complicated image-processing tasks in a short period of time.
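
For a concrete feel of this kind of nonlinear processing, the toy Python simulation below applies a bistable reaction-diffusion equation to a blurred synthetic image; the reaction term drives pixels toward black or white and sharpens the contour, loosely mimicking what the chemical media do. It is a numerical caricature, not a model of any specific chemical system, and all parameters are arbitrary.

```python
# Toy sketch: u_t = D*laplacian(u) + u(1-u)(u-a) applied to a blurred disc.
# The bistable reaction term pushes pixels toward 0 or 1, sharpening contours.
import numpy as np

def laplacian(u):
    return (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
            np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)

# Blurred disc as the "input image" (values between 0 and 1).
y, x = np.mgrid[0:64, 0:64]
img = 1.0 / (1.0 + np.exp((np.hypot(x - 32, y - 32) - 15) / 4.0))

u, D, a, dt = img.copy(), 0.2, 0.4, 0.1
for _ in range(500):
    u += dt * (D * laplacian(u) + u * (1 - u) * (u - a))

# Fraction of "in-between" pixels shrinks as the contour gets sharper.
print("blurry pixels before:", float(((img > 0.1) & (img < 0.9)).mean()))
print("blurry pixels after: ", float(((u > 0.1) & (u < 0.9)).mean()))
```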

III. Organism-Based DSP

The beautiful colors found in Impressionist paintings were the result of the scientific discovery of novel pigments. But sometimes art can drive scientific evolution. Take, for example, the artist/scientist Cameron Jones from Australia, who used fungi to process audio signals. He recorded music on compact discs (CDs) and then used the CDs as substrates to grow fungi. He put the CDs in his CD player and watched how the optically recorded sound was distorted by the fungi. Surprisingly, the fungus growth patterns were dependent on the optical grooves recorded on the CD. The fungi were reacting to the recorded information, and the audio track was "processed" by the grown fungus. This interface of optics and biomaterials was a clear demonstration that signal processing can be performed by other means.

Moving to a smaller scale, individual proteins can be used to perform processing. For example, protein-based memories utilize the light sensitivity of a special category of proteins. A protein that is often used is bacteriorhodopsin. This protein is found in the bacterium Halobacterium halobium, which thrives in environments with high salt and low oxygen concentrations. If oxygen levels drop, the purple membrane of the bacterium grows to expose bacteriorhodopsin. The protein converts light into energy by pumping a proton through the membrane, creating a chemical and osmotic potential. This cycle can be repeated millions of times, and the protein can survive high temperatures. In a few words, the protein is an excellent medium for storing information, since it lasts a long time, is rewritable, and has truly nanoscale size. A film of the protein can be deposited as a layer on an appropriate substrate. Light exposure, via direct light or laser, can be used to stimulate the protein and thus record the input light information (which could be an image). Information can be read out using a laser as well.

The fact that biological materials are used for storage opens the door to a unique method of material optimization. Although such optimization is complex and requires extensive knowledge of the substance's properties, evolution through genetic modification can be used to generate altered versions of the substance. Each outcome can be tested for its performance, and new and improved generations of the substance can be further generated as mutations of the previous outcomes.

In fact, this has happened already. Light-sensitive proteins are taken from one bacterium and placed in more "programmable" bacteria such as E. coli (since its genetic code has been studied longer). Synthetic biology does exactly this: it modifies the genetic code of organisms to add "novel" functionality, such as light responsivity and NOT, AND, and OR gates. There is a lot of activity in this area, with conferences, dedicated journals, and an established database of standard biological parts (BioBricks™). There is even a competition (the International Genetically Engineered Machine Competition) where students compete in designing biological systems that can perform simple computations. In the 2005 competition, students made a biofilm (layer) of bacteria that could perform distributed edge detection on an image. E. coli cells were modified to react to light using a light-sensing protein and to change their state according to their neighbor cells. All the needed parts were taken from the BioBricks database. As a result, the edges of a projected image were found. In another example [3], E. coli was modified in a similar fashion to store image data with a theoretical resolution of 100 Mpixels/in² (or 10⁸ bacteria/in²). An image was projected onto the bacteria layer and read using a weak laser.
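
A purely software analogy of that distributed edge detector is sketched below in Python: each grid "cell" senses whether it is lit and compares itself with its neighbors, marking the boundary where lit and dark regions meet. The grid, the projected image, and the neighbor rule are illustrative assumptions, not the engineered biological circuit itself.

```python
# Software analogy of distributed edge detection by light-sensing cells.
import numpy as np

y, x = np.mgrid[0:48, 0:48]
lit = np.hypot(x - 24, y - 24) < 12            # projected image: a bright disc

# Each cell checks whether any of its four neighbors is in the opposite state.
neighbors_lit = (np.roll(lit, 1, 0) | np.roll(lit, -1, 0) |
                 np.roll(lit, 1, 1) | np.roll(lit, -1, 1))
neighbors_dark = (np.roll(~lit, 1, 0) | np.roll(~lit, -1, 0) |
                  np.roll(~lit, 1, 1) | np.roll(~lit, -1, 1))

edge = (lit & neighbors_dark) | (~lit & neighbors_lit)   # boundary cells only
print("edge cells:", int(edge.sum()))
```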

IV. DSP with DNA


Similarly, logic circuits can be built using just DNA molecules. The DNA double helix is made from two single strands of DNA, each of which is a sequence over the quaternary alphabet (A, T, G, C). The two single strands are held together by hybridization of the complementary sequences. The complement of a strand is found by applying the Watson-Crick complement rule (A-T, G-C). Using DNA strands as input and processing elements, the simple hybridization force can act as a powerful computational tool. The sequences of input and processing strands can be designed in such a way that their hybridization can be predicted and controlled. Using this basic principle, complex molecular structures can be built and basic arithmetic operations can be performed. DNA computing is the field that utilizes DNA to perform computation.
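
The complement rule is simple enough to state in a few lines of Python. The sketch below computes a strand's reverse complement and checks whether two strands (written 5' to 3') would fully hybridize under ideal Watson-Crick pairing; it ignores real-world thermodynamics and partial matches.

```python
# Watson-Crick complement rule and an idealized full-hybridization check.
WC = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand: str) -> str:
    return "".join(WC[base] for base in reversed(strand))

def hybridizes(a: str, b: str) -> bool:
    # Two strands written 5'->3' pair fully when one is the
    # reverse complement of the other.
    return b == reverse_complement(a)

s = "ATGCGT"
print(reverse_complement(s))        # ACGCAT
print(hybridizes(s, "ACGCAT"))      # True
```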

The literature abounds with demonstrations of DNA circuits that behave like transistors and adders, and these point to a future in which complex DNA circuits acting as digital filters are realizable. For example, Winfree's group at Caltech presented a method for the self-assembling computation of the Sierpinski triangle [4]. DNA single strands hybridize with each other to form tiles (building blocks) that self-assemble into complex structures (like the triangle). They described the correlation between the Sierpinski triangle and the binary versions of the wavelet and Fourier transforms, as well as the Hadamard transform. Self-assembly and tiling can also be used to study Markov fields, which have been extensively used in image processing. In the future, one can imagine a self-assembly approach to image processing following similar principles. The same group has demonstrated hybridization-only [5] and entropy-driven [6] protocols for implementing logic gates, signal restoration, amplification, cascading, and feedback, thus developing DNA-based logic circuits. Such developments bring us closer to the possible realization of DNA-based DSP circuits.
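
The connection to binary transforms can be seen in a toy digital counterpart of the tiling: the Sierpinski pattern is Pascal's triangle modulo 2, generated by XOR-ing neighboring values row by row. The Python sketch below produces that pattern in software only; it does not simulate the DNA self-assembly itself.

```python
# Sierpinski pattern as Pascal's triangle mod 2: each cell XORs its two "parents".
import numpy as np

n = 16
rows = np.zeros((n, n), dtype=int)
rows[0, 0] = 1
for i in range(1, n):
    rows[i, 1:] = rows[i - 1, 1:] ^ rows[i - 1, :-1]
    rows[i, 0] = 1

for r in rows:
    print("".join("#" if v else "." for v in r))
```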

V. Conclusion

If we were allowed to be so bold as to offer a prediction for the future, we believe that the turning point in the organic future of DSP will be seeing which technology, from the aforementioned or from the ones to come, allows for the implementation of a fast Fourier transform combined with elegant, not tedious, input-output procedures. We believe that DNA-based logic circuit design will materialize first, followed by synthetic gene networks. DNA exploration is driven by two large forces: i) human sustainability, as in understanding organism formation, development, evolution, and function, and hence finding cures for diseases; and ii) engineering curiosity, as in trying to utilize DNA and genes to do computations. This has led to a growth and cost reduction similar to that witnessed by the semiconductor industry (see Moore's law). The cost and delivery time of DNA synthesis are being reduced exponentially, thus making data input elegant and economical. DNA sequencing, replication, and filtering are getting cheaper and faster every day, having a similar effect on data output. For example, sequencing the human genome for the first time took ten years and a couple of billion dollars. Now there exist commercially available sequencers (for example, from 454 Life Sciences) that can do it in months at a fraction of the cost, with prospects to reduce it to days and below $1,000, as set by The $1000 Genome Project and the recently announced X-Prize. DNA equipment is also getting smaller (consider, for example, NEC's portable DNA lab in a briefcase) and cheaper, such that anybody can process signals at the office and later, at home, pull out their Discovery DNA Explorer Kit or CSI DNA Lab Kit and, with their kids (or alone, satisfying their inner child), manipulate and analyze DNA in their living room.

References


[1] J. M. Seminario, L. Yan, and Y. Ma, "Scenarios for molecular-level signal processing," Proc. IEEE, vol. 93, no. 10, pp. 1753–1764, Oct. 2005.

[2] T. Sienko, A. Adamatzky, N. G. Rambidi, and M. Conrad, Eds., Molecular Computing. Cambridge, MA: MIT Press, Sep. 2003.

[3] M. Levy, J. J. Tabor, and S. T. C. Wong, "Taking pictures with E. coli: Signal processing using synthetic biology," IEEE Signal Process. Mag., vol. 23, no. 3, pp. 142–144, May 2006.

[4] P. W. Rothemund, N. Papadakis, and E. Winfree, "Algorithmic self-assembly of DNA Sierpinski triangles," PLoS Biol., vol. 2, no. 12, Dec. 2004.

[5] G. Seelig, D. Soloveichik, D. Y. Zhang, and E. Winfree, "Enzyme-free nucleic acid logic circuits," Science, vol. 314, no. 5805, pp. 1585–1588, Dec. 2006.

[6] D. Y. Zhang, A. J. Turberfield, B. Yurke, and E. Winfree, "Engineering entropy-driven reactions and networks catalyzed by DNA," Science, vol. 318, no. 5853, pp. 1121–1125, Nov. 2007.

[7] S. A. Tsaftaris, A. K. Katsaggelos, T. N. Pappas, and E. T. Papoutsakis, "How can DNA computing be applied to digital signal processing?" IEEE Signal Process. Mag., vol. 21, no. 6, pp. 57–61, Nov. 2004.

[8] S. A. Tsaftaris and A. K. Katsaggelos, "On designing DNA databases for the storage and retrieval of digital signals," in Proc. 1st ICNC Adv. Natural Comput., vol. 3611, Part II, Changsha, China, Aug. 27–29, 2005, L. Wang, K. Chen, and Y. S. Ong, Eds. Berlin/Heidelberg: Springer, 2005.