Tuesday, February 5, 2019


Data Science Needs a Clinical Program

Clinical programs help students learn from real-world situations, bringing them closer to the concerns and issues of real-world practice, and resulting in better overall education. Currently, such programs are associated with professional degrees in medicine and law. We argue that data science education needs a clinical component, designed to serve the needs of data science, given the ubiquitous nature of data. Indeed, it is this clinical aspect that would distinguish data science from its related and affiliated fields, thereby crystalizing its distinct role in academia. A well-trained data scientist must understand real-world tradeoffs, including the constraints associated with use of data in practical applications, along with having a grounding in foundational approaches and technologies. A serious data science program must incorporate a clinical component taught by clinical faculty.

Background

Data science distinguishes itself from foundational areas—including computer science, statistics, and mathematics—in engaging directly with “real data” under real-world constraints. It is about real data “in action”; about understanding and tackling data issues in real-world application scenarios; and about meeting end-user needs and expectations under real-world constraints. A clinical component is essential to gaining the necessary understanding and to developing the corresponding skills and expertise.

Data science could, of course, be taught with sample data and synthetic datasets, or, perhaps, by bringing real data into classroom settings, which removes the data from its application context. Yet, students trained in that fashion may find themselves immediately confronting difficult issues in their jobs, related to data wrangling, data cleaning, communicating with users to glean their “real” needs, understanding which data and analyses may be “good enough” under the given time and budget constraints, and the like. They may not be ready to hit the ground running and may find themselves in a de facto “trainee” status, leaving them feeling frustrated and unfulfilled early in their new careers, while employers continue to lament that data science programs are not graduating students with the requisite skills! A clinical component in data science would educate students in the concerns of real applications, and the constraints and idiosyncrasies of real data, ensuring that students are much better prepared to confront the vagaries of real-world situations.

As a new field positioned to tackle the challenges of the future, data science must adopt novel techniques and strategies to prepare the workforce of the future. The special needs of data science include teamwork, collaboration, and multidisciplinarity. Data science, and the affiliated area of artificial intelligence, are technical fields on the one hand, but also fields that greatly impact society and the future of work, on the other. A clinical program is essential for producing competent, well-qualified, well-rounded data scientists—at the undergraduate as well as graduate levels.

Clinical programs

The fields of medicine and law have an established tradition of clinical programs. In its very earliest days, medicine was taught as a theoretical field, based on the balancing of the humors. However, bedside clinical instruction emerged in Western medicine in the mid-16th century [Rolleston 1939]. Major advances in the understanding of human physiology and the importance of clinical practice at the bedside occurred in the 17th century [Weatherall 2014]. Clinical medicine and bedside teaching came to the United Kingdom in 1746 [Rolleston 1939] and, in the US, “Many of the early schools had neither hospital nor university connections and were mainly diploma mills created solely to capitalize on the popular upsurge in medical education. However, the better schools that have survived to this day soon realized the indispensability of a hospital affiliation and took steps to avail themselves of such facilities. Thus, we see three professors of King's College in New York in 1776 founding a hospital to enhance the medical education program. For the same purpose, the Harvard Medical School moved from Cambridge to Boston in 1807 in order to be near a hospital.” [Burbridge 1957]. Today, there are 141 accredited MD-granting institutions in the US, and all have affiliated teaching hospitals where they provide clinical training. In other words, one cannot imagine contemporary medical education without a clinical, bedside component to it.

In law, “Clinical legal education, born in the early twentieth century, only became widespread, in the United States and other countries…since the early 1960s and 1970s. …Today, there is a ‘global  clinical movement’ confirming the success of a legal education methodology that, being based on experiential learning, allows law students to develop professional skills and at the same time provide a useful service to the community.” [Concetta 2016].

“Clinical courses bring to legal education a new dimension in learning. Traditional legal education occurs, by and large, in a logical, deliberate, rational, and abstract pattern of thinking and analyzing. Traditional law school courses involve two-dimensional situations focusing on oral discussions of printed book pages.  Simulated clinic courses involve a third dimension, albeit not real. Fieldwork clinical experiences provide a 3-D real world experience.  No other pedagogical method affords students this opportunity to learn by personal involvement, action, discovery, and reflection. The magic of clinical education is that it provides students with a relevant, enriching, exciting, challenging, and intellectual educational experience.  Students in clinical courses immerse themselves in learning to an extent and to a degree uncalled for and unavailable in other law school courses. The dynamics of lawyering come alive in a clinic course and reduce the chance of students becoming bored, apathetic, or indifferent.” [Haydock 1983]. One only needs to look at websites of the leading law schools to notice how important law clinics have become in attracting students. Students want to gain an education to make a difference—and the clinics allow them to do that while they are still in law school. In the process, they introduce students to a variety of real-world issues, and instill in many a sense of community service.

The Needs of Data Science

A clinical component in data science programs would provide both the “bedside” experience with data, as in medicine, and the “3-D real world experience” with data, as in law. The clinical component in data science should be an integral and intrinsic part of the overall program. Students would be immersed from the very beginning in project-oriented courses requiring participation in real-world situations to understand the needs, requirements, and constraints of dealing with real-world data-driven applications. In addition to a technical introduction to data science (in topics such as programming, statistics, linear algebra, and the like), there would be an introduction to clinical data science, providing engagement with real-world users and data[1]. One would expect that data science programs in the future would compete for and attract students based on the quality of their clinical programs, just as law schools currently attract students based on the breadth and quality of their clinical programs.

Most universities have either rolled out new programs in data science at the graduate and undergraduate levels or are working on them. In 2015, there were about 50 Master’s level programs in data science (or in affiliated areas, such as Data Analytics, Business Analytics, etc.) and few, if any, undergraduate programs across the country. Today, there are more than 500 Master’s level programs and tens, if not more, of undergraduate programs[2]. Many institutions have also announced related, high-profile data science institutes.

Most are still struggling to define these new programs, not in terms of attracting students (there is clearly plenty of interest there), but in terms of defining the intellectual space and, indeed, the unique value proposition of data science, as opposed to computer science, statistics, business analytics, or any number of other allied fields, including data-intensive activities within various academic disciplines themselves. We posit that the difficulties academia is encountering in crisply defining data science are due to a failure to recognize the need for a clinical program in data science. Indeed, it is incorporating the clinical component with clinical faculty, at the undergraduate and graduate levels, that would make a data science program uniquely different from programs in other allied fields. This is a key insight that, we believe, will help set free the myriad academic committees at various institutions currently struggling to solve the puzzle of how to uniquely define data science from a pedagogical perspective.

Any academic research institution that has initiated a data science program is rife with stories of how it has been inundated with requests for data science assistance from researchers across its own campus, let alone from industry, government, or other collaborators. There is, indeed, a big demand on campuses for expertise and skill in “helping with data”. Requests come from all parts of campus: science, engineering, social science, medicine, humanities, etc. There is increasing recognition of the need for people who are able to straddle the line between computational and data expertise and skills on the one hand, and familiarity with the types of data, analyses, tools, and culture of a discipline on the other.

Emergence of Clinical Data Science

The National Academies of Sciences, Engineering, and Medicine Committee on Envisioning the Data Science Discipline: The Undergraduate Perspective produced a report in May 2018, Data Science for Undergraduates: Opportunities and Options [NAS 2018], which contained a number of recommendations, including “developing a cadre of faculty equipped to teach the new field”, “prepare [students] for success in a variety of careers”, “prepare students for an array of data science roles in the workplace”, and ensuring that “ethics is woven into the data science curriculum from the beginning and throughout.” To realize the goals and objectives behind the recommendations in that report, as well as to respond to the pressing data science needs on campuses, we need a clinical program in data science.

The key aspect that differentiates those comfortable with calling themselves “data scientists” from those who are not so sure (and who, perhaps, feel inauthentic in that role) is practical experience in dealing with real-world data in real situations. Indeed, we believe that any bona fide data science program must provide clinical experiences as an intrinsic part of education in data science, along with the grounding in theoretical and foundational aspects.

An example of a data science activity that has the characteristics of a clinical program is the “data science for social good” movement, which was initiated by Rayid Ghani at the University of Chicago in 2013, with support from the Schmidt Foundation [DSSG]. This program has been replicated several times by others across the country and around the world [Turing 2018]. A recently announced solicitation that supports clinical work in data science is the NSF Data Science Corps solicitation (even though that term does not appear per se in the text of the solicitation) [NSF 2018]. A workshop on the Data Science Corps idea was organized in December 2017 at Georgetown University. The workshop report goes into further detail on ideas for such a program [DSC 2017]. A few years ago, NSF also introduced the Big Data Regional Innovation Hubs program [Hubs], which also gets to this idea of creating mutually beneficial partnerships among academia, industry, and government in order to address and solve real-world problems. I recently attended a meeting at UC San Diego with officials from the City of San Diego and the City of Carlsbad where this issue came up again. Cities are generating more data; they need help with data analytics; and they need training in the new techniques and technologies of data science. All of these examples are nibbling at the edges of clinical data science.

Taking the Next Step: Clinical Faculty in Data Science

This single innovation of embedding a clinical component in data science programs will revolutionize data science education. It will help crystalize the vision for data science and its place among various other academic disciplines and programs.

In 2017-2018, NSF supported three workshops on the topic of Translational Data Science, to bridge the gap between foundational methods and practical applications in data science [Baru 2018, TDS]. These workshops recognized the “virtuous cycle” between data science foundations and data science applications. Systematizing the clinical component and supporting translational data science requires the recruitment of clinical faculty in data science, to create, nurture, and sustain the clinical program, including relationships with various external stakeholders. The clinical component becomes the natural venue for hands-on projects, engagement with local communities, capstone projects, summer internships, and the like.

Universities need to develop a vision for hiring such clinical faculty. There are extant institutions, centers, and programs where one can find examples of clinical programs and faculty, even if they are not (yet) named as such. These include institutions like my own, the San Diego Supercomputer Center, which has done pioneering work in related areas for decades, as well as centers like the e-Science Institute at UW and other similar efforts.

How to hire clinical faculty and jumpstart a clinical program in data science is the topic for a future blog post.

REFERENCES
[Baru 2018] How to deliver translational data-science benefits to science and society, Nature 561, 464 (2018), doi: 10.1038/d41586-018-06804-4, Chaitanya Baru, Sarah Bird, Alan Blatecky, David Culler, Robert L. Grossman, Bill Howe, Vandana P. Janeja, Meredith M. Lee, Raghu Machiraju, Elena Zheleva.

[Burbridge 1957] The Historical Background of the Teaching Hospital in the United States, Charles E. Burbridge, Ph.D., Superintendent, Freedmen's Hospital, Washington, D.C., Journal of the National Medical Association, May 1957.

[Concetta 2016] Appendix A: The History of legal clinics in the US, Europe, and around the world, Maria Concetta Romano, The Inquiry on Legal Clinical Education in European Territory, 16 Diritto & Questioni Pubbliche 41 (2016).

[DSC 2017] Data Science Corps Conference, https://mccourt.georgetown.edu/DataScienceCorp#, December 2017.

[DSSG] Data Science for Social Good, https://dssg.uchicago.edu/.

[Haydock 1983] Haydock, Roger S. (1983) Clinical Legal Education: The History and Development of a Law Clinic, William Mitchell Law Review: Vol. 9: Iss. 1, Article 4. Available at: http://open.mitchellhamline.edu/wmlr/vol9/iss1/4.

[Hubs] The Big Data Regional Innovation Hubs, https://www.bigdatahubs.io/.

[NAS 2018] National Academies of Sciences, Engineering, and Medicine. 2018. Data Science for Undergraduates: Opportunities and Options. Washington, DC: The National Academies Press. https://doi.org/10.17226/25104.

[NSF 2018] NSF Data Science Corps solicitation, https://www.nsf.gov/pubs/2019/nsf19518/nsf19518.htm.

[Rolleston 1939] The History of Clinical Medicine (Principally of Clinical Teaching) in the British Isles, Sir Humphry Rolleston, Proceedings of the Royal Society of Medicine, Vol XXXII 1185, 1939.

[TDS] First Translational Data Science Workshop, June 2017, https://ctds.uchicago.edu/tds-17/; Second Translational Data Science Workshop, November 2017, https://bids.berkeley.edu/events/translational-data-science-workshop; Third Translational Data Science Workshop, October 2018, http://nebigdatahub.org/3rd-tds-workshop/.

[Turing 2018] Data Science for Social Good, The Alan Turing Institute, https://www.turing.ac.uk/collaborate-turing/data-science-social-good.

[Weatherall 2014] History of Clinical Medicine, MW Weatherall, DJ Weatherall, First published: 09 December 2014, https://doi.org/10.1002/9780470015902.a0003087.pub2



[1] In Computer Science, some institutions provide project management experience and real-world interactions via their Software Engineering course(s). However, those courses are typically upper-level, optional courses. Such experiences and interactions are not part of introductory courses, nor are they built into courses throughout the program.
[2] Based on an internal study conducted by AAAS Fellows at NSF.

Welcome to my blog--Perspectives on Data Science!


I will use this blog to share my ideas, thoughts, and observations in Data Science--based on a career spent on software tools and technologies in support of data-intensive applications that includes my past 22.5 years at the San Diego Supercomputer Center; the past 4 years I spent at the National Science Foundation, as Senior Advisor for Data Science; and my current role at UC San Diego as Senior Advisor, Data Science Research Initiatives.

I will cover topics in research, education, and training in Data Science and Artificial Intelligence. In a recent conversation with a colleague, we began wondering about the terms "Big Data" and "AI"--whether they needed to stand apart, or come together. After all, much of today's AI success is based on machine learning and deep learning--in other words, on data! And, conversely, the term "big data" has always been difficult to define as a standalone term, without referencing the applications that arise from that data--many of which are based on machine learning and deep learning.

But...I digress. This first post was simply meant to be a "Hello", but I am already getting into perspectives and thoughts!

In this blog, I would like to look at the "big picture" and discuss strategic directions, as well as comment on tools and technologies for Data Science. In my position at NSF, I was fortunate to be able to gain a national and international perspective on data science research and education activities. I co-chaired the NSF working group on Harnessing the Data Revolution--one of NSF's stated Big Ideas for research investment in the near-term future. All of that background will serve to inform my viewpoints.

There is so much to talk about, so please watch this space! We are only beginning to understand the deep advances brought about by data, as well as their full set of ramifications!