Data Science Needs a Clinical Program
Clinical programs help students learn from real-world situations,
bringing them closer to the concerns and issues of real-world practice, and resulting
in better overall education. Currently, such programs are associated with professional
degrees in medicine and law. We argue that data
science education needs a clinical component, designed to serve the needs of data science, given the ubiquitous nature of data. Indeed, it is this clinical
aspect that would distinguish data science
from its related and affiliated fields, thereby crystalizing its distinct role in
academia. A well-trained data scientist must understand real-world tradeoffs,
including the constraints associated with use of data in practical applications,
along with having a grounding in foundational approaches and technologies. A serious data science program must
incorporate a clinical component taught by clinical faculty.
Background
Data science distinguishes itself from foundational
areas—including computer science, statistics, and mathematics—in engaging
directly with “real data” under real-world constraints. It is about real data
“in action”; about understanding and tackling data issues in real-world
application scenarios; and about meeting end-user needs and expectations under
real-world constraints. A clinical component is essential to gaining the
necessary understanding and to developing the corresponding skills and expertise.
Data science could, of course, be taught with sample data
and synthetic datasets or, perhaps, by bringing real data into classroom settings,
which removes the data from its application context. Yet students trained in that
fashion may find themselves immediately confronting difficult issues in their
jobs, related to data wrangling, data cleaning, communicating with users to
glean their “real” needs, understanding which data and analyses may be “good
enough” under the given time and budget constraints, and the like. They may not
be ready to hit the ground running and may find themselves in a de facto “trainee” status,
feeling frustrated and unfulfilled early in their new careers,
while employers continue to lament that data science programs are not graduating students
with the requisite skills! A clinical component in data science would educate
students in the concerns of real applications, and the constraints and
idiosyncrasies of real data, ensuring that students are much better prepared to
confront the vagaries of real-world situations.
As a new field positioned to tackle the challenges of the
future, data science must adopt novel techniques and strategies to prepare the
workforce of the future. The special needs of data science include teamwork, collaboration,
and multidisciplinarity. Data science, and the affiliated area of artificial
intelligence, are technical fields on the one hand, but also fields that
greatly impact society and the future of work, on the other. A clinical program
is essential for producing competent, well-qualified, well-rounded data
scientists—at the undergraduate as well as graduate levels.
Clinical Programs
The fields of medicine and law have an established tradition of
clinical programs. In its very earliest days, medicine was taught as a
theoretical field, based on the balancing of the humors. However, bedside
clinical instruction emerged in Western medicine in the mid-16th
century [Rolleston 1939]. Major advances in
the understanding of human physiology and the importance of clinical practice
at the bedside occurred in the 17th century [Weatherall 2014]. Clinical
medicine and bedside teaching came to the United Kingdom in 1746 [Rolleston
1939]. In the US, “Many of the early schools had neither hospital nor
university connections and were mainly diploma mills created solely to
capitalize on the popular upsurge in medical education. However, the better
schools that have survived to this day soon realized the indispensability of a
hospital affiliation and took steps to avail themselves of such
facilities. Thus, we see three
professors of King's College in New York in 1776 founding a hospital to enhance
the medical education program. For the
same purpose, the Harvard Medical School moved from Cambridge to Boston in 1807
in order to be near a hospital.” [Burbridge 1957]. Today, there are 141
accredited MD-granting institutions in the US, and all have affiliated teaching
hospitals where they provide clinical training. In other words, one cannot imagine contemporary medical education
without a clinical, bedside component to it.
In law, “Clinical legal
education, born in the early twentieth century, only became widespread, in the
United States and other countries…since the early 1960s and 1970s. …Today,
there is a ‘global clinical movement’
confirming the success of a legal education methodology that, being based on
experiential learning, allows law students to develop professional skills and
at the same time provide a useful service to the community.” [Concetta 2016].
“Clinical courses bring to legal
education a new dimension in learning. Traditional legal education occurs, by
and large, in a logical, deliberate, rational, and abstract pattern of thinking
and analyzing. Traditional law school courses involve two-dimensional
situations focusing on oral discussions of printed book pages. Simulated
clinic courses involve a third dimension, albeit not real. Fieldwork
clinical experiences provide a 3-D real world experience. No other pedagogical method affords students
this opportunity to learn by personal involvement, action, discovery, and
reflection. The magic of clinical education is that it provides students with a
relevant, enriching, exciting, challenging, and intellectual educational
experience. Students in clinical courses
immerse themselves in learning to an extent and to a degree uncalled for and
unavailable in other law school courses. The dynamics of lawyering come alive
in a clinic course and reduce the chance of students becoming bored, apathetic,
or indifferent.” [Haydock 1983]. One need only look at the websites of the
leading law schools to see how important law clinics have become in
attracting students. Students want to gain an education to make a difference—and
the clinics allow them to do that while they are still in law school. In the process,
they introduce students to a variety of real-world issues, and instill in many
a sense of community service.
The Needs of Data Science
A clinical component in data science
programs would provide both the “bedside” experience with data, as in medicine,
and the “3-D real world experience” with data, as in law. The clinical
component in data science should be an integral and intrinsic part of the
overall program. Students would be immersed from the very beginning in
project-oriented courses requiring participation in real-world situations to
understand the needs, requirements, and constraints of dealing with real-world
data-driven applications. In addition to providing a technical introduction to
data science—in topics such as programming, statistics, linear algebra, and the
like—there would be an introduction to clinical
data science—providing engagement with real-world users and data[1]. One
would expect that, in the future, data science programs would compete for and attract
students based on the quality of their clinical programs—just as law schools currently
attract students based on the breadth and quality of their clinical programs.
Most universities have either rolled out
new programs in data science at the graduate and undergraduate levels or are
working on them. In 2015, there were about 50 Master’s level programs in data
science (or in affiliated areas, such as Data Analytics, Business Analytics,
etc.) and few, if any, undergraduate programs across the country. Today, there
are more than 500 Master’s level programs and tens of undergraduate
programs, if not more[2]. Many institutions
have also announced related, high-profile data science institutes.
Most are still struggling to define
these new programs. The struggle is not in attracting students, where there is clearly plenty
of interest, but in defining the intellectual space and, indeed,
the unique value proposition of data
science, as opposed to computer science, statistics, business analytics, or any
number of other allied fields, including data-intensive activities within various
academic disciplines themselves. We posit that the difficulties that academia
is encountering in crisply defining data science are due to a failure to
recognize the need for a clinical
program in data science. Indeed, it is the incorporation of a clinical component
with clinical faculty, at the undergraduate and graduate levels, that would
make a data science program so uniquely different from programs in other allied
fields. This is a key insight that, we believe, will help set free the myriad academic
committees at various institutions currently struggling to solve the puzzle of
how to uniquely define data science from a pedagogical perspective.
Any academic research institution that
has initiated a data science program is rife with stories of how it has been
inundated with requests for data science assistance from researchers across its own campus,
let alone from industry, government, or other collaborators. There is, indeed, great
demand on campuses for expertise and skill in “helping with data”. Requests come
from all parts of campus—science, engineering, social science, medicine,
humanities, etc. There is increasing recognition of the need for people who are
able to straddle the line between computational and data expertise and skills
on the one hand, and familiarity with the types of data, analyses, tools, and
culture of a discipline on the other.
Emergence of Clinical Data Science
The National Academies of Sciences, Engineering, and Medicine Committee on Envisioning the Data Science Discipline: The Undergraduate
Perspective produced a report in May 2018, Data Science for
Undergraduates: Opportunities and Options [NAS 2018], which contained a number
of recommendations including, “developing a cadre of faculty equipped to teach
the new field”, “prepare [students] for success in a variety of careers”, “prepare students for an array of data science
roles in the workplace”, and ensuring that “ethics is woven into the data
science curriculum from the beginning and throughout.” To realize the goals and objectives behind the
recommendations in the report from the NAS study, as well as to respond to the pressing
data science needs on campuses, we need a clinical
program in data science.
The key aspect that differentiates those comfortable with
calling themselves “data scientists” from those who are not so sure, and who
perhaps feel inauthentic in that role, is practical experience in dealing
with real-world data in real situations. Indeed, we believe that any bona fide data science program must
provide clinical experiences as an intrinsic part of education in data science,
along with the grounding in theoretical and foundational aspects.
An example of a data science activity that has the characteristics
of a clinical program is the “data science for social good” movement, which was
initiated by Rayid Ghani at the University of Chicago in 2013, with support from
the Schmidt Foundation [DSSG]. This program has been replicated several times
by others across the country, and around the world [Turing 2018]. A recently
announced solicitation that supports clinical work in data science is the NSF Data Science Corps solicitation (even
though the term “clinical” does not appear per se
in the text of the solicitation) [NSF 2018]. A workshop on the Data Science
Corps idea was organized in December 2017 at Georgetown University. The
workshop report goes into further details of ideas for such a program [DSC 2017].
A few years ago, NSF also introduced the Big
Data Regional Innovation Hubs program [Hubs], which gets at the same idea
of creating mutually beneficial partnerships among academia, industry, and
government in order to address and solve real-world problems. I recently
attended a meeting at UC San Diego with officials from the City of San Diego
and the City of Carlsbad where this issue came up again. Cities are generating
more data; they need help with data analytics; and they need training in the
new techniques and technologies of data science. All of these examples are nibbling
at the edges of clinical data science.
Taking the Next Step: Clinical Faculty in Data Science
This single innovation of embedding
a clinical component in data science programs will revolutionize data science
education. It will help crystalize the vision for data science and its place
among various other academic disciplines and programs.
In 2017-2018, NSF supported three
workshops on the topic of Translational
Data Science—to bridge the gap between foundational methods and practical
applications in data science [Baru 2018, TDS]. These workshops recognized the “virtuous
cycle” between data science foundations and data science applications. Systematizing
the clinical component and supporting translational data science require the recruitment
of clinical faculty in data science,
to create, nurture, and sustain the clinical program, including relationships with
various external stakeholders. The clinical component becomes the natural venue
for hands-on projects, engagement with local communities, capstone projects,
summer internships, and the like.
Universities need to develop a
vision for hiring such clinical faculty. There are extant institutions,
centers, and programs where one can find examples of clinical programs and faculty,
even if they are not (yet) named as such. These include institutions like my
own, the San Diego Supercomputer Center, which has done pioneering work in related
areas for decades, and centers like the e-Science Institute at UW, and other
similar efforts.
How to hire clinical faculty and
jumpstart a clinical program in data science is the topic for a future blog
post.
REFERENCES
[Baru 2018] How to deliver translational data-science benefits to science and
society, Nature 561, 464 (2018), doi: 10.1038/d41586-018-06804-4, Chaitanya
Baru, Sarah Bird, Alan Blatecky, David Culler, Robert L. Grossman, Bill Howe,
Vandana P. Janeja, Meredith M. Lee, Raghu Machiraju, Elena Zheleva.
[Burbridge 1957] The Historical Background of the Teaching
Hospital in the United States, Charles E.
Burbridge, Ph.D., Superintendent, Freedmen's Hospital, Washington,
D.C., Journal of the National Medical
Association, May 1957.
[Concetta 2016] Appendix A: The History of legal clinics in
the US, Europe, and around the world, Maria Concetta Romano, The Inquiry on Legal Clinical Education in European
Territory, 16 Diritto & Questioni Pubbliche 41 (2016).
[DSC 2017] Data Science Corps
Conference, https://mccourt.georgetown.edu/DataScienceCorp#,
December 2017.
[DSSG] Data
Science for Social Good, https://dssg.uchicago.edu/.
[Haydock 1983] Haydock,
Roger S. (1983) Clinical Legal Education:
The History and Development of a Law Clinic, William Mitchell Law Review:
Vol. 9: Iss. 1, Article 4. Available
at: http://open.mitchellhamline.edu/wmlr/vol9/iss1/4.
[Hubs] The Big Data Regional
Innovation Hubs, https://www.bigdatahubs.io/.
[NAS 2018] National Academies of
Sciences, Engineering, and Medicine. 2018. Data Science for Undergraduates:
Opportunities and Options. Washington, DC: The National Academies Press.
https://doi.org/10.17226/25104.
[NSF 2018] NSF Data Science Corps
solicitation, https://www.nsf.gov/pubs/2019/nsf19518/nsf19518.htm.
[Rolleston 1939] The History of Clinical Medicine
(Principally of Clinical Teaching) in the British Isles, Sir Humphry
Rolleston, Proceedings of the Royal Society of Medicine, Vol. XXXII, 1185, 1939.
[TDS] First Translational Data
Science Workshop, June 2017, https://ctds.uchicago.edu/tds-17/;
Second Translational Data Science Workshop, November 2017, https://bids.berkeley.edu/events/translational-data-science-workshop;
Third Translational Data Science Workshop, October 2018, http://nebigdatahub.org/3rd-tds-workshop/.
[Turing 2018] Data Science for
Social Good, The Alan Turing Institute, https://www.turing.ac.uk/collaborate-turing/data-science-social-good.
[Weatherall 2014] History of Clinical Medicine, MW
Weatherall, DJ Weatherall, First published: 09 December 2014, https://doi.org/10.1002/9780470015902.a0003087.pub2
[1] In Computer Science, some institutions provide project management experience
and real-world interactions via their Software Engineering course(s). However,
those courses are typically upper level, optional courses. Such experiences and
interactions are not part of introductory courses, nor are they built into
courses throughout the program.
[2] Based on an internal study conducted by AAAS Fellows at NSF.