LADS 2018: The 2018 Liberal Arts Data Science Workshop

9am Friday

Multivariate thinking and the introductory statistics and data science course: preparing students to make sense of a world of observational data

We live in a world of ever-expanding "found" (or observational) data. To make decisions and disentangle complex relationships, students need a solid background in design and confounding. The revised Guidelines for Assessment and Instruction in Statistical Education (GAISE) College Report enunciated the importance of multivariate thinking as a way to move beyond bivariate thinking. But how do such learning outcomes compete with other aspects of statistics knowledge (e.g., inference and p-values) in introductory courses that are already overfull? In this talk I will offer some reflections and guidance about how we might move forward, with specific implications for introductory statistics and data science courses.

Nicholas Horton

Nicholas Horton is Professor of Statistics at Amherst College, with methodologic research interests in longitudinal regression models, missing data methods, and statistical computing. He graduated from the Harvard TH Chan School of Public Health in 1999. Nick has received the ASA's Founders Award, the Waller Education Award, the William Warde Mu Sigma Rho Education Award, and the MAA Hogg Award for Excellence in Teaching. He has published more than 160 papers, co-authored a series of four books on statistical computing and data science, and was co-PI on the NSF-funded MOSAIC project. Nick is a Fellow of the ASA and the AAAS, served as a member of the ASA Board, chairs the Committee of Presidents of Statistical Societies, and chaired the ASA Section on Statistical Education. He is a member of the National Academies of Sciences Committee on Applied and Theoretical Statistics and of two Academy studies on undergraduate data science education.

9:55am Friday

Writing in the introductory statistics class

In statistics and data-science curricula, learning the how’s and why’s of methods is only a part of the larger picture. An essential skill for our future practitioners will be the ability to communicate their results to broad audiences, both verbally and in writing. As with any other skill, clear communication takes practice. At Bucknell University our introductory statistics course is classified as a writing-intensive course. We achieve this by emphasizing writing to learn, writing to explore, and writing to explain components throughout the course. These learning opportunities allow students to express statistical concepts in their own words and to practice the use of precise statistical language. The presentation will be interactive with the opportunity to explore existing prompts and to develop prompts to be used in one’s own course.

KB Boomer, Bucknell University

1:45pm Friday

Projects first in an interdisciplinary data analytics curriculum

Let’s face it: most college students aren’t particularly motivated by the prospect of learning about “regression” or “for loops.” To engage more (and more diverse) students, we need to appeal to their interests and the problems they want to solve. This is also how most quantitative scientists came to data science, via problems they were passionate about. Denison’s new Data Analytics major was created from the ground up with this awareness and is deeply interdisciplinary throughout. For example, our introductory Data Analytics and Computer Science courses lead with applied questions, show how to answer them using mathematical and computational techniques, and then give students projects in which they can practice their skills. Chosen carefully, a sequence of applied topics covers the same ground as a traditional “methods first” course, but engages students more effectively and facilitates learning by making concrete connections with prior knowledge in familiar contexts.

Jessen Havill, Denison University

Jessen Havill is a Professor of Computer Science and Benjamin Barney Chair of Mathematics at Denison University. He is also the founding Director of Denison’s new interdisciplinary Data Analytics major. Dr. Havill teaches courses across the computer science curriculum, as well as an interdisciplinary elective in Computational Biology. In 2009, he developed a new problem-first, project-oriented introductory computer science course that led to the publication in 2016 of Discovering Computer Science: Interdisciplinary Problems, Principles, and Python Programming. He was awarded the college's highest teaching honor, the Charles A. Brickman Teaching Excellence Award, in 2013. Dr. Havill's primary research interest is in the development and analysis of online algorithms. In addition, he has collaborated with colleagues in biology and geosciences to develop computational tools to support research and teaching in those fields. Dr. Havill earned his bachelor's degree from Bucknell University and his Ph.D. in Computer Science from The College of William and Mary.

2:40pm Friday

Experiences with big data analytics in the clinic and the classroom at Harvey Mudd College

The Data Explosion is changing how professors teach undergraduate students in every area of the curriculum, including courses, research, and industry projects. In this talk I will share some experiences and challenges with big data analytics in the Clinic and the classroom at Harvey Mudd College (HMC). The Clinic is a program of collaboration between industry and the College that engages juniors and seniors in the solution of real-world technical problems for industrial clients. One of my jobs as mathematics clinic director at HMC is to recruit clinic projects for the HMC students. There are big data problems in every company I visit. These companies know that machine learning tools exist to help solve their data problems, but either they are too busy dealing with large amounts of data to identify the right tools and algorithms, or they lack the modeling skills to use the tools effectively. To address these challenges I have designed a course to turn HMC students into big data consultants, familiar with standard tools and algorithms and able to apply them to real-world problems. This course presents challenges of its own, and I will discuss it as a work in progress.

Weiqing Gu, Harvey Mudd College

Weiqing Gu, professor of mathematics and director of the Harvey Mudd College Mathematics Clinic, specializes in differential geometry and topology, with applications to Big Data analysis, computer-aided design and robotics. Her research on the geometry of a manifold (e.g., a sphere or a Grassmann manifold) and in computational geometry directly applies to Big data-to-decision, fundamental problems in dynamics, control theory, robotics and computer graphics. For example, Gu is currently investigating possible applications of differential geometry in anomaly detection and predictive models involving big data. She is also applying Lie Theory, Grassmann-Cayley algebra, and Riemannian geometry to Jacobi fields and geodesics on the Euclidean group and its subgroups, for the purpose of synthesizing smooth motions for computer animation and for planning optimal motions of robot manipulators. Gu also researches applications to math-biology and applications to industrial mathematics including optimal control, encryption, computer vision, and color scheme.

9am Saturday

Incorporating student projects into the introductory statistics classes

Student projects in the introductory statistics class can fulfill the American Statistical Association’s recommendations for both what to teach and how to teach it. Projects teach statistics as an investigative process. Students must formulate a question, collect and analyze data, and interpret and present their results. The process requires students to think statistically and to understand and apply the concepts they learn in the course. The project requires active engagement and hence constitutes active learning. Students use technology to collect, store, share, clean, explore, and draw inferences from data. Projects are inherently multivariate. Students receive regular assessments throughout the process. Some are verbal and some written, both from the instructors and from their peers through peer reviews. In this talk I will describe the requirements for the project proposals, the interim reports we collect, the project day posters and presentation guidelines, the poster and presentation evaluation sheets, and the form I use for feedback from each student about her contribution to the project and the contributions of every member of her team. I will also share some of our more successful project ideas.

Katherine Taylor Halvorsen, Smith College

9:55am Saturday

Projects using municipal data

Patricia Boyle-McKenna, City of Boston

1:45pm Saturday

The role of visualization capacity building in data science

Data visualization, transforming raw data in a way that informs but does not overwhelm, has become a prerequisite for daily operations in our data-driven society. Visualization capacity building, the ability to understand the visualization process from data acquisition to the transformation of data into something useful that provides insight, is becoming increasingly important in preparing students for a data-enabled workforce. How do we enable students to go beyond being data generators to becoming “Agents of Insight”? The introduction to data visualization must begin early and be repeated often. This presentation will describe initiatives designed to introduce data visualization at the undergraduate level, strengthen student data visualization skills and capabilities, and broaden participation in data visualization and data science.

Vetria Byrd, Purdue Polytechnic Institute

Vetria Byrd is an Assistant Professor at the Polytechnic Institute at Purdue University (Main Campus in West Lafayette, Indiana) in the Department of Computer Graphics Technology. Dr. Byrd is the Director of the Byrd Visualization Lab, where her research interests include the visualization of heterogeneous data, big data visualization, and uncertainty visualization. Dr. Byrd teaches data visualization courses at the undergraduate and graduate levels and is contributing to curriculum development for a recently approved undergraduate major in data visualization at Purdue. Dr. Byrd is the founding PI for the highly competitive 2014/2015 NSF-funded REU Site: Research Experience for Undergraduates in Collaborative Data Visualization Applications, and is the founding organizer of the Biennial Broadening Participation in Visualization (BPViz) Workshop, which is designed to broaden participation of women and members of underrepresented groups in data visualization. Dr. Byrd has given numerous national and international workshops and lectures on data visualization. She was an invited lecturer for the Annual International High Performance Computing Summer School (2015-2017) and a 2014 plenary speaker for the Extreme Science and Engineering Discovery Environment (XSEDE) Conference, where her talk was featured on HPCWire. She has also given webinars as part of the 2017 Blue Waters Data Visualization Seminar Series (NSF-funded) and the NASA Datanauts Program. Dr. Byrd holds PhD, master’s, and bachelor’s degrees in computer science and a master’s degree in biomedical engineering from the University of Alabama at Birmingham.

2:40pm Saturday

It may be deep, but is it learning?

As statistician David Moore was leaving Princeton with his newly minted PhD, his adviser Jack Kiefer warned him: “Just remember, there’s a lot of statistics every sociologist knows that you don’t know.” The thesis of my talk is borrowed from Kiefer’s caution. When it comes to learning from observed data, there’s plenty that students of the Humanities know that data science has yet to address. In my talk I plan to raise some challenges for data science posed by learning in the Humanities, in the context of a question that should concern all of us: How can we teach data science as a subject worthy of the Liberal Arts? I rely in part on two recent opinion pieces: “What is Statistics?” (Brown and Kass, The American Statistician 2012) and “Statistical Modeling: The Two Cultures” (Leo Breiman, Statistical Science 2001). My thesis is that what most distinguishes a Liberal Arts education from vocational training is a self-referential emphasis on learning how to learn.

George Cobb, Mount Holyoke College