What is the question?

Let's talk about scientific excellence and what data can and cannot do
module 2
week 3
data
questions
data mining
Author
Affiliation

School of Life Sciences, University of Hawaii

Published

January 31, 2023

Pre-lecture materials

Read ahead

  • Take a look at the data in this recent paper: Winchell et al. (2023) Genome-wide parallelism underlies contemporary adaptation in urban lizards. PNAS 120 (3) e2216789120. Check the Slack channel for the PDF.

Watch ahead: A Motivating Example for the whole endeavor of data analysis

In the age of Artificial Intelligence (AI), we have many data-intensive tools available. But can we just throw more data at a problem to get better outcomes? Please watch this thought-provoking short talk by Sebastian Wernicke, “How to use data to make a hit TV show”. What goes wrong when we look for decisions in the wrong places?

Can we design a hit TV show (or anything of importance) with data?

The Wizard of Oz (1939), starring Judy Garland, was one of the first major motion pictures filmed in color using the complex Technicolor process. It was a tremendous success. I remember my father telling me what a huge event it was when it came to his small town. Yet was that success inevitable? Even at that time there were many other movies, and there were many doubts about whether American audiences would accept the fantasy story, the length, the musical choices, the actors, and so many other variables. Even though the movie landscape was much simpler then, it was still multivariate.

Yet there must be something to the analysis of data on human behavior. Internet companies sell our web browsing history, and there are still opinion polls, demographic surveys, and the like. Do these types of data differ? Can we do better?

Questions for “How to use data to make a hit TV show”
  • The first study produced a TV show that was perfectly average. How do you imagine they approached the data, and what might have been the difference with the study that led to the hit show?
  • How do shows become hits? Is the underlying mechanism complex? Or is the predictive data complex? Or both? What contributes to complexity?
  • What is a good role for data analysis in this type of question?
  • Can we really “let the data tell us the answer”? What does such a statement leave unstated?

Genome-wide parallelism underlies contemporary adaptation in urban lizards

This is a hot-off-the-presses study into a hot topic.

Discussion Questions
  • What are the questions in this study?
  • What are the types of data?
  • At the most basic level, what are the questions in the data analysis?

When we break it down, what can we do with data?

  • Same vs. different
    • Similar vs. less similar
  • Moving in the same direction
  • Are groupings real?
  • Larger vs. smaller
    • Predictive order
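
As a minimal sketch of how these comparisons turn into computations, the Python snippet below works through four of them on made-up numbers. Nothing here comes from Winchell et al. (2023): the toy measurements, the function names, and the choice of a permutation test for "are groupings real?" are all illustrative assumptions.

```python
import math
import random
from statistics import mean

# Hypothetical toy measurements (NOT data from Winchell et al.):
# one trait measured on six animals in each of two habitats.
forest = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8]
urban = [4.6, 4.8, 4.5, 4.9, 4.4, 4.7]


def mean_difference(a, b):
    """Same vs. different, larger vs. smaller: signed difference in group means."""
    return mean(b) - mean(a)


def euclidean_distance(p, q):
    """Similar vs. less similar: straight-line distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def same_direction(changes_a, changes_b):
    """Moving in the same direction: fraction of paired changes that share a sign.

    A change of exactly zero is counted as "not positive".
    """
    agree = sum((x > 0) == (y > 0) for x, y in zip(changes_a, changes_b))
    return agree / len(changes_a)


def permutation_p_value(a, b, n_shuffles=2000, seed=1):
    """Are groupings real? Share of random relabelings whose absolute mean
    difference is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean_difference(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled_diff = mean_difference(pooled[:len(a)], pooled[len(a):])
        if abs(shuffled_diff) >= observed:
            hits += 1
    return hits / n_shuffles


print("mean difference (urban - forest):", round(mean_difference(forest, urban), 3))
print("distance between two profiles:", euclidean_distance([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print("share of changes in the same direction:",
      same_direction([0.5, -0.2, 0.3, 0.1], [0.4, -0.1, -0.2, 0.6]))
print("permutation p-value:", permutation_p_value(forest, urban))
```

Each function answers one narrow question. Deciding which of these questions is worth asking, and of which data, is still up to the analyst.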

Can we identify these data comparisons in Winchell et al. (2023)?