Introduction and The Big Idea

School of Life Sciences, University of Hawaii

2025-01-14

The Modern Computationally-Literate Scientist

  • Uses computational tools to test ideas
  • Has Computing Skills to:
    • Handle any kind of data
    • Implement any kind of test
    • Produce graphics for exploration and communication
    • Test and validate code (how do we know it's right?)
    • Interact with other computing systems and the Cloud
    • Archive and disseminate data and workflows
    • Produce reproducible results!

Data Science Workflow

From R for Data Science (2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund

R for Data Science

  • Question Development
  • Exploration and Testing
  • Communication

Our Work is Interdisciplinary

  • Disciplinary Knowledge (Biology) -> Question Development
  • Statistics -> Exploration and Testing
  • Computer Science -> Repeatable, Scalable, Reusable
    • Code MUST be FREE OF ERROR
    • Clean and well documented (understandable)
    • Modular - enhances creativity and scalability

Classwork to Professional Science

There is a difference between one-time “getting it to work” and professional science (publication):

  • The “answer” must be correct - code validation
  • Must be repeatable
  • Workflow must be complete, well organized, documented
  • Data and code shared on a public repository with a DOI

How the Tools Fit Together


Need                                            Tools
Observe -> Record Data -> Data Table            Notebooks
Code -> Document -> Comment (annotate)          R
Organize Project -> Version Control -> Share    Git/GitHub
Communicate                                     Quarto/R Markdown

Open Source Tools you will learn in this course

How to Succeed

  • Practice
  • Make errors – figure out how to fix them
  • Fearlessly ask questions
  • Trial and Error is critical to learning
  • Validate – check that the answer is right
  • When you are developing a script, go back and clean it up!
  • Save the correct, good code, throw out the mistakes
  • Document so that you can understand it 1 year from now
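
One way to "validate" is to run the code on a case small enough to work out by hand and compare the two answers. The sketch below is only an illustration: the function name standard_error and the test values are made up for this example and are not part of the course materials.

```r
# Hypothetical helper: standard error of the mean, written "by hand".
standard_error <- function(x) {
  x <- x[!is.na(x)]            # drop missing values before computing
  sd(x) / sqrt(length(x))
}

# Validate against a case small enough to work out on paper.
# For c(2, 4, 6): mean = 4, sd = 2, n = 3, so SE = 2 / sqrt(3).
test_values <- c(2, 4, 6)
expected <- 2 / sqrt(3)
observed <- standard_error(test_values)

# all.equal() compares numbers within a small tolerance, which is safer
# than == for floating-point results.
stopifnot(isTRUE(all.equal(observed, expected)))

# A missing value should not change the answer.
stopifnot(isTRUE(all.equal(standard_error(c(2, NA, 4, 6)), expected)))
```

If either check fails, R stops with an error, so a script that runs to the end has passed its own checks. Keeping checks like these in the cleaned-up script also documents what "right" means for whoever reads it a year from now.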

Course Topics

  • Your Computer
    • Where information is stored - FILEPATHS
    • Your OS (Operating System)
  • Git/GitHub
  • R
  • Making them talk to each other
  • Coding Fundamentals
  • Tour of Univariate + Multivariate Statistics
  • Graphics
  • Special Topics - Tell me your interests! [Google Form]

Software

Learning R - A first session

  • Think about how R works as you try out commands
  • “Mistakes” are opportunities to learn how R works!
  • Learning a language involves a lot of trial and error
  • Don’t be afraid to try - poke it - it won’t break!
  • Go to https://www.r-project.org, open Manuals, and click on An Introduction to R
  • Follow Section 2.1
  • input -> R -> output
  • What came out? What does it tell you about the rules R follows?
  • Computers do only exactly what you tell them to do
  • Jump to Appendix A - let's try to understand some rules of R together
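
A first session might look something like the sketch below. It is not copied from An Introduction to R; the vector x and its values are made up here purely to illustrate the input -> R -> output cycle. Type each line, look at what comes out, and ask what rule R followed.

```r
# Assignment: store a vector of numbers under the name x.
x <- c(1, 2, 3, 4)

# Typing a name on its own prints the object.
x

# Arithmetic is applied element by element.
x * 2

# Functions take input and return output.
sum(x)
mean(x)

# A shorter vector is recycled to match the longer one.
x + c(10, 20)

# R is case sensitive: the name X (capital) is not the same as x.
# Typing X in a fresh session produces an "object not found" error.
```

In this sketch, x * 2 gives 2 4 6 8, sum(x) gives 10, mean(x) gives 2.5, and x + c(10, 20) gives 11 22 13 24. The last result is a good one to poke at: R did exactly what it was told, which may or may not be what was intended.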