Workshop Overview
In collaboration with the Departments of Statistics and Biological Data Science, the CQLS offers a number of workshops and classes available to both internal and external faculty, staff, postdocs, and students. Generally these are 1- or 2-credit, 5- or 10-week classes offered in academic terms according to the schedule illustrated below. For questions on course content, please see the descriptions below and/or contact the trainers.
Most of these utilize our Advanced Cyberinfrastructure Teaching Facility.
REDCap workshops are 1.5 hours long and occur periodically throughout the year. They use CQLS REDCap training resources.
Registration Winter Workshops
We are accepting registrations for Winter term, 2025 (see descriptions below):
Python I / Python II
(Instructor: Justin Elser)
- These two modules are offered as a back-to-back pair:
- Python I, January 6 - February 7, Mon/Wed, 12:00pm - 12:50pm
- Python II, February 10 - March 14, Mon/Wed, 12:00pm - 12:50pm
- They are available for student credit:
- Python I, BDS 599 CRN 35375
- Python II, BDS 599 CRN 35376
- Non-students can attend as a workshop
- Python I, $250 OSU rate, $375 non-OSU rate*
- Python II, $250 OSU rate, $375 non-OSU rate*
- To sign up for a workshop or to get other information, email Justin Elser the following information:
- Name of the course/workshop
- Index to bill
- Name of Associated PI
- Participant's ONID or Student ID
*(Workshop payments can also be made via check or credit card)
Note: If you are from an Oregon college or university, please contact us about possible discounts
Environmental Sequence Analysis
(Instructor: Steven Carrell)
- Dates: January 6 - March 14, Mon/Wed, 12:00pm - 12:50pm
- Available for student credit:
- Environmental Sequence Analysis, BDS 599 CRN 35373
- Non-students can attend as a workshop
- Environmental Sequence Analysis, $500 OSU rate, $750 non-OSU rate*
To sign up for a workshop or to get other information, email Steven Carrell the following information:
- Name of the course/workshop
- Index to bill
- Name of Associated PI
- Participant's ONID or Student ID
*(Workshop payments can also be made via check or credit card)
Note: If you are from an Oregon college or university, please contact us about possible discounts
Workshop Descriptions
Infrastructure
Command Line
Introduction to Unix/Linux (5 weeks @ 2 hrs per week)
This module introduces the natural environment of bioinformatics: the Linux command line. Material will cover logging into remote machines, filesystem organization and file manipulation, and installing and using software (including examples such as HMMER, BLAST, and MUSCLE). Finally, we introduce the CQLS research infrastructure (including submitting batch jobs) and concepts for data analysis on the command line with tools such as grep and wc.
Command-Line Data Analysis (5 weeks @ 2 hrs per week)
The Linux command-line environment has long been used for analyzing text-based and scientific data, and there are a large number of tools pre-installed for data analysis. These can be chained together to form powerful pipelines. Material will cover these and related tools (including grep, sort, awk, sed, etc.) driven by examples of biological data in a problem-solving context that introduces programmatic thinking. This module also covers regular expressions, a useful syntax for matching and substituting string and sequence data.
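As a small illustration of the regular-expression matching and substitution mentioned above, here is a minimal sketch using Python's re module rather than the command-line tools covered in the module. The sequence and the choice of the EcoRI recognition site (GAATTC) are assumptions made for illustration and are not taken from the course material.

```python
import re

# Hypothetical DNA sequence, for illustration only.
sequence = "ATGGAATTCCGTAGAATTCTTAG"

# EcoRI recognition site (GAATTC), compiled once and reused.
site = re.compile(r"GAATTC")

# Matching: collect the 0-based start position of every site.
positions = [m.start() for m in site.finditer(sequence)]

# Substituting: rewrite each site in lowercase so it stands out.
marked = site.sub(lambda m: m.group(0).lower(), sequence)

print(positions)
print(marked)
```

The same pattern syntax carries over, with small dialect differences, to tools such as grep and sed.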
Programming
Python
Python I (5 weeks @ 2 hrs per week)
This module introduces programming concepts, driven by examples of biological data analysis, in the Python programming language. Topics covered will include variables and data types (including strings, integers and floats, dictionaries and lists), control flow (loops, conditionals, and some boolean logic), variable scope and its proper use, basic usage of regular expressions, functions, file input and output, and interacting with the larger Unix/Linux environment. Prior experience with the Unix/Linux command-line is recommended (previously or simultaneously taking Intro to Unix/Linux satisfies).
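To give a flavor of the topics listed above, the following sketch combines strings, a dictionary, a loop, a conditional, and functions in one short program. The function names count_bases and gc_content and the example sequences are hypothetical, chosen for illustration only, and do not come from the course material.

```python
def count_bases(seq):
    """Return a dictionary mapping each base to its count in seq."""
    counts = {}
    for base in seq.upper():
        counts[base] = counts.get(base, 0) + 1
    return counts


def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    if len(seq) == 0:
        return 0.0
    counts = count_bases(seq)
    return (counts.get("G", 0) + counts.get("C", 0)) / len(seq)


# Hypothetical sequences, for illustration only.
sequences = ["ATGC", "GGCC", "ATAT"]
for seq in sequences:
    print(seq, count_bases(seq), gc_content(seq))
```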
Python II (5 weeks @ 2 hrs per week)
Part II of the Python series expands on basic programming topics and explores a common concept in modern software development called object-oriented design, driven again by examples of biological data analysis. Although we will not cover the subtopics of inheritance or public/private variables, we will discuss the use of objects (and their blueprints: classes) in encapsulating functionality into easily used blocks of code that more closely match the biological concepts at hand. Other topics in this area include APIs and syntactic sugar. Finally, we’ll use these ideas to explore creating and using packages such as the BioPython package. Prior experience with the Unix/Linux command line is recommended (previously or simultaneously taking Intro to Unix/Linux satisfies).
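The idea of a class as a blueprint that bundles data with related functionality can be sketched as follows. The DNASequence class here is hypothetical and written only for illustration; it is not part of BioPython or of the course material. The __len__ method is one small example of syntactic sugar, since it lets callers write len(seq).

```python
class DNASequence:
    """A named DNA sequence with a few common operations attached."""

    # Lookup table for complementing bases; only A, C, G, T are handled.
    COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

    def __init__(self, name, bases):
        self.name = name
        self.bases = bases.upper()

    def __len__(self):
        # Syntactic sugar: enables len(seq) on DNASequence objects.
        return len(self.bases)

    def reverse_complement(self):
        """Return a new DNASequence holding the reverse complement."""
        rc = "".join(self.COMPLEMENT[b] for b in reversed(self.bases))
        return DNASequence(self.name + "_rc", rc)


seq = DNASequence("example", "ATGCCG")
rc = seq.reverse_complement()
print(len(seq), rc.name, rc.bases)
```

In this sketch, a base outside A, C, G, and T raises a KeyError; handling ambiguous bases is left out to keep the example short.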
Analysis
Genotyping By Sequencing
GBS (10 weeks @ 2 hrs per week)
This course provides a general introduction to, and practical experience with, Genotyping By Sequencing (GBS). After a general overview, participants will gain hands-on experience with the basic concepts of the command line, RStudio, and accessing and utilizing a computing infrastructure before exploring the methodology associated with GBS and other types of restriction-based sequencing techniques (e.g. RAD-seq). Starting with raw sequence data, students will then work through a series of exercises to generate and analyze test GBS data.
RNA-Sequencing
RNA-Seq (10 weeks @ 2 hrs per week)
This course provides an introduction to, and practical experience with, the computational component of bulk RNA-sequencing. After a general overview, participants will obtain a working introduction to the command line, RStudio, and accessing and utilizing a computing infrastructure. Students will then work through a series of exercises: cleaning raw FASTQ files, aligning reads to a reference genome, and quasi-mapping reads to a transcriptome / de novo assembly, followed by data visualization and Differential Gene Expression analysis.
Environmental Sequence Analysis
Environmental Sequence Analysis (10 weeks @ 2 hrs per week)
This course provides practical experience with 16S rRNA amplicon sequencing and shotgun metagenomics. After a general overview, participants will be given a working introduction to the command line, RStudio, and accessing and utilizing a computing infrastructure. Beginning with raw sequence data, students will then work through a series of hands-on exercises for profiling 16S rRNA sequences (using MOTHUR & DADA2) and determining the taxonomy and functional composition of metagenomic samples (using METAPHLAN2 & HUMANN2).
REDCap
Basic Workshop
Introduction to REDCap (1.5 hour workshop)
Learn how to:
- Collect sensitive data for research such as personal health/identifying information
- Build data collection instruments, collect data, view data, stats, and charts, and export data.
- Configure longitudinal projects that reuse data collection elements, making it easier to analyze data changes over time
- Safely make changes to your database without disrupting existing data
Intermediate Workshop
Surveys in REDCap (1.5 hour workshop)
Learn how to:
- Create projects with an XML file
- Deploy surveys for participant data capture and distribute them by different methods
- Use electronic consent (e-consent)
- Use action tags to customize and improve the data entry experience
Primer for Computational Biology
Read Online at Open Oregon State
Open-Access & Free
Order Print Copy: Amazon, OSU Press
Scribble in the Margins
A Primer for Computational Biology aims to provide life scientists and students the skills necessary for research in a data-rich world. The text covers accessing and using remote servers via the command-line, writing programs and pipelines for data analysis, and provides useful vocabulary for interdisciplinary work. The book is broken into three parts:
Part I, Introduction to Unix/Linux: The command-line is the natural environment of scientific computing, and this part covers a wide range of topics, including logging in, working with files and directories, installing programs and writing scripts, and the powerful “pipe” operator for file and data manipulation.
Part II, Programming in Python: Python is both a premier language for learning and a common choice in scientific software development. This part covers the basic concepts in programming (data types, if-statements and loops, functions) via examples of DNA-sequence analysis. This part also covers more complex subjects in software development such as objects and classes, modules, and APIs.
Part III, Programming in R: The R language specializes in statistical data analysis, and is also quite useful for visualizing large datasets. This third part covers the basics of R as a programming language (data types, if-statements, functions, loops and when to use them) as well as techniques for large-scale, multi-test analyses. Other topics include S3 classes and data visualization with ggplot2.
About the Author
Shawn T. O’Neil earned a BS in computer science from Northern Michigan University, and later an MS and PhD in the same subject from the University of Notre Dame. His past and current research focuses on bioinformatics. O’Neil has developed and taught several courses in computational biology at both Notre Dame and Oregon State University.