bioinformatics by explaining the exercises and introducing
students to the bioinformatics resources and tools they will
use, namely the NCBI database, NCBI ORFfinder, NCBI BLAST,
and Microscope (MaGe). The tutorial video (Supplemental
Material) should help teachers in this task and assist students
throughout the exercises.
III. Bioinformatics exercises (estimated time: 70 minutes): Students
carry out the exercises autonomously while the teacher supervises, identifying difficulties and answering questions.
IV. Discussion of the results (estimated time: 20 minutes): The
class discusses the results obtained in each exercise and assay to
draw conclusions. Finally, the teacher might challenge the
students to explore other case studies and study different genomic regions. In addition, students’ initiative to explore the
bioinformatics resources on their own should not be overlooked, particularly given the resources’ user-friendly and
intuitive interfaces. In fact, during the pilot trial, we observed
that some students took the initiative to extend their in silico
experiments beyond the assigned activities by pursuing their
own research queries, for instance: “What is the size of the
genome of a spider?”; “Are virus genomes such as HIV also
available at this database?”; or “Let’s search for the gene coding
The bioinformatics-based activities described below are structured
as four distinct exercises (see the video tutorial): 1 – getting
the target DNA sequence; 2 – looking for ORFs; 3 – deciding which of
the retrieved ORFs are likely to be genes; and 4 – analyzing the gene(s)
identified within their expected genomic context. Bearing in mind that
laboratory-based activities should meet the curricular agenda, and
given that the lac operon is a common example for
teaching gene regulation, the query DNA sequence chosen to exemplify these exercises corresponds to lacI and its flanking regions. Furthermore, to frame the bioinformatics-based activities in an inquiry-based
approach, all exercises start with a guiding question.
1. Getting the DNA Sequence
This initial exercise aims to answer the question “How does one
access a comprehensive gene bank database to obtain the specific
DNA sequence to be studied?”
1.1. Access NCBI website: http://www.ncbi.nlm.nih.gov/.
1.2. Choose Genome in the menu next to the search box.
1.3. Search for “E. coli”.
1.4. At the top of the new page, select Reference Genome
by clicking the E. coli strain K12.
1.5. Scroll down and click on the accession number corresponding to E. coli strain K12 under Reference Sequence to retrieve the full genome sequence.
1.6. Choose the FASTA format.
1.7. Open the selection box Change region shown and type
in the coordinates 366001–368041.
1.8. Copy the sequence, then paste and save it in a Word or Notepad document.
Learning objectives. Through the exploration of the comprehensive
bioinformatics database NCBI, students learn
• how the database is organized, its complexity, and
• how to search for DNA sequences and gene sequences for different organisms.
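Steps 1.5 to 1.7 can also be scripted. The following Python sketch is only an illustration under stated assumptions: it builds a request URL for the NCBI E-utilities `efetch` service and checks how many bases the chosen coordinates should return. The accession NC_000913.3 (E. coli K-12 MG1655) is not given in the text and is assumed here; it should be checked against the accession number shown in step 1.5, because coordinates can shift between sequence versions. The function names are hypothetical.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Assumption: RefSeq accession for E. coli K-12 MG1655; verify it against
# the accession number displayed on the NCBI page in step 1.5.
ACCESSION = "NC_000913.3"


def region_length(start: int, stop: int) -> int:
    """Number of bases in a 1-based, inclusive coordinate range."""
    return stop - start + 1


def build_efetch_url(accession: str, start: int, stop: int) -> str:
    """Build an efetch URL that requests the region as FASTA text."""
    params = {
        "db": "nuccore",       # nucleotide database
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": start,    # same coordinates as in step 1.7
        "seq_stop": stop,
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(region_length(366001, 368041), "bases expected")
    print(build_efetch_url(ACCESSION, 366001, 368041))
    # Downloading requires network access, for example:
    # from urllib.request import urlopen
    # with urlopen(build_efetch_url(ACCESSION, 366001, 368041)) as reply:
    #     fasta_text = reply.read().decode()
```

Comparing the length of the downloaded sequence with `region_length(366001, 368041)` is a quick way for students to confirm that the region was copied completely.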
2. Deconstructing the DNA Sequence
This exercise teaches students how to go from an
unknown DNA sequence to the identification of hypothetical coding
sequences. Students are introduced to the notion of ORFs, a term
often absent from elementary and high
school biology curricula but instrumental for answering
the question “How is a new DNA sequence deconstructed?”
2.1. Access NCBI ORFfinder: http://www.ncbi.nlm.nih.gov/orf-finder/.
2.2. Paste the sequence previously saved in the Word or Notepad
document into the text box provided.
2.3. Choose the genetic code: 11. Bacterial, Archaeal and Plant Plastid.
2.4. Choose the option “‘ATG’ and alternative initiation codons”.
2.5. Click Submit.
2.6. Analyze the obtained results (Figure 2).
Learning objectives. With this exercise, students
• recognize the six different reading frames in a DNA sequence,
• understand the meaning of ORF, and
• recognize the importance of start and stop codons for identifying all possible ORFs.
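To make the idea of six reading frames concrete, the following Python sketch scans both strands of a sequence for stretches that begin with a start codon and end at the first in-frame stop codon. It is a simplified illustration, not the algorithm used by NCBI ORFfinder: the set of alternative start codons (GTG, TTG), the minimum length, and all function names are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumption: ATG plus two alternative bacterial initiation codons.
START_CODONS = {"ATG", "GTG", "TTG"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=75):
    """List ORFs as (strand, frame, start, end) tuples.

    start and end are 0-based positions on the scanned strand; end is
    exclusive and includes the stop codon. min_len is in nucleotides.
    """
    seq = seq.upper()
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Walk through the strand one codon at a time in this frame.
            for i in range(frame, len(strand_seq) - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if end - start >= min_len:
                        orfs.append((strand, frame + 1, start, end))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    toy = "CCATGAAATTTGGGTAACC"  # contains ATG AAA TTT GGG TAA
    for orf in find_orfs(toy, min_len=9):
        print(orf)
```

Lowering `min_len` shows how many short ORFs appear by chance, which sets up the question addressed in the next exercise.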
3. Which ORFs Are Potential Genes?
The Basic Local Alignment Search Tool (BLAST) is a powerful algorithm
capable of finding similarities between a query sequence (DNA or
protein) and the sequences available in databases
(Altschul et al., 1990). Using this application, the students can address
the following questions: “Which of the ORFs retrieved in the previous
exercise are probable genes? Which ORFs are unlikely functional coding sequences?”
3.1. Select one ORF to study (example: ORF 28).
3.2. Start a BLAST search of the selected ORF by clicking on BLAST.
3.3. On the new page that opens, click on BLAST.
3.4. Identify the gene (Figure 3).
3.5. Repeat the procedure for other ORFs and analyze the results.
Learning objectives. Students learn that
• not all DNA sequences bracketed by a start and a stop codon
(i.e., ORFs) are coding sequences,
• ORFs can be located in different reading frames and oriented in
either direction, and
• scrutinizing gene banks by a BLAST search is an effective
approach for identifying putative genes among retrieved ORFs.
Students can discuss possible scenarios to explain a BLAST
search in which no similarities are found.
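The notion of "similarity" between a query and database sequences can be illustrated with a small local alignment score. The Python sketch below uses an exhaustive Smith-Waterman style calculation, which is not how BLAST works internally (BLAST relies on heuristics and reports statistical measures such as E-values); it is swapped in here only because it is short enough to read in class. The scoring values, the toy database, and the function names are assumptions made for this example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (Smith-Waterman style)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


def rank_hits(query, database):
    """Score the query against every entry and return hits, best first."""
    scored = [(name, local_alignment_score(query, seq))
              for name, seq in database.items()]
    return sorted(scored, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy database; real searches use the NCBI databases.
    toy_database = {"entry_A": "GGACGTGG", "entry_B": "CCCCCCCC"}
    for name, score in rank_hits("TTACGTTT", toy_database):
        print(name, score)
```

A query that scores close to zero against every entry corresponds to the "no similarities found" scenario that students are invited to discuss.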