Tutorial

This tutorial walks you through the features WebGBrowse provides for configuring your GFF3 dataset for GBrowse. We assume that you already know how to load your data into a standard GFF3 (Generic Feature Format version 3) file. It is important to remember that WebGBrowse is designed for GBrowse versions 1.69 and above, which use GFF3 format files (.gff3), not the GFF version 2 (.gff) files used by earlier GBrowse versions. The GBrowse Administration Tutorial (Lincoln Stein, 2008) has a section introducing the GFF3 data file format for first-time users.

Table of Contents

  1. Requirements
  2. Sample Dataset
  3. Providing Input to WebGBrowse
  4. Configuration Panel
  5. Working with Tracks
  6. Adding Links
  7. Displaying Features with Multiple Subparts
  8. Displaying Protein-Coding Genes
  9. Reading Frames
  10. Displaying Quantitative Data
  11. Displaying Grouped Features
  12. Using a Pre-existing Configuration File as a Template
  13. Conclusion

1. Requirements

You can run WebGBrowse either directly from our server (http://webgbrowse.cgb.indiana.edu) or from a local installation. For details on downloading the standalone version and for installation instructions, please visit the software page. The online version of WebGBrowse requires no software on the client side other than a web browser. WebGBrowse has been tested with IE 7.0 and above, Mozilla Firefox and Safari on Windows, Mac and Linux platforms.

2. Sample Dataset

In this tutorial, we will use a simulated Volvox genome annotation dataset in GFF3 format, derived from the GBrowse Administration Tutorial on the GMOD website. The dataset contains several feature types (column 3) that can be configured to illustrate the default generic display, multi-segmented features, protein-coding genes, reading frames, grouped features and quantitative data. This tutorial shows you how to achieve each of these configurations using WebGBrowse. The dataset also includes the genome sequence in FASTA format at the end of the file; GBrowse associates this sequence with the corresponding features being displayed. Download and save the sample dataset (volvox.gff3) to your hard drive. Then open a browser window and go to the WebGBrowse homepage (http://webgbrowse.cgb.indiana.edu/), where you can upload your dataset (Figure 1).


Figure 1: WebGBrowse Input Form
 

3. Providing Input to WebGBrowse

Click the "Browse..." button in the "GFF3 File" section of the WebGBrowse Input Form and select the volvox.gff3 file you downloaded earlier. The input can also be provided as a compressed file in .zip or .gz format, provided the compressed file contains a single .gff3 file and no sub-directories. If you have an existing configuration file, you can use it as a template instead of starting from scratch. We recommend using only configuration files generated by WebGBrowse as templates; this tutorial includes a section on using a pre-existing configuration file as a template. For now, we will build the configuration from scratch. You can also enter your email address, which is optional. Providing an email address lets you perform the configuration over multiple sessions, receive the configuration results by email and keep track of all your previous submissions. To learn more, please see the section on Tracking the Previous Submissions to WebGBrowse. After entering the required input, click submit. Once it has validated the contents of the input, WebGBrowse will navigate to the "Configuration Panel".
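
If you want to check a .zip archive before uploading it, the rule above (a single .gff3 file, no sub-directories) can be sketched in Python as follows. This is only an illustration of the stated rule, not WebGBrowse's own validation code; the function names are hypothetical, and .gz files are not covered.

```python
import io
import zipfile


def single_gff3_member(zip_bytes):
    """Return the name of the only .gff3 file in a zip archive, or raise ValueError."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Ignore directory entries; keep regular files only.
        files = [info.filename for info in archive.infolist() if not info.is_dir()]
    if len(files) != 1:
        raise ValueError(f"expected exactly one file, found {len(files)}")
    name = files[0]
    if "/" in name:
        raise ValueError(f"file is inside a sub-directory: {name}")
    if not name.lower().endswith(".gff3"):
        raise ValueError(f"not a .gff3 file: {name}")
    return name


def make_zip(members):
    """Build an in-memory zip from {name: text}; used here for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in members.items():
            archive.writestr(name, text)
    return buffer.getvalue()


print(single_gff3_member(make_zip({"volvox.gff3": "##gff-version 3\n"})))
```

To check a real file, read it with open(path, "rb").read() and pass the bytes to single_gff3_member.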

4. Configuration Panel

The configuration panel is where you add, edit or delete feature tracks for the GBrowse display of your dataset. If you provided your email address in the WebGBrowse input, you will see a "Save Progress" button at the top right corner of the configuration panel, which lets you perform the configuration process over multiple sessions. You can save your progress at any stage by clicking that button; WebGBrowse will then email you a link that you can click to resume your work. You can provide a short description for your dataset in the "Description" field. For now, enter "Sample Volvox Dataset for WebGBrowse" as the description.

WebGBrowse populates the "Feature" listbox in the "Add New Track" section (Figure 2) with the unique features found in your dataset, each of which can be configured as an individual GBrowse track. The list is derived from the values in the feature type column (column 3) and the source column (column 2) of the data. Every unique feature type (column 3) forms one list item. In addition, when a feature type occurs with more than one source, each type:source (column 3:column 2) combination forms a list item of its own. For example, in the sample Volvox genome data, the feature type "remark" makes a single list item (since all the corresponding rows have the same source), whereas the feature type "CDS" makes four list items: "CDS", "CDS:example", "CDS:predicted" and "CDS:exonerate".
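
The rule described above can be sketched in a few lines of Python. This is a minimal illustration of how such a list could be derived from columns 2 and 3, not WebGBrowse's actual implementation; the helper names and the shortened sample rows are assumptions made for the example.

```python
from collections import defaultdict


def gff3_row(source, ftype):
    """Build a minimal nine-column GFF3 line (coordinates are placeholders)."""
    return "\t".join(["ctgA", source, ftype, "1", "100", ".", "+", ".", "Name=x"])


def feature_list(gff3_lines):
    """List each feature type, plus type:source items for multi-source types."""
    sources_by_type = defaultdict(list)
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # the sequence section follows; no more feature rows
        columns = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(columns) != 9:
            continue  # skip comments, directives and malformed rows
        source, ftype = columns[1], columns[2]
        if source not in sources_by_type[ftype]:
            sources_by_type[ftype].append(source)
    items = []
    for ftype, sources in sources_by_type.items():
        items.append(ftype)
        if len(sources) > 1:
            items.extend(f"{ftype}:{source}" for source in sources)
    return items


sample = [
    "##gff-version 3",
    gff3_row("example", "remark"),
    gff3_row("example", "remark"),
    gff3_row("example", "CDS"),
    gff3_row("predicted", "CDS"),
    gff3_row("exonerate", "CDS"),
]
print(feature_list(sample))
# ['remark', 'CDS', 'CDS:example', 'CDS:predicted', 'CDS:exonerate']
```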


Figure 2: "Add New Track" Section in the Configuration Panel
 

Next to the "Feature" list box, you will see a "Glyph" dropdown list (Figure 2), which contains the names of the glyphs available with WebGBrowse. A glyph determines the shape used to draw each genomic feature of a specific track in the GBrowse display. Each glyph presents a configurable set of parameters specific to its type, allowing further control over display properties such as height, color and width. The configuration panel is powered by a "Glyph Library", which stores the complete configuration details of the available glyphs. For more details on the glyph library, please visit the glyph library page. A short description and a sample image for the selected glyph are displayed next to the glyph dropdown list.

5. Working with Tracks

Adding Tracks
You can define GBrowse tracks by associating a feature to be displayed with a glyph from the glyph library. If you open the volvox.gff3 file, you will notice a series of features of type "remark", as shown below.

ctgA	example	remark	1659	1984	.	+	.	Name=f07;Note=This is an example
ctgA	example	remark	3014	6130	.	+	.	Name=f06;Note=This is another example
ctgA	example	remark	4715	5968	.	-	.	Name=f05;Note=Ok! Ok! I get the message.
ctgA	example	remark	13280	16394	.	+	.	Name=f08
ctgA	example	remark	15329	15533	.	+	.	Name=f10
ctgA	example	remark	19157	22915	.	-	.	Name=f13
...

Let's add a new track based on the feature "remark" using the "generic" glyph. In the configuration panel, select "remark" from the feature list box and "generic" from the glyph list box, and click "Add Track". This opens the "Generic Glyph Parameters" form (Figure 3).


Figure 3: Glyph Parameters Form
 

Change the "Key" to "ExampleFeatures" and "Glyph Background Color" to "navy". You may want to experiment with the other parameter values, including those available in the "Advanced" section. "Save and Continue" and "Cancel Changes" buttons are available at both the top and the bottom of the form. Once you are done setting the parameter values, click "Save and Continue" to add the new track and go back to the Configuration Panel. Since we now have a track added, you will notice a new section, "Tracks Added", extending the configuration panel (Figure 4). A "Configured Tracks" listbox shows the tracks added so far (currently only the "remark" track). Selecting a track displays its configuration in a "Configuration Settings" box on the right.
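
To give a rough idea of what a configured track looks like in a GBrowse configuration file, the following Python sketch renders a track stanza from the values chosen above. The stanza name "Track0001" is a hypothetical example of the "Track" plus four-digit pattern mentioned in Section 12, and the mapping of the form fields "Key" and "Glyph Background Color" to the GBrowse options key and bgcolor is our assumption; the stanza WebGBrowse actually generates may contain more options.

```python
def render_track(stanza_name, options):
    """Render one GBrowse-style track stanza as text."""
    width = max(len(name) for name in options)
    lines = [f"[{stanza_name}]"]
    for name, value in options.items():
        # Pad option names so the "=" signs line up.
        lines.append(f"{name.ljust(width)} = {value}")
    return "\n".join(lines)


remark_track = {
    "feature": "remark",       # feature selected in the "Feature" list box
    "glyph": "generic",        # glyph selected in the "Glyph" dropdown
    "bgcolor": "navy",         # assumed option behind "Glyph Background Color"
    "key": "ExampleFeatures",  # assumed option behind "Key"
}
print(render_track("Track0001", remark_track))
```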


Figure 4: "Tracks Added" Section in the Configuration Panel
 

Click the "Display in GBrowse 1.70" or the "Display in GBrowse 2.0" button to display the configured dataset in your desired version of GBrowse (Figure 5). A WebGBrowse Control Panel with "Edit Configuration" and "Download Configuration" buttons is provided at the top of the GBrowse display. These buttons let you go back to the configuration panel to make further changes, or download the generated configuration file. You may use the downloaded configuration file as a template when configuring other similar datasets.


Figure 5: GBrowse Display with WebGBrowse Control Panel
 

Modifying Tracks
You can edit or delete the added tracks by clicking the "Edit Track" or "Delete Track" button, respectively, in the configuration panel (Figure 4). Clicking the "Edit Track" button opens the configuration of the selected track in the Glyph Parameters form (Figure 3) for editing. There is no direct way to change the glyph associated with a track; if you decide to change it, you will need to delete the track by clicking the "Delete Track" button and add it again with the new glyph.

6. Adding Links

Let's examine a few more features by adding more tracks. From the configuration panel add a new track based on the feature "polypeptide_domain" and the glyph "span".

ctgA	example	polypeptide_domain	11911	15561	.	+	.	Name=m11;Note=kinase
ctgA	example	polypeptide_domain	13801	14007	.	-	.	Name=m05;Note=helix loop helix
ctgA	example	polypeptide_domain	14731	17239	.	-	.	Name=m14;Note=kinase
ctgA	example	polypeptide_domain	15396	16159	.	+	.	Name=m03;Note=zinc finger
ctgA	example	polypeptide_domain	17023	17675	.	+	.	Name=m08;Note=7-transmembrane
ctgA	example	polypeptide_domain	17667	17690	.	+	.	Name=m13;Note=DEAD box
...

In the "Span Glyph Parameters" form, change the "Key" to "Example Motifs" and "Glyph Height" to "5". The default value for "Link" is "AUTO", which makes GBrowse generate an automatic link to a helper script named "gbrowse_details". You can change it to any URL you wish. You can also include any of the built-in recognized variables, such as $name and $description. For the complete list of recognized variables, please see the sub-section "link" in the GBrowse Configuration HOWTO. For now, change the link to "http://www.google.com/search?q=$description" and click "Next". This makes GBrowse generate a link to a Google search page with the feature description as the search term.
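
The effect of such a link template can be imitated with Python's string.Template, which uses the same $variable notation. This is only a simulation to show what the resulting URL might look like; GBrowse performs the substitution itself, and both the URL-escaping step and the assumption that $description corresponds to the Note attribute are ours.

```python
from string import Template
from urllib.parse import quote_plus


def expand_link(link_template, feature):
    """Fill $name, $description, etc. in a link template with escaped values."""
    escaped = {name: quote_plus(value) for name, value in feature.items()}
    # safe_substitute leaves any unknown $variable untouched instead of failing.
    return Template(link_template).safe_substitute(escaped)


# Values taken from the polypeptide_domain record named m05 above.
motif = {"name": "m05", "description": "helix loop helix"}
print(expand_link("http://www.google.com/search?q=$description", motif))
# http://www.google.com/search?q=helix+loop+helix
```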

7. Displaying Features with Multiple Subparts

The sample dataset contains features of type "match", as shown below. You will notice that multiple lines carry the same ID; these lines represent the subparts of a single feature. Such features are best displayed using the "segments", "transcript" or "transcript2" glyphs.

ctgA	example	match	32329	32359	.	+	.	ID=match-seg01;Name=seg01;Note=This is a segment
ctgA	example	match	26122	26126	.	+	.	ID=match-seg02;Name=seg02
ctgA	example	match	26497	26869	.	+	.	ID=match-seg02;Name=seg02
ctgA	example	match	27201	27325	.	+	.	ID=match-seg02;Name=seg02
ctgA	example	match	27372	27433	.	+	.	ID=match-seg02;Name=seg02
ctgA	example	match	27565	27565	.	+	.	ID=match-seg02;Name=seg02
...
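
To make the role of the shared ID concrete, the short Python sketch below collects the lines above into one multi-part feature per ID. It is an illustration of the data layout only, not a description of how GBrowse loads the file; the helper names are hypothetical.

```python
from collections import defaultdict


def match_line(start, end, attributes):
    """Build one tab-separated GFF3 'match' line like those shown above."""
    return "\t".join(["ctgA", "example", "match", str(start), str(end), ".", "+", ".", attributes])


def parse_attributes(column9):
    """Turn 'ID=x;Name=y' into {'ID': 'x', 'Name': 'y'}."""
    return dict(pair.split("=", 1) for pair in column9.split(";") if pair)


def group_segments(gff3_lines):
    """Map each ID to the sorted (start, end) pairs of its subparts."""
    segments = defaultdict(list)
    for line in gff3_lines:
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue
        attributes = parse_attributes(columns[8])
        if "ID" in attributes:
            segments[attributes["ID"]].append((int(columns[3]), int(columns[4])))
    return {feature_id: sorted(parts) for feature_id, parts in segments.items()}


lines = [
    match_line(32329, 32359, "ID=match-seg01;Name=seg01;Note=This is a segment"),
    match_line(26122, 26126, "ID=match-seg02;Name=seg02"),
    match_line(26497, 26869, "ID=match-seg02;Name=seg02"),
    match_line(27201, 27325, "ID=match-seg02;Name=seg02"),
    match_line(27372, 27433, "ID=match-seg02;Name=seg02"),
    match_line(27565, 27565, "ID=match-seg02;Name=seg02"),
]
for feature_id, parts in group_segments(lines).items():
    print(feature_id, len(parts), "part(s)", parts[0][0], "to", parts[-1][1])
```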

Add a track using the feature "match" and the glyph "segments". Change the "Key" to "Example Alignments", "Glyph Background Color" to "lightgrey" and "Connector Type" to "Solid" (Figure 6). You can also experiment with the glyphs "transcript" and "transcript2" to display features with subparts. For more details on these glyphs, please visit the glyph library page.


Figure 6: Segments Glyph Displaying Features with Multiple Subparts
 

8. Displaying Protein-Coding Genes

The sample dataset we downloaded earlier contains a "gene" feature, which uses a three-tiered structure to represent the gene, descending from gene to mRNA to CDS and UTR features. The parts are tied together using the ID and Parent attributes. The gene in our dataset is named EDEN, and its three splice forms are named EDEN.1, EDEN.2 and EDEN.3.

ctgA	example	gene	1050	9000	.	+	.	ID=EDEN;Name=EDEN;Note=protein kinase
ctgA	example	mRNA	1050	9000	.	+	.	ID=EDEN.1;Parent=EDEN;Name=EDEN.1;Note=Eden splice form 1;Index=1
ctgA	example	five_prime_UTR	1050	1200	.	+	.	Parent=EDEN.1
ctgA	example	CDS	1201	1500	.	+	0	Parent=EDEN.1
ctgA	example	CDS	3000	3902	.	+	0	Parent=EDEN.1
ctgA	example	CDS	5000	5500	.	+	0	Parent=EDEN.1
ctgA	example	CDS	7000	7608	.	+	0	Parent=EDEN.1
ctgA	example	three_prime_UTR	7609	9000	.	+	.	Parent=EDEN.1

ctgA	example	mRNA	1050	9000	.	+	.	ID=EDEN.2;Parent=EDEN;Name=EDEN.2;Note=Eden splice form 2;Index=1
ctgA	example	five_prime_UTR	1050	1200	.	+	.	Parent=EDEN.2
ctgA	example	CDS	1201	1500	.	+	0	Parent=EDEN.2
ctgA	example	CDS	5000	5500	.	+	0	Parent=EDEN.2
ctgA	example	CDS	7000	7608	.	+	0	Parent=EDEN.2
ctgA	example	three_prime_UTR	7609	9000	.	+	.	Parent=EDEN.2

ctgA	example	mRNA	1300	9000	.	+	.	ID=EDEN.3;Parent=EDEN;Name=EDEN.3;Note=Eden splice form 3;Index=1
ctgA	example	five_prime_UTR	1300	1500	.	+	.	Parent=EDEN.3
ctgA	example	five_prime_UTR	3000	3300	.	+	.	Parent=EDEN.3
ctgA	example	CDS	3301	3902	.	+	0	Parent=EDEN.3
ctgA	example	CDS	5000	5500	.	+	1	Parent=EDEN.3
ctgA	example	CDS	7000	7600	.	+	1	Parent=EDEN.3
ctgA	example	three_prime_UTR	7601	9000	.	+	.	Parent=EDEN.3
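
The way ID and Parent tie these records together can be sketched in Python as follows. The tuples are transcribed from the records above (UTR rows are left out to keep the example short), and the helper names are hypothetical; this only illustrates the three-tiered structure and is not how GBrowse stores features.

```python
from collections import defaultdict

# (type, start, end, ID or None, Parent or None)
RECORDS = [
    ("gene", 1050, 9000, "EDEN", None),
    ("mRNA", 1050, 9000, "EDEN.1", "EDEN"),
    ("CDS", 1201, 1500, None, "EDEN.1"),
    ("CDS", 3000, 3902, None, "EDEN.1"),
    ("CDS", 5000, 5500, None, "EDEN.1"),
    ("CDS", 7000, 7608, None, "EDEN.1"),
    ("mRNA", 1050, 9000, "EDEN.2", "EDEN"),
    ("CDS", 1201, 1500, None, "EDEN.2"),
    ("CDS", 5000, 5500, None, "EDEN.2"),
    ("CDS", 7000, 7608, None, "EDEN.2"),
    ("mRNA", 1300, 9000, "EDEN.3", "EDEN"),
    ("CDS", 3301, 3902, None, "EDEN.3"),
    ("CDS", 5000, 5500, None, "EDEN.3"),
    ("CDS", 7000, 7600, None, "EDEN.3"),
]


def build_children(records):
    """Index records by the ID named in their Parent attribute."""
    children = defaultdict(list)
    for record in records:
        if record[4] is not None:
            children[record[4]].append(record)
    return children


def describe(record, children, depth=0):
    """Return indented text lines for a record and everything below it."""
    ftype, start, end, feature_id, _parent = record
    lines = [f"{'  ' * depth}{ftype} {start}..{end} {feature_id or ''}".rstrip()]
    for child in (children.get(feature_id, []) if feature_id else []):
        lines.extend(describe(child, children, depth + 1))
    return lines


children = build_children(RECORDS)
print("\n".join(describe(RECORDS[0], children)))
```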

The "gene" glyph can be used to depict alternatively spliced protein-coding genes. If you want to display the individual transcripts, use the "Processed Transcript" glyph (see below). Let's add a track for the feature "gene" and the glyph "gene" (Figure 7). Change the "Key" to "Protein-coding Genes", "Glyph Background Color" to "peachpuff" and "Category" to "Genes", and check the option "Show Label Transcripts". Changing the category places the track under a different group in the GBrowse display. If you want the track to look like the UCSC Genome Browser, set "Glyph Background Color" to "black" and, in the "Transcript" section, change "UTR Color" to "black" and check the options "Make UTRs thinner" and "Decorate Introns".


Figure 7: Gene Glyph Displaying Alternatively Spliced Protein-Coding Genes
 

The "gene" glyph can also be used to illustrate simpler genes. For example, choose the feature "CDS:predicted" and the glyph "gene". Change the "Key" to "Predicted genes", "Glyph Background Color" to "white" and "Category" to "Genes" (Figure 8).

ctgA	predicted	CDS	10000	11500	.	+	0	Name=Apple1;Note=A for Apple
ctgA	predicted	CDS	13000	13800	.	+	0	ID=cds-Apple2;Name=Apple2;Note=AnotherApple
ctgA	predicted	CDS	15000	15500	.	+	1	ID=cds-Apple2;Name=Apple2
ctgA	predicted	CDS	17000	17200	.	+	2	ID=cds-Apple2;Name=Apple2


Figure 8: Gene Glyph Displaying Simpler Genes
 

Our sample dataset also contains a set of features as shown below which represent a single transcript that has both coding and non-coding regions. This transcript can be displayed using the "processed_transcript" glyph (also known as "so_transcript").

ctgA	exonerate	mRNA	17400	23000	.	+	.	ID=rna-Apple3;Name=Apple3;Note=Predicted
ctgA	exonerate	UTR	17400	17999	.	+	.	Parent=rna-Apple3
ctgA	exonerate	CDS	18000	18800	.	+	0	Parent=rna-Apple3
ctgA	exonerate	CDS	19000	19500	.	+	1	Parent=rna-Apple3
ctgA	exonerate	CDS	21000	21200	.	+	2	Parent=rna-Apple3
ctgA	exonerate	UTR	21201	23000	.	+	.	Parent=rna-Apple3

Choose the feature "mRNA:exonerate" and the glyph "processed_transcript". Change the "Key" to "Exonerate Predictions", "Glyph Background Color" to "beige" and "Category" to "Genes" (Figure 9).


Figure 9: processed_transcript Glyph Displaying Simpler Genes
 

9. Reading Frames

For the EDEN gene shown in Figure 7, you can use the "cds" glyph to show whether the reading frame of the third exon is preserved between EDEN.1 and EDEN.3. Add a track using the feature "mRNA" and the glyph "cds". Change the "Key" to "Frame usage" and "Category" to "Genes", and check the option "Ignore features with empty phase value" in the "Advanced" section (Figure 10). You can also change the colors of the different frames in the "Frames" section.
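
The phase values in column 8 of the CDS records in Section 8 carry the information needed for this comparison. In GFF3, the phase is the number of bases to skip at the start of a CDS segment to reach the first base of the next codon. The Python sketch below recomputes the phases of EDEN.1 and EDEN.3 and compares the codon register of the exon at 5000..5500. It assumes plus-strand features whose first CDS segment has phase 0, and it illustrates the arithmetic only, not the internals of the "cds" glyph.

```python
def next_phase(start, end, phase):
    """Phase of the following CDS segment, given this segment's span and phase."""
    length = end - start + 1
    return (3 - (length - phase) % 3) % 3


def expected_phases(cds_segments):
    """Phases for plus-strand (start, end) segments listed in 5' to 3' order."""
    phases, phase = [], 0
    for start, end in cds_segments:
        phases.append(phase)
        phase = next_phase(start, end, phase)
    return phases


def codon_register(start, phase):
    """Genomic position of the first full codon, modulo 3."""
    return (start + phase) % 3


EDEN_1 = [(1201, 1500), (3000, 3902), (5000, 5500), (7000, 7608)]
EDEN_3 = [(3301, 3902), (5000, 5500), (7000, 7600)]

print(expected_phases(EDEN_1))  # [0, 0, 0, 0]
print(expected_phases(EDEN_3))  # [0, 1, 1]

# The exon at 5000..5500 has phase 0 in EDEN.1 and phase 1 in EDEN.3.
same_frame = codon_register(5000, 0) == codon_register(5000, 1)
print("frame preserved:", same_frame)  # frame preserved: False
```

The recomputed phases agree with column 8 of the sample records, and the differing register indicates that this exon is read in a different frame in the two splice forms.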


Figure 10: cds Glyph Displaying Reading Frames
 

10. Displaying Quantitative Data

You can plot microarray or tiling array data using the "xyplot" glyph. The sample dataset has lines with feature type "microarray_oligo" representing microarray data.

ctgA	affy	microarray_oligo	1	100	281	.	.	Name=Expt1
ctgA	affy	microarray_oligo	101	200	183	.	.	Name=Expt1
ctgA	affy	microarray_oligo	201	300	213	.	.	Name=Expt1
ctgA	affy	microarray_oligo	301	400	191	.	.	Name=Expt1
ctgA	affy	microarray_oligo	401	500	288	.	.	Name=Expt1
ctgA	affy	microarray_oligo	501	600	184	.	.	Name=Expt1
...
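
In these records the measured value is stored in the score column (column 6), and that is the value plotted on the y axis. As a hedged illustration of the data that feeds the plot, the Python sketch below extracts the scores and their range from the six lines shown above; the function name is hypothetical, and GBrowse does this work itself when drawing the track.

```python
def read_scores(gff3_lines, feature_type):
    """Return (start, end, score) for every scored row of the given type."""
    points = []
    for line in gff3_lines:
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] != feature_type:
            continue
        if columns[5] == ".":
            continue  # "." means the row has no score
        points.append((int(columns[3]), int(columns[4]), float(columns[5])))
    return points


# The six microarray_oligo rows shown above, rebuilt with explicit tabs.
sample = [
    "\t".join(["ctgA", "affy", "microarray_oligo", str(start), str(start + 99),
               str(score), ".", ".", "Name=Expt1"])
    for start, score in [(1, 281), (101, 183), (201, 213), (301, 191), (401, 288), (501, 184)]
]

points = read_scores(sample, "microarray_oligo")
scores = [score for _start, _end, score in points]
print(len(points), "points, score range", min(scores), "to", max(scores))
```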

Add a track with the feature "microarray_oligo" and the glyph "xyplot". Change the "Key" to "Transcriptional Profile", "Graph Type" to "Boxes" and "Scale Position" to "right" (Figure 11).


Figure 11: xyplot Glyph Displaying Microarray Data
 

11. Displaying Grouped Features

Finally, let us see how to group 5' and 3' EST reads. The sample dataset contains lines with the feature type "EST_match", as shown below. The name of each feature identifies the read to which it belongs. For example, the EST_match agt830.5 is a 5' read, whereas agt830.3 is a 3' read.

ctgA	est	EST_match	1050	1500	.	+	.	ID=Match1;Name=agt830.5;Target=agt830.5 1 451
ctgA	est	EST_match	3000	3202	.	+	.	ID=Match1;Name=agt830.5;Target=agt830.5 452 654
ctgA	est	EST_match	5410	5500	.	-	.	ID=Match2;Name=agt830.3;Target=agt830.3 505 595
ctgA	est	EST_match	7000	7503	.	-	.	ID=Match2;Name=agt830.3;Target=agt830.3 1 504 
ctgA	est	EST_match	1050	1500	.	+	.	ID=Match3;Name=agt221.5;Target=agt221.5 1 451
ctgA	est	EST_match	5000	5500	.	+	.	ID=Match3;Name=agt221.5;Target=agt221.5 452 952
ctgA	est	EST_match	7000	7300	.	+	.	ID=Match3;Name=agt221.5;Target=agt221.5 953 1253
...

Add a track with the feature "EST_match" and the glyph "segments". Change the "Key" to "ESTs", "Glyph Background Color" to "orange", "Glyph Height" to "6" and "Connector Type" to "Solid", and proceed to the GBrowse display by clicking one of the "Display in GBrowse" buttons. You will notice that the segments of the individual features are joined together, but the 5' and 3' reads of the same EST are not grouped yet (Figure 12).


Figure 12: EST Match Display - Ungrouped
 

Let us group the ESTs with the same name together. From the WebGBrowse Control Panel, click the "Edit Configuration" button to return to the configuration panel. In the "Tracks Added" section, select the "EST_match" track from the "Configured Tracks" list and click the "Edit Track" button. In the "Segments Glyph Parameters" form, change the value of "Group Pattern" in the "Advanced" section to "/\.[53]$/". This value is a regular expression that causes ESTs whose names differ only in the trailing ".3" or ".5" to be grouped together (Figure 13). Save and proceed to the GBrowse display.
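
To see what the pattern matches, you can try it on the read names from the sample data. The Python sketch below strips the matched suffix and groups the names by what remains; it demonstrates the regular expression only, and treating the remainder as the group name is our reading of the option rather than a description of GBrowse's code.

```python
import re
from collections import defaultdict

# Same expression as the "Group Pattern" value, without the surrounding slashes.
GROUP_PATTERN = re.compile(r"\.[53]$")


def group_reads(names):
    """Group read names by their base name, ignoring a trailing .5 or .3."""
    groups = defaultdict(list)
    for name in names:
        groups[GROUP_PATTERN.sub("", name)].append(name)
    return dict(groups)


print(group_reads(["agt830.5", "agt830.3", "agt221.5"]))
# {'agt830': ['agt830.5', 'agt830.3'], 'agt221': ['agt221.5']}
```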


Figure 13: EST Match Display - Grouped Features
 

12. Using a Pre-existing Configuration File as a Template

You can use a pre-existing configuration file, such as the one downloaded from the WebGBrowse Control Panel in the GBrowse display (see Figure 5), as a template when configuring similar datasets, instead of starting from scratch every time. Download your configuration file by clicking the "Download Configuration" button in the WebGBrowse Control Panel (Figure 5), or download the sample configuration file (volvox.conf) provided in the WebGBrowse Input Form (see Figure 1) on the WebGBrowse homepage (http://webgbrowse.cgb.indiana.edu/). Then, on the homepage, upload the volvox.gff3 and volvox.conf files in the "GFF3 File" and "Configuration file to be used as template" fields, respectively. In the "Tracks Added" section of the "Configuration Panel", you will see the pre-configured tracks based on the template. WebGBrowse ignores any tracks in the template associated with features that are missing from the uploaded dataset. You can add more tracks or modify the existing tracks as described earlier.

We recommend using only configuration files generated by WebGBrowse as templates. Advanced users who want to use GBrowse configuration files from other sources may try renaming the track stanzas to the pattern "Track" followed by a four-digit number, for example "[Track9999]". However, renaming the track stanzas does not guarantee that those files will work correctly as templates.
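
For advanced users who want to try this, the renaming can be scripted. The Python sketch below is a rough, hypothetical helper: it renames every stanza header except [GENERAL] and [TRACK DEFAULTS], numbers the tracks from 0001, and leaves headers containing ":" untouched. These choices are our assumptions, and, as noted above, the result is not guaranteed to work as a template.

```python
import re

STANZA_HEADER = re.compile(r"^\[([^\]]+)\]\s*$")
KEEP = {"GENERAL", "TRACK DEFAULTS"}  # non-track stanzas left as they are


def rename_track_stanzas(conf_text):
    """Rename track stanza headers to [Track0001], [Track0002], ..."""
    renamed, counter = [], 0
    for line in conf_text.splitlines():
        match = STANZA_HEADER.match(line)
        if match and match.group(1) not in KEEP and ":" not in match.group(1):
            counter += 1
            line = f"[Track{counter:04d}]"
        renamed.append(line)
    return "\n".join(renamed)


sample_conf = "\n".join([
    "[GENERAL]",
    "description = Sample Volvox Dataset",
    "",
    "[ExampleFeatures]",
    "feature = remark",
    "glyph = generic",
    "",
    "[Motifs]",
    "feature = polypeptide_domain",
    "glyph = span",
])
print(rename_track_stanzas(sample_conf))
```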

13. Conclusion

This tutorial is intended to get you started with WebGBrowse. We encourage you to explore the other glyphs and parameters that are not discussed above. Note that the role of WebGBrowse is limited to generating the configuration information and hosting it on the GBrowse server; the actual display is generated by the GBrowse software itself. For details on the features available with GBrowse, please refer to the GBrowse user tutorial from OpenHelix. For questions, bug reports, support requests and suggestions, please visit our support page.

Page last updated by Ram Podicheti on January 25, 2010 11:14 AM EST