REST API data interface

Contents

Why you may not want to use this API
What is REST?
What is JSON output data?
What is the access URL?
What type of data can be accessed?
Endpoint functions
Parameters to endpoint functions
Required and optional parameters
Supported track types
Using the API on mirrors and local installations
Example data access, list functions
Example data access, getData functions
Example data access, Search functions
Error return examples
Practical examples

Why you may not want to use this API

Genomic data sets are large, and computational biologists generally need all of the available data for their analyses. Web APIs, as a technology, were designed for retrieving relatively small pieces of data, often from JavaScript. Consequently, a web API may not be the best way to get data from the UCSC Genome Browser.

For this reason, we also provide alternative methods to access our data:

All of these options offer faster bulk downloads and are often easier to parse from scripting languages. If, however, you write JavaScript clients or need only a few features within a given range, then the endpoints documented on this page may address your needs. If your use case requires an option or endpoint not listed below, contact us and we can try to implement the feature.

Conditions of use

For details, please refer to the Genome Browser Conditions of Use page. If you plan to run a query that you think may be excessive, contact UCSC first to avoid having your access to the REST API temporarily restricted.

What are REST and JSON?

REST is an acronym for REpresentational State Transfer. It defines architectural guidelines for how an API operates. See also: Principles of REST. Like most APIs, ours returns data in JSON format. JSON is a data transfer syntax almost identical to the JavaScript object and array syntax. See also: JSON Introduction

What is the access URL?

This access URL: https://api.genome.ucsc.edu/ is used to access the endpoint functions. For example:

    wget -O- 'https://api.genome.ucsc.edu/list/publicHubs'
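
The same request can be made from a script. The following is a minimal Python sketch, not an official client: it uses only the standard library, and the names API_BASE and fetch_json are hypothetical helpers introduced here for illustration. The opener argument exists only so the function can be tried without network access.

```python
import json
import urllib.request

API_BASE = "https://api.genome.ucsc.edu"  # access URL given on this page


def fetch_json(path, opener=urllib.request.urlopen):
    """Request an endpoint path such as '/list/publicHubs' and decode the JSON reply.

    `opener` is any callable that takes a URL and returns a context manager
    with a read() method; by default it is urllib.request.urlopen.
    """
    url = API_BASE + path
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (performs a real network request, so it is left commented out):
# reply = fetch_json("/list/publicHubs")
# print(sorted(reply.keys()))
```

Printing the top-level keys first is a cheap way to see how a reply is organized before writing code that depends on particular fields.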

What type of data can be accessed?

The following data sets can be accessed at this time:

Note: BLAT also supports programmatic URL queries which return in JSON format. See our BLAT FAQ for more info.

Endpoint functions to return data

The URL https://api.genome.ucsc.edu/ is used to access the endpoint functions. For example:

    curl -L 'https://api.genome.ucsc.edu/list/ucscGenomes'

Parameters to endpoint functions

Parameters are appended to the endpoint URL after a question mark ?, and multiple parameters are separated by semicolons ;. For example:

    https://api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chrM
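
As an illustration of this URL layout, the sketch below assembles a request URL from an endpoint and keyword parameters. build_url is a hypothetical helper, not part of any UCSC library. It inserts values verbatim, as the examples on this page do, so values containing unusual characters may need encoding that is not attempted here.

```python
API_BASE = "https://api.genome.ucsc.edu"


def build_url(endpoint, **params):
    """Join an endpoint and its parameters as '?name=value;name=value'."""
    # Keyword arguments keep their call order, so the URL lists the
    # parameters in the order they were passed.
    query = ";".join(f"{name}={value}" for name, value in params.items())
    if not query:
        return f"{API_BASE}{endpoint}"
    return f"{API_BASE}{endpoint}?{query}"


print(build_url("/getData/sequence", genome="hg38", chrom="chrM"))
# prints https://api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chrM
```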

Required and optional parameters

Endpoint function  Required                                   Optional
/list/publicHubs   (none)                                     (none)
/list/ucscGenomes  (none)                                     (none)
/list/hubGenomes   hubUrl                                     (none)
/list/files        genome                                     format=text, maxItemsOutput
/list/tracks       genome or (hubUrl and genome)              trackLeavesOnly=1
/list/chromosomes  genome or (hubUrl and genome)              track
/list/schema       (genome or (hubUrl and genome)) and track  (none)
/getData/sequence  (genome or (hubUrl and genome)) and chrom  start, end, revComp=1
/getData/track     (genome or (hubUrl and genome)) and track  chrom, (start and end), maxItemsOutput, jsonOutputArrays
/search            search and genome                          categories=helpDocs, categories=publicHubs, categories=trackDb

The hubUrl and genome parameters are required together to specify a unique genome in an assembly or track hub. The genome for a track hub will usually be a UCSC database genome. Assembly hubs will have their own unique genome sequences. Specify genome without a hubUrl to refer to a UCSC Genome Browser assembly.

Using the chrom=<name> parameter will limit the request to the single specified chromosome. To limit the request to a specific position, both start=4321 and end=5678 must be given together. Using the revComp=1 parameter returns the reverse complement.

The /list/files endpoint only works for UCSC-hosted genome assemblies, not for externally hosted assembly hubs.

Any extra parameters not allowed in a function will be flagged as an error.
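
Because the server rejects unexpected parameters, it can help to check a request on the client side before sending it. The sketch below transcribes the required and optional parameters listed in this section into a lookup table. SPEC and check_params are hypothetical names introduced here, and the check is only an approximation of the server's own validation: hubUrl is treated as an optional companion to genome wherever a hub may be used.

```python
# endpoint -> (required parameter names, optional parameter names)
SPEC = {
    "/list/publicHubs": (set(), set()),
    "/list/ucscGenomes": (set(), set()),
    "/list/hubGenomes": ({"hubUrl"}, set()),
    "/list/files": ({"genome"}, {"format", "maxItemsOutput"}),
    "/list/tracks": ({"genome"}, {"hubUrl", "trackLeavesOnly"}),
    "/list/chromosomes": ({"genome"}, {"hubUrl", "track"}),
    "/list/schema": ({"genome", "track"}, {"hubUrl"}),
    "/getData/sequence": ({"genome", "chrom"},
                          {"hubUrl", "start", "end", "revComp"}),
    "/getData/track": ({"genome", "track"},
                       {"hubUrl", "chrom", "start", "end",
                        "maxItemsOutput", "jsonOutputArrays"}),
    "/search": ({"search", "genome"}, {"categories"}),
}


def check_params(endpoint, params):
    """Return a list of problems found in `params` (empty list if none)."""
    if endpoint not in SPEC:
        return [f"unknown endpoint: {endpoint}"]
    required, optional = SPEC[endpoint]
    given = set(params)
    problems = [f"missing required parameter: {name}"
                for name in sorted(required - given)]
    problems += [f"parameter not allowed: {name}"
                 for name in sorted(given - required - optional)]
    # start and end limit a request to a position only when given together.
    if ("start" in given) != ("end" in given):
        problems.append("start and end must be given together")
    return problems


print(check_params("/getData/sequence", {"genome": "hg38", "start": 4321}))
# prints ['missing required parameter: chrom', 'start and end must be given together']
```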

Supported track types for getData functions

Using the API on mirrors and local installations

In order to access the API from a mirror installation or one of the UCSC official mirrors, the complete URL with the cgi-bin should be used:

Any of the functions described on this page can then be appended to that URL:

Example data access

Your web browser can be configured to interpret JSON data and display it in a convenient browsing format. Firefox has this function built in; other browsers have add-ons that can be enabled to format JSON data. With your browser thus configured, the following links demonstrate the functions of the API.

Listing functions

  1. list public hubs - api.genome.ucsc.edu/list/publicHubs
  2. list UCSC database genomes - api.genome.ucsc.edu/list/ucscGenomes
  3. list genomes from specified hub - api.genome.ucsc.edu/list/hubGenomes?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt
  4. list tracks from specified hub and genome - api.genome.ucsc.edu/list/tracks?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ
  5. list tracks from UCSC database genome - api.genome.ucsc.edu/list/tracks?genome=hg38
  6. list chromosomes from UCSC database genome - api.genome.ucsc.edu/list/chromosomes?genome=hg38
  7. list chromosomes from specified track in UCSC database genome - api.genome.ucsc.edu/list/chromosomes?genome=hg38;track=gold
  8. list chromosomes from assembly hub genome -
    api.genome.ucsc.edu/list/chromosomes?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ
  9. list chromosomes from specified track in assembly hub genome -
    api.genome.ucsc.edu/list/chromosomes?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ;track=assembly
  10. list schema from specified track in UCSC database genome -
    api.genome.ucsc.edu/list/schema?genome=hg38;track=knownGene
  11. list download files for UCSC GenArk genome 'GCF_000955945.1' in JSON format, limited to 5 items of output -
    api.genome.ucsc.edu/list/files?genome=GCF_000955945.1;maxItemsOutput=5
  12. list download files for UCSC genome 'hs1' in plain text format, limited to 5 items of output -
    api.genome.ucsc.edu/list/files?genome=hs1;format=text;maxItemsOutput=5

getData functions

  1. Get DNA sequence from specified chromosome in UCSC database genome -
    api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chrM
  2. Get DNA sequence from specified chromosome and start,end coordinates in UCSC database genome -
    api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chrM;start=4321;end=5678
  3. Get the reverse complement of the DNA sequence from specified chromosome and start,end coordinates in UCSC database genome -
    api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chrM;start=4321;end=5678;revComp=1
  4. Get DNA sequence from a track hub where 'genome' is a UCSC database -
    api.genome.ucsc.edu/getData/sequence?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=mm10;chrom=chrM;start=4321;end=5678
  5. Get DNA sequence from specified chromosome and start,end coordinates in an assembly hub genome -
    api.genome.ucsc.edu/getData/sequence?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ;chrom=chr1;start=4321;end=5678
  6. Get track data for specified track in UCSC database genome -
    api.genome.ucsc.edu/getData/track?genome=hg38;track=gold;maxItemsOutput=100
  7. Get track data for specified track and chromosome in UCSC database genome -
    api.genome.ucsc.edu/getData/track?genome=hg38;track=gold;chrom=chrM
  8. Get track data for specified track, chromosome and start,end coordinates in UCSC database genome -
    api.genome.ucsc.edu/getData/track?genome=hg38;track=gold;chrom=chr1;start=47000;end=48000
  9. Get track data for specified track in an assembly hub genome -
    api.genome.ucsc.edu/getData/track?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ;track=assembly
  10. Get track data for specified track and chromosome in an assembly hub genome -
    api.genome.ucsc.edu/getData/track?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ;track=assembly;chrom=chr1
  11. Get track data for specified track in a track hub -
    api.genome.ucsc.edu/getData/track?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ;track=ensGene
  12. Get track data for specified track and chromosome in a track hub -
    api.genome.ucsc.edu/getData/track?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ;track=ensGene;chrom=chr1
  13. Wiggle track data for specified track, chromosome with start and end limits in an assembly hub genome -
    api.genome.ucsc.edu/getData/track?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ;track=gc5Base;chrom=chr1;start=4321;end=5678
  14. Wiggle track data for specified track in a UCSC database genome -
    api.genome.ucsc.edu/getData/track?genome=galGal6;track=gc5BaseBw;maxItemsOutput=100
  15. bigBed data from a UCSC database, chrom and start,end limits -
    api.genome.ucsc.edu/getData/track?genome=galGal6;track=ncbiRefSeqOther;chrom=chr1;start=750000;end=55700000
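
The /getData/track examples above differ only in which parameters are present. The sketch below is one way to compose them programmatically. track_url is a hypothetical helper, and the parameter order simply mirrors the example links; this page does not say whether the order matters to the server.

```python
API_BASE = "https://api.genome.ucsc.edu"


def track_url(genome, track, hub_url=None, chrom=None,
              start=None, end=None, max_items=None):
    """Compose a /getData/track URL in the same layout as the examples above."""
    if (start is None) != (end is None):
        raise ValueError("start and end must be given together")
    parts = []
    if hub_url is not None:
        parts.append(("hubUrl", hub_url))  # hub URL is inserted verbatim
    parts += [("genome", genome), ("track", track)]
    if chrom is not None:
        parts.append(("chrom", chrom))
    if start is not None:
        parts += [("start", start), ("end", end)]
    if max_items is not None:
        parts.append(("maxItemsOutput", max_items))
    query = ";".join(f"{name}={value}" for name, value in parts)
    return f"{API_BASE}/getData/track?{query}"


print(track_url("hg38", "gold", chrom="chr1", start=47000, end=48000))
# prints https://api.genome.ucsc.edu/getData/track?genome=hg38;track=gold;chrom=chr1;start=47000;end=48000
```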

Search functions

  1. Search matches within a UCSC Genome Browser genome assembly -
    api.genome.ucsc.edu/search?search=brca1&genome=hg38
  2. Search matches within a UCSC Genome Browser genome assembly and restrict the search within the UCSC Genome Browser help documentation -
    api.genome.ucsc.edu/search?search=bigBed&genome=hg38&categories=helpDocs
  3. Search matches within a UCSC Genome Browser genome assembly and restrict the search within the UCSC Genome Browser Public Hubs -
    api.genome.ucsc.edu/search?search=cerebellum&genome=hg38&categories=publicHubs
  4. Search matches within a UCSC Genome Browser genome assembly and restrict the search within the track database (trackDb) settings -
    api.genome.ucsc.edu/search?search=signal&genome=hg38&categories=trackDb
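
The search links above separate parameters with an ampersand, which matches what Python's standard urlencode produces. The sketch below builds such a URL. search_url is a hypothetical helper, and the escaping that urlencode applies to spaces and special characters in the search term is standard URL encoding, assumed rather than documented here to be accepted by the server.

```python
from urllib.parse import urlencode

API_BASE = "https://api.genome.ucsc.edu"


def search_url(term, genome, categories=None):
    """Build a /search URL; `categories` is one of the values listed above."""
    params = {"search": term, "genome": genome}
    if categories is not None:
        params["categories"] = categories
    # urlencode joins the pairs with '&' and escapes each value.
    return f"{API_BASE}/search?{urlencode(params)}"


print(search_url("bigBed", "hg38", categories="helpDocs"))
# prints https://api.genome.ucsc.edu/search?search=bigBed&genome=hg38&categories=helpDocs
```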

Error return examples

  1. Request track data for non-existent chromosome in an assembly hub genome -
    api.genome.ucsc.edu/getData/track?hubUrl=http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt;genome=CAST_EiJ;track=assembly;chrom=chrI;start=43521;end=54321
  2. Request track data from a restricted track. See FAQ -
    api.genome.ucsc.edu/getData/track?genome=hg19;track=decipherSnvs
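
A script should be prepared for such error replies. The sketch below summarizes a failed reply from its HTTP status and body. It assumes, without confirmation from this page, that an error reply is a JSON object carrying its message in an "error" field, and it falls back to a generic message when that is not the case. describe_failure is a hypothetical helper.

```python
import json


def describe_failure(status, body):
    """Summarize a failed reply: HTTP status plus the JSON 'error' text if present.

    `body` is the raw reply as bytes. The 'error' field name is an assumption.
    """
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        payload = None
    if isinstance(payload, dict) and "error" in payload:
        return f"HTTP {status}: {payload['error']}"
    return f"HTTP {status}: (no JSON error message)"


# With urllib, a failed request raises urllib.error.HTTPError, which exposes
# the status as `exc.code` and the body through `exc.read()`:
#
#     try:
#         urllib.request.urlopen(url)
#     except urllib.error.HTTPError as exc:
#         print(describe_failure(exc.code, exc.read()))
```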

Practical examples

Looking up the schema of a specific track

The easiest way to get the schema for a track is to use the /list/schema function. This can be used on both file tracks and table tracks:

Request schema for the hg38 CRISPR Targets track -
api.genome.ucsc.edu/list/schema?genome=hg38;track=crisprAllTargets

There is a second, indirect way to get the schema, which may be preferable in certain cases. When querying track data with the /getData/track function, add the jsonOutputArrays=1 parameter to include the track schema in the output. The schema describes each field present in the track. The data itself is then returned as JSON arrays.

Request data from hg38 gold track in array type -
api.genome.ucsc.edu/getData/track?genome=hg38;track=gold;chrom=chrM;jsonOutputArrays=1
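
When data arrives as arrays, each row has to be paired with the field names from the schema before it can be read by name. The sketch below shows that pairing. rows_to_records is a hypothetical helper, the column names are passed in explicitly because this page does not show where they sit in the reply, and the sample row holds placeholder values for illustration only.

```python
def rows_to_records(column_names, rows):
    """Pair each array-style row with the column names to get one dict per item."""
    records = []
    for row in rows:
        if len(row) != len(column_names):
            raise ValueError("row length does not match the number of columns")
        records.append(dict(zip(column_names, row)))
    return records


# Placeholder values, not real API output.
names = ["chrom", "chromStart", "chromEnd"]
rows = [["chrM", 0, 100]]
print(rows_to_records(names, rows))
# prints [{'chrom': 'chrM', 'chromStart': 0, 'chromEnd': 100}]
```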

Hide track container information with trackLeavesOnly parameter

When using the /list/tracks function to see the available tracks in an assembly, it can be useful to return all tracks at the same hierarchical level. By default, composite tracks and supertracks have their subtracks nested below them. Passing the trackLeavesOnly=1 parameter hides the container information and displays all tracks and subtracks at the same level.

In the following example, the first link does not include the trackLeavesOnly parameter. The output can be compared to the second link to see the difference, which can be observed in the conservation track. In the first link, the multiz20way track is nested within the cons20way track. In the second link, however, the multiz20way subtrack is seen at an equivalent level with all other tracks, and the container, cons20way, is not present in the list.

Request available tracks in the rn6 genome -
api.genome.ucsc.edu/list/tracks?genome=rn6

Request available tracks in the rn6 genome, hiding container information -
api.genome.ucsc.edu/list/tracks?genome=rn6;trackLeavesOnly=1

Requesting track data with over one million (1M) items in output

Certain tracks may contain over 1M items. When these tracks are queried using the /getData/track function, only the first million items are returned. The API assumes this default value of 1M unless a different value (less than 1M) is specified with the parameter maxItemsOutput.

One of these tracks is the knownGene track for hg19. Removing the maxItemsOutput parameter from the following link will lead to a 384 MB download and may cause certain web browsers to time out.

Request items in the knownGene track of hg19 (remove the maxItemsOutput parameter to return up to the 1M maximum) -
api.genome.ucsc.edu/getData/track?genome=hg19;track=knownGene;maxItemsOutput=5

There are different ways around this item limit, depending on how many items are in the track. For the knownGene track, breaking the request into per-chromosome queries using the chrom parameter will suffice. To get a listing of the chrom names, and which chroms have data for that track, use the /list/chromosomes function.

Request listing of chroms that have data for the knownGene track in hg19 -
api.genome.ucsc.edu/list/chromosomes?genome=hg19;track=knownGene

With the list of chrom names that have data, the /getData/track function can be used again while specifying the chrom parameter. In the following example, chr1 is queried and the itemsReturned field shows a total of 7967 items in the output, well below the 1M limit, meaning all data for chr1 has been extracted. This can then be repeated for all chroms of interest.

Request items in knownGene track of hg19, only for chr1 -
api.genome.ucsc.edu/getData/track?genome=hg19;track=knownGene;chrom=chr1
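
The per-chromosome approach can be scripted. The sketch below builds one URL per chrom name and checks the itemsReturned field of a decoded reply against the 1M default limit. per_chrom_urls and may_be_truncated are hypothetical helpers, and the chrom names are assumed to have been collected beforehand from a /list/chromosomes reply, whose layout is not shown on this page.

```python
API_BASE = "https://api.genome.ucsc.edu"
ITEM_LIMIT = 1_000_000  # default maximum stated on this page


def per_chrom_urls(genome, track, chroms):
    """Build one /getData/track URL per chromosome name."""
    return [
        f"{API_BASE}/getData/track?genome={genome};track={track};chrom={chrom}"
        for chrom in chroms
    ]


def may_be_truncated(reply, limit=ITEM_LIMIT):
    """True when itemsReturned has reached the limit, so items may be missing."""
    return reply["itemsReturned"] >= limit


for url in per_chrom_urls("hg19", "knownGene", ["chr1", "chrM"]):
    print(url)

# The chr1 example above reports 7967 items, well below the limit.
print(may_be_truncated({"itemsReturned": 7967}))  # prints False
```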

For tracks with even more items, such as SNP tracks, the query can be broken down further using the start and end parameters.
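
One way to break a chromosome into smaller queries is to tile it with fixed-size windows. The sketch below generates the windows and renders each as the chrom, start and end part of a query string. It assumes that start is 0-based and end is exclusive, which this page does not state, and the function names and window size are hypothetical. An item that spans a window boundary may be returned by both neighboring queries, so results may need de-duplication.

```python
def windows(chrom_size, step):
    """Yield (start, end) pairs that tile 0..chrom_size in steps of `step`."""
    if step <= 0:
        raise ValueError("step must be positive")
    for start in range(0, chrom_size, step):
        # The last window is shortened so it does not run past the chromosome.
        yield start, min(start + step, chrom_size)


def window_params(chrom, chrom_size, step):
    """Render each window as the 'chrom=...;start=...;end=...' part of a query."""
    return [f"chrom={chrom};start={s};end={e}"
            for s, e in windows(chrom_size, step)]


print(window_params("chr1", 10, 4))
# prints ['chrom=chr1;start=0;end=4', 'chrom=chr1;start=4;end=8', 'chrom=chr1;start=8;end=10']
```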