ilmnbslookup

Look up Illumina BeadStudio target (probe) sequence and annotation information

Syntax

AnnotStruct = ilmnbslookup(AnnotationFile, ID)
AnnotStruct = ilmnbslookup(AnnotationFile, ID, 'LookUpField', LookUpFieldValue)

Input Arguments

AnnotationFile

Character vector or string specifying a file name or a path and file name of an Illumina® annotation file (CSV, BGX, or TXT format). If you specify only a file name, that file must be on the MATLAB® search path or in the current folder.

Tip

You can download Illumina annotation files, such as HumanRef-8_V3_0_R0_11282963_A.bgx, from the Illumina Web site.

ID

Character vector, string, string vector, or cell array of character vectors specifying the unique identifier of one or more targets (probes) on an Illumina microarray.

Tip

By default, ID must match the Search_key field in AnnotationFile. However, you can use an identifier that corresponds to any of the fields in AnnotationFile, then set the 'LookUpField' property appropriately. For example, if you want to look up annotation information for the targets (probes) on chromosome 7 only, set ID to '7', then set LookUpFieldValue to 'Chromosome'. For a list of all fields in AnnotationFile, see the following tables.

LookUpFieldValue

Field in AnnotationFile where ilmnbslookup looks for the specified ID. Default is the Search_key field.

Tip

Set this property so that it corresponds to the ID you use as input.
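The pairing of ID and 'LookUpField' can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes the annotation file named in this page has been downloaded to the current folder or the search path, and it reuses the Search_Key and Probe_Id values shown in the example output later on this page. The variable names are hypothetical.

```matlab
% Annotation file named in this page; it is not shipped with the toolbox.
annotFile = 'HumanRef-8_V3_0_R0_11282963_A.bgx';

% Identifiers taken from the example output on this page.
searchKey = 'ILMN_17375';     % value of the Search_Key field
probeId   = 'ILMN_2136495';   % value of the Probe_Id field

if exist(annotFile, 'file') == 2
    % Default lookup: ID is matched against the search key field.
    byKey = ilmnbslookup(annotFile, searchKey);

    % Same kind of lookup, but ID is matched against the Probe_Id field.
    byProbe = ilmnbslookup(annotFile, probeId, 'LookUpField', 'Probe_Id');

    % All targets on chromosome 7, as described in the ID tip.
    chr7 = ilmnbslookup(annotFile, '7', 'LookUpField', 'Chromosome');

    disp(byProbe.Search_Key)
else
    warning('Annotation file %s was not found; skipping lookups.', annotFile);
end
```

If the identifier and the lookup field do not correspond (for example, a Probe_Id value passed without setting 'LookUpField'), the lookup is not expected to find the intended target.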

Output Arguments

AnnotStruct

Structure containing the probe sequence and annotation information for the one or more targets (probes) specified by ID, read from AnnotationFile, an Illumina annotation file.

AnnotStruct contains the same fields as AnnotationFile. The fields are described in the two tables in the Description section.

Description

AnnotStruct = ilmnbslookup(AnnotationFile, ID) returns AnnotStruct, a structure containing the probe sequence and annotation information for the one or more targets (probes) specified by ID. The information is read from AnnotationFile, an Illumina annotation file (CSV, BGX, or TXT format).

AnnotStruct contains the same fields as AnnotationFile. The fields are described in the following two tables.

Structure Created from Illumina CSV Annotation File

| Field | Description |
| --- | --- |
| Search_key | Internal identifier for the target, useful for custom design array |
| Target | Unique identifier for the target |
| ProbeId | Illumina probe identifier |
| Gid | GenBank® identifier for the gene |
| Transcript | Illumina internal transcript identifier |
| Accession | GenBank accession number for the gene |
| Symbol | Typically, the gene symbol |
| Type | Probe type |
| Start | Starting position of the probe sequence in the GenBank record |
| Probe_Sequence | Sequence of the probe |
| Definition | Definition field from the GenBank record |
| Ontology | Gene Ontology terms associated with the gene |
| Synonym | Synonyms for the gene (from the GenBank record) |

Structure Created from a BGX or TXT Annotation File

| Field | Description |
| --- | --- |
| Accession | GenBank accession number for the gene |
| Array_Address_Id | Decoder identifier |
| Chromosome | Chromosome on which the gene is located |
| Cytoband | Cytogenetic banding region of the chromosome on which the gene associated with the target is located |
| Definition | Definition field from the GenBank record |
| Entrez_Gene_ID | Entrez Gene database identifier for the gene |
| GI | GenBank identifier for the gene |
| ILMN_Gene | Illumina internal gene symbol |
| Obsolete_Probe_Id | Probe identifier before BGX annotation files |
| Ontology_Component | Gene Ontology cellular components associated with the gene |
| Ontology_Function | Gene Ontology molecular functions associated with the gene |
| Ontology_Process | Gene Ontology biological processes associated with the gene |
| Probe_Chr_Orientation | Orientation of the probe on the NCBI genome build |
| Probe_Coordinates | Genomic position of the probe on the NCBI genome build |
| Probe_Id | Illumina probe identifier |
| Probe_Sequence | Sequence of the probe |
| Probe_Start | Start position of the probe relative to the 5' end of the source transcript sequence |
| Probe_Type | Information about what the probe is targeting |
| Protein_Product | NCBI protein accession number |
| RefSeq_ID | Identifier from the NCBI RefSeq database |
| Reporter_Composite_map | Information associated with control probes |
| Reporter_Group_Name | Information associated with control probes |
| Reporter_Group_id | Information associated with control probes |
| Search_Key | Internal identifier for the target, useful for custom design array |
| Source | Source from which the transcript sequence was obtained |
| Source_Reference_ID | Source's identifier |
| Species | Species associated with the gene |
| Symbol | Typically, the gene symbol |
| Synonyms | Synonyms for the gene (from the GenBank record) |
| Transcript | Illumina internal transcript identifier |
| Unigene_ID | Identifier from the NCBI UniGene database |

AnnotStruct = ilmnbslookup(AnnotationFile, ID, 'LookUpField', LookUpFieldValue) looks for ID in the annotation file in the field specified by LookUpFieldValue. Default is the Search_key field.

Examples

Note

The gene expression file, TumorAdjacent-probe-raw.txt, and the annotation file, HumanRef-8_V3_0_R0_11282963_A.bgx, used in the following examples are not provided with the Bioinformatics Toolbox™ software.

Example 10. Look Up Annotation Information for a Single Target (Probe)
  1. Read the contents of a tab-delimited file exported from the Illumina BeadStudio™ software into a MATLAB structure.

    ilmnStruct = ilmnbsread('TumorAdjacent-probe-raw.txt')
    
    ilmnStruct = 
    
                 Header: [1x1 struct]
               TargetID: {22184x1 cell}
            ColumnNames: {1x37 cell}
                   Data: [22184x37 double]
        TextColumnNames: {1x23 cell}
               TextData: {22184x23 cell}
  2. Find the number of the Search_key column in the TextColumnNames cell array, which is returned in the ilmnStruct structure by the ilmnbsread function.

    srchCol = find(strcmpi('Search_Key',ilmnStruct.TextColumnNames))
    
    srchCol =
    
         1
  3. Look up the probe sequence and annotation information for the 10th target in ilmnStruct, using the annotation file, HumanRef-8_V3_0_R0_11282963_A.bgx.

    annotation = ilmnbslookup('HumanRef-8_V3_0_R0_11282963_A.bgx',... 
                               ilmnStruct.TextData{10,srchCol})
    annotation = 
    
                     Accession: 'NM_144670.2'
              Array_Address_Id: '0004050154'
                    Chromosome: '12'
                      Cytoband: '12p13.31b'
                    Definition: 'Homo sapiens alpha-2-macroglobulin-like 1 (A2ML1), mRNA.'
                Entrez_Gene_ID: '144568'
                            GI: '74271844'
                     ILMN_Gene: 'A2ML1'
             Obsolete_Probe_Id: ''
            Ontology_Component: ''
             Ontology_Function: 'endopeptidase inhibitor activity [goid 4866] [evidence IEA]'
              Ontology_Process: ''
         Probe_Chr_Orientation: '+'
             Probe_Coordinates: '8920412-8920461'
                      Probe_Id: 'ILMN_2136495'
                Probe_Sequence: 'TGTAATCGCAGCCCCTTGGAAGGCCAAGGCAGGAGAATCGCCTCAACACT'
                   Probe_Start: '4889'
                    Probe_Type: 'S'
               Protein_Product: 'NP_653271.2'
                     RefSeq_ID: 'NM_144670.2'
        Reporter_Composite_map: ''
           Reporter_Group_Name: ''
             Reporter_Group_id: ''
                    Search_Key: 'ILMN_17375'
                        Source: 'RefSeq'
           Source_Reference_ID: 'NM_144670.2'
                       Species: 'Homo sapiens'
                        Symbol: 'A2ML1'
                      Synonyms: [1x141 char]
                    Transcript: 'ILMN_17375'
                    Unigene_ID: ''
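
The fields of the returned structure are character vectors, so numeric values such as positions must be converted before use. The following sketch copies two field values from the output above into a stand-in structure and checks that the genomic span in Probe_Coordinates matches the length of Probe_Sequence. It assumes the coordinates are a single inclusive start-stop range; other entries in an annotation file may use a different layout, which this sketch does not handle.

```matlab
% Stand-in structure holding values copied from the example output above.
annotation.Probe_Sequence    = 'TGTAATCGCAGCCCCTTGGAAGGCCAAGGCAGGAGAATCGCCTCAACACT';
annotation.Probe_Coordinates = '8920412-8920461';
annotation.Probe_Start       = '4889';

% Parse 'start-stop' into a numeric column vector [start; stop].
coords = sscanf(annotation.Probe_Coordinates, '%d-%d');

% Assumption: both endpoints are inclusive.
spanLength  = coords(2) - coords(1) + 1;
probeLength = length(annotation.Probe_Sequence);

% Convert the transcript-relative start position to a number.
probeStart = str2double(annotation.Probe_Start);

fprintf('Genomic span: %d bases, probe length: %d bases\n', ...
        spanLength, probeLength);
```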
Example 11. Look Up Annotation Information for a Subset of Targets (Probes)

Use the ilmnbslookup function with the 'LookUpField' property to look up the annotation information for all targets located on chromosome 12 in the annotation file, HumanRef-8_V3_0_R0_11282963_A.bgx.

chr12annotation = ilmnbslookup('HumanRef-8_V3_0_R0_11282963_A.bgx',...
                               '12','LookUpField','Chromosome')

chr12annotation = 

                 Accession: {1x1186 cell}
          Array_Address_Id: {1x1186 cell}
                Chromosome: {1x1186 cell}
                  Cytoband: {1x1186 cell}
                Definition: {1x1186 cell}
            Entrez_Gene_ID: {1x1186 cell}
                        GI: {1x1186 cell}
                 ILMN_Gene: {1x1186 cell}
         Obsolete_Probe_Id: {1x1186 cell}
        Ontology_Component: {1x1186 cell}
         Ontology_Function: {1x1186 cell}
          Ontology_Process: {1x1186 cell}
     Probe_Chr_Orientation: {1x1186 cell}
         Probe_Coordinates: {1x1186 cell}
                  Probe_Id: {1x1186 cell}
            Probe_Sequence: {1x1186 cell}
               Probe_Start: {1x1186 cell}
                Probe_Type: {1x1186 cell}
           Protein_Product: {1x1186 cell}
                 RefSeq_ID: {1x1186 cell}
    Reporter_Composite_map: ''
       Reporter_Group_Name: ''
         Reporter_Group_id: ''
                Search_Key: {1x1186 cell}
                    Source: {1x1186 cell}
       Source_Reference_ID: {1x1186 cell}
                   Species: {1x1186 cell}
                    Symbol: {1x1186 cell}
                  Synonyms: {1x1186 cell}
                Transcript: {1x1186 cell}
                Unigene_ID: {1x1186 cell}

The output structure indicates that there are 1,186 targets located on chromosome 12.
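
When a lookup matches several targets, most fields are cell arrays with one element per target, while some fields can remain empty character vectors, as the output above shows. A rough sketch of summarizing such a structure follows. The structure here is a small placeholder with invented values, not real annotation data, and the variable names are hypothetical; with a real result, chr12annotation would take the place of annot.

```matlab
% Placeholder structure shaped like a multi-match lookup result.
% All values are invented for illustration only.
annot.Probe_Id            = {'PROBE_1', 'PROBE_2', 'PROBE_3'};
annot.Symbol              = {'GENE_A',  'GENE_B',  'GENE_A'};
annot.Probe_Start         = {'120',     '45',      '980'};
annot.Reporter_Group_Name = '';   % empty char, as in the output above

% Number of matched targets and the distinct gene symbols among them.
numTargets    = numel(annot.Probe_Id);
uniqueSymbols = unique(annot.Symbol);

% Gather selected cell-array fields into a table, converting the
% start positions from text to numbers. Fields that are empty
% character vectors are left out because they have no per-target rows.
summary = table(annot.Probe_Id(:), annot.Symbol(:), ...
                str2double(annot.Probe_Start(:)), ...
                'VariableNames', {'Probe_Id', 'Symbol', 'Probe_Start'});

% Select the rows that belong to one gene symbol.
geneRows = summary(strcmp(summary.Symbol, 'GENE_A'), :);
```

Selecting fields explicitly, rather than converting the whole structure at once, sidesteps the fields that do not have one entry per target.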

Version History

Introduced in R2008a

See Also