ilmnbslookup

Look up Illumina BeadStudio target (probe) sequence and annotation information

Syntax

AnnotStruct = ilmnbslookup(AnnotationFile, ID)
AnnotStruct = ilmnbslookup(AnnotationFile, ID, 'LookUpField', LookUpFieldValue)

Input Arguments

`AnnotationFile`	Character vector or string specifying a file name or a path and file name of an Illumina® annotation file (CSV, BGX, or TXT format). If you specify only a file name, that file must be on the MATLAB® search path or in the current folder. Tip: You can download Illumina annotation files, such as `HumanRef-8_V3_0_R0_11282963_A.bgx`, from the Illumina Web site.
`ID`	Character vector, string, string vector, or cell array of character vectors containing the unique identifiers of one or more targets (probes) on an Illumina microarray. Tip: By default, `ID` must match the `Search_key` field in `AnnotationFile`. However, you can use an identifier that corresponds to any of the fields in `AnnotationFile`, then set the `'LookUpField'` property appropriately. For example, if you want to look up annotation information for the targets (probes) on chromosome 7 only, set `ID` to `'7'`, then set `LookUpFieldValue` to `'Chromosome'`. For a list of all fields in `AnnotationFile`, see the following tables.
`LookUpFieldValue`	Field in `AnnotationFile` where `ilmnbslookup` looks for the specified `ID`. Default is the `Search_key` field. Tip: Set this property so that it corresponds to the `ID` you use as input.

Output Arguments

AnnotStruct

Structure containing the probe sequence and annotation information for the one or more targets (probes) that match ID in AnnotationFile, an Illumina annotation file.

AnnotStruct contains the same fields as AnnotationFile. The fields are described in the two tables in the Description section.

Description

AnnotStruct = ilmnbslookup(AnnotationFile, ID) returns AnnotStruct, a structure containing probe sequence and annotation information for the one or more targets (probes) that match ID in AnnotationFile, an Illumina annotation file (CSV, BGX, or TXT format).

AnnotStruct contains the same fields as AnnotationFile. The fields are described in the following two tables.

Structure Created from Illumina CSV Annotation File

Field	Description
`Search_key`	Internal identifier for the target, useful for custom-design arrays
`Target`	Unique identifier for the target
`ProbeId`	Illumina probe identifier
`Gid`	GenBank® identifier for the gene
`Transcript`	Illumina internal transcript identifier
`Accession`	GenBank accession number for the gene
`Symbol`	Typically, the gene symbol
`Type`	Probe type
`Start`	Starting position of the probe sequence in the GenBank record
`Probe_Sequence`	Sequence of the probe
`Definition`	Definition field from the GenBank record
`Ontology`	Gene Ontology terms associated with the gene
`Synonym`	Synonyms for the gene (from the GenBank record)

Structure Created from a BGX or TXT Annotation File

Field	Description
`Accession`	GenBank accession number for the gene
`Array_Address_Id`	Decoder identifier
`Chromosome`	Chromosome on which the gene is located
`Cytoband`	Cytogenetic banding region of the chromosome on which the gene associated with the target is located
`Definition`	Definition field from the GenBank record
`Entrez_Gene_ID`	Entrez Gene database identifier for the gene
`GI`	GenBank identifier for the gene
`ILMN_Gene`	Illumina internal gene symbol
`Obsolete_Probe_Id`	Probe identifier used before BGX annotation files were introduced
`Ontology_Component`	Gene Ontology cellular components associated with the gene
`Ontology_Function`	Gene Ontology molecular functions associated with the gene
`Ontology_Process`	Gene Ontology biological processes associated with the gene
`Probe_Chr_Orientation`	Orientation of the probe on the NCBI genome build
`Probe_Coordinates`	Genomic position of the probe on the NCBI genome build
`Probe_Id`	Illumina probe identifier
`Probe_Sequence`	Sequence of the probe
`Probe_Start`	Start position of the probe relative to the 5' end of the source transcript sequence
`Probe_Type`	Information about what the probe is targeting
`Protein_Product`	NCBI protein accession number
`RefSeq_ID`	Identifier from the NCBI RefSeq database
`Reporter_Composite_map`	Information associated with control probes
`Reporter_Group_Name`	Information associated with control probes
`Reporter_Group_id`	Information associated with control probes
`Search_Key`	Internal identifier for the target, useful for custom-design arrays
`Source`	Source from which the transcript sequence was obtained
`Source_Reference_ID`	Source's identifier
`Species`	Species associated with the gene
`Symbol`	Typically, the gene symbol
`Synonyms`	Synonyms for the gene (from the GenBank record)
`Transcript`	Illumina internal transcript identifier
`Unigene_ID`	Identifier from the NCBI UniGene database

AnnotStruct = ilmnbslookup(AnnotationFile, ID, 'LookUpField', LookUpFieldValue) looks for ID in the annotation file in the field specified by LookUpFieldValue. Default is the Search_key field.
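
As a rough sketch of how the 'LookUpField' pair might be used with a field other than the default, the following snippet looks up a target by gene symbol. The field list is a hand-copied subset of the BGX/TXT table above, and normalizing the field name with validatestring is an illustrative choice, not something ilmnbslookup is known to require. The symbol 'A2ML1' is taken from the first example on this page. The call is guarded because the annotation file is not provided with the toolbox.

```matlab
% Subset of the BGX/TXT field names listed in the table above (copied by hand)
bgxFields = {'Accession','Chromosome','Entrez_Gene_ID', ...
             'Probe_Id','Search_Key','Symbol'};

% Normalize a user-supplied field name; validatestring errors if nothing matches
lookupField = validatestring('symbol', bgxFields);

% The annotation file must be downloaded separately, so guard the call
annotFile = 'HumanRef-8_V3_0_R0_11282963_A.bgx';
if exist(annotFile, 'file') == 2 && ~isempty(which('ilmnbslookup'))
    % Look for the ID 'A2ML1' in the Symbol field instead of Search_Key
    a2ml1 = ilmnbslookup(annotFile, 'A2ML1', 'LookUpField', lookupField);
end
```

Validating the field name first gives an early, readable error for a misspelled field before the annotation file is read.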

Examples

Note

The gene expression file, TumorAdjacent-probe-raw.txt, and the annotation file, HumanRef-8_V3_0_R0_11282963_A.bgx, used in the following examples are not provided with the Bioinformatics Toolbox™ software.

Example 10. Look Up Annotation Information for a Single Target (Probe)

Read the contents of a tab-delimited file exported from the Illumina BeadStudio™ software into a MATLAB structure.

```
ilmnStruct = ilmnbsread('TumorAdjacent-probe-raw.txt')

ilmnStruct = 

             Header: [1x1 struct]
           TargetID: {22184x1 cell}
        ColumnNames: {1x37 cell}
               Data: [22184x37 double]
    TextColumnNames: {1x23 cell}
           TextData: {22184x23 cell}
```

Find the number of the Search_key column in the TextColumnNames cell array, which is returned in the ilmnStruct structure by the ilmnbsread function.
```
srchCol = find(strcmpi('Search_Key',ilmnStruct.TextColumnNames))

srchCol =

     1
```

Look up the probe sequence and annotation information for the 10th target in ilmnStruct, using the annotation file, HumanRef-8_V3_0_R0_11282963_A.bgx.

```
annotation = ilmnbslookup('HumanRef-8_V3_0_R0_11282963_A.bgx',...
                           ilmnStruct.TextData{10,srchCol})

annotation = 

                 Accession: 'NM_144670.2'
          Array_Address_Id: '0004050154'
                Chromosome: '12'
                  Cytoband: '12p13.31b'
                Definition: 'Homo sapiens alpha-2-macroglobulin-like 1 (A2ML1), mRNA.'
            Entrez_Gene_ID: '144568'
                        GI: '74271844'
                 ILMN_Gene: 'A2ML1'
         Obsolete_Probe_Id: ''
        Ontology_Component: ''
         Ontology_Function: 'endopeptidase inhibitor activity [goid 4866] [evidence IEA]'
          Ontology_Process: ''
     Probe_Chr_Orientation: '+'
         Probe_Coordinates: '8920412-8920461'
                  Probe_Id: 'ILMN_2136495'
            Probe_Sequence: 'TGTAATCGCAGCCCCTTGGAAGGCCAAGGCAGGAGAATCGCCTCAACACT'
               Probe_Start: '4889'
                Probe_Type: 'S'
           Protein_Product: 'NP_653271.2'
                 RefSeq_ID: 'NM_144670.2'
    Reporter_Composite_map: ''
       Reporter_Group_Name: ''
         Reporter_Group_id: ''
                Search_Key: 'ILMN_17375'
                    Source: 'RefSeq'
       Source_Reference_ID: 'NM_144670.2'
                   Species: 'Homo sapiens'
                    Symbol: 'A2ML1'
                  Synonyms: [1x141 char]
                Transcript: 'ILMN_17375'
                Unigene_ID: ''
```
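
Every field in the returned structure is text, so numeric values such as positions need converting before use. The following is a minimal sketch, assuming the structure has the layout displayed above. It rebuilds three of the displayed fields by hand so that it runs without the annotation file, and it assumes that Probe_Coordinates is an inclusive start-end pair, which the page does not state.

```matlab
% Three of the fields displayed above, retyped by hand for illustration
annotation = struct( ...
    'Probe_Coordinates', '8920412-8920461', ...
    'Probe_Sequence', 'TGTAATCGCAGCCCCTTGGAAGGCCAAGGCAGGAGAATCGCCTCAACACT', ...
    'Probe_Start', '4889');

% Number of bases in the probe sequence
probeLength = length(annotation.Probe_Sequence);

% Parse 'start-end' into a numeric column vector [start; end]
coords = sscanf(annotation.Probe_Coordinates, '%d-%d');

% Genomic span, assuming both coordinates are inclusive
spanLength = coords(2) - coords(1) + 1;

% Convert the transcript-relative start position from text to a number
probeStart = str2double(annotation.Probe_Start);
```

For this entry, probeLength and spanLength are both 50, which is consistent with the inclusive-coordinate assumption.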

Example 11. Look Up Annotation Information for a Subset of Targets (Probes)

Use the ilmnbslookup function with the 'LookUpField' property to look up the annotation information for all targets located on chromosome 12 in the annotation file, HumanRef-8_V3_0_R0_11282963_A.bgx.

```
chr12annotation = ilmnbslookup('HumanRef-8_V3_0_R0_11282963_A.bgx',...
                               '12','LookUpField','Chromosome')

chr12annotation = 

                 Accession: {1x1186 cell}
          Array_Address_Id: {1x1186 cell}
                Chromosome: {1x1186 cell}
                  Cytoband: {1x1186 cell}
                Definition: {1x1186 cell}
            Entrez_Gene_ID: {1x1186 cell}
                        GI: {1x1186 cell}
                 ILMN_Gene: {1x1186 cell}
         Obsolete_Probe_Id: {1x1186 cell}
        Ontology_Component: {1x1186 cell}
         Ontology_Function: {1x1186 cell}
          Ontology_Process: {1x1186 cell}
     Probe_Chr_Orientation: {1x1186 cell}
         Probe_Coordinates: {1x1186 cell}
                  Probe_Id: {1x1186 cell}
            Probe_Sequence: {1x1186 cell}
               Probe_Start: {1x1186 cell}
                Probe_Type: {1x1186 cell}
           Protein_Product: {1x1186 cell}
                 RefSeq_ID: {1x1186 cell}
    Reporter_Composite_map: ''
       Reporter_Group_Name: ''
         Reporter_Group_id: ''
                Search_Key: {1x1186 cell}
                    Source: {1x1186 cell}
       Source_Reference_ID: {1x1186 cell}
                   Species: {1x1186 cell}
                    Symbol: {1x1186 cell}
                  Synonyms: {1x1186 cell}
                Transcript: {1x1186 cell}
                Unigene_ID: {1x1186 cell}
```

The output structure indicates that there are 1,186 targets located on chromosome 12.
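
When a lookup matches several targets, most fields are 1-by-N cell arrays, while some, such as the Reporter fields shown above, can be empty character vectors. The following sketch shows one possible way to keep only the targets of a given probe type. It uses a small mock structure with the same layout. The gene names, the probe type 'A', and the start positions are hypothetical placeholders, not values from the annotation file.

```matlab
% Mock structure mimicking the layout of chr12annotation (hypothetical values)
s.Symbol              = {'GENE_A', 'GENE_B', 'GENE_C'};
s.Probe_Type          = {'S', 'A', 'S'};
s.Probe_Start         = {'100', '250', '400'};
s.Reporter_Group_Name = '';    % not a cell array, as in the display above

n = numel(s.Symbol);                    % number of matched targets
keep = strcmp(s.Probe_Type, 'S');       % logical mask of targets to keep

% Apply the mask to every field that holds one entry per target
subset = s;
names = fieldnames(s);
for k = 1:numel(names)
    value = s.(names{k});
    if iscell(value) && numel(value) == n
        subset.(names{k}) = value(keep);
    end
end

% Convert the retained start positions from text to numbers
startPos = str2double(subset.Probe_Start);
```

Checking both iscell and the element count leaves fields such as Reporter_Group_Name untouched instead of raising an indexing error.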

Version History

Introduced in R2008a