cpgisland

Locate CpG islands in DNA sequence

    Description

    cpgStruct = cpgisland(SeqDNA) searches a DNA nucleotide sequence for CpG islands with a GC content greater than 50% and a CpGobserved/CpGexpected ratio greater than 0.6 (60%). It marks bases that meet these criteria within a moving window of 100 DNA bases, and then returns the results in cpgStruct, a MATLAB® structure containing the starting and ending bases of the CpG islands greater than the minimum island size of 200 bases.

    cpgStruct = cpgisland(SeqDNA,Name=Value) specifies additional options using one or more name-value arguments.
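    The two calling forms can be sketched as follows. This is a minimal illustration, not output from a real genome: the input is a random sequence generated with randseq (an assumption made only to keep the sketch self-contained), and the variable names are hypothetical. The Name=Value syntax requires MATLAB R2021a or later; earlier releases use the comma-separated form shown last.

```matlab
% Random 5000-base DNA sequence used as placeholder input
% (randseq and cpgisland require Bioinformatics Toolbox)
seq = randseq(5000);

% Default criteria: 100-base window, minimum island size of 200 bases
cpgDefault = cpgisland(seq);

% Same search with a 90-base window (Name=Value syntax, R2021a or later)
cpgNarrow = cpgisland(seq, Window=90);

% Equivalent call using the older comma-separated name-value syntax
cpgNarrowOld = cpgisland(seq, 'Window', 90);
```

    A random sequence may contain no region that meets the criteria, so the returned structure can be empty of islands.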

    Examples

    Import a nucleotide sequence from the GenBank® database. For example, retrieve a sequence from Homo sapiens chromosome 12.

    S = getgenbank("AC156455");

    Find the CpG islands in the sequence and plot the results. The output lists the CpG islands longer than 200 bases.

    cpgisland(S.Sequence,Plot=true)

    [Figure: four line plots titled "GC content", "CPGoe content", "CpG islands > 200 bases", and "All CpG islands".]

    ans = struct with fields:
        Starts: [4510 29359]
         Stops: [5468 29604]
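    As a quick check on the output above, the island lengths can be computed from the returned fields. This sketch copies the coordinates from the example output and assumes that Starts and Stops are inclusive, 1-based positions; the variable names cpg and islandLengths are hypothetical.

```matlab
% Island coordinates copied from the example output above
cpg = struct('Starts', [4510 29359], 'Stops', [5468 29604]);

% Length of each island, assuming inclusive start and stop positions
islandLengths = cpg.Stops - cpg.Starts + 1;
disp(islandLengths)

% Extracting the bases of island k would then look like this
% (commented out because it needs the sequence S from getgenbank):
% islandSeq = S.Sequence(cpg.Starts(k):cpg.Stops(k));
```

    Under that assumption the two islands are 959 and 246 bases long, both above the 200-base minimum.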
    
    

    Input Arguments

    SeqDNA: DNA nucleotide sequence, specified as one of these values:

    • Character vector or string specifying a nucleotide sequence

    • Row vector of integers specifying a nucleotide sequence

    • MATLAB structure containing a Sequence field that contains a DNA nucleotide sequence, such as returned by fastaread, fastqread, emblread, getembl, genbankread, or getgenbank

    Valid characters include A, C, G, and T.

    cpgisland does not count ambiguous nucleotides or gaps.

    Data Types: double | char | string | struct
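    The three accepted input forms can be sketched as below. The sequence is random placeholder data from randseq, the integer form is produced with nt2int and cast to double to match the listed data types, and the variable names are hypothetical. The sketch assumes all three forms describe the same sequence and are therefore searched identically.

```matlab
% Placeholder sequence (randseq, nt2int, and cpgisland require Bioinformatics Toolbox)
seqChar = randseq(3000);                   % character vector
seqInt = double(nt2int(seqChar));          % row vector of integers
seqStruct = struct('Sequence', seqChar);   % structure with a Sequence field

% Search each representation of the same sequence
cpgFromChar = cpgisland(seqChar);
cpgFromInt = cpgisland(seqInt);
cpgFromStruct = cpgisland(seqStruct);
```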

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: Window=90

    Window: Window size, in bases, for calculating GC content and CpGobserved/CpGexpected ratios, specified as an integer. The default is 100. A smaller window size increases the noise in a plot.
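    The effect of the window size can be illustrated with a moving average over a toy sequence. This is a sketch in base MATLAB, not the function's internal code: it uses movmean with a centered window that shrinks at the ends, and the edge handling inside cpgisland may differ. All variable names are hypothetical.

```matlab
% Toy sequence: four GC bases followed by four AT bases
seq = 'GGGGAAAA';

% 1 where the base is G or C, 0 otherwise
isGC = double(seq == 'G' | seq == 'C');

% GC fraction in a centered 3-base window (window shrinks at the ends)
gcWin3 = movmean(isGC, 3);

% A wider 5-base window spreads the GC-to-AT transition over more positions
gcWin5 = movmean(isGC, 5);
```

    The wider window produces a smoother profile, which is the trade-off behind the note on plot noise.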

    Data Types: double

    MinIsland: Minimum number of consecutive marked bases to report as a CpG island, specified as an integer. The default is 200.
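    Filtering runs of marked bases by a minimum length can be sketched in base MATLAB as follows. The marked vector and the threshold of 4 are toy values chosen for readability, and the comparison uses "at least" the minimum, which is an assumption; the function's exact boundary behavior is not stated here.

```matlab
% Toy vector of marked bases (1 = base met the GC and CpG criteria)
marked = [0 1 1 1 0 1 1 1 1 1 0 0 1];
minIsland = 4;  % hypothetical threshold, small so the example stays readable

% Pad with zeros so runs touching either end are detected
edges = diff([0 marked 0]);
runStarts = find(edges == 1);       % first base of each run
runStops = find(edges == -1) - 1;   % last base of each run

% Keep only runs at least minIsland bases long (assumed comparison)
keep = (runStops - runStarts + 1) >= minIsland;
islands = struct('Starts', runStarts(keep), 'Stops', runStops(keep));
```

    Here the runs span bases 2 to 4, 6 to 10, and 13, and only the 5-base run from 6 to 10 is kept.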

    Data Types: double

    GCmin: Minimum GC fraction in a window needed to mark a base, specified as a value between 0 and 1. The default is 0.5.

    Data Types: double

    CpGoe: Minimum CpGobserved/CpGexpected ratio in each window needed to mark a base, specified as a value between 0 and 1. The default is 0.6. This ratio is defined as:

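    CpGobs/CpGexp = (NumCpGs*Length)/(NumGs*NumCs)

    where NumCpGs is the number of CpG dinucleotides in the window, NumGs and NumCs are the numbers of G and C bases in the window, and Length is the window length in bases.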
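    The ratio can be worked through by hand for a single window. The sketch below uses a 12-base toy window in base MATLAB; the variable names are hypothetical, and it assumes Length is the number of bases in the window.

```matlab
% Toy 12-base window
win = 'CGCGATATCGGC';

numCpGs = numel(strfind(win, 'CG'));   % CpG dinucleotides: 3
numCs = sum(win == 'C');               % C bases: 4
numGs = sum(win == 'G');               % G bases: 4
len = numel(win);                      % window length: 12

% CpGobs/CpGexp = (NumCpGs*Length)/(NumGs*NumCs) = 36/16
ratio = (numCpGs * len) / (numGs * numCs);

% GC fraction of the same window = 8/12
gcFraction = (numCs + numGs) / len;
```

    The ratio for this window is 2.25. The observed ratio can exceed 1 in a CpG-dense window even though the threshold itself is specified between 0 and 1.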

    Data Types: double

    Plot: Option to plot the GC content, the CpGoe content, the CpG islands greater than the minimum island size, and all potential CpG islands for the specified criteria, specified as false or true.

    Data Types: logical

    Output Arguments

    cpgStruct: Starting and ending bases of the CpG islands greater than the minimum island size, returned as a MATLAB structure with the fields Starts and Stops.

    Version History

    Introduced before R2006a