cpgisland
Locate CpG islands in DNA sequence
Syntax
cpgStruct = cpgisland(SeqDNA)
cpgStruct = cpgisland(SeqDNA, ...'Window', WindowValue, ...)
cpgStruct = cpgisland(SeqDNA, ...'MinIsland', MinIslandValue, ...)
cpgStruct = cpgisland(SeqDNA, ...'GCmin', GCminValue, ...)
cpgStruct = cpgisland(SeqDNA, ...'CpGoe', CpGoeValue, ...)
cpgStruct = cpgisland(SeqDNA, ...'Plot', PlotValue, ...)
Input Arguments
SeqDNA | One of the following:
Valid characters include
|
WindowValue | Integer specifying the window size for calculating GC content
and CpGobserved/CpGexpected ratios. Default is 100 bases.
A smaller window size increases the noise in a plot. |
MinIslandValue | Integer specifying the minimum number of consecutive marked
bases to report as a CpG island. Default is 200 bases. |
GCminValue | Value specifying the minimum GC percent in a window needed
to mark a base. Choices are a value between 0 and 1.
Default is 0.5. |
CpGoeValue | Value specifying the minimum CpGobserved/CpGexpected
ratio in each window needed to mark a base. Choices are a value between 0 and 1.
Default is 0.6. This ratio is defined as: CpGobs/CpGexp = (NumCpGs*Length)/(NumGs*NumCs) |
PlotValue | Controls the plotting of GC content, CpGoe content, CpG islands
greater than the minimum island size, and all potential CpG islands
for the specified criteria. Choices are true or false (default). |
Output Arguments
cpgStruct | MATLAB structure containing the starting and ending bases of the CpG islands greater than the minimum island size. |
Description
cpgStruct = cpgisland(SeqDNA) searches SeqDNA, a DNA nucleotide sequence, for CpG islands with a GC content greater than 50% and a CpGobserved/CpGexpected ratio greater than 60%. It marks bases meeting these criteria within a moving window of 100 DNA bases and then returns the results in cpgStruct, a MATLAB structure containing the starting and ending bases of the CpG islands greater than the minimum island size of 200 bases.
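The marking and filtering steps described above can be sketched in base MATLAB as follows. This is an illustrative sketch under stated assumptions, not the toolbox's implementation: the toy sequence, the variable names, the choice to mark the first base of each passing window, the strict inequalities, and treating an undefined ratio as 0 are all assumptions made for the example. A short window and island size are used so the toy sequence is easy to follow.

```matlab
% Illustrative sketch only; not how cpgisland is implemented internally.
% Toy sequence: AT-rich flanks around a CG-rich core (hypothetical data).
seq = upper(['ATATATATAT' repmat('CG', 1, 15) 'ATATATATAT']);
windowSize = 10;    % analogous to 'Window'
minIsland  = 20;    % analogous to 'MinIsland'
gcMin      = 0.5;   % analogous to 'GCmin'
cpgoeMin   = 0.6;   % analogous to 'CpGoe'

n = numel(seq);
marked = false(1, n);
for k = 1:(n - windowSize + 1)
    win = seq(k:k + windowSize - 1);
    numC = sum(win == 'C');
    numG = sum(win == 'G');
    numCpG = numel(strfind(win, 'CG'));   % C immediately followed by G
    gc = (numC + numG) / windowSize;
    if numC > 0 && numG > 0
        oe = (numCpG * windowSize) / (numG * numC);
    else
        oe = 0;   % assumption: treat an undefined ratio as failing
    end
    % Assumption: a passing window marks its first base.
    marked(k) = (gc > gcMin) && (oe > cpgoeMin);
end

% Find runs of consecutive marked bases and keep those long enough.
d = diff([0, double(marked), 0]);
runStarts = find(d == 1);
runStops  = find(d == -1) - 1;
keep = (runStops - runStarts + 1) >= minIsland;
islands = struct('Starts', runStarts(keep), 'Stops', runStops(keep));
disp(islands)
```

The result mirrors the shape of cpgStruct (Starts and Stops fields), but the exact positions that cpgisland reports for a real sequence may differ because its window alignment and edge handling are not specified here.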
cpgStruct = cpgisland(SeqDNA, ...'PropertyName', PropertyValue, ...) calls cpgisland with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
cpgStruct = cpgisland(SeqDNA, ...'Window', WindowValue, ...) specifies the window size for calculating GC content and CpGobserved/CpGexpected ratios. Default is 100 bases. A smaller window size increases the noise in a plot.
cpgStruct = cpgisland(SeqDNA, ...'MinIsland', MinIslandValue, ...) specifies the minimum number of consecutive marked bases to report as a CpG island. Default is 200 bases.
cpgStruct = cpgisland(SeqDNA, ...'GCmin', GCminValue, ...) specifies the minimum GC percent in a window needed to mark a base. Choices are a value between 0 and 1. Default is 0.5.
cpgStruct = cpgisland(SeqDNA, ...'CpGoe', CpGoeValue, ...) specifies the minimum CpGobserved/CpGexpected ratio in each window needed to mark a base. Choices are a value between 0 and 1. Default is 0.6. This ratio is defined as:
CpGobs/CpGexp = (NumCpGs*Length)/(NumGs*NumCs)
cpgStruct = cpgisland(SeqDNA, ...'Plot', PlotValue, ...) controls the plotting of GC content, CpGoe content, CpG islands greater than the minimum island size, and all potential CpG islands for the specified criteria. Choices are true or false (default).
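As a worked instance of the CpGobserved/CpGexpected ratio defined above, the following sketch evaluates the formula for a single window using only base MATLAB functions. The example window and the variable names are hypothetical and chosen for illustration. They are not part of the cpgisland interface.

```matlab
% Hypothetical 12-base window used only to illustrate the formula.
win = upper('ACGCGTTACGGC');
len = numel(win);                      % Length
numC = sum(win == 'C');                % NumCs
numG = sum(win == 'G');                % NumGs
numCpG = numel(strfind(win, 'CG'));    % NumCpGs: C immediately followed by G

gcContent = (numC + numG) / len;       % fraction of G and C bases in the window

if numC > 0 && numG > 0
    cpgoe = (numCpG * len) / (numG * numC);
else
    cpgoe = 0;   % assumption: treat an undefined ratio as 0
end

fprintf('GC content = %.3f, CpGobs/CpGexp = %.2f\n', gcContent, cpgoe);
```

Here the window contains 4 Cs, 4 Gs, and 3 CpG dinucleotides, so the ratio is (3*12)/(4*4) = 2.25. The ratio itself can exceed 1 in CpG-dense windows.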
Examples
Import a nucleotide sequence from the GenBank® database. For example, retrieve a sequence from Homo sapiens chromosome 12.
S = getgenbank('AC156455');
Calculate the CpG islands in the sequence and plot the results.
cpgisland(S.Sequence,'PLOT',true)

ans =

    Starts: [4510 29359]
     Stops: [5468 29604]
The CpG islands greater than 200 bases in length are listed and a plot displays.
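The optional properties can also be combined in one call, and the returned structure can be used directly. The following is a minimal sketch: the threshold values are arbitrary illustrations rather than recommendations, the variable names islands and islandLengths are hypothetical, and the code assumes the Bioinformatics Toolbox is installed and GenBank is reachable. The Starts and Stops field names are taken from the output shown above.

```matlab
% Retrieve the same sequence as in the example above (requires network access).
S = getgenbank('AC156455');

% Illustrative, stricter-than-default settings; values are arbitrary.
islands = cpgisland(S.Sequence, 'Window', 200, 'MinIsland', 300, ...
                    'GCmin', 0.55, 'CpGoe', 0.65);

% Length of each reported island, computed from the Starts and Stops fields.
islandLengths = islands.Stops - islands.Starts + 1;
disp(islandLengths)
```

Because the result depends on the chosen thresholds, the islands reported here will generally differ from those found with the default settings.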
Version History
Introduced before R2006a