getAttribute

Read HTML attribute of root node of HTML tree

Description

str = getAttribute(tree,attr) returns the attribute attr of the root node of tree. If that attribute is not set, then the function returns a missing value.
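
The missing-value behavior can be checked with a minimal sketch like the one below. The one-element HTML snippet and the variable names are made up for illustration; findElement is used to select the A element so that it is the root node of the subtree passed to getAttribute.

```matlab
% Hypothetical one-element HTML snippet, used only for illustration.
code = "<a href='https://www.mathworks.com'>Home</a>";
tree = htmlTree(code);

% Select the A element explicitly so that the result does not depend on
% how htmlTree wraps the fragment.
link = findElement(tree,"A");

hrefValue  = getAttribute(link,"href");    % attribute that is set
titleValue = getAttribute(link,"title");   % attribute that is not set

% Test for the missing value with ismissing.
isTitleMissing = ismissing(titleValue);
```

Testing with ismissing is the safer choice here, because an equality comparison against a missing string does not evaluate to true.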

Examples

Read HTML code from the URL https://www.mathworks.com/help/textanalytics using webread.

url = "https://www.mathworks.com/help/textanalytics";
code = webread(url);

Parse the HTML code using htmlTree.

tree = htmlTree(code);

Find all the hyperlinks in the HTML tree using findElement. The hyperlinks are the nodes with element name "A".

selector = "A";
subtrees = findElement(tree,selector);
subtrees(1:10)
ans = 
  10×1 htmlTree:

    <A class="svg_link navbar-brand" href="https://www.mathworks.com?s_tid=gn_logo"><IMG alt="MathWorks" class="mw_logo" src="/images/responsive/global/pic-header-mathworks-logo.svg"/></A>
    <A class="mwa-nav_login" href="https://www.mathworks.com/login?uri=http://www.mathworks.com/help/textanalytics/index.html">Sign In</A>
    <A href="https://www.mathworks.com/products.html?s_tid=gn_ps">Products</A>
    <A href="https://www.mathworks.com/solutions.html?s_tid=gn_sol">Solutions</A>
    <A href="https://www.mathworks.com/academia.html?s_tid=gn_acad">Academia</A>
    <A href="https://www.mathworks.com/support.html?s_tid=gn_supp">Support</A>
    <A href="https://www.mathworks.com/matlabcentral/?s_tid=gn_mlc">Community</A>
    <A href="https://www.mathworks.com/company/events.html?s_tid=gn_ev">Events</A>
    <A href="https://www.mathworks.com/company/aboutus/contact_us.html?s_tid=gn_cntus">Contact Us</A>
    <A href="https://www.mathworks.com/store?s_cid=store_top_nav&amp;s_tid=gn_store">How to Buy</A>

Get the hyperlink references using getAttribute. Specify the attribute name "href".

attr = "href";
str = getAttribute(subtrees,attr);
str(1:10)
ans = 10×1 string array
    "https://www.mathworks.com?s_tid=gn_logo"
    "https://www.mathworks.com/login?uri=http://www.mathworks.com/help/textanalytics/index.html"
    "https://www.mathworks.com/products.html?s_tid=gn_ps"
    "https://www.mathworks.com/solutions.html?s_tid=gn_sol"
    "https://www.mathworks.com/academia.html?s_tid=gn_acad"
    "https://www.mathworks.com/support.html?s_tid=gn_supp"
    "https://www.mathworks.com/matlabcentral/?s_tid=gn_mlc"
    "https://www.mathworks.com/company/events.html?s_tid=gn_ev"
    "https://www.mathworks.com/company/aboutus/contact_us.html?s_tid=gn_cntus"
    "https://www.mathworks.com/store?s_cid=store_top_nav&s_tid=gn_store"
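
When some of the selected elements lack the requested attribute, the corresponding entries of the output are missing values. The following sketch, which uses a small made-up HTML list rather than a live page, shows one way to keep only the hyperlinks that have an href attribute.

```matlab
% Hypothetical HTML list; the second link has no href attribute.
code = "<ul>" + ...
    "<li><a href='https://www.mathworks.com/products.html'>Products</a></li>" + ...
    "<li><a name='top'>Anchor without href</a></li>" + ...
    "<li><a href='https://www.mathworks.com/support.html'>Support</a></li>" + ...
    "</ul>";
tree = htmlTree(code);

% Find all hyperlink elements and read their href attributes.
links = findElement(tree,"A");
hrefs = getAttribute(links,"href");

% Keep only the entries where the attribute is set.
validHrefs = hrefs(~ismissing(hrefs));
```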

Input Arguments

tree: HTML tree, specified as an htmlTree array.

attr: Attribute name, specified as a string scalar, character vector, or scalar cell array containing a character vector.
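
As a rough illustration of the three accepted forms of the attribute name, the sketch below passes the same name as a string scalar, a character vector, and a scalar cell array. The HTML snippet and variable names are hypothetical.

```matlab
% Hypothetical one-element HTML snippet, used only for illustration.
code = "<a href='https://www.mathworks.com'>Home</a>";
link = findElement(htmlTree(code),"A");

fromString = getAttribute(link,"href");     % string scalar
fromChar   = getAttribute(link,'href');     % character vector
fromCell   = getAttribute(link,{'href'});   % scalar cell array of char vector

% All three calls are expected to return the same value.
sameResult = isequal(fromString,fromChar,fromCell);
```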

Output Arguments

str: HTML attribute, returned as a string array.

More About

HTML Elements

A typical HTML element contains the following components:

  • Element name – Name of the HTML tag. The element name corresponds to the Name property of the HTML tree.

  • Attributes – Additional information about the tag. HTML attributes have the form name="value", where name and value denote the attribute name and value respectively. The attributes appear inside the opening HTML tag. To get the attribute values from an HTML tree, use getAttribute.

  • Content – Element content. The content appears between opening and closing HTML tags. The content can be text data or nested HTML elements. To extract the text from an htmlTree object, use extractHTMLText. To get the nested HTML elements of an htmlTree object, use the Children property.

For example, the HTML element <a href="https://www.mathworks.com">Home</a> comprises the following components:

Component         Value                          Description
Element name      a                              Element is a hyperlink
Attribute name    href                           Hyperlink reference
Attribute value   "https://www.mathworks.com"    Hyperlink reference value
Content           Home                           Text to display
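
The components listed above can be read back from a parsed tree. This is a minimal sketch that assumes the same example element; the variable names are illustrative only.

```matlab
% Parse the example element and select it as the root of a subtree.
code = "<a href='https://www.mathworks.com'>Home</a>";
link = findElement(htmlTree(code),"A");

elementName = link.Name;                   % element name
hrefValue   = getAttribute(link,"href");   % attribute value
linkText    = extractHTMLText(link);       % text content
children    = link.Children;               % nested nodes, as an htmlTree array
```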

Version History

Introduced in R2018b