I have a big XML file and I would like to count how many times a specific tag 'X' appears in it.
Example: the file below contains a total of 4 X tags.
%% === HERE I HAVE 3 'X Tags' in the XML Level 3 ===
<A> % Level 1
<B> % Level 2
<X> % Level 3
"Some text 1"
</X>
<X> % Level 3
"Some text 2"
</X>
<X> % Level 3
"Some text 3"
</X>
</B>
</A>
%% === HERE I HAVE ONLY 1 'X Tag' in the XML Level 3 ===
<A> % Level 1
<B> % Level 2
<X> % Level 3
"Some text 4"
</X>
</B>
</A>
%% === HERE I DON'T HAVE ANY 'X Tag' in the XML Level 3, AND LEVEL 3 DOESN'T EXIST EITHER ===
<A> % Level 1
<B> % Level 2
</B>
</A>
How can I count the total number of 'X' tags with good performance, i.e. in as little computation time as possible?

Accepted Answer

Sean de Wolski on 3 Jan 2019

Something along these lines:
xml = xmlread(xmlfile);
Xtag = xml.getElementsByTagName('X');
Xtag.getLength
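
A minimal, self-contained sketch of this approach. It assumes the file is well-formed XML with a single root element. The `<root>` wrapper, the temporary file and the variable names are made up for illustration; the snippet in the question has several top-level `<A>` elements, which a DOM parser would likely reject as written.

```matlab
% Write a small sample file (hypothetical content mirroring the question).
tmpFile = [tempname, '.xml'];
fid = fopen(tmpFile, 'w');
fprintf(fid, '%s', ['<root>', ...
    '<A><B><X>Some text 1</X><X>Some text 2</X><X>Some text 3</X></B></A>', ...
    '<A><B><X>Some text 4</X></B></A>', ...
    '<A><B></B></A>', ...
    '</root>']);
fclose(fid);

% Parse the file into a Java DOM document.
doc = xmlread(tmpFile);

% Collect every <X> element at any depth and count them.
xNodes = doc.getElementsByTagName('X');
numX = xNodes.getLength;

% Clean up the sample file.
delete(tmpFile);

fprintf('Number of X tags: %d\n', numX);
```

Note that `getElementsByTagName` searches all levels of the tree, so no explicit loop over levels 1 to 3 is needed.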

4 Comments

Nycholas Maia on 4 Jan 2019
Thank you, Sean de Wolski, but looking at my code again, I have a different question.
Please look at my code below:
%% SELECT AN XML FILE:
% Open a dialog to select an XML file:
[file, path] = uigetfile('*.xml');
% Check whether the user selected a file:
if isequal(file, 0)
    disp('User selected Cancel');
    return  % stop here: full_path would be undefined below
else
    % Get the absolute path of the selected file:
    full_path = fullfile(path, file);
    disp(['User selected ', full_path]);
end
% Start timing the parse:
start_time = tic;
%% GET THE USEFUL CONTENT OF THE XML FILE:
% Open the XML file:
fileID = fopen(full_path, 'r');
% Read/discard the first line of the file (XML declaration):
fgetl(fileID);
% Read/discard the second line of the file (DOCTYPE):
fgetl(fileID);
% Get the remaining XML content as a row char vector
% (fread returns a column, hence the transpose):
buffer = fread(fileID, '*char')';
% Close the XML file:
fclose(fileID);
disp('Trying to parse the Music XML file...');
%% TRY TO PARSE THE MXML CONTENT TO A MATLAB STRUCT:
try
    % Try to parse the XML text to a MATLAB struct:
    xml = xml2struct(buffer);
    % Remove the first node/tag of the MXML file: <score-partwise>
    xml = cell2mat(struct2cell(xml));
catch err  % 'err' avoids shadowing the built-in error function
    disp('XML Conversion: Invalid XML format...');
    rethrow(err);
end
My big XML file has a non-standard XML header, which is why I'm NOT using the MATLAB 'xmlread' function: this official MATLAB function throws a Java error on this kind of file (a Music XML file).
As you can see, I use the MATLAB 'fread' function after skipping the header lines, which gives me the variable 'buffer'.
Then, to manipulate the data as an XML tree, I convert 'buffer' to a struct and remove the unused root node <score-partwise>.
The final variable 'xml' is a MATLAB struct without the unused root node.
I DON'T KNOW whether this is a good way to parse my special case of a Music XML file into a MATLAB tree/struct.
In the MATLAB Command Window, I access my tag values like this:
xml.part{p}.measure{m}.note{n}.pitch
where p, m, n are scalar indices.
Finally: how can I count the number of <note> tags in this Music XML file?
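
One possible way to count the notes directly in such a struct is to walk the part/measure/note levels. The sketch below is only an illustration under stated assumptions: it uses a hand-built struct in place of the real `xml2struct` output, and it assumes that repeated child tags are stored as a cell array of structs while a single child is stored as a plain struct. The sample data and all variable names except `xml` are hypothetical.

```matlab
% Hand-built stand-in for the converted struct (hypothetical sample data).
note = struct('duration', struct('Text', '256'));
measure1 = struct();  measure1.note = {note, note, note};  % three notes -> cell
measure2 = struct();  measure2.note = note;                % one note -> struct
measure3 = struct('barline', struct());                    % no notes at all
part1 = struct();     part1.measure = {measure1, measure2, measure3};
xml = struct();       xml.part = part1;                    % single part -> struct

% Walk the tree, wrapping single children in a cell so both cases loop alike.
numNotes = 0;
parts = xml.part;
if ~iscell(parts), parts = {parts}; end
for p = 1:numel(parts)
    if ~isfield(parts{p}, 'measure'), continue; end
    measures = parts{p}.measure;
    if ~iscell(measures), measures = {measures}; end
    for m = 1:numel(measures)
        if ~isfield(measures{m}, 'note'), continue; end
        notes = measures{m}.note;
        if ~iscell(notes), notes = {notes}; end
        numNotes = numNotes + numel(notes);
    end
end

fprintf('Number of notes: %d\n', numNotes);
```

The `isfield` checks cover measures that contain no `<note>` at all, which would otherwise raise an error when the field is accessed.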
Nycholas Maia on 4 Jan 2019
<?xml version="1.0" encoding='UTF-8' standalone='no' ?>
<!DOCTYPE score-partwise PUBLIC "-//Recordare//DTD MusicXML 3.0 Partwise//EN" "http://www.musicxml.org/dtds/partwise.dtd">
<score-partwise version="3.0">
<work>
<work-title>teste2</work-title>
</work>
<identification>
<rights>Copyright © </rights>
<encoding>
<encoding-date>2018-11-09</encoding-date>
<encoder>Nyck</encoder>
<software>Sibelius 8.3.0</software>
<software>Direct export, not from Dolet</software>
<encoding-description>Sibelius / MusicXML 3.0</encoding-description>
<supports element="print" type="yes" value="yes" attribute="new-system" />
<supports element="print" type="yes" value="yes" attribute="new-page" />
<supports element="accidental" type="yes" />
<supports element="beam" type="yes" />
<supports element="stem" type="yes" />
</encoding>
</identification>
<defaults>
<scaling>
<millimeters>210</millimeters>
<tenths>1200</tenths>
</scaling>
<page-layout>
<page-height>1697</page-height>
<page-width>1200</page-width>
<page-margins type="both">
<left-margin>72</left-margin>
<right-margin>72</right-margin>
<top-margin>72</top-margin>
<bottom-margin>72</bottom-margin>
</page-margins>
</page-layout>
<system-layout>
<system-margins>
<left-margin>67</left-margin>
<right-margin>0</right-margin>
</system-margins>
<system-distance>92</system-distance>
</system-layout>
<appearance>
<line-width type="stem">0.9375</line-width>
<line-width type="beam">5</line-width>
<line-width type="staff">0.9375</line-width>
<line-width type="light barline">1.5625</line-width>
<line-width type="heavy barline">5</line-width>
<line-width type="leger">1.5625</line-width>
<line-width type="ending">1.5625</line-width>
<line-width type="wedge">1.25</line-width>
<line-width type="enclosure">0.9375</line-width>
<line-width type="tuplet bracket">1.25</line-width>
<line-width type="bracket">5</line-width>
<line-width type="dashes">1.5625</line-width>
<line-width type="extend">0.9375</line-width>
<line-width type="octave shift">1.5625</line-width>
<line-width type="pedal">1.5625</line-width>
<line-width type="slur middle">1.5625</line-width>
<line-width type="slur tip">0.625</line-width>
<line-width type="tie middle">1.5625</line-width>
<line-width type="tie tip">0.625</line-width>
<note-size type="cue">75</note-size>
<note-size type="grace">60</note-size>
</appearance>
<music-font font-family="Opus Std" font-size="19.8425" />
<lyric-font font-family="Plantin MT Std" font-size="11.4715" />
<lyric-language xml:lang="en" />
</defaults>
<credit page="1">
<credit-words default-x="600" default-y="155" font-family="Plantin MT Std" font-style="normal" font-size="22.0128" font-weight="normal" justify="center" valign="middle">teste2</credit-words>
</credit>
<part-list>
<part-group type="start" number="1">
<group-symbol>brace</group-symbol>
</part-group>
<score-part id="P1">
<part-name>Piano</part-name>
<part-name-display>
<display-text>Piano</display-text>
</part-name-display>
<part-abbreviation>Pno.</part-abbreviation>
<part-abbreviation-display>
<display-text>Pno.</display-text>
</part-abbreviation-display>
<score-instrument id="P1-I1">
<instrument-name>Piano (2)</instrument-name>
<instrument-sound>keyboard.piano.grand</instrument-sound>
<solo />
<virtual-instrument>
<virtual-library>General MIDI</virtual-library>
<virtual-name>Acoustic Piano</virtual-name>
</virtual-instrument>
</score-instrument>
</score-part>
<part-group type="stop" number="1" />
</part-list>
<part id="P1">
<!--============== Part: P1, Measure: 1 ==============-->
<measure number="1" width="974">
<print new-page="yes">
<system-layout>
<system-margins>
<left-margin>80</left-margin>
<right-margin>0</right-margin>
</system-margins>
<top-system-distance>218</top-system-distance>
</system-layout>
</print>
<attributes>
<divisions>256</divisions>
<key color="#000000">
<fifths>0</fifths>
<mode>major</mode>
</key>
<time color="#000000">
<beats>4</beats>
<beat-type>4</beat-type>
</time>
<staves>1</staves>
<clef number="1" color="#000000">
<sign>G</sign>
<line>2</line>
</clef>
<staff-details number="1" print-object="yes" />
</attributes>
<note color="#000000" default-x="76" default-y="-50">
<pitch>
<step>C</step>
<octave>5</octave>
</pitch>
<duration>256</duration>
<instrument id="P1-I1" />
<voice>1</voice>
<type>quarter</type>
<stem>down</stem>
<staff>1</staff>
</note>
<note color="#000000" default-x="297" default-y="-40">
<pitch>
<step>E</step>
<octave>5</octave>
</pitch>
<duration>256</duration>
<instrument id="P1-I1" />
<voice>1</voice>
<type>quarter</type>
<stem>down</stem>
<staff>1</staff>
</note>
<note default-x="517">
<rest />
<duration>256</duration>
<instrument id="P1-I1" />
<voice>1</voice>
<type>quarter</type>
<staff>1</staff>
</note>
<note color="#000000" default-x="738" default-y="5">
<pitch>
<step>G</step>
<octave>4</octave>
</pitch>
<duration>256</duration>
<instrument id="P1-I1" />
<voice>1</voice>
<type>quarter</type>
<stem>up</stem>
<staff>1</staff>
</note>
<barline>
<bar-style>light-heavy</bar-style>
</barline>
</measure>
</part>
</score-partwise>
Nycholas Maia on 4 Jan 2019
Now, I got the correct numbers using this:
text = fileread('my_file.xml');
num_parts = length(strfind(text, '</score-part>'));
num_measures = length(strfind(text, '</measure>'));
num_notes = length(strfind(text, '</note>'));
I don't know if this is the best way in terms of computational performance, but it's working for now...
If there is a better way to do this, please, comment!
Thank you!
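
For comparison, here is a small sketch of two text-based counts on a made-up sample string. Counting closing tags with `strfind` misses self-closing elements such as `<note/>`, while a regular expression on the opening tag catches them and still skips longer names such as `<note-size>`. The sample text and variable names are hypothetical, and neither variant knows about XML comments or CDATA sections, so both are heuristics rather than a real parse.

```matlab
% Made-up sample: two regular notes, one self-closing note, one <note-size>.
text = ['<m><note a="1"><x/></note><note/>', ...
        '<note-size>75</note-size><note></note></m>'];

% Closing-tag count, as in the snippet above: misses the self-closing <note/>.
numClosing = length(strfind(text, '</note>'));

% Opening-tag count: '<note' followed by whitespace, '/' or '>',
% so '<note-size' is not matched.
numOpening = numel(regexp(text, '<note(\s|/|>)', 'start'));

fprintf('closing tags: %d, opening tags: %d\n', numClosing, numOpening);
```

If the file never contains self-closing `<note/>` elements, both counts should agree and the `strfind` version is the simpler of the two.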
Sean de Wolski on 7 Jan 2019
s = string(fileread(xmlfile));          % read the whole file as one string
xs = extractBetween(s, "<X", "</X>");   % tag names are case-sensitive
numel(xs)
Or something similar.

Release

R2018b