Reading in specific column and plotting bar chart

I have a text file as:
Heading A
------------------------
Heading B
GA008246-0_B_F_1852967891 X 7117
GA011810-0_B_F_1852968731 14 7380
GA017861-0_B_F_1852970072 22 7749
GA017864-0_T_R_1853027526 22 7751
GA017866-0_T_R_1853027527 22 7753
GA017875-0_B_R_1852970076 22 7755
I want to plot a histogram of the 2nd column under the title Heading B. Sometimes there are additional lines under Heading A.
This is what I have so far.
%Read in data file
fid = fopen('c:\myfile.txt','rt');
C = textscan (fid, '%s %s %s', 'delimiter', '\t','headerlines', 1)
while (strcmp(C{1}{1}, 'Heading B') == 0)
C = textscan (fid, '%s %s %s', 'delimiter', '\t')
end
fclose(fid);
C{:,2}
But I'm picking up one item too early, i.e.:
ans =
''
'X'
'14'
'22'
'22'
'22'
'22'
Once the additional '' item is removed, how can I plot a bar chart showing the number of occurrences of each of these in the list? I.e., in this example:
X = 1 repetition
14 = 1 repetition
22 = 4 repetitions
Thanks for any help. Jason

Accepted Answer

Guillaume on 14 Apr 2015
Edited: Guillaume on 14 Apr 2015
I would use fgetl instead of textscan to find the start of the heading B section, then use textscan to read it.
fid = fopen('c:\myfile.txt','rt');
tline = fgetl(fid);
% fgetl returns -1 (numeric) once the end of the file is reached
while ~isnumeric(tline) && ~strcmp(tline, 'Heading B')
    tline = fgetl(fid);
end
if isnumeric(tline) % EOF reached before Heading B
    fclose(fid);
    error('End of file reached prematurely');
end
C = textscan(fid, '%s %s %s', 'delimiter', '\t');
fclose(fid);
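To try this approach without the original file, here is a minimal self-contained sketch. It assumes the columns are tab-separated, as the 'delimiter' option in the question suggests. It writes the sample rows from the question to a temporary file, skips past 'Heading B' with fgetl, and reads the remaining rows with textscan. The names sampleLines and fname are made up for illustration.

```matlab
% Build a small test file from the sample in the question (assumed tab-separated).
sampleLines = { ...
    'Heading A', ...
    '------------------------', ...
    'Heading B', ...
    sprintf('GA008246-0_B_F_1852967891\tX\t7117'), ...
    sprintf('GA011810-0_B_F_1852968731\t14\t7380'), ...
    sprintf('GA017861-0_B_F_1852970072\t22\t7749'), ...
    sprintf('GA017864-0_T_R_1853027526\t22\t7751'), ...
    sprintf('GA017866-0_T_R_1853027527\t22\t7753'), ...
    sprintf('GA017875-0_B_R_1852970076\t22\t7755')};
fname = [tempname '.txt'];            % hypothetical temporary file
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', sampleLines{:});
fclose(fid);

% Skip lines until 'Heading B' is found; fgetl returns -1 at end of file.
fid = fopen(fname, 'rt');
tline = fgetl(fid);
while ischar(tline) && ~strcmp(tline, 'Heading B')
    tline = fgetl(fid);
end
if ~ischar(tline)
    fclose(fid);
    error('End of file reached before Heading B');
end

% The file position is now just after the heading, so only data rows are read.
C = textscan(fid, '%s %s %s', 'Delimiter', '\t');
fclose(fid);
delete(fname);

disp(C{2}.')   % second column, shown as a row of strings
```

Because the loop stops on the heading itself, any number of extra lines under Heading A is skipped without counting header lines in advance.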
To find the number of repetitions in a column of C, use the third return value of unique together with histc:
[names, ~, position] = unique(C{2})
repetitions = histc(position, 1:numel(names))
%useful for seeing the result:
table(names, repetitions)
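In more recent MATLAB releases histc is no longer recommended. As a sketch of an alternative, accumarray can produce the same counts, because position already holds an integer group index for every row. The column col2 below is hard-coded from the sample in the question rather than read from a file.

```matlab
% Second column from the sample in the question (assumed already read in).
col2 = {'X'; '14'; '22'; '22'; '22'; '22'};

% position(i) is the index into names of the i-th entry of col2.
[names, ~, position] = unique(col2);

% Add 1 into the slot of each group index: one count per unique name.
repetitions = accumarray(position(:), 1);

for k = 1:numel(names)
    fprintf('%s = %d\n', names{k}, repetitions(k));
end
```

Note that unique sorts the strings as text, so '14' and '22' come before 'X' here.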

5 Comments

Jason on 14 Apr 2015
Edited: Jason on 14 Apr 2015
Thanks. In my much larger file, the result of your code is:
names repetitions
___ _________
'0' 48
'1' 198
'10' 75
'11' 80
'12' 119
'13' 166
'14' 79
'15' 86
'16' 66
'17' 112
'18' 40
'19' 49
'2' 147
'20' 48
'21' 22
'22' 114
'3' 119
'4' 109
'5' 111
'6' 145
'7' 110
'8' 93
'9' 75
'X' 186
'XY' 5
'Y' 26
How do I now plot the histogram or bar chart, preferably with the strings in numeric order?
My approach, which is wrong:
figure
bar(repetitions)
set(gca,'XTickLabel',names);
Thanks, Jason
Guillaume on 14 Apr 2015
Edited: Guillaume on 14 Apr 2015
Natural sorting is not implemented in MATLAB, but there are a number of File Exchange entries that should work: FEX 34464, FEX 10959, and, probably the best for you, FEX 47433.
Another option would be to keep the order the values appear in your files instead of sorting them. You'd just add the 'stable' option to the unique call to do that:
[names, ~, position] = unique(C{2}, 'stable');
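If a File Exchange function is not an option, a rough alternative for labels like these (plain integers plus a few non-numeric names such as 'X') is to convert them with str2double, sort the numeric ones, and append the rest. This is only a sketch for that specific case, not a general natural sort. The sample values are a few rows copied from the table above.

```matlab
% A few rows from the table above, in the text-sorted order that unique returns.
names = {'1'; '10'; '2'; '22'; 'X'; 'Y'};
repetitions = [198; 75; 147; 114; 186; 26];

nums = str2double(names);          % NaN for labels that are not numbers
isNum = ~isnan(nums);

% Sort the numeric labels by value, then append the non-numeric ones unchanged.
numIdx = find(isNum);
[~, order] = sort(nums(isNum));
newOrder = [numIdx(order); find(~isNum)];

names = names(newOrder);
repetitions = repetitions(newOrder);
```

Applying the same index vector to both names and repetitions keeps each count attached to its label.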
Jason on 14 Apr 2015
OK, I don't mind about the sorting, but I'm not getting all of the labels on my bar chart.
Guillaume
Oh, sorry, I misunderstood. You also need to change the position and number of ticks (the XTick property). This should work:
set(gca, 'XTickLabel', names, 'XTick', 1:numel(names))
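Labels go missing because MATLAB chooses its own tick positions by default, and XTickLabel only labels the ticks that exist. A minimal end-to-end sketch follows, using the counts from the small sample in the question. The axis label texts are assumptions added for illustration.

```matlab
% Counts for the small sample in the question.
names = {'14'; '22'; 'X'};
repetitions = [1; 4; 1];

figure
bar(repetitions)

% Put exactly one tick under each bar, then label the ticks in the same order.
set(gca, 'XTick', 1:numel(names), 'XTickLabel', names)

xlabel('Value in column 2')        % hypothetical label text
ylabel('Number of occurrences')    % hypothetical label text
```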
Jason on 15 Apr 2015
Perfect, thank you.


More Answers (1)

Star Strider on 14 Apr 2015
I don’t have your file, but I would change the textscan call to:
C = textscan (fid, '%s %f %f', 'delimiter', '\t','headerlines', 3)
The initial ‘X’ in column #2 will then show up as either '' or NaN, so you can eliminate it by using isempty or isnan, as appropriate.

2 Comments

Jason on 14 Apr 2015
Edited: Jason on 14 Apr 2015
The problem is that there are sometimes lines under "Heading A", so the number of lines until I find "Heading B" is variable.
I actually want the X as well as the numbers (it's to do with chromosomes). It's this mixture of text and numbers in the cell array that makes it hard for me to plot a bar chart showing the frequency of each string.
I've included the txt file. Thanks
Star Strider on 14 Apr 2015
Edited: Star Strider on 14 Apr 2015
This works for the current file:
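fidi = fopen('test1.txt');
C = textscan(fidi, '%s %s %s', 'delimiter', '\t', 'headerlines', 2);
fclose(fidi);
C2 = C{2};
Ix = cellfun(@isempty, C2);        % rows with an empty 2nd column (e.g. a heading line)
[C2u,ia,ic] = unique(C2(~Ix));     % ic maps each entry to its index in C2u
cnts = hist(ic, length(C2u));      % one bin per unique string
figure(1)
bar(cnts)
% One tick per bar, so that every label is shown
set(gca, 'XTick', 1:length(C2u), 'XTickLabel', C2u)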
EDIT —
Added plot ...
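As a possible alternative for mixed text and number labels, a categorical array can do the grouping and counting in one step. This sketch assumes a MATLAB release that includes categorical arrays (R2013b or later), and uses a hard-coded copy of the sample column, including the empty entry from the question.

```matlab
% Sample second column, including the stray empty string from the question.
C2 = {''; 'X'; '14'; '22'; '22'; '22'; '22'};
C2 = C2(~cellfun(@isempty, C2));   % drop empty entries

cats = categorical(C2);
labels = categories(cats);         % unique labels, sorted as text
cnts = countcats(cats);            % number of entries per label, same order

figure
bar(cnts)
set(gca, 'XTick', 1:numel(labels), 'XTickLabel', labels)
```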


Asked: 14 Apr 2015

Commented: 15 Apr 2015
