extract from text file

sas0701 on 25 Apr 2013
Commented: D. Ali on 27 Apr 2019
Hi, I am trying to use MATLAB for this because I have ~300 files that need to be processed this way.
The data in the .txt file looks like this:
Event List
Patient Name: xyz Export Time: Thursday, April 11, 2013, 15:25:00 User Name: xyz Test ID: Name Time Duration Exam Start 00:00:00 00:00 Impedance - Impedance Values 00:00:00 00:00 Change - Montage Apply: 0bical.mtg 00:00:17 00:00 Calibration Off 00:00:18 00:00 Change - Montage Apply: 1doublebanana.mtg 00:00:36 00:00 . . .
I need a matrix file with 3 columns, like this: 00:00:00 00:00:00 Exam Start 00:00:00 00:00:00 Impedance - Impedance Values
i.e. 1) discard the first 7 lines, 2) the text goes into column 3, 3) the first value after the text goes into column 1, 4) the second value + the first value goes into column 2.
Is this possible?
Thanks, S
  1 comment
Cedric on 25 Apr 2013
You seem to be saying that your data is not available in columns (i.e. it is a stream of characters), but then you say that the first 7 lines have to be discarded, which seems to indicate that there is some structure.
Reading
Patient Name: xyz Export Time: Thursday, April 11, 2013, 15:25:00 User Name: xyz Test ID: Name Time Duration Exam Start 00:00:00 00:00 Impedance - Impedance Values 00:00:00 00:00 Change - Montage Apply: 0bical.mtg 00:00:17 00:00 Calibration Off 00:00:18 00:00 Change - Montage Apply: 1doublebanana.mtg 00:00:36 00:00...
doesn't really allow me to determine what you call "text", or columns 1, 2, 3. Could you be more specific or reformat the question using "code" formatting?
There are several options for extracting data from a text file, ranging from TEXTREAD/TEXTSCAN, STRFIND, etc., for sufficiently formatted content, to regular expressions for pattern matching in more complicated or less formatted content. Once you refine the question, we can talk about an appropriate approach.
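As an illustration of the regular-expression option, here is a minimal sketch. It assumes each data line ends with a time of the form HH:MM:SS followed by a duration of the form NN:NN, as in the sample above; the variable names and the pattern are illustrative and not taken from the thread.

```matlab
% Sample line copied from the question; the expected layout is an assumption.
txt = 'Change - Montage Apply: 0bical.mtg 00:00:17 00:00' ;

% Three tokens: free-text label, time (HH:MM:SS), duration (NN:NN).
pattern = '^(.*?)\s+(\d{2}:\d{2}:\d{2})\s+(\d{2}:\d{2})\s*$' ;
tok = regexp(txt, pattern, 'tokens', 'once') ;

if isempty(tok)
    error('Line does not match the expected structure: %s', txt) ;
end
labelStr    = tok{1} ;   % 'Change - Montage Apply: 0bical.mtg'
timeStr     = tok{2} ;   % '00:00:17'
durationStr = tok{3} ;   % '00:00'
```

Compared with fixed character offsets, a pattern like this tolerates trailing spaces and lets you detect lines that do not fit, at the cost of being somewhat slower.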


Accepted Answer

Cedric on 25 Apr 2013
Edited: Cedric on 25 Apr 2013
New version, taking into account your last comment.
fname_in = 'myFile.txt' ;
fname_out = 'myFile.xlsx' ;
headerSize = 8 ;
fid = fopen(fname_in, 'r') ;
% - Discard header.
for k = 1 : headerSize
    if feof(fid)
        error('Structure discrepancy in %s', fname_in) ;
    end
    fgetl(fid) ;
end
% - Extract/process content.
buffer = cell(1e6, 3) ; % Cheap prealloc.
lineCnt = 0 ;
while ~feof(fid)
    lineCnt = lineCnt + 1 ;
    line = fgetl(fid) ;
    lStr = line(1:end-15) ; % Label string.
    dStr = line(end-13:end-6) ; % Time string (HH:MM:SS).
    tStr = line(end-4:end) ; % Duration string (last 5 characters).
    fprintf('|%s|%s|%s|\n', dStr, tStr, lStr) ; % Just for testing.
    buffer(lineCnt, :) = {dStr, tStr, lStr} ;
end
fclose(fid) ;
buffer = buffer(1:lineCnt,:) ; % Truncate prealloc.
% - Export to XLSX.
xlswrite(fname_out, buffer) ;
You might want to improve the preallocation by adding blocks of cells when/if the initial size is too small. Note that this solution assumes that the two time fields at the end of each line (time, duration) always have the same structure.
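Growing the buffer in blocks could look roughly like the sketch below. The block size, the stand-in loop, and the dummy row contents are assumptions made for illustration; in the real code the loop would be the `while ~feof(fid)` loop above.

```matlab
blockSize = 1000 ;                  % Hypothetical block size.
buffer    = cell(blockSize, 3) ;    % Initial allocation.
lineCnt   = 0 ;

for k = 1 : 2500                    % Stand-in for: while ~feof(fid)
    lineCnt = lineCnt + 1 ;
    if lineCnt > size(buffer, 1)
        % Buffer is full: append one more block of empty cells.
        buffer = [buffer ; cell(blockSize, 3)] ;
    end
    % Dummy row in place of {dStr, tStr, lStr}.
    buffer(lineCnt, :) = {'00:00:00', '00:00', sprintf('Event %d', k)} ;
end

buffer = buffer(1:lineCnt, :) ;     % Truncate unused rows.
```

With this approach the initial allocation can be small, and the cost of resizing is paid once per block rather than once per line.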
  5 comments
Cedric on 26 Apr 2013
Edited: Cedric on 26 Apr 2013
If you look for "MATLAB Excel Mac" online, you'll see that managing Excel files on Macs did not work until very recently. I think that your best option is to export to CSV if you want a format that can be read by Excel.
CSVWRITE and DLMWRITE won't manage text content though, so you can't use them to build the CSV file. One option is to implement the export in the loop that we just built (which also avoids the burden of building the cell array):
fname_in = 'myFile.txt' ;
fname_out = 'myFile_out.csv' ;
headerSize = 8 ;
fid_in = fopen(fname_in, 'r') ;
% - Discard header.
for k = 1 : headerSize
    if feof(fid_in)
        error('Structure discrepancy in %s', fname_in) ;
    end
    fgetl(fid_in) ;
end
% - Extract/process content and output CSV content.
fid_out = fopen(fname_out, 'w') ;
while ~feof(fid_in)
    line = fgetl(fid_in) ;
    lStr = line(1:end-15) ; % Label string.
    dStr = line(end-13:end-6) ; % Time string (HH:MM:SS).
    tStr = line(end-4:end) ; % Duration string (last 5 characters).
    fprintf('|%s|%s|%s|\n', dStr, tStr, lStr) ; % Just for testing.
    fprintf(fid_out, '%s,%s,%s\n', dStr, tStr, lStr) ;
end
fclose(fid_in) ;
fclose(fid_out) ;
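One caveat with writing the label as-is: if a label ever contains a comma, the CSV row will split into too many columns. A possible guard, sketched here with a made-up label, is to wrap the label in double quotes and double any embedded quotes, which is the usual CSV convention. The variable names and sample values are illustrative only.

```matlab
% Hypothetical label containing a comma and a double quote.
lStr = 'Note: eyes "closed", resting' ;
dStr = '00:00:17' ;
tStr = '00:00' ;

% Double embedded quotes, then wrap the whole field in quotes.
quoted  = ['"', strrep(lStr, '"', '""'), '"'] ;
csvLine = sprintf('%s,%s,%s', dStr, tStr, quoted) ;
```

In the export loop, this would amount to passing `quoted` instead of `lStr` to the `fprintf(fid_out, ...)` call.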
sas0701 on 26 Apr 2013
Hi, I just tried the code on Windows R2007b and got the same error. It looks like I am missing actxserver in both R2013 (Mac) and R2007b (Windows). Not sure why, but if I try
Warning: Could not start Excel server for export.
XLSWRITE will attempt to write file in CSV format.
> In xlswrite at 175
>> Excel = actxserver('Excel.Application');
Undefined function 'actxserver' for input arguments of type 'char'.
I'll start a new thread regarding this. Thank you for your help on the above. Very nice :)


More Answers (3)

sas0701 on 25 Apr 2013
Edited: sas0701 on 25 Apr 2013
Sorry, here goes again. The first line is Event List, and there is also an empty line between Event List and Patient Name. So I basically want to ignore everything up to the 8th line (Exam Start):
Event List
Patient Name:xyz
Export Time: Fri..
User Name: xyx
Test ID:
Name Time Duration
Exam Start 00:00:00 00:00
Impedance - Impedance Values 00:00:00 00:00
Change - Montage Apply: 0bical.mtg 00:00:23 00:00
Calibration Off 00:00:26 00:00
  2 comments
Matt Kindig on 25 Apr 2013
Edited: Matt Kindig on 25 Apr 2013
Can you also post your intended output, i.e., what data after the 8th line you want to retain, how you want it to be structured, etc.? Your goal is still a bit unclear.
Cedric on 25 Apr 2013
After the 8th line, you have multiple blocks of
Exam Start 00:00:00 00:00
Impedance - Impedance Values 00:00:00 00:00
Change - Montage Apply: 0bical.mtg 00:00:23 00:00
Calibration Off 00:00:26 00:00
that you want to transpose into a structure that has one row per block?



sas0701 on 25 Apr 2013
Hi,
What I need is this:
00:00:00 00:00:00 Exam Start
00:00:00 00:00:00 Impedance - Impedance Values
00:00:23 00:00:23 Change - Montage Apply: 0bical.mtg
00:00:26 00:00:26 Calibration Off
%-----
Column 1 is the first value after the text (e.g. after Exam Start)
Column 2 is first value + second value (in these examples the second value is 0, but sometimes it's not)
Column 3 is text
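The column 2 rule (first value + second value) could be computed along these lines. This is only a sketch: it assumes the first value is HH:MM:SS and the second value is MM:SS, which the thread does not confirm, and the sample values are made up.

```matlab
startStr = '00:00:23' ;   % First value, assumed HH:MM:SS.
durStr   = '01:45' ;      % Second value, assumed MM:SS (hypothetical sample).

s = sscanf(startStr, '%d:%d:%d') ;   % [hours; minutes; seconds]
d = sscanf(durStr,   '%d:%d') ;      % [minutes; seconds]

% Convert both to seconds, add, and format back as HH:MM:SS.
totalSec = s(1)*3600 + s(2)*60 + s(3) + d(1)*60 + d(2) ;
endStr   = sprintf('%02d:%02d:%02d', floor(totalSec/3600), ...
                   floor(mod(totalSec, 3600)/60), mod(totalSec, 60)) ;
```

If the second value turns out to be HH:MM instead, only the weights applied to `d(1)` and `d(2)` need to change.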
  3 comments
sas0701 on 25 Apr 2013
Hi, there are hundreds of lines of data (I have only shown 4), and sometimes the names can repeat, in no particular order. And the text can be anything.
I need this to be output to an .xls file - so 3 columns :)
Thanks again!
Cedric on 25 Apr 2013
Please, see my updated answer.



D. Ali on 27 Apr 2019
I have a similar question: I need to extract all MCAP, with the times at which they occurred, into a separate file, and plot them if possible.
I attached the file.
  2 comments
Cedric on 27 Apr 2019
Hi, you should start a new thread with this question.
D. Ali on 27 Apr 2019
I did post two threads with the question but didn't get answers, so I am trying to get answers from similar threads.

