Include Newline in Textscan

I have the following function for auto-importing data from a tabular text file with delimiter " ".
function RMM_New_Bins = Import_RMM(filename, startRow, endRow)
% Import space-delimited text data (code generated by the Import Tool).
delimiter = ' ';
if nargin<=2
    startRow = 1;
    endRow = inf;
end
% One %s per text field, then %[^\n\r] captures whatever remains on the line
formatSpec = '%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%[^\n\r]';
fileID = fopen(filename,'r');
% Skip the lines before startRow, then read the first block of rows
textscan(fileID, '%[^\n\r]', startRow(1)-1, 'WhiteSpace', '', 'ReturnOnError', false);
dataArray = textscan(fileID, formatSpec, endRow(1)-startRow(1)+1, 'Delimiter', delimiter, 'ReturnOnError', false);
% Read any additional [startRow, endRow] blocks and append them column by column
for block=2:length(startRow)
    frewind(fileID);
    textscan(fileID, '%[^\n\r]', startRow(block)-1, 'WhiteSpace', '', 'ReturnOnError', false);
    dataArrayBlock = textscan(fileID, formatSpec, endRow(block)-startRow(block)+1, 'Delimiter', delimiter, 'ReturnOnError', false);
    for col=1:length(dataArray)
        dataArray{col} = [dataArray{col};dataArrayBlock{col}];
    end
end
fclose(fileID);
% Concatenate all columns except the last one (the rest-of-line field)
RMM_New_Bins = [dataArray{1:end-1}];
How do I include lines that are blank except for tabs in RMM_New_Bins? I think it has to do with the textscan command.


Azzi Abdelmalek on 19 Jul 2016
How and where?
dpb on 19 Jul 2016
Generally, unless you're trying to skip data, you don't need to worry about blank lines; they'll be transparently "eaten" by the scanner when it moves on to the next matching record.
What's the file format, and what are you trying to do in the above code?
At a minimum, use repmat to build a format string with many repeated fields; don't string together umpteen dozen specifiers in a totally indecipherable string:
fmt=[repmat('%s',1,N) '%*[^\n]']; % N %s fields, skip rest of line
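For illustration, a minimal sketch (assuming only three fields, where the question's file has many more) showing that the repmat form produces exactly the same format string as typing the specifiers out by hand:

```matlab
% Build a textscan format from a field count instead of typing it out.
% N = 3 is only for illustration; set it to the number of columns in the file.
N = 3;
fmtByHand = '%s%s%s%*[^\n]';                % spelled-out version
fmt = [repmat('%s', 1, N) '%*[^\n]'];       % N %s fields, then skip the rest of the line
sameFormat = isequal(fmt, fmtByHand)        % true: both are identical character vectors
```

Changing N is then the only edit needed when the column count changes.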
Suganth Kannan on 20 Jul 2016
Edited: Suganth Kannan on 20 Jul 2016
The file was created by a GNU Awk script; its format is tabular with delimiter " ", and I am trying to include the newlines in between the lines of data.
Stephen23 on 20 Jul 2016
@Suganth Kannan: please edit your question and upload a sample file by clicking the paperclip button.
Suganth Kannan on 20 Jul 2016
Edited: Suganth Kannan on 20 Jul 2016
@Stephen Cobeldick I have attached the sample file "Sample.txt". Please click on it and highlight its contents in the browser to see how the file looks. Line 2 (blank) should not show up in the imported array. Only lines 1, 3, 4 (blank), 5, and 6 (blank) should show up.


Answers (1)

dpb on 20 Jul 2016
Edited: dpb on 23 Jul 2016

Pretty bizarre request: "Line 2 (blank) should not show up in the imported array. Only lines 1, 3, 4 (blank), 5, and 6 (blank) should show up" but whatever...
As it's a mixture of data, it can only be held as a (padded) character array or as a cellstr array; I'll choose cellstr as it's much more flexible:
>> f=textread('sample.txt','%s','delimiter','\n','whitespace',''); % read full file as cellstr
>> f(cellfun(@isempty,f))=[]; % eliminate any totally blank rows
>> f % display result...
f =
' 4[A] 6[B] 7[C] '
'4[A] 0 6.23022 '
' '
'6[B] 6.23022 0 '
' '
>>
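Because Sample.txt itself isn't reproduced here, the following sketch uses a hypothetical in-memory stand-in for the cellstr that the read produces (the exact spaces and tabs in the real file are an assumption). It shows that the cellfun(@isempty,...) step removes only the truly empty record and keeps the tab-only ones:

```matlab
% Hypothetical stand-in for the records read from Sample.txt:
% line 2 is truly empty, lines 4 and 6 contain only a tab (char(9)).
t = char(9);
f = {' 4[A] 6[B] 7[C] '; ''; '4[A] 0 6.23022 '; t; '6[B] 6.23022 0 '; t};

f(cellfun(@isempty, f)) = [];   % drop only the zero-length record

numRecords = numel(f)           % 5: original lines 1, 3, 4, 5 and 6 remain
```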
"How do I include blank lines that are tabs..."
ADDENDUM 2: Lines that contain tabs aren't really blank. Of course, to read them you have to read the file in a way that doesn't treat tabs as delimiters or whitespace, which the above does by reading each record whole as a single cell string.
Then, as modified above, you can simply test for and eliminate those records that were, indeed, blank (and hence return TRUE for isempty on the record). Note that isempty on the array f is not the same thing as applying isempty to each member of the cell array, which is what cellfun does.
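To make that distinction concrete, here is a small sketch with a made-up three-element cell array:

```matlab
% isempty on the cell array asks "does f have zero elements?";
% cellfun(@isempty, f) asks the same question of each string inside f.
f = {'4[A] 0 6.23022'; ''; char(9)};

wholeArrayEmpty = isempty(f)            % false: f has three cells
perRecordEmpty  = cellfun(@isempty, f)  % [false; true; false]
```

The tab-only record is not empty, so it survives the filter.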
Continuing the previous illustration after that diversion, this finds the records that contain a tab (char(9)):
>> ~cellfun(@isempty,strfind(f,char(9)))
ans =
0
0
1
0
1
>> find(~cellfun(@isempty,strfind(f,char(9))))
ans =
3
5
>>
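On newer releases (contains was introduced in R2016b, after this thread), the tab test can be written more directly. A sketch using a hypothetical stand-in for the five records displayed above, comparing it with the strfind/cellfun form:

```matlab
% Hypothetical stand-in for the five records displayed above;
% records 3 and 5 hold only a tab character.
t = char(9);
f = {' 4[A] 6[B] 7[C] '; '4[A] 0 6.23022 '; t; '6[B] 6.23022 0 '; t};

hasTab    = contains(f, t);                     % logical column, true where a tab occurs
hasTabOld = ~cellfun(@isempty, strfind(f, t));  % the strfind/cellfun form used above
tabRows   = find(hasTab)                        % row indices of the tab-only records
```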
ADDENDUM
You can, of course, read the non-blank fields in the file with textscan, but short of parsing character by character it is impossible to tell a blank field from a blank delimiter, so all you'll be able to get with it is the non-missing data:
>> fid=fopen('sample.txt','r');
>> f=textscan(fid,repmat('%s',1,3),'collectoutput',1)
f =
{3x3 cell}
>> f{:}
ans =
'4[A]' '6[B]' '7[C]'
'4[A]' '0' '6.23022'
'6[B]' '6.23022' '0'
>> fclose(fid); % release the file handle
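That the blank and tab-only lines simply vanish can be checked without a file, since textscan also accepts a character vector as input. A sketch with hypothetical text approximating Sample.txt (the exact whitespace is assumed):

```matlab
% Hypothetical text approximating Sample.txt; char(10) is a newline.
t  = char(9);
nl = char(10);
str = [' 4[A] 6[B] 7[C] ' nl nl '4[A] 0 6.23022 ' nl t nl '6[B] 6.23022 0 ' nl t nl];

% With the default whitespace (space, backspace, tab) separating fields,
% blank and tab-only lines contribute no fields at all.
c = textscan(str, repmat('%s', 1, 3), 'CollectOutput', 1);
fields = c{1}                      % 3x3 cellstr of the non-missing data
```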


Suganth Kannan on 22 Jul 2016
Edited: Suganth Kannan on 22 Jul 2016
@dpb If you highlight the text, you will see that line 2 is fundamentally different from lines 4 and 6. If you could edit based on this info, that would be great. I used the import GUI and changed all the cell array columns to text; MATLAB auto-ignored line 2 but imported lines 4 and 6.
dpb on 22 Jul 2016
Edited: dpb on 23 Jul 2016
I have no idea what "If you could edit based on this info, ..." means; I gave you a way to import the data and find the rows containing a tab. If the idea is to delete rows that are completely empty but keep those with a tab, then
f(cellfun(@length,f)==0)=[];
applied after reading the file, instead of the aforementioned fixed index 2, should clean those lines out.
dpb on 23 Jul 2016
It dawned on me overnight that a record with length()==0 is simply an empty string, so use the simpler isempty test instead. I made a second update to the original answer incorporating it.
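A quick sketch (hypothetical data) confirming that, for character records, the length test and the isempty test pick out the same rows; cellfun('isempty', f) is an older string syntax that cellfun still accepts:

```matlab
% Hypothetical records: two empty strings and one tab-only string.
f = {'4[A] 0 6.23022'; ''; char(9); ''};

byLength  = cellfun(@length, f) == 0;   % zero-length records
byIsEmpty = cellfun(@isempty, f);       % the same test, stated directly
byName    = cellfun('isempty', f);      % legacy string syntax, same result

sameRows = isequal(byLength, byIsEmpty, byName)   % true
f(byIsEmpty) = [];                                 % keeps the tab-only record
```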


Asked: 19 Jul 2016
Edited: dpb on 23 Jul 2016