Yet another TEXTSCAN question...

Example string:
s = ['"1","2","3"' 10 '"","2","3"' 10 '"1","","3"' 10 '"1","2",""' 10 '"","",""' 10]
s =
'"1","2","3"
"","2","3"
"1","","3"
"1","2",""
"","",""
'
I want to extract columns as either cellstring or as numbers, using textscan (because it is fast). I can cheat and do this with the following:
t=textscan(strrep(s,'"',''),'%f%f%f','Delimiter',','); [t{:}] %as number
ans =
1 2 3
NaN 2 3
1 NaN 3
1 2 NaN
NaN NaN NaN
t=textscan(strrep(s,'"',''),'%s%s%s','Delimiter',','); [t{:}] %as string
ans =
5×3 cell array
{'1' } {'2' } {'3' }
{0×0 char} {'2' } {'3' }
{'1' } {0×0 char} {'3' }
{'1' } {'2' } {0×0 char}
{0×0 char} {0×0 char} {0×0 char}
But how can I do this without strrep, so that textscan operates on the file ID directly?
I have spent hours thinking I almost had it; a million permutations later, still no joy. :'(

2 comments

Rik on 27 May 2018
Is strrep so much slower that it is not feasible?
Serge on 28 May 2018
Performance-wise this is OK because textscan does 90% of the work, but it does require the whole file to be read in first, even if, say, you only want one of 50 fields in the data. Using textscan directly on the file ID would have been neater. It looks like that is not possible with this file format...
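For reference, a minimal sketch (not taken from the thread) of reading just one field straight from a file ID. It assumes every field is double-quoted as in the example, uses %q (which strips the surrounding double quotes) for the wanted field and %*q to read and discard the other fields, and converts the result with str2double, which returns NaN for empty fields. The temporary file is only a stand-in for the real data file.

```matlab
% Sample data: every field double-quoted, some fields empty ("").
s = ['"1","2","3"' 10 '"","2","3"' 10 '"1","","3"' 10 '"1","2",""' 10 '"","",""' 10];

% Write the sample to a temporary file (stand-in for the real data file).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fwrite(fid, s);
fclose(fid);

% Read only the 2nd of 3 fields: %*q reads a quoted field and discards it,
% %q keeps the field with its surrounding double quotes removed.
fid = fopen(fname, 'r');
c = textscan(fid, '%*q%q%*q', 'Delimiter', ',');
fclose(fid);
delete(fname);

% Convert the kept text column to numbers; empty fields become NaN.
col2 = str2double(c{1});
```

textscan still has to parse every line, but only the selected column is returned, and the file never has to be held in memory as one char array and passed through strrep first.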


Answers (2)

dpb on 27 May 2018
Edited: dpb on 27 May 2018
Let textscan do the equivalent strrep for you...
>> fmt1=repmat('%f',1,3);
>> t=cell2mat(textscan(s,fmt1,'delim',',','collectout',1,'whitespace','"'))
t =
1 2 3
NaN 2 3
1 NaN 3
1 2 NaN
NaN NaN NaN
>> fmt2=repmat('%s',1,3);
>> t=textscan(s,fmt2,'delim',',','collectout',1,'whitespace','"')
t =
1×1 cell array
{5×3 cell}
>> t{:}
ans =
5×3 cell array
{'1"' } {'2"' } {'3"' }
{0×0 char} {'2"' } {'3"' }
{'1"' } {0×0 char} {'3"' }
{'1"' } {'2"' } {0×0 char}
{0×0 char} {0×0 char} {0×0 char}
>>

7 comments

Serge on 28 May 2018
But how do I remove the trailing " in the %s case?
dpb on 28 May 2018
Edited: dpb on 28 May 2018
Hmmm... interesting, I just noticed the closing quote is left... looks like a bug in textscan to me.
Maybe generate the file without the superfluous quotes to begin with?
Serge on 28 May 2018
I wish they were my files...
Stephen23 on 28 May 2018
Edited: Stephen23 on 28 May 2018
To read them as char, use %q instead of %s; it automatically ignores the double quotes:
out = textscan(s, '%q%q%q', 'Delimiter',',', 'CollectOutput',1)
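Building on that, a sketch (the str2double step is an addition, not part of the original comment) that gets both outputs from a single %q read of the first example string: the cell array of text as returned, and the numeric matrix by converting it with str2double, which maps empty fields to NaN just like the strrep version above.

```matlab
% First example: all fields double-quoted, some empty.
s = ['"1","2","3"' 10 '"","2","3"' 10 '"1","","3"' 10 '"1","2",""' 10 '"","",""' 10];

% %q removes the surrounding double quotes; "" becomes an empty char.
c = textscan(s, '%q%q%q', 'Delimiter', ',', 'CollectOutput', 1);
strs = c{1};              % 5x3 cell array of char vectors

% Convert to numbers; empty fields become NaN.
nums = str2double(strs);  % 5x3 double
```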
Stephen23 on 28 May 2018
Edited: Stephen23 on 28 May 2018
Serge's "Answer" moved here:
Sorry, I forgot to say that the file contains a mixture of string and numeric columns, so I am afraid that does not work.
dpb on 28 May 2018
+1 Stephen; I forgot about '%q'.
Serge, %q reproduces your requested output for the string case, and the alternative I posted works for the numeric case.
Serge on 28 May 2018
Perhaps this is a better example:
s = ['"","",""' 10 '"a","2",""' 10 '"a","","c"' 10 '"","2","c"' 10]
s =
'"","",""
"a","2",""
"a","","c"
"","2","c"
'
Here any value can be of any length or empty, and any column may be numeric; the column types are described by the file header.
I almost got it working with this ugly thing:
t = textscan(s,'%q%f%s','delim',{'","' '"'})
Issue: the first column cannot be numeric; it must be a string column read with %q, while subsequent string columns must use %s.
I would love to see other suggestions. Because this is ugly and has issues, I'll stay with the strrep cheat.
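One possible alternative, sketched here rather than taken from the thread: read every column with %q directly from the file ID, then convert only the numeric columns with str2double. The logical vector numCols is hypothetical; in practice it would be built from the file header, whose format is not shown. The temporary file only stands in for the real file.

```matlab
% Hypothetical column types (e.g. parsed from the file header):
% true = numeric column, false = text column.
numCols = [false true false];

% Second example: any field may be empty.
s = ['"","",""' 10 '"a","2",""' 10 '"a","","c"' 10 '"","2","c"' 10];

% Write the sample to a temporary file (stand-in for the real data file).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fwrite(fid, s);
fclose(fid);

% Read every column as double-quoted text, quotes removed.
fid = fopen(fname, 'r');
fmt = repmat('%q', 1, numel(numCols));
c = textscan(fid, fmt, 'Delimiter', ',');
fclose(fid);
delete(fname);

% Convert only the numeric columns; empty fields become NaN.
for k = find(numCols)
    c{k} = str2double(c{k});
end
```

Under these assumptions the first column is no longer special, because every column uses the same %q specifier; empty fields come back as '' in text columns and NaN in numeric ones.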


Jeremy Hughes on 29 May 2018
If the numbers are always surrounded by double quotes, try this:
t = textscan(s,'"%f""%f""%f"','Delimiter',',')
or,
t = textscan(s,'%f%f%f', 'Delimiter',',','Whitespace',' \t"')
There are a lot of knobs in textscan. If you have a file with this kind of data, I suggest:
opts = detectImportOptions(filename)
t = readtable(filename, opts)
HTH,
Jeremy

1 comment

Serge on 29 May 2018
Thank you Jeremy,
I think I have tried every permutation under the sun :/
This one fails when a value is empty: ,"",
t = textscan(s,'"%f""%f""%f"','Delimiter',',')
And this one grabs the " at the end of strings, e.g. 'a"':
s = ['"a","","c"' 10 '"","2","c"' 10]
t = textscan(s,'%s%f%q','Delimiter',',','Whitespace',' \t"')
dpb said it looks like a bug in textscan, and I am inclined to agree.



Asked: 27 May 2018
Commented: 29 May 2018
