A problem while splitting a text input with regexp

I have a text file with the following contents:
sammy yo yo
yoyo with you
Samyukta
I tried the following code to put each word into an element of an array:
fid = fopen('test4.txt');
table = fscanf(fid,'%c');
table2 = regexp(table,'\n','split');
This means that table2{1} returns 'sammy yo yo'. I then split every line individually with strsplit, using ' ' (a space) as the delimiter, so that table2{1}{2} returns 'yo'. However, the last word of every line has more characters than it appears to have: for example, size(table2{1}{2},2) is 3 rather than 2. When I strcmp it with '\n', ' ', or anything else, it returns 0, so now I don't know what to do.

2 Comments

What shows up for
table2{1}{2}(end) + 0
I suspect you will find it is 13 (carriage return)
It gives an error: Cell contents reference from a non-cell array object.
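The check suggested above can be tried on a stand-alone string. This is a minimal sketch, assuming a word that ends in a carriage return; the variable names are only illustrative. It also shows one likely reason the strcmp test returned 0: in MATLAB, a single-quoted '\n' is the two characters backslash and n, not a newline, whereas char(13) or sprintf('\r') produce the actual control character.

```matlab
% Hypothetical word with a trailing carriage return, as read from a CRLF file
word = sprintf('yo\r');

len      = size(word, 2);                 % 3, not 2: the CR counts as a character
lastCode = double(word(end));             % 13, the code for carriage return

% '\n' in single quotes is just a backslash followed by the letter n
isLiteral = strcmp(word(end), '\n');      % false
isCR      = strcmp(word(end), char(13));  % true

% strtrim removes leading and trailing whitespace, including CR and LF
clean = strtrim(word);                    % 'yo'
```

If the word only needs cleaning, strtrim is probably the simplest option, since it does not require knowing which whitespace character is present.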


Accepted Answer

Cedric on 15 Aug 2013
>> fprintf('%d,', table) ; fprintf('\n') ;
115,97,109,109,121,32,121,111,32,121,111,13,10,121,111,121,111,32,119,105,
116,104,32,121,111,117,13,10,83,97,109,121,117,107,116,97,13,10,
As you can see, at the end of each line, there are 13 (carriage return: '\r') and 10 (new line: '\n').
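If the per-line structure from the question is still wanted, one possible approach is to split lines on an optional '\r' followed by '\n', so the carriage return never ends up in the last word. This is a sketch under assumptions: the sample string stands in for the file contents, and table2 is reused from the question only for illustration.

```matlab
% Sample content with Windows line endings (stand-in for the file's text)
buffer = sprintf('sammy yo yo\r\nyoyo with you\r\nSamyukta\r\n');

% Split into lines on LF, optionally preceded by CR
lines = regexp(buffer, '\r?\n', 'split');
lines = lines(~cellfun('isempty', lines));   % drop the empty piece after the last terminator

% Split every line into words on runs of spaces
table2 = cellfun(@(s) regexp(s, ' +', 'split'), lines, 'UniformOutput', false);

lastWord = table2{1}{3};        % 'yo'
n        = size(lastWord, 2);   % 2, no hidden trailing character
```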
If you just want to split the text into words, why not use REGEXP alone, with a pattern that matches whitespace? For example:
>> buffer = fileread('test4.txt') ;
>> words = regexp(buffer, '\s+', 'split')
words =
'sammy' 'yo' 'yo' 'yoyo' 'with' 'you' 'Samyukta' ''
With this, you would just have to delete the last cell when it is empty (which happens when your file ends with '\r\n'), and you would be done.
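Deleting the trailing empty cell could look like the following. This is a minimal sketch in which a sample string stands in for the output of fileread; trimming the buffer before splitting is shown as an alternative that avoids the empty cell in the first place.

```matlab
% Stand-in for fileread('test4.txt') on a file that ends with '\r\n'
buffer = sprintf('sammy yo yo\r\nyoyo with you\r\nSamyukta\r\n');

words = regexp(buffer, '\s+', 'split');

% Remove the last cell only if it is empty
if ~isempty(words) && isempty(words{end})
    words(end) = [];
end

% Alternative: trim leading/trailing whitespace first, then split
words2 = regexp(strtrim(buffer), '\s+', 'split');
```

Both variants should give the same seven words for this input.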

2 Comments

Only if the file was created with an older MS Windows editor. More modern MS Windows editors only put in \n (newline) without \r (carriage return). Linux and OS X have never used \r. Classic Mac OS (before OS X) used \r alone as the line terminator.
Cedric on 15 Aug 2013 (edited 15 Aug 2013)
Or the MATLAB Editor, actually (I used it to generate this file on R2012b, Win7/64).
The pattern '\s+' works in all cases though.
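A quick way to convince yourself of this is to build the same text with each line-ending convention and compare the results. This is only a sketch; the sample strings and variable names are made up for the test.

```matlab
expected = {'sammy','yo','yo','yoyo','with','you','Samyukta'};

% LF (Unix), CRLF (Windows), and CR alone (classic Mac OS)
endings = {sprintf('\n'), sprintf('\r\n'), sprintf('\r')};

same = false(1, numel(endings));
for k = 1:numel(endings)
    eol    = endings{k};
    buffer = ['sammy yo yo' eol 'yoyo with you' eol 'Samyukta'];
    words  = regexp(buffer, '\s+', 'split');   % \s matches space, tab, CR, LF, etc.
    same(k) = isequal(words, expected);
end
```

Because '\s+' treats any run of whitespace as a single delimiter, a '\r\n' pair does not produce an empty word between lines.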
