different number of delimiters error using readtable function

Question

sermet OGUTCU on 15 Oct 2021


https://la.mathworks.com/matlabcentral/answers/1564531-different-number-of-delimiters-error-using-readtable-function

Edited: dpb on 16 Oct 2021

I use the following code for reading text file:

fileID = fopen(full_file_name);
fclose(fileID);
tCOD=readtable(full_file_name,'FileType','text', ...
    'headerlines',25,'readvariablenames',0,'MultipleDelimsAsOne', true);

The code above works for most of the text files I read; I attached one of them (data_without_problem.txt). But for some text files, I receive the following error:

Error using readtable (line 216)
Reading failed at line 121. All lines of a text file must have the same number of delimiters. Line 121 has 6 delimiters, while
    preceding lines have 5.
    
    Note: readtable detected the following parameters:
    'Delimiter', '\t ', 'MultipleDelimsAsOne', true, 'Format', '%q%f%f%f%f%f'

I attached this kind of text file (data_with_problem.txt).

How can I modify the readtable call above so that it works with text files whose lines have differing numbers of delimiters?

My MATLAB version is R2019a.

2 Comments

dpb on 15 Oct 2021

#dP2019  9  7  0  0  0.00000000     576   u+U IGS14 FIT  GFZ
## 2069 518400.00000000   300.00000000 58733 0.0000000000000
+   95   C01C02C03C04C05C06C07C08C09C10C11C12C13C14C16E01E02
+        E03E04E05E07E08E09E11E12E13E14E15E18E19E21E24E25E26
+        E27E30E31E33E36G01G02G03G04G05G06G07G08G09G10G11G12
+        G13G14G15G16G17G18G19G20G21G22G23G24G25G26G27G28G29
+        G30G31G32J02J03J07R01R02R03R05R07R08R09R11R12R13R14
+        R15R16R17R18R19R20R21R22R23R24 00 00 00 00 00 00 00
++        10 10 10 10 10  6  8  6  6  8 10  8  8  6  6  6  6
++         6  6  8  6  6  6  6  6  6  6  6  6  6  6  6  6  6
++         6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
++         6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6  6
++         6  6  6  6  6  8 10  6  8  8  8  8  6  6  6  6  6
++         6  6  6  6  8  8  6  6  6  6  0  0  0  0  0  0  0
%c M  cc GPS ccc cccc cccc cccc cccc ccccc ccccc ccccc ccccc
%c cc cc ccc ccc cccc cccc cccc cccc ccccc ccccc ccccc ccccc
%f  1.2500000  1.025000000  0.00000000000  0.000000000000000
%f  0.0000000  0.000000000  0.00000000000  0.000000000000000
%i    0    0    0    0      0      0      0      0         0
%i    0    0    0    0      0      0      0      0         0
/* PCV:IGS14_2062 OL/AL:FES2004  NONE     YN CLK:CoN ORB:CoN
/*     GeoForschungsZentrum Potsdam
/*                                                   
/*                                                   
*  2019  9  7  0  0  0.00000000
PC01 -32247.666769  27128.253711    852.734449   -142.854736                    
PC02   4291.889841  41959.462941   -227.014442    886.812431                    
PC03 -14756.270737  39468.969011    529.367042    -15.526662                    
PC04 -39608.430012  14398.971601    684.035369    -18.784543     
...

is the beginning of the so-called "problem" file -- what do you expect to be able to read from it?

It clearly has header information and different kinds of data in it; a "one size fits all" solution is unlikely to be possible unless you can just skip the header and read the regular data after the header information.

sermet OGUTCU on 15 Oct 2021

Edited: sermet OGUTCU on 15 Oct 2021

The problem isn't related to the header part. The problem is related to the following parts from the data_with_problem.txt:

*  2019  9  8  0  0  0.00000000
PC01 -32239.736154  27137.541640    844.727407   -138.707032               P   P
PC02   4294.572473  41959.818862   -211.999776    884.760306               P   P
PC03 -14769.274349  39464.569794    538.154451     -7.386696               P   P
PC04 -39609.638005  14394.049565    685.546634    -20.124467               P   P
PC05  21849.957336  36044.780362   -426.557427     34.985740               P   P
PC06 -13524.102154  21018.274072 -33636.478150    425.878264               P   P
PC07 -22156.702195  33972.389053  10694.698624   -115.964428               P   P
PC08  -7463.496078  34069.209662  23990.306083     -2.364877               P   P
PC09   2195.129733  26020.780507 -32809.344188    -49.805776               P   P
PC10 -10965.396049  34662.341002  21172.021893   -155.406014               P   P
PC11   3987.733082  19949.125865  19183.812893    125.613081               P   P
PC12 -19966.340237    149.143914  19517.047857   -242.357648               P   P

The trailing P P fields cause the problem when using readtable. data_without_problem.txt doesn't include the P P fields, and readtable reads it without any problem.


Answer 1

dpb on 15 Oct 2021

https://la.mathworks.com/matlabcentral/answers/1564531-different-number-of-delimiters-error-using-readtable-function#answer_809621

Edited: dpb on 16 Oct 2021

Use an import options object -- although to write fully generic import code you'll have to scan each file to find its number of header lines, as detectImportOptions isn't clever enough to know what you intend about the header data on its own...

I used the explicit number of header lines here

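% Skip the 25 header lines and treat the '*' (date stamp) rows as comments
optW=detectImportOptions('data_with_problem.txt','NumHeaderLines',25,"CommentStyle",'*','ReadVariableNames',0);
optW.MissingRule='omitrow';                                  % drop rows with missing fields (the final EOF record)
optW.SelectedVariableNames=optW.SelectedVariableNames(1:5);  % keep the name and the four numeric columns
tDW=readtable('data_with_problem.txt',optW);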

This produces a table whose head and tail look like --

>> [head(tDW);tail(tDW)]
ans =
  16×5 table
      Var1       Var2       Var3       Var4       Var5  
    ________    _______    _______    _______    _______
    {'PC01'}     -32248      27128     852.73    -142.85
    {'PC02'}     4291.9      41959    -227.01     886.81
    {'PC03'}     -14756      39469     529.37    -15.527
    {'PC04'}     -39608      14399     684.04    -18.785
    {'PC05'}      21860      36038    -443.83     39.711
    {'PC06'}     -13165      21105     -33718     427.66
    {'PC07'}     -22236      33734      11284    -113.86
    {'PC08'}    -7668.8      34363      23503    -2.0925
    {'PR17'}     -10797     3515.3      22840      258.4
    {'PR18'}     2372.7      16249      19510     6.6669
    {'PR19'}      13760      20695     5772.9    -52.585
    {'PR20'}      17905      12385     -13261    -389.79
    {'PR21'}      10304    -4036.6     -22990    -70.999
    {'PR22'}    -4738.6     -17784     -17720    -36.732
    {'PR23'}     -16617     -19348      7.969     252.43
    {'PR24'}     -17845     -10556      14834    -184.49
>> 
>> whos tDW
Name        Size            Bytes  Class    Attributes
tDW       380x5             56568  table              
>> 

The same logic will work for the files without the trailing 'P' in the records; the key is to tell it to only import the field name and the four numeric variables.

That assumes you don't need those based on your above description. If you do need them, then use

optW.ExtraColumnsRule='addvars';

and don't restrict SelectedVariableNames to the first five.

With the number of header lines determined externally first, the above will work for either file. Note that I used 'CommentStyle','*' to get rid of the date stamp rows; if you want to keep those to parse them separately, remove it. readtable is not flexible enough to have more than one comment character, so with '*' already taken I used 'omitrow' for the MissingRule to eliminate the last EOF record. If you keep the time stamp rows, you could set the comment character to 'E' for that purpose instead.
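If the date stamp rows are wanted as well, one way to parse them separately is a second pass over the file with fgetl and sscanf. The following is only a sketch under assumptions: the helper name readEpochStamps is made up for illustration, and it assumes every date stamp record starts with '* ' followed by six numbers (year, month, day, hour, minute, second), as in the excerpts above.

```matlab
function epochs = readEpochStamps(file)
% Hypothetical helper (not from the thread): collect the
% '*  yyyy mm dd hh mi ss' records of the file as datetime values.
  fid = fopen(file);
  if fid < 0, error('Cannot open %s', file); end
  closer = onCleanup(@() fclose(fid));   % close the file even if an error occurs
  stamps = zeros(0, 6);
  l = fgetl(fid);
  while ischar(l)                        % fgetl returns -1 at end of file
    if startsWith(l, '* ')               % '/*' header lines do not match
      v = sscanf(l(2:end), '%f');        % year month day hour minute second
      if numel(v) == 6
        stamps(end+1, :) = v.';          %#ok<AGROW>
      end
    end
    l = fgetl(fid);
  end
  epochs = datetime(stamps);             % one datetime per date stamp record
end
```

Called as epochs = readEpochStamps('data_with_problem.txt'), this would return one entry per date stamp row, leaving the readtable call above unchanged.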

ADDENDUM:

A little routine to return the number of header lines could look something like --

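function nHdr=getNumHeaderLines(file)
  % Count lines up to and including the first date stamp record ('*  yyyy ...')
  fid=fopen(file);
  if fid<0, error('Cannot open %s',file); end
  nHdr=1;
  l=fgetl(fid);
  while ischar(l) && ~startsWith(l,'* ')   % ischar guards against end of file
    nHdr=nHdr+1;
    l=fgetl(fid);
  end
  fclose(fid);
end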

Running the same logic at the command line on the problem data file returns --

>> fid=fopen('data_with_problem.txt');
>> nHdr=1;
>> while ~startsWith(fgetl(fid),'* '),nHdr=nHdr+1;end
>> nHdr
nHdr =
    25
>> fid=fclose(fid);

to illustrate it returns the value you want/need...
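Putting the two pieces together, a wrapper along the following lines should handle either kind of file. Treat it as a sketch rather than tested code: the function name readOrbitTable is made up for illustration, the header count is inlined so the block stands alone, 'FileType','text' is carried over from the question's readtable call, and it assumes the first five detected variables are the name and the four numeric columns, as in the output shown above.

```matlab
function t = readOrbitTable(file)
% Hypothetical wrapper (name is illustrative, not from the thread).
  % Count lines up to and including the first '*  yyyy mm dd ...' record.
  fid = fopen(file);
  if fid < 0, error('Cannot open %s', file); end
  nHdr = 1;
  l = fgetl(fid);
  while ischar(l) && ~startsWith(l, '* ')
    nHdr = nHdr + 1;
    l = fgetl(fid);
  end
  fclose(fid);

  % Same options as above, with the header count found per file.
  opt = detectImportOptions(file, 'FileType', 'text', 'NumHeaderLines', nHdr, ...
        'CommentStyle', '*', 'ReadVariableNames', false);
  opt.MissingRule = 'omitrow';                                % drops the final EOF record
  opt.SelectedVariableNames = opt.SelectedVariableNames(1:5); % name + four numeric columns
  t = readtable(file, opt);
end
```

Usage would then be the same for both attachments, e.g. tDW = readOrbitTable('data_with_problem.txt') and tD = readOrbitTable('data_without_problem.txt').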

1 Comment

sermet OGUTCU on 16 Oct 2021

Dear @dpb, thank you very much for the answer.
