Reading in a document full of code

I'm building a program that reads input files for an accelerator physics code, and I really do not know how to start reading the files into MATLAB. Lines starting with "!" are comments, so when reading the file into MATLAB I just want to ignore them so they do not show up. Each name starting with "&", like "&NEWRUN", starts a section of information that I need, and each section ends with "/". I can handle everything in between, but I'm not sure how to separate the sections (there are several of them). Any help, comments, or advice to make this work well would be very welcome. Thank you!
Here is the document in a .in file format:
! header
!
&NEWRUN
Version=3
Head= 'IAC 44 MeV LINAC'
RUN=1,
Distribution = 'IAC_LASER_1p5ps70mmRISE5000.ini', Xoff=0.0, Yoff=0.0
! Qbunch=1.00
XYrms=0.6900
! Trms=4.0e-3
TRACK_ALL=T, PHASE_SCAN=F, AUTO_PHASE=.T
Lmonitor=.F
check_ref_part=.F
H_max=0.001
H_min=0.000
! MAX_STEP=5000000
debunch=0.0
/
&SCAN
LScan= T,
Scan_para='MaxB(1)',
! Fine Scan
S_min= 0.100, S_max=0.200, S_numb=11
!Scan_para='XYrms',
!rms beam size
!S_min=0.6555, S_max=0.7245, S_numb=11
!Scan_para='MaxB(2)',
! Rough Scan
! S_min=0.0, S_max=0.01, S_numb=17
!Scan_para='MaxB(3)',
! Rough Scan
! S_min=0.02, S_max=0.1, S_numb=12
!Scan_para='S_pos(1)',
!Gun Solenoid Position Scanning
!S_min=0.00, S_max=0.1, S_numb=21
!Scan_para='C_pos(2)',
!1st Cavity Position Scanning
!S_min=2.00, S_max=3.00, S_numb=21
!Scan_para='MaxE(1)' ,
!1st Cavity Gradient
!S_min=38.00, S_max=42.00, S_numb=11,
!Scan_para='Phi(1)' ,
!Gun Phase
!S_min=-2.1147, S_max=-1.9133, S_numb=11,
!Scan_para='MaxE(2)' ,
!1st Cavity Gradient
!S_min=9.975, S_max=11.025, S_numb=11,
!Scan_para='Phi(2)' ,
!1st Linac Cavity Phase
!S_min=-1.00, S_max=1.00, S_numb=16,
!Scan_para='MaxE(3)' ,
!S_min=22.85, S_max=52.85, S_numb=31,
FOM(1)='rms bunch length',
FOM(2)='horizontal rms emittance',
FOM(3)='vertical rms emittance',
FOM(4)='longitudinal rms emittance',
FOM(5)='horizontal rms spot size',
FOM(6)='vertical rms spot size',
FOM(7)='bunch charge',
FOM(8)='mean beam energy',
FOM(9)='horizontal rms emittance minus Z correlation',
FOM(10)='horizontal rms beam divergence'
/
! &ERROR
! /
&CHARGE
Loop=F,
LSPCH=.T,
Nrad=15, Nlong_in=20
Cell_var=2.0
min_grid=0.0
Max_scale=0.05
Lmirror=.T
! N_min=100
/
&FEM
/
&CAVITY
LEfield=.T
! 1.5 cells RF Gun with a symmetric coupler
File_Efield(1) = 'ttf2rfgun.dat',
Nue(1)=1.300, MaxE(1)=40.00, Phi(1)=-2.115, C_pos(1)=0.000000,
! The 1st 2.00 m long L-band Linac Structure with 24 cells
FILE_EFIELD(2) = 'TWS_IAC_Lband_ASTRA.dat'
Nue(2)=1.300, MaxE(2)=10.40, Phi(2)=0.8667, C_pos(2)=2.35, C_Numb(2)=24
! The 2nd 2.75 m long L-band Linac Structure with 33 cells
! Distance between 1st and 2nd L-band Linacs = 0.65 m
FILE_EFIELD(3) = 'TWS_IAC_Lband_ASTRA.dat'
Nue(3)=1.300, MaxE(3)=15, Phi(3)=0.0, C_pos(3)=5.15, C_Numb(3)=33
/
&SOLENOID
LBfield=.T,
! Gun Main Solenoid
File_Bfield(1)='TTF2solenoids.dat', MaxB(1)=0.1585,
S_pos(1)=0.0E-3, S_xoff(1)=0.0, S_yoff(1)=0.0, S_Smooth(1)=150
! 1st Solenoids in the 1st L-band Linac Structure
! distance between center of solenoid and 1st Linac head = 0.13795 m
! distance between center of solenoids = 0.35 m
! -> 2.35 m + 0.13795 m = 2.48795 m
! File_Bfield(2)='IAC_SOLENOID_ASTRA.dat', MaxB(2)=0.04
File_Bfield(2)='IAC_SOLENOID_ASTRA.dat', MaxB(2)=0.005
S_pos(2)=2.48795, S_xoff(2)=0.0, S_yoff(2)=0.0, S_Smooth(2)=150
! 2nd Solenoid = 2.48795 m + 0.35 m = 2.83795 m
File_Bfield(3)='IAC_SOLENOID_ASTRA.dat', MaxB(3)=0.005,
S_pos(3)=2.83795, S_xoff(3)=0.0, S_yoff(3)=0.0, S_Smooth(3)=150
! 3rd Solenoid = 2.83795 m + 0.35 m = 3.18795 m
File_Bfield(4)='IAC_SOLENOID_ASTRA.dat', MaxB(4)=0.005,
S_pos(4)=3.18795, S_xoff(4)=0.0, S_yoff(4)=0.0, S_Smooth(4)=150
! 4th Solenoid = 3.18795 m + 0.35 m = 3.53795 m
File_Bfield(5)='IAC_SOLENOID_ASTRA.dat', MaxB(5)=0.09,
S_pos(5)=3.53795, S_xoff(5)=0.0, S_yoff(5)=0.0, S_Smooth(5)=150
! 5th Solenoid = 3.53795 m + 0.35 m = 3.88795 m
File_Bfield(6)='IAC_SOLENOID_ASTRA.dat', MaxB(6)=0.09,
S_pos(6)=3.88795, S_xoff(6)=0.0, S_yoff(6)=0.0, S_Smooth(6)=150
! 6th Solenoid = 3.88795 m + 0.35 m = 4.23795 m
File_Bfield(7)='IAC_SOLENOID_ASTRA.dat', MaxB(7)=0.09,
S_pos(7)=4.23795, S_xoff(7)=0.0, S_yoff(7)=0.0, S_Smooth(7)=150
! 1st - 8th Solenoids in the 2nd L-band Linac Structure
! distance between center of solenoid and 1st Linac head = 0.13795 m
! distance between center of solenoids = 0.35 m
! -> 5.15 m + 0.13795 m = 5.28795 m
File_Bfield(8)='IAC_SOLENOID_ASTRA.dat', MaxB(8)=0.10,
S_pos(8)=5.28795, S_xoff(8)=0.0, S_yoff(8)=0.0, S_Smooth(8)=150
! 2nd Solenoid = 5.28795 m + 0.35 m = 5.63795 m
File_Bfield(9)='IAC_SOLENOID_ASTRA.dat', MaxB(9)=0.10,
S_pos(9)=5.63795, S_xoff(9)=0.0, S_yoff(9)=0.0, S_Smooth(9)=150
! 3rd Solenoid = 5.63795 m + 0.35 m = 5.98795 m
File_Bfield(10)='IAC_SOLENOID_ASTRA.dat', MaxB(10)=0.10,
S_pos(10)=5.98795, S_xoff(10)=0.0, S_yoff(10)=0.0, S_Smooth(10)=150
! 4th Solenoid = 5.98795 m + 0.35 m = 6.33795 m
File_Bfield(11)='IAC_SOLENOID_ASTRA.dat', MaxB(11)=0.10,
S_pos(11)=6.33795, S_xoff(11)=0.0, S_yoff(11)=0.0, S_Smooth(11)=150
! 5th Solenoid = 6.33795 m + 0.35 m = 6.68795 m
File_Bfield(12)='IAC_SOLENOID_ASTRA.dat', MaxB(12)=0.10,
S_pos(12)=6.68795, S_xoff(12)=0.0, S_yoff(12)=0.0, S_Smooth(12)=150
! 6th Solenoid = 6.68795 m + 0.35 m = 7.03795 m
File_Bfield(13)='IAC_SOLENOID_ASTRA.dat', MaxB(13)=0.10,
S_pos(13)=7.03795, S_xoff(13)=0.0, S_yoff(13)=0.0, S_Smooth(13)=150
! 7th Solenoid = 7.03795 m + 0.35 m = 7.38795 m
File_Bfield(14)='IAC_SOLENOID_ASTRA.dat', MaxB(14)=0.10,
S_pos(14)=7.38795, S_xoff(14)=0.0, S_yoff(14)=0.0, S_Smooth(14)=150
! 8th Solenoid = 7.38795 m + 0.35 m = 7.73795 m
File_Bfield(15)='IAC_SOLENOID_ASTRA.dat', MaxB(15)=0.10,
S_pos(15)=7.73795, S_xoff(15)=0.0, S_yoff(15)=0.0, S_Smooth(15)=150
/
&QUADRUPOLE
/

Accepted Answer

Cedric on 19 Oct 2013
Edited: Cedric on 20 Oct 2013
Simple answer below, and a more complete function in comment #3. Code is attached to the comment.
You can use a REGEXP to split blocks/sections, and then post-process each block of data:
content = fileread( 'myFile.in' ) ;
content = regexprep( content, '!.*?\r?\n', '' ) ;
blocks = regexp( content, '&(?<name>[^\n\r]+)(?<data>.+?)/', 'names' ) ;
This creates a struct array blocks with the following type of content:
>> blocks(1)
ans =
name: 'NEWRUN'
data: [1x266 char]
>> blocks(2)
ans =
name: 'SCAN'
data: [1x496 char]
where the field data stores the content of the section as a raw, unprocessed character array. Post-processing seems to be a bit section specific, but here is an example:
for bId = 1 : numel( blocks )
    blocks(bId).parsed = regexp( blocks(bId).data, ...
        '([^=\s,]+)\s*=\s*''?([^\s,\n\r'']+)', ...
        'tokens' ) ;
end
which builds a new field named parsed for each block (you could overwrite blocks(bId).data with parsed data actually, to spare memory, but I kept both at this point for debugging). For example:
>> blocks(1)
ans =
name: 'NEWRUN'
data: [1x266 char]
parsed: {1x15 cell}
shows that 15 parameters were parsed (commented lines excluded) in block/section 1. Parsed parameters are stored as name/value pairs:
>> blocks(1).parsed{1}
ans =
'Version' '3'
>> blocks(1).parsed{4}
ans =
'Distribution' 'IAC_LASER_1p5ps70mmRISE5000.ini'
>> blocks(1).parsed{5}
ans =
'Xoff' '0.0'
This approach takes about 6 lines of code, which is compact for such a complex file. However, you have to be familiar enough with regular expressions to fine-tune the patterns.
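As an example of such fine-tuning: the value pattern used above stops at the first whitespace, so a quoted value that contains spaces, such as 'IAC 44 MeV LINAC', is cut at the first space. The sketch below is one possible variant, not part of the original answer. It assumes values are either single-quoted strings without embedded quotes or bare tokens, and it uses a small made-up sample string in place of blocks(bId).data.

```matlab
% Hypothetical sample block; in practice this would be blocks(bId).data.
data = sprintf( 'Version=3\nHead= ''IAC 44 MeV LINAC''\nRUN=1,\nXYrms=0.6900\n' ) ;

% Value is either a quoted string (spaces allowed) or a bare token.
pattern = '([^=\s,]+)\s*=\s*(''[^'']*''|[^\s,'']+)' ;
parsed  = regexp( data, pattern, 'tokens' ) ;

for pId = 1 : numel( parsed )
    % Strip the surrounding quotes, if any, from the captured value.
    parsed{pId}{2} = regexprep( parsed{pId}{2}, '^''|''$', '' ) ;
end
```

With this variant, parsed{2} holds the pair 'Head' and 'IAC 44 MeV LINAC'. A quoted value that itself contains a quote character would still need extra handling.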
The second approach is simply to read the file line by line and build whatever structure or cell array of parameters you need. For example:
sections = {} ;
sId = 0 ;
fid = fopen( 'myFile.in', 'r' ) ;
while ~feof( fid )
    line = strtrim( fgetl( fid )) ;
    if isempty( line ), continue ; end
    if line(1) == '!', continue ; end
    if line(1) == '&'
        sId = sId + 1 ;
        sections{sId}.name = line(2:end) ;
        sections{sId}.data = {} ;
    elseif line(1) ~= '/'
        sections{sId}.data = [sections{sId}.data; {line}] ;
    end
end
fclose( fid ) ;
This is simple, but some post-processing is still needed:
>> sections{1}
ans =
name: 'NEWRUN'
data: {11x1 cell}
>> sections{1}.data
ans =
'Version=3'
'Head= 'IAC 44 MeV LINAC''
'RUN=1,'
'Distribution = 'IAC_LASER_1p5ps70mmRISE5000.ini', Xoff=0.0, Yoff=0.0'
'XYrms=0.6900'
'TRACK_ALL=T, PHASE_SCAN=F, AUTO_PHASE=.T'
'Lmonitor=.F'
'check_ref_part=.F'
'H_max=0.001'
'H_min=0.000'
'debunch=0.0'
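The remaining post-processing for the line-by-line approach could look roughly like the sketch below, which splits each stored line into name/value pairs. It is only an illustration: rawLines is a hypothetical stand-in for sections{1}.data, and the sketch assumes that values contain no commas or equal signs and that names are plain identifiers (indexed names such as MaxB(1) would need the extra parsing step used in the regexp-based solution).

```matlab
% Hypothetical sample of what sections{1}.data might hold.
rawLines = { 'Version=3' ; 'RUN=1,' ; 'XYrms=0.6900' ; 'TRACK_ALL=T, PHASE_SCAN=F' } ;

params = struct() ;
for k = 1 : numel( rawLines )
    % Split on commas and drop the empty pieces left by trailing commas.
    pieces = strtrim( strsplit( rawLines{k}, ',' )) ;
    pieces = pieces( ~cellfun( @isempty, pieces )) ;
    for p = 1 : numel( pieces )
        % Split each piece into name and value around '='.
        kv = strtrim( strsplit( pieces{p}, '=' )) ;
        value = str2double( kv{2} ) ;
        if isnan( value ), value = kv{2} ; end   % keep non-numeric values as text
        params.(kv{1}) = value ;
    end
end
```

After the loop, params.Version and params.XYrms are numeric, while params.TRACK_ALL and params.PHASE_SCAN remain the character values 'T' and 'F'.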

8 Comments

Cedric on 19 Oct 2013
Edited: Cedric on 19 Oct 2013
If you copied the code between Friday night and Saturday, noon EST, please read the new version and update your code accordingly, because I just performed a significant update (I made regexp patterns more specific, which filters parameters/values much better).
Chris E. on 20 Oct 2013
Thank you for the help and for the reply! I will look into the code and make it work for the code I'm using. Thanks again!
I'm not sure what you mean exactly about updating my code in the second email, maybe I misunderstood what you were referring to...
Cedric on 20 Oct 2013
Edited: Cedric on 20 Oct 2013
I just meant that if you updated your code based on my answer between Friday and Saturday noon, you should re-update it with the modifications that I made on Saturday.
I spent 10 more minutes making a more complete and commented example, and wrapping it into a function (the M-File is attached to this comment; I'm not sure whether the attachment is working, but if not, just copy/paste the code below into a new M-File named importInFile.m):
function sections = importInFile( fileLocator )
    sections = struct() ;
    % Read file content.
    try
        content = fileread( fileLocator ) ;
    catch ME
        return ;
    end
    % First pass parser -> blocks.
    content = regexprep( content, '!.*?\r?\n', '' ) ;
    blocks = regexp( content, '&(?<name>[^\n\r]+)(?<data>.+?)/', 'names' ) ;
    % Post process blocks -> sections.
    for bId = 1 : numel( blocks )
        % Second pass parser -> split parameters/values.
        parsed = regexp( blocks(bId).data, ...
            '([^=\s,]+)\s*=\s*''?([^\s,\n\r'']+)', 'tokens' ) ;
        % Iterate through parameters and parse them further.
        S = struct() ;
        for pId = 1 : numel( parsed )
            % Try conversion to numeric for value.
            value = str2double( parsed{pId}{2} ) ;
            if isnan( value ), value = parsed{pId}{2} ; end
            % Parse parameter name -> {name, id} if possible (arrays).
            tokens = regexp( parsed{pId}{1}, '([^\(]+)\((\d+)\)', 'tokens' ) ;
            if isempty( tokens )
                % Not an array -> store .(name) = value.
                S.(parsed{pId}{1}) = value ;
            else
                % Array -> convert ID to numeric.
                vId = str2double(tokens{1}{2}) ;
                if ischar( value )
                    % Non-numeric value -> cell array .(name){vId} = value.
                    S.(tokens{1}{1}){vId} = value ;
                else
                    % Numeric value -> num array .(name)(vId) = value.
                    S.(tokens{1}{1})(vId) = value ;
                end
            end
        end
        sections.(blocks(bId).name) = S ;
    end
end
It is basically what you already saw, the 6-line solution, with an extra step that parses parameter names and values and builds arrays when relevant. I also changed the output from the struct array blocks to a simpler struct sections. You can now use it as follows:
>> sections = importInFile( 'myFile.in' )
sections =
NEWRUN: [1x1 struct]
SCAN: [1x1 struct]
CHARGE: [1x1 struct]
FEM: [1x1 struct]
CAVITY: [1x1 struct]
SOLENOID: [1x1 struct]
QUADRUPOLE: [1x1 struct]
>> sections.NEWRUN
ans =
Version: 3
Head: 'IAC'
RUN: 1
Distribution: 'IAC_LASER_1p5ps70mmRISE5000.ini'
Xoff: 0
Yoff: 0
XYrms: 0.6900
TRACK_ALL: 'T'
PHASE_SCAN: 'F'
AUTO_PHASE: '.T'
Lmonitor: '.F'
check_ref_part: '.F'
H_max: 1.0000e-03
H_min: 0
debunch: 0
>> sections.NEWRUN.Version
ans =
3
>> sections.SCAN
ans =
LScan: 'T'
Scan_para: 'MaxB(1)'
S_min: 0.1000
S_max: 0.2000
S_numb: 11
FOM: {1x10 cell}
>> sections.SCAN.FOM
ans =
'rms' 'horizontal' 'vertical' 'longitudinal' 'horizontal'
'vertical' 'bunch' 'mean' 'horizontal' 'horizontal'
As you can see, it is now easier to wander through the structure of parameters, and numeric values have already been converted.
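Flags such as 'T', 'F', '.T' and '.F' are left as text by the function. If true/false values are more convenient, a conversion step along the following lines could be added. This is only a sketch: it assumes these four spellings denote logical values (as in Fortran-style namelist input), and raw is a made-up list of parsed values. Note that a genuine string parameter whose value happens to be 'T' or 'F' would be converted too.

```matlab
% Hypothetical values as they come out of the parser.
raw = { 'T', 'F', '.T', '.F', 'MaxB(1)' } ;

converted = raw ;
for k = 1 : numel( raw )
    % Optional leading '.', then a single T or F, and nothing else.
    tok = regexpi( raw{k}, '^\.?([TF])$', 'tokens', 'once' ) ;
    if ~isempty( tok )
        converted{k} = strcmpi( tok{1}, 'T' ) ;   % logical true/false
    end
end
```

Values that do not match the pattern, like 'MaxB(1)', are left untouched.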
Note that I give no guarantee that this code is working correctly, so the evaluation part is on your side. For example, there might be special cases to manage better than what I did, where e.g. the first value of an array is not defined in a section. My code sets them to 0, which might not be appropriate. To illustrate:
>> sections.CAVITY
ans =
LEfield: '.T'
File_Efield: {'ttf2rfgun.dat'}
Nue: [1.3000 1.3000 1.3000]
MaxE: [40 10.4000 15]
Phi: [-2.1150 0.8667 0]
C_pos: [0 2.3500 5.1500]
FILE_EFIELD: {[] 'TWS_IAC_Lband_ASTRA.dat' 'TWS_IAC_Lband_ASTRA.dat'}
C_Numb: [0 24 33]
Here you can see that C_Numb(1) is 0, whereas it is not defined in your .in file.
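One possible way to make such gaps visible is to fill skipped indices of numeric arrays with NaN instead of letting MATLAB pad them with 0. The sketch below is not part of the original function; S, name, vId and value mimic the variables used there, with made-up sample values.

```matlab
% Hypothetical struct mimicking one section being built.
S = struct() ;
name  = 'C_Numb' ;     % array parameter name
vId   = 2 ;            % first assignment targets index 2
value = 24 ;

if ~isfield( S, name )
    S.(name) = [] ;
end
oldLen = numel( S.(name) ) ;
if vId > oldLen + 1
    % Fill the gap with NaN so skipped indices are not mistaken for 0.
    S.(name)(oldLen+1 : vId-1) = NaN ;
end
S.(name)(vId) = value ;
```

Here S.C_Numb ends up as [NaN 24], so isnan can be used to detect entries that were never defined in the file. Cell arrays do not need this, since their gaps already show up as empty cells.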
Cheers,
Cedric
Chris E. on 20 Oct 2013
I understand. I have checked the code so far and it is exactly what I wanted to do. I will also look into the new code you wrote. I would really like to thank you for your help! I appreciate your time in answering my question and helping to make it work. Thanks!
Cedric on 20 Oct 2013
Edited: Cedric on 20 Oct 2013
You're welcome! Let me know if you need help with regular expressions in the code; they aren't completely intuitive I guess.
PS: I kept the regexp with named tokens in the function (because it's what was used in my first example) but it is not necessary anymore; a simpler "non-named tokens" based approach would be enough.
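For readers less used to regular expressions, here is a commented sketch of the two first-pass patterns, written with plain (non-named) tokens as mentioned in the PS. The sample content is made up and much shorter than the real file.

```matlab
% Small hypothetical sample in the same format as the .in file.
content = sprintf( '! header\n&NEWRUN\nVersion=3\n! Qbunch=1.00\nRUN=1,\n/\n&FEM\n/\n' ) ;

% Remove comments: from '!' lazily up to and including the line break.
content = regexprep( content, '!.*?\r?\n', '' ) ;

% '&'         literal start of a section
% ([^\n\r]+)  token 1: section name = rest of that line
% (.+?)       token 2: section body, lazy so it stops at the first '/'
% '/'         literal end of the section
tokens = regexp( content, '&([^\n\r]+)(.+?)/', 'tokens' ) ;

for bId = 1 : numel( tokens )
    fprintf( '%s: %d characters of data\n', ...
        tokens{bId}{1}, numel( tokens{bId}{2} )) ;
end
```

Each element of tokens is a 1x2 cell holding the section name and its raw data. Because the body stops at the first '/', a value containing a slash (a file path, for example) would end the section early and would need a stricter end-of-section pattern.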
Cedric on 21 Oct 2013
If you look at section CAVITY in the content that you posted, you will see that a parameter appears once in mixed upper/lower case and twice in all caps: File_Efield and FILE_EFIELD. If you look at sections.CAVITY above, you see that, as MATLAB is case sensitive, this leads to two fields:
File_Efield: {'ttf2rfgun.dat'}
FILE_EFIELD: {[] 'TWS_IAC_Lband_ASTRA.dat' 'TWS_IAC_Lband_ASTRA.dat'}
with an empty cell in the second. One way to overcome these case issues is to convert all field names to lower case. For this, you need to change the following lines:
S.(parsed{pId}{1}) = value ;
into
S.(lower(parsed{pId}{1})) = value ;
and
S.(tokens{1}{1}){vId} = value ;
into
S.(lower(tokens{1}{1})){vId} = value ;
and
S.(tokens{1}{1})(vId) = value ;
into
S.(lower(tokens{1}{1}))(vId) = value ;
and finally
sections.(blocks(bId).name) = S ;
into
sections.(lower(blocks(bId).name)) = S ;
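The effect of the lower-casing can be checked in isolation with a short sketch like this one. The names and values are copied from the CAVITY section of the posted file; the loop is a simplified stand-in for the array branch of the function.

```matlab
% Parameter names and values as they appear in the CAVITY section.
names  = { 'File_Efield(1)', 'FILE_EFIELD(2)', 'FILE_EFIELD(3)' } ;
values = { 'ttf2rfgun.dat', 'TWS_IAC_Lband_ASTRA.dat', 'TWS_IAC_Lband_ASTRA.dat' } ;

S = struct() ;
for k = 1 : numel( names )
    % Split 'Name(id)' into the name and the numeric index.
    tokens = regexp( names{k}, '([^\(]+)\((\d+)\)', 'tokens' ) ;
    vId = str2double( tokens{1}{2} ) ;
    % Lower-case the field name so both spellings land in the same field.
    S.(lower( tokens{1}{1} )){vId} = values{k} ;
end
```

S now has a single field, file_efield, holding all three file names with no empty cell.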
Chris E. on 22 Oct 2013
Thank you for looking out for me that way!!! I did see that mistake, I fixed it already. Thank you again, you have been really helpful!
Cedric on 22 Oct 2013
You're welcome!


Asked: 19 Oct 2013

Commented: 22 Oct 2013
