Read large dat file and get the necessary data

I have a rather large .dat file (~1.5 GB) that I import into MATLAB. It contains text keys and values, as shown below.
{
"os": [
{
"utc": "2021-09-14 02:54:56",
"lat": 35.59538,
"lon": 129.574246,
"hdt": 295.9,
"rot": -2.1,
"sog": 1.0,
"cog": 335.5,
"rudder_order_stbd": null,
"rudder_order_port": null,
"rudder_stbd": 0.0,
"rudder_port": 0.0,
"rpm_stbd": 0.0,
"rpm_port": 0.0,
"stw_long": 0.87,
"stw_trans": "NaN",
"stw_long_stern": "NaN",
"stw_trans_stern": "NaN",
"stw_speed": null,
"wind_dir": 134.0,
"wind_speed": 5.5,
"current_dir": null,
"current_speed": null
},
{
"utc": "2021-09-14 02:54:58",
"lat": 35.595385,
"lon": 129.574233,
"hdt": 295.9,
"rot": -1.3,
"sog": 0.9,
"cog": 331.1,
"rudder_order_stbd": null,
"rudder_order_port": null,
"rudder_stbd": 0.0,
"rudder_port": 0.0,
"rpm_stbd": 0.0,
"rpm_port": 0.0,
"stw_long": 0.87,
"stw_trans": "NaN",
"stw_long_stern": "NaN",
"stw_trans_stern": "NaN",
"stw_speed": null,
"wind_dir": 141.0,
"wind_speed": 5.3,
and
"ts": [
[
{
"header": "VDM",
"msg_type": 1,
"mmsi": 440196110,
"navi_status": 0,
"time_stamp": 54,
"lat": 35.515383,
"lon": 129.386093,
"hdt": 2,
"rot_raw": 0,
"rot": "0",
"cog": 327.6,
"sog": 0.0
},
{
"header": "VDM",
"msg_type": 1,
"mmsi": 355924000,
"navi_status": 0,
"time_stamp": 56,
"lat": 35.345183,
"lon": 129.467416,
"hdt": 221,
"rot_raw": -127,
"rot": "-708",
"cog": 225.0,
"sog": 2.6
}
I want to export the values of each parameter as a matrix in a .DAT file. But as you can guess, for a file this size it takes forever to run through. Is there a better way to do this and export the data?
Many thanks!

2 comments

Stephen23 on 16 Feb 2022
@Diep Nguyen: please upload a representative data file by clicking the paperclip button.
A representative data file can be shortened, but must include sufficient data so that we can understand the file format.
Jan on 16 Feb 2022
Yes, reading a 1.5 GB JSON file in text mode will take some time.
It is not clear what "export each matrix in .DAT file" means. Which matrices do you mean?


Accepted Answer

Mathieu NOE on 16 Feb 2022
Hello,
I made this little wrapper for you.
The number of parameters you export is up to you (set parameters_length).
Here is the full monty, with all 22 fields saved to Excel (one row per parameter, time running along the row).
Code:
clc
clearvars
filename = 'Data_22.txt';
parameters_length = 22; % do not exceed the max = 22
% data to retrieve (for info)
% "utc": "2021-09-14 02:56:02",
% "lat": 35.595596,
% "lon": 129.574031,
% "hdt": 294.6,
% "rot": -1.5,
% "sog": 0.9,
% "cog": 326.8,
% "rudder_order_stbd": null,
% "rudder_order_port": null,
% "rudder_stbd": 0.0,
% "rudder_port": 0.0,
% "rpm_stbd": 0.0,
% "rpm_port": 0.0,
% "stw_long": 0.79,
% "stw_trans": "NaN",
% "stw_long_stern": "NaN",
% "stw_trans_stern": "NaN",
% "stw_speed": null,
% "wind_dir": 90.0,
% "wind_speed": 3.1,
% "current_dir": null,
% "current_speed": null
% a = readlines(filename); % if you have readlines
a = my_readlines(filename); % work around for earlier matlab releases (not having readlines)
% first data (utc) indexes (as reference for further processing)
indexes = find(contains(a,'utc'));
out = [];
for ci = 1:length(indexes)
    current_index = indexes(ci);
    STR = split(a(current_index +(0:parameters_length-1))','":');
    tmp = strrep(STR(:,2),',',''); % get rid of commas
    tmp = strrep(tmp,'"','');      % get rid of double quotes
    out = [out tmp];               % concatenation of all data
end
% concatenate labels (first column of STR) with all data (out)
tmp = strrep(STR(:,1),'"',''); %get rid of double quotes
out = [tmp out];
% now save to excel
writecell(out,'test.xlsx');
%%%%%%%%%%%%%%%%
function LINES = my_readlines(FILENAME)
    % work around for earlier matlab releases (not having readlines)
    LINES = regexp(fileread(FILENAME), '\r?\n', 'split');
    if isempty(LINES{end}); LINES(end) = []; end % end of file correction
end
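
The loop above grows out on every pass, which can become slow for a file of this size. A minimal sketch of the same line-scanning idea with a preallocated cell array is shown below. The function name extract_block and the output file name os_data.dat are hypothetical, and the sketch assumes every record is exactly n consecutive '"key": value' lines, as in the sample data. It returns one row per record (labels in row 1), which is the transpose of the layout used above.

```matlab
% Sketch only: same parsing idea as above, with preallocation.
% Usage (file names are placeholders):
%   lines = my_readlines('Data_22.txt');
%   os = extract_block(lines, '"utc"', 22);
%   writecell(os, 'os_data.dat', 'Delimiter', 'tab');
function out = extract_block(lines, ref, n)
    idx = find(contains(lines, ref));        % first line of each record
    idx = idx(idx + n - 1 <= numel(lines));  % drop a truncated last record
    out = cell(numel(idx) + 1, n);           % row 1 = labels, then one row per record
    for k = 1:numel(idx)
        block = lines(idx(k) + (0:n-1));
        parts = split(block(:), '":');       % n-by-2: key | value
        vals  = strtrim(erase(parts(:,2), {',', '"'}));
        out(k+1, :) = cellstr(vals).';
        if k == 1                            % take the labels from the first record
            keys = strtrim(erase(parts(:,1), '"'));
            out(1, :) = cellstr(keys).';
        end
    end
end
```

Writing to a .dat text file instead of a spreadsheet also sidesteps Excel's per-sheet size limits, which matters when there are many records.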

5 comments

Hello,
this is an improved version of the code: it exports the "os" section data to the first sheet of the Excel output file and the "ts" section data to the second sheet.
Hope it helps.
clc
clearvars
%% load file
filename = 'Data_22.txt';
% a = readlines(filename); % if you have readlines
a = my_readlines(filename); % work around for earlier matlab releases (not having readlines)
%% "os" data section
os_parameters_length = 22; % do not exceed the max = 22
% "utc": "2021-09-14 02:56:02",
% "lat": 35.595596,
% "lon": 129.574031,
% "hdt": 294.6,
% "rot": -1.5,
% "sog": 0.9,
% "cog": 326.8,
% "rudder_order_stbd": null,
% "rudder_order_port": null,
% "rudder_stbd": 0.0,
% "rudder_port": 0.0,
% "rpm_stbd": 0.0,
% "rpm_port": 0.0,
% "stw_long": 0.79,
% "stw_trans": "NaN",
% "stw_long_stern": "NaN",
% "stw_trans_stern": "NaN",
% "stw_speed": null,
% "wind_dir": 90.0,
% "wind_speed": 3.1,
% "current_dir": null,
% "current_speed": null
%% "ts" data section
ts_parameters_length = 12; % do not exceed the max = 12
% "header": "VDM",
% "msg_type": 1,
% "mmsi": 440196110,
% "navi_status": 0,
% "time_stamp": 54,
% "lat": 35.515383,
% "lon": 129.386093,
% "hdt": 2,
% "rot_raw": 0,
% "rot": "0",
% "cog": 327.6,
% "sog": 0.0
%% main loop
out_os = do_job(a,'utc',os_parameters_length);
out_ts = do_job(a,'header',ts_parameters_length);
% now save to excel (one column per record; an Excel sheet holds at most 16,384 columns)
out_file = 'test.xlsx';
writecell(out_os,out_file,"Sheet",1);
writecell(out_ts,out_file,"Sheet",2);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% sub functions section %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function out = do_job(a,ref,parameters_length)
    % example : first data (utc) indexes (as reference for further processing)
    indexes = find(contains(a,ref));
    out = [];
    for ci = 1:length(indexes)
        current_index = indexes(ci);
        STR = split(a(current_index +(0:parameters_length-1))','":');
        tmp = strrep(STR(:,2),',',''); % get rid of commas
        tmp = strrep(tmp,'"','');      % get rid of double quotes
        out = [out tmp];               % concatenation of all data
    end
    % concatenate labels (first column of STR) with all data (out)
    tmp = strrep(STR(:,1),',',''); % get rid of commas
    tmp = strrep(tmp,'"','');      % get rid of double quotes
    tmp = strtrim(tmp);            % remove leading and trailing whitespace
    out = [tmp out];
end
%%%%%%%%%%%%%%%%
function LINES = my_readlines(FILENAME)
    % work around for earlier matlab releases (not having readlines)
    LINES = regexp(fileread(FILENAME), '\r?\n', 'split');
    if isempty(LINES{end}); LINES(end) = []; end % end of file correction
end
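
As Jan notes in the comments, the data looks like JSON. If the complete file is valid JSON and fits in memory (both are assumptions, since the excerpt in the question is truncated), the built-in jsondecode might replace the line-by-line parsing. A minimal sketch on a small inline sample shaped like the "os" section follows; the output file name is hypothetical.

```matlab
% Sketch only: decode a small JSON sample shaped like the "os" section.
txt = ['{"os":[' ...
       '{"utc":"2021-09-14 02:54:56","lat":35.59538,"sog":1.0,"stw_speed":null},' ...
       '{"utc":"2021-09-14 02:54:58","lat":35.595385,"sog":0.9,"stw_speed":null}]}'];
data = jsondecode(txt);   % for a real file: jsondecode(fileread(filename))
os   = data.os;           % struct array, one element per record
lat  = [os.lat].';        % numeric column vector
sog  = [os.sog].';
utc  = {os.utc}.';        % timestamps as a cell column of char vectors
T    = table(utc, lat, sog);
% writetable(T, 'os_data.dat', 'Delimiter', 'tab');  % hypothetical output name
```

Note that a null inside a record decodes to an empty array and a quoted "NaN" stays text, so such fields need extra handling (for example str2double) before they can go into a numeric matrix.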
Diep Nguyen on 21 Feb 2022
Hello!!
Thank you so much for your help.
It helps me so much.
Mathieu NOE on 1 Mar 2022
As always, my pleasure!
Would you mind accepting my answer? Thanks.
Mathieu NOE on 1 Mar 2022
Hello,
good news.
When I say "accept" my answer, I mean that on your side you click the "accept" button that should appear next to my answer.
Mathieu NOE on 1 Mar 2022
No problem :)

Asked: 16 Feb 2022

Commented: 1 Mar 2022
