Loading in Data from large files

Mitch Lautigar on 19 May 2022
Edited: dpb on 23 May 2022
Hello everyone,
I'll explain what I'm trying to do below, and I welcome any suggestions the community can provide.
  • Problem: I am trying to read a large (>10 GB) binary file and parse specific data from it. I can already parse the data, but MATLAB runs out of RAM when parsing such large amounts of data.
  • Current logic: I have been using memmapfile to load the data, and it worked until I had to deal with large file sizes. I am aware that memmapfile can skip a specific number of bytes and start at a later point, but I need to load a specific number of bits at a time. I'm trying to avoid fread since it takes so long.
  • Goal: I want to parse these binary files in smaller sections, using a command or some programmed logic to read a small percentage of the file at a time, run the computations, grab what I need and store it elsewhere, then grab the next small percentage of the file and repeat. I've included some detailed notes on my intent below to try to help, but this is a project I am not allowed to share code from (copyright).
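As a rough sketch of the "skip ahead and load only part of the file" idea: memmapfile takes an 'Offset' (bytes to skip from the start of the file) and a 'Repeat' (how many elements of the given 'Format' to map), so a single window can be mapped without touching the rest of the file. The demo file, the 10-byte header and the 32-byte window below are assumptions made up so the snippet runs on its own; they are not the poster's real layout.

```matlab
% Build a small demo file so the sketch is runnable (stand-in for the real data).
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, uint8(0:199), 'uint8');   % 200 bytes holding the values 0..199
fclose(fid);

headerBytes = 10;   % assumed header size
windowBytes = 32;   % assumed window size (would be ~500 MB for the real file)
windowIndex = 2;    % zero-based index of the window to map

% Map a single window only: 'Offset' skips bytes from the start of the file,
% 'Repeat' limits how many uint8 elements are mapped.
offset = headerBytes + windowIndex * windowBytes;
m = memmapfile(fname, 'Offset', offset, 'Format', 'uint8', 'Repeat', windowBytes);

firstVal = m.Data(1);     % first byte of the window
n = numel(m.Data);        % number of bytes mapped

clear m                   % release the mapping before deleting the demo file
delete(fname);
```

The offset plus the mapped length must stay inside the file, so the last window of a real file generally needs a smaller 'Repeat'.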
Procedure for what I want to do:
  1. Load the header info of the file. This is static info that is easy to load, and I can already do this well.
  2. Load a percentage of the file. For this specific logic, let's assume my file is 10 GB and I want to read it in 500 MB sections.
  3. Parse the 500 MB I read. I'm aware that I may only be able to parse 492 MB of it, in which case I need to make sure I read 508 MB the next time.
  4. Store the parsed data in a structure.
  5. Clear the used variables so they can be reused for the next section.
  6. Repeat.
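The steps above could be sketched roughly as follows. This is only an illustration under made-up assumptions: a 10-byte header, fixed-length 7-byte records, and a 32-byte window standing in for 500 MB. Instead of reading a larger block after a short parse, this version advances the offset by the number of bytes actually parsed, so the next window starts at the unparsed remainder and the window size stays constant.

```matlab
% Demo file: 10-byte header followed by 25 fixed-length records of 7 bytes each.
% (Header size and record layout are assumptions for illustration only.)
fname = [tempname '.bin'];
headerBytes = 10;
recLen = 7;
nRec = 25;
fid = fopen(fname, 'w');
fwrite(fid, zeros(1, headerBytes, 'uint8'), 'uint8');
fwrite(fid, repmat(uint8(1:nRec), recLen, 1), 'uint8');  % record k = 7 bytes of value k
fclose(fid);

info = dir(fname);
fileBytes = info.bytes;
chunkBytes = 32;            % stand-in for 500 MB; deliberately not a multiple of recLen
offset = headerBytes;       % step 1: header handled separately, start right after it
results = zeros(1, 0);      % step 4: keep only the parsed values

while offset < fileBytes
    % Step 2: map only the next window; the last one may be shorter.
    nBytes = min(chunkBytes, fileBytes - offset);
    m = memmapfile(fname, 'Offset', offset, 'Format', 'uint8', 'Repeat', nBytes);

    % Step 3: parse whole records only; a trailing partial record is left for later.
    nWhole = floor(nBytes / recLen);
    if nWhole == 0
        clear m
        break               % fewer bytes than one record remain (truncated file)
    end
    consumed = nWhole * recLen;
    recs = reshape(m.Data(1:consumed), recLen, nWhole);
    results = [results, double(recs(1, :))]; %#ok<AGROW> first byte of each record

    % Step 5: release the mapping, then advance by what was actually parsed,
    % so the next window begins at the unparsed remainder.
    clear m recs
    offset = offset + consumed;
end

delete(fname);
```

For variable-length records the same pattern should still apply, as long as the parser reports how many bytes it consumed from each window.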
I hope this helps. I'll try to keep an eye on this post moving forward, but it might take me some time to respond as I'll be traveling.
  11 comments
Walter Roberson on 23 May 2022
Where I am, precipitation is running 2 to 3 times normal, and farmers are (overall) happy because this is helping refill the aquifers after years of borderline drought.
dpb on 23 May 2022
We're in that big D4 area that stretches from KS all down E NM and W TX.

Answers (0)

Release

R2018b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by