
Extract data from HTML file stored in C drive of Laptop

Pranav Balasaheb Mohite on 4 Aug 2022
Answered: Saffan on 30 Aug 2023
Hello everyone,
I want to extract data from a local HTML file stored on the C drive of my laptop.
Can anyone guide me on how to extract the data from the HTML file, convert it into a char array, and use it further?
The file is in HTML format, and the link looks something like file:///C:/Users/Pranav/OneDrive/Desktop/.....................................
Commands I have already tried: 1) str=fileread('xxxxxxxxxxxxxxxxx.html') ---> data=extractHTMLString (str)
However, the output data is a 1 x 1000000 array in which each character is a separate element.
I look forward to some good advice.
Thanks in advance!
  1 Comment
Walter Roberson on 6 Sep 2022
Are you using extractHTMLText?
As an experiment, what happens if you fileread() the file directly and process that?
You have two separate issues:
  1. Making sure that the text can be pulled out of a URL;
  2. Processing the text.
Reading the file without the URL will allow you to test the processing part separately from reading from the URL.
To test reading from the URL, you could fileread() from the URL and fileread() from the local file without the URL, and compare the two.
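One way to try the local-file half of that experiment is to turn the file:/// link into a plain path and pass it to fileread(). This is only a minimal sketch: the URL below is a hypothetical placeholder (not the poster's real path), and it assumes a Windows-style link that needs nothing more than the prefix stripped. A link containing percent-encoded characters such as %20 would need extra decoding.

```matlab
% Hypothetical file URL, used only for illustration.
fileUrl = 'file:///C:/Users/Example/Desktop/page.html';

% Strip the URL scheme so that fileread() receives an ordinary path.
localPath = erase(fileUrl, 'file:///');

% Read the file only if it actually exists on this machine.
if isfile(localPath)
    htmlContent = fileread(localPath);   % 1-by-N char vector of raw HTML
else
    fprintf('File not found: %s\n', localPath);
end
```

Once the raw HTML can be read this way, the text-processing step can be checked on its own.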


Answers (1)

Saffan on 30 Aug 2023
Hi,
To accomplish this, you can add a step to your code that creates an HTML tree using the htmlTree function. This function parses the HTML code in the string and returns the resulting tree structure. You can then extract the text from the HTML tree, as shown in the following code snippet:
% Read the HTML file (filePath must hold the path to the local .html file)
htmlContent = fileread(filePath);
% Create an HTML tree from the content
tree = htmlTree(htmlContent);
% Extract the text from the HTML tree
data = extractHTMLText(tree);
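The question also asks for the result as a char array. fileread() returns the raw file as a 1-by-N char vector, which is probably the long array seen in the question. The sketch below shows one possible way to post-process the extracted text instead. It assumes the Text Analytics Toolbox is installed, and the HTML string is a made-up stand-in for the real file contents.

```matlab
% Hypothetical in-memory HTML standing in for the contents of the real file.
htmlContent = '<html><body><h1>Report</h1><p>First line.</p><p>Second line.</p></body></html>';

tree = htmlTree(htmlContent);     % parse the HTML code into a tree
data = extractHTMLText(tree);     % extracted text, returned as a string

% Split the text into separate lines and drop the empty ones.
textLines = strtrim(splitlines(data));
textLines = textLines(strlength(textLines) > 0);

% Convert the whole extracted text to a char vector if one is required.
dataChar = char(data);
```

Working with the string array textLines is usually more convenient than indexing individual characters, and char(data) is there for code that expects a char vector.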
Refer to the MATLAB documentation for htmlTree and extractHTMLText for more information.
