How to download multiple files from a website

Chad Greene on 21 Nov 2023

Commented: Dyuman Joshi on 22 Nov 2023

This question has been asked many times in various ways on this forum, but I've never found a simple answer to this very simple question:

How do I download all of the .nc files listed here? https://www.ngdc.noaa.gov/thredds/catalog/global/ETOPO2022/15s/15s_surface_elev_netcdf/catalog.html

It seems like there should be a two-line solution along the lines of:

url_list = get_urls('https://www.ngdc.noaa.gov/thredds/catalog/global/ETOPO2022/15s/15s_surface_elev_netcdf/catalog.html','extension','.nc'); 
websave(url_list)

That would work if get_urls were a real function and if websave accepted a list of file URLs and saved each file in the current directory.

Chad Greene on 21 Nov 2023

Wow, thank you @Dyuman Joshi!

Dyuman Joshi on 22 Nov 2023

You are welcome!

Accepted Answer

Voss on 21 Nov 2023

url = 'https://www.ngdc.noaa.gov/thredds/catalog/global/ETOPO2022/15s/15s_surface_elev_netcdf/catalog.html';
% webread() the main page and parse out the links to .nc files:
data = webread(url);
C = regexp(data,'<a href=".*?(\?[^"]*\.nc)">','tokens');
temp_urls = strcat(url,vertcat(C{:}));
% webread() each linked url:
data = cell(size(temp_urls));
for ii = 1:numel(temp_urls)
    data{ii} = webread(temp_urls{ii});
end
% get the download link in each of those pages:
C = regexp(data,'<a href="([^"]*)">\s*<b>HTTPServer','tokens','once');
% append them to the (sub-)domain of the main URL to get the actual URLs 
% for downloading the .nc files:
idx = find(url == '/',3);
nc_urls = strcat(url(1:idx(end)-1),vertcat(C{:}));
% construct file names to save to locally:
[~,filenames,ext] = fileparts(nc_urls);
filenames = strcat(filenames,ext);
% download all the files:
for ii = 1:numel(nc_urls)
    websave(filenames{ii},nc_urls{ii});
end
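
For a long list of large files, the final loop may benefit from a longer timeout and some error handling, so that one failed request does not stop the rest. Here is a minimal sketch of that idea. The function name download_all and the 60-second timeout are my own assumptions, not something built into MATLAB or taken from the code above.

```matlab
function ok = download_all(urls, filenames, timeout_s)
% DOWNLOAD_ALL  Hypothetical helper (not a built-in): save each URL to a file.
%   ok(k) is true if filenames{k} already existed or was downloaded.
if nargin < 3
    timeout_s = 60;   % assumed value; the weboptions default is 5 seconds
end
opts = weboptions('Timeout', timeout_s);
ok = false(size(urls));
for k = 1:numel(urls)
    if isfile(filenames{k})
        ok(k) = true;   % already on disk, so skip the request
        continue
    end
    try
        websave(filenames{k}, urls{k}, opts);
        ok(k) = true;
    catch err
        % report the failure and carry on with the next file
        warning('download_all:failed', 'Could not download %s: %s', ...
            urls{k}, err.message);
    end
end
end
```

It would be called as ok = download_all(nc_urls, filenames); in place of the last loop. Note that the isfile check cannot tell a complete file from a partial one left by an interrupted run, so delete any suspect files before re-running.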

Voss on 21 Nov 2023

You're welcome!

Each link on the main page goes to a distinct intermediate page which contains the link to download the actual .nc file.

The first webread/regexp pair gets the set of URLs to those intermediate pages. The loop then calls webread on each intermediate page, and a second regexp searches the contents for the download URL, which is the URL immediately preceding 'HTTPServer' on each intermediate page. There are several other URLs on those pages, and matching on 'HTTPServer' was the only way I could think of to be sure to get the right one.
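
To see the two regexp stages without touching the network, here is a small sketch that runs them on made-up HTML. The snippets in html and pages are hypothetical stand-ins that I wrote to fit the patterns; they are not copied from the real catalog pages, so the real markup may differ.

```matlab
url = 'https://www.ngdc.noaa.gov/thredds/catalog/global/ETOPO2022/15s/15s_surface_elev_netcdf/catalog.html';

% Stage 1: hypothetical main-page snippet with two .nc links and one other link
html = ['<a href="catalog.html?dataset=demo/tile_A.nc"><tt>tile_A.nc</tt></a>' newline ...
        '<a href="catalog.html?dataset=demo/tile_B.nc"><tt>tile_B.nc</tt></a>' newline ...
        '<a href="catalog.html?dataset=demo/readme.txt"><tt>readme.txt</tt></a>'];
% 'tokens' returns one cell per match, each holding the captured group
C = regexp(html,'<a href=".*?(\?[^"]*\.nc)">','tokens');
queries = vertcat(C{:});   % {'?dataset=demo/tile_A.nc'; '?dataset=demo/tile_B.nc'}

% Stage 2: hypothetical intermediate page with a decoy link before the real one
pages = {['<a href="/thredds/dodsC/demo/tile_A.nc.html"><b>OPENDAP</b></a>' newline ...
          '<a href="/thredds/fileServer/demo/tile_A.nc">' newline ...
          '  <b>HTTPServer</b></a>']};
% only the href followed directly by <b>HTTPServer is captured
C2 = regexp(pages,'<a href="([^"]*)">\s*<b>HTTPServer','tokens','once');

% prepend scheme and host (everything before the third '/')
idx = find(url == '/',3);
nc_urls = strcat(url(1:idx(end)-1),vertcat(C2{:}));
% {'https://www.ngdc.noaa.gov/thredds/fileServer/demo/tile_A.nc'}
```

The readme.txt link is dropped by the first pattern because it does not end in .nc, and the OPENDAP link is dropped by the second because it is not followed by 'HTTPServer'.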

Chad Greene on 22 Nov 2023

Ooh, okay, that makes a lot of sense. Thanks @Voss!

Products: MATLAB

Release: R2023b
