Error using matlab.internal.webservices.HTTPConnector/copyContentToByteArray (line 396) The server returned the status 429 with message "Too Many Requests" in response to the request to URL.

I am writing a script that will take my protein sequence of interest and find matches to it using NCBI blast. I identify the hits and then try and get the sequence for each (in code this is 'j=1:.....'). This works fine, but when I get to number 1226, I get the error below. I have tried a couple of things:
  • Pausing longer between iterations of my loop did not help
  • Manually doing this particular one in the command window returned the same error, though if I go to the NCBI website and find the protein, there's no problem.
Any advice would be greatly appreciated!!
THE ERROR:
Error using matlab.internal.webservices.HTTPConnector/copyContentToByteArray
(line 396)
The server returned the status 429 with message "Too Many Requests" in response
to the request to URL
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucest&term=NXO81641%5BAccession%5D.
Error in readContentFromWebService (line 46)
byteArray = copyContentToByteArray(connection);
Error in webread (line 125)
[varargout{1:nargout}] = readContentFromWebService(connection, options);
Error in getncbidata>accession2gi (line 316)
searchXML = webread(searchurl);
Error in getncbidata (line 182)
[giID,db] = accession2gi(accessnum,db,'quick');
Error in getgenpept (line 64)
[varargout{1:nargout}] =
getncbidata(accessnum,'database','protein','fileformat','GenPept',varargin{:});
75 rethrow(e)
MY CODE:
%% Get a list of everything that aligns with TPC2 (and must be listed as TPC2)
POI = 'MAEPQAESEPLLGGARGGGGDWPAGLTTYRSIQVGPGAAARWDLCIDQAVVFIEDAIQYRSINHRVDASSMWLYRRYYSNVCQRTLSFTIFLILFLAFIETPSSLTSTADVRYRAAPWEPPCGLTESVEVLCLLVFAADLSVKGYLFGWAHFQKNLWLLGYLVVLVVSLVDWTVSLSLVCHEPLRIRRLLRPFFLLQNSSMMKKTLKCIRWSLPEMASVGLLLAIHLCLFTMFGMLLFAGGKQDDGQDRERLTYFQNLPESLTSLLVLLTTANNPDVMIPAYSKNRAYAIFFIVFTVIGSLFLMNLLTAIIYSQFRGYLMKSLQTSLFRRRLGTRAAFEVLSSMVGEGGAFPQAVGVKPQNLLQVLQKVQLDSSHKQAMMEKVRSYGSVLLSAEEFQKLFNELDRSVVKEHPPRPEYQSPFLQSAQFLFGHYYFDYLGNLIALANLVSICVFLVLDADVLPAERDDFILGILNCVFIVYYLLEMLLKVFALGLRGYLSYPSNVFDGLLTVVLLVLEISTLAVYRLPHPGWRPEMVGLLSLWDMTRMLNMLIVFRFLRIIPSMKLMAVVASTVLGLVQNMRAFGGILVVVYYVFAIIGINLFRGVIVALPGNSSLAPANGSAPCGSFEQLEYWANNFDDFAAALVTLWNLMVVNNWQVFLDAYRRYSGPWSKIYFVLWWLVSSVIWVNLFLALILENFLHKWDPRSHLQPLAGTPEATYQMTVELLFRDILEEPGEDELTERLSQHPHLWLCR'
%Blast the inputted sequence against NCBI protein. Return 5000 results.
[blastsend, waittime]=blastncbi(POI, 'blastp', 'MaxNumberSequences', 5000);
%Get a copy of the blast report.
tm = string(datetime('now', 'Format', 'yyyyMMdd_HHmmss')); % timestamp without spaces or colons, so it is safe in a file name
blastresultsfile=string('blastresults'+tm+'.xml');
getblast(blastsend, 'Wait', waittime, 'ToFile', blastresultsfile);
%%
%Read in the blast results.
BlastFile=blastread(char(blastresultsfile)); % read the file written by getblast above
BlastFileTb=struct2table(BlastFile.Hits);
IdentityFile=strings(1);
%Get the Identity value!
for b=1:height(BlastFileTb)
IdentityFile(1,b)=((BlastFile.Hits(b).Hsps(1).Identities)/752)*100;
end
IdentityFile=IdentityFile';
%Identify the different elements of the blast report.
Definition=table(BlastFileTb.Definition);
Definition.Properties.VariableNames={'Definition'};
ID=table(BlastFileTb.ID);
ID.Properties.VariableNames={'ID'};
Accession=table(BlastFileTb.Accession);
Accession.Properties.VariableNames={'Accession'};
Identity=table(IdentityFile);
Identity.Properties.VariableNames={'Identity'};
ExtractedBlast=[ID, Definition, Accession, Identity];
%%
TPC2_list1=ExtractedBlast(contains(ExtractedBlast.Definition, 'protein 2'),:);
TPC2_list2=ExtractedBlast(contains(ExtractedBlast.Definition, 'TPC2'),:);
TPC2_totallist=[TPC2_list1;TPC2_list2];
specieslist=strings(1);
seqlist=strings(1);
%%
% Now we have a list of all the TPC2 on NCBI. I want to a) identify all
% the species with it and b) also be able to check-box the species I want
% to focus on.
for i=1:height(TPC2_totallist)
%Take the description of the target
thisentry=string(table2array(TPC2_totallist(i,2)));
%Find the name of the species by looking between square brackets
findingspecies=split(thisentry, '[');
findingspecies2=findingspecies(contains(findingspecies, ']'));
findingspecies3= split(findingspecies2(1), ']') ;
speciesname=findingspecies3(1);
%Make a list of all the species
specieslist(i,1)=speciesname;
%Remove any repeats from the species list.
unique_specieslist=unique(specieslist);
end
%
%%
%Add the species list to the table we've been working from
TPC2_amendedlist=[TPC2_totallist, array2table(specieslist)];
%%
%% **THIS IS WHERE I HAVE THE PROBLEM, WITH NUMBER 1226
for j=1:height(TPC2_amendedlist)
%For each entry, get the accession number
accno=string(table2array(TPC2_amendedlist(j,3)));
%Use the accession number to get the FASTA online. Put it in a list for
%now ('seqlist')
seq=getgenpept(accno, 'SequenceOnly', true) ;
seqlist(1,j)=seq;
%Pause between requests (0.5 s here) so the server is not queried too often and does not reject the request.
%The commented-out block below adds an extra pause on every tenth iteration. 'j' shows which iteration we're on.
j
pause(.5);
% if mod(j, 10)==0
% pause(10);
% end
end

Answers (1)

Tarunbir Gambhir on 27 Oct 2020
I ran your code and got a similar error stack trace after retrieving 331 sequences. As the error message says, this is a request-rate restriction on the NCBI server; there is nothing wrong with your script.
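One common way to live with a 429 is to wait and retry with a growing delay instead of failing on the first rejection. The following is only a sketch under stated assumptions, not something I have run against NCBI: `fetchFcn`, `maxTries`, `baseDelay`, and the second accession are hypothetical, and `fetchFcn` is a stand-in that returns a fake string so the block runs offline. For context, NCBI documents a limit of roughly 3 E-utilities requests per second without an API key, and the stack trace suggests a single `getgenpept` call makes an `esearch` request before fetching the record, so a fixed `pause(.5)` may not always be enough.

```matlab
% Sketch: retry each lookup with exponential backoff when the server answers 429.
% fetchFcn is a stand-in so this runs offline; all names here are hypothetical.
fetchFcn   = @(acc) "SEQ_" + acc;       % swap in a real lookup (see note below)
accessions = ["NXO81641" "ABC00001"];   % second entry is a made-up demo accession
maxTries   = 5;                         % attempts per accession before giving up
baseDelay  = 2;                         % seconds; doubles after each 429

seqs = strings(1, numel(accessions));
for k = 1:numel(accessions)
    for attempt = 1:maxTries
        try
            seqs(k) = fetchFcn(accessions(k));
            break                       % success: move on to the next accession
        catch err
            rateLimited = contains(err.message, '429');
            if ~rateLimited || attempt == maxTries
                rethrow(err)            % unrelated error, or out of attempts
            end
            pause(baseDelay * 2^(attempt - 1));   % waits 2, 4, 8, 16 s
        end
    end
    pause(0.5);                         % baseline spacing between accessions
end
```

To try it against NCBI, the stand-in could be replaced with `fetchFcn = @(acc) string(getgenpept(acc, 'SequenceOnly', true));`, which is the same call the loop in the question makes.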
In your case, I would suggest breaking the work into segments of 1225 sequences each and running the script once per segment. I understand this is a workaround rather than a real solution, but it may get you past the problem.
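The segmenting idea above could look roughly like the sketch below. It is an illustration under assumptions: `fetchFcn` is again an offline stand-in, the accessions are fake, and `segmentSize`, `segmentPause`, and the checkpoint file name are hypothetical demo values (a real run would use something like 1225 and a much longer pause). Saving after every segment means a rejection part-way through does not lose the sequences already retrieved.

```matlab
% Sketch: process accessions in fixed-size segments and save after each one.
% fetchFcn is a stand-in so this runs offline; all names here are hypothetical.
fetchFcn     = @(acc) "SEQ_" + acc;     % swap in a real lookup for actual use
accessions   = "ACC" + (1:7);           % seven fake accessions: "ACC1" ... "ACC7"
segmentSize  = 3;                       % demo value; the answer suggests 1225
segmentPause = 1;                       % seconds to rest between segments (demo value)
checkpoint   = fullfile(tempdir, 'tpc2_seqs_checkpoint.mat');

seqs   = strings(1, numel(accessions));
starts = 1:segmentSize:numel(accessions);       % first index of each segment
for s = starts
    idx = s:min(s + segmentSize - 1, numel(accessions));
    for k = idx
        seqs(k) = fetchFcn(accessions(k));
    end
    save(checkpoint, 'seqs', 'accessions');     % keep progress made so far
    pause(segmentPause);                        % give the server a rest
end
```

If a run stops early, the entries of `seqs` that are still empty strings in the saved file show which accessions remain to be fetched.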
