Random amino acid sequence generation with a given amino acid count of a specified sequence

I have a sequence
>sp|Q7RTX1|TS1R1_HUMAN Taste receptor type 1 member 1 OS=Homo sapiens OX=9606 GN=TAS1R1 PE=2 SV=1
MLLCTARLVGLQLLISCCWAFACHSTESSPDFTLPGDYLLAGLFPLHSGCLQVRHRPEVT
LCDRSCSFNEHGYHLFQAMRLGVEEINNSTALLPNITLGYQLYDVCSDSANVYATLRVLS
LPGQHHIELQGDLLHYSPTVLAVIGPDSTNRAATTAALLSPFLVPMISYAASSETLSVKR
QYPSFLRTIPNDKYQVETMVLLLQKFGWTWISLVGSSDDYGQLGVQALENQATGQGICIA
FKDIMPFSAQVGDERMQCLMRHLAQAGATVVVVFSSRQLARVFFESVVLTNLTGKVWVAS
EAWALSRHITGVPGIQRIGMVLGVAIQKRAVPGLKAFEEAYARADKKAPRPCHKGSWCSS
NQLCRECQAFMAHTMPKLKAFSMSSAYNAYRAVYAVAHGLHQLLGCASGACSRGRVYPWQ
LLEQIHKVHFLLHKDTVAFNDNRDPLSSYNIIAWDWNGPKWTFTVLGSSTWSPVQLNINE
TKIQWHGKDNQVPKSVCSSDCLEGHQRVVTGFHHCCFECVPCGAGTFLNKSDLYRCQPCG
KEEWAPEGSQTCFPRTVVFLALREHTSWVLLAANTLLLLLLLGTAGLFAWHLDTPVVRSA
GGRLCFLMLGSLAAGSGSLYGFFGEPTRPACLLRQALFALGFTIFLSCLTVRSFQLIIIF
KFSTKVPTFYHAWVQNHGAGLFVMISSAAQLLICLTWLVVWTPLPAREYQRFPHLVMLEC
TETNSLGFILAFLYNGLLSISAFACSYLGKDLPENYNEAKCVTFSLLFNFVSWIAFFTTA
SVYDGKYLPAANMMAGLSSLSSGFGGYFLPKCYVILCRPDLNSTEHFQASIQDYTRRCGS
T
I would like to generate a set of 10000 sequences (in FASTA format) that have exactly the same amino acid counts as the sequence above.
I could not get the randseq function to produce what I need.
Any help would be highly appreciated.

Accepted Answer

Tim DeFreitas on 8 Jun 2022
Edited: Tim DeFreitas on 22 Jun 2022
If you want exactly the same amino acid counts, then you want to randomly shuffle the input sequence, which can be done with randperm:
[~, sequences] = fastaread('pf00002.fa');
targetSeq = sequences{1}; % Select a specific sequence from the FASTA file.
randomSeqs = cell(1,10000);
for i = 1:numel(randomSeqs)
    % randperm reorders the positions, so every residue count is preserved.
    randomSeqs{i} = targetSeq(:, randperm(numel(targetSeq)));
end
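Since the question asks for FASTA output, here is a minimal sketch of how the shuffled sequences might be checked and written to a file. It uses only base MATLAB file I/O (fopen, fprintf) rather than a toolbox writer. The short targetSeq, the file name shuffled.fa, and the shuffled_N headers are placeholders chosen for illustration, not part of the original answer.

```matlab
% Hypothetical short sequence standing in for the real protein.
targetSeq = 'MLLCTARLVGLQLLISCCWAFACHSTESSPDF';
numSeqs   = 5;   % set to 10000 for the full set
rng(0);          % fixed seed, only so that the run is repeatable

randomSeqs = cell(1, numSeqs);
for i = 1:numSeqs
    randomSeqs{i} = targetSeq(randperm(numel(targetSeq)));
    % A permutation keeps every residue, so the sorted strings must match.
    assert(isequal(sort(randomSeqs{i}), sort(targetSeq)));
end

% Write all sequences to one FASTA file (assumed file name and headers).
outFile = 'shuffled.fa';
fid = fopen(outFile, 'w');
assert(fid ~= -1, 'Could not open %s for writing.', outFile);
for i = 1:numSeqs
    fprintf(fid, '>shuffled_%d\n%s\n', i, randomSeqs{i});
end
fclose(fid);
```

Each record is written as a header line followed by the sequence on a single line, which FASTA readers generally accept. Comparing sorted strings is a quick way to confirm that the amino acid counts are identical.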

More Answers (1)

Sam Chak on 8 Jun 2022
I'm no expert in this, but it is possible to generate random numbers and map them to letters of the Roman alphabet.
Here is a simple script that you can modify to generate a long string of characters resembling a FASTA sequence.
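function amino = generateAmino(n)
% Return a 1-by-n char row of uniformly random uppercase letters A-Z.
ASCII_L = 65; % 'A'
ASCII_U = 90; % 'Z'
% randi samples each code with equal probability; rounding a scaled rand
% would make 'A' and 'Z' half as likely as the other letters.
C = randi([ASCII_L, ASCII_U], 1, n);
amino = char(C);
end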
In the Command Window:
S1 = generateAmino(60)
S1 =
'TGNRWYODEGVGUGXJFGPMJVPOXHTTKOCBNTXDOMAIEUINEPHQRTLCGXEVNZCL'
You can then use a for loop to generate the number of sequences that you want.
numOfSeq = 10000;
S = repmat(' ', numOfSeq, 60); % preallocate the char matrix
for i = 1:numOfSeq
    S(i,:) = generateAmino(60);
end
S
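That's the basic idea. You will need to modify the script so that it draws only from the letters that are valid amino acid codes rather than from all 26 letters. Note that drawing letters at random does not reproduce the amino acid counts of a given sequence.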
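One way to restrict the letters, sketched here under the assumption that only the 20 standard one-letter amino acid codes are wanted, is to index into an explicit alphabet string instead of a range of ASCII codes. The variable names are illustrative.

```matlab
% The 20 standard one-letter amino acid codes (assumed alphabet).
alphabet = 'ACDEFGHIKLMNPQRSTVWY';
n        = 60;   % sequence length
numOfSeq = 10;   % number of sequences; use 10000 for the full set

% randi draws an index into the alphabet for every position at once,
% giving a numOfSeq-by-n char matrix with one sequence per row.
idx = randi(numel(alphabet), numOfSeq, n);
S   = alphabet(idx);
```

Every row of S contains only valid residue letters, but each position is sampled independently, so the per-residue counts vary from row to row.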

Release

R2022a
