textAnalytics toolbox: removing Entity details from documents

Question

david cowan on 18 Nov 2023


https://la.mathworks.com/matlabcentral/answers/2048942-textanalytics-toolbox-removing-entity-details-from-documents

Moved: Cris LaPierre on 19 Nov 2023

Accepted Answer: Cris LaPierre

I have a very large set of documents that I am preprocessing to use in a BERT classification model.

I have tokenized the documents and added the entity details.

Now I want to remove all of the tokens within the documents that have been tagged as "organization".

I have the following variables:

documents: tokenized documents

tdetails: a table of tokens with the document number, sentence number, line number, Type, Language, PartOfSpeech and Entity.

Token                   DocumentNumber  SentenceNumber  LineNumber  Type       Language  PartOfSpeech   Entity
"Astoria"               1               2               3           'letters'  'en'      'proper-noun'  'person'
"Federal Savings Bank"  1               2               3           'other'    'en'      'proper-noun'  'organization'
"settled"               1               2               3           'letters'  'en'      'verb'         'non-entity'

How do I remove all of the tokens in the variable documents for which Entity is "organization"?

E.g. in documents(1,1).Vocabulary(7) I can find "Federal Savings Bank", which is in row 7 of the example above. I could loop through all of the documents and check tdetails for rows where Entity is "organization", but that would take quite a while.

I can't seem to figure out how to do this more simply.


Answer 1

Cris LaPierre on 18 Nov 2023


https://la.mathworks.com/matlabcentral/answers/2048942-textanalytics-toolbox-removing-entity-details-from-documents#answer_1355632


I would use removeWords.

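documents = tokenizedDocument(Text(:));
documents = addEntityDetails(documents);   % add entity tags (the step the question already performed)
tdetails = tokenDetails(documents);
% Select the Token values whose Entity tag is "organization" (the spelling used in the details table)
documents2 = removeWords(documents,tdetails.Token(tdetails.Entity=="organization"));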
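For a very large corpus, the same idea can be sketched with the word list deduplicated first, so that removeWords receives each organization token only once. This is a minimal sketch rather than a tested recipe: the sample strings and the names str, isOrg and wordsToRemove are made up for illustration, and it assumes addEntityDetails is available in your release and tags entities with the category name "organization" as in the table in the question.

```matlab
% Hypothetical sample text; replace with your own corpus.
str = [
    "Astoria Federal Savings Bank settled the case."
    "The bank agreed to pay a fine."];

documents = tokenizedDocument(str);
documents = addEntityDetails(documents);   % tag entities in the tokenized documents
tdetails  = tokenDetails(documents);       % one row per token, including Entity

% Logical mask of the rows tagged as an organization.
isOrg = tdetails.Entity == "organization";

% Deduplicate so each organization token is passed to removeWords once.
wordsToRemove = unique(tdetails.Token(isOrg));

documents2 = removeWords(documents, wordsToRemove);
```

Note that removeWords matches on the token text, so any token equal to one of the listed words is removed from every document, including occurrences that were not themselves tagged as an organization.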


david cowan on 19 Nov 2023

Moved: Cris LaPierre on 19 Nov 2023

Really appreciate that.

removeWords !!

I won't forget that now. I knew there had to be a simple approach I was just missing.
