error in reading special characters

I am reading text using OCR, but the text contains some special characters that my OCR does not recognise. Can you suggest any ideas?

Answers (1)

Walter Roberson on 21 Oct 2011

0 votes

Either you have a bug in your OCR routine, or your OCR routine is not powerful enough to handle those characters (perhaps you have not trained it enough on those characters?)
In past postings you have described this as a problem with Notepad, as if your OCR is working correctly but writing to Notepad gets you something different. That possibility seems a bit unlikely. Have you tracked the decimal (or hex) value of the characters your OCR routine claims to find, and then in a different program tried writing exactly that pattern to Notepad to see what happens? Is your OCR routine coming up with characters that need more than 8 bits to store internally? For that matter, coming up with characters greater than 127 so that character sets and fonts start being an issue?
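The check suggested above can be sketched as follows. This is only an illustration: `ocrText` is a made-up stand-in for whatever string your OCR routine returns, and the degree sign is just an example of a character above 127.

```matlab
% Hypothetical OCR output; replace with the string your routine returns.
ocrText = ['Score: 42' char(176)];   % char(176) is the degree sign

codes    = double(ocrText);          % decimal code of each character
hexCodes = dec2hex(codes, 4);        % one row of 4 hex digits per character

beyondAscii = codes > 127;           % character set and font start to matter
beyond8bit  = codes > 255;           % cannot be stored in a single byte

fprintf('%d characters, %d above 127, %d above 255\n', ...
    numel(codes), nnz(beyondAscii), nnz(beyond8bit));
disp([cellstr(hexCodes) num2cell(ocrText(:))]);   % hex code next to each character
```

If every code is at or below 127, the problem is unlikely to be an encoding issue.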

12 comments

FIR on 22 Oct 2011
Both posts are related to each other. The special character is written as 0. Can you send a link where I can find those? I have been trying for some time but could not get the desired result.
Walter Roberson on 22 Oct 2011
"find those" is not specific enough for me to understand what you are asking.
FIR on 22 Oct 2011
find those refers where those cods will be available
Walter Roberson on 22 Oct 2011
"refers"? I don't understand that. References, perhaps?
"cods" I am guessing is "codes".
I am not sure what you are asking, but possibly you are asking for a reference to which characters are capable of being represented within the MATLAB "char" data type. If so, you can see a pretty complete list at http://www.ssec.wisc.edu/~tomw/java/unicode.html
Everything you see in that chart can be stored as a single MATLAB "char", *including* even the sets such as Tibetan, Cherokee, and Arabic.
However! You need the appropriate font(s) to read most of those characters. If you are using the UTF-8 encoding scheme, you need at least two bytes to represent anything beyond Basic Latin, and three or more bytes to represent anything beyond U+07FF, which lies just past the Thaana alphabet. If you are using one of the UTF-16 encoding schemes, every character that fits in a single MATLAB "char" takes two bytes, and only code points beyond U+FFFF take four.
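The byte counts can be checked directly with unicode2native(). A small sketch, with sample code points picked only for illustration (they are not taken from the score card), and assuming your platform supports the 'UTF-8' and 'UTF-16LE' encoding names:

```matlab
% Sample code points: 'A', e-acute, Thaana letter haa, Devanagari letter a
samples = [65 233 1920 2309];
n8  = zeros(size(samples));
n16 = zeros(size(samples));
for k = 1:numel(samples)
    c      = char(samples(k));
    n8(k)  = numel(unicode2native(c, 'UTF-8'));     % bytes in UTF-8
    n16(k) = numel(unicode2native(c, 'UTF-16LE'));  % bytes in UTF-16 (little-endian)
    fprintf('U+%04X: %d byte(s) in UTF-8, %d in UTF-16LE\n', ...
        samples(k), n8(k), n16(k));
end
```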
Generally speaking, if you are writing out characters beyond Basic Latin (0-127) that you expect another program to be able to read, then you need to take care to encode the data properly as you write it, which might require calls to unicode2native() or native2unicode(). For a discussion of that process, see http://www.mathworks.com/matlabcentral/answers/6347-arabic-document
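A minimal sketch of that encoding step, assuming UTF-8 is what the reading program expects. The file name and the sample string are made up for illustration:

```matlab
% Sample text with two characters beyond Basic Latin (e-acute, Greek alpha)
str = ['caf' char(233) ' ' char(945)];

% Convert MATLAB chars to UTF-8 bytes and write the raw bytes.
bytes = unicode2native(str, 'UTF-8');
fid = fopen('ocr_utf8.txt', 'w');      % hypothetical file name; binary mode
fwrite(fid, bytes, 'uint8');
fclose(fid);

% Read the bytes back and decode them to confirm the round trip.
fid = fopen('ocr_utf8.txt', 'r');
raw = fread(fid, Inf, 'uint8=>uint8')';
fclose(fid);
decoded = native2unicode(raw, 'UTF-8');
```

If `decoded` matches `str` but the other program still shows something different, the other program is probably reading the file with a different encoding.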
Still, based upon the examples you had posted earlier, the impression people would have gotten was that your OCR was recognizing and emitting characters that were purely in the Basic Latin subset (0-127), and saving such characters normally takes no special treatment at all. There do, though, appear to be some applications (including some Microsoft applications) that seem to assume that everything is encoded in one of the UTF-16 encodings (and sometimes don't even include a Byte Order Mark!): if you ran into one of those rogue programs then you would certainly need to take special care to arrange your input according to the non-standard requirements of that particular program.
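For a program that insists on UTF-16, one possible approach is to write the Byte Order Mark yourself and then the UTF-16 bytes. This is a sketch under assumptions: the file name and sample string are invented, little-endian is a guess at what the target program wants, and it assumes unicode2native() with 'UTF-16LE' does not emit a BOM of its own, which is worth confirming on your release.

```matlab
str   = ['Score ' char(8805) ' 10'];     % U+2265, chosen only for illustration
bom   = uint8([255 254]);                % FF FE marks UTF-16 little-endian
bytes = [bom unicode2native(str, 'UTF-16LE')];

fid = fopen('ocr_utf16.txt', 'w');       % hypothetical file name; binary mode
fwrite(fid, bytes, 'uint8');
fclose(fid);
```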
FIR on 22 Oct 2011
Thanks, Walter, for the suggestions on my posts.
As I said in earlier posts, I have a score card, and I find that OCR has difficulty recognising it. Is there any method, such as text detection or something else, to read that score card exactly? Please help, I have been stuck on this for nearly 20 days.
FIR on 22 Oct 2011
I am asking whether you can provide me with some code.
Walter Roberson on 22 Oct 2011
Could you post the link to the Answers question in which you show your existing OCR code? And perhaps also to the database of training images that you used?
FIR on 22 Oct 2011
http://www.speedyshare.com/files/30862208/ocr_new.rar
Walter Roberson on 22 Oct 2011
My system cannot handle .rar files. Please post the link to the Question in which you posted your existing code. You did post it already, right? You haven't been asking us how to debug code that we have never seen before, right?
Walter Roberson on 22 Oct 2011
Also, I don't care to download anything from speedyshare. I am not going to buy one of their accounts just to retrieve files other people have posted, and people without a paid account are forced to wait more than 5 minutes before downloading each file. A link to your existing code in a Question would be much more efficient.
FIR on 22 Oct 2011
% PRINCIPAL PROGRAM
warning off %#ok<WNOFF>
% Clear all
clc, close all, clear all
% Read image
imagen=imread('image.jpg');
% Show image
imshow(imagen);
title('INPUT IMAGE WITH NOISE')
% Convert to gray scale
if size(imagen,3)==3 %RGB image
    imagen=rgb2gray(imagen);
end
% Convert to BW
threshold = graythresh(imagen);
imagen =~im2bw(imagen,threshold);
% Remove all object containing fewer than 30 pixels
imagen = bwareaopen(imagen,30);
%Storage matrix word from image
%FIGURE,IM
word=[ ];
re=imagen;
%Opens text.txt as file for write
fid = fopen('text.txt', 'wt');
% Load templates
load templates
global templates
% Compute the number of letters in template file
num_letras=size(templates,2);
while 1
    %Fcn 'lines' separate lines in text
    [fl re]=lines(re);
    imgn=fl;
    %Show lines one by one (comment out the line below to skip this)
    figure,imshow(fl);pause(0.5)
    %-----------------------------------------------------------------
    % Label and count connected components
    [L Ne] = bwlabel(imgn);
    for n=1:Ne
        [r,c] = find(L==n);
        % Extract letter
        n1=imgn(min(r):max(r),min(c):max(c));
        % Resize letter (same size of template)
        img_r=imresize(n1,[42 24]);
        %imshow(img_r);pause(0.5)
        %-------------------------------------------------------------------
        % Call fcn to convert image to text
        letter=read_letter(img_r,num_letras);
        % Letter concatenation
        word=[word letter];
    end
    %fprintf(fid,'%s\n',lower(word));%Write 'word' in text file (lower)
    fprintf(fid,'%s\n',word);%Write 'word' in text file (upper)
    % Clear 'word' variable
    word=[ ];
    if isempty(re) %See variable 're' in Fcn 'lines'
        break
    end
end
fclose(fid);
%Open 'text.txt' file
winopen('text.txt')
The sample image is http://imgur.com/Ajmsi
FIR on 22 Oct 2011
http://imgur.com/YHYFD and for this image I also need to extract the score.

Asked: FIR on 21 Oct 2011
