public class Ocr extends Object
Click here to access the Asprise OCR developer's guide.
The OCR engine can recognize text in 20+ languages (including English, Spanish, French, German, Italian, Hungarian, Finnish, Swedish, Romanian, Polish, Malay, Indonesian, and Russian) and 1-D/2-D barcodes in most popular formats (EAN-8, EAN-13, UPC-A, UPC-E, ISBN-10, ISBN-13, Interleaved 2 of 5, Code 39, Code 128, PDF417, and QR Code).
An instance of this class should be used by only one thread at a time. For multi-threading, create multiple instances of this class.
    Ocr.setUp(); // one-time setup
    Ocr ocr = new Ocr();
    ocr.startEngine("eng", Ocr.SPEED_FASTEST);
    String s = ocr.recognize(new File[] {new File("test.jpg")}, Ocr.RECOGNIZE_TYPE_ALL, Ocr.OUTPUT_FORMAT_PLAINTEXT);
    System.out.println("RESULT: " + s);
    // do more recognition here ...
    ocr.stopEngine();
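The one-thread-per-instance rule above can be sketched as follows. So that the sketch runs without the Asprise jar, `StubOcr` is a hypothetical stand-in that only mimics the start/recognize/stop lifecycle and fails if it is touched from a second thread; `PerThreadOcrDemo`, `recognizeAll`, and the `"fastest"` speed string are likewise illustrative names, not part of the library. In real code each task would hold its own `Ocr` instance instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for Ocr (assumption: not the real class).
// It remembers the thread that started it and rejects use from any other thread.
class StubOcr {
    private Thread owner;

    void startEngine(String lang, String speed) {
        owner = Thread.currentThread();
    }

    String recognize(String file) {
        if (owner != Thread.currentThread()) {
            throw new IllegalStateException("instance shared across threads");
        }
        return "text of " + file;
    }

    void stopEngine() {
        owner = null;
    }
}

class PerThreadOcrDemo {
    // Every task creates, uses, and stops its own engine instance,
    // so no instance is ever visible to more than one thread.
    static List<String> recognizeAll(List<String> files, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String file : files) {
                Callable<String> task = () -> {
                    StubOcr ocr = new StubOcr();
                    ocr.startEngine("eng", "fastest"); // placeholder arguments
                    try {
                        return ocr.recognize(file);
                    } finally {
                        ocr.stopEngine(); // always release the engine
                    }
                };
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    results.add(f.get()); // collected in submission order
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(recognizeAll(Arrays.asList("a.png", "b.png", "c.png"), 2));
    }
}
```

Starting an engine per task keeps the sketch short; if engine start-up turns out to be expensive, keeping one instance per worker thread (for example in a `ThreadLocal`) would follow the same rule with less overhead.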
Modifier and Type | Class and Description |
---|---|
static class | Ocr.ImagePreProcessingType: Image pre-processing type |
static class | Ocr.PageType: Page type hint |
static class | Ocr.PropertyBuilder: Builder for configuring scan properties. |
Modifier and Type | Field and Description |
---|---|
static String | CONFIG_PROP_KEY_VALUE_SEPARATOR: Do not change unless you are told to do so. |
static String | CONFIG_PROP_SEPARATOR: Do not change unless you are told to do so. |
static String | LANGUAGE_DEU: deu (German) |
static String | LANGUAGE_ENG: eng (English) |
static String | LANGUAGE_FRA: fra (French) |
static String | LANGUAGE_POR: por (Portuguese) |
static String | LANGUAGE_SPA: spa (Spanish) |
static String | OUTPUT_FORMAT_PDF: Output recognition result as searchable PDF |
static String | OUTPUT_FORMAT_PLAINTEXT: Output recognition result as plain text |
static String | OUTPUT_FORMAT_RTF: Output to editable format RTF (can be edited in MS Word) |
static String | OUTPUT_FORMAT_XML: Output recognition result in XML format with additional information such as coordinates, confidence, runtime, etc. |
static int | PAGES_ALL: Recognize all pages. |
static String | PROP_DICT_DICT_IMPORTANCE: Percentage measuring the importance of the dictionary (0: not at all; 100: extremely important; default: 10) |
static String | PROP_IMG_PREPROCESS_CUSTOM_CMDS: Custom image pre-processing command |
static String | PROP_IMG_PREPROCESS_TYPE: Image pre-processing type |
static String | PROP_IMG_PREPROCESS_TYPE_CUSTOM: Custom; requires PROP_IMG_PREPROCESS_CUSTOM_CMDS to be set |
static String | PROP_IMG_PREPROCESS_TYPE_DEFAULT: Use system default |
static String | PROP_IMG_PREPROCESS_TYPE_DEFAULT_WITH_ORIENTATION_DETECTION: Default + page orientation detection |
static String | PROP_INPUT_PDF_DPI: The DPI used to render the PDF file; default is 300 if not specified |
static String | PROP_LIMIT_TO_CHARSET: Recognizes only the specified list of characters. |
static String | PROP_OUTPUT_SEPARATE_WORDS: Set to 'true' to output at word level instead of the default line level. |
static String | PROP_PAGE_TYPE: Use this property to hint the OCR engine about page type. |
static String | PROP_PAGE_TYPE_AUTO |
static String | PROP_PAGE_TYPE_SCATTERED |
static String | PROP_PAGE_TYPE_SINGLE_BLOCK |
static String | PROP_PAGE_TYPE_SINGLE_CHAR |
static String | PROP_PAGE_TYPE_SINGLE_COLUMN |
static String | PROP_PAGE_TYPE_SINGLE_LINE |
static String | PROP_PAGE_TYPE_SINGLE_WORD |
static String | PROP_PDF_OUTPUT_CONF_THRESHOLD: Valid range is 0 to 100; text recognized below or above this confidence is highlighted in different colors. |
static String | PROP_PDF_OUTPUT_FILE: PDF output file - required for PDF output. |
static String | PROP_PDF_OUTPUT_FONT: Font to be used for PDF output. |
static String | PROP_PDF_OUTPUT_IMAGE_DPI: The DPI of the images or '0' to auto-detect. |
static String | PROP_PDF_OUTPUT_IMAGE_FORCE_BW: Convert images into black/white to reduce PDF output file size. |
static String | PROP_PDF_OUTPUT_RETURN_TEXT: Return text in 'text' or 'xml' format when the output format is set to PDF. |
static String | PROP_PDF_OUTPUT_RETURN_TEXT_FORMAT_PLAINTEXT |
static String | PROP_PDF_OUTPUT_RETURN_TEXT_FORMAT_XML |
static String | PROP_PDF_OUTPUT_TEXT_VISIBLE: Make text visible - for debugging and analysis purposes. |
static String | PROP_RTF_OUTPUT_FILE: RTF output file - required for RTF output. |
static String | PROP_RTF_OUTPUT_RETURN_TEXT: Return text in 'text' or 'xml' format when the output format is set to RTF. |
static String | PROP_RTF_OUTPUT_RETURN_TEXT_FORMAT_PLAINTEXT |
static String | PROP_RTF_OUTPUT_RETURN_TEXT_FORMAT_XML |
static String | PROP_RTF_PAPER_SIZE: Default is LETTER; may be set to A4. |
static String | PROP_SAVE_INTERMEDIATE_IMAGES_TO_DIR: Save intermediate images generated for debugging purposes; leave unspecified or set to an empty string to skip saving. |
static String | PROP_TABLE_MIN_SIDE_LENGTH: Default is 31 if not specified. |
static String | PROP_TABLE_SKIP_DETECTION: Tables are detected by default; set this property to true to skip detection. |
static String | RECOGNIZE_TYPE_ALL: Recognize both text and barcodes |
static String | RECOGNIZE_TYPE_BARCODE: Recognize barcodes |
static String | RECOGNIZE_TYPE_TEXT: Recognize text |
static String | SPEED_FAST: Lower speed, better accuracy |
static String | SPEED_FASTEST: Highest speed, accuracy may suffer - default option |
static String | SPEED_SLOW: Lowest speed, best accuracy |
static String | START_PROP_DICT_CUSTOM_DICT_FILE: Path to your custom dictionary (words are separated using line breaks). |
static String | START_PROP_DICT_CUSTOM_TEMPLATES_FILE: Path to your custom templates (templates are separated using line breaks). |
static String | START_PROP_DICT_SKIP_BUILT_IN_ALL: Set to 'true' to skip using all built-in dictionaries. |
static String | START_PROP_DICT_SKIP_BUILT_IN_DEFAULT: Set to 'true' to skip using the default built-in dictionary. |
Constructor and Description |
---|
Ocr() |
Modifier and Type | Method and Description |
---|---|
static int | getConsoleMode(): -1 unknown / 0 no / 1 yes |
static String | getLibraryVersion(): The library version. |
static String | getLibraryVersion(boolean verbose): The library version. |
boolean | isEngineRunning(): Returns true only if the engine has been started and has not been stopped yet. |
static boolean | isSetupRequired(): Whether one-time setup is required. |
static String[] | listSupportedLanguages(): Returns all supported languages. |
static void | main(String[] args): Displays the library version and optionally performs OCR on the input file. |
static String | propsToString(Properties props) |
protected static Properties | readProperties(Object[] propSpec) |
String | recognize(File[] files, String recognizeType, String outputFormat, Object... propSpec): Performs text/barcode recognition on the given files with the specified output format. |
String | recognize(RenderedImage img, String recognizeType, String outputFormat, Object... propSpec): Performs text/barcode recognition on the given image with the specified output format. |
String | recognize(String files, int pageIndex, int startX, int startY, int width, int height, String recognizeType, String outputFormat, Object... propSpec): Performs OCR on the given input files. |
String | recognize(URL[] sources, String recognizeType, String outputFormat, Object... propSpec): Performs text/barcode recognition on the given files with the specified output format. |
static boolean | saveAocrXslToDir(File dir, boolean overwrite): Saves aocr.xsl to the specified directory |
static void | setUp(): Performs one-time setup; does nothing if setup has already been done. |
void | startEngine(String lang, String speed, Object... startPropSpec): Starts the OCR engine with optional properties (e.g., to specify dictionary/templates file) |
void | stopEngine(): Stops the OCR engine; does nothing if it has already been stopped. |
public static final String SPEED_FASTEST
public static final String SPEED_FAST
public static final String SPEED_SLOW
public static final String RECOGNIZE_TYPE_TEXT
public static final String RECOGNIZE_TYPE_BARCODE
public static final String RECOGNIZE_TYPE_ALL
public static final String OUTPUT_FORMAT_PLAINTEXT
public static final String OUTPUT_FORMAT_XML
public static final String OUTPUT_FORMAT_PDF
public static final String OUTPUT_FORMAT_RTF
public static final String LANGUAGE_ENG
public static final String LANGUAGE_SPA
public static final String LANGUAGE_POR
public static final String LANGUAGE_DEU
public static final String LANGUAGE_FRA
public static final String START_PROP_DICT_SKIP_BUILT_IN_DEFAULT
public static final String START_PROP_DICT_SKIP_BUILT_IN_ALL
public static final String START_PROP_DICT_CUSTOM_DICT_FILE
public static final String START_PROP_DICT_CUSTOM_TEMPLATES_FILE
public static final String PROP_DICT_DICT_IMPORTANCE
public static final String PROP_PAGE_TYPE
public static final String PROP_PAGE_TYPE_AUTO
public static final String PROP_PAGE_TYPE_SINGLE_BLOCK
public static final String PROP_PAGE_TYPE_SINGLE_COLUMN
public static final String PROP_PAGE_TYPE_SINGLE_LINE
public static final String PROP_PAGE_TYPE_SINGLE_WORD
public static final String PROP_PAGE_TYPE_SINGLE_CHAR
public static final String PROP_PAGE_TYPE_SCATTERED
public static final String PROP_LIMIT_TO_CHARSET
public static final String PROP_OUTPUT_SEPARATE_WORDS
public static final String PROP_INPUT_PDF_DPI
public static final String PROP_IMG_PREPROCESS_TYPE
public static final String PROP_IMG_PREPROCESS_TYPE_DEFAULT
public static final String PROP_IMG_PREPROCESS_TYPE_DEFAULT_WITH_ORIENTATION_DETECTION
public static final String PROP_IMG_PREPROCESS_TYPE_CUSTOM
public static final String PROP_IMG_PREPROCESS_CUSTOM_CMDS
public static final String PROP_TABLE_SKIP_DETECTION
public static final String PROP_TABLE_MIN_SIDE_LENGTH
public static final String PROP_SAVE_INTERMEDIATE_IMAGES_TO_DIR
public static final String PROP_PDF_OUTPUT_FILE
public static final String PROP_PDF_OUTPUT_IMAGE_DPI
public static final String PROP_PDF_OUTPUT_FONT
public static final String PROP_PDF_OUTPUT_TEXT_VISIBLE
public static final String PROP_PDF_OUTPUT_IMAGE_FORCE_BW
public static final String PROP_PDF_OUTPUT_CONF_THRESHOLD
public static final String PROP_PDF_OUTPUT_RETURN_TEXT
public static final String PROP_PDF_OUTPUT_RETURN_TEXT_FORMAT_PLAINTEXT
public static final String PROP_PDF_OUTPUT_RETURN_TEXT_FORMAT_XML
public static final String PROP_RTF_OUTPUT_FILE
public static final String PROP_RTF_PAPER_SIZE
public static final String PROP_RTF_OUTPUT_RETURN_TEXT
public static final String PROP_RTF_OUTPUT_RETURN_TEXT_FORMAT_PLAINTEXT
public static final String PROP_RTF_OUTPUT_RETURN_TEXT_FORMAT_XML
public static String CONFIG_PROP_SEPARATOR
public static String CONFIG_PROP_KEY_VALUE_SEPARATOR
public static final int PAGES_ALL
public static String getLibraryVersion()
public static String getLibraryVersion(boolean verbose)
public static boolean isSetupRequired()
public static void setUp()
public static String[] listSupportedLanguages()
public void startEngine(String lang, String speed, Object... startPropSpec)
lang - e.g., "eng" for English
speed - valid values: SPEED_FASTEST, SPEED_FAST, SPEED_SLOW.
startPropSpec - optional start properties; can be a single Properties object, an inline specification in pairs, or a single string. Valid property names are defined in this class.
public void stopEngine()
public boolean isEngineRunning()
public String recognize(URL[] sources, String recognizeType, String outputFormat, Object... propSpec)
Supported file formats:
sources - input image files; can be local files or files on a remote server
recognizeType - valid values: RECOGNIZE_TYPE_TEXT, RECOGNIZE_TYPE_BARCODE or RECOGNIZE_TYPE_ALL.
outputFormat - valid values: OUTPUT_FORMAT_PLAINTEXT, OUTPUT_FORMAT_XML, OUTPUT_FORMAT_PDF or OUTPUT_FORMAT_RTF.
propSpec - additional properties; can be a single Properties object, an inline specification in pairs, or a single string. Valid property names are defined in this class.
Returns null if there is no input file.
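The three accepted shapes of propSpec and startPropSpec (a Properties object, inline pairs, or a single string) can carry the same settings. The sketch below converts between them using only the JDK. It assumes `|` and `=` as the separators of the single-string form; the actual values are held in CONFIG_PROP_SEPARATOR and CONFIG_PROP_KEY_VALUE_SEPARATOR, which this page does not list. The class `PropSpecDemo`, its helpers, and the literal key strings are illustrative; real code would use the `Ocr.PROP_...` constants as keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.StringJoiner;

class PropSpecDemo {
    // Assumed separator values (not confirmed by this page).
    static final String PROP_SEPARATOR = "|";
    static final String KEY_VALUE_SEPARATOR = "=";

    // Inline pairs: key1, value1, key2, value2, ... collected in order.
    static Map<String, String> fromPairs(Object... pairs) {
        if (pairs.length % 2 != 0) {
            throw new IllegalArgumentException("pairs must have an even number of elements");
        }
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            map.put(String.valueOf(pairs[i]), String.valueOf(pairs[i + 1]));
        }
        return map;
    }

    // Properties form: one entry per key/value pair.
    static Properties toProperties(Map<String, String> map) {
        Properties props = new Properties();
        map.forEach(props::setProperty);
        return props;
    }

    // Single-string form, e.g. "key1=value1|key2=value2" under the assumed separators.
    static String toSpecString(Map<String, String> map) {
        StringJoiner joiner = new StringJoiner(PROP_SEPARATOR);
        map.forEach((k, v) -> joiner.add(k + KEY_VALUE_SEPARATOR + v));
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> spec = fromPairs(
                "PROP_PDF_OUTPUT_FILE", "result.pdf",       // placeholder key names
                "PROP_PDF_OUTPUT_TEXT_VISIBLE", true);
        System.out.println(toProperties(spec));
        System.out.println(toSpecString(spec));
    }
}
```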
public String recognize(File[] files, String recognizeType, String outputFormat, Object... propSpec)
Supported file formats:
files - input image files; files must exist and file names cannot contain ','
recognizeType - valid values: RECOGNIZE_TYPE_TEXT, RECOGNIZE_TYPE_BARCODE or RECOGNIZE_TYPE_ALL.
outputFormat - valid values: OUTPUT_FORMAT_PLAINTEXT, OUTPUT_FORMAT_XML, OUTPUT_FORMAT_PDF or OUTPUT_FORMAT_RTF.
propSpec - additional properties; can be a single Properties object, an inline specification in pairs, or a single string. Valid property names are defined in this class.
Returns null if there is no input file.
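Because input files must exist and their names cannot contain ',', it may be worth checking inputs before calling recognize. The following sketch validates a set of files and joins them into the comma-separated form that the String-based overload accepts. `FileListDemo` and `toCommaSeparated` are hypothetical helper names, and rejecting commas anywhere in the path (not only in the file name) is a conservative assumption.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class FileListDemo {
    // Validates each file and joins the paths with ','.
    static String toCommaSeparated(File... files) {
        List<String> paths = new ArrayList<>();
        for (File f : files) {
            String path = f.getPath();
            if (path.contains(",")) {
                throw new IllegalArgumentException("path must not contain ',': " + path);
            }
            if (!f.isFile()) {
                throw new IllegalArgumentException("file does not exist: " + path);
            }
            paths.add(path);
        }
        return String.join(",", paths);
    }

    public static void main(String[] args) throws Exception {
        // Temporary files stand in for real scanned pages.
        File a = File.createTempFile("page1-", ".png");
        File b = File.createTempFile("page2-", ".png");
        a.deleteOnExit();
        b.deleteOnExit();
        System.out.println(toCommaSeparated(a, b));
    }
}
```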
public String recognize(RenderedImage img, String recognizeType, String outputFormat, Object... propSpec)
img - input image
recognizeType - valid values: RECOGNIZE_TYPE_TEXT, RECOGNIZE_TYPE_BARCODE or RECOGNIZE_TYPE_ALL.
outputFormat - valid values: OUTPUT_FORMAT_PLAINTEXT, OUTPUT_FORMAT_XML, OUTPUT_FORMAT_PDF or OUTPUT_FORMAT_RTF.
propSpec - additional properties; can be a single Properties object, an inline specification in pairs, or a single string. Valid property names are defined in this class.
Returns null if there is no input file.
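Since java.awt.image.BufferedImage implements RenderedImage, an in-memory image can be prepared with the JDK and then handed to the RenderedImage overload. The sketch below copies a rectangular region into a new grayscale image. `RenderedImageDemo` and `cropToGray` are hypothetical names, and whether grayscale conversion improves recognition is not something this page states.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.image.RenderedImage;

class RenderedImageDemo {
    // Copies the region (x, y, w, h) of src into a new grayscale image.
    static RenderedImage cropToGray(BufferedImage src, int x, int y, int w, int h) {
        BufferedImage region = src.getSubimage(x, y, w, h); // view sharing pixels with src
        BufferedImage gray = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            g.drawImage(region, 0, 0, null); // color conversion happens while drawing
        } finally {
            g.dispose();
        }
        return gray;
    }

    public static void main(String[] args) {
        // A blank image stands in for a scanned page.
        BufferedImage page = new BufferedImage(800, 600, BufferedImage.TYPE_INT_RGB);
        RenderedImage img = cropToGray(page, 100, 50, 400, 200);
        System.out.println(img.getWidth() + "x" + img.getHeight());
        // With the Asprise jar on the classpath, img could then be passed as the
        // first argument of recognize(RenderedImage, String, String, Object...).
    }
}
```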
public String recognize(String files, int pageIndex, int startX, int startY, int width, int height, String recognizeType, String outputFormat, Object... propSpec)
files - comma-separated (',') image file paths (JPEG, BMP, PNG, TIFF)
pageIndex - -1 for all pages, or the specified page (first page is 1) for multi-page image formats like TIFF
startX - -1 for whole page, or the starting x coordinate of the specified region
startY - -1 for whole page, or the starting y coordinate of the specified region
width - -1 for whole page, or the width of the specified region
height - -1 for whole page, or the height of the specified region
recognizeType - valid values: RECOGNIZE_TYPE_TEXT, RECOGNIZE_TYPE_BARCODE or RECOGNIZE_TYPE_ALL.
outputFormat - valid values: OUTPUT_FORMAT_PLAINTEXT, OUTPUT_FORMAT_XML, OUTPUT_FORMAT_PDF or OUTPUT_FORMAT_RTF.
propSpec - additional properties; can be a single Properties object, an inline specification in pairs, or a single string. Valid property names are defined in this class.
protected static Properties readProperties(Object[] propSpec)
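For the region parameters of recognize(String files, int pageIndex, int startX, int startY, int width, int height, ...), a caller may want to clip a requested rectangle to the page before passing it on. The sketch below is one way to do that. `RegionDemo`, `wholePage`, and `clip` are hypothetical helpers, and how the engine itself treats regions that extend past the page is not documented here, so the clipping is a defensive assumption.

```java
import java.util.Arrays;

class RegionDemo {
    // The documented sentinel: -1 in all four positions means "whole page".
    static int[] wholePage() {
        return new int[] {-1, -1, -1, -1};
    }

    // Returns {startX, startY, width, height} clipped to a page of pageW x pageH.
    // Intended for explicit regions only; use wholePage() for the sentinel case.
    static int[] clip(int startX, int startY, int width, int height, int pageW, int pageH) {
        int x1 = Math.max(0, startX);
        int y1 = Math.max(0, startY);
        int x2 = Math.min(pageW, startX + width);
        int y2 = Math.min(pageH, startY + height);
        if (x2 <= x1 || y2 <= y1) {
            throw new IllegalArgumentException("region does not overlap the page");
        }
        return new int[] {x1, y1, x2 - x1, y2 - y1};
    }

    public static void main(String[] args) {
        // A 300x300 request near the corner of an 800x600 page is trimmed to fit.
        System.out.println(Arrays.toString(clip(700, 500, 300, 300, 800, 600)));
        // prints [700, 500, 100, 100]
    }
}
```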
public static String propsToString(Properties props)
public static boolean saveAocrXslToDir(File dir, boolean overwrite)
public static int getConsoleMode()
public static void main(String[] args)
Usage: java -jar aocr.jar INPUT_FILE [text|xml|pdf]
args - command-line arguments (see usage above)
Copyright 2015 (C) Asprise.