Tesseract Engine

Create an OCR engine for a given language and control parameters. This can be used by the ocr and ocr_data functions to recognize text.

Usage

tesseract(
  language = "eng",
  datapath = NULL,
  configs = NULL,
  options = NULL,
  cache = TRUE
)

tesseract_params(filter = "")

tesseract_info()

Arguments

language: string naming the language of the training data. Defaults to "eng".
datapath: path with the training data for this language. Default uses the system library.
configs: character vector of config files, each containing one or more parameter values. Each entry can be a file in the current directory or one of the standard tesseract config files that live in the tessdata directory. See details.
options: a named list with tesseract parameters. See details.
cache: if TRUE, cache engines so that repeated calls with the same arguments reuse an existing engine
filter: only list parameters containing a particular string

Value

no return value, called for side effects

list with information about the tesseract engine

Details

Tesseract control parameters can be set either via a named list in the options parameter, or in a plain-text config file that lists one parameter per line: the parameter name, followed by a space, then the value. Use tesseract_params() to list or find parameters. Note that some parameters are only supported in certain versions of libtesseract, and that invalid parameters can sometimes cause libtesseract to crash.
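
The two ways of setting parameters described above can be sketched as follows. This is a minimal illustration, not a recommended configuration: it uses the libtesseract parameter tessedit_char_whitelist to restrict recognition to digits, and the file name digits_only is made up for the example. It also assumes that configs accepts a path to a file outside the current directory; if your version does not, write the file to the working directory instead.

```r
library(tesseract)

# Option 1: pass the parameter as a named list.
digits_engine <- tesseract(
  language = "eng",
  options = list(tessedit_char_whitelist = "0123456789")
)

# Option 2: put the same parameter in a config file,
# one "name value" pair per line.
# "digits_only" is a hypothetical file name chosen for this sketch.
config_file <- file.path(tempdir(), "digits_only")
writeLines("tessedit_char_whitelist 0123456789", config_file)

# Assumption: a full path is accepted here (see the note above).
digits_engine_from_file <- tesseract(language = "eng", configs = config_file)
```

Before using a parameter, it is worth checking with tesseract_params() that it exists in your libtesseract version, since invalid parameters may crash the library.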

Examples

tesseract_params("smooth")
#> # A tibble: 4 × 3
#>   param                           default desc                                  
#> * <chr>                           <chr>   <chr>                                 
#> 1 textord_skewsmooth_offset       4       For smooth factor                     
#> 2 textord_skewsmooth_offset2      1       For smooth factor                     
#> 3 textord_wordstats_smooth_factor 0.05    Smoothing gap stats                   
#> 4 thresholding_smooth_kernel_size 0       Size of convolution kernel applied to…
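
A further sketch shows how the other functions on this page might be combined: tesseract_info() to inspect the installation, and tesseract() to build an engine that is then handed to ocr(). The path scan.png is a hypothetical placeholder rather than a file shipped with the package, so the recognition step only runs if that file exists.

```r
library(tesseract)

# Inspect the installed engine; str() prints whatever fields the list holds.
info <- tesseract_info()
str(info)

# Create an engine for English and pass it to ocr().
eng <- tesseract(language = "eng")

# "scan.png" is a hypothetical placeholder; replace it with a real image.
image_path <- "scan.png"
if (file.exists(image_path)) {
  text <- ocr(image_path, engine = eng)
  cat(text)
}
```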
