#233416 After installing tesseract-lang, tesseract only works after a reinstall
Issue Details
brew gist-logs <formula> link OR brew config AND brew doctor output
HOMEBREW_VERSION: 4.6.3
ORIGIN: https://github.com/Homebrew/brew
HEAD: a0d01bc7c410bdb55794f4858c29e9c79e0e485c
Last commit: 2 days ago
Branch: stable
Core tap JSON: 13 Aug 22:12 UTC
HOMEBREW_PREFIX: /home/linuxbrew/.linuxbrew
HOMEBREW_CASK_OPTS: []
HOMEBREW_DISPLAY: :0
HOMEBREW_EDITOR: /usr/bin/nano
HOMEBREW_MAKE_JOBS: 20
SUDO_ASKPASS: /usr/libexec/openssh/gnome-ssh-askpass
Homebrew Ruby: 3.4.5 => /var/home/linuxbrew/.linuxbrew/Homebrew/Library/Homebrew/vendor/portable-ruby/3.4.5/bin/ruby
CPU: 20-core 64-bit alderlake
Clang: N/A
Git: 2.50.1 => /bin/git
Curl: 8.9.1 => /bin/curl
Kernel: Linux 6.14.11-200.fc41.x86_64 x86_64 GNU/Linux
OS: Bluefin (Version: gts-41.20250810 / FROM Fedora Silverblue 41) (Deinonychus)
Host glibc: 2.40
/usr/bin/gcc: 14.3.1
/usr/bin/ruby: N/A
glibc: N/A
gcc@11: N/A
gcc: 15.1.0
xorg: N/A
Verification
- My brew doctor output says "Your system is ready to brew." and am still able to reproduce my issue.
- I ran brew update and am still able to reproduce my issue.
- I have resolved all warnings from brew doctor and that did not fix my problem.
- I searched for recent similar issues at https://github.com/Homebrew/homebrew-core/issues?q=is%3Aissue and found no duplicates.
- My issue is not about a failure to build a formula from source.
What were you trying to do (and why)?
I installed ocrmypdf and its two main dependencies, i.e., tesseract and tesseract-lang. However, after installing tesseract-lang, the original languages that come with tesseract (such as eng) disappeared and tesseract failed to work. I was able to work around it by reinstalling tesseract.
What happened (include all command output)?
Running "tesseract --list-langs" gives this output:
❯ tesseract --list-langs
List of available languages in "/home/linuxbrew/.linuxbrew/share/tessdata/" (160):
afr amh ara asm aze aze_cyrl bel ben bod bos bre bul cat ceb ces chi_sim chi_sim_vert chi_tra chi_tra_vert chr cos cym dan deu div dzo ell enm epo equ est eus fao fas fil fin fra frk frm fry gla gle glg grc guj hat heb hin hrv hun hye iku ind isl ita ita_old jav jpn jpn_vert kan kat kat_old kaz khm kir kmr kor kor_vert lao lat lav lit ltz mal mar mkd mlt mon mri msa mya nep nld nor oci ori pan pol por pus que ron rus san script/Arabic script/Armenian script/Bengali script/Canadian_Aboriginal script/Cherokee script/Cyrillic script/Devanagari script/Ethiopic script/Fraktur script/Georgian script/Greek script/Gujarati script/Gurmukhi script/HanS script/HanS_vert script/HanT script/HanT_vert script/Hangul script/Hangul_vert script/Hebrew script/Japanese script/Japanese_vert script/Kannada script/Khmer script/Lao script/Latin script/Malayalam script/Myanmar script/Oriya script/Sinhala script/Syriac script/Tamil script/Telugu script/Thaana script/Thai script/Tibetan script/Vietnamese sin slk slv snd spa spa_old sqi srp srp_latn sun swa swe syr tam tat tel tgk tha tir ton tur uig ukr urd uzb uzb_cyrl vie yid yor
Also, it fails when trying to OCR a PDF:
❯ ocrmypdf -l por Summa\ Theologica\ -\ Wikipedia.pdf Summa\ Theologica\ -\ Wikipedia-ocr.pdf --force-ocr
Scanning contents ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 27/27 0:00:00
Start processing 20 pages concurrently ocr.py:96
1 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
2 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
3 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
4 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
5 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
6 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
7 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
8 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
9 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
10 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
11 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
12 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
13 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
14 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
15 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
16 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
17 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
18 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
19 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
20 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
5 [tesseract] read_params_file: Can't open hocr tesseract.py:257
5 [tesseract] read_params_file: Can't open txt tesseract.py:257
21 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
4 [tesseract] read_params_file: Can't open hocr tesseract.py:257
4 [tesseract] read_params_file: Can't open txt tesseract.py:257
3 [tesseract] read_params_file: Can't open hocr tesseract.py:257
3 [tesseract] read_params_file: Can't open txt tesseract.py:257
6 [tesseract] read_params_file: Can't open hocr tesseract.py:257
6 [tesseract] read_params_file: Can't open txt tesseract.py:257
9 [tesseract] read_params_file: Can't open hocr tesseract.py:257
9 [tesseract] read_params_file: Can't open txt tesseract.py:257
14 [tesseract] read_params_file: Can't open hocr tesseract.py:257
14 [tesseract] read_params_file: Can't open txt tesseract.py:257
18 [tesseract] read_params_file: Can't open hocr tesseract.py:257
18 [tesseract] read_params_file: Can't open txt tesseract.py:257
20 [tesseract] read_params_file: Can't open hocr tesseract.py:257
20 [tesseract] read_params_file: Can't open txt tesseract.py:257
8 [tesseract] read_params_file: Can't open hocr tesseract.py:257
8 [tesseract] read_params_file: Can't open txt tesseract.py:257
19 [tesseract] read_params_file: Can't open hocr tesseract.py:257
19 [tesseract] read_params_file: Can't open txt tesseract.py:257
1 [tesseract] read_params_file: Can't open hocr tesseract.py:257
1 [tesseract] read_params_file: Can't open txt tesseract.py:257
17 [tesseract] read_params_file: Can't open hocr tesseract.py:257
17 [tesseract] read_params_file: Can't open txt tesseract.py:257
7 [tesseract] read_params_file: Can't open hocr tesseract.py:257
7 [tesseract] read_params_file: Can't open txt tesseract.py:257
11 [tesseract] read_params_file: Can't open hocr tesseract.py:257
11 [tesseract] read_params_file: Can't open txt tesseract.py:257
10 [tesseract] read_params_file: Can't open hocr tesseract.py:257
10 [tesseract] read_params_file: Can't open txt tesseract.py:257
15 [tesseract] read_params_file: Can't open hocr tesseract.py:257
15 [tesseract] read_params_file: Can't open txt tesseract.py:257
12 [tesseract] read_params_file: Can't open hocr tesseract.py:257
12 [tesseract] read_params_file: Can't open txt tesseract.py:257
16 [tesseract] read_params_file: Can't open hocr tesseract.py:257
16 [tesseract] read_params_file: Can't open txt tesseract.py:257
13 [tesseract] read_params_file: Can't open hocr tesseract.py:257
13 [tesseract] read_params_file: Can't open txt tesseract.py:257
2 [tesseract] read_params_file: Can't open hocr tesseract.py:257
2 [tesseract] read_params_file: Can't open txt tesseract.py:257
2 [tesseract] lots of diacritics - possibly poor OCR tesseract.py:241
21 [tesseract] read_params_file: Can't open hocr tesseract.py:257
21 [tesseract] read_params_file: Can't open txt tesseract.py:257
OCR ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% 0/27 -:--:--
An exception occurred while executing the pipeline _common.py:296
Traceback (most recent call last):
File
"/home/linuxbrew/.linuxbrew/Cellar/ocrmypdf/16.10.4/libexec/lib/python3.13/site-packages/ocrmypdf/_pipelines/_common.py", line
261, in cli_exception_handler
return fn(options, plugin_manager)
File "/home/linuxbrew/.linuxbrew/Cellar/ocrmypdf/16.10.4/libexec/lib/python3.13/site-packages/ocrmypdf/_pipelines/ocr.py",
line 181, in _run_pipeline
optimize_messages = exec_concurrent(context, executor)
File "/home/linuxbrew/.linuxbrew/Cellar/ocrmypdf/16.10.4/libexec/lib/python3.13/site-packages/ocrmypdf/_pipelines/ocr.py",
line 117, in exec_concurrent
executor(
~~~~~~~~^
use_threads=options.use_threads,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<10 lines>...
task_finished=update_page,
^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/linuxbrew/.linuxbrew/Cellar/ocrmypdf/16.10.4/libexec/lib/python3.13/site-packages/ocrmypdf/_concurrent.py", line
78, in call
self._execute(
~~~~~~~~~~~~~^
use_threads=use_threads,
^^^^^^^^^^^^^^^^^^^^^^^^
...<5 lines>...
task_finished=task_finished,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/linuxbrew/.linuxbrew/Cellar/ocrmypdf/16.10.4/libexec/lib/python3.13/site-packages/ocrmypdf/builtin_plugins/concurrency.py", line 144, in _execute
result = future.result()
File "/home/linuxbrew/.linuxbrew/opt/python@3.13/lib/python3.13/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
~~~~~~~~~~~~~~~~~^^
File "/home/linuxbrew/.linuxbrew/opt/python@3.13/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/home/linuxbrew/.linuxbrew/opt/python@3.13/lib/python3.13/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/linuxbrew/.linuxbrew/Cellar/ocrmypdf/16.10.4/libexec/lib/python3.13/site-packages/ocrmypdf/_pipelines/ocr.py",
line 81, in _exec_page_sync
ocr_out, text_out = _image_to_ocr_text(page_context, ocr_image_out)
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/linuxbrew/.linuxbrew/Cellar/ocrmypdf/16.10.4/libexec/lib/python3.13/site-packages/ocrmypdf/_pipelines/ocr.py",
line 63, in _image_to_ocr_text
ocr_out = render_hocr_page(hocr_out, page_context)
File "/home/linuxbrew/.linuxbrew/Cellar/ocrmypdf/16.10.4/libexec/lib/python3.13/site-packages/ocrmypdf/_pipeline.py", line
774, in render_hocr_page
if hocr.stat().st_size == 0:
~~~~~~~~~^^
File "/home/linuxbrew/.linuxbrew/opt/python@3.13/lib/python3.13/pathlib/_local.py", line 515, in stat
return os.stat(self, follow_symlinks=follow_symlinks)
~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.3qq18bqv/000005_ocr_hocr.hocr'
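The "read_params_file: Can't open hocr" and "Can't open txt" messages suggest that tesseract could not find its hocr and txt config files, so it never wrote the .hocr output that ocrmypdf then tried to stat. A minimal Python sketch for checking what is actually present on disk is below. It assumes the usual tesseract layout, where those configs live in a configs subdirectory of tessdata and each language is a <lang>.traineddata file. The function name and the list of expected entries are hypothetical, chosen only for illustration.

```python
from pathlib import Path

# Assumed layout: <tessdata>/configs/{hocr,txt} and <tessdata>/<lang>.traineddata.
# The tessdata path is the one printed by `tesseract --list-langs` in this report.
TESSDATA = Path("/home/linuxbrew/.linuxbrew/share/tessdata")

# Entries the output above points at (hypothetical default list).
EXPECTED = ("configs/hocr", "configs/txt", "eng.traineddata", "osd.traineddata")


def missing_entries(tessdata: Path, expected=EXPECTED) -> list[str]:
    """Return the expected relative paths that do not resolve to a file."""
    # Path.is_file() follows symlinks, so a dangling symlink counts as missing.
    return [rel for rel in expected if not (tessdata / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_entries(TESSDATA):
        print(f"missing: {TESSDATA / rel}")
```

Running this once before and once after "brew reinstall tesseract" would show whether the reinstall restores files or links that the tesseract-lang install left out.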
What did you expect to happen?
After reinstalling tesseract, you get the correct output, which includes eng:
❯ tesseract --list-langs
List of available languages in "/home/linuxbrew/.linuxbrew/share/tessdata/" (163):
afr amh ara asm aze aze_cyrl bel ben bod bos bre bul cat ceb ces chi_sim chi_sim_vert chi_tra chi_tra_vert chr cos cym dan deu div dzo ell eng enm epo equ est eus fao fas fil fin fra frk frm fry gla gle glg grc guj hat heb hin hrv hun hye iku ind isl ita ita_old jav jpn jpn_vert kan kat kat_old kaz khm kir kmr kor kor_vert lao lat lav lit ltz mal mar mkd mlt mon mri msa mya nep nld nor oci ori osd pan pol por pus que ron rus san script/Arabic script/Armenian script/Bengali script/Canadian_Aboriginal script/Cherokee script/Cyrillic script/Devanagari script/Ethiopic script/Fraktur script/Georgian script/Greek script/Gujarati script/Gurmukhi script/HanS script/HanS_vert script/HanT script/HanT_vert script/Hangul script/Hangul_vert script/Hebrew script/Japanese script/Japanese_vert script/Kannada script/Khmer script/Lao script/Latin script/Malayalam script/Myanmar script/Oriya script/Sinhala script/Syriac script/Tamil script/Telugu script/Thaana script/Thai script/Tibetan script/Vietnamese sin slk slv snd snum spa spa_old sqi srp srp_latn sun swa swe syr tam tat tel tgk tha tir ton tur uig ukr urd uzb uzb_cyrl vie yid yor
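Comparing the listing from before the reinstall with this one shows which languages went missing. A small Python sketch of that comparison is below. parse_langs is a hypothetical helper, and the two sample strings are shortened stand-ins for the full outputs in this report, not real captures.

```python
def parse_langs(output: str) -> set[str]:
    """Extract language names from `tesseract --list-langs` output.

    Handles both one-name-per-line output and the flattened form, where
    the names follow the '(N):' header on the same line.
    """
    header, sep, rest = output.partition("):")
    # Without a recognizable header, treat the whole text as names.
    names = rest if sep else output
    return set(names.split())


# Shortened stand-ins for the listings before and after `brew reinstall tesseract`.
before = 'List of available languages in "/x/tessdata/" (4): ell enm ori pan'
after = 'List of available languages in "/x/tessdata/" (6): ell eng enm ori osd pan'

missing = sorted(parse_langs(after) - parse_langs(before))
print(missing)  # ['eng', 'osd']
```

Applied to the full listings in this report, the difference is eng, osd and snum, which accounts for the change from 160 to 163 entries.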
Also, running OCR on a PDF now works:
❯ ocrmypdf -l por Summa\ Theologica\ -\ Wikipedia.pdf Summa\ Theologica\ -\ Wikipedia-ocr.pdf --force-ocr
Scanning contents ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 27/27 0:00:00
Start processing 20 pages concurrently ocr.py:96
1 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
2 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
3 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
4 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
5 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
6 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
7 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
8 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
9 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
10 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
11 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
12 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
13 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
14 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
15 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
16 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
17 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
18 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
19 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
20 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
21 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
22 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
23 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
24 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
25 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
26 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
2 [tesseract] lots of diacritics - possibly poor OCR tesseract.py:241
27 page already has text! - rasterizing text and running OCR _pipeline.py:331
anyway
OCR ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 27/27 0:00:00
Postprocessing... ocr.py:144
PDF/A conversion ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 27/27 0:00:00
Linearizing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 100/100 0:00:00
Recompressing JPEGs ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% 0/0 -:--:--
Deflating JPEGs ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% 0/0 -:--:--
JBIG2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% 0/0 -:--:--
Image optimization did not improve the file - optimizations will not be optimize.py:735
used
Image optimization ratio: 1.00 savings: -0.1% _pipeline.py:1002
Total file size ratio: 0.09 savings: -1059.6% _pipeline.py:1005
Output file is a PDF/A-2B (as expected) _common.py:474
The output file size is 11.41× larger than the input file. _validation.py:358
Possible reasons for this include:
--force-ocr was issued, causing transcoding.
PDF/A conversion was enabled. (Try --output-type pdf.)
Step-by-step reproduction instructions (by running brew commands)
1. To install: brew install ocrmypdf tesseract-lang
2. The workaround, after ocrmypdf and tesseract-lang installation: brew reinstall tesseract
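The two steps can be scripted to confirm the behaviour, checking for eng after each one. This is only a sketch: check_repro, has_eng and the run parameter are hypothetical names, and the default runner simply shells out to the brew and tesseract commands quoted in this report, capturing stdout and stderr together.

```python
import subprocess
from typing import Callable

Runner = Callable[[list[str]], str]


def shell(cmd: list[str]) -> str:
    """Run a command and return its combined output (raises if it fails)."""
    done = subprocess.run(
        cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return done.stdout


def has_eng(run: Runner) -> bool:
    """True if `tesseract --list-langs` reports the eng language."""
    return "eng" in run(["tesseract", "--list-langs"]).split()


def check_repro(run: Runner = shell) -> tuple[bool, bool]:
    """Return (eng present after install, eng present after reinstall)."""
    run(["brew", "install", "ocrmypdf", "tesseract-lang"])
    after_install = has_eng(run)
    run(["brew", "reinstall", "tesseract"])
    after_reinstall = has_eng(run)
    return after_install, after_reinstall
```

On a fresh install matching this report, the result would be expected to be (False, True).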