#233416 After installing tesseract-lang, tesseract only works after a reinstall

homebrew-core

Homebrew

🍻 Default formulae for the missing package manager for macOS (or Linux)

Issue Details

2 months ago

No assignee

needs response

pedrohqb

opened 2 months ago

Author

`brew gist-logs <formula>` link OR `brew config` AND `brew doctor` output

HOMEBREW_VERSION: 4.6.3
ORIGIN: https://github.com/Homebrew/brew
HEAD: a0d01bc7c410bdb55794f4858c29e9c79e0e485c
Last commit: 2 days ago
Branch: stable
Core tap JSON: 13 Aug 22:12 UTC
HOMEBREW_PREFIX: /home/linuxbrew/.linuxbrew
HOMEBREW_CASK_OPTS: []
HOMEBREW_DISPLAY: :0
HOMEBREW_EDITOR: /usr/bin/nano
HOMEBREW_MAKE_JOBS: 20
SUDO_ASKPASS: /usr/libexec/openssh/gnome-ssh-askpass
Homebrew Ruby: 3.4.5 => /var/home/linuxbrew/.linuxbrew/Homebrew/Library/Homebrew/vendor/portable-ruby/3.4.5/bin/ruby
CPU: 20-core 64-bit alderlake
Clang: N/A
Git: 2.50.1 => /bin/git
Curl: 8.9.1 => /bin/curl
Kernel: Linux 6.14.11-200.fc41.x86_64 x86_64 GNU/Linux
OS: Bluefin (Version: gts-41.20250810 / FROM Fedora Silverblue 41) (Deinonychus)
Host glibc: 2.40
/usr/bin/gcc: 14.3.1
/usr/bin/ruby: N/A
glibc: N/A
gcc@11: N/A
gcc: 15.1.0
xorg: N/A

Verification

My brew doctor output says Your system is ready to brew. and am still able to reproduce my issue.
I ran brew update and am still able to reproduce my issue.
I have resolved all warnings from brew doctor and that did not fix my problem.
I searched for recent similar issues at https://github.com/Homebrew/homebrew-core/issues?q=is%3Aissue and found no duplicates.
My issue is not about a failure to build a formula from source.

What were you trying to do (and why)?

I installed ocrmypdf and its two main dependencies, i.e., tesseract and tesseract-lang. However, after installing tesseract-lang, the original languages that come with tesseract (such as eng) disappeared and tesseract stopped working. I was able to work around it by reinstalling tesseract.

What happened (include all command output)?

Running `tesseract --list-langs` gives this output, in which eng is missing (as are osd and snum):

❯ tesseract --list-langs
List of available languages in "/home/linuxbrew/.linuxbrew/share/tessdata/" (160):
afr amh ara asm aze aze_cyrl bel ben bod bos bre bul cat ceb ces chi_sim chi_sim_vert chi_tra chi_tra_vert chr cos cym dan deu div dzo ell enm epo equ est eus fao fas fil fin fra frk frm fry gla gle glg grc guj hat heb hin hrv hun hye iku ind isl ita ita_old jav jpn jpn_vert kan kat kat_old kaz khm kir kmr kor kor_vert lao lat lav lit ltz mal mar mkd mlt mon mri msa mya nep nld nor oci ori pan pol por pus que ron rus san script/Arabic script/Armenian script/Bengali script/Canadian_Aboriginal script/Cherokee script/Cyrillic script/Devanagari script/Ethiopic script/Fraktur script/Georgian script/Greek script/Gujarati script/Gurmukhi script/HanS script/HanS_vert script/HanT script/HanT_vert script/Hangul script/Hangul_vert script/Hebrew script/Japanese script/Japanese_vert script/Kannada script/Khmer script/Lao script/Latin script/Malayalam script/Myanmar script/Oriya script/Sinhala script/Syriac script/Tamil script/Telugu script/Thaana script/Thai script/Tibetan script/Vietnamese sin slk slv snd spa spa_old sqi srp srp_latn sun swa swe syr tam tat tel tgk tha tir ton tur uig ukr urd uzb uzb_cyrl vie yid yor

Also, OCR fails when run on a PDF:

❯ ocrmypdf -l por Summa\ Theologica\ -\ Wikipedia.pdf Summa\ Theologica\ -\ Wikipedia-ocr.pdf --force-ocr
Scanning contents ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 27/27 0:00:00
Start processing 20 pages concurrently ocr.py:96
1 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
2 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
3 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
4 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
5 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
6 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
7 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
8 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
9 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
10 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
11 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
12 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
13 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
14 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
15 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
16 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
17 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
18 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
19 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
20 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
5 [tesseract] read_params_file: Can't open hocr tesseract.py:257
5 [tesseract] read_params_file: Can't open txt tesseract.py:257
21 page already has text! - rasterizing text and running OCR anyway _pipeline.py:331
4 [tesseract] read_params_file: Can't open hocr tesseract.py:257
4 [tesseract] read_params_file: Can't open txt tesseract.py:257
3 [tesseract] read_params_file: Can't open hocr tesseract.py:257
3 [tesseract] read_params_file: Can't open txt tesseract.py:257
6 [tesseract] read_params_file: Can't open hocr tesseract.py:257
6 [tesseract] read_params_file: Can't open txt tesseract.py:257
9 [tesseract] read_params_file: Can't open hocr tesseract.py:257
9 [tesseract] read_params_file: Can't open txt tesseract.py:257
14 [tesseract] read_params_file: Can't open hocr tesseract.py:257
14 [tesseract] read_params_file: Can't open txt tesseract.py:257
18 [tesseract] read_params_file: Can't open hocr tesseract.py:257
18 [tesseract] read_params_file: Can't open txt tesseract.py:257
20 [tesseract] read_params_file: Can't open hocr tesseract.py:257
20 [tesseract] read_params_file: Can't open txt tesseract.py:257
8 [tesseract] read_params_file: Can't open hocr tesseract.py:257
8 [tesseract] read_params_file: Can't open txt tesseract.py:257
19 [tesseract] read_params_file: Can't open hocr tesseract.py:257
19 [tesseract] read_params_file: Can't open txt tesseract.py:257
1 [tesseract] read_params_file: Can't open hocr tesseract.py:257
1 [tesseract] read_params_file: Can't open txt tesseract.py:257
17 [tesseract] read_params_file: Can't open hocr tesseract.py:257
17 [tesseract] read_params_file: Can't open txt tesseract.py:257
7 [tesseract] read_params_file: Can't open hocr tesseract.py:257
7 [tesseract] read_params_file: Can't open txt tesseract.py:257
11 [tesseract] read_params_file: Can't open hocr tesseract.py:257
11 [tesseract] read_params_file: Can't open txt tesseract.py:257
10 [tesseract] read_params_file: Can't open hocr tesseract.py:257
10 [tesseract] read_params_file: Can't open txt tesseract.py:257
15 [tesseract] read_params_file: Can't open hocr tesseract.py:257
15 [tesseract] read_params_file: Can't open txt tesseract.py:257
12 [tesseract] read_params_file: Can't open hocr tesseract.py:257
12 [tesseract] read_params_file: Can't open txt tesseract.py:257
16 [tesseract] read_params_file: Can't open hocr tesseract.py:257
16 [tesseract] read_params_file: Can't open txt tesseract.py:257
13 [tesseract] read_params_file: Can't open hocr tesseract.py:257
13 [tesseract] read_params_file: Can't open txt tesseract.py:257
2 [tesseract] read_params_file: Can't open hocr tesseract.py:257
2 [tesseract] read_params_file: Can't open txt tesseract.py:257
2 [tesseract] lots of diacritics - possibly poor OCR tesseract.py:241
21 [tesseract] read_params_file: Can't open hocr tesseract.py:257
21 [tesseract] read_params_file: Can't open txt tesseract.py:257
OCR ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% 0/27 -:--:--
An exception occurred while executing the pipeline _common.py:296
Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/Cellar/ocrmypdf/16.10.4/libexec/lib/python3.13/site-packages/ocrmypdf/_pipelines/_common.py", line 261, in cli_exception_handler
    return fn(options, plugin_manager)
  File "/home/linuxbrew/.linuxbrew/Cellar/ocrmypdf/16.10.4/libexec/lib/python3.13/site-packages/ocrmypdf/_pipelines/ocr.py", line 181, in _run_pipeline
    optimize_messages = exec_concurrent(context, executor)
  File "/home/linuxbrew/.linuxbrew/Cellar/ocrmypdf/16.10.4/libexec/lib/python3.13/site-packages/ocrmypdf/_pipelines/ocr.py", line 117, in exec_concurrent
    executor(
    ~~~~~~~~^
        use_threads=options.use_threads,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        ...<10 lines>...
        task_finished=update_page,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/linuxbrew/.linuxbrew/Cellar/ocrmypdf/16.10.4/libexec/lib/python3.13/site-packages/ocrmypdf/_concurrent.py", line 78, in call
    self._execute(
    ~~~~~~~~~~~~~^
        use_threads=use_threads,
        ^^^^^^^^^^^^^^^^^^^^^^^^
        ...<5 lines>...
        task_finished=task_finished,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/linuxbrew/.linuxbrew/Cellar/ocrmypdf/16.10.4/libexec/lib/python3.13/site-packages/ocrmypdf/builtin_plugins/concurrency.py", line 144, in _execute
    result = future.result()
  File "/home/linuxbrew/.linuxbrew/opt/python@3.13/lib/python3.13/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "/home/linuxbrew/.linuxbrew/opt/python@3.13/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/linuxbrew/.linuxbrew/opt/python@3.13/lib/python3.13/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/linuxbrew/.linuxbrew/Cellar/ocrmypdf/16.10.4/libexec/lib/python3.13/site-packages/ocrmypdf/_pipelines/ocr.py", line 81, in _exec_page_sync
    ocr_out, text_out = _image_to_ocr_text(page_context, ocr_image_out)
                        ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/linuxbrew/.linuxbrew/Cellar/ocrmypdf/16.10.4/libexec/lib/python3.13/site-packages/ocrmypdf/_pipelines/ocr.py", line 63, in _image_to_ocr_text
    ocr_out = render_hocr_page(hocr_out, page_context)
  File "/home/linuxbrew/.linuxbrew/Cellar/ocrmypdf/16.10.4/libexec/lib/python3.13/site-packages/ocrmypdf/_pipeline.py", line 774, in render_hocr_page
    if hocr.stat().st_size == 0:
       ~~~~~~~~~^^
  File "/home/linuxbrew/.linuxbrew/opt/python@3.13/lib/python3.13/pathlib/_local.py", line 515, in stat
    return os.stat(self, follow_symlinks=follow_symlinks)
           ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ocrmypdf.io.3qq18bqv/000005_ocr_hocr.hocr'
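
The `read_params_file: Can't open hocr` and `Can't open txt` messages suggest that tesseract could not find its `hocr` and `txt` config files, which normally sit in a `configs` directory under tessdata. Below is a minimal sketch for checking the tessdata directory. The file list is an assumption based on this report (the three language entries that only show up after the reinstall, plus the two config files), and the function name `check_tessdata` is made up for illustration.

```shell
# List expected base files that are absent from a tessdata directory.
# The file list is an assumption drawn from this report, not an
# authoritative inventory of what the tesseract formula installs.
check_tessdata() {
  tessdata_dir="$1"
  for rel in eng.traineddata osd.traineddata snum.traineddata \
             configs/hocr configs/txt; do
    # Print one line per file that does not exist under the directory.
    [ -e "$tessdata_dir/$rel" ] || printf 'missing: %s\n' "$rel"
  done
}

# Only inspect the real Homebrew prefix when brew is available.
if command -v brew >/dev/null 2>&1; then
  check_tessdata "$(brew --prefix)/share/tessdata"
fi
```

If this prints nothing, all five files are present. Any `missing:` line before the reinstall would be consistent with the failure shown above.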

What did you expect to happen?

After reinstalling tesseract, you get the correct output, which includes eng:

❯ tesseract --list-langs
List of available languages in "/home/linuxbrew/.linuxbrew/share/tessdata/" (163):
afr amh ara asm aze aze_cyrl bel ben bod bos bre bul cat ceb ces chi_sim chi_sim_vert chi_tra chi_tra_vert chr cos cym dan deu div dzo ell eng enm epo equ est eus fao fas fil fin fra frk frm fry gla gle glg grc guj hat heb hin hrv hun hye iku ind isl ita ita_old jav jpn jpn_vert kan kat kat_old kaz khm kir kmr kor kor_vert lao lat lav lit ltz mal mar mkd mlt mon mri msa mya nep nld nor oci ori osd pan pol por pus que ron rus san script/Arabic script/Armenian script/Bengali script/Canadian_Aboriginal script/Cherokee script/Cyrillic script/Devanagari script/Ethiopic script/Fraktur script/Georgian script/Greek script/Gujarati script/Gurmukhi script/HanS script/HanS_vert script/HanT script/HanT_vert script/Hangul script/Hangul_vert script/Hebrew script/Japanese script/Japanese_vert script/Kannada script/Khmer script/Lao script/Latin script/Malayalam script/Myanmar script/Oriya script/Sinhala script/Syriac script/Tamil script/Telugu script/Thaana script/Thai script/Tibetan script/Vietnamese sin slk slv snd snum spa spa_old sqi srp srp_latn sun swa swe syr tam tat tel tgk tha tir ton tur uig ukr urd uzb uzb_cyrl vie yid yor
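
To see exactly which entries the reinstall restores, the two listings can be saved to files and compared. This is a rough sketch that assumes `tesseract --list-langs` prints one language per line after its header; the function name `new_langs` and the file names in the usage comment are hypothetical.

```shell
# Print entries present in the second saved `tesseract --list-langs`
# listing but absent from the first. Each argument is a file holding
# one saved listing.
new_langs() {
  # Drop the "List of available languages ..." header (its count differs
  # between listings), then keep only lines with no exact match in "$1".
  grep -v '^List of available languages' "$2" | grep -vxF -f "$1"
}

# Example usage (hypothetical file names):
#   tesseract --list-langs > before.txt
#   brew reinstall tesseract
#   tesseract --list-langs > after.txt
#   new_langs before.txt after.txt
```

Comparing the 160-entry and 163-entry listings in this report by hand, the entries that differ appear to be eng, osd, and snum.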

Also, running OCR on a PDF now works.

Step-by-step reproduction instructions (by running `brew` commands)

1. To install: `brew install ocrmypdf tesseract-lang`

2. The workaround, after installing ocrmypdf and tesseract-lang: `brew reinstall tesseract`
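
The two steps above, with a language listing after each, can be sketched as a small script. The `brew` and `tesseract` commands are the ones from this report; the `run` and `repro_and_workaround` helpers and the `DRY_RUN` variable are made up for illustration so the sequence can be previewed without changing anything.

```shell
# Run a command, or with DRY_RUN=1 just print it prefixed by "+ ".
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Reproduce the problem, then apply the workaround from this report.
repro_and_workaround() {
  run brew install ocrmypdf tesseract-lang
  run tesseract --list-langs    # in this report, eng was absent here
  run brew reinstall tesseract
  run tesseract --list-langs    # in this report, eng was listed again
}

# Example usage: preview with `DRY_RUN=1 repro_and_workaround`,
# or call `repro_and_workaround` to actually run the steps.
```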
