Ticket #12557 (confirmed Bug)

Opened 4 years ago

Last modified 4 years ago

Products.CMFPlone.UnicodeSplitter.splitter can crash at unicodedata.normalize

Reported by: tnagai@…
Owned by: vincentfretin
Priority: major
Milestone: 4.x
Component: Backend (Python)
Version: 4.0
Keywords:
Cc:

Description

We use Plone 4.0.5 to develop our new university site. One of our staff found that she couldn't upload a particular Japanese PDF document. When she uploaded the PDF, the Plone process crashed and was automatically restarted.

I confirmed that Plone crashes in the Products.CMFPlone.UnicodeSplitter.splitter module: the call to unicodedata.normalize in that module crashes the Python interpreter itself. The PDF happens to contain a particular sequence of Unicode characters that triggers the crash.
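Because the crash takes down the interpreter instead of raising an exception, a try/except around normalize() cannot catch it. One way to check whether a given extracted text triggers it, without killing the Zope process, is to run the normalization in a throwaway child interpreter. This is a minimal sketch, assuming a POSIX system (where a negative return code means the child was killed by a signal such as SIGSEGV); the function name normalize_crashes is hypothetical and not part of Plone.

```python
import subprocess
import sys

# Code run in the child interpreter: read UTF-8 bytes from stdin, decode them,
# and normalize with the form passed as the first command-line argument.
CHILD_CODE = (
    "import sys, unicodedata\n"
    "raw = sys.stdin.buffer.read() if hasattr(sys.stdin, 'buffer') else sys.stdin.read()\n"
    "unicodedata.normalize(sys.argv[1], raw.decode('utf-8'))\n"
)


def normalize_crashes(text, form="NFKC"):
    """Return True if unicodedata.normalize(form, text) kills a child interpreter.

    Hypothetical diagnostic helper. The parent never calls normalize() itself,
    so it survives a segfault in the child. POSIX only: a negative return code
    means the child was terminated by a signal (e.g. SIGSEGV). An ordinary
    Python exception in the child gives a positive code and returns False.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE, form],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    proc.communicate(text.encode("utf-8"))
    return proc.returncode < 0


if __name__ == "__main__":
    # Fullwidth "ABC" normalizes without trouble on a fixed interpreter.
    print(normalize_crashes(u"\uff21\uff22\uff23"))
```

Running it under the same interpreter that Plone uses (here Python 2.6) is what matters, since sys.executable is reused for the child.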

This problem appears to be already known in the Python community: http://bugs.python.org/issue10254. However, Plone 4.0.x uses Python 2.6, where this problem is not fixed.

To work around this problem, I replaced unicodedata.normalize with Normalizer2 from PyICU, and it works fine (although PyICU then has to be installed).

    $ diff splitter.py.org splitter.py.fixed
    8a9
    > from icu import *
    16a18
    > normalizerNFKC = Normalizer2.getInstance(None, "nfkc", UNormalizationMode2.COMPOSE)
    89c91
    < normalized = unicodedata.normalize('NFKC', uni)
    ---
    > normalized = unicode(normalizerNFKC.normalize(uni))
    104c106
    < normalized = unicodedata.normalize('NFKC', uni)
    ---
    > normalized = unicode(normalizerNFKC.normalize(uni))
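If PyICU cannot be required on every deployment, the same idea can be wrapped so that ICU is used when importable and unicodedata otherwise. This is a minimal sketch, not the patch above and not part of Products.CMFPlone: the helper name normalize_nfkc is hypothetical, and the fallback branch still calls unicodedata.normalize, so on an unpatched Python 2.6 it keeps the crash risk. The Normalizer2.getInstance call mirrors the one in the diff.

```python
import unicodedata

try:
    text_type = unicode  # Python 2 (e.g. Plone 4.0 on Python 2.6)
except NameError:
    text_type = str  # Python 3

try:
    # Optional dependency: ICU ships its own normalization code, separate from
    # the C implementation behind unicodedata.normalize.
    from icu import Normalizer2, UNormalizationMode2
    _ICU_NFKC = Normalizer2.getInstance(None, "nfkc", UNormalizationMode2.COMPOSE)
except ImportError:
    _ICU_NFKC = None


def normalize_nfkc(text):
    """Return the NFKC form of text, preferring PyICU when it is installed.

    Hypothetical helper. The fallback branch still uses unicodedata, so it
    carries the same crash risk on an unpatched Python 2.6.
    """
    if _ICU_NFKC is not None:
        # PyICU may hand back an icu.UnicodeString; coerce it to a native
        # string, as the patch above does with unicode().
        return text_type(_ICU_NFKC.normalize(text))
    return unicodedata.normalize("NFKC", text)
```

With such a helper, both `normalized = unicodedata.normalize('NFKC', uni)` lines in splitter.py would become `normalized = normalize_nfkc(uni)`. Either branch folds halfwidth katakana such as U+FF76 U+FF9E into the single character U+30AC.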

Change History

comment:1 Changed 4 years ago by kleist

  • Priority changed from minor to major
  • Status changed from new to confirmed
  • Component changed from Internationalization to Backend (Python)

comment:2 Changed 4 years ago by kleist

Is this still an issue in Plone 4.1 or 4.2?

comment:3 Changed 4 years ago by kleist

  • Milestone set to 4.x