Ticket #8268 (reopened Bug)

Opened 8 years ago

Last modified 2 years ago

Indexing/Transforming issue with Word files for the wvWare transform

Reported by: wohnlice
Owned by: nouri
Priority: minor
Milestone: 4.x
Component: General
Version: 4.3
Keywords: (none)
Cc: grahamperrin, micecchi, keul

Description

We're running Plone 3.0.6 on Zope 2.10.5, with wvWare. When posting any .doc file larger than about 1 MB (not very big) as either a File or a PloneExFile (I get the same behavior with that product), the page generally times out. Worse, the entire process is slow: other users cannot navigate any portal on that Zope server until the file finally finishes loading. I've taken the text of this file and saved it both as plain text and as a PDF, and neither caused any problems.

I was able to narrow the problem down to line 35 in PortalTransforms.transforms.office_wvware.py - "html = scrubHTML(html)" - which I believe is there to remove any malicious scripts/tags from the HTML. If I comment out this line, I have no problems at all uploading/indexing .doc files of a reasonable size, and other users can navigate the site normally while the file is uploading.

This is about as far as I go though - I don't have any experience with SGML parsers and do not really have an idea of what the problem may be.
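
To check whether the scrubbing step really is what scales badly, one option is to time it on progressively larger inputs. The following is only a rough sketch: time_step, growth_table, fake_html and stand_in_scrub are hypothetical names made up for illustration, and stand_in_scrub is a trivial placeholder rather than the real scrubHTML. On a Plone instance you would pass the real scrubHTML and the HTML actually produced by wvWare instead. It is written for Python 3; on the Python 2 that Plone 3 and 4 run on, time.time would have to replace time.perf_counter and the f-string would need rewriting.

```python
import time


def time_step(step, payload, repeats=3):
    """Return the best wall-clock time (seconds) of step(payload) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        step(payload)
        best = min(best, time.perf_counter() - start)
    return best


def growth_table(step, make_payload, sizes):
    """Time `step` on payloads of increasing size; returns [(size, seconds), ...]."""
    return [(n, time_step(step, make_payload(n))) for n in sizes]


def fake_html(n_paragraphs):
    # Hypothetical stand-in for the HTML that wvWare emits for a .doc file.
    return "<html><body>" + "<p>lorem ipsum</p>" * n_paragraphs + "</body></html>"


def stand_in_scrub(html):
    # Hypothetical placeholder; swap in the real scrubHTML on a Plone instance.
    return html.replace("<script", "&lt;script")


if __name__ == "__main__":
    # Doubling the input each time makes non-linear growth easy to spot.
    for size, seconds in growth_table(stand_in_scrub, fake_html, [1000, 2000, 4000, 8000]):
        print(f"{size:>6} paragraphs: {seconds:.4f}s")
```

If the time roughly doubles whenever the input doubles, the step is linear and the slowdown is probably elsewhere; if it grows much faster than that, the scrubbing step is the likely culprit.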

Change History

comment:1 Changed 8 years ago by hannosch

  • Owner hannosch deleted

I don't have a clue about how to approach this. Someone else needs to take a look.

comment:2 Changed 7 years ago by grahamperrin

  • Cc grahamperrin added

comment:3 Changed 7 years ago by hannosch

  • Owner set to nouri
  • Component changed from Transforms to Archetypes

comment:4 Changed 4 years ago by kleist

  • Status changed from new to closed
  • Version set to 4.1
  • Resolution set to wontfix

Plone 3 no longer supported. Please re-open if still an issue in Plone 4.

comment:5 Changed 2 years ago by micecchi

  • Status changed from closed to reopened
  • Cc micecchi, keul added
  • Component changed from Archetypes to General
  • Version changed from 4.1 to 4.3
  • Milestone changed from 3.3.x to 4.x
  • Resolution wontfix deleted

I have the same problem on a Plone 4.3.1 instance.

A 1 MB *.doc file takes a long time to save.

The same text saved as a PDF is saved in much less time.
