Ticket #14929 (confirmed PLIP)

Opened 20 months ago

Last modified 6 months ago

Use lxml cleaner for safehtml transforms

Reported by: tom_gross Owned by:
Priority: minor Milestone: 5.0
Component: Backend (Python) Version:
Keywords: Cc:

Description (last modified by tom_gross) (diff)

Proposer: Tom Gross
Seconder:

Abstract

Use the HTML cleaner shipped with lxml to provide Plone's safe_html transform, which lives in Products.PortalTransforms.

Motivation

The lxml cleaner class already ships with Plone. Some home-baked code relying on CMFDefault, regular expressions and the obsolete SGMLParser can be removed and replaced with a cleaner and faster implementation of HTML cleaning.

Proposal & Implementation

Replace the SGMLParser / regex solution with the HTML cleaner shipped with lxml, which is a Plone dependency already.
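
As a rough illustration of the idea, the sketch below wraps lxml's Cleaner class in a small helper. The names make_cleaner and safe_html are hypothetical, and the option set is only an illustrative selection, not the configuration used by Plone or by the branch linked under Progress. Note that newer lxml releases ship the cleaner as the separate lxml_html_clean package, so the import tries both locations.

```python
# The cleaner lives in lxml.html.clean in older lxml releases and in the
# separate lxml_html_clean package in newer ones, so try both.
try:
    from lxml_html_clean import Cleaner
except ImportError:
    from lxml.html.clean import Cleaner


def make_cleaner():
    """Build a Cleaner with an illustrative (assumed) set of options."""
    return Cleaner(
        scripts=True,          # drop <script> elements
        javascript=True,       # drop on* handlers and javascript: links
        comments=True,         # drop HTML comments
        style=True,            # drop <style> elements and style attributes
        embedded=True,         # drop <embed>, <object>, <iframe>, ...
        page_structure=True,   # drop <html>, <head>, <title> wrappers
        safe_attrs_only=True,  # keep only attributes on lxml's safe list
    )


def safe_html(html, cleaner=None):
    """Return a cleaned copy of the given (non-empty) HTML string."""
    cleaner = cleaner or make_cleaner()
    return cleaner.clean_html(html)


if __name__ == "__main__":
    dirty = '<p onclick="alert(1)">Hello <script>alert(2)</script><b>world</b></p>'
    print(safe_html(dirty))
```

A real transform would additionally have to map the existing safe_html configuration (valid tags, nasty tags and so on) onto the Cleaner options, and handle empty or badly broken input, which this sketch does not attempt.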

Deliverables

A branch of Products.PortalTransforms with the new implementation of the safe_html transform.

Risks

  • The HTML output might change in subtle ways and break many tests, in Plone or in third-party code, that rely on the current output.
  • Needs a security audit?
  • Might need additional tests with broken HTML

Participants

Tom Gross <tomgross>

Progress

 https://github.com/plone/Products.PortalTransforms/tree/tomgross-nocmfdefault

PortalTransforms tests failing with a simple replacement:  https://gist.github.com/tomgross/d059653bfe3948521d8f

Change History

comment:1 Changed 20 months ago by tom_gross

  • Type changed from Bug to PLIP
  • Version 4.4 deleted
  • Component changed from Unknown to Backend (Python)
  • Description modified (diff)
  • Milestone changed from 4.x to 5.0

comment:2 Changed 20 months ago by tom_gross

 https://pypi.python.org/pypi/htmllaundry/2.0 might also be a good candidate to check. It is based on lxml.html.clean, already provides a cleaning API, and includes z3c.form support.

Last edited 20 months ago by tom_gross (previous) (diff)

comment:3 Changed 20 months ago by cwainwright

  • Status changed from new to confirmed

comment:4 Changed 15 months ago by timo

I added this PLIP to the spreadsheet and the FWT will discuss it during the next FWT meeting in two weeks.

 https://docs.google.com/a/plone.org/spreadsheets/d/15Cut73TS5l_x8djkxNre5k8fd7haGC5OOSGigtL2drQ/edit?authkey=CLH59a8D&hl=en&authkey=CLH59a8D&pli=1#gid=3

There is also a possible GSoC student who is interested in working on that topic.

comment:5 Changed 15 months ago by MatthewWilkes

My security perspective on this is: HELL YES. DO IT.

comment:6 Changed 15 months ago by timo

We created a google docs document for brainstorming and discussing the ideas for GSoC:

 https://docs.google.com/document/d/12M5NUviDY6FwYomUyoZUIc-39DwFEDS1topzbEVEhGc/edit#heading=h.879sq1t2kqp

Once we are done, we can paste the results here.

comment:7 Changed 13 months ago by timo

Hi Tom,

the FWT approved your PLIP. Sorry it took so long. Please go ahead with the implementation. I will champion your PLIP, so I can answer the questions you might have regarding the PLIP process or the implementation.

I created a PLIP Jenkins job for you:

 http://jenkins.plone.org/view/PLIPs/job/plip-14929-lxml-safehtml-transform/

Cheers, Timo

comment:8 Changed 11 months ago by timo

Hi Tom,

I saw some activity on your PortalTransforms branch. Are you working on this? Can you give us a quick heads up?

The transforms GSoC project has been accepted. If there is anything the GSoC student could do to help your efforts, let us know.

Cheers, Timo

comment:9 Changed 6 months ago by timo

Hi Tom,

could you give the FWT a quick heads up about the current status of your PLIP? Do you plan to work on this for Plone 5.1? If so, I would recommend moving the PLIP to github.com/plone/Products.CMFPlone/issues and marking it with the PLIP label.

Cheers, Timo

comment:10 Changed 6 months ago by tom_gross

What is the status of the GSoC project working on this topic?

Most of the work is done in the branch cited in the description of the PLIP already.

comment:11 Changed 6 months ago by darkterror

It's all here:  https://github.com/collective/experimental.safe_html_transform . We have a transform script developed using lxml. We have added a control panel for the add-on, installing the new safe_html transform deregisters the old safe_html from portal_transforms, and we have also created an uninstall profile for the add-on. The TinyMCE integration of the script is almost done. What is left is integrating the control panel with the script so that users can modify the transform to work according to their needs.

Cheers, Prakhar Joshi (_pjoshi)
