Ticket #14929 (confirmed PLIP)
Use lxml cleaner for safehtml transforms
Reported by: | tom_gross | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | 5.0 |
Component: | Backend (Python) | Version: | |
Keywords: | | Cc: | |
Description (last modified by tom_gross)
Proposer: Tom Gross
Seconder:
Abstract
Use the HTML cleaner shipped with lxml to provide Plone's safe_html transform, found in Products.PortalTransforms.
Motivation
The lxml cleaner class already ships with Plone. Some home-baked code relying on CMFDefault, regular expressions and the obsolete SGMLParser can be removed and replaced with a cleaner and faster implementation of HTML cleaning.
Proposal & Implementation
Replace the SGMLParser / regex solution with the HTML cleaner shipped with lxml, which is already a Plone dependency.
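A minimal sketch of what such a replacement might look like, built on `lxml.html.clean.Cleaner`. The tag whitelist and option values are assumptions chosen for illustration, not the configuration Products.PortalTransforms ships, and `safe_html` is a hypothetical plain function rather than the actual transform class. Recent lxml releases moved the cleaner into the separate `lxml_html_clean` package, hence the fallback import.

```python
# Minimal sketch of a safe_html-style filter on top of lxml's Cleaner.
# Assumption: the whitelist and options below are illustrative only.
try:
    from lxml.html.clean import Cleaner
except ImportError:  # recent lxml releases ship the cleaner as lxml_html_clean
    from lxml_html_clean import Cleaner

# Hypothetical tag whitelist; tags outside it are dropped but their text is kept.
ALLOWED_TAGS = {"p", "div", "span", "a", "em", "strong", "ul", "ol", "li", "br"}

_cleaner = Cleaner(
    scripts=True,               # remove <script> elements and their content
    javascript=True,            # remove on* handlers and javascript: links
    comments=True,              # remove HTML comments
    style=True,                 # remove <style> elements and style attributes
    embedded=True,              # strip embedded-content tags such as <object>
    frames=True,                # remove frame-related tags
    forms=True,                 # remove forms and form controls
    allow_tags=ALLOWED_TAGS,
    remove_unknown_tags=False,  # must be False when allow_tags is given
    safe_attrs_only=True,       # keep only attributes on lxml's safe list
)


def safe_html(html: str) -> str:
    """Return a sanitized copy of an HTML fragment (hypothetical helper)."""
    return _cleaner.clean_html(html)
```

For example, `safe_html('<p onclick="evil()">Hi</p>')` should drop the `onclick` handler. Because `allow_tags` unwraps disallowed tags instead of deleting them, a tag outside the whitelist such as `<b>` disappears without losing its text.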
Deliverables
A branch of Products.PortalTransforms with the new implementation of the safe_html transform.
Risks
- The HTML output might change in subtle ways and break many Plone or third-party tests that rely on this output.
- Needs a security audit?
- Might need additional tests with broken HTML
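The broken-HTML risk could be covered by a small regression suite. The sketch below feeds a few malformed fragments through a default-configured lxml `Cleaner` and checks that the visible text survives while script vectors do not. The sample inputs, the forbidden strings and the test class name are hypothetical, not existing PortalTransforms tests.

```python
import unittest

try:
    from lxml.html.clean import Cleaner
except ImportError:  # recent lxml releases ship the cleaner as lxml_html_clean
    from lxml_html_clean import Cleaner

# Hypothetical malformed inputs paired with text that must survive cleaning.
BROKEN_SAMPLES = [
    ("<p>unclosed paragraph", "unclosed paragraph"),
    ("<b><i>mis-nested</b></i>", "mis-nested"),
    ('<a href="javascript:alert(1)">click</a>', "click"),
    ('<img src="x.png" onerror="alert(1)">', "x.png"),
    ("<div>text<script>alert(1)</script>more", "textmore"),
]

# Strings that must never reach the output, whatever the input looked like.
FORBIDDEN = ("<script", "javascript:", "onerror", "alert(")


class BrokenHTMLTests(unittest.TestCase):
    def setUp(self):
        # Cleaner's defaults already strip scripts, javascript and unsafe attributes.
        self.cleaner = Cleaner()

    def test_broken_samples(self):
        for raw, expected_text in BROKEN_SAMPLES:
            with self.subTest(raw=raw):
                out = self.cleaner.clean_html(raw)
                # The parser repairs the markup; the readable text must survive.
                self.assertIn(expected_text, out)
                for needle in FORBIDDEN:
                    self.assertNotIn(needle, out.lower())
```

Such a module can be run with `python -m unittest`; checking for preserved text rather than exact markup keeps the tests from breaking on the subtle output changes mentioned above.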
Participants
Tom Gross <tomgross>
Progress
Branch: https://github.com/plone/Products.PortalTransforms/tree/tomgross-nocmfdefault
PortalTransforms tests failing with a simple replacement: https://gist.github.com/tomgross/d059653bfe3948521d8f
Change History
comment:1 Changed 20 months ago by tom_gross
- Type changed from Bug to PLIP
- Version 4.4 deleted
- Component changed from Unknown to Backend (Python)
- Description modified
- Milestone changed from 4.x to 5.0
comment:2 Changed 20 months ago by tom_gross
https://pypi.python.org/pypi/htmllaundry/2.0 might also be a good candidate to check. It is based on lxml.html.clean, already provides a cleaning API and includes z3c.form support.
comment:4 Changed 15 months ago by timo
I added this PLIP to the spreadsheet and the FWT will discuss it during the next FWT meeting in two weeks.
There is also a possible GSoC student who is interested in working on that topic.
comment:5 Changed 15 months ago by MatthewWilkes
My security perspective on this is: HELL YES. DO IT.
comment:6 Changed 15 months ago by timo
We created a Google Docs document for brainstorming and discussing the GSoC ideas:
Once we are done, we can paste the results here.
comment:7 Changed 13 months ago by timo
Hi Tom,
the FWT approved your PLIP. Sorry it took so long. Please go ahead with the implementation. I will champion your PLIP, so I can answer any questions you might have regarding the PLIP process or the implementation.
I created a PLIP Jenkins job for you:
http://jenkins.plone.org/view/PLIPs/job/plip-14929-lxml-safehtml-transform/
Cheers, Timo
comment:8 Changed 11 months ago by timo
Hi Tom,
I saw some activity on your PortalTransforms branch. Are you working on this? Can you give us a quick heads up?
The transforms GSoC project has been accepted. If there is anything the GSoC student could do to help your efforts, let us know.
Cheers, Timo
comment:9 Changed 6 months ago by timo
Hi Tom,
could you give the FWT a quick heads up about the current status of your PLIP? Do you plan to work on this for Plone 5.1? If so, I would recommend moving the PLIP to github.com/plone/Products.CMFPlone/issues and marking it with the PLIP label.
Cheers, Timo
comment:10 Changed 6 months ago by tom_gross
What is the status of the GSoC project working on this topic?
Most of the work is already done in the branch cited in the description of the PLIP.
comment:11 Changed 6 months ago by darkterror
It's all here: https://github.com/collective/experimental.safe_html_transform
We have a transform script developed using lxml, and we have added a control panel for the add-on. Installing the add-on deregisters safe_html from portal_transforms in favour of the new safe_html transform, and we have also created an uninstall profile for the add-on. The TinyMCE integration of the script is almost done. What is left is integrating the control panel with the script so that users can adapt the transform to their needs.
Cheers, Prakhar Joshi (_pjoshi)
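The step left open in comment:11, wiring the control panel to the transform, could amount to translating stored settings into `Cleaner` keyword arguments. A minimal sketch, assuming the settings arrive as a plain mapping; `build_cleaner` and the keys `nasty_tags`, `stripped_tags` and `stripped_attributes` are hypothetical stand-ins for whatever the add-on's control panel actually stores.

```python
from typing import Iterable, Mapping

from lxml.html import defs

try:
    from lxml.html.clean import Cleaner
except ImportError:  # recent lxml releases ship the cleaner as lxml_html_clean
    from lxml_html_clean import Cleaner


def build_cleaner(settings: Mapping[str, Iterable[str]]) -> Cleaner:
    """Build a Cleaner from user-editable settings (hypothetical keys)."""
    # Tags removed together with their content.
    kill_tags = set(settings.get("nasty_tags", ()))
    # Tags unwrapped: the element goes, its text and children stay.
    remove_tags = set(settings.get("stripped_tags", ()))
    # Start from lxml's safe attribute list and subtract the user's choices.
    safe_attrs = set(defs.safe_attrs) - set(settings.get("stripped_attributes", ()))
    return Cleaner(
        kill_tags=kill_tags,
        remove_tags=remove_tags,
        safe_attrs_only=True,
        safe_attrs=frozenset(safe_attrs),
    )


# Example settings as a control panel might store them (hypothetical values).
SETTINGS = {
    "nasty_tags": ["video"],
    "stripped_tags": ["font"],
    "stripped_attributes": ["class"],
}

cleaner = build_cleaner(SETTINGS)
```

Rebuilding the `Cleaner` whenever the settings change keeps the transform itself stateless; whether the add-on caches the instance is a separate design decision.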