@@ -36,22 +36,31 @@ method appropriately ([see below](#convert)).
3636 sanitization in popular software] for notes on best practices to ensure
3737 HTML is properly sanitized.
3838
39- The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1]
40- as a sanitizer on the output of `markdown.markdown`. However, be
41- aware that those libraries may not be sufficient in themselves and will
42- likely require customization. Some useful lists of allowed tags and
43- attributes can be found in the [`bleach-allowlist`][bleach-allowlist] library, which should
39+ The developers of Python-Markdown recommend using [JustHTML] as a
40+ sanitizer on the output of `markdown.markdown`. JustHTML includes a
41+ built-in HTML sanitizer. When you pass the HTML output through JustHTML
42+ (`JustHTML(markdown.markdown(text), fragment=True).to_html()`), it
43+ is sanitized by default according to a strict [allow list policy]. The
44+ policy can be [customized] if necessary.
45+
46+ If you cannot use JustHTML, alternatives include
47+ [`nh3`][nh3] and [`bleach`][bleach][^1]. However, be aware that those
48+ libraries will not be sufficient in themselves and will require
49+ customization. Some useful lists of allowed tags and attributes can be
50+ found in the [`bleach-allowlist`][bleach-allowlist] library, which should
4451 work with either sanitizer.
4552
4653
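Whichever sanitizer is chosen, it can help to check its output in a test suite. The following is a minimal sketch of such a check using only the standard library. It is not a sanitizer and is no substitute for one; the names `ALLOWED_TAGS`, `ALLOWED_ATTRS`, `SAFE_URL_PREFIXES`, `AllowListAuditor`, and `audit_html` are hypothetical, and the allow list shown is illustrative rather than a recommended policy.

```python
from html.parser import HTMLParser

# Hypothetical allow list for illustration only; a real policy needs review.
ALLOWED_TAGS = {
    "p", "em", "strong", "a", "ul", "ol", "li",
    "code", "pre", "blockquote", "h1", "h2", "h3",
}
ALLOWED_ATTRS = {"a": {"href", "title"}}
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:", "#", "/")


class AllowListAuditor(HTMLParser):
    """Collect tags and attributes that fall outside the allow list."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag not in ALLOWED_TAGS:
            self.problems.append(f"tag <{tag}>")
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()):
                self.problems.append(f"attribute {name!r} on <{tag}>")
            elif name == "href":
                url = (value or "").strip().lower()
                if not url.startswith(SAFE_URL_PREFIXES):
                    self.problems.append(f"href {value!r} on <{tag}>")


def audit_html(html):
    """Return a list of findings; an empty list means nothing was flagged."""
    auditor = AllowListAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.problems
```

In a test, `audit_html` would be called on the sanitizer's output for a set of hostile inputs, with the assertion that the returned list is empty.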
4754[Markdown and XSS]: https://michelf.ca/blog/2010/markdown-and-xss/
4855[Improper markup sanitization in popular software]: https://github.com/ChALkeR/notes/blob/master/Improper-markup-sanitization.md
56+ [JustHTML]: https://emilstenstrom.github.io/justhtml/
57+ [allow list policy]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#default-sanitization-policy
58+ [customized]: https://emilstenstrom.github.io/justhtml/html-cleaning.html#use-a-custom-sanitization-policy
4959[nh3]: https://nh3.readthedocs.io/en/latest/
5060[bleach]: http://bleach.readthedocs.org/en/latest/
5161[bleach-allowlist]: https://github.com/yourcelf/bleach-allowlist
52- [^1]: We are aware that the [bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698).
53- However, it is the only pure-Python HTML sanitation library we are aware of and may be the only option for
54- those who cannot use [`nh3`][nh3] (Python bindings to a Rust library).
62+ [^1]: Note that the [bleach] project has been [deprecated](https://github.com/mozilla/bleach/issues/698).
63+ However, it may be the only option for some users.
5564
5665The following options are available on the `markdown.markdown` function:
5766
@@ -205,6 +214,20 @@ __tab_length__{: #tab_length }:
205214
206215### `markdown.markdownFromFile(**kwargs)` {: #markdownFromFile data-toc-label='markdown.markdownFromFile' }
207216
217+ !!! warning
218+
219+ The Python-Markdown library does ***not*** sanitize its HTML output. If
220+ you are processing Markdown input from an untrusted source, it is your
221+ responsibility to ensure that it is properly sanitized. See [Markdown and
222+ XSS] for an overview of some of the dangers and [Improper markup
223+ sanitization in popular software] for notes on best practices to ensure
224+ HTML is properly sanitized.
225+
226+ As `markdown.markdownFromFile` writes directly to the file system, there
227+ is no easy way to sanitize the output from Python code. Therefore, it is
228+ recommended that the `markdown.markdownFromFile` function not be used on
229+ input from an untrusted source.
230+
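If file-based conversion of untrusted input is unavoidable, one option is to do the file handling yourself so that a sanitizer can run between conversion and writing. The following is a minimal sketch under that assumption; `convert_file_sanitized` and its parameters are hypothetical names, and both the converter and the sanitizer are passed in as callables so that no particular sanitizer API is assumed.

```python
from pathlib import Path
from typing import Callable


def convert_file_sanitized(
    input_path: str,
    output_path: str,
    convert: Callable[[str], str],
    sanitize: Callable[[str], str],
    encoding: str = "utf-8",
) -> str:
    """Read Markdown from a file, convert it, sanitize it, then write it out."""
    # Read the source text directly rather than letting the converter
    # read and write files on its own.
    text = Path(input_path).read_text(encoding=encoding)
    # `convert` is expected to return an HTML string.
    html = convert(text)
    # `sanitize` is any callable returning cleaned HTML; only its result
    # ever reaches the output file.
    clean = sanitize(html)
    Path(output_path).write_text(clean, encoding=encoding)
    return clean
```

Here `convert` could be `markdown.markdown`, and `sanitize` could wrap whichever sanitizer you have chosen and configured.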
208231With a few exceptions, `markdown.markdownFromFile` accepts the same options as
209232`markdown.markdown`. It does **not** accept a `text` (or Unicode) string.
210233Instead, it accepts the following required options:
@@ -242,22 +265,6 @@ __encoding__{: #encoding }
242265 meet your specific needs, it is suggested that you write your own code
243266 to handle your encoding/decoding needs.
244267
245- !!! warning
246-
247- The Python-Markdown library does ***not*** sanitize its HTML output. If
248- you are processing Markdown input from an untrusted source, it is your
249- responsibility to ensure that it is properly sanitized. See [Markdown and
250- XSS] for an overview of some of the dangers and [Improper markup
251- sanitization in popular software] for notes on best practices to ensure
252- HTML is properly sanitized.
253-
254- The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1]
255- as a sanitizer on the output of `markdown.markdownFromFile`.
256- However, be aware that those libraries may not be sufficient in
257- themselves and will likely require customization. Some useful lists of
258- allowed tags and attributes can be found in the
259- [`bleach-allowlist`][bleach-allowlist] library, which should work with either sanitizer.
260-
261268### `markdown.Markdown([**kwargs])` {: #Markdown data-toc-label='markdown.Markdown' }
262269
263270The same options are available when initializing the `markdown.Markdown` class
@@ -273,6 +280,29 @@ string must be passed to one of two instance methods.
273280
274281#### `Markdown.convert(source)` {: #convert data-toc-label='Markdown.convert' }
275282
283+ !!! warning
284+
285+ The Python-Markdown library does ***not*** sanitize its HTML output. If
286+ you are processing Markdown input from an untrusted source, it is your
287+ responsibility to ensure that it is properly sanitized. See [Markdown and
288+ XSS] for an overview of some of the dangers and [Improper markup
289+ sanitization in popular software] for notes on best practices to ensure
290+ HTML is properly sanitized.
291+
292+ The developers of Python-Markdown recommend using [JustHTML] as a
293+ sanitizer on the output of `Markdown.convert`. JustHTML includes a
294+ built-in HTML sanitizer. When you pass the HTML output through JustHTML
295+ (`JustHTML(md.convert(text), fragment=True).to_html()`), it
296+ is sanitized by default according to a strict [allow list policy]. The
297+ policy can be [customized] if necessary.
298+
299+ If you cannot use JustHTML, alternatives include
300+ [`nh3`][nh3] and [`bleach`][bleach][^1]. However, be aware that those
301+ libraries will not be sufficient in themselves and will require
302+ customization. Some useful lists of allowed tags and attributes can be
303+ found in the [`bleach-allowlist`][bleach-allowlist] library, which should
304+ work with either sanitizer.
305+
276306The `source` text must meet the same requirements as the [`text`](#text)
277307argument of the [`markdown.markdown`](#markdown) function.
278308
@@ -300,6 +330,8 @@ To make this easier, you can also chain calls to `reset` together:
300330html3 = md.reset().convert(text3)
301331```
302332
333+ #### `Markdown.convertFile(**kwargs)` {: #convertFile data-toc-label='Markdown.convertFile' }
334+
303335!!! warning
304336
305337 The Python-Markdown library does ***not*** sanitize its HTML output. If
@@ -309,14 +341,10 @@ html3 = md.reset().convert(text3)
309341 sanitization in popular software] for notes on best practices to ensure
310342 HTML is properly sanitized.
311343
312- The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1]
313- as a sanitizer on the output of `Markdown.convert`. However, be
314- aware that those libraries may not be sufficient in themselves and will
315- likely require customization. Some useful lists of allowed tags and
316- attributes can be found in the [`bleach-allowlist`][bleach-allowlist] library, which should
317- work with either sanitizer.
318-
319- #### `Markdown.convertFile(**kwargs)` {: #convertFile data-toc-label='Markdown.convertFile' }
344+ As `Markdown.convertFile` writes directly to the file system, there
345+ is no easy way to sanitize the output from Python code. Therefore, it is
346+ recommended that the `Markdown.convertFile` method not be used on
347+ input from an untrusted source.
320348
321349The arguments of this method are identical to the arguments of the same
322350name on the `markdown.markdownFromFile` function ([`input`](#input),
@@ -325,19 +353,3 @@ name on the `markdown.markdownFromFile` function ([`input`](#input),
325353process multiple files without creating a new instance of the class for
326354each document. State may need to be `reset` between each call to
327355`convertFile` as is the case with `convert`.
328-
329- !!! warning
330-
331- The Python-Markdown library does ***not*** sanitize its HTML output. If
332- you are processing Markdown input from an untrusted source, it is your
333- responsibility to ensure that it is properly sanitized. See [Markdown and
334- XSS] for an overview of some of the dangers and [Improper markup
335- sanitization in popular software] for notes on best practices to ensure
336- HTML is properly sanitized.
337-
338- The developers of Python-Markdown recommend using [`nh3`][nh3] or [`bleach`][bleach][^1]
339- as a sanitizer on the output of `Markdown.convertFile`. However, be
340- aware that those libraries may not be sufficient in themselves and will
341- likely require customization. Some useful lists of allowed tags and
342- attributes can be found in the [`bleach-allowlist`][bleach-allowlist] library, which should
343- work with either sanitizer.