Apr 15, 2010

python writexml fails with xml.dom.minidom, when using ISO-8859-1 encoding and e.g. character ö

I'm processing XML Files with python ( 2.4.2 and 2.6.2  makes no difference ).

The XML file stored in encoding ISO-8859-1 and contains a ö in the value
for an attribute:


content= "Löschungsgrund" />

I just attempted to load the XML file and write it to a different file:

>>> d=xml.dom.minidom.parse( "AUSW_KUEND.xml" )
>>> f=open("/tmp/a", "w" )
>>> d.writexml( f )
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 1749, in writexml
    node.writexml(writer, indent, addindent, newl)
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 817, in writexml
    node.writexml(writer,indent+addindent,addindent,newl)
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 817, in writexml
    node.writexml(writer,indent+addindent,addindent,newl)
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 817, in writexml
    node.writexml(writer,indent+addindent,addindent,newl)
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 812, in writexml
    _write_data(writer, attrs[a_name].value)
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 303, in _write_data
    writer.write(data)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 1: ordinal not in range(128)
Using the encoding of writexml():

>>> d.writexml(f, '', '', '', "ISO-8859-1" )

also doen't work, because the code just uses this encoding to write
the XML-Header in the new document.

The hint so successfully write the document came from here.

>>> import codecs
>>> f=codecs.open( "/tmp/aaa.xml", "w", "ISO-8859-1" )
>>> d.writexml( f, '', '','', "ISO-8859-1" )

with the same DOM-Object d as before, the new document is written without problems and the correct encoding.