DataWeave: Convert XML to JSON preserving namespaces

Every MuleSoft developer knows that converting XML to JSON with DataWeave is super easy—you just set the output to application/json, and you're done.

More experienced developers also know there are options to preserve element attributes and handle repeated elements as arrays:

%dw 2.0
output application/json writeAttributes=true, duplicateKeyAsArray=true
---
payload        

But what about namespaces? Is it possible to preserve namespaces on elements and attributes during JSON conversion?

At first glance, it seems there’s no built-in support in DataWeave to retain XML namespaces when converting to JSON—at least not yet.

So, what can we do if preserving namespaces is a requirement?

If you’ve ever worked with XSLT or XPath, you probably agree that parsing XML with namespaced elements and attributes has always been a bit trickier. Is it any easier with DataWeave?

I tried several times in the past—unsuccessfully—until I finally set aside some time to crack the nut. In this article, I’ll show how you can convert an XML document to a JSON object while preserving the namespaces of both elements and attributes.

How to read namespaces and attributes

First of all, the official documentation is not entirely accurate about XML selectors: it says that they return the string value of a namespace or an attribute, respectively. In practice:

  • The keyName.# selector returns a value of type Namespace, which is rendered as the namespace URI string by default.
  • The keyName.@ selector returns an Object containing all XML attributes as key-value pairs, and each attribute carries its own namespace as well.

Here is an example.

Input payload:

<?xml version='1.0' encoding='UTF-8'?>
<root:root xmlns:root="http://namespace.root" xmlns:ns1="http://namespace.ns1" ns1:attr="ns1 attribute value" xmlns:ns2="http://namespace.ns2" ns2:attr="ns2 attribute value"/>        

DataWeave transformation:

%dw 2.0
output application/json

var dash = payload.root.#
var at = payload.root.@

---
{
    "dash": {
        "selector": dash,
        "typeOf": typeOf(dash),
        "prefix": dash.prefix,
        "uri": dash.uri
    },
    "at": {
        "selector": at,
        "typeOf": typeOf(at),
        "withNS": at mapObject {
            ($$): {
                value: $,
                "prefix": $.#.prefix,
                "uri": $.#.uri
            }
        }
    }
}

Output:

{
  "dash": {
    "selector": "http://namespace.root",
    "typeOf": "Namespace",
    "prefix": "root",
    "uri": "http://namespace.root"
  },
  "at": {
    "selector": {
      "attr": "ns1 attribute value",
      "attr": "ns2 attribute value"
    },
    "typeOf": "Object",
    "withNS": {
      "attr": {
        "value": "ns1 attribute value",
        "prefix": "ns1",
        "uri": "http://namespace.ns1"
      },
      "attr": {
        "value": "ns2 attribute value",
        "prefix": "ns2",
        "uri": "http://namespace.ns2"
      }
    }
  }
}        

This knowledge is really helpful, because now we can create our own DataWeave XML-to-JSON conversion. Let's do that.

Helper functions

To build our final solution, we’ll need a few helper functions along the way.

parseNS

This function returns an object containing the prefix and URI of an XML element's or attribute’s namespace. If the namespace is not available, a default namespace is returned as a fallback in the same object format.

fun parseNS (value, xmlns) = 
{ 
  prefix: (value.# default xmlns).prefix,
  uri: (value.# default xmlns).uri
}        

parseXML

We'll create two versions of the parseXML function:

  • The first version handles the root element. It extracts the root’s namespace to use as the default throughout the structure.
  • The second version is a recursive function that parses child elements. It builds an array of elements at the same level, where each XML element is represented as an object with the following keys: namespace, element, attributes, and value.

fun parseXML(xml) = do {
    // Namespace of the root element, used as the fallback for all descendants
    var xmlns = xml[0].#
    ---
    xml pluck ((value, key) -> ({
        namespace: parseNS(value, xmlns),
        element: key,
        attributes: parseXML(value.@, xmlns),
        value: if (value is Object) parseXML(value, xmlns) else value
    }) filterObject $ != null)
}

The second version, shown next, is called by the root parser and takes the default namespace (xmlns) as its second parameter. Typically, this namespace is defined by the xmlns attribute of the root element.

fun parseXML(xml,xmlns) = xml pluck ((value, key) -> ({ 
    namespace: parseNS(value, xmlns), 
    element: key,
    attributes: parseXML(value.@,xmlns), 
    value: if (value is Object) parseXML(value,xmlns) else value
}) filterObject $ !=  null )        

listNS

The next helper function collects all unique namespaces from the parsed structure. These namespaces will later be added to the root element of the final result as attributes—just like in the original XML.

The function also detects the default namespace declared with the xmlns attribute, and prepends "xmlns:" to all other namespace definitions.

fun listNS(parsedXML) =
    parsedXML..*namespace
        distinctBy $
        map { (if (isBlank($.prefix)) "xmlns" else "xmlns:" ++ $.prefix): $.uri }
        reduce ((item, acc) -> acc ++ item)
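
To see the key-building rule in isolation, here is a small sketch that applies the same distinctBy, map and reduce steps to a hand-written list of namespaces. The namespaces variable and the http://namespace.default URI are made up for illustration; in the real function the list comes from parsedXML..*namespace.

```dataweave
%dw 2.0
output application/json

// Hypothetical sample data standing in for parsedXML..*namespace
var namespaces = [
    { prefix: "root", uri: "http://namespace.root" },
    { prefix: "ns1",  uri: "http://namespace.ns1" },
    { prefix: "ns1",  uri: "http://namespace.ns1" },    // duplicate, dropped by distinctBy
    { prefix: "",     uri: "http://namespace.default" } // blank prefix: default namespace
]
---
namespaces
    distinctBy $
    // Blank prefix becomes "xmlns", any other prefix becomes "xmlns:<prefix>"
    map { (if (isBlank($.prefix)) "xmlns" else "xmlns:" ++ $.prefix): $.uri }
    // Merge the single-key objects into one object
    reduce ((item, acc) -> acc ++ item)
```

With this input, the expression should yield a single object with the keys xmlns:root, xmlns:ns1 and xmlns, one per distinct namespace.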

reJson

Now, let’s restructure and simplify the parsed content so that the namespace prefix is added to each element or attribute name using a colon (:)—just like in XML.

This step prepares the data to be represented in JSON format while retaining the XML namespace context in a familiar, readable way.

fun reJson(json) = json map {
    ( if (!isBlank($.namespace.prefix)) $.namespace.prefix ++ ":" ++ $.element else $.element): ({
        attributes: ($.attributes map ((a) -> { ( if (!isBlank(a.namespace.prefix)) a.namespace.prefix ++ ":" ++ a.element else a.element): a.value }) reduce ((item,acc={}) -> acc ++ item)),
        value: if ($.value is Array) reJson($.value) reduce ((item,acc={}) -> acc ++ item) else $.value
    } filterObject $ != null )
}        
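
The prefix:name rule appears twice in reJson, once for elements and once for attributes. As a sketch of that rule on its own, the hypothetical helper qualifiedName below (not part of the original code) builds the name from hand-written items shaped like the parser's output:

```dataweave
%dw 2.0
output application/json

// Hypothetical helper: prepend "prefix:" only when the namespace prefix is not blank
fun qualifiedName(item) =
    if (!isBlank(item.namespace.prefix)) item.namespace.prefix ++ ":" ++ item.element
    else item.element
---
[
    // Prefixed element: expected to become "ns1:levelOne"
    qualifiedName({ namespace: { prefix: "ns1", uri: "http://namespace.ns1" }, element: "levelOne" }),
    // Blank prefix (made-up default namespace): the name is left unchanged
    qualifiedName({ namespace: { prefix: "", uri: "http://namespace.default" }, element: "plain" })
]
```

If you prefer, reJson could call such a helper in both places instead of repeating the expression; that is a design choice, not something the conversion requires.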

Putting It All Together: xml2json

Now we have everything we need to build our final solution—a custom XML-to-JSON conversion function:

fun xml2json(xml) = do {
  var parsedXML = parseXML(xml)
  var json = reJson(parsedXML)[0]
  var rootJson = json[0]
  var updatedRootJson = rootJson update {
    case .attributes -> listNS(parsedXML) ++ rootJson.attributes
  }
  ---
  json mapObject { ($$): updatedRootJson } 
}        

This function converts any XML input into a structured JSON object where:

  • Each element is represented as an object with attributes and value keys.
  • Namespace prefixes are preserved in both element and attribute names, using the familiar prefix:name format.
  • All namespace declarations are included as attributes in the root element, just like in the original XML.

For example, consider the following input XML payload:

<?xml version="1.0" encoding="UTF-8"?>
<root:root xmlns:root="http://namespace.root" xmlns:ns1="http://namespace.ns1" xmlns:ns2="http://namespace.ns2" ns1:attr="ns1 attribute value" ns2:attr="ns2 attribute value">
  <ns1:levelOne ns1:attr="a1" ns2:attr="a2">
    <ns1:levelTwo ns1:attr="a1" ns2:attr="a2">value1</ns1:levelTwo>
    <ns2:levelTwo ns1:attr="a1" ns2:attr="a2">value2</ns2:levelTwo>
  </ns1:levelOne>
  <ns2:levelOneElement ns1:attr="a1" ns2:attr="a2">
    <ns1:levelTwo ns1:attr="a1" ns2:attr="a2">value1</ns1:levelTwo>
    <ns2:levelTwo ns1:attr="a1" ns2:attr="a2">value2</ns2:levelTwo>
  </ns2:levelOneElement>
</root:root>        

Our function produces the following JSON output:

{
  "root:root": {
    "attributes": {
      "xmlns:root": "http://namespace.root",
      "xmlns:ns1": "http://namespace.ns1",
      "xmlns:ns2": "http://namespace.ns2",
      "ns1:attr": "ns1 attribute value",
      "ns2:attr": "ns2 attribute value"
    },
    "value": {
      "ns1:levelOne": {
        "attributes": {
          "ns1:attr": "a1",
          "ns2:attr": "a2"
        },
        "value": {
          "ns1:levelTwo": {
            "attributes": {
              "ns1:attr": "a1",
              "ns2:attr": "a2"
            },
            "value": "value1"
          },
          "ns2:levelTwo": {
            "attributes": {
              "ns1:attr": "a1",
              "ns2:attr": "a2"
            },
            "value": "value2"
          }
        }
      },
      "ns2:levelOneElement": {
        "attributes": {
          "ns1:attr": "a1",
          "ns2:attr": "a2"
        },
        "value": {
          "ns1:levelTwo": {
            "attributes": {
              "ns1:attr": "a1",
              "ns2:attr": "a2"
            },
            "value": "value1"
          },
          "ns2:levelTwo": {
            "attributes": {
              "ns1:attr": "a1",
              "ns2:attr": "a2"
            },
            "value": "value2"
          }
        }
      }
    }
  }
}        

But why?

Pretty cool, right? But you might be wondering—why on Earth would we need to preserve XML namespaces during data transformations? After all, it’s hard to imagine a namespace itself carrying business-critical data.

In my case, the need arose from static code analysis of Mule applications—specifically, automating the generation of C4 Model component diagrams directly from Mule XML. Since each MuleSoft connector declares its own XML namespace, preserving those namespaces was essential to parsing the code accurately and identifying components.

That, by the way, is a whole other story I’d love to share another time.

I'm sure that with a bit of reflection, you'll think of other scenarios where keeping XML namespaces intact during transformations could be just as valuable. If so, I’d love to hear your thoughts—feel free to share your use cases in the comments.

Thanks for reading!

More articles by Aleksandr Balabchenkov