DataWeave: Convert XML to JSON preserving namespaces
Every MuleSoft developer knows that converting XML to JSON with DataWeave is super easy—you just set the output to application/json, and you're done.
More experienced developers also know there are options to preserve element attributes and handle repeated elements as arrays:
%dw 2.0
output application/json writeAttributes=true, duplicateKeyAsArray=true
---
payload
But what about namespaces? Is it possible to preserve namespaces on elements and attributes during JSON conversion?
At first glance, it seems there’s no built-in support in DataWeave to retain XML namespaces when converting to JSON—at least not yet.
So, what can we do if preserving namespaces is a requirement?
If you’ve ever worked with XSLT or XPath, you probably agree that parsing XML with namespaced elements and attributes has always been a bit trickier. Is it any easier with DataWeave?
I tried several times in the past—unsuccessfully—until I finally set aside some time to crack the nut. In this article, I’ll show how you can convert an XML document to a JSON object while preserving the namespaces of both elements and attributes.
How to read namespaces and attributes
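First of all, the official documentation is not entirely accurate about the XML selectors: it says that they return the string value of an attribute or namespace, respectively. In practice, the namespace selector (#) returns a Namespace value that exposes both prefix and uri, and the attribute selector (@) returns an Object whose values still carry their own namespaces.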
Here is an example.
Input payload:
<?xml version='1.0' encoding='UTF-8'?>
<root:root xmlns:root="http://namespace.root" xmlns:ns1="http://namespace.ns1" ns1:attr="ns1 attribute value" xmlns:ns2="http://namespace.ns2" ns2:attr="ns2 attribute value"/>
DataWeave transformation:
%dw 2.0
output application/json
// namespace selector applied to the root element
var dash = payload.root.#
// attribute selector applied to the root element
var at = payload.root.@
---
{
  "dash": {
    "selector": dash,
    "typeOf": typeOf(dash),
    "prefix": dash.prefix,
    "uri": dash.uri
  },
  "at": {
    "selector": at,
    "typeOf": typeOf(at),
    "withNS": at mapObject {
      ($$): {
        value: $,
        "prefix": $.#.prefix,
        "uri": $.#.uri
      }
    }
  }
}
Output:
{
  "dash": {
    "selector": "http://namespace.root",
    "typeOf": "Namespace",
    "prefix": "root",
    "uri": "http://namespace.root"
  },
  "at": {
    "selector": {
      "attr": "ns1 attribute value",
      "attr": "ns2 attribute value"
    },
    "typeOf": "Object",
    "withNS": {
      "attr": {
        "value": "ns1 attribute value",
        "prefix": "ns1",
        "uri": "http://namespace.ns1"
      },
      "attr": {
        "value": "ns2 attribute value",
        "prefix": "ns2",
        "uri": "http://namespace.ns2"
      }
    }
  }
}
This knowledge is really helpful, because now we can create our own DataWeave XML-to-JSON conversion. Let's do that.
Helper functions
To build our final solution, we’ll need a few helper functions along the way.
parseNS
This function returns an object containing the prefix and URI of an XML element's or attribute’s namespace. If the namespace is not available, a default namespace is returned as a fallback in the same object format.
fun parseNS (value, xmlns) =
{
prefix: (value.# default xmlns).prefix,
uri: (value.# default xmlns).uri
}
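To make the fallback concrete, here is a minimal sketch that calls parseNS on a prefixed root element and on an unprefixed child. The inline document and the names doc and rootNS are assumptions made for illustration; read is the standard DataWeave function for parsing a string.

```dataweave
%dw 2.0
output application/json

// Hypothetical inline document: <plain> carries no prefix of its own
var doc = read('<r:root xmlns:r="http://namespace.root"><plain>text</plain></r:root>', "application/xml")

fun parseNS(value, xmlns) =
  {
    prefix: (value.# default xmlns).prefix,
    uri: (value.# default xmlns).uri
  }

// Namespace of the root element, passed in as the fallback
var rootNS = doc.root.#
---
{
  // root has its own namespace, so the fallback is not needed
  root: parseNS(doc.root, rootNS),
  // plain has no prefix; if value.# yields null here, the fallback is returned
  plain: parseNS(doc.root.plain, rootNS)
}
```

Keep in mind that the fallback is simply whatever the caller passes in, so an element without a namespace of its own is reported with that namespace even if the XML does not declare one for it.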
parseXML
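We'll create two versions of the parseXML function. The first one takes only the XML and serves as the entry point: it reads the namespace of the root element and uses it as the fallback for everything beneath it.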
fun parseXML(xml) = do {
    // Namespace of the root element, used as the fallback for its descendants
    var xmlns = xml[0].#
    ---
    xml pluck ((value, key) -> ({
        namespace: parseNS(value, xmlns),
        element: key,
        attributes: parseXML(value.@, xmlns),
        value: if (value is Object) parseXML(value, xmlns) else value
    }) filterObject $ != null)
}
The second version is called by the root parser and takes the default namespace (xmlns) as a parameter. Typically, this namespace is defined by the xmlns attribute of the root element.
fun parseXML(xml,xmlns) = xml pluck ((value, key) -> ({
namespace: parseNS(value, xmlns),
element: key,
attributes: parseXML(value.@,xmlns),
value: if (value is Object) parseXML(value,xmlns) else value
}) filterObject $ != null )
listNS
The next helper function collects all unique namespaces from the parsed structure. These namespaces will later be added to the root element of the final result as attributes—just like in the original XML.
The function also detects the default namespace declared with the xmlns attribute, and prepends "xmlns:" to all other namespace definitions.
fun listNS(parsedXML) =
    parsedXML..*namespace
        distinctBy $
        map { (if (isBlank($.prefix)) "xmlns" else "xmlns:" ++ $.prefix): $.uri }
        reduce ((item, acc) -> acc ++ item)
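As a quick illustration of both branches, the sketch below runs listNS on a small hand-written structure. The parsed variable is a hypothetical stand-in for what parseXML produces, with one blank prefix and one repeated ns1 namespace.

```dataweave
%dw 2.0
output application/json

fun listNS(parsedXML) =
    parsedXML..*namespace
        distinctBy $
        map { (if (isBlank($.prefix)) "xmlns" else "xmlns:" ++ $.prefix): $.uri }
        reduce ((item, acc) -> acc ++ item)

// Hand-written stand-in for the parser output (illustrative only)
var parsed = [
  {
    namespace: { prefix: "", uri: "http://namespace.default" },
    element: "root",
    value: [
      { namespace: { prefix: "ns1", uri: "http://namespace.ns1" }, element: "a", value: "1" },
      { namespace: { prefix: "ns1", uri: "http://namespace.ns1" }, element: "b", value: "2" }
    ]
  }
]
---
listNS(parsed)
```

The result should be a single object with one xmlns entry for the blank prefix and one xmlns:ns1 entry, since distinctBy collapses the repeated ns1 namespace.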
reJson
Now, let’s restructure and simplify the parsed content so that the namespace prefix is added to each element or attribute name using a colon (:)—just like in XML.
This step prepares the data to be represented in JSON format while retaining the XML namespace context in a familiar, readable way.
fun reJson(json) = json map {
( if (!isBlank($.namespace.prefix)) $.namespace.prefix ++ ":" ++ $.element else $.element): ({
attributes: ($.attributes map ((a) -> { ( if (!isBlank(a.namespace.prefix)) a.namespace.prefix ++ ":" ++ a.element else a.element): a.value }) reduce ((item,acc={}) -> acc ++ item)),
value: if ($.value is Array) reJson($.value) reduce ((item,acc={}) -> acc ++ item) else $.value
} filterObject $ != null )
}
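To see what this restructuring does in isolation, here is a small sketch that feeds reJson a single hand-written entry. The parsed variable is an assumption that mimics the parser output for one element with one attribute.

```dataweave
%dw 2.0
output application/json

fun reJson(json) = json map {
    (if (!isBlank($.namespace.prefix)) $.namespace.prefix ++ ":" ++ $.element else $.element): ({
        attributes: ($.attributes map ((a) -> { (if (!isBlank(a.namespace.prefix)) a.namespace.prefix ++ ":" ++ a.element else a.element): a.value }) reduce ((item, acc = {}) -> acc ++ item)),
        value: if ($.value is Array) reJson($.value) reduce ((item, acc = {}) -> acc ++ item) else $.value
    } filterObject $ != null)
}

// Hand-written stand-in for one parsed element (illustrative only)
var parsed = [
  {
    namespace: { prefix: "ns1", uri: "http://namespace.ns1" },
    element: "item",
    attributes: [
      { namespace: { prefix: "ns2", uri: "http://namespace.ns2" }, element: "id", value: "42" }
    ],
    value: "hello"
  }
]
---
reJson(parsed)
```

Given the logic above, the single entry should come out keyed as ns1:item, with ns2:id under attributes and the text under value. Note that reJson returns an array, which is why the final function picks its first item.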
Putting It All Together: xml2json
Now we have everything we need to build our final solution—a custom XML-to-JSON conversion function:
fun xml2json(xml) = do {
var parsedXML = parseXML(xml)
var json = reJson(parsedXML)[0]
var rootJson = json[0]
var updatedRootJson = rootJson update {
case .attributes -> listNS(parsedXML) ++ rootJson.attributes
}
---
json mapObject { ($$): updatedRootJson }
}
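For convenience, here is a single-script sketch that assembles the helpers so they can be tried in one place, for example in the DataWeave Playground. Two things are assumptions of this sketch rather than part of the functions above: the inline sample document parsed with read, and the one-argument parseXML delegating to the two-argument version instead of repeating its body.

```dataweave
%dw 2.0
output application/json

// Illustrative inline input; in a Mule flow this would normally be payload
var sample = read('<root:root xmlns:root="http://namespace.root" xmlns:ns1="http://namespace.ns1" ns1:attr="a1"><ns1:child ns1:attr="a2">value1</ns1:child></root:root>', "application/xml")

// Prefix and URI of a value's namespace, or of the fallback xmlns
fun parseNS(value, xmlns) = {
    prefix: (value.# default xmlns).prefix,
    uri: (value.# default xmlns).uri
}

// Recursive step: xmlns is the fallback namespace handed down from the root
fun parseXML(xml, xmlns) = xml pluck ((value, key) -> ({
    namespace: parseNS(value, xmlns),
    element: key,
    attributes: parseXML(value.@, xmlns),
    value: if (value is Object) parseXML(value, xmlns) else value
}) filterObject $ != null)

// Entry point: reads the fallback namespace from the root element
fun parseXML(xml) = do {
    var xmlns = xml[0].#
    ---
    parseXML(xml, xmlns)
}

// Unique namespaces, rendered as xmlns / xmlns:prefix declarations
fun listNS(parsedXML) =
    parsedXML..*namespace
        distinctBy $
        map { (if (isBlank($.prefix)) "xmlns" else "xmlns:" ++ $.prefix): $.uri }
        reduce ((item, acc) -> acc ++ item)

// Collapses the parsed entries into objects keyed by prefixed names
fun reJson(json) = json map {
    (if (!isBlank($.namespace.prefix)) $.namespace.prefix ++ ":" ++ $.element else $.element): ({
        attributes: ($.attributes map ((a) -> { (if (!isBlank(a.namespace.prefix)) a.namespace.prefix ++ ":" ++ a.element else a.element): a.value }) reduce ((item, acc = {}) -> acc ++ item)),
        value: if ($.value is Array) reJson($.value) reduce ((item, acc = {}) -> acc ++ item) else $.value
    } filterObject $ != null)
}

fun xml2json(xml) = do {
    var parsedXML = parseXML(xml)
    var json = reJson(parsedXML)[0]
    var rootJson = json[0]
    var updatedRootJson = rootJson update {
        case .attributes -> listNS(parsedXML) ++ rootJson.attributes
    }
    ---
    json mapObject { ($$): updatedRootJson }
}
---
xml2json(sample)
```

To run it against a real message, replace sample with payload in the last line. The update operator used in xml2json requires a DataWeave version that supports it (2.3 or later).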
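This function converts any XML input into a structured JSON object where each element is keyed by its prefixed name, its attributes (also keyed by prefixed name) sit under attributes, its text or child elements sit under value, and the namespace declarations are added to the attributes of the root element.
Here is an example. For the following input XML payload: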
<?xml version="1.0" encoding="UTF-8"?>
<root:root xmlns:root="http://namespace.root" xmlns:ns1="http://namespace.ns1" xmlns:ns2="http://namespace.ns2" ns1:attr="ns1 attribute value" ns2:attr="ns2 attribute value">
  <ns1:levelOne ns1:attr="a1" ns2:attr="a2">
    <ns1:levelTwo ns1:attr="a1" ns2:attr="a2">value1</ns1:levelTwo>
    <ns2:levelTwo ns1:attr="a1" ns2:attr="a2">value2</ns2:levelTwo>
  </ns1:levelOne>
  <ns2:levelOneElement ns1:attr="a1" ns2:attr="a2">
    <ns1:levelTwo ns1:attr="a1" ns2:attr="a2">value1</ns1:levelTwo>
    <ns2:levelTwo ns1:attr="a1" ns2:attr="a2">value2</ns2:levelTwo>
  </ns2:levelOneElement>
</root:root>
Our function will produce the following JSON output:
{
  "root:root": {
    "attributes": {
      "xmlns:root": "http://namespace.root",
      "xmlns:ns1": "http://namespace.ns1",
      "xmlns:ns2": "http://namespace.ns2",
      "ns1:attr": "ns1 attribute value",
      "ns2:attr": "ns2 attribute value"
    },
    "value": {
      "ns1:levelOne": {
        "attributes": {
          "ns1:attr": "a1",
          "ns2:attr": "a2"
        },
        "value": {
          "ns1:levelTwo": {
            "attributes": {
              "ns1:attr": "a1",
              "ns2:attr": "a2"
            },
            "value": "value1"
          },
          "ns2:levelTwo": {
            "attributes": {
              "ns1:attr": "a1",
              "ns2:attr": "a2"
            },
            "value": "value2"
          }
        }
      },
      "ns2:levelOneElement": {
        "attributes": {
          "ns1:attr": "a1",
          "ns2:attr": "a2"
        },
        "value": {
          "ns1:levelTwo": {
            "attributes": {
              "ns1:attr": "a1",
              "ns2:attr": "a2"
            },
            "value": "value1"
          },
          "ns2:levelTwo": {
            "attributes": {
              "ns1:attr": "a1",
              "ns2:attr": "a2"
            },
            "value": "value2"
          }
        }
      }
    }
  }
}
But why?
Pretty cool, right? But you might be wondering—why on Earth would we need to preserve XML namespaces during data transformations? After all, it’s hard to imagine a namespace itself carrying business-critical data.
In my case, the need arose from static code analysis of Mule applications—specifically, automating the generation of C4 Model component diagrams directly from Mule XML. Since each MuleSoft connector declares its own XML namespace, preserving those namespaces was essential to parsing the code accurately and identifying components.
That, by the way, is a whole other story I’d love to share another time.
I'm sure that with a bit of reflection, you'll think of other scenarios where keeping XML namespaces intact during transformations could be just as valuable. If so, I’d love to hear your thoughts—feel free to share your use cases in the comments.
Thanks for reading!