Evaluation of current mesh format

Continuing the discussion from Re-import only the mesh:

I have worked quite a bit with the model format recently, and thought I would provide some feedback on the current format of storing the mesh/material/model data. While I do not expect anything to come of it, when you next update the model format, remember this thread and maybe give it a read.

Currently the model format works, but it is awkward to do several things, particularly when generating and working with the data directly.

Good Points of Current Format

  • Mesh data is stored separate to instance data allowing instanced meshes (keeping filesizes small)
  • Instances have parenting, support different scales, rotations etc.
  • Somewhat human readable

Things I would like improved (summary)

  • The material data is separate from the instance, despite being logically related
  • Things are heavily dependent on order - UV, mesh index, nodes
  • Unnecessary/easily computed data

Material Data:
In the json file, there is:

'meshInstances':{
                'node': ob_num,
                'mesh': mesh_num
                }

Then there is a whole separate file to say “this meshInstance has this material.” In my mind it would be much more logical to simply do:

    'meshInstances':{
                    'node': ob_num,
                    'mesh': mesh_num,
                    'material':'path/to/material.json'
                    }

This reduces the file count/number of HTTP requests, and keeps logical data together. I guess the reason you separated it out is so that the model data only contains model data, not any material data at all. But I think these should be linked in the way described above.

Linking by Order vs Linking by Name
Currently the model format has a lot of lists and other order based things:
One example, the nodes:

'nodes': [
                {
                    "name": "RootNode",
                    "position": [0, 0, 0],
                    "rotation": [0, 0, 0],
                    "scale": [1, 1, 1],
                }, {
                    "name": "Object1",
                    "position": [0, 1, 0],
                    "rotation": [0, 90, 0],
                    "scale": [1, 1, 1],
                },
          ]

And then in the meshinstances:

'meshInstances':{
                    'node': 1,
                    'mesh': 0
                    }

I would suggest:

'meshInstances':{
                    'node': 'NodeName',
                    'mesh': 'MeshName'
                    }

Yes, it takes more work to parse from within PlayCanvas (you have to extract the name and store it in an associative array/object rather than just read it into a list), and it means nodes have to be uniquely named (which they are in every modelling package I’ve ever used), but it means that the file is more human readable and exporting to the format is easier. Neither of these is likely to be your priority though.
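To give an idea of the extra parsing work I mean, here is a minimal sketch in Python (not engine code). It assumes meshes also carry a name, which the current format does not give them, and `resolve_instances` and `mesh_names` are names I made up for illustration:

```python
def resolve_instances(nodes, mesh_names, instances):
    """Turn name-based mesh instances back into the index-based form."""
    node_index = {}
    for i, node in enumerate(nodes):
        name = node["name"]
        # Name-based linking only works if node names are unique.
        if name in node_index:
            raise ValueError(f"duplicate node name: {name!r}")
        node_index[name] = i

    # Hypothetical: assumes each mesh has a unique name too.
    mesh_index = {name: i for i, name in enumerate(mesh_names)}

    return [
        {"node": node_index[inst["node"]], "mesh": mesh_index[inst["mesh"]]}
        for inst in instances
    ]


nodes = [{"name": "RootNode"}, {"name": "Object1"}]
mesh_names = ["Cube"]
instances = [{"node": "Object1", "mesh": "Cube"}]

resolved = resolve_instances(nodes, mesh_names, instances)
print(resolved)  # [{'node': 1, 'mesh': 0}]
```

So the loader can still end up with plain indices internally; the names would only exist in the file.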

The same goes for UV data (currently stored as ‘texCoord0’, ‘texCoord1’: not even in a list, but indexed by some ID rather than a name). I’d do UVData:[{'name':'Lightmap', 'data':[]}, {}...]. This means that when using the same material for different meshes, the order in which the UV data is stored is not important, as it is referenced by name.

There are other things stored by implicit order (vertex positions, vertex normals, vertex colors) which I think are OK, but if they are things that may want to be dealt with on a higher level, I think they should be indexed by name.

Also, does a node ever have multiple meshes? I have never seen more than one meshInstance mapped to a node. Could the nodes list be combined with the meshInstances list?

Unnecessary Data

mesh:{
         'aabb': {'min': minpos, 'max': maxpos},
         'vertices': vertices,
         'indices': indices,
         'type': typ,
         'base': base,
         'count': count
     }

‘aabb’ can be calculated easily while parsing the json file
‘type’ never changes?
‘base’ is always zero?
‘count’ is just the length of vertices divided by three. You don’t ask for length of lists in other places.
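As a rough sketch of what I mean by "easily computed" (Python, not engine code), assuming the positions are available as one flat list of x, y, z floats. The function name `compute_aabb` is mine, not something from the format or the engine:

```python
def compute_aabb(positions):
    """Bounding box of a flat [x0, y0, z0, x1, y1, z1, ...] position list."""
    if not positions or len(positions) % 3 != 0:
        raise ValueError("positions must be a non-empty list of x, y, z triples")
    # Slice out each axis, then take its extremes.
    xs, ys, zs = positions[0::3], positions[1::3], positions[2::3]
    return {
        "min": [min(xs), min(ys), min(zs)],
        "max": [max(xs), max(ys), max(zs)],
    }


positions = [0.0, 0.0, 0.0, 1.0, 2.0, -1.0, -0.5, 0.5, 3.0]

aabb = compute_aabb(positions)
vertex_count = len(positions) // 3  # the "length divided by three" from above

print(aabb)          # {'min': [-0.5, 0.0, -1.0], 'max': [1.0, 2.0, 3.0]}
print(vertex_count)  # 3
```

This is a single pass over one mesh's positions; it says nothing about how expensive it is for large models with many nodes.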

In other places:

  • ‘type’: ‘float32’ everywhere. Is anything else valid? (I do think ‘components’ is a good one to have in otherwise unstructured lists though)
  • There is always a rootNode at zero position, zero rotation and a scale of 1. This is then parented to -1. This doesn’t seem to add any actual information that can’t be assumed

Well, I hope that was interesting.
On a side note, some other questions: what languages are used in the current model conversion pipeline, and what does it look like? Is there a summary of the material format somewhere? How is animation data stored anyway?

It is important to mention that our model format and other data formats were made to be used mostly in apps published from the editor. So models were stored as gzipped JSON, and materials are actually stored within big asset files rather than as separate files.
Then we had to find a way to load model files outside the tools, with an engine-only approach, and later decisions followed from that.
Since then the format has never been reviewed thoroughly, although we have had ideas and some thoughts on different optimisations in various places.

Array indexes are actually a fairly stable way to reference items, far more stable than node names, which in fact are not unique and can clash.
Mapping data was intentionally kept separate, because we do not want to have any file paths inside our formats, as that is not the way things are referenced in the engine.
In the database we keep mappings (materials to nodes) separately as well, and all a mapping contains is the relation of asset (material) IDs to node indexes. It is a fairly portable way to store data: it has no issue when paths change, or when the name of a material or a node changes.
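To illustrate the name clash, here is a small Python sketch (not engine code; the node names are made up). Two nodes that share a name collapse into one entry when keyed by name, while indexes stay distinct:

```python
nodes = [
    {"name": "RootNode"},
    {"name": "Wheel"},
    {"name": "Wheel"},  # a second node with the same name
]

# Keyed by name: the later "Wheel" silently overwrites the earlier one.
by_name = {node["name"]: i for i, node in enumerate(nodes)}

# Keyed by index: every node keeps its own entry.
by_index = dict(enumerate(nodes))

print(len(by_name))      # 2
print(len(by_index))     # 3
print(by_name["Wheel"])  # 2
```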

aabb - calculating the aabb can actually cost a lot. If a model has thousands of vertices and multiple nodes, it has to iterate through each vertex for each node. We’ve optimised that part a lot recently, but this data is very simple to store, and having it pre-calculated improves loading times, especially on mobile.

There are many improvements that could be made to this format.
We have actually looked in the direction of binary serialisation with almost-ready buffers that could be fed to WebGL directly, which would save a lot of time on parsing, as well as possible sorting techniques to improve compression rates and reduce file size. But there is no urgency for this right now, so it is very low in our backlog.
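As a rough idea of what "almost-ready buffers" means, here is a minimal Python sketch that packs vertex positions as little-endian float32 values. This is only an illustration of the byte layout, not our actual format or engine code, and the helper names are hypothetical. In the browser the equivalent would be a Float32Array view over the loaded bytes:

```python
import struct


def pack_float32(values):
    """Pack a list of floats into little-endian float32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_float32(buf):
    """Read little-endian float32 bytes back into a list of floats."""
    return list(struct.unpack(f"<{len(buf) // 4}f", buf))


positions = [0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]

buf = pack_float32(positions)
print(len(buf))  # 36, i.e. 9 floats * 4 bytes each

# These values are exactly representable as float32, so they round-trip.
restored = unpack_float32(buf)
print(restored == positions)  # True
```

With a layout like this there is no text parsing step for the vertex data, which is where the time saving would come from.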

Thanks for looking through our format. It is nice to see people go into the guts of the engine and data and hack around - this is a good thing, it gives you a good understanding of how things work, and that allows you to provide some valuable feedback.
