The ultimate quest of bringing Visio support to LibreOffice


Today LGW’s guests are Eilidh McAdam and Fridrich Strba, who implemented support for VSD documents (Microsoft Visio) in LibreOffice as part of Google Summer of Code 2011.

Plot

Legacy file formats are evil. They tend to have no written specifications, and when you start reverse-engineering them, you often discover layers of questionable solutions built on top of even more questionable solutions, all carried around for the sake of backward compatibility.

So why go through the pain of supporting them at all? Simply put, because legacy isn’t always in the past: businesses don’t like upgrades. Designers keep local clipart in all kinds of arcane file formats, big publishing houses keep using DOC instead of DOCX, and system integration companies still toss around VSD files.

The problem with Visio is that Microsoft never released any useful specification of the binary file format. And it takes a very special kind of mind to spend a hell of a lot of time working out the internal structure of a binary document. Most free software developers would rather implement fancy new features. Even big companies tend to outsource this kind of software development.

As a member of the re-lab team that is responsible for the initial work on reverse-engineering Microsoft Visio file formats, I have to admit that I was highly biased when it came to picking a topic for the next interview. Nevertheless, the community’s response to the announcement of preliminary support for VSD files in LibreOffice was, essentially, a chorus of cheers.

So here I am, interviewing Eilidh McAdam and Fridrich Strba, who worked this summer on getting LibreOffice to understand binary Visio documents. For the TL;DR people among you: support for VSD files will be publicly available in LibreOffice 3.5, currently planned for February 8, 2012.

Interview

Hi, folks! Could you introduce yourselves, please?

Eilidh: I’m a 24-year-old Scottish PhD student. I have a BSc in Computing from the University of Abertay, Dundee, where I am currently trying to solve critical infrastructure network problems by looking at how biological networks manage to be so resilient. On IRC I’m also known as Tibby Lickle.

Fridrich: I am a 42-year-old Christian male, working for Attachmate/SUSE (formerly Novell) on LibreOffice. I am happily married to a wonderful wife, and we have 3 children whose ages change every year and currently range between 1 year for the little one and almost 7 for the front-runner.

I have a bachelor’s and a master’s degree in computer science from an obscure university in Slovakia that nobody knows (University of Zilina), a bachelor’s degree in International relations from the University of Geneva, and a master’s degree in International law from the Graduate Institute of International Studies in Geneva.

Fridrich, for how long have you been working on OO.o/LO? What is your primary area of interest/responsibility in the project?

I joined OOo around 2005, when Michael Meeks helped us integrate our WordPerfect import plug-in (based on our libwpd) into the OOo codebase. And since by that time my co-developers in libwpd did not have much time for FOSS work anymore, I became the maintainer of that plug-in inside OOo.

I first mentored a GSoC project for OOo in 2006. And, by chance, it was also a graphics importer :) Since then I have stayed around. In 2007, I was hired by Novell to work on OOo and later on GoOo and LibreOffice.

These days I work in the release team, ensuring that we have usable builds for each release candidate and release. As for my interests: I like to work on file format conversion, be it import of text documents or import of graphics files.

Eilidh, is it your first experience working with a free software project team?

More or less, yes. I had decided earlier in the year that I should finally give a little back to the community and chose LibreOffice as a starting point. I completed one of the easy hacks (which are entirely awesome from my perspective as a newbie), a very simple one-liner in Calc to disable some auto-completion, and by chance noticed on the LibreOffice news page that they would be accepting GSoC projects.

Why did you pick this project?

Eilidh: I went onto the IRC channel, not really sure what I was wanting to do and asked for something interesting. I went along with it, because I’d never done any reverse engineering before and it sounded like fun. To be honest, I didn’t even know what Visio was prior to this project. I think Fridrich once called this an intellectual exercise. I might not be able to make beautiful diagrams, but I feel like I know it inside out now.

How well does importing work now that the GSoC program is over? Which VSD features are supported as of now?

Fridrich: Without the sin of false humility, I can say that it works freaking well. We support all drawing primitives, and almost all styles (fill, line etc.). There is initial basic support for text too. For sure, we produced software, so we expect it to have bugs, but as things stand now, no major feature that maps easily to SVG or ODG is left unsupported.

Original network diagram in Visio

Network diagram converted to ODG

Eilidh: I think Fridrich covered this best. We have implemented way more than what the initial proposal outlined. Some of the major triumphs were implementing NURBS (there is no native support for these in SVG and ODG) and styles/stencils.

I think the best people to answer this, however, will be those who actually have Visio diagrams they wish to view in LibreOffice.

Eilidh, what was the most challenging part of your project?

It’s hard to say as many aspects were challenging. Structuring a large project and working with other programmers and non-programmers were new to me, but luckily I had a very good mentor who knew plenty about both of these things :)

What do you think you have gained from it?

Eilidh: I feel like I’ve learned more here than I did at university! How to integrate with a remote team (the main thing is communication, same as if it wasn’t remote), how much perfectionism can hold you back and the importance of getting some sort of results.

I also learned some new ways of thinking about program structure. Oh, and the extreme usefulness of GDB and Valgrind.

Are you planning to participate in LibreOffice development beyond GSoC?

Eilidh: It started as a simple desire to contribute something back and that hasn’t gone away. While I do still have my PhD to keep me busy, libvisio is really my pet project now. I put a lot into it and wouldn’t want to abandon it. I think that LibreOffice has a great community and I’d like to continue to participate in it.

The VSD file format has some features unmatched by LibreOffice, and Visio has some UI that LibreOffice lacks as well, e.g. the stencils palette. It probably doesn’t make a lot of sense to implement things for the sake of 100% compatibility, but how much of that is likely to be addressed in the future for good, solid reasons?

Eilidh: It’s not really up to me to decide on such a large interface change in LO, but if the team wanted this, I’d be happy to help where I could. We really focused on simply viewing diagrams, and things like the stencils palette are really more useful for creating them.

However, viewing stencils as diagrams would certainly be possible if there was demand for it.

Fridrich: There should not be a big problem in implementing stencil dumping into a multipage ODG document with one stencil per page. We will surely accept any patch that goes in this direction and is sane enough.

Nevertheless, the aim of this project was not to rewrite Visio, but to make a solid import filter for LibreOffice Draw, so that people who don’t have Visio can open their Visio documents even on their Linux boxes.

Fridrich, does the LibreOffice team have any plans regarding support for writing VSD files?

LibreOffice is a patch-accepting economy, not a planned economy. I don’t know about any plan concerning Visio export. I am not even sure whether it is something of such importance. But, since the one who codes decides her features, there will be no opposition against anybody who would try to come up with an extension of libvisio for export.

Nevertheless, I would encourage people to first help out with the VDX support and with finishing the implementation of features that are still missing in the importer, because whatever is not imported will not be exported either. So, in order to achieve lossless roundtrips, one would first need to have a lossless import.

Users tend to expect software to support all kinds of (legacy) binary file formats. At the same time, developers tend to want everyone to just use standards (such as OpenDocument). How do you solve this conflict within the LibreOffice project? Is it more like a “send us a patch for whatever bothers you and then we’ll see” policy?

Fridrich: I don’t think that what is happening inside the LibreOffice project can be characterized as strife between developers wanting ODF everywhere and users wanting support for “exotic” file formats. When one sees all the filters in LibreOffice, one will see that support for “exotic” file formats is pretty high on the priority list of LibreOffice development.

But for sure, the best way for a user to advance his own agenda is not to go around begging others to implement his pet feature, but to get his hands dirty and produce a patch. Even if that patch is not perfect, it is sure that it will be attended to by the LibreOffice community with a lot of care and the author will be mentored into becoming a happy and fulfilled contributor.

Thank you for the insightful replies!

It is our pleasure and thanks for the wonderful web-site and for furthering the cause of free graphic applications.

Trivia

Initial reverse-engineering work was done by Valek Filippov in 2007 and resulted in the (currently obsolete) vsdump and vsdviewer tools. The currently active project, used for libvisio development, is OLE Toy, which can also read many other proprietary file formats.

Work on the libvisio library started in May 2011 as a Google Summer of Code 2011 project by Eilidh McAdam, a student from the University of Abertay, Scotland.

According to Fridrich, the peak of commits in June was due to a refactoring the team did and also to the fact that Eilidh was implementing the different primitives. That is normally part of the first 95% of the work, which normally takes 5% of the time.

They continued with more brain-intensive work on NURBS in July. The work on stencils and styles did not add the same amount of code in itself, but it had a huge impact on how the resulting graphics look. The team is now in the phase where they will basically make incremental changes.

Commits to libvisio

Primary developers are Fridrich and Eilidh, with guest commits from Valek. The drop in the graph below is explained by the “soft pencils down” stage of the GSoC project in the middle of August.

Committers to libvisio

The project is currently at 6.5K lines of C++ code with assorted bits of infrastructure around.

Code growth in libvisio

The library provides a vsd2xhtml converter that embeds SVG documents. A vsd2odg converter is provided by the writerperfect package. LibreOffice from Git just opens VSD files.

At the time of publishing this article, the libvisio library supports:

  • all geometry features such as MoveTo, LineTo, PolylineTo, ArcTo, EllipticalArcTo, Ellipse, NURBSTo;
  • strokes, stroke styles (some issues there), transparent, plain and gradient fills;
  • page size and orientation, multipage documents;
  • text, including basic formatting (size, bold and italic faces) and styles;
  • transformations such as rotation and flipping (works for groups too);
  • groups of objects;
  • embedded bitmaps.

LibreOffice didn’t support elliptic arcs, which are common in Visio stencils. For the upcoming LibreOffice 3.5, the team had to implement an approximation of these arcs by 4 cubic Béziers to make Visio stencils render correctly.
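To give a feel for what such an approximation involves, here is a minimal Python sketch of the common textbook way to approximate a full, axis-aligned ellipse with four cubic Bézier segments, one per quarter. This is not the code that went into LibreOffice or libvisio; the function names (arc_to_cubic, ellipse_to_cubics, bezier_point) are made up for this example, and rotated ellipses are left out for brevity.

```python
import math


def arc_to_cubic(cx, cy, rx, ry, a0, a1):
    """Approximate the elliptical arc from angle a0 to a1 (radians) by one cubic Bezier.

    The ellipse is axis-aligned, centred at (cx, cy), with radii rx and ry.
    Returns the four control points (p0, p1, p2, p3).
    """
    # Distance of the inner control points along the tangent, chosen so that
    # the middle of the curve lies exactly on the ellipse.
    k = 4.0 / 3.0 * math.tan((a1 - a0) / 4.0)
    p0 = (cx + rx * math.cos(a0), cy + ry * math.sin(a0))
    p3 = (cx + rx * math.cos(a1), cy + ry * math.sin(a1))
    # The tangent direction at angle a is (-rx * sin(a), ry * cos(a)).
    p1 = (p0[0] - k * rx * math.sin(a0), p0[1] + k * ry * math.cos(a0))
    p2 = (p3[0] + k * rx * math.sin(a1), p3[1] - k * ry * math.cos(a1))
    return p0, p1, p2, p3


def ellipse_to_cubics(cx, cy, rx, ry):
    """Split a full ellipse into four quarter arcs, one cubic Bezier each."""
    quarter = math.pi / 2.0
    return [
        arc_to_cubic(cx, cy, rx, ry, i * quarter, (i + 1) * quarter)
        for i in range(4)
    ]


def bezier_point(segment, t):
    """Evaluate a cubic Bezier segment at parameter t in [0, 1]."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = segment
    u = 1.0 - t
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    return (
        b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3,
        b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3,
    )
```

With this choice of control points, the middle of each segment lies on the ellipse, and elsewhere the curve deviates from it by only a small fraction of a percent of the radius, which is hard to spot in a rendered diagram.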

Building

If you are brave enough, you can try building the whole stack of tools to try conversion for yourself.

  1. Fetch and build libwpd:

    $ git clone git://libwpd.git.sourceforge.net/gitroot/libwpd/libwpd

    $ cd libwpd

    $ ./autogen.sh && ./configure && make && sudo make install

  2. Fetch and build libwpg:

    $ git clone git://libwpg.git.sourceforge.net/gitroot/libwpg/libwpg

    $ cd libwpg

    $ ./autogen.sh && ./configure && make && sudo make install

  3. Fetch and build libvisio:

    $ git clone git://anongit.freedesktop.org/libreoffice/contrib/libvisio

    $ cd libvisio

    $ ./autogen.sh && ./configure --prefix=/usr && make && sudo make install

  4. Fetch and build writerperfect:

    $ git clone git://libwpd.git.sourceforge.net/gitroot/libwpd/writerperfect

    $ cd writerperfect

    $ ./autogen.sh && ./configure --with-libvisio && make && sudo make install

Purists will yell at me for using --prefix=/usr, but it works for me.

Conversion to XHTML and SVG

As mentioned above, the libvisio package contains a vsd2xhtml tool that converts VSD to SVG illustrations and embeds them into XHTML files. Here is an example of such a diagram opened in Inkscape:

An excerpt from a VSD file converted to XHTML and opened with Inkscape

If you’d rather use Inkscape, here is a guide to the macabre ritual of doing that. A pentagram is nearly involved.

  1. Run ‘$ vsd2xhtml file.vsd > file.xhtml’.
  2. Use a text editor to open file.xhtml and remove HTML bits, keeping SVG bits (see below), then save to SVG.
  3. Open the SVG file in Inkscape.
  4. Enjoy your diagram.

The aforementioned SVG bits will typically start with:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" 
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg:svg version="1.1" xmlns:svg="http://www.w3.org/2000/svg" 
xmlns:xlink="http://www.w3.org/1999/xlink" width="1008" height="1008" >

…and finish with:

</svg:svg>

The XML declaration and DOCTYPE part will be commented out by default, so remove that comment. Also remember that original VSD files can have multiple pages. In that case the vsd2xhtml tool will embed one SVG document after another. Just save those multiple documents to separate SVG files.
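If hand-editing gets tedious, the cutting can be scripted. Below is a minimal Python sketch that pulls every SVG block out of the XHTML text and puts the header lines quoted above in front of each one. It is only an illustration: extract_svgs and split_to_files are made-up names, and the script assumes that each page is wrapped in a non-nested <svg:svg> … </svg:svg> pair as in the excerpt above and that the file is UTF-8 encoded.

```python
import re

# Header lines as quoted in the article; they are re-added here because
# vsd2xhtml is said to emit them commented out.
SVG_HEADER = (
    '<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n'
    '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" '
    '"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">\n'
)

# Assumption: one <svg:svg ...> ... </svg:svg> block per page, never nested.
SVG_BLOCK = re.compile(r"<svg:svg\b.*?</svg:svg>", re.DOTALL)


def extract_svgs(xhtml_text):
    """Return one standalone SVG document (as a string) per embedded block."""
    return [SVG_HEADER + block + "\n" for block in SVG_BLOCK.findall(xhtml_text)]


def split_to_files(xhtml_path, out_prefix):
    """Write each embedded SVG to <out_prefix>-<n>.svg and return the paths."""
    # Assumption: the XHTML file is UTF-8 encoded.
    with open(xhtml_path, encoding="utf-8") as source:
        documents = extract_svgs(source.read())
    paths = []
    for number, document in enumerate(documents, start=1):
        path = f"{out_prefix}-{number}.svg"
        with open(path, "w", encoding="utf-8") as target:
            target.write(document)
        paths.append(path)
    return paths
```

Calling split_to_files("file.xhtml", "page") would then write page-1.svg, page-2.svg and so on, one file per page of the original diagram.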

Hopefully at some point there will be an Inkscape extension to load VSD files directly into the application.

Conclusion

In all fairness, this is a huge piece of work. VSD features are not 100% covered; as of now, you are likely to stumble upon some imperfections. These will eventually be taken care of. So if something goes wrong, please report it to the LibreOffice/libvisio team and provide a sample document.