Opened 8 years ago

Closed 8 years ago

Last modified 6 years ago

#3808 closed enhancement (fixed)

Add BOM to Modelica files

Reported by: ceraolo Owned by: adeas31
Priority: high Milestone:
Component: OMEdit Version:
Keywords: Cc:

Description

The Modelica specification says:

Each Modelica file in the file-system is stored in UTF-8 format (defined by The Unicode Consortium; http://www.unicode.org) and may start with the UTF-8 encoded byte order mark (0xef 0xbb 0xbf);

Currently OM does not use BOMs. The resulting files are not read correctly by other tools, such as Dymola, that expect a BOM to be present.
I mean, the files run correctly, but non-ASCII characters are badly displayed.

Since BOM usage is allowed by the Modelica specification, and adding one does not cause problems, I recommend adding it to OM-created files, at least as an option.
This would improve OM's compatibility with other tools with very little effort.
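
The symptom described here can be reproduced with a small sketch. It assumes a hypothetical reader that treats a file as UTF-8 only when the BOM is present and otherwise falls back to Latin-1; the fallback encoding and the names `has_utf8_bom` and `naive_reader` are illustrative assumptions, not taken from any tool.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the three bytes 0xef 0xbb 0xbf


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM."""
    return data.startswith(UTF8_BOM)


def naive_reader(data: bytes) -> str:
    """Hypothetical reader: UTF-8 only if a BOM is present, else Latin-1 (assumption)."""
    if has_utf8_bom(data):
        return data[len(UTF8_BOM):].decode("utf-8")
    return data.decode("latin-1")


source = 'model M "però" end M;'
without_bom = source.encode("utf-8")
with_bom = UTF8_BOM + without_bom

print(naive_reader(with_bom))     # the text round-trips unchanged
print(naive_reader(without_bom))  # the accented letter comes back as two wrong characters
```

The model itself is unaffected in both cases; only the non-ASCII text in the description string is displayed incorrectly.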

Change History (5)

comment:1 Changed 8 years ago by sjoelund.se

Since BOM usage [...] does not cause problems

This is a false statement. It does cause problems for Modelica tools that did not implement support for the BOM. The Unicode Consortium also does not recommend using the BOM since it is not necessary and causes problems. There is also only a single file encoding allowed by the Modelica specification (UTF-8), so there is no need to mark which file encoding is used by the file.

A BOM is also only allowed at the top of a file (string), which means you need to take great care never to introduce one inside strings, for example. This means the OM API should never display them. It also means API functions like readFile can cause unexpected results on files with a BOM (starting in Unicode 3.2, BOM in the middle of a string is no longer a zero-width space). So the BOM complicates things, which is why its use is not recommended.
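
A minimal Python sketch of that pitfall (it does not use the OM API; the file content is made up for illustration): decoding with plain `utf-8` keeps the BOM as the character U+FEFF, so it ends up inside strings, whereas the `utf-8-sig` codec drops a leading BOM.

```python
raw = b"\xef\xbb\xbfmodel M end M;"  # illustrative file content with a BOM

kept = raw.decode("utf-8")          # the BOM survives as U+FEFF at index 0
stripped = raw.decode("utf-8-sig")  # a leading BOM is removed if present

print(repr(kept[0]))                 # '\ufeff'
print(kept.startswith("model"))      # False
print(stripped.startswith("model"))  # True

# Concatenating two unstripped strings puts a BOM in the middle of the result.
joined = kept + kept
print(joined.count("\ufeff"))        # 2
```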

comment:2 Changed 8 years ago by ceraolo

Well, that is exactly what I meant: adding the three bytes only at the beginning of a .mo file, never in strings.
I never intended these three bytes to be displayed; they would remain invisible to users.

Does this still complicate things?
I know that OM programmers have many things to do, and I made this suggestion only thinking (maybe wrongly) that adding three bytes at the very beginning of a .mo file, possibly as an option, would be straightforward.
If this causes troubles, never mind.

I will also ask Modelon to ask Dassault to add a flag in Dymola so that input files are always interpreted as UTF-8. Currently this is not the case with Dymola, I believe because they added UTF-8 support relatively recently and want to keep as much backwards compatibility as possible.

Last edited 8 years ago by ceraolo

comment:3 Changed 8 years ago by adeas31

  • Resolution set to fixed
  • Status changed from new to closed

Added BOM settings in fbf65b8/OMEdit.

The setting has three options:

  • Always Add (always add a BOM when saving a file)
  • Keep If Already Present (default - save the file with a BOM if it already had one when it was loaded)
  • Always delete (never write a BOM, possibly deleting a pre-existing one)

If you always want a BOM, choose Always Add. The default option is Keep If Already Present.
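
The three options can be sketched as a small save policy. This is an illustration in Python, not the OMEdit implementation; `BomPolicy`, `load`, and `save` are hypothetical names.

```python
import codecs
from enum import Enum
from typing import Tuple


class BomPolicy(Enum):
    ALWAYS_ADD = "always add"
    KEEP_IF_PRESENT = "keep if already present"
    ALWAYS_DELETE = "always delete"


def load(data: bytes) -> Tuple[str, bool]:
    """Decode file bytes and remember whether they started with a BOM."""
    had_bom = data.startswith(codecs.BOM_UTF8)
    return data.decode("utf-8-sig"), had_bom


def save(text: str, had_bom: bool,
         policy: BomPolicy = BomPolicy.KEEP_IF_PRESENT) -> bytes:
    """Encode text as UTF-8, writing a BOM only when the policy asks for one."""
    write_bom = policy is BomPolicy.ALWAYS_ADD or (
        policy is BomPolicy.KEEP_IF_PRESENT and had_bom
    )
    body = text.encode("utf-8")
    return codecs.BOM_UTF8 + body if write_bom else body
```

With the default policy a file is written back the way it was loaded, so files from tools that add a BOM keep it and files without one stay unchanged.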

comment:4 Changed 8 years ago by ceraolo

Thanks a lot!
It is very cleverly implemented!

I want to show students that OM is nice and powerful, and largely compatible with Dymola (and other tools).
This change helps this compatibility a lot. We Italians very frequently use Italian words in comments, and having them spoiled was really disappointing.

It was also difficult for me to understand things because Dymola has a strange behaviour: it writes UTF-8 (and adds a BOM) only if at least one non-ASCII character is present.
This logic does not consider extended ASCII as non-ASCII, and therefore Italian characters were always spoiled when switching from Dymola to OM.

I found a nice workaround for this: put a comment containing just the euro symbol in the file; this forces Dymola to encode it as UTF-8.
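
Why the euro symbol helps can be illustrated with a hypothetical writer that falls back to UTF-8 with a BOM only when the text does not fit a single-byte encoding. Dymola's actual rule and legacy encoding are not stated in this ticket; Latin-1 and the name `legacy_style_save` are assumptions made for the sketch.

```python
import codecs


def legacy_style_save(text: str) -> bytes:
    """Hypothetical writer: Latin-1 without BOM when possible, else UTF-8 with BOM."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return codecs.BOM_UTF8 + text.encode("utf-8")


plain = legacy_style_save("// però")     # fits Latin-1: single byte 0xf2, no BOM
forced = legacy_style_save("// però €")  # the euro sign has no Latin-1 code point

print(plain.startswith(codecs.BOM_UTF8))   # False
print(forced.startswith(codecs.BOM_UTF8))  # True
```

A file written the first way is not valid UTF-8, which would explain why accented letters were spoiled when it was opened in a tool that reads everything as UTF-8.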

Now, with this fix, the transfer should be "Italian-safe" also when moving the other way (from OM to Dymola).

comment:5 Changed 6 years ago by sjoelund.se

  • Milestone 1.10.0 deleted

Milestone deleted
