libicucore on Mac OS X

December 10th, 2006

Mac OS X 10.4 includes the ICU library but no simple way to actually use it. ICU (International Components for Unicode) is an open source library developed by IBM that provides Unicode and locale-dependent functionality. Apple ships the library (/usr/lib/libicucore.dylib) in a standard Mac OS X installation, but does not ship the headers needed to compile against it.

ICU provides many different features, but the one I was most interested in was support for regular expressions. The ICU regular expression library is used by Java, Xcode, and a lot of other software. The classic regular expression functions, regexec() and friends, do not support Unicode, which makes them undesirable for use in modern applications. There are many other regular expression libraries for the Mac, but it would be nice to use ICU, since it is powerful and already installed on Mac OS X.

I’m not the first person who has tried to use ICU on Mac OS X. There was a thread on Cocoa-dev and another on the Xcode-users list. Here’s the information from the latter thread, posted by Apple’s “Unicode Liaison”, Deborah Goldsmith:

Apple does not currently support direct developer access to ICU. We are considering such support for a future release of Mac OS X. It is possible to access ICU by downloading it from icu.sourceforge.net, building it to produce the header files, compiling against those headers, and then linking against /usr/lib/libicucore.dylib. You need to be aware of the following:

  1. You need to use the right version of ICU for the version of Mac OS X you are using:

    a. 10.2.x and earlier do not have ICU at all
    b. 10.3.x has ICU 2.6
    c. 10.4.x has ICU 3.2

    For example, if you need to run on Panther, use the ICU 2.6 headers; if you require Tiger, then you can use the 3.2 headers. If you need to run on 10.2.x then you cannot rely on ICU being present.

  2. You must not use draft or C++ APIs, or your application may crash when run on future versions of Mac OS X. If you are using the ICU 3.2 headers (or later), you can do this to hide draft APIs:

    #define U_HIDE_DRAFT_API 1

    before including any ICU header files. Never include a C++ header file.

  3. Apple does not support this usage of ICU. I think it will work, but cannot make any guarantees.

Unfortunately, this isn’t quite enough information to use ICU on Mac OS X successfully. Here’s what I did to get it working:

  1. Built ICU 3.2.
  2. Added the ICU headers from the build to my project and to my header search path.
  3. Added #define U_HIDE_DRAFT_API 1 to hide the draft APIs.
  4. Declared the URegularExpression type, which is no longer declared after the previous step.
  5. Added #define U_DISABLE_RENAMING 1 to disable renaming of functions.
  6. Included <unicode/uregex.h> in my project, after the new defines.
  7. Verified that code using ICU builds.
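Steps 3 through 6 depend on ordering, so it may help to see them together. The following is only a sketch of what such a header could look like, assuming the ICU 3.2 headers are on the header search path. The file name icu_prefix.h and the include guard are my own placeholders, and the typedef is a guess at the declaration that step 4 calls for.

```c
/* icu_prefix.h (hypothetical name): include this file instead of
 * including ICU headers directly, so the defines always come first.
 * Assumes the ICU 3.2 headers are on the header search path. */
#ifndef ICU_PREFIX_H
#define ICU_PREFIX_H

/* Step 3: hide draft APIs. Must appear before any ICU header. */
#define U_HIDE_DRAFT_API 1

/* Step 5: turn off ICU's symbol renaming so that calls refer to
 * plain names such as uregex_open rather than version-suffixed ones. */
#define U_DISABLE_RENAMING 1

/* Step 4: redeclare the opaque regular expression type that goes
 * missing once draft APIs are hidden (assumed form of the declaration). */
typedef struct URegularExpression URegularExpression;

/* Step 6: only now include the ICU C header. Never a C++ header. */
#include <unicode/uregex.h>

#endif /* ICU_PREFIX_H */
```

Any source file that includes this header and calls the uregex functions then needs to be linked against /usr/lib/libicucore.dylib (for example with -licucore). Disabling renaming presumably matters here because the system library exports the unsuffixed function names, but I have not seen that documented by Apple.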

I’ve created the Cocoa ICU Library, which wraps the ICU regular expression functions in a Cocoa API modeled on the ICU C++ RegexPattern and RegexMatcher classes. It should be useful for anyone interested in using ICU-based regular expressions with Cocoa, or in using any other ICU functions on Mac OS X.
