Character Encoding Recognition Made Easy

By: Darrell Burk

So you're writing the mother of all text editors, and your rich editing features are working beautifully. Then you hit a serious snag as you start the code that reads and decodes existing files: character sets. How can your program tell which character encoding should be used to properly read each file?

Or perhaps you're writing a custom program to convert thousands of text documents to Unicode and archive them for your employer. The original documents are saved in many different encodings, and there is no easy way to correctly identify the character set for each one.

You do a little research and find that byte order marks (BOMs) might help you identify some of the UTF character sets, plus you learn some tricks that can help you recognize when a file might use the US-ASCII encoding. But these tricks aren't guaranteed; in fact, they'll probably fail as often as they work. Plus they don't help you at all with most of the two hundred or so other possible encodings.
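To make those tricks concrete, here is a minimal Java sketch of BOM sniffing and a 7-bit check. The class and method names (`BomSniffer`, `detectBom`, `looksLikeAscii`) are made up for illustration and are not part of any library mentioned in this article. The BOM byte sequences themselves are the standard Unicode ones.

```java
import java.util.Optional;

/** Illustrative helper (hypothetical name): guesses an encoding from leading bytes. */
class BomSniffer {

    /** Returns the encoding name implied by a leading BOM, or empty if no BOM is found. */
    static Optional<String> detectBom(byte[] data) {
        // UTF-32 is checked before UTF-16 because the UTF-32LE BOM (FF FE 00 00)
        // starts with the UTF-16LE BOM (FF FE).
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return Optional.of("UTF-32BE");
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return Optional.of("UTF-32LE");
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return Optional.of("UTF-8");
        if (startsWith(data, 0xFE, 0xFF)) return Optional.of("UTF-16BE");
        if (startsWith(data, 0xFF, 0xFE)) return Optional.of("UTF-16LE");
        return Optional.empty();
    }

    /** True if every byte is 7-bit. Necessary for US-ASCII, but not proof of it. */
    static boolean looksLikeAscii(byte[] data) {
        for (byte b : data) {
            if (b < 0) return false; // high bit set: cannot be US-ASCII
        }
        return true;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

The weaknesses show up quickly. Many files carry no BOM at all, the bytes FF FE 00 00 could also be a UTF-16LE BOM followed by a NUL character, and a file that passes the 7-bit check is equally valid in UTF-8, ISO-8859-1, and most other ASCII-compatible encodings.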

That just isn't good enough for your application. You need software that can accurately recognize the character encoding of a text file no matter what it is. As you begin to discover the wide array of character sets and encoding strategies and contemplate the complexities involved, you conclude you'd really rather not write it.

You need EncodingSleuth Text.

EncodingSleuth Text is a powerful Java library designed specifically with your application in mind. It examines files and byte streams to determine whether they contain encoded text, and identifies the character set most likely used to encode them.

EncodingSleuth Text uses several different statistical analysis techniques, called detectors, to analyze each possible character set that might be used to decode a file, and to score each one so that the correct character set obtains the highest score. It is configurable: you can selectively enable or disable each of the detectors to tailor its operation for your specific needs. It is also extensible: you can provide your own detector implementations should the need arise.
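The general idea of scoring candidate character sets with pluggable detectors can be sketched in a few lines of Java. This is only an illustration of the concept. The `Detector` interface, the two toy detectors, and `CharsetScorer` are hypothetical names invented here, and they are not EncodingSleuth Text's actual API or algorithms. The sketch relies only on standard `java.nio.charset` classes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.List;

/** Hypothetical detector contract: a higher score means the candidate fits better. */
interface Detector {
    double score(byte[] data, Charset candidate);
}

/** Scores 1.0 when the bytes decode without errors under the candidate, else 0.0. */
class StrictDecodeDetector implements Detector {
    @Override
    public double score(byte[] data, Charset candidate) {
        try {
            candidate.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return 1.0;
        } catch (CharacterCodingException e) {
            return 0.0;
        }
    }
}

/** Crude plausibility signal: the share of decoded characters that are not
 *  replacement characters or non-whitespace control characters. */
class PrintableRatioDetector implements Detector {
    @Override
    public double score(byte[] data, Charset candidate) {
        String text = new String(data, candidate); // malformed bytes become U+FFFD
        if (text.isEmpty()) return 0.0;
        long plausible = text.chars()
                .filter(c -> c != 0xFFFD
                        && (!Character.isISOControl(c) || Character.isWhitespace(c)))
                .count();
        return (double) plausible / text.length();
    }
}

/** Sums the enabled detectors' scores per candidate and returns the top scorer. */
class CharsetScorer {
    static Charset best(byte[] data, List<Charset> candidates, List<Detector> enabled) {
        Charset best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Charset candidate : candidates) {
            double total = 0.0;
            for (Detector detector : enabled) {
                total += detector.score(data, candidate);
            }
            if (total > bestScore) { // ties keep the earlier candidate
                bestScore = total;
                best = candidate;
            }
        }
        return best;
    }
}
```

In this sketch, enabling or disabling a detector amounts to changing the list passed to `best`, and adding a new one means writing another `Detector` implementation. Two detectors this simple will misjudge many real files, which is why a production library needs far more sophisticated analysis.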

EncodingSleuth Text's licensing options allow royalty-free redistribution within your applications, and even deployment within server applications. At a fraction of what it would cost to develop your own encoding recognition technology, it offers a complete and robust answer to your need.

You can download EncodingSleuth Text, request a free full-featured trial license, and peruse the documentation at http://www.encodingsleuth.com.


Darrell is president, developer, and most everything else at SynergiSystems, Inc. He launched SynergiSystems in 2007 in order to create software to make life easier for software developers.




