Wednesday, 5 May 2010

Encoding to use while reading/writing files in .NET


"Digging deep into encoding"

While performing file I/O, it is important to keep in mind whether the files being processed can contain special (non-ASCII) characters. If they can, check which encoding you are using for both reading and writing.

Encoding used in:

StreamReader: it uses UTF-8 encoding by default unless otherwise specified.


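StreamWriter: also uses UTF-8 (without a byte order mark) by default unless otherwise specified. The system default encoding comes into play only when it is requested, for example by passing Encoding.Default, or when the file is produced by another application that saves in ANSI.

If we do a check on the default encoding (Encoding.Default on the .NET Framework), we get "Windows-1252" [which is commonly called ANSI] on a typical Western European/US Windows system. So if files are written with the system default encoding and then read without specifying an encoding (using defaults), the files will be in ANSI and the applications reading them will use UTF-8!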

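This does not throw an exception. Instead, the bytes that are not valid UTF-8 are silently replaced with the replacement character (U+FFFD), and we will see square characters/question marks in place of the special characters.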
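The mismatch can be reproduced in memory without touching any file. The following is a minimal sketch, not the original author's code: the class name, the method names and the sample word "café" are made up for illustration. It decodes Windows-1252 bytes as UTF-8 and also shows that a strict UTF8Encoding instance can be used if an exception is preferred over silent replacement.

```csharp
using System;
using System.Text;

public static class AnsiReadAsUtf8Demo
{
    // "café" as Windows-1252 bytes: 'é' is the single byte 0xE9.
    public static readonly byte[] AnsiBytes = { 0x63, 0x61, 0x66, 0xE9 };

    public static string DecodeAsUtf8(byte[] bytes)
    {
        // Encoding.UTF8 replaces invalid byte sequences with U+FFFD
        // instead of throwing.
        return Encoding.UTF8.GetString(bytes);
    }

    public static bool StrictDecodeThrows(byte[] bytes)
    {
        // Second constructor argument (throwOnInvalidBytes) set to true
        // makes the decoder throw on invalid input.
        var strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(bytes);
            return false;
        }
        catch (DecoderFallbackException)
        {
            return true;
        }
    }

    public static void Main()
    {
        string text = DecodeAsUtf8(AnsiBytes);

        // Reports whether the special character was lost to U+FFFD.
        Console.WriteLine(text.Contains("\uFFFD"));

        // Reports whether the strict decoder rejected the same bytes.
        Console.WriteLine(StrictDecodeThrows(AnsiBytes));
    }
}
```

A strict decoder like this can be passed to StreamReader when it is better to fail loudly than to load damaged text.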

Solution:
- We can specify the encoding "Windows-1252" when reading the file, or
- specify UTF-8 explicitly both when writing and when reading the files (better approach)
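The second option can be sketched as follows. This is an illustrative example under assumptions: the class name, the RoundTrip helper and the temporary file are hypothetical, and the only point is that the same encoding is passed explicitly to both StreamWriter and StreamReader.

```csharp
using System;
using System.IO;
using System.Text;

public static class ExplicitEncodingDemo
{
    public static string RoundTrip(string path, string text, Encoding encoding)
    {
        // Pass the encoding explicitly so the writer does not rely on defaults.
        using (var writer = new StreamWriter(path, false, encoding))
        {
            writer.Write(text);
        }

        // Read the file back with the same encoding.
        using (var reader = new StreamReader(path, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Scratch file used only for this demonstration.
        string path = Path.GetTempFileName();
        try
        {
            // "café €" written with escapes so the source file's own
            // encoding does not matter.
            string original = "caf\u00E9 \u20AC";
            string readBack = RoundTrip(path, original, Encoding.UTF8);

            // Reports whether the special characters survived the round trip.
            Console.WriteLine(readBack == original);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Keeping the encoding as a single parameter (or a shared constant) makes it harder for the writing side and the reading side to drift apart.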

Ironically:
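You will lose the special characters (if no encoding is specified) only if you are reading an ANSI file using UTF-8 encoding. Reading a UTF-8 file using the ANSI (Windows-1252) encoding does not fail and usually shows no squares or question marks, but the result is still wrong: each special character comes out as two or more unrelated characters (for example, "é" appears as "Ã©").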
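The reverse direction can be checked the same way. This is a hedged sketch with hypothetical class and method names. It assumes .NET Core or .NET 5+, where code page 1252 has to be enabled through CodePagesEncodingProvider; on the .NET Framework the encoding is built in and the RegisterProvider line can be dropped.

```csharp
using System;
using System.Text;

public static class Utf8ReadAsAnsiDemo
{
    public static string DecodeUtf8BytesAsWindows1252(string text)
    {
        // Needed on .NET Core / .NET 5+ (assumption: that is the target here).
        // On the .NET Framework, code page 1252 is available without it.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Encode as UTF-8, then decode the same bytes as Windows-1252.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        Encoding ansi = Encoding.GetEncoding(1252);
        return ansi.GetString(utf8Bytes);
    }

    public static void Main()
    {
        // In UTF-8, 'é' is the two bytes 0xC3 0xA9. Windows-1252 maps
        // 0xC3 to 'Ã' (U+00C3) and 0xA9 to '©' (U+00A9).
        string garbled = DecodeUtf8BytesAsWindows1252("caf\u00E9");

        // Reports whether 'é' turned into the two characters 'Ã' and '©'.
        Console.WriteLine(garbled == "caf\u00C3\u00A9");

        // Reports whether any replacement character appeared.
        Console.WriteLine(garbled.Contains("\uFFFD"));
    }
}
```

No replacement character appears in this direction, which is why the problem is easy to miss until someone looks at the actual text.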
