Muxe Inc Forums
dimases

Unicode in Editor & Viewer


dimases    0

I really need to work with Unicode-encoded files more and more often. Unfortunately, for that I have to use fu...ing Notepad =(((

Is it possible to add this feature?


hi!

 

i am not sure if this is possible in text mode; we can convert from unicode to ansi etc., but if you want to actually edit a file it will probably not work

 

you cannot correctly display all unicode in text mode (special characters etc), that's for sure

 

any other opinions?

(i don't know a lot about unicode)

 

Stefan / AH
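
The conversion from Unicode to a single-byte code page mentioned in this post can be sketched roughly as below. This is only an illustration, not how the program does it: the function name `to_code_page`, the choice of DOS code page 866 and the sample string are all assumptions. It shows why such a conversion is lossy: characters that the code page does not contain are replaced by `?`.

```python
def to_code_page(text: str, code_page: str = "cp866") -> bytes:
    """Encode text for a single-byte code page.

    Characters missing from the code page become '?' (lossy conversion).
    The default cp866 is an assumption made for this example.
    """
    return text.encode(code_page, errors="replace")


def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated hex values."""
    return " ".join(f"{b:02x}" for b in data)


sample = "Привет ★"  # the star has no slot in code page 866
converted = to_code_page(sample)
print(hex_dump(converted))
print(converted.decode("cp866"))
```

Editing the converted bytes and writing them back would lose every character that was turned into `?`, which fits the concern that real editing will probably not work this way.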

dimases    0

oh! can you do it partially, e.g. to view/edit only the visible symbols?

sorry, i don't understand the whole unicode specification and i may be wrong, but if this is possible in a limited form, it is really needed!!! if nothing else, at least for the russian unicode page.

more and more text files are in unicode now =(


hi again!

 

in the todo list i have already made a note to add unicode to viewer, because i have also seen many files in unicode in windows (inf files for example) and the viewer might be easier to support

 

my idea for this was:

check a certain amount of code for this pattern

if this matches the file will be treated as unicode, and it will only display every so you should be able to read it (and at least change the existing file by overwriting the existing text)

 

as you can see this would be limited to normal characters only and not usable for real unicode encoding with other combinations than

 

does simple russian unicode look like too? if so we could give it a try

 

of course, in editor something similar could be possible too, but i have not thought about this because the editor has a different structure!

 

Stefan / AH


hi again!

 

of course i once again didn't remember that there is html interpretation of the post

------------------------------------

in the todo list i have already made a note to add unicode to viewer, because i have also seen many files in unicode in windows (inf files for example) and the viewer might be easier to build this into

 

my idea for this was:

check a certain amount of code for this pattern #nul# #char# #nul# #char#

 

if this matches the file will be treated as unicode, and it will only display every #char# so you should be able to read it (and at least change the existing file by overwriting the existing text)

 

as you can see this would be limited to normal characters only and not usable for real unicode encoding with other combinations than #nul# #char#

 

does simple russian unicode look like #nul# #char# too? if so we could give it a try

 

of course, in editor something similar could be possible too, but i have not thought about this because the editor has a different structure!

----------------------------------------

i hope this makes sense now :P

 

Stefan / AH
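
The idea from this post (check a certain amount of data for the #nul# #char# pattern, then show only every #char#) could look roughly like the sketch below. The names `detect_nul_pattern` and `visible_chars`, the sample size of 512 bytes and the 90% threshold are assumptions made for this example, not anything taken from the program. The check is tried in both byte orders, and a byte order mark is not handled specially.

```python
from typing import Optional


def detect_nul_pattern(data: bytes, sample_size: int = 512,
                       threshold: float = 0.9) -> Optional[int]:
    """Return the position (0 or 1) of the NUL byte inside each pair, or None.

    Only the first sample_size bytes are inspected. A pair matches when
    one byte is NUL and the other one is not.
    """
    sample = data[:sample_size]
    pairs = len(sample) // 2
    if pairs == 0:
        return None
    for nul_pos in (0, 1):
        char_pos = 1 - nul_pos
        hits = sum(
            1
            for i in range(0, pairs * 2, 2)
            if sample[i + nul_pos] == 0 and sample[i + char_pos] != 0
        )
        if hits / pairs >= threshold:
            return nul_pos
    return None


def visible_chars(data: bytes, nul_pos: int) -> bytes:
    """Keep only the #char# byte of every pair."""
    return data[1 - nul_pos::2]


raw = b"\x00H\x00e\x00l\x00l\x00o"
pos = detect_nul_pattern(raw)
if pos is not None:
    print(visible_chars(raw, pos))
```

With this check, Russian text stored as two bytes per character is not detected: Cyrillic letters sit in the range U+0400 to U+04FF, so the second byte of each pair is 04 rather than NUL.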

dimases    0

i know very little about unicode (UTF-8), but as far as i know a character consists of two bytes, where the first is a codepage and the second is the char.

 

for example, look at these chars

in russian windows 1251

91,ae,e5,e0,a0,ad,a8,e2,ec (hex)

 

in unicode they look like

90,d1,90,fe,91,b5,91,b0,90,df,90,fd,90,f1,91,b2,91,bc
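
For comparison, a small sketch that prints how one Russian word looks in several encodings. One assumption is made here: the first byte list above is decoded as DOS code page 866, because under that code page it forms a readable word, while under Windows-1251 it does not. The helper name `hex_bytes` is made up for this example.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes the same way as the lists above: comma-separated hex."""
    return ",".join(f"{b:02x}" for b in data)


# Assumption: the first list from the post is DOS code page 866 data.
cp866_bytes = bytes.fromhex("91aee5e0a0ada8e2ec")
word = cp866_bytes.decode("cp866")
print(word)

for encoding in ("cp866", "cp1251", "utf-16-le", "utf-8"):
    print(f"{encoding:10} {hex_bytes(word.encode(encoding))}")
```

In the utf-16-le row every second byte is 04, not 00, so simple Russian text does not follow the #char #nul pattern. The utf-8 row does not match the second list above byte for byte.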

 

 

look for more info at http://www.unicode.org/

 

 

---

 

good luck! we trust you =)

Garl    0

maybe make an external encoding table file, like *.xlt, to view unicode

with a structure like
#XX#YY#ZZ (XX - codepage, YY - char, ZZ - result char)
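
A rough sketch of how such a table could be read and applied. Everything beyond the #XX#YY#ZZ line layout is an assumption: that the three fields are hex bytes, that the data stores the XX byte before the YY byte, and that unknown pairs are shown as `?`. The names `parse_table` and `translate` are made up for this example.

```python
from typing import Dict, Tuple


def parse_table(text: str) -> Dict[Tuple[int, int], int]:
    """Read lines of the form #XX#YY#ZZ into {(XX, YY): ZZ}.

    Fields are assumed to be hex; malformed hex raises ValueError.
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue  # skip blank lines and anything that is not an entry
        fields = line[1:].split("#")
        if len(fields) != 3:
            continue  # not exactly XX, YY and ZZ
        codepage, char, result = (int(field, 16) for field in fields)
        table[(codepage, char)] = result
    return table


def translate(data: bytes, table: Dict[Tuple[int, int], int],
              fallback: int = ord("?")) -> bytes:
    """Replace every two-byte pair by its single result byte."""
    out = bytearray()
    for i in range(0, len(data) - 1, 2):
        out.append(table.get((data[i], data[i + 1]), fallback))
    return bytes(out)


# Two sample entries: U+0410 and U+0430 mapped to their code page 866 bytes.
table = parse_table("#04#10#80\n#04#30#A0\n")
print(translate(bytes([0x04, 0x10, 0x04, 0x30]), table).decode("cp866"))
```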

dandv    0

OK, I've been working with various Unicode encodings for almost a year - I'm a software localization engineer.

Unicode is really a set of abstract characters, e.g. 'A', 'b', '*' or '★', each identified by a number (its code point) but without any specific byte representation assigned to it.

 

In order to store a Unicode character in a file, you need to encode it. Two popular encoding schemes are:

 

* UCS-2 (or its successor UTF-16), which encodes a character in two bytes (the popular pattern referred to before: #char #nul #char #nul in UCS-2 Little Endian, or #nul #char in UCS-2 Big Endian); UTF-16 needs four bytes for characters outside the Basic Multilingual Plane. The first 128 code points happen to match the familiar ASCII characters

* UTF-8, which encodes a character in a variable number of bytes (1-3 for everything UCS-2 can represent, 4 for characters beyond that). Again, the first 128 characters are common with ASCII.
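
To make the two schemes concrete, here is a minimal sketch that prints the bytes of three sample characters in each encoding. The function name `encoded_forms` and the sample characters are chosen only for this illustration; the codec names are the ones Python ships with.

```python
def encoded_forms(ch: str) -> dict:
    """Return the bytes of one character in the encodings discussed above."""
    return {name: ch.encode(name) for name in ("utf-16-le", "utf-16-be", "utf-8")}


def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated hex values."""
    return " ".join(f"{b:02x}" for b in data)


for ch in ("A", "я", "★"):
    forms = encoded_forms(ch)
    line = "  ".join(f"{name}: {hex_dump(raw)}" for name, raw in forms.items())
    print(f"U+{ord(ch):04X}  {line}")
```

The ASCII letter shows the #char #nul and #nul #char patterns in the two UTF-16 byte orders and stays a single byte in UTF-8, while the other two characters need two and three bytes in UTF-8.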

 

Unfortunately, to display text in text mode, you store a character byte at B800:offset and the color attribute in the byte next to it, so there is no way to display more than 256 different characters at a time. So no chance for anything close to the real Unicode.
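
The layout described above can be simulated in a few lines. This is only a model of the screen memory, not real hardware access: it assumes the usual 80x25 color text mode and uses a plain `bytearray` in place of the memory at B800. The names `cell_offset` and `put_char` are made up for this example.

```python
COLUMNS, ROWS = 80, 25  # assumption: standard 80x25 color text mode


def cell_offset(row: int, col: int) -> int:
    """Offset of a screen cell; every cell is 2 bytes (character, attribute)."""
    return (row * COLUMNS + col) * 2


def put_char(buffer: bytearray, row: int, col: int,
             char_code: int, attribute: int = 0x07) -> None:
    """Write one cell. The character slot is a single byte, so only 0..255 fit."""
    if not 0 <= char_code <= 0xFF:
        raise ValueError("a text-mode cell holds only one byte per character")
    offset = cell_offset(row, col)
    buffer[offset] = char_code
    buffer[offset + 1] = attribute


screen = bytearray(COLUMNS * ROWS * 2)  # stand-in for the memory at B800:0000
put_char(screen, 1, 2, 0x41, 0x1F)      # 'A' with attribute byte 0x1F
print(cell_offset(1, 2), screen[164], screen[165])
```

A code point such as U+2605 does not fit into the one-byte character slot, which is the limit the paragraph above describes.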

 

I see two things that could be done:

1. if a (portion of a) file consists entirely of the "#char #nul" (or #nul #char) pattern (this corresponds to standard ASCII gratuitously encoded in UCS-2), NDN could display that portion stripping the null bytes

2. searching could have a checkbox to also search for the UCS-2 LE and UCS-2 BE encodings of the searched string. This is relatively simple to implement and quite useful for searching for strings inside EXE files
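
The second suggestion could be sketched as follows. This is an illustration under assumptions, not a description of the program's search: the names `search_patterns` and `find_all` are invented, and the search string is assumed to be plain ASCII.

```python
def search_patterns(text: str) -> dict:
    """Byte patterns for one search string (assumed to be plain ASCII)."""
    return {
        "ansi": text.encode("ascii"),
        "ucs2-le": text.encode("utf-16-le"),
        "ucs2-be": text.encode("utf-16-be"),
    }


def find_all(data: bytes, text: str) -> list:
    """Return sorted (offset, encoding name) pairs for every hit."""
    hits = []
    for name, pattern in search_patterns(text).items():
        start = data.find(pattern)
        while start != -1:
            hits.append((start, name))
            start = data.find(pattern, start + 1)
    return sorted(hits)


# Sample data containing the same string in all three forms.
blob = (b"\x01" + "Setup".encode("utf-16-le")
        + b"\x01" + b"Setup"
        + b"\x01" + "Setup".encode("utf-16-be"))
for offset, name in find_all(blob, "Setup"):
    print(offset, name)
```

Because the two UCS-2 patterns differ only in byte order, the same bytes can match both when a NUL byte happens to precede the string, so a real implementation may want to merge hits that overlap.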

 

Hope that helps,

Dan Dascalescu
