USAGE:
INVALID-UTF? data /utf num
DESCRIPTION:
Checks for proper UTF encoding and returns NONE if correct or position where the error occurred.
INVALID-UTF? is a function value.
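
As a rough illustration of the return convention described above, the sketch below checks two binaries in the default UTF-8 mode. The sample values and the words good, bad, data and pos are hypothetical and chosen only for this example. A NONE result means no invalid sequence was found; otherwise the result is the binary positioned at the first problem, so INDEX? reports where it is.

```rebol
; Hypothetical sample data, not taken from the original page.
good: #{48C3A9}    ; 48 is ASCII "H"; C3 A9 is a two-byte sequence
bad:  #{488041}    ; 80 is a continuation byte with no lead byte before it

foreach data reduce [good bad] [
    either pos: invalid-utf? data [
        print ["invalid sequence at index" index? pos]
    ][
        print "no invalid sequence found"
    ]
]
```

Because the result is a position in the original series rather than a number, it can also be passed to COPY or SKIP to inspect the offending bytes.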
ARGUMENTS:
- data -- (Type: binary)
REFINEMENTS:
- /utf -- Check encodings other than UTF-8
- num -- Bit size - positive for BE negative for LE (Type: integer)
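
Judging by the SWITCH in the source, num is expected to be 8, 16, -16, 32 or -32, and any other value raises an invalid-arg error. The sketch below exercises the two 16-bit forms with hypothetical sample data (the words pair-be, pair-le and lone-be are made up for this example). It deliberately avoids the 32-bit forms, since the loops for 32 and -32 in the listing do not appear to advance through the data.

```rebol
; Hypothetical sample data: U+1F600 as a surrogate pair (D83D DE00).
pair-be: #{D83DDE00}    ; big-endian byte order
pair-le: #{3DD800DE}    ; the same two 16-bit units, little-endian
lone-be: #{DE00}        ; a low surrogate with no high surrogate before it

print mold invalid-utf?/utf pair-be 16
print mold invalid-utf?/utf pair-le -16
print mold invalid-utf?/utf lone-be 16
```

MOLD is used so that a NONE result and a returned binary position are both printed in readable form.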
(SPECIAL ATTRIBUTES)
- catch
SOURCE CODE
invalid-utf?: func [
    {Checks for proper UTF encoding and returns NONE if correct or position where the error occurred.}
    [catch]
    data [binary!]
    /utf "Check encodings other than UTF-8"
    num [integer!] "Bit size - positive for BE negative for LE"
    /local ascii utf8+1 utf8+2 utf8+3 utf8rest pos hi lo w c
][
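    ; Byte classes for UTF-8, stored as 256-bit bitsets (low bit first in each byte).
    ascii: make bitset! #{
        FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF00000000000000000000000000000000
    } ; bytes 00-7F: single-byte characters
    utf8+1: make bitset! #{
        000000000000000000000000000000000000000000000000FCFFFFFF00000000
    } ; bytes C2-DF: lead byte followed by one more byte
    utf8+2: make bitset! #{
        00000000000000000000000000000000000000000000000000000000FFFF0000
    } ; bytes E0-EF: lead byte followed by two more bytes
    utf8+3: make bitset! #{
        0000000000000000000000000000000000000000000000000000000000001F00
    } ; bytes F0-F4: lead byte followed by three more bytes
    utf8rest: make bitset! #{
        00000000000000000000000000000000FFFFFFFFFFFFFFFF0000000000000000
    } ; bytes 80-BF: continuation bytes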
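    switch/default any [num 8] [    ; NUM is NONE without /utf, so 8 (UTF-8) is the default
        8 [    ; UTF-8
            ; POS is set at the start of each sequence, so when PARSE stops
            ; early it is left at the first sequence that did not match.
            unless parse/all/case data [(pos: none) any [
                pos: ascii | utf8+1 utf8rest |
                utf8+2 2 utf8rest | utf8+3 3 utf8rest
            ]] [as-binary pos]
        ]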
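        16 [    ; UTF-16, big-endian
            pos: data
            while [not tail? pos] [
                hi: first pos
                case [
                    none? lo: pick pos 2 [break/return pos]        ; odd trailing byte
                    55296 > w: hi * 256 + lo [pos: skip pos 2]    ; below D800: ordinary unit
                    57343 < w [pos: skip pos 2]                   ; above DFFF: ordinary unit
                    56319 < w [break/return pos]                  ; DC00-DFFF: low surrogate first
                    none? hi: pick pos 3 [break/return pos]       ; high surrogate, then truncated
                    none? lo: pick pos 4 [break/return pos]
                    56320 > w: hi * 256 + lo [break/return pos]   ; second unit below DC00
                    57343 >= w [pos: skip pos 4]                  ; DC00-DFFF: valid surrogate pair
                    ; no branch handles a second unit above DFFF, so POS is not advanced in that case
                ]
                none    ; a loop that runs to the tail yields NONE (valid)
            ]
        ]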
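        -16 [    ; UTF-16, little-endian: same checks with the byte order swapped
            pos: data
            while [not tail? pos] [
                lo: first pos
                case [
                    none? hi: pick pos 2 [break/return pos]        ; odd trailing byte
                    55296 > w: hi * 256 + lo [pos: skip pos 2]    ; below D800: ordinary unit
                    57343 < w [pos: skip pos 2]                   ; above DFFF: ordinary unit
                    56319 < w [break/return pos]                  ; DC00-DFFF: low surrogate first
                    none? lo: pick pos 3 [break/return pos]       ; high surrogate, then truncated
                    none? hi: pick pos 4 [break/return pos]
                    56320 > w: hi * 256 + lo [break/return pos]   ; second unit below DC00
                    57343 >= w [pos: skip pos 4]                  ; DC00-DFFF: valid surrogate pair
                    ; no branch handles a second unit above DFFF, so POS is not advanced in that case
                ]
                none    ; a loop that runs to the tail yields NONE (valid)
            ]
        ]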
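        32 [    ; UTF-32, big-endian
            pos: data
            ; As listed, POS is never advanced inside this loop.
            while [not tail? pos] [
                if any [
                    4 > length? pos                 ; fewer than four bytes left
                    negative? c: to-integer pos     ; high bit set
                    1114111 < c                     ; above 10FFFF, the last code point
                ] [break/return pos]
            ]
        ]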
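        -32 [    ; UTF-32, little-endian
            pos: data
            ; As listed, POS is never advanced inside this loop.
            while [not tail? pos] [
                if any [
                    4 > length? pos                 ; fewer than four bytes left
                    ; reverse four bytes in place, convert, then reverse them back
                    negative? c: also to-integer reverse/part pos 4 reverse/part pos 4
                    1114111 < c                     ; above 10FFFF, the last code point
                ] [break/return pos]
            ]
        ]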
    ] [
        throw-error 'script 'invalid-arg num
    ]
]
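
The five bitsets at the top of the function are easier to read as byte ranges. The sketch below is an assumed equivalent built with CHARSET; the ranges come from reading the bitset bytes in the listing low bit first, and the loop with its sample bytes is hypothetical. It is meant as a reading aid, not as the way the function is actually defined.

```rebol
; Assumed equivalents of the bitsets in the listing, written as byte ranges.
ascii:    charset [#"^(00)" - #"^(7F)"]    ; single-byte characters
utf8+1:   charset [#"^(C2)" - #"^(DF)"]    ; lead byte, one byte follows
utf8+2:   charset [#"^(E0)" - #"^(EF)"]    ; lead byte, two bytes follow
utf8+3:   charset [#"^(F0)" - #"^(F4)"]    ; lead byte, three bytes follow
utf8rest: charset [#"^(80)" - #"^(BF)"]    ; continuation bytes

; FIND on a bitset reports whether a character belongs to it.
foreach byte [#"^(41)" #"^(C3)" #"^(A9)" #"^(FF)"] [
    print [
        mold byte
        "two-byte lead:" found? find utf8+1 byte
        "continuation:" found? find utf8rest byte
    ]
]
```

Read this way, the UTF-8 branch checks only that each lead byte is followed by the right number of continuation bytes. A sequence such as E0 80 80 satisfies the `utf8+2 2 utf8rest` rule, so the check appears to be structural rather than a full test for well-formed UTF-8.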
This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.