ffutf
Utility functions for manipulating Unicode Transformation Format (UTF) compatible data
File manipulation functions

Functions

uint32_t fgetc_u (FILE *fp, int type)
 
size_t fwrite_u (uint32_t c, FILE *fp, int type)
 

Detailed Description

Note
If an invalid FILE pointer is specified, the resulting behaviour is selected at compile time through the FFUTF_FAILSAFELY macro. If the macro is defined, a NULL FILE pointer causes the program to abort; otherwise, a segmentation fault occurs. Any expansion associated with the macro is ignored.
If UTF16 or UTF32 is specified without a byte order, big-endian ordering is assumed.
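
Because the expansion of FFUTF_FAILSAFELY is ignored, the macro is presumably tested for presence only. The following is a minimal sketch of such a check, not the library's actual code; `checked_fp` is a hypothetical helper introduced here for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper illustrating a presence-only macro test.
 * Defining FFUTF_FAILSAFELY with any expansion (for example
 * -DFFUTF_FAILSAFELY or -DFFUTF_FAILSAFELY=0) selects the abort path. */
static FILE *checked_fp(FILE *fp)
{
#ifdef FFUTF_FAILSAFELY
    if (fp == NULL) {
        abort();    /* fail deliberately instead of crashing later */
    }
#endif
    return fp;      /* without the macro, the pointer is passed through unchecked */
}
```

With this pattern, `-DFFUTF_FAILSAFELY=0` still enables the abort path, since `#ifdef` does not look at the value.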

Function Documentation

◆ fgetc_u()

uint32_t fgetc_u ( FILE *  fp,
int  type 
)

Extends the standard fgetc function to be Unicode-aware. As a result, the file position advances by one to four bytes, depending on what is parsed.

Parameters
fp  Valid FILE pointer.
type  Valid type (as defined in Unicode types).
Returns
A valid UTF32 character on success. Otherwise, one of the following is returned:
  • UTFERR upon a non-standard error, in which case ffutf_errno is set to a non-zero number.
  • EOF upon a standard error or end of file; at the time of writing, the two can be told apart with feof() and ferror().
Note
If NONE is specified as the type, a raw byte cast to uint32_t is read (similar behaviour to fgetc).
Importantly, any byte-order mark encountered is treated as if it did not occur at the start of the stream.
If an invalid type is specified, the format is assumed to be a raw byte.
Upon error, the file position is not restored to where it was before the function was called. For example, when reading a 4-byte UTF8 character, if the function has parsed two bytes and then does not see a valid third byte, it returns with the file position at the start of the fourth byte rather than at the first byte.
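
The file-position behaviour described in the note can be illustrated with a self-contained sketch of a UTF8 reader. This is not the ffutf implementation: `sketch_fgetc_utf8`, `SKETCH_EOF`, and `SKETCH_ERR` are hypothetical names standing in for fgetc_u, EOF, and UTFERR, and the sketch does not reject overlong encodings or surrogate code points.

```c
#include <stdint.h>
#include <stdio.h>

#define SKETCH_EOF 0xFFFFFFFFu  /* hypothetical sentinel, stands in for EOF */
#define SKETCH_ERR 0xFFFFFFFEu  /* hypothetical sentinel, stands in for UTFERR */

/* Decode one UTF-8 sequence from fp. Bytes already consumed are not
 * pushed back when a malformed sequence is found. */
static uint32_t sketch_fgetc_utf8(FILE *fp)
{
    int b = fgetc(fp);
    if (b == EOF) return SKETCH_EOF;

    uint32_t c;
    int extra;                       /* continuation bytes still expected */
    if (b < 0x80)                { c = (uint32_t)b;        extra = 0; }
    else if ((b & 0xE0) == 0xC0) { c = (uint32_t)b & 0x1F; extra = 1; }
    else if ((b & 0xF0) == 0xE0) { c = (uint32_t)b & 0x0F; extra = 2; }
    else if ((b & 0xF8) == 0xF0) { c = (uint32_t)b & 0x07; extra = 3; }
    else return SKETCH_ERR;          /* stray continuation or invalid lead byte */

    while (extra-- > 0) {
        b = fgetc(fp);
        if (b == EOF) return SKETCH_EOF;
        if ((b & 0xC0) != 0x80) return SKETCH_ERR;  /* position stays past b */
        c = (c << 6) | ((uint32_t)b & 0x3F);
    }
    return c;
}
```

Given the bytes F0 9F 41 80, the sketch consumes three bytes before it detects the invalid third byte, so the file position is left at the start of the fourth byte, as in the example in the note.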

◆ fwrite_u()

size_t fwrite_u ( uint32_t  c,
FILE *  fp,
int  type 
)

Analogous to the fputc standard library function, with added Unicode awareness. The fwrite prefix was chosen because fwrite, rather than fputc, is used internally; consequently, the return values are similar to those of fwrite.

Parameters
c  A valid UTF32 character.
fp  Valid FILE pointer.
type  Valid type (as defined in Unicode types).
Returns
The number of bytes written, which depends on the type as follows:
Type   Bytes expected
UTF8   1 to 4 inclusive
UTF16  2 or 4 only
UTF32  4 only
NONE   4 only

Otherwise, one of the following is returned:

  • UTFERR upon a non-standard error, in which case ffutf_errno is set to a non-zero number.
  • 0 upon a standard error or end of file; at the time of writing, the two can be told apart with feof() and ferror().
Note
If NONE is specified as a type, a UTF32 character in big-endian ordering will be written out.
If an invalid type is specified, the format is assumed to be a UTF32 character in big-endian ordering.
Upon error, the file position is not restored to where it was before the function was called.
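
The byte counts in the table follow from the encodings themselves. The sketch below shows one way to write a code point as UTF8 or as big-endian UTF16 with fwrite and return the number of bytes written. It is an illustration under stated assumptions, not the library's code: `sketch_fwrite_utf8` and `sketch_fwrite_utf16be` are hypothetical names, they return 0 on any failure instead of distinguishing UTFERR, and they do not reject surrogate code points.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: encode c as UTF-8 and write it with fwrite.
 * Returns the number of bytes written (1 to 4), or 0 on failure. */
static size_t sketch_fwrite_utf8(uint32_t c, FILE *fp)
{
    unsigned char buf[4];
    size_t n;

    if (c < 0x80) {
        buf[0] = (unsigned char)c;
        n = 1;
    } else if (c < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (c >> 6));
        buf[1] = (unsigned char)(0x80 | (c & 0x3F));
        n = 2;
    } else if (c < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (c >> 12));
        buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (c & 0x3F));
        n = 3;
    } else if (c <= 0x10FFFF) {
        buf[0] = (unsigned char)(0xF0 | (c >> 18));
        buf[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (c & 0x3F));
        n = 4;
    } else {
        return 0;                       /* beyond the Unicode range */
    }
    return fwrite(buf, 1, n, fp) == n ? n : 0;
}

/* Hypothetical helper: encode c as big-endian UTF-16.
 * Returns 2 for BMP code points, 4 for a surrogate pair, 0 on failure. */
static size_t sketch_fwrite_utf16be(uint32_t c, FILE *fp)
{
    unsigned char buf[4];
    size_t n;

    if (c < 0x10000) {
        buf[0] = (unsigned char)(c >> 8);
        buf[1] = (unsigned char)(c & 0xFF);
        n = 2;
    } else if (c <= 0x10FFFF) {
        uint32_t v  = c - 0x10000;
        uint32_t hi = 0xD800 | (v >> 10);    /* high surrogate */
        uint32_t lo = 0xDC00 | (v & 0x3FF);  /* low surrogate */
        buf[0] = (unsigned char)(hi >> 8);
        buf[1] = (unsigned char)(hi & 0xFF);
        buf[2] = (unsigned char)(lo >> 8);
        buf[3] = (unsigned char)(lo & 0xFF);
        n = 4;
    } else {
        return 0;                       /* beyond the Unicode range */
    }
    return fwrite(buf, 1, n, fp) == n ? n : 0;
}
```

As with fwrite_u, a failed write in this sketch leaves the file position wherever fwrite stopped; no attempt is made to seek back.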