Ascii file from mestrenova 103/31/2024

If you apply utf8_encode() to a string that is already UTF-8, it will return garbled UTF-8 output. I made a function that addresses all these issues. Here's a transcription of another answer I gave to a similar question:

How would you expect recode to know that a file is Windows-1252? In theory, I believe almost any file is a valid Windows-1252 file, as it maps nearly every possible byte value to a character (only a handful are left unassigned). Now there are certainly characteristics which would strongly suggest that a file is UTF-8 (if it starts with the UTF-8 BOM, for example), but they wouldn't be definitive.

One option would be to detect whether it's actually a completely valid UTF-8 file first, I suppose. I'm not familiar with the recode tool itself, but you might want to see whether it's capable of recoding a file from and to the same encoding. If you do this with an invalid file (i.e. one which contains invalid UTF-8 byte sequences), it may well convert the invalid sequences into question marks or something similar. At that point you could detect that a file is valid UTF-8 by recoding it to UTF-8 and seeing whether the input and output are identical.

Alternatively, do this programmatically rather than using the recode utility. It would be quite straightforward in C#, for example.

Just to reiterate though: all of this is heuristic. If you really don't know the encoding of a file, nothing is going to tell you it with 100% accuracy.
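As a rough illustration of doing the validity check programmatically, here is a minimal Python sketch (Python rather than the C# mentioned above, purely as a choice for this example). The function names `is_valid_utf8` and `guess_and_decode` are hypothetical and are not the function referred to earlier. It assumes the only two candidates are UTF-8 and Windows-1252, and it is just as heuristic as everything else here.

```python
import codecs
from typing import Tuple


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8.

    This is the "recode to UTF-8 and compare" idea done in code:
    a strict decode fails on any invalid byte sequence.
    """
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def guess_and_decode(data: bytes) -> Tuple[str, str]:
    """Heuristically pick an encoding and return (encoding_name, text).

    Assumption for this sketch: the input is either UTF-8 or Windows-1252.
    """
    if is_valid_utf8(data):
        if data.startswith(codecs.BOM_UTF8):
            # A BOM is a strong hint, not proof; "utf-8-sig" strips it.
            return "utf-8-sig", data.decode("utf-8-sig")
        return "utf-8", data.decode("utf-8")
    # Fallback: Python's cp1252 codec rejects the few unassigned bytes,
    # so replace them instead of raising.
    return "cp1252", data.decode("cp1252", errors="replace")


# Treating text that is already UTF-8 as a single-byte encoding garbles it,
# which is the kind of double encoding described for utf8_encode().
garbled = "é".encode("utf-8").decode("latin-1")
```

For example, `guess_and_decode(b"caf\xe9")` falls through to the Windows-1252 branch because a lone `0xE9` byte is not valid UTF-8, while the same word encoded as UTF-8 is accepted by the first branch. A short Windows-1252 file that happens to be valid UTF-8 would still be misclassified, which is why this can only ever be a guess.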