    Bug 1402247 - Use encoding_rs for XPCOM string encoding conversions. r=Nika,erahm,froydnj. · 3edc6013
    Henri Sivonen authored
    Correctness improvements:
    
     * UTF errors are handled safely per spec instead of dangerously truncating
       strings.
    
     * There are fewer converter implementations.
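
As a rough illustration of the first point, the sketch below contrasts replacement-based error handling with truncation at the first error. It uses only the Rust standard library, not encoding_rs or the XPCOM string API, and the function names `decode_with_replacement` and `decode_truncating` are hypothetical names chosen for this example.

```rust
/// Lossy decode: each malformed sequence becomes U+FFFD and decoding continues.
fn decode_with_replacement(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// Truncating decode, shown only for contrast: everything from the first
/// malformed byte onward is silently dropped.
fn decode_truncating(bytes: &[u8]) -> &str {
    match std::str::from_utf8(bytes) {
        Ok(s) => s,
        // `valid_up_to()` is the length of the longest valid UTF-8 prefix.
        Err(e) => std::str::from_utf8(&bytes[..e.valid_up_to()]).unwrap(),
    }
}

fn main() {
    let input = b"ab\xFFcd"; // 0xFF can never appear in well-formed UTF-8
    println!("{:?}", decode_with_replacement(input));
    println!("{:?}", decode_truncating(input));
}
```

With replacement the text after the bad byte survives; with truncation it is lost, which is the behavior this change moves away from.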
    
    Performance improvements:
    
     * The old code did exact buffer length math, which meant doing UTF math twice
       on each input string (once for length calculation and another time for
       conversion). Exact length math is more complicated when handling errors
       properly, which the old code didn't do. The new code does UTF math on the
       string content only once (when converting) but risks allocating more than
       once. There are heuristics in place to lower the probability of
       reallocation in cases where the double math avoidance isn't enough of a
       saving to absorb an allocation and memcpy.
    
     * Previously, in UTF-16 <-> UTF-8 conversions, an ASCII prefix was optimized
       but a single non-ASCII code point pessimized the rest of the string. The
       new code tries to get back on the fast ASCII path.
    
     * UTF-16 to Latin1 conversion guarantees less about handling of out-of-range
       input to eliminate an operation from the inner loop on x86/x86_64.
    
     * When assigning to a pre-existing string, the new code tries to reuse the
       old buffer instead of first releasing the old buffer and then allocating a
       new one.
    
     * When reallocating from the new code, the memcpy covers only the data that
       is part of the logical length of the old string instead of memcpying the
       whole capacity. (For old callers, the old excess memcpy behavior is
       preserved due to bogus callers. See bug 1472113.)
    
     * UTF-8 strings in XPConnect that are in the Latin1 range are passed to
       SpiderMonkey as Latin1.
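
The trade-off described in the first performance item (exact length math versus a single conversion pass) can be sketched as follows. This is a simplified illustration using only the Rust standard library. It reserves the worst case of three UTF-8 bytes per UTF-16 code unit, which is an assumption made for the example and not the heuristic the new code actually uses. Both function names are hypothetical.

```rust
/// Two passes over the input: one to compute the exact UTF-8 length,
/// one to convert. Unpaired surrogates become U+FFFD in both passes.
fn utf16_to_utf8_two_pass(src: &[u16]) -> String {
    let exact: usize = char::decode_utf16(src.iter().copied())
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER).len_utf8())
        .sum();
    let mut out = String::with_capacity(exact);
    out.extend(
        char::decode_utf16(src.iter().copied())
            .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER)),
    );
    out
}

/// One pass over the input: reserve an upper bound, then convert once.
/// A BMP code unit needs at most 3 bytes, a surrogate pair (2 units)
/// needs 4 bytes, and an unpaired surrogate becomes U+FFFD (3 bytes),
/// so `3 * src.len()` is always enough.
fn utf16_to_utf8_one_pass(src: &[u16]) -> String {
    let mut out = String::with_capacity(src.len() * 3);
    out.extend(
        char::decode_utf16(src.iter().copied())
            .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER)),
    );
    out
}

fn main() {
    let mut units: Vec<u16> = "a\u{E9}\u{20AC}\u{1F600}".encode_utf16().collect();
    units.push(0xD800); // unpaired high surrogate
    assert_eq!(utf16_to_utf8_one_pass(&units), utf16_to_utf8_two_pass(&units));
}
```

The one-pass version may over-allocate, which is the cost the heuristics mentioned above are meant to keep in check.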
    
    New features:
    
     * Conversion between UTF-8 and Latin1 is added in order to enable faster
       future interop between Rust code (or otherwise UTF-8-using code) and text
       node and SpiderMonkey code that uses Latin1.
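
A minimal sketch of what conversion between UTF-8 and Latin1 means, taking Latin1 here to be the code points U+0000 through U+00FF stored one per byte. It uses only the Rust standard library. The function names are hypothetical and are not the API added by this change, and returning `None` for out-of-range input is a choice made for this example only.

```rust
/// Converts UTF-8 text to Latin1 bytes.
/// Returns `None` if any code point is above U+00FF.
fn utf8_to_latin1(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| {
            let cp = c as u32;
            if cp <= 0xFF { Some(cp as u8) } else { None }
        })
        .collect()
}

/// Converts Latin1 bytes to UTF-8 text. Every byte maps to the code
/// point with the same numeric value, so this cannot fail.
fn latin1_to_utf8(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

fn main() {
    let latin1 = utf8_to_latin1("caf\u{E9}").expect("all code points fit in Latin1");
    println!("{:02X?}", latin1);
    println!("{}", latin1_to_utf8(&latin1));
}
```

The same range check is what decides whether a UTF-8 string can be handed over as Latin1 at all.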
    
    MozReview-Commit-ID: JaJuExfILM9