[FFmpeg-trac] #8466(avformat:closed): av_url_split violates RFC2396, fails to parse URLs

FFmpeg trac at avcodec.org
Sun Feb 16 19:41:10 EET 2020


#8466: av_url_split violates RFC2396, fails to parse URLs
-------------------------------------+------------------------------------
             Reporter:  aitte        |                    Owner:
                 Type:  defect       |                   Status:  closed
             Priority:  normal       |                Component:  avformat
              Version:  unspecified  |               Resolution:  fixed
             Keywords:  http         |               Blocked By:
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+------------------------------------

Comment (by aitte):

 Replying to [comment:8 cus]:
 > Replying to [comment:7 aitte]:
 > > Note how the code is adding a "/" slash if the first character of the
 path is a question mark. Is there any problem if the URL is
 "https://example.com#foo"?
 > >
 > > It looks to me like the hashmark check a few lines earlier (
 https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=987ce96d41d21a120e6109f9bb1e3a9dcddbf30b
 ) puts a null byte at the hashmark position to "erase the hashmark and
 everything after it" already, so I guess there is no problem! Just asking
 to be sure. :-)
 >
 > Yeah, that should be fine as well, empty path (it becomes empty after
 removing the hashmark) was converted to / even before this patchset.

 Hey, thanks again, I really almost can't believe how much cleaner the new
 URL code is. The old one you replaced (ls, ls2 system) was a convoluted
 mess. Hehe. This change is truly brilliant:
 https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=554576b6cfe79a91d37e14d3617ca417562085db

 And ahh yeah I see what you mean now about the empty path converting to
 "/", at this area of the code:
 https://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/http.c;h=d0df1b6d585da422aef14eb6128eb319ddd9ac98;hb=a3d8875df1f71e62a36c583cee638ab1d51a71d3#l220

 So "http://example.com#tag" would be split via "av_url_split()" into
 "http", "example.com" and "#tag" (path1). Then path1 variable gets
 truncated to "\0tag", and then it says "if (path1[0] == '\0') { path =
 "/"; }" which means it should properly be converting the path to "/" when
 there was just a hashmark after the domain name.

 And if the path is non-empty but starts with "?", your new code formats it
 as "/?" properly. And lastly, if the path is normal and starts with "/"
 already then it's used as-is.

 Alright this looks perfect.

 You've done a fantastic job!

 - Cleaned up the buggy, messy av_url_split, wohoo!
 - Added support for "example.com?x" and "example.com#x" URLs in the URL
 splitter.
 - Properly removing hashmark fragments from paths when connecting to
 target URLs. That would have caused problems in the old code which kept
 them. Great catch!
 - Added support for auto-inserting "/" when connecting to target URLs that
 didn't begin their path with a slash already.

 This looks like a good implementation of RFC3986. Thank you so much for
 the great work!

 :-)

--
Ticket URL: <https://trac.ffmpeg.org/ticket/8466#comment:9>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker


More information about the FFmpeg-trac mailing list