[FFmpeg-devel] [PATCH] WIP/lavu: add string interpolation helper.

Clément Bœsch ubitux at gmail.com
Wed Feb 1 23:02:19 CET 2012


On Wed, Feb 01, 2012 at 07:35:53PM +0100, Nicolas George wrote:
> Le tridi 13 pluviôse, an CCXX, Clément Bœsch a écrit :
> > From: Clément Bœsch <clement.boesch at smartjog.com>
> > 
> > ---
> > Hi,
> > 
> > Just a PoC to do some string interpolation easily. This can have various
> > applications, for instance in segmenter, the -report option or drawtext filter.
> > 
> > Right now, the hash function is just a simple additive one (feel free to
> > propose anything better), with a fixed size hashtable + linked list in case of
> > collisions.
> 
> If the hash table does not allow deletion, then double hashing is probably a
> better solution: try cell H; if it is already used, try (H+H2)%SIZE, the
> (H+2*H2)%SIZE, (H+3*H2)%SIZE, etc., with H2 a second hash function. Double
> the table size (and re-insert all items) if it becomes more than 3/4 full.
> 

I'm not sure I want to omit the deletion feature. I'm not familiar at all
with hash tables (this is actually the first time I was implementing one),
so I was just copying around various hash functions, like the new proposed
one which is supposedly more appropriate with simple ASCII key.

[...]
> > +static uint32_t get_hash(const char *key, size_t len, int tbl_len)
> > +{
> > +    size_t i;
> > +    uint32_t hash = (uint32_t)len;
> > +    for (i = 0; i < len; i++)
> > +        hash += key[i];
> 
> Some random multiplication would probably yield better results: with that
> function "xy" and "yx" will always collide.
> 
> > +    return hash % tbl_len;
> > +}
> > +
> > +static struct node *new_node(const char *key, const char *value)
> > +{
> > +    struct node *node = av_mallocz(sizeof(*node));
> 
> Unchecked allocation.
> 
> And I really do not like those kind of small allocations: the amount of
> overhead is tremendous. Better alloc a bunch of them at a time and consume
> them as needed.
> 
> (But we do not need the allocation with double hashing.)
> 

I'm now curious about the double hashing method, so I'll have a look.

> > +    node->key   = key;
> > +    node->value = value;
> > +    return node;
> > +}
> > +
> > +int av_stringinterp_ref(AVStringInter *ctx, const char *key, const char *value)
> > +{
> > +    int id = get_hash(key, strlen(key), FF_ARRAY_ELEMS(ctx->hashtable));
> > +    struct node *node = ctx->hashtable[id];
> > +    struct node *insert_after = NULL;
> 
> 	struct node **insert_at = &ctx->hashtable[id];
> 
> > +
> > +    while (node) {
> > +        if (!strcmp(key, node->key)) {
> > +            node->value = value;
> > +            return 0;
> > +        }
> > +        insert_after = node;
> 
> 	  insert_at = &node->next;
> 
> > +        node = node->next;
> > +    }
> > +    if (insert_after)
> > +        insert_after->next = new_node(key, value);
> > +    else
> > +        ctx->hashtable[id] = new_node(key, value);
> 
>       *insert_at = new_node(key, value);
> 
> Classic trick, often much simpler when writing linked lists.
> 

Yes sure good idea.

> > +    return 0;
> > +}
> > +
> > +static const char *get_value(AVStringInter *ctx, const char *key)
> > +{
> > +    int id;
> > +    size_t keylen;
> > +    const char *p = strchr(key, ')');
> 
> I do not find this part very elegant: I would like it better if the string
> -> string mapping API were clearly separated.
> 
> OTOH, using (pointer, length) pairs (or (start_pointer, end_pointer)) rather
> than NUL-terminated strings is the way to go to avoid unnecessary copies.
> 

I wasn't sure about extracting my poor hash implementation from the string
mapping feature; also the chosen hashing method could be chosen more
appropriately that way (instead of a generic one). Of course, this is just
a PoC I wrote this morning, I didn't wanted to make a huge addition to the
API just to present the feature.

> > +    struct node *node;
> > +
> > +    if (!p)
> > +        return NULL;
> > +    keylen = p - key;
> > +    id = get_hash(key, keylen, FF_ARRAY_ELEMS(ctx->hashtable));
> > +    for (node = ctx->hashtable[id]; node; node = node->next)
> > +        if (!strncmp(key, node->key, keylen))
> > +            return node->value;
> > +    return NULL;
> > +}
> > +
> > +char *av_stringinterp_interpolate(AVStringInter *ctx, const char *s)
> 
> The name is self-redundant.
> 

I wanted to keep "av_stringinterp_" prefix, not sure how it should
be renamed…

> > +{
> > +    char res[256]; // FIXME
> > +    char *p = res;
> > +    char *pend = p + sizeof(res);
> > +
> > +    while (*s) {
> > +        if (*s == '%' && *(s+1) == '(') {
> 
> Do you have a reason to favor %(foo) like... no language I know of, instead
> of maybe ${foo} like shell and perl?
> 

It was "inspired" from the python dict matching format which allows such
thing:
  data = {"foo": "bla", "bar": 12}
  "%(foo)s %(bar)d" % data
==>
  "bla - 12"

Obviously, it can be changed to something else, but it has the advantage
of not needing much shell escaping.

> > +            const char *value = get_value(ctx, s + 2);
> > +            if (value) {
> > +                p += snprintf(p, pend-p, "%s", value);
> 
> If p goes beyond pend, (size_t)(pend-p) becomes huge. pend - FFMIN(pend, p)
> should do the trick.
> 

This really was a quick hack, as imply the FIXME on the buffer. I actually
changed it pretty soon after sending the patch.

> > +                s  = strchr(s, ')') + 1;
> 
> Segfault if the user gives a string without the matching ')'.
> 

It's in the "if (value)" scope, so it can't fail AFAIK.

> > +                continue;
> > +            }
> > +        }
> > +        p += snprintf(p, pend-p, "%c", *s);
> 
> Maybe we can dispense with the printf for just a char.
> 

That was intended to print until '%' at first, and also have an easy way
to check for boundaries. This is changed to something better.

> > +        s++;
> > +    }
> > +    return av_strdup(res);
> 
> For this kind of situations, I like to have two runs of exactly the same
> code: once computing the required space but not writing anything, and a
> second time actually writing in the buffer with exactly the necessary size.
> 

I changed that by keeping the buffer in the context; it has also drawbacks
that could be discussed though.

> But whatever the solution, I believe we could do with a set of functions
> av_buf_printf(buf, fmt, ...).
> 

I'll look at what you proposed and update or just drop the patch
accordingly.

[...]
> > +/**
> > + * @file
> > + * string interpolation header
> > + */
> > +
> > +#ifndef AVUTIL_STRINGINTERP_H
> > +#define AVUTIL_STRINGINTERP_H
> > +
> > +typedef struct AVStringInter AVStringInter;
> > +
> > +AVStringInter *av_stringinterp_alloc(void);
> > +void av_stringinterp_free(AVStringInter **ctx);
> > +int av_stringinterp_ref(AVStringInter *ctx, const char *key, const char *value);
> > +char *av_stringinterp_interpolate(AVStringInter *ctx, const char *s);
> 
> Some doxy about what it is about would be nice.
> 

'Added some when I was satisfied :)

> > +
> > +#endif/* AVUTIL_STRINGINTERP_H */
> > -- 
> > 1.7.8.3
> 
> On the whole, I believe you are trying to design a solution without a
> problem, and that is usually not a good sign.
> 
> The credo around here is to only add functions to lavu when there is an use
> case in the rest of the code. This is not a very efficient solution, and it
> tends to force the code in small local optima because no one wants to anneal
> (as in "simulated annealing") ffmpeg, but at least this is a consistent
> policy and it avoids having useless code.
> 

I should have proposed some real application with the patch, sorry about
that.

> The other possible policy is the one for glib (which is for Gtk+ the
> equivalent of lavu for lavc/lavf): aim for very a very generic API, covering
> any foreseeable possible use.
> 
> The practical problem I see with your proposal is that it seems to be too
> specific, and will not be easily usable for a lot of situations. For
> example, let us say we want to use it to process the string for the drawtext
> filter: how do we expand %(pts)? raw integer? float with six digits?
> rational? hh:mm:ss.mmm? And do we want to have to stringify each possible
> variable for each frame, even if the user only wants to add their initials?
> 

This was actually one of the reasons I limited the scope to strings; I
know we have various printing helpers around for printing that kind of
special variables like pts, so I was naively thinking the user will just
format it at will with intermediate buffer when mapping the value (I
wasn't sure about duplicating key/value in the tree BTW).

> 
> May I suggest to redesign the API that way: ?
> 

I'm open to any kind of proposition, this was just a PoC/WIP

> - A set of av_bsprintf functions: like asprintf, but into a buffer that is
>   grown as necessary.
> 
> - Instead of a hash table, av_stringinterp_interpolate accepts a function:
> 
>   void get_value(opaque_pointer, target_buffer, var_name, var_flags);
> 
>   Of course, someone who wants to interpolate from a hash table can just
>   write a get_value function that reads the hash table; we can provide a
>   stock one.
> 
> - Accept to interpolate ${var:flags}, for example ${pts:%M:%S.%3}.
> 

Additional customized layout for some variables? The main point was to
allow stuff like:
  ffmpeg -i ... -o '%(meta_artist) - %(meta_title).mp3'
or:
  ffmpeg -i ... -segment_list_format '%(pts).%(base_filename)_%(segment_number).%(ext)'

so I'm not sure there is really a need for such thing, but are you
thinking of any other particular application?

Anyway, I'm attaching an improved version from the first draft (verbatim,
it was actually like that before your review) just to illustrate what I
was saying about buffering and stuff. Don't take it for more than it
really is (a PoC), and thank you for your great feedback.

-- 
Clément B.
-------------- next part --------------
From 65f5c85ed9416d4c71f5df78d272ec72b5beffc4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Cl=C3=A9ment=20B=C5=93sch?= <clement.boesch at smartjog.com>
Date: Wed, 1 Feb 2012 10:29:59 +0100
Subject: [PATCH] lavu: add string interpolation helper.

---
 libavutil/Makefile       |    2 +-
 libavutil/stringinterp.c |  199 ++++++++++++++++++++++++++++++++++++++++++++++
 libavutil/stringinterp.h |   60 ++++++++++++++
 3 files changed, 260 insertions(+), 1 deletions(-)
 create mode 100644 libavutil/stringinterp.c
 create mode 100644 libavutil/stringinterp.h

diff --git a/libavutil/Makefile b/libavutil/Makefile
index 2cd763c..da3fe7b 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -80,7 +80,7 @@ OBJS-$(ARCH_X86) += x86/cpu.o
 
 
 TESTPROGS = adler32 aes avstring base64 cpu crc des eval file fifo lfg lls \
-            md5 opt pca parseutils rational sha tree
+            md5 opt pca parseutils rational sha tree stringinterp
 TESTPROGS-$(HAVE_LZO1X_999_COMPRESS) += lzo
 
 TOOLS = ffeval
diff --git a/libavutil/stringinterp.c b/libavutil/stringinterp.c
new file mode 100644
index 0000000..e8cb572
--- /dev/null
+++ b/libavutil/stringinterp.c
@@ -0,0 +1,199 @@
+/*
+ * Copyright (c) 2012 Clément Bœsch
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * string interpolation
+ */
+
+#include "avutil.h"
+#include "avstring.h"
+#include "stringinterp.h"
+
+struct node {
+    const char *key;
+    const char *value;
+    struct node *next;
+};
+
+typedef struct AVStringInterp {
+    struct node *hashtable[512];
+    char *buf;
+    size_t buf_size;
+} AVStringInterp;
+
+AVStringInterp *av_stringinterp_alloc(void)
+{
+    AVStringInterp *ctx = av_mallocz(sizeof(*ctx));
+    ctx->buf = av_malloc(128);
+    ctx->buf_size = 128;
+    return ctx;
+}
+
+void av_stringinterp_free(AVStringInterp **ctx)
+{
+    int i;
+    AVStringInterp *s = *ctx;
+    for (i = 0; i < FF_ARRAY_ELEMS(s->hashtable); i++) {
+        struct node *tmp, *node = s->hashtable[i];
+        while (node) {
+            tmp = node;
+            node = node->next;
+            av_freep(&tmp);
+        }
+    }
+    av_freep(&s->buf);
+    av_freep(ctx);
+}
+
+static uint32_t get_hash(const char *key, size_t len, int tbl_len)
+{
+    size_t i;
+    uint32_t hash = 0;
+    for (i = 0; i < len; i++)
+        hash = (hash<<5) ^ (hash>>27) ^ key[i];
+    return hash % tbl_len;
+}
+
+static struct node *new_node(const char *key, const char *value)
+{
+    struct node *node = av_mallocz(sizeof(*node));
+    if (!node)
+        return NULL;
+    node->key   = key;
+    node->value = value;
+    return node;
+}
+
+int av_stringinterp_map(AVStringInterp *ctx, const char *key, const char *value)
+{
+    int id = get_hash(key, strlen(key), FF_ARRAY_ELEMS(ctx->hashtable));
+    struct node *node = ctx->hashtable[id];
+    struct node *insert_after = NULL;
+
+    while (node) {
+        if (!strcmp(key, node->key)) {
+            node->value = value;
+            return 0;
+        }
+        insert_after = node;
+        node = node->next;
+    }
+    node = new_node(key, value);
+    if (!node)
+        return AVERROR(ENOMEM);
+    if (insert_after)
+        insert_after->next = node;
+    else
+        ctx->hashtable[id] = node;
+    return 0;
+}
+
+static const char *get_value(AVStringInterp *ctx, const char *key)
+{
+    int id;
+    size_t keylen;
+    const char *p = strchr(key, ')');
+    struct node *node;
+
+    if (!p)
+        return NULL;
+    keylen = p - key;
+    id = get_hash(key, keylen, FF_ARRAY_ELEMS(ctx->hashtable));
+    for (node = ctx->hashtable[id]; node; node = node->next)
+        if (!strncmp(key, node->key, keylen))
+            return node->value;
+    return NULL;
+}
+
+#define APPEND_STRING(str, slen) do {               \
+    size_t new_size = pos + slen + 1;               \
+    if (new_size > ctx->buf_size) {                 \
+        char *ptr = av_realloc(ctx->buf, new_size); \
+        if (!ptr)                                   \
+            return NULL;                            \
+        ctx->buf      = ptr;                        \
+        ctx->buf_size = new_size;                   \
+    }                                               \
+    av_strlcpy(ctx->buf + pos, str, slen + 1);      \
+    pos += slen;                                    \
+} while (0)
+
+char *av_stringinterp_interpolate(AVStringInterp *ctx, const char *s)
+{
+    const char *p;
+    size_t pos = 0;
+
+    while (*s) {
+        if (s[0] == '%' && s[1] == '(') {
+            const char *value = get_value(ctx, s + 2);
+            if (value) {
+                APPEND_STRING(value, strlen(value));
+                s = strchr(s, ')') + 1;
+                continue;
+            }
+        }
+        p = strchr(s + 1, '%');
+        if (!p) {
+            APPEND_STRING(s, strlen(s));
+            break;
+        }
+        APPEND_STRING(s, p - s);
+        s = p;
+    }
+    return ctx->buf;
+}
+
+#ifdef TEST
+#undef printf
+static void dump_table(AVStringInterp *ctx)
+{
+    int i;
+    printf("Hashtable dump:\n");
+    for (i = 0; i < FF_ARRAY_ELEMS(ctx->hashtable); i++) {
+        struct node *node = ctx->hashtable[i];
+        if (!node)
+            continue;
+        printf("  [%02d/%02d]:", i, FF_ARRAY_ELEMS(ctx->hashtable));
+        while (node) {
+            printf(" \"%s\"=>\"%s\"", node->key, node->value);
+            node = node->next;
+        }
+        printf("\n");
+    }
+}
+
+int main(int ac, char **av)
+{
+    AVStringInterp *si = av_stringinterp_alloc();
+
+    av_stringinterp_map(si, "ts1",      "0.4123");
+    av_stringinterp_map(si, "ts2",      "54.153");
+    av_stringinterp_map(si, "filename", "FooBar");
+    av_stringinterp_map(si, "ext",      ".mpg");
+    av_stringinterp_map(si, "ext",      ".mp4");
+
+    dump_table(si);
+
+    printf("%s\n", av_stringinterp_interpolate(si, "%(filename)_%(ts1)-%(ts2)_%(nokey)%(ext)"));
+    av_stringinterp_free(&si);
+    return 0;
+}
+#endif
diff --git a/libavutil/stringinterp.h b/libavutil/stringinterp.h
new file mode 100644
index 0000000..c23d589
--- /dev/null
+++ b/libavutil/stringinterp.h
@@ -0,0 +1,60 @@
+/*
+ * Copyright (c) 2012 Clément Bœsch
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * string interpolation header
+ */
+
+#ifndef AVUTIL_STRINGINTERP_H
+#define AVUTIL_STRINGINTERP_H
+
+typedef struct AVStringInterp AVStringInterp;
+
+/**
+ * Allocate a new string interpolation context.
+ */
+AVStringInterp *av_stringinterp_alloc(void);
+
+/**
+ * Free a AVStringInterp context previously created with
+ * av_stringinterp_alloc().
+ */
+void av_stringinterp_free(AVStringInterp **ctx);
+
+/**
+ * Map a key string with a value string.
+ *
+ * If the key already exists, the value is replaced.
+ * @return 0 in case of success, a negative AVERROR code otherwise
+ * @note key and value are not duplicated, must keep a reference to them.
+ */
+int av_stringinterp_map(AVStringInterp *ctx, const char *key, const char *value);
+
+/**
+ * Do the string interpolation.
+ *
+ * @return a pointer to the printable string, or NULL in case of failure.
+ * @note the returned buffer is kept in the AVStringInterp context, you must
+ *       not free it.
+ */
+char *av_stringinterp_interpolate(AVStringInterp *ctx, const char *s);
+
+#endif/* AVUTIL_STRINGINTERP_H */
-- 
1.7.9

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 490 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120201/ae93900a/attachment.asc>


More information about the ffmpeg-devel mailing list