Fuzzy phone number matching

Joachim Breitner mail at joachim-breitner.de
Wed Jul 16 19:48:33 CEST 2008


On Wednesday, 16.07.2008, at 17:43 +0100, Chris Lord wrote:
> On Wed, 2008-07-16 at 18:04 +0200, Joachim Breitner wrote:
> > Hi,
> > 
> > On Tuesday, 15.07.2008, at 23:20 +0200, Jay Vaughan wrote:
> > > > What would be a good way to tackle this issue? Anything else but tell
> > > > our users always to write +49172..., even when dialing by hand?
> > >
> > > Match the phone numbers backwards from right to left instead of left  
> > > to right, and throw away 'whitespace' which would be defined as  
> > > 'anything that isn't a number'.
> > 
> > thx, it’s similar to what I’m doing already in my openmoko-messages2
> > patches. My question was rather: Is this the right way to fix it? There
> > are a lot of places where the comparison of phone numbers happens, down
> > into evolution-data-server, so I’m wondering if there is consensus that
> > this is the right way.
> Unfortunately, this is one of the things I didn't get round to doing
> while working on openmoko-messages2/phonekit/etc. It'd be good if there
> was a small static library for handling phone-number normalisation, so
> this code isn't reimplemented in different ways all over the place.

Some code is already present in my patches. The remaining issue with the
normalization is that the country prefix (+49) is currently hardcoded.
I see two possibilities:

* Every user of this code needs to figure out what prefix to use.
* We never do normalization, but only provide a comparison function (and
a hash function compatible with this comparison). This way, we never
choose the “wrong” prefix, but we might match a non-fully-qualified
number to an equal-looking number in a different country (a very rare
case, IMHO).
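
To make the two options concrete, here is a rough sketch in Python (for
brevity only; this is not the code from my patches). All names are
hypothetical, and the rules are assumptions: "00" as international
access code, "0" as national trunk prefix, and an arbitrary cut-off of
7 trailing digits for the comparison.

```python
import re

# Assumption: how many trailing digits take part in the comparison.
SIGNIFICANT_DIGITS = 7


def normalize(number: str, country_prefix: str) -> str:
    """Option 1: the caller has to supply the prefix, e.g. "+49"."""
    digits = re.sub(r"\D", "", number)   # drop everything that is not a digit
    if number.strip().startswith("+"):   # already fully qualified
        return "+" + digits
    if digits.startswith("00"):          # assumed international access code
        return "+" + digits[2:]
    if digits.startswith("0"):           # assumed national trunk prefix
        return country_prefix + digits[1:]
    return digits                        # cannot be qualified, leave as is


def match_key(number: str) -> str:
    """Option 2: the trailing digits, i.e. the number read from the right."""
    digits = re.sub(r"\D", "", number)
    return digits[-SIGNIFICANT_DIGITS:]


def numbers_match(a: str, b: str) -> bool:
    """Two numbers match if their trailing digits agree."""
    return match_key(a) == match_key(b)


def number_hash(number: str) -> int:
    """Hash compatible with numbers_match: matching numbers hash equally."""
    return hash(match_key(number))
```

With this, numbers_match("+49 172 1234567", "0172-1234567") holds
without anybody having to know the prefix. Because the comparison is
plain equality on the key, the hash is trivially compatible with it.
The price is the false match mentioned above: two numbers from
different countries with the same trailing digits compare as equal.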

> With regards to how this should be done with eds-dbus, there are a
> number of ways to go about it... A feasible way would be to store the
> normalised number primarily and the entered number as secondary/custom
> field. When searching, search for the normalised number, then in the
> case of multiple results, use some kind of heuristic to retrieve the
> desired contact (or if you're lazy, just pick the first, the likelihood
> of multiple matches is pretty low :))

Well, the main reason I want human readable numbers in eds is that it
can be exactly the same data as on my laptop’s evolution, so this does
not sound like a good idea.

Maybe adding a (match-number number1 number2) operator to the query
s-exp parser in e-d-s is not too bad after all. It’s a local change that
does not break anything that doesn’t use this feature, and then we could
use the feature in libjana.

Do we currently use eds unmodified, or do we already have some patches?
(I wouldn’t ask if I knew where to look :-)

> All this code would go in the aforementioned static library, to be used
> by any application that's attempting to handle storage/searching of
> phone numbers.

Sounds sensible. I’d be interested in implementing this, but who should
I then talk to when it comes to integrating the patches? And what code
base should I make the changes against? (I’m new to Openmoko, and so far I
haven’t found the equivalent of "apt-get source" in Debian, which gives
me the exact source package for the current binary package :-))


Joachim Breitner
  e-Mail: mail at joachim-breitner.de
  Homepage: http://www.joachim-breitner.de
  ICQ#: 74513189
  Jabber-ID: nomeata at joachim-breitner.de