POC: stringcompare and normalization to find numbers in contacts

Joerg Reisenweber joerg at openmoko.org
Tue May 19 03:42:11 CEST 2009


On Mon, 18 May 2009, Joerg Reisenweber wrote:
> normalized number = "+<CC><HC><NUMBER>"
> non-normalized may be: <NUMBER> | <NationalPrefix><anyHC><NUMBER> | 
> <InternationalPrefix><anyCC><anyHC><NUMBER>
> 
> so: 
> Nr1 = Nr  s/^<IP>(.*)/\+$1/           # 00 49 911 12345 -> + 49 911 12345
> Nr2 = Nr1 s/^<NP>/\+<CC>/             # 0 911 12345 -> + 49 911 12345
> Nr3 = Nr2 s/^([^\+].*)/\+<CC><HC>$1/  # 12345 -> + 49 911 12345
> step 3 above isn't considered very practical for cellphones
> <IP>, <CC>, <NP>, and <HC> are user-definable config values
> 
> apply normalization to both numbers prior to strcmp() *only* - no mangling 
> of numbers on storing, dialing, or display to user
> 
> nota bene: inbound call numbers are considered fully normalized (per GSM 
> definition). If you like to cope with non-standards-conforming inbound 
> numbers, you need a means to acquire CC' of the network currently in use, 
> and use this CC' instead of CC for normalizing the inbound number.
> 
> a good idea might be to cut all spaces as well, and also truncate any 
> leading/trailing netcode sequences, as in *31#+49 911 12345 W;1**3 
> (truncate from the left and from the right all *short* sequences, 
> including any delimiter that is neither a digit nor "+"; this is "*31#" 
> (excluding the "+"!) for the left, and "W;1**3" for the right. 
> Admittedly this part is tricky. 
> Left-side truncation is mandatory, right-side a nice-to-have)
> 
> cheers
> jOERG
> 

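The comparison rule quoted above (normalize both sides, then do a plain string compare, and never touch the stored or displayed number) could be sketched roughly like this. The helper names same_number and inbound_matches are made up for illustration, the config values are only examples, and the sed expressions are a lightly tidied copy of the normalize function shown further down in this mail. The inbound variant assumes the CC' of the current network is already known somehow; how to obtain it is not covered here.

```shell
#!/bin/sh
# Example config values (assumption: Germany, area code 30).
IP=00   # international prefix
NP=0    # national prefix
CC=49   # country code
AC=30   # area code (<HC> in the quoted text)

# Same steps as the normalize function in this mail, one expression per line.
normalize() {
    printf '%s\n' "$1" | sed \
        -e 's/[wWpP;,].*//' \
        -e 's/[ -]//g' \
        -e "s/\([0-9]\)(${NP})\([0-9]\)/\1\2/" \
        -e 's/.*[^0-9+]//' \
        -e "s/^${IP}/+/" \
        -e "s/^${NP}/+${CC}/" \
        -e "s/^\([^+]\)/+${CC}${AC}\1/"
}

# Hypothetical helper: succeeds if both numbers normalize to the same string.
# The arguments themselves are never modified.
same_number() {
    [ "$(normalize "$1")" = "$(normalize "$2")" ]
}

# Hypothetical helper for non-conforming inbound numbers: normalize the
# inbound side with the country code of the current network (CC'), and the
# contact with the home CC. The CC override stays inside the subshell.
inbound_matches() {
    net_cc=$1 inbound=$2 contact=$3
    [ "$(CC=$net_cc; normalize "$inbound")" = "$(normalize "$contact")" ]
}
```

For example, same_number "030 123 00 456" "+49 30 12300456" should succeed with the values above, while the stored contact string stays exactly as the user typed it.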
jr at halley:~> . bin/normalizelib
jr at halley:~> # set InternationalPrefix, NationalPrefix, CountryCode, AreaCode
jr at halley:~> IP=00
jr at halley:~> NP=0
jr at halley:~> CC=49
jr at halley:~> AC=30
jr at halley:~> normalize "*31#0049 (0) 30 12300 - 456W*1200*#"
+493012300456
jr at halley:~> normalize "0049 30 123 00 456"
+493012300456
jr at halley:~> normalize "030 123 00 456;foo"
+493012300456
jr at halley:~> normalize "123 00 456"
+493012300456
jr at halley:~> 
jr at halley:~> cat bin/normalizelib
normalize(){
 echo "${1}" | sed "s/[wWpP\;\,].*//; s/[ -]//g; s/\([0-9]\)(${NP})\([0-9]\)/\1\2/; s/.*[^0-9+]//; s/^${IP}/\+/; s/^${NP}/\+${CC}/; s/^\([^\+]\)/\+${CC}${AC}\1/"
}
jr at halley:~>


Details:
s/[wWpP\;\,].*//; == cut trailing crap; actually only a few delimiters are valid
s/[ -]//g; == remove filler chars; others may be added here. User-config?
s/\([0-9]\)(${NP})\([0-9]\)/\1\2/; == remove the stupid interspersed NP, e.g. the "(0)" in "0049 (0) 30"
s/.*[^0-9+]//; == remove leading netcodes
s/^${IP}/\+/; == substitute IP with "+"
s/^${NP}/\+${CC}/; == substitute NP with "+"<CC>
s/^\([^\+]\)/\+${CC}${AC}\1/ == process local numbers (no "+", IP, NP)
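To see what each of the expressions listed above contributes, they can be applied one at a time with the intermediate result printed after every step. This is only a debugging sketch: trace_normalize is a made-up name, the config values are examples, and the expressions are written without the redundant backslashes but are meant to behave the same.

```shell
#!/bin/sh
# Example config values (assumption: Germany, area code 30).
IP=00; NP=0; CC=49; AC=30

# Hypothetical helper: apply the sed expressions one by one, in order,
# and print each expression next to the number as it looks afterwards.
trace_normalize() {
    nr=$1
    for expr in \
        's/[wWpP;,].*//' \
        's/[ -]//g' \
        "s/\([0-9]\)(${NP})\([0-9]\)/\1\2/" \
        's/.*[^0-9+]//' \
        "s/^${IP}/+/" \
        "s/^${NP}/+${CC}/" \
        "s/^\([^+]\)/+${CC}${AC}\1/"
    do
        nr=$(printf '%s\n' "$nr" | sed -e "$expr")
        printf '%-40s -> %s\n' "$expr" "$nr"
    done
}

trace_normalize "*31#0049 (0) 30 12300 - 456W*1200*#"
```

The last line printed shows the fully normalized number; the lines before it show where the trailing sequence, the fillers, the "(0)" and the leading netcode disappear.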


An improved version will follow eventually, maybe also a "testsuite"
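Until then, a very small "testsuite" could simply replay the four inputs from the session above and compare against the expected +493012300456. This is a sketch, not the announced improved version: the check helper and the fail counter are invented here, and normalize is repeated (slightly tidied) so the snippet runs on its own.

```shell
#!/bin/sh
# Config values as used in the session above.
IP=00; NP=0; CC=49; AC=30

normalize() {
    printf '%s\n' "$1" | sed \
        -e 's/[wWpP;,].*//' \
        -e 's/[ -]//g' \
        -e "s/\([0-9]\)(${NP})\([0-9]\)/\1\2/" \
        -e 's/.*[^0-9+]//' \
        -e "s/^${IP}/+/" \
        -e "s/^${NP}/+${CC}/" \
        -e "s/^\([^+]\)/+${CC}${AC}\1/"
}

fail=0

# check EXPECTED INPUT: report whether normalize INPUT yields EXPECTED.
check() {
    got=$(normalize "$2")
    if [ "$got" = "$1" ]; then
        echo "ok    $2"
    else
        echo "FAIL  $2: expected $1, got $got"
        fail=$((fail + 1))
    fi
}

# The four inputs from the shell session in this mail.
check "+493012300456" "*31#0049 (0) 30 12300 - 456W*1200*#"
check "+493012300456" "0049 30 123 00 456"
check "+493012300456" "030 123 00 456;foo"
check "+493012300456" "123 00 456"

echo "$fail failure(s)"
```

More cases (other fillers, other prefixes, roaming situations) would just be additional check lines.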

cheers
jOERG

-------------- next part --------------
normalize(){
 echo "${1}" | sed "s/[wWpP\;\,].*//; s/[ -]//g; s/\([0-9]\)(${NP})\([0-9]\)/\1\2/; s/.*[^0-9+]//; s/^${IP}/\+/; s/^${NP}/\+${CC}/; s/^\([^\+]\)/\+${CC}${AC}\1/"
}