Another attempt at speeding up yaouh, with persistent connections

Helge Hafting helge.hafting at hist.no
Tue Mar 3 11:34:32 CET 2009


Robin Paulson wrote:
> 2009/3/3 Helge Hafting <helge.hafting at hist.no>:
>> too. It looks like openstreetmap can have up to 50 files in a directory.
>>
> ........
>> So far this program has checked 5100 files in 12 minutes, and downloaded
>> 260. It looks like it will get through all my 50.000 tiles in under 2 hours.
>> I use wifi, the usb connection may be slower.
>>
>> If anyone want to try, here is the download link.
>> http://www.aitel.hist.no/~helgehaf/openmoko/yaouh_new.py
> 
> awesome, that's lots faster
> 
> by the way, i don't know if it's important or not, but iirc osm folder
> structure has up to 128000 files per folder, for zoom 17
>
The tiles have 5-digit names, so the max would be 100000?

Anyway, that could be a problem. With a hundred thousand tiles and about 
60 bytes per URL, python would send a 7.5MB parameter list to a single 
invocation of curl. That can easily be trimmed down to 950kB by only 
sending file names. Still, it'd be interesting to know if the freerunner 
can handle that at all. A PC usually can.

A loop can be used to limit this to something sane, like 1000 files or 
so per curl invocation. But does anyone actually have that many files? I 
just checked my 50.000 tiles, and found no more than 250 tiles in any 
directory. Perhaps a large square area at zoom 17?
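
A minimal sketch of such a loop, in Python like yaouh itself. The names 
chunked, curl_command and fetch_in_batches, the batch size of 1000 and 
the choice of curl options are illustrative assumptions, not taken from 
yaouh_new.py.

```python
import subprocess

def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def curl_command(urls):
    # --remote-name-all saves each URL under its remote file name,
    # so a single curl process can handle the whole batch.
    return ["curl", "--silent", "--fail", "--remote-name-all"] + list(urls)

def fetch_in_batches(urls, dest_dir, batch_size=1000):
    # One curl process per batch keeps the argument list bounded
    # no matter how many tiles a directory holds.
    for batch in chunked(urls, batch_size):
        subprocess.call(curl_command(batch), cwd=dest_dir)
```

With 2500 URLs this would start three curl processes, handling 1000, 
1000 and 500 files.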

Further speedups are possible. File check data for 50.000 tiles is 12MB, 
so the limit for such a dataset is how fast 12MB can be downloaded, and 
how fast all those files can be checksummed. And then there is the 
transfer of changed tiles. My code is inefficient when there are many 
small directories, because it starts a new transfer for each directory.
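
The checksumming step could look roughly like the sketch below. The 
checksum algorithm is not stated here, so MD5 is assumed purely for 
illustration, and file_checksum and changed_files are hypothetical names.

```python
import hashlib

def file_checksum(path, block_size=65536):
    """Hex digest of a file, read in blocks to keep memory use small."""
    digest = hashlib.md5()  # assumption: the real check data may use another hash
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

def changed_files(expected, checksum=file_checksum):
    """Return the paths whose local checksum differs from the expected one.

    `expected` maps local path -> expected hex digest.
    Missing or unreadable files count as changed.
    """
    stale = []
    for path, want in expected.items():
        try:
            if checksum(path) != want:
                stale.append(path)
        except (IOError, OSError):
            stale.append(path)
    return stale
```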

To get the best possible speed, pycurl should be used. Everything can 
then be done from inside python, avoiding process creation overhead. 
Persistent connections can be utilized for the entire transfer, 
eliminating the small-directory problem entirely.
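
A rough sketch of that idea, assuming pycurl is installed. Reusing one 
Curl handle for every request lets libcurl keep the connection to the 
server open between files. The names local_path_for and fetch_all are 
made up for this example.

```python
import os

def local_path_for(url, dest_dir):
    """Local file name for a URL: its last path component inside dest_dir."""
    return os.path.join(dest_dir, url.rsplit("/", 1)[-1])

def fetch_all(urls, dest_dir):
    import pycurl  # imported here so local_path_for works without pycurl

    handle = pycurl.Curl()  # one handle, reused for every URL
    handle.setopt(pycurl.FAILONERROR, 1)  # HTTP errors raise pycurl.error
    try:
        for url in urls:
            with open(local_path_for(url, dest_dir), "wb") as out:
                handle.setopt(pycurl.URL, url)
                handle.setopt(pycurl.WRITEFUNCTION, out.write)
                handle.perform()
    finally:
        handle.close()
```

Since no curl process is started per directory, directory size no longer 
matters for the transfer. Error handling is left out: a failed download 
raises pycurl.error and leaves a partial file behind.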

I am not sure if there is a pycurl package yet though.
Installing python-modules didn't bring it in.

Helge Hafting







