Iran, a Nation of Bloggers
November 26, 2008 on 10:29 am | In english, politics | No CommentsNo Comments needed Link
“Where the Hell is Matt” the Presentation
September 24, 2008 on 11:25 am | In english, travel, web | No CommentsA short presentation from Matt at a the Oreilly Ignite. Funny and cool…
Hug a developer today
September 10, 2008 on 1:56 pm | In english, hacking, life, technic, web | No Comments
I need a hug..
Mormons
September 6, 2008 on 12:56 pm | In english, ethic | No CommentsOhh yeah. Mormons driving me crazy. Watch the stupid story of the formation.
Hashtable vs. List
January 25, 2008 on 6:47 pm | In english, hacking, python | No CommentsLike every student of computer science, I had a course with the title „algorithm and data structures“. That is one of the classic subjects in informatics. You are learn things like linked list, tree’s and so on. Also different algorithm for searching. Sequential search, binary search and top-down 2-3-4 trees. It is a nice and important subject. Today we have frameworks like Java,.NET,Python and they all have build in data structures like lists and hashtables. With these technics it is quite easy to use a list. Here is a small and nice example in Python:
list = [] for i in range(10000): rand = random.randint(0,50000) if not rand in list: list.append(rand)
Now we have a list with up to 10000 random and unique entries. Very easy to program, but you also see the performance problem. In line 4 is a check for the uniqueness of the possible new entry. To check for existence of an element in a list, the framework runs through all elements of the list. That is a sequential search, the slowest searching algorithm in this universe. This small example needs 365360 microseconds on my machine.
Now lets try a different version with a dictionary.
dict = {} for i in range(10000): rand = random.randint(0,50000) if not dict.has_key(rand): dict[rand] = 0
This example needs only 34908 microseconds. That is 10 times faster than the version with the list. It looks very stupid to program a line like “dict[rand] = 0�, but this solution now uses a different search algorithm. If you increment the loop up to 20000 entries the different is 83times. But in the near of 30000 it comes to a point where the insort algorithm of the dict structure needs so long that the speed benefit is over.
I wrote another small program to visualize the effect.
See the code

I found this problem in a small textminig project. Using dictionarys to search for existence of a word in an document was extremely faster than a list. So sometimes you should think about, when to use a list and when use a dictionary like an list.




