Pgpool-II has query caching functionality using storage provided by dedicated PostgreSQL ("system database"). This has several drawbacks however.
1)it is slow because it needs to access disk storage.
2)it does not invalidate the cache automatically.
This project tries to solve these problems.
- To speed up the cache access, it will be placed on memory, rather than database. The memory will be put on shared memory or external memory services such as memcached so that the cache can be shared by multiple sessions. Old cache entries will be deleted by LRU manner.
- The cache will be invalidated automatically upon the timing when the relevant tables are updated. Note that this is not always possible because the query result might come from multiple tables, views or even functions. In this case the cache will be invalidated by timeout(or they are not cached at all).
- Fine tuning knobs need to be invented to control the cache behavior though they are not clear at this moment.
external specification(for user)
On memory query cache allows you to speed-up database access by reusing SELECT results.
- The cache can be stored in either shared memory or memcached (not both).
- The result of SELECT using none IMMUTABLE functions is not cached.
- If the size of SELECT result is larger than the value specified in pgpool.conf, it is not cached.
- Cache will be removed if one of following conditions are met:
- Cache expiration time exceeds
- If cache is full, the least recently used cache is removed
- If a table is dropped or modified, related cache data is deleted.
- If a table definition is modified(ALTER TABLE), related cache data is deleted.
- If a schema or database is dropped, related cache data is deleted.
- To enable the memory cache functionality, set this to on (default is off)
memory_cache_enabled = on/off
- Memory cache behavior can be specified by cache_param directive.
memqcache_method = 'shmem' *1 memqcache_memcached_host = 'foo.com' *2 memqcache_memcached_port = 11211 *3 memqcache_total_size = 10240 *4 memqcache_expire = 60 *5 memqcache_maxcache = 512 *6 memqcache_cache_block_size= 8192 *7
1) Cache store method. Either "shmem"(shared memory) or "memcached". Default is shmem.
2) Memcached host name. Mandatory if memqcache_method=memcached. If this parameter is starting with "/", UNIX domain socket is assumed.
3) Memcached port number. Mondatory if memqcache_method=memcached.
4) Total memory size in bytes for storing memory cache. Mandatory if memqcache_method=shmem.
5) Memory cache entry life time specified in seconds. Default is 60.
6) Maximum SELECT result size in bytes. If exceeds this value, the result is not cached.
7) Cache block size in bytes
Only meaning full when memqcache_method=shmem. Memory cache is devided into fixed size of block.
If the block size is small, memory usage is efficient, but processing might be slow. On the other hand,
if the block size is large, memory usage is not efficient but processing might be fast. Default is 8192.
The schedule for this project is as below.
- Preparation (April 11 - April 30)
- I am going to start this project at April 11.
- Decide specification in detail so as to create modules of caching query results in pgpool-II.
- I want to get in contact with this project's mentor and talk them as to my project schedule and technology.
Implement (May 1 - June 19) - I will start to implemente program at May 1,
and intend to get through phase of an implementation before June 20.
- Implement (May 1 - June 26)
June 14 - 19 coding : The shmem cache. set, search or remove an data in shared memory.
June 20 - 26 coding : The cache invalidation part.
- Software testing (June 20 - July 22)
- My implementation might be insuffecient. Therefore, I will try to
do software testing, and reduce bug.
- Writing documents (July 23 - August 12)
- I plan to complete phase of software testing
before July 23, and write documents for user.