APC Quick-Start Braindump This is a rapidly written braindump of how APC currently works in the form of a quick-start guide to start hacking on APC. 1. Install and use APC a bit so you know what it does from the end-user's perspective. user-space functions are all explained here: 2. Grab the current APC code from CVS: cvs -d:pserver:cvsread@cvs.php.net:/repository login Password: phpfi cvs -d:pserver:cvsread@cvs.php.net:/repository co pecl/apc apc/php_apc.c has most of the code for the user-visible stuff. It is also a regular PHP extension in the sense that there are MINIT, MINFO, MSHUTDOWN, RSHUTDOWN, etc. functions. 3. Build it. cd pecl/apc phpize ./configure --enable-apcu make cp modules/apcu.so /usr/local/lib/php apachectl restart 4. Debugging Hints apachectl stop gdb /usr/bin/httpd break ?? run -X Grab the .gdbinit from the PHP source tree and have a look at the macros. 5. The basics of APCu APCu has three main component parts: 1) shared memory allocator 2) pooling 3) user land cache 5.1) APCu SMA It is a pretty standard memory allocator, now supporting third party extensions. apc_sma_malloc, apc_sma_realloc, apc_sma_strdup and apc_sma_free behave to the caller just like malloc, realloc, strdup and free, they are generated from macros in apc_sma_api.h Note: apc_sma_api.h is formatted and designed such that the SMA APCu uses can be used by third parties in their own extensions without interfering with, or consuming the resources of APCu itself apc_sma is a structure of type apc_sma_t, it is statically allocated at runtime, appropriate handlers are generated and set, and the structure made ready for initialization. MINIT then initializes apc_sma with apc_sma_api_init(). APCu SMA then takes care of mmaping the shared memory. ( which you can obtain in any compilation unit with apc_sma_api_extern(apc_sma) ) At this point, we have a completely useless 32MB chunk of memory at our disposal, before it can be used, an apc_cache_header_t is initialized at the beginning of the reigon of mmapp'ed memory. The header serves as a place to store, among other things, statistical information and a lock. Immediately after the header comes a zero sized block, immediately after that a single block the remaining size of the shared memory. At this point, the shared memory looks like this: +--------+--------+----------------------------------+ | header | 0-size | shared | +--------+--------+----------------------------------+ The blocks are just a simple offset-based linked list (so no pointers): typedef struct block_t block_t; struct block_t { size_t size; /* size of this block */ size_t next; /* offset in segment of next free block */ size_t canary; /* canary to check for memory overwrites */ #ifdef APC_SMA_DEBUG int id; /* identifier for the memory block */ #endif }; The BLOCKAT macro turns an offset into an actual address for you: #define BLOCKAT(offset) ((block_t*)((char *)shmaddr + offset)) where shmaddr = sma->shaddrs[0] And the OFFSET macro goes the other way: #define OFFSET(block) ((int)(((char*)block) - (char*)shmaddr)) Allocating a block walks through the linked list of blocks until it finds one that is >= to the requested size. The first call to allocate will hit the second block. We then chop up that block so it looks like this: +--------+-------+-------+-------------------------+ | header | block | block | block | +--------+-------+-------+-------------------------+ Then we unlink that block from the linked list so it won't show up as an available block on the next allocate. So we actually have: +--------+-------+ +-------------------------+ | header | block |------>| block | +--------+-------+ +-------------------------+ And header->avail along with block->size of the remaining large block are updated accordingly. The arrow there representing the link which now points to a block with an offset further along in the segment. When the block is freed the steps are basically just reversed. The block is put back and then the deallocate code looks at the block before and after to see if the block immediately before and after are free and if so the blocks are combined. So you never have 2 free blocks next to each other, apart from at the front with that 0-sized dummy block. This mostly prevents fragmentation. I have been toying with the idea of always allocating block at 2^n boundaries to make it more likely that they will be re-used to cut down on fragmentation further. That's what the POWER_OF_TWO_BLOCKSIZE you see in apc_sma.c is all about. 5.2) APCu Pooling Pooling serves as a means to provide operations with a context: without context, managing memory within APCu would become near impossible, and very error prone. Whenever APCu is instructed to undertake an operation that requires relinquishing the owner of some memory a struct of type apc_cache_context_t is passed, among other things, the context contains a pool, the pool provides references to handlers that are appropriate for the current operation. For example: To copy data into the shared area, APCu will require the use of allocators that return blocks from shared memory. To copy data out of the shared area and hand over ownership to PHP, normal allocators must be used. For more information about pooling, see apc_pool.h/apc_pool.c in the source distribution. 5.3) APCu Cache The caching functionality of APCu is provided by a modified version of the APC source code Some simple tweaks have been applied: Locking is written to use the best kind of locking available, and emulate it where it is not to simplify logic. Extension of the SMA to support multiple instances, such that additional caches using APCu do not increase contention of the main APCu cache. The possibility to control more finely what happens when resources become low for APCu. An exposed, coherent, and documented API and example included in the distribution. There's probably some of my blood in it, if you look real close ... The remainder of the document goes on to explain in some detail the cache itself, functionally unchanged by APCu 6. Next up is apc_cache.c which implements the cache logic. Having initialized a suitable allocator, MINIT must call apc_cache_create, using the allocator provided APCu will create a cache. The parameters to apc_cache_create for APCu are defined by various INI settings. API users can provide the same options from anywhere ( their globals for example ). The cache is stored in/described by this struct allocated locally: /* {{{ struct definition: apc_cache_t */ typedef struct _apc_cache_t { void* shmaddr; /* process (local) address of shared cache */ apc_cache_header_t* header; /* cache header (stored in SHM) */ apc_cache_slot_t** slots; /* array of cache slots (stored in SHM) */ apc_sma_t* sma; /* set during creation of cache */ apc_serializer_t* serializer; /* serializer */ zend_ulong nslots; /* number of slots in cache */ zend_ulong gc_ttl; /* maximum time on GC list for a slot */ zend_ulong ttl; /* if slot is needed and entry's access time is older than this ttl, remove it */ zend_ulong smart; /* smart parameter for gc */ zend_bool defend; /* defense parameter for runtime */ } apc_cache_t; /* }}} */ Whenever you see functions that take a 'cache' argument, this is what they take. At the beginning of the cache we have a header. The header looks like this: /* {{{ struct definition: apc_cache_header_t Any values that must be shared among processes should go in here. */ typedef struct _apc_cache_header_t { apc_lock_t lock; /* header lock */ zend_ulong nhits; /* hit count */ zend_ulong nmisses; /* miss count */ zend_ulong ninserts; /* insert count */ zend_ulong nexpunges; /* expunge count */ zend_ulong nentries; /* entry count */ zend_ulong mem_size; /* used */ time_t stime; /* start time */ volatile zend_ushort state; /* cache state */ apc_cache_key_t lastkey; /* last key inserted (not necessarily without error) */ apc_cache_slot_t* gc; /* gc list */ } apc_cache_header_t; /* }}} */ Since this is at the start of the shared memory segment, these values are accessible across all processes / threads and hence access to them has to be locked. After the header we have an array of slots. The number of slots is user-defined through the apc.entries_hint ini hint. Each slot is described by: /* {{{ struct definition: apc_cache_slot_t */ typedef struct apc_cache_slot_t apc_cache_slot_t; struct apc_cache_slot_t { apc_cache_key_t key; /* slot key */ apc_cache_entry_t* value; /* slot value */ apc_cache_slot_t* next; /* next slot in linked list */ zend_ulong nhits; /* number of hits to this slot */ time_t ctime; /* time slot was initialized */ time_t dtime; /* time slot was removed from cache */ time_t atime; /* time slot was last accessed */ }; /* }}} */ The apc_cache_slot_t *next there is a linked list to other slots that happened to hash to the same array position. apc_cache_insert() shows what happens on a new cache insert. slot = &cache->slots[zend_inline_hash_func(key, keylen) % cache->nslots]; cache->slots is our array of slots in the segment. So, on an insert we find the array position in the slots array by hashing the key provided. If there are currently no other slots there, we just create the slot and stick it into the array: *slot = make_slot(cache, key, value, *slot, t TSRMLS_CC) If there are other slots already at this position we walk the link list to get to the end. While walking the linked list we also check to see if the cache has a TTL defined. If while walking the linked list we see a slot that has expired, we remove it since we are right there looking at it. This is the only place we remove stale entries unless the shared memory segment fills up and we force a full expunge via apc_cache_expunge(). apc_cache_expunge() walks all slots attempting deletion, how deletion occurs depends on runtime parameters, see INSTALL for runtime parameter configuration details. apc_cache_find() simply hashes and returns the entry if it is there. If it is there but older than the mtime in the entry we are looking for, we delete the one that is there and return indicating we didn't find it. API users are advised to use apc_cache_fetch over find for simplicity, this ensures correct operation, fetch sets up the call to find and takes care of copying and releasing the entry from the cache to a zval* provided. Next we need to understand what an actual cache entry looks like. Have a look at apc_cache.h for the structs. Here is the definition of apc_cache_key_t: /* {{{ struct definition: apc_cache_key_t */ typedef struct _apc_cache_key_t { const char *str; /* pointer to constant string key */ zend_uint len; /* length of data at str */ zend_ulong h; /* pre-computed hash of key */ time_t mtime; /* the mtime of this cached entry */ apc_cache_owner_t owner; /* the context that created this key */ } apc_cache_key_t; /* }}} */ To create a apc_cache_key_t structure, call apc_cache_make_key(), see apc_cache.h Ok, on to the actual cache entry, here is the definition of apc_cache_entry_t: /* {{{ struct definition: apc_cache_entry_t */ typedef struct _apc_cache_entry_t { zval *val; /* the zval copied at store time */ zend_uint ttl; /* the ttl on this specific entry */ int ref_count; /* the reference count of this entry */ size_t mem_size; /* memory used */ apc_pool *pool; /* pool which allocated the value */ } apc_cache_entry_t; /* }}} */ To create an apc_cache_entry_t, call apc_cache_make_entry(), see apc_cache.h Any of the structures taken by apc_cache_* functions have their equivalent apc_cache_make_* If an insertion of an entry should fail, it falls to the caller of insert to free the pooled resources used to create the entry. If you made it to the end of this, you should have a pretty good idea of where things are in the code. There is much more reading to do in headers ... good luck ...