注册 登录  
 加关注
   显示下一条  |  关闭
温馨提示!由于新浪微博认证机制调整,您的新浪微博帐号绑定已过期,请重新绑定!立即重新绑定新浪微博》  |  关闭

gmd20的个人空间

// 编程和生活

 
 
 

日志

 
 

Concurrency in the Kernel 内核的并发控制?  

2009-03-31 21:57:53|  分类: linux相关 |  标签: |举报 |字号 订阅

  下载LOFTER 我的照片书  |

Essential Linux Device Drivers》一书关于内核同步的章节,是我见过的写的最清楚的,真是好书啊,其他的也都写的非常好。本人转载仅仅是为了便于以后查看方便^_^

Concurrency in the Kernel

With the arrival of multicore laptops, Symmetric Multi Processing (SMP) is no longer confined to the realm of hi-tech users. SMP and kernel preemption are scenarios that generate multiple threads of execution. These threads can simultaneously operate on shared kernel data structures. Because of this, accesses to such data structures have to be serialized.

Let's discuss the basics of protecting shared kernel resources from concurrent access. We start with a simple example and gradually introduce complexities such as interrupts, kernel preemption, and SMP.

Spinlocks and Mutexes

A code area that accesses shared resources is called a critical section. Spinlocks and mutexes (short for mutual exclusion) are the two basic mechanisms used to protect critical sections in the kernel. Let's look at each in turn.

A spinlock ensures that only a single thread enters a critical section at a time. Any other thread that desires to enter the critical section has to remain spinning at the door until the first thread exits. Note that we use the term thread to refer to a thread of execution, rather than a kernel thread.

The basic usage of spinlocks is as follows:

#include <linux/spinlock.h>spinlock_t mylock = SPIN_LOCK_UNLOCKED; /* Initialize *//* Acquire the spinlock. This is inexpensive if there * is no one inside the critical section. In the face of * contention, spinlock() has to busy-wait. */spin_lock(&mylock);/* ... Critical Section code ... */spin_unlock(&mylock); /* Release the lock */

In contrast to spinlocks that put threads into a spin if they attempt to enter a busy critical section, mutexes put contending threads to sleep until it's their turn to occupy the critical section. Because it's a bad thing to consume processor cycles to spin, mutexes are more suitable than spinlocks to protect critical sections when the estimated wait time is long. In mutex terms, anything more than two context switches is considered long, because a mutex has to switch out the contending thread to sleep, and switch it back in when it's time to wake it up.

In many cases, therefore, it's easy to decide whether to use a spinlock or a mutex:

  • If the critical section needs to sleep, you have no choice but to use a mutex. It's illegal to schedule, preempt, or sleep on a wait queue after acquiring a spinlock.

  • Because mutexes put the calling thread to sleep in the face of contention, you have no choice but to use spinlocks inside interrupt handlers. (You will learn more about the constraints of the interrupt context in Chapter 4.)

Basic mutex usage is as follows:

#include <linux/mutex.h>/* Statically declare a mutex. To dynamically   create a mutex, use mutex_init() */static DEFINE_MUTEX(mymutex);/* Acquire the mutex. This is inexpensive if there * is no one inside the critical section. In the face of * contention, mutex_lock() puts the calling thread to sleep. */mutex_lock(&mymutex);/* ... Critical Section code ... */mutex_unlock(&mymutex);      /* Release the mutex */

To illustrate the use of concurrency protection, let's start with a critical section that is present only in process context and gradually introduce complexities in the following order:

  1. Critical section present only in process context on a Uniprocessor (UP) box running a nonpreemptible kernel.

  2. Critical section present in process and interrupt contexts on a UP machine running a nonpreemptible kernel.

  3. Critical section present in process and interrupt contexts on a UP machine running a preemptible kernel.

  4. Critical section present in process and interrupt contexts on an SMP machine running a preemptible kernel.

The Old Semaphore Interface

The mutex interface, which replaces the older semaphore interface, originated in the –rt tree and was merged into the mainline with the 2.6.16 kernel release. The semaphore interface is still around, however. Basic usage of the semaphore interface is as follows:

#include <asm/semaphore.h>  /* Architecture dependent                               header *//* Statically declare a semaphore. To dynamically   create a semaphore, use init_MUTEX() */static DECLARE_MUTEX(mysem);down(&mysem);    /* Acquire the semaphore *//* ... Critical Section code ... */up(&mysem);      /* Release the semaphore */

Semaphores can be configured to allow a predetermined number of threads into the critical section simultaneously. However, semaphores that permit more than a single holder at a time are rarely used.


Case 1: Process Context, UP Machine, No Preemption

This is the simplest case and needs no locking, so we won't discuss this further.

Case 2: Process and Interrupt Contexts, UP Machine, No Preemption

In this case, you need to disable only interrupts to protect the critical region. To see why, assume that A and B are process context threads, and C is an interrupt context thread, all vying to enter the same critical section, as shown in Figure 2.4.

Figure 2.4. Process and interrupt context threads inside a critical section.

Concurrency in the Kernel 内核的并发控制? - widebright - widebright的个人空间


Because Thread C is executing in interrupt context and always runs to completion before yielding to Thread A or Thread B, it need not worry about protection. Thread A, for its part, need not be concerned about Thread B (and vice versa) because the kernel is not preemptible. Thus, Thread A and Thread B need to guard against only the possibility of Thread C stomping through the critical section while they are inside the same section. They achieve this by disabling interrupts prior to entering the critical section:

Point A:  local_irq_disable();  /* Disable Interrupts in local CPU */  /* ... Critical Section ...  */  local_irq_enable();   /* Enable Interrupts in local CPU */

However, if interrupts were already disabled when execution reached Point A, local_irq_enable() creates the unpleasant side effect of reenabling interrupts, rather than restoring interrupt state. This can be fixed as follows:

unsigned long flags;Point A:  local_irq_save(flags);     /* Disable Interrupts */  /* ... Critical Section ... */  local_irq_restore(flags);  /* Restore state to what                                it was at Point A */

This works correctly irrespective of the interrupt state at Point A.

Case 3: Process and Interrupt Contexts, UP Machine, Preemption

If preemption is enabled, mere disabling of interrupts won't protect your critical region from being trampled over. There is the possibility of multiple threads simultaneously entering the critical section in process context. Referring back to Figure 2.4 in this scenario, Thread A and Thread B now need to protect themselves against each other in addition to guarding against Thread C. The solution apparently, is to disable kernel preemption before the start of the critical section and reenable it at the end, in addition to disabling/reenabling interrupts. For this, Thread A and Thread B use the irq variant of spinlocks:

unsigned long flags;Point A:  /* Save interrupt state.   * Disable interrupts - this implicitly disables preemption */  spin_lock_irqsave(&mylock, flags);  /* ... Critical Section ... */  /* Restore interrupt state to what it was at Point A */  spin_unlock_irqrestore(&mylock, flags);

Preemption state need not be explicitly restored to what it was at Point A because the kernel internally does that for you via a variable called the preemption counter. The counter gets incremented whenever preemption is disabled (using preempt_disable()) and gets decremented whenever preemption is enabled (using preempt_enable()). Preemption kicks in only if the counter value is zero.

Case 4: Process and Interrupt Contexts, SMP Machine, Preemption

Let's now assume that the critical section executes on an SMP machine. Your kernel has been configured with CONFIG_SMP and CONFIG_PREEMPT turned on.

In the scenarios discussed this far, spinlock primitives have done little more than enable/disable preemption and interrupts. The actual locking functionality has been compiled away. In the presence of SMP, the locking logic gets compiled in, and the spinlock primitives are rendered SMP-safe. The SMP-enabled semantics is as follows:

unsigned long flags;Point A:  /*    - Save interrupt state on the local CPU    - Disable interrupts on the local CPU. This implicitly disables      preemption.    - Lock the section to regulate access by other CPUs   */  spin_lock_irqsave(&mylock, flags);  /* ... Critical Section ... */  /*    - Restore interrupt state and preemption to what it      was at Point A for the local CPU    - Release the lock   */  spin_unlock_irqrestore(&mylock, flags);

On SMP systems, only interrupts on the local CPU are disabled when a spinlock is acquired. So, a process context thread (say Thread A in Figure 2.4) might be running on one CPU, while an interrupt handler (say Thread C in Figure 2.4) is executing on another CPU. An interrupt handler on a nonlocal processor thus needs to spin-wait until the process context code on the local processor exits the critical section. The interrupt context code calls spin_lock()/spin_unlock() to do this:

spin_lock(&mylock);/* ... Critical Section ... */spin_unlock(&mylock);

Similar to the irq variants, spinlocks also have bottom half (BH) flavors. spin_lock_bh() disables bottom halves when the lock is acquired, whereas spin_unlock_bh() reenables bottom halves when the lock is released. We discuss bottom halves in Chapter 4.

The –rt tree

The real time (-rt) tree, also called the CONFIG_PREEMPT_RT patch-set, implements low-latency modifications to the kernel. The patch-set, downloadable from www.kernel.org/pub/linux/kernel/projects/rt, allows most of the kernel to be preempted, partly by replacing many spinlocks with mutexes. It also incorporates high-resolution timers. Several -rt features have been integrated into the mainline kernel. You will find detailed documentation at the project's wiki hosted at http://rt.wiki.kernel.org/.


The kernel has specialized locking primitives in its repertoire that help improve performance under specific conditions. Using a mutual-exclusion scheme tailored to your needs makes your code more powerful. Let's take a look at some of the specialized exclusion mechanisms.

Atomic Operators

Atomic operators are used to perform lightweight one-shot operations such as bumping counters, conditional increments, and setting bit positions. Atomic operations are guaranteed to be serialized and do not need locks for protection against concurrent access. The implementation of atomic operators is architecture-dependent.

To check whether there are any remaining data references before freeing a kernel network buffer (called an skbuff), the skb_release_data() routine defined in net/core/skbuff.c does the following:

if (!skb->cloned ||  /* Atomically decrement and check if the returned value is zero */    !atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 :                       1,&skb_shinfo(skb)->dataref)) {  /* ... */  kfree(skb->head);}

While skb_release_data() is thus executing, another thread using skbuff_clone() (defined in the same file) might be simultaneously incrementing the data reference counter:

/* ... *//* Atomically bump up the data reference count */atomic_inc(&(skb_shinfo(skb)->dataref));/* ... */

The use of atomic operators protects the data reference counter from being trampled by these two threads. It also eliminates the hassle of using locks to protect a single integer variable from concurrent access.

The kernel also supports operators such as set_bit(), clear_bit(), and test_and_set_bit() to atomically engage in bit manipulations. Look at include/asm-your-arch/atomic.h for the atomic operators supported on your architecture.

Reader-Writer Locks

Another specialized concurrency regulation mechanism is a reader-writer variant of spinlocks. If the usage of a critical section is such that separate threads either read from or write to a shared data structure, but don't do both, these locks are a natural fit. Multiple reader threads are allowed inside a critical region simultaneously. Reader spinlocks are defined as follows:

rwlock_t myrwlock = RW_LOCK_UNLOCKED;read_lock(&myrwlock);     /* Acquire reader lock *//* ... Critical Region ... */read_unlock(&myrwlock);   /* Release lock */

However, if a writer thread enters a critical section, other reader or writer threads are not allowed inside. To use writer spinlocks, you would write this:

rwlock_t myrwlock = RW_LOCK_UNLOCKED;write_lock(&myrwlock);    /* Acquire writer lock *//* ... Critical Region ... */write_unlock(&myrwlock);  /* Release lock */

Look at the IPX routing code present in net/ipx/ipx_route.c for a real-life example of a reader-writer spinlock. A reader-writer lock called ipx_routes_lock protects the IPX routing table from simultaneous access. Threads that need to look up the routing table to forward packets request reader locks. Threads that need to add or delete entries from the routing table acquire writer locks. This improves performance because there are usually far more instances of routing table lookups than routing table updates.

Like regular spinlocks, reader-writer locks also have corresponding irq variants—namely, read_lock_irqsave(), read_lock_irqrestore(), write_lock_irqsave(), and write_lock_irqrestore(). The semantics of these functions are similar to those of regular spinlocks.

Sequence locks or seqlocks, introduced in the 2.6 kernel, are reader-writer locks where writers are favored over readers. This is useful if write operations on a variable far outnumber read accesses. An example is the jiffies_64 variable discussed earlier in this chapter. Writer threads do not wait for readers who may be inside a critical section. Because of this, reader threads may discover that their entry inside a critical section has failed and may need to retry:

u64 get_jiffies_64(void) /* Defined in kernel/time.c */{  unsigned long seq;  u64 ret;  do {    seq = read_seqbegin(&xtime_lock);    ret = jiffies_64;  } while (read_seqretry(&xtime_lock, seq));  return ret;}

Writers protect critical regions using write_seqlock() and write_sequnlock().

The 2.6 kernel introduced another mechanism called Read-Copy Update (RCU), which yields improved performance when readers far outnumber writers. The basic idea is that reader threads can execute without locking. Writer threads are more complex. They perform update operations on a copy of the data structure and replace the pointer that readers see. The original copy is maintained until the next context switch on all CPUs to ensure completion of all ongoing read operations. Be aware that using RCU is more involved than using the primitives discussed thus far and should be used only if you are sure that it's the right tool for the job. RCU data structures and interface functions are defined in include/linux/rcupdate.h. There is ample documentation in Documentation/RCU/*.

For an RCU usage example, look at fs/dcache.c. On Linux, each file is associated with directory entry information (stored in a structure called dentry), metadata information (stored in an inode), and actual data (stored in data blocks). Each time you operate on a file, the components in the file path are parsed, and the corresponding dentries are obtained. The dentries are kept cached in a data structure called the dcache, to speed up future operations. At any time, the number of dcache lookups is much more than dcache updates, so references to the dcache are protected using RCU primitives.

Debugging

Concurrency-related problems are generally hard to debug because they are usually difficult to reproduce. It's a good idea to enable SMP (CONFIG_SMP) and preemption (CONFIG_PREEMPT) while compiling and testing your code, even if your production kernel is going to run on a UP machine with preemption disabled. There is a kernel configuration option under Kernel hacking called Spinlock and rw-lock debugging (CONFIG_DEBUG_SPINLOCK) that can help you catch some common spinlock errors. Also available are tools such as lockmeter (http://oss.sgi.com/projects/lockmeter/) that collect lock-related statistics.

A common concurrency problem occurs when you forget to lock an access to a shared resource. This results in different threads "racing" through that access in an unregulated manner. The problem, called a race condition, might manifest in the form of occasional strange code behavior.

Another potential problem arises when you miss releasing held locks in certain code paths, resulting in deadlocks. To understand this, consider the following example:

spin_lock(&mylock);     /* Acquire lock *//* ... Critical Section ... */if (error) {            /* This error condition occurs rarely */  return -EIO; /* Forgot to release the lock! */}spin_unlock(&mylock);   /* Release lock */

After the occurrence of the error condition, any thread trying to acquire mylock gets deadlocked, and the kernel might freeze.

If the problem first manifests months or years after you write the code, it'll be all the more tough to go back and debug it. (There is a related debugging example in the section "Kdump" in Chapter 21, "Debugging Device Drivers.") To avoid such unpleasant encounters, concurrency logic should be designed when you architect your software.

Previous Page Next Page
  评论这张
 
阅读(833)| 评论(0)
推荐 转载

历史上的今天

评论

<#--最新日志,群博日志--> <#--推荐日志--> <#--引用记录--> <#--博主推荐--> <#--随机阅读--> <#--首页推荐--> <#--历史上的今天--> <#--被推荐日志--> <#--上一篇,下一篇--> <#-- 热度 --> <#-- 网易新闻广告 --> <#--右边模块结构--> <#--评论模块结构--> <#--引用模块结构--> <#--博主发起的投票-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 

页脚

网易公司版权所有 ©1997-2017