[Repost] An Analysis of the Kernel Wait Queue Mechanism

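1. Wait queue data structures

A wait queue is implemented as a doubly linked list whose elements contain pointers to process descriptors. Every wait queue has a wait queue head, a data structure of type wait_queue_head_t: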

struct __wait_queue_head {
    spinlock_t lock;
    struct list_head task_list;
};

typedef struct __wait_queue_head wait_queue_head_t;

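Here, lock protects the queue against concurrent access, and the task_list field is the head of the list of waiting processes.

The elements of a wait queue list are of type wait_queue_t; we will call them wait queue entries: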

struct __wait_queue {
    unsigned int flags;
#define WQ_FLAG_EXCLUSIVE 0x01
    void *private;
    wait_queue_func_t func;
    struct list_head task_list;
};

typedef struct __wait_queue wait_queue_t;

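Each wait queue entry represents a sleeping process that is waiting for some event to occur. The address of its descriptor is normally stored in the private field. The task_list field contains the pointers that link the entry into the list of processes waiting for the same event.

The flags field records whether the sleeping process is an exclusive waiter (WQ_FLAG_EXCLUSIVE set) or a non-exclusive one, and the func field points to the function that is called to wake the process up.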
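To make the layout concrete, here is a minimal user-space sketch of how a head and its entries are chained through the embedded list_head. It is only a model: the names wq_head, wq_entry, entry_of and list_add_head are made up for this illustration, the spinlock is left out, and private holds a string instead of a task descriptor.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

struct wq_head { struct list_head task_list; };   /* lock omitted in this model */

struct wq_entry {
    unsigned int flags;
    void *private;                 /* in the kernel: the task descriptor */
    struct list_head task_list;    /* links the entry into the queue */
};

/* Recover the enclosing entry from a pointer to its embedded list node. */
#define entry_of(ptr) \
    ((struct wq_entry *)((char *)(ptr) - offsetof(struct wq_entry, task_list)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* Insert n right after head, the way the kernel's list_add() does. */
static void list_add_head(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Walk the circular list until we are back at the head. */
static int count_waiters(struct wq_head *q)
{
    int n = 0;
    for (struct list_head *p = q->task_list.next; p != &q->task_list; p = p->next)
        n++;
    return n;
}

/* Queue two waiters; return 1 if the list holds both and the first
 * node maps back to the entry that was added last. */
static int demo_wait_queue_layout(void)
{
    struct wq_head q;
    struct wq_entry a = { 0, "task A", { NULL, NULL } };
    struct wq_entry b = { 0, "task B", { NULL, NULL } };

    list_init(&q.task_list);
    list_add_head(&a.task_list, &q.task_list);
    list_add_head(&b.task_list, &q.task_list);

    return count_waiters(&q) == 2 && entry_of(q.task_list.next) == &b;
}
```

The point of entry_of is the same as the kernel's list_entry: the list only stores pointers to the embedded task_list nodes, and the surrounding entry is found by subtracting the member offset.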

The figure in the original post shows the overall structure of a wait queue.

Next, let us look at how wait queues work.

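2. Sleeping on a wait queue

Before using a wait queue you normally define a wait queue head, for example static wait_queue_head_t wq, and initialise it with init_waitqueue_head(&wq) (or define and initialise it in one step with DECLARE_WAIT_QUEUE_HEAD(wq)). You then call one of the wait_event_* macros, which inserts the current process, waiting for some condition, into the wait queue wq and puts it to sleep. Once condition is satisfied, one or all of the processes sleeping on wq are woken up.

There is not much to say about defining the head, so the analysis starts from the call to wait_event_*.

We take the commonly used wait_event_interruptible as the example: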

/**
 * wait_event_interruptible - sleep until a condition gets true
 * @wq: the waitqueue to wait on
 * @condition: a C expression for the event to wait for
 *
 * The process is put to sleep (TASK_INTERRUPTIBLE) until the
 * @condition evaluates to true or a signal is received.
 * The @condition is checked each time the waitqueue @wq is woken up.
 *
 * wake_up() has to be called after changing any variable that could
 * change the result of the wait condition.
 *
 * The function will return -ERESTARTSYS if it was interrupted by a
 * signal and 0 if @condition evaluated to true.
 */
#define wait_event_interruptible(wq, condition) \
({ \
    int __ret = 0; \
    if (!(condition)) \
        __wait_event_interruptible(wq, condition, __ret); \
    __ret; \
})

This part is simple: it checks whether condition already holds and, if it does not, invokes the __wait_event_interruptible macro.

#define __wait_event_interruptible(wq, condition, ret) \
do { \
    DEFINE_WAIT(__wait); \
    for (;;) { \
        prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
        if (condition) \
            break; \
        if (!signal_pending(current)) { \
            schedule(); \
            continue; \
        } \
        ret = -ERESTARTSYS; \
        break; \
    } \
    finish_wait(&wq, &__wait); \
} while (0)

__wait_event_interruptible first defines a wait queue entry __wait of type wait_queue_t:

#define DEFINE_WAIT(name) \
    wait_queue_t name = { \
        .private = current, \
        .func = autoremove_wake_function, \
        .task_list = LIST_HEAD_INIT((name).task_list), \
    }

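As you can see, the private member of __wait (normally used to hold the process descriptor) is initialised to current, which ties this wait queue entry to the current process. The func member is the entry's wake-up function. It is called by the waker when the queue is woken up, and here it is initialised to the default autoremove_wake_function.

Then, inside a for (;;) loop, prepare_to_wait is called: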

void fastcall prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state)
{
    unsigned long flags;

    wait->flags &= ~WQ_FLAG_EXCLUSIVE;
    spin_lock_irqsave(&q->lock, flags);
    if (list_empty(&wait->task_list))
        __add_wait_queue(q, wait);
    /*
     * don't alter the task state if this is just going to
     * queue an async wait queue callback
     */
    if (is_sync_wait(wait))
        set_current_state(state);
    spin_unlock_irqrestore(&q->lock, flags);
}

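prepare_to_wait does two things: it inserts the previously defined wait queue entry __wait into the wait queue headed by wq (unless the entry is already queued), and it sets the state of the current process to TASK_INTERRUPTIBLE. As soon as prepare_to_wait returns, condition is checked once more; if it happens to be true by now, there is no need to sleep. If it is still false, the process gets ready to sleep.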
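Because prepare_to_wait runs on every pass of the for (;;) loop, the list_empty() test is what keeps the entry from being queued twice. The following user-space sketch models only that behaviour; model_task, model_entry, model_prepare_to_wait and the MODEL_* constants are hypothetical names, and the locking and the is_sync_wait() check are left out.

```c
#include <assert.h>

struct list_head { struct list_head *next, *prev; };

/* A node that points at itself is not on any list. */
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

enum { MODEL_RUNNING = 0, MODEL_INTERRUPTIBLE = 1 };   /* stand-ins for TASK_* */

struct model_task  { int state; };
struct model_entry { struct model_task *task; struct list_head task_list; };

/* Model of prepare_to_wait(): enqueue only if not queued yet, then set
 * the task state. The spinlock is omitted in this model. */
static void model_prepare_to_wait(struct list_head *head,
                                  struct model_entry *w, int state)
{
    if (list_empty(&w->task_list))
        list_add(&w->task_list, head);
    w->task->state = state;
}

static int queue_length(const struct list_head *head)
{
    int n = 0;
    for (const struct list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}

/* Call the model twice, as two loop iterations would. Returns the queue
 * length and reports the resulting task state through state_out. */
static int demo_prepare_twice(int *state_out)
{
    struct list_head head = { &head, &head };
    struct model_task me = { MODEL_RUNNING };
    /* Same self-pointing initialisation that LIST_HEAD_INIT performs. */
    struct model_entry w = { &me, { &w.task_list, &w.task_list } };

    model_prepare_to_wait(&head, &w, MODEL_INTERRUPTIBLE);
    model_prepare_to_wait(&head, &w, MODEL_INTERRUPTIBLE);

    *state_out = me.state;
    return queue_length(&head);
}
```

After two calls the queue still holds a single entry, while the task state has been set on both calls, which matches what the loop relies on.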

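Sleeping is done by calling schedule(). Because the process state has already been set to TASK_INTERRUPTIBLE, once schedule() switches to another process this one will not be scheduled again until it is woken up, that is, until its state is changed back to TASK_RUNNING.

Before calling schedule(), the loop checks whether a signal is pending. If one is, it stops waiting and the macro returns -ERESTARTSYS. If not, it calls schedule() and goes to sleep.

The purpose of the for (;;) loop is to check condition again each time the process is woken up. When several processes on the queue are woken together, another one may grab the resource first and make it unavailable again, so the condition has to be tested once more. (The kernel also provides a way to wake only one or a limited number of exclusive waiters; interested readers can consult the relevant literature.)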
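The control flow of the loop can be tried out in user space with a stubbed schedule(). The sketch below is a model under stated assumptions, not kernel code: model_ctx, model_schedule and model_wait_event_interruptible are invented names, lost_races says how many wake-ups find the resource already taken by someone else, signal_at says on which wake-up a signal arrives (0 for never), and MODEL_ERESTARTSYS copies the value 512 that the kernel uses internally for ERESTARTSYS.

```c
#include <assert.h>

#define MODEL_ERESTARTSYS 512   /* same numeric value as the kernel's ERESTARTSYS */

struct model_ctx {
    int resource;         /* the "condition" is resource > 0 */
    int signal_pending;   /* nonzero once a signal has arrived */
    int wakeups;          /* how many times the stubbed schedule() returned */
    int lost_races;       /* wake-ups that find the resource already taken */
    int signal_at;        /* wake-up number that is caused by a signal, 0 = never */
};

/* Stand-in for schedule(): "sleeps", then returns as if the task was woken. */
static void model_schedule(struct model_ctx *c)
{
    c->wakeups++;
    if (c->wakeups == c->signal_at)
        c->signal_pending = 1;      /* woken by a signal, not by the event */
    else if (c->wakeups > c->lost_races)
        c->resource = 1;            /* the resource really is available now */
    /* otherwise another waiter got there first; the condition stays false */
}

/* Same control flow as wait_event_interruptible and
 * __wait_event_interruptible, minus queueing and task state changes. */
static int model_wait_event_interruptible(struct model_ctx *c)
{
    int ret = 0;

    if (c->resource > 0)            /* fast path: no need to sleep at all */
        return 0;

    for (;;) {
        if (c->resource > 0)        /* re-check after every wake-up */
            break;
        if (!c->signal_pending) {
            model_schedule(c);
            continue;
        }
        ret = -MODEL_ERESTARTSYS;   /* interrupted by a signal */
        break;
    }
    return ret;
}
```

With lost_races set to 1 the model needs two wake-ups before it returns 0, which is the situation the re-check guards against. With signal_at set to 1 it returns -MODEL_ERESTARTSYS without the condition ever becoming true.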

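After the process has been woken up, the last step is to call finish_wait(&wq, &__wait) to clean up. finish_wait sets the process state back to TASK_RUNNING and removes the entry from the wait queue if it is still queued.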

void fastcall finish_wait(wait_queue_head_t *q, wait_queue_t *wait)
{
    unsigned long flags;

    __set_current_state(TASK_RUNNING);
    if (!list_empty_careful(&wait->task_list)) {
        spin_lock_irqsave(&q->lock, flags);
        list_del_init(&wait->task_list);
        spin_unlock_irqrestore(&q->lock, flags);
    }
}

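Execution then returns to the point where the call to wait_event_interruptible(wq, condition) blocked and continues from there.

3. Waking up a wait queue

So far we have seen how a process goes to sleep on a wait queue. Now we analyse how it is woken up.

Using a wait queue presupposes that something will wake it up. If nothing ever does, every process sleeping on that queue will simply sleep forever.

In a device driver, the device's wait queue is usually woken from the interrupt handler. A driver typically provides its own read and write wait queues to implement the blocking and non-blocking (O_NONBLOCK) operations that user space expects. When the device resource becomes available, the driver wakes up any processes sleeping on its read or write wait queue.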
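The decision a driver's read path makes can be modelled in a few lines of user-space C. Everything here is hypothetical (model_dev, model_read, model_irq and MODEL_WOULD_SLEEP are not kernel names); the comments mark the places where a real driver would call wait_event_interruptible() and wake_up_interruptible().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

enum { MODEL_WOULD_SLEEP = 1 };   /* hypothetical marker: the caller would block here */

struct model_dev {
    int data_ready;   /* set by the interrupt handler model */
    int sleepers;     /* readers parked on the read wait queue */
};

/* What a read does when called with the given open-file flags. */
static int model_read(struct model_dev *d, int f_flags)
{
    if (d->data_ready) {
        d->data_ready = 0;        /* consume the data */
        return 0;
    }
    if (f_flags & O_NONBLOCK)
        return -EAGAIN;           /* non-blocking caller: ask it to try again */
    d->sleepers++;                /* real driver: wait_event_interruptible() */
    return MODEL_WOULD_SLEEP;
}

/* Interrupt handler model: publish the data first, then wake the sleepers.
 * Returns how many sleepers were woken. */
static int model_irq(struct model_dev *d)
{
    int woken = d->sleepers;

    d->data_ready = 1;
    d->sleepers = 0;              /* real driver: wake_up_interruptible() */
    return woken;
}
```

The order in model_irq mirrors the rule quoted in the wait_event_interruptible comment: the variable that the condition depends on is changed before the wake-up is issued.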

A wait queue is woken with one of the wake_up_* functions. Here we analyse the matching wake_up_interruptible, which is defined as follows:

#define wake_up_interruptible(x) __wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)

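The parameter x is the head of the wait queue to wake up. The call wakes processes in the TASK_INTERRUPTIBLE state: all non-exclusive waiters on the queue plus one exclusive waiter.

__wake_up is defined as follows: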

/**
 * __wake_up - wake up threads blocked on a waitqueue.
 * @q: the waitqueue
 * @mode: which threads
 * @nr_exclusive: how many wake-one or wake-many threads to wake up
 * @key: is directly passed to the wakeup function
 */
void fastcall __wake_up(wait_queue_head_t *q, unsigned int mode,
                        int nr_exclusive, void *key)
{
    unsigned long flags;

    spin_lock_irqsave(&q->lock, flags);
    __wake_up_common(q, mode, nr_exclusive, 1, key);
    spin_unlock_irqrestore(&q->lock, flags);
    preempt_check_resched_delayed();
}

__wake_up takes the queue lock and simply calls __wake_up_common to do the actual wake-up work.

__wake_up_common is defined as follows:

/*
 * The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just
 * wake everything up. If it's an exclusive wakeup (nr_exclusive == small +ve
 * number) then we wake all the non-exclusive tasks and one exclusive task.
 *
 * There are circumstances in which we can try to wake a task which has already
 * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
 * zero in this (rare) case, and we handle it by continuing to scan the queue.
 */
static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
                             int nr_exclusive, int sync, void *key)
{
    struct list_head *tmp, *next;

    list_for_each_safe(tmp, next, &q->task_list) {
        wait_queue_t *curr = list_entry(tmp, wait_queue_t, task_list);
        unsigned flags = curr->flags;

        if (curr->func(curr, mode, sync, key) &&
            (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
}

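__wake_up_common walks through the entries of the wait queue and calls the wake-up function of each one, stopping once nr_exclusive exclusive waiters have been woken.

The wake-up function here is autoremove_wake_function, which DEFINE_WAIT(__wait) installed by default when the entry was defined. autoremove_wake_function ends up calling try_to_wake_up, which sets the process state to TASK_RUNNING, and then removes the entry from the queue. The scheduler can now pick the process again, so it is woken up and continues to run.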
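The effect of nr_exclusive is easiest to see by running the same loop on a small queue. This is a user-space sketch: model_entry, model_autoremove_wake, model_wake_up_common and demo_wake are hypothetical names, the mode, sync and key arguments are dropped, and the demo assumes the exclusive waiters sit behind the non-exclusive ones in the list.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_EXCLUSIVE 0x01   /* mirrors WQ_FLAG_EXCLUSIVE */

struct list_head { struct list_head *next, *prev; };

struct model_entry;
typedef int (*model_wake_fn)(struct model_entry *e);

struct model_entry {
    unsigned int flags;
    int woken;
    model_wake_fn func;
    struct list_head task_list;
};

#define entry_of(p) \
    ((struct model_entry *)((char *)(p) - offsetof(struct model_entry, task_list)))

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;
}

/* Model of autoremove_wake_function(): wake the task, then unlink the entry. */
static int model_autoremove_wake(struct model_entry *e)
{
    e->woken = 1;
    list_del_init(&e->task_list);
    return 1;
}

/* Model of __wake_up_common(); returns how many callbacks succeeded. */
static int model_wake_up_common(struct list_head *head, int nr_exclusive)
{
    struct list_head *tmp, *next;
    int woken = 0;

    /* "next" is saved up front because the callback may unlink "tmp". */
    for (tmp = head->next, next = tmp->next; tmp != head;
         tmp = next, next = tmp->next) {
        struct model_entry *curr = entry_of(tmp);
        unsigned int flags = curr->flags;
        int ok = curr->func(curr);

        woken += ok;
        if (ok && (flags & MODEL_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
    return woken;
}

/* Queue two non-exclusive waiters followed by two exclusive ones, then wake. */
static int demo_wake(int nr_exclusive)
{
    struct list_head head = { &head, &head };
    struct model_entry e[4];

    for (int i = 0; i < 4; i++) {
        e[i].flags = (i >= 2) ? MODEL_EXCLUSIVE : 0;
        e[i].woken = 0;
        e[i].func = model_autoremove_wake;
        list_add_tail(&e[i].task_list, &head);
    }
    return model_wake_up_common(&head, nr_exclusive);
}
```

In this model demo_wake(1) wakes three entries (both non-exclusive waiters and the first exclusive one), while demo_wake(0) never reaches zero in the decrement test and therefore wakes all four. Saving next before the callback runs is also why the kernel loop uses list_for_each_safe.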

References:

1) Understanding the Linux Kernel, 3rd Edition, O'Reilly, November 2005

2) Linux 2.6.18_Pro500 (MontaVista)
