Linux Systemcall By INT 0x80、Llinux Kernel Debug Based On Sourcecode

作者: .Little Hann
发布时间:2015-07-01 16:33:49

目录

1. 系统调用简介  2. 系统调用跟踪调试  3. 系统调用内核源码分析

 

1. 系统调用简介

关于系统调用的基本原理,请参阅另一篇文章,本文的主要目标是从内核源代码的角度来学习一下系统调用在底层的内核中是如何实现的

Relevant Link:

http://www.cnblogs.com/LittleHann/p/3850653.html  http://www.kerneltravel.net/journal/iv/syscall.htm  http://zjuedward.blog.51cto.com/1445231/465997  http://www.ibm.com/developerworks/cn/Linux/l-system-calls/  http://www.cnblogs.com/LittleHann/p/3850655.html

 

2. 系统调用跟踪调试

我们可以借助一些内核调试工具,来动态跟踪运行路径
code:

#include <syscall.h>  #include <unistd.h>  #include <stdio.h>  #include <sys/types.h>    int main(void)   {      long ID;      ID = getpid();      printf ("getpid()=%ld\n", ID);      return(0);  }

编译:

gcc sys.c -o sys

kdb调试

1. 激活KDB(按下pause键,当然你必须已经给内核打了KDB补丁)  2. 设置内核断点: "bp sys_getpid"  3. 退出kdb: "go"  4. 执行./getpid  5. 进入内核调试状态,执行路径停止在断点sys_getpid处。  6. 在KDB>提示符下,执行bt命令观察堆栈,发现调用的嵌套路径,可以看到在sys_getpid是在内核函数system_call中被嵌套调用的。  7. 在KDB>提示符下,执行rd命令查看寄存器中的数值,可以看到eax中存放的getpid调用号: 0x00000014(=20).  8. 在KDB>提示符下,执行ssb(或ss)命令跟踪内核代码执行路径,可以发现sys_getpid执行后,会返回system_call函数,然后接者转入ret_from_sys_call例程 

程序的执行流程大致可归结为以下几个步骤

1. 该程序调用libc库的封装函数getpid。该封装函数将系统调用号_NR_getpid(第20个)压入EAX寄存器  2. 通过IDT找到0x80中断的地址  3. 调用软中断int 0x80 进入内核态  4. 在内核中首先执行system_call,接着执行根据系统调用号在调用表中查找到的对应的系统调用服务例程sys_getpid  5. 执行sys_getpid服务例程  6. 执行完毕后,转入ret_from_sys_call例程,系统调用中返回

Relevant Link:

ftp://oss.sgi.com/projects/kdb/download/v4.4/  https://www.kernel.org/  http://www.cnblogs.com/shineshqw/articles/2359114.html  http://www.drdobbs.com/open-source/linux-kernel-debugging/184406318

 

3. 系统调用内核源码分析

还是拿文章第2节的getuid()这个小程序作为例如,我们来从源码角度分析一下linux中系统调用的原理

linux-2.6.32.63\arch\x86\kernel

/*  所有系统调用的入口点,参数system_call是所希望激活的系统调用的"调用号"  */  ENTRY(system_call)      RING0_INT_FRAME            # can't unwind into user space anyway      /*      保存orig_eax,这个值就是传入的"系统调用号"              */      pushl %eax            CFI_ADJUST_CFA_OFFSET 4      /*  SAVE_ALL的宏定义如下  他的作用是先把所有寄存器的值压栈,然后在system_call返回之前使用RESTORE_ALL把栈从栈中弹出,在这其中system_call可以根据需要子去使用寄存器的值。任何它调用的c函数都可以从栈中查找到所希望的参数,因为
SAVE_ALL已经把所有寄存器的值都压入栈中了 .macro SAVE_ALL cld PUSH_GS pushl %fs CFI_ADJUST_CFA_OFFSET 4 pushl %es CFI_ADJUST_CFA_OFFSET 4 pushl %ds CFI_ADJUST_CFA_OFFSET 4 pushl %eax CFI_ADJUST_CFA_OFFSET 4 CFI_REL_OFFSET eax, 0 pushl %ebp CFI_ADJUST_CFA_OFFSET 4 CFI_REL_OFFSET ebp, 0 pushl %edi CFI_ADJUST_CFA_OFFSET 4 CFI_REL_OFFSET edi, 0 pushl %esi CFI_ADJUST_CFA_OFFSET 4 CFI_REL_OFFSET esi, 0 pushl %edx CFI_ADJUST_CFA_OFFSET 4 CFI_REL_OFFSET edx, 0 pushl %ecx CFI_ADJUST_CFA_OFFSET 4 CFI_REL_OFFSET ecx, 0 pushl %ebx CFI_ADJUST_CFA_OFFSET 4 CFI_REL_OFFSET ebx, 0 movl $(__USER_DS), %edx movl %edx, %ds movl %edx, %es movl $(__KERNEL_PERCPU), %edx movl %edx, %fs SET_KERNEL_GS %edx .endm
*/ SAVE_ALL GET_THREAD_INFO(%ebp) /* 检查系统调用是否正在被跟踪 */ testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%ebp) jnz syscall_trace_entry cmpl $(nr_syscalls), %eax jae syscall_badsys syscall_call: /* 调用系统函数 sys_call_table也定义在是一张由指向实现各种系统调用的内核函数的函数指针组成的表: linux-2.6.32.63\arch\x86\kernel\syscall_table_32.S ENTRY(sys_call_table) .long sys_restart_syscall /* 0 - old "setup()" system call, used for restarting */ .long sys_exit .long ptregs_fork .long sys_read .long sys_write .long sys_open /* 5 */ .long sys_close .long sys_waitpid .long sys_creat .long sys_link .long sys_unlink /* 10 */ .long ptregs_execve .long sys_chdir .long sys_time .long sys_mknod .long sys_chmod /* 15 */ .long sys_lchown16 .long sys_ni_syscall /* old break syscall holder */ .long sys_stat .long sys_lseek .long sys_getpid /* 20 */ .long sys_mount .long sys_oldumount .long sys_setuid16 .long sys_getuid16 .long sys_stime /* 25 */ .long sys_ptrace .long sys_alarm .long sys_fstat .long sys_pause .long sys_utime /* 30 */ .long sys_ni_syscall /* old stty syscall holder */ .long sys_ni_syscall /* old gtty syscall holder */ .long sys_access .long sys_nice .long sys_ni_syscall /* 35 - old ftime syscall holder */ .long sys_sync .long sys_kill .long sys_rename .long sys_mkdir .long sys_rmdir /* 40 */ .long sys_dup .long sys_pipe .long sys_times .long sys_ni_syscall /* old prof syscall holder */ .long sys_brk /* 45 */ .long sys_setgid16 .long sys_getgid16 .long sys_signal .long sys_geteuid16 .long sys_getegid16 /* 50 */ .long sys_acct .long sys_umount /* recycled never used phys() */ .long sys_ni_syscall /* old lock syscall holder */ .long sys_ioctl .long sys_fcntl /* 55 */ .long sys_ni_syscall /* old mpx syscall holder */ .long sys_setpgid .long sys_ni_syscall /* old ulimit syscall holder */ .long sys_olduname .long sys_umask /* 60 */ .long sys_chroot .long sys_ustat .long sys_dup2 .long sys_getppid .long sys_getpgrp /* 65 */ .long sys_setsid .long sys_sigaction .long sys_sgetmask .long sys_ssetmask .long sys_setreuid16 /* 70 */ .long sys_setregid16 .long sys_sigsuspend .long sys_sigpending .long sys_sethostname .long sys_setrlimit /* 75 */ .long sys_old_getrlimit .long sys_getrusage .long sys_gettimeofday .long sys_settimeofday .long sys_getgroups16 /* 80 */ .long sys_setgroups16 .long old_select .long sys_symlink .long sys_lstat .long sys_readlink /* 85 */ .long sys_uselib .long sys_swapon .long sys_reboot .long sys_old_readdir .long old_mmap /* 90 */ .long sys_munmap .long sys_truncate .long sys_ftruncate .long sys_fchmod .long sys_fchown16 /* 95 */ .long sys_getpriority .long sys_setpriority .long sys_ni_syscall /* old profil syscall holder */ .long sys_statfs .long sys_fstatfs /* 100 */ .long sys_ioperm .long sys_socketcall .long sys_syslog .long sys_setitimer .long sys_getitimer /* 105 */ .long sys_newstat .long sys_newlstat .long sys_newfstat .long sys_uname .long ptregs_iopl /* 110 */ .long sys_vhangup .long sys_ni_syscall /* old "idle" system call */ .long ptregs_vm86old .long sys_wait4 .long sys_swapoff /* 115 */ .long sys_sysinfo .long sys_ipc .long sys_fsync .long ptregs_sigreturn .long ptregs_clone /* 120 */ .long sys_setdomainname .long sys_newuname .long sys_modify_ldt .long sys_adjtimex .long sys_mprotect /* 125 */ .long sys_sigprocmask .long sys_ni_syscall /* old "create_module" */ .long sys_init_module .long sys_delete_module .long sys_ni_syscall /* 130: old "get_kernel_syms" */ .long sys_quotactl .long sys_getpgid .long sys_fchdir .long sys_bdflush .long sys_sysfs /* 135 */ .long sys_personality .long sys_ni_syscall /* reserved for afs_syscall */ .long sys_setfsuid16 .long sys_setfsgid16 .long sys_llseek /* 140 */ .long sys_getdents .long sys_select .long sys_flock .long sys_msync .long sys_readv /* 145 */ .long sys_writev .long sys_getsid .long sys_fdatasync .long sys_sysctl .long sys_mlock /* 150 */ .long sys_munlock .long sys_mlockall .long sys_munlockall .long sys_sched_setparam .long sys_sched_getparam /* 155 */ .long sys_sched_setscheduler .long sys_sched_getscheduler .long sys_sched_yield .long sys_sched_get_priority_max .long sys_sched_get_priority_min /* 160 */ .long sys_sched_rr_get_interval .long sys_nanosleep .long sys_mremap .long sys_setresuid16 .long sys_getresuid16 /* 165 */ .long ptregs_vm86 .long sys_ni_syscall /* Old sys_query_module */ .long sys_poll .long sys_nfsservctl .long sys_setresgid16 /* 170 */ .long sys_getresgid16 .long sys_prctl .long ptregs_rt_sigreturn .long sys_rt_sigaction .long sys_rt_sigprocmask /* 175 */ .long sys_rt_sigpending .long sys_rt_sigtimedwait .long sys_rt_sigqueueinfo .long sys_rt_sigsuspend .long sys_pread64 /* 180 */ .long sys_pwrite64 .long sys_chown16 .long sys_getcwd .long sys_capget .long sys_capset /* 185 */ .long ptregs_sigaltstack .long sys_sendfile .long sys_ni_syscall /* reserved for streams1 */ .long sys_ni_syscall /* reserved for streams2 */ .long ptregs_vfork /* 190 */ .long sys_getrlimit .long sys_mmap_pgoff .long sys_truncate64 .long sys_ftruncate64 .long sys_stat64 /* 195 */ .long sys_lstat64 .long sys_fstat64 .long sys_lchown .long sys_getuid .long sys_getgid /* 200 */ .long sys_geteuid .long sys_getegid .long sys_setreuid .long sys_setregid .long sys_getgroups /* 205 */ .long sys_setgroups .long sys_fchown .long sys_setresuid .long sys_getresuid .long sys_setresgid /* 210 */ .long sys_getresgid .long sys_chown .long sys_setuid .long sys_setgid .long sys_setfsuid /* 215 */ .long sys_setfsgid .long sys_pivot_root .long sys_mincore .long sys_madvise .long sys_getdents64 /* 220 */ .long sys_fcntl64 .long sys_ni_syscall /* reserved for TUX */ .long sys_ni_syscall .long sys_gettid .long sys_readahead /* 225 */ .long sys_setxattr .long sys_lsetxattr .long sys_fsetxattr .long sys_getxattr .long sys_lgetxattr /* 230 */ .long sys_fgetxattr .long sys_listxattr .long sys_llistxattr .long sys_flistxattr .long sys_removexattr /* 235 */ .long sys_lremovexattr .long sys_fremovexattr .long sys_tkill .long sys_sendfile64 .long sys_futex /* 240 */ .long sys_sched_setaffinity .long sys_sched_getaffinity .long sys_set_thread_area .long sys_get_thread_area .long sys_io_setup /* 245 */ .long sys_io_destroy .long sys_io_getevents .long sys_io_submit .long sys_io_cancel .long sys_fadvise64 /* 250 */ .long sys_ni_syscall .long sys_exit_group .long sys_lookup_dcookie .long sys_epoll_create .long sys_epoll_ctl /* 255 */ .long sys_epoll_wait .long sys_remap_file_pages .long sys_set_tid_address .long sys_timer_create .long sys_timer_settime /* 260 */ .long sys_timer_gettime .long sys_timer_getoverrun .long sys_timer_delete .long sys_clock_settime .long sys_clock_gettime /* 265 */ .long sys_clock_getres .long sys_clock_nanosleep .long sys_statfs64 .long sys_fstatfs64 .long sys_tgkill /* 270 */ .long sys_utimes .long sys_fadvise64_64 .long sys_ni_syscall /* sys_vserver */ .long sys_mbind .long sys_get_mempolicy .long sys_set_mempolicy .long sys_mq_open .long sys_mq_unlink .long sys_mq_timedsend .long sys_mq_timedreceive /* 280 */ .long sys_mq_notify .long sys_mq_getsetattr .long sys_kexec_load .long sys_waitid .long sys_ni_syscall /* 285 */ /* available */ .long sys_add_key .long sys_request_key .long sys_keyctl .long sys_ioprio_set .long sys_ioprio_get /* 290 */ .long sys_inotify_init .long sys_inotify_add_watch .long sys_inotify_rm_watch .long sys_migrate_pages .long sys_openat /* 295 */ .long sys_mkdirat .long sys_mknodat .long sys_fchownat .long sys_futimesat .long sys_fstatat64 /* 300 */ .long sys_unlinkat .long sys_renameat .long sys_linkat .long sys_symlinkat .long sys_readlinkat /* 305 */ .long sys_fchmodat .long sys_faccessat .long sys_pselect6 .long sys_ppoll .long sys_unshare /* 310 */ .long sys_set_robust_list .long sys_get_robust_list .long sys_splice .long sys_sync_file_range .long sys_tee /* 315 */ .long sys_vmsplice .long sys_move_pages .long sys_getcpu .long sys_epoll_pwait .long sys_utimensat /* 320 */ .long sys_signalfd .long sys_timerfd_create .long sys_eventfd .long sys_fallocate .long sys_timerfd_settime /* 325 */ .long sys_timerfd_gettime .long sys_signalfd4 .long sys_eventfd2 .long sys_epoll_create1 .long sys_dup3 /* 330 */ .long sys_pipe2 .long sys_inotify_init1 .long sys_preadv .long sys_pwritev .long sys_rt_tgsigqueueinfo /* 335 */ .long sys_perf_event_open sys_call_table的三个参数 1) 默认数组的基地址,因为这里为空则赋予0,但它需要和偏移地址sys_call_table相加,简单的说是sys_call_table被当作数组的基地址 2) 数组索引(系统调用号) 3) 大小(每个数组元素中的字节数) 所以这行代码可以重写为: call sys_call_table[%eax](); */ call *sys_call_table(,%eax,4) /* 系统调用返回 它在EAX寄存器中的返回值(这个值同时也是system_call的返回值)被存储了起来。返回值被存储在堆栈中的EAX内,以使得RESTORE_ALL可以迅速地恢复实际的EAX寄存器及其他寄存器的值 */ movl %eax,PT_EAX(%esp) syscall_exit: LOCKDEP_SYS_EXIT DISABLE_INTERRUPTS(CLBR_ANY) TRACE_IRQS_OFF movl TI_flags(%ebp), %ecx testl $_TIF_ALLWORK_MASK, %ecx /* 退出系统调用 */ jne syscall_exit_work .....

Relevant Link:

http://bbs.chinaunix.net/thread-2284015-1-1.html  http://wenku.baidu.com/view/f367a0ccda38376baf1fae0d.html  http://www.ibm.com/developerworks/cn/linux/l-system-calls/

 

Copyright (c) 2014 LittleHann All rights reserved

 

来源:http://www.cnblogs.com/LittleHann/p/3871630.html

推荐: