Posts Tagged ‘Memory’
Linux Memory Manager: VM area
The full address space of a process is rarely used; only sparse areas of it are. Each area is represented by a vm_area_struct. These areas never overlap, and each one covers a range of addresses with the same protection and purpose.
The full list of a process's mapped areas can be viewed via the proc interface at /proc/PID/maps.
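As a rough illustration of what that file contains, here is a minimal user-space sketch that parses the start address, end address, and permission string from each line. The names map_entry, parse_maps_line, and count_own_areas are made up for this example, and the sketch assumes the usual "start-end perms ..." line layout.

```c
#include <stdio.h>

/* One parsed line of /proc/PID/maps (only the first two fields). */
struct map_entry {
    unsigned long start;   /* first address of the area */
    unsigned long end;     /* first address past the area */
    char perms[5];         /* e.g. "r-xp" */
};

/* Parse "start-end perms ..." from one line; 0 on success, -1 on failure. */
int parse_maps_line(const char *line, struct map_entry *out)
{
    if (sscanf(line, "%lx-%lx %4s", &out->start, &out->end, out->perms) != 3)
        return -1;
    return 0;
}

/* Count the areas of the calling process; -1 if the file cannot be opened. */
int count_own_areas(void)
{
    FILE *fp = fopen("/proc/self/maps", "r");
    char line[512];
    struct map_entry e;
    int n = 0;

    if (!fp)
        return -1;
    while (fgets(line, sizeof line, fp))
        if (parse_maps_line(line, &e) == 0)
            n++;
    fclose(fp);
    return n;
}
```

Each successfully parsed line corresponds to one vm_area_struct of the process.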
The struct vm_area_struct is declared as follows:
struct vm_area_struct {
    struct mm_struct * vm_mm; /* The address space we belong to. */
    unsigned long vm_start; /* Our start address within vm_mm. */
    unsigned long vm_end;
    /* linked list of VM areas per task, sorted by address */
    struct vm_area_struct *vm_next;
    pgprot_t vm_page_prot; /* Access permissions of this VMA. */
    unsigned long vm_flags; /* Flags, listed below. */
    struct rb_node vm_rb;
    /*
     * For areas with an address space and backing store,
     * linkage into the address_space->i_mmap prio tree, or
     * linkage to the list of like vmas hanging off its node, or
     * linkage of vma in the address_space->i_mmap_nonlinear list.
     */
    union {
        struct {
            struct list_head list;
            void *parent; /* aligns with prio_tree_node parent */
            struct vm_area_struct *head;
        } vm_set;
        struct raw_prio_tree_node prio_tree_node;
    } shared;
    /*
     * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
     * list, after a COW of one of the file pages. A MAP_SHARED vma
     * can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
     * or brk vma (with NULL file) can only be in an anon_vma list.
     */
    struct list_head anon_vma_node;
    struct anon_vma *anon_vma; /* Serialized by page_table_lock */
    /* Function pointers to deal with this struct. */
    struct vm_operations_struct * vm_ops;
    /* Information about our backing store: */
    unsigned long vm_pgoff;
    struct file * vm_file; /* File we map to (can be NULL). */
    void * vm_private_data; /* was vm_pte (shared mem) */
    unsigned long vm_truncate_count;
};
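To make the role of vm_start, vm_end, and vm_next concrete, the following user-space sketch walks an address-sorted list to find the area for a given address. It uses a hypothetical, stripped-down toy_vma type rather than the real structure, and it only walks the list; the kernel's own lookup also uses the red-black tree linked through vm_rb.

```c
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for vm_area_struct. */
struct toy_vma {
    unsigned long vm_start;      /* first address inside the area */
    unsigned long vm_end;        /* first address after the area */
    struct toy_vma *vm_next;     /* next area, sorted by address */
};

/*
 * Return the first area whose vm_end is above addr, or NULL.
 * The result may start above addr, so callers that need addr to be
 * mapped must also check vm_start <= addr.
 */
struct toy_vma *toy_find_vma(struct toy_vma *head, unsigned long addr)
{
    struct toy_vma *vma;

    for (vma = head; vma != NULL; vma = vma->vm_next)
        if (addr < vma->vm_end)
            return vma;
    return NULL;
}

/* Nonzero if addr falls inside some area of the list. */
int toy_addr_is_mapped(struct toy_vma *head, unsigned long addr)
{
    struct toy_vma *vma = toy_find_vma(head, addr);

    return vma != NULL && vma->vm_start <= addr;
}
```

Because the areas never overlap and the list is sorted, the first area that ends above the address is the only one that can contain it.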
Create a memory area
The system call mmap() is provided for creating new memory areas within a process:
asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
unsigned long fd, unsigned long pgoff)
sys_mmap2() clears the MAP_EXECUTABLE and MAP_DENYWRITE bits from the flags parameter, because Linux ignores them, and then calls do_mmap_pgoff().
do_mmap_pgoff() performs the following steps:
- Checks that the appropriate filesystem or device functions are available if a file or device is being mapped.
- Ensures that the size of the mapping is page aligned and that it does not attempt to create a mapping in the kernel portion of the address space.
- Makes sure the size of the mapping does not overflow the range of pgoff.
- Makes sure the process does not have too many mapped areas already.
- Calls get_unmapped_area() to find a free linear address range large enough for the memory mapping.
- Calculates the VM flags and checks them against the file access permissions.
- If an old area exists where the mapping is to take place, fixes it up so that it is suitable for the new mapping.
- Allocates a vm_area_struct from the slab allocator and fills in its entries.
- Links in the new VMA.
- Calls the filesystem-specific or device-specific mmap function.
- Updates statistics and returns.
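From user space, this path is entered through the mmap() library call. The sketch below maps one anonymous page, touches it, and unmaps it again. It assumes a Linux or other POSIX system that provides MAP_ANONYMOUS, and the function name map_one_anonymous_page is made up for illustration.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one anonymous, private, read-write page, touch it, and unmap it.
 * Returns 0 on success and -1 on failure.
 */
int map_one_anonymous_page(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    char *p;
    int ok;

    if (page_size <= 0)
        return -1;

    /* addr = NULL leaves the choice of address to the kernel. */
    p = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Writing to the page is what actually faults it in. */
    memset(p, 0x5a, (size_t)page_size);
    ok = (p[0] == 0x5a && p[page_size - 1] == 0x5a);

    if (munmap(p, (size_t)page_size) != 0)
        return -1;
    return ok ? 0 : -1;
}
```

While the mapping exists, it shows up as one more line in /proc/self/maps.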
Linux Memory Manager Explore
Pages
A page is the basic unit of memory managed by the Linux memory manager.
Page Tables
To access physical memory, every process must go through the paging system. For a process to access a physical page, that page must be mapped in its page tables. Every process has its own page tables.
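As one concrete case, classic 32-bit x86 paging without PAE uses two levels and splits a virtual address into a 10-bit page directory index, a 10-bit page table index, and a 12-bit offset. The sketch below performs that split in user space. The toy_ names are invented for this example, and other architectures and configurations use different widths and more levels.

```c
#include <stdint.h>

/* Classic 32-bit x86 layout without PAE: 10 + 10 + 12 bits. */
#define TOY_PAGE_SHIFT   12
#define TOY_PGDIR_SHIFT  22
#define TOY_PTRS_PER_PTE 1024

struct toy_va_parts {
    uint32_t pgd_index;  /* index into the page directory */
    uint32_t pte_index;  /* index into the page table */
    uint32_t offset;     /* byte offset inside the 4 KiB page */
};

/* Split a 32-bit virtual address into its three lookup components. */
struct toy_va_parts toy_split_va(uint32_t va)
{
    struct toy_va_parts p;

    p.pgd_index = va >> TOY_PGDIR_SHIFT;
    p.pte_index = (va >> TOY_PAGE_SHIFT) & (TOY_PTRS_PER_PTE - 1);
    p.offset    = va & ((1u << TOY_PAGE_SHIFT) - 1);
    return p;
}
```

For example, toy_split_va(0xC0100ABC) yields directory index 768, table index 256, and offset 0xABC.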
Kernel Space & User Space
Kernel space is the set of addresses that holds kernel code and kernel-related data structures.
User space is the area of memory used by user mode applications.
Kernel space and user space always refer to virtual addresses, which the page tables in turn map to physical addresses.
The access_ok() macro checks whether a user space pointer is valid, that is, whether the given range lies within the user portion of the address space:
access_ok ( type, addr, size);
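The check is a range check only: it does not prove that the memory is actually mapped, so the later copy can still fault. A user-space sketch of that kind of test follows. The name toy_range_ok is hypothetical, and the TOY_TASK_SIZE limit of 0xC0000000 assumes a 32-bit x86 kernel with a 3G/1G split; this is not the kernel's implementation.

```c
/* Assumed user/kernel boundary of a 32-bit x86 kernel with a 3G/1G split. */
#define TOY_TASK_SIZE 0xC0000000UL

/*
 * Nonzero if [addr, addr + size) lies entirely below limit.
 * Written so that addr + size is never computed and cannot wrap around.
 */
int toy_range_ok(unsigned long addr, unsigned long size, unsigned long limit)
{
    return addr <= limit && size <= limit - addr;
}
```

Comparing size against limit - addr, instead of comparing addr + size against limit, keeps a huge size from wrapping around and slipping past the check.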
Process Address Space
The process address space is described by struct mm_struct. Exactly one exists per process, and it is shared between the threads of that process:
struct mm_struct {
    struct vm_area_struct * mmap; /* list of VMAs */
    struct rb_root mm_rb;
    struct vm_area_struct * mmap_cache; /* last find_vma result */
    unsigned long (*get_unmapped_area) (struct file *filp,
            unsigned long addr, unsigned long len,
            unsigned long pgoff, unsigned long flags);
    void (*unmap_area) (struct mm_struct *mm, unsigned long addr);
    unsigned long mmap_base; /* base of mmap area */
    unsigned long task_size; /* size of task vm space */
    unsigned long cached_hole_size;
    unsigned long free_area_cache;
    pgd_t * pgd;
    atomic_t mm_users; /* How many users with user space? */
    atomic_t mm_count;
    int map_count; /* number of VMAs */
    struct rw_semaphore mmap_sem;
    spinlock_t page_table_lock;
    struct list_head mmlist;
    /* Special counters, in some configurations protected by the
     * page_table_lock, in other configurations by being atomic.
     */
    mm_counter_t _file_rss;
    mm_counter_t _anon_rss;
    unsigned long hiwater_rss; /* High-watermark of RSS usage */
    unsigned long hiwater_vm; /* High-water virtual memory usage */
    unsigned long total_vm, locked_vm, shared_vm, exec_vm;
    unsigned long stack_vm, reserved_vm, def_flags, nr_ptes;
    unsigned long start_code, end_code, start_data, end_data;
    unsigned long start_brk, brk, start_stack;
    unsigned long arg_start, arg_end, env_start, env_end;
    unsigned long saved_auxv[AT_VECTOR_SIZE];
    unsigned dumpable:2;
    cpumask_t cpu_vm_mask;
    /* Architecture-specific MM context */
    mm_context_t context;
    /* Token based thrashing protection. */
    unsigned long swap_token_time;
    char recent_pagein;
    /* coredumping support */
    int core_waiters;
    struct completion *core_startup_done, core_done;
    /* aio bits */
    rwlock_t ioctx_list_lock;
    struct kioctx *ioctx_list;
};
VM area
The full address space of a process is rarely used; only sparse areas of it are. Each area is represented by a vm_area_struct. These areas never overlap, and each one covers a range of addresses with the same protection and purpose.
Splice
Splice is a system call that copies data between a file descriptor and a pipe, or between a pipe and user space.
The Linux kernel provides three related system calls:
sys_splice - transfers between a file descriptor and a pipe:
asmlinkage long sys_splice(int fd_in, loff_t __user *off_in,
int fd_out, loff_t __user *off_out,
size_t len, unsigned int flags);
sys_vmsplice - map an application data area into a pipe (or vice versa), thus allowing transfer between pipes and user memory:
asmlinkage long sys_vmsplice(int fd, const struct iovec __user *iov,
unsigned long nr_segs, unsigned int flags);
sys_tee - duplicates one pipe to another, enabling forks in the way applications are connected with pipes:
asmlinkage long sys_tee(int fdin, int fdout, size_t len, unsigned int flags);
Linux Memory Manager: Pages
A page is the basic unit of memory managed by the Linux memory manager. Every physical page in the system has an associated struct page:
struct page {
    unsigned long flags;
    atomic_t _count; /* Usage count, see below. */
    atomic_t _mapcount;
    union {
        struct {
            unsigned long private;
            struct address_space *mapping;
        };
    };
    pgoff_t index; /* Our offset within mapping. */
    struct list_head lru;
};
The bits in page->flags describe the status of the page:
#define PG_locked 0 /* Page is locked. Don't touch. */
#define PG_error 1
#define PG_referenced 2
#define PG_uptodate 3
#define PG_dirty 4
#define PG_lru 5
#define PG_active 6
#define PG_slab 7 /* slab debug (Suparna wants this) */
#define PG_checked 8 /* kill me in 2.5. */
#define PG_arch_1 9
#define PG_reserved 10
#define PG_private 11 /* Has something at ->private */
#define PG_writeback 12 /* Page is under writeback */
#define PG_nosave 13 /* Used for system suspend/resume */
#define PG_compound 14 /* Part of a compound page */
#define PG_swapcache 15 /* Swap page: swp_entry_t in private */
#define PG_mappedtodisk 16 /* Has blocks allocated on-disk */
#define PG_reclaim 17 /* To be reclaimed asap */
#define PG_nosave_free 18 /* Free, should not be written */
#define PG_buddy 19 /* Page is free, on buddy lists */
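Each PG_ value is a bit number within the flags word, not a mask. The sketch below shows how such bits can be set, cleared, and tested on a hypothetical toy_page type. The toy_ helpers are invented for this example and are not atomic; the kernel itself manipulates these bits with atomic bit operations.

```c
/* Bit numbers copied from the list above. */
#define PG_locked 0
#define PG_dirty  4
#define PG_lru    5

/* Hypothetical stand-in for struct page, keeping only the flags word. */
struct toy_page {
    unsigned long flags;
};

/* Set one status bit (non-atomic, for illustration only). */
void toy_set_flag(struct toy_page *page, int bit)
{
    page->flags |= 1UL << bit;
}

/* Clear one status bit. */
void toy_clear_flag(struct toy_page *page, int bit)
{
    page->flags &= ~(1UL << bit);
}

/* Return 1 if the bit is set, 0 otherwise. */
int toy_test_flag(const struct toy_page *page, int bit)
{
    return (int)((page->flags >> bit) & 1UL);
}
```

Setting PG_dirty and PG_lru on an empty flags word, for instance, leaves it at 0x30.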
Compound Page: A compound page is a higher-order page. To enable compound page support in the kernel, “Huge TLB Page Support” must be enabled at compile time. A compound page is composed of more than one page, the first of which is called the “head” page and the remainder of which are called “tail” pages. All pages of a compound page have the PG_compound bit set in their respective page->flags, and their page->lru.next points to the head page.