Linux Project

In this post, I collect notes on C++, Linux, computer networking, and computer architecture concepts. At the end, I will also build a high-concurrency Linux web server project to consolidate this knowledge.


IO functions of Linux system

  • int open(const char *pathname, int flags): open an existing file and return a file descriptor.
    • pathname: file path
    • flags: access mode of the file (exactly one of O_RDONLY, O_WRONLY, or O_RDWR), optionally OR-ed with other settings
  • void perror(const char *s): print s, followed by a colon and the description of the current errno value, to stderr.
    • s: message prefix (e.g. “open”)
  • int close(int fd): closes a file descriptor, so that it no longer refers to any file and may be reused.
    • fd: file descriptor
  • int open(const char *pathname, int flags, mode_t mode): create and open a file (flags must include O_CREAT)
    • mode: octal, the permission bits for the newly created file (e.g. 0775 is rwxrwxr-x; final permission: mode & ~umask)
  • ssize_t read(int fd, void *buf, size_t count): read a file
    • fd: file descriptor
    • buf: buffer area, the address of the array
    • count: the size of the buffer area
    • return value (ssize_t): on success, the number of bytes read is returned (0 means end of file). On error, -1 is returned and ’errno’ is set.
  • ssize_t write(int fd, const void *buf, size_t count): write to a file
    • buf: the data to be written
    • count: the number of bytes to write
  • off_t lseek(int fd, off_t offset, int whence): move the file offset (file pointer)
    • whence: can be SEEK_SET (start of the file + offset), SEEK_CUR (current position + offset), SEEK_END (file size + offset)
    • return value (off_t): the position of the file pointer after the call, measured from the start of the file, or -1 on error
    • Usage:
      • move the file pointer to the head of the file
        • lseek(fd, 0, SEEK_SET);
      • get the position of current file pointer
        • lseek(fd, 0, SEEK_CUR)
      • get the length of the file
        • lseek(fd, 0, SEEK_END)
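      • expand the length of the file (e.g. move the offset from 10 bytes to 110 bytes)
        • lseek(fd, 100, SEEK_END)
        • The file only grows after data is written at the new offset (writing 1 byte here makes it 111 bytes)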
  • int stat(const char *pathname, struct stat *statbuf): return information about a file in the buffer pointed to by statbuf
    • statbuf: a pointer to a struct that receives the information about the file
    • return value: on success, 0 is returned. On error, -1 is returned and ’errno’ is set.
  • int lstat(const char *pathname, struct stat *statbuf): same as stat, except that if pathname is a symbolic (soft) link, it returns information about the link itself rather than the file it points to
  • int access(const char *pathname, int mode): check access permission or if the file exists
    • mode: can be R_OK, W_OK, X_OK, F_OK (check if the file exists)
    • return value: on success, return 0. On failure, return -1.
  • int chmod(const char *pathname, mode_t mode): change the mode of the file
    • mode: see documentation (e.g. 0777, 0775)
    • return value: on success, return 0. On failure, return -1.
  • int chown(const char *pathname, uid_t owner, gid_t group): change the owner and group of the file
    • owner: uid
    • group: gid
  • int truncate(const char *path, off_t length): truncate the file to a size of precisely length bytes
    • length: specified size
    • return value: on success, 0 is returned. On error, -1 is returned, and ’errno’ is set appropriately.
  • int mkdir(const char *pathname, mode_t mode): create a directory
    • pathname: the path for created directory
    • mode: permission (octal)
    • return value: on success, 0 is returned. On error, -1 is returned, and ’errno’ is set appropriately.
  • int rename(const char *oldpath, const char *newpath): rename a file or directory, moving it between directories if required
    • oldpath: the current path
    • newpath: the new path
    • return value: on success, zero is returned. On error, -1 is returned, and ’errno’ is set appropriately.
  • int chdir(const char *path): change the working directory of current process
    • path: the working directory that we want to change to
  • char *getcwd(char *buf, size_t size): get the current working directory (like the pwd command)
    • buf: a pointer to an array that receives the current working directory
    • size: the size of the array
    • return value: on success, returns buf. On error, returns NULL and ’errno’ is set.
  • DIR *opendir(const char *name): open a directory stream
    • name: name of the directory to open
    • return value: a directory stream on success, NULL on error.
  • struct dirent *readdir(DIR *dirp): read the next entry from a directory stream
    • dirp: the directory stream returned by opendir()
    • return value: a pointer to the next directory entry, or NULL at the end of the directory or on error
  • int closedir(DIR *dirp): close the directory stream
  • int dup(int oldfd): create a copy of the file descriptor oldfd, using the lowest-numbered unused file descriptor number for the new descriptor
    • return value: On success, the new file descriptor is returned. On error, -1 is returned, and errno is set appropriately.
  • int dup2(int oldfd, int newfd): perform the same task as dup(), but instead of using the lowest-numbered unused file descriptor, it uses the file descriptor number specified in newfd. If the file descriptor newfd was previously open, it is silently closed before being reused
  • int fcntl(int fd, int cmd, ... /* arg */): manipulate a file descriptor, for example duplicate it (F_DUPFD), return the file access mode and the file status flags (F_GETFL, arg is ignored), or set the file status flags to the value specified by arg (F_SETFL)
    • fd: the file descriptor that we need to operate
    • cmd: command for operations (F_DUPFD / F_GETFL / F_SETFL)

File Descriptors

  • File descriptors are tracked in the process’s PCB (Process Control Block), which lives in the kernel area of the virtual address space
  • In the PCB, there is an array called the ‘file descriptor table’; a file descriptor is an index into this array
  • A file can be opened multiple times, and each call to ‘open’ takes up a different file descriptor

Virtual Address

  • The Linux virtual address space of a process is divided into segments. While the program is running, the MMU (memory management unit) maps these virtual addresses to physical memory addresses.

Comparison between IO functions of C library and Linux system

  • I/O functions of C standard library

  • IO functions of C standard library and Linux system


GDB

  • GDB is a debugger for C (and C++). It allows you to run the program up to a certain point, then stop and print out the values of certain variables at that point, or step through the program one line at a time and print out the values of each variable after executing each line.
  • Preparation for GDB
    • Switch off compiler optimization (-O0).
    • Switch on debugging information (-g).
    • It is better to also use ‘-Wall’ to turn on warnings; it does not affect the behavior of the program.
    • Build the program with debugging information: gcc test.c -o test -g
  • Some commands for GDB
    • General commands
      • Start: gdb executable program
      • Quit: quit
      • Set parameters for program: set args 10 20…
      • Check parameters: show args
    • Browse code
      • Check code of current file: list line number / list function name
      • Check code of other files: list file name : line number / list file name : function name
      • Show list size: show listsize
      • Set list size: set listsize number
    • Set up breakpoint
      • Set up breakpoint: break line number / function name / file name : line number / file name : function name
      • Check breakpoints: info break
      • Delete breakpoint: delete breakpoint ID
      • Disable breakpoint: disable breakpoint ID
      • Enable breakpoint: enable breakpoint ID
      • Set conditional breakpoint: break 16 if i == 3
    • Run the program with GDB
      • Run the program: start (program stops at the first line of main) / run (program runs until the first breakpoint)
      • Continue executing the program until next breakpoint: continue
      • Execute one line of code: next (It won’t go inside of the function body)
      • Execute one line of code: step (It will go inside of the function body) / finish (run until the current function returns)
      • Variable operations:
        • Print variable value: print variable name
        • Print variable type: ptype variable name
        • Automatic operations:
          • Set automatic printing variable: display variable name
          • Check automatic printing variable list: info display
          • Delete automatic printing variable: undisplay display ID
      • Other operations:
        • Change the value of a variable: set var variable name = variable value
        • Jump out of loop: until (make sure no enabled breakpoint remains inside the loop)

Makefile

  • A makefile defines a set of rules that determine which parts of a program need to be recompiled, and issues the commands to recompile them. It is a way of automating the software build procedure and other complex tasks with dependencies.
  • Makefile Rule:
    Target ... : Dependency ...
        Shell commands
        ...
    
  • How does make work
    • Before executing the command of a rule, make checks whether the dependencies of the rule exist
      • If they exist, execute the command
      • If not, look through the other rules for one that generates the missing dependency and execute it first
    • Check for updates: when running a rule, make compares the modification times of the target and its dependencies
      • If a dependency was modified after the target was generated, the target is re-generated
  • Some other symbols and commands for makefile
    • User-defined variable: var = hello
    • Pre-defined variables:
      • AR: name of the archive maintenance program, default value: ar
      • CC: name of C compiler, default value: cc
      • CXX: name of C++ compiler, default value: g++
      • $@: get the full name of target
      • $<: get the name of the first dependency
      • $^: get all dependencies
    • Get the value of variable: $(variable name)
    • Pattern match
      • %: wildcard character, can be any string
    • Functions
      • $(wildcard PATTERN …): get the list of files in a directory that match the pattern
      • $(patsubst pattern, replacement, text): check each word in ’text’; if it matches ‘pattern’, replace it with ‘replacement’
    • Example
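      # Define variables: collect every .c file, then derive the .o names
      src = $(wildcard ./*.c)
      objs = $(patsubst %.c, %.o, $(src))
      target = app
      
      # Link all object files into the final program
      $(target): $(objs)
          $(CC) $^ -o $@
      
      # Pattern rule: compile each .c file into the matching .o file
      %.o : %.c
          $(CC) -c $< -o $@
      
      # clean is not a real file, so declare it phony
      .PHONY: clean
      clean:
          rm -f $(objs)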
      

Comparison between static library and dynamic library

  • Static Library
    • Advantages:
      • Loading is fast
      • No need to ship the static library when publishing the software (it is already linked into the executable)
    • Disadvantages:
      • Consumes system resources and wastes memory (each program that uses the same static library carries its own copy)
      • Difficult to update, deploy and publish the software
  • Dynamic Library
    • Advantages:
      • Can share resources between different processes
      • Easy to update, deploy and publish the software
      • Can control when to load dynamic libraries
    • Disadvantages:
      • Loading is slow
      • Need to provide dynamic library when publishing the software

Dynamic library

  • Naming convention for dynamic library
    • Linux
      • libxxx.so (lib: prefix, xxx: name of the library, .so: suffix)
    • Windows
      • libxxx.dll
  • Generate dynamic library
    • Use gcc to get .o files
      • gcc -c -fpic/-fPIC file1.c file2.c
    • Use gcc to get dynamic library
      • gcc -shared file1.o file2.o -o libxxx.so
  • Use dynamic library
    • gcc main.c -o main -I ./include/ -l calc -L ./lib/
    • ./main: error while loading shared libraries: libcalc.so: cannot open shared object file: No such file or directory
      • Explanation: When the program starts, the dynamic library has to be loaded into memory (ldd, ‘list dynamic dependencies’, can be used to check which dynamic libraries an executable depends on and whether they are found). However, the executable only records the name of the dynamic library, so the system dynamic loader has to resolve that name to an absolute path.
      • ld-linux.so is responsible for loading dynamic libraries for executables in ELF format. It searches for each library in this order: the DT_RPATH entry of the ELF file -> environment variable (LD_LIBRARY_PATH) -> the /etc/ld.so.cache file list -> /lib/, /usr/lib/. When the loader finds the target dynamic library file, it loads it into memory.
    • Correct methods
      • Approach1: Set up the environment variable (Temporary):
        • export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/jason/Linux/lesson06/library/lib
        • echo $LD_LIBRARY_PATH
      • Approach2: Set up the environment variable (Permanent | User Level):
        • vim .bashrc
        • insert the command
        • . .bashrc
      • Approach3: Set up the environment variable (Permanent | System Level):
        • sudo vim /etc/profile
        • insert the command
        • . /etc/profile
      • Approach4: Update the /etc/ld.so.cache file list (the cache itself is a binary file, so edit /etc/ld.so.conf and rebuild the cache)
        • sudo vim /etc/ld.so.conf
        • paste the path (eg: /home/jason/Linux/lesson06/library/lib)
        • sudo ldconfig

Static library

  • Two types of libraries
    • Static library is copied into the program during the linking stage.
    • Dynamic library is loaded into the memory for program to invoke during the running stage.
  • Advantages of using libraries
    • Source code stays confidential (only the compiled library and its header files are distributed).
    • Convenient to deploy and distribute
  • Naming convention for static library
    • Linux
      • libxxx.a (lib: prefix, xxx: name of the library, .a: suffix)
    • Windows
      • libxxx.lib
  • Generate static library
    • Use gcc to get .o files
      • gcc -c file1.c file2.c
    • Pack .o files using ar tool (archive)
      • ar rcs libxxx.a file1.o file2.o
        • r: insert the files into the archive (replacing members that already exist)
        • c: create the archive if it does not exist
        • s: write an index into the archive
  • Use static library
    • gcc main.c -o app -I ./include/ -l calc -L ./lib/
    • -I specifies the search directory for the header file
    • -l specifies the name of the library to be used when the program is compiled
    • -L specifies the path of the library to be used when the program is compiled

GCC

  • GCC originally stood for GNU C Compiler and now stands for GNU Compiler Collection. It includes compilers for C, C++, Objective-C, and other languages (older releases also included Java), along with libraries for these programming languages.
  • Compiling Process:
    • Preprocessor <- Source Code (Copy header files, delete comments, expand macros…)
      • gcc test.c -E -o test.i
    • Compiler <- Preprocessed Code (Generate assembly code)
      • gcc test.i -S -o test.s
    • Assembler <- Assembly Code (Generate object code)
      • gcc test.s -c -o test.o
    • Linker <- Object Code, Startup Code, Library Code, Other Object Code (Generate executable program)
      • gcc test.o -o test
  • Tip: A C++ program can also be compiled by gcc. However, gcc does not automatically link the C++ standard library that the program uses. That’s why we use g++ to compile C++ code (gcc with -lstdc++ also works).
  • GCC Parameters:
    • -I directory: Specify the search directory for the header file
    • -g: Generate debugging information at compile time so that the program can be debugged by a debugger
    • -D: Define a macro at compile time (e.g. -DDEBUG)
    • -w: Do not generate any warning messages
    • -Wall: Enable most warning messages
    • -On: n can be 0 - 3, four levels of compiler optimization. -O0 means no optimization and is the default
    • -l: Specify the name of the library to be used when the program is compiled
    • -L: Specify the path of the library to be used when the program is compiled
    • -fPIC/-fpic: Generate position-independent code
    • -shared: Generate a shared object file (dynamic library)
    • -std: Specify language standard