ls -l in the shell
For the past two weeks, my partner Afa and I have been writing, testing, and documenting our own Linux shell. This was a major end-of-quarter project for us in our studies at the Holberton School. The experience has given us much more insight into how a shell works, which we are happy to share with you here.
When the shell starts we are given a command prompt, the format of which is specified in the environment variable PS1. At this prompt the user is given the opportunity to enter commands and arguments. Once the user inputs a string of commands and arguments and presses <enter>, the shell carries out the command and then prints the prompt again.
This pattern of prompting the user, waiting for a command/argument string, processing the command/argument string, and then re-prompting the user continues in perpetuity until the user leaves the shell via the “exit” command (or upon "EOF"). From our standpoint as developers, this pattern is represented by the code of our shell_loop function.
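The prompt, read, process, re-prompt cycle described above can be sketched in C roughly as follows. This is an illustration under stated assumptions rather than the actual shell_loop code: prompt_loop and handle_line are hypothetical names, the prompt string is passed in instead of being read from PS1, and "processing" a line is reduced to printing it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the real work: it only reports the line. */
static void handle_line(const char *line, FILE *out)
{
    fprintf(out, "would run: %s\n", line);
}

/*
 * Prompt, read a line, process it, and repeat until "exit" or EOF.
 * Returns the number of lines handed to handle_line().
 */
int prompt_loop(FILE *in, FILE *out, const char *prompt)
{
    char *line = NULL;   /* getline allocates and grows this buffer */
    size_t cap = 0;
    ssize_t len;
    int handled = 0;

    assert(in != NULL && out != NULL && prompt != NULL);

    for (;;) {
        fputs(prompt, out);
        fflush(out);

        len = getline(&line, &cap, in);
        if (len == -1)                      /* EOF or read error */
            break;
        if (len > 0 && line[len - 1] == '\n')
            line[len - 1] = '\0';           /* drop the trailing newline */
        if (strcmp(line, "exit") == 0)
            break;
        if (line[0] == '\0')                /* empty line: just re-prompt */
            continue;

        handle_line(line, out);
        handled++;
    }
    free(line);
    return handled;
}
```

An interactive session would call it as prompt_loop(stdin, stdout, "$ "). Taking the streams as parameters is a design choice made only so the loop can be driven from a file as easily as from a terminal.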
When the user inputs a command/argument string and presses <enter> the shell starts by figuring out what to do with the first word (i.e. the command) in the string. The command always goes through a prioritized sequence of tests to determine its type. That sequence in a typical bash/dash shell is as follows:
1. alias, 2. special built-ins, 3. functions, 4. built-ins, 5. programs in $PATH defined folders
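The priority order above can be sketched as a chain of table lookups. The tables here are tiny, hard-coded examples chosen for illustration (a real shell fills them at run time), and classify_command, in_table, and the cmd_kind enum are hypothetical names. Note that the last result only means "fall through to the PATH search"; it does not say the program exists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum cmd_kind {
    CMD_ALIAS,
    CMD_SPECIAL_BUILTIN,
    CMD_FUNCTION,
    CMD_BUILTIN,
    CMD_PATH_PROGRAM      /* nothing matched: try the PATH directories */
};

/* Illustrative, NULL-terminated tables (assumptions for this sketch). */
static const char *const aliases[] = { "ls", NULL };
static const char *const special_builtins[] = { "exit", "export", "eval", NULL };
static const char *const functions[] = { NULL };
static const char *const builtins[] = { "cd", "echo", "pwd", NULL };

/* Return 1 if name appears in the NULL-terminated table, else 0. */
static int in_table(const char *const *table, const char *name)
{
    for (size_t i = 0; table[i] != NULL; i++)
        if (strcmp(table[i], name) == 0)
            return 1;
    return 0;
}

/* Test the command against each category in priority order. */
enum cmd_kind classify_command(const char *name)
{
    assert(name != NULL);

    if (in_table(aliases, name))
        return CMD_ALIAS;
    if (in_table(special_builtins, name))
        return CMD_SPECIAL_BUILTIN;
    if (in_table(functions, name))
        return CMD_FUNCTION;
    if (in_table(builtins, name))
        return CMD_BUILTIN;
    return CMD_PATH_PROGRAM;
}
```

With these example tables, classify_command("ls") reports an alias, while classify_command("cat") falls through to the PATH search.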
As soon as the shell matches the user’s command to one of the above listed executable types, the shell executes the program and then prompts the user again for another command/argument string. Before moving into greater detail, we’ll briefly touch on some of these different executable types:
Aliases
Aliases are user-defined “shortcuts” that the shell checks first. To list the aliases currently defined on a system, the user simply enters the command “alias” and gets output that looks like this:
Built-Ins
Built-ins are commands programmed directly into the shell itself. When the built-in command is used as the first word of the user’s input string, the shell executes the command directly, without invoking an external program. Shells rely on built-ins to implement functionality that would be difficult, slow, or impossible to achieve with external programs. The list of built-ins in our dash shell includes:
- true
- alias
- cd
- echo
- eval
- exec
- exit
- export
- fg
- getopts
- hash
- pwd
- read
- readonly
- printf
- set
- shift
- test
- times
- trap
- type
- ulimit
- umask
- unalias
- unset
- wait
Due to time constraints, the shell that we wrote only includes a scant few built-in commands.
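One common way to wire a handful of built-ins into a shell is a table that maps each name to a function pointer. The sketch below shows that pattern with two built-ins, true and echo. It is not taken from the shell described here: builtin_fn, struct builtin, builtin_table, and find_builtin are hypothetical names, and this echo ignores options such as -n.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Every built-in receives the tokenized command line and returns a status. */
typedef int (*builtin_fn)(char **argv);

struct builtin {
    const char *name;
    builtin_fn run;
};

/* true: do nothing, succeed. */
static int builtin_true(char **argv)
{
    (void)argv;
    return 0;
}

/* echo: print the arguments separated by single spaces, then a newline. */
static int builtin_echo(char **argv)
{
    for (int i = 1; argv[i] != NULL; i++)
        printf("%s%s", i > 1 ? " " : "", argv[i]);
    putchar('\n');
    return 0;
}

static const struct builtin builtin_table[] = {
    { "true", builtin_true },
    { "echo", builtin_echo },
};

/* Return the handler for name, or NULL if name is not a built-in. */
builtin_fn find_builtin(const char *name)
{
    assert(name != NULL);

    for (size_t i = 0; i < sizeof builtin_table / sizeof builtin_table[0]; i++)
        if (strcmp(builtin_table[i].name, name) == 0)
            return builtin_table[i].run;
    return NULL;
}
```

The shell calls find_builtin with the first token; a non-NULL result is run directly in the shell process, and a NULL result means the search continues elsewhere. Adding a built-in is then a matter of writing one function and adding one table row.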
Programs in the PATH
After the shell fails to find a command in the aforementioned locations, its last resort is to check for programs in the PATH. The PATH is an environment variable that lists directories that hold executable files. The shell searches through these directories, in order, looking for a match to the command the user entered. When we type 'echo $PATH' in our shell we get this list of directories separated by colon delimiters:
A great in-depth piece on the PATH can be found here.
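A PATH search of the kind described above can be sketched with the stat and access calls. In this sketch find_in_path is a hypothetical name, the PATH value is passed in as a string rather than read from the environment, and empty PATH entries are simply skipped, which a full implementation would treat as the current directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Look for cmd in each colon-separated directory of path_value, in order.
 * On success, write the full path to out and return 0; otherwise return -1.
 */
int find_in_path(const char *cmd, const char *path_value,
                 char *out, size_t outsz)
{
    const char *dir = path_value;
    struct stat st;

    assert(cmd != NULL && path_value != NULL && out != NULL);

    while (*dir != '\0') {
        size_t len = strcspn(dir, ":");     /* length of this entry */

        if (len > 0) {
            /* Build "<directory>/<cmd>" for this entry. */
            int n = snprintf(out, outsz, "%.*s/%s", (int)len, dir, cmd);

            /* Accept only a regular file that we are allowed to execute. */
            if (n > 0 && (size_t)n < outsz &&
                stat(out, &st) == 0 && S_ISREG(st.st_mode) &&
                access(out, X_OK) == 0)
                return 0;
        }
        dir += len;
        if (*dir == ':')
            dir++;                          /* step over the delimiter */
    }
    return -1;
}
```

A shell would typically call it as find_in_path("ls", getenv("PATH"), buf, sizeof buf) after checking that getenv did not return NULL. Because the directories are tried in order, the first match wins, which is why the order of entries in PATH matters.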
The Question of ls -l in the Shell
When the user inputs 'ls -l' and presses enter, the shell first reads the line with the getline function. It then parses (or tokenizes) the line into separate tokens (or words). Each token is separated in the original string by a delimiter, usually a space. The shell takes the first token and treats it as the command; in the case of 'ls -l', the command is 'ls'. The shell checks whether 'ls' is an alias and finds a match. Because we have the alias 'ls' defined above as 'ls --color=auto', the shell substitutes the alias definition for the alias. So 'ls -l' becomes 'ls --color=auto -l'.
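The alias substitution and tokenizing steps can be sketched as two small C functions. This is a simplified illustration, not the code of any particular shell: expand_alias and tokenize are hypothetical names, a single alias is passed in directly instead of being looked up in a table, quoting is ignored, and the substitution is done on the raw line before it is split, purely to keep the example short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 16

/*
 * If the first word of line equals alias_name, write the line to out with
 * that word replaced by alias_value; otherwise copy the line unchanged.
 * Returns 0 on success, -1 if out is too small.
 */
int expand_alias(const char *line, const char *alias_name,
                 const char *alias_value, char *out, size_t outsz)
{
    size_t word_len;
    int n;

    assert(line != NULL && alias_name != NULL && alias_value != NULL);
    assert(out != NULL);

    word_len = strcspn(line, " \t\n");      /* length of the first word */
    if (word_len == strlen(alias_name) &&
        strncmp(line, alias_name, word_len) == 0)
        n = snprintf(out, outsz, "%s%s", alias_value, line + word_len);
    else
        n = snprintf(out, outsz, "%s", line);

    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}

/*
 * Split line in place on spaces, tabs, and newlines.
 * Stores a NULL-terminated token array and returns the token count.
 */
int tokenize(char *line, char **tokens, int max_tokens)
{
    int count = 0;
    char *tok;

    assert(line != NULL && tokens != NULL && max_tokens > 0);

    for (tok = strtok(line, " \t\n");
         tok != NULL && count < max_tokens - 1;
         tok = strtok(NULL, " \t\n"))
        tokens[count++] = tok;

    tokens[count] = NULL;   /* execve expects a NULL-terminated array */
    return count;
}
```

Expanding "ls -l" with the alias ls defined as "ls --color=auto" and then tokenizing the result gives three tokens: ls, --color=auto, and -l. The expanded line still begins with ls, so a real shell has to take care not to expand the same alias again, or the substitution would never end.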
Now that our alias definition has replaced our alias command in the input stream, the new command (still ‘ls’) will be searched for by the shell. The shell will not find it as a built-in. It will not find it as a function. Finally, the shell searches each entry in the PATH in turn for the command ‘ls’. Each PATH entry consists of a directory name. When the shell finds ‘ls’ in a PATH directory it will execute the program.
Program Execution
Before executing the program, the shell forks itself, creating a child process that inherits the environment of the shell. The child process then uses the execve system call to replace itself with the ls program, passing along the previously tokenized arguments '--color=auto' and '-l', while the parent shell waits for the child to finish. The ls program prints its long-format listing to the terminal and exits, which ends the child process. The shell then prompts us again, giving us another opportunity to enter a command.
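The fork, execve, and waitpid sequence can be sketched as follows. run_program is a hypothetical name, and this version hands the child the parent's environment through the environ variable. The exit status 127 used when execve fails is the conventional shell value for a command that could not be run.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* the calling process's environment */

/*
 * Run the program at path with the given argv in a child process and wait
 * for it. Returns the child's exit status, or -1 if fork or waitpid fails
 * or the child did not exit normally (for example, killed by a signal).
 */
int run_program(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    assert(path != NULL && argv != NULL);

    pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this process image with the program. */
        execve(path, argv, environ);
        /* execve only returns on failure. */
        perror(path);
        _exit(127);
    }
    /* Parent: block until the child terminates. */
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the 'ls -l' example, and assuming the PATH search resolved ls to /bin/ls, the call would pass "/bin/ls" as path and the array { "ls", "--color=auto", "-l", NULL } as argv. The child uses _exit rather than exit after a failed execve so that it does not flush stdio buffers it inherited from the parent.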
Syscalls
System calls are the fundamental interface between an application and the Linux kernel. The system calls we used in our shell were access, stat, write, waitpid, execve, exit, and fork. We suggest looking at the syscalls man page or this blog post for more in-depth information on syscalls.