Pipex is a project that requires you to implement the shell pipe (|) in C.
Introduction
During the Piscine, you worked on some Shell01 exercises that required using a ‘pipe’ (i.e. cmd1 | cmd2). Let us revisit the following question from Shell01.
Question:
Write a command line that displays your machine’s MAC addresses. Each address must be followed by a line break.
Answer:
ifconfig | grep "ether " | cut -d ' ' -f 2
Think of a warehouse packing conveyor belt, where individual machines handle specific tasks such as placing packages on the belt and packing them. In the same way, the task in the preceding question is broken down into several subtasks. Each subtask is handled by a distinct command, and its output is passed to the next command through a ‘pipe’. This approach allows you to solve complex problems just by arranging Linux commands to work together.
Understand how pipe() works
Reference: Linux Manual Page
Let’s look at its prototype:
int pipe(int pipefd[2]);
pipe() creates a unidirectional channel: data flows from pipefd[1] (write-end) to pipefd[0] (read-end). Data written to the write-end is buffered by the kernel until it is read from the read-end. Here is the sample code:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main(void)
{
    int n, pipefd[2];
    char buf[1025], *data = "hello... this is sample data";
    pipe(pipefd);
    write(pipefd[1], data, strlen(data));
    if ((n = read(pipefd[0], buf, 1024)) >= 0)
    {
        buf[n] = 0; /* terminate the string */
        printf("read from the pipe: \"%s\"\n", buf);
    }
    return 0;
}
// Output:
// read from the pipe: "hello... this is sample data"
Understand how fork() works
In C, fork() duplicates the current process. The new process is referred to as the child process, and the original process is referred to as the parent process. The child process is an exact copy of the parent process, except that it has a different process ID and a different parent process ID. The process ID of the child process is returned by fork() to the parent process, and the value 0 is returned to the child process. The child process and the parent process continue to execute from the point where fork() was called. The child process and the parent process have their own copies of the data segment, the heap, and the stack. The child process and the parent process do not share these segments.
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h> // needed for waitpid()
#include <unistd.h>
int main() {
    int data = 10;
    pid_t pid;
    pid = fork();
    if (pid == 0) { // if it is child process
        data = -1; // child process has a copy of data
        // hence, this change does not reflect in
        // variable data in the parent process
    } else {
        // this is parent process
        waitpid(pid, NULL, 0); // wait till child process is finished
        printf("data = %d\n", data); // data should not be affected by the child process
    }
    return 0;
}
Output
data = 10
Understand how dup2() works
dup2(oldfd, newfd) makes newfd refer to the same open file as oldfd, closing whatever newfd referred to before. In effect it redirects newfd: any data written to newfd afterwards goes to the file behind oldfd. Let us look at the two examples below.
Example 1
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
int main() {
    int file = open("output.txt", O_WRONLY | O_CREAT, 0644);
    dup2(STDOUT_FILENO, file);
    // file now refers to the same destination as stdout,
    // so this write shows up on the terminal
    // and output.txt does not contain anything
    write(file, "example of dup2\n", 16);
    close(file);
    return 0;
}
stdout:
example of dup2
output.txt:
(empty)
Example 2
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
int main() {
    int file = open("output.txt", O_WRONLY | O_CREAT, 0644);
    dup2(file, STDOUT_FILENO);
    // stdout now refers to output.txt,
    // so this write does not show up on the terminal
    // but output.txt contains the written string
    write(STDOUT_FILENO, "example of dup2\n", 16);
    close(file);
    return 0;
}
stdout:
(empty)
output.txt:
example of dup2
How to solve pipex
Step 1: Validate the input
For example,
- Check if the number of arguments is correct
- Check if the input file exists
- Check if the output file is writable or can be created
- Check if a pipe can be created
Step 2: Create a child process
The child process will execute the first command, with stdin redirected to the input file descriptor and stdout redirected to the write-end of the pipe.
Step 3: Handle the parent process
The parent process will execute the second command, with stdin redirected to the read-end of the pipe and stdout redirected to the output file descriptor.
Here is the scaffold
run_cmd(cmd, input_fd, output_fd) {
    dup2(input_fd, 0); // make stdin read from input_fd
    dup2(output_fd, 1); // make stdout write to output_fd
    // execute the command
}
main() {
    var pipefd[2];
    pipe(pipefd); // pipefd[1] is the write-end of the pipe
                  // pipefd[0] is the read-end of the pipe
    fin = open(input_file, O_RDONLY); // input file descriptor
    fout = open(output_file, O_WRONLY | O_CREAT | O_TRUNC, 0644); // output file descriptor
    pid = fork();
    if (pid == 0) // child process
    {
        close(pipefd[0]); // the child only writes to the pipe
        run_cmd(cmd1, fin, pipefd[1]);
    }
    else // parent process
    {
        close(pipefd[1]); // must close the write end before reading from the pipe
        run_cmd(cmd2, pipefd[0], fout);
    }
}
Step 4: Handle the command
Scenario 1: Command is the custom command (e.g. ./custom_cmd)
In this case, we need to check that the path of the custom command is valid using the access() function.
access(cmd, X_OK) // check if the command is executable
Scenario 2: Command is a program found through PATH (e.g. ls, cat, grep, etc.)
If the access check above fails, the command was most likely given by name only, and its location has to be found in one of the directories listed in the PATH environment variable. You need to find the PATH entry in the env strings passed to the main function, i.e. int main(int argc, char **argv, char **env), and then look for the command in each directory it lists.
Here is the pseudocode:
get_path(env) {
    for each string in env {
        if string starts with "PATH=" {
            return string without the "PATH=" prefix
        }
    }
    return null // PATH is not set
}
get_cmd_path(path, cmd) {
    paths = split(path, ":")
    for each directory in paths {
        if access(directory/cmd, X_OK) == 0 {
            return directory/cmd
        }
    }
    return null // command not found
}
Scenario 3: Command contains double quotes (e.g. grep "hello world")
In practice, you will get argv[3] = "grep \"hello world\"" if grep "hello world" is passed to the main function. You will need to split argv[3] into two strings, "grep" and "\"hello world\"", without breaking on the space inside the quotes, then remove the double quotes from the second string by copying only the non-double-quote characters into a new string.
Step 5: Test your code
After you complete your code, you can test it using my Pipex Checker.