In this months column, we feature a set of computer science interview questions.
Last month, we had covered a set of questions on machine learning and natural language processing. We continue our discussion of computer science questions in this months column, focusing on operating systems.
1. We are all familiar with different sorting and searching algorithms. However, these algorithms typically are designed for data that fits in physical memory. Consider that you are given two matrices, A and B, of dimensions MXN and NXR. These dimensions are such that neither of the matrices will fit in the main memory. You are asked to write an algorithm for multiplying the two matrices and storing the result into a matrix C of dimensions MXR on the disk. Can you outline the approach you would take to minimise the number of disk accesses that your matrix_multiply function would perform?
2. Now, with regard to this same problem, you are told that matrices A and B are such that most of their elements are zero. You are again asked to compute the matrix multiplication of these two matrices. Can you come up with an algorithm which minimises the number of disk accesses?
3. In operating systems, what is meant by page thrashing? When does it occur? How can you avoid page thrashing?
4. Consider the following code snippet:
int main(int argc, char *argv[]) { void *my_mem = 0; int alloc_blocks = 0; while(1) { my_mem = (void *) malloc(1024 * 1024); if (!my_mem) break; memset(my_mem,1, (1024*1024)); printf(allocated memory blocks is %d MB\n,++alloc_blocks); } exit(0); }
Can you explain what would happen when you compile and run this program?
5. Consider this same code snippet and assume that you are asked to modify the code so as to remove the call to the memset function. Can you explain what would happen when you compile and run the code?
6. On your Linux machine, you see the following message on the /var/log/messages file:
”kernel: Out of memory: Kill process 2000 (a.out) score 511 or sacrifice child”
- How can you figure out why Process 2000 was chosen to be terminated?
- If you want to make the termination of Process 2000 unlikely, how would you do that?
- In the scenario above, only the Process a.out with the PID 2000 got terminated. Instead, you want to configure your system such that the OS gets restarted whenever an out-of-memory condition occurs. How would you achieve this?
7. Linux offers different kernel memory allocators like the standard SLAB allocator, the SLUB allocator and the SLOB (Simple List Of Blocks) allocator. You are asked to set up a Linux system which has been configured with 4GB of physical memory. Which memory allocator would you use and why?
8. Looking at Question 7 further, you are asked to configure the Linux kernel for an embedded system with a physical memory of only 256MB. What would be your choice of memory allocator? Can you explain the rationale behind your choice?
9. Consider that you are running your application on a Linux system that has 4GB of physical memory. You want to determine the maximum memory footprint of your application. Explain how you will find out how much physical memory is being used by your application?
10. When does the Linux operating system start swapping processes to the swap device configured? How can you change this behavior?
11. Consider a case in which your user application is stuck in an infinite loop during which it invokes itself recursively. How would you prevent the stack overflow signal from killing your application?
12. Now considering the same problem in Question 11 — instead of your user space application being stuck in an infinite loop, you have written a kernel module where a function is getting called in a recursive loop without any termination clause. Would this cause a Stack overflow message?
13. Linux supports the creation of multiple file systems such as ext3, ext4 and NFS on the same physical device. Can you explain how the kernel supports multiple different file systems? For example, when a read request comes for a file located on an ext3 partition and a write request comes for a file on an ext4 partition, how does the kernel appropriately invoke the right functions?
14. You are asked to write a C code snippet which can print out the current value of the stack pointer (without invoking any assembly instructions/routines). How would you do this? Can you explain whether this is feasible? If not, explain why it may not be feasible? (Clue: think of compiler optimisations.)
15. You are running a server application which is latency-sensitive. You want to ensure that the virtual address space of the server process is locked into physical memory (essentially none of its pages should be paged out). How would you ensure this? Assume that during the course of its run, the server application can map further memory pages by dynamic memory allocations or through the mmap call. You need to ensure that any future pages mapped by the server also remain locked in physical memory. Explain how this can be achieved.
16. Exploring Question 15 further, given that you have locked all the pages of your server application to remain in physical memory, is it guaranteed that your application will never get a page fault? The server application has mmaped a file in to its address space and the corresponding physical address is, say, X. Is it guaranteed that X will never change while the application runs?
17. In the last couple of questions, we considered how a user space application can force its pages to be locked into physical memory. Now , assume you are writing a kernel module which needs to ensure that its virtual memory pages are locked in physical memory. How would you ensure this?
18. Consider that you are developing a database server application. You know from the various databases you have worked with in the past that database transactional logging can significantly add to performance overheads. Is it possible for you to eliminate the performance overhead due to logging by turning off the transactional logging in the database? If your answer is No, explain why.
19. Delving deeper into Question 18, if you want to reduce the performance overhead caused by database logging (without turning off the logging completely), what are the ways to achieve this?
20. What are in-memory databases? How do they ensure that the database remains consistent in the event of a power failure or system crash?
If you have any favorite programming questions/software topics that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com. Till we meet again next month, happy programming!