You can combine standard command-line tools such as awk, sort, uniq, and head to analyze an Apache access log and find which files are requested most often and how much bandwidth each one uses.

To see which files are called the most:

awk '{print $7}' access.log | sort | uniq -c | sort -nr | head -n 10

This command does the following:

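  1. awk '{print $7}' access.log: Extracts the seventh whitespace-separated column from the log file, which in Apache's default Common and Combined Log Formats holds the requested path.
  2. sort: Sorts the paths so that identical lines end up next to each other, which uniq requires in order to count them correctly.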
  3. uniq -c: Counts the occurrences of each unique line.
  4. sort -nr: Sorts the counted occurrences numerically in reverse order (to get the most accessed files first).
  5. head -n 10: Displays the top 10 most accessed files.
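
To make the pipeline concrete, here is a small sketch that runs it against a made-up log. The four log lines, the temporary file, and the top_files variable are assumptions for illustration, not data from a real server.

```shell
# Hypothetical sample data in Combined Log Format (invented for this example).
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "curl/8.0"
192.0.2.2 - - [10/Oct/2023:13:55:37 +0000] "GET /logo.png HTTP/1.1" 200 20480 "-" "curl/8.0"
192.0.2.1 - - [10/Oct/2023:13:55:38 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "curl/8.0"
192.0.2.3 - - [10/Oct/2023:13:55:39 +0000] "GET /index.html HTTP/1.1" 304 - "-" "curl/8.0"
EOF

# Same pipeline as above, pointed at the sample file instead of access.log.
top_files=$(awk '{print $7}' "$log" | sort | uniq -c | sort -nr | head -n 10)
echo "$top_files"

# Remove the temporary sample file.
rm -f "$log"
```

With this sample, /index.html is listed first with a count of 3, followed by /logo.png with a count of 1. The exact padding in front of each count depends on your uniq implementation.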

To find the bandwidth used by each file, sum the bytes transferred for each one:

awk '{sum[$7]+=$10} END {for (i in sum) print i, sum[i]}' access.log | sort -nrk2 | head -n 10

This command does the following:

  1. awk '{sum[$7]+=$10} END {for (i in sum) print i, sum[i]}' access.log: Sums the tenth column (the response size in bytes) for each unique file requested (seventh column) and prints each file with its total. A size logged as - (no response body) counts as zero.
  2. sort -nrk2: Sorts the output based on the second column (the sum of bytes transferred) in reverse numerical order.
  3. head -n 10: Displays the top 10 files by bandwidth usage.
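
If you want both numbers at once, the two commands can be folded into a single pass. The following is only a sketch: bandwidth_report is a hypothetical helper name, and it assumes the same column layout as above ($7 is the path, $10 is the response size).

```shell
# Hypothetical helper: print "path hits total_bytes" for the top 10 paths by bytes.
# Assumes Common/Combined Log Format: $7 = requested path, $10 = response size.
bandwidth_report() {
    awk '{
             hits[$7]++                                   # count every request
             if ($10 ~ /^[0-9]+$/) bytes[$7] += $10       # add size only when numeric (skips "-")
         }
         END {
             # %.0f prints large totals as plain integers
             for (f in hits) printf "%s %d %.0f\n", f, hits[f], bytes[f]
         }' "$1" |
        sort -k3,3nr |    # order by the third column (total bytes), largest first
        head -n 10
}

# Usage: bandwidth_report access.log
```

Doing both counts in one awk program avoids reading a large log twice, and the explicit numeric check makes the handling of - sizes visible rather than relying on awk's implicit conversion.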

Replace access.log with the actual name of your Apache log file.