Comprehensive Guide to Using the Linux Cut Command

The cut command in Linux is an essential utility for text processing, designed to extract particular segments from each line of a file or from piped input. This command does not modify the original file but instead reads the data and displays the desired portions in the standard output. In this guide, we will delve into the functionality of the cut command in Linux and provide practical, real-world examples to demonstrate its usage.

Exploring the cut Command

The cut command is instrumental for anyone dealing with structured text, facilitating effective data manipulation and extraction within Unix-like environments. By extracting portions of a line based on byte positions, character positions, separators, or fields, cut proves invaluable for filtering and organizing data in shell scripts and command-line operations. Its applications range from retrieving specific columns from CSV files to trimming unnecessary characters or analyzing logs. Although often employed with files directly, cut also seamlessly interacts with the output of other commands when harnessed in a pipeline.

Basic Syntax of the cut Command

The cut command is straightforward to invoke: options come first, followed by an optional file name. The syntax is as follows:

cut [OPTIONS] [FILE]

In this structure, OPTIONS control how cut operates: they select a field delimiter (such as a comma), choose specific fields or ranges, exclude lines that lack the delimiter, and so on. One of -b, -c, or -f is required, since cut needs to know what to extract. If no file is specified, cut reads from standard input. You can also provide multiple files, which cut processes in order as if they were one concatenated input.
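As a minimal sketch of the three input modes described above, the snippet below runs cut on one file, on two files, and on standard input. The scratch files one.txt and two.txt are hypothetical and exist only for this illustration.

```shell
# Hypothetical scratch files, created only for this illustration.
tmpdir=$(mktemp -d)
printf 'alpha\nbravo\n' > "$tmpdir/one.txt"
printf 'charlie\n' > "$tmpdir/two.txt"

# One file: first three characters of each line.
from_file=$(cut -c 1-3 "$tmpdir/one.txt")

# Several files: cut processes them one after another, as if concatenated.
from_files=$(cut -c 1-3 "$tmpdir/one.txt" "$tmpdir/two.txt")

# No file: cut reads standard input.
from_stdin=$(printf 'delta\n' | cut -c 1-3)

printf '%s\n' "$from_file" "$from_files" "$from_stdin"
rm -r "$tmpdir"
```

Note that the output does not mark where one file ends and the next begins.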

Commonly Used Options

The cut command offers a variety of options to pinpoint exact segments of text to extract. Here are some of the most frequently used:

  • -f or --fields=LIST: Allows selection of specific fields based on a designated delimiter.
  • -b or --bytes=LIST: Extracts specified bytes from each line.
  • -c or --characters=LIST: Retrieves specific characters from each line.
  • -d or --delimiter=DELIM: Sets a custom single-character delimiter instead of the default tab.
  • --complement: Outputs everything except the specified fields, bytes, or characters.
  • -s or --only-delimited: Skips lines lacking the delimiter; such lines are included by default.
  • --output-delimiter=STRING: Allows selection of a different delimiter for the output, contrasting with the input delimiter.
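The effect of -s is easiest to see side by side. The following is a small sketch on made-up input (the names alice and bob are hypothetical), where one line contains no colon at all:

```shell
# Hypothetical input: two colon-delimited lines and one line with no colon.
sample='alice:admin
no delimiter here
bob:user'

# Default: the line without ":" is passed through unchanged.
default_out=$(printf '%s\n' "$sample" | cut -d ':' -f 1)

# With -s: the line without ":" is dropped from the output.
strict_out=$(printf '%s\n' "$sample" | cut -s -d ':' -f 1)

printf 'default:\n%s\n\nwith -s:\n%s\n' "$default_out" "$strict_out"
```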

The -f, -b, and -c options utilize a LIST to define what to extract. You can specify the following:

  • A single number like 2.
  • Multiple numbers separated by commas (with no spaces), like 1,3,5.
  • A range like 2-4 (to extract values from 2 to 4).
  • N- to denote extraction from position N to the end.
  • -M to signify extraction from the start up to position M.
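To make the LIST forms above concrete, here is a short sketch that applies each one to the same colon-separated line. The sample line is an assumption chosen so that every field is easy to recognize.

```shell
# Hypothetical six-field line, one letter per field.
line='a:b:c:d:e:f'

single=$(printf '%s\n' "$line" | cut -d ':' -f 2)        # b
several=$(printf '%s\n' "$line" | cut -d ':' -f 1,3,5)   # a:c:e
range=$(printf '%s\n' "$line" | cut -d ':' -f 2-4)       # b:c:d
to_end=$(printf '%s\n' "$line" | cut -d ':' -f 4-)       # d:e:f
from_start=$(printf '%s\n' "$line" | cut -d ':' -f -3)   # a:b:c

printf '%s\n' "$single" "$several" "$range" "$to_end" "$from_start"
```

cut always writes the selected items in the order they appear in the input, so -f 3,1 gives the same result as -f 1,3.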

Utilizing the cut Command in Linux

To illustrate how the cut command functions, let’s execute some practical examples. First, let’s create a sample file named “mte.csv” using the echo command:

echo -e "empID,empName,empDesig\n101,Anees,Author\n102,Asghar,Manager\n103,Damian,CEO" > mte.csv

Echo Command

Next, we can check the contents of the file using the cat command:

cat mte.csv

Show File Data

It’s crucial to mention that the cut command merely presents the specified output without changing the file itself.

Extracting Data By Characters

To extract characters by position, utilize the -c option with the cut command:

cut -c 1,8 mte.csv

This command extracts the first and eighth characters from each row:

Cut By Characters

To extract characters within a specified range, apply the following command:

cut -c 1-8 mte.csv

This extracts characters from positions 1 to 8 in each row:

Cut By Range
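For reference, the sketch below reproduces both character selections on inline data rather than on the file. It assumes the sample rows contain no spaces after the commas; with different spacing the positions shift and the output changes.

```shell
# Inline copy of the sample data, assuming no spaces after the commas.
csv='empID,empName,empDesig
101,Anees,Author
102,Asghar,Manager
103,Damian,CEO'

# Characters 1 and 8 of each line.
picked=$(printf '%s\n' "$csv" | cut -c 1,8)

# Characters 1 through 8 of each line.
ranged=$(printf '%s\n' "$csv" | cut -c 1-8)

printf '%s\n\n%s\n' "$picked" "$ranged"
```

Because -c counts positions and ignores delimiters, the commas are counted like any other character.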

Extracting By Byte

To extract specific bytes, utilize the -b option with the cut command:

cut -b 1-3 mte.csv

This command extracts the first three bytes from each line in the file mte.csv:

Cut By Bytes
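Bytes and characters coincide for plain ASCII text but not for multi-byte encodings such as UTF-8. The sketch below illustrates this with a made-up word; the octal escape \303\251 is the two-byte UTF-8 encoding of "é".

```shell
# ASCII: each character is one byte, so three bytes are three characters.
ascii=$(printf 'hello\n' | cut -b 1-3)

# UTF-8: "é" occupies two bytes, so three bytes cover only "h" and "é".
utf8=$(printf 'h\303\251llo\n' | cut -b 1-3)

printf '%s\n%s\n' "$ascii" "$utf8"
```

A byte range that ends in the middle of a multi-byte character would produce invalid output, so byte-based extraction is safest on ASCII data or fixed-width binary-style records.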

Extracting By Field (Column)

To extract an entire field from a file, utilize the cut command with the -f and -d options:

cut -d',' -f2 mte.csv

In this command, -d',' designates a comma as the delimiter (cut accepts only a single character here, so the delimiter cannot be a comma followed by a space), while -f2 indicates that cut should extract the second field from each line:

Cut By Field
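Here is a self-contained sketch of field extraction on inline data, assuming rows with no spaces after the commas. It also shows what happens when a two-character delimiter is passed; the rejection described in the comment is the behavior of GNU cut, and other implementations may word the error differently.

```shell
# Inline copy of the sample data, assuming no spaces after the commas.
csv='empID,empName,empDesig
101,Anees,Author
102,Asghar,Manager
103,Damian,CEO'

# Second comma-separated field of every line.
names=$(printf '%s\n' "$csv" | cut -d ',' -f 2)
printf '%s\n' "$names"

# The delimiter must be a single character. GNU cut rejects ', ' (comma plus
# space) with an error, so the pipeline's exit status is non-zero.
if printf '%s\n' "$csv" | cut -d ', ' -f 2 >/dev/null 2>&1; then
    two_char_delim=accepted
else
    two_char_delim=rejected
fi
printf 'two-character delimiter: %s\n' "$two_char_delim"
```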

Implementing Custom Delimiters in cut

Though cut defaults to using a tab as a delimiter, if fields are separated by a different character, use -d to specify the correct one. For example, to extract the fifth word from a space-separated sentence, you can use:

echo "Hey! Geeks Welcome to Maketecheasier.com" | cut -d ' ' -f 5

Cut With Custom Delimiter
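Colon-separated records are another common case. The sketch below uses a hypothetical line in the /etc/passwd layout and, for contrast, a tab-separated line that needs no -d at all:

```shell
# Hypothetical entry in /etc/passwd format (7 colon-separated fields).
entry='root:x:0:0:root:/root:/bin/bash'

# Field 1 is the user name, field 7 the login shell.
user_shell=$(printf '%s\n' "$entry" | cut -d ':' -f 1,7)

# Tab is the default delimiter, so -d can be omitted for tab-separated input.
tabbed=$(printf 'one\ttwo\tthree\n' | cut -f 2)

printf '%s\n%s\n' "$user_shell" "$tabbed"
```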

Excluding Specific Fields During Extraction

You can omit certain fields while extracting text from a file by employing the --complement option with the cut command. This option specifies that cut should output all fields apart from the designated ones:

cut -d',' -f1 mte.csv --complement

This command skips the first column and returns the remainder of the content:

Cut With Complement
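The sketch below shows --complement on inline data (again assuming no spaces after the commas). Note that --complement is a GNU extension; when only leading fields are dropped, an open-ended range such as -f 2- gives the same result.

```shell
# Inline copy of the sample data, assuming no spaces after the commas.
csv='empID,empName,empDesig
101,Anees,Author
102,Asghar,Manager
103,Damian,CEO'

# Everything except the first field (GNU cut).
without_id=$(printf '%s\n' "$csv" | cut -d ',' --complement -f 1)

# Same result using an open-ended range instead of --complement.
from_second=$(printf '%s\n' "$csv" | cut -d ',' -f 2-)

printf '%s\n' "$without_id"
```

--complement becomes more useful when the excluded field sits in the middle, for example --complement -f 2 to keep the first and third columns.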

Modifying the Default Output Delimiter

By default, when extracting fields, the cut command retains the input delimiter in the output. However, you can alter the output delimiter by using the --output-delimiter option:

cut -d',' -f1-3 --output-delimiter='-' mte.csv

This command utilizes a hyphen as a separator in the output:

Custom Output Delimiter
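As a quick sketch on inline data (assuming no spaces after the commas), the snippet below replaces the comma with a hyphen and then with a longer separator. Unlike the input delimiter, the output delimiter in GNU cut is a string and may be longer than one character.

```shell
# Inline copy of the sample data, assuming no spaces after the commas.
csv='empID,empName,empDesig
101,Anees,Author
102,Asghar,Manager
103,Damian,CEO'

# Comma in, hyphen out.
dashed=$(printf '%s\n' "$csv" | cut -d ',' -f 1-3 --output-delimiter='-')

# The output delimiter can be a multi-character string.
piped=$(printf '101,Anees,Author\n' | cut -d ',' -f 1,3 --output-delimiter=' | ')

printf '%s\n\n%s\n' "$dashed" "$piped"
```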

Combining cut with Other Linux Commands

The cut command can also be utilized in conjunction with other Linux commands using the pipeline | symbol. For instance, the following command extracts the first five characters from each output line of the who command:

who | cut -c 1-5

Cut With Who

In another example, you can use the cut command along with head to display the first two lines of “mte.csv,” extracting only the empName and empDesig fields:

head -n 2 mte.csv | cut -d ',' -f2,3

Cut With Head
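Pipelines can also continue after cut. The sketch below repeats the head example on inline data (assuming no spaces after the commas) and then adds a second, hypothetical pipeline that skips the header row and lists the distinct designations:

```shell
# Inline copy of the sample data, assuming no spaces after the commas.
csv='empID,empName,empDesig
101,Anees,Author
102,Asghar,Manager
103,Damian,CEO'

# First two lines only, then fields 2 and 3.
top=$(printf '%s\n' "$csv" | head -n 2 | cut -d ',' -f 2,3)

# Skip the header, take field 3, and list the unique values in sorted order.
roles=$(printf '%s\n' "$csv" | tail -n +2 | cut -d ',' -f 3 | sort -u)

printf '%s\n\n%s\n' "$top" "$roles"
```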

Navigating Irregular Data Formats with the Linux cut Command

The cut command excels when handling data that is well-formatted with consistent delimiters (like commas or tabs). However, if you encounter files with inconsistent spacing or mixed delimiters, applying cut alone may yield unsatisfactory results. To address these scenarios, it’s often beneficial to clean the data in advance using commands like tr or sed, ensuring cut can effectively extract the correct portions.

Managing Excess Spaces

Consider a file named “mteData.txt” where fields are separated by varying spaces:

cat mteData.txt

Show Sample Data

Since cut treats every individual space as a delimiter, a run of several spaces produces empty fields. Utilize tr to squeeze repeated spaces into one before applying cut:

cat mteData.txt | tr -s ' ' | cut -d ' ' -f1-2

This command processes “mteData.txt,” replaces multiple spaces with a single one using tr, and then extracts the first two fields:

Cut With Tr
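To see why the squeeze step matters, the sketch below runs cut with and without tr -s on made-up rows. The data is a hypothetical stand-in, not the actual contents of mteData.txt.

```shell
# Hypothetical stand-in data with uneven runs of spaces between fields.
spaced='101   Anees    Author
102 Asghar   Manager'

# Without tr: each extra space yields an empty field, so field 2 of the
# first line is empty and the name is lost.
raw=$(printf '%s\n' "$spaced" | cut -d ' ' -f 1-2)

# With tr -s: runs of spaces collapse to one, so fields line up as expected.
squeezed=$(printf '%s\n' "$spaced" | tr -s ' ' | cut -d ' ' -f 1-2)

printf 'raw:\n%s\n\nsqueezed:\n%s\n' "$raw" "$squeezed"
```

tr -s reduces leading spaces to a single space rather than removing them, so a line that starts with spaces would still have an empty first field.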

Managing Mixed Delimiters

In cases where a file uses a combination of spaces and commas, normalize the format with sed. For example, a file named “mteData1.txt” contains:

cat mteData1.txt

Show Sample Content

Utilize sed with cut to convert all spaces to commas and then extract the first and third fields:

sed 's/ /,/g' mteData1.txt | cut -d ',' -f1,3

Cut With Sed
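The same idea can be sketched on inline data. The rows below are a hypothetical stand-in for a file that mixes spaces and commas; they are not the actual contents of mteData1.txt.

```shell
# Hypothetical stand-in data: some fields split by a space, others by a comma.
mixed='101 Anees,Author
102,Asghar Manager'

# Turn every space into a comma, then take fields 1 and 3.
fields=$(printf '%s\n' "$mixed" | sed 's/ /,/g' | cut -d ',' -f 1,3)

printf '%s\n' "$fields"
```

This approach assumes that spaces never occur inside a field; a value such as "Senior Author" would be split in two by the substitution.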

Conclusion

Throughout this article, we’ve uncovered the functionalities of the Linux cut command, a vital tool for extracting data from files or piped inputs. With its simple syntax, you can effortlessly obtain characters, bytes, or fields based on a specified delimiter. Additionally, we showcased how to combine the cut command with other utilities such as tr, sed, and head to manage unclean data and achieve more efficient output. Whether you’re handling CSV files, analyzing logs, or cleansing data, the cut command is an indispensable asset for text processing in Unix-like environments.

Frequently Asked Questions

1. What is the primary purpose of the cut command in Linux?

The cut command in Linux is primarily used for extracting specific sections of text from files or the output of other commands. It enables users to manipulate and format data effectively based on delimiters, byte positions, or character positions.

2. Can I combine the cut command with other Linux commands?

Yes! The cut command can be seamlessly integrated with other Linux commands using the pipeline symbol (|). This allows for powerful data processing, enabling you to filter and format outputs from various commands.

3. How can I specify a custom delimiter when using the cut command?

You can specify a custom delimiter by using the -d option followed by the desired delimiter character. For example, to use a comma as a delimiter, you would use -d','.
