egrep, fgrep, and the text viewing and processing tools wc, cut, sort, uniq, diff, and patch

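We have already covered grep. Two of its options deserve a closer look:

-E  interpret the pattern as an extended regular expression
-F  interpret the pattern as a fixed string, with no regular expression support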

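These two options bring us to today's subjects, egrep and fgrep. egrep is equivalent to grep -E, and fgrep is equivalent to grep -F.

egrep supports extended regular expressions and filters text in the same way as grep

Usage: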
# egrep [option] PATTERN [FILE…]

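Options:

-i  ignore case
-o  print only the matched text itself
-v  print only the lines that do not match
-q  quiet mode: print nothing and report the result through the exit status
-A #  (After) print each matching line and the # lines after it
-B #  (Before) print each matching line and the # lines before it
-C #  (Context) print each matching line and the # lines before and after it
-G  interpret the pattern as a basic regular expression
-F  interpret the pattern as a fixed string, with no regular expression support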
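
The options above can be tried on a small made-up sample. This is a minimal sketch rather than output from a real system: the sample text and the variable names are assumptions for illustration, and grep -E is used as the equivalent of egrep.

```shell
# Hypothetical sample text; not taken from any real file.
sample='Swap: 10
swap: 20
Mem: 30
Cache: 40'

# -i: ignore case, so both the "Swap" and the "swap" line match.
ignore_case=$(printf '%s\n' "$sample" | grep -E -i '^swap')

# -o: print only the matched text, one match per line.
only_match=$(printf '%s\n' "$sample" | grep -E -o '[0-9]+')

# -v: print only the lines that do NOT match.
inverted=$(printf '%s\n' "$sample" | grep -E -v -i '^swap')

# -A 1: print each matching line plus the one line after it.
after=$(printf '%s\n' "$sample" | grep -E -A 1 '^Mem')

# -q: print nothing; only the exit status says whether a match exists.
if printf '%s\n' "$sample" | grep -E -q '^Cache'; then
    quiet_result=found
else
    quiet_result=missing
fi

printf '%s\n' "$ignore_case" "$only_match" "$inverted" "$after" "$quiet_result"
```

Because -q reports only through the exit status, it is the natural choice inside an if test like the one above.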

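Extended regular expression metacharacters

Character matching
.  (dot) matches any single character
[ ]  matches any single character in the given set
[^ ]  matches any single character not in the given set

Repetition

*  matches the preceding character zero or more times
?  matches the preceding character zero or one time
+  matches the preceding character one or more times
{m}  matches the preceding character exactly m times
{m,n}  matches the preceding character at least m and at most n times
{0,n}  matches the preceding character at most n times
{m,}  matches the preceding character at least m times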
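
The repetition operators are easiest to compare side by side on one word list. The sketch below is illustrative only: the word list and variable names are made up, and each pattern is wrapped in ^ and $ so that the whole line has to match.

```shell
# Hypothetical word list: "c", then zero to three "a", then "t".
words='ct
cat
caat
caaat'

# a*     zero or more "a": all four lines match (-c prints the count).
star=$(printf '%s\n' "$words" | grep -E -c '^ca*t$')
# a?     zero or one "a": ct and cat.
optional=$(printf '%s\n' "$words" | grep -E '^ca?t$')
# a+     one or more "a": cat, caat, caaat.
plus=$(printf '%s\n' "$words" | grep -E '^ca+t$')
# a{2}   exactly two "a": caat.
exact=$(printf '%s\n' "$words" | grep -E '^ca{2}t$')
# a{1,2} one or two "a": cat and caat.
ranged=$(printf '%s\n' "$words" | grep -E '^ca{1,2}t$')
# a{2,}  at least two "a": caat and caaat.
at_least=$(printf '%s\n' "$words" | grep -E '^ca{2,}t$')

printf '%s\n' "$star" "$optional" "$plus" "$exact" "$ranged" "$at_least"
```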

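Position anchors
^  anchors the match at the start of the line
$  anchors the match at the end of the line
\< , \b  anchors the match at the start of a word
\> , \b  anchors the match at the end of a word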
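
A short sketch of how the anchors narrow a match, using invented sample lines. The word anchors \< and \> are extensions offered by GNU grep, so this assumes a GNU-compatible grep; the variable names are illustrative.

```shell
# Hypothetical sample lines, all containing the letters "root".
text='root is here
chroot jail
rooted tree
the root'

# ^root   "root" must be at the start of the line.
line_start=$(printf '%s\n' "$text" | grep -E '^root')
# root$   "root" must be at the end of the line.
line_end=$(printf '%s\n' "$text" | grep -E 'root$')
# \<root\>  "root" must be a whole word, not part of chroot or rooted.
whole_word=$(printf '%s\n' "$text" | grep -E '\<root\>')

printf '%s\n' "$line_start" "$line_end" "$whole_word"
```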

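Grouping and back-references
( )  Grouping. The text matched by the pattern inside the parentheses is saved in internal variables of the regular expression engine; referring to that saved text later in the pattern is called a back-reference.

The variables are the same as in basic regular expressions: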
\1 \2 \3…
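
As a rough illustration of a back-reference, the sketch below keeps only the lines in which the same four-letter fragment appears twice. The sample sentences and the variable name are chosen for illustration. Back-references inside extended patterns are an extension supported by GNU grep, so other grep implementations may behave differently.

```shell
# Sample sentences for illustration.
lines='He loves his lover
He likes his lover
She likes her liker
She loves her liker'

# (l..e) captures "love" or "like"; \1 requires that exact text again later.
same_word=$(printf '%s\n' "$lines" | grep -E '(l..e).*\1')

printf '%s\n' "$same_word"
```

Only the lines where "love" pairs with "lover", or "like" with "liker", survive; a line mixing "like" and "love" does not satisfy \1.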

Alternation
a|b  matches a or b
C|cat  matches C or cat
(C|c)at  matches Cat or cat
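
The difference between C|cat and (C|c)at can be checked with a tiny sketch. The word list is invented, and -x (match the whole line) is a standard grep option that is not covered in the option list above; it is used here only to keep the comparison strict.

```shell
# Hypothetical word list.
animals='Cat
cat
C
bat'

# C|cat    the alternation covers the whole pattern: "C" or "cat".
loose=$(printf '%s\n' "$animals" | grep -E -x 'C|cat')
# (C|c)at  the alternation is limited to the group: "Cat" or "cat".
grouped=$(printf '%s\n' "$animals" | grep -E -x '(C|c)at')

printf '%s\n' "$loose" "$grouped"
```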

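fgrep does not support regular expression metacharacters

Usage:
When the pattern needs no metacharacters, fgrep is the better choice, because every character in the pattern is taken literally.

Options:

-G  interpret the pattern as a basic regular expression
-E  interpret the pattern as an extended regular expression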
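
To see what "taken literally" means in practice, here is a small sketch with made-up data. It uses grep -F, the equivalent of fgrep; the sample lines and variable names are assumptions.

```shell
# Hypothetical sample lines containing regex metacharacters.
data='a.b
axb
price: $5.00'

# As a regular expression, "." matches any character: a.b and axb both match.
regex_hits=$(printf '%s\n' "$data" | grep -E -c 'a.b')

# With -F the dot is an ordinary character, so only the literal "a.b" matches.
fixed_hits=$(printf '%s\n' "$data" | grep -F 'a.b')

# -F also lets "$" and "." be searched for without any escaping.
fixed_price=$(printf '%s\n' "$data" | grep -F '$5.00')

printf '%s\n' "$regex_hits" "$fixed_hits" "$fixed_price"
```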

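While this is still fresh, let's use egrep to work through the following exercises:

1. Find all lines in /proc/meminfo that begin with an uppercase or lowercase s

Here are three ways to do it: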

[jeffrey@localhost ~]$ egrep "^(s|S)" /proc/meminfo
SwapCached:           20 kB
SwapTotal:       2097148 kB
SwapFree:        2097128 kB
Shmem:              8836 kB
Slab:             119492 kB
SReclaimable:      57812 kB
SUnreclaim:        61680 kB
[jeffrey@localhost ~]$ egrep "^[sS]" /proc/meminfo
SwapCached:         2984 kB
SwapTotal:       2097148 kB
SwapFree:        1990684 kB
Shmem:              5068 kB
Slab:              87484 kB
SReclaimable:      30260 kB
SUnreclaim:        57224 kB
[jeffrey@localhost ~]$ egrep -i "^s" /proc/meminfo
SwapCached:         2984 kB
SwapTotal:       2097148 kB
SwapFree:        1990684 kB
Shmem:              5068 kB
Slab:              87484 kB
SReclaimable:      30260 kB
SUnreclaim:        57224 kB

2. Display the information for the users root, centos, and user on the current system

[jeffrey@localhost ~]$ egrep "^(root|centos|user)\>" /etc/passwd
root:x:0:0:root:/root:/bin/bash
user:x:1003:1004::/home/user:/bin/csh

3. Find the lines in /etc/rc.d/init.d/functions in which a word is followed by a pair of parentheses

[jeffrey@localhost ~]$ egrep "[_[:alnum:]]+\(\)" /etc/rc.d/init.d/functions
checkpid() {
__kill_pids_term_kill_checkpids() {
__kill_pids_term_kill() {
__pids_var_run() {
__pids_pidof() {
daemon() {
killproc() {
.....

4. Use echo to output an absolute path, then use egrep to extract its base name and its directory name

[jeffrey@localhost ~]$ echo /etc/rc.d/init.d/functions | egrep -o "[^/]+/?$"
functions
[jeffrey@localhost ~]$ echo /etc/rc.d/init.d/functions | egrep -o "^/.*/"
/etc/rc.d/init.d/

5. Find the values between 1 and 255 in the output of ifconfig

[jeffrey@localhost ~]$ ifconfig | egrep -o "\<([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\>"
192
168
81
140
255
255
255
192
168
81
255
64
29
27
86
18
1
1
4
73
127
1
255
1
128
1
146
12
2
146
12
2
192
168
122
1
255
255
255
192
168
122
255
52
54
31
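
A range pattern like this is worth checking at its boundaries. The sketch below compares two ways of writing the 200 to 255 part on a hand-picked list of numbers: the shortcut 2[0-5][0-5] and the fuller 2[0-4][0-9]|25[0-5]. The number list and variable names are made up for the test, and the \< and \> anchors assume a GNU-compatible grep.

```shell
# Hand-picked boundary values, including some outside 1-255.
numbers='0 1 9 10 99 100 199 200 206 249 250 255 256 300'

# Shortcut: both trailing digits limited to 0-5, which skips e.g. 206 and 249.
shortcut='\<([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-5][0-5])\>'
# Full range: 200-249 and 250-255 handled as separate alternatives.
full='\<([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\>'

# $numbers is left unquoted on purpose so each number lands on its own line.
shortcut_hits=$(printf '%s\n' $numbers | grep -E -o "$shortcut" | paste -s -d ' ' -)
full_hits=$(printf '%s\n' $numbers | grep -E -o "$full" | paste -s -d ' ' -)

printf '%s\n' "shortcut: $shortcut_hits" "full:     $full_hits"
```

The shortcut accepts 200, 250, and 255 but silently drops 206 and 249, which is why the fuller form is the safer choice for an octet range.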

6. Find the IP addresses in the output of ifconfig
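
No solution is given for this exercise, so the following is only one possible sketch. It runs on an invented excerpt written in the style of ifconfig output rather than on a real interface listing, and the variable names are assumptions. Note that the pattern matches any dotted quad, so netmasks and broadcast addresses are reported as well.

```shell
# Hypothetical excerpt in the style of ifconfig output.
sample='inet 192.168.81.140  netmask 255.255.255.0  broadcast 192.168.81.255
inet6 fe80::20c:29ff:fe1b:5612  prefixlen 64
RX packets 1273  bytes 146122 (142.6 KiB)
inet 127.0.0.1  netmask 255.0.0.0'

# One octet: a number from 0 to 255.
octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'

# Four octets separated by dots, bounded by word anchors on both sides.
ips=$(printf '%s\n' "$sample" | grep -E -o "\<${octet}(\.${octet}){3}\>")

printf '%s\n' "$ips"
```

On a real system the same pattern could be applied as ifconfig | egrep -o "..." with the expanded expression; filtering the lines first (for example on the word inet) would be one way to leave out the netmasks.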

7. Add the users bash, testbash, basher, and nologin (whose shell is /sbin/nologin), then find the lines in /etc/passwd where the user name is the same as the shell name

[root@localhost jeffrey]# egrep "^([^:]+\>).*\1$" /etc/passwd
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
nologin:x:1007:1008::/home/nologin:/sbin/nologin

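Next, we move on to some commands for viewing and processing text: wc, cut, sort, uniq, diff, and patch.

wc prints the number of lines, words, and bytes in a text (word count)

Usage
# wc [option]… [FILE]…

Options

-l  print only the number of lines (lines)
-w  print only the number of words (words)
-c  print only the number of bytes (bytes)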
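
A minimal sketch of the three counters on a made-up three-line text, read from standard input so that no file name appears in the output. The text and variable names are illustrative.

```shell
# Hypothetical text: 3 lines, 6 words.
text='one two
three four five
six'

lines=$(printf '%s\n' "$text" | wc -l)   # newline characters
words=$(printf '%s\n' "$text" | wc -w)   # whitespace-separated words
bytes=$(printf '%s\n' "$text" | wc -c)   # bytes, newlines included

printf 'lines=%s words=%s bytes=%s\n' "$lines" "$words" "$bytes"
```

The byte count includes the newline at the end of each line, which is why it is larger than the number of visible characters.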

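cut prints selected fields from each line of a text

Usage
# cut [option]… [FILE]…

Options

-d CHAR  use CHAR as the field delimiter (the space between -d and CHAR is optional)
-f FIELDS  select the fields to print
-f #  a single field
-f #-#  a range of consecutive fields
-f #,#  several separate fields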
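
The three forms of -f can be tried on a single passwd-style record. This is a small sketch; the record is a sample line in /etc/passwd format and the variable names are chosen for illustration.

```shell
# One record in /etc/passwd format (seven colon-separated fields).
record='user:x:1003:1004::/home/user:/bin/csh'

# -f1     a single field: the user name.
name=$(printf '%s\n' "$record" | cut -d: -f1)
# -f3-4   a consecutive range: UID and GID.
ids=$(printf '%s\n' "$record" | cut -d: -f3-4)
# -f1,7   separate fields: user name and login shell.
name_shell=$(printf '%s\n' "$record" | cut -d: -f1,7)

printf '%s\n' "$name" "$ids" "$name_shell"
```

Selected fields are joined with the same delimiter that was used to split them, so the output stays colon-separated.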

Examples:
Show the number of lines, words, and bytes in /etc/passwd

[jeffrey@localhost ~]$ wc /etc/passwd
  50   94 2554 /etc/passwd

Extract fields 1, 3, 4, and 7 from each line of /etc/passwd

[jeffrey@localhost ~]$ cut -d: -f 1,3-4,7 /etc/passwd
user:1003:1004:/bin/csh
bash:1004:1005:/bin/csh
bashtext:1005:1006:/bin/csh
basher:1006:1007:/bin/csh
nologin:1007:1008:/sbin/nologin

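sort sorts the lines of a text

Usage:
# sort [option]… [FILE]…

Options:

-t CHAR  use CHAR as the field separator
-k #  sort by comparing field #
-n  compare by numeric value instead of character by character
-r  reverse the sort order
-f  ignore case
-u  print only one line from each group of identical lines (identical lines end up adjacent once sorted)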
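
The effect of -n is easiest to see next to a plain character comparison. The sketch below uses invented name:number records; the data and variable names are assumptions.

```shell
# Hypothetical records: name and a number, separated by a colon.
data='carol:30
alice:4
bob:12
alice:4'

# Default: whole lines compared as text.
by_name=$(printf '%s\n' "$data" | sort)
# Field 2 compared as text: "12" < "30" < "4".
by_text=$(printf '%s\n' "$data" | sort -t: -k2)
# Field 2 compared as numbers: 4 < 12 < 30.
by_num=$(printf '%s\n' "$data" | sort -t: -k2 -n)
# Same numeric comparison, largest first.
by_num_desc=$(printf '%s\n' "$data" | sort -t: -k2 -n -r)
# -u keeps a single copy of the duplicated alice line.
unique=$(printf '%s\n' "$data" | sort -u)

printf '%s\n' "$by_name" "$by_text" "$by_num" "$by_num_desc" "$unique"
```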

Examples:
List the default shells of the users on the system

[jeffrey@localhost ~]$ cut -d: -f7 /etc/passwd | sort
/bin/bash
/bin/bash
/bin/csh
/bin/csh
/bin/csh
/bin/csh
/bin/csh
/bin/csh
/bin/sync
/sbin/halt
/sbin/nologin

List the default shells of the users on the system without duplicates, then count how many different shells there are

[jeffrey@localhost ~]$ cut -d: -f7 /etc/passwd | sort -u 
/bin/bash
/bin/csh
/bin/sync
/sbin/halt
/sbin/nologin
/sbin/shutdown
[jeffrey@localhost ~]$ cut -d: -f7 /etc/passwd | sort -u | wc -l
6

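uniq reports or removes repeated lines (only adjacent identical lines count as repeated)

Usage:
# uniq [option]… [INPUT [OUTPUT]]

Options:

-c  prefix each line with the number of times it occurs
-u  print only the lines that are not repeated
-d  print only the lines that are repeated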
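
Because uniq compares neighbouring lines only, its result depends on whether the input is sorted. A small sketch with an invented list of shells shows the difference; the list and variable names are assumptions.

```shell
# Hypothetical, unsorted list of login shells.
shells='/bin/bash
/sbin/nologin
/sbin/nologin
/bin/sync
/bin/bash'

# Unsorted input: the two /bin/bash lines are not adjacent, so each is counted once.
adjacent=$(printf '%s\n' "$shells" | uniq -c)
# Sorted input: identical lines become adjacent and are counted together.
counted=$(printf '%s\n' "$shells" | sort | uniq -c)
# -d: only the values that occur more than once.
repeated=$(printf '%s\n' "$shells" | sort | uniq -d)
# -u: only the values that occur exactly once.
single=$(printf '%s\n' "$shells" | sort | uniq -u)

printf '%s\n' "$adjacent" "$counted" "$repeated" "$single"
```

This is why pipelines that count occurrences are usually written as sort | uniq -c.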

Examples:

[jeffrey@localhost ~]$ cut -d: -f7 /etc/passwd | uniq -c
      1 /bin/bash
      4 /sbin/nologin
      1 /bin/sync
      1 /sbin/shutdown
      1 /sbin/halt
     34 /sbin/nologin
      1 /bin/bash
      6 /bin/csh
      1 /sbin/nologin
[jeffrey@localhost ~]$ cut -d: -f7 /etc/passwd | uniq -c -u
      1 /bin/bash
      1 /bin/sync
      1 /sbin/shutdown
      1 /sbin/halt
      1 /bin/bash
      1 /sbin/nologin
[jeffrey@localhost ~]$ cut -d: -f7 /etc/passwd | uniq -c -d
      4 /sbin/nologin
     34 /sbin/nologin
      6 /bin/csh

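diff: compares files line by line

Usage:
# diff [option]… FILES
# diff FILE1 FILE2 > PATCH_FILE    generate a patch file

Options:

-u  use the unified format, which shows the lines to be changed together with their context (3 lines by default)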
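
For comparison with the default output format, here is a rough sketch of diff -u on two small temporary files. The file names, their contents, and the variable names are made up for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
printf '%s\n' 'He loves his lover' 'He likes his lover' > "$workdir/old.txt"
printf '%s\n' 'He loves his lover' 'what are you?' 'He likes his lover' > "$workdir/new.txt"

# diff exits with status 1 when the files differ, which is expected here;
# "|| true" keeps that from being treated as a failure.
unified=$(diff -u "$workdir/old.txt" "$workdir/new.txt" || true)

printf '%s\n' "$unified"
rm -rf "$workdir"
```

In the unified format, added lines start with + and removed lines with -, and each hunk is introduced by an @@ line giving the affected line ranges in both files.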

Example:

[jeffrey@localhost ~]$ cat lovers.txt
He loves his lover
He likes his lover
She likes her liker
She loves her liker
[jeffrey@localhost ~]$ cat a.txt
He loves his lover
what are you?
He likes his lover
She likes her liker
She loves her liker
[jeffrey@localhost ~]$ diff lovers.txt a.txt
1a2
> what are you?

patch applies a patch to a file

Usage:
# patch [option] -i PATCH_FILE OLDFILE
# patch OLDFILE < PATCH_FILE
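
The -i form can be sketched end to end on temporary files. Everything here is illustrative: the file names, contents, and variable names are assumptions, and the script first checks that the patch command is installed, since it is missing from some minimal systems.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
old="$workdir/lovers.txt"
new="$workdir/a.txt"
patch_file="$workdir/lovers.patch"
printf '%s\n' 'He loves his lover' 'He likes his lover' > "$old"
printf '%s\n' 'He loves his lover' 'what are you?' 'He likes his lover' > "$new"

# diff exits with status 1 when the files differ; that is expected here.
diff "$old" "$new" > "$patch_file" || true

if command -v patch > /dev/null 2>&1; then
    # Same effect as: patch "$old" < "$patch_file"
    patch -i "$patch_file" "$old" > /dev/null
    # cmp -s prints nothing and succeeds only if the files are byte-identical.
    if cmp -s "$old" "$new"; then
        result=identical
    else
        result=different
    fi
else
    result=patch-not-installed
fi

printf '%s\n' "$result"
rm -rf "$workdir"
```

After the patch is applied, the old file should have the same content as the new one, which is what the cmp check confirms.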

Example:
First generate the patch file, then apply it

[jeffrey@localhost ~]$ diff lovers.txt a.txt > patch
[jeffrey@localhost ~]$ patch lovers.txt < patch
[jeffrey@localhost ~]$ cat lovers.txt
He loves his lover
what are you?
He likes his lover
She likes her liker
She loves her liker