How do I remove a specific word from the lines of a text file?

My text file looks like this:

```
Liquid penetration 95% mass (m) = 0.000205348
Liquid penetration 95% mass (m) = 0.000265725
Liquid penetration 95% mass (m) = 0.000322823
Liquid penetration 95% mass (m) = 0.000376445
Liquid penetration 95% mass (m) = 0.000425341
```

Now I want to remove `Liquid penetration 95% mass (m)` from my lines so that only the values remain. How can I do that?

If there is only one `=` sign, you can delete everything before it, including the `=`, like this:

```
$ sed -r 's/.* = (.*)/\1/' file
0.000205348
0.000265725
0.000322823
0.000376445
0.000425341
```

If you want to change the original file, use the `-i` option once you have tested the command:

 sed -ri 's/.* = (.*)/\1/' file

Notes

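- `-r` use ERE, so we don't have to escape `(` and `)`
- `s/old/new` replace `old` with `new`
- `.*` any number of any characters
- `(things)` save `things` to a backreference, to be used later with `\1`, `\2`, etc.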
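To see the backreference at work on a single line, here is a minimal sketch. It assumes one line taken from the question and uses `-E`, which GNU sed accepts as a synonym for `-r`. It also shows that the same result can be obtained without a capture group by deleting everything up to and including `= `:

```shell
#!/bin/sh
# One sample line from the question (assumption: each line contains exactly one " = ").
line='Liquid penetration 95% mass (m) = 0.000205348'

# Capture-group version: (.*) after " = " is saved and written back as \1.
with_group=$(printf '%s\n' "$line" | sed -E 's/.* = (.*)/\1/')

# Same result without a capture group: delete everything up to and including "= ".
without_group=$(printf '%s\n' "$line" | sed 's/.*= //')

printf '%s\n%s\n' "$with_group" "$without_group"
```

Both variables end up holding `0.000205348`.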

This is a job for `awk`; assuming the value only ever appears in the last field (as in your example):

 awk '{print $NF}' file.txt

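`NF` is an `awk` variable that expands to the number of fields in a record (line), so `$NF` (note the `$` in front) contains the value of the last field.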
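A small sketch, using one sample line from the question, that prints both `NF` and `$NF`. With the default field separator, awk splits on runs of blanks, so the line has seven fields and the number is the last one:

```shell
#!/bin/sh
line='Liquid penetration 95% mass (m) = 0.000205348'

# Fields: Liquid | penetration | 95% | mass | (m) | = | 0.000205348
nf=$(printf '%s\n' "$line" | awk '{print NF}')     # number of fields
last=$(printf '%s\n' "$line" | awk '{print $NF}')  # value of the last field

echo "NF=$nf last=$last"
```

This prints `NF=7 last=0.000205348`.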

Example:

```
% cat temp.txt
Liquid penetration 95% mass (m) = 0.000205348
Liquid penetration 95% mass (m) = 0.000265725
Liquid penetration 95% mass (m) = 0.000322823
Liquid penetration 95% mass (m) = 0.000376445
Liquid penetration 95% mass (m) = 0.000425341
% awk '{print $NF}' temp.txt
0.000205348
0.000265725
0.000322823
0.000376445
0.000425341
```

I decided to compare the different solutions listed here. For this purpose I created a large file based on the content provided by the OP.

First I created a simple file named `input.file`:

```
$ cat input.file
Liquid penetration 95% mass (m) = 0.000205348
Liquid penetration 95% mass (m) = 0.000265725
Liquid penetration 95% mass (m) = 0.000322823
Liquid penetration 95% mass (m) = 0.000376445
Liquid penetration 95% mass (m) = 0.000425341
```

Then I executed this loop:

 for i in {1..100}; do cat input.file | tee -a input.file; done

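The terminal window got blocked. I ran `killall tee` from another terminal. Then I checked the content of the file with `less input.file` and `cat input.file`. It looked fine except for the last line, so I removed the last line and made a backup copy, `cp input.file{,.copy}` (because some of the commands below use the in-place option).

The final number of lines in `input.file` is 2 192 473. I got this number with `wc`:
```
$ cat input.file | wc -l
2192473
```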
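The loop above most likely blocked because every pass appends the whole file to itself (so it roughly doubles each time), `tee` also echoes everything to the terminal, and `cat` may keep reading data that `tee` is still appending to the same file. A minimal sketch of a way to build a big test file that always terminates, assuming hypothetical file names `sample.txt` (the 5-line sample) and `big.file`, and an arbitrary repeat count of 1000:

```shell
#!/bin/sh
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical name for the 5-line sample from the question.
cat > sample.txt <<'EOF'
Liquid penetration 95% mass (m) = 0.000205348
Liquid penetration 95% mass (m) = 0.000265725
Liquid penetration 95% mass (m) = 0.000322823
Liquid penetration 95% mass (m) = 0.000376445
Liquid penetration 95% mass (m) = 0.000425341
EOF

# Read only sample.txt and write only big.file, so the input never grows while it is read.
# 1000 repetitions is an arbitrary choice; raise it for a bigger file.
i=0
while [ "$i" -lt 1000 ]; do
    cat sample.txt
    i=$((i + 1))
done > big.file

wc -l < big.file
```

With 1000 repetitions this produces 5000 lines; the line count grows linearly with the repeat count instead of doubling.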

Here are the results of the comparison:

`grep -o '[^[:space:]]\+$'`

```
$ time grep -o '[^[:space:]]\+$' input.file > output.file

real    0m58.539s
user    0m58.416s
sys     0m0.108s
```

`sed -ri 's/.* = (.*)/\1/'`

```
$ time sed -ri 's/.* = (.*)/\1/' input.file

real    0m26.936s
user    0m22.836s
sys     0m4.092s
```

Alternatively, if we redirect the output to a new file, the command is faster:

```
$ time sed -r 's/.* = (.*)/\1/' input.file > output.file

real    0m19.734s
user    0m19.672s
sys     0m0.056s
```

`gawk '{gsub(".*= ", "");print}'`

```
$ time gawk '{gsub(".*= ", "");print}' input.file > output.file

real    0m5.644s
user    0m5.568s
sys     0m0.072s
```

`rev | cut -d' ' -f1 | rev`

```
$ time rev input.file | cut -d' ' -f1 | rev > output.file

real    0m3.703s
user    0m2.108s
sys     0m4.916s
```

`grep -oP '.*= \K.*'`

```
$ time grep -oP '.*= \K.*' input.file > output.file

real    0m3.328s
user    0m3.252s
sys     0m0.072s
```

`sed 's/.*= //'` (using the `-i` option instead makes the command several times slower)

```
$ time sed 's/.*= //' input.file > output.file

real    0m3.310s
user    0m3.212s
sys     0m0.092s
```

`perl -pe 's/.*= //'` (the `-i` option does not make much of a performance difference here)

```
$ time perl -i.bak -pe 's/.*= //' input.file

real    0m3.187s
user    0m3.128s
sys     0m0.056s
```

```
$ time perl -pe 's/.*= //' input.file > output.file

real    0m3.138s
user    0m3.036s
sys     0m0.100s
```

`awk '{print $NF}'`

```
$ time awk '{print $NF}' input.file > output.file

real    0m1.251s
user    0m1.164s
sys     0m0.084s
```

`cut -c 35-`

```
$ time cut -c 35- input.file > output.file

real    0m0.352s
user    0m0.284s
sys     0m0.064s
```

`cut -d= -f2`

```
$ time cut -d= -f2 input.file > output.file

real    0m0.328s
user    0m0.260s
sys     0m0.064s
```

Source of the idea.

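Use `grep` with `-P` to get PCRE (interpret the pattern as a Perl-Compatible Regular Expression) and `-o` to print only the matched part. `\K` tells grep to discard the part of the match that comes before it.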

```
$ grep -oP '.*= \K.*' infile
0.000205348
0.000265725
0.000322823
0.000376445
0.000425341
```

Or you can use the `cut` command:

 cut -d= -f2 infile
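One detail worth checking: field 2 is everything after the `=`, so it still starts with the space that follows the `=` in the input. A minimal sketch on one sample line, with a second `cut` as one possible way (an assumption, not the only one) to drop that space:

```shell
#!/bin/sh
line='Liquid penetration 95% mass (m) = 0.000205348'

# Field 2 is everything after the first "=", including the leading space.
raw=$(printf '%s\n' "$line" | cut -d= -f2)

# Keep characters from position 2 onward to drop that leading space.
trimmed=$(printf '%s\n' "$line" | cut -d= -f2 | cut -c 2-)

# Brackets make the leading space visible: [ 0.000205348] and [0.000205348]
printf '[%s]\n[%s]\n' "$raw" "$trimmed"
```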

Since the line prefix always has the same length (34 characters), you can use `cut`:

 cut -c 35- input.txt > output.txt
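A quick sketch to confirm where the number starts: it measures the prefix (copied from the question, including the trailing space after `=`) with the POSIX `${#var}` length expansion. The prefix is plain ASCII, so byte and character positions coincide for `cut -c`:

```shell
#!/bin/sh
# Fixed prefix from the question, including the space after "=".
prefix='Liquid penetration 95% mass (m) = '
echo "${#prefix}"   # length of the prefix

# The value therefore starts at column 34 + 1 = 35.
value=$(printf '%s\n' "${prefix}0.000205348" | cut -c 35-)
echo "$value"
```

This prints `34` and then `0.000205348`. If the prefix ever changed length, the column would have to be recomputed, which is the trade-off for this method's speed.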

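Use `rev` to reverse each line of the file, pipe the output into `cut` with a space as the delimiter and 1 as the target field, then reverse it again to get the original number: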

```
$ rev your_file | cut -d' ' -f1 | rev
0.000205348
0.000265725
0.000322823
0.000376445
0.000425341
```

This is simple and short, easy to write, understand, and check, and I personally like it:

 grep -oE '\S+$' file

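When invoked with `-E` or `-P`, `grep` in Ubuntu takes the shorthand `\s` to mean a whitespace character (in practice usually a space or a tab) and `\S` to mean any character that is not whitespace. With the quantifier `+` and the end-of-line anchor `$`, the pattern `\S+$` matches one or more non-whitespace characters at the end of a line. You can use `-P` instead of `-E`; the meaning is the same in this case, but a different regex engine is used, so the two may have different performance characteristics.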

This is equivalent to Avinash Raj's commented solution (just with an easier, more compact syntax):

 grep -o '[^[:space:]]\+$' file

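If there may be trailing whitespace after the number, these methods will not work. They can be modified to handle it, but I see no point in doing so here. Although it is sometimes instructive to generalize a solution to cover more cases, that is not worthwhile nearly as often as people tend to assume, because one usually cannot know in which of many different, often incompatible, ways the problem will ultimately need to be generalized.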
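A minimal sketch of the failure mode, using a hypothetical line with one trailing space. It uses the ERE form `[^[:space:]]+$`, equivalent to the BRE `[^[:space:]]\+$` above. For comparison it also runs `awk '{print $NF}'` from another answer, whose default field splitting ignores trailing blanks:

```shell
#!/bin/sh
# Hypothetical input: a trailing space after the number.
line='Liquid penetration 95% mass (m) = 0.000205348 '

# The pattern needs a non-blank as the very last character, so nothing matches.
# grep exits with status 1 when nothing matches, hence "|| true".
grep_out=$(printf '%s\n' "$line" | grep -oE '[^[:space:]]+$' || true)

# awk splits on runs of blanks and ignores trailing ones, so $NF is still the number.
awk_out=$(printf '%s\n' "$line" | awk '{print $NF}')

printf 'grep: [%s]\nawk: [%s]\n' "$grep_out" "$awk_out"
```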

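Performance is sometimes an important consideration. This question does not say the input is very large, and every method posted here is probably fast enough. However, in case speed matters, here is a small benchmark on a ten-million-line input file: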

```
$ perl -e 'print((<>) x 2000000)' file > bigfile
$ du -sh bigfile
439M    bigfile
$ wc -l bigfile
10000000 bigfile
$ TIMEFORMAT=%R
$ time grep -o '[^[:space:]]\+$' bigfile > bigfile.out
819.565
$ time grep -oE '\S+$' bigfile > bigfile.out
816.910
$ time grep -oP '\S+$' bigfile > bigfile.out
67.465
$ time cut -d= -f2 bigfile > bigfile.out
3.902
$ time grep -o '[^[:space:]]\+$' bigfile > bigfile.out
815.183
$ time grep -oE '\S+$' bigfile > bigfile.out
824.546
$ time grep -oP '\S+$' bigfile > bigfile.out
68.692
$ time cut -d= -f2 bigfile > bigfile.out
4.135
```

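I ran them twice in case the order mattered (as it sometimes does for I/O-heavy tasks), and because I had no machine available that was free of background activity that could skew the results. From these results I draw the following conclusions, at least provisionally and for input files of the size I used: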

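Wow! Passing `-P` (to use PCRE) rather than `-G` (the default when no dialect is specified) or `-E` made `grep` faster by over an order of magnitude. So for large files, it may be better to use this command than the ones shown above:
```
grep -oP '\S+$' file
```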
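Wow!! The `cut` method from αғsнιη's answer, `cut -d= -f2 file`, is over an order of magnitude faster than even the faster version of my approach! It was also the winner in pa4080's benchmark, which covered more methods than this one but with a smaller input, and that is why I chose it, out of all the other methods, to include in my test. If performance matters or the files are huge, I think αғsнιη's `cut` method should be used.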

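This also serves as a reminder that the simple `cut` and `paste` utilities should not be forgotten, and should arguably be preferred when they apply, even though more sophisticated tools such as `grep` are often offered as first-line solutions (and are what I personally am more used to reaching for).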

perl: substitute the pattern `/.*= /` with the empty string `//`:

 perl -pe 's/.*= //' input.file > output.file

 perl -i.bak -pe 's/.*= //' input.file

From `perl --help`:

```
-e program        one line of program (several -e's allowed, omit programfile)
-p                assume loop like -n but print line also, like sed
-i[extension]     edit <> files in place (makes backup if extension supplied)
```

sed: substitute the pattern with the empty string:

 sed 's/.*= //' input.file > output.file

Or (slower than the above):

 sed -i.bak 's/.*= //' input.file

I mention this approach because it is several times faster than the one in Zanna's answer.

gawk: substitute the pattern `".*= "` with the empty string `""`:

 gawk '{gsub(".*= ", "");print}' input.file > output.file

From `man gawk`:

```
gsub(r, s [, t])    For each substring matching the regular expression r in the
                    string t, substitute the string s, and return the number of
                    substitutions. If t is not supplied, use $0...
```
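A short sketch that prints the return value described in the excerpt. Because `.*` is greedy, `".*= "` matches once, from the start of the line through `= `, so `gsub` reports one substitution and `$0` is left holding just the number. `gsub` is standard awk, so this sketch assumes plain `awk` rather than requiring `gawk`:

```shell
#!/bin/sh
line='Liquid penetration 95% mass (m) = 0.000205348'

# gsub() edits $0 in place (no third argument) and returns the substitution count.
result=$(printf '%s\n' "$line" | awk '{ n = gsub(".*= ", ""); print n, $0 }')

echo "$result"
```

This prints `1 0.000205348`.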
