How can I count files with a particular extension, and the directories they are in?

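I want to know how many regular files have the extension .c in a large, complex directory structure, and also how many directories these files are spread across. The output I want is just those two numbers.

I've seen the question about how to get the number of files, but I also need to know the number of directories the files are in.

My filenames (including directories) might contain any characters; they may start with . or - and have spaces or newlines.
I might have some symlinks whose names end with .c, and symlinks to directories. I don't want symlinks to be followed or counted, or I at least want to know if and when they are being counted.
The directory structure has many levels, and the top-level directory (the working directory) has at least one .c file in it.

I hastily wrote some commands in the (Bash) shell to count them myself, but I don't think the result is accurate...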

    shopt -s dotglob
    shopt -s globstar
    mkdir out
    for d in **/; do
        find "$d" -maxdepth 1 -type f -name "*.c" >> out/$(basename "$d")
    done
    ls -1Aq out | wc -l
    cat out/* | wc -l

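This outputs complaints about ambiguous redirects, misses files in the current directory, trips up on special characters (for example, the redirected find output prints the newlines in filenames), and writes a whole bunch of empty files (oops).

How can I reliably enumerate my .c files and their containing directories?

In case it helps, here are some commands to create a test structure with bad names and symlinks: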

    mkdir -p cfiles/{1..3}/{a..b} && cd cfiles
    mkdir space\ d
    touch -- i.c -.c bad\ .c $'terrible\n.c' not-c .hidden.c
    for d in space\ d 1 2 2/{a..b} 3/b; do cp -t "$d" -- *.c; done
    ln -s 2 dirlink
    ln -s 3/b/i.c filelink.c

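In the resulting structure, 7 directories contain .c files, and 29 regular files end with .c (if dotglob is off when the commands are run). If I've miscounted, please let me know. These are the numbers I want.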
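For anyone who would rather build the same fixture without a shell, here is a minimal Python sketch of the commands above. The function name `build_test_tree` is only an illustration, not part of the question, and the sketch assumes a POSIX system where `os.symlink` is permitted. It copies only the non-hidden .c files, which mirrors `cp -- *.c` with dotglob off.

```python
import os
import shutil
import tempfile


def build_test_tree(base):
    """Recreate the cfiles test structure under `base` and return its path."""
    top = os.path.join(base, "cfiles")
    for number in "123":
        for letter in "ab":
            os.makedirs(os.path.join(top, number, letter))
    os.mkdir(os.path.join(top, "space d"))

    # Awkward names: a leading dash, a space, a newline, a leading dot.
    names = ["i.c", "-.c", "bad .c", "terrible\n.c", "not-c", ".hidden.c"]
    for name in names:
        open(os.path.join(top, name), "w").close()

    # Like `cp -- *.c` with dotglob off: .hidden.c is not copied.
    visible_c = [n for n in names if n.endswith(".c") and not n.startswith(".")]
    for directory in ["space d", "1", "2", "2/a", "2/b", "3/b"]:
        for name in visible_c:
            shutil.copy(os.path.join(top, name), os.path.join(top, directory, name))

    # One symlink to a directory and one symlink whose name ends in .c.
    os.symlink("2", os.path.join(top, "dirlink"))
    os.symlink("3/b/i.c", os.path.join(top, "filelink.c"))
    return top


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(sorted(os.listdir(build_test_tree(tmp))))
```

Building the tree inside a temporary directory keeps the odd names away from real data, and any of the counting approaches can then be pointed at the returned path.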

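Please feel free not to use this particular test.

N.B.: Answers in any shell or other language will be tested and appreciated by me. If I have to install new packages, no problem. If you know a GUI solution, I encourage you to share (but I might not go so far as to install a whole DE to test it) :) I use Ubuntu MATE 17.10.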

I haven't examined the output with symlinks, but:

    find . -type f -iname '*.c' -printf '%h\0' |
      sort -z |
      uniq -zc |
      sed -zr 's/([0-9]) .*/\1 1/' |
      tr '\0' '\n' |
      awk '{f += $1; d += $2} END {print f, d}'

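The find command prints the directory name of each .c file it finds.
sort | uniq -c gives us how many files are in each directory. (I wasn't sure the sort was needed, but uniq only merges adjacent lines, and find can descend into a subdirectory between two files of the same parent, so it is safer to keep it.)
With sed, I replace the directory name with 1, thus eliminating all possible weird characters, with just the count and 1 remaining,
which lets me convert to newline-separated output with tr,
which I then sum up with awk to get the total number of files and the number of directories that contained those files. Note that d here is essentially the same as NR. I could have omitted inserting 1 in the sed command and just printed NR here, but I think this is slightly clearer.

Up until the tr, the data is NUL-delimited, and so safe against all valid filenames.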
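To make the counting step concrete, here is a rough Python sketch of what the sort | uniq -c | awk part of the pipeline computes. It is an illustration rather than a replacement for the pipeline: the function name `count_from_nul_stream` is made up, and it assumes the input is exactly what `find ... -printf '%h\0'` emits, that is, one directory name per matching file, each terminated by a NUL byte.

```python
from collections import Counter


def count_from_nul_stream(data: bytes):
    """Return (file_count, dir_count) from NUL-terminated directory names."""
    # Splitting on NUL is safe for every valid filename; the empty piece
    # after the final terminator is dropped.
    records = [record for record in data.split(b"\0") if record]
    per_directory = Counter(records)  # plays the role of sort | uniq -c
    # Sum of the counts = number of files; number of keys = directories.
    return sum(per_directory.values()), len(per_directory)


if __name__ == "__main__":
    sample = b"./a\0./a\0./odd\nname\0"
    print(*count_from_nul_stream(sample))
```

Because the records are compared as whole byte strings, a newline inside a directory name is just another byte and never splits a record.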

With zsh and bash, you can use printf %q to get a quoted string, which will not have newlines in it. So you might be able to do something like:

    shopt -s globstar dotglob nocaseglob
    printf "%q\n" **/*.c | awk -F/ '{NF--; f++} !c[$0]++{d++} END {print f, d}'

However, even though ** is not supposed to expand through symlinks to directories, I could not get the desired output on bash 4.4.18(1) (Ubuntu 16.04).

    $ shopt -s globstar dotglob nocaseglob
    $ printf "%q\n" ./**/*.c | awk -F/ '{NF--; f++} !c[$0]++{d++} END {print f, d}'
    34 15
    $ echo $BASH_VERSION
    4.4.18(1)-release

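But zsh worked fine, and the command can be simplified:

    $ printf "%q\n" ./**/*.c(D.:h) | awk '!c[$0]++ {d++} END {print NR, d}'
    29 7

D enables this glob to select dot files, . selects regular files (so, not symlinks), and :h prints only the directory path and not the filename (like find's %h). (See the zsh manual sections on Filename Generation and Modifiers.) So with the awk command we just need to count the number of unique directories appearing, and the number of lines is the file count.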

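Python has os.walk, which makes tasks like this easy, intuitive, and automatically robust even in the face of weird filenames, such as those that contain newline characters. This Python 3 script, which I originally posted in chat, is intended to be run in the current directory (but it doesn't have to be located in the current directory, and you can change the path it passes to os.walk):

    #!/usr/bin/env python3

    import os

    dc = fc = 0
    for _, _, fs in os.walk('.'):
        c = sum(f.endswith('.c') for f in fs)
        if c:
            dc += 1
            fc += c
    print(dc, fc)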

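That prints the count of directories that directly contain at least one file whose name ends in .c, followed by a space, followed by the count of files whose names end in .c. "Hidden" files (that is, files whose names start with .) are included, and hidden directories are similarly traversed.

os.walk recursively traverses a directory hierarchy. It enumerates all the directories that are recursively accessible from the starting point you give it, yielding information about each of them as a tuple of three values: root, dirs, files. For each directory it traverses to (including the first one, whose name you give it):

root holds the pathname of that directory. Note that this is totally unrelated to the system's "root directory" / (and also unrelated to /root), though it would go to those if you started there. In this case, root starts at the path . (i.e., the current directory) and goes everywhere below it.
dirs holds a list of the names of all the subdirectories of the directory whose name is currently held in root.
files holds a list of the names of all the entries that reside in the directory whose name is currently held in root but that are not themselves directories. Note that this includes other kinds of files than regular files, including symbolic links, but it sounds like you don't expect any such entries to end in .c and are interested in seeing any that do.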

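In this case, I only need to examine the third element of the tuple, files (which I call fs in the script). Like the find command, Python's os.walk traverses into subdirectories for me; the only thing I have to inspect myself is the names of the files each of them contains. Unlike the find command, though, os.walk automatically provides me with a list of those filenames.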
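Since files can include symbolic links whose names end in .c, a variant that counts only regular files may be closer to what the question asks for. The following is a sketch under assumptions, not the script from the answer: the function name `count_regular` and its parameters are made up here, and it returns the file count first and the directory count second, which is the opposite order from the script above.

```python
import os
import stat


def count_regular(top=".", suffix=".c"):
    """Return (file_count, dir_count) for regular files ending in `suffix`."""
    file_count = dir_count = 0
    # By default os.walk does not descend into symlinked directories.
    for root, _dirs, names in os.walk(top):
        matches = 0
        for name in names:
            if not name.endswith(suffix):
                continue
            # lstat() does not follow symlinks, so a link named x.c is skipped.
            mode = os.lstat(os.path.join(root, name)).st_mode
            if stat.S_ISREG(mode):
                matches += 1
        if matches:
            dir_count += 1
            file_count += matches
    return file_count, dir_count


if __name__ == "__main__":
    print(*count_regular("."))
```

Using lstat rather than a plain existence check is the design choice that matters: it inspects the link itself, so filelink.c in the test structure is recognised as a symlink and left out.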

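That script does not follow symbolic links. You very probably don't want symlinks followed for such an operation, because they could form cycles, and because even if there are no cycles, the same files and directories may be traversed and counted multiple times if they are accessible through different symlinks.

If you ever did want os.walk to follow symlinks (which you usually wouldn't), you can pass followlinks=True to it. That is, instead of writing os.walk('.') you could write os.walk('.', followlinks=True). I reiterate that you would rarely want that, especially for a task like this, where you're recursively enumerating an entire directory structure, no matter how big it is, and counting all the files in it that meet some requirement.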
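If following symlinks really is wanted, one way to tame the cycle and double-counting problems described above is to remember which real directories have already been visited. This is only a sketch: `count_following_links` is a hypothetical name, directories are identified by their (device, inode) pair, and a symlink to a regular file is counted as a file here, so a file reachable under two names in different directories would still be counted twice.

```python
import os


def count_following_links(top=".", suffix=".c"):
    """Return (file_count, dir_count), following symlinks but visiting
    each real directory only once."""
    seen = set()
    file_count = dir_count = 0
    for root, dirs, names in os.walk(top, followlinks=True):
        info = os.stat(root)  # stat() resolves symlinks
        key = (info.st_dev, info.st_ino)
        if key in seen:
            dirs[:] = []  # already visited: prune, so cycles terminate
            continue
        seen.add(key)
        # os.path.isfile() follows symlinks, so links to files count too.
        matches = sum(
            1 for name in names
            if name.endswith(suffix) and os.path.isfile(os.path.join(root, name))
        )
        if matches:
            dir_count += 1
            file_count += matches
    return file_count, dir_count


if __name__ == "__main__":
    print(*count_following_links("."))
```

Clearing dirs in place works because os.walk, in its default top-down mode, consults that same list when deciding where to descend next.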

find + Perl:

    $ find . -type f -iname '*.c' -printf '%h\0' |
        perl -0 -ne '$k{$_}++; }{ print scalar keys %k, " $.\n" '
    7 29

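Explanation

The find command will find any regular files (so no symlinks or directories) and then print the name of the directory they are in (%h) followed by \0.

perl -0 -ne: read the input line by line (-n) and apply the script given by -e to each line. The -0 sets the input line separator to \0 so we can read NUL-delimited input.
$k{$_}++: $_ is a special variable that takes the value of the current line. This is used as a key to the hash %k, whose values are the number of times each input line (directory name) was seen.
}{: this is a shorthand way of writing END{}. Any commands after the }{ will be executed once, after all input has been processed.
print scalar keys %k, " $.\n": keys %k returns the list of keys in the hash %k. scalar keys %k gives the number of elements in that list, which is the number of directories seen. This is printed along with the current value of $., a special variable that holds the current input line number. Since this is run at the end, the current input line number will be the number of the last line, and so the number of lines seen so far.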

You could expand the perl command to this, for clarity:

    find . -type f -iname '*.c' -printf '%h\0' |
        perl -0 -e 'while($line = <STDIN>){
                        $dirs{$line}++;
                        $tot++;
                    }
                    $count = scalar keys %dirs;
                    print "$count $tot\n" '

Here's my suggestion:

    #!/bin/bash
    tempfile=$(mktemp)
    find -type f -name "*.c" -prune >$tempfile
    grep -c / $tempfile
    sed 's_[^/]*$__' $tempfile | sort -u | grep -c /

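This short script creates a tempfile, finds every file in and under the current directory whose name ends in .c, and writes the list to the tempfile. grep is then used to count the files (following "How can I get a count of files in a directory using the command line?"), twice: the second time, directories that are listed multiple times are removed with sort -u after the filename has been stripped from each line with sed.

This also works properly with newlines in filenames: grep -c / counts only lines with a slash and therefore considers only the first line of a multi-line filename in the list. (A newline in a directory name would still be miscounted, because the part of the path after the newline also contains a slash.)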
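The slash-counting trick is easier to see on sample data. Here is a small Python sketch that mimics the two counting commands on a newline-separated listing; the function names `count_slash_lines` and `count_dirs` are made up for illustration, and the listings are hand-written strings rather than real find output.

```python
def count_slash_lines(listing: str) -> int:
    """Count lines containing '/', like `grep -c /` on find's output."""
    return sum(1 for line in listing.split("\n") if "/" in line)


def count_dirs(listing: str) -> int:
    """Mimic: sed 's_[^/]*$__' | sort -u | grep -c /"""
    # Keep everything up to and including the last slash on each line;
    # a line without a slash becomes the empty string.
    stripped = {line[: line.rfind("/") + 1] for line in listing.split("\n")}
    return sum(1 for directory in stripped if "/" in directory)


if __name__ == "__main__":
    # Two files in ./a, one of them with a newline in its name.
    newline_in_file = "./a/i.c\n./a/terrible\n.c\n"
    print(count_slash_lines(newline_in_file), count_dirs(newline_in_file))

    # One file in a directory whose name contains a newline.
    newline_in_dir = "./new\nline/i.c\n"
    print(count_slash_lines(newline_in_dir), count_dirs(newline_in_dir))
```

The first listing comes out right because the continuation line .c has no slash. The second listing holds one file in one directory, yet both counts come out as 2, which is why the NUL-delimited approaches are safer when directory names are untrusted.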

Output

    $ tree
    .
    ├── 1
    │   ├── 1
    │   │   ├── test2.c
    │   │   └── test.c
    │   └── 2
    │       └── test.c
    └── 2
        ├── 1
        │   └── test.c
        └── 2

    $ tempfile=$(mktemp);find -type f -name "*.c" -prune >$tempfile;grep -c / $tempfile;sed 's_[^/]*$__' $tempfile | sort -u | grep -c /
    4
    3

Small shellscript

I suggest a small bash shellscript with two main command lines (and a variable filetype to make it easy to switch to looking for other file types).

It does not look for or in symlinks, only regular files.

    #!/bin/bash

    filetype=c
    #filetype=pdf

    # count the 'filetype' files

    find -type f -name "*.$filetype" -ls|sed 's#.* \./##'|wc -l | tr '\n' ' '

    # count directories containing 'filetype' files

    find -type d -exec bash -c "ls -AF '{}'|grep -e '\.'${filetype}$ -e '\.'${filetype}'\*'$ > /dev/null && echo '{} contains file(s)'" \;|grep 'contains file(s)$'|wc -l

Verbose shellscript

This is a more verbose version that also considers symbolic links:

    #!/bin/bash

    filetype=c
    #filetype=pdf

    # counting the 'filetype' files

    echo -n "number of $filetype files in the current directory tree: "
    find -type f -name "*.$filetype" -ls|sed 's#.* \./##'|wc -l

    echo -n "number of $filetype symbolic links in the current directory tree: "
    find -type l -name "*.$filetype" -ls|sed 's#.* \./##'|wc -l
    echo -n "number of $filetype normal files in the current directory tree: "
    find -type f -name "*.$filetype" -ls|sed 's#.* \./##'|wc -l
    echo -n "number of $filetype symbolic links in the current directory tree including linked directories: "
    find -L -type f -name "*.$filetype" -ls 2> /tmp/c-counter |sed 's#.* \./##' | wc -l; cat /tmp/c-counter; rm /tmp/c-counter

    # list directories with and without 'filetype' files (good for manual checking; comment away after test)
    echo '---------- list directories:'
    find -type d -exec bash -c "ls -AF '{}'|grep -e '\.'${filetype}$ -e '\.'${filetype}'\*'$ > /dev/null && echo '{} contains file(s)' || echo '{} empty'" \;
    echo ''
    #find -L -type d -exec bash -c "ls -AF '{}'|grep -e '\.'${filetype}$ -e '\.'${filetype}'\*'$ > /dev/null && echo '{} contains file(s)' || echo '{} empty'" \;

    # count directories containing 'filetype' files

    echo -n "number of directories with $filetype files: "
    find -type d -exec bash -c "ls -AF '{}'|grep -e '\.'${filetype}$ -e '\.'${filetype}'\*'$ > /dev/null && echo '{} contains file(s)'" \;|grep 'contains file(s)$'|wc -l

    # list and count directories including symbolic links, containing 'filetype' files
    echo '---------- list all directories including symbolic links:'
    find -L -type d -exec bash -c "ls -AF '{}' |grep -e '\.'${filetype}$ -e '\.'${filetype}'\*'$ > /dev/null && echo '{} contains file(s)' || echo '{} empty'" \;
    echo ''
    echo -n "number of directories (including symbolic links) with $filetype files: "
    find -L -type d -exec bash -c "ls -AF '{}'|grep -e '\.'${filetype}$ -e '\.'${filetype}'\*'$ > /dev/null && echo '{} contains file(s)'" \; 2>/dev/null |grep 'contains file(s)$'|wc -l

    # count directories without 'filetype' files (good for checking; comment away after test)

    echo -n "number of directories without $filetype files: "
    find -type d -exec bash -c "ls -AF '{}'|grep -e '\.'${filetype}$ -e '\.'${filetype}'\*'$ > /dev/null || echo '{} empty'" \;|grep 'empty$'|wc -l

Test output

From the short shellscript:

    $ ./ccntr
    29 7

From the verbose shellscript:

    $ LANG=C ./c-counter
    number of c files in the current directory tree: 29
    number of c symbolic links in the current directory tree: 1
    number of c normal files in the current directory tree: 29
    number of c symbolic links in the current directory tree including linked directories: 42
    find: './cfiles/2/2': Too many levels of symbolic links
    find: './cfiles/dirlink/2': Too many levels of symbolic links
    ---------- list directories:
    . empty
    ./cfiles contains file(s)
    ./cfiles/2 contains file(s)
    ./cfiles/2/b contains file(s)
    ./cfiles/2/a contains file(s)
    ./cfiles/3 empty
    ./cfiles/3/b contains file(s)
    ./cfiles/3/a empty
    ./cfiles/1 contains file(s)
    ./cfiles/1/b empty
    ./cfiles/1/a empty
    ./cfiles/space d contains file(s)

    number of directories with c files: 7
    ---------- list all directories including symbolic links:
    . empty
    ./cfiles contains file(s)
    ./cfiles/2 contains file(s)
    find: './cfiles/2/2': Too many levels of symbolic links
    ./cfiles/2/b contains file(s)
    ./cfiles/2/a contains file(s)
    ./cfiles/3 empty
    ./cfiles/3/b contains file(s)
    ./cfiles/3/a empty
    ./cfiles/dirlink empty
    find: './cfiles/dirlink/2': Too many levels of symbolic links
    ./cfiles/dirlink/b contains file(s)
    ./cfiles/dirlink/a contains file(s)
    ./cfiles/1 contains file(s)
    ./cfiles/1/b empty
    ./cfiles/1/a empty
    ./cfiles/space d contains file(s)

    number of directories (including symbolic links) with c files: 9
    number of directories without c files: 5
    $

Simple Perl one-liner:

    perl -MFile::Find=find -le'find(sub{/\.c\z/ and -f and $c{$File::Find::dir}=++$c}, @ARGV); print 0 + keys %c, " $c"' dir1 dir2

Or simpler, with the find command:

    find dir1 dir2 -type f -name '*.c' -printf '%h\0' | perl -l -0ne'$c{$_}=1}{print 0 + keys %c, " $."'

If you like golfing and have a recent (say, less than a decade old) Perl:

    perl -MFile::Find=find -E'find(sub{/\.c$/&&-f&&($c{$File::Find::dir}=++$c)},".");say 0+keys%c," $c"'

    find -type f -name '*.c' -printf '%h\0'|perl -0nE'$c{$_}=1}{say 0+keys%c," $."'

Consider using the locate command, which is much faster than the find command.

Running on the test data

    $ sudo updatedb # needed if the files of interest were added after the last daily `cron` update
    $ printf "Number Files: " && locate -0r "$PWD.*\.c$" | xargs -0 -I{} sh -c 'test ! -L "$1" && echo "regular file"' _ {} | wc -l && printf "Number Dirs.: " && locate -r "$PWD.*\.c$" | sed 's%/[^/]*$%/%' | uniq -cu | wc -l
    Number Files: 29
    Number Dirs.: 7

Thanks to Muru for his answer on Unix & Linux, which helped me strip symbolic links out of the file count.

Thanks to Terdon for his answer on Unix & Linux about $PWD (not directed at me).

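Original answer below, referenced by the comments

Short form:

    $ cd /
    $ sudo updatedb
    $ printf "Number Files: " && locate -cr "$PWD.*\.c$"
    Number Files: 3523
    $ printf "Number Dirs.: " && locate -r "$PWD.*\.c$" | sed 's%/[^/]*$%/%' | uniq -c | wc -l
    Number Dirs.: 648

sudo updatedb: update the database used by the locate command if .c files were created today or if you've deleted .c files today.
locate -cr "$PWD.*\.c$": locate all .c files in the current directory and its children ($PWD). Instead of printing the file names, print the count, with the -c argument. The r specifies regex instead of the default *pattern* matching, which can yield too many results.
locate -r "$PWD.*\.c$" | sed 's%/[^/]*$%/%' | uniq -c | wc -l: locate all *.c files in the current directory and below. Remove the file name with sed, leaving only the directory name. Count the number of files in each directory using uniq -c. Count the number of directories with wc -l.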
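The sed | uniq -c | wc -l stage can also be pictured as a few lines of Python. This is a sketch, not part of the answer: `summarize_paths` is a hypothetical helper, and it assumes the paths have already been read into a list (for instance by splitting the output of locate -0 on NUL bytes). It does not filter out symbolic links.

```python
import os


def summarize_paths(paths):
    """Return (file_count, dir_count) for an iterable of file paths."""
    paths = list(paths)
    # os.path.dirname() drops the last path component, much as the sed
    # expression strips the file name from each line.
    directories = {os.path.dirname(path) for path in paths}
    return len(paths), len(directories)


if __name__ == "__main__":
    sample = ["/usr/src/a/x.c", "/usr/src/a/y.c", "/usr/src/b/z.c"]
    print(*summarize_paths(sample))
```

Collecting the directories in a set means the result does not depend on the order of the input, whereas uniq only merges lines that are adjacent.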

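Start at the current directory with a one-liner

    $ cd /usr/src
    $ printf "Number Files: " && locate -cr "$PWD.*\.c$" && printf "Number Dirs.: " && locate -r "$PWD.*\.c$" | sed 's%/[^/]*$%/%' | uniq -c | wc -l
    Number Files: 3430
    Number Dirs.: 624

Notice how the file count and directory count have changed. I believe all users have the /usr/src directory and can run the commands above, with different counts depending on the number of installed kernels.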

Long form:

The long form includes the time, so you can see how much faster locate is than find. Even if you have to run sudo updatedb, it is many times faster than a single find /.

    ───────────────────────────────────────────────────────────────────────────────────────────
    rick@alien:~/Downloads$ sudo time updatedb
    0.58user 1.32system 0:03.94elapsed 48%CPU (0avgtext+0avgdata 7568maxresident)k
    48inputs+131920outputs (1major+3562minor)pagefaults 0swaps
    ───────────────────────────────────────────────────────────────────────────────────────────
    rick@alien:~/Downloads$ time (printf "Number Files: " && locate -cr $PWD".*\.c$")
    Number Files: 3523

    real    0m0.775s
    user    0m0.766s
    sys     0m0.012s
    ───────────────────────────────────────────────────────────────────────────────────────────
    rick@alien:~/Downloads$ time (printf "Number Dirs.: " && locate -r $PWD".*\.c$" | sed 's%/[^/]*$%/%' | uniq -c | wc -l)
    Number Dirs.: 648

    real    0m0.778s
    user    0m0.788s
    sys     0m0.027s
    ───────────────────────────────────────────────────────────────────────────────────────────

Note: this covers all files on ALL drives and partitions, i.e. we can search for Windows commands too:

    $ time (printf "Number Files: " && locate *.exe -c)
    Number Files: 6541

    real    0m0.946s
    user    0m0.761s
    sys     0m0.060s
    ───────────────────────────────────────────────────────────────────────────────────────────
    rick@alien:~/Downloads$ time (printf "Number Dirs.: " && locate *.exe | sed 's%/[^/]*$%/%' | uniq -c | wc -l)
    Number Dirs.: 3394

    real    0m0.942s
    user    0m0.803s
    sys     0m0.092s

I have three Windows 10 NTFS partitions automatically mounted in /etc/fstab. Be aware that locate knows everything!

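Interesting count:

    $ time (printf "Number Files: " && locate / -c && printf "Number Dirs.: " && locate / | sed 's%/[^/]*$%/%' | uniq -c | wc -l)
    Number Files: 1637135
    Number Dirs.: 286705

    real    0m15.460s
    user    0m13.471s
    sys     0m2.786s

It takes 15 seconds to count 1,637,135 files in 286,705 directories. YMMV.

For a detailed breakdown of the locate command's regex handling (it appears not to be needed in this Q&A, but is used just in case), please read: Use "locate" under some specific directory?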

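Additional reading from recent articles:

Tecmint - 10 Useful 'locate' Command Practical Examples for Linux Newbies
HowtoForge - Linux Locate Command for Beginners (8 Examples)
Computer Hope - Linux locate command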
