Searching for duplicate file names within a folder hierarchy?

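I have a folder called img. This folder has many levels of subfolders, all of which contain images. I am going to import them into an image server.

Normally images (or any files) can have the same name as long as they are in a different directory path or have a different extension. However, the image server I am importing them into requires all image names to be unique (even if the extensions are different).

For example, the images background.png and background.gif would not be allowed because, even though they have different extensions, they still have the same file name. Even if they are in separate subfolders, they still need to be unique.

So I am wondering whether I can do a recursive search in the img folder to find a list of files that have the same name (excluding the extension).

Is there a command that can do this?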

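FSlint is a versatile duplicate finder that includes a function for finding duplicate names.

The FSlint package for Ubuntu emphasizes the graphical interface, but as explained in the FSlint FAQ, a command-line interface is available via the programs in /usr/share/fslint/fslint/. Use the --help option for documentation, e.g.: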

    $ /usr/share/fslint/fslint/fslint --help
    File system lint.
    A collection of utilities to find lint on a filesystem.
    To get more info on each utility run 'util --help'.

    findup -- find DUPlicate files
    findnl -- find Name Lint (problems with filenames)
    findu8 -- find filenames with invalid utf8 encoding
    findbl -- find Bad Links (various problems with symlinks)
    findsn -- find Same Name (problems with clashing names)
    finded -- find Empty Directories
    findid -- find files with dead user IDs
    findns -- find Non Stripped executables
    findrs -- find Redundant Whitespace in files
    findtf -- find Temporary Files
    findul -- find possibly Unused Libraries
    zipdir -- Reclaim wasted space in ext2 directory entries

    $ /usr/share/fslint/fslint/findsn --help
    find (files) with duplicate or conflicting names.
    Usage: findsn [-A -c -C] [[-r] [-f] paths(s) ...]

    If no arguments are supplied the $PATH is searched for any redundant
    or conflicting files.

    -A reports all aliases (soft and hard links) to files.
    If no path(s) specified then the $PATH is searched.

    If only path(s) specified then they are checked for duplicate named
    files. You can qualify this with -C to ignore case in this search.
    Qualifying with -c is more restictive as only files (or directories)
    in the same directory whose names differ only in case are reported.
    IE -c will flag files & directories that will conflict if transfered
    to a case insensitive file system. Note if -c or -C specified and
    no path(s) specifed the current directory is assumed.

Example usage:

    $ /usr/share/fslint/fslint/findsn /usr/share/icons/ > icons-with-duplicate-names.txt
    $ head icons-with-duplicate-names.txt
    -rw-r--r-- 1 root root 683 2011-04-15 10:31 Humanity-Dark/AUTHORS
    -rw-r--r-- 1 root root 683 2011-04-15 10:31 Humanity/AUTHORS
    -rw-r--r-- 1 root root 17992 2011-04-15 10:31 Humanity-Dark/COPYING
    -rw-r--r-- 1 root root 17992 2011-04-15 10:31 Humanity/COPYING
    -rw-r--r-- 1 root root 4776 2011-03-29 08:57 Faenza/apps/16/DC++.xpm
    -rw-r--r-- 1 root root 3816 2011-03-29 08:57 Faenza/apps/22/DC++.xpm
    -rw-r--r-- 1 root root 4008 2011-03-29 08:57 Faenza/apps/24/DC++.xpm
    -rw-r--r-- 1 root root 4456 2011-03-29 08:57 Faenza/apps/32/DC++.xpm
    -rw-r--r-- 1 root root 7336 2011-03-29 08:57 Faenza/apps/48/DC++.xpm
    -rw-r--r-- 1 root root 918 2011-03-29 09:03 Faenza/apps/16/Thunar.png

 find . -mindepth 1 -printf '%h %f\n' | sort -t ' ' -k 2,2 | uniq -f 1 --all-repeated=separate | tr ' ' '/'

As a comment points out, this will find folders too. Here is the command to restrict it to files:

    find . -mindepth 1 -type f -printf '%h %f\n' | ...
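Note that the pipeline above compares complete file names, so background.png and background.gif would not be reported as a clash, and any path containing a space gets split and rejoined in the wrong place. As a rough sketch of the same idea in Python 3 (the function name `find_name_clashes` and the file names in the self-check are made up for illustration), the following groups regular files by their name without the last extension:

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def find_name_clashes(top):
    """Return {stem: [paths]} for every stem used by more than one file."""
    by_stem = defaultdict(list)
    for path in Path(top).rglob("*"):
        if path.is_file():  # skip directories, similar to -type f
            by_stem[path.stem].append(path)
    # Keep only the stems that occur more than once.
    return {stem: sorted(paths) for stem, paths in by_stem.items() if len(paths) > 1}


# Small self-check on a throwaway tree; the file names are hypothetical.
with tempfile.TemporaryDirectory() as top:
    for rel in ("a/background.png", "b/background.gif", "b/my logo.png"):
        target = Path(top, rel)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    demo = {
        stem: [p.relative_to(top).as_posix() for p in paths]
        for stem, paths in find_name_clashes(top).items()
    }

print(demo)
```

For the real tree you would call `find_name_clashes("img")` and print each group. Case is compared exactly here; lower-casing `path.stem` would make the comparison case-insensitive.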

Save this to a file named duplicates.py:

    #!/usr/bin/env python

    # Syntax: duplicates.py DIRECTORY

    import os, sys

    top = sys.argv[1]
    d = {}

    for root, dirs, files in os.walk(top, topdown=False):
        for name in files:
            fn = os.path.join(root, name)
            basename, extension = os.path.splitext(name)

            basename = basename.lower() # ignore case

            if basename in d:
                print(d[basename])
                print(fn)
            else:
                d[basename] = fn

Then make the file executable:

 chmod +x duplicates.py

Run it like this:

 ./duplicates.py ~/images

It should output pairs of files that have the same basename. It is written in Python, so you should be able to modify it.
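What counts as the "same basename" here follows from os.path.splitext plus the lower() call: only the last extension is dropped, and case is ignored. A small sketch with made-up file names shows the edge cases (the helper name `clash_key` is hypothetical and not part of the script):

```python
import os


def clash_key(name):
    """Key the script above effectively uses to compare two file names."""
    basename, _extension = os.path.splitext(name)
    return basename.lower()


# Made-up names, chosen to show the edge cases.
examples = ["background.png", "Background.GIF", "archive.tar.gz", ".DS_Store"]
keys = {name: clash_key(name) for name in examples}

for name, key in keys.items():
    print("%-16s -> %s" % (name, key))
```

So background.png and Background.GIF are treated as a clash, archive.tar.gz is compared as archive.tar, and a dotfile such as .DS_Store keeps its whole name because a leading dot is not treated as an extension separator.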

I am assuming you only need to see these "duplicates" and then handle them manually. If so, this bash4 code should do what you want.

    declare -A array=() dupes=()
    while IFS= read -r -d '' file; do
        base=${file##*/} base=${base%.*}
        if [[ ${array[$base]} ]]; then
            dupes[$base]+=" $file"
        else
            array[$base]=$file
        fi
    done < <(find /the/dir -type f -print0)

    for key in "${!dupes[@]}"; do
        echo "$key: ${array[$key]}${dupes[$key]}"
    done

For help with the associative array syntax, see http://mywiki.wooledge.org/BashGuide/Arrays#Associative_Arrays and/or the bash manual.

This is bname:

    #!/bin/bash
    #
    #  find for jpg/png/gif more files of same basename
    #
    # echo "processing ($1) $2"
    bname=$(basename "$1" .$2)
    find -name "$bname.jpg" -or -name "$bname.png"

Make it executable:

 chmod a+x bname

Invoke it:

 for ext in jpg png jpeg gif tiff; do find -name "*.$ext" -exec ./bname "{}" $ext ";" ; done

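Pros:

It is straightforward and simple, and therefore extensible.
It handles blanks, tabs, line breaks and form feeds in file names, afaik. (Assuming there is no such thing in the extension.)

Cons:

It always finds the file itself, and if it finds a.gif for a.jpg, it will also find a.jpg for a.gif. So for 10 files with the same basename, it ends up finding 100 matches.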
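The count in that drawback is simply quadratic: each of the n files sharing a basename triggers a search that returns all n of them. A rough sketch of the arithmetic next to a single grouping pass, using ten made-up paths (the helper `stem` is hypothetical, introduced only for this illustration):

```python
import os
from collections import defaultdict

# Ten made-up files that share the basename "a", spread over ten directories.
paths = ["dir%d/a.%s" % (i, ext) for i, ext in enumerate(["jpg", "png"] * 5)]


def stem(path):
    """File name without its directory and without its last extension."""
    return os.path.splitext(os.path.basename(path))[0]


# Per-file search: every file triggers a lookup that returns all files
# with the same basename, itself included.
per_file_matches = sum(1 for p in paths for q in paths if stem(q) == stem(p))

# Single pass: collect each basename once, then report the group once.
groups = defaultdict(list)
for p in paths:
    groups[stem(p)].append(p)

print(per_file_matches)                         # 100
print({k: len(v) for k, v in groups.items()})   # {'a': 10}
```

Grouping first reports the ten files as one group instead of one hundred individual hits, which is the approach the dictionary-based scripts in this thread take.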

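I improved loevborg's script for my needs (it now includes grouped output, a blacklist, and cleaner output while scanning). I was scanning a 10 TB drive, so I needed somewhat cleaner output.

Usage:

python duplicates.py DIRNAME

duplicates.py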

    #!/usr/bin/env python

    # Syntax: duplicates.py DIRECTORY

    import os
    import sys

    top = sys.argv[1]
    d = {}

    file_count = 0

    BLACKLIST = [".DS_Store", ]

    for root, dirs, files in os.walk(top, topdown=False):
        for name in files:
            file_count += 1
            fn = os.path.join(root, name)
            basename, extension = os.path.splitext(name)

            # Enable this if you want to ignore case.
            # basename = basename.lower()

            if basename not in BLACKLIST:
                sys.stdout.write(
                    "Scanning... %s files scanned. Currently looking at ...%s/\r" %
                    (file_count, root[-50:])
                )

                if basename in d:
                    d[basename].append(fn)
                else:
                    d[basename] = [fn, ]

    print("\nDone scanning. Here are the duplicates found: ")

    for k, v in d.items():
        if len(v) > 1:
            print("%s (%s):" % (k, len(v)))
            for f in v:
                print (f)
