Download images linked from a web page

Is it possible to download all the .jpg and .png files linked from a web page? I want to download the images from every post of [this forum][1] that contains links. For example, [this post][2] contains a link to [this file][3].

I tried wget:

    wget -r -np http://www.mtgsalvation.com/forums/creativity/artwork/340782-official-digital-rendering-thread?

It copied all the html files of that thread, although I don't know why it jumps from ...thread?comment=336 to ...thread?comment=3232 when it had been going through the comments one by one up to 336.

I also tried this command:

 wget -P path/where/save/result -A jpg,png -r http://www.mtgsalvation.com/forums/creativity/artwork/ 

According to the wget man page:

    -A acclist
    --accept acclist
        Specify comma-separated lists of file name suffixes or patterns to
        accept or reject (@pxref{Types of Files} for more details).

    -P prefix
        Set directory prefix to prefix.  The directory prefix is the
        directory where all other files and subdirectories will be saved
        to, i.e. the top of the retrieval tree.  The default is . (the
        current directory).

    -r
    --recursive
        Turn on recursive retrieving.
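Putting those three options together, the command could look like the sketch below. Here `wgetDir` and the depth limit `-l 1` are my assumptions, not from the post; the command is only printed, since actually running it needs network access:

```shell
# Sketch combining -P, -A, and -r from the man-page excerpt above.
# "wgetDir" and "-l 1" (recursion depth) are assumptions, not from the post.
DEST="wgetDir"
URL="http://www.mtgsalvation.com/forums/creativity/artwork/340782-official-digital-rendering-thread"
CMD="wget -P $DEST -A jpg,png,jpeg,gif -r -l 1 $URL"
echo "$CMD"   # shown instead of run, since the fetch needs network access
```

Note that with `-r`, `-A` filters files after they are fetched (non-matching ones are deleted), and if the forum's images live on other hosts, `-H` (span hosts) together with a `-D` domain whitelist would also be needed.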

Try this:

    mkdir wgetDir
    wget -P wgetDir http://www.mtgsalvation.com/forums/creativity/artwork/340782-official-digital-rendering-thread?page=145

This command fetches the html page and puts it in wgetDir. When I tried it, I found this file:

  340782-official-digital-rendering-thread?page=145 

Then, I tried this command:

  wget -P wgetDir -A png,jpg,jpeg,gif -nd --force-html -r -i "wgetDir/340782-official-digital-rendering-thread?page=145" 
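The `--force-html -i` trick works because wget parses the saved page for links. A rough way to see which links it will find is to pull the `img src` attributes out of the saved HTML yourself. This is a sketch using a made-up `sample.html`; the real page markup may differ:

```shell
# Build a tiny stand-in for the saved thread page (the real markup differs).
cat > sample.html <<'EOF'
<img src="http://example.com/a.jpg"><img src="http://example.com/b.gif">
<img src="http://example.com/c.png">
EOF

# Extract the img src URLs and keep only the extensions we care about.
grep -o 'img src="[^"]*"' sample.html \
  | sed 's/^img src="//; s/"$//' \
  | grep -E '\.(jpg|png)$'
# prints the a.jpg and c.png URLs; the b.gif one is filtered out
```

The resulting list could then be piped straight into `wget -i -` to download just those files.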

It downloads images, so it seems to work, although I don't know whether these are the images you wanted to download.

Here is a C program that fetches every page of the thread and downloads the images it finds in the post bodies:

    #include <stdio.h>
    #include <stdlib.h>     // for using system calls
    #include <unistd.h>     // for sleep

    int main ()
    {
        char body[] = "forum-post-body-content", notes[] = "p-comment-notes",
             img[] = "img src=", link[200], cmd[400] = {0}, file[16];
        int c, pos = 0, pos2 = 0, fin = 0, i, j, num = 0, found = 0;
        FILE *fp;

        for (i = 1; i <= 149; ++i) {
            // download page i of the thread into page<i>.txt
            sprintf(cmd, "wget -O page%d.txt 'http://www.mtgsalvation.com/forums/creativity/artwork/340782-official-digital-rendering-thread?page=%d'", i, i);
            system(cmd);
            sprintf(file, "page%d.txt", i);
            fp = fopen(file, "r");
            if (fp == NULL)     // skip the page if the download failed
                continue;
            while ((c = fgetc(fp)) != EOF) {
                // match the start of a post body
                if (body[pos] == c) {
                    if (pos == 22) {
                        pos = 0;
                        // scan the post body until the "p-comment-notes" marker
                        while (fin == 0) {
                            c = fgetc(fp);
                            if (feof(fp))
                                break;
                            if (notes[pos] == c) {
                                if (pos == 14) {
                                    fin = 1;
                                    pos = -1;
                                }
                                ++pos;
                            } else {
                                if (pos > 0)
                                    pos = 0;
                            }
                            // match an "img src=" tag inside the post body
                            if (img[pos2] == c) {
                                if (pos2 == 7) {
                                    pos2 = 0;
                                    // copy the URL up to the closing quote
                                    while (found == 0) {
                                        c = fgetc(fp);   // get char from file
                                        link[pos2] = c;
                                        if (pos2 > 0) {
                                            if (link[pos2 - 1] == 'g' && link[pos2] == '\"')
                                                found = 1;
                                        }
                                        ++pos2;
                                    }
                                    --pos2;
                                    found = 0;
                                    // strip the opening quote and terminate the string
                                    char link2[pos2];
                                    for (j = 1; j < pos2; ++j)
                                        link2[j - 1] = link[j];
                                    link2[j - 1] = '\0';
                                    sprintf(cmd, "wget -O /home/arturo/Dropbox/Digital_Renders/%d '%s'", ++num, link2);
                                    system(cmd);
                                    pos2 = -1;
                                }
                                ++pos2;
                            } else {
                                if (pos2 > 0)
                                    pos2 = 0;
                            }
                        }
                        fin = 0;
                    }
                    ++pos;
                } else
                    pos = 0;
            }
            // closing file
            fclose(fp);
            if (remove(file))
                fprintf(stderr, "Can't remove file\n");
        }
    }
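The outer loop of the program simply walks the thread page by page. The same per-page URLs can be generated in shell; this sketch only shows the URL construction, since the actual fetching needs network access:

```shell
# Generate the per-page URLs the C program fetches (pages 1..149).
base="http://www.mtgsalvation.com/forums/creativity/artwork/340782-official-digital-rendering-thread"
for i in $(seq 1 149); do
  echo "$base?page=$i"
done | sed -n '1p;149p'   # show only the first and last URL
```

Each URL could then be fed to `wget -O page$i.txt` as the C program does. To build and run the program itself, something like `gcc -Wall downloader.c -o downloader && ./downloader` should work (the filename is assumed).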