1. wget
wget -t 10 --limit-rate 50k -Q 10M -c http://www.linuxeye.com -O linuxeye.html -o download.log
-t specify the number of retries
--limit-rate limit the download speed
-Q set the maximum download quota
-c resume a partially finished download
-O specify the output file name
-o specify a log file
wget -r -N -l 2 http://www.linuxeye.com
-r recursive download
-N enable timestamping on files (re-download only if the remote copy is newer)
-l specify how many levels deep to traverse
Accessing HTTP or FTP pages that require authentication:
wget --user username --password pass URL
You can also leave the password off the command line and type it in when the page prompts for it; to do that, replace --password with --ask-password.
2. curl
curl http://www.linuxeye.com --silent -o linuxeye.html
--silent suppress progress information; remove it if you want progress shown
-o write the downloaded data to a file instead of standard output
--progress-bar show download progress as a bar of # characters
-C resume an interrupted download from a given offset (-C - lets curl work out the offset itself)
--referer set the Referer header string
--cookie set cookies
--user-agent set the User-Agent string
--limit-rate limit the download bandwidth
--max-filesize specify the maximum file size to download
-u authentication (curl -u user:pass http://www.linuxeye.com)
-I print only the response headers
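Several of these options are often combined in one command. The sketch below builds such a command; the URL, referer, and cookie values are placeholders, and the command is only echoed rather than run, so no network access is needed:

```shell
# Placeholder URL and header values -- adjust before actually using this.
url="http://example.com/big-file.iso"

# Resume with -C -, throttle to 100 KB/s, and set Referer/User-Agent/cookie.
cmd=(curl -C - --limit-rate 100k \
     --referer "http://example.com/downloads" \
     --user-agent "Mozilla/5.0" \
     --cookie "session=abc123" \
     -o big-file.iso "$url")

# Echo the assembled command instead of executing it.
echo "${cmd[@]}"
```

Drop the echo (run `"${cmd[@]}"` directly) to perform the actual download.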
3. Accessing Gmail from the command line
curl -u [email protected]:password --silent "https://mail.google.com/mail/feed/atom" | \
tr -d '\n' | sed 's:</entry>:\n:g' | \
sed 's/.*<title>\(.*\)<\/title.*<name>\([^<]*\)<\/name><email>\([^<]*\).*/Author: \2 [\3] \nSubject: \1\n/'
Author: Facebook [[email protected]]
Subject: Interesting Pages on Facebook
Author: offers [[email protected]]
Subject: Reminder: Get 25% OFF your order - no minimum!
Author: Google+ team [[email protected]]
Subject: Top 3 posts for you on Google+ this week
curl -u [email protected]:password --silent "https://mail.google.com/mail/feed/atom" | \
perl -ne 'print "\t" if /<entry>/; print "$2\n" if /<(title|name)>(.*)<\/\1>/;'
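The sed pipeline above can be tried offline against a hand-made Atom fragment, which makes the tag-splitting logic easier to see. The entry below is made-up sample data using the same tags the Gmail feed uses; GNU sed is assumed, since the replacement relies on \n producing a newline:

```shell
# A minimal fake Atom entry (hypothetical data, same tag layout as the feed).
feed='<entry><title>Hello world</title><author><name>Alice</name><email>[email protected]</email></author></entry>'

# Split entries onto separate lines, then extract title, name and email
# with back-references, exactly as in the Gmail one-liner above.
printf '%s' "$feed" | sed 's:</entry>:\n:g' | \
sed 's/.*<title>\(.*\)<\/title.*<name>\([^<]*\)<\/name><email>\([^<]*\).*/Author: \2 [\3] \nSubject: \1\n/'
```

This prints an "Author: Alice [[email protected]]" line followed by a "Subject: Hello world" line.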
4. A bash script to download all images from a web page with curl
#!/bin/bash
# FileName: img_downloader.sh
if [ $# -ne 3 ]; then
    echo "Usage: $0 URL -d DIRECTORY"
    exit -1
fi
for i in {1..4}
do
    case $1 in
    -d) shift; directory=$1; shift ;;
     *) url=${url:-$1}; shift ;;
    esac
done
mkdir -p $directory
baseurl=$(echo $url | egrep -o "https?://[a-z.]+")
echo $baseurl
curl -s $url | egrep -o "<img src=[^>]*>" | awk -F"\"|'" '{print $2}' > /tmp/$$.list
sed -i "s|^/|$baseurl/|" /tmp/$$.list
cd $directory
while read filename;
do
    echo $filename
    curl -s -O "$filename"
done < /tmp/$$.list
Note: the sed extraction in the original book's script only handles image paths wrapped in double quotes, while many pages quote src attributes with single quotes, so I use awk here instead; you could also fix this by adjusting the sed in the original script.
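The difference is easy to verify offline: awk with the field separator "\"|'" treats either quote character as a delimiter, so both quoting styles yield the URL as field 2. The img tags below are made-up samples:

```shell
# Two sample <img> tags: one double-quoted, one single-quoted (hypothetical URLs).
printf '%s\n' '<img src="http://example.com/a.png">' \
              "<img src='http://example.com/b.png'>" | \
awk -F"\"|'" '{print $2}'
```

Both lines print their URL; a sed pattern matching only \" would miss the second one.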
5. A bash script to find broken links on a site with curl
#!/bin/bash
if [ $# -ne 1 ]; then
    echo -e "Usage: $0 URL\n"
    exit -1
fi
echo Broken links:
mkdir /tmp/$$.lynx
cd /tmp/$$.lynx
lynx -traversal $1 > /dev/null
count=0
sort -u reject.dat > links.txt
while read link;
do
    output=`curl -I $link -s | grep "HTTP/.*OK"`
    if [[ -z $output ]]; then
        echo $link
        let count++
    fi
done < links.txt
[ $count -eq 0 ] && echo No broken links found.
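The script's core check, grepping the curl -I output for an OK status line, can be exercised without network access by feeding it canned status lines. check_link is a hypothetical helper that mirrors the script's grep; the header strings below are samples:

```shell
# check_link: hypothetical helper mirroring the script's grep test.
# Succeeds (exit 0) only when the status line looks like "HTTP/... 200 OK".
check_link() {
    printf '%s\n' "$1" | grep -q "HTTP/.*OK"
}

check_link "HTTP/1.1 200 OK"        && echo "link ok"
check_link "HTTP/1.1 404 Not Found" || echo "link broken"
```

In the real script the first argument would instead be the live headers from `curl -I $link -s`, so a link counts as broken whenever no OK status line comes back.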
Thu Jan 10 13:30:52 CST 2013