Recently I needed an approximate word count for an entire site so I could estimate translation time. I used find and sed to strip the HTML tags and template logic from the template files, then counted the remaining words with wc.
# run from the views directory

# create .out files with HTML tags stripped
find . -name '*.html' -exec sh -c 'sed -e "s/<[^>]*>//g" "$1" > "$1.out"' -- {} \;

# create .out.bout files with nunjucks/jinja2 tags stripped
find . -name '*.html.out' -exec sh -c 'sed -e "s/{[^}]*}//g" "$1" > "$1.bout"' -- {} \;

# word count (per file, with a total on the last line)
find . -name '*.html.out.bout' -exec wc -w {} +
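If only the total matters, the same two sed expressions can probably be chained in one pipeline, which avoids leaving .out and .out.bout files behind. The following is a minimal sketch under that assumption, not the method used above; the function name count_template_words is made up for illustration. Like the original commands, it works line by line, so a tag that spans several lines will not be stripped, and the result is still only an estimate.

```shell
#!/bin/sh
# Sketch: count words in all *.html templates under a directory
# without writing intermediate files.
# count_template_words is a hypothetical helper name.
count_template_words() {
  dir=${1:-.}   # directory to scan, defaults to the current one
  # Concatenate every template, strip HTML tags, then strip
  # nunjucks/jinja2 tags, and count the words that remain.
  find "$dir" -name '*.html' -exec cat {} + \
    | sed -e 's/<[^>]*>//g' -e 's/{[^}]*}//g' \
    | wc -w
}
```

Calling count_template_words . from the views directory prints a single number for the whole site.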