Group: PYTHON WORKSHOP

How to truncate Chinese strings in Python, and how to count Chinese characters.
lidongok | Nov 28, 2007 5:47:44 PM
# -*- coding: cp936 -*-
# Count the Chinese characters in a file, and show what truncating
# a Chinese string looks like.
f = open('test.txt', 'r')  # test.txt must be GBK-encoded (it is decoded as 'gbk' below)
countstr = f.read()
f.close()

total_len = len(countstr)
x = countstr.decode('gbk')
unicode_len = len(x)
print total_len
print unicode_len
none_cnstr_num = unicode_len * 2 - total_len  # number of non-Chinese (single-byte) characters
cnstr_num = unicode_len - none_cnstr_num  # number of Chinese (double-byte) characters
for i in range(15):
    print x[:i]  # correct truncation: slices whole characters
    print countstr[:i]  # byte slice: may show half a Chinese character

print "cn char num: %d, en char num: %d, total bytes: %d" % (cnstr_num, none_cnstr_num, total_len)
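
The arithmetic above works because, in GBK, every Chinese character takes two bytes and every ASCII character takes one, so byte length = 2 * cn + en and character length = cn + en. Below is a minimal Python 3 sketch of the same idea (the script above is Python 2). The helper names gbk_counts and truncate_gbk are made up for illustration, and the sketch assumes the input is valid GBK. Note that GBK also stores some non-Chinese symbols, such as full-width punctuation, in two bytes, so the "Chinese" count is really a count of double-byte characters.

```python
# Python 3 sketch. gbk_counts and truncate_gbk are hypothetical helper
# names, not part of the original script.

def gbk_counts(raw):
    """Return (double_byte_chars, single_byte_chars) for GBK-encoded bytes."""
    text = raw.decode('gbk')
    byte_len = len(raw)    # length in bytes
    char_len = len(text)   # length in characters
    # byte_len = 2 * double + single, and char_len = double + single
    double = byte_len - char_len
    single = char_len - double
    return double, single


def truncate_gbk(raw, max_bytes):
    """Cut raw to at most max_bytes bytes without splitting a character."""
    # Assumes raw is valid GBK, so the only byte errors='ignore' can drop
    # is a dangling lead byte left at the cut point.
    return raw[:max_bytes].decode('gbk', errors='ignore').encode('gbk')


sample = '中文abc字符'.encode('gbk')
print(gbk_counts(sample))
print(truncate_gbk(sample, 3).decode('gbk'))
```

Cutting at 3 bytes would otherwise leave half of the second character, which is the "half a Chinese character" effect shown by countstr[:i] above. Decoding first and slicing the resulting text, as x[:i] does, avoids the problem entirely when the limit is counted in characters rather than bytes.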

