python3.x - How to use maketrans in Python with a UTF-8 file
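Problem description
I wrote a script that processes text by replacing every punctuation symbol in it with a space, using Python's maketrans and translate. It works fine on ASCII-encoded files, but on a UTF-8 file it raises an error saying the maketrans arguments are not the same length, even though they look the same length to me.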
File '/Users/lgq/Desktop/p3.py', line 10, in text_to_words
'abcdefghijklmnopqrstuvwxyz ')
ValueError: the first two maketrans arguments must have equal length
I read that maketrans cannot be used with UTF-8. If that is true, how should I replace characters in UTF-8 text? Any pointers would be appreciated.
def text_to_words(the_text):
    ''' Return a list of words with all punctuation removed,
        and all in lowercase.
    '''

    my_substitutions = the_text.maketrans(
        # If you find any of these
        'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!"#$%&()*+,-./:;<=>?@[]^_`{|}~\'\\',
        # Replace them by these
        'abcdefghijklmnopqrstuvwxyz ')

    # Translate the text now.
    cleaned_text = the_text.translate(my_substitutions)
    wds = cleaned_text.split()
    return wds

def get_words_in_book(filename):
    ''' Read a book from filename, and return a list of its words.'''
    f = open(filename, 'r', encoding = 'utf-8')
    content = f.read()
    f.close()
    wds = text_to_words(content)
    return wds

book_words = get_words_in_book('alice.txt')
print('There are {0} words in the book, the first 100 are\n{1}'.format(
    len(book_words), book_words[:100]))
Answers
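Answer 1: First, the two strings really are different lengths: \' is a single character, and \\ is also a single character. You can check this with len(). Also, for questions about strings, it is best to state which Python version you are using.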
The maketrans arguments are not the same length:
my_substitutions = the_text.maketrans(
    # If you find any of these
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!"#$%&()*+,-./:;<=>?@[]^_`{|}~\'\\',
    # Replace them by these
    'abcdefghijklmnopqrstuvwxyz ')
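The len() check suggested above can be run directly. The following is a minimal Python 3 sketch, assuming the source string is written with the escapes \' and \\ inside a single-quoted literal. The names to_replace, lowercase, and replacements are illustrative and not from the original code. Building the padding from the measured length avoids counting spaces by hand.

```python
# Characters to replace: uppercase letters, digits, and punctuation.
# Inside a single-quoted literal, \' and \\ are escapes that each
# produce ONE character, so each counts once toward the length.
to_replace = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!"#$%&()*+,-./:;<=>?@[]^_`{|}~\'\\'

print(len('\''), len('\\'))  # each escape is a single character

# Derive the replacement string from the measured length instead of
# typing the right number of spaces manually.
lowercase = 'abcdefghijklmnopqrstuvwxyz'
replacements = lowercase + ' ' * (len(to_replace) - len(lowercase))

# Both arguments now have the same length, so maketrans accepts them.
table = str.maketrans(to_replace, replacements)
print(len(to_replace), len(replacements))
print('Hello, World!'.translate(table).split())
```

With the two lengths guaranteed equal, the ValueError from the question cannot occur, whatever the encoding of the file the text was read from.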
Test code:
# -*- coding: utf-8 -*-
from string import maketrans

def text_to_words(the_text):
    ''' Return a list of words with all punctuation removed,
        and all in lowercase.
    '''
    my_substitutions = maketrans(
        # If you find any of these
        'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!"#$%&()*+,-./:;<=>?@[]^_`{|}~\'\\',
        # Replace them by these (26 letters followed by 42 spaces)
        'abcdefghijklmnopqrstuvwxyz' + ' ' * 42)

    # Translate the text now.
    cleaned_text = the_text.translate(my_substitutions)
    wds = cleaned_text.split()
    return wds

print(text_to_words('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!"#$%&()*+,-./:;<=>?@[]^_`{|}~\'测试'))
Output:
['abcdefghijklmnopqrstuvwxyz', '\xe6\xb5\x8b\xe8\xaf\x95']
This is the result when run under Python 2.
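For Python 3, where text read with encoding='utf-8' is already a Unicode str, one possible approach is the dictionary form of str.maketrans. This is only a sketch under stated assumptions: the extra_punctuation parameter and its default of curly quotes are hypothetical additions for typographic characters that may appear in a UTF-8 book, and str.lower() is swapped in for the explicit uppercase-to-lowercase mapping, so it also lowercases non-ASCII letters.

```python
import string

def text_to_words(the_text, extra_punctuation='‘’“”'):
    """Return the words of the_text in lowercase, with punctuation
    and digits replaced by spaces.

    extra_punctuation is a hypothetical parameter for non-ASCII
    symbols (curly quotes by default) that should also be blanked.
    """
    to_blank = string.punctuation + string.digits + extra_punctuation
    # The dict form maps each character to its replacement one by one,
    # so there are no two strings whose lengths must be kept equal.
    table = str.maketrans({ch: ' ' for ch in to_blank})
    return the_text.lower().translate(table).split()

sample = 'Alice’s 2 cats said: “Hello, 測試!”'
print(text_to_words(sample))
```

Characters that are not in the table, such as the Chinese characters in the sample, pass through unchanged. The function can be called on the content read by get_words_in_book in the same way as the original text_to_words.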
