openpyxl操控Excel文件

我们在python中引入openpyxl模块来操控excel文件。一个以.xlsx为扩张名的excel文件打开后叫工作簿workbook,每个工作簿可以包括多张表单worksheet,正在操作的这张表单被认为是活跃的active sheet。每张表单有行和列,行号1、2、3…,列号A、B、C...。在某一个特定行和特定列的小格子叫单元格cell。

python程序从excel文件中读数据基本遵循以下步骤:
1、import openpyxl
2、调用openpyxl模块下的load_workbook(‘你的文件名.xlsx’)函数打开excel文件,得到一个工作簿(workbook)对象wb
3、通过wb.active或wb的方法函数get_sheet_by_name(‘你想要访问的表单名称’)得到表单对象ws

4、通过索引获取单元格:ws[‘B2’]
     通过表单的方法函数cell()获取单元格:ws.cell(row=1, column=2)
     通过单元格的属性value,row,column,coordinate对单元格进行多方向立体式访问
5、行对象ws[10],列对象[‘C’],行切片ws[5:10]、列切片ws[‘C:D’],单元格左上角和右下角左边共同划定表单指定区域ws['A1':'C3']
6、ws.max_row和ws.max_column给出数据用到的最大行和列
7、from openpyxl.utils import get_column_letter, column_index_from_string引进来的两个函数实现excel表格列字母和数字的转换

 


实例部分我们用了automate the boring stuff里的一个程序,该程序对美国2010年的人口统计表进行了再处理。本想转成天朝的,发现不是免费的遂放弃。如果用天朝的情况做类比就是国家要做人口普查,由省及市,有市再把任务下达到各区(比如厦门各区:海沧、同安、思明、翔安、集美),这样报上来的表格就是各区的人口,程序要做的就是把每个城市各区人口累加,得到每个城市人口以及每个城市分区数目。统计完的数据我们通过pprint生成字符串,保存到了census2010.py文件里,如果想用结果数据,只要将census2010作为模块import到程序里即可。

 

import openpyxl

wb = openpyxl.load_workbook('example.xlsx')

# Getting sheets from the workbook

# print(wb.sheetnames)
#
# for sheet in wb:
# 	print(sheet.title)
#
# mySheet = wb.create_sheet('mySheet')
# print(wb.sheetnames)

# sheet3 = wb.get_sheet_by_name('Sheet3')
# sheet4 = wb['mySheet']

# Getting cells from the sheets
ws = wb.active
# print(ws)
# print(ws['A1'])
# print(ws['A1'].value)

# c = ws['B1']
# print('Row {}, Column {} is {}'.format(c.row, c.column, c.value))
# print('Cell {} is {}\n'.format(c.coordinate, c.value))
#
# print(ws.cell(row=1, column=2))
# print(ws.cell(row=1, column=2).value)
# for i in range(1hcgghjjrxb, 8, 2):
# 	print(i, ws.cell(row=i, column=2).value)

# Getting rows and columns from the sheets
# colC = ws['C']
# row6 = ws[6]
# col_range = ws['B:C']
# row_range = ws[2:6]

# for col in col_range:
# 	for cell in col:
# 		print(cell.value)
#
# for row in row_range:
# 	for cell in row:
# 		print(cell.value)

# for row in ws.iter_rows(min_row=1, max_row=2, max_col=2):
# 	for cell in row:
# 		print(cell)

# print(tuple(ws.rows))
# cell_range = ws['A1:C3']
#
# for rowOfCellObjects in cell_range:
# 	for cellObj in rowOfCellObjects:
# 		print(cellObj.coordinate, cellObj.value)
# 	print('------End of Row----------')

# print('{} * {}'.format(ws.max_row, ws.max_column))

from openpyxl.utils import get_column_letter, column_index_from_string
print(get_column_letter(2), get_column_letter(47), get_column_letter(900))
print(column_index_from_string('AAH'))






# readCensusExcel.py - Tabulates population and number of census tracts
# for each county.

import openpyxl, pprint

# Read the spreadsheet data
print('Opening workbook')
wb = openpyxl.load_workbook('censuspopdata.xlsx')
sheet = wb.active

countryData = {}

# Fill in countryData with each city's pop and tracts
for row in range(2, sheet.max_row+1):

	# Each row in the spreasheet has data
	state = sheet['B' + str(row)].value
	country = sheet['C' + str(row)].value
	pop = sheet['D' + str(row)].value

	# make sure the key state exists
	countryData.setdefault(state, {})
	# make sure the key for country in state exists
	countryData[state].setdefault(country,{'tracts':0, 'pop':0})
	# Each row represents one census tract, so increment by one
	countryData[state][country]['tracts'] += 1
	# Increase the country pop by the pop in this census tract
	countryData[state][country]['pop'] += int(pop)

# Open a new text file and write the contents fo countryData to it
print('Writing results...')
resultFile = open('census2010.py', 'w')
resultFile.write('allData = ' + pprint.pformat(countryData))

1 1 1 1 1 1 1 1 1 1 Rating 4.54 (40 Votes)