• 朝韩将军级会谈时隔11年后在板门店重启 2019-07-23
  • Conférence de presse du Premier ministre chinois 2019-07-09
  • 大观园举办“古琴雅集” 市民感受“古韵端午” 2019-06-30
  • 文脉颂中华——黄河新闻网 2019-06-30
  • 水费欠账竟“穿越”16年?用户质疑:为何没见催缴过? 2019-06-21
  • 买房怎么看风水这个真的实在是太重要了 ——凤凰网房产北京 2019-06-11
  • 6月14日凤凰直通车:茅台再开市场化招聘大门,32个部门要285人葡萄 种植 2019-06-07
  • 习近平为传统文化“代言” 2019-05-27
  • 中巴建交一周年 一系列庆祝活动在巴拿马举行 2019-05-24
  • 招聘启事丨西部网诚聘新媒体编辑记者、实习编辑等人员 2019-05-23
  • A title= href=httpwww.snrtv.comlivech=8 target= 2019-05-23
  • 不止消灭刘海屏 vivo NEX发布会看点汇总 2019-05-22
  • 世相【镜头中的陕西人】 2019-05-20
  • 邓紫棋首任明星制作人 吴亦凡身兼二职 2019-05-20
  • 陶昕然女儿正面照曝光 吃蛋糕萌到爆 2019-05-19
  • Examples using CSV files

    500元 倍投方案 稳赚 www.gvqn.net Parsing comma-separtated-values (CSV) files is a common task. There are many tools available in Python to deal with this. Let's start by using the built-in csv module.

    In [147]:
    import csv # using Python module
    

    Now, we want to open an example file, and read the contents:

    In [148]:
    f = open('example_1.csv','rb') # use binary mode if on MS windows
    d = [i for i in csv.reader(f) ] # use list comprehension to read from file
    f.close() # close file
    print d
    
    [['Name', 'DOB', 'Years', 'Degree'], ['Alice Jones', '12/1/1980', '3', 'MS'], ['Bob Smith', '1/1/1969', '4', 'BS'], ['John Book', '5/3/1980', '11', 'BA'], ['Billy Blanks', '6/9/2000', '8', 'AA']]
    

    Adding new fields as columns

    This reads the rows of the CSV file into a list-of-lists.

    Now, let's suppose we want to create columns for last name and first name instead of having just one name field. The first element in the list d is the header, so we had the additional fields there:

    In [149]:
    d[0].append('Last Name')
    d[0].append('First Name')
    print d[0]
    
    ['Name', 'DOB', 'Years', 'Degree', 'Last Name', 'First Name']
    

    Now, we want to split the original Name field into first and last names and put these at the ends of their respective rows.

    In [150]:
    for row in d[1:]: # start at 1st, not 0th column. Each row is a list
        first,last= row[0].split() # split on white-space
        row.append(last) # append to each row
        row.append(first)
        
    print d
    
    [['Name', 'DOB', 'Years', 'Degree', 'Last Name', 'First Name'], ['Alice Jones', '12/1/1980', '3', 'MS', 'Jones', 'Alice'], ['Bob Smith', '1/1/1969', '4', 'BS', 'Smith', 'Bob'], ['John Book', '5/3/1980', '11', 'BA', 'Book', 'John'], ['Billy Blanks', '6/9/2000', '8', 'AA', 'Blanks', 'Billy']]
    
    In [151]:
    #%qtconsole
    

    Writing updated CSV file

    Now, we want to write out our new data in CSV format

    In [152]:
    f = open('example_1_out.csv','wb') # write mode binary
    fw = csv.writer(f) # create csv writer
    fw.writerows(d) 
    f.close() # close file 
    

    Now, opening the file example_1_out.csv using excel (or another reader) should show the new column. That covers the most direct and pure-Python way to dealing with CSV files. However, there are many other tools available. For example, numpy provides power methods to access these files.

    Using Numpy to parse CSV files

    Let's see how to accomplish the same work as above by using Numpy.

    In [153]:
    import numpy as np 
    
    d = np.loadtxt('example_1.csv',delimiter=',',dtype=str)
    
    print d
    print d.dtype
    
    [['Name' 'DOB' 'Years' 'Degree']
     ['Alice Jones' '12/1/1980' '3' 'MS']
     ['Bob Smith' '1/1/1969' '4' 'BS']
     ['John Book' '5/3/1980' '11' 'BA']
     ['Billy Blanks' '6/9/2000' '8' 'AA']]
    |S12
    

    Notice that we did not have to use open to get at the contents of the file. By default, the delimiter is any whitespace, so we had to change this to the comma character. The dtype specifies we want everything to be read in as a string. Numpy can figure out how long that string needs to be as it goes through the file so we don't have to specify that ahead of time (we probably don't know it anyway). In this case, the dtype turns out to be a twelve character string S12. Note that this is the maximum length is used for all strings so there is obviously a lot of extra space if most of the strings are short and just a few are long.

    Numpy provides many more ways of reading data via the dtype. For example,

    In [154]:
    dt = [('name',   'S64'),   # The first element of the tuple is our name for each respective column
          ('dob',    'S64'),   # and the second element is the numpy dtype we want for that column.
          ('years',  'int'),   # Here we want years as an integer, not a string
          ('degree', 'S64'), ]
    
    d = np.loadtxt('example_1.csv',delimiter=',',dtype=dt,skiprows=1) # skip the header row
    print d
    
    [('Alice Jones', '12/1/1980', 3, 'MS') ('Bob Smith', '1/1/1969', 4, 'BS')
     ('John Book', '5/3/1980', 11, 'BA') ('Billy Blanks', '6/9/2000', 8, 'AA')]
    

    The advantage of doing it this way is that now we can compute the years column using numpy tools. For example, here is the mean of the years.

    In [155]:
    print d['years'].mean() # using numpy arrays
    
    6.5
    

    Now to get back to the main task at hand: splitting the name field into first and last name.

    In [156]:
    import string
    
    n=map(string.split,d['name'])
    w=array([tuple(i)+tuple(j) for i,j in zip(d,n)],    # list comprehension glues tuple-ized rows together
             dtype=dt+[('first','S64'),('last','S64')]) # append new dtypes to existing list of dtypes
    

    That was kind of non-simple, but now we can write this to a CSV using savetxt.

    In [157]:
    # the comments are set to '' to avoid hash marks on the first line.
    np.savetxt('np_output.csv',w,delimiter=',',fmt='%s',header='name,dob,years,degree,first,last',comments='')
    

    Now, you can inspect the so-generated file and verify it is a CSV.

    Using pandas to parse CSV files

    pandas is the real power tool for this job.

    In [158]:
    import pandas as pd
    
    d = pd.read_csv('example_1.csv')
    print d
    print type(d)
    
               Name        DOB  Years Degree
    0   Alice Jones  12/1/1980      3     MS
    1     Bob Smith   1/1/1969      4     BS
    2     John Book   5/3/1980     11     BA
    3  Billy Blanks   6/9/2000      8     AA
    <class 'pandas.core.frame.DataFrame'>
    

    Now, we have read the CSV file as a pandas DataFrame which is a super-structure that sits on top of numpy. Let's examine the columns of this DataFrame.

    In [159]:
    print d.columns
    
    Index([Name, DOB, Years, Degree], dtype=object)
    

    Notice that there is an extra space after the Name. This potentially makes it hard to access the columns using pandas slicing. For example,

    In [161]:
    print d.DOB # this works great when the column header name has no spaces in it.
    print d['Name'] # you can also refer to columns using this syntax
    
    0    12/1/1980
    1     1/1/1969
    2     5/3/1980
    3     6/9/2000
    Name: DOB, dtype: object
    0     Alice Jones
    1       Bob Smith
    2       John Book
    3    Billy Blanks
    Name: Name, dtype: object
    

    Luckily, this is not hard to fix. We just need to create another column that is free of these trailing spaces:

    In [ ]:
    d['name']=d['Name '] # easily create extra column
    print d.name         # now you can access this column using this syntax
    

    Pandas is a lot more powerful than this! We can parse the columns by types individually by providing a dtype for each column as a dictionary.

    In [ ]:
    d = pd.read_csv('example_1.csv',dtype={'Name ':'S64','DOB':'S64','Years':int,'Degree':'S64'})
    print d
    

    Now, we can compute along the columns as we did before with numpy.

    In [ ]:
    print d.Years.mean()
    

    You can also parse the DOB field to get a true timestamp instead of a string using the parse_dates keyword.

    In [171]:
    d = pd.read_csv('example_1.csv',dtype={'Name':'S64',
                                           'DOB':'S64',
                                           'Years':int,
                                           'Degree':'S64'},parse_dates=[1])
    

    Now, we can compute with these datetime objects as in the following.

    In [164]:
    # difference in birthdays between Alice Jones and John Book
    print d.DOB[0] -  d.DOB[2]
    
    212 days, 0:00:00
    

    Now we now how many days are between the respective birthdays of Alice Jones and John Book.

    In [ ]:
    %qtconsole
    
    In [166]:
    d['first']=map(lambda x:string.split(x)[0],d['Name'])
    d['last']=map(lambda x:string.split(x)[1],d['Name'])
    print d
    
               Name                 DOB  Years Degree  first    last
    0   Alice Jones 1980-12-01 00:00:00      3     MS  Alice   Jones
    1     Bob Smith 1969-01-01 00:00:00      4     BS    Bob   Smith
    2     John Book 1980-05-03 00:00:00     11     BA   John    Book
    3  Billy Blanks 2000-06-09 00:00:00      8     AA  Billy  Blanks
    
    In [167]:
    print d
    
               Name                 DOB  Years Degree  first    last
    0   Alice Jones 1980-12-01 00:00:00      3     MS  Alice   Jones
    1     Bob Smith 1969-01-01 00:00:00      4     BS    Bob   Smith
    2     John Book 1980-05-03 00:00:00     11     BA   John    Book
    3  Billy Blanks 2000-06-09 00:00:00      8     AA  Billy  Blanks
    

    Inject into a sqlite database

    In [170]:
    import pandas.io.sql as pd_sql
    import sqlite3 as sql # sqlite3 is built into Python
    
    con = sql.connect("example_1.db")
    pd_sql.write_frame(d,'data',con) # write to DB as table named "data"
    con.close()
    
    In [ ]:
     
    
  • 朝韩将军级会谈时隔11年后在板门店重启 2019-07-23
  • Conférence de presse du Premier ministre chinois 2019-07-09
  • 大观园举办“古琴雅集” 市民感受“古韵端午” 2019-06-30
  • 文脉颂中华——黄河新闻网 2019-06-30
  • 水费欠账竟“穿越”16年?用户质疑:为何没见催缴过? 2019-06-21
  • 买房怎么看风水这个真的实在是太重要了 ——凤凰网房产北京 2019-06-11
  • 6月14日凤凰直通车:茅台再开市场化招聘大门,32个部门要285人葡萄 种植 2019-06-07
  • 习近平为传统文化“代言” 2019-05-27
  • 中巴建交一周年 一系列庆祝活动在巴拿马举行 2019-05-24
  • 招聘启事丨西部网诚聘新媒体编辑记者、实习编辑等人员 2019-05-23
  • A title= href=httpwww.snrtv.comlivech=8 target= 2019-05-23
  • 不止消灭刘海屏 vivo NEX发布会看点汇总 2019-05-22
  • 世相【镜头中的陕西人】 2019-05-20
  • 邓紫棋首任明星制作人 吴亦凡身兼二职 2019-05-20
  • 陶昕然女儿正面照曝光 吃蛋糕萌到爆 2019-05-19
  • 重庆时时彩总和单双 一起来捉妖妖怪搭配 老挝磨丁赌场 异域狂兽客服 穿越之逐鹿三国下载 巴萨3-2巴列卡诺 2019蔚山现代对川崎前锋直播 2018以太币趋势 上海时时乐带线走势图 游戏茶苑通比牛牛手机版安卓版 福彩七乐彩走势图预测 奇迹觉醒后期最强职业 奇迹觉醒ios电脑版 腾讯分分彩结开奖记录 幸运飞艇走势图软下载 明日之后测试服下载