So, a while ago, one of the other writers here wrote a small tutorial on
parsing simple CSV files inC#. It mostly just
showed off the string
split
method, and only worked on really simple
CSV files - no quoted fields, etc. Well, we got a comment asking about
that, so today I sat down thinking I would write up a more robust
parser. I thought to myself, am I suffering from NIH (not invented here)
syndrome? Do I really need to write a full CSV parser?
And, as you might expect, the answer is no. Not only did I not need to
write a parser, I found that there is one that is built into OLEDB
subsystem of Windows! And it actually takes fewer lines of code to use
this built in parser than it does to do the simple string split
algorithm that was in the previous tutorial (and it feels a lot nicer
too). I was originally planning on this being a decently long tutorial,
when I thought I would write my own parser - but you are actually in for
a really short and simple one today. Less for me to write, less for you
to read, and more functionality to boot!
So, without further ado, I think we can jump straight into the code:
using System;
using System.Data;
using System.IO; //not used by default
using System.Data.OleDb; //not used by default
namespace CSVParserExample
{
class CSVParser
{
public static DataTable ParseCSV(string path)
{
if (!File.Exists(path))
return null;
string full = Path.GetFullPath(path);
string file = Path.GetFileName(full);
string dir = Path.GetDirectoryName(full);
//create the "database" connection string
string connString = "Provider=Microsoft.Jet.OLEDB.4.0;"
+ "Data Source=\"" + dir + "\\\";"
+ "Extended Properties=\"text;HDR=No;FMT=Delimited\"";
//create the database query
string query = "SELECT * FROM " + file;
//create a DataTable to hold the query results
DataTable dTable = new DataTable();
//create an OleDbDataAdapter to execute the query
OleDbDataAdapter dAdapter = new OleDbDataAdapter(query, connString);
try
{
//fill the DataTable
dAdapter.Fill(dTable);
}
catch (InvalidOperationException /*e*/)
{ }
dAdapter.Dispose();
return dTable;
}
}
}
So, first off, you will need to add the namespaces
System.IO
and
System.Data.OleDb
. The first we need because we will be doing some
path manipulation, and the second gives us access to what we will need
to do CSV parsing. I've created a nice static function here that takes a
path to a CSV file and returns the contents of the file in a DataTable
(which is really easy to view and manipulate).
The weird thing about all of this is that the CSV file gets treated as a
database table. We need to create a connection string with the Jet OLEDB
provider, and you set the Data Source to be the directory that contains
the CSV file. Under extended properties, 'text' means that we are
parsing a text file (as opposed to, say, an Excel file), the
HDR
property can be set to 'Yes' (the first row in the CSV files is header
information) or 'No' (there is no header row), and setting the FMT
property set to 'Delimited' essentially says that we will be working
with a comma separated value file. You can also set FMT
to
'FixedLength', for files where the fields are fixed length - but that
wouldn't be a CSV file anymore, would it?
The next part to do is create the actual query. In this case, we want
everything, so we have a "SELECT *". What are we selecting from? Well,
in this somewhat twisted worldview, the directory is the database, so
the file is the table we are selecting from.
Now we are into normal OLEDB territory - we create a
DataTable
that we
will be filling with results, and we create a OleDbDataAdapter
to
actually execute the query. Then (inside of a try block, because it can
throw an exception) we fill the data table with the results of the
query. Afterwords, we clean up after ourselves by disposing the
OleDbDataAdapter
, and we return the now filled data table.
And using the now filled data table is extremely simple - we actually
talk about it
here.
And there you go! I'm not sure how parsing CSV could be much easier.
Comments
Post a Comment