I recently had a perfect scenario that called for a bit of automation. In short, we needed to run some unusual performance monitoring across an entire Hyper-V cluster consisting of a large number of nodes. To get the data into a platform where we could analyse and report on it within a reasonable amount of time, we decided to use SQL Server as the database.
For a number of reasons there were specific limitations on (1) how we could get to the data and (2) the types (or at least the schema) of data we would receive.
As such, the requirements were simple:
- Create a method to read multiple CSV files and load the data into SQL Server
- The import mechanism needed to adapt to changing import data: we wanted to be able to add various metrics, which meant the schema of the CSV files would be dynamic. We could, however, assume that all the files would share the same schema.
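To illustrate the dynamic-schema requirement, a minimal Python sketch of how the column list can be derived from a CSV file's header row alone, so the import adapts automatically when new metrics are added (the column names here are made up for the example):

```python
import csv
import io

def csv_schema(file_obj):
    """Read only the header row of a CSV and return the column names."""
    reader = csv.reader(file_obj)
    return next(reader)

# Hypothetical sample resembling a performance-counter export
sample = io.StringIO("Timestamp,Node,CPUPercent\n2024-01-01 10:00,HV01,12.5\n")
print(csv_schema(sample))  # ['Timestamp', 'Node', 'CPUPercent']
```

Because only the header is read, adding a new metric column to the export changes the derived schema without any change to the import code.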
The script below provides a mechanism for:
- Reading a collection of files in a target directory
- Extracting the schema from the CSV files
- Dropping any existing table in the target DB
- Creating a dynamic schema based on the import files
- Creating a table with the new schema
- Populating all the data from the various CSV files into the newly created table
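The steps above can be sketched end to end in Python. This is a simplified illustration, not the actual script: it uses SQLite in place of SQL Server, types every column as TEXT, and the table name `PerfData` is an assumption.

```python
import csv
import sqlite3
from pathlib import Path

def import_csv_dir(conn, directory, table="PerfData"):
    """Drop and recreate the target table from the first file's header,
    then insert the rows from every CSV file in the directory.
    Assumes all files share the same schema (per the requirements)."""
    files = sorted(Path(directory).glob("*.csv"))
    if not files:
        return 0
    # Derive the dynamic schema from the first file's header row
    with open(files[0], newline="") as f:
        columns = next(csv.reader(f))
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)  # TEXT is a simplification
    cur = conn.cursor()
    cur.execute(f'DROP TABLE IF EXISTS "{table}"')        # drop any existing table
    cur.execute(f'CREATE TABLE "{table}" ({col_defs})')   # create table with new schema
    placeholders = ", ".join("?" for _ in columns)
    total = 0
    for path in files:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            next(reader)  # skip the header row in every file
            rows = list(reader)
            cur.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
            total += len(rows)
    conn.commit()
    return total
```

Against SQL Server itself, the insert step would typically use a bulk-load mechanism rather than row-by-row inserts, but the shape of the pipeline is the same.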