SSIS – Importing an excel file with over 255 columns

One of the several limitations comes with the excel driver is the number of columns you can import, you will find that excel only reads the first 255 columns of the file and ignores the rest, this is often not a big issue as most of the files you work with will have less than 255 columns but when it has more then it’s a real issue as there doesn’t seem to be an easy workaround for it.

The best way I found to get around this is to use two data source components, each one reads a different set of columns then use a merge join transformation to combine the data together again, this approach works based on the fact that excel will read both sets of data in the same order they are in, which I found true after trying this on different files, here is how it works:

  1. Add an excel sources to the data flow, set the data access mode to SQL command and set the command to “Select * from [Sheet1$A:IT]“, this tells the driver to return the data for all columns between A and IT which will be the first 255 columns, I’m assuming here the sheet name is Sheet1.
    Excel Source
  2. add another excel source, configure it the same way you did the first one, you can reuse the same connection manager but make the SQL command ¬†“Select * from¬†[Sheet1$IU:MA]”, my last column was MA here so you will need to replace MA with the last column in your sheet.
  3. now we need to add an identity column to each data set so we can merge them together in the next step, we will add a script task and add the identity column as an output column then inside the script assign an incremental values to it, again the data order will not change between the 2 sources, so the first row in the excel file will take the value 1 as identity on each side. output column
  4. in order to use the merge join we need to sort both of it’s inputs, so we add a sort task on each side and set it to order the data based on the identity column we added earlier in the script task.
  5. now add the merge join and join both outputs.

Data Flow

this way you will have all the columns in the data flow then you can continue processing you data as normal.