These notes are saved as .PWI files. The nice thing about these notes is that they combine text and voice in the same file. He did a lot of reading online about how to use MS Word to manipulate .pwi files, but when nothing worked out for him, he came to me with the problem.
So I wrote him a command-line utility in Python that extracts all the audio from his files. To do this, I first opened a file in a hex editor (I use Bless Hex Editor on Ubuntu). Within a minute I realized that the format was extremely simple. All the audio was saved in the files as RIFF WAVE (yes, really, really old). Even more convenient, all the audio was stored after a "RIFF" tag in the file.
Another point to note: a single PWI file can store several audio snippets, each one starting with its own 'RIFF' tag.
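Each of those tags is the start of a standard RIFF header: the 4-byte ID 'RIFF', a little-endian unsigned 32-bit size (the number of bytes that follow that field), and a 4-byte form type, which is 'WAVE' for audio. The sketch below is a hypothetical helper, `find_riff_headers`, that is not part of the original utility; it lists every 'RIFF' match in a byte string along with the size and form type its header declares. It reports every occurrence of the four bytes, so a 'RIFF' that happens to appear inside audio data would be listed too.

```python
import struct

def find_riff_headers(data: bytes) -> list:
    """Return (offset, declared_size, form_type) for each b'RIFF' in data.

    Hypothetical helper: it does not validate the header, it just reads
    the 12 bytes that follow each match.
    """
    headers = []
    offset = data.find(b'RIFF')
    # Stop once there are not enough bytes left for a full 12-byte header.
    while offset != -1 and offset + 12 <= len(data):
        # '<4sI4s' = 4-byte ID, little-endian uint32 size, 4-byte form type
        _, size, form = struct.unpack_from('<4sI4s', data, offset)
        headers.append((offset, size, form))
        offset = data.find(b'RIFF', offset + 4)
    return headers
```

For example, feeding it a made-up blob with some note data followed by two empty WAVE headers returns one tuple per header, with offsets measured from the start of the blob.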
This is what the file looked like:
So all you have to do is extract the pieces of the file that start with the 'RIFF' tags. Python is well suited to a quick byte-level task like this. Read the entire file, split the content into pieces on the 'RIFF' identifier, and ignore the first piece (everything before the first audio clip). Each remaining piece is an individual audio file: put back the 'RIFF' tag that the split removed, and write the piece out as an independent wave file.
```python
########################################################################
# filename : pwi2wave.py
# utility  : converts .pwi files to .wav files (handles multiple wave
#            files stored in the same pwi note)
# author   : Asim Mittal
#
########################################################################
import os
import sys

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('Usage: python pwi2wave.py <file.pwi>')
        sys.exit(1)

    # strip() returns a new string, so keep the result
    filename = sys.argv[1].strip()

    if os.path.isfile(filename):  # check that the path is indeed a file
        name, extension = os.path.splitext(filename)
        try:
            with open(filename, 'rb') as fRead:
                content = fRead.read()  # read the entire file as bytes
        except OSError:
            print('\n\nError reading file!! Exiting')
            sys.exit(1)

        # split the file into pieces using the RIFF tag (a bytes literal,
        # because the file was read in binary mode)
        lstPieces = content.split(b'RIFF')

        # lstPieces now contains all the parts of the file separated by
        # the RIFF tag. The first element is the data before the first
        # audio clip, so it is not of interest to us.
        # Note: split() removes the tag, so when storing each wave part
        # we have to add the 'RIFF' tag again.
        del lstPieces[0]
        for index, piece in enumerate(lstPieces):
            newname = name + '_wave_' + str(index) + '.wav'
            with open(newname, 'wb') as fWrite:
                fWrite.write(b'RIFF' + piece)
        # the above loop creates as many wave files as there are audio
        # segments in the pwi file
    else:
        # the filename didn't point to a file; tell the user about it
        print('\n\nBad path provided. Please give the path to an actual file.')
```
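Splitting on 'RIFF' works for these notes, but it has two blind spots: if the four bytes 'RIFF' happen to occur inside the audio samples, a clip gets cut in two, and any non-audio bytes between or after clips end up glued onto the preceding wave file. A more defensive variant, sketched below as a hypothetical `extract_wave_chunks` function (not the author's code), uses the size field in each RIFF header to cut out exactly one clip and then jumps past it. It assumes each clip's size field is accurate; a match whose size would run past the end of the file, or that is not followed by 'WAVE', is treated as a false hit and skipped.

```python
import struct

def extract_wave_chunks(data: bytes) -> list:
    """Cut each RIFF/WAVE clip out of data using the size in its header.

    Assumes well-formed headers: 'RIFF', little-endian uint32 size
    (bytes after the size field), then 'WAVE'.
    """
    chunks = []
    offset = data.find(b'RIFF')
    while offset != -1 and offset + 8 <= len(data):
        (size,) = struct.unpack_from('<I', data, offset + 4)
        end = offset + 8 + size
        if size >= 4 and end <= len(data) and data[offset + 8:offset + 12] == b'WAVE':
            chunks.append(data[offset:end])
            offset = data.find(b'RIFF', end)  # jump past this whole clip
        else:
            offset = data.find(b'RIFF', offset + 4)  # false match; keep scanning
    return chunks

# Hypothetical usage, mirroring the naming scheme of pwi2wave.py:
# with open('note.pwi', 'rb') as f:
#     for i, clip in enumerate(extract_wave_chunks(f.read())):
#         with open(f'note_wave_{i}.wav', 'wb') as out:
#             out.write(clip)
```

On a blob whose first clip contains the bytes 'RIFF' in its payload, this returns two clips, while a plain split would produce three pieces.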
And that is how it's done!
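To sanity-check an extracted clip without opening a media player, Python's standard `wave` module can read its header. The helper below, `describe_wav`, is a hypothetical addition that returns the channel count, sample width in bytes, sample rate, and frame count. One caveat: `wave` only understands PCM audio and raises `wave.Error` for other codecs, so if a voice note happens to be stored with a compressed codec (not confirmed here), this check fails even though the file may still play.

```python
import io
import wave

def describe_wav(clip: bytes) -> tuple:
    """Return (channels, sample_width, frame_rate, n_frames) for a WAV clip.

    Raises wave.Error if the bytes are not a PCM WAVE file.
    """
    with wave.open(io.BytesIO(clip), 'rb') as w:
        return (w.getnchannels(), w.getsampwidth(),
                w.getframerate(), w.getnframes())
```

Reading a clip from disk first (for example `describe_wav(open('note_wave_0.wav', 'rb').read())`, with a hypothetical filename) is enough to confirm that the split produced a well-formed file.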