"Digital cameras and image manipulation programs add hidden data to JPEG
files. For different reasons, one might want to remove these data before
publishing the files on the Internet.
Metadata in JPEG Files
The JPEG file format is the format most used for storing and transmitting
photographs on the Internet. In addition, a large number of digital cameras
store pictures as JPEG files. However, many users are likely to be unfamiliar
with the fact that a JPEG file can contain other data besides the actual
photograph.
The JPEG file format allows additional information, called "metadata", to be
embedded in the file header. (Other image file formats can contain metadata,
too.) The purpose of these metadata is to provide additional, useful
information along with the picture. Image manipulation programs and
especially digital cameras take advantage of this feature.
Metadata can be embedded in different ways. A common way is to store them
according to the Exif specification, which was created by the Japan
Electronic Industries Development Association (JEIDA). Other popular
specifications are the IPTC headers defined by the International Press
Telecommunications Council (IPTC) and XMP developed by Adobe Systems. More
detailed information about these metadata formats as well as descriptions of
other metadata formats can be found on ExifTool's Tag Names page.
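To make the three formats more concrete, the following Python sketch walks the
segments at the start of a JPEG file and reports which metadata containers it
recognizes. It assumes the usual conventions: Exif and XMP live in APP1
segments (identified by the prefixes `Exif\0\0` and
`http://ns.adobe.com/xap/1.0/\0`), and IPTC data lives inside a Photoshop
block in an APP13 segment. The names `iter_segments` and `find_metadata` are
made up for this illustration, and the parser is deliberately simplified: it
stops at the first start-of-scan marker and does not handle padding bytes
between segments.

```python
import struct

# (marker byte, identifier prefix) -> label. These prefixes are the common
# conventions; an APP13 Photoshop block may or may not actually hold IPTC data.
SIGNATURES = {
    (0xE1, b"Exif\x00\x00"): "Exif",
    (0xE1, b"http://ns.adobe.com/xap/1.0/\x00"): "XMP",
    (0xED, b"Photoshop 3.0\x00"): "Photoshop/IPTC",
}


def iter_segments(data):
    """Yield (marker, payload) for each segment that precedes the image scan."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan or end of image: stop here
            return
        # The two-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError(f"invalid segment length at offset {pos}")
        yield marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length


def find_metadata(data):
    """Return the labels of all recognized metadata segments, in file order."""
    found = []
    for marker, payload in iter_segments(data):
        for (sig_marker, prefix), label in SIGNATURES.items():
            if marker == sig_marker and payload.startswith(prefix):
                found.append(label)
    return found
```

Passing the bytes of a file opened in binary mode to `find_metadata` yields a
list such as `["Exif", "XMP"]`, or an empty list if none of the three
containers is present.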
Among other things, the metadata section of a file can contain information
about:
* make and model of the digital camera
* time and date the picture was taken
* distance the camera was focused at
* location information (GPS) where the picture was taken
* small preview image (thumbnail) of the picture
* firmware version, serial numbers, name and version of the image manipulation
program, etc.
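As an illustration of how such fields are stored, here is a minimal Python
sketch that reads a few text fields from the payload of an Exif APP1 segment.
It relies on the fact that Exif data is laid out like a TIFF file: a byte
order mark, the number 42, and an offset to a first directory (IFD0) of
12-byte entries. The function name `read_ifd0_strings` and the `TAG_NAMES`
selection are assumptions made for this example. The sketch looks only at
IFD0; fields such as the original capture time and the GPS position sit in
sub-directories, and the thumbnail in a second directory, none of which this
sketch follows.

```python
import struct

# A few ASCII-typed IFD0 tags; the selection is arbitrary for this example.
TAG_NAMES = {0x010F: "Make", 0x0110: "Model", 0x0131: "Software", 0x0132: "DateTime"}


def read_ifd0_strings(exif_payload):
    """Return the TAG_NAMES text fields found in IFD0 of an Exif APP1 payload."""
    if not exif_payload.startswith(b"Exif\x00\x00"):
        raise ValueError("payload does not start with the Exif identifier")
    tiff = exif_payload[6:]  # all offsets below are relative to this TIFF header
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])  # little or big endian
    if order is None:
        raise ValueError("unknown byte order mark")
    magic, ifd_offset = struct.unpack(order + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("not a TIFF header")
    (count,) = struct.unpack(order + "H", tiff[ifd_offset:ifd_offset + 2])
    result = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        # Each entry: tag id, value type, value count, 4-byte value or offset.
        tag, typ, n, value = struct.unpack(order + "HHI4s", tiff[start:start + 12])
        if tag in TAG_NAMES and typ == 2:  # type 2 means ASCII text
            if n > 4:  # too long to fit inline, so the field holds an offset
                (offset,) = struct.unpack(order + "I", value)
                raw = tiff[offset:offset + n]
            else:
                raw = value[:n]
            result[TAG_NAMES[tag]] = raw.rstrip(b"\x00").decode("ascii", "replace")
    return result
```

The result is a plain dictionary keyed by tag name, which is enough to see at
a glance whether a file names the camera or the editing software.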
Should Metadata Be Removed?
If you intend to publish JPEG files on the Internet, you might want to remove
all metadata to reduce the file size of the JPEG files. Depending on what
kinds of metadata are stored in the file, the reduction can range between a
few bytes and several kilobytes. For example, if you have a website with
metered bandwidth or if you have visitors with dial-up modems, you might be
interested in saving as many bytes as possible.
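How much is saved can be checked with a short Python sketch that copies a JPEG
file while leaving out the metadata segments. The function name
`strip_metadata` and the choice of what counts as removable are assumptions
for this example: it drops APP1 (Exif, XMP), APP13 (Photoshop/IPTC) and
comment segments, and keeps everything else, because segments such as APP0
(JFIF), APP2 (ICC color profile) and APP14 (Adobe) can influence how the
picture is displayed. It is a simplified illustration and not a replacement
for a dedicated tool.

```python
import struct

# Markers dropped by this sketch: APP1, APP13 and COM.
REMOVABLE = {0xE1, 0xED, 0xFE}


def strip_metadata(data):
    """Return a copy of the JPEG bytes without the REMOVABLE segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: the compressed image begins here
            break
        # The two-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError(f"invalid segment length at offset {pos}")
        end = pos + 2 + length
        if marker not in REMOVABLE:
            out += data[pos:end]
        pos = end
    out += data[pos:]  # image data and anything after it is copied unchanged
    return bytes(out)
```

Reading a file in binary mode, passing its bytes through `strip_metadata` and
comparing the two lengths shows the saving for that particular picture.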
Another reason why you might want to consider removing all metadata beforehand
is that metadata can give away potentially sensitive information. This
information can pose a threat to your privacy or to other legitimate interests
(e.g. the interest of journalists in protecting their sources). The following
fictitious and real-life examples illustrate the problematic nature of
metadata information:
* Many digital cameras embed a small preview image (thumbnail) of the picture
in the header of each JPEG file. This makes it possible to quickly browse the
pictures. Not all image manipulation programs update this thumbnail along with
the main picture. The consequence could be that an edited picture retains the
original unmodified version of the picture as an Exif datum. In some cases,
this may only be inconvenient; in other cases, this could create a significant
information leak. For example, a supposedly anonymized picture of a person
may still show his or her face in the thumbnail. Another, more embarrassing
example is the case of television personality Cat Schwartz (of TechTV, among
others). Schwartz had published a photograph of herself on her personal blog.
Because the program she had used to edit the picture did not update the
thumbnail, the thumbnail revealed far more of her than she had intended.
* The following real-life case happened in February 2006: The Washington Post
published an interview with a computer hacker, titled "Invasion of the
Computer Snatchers". The hacker had agreed to be interviewed only if he was
not identified by name or hometown. In addition to the interview, a disguised
picture of the hacker was published. Unfortunately, the picture contained IPTC
metadata about the city and state where it was taken. Combined with the
details mentioned in the article, this could make it possible to track down
the hacker.
Other kinds of metadata could have posed a comparable threat: The Exif datum
"location information (GPS) where the picture was taken" makes it possible
to pinpoint the place where the picture was taken. The Exif datum
"distance the camera was focused at" at least makes it possible to narrow
down the position of the photographer if one knows the location of the
photographed object.
* A fictitious example: Bill does not want to go to Uncle Linus's birthday
party. He would rather go to a Rolling Stones concert. He tells his uncle
that his boss wants him to work overtime to finish an important project.
At the concert, Bill's friend Steve takes a picture of Bill. Bill publishes
the picture on his homepage. Weeks later, Uncle Linus visits Bill's homepage.
He examines the Exif datum "time and date the picture was taken" and
discovers that Bill did not work overtime, but went to a concert on the day of
the birthday party."
For more, including how to remove metadata, read here:
http://netzreport.googlepages.com/hi...peg_files.html