Creating robots that can act autonomously in dynamic, unstructured environments is a major challenge. In such environments, learning to recognize and manipulate novel objects is an important capability. A truly autonomous robot acquires knowledge through interaction with its environment without using heuristics or prior information encoding human domain insights. Static images often provide insufficient information for inferring the relevant properties of the objects in a scene. Hence, a robot needs to explore these objects by interacting with them. However, there may be many exploratory actions possible, and a large portion of these actions may be non-informative. To learn quickly and efficiently, a robot must select actions that are expected to have the most informative outcomes. In the proposed bottom-up approach, the robot achieves this goal by quantifying the expected informativeness of its own actions. We use this approach to segment a scene into its constituent objects as a first step in learning the properties and affordances of objects. Evaluations showed that the proposed information-theoretic approach allows a robot to efficiently infer the composite structure of its environment.