Friday, March 6, 2009

Common Errors between Controller and Agents

· Error -10343: Communication error: Failed to connect to remote host

· Error -29989: Process "lr_bridge.exe" was not created on remote host, reason - communication error.


Resolution:

Different problems can cause the above error. Some of the things you can verify include:

1. Make sure that you apply the same LoadRunner version and Service Pack on the Controller and Load Generator machines.


2. Make sure that you can ping between the Controller and the host machine in both directions. You may need to add the IP address and machine name to the hosts file:

  1. Go to the host machine.
  2. Navigate to C:\WINNT\system32\drivers\etc\.
  3. Open the hosts file in a text editor.
  4. At the end of the file, add another line with the IP address and machine name of the Controller machine.

Example:
111.111.111.111 MI_Controller

  5. Repeat steps 1 - 4 for all the host machines.
  6. Repeat steps 1 - 4 for the Controller machine, but add the IP addresses and machine names of the host machines.

3. Make sure that the LoadRunner Agent is running either as a process or a service on the remote host machine.

4. Make sure the Controller and host are connected to the network. In some networks, the Microsoft loopback IP address 10.10.10.10 is used when a computer is not connected to the network. As a result, the Controller will not be able to detect the host machine. You will need to stop the loopback service, connect to the network, and make sure that the machine has a valid IP address.

5. If the machines have multiple network cards, configure which NIC the process should use for communication. See below for details.

==================================

Multiple NICs can cause communication problems between the Controller and the host

In general, having multiple NICs in a machine could cause a problem with the Controller-Load Generator connectivity.

The reason is that a reply to a request from the Controller may not always be sent on the correct interface. If a message is sent out from one of the Controller's NICs to a host machine, but that host knows the Controller by the other NIC, the message is treated as coming from a different Controller and is ignored, because a host can only serve one Controller at a time. Likewise, messages sent from a host machine to the Controller on a NIC other than the one the Controller knows will not be recognized as coming from the correct host. This may cause the Controller to think the host is not responding. In cases like this, the communication gets "lost" and results in time-out or similar errors. Removing the extra interfaces resolves the problem.

Another possible solution is to always use the "primary" interface, which is listed under Network and Dial-Up Connections -> Advanced -> Advanced Settings -> Adapters and Bindings in Windows 2000. This dialog allows you to reorder the network interfaces to change their priority. Always reference the topmost adapter when connecting with the LoadRunner Controller.
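
A quick way to see whether a machine answers to more than one address is to list the IPv4 addresses its own name resolves to. The sketch below is only an illustration and is not part of LoadRunner; the function names are made up for this example, and it reports what name resolution returns, which may not match the full list of installed adapters.

```python
import socket


def local_ipv4_addresses(hostname=None):
    """Return the sorted, de-duplicated IPv4 addresses that hostname resolves to.

    With no argument, the local machine name is used.
    """
    name = hostname or socket.gethostname()
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except OSError:
        # Name could not be resolved at all.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def has_multiple_addresses(addresses):
    """True when more than one non-loopback address is present."""
    return len([a for a in addresses if not a.startswith("127.")]) > 1


if __name__ == "__main__":
    addrs = local_ipv4_addresses()
    print("Addresses for this machine:", addrs)
    if has_multiple_addresses(addrs):
        print("More than one address found; check which one the Controller uses.")
```

If more than one non-loopback address is printed, compare them with the address the other machine has in its hosts file.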

============================

6. A network environment issue may also be the cause, and a network monitor can be used to diagnose the problem. Make sure that you are using the latest driver and firmware for your network cards and routers. Also try forcing the network card to use 100 Mbps/Full Duplex instead of the 'Detect automatically the best speed settings' option. If the card miscommunicates with the router or another network device, the automatic settings may be negotiated incorrectly and have a large impact on network performance.
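
The name and ping check from step 2 above can be scripted so it is easy to repeat on every machine. This is a rough sketch, not a LoadRunner tool: the helper names are invented for this example, MI_Controller is just the sample name used earlier, and it assumes the system ping command is on the PATH. Run it on the Controller with the host names, and on each host with the Controller name.

```python
import platform
import socket
import subprocess


def resolve(name):
    """Return the IPv4 address for name, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def ping_command(target, count=2):
    """Build the ping command line; Windows uses -n for the count, others use -c."""
    flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", flag, str(count), target]


def can_ping(target, count=2):
    """Return True if ping exits with status 0 for target."""
    try:
        result = subprocess.run(
            ping_command(target, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_host(name):
    """Check that name resolves and answers ping by name and by IP address."""
    ip = resolve(name)
    return {
        "name": name,
        "ip": ip,
        "ping_by_name": ip is not None and can_ping(name),
        "ping_by_ip": ip is not None and can_ping(ip),
    }


if __name__ == "__main__":
    for host in ["MI_Controller"]:  # replace with your own machine names
        print(check_host(host))
```

If "ip" comes back as None, the name does not resolve on that machine and a hosts file entry is the first thing to try.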


· Error -10344: Communication error: Failed to bind socket. A process on the machine is already bound to the same address.

Resolution:

The LoadRunner Agent Process is trying to connect through a port that is busy

The LoadRunner Agent process/service listens on ports 50500 and 54345 (for monitoring or running Vusers over a firewall, it uses port 443). Run netstat -an on the machine and check whether ports 54345 and 50500 are occupied. If these ports are in use when you start the agent, you will get the above error. You will need to shut down the application that is using those ports, so that the ports are freed before restarting the LoadRunner agent.
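
Reading through a long netstat -an listing by hand is tedious, so here is a small sketch that scans the output for the two agent ports named above. It is only an illustration: the function names are made up, and it assumes the Windows netstat layout, where the local address is the second column (for example "TCP    0.0.0.0:54345    0.0.0.0:0    LISTENING").

```python
import subprocess

AGENT_PORTS = (50500, 54345)  # the ports named in the text above


def ports_in_use(netstat_output, ports=AGENT_PORTS):
    """Return the ports from `ports` that appear as a local port in the output.

    Assumes the Windows `netstat -an` layout: protocol, local address,
    foreign address, state.
    """
    found = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and blank lines; data rows start with TCP or UDP.
        if len(fields) < 2 or not fields[0].upper().startswith(("TCP", "UDP")):
            continue
        # The local address looks like 0.0.0.0:54345 or [::]:54345.
        port_text = fields[1].rsplit(":", 1)[-1]
        if port_text.isdigit() and int(port_text) in ports:
            found.add(int(port_text))
    return sorted(found)


def run_netstat():
    """Return the text printed by `netstat -an`, or an empty string on failure."""
    try:
        result = subprocess.run(
            ["netstat", "-an"], capture_output=True, text=True, check=False
        )
    except OSError:
        return ""
    return result.stdout


if __name__ == "__main__":
    busy = ports_in_use(run_netstat())
    print("Agent ports already in use:", busy if busy else "none")
```

Run it while the LoadRunner agent is stopped. Any port it reports is held by some other application, which has to be closed before the agent is started again.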

You may also get this error during replay if the LoadRunner agent on the host machine connects back to the Controller using a port that is already bound. By default, this is a dynamic port, and the LoadRunner agent will automatically try a different port.

· Error -29987: Process "traceroute_server.exe" was not created... when connecting to a remote host

Resolution:

Add the host name and IP address to the Hosts file

Verify that you are able to ping back and forth between the Controller and the Load Generator machines using both host names as well as IP addresses. If the ping in any one case fails:
1. Add the host name and IP address of Load Generator to the Hosts file on the Controller.
2. Add the host name and IP address of the Controller to the Hosts file on the Load Generator.

Note:
The Hosts file is usually under C:\Winnt\system32\drivers\etc.
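
If you have many Load Generators, checking each Hosts file by eye gets error-prone. The sketch below works on the text of a Hosts file: it tells you whether a machine name is already mapped and builds the line to add if it is not. The function names are invented for this example, the IP address and MI_Controller are the sample values from earlier in this post, and nothing is written to disk.

```python
def has_hosts_entry(hosts_text, name):
    """Return True if a non-comment line in hosts_text maps an address to name."""
    wanted = name.lower()
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        fields = entry.split()
        # fields[0] is the IP address; the rest are host names and aliases.
        if wanted in (f.lower() for f in fields[1:]):
            return True
    return False


def add_hosts_entry(hosts_text, ip, name):
    """Return hosts_text with 'ip name' appended, unless name is already mapped."""
    if has_hosts_entry(hosts_text, name):
        return hosts_text
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{ip} {name}\n"


if __name__ == "__main__":
    sample = "127.0.0.1 localhost\n"
    print(add_hosts_entry(sample, "111.111.111.111", "MI_Controller"))
```

To use it on a real machine, read the Hosts file into a string, pass it through add_hosts_entry, and save the result with administrator rights after keeping a backup copy.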


Error -30932: “Failed to open eve file -

Resolution:

Delete the old .eve files

Do the following:

1. The Load Generator files may be specified in the Controller options to be shared on a network drive, but they are being saved on a local drive. To check this, in the Controller go to Tools -> Options -> RunTime File Storage and select the "On the current Vuser machine" option.

2. The *.eve file stores transaction times and is collated at the end of a scenario run. Old *.eve files left on the host machine can cause problems. Delete the contents of the C:\temp folder; this removes any old *.eve files that were not cleaned up after an earlier scenario run.

3. Shut down the Controller. Go to the C:\winnt folder. Search for wlrun.* and delete all files that the search returns. Go to the \bin folder and run register_controller.bat.
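
The cleanup in step 2 can be scripted if you have to repeat it on several host machines. This is a cautious sketch rather than an official procedure: the function names are made up, it only touches files ending in .eve directly inside the folder you give it (a narrower cleanup than emptying the whole folder), and it defaults to a dry run that deletes nothing.

```python
from pathlib import Path


def find_eve_files(folder):
    """Return the *.eve files directly inside folder, sorted by path."""
    return sorted(p for p in Path(folder).glob("*.eve") if p.is_file())


def delete_eve_files(folder, dry_run=True):
    """Delete *.eve files in folder and return their names.

    With dry_run=True (the default) nothing is deleted; the names are only listed.
    """
    removed = []
    for path in find_eve_files(folder):
        if not dry_run:
            path.unlink()
        removed.append(path.name)
    return removed


if __name__ == "__main__":
    # Dry run first; pass dry_run=False once the list looks right.
    print(delete_eve_files(r"C:\temp"))
```

Make sure no scenario is running on the machine before deleting anything.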



5 comments:

  1. Resolution:
    Problem varies script by script.
    I had the same issue, and in my case Network Delay Time (under Network Graph) had a value that was not correct for my testing. We got the script from another QA group and they had a different configuration. We didn't need Network Delay Time at all. I removed that entry and it resolved the issue. Network Graph is under the Available Graphs section of the Run tab.

  2. Thank you for posting this very useful information, Sunil Kumar, especially on Error -30932. It helped me solve my load generator question. I wish Mercury would publish a book on how to read the Analysis charts and a good book for Load Runner 9.50. If you know whether such a book exists, please let me know. Best Regards, Jeremy e-mail: jj303606@hotmail.com

  3. Hi, I am currently facing the above -10344 issue and am not sure how to check netstat to understand the root cause. Can you please help me here? Thanks in advance
