The essence of Selenium: it simulates user behavior in a real browser, so it can obtain content that was difficult to obtain with plain requests, such as dynamically rendered or encrypted content
(1) Open a URL in the browser and get the source code of the webpage:
    # The role of selenium is to simulate user behavior and get encrypted content
    # Visit a page
    from selenium import webdriver

    brower = webdriver.Chrome()            # This step specifies the browser
    brower.get("https://uland.....com/")   # This step opens the corresponding website
    print(brower.page_source)              # Crawling the source code directly seems simpler than before...
The logic is:
1. Import webdriver
2. Choose the browser that webdriver will drive
3. Call get() on the browser object brower. This is similar to sending the initial request with requests once you have the URL
4. After the page has loaded, read brower.page_source to get the source code. This is the same kind of result that requests returns (the HTML of the page), and it is simpler in a sense
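The four steps above can be sketched as one small function. This is only a minimal sketch, not the code from these notes: the names fetch_page_source and driver_factory are hypothetical, https://example.com/ is a placeholder URL, and the quit() call is an addition that the notes do not show.

```python
def fetch_page_source(url, driver_factory):
    """Open url in a browser created by driver_factory and return its HTML.

    driver_factory is any zero-argument callable that returns a WebDriver,
    for example selenium.webdriver.Chrome (hypothetical parameter name).
    """
    driver = driver_factory()       # steps 1-2: create the browser session
    try:
        driver.get(url)             # step 3: navigate to the page
        return driver.page_source   # step 4: read the current page's HTML
    finally:
        driver.quit()               # always close the browser, even on errors


# With Selenium and a Chrome driver available, a call could look like this
# (https://example.com/ is a placeholder URL, not the site from the notes):
#
#     from selenium import webdriver
#     html = fetch_page_source("https://example.com/", webdriver.Chrome)
#     print(html[:200])
```

Passing the browser class in as an argument keeps the function independent of which browser is used, and the try/finally block makes sure the browser window does not stay open if the page fails to load.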
(2) Locating elements and reading their content, attributes, and tags
About locating elements:
(1) The simplest way is to locate by the value of the id or class attribute (By.ID and By.CLASS_NAME)
a. Get an attribute
    from selenium.webdriver.common.by import By

    brower.get('https://www...com')
    search = brower.find_element(By.ID, 'J_Search')
    print(search)                          # This prints the element object itself
    print(search.get_attribute('class'))   # From the element object, read the value of its class attribute
    # Tip: press Ctrl+F to open a search box and search for the class value to check it
After loading the URL with get():
The brower.find_element() method locates the element. Here By.ID matches the corresponding id in the web page
What comes back is an element object (a WebElement), so you can go on to read its attributes
(2) The more general method is XPath (CSS selectors and JavaScript are not used yet)
    # XPath can also be used to select an element
    input3 = brower.find_element(By.XPATH, '//input[@id="q"]')
    print(input3.text)
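This locates the element and reads its text content through .text. Note that for an input element .text is usually empty; the value typed into the box is read with get_attribute('value') instead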
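Both ways of locating return the same kind of element object, so reading its tag, text, and attributes can be sketched once. The helper below is a rough sketch: describe_element is a hypothetical name, and it assumes that driver is a Selenium WebDriver and that by is one of the By constants.

```python
def describe_element(driver, by, value):
    """Locate one element and collect its tag name, text, and class attribute.

    by is a locator strategy such as By.ID or By.XPATH, and value is the
    matching id or XPath expression. describe_element is a hypothetical
    helper, not part of Selenium.
    """
    element = driver.find_element(by, value)      # raises if nothing matches
    return {
        "tag": element.tag_name,                  # e.g. "input" or "div"
        "text": element.text,                     # visible text of the element
        "class": element.get_attribute("class"),  # value of the class attribute
    }


# Possible usage with the locators from the notes (requires Selenium):
#
#     from selenium.webdriver.common.by import By
#     print(describe_element(brower, By.ID, "J_Search"))
#     print(describe_element(brower, By.XPATH, '//input[@id="q"]'))
```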
(3) Simulating browser interaction (typing and clicking are the most commonly used later on)
Keyboard input and mouse click
    import time

    input = brower.find_element(By.ID, "q")   # Locate the text box to type into
    input.send_keys('iphone')
    time.sleep(2)
    input.clear()                             # Clear the current content
    input.send_keys('ipad')                   # Enter new content
    botton = brower.find_element(By.CLASS_NAME, 'btn-search')   # This time locate by class name
    botton.click()                            # click() performs the mouse click
After locating the node of the text box:
send_keys() fills in the content
time.sleep() pauses between actions, which gives the page time to respond and may make the automation less likely to be recognized
Then find the node corresponding to the button
botton.click() performs the mouse click
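The sequence above (locate the box, clear it, type, pause, click) can be wrapped into one reusable function. This is a sketch under assumptions: search_for, box_locator, button_locator, and pause are hypothetical names, and the locators are expected as (by, value) tuples such as (By.ID, "q").

```python
import time


def search_for(driver, box_locator, button_locator, query, pause=0.0):
    """Type query into a text box, then click a button.

    box_locator and button_locator are (by, value) tuples such as
    (By.ID, "q"). pause is an optional delay in seconds before clicking.
    """
    box = driver.find_element(*box_locator)
    box.clear()              # remove any text already in the box
    box.send_keys(query)     # simulate keyboard input
    time.sleep(pause)        # optional pause, like time.sleep(2) in the notes
    button = driver.find_element(*button_locator)
    button.click()           # simulate the mouse click


# Possible usage with the locators from the notes (requires Selenium):
#
#     from selenium.webdriver.common.by import By
#     search_for(brower, (By.ID, "q"), (By.CLASS_NAME, "btn-search"), "ipad", pause=2)
```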
(4) The two ways of waiting
The first is implicit waiting
    # 2.2 Page waiting
    # (1) Implicit waiting: the setting applies to the whole session, from opening to closing the browser
    brower.implicitly_wait(10)
    brower.get('https://www..com')
    input = brower.find_element(By.CLASS_NAME, 'logo')
    print(input)
    # If the element is not found within 10 seconds, find_element raises an error and the script exits
    # Waiting gives the server time to respond. If you don't wait, the element may not be there yet
This sets a time limit: the element must be found within 10 s, otherwise an error is raised and the script quits. The purpose is to keep the later code from getting stuck when an element never loads.
The second is explicit waiting (this is actually used more)
    # (2) Explicit waiting: instead of waiting a fixed time, wait until a condition on an element is met, then continue
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC   # The conditions to wait for

    brower.get('https://www..com')
    wait = WebDriverWait(brower, 10)   # Explicit wait with a maximum waiting time
    input = wait.until(EC.presence_of_element_located((By.ID, 'q')))   # Wait until the element with this id is present
    button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.btn-search')))   # Do not continue until the button is clickable
    print(input, button)
    # One special point: each EC method takes a single (By, value) tuple, hence the two nested pairs of parentheses
The essence is to control the order of execution: the script waits until the page is ready, so that fast-running code does not get ahead of a slow-loading page and fail to obtain the data.
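(1) For explicit waiting, first import the EC module (expected_conditions) and WebDriverWait
(2) Set a maximum waiting time, similar to implicit waiting
(3) wait.until() takes the condition in parentheses. Once the condition is met, execution continues; otherwise it keeps waiting, and raises a timeout error when the maximum waiting time is reached
(4) The results can then be printed as usual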
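To make the idea behind wait.until() concrete, here is a simplified polling loop in plain Python. It only illustrates the concept and is not Selenium's actual implementation; poll_until, PollTimeout, and the parameter names are made up for this sketch.

```python
import time


class PollTimeout(Exception):
    """Raised when the condition is not met before the timeout."""


def poll_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises PollTimeout once timeout seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result          # condition met: continue with the script
        if time.monotonic() >= deadline:
            raise PollTimeout(f"condition not met within {timeout} s")
        time.sleep(interval)       # wait a little before checking again


# Example: a condition that only succeeds on its third check.
attempts = []


def ready_on_third_try():
    attempts.append(1)
    return "element" if len(attempts) >= 3 else None


print(poll_until(ready_on_third_try, timeout=2.0, interval=0.01))  # prints "element"
```

The loop returns as soon as the condition holds, so unlike a fixed time.sleep(10) it does not wait longer than necessary.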