Using the code below, I am trying to pull baseball lineups into a data frame. Starting at line 24, I am receiving the error "ValueError: not enough values to unpack (expected 2, got 1)". Can anyone help me resolve this issue? Thanks!
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = ""
soup = BeautifulSoup(requests.get(url).content, "html.parser")
def get_name(tag):
    if tag.select_one(".desktop-name"):
        return tag.select_one(".desktop-name").get_text()
    elif tag.select_one(".mobile-name"):
        return tag.select_one(".mobile-name").get_text()
    else:
        return tag.get_text()
data = []
for card in soup.select(".lineup-card"):
    header = [
        c.get_text(strip=True, separator=" ")
        for c in card.select(".lineup-card-header .c")
    ]
    h_p1, h_p2 = [
        get_name(p) for p in card.select(".lineup-card-header .player")
    ]
    data.append([*header, h_p1, h_p2])
    for p1, p2 in zip(
        card.select(".col--min:nth-of-type(1) .player"),
        card.select(".col--min:nth-of-type(2) .player"),
    ):
        p1 = get_name(p1).split(maxsplit=1)[-1]
        p2 = get_name(p2).split(maxsplit=1)[-1]
        data.append([*header, p1, p2])
df = pd.DataFrame(
    data, columns=["Team1", "Date", "Team2", "Player1", "Player2"]
)
df.to_csv("MLB Games.csv", index=False)
print(df.head(10).to_markdown(index=False))

I receive the following error when running the code above:
\Users\15156\AppData\Local\Programs\Spyder\pkgs\pandas\compat\_optional.py", line 141, in import_optional_dependency
    raise ImportError(msg)
ImportError: Missing optional dependency 'tabulate'. Use pip or conda to install tabulate.

When I type %pip install tabulate into the console, I receive this error message:
Note: you may need to restart the kernel to use updated packages.
C:\Users\15156\AppData\Local\Programs\Spyder\Python\python.exe: No module named pip

However, if I restart the kernel I still receive the same error message. I have looked around and tried installing the package using the commands below:
(base) PS C:\Users\15156> conda activate base
(base) PS C:\Users\15156> conda create -n myenv spyder-kernels nltk
Collecting package metadata (current_repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.12.0
  latest version: 4.13.0

Please update conda by running

    $ conda update -n base -c defaults conda

## Package Plan ##

  environment location: C:\Users\15156\miniconda3\envs\myenv

  added / updated specs:
    - nltk
    - spyder-kernels

The packages were downloaded and installed, and I have checked the environment location it reports. However, when I run %pip install kernel again, it still says that the module cannot be found and gives the same error as above. Has anyone run into this issue before?
Answer
You have several errors in your code. First, you don't import requests. Next, the first two return statements in get_name() don't have anything following them - you need to bring the next line up to that line. Finally, since get_name() is returning objects where you called the get_text() method on them, it is actually returning strings, so you don't need to access the .text attribute on them when you're assigning to p1 and p2. Here is the corrected code:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = ""
soup = BeautifulSoup(requests.get(url).content, "html.parser")
def get_name(tag):
    if tag.select_one(".desktop-name"):
        return tag.select_one(".desktop-name").get_text()
    elif tag.select_one(".mobile-name"):
        return tag.select_one(".mobile-name").get_text()
    else:
        return tag.get_text()
data = []
for card in soup.select(".lineup-card"):
    header = [
        c.get_text(strip=True, separator=" ")
        for c in card.select(".lineup-card-header .c")
    ]
    h_p1, h_p2 = [
        get_name(p) for p in card.select(".lineup-card-header .player")
    ]
    data.append([*header, h_p1, h_p2])
    for p1, p2 in zip(
        card.select(".col--min:nth-of-type(1) .player"),
        card.select(".col--min:nth-of-type(2) .player"),
    ):
        p1 = get_name(p1).split(maxsplit=1)[-1]
        p2 = get_name(p2).split(maxsplit=1)[-1]
        data.append([*header, p1, p2])
df = pd.DataFrame(
    data, columns=["Team1", "Date", "Team2", "Player1", "Player2"]
)
df.to_csv("73264662.csv", index=False)
print(df.head(10).to_markdown(index=False))

This prints:
| Team1 | Date | Team2 | Player1 | Player2 |
|:--------|:-----------------|:--------|:----------------------|:-------------------------|
| Marlins | August, 5 2:20pm | Cubs | Edward Cabrera (R) | Justin Steele (L) |
| Marlins | August, 5 2:20pm | Cubs | Miguel Rojas (R) SS | Rafael Ortega (L) CF |
| Marlins | August, 5 2:20pm | Cubs | Joey Wendle (L) 2B | Contreras |
| Marlins | August, 5 2:20pm | Cubs | Garrett Cooper (R) 1B | Patrick Wisdom (R) 1B |
| Marlins | August, 5 2:20pm | Cubs | Jesus Aguilar (R) DH | Ian Happ (S) LF |
| Marlins | August, 5 2:20pm | Cubs | De La Cruz | Nelson Velazquez (R) RF |
| Marlins | August, 5 2:20pm | Cubs | JJ Bleday (L) CF | Yan Gomes (R) C |
| Marlins | August, 5 2:20pm | Cubs | Peyton Burdick (R) LF | Zach McKinstry (L) 3B |
| Marlins | August, 5 2:20pm | Cubs | Stallings | Christopher Morel (R) SS |
| Marlins | August, 5 2:20pm | Cubs | Leblanc | Nick Madrigal (R) 2B |

and produces a CSV with all of today's games.
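As for the "not enough values to unpack (expected 2, got 1)" error in the question: one plausible cause, though I can't confirm it without the page, is a lineup card whose header contains only one `.player` element, so that `h_p1, h_p2 = [...]` receives a one-item list. A minimal sketch of guarding that unpack follows. It uses plain lists as stand-ins for the scraped names, and `unpack_pair` is a hypothetical helper name, not something from the original code:

```python
from typing import Optional, Sequence, Tuple


def unpack_pair(names: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return (first, second) if exactly two names are present, else None."""
    if len(names) != 2:
        return None
    first, second = names
    return first, second


# Stand-ins (assumed data) for the per-card result of
# [get_name(p) for p in card.select(".lineup-card-header .player")].
cards = [
    ["Edward Cabrera (R)", "Justin Steele (L)"],  # both starters listed
    ["Edward Cabrera (R)"],  # only one starter listed
]

rows = []
skipped = 0
for names in cards:
    pair = unpack_pair(names)
    if pair is None:
        # A bare `h_p1, h_p2 = names` would raise ValueError on this card.
        skipped += 1
        continue
    rows.append(list(pair))

print(rows)
print("skipped:", skipped)
```

In the real loop, the same length check would go right before the `h_p1, h_p2 = ...` line, with `continue` skipping cards that don't have both starters listed yet.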
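On the ImportError from the question: `DataFrame.to_markdown` relies on the optional `tabulate` package, so until that can be installed into the environment Spyder actually runs, one workaround is to fall back to `DataFrame.to_string`. A minimal sketch, where `render_table` is a hypothetical helper name and the two sample rows are copied from the output above:

```python
import pandas as pd


def render_table(df: pd.DataFrame, rows: int = 10) -> str:
    """Render the first `rows` rows as Markdown, or as plain text without tabulate."""
    try:
        # to_markdown imports the optional 'tabulate' dependency when called.
        return df.head(rows).to_markdown(index=False)
    except ImportError:
        # Plain-text fallback that needs only pandas itself.
        return df.head(rows).to_string(index=False)


# Sample rows taken from the printed table in this answer.
df = pd.DataFrame(
    [
        ["Marlins", "August, 5 2:20pm", "Cubs", "Edward Cabrera (R)", "Justin Steele (L)"],
        ["Marlins", "August, 5 2:20pm", "Cubs", "Miguel Rojas (R) SS", "Rafael Ortega (L) CF"],
    ],
    columns=["Team1", "Date", "Team2", "Player1", "Player2"],
)
print(render_table(df))
```

This only sidesteps the display step. `df.to_csv` does not need `tabulate`, so the CSV is written either way.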